• I inherited a WordPress site that has been in operation for a few years. The uploads directory has ballooned to 13 GB, which makes taking backups a bit tricky, so I decided to investigate. I looked at a couple of plugins, but they had wildly differing reviews: some reviewers said they worked, others said they destroyed their website (ahem, backups), so I decided to roll up my sleeves.

    I made a listing of the images in the /wp-content/uploads directory and put it in a text file. I took a dump of the whole WP database and saved it as an SQL file (also big: 700 MB!). Then I wrote a bash script to loop through the file list, take each line and grep for it over the whole SQL dump. I realise this was inefficient, but I left it running overnight, and it took about 20 hours to complete! The quick and dirty script looks like this, if anyone is interested.

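    #!/bin/bash

    # Start with empty result files (echo "" would leave a blank first line).
    : > not-used.txt
    : > used.txt

    # Read one file name per line, keeping backslashes and leading spaces intact.
    while IFS= read -r FILE
    do
        # -F: treat the name as a literal string, not a regex; -c: count matching lines.
        COUNT=$(grep -c -F -- "$FILE" dbdump.sql)
        if [ "$COUNT" -eq 0 ]; then
            echo "$COUNT $FILE" >> not-used.txt
        else
            echo "$COUNT $FILE" >> used.txt
        fi
    done < uploads-list2.txt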
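
    The 20 hours come from scanning the whole dump once per file name. A single-pass variant is sketched below, assuming GNU grep and a list with one path per line and no blank lines. The function name classify_uploads is made up for illustration. It only records whether a path occurs at all, not how many lines it occurs on, and if one path is a substring of another, only the longer match at a given position is reported.

```shell
#!/bin/bash
# Sketch: classify every path in LIST_FILE as used / not used with one scan of DUMP_FILE.
# Usage: classify_uploads LIST_FILE DUMP_FILE
# Writes used.txt and not-used.txt in the current directory.
classify_uploads() {
    local list="$1" dump="$2"

    # -F: fixed strings, -f: read all patterns from the list,
    # -o: print only the matched text, one match per output line.
    # sort -u collapses repeated matches into one line per path.
    LC_ALL=C grep -oFf "$list" -- "$dump" | LC_ALL=C sort -u > used.txt

    # -x: whole-line match, -v: invert. Keeps list entries that never matched.
    LC_ALL=C grep -vxFf used.txt -- "$list" > not-used.txt

    # grep exits non-zero when it prints nothing; that is not an error here.
    return 0
}
```

    Calling it as `classify_uploads uploads-list2.txt dbdump.sql` would produce the same two output files as the loop above, minus the count column.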

    The results were astonishing. For each image I’d get this in the “used” file:

    1 uploads/2016/10/A1610578_medium.jpg

    And then this in the not-used file:
    0 uploads/2016/10/A1610578_medium-1000x480.jpg
    0 uploads/2016/10/A1610578_medium-1000x500.jpg
    0 uploads/2016/10/A1610578_medium-100x70.jpg
    0 uploads/2016/10/A1610578_medium-150x150.jpg
    0 uploads/2016/10/A1610578_medium-300x212.jpg
    0 uploads/2016/10/A1610578_medium-440x212.jpg
    0 uploads/2016/10/A1610578_medium-440x250.jpg
    0 uploads/2016/10/A1610578_medium-440x310.jpg
    0 uploads/2016/10/A1610578_medium-440x435.jpg
    0 uploads/2016/10/A1610578_medium-440x621.jpg
    0 uploads/2016/10/A1610578_medium-570x200.jpg
    0 uploads/2016/10/A1610578_medium-570x402.jpg
    0 uploads/2016/10/A1610578_medium-600x425.jpg
    0 uploads/2016/10/A1610578_medium-750x300.jpg
    0 uploads/2016/10/A1610578_medium-750x400.jpg
    0 uploads/2016/10/A1610578_medium-757x370.jpg
    0 uploads/2016/10/A1610578_medium-768x544.jpg

    So basically each image has been resized into a variety of different sizes, none of which are referenced in the SQL dump.
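
    One thing worth ruling out before trusting the zeros: the search string includes the uploads/2016/10/ prefix, so a reference that stores only the bare file name would be missed. To my understanding, WordPress keeps the generated sizes in the serialized _wp_attachment_metadata value by file name alone, which would explain zero hits for every resized copy. The sketch below re-checks the not-used list by base name. The function name recheck_by_basename and the two output file names are hypothetical, GNU grep and sed are assumed, and identical base names in different month folders cannot be told apart this way.

```shell
#!/bin/bash
# Sketch: re-check "COUNT PATH" lines (the not-used.txt format) by base name only.
# Usage: recheck_by_basename NOT_USED_FILE DUMP_FILE
# Writes basenames.txt (what was searched for) and basenames-found.txt (what matched).
recheck_by_basename() {
    local not_used="$1" dump="$2"

    # Drop blank lines, strip the leading "COUNT " column,
    # then strip everything up to and including the last slash.
    sed -e '/^$/d' -e 's/^[0-9][0-9]* //' -e 's|.*/||' "$not_used" \
        | LC_ALL=C sort -u > basenames.txt

    # One pass over the dump; -o prints each matched base name.
    LC_ALL=C grep -oFf basenames.txt -- "$dump" | LC_ALL=C sort -u > basenames-found.txt

    # An empty result is a valid outcome, not a failure.
    return 0
}
```

    Any name that ends up in basenames-found.txt is referenced somewhere in the database after all, so it would be risky to treat that file as orphaned.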

    I have a number of questions.
    1) Can I delete all these resized images? Can I delete some of them?
    2) If I do delete them, will they be regenerated as needed?
    3) They’ve been generated by a theme, I understand, but given that the website has been through several themes, there are likely to be sizes that are no longer needed. How would I find out what image sizes the current theme requires?

    4) If it turns out it’s generally a bad idea to delete any of the resized images, would it be fair to assume that if a master image appears in the not-used.txt file (i.e. the one without the size info at the end, e.g. uploads/2016/10/A1610578_medium.jpg in the example above), then it and all the resized versions of it can safely be deleted?

    • This topic was modified 7 years, 6 months ago by peripatetic. Reason: spelling mistake
Viewing 1 replies (of 1 total)
  • The topic ‘Manual Cleaning of Images from Uploads: need sanity check’ is closed to new replies.