Organizing images (or any files)

This is a bit of a stretch, but I have an interesting (to me) programming (err... scripting? algorithmic? organizational?) problem. (I'm tagging this Ruby because Ruby is my preferred scripting language.)

Imagine you have 100 gigabytes of images floating around on multiple drives, of which probably 25 gigabytes are unique. The rest are either duplicates (with the same name), duplicates (with a different name), or smaller versions of an image (exported for email). Of course, apart from being on multiple drives, they are also in different folder structures. For example, img_0123.jpg may exist (in the Windows world) as c:\users\username\pics\2008\img_0123.jpg, c:\pics\2008\img_0123.jpg, c:\pics\export\img_0123-email.jpg and d:\pics\europe_2008\venice\bungy_jumping_off_st_marks.jpg.

Back in the day, we had to put everything in folders and give the files short descriptive names (like above). Today, search and tagging take care of all of this, so the folder scheme is redundant (and makes organization harder).

In the past, I've tried moving everything to one disk, wrote a Ruby script to scan for duplicates (I don't trust these dupfinder programs - I ran one and it started deleting everything!), and tried reorganizing them. However, after a few days I gave up (at the stage of organizing and deleting by hand).

I'm about to start a new attempt. First, copy all the images from all my drives to a new drive, into one folder. Anything with a duplicate filename will need to be checked manually. Then launch Picasa, go through the files by hand and remove duplicates (using the good ol' noggin).
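The "duplicate filename" step of that plan could be listed up front, before anything is copied. What follows is only a rough sketch, not the script mentioned above: the function name `filename_collisions` and the extension list are made up for illustration, and names are compared case-insensitively on the assumption that img_0123.jpg and IMG_0123.JPG should count as the same name.

```ruby
require 'find'

# Hypothetical list of extensions to treat as images; adjust as needed.
IMAGE_EXTENSIONS = %w[.jpg .jpeg .png .gif .tif .tiff].freeze

# Walk each root folder and group image paths by lowercased file name.
# Returns only the names that occur more than once.
def filename_collisions(roots)
  by_name = Hash.new { |hash, key| hash[key] = [] }
  roots.each do |root|
    Find.find(root) do |path|
      next unless File.file?(path)
      next unless IMAGE_EXTENSIONS.include?(File.extname(path).downcase)

      by_name[File.basename(path).downcase] << path
    end
  end
  by_name.select { |_name, paths| paths.size > 1 }
end

# When run as a script, the folders to scan are given as arguments.
if __FILE__ == $PROGRAM_NAME
  filename_collisions(ARGV).each do |name, paths|
    puts name
    paths.each { |path| puts "  #{path}" }
  end
end
```

This only reports the clashes and touches nothing, so it is safe to run against the original drives. Two files with the same name may still have different content, which is why they need a manual look.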

However, it bothers me that I couldn't solve this problem easily, and I'd be interested to hear other solutions, programmatic or otherwise (perhaps writing code is not the best solution, sigh!).

+1



3 answers


I like my photos sorted by date, so I wrote a Groovy script that reads the EXIF data of the images and puts them in directories named by ISO date (2008-12-11). That keeps them organized. It doesn't solve tagging by content, but I use Flickr for that.
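The Groovy script isn't shown here, but the folder layout it describes can be sketched in Ruby. This is an illustration under a big assumption: the file's modification time stands in for the EXIF capture date, since reading EXIF needs a library or a tool such as ExifTool. The names `dated_destination` and `file_by_date` are hypothetical.

```ruby
require 'fileutils'

# Build the destination path for a photo: <dest_root>/<YYYY-MM-DD>/<file name>.
def dated_destination(path, taken_at, dest_root)
  File.join(dest_root, taken_at.strftime('%Y-%m-%d'), File.basename(path))
end

# Move a photo into its date folder and return the new path.
# ASSUMPTION: the modification time is used as the default date; a real
# version would pass in the capture time read from the EXIF data.
def file_by_date(path, dest_root, taken_at: File.mtime(path))
  target = dated_destination(path, taken_at, dest_root)
  # Never overwrite: a clash here means a same-named file on the same day.
  raise "refusing to overwrite #{target}" if File.exist?(target)

  FileUtils.mkdir_p(File.dirname(target))
  FileUtils.mv(path, target)
  target
end
```

Calling `file_by_date('img_0123.jpg', 'sorted', taken_at: Time.new(2008, 12, 11))` would move the file to sorted/2008-12-11/img_0123.jpg. Refusing to overwrite matters here because two cameras (or two copies) can easily produce the same file name on the same day.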



Regarding the duplication problem, a checksum would reduce the number of images you have to sort manually, but unfortunately it won't catch the resized images. Could you look for a less aggressive duplicate finder, one that doesn't automatically delete duplicates? Make sure to make a backup before testing :P

+5



Have you thought about taking an MD5 checksum of each file and identifying duplicates that way? If you did this, you wouldn't have to resolve exact duplicates manually.



I would checksum each file and check it against a dictionary of already-processed files. If it turns out to be a duplicate, I would move it to a duplicates directory rather than delete it outright.
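As a rough sketch of that idea in Ruby (the asker's scripting language), something like the following might do. The names `quarantine_duplicates` and `free_name` are invented for this example, and "first path in sorted order wins" is an arbitrary assumption about which copy to keep.

```ruby
require 'digest'
require 'fileutils'
require 'find'

# Pick a path in dir for name that does not exist yet (hypothetical helper).
def free_name(dir, name)
  candidate = File.join(dir, name)
  counter = 1
  while File.exist?(candidate)
    candidate = File.join(dir, "#{counter}-#{name}")
    counter += 1
  end
  candidate
end

# Scan source_dir, keep the first file seen for each MD5 digest, and move
# every later file with the same digest into duplicates_dir (nothing is
# deleted). Returns { new_path_of_duplicate => path_of_kept_file }.
def quarantine_duplicates(source_dir, duplicates_dir)
  FileUtils.mkdir_p(duplicates_dir)
  skip_prefix = File.expand_path(duplicates_dir) + File::SEPARATOR
  seen = {}  # digest => first path found with that content
  moved = {}

  files = Find.find(source_dir).select do |path|
    File.file?(path) && !File.expand_path(path).start_with?(skip_prefix)
  end

  # Sorted so that the same copy is kept on every run.
  files.sort.each do |path|
    digest = Digest::MD5.file(path).hexdigest
    if seen.key?(digest)
      target = free_name(duplicates_dir, File.basename(path))
      FileUtils.mv(path, target)
      moved[target] = seen[digest]
    else
      seen[digest] = path
    end
  end
  moved
end
```

The returned hash can be printed as a log, so every quarantined file can be traced back to the copy that was kept. This only finds byte-identical files; the resized email exports have different bytes and will not match.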

+2



You can use something like ExifTool, which is available for Windows, to reorganize your images by capture time (that's my own scheme) or by any other EXIF field found in the JPG or RAW file. Once they are organized that way, you should be able to find duplicates easily.

+1

