Find duplicates BY CONTENT - EVERYTHING TIP 024

Post by ChrisGreaves »

t:\music\  *.mp3 dupe:size;sha256
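Here is a search string that looks for duplicates within my folder T:\MUSIC, restricted to files with the extension MP3.
A similar string could be used to search image files with the extension JPG in the folder T:\Camera.
sha256 is another way of asking for content, but whereas Content: works on the file's contents directly (a slow and laborious process), sha256 only has to compare a short hash of each file's contents. As far as I know, that hash is pre-calculated only if you have told Everything to index the property; otherwise Everything still has to read each file once to compute it.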

We are asking for duplicates based on size AND content (hash).
Wise users first run the search without sha256 to get a good idea of which files are duplicated by size (takes seconds), and only then add the sha256 term (takes minutes) rather than Content (takes hours!).
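
For anyone curious what the size-first, hash-second idea looks like outside Everything, here is a minimal Python sketch. It is an illustration only, not a description of how Everything works internally, and the function names sha256_of and find_duplicates are made up for this example. It groups files by size (cheap), then hashes only the files that share a size with at least one other file (slow).

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, extension=".mp3"):
    """Return lists of paths under root whose size and SHA-256 both match.

    Hypothetical helper for illustration; the extension match ignores case.
    """
    # Pass 1 (fast): group candidate files by size, without reading them.
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if name.lower().endswith(extension):
                path = os.path.join(folder, name)
                by_size[os.path.getsize(path)].append(path)

    # Pass 2 (slow): hash only files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling find_duplicates(r"T:\MUSIC") would return one list of paths per set of duplicates. Files with a unique size are never read at all, which is the same reason the size-only search is so much quicker than the hash search.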
[Attachment: Dupes_SHA_a.jpg]
Here is my result pane, sorted by Size.
28 items are found, which suggests 14 duplicate pairs (no triplets in this case).
Note with caution that first entry. In principle a hash match can be a false positive, but with identical size and identical SHA-256 an accidental collision is vanishingly unlikely; it is far more likely that some fiend renamed one of my duplicate copies of "Shostakovitch - Symphony No 05".

Note too that neither SHA256 nor Content: is going to be a lot of help if you have trimmed the applause from the start and end of an audio track, or edited the border from one copy of a JPEG image. Those are false negatives - I can have two copies of "Shostakovitch - Symphony No 05" that will not appear as duplicates because one has the applause trimmed.
Cheers, Chris
Most of my hair had already fallen out by the time I learned that mousse is spelled with two esses