Delete all duplicate files in a given folder.
It is difficult to remove existing duplicates from a folder by hand, especially on older machines where folders tend to get bloated with files. I made this script to automate the task, identifying duplicates safely by file hash. The user enters the path of the chosen folder, and the script iterates through it and checks each file, ignoring subfolders. The hash function is run on every file, and each file's hash is stored in a list. The script then removes any file whose hash has already been stored: the first file with a given hash is kept, and every second or later instance is removed.
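The keep-first rule described above can be sketched as a small pure function, separate from any file access. names_to_remove is a hypothetical helper, not taken from the original script. It works on (file name, digest) pairs in the order the files were visited, and it uses a set rather than the list the README mentions, because membership checks on a set are faster.

```python
def names_to_remove(entries):
    """Return the names whose digest was already seen earlier in `entries`.

    `entries` is an iterable of (file_name, digest) pairs in visit order;
    the first file with a given digest is kept, later ones are returned.
    """
    seen = set()      # digests stored so far
    duplicates = []
    for name, digest in entries:
        if digest in seen:
            duplicates.append(name)   # second or later instance: remove
        else:
            seen.add(digest)          # first instance: keep and remember
    return duplicates


visit_order = [
    ("a.txt", "h1"),
    ("a - Copy.txt", "h1"),
    ("b.txt", "h2"),
    ("a - Copy (2).txt", "h1"),
]
print(names_to_remove(visit_order))  # ['a - Copy.txt', 'a - Copy (2).txt']
```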
- hashlib to generate file hashes. The chosen algorithms are MD5 and SHA-256.
- os (miscellaneous operating system interfaces) for directory traversals and file deletions
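A minimal sketch of hashing one file with hashlib, assuming the algorithm is passed by name ("md5" or "sha256") to hashlib.new. The helper file_digest and its 64 KiB chunk size are illustrative assumptions, not the original script's code; reading in chunks keeps large files from being loaded into memory all at once.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file's contents in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)  # accepts "md5" or "sha256"
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Example: two files with identical bytes produce identical digests.
with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, "same content.txt")
    copy = os.path.join(folder, "same content - Copy.txt")
    for path in (first, copy):
        with open(path, "wb") as f:
            f.write(b"hello")
    print(file_digest(first) == file_digest(copy))                  # True
    print(len(file_digest(first, "md5")), len(file_digest(first)))  # 32 64
```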
Save the raw code and run the Python script. The following is a demo run on a folder prepared for this purpose:
Enter filepath to sort folder by file type: C:\Users\Music\test_desktop
For this demo, a folder was created containing only 2 original files, 'same content.txt' and 'different content.txt', along with copies of them and some other folders:
After you supply the chosen folder's filepath, the script lists each file it deletes because its MD5 hash matches a file it has already seen. Note that the script keeps the first instance of each hash it stores, so the surviving file may not carry the original file's name, but its contents are the same nonetheless:
Enter filepath to sort folder by file type:C:\Users\Music\test_desktop
Removing duplicates: different content - Copy.txt
Removing duplicates: different content.txt
Removing duplicates: same content.txt
3 file duplicates deleted in C:\Users\Music\test_desktop
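The deletion and reporting steps in this run can be sketched as follows. remove_duplicates is a hypothetical function, not the author's code. It assumes SHA-256, sorts the directory listing so that which copy is kept is predictable (os.listdir returns entries in arbitrary order), and adds a dry_run flag, also an assumption, that only reports what would be removed.

```python
import hashlib
import os


def remove_duplicates(folder, dry_run=False):
    """Delete later copies of identical files in `folder`; subfolders are skipped.

    Returns the names removed (or, with dry_run=True, the names that would be).
    """
    seen = set()
    removed = []
    for name in sorted(os.listdir(folder)):  # sorted: deterministic "first instance"
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # ignore subfolders
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        digest = h.hexdigest()
        if digest in seen:
            print(f"{'Would remove' if dry_run else 'Removing duplicates'}: {name}")
            if not dry_run:
                os.remove(path)
            removed.append(name)
        else:
            seen.add(digest)  # first file with this content is kept
    if removed:
        verb = "found" if dry_run else "deleted"
        print(f"{len(removed)} file duplicates {verb} in {folder}")
    else:
        print(f"No duplicates found in {folder}")
    return removed
```

Calling remove_duplicates(path, dry_run=True) first is a cautious way to review the list before anything is deleted. Running the real deletion a second time on the same folder returns an empty list and prints the "No duplicates found" message.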
The duplicates have been removed successfully. Running the script on the same folder again reports that no duplicates are found:
Enter filepath to sort folder by file type:C:\Users\Music\test_desktop
No duplicates found in C:\Users\Music\test_desktop
- Uploaded sha256_duplicate_remover.py. Upgraded from MD5 to SHA-256 to lower the chance of a hash collision causing a file to be deleted wrongly.
- Added some imagery to the script
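To put the MD5 to SHA-256 change in numbers, the sketch below compares digest sizes. The birthday-bound figure is a standard rule of thumb for accidental collisions, not something the script measures. MD5 is also vulnerable to deliberately crafted collisions, while no practical SHA-256 collision is known. The helper digest_bits is hypothetical.

```python
import hashlib


def digest_bits(algorithm):
    """Return the digest length in bits for a hashlib algorithm name."""
    return hashlib.new(algorithm).digest_size * 8


for name in ("md5", "sha256"):
    bits = digest_bits(name)
    # Birthday bound: roughly 2**(bits/2) distinct files before an
    # accidental collision becomes likely.
    print(f"{name}: {bits}-bit digest, collision likely after ~2**{bits // 2} files")
# md5: 128-bit digest, collision likely after ~2**64 files
# sha256: 256-bit digest, collision likely after ~2**128 files
```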