It is a common requirement to find and replace duplicate files for most of the computer users. Finding and removing duplicate files is a tiresome job that demands time and patience. Finding duplicate files can be very easy if your machine is powered by GNU/Linux, thanks to ‘fdupes‘ utility.
What is fdupes?
Fdupes is a Linux utility written by Adrian Lopez in C programming Language released under MIT License. The application is able to find duplicate files in the given set of directories and sub-directories. Fdupes recognize duplicates by comparing MD5 signature of files followed by a byte-to-byte comparison. A lots of options can be passed with Fdupes to list, delete and replace the files with hardlinks to duplicates.
The comparison starts in the order:
size comparison > Partial MD5 Signature Comparison > Full MD5 Signature Comparison > Byte-to-Byte Comparison.
Install fdupes on a Linux
Installation of latest version of fdupes (fdupes version 1.51) as easy as running following command on Debian based systems such as Ubuntu and Linux Mint.
$ sudo apt-get install fdupes
On CentOS/RHEL and Fedora based systems, you need to turn on epel repository to install fdupes package.
# yum install fdupes # dnf install fdupes [On Fedora 22 onwards]
Note: The default package manager yum is replaced by dnf from Fedora 22 onwards…
How to use fdupes command?
1. For demonstration purpose, let’s a create few duplicate files under a directory (say tecmint) simply as:
$ mkdir /home/"$USER"/Desktop/tecmint && cd /home/"$USER"/Desktop/tecmint && for i in {1..15}; do echo "I Love Tecmint. Tecmint is a very nice community of Linux Users." > tecmint${i}.txt ; done
After running above command, let’s verify the duplicates files are created or not using ls command.
$ ls -l total 60 -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint10.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint11.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint12.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint13.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint14.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint15.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint1.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint2.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint3.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint4.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint5.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint6.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint7.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint8.txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint9.txt
The above script create 15 files namely tecmint1.txt, tecmint2.txt…tecmint15.txt and every files contains the same data i.e.,
"I Love Tecmint. Tecmint is a very nice community of Linux Users."
2. Now search for duplicate files within the folder tecmint.
$ fdupes /home/$USER/Desktop/tecmint /home/tecmint/Desktop/tecmint/tecmint13.txt /home/tecmint/Desktop/tecmint/tecmint8.txt /home/tecmint/Desktop/tecmint/tecmint11.txt /home/tecmint/Desktop/tecmint/tecmint3.txt /home/tecmint/Desktop/tecmint/tecmint4.txt /home/tecmint/Desktop/tecmint/tecmint6.txt /home/tecmint/Desktop/tecmint/tecmint7.txt /home/tecmint/Desktop/tecmint/tecmint9.txt /home/tecmint/Desktop/tecmint/tecmint10.txt /home/tecmint/Desktop/tecmint/tecmint2.txt /home/tecmint/Desktop/tecmint/tecmint5.txt /home/tecmint/Desktop/tecmint/tecmint14.txt /home/tecmint/Desktop/tecmint/tecmint1.txt /home/tecmint/Desktop/tecmint/tecmint15.txt /home/tecmint/Desktop/tecmint/tecmint12.txt
3. Search for duplicates recursively under every directory including it’s sub-directories using the -r option.
It search across all the files and folder recursively, depending upon the number of files and folders it will take some time to scan duplicates. In that mean time, you will be presented with the total progress in terminal, something like this.
$ fdupes -r /home Progress [37780/54747] 69%
4. See the size of duplicates found within a folder using the -S option.
$ fdupes -S /home/$USER/Desktop/tecmint 65 bytes each: /home/tecmint/Desktop/tecmint/tecmint13.txt /home/tecmint/Desktop/tecmint/tecmint8.txt /home/tecmint/Desktop/tecmint/tecmint11.txt /home/tecmint/Desktop/tecmint/tecmint3.txt /home/tecmint/Desktop/tecmint/tecmint4.txt /home/tecmint/Desktop/tecmint/tecmint6.txt /home/tecmint/Desktop/tecmint/tecmint7.txt /home/tecmint/Desktop/tecmint/tecmint9.txt /home/tecmint/Desktop/tecmint/tecmint10.txt /home/tecmint/Desktop/tecmint/tecmint2.txt /home/tecmint/Desktop/tecmint/tecmint5.txt /home/tecmint/Desktop/tecmint/tecmint14.txt /home/tecmint/Desktop/tecmint/tecmint1.txt /home/tecmint/Desktop/tecmint/tecmint15.txt /home/tecmint/Desktop/tecmint/tecmint12.txt
5. You can see the size of duplicate files for every directory and subdirectories encountered within using the -S and -r options at the same time, as:
$ fdupes -Sr /home/avi/Desktop/ 65 bytes each: /home/tecmint/Desktop/tecmint/tecmint13.txt /home/tecmint/Desktop/tecmint/tecmint8.txt /home/tecmint/Desktop/tecmint/tecmint11.txt /home/tecmint/Desktop/tecmint/tecmint3.txt /home/tecmint/Desktop/tecmint/tecmint4.txt /home/tecmint/Desktop/tecmint/tecmint6.txt /home/tecmint/Desktop/tecmint/tecmint7.txt /home/tecmint/Desktop/tecmint/tecmint9.txt /home/tecmint/Desktop/tecmint/tecmint10.txt /home/tecmint/Desktop/tecmint/tecmint2.txt /home/tecmint/Desktop/tecmint/tecmint5.txt /home/tecmint/Desktop/tecmint/tecmint14.txt /home/tecmint/Desktop/tecmint/tecmint1.txt /home/tecmint/Desktop/tecmint/tecmint15.txt /home/tecmint/Desktop/tecmint/tecmint12.txt 107 bytes each: /home/tecmint/Desktop/resume_files/r-csc.html /home/tecmint/Desktop/resume_files/fc.html
6. Other than searching in one folder or all the folders recursively, you may choose to choose in two folders or three folders as required. Not to mention you can use option -S and/or -r if required.
$ fdupes /home/avi/Desktop/ /home/avi/Templates/
7. To delete the duplicate files while preserving a copy you can use the option ‘-d’. Extra care should be taken while using this option else you might end up loosing necessary files/data and mind it the process is unrecoverable.
$ fdupes -d /home/$USER/Desktop/tecmint [1] /home/tecmint/Desktop/tecmint/tecmint13.txt [2] /home/tecmint/Desktop/tecmint/tecmint8.txt [3] /home/tecmint/Desktop/tecmint/tecmint11.txt [4] /home/tecmint/Desktop/tecmint/tecmint3.txt [5] /home/tecmint/Desktop/tecmint/tecmint4.txt [6] /home/tecmint/Desktop/tecmint/tecmint6.txt [7] /home/tecmint/Desktop/tecmint/tecmint7.txt [8] /home/tecmint/Desktop/tecmint/tecmint9.txt [9] /home/tecmint/Desktop/tecmint/tecmint10.txt [10] /home/tecmint/Desktop/tecmint/tecmint2.txt [11] /home/tecmint/Desktop/tecmint/tecmint5.txt [12] /home/tecmint/Desktop/tecmint/tecmint14.txt [13] /home/tecmint/Desktop/tecmint/tecmint1.txt [14] /home/tecmint/Desktop/tecmint/tecmint15.txt [15] /home/tecmint/Desktop/tecmint/tecmint12.txt Set 1 of 1, preserve files [1 - 15, all]:
You may notice that all the duplicates are listed and you are prompted to delete, either one by one or certain range or all in one go. You may select a range something like below to delete files files of specific range.
Set 1 of 1, preserve files [1 - 15, all]: 2-15 [-] /home/tecmint/Desktop/tecmint/tecmint13.txt [+] /home/tecmint/Desktop/tecmint/tecmint8.txt [-] /home/tecmint/Desktop/tecmint/tecmint11.txt [-] /home/tecmint/Desktop/tecmint/tecmint3.txt [-] /home/tecmint/Desktop/tecmint/tecmint4.txt [-] /home/tecmint/Desktop/tecmint/tecmint6.txt [-] /home/tecmint/Desktop/tecmint/tecmint7.txt [-] /home/tecmint/Desktop/tecmint/tecmint9.txt [-] /home/tecmint/Desktop/tecmint/tecmint10.txt [-] /home/tecmint/Desktop/tecmint/tecmint2.txt [-] /home/tecmint/Desktop/tecmint/tecmint5.txt [-] /home/tecmint/Desktop/tecmint/tecmint14.txt [-] /home/tecmint/Desktop/tecmint/tecmint1.txt [-] /home/tecmint/Desktop/tecmint/tecmint15.txt [-] /home/tecmint/Desktop/tecmint/tecmint12.txt
8. From safety point of view, you may like to print the output of ‘fdupes’ to file and then check text file to decide what file to delete. This decrease chances of getting your file deleted accidentally. You may do:
$ fdupes -Sr /home > /home/fdupes.txt
Note: You may replace ‘/home’ with the your desired folder. Also use option ‘-r’ and ‘-S’ if you want to search recursively and Print Size, respectively.
9. You may omit the first file from each set of matches by using option ‘-f’.
First List files of the directory.
$ ls -l /home/$USER/Desktop/tecmint total 20 -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint9 (3rd copy).txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint9 (4th copy).txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint9 (another copy).txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint9 (copy).txt -rw-r--r-- 1 tecmint tecmint 65 Aug 8 11:22 tecmint9.txt
and then omit the first file from each set of matches.
$ fdupes -f /home/$USER/Desktop/tecmint /home/tecmint/Desktop/tecmint9 (copy).txt /home/tecmint/Desktop/tecmint9 (3rd copy).txt /home/tecmint/Desktop/tecmint9 (another copy).txt /home/tecmint/Desktop/tecmint9 (4th copy).txt
10. Check installed version of fdupes.
$ fdupes --version fdupes 1.51
11. If you need any help on fdupes you may use switch ‘-h’.
$ fdupes -h Usage: fdupes [options] DIRECTORY... -r --recurse for every directory given follow subdirectories encountered within -R --recurse: for each directory given after this option follow subdirectories encountered within (note the ':' at the end of the option, manpage for more details) -s --symlinks follow symlinks -H --hardlinks normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behavior -n --noempty exclude zero-length files from consideration -A --nohidden exclude hidden files from consideration -f --omitfirst omit the first file in each set of matches -1 --sameline list each set of matches on a single line -S --size show size of duplicate files -m --summarize summarize dupe information -q --quiet hide progress indicator -d --delete prompt user for files to preserve and delete all others; important: under particular circumstances, data may be lost when using this option together with -s or --symlinks, or when specifying a particular directory more than once; refer to the fdupes documentation for additional information -N --noprompt together with --delete, preserve the first file in each set of duplicates and delete the rest without prompting the user -v --version display fdupes version -h --help display this help message
That’s for all now. Let me know how you were finding and deleting duplicates files till now in Linux? and also tell me your opinion about this utility. Put your valuable feedback in the comment section below and don’t forget to like/share us and help us get spread.
I am working on another utility called fslint to remove duplicate files, will soon post and you people will love to read.
Hello,
I have three HDDs on my system, but it seems that fdupes only accesses the system drive (/sda). Is there a way to get fdupes to scan the other drives as well?
Regards,
Tony
@Tony,
Yes, you can specify the other drives by providing their mount points or directories as arguments when running fdupes.
This will scan all specified drives.
I am James. I have been looking at this and copying and pasting a lot of items, but I am not getting anywhere. After going through this, can you help?
> Note: Please use the saved script below for removal, not the above output.
I think this is the 2nd part that I have copied and pasted. May 22, 2024:
I hope this helps!
Hello, and thank you for the insightful guide on using fdupes for managing duplicate files. I’ve found the information extremely helpful. However, I’ve encountered some unexpected behavior that I’m struggling to understand.
I recently executed the command:
My expectation was that the output would prioritize listing duplicate files in /path1/to/directory_1 first, followed by duplicates in /path2/to/directory_2, based on the order in which the directories are specified in the command. But this isn’t what’s happening.
Instead, the order of the output doesn’t seem to align with the order of the directory paths provided in the command. This is intriguing, as I was under the impression that fdupes would process the directories sequentially based on their order in the command line.
@Antonio,
It seems there might be a misunderstanding about how fdupes processes the directories. The order of the output isn’t necessarily tied to the order of the directories specified in the command. Instead, fdupes scans all the directories simultaneously and then lists the duplicates collectively, regardless of their original directory.
fdupes: Supplied with the serious version of Linux I have (and I suppose others) and has numerous contributors, therefore very well tested. There are lots of forks (see GitHub), which may or may not introduce as many bugs as enhancements to these versions :)
Thanks to Tecmint for this page.
There are many special programs for removing duplicate files. I am using Manyprog Find Duplicate Files. It is very easy to use.
It might be a great option to tell preserve newer only when needed to mirror a directory structure while backing up a source to target with rsync and we have moved files in another folder on the source before a second backup.
I was always trying the best way to delete my duplicate files, but always something was piling the storage again. So I find efficient software (DuplicateFilesDeleter) to complete remove the duplicate data. I suggest it to anyone with this problem.
Great program! Wish it had a filter to exclude files under a certain size. I’m using this to clean up disc space, so its great if it would only report size > 1MB, etc.
Fdupes is a great tool.
Is it possible to specify which of 2 folders is sacrificial?
So, If I have 2 folders A & B and run Fdupes on them, I want any duplicate to be automatically deleted from B.
Thank you tecmint,
how to clean duplicate files on pc?
however it isn’t able to determine which files are needed and which can be safely deleted.
Please note it is not safe to remove all the duplicates CCleaner finds. The Duplicate Finder can search for files with the same File Name, Size, Modified Date and Content; however it isn’t able to determine which files are needed and which can be safely deleted.
Try using Duplicate Files Deleter
This is great for systems I ssh into, thank you. Only shortcoming is that listing file sizes in bytes only is difficult to manage with many files of large size.
I miss the -L (–linkhard) Option, which generates a hardlink instead of deleting files. In Version 1.50 Pre2 I had it, but on newer Versions, this Option is gone.
FsLint is better and easier to use
I have tried this software known as Duplicate Files Deleter. It works really well. It is fast and easy and clears up a lot of space.
In section 7, you say -d specifies what to delete, when in fact, the prompt is to specify what to preserve. What is not preserved is deleted. In your example, there are 15 files that are identical. If I select 1, it is preserved and the other 14 are deleted.
I use Duplicate Files Deleter as it is very effective. It is 100% accurate and performs the scan quickly.
A tool to find duplicates will be quite useful for us. Of course, it should not delete any files, but replace identical files with hard links to the same inode. (the files to be compared already have their MD5 checksums computed).
Thanks @Dan the Man, for finding this post useful. You have the option to find the duplicates with hard link and then to delete or not-to-delete. This is a very nice utility and most importantly, the approach to find duplicates is very good. Thinking to write a tool of myself for this purpose.
Keep connected for more such posts.
This is a great tool. It would be neat to have an option to replace the copy with links to the original. I suppose this could be scripted from the log file that is generated.
Thanks!
Dear JC,
You can use the flag ‘-d’ to delete the duplicates. I will suggest you to go through point – 7.
Keep connected for more such posts.
thnks
Welcome @ Praduman,
Keep connected for more such posts.
Thank you for the tip
in title “Comamndline” => “command Line”
Ah! that’s a conversion error. Thanks for your feedback. Asking admin to correct “Commandline” in heading as “command line” or “command-line”.
Keep connected @xhark.