
Duplicate folder finder

Moderator
Posts: 18,074
Thanks: 2,617
Fixes: 192
Registered: 06-04-2007

Duplicate folder finder

After much searching I have two programs for finding duplicate files on my PC and various external drives. They work well, but what they don't do is find duplicate folder names.

 

I am currently testing Easy Duplicate Finder but does anyone have any recommendations?

Forum Moderator and Customer
Courage is resistance to fear, mastery of fear, not absence of fear - Mark Twain
He who feared he would not succeed sat still

9 REPLIES
VileReynard
Seasoned Pro
Posts: 10,828
Thanks: 250
Fixes: 10
Registered: 01-09-2007

Re: Duplicate folder finder

http://www.howtogeek.com/201140/how-to-find-and-remove-duplicate-files-on-linux/

has about six different ways to attain your objective.

So boot up Linux from a USB to run one or two of these.

Community Veteran
Posts: 17,481
Thanks: 1,482
Fixes: 17
Registered: 06-11-2007

Re: Duplicate folder finder

If the files or folders have the same name, why not use the "search files and folders" box when you click on START?

e.g. searching for "music" will bring up several files AND folders with music in the title. If what you want is not showing in the initial list, click on "show more results" at the bottom for a full listing.

 

 

Community Veteran
Posts: 5,237
Thanks: 1,321
Fixes: 31
Registered: 16-10-2014

Re: Duplicate folder finder

@Mav - I’ve written you a Windows Console Application to do this for you, so if you would like a copy then send a PM and I’ll make it available. I’ll understand if you decline.

C:\Users\Mook>DirDup .
Found : 9 Directories with : 2 Duplicates.
Would you like to list these or Print them to File or Quit? (y/p/q) : q

 

VileReynard
Seasoned Pro
Posts: 10,828
Thanks: 250
Fixes: 10
Registered: 01-09-2007

Re: Duplicate folder finder

Sometimes the files have different names, but duplicated content.

So you could compute an MD5 hash for each file, sort the hashes and look for duplicates.

Or something similar.
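
As a rough illustration of the hash idea above, here is a minimal Python sketch. The function names (md5_of_file, find_duplicate_files) are made up for this example and are not from any of the tools mentioned in the thread. It walks a directory tree, hashes each file in chunks, and keeps only the hashes seen more than once.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root):
    """Map each digest to the files sharing it, keeping digests seen more than once."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                by_hash[md5_of_file(full_path)].append(full_path)
            except OSError:
                continue  # unreadable file (e.g. permissions): skip it
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Grouping in a dictionary stands in for the "sort and look for duplicates" step. On large drives it would probably be worth comparing file sizes first and only hashing files whose sizes match, but that is left out here to keep the sketch short.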

Community Veteran
Posts: 5,237
Thanks: 1,321
Fixes: 31
Registered: 16-10-2014

Re: Duplicate folder finder

The OP is specific in referring to directories, so you couldn’t use this approach here.

Moderator
Posts: 18,074
Thanks: 2,617
Fixes: 192
Registered: 06-04-2007

Re: Duplicate folder finder

@Mook

I have sent a PM

 

Thanks for the replies.

 

The program I mentioned in the OP didn't do anything other than tell me that there are no duplicate folders, when I know for a fact there are many spread out over various internal and external drives.

 

@VileReynard

I appreciate the Linux links but I do have programs to find duplicate files and they work well. It is really a list of duplicate folder names I wish to find, so I can then examine the contents and, most likely, merge them.
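
For anyone wanting to try the folder-name approach themselves, a minimal Python sketch might look like the following. It is not the program discussed in this thread; find_duplicate_folder_names is a hypothetical name, and matching names case-insensitively is an assumption made here because Windows folder names are not case-sensitive.

```python
import os
from collections import defaultdict


def find_duplicate_folder_names(roots):
    """Group directory paths by lower-cased folder name across all given roots."""
    by_name = defaultdict(list)
    for root in roots:
        # os.walk skips directories it cannot read unless an onerror handler is given
        for dirpath, dirnames, _filenames in os.walk(root):
            for name in dirnames:
                by_name[name.lower()].append(os.path.join(dirpath, name))
    # keep only the names that occur in more than one place
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}
```

Calling it with a list such as ["C:\\", "E:\\"] would return a dictionary keyed by folder name, each with the list of places that name occurs, which can then be inspected before merging anything.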


Community Veteran
Posts: 5,237
Thanks: 1,321
Fixes: 31
Registered: 16-10-2014

Re: Duplicate folder finder

For those that are interested, I made @Mav an offer to write some code for him and to date we've made it to v1.0.0.4 thanks to Mav's feedback. Here is the current 'User Guide' for the program so you can see what it does.

DDFS v1.0.0.4 - User Guide.

DDFS uses command line switches to tell it what and where it is searching and how the search takes place. In this version the following switches are supported:

-d | --d	Find duplicate directories
-D | --D	Search all mounted drives
-e | --e	Exclude listed drives from search
-f | --f	Find duplicate files
-h | --h	Show this help message
-i | --i	Include listed drives to search
-p | --p	Start search from <dir name>
-s | --s	Save results to <file name>
-t | --t	Run -D or -i searches in parallel
-x | --x	Add file sizes to results output file

To view the list of supported switches use: ddfs -h | --h

Either a single dash (-) or a double dash (--) can be used; the | (pipe) character shown above means 'or', e.g. use -d or --d. The switches can appear on the command line in any order, except that where a switch requires a value, the value must appear directly after the switch to which it applies. For brevity I’ll use the single dash version in the examples that follow.

To use DDFS you must set one of the switches -d or -f so it knows what to look for, as no defaults are set. Note that some of the options listed above are mutually exclusive, i.e. they cannot be used at the same time; for example, using both -d and -f on the command line will throw an exception.
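
To illustrate the "one of -d or -f, but not both" rule, here is a small Python sketch using the standard argparse module. It only imitates a few of the short switches described above and is not how DDFS itself parses its command line (DDFS is a separate program whose source is not shown here); build_parser and the "ddfs-sketch" program name are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that requires exactly one of -d / -f (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ddfs-sketch")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-d", action="store_true", help="find duplicate directories")
    mode.add_argument("-f", action="store_true", help="find duplicate files")
    # switches that take a value: the value follows the switch
    parser.add_argument("-p", metavar="DIR", default=".", help="start search from DIR")
    parser.add_argument("-s", metavar="FILE", default="ddfs.csv", help="save results to FILE")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("directories" if args.d else "files", "from", args.p, "->", args.s)
```

One difference worth noting: argparse reports a usage error and exits when both -d and -f are given (or neither), rather than throwing an exception as the guide describes.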

Example uses :

ddfs -d	this will search for duplicate directories starting the search from the current path. 

ddfs -d -p "../Books"	will search for duplicate directories starting from the Books directory, one level up ("../") in the directory hierarchy from the current working directory. So given the following hierarchy:

Home +
	-Books
	-DDFS
	-Music
	-Notes
	-Stuff

and the command line from above, if the program is run from the DDFS directory the search would begin in the Books directory, and the DDFS, Music, Notes and Stuff directories would not be searched.

ddfs -d -D	will search for duplicate directories on each drive found on the system, with the search always starting at the root directory "\".

ddfs -f -D	is the same as ddfs -d -D above, but the search is for duplicate files.

ddfs -d -i C D E F G -t -s results.txt	This will do a parallel check for duplicate directories on drives C, D, E, F and G assuming they are valid and they are mounted on the system. See below for more details.

ddfs -d -D -e A B C D
ddfs -f -i X Y Z

The -e option can only be used in conjunction with the -D option, but the -i option can be set when using either -d or -f. When using these options, the drive letters you enter must be in uppercase. It doesn’t matter if a drive you list is invalid, as it will be ignored.

The -t option, when set, will launch a thread for each drive found. For example, if you use:

ddfs -d -i B C D P T W X -t
This will execute up to 7 concurrent searches and, as a result, may consume more CPU time. However, by using this method you can cut the search time down.

ddfs -D -f -e C D F Y -t

This will concurrently search every mounted drive except C, D, F and Y for duplicate files, since -e excludes the listed drives.

Note that the -t switch will work best when used on distinct physical drives, and not on drive shares mounted from a single source, e.g. a NAS drive.
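
The one-thread-per-drive idea can be sketched in Python with the standard concurrent.futures module. This is only an illustration of the technique, not DDFS code; list_directories and scan_roots_in_parallel are hypothetical names, and the roots can be drive roots or any other directories.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_directories(root):
    """Return the full path of every directory found below root."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        found.extend(os.path.join(dirpath, name) for name in dirnames)
    return found


def scan_roots_in_parallel(roots):
    """Run one scan per root concurrently and return {root: [directories]}."""
    if not roots:
        return {}
    # one worker thread per root, mirroring "a thread for each drive"
    with ThreadPoolExecutor(max_workers=len(roots)) as pool:
        results = pool.map(list_directories, roots)
        return dict(zip(roots, results))
```

Because each scan spends most of its time waiting on the disk, threads should help most when the roots sit on separate physical drives, which matches the note above about NAS shares.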

Usage notes :

Obviously, using the -D option or starting a search on a large disk can take some time; the time taken depends on the number of drives found and the used space on each.

When duplicates are found they are stored in memory until the scan completes, at which point they are written to a file called ddfs.csv. The number of duplicates found determines how long it takes to write the output file.

Just before the output file is written the amount of time taken along with the number of duplicates found, number of files skipped (due to permission issues) and the number of errors encountered are displayed on screen.

The name of the output file defaults to ddfs.csv but can be changed on the command line by using the -s option and providing a name, for example:

ddfs -d -D -s results.txt

This will search all available drives for duplicate directories and save the list of duplicates found to results.txt in the current working directory, defined here as the location the executable is running from. The only supported file extensions are .txt and .csv; anything else will generate an exception.

The -x option can only be used when searching for files, and when used will introduce a processing overhead as each file in the duplicates list has to be examined to get its size. The value shown in the output file is in bytes.
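
As a sketch of the output step described above (hold the results in memory, then write them out, optionally with sizes in bytes), something like the following Python would do. The column layout is an assumption made for this example, since the actual format of ddfs.csv is not shown in the guide, and write_results is a hypothetical name.

```python
import csv
import os


def write_results(duplicate_paths, out_name="ddfs.csv", with_sizes=False):
    """Write one row per duplicate path, optionally adding its size in bytes."""
    extension = os.path.splitext(out_name)[1].lower()
    if extension not in (".txt", ".csv"):
        raise ValueError("only .txt and .csv output files are supported")
    with open(out_name, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for path in duplicate_paths:
            if with_sizes:
                try:
                    size = os.path.getsize(path)  # the extra per-file lookup
                except OSError:
                    size = ""  # file vanished or is unreadable
                writer.writerow([path, size])
            else:
                writer.writerow([path])
```

The os.path.getsize call per file is where the extra overhead of a size column comes from: every path in the list has to be visited again after the scan.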

Should you want to terminate a long-running search, press Ctrl+C; this will stop the search and terminate the program.

Sample output :

ddfs -D -f -s results.csv
Duplicate Directory and File Scanner v1.0.0.4 by Mook
Found 5 Drives A, C, D, Y, Z
Skipping Removable Drive A [S]
Scanning Fixed Drive C [*]
Skipping CD Rom Drive D [S]
Skipping Unmounted Drive Y [S]
Skipping Unmounted Drive Z [S]
Time : 2m 44s found : 346759 duplicates, skipped 108 with 0 errs
Writing results.csv [*] Done.

Any thoughts you have, good or bad, are appreciated.

VileReynard
Seasoned Pro
Posts: 10,828
Thanks: 250
Fixes: 10
Registered: 01-09-2007

Re: Duplicate folder finder

Is this essentially Windows only?

Community Veteran
Posts: 5,237
Thanks: 1,321
Fixes: 31
Registered: 16-10-2014

Re: Duplicate folder finder

@VileReynard, it's written in C++, but because it makes Windows-specific function calls to deal with the concept of drives, it is Windows only for the moment. This can easily be taken care of using compiler directives, so it could be made to work on Windows, Linux or Mac.