Find Duplicates - discover duplicated files

Find Duplicates was written to allow you to control your disk space
usage by discovering files that are duplicated and, should you so wish,
deleting one or more of these duplicates. There are many ways in which
duplicate files can be deposited on your hard disk: for example,
programs which don't check whether you already have a particular DLL
installed and install their own private copy regardless, or programs
that install a DLL in your \Windows folder when it is already in
\Windows\System32. You can also use Find Duplicates to see if any files
on a floppy are already present on your hard disk.

How does Find Duplicates work?

Find Duplicates scans one or more disks on your system to find
duplicated files, in a two-phase process. First it scans all the
folders and sorts all the files it finds into size order (files HAVE to
be the same size to be identical). You can limit the scan to one folder
tree, if you wish. It then compares files of the same size to see if
the contents are actually identical, and lists identical files in size
order. You can then double-click on any file to examine its properties,
and optionally delete it.

The comparison can take some time, so Find Duplicates first performs
one of two preliminary checks to see whether the files might be
identical without having to examine the whole file. By default, it
checks the modification date and time of the files, and only compares
the files byte-by-byte if the timestamps are the same. But it is
possible for two files to have the same contents without having the
same timestamp, so you can enable an option whereby the first 512 bytes
of each file are checksummed instead. This improves the recognition of
identical files, but it is slower, and since it involves a file access,
the file's last access date will be altered. By default the timestamp
comparison, not the checksum comparison, is selected.
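
The two-phase process described above can be sketched in a few lines of
Python. This is only an illustration and not the program's Delphi
source: the function names, the use of CRC-32 for the 512-byte
checksum, and the dictionary-based grouping are all assumptions made
for the sketch. Zero-length files are skipped here for simplicity.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def group_by_size(root):
    """Phase 1: walk the folder tree and bucket file paths by size."""
    by_size = defaultdict(list)
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable entry: leave it out of the scan
    return by_size


def head_checksum(path, nbytes=512):
    """Checksum of the first nbytes (CRC-32 is an assumption)."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(nbytes))


def quick_key(path, use_checksum):
    """Preliminary check: modification time, or checksum of the head."""
    if use_checksum:
        return head_checksum(path)
    return int(os.path.getmtime(path))


def find_duplicates(root, use_checksum=False):
    """Phase 2: pre-check same-size files, then compare byte-by-byte."""
    groups = []
    for size, paths in sorted(group_by_size(root).items()):
        if size == 0 or len(paths) < 2:
            continue
        by_key = defaultdict(list)
        for path in paths:
            by_key[quick_key(path, use_checksum)].append(path)
        for candidates in by_key.values():
            clusters = []  # each cluster holds files with equal contents
            for path in candidates:
                for cluster in clusters:
                    # shallow=False forces a real content comparison
                    if filecmp.cmp(cluster[0], path, shallow=False):
                        cluster.append(path)
                        break
                else:
                    clusters.append([path])
            groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

Calling find_duplicates(folder) uses the timestamp pre-check; passing
use_checksum=True switches to the checksum of the first 512 bytes.
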
In either case, the filename is ignored, so simply renaming a file will
not hide the fact that it is a duplicate. The timestamp of zero-size
files is ignored. You may be rather surprised to discover what
duplicates by content actually exist in some popular office suites!

When should I disable timestamp checking?

You can turn off the timestamp checking in favour of the slower
checksum method should you so wish. For example, if the initial search
finds many files with identical sizes and timestamps, using just
timestamps might miss some duplicate files, since the duplicates may
not be adjacent in the name-ordered list produced by the folder scan.
The program was not designed for this sort of duplicate search, but
will perform adequately with timestamp checking turned off. You might
also wish to disable timestamp checking if you suspect that different
products have installed identical support DLLs.

Usage:

Extract FindDupl.exe from the zip file to a convenient location, and
run it! Only the FindDupl.exe file is required from the archive.

You will be presented with a dialog box showing your disk drives, with
your local hard disk drives selected. You can optionally enter a file
spec such as *.EXE and a folder specification such as \windows to limit
the search. Note that if you enter a folder specification, only that
folder will be searched on each drive (e.g. c:\windows, d:\windows and
so on).

Press the Start Search button to find duplicate files. A status bar
keeps you informed of the progress of both the folder scan phase and
the file comparison phase.

Once the main list box has filled up with file names, you can
double-click on a file name to get a pseudo Properties dialog box
(actually written in Delphi, not derived from the system right-click ->
Properties box). It includes a Delete button which allows you to delete
the file.
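
How a folder specification and a file spec narrow the search, as
described under Usage, can be illustrated with a short Python sketch.
The function names and the drive-letter list are hypothetical, and
treating *.* as "every file" is an assumption based on usual Windows
behaviour, not something taken from the program's source.

```python
import fnmatch
import ntpath  # Windows path rules, importable on any platform


def search_roots(drives, folder_spec=""):
    """Return one starting folder per selected drive letter."""
    sub = folder_spec.strip("\\/")
    roots = []
    for drive in drives:
        top = drive + ":\\"
        # With a folder specification, only that folder is searched
        # on each drive; otherwise the whole drive is scanned.
        roots.append(ntpath.join(top, sub) if sub else top)
    return roots


def matches_spec(filename, file_spec="*.*"):
    """Case-insensitive wildcard match of a file name against a spec."""
    if file_spec in ("", "*", "*.*"):
        return True  # assumption: these specs select every file
    return fnmatch.fnmatchcase(filename.lower(), file_spec.lower())
```

For example, search_roots(["c", "d"], "\\windows") yields one folder
per drive, and matches_spec("SETUP.exe", "*.EXE") accepts the file
regardless of case.
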

If a floppy disk (specifically drive A:) is included in the selected
drives to scan, the program will normally assume that you wish to find
files in common between the floppy and the other disk drives, so that
during the folder scan phase Find Duplicates will only record files on
the other drives that are the same size as files found on the floppy.
This makes the scanning faster and allows you to ask the question "Do I
already have any files on my hard disk that are on this floppy?" You
can treat floppies just as ordinary disks by unchecking the "Treat
floppy as master" check box. You may notice a slightly different
message in the status bar during the folder scan phase in this case.

Windows 95 has a special hidden folder called SYSBCKUP where backup
copies of critical system files are stored. Find Duplicates will
recognise a folder with \SYSBCKUP\ in the path name, and ignore any
files in that folder. To disable this safety feature, uncheck the "Skip
SYSBCKUP folder" check box. The status bar will indicate that the
folder is being skipped, but you'll have to be quick to see that
message! Other hidden folders are scanned normally.

Find Duplicates will ignore files that have zero length, because the
data in such files does not occupy disk space, and they are often
simply marker files (e.g. hidden files to show that a folder was
created by installing an application and not a user). If you prefer to
find these files, uncheck the "Skip zero-length files" check box. Be
aware that these files actually take up at least 32 bytes of directory
space, but that since the folder must be at least a cluster size long
(e.g. 4096, 8192 bytes) there will typically be very little overhead
for a zero-length file.

Upon exiting, Find Duplicates will try to save the list of duplicates
in a file named FindDupl.lis in the same folder as the FindDupl.exe
program file. If this file is present on starting the program, Find
Duplicates will ask if you would like to reload the list.
This allows you to split the task of deleting duplicate files into
short sessions without having to run the time-consuming scan and
compare phases every time.

For safety, Find Duplicates will not actually delete files, but instead
will move them to the Recycle Bin. This means that the disk space will
not actually be returned until the Recycle Bin is emptied. Right-click
on the Recycle Bin to access the Empty Recycle Bin function.

+------------------------------ WARNING ---------------------------------+
|                                                                        |
| You take sole responsibility if you choose to delete a file. Find      |
| Duplicates makes no attempt to check if the file is in use or key to   |
| the functioning of your computer. Take backups before making changes.  |
|                                                                        |
+------------------------------ WARNING ---------------------------------+

Notes:

The program is written with Borland's Delphi 3.0, and most of the
source code is included. You do not need access to Delphi to run Find
Duplicates. You will need other Delphi units (not included in the .ZIP
file) in order to recompile Find Duplicates.

The program requires Windows 95 or NT 4.0.

The folder scan phase can consume a large amount of virtual memory if a
wildcard *.* is specified. At present, the program does not detect when
its memory allocations fail, and may hang in these circumstances with
an out-of-memory error. Increase the space available for the Windows
swapfile or avoid specifying wildcards if this happens to you.

Release information:

1997 Jan 29  V1.0.4  First released version
1997 Feb 03  V1.0.6  Treat floppy drive as master
1997 Feb 12  V1.0.8  Decode date of "0" as "unknown" on Properties dialog
1997 Apr 02  V1.1.0  Make file list box hint the filename (for long paths!)
                     Save and optionally restore duplicate file list
                     By default, ignore files in Win 95 SYSBCKUP folder
                     Replace ListBox with ListView (both Drives and Results)
                     Correct: missing FindClose in do_checksum routine
                     Correct: remove deleted file from the duplicates list
1997 Apr 07  V1.1.2  Use ShellAPI function to move file to recycle bin
1997 May 13  V1.1.4  Use my own TFileList component
                     Don't show properties/delete box for non-existent files
                     Put source files in sub-folder
                     Force checksum routine to return 31-bit value
1997 May 18  V1.2.0  Move to Delphi 3.0
                     Don't leave singletons in the duplicates list
                     Correct property display for sequential compressed files
                     Don't allow ColumnClick on the FileListView - set False
1997 Oct 08  V1.2.2  Move to Delphi 3.01
                     Handle large font displays better.
                     Use TreeScanner with FindHiddenXX options
                     Don't build against run-time VCL30.DPL

Contacting the author:

This program is freeware, and remains copyright of David J Taylor,
Edinburgh, 1997. This program is provided "as is", without any support.
Whilst I cannot answer queries relating to the use of this program, I'd
welcome any comments or suggestions for improvements you may have, and
such feedback has helped mould the present version of the program.

david.taylor@gecm.com
1997 October 08