ZIPSCRUB
by Dan Goodell
December, 1993

PKWARE's PKZIP program includes features that make it suitable in a
backup strategy. New files can be added to a ZIP archive, and files
that have been modified can be updated. However, there is no
convenient way to delete obsolete files from the archive.

If a file is renamed or is erased from the source directory, its
copy remains in the archive under the old name, even after
subsequent archive updates. I call these "orphans" - files that used
to exist but no longer do. Over time, a ZIP archive can gradually
accumulate several such obsolete files.

ZIPSCRUB's purpose is to purge the orphans from the archive.
ZIPSCRUB compares the filenames in a ZIP archive to the source
directory. If files with the same names no longer exist in the
source, they are purged from the archive.

************ WARNING! WARNING! WARNING! WARNING! *************

THIS PROGRAM IS DESIGNED TO AUTOMATE THE DELETION OF DATA. LIKE ANY
SUCH PROGRAM, YOU MUST FULLY UNDERSTAND HOW IT OPERATES, OR YOU MAY
UNINTENTIONALLY LOSE VALUABLE DATA. PLEASE READ THIS ENTIRE
DOCUMENT. USERS OF THIS PROGRAM DO SO AT THEIR OWN RISK. IF YOU DO
NOT UNDERSTAND HOW IT OPERATES, DO NOT USE IT! THE AUTHOR IS NOT
RESPONSIBLE FOR ANY DAMAGES OR LOSSES THAT OCCUR THROUGH USE OF THIS
PROGRAM.

*****************************************************************


RUNNING ZIPSCRUB
----------------

Run ZIPSCRUB using the following syntax:

     ZIPSCRUB archivename [d:][path]

where "archivename" is the name of the ZIP archive file, and
[d:][path] is an optional source drive and/or directory path
parameter indicating where the original files are supposed to be.

The name of the ZIP archive to be scanned is required. If no name is
specified, ZIPSCRUB displays a simple syntax screen.

ZIPSCRUB scans the archive for the filenames stored in it, and then
checks whether files with the same names still exist in the source
directory.
ZIPSCRUB then displays the unmatched filenames and prompts the user
to confirm whether those files should be purged from the archive.

In looking for filename matches, ZIPSCRUB uses exactly the same
search procedure DOS would use (see the section below on "The DOS
File Search Procedure").

You may optionally specify a source directory. If none is specified,
ZIPSCRUB uses the current DOS drive and directory. You may specify
the drive and/or directory path to search. If you specify both,
however, remember this is a single parameter so do not leave any
space between the drive letter and directory (e.g., use "c:work"
instead of "c: work").


HOW ZIPSCRUB WORKS
------------------

ZIPSCRUB works by calling the PKZIP program. Therefore, you must
have a copy of PKZIP.EXE and it must be available to DOS (that is,
in the current directory of the default drive, or in one of the
directories listed on the PATH). There is no provision for directing
ZIPSCRUB to find PKZIP.EXE in other directories. PKZIP is a
widely-available "shareware" file-compression program and is the
copyrighted work of PKWARE, Inc.

ZIPSCRUB first calls PKZIP.EXE with the "-v" parameter and passes it
the name of the ZIP archive file. PKZIP's "-v" option asks for a
list of the files stored in the archive. The output from PKZIP is
redirected to a temporary file named ZIPSCRUB.TXT.

ZIPSCRUB then reads back the ZIPSCRUB.TXT file, extracting only the
filenames in it. For each filename, ZIPSCRUB checks to see if a file
by the same name still exists in the source directory. If not, the
file is considered "orphaned" in the archive. The filenames of
orphans are temporarily stored in a file named ZIPSCRUB.DEL. After
checking all filenames, ZIPSCRUB asks you to confirm whether the
orphans should be purged.

NOTE: ZIPSCRUB does not determine whether the actual files in the
source and archive are the same; it searches only for matching
filenames.
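
The scan-and-compare step described above can be sketched in Python.
This is only an illustration under stated assumptions, not ZIPSCRUB's
actual code: it reads the archive directly with Python's standard
zipfile module instead of calling PKZIP with "-v" and parsing
ZIPSCRUB.TXT, and the function name find_orphans and its parameters
are hypothetical. Like ZIPSCRUB, it compares names only, never file
contents.

```python
import os
import zipfile


def find_orphans(archive_path, source_dir="."):
    """Return names stored in the archive that have no same-named
    file under source_dir (hypothetical helper, names only)."""
    orphans = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip pure directory entries
            # ZIP member names use '/' as the separator; rebuild the
            # path with the local separator before checking the disk.
            candidate = os.path.join(source_dir, *name.split("/"))
            if not os.path.isfile(candidate):
                orphans.append(name)
    return orphans
```

A caller would show the returned list to the user and ask for
confirmation before deleting anything, as ZIPSCRUB does.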

If you affirm the purge, ZIPSCRUB again calls PKZIP.EXE, this time
with the "-d" parameter to delete files from the archive. It passes
PKZIP the archive name and the list of orphaned files. Finally, the
files ZIPSCRUB.TXT and ZIPSCRUB.DEL are erased.

If PKZIP should run into an error, ZIPSCRUB saves PKZIP's output in
the file ZIPSCRUB.TXT. In this case, ZIPSCRUB does not erase the
ZIPSCRUB.TXT file, but leaves it so you have a chance to examine
PKZIP's error message.

Although ZIPSCRUB is designed to wait for confirmation before
proceeding with a purge, you can bypass that step by using DOS to
"pipe" a "Y" keystroke into ZIPSCRUB. To do that, use the command
syntax:

     ECHO Y | ZIPSCRUB archivename [d:][path]

However, this is not recommended unless you are certain ZIPSCRUB
will search the correct source directory.


THE DOS FILE SEARCH PROCEDURE
-----------------------------

DOS maintains a table called the Current Directory Structure (CDS)
in memory to store the "current" directories for each drive in the
system. Normally, the current directory is the last directory you
were in on that drive. The CDS is unrelated to the current or
"default" drive. DOS remembers these directories, regardless of
which drive you have currently switched to.

To illustrate how DOS uses the CDS, imagine you were last in the FOO
directory on the C: drive (C:\FOO), but have switched to the D:
drive and are now in the D:\BAR directory. Now ask DOS for a list of
files by using the "dir" command.

   If command line is:    DOS displays:
   -------------------    -----------------------------------
   1. "dir"               files in the current directory of
                          the default drive -- D:\BAR\*.*

   2. "dir work"          files in the WORK subdirectory of
                          the current directory of default
                          drive -- D:\BAR\WORK\*.*

   3. "dir c:"            files in the current directory of
                          the C: drive -- C:\FOO\*.*

   4. "dir c:work"        files in the WORK subdirectory of
                          the current directory of the C:
                          drive -- C:\FOO\WORK\*.*

   5. "dir c:\"           files in the root directory of the
                          C: drive -- C:\*.*

   6. "dir c:\work"       files in the WORK subdirectory of
                          the root directory of the C: drive
                          -- C:\WORK\*.*

(Note: in the above examples, WORK is assumed to be a directory
entry. If there is no such directory, DOS looks for a file by the
name WORK.)


THE ZIPSCRUB FILE SEARCH PROCEDURE
----------------------------------

ZIPSCRUB uses the CDS to search for files in exactly the same manner
as DOS.

Let's say you've saved the file C:\WORK\TEXT.FIL in a ZIP archive
named ARCHIVE.ZIP. PKZIP's normal default is to store only the
filename without the directory path, so the filename stored in the
zipfile will be "TEXT.FIL". Now, if the current directory of drive
C: is C:\FOO, and the current directory of drive D: is D:\BAR, and
the default drive is D:, then:

   If command line is:                ZIPSCRUB searches for:
   -------------------------------    ----------------------
   1. "zipscrub archive"              D:\BAR\TEXT.FIL
   2. "zipscrub archive work"         D:\BAR\WORK\TEXT.FIL
   3. "zipscrub archive c:"           C:\FOO\TEXT.FIL
   4. "zipscrub archive c:work"       C:\FOO\WORK\TEXT.FIL
   5. "zipscrub archive c:\"          C:\TEXT.FIL
   6. "zipscrub archive c:\work"      C:\WORK\TEXT.FIL

Note only the sixth example searches for the proper file. The first
four commands do not work properly in this case, but could work in
other cases, depending on what the current drive and directory
defaults are set to in DOS. The fifth command will only work if
TEXT.FIL happens to be in the root directory of that drive.

Now let's assume you've saved the file C:\WORK\TEXT.FIL in the
archive ARCHIVE.ZIP with PKZIP's "-P" parameter. This option saves
the file's directory path (but note: NOT the drive spec!) in the
archive. PKZIP stores the filename "WORK/TEXT.FIL" in the archive
(note PKZIP uses '\' and '/' interchangeably).
Now, if the current directory of drive C: is C:\FOO, and the current
directory of drive D: is D:\BAR, and the default drive is D:, then:

   If command line is:                ZIPSCRUB searches for:
   -------------------------------    -------------------------
   1. "zipscrub archive"              D:\BAR\WORK\TEXT.FIL
   2. "zipscrub archive work"         D:\BAR\WORK\WORK\TEXT.FIL
   3. "zipscrub archive c:"           C:\FOO\WORK\TEXT.FIL
   4. "zipscrub archive c:work"       C:\FOO\WORK\WORK\TEXT.FIL
   5. "zipscrub archive c:\"          C:\WORK\TEXT.FIL
   6. "zipscrub archive c:\work"      C:\WORK\WORK\TEXT.FIL

Only one of these commands will search for the proper file. The
first four examples are dependent on the default drive and current
directories the drives are set to. The sixth example redundantly
looks for the WORK\WORK subdirectory. Note example 5 is independent
of drive or directory defaults and will always search for the proper
file when it was saved using PKZIP's "-P" option.

If orphans are found, the orphan list window will show the full
drive, directory path and filename ZIPSCRUB tried to match. This
gives you a chance to make sure ZIPSCRUB was looking for matches in
the right place.

ZIPSCRUB differentiates directory names stored in the archive from
directory names you entered (on the command line) or the defaults
from the DOS Current Directory Structure (CDS). Drive letters and
directory names you entered or that ZIPSCRUB "adopted" from the CDS
are always listed in lowercase, regardless of how you entered them
on the command line. Directory names that are part of the filename
stored in the archive are always shown in uppercase.


DISK SPACE REQUIREMENTS
-----------------------

ZIPSCRUB calls the PKZIP program, so the amount of free memory and
disk space needed is largely dependent on PKZIP's needs. ZIPSCRUB
also requires a certain amount of free disk space to write its
temporary files, but this amount (10K-50K typical) is usually small
compared to what PKZIP needs. ZIPSCRUB does not attempt to check for
minimum free disk space requirements.
If PKZIP runs out of memory or disk space, it will generate an error
message that is passed on to ZIPSCRUB, and which ZIPSCRUB will save
in the file ZIPSCRUB.TXT.


COPYRIGHT AND LICENSE
---------------------

PKZIP is a registered trademark of PKWARE, Inc.

ZIPSCRUB is supplied as is and without warranty. The author assumes
no liability for damages, direct or consequential, which may result
from the use of this program. Users of this program must accept this
disclaimer of warranty.

ZIPSCRUB is not a Public Domain program. It is the copyrighted work
of its author, Dan Goodell. All rights under US copyright law are
reserved. The author is not associated in any way with PKWARE, Inc.

There is no charge for private, non-commercial use of this program.
Other users should contact the author for further information.

Distribution of this program is subject to the following conditions:

   ZIPSCRUB must be supplied with this documentation file. It is
   important that users be aware of the potential hazards associated
   with such programs as this that automate data deletion.
   Therefore, do not distribute the ZIPSCRUB program without this
   documentation.

   Neither ZIPSCRUB nor this documentation may be modified or
   altered in any way, other than changing the "archive format" used
   to store and transmit the program.

   No remuneration may be accepted for ZIPSCRUB, or for
   distribution, other than a nominal disk/duplication fee.

Questions, comments, or recommendations are welcome.

   ZIPSCRUB      Copyright 1993      1261 Hookston Rd
   ver. 1.0      by Dan Goodell      Concord CA 94518
   12/19/93                          CIS: 71520,3116