02/07/94

LSIZE 2.51 - Archive File Sizer

-----------------------------------------------------------------------------
DESCRIPTION
-----------------------------------------------------------------------------

I originally wrote this program to back up very large .LZH files to floppy
disks. A few programs let you physically split a file into parts, but they
proved inconvenient. I wanted a way to split one archive into several
smaller archives that are still readable by the archiving program. This
eliminates the need to rejoin "split" files or to span a file across
several floppies.

LSIZE works well for "sizing" archives made up of many smaller files (less
than about 1Meg compressed each). Remember that it is impossible to "size"
an archive smaller than its largest file. For example, if the largest file
in the archive is 4Megs and it compresses to 1Meg, then you can size the
archive no smaller than 1Meg. Archives containing files this large still
require splitting or spanning. Luckily, 99% of archives (at least the ones
I work with) don't contain files this large, so this program works fine.

Finally, I was persuaded to expand the original program to work with the
popular ZIP and ARJ formats as well as LZH.

NOTE: If a single compressed file is still larger than your target disk
      size - THIS PROGRAM WILL NOT WORK! (Luckily, this is not usually the
      case.) You then have no choice but to SPLIT the file using one of
      the many programs written for this purpose. (People using ZIP/ARJ
      could also create multi-volume archives with the spanning features
      of those programs.)

The current version of the program should work with archives generated by:
RAR version 1.502, PKZIP version 2.04g, LHA version 2.13, ARJ version 2.41,
and their predecessors (I think).

NOTE: This program will not split up multi-volume or spanned archives -
      attempting to do so can cause unpredictable results!
      This program will not work with self-extracting EXE files, and it
      will not handle .RAR files that carry AV (authenticity verification)
      information.

-----------------------------------------------------------------------------
REQUIREMENTS
-----------------------------------------------------------------------------

LSIZE has been compiled to work with all known IBM compatible processors -
with or without a math coprocessor.

-----------------------------------------------------------------------------
USAGE
-----------------------------------------------------------------------------

Typing LSIZE alone will produce a help screen.

Command Line:

     LSIZE SourceFileName [TargetFileNamePrefix] MaxSize

where
     SourceFileName       = name of the archive to be broken up.
     TargetFileNamePrefix = prefix of the files to be created.
                          = [defaults to source name].
     MaxSize              = maximum size (in bytes) of each output file.
                          = also accepts sizes: 1.44M, 720K, 1.2M, 360K.

EXAMPLES:

To split up a file called YEAR.ZIP into files that will each fit onto a
720K floppy disk, type:

     LSIZE YEAR.ZIP 720K

This would produce a number of smaller files called YEAR1.ZIP, YEAR2.ZIP,
YEAR3.ZIP, ...and so on.

To split up a file called BIG.ARJ into smaller files each called SMALLn.ARJ
with a max size of 1.44M, type:

     LSIZE BIG.ARJ SMALL 1.44M

A FEW MORE THINGS:

LSIZE will never overwrite files without your permission. If an output
file would overwrite an existing file, you will be prompted:

     overwrite? (Y/N/G)

Y means YES, overwrite this file!
G means YES, overwrite this file and all subsequent files if needed.
ANSWERING WITH G MEANS YOU WILL NOT BE PROMPTED AGAIN if LSIZE needs to
overwrite another file - LSIZE WILL AUTOMATICALLY OVERWRITE!
Any other response means "NO! DO NOT overwrite!", at which point the
program terminates and you are returned to the DOS prompt.

LSIZE can handle archives with no more than 4000 files, provided you have
the RAM!
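As a rough illustration of how the MaxSize argument described above might be
interpreted, here is a short Python sketch. It is not taken from the LSIZE
source. The function name parse_max_size and the table FLOPPY_BYTES are
hypothetical, and the byte counts are an ASSUMPTION: they are the free space
on a freshly DOS-formatted floppy of each type, which may not be the exact
figures LSIZE uses.

```python
# Hypothetical sketch of MaxSize handling; NOT the actual LSIZE code.
# ASSUMPTION: each shorthand means the free space on a freshly
# DOS-formatted floppy of that type. LSIZE's real values may differ.
FLOPPY_BYTES = {
    "1.44M": 1457664,
    "1.2M": 1213952,
    "720K": 730112,
    "360K": 362496,
}


def parse_max_size(text):
    """Return MaxSize in bytes from a shorthand or a plain byte count."""
    key = text.strip().upper()
    if key in FLOPPY_BYTES:
        return FLOPPY_BYTES[key]
    # Anything else must be a plain positive number of bytes.
    if not (key.isascii() and key.isdigit()):
        raise ValueError("MaxSize must be a byte count or one of: "
                         + ", ".join(FLOPPY_BYTES))
    size = int(key)
    if size <= 0:
        raise ValueError("MaxSize must be greater than zero")
    return size


if __name__ == "__main__":
    for arg in ("720K", "1.44m", "10000"):
        print(arg, "->", parse_max_size(arg))
```

A target somewhat below the raw disk capacity is the safer choice in a
sketch like this, since the disk's own FAT and directory take up room too.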
-----------------------------------------------------------------------------
TECHNICAL NOTES
HOW LSIZE WORKS
-----------------------------------------------------------------------------

The following describes the basic algorithm used by LSIZE.

Suppose an archive file looks like this:

INFILE.TUT
----------------------------------------------------------------------
| A(2k)| B(7k)| C(5k)| D(3k)| E(1k)| F(3k)| G(6k)| H(3k)| I(8k)| J(2k) |
----------------------------------------------------------------------

The archive contains ten files named A,B,C,...J. Each file's compressed
size is in parentheses. We now wish to size INFILE.TUT to smaller 10K
files, so we type

     LSIZE INFILE.TUT OUTFILE 10000

on the command line.

LSIZE reads through the archive and stores each file name in an array,
along with its compressed size and its relative position in the archive.
Once all the files are read, this array is sorted in decreasing order by
file size. For simplicity, I'll call the 2K 'A' file 'A2', the 7K 'B' file
'B7', and so on. Our array now looks like this:

     I8, B7, G6, C5, D3, F3, H3, A2, J2, E1.

These files are now 'sized' to subfiles. We'll begin with subfile #1: we
must determine which of the files will be placed in OUTFILE1. LSIZE begins
by searching the array for the largest file that will fit into the target
file size. In this case we want 10K files, so file I8 will be the first
file in OUTFILE1. We mark file I8 in the array as belonging to subfile #1,
which eliminates it from further searches.

We now look for the next largest file in the array that will fit into
OUTFILE1 without pushing it over 10K. This is file A2. OUTFILE1 now
contains files I8 and A2. Note that OUTFILE1 has reached its 10K limit and
no more files can be placed in it.

Now we begin placing files into OUTFILE2. The next largest file is B7, and
it is marked for OUTFILE2. The next largest file that will fit is D3.
OUTFILE2 has now reached the 10K limit.
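The placement rule walked through above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the LSIZE
source: the function name size_archive and the (name, size) tuples are made
up for the example, sizes are in K as in the walkthrough, and archive header
overhead is ignored.

```python
# Illustrative sketch of the sizing rule described above; NOT the LSIZE code.
# ASSUMPTIONS: entries are hypothetical (name, compressed_size) tuples in
# archive order, sizes are in K, and header overhead is ignored.

def size_archive(entries, max_size):
    """Assign files to subfiles, largest-that-fits first."""
    # Sort by decreasing size. Python's sort is stable, so files of equal
    # size keep their original archive order (D3, F3, H3 and A2, J2).
    ordered = sorted(entries, key=lambda entry: -entry[1])
    if ordered and ordered[0][1] > max_size:
        raise ValueError("largest file does not fit in max_size")

    assigned = [False] * len(ordered)
    subfiles = []
    while not all(assigned):
        remaining = max_size
        current = []
        # One pass over the sorted array picks, at every step, the largest
        # unassigned file that still fits in the space remaining.
        for i, (name, size) in enumerate(ordered):
            if not assigned[i] and size <= remaining:
                assigned[i] = True      # mark as belonging to this subfile
                current.append(name)
                remaining -= size
        subfiles.append(current)
    return subfiles


INFILE = [("A", 2), ("B", 7), ("C", 5), ("D", 3), ("E", 1),
          ("F", 3), ("G", 6), ("H", 3), ("I", 8), ("J", 2)]

if __name__ == "__main__":
    for number, names in enumerate(size_archive(INFILE, 10), start=1):
        print("OUTFILE%d:" % number, " ".join(names))
    # OUTFILE1: I A
    # OUTFILE2: B D
    # OUTFILE3: G F E
    # OUTFILE4: C H J
```

Because the array is sorted once in decreasing order, a file skipped for
being too large stays too large for the rest of that subfile, so a single
pass per subfile is enough.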
This process of assigning files to subfiles continues until all the files
have been assigned. The final subfile assignments look like this:

     OUTFILE1    OUTFILE2    OUTFILE3    OUTFILE4
     --------    --------    --------    --------
        I8          B7          G6          C5
        A2          D3          F3          H3
                                E1          J2

Each of these subfiles is then written out to disk, and this simple example
is complete.

In reality, each archive also contains header information that must be
accounted for when sizing. Making matters worse, these headers grow larger
each time another file is assigned to a subfile archive.

-----------------------------------------------------------------------------
LEGAL STUFF
-----------------------------------------------------------------------------

I am releasing this program into the public domain. There are no licenses
to buy or fees to pay. (Of course, if you insist - you can send a donation
to the address below!) There are also no guarantees of any kind.
USE THIS PROGRAM AT YOUR OWN RISK.

You may distribute this program freely provided that this file is included
and the program itself is unmodified.

The only thing I ask is that you send your comments, suggestions, or bug
reports to me at:

     L.S. Lewis                          email to (Internet):
     15 Marigold Lane             OR       larrylw@eden.rutgers.edu
     Willingboro, NJ 08046-2812   OR     (CompuServe): 74124,374

     Yang Yuanzhi                            email to (Internet):
     Institute of Computer Science at NTHU     yyz@antslab.cs.nthu.edu.tw
     Hsinchu, Taiwan, Republic of China

-----------------------------------------------------------------------------
LSIZE HISTORY
=============================================================================

Ver 2.52   9-28-94
-----------------------------------------------------------------------------
.RAR files, without AV, supported now (by Yang Yuanzhi).

Ver 2.51   1-18-94
-----------------------------------------------------------------------------
Corrected target file size for 1.2M floppy disks.

Ver 2.50   1-18-94
-----------------------------------------------------------------------------
Modified output buffering to increase speed (up to 30% faster).
Added routine to REALLY truncate output subfile prefixes that are too long.
(I thought I fixed that a long time ago...but...)
Added routine to pad subfile numbers in the filename with zeroes as
necessary.
Spruced up the documentation.

Ver 2.40   1-7-94
-----------------------------------------------------------------------------
Ver 2.30 was inadvertently compiled with the small memory model and would
bomb on larger archives. Ver 2.40 is compiled with the CORRECT compact
model.
Added more efficient dynamic memory allocation for file names.
Fixed bug so that LSIZE can handle all archives with up to 4000 files - if
you have the memory.
Improved error messages.
Added header information for the various archives as comments in source
code.

Ver 2.30   1-1-94
-----------------------------------------------------------------------------
Fixed bug that caused a file error when sizing out to over 15 subfiles.
Special thanks to David DeSimone for pointing out this bug to me.

Ver 2.20   11-10-93
-----------------------------------------------------------------------------
Changed from small to compact memory model.
Fixed bug that caused the first file sized from an ARJ to report the wrong
file size.

Ver 2.10   9-24-93
-----------------------------------------------------------------------------
Fixed file_name length bug. File names longer than 16 chars bombed the
program. Any file with a path is usually this length.
Updated program for use with ZIP 2.04g, ARJ 2.41a, and LHA 2.13.
Updated screen I/O to handle longer file names/paths.
Note: LSIZE won't work with multi-volume zips (spanning)!

Ver 2.00   12-30-92
-----------------------------------------------------------------------------
Added code to handle ZIP 1.10 and ARJ 2.31 archive formats!!
LSIZE now handles LZH/ARJ/ZIP formats.
Updated some screen I/O.
Toyed with the idea of compiling the code for i386 for better speed.
Kept 8086 instruction set for full compatibility.

Ver 1.20   6-06-92
-----------------------------------------------------------------------------
Path handling has been improved.
The output prefix parameter has been made optional. The default is the
source name. The prefix is truncated if it exceeds 7 letters.

Ver 1.10   5-15-92
-----------------------------------------------------------------------------
LSIZE now reports each file's size as it writes.
The final size of each output file is also reported.
The program now recognizes 1.44M, 1.2M, 720K, and 360K as maximum sizes on
the command line.

Ver 1.0    1-92
-----------------------------------------------------------------------------
The original program.

=============================================================================