=====================
=  Word Count 1.00  =
=====================


Features
========

- Count of lines, characters, non-whitespace characters, words, distinct
  words and unique words.
- Average length of words, distinct words and unique words.
- Sorted word lists with frequencies.
- Word length distribution histograms.
- Code page awareness.
- "Quick scan" mode.
- Multiple filespecs/wildcards.
- WordStar 6.0 document format support.


Files
=====

WC.EXE          Word Count executable file
WC.DOC          Word Count documentation (this file)
WC.CRO          Addendum to the documentation for Croatian users


System requirements
===================

PC XT 8086/8088 or compatible
128 kb of free conventional RAM
Hard/floppy disk
MS-DOS/PC-DOS 3.30 or later


Conventions
===========

Word            A sequence of characters contained in either the primary or
                the secondary word set. The primary word set is taken to be
                [a-zA-ZÇ-Üá-Ñ] (character codes 128-154 and 160-165 of code
                page 437) if the /cp option is not specified, or [a-zA-Z]
                plus whichever national character set you use if it is.
                (Sets are given in the regular expression syntax used by
                many UNIX or UNIX-like utilities, most notably GREP.) The
                secondary word set is [0-9_'].

                For instance,

                    wouldn't  is_ascii  Lotus123

                are all eight-character words. Since each word has to
                contain at least one character from the primary set,

                    1995  '95  _1995_

                are NOT words.

Distinct word   Each different word, counted once. For instance, the
                sentence

                    Up, up, and away!

                contains three distinct words: "UP", "AND", and "AWAY".
                Comparison of words is case-insensitive by default.

Unique word     A word that appears exactly once in the whole text. The
                sentence

                    Up, up, and away!

                contains two unique words: "AND" and "AWAY".

Absolute freq.  Number of appearances of a single word in a text.

Relative freq.  Absolute frequency divided by the total number of words in
                the text.

Character       ASCII character in range 32-255, or TAB (ASCII 9).

Non-whitespace  ASCII character in range 33-255.

Line            Sequence of characters terminated by EOL (ASCII 13/10) or by
                the end of file.
                The number of lines in the text equals the number of EOLs
                plus one.


Usage
=====

WC files [options]

/cp                  Code page support
/h                   Show histogram
/hd                  Show histogram for distinct words
/l[f|s|l|u][@|@@]    List used words [w/Freq|Sorted by freq|by Length|Unique]
                     [write|append to specified file]
/q                   Quick scan
/s[s]                Case Sensitive scan [& Sort]
/ws                  WordStar 6.0 document

Multiple filespecs and wildcards are allowed. Multiple files are processed
as a single large file. The order of options and filenames is not
important. Options are case-insensitive.


Available options
=================

/cp     Enables the use of the current code page settings, thus providing
        support for national alphabets. All characters will be upcased
        according to the uppercase table, and word lists will be sorted
        according to the collating sequence table, as provided by DOS.

        >>>> Note that if a text was written using one code page setting
        and processed by Word Count with another, its statistics are
        likely to be misleading. However, 7-bit texts (that is, those
        containing only ASCII characters in range 0-127) are not affected
        by this.

/h      Shows a histogram. Classifies all used words by their lengths, and
        prints the corresponding absolute and relative frequencies.

/hd     Same as above, but shows the histogram for distinct words.

/l      Prints a sorted plain list of all used words.

/lf     Prints a sorted list of all used words with their corresponding
        frequencies.

/ll     Prints a list of all used words, sorted by ascending word length.

/ls     Prints a list of all used words with their corresponding
        frequencies, sorted by descending frequency.

/lu     Prints a sorted plain list of unique words.

/q      Performs a quick scan. Roughly 30 percent faster than the default,
        with minimal memory requirements, but distinct/unique word
        statistics, the distinct words histogram and the list options are
        not available.

/s      Case-sensitive scan and case-insensitive sort. Words are considered
        distinct even if they differ in capitalization only.

/ss     Case-sensitive scan and sort.

/ws     Scans text in WordStar 6.0 (and hopefully 7.0) document format.

        >>>> Documents created with WordStar version 4.0 or earlier will
        *not* be processed correctly, as their format differs from that of
        WS 6.0. The same might apply to WS versions 5.0 and 5.5; I didn't
        have an opportunity to check that.

Redirection
        All generated lists can be redirected to a file. For example,
        specifying

            /lu@analysis

        will write a list of all unique words to a file named "analysis".
        If it already exists, it will be overwritten. Specifying

            /lu@@analysis

        will do the same, except that the output will be appended to the
        end of the file. Note that no blanks are allowed either between
        @/@@ and the output file name, or between @/@@ and the switch
        character.

Option:         Is incompatible with:

List option     Any other list option
List option     /q
/q              /hd
/s              /ss
/cp             /ss


Limitations
===========

Word length     Maximum word length is 64 characters. Longer words are
                truncated.

Frequencies     Maximum word frequency is 65535.

Filenames       The number of filespecs in the command line is not limited,
                but the program can process a maximum of 128 files.
                Remaining files are ignored.

Memory          Each distinct word takes up (9+length) bytes of memory. The
                program itself occupies around 50 kb, so a PC with 620 kb of
                free conventional memory has roughly 570 kb left for words.
                With an average word length of 7-8 characters (16-17 bytes
                per word), such a PC will probably not be able to process
                texts containing more than 35000 distinct words. With the
                /q option, however, there are no memory limitations.


Error messages
==============

Insufficient memory/Out of memory
                There is no memory left to store (additional) distinct
                words. Try to make more conventional RAM available. If
                nothing works, specify the /q option.

Can't open      The specified file is not there, or its name is invalid.
                This is probably caused by a misspelled filename or path.
                Another possible reason is that you have typed a blank
                between @/@@ and the output file name.

Can't create    The output file name is invalid, or the disk is full, or
                the file already exists and is marked read-only, or the
                floppy disk is not ready.

Error reading   The file exists and is opened successfully, but a problem
                has occurred during the read operation. This is probably
                due to media malfunction.

Conflicting options
                The command line contains two or more options which perform
                conflicting actions (see "Available options").

Invalid option  The command line contains an invalid option. Check the
                syntax.

In the event of an error, Word Count terminates with the DOS error level
set to 1.


License
=======

You may freely use, copy and distribute Word Count as long as:

1) It is distributed in its original, unmodified form, including the
   documentation.
2) It is not sold for profit. A reasonable fee to cover handling expenses
   is permissible.

You are encouraged to upload Word Count to your local BBS or ftp server.

If you use this program and find it of value, a contribution of $7
(business/commercial users: $15) or any amount will be appreciated.


Contact address
===============

All questions, comments or suggestions concerning Word Count and its use
are more than welcome. Send your mail to the author:

        Branko Radovanovic
        Josipa Seissela 44
        10010 Zagreb
        CROATIA

        e-mail: br31187@pinus.cc.etf.hr
            or  br31187@pinus.cc.fer.hr
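
The definitions given under "Conventions" can be illustrated with a short
Python sketch. It is not Word Count's own implementation, only an
approximation under stated assumptions: it models the default mode (no /cp
option, case-insensitive comparison), reads the text as raw code page 437
bytes, upcases only the letters a-z, and the function names words(),
stats() and relative_freq() are hypothetical.

```python
import re
from collections import Counter

# Primary word set in the default (no /cp) mode, as code page 437 byte
# ranges: a-z, A-Z, 0x80-0x9A (Ç..Ü) and 0xA0-0xA5 (á..Ñ).
PRIMARY = rb"a-zA-Z\x80-\x9a\xa0-\xa5"
# Secondary word set: digits, underscore and apostrophe.
SECONDARY = rb"0-9_'"

# A candidate word is any run of primary and/or secondary characters.
RUN_RE = re.compile(rb"[" + PRIMARY + SECONDARY + rb"]+")
PRIMARY_RE = re.compile(rb"[" + PRIMARY + rb"]")
MAX_WORD_LEN = 64  # longer words are truncated


def words(text: bytes) -> list:
    """Return the words of `text`, upcased for case-insensitive comparison."""
    result = []
    for run in RUN_RE.findall(text):
        # Runs with no primary character (1995, '95, _1995_) are not words.
        if PRIMARY_RE.search(run):
            # Assumption: bytes.upper() upcases only a-z; national letters
            # are left as they are in this sketch.
            result.append(run[:MAX_WORD_LEN].upper())
    return result


def stats(text: bytes) -> dict:
    """Counts as defined under "Conventions" (hypothetical helper)."""
    freq = Counter(words(text))
    return {
        "lines": text.count(b"\r\n") + 1,  # EOLs (CR LF) plus one
        "characters": sum(1 for b in text if b >= 32 or b == 9),
        "non_whitespace": sum(1 for b in text if b >= 33),
        "words": sum(freq.values()),
        "distinct": len(freq),
        "unique": sum(1 for n in freq.values() if n == 1),
    }


def relative_freq(text: bytes) -> dict:
    """Absolute frequency of each word divided by the total word count."""
    freq = Counter(words(text))
    total = sum(freq.values())
    return {w: n / total for w, n in freq.items()}


if __name__ == "__main__":
    sample = b"Up, up, and away!"
    print(stats(sample))
    # {'lines': 1, 'characters': 17, 'non_whitespace': 14, 'words': 4, 'distinct': 3, 'unique': 2}
    print(relative_freq(sample))
    # {b'UP': 0.5, b'AND': 0.25, b'AWAY': 0.25}
```

Run on the sample sentence from "Conventions", the sketch reports 4 words,
3 distinct words and 2 unique words, matching the examples given there.
Splitting the text into runs first and then rejecting runs without a
primary character keeps the word rule simple and the scan linear.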