--------------------------------------------------------------------
|  WSEARCH - PC Magazine File and Text Search                      |
|                                                                  |
|  OS/2 Command Line Version (WSOS2.EXE)                           |
|  Version 1.00, 2/27/96                                           |
|                                                                  |
|  Copyright 1996, Ziff-Davis Publishing Company. All Rights       |
|  Reserved. First Published in PC Magazine, US Edition,           |
|  March 26, 1996.                                                 |
|                                                                  |
|  Author: Tom Rawson.                                             |
--------------------------------------------------------------------

WSearch
-------

WSearch is a fast file and text search utility for Windows 3.x,
Windows NT, Windows 95, and OS/2. This file explains how to use
WSearch under OS/2; for Windows use see the README.TXT file and the
Windows help file (WSEARCH.HLP).

Installing WSearch is easy -- just copy it to a directory on your
hard disk. The sections below describe how to use WSearch from the
OS/2 command line.

WSearch and its documentation and source code are Copyright 1996,
Ziff-Davis Publishing Company, All Rights Reserved. First Published
in PC Magazine, US Edition, March 26, 1996.

The WSearch Command Line
------------------------

You can specify a filename, a text string, and any of the WSearch
options using switches on the command line. Command line switches
can be preceded with either a dash (-) or a slash (/).

Note that the command to launch WSearch under OS/2 is WSOS2 (not
WSEARCH). The commands for 16-bit Windows and 32-bit Windows are
WSWIN and WSWIN32 respectively.

The command line syntax is:

    wsos2 [options] [-t]text-expression [-f]file-specification ...

The options control the search process. The text-expression
indicates the text to search for (if any), and the
file-specifications indicate the file(s) to be searched. For
example, to search for the string "Budget" in all .TXT files, you
could use this command:

    wsos2 Budget *.txt

For additional details on how to construct text expressions and file
specifications, see the corresponding sections of this file below.

The search options are:

    -?
        Display a "Usage" message for quick-reference on WSEARCH
        syntax and options.

    -a
        Show all matches: Sometimes a file will contain more than
        one match to the search pattern. Normally WSearch lists the
        first matching line only. If you use this option, WSearch
        will list all the lines that match in each file.

    -c
        Case sensitive: This option makes pattern matches
        case-sensitive -- that is, the case of the search text will
        have to match the case of text in the file exactly for it
        to be considered a match. Otherwise, case variations are
        ignored during the search. Note that this applies only to
        text searches; file name searches are always
        case-insensitive.

    -m
        Show files that don't match: By default, WSearch will only
        list files that contain a match to the search text. If you
        use this option, files with no match will be displayed as
        well, with the message ** No match **.

    -n
        Line numbers: By default, WSearch does not display line
        numbers in the results. These are optional because the
        search goes a little faster when WSearch doesn't have to
        count lines. However, you can tell WSearch to display line
        numbers by using the -n option.

    -s
        Search subdirectories: Tells WSearch to search the current
        directory (or any directories you name) and all of its
        subdirectories. If this option is not used, only the
        current and/or specifically named directories will be
        searched. If you enter a root directory as the file name,
        using -s will search an entire drive.

The -t and -f switches are not necessary if a text expression
precedes the file specifications. If you want to omit the text
expression, or if you want to put the file specification before the
text expression, the -t and -f switches let you do so. Their
meanings are:

    -t  text expression follows
    -f  filespec follows

If you don't specify a text expression, WSearch will list all the
files that match the file specification(s), and will ignore the text
within the files.

You can include multiple switches after the switch character. For
example, the following commands are equivalent:

    wsos2 -a -n ...
    wsos2 -a-n ...
    wsos2 -an ...

If you use WSearch to search standard PC (ASCII) text files with
typical line lengths, it will match text and count line numbers just
as you expect. However, if you are using WSearch to search non-ASCII
files (e.g. word processor documents, or binary files such as .EXE
or .DLL files), files with very long lines (over 2048 characters),
or files transferred from other systems (e.g. Apple Macintosh), you
may want to look over some of the technical details under the
heading Line Breaks and Line Numbers near the end of this document.

WSEARCH Output
--------------

If you specify a text pattern, WSearch displays each file name,
followed by the line number of the first match (if the -n option was
used) and the text of the matching line. If you ask for all matches,
the file name is displayed on a line by itself, with matching text
lines below it.

If the matching text crosses a line boundary (i.e. if there is a
"^n" in the search pattern), the first matching line is shown. If
the pattern starts with "^n" the line before the first matching "^n"
is the one displayed.

If you don't specify a text pattern, WSearch will display filenames
only.

When a search is completed WSearch displays the number of files
found. If a text pattern was specified, the count shows how many
files contained at least one match. If no text pattern was given,
the count simply shows the number of files found.

If WSearch generates too much output to fit on the screen you can
redirect the output to a file. For example, this command will look
for every occurrence of the word "the" on the entire hard disk, and
save the output in the file THELIST:

    wsos2 -s the \*.* > thelist

You can also "pipe" the output to the MORE program for pagination,
or to a file viewer.
For example:

    wsos2 -s the \*.* | more

WSearch does its best to make the displayed results useful by
truncating long filenames and text lines before they are displayed
(however the output for any particular match may still be longer
than one line). Filenames are truncated at 80 characters by
replacing a center portion of the name with "...." (e.g.
d:\dira\....\dirx\filename.ext). Text is truncated by discarding the
beginning or end of the line, as required, and replacing them with
"....", to limit the output to a maximum of 128 text characters. The
text truncation algorithm shows the portion of the line containing
the text that matched, and includes additional text before and after
the match if there is room.

NUL (ASCII code 0) characters would normally terminate the displayed
line where the 0 character occurs. To prevent this problem, binary
0s are converted to periods before displaying the line. Similarly,
BEL (ASCII code 7) characters are converted to periods in displayed
text to avoid the noise they cause when searching binary files.

File Specifications
-------------------

A file specification can include a path, a file name, or both. If
you use only a path, WSearch will search all files in the specified
directory.

You can specify multiple file names or paths by separating them with
spaces or semicolons. When searching multiple file specs, WSearch
starts by looking through the first one (including subdirectories,
if the -s switch is used), and then moves on to the next one. You
can list up to 32 separate file specifications for WSearch to
search. If you don't enter any paths or file names, the search
includes all files, and starts from the current directory.

WSearch supports long filenames, and requires double quotes around
any filename which includes spaces or semicolons (e.g. "Goulash
Recipe"). Short file names, and long names without spaces or
semicolons (e.g. GoulashRecipe) don't require quotes, but you can
use them if you wish.
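
The separator and quoting rules above can be sketched in a few lines
of Python. This is only an illustration of the rules as documented
here, not WSearch's own parser: the function name split_filespecs
and its limit parameter are hypothetical, and the sketch assumes the
file specifications arrive as one string, whereas in practice the
command processor may already have split the command line.

```python
def split_filespecs(spec_text, limit=32):
    """Split on spaces and semicolons that fall outside double quotes.

    Hypothetical helper; the limit of 32 is the documented maximum.
    """
    specs, current, in_quotes = [], [], False
    for ch in spec_text:
        if ch == '"':
            in_quotes = not in_quotes   # quotes group text and are dropped
        elif ch in " ;" and not in_quotes:
            if current:                 # ignore runs of separators
                specs.append("".join(current))
                current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated double quote")
    if current:
        specs.append("".join(current))
    if len(specs) > limit:
        raise ValueError("Too many file specifications")
    return specs


print(split_filespecs('*.txt;c:\\docs "Goulash Recipe"'))
# ['*.txt', 'c:\\docs', 'Goulash Recipe']
```

A quoted name keeps its spaces and semicolons, while everything
outside quotes is split at either separator.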

You can use a regular expression to specify multiple files. Regular
expressions include the wildcard characters * and ?, but also allow
much more powerful tools for selecting filenames. See the section on
regular expressions below for complete details. The remainder of
this section explains how regular expressions work when they are
used in file specifications.

When you use a regular expression in a file specification, WSearch
retrieves the name of each file in the directory and tests it to see
if it matches the pattern you've specified. For example, the
filename [abc]*y.* will find all files whose names begin with a, b,
or c and end in y, and have any extension.

You can only use a regular expression for the file name and
extension. Regular expression characters in the path portion of the
file name (the part before the final backslash) will be treated as
though they are part of the path. If they make the path (or
filename) invalid, you will get an error message from the operating
system.

When a regular expression is used to match a file name, WSearch
assumes the first character in the expression must match at the
start of the name, and the last character must match at the end of
the name. For example, the pattern [abc]*t will match the file
ADMIT, but will not match MARY.DAT (because the name does not begin
with a, b, or c), or ARMPITS (because it does not end with t). To
match the specified string anywhere within a filename, add a * at
the beginning and end (e.g. *[abc]*t*).

Note that this is different from how text expressions are handled.
In text expressions, the angle brackets, < and >, are used to anchor
searches to the first or last character of a line. This difference
is necessary to make WSearch's behavior consistent with the way file
names are usually handled. If file name patterns were not anchored
to the beginning and end of the name, a pattern like xy? would
match, for example, a file named ABXYZ, which is not what the user
would expect.
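
The anchoring rule can be illustrated with a short Python sketch
that translates a subset of the file name syntax (?, *, [...] and
[~...]) into a Python regular expression and requires it to match
the whole name. The helper names are hypothetical, the ^ escape
character and reversed ranges such as [z-a] are not handled, and
this is an approximation of the documented behavior rather than
WSearch's own code.

```python
import re


def filename_pattern_to_regex(pattern):
    """Translate a subset of the file name syntax to a Python regex."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            out.append(".")             # any single character
        elif ch == "*":
            out.append(".*")            # any string, including none
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1 or end == i + 1:
                raise ValueError("Invalid class")
            body = pattern[i + 1:end]
            negate = body.startswith("~")
            if negate:
                body = body[1:]
            # Escape characters that are special inside a Python class.
            body = re.sub(r"([\\\[^])", r"\\\1", body)
            out.append("[" + ("^" if negate else "") + body + "]")
            i = end
        else:
            out.append(re.escape(ch))   # plain text, "." included
        i += 1
    # File name searches are always case-insensitive.
    return re.compile("".join(out), re.IGNORECASE)


def filename_matches(pattern, name):
    # fullmatch anchors the pattern at both ends of the name.
    return filename_pattern_to_regex(pattern).fullmatch(name) is not None


for name in ("ADMIT", "MARY.DAT", "ARMPITS"):
    print(name, filename_matches("[abc]*t", name))
# ADMIT True
# MARY.DAT False
# ARMPITS False
```

Using a whole-name match, rather than a search anywhere in the name,
is what keeps xy? from matching ABXYZ in this sketch.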

When WSearch tests file names, the entire name -- including the "."
and any extension -- is treated as a string and matched against the
expression you specify. This is slightly different from the typical
wildcard matching you may be used to, where the name and extension
are treated separately. For example, in WSearch, the file name
pattern a?y matches names which begin with the letter a followed by
any character, followed by the letter y. This would match a file
named ANY, or one named A.Y; the "." between the extension and the
file name is simply treated as another character.

Text Expressions
----------------

A text expression specifies the text you want to search for. When
you specify text for WSearch to match, most characters are treated
literally -- that is, an "a" just means the letter "a", and a "("
means a left parenthesis. For example, to search for the name
"Mabel," just enter the word Mabel.

You can also use a regular expression (see below) to specify the
text. Regular expressions provide a much more powerful matching tool
than plain text, because they allow you to describe "patterns" of
characters, not just literal text.

If the text expression includes spaces you must surround it with
double quotes, for example:

    wsos2 "at home" *.txt

To include a double-quote character within the expression, use the
escape character (see below). For example, to search for the string
'3" length', you might use:

    wsos2 "3^" length" *.dat

Regular Expressions
-------------------

Certain characters, such as the wildcard characters * and ?, have a
special meaning to WSearch. These special characters allow you to
specify a text "pattern" for WSearch to locate in the files you
search. You can also use regular expressions as part of the filename
itself.

The special characters used in regular expressions are called
"meta-characters" because they have a higher-level meaning than a
plain text character. The meta-characters used by WSearch are:

    ?
        Matches any single character except the end-of-line
        character. For example, a?c matches abc, axc, or a#c.

    *
        Matches any character or string except the end-of-line
        character; it also matches no character. For example, a*c
        matches alphabetic, all plastic, or AC.

    [abc...]
        Matches any character listed within the brackets (called a
        character class). In case-insensitive searches, letters in
        a class include their upper or lower case equivalent. For
        example, [aBc] will match a, b, c, A, B, or C for
        case-insensitive searches, but only a, B, or c for
        case-sensitive searches.

    [a-z]
        Matches any character between the ends of the range,
        including the end characters themselves, using standard
        ASCII values. You may need an ASCII chart to understand
        unusual ranges. For example, the range [0-z] would also
        include many punctuation characters. You can enter the ends
        of a range in either order. For example, [z-a] is the same
        as [a-z].

    [~abc...]
        Matches any character except those listed within the
        brackets. If a range is used, WSearch will match characters
        which are not part of the range. For example, [~afw-z#]
        will match all characters except a, f, w, x, y, z, and #.

    <
        Matches only if the character or string occurs at the
        beginning of the line.

    >
        Matches only if the character or string occurs at the end
        of the line. For example, and> will match hand but not
        handle.

If you want to find a file that contains the string "noodles" you
can just enter that string. But what if you want to find any string
that starts with n, has a vowel plus two additional characters
following, and then ends in "les" (for example, needles, nestles,
nettles, nibbles, nickles, niggles, nipples, nobbles, nodules,
noodles, nozzles, nubbles, and nuzzles)?
You can do it with this regular expression pattern:

    n[aeiou]??les

If you want to find all files that contain the words "breach" or
"broach", you would use this expression:

    br[oe]ach

Here are a few more examples:

    a*b
        Matches a line which contains an a followed by a b
        somewhere later in the line. For example, the following
        lines would match:

            The alphabet contains 26 letters.
            Today is the day we go to the boat.
            ABC

        The following lines would not match:

            The brown fox jumped over the lazy dog.
            Boys will be boys.

    1[0-9][0-9]
        Matches a line which contains a number between 100 and 199.

    <1[0-9][0-9]
        Matches a line which begins with a number between 100 and
        199.

The meta-characters that have a special meaning outside a class are
<, >, ?, *, [, and ]. The special characters used within classes, ~
and -, are interpreted as normal characters when they occur outside
a class. For example, outside a class, the pattern ab~c would be
interpreted literally, with no special meaning for the ~. The ~
character only carries a special meaning when it is the first
character in the class; otherwise it is taken literally.

Characters which have a special meaning outside a class such as ?
and * are interpreted as plain text characters when they occur
within a class. For example, the pattern [xyz*~] would match the
characters x, y, z, *, or ~.

If you need to include one of the WSearch meta-characters in your
text you must precede the meta-character with the escape character,
a caret (^). For example, to search for the string Are you there?,
you must use this pattern:

    Are you there^?

For more details, and information on using the escape character to
represent non-printing characters like Tab or Line Feed, see the
Escape Character section below.

Escape Character
----------------

The escape character, a caret (^), is used to change the meaning of
the following character or characters. You can use it to include
meta-characters (see the Regular Expressions section above) or
non-printing characters (see below) in your search pattern or file
name.
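
When the text you want to find comes from somewhere else (a script,
for instance), the escaping rule can be applied mechanically. The
Python sketch below is only an illustration: escape_literal is a
hypothetical helper, and the set of characters it escapes is an
assumption based on the meta-characters described in this file. A
literal caret is written as two carets.

```python
# Characters assumed to need a caret when they are meant literally.
META_CHARACTERS = set("<>?*[]")


def escape_literal(text):
    """Return text rewritten so every character is matched literally."""
    out = []
    for ch in text:
        if ch == "^":
            out.append("^^")        # a literal caret is doubled
        elif ch in META_CHARACTERS:
            out.append("^" + ch)    # caret removes the special meaning
        else:
            out.append(ch)
    return "".join(out)


print(escape_literal("Are you there?"))   # Are you there^?
print(escape_literal("[Boot]"))           # ^[Boot^]
print(escape_literal("3 ^ 6"))            # 3 ^^ 6
```

The result still has to be quoted on the command line if it contains
spaces.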

If an Escape character precedes a meta-character, the meta-character
is treated literally, as plain text. For example, you would use ^?
to search for a question mark. To put a caret in the text, use two
carets (^^). (Note that the WSearch escape character is not the same
as the ASCII ESC character.)

The Escape character also lets you specify characters that normally
can't be used within a search pattern because they are non-printing
-- for example, the carriage return or line feed characters. WSearch
recognizes five escape sequences of this type:

    ^t      Tab (ASCII 9) character.

    ^r      Carriage return (ASCII 13) character.

    ^n      Line feed (ASCII 10) character; during searches this
            also matches a carriage return/line feed pair.

    ^ddd    Numeric ASCII character ddd (decimal). The next three
            characters after the ^ are considered to be the number,
            but a one- or two-digit number can be used if the
            character that follows is not a valid decimal digit.

    ^xdd    Numeric ASCII character dd (hex). The next two
            characters after the ^x are considered to be the number,
            but a one-digit number can be used if the following
            character is not a valid hex digit.

If the Escape character is followed by a character other than the
ones listed above, the Escape is ignored and the character after it
is used literally, even if it is a meta-character.

Here are some more examples:

    ^[Boot^]    Searches for [Boot].

    abc^ndef    Searches for abc at the end of one line followed by
                def at the start of the next line.

    3 ^^ 6      Searches for 3 ^ 6.

Line Breaks and Line Numbers
----------------------------

If you use WSearch to search standard PC (ASCII) text files with
typical line lengths, it will match text and count line numbers just
as you expect, and you can probably skip the remainder of this
section. However, if you are using WSearch to search non-ASCII files
(e.g.
word processor documents, or binary files such as .EXE or .DLL
files), files with very long lines (over 2048 characters), or files
transferred from other systems (e.g. Apple Macintosh), you may find
some of the technical details below useful.

WSearch finds and counts lines by looking for LF (line feed, or
ASCII 10) characters. As a result, WSearch's line numbering feature
(-n option) will only be useful in standard text files with an LF at
the end of each actual line. Most binary files (e.g. .EXE files)
contain essentially random line breaks, and line numbers usually
will not provide useful information in these files.

Non-ASCII word processor documents (e.g. Word for Windows .DOC
files) usually do contain LF characters, but they may not occur
where you expect, and "soft" line breaks (within a paragraph),
"soft" hyphens, and other formatting codes may break up text that
you think of as separated only by spaces or other standard
characters. To work around this you may need to search for short
text strings or use the "*" character to find the text you want. For
example, if you are searching a WordPerfect document for the string
"the quick brown fox" and you suspect there may be special
formatting characters inserted within the text, you might be more
likely to find what you want by searching for an uncommon word like
"fox", or for "the*quick*brown*fox".

In a file without any LF characters at all, WSearch will count each
32K-byte buffer it reads as a line. When viewing such files any line
numbers shown simply reflect which 32K portion of the file the text
was in. For this reason, WSearch will not count lines correctly in
files which use only CR (carriage return, ASCII 13) characters as
line terminators (many text files created on the Apple Macintosh use
only CR as a line terminator). WSearch only detects a line break
when it finds an LF, or a CRLF pair.
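
The effect on line numbers can be modeled with a small Python
sketch. It is a simplified model of the behavior described above,
not WSearch's actual code: the function name reported_line_number is
hypothetical, the whole file is assumed to be in memory, and the
long-line case is ignored.

```python
BUFFER_SIZE = 32 * 1024   # the 32K-byte read buffer described above


def reported_line_number(data, offset):
    """Approximate the line number shown for a match at byte offset."""
    if b"\n" not in data:
        # No LF anywhere: each 32K buffer is counted as one "line".
        return offset // BUFFER_SIZE + 1
    # Otherwise lines are delimited by LF only; a CR is not a break.
    return data.count(b"\n", 0, offset) + 1


pc_text = b"one\r\ntwo\r\nthree\r\n"     # CRLF line endings
mac_text = b"one\rtwo\rthree\r"          # CR-only line endings

print(reported_line_number(pc_text, pc_text.index(b"three")))    # 3
print(reported_line_number(mac_text, mac_text.index(b"three")))  # 1
```

In the CR-only file every match in the first 32K is reported on
line 1, which is why line numbers are not meaningful there.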

If you ask to see all matches in a file with non-standard line
breaks, WSearch will display at most one match for each LF character
found. In a file with no line breaks at all (including Macintosh and
other similar text files), WSearch will display at most one match
per buffer.

Due to performance and memory usage issues in the internal design of
the search process, WSearch may also -- very rarely -- fail to count
lines correctly, or fail to find certain matches, in files where the
line breaks are more than 2K (2048) characters apart. In particular,
when a line longer than 2K characters crosses a 32K buffer boundary,
with at least the first 2048 characters in one 32K portion of the
file, and the remainder of the line in the next 32K portion, WSearch
will count the line as two lines, not one, and will fail to match
text which happens to cross this artificial "line break". For
example, if you search for the string "hello" in a file with lines
over 2K in length, and "hel" is at the end of one 32K section of the
file and "lo" is at the start of the next 32K section, the match may
not be found. As you can see, this is not likely to occur very
often! And, as long as you use WSearch on files with lines shorter
than 2048 characters, this problem will not affect you.

Error Messages
--------------

The messages listed below are displayed by WSearch when it finds a
problem. Those beginning with "**" are displayed during the search
process, and the search continues with the next file (if any). All
other errors terminate the program.

** No match **: The text was not found in the file.

** Not found **: The file was not found.

** Unavailable **: The file could not be opened, probably because it
is in use by another program or because you do not have the correct
rights to view the file.

Expression too long: Your text or filename regular expression was
too long for WSearch's internal buffers.

File search failed: WSearch could not search the directory for the
file(s).
You may not have the correct rights to access the directory, the
directory structure may be damaged, or the operating system may be
having some other problem accessing the directory.

Invalid class: A class in a regular expression was invalid. For
example, you forgot to include the closing ], or you failed to
terminate a range (as in the pattern [a-]).

Invalid hex value: The value after a ^x did not contain a valid
hexadecimal digit.

Invalid switch: You used an invalid switch character on the WSearch
command line.

More than one expression specified: You used the -e switch to put
more than one expression on the command line.

No class open: WSearch found a closing ] in a text or filename
pattern but there was no opening [ before it.

Out of memory: WSearch could not allocate memory for one of its
internal buffers or storage areas.

Read error: WSearch encountered a data error while reading the file.

Seek error: WSearch encountered a data error or directory error
while accessing the file.

String too long: A string within your expression is too long. The
limit is 1024 characters.

Too many file specifications: You used too many file specifications.
The limit is 32 file specifications at any one time.