GREP: A UNIX-like text-search utility; better than "FIND" in MS-DOS.

┌─────────────────────────────────────────┐
│ Mini-instructions: Type "GREP ?" at the │
│ DOS prompt for the grep help screen.    │
└─────────────────────────────────────────┘

Used by the Goodies Disk FETCH batch file. The following
documentation is included for those who wish to use GREP by itself
for their own purposes. I suggest putting it in your DOS directory
and using it as an "upgrade" for DOS's "FIND" command. -jkh-

Portions of this documentation are copyright (c) 1990 Borland
International. All rights reserved. Used with permission.

┌──────────┐
│ OVERVIEW │
└──────────┘

GREP (Global Regular Expression Print) is a powerful text-search
program derived from the UNIX utility of the same name. GREP searches
for a text pattern in one or more files or in its standard input
stream.

Here's a quick example of a situation where you might want to use
GREP. Suppose you wanted to find out which text files in your current
directory contained the string "Elisabeth." You would issue the
command

     grep Elisabeth *.txt

and GREP would respond with a list of the lines in each file (if any)
that contained the string "Elisabeth." (4DOS users can pipe the
output to LIST/S for convenient perusal and/or printing.)

Note: the strings "elisabeth" and "ELISABETH" would *not* be
considered matches. If you *do* want them to be matches, use

     grep -i Elisabeth *.txt

and the "-i" option (which means "-ignore case") would force grep to
treat uppercase and lowercase the same. More about this and other
options later.

GREP can do a lot more than match a single, fixed string. In the
section that follows, you'll see how to make GREP search for any
string that matches a particular pattern.

┌─────────────────────┐
│ Command-line syntax │
└─────────────────────┘

The general command-line syntax for GREP is

     grep [options] searchstring [filespec ... ]

options       consist of one or more letters, preceded by a hyphen
              (-), that let you change various aspects of GREP's
              behavior.

searchstring  gives the pattern to search for.

filespec      (a list of file specifications) tells GREP which files
              to search. (If no file is specified, GREP searches its
              standard input; this lets you use GREP with pipes and
              redirection.)

In addition, the command

     GREP ?

prints a brief help screen showing GREP's command-line options,
special characters, and defaults. (See the description of the -u
command-line option for information on how to change GREP's
defaults.)

┌──────────────┐
│ GREP options │
└──────────────┘

In the command line, options are one or more single characters
preceded by a hyphen (-). Each individual character is a switch that
you can turn on or off: A plus symbol (+) after a character turns the
option on; a hyphen (-) after the character turns the option off. The
default is on; for example, -r means the same thing as -r+. You can
list multiple options individually (like this: -i -d -l), or you can
combine them (like this: -ild or -il, -d, and so on); it's all the
same to GREP.

Here are the GREP option characters and their meanings:

  Option  Meaning
  ______  ____________________________________________________

    -c    Count only: Prints only a count of matching lines. For
          each file that contains at least one matching line, GREP
          prints the file name and a count of the number of
          matching lines. Matching lines are not printed.

    -d    Directories: For each filespec specified on the command
          line, GREP searches for all files that match the file
          specification, both in the directory specified and in all
          subdirectories below the specified directory. If you give
          a filespec without a path, GREP assumes the files are in
          the current directory.

    -i    Ignore case: GREP ignores upper/lowercase differences
          (case folding). GREP treats all letters a to z as
          identical to the corresponding letters A to Z in all
          situations.

    -l    List match files: Prints only the name of each file
          containing a match. After GREP finds a match, it prints
          the file name and processing immediately moves on to the
          next file.

    -n    Numbers: Each matching line that GREP prints is preceded
          by its line number.

    -o    UNIX output format: Changes the output format of matching
          lines to support more easily the UNIX style of
          command-line piping. All lines of output are preceded by
          the name of the file that contained the matching line.

    -r    Regular expression search: The text defined by
          searchstring is treated as a regular expression instead
          of as a literal string. This option is on by default.
          [Note to non-UNIX people: don't panic; "regular
          expression" is defined below. -jkh-]

    -u    Update options: GREP will combine the options given on
          the command line with its default options and write these
          to the GREP.COM file as the new defaults. (In other words,
          GREP is self-configuring.) This option allows you to
          tailor the default option settings to your own taste. If
          you want to see what the defaults are in a particular
          copy of GREP.COM, type GREP ? at the DOS prompt. Each
          option on the help screen will be followed by a + or a -
          depending on its default setting. [See note below about
          using the -u option on a compressed GREP.COM. -jkh-]

    -v    Nonmatch: Prints only nonmatching lines. Only lines that
          do not contain the search string are considered to be
          nonmatching lines.

    -w    Word search: Text found that matches the regular
          expression is considered a match only if the character
          immediately preceding and following cannot be part of a
          word. The default word character set includes A to Z, 0
          to 9, and the underscore ( _ ). An alternate form of this
          option lets you specify the set of legal word characters.
          Its form is -w[set], where set is any valid regular
          expression set definition (see below).
          If you define the set with alphabetic characters, it is
          automatically defined to contain both the uppercase and
          lowercase values for each letter in the set (regardless
          of how it is typed), even if the search is case-sensitive.
          If you use the -w option in combination with the -u
          option, the new set of legal characters is saved as the
          default set.

    -z    Verbose: GREP prints the file name of every file searched.
          Each matching line is preceded by its line number. A count
          of matching lines in each file is given, even if the count
          is zero.

┌─────────────────────┐
│ Order of precedence │
└─────────────────────┘

Remember that each of GREP's options is a switch: Its state reflects
the way you last set it. At any given time, each option can only be
on or off. Each occurrence of a given option on the command line
overrides its previous definition. Given this command line,

     grep -r -i- -d -i -r- main( my*.c

GREP runs with the -d option on, the -i option on, and the -r option
off. The initial "-r" is lost due to the subsequent "-r-", and so
forth.

You can install your preferred default setting for each option in
GREP.COM with the -u option. For example, if you want GREP to default
to a verbose search (-z on), you can install it with the following
command:

     grep -u -z

Use GREP ? to check the currently set defaults.

[Note well: The -u option actually modifies the GREP.COM file itself.
Therefore, do NOT use the -u option if you've compressed GREP.COM
with LZEXE or any other "program packer". If you have compressed
GREP, and wish to customize it with the -u option, you must first
unpack GREP to its original form (or recopy it from the Goodies
Disk), use the -u option, and then compress it again. -jkh-]

┌───────────────────┐
│ The search string │
└───────────────────┘

To use GREP well, you'll need to become proficient at writing search
strings. The value of searchstring defines the pattern GREP searches
for. A search string can be either a regular expression or a literal
string.
In a regular expression, certain characters have special meanings:
They are operators that govern the search. In a literal string, there
are no operators: Each character is treated literally.

You can enclose the search string in quotation marks to prevent
spaces and tabs from being treated as delimiters. The text matched by
the search string cannot cross line boundaries; that is, all the text
necessary to match the pattern must be on a single line.

A regular expression is either a single character or a set of
characters enclosed in brackets. A concatenation of regular
expressions is a regular expression.

┌──────────────────────────────────┐
│ Operators in regular expressions │
└──────────────────────────────────┘

When you use the -r option (on by default), the search string is
treated as a regular expression (not a literal expression). The
following characters take on special meanings:

  Operator  Meaning
  ________  ____________________________________________________

    ^       A circumflex at the start of the expression matches the
            start of a line. E.g. "^hyde" only finds those lines
            that *begin* with "hyde".

    $       A dollar sign at the end of the expression matches the
            end of a line. E.g. "?$" only finds those lines that
            *end* with a question mark.

    .       A period matches any character (single wildcard).

    *       An expression followed by an asterisk wildcard matches
            ZERO OR MORE occurrences of that expression. For
            example, in "to*", the * operates on the expression o;
            it matches t, to, too, etc. (t followed by zero or more
            os), but doesn't match ta.

    +       An expression followed by a plus sign matches ONE OR
            MORE occurrences of that expression: to+ matches to,
            too, etc., but not t.

    [ ]     A string enclosed in brackets matches any character in
            that string, but no others. If the first character in
            the brackets is a circumflex (^), the expression matches
            any character EXCEPT the characters in the string. For
            example, [xyz] matches x, y, or z, while [^xyz] matches
            everything but x, y, and z.
            You can specify a range of characters with two
            characters separated by a hyphen (-). These can be
            combined to form expressions (like [a-bd-z?], which
            matches the ? character and any lowercase letter except
            c).

    \       The backslash escape character tells GREP to search for
            the literal character that follows it. For example, "\."
            matches a period instead of "any character." The
            backslash can be used to quote itself; that is, you can
            use \\ to indicate a literal backslash character in a
            GREP expression.

Note: Four of the "special" characters ($, ., *, and +) don't have
any special meaning when used within a bracketed set. In addition,
the character ^ is only treated specially if it immediately follows
the beginning of the set definition (immediately after the "["
delimiter).

Any ordinary character not mentioned in the preceding list matches
that character (> matches >, # matches #, and so on).

┌─────────────────────┐
│ File specifications │
└─────────────────────┘

filespec tells GREP which files (or groups of files) to search.
filespec can be an explicit file name, or a "generic" file name
incorporating the DOS ? and * wildcards. In addition, you can enter a
path (drive and directory information) as part of filespec. If you
give filespec without a path, GREP searches the current directory.

If you don't specify any file specifications, input to GREP must come
from redirection (<) or a pipe (|).

┌────────────────────┐
│ Some GREP examples │
└────────────────────┘

The following examples show how to combine GREP's features to do
different kinds of searches. They assume GREP's default settings are
unchanged.

┌───────────┐
│ Example 1 │
└───────────┘

The search string here tells GREP to search for the word "main" with
no preceding lowercase letters ([^a-z]), followed by zero or more
occurrences of blank spaces (\ *), then a left parenthesis.
Since spaces and tabs are normally considered to be command-line
delimiters, you must quote them if you want to include them as part
of a regular expression. In this case, the space after "main" is
quoted with the backslash escape character. You could also accomplish
this by placing the space in double quotes.

     Command line:
          grep -r [^a-z]main\ *( *.c

     Matches:
          main(i:integer)
          main(i,j:integer)
          if (main ()) halt;

     Does not match:
          mymain()
          MAIN(i:integer);

     Files searched:
          *.C in current directory.

┌───────────┐
│ Example 2 │
└───────────┘

Because the backslash (\) and period (.) characters have special
meaning in GREP regular expressions, you must place the backslash
escape character immediately in front of them if you want to search
for them literally, as you would in a path or file name. The -i
option is used here, so the search is not case sensitive.

     Command line:
          grep -ri [a-c]:\\data\.fil *.c *.inc

     Matches:
          A:\data.fil
          c:\Data.Fil
          B:\DATA.FIL

     Does not match:
          d:\data.fil
          a:data.fil

     Files searched:
          *.C and *.INC in current directory.

┌───────────┐
│ Example 3 │
└───────────┘

This format basically defines how to search for a given word.

     Command line:
          grep -ri [^a-z]word[^a-z] *.doc

     Matches:
          every new word must be on a new line.
          MY WORD!
          word--smallest unit of speech.
          In the beginning there was the WORD, and the WORD

     Does not match:
          Each file has at least 2000 words.
          He misspells toward as toword.

     Files searched:
          *.DOC in the current directory.

┌───────────┐
│ Example 4 │
└───────────┘

This format defines a basic "word" search.

     Command line:
          grep -iw word *.doc

     Matches:
          every new word must be on a new line.
          However, MY WORD!
          word: smallest unit of speech which conveys
          In the beginning there was the WORD, and

     Does not match:
          each document contains at least 2000 words!
          He seems to continually misspell "toward" as "toword."

     Files searched:
          *.DOC in the current directory.

┌───────────┐
│ Example 5 │
└───────────┘

This is an example of how to search for a string with embedded
spaces.

     Command line:
          grep "search string with spaces" *.doc *.c a:\work\myfile.*

     Matches:
          This is a search string with spaces in it.

     Does not match:
          THIS IS A SEARCH STRING WITH SPACES IN IT.
          This search string has spaces in it, too.

     Files searched:
          *.DOC and *.C in the current directory, and MYFILE.* in a
          directory called \WORK on drive A.

┌───────────┐
│ Example 6 │
└───────────┘

This example searches for any one of the characters " . : ? ' and ,
(or a space) at the end of a line. The double quote within the
bracketed set is preceded by an escape character, so it is treated as
a normal character instead of as the ending quote for the string.
Also, the $ character appears outside of the quoted string. This
demonstrates how regular expressions can be concatenated to form a
longer expression.

     Command line:
          grep -rd "[ ,.:?'\"]"$ \*.doc

     Matches:
          He said hi to me.
          Where are you going?
          In anticipation of a unique situation,
          Examples include the following:
          "Many men smoke, but fu man chu."

     Does not match:
          He said "Hi" to me
          Where are you going? I'm headed to the

     Files searched:
          *.DOC in the root directory and all its subdirectories on
          the current drive.

┌───────────┐
│ Example 7 │
└───────────┘

This example ignores case and just prints the names of any files that
contain at least one match. The three command-line examples show
different ways of specifying multiple options.

     Command line:
          grep -ild " the " \*.doc
       or
          grep -i -l -d " the " \*.doc
       or
          grep -il -d " the " \*.doc

     Matches:
          Anyway, this is the time we have
          do you think? The main reason we are

     Does not match:
          He said "Hi" to me just when I
          Where are you going? I'll bet you're headed

     Files searched:
          *.DOC in the root directory and all its subdirectories on
          the current drive.

┌───────────┐
│ Example 8 │
└───────────┘

This example redefines the current set of legal characters for a
word as the assignment operator (=) only, then does a word search.
It matches C assignment statements, which use a single equal sign
(=), but not equality tests, which use a double equal sign (==).

     Command line:
          grep -w[=] = *.c

     Matches:
          i = 5;
          j=5;
          i += j;

     Does not match:
          if (i == t) j++;
          /* ======================= */

     Files searched:
          *.C in the current directory.
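
To experiment with these patterns outside of DOS, here is a rough
sketch that approximates GREP's documented pattern rules with Python's
re module. It is not Borland's code, and the names borland_to_python
and grep_lines are hypothetical. The translation follows the rules
given above: ^ and $ are anchors only at the very start and end of the
expression, . * + [ ] and \ are the only other operators, and every
other character (including ( ? | which are special to Python) is
matched literally. The -w emulation uses lookbehind/lookahead, which
is an assumption about how GREP checks the characters around a match.
The sketch does not model the automatic upper/lowercase expansion of
-w[set], and a strict [^a-z] needs an actual character before the
word, so lines that begin with "main(" or "word" (Examples 1 and 3)
are not matched here.

```python
import re


def borland_to_python(pattern: str) -> str:
    """Approximate translation of a GREP-style pattern into Python 're' syntax."""
    out = []
    in_set = False
    set_start = -1
    n = len(pattern)
    i = 0
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            # Backslash escape: the following character is always literal.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if in_set:
            if c == "]":
                out.append("]")
                in_set = False
            elif c == "^" and i == set_start:
                out.append("^")            # negation only right after "["
            elif c == "-":
                out.append("-")            # character range, e.g. a-z
            else:
                out.append(re.escape(c))   # $ . * + are literal inside a set
        elif c == "[":
            out.append("[")
            in_set = True
            set_start = i + 1
        elif c == "^" and i == 0:
            out.append("^")                # start-of-line anchor
        elif c == "$" and i == n - 1:
            out.append("$")                # end-of-line anchor
        elif c in ".*+":
            out.append(c)                  # wildcard and repetition operators
        else:
            out.append(re.escape(c))       # ordinary character, e.g. ( ? | # >
        i += 1
    return "".join(out)


def grep_lines(pattern, lines, *, regex=True, ignore_case=False,
               word=False, word_chars="A-Za-z0-9_", invert=False):
    """Return the lines GREP would print, roughly emulating -r, -i, -w and -v."""
    # -r on: GREP-style regular expression; -r- (regex=False): literal string.
    body = borland_to_python(pattern) if regex else re.escape(pattern)
    if word:
        # -w: the characters just before and after the match must not be
        # word characters (start or end of line also counts as a boundary).
        wc = "[" + word_chars + "]"
        body = f"(?<!{wc})(?:{body})(?!{wc})"
    rx = re.compile(body, re.IGNORECASE if ignore_case else 0)  # -i
    # -v keeps the lines that do NOT match; otherwise keep the matches.
    return [line for line in lines if (rx.search(line) is None) == invert]


if __name__ == "__main__":
    c_lines = ["i = 5;", "j=5;", "i += j;", "if (i == t) j++;",
               "/* ======================= */"]
    # Example 8: grep -w[=] = *.c
    print(grep_lines("=", c_lines, word=True, word_chars="="))
```

Run as a script, it applies Example 8's word search to that example's
sample lines and prints the three assignment statements.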