DGREP ----- Dgrep is a fast Unix egrep clone. Basically, it matches a regular expression to the lines from one or several input files and displays those lines that contain the regular expression. Dgrep understands the UNIX extended regular expressions (see usage in dgrep.c or dgrep -h). If no files are given, standard input is assumed. Although dgrep is egrep clone, there are some differences, e.g. currently not all option characters are same (but most of them are). Also few options are added, some of which are from the GNU grep (see usage in dgrep.c or dgrep -h for options). Return values: ------------- 0, if matches are found 1, if no matches are found 2, if syntax error or inaccessible files 3, if internal error Portability: ----------- Originally written for MS-DOS under Turbo C, but has been ported to MSC 5.[01], OS/2 and BSD UNIX 4.3 with gcc compiler. This program should be relatively easy to port to any environment with ANSI C compiler. Authors: ------- J.Ruuth wrote most of the code P.Soini wrote boyer-moore initialization and some other things Algorithms: ---------- Dgrep uses two algorithms to search expressions. Literal strings are searched with the Boyer-Moore-algorithm and regular expressions are searched with the deterministic finite state automata. To speed up the searches literal strings from the regular expression are also searched with the Boyer-Moore-algorithm. Dgrep uses a lazy evaluation technique for the deterministic finite state automata, so the state transitions are calculated only when they are actually needed. Algorithm to create DFA directly from the regular expression is described in the reference [1]. This implementation uses quite directly ideas and algorithms from that book. Also lazy evalution tecnique is mentioned there but no algorithm is given. So lazy evalution is based on my own ideas, which certainly need improvements. The fast version is modeled after the ideas from the reference [4]. Boyer-Moore-algorithm is from the James A. Wood's fastgrep implementation. It seems to be quite direct port from the original speedups suggested by Boyer and Moore in their original paper [7]. The initialization code is written by P.Soini, and it should guarantee that the execution time is linear even in the worst case. References: ---------- [1] A.V.Aho, R.Sethi, J.D.Ullman: 'Compilers, Principles, Tecniques and Tools', Addison-Wesley 1986 [2] A.V.Aho, M.J.Corasick: 'Fast pattern matching: An aid to bibliographic search', Comm. ACM, June 1975 [3] GNU grep, version 0.999b, June 1988 [4] A.Hume: 'A Tale of Two Greps', Software - Practise and Experience, November 1988 [5] J.A.Woods: fastgrep implementation and discussion in the net, 1986 Boyer-Moore-algorithm: [6] A.Apostolico and R.Giancarlo: 'The Boyer-Boore-Galil string searching strategies revisited', SIAM J. Computing, February 1986 [7] R.S.Boyer and J.S.Moore: 'A Fast string searching algorithm', Comm. ACM, October 1977 [8] R.N.Horspool: 'Practical fast searching in strings', Software - Practise and Experience, June 1980 [9] D.E.Knuth, J.H.Morris and V.R.Pratt: 'Fast pattern matching in strings', SIAM J. Computing, June 1977 Version history: --------------- Changed to use new deterministic regular expression algorithm. J.Ruuth 15-Mar-1988 (v.1.10) More options. J.Ruuth 25-Mar-1988 (v.1.11) Changed to dynamically use maximum I/O buffer size that does not cause pointer wraparound with boyer-moore algorithm. It is meaningful only under 80(1?8[68]|286)|V[234]0 processors and 80[34]86 under 16-bit OS (OS/2|MS-DOS). Implemented for Turbo C and MSC 5.1 and Quick C 1.01. For 32-bit environments you can adjust I/O buffer size by simply changing the constant MAXBUF in file system.h and recompiling. P.Soini 03-Apr-1988 (v.1.20) O(n+m)-time regular expression searcher (old was O(nm)-time) and some changes from GNU grep. J.Ruuth 21-Nov-1988 (v.1.30) Guaranteed worst case performance reduced to O(m+n)-time after modifications to the boyer-moore algorithm supplied by Petri Soini. J.Ruuth 06-Dec-1988 (v.1.40) Added touch-option. J.Ruuth 17-Jan-1989 (v.1.44) Added options to print leading and trailing context of the match. I tought it was quite a unique idea, but GNU grep already does it, so option craracters are borrowed from it. J.Ruuth 17-Mar-1989 (v.1.50) Fixed a bug in read_exp(). J.Ruuth 08-Jun-1989 Fixed a bug when buffer didn't end with an end of line character, and changed interface slightly to the reg_comp. J.Ruuth 17-Aug-1989 (v.1.60) Changed usege to display different things at different situations. J.Ruuth 12-Sep-1989 (v.1.61) Fixes for BSD's tolower() and toupper() odd behaviour. J.Ruuth 21-Oct-1989 (v.1.62) Added possibility to make faster reg_exec using more memory. J.Ruuth 14-Jan-1990 (v.1.70) Added input buffer alignment to physical disk blocks. J.Ruuth 16-Mar-1990 (v.1.71)