This is the summary of chapter 11 'RPL READER and Parser Tools' from the HP Kernel ERS, August 7, 1986; it's a transcription with minor modifications where neccassary. Thanks to HP Cv. for giving the permission to publish BNF. ----------------------------------------------------------------------------- 11.3 Parser Tools and the BNF parser The parser tools are modelled after the PRISM system MINI utility which is a parser generator (pg) program. The definition of a parser, for our purposes, is a function which takes arguments in the form ob1 .. obn n toktyptab string offset token and returns results in the form ob1' .. obn' n' toktypetab' string offset' token' flag TRUE or ob1 .. obn n toktyptab string offset token FALSE The interpretation of these pieces conforms to the intuitive notion of a parser. ob1 .. obn are objects previously parsed, and n is a binary int representing the number of objects. Toktypetab is a token-type table (see below). String is the string being parsed, token is also a string, but represents the piece of the string currently being considered. Offset is a binary int offset into (or beyond) the string and points to the 1st char not accounted for in the token. The flags returned have the following interpretation. If FALSE is returned, then parsing was never started. If TRUE TRUE is returned, then parsing was started and successfully completed. If FALSE TRUE is returned, then parsing was started, but not completed (usally an error condition). The pg 'BNF' (see below) produces a parser from its Backus-Naur-Form description given as a string. The BNF consists of a sequence of clauses seperated by '/'s. Each clause is a sequence of parsers. The evaluation of a BNF of the form A B C ... 1st evaluates A, and if A returns FALSE, then B C ... is ignored and FALSE is returned. If A returns FALSE TRUE, then B C ... is again ignored and FALSE TRUE is returned. If A returns TRUE TRUE then B C ... is evaluated sequentially until either a FALSE or a FALSE TRUE is returned, or the sequence completes. In the 1st case FALSE TRUE is returned, in the 2nd case TRUE TRUE is returned. A BNF of the form A B / C D is evaluated by evaluating A B as above, and if the result is FALSE, then evaluate C D as above and return whatever is returned by this evaluation. On the other hand, if the evaluation of A B returns TRUE TRUE or FALSE TRUE, then C D is not evaluated, and the result of A B is returned. As an example, consider the parser for a list of names: list: leftparen terminator terminator: rightparen / name terminator / list terminator From this description, you can see that (), ( ABC ), ( ABC ( CDE ) ), etc. are all lists. The pg is invoked by the word 'BNF' and expect a string in level 1 which contains the BNF description of a parser. The string starts with the token "BNF" and is ended by the token "ENDBNF". Between these two, and in addition to the clause seperator "/", the following are allowed: - an name of a variable in the current directory, assumed to be a parser. - the token " followed by another token. - the token CK" followed by another token. - the token // used in the same way as / The parser corresponding to " creates a parser using the token that follows. The parser created doesn't start unless the current token corresponds to that parser to the creator. If a match is made, then the created parser returns TRUE TRUE and drops the current token and gets the next one. The parser corresponding to CK" is nearly identical to the above, exept that it leaves the current token right where it is. The // is used as a clause seperator with the same meaning as /, except that it modifies the manner in which the last object in the terminated clause is evaluated. It's usefull for tail-recursive calls to a parser which allways completes or fails, that is never returns FALSE. If such a parser is the last object in a clause, then it can be evaluated with a COLA. The token // indicates such a situation to the BNF parser. 11.4 Extensible Parsers [Some text I didn't understand about !*XTNDPA and LAM !*FAILTOKENS, a extensible parser sheme, build-in into the RPL development system...] ... I suggest that a bottom-level RAM- or ROM-WORD which designates a parser should be named !*PA [I've strip the dammed !*]. To further refine the conventions for parser, we need to pick out two fundamental kinds of parsers. Although many parsers will not fall strictly in one category or the other, most, if not all, parsers can be decomposed, BNF-style, into parsers of these two fundamental kinds. The two kinds of parsers may be called production parsers and reduction parsers. They are distinguished by their action on tokens and currently parsed objects. A production parser ignores already parsed objects and only operates on the string and token to produce a sequence of objects. In stack notation: ob1 .. obn n toktyptab string offset token --> ob1 .. obn n ob1' .. obn' n' toktypetab' string offset' token' flag TRUE or ob1 .. obn n toktyptab string offset token FALSE In particular, a production parser returns a single sequence of objects and the number of objects, if it successfully completes parsing, and leaves the objects already on the stack unchanged. In contrast, a reduction parser doesn't alter the current offset or token and instead takes the sequence or sequences of objects on the stack and rearranges and/or combines them into new objects or sequences. Most reduction parsers will take a fixed number of sequences of objects, with restrictions on the number of objects in each sequence and return a new sequence. In stack notation: n1 .. nn toktypetab string offset token --> ' n1' .. ' nn' toktypetab string offset token flag TRUE A further convention on parser will be that any parser should be, in its net effect, a production parser. That is to say that even though a parser may be composed of sub-parsers which may be either production or reduction parser, the net effect of the parser, when successfully completed, should be that of a production parser. 11.4.1 An Example As an example of the use of the BNF pg, the generator is given in terms of itself (and a few other pieces). In the example, named subparsers which are production parsers will begin with a capital letter. [I used this example to create the *WORKING* (I wonder, if HP have ever check the functionality of their examples ;-) BNF pg in the directory BNFPG. I also bring the example to work, you can find it in the subdirectory BNFXMPL (to run it, 1st run the BNFPG-SETUP and install the resulting library).] The parser BNFPA produces a parser function from a clause or sequence of clauses seperated by '/'s or '//'s. Its operation is quite simple, but has several features which recognize special situations to aid the effeciency of the resulting function [ROTFLL :-]. In the general case, a clause of the form A B .. C / is transformed into ID A !*trior :: ID B !*triand .. ID C !*triand TrueTrue ; and a clause of the form A B .. C // is transformed into ID A !*trior :: ID B !*triand .. COLA ID C ; A sequence of clauses is tramsformed into the concatenation of the forms shown above and, when the ENDBNF is encoutered, a FALSE is appended to the form and the entire form is collected into a secondary which is then the parser function. [you will find an explaination for !*trior and !*triand along with some other entries below.] The two special cases arise when there's only one object in a clause. Normally, a clause of the form A / would produce ID A !*trior :: TrueTrue ; This can be replaced by ID A !*trior TrueTrue The 2nd case arises when the single-object clause is the final clause in the BNF description. Something of the form BNF ... A ENDBNF would produce :: ... A !*trior TrueTrue FALSE ; Since A is (assumed to be) a parser, when it's evaluated it will return the tristate-flag on top of the stack. The net result is exactly the same as if A alone were evaluated. For this reason, BNF will produce :: ... COLA A ; [As an specific example of the output of the BNF parser, run the BNFPXMPL- SETUP, store the resulting directory and then decompile e.g. Bnfob using RPL->.] 11.5 Tokens and Token-Type Tables [They are telling us what a token is - see the description of the built-in GetNextToken below] 11.6 BNF Description of the READER [Description of a program in their RPL development system, similar to ->RPL. Long and unusefull.] 11.7 Provided Objects GetNextToken ( hxs $ # --> hxs $ #' $' ) where hxs is a token-type table, $ is a string and # is an offset (in chars) into the string. Returns the ttt, the string, the offset to the first char not used in the token, and the token. The ttt is a a hxs containing 256 elements, one for each char. The table gives an implicit correspondence between ASCII characters and their TYPE, which is one of 16 values: 0 - neutral character 1 - normal character 2 - digit 3 - left delimiter 4 - right delimiter 5 - self delimiter 6 - escape character 7 - diphthong start [Not described in the ERS, but built-in in the HP48 GetNextToken: ] 8 - ? 9 - ? 10 - ? 11 - ? 12 - ? 13 - comment toggle 14 - comment off 15 - ? GetNextToken scans the string beginning at the offset until it finds the 1st non-neutral character (or the end of the string). If this char is a left-delimiter, a right-delimiter or a self-delimiter, then this char is returned as the token and the new offset points just beyond it. If the 1st non-neutral char is a dipthong-start, then this char and the following char are returned as the token. If none of the above have occured, then the type of the token is determined from the type of the 1st char, with the exception that a char of type 6 forces itself and the char following it to be interpreted as type 1. Then chars from the string are accumulated into the token until a type change occurs, or the offset gos beyond the end of the string. The accumulated token (with chars of a uniform type) is returned and the offset returned points beyond the last char which was added to the token. Note that since one of the arguments for a parser is a token, each parser that successsfully completes parsing must provide the token for the next parser in line for evaluation. In particular, the ttt for entry into a given parser is determined by the previous successfully completed parser. [Comments: Digits are treated as ordinary chars if they follow an ordinary char. Chars between two comment toggles behave as neutral chars.] !*Tok [PTR BC15] ( hxs $ # $' $'' --> hxs $ # $' FALSE ) ( hxs $ # $' $'' --> hxs $''' #' $'''' TRUE TRUE ) If the string on the top of the stack matches the string just beneath it, the top two strings are dropped, GetNextToken is evaluated and TRUE TRUE is subsequently returnde. Otherwise, the top string is dropped and FALSE is returned. !*tokck [PTR BC47] ( hxs $ # $' $'' --> hxs $ # $' FALSE ) ( hxs $ # $' $'' --> hxs $ # $' TRUE TRUE ) If the top two strings match, the top string is dropped and TRUE TRUE is returned, otherwise the top string is dropped and FALSE is returned. !*trior ( FALSE --> ? ) ( FALSE TRUE --> FALSE TRUE ) ( TRUE TRUE --> ? ) In the 1st case, FALSE and the 1st object in the top body of the runstream are dropped, and execution resumes at the (former) second object. In the 2nd case, the entire top body in the runstream is dropped. In the 3rd case, TRUE TRUE and the top body in the runstream are dropped, and the (former) 1st object in the runstream is evaluated. !*triand ( FALSE --> FALSE TRUE ) ( FALSE TRUE --> FALSE TRUE ) ( TRUE TRUE --> ? ) In the 1st case, TRUE is pushed on the stack; in either the 1st or the 2nd case, the top body in the runstream is dropped. In the 3rd case, the flags are dropped, and evaluation continues normally. TrueTrue ( --> TRUE TRUE ) failed ( --> FALSE TRUE ) ----------------------------------------------------------------------------- The BNF parser generator: In the BNF.DIR directory you'll find the following variables: BNFXMPL - BNF example: the BNF pg in terms of itself SETUP - Run run this to create a BNF pg library (id 800) in stack level 1. The <-RPL-> and <-LIB-> libraries must be installed for it to work. All following variables are source strings for ->RPL: BED - 'BNF EDitor', nice name .. ;-) BNF - BNF pg user interface, feed this program with a BNF def. BNFPA - BNF pg; named !*BNFPA in the ERS Bnfterm - term parser Bnfcls - clause parser Bnfob - object parser Maketoken - " pg Makeck - CK" pg CompileID - object pg fincls - normal clause end colacls - COLA clause end inscola - append COLA to meta object &ob - append two meta objects &nostart - append 'failed' to meta object bnfsav - save BNF pg context bnfrst - restore BNF pg context TTT - token-type table In the BNFXMPL directory you'll find the following variables (source strings, ready for compiling/BNF pg'ing) : SETUP - Run it to genarate a directory in stack level 1. The <-RPL->, <-LIB-> and BNF libraries must be installed for it to work. bnf - Same as BNF above. Change name to lower case, because of the conflict with the lib-word BNF in the BNF lib. BNFPA - BNF pg in BNF notation Bnfterm - term parser in BNF notation Bnfcls - clause parser in BNF notation Bnfob - object parser in BNF notation Maketoken - " pg Makeck - CK" pg CompileID - object pg startbnf - start BNF pg finbnf - close BNF pg startcls - start clause fincls - normal clause end colacls - COLA clause end contcls - next clause inscola - append COLA to meta object &ob - append two meta objects &nostart - append 'ID failed' to meta object bnfsav - save BNF pg context bnfrst - restore BNF pg context failed - FALSE TRUE TTT - Token-type table