
   Occam's Razor: The Cutting Edge for Parser Technology
   
   Jonathan P. Bowen ? Peter T. Breuer y
   
   June 1992
   
   Abstract
   
   Yacc is well established in the compiler-compiler field, but is
   beginning to show its age. Issues which were important when hardware
   resources were more scarce are now less critical. Precc is a new
   compiler-compiler tool that is much more versatile than yacc, whilst
   retaining efficiency of operation on modern computers. It copes with
   the context-dependent BNF grammar descriptions and higher order meta
   constructions that are naturally encountered in semi-formal concrete
   syntax specifications, building fast and efficient infinite-lookahead
   tools in the form of ANSI-compliant C code. This paper provides a
   demonstration of this state-of-the-art compiler-compiler technology
   using the programming language occam as an example. The parsing of
   occam is particularly difficult compared to some programming languages
   since the indentation is an integral part of the language. However the
   precc tool allows a natural implementation of an occam parser that
   follows the syntax very closely.
   
   Area: Tools.
   
   William Occam c. 1280{1349
   
   Entia non sunt multiplicanda praeter necessitatatem.
   
   No more things should be presumed to exist than are absolutely
   necessary.
   
   `Occam's Razor'. Ancient philosophical principle, often attributed to
   Occam, but used by many early thinkers [12].
   

   1 Existing technology
   
   The ubiquitous yacc and lex tools [10] have dominated
   compiler-compiler technology for the past decade or more. Yacc uses
   LALR(1) parsing, which has been understood and promulgated by standard
   textbooks for some time (e.g., [1]), primarily because it constructs a
   finite state automaton, and thus the run-time loading can be delimited
   precisely. This used to be of great importance when computers
   generally were limited in memory, but nowadays it is much less
   important, and more negative aspects of the yacc technology have
   started to become increasingly significant instead.
   
   Firstly, both lex and yacc-generated C code is monolithic, and a large
   language description, like that for COBOL, with over two hundred
   keywords, generates such a large virtual automaton that swapping
   problems are common at runtime even on modern workstations. This can
   result in a poor perceived performance for the application to which
   the parser is the front end, no matter how good the application code
   itself may be.
   
   Moreover, the C compiler itself often has problems with the generated
   code. The large lexer and parser routines generated by lex and yacc
   make for slow compilation and therefore a long turn-round time when it
   comes to debugging or altering a specification. Everything has to be
   recompiled when just a single change is needed, and this can be a
   source of intense frustration for the software engineer seeking to use
   a specification-driven utility in order to cut development or
   maintenance time. On the unloaded HP 9000 series workstation at which
   this sentence is being typed, for example, lex takes 35 s to compile a
   very abbreviated 163-line lexical specification for COBOL 74 into
   2,700 lines of C code. A more complete 300-line specification takes
   two minutes to generate 5,000 lines of C code. The full specification
   for the language about twice this size. The C compiler took 40 s to
   compile the small code and 73 s to compile the larger code, giving a
   110Kbyte object module, and the figures grow super-linearly with size.
   So a wait of five to ten minutes between making a change to the lexer
   and testing it can be expected, and there is no real possibility of
   using a debugging aid to help the diagnosis and testing because the
   generated code changes each time, and is famously opaque anyway. The
   statistics for yacc are worse because the scripts are even bigger.
   
   In fact, in terms of the specification alone, yacc does provide a
   fairly flexible way of handling language grammars, with at least a
   reasonable degree of efficiency. However it is not ideal for, and
   indeed cannot cater for, all languages. This is because it allows for
   only a single-token lookahead at runtime in order to decide between
   alternatives. In a sense, therefore, its runtime semantics are
   ill-matched with its specification language, because the BNF (which is
   what it uses) has no such restriction commonly associated with it.
   Moreover, it implements only a simple subset of the standard BNF
   description formats, so all BNF specifications have to be unfolded out
   into their basic components before they can be presented as a yacc
   script. This results in obscure specifications which are difficult to
   maintain, and is at least partly responsible for the excessive size of
   the virtual automaton, since much repetition is necessary.
   
   2 Introducing the precc tool
   
   Precc [4] offers a very flexible tool, providing infinite lookahead
   and handling two-level or van Wijngaarden grammars [5, 7, 15] with
   ease. These grammars are very powerful and include include
   meta-variables and hyper-rules for the generation of infinite numbers
   of productions. The concept of a `comma separated list of x', for
   example, can be defined in precc, as
   

   
   (x followed by arbitrarily many comma-then-x pairs) and different x's
   can be supplied as parameters in different parts of the specification,
   but for yacc, the construction must be expanded out in terms of the
   basic constructs at every point in the specification where it is
   needed, substituting the required name for x, and using only the
   alternation and sequential constructors:
   
 
   
   This yacc specification may also generate a `shift/reduce' conflict
   report at compile time, if any other production rule contains phooey
   ',' because yacc does not know whether to begin looking for an x to
   follow the comma (shift for more phooey) or to jump with what it
   already has into the other production (reduce). Shift/reduce reports
   are extremely confusing to software engineers because
   
   1. they refer to the virtual automaton, not the grammar specification
   script itself, and therefore rely on an understanding of yacc
   semantics which the specification language interface ought to
   encourage users to forget, and
   
   2. the report is due to the existence of a second production rule, and
   therefore is generated by the context, not, in the example above, the
   phooey rule itself.
   
   The parts of a yacc script may be separately correct, and still fail
   when combined together. In contrast, precc specifications are
   declarative and referentially transparent: it is possible to
   substitute any precc parser name by its definition anywhere in a
   script, and obtain the same semantics. The semantics of each parser
   definition are independent of their context and a precc parser
   definition can be included in any script, provided it is
   self-consistent. What happens to the shift/reduce report? Precc always
   shifts (looks for more data), and later backtracks if necessary.
   Yacc's error reports are a function of yacc's one-token lookahead
   implementation of the BNF, and since precc has no such limitation,
   there are no conflicts between specification language and
   implementation semantics to be reported.
   
   One problem often associated with tools which promise to provide extra
   flexibility is that the designers fail to ensure that the tool is
   still efficient enough to be useful in practice. For example, the OBJ3
   equational logic tool [6] provides parsing of arbitrary mixfix
   notation (with infinite lookahead). Whilst this provides great
   flexibility, it becomes rather slow for the parsing of large input
   programs; so much so that it becomes worthwhile to have a separate
   more efficient parser for more easily parsable languages. But precc,
   like yacc, maintains efficiency by translating the parser description
   into C code. Moreover, it makes use of the premise that modern C
   compilers make function calls fast and efficient in order to make the
   C call stack do much of its work. The precc design assumes that
   function calls are efficient in C, and that many small compact
   functions are better than a single large one. Happily this conjecture
   appears to be correct, because precc runs quickly in practice. Precc
   compiles a similar description to the COBOL 74 lexical specification,
   consisting of 162 lines, into 1,500 lines of C code in 1.1 s on the HP
   workstation. Its own C code takes no longer to generate.
   
   The C code takes 22 s to compile, as might be expected, but the most
   significant benefit from using precc is modularity. Both precc
   specifications and generated code can be divided up into modules in
   any way that is desired, compiled separately and linked together
   incrementally, because the generated functions communicate across the
   C stack. Provided the C compiler supports placeholders (prototypes) in
   place of code, precc specifications need not even be complete before
   they are capable of running and being tested. This allows greatly
   reduced turn-round times in development and maintenance, and renders
   the figures for monolithic scripts and C code meaningless.
   
   3 The parsing process
   
   Using the C stack and function-calling to do the work instead of a
   virtual automaton is not a liability. As a realistic demonstration, a
   significant portion of the concrete syntax for the occam programming
   language [9] has been investigated and part of this is presented in
   this paper. The philosophical Occam's razor is a maxim that the
   acceptable tenet can be distinguished from the unacceptable one by its
   simplicity and we propose to use the language occam as a razor which
   separates acceptable parser technology from unacceptable parser
   technology.
   occam can be divided into eleven modules which follow the headings in
   the concrete syntax, as shown in Figure 1. Only the occam and token
   modules are extra to the concrete syntax; the former contains some
   useful meta-constructions and the latter does duty for a lexical
   analyser. The modules are associated with a precc grammar
   specification script foo.y, the C code that precc generates from it,
   foo.c, and the object module foo.o produced by the C compiler. Linking
   all the object modules together with the precc kernel library gives
   one the executable parser. There are 104 separate definitions in all,
   in 400 lines, and they generate 2,600 lines of C code (60 Kbytes), and
   not much more object code.
   

   Experiments with the precc parser for occam show that a C stack of
   4Kbytes can accommodate about 300 nested function calls (therefore
   averaging 12 bytes each) from the precc kernel, and that this size
   corresponds to occam constructs which are nested five layers deep. The
   maximum nesting is 40 (that is, 80 columns divided by the 2-space
   indentation increment imposed by the language), so the runtime stack
   cannot grow beyond about 32Kbytes in size in practice. This is an
   acceptable overhead, and not at all the bugbear that might have been
   feared for a recursive infinite state machine. The sixty recursive
   calls per occam nesting layer are composed of about 45%
   reduce-equivalents, 45% instructions which attach provisional
   `actions' to the parse for later execution, and 10% shift-equivalents,
   each shift usually being associated with an input token, and a
   context-save of about 20bytes which permits backtracking. The other
   calls push only their own return addresses and zero, one or two
   parameters onto the stack, so the load introduced by the precc
   infinite lookahead and backtracking technology is not very high. occam
   is not an easy language to parse. In particular, it requires the
   lexical indentation of constructs from the left hand edge to be
   recognised by the parser since this often delimits their scope, in
   place of the perhaps more familiar BEGIN/END delimiters of Pascal and
   other block structured languages (or `{' and `}' of C). As a result,
   knowledge of the current indentation may be needed at many points
   during parsing, and backtracking to an alternative interpretation may
   be necessary as the scope of (possibly several) constructs is ended.
   For example, the two programs below from [14] differ only in the
   indentation of one line, but the program on the left gives y=10, and
   that on the right, y=0:
   
 
   
   The scope of the second SEQ is terminated by the `offside' (less
   indented) statement. occam could theoretically be handled by a LR(40)
   grammar, with a 40 token lookahead since the indentation may be
   limited to 80 columns (2 spaces per indentation). So yacc could handle
   all practical occam programs, once the LR(40) specification were
   expanded out into LR(1) form. However, this would be very wasteful of
   space for the automaton's tables, much of which would be duplicated.
   The use of purely synthetic attribute grammars is in any case slightly
   problematic because several syntactic constructs may be terminated by
   a single lexical group. For example, in
   
 
   
   the group y:=z belongs to the outermost SEQ by virtue of its
   indentation position, which terminates the two innermost SEQ
   constructs, so it is not clear what token type it should present. A
   prior pass may
   
   be necessary to insert BEGIN and END tokens which can serve as an
   unambiguous delineation of the block structure, but this amounts to
   admitting that yacc-type technology is not well-suited to this kind of
   parsing problem.
   
   Another area of concern, particularly for safety-critical systems, is
   the correctness of the parsing process [13, 14], and here precc also
   scores heavily. High-level languages, rather than assembler, are now
   being recommended for safety-critical applications since it is
   increasingly recognised that programmers make fewer errors as a
   result. However, high-level language compilers are much more
   complicated than assemblers and need a higher degree of validation
   before they can become acceptable in these applications. Much current
   research into compiler verification concentrates on the compilation
   process from an abstract tree representation of the language to the
   object code (e.g., for a subset of occam [8]).
   
   Indeed, many declarative programming languages, such as Prolog,
   provide explicit support for infix and other operators, with variable
   precedence, etc. The input to programs (e.g., compilers) written in
   such languages can be supplied in an `abstract' tree form that is
   sufficiently readable to be used directly by human programmers, at
   least for prototyping purposes [2]. This can obviate the need to
   supply any explicit parser at all. Alternatively, precc may be used to
   generate the abstract tree input to such tools, thus allowing the
   input language to use the true concrete syntax, even in a prototype
   compiler, if this is required.
   
   However the initial parsing phase from a concrete representation of
   the language to an abstract parsetree is an important link in the
   whole compilation process in any practical and realistic
   implementation. Precc can make this phase amenable to validation,
   because precc implements a well-understood higherorder model directly
   in C, and the semantics is associated with a set of proof rules that
   will demonstrate exactly when a given stream will satisfy (or will not
   satisfy) a given parse specification. But precc scripts themselves
   correspond almost verbatim to the concrete grammar specification, so
   there is at least a case for arguing that once the precc constructors
   have been validated and understood, no more proof is required. In
   summary, then, precc attempts to provide a tool that
   
   1. is flexible and efficient in practice,
   
   2. provides an easy means to generate correct parsers for languages
   that cannot be dealt with conveniently by many current
   compiler-compiler tools, and
   
   3. offers the possibility of a validated front-end.
   
   4 A realistic parser for Occam
   
   This section presents a precc parser for a significant subset of occam
   2, and demonstrates how closely the parser can be related to the
   BNF-style semi-formal syntax provided in Appendix G of [9], a sample
   of which is given in Figure 2. The indentation of constructs is
   specified by presenting the BNF description in indented form where
   appropriate.
   
   The corresponding specification in precc is shown in Figure 3. This is
   one of the more complicated and therefore more interesting parts of
   the specification as far as parsing is concerned. The precc
   specification differs in several respects. Firstly and most obviously,
   the majority of the constructions are parameterised. They take the
   parameter n, which denotes the absolute indentation of the concrete
   representation, so that
   
   conditional(5)
   
   denotes an IF construct with the initial I in column 5. Where the
   concrete specification denotes extra indentation by spacing, the precc
   specification uses an incremented parameter instead. Thus
   
   choices(n+2)
   
   is included where the concrete syntax instead has choices two spaces
   further to the right than the governing IF keyword.
   
   Secondly, the precc specification employs special matches for the `end
   of line'. Although precc can match the literal ascii line feed control
   character directly, the default lexical analyser which comes with
   
  
   
   precc generates the special zero token at an end of line, and it is
   this that is matched by the `$! ' and `$' constructs (see below).
   
   Though precc can use lexers generated by the lex utility directly, in
   exactly the same way as yacc does, no special lexical analysis is
   required for the purposes of this trial, and the default lexer can be
   used. This lexer passes every character unchanged, except that it
   substitutes a line feed control character with the special zero token.
   The rationale behind the design of the default lexer is that `end of
   line' frequently marks a natural break point at which special action
   must be taken, and yacc-compatible lexers conventionally signal
   special conditions with the zero token. Actions which check for end of
   file conditions can be attached to the match for the zero token.
   
   The `$' is a match for end of line which can be backtracked over, and
   the `$! ' is a match for one which cannot. It incorporates the special
   cut symbol, `! ', which disables backtracking. It is best to include
   as many cuts as possible in a precc script, because in the event of an
   error being found in the parsed stream, they prevent exponential
   searches through all the possible alternative parses. The cut is one
   of the innovative implementational aspects of precc in comparison with
   other LL(1) parsers; the use of the C call stack is another.
   
   It is certain that an IF begins a conditional statement and nothing
   else, so a cut may be inserted after the IF, but it is conceivable
   that it may be necessary to backtrack over the newline following a
   boolean expression in case the following stream does not match a
   `clause(n+2)' after all. In fact, this is probably never going to be
   the case, but it is best to allow for such a possibility, so a `$' is
   needed at this point. Thirdly, the precc specification uses the square
   brackets syntax
   
   [ foo ]
   
   to indicate an optional inclusion, where the concrete syntax uses
   curly brackets instead. This is a trivial difference { precc reserves
   the curly brackets for use as grouping parentheses.
   
   Fourthly, where the concrete specification has `process', the precc
   specification has `clause(n+2)' instead. The `n+2' indicates extra
   indentation to the right, but the clause(n+2) construct has been
   introduced to stand for
   
   indent(n+2,process)
   
   which might equally well be placed directly in the specification.
   However, giving a name to the construction makes the specification
   more maintainable because it eliminates repetition and narrows the
   range over which changes may have to be made at some future point. It
   also makes the resulting code more compact, at the cost of a tiny loss
   in runtime performance which may even be optimised away by the C
   compiler. (Experiments have shown that function calls with simple
   bodies { which this construction induces { are usually copied into
   inline code by optimizing C compilers.) All this is possible because
   of precc's declarative semantics. An expression may be substituted by
   a name for the expression anywhere in the text without altering the
   semantics.
   
   Why does the precc specification need an indent(n,process) construct
   at all, when the concrete syntax includes nothing of the sort? It is
   because the n spaces which appear to the left must be remembered. The
   concrete syntax specification drops these from the representation of
   succeeding lines, showing only the relative indentation. Inserting
   them explicitly gives the concrete specification the shape
   

   and the precc guardedchoice(n) definition is to represent the
   construction starting with a boolean in column n. The full precc
   specification across the two lines has to include the n blank spaces
   before the process part:
   
   
   
   and the whole lower line of this definition is replaced in the actual
   script by clause(n + 2). We may define the higher order construction
   
  
   because the pattern is frequently reused in the occam specification.
   Note that `blank ?n' is the precc syntax for `blank repeated n times',
   and a `?' without a following expression means ` repeated zero or more
   times'. One example of the use of indent is in the definition of
   clause(n):
   
   
   The script can be made even more flexible if required. The precc
   parser can allow process to start beyond the mandatory two spaces
   farther to the right, and can compute any indentations to come
   relative to the actual position in which process starts (although this
   flexibility is disallowed by the standard occam language, perhaps
   becuase of the extra complexity of handling it). This may be achieved
   by including
   
   indented(n,p)
   
   instead of p(n), and defining the indented grammars so that they
   detect any extra blanks and incorporate them into the indentation
   count:
   
  
   
   There is some inefficiency introduced here, because a malformed p
   following some blanks may be (partially) scanned twice or more, but it
   may be considered worthwhile because of the extra flexibility in the
   acceptable syntax that it affords. There are also good reasons for
   inserting a test into the specification to ensure that n does not grow
   too large:
   

   
   This will prevent arbitrary indentations, which are not allowed by the
   occam folding editor in any case. Nearly all the recursion in the
   parser is introduced by this one construction. Note that the
   out-turned parentheses, which denote a parse-time test, may contain
   any valid C expression returning a one-or-zero integer result,
   following the C convention for boolean values. There is a danger of
   destroying the declarative semantics here, since C expressions may be
   used to implement side-effects, but this possibility will only rarely
   be required, since the cost in terms of maintainability is great.
   Side-effects may also be incorporated into parameters, since these may
   be arbitrary C expressions too. But side-effects in actions (see
   below) attached to the parse can have only limited effect on the parse
   itself, because precc discharges actions after parsing, not during, at
   the `cut' points marked in the specification script.
   
   The concrete syntax and the precc specification differ in one final
   point. The precc specification allows each choice case in an IF
   statement to be preceded by an optional set of specifications. In
   fact, the concrete syntax specifies this too, but elsewhere, with the
   recursive alternate production:
   
   choice = specification choice
   
   In this respect, precc is deficient. All the production rules for a
   given target must appear together, as alternates, or equivalent, in
   the one definition and cannot be scattered across a script, or across
   several scripts. This is a necessary requirement for modularity and is
   good practice anyway.
   
   5 Tokenisation and actions
   
   As mentioned above, the precc parser specification includes a token
   module which acts as the lexical analyser to the rest of the parser.
   There is no need to use precc as its own lexer, but there is no
   disadvantage in doing so either. The module consists of eight
   definitions of various sorts of white space, one metaproduction which
   helps with the recognition of keywords, and the keywords themselves.
   This module illustrates the remaining parts of precc's language quite
   well. The module is shown in Figure 4. All the other `terminals', like
   digit, were defined in the literal section of the concrete syntax, and
   appear in the literal.y precc specification file.
   
   The first point to note about this script is that any line which does
   not begin with an `@' is passed through as unchanged C code, since
   precc scripts are literate in the sense of Donald Knuth. Thus the C
   preprocessor directives may be used to include files. The occam.h file
   contains the definitions of the C macros EMPTY which tests for an
   empty string, HD which returns the head of a non-empty string, and TL
   which returns the tail of a string.
   
   Secondly, the C macros are quite valid as parameters to the
   grammatical rules, or as a boolean test (inside the out-turned
   parentheses). But C macros and C code have a more important use, in
   the actions which one can attach to a parser. The actions appear
   between colons anywhere in the rule, and logically `occur' in a
   sequence corresponding to their positioning, just as yacc attached
   actions do. The actions will be discharged whenever a cut point (`! ')
   is reached in the parse specification, and cuts disable backtracking.
   The actions can make use of the synthetic attributes which are
   attached to each terminal and nonterminal in a rule. Again, the syntax
   is exactly as in yacc (and thus is compatible for existing users,
   although perhaps not ideal). `$1' refers to the first component of the
   sequence to the left of the action, `$2' to the second, and so on. A
   value can be attached to the whole rule by assigning it to `$$'. No
   assignments are made here, because this parser was only intended to
   generate a tree representation, properly scoped
   
  
   
   with the block structure. Thus the loweralpha rule outputs the
   character which is held as the $1 value using the C printf("%c",$1)
   library call. Precc has to dive into the C expressions in order to
   translate these references, so it is not possible to hide them within
   C macros, but otherwise $1, $2, etc may be treated like (volatile) C
   variables of an integer type.
   
   That character value was assigned by the default lexer, which placed
   it in the yylval variable when precc requested a token. This is the
   standard mode for lexers generated by lex, and the precc default lexer
   conforms to this de facto standard, even though it is noticeably less
   efficient than it might be (the fast way to pass tokens and values to
   precc is to write them directly, a line at a time, into the yybuffer
   address that it presents). The
   
   ( islower )
   
   expression is a predicate. It tests the incoming tokens using the C
   islower() library call. A matching token will have its yylval value
   scanned and attached as the attribute of the ( islower ) expression.
   In this manner, precc can handle arbitrary amounts of tokenising
   itself without the need for a separate tool like lex.
   
   6 Trials
   
   It proved possible to write the concrete occam syntax down as a precc
   specification script directly from the documentation, maintaining the
   same structure as the syntax document presented. It took the author of
   precc about ten days of on-off effort to get everything up and
   working.
   
   The resulting script is remarkably readable, as may be observed from
   the portions presented above. Inspection has nearly always been
   sufficient to clarify points of doubt about the parser's behaviour,
   and on the one or two points where the debugger has been resorted to,
   it has turned out to be remarkably easy to follow the precc semantics
   in action, because all the declarations in the syntax script
   correspond to C functions of the same name in the compiled C code
   (unfortunately, this has meant that one or two changes of name from
   the concrete syntax have been required { there cannot be an int
   declaration, for example, because it clashes with the C type name),
   and they are conserved over recompilations.
   
   Much of the work was done on a portable PC, since the precc code is
   portable to any ansi-compliant C programming environment. The precc
   compiler-compiler itself is `written' in self-generated C code, so it
   ports across too. The resulting code also compiled at once when
   transferred back to a networked,
   
   workstation environment, and the occam parser itself runs quite fast
   even on the PC (it took a while to discover that the PC only allocates
   4Kbytes for theC call stack by default).
   
   7 Conclusion
   
   Precc provides a balance of efficiency and flexibility that may be
   useful for handling programming languages that are not conveniently
   catered for by currently available compiler-compiler tools.
   Surprisingly, these include many older languages, which were developed
   before the lex and yacc tools became available and therefore made no
   special concessions to the LR(1) technology.
   
   BNF-like descriptions, arranged in modules, may be provided to precc
   which then generates C code for the parsing phase of a compiler or
   translator for the input language. Modules may be recompiled
   separately and linked in incrementally. The description language both
   supports the use of ordinary parameters, and meta-variables, which may
   hold the names of other grammar descriptions.
   
   As well as occam, precc has been successfully applied to the full
   definition of ansi COBOL 74 for use on the collaborative REDO project,
   which is concerned with reverse engineering of programs written in
   COBOL and other programming languages [11]. Here it is required to
   translate the program to a higher-level form of representation in the
   first step towards re-engineering the program. This project has also
   considered decompilation as well, thus allowing the possibility of
   generating high-level programs from low-level object code [3]. Precc
   is currently being used to generate a practical decompiler-compiler in
   support of this technique.
   
   Acknowledgements
   
   Both authors owe thanks to the collaborative European ESPRIT REDO
   (2487) and ProCoS (3104) projects, and the UK IED safemos (P1036)
   project at Oxford for our funding and other inputs.
   
   References
   
   [1] A.V. Aho and J.D. Ullman. Principles of compiler design.
   Addison-Wesley Publishing Company, 1977.
   
   [2] J.P. Bowen. From programs to object code using logic and logic
   programming. In R. Giegerich and S.L. Graham (eds.), CODE91
   International Workshop on Code Generation, IBFI Schloss Dagstuhl,
   Germany, 20{24 May 1991. Springer-Verlag, Workshops in Computing,
   1992. (To appear)
   
   [3] J.P. Bowen and P.T. Breuer. Decompilation. In H. van Zuylen (ed.),
   The REDO Compendium of Reverse Engineering for Software Maintenance,
   chapter 9, John Wiley & Sons Limited, 1992. (To appear)
   
   [4] P.T. Breuer. The `Precc' infinite look-ahead compiler-compiler.
   ESPRIT REDO project document, 2487-TN-PRG-1002, Oxford University
   Computing Laboratory, UK, June 1989.
   
   [5] P. Deussen and L.M. Wegner. A bibliography of the van Wijngaarden
   grammars. Bulletin of the European Association of Theoretical Computer
   Science (EATCS), 6, 1978.
   
   [6] J.A. Goguen and T. Winkler. Introducing OBJ3. Technical Report
   SRI-CSL-88-9, SRI International, Menlo Park, California, USA, August
   1988.
   
   [7] D. Grune. How to produce all sentences from a two-level grammar.
   Information Processing Letters, 19, pp. 181{185, 1984.
   
   [8] C.A.R. Hoare, He Jifeng, J.P. Bowen and P.K. Pandya, An algebraic
   approach to verifiable compiling specification and prototyping of the
   ProCoS level programming language. In Directorate-General of
   the Commission of the European Communities (ed.), ESPRIT '90
   Conference Proceedings, Brussels pp. 804{818, Kluwer Academic
   Publishers B.V., 1990.
   
   [9] INMOS Limited. Occam 2 reference manual. Prentice Hall
   International Series in Computer Science, 1988.
   
   [10] S.C. Johnson and M.E. Lesk. Language development tools. The Bell
   System Technical Journal, 57(6) part 2, pp. 2155{2175, July/August
   1978.
   
   [11] K.C. Lano and P.T. Breuer. From programs to Z specifications. In
   J.E. Nicholls (ed.), Z User Workshop, Oxford 1989, pp. 46{70,
   Springer-Verlag, Workshops in Computing, 1990.
   
   [12] The Oxford Dictionary of Quotations. 3rd edition, Oxford
   University Press, 1979.
   
   [13] D. Weber-Wulff. A (vW)-grammar for concrete PL0 syntax and parser
   correctness. ESPRIT BRA 3104 ProCoS project document [Kiel DWW 2/2],
   Christian-Albrechts Universitat zu Kiel, Germany, May 1990.
   
   [14] D. Weber-Wulff. Proven correct front-end specification. In B. von
   Karger (ed.), ESPRIT BRA 3104 Provably Correct Systems ProCoS Draft
   Final Deliverable, chapter 5, volume 3, Compiler Development. 1991.
   (To be published, Springer-Verlag LNCS, 1992)
   
   [15] L.M. Wegner. On parsing two-level grammars. Acta Informatica, 14,
   pp. 175{193, 1980.
