A technique for creating small fast compiler frontends

1985 
A technique using minimal perfect hash functions to generate small fast table driven frontends is described. A parser for the PASCAL language generated by this method is then presented.In this paper a technique for creating small fast table driven compiler frontends is described. To demonstate the practicality of this method, we give the code and tables for a demonstration parser for the PASCAL language.Our technique is based on minimal perfect hash functions, MPHF. A MPHF is a bijection from a set of N objects to the first N consecutive non-negative integers. We use the technique for generating MPHFs described by the author in [2] to generate three different hash tables: the parser table, the lexical table and the reserved word table. In [2] the author described an algorithm called the mincycle algorithm whose input is W: a set of N objects, R1 and R2: two positive integers, and hO:W-gI, hi:W-gO., R1-1 and h2:W-gR1., R-1: three pseudo-random functions. Here I is the set of integers and R=R1+R2. The mincycle algorithm looks for and can be expected to find a function g:0., R-1-gI such that h(w) = (h0(w) + g(h1(w)) + g(h2(w))) mod N is a MPHF.We will not give the details of the mincycle algorithm as they are described in detail elsewhere [2]. We merely note that if we can guarantee that w ε W, w' e W, h1(w)=hl(w') and h2(w)=h2(w') implies that w=w' then h0 serves no useful purpose and may be omitted.The parser is created directly from syntax diagrams which are slightly modified from those given by Jensen and Wirth [1]. From these syntax diagrams we have created an augmented transition network (ATN) [3]. The only actions we use are push and pop for a syntax stack containing states of the ATN and install, lookup, newscope and oldscope which manipulate a rudimentary symbol table. We use information from the symbol table only at states from which more than one valid transition is possible on seeing an identifier. Thus, checking for valid semantics is extremly rudimentary, but more complete semantic checking could be added to the existing syntactic framework.The parser code is given in figure 1. The output of lex is syntactic tokens. Statement 1: is a MPHF. If we do not find the state-token pair in the transition table (transtab) we use the default table (deftab) to find the next state and a (possibly null) action. The transition table (Table 1) gives each valid non-default state-token pair of the ATN along with its position in the transition table and its associated next state and action.The state table (Table 2) gives for each state its value in the state indirect table (stindtab) and its associated default next state and action. The token table (Table 3) gives for each token its value in the token indirect table (tokindtab). The parser contains 126 states, 53 tokens and 196 valid non-default state-token pairs.The lexical analyzer was created in a manner similar to the parser. The reserved word table was created using the algorithm given in [2]. Unlike Jensen and Wirth [1] we consider read, readln, write and writeln to be reserved words. For reasons of space only the parser is shown here.Although PASCAL is small as programming languages go, because of the ease with which the method described above created a PASCAL frontend, the author strongly suspects that this method is applicable to larger languages such as ADA.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    1
    Citations
    NaN
    KQI
    []