{ File "Compiler.Doc" - 12/15/86 - Bruce Tomlin } { Tokens: Tokens are returned in the variable 'sym' after calling GetSym. 0 intConst Integer, value in 'intNum' 1 realConst Real number, value in 'realNum' 2 strConst String constant, value in 'id' 3 ident Identifier, name in 'id', ptr to symbol table entry in 'idPtr' 4 noSym Only used for 'nextSym' in GetSym to indicate no sym pushed back 5 becomes := (assignment) 15 lparen ( 6 range .. 16 rparen ) 7 eq = 17 lbrack [ 8 neq <> ­ 18 rbrack ] 9 lt < 19 colon : 10 leq <= ² =< 20 semicolon ; 11 gt > 21 plus + 12 geq >= ³ => 22 minus - 13 comma , 23 times * 14 dot . 24 divide / (float division) Keyword tokens: Note: Because of the way the lexical analyzer works, the (uppercased) keyword name is returned in the variable 'id' 25 kARRAY 35 kOF 26 kBEGIN 36 kPROCEDURE 27 kCONST 37 kPROGRAM 28 kDIV 38 kRECORD 29 kDO 39 kTHEN 30 kELSE 40 kTO 31 kEND 41 kTYPE 32 kFOR 42 kVAR 33 kFUNCTION 43 kWHILE 34 kIF 44 kWriteLn 45 damnSetBug (workaround for compiler bug with IN operator) Symbol table: The symbol table is set up as a hash table using linked lists for hash collisions. The symbol table size set to 101 (the first prime greater than 100) and the hashing algorithm used is the HashPJW algorithm from page 436. The invariant part of a symbol table record consists of a name string and a pointer to the next element with the same hash entry. The name length is defined by the constant 'maxIDLen' which is set at 15. The variant part is identified by the 'typ' element. A keyword entry in the symbol table consists only of the keyword's token value. Symbol fields: name Pointer to the symbol's name next Pointer to the next symbol with the same hash value If record element, pointer to next record element typ The type of the symbol in the symbol table tKeyword Keyword symbol tCONST Constant tTYPE Type tVAR Variable tPROC Procedure tFUNC Function tRecElem Record element tArrayTyp Type record for an array kwSym Token for keyword idTyp Identifier type tNoType Type is unknown or undefined for this symbol tINT Integer tREAL Real number tRECORD Record type tARRAY Array type tSubrange Subrange type tEnum Enumerated type size Size of symbol (for tTYPE or tVAR) offset Address of variable (for tVAR) Offset in record of this field (for tRecElem) parent Pointer to parent symbol table realValue Value of real constant (only for tCONST + tREAL) intValue Value of integer constant (only for tCONST + tINT) nextEnum Next enumerated constant (only for tCONST + tINT) eParent Parent enumerated type (only for tCONST + tINT) varParm TRUE if a VAR parameter (only for tVAR) firstElem Pointer to first record element (only for tRECORD) rDetached TRUE if firstElem is not copied (only for tRECORD) loBound Low bounds of array (only for tARRAY) hiBound High bounds of array (only for tARRAY) arrayTyp Pointer to array type record (only for tARRAY) aDeatched TRUE if arrayTyp is not copied (only for tARRAY) loVal Low subrange value (only for tSubrange) hiVal High subrange value (only for tSubrange) hiEnum High value for enumerated type (only for tEnum) consts Pointer to list of constants in enumerated type (only for tEnum) parms Pointer to parameter list (only for tPROC and tFUNC) funcType Pointer to function type (only for tFUNC) Parameter list elements: next Pointer to next parameter name Pointer to parameter name typ Pointer to parameter type entry isVar True if a VAR parameter Parse tree entries: Each parse tree node has a pointer to the next element (so that compound statements can be stored as linked lists, etc.) an entry type, some generic pointers to other parse tree entries (tree1..tree4), some generic pointers to symbol table entries (sym1, sym2), and storage for constant nodes. Each expression node of a parse tree has a pointer to a symbol table element containing the type of the expression at that node in 'exprType'. The generic fields are assigned as follows: Node type Field Usage --------- ----- ----- pAssign tree1: Pointer to the variable being assigned tree2: Pointer to the expression to assign pFOR tree1: Pointer to the index variable tree2: Pointer to the starting value expression tree3: Pointer to the ending value expression tree4: Pointer to the statements to execute pWHILE tree1: Pointer to the test expression tree2: Pointer to the statements to execute pProcCall sym1: Pointer to the procedure's symbol table entry tree1: Pointer to the parameter expressions pFuncCall sym1: Pointer to the function's symbol table entry tree1: Pointer to the parameter expressions pIF tree1: Pointer to the test expression tree2: Pointer to the statements to execute if true tree3: Pointer to the statements to execute if false pInfLoop tree1: Pointer to the infinite loop's statements pVariable sym1: Pointer to the variable's symbol table entry pArrayRef tree1: Pointer to the variable being subscripted tree2: Pointer to the subscript expressions pFieldRef tree1: Pointer to variable being field-referenced sym1: Pointer to the subfield's symbol table entry pRealConst realValue: Value of constant pIntConst intValue: Value of constant pReal2Int tree1: The integer expression to convert to a real pInt2Real tree1: The real expression to convert to an integer pNeg tree1: Pointer to expression to negate pEq pAdd - pNeq pSub | tree1: Left side expression pLt pMult >-> pGeq pIDiv | tree2: Right side expression pGt pRDiv - pLeq Parameter passing and calling of procedures: If the called subprogram is a function then storage for the function result is pushed before the first parameter. If the return value is less than four bytes, that number of bytes is reserved on the stack, otherwise, a pointer to a temporary value set up by the caller to contain the result is pushed on the stack. Parameters are pushed on the stack in the order in which they are declared in the procedure or function declaration. VAR parameters are always passed by reference by pushing a four-byte pointer to the value on the stack. If a value parameter is four bytes or less, it is pushed onto the stack, otherwise, a four-byte pointer to the value is pushed on the stack and the called subprogram must then copy the value into a temporary value. If the called subprogram is not declared on the outer level, a pointer link to its parent's local variables must be pushed on the stack for a static link. The static link [if there is one] is always 8 bytes above a stack frame, so it can be traced for more than one level. A procedure may alter the contents of registers A0, A1, D0, D1, and D2, but must preserve the contents of all other registers. Optimization: Constant expressions such as '5+3' are optimized by replacing them with a constant node in the parse tree. Also, IF THEN ELSE statements and WHILE and FOR loops are optimized for constant control expressions: IF THEN ELSE If the control expression is a constant, the IF THEN ELSE node is replaced with one of its two subtrees. If the THEN and ELSE statements are both empty, the entire statement is removed. WHILE If the control expression is a constant, the WHILE node is either removed entirely or replaced with an infinite loop node. If the body of the loop is empty, the entire WHILE statement is removed. FOR If the start and end values are constant and the start value is greater than the end value, the entire FOR statement is removed. The FOR statement is also removed if the body of the loop is empty. Compiler options: {$OC+} Turns on constant expression optimization {$DS+} Turns on symbol table dumping {$DT+} Turns on parse tree dumping Note: The output from DS+ and DT+ does not appear as assembler comments, so the program won't go through the assembler if either of these two options is turned on. } { Note: In the compiler that I am using, there is a bug relating to the IN operator under these conditions: 1) the IN operator is given a set constant on the right side of the operator 2) the set-type involved contains more than 32 elements 3) the set constant used with the IN operator only contains elements from the first 32 elements of the set-type 4) the expression on the left side of the expression refers to a member which is not one of the first 32 elements of the set-type The problem is that a set constant with less than 32 elements is represented as a long integer constant and is bit tested to execute the IN operator. Unfortunately, no range check is performed on the member being tested to make sure that it is one of the first 32 elements. To work around this, an extra token, 'damnSetBug' has been added so that it is not one of the first thirty-two tokens, and it has been added to all uses of the IN operator which are bitten by the bug. }