Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

ECE220: Computer Systems and Programming

Fall 2022

Machine Problem 11

Code Generation for an LC-3 Compiler

This assignment requires you to use recursion to translate a description of a program in a subset of the C language into LC-3 assembly code.  Developing such code provides you with experience with the use of recursion and pointer-based data structures in C, both of which are important tools.  You will also gain a better understanding of the methods and challenges involved in automating translations from one form of program to another.  Most of the compiler will be provided to you, including both code that translates C code into an abstract syntax tree (AST) as well as C library routines to support basic arithmetic and I/O

on LC-3. Your focus will be on turning an AST into a sequence of LC-3 instructions.

Please read this document in its entirety before you begin to program.

On Reading the Code

You are not expected to read the more than 3,300 lines of C, lexer, parser, and LC-3 code provided to you. You are welcome to do so, of course, but you do not need to read them to complete this assignment.  All relevant material from headers, etc ., is included in this document.  Furthermore, you should be aware that a substantial amount of code (around 4,500 lines!)  is automatically generated when you build your compiler, and none of the automatically generated code (such as ece220 lex .yy .c) is meant to be human-readable.  Don’t try. If you want to remove the automatically-generated files so that you can see what was there in the original distribution, type make  clear.

A Subset of C

To simplify your task, the compiler supports only a narrow subset of the full C programming language. Support is included for if statements, for loops, return statements, and expressions using most of the C operators. Compound statements are required as opposed to simply being good practice. For example:

int  i;

for  (i  =  0;  10  >  i;  i++)      /*  This  code  will  generate  a  syntax  error .    */

printf  ("%d/n",  i);

for  (i  =  0;  10  >  i;  i++)  {  /*  To  avoid,  include  braces  around  loop  body .    */

printf  ("%d/n",  i);

}

Programs written for this compiler can declare global variables as well as a single subroutine (int main  (), of course). Local variables can be declared within main, but variable declarations are not otherwise allowed (for example, within compound statements).   Only two types are supported:  int’s and one-dimensional arrays of integers. Pointer types are not supported, and you can use arrays only with bracketed expressions for indices. For example:

int  a[1];

scanf  ("%d",  a);          /*  NOT  legal--will  generate  a  syntax  error  */ scanf  ("%d",  &a[0]);  /*  legal  */

Constant strings can be used, but only as arguments to function calls, as shown by the format strings in the printf and scanf examples above.

Most C operators are supported, including assignment (=), basic arithmetic (+, -, *, /, %, and negation), pre- and post-increment and decrement (++, --), logical operators ( !, &&,  ||), and comparisons (<, <=, ==, >=, >, !=). Taking the address of a variable or array element is also possible (as is necessary for scanf), but almost no type checking is done for you.

Thus the compiler accepts the code below (as does gcc, but accompanied by warnings):

int  i;

int  j;

i  =  &j;

printf  ("%d/n",  "Hello!/n");

printf  ("i  at  %d,  j  at  %d/n",  &i,  &j);

scanf  ("%d",  i);

The output produced by the printf calls in the code above is unpredictable and of little value, but the scanf call passes the value of variable i instead of its address and thus could change memory unpredictably. In the code above, i was set previously to the address of j, but such tricks are confusing and error-prone.

Examples of other missing functionality include pointer dereferencing, multi-dimensional arrays, structures, enumerations, defining new types, variable initialization within a declaration, loops other than for loops, switch statements, and bitwise operations.  Anything not explicitly mentioned as a type of AST node in the AST section of this document is simply not included in the language, and you need not worry about implementing it.  Of course, you won’t be able to use such functionality in C programs on which you want to use your compiler, either.

Code Generation

Your code must generate LC-3 assembly for the main subroutine and print it to stdout.  As you know, a C function such as main consists of variable declarations and statements.  Variable declarations will be translated for you into a symbol table that your code must use when generating assembly code for each of the statements. The stack frame will already be set up for you, and will be torn down when your code finishes. Be sure to execute the stack frame teardown—do not insert RET instructions directly into the code that you generate.

This machine problem is not intended to require substantial LC-3 programming on your part.   Each of the recursive components that you write should be no more complex than implementing a single step of systematic decomposition, and you may in fact want to review the LC-3 implementation of the conditional and iterative decompositions that we covered early in the course.

In addition to statements, much of the work involved in code generation involves computing the results of expressions. To simplify your task, you are required to use the stack to compute all expressions. As you may recall, any expression can be computed using a stack and some basic rules. To “execute” a literal operand such as a number is seen, push the operand onto the stack. To “execute” an operator, pop operands for the operator from the stack, apply the operator to those operands, and then push the operation result back onto the stack.

Although the LC-3 code generated by this approach will be lengthy and inefficient, use of a stack combines nicely with recursion to allow a straightforward implementation of most of this MP. For example, the code for an addition operator (AST220 OP ADD, as described later) performs the following sequence of operations:

generate code to produce the first operand of the add        (recursive call)

generate code to produce the second operand of the add    (recursive call)

pop the second operand from the stack                               (print LC-3 instructions) pop the first operand from the stack                                   (print LC-3 instructions) add the two together                                                           (print an ADD instruction) push the result onto the stack                                             (print LC-3 instructions)

The LC-3 instructions that you produce must be aware of the LC-3 register conventions as well as any assumptions that you make in code that can call it recursively. Consider, for example, what might happen if we did not use a stack-based approach for expressions:  a given binary operator might produce its first operand and put it into R0, then generate code to produce its second operand.  However, the code needed for the second operand must avoid overwriting the first operand in R0. What if the same operator is used

by the second operand, though (for example, 1+(2+3))? Use of a stack avoids such complexity. The register conventions for LC-3 are as follows:

R0-R3    available for your code’s use

R4    global data pointer

R5    main’s stack frame pointer

R6    stack pointer

R7    used by JSR/JSRR

Note also that the assembly routines provided for you do change certain register values when they are called (details are in the next section).

The C Library

In order to avoid your having to implement functionality in LC-3, we have written and provided you with a small library of functions to support you when generating the assembly code.  The library includes four interfaces that can be used directly from C and another three interfaces intended for your use in implementing basic arithmetic. The calls visible in C are as follows:

int  printf  (const  char*  format,  . . .); int  rand  ();

int  scanf  (const  char*  format,  . . .); void  srand  (int  new seed);

/*  assembly /*  assembly /*  assembly /*  assembly

label  PRINTF  */

label  RAND      */

label  SCANF    */

label  SRAND    */

These calls have C-style interfaces:  arguments must be put onto the stack in the correct order, and the return value will be on the stack. Except for R6 (decremented to store the return value) and R7 (changed by JSR), no register values are changed by these subroutines. As described in class, the caller is responsible

for removing arguments from the stack after the call to any of these functions returns.                  The printf and scanf routines work in the standard way, but support only a handful of escapes:

%d    print/scan an argument as a decimal integer

%%    print/scan a single % character

/n    print/scan a line feed

//    print/scan a single / character

The printf routine returns the number of characters printed.  The scanf routine returns the number of integers converted. The rand routine returns a pseudo-random number between 0 and 215 − 1, and the srand routine sets the seed for the pseudo-random number generator used by rand.

The arithmetic functions defined for you have register-based interfaces.  All three are binary operations on R0 and R1, and all three return the result of the operation in R0. Except for R0 (the return value) and R7 (changed by JSR), no register values are changed by these subroutines. The routines are as follows:

DIVIDE MODULUS MULTIPLY

divide R0 by R1, round towards 0, and store the result in R0

calculate R0 % R1 (C definition) and store the result in R0

multiply R0 by R1 and store the result in R0

All seven of these library routines use the stack for storing callee-saved registers.

Abstract Syntax Trees

The abstract syntax tree (AST) is one form of intermediate representation used as interface between the front-end and the back-end of a compiler. Recall that the front-end of a compiler translates from a high-level language (such as C) into an intermediate representation, while the back-end translates the intermediate rep- resentation into assembly code. Your job in this assignment is to write an LC-3 back-end for an intermediate representation based on ASTs.

An AST is a pointer-based data structure consisting of nodes that represent statements and expressions in the original code. The structure used in your compiler is the following:

typedef  struct  ast220 t  ast220 t;

struct  ast220 t  {

ast220 type t

type;

/* type of AST node

*/

int32 t

value;

/*  a  number  (PUSH INT,  DEBUG MARKER)

*/

char*

name;

/* a string (PUSH STR, VARIABLE)

*/

ast220 builtin func t

fnum;

/* function number (FUNC CALL)

*/

ast220 t*

test;

/* test condition (IF STMT, FOR STMT)

*/

ast220 t*

left;

/* left child/first operand

*/

ast220 t*

middle;

/* middle child (FOR STMT)

*/

ast220 t*

right;

/* right child/second operand

*/

ast220 t*

next;

/* next AST node

*/

} ;

The type field of an AST node (the ast220 t structure) specifies what kind of node is represented.  For example, one AST node might represent an if statement, while a second node represents a variable reference, and a third node represents an addition operation. In general, the type of the AST node determines the meaning (or lack thereof) of the structures other fields. The exception to this rule is the next field, which is used to link together AST nodes representing sequential statements in a block of code as well as sequential arguments in a function call.

Two general classes of AST nodes are defined for your compiler:  statements and expressions.  The subset of C supported by the compiler is minimal, thus only five statement types are possible:

AST220 FOR STMT a for statement

test left middle right

test condition

initialization expression (an AST220 POP STACK node, or NULL) loop body (statements linked by next)

update expression (an AST220 POP STACK node, or NULL)

AST220 IF STMT an if statement

test left right

test condition

then code block (statements linked by next)

else code block (statements linked by next)

AST220 RETURN STMT a return statement (dont forget to branch to the stack frame teardown code provided to you)

left expression to be returned

AST220 POP STACK an expression, with result discarded (for example, a=b;)

left expression to produce and then pop (discard)

AST220 DEBUG MARKER a debugging marker

value marker value

Fields not defined in the table above have no meaning and should not be assumed to hold any particular value (for example, NULL). The only statement type that might not be immediately clear to you from your experience with C is the AST220 DEBUG MARKER, which is simply a debugging aid designed to help you with your coding.  If you add a statement of the form DEBUG(20); to your C program, an AST node of type AST220 DEBUG MARKER will appear in its place (with value equal to 20), which in turn enables generation of an assembly comment marking that point in the assembly output (using a routine already written for you).

Expression nodes include all operators supported by our subset of C as well as nodes for integer and string constants, variable references, and so on.  The simplest expressions include constants, variable references, and function calls, for which AST types are shown in the table below.  In general, the code generated (re- cursively) for an expression AST node should leave the result of the expression on top of the stack.