UP | HOME

Compiler Example: C to Assembly

The gist: Use recursive descent to compile statements and expressions in C into assembly. Assume we have a stream of tokens as input. Two key things are needed:

  1. A way to parse:
    • statements:
      • Unconditional statements
      • Conditional statements, e.g. ifelse
      • Compound statements, e.g. statement1; statement2;
      • Iteration, e.g. while
    • expressions:
      • constants
      • variable, e.g.
      • arithmetic operations
      • assignment
  2. A way to break C code into simple statements that can be parsed. This is recursive descent.

0.1 Examples of parsing expression

0.1.1 To parse the assignment a = <expr>

  1. compile_expression(expr) and store result in register x with CMOVE
  2. ST(Rx, a)

Here, note that a is an address into the stack. In assembly, the address would have earlier been set aside, as in a: LONG

0.1.2 To parse the operation <expr_1> + <expr_2>

  1. compile_expression(<expr_1>) and store result in register x
  2. compile_expression(<expr_1>) and store result in register y
  3. ADD(rx, ry, rx) add rx and ry and store result in rx

0.2 Examples of parsing a statement

0.2.1 To parse a compound unconditional statement

Given {<statement_1>; <statement_2>} The compiler does the following:

  1. compile_statement(statement_1)
  2. compile_statement(statement_2)

0.2.2 To parse a conditional

Given if (<expr>) <statement> The compiler produce the following:

  1. compile_expression(expr) and store the result of the computation in register x
  2. BF(rx, Lendif) if rx is false, then branch to the label Lendif
  3. compile_statement(statement)
  4. Add the label Lendif:

These examples help me understand how to get from a high-level programming language to assembly, but modern compilers work slightly differently.

1 How to parse:

Recursive descent is a top-down strategy for parsing. Beginning with the start symbol \(S\), the parser attempts to fit all the rules that have \(S\) on the LHS to the input.

Other methods of building the parse tree include bottom up methods, such as the CYK Algorithm.