Skip to content

Back end

sheepNo edited this page Jan 17, 2018 · 24 revisions

Working right now

The parser

The parser is able to parse asml programs with a single main function, arithmetic expressions and variable defintiions. A typical program looks like this :

let _ =
   let x = 0 in
   let y = 1 in
   let x = add y 2 in
   nop

This example can be found in tests/backend/parser/actual_progs/github_ex.asml.

Register allocation

The very basic register allocation is working. It replaces each variable with a unique register name. Right now it doesn't take special ARM registers into account. So our previous program becomes :

let _ =
   let r0 = 0 in
   let r1 = 1 in
   let r0 = add r0 r1 2 in
   nop

ARM generation

Using the data type described bellow, we can generate the ARM code corresponding to our program. In the case of our example we obtain the following result :

.text
.global _start
_start:
ADD r1, 0, #0
ADD r0, 1, #0
ADD r1, r0, #2
nop 
BL min_caml_exit

Data types

 type t =
    | Int of int
    | Float of float
    | Neg of Id.t
    | Add of Id.t * t
    | Sub of Id.t * t
    | Land of Id.t * t
    | Var of Id.t
    | Eq of Id.t * t
    | Call of Id.t * formal_args
    | CallClo of Id.t * formal_args
    | If of Id.t * t * asmt * asmt * string
    | MemAcc of Id.t * t
    | MemAff of Id.t * t * Id.t
    | New of t
    | Nop

and formal_args = Id.t list

and asmt =
    | Let of Id.t * t * asmt
    | Expression of t

and fundef = {
                name : Id.t;
                args : Id.t list;
                body : asmt
             }

type toplevel =
    | Fundefs of (fundef list)

How does it work ?

ASML parsing

The parser is build arount 3 main modules :

  • parser.mly defines the token and the syntax (as described in asml.html)
  • lexer.mll associates keywords to the tokens
  • syntax.ml descibes the types and functions related to these types

When a file is parsed the parser translates it into tokens. Then it looks for the token paterns we described in parser.mly. These paterns then trigger specific functions and build the abstract syntax tree associated with the program.

The parser outputs a variable of type toplevel. It can then be processed.

Very basic register allocation

This function is used in the file basicregist.ml and used in id.ml. It is based around a variable table. Every time a variable is used in the program, we looks if it is in the table.

  • If it is then we return its index in the table
  • If it's not then we add it into the table (assuming there is some space left) and return its index.

Thus when outputing ARM code we replace variable names with r%i, %i being the register index.

ARM code generation

To perform this task we use pattern matching on the type we described. For example if we have the pattern LET IDENT EQUAL exp IN asml, with exp = OP <register1> <register2> we want an ARM code similar to OP <ident name> <register1> <register2> followed by the traduction of the asml.

External function calls

To handle function calls we have to handle parameters and return values. Parameters : if we have less than 4 paramters we put them in registers r0, r1, r2, r3. To do so we use the helper function movegen l i. In a simplified version without register allocation :

let rec movegen l i =
     match l with
          | [] -> sprintf ""
          | t::q -> sprintf "mov r%i, %s\n%s" i (to_register t) (movegen q (i + 1))

This function puts all the registers corresponding to the first parameters into the r0 - r3 registers. It is only call if we have 4 parameters or less.

Return value: functions put their return values in the r0 register. Thus whenever we have a statement in the form LET Id = <function call> in we need to perform a new instruction to save the return value in Id: mov id_to_register r0 to put the return value into the register corresponding to Id.

If-then-else

We choose to traduce if-then-else statements systematically by using the cmp instruction. This operation sets the flag Z in the CPU to 0 if the two registers given as parameters are different and 1 if they are equal. We can then do conditional branching based on this flag by using instructions such as beq or blt, ble.. As an example, let us consider the following code:

let _ =
    let a = 0 in
    let b = 0 in
    if a = b then print_int 1 else print_int 0

It will be traduced in the following fashion:

mov r4, #0             @ r4 = a
mov r5, #0             @ r5 = b
cmp r4, r5              
beq if
mov r0, #0             @ set the argument of print_int to 0
bl min_caml_print_int
bl end

if:
mov r0, #1             @ set the argument of print_int to 1
bl min_caml_print_int
bl end                 

end:
bl min_caml_exit

Arrays

Arrays are stored on the heap. To manipulate the heap we need to move its upper limit by calling the brk system call. It takes a single argument : the new position of the heap break. After a success it will return the new break. In case of failure it returns the current break.

We write the function _min_caml_create_array size init. It returns the address of an array of the specified size, with its cells initialized to a given init value. This function follows this pseudo-code:

a <- get_current_break
b <- a + size
set_current_break(a)
for i = 0 to size
a[i] = init
return a

In order to get the current break, we perform an illegal brk syscall on purpose, i.e by doing brk #0.

Accessing arrays is done by specifying the base address addr and the offset offs through the mem(adrr + offs) keyword. To access we can use LDR reg1, [addr, offs, LSL #2]. Inside the bracket we mean to perform the operation addr + offs * 4. The same can be done with STR.

Closures

The main mechanisms used to handle closures are allocation on the heap and branching to labels. To help in keeping the code clean, we define a word (variable) _self within the .data section of our assembly code. This variable will hold the address of the closure we generate and can be restored from within the function to which the closure is applied. Whenever the token apply_closure is used, we call the function Callclo. With this function we save the value the closure into_self, save the closure arguments and branch onto this register to jump to the closure. When we want to access memory at the base address %self (ie. from within the closure), we will load the address that is store inside _self into a register and use it.

ARM generation

ARM examples