-
Notifications
You must be signed in to change notification settings - Fork 1
Back end
The parser is able to parse asml programs with a single main function, arithmetic expressions and variable defintiions. A typical program looks like this :
let _ =
let x = 0 in
let y = 1 in
let x = add y 2 in
nop
This example can be found in tests/backend/parser/actual_progs/github_ex.asml
.
The very basic register allocation is working. It replaces each variable with a unique register name. Right now it doesn't take special ARM registers into account. So our previous program becomes :
let _ =
let r0 = 0 in
let r1 = 1 in
let r0 = add r0 r1 2 in
nop
Using the data type described bellow, we can generate the ARM code corresponding to our program. In the case of our example we obtain the following result :
.text
.global _start
_start:
ADD r1, 0, #0
ADD r0, 1, #0
ADD r1, r0, #2
nop
BL min_caml_exit
type t =
| Int of int
| Float of float
| Neg of Id.t
| Add of Id.t * t
| Sub of Id.t * t
| Land of Id.t * t
| Var of Id.t
| Eq of Id.t * t
| Call of Id.t * formal_args
| CallClo of Id.t * formal_args
| If of Id.t * t * asmt * asmt * string
| MemAcc of Id.t * t
| MemAff of Id.t * t * Id.t
| New of t
| Nop
and formal_args = Id.t list
and asmt =
| Let of Id.t * t * asmt
| Expression of t
and fundef = {
name : Id.t;
args : Id.t list;
body : asmt
}
type toplevel =
| Fundefs of (fundef list)
The parser is build arount 3 main modules :
-
parser.mly
defines the token and the syntax (as described inasml.html
) -
lexer.mll
associates keywords to the tokens -
syntax.ml
descibes the types and functions related to these types
When a file is parsed the parser translates it into tokens. Then it looks for the token paterns we described in parser.mly. These paterns then trigger specific functions and build the abstract syntax tree associated with the program.
The parser outputs a variable of type toplevel
. It can then be processed.
This function is used in the file basicregist.ml
and used in id.ml
.
It is based around a variable table. Every time a variable is used in the program, we looks if it is in the table.
- If it is then we return its index in the table
- If it's not then we add it into the table (assuming there is some space left) and return its index.
Thus when outputing ARM code we replace variable names with r%i
, %i
being the register index.
To perform this task we use pattern matching on the type we described.
For example if we have the pattern LET IDENT EQUAL exp IN asml
, with exp = OP <register1> <register2>
we want an ARM code similar to OP <ident name> <register1> <register2>
followed by the traduction of the asml.
To handle function calls we have to handle parameters and return values.
Parameters : if we have less than 4 paramters we put them in registers r0, r1, r2, r3.
To do so we use the helper function movegen l i
. In a simplified version without register allocation :
let rec movegen l i =
match l with
| [] -> sprintf ""
| t::q -> sprintf "mov r%i, %s\n%s" i (to_register t) (movegen q (i + 1))
This function puts all the registers corresponding to the first parameters into the r0 - r3 registers. It is only call if we have 4 parameters or less.
Return value: functions put their return values in the r0 register. Thus whenever we have a statement in the form LET Id = <function call> in
we need to perform a new instruction to save the return value in Id
: mov id_to_register r0
to put the return value into the register corresponding to Id.
We choose to traduce if-then-else statements systematically by using the cmp
instruction. This operation sets the flag Z
in the CPU to 0 if the two registers given as parameters are different and 1 if they are equal.
We can then do conditional branching based on this flag by using instructions such as beq
or blt
, ble
..
As an example, let us consider the following code:
let _ =
let a = 0 in
let b = 0 in
if a = b then print_int 1 else print_int 0
It will be traduced in the following fashion:
mov r4, #0 @ r4 = a
mov r5, #0 @ r5 = b
cmp r4, r5
beq if
mov r0, #0 @ set the argument of print_int to 0
bl min_caml_print_int
bl end
if:
mov r0, #1 @ set the argument of print_int to 1
bl min_caml_print_int
bl end
end:
bl min_caml_exit
Arrays are stored on the heap. To manipulate the heap we need to move its upper limit by calling the brk
system call. It takes a single argument : the new position of the heap break. After a success it will return the new break. In case of failure it returns the current break.
We write the function _min_caml_create_array size init
. It returns the address of an array of the specified size, with its cells initialized to a given init
value. This function follows this pseudo-code:
a <- get_current_break
b <- a + size
set_current_break(a)
for i = 0 to size
a[i] = init
return a
In order to get the current break, we perform an illegal brk syscall on purpose, i.e by doing brk #0
.
Accessing arrays is done by specifying the base address addr
and the offset offs
through the mem(adrr + offs)
keyword. To access we can use LDR reg1, [addr, offs, LSL #2]
. Inside the bracket we mean to perform the operation addr + offs * 4
. The same can be done with STR
.
The main mechanisms used to handle closures are allocation on the heap and branching to labels.
To help in keeping the code clean, we define a word (variable) _self
within the .data
section of our assembly code. This variable will hold the address of the closure we generate and can be restored from within the function to which the closure is applied.
Whenever the token apply_closure
is used, we call the function Callclo
. With this function we save the value the closure into_self
, save the closure arguments and branch onto this register to jump to the closure.
When we want to access memory at the base address %self
(ie. from within the closure), we will load the address that is store inside _self
into a register and use it.