-
Notifications
You must be signed in to change notification settings - Fork 104
SVML Compiler and Machine
This page provides an overview of the Typescript implementation of the SVML compiler and machine. As compared to the specification, it places a greater emphasis on the actual code.
Compiler can be accessed by command line, in the following way:
yarn build
-
node dist/vm/svmc.js
with the relevant arguments
To only compile a program into an SVML JSON formatted Program
, use compileToIns
from src/vm/svml-compiler.ts
.
Input: An estree es.Program
to be compiled, with optionally the prelude and list of internal functions.
Output: An SVML JSON formatted Program
If the output Program
is to be passed to the SVML machine, then the Program
needs to include the machine code for the primitive functions, and therefore, should have prelude
passed in. The prelude
defines the Source implementation of any primitive functions.
Use runWithProgram
from src/vm/svml-machine.ts
.
Input: An SVML JSON formatted Program
, which should include the compiled prelude
Output: undefined
(Source 3 Concurrent programs are all concurrent, and hence always return undefined
.)
A user can only get feedback from their Source program through the display
primitive function.
This section will explain notable functions used in this implementation. More specific details are in the code base.
The entry point of the compiler is compileToIns
. It does three main steps.
-
Reset compilation variables We use variables to keep track of the current state of compilation, e.g. the functions to compile, the machine code to return, etc.
-
Transform program's estree AST We transform the AST to make compilation easier. Currently, we only transform for loops to while loops in accordance to Source 3 specifications.
-
Start compilation Compilation is done in function level. Whenever the compiler sees a function, it will add the function to a
toCompile
stack, and only compile it after it finishes compiling the current function.
We transform for loops in particular due to their representation in Source. Loop control variables cannot be reassigned, which we achieve through building the AST in the same way the Source 3 specification outlined a for loop to be represented. This is done by some variable renaming.
Used to extract names declared in the current scope, and all blocks (excluding loops and function blocks) within the current scope. It also renames shadowed names in the nested scopes to reuse the same environment for nested scopes. Variable renaming is an optimization to reduce the overhead of creating new environments to support nested scopes. For blocks that come from If statements, or standalone blocks, as they are only run once, we can just use the same environment. For loops, as they are run multiple times, closures will be affected if we just do renaming and reuse the current environment. Below is an example of why we need to create a new environment for a loop, instead of just renaming variables.
const x = [];
for (let i = 0; i < 10; i++) {
x[i] = () => return i;
}
The indexTable
keeps track of the valid names in any environment. This is an array of Map<string, EnvEntry>
, where earlier indices in the array represent parent environments to the later ones. EnvEntry
contains bookkeeping information on each name.
This function checks the current expr
type to determine which compiler to use to compile the expr
. If insertFlag
is true
, it will insert a RET
instruction as well. Returns the maxStackSize
of the compiled node, which is the maximum size of the OS on the machine needed to execute the compiled code in the expr
, and the insertFlag
, which might change depending on the node compiled.
Prelude is located in src/stdlib/vm.prelude.ts
. There are 4 types of primitive functions. For easier referencing, I will refer to them as Types A/B/C/D.
A. Primitive functions can be defined using Source syntax map
, pair
B. Primitive functions that call JS functions math_abs
, runtime
C. Variadic functions list
, math_min
D. Primitive functions that are implemented by the machine is_string
, array_length
To compile the prelude, we do the following steps:
- Compile a Source program that contains all the primitive functions. For Type A, they are in the Source program as it is. For Type B, they are compiled as an empty block as placeholders for the actual compiled function code. For Type C, there will be slight modifications to the compiled function code to tell variadic functions apart.
- Replace the placeholders and make modifications to the compiled Prelude program as necessary. This is done in
generatePrimitiveFunctionCode
insrc/stdlib/vm.prelude.ts
- Compile the actual program code with the compiled prelude, to output a SVML JSON formatted Source 3 Concurrent
Program
.
Internal Functions should be provided to compileToIns
as a list of names to be passed to makeIndexTableWithPrimitivesAndInternals
. Source 3 Concurrent specific functions are implemented in this manner.
The machine uses a number of registers to keep track of the state of the machine, along with 9 general purpose registers to store intermediate data. When developing subroutines for the machine, do take note of which registers are being used for which operation.
Space on the HEAP
is occupied with nodes. There are currently 10 nodes.
NUMBER
, BOOL
, STRING
, ARRAY
, UNDEFINED
, NULL
, OS
, CLOSURE
, RTS_FRAME
, ENV
.
Every node has 4 slots for bookkeeping, and however many more necessary for it's own purposes. The 4 uniform slots are TAG
, the type of node, SIZE
, the amount of space the node takes up, FIRST_CHILD
, the index of the first child of the node and LAST_CHILD
, the index of the last child of the node.
Entry point of the machine. Expects a compiled program and the context to run in. This function resets all the registers, setup external builtins (more on this later), and calls run
.
run
executes each instruction in the program in a while loop until an error occurs or completion, then it will return undefined
. All the logic for how each instruction is handled are in subroutines in the array M
. Similar to the transpiler, we add a timeout as infinite loop protection.
There are 4 types of primitive functions.
For Type A primitive functions, whose implementation is defined using Source syntax, they are handled the same way a normal function in a program.
For Type B primitive functions, which needs to call a Javascript function, we create a custom opcode for each function, which just calls it and adds the result to the OS.
For Type C primitive functions, which are variadic, we pack all the arguments the function is called with into an array, and use this arguments array as the sole argument.
For Type D primitive functions, which are implemented by the machine, we define custom opcodes that does what the function is supposed to do.
There are 3 exceptions to these 4 general types. display
, error
and draw_data
, the external builtins. As these functions are not defined in js-slang, but instead in cadet-frontend and passed to js-slang via Context
, on every run, the machine will extract these functions for use from Context
, and call these in their respective custom opcode.
Each internal function should have its own subroutine, and the subroutines are called directly in the internal function call. The way they are executed is different from normal functions, in the sense that normal functions do not call subroutines directly, but instead execute instruction by instruction. Refer to INTERNAL_FUNCTION_CALL()
for the code.
We introduce 2 registers, TQ
, a thread queue and TO
, the number of instructions before a thread times out, and 3 opcodes, EXECUTE
, TEST_AND_SET
and CLEAR
. Refer to Source 3 Concurrent specification for more information.
EXECUTE
adds threads to TQ
. Every instruction will reduce TO
by 1, and when it reaches 0, the thread will timeout, and swap with another random thread in TQ
, if any. As we do not know which thread will terminate last, we always return undefined
as the value for the program. Hence, you can only use calls to display
to know what is going on in the program.
-
Assumes Source 3 / 3 Concurrent.
-
Source programs should return the last statement that was evaluated. However, this doesn't happen for some cases. If the expression in a program is an
AssignmentExpression
, it will return undefined instead of the value that was assigned. If the last statement evaluated was in a block, it will return undefined as well.
for (let i = 0; i <= 2 ; i = i + 1) {
i;
} // expected to return 2 but returns undefined instead
-
Using a variable before it is initialized is not flagged by the compiler
-
BR* uses number of instructions as offset (SVML spec uses bytes)
-
STAG / LDAG uses a boxed number as index instead of a literal (SVML spec uses a literal number)
-
Only works with Source 3 Concurrent. Will need to be configured to support other Source chapters. (Disallow certain opcodes depending on chapter?)
-
Arrays are represented using Javascript arrays, so an array that has 100 elements and an array that has 2 elements will take up the same amount of
HEAP
space (Ideally, to make an array node of size 10, the node should have 10 extra slots for the elements, etc. Array nodes need to be resizable (like an array list)) -
Using a variable before it is initialized has undefined behavior (Can be addressed by introducing some value to represent no-value-yet, and assign this value to all slots of an environment upon creation)
-
No garbage collection
-
Functions are converted to JS as "" instead of an actual JS function, this means it does not work the same as other variants on the data visualizer or in display.
Testing is done via integration testing for the compiler and machine. As Source 3 Concurrent programs all return undefined
, we test for correct output via expectDisplayResult
and calls to the display
primitive function. For errors, we use expectParsedError
instead.
There are 4 groups of testing.
- Standard opcodes
We use simple programs that either only test the opcode being tested, or make use of previously tested opcodes.
- Primitive opcodes
We test a subset of primitive functions to ensure they work as expected.
- Standard program execution
We test more complicated programs as a whole to ensure they work as expected.
- Concurrent program execution
We test to ensure there is interleaving between threads after a call to concurrent_execute
. As the output is not deterministic, we just need to search for some proof of interleaving display outputs instead.
-
Make compiler take into account the current chapter when compiling. I.e. compiling with list functions in Source 1 should throw an undefined variable error
-
Integrate type inference into compiler, so we can use type specific instructions for optimization (such ass ADDF instead of ADDG)
-
Run the machine with a scheduler, similar to how the interpreter runs. This allows support for breakpoints (
DebuggerStatement
) and make thedisplay
of the concurrent programs appear in real time. -
Allow user to specify the degree of interleaving (
MAX_TO
) -
Make machine work with REPL. Currently, every input instance into the REPL is treated as a separate program. There is no history. (No idea how feasible this idea is)