This C= implementation is modular. It consists of three passes: parsing, virtual compiling, and then either a machine-compiling step or an interpreter / virtual machine.
Like most languages, the code first goes through a simple tokenizer, which converts the human-readable code into a list of tokens, represented as INTs (or ENUMs).
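As a rough illustration of this step, here is a minimal tokenizer sketch in Python. The token kinds (`NUMBER`, `IDENT`, `OP`, ...) and the use of Python's `re` module are assumptions made for the example; the actual C= token set and tokenizer may look quite different.

```python
import re
from enum import IntEnum, auto


class Tok(IntEnum):
    # Hypothetical token kinds; the real C= token set may differ.
    NUMBER = auto()
    IDENT = auto()
    OP = auto()
    LPAREN = auto()
    RPAREN = auto()
    SEMI = auto()


# One named group per token kind; whitespace is matched, then skipped.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<OP>[+\-*/=])
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<SEMI>;)
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (Tok, text) pairs for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((Tok[match.lastgroup], match.group()))
        pos = match.end()
    return tokens
```

Because `Tok` is an `IntEnum`, every token kind is also a plain integer, which mirrors the INT/ENUM representation described above. For example, `tokenize("x = 1;")` yields the kinds `IDENT`, `OP`, `NUMBER`, `SEMI`.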
After tokenization, the code is converted into an Abstract Syntax Tree (AST) representation. The process comprises several modular steps:
- Using the PEX language (an unusual subset of regex), we identify expressions and assign a parsing function to each of them.
- Then, the token list is passed to the matching parsing function, which creates the corresponding AST struct and adds it to the tree.
- The PEX parser is called recursively on the nested tokens, and we go back to the first step.
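The loop described in the steps above can be sketched as follows. This is not real PEX syntax: as a stand-in, each token kind is encoded as a single character (`i` identifier, `n` number, `o` operator) so that ordinary Python regexes can match the shape of a token list. The `Node` class, the rule table and the parse function names are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    value: str = ""
    children: list = field(default_factory=list)


def parse_assign(tokens, match):
    # Shape: IDENT '=' <expression> ';'
    start, end = match.span(1)
    target = Node("ident", tokens[0][1])
    return Node("assign", children=[target, parse(tokens[start:end])])


def parse_binary(tokens, match):
    # Shape: <expression> OP <expression>; greedy matching splits at the last operator.
    left_start, left_end = match.span(1)
    right_start, right_end = match.span(2)
    left = parse(tokens[left_start:left_end])
    right = parse(tokens[right_start:right_end])
    return Node("binary", tokens[left_end][1], [left, right])


def parse_number(tokens, match):
    return Node("number", tokens[0][1])


def parse_ident(tokens, match):
    return Node("ident", tokens[0][1])


# Each rule pairs a pattern over token kinds with its parsing function.
RULES = [
    (re.compile(r"i=(.+);"), parse_assign),
    (re.compile(r"(.+)o(.+)"), parse_binary),
    (re.compile(r"n"), parse_number),
    (re.compile(r"i"), parse_ident),
]


def parse(tokens):
    """Find the first rule matching the whole token list and build its node."""
    shape = "".join(kind for kind, _ in tokens)
    for pattern, parse_fn in RULES:
        match = pattern.fullmatch(shape)
        if match:
            return parse_fn(tokens, match)
    raise SyntaxError(f"no rule matches token shape {shape!r}")
```

Each parse function calls `parse` again on its sub-expressions, which is the "go back to the first step" recursion. Since one character stands for one token, the regex group spans can be used directly as slice indices into the token list. This toy version ignores operator precedence and parentheses.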
First, we define a small set of ASM-like instructions (see ASM Spec).
With a simple recursive descent algorithm, we translate every AST block (node/branch) into our ASM instructions. The resulting code is appended to one big program file.
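A minimal sketch of this translation, assuming a stack-based instruction set with made-up opcodes (`PUSH`, `LOAD`, `STORE`, `ADD`, ...). The real instructions are defined in the ASM Spec and may differ; the tuple-based AST used here is also only an illustration.

```python
# Hypothetical mapping from source operators to ASM-like opcodes.
BINARY_OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def emit(node, program):
    """Recursively append the instructions for `node` to `program`."""
    kind = node[0]
    if kind == "number":
        program.append(("PUSH", node[1]))
    elif kind == "ident":
        program.append(("LOAD", node[1]))
    elif kind == "binary":
        _, op, left, right = node
        emit(left, program)   # operands first...
        emit(right, program)
        program.append((BINARY_OPS[op], None))  # ...then the operator
    elif kind == "assign":
        _, name, value = node
        emit(value, program)
        program.append(("STORE", name))
    else:
        raise ValueError(f"unknown AST node kind: {kind!r}")
    return program


# x = 1 + 2 * 3
ast = ("assign", "x", ("binary", "+", ("number", 1),
                       ("binary", "*", ("number", 2), ("number", 3))))
program = emit(ast, [])
# program is now:
# [("PUSH", 1), ("PUSH", 2), ("PUSH", 3), ("MUL", None), ("ADD", None), ("STORE", "x")]
```

Each node emits the code for its children before its own instruction, so the output comes out in the order a stack machine expects its operands.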
Once our code has been converted to the ASM-like language, we can choose between two ways of running it: the compiled executable or the interpreted bytecode.
For the compiled executable, a simple map converts each ASM instruction to its machine equivalent.
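One way to picture such a map is as a table of templates, one per instruction. The sketch below assumes the hypothetical stack-style opcodes `PUSH`, `ADD`, `SUB` and `MUL`, and targets x86-64 assembly text in Intel syntax purely as an example; the project's actual target and mapping are not shown here.

```python
# Hypothetical opcode -> x86-64 (Intel syntax) templates.
TEMPLATES = {
    "PUSH": ["push {arg}"],
    "ADD": ["pop rbx", "pop rax", "add rax, rbx", "push rax"],
    "SUB": ["pop rbx", "pop rax", "sub rax, rbx", "push rax"],
    "MUL": ["pop rbx", "pop rax", "imul rax, rbx", "push rax"],
}


def lower(program):
    """Expand each (opcode, arg) pair into target assembly lines."""
    lines = []
    for opcode, arg in program:
        if opcode not in TEMPLATES:
            raise ValueError(f"no machine mapping for {opcode!r}")
        lines.extend(template.format(arg=arg) for template in TEMPLATES[opcode])
    return lines
```

With this table, `lower([("PUSH", 1), ("PUSH", 2), ("ADD", None)])` produces `push 1`, `push 2`, then the four lines of the `ADD` template. Supporting a new instruction or a new target then only means adding or swapping table entries.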
For the interpreted bytecode, the ASM is passed to an interpreter, or VM, which runs it directly.
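A VM for this kind of code can be very small. The sketch below assumes a stack machine and hypothetical opcodes (`PUSH`, `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`, `DIV`) operating on integers; it is an illustration, not the project's actual interpreter.

```python
import operator

# Hypothetical arithmetic opcodes and the Python operation each one performs.
ARITHMETIC = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.floordiv,  # integer division, assumed for this sketch
}


def run(program):
    """Execute a list of (opcode, arg) pairs and return the final variables."""
    stack = []
    variables = {}
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(variables[arg])
        elif opcode == "STORE":
            variables[arg] = stack.pop()
        elif opcode in ARITHMETIC:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(ARITHMETIC[opcode](left, right))
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
    return variables


result = run([("PUSH", 1), ("PUSH", 2), ("PUSH", 3),
              ("MUL", None), ("ADD", None), ("STORE", "x")])
# result == {"x": 7}
```

This version has no jumps or calls, so a plain `for` loop is enough. Control flow would need an explicit program counter instead.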