Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Building a custom codegen #579

Open
fl215 opened this issue Aug 14, 2024 · 5 comments
Open

Building a custom codegen #579

fl215 opened this issue Aug 14, 2024 · 5 comments

Comments

@fl215
Copy link

fl215 commented Aug 14, 2024

I have a CPU with a custom architecture and I've been wondering whether I could make an SML codegen for it. I was thinking to manually write an SML compiler but using an already existing compiler would make my life a bit easier. I could go digging and figure out how it works but I wanted to ask if I could save some of my time, how can I make a codegen for my architecture? The steps to setting a new codegen up and such.

@MatthewFluet
Copy link
Member

Adding a new codegen to MLton is a non-trivial task, especially a native one. We currently have only 4 codegens: C, LLVM, x86, and amd64. The amd64 codegen is mostly a modification of the x86 codegen. The C and LLVM codegen's are much simpler, since they essentially just emit textual C or LLVM IR and ask the respective tools to compile them to object files.

Each codegen is invoked from the top-level compiler at:
https://github.com/MLton/mlton/blob/master/mlton/main/compile.fun#L603-L634

The essence of a codegen is to translate the Machine IR into an appropriate representation. The Machine IR is an untyped intermediate language, corresponding to an abstract machine with an infinite number of temporaries. It makes many aspects of the execution explicit, including object allocation and adjusting the SML stack pointer.

All of the codegens use the same runtime system, so there are some differences with respect to a traditional codegen for a simple (unmanaged) language. For example, to support deep recursion and stack inspection for garbage collection, the SML stack is distinct from the C stack. A "call" of (the Machine IR representation of) a SML function is not implemented with "call" and "return" instructions; rather they are implemented with unconditional jumps and explicit management of the SML stack. While the earlier high-level IRs have been heavily optimized by MLton, the Machine IR is rather low-level and somewhat verbose; the x86 and amd64 codegens implemented a number of optimizations in order to achieve reasonable performance.

I wonder if you could say more about your desire to directly compile from SML to your custom CPU; other alternatives, that may or may not be simpler, would be to add your CPU architecture to a C compiler (e.g., gcc) or to LLVM and then use the existing C and LLVM codegens. Certainly, those projects are somewhat designed for being extended with new architectures, so probably have decent document on the process.

@fl215
Copy link
Author

fl215 commented Aug 19, 2024

I wonder if you could say more about your desire to directly compile from SML to your custom CPU; other alternatives, that may or may not be simpler, would be to add your CPU architecture to a C compiler (e.g., gcc) or to LLVM and then use the existing C and LLVM codegens. Certainly, those projects are somewhat designed for being extended with new architectures, so probably have decent document on the process.

well, I was planning to make an SML HLLCA, so that's why I asked.

@MatthewFluet
Copy link
Member

well, I was planning to make an SML HLLCA, so that's why I asked.

Hmm... that's a somewhat different task. And, I'm not sure that MLton would necessarily be the best SML compiler for such an experiment. Because much of MLton's compilation strategy is to lower the high-level SML source code into something that is much closer to C. There isn't much HLL left by the time compilation reaches the Machine IR. Also, the compilation and Machine IR are closely tied to the MLton garbage collector and runtime system, which are implemented in C --- so one would appear to also need a C compiler. I'm not sure that a strategy of making "GC_collect" be a single instruction in the architecture would work, not to mention all of the other primitive operations (like I/O) that are implemented by making FFI calls to (simple wrappers around) standard C functions.

@fl215
Copy link
Author

fl215 commented Aug 20, 2024

Hmm... that's a somewhat different task. And, I'm not sure that MLton would necessarily be the best SML compiler for such an experiment.

could you give me some leads as to where I should be looking then?

@MatthewFluet
Copy link
Member

Hmm... that's a somewhat different task. And, I'm not sure that MLton would necessarily be the best SML compiler for such an experiment.

could you give me some leads as to where I should be looking then?

HaMLet (https://github.com/rossberg/hamlet) is a reference interpreter for SML, so stays rather high-level. There are a number of other SML compilers (Poly/ML (https://github.com/polyml/polyml), SML/NJ (https://github.com/smlnj/legacy), MLKit (https://github.com/melsman/mlkit), MoscowML (https://github.com/kfl/mosml)) that take various strategies towards compilation.

I'm not very familiar with the details of past HLLCA for functional languages. I suppose it would depend on what aspects of SML you think would be beneficial to have custom instructions for. For example, I know of some toy VMs for functional languages that have dedicated environment registers and instructions that support direct access to closure-captured free variables of a function (though, such flat closures have some downsides with respect to space usage). I never studied the Lisp machines, so I'm not even sure how they handled garbage collection (e.g., entirely implicitly, with an "alloc" instruction that could invoke GC; with an explicit "gc" instruction; or via a GC implemented using lower-level instructions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants