Otterkit COBOL Dev Update 1: First successful compilation #7

KTSnowy · 2022-12-24T13:44:16Z

KTSnowy
Dec 24, 2022
Maintainer

Dev Update 1: First successful compilation:

Development on the Otterkit compiler is progressing quite well, and this month we've been working on all parts of the compiler pipeline trying to get it to successfully compile a small program on its own.

This was the first program that we successfully compiled, just a very simple COBOL program with 3 working-storage variables, two accept statements to get the current date and current time, and then displaying all 3.

       IDENTIFICATION DIVISION. 
       PROGRAM-ID. HELLO-WORLD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       77 WS-CURRENT-DATE PIC X(8).
       77 WS-CURRENT-TIME PIC X(8).
       01 WS-CONST CONSTANT AS "Hello, World!".
       
       PROCEDURE DIVISION.
       ACCEPT WS-CURRENT-DATE FROM DATE YYYYMMDD.
       ACCEPT WS-CURRENT-TIME FROM TIME.
       
       DISPLAY "CURRENT-DATE: " WS-CURRENT-DATE.
       DISPLAY "CURRENT-TIME: " WS-CURRENT-TIME.
       DISPLAY "CONSTANT: " WS-CONST.
       
       STOP RUN.

This is the output from the compiled program. It's just a simple program, but having the compiler pipeline working will make development and testing go much more smoothly.

Compiler options:

We've also added a couple of compiler options, and as you might have seen from the previous image the "build and run" command is:

$ otterkit build -r -e <filename.cob>

With that command the compiler will generate C# code from COBOL source code, call the dotnet compiler to turn it into a standalone executable, and then call the compiled executable.

The other build commands are "parse only", which displays a list of tokens:

$ otterkit build -p -e <filename.cob>

And "build only", which builds the executable but doesn't run it:

$ otterkit build -e <filename.cob>

Additional options are -cl for a custom column length limit (default is 80) and, of course, --free and --fixed for the free and fixed source formats:

$ otterkit build -cl 120 --fixed -e <filename.cob>

There's also a new command, which works similarly to the dotnet new command. It has a module template for generating a COBOL library and an app template for standalone executables:

$ otterkit new app
$ otterkit new module

Those two commands will generate the project files needed by the dotnet compiler to compile the generated C# code. Those files will stay in a folder called .otterkit, and the compiled executable will be in the .otterkit/Build directory.

The preprocessor:

We've also started working on the preprocessor, and we're aiming to support all directives from the latest COBOL standard. Currently the preprocessor only implements the >>SOURCE FORMAT directive:

       IDENTIFICATION DIVISION.
       >>SOURCE FREE 
PROGRAM-ID. HELLO-WORLD.
>>SOURCE FIXED
       DATA DIVISION.
       >>SOURCE FREE
WORKING-STORAGE SECTION.
>>SOURCE FIXED
       77 WS-CURRENT-DATE PIC X(8).
       >>SOURCE FREE
77 WS-USER-INPUT PIC X(8).
>>SOURCE FIXED
       01 WS-CONST CONSTANT AS "Hello, World!".
[...]

Switching source formats on each line like this is incredibly cursed but the standard requires that functionality, and it's also a nice way to stress test the preprocessor on the constant source format switching.
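To illustrate the idea, here is a minimal Python sketch of how a preprocessor pass might track the active source format line by line. It is not Otterkit's actual implementation (which is written in C#): the function name tag_source_formats, the regular expression, and the default starting format are all assumptions made for this example, and real fixed-format handling (sequence area, indicator column, column limits) is left out.

```python
import re

# Accepts ">>SOURCE FREE", ">>SOURCE FORMAT FIXED" and ">>SOURCE FORMAT IS FREE"
# (case-insensitive). This pattern is an illustrative assumption, not a full
# implementation of the directive grammar.
SOURCE_DIRECTIVE = re.compile(
    r"^\s*>>\s*SOURCE(?:\s+FORMAT)?(?:\s+IS)?\s+(FREE|FIXED)\s*$",
    re.IGNORECASE,
)


def tag_source_formats(lines, initial="FIXED"):
    """Return (format, text) pairs for every non-directive line.

    A directive line is consumed and changes the format that applies
    to the lines following it.
    """
    current = initial
    tagged = []
    for line in lines:
        match = SOURCE_DIRECTIVE.match(line)
        if match:
            current = match.group(1).upper()
            continue
        tagged.append((current, line))
    return tagged


example = [
    "       IDENTIFICATION DIVISION.",
    "       >>SOURCE FREE",
    "PROGRAM-ID. HELLO-WORLD.",
    ">>SOURCE FIXED",
    "       DATA DIVISION.",
]

for fmt, text in tag_source_formats(example):
    print(f"{fmt:5} | {text}")
```

Each later stage can then decide how to slice a line (fixed-format columns versus free-form text) based on the tag attached to it.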

The parser and error messages:

We're using a hand-written variation of a recursive descent parser. It's not a pure recursive descent parser, because that would be a really bad experience with COBOL's huge list of reserved words; it's a custom variation that makes parsing COBOL easier.

The reason behind using a hand-written parser instead of a generated one is the possibility of showing much better error messages than a typical parser generator is capable of.

This allows us to define exactly on which token to show the error (in case the parser has already moved to the next token), show a custom written error message that hopefully will help with fixing the issue, and a snippet of the source code to show where the error occurred. We'll be adding more and better error messages as we progress with development.
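As a rough illustration of that approach, the Python sketch below builds an error message from a chosen token, a custom message, and the offending source line. The Token class, the format_error helper, and the output layout are hypothetical and only meant to show the idea; Otterkit's real messages and data structures may look different.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token record; the real Otterkit token type may differ."""
    value: str
    line: int    # 1-based line number
    column: int  # 1-based column of the token's first character


def format_error(source_lines, token, message):
    """Build a message with the offending line and a marker under the token."""
    snippet = source_lines[token.line - 1]
    marker = " " * (token.column - 1) + "^" * max(len(token.value), 1)
    gutter = f"{token.line:>4} | "
    return "\n".join([
        f"error: {message}",
        gutter + snippet,
        " " * len(gutter) + marker,
    ])


source = [
    "IDENTIFICATION DIVISION.",
    "PROGRAM-ID HELLO-WORLD.",
]
# Report the error on the token *before* the one the parser has reached:
# the missing period belongs after PROGRAM-ID, not on HELLO-WORLD.
previous = Token("PROGRAM-ID", line=2, column=1)
print(format_error(source, previous, 'expected "." after PROGRAM-ID'))
```

Because the caller picks the token explicitly, the marker can point at the place where the fix is needed rather than wherever the parser happened to stop.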

Project file maybe?

We are experimenting with ways to make a project file, but specifically for COBOL projects. This might help with a possible future package manager, and also a way to statically store compiler options. Something similar to C#'s csproj file and NodeJS' package.json file.

Let us know if you have any feedback on the idea of a project file, or if you'd like us to implement a specific feature for them.

Totally working alpha build:

We uploaded an alpha build to NuGet as a dotnet CLI tool. This build has extremely limited (and possibly broken) features; the program you saw above is basically as far as the current codegen will go. All it has currently is:

PROGRAM-ID only.
WORKING-STORAGE and LOCAL-STORAGE.
77 and 01-level elementary items.
DISPLAY, ACCEPT and STOP RUN statements.
Only compiling a single source file.

The parser is capable of parsing group items, 30 more statements and FUNCTION-IDs, but the codegen for those is not ready yet. We'll keep updating this same package on NuGet as we progress with the parser and codegen:

https://www.nuget.org/packages/Otterkit/

https://www.nuget.org/packages/Otterkit.Templates/

The templates package is used for generating the COBOL projects with the new command. You'll need to install that one as well to try this alpha build.

Project repos:

If you'd like to check the source code and possibly help us with the compiler's development, feel free to check our GitHub repos:

https://github.com/otterkit/otterkit

https://github.com/otterkit/libotterkit

The compiler is split into two repos, the main one and the libotterkit repo, which is where the runtime library lives. It is free and open source, licensed under the Apache License 2.0.

This is a long term project, and our goal is to make a production-ready modern COBOL compiler that supports features from the latest COBOL standard. We'll try to improve the COBOL ecosystem and the general developer experience as we continue development. We believe that COBOL can have a nicer developer experience, with better tooling, and a compiler with more modern features than what it currently has. In our opinion, COBOL is just lacking a bit of love ❤️

GitMensch · 2022-12-24T14:02:42Z

GitMensch
Dec 24, 2022

Let us know if you have any feedback on the idea of a project file, or if you'd like us to implement a specific feature for them.

make rules, and is already available for a lot of environments. I highly suggest going with this one instead of a "modern project file".
Using make also allows mixing in several other languages.

parser and error messages

I still highly suggest to use the standard "combined location" form, which is supported in several tools "out of the box"; including vscode, vim, emacs, ...
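For reference, a small Python sketch of such a location format, assuming the suggestion refers to the GCC-style file:line:column: message layout that editors can parse to jump to the error. The function name, the regular expression, and the sample values are made up for illustration.

```python
import re


def combined_location(path, line, column, severity, message):
    """Format a diagnostic as path:line:column: severity: message."""
    return f"{path}:{line}:{column}: {severity}: {message}"


# A pattern of the kind an editor or build tool might use to read it back.
LOCATION = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+):(?P<column>\d+): (?P<rest>.*)$")

diagnostic = combined_location("testing.cob", 24, 8, "error", "unexpected token")
print(diagnostic)

parsed = LOCATION.match(diagnostic)
print(parsed.group("path"), parsed.group("line"), parsed.group("column"))
```

The richer snippet-style output can still follow on the next lines; the first line alone is what tools usually match on.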

Statements: I highly suggest adding in-line and out-of-line PERFORM next, the first one at least with the TIMES option; among other things, this makes it easier to inspect performance issues.

Speaking of those: What is the output of perf stat -e instructions otterkit build -r -e testing.cob vs. perf stat -e instructions cobc -j testing.cob, just for a minor comparison?

And - just for interest - is there any source-level debugging available yet?

13 replies

KTSnowy Dec 24, 2022
Maintainer Author

There we go, NativeAOT has a MUCH faster startup time. 0.037 compared to the previous 0.225

@GitMensch Interestingly, Valgrind now actually runs the compiled program. I mean, it makes sense, it was compiled with LLVM so it's technically native now.

GitMensch Dec 24, 2022

For "long running" backends for example, it will be more useful to get the execution performance alone than the dotnet runtime startup + execution performance. Otherwise you'd always get 200 milliseconds added to the benchmark, which is not very useful when benchmarking the actual code that will be running.

That depends - if you go "modern micro service", then the process often lives very short (but may be run in the cloud with 200 processes in parallel).
I guess both "complete time" and "time without initialize and teardown of the runtime" are useful (for such a small program you'll have nearly everything in the init and teardown with GnuCOBOL); for checks that are wrapped in a PERFORM 500000 TIMES those should not matter.

I'm assuming GnuCOBOL compiled code gets "instantly" running because of the native code, right?

Not completely, the GC runtime needs to start, too (checks a bunch of configuration, allocates a bunch of memory, ...).

The callgrind output from the "native" one is what you want to use for the instructions (missing: the same with a locally compiled GnuCOBOL 3).
The memcheck output which you gave before can be useful for getting a view of the memory usage, so you may want to compare that, too.

KTSnowy Dec 24, 2022
Maintainer Author

C# uses a lot more memory than C. The one above is Otterkit and the one below is GnuCOBOL with the same program.

Also, how do I get the instructions from Valgrind?

GitMensch Dec 24, 2022

C# uses a lot more memory than C

I thought so, but I'm no "bleeding edge .net coder", so I don't know if/how things changed there (I was told that with Java nearly every update to a newer RTS or to a newer library means more RAM used) and if there's an option to "tweak" something.

In any case this is important for people to know, I've seen a bunch of environments with around 200 users of an application on one server - if people want to scale they have to have an idea on the memory usage.

Also, how do I get the instructions from Valgrind?

valgrind is "kind of a toolbox". memcheck (the default tool) tests for memory issues and summarizes the memory usage (you can get more memory details when using some of its options). The callgrind tool creates a complete, detailed call trace (famous for its nice views in kcachegrind) and summarizes the instructions: you see that above with "Events: Ir" (instructions) followed by the number of collected events, and also in the instruction reference count at its end. Opening the created callgrind file with kcachegrind [apt install kcachegrind] will also show you this in the summary, along with a complete call graph with costs. This way you can see, with GnuCOBOL, which parts of the runtime took how much time and how often they were called, but also where the COBOL programs spent their instructions.

KTSnowy Dec 25, 2022
Maintainer Author

@GitMensch I think I figured out where the 6MB of memory usage was coming from. Using the ACCEPT statement to get the current date and current time was adding 5MB to the memory usage; removing the ACCEPT statement from the COBOL source code moves the memory usage back to around 1.1MB, which is much more reasonable for a small C# program like this. Also, from 15000 allocs to just 147, WAIT HOW?

This needs fixing and optimization. Maybe I did something bad with the ACCEPT statement datetime stuff. Or maybe C# datetime methods actually use that much memory? That would be a bit weird.

GitMensch · 2022-12-24T15:49:25Z

GitMensch
Dec 24, 2022

... and now I remember: WSL2 works with perf, but it isn't easy to install; you'd need to install it from a kernel tools checkout. Something like the following gist should work to add it to WSL (or Gitpod); if not, I'll update the gist: https://gist.github.com/GitMensch/b0021e3cfd0957eeed6fbb9ce2973d7d

As an alternative, you can also get the instructions with Intel's Pin tool (if you have an Intel processor) or with callgrind (which gets you much more, for much more time spent).

0 replies

GitMensch · 2022-12-25T08:32:14Z

GitMensch
Dec 25, 2022

Nothing beats profiling CPU and memory :-) Note: GnuCOBOL does much more than necessary for a simple ACCEPT DATE, as we streamlined all date/function calls into a single function chain (which is also used for fine-grained, locale-aware functions and handles partial/full adjustment via an override setting). After profiling an application and seeing that this took most of the time in a loop doing ACCEPT TIME, it was adjusted some weeks ago to do the "big" lookup only if necessary and otherwise only adjust the nanoseconds-to-minutes part on the next call, and everything was back to a reasonable time. Most of the computation and memory in date/time handling is about the locale tzdata, which may be completely loaded into memory on the first call, and possibly also checked (as it is in glibc) each time a datetime function is called.

1 reply

KTSnowy Dec 26, 2022
Maintainer Author

@GitMensch I have a slight feeling that maybe Valgrind/Callgrind is not doing memory profiling of C# correctly.

The final size of the executable doesn't increase with the ACCEPT statement. Running the statement through BenchmarkDotNet tells me that the accept date is only allocating around 100 bytes and running in around 500 nanoseconds.

I have a feeling that the 15000 allocations is not correct. I'll try running it through dotMemory as well to see if it returns the same results.

KTSnowy · 2023-01-12T23:47:48Z

KTSnowy
Jan 12, 2023
Maintainer Author

@GitMensch Hey Simon, I found this article talking about the TZ environment variable on Linux and how it could cause thousands of system calls. This might be the cause of the 15000 allocations; I don't really see how a single C# datetime call would cause it by itself (from ~150 to 15000 allocs).

If this is indeed the cause then maybe setting the TZ environment variable at runtime could fix the issue. This also seems to be a Linux only issue.
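One way to try this out is to launch the compiled program with TZ set explicitly and compare the profiler numbers against a run without it. The Python sketch below only shows the mechanics of passing TZ to a child process. The helper name run_with_tz and the value :/etc/localtime are assumptions for illustration, and whether this actually reduces the allocation count is untested.

```python
import os
import subprocess
import sys


def run_with_tz(command, tz=":/etc/localtime"):
    """Run a command with TZ set explicitly in the child's environment."""
    env = dict(os.environ, TZ=tz)
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Stand-in child process that just echoes the TZ value it received;
# replace it with the path to the compiled executable to experiment.
child = [sys.executable, "-c", "import os; print(os.environ['TZ'])"]
print(run_with_tz(child).stdout.strip())
```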

0 replies
