Otterkit COBOL Dev Update 2: Syntactic analysis is now complete #18
KTSnowy announced in Announcements
Note: for testing
... here's a COBOL2014 version of it
That file is free-form reference-format... (something the extension for IBM COBOL doesn't support).
But for fixed-form, the Margin-R is explicitly not defined in the standard.
Syntactic analysis is now complete:
After ~11,500 lines of handwritten recursive descent, syntactic analysis in the parser is now complete. This means that the parser is now, grammatically and structurally speaking, complete! We'll now work on implementing semantic analysis, which includes error checking, error recovery, and identifier name resolution.
Most of the infrastructure needed for name resolution (the symbol table and related types) has already been implemented; it is now just a matter of refining it and putting everything together. Error recovery has also been implemented using anchor points, but we still need to find the best places to use them.
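To illustrate what anchor-point recovery means in a recursive descent parser, here is a minimal sketch in Python. It is not Otterkit's code: the toy grammar (only `DISPLAY <literal> .` and `STOP RUN .`), the `ANCHORS` set, and every class and method name are assumptions made up for this example. The idea it shows is that when a statement fails to parse, the parser records the error and skips ahead to a token where it can safely resume, instead of stopping at the first mistake.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical anchor set: tokens at which parsing can safely resume.
ANCHORS = {".", "DISPLAY", "STOP"}


@dataclass
class Parser:
    tokens: List[str]
    pos: int = 0
    errors: List[str] = field(default_factory=list)
    statements: List[Tuple[str, str]] = field(default_factory=list)

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> None:
        if self.pos < len(self.tokens):
            self.pos += 1

    def recover(self, message: str) -> None:
        """Record the error, then skip ahead to the next anchor point."""
        self.errors.append(f"token {self.pos}: {message}")
        while self.peek() is not None and self.peek() not in ANCHORS:
            self.advance()
        if self.peek() == ".":
            self.advance()  # consume the period that ends the bad statement

    def expect(self, expected: str) -> bool:
        if self.peek() == expected:
            self.advance()
            return True
        self.recover(f"expected {expected!r}, found {self.peek()!r}")
        return False

    def parse_display(self) -> None:
        self.advance()  # consume DISPLAY
        operand = self.peek()
        if operand is None or not operand.startswith('"'):
            self.recover("DISPLAY needs a literal operand")
            return
        self.advance()
        if self.expect("."):
            self.statements.append(("DISPLAY", operand))

    def parse_stop(self) -> None:
        self.advance()  # consume STOP
        if self.expect("RUN") and self.expect("."):
            self.statements.append(("STOP", "RUN"))

    def parse_program(self) -> None:
        while (token := self.peek()) is not None:
            if token == "DISPLAY":
                self.parse_display()
            elif token == "STOP":
                self.parse_stop()
            else:
                self.recover(f"unexpected {token!r}")


def parse(tokens: List[str]) -> Parser:
    parser = Parser(tokens)
    parser.parse_program()
    return parser
```

Calling `parse(['DISPLAY', '"A"', '.', 'DISPLAY', 'oops', 'junk', '.', 'STOP', 'RUN', '.'])` reports one error for the malformed second statement and still parses the statements on either side of it. Choosing which tokens belong in the anchor set is the tuning problem mentioned above: too few anchors and one mistake hides the rest of the file, too many and recovery resumes in the middle of a broken construct.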
Fun fact: Otterkit now has (as far as we know) the one and only handwritten parser for COBOL 2023, and we're committed to making it the greatest Standard COBOL parser on the market.
Making the parser fully handwritten means that we're not limited by a parser generator and its provided functionality, or the (sometimes) not very helpful error messages. We're able to fine-tune absolutely everything, down to exactly when and where to show an error to the user, including the contents of the error.
And while it is an enormous amount of work to write a handwritten recursive descent parser for COBOL, we still strongly believe that this is the right choice for Otterkit.
Standard COBOL syntax checklist:
So which parts of Standard COBOL will we be supporting? In summary, we plan to support the whole standard, but we'll leave here a checklist of things that the parser "understands" syntactically speaking. This might be useful to help distinguish dialect COBOL from Standard COBOL because of how different (and sometimes incompatible) some third-party dialects are from the standard.
If you find that the parser incorrectly parses one of these (or crashes), please open an issue on this repo so that we can fix it.
(Note: Name resolution is WIP so some errors are expected).
Identification division:
Environment division:
Data division:
Procedure division:
We'll also be supporting all intrinsic functions from the COBOL 2023 standard, but we need name resolution working to correctly parse their arguments (some of which contain reserved words).
New error message API:
We've overhauled our old error messages. This new error message system not only looks much better, but is also much easier to use compared to the previous one.
Here's an example of how the new error messages look:
And here's an example of the ease of use:
This error message system gives us very precise control over exactly where (at which token) and when to display a message back to the user, as well as which messages to show. It allows us to write helpful error messages that can help even a new developer slowly learn how to write COBOL.
You might have also noticed that we're not passing any file information to the error message methods. That's because internally each token keeps track of the file it came from. This allows us to know exactly which file, line, and column the token was written in, so that we can correctly display that information back to the user.
Each token keeping track of which file it came from means that Otterkit will always be able to correctly display that information back, no matter how nested your copybook imports are.
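The following Python sketch shows why storing the origin on each token makes nested copybooks a non-issue for diagnostics. It is an illustration only, not Otterkit's API: the `Token` fields, the whitespace tokenizer, the simplified `COPY name.` handling, the use of the copybook name as the file name, and the message layout are all assumptions for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    text: str
    file: str    # file the token was read from (may be a copybook)
    line: int    # 1-based line number
    column: int  # 1-based column number


def tokenize(file: str, source: str) -> List[Token]:
    """Split on whitespace, recording where each token starts."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        column = 0
        for word in line.split():
            column = line.index(word, column)
            tokens.append(Token(word, file, line_no, column + 1))
            column += len(word)
    return tokens


def expand_copy(tokens: List[Token], copybooks: Dict[str, str]) -> List[Token]:
    """Replace each `COPY name.` with the copybook's own tokens, recursively.

    Toy assumption: the copybook name doubles as its file name.
    """
    expanded = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.text.upper() == "COPY" and i + 1 < len(tokens):
            name = tokens[i + 1].text.rstrip(".")
            nested = tokenize(name, copybooks[name])
            expanded.extend(expand_copy(nested, copybooks))
            i += 2  # skip COPY and the name
        else:
            expanded.append(token)
            i += 1
    return expanded


def report(token: Token, message: str) -> str:
    """Format a diagnostic using only what the token itself knows."""
    return f"{token.file}:{token.line}:{token.column}: error: {message} (at '{token.text}')"
```

Because every token keeps its own `file`, `line`, and `column`, `report` needs no extra context: a token that came from a copybook included by another copybook still points at the innermost file, however deep the nesting goes.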
Standard decimal math library:
To improve maintainability and reusability we moved our decimal math implementation from the runtime library into its own separate library.
The new package is called Otterkit.Numerics and is currently available on NuGet.
This library contains C# bindings around a software implementation of the IEEE 754 Decimal128 and Decimal64 types written in C. We're also planning on writing bindings for the smaller Decimal32 type.
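For readers unfamiliar with these formats, the sketch below uses Python's standard `decimal` module (not Otterkit.Numerics, whose API is not shown here) to approximate the two types. The context parameters follow the IEEE 754 definitions: Decimal64 carries 16 significant digits and Decimal128 carries 34. The `DECIMAL64` and `DECIMAL128` names and the helper functions are made up for this example.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Contexts mirroring the IEEE 754 decimal interchange formats.
DECIMAL64 = Context(prec=16, Emin=-383, Emax=384,
                    rounding=ROUND_HALF_EVEN, clamp=1)
DECIMAL128 = Context(prec=34, Emin=-6143, Emax=6144,
                     rounding=ROUND_HALF_EVEN, clamp=1)


def add(ctx: Context, a: str, b: str) -> Decimal:
    """Add two decimal literals, rounding the result to the context."""
    return ctx.add(Decimal(a), Decimal(b))


def divide(ctx: Context, a: str, b: str) -> Decimal:
    """Divide two decimal literals, rounding the result to the context."""
    return ctx.divide(Decimal(a), Decimal(b))


if __name__ == "__main__":
    # Decimal fractions are represented exactly, unlike binary floats.
    print(add(DECIMAL128, "0.1", "0.2") == Decimal("0.3"))
    print(0.1 + 0.2 == 0.3)
    # The two formats differ in how many digits a quotient keeps.
    print(divide(DECIMAL64, "1", "3"))
    print(divide(DECIMAL128, "1", "3"))
```

Exact decimal fractions are the reason a COBOL runtime wants decimal rather than binary floating point: values such as 0.1 have no finite binary representation, so currency-style arithmetic done in binary floats accumulates rounding error.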
Next steps:
Now we'll start working on the semantic analysis stage of the parser with name resolution, and also on improving the current error messages and writing new ones.
We're committed to making Otterkit the greatest open source Standard COBOL parser on the market, and to continuing work on the compiler in order to get it production-ready. We'll share more information about Otterkit's development in the next dev update, when we get semantic analysis working.
-- Otterkit Project Team