Gnome-based transpiling, a way towards dialect compatibility #23
This sounds like a good approach (just recognize that common things will likely be resolved not only by UDF but by CALL, but that's up to the gnome implementor and the task at hand). Historically many COBOL "shops" use their own macro processors - do you see it as possible that a gnome "down in Concerning the "parser error registration" - at least when we get to macros a "dual" approach that includes a regex option would be necessary.
Just mentioning: you may want to come up with a VS Code-compatible language server based on Otterkit which would also handle the gnomes...
Gnome-based transpiling, or simply Gnomes/Otterkit Gnomes, will be an Otterkit feature meant to help the Otterkit compiler deal with either non-standard or legacy COBOL codebases. Gnomes are meant to transpile sections of code on callback. This new kind of architecture grew out of a comment Gabriel (@KTSnowy) asked us to make about supporting non-standard compiler directives, which in turn came from a discussion between him and Simon (@GitMensch) from GnuCOBOL. So, I invented gnomes and refined the idea with Gabriel.
What can a gnome do?
Imagine gnomes as little knowledgeable workers that help the Otterkit COBOL compiler deal with what it is currently unable to handle.
Many gnomes can live in a single extension, and each gnome must cover a full statement or almost any kind of quirky extension added to the dialect. The gnomes' job is to transpile old or non-standard COBOL to modern, standard COBOL (COBOL 2023 is a full-fledged general-purpose programming language, so it should be capable of dealing with those kinds of things, unlike its predecessors).
This way the gnomes could also be used to ease the migration from legacy codebases to modern COBOL.
Thanks to the framework we will introduce, each gnome will have some level of standard behavior and tooling. Each gnome can declare its Otterkit COBOL library dependencies to add to the current Otterkit COBOL project, which will probably make heavy use of our C# interop feature. For this reason, gnomes will be implemented as one of the last features in the Otterkit ecosystem, but I deemed it important to discuss them now because they will play a fairly important role in solving a problem that affects COBOL: a huge number of compilers that love to take trips away from standard COBOL and sometimes even lack standard COBOL features.
A gnome will be able to hook into, and be called back at, various stages (preprocessing or plaintext, lexing, parsing). The gnome will have to determine the section of code it intends to replace and will define all Otterkit COBOL code using an API that declares the various statements and function calls as objects, thus avoiding the generation of illegal statements. This way, not only will we reduce the possibility of a gnome misbehaving, but we will also know that if a specific section cannot run or doesn't get parsed, it is a specific gnome's fault. We could also run the gnome on a separate thread and terminate it if it doesn't return a result in a reasonable time, so we don't have to deal with badly written gnomes.
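To make the "statements as objects" idea more concrete, here is a minimal sketch in Python (not Otterkit's actual API, which has not been designed yet). Every name in it (`Literal`, `Identifier`, `Move`, `Display`, `emit`) is hypothetical. The point is only that a gnome builds typed objects and the framework renders them, so malformed text is rejected before it reaches the parser.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union

# All types below are hypothetical; they only illustrate the idea.


@dataclass(frozen=True)
class Literal:
    """An alphanumeric literal; quoting is handled by the builder."""
    text: str

    def render(self) -> str:
        # COBOL escapes a quotation mark inside a literal by doubling it.
        return '"' + self.text.replace('"', '""') + '"'


@dataclass(frozen=True)
class Identifier:
    """A data name, validated when the object is created."""
    name: str

    def __post_init__(self) -> None:
        ok = bool(self.name) and all(c.isalnum() or c == "-" for c in self.name)
        if not ok or self.name[0] == "-" or self.name[-1] == "-":
            raise ValueError(f"illegal identifier: {self.name!r}")

    def render(self) -> str:
        return self.name


Operand = Union[Literal, Identifier]


@dataclass(frozen=True)
class Move:
    source: Operand
    target: Identifier

    def render(self) -> str:
        return f"MOVE {self.source.render()} TO {self.target.render()}"


@dataclass(frozen=True)
class Display:
    operands: Tuple[Operand, ...]

    def __post_init__(self) -> None:
        if not self.operands:
            raise ValueError("DISPLAY needs at least one operand")

    def render(self) -> str:
        return "DISPLAY " + " ".join(op.render() for op in self.operands)


def emit(statements: Iterable[Union[Move, Display]]) -> str:
    """Render the statements a gnome produced, one per line."""
    return "\n".join(s.render() for s in statements)


replacement = emit([
    Move(Literal("done"), Identifier("WS-MSG")),
    Display((Identifier("WS-MSG"),)),
])
```

With this shape, a gnome that tries `Identifier("WS MSG")` fails immediately with a `ValueError` pointing at the gnome, rather than producing text that breaks the parser later.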
Gnome callback?
As we can all imagine, a single dialect will be made of many, many gnomes, from 100 to possibly 500. We don't know for sure, but what we do know is that whenever possible, it's best to let them rest until there is a job they are actually good at. For this reason, we determined that it's best to define a hook for each gnome. I thought about regex myself at first since it's the universal way to analyze text, though I knew it was not the only way. Gabriel then suggested token-based callbacks for gnomes that wish to be called back at the lexer level, while I suggested an OnError kind of gnome that combines the parser erroring out with a selector. In this last case, the job of the gnome is to fix the code so the parser can keep going.
Note: The callback mechanism is just an optimization. It will use simple enough selectors; the gnome itself must perform the final check. If it finds that a given callback was a false positive, it must let Otterkit know by returning a specific value (to be defined), so Otterkit can give the task to the next candidate, if present.
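The selector-then-final-check flow could look roughly like the Python sketch below. It assumes regex selectors and a sentinel object for the "false positive" return value; `NOT_MINE`, `Gnome`, and `dispatch` are made-up names, and the two `EXHIBIT` rewrites are simplified illustrations rather than a faithful implementation of any dialect.

```python
import re
from typing import Callable, List, Optional

# Hypothetical sentinel a gnome returns to say "false positive, ask the next one".
NOT_MINE = object()


class Gnome:
    def __init__(self, name: str, selector: str,
                 transpile: Callable[[str], object]) -> None:
        self.name = name
        # The selector is deliberately cheap and may over-match.
        self.selector = re.compile(selector, re.IGNORECASE)
        self.transpile = transpile


def dispatch(gnomes: List[Gnome], line: str) -> Optional[str]:
    """Wake only the gnomes whose selector matches; the first real answer wins."""
    for gnome in gnomes:
        if not gnome.selector.search(line):
            continue  # the gnome keeps resting
        result = gnome.transpile(line)
        if result is not NOT_MINE:
            return str(result)
    return None  # no gnome claimed the line


def exhibit_named(line: str) -> object:
    # Final check: the whole line must really be the statement we handle.
    m = re.fullmatch(r"\s*EXHIBIT\s+NAMED\s+([A-Z0-9-]+)\s*\.?\s*", line, re.IGNORECASE)
    if m is None:
        return NOT_MINE
    name = m.group(1)
    return f'DISPLAY "{name} = " {name}'


def exhibit_plain(line: str) -> object:
    m = re.fullmatch(r"\s*EXHIBIT\s+([A-Z0-9-]+)\s*\.?\s*", line, re.IGNORECASE)
    if m is None:
        return NOT_MINE
    return f"DISPLAY {m.group(1)}"


gnomes = [
    Gnome("exhibit-named", r"EXHIBIT", exhibit_named),
    Gnome("exhibit-plain", r"EXHIBIT", exhibit_plain),
]

named = dispatch(gnomes, "EXHIBIT NAMED WS-TOTAL")
plain = dispatch(gnomes, "EXHIBIT WS-TOTAL")        # first gnome declines
ignored = dispatch(gnomes, 'DISPLAY "EXHIBIT A"')   # both selectors fire, both decline
```

The last call shows why the final check has to live in the gnome: both selectors match the word inside a literal, both gnomes return the sentinel, and the line is left for the normal parser.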
The power of a gnome
I personally think our gnomes should be very powerful. A parser gnome should be able to read a whole file and let the parser discard the last bits of statements that tried to parse but somehow failed. Gabriel, however, suggested we should actually try to determine the next statement and restrict the scope of a gnome. I argued that doing so could introduce many issues since we can never be sure what even the current statement looks like. Imagine an alternative way to do a multi-line comment, much like the way SQL injections work. An alternatively commented display statement could introduce problems since we don't know what the statement looks like or whether it contains things resembling normal COBOL statements.
Gnomes already introduce security issues by being able to add their own dependencies, so it probably doesn't matter if we let them be a bit freer. It's unfortunately almost impossible to correctly sandbox gnomes due to the role they have to fulfill, though we could introduce lower-level gnomes to at least make it easy to add common things dialects often like to arbitrarily add, such as built-in functions. Those gnomes would get much simpler selectors and would automatically point, for example, to a user-defined function.
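As a rough illustration of such a lower-level gnome, the Python sketch below treats it as nothing more than an alias table from a dialect's built-in function name to a user-defined function. The table contents and the `rewrite_functions` name are assumptions for illustration, and the regex is deliberately naive (it does not skip literals or comments).

```python
import re
from typing import Dict

# Matches "FUNCTION <name>" but not words that merely end in FUNCTION (e.g. MY-FUNCTION).
_FUNCTION_REF = re.compile(r"(?<![A-Z0-9-])FUNCTION\s+([A-Z0-9-]+)", re.IGNORECASE)


def rewrite_functions(line: str, aliases: Dict[str, str]) -> str:
    """Point dialect built-ins at user-defined functions; leave everything else alone."""
    def swap(match: "re.Match[str]") -> str:
        target = aliases.get(match.group(1).upper())
        return match.group(0) if target is None else f"FUNCTION {target}"

    return _FUNCTION_REF.sub(swap, line)


# Hypothetical alias table: dialect built-in -> user-defined function name.
aliases = {"STRLEN": "DIALECT-STRLEN"}

rewritten = rewrite_functions("COMPUTE N = FUNCTION STRLEN(WS-NAME)", aliases)
untouched = rewrite_functions("COMPUTE N = FUNCTION LENGTH(WS-NAME)", aliases)
```

A gnome of this kind needs no custom code at all, only the table, which is what would keep it easy to audit.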
The tools of a gnome
I determined that we cannot leave gnomes alone, as each gnome implementor making their own thing would result in a huge mess, much like what happened with C strings. So, we know we have to add some tools. Unfortunately, right now, we are not sure what a gnome may need except for one thing: name resolution. So, I will explain what we came up with so far.
A gnome's name resolution is a name resolution database dedicated to the gnomes. We cannot compromise the stability of our parser, but we recognize that gnomes and their developers may find it handy to have their own name resolution database ready to go. A gnome's name resolution database will be extremely similar to the internal name resolution database, which gnomes have read-only access to. Gnomes may define the scope a name is defined in, and in some scopes, they will be able to let the Otterkit parser read from the gnome name resolution database and consider any name in there as perfectly valid. The scope may also be gnome-only, gnomes of the same extension, or all gnomes, depending on the use case. The callback will take care to properly hide names from the gnome name resolution database.
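One possible shape for the scoping rules above, sketched in Python. The four visibility levels come from the description; the class and method names are hypothetical. It also assumes that a name exposed to the parser is visible to every gnome, which has not been decided.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Visibility(Enum):
    GNOME_ONLY = "gnome-only"
    SAME_EXTENSION = "same-extension"
    ALL_GNOMES = "all-gnomes"
    PARSER = "parser"  # the Otterkit parser may treat the name as valid too


@dataclass(frozen=True)
class Entry:
    name: str
    owner: str        # gnome that defined the name
    extension: str    # extension that gnome lives in
    visibility: Visibility


class GnomeNames:
    def __init__(self, internal_names: Iterable[str]) -> None:
        # Read-only snapshot of the parser's own names; gnomes cannot modify it.
        self._internal = frozenset(internal_names)
        self._entries: List[Entry] = []

    def define(self, name: str, owner: str, extension: str,
               visibility: Visibility) -> None:
        self._entries.append(Entry(name, owner, extension, visibility))

    def visible_to_gnome(self, name: str, gnome: str, extension: str) -> bool:
        if name in self._internal:
            return True
        for e in self._entries:
            if e.name != name:
                continue
            if e.visibility in (Visibility.ALL_GNOMES, Visibility.PARSER):
                return True
            if e.visibility is Visibility.SAME_EXTENSION and e.extension == extension:
                return True
            if e.visibility is Visibility.GNOME_ONLY and e.owner == gnome:
                return True
        return False

    def visible_to_parser(self, name: str) -> bool:
        return name in self._internal or any(
            e.name == name and e.visibility is Visibility.PARSER
            for e in self._entries
        )


names = GnomeNames(internal_names={"WS-TOTAL"})
names.define("TMP-1", owner="g1", extension="ext-a", visibility=Visibility.GNOME_ONLY)
names.define("SHARED-1", owner="g1", extension="ext-a", visibility=Visibility.SAME_EXTENSION)
names.define("DIALECT-FN", owner="g1", extension="ext-a", visibility=Visibility.PARSER)
```

Here `TMP-1` is visible only to gnome `g1`, `SHARED-1` to any gnome in `ext-a`, and `DIALECT-FN` to everyone including the parser, while `WS-TOTAL` stays readable by all gnomes but cannot be changed by them.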