[Core Dev]: Having trouble understanding the spec for async COBOL #22

KTSnowy · 2023-04-30T04:48:06Z

KTSnowy
Apr 30, 2023
Maintainer

Moving this discussion to the organization's discussions page. These were originally emails.

Hey Simon, the COBOL 2023 standard introduced the concept of asynchronous message passing in COBOL, which according to the standard works for sending messages to both local processes and to remote processes. I think this is an awesome feature and I want to implement it in Otterkit.

But I'm having trouble understanding the spec, because they leave the entire definition of the Message Control System up to the implementor so I don't have any spec or guidance for which protocol or system I should use to send and receive these messages.

I wish I could contact the committee and ask them if they have any protocol suggestions for this new feature, or if the protocol for local and remote messages should be different.

I'm working on implementing it right now for Otterkit, but it's difficult without some guidance from the standard on what the message passing protocol looks like and how the MCS works.

Please let me know if you have any ideas or suggestions for which protocol or system we should use for this, because I'm having trouble this out on my own.

I'm also having trouble figuring out if each message server should have its own MCS, or if the MCS should run as a single independent process handling all message exchanges for a single machine.

Reply from @GitMensch:

I'm personally interested in implementing MQS but haven't looked at the
specs for it since months.
As it is async I think you always need either an external process that
handles these or store them on a file (like sqlite or ISAM) / in a DB
(would be a separate process again).

My personal first take for implementing it would be using MQ server,
because there are multiple implementations for it and it is quite
portable; also you can use this quite fine for both local and remote
communication.
Additionally I know of several systems that use MQ from COBOL (via C
libraries) so it would be a good match.

In any case I'd highly suggest to go with "usable COBOL environment" first:

ensure you handle all the statements that are both in COBOL2023 and
COBOL85

try to run all NIST85 that don't test obsolete features, potentially
check the "run" checks from GnuCOBOL

consider adding additions that are too much in use to ignore them,
ideally with an option to enable/disable them:

COMP-5 (very similar to the BINARY ones) COMP-3 (identical to
PACKED-DECIMAL) and COMP-6 (identical to PACKED-DECIMAL WITH NO SIGN)

level 78 constants

handling CDF according to MicroFocus

Because when this is done people can actually try Otterkit with
production code.

Only if this "useful state" is reached, then go on with "interesting
recent additions".

KTSnowy · 2023-04-30T04:51:48Z

KTSnowy
Apr 30, 2023
Maintainer Author

Hey @GitMensch

As it is async I think you always need either an external process that handles these or store them on a file (like sqlite or ISAM) / in a DB (would be a separate process again).

Yeah it would probably be best to handle it in a separate process, the overhead of constant IO on an sqlite or other database file would most likely be too much for performant local use. I'm also not entirely sure if sqlite supports concurrent and thread safe read write operations on the same db file.

Something like this needs to be entirely thread safe, because the standard mentions the possibility of multiple programs running concurrently sending and receiving messages.

Additionally I know of several systems that use MQ from COBOL (via C libraries) so it would be a good match.

Thanks for mentioning this, yeah it sounds like this might be the best option for us as well. I'll look around and see if I can find any existing MQ server implementation that works well with C#. If I can't one I'll look for C implementations and write our own C# bindings for it.

I haven't worked with any MQ server or client yet, are the implementations usually performant enough to be usable for messages between local processes?

In any case I'd highly suggest to go with "usable COBOL environment" first:

We need to figure out the architecture for async code before we work further on the codegen.

Because it will be much more difficult later to try and fit async code into a codegen architecture that was designed to only handle completely synchronous code, than if we designed the architecture from the start to correctly handle generating safe asynchronous code (and thread safe synchronous code).

By safe asynchronous code I mean that the codegen needs to make sure that whatever it is generating will always be thread safe, and won't ever be blocking the current or the main thread. This is because Otterkit will allow you to generate a .NET class library from COBOL code, and there's no way for us to know where that code will be executed from.

Here's an example of the potential problem:

Imagine that a company needs some of COBOL's features in C# and instead of reinventing the wheel they just compile COBOL code using Otterkit to include as a library in their own C# project, they needed a couple of COBOL specific features and they also wanted to use COBOL's already supported async messaging features because it's easier than maintaining their own MQ server solution and related functionality.

Now imagine that for whatever reason the code in this COBOL library is being called in a UI thread, because maybe they wanted message passing in their enterprise GUI application.

If the codegen was not designed to account for and correctly generate async code then as soon as they execute a SEND or RECEIVE (or some CPU intensive task) the UI thread will block while it waits for the message to send or to be received. To the user it will look as if the application almost crashed (if it recovers at all).

We need to make sure that we design the codegen to handle async code correctly and to expose context safe async APIs when generating a class library, because we have no idea where that async class library might be called from.

ensure you handle all the statements that are both in COBOL2023 and COBOL85

I've looked at the 2002 and 2014 standards and the only removed statements from COBOL 85 appear to be the ALTER and the ENTER statements. From what I can see in the 2002 standard (unfortunately I don't have the 85 one) the ALTER and ENTER statements were already considered obsolete back in COBOL 85, and there's good reason for it.

The ALTER statement allows the user to change the procedure name in a GO TO statement, effectively changing where it will jump to next time. Apparently there's no limit on how many times you can change the same GO TO statement to point to a different procedure.

Because of this we absolutely should not support this statement, and there's good reason it was considered obsolete even in COBOL 85 and removed in COBOL 2002. This statement encourages bad coding practices and creates potentially unreadable code, it's also incredibly error prone. We shouldn't allow developers to use the GO TO statement like this so I don't believe we should provide support for it again.

The ENTER statement is a much nicer one, but unfortunately it would conflict with the modern COBOL way of calling functions from other languages (and introduce two ways of doing the same thing). This statement is also apparently not supported in other implementations, both Micro Focus and IBM's documentations mention it having no effect during execution and being treated as documentation only.

Also, because we don't have access to the COBOL 85 standard we don't know what is actually in the standard, and have to rely on the removed language elements section from newer standards to mostly guess at certain features. Unfortunately the removed elements list doesn't provide the spec for these so we can't rely on it to provide these features.

I'm also not entirely sure if we should support obsolete and archaic features that have been removed since COBOL 85. The standard is almost 40 years old and some of these features were removed for good reason (the standards also provide justification for the removal). I'm inclined to trust the newer standards and the COBOL working group on this.

COMP-5 (very similar to the BINARY ones) COMP-3 (identical to PACKED-DECIMAL) and COMP-6 (identical to PACKED-DECIMAL WITH NO SIGN)

I'm not sure if we should provide two ways of specifying the same thing. Reason for this is that when new developers (because let's be honest, we need new COBOL developers in the ecosystem) start learning the syntax and ask themselves what the difference is between COMP-3, COMP-6 and PACKED-DECIMAL, we'd have to tell them that they are exactly the same thing but written differently, because one is standard syntax and the other is mostly third-party syntax.

This can be confusing and frustrating for a new developers, and is the reason a lot of people hate C++, because of it having different ways of doing the exact same thing. What do we tell a new developer when they ask if they should use COMP-3 or PACKED-DECIMAL for a certain variable?

From third-party syntax

01 data-name ... USAGE COMP-3.
01 data-name ... USAGE COMP-6.

From Standard COBOL:

01 data-name ... USAGE PACKED-DECIMAL.
01 data-name ... USAGE PACKED-DECIMAL NO SIGN.

There's no right answer to that question because it would be up to developer preference, and that's alright for experienced developers but it can be confusing and frustrating for new devs.

We could add compiler errors telling developers to change COMP-3 and COMP-6 to use the standard PACKED-DECIMAL syntax, but I don't believe we should allow both at the same time as valid syntax.

level 78 constants

The same reason above also applies to level 78 constants, it appears to be a Micro Focus specific extension. Issue is that Standard COBOL already supports constants, and they must be level 01 instead of level 78, and must use the CONSTANT reserved word.

Supporting this could be arguably worse and more confusing for both experienced and new developers, because unlike COMP-3 and COMP-6 the syntax rules for level 78 constants are different from the standard level 01 constants.

From ACUCOBOL's documentation:

78 constant-name VALUE 20.

From Standard COBOL:

01 constant-name CONSTANT 20.

If a new developer asks again which one should they use, what do we tell them?

In this case I would recommend them not to use level 78 constants because they are incompatible with the standard syntax. It essentially contradicts the rules for constants given by the COBOL standard by providing another way to define constants that is different in both syntax and semantics.

We could also add compiler errors telling developers to change level 78 constants to the standard level 01 constants, but I don't believe we should allow both at the same time as valid syntax.

handling CDF according to MicroFocus

If by CDF you mean Compiler Directing Facility, then absolutely not! We need to handle it according to the rules given by the standard, as should everyone else to ensure that at least compiler directives are compatible between implementations (sadly I don't think vendors actually care).

Why should we handle compiler directives according to Micro Focus instead of the standard?

Maybe if there's really good reason for it, but if not then I'm more inclined to trust the COBOL working group with the standard compiler directives.

Because when this is done people can actually try Otterkit with
production code.

Only if this "useful state" is reached, then go on with "interesting
recent additions".

start-rant

I would love to get Otterkit into a "useful state", but we need to define what useful is.

Is it useful for the ecosystem in general? Is it useful for new developers and students? Is it useful for modern development and new COBOL projects? Or is it just useful for old COBOL 85 production code?

I'm sorry if I sound a bit upset at this, but I really really dislike the idea that for a COBOL compiler to be useful in 2023 it must maintain full COBOL 85 backwards compatibility (an almost 40 year old standard) and also implement third-party vendor extensions that might conflict with the COBOL standard.

Some COBOL 85 features might also conflict with COBOL 2023 features, and I really dislike the idea that we would have to give priority to COBOL 85 in order for Otterkit to be useful instead of being able to give the ecosystem the latest and most modern features COBOL can offer, like every other ecosystem does. C compilers have been implementing some C23 features before the standard was even published for example, same for C++ compilers and the C++23 standard.

Yeah, sadly the ecosystem is in the state it currently is with a lot of companies stuck on COBOL 85, but that doesn't mean compiler developers have to be held hostage into eternally supporting mostly only COBOL 85 because of them. At least the committee is doing their best to introduce modern features and move COBOL forward, they have not given up yet and I'm really glad they haven't.

But also one of the saddest things related to software development that I've ever read is this quote from the COBOL standard:

These features have not been implemented as of the writing of this revision by any COBOL provider. Neither users nor implementors have indicated interest or intention for providing these facilities.

The COBOL 2023 standard mentions this for both the Validate facility and the STANDARD-BINARY arithmetic, both of which are incredibly useful for modern development in general and it would be useful for a COBOL compiler to support these.

You do want standard binary arithmetic if you're not working with currency (which is most of the time), and you do want to be able to easily validate the contents of a data item so that you don't have to write the "low level" validation code yourself on every project.

The Validate facility provides easy data item content validation, which is something no other language has, and it was something we could show to new developers as being a great new feature that makes the life of a COBOL developer easier. You just call VALIDATE data-name ... and it will automatically validate the contents according to the validation rules specified by the developer in the data item definition, incredibly easy built-in input validation.

It's so simple and useful, but then why has it not being implemented by anyone yet? My guess is that vendors didn't see it as being profitable enough due to most customers still using mostly COBOL 85 and not willing to touch their code, and so they just never really implemented it. Same for the standard binary arithmetic, because most customers would already be using decimal arithmetic for most things anyway.

This ends up being a sort of feedback loop, where companies refuse to upgrade to modern COBOL and because of that compiler vendors support only mostly COBOL 85 and maybe 2002, but then companies that would upgrade to modern COBOL also can't because compiler vendors don't support it and probably never will.

An analogy for this would be if the Linux kernel completely stopped adding new features and improvements because some companies refused to upgrade to newer versions of the kernel, or if developers stopped working on better internet protocols just because everyone uses TCP anyways (See QUIC and HTTP/3).

Thing is that if we continue like this, eventually COBOL will end up dying with the reputation of being a bad, clunky legacy language because most people never had the opportunity to even see or use modern COBOL code, even though it is an actually good and useful language with some unique and interesting features.

end-rant

We don't want Otterkit to also be stuck on giving priority to COBOL 85 instead of implementing the new standard, and so we as a team (3 engineers) decided to implement the COBOL 2023 standard first, and worry about COBOL 85 backwards compatibility later when we have more resources for it (we also don't have access to the COBOL 85 standard).

Currently we're only 3 engineers and we don't have enough resources to implement both COBOL 85 and COBOL 2023 (and make sure both work independently), maybe if we get more contributors (and companies that need COBOL 85) to help with it we could add a compiler option that switches Otterkit into a "COBOL 85 compatibility" mode.

Until then we can't really prioritize implementing COBOL 85 yet again as we believe that wouldn't really benefit the future of the ecosystem in any way.

What is currently missing from the ecosystem is a free and open source COBOL compiler that supports all modern COBOL features. Most of these features (including object orientation) are currently locked in expensive proprietary compilers.

The current proprietary situation gives almost no opportunity for new developers and students to learn about modern COBOL, so they're either stuck with COBOL 85 or they have to pay a prohibitively expensive compiler licensing fee just to learn new features.

We want to change this and improve the ecosystem, this is Otterkit's primary goal.

0 replies

GitMensch · 2023-04-30T07:26:42Z

GitMensch
Apr 30, 2023

Imagine that a company needs some of COBOL's features in C# and instead of reinventing the wheel they just compile COBOL code using Otterkit to include as a library in their own C# project, they needed a couple of COBOL specific features

Companies normally would not reinvent the wheel of they have working code... And for that they wouldn't want to change their existing copybooks and programs, so it is important to support the most common extensions in production code - which I've outlined above.

For ALTER and ENTER - I totally agree, that's so old - and largely not used any more - that it doesn't need to be implemented.
For new features like VALIDATE and PERFORM WORTH EXCEPTION - I totally agree: please implement those (GnuCOBOL will do the same but while having a quote big user base we don't have much dev resources either and kind of a promise to support legacy code).

For CDF with $IF, well, to provide compatibility you'd need to additional add the extension that existed before it was defined in 2002 and that is the one used by MicroFocus. There's also a completely different version that was added to ACU COBOL, but the MF one is "nearly compatible", so I'd highly suggest to implement that after you consider standard CDF to be working.

If a new coder asks what to use:
For new code: standard, for old code: whatever it used. This is the same for the common style rule (free/fixed, casing, user-defined names, general program flow, ...)

You could implement those with a warning option that defaults to be enabled, like -Wextensions.

0 replies

KTSnowy · 2023-04-30T15:49:32Z

KTSnowy
Apr 30, 2023
Maintainer Author

Hey @GitMensch,

Companies normally would not reinvent the wheel of they have working code... And for that they wouldn't want to change their existing copybooks and programs, so it is important to support the most common extensions in production code - which I've outlined above.

I meant non-COBOL companies developing a new COBOL library for use in their existing non-COBOL code, because there's a lot of useful functionality in COBOL that other languages don't have. These companies would be much more interested in getting support for the newest features from the newer standards than getting support for COBOL 85 and related legacy extensions.

The COBOL ecosystem needs new users as well, that's how software ecosystems grow. Right now legacy COBOL users already have both proprietary and great open source compilers available, but what about completely new users that don't care about legacy? There's not really an option for completely new users looking for a modern development experience with modern COBOL features.

We should be actively encouraging new usage examples for COBOL. We shouldn't keep reinforcing the idea that the only COBOL users are companies maintaining existing legacy production code.

Any completely new user would be stuck in COBOL 85 as well, and that's a huge problem for the whole ecosystem. We need to encourage new developers to learn modern COBOL instead of legacy COBOL, and we should provide them a free and open source compiler that fully supports modern COBOL so that they are able to learn about it.

For companies that are not willing to touch or update their programs and copybooks, it's very unlikely that they would be willing to change the compiler they've been using for the past 40 years as well. It's extremely unlikely they'll decide to use Otterkit because that would involve a change of compilers, and they don't want change given their unwillingness to update any piece of their existing code to modern standard COBOL.

Attempting to provide these "unwilling to touch codebase" companies a brand new compiler is going to be a lost cause, but that doesn't mean that the COBOL ecosystem in general is a lost cause. We can still provide a new compiler for modern scenarios and for new developers that don't have to deal with the legacy aspect of COBOL at all.

For companies that are willing to update but couldn't because of compiler support, they could also see Otterkit as an opportunity to attempt moving their mostly legacy codebase to a fully modern Standard COBOL environment that is completely free of legacy and proprietary aspects.

For CDF with $IF, well, to provide compatibility you'd need to additional add the extension that existed before it was defined in 2002 and that is the one used by MicroFocus.

We shouldn't do that, as I mentioned above we shouldn't provide two different ways of doing the exact same thing, this is specially true for CDF. Supporting both >>IF and $IF would be providing support for two different syntaxes for compiler directives, and that's something no other language has to deal with for good reason.

We as COBOL compiler maintainers should be pushing for more standard conformance, specially for things as fundamental to compatibility for the language as compiler directives. The whole ecosystem should have moved to use the standard CDF syntax to allow for cross-dialect libraries using >>IF and >>EVALUATE to compensate for dialect differences (Just like C and C++ libraries do).

But also, Visual COBOL actually seems to use the standard >>IF syntax instead of $IF.

You could implement those with a warning option that defaults to be enabled, like -Wextensions.

We'll do for useful and unique extensions, but for extensions that would cause duplicate functionality or conflicts/incompatibility with the COBOL standard, I don't think we should accept them as valid syntax. We should provide useful errors to help migrate to the standard syntax, I believe that would be the overall best option.

Duplicate functionality are things like we discussed above: >>IF/$IF, COMP-3/PACKED-DECIMAL and level 78 constants/level 01 constants. These are mostly the exact same functionality but written differently, it's not useful to have both at the same time as valid syntax.

Conflicts and incompatibility with the standard (level 78 constants also conflict a little bit) are things like Visual COBOL's object orientation syntax. You can either implement the Visual COBOL syntax or the Standard COBOL syntax, but due to the fundamental differences these two have you can't implement both and still have a usable compiler.

We're trying to stay as close to the standard as possible and worry about extensions later when we get more time and resources to figure out which ones are worth it to implement. Right now our handwritten parser for COBOL 2023 is mostly complete with ~11500 lines, we really don't want to go back into it and try to fit potentially conflicting third-party extensions without good reason, we also don't even have access to the full technical specification for some of these extensions.

If companies and developers that rely on these features are willing to help figure out the technical specification and implementation that they need then we could support these with an "extension compatibility" option, but until then we really don't have the development resources to support extra third-party extensions on top of COBOL 2023.

0 replies

KTSnowy · 2023-05-01T19:18:15Z

KTSnowy
May 1, 2023
Maintainer Author

Hey @GitMensch, after speaking a bit more with the team we've decided to introduce the concept of "extension gnomes" to Otterkit.

These extension gnomes would be written by the community and they would "hook" into certain parts of the compiler pipeline, allowing the COBOL community and companies to write their own syntax extensions for Otterkit without us having to hardcode these into our standard conforming parser and codegen.

The idea would be for users to get access to some of the parsing tools we made, allowing them to parse their extensions according to the exact rules and spec they require, and then define either an equivalent Standard COBOL code snippet to transpile to or an otterkit runtime C# snippet to generate.

Our current compilation pipeline is modular and flexible enough for us to easily implement this as a usable feature, and could arguably be more useful long term for the community to be able to "import" extensions as needed than to hardcode them into the compiler (which we want to keep strictly standard conforming)

Any feedback for this feature would be greatly appreciated, and please let us know if you have any concerns about it.

2 replies

GitMensch May 1, 2023

The only possible concerns would be:

hard to understand/write gnomes - but that highly depends on the implementation
overall performance drop of the parsing phase (either in general or when a (set of) gnome(s) is active) - but that again depends on the implementation

In general I do like the idea - and of course I suggest the list above to be used as "sample gnomes" that are distributed :-)

KTSnowy May 1, 2023
Maintainer Author

Hey @GitMensch, together with the above suggestions I was also planning on providing samples for the documentation paragraphs (I think they're from COBOL 85?).

These are the AUTHOR, DATE-WRITTEN and others, which as far as I've seen are used for documentation purposes only (please correct me if I'm wrong) and can be safely removed from the AST during parsing (leaving the original file unchanged) after the extension gnome has parsed and checked their syntax.

For simple things like the COMP-3 and COMP-6 the code for the extension gnome should be pretty straight forward on the user side as well, just parse and syntax check the COMP-3/6 (single token) and replace those in the AST with PACKED-DECIMAL [NO SIGN].

We intend to expose several APIs to make it easier to extend the syntax in specific places. I think that if we implement this right we could alleviate a lot of the "required extensions" issues in the ecosystem.

If we can implement an easy and simple enough API then people that require COBOL 85 features for example could just include a "COBOL 85 extension gnomes" bundle to run their code with Otterkit, or just specific parts of it if they don't need all COBOL 85 features.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Otterkit

[Core Dev]: Having trouble understanding the spec for async COBOL #22

{{title}}

Replies: 4 comments 2 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Otterkit

[Core Dev]: Having trouble understanding the spec for async COBOL #22

KTSnowy Apr 30, 2023 Maintainer

Replies: 4 comments · 2 replies

KTSnowy Apr 30, 2023 Maintainer Author

GitMensch Apr 30, 2023

KTSnowy Apr 30, 2023 Maintainer Author

KTSnowy May 1, 2023 Maintainer Author

GitMensch May 1, 2023

KTSnowy May 1, 2023 Maintainer Author

KTSnowy
Apr 30, 2023
Maintainer

Replies: 4 comments 2 replies

KTSnowy
Apr 30, 2023
Maintainer Author

GitMensch
Apr 30, 2023

KTSnowy
Apr 30, 2023
Maintainer Author

KTSnowy
May 1, 2023
Maintainer Author

KTSnowy May 1, 2023
Maintainer Author