-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(ABNF) instaparse considers forms in reverse order #136
Comments
The ABNF specification uses the I didn't really expect this to be a problem because I figured most users of ABNF would be using it to deal with official specs written in that notation, and I assumed official specs would not be ambiguous grammars. This is the first I've heard of an official spec being ambiguous, so I'll need to think about how to deal with that. (Can you point me to the spec where you are having this problem, just so I know for reference)? As a little background, let me explain why instaparse processes unordered choice in reverse order. My hope has always been that one of these days, I'd like to release a version of instaparse that can process unordered choices concurrently using multiple threads. So I really wanted to drive home the point that users should think of unordered choice as non-deterministic, because someday it likely will be. I feared that if I processed the choices in order, users would start to rely on that left-to-right behavior, and then eventually be upset if they can't take advantage of parallelism later because the non-determinism breaks their grammar. So I decided rather intentionally to process the choices right-to-left, so that if you have an ambiguous grammar but you are mentally making the assumption that it will be processed in a specific order, you'll get a surprising result and realize that your grammar is ambiguous. This forces people to either make their grammar unambiguous or use ordered choice (which will explicitly limit parallelism later). I can see how this might be frustrating in the context of ABNF, where you don't have easy access to ordered choice, but like I said, I didn't expect it to be an issue with ABNF grammars. One option would be, like you said, to make this into ordered choice. Another option would be to just make ABNF process choices left to right. The advantage of the first option is that the feature was designed to behave correctly when parallelism is implemented. The disadvantage is that this parallel-proof design of ordered choice is consequently rather complex, and was meant to be used judiciously. Using it everywhere probably wouldn't be good performance-wise and has some subtle interactions with negative lookahead that can be surprising. A simple left-to-right processing would perform the best, but then I'd have to special case ABNF and not apply parallelism for such grammars in the future. Dilemmas. So this is something I'd need to think about. I'd welcome more input from you about your specific problem and what solution would work best for you. |
I was rather afraid of that since I couldn't actually find any documentation specifying how ABNF should be evaluated, just it's specification. The grammar I'm working with is the SPF Specification, where it looks like it makes the left-to-right assumption in two places:
I haven't run into the second yet, but it looks less painful to me since key is basically an alphanumeric string with a few extra allowed characters anyway. The first is what prompted me to open this bug report, and means that all modifiers are parsed as unknown modifiers. I can work around that one too, but it's worth noting that in trying to fix it, I found one of your tests has a similar issue, in abnf_uri.txt you have The result of this is that in the test where you parse Looking ahead on my project to the dkim spec, it looks like it's making similar assumptions. I spotted a few examples there, but here's the simplest.
Even SMTP isn't immune:
So it's looking to me like the order may be unspecified, but the assumption is widespread. |
Hi, I just hit this bug - using this grammar, that also assumes that alternatives are considered left-to-right. (the grammar is quite complex so I won't explain where the problem was, but I can detail if needed) I was able to get the desidered behaviour by applying the fix in PR #137 ( @Engelberg what's your opinion on this? |
I'm glad you were able to apply the workaround. I've decided that in the next major release of instaparse, I'm going to make left-to-right processing the default, but provide some to-be-determined mechanism for backwards compatibility. Maybe, like you suggested, a dynamic variable is the way to go. If you have any other thoughts on the best way to incorporate this without screwing people who have come to rely on right-to-left processing, I'm all ears. My plan is for this to be the new default in both EBNF and ABNF grammars, so no special casing of ABNF will be needed. I'm not ready to commit to a release date, but my work load is getting a little lighter soon so I hope to have time to revisit this in the near future. |
given grammars
test1.abnf
and
test2.abnf
and parsers
It looks to me like instaparse in considering the forms for test in reverse order
The text was updated successfully, but these errors were encountered: