Selective auto-whitespace #64
base: master
Looks like a useful addition. Can you elaborate on whether inclusions/exclusions for whitespace rules "trickle down" to subrules? For example: S = A. If you say :only S, does that mean no whitespace is inserted (since the tokens actually occur in rule A), or does the whitespace get added because it occurs within S? How about: S = A B. If you say :only A, would that suggest that the C you get to via A includes whitespace, but the version you get to via B does not?
Hm, good question. I would say we want the most precise (and least surprising) meaning, so in the first case it should accept "aaa ", but not "a a a"; and in the second case it should accept "ccc ccc", but not "ccc ccc ". The second case does not currently hold, because auto-whitespace skips alias nodes. I will fix that (provided you agree with the direction). Oh, and a happy new year! :)
I've been thinking about this again. The way it works right now is that optional whitespace is added in front of all tokens (strings or regexes) and at the end of the start rule. Allowing some sort of "trickling down" could be quite confusing, so one way to do this is to say that :only and :except control whether optional whitespace is automatically added in front of every token, i.e., string or regex, in that rule. But if you don't include the start rule, it's totally unclear what should happen with the space at the end. And if you only cover select rules, should those rules allow for spaces at the end of their productions? Based on your answer to my "second case" above, it sounds like you're advocating that optional whitespace be inserted in front of every non-terminal as well. I'm somewhat concerned about how much that would balloon the grammar, but it's an option if that's the most intuitive behavior. I'd welcome another round of input from you on the matter.
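To make the token-level reading above concrete, here is a minimal sketch in Python rather than Clojure, using a toy grammar-as-data model. It is not instaparse's implementation: the `add_auto_whitespace` function, the `("tok", ...)` / `("nt", ...)` item shapes, and the `WS` marker are all hypothetical names invented for illustration. It also assumes one possible answer to the open question above, namely that trailing whitespace is added to the start rule only when the start rule itself is selected.

```python
# Toy model (hypothetical, not instaparse internals):
# a grammar maps a rule name to a list of items, where
#   ("tok", text) is a string or regex token
#   ("nt", name)  is a reference to another rule
WS = ("opt_ws",)  # stands for "optional whitespace"


def add_auto_whitespace(grammar, start, only=None, exclude=None):
    """Return a new grammar with WS in front of every token of each
    selected rule. Non-terminals are left untouched, so nothing
    trickles down into subrules."""

    def selected(name):
        if only is not None and name not in only:
            return False
        if exclude is not None and name in exclude:
            return False
        return True

    rewritten = {}
    for name, body in grammar.items():
        if not selected(name):
            rewritten[name] = list(body)  # rule is copied unchanged
            continue
        new_body = []
        for item in body:
            if item[0] == "tok":
                new_body.append(WS)  # whitespace goes before tokens only
            new_body.append(item)
        if name == start:
            # Assumption: trailing whitespace only if the start rule is selected.
            new_body.append(WS)
        rewritten[name] = new_body
    return rewritten


# The first case from the question: the token lives in A, not in S.
grammar = {"S": [("nt", "A")], "A": [("tok", "a")]}
only_s = add_auto_whitespace(grammar, "S", only={"S"})
only_a = add_auto_whitespace(grammar, "S", only={"A"})
```

Under this model, selecting only S adds nothing in front of the token (it belongs to A) and only appends the trailing whitespace to S, while selecting only A puts whitespace in front of the token and leaves S alone. Inserting whitespace in front of non-terminals as well would be a different rewrite and is not modeled here.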
I think the clearest rule would be to treat auto-whitespacing as a series of rewrite rules; that is, there should be a well-defined expansion that the user could do on her own and that would work the same. Furthermore, to allow composability (my use case, for instance, leans heavily on composability, overwriting and aliasing), these rewrite rules should be local to each rule. As for ballooning, I think it's first of all an ideological question: is Instaparse first and foremost fast (part of the infrastructure), or is it convenient (the hammer you reach for first)? My current needs are in the latter camp, and so is my bias.
What about having some way to mark a Eg:
would insert whitespace before A and B but not before C and D (assuming # is the character used to mark the grouping).
I suspect this would have helped us. We liked using the auto-whitespace, but our application (parsing a GraphQL query document) included literal text in quotes. The auto-whitespace removed the whitespace in that text, and we did not find a way to exclude the productions for such literal text from the auto-whitespace logic, so we reverted to a frequently used hidden
Here is a somewhat rough solution to this problem that I came up with for processing string literals. Remembering that an instaparse parser is just data, you can transform it prior to invocation to strip out the automatically injected whitespace rules. Given a grammar with a string literal rule:
Define the parser as usual, and then strip out the unwanted auto-whitespace rules:
By splitting
That said, I like @danielytics's approach of wrapping a rule in an "atomic construct" like
Added the option to auto-add whitespace only to select tags or to exclude certain tags from having whitespace added.
See tests for examples.