Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

insta/re-parse #45

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

insta/re-parse #45

wants to merge 5 commits into from

Conversation

mnemnion
Copy link
Contributor

Hi Mark,

I needed this function for a project I'm working on, and I thought it was broadly useful enough to propose for the API. I suspect this function will get repeatedly rewritten if not provided.

Two use cases, and I'm using both already. The first is macro expansion: there might be macros in the macros, so we have to do insta/transform as many times as necessary until there aren't any macros left. Then, we need to completely flatten everything and reparse with the same rules.

The second is embedded code: Imagine a simple config language, where one option may contain some Scheme. It may not even be in there, and we might have a lot of config to do quickly, say on a boot. The config language is designed to be parsed with regular expressions, which is common. The parser will run faster if we just scoop the Scheme and parse it in a separate pass, with a different grammar.

An elegant extension of the grammar is possible with this function, namely progressive refinement. Currently if you provide more than one rule with the same name, Instaparse uses the last one it encounters. Actually you may want to retain this behavior, since it lets us hot-load a different version of a rule. The progressive refinement option is to parse the tree using the first set of definitions of all rules. Then, if there are remaining rules, it transforms the collected nodes against the next set of definitions, and so on if there are more than two. I think this doesn't require a separate pass, conceptually, but insta/re-parse makes it cheap to build this way. There are tradeoffs.

@mnemnion
Copy link
Contributor Author

As an aside, one of the grammars I provided for demo produces an enlive tree as ({...}). The other produces it as {...}. I couldn't figure out why this was; one is built from a string, the other from a file. The code reflects the need to sometimes strip the containing list. My suspicion is that providing the tree as a list, rather than a map, is correct Enlive, since a master node then has the same syntax as any other node.

@Engelberg
Copy link
Owner

I believe {} to be the correct output for Enlive format. If you are getting a sequence, most likely you have hidden the name of the root tag, so there's no tag to give to the top-level node. Whether the grammar comes from a string or file shouldn't have any bearing on this behavior. If it is not the case that you have hidden the top-level tag and you're getting back a sequence, please send me an example, as I consider that to be a bug. Thanks!

I'll take a look at your pull request soon-ish.

@mnemnion
Copy link
Contributor Author

The top level was a hidden tag, so you nailed that. I don't know that it
changes the code I wrote, which needs to strip the list around the val of a
:content tag anyway.

cheers,
-Sam.

On Fri, Aug 16, 2013 at 10:36 AM, Mark Engelberg
[email protected]:

I believe {} to be the correct output for Enlive format. If you are
getting a sequence, most likely you have hidden the name of the root tag,
so there's no tag to give to the top-level node. Whether the grammar comes
from a string or file shouldn't have any bearing on this behavior. If it is
not the case that you have hidden the top-level tag and you're getting back
a sequence, please send me an example, as I consider that to be a bug.
Thanks!

I'll take a look at your pull request soon-ish.


Reply to this email directly or view it on GitHubhttps://github.com//pull/45#issuecomment-22781419
.

@Engelberg
Copy link
Owner

Can you give me a concrete example of your first use case?

@mnemnion
Copy link
Contributor Author

Sure!

You've got some code under a node, and one of the tags is :macro. you use
insta/transform to expand :macro, and now you have to re-parse the
:macro node, because there might be a :macro in it. If so, you transform
again, until there's no :macro left. Now, you have to take nodes that had
macros in them and re-parse the whole thing, because you may well have a
very different structure now.

Does that make sense?

On Sat, Aug 17, 2013 at 11:06 PM, Mark Engelberg
[email protected]:

Can you give me a concrete example of your first use case?


Reply to this email directly or view it on GitHubhttps://github.com//pull/45#issuecomment-22825372
.

@Engelberg
Copy link
Owner

Clojure macros essentially transform an abstract syntax tree into another
abstract syntax tree. I see that macro transforms might need to be applied
repeatedly until no macros remain, but I don't see why any re-parsing would
be required. Is this some other kind of macro than Clojure's, like a text
replacement macro?

@mnemnion
Copy link
Contributor Author

Definitely not a clojure-type macro. More like a preprocessor or Knuth
style literate expansion macro.

On Sun, Aug 18, 2013 at 12:45 PM, Mark Engelberg
[email protected]:

Clojure macros essentially transform an abstract syntax tree into another
abstract syntax tree. I see that macro transforms might need to be applied
repeatedly until no macros remain, but I don't see why any re-parsing would
be required. Is this some other kind of macro than Clojure's, like a text
replacement macro?


Reply to this email directly or view it on GitHubhttps://github.com//pull/45#issuecomment-22837623
.

@Engelberg
Copy link
Owner

OK, I see now what you're driving at. There are some things I need to
think about, for example, whether the error messages are sufficiently
meaningful in a reparse. These are things that I'll also need to address
when dealing with parsing pre-tokenized strings, which is another open
feature request. So I'm going to leave this open for now.

Note to others who find this thread: I would welcome further comments along
the lines of "Yes this sounds useful", or "I think the API for something
like this should be more specific / more granular" or "I think the API
should be more general and do ___."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants