insta/re-parse #45

mnemnion · 2013-08-16T15:35:09Z

Hi Mark,

I needed this function for a project I'm working on, and I thought it was broadly useful enough to propose for the API. I suspect this function will get repeatedly rewritten if not provided.

Two use cases, and I'm using both already. The first is macro expansion: there might be macros in the macros, so we have to do insta/transform as many times as necessary until there aren't any macros left. Then, we need to completely flatten everything and reparse with the same rules.

The second is embedded code: Imagine a simple config language, where one option may contain some Scheme. It may not even be in there, and we might have a lot of config to do quickly, say on a boot. The config language is designed to be parsed with regular expressions, which is common. The parser will run faster if we just scoop the Scheme and parse it in a separate pass, with a different grammar.

An elegant extension of the grammar is possible with this function, namely progressive refinement. Currently if you provide more than one rule with the same name, Instaparse uses the last one it encounters. Actually you may want to retain this behavior, since it lets us hot-load a different version of a rule. The progressive refinement option is to parse the tree using the first set of definitions of all rules. Then, if there are remaining rules, it transforms the collected nodes against the next set of definitions, and so on if there are more than two. I think this doesn't require a separate pass, conceptually, but insta/re-parse makes it cheap to build this way. There are tradeoffs.

mnemnion · 2013-08-16T15:39:09Z

As an aside, one of the grammars I provided for demo produces an enlive tree as ({...}). The other produces it as {...}. I couldn't figure out why this was; one is built from a string, the other from a file. The code reflects the need to sometimes strip the containing list. My suspicion is that providing the tree as a list, rather than a map, is correct Enlive, since a master node then has the same syntax as any other node.

Engelberg · 2013-08-16T17:36:57Z

I believe {} to be the correct output for Enlive format. If you are getting a sequence, most likely you have hidden the name of the root tag, so there's no tag to give to the top-level node. Whether the grammar comes from a string or file shouldn't have any bearing on this behavior. If it is not the case that you have hidden the top-level tag and you're getting back a sequence, please send me an example, as I consider that to be a bug. Thanks!

I'll take a look at your pull request soon-ish.

mnemnion · 2013-08-16T19:44:53Z

The top level was a hidden tag, so you nailed that. I don't know that it
changes the code I wrote, which needs to strip the list around the val of a
:content tag anyway.

cheers,
-Sam.

On Fri, Aug 16, 2013 at 10:36 AM, Mark Engelberg
[email protected]:

I believe {} to be the correct output for Enlive format. If you are
getting a sequence, most likely you have hidden the name of the root tag,
so there's no tag to give to the top-level node. Whether the grammar comes
from a string or file shouldn't have any bearing on this behavior. If it is
not the case that you have hidden the top-level tag and you're getting back
a sequence, please send me an example, as I consider that to be a bug.
Thanks!

I'll take a look at your pull request soon-ish.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/45#issuecomment-22781419
.

Engelberg · 2013-08-18T06:06:02Z

Can you give me a concrete example of your first use case?

mnemnion · 2013-08-18T19:36:21Z

Sure!

You've got some code under a node, and one of the tags is :macro. you use
insta/transform to expand :macro, and now you have to re-parse the
:macro node, because there might be a :macro in it. If so, you transform
again, until there's no :macro left. Now, you have to take nodes that had
macros in them and re-parse the whole thing, because you may well have a
very different structure now.

Does that make sense?

On Sat, Aug 17, 2013 at 11:06 PM, Mark Engelberg
[email protected]:

Can you give me a concrete example of your first use case?

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/45#issuecomment-22825372
.

Engelberg · 2013-08-18T19:45:28Z

Clojure macros essentially transform an abstract syntax tree into another
abstract syntax tree. I see that macro transforms might need to be applied
repeatedly until no macros remain, but I don't see why any re-parsing would
be required. Is this some other kind of macro than Clojure's, like a text
replacement macro?

mnemnion · 2013-08-18T20:27:38Z

Definitely not a clojure-type macro. More like a preprocessor or Knuth
style literate expansion macro.

On Sun, Aug 18, 2013 at 12:45 PM, Mark Engelberg
[email protected]:

Clojure macros essentially transform an abstract syntax tree into another
abstract syntax tree. I see that macro transforms might need to be applied
repeatedly until no macros remain, but I don't see why any re-parsing would
be required. Is this some other kind of macro than Clojure's, like a text
replacement macro?

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/45#issuecomment-22837623
.

Engelberg · 2013-08-20T00:29:47Z

OK, I see now what you're driving at. There are some things I need to
think about, for example, whether the error messages are sufficiently
meaningful in a reparse. These are things that I'll also need to address
when dealing with parsing pre-tokenized strings, which is another open
feature request. So I'm going to leave this open for now.

Note to others who find this thread: I would welcome further comments along
the lines of "Yes this sounds useful", or "I think the API for something
like this should be more specific / more granular" or "I think the API
should be more general and do ___."

mnemnion added 5 commits August 15, 2013 17:29

re-parse working

43f0b3e

hmm

2e73e1f

hiccup works

3f8e4d5

re-parse works w. both hiccup and enlive

3db945e

one less fn

97fe284

Engelberg mentioned this pull request Dec 9, 2015

Best practice for languages that allow for the dynamic definition of operators #120

Closed

Engelberg mentioned this pull request Jun 29, 2016

[FR] tree → string emitter #140

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

insta/re-parse #45

insta/re-parse #45

mnemnion commented Aug 16, 2013

mnemnion commented Aug 16, 2013

Engelberg commented Aug 16, 2013

mnemnion commented Aug 16, 2013

Engelberg commented Aug 18, 2013

mnemnion commented Aug 18, 2013

Engelberg commented Aug 18, 2013

mnemnion commented Aug 18, 2013

Engelberg commented Aug 20, 2013

insta/re-parse #45

Are you sure you want to change the base?

insta/re-parse #45

Conversation

mnemnion commented Aug 16, 2013

mnemnion commented Aug 16, 2013

Engelberg commented Aug 16, 2013

mnemnion commented Aug 16, 2013

Engelberg commented Aug 18, 2013

mnemnion commented Aug 18, 2013

Engelberg commented Aug 18, 2013

mnemnion commented Aug 18, 2013

Engelberg commented Aug 20, 2013