Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support *print-readably* for Failure and Parser print-method #230

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tonsky
Copy link

@tonsky tonsky commented Oct 4, 2023

This is important for machine-readable REPLs like nREPL and Clojure Sublimed Socket REPL to operate. They have to print values to send them over the network in machine-readable format. Similar to what prn does compared to println

Before:
Screenshot 2023-10-04 at 16 43 21

After:
Screenshot 2023-10-04 at 16 43 06

@Engelberg
Copy link
Owner

I'm currently in the process of evaluating instaparse pull requests. Thanks for submitting this.

I would appreciate some more background information about what sort of problem you're seeing in a repl with the current behavior, as I haven't seen the issue myself, and I use nREPL via CIDER.

My concern with making this change is that print-readably is, by default, true, so this would potentially be a breaking change as it would change the default behavior for everybody. This is not an opt-in modification.

This change does indeed break one of the tests in core_test.cljc, specifically the round-trip test.

round-trip is defined as follows:

(defn round-trip [parser]
  (insta/parser (prn-str parser)))

This round-trip definition, in my mind, is the basic promise of printing readably, and instaparse currently fulfills this. For example, in Clojure, if you evaluate (/ 2 3), it prints at the REPL 2/3. Note that it does not print out the string "2/3". It prints a value that, if you tried to read it directly (read 2/3) would generate an error message. Instead, it's a value that, when wrapped in a string, can be read. So (read "2/3") works. read only works on strings, but not everything prints as an explicit string, and that's why the expectation is that the user or some other process is responsible for wrapping it in a string before trying to read it.

Analogously, evaluating an instaparse parser with a REPL using the default clojure printer prints a value that can't be read directly; you or some other process is responsible for wrapping it in a string if you want to work with it as a value. I don't think there's any way for me to modify the built-in clojure reader to support this notation, so the best I can do is guarantee that the string can be read by the parser function. It seems like this is the desired behavior. With your change, the output is surrounded by double quotes, so if you wrap that in a string and read it, it's readable in the sense that the Clojure reader understands it to be a string value and would return a string. But the string value is not actually a parser, so I'm not sure what's been accomplished.

I acknowledge there are some limitations to the current print method, for example, the printed output does not capture a parser's desired output format. There is some precedent for that within Clojure, for example, sorted-sets and sets print the same. But for true serialization needs, (binding [*print-dup* true] ... is supported. I would think that if you were in an environment that was reliant on sending the output over a wire, you'd require a stronger promise than simply being readable, you'd want it to read as the correct object. Therefore, I would think you'd start your session with (set! *print-dup* true). And for more casual need to see the underlying record representation, you can use a pretty printer such as Clojure's pprint.

@tonsky
Copy link
Author

tonsky commented Mar 19, 2024

IIRC the main confusion was not about sending over network or escaping. The main problem was trying to understand what did that function actually returned. Normally you can just prn and parse, there is very limited number of types possible. But if you hijack printing all bets are off. In my experience so far, Instaparse is the only one that does that.

Alternatively, REPL can always just print value and not try to understand what it is. But then you can’t do e.g. pretty-printing, at least not on the client side.

@Engelberg
Copy link
Owner

I think I understand your point about the printing making it harder to see exactly what the type is. But the type definitely is there, and the REPL doesn't really seem to have a problem with it as far as I can tell. My CIDER REPL is currently configured to use Clojure's pprint to display REPL output, and so my interaction currently looks like this:

user> (def as-and-bs (insta/parser "S = AB* AB = A B A = 'a'+ B='b'+"))
#'user/as-and-bs
user> as-and-bs
{:grammar
 {:S
  {:tag :star,
   :parser {:tag :nt, :keyword :AB},
   :red {:reduction-type :hiccup, :key :S}},
  :AB
  {:tag :cat,
   :parsers ({:tag :nt, :keyword :A} {:tag :nt, :keyword :B}),
   :red {:reduction-type :hiccup, :key :AB}},
  :A
  {:tag :plus,
   :parser {:tag :string, :string "a"},
   :red {:reduction-type :hiccup, :key :A}},
  :B
  {:tag :plus,
   :parser {:tag :string, :string "b"},
   :red {:reduction-type :hiccup, :key :B}}},
 :start-production :S,
 :output-format :hiccup}
user> (type as-and-bs)
instaparse.core.Parser

I've never understood why Clojure's pprint doesn't show the type of a record; that's another example of the type not being readily available from printing. But pprint does show you the underlying map, so this is normal pprint behavior. The point here is that overriding the print-method does not really affect any of the other printers, or the REPL's ability to work with the value, as far as I know.

I think if I were doing it all over again, I'd be more reluctant to override the default printer, but the advantage is that the output is much easier to understand than the grammar in data form, which doesn't make sense unless you understand the implementation, or at least understand the various combinators that are used to build the parser. Most people seem to like that ease of understanding, rather than being able to see the innards. Probably the error printing is the most valuable, as the linebreaks and whitespaces in the printing places the ^ under the error. I know some instaparse applications have utilized this by capturing this error output and displaying it to the user, as-is. That's the biggest reason I can't lightly change this, as I know some people capture and display this output.

@tonsky
Copy link
Author

tonsky commented Mar 20, 2024

This PD only changes output when you pr stuff, while keeping print output the same. If you pr Failure and see a string without quotes it feels wrong, at least to me.

I’m not super invested in this, I already fixed this for myself, so I’ll leave the decision of what’s more important: less unexpected behaviour or backwards compatibility up to you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants