-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support the whole Rust grammar #29
Comments
scrabsha
added a commit
that referenced
this issue
Apr 15, 2024
This MR sorts the "expected XXX, YYY, ZZZ" list printed by the `expandable` proc macro so that its ordering does not depend on how the grammar is defined. This will make the transition to the compiled parser smoother (#29). The sorting algorithm (lexicographic order) is far from perfect, but we just need to have something vaguely deterministic for now. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
This was referenced Apr 15, 2024
scrabsha
added a commit
that referenced
this issue
Apr 16, 2024
This adds support for `||`, `&&`, `..` and `..=` operators. This is the last piece of grammar that is required for a 100% seamless transition, as defined in #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Apr 17, 2024
The parser generated by the `gramar_gen` crate works well for parsing, but the generated state machine pushes _a lot_ of symbols on its stack. This is fine when parsing, because these symbols are consumed later in the parsing. This is less ok when we compare multiple repetitions of the same sequence of tokens, because each repetition is likely to add more symbols in the stack. In order to mitigate this, the state machine matcher converges on transitions (ie: how a state has changed when fed with tokens) rather than on the final state itself. That's the purpose of this MR. This relies on the fact that applying a stream of macro on a state will result on the same transition, no matter how much time the stream has been applied before. This approach is much closer to mathematical induction: if we see that applying one time a repetition yields a valid state ( $P(0)$ ) and manage to prove that for any amount of repetition, adding one more repetition yields the same transition ( $P(n) \implies P(n + 1)$ ), then the resulted expansion is valid for any number of repetition. Will help for #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Apr 17, 2024
This PR adds the `grammar.rs` file that defines a parser that parses exactly what is currently parsed with the `generate_grammar!` state machine generator. I decided to commit the `generated.rs` file because we will have to publish it at some point. Once #29 is over, it would be a good idea to add support for the `||` operator is `grammar_gen`. Part of #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Apr 18, 2024
…ry (#36) Part of #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Apr 22, 2024
Will help for #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Apr 27, 2024
Part of #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Apr 28, 2024
Part of #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
May 1, 2024
Part of #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
May 5, 2024
Part of #29. Things that are missing: - Qualified paths in patterns (we don't support them at all) - Macro calls. Support for `#[expandable::pat]` has also been added. I found a cool side-quest while working on this: ferrocene/specification#494. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
May 6, 2024
#46) Two key things: - The generated parser contains _less_ macros, and instead leverages methods defined in `RustParser`. - The generated parser is now able to sneakily split some tokens. For instance, it is necessary to split the `>>` in two separate `>` tokens when parsing `expr.collect::<Vec<_>>()`. Will help for #29, espacially path parsing when two `>` are contiguous. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
Merged
scrabsha
added a commit
that referenced
this issue
May 6, 2024
Will help for #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Jun 24, 2024
Part of #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Jun 24, 2024
Part of #29 I guess? Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Jul 6, 2024
We currently don't handle `where` clauses and generic type parameters. This will come later. Part of #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Jul 15, 2024
Will be _very_ helpful for #29. The return atom system allows parsing functions to return atoms to their caller. The returned atom is available only until the next token is consumed, and may be overwritten by subsequent calls. The return value is set by calling `return atom;` and can be tested by calling the `returned(atom)` predicate. In addition, the counterexample generator has been tweaked so that it does not panic when generating a counterexample that contains fragments. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
Merged
scrabsha
added a commit
that referenced
this issue
Jul 19, 2024
Title says everything. Sadly the support is not as complete as I'd like it to be. It turns out it's incredibly hard to guess if the operand is a block expression or not. This requires many refactorings that I'm too tired to do. The result is that the `,` is required at the end of the `match` arm, even if the operand is a block expression. Part of #29. --------- Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha
added a commit
that referenced
this issue
Sep 16, 2024
…61) Well, it turns out it is not that convenient to write a Rust parser without using variables. Who could predict? 🤡 This MR completely changes `grammar-gen` and `rust-grammar-dpdfa` so that functions can take arguments. Internally, `rust-grammar-dpdfa` is now backed by a virtual machine that runs its own bytecode, and `grammar-gen` is a compiler that targets this very bytecode. I may add loops in the future, I don't know. For what it's worth, the previous approach was to use a pushdown automation, which did not allow for variables and return values. I'm actually surprised that most of the debugging I had to do was in the virtual machine implementation, and I got the codegen correct on the first attempt. This is going to make my work on #29 massively simpler.
scrabsha
added a commit
that referenced
this issue
Sep 29, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
expandable
works quite well, but it only supports a very reduced subset of Rust.In order to make it complete, we have to write a state machine that accepts every syntactically valid Rust code (and nothing more!). This is currently done thanks to a 260 line-long invocation of the
generate_grammar
macro. This is very painful to understand and update properly. It certainly won't scale to the whole Rust grammar.The other solution is to write a parser for Rust in a dsl that is then turned into a state machine by a custom compiler, and replace the
generate_grammar
stuff with this shiny state machine. This means more code to maintain, document and test, but this would make future development so much easier that it's worth it.Here are the different steps we will follow:
rustc_parse
or evensyn
.generate_grammar
state machine, use the generated state machine instead.||
operator ingrammar_gen
grammar.rs
weird_exprs.rs
may be a challenging one :DIt could be interesting to benchmark the parser, but I don't consider it an important item.
The text was updated successfully, but these errors were encountered: