Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support the whole Rust grammar #29

Open
5 of 10 tasks
scrabsha opened this issue Apr 3, 2024 · 0 comments
Open
5 of 10 tasks

Support the whole Rust grammar #29

scrabsha opened this issue Apr 3, 2024 · 0 comments
Assignees

Comments

@scrabsha
Copy link
Owner

scrabsha commented Apr 3, 2024

expandable works quite well, but it only supports a very reduced subset of Rust.

In order to make it complete, we have to write a state machine that accepts every syntactically valid Rust code (and nothing more!). This is currently done thanks to a 260 line-long invocation of the generate_grammar macro. This is very painful to understand and update properly. It certainly won't scale to the whole Rust grammar.

The other solution is to write a parser for Rust in a dsl that is then turned into a state machine by a custom compiler, and replace the generate_grammar stuff with this shiny state machine. This means more code to maintain, document and test, but this would make future development so much easier that it's worth it.

Here are the different steps we will follow:

It could be interesting to benchmark the parser, but I don't consider it an important item.

scrabsha added a commit that referenced this issue Apr 15, 2024
This MR sorts the "expected XXX, YYY, ZZZ" list printed by the
`expandable` proc macro so that its ordering does not depend on how the
grammar is defined. This will make the transition to the compiled parser
smoother (#29).

The sorting algorithm (lexicographic order) is far from perfect, but we
just need to have something vaguely deterministic for now.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Apr 16, 2024
This adds support for `||`, `&&`, `..` and `..=` operators.

This is the last piece of grammar that is required for a 100% seamless
transition, as defined in #29.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Apr 17, 2024
The parser generated by the `gramar_gen` crate works well for parsing,
but the generated state machine pushes _a lot_ of symbols on its stack.
This is fine when parsing, because these symbols are consumed later in
the parsing. This is less ok when we compare multiple repetitions of the
same sequence of tokens, because each repetition is likely to add more
symbols in the stack.

In order to mitigate this, the state machine matcher converges on
transitions (ie: how a state has changed when fed with tokens) rather
than on the final state itself. That's the purpose of this MR. This
relies on the fact that applying a stream of macro on a state will
result on the same transition, no matter how much time the stream has
been applied before.

This approach is much closer to mathematical induction: if we see that
applying one time a repetition yields a valid state ( $P(0)$ ) and
manage to prove that for any amount of repetition, adding one more
repetition yields the same transition ( $P(n) \implies P(n + 1)$ ), then
the resulted expansion is valid for any number of repetition.

Will help for #29.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
@scrabsha scrabsha self-assigned this Apr 17, 2024
scrabsha added a commit that referenced this issue Apr 17, 2024
This PR adds the `grammar.rs` file that defines a parser that parses
exactly what is currently parsed with the `generate_grammar!` state
machine generator.

I decided to commit the `generated.rs` file because we will have to
publish it at some point.

Once #29 is over, it would be a good idea to add support for the `||`
operator is `grammar_gen`.

Part of #29.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Apr 18, 2024
scrabsha added a commit that referenced this issue Apr 22, 2024
Will help for #29.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Apr 27, 2024
Part of #29.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Apr 28, 2024
scrabsha added a commit that referenced this issue May 1, 2024
scrabsha added a commit that referenced this issue May 5, 2024
Part of #29.

Things that are missing:
- Qualified paths in patterns (we don't support them at all)
- Macro calls.

Support for `#[expandable::pat]` has also been added.

I found a cool side-quest while working on this:
ferrocene/specification#494.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue May 6, 2024
#46)

Two key things:
- The generated parser contains _less_ macros, and instead leverages
methods defined in `RustParser`.
- The generated parser is now able to sneakily split some tokens. For
instance, it is necessary to split the `>>` in two separate `>` tokens
when parsing `expr.collect::<Vec<_>>()`.

Will help for #29, espacially path parsing when two `>` are contiguous.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue May 6, 2024
Will help for #29.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Jun 24, 2024
Part of #29.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Jun 24, 2024
Part of #29 I guess?

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Jul 6, 2024
We currently don't handle `where` clauses and generic type parameters.
This will come later.

Part of #29.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Jul 15, 2024
Will be _very_ helpful for #29.

The return atom system allows parsing functions to return atoms to their
caller. The returned atom is available only until the next token is
consumed, and may be overwritten by subsequent calls.

The return value is set by calling `return atom;` and can be tested by
calling the `returned(atom)` predicate.

In addition, the counterexample generator has been tweaked so that it
does not panic when generating a counterexample that contains fragments.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Jul 19, 2024
Title says everything.

Sadly the support is not as complete as I'd like it to be.

It turns out it's incredibly hard to guess if the operand is a block
expression or not. This requires many refactorings that I'm too tired to
do. The result is that the `,` is required at the end of the `match`
arm, even if the operand is a block expression.

Part of #29.

---------

Signed-off-by: Sasha Pourcelot <[email protected]>
scrabsha added a commit that referenced this issue Sep 16, 2024
…61)

Well, it turns out it is not that convenient to write a Rust parser
without using variables. Who could predict? 🤡

This MR completely changes `grammar-gen` and `rust-grammar-dpdfa` so
that functions can take arguments. Internally, `rust-grammar-dpdfa` is
now backed by a virtual machine that runs its own bytecode, and
`grammar-gen` is a compiler that targets this very bytecode. I may add
loops in the future, I don't know.

For what it's worth, the previous approach was to use a pushdown
automation, which did not allow for variables and return values.

I'm actually surprised that most of the debugging I had to do was in the
virtual machine implementation, and I got the codegen correct on the
first attempt.

This is going to make my work on #29 massively simpler.
scrabsha added a commit that referenced this issue Sep 29, 2024
This allows us to stay compatible with the Rust grammar and keep using
its awesome formatter.

Will be useful for #29.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant