Bnfgen is a general purposed BNF grammar based random string generator βοΈ, with a powerful grammar extension to make it more πͺ ergonomic to use.
- Generation of complex
token
is hard to maintain, e.g. numbers, variable
<letter> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M"
| "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
| "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m"
| "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" ;
This is a sample of how to represent a letter in BNF, it is obvious that the maintainability of this syntax is quite low. To address this, we integrate the regular express into our grammar extension seamlessly. Under our extension, the above syntax can simply be written as:
<letter> ::= re("[a-zA-Z]");
- Unpredictable generation result
The generation of recursive rules in BNF is hard to control
<decls> ::= <decl> | <decl> <decls> ;
Without any control, the generation of <decls>
may always choose the second branch,
which lead to an extremely long generation result and even can't stop in limited time.
Thus, we introduce invoke limit and weighted branch to control the generation.
<S> ::= <A> {1, } // should be invoked at least once
| <B> { 5 } // should be invoked exactly 5 times
| <C> {1, 5} // should be invoked at least once and at most 5 times
Noted it is possible that generator has nothing to choose:
<S> ::= <X> | <X> <S> {100};
<X> ::= "foo" {1, 5} | "bar" {10};
But, don't worry, the semantic analysis will help you out. We will give you a warning at the analysis stage.
The design of Bnfgen is heavily influenced by the Rust Programming Language, particularly its outstanding error messages. Bnfgen conducts various semantic analyses on the BNF grammar, aiming to make its error messages as informative as possible.
Γ May be trapped in a dead loop
ββ[5:13]
4 β <term> ::= "Terminal" ;
5 β <A> ::= <B> ;
Β· βββββββ¬ββββββ
Β· β°ββ this rule may be trapped in a dead loop
6 β <B> ::= <C> ;
Β· βββββββ¬ββββββ
Β· β°ββ this rule may be trapped in a dead loop
7 β <C> ::= <A> ;
Β· βββββββ¬ββββββ
Β· β°ββ this rule may be trapped in a dead loop
8 β
β°ββββ
Currently, we support the following semantic analysis:
- Invalid invoke limit range detection
- Undefined rule detection
- Duplicated rule detection
- Unreachable rule detection
- Dead loop detection (which avoid the possible infinite loop in the generation)
- Invoke limit not enough detection
We believe that an informative error message is the key to make the tool more ergonomic to use.
-
Born of this project is highly inspired by Daniil Baturin's work, which is also a BNF based random string generator, but in OCaml π«. The design of the grammar extension is also heavily influenced by his work.
-
I want to thank Andrew Gallant for his work on the regex crate in Rust Eco-system. The incorporation of regular language can't be done so easily without the help of this crate.
-
I'd also like to express my gratitude to Maciej Hirsz and Niko Matsakis, their excellent work on the logos and lalrpop crate makes the parsing of the grammar file a breeze π.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.