A parser generator where rules are defined as Go structs and code generation is optional. The concepts are introduced in the simple example below. There are also examples in the `examples/` directory, the most interesting being `examples/json`.
Rules can be written as Go structs. Here is a definition for s-exprs:
```go
// SExpr ::= Number | String | Atom | List
type SExpr struct {
	grammar.OneOf                // This rule is a "one-of": one of the rules below has to match
	Number *Token `tok:"number"` // A pointer field means an optional match
	String *Token `tok:"string"` // This "tok" tag means only tokens of type "string" will match
	Atom   *Token `tok:"atom"`
	*List                        // All fields in a one-of rule must be optional
}
```
```go
// List ::= "(" [Item] ")"
type List struct {
	grammar.Seq                            // This is a sequence rule: all fields must match in order
	OpenBkt  grammar.Match `tok:"bkt,("`   // This "tok" tag means only "bkt" tokens with value "(" will match
	Items    []SExpr
	CloseBkt grammar.Match `tok:"bkt,)"`
}
```
The `grammar.Match` type above is an empty struct, so it takes no space in the structure, but it only matches the token specification in the `tok` tag.
This is not quite complete, as you need a `Token` type. You can create your own or, if your needs are simple enough, use `grammar.SimpleToken`:

```go
type Token = grammar.SimpleToken
```
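If you do write your own token type, it has to satisfy the token interface that the grammar package expects. Below is a minimal sketch of what that could look like; the `Type()` / `Value()` method set is an assumption based on how tokens are described in this example, so check the package's `Token` definition for the actual contract.

```go
// MyToken is a hypothetical custom token type. The Type/Value method set
// shown here is an assumption about what the grammar package requires;
// consult the package's Token interface for the real contract.
type MyToken struct {
	kind string // token type, e.g. "number", "string", "atom" or "bkt"
	text string // the raw text matched by the tokeniser
}

func (t MyToken) Type() string  { return t.kind }
func (t MyToken) Value() string { return t.text }
```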
In order to parse your s-exprs into the data structures above, you also need a tokeniser. You can make your own tokeniser, or you can build one simply with the `grammar.SimpleTokeniser` function:
```go
var tokenise = grammar.SimpleTokeniser([]grammar.TokenDef{
	{
		// If Name is empty, the token is skipped in the token stream
		Ptn: `\s+`, // This is a regular expression (it must not contain groups)
	},
	{
		Name: "bkt", // This is the type of the token seen in the "tok" struct tag
		Ptn:  `[()]`,
	},
	{
		Name: "string",
		Ptn:  `"[^"]*"`,
	},
	{
		Name: "number",
		Ptn:  `-?[0-9]+(?:\.[0-9]+)?`,
	},
	{
		Name: "atom",
		Ptn:  `[a-zA-Z_][a-zA-Z0-9_-]*`,
	},
})
```
Now, putting all this together, you can parse an s-expr of your choice:
```go
tokenStream, _ := tokenise(`(cons a (list 123 "c"))`)
var sexpr SExpr
err := grammar.Parse(&sexpr, tokenStream)
```
Now `sexpr`'s fields have been filled in, and you can explore the syntax tree by traversing the fields of `sexpr`. E.g. `sexpr.List.Items` is now a slice of `SExpr`s, and `sexpr.List.Items[0].Atom` is a `Token` with value `"cons"` (and type `atom`).
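As an illustration, here is a minimal sketch of inspecting the tree by hand. It assumes the parse above succeeded and that a token's text is read with a `Value()` accessor, as suggested above; `fmt` and `log` are from the standard library.

```go
if err != nil {
	log.Fatal(err)
}
for _, item := range sexpr.List.Items {
	// SExpr is a one-of rule, so exactly one of these fields is non-nil.
	switch {
	case item.Atom != nil:
		fmt.Println("atom:", item.Atom.Value())
	case item.Number != nil:
		fmt.Println("number:", item.Number.Value())
	case item.String != nil:
		fmt.Println("string:", item.String.Value())
	case item.List != nil:
		fmt.Println("nested list with", len(item.List.Items), "items")
	}
}
```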
There is a convenient function to output a rule struct:

```go
grammar.PrettyWrite(sexpr, os.Stdout)
```
This will output a pretty representation of `sexpr`:
```
SExpr {
  List: List {
    OpenBkt: {}
    Items: [
      SExpr {
        Atom: {atom cons}
      }
      SExpr {
        Atom: {atom a}
      }
      SExpr {
        List: List {
          OpenBkt: {}
          Items: [
            SExpr {
              Atom: {atom list}
            }
            SExpr {
              Number: {number 123}
            }
            SExpr {
              String: {string "c"}
            }
          ]
          CloseBkt: {}
        }
      }
    ]
    CloseBkt: {}
  }
}
```
WARNING: the parser generator is currently out of sync with the grammar definition, so don't use it :)
The above works using reflection, which is fine but can be a little slow if you are parsing very big files. It is also possible to compile a parser.
First you need to install the command that generates the parser:

```
go install github.com/arnodel/grammar/cmd/genparse
```
Then the simplest approach is to add this line to the file containing the grammar you want to compile:

```go
//go:generate genparse
```
You can now generate the compiled parser by running `go generate` in your package. If your file is called, say, `grammar.go`, this will generate a file in the same package called `grammar.compiled.go`. You can still parse files the same way as before using `grammar.Parse()`, but this will no longer use reflection! Note that you can always force reflection to be used by compiling your program with the `nocompiledgrammar` build tag.
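For instance, with a standard Go toolchain (the `./...` pattern is just one way to target your package):

```
go generate ./...                      # writes grammar.compiled.go next to grammar.go
go build -tags nocompiledgrammar ./... # opt back into the reflection-based parser
```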