-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor expression parsing and checksum checking #773
refactor expression parsing and checksum checking #773
Conversation
This creates a new checksum::Error type. For now it sticks it into the giant global error enum, but we will fold it into an expression-parsing enum in the following commits. This eliminates several instances of the Error::BadDescriptor variant, which is a meaningless stringly-typed variant that we'd like to eliminate.
This is really useful for testing.
When we convert the parser to be non-recursive, it will be simpler if we first do a pass to confirm that the expression is well-formed (all parens match and are balanced, commas in the right place, etc.). After that we can assume the string is well-formed and write an algorithm that basically just translates it into a data structure. Conveniently, because this check is independent of the parsing algorithm, we can add it to the existing code as a pre-check. Since all errors during parsing should occur during the pre-check, we can then turn the existing stringly-typed errors into unreachable!()s. By fuzzing the resulting code we can guarantee that our new pre-check is at least as strict as the existing parser. In a later PR we will generalize the pre-check a bit so that it treats *both* () and {} as parentheses, and markes tree nodes based on which parens are used. But until we change the Taproot parser (which is is an ad-hoc mess of combination manual string parsing and tree parsing, and expects tapleaves to show up as the "name"s of childless nodes in a tree of {} combinators) we can't effectively do this. There is a new unit test `parse_tree_taproot` that will track this progress.
Right now our character validation logic is spread all over the place. We have `expression::check_valid_chars` and `checksum::verify_checksum` and also `checksum::Engine::input` which all do the same checks, sharing the public array `expression::VALID_CHARS` which is arguably an implementation detail and shouldn't be exposed. In fact, whenever we parse any Miniscript object (a script, a descriptor, a policy of either type), we are first validating the checksum and then parsing into a tree, and our tree-parsing re-does character validation that was already done as part of the checksum check. This commit moves the checksum error logic into expression, where it belongs, and folds the checksum error variant into the expression parsing error, where it belongs (removing it from the mega-enum at the root). However it leaves the Taproot parsing alone, which is still a bit of a mess and will be addressed in a later PR. Benchmarks show that this significantly slows down expression parsing (as expected, since it is now doing checksumming) but significantly speeds up descriptor parsing (which is no longer doing multiple checks for character validity).
Benchmarks on master
Benchmarks on this PR
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK eb4f8c4
I love how this is shaping out to be. Thanks for detailed benchmarks.
let expected = eng.checksum_chars(); | ||
let mut actual = ['_'; CHECKSUM_LENGTH]; | ||
for (act, ch) in actual.iter_mut().zip(checksum_str.chars()) { | ||
*act = ch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From chatgpt; with MSRV 1.63; you can avoid the mut and creating the actual with all '_' as
let mut iter = checksum_str.chars()
let actual: [char; CHECKSUM_LENGTH] = core::array::from_fn(|_| iter.next().expect("Len checked eariler))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I will add a commit to my next PR doing this. I'm thrilled to have from_fn
now.
As a step toward rewriting the expression parser to be non-recursive, add a pre-parsing well-formedness check, which verifies that an expression is well-formed, uses only the correct character set, and that the checksum (if present) is valid.
Along the way, replace the error types returned from the
expression
module with a new more-precise one which can identify the location of parse errors (and identify what the error was in a more correct way). This improves our error reporting and drops many instances of the stringly-typedBadDescriptor
error.There is currently a bunch of special logic for Taproot descriptors which have the extra characters
{
and}
. To the extent possible, this PR doesn't touch that logic. It will be addressed in a later PR.The benchmarks show a slight slowdown since we essentially added new validation logic without removing the old logic. Later PRs will improve things.