-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve parsing performance by reducing token cloning #1587
base: main
Are you sure you want to change the base?
Conversation
I've been sitting around a 10% improvement with each added commit not moving the needled all that much. Then after poking a bit harder at Instruments I realized that some super hot paths are in the token manipulation methods themselves. This adds an 18% improvement against my previous commit which gives a grand total of about 28% on both benchmarks.
Except for a single instance, every use of expect_keyword was ignoring the returned token. This adds a new `expect_keyword_is` that avoids that unnecessary clone. I nearly added a `#[must_use]` attribute to the `expect_keyword` method, but decided against it as that feels like a breaking API change even if it would nudge folks toward the correct method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @davisp -- this is a very nice PR ❤️
I ran the benchmarks as well on a linux machine and see the same improvements you did. Very nice
++ critcmp main reduce-token-cloning
group main reduce-token-cloning
----- ---- --------------------
sqlparser-rs parsing benchmark/sqlparser::select 1.46 6.4±0.03µs ? ?/sec 1.00 4.4±0.02µs ? ?/sec
sqlparser-rs parsing benchmark/sqlparser::with_select 1.30 30.7±0.09µs ? ?/sec 1.00 23.5±0.10µs ? ?/sec
++ popd
~/datafusion-benchmarking
FYI @iffyio and @Dandandan
@@ -186,6 +186,15 @@ impl std::error::Error for ParserError {} | |||
// By default, allow expressions up to this deep before erroring | |||
const DEFAULT_REMAINING_DEPTH: usize = 50; | |||
|
|||
// A constant EOF token that can be referenced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
This might be useful enough to make pub const
as well (so users of the crate can do the same thing)
@@ -3376,22 +3407,26 @@ impl<'a> Parser<'a> { | |||
matched | |||
} | |||
|
|||
pub fn next_token(&mut self) -> TokenWithSpan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add doc comments to these methods as well?
Maybe we can also mention "see next_token_ref
for a faster, non copying API"
/// If the current token is the `expected` keyword, consume the token. | ||
/// Otherwise, return an error. | ||
/// | ||
/// This differs from expect_keyword only in that the matched keyword |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be really nice to add a link in the docs too:
/// This differs from expect_keyword only in that the matched keyword | |
/// This differs from [`Self::expect_keyword`] only in that the matched keyword |
&format!("one of {}", keywords.join(" or ")), | ||
self.peek_token(), | ||
self.peek_token_ref(), | ||
) | ||
} | ||
} | ||
|
||
/// If the current token is the `expected` keyword, consume the token. | ||
/// Otherwise, return an error. | ||
pub fn expect_keyword(&mut self, expected: Keyword) -> Result<TokenWithSpan, ParserError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as a follow on PR we could mark this API as deprecated to locate other uses of this API as well as help downstream consumers upgrade 🤔
datafusion-sqlparser-rs/src/tokenizer.rs
Line 604 in 5de5312
#[deprecated(since = "0.53.0", note = "please use `TokenWithSpan` instead")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @davisp!
This basically just takes the approach proposed by @alamb in #1561 and runs with it.
Token
s as much #1558The main changes involved:
&TokenWithSpan
instead of a clonedTokenWithSpan
.Results on my M1 MacBook Pro: