Decode regression #1564
Comments
Oh wow, thanks a million for the clear report!
I'll add this to benches! 🆙
I checked out the commit you mentioned; I had to create a custom branch to test:
You mean you don't see a regression at head? I just synced up to confirm the performance drop. Don't look at the change %, as that seems to report changes since the last run of the benchmark; just compare absolute values. E.g., here is what I got on two consecutive runs on head:
I'll check again! I was getting 630ns for both branches, but let me make sure of that!
Yeah, I cherry-picked it, I think, to run before/after. I'll check again.
Since encode perf was improved in the 0.20 release, are there any plans to look at this?
Yep, for sure 😉 As you've seen, I was more focused on encode, but I'll pick this back up.
@ArthurZucker any updates? :) |
This is not a recent regression, and perhaps it won't be fixed for that reason, but I thought I'd file it anyway.
I maintain Go bindings for this library, and by sheer luck I had benchmarks when I started. At some point I noticed a regression in decoding, but only now got around to investigating it. Long story short, I bisected this repo and root-caused it to this PR. Below is the benchmark used to find it.
Regression details:
While decode is pretty fast (on the order of microseconds), a +240% slowdown is fairly big, and I wonder if we can regain that performance.
Benchmark code (tokenizers/benches/decode_benchmark.rs): add this to Cargo.toml and run with cargo bench decode. The tokenizer file is copied from here.
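For reference, a decode benchmark of this general shape can be written with Criterion. This is an illustrative sketch, not the exact code attached to the report: the sample sentence, the tokenizer.json path, and the criterion version are assumptions.

```toml
# Cargo.toml additions (versions are assumptions)
[dev-dependencies]
criterion = "0.5"

[[bench]]
name = "decode_benchmark"
harness = false
```

```rust
// benches/decode_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
use tokenizers::Tokenizer;

fn bench_decode(c: &mut Criterion) {
    // Assumes a tokenizer.json in the crate root (the report says the
    // tokenizer file was copied from elsewhere).
    let tokenizer = Tokenizer::from_file("tokenizer.json").unwrap();

    // Encode a sample sentence once up front, so the benchmark loop
    // measures only the decode path.
    let encoding = tokenizer
        .encode("The quick brown fox jumps over the lazy dog", false)
        .unwrap();
    let ids = encoding.get_ids().to_vec();

    c.bench_function("decode", |b| {
        b.iter(|| tokenizer.decode(&ids, true).unwrap())
    });
}

criterion_group!(benches, bench_decode);
criterion_main!(benches);
```

To reproduce a before/after comparison, run cargo bench decode once on each commit and compare the absolute times, as noted in the discussion above.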