feat: simple bench target for measuring DB reads using ReadView #2433
base: master
Interesting observation: the current benchmark is slower with the multi-get implementation than with plain DB lookups. On my build server I get the following results (having saved the
c930805 to 8a2a490
Bumped the number of blocks (and therefore the number of transactions to query) by a factor of 10, but I still can't observe a notable difference between the multi-get implementation and the previous one.
This could be due to a number of factors.
f15f180 to 4e7d3b3
I've rewritten the workload right now to only measure the database interactions through the (With the implementation from
Running the latest benches on my laptop only amplifies the difference: Baseline:
With multi-get
I've also experimented with disabling the RocksDB block cache, which didn't make much difference for this workload. I can now safely say that, for this workload, the multi-get implementation is strictly worse. I suspect that this is due to the overhead of using dynamic dispatch for the iterators, but I have yet to confirm this theory.
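One way to probe the dynamic dispatch theory in isolation is a micro-benchmark that has nothing to do with the database. The following is a minimal sketch under that assumption: `static_iter`, `boxed_iter` and `checksum` are hypothetical names, not anything from this PR, and timing with `Instant` like this is not rigorous (the optimizer may hide or exaggerate the difference).

```rust
use std::time::Instant;

/// Statically dispatched: the concrete iterator type is known at compile time.
fn static_iter(values: &[u64]) -> impl Iterator<Item = u64> + '_ {
    values.iter().map(|v| v.wrapping_mul(3))
}

/// Dynamically dispatched: every `next()` call goes through a vtable.
fn boxed_iter(values: &[u64]) -> Box<dyn Iterator<Item = u64> + '_> {
    Box::new(values.iter().map(|v| v.wrapping_mul(3)))
}

/// Consumes the iterator so that the work cannot be skipped entirely.
fn checksum(iter: impl Iterator<Item = u64>) -> u64 {
    iter.fold(0u64, |acc, v| acc.wrapping_add(v))
}

fn main() {
    let values: Vec<u64> = (0..1_000_000).collect();

    let start = Instant::now();
    let a = checksum(static_iter(&values));
    let static_time = start.elapsed();

    let start = Instant::now();
    let b = checksum(boxed_iter(&values));
    let boxed_time = start.elapsed();

    // Both variants must compute the same result.
    assert_eq!(a, b);
    println!("static: {static_time:?}, boxed: {boxed_time:?}");
}
```

Build with `--release` before comparing the two timings; a debug build mostly measures unoptimized iterator adapters.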
4e7d3b3 to 7a2be1b
Reading up on multi-get, one performance benefit is reduced lock contention for concurrent reads - so I'll adjust the workload to see if I can measure anything interesting with many concurrent reads. https://github.com/facebook/rocksdb/wiki/MultiGet-Performance
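For illustration, a concurrent read workload of this shape could look roughly like the sketch below. It is not the benchmark in this PR: the `Store` type is a hypothetical in-memory stand-in (a `HashMap` behind an `RwLock`) rather than RocksDB, and `concurrent_reads` is an assumed helper name. Each worker takes the lock once per lookup, which is the pattern where lock contention would show up.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;
use std::time::Instant;

/// Hypothetical stand-in for the database: an in-memory map behind a lock.
type Store = RwLock<HashMap<u64, u64>>;

/// Spawns `threads` workers that each look up every key in `keys` once.
/// Returns the total number of successful lookups across all workers.
fn concurrent_reads(store: &Store, keys: &[u64], threads: usize) -> usize {
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads {
            handles.push(s.spawn(|| {
                keys.iter()
                    // Acquire the read lock per lookup to model contention.
                    .filter(|k| store.read().unwrap().contains_key(*k))
                    .count()
            }));
        }
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<usize>()
    })
}

fn main() {
    let store: Store = RwLock::new((0..100_000u64).map(|k| (k, k)).collect());
    let keys: Vec<u64> = (0..100_000).collect();

    for threads in [1, 4, 8] {
        let start = Instant::now();
        let hits = concurrent_reads(&store, &keys, threads);
        println!("{threads} threads: {hits} hits in {:?}", start.elapsed());
    }
}
```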
7a2be1b to 5c261ee
Multi-get is still slower. Measured at commit Reference:
Multi-get:
Also, maybe you want to try to select random coins, not sequential.
I've typically requested batches of 100k coins and transactions for most runs, but I have varied this parameter (
Good idea, I'll try that 👍
Tried selecting random coins (by shuffling the ID vec before each query), but didn't notice any significant slowdown in the base case.
diff relative
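Shuffling the ID vec before each query can be sketched with an in-place Fisher-Yates shuffle. The version below is an assumption about how one might do it without external crates, not the code used in this PR: `XorShift64` and `shuffle` are hypothetical names, and the tiny PRNG has a slight modulo bias that is irrelevant for spreading out access patterns but unsuitable for anything statistical.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// In-place Fisher-Yates shuffle: walk from the back and swap each
/// element with a randomly chosen element at or before its position.
fn shuffle<T>(items: &mut [T], rng: &mut XorShift64) {
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

fn main() {
    let mut ids: Vec<u64> = (0..100_000).collect();
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);

    // Re-shuffle before every query round so lookups are not sequential.
    for round in 0..3 {
        shuffle(&mut ids, &mut rng);
        println!("round {round}: first ids = {:?}", &ids[..5]);
    }
}
```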
Tried a new variant proposed by Green in a status meeting today. Only query a subset of all coins each time. With a database of 1M coins, querying 10k coins 1k times I get the following (at 87c4b01): reference
multi-get
So multi-get still seems to be slower, but the difference is less pronounced than in the most recent runs. I'll see if I can find a parameter combination that changes this outcome.
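The shape of this variant (a large database, many rounds, a small batch per round) can be sketched as follows. Everything here is an assumption for illustration: the `HashMap` stands in for the real database, `run_subset_queries` is a hypothetical helper, and the subset is chosen as a wrapping window over the ID list because the comment above does not say how the subset was picked. The numbers in `main` are scaled down from the 1M coins, 10k batch, 1k rounds used above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Runs `rounds` query rounds of `batch` ids each against `db`.
/// Each round reads a different window of `ids`, wrapping around at the end.
/// Returns the total number of hits and the elapsed wall-clock time.
fn run_subset_queries(
    db: &HashMap<u64, u64>,
    ids: &[u64],
    batch: usize,
    rounds: usize,
) -> (usize, Duration) {
    let start = Instant::now();
    let mut hits = 0;
    if ids.is_empty() {
        return (hits, start.elapsed());
    }
    for round in 0..rounds {
        let offset = (round * batch) % ids.len();
        hits += ids[offset..]
            .iter()
            .chain(ids[..offset].iter())
            .take(batch)
            .filter(|id| db.contains_key(*id))
            .count();
    }
    (hits, start.elapsed())
}

fn main() {
    // Scaled-down parameters; raise them to match the runs described above.
    let coins = 100_000u64;
    let db: HashMap<u64, u64> = (0..coins).map(|id| (id, id)).collect();
    let ids: Vec<u64> = (0..coins).collect();

    let (hits, elapsed) = run_subset_queries(&db, &ids, 10_000, 100);
    println!("{hits} hits in {elapsed:?}");
}
```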
Tried again with new parameters (querying 10k coins 1k times from a database with 10M coins) at c1f2605, multi-get is still slower. Reference
Multi-get
Linked Issues/PRs
Description
This PR adds a benchmark target for measuring end-to-end query times for transaction queries. The benchmark is a simple binary that submits a batch of transactions and then queries these transactions by owner several times. It is intended to be used together with benchmarking tools such as hyperfine.