Performance regression materializing Selection #364
It seems that in both cases the selections are distributed randomly over the complete set:
I get the following timings (note: the file-system caches should be hot):
v0.1.24:
v0.1.25:
So it looks like the user time has gone up by roughly 40x.
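To compare CPU time against wall time across versions, here is a minimal sketch assuming a Python reproducer. The helper `measure` is hypothetical. It reports the best wall time (`time.perf_counter`) and the best CPU time (`time.process_time`, which counts user plus system time of the current process), so that, with hot caches, a jump in CPU time can be told apart from time spent waiting on I/O. The workload in the usage part is a placeholder; substitute the call that materializes the Selection.

```python
import time


def measure(fn, repeat=5):
    """Return the best (wall, cpu) times in seconds over `repeat` calls of fn().

    Hypothetical benchmarking helper. Wall time comes from time.perf_counter();
    CPU time comes from time.process_time() (user + system of this process).
    """
    best_wall = float("inf")
    best_cpu = float("inf")
    for _ in range(repeat):
        w0, c0 = time.perf_counter(), time.process_time()
        fn()
        w1, c1 = time.perf_counter(), time.process_time()
        best_wall = min(best_wall, w1 - w0)
        best_cpu = min(best_cpu, c1 - c0)
    return best_wall, best_cpu


if __name__ == "__main__":
    # Placeholder workload; replace with the Selection-materializing call.
    wall, cpu = measure(lambda: sorted(range(1_000_000), reverse=True))
    print(f"wall {wall:.3f}s  cpu {cpu:.3f}s")
```

Taking the minimum over several runs reduces noise from other processes; running the same script once per installed version gives directly comparable numbers.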
git bisect is pointing me at:
The likely issue is the following loop in HighFive:
which has a runtime that is quadratic in the number of selections, because …
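As a generic illustration of how a per-selection loop can become quadratic (hypothetical functions, not the actual HighFive code), the sketch below contrasts a loop that rebuilds its accumulated result for every selection with one that appends in place. It counts element copies instead of timing them, so the growth is exact: for N one-element selections the first version copies N(N+1)/2 elements, the second only N.

```python
def build_quadratic(selections):
    """Rebuild the accumulated result on every iteration.

    `result + sel` allocates a new list and copies everything accumulated
    so far, so N one-element selections cost 1 + 2 + ... + N copies.
    """
    result = []
    copies = 0
    for sel in selections:
        result = result + sel
        copies += len(result)
    return result, copies


def build_linear(selections):
    """Append in place; each element is copied once (amortized)."""
    result = []
    copies = 0
    for sel in selections:
        result.extend(sel)
        copies += len(sel)
    return result, copies


if __name__ == "__main__":
    selections = [[i] for i in range(2000)]
    _, q = build_quadratic(selections)
    _, l = build_linear(selections)
    print(q, l)  # 2001000 2000
```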
Nice find, @1uc!
@1uc has created a change in HighFive that should fix this. I think it's worth waiting for that change to be merged into HighFive and released.
Fixed by #365.
I've had a report that it's still slower:
However, I'm not sure we need to put effort into this at the moment; if we do, I'll reopen the ticket.
Remember that …
Good point; it's the wheel from PyPI, so the fix should be included.
Okay, so there's more to fix.
I'm seeing less than a 2x difference in my completely unscientific benchmark:
This is running on the head node, using a virtualenv in the home directory, with this code:
So the regression is not related to I/O at all? Only to building the …
I'll get back to you; I just wanted to write this down.
Using a 71M …
The selection is the union of …
We measure several versions of libsonata: …
In order to see more easily which measurement belongs to which variation of libsonata, we offset them horizontally and treat …
We created a PR for …
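As a generic sketch of computing a union of selections, assuming both operands are sorted lists of half-open `(start, end)` ranges (the helper `union_ranges` is hypothetical, not libsonata's implementation), a single merge pass coalesces overlapping or touching ranges in time linear in the total number of ranges.

```python
import heapq


def union_ranges(a, b):
    """Union of two sorted lists of half-open (start, end) ranges.

    Hypothetical helper. heapq.merge streams both already-sorted inputs,
    so the pass is O(len(a) + len(b)); overlapping or touching ranges
    are coalesced into one.
    """
    out = []
    for start, end in heapq.merge(a, b):
        if out and start <= out[-1][1]:
            # Overlaps or touches the previous range: extend it.
            out[-1][1] = max(out[-1][1], end)
        else:
            out.append([start, end])
    return [tuple(r) for r in out]


if __name__ == "__main__":
    print(union_ranges([(0, 3), (10, 12)], [(2, 5), (12, 15), (20, 21)]))
    # [(0, 5), (10, 15), (20, 21)]
```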
Thanks, @1uc, amazing improvement. I tried the new wheels on a BB5 allocation for the cells service use case (sampling 1% of 71M nodes), and I'm getting even better results than with 0.1.24.
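Random sampling, as in this use case, is likely the expensive case for a range-based selection: with only 1% of ids chosen, almost no two selected ids are adjacent, so the number of ranges ends up close to the number of ids. The minimal stdlib sketch below illustrates this, using hypothetical helpers `sample_ids` and `ids_to_ranges` (not libsonata's API) and assuming ranges are half-open `[start, end)`.

```python
import random


def ids_to_ranges(ids):
    """Collapse sorted, unique ids into half-open (start, end) ranges.

    Hypothetical helper for illustration; not part of libsonata.
    """
    ranges = []
    for i in ids:
        if ranges and ranges[-1][1] == i:
            # i directly follows the current range: extend it.
            ranges[-1][1] = i + 1
        else:
            ranges.append([i, i + 1])
    return [tuple(r) for r in ranges]


def sample_ids(n_total, fraction, seed=0):
    """Draw a sorted random sample of ids from range(n_total)."""
    rng = random.Random(seed)
    k = int(n_total * fraction)
    return sorted(rng.sample(range(n_total), k))


if __name__ == "__main__":
    print(len(ids_to_ranges(list(range(1000)))))  # 1: a contiguous block
    sampled = sample_ids(1_000_000, 0.01)
    ranges = ids_to_ranges(sampled)
    # Nearly one range per sampled id.
    print(len(sampled), len(ranges))
```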
@GianlucaFicarelli, thank you for checking on BB5.
Two people independently contacted me about a large performance regression while materializing Selections.
They had to roll back to 0.1.24 to be able to complete selections.
It seems that operations that completed nearly instantly (if the data was cached) are now taking much longer.
I will try and make a reproducer.