Improve performance of _pluck_uniq_cols #216
Use a set instead of an OrderedDict with empty values for gathering unique values. O(n^2) -> O(n log n)
This buys around 2-3s per |
Thanks, @mlazowik, for looking into this. It looks like a clear improvement |
I think this was good at db116cc . It was clear what the code was doing and seemed to be an improvement. I'd rather merge that than include the Please note, that a lot of the formpack code will be in flux in the next month as we move |
Right, I added the dep b/c I wanted to extract what I've written into I didn't see any more perf. improvement over my code. I can revert the dep, what do you think about extracting my little code to |
If it were going to be reused elsewhere in the code, I think that would be helpful. The person who helped us early on was fond of using OrderedDict() with empty values as a workaround for not having a native Python implementation of an ordered set. I was not aware of the performance cost of this approach. |
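For readers following the thread, here is a minimal sketch of the two patterns being compared. It is not the actual `_pluck_uniq_cols` code: the function names `uniq_with_ordered_dict` and `uniq_with_set` and the sample `rows` data are hypothetical, and the sketch assumes the goal is to collect unique column names in first-seen order.

```python
from collections import OrderedDict


def uniq_with_ordered_dict(rows):
    """Ordered-set workaround: use OrderedDict keys, with empty values."""
    seen = OrderedDict()
    for row in rows:
        for col in row:
            seen[col] = None  # the value is unused; only the key matters
    return list(seen)


def uniq_with_set(rows):
    """Use a set for membership tests and a list to keep first-seen order."""
    seen = set()
    ordered = []
    for row in rows:
        for col in row:
            if col not in seen:
                seen.add(col)
                ordered.append(col)
    return ordered


# Hypothetical sample data: each row is a list of column names.
rows = [["name", "age"], ["age", "city"], ["name", "city", "zip"]]
result = uniq_with_set(rows)
```

A plain set does not preserve insertion order, which is why the second variant keeps a separate list. If the order of the columns does not matter to the caller, the list can be dropped and the set returned directly.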