Add a streaming interface #3
Clhash uses word-by-word processing, so for tiny inputs it will be faster if the input fits in multiples of 8 bytes. If you use sizeable inputs, however, this effect becomes negligible. I have added a proper benchmark...
The result should look like this...
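The benchmark itself is not reproduced in this copy of the thread. Purely as an illustration, a minimal size-sweep harness of this kind might look like the sketch below; it assumes the `get_random_key_for_clhash`/`clhash` API from clhash.h and uses coarse `clock()` timing rather than cycle counters.

```c
/* Minimal sketch of a size-sweep benchmark for clhash (illustrative only,
 * not the actual harness added to the repository). */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "clhash.h"

int main(void) {
    void *key = get_random_key_for_clhash(UINT64_C(0x23a23cf5033c3c81),
                                          UINT64_C(0xb3816f6a2c68e530));
    char buffer[64];
    memset(buffer, 'x', sizeof(buffer));
    for (size_t len = 1; len <= sizeof(buffer); len++) {
        const size_t repeat = 10000000;
        uint64_t acc = 0;
        clock_t start = clock();
        for (size_t i = 0; i < repeat; i++) {
            buffer[0] = (char)i; /* vary one byte so the work cannot be hoisted */
            acc += clhash(key, buffer, len);
        }
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("len %2zu: %6.2f ns/hash (checksum %llx)\n",
               len, 1e9 * secs / repeat, (unsigned long long)acc);
    }
    free(key);
    return 0;
}
```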
If your inputs are tiny and not a multiple of 8 bytes (like 12 bytes), then there is room for optimization. I do not expect that there is a bug. The figure you point to in https://arxiv.org/abs/1503.03465 represents the speed of hashing strings made of thousands of bytes.
Thanks. Indeed, I get similar results on my Intel NUC6i7KYK box with 64-bit Linux and gcc 5.4:
But note that, for example, the value for 32 bytes is much better than the value for 31, and the difference between 15 and 16 is even larger. So padding to multiples of 8 may really make sense, and I think it is not a good solution if each user has to do their own padding.

Regarding your paper: I was referring to figure 1 of https://arxiv.org/pdf/1503.03465v8.pdf. The x-axis is labeled "data input size (bytes)" and the y-axis "cycles per input byte". That curve is absolutely smooth for your clhash, and cycles per input byte starts at 2.5 for an 8-byte input. So my assumption is that your current code is more tuned to large blocks?

And finally, it would be great to additionally have a streaming interface like the one xxhash provides, for example when one wants to hash records with fields like name and country. I guess with the current interface one has to join the two strings and then hash them.
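To make the padding idea concrete: a hypothetical wrapper (a sketch only, `clhash_padded` is not part of the library) could round the length up to the next multiple of 8 and zero-fill the tail before hashing.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include "clhash.h"

/* Hypothetical helper: hash the input as if it were zero-padded up to the
 * next multiple of 8 bytes. Note this produces a DIFFERENT hash value than
 * hashing the raw bytes, so it only works if every producer pads this way. */
uint64_t clhash_padded(const void *key, const char *data, size_t len) {
    size_t padded = (len + 7) & ~(size_t)7;  /* round up to a multiple of 8 */
    char stack_buf[256];
    char *buf = (padded <= sizeof(stack_buf)) ? stack_buf : malloc(padded);
    if (buf == NULL) return 0;               /* allocation failure; adapt as needed */
    memcpy(buf, data, len);
    memset(buf + len, 0, padded - len);      /* zero-fill the tail */
    uint64_t h = clhash(key, buf, padded);
    if (buf != stack_buf) free(buf);
    return h;
}
```

The same buffer trick can stand in for a streaming interface in the name/country example above: copy both fields into one buffer, then hash it once.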
@StefanSalewski These are good ideas. Looking forward to receiving the pull requests!
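For context on what is being requested: a hypothetical xxhash-style streaming API for clhash might be shaped like the following. None of these functions exist in clhash today; the names and signatures are invented here, not proposed by the maintainer.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only, modeled on xxhash's state/update/digest pattern.
 * An implementation would have to buffer partial 8-byte words between
 * update() calls so that word-by-word processing still sees whole words. */
typedef struct clhash_state clhash_state;             /* opaque state */

clhash_state *clhash_create(const void *key);         /* allocate and init */
void clhash_update(clhash_state *s, const void *data, size_t len);
uint64_t clhash_digest(const clhash_state *s);        /* finalize */
void clhash_free(clhash_state *s);
```

With such an API, the name/country record from the earlier comment would be hashed as create, update(name), update(country), digest, with no explicit concatenation.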
I just did a short test on an Intel Skylake i7 with gcc 5.4. I only modified your example.c a bit, like this:
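(The modified code did not survive in this copy of the issue. Purely as a guess at its shape, the change might have looked something like the sketch below: hash a short, fixed-size string many times and report cycles per hash via the time-stamp counter.)

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>   /* __rdtsc(), available with gcc on x86-64 */
#include "clhash.h"

#define LEN 8            /* tested with 6, 7, 8 and 12 */

int main(void) {
    void *key = get_random_key_for_clhash(UINT64_C(0x23a23cf5033c3c81),
                                          UINT64_C(0xb3816f6a2c68e530));
    char data[16] = "abcdefghijklmno";
    const uint64_t repeat = 100000000;
    uint64_t acc = 0;
    uint64_t start = __rdtsc();
    for (uint64_t i = 0; i < repeat; i++) {
        data[0] = (char)i;  /* vary the input so the loop is not optimized away */
        acc += clhash(key, data, LEN);
    }
    uint64_t cycles = __rdtsc() - start;
    printf("size %d: %.2f cycles/hash (checksum %llx)\n",
           LEN, (double)cycles / repeat, (unsigned long long)acc);
    free(key);
    return 0;
}
```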
Tested for string sizes 6, 7, 8 and 12. For the 8-byte string I get this:
For the other sizes, something like:
So for really high performance, we are supposed to pad our data and use multiples of 8 for size?
From figure 1 in your paper I had the impression that it would be uniformly fast for sizes >= 8 at least.
https://arxiv.org/abs/1503.03465
Might there be a bug in the recent code?