
Add a streaming interface #3

Open
StefanSalewski opened this issue Dec 31, 2016 · 3 comments

@StefanSalewski

I just ran a short test (Intel Skylake i7, GCC 5.4). I only modified your example.c a bit, like this:

#include <assert.h>
#include <stdint.h> /* uint64_t, UINT64_C */
#include <stdlib.h> /* free */

#include "clhash.h"

int main() {
    void * random = get_random_key_for_clhash(UINT64_C(0x23a23cf5033c3c81), UINT64_C(0xb3816f6a2c68e530));
    uint64_t hashvalue1 = 0, hashvalue2 = 0, hashvalue3 = 0;
    int i;
    for (i = 0; i < 10000000; i++) {
        hashvalue1 = clhash(random, "my dog0123456789", 12);
        hashvalue2 = clhash(random, "my cat0123456789", 12);
        hashvalue3 = clhash(random, "my dog0123456789", 12);
    }
    // assert(hashvalue1 == hashvalue3);
    // assert(hashvalue1 != hashvalue2); // very likely to be true
    (void) hashvalue1; (void) hashvalue2; (void) hashvalue3;
    free(random);
    return 0;
}

I tested string sizes 6, 7, 8, and 12. For the 8-byte key I get this:

$ make example

$ time ./example
real	0m0.152s
user	0m0.148s
sys	0m0.000s

For the other sizes I get something like:

$ time ./example 
real	0m0.493s
user	0m0.492s
sys	0m0.000s

So for really high performance, are we supposed to pad our data so that the size is a multiple of 8?

From figure 1 in your paper I had the impression that it would be smoothly fast at least for sizes >= 8.

https://arxiv.org/abs/1503.03465

Might there be a bug in the recent code?

@lemire changed the title from "Hash is not that fast when key size is not a multiple of 8" to "Hash is not that fast when key size is tiny and not a multiple of 8" on Dec 31, 2016
@lemire (Owner) commented Dec 31, 2016

Clhash uses word-by-word processing, so for tiny inputs it will be faster if the input fits in a multiple of 8 bytes. If you use sizeable inputs, however, this effect becomes negligible.
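
Conceptually, the effect comes from a loop like the sketch below (illustrative only, not clhash's actual code; mix is a placeholder, whereas clhash really uses carry-less multiplication). Full 8-byte words go through the fast path, and a 1..7 byte tail needs an extra zero-padded load and mixing step, a fixed cost that only matters for tiny inputs:

#include <stdint.h>
#include <string.h>

/* Placeholder mixing step; clhash actually mixes with carry-less multiplication. */
static uint64_t mix(uint64_t acc, uint64_t word) {
    return (acc ^ word) * UINT64_C(0x9e3779b97f4a7c15);
}

/* Illustrative word-by-word loop. */
static uint64_t process(const char *data, size_t len) {
    uint64_t acc = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {     /* fast path: full 8-byte words */
        uint64_t w;
        memcpy(&w, data + i, sizeof w);
        acc = mix(acc, w);
    }
    if (i < len) {                     /* extra step: 1..7 leftover bytes */
        uint64_t w = 0;
        memcpy(&w, data + i, len - i); /* zero-padded partial word */
        acc = mix(acc, w);
    }
    return acc;
}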

I have added a proper benchmark...

make
./benchmark

The result should look like this...

# for each input size in bytes, we report the number of cycles used to hash a byte
# First number is the size in bytes
# Second number is the number of CPU cycles per byte for clhash
# Third number is the number of CPU cycles per byte for java-like non-random hash function
                   8	 4.50 	 2.50
                   9	 6.44 	 2.44
                  10	 5.80 	 2.60
                  11	 5.27 	 2.91
                  12	 5.00 	 2.67
                  13	 4.46 	 2.77
                  14	 4.14 	 2.71
                  15	 3.87 	 2.93
                  16	 2.38 	 2.88
                  17	 3.53 	 2.71
                  18	 3.33 	 2.89
                  19	 3.16 	 2.84
                  20	 3.00 	 2.80
                  21	 2.76 	 2.67
                  22	 2.64 	 2.91
                  23	 2.52 	 2.87
                  24	 1.50 	 2.83
                  25	 2.40 	 2.88
                  26	 2.23 	 2.85
                  27	 2.22 	 2.89
                  28	 2.07 	 2.93
                  29	 2.07 	 2.90
                  30	 1.93 	 2.93
                  31	 1.87 	 2.97
                  32	 1.12 	 2.94
                  33	 1.76 	 2.85
                  34	 1.76 	 2.88
                  35	 1.71 	 2.86
                  36	 1.61 	 2.94
                  37	 1.51 	 2.92
                  38	 1.53 	 2.95
                  39	 1.54 	 2.97
                  40	 1.05 	 2.90
                  41	 1.41 	 2.93
                  42	 1.43 	 2.95
                  43	 1.35 	 2.93
                  44	 1.27 	 2.91
                  45	 1.29 	 2.98
                  46	 1.26 	 2.91
                  47	 1.23 	 2.94
                  48	 0.88 	 2.96
                  49	 1.18 	 2.90
                  50	 1.20 	 2.92
                  51	 1.18 	 2.94
                  52	 1.15 	 2.96
                  53	 1.13 	 2.91
                  54	 1.07 	 2.93
                  55	 1.05 	 2.98
                  56	 0.75 	 2.93
                  57	 1.05 	 2.95
                  58	 1.03 	 2.97
                  59	 1.02 	 2.92
                  60	 1.00 	 2.97
                  61	 0.98 	 2.95
                  62	 0.94 	 2.97
                  63	 0.92 	 2.95
                  64	 0.72 	 2.94
                  65	 0.92 	 2.95
                  66	 0.91 	 2.91
                  67	 0.90 	 2.99
                  68	 0.88 	 2.97
                  69	 0.87 	 2.93
                  70	 0.86 	 2.97
                  71	 0.85 	 2.99
                  72	 0.64 	 2.97
                  73	 0.79 	 2.96
                  74	 0.78 	 2.95
                  75	 0.80 	 2.96
                  76	 0.76 	 2.97
                  77	 0.73 	 2.96
                  78	 0.74 	 2.97
                  79	 0.73 	 2.99
                  80	 0.57 	 2.95
                  81	 0.72 	 2.96
                  82	 0.73 	 2.98
                  83	 0.72 	 2.99
                  84	 0.71 	 2.98
                  85	 0.71 	 2.99
                  86	 0.70 	 2.98
                  87	 0.67 	 2.99
                  88	 0.55 	 2.95
                  89	 0.67 	 2.94
                  90	 0.69 	 2.98
                  91	 0.66 	 2.95
                  92	 0.65 	 2.96
                  93	 0.65 	 2.95
                  94	 0.64 	 2.98
                  95	 0.63 	 2.97
                  96	 0.50 	 2.96
                  97	 0.62 	 2.95
                  98	 0.63 	 3.00
                  99	 0.61 	 2.99
                 100	 0.60 	 2.98
...
                4088	 0.10 	 3.01
                4089	 0.10 	 3.01
                4090	 0.10 	 3.01
                4091	 0.10 	 3.01
                4092	 0.10 	 3.01
                4093	 0.10 	 3.01
                4094	 0.10 	 3.01
                4095	 0.10 	 3.01

If your inputs are tiny and not a multiple of 8 bytes (like 12 bytes), then there is room for optimization.
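One caller-side workaround is to zero-pad short keys up to the next multiple of 8 before hashing. A minimal sketch, assuming keys of at most 64 bytes; clhash_padded is a hypothetical helper, not part of the library, and a padded key hashes to a different value than the unpadded one, so all callers would have to pad consistently:

#include <stdint.h>
#include <string.h>

#include "clhash.h"

/* Hypothetical helper: zero-pad a short key to the next multiple of 8
   bytes, then hash the padded buffer. */
static uint64_t clhash_padded(const void *random, const char *key, size_t len) {
    char buf[64];                            /* assumes len <= 64 */
    size_t padded = (len + 7) & ~(size_t)7;  /* round up to a multiple of 8 */
    memset(buf, 0, padded);
    memcpy(buf, key, len);
    return clhash(random, buf, padded);
}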

I do not expect that there is a bug. The figure you point to in https://arxiv.org/abs/1503.03465 represents the speed of hashing strings made of thousands of bytes.

@StefanSalewski (Author)

Thanks. Indeed, I get similar results on my Intel NUC6i7KYK box with 64-bit Linux and GCC 5.4:

$ make benchmark
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -c ./src/clhash.c -Iinclude
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -o benchmark ./benchmarks/benchmark.c -Iinclude  clhash.o

$ ./benchmark
# for each input size in bytes, we report the number of cycles used to hash a byte
# First number is the size in bytes
# Second number is the number of CPU cycles per byte for clhash
# Third number is the number of CPU cycles per byte for java-like non-random hash function
                   8	 4.50 	 2.50 
                   9	 6.67 	 2.67 
                  10	 6.00 	 2.60 
                  11	 5.45 	 2.73 
                  12	 5.00 	 2.83 
                  13	 4.62 	 2.62 
                  14	 4.14 	 2.71 
                  15	 3.87 	 1.47 
                  16	 1.38 	 1.75 
                  17	 2.35 	 1.88 
                  18	 2.22 	 1.89 
                  19	 2.11 	 1.89 
                  20	 1.90 	 1.90 
                  21	 1.90 	 1.90 
                  22	 1.73 	 2.00 
                  23	 1.74 	 2.09 
                  24	 1.00 	 2.00 
                  25	 1.68 	 2.00 
                  26	 1.62 	 2.08 
                  27	 1.56 	 2.07 
                  28	 1.50 	 2.14 
                  29	 1.45 	 2.14 
                  30	 1.40 	 2.20 
                  31	 1.29 	 2.13 
                  32	 0.69 	 2.12 
                  33	 1.27 	 2.12 

But note that, for example, the value for 32 bytes is much better than the value for 31. The difference between 15 and 16 is even larger! So padding to multiples of 8 may really make sense, and I think having each user do their own padding is not a good solution.

Regarding your paper: I was referring to figure 1 of https://arxiv.org/pdf/1503.03465v8.pdf. The x-axis is labeled "data input size (bytes)" and the y-axis "cycles per input byte". That curve is absolutely smooth for your clhash, and cycles per input byte starts at 2.5 for an 8-byte size. So my assumption is that your current code is more tuned to large blocks?

And finally, it would be great to additionally have a streaming interface like the one xxhash provides, for example when one wants to hash records with fields like name and country. I guess with the current interface one has to join the two strings and then hash the result; xxhash's streaming pattern is sketched below.
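
For reference, this is roughly what the xxhash streaming pattern looks like (the field values are made up for illustration, and error checking is omitted):

#include <stdint.h>
#include <string.h>

#include "xxhash.h"

int main(void) {
    const char *name = "Stefan";      /* hypothetical record fields */
    const char *country = "Germany";

    /* Hash both fields as one logical message, without building
       a concatenated temporary string. */
    XXH64_state_t *state = XXH64_createState();
    XXH64_reset(state, 0);                        /* seed = 0 */
    XXH64_update(state, name, strlen(name));
    XXH64_update(state, country, strlen(country));
    uint64_t hash = XXH64_digest(state);
    XXH64_freeState(state);

    (void) hash;
    return 0;
}

An analogous clhash_update/clhash_digest pair would let callers hash multi-field records directly.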

@lemire (Owner) commented Dec 31, 2016

@StefanSalewski These are good ideas. Looking forward to receiving the pull requests!

@lemire changed the title from "Hash is not that fast when key size is tiny and not a multiple of 8" to "Add a streaming interface" on Jan 4, 2018