
Add a streaming interface #3

Open
StefanSalewski opened this issue Dec 31, 2016 · 3 comments

@StefanSalewski

I just ran a short test (Intel Skylake i7, GCC 5.4). I only modified your example.c a bit, like this:

#include <assert.h>
#include <stdint.h> /* uint64_t, UINT64_C */
#include <stdlib.h> /* free */

#include "clhash.h"

int main() {
    void * random = get_random_key_for_clhash(UINT64_C(0x23a23cf5033c3c81), UINT64_C(0xb3816f6a2c68e530));
    uint64_t hashvalue1 = 0, hashvalue2 = 0, hashvalue3 = 0;
    int i;
    for (i = 0; i < 10000000; i++) {
        hashvalue1 = clhash(random, "my dog0123456789", 12);
        hashvalue2 = clhash(random, "my cat0123456789", 12);
        hashvalue3 = clhash(random, "my dog0123456789", 12);
    }
    // assert(hashvalue1 == hashvalue3);
    // assert(hashvalue1 != hashvalue2); // very likely to be true
    (void) hashvalue1; (void) hashvalue2; (void) hashvalue3;
    free(random);
    return 0;
}

I tested string sizes 6, 7, 8, and 12. For the 8-byte key I get this:

$ make example

$ time ./example
real	0m0.152s
user	0m0.148s
sys	0m0.000s

For the other sizes I get something like:

$ time ./example 
real	0m0.493s
user	0m0.492s
sys	0m0.000s

So for really high performance, are we supposed to pad our data so that the size is a multiple of 8?

From figure 1 in your paper I had the impression that it would be smoothly fast at least for sizes >= 8.

https://arxiv.org/abs/1503.03465

Might there be a bug in the recent code?

@lemire changed the title from "Hash is not that fast when key size is not a multiple of 8" to "Hash is not that fast when key size is tiny and not a multiple of 8" on Dec 31, 2016
@lemire (Owner) commented Dec 31, 2016

Clhash uses word-by-word processing, so for tiny inputs it will be faster if the input fits in a multiple of 8 bytes. If you use sizeable inputs, however, this effect becomes negligible.
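
Conceptually, the effect comes from a loop like the sketch below (illustrative only, not clhash's actual code; mix is a placeholder, whereas clhash really uses carry-less multiplication). Full 8-byte words go through the fast path, and a 1..7 byte tail needs an extra zero-padded load and mixing step, a fixed cost that only matters for tiny inputs:

#include <stdint.h>
#include <string.h>

/* Placeholder mixing step; clhash actually mixes with carry-less multiplication. */
static uint64_t mix(uint64_t acc, uint64_t word) {
    return (acc ^ word) * UINT64_C(0x9e3779b97f4a7c15);
}

/* Illustrative word-by-word loop. */
static uint64_t process(const char *data, size_t len) {
    uint64_t acc = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {     /* fast path: full 8-byte words */
        uint64_t w;
        memcpy(&w, data + i, sizeof w);
        acc = mix(acc, w);
    }
    if (i < len) {                     /* extra step: 1..7 leftover bytes */
        uint64_t w = 0;
        memcpy(&w, data + i, len - i); /* zero-padded partial word */
        acc = mix(acc, w);
    }
    return acc;
}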

I have added a proper benchmark...

make
./benchmark

The result should look like this...

# for each input size in bytes, we report the number of cycles used to hash a byte
# First number is the size in bytes
# Second number is the number of CPU cycles per byte for clhash
# Third number is the number of CPU cycles per byte for java-like non-random hash function
                   8	 4.50 	 2.50
                   9	 6.44 	 2.44
                  10	 5.80 	 2.60
                  11	 5.27 	 2.91
                  12	 5.00 	 2.67
                  13	 4.46 	 2.77
                  14	 4.14 	 2.71
                  15	 3.87 	 2.93
                  16	 2.38 	 2.88
                  17	 3.53 	 2.71
                  18	 3.33 	 2.89
                  19	 3.16 	 2.84
                  20	 3.00 	 2.80
                  21	 2.76 	 2.67
                  22	 2.64 	 2.91
                  23	 2.52 	 2.87
                  24	 1.50 	 2.83
                  25	 2.40 	 2.88
                  26	 2.23 	 2.85
                  27	 2.22 	 2.89
                  28	 2.07 	 2.93
                  29	 2.07 	 2.90
                  30	 1.93 	 2.93
                  31	 1.87 	 2.97
                  32	 1.12 	 2.94
                  33	 1.76 	 2.85
                  34	 1.76 	 2.88
                  35	 1.71 	 2.86
                  36	 1.61 	 2.94
                  37	 1.51 	 2.92
                  38	 1.53 	 2.95
                  39	 1.54 	 2.97
                  40	 1.05 	 2.90
                  41	 1.41 	 2.93
                  42	 1.43 	 2.95
                  43	 1.35 	 2.93
                  44	 1.27 	 2.91
                  45	 1.29 	 2.98
                  46	 1.26 	 2.91
                  47	 1.23 	 2.94
                  48	 0.88 	 2.96
                  49	 1.18 	 2.90
                  50	 1.20 	 2.92
                  51	 1.18 	 2.94
                  52	 1.15 	 2.96
                  53	 1.13 	 2.91
                  54	 1.07 	 2.93
                  55	 1.05 	 2.98
                  56	 0.75 	 2.93
                  57	 1.05 	 2.95
                  58	 1.03 	 2.97
                  59	 1.02 	 2.92
                  60	 1.00 	 2.97
                  61	 0.98 	 2.95
                  62	 0.94 	 2.97
                  63	 0.92 	 2.95
                  64	 0.72 	 2.94
                  65	 0.92 	 2.95
                  66	 0.91 	 2.91
                  67	 0.90 	 2.99
                  68	 0.88 	 2.97
                  69	 0.87 	 2.93
                  70	 0.86 	 2.97
                  71	 0.85 	 2.99
                  72	 0.64 	 2.97
                  73	 0.79 	 2.96
                  74	 0.78 	 2.95
                  75	 0.80 	 2.96
                  76	 0.76 	 2.97
                  77	 0.73 	 2.96
                  78	 0.74 	 2.97
                  79	 0.73 	 2.99
                  80	 0.57 	 2.95
                  81	 0.72 	 2.96
                  82	 0.73 	 2.98
                  83	 0.72 	 2.99
                  84	 0.71 	 2.98
                  85	 0.71 	 2.99
                  86	 0.70 	 2.98
                  87	 0.67 	 2.99
                  88	 0.55 	 2.95
                  89	 0.67 	 2.94
                  90	 0.69 	 2.98
                  91	 0.66 	 2.95
                  92	 0.65 	 2.96
                  93	 0.65 	 2.95
                  94	 0.64 	 2.98
                  95	 0.63 	 2.97
                  96	 0.50 	 2.96
                  97	 0.62 	 2.95
                  98	 0.63 	 3.00
                  99	 0.61 	 2.99
                 100	 0.60 	 2.98
...
                4088	 0.10 	 3.01
                4089	 0.10 	 3.01
                4090	 0.10 	 3.01
                4091	 0.10 	 3.01
                4092	 0.10 	 3.01
                4093	 0.10 	 3.01
                4094	 0.10 	 3.01
                4095	 0.10 	 3.01

If your inputs are tiny and not a multiple of 8 bytes (like 12 bytes), then there is room for optimization.
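One caller-side workaround is to zero-pad short keys up to the next multiple of 8 before hashing. A minimal sketch, assuming keys of at most 64 bytes; clhash_padded is a hypothetical helper, not part of the library, and a padded key hashes to a different value than the unpadded one, so all callers would have to pad consistently:

#include <stdint.h>
#include <string.h>

#include "clhash.h"

/* Hypothetical helper: zero-pad a short key to the next multiple of 8
   bytes, then hash the padded buffer. */
static uint64_t clhash_padded(const void *random, const char *key, size_t len) {
    char buf[64];                            /* assumes len <= 64 */
    size_t padded = (len + 7) & ~(size_t)7;  /* round up to a multiple of 8 */
    memset(buf, 0, padded);
    memcpy(buf, key, len);
    return clhash(random, buf, padded);
}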

I do not expect that there is a bug. The figure you point to in https://arxiv.org/abs/1503.03465 represents the speed of hashing strings made of thousands of bytes.

@StefanSalewski (Author)

Thanks. Indeed, I get similar results on my Intel NUC6i7KYK box with 64-bit Linux and GCC 5.4:

$ make benchmark
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -c ./src/clhash.c -Iinclude
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -o benchmark ./benchmarks/benchmark.c -Iinclude  clhash.o

$ ./benchmark
# for each input size in bytes, we report the number of cycles used to hash a byte
# First number is the size in bytes
# Second number is the number of CPU cycles per byte for clhash
# Third number is the number of CPU cycles per byte for java-like non-random hash function
                   8	 4.50 	 2.50 
                   9	 6.67 	 2.67 
                  10	 6.00 	 2.60 
                  11	 5.45 	 2.73 
                  12	 5.00 	 2.83 
                  13	 4.62 	 2.62 
                  14	 4.14 	 2.71 
                  15	 3.87 	 1.47 
                  16	 1.38 	 1.75 
                  17	 2.35 	 1.88 
                  18	 2.22 	 1.89 
                  19	 2.11 	 1.89 
                  20	 1.90 	 1.90 
                  21	 1.90 	 1.90 
                  22	 1.73 	 2.00 
                  23	 1.74 	 2.09 
                  24	 1.00 	 2.00 
                  25	 1.68 	 2.00 
                  26	 1.62 	 2.08 
                  27	 1.56 	 2.07 
                  28	 1.50 	 2.14 
                  29	 1.45 	 2.14 
                  30	 1.40 	 2.20 
                  31	 1.29 	 2.13 
                  32	 0.69 	 2.12 
                  33	 1.27 	 2.12 

But note that, for example, the value for 32 bytes is much better than the value for 31. The difference between 15 and 16 is even larger! So padding to multiples of 8 may really make sense, and I think having each user do their own padding is not a good solution.

Regarding your paper: I was referring to figure 1 of https://arxiv.org/pdf/1503.03465v8.pdf. The x-axis is labeled "data input size (bytes)" and the y-axis "cycles per input byte". That curve is absolutely smooth for your clhash, and cycles per input byte starts at 2.5 for an 8-byte size. So my assumption is that your current code is more tuned to large blocks?

And finally, it would be great to additionally have a streaming interface like the one xxhash provides, for example when one wants to hash records with fields like name and country. I guess with the current interface one has to join the two strings and then hash the result; xxhash's streaming pattern is sketched below.
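
For reference, this is roughly what the xxhash streaming pattern looks like (the field values are made up for illustration, and error checking is omitted):

#include <stdint.h>
#include <string.h>

#include "xxhash.h"

int main(void) {
    const char *name = "Stefan";      /* hypothetical record fields */
    const char *country = "Germany";

    /* Hash both fields as one logical message, without building
       a concatenated temporary string. */
    XXH64_state_t *state = XXH64_createState();
    XXH64_reset(state, 0);                        /* seed = 0 */
    XXH64_update(state, name, strlen(name));
    XXH64_update(state, country, strlen(country));
    uint64_t hash = XXH64_digest(state);
    XXH64_freeState(state);

    (void) hash;
    return 0;
}

An analogous clhash_update/clhash_digest pair would let callers hash multi-field records directly.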

@lemire (Owner) commented Dec 31, 2016

@StefanSalewski These are good ideas. Looking forward to receiving the pull requests!

@lemire changed the title from "Hash is not that fast when key size is tiny and not a multiple of 8" to "Add a streaming interface" on Jan 4, 2018