- ~1.2 clocks per byte
- 100% C (& C++ compatible) without inline assembly
- Both 32 and 64 bits supported
- Portable scalar functions faster than SIMD functions
- Up to 3 times faster than the naive solution
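
To illustrate what "naive solution" and "portable scalar" can mean here, the following is a minimal sketch, not the code in turbohist.c. The names `hist_naive` and `hist_4tables` are hypothetical. The sketch assumes that a name such as `hist_4_32` stands for 4 counter tables fed by 32-bit reads; that reading is a guess from the function names, not something this README confirms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive byte histogram: one counter table, one byte per iteration. */
void hist_naive(const uint8_t *in, size_t n, uint32_t cnt[256]) {
    memset(cnt, 0, 256 * sizeof cnt[0]);
    for (size_t i = 0; i < n; i++)
        cnt[in[i]]++;
}

/* Hypothetical multi-table variant: read 32 bits at a time and count
   each of the 4 bytes in its own table, then sum the tables. */
void hist_4tables(const uint8_t *in, size_t n, uint32_t cnt[256]) {
    uint32_t c[4][256];
    size_t i = 0;

    memset(c, 0, sizeof c);
    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, in + i, sizeof w);   /* alignment-safe 32-bit load */
        c[0][(uint8_t)w]++;
        c[1][(uint8_t)(w >> 8)]++;
        c[2][(uint8_t)(w >> 16)]++;
        c[3][(uint8_t)(w >> 24)]++;
    }
    for (; i < n; i++)                  /* remaining 0..3 tail bytes */
        c[0][in[i]]++;

    /* The byte-to-table mapping depends on endianness, but the summed
       result does not. */
    for (int s = 0; s < 256; s++)
        cnt[s] = c[0][s] + c[1][s] + c[2][s] + c[3][s];
}
```

Both functions produce the same counts. The usual motivation for several tables is that runs of identical bytes no longer increment the same memory location back to back, which matters most on skewed input.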
i7-2600k at 4.5 GHz, gcc 5.1, Ubuntu 15.04.
- Single thread
- Realistic and practical benchmark with large files
- Not a pure in-cache benchmark
- Uniform: enwik9
- Skewed: BWT of enwik9, generated with libdivsufsort
Function | Uniform (MB/s) | Skewed (MB/s) |
---|---|---|
hist_8_32 | 2758.61 | 2746.19 |
hist_4_32 | 2745.88 | 2594.32 |
hist_8_128 | 2714.06 | 2709.78 |
hist_4_128 | 2715.28 | 2650.48 |
hist_8_64 | 2697.89 | 2670.17 |
hist_4_64 | 2639.83 | 2553.63 |
hist_8_8 | 2349.73 | 2333.46 |
hist_4_8 | 2213.05 | 2082.84 |
hist_1_8 | 1882.86 | 926.88 |
count2x64 | 2838.60 | 2836.31 |
cc -O3 -march=native turbohist.c -o turbohist
- Linux: gcc (>=4.6)
- clang (>=3.2)
- Windows: mingw-w64 (>=4.6)
turbohist file
Countbench: https://github.com/nkurz/countbench (including "count2x64" with inline assembly)