Simplify ASCII check #115

rhpvorderman · 2023-09-17T18:42:22Z

Unaligned loads perform well on x86_64

No need to keep different functions and files as the SSE2 specific code can be surrounded by compile guards

Recently I have been writing quite a bit of vectorized code, so I decided to revisit my very first attempt at it. This version is certainly much simpler. I did a quick check and pointer-sized integer types are signed by default (at least on my platform, intptr_t is a long, not an unsigned one), so subtracting from end_ptr as in this code will simply work.

Daniel Lemire ran a test and found no difference between unaligned and aligned loads: https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/ (that was quite some time ago). My recent reading confirmed that AMD and Intel specifically changed their architectures to make unaligned loads just as fast. Data alignment is simply no longer a speed issue; the difference is not measurable. In practice unaligned loads therefore come out ahead, because you can start using vector instructions right away rather than paying for an alignment loop first.
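For readers who want to see the shape of the idea, here is a minimal sketch of an ASCII check that uses unaligned SSE2 loads from the first byte, with a plain byte loop handling the tail. It is not the code from this PR: the function name `string_is_ascii` and the variable `cursor` are assumptions made for illustration, and the bound is written as `end_ptr - cursor >= 16` (a signed `ptrdiff_t`), which may differ from how the PR expresses it.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical helper (name assumed for illustration):
 * returns 1 if every byte in [data, data + length) is below 0x80, else 0. */
static int string_is_ascii(const char *data, size_t length)
{
    const uint8_t *cursor = (const uint8_t *)data;
    const uint8_t *end_ptr = cursor + length;
#ifdef __SSE2__
    /* Unaligned 16-byte loads start at the very first byte,
     * so no alignment loop is needed beforehand. */
    while (end_ptr - cursor >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)cursor);
        /* movemask gathers the high bit of each of the 16 bytes;
         * any set bit means a byte >= 0x80 was found. */
        if (_mm_movemask_epi8(chunk)) {
            return 0;
        }
        cursor += 16;
    }
#endif
    /* Tail loop: also the complete implementation when SSE2 is unavailable. */
    while (cursor < end_ptr) {
        if (*cursor & 0x80) {
            return 0;
        }
        cursor++;
    }
    return 1;
}
```

Because the vector loop sits inside the compile guard and the byte loop follows it unconditionally, a single function covers both the SSE2 and the fallback build.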

I did some quick testing and found no speed difference between this code and the old code. This will save quite a few lines.


marcelm · 2023-09-20T12:03:19Z

Nice! I noticed this "#ifdef __SSE2__ ... #endif while ..." pattern in the other PR and found it quite nice.

rhpvorderman · 2023-09-21T08:46:35Z

Yes, sometimes even without vectors you want an unrolled loop that does multiple operations per iteration, followed by one that does only a single operation. This pattern helps a lot with that. Getting rid of a loop control variable also sometimes means faster execution. So it is a big win all around, even without vectorization.
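The same two-loop shape without any vector instructions might look like the sketch below. It is purely illustrative: `count_byte` and its parameters are hypothetical names, not anything from this repository. The pointer `cursor` is compared against `end_ptr` directly, so no separate index variable is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example (not from the PR): count the bytes equal to `needle`. */
static size_t count_byte(const uint8_t *data, size_t length, uint8_t needle)
{
    const uint8_t *cursor = data;
    const uint8_t *end_ptr = data + length;
    size_t count = 0;
    /* Unrolled loop: four comparisons per iteration. */
    while (end_ptr - cursor >= 4) {
        count += (cursor[0] == needle) + (cursor[1] == needle)
               + (cursor[2] == needle) + (cursor[3] == needle);
        cursor += 4;
    }
    /* Single-step loop handles the remaining zero to three bytes. */
    while (cursor < end_ptr) {
        count += (*cursor == needle);
        cursor++;
    }
    return count;
}
```

Whether the unrolled version is actually faster depends on the compiler and target, so it is worth measuring before relying on it.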

Simplify ASCII check

6cd3c68

- Unaligned loads perform well on x86_64
- No need to keep different functions and files as the SSE2 specific code can be surrounded by compile guards

marcelm merged commit 5c8b2e1 into marcelm:main Sep 20, 2023
14 checks passed

rhpvorderman deleted the simpleasciicheck branch September 20, 2023 12:38
