
Optimize ASCII digit and hexdigit conversion by using subtraction #21

Open · wants to merge 2 commits into main
Conversation

@okaneco commented Dec 8, 2022

- Subtract digits by their "base" digit to reduce the number of match arms when converting from an ASCII digit or hexdigit to u8
- Use FromPrimitive::from_u8 in ascii_to_digit after checking for a valid ASCII digit range
- Condense the ascii_to_hexdigit match arms and convert with from_u8
- Add the FromPrimitive trait bound to the required trait impls
- Add tests to verify correct parsing of digits and hex digits


I saw some significant wins across the board on the benchmarks; hopefully it wasn't just my CPU. ascii_to_hexdigit showed the biggest improvement, likely due to the simplification of its match arms.

Unfortunately, the only way I could find to do this involved adding the FromPrimitive trait bound from num-traits in several places. I believe adding another trait bound to a public-facing trait makes this a breaking change, but I'm not sure. If there's another way to do this, I'd be glad to change it.
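As a rough sketch of the subtraction idea, simplified here to return Option<u8> directly (the actual PR converts the result to the generic integer type with FromPrimitive::from_u8):

```rust
// Sketch of the subtraction approach, simplified to a concrete u8 output.
// The PR's real code converts the result with FromPrimitive::from_u8.
fn ascii_to_digit(character: u8) -> Option<u8> {
    // Wrapping subtraction maps b'0'..=b'9' onto 0..=9; any other byte
    // wraps to a value >= 10 and fails the range check.
    let n = character.wrapping_sub(b'0');
    if n < 10 {
        Some(n)
    } else {
        None
    }
}

fn main() {
    assert_eq!(ascii_to_digit(b'7'), Some(7));
    assert_eq!(ascii_to_digit(b'/'), None); // byte just below b'0'
    assert_eq!(ascii_to_digit(b':'), None); // byte just above b'9'
    println!("ok");
}
```

A single subtraction and comparison replaces ten match arms, which is where the simplification comes from.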

@pacman82 (Owner) commented Dec 8, 2022

Hello @okaneco,

Thanks for your PR.

I saw some significant wins across the board with this on the benchmarks

This is interesting. ascii_to_digit and ascii_to_hexdigit are supposed to be optimized so they can be evaluated at compile time. No doubt the function itself runs faster, but that should not matter much, since all of its possible results (there are only 10 or 16, after all) need not be evaluated at runtime for typical use cases like parsing a u32.

I think adding another trait bound to a public facing trait makes it a breaking change but I'm not sure.

Yes, it is. E.g., imagine a wrapper around a String which defines the empty text ("") as zero and a single capital I as one, with addition being string concatenation. atoi could then parse the input 3 into the value "III".

It is part of the charm of this library that its definition of what an integer number is comes close to the one a math student might learn (the intersection of all sets with a neutral element and a successor for each element), so it is applicable to pretty much anything that can be interpreted as a natural number.

There is precedent for this conflict: in math, integers do not overflow. That is why a separate set of traits was introduced to handle the overflow-checking behavior of primitives. This allows, e.g., FromRadix10 to still work on num-bigint's BigInt, which cannot overflow; that is rather its point.
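To make the checked/unchecked split concrete, here is a hypothetical helper (not the crate's API) showing what overflow-aware parsing of a primitive looks like; a big-integer type would never need these checks, which is why the behaviors live in separate traits:

```rust
// Hypothetical illustration of checked parsing for a primitive type.
// A BigInt-style type could skip every checked_* call, since it cannot
// overflow; hence the separate traits for checked behavior.
fn parse_u8_checked(bytes: &[u8]) -> Option<u8> {
    let mut acc: u8 = 0;
    for &b in bytes {
        // Reject non-digit bytes.
        let d = b.checked_sub(b'0').filter(|d| *d < 10)?;
        // Detect overflow at each accumulation step.
        acc = acc.checked_mul(10)?.checked_add(d)?;
    }
    Some(acc)
}

fn main() {
    assert_eq!(parse_u8_checked(b"255"), Some(255));
    assert_eq!(parse_u8_checked(b"256"), None); // u8 overflow detected
    println!("ok");
}
```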

It is not unreasonable to assume there is a high correlation between types that can overflow and the ones that can be constructed from primitives. Let me think about all this for a bit.

Cheers, Markus

@okaneco (Author) commented Dec 8, 2022

No worries. Thanks for explaining the notion of an integer in this library, that makes more sense to me now.

I'm not an expert in assembly, but I think this is roughly what's happening. The current ascii_to_digit is already well optimized, except that this PR's version using RangeInclusive::contains emits an lea instead of an add. I think that ends up slightly more efficient because the addition can be done in place rather than going through the arithmetic unit. The current implementation is equivalent to:

let n = character.wrapping_sub(b'0');
if n < 10 {
    Some(n as I)
} else {
    None
}

https://rust.godbolt.org/z/rsvq197fc

Edit (2023/01/14): this was wrong, see #21 (comment)

ascii_to_hexdigit seems more straightforward: the current implementation compiles to a massive lookup table.

https://rust.godbolt.org/z/YjYGcW81n

@pacman82 (Owner) commented
Had a spare moment. I can confirm the speedups. I'll need a bit more spare time to think about the interface.

Use a bitwise operation to convert ASCII lowercase to uppercase, allowing
hex digits to share the same conditional branch
@okaneco (Author) commented Jan 14, 2023

Take your time, not a problem.

Small update: I was able to achieve another 15-35% speedup on 3 of the 4 hex benchmarks, which makes those three benchmarks twice as fast as main on my machine.

I tried rewriting ascii_to_hexdigit to be non-breaking by removing the use of FromPrimitive. The following version is ~30% faster than main, but still ~30% slower than the FromPrimitive version.

// `nth(i)` is a crate helper that builds the value i from `Zero` and `One`.
fn ascii_to_hexdigit<I>(character: u8) -> Option<I>
where
    I: Zero + One,
{
    // Unsetting the 6th bit converts ASCII alphabetic lowercase to uppercase.
    //
    // b'A' = 0b_0100_0001 (decimal 65), b'F' = 0b_0100_0110 (decimal 70)
    // b'a' = 0b_0110_0001 (decimal 97), b'f' = 0b_0110_0110 (decimal 102)
    // b'a' & 0b_1101_1111 converts 'a' to 'A'.
    let mask = 0b_1101_1111;

    if matches!(character, b'0'..=b'9') {
        match character {
            b'0' => Some(nth(0)),
            b'1' => Some(nth(1)),
            b'2' => Some(nth(2)),
            b'3' => Some(nth(3)),
            b'4' => Some(nth(4)),
            b'5' => Some(nth(5)),
            b'6' => Some(nth(6)),
            b'7' => Some(nth(7)),
            b'8' => Some(nth(8)),
            b'9' => Some(nth(9)),
            _ => None,
        }
    } else {
        match character & mask {
            b'A' => Some(nth(10)),
            b'B' => Some(nth(11)),
            b'C' => Some(nth(12)),
            b'D' => Some(nth(13)),
            b'E' => Some(nth(14)),
            b'F' => Some(nth(15)),
            _ => None,
        }
    }
}
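As a quick sanity check of the bit trick used above (valid for ASCII letters only):

```rust
fn main() {
    // Clearing bit 5 (mask 0b_1101_1111) maps ASCII lowercase letters
    // onto their uppercase counterparts.
    let mask = 0b_1101_1111u8;
    assert_eq!(b'a' & mask, b'A');
    assert_eq!(b'f' & mask, b'F');
    // Digits never reach the masked branch, since they are matched first.
    println!("ok");
}
```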

There was a mistake in the first Compiler Explorer link I posted for ascii_to_digit: when the third line of the ascii_to_digit_contains version is changed to Some((n - b'0') as i32), all three functions produce the same instructions. I'm not sure why it performs better in the benchmark.
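For reference, the corrected contains-based variant described above looks like this (with a concrete i32 output for illustration):

```rust
// The contains-based variant with the corrected third line,
// Some((n - b'0') as i32); with this fix it compiles to the same
// instructions as the wrapping_sub version.
fn ascii_to_digit_contains(n: u8) -> Option<i32> {
    if (b'0'..=b'9').contains(&n) {
        // The subtraction cannot underflow here: the range check
        // guarantees n >= b'0'.
        Some((n - b'0') as i32)
    } else {
        None
    }
}

fn main() {
    assert_eq!(ascii_to_digit_contains(b'3'), Some(3));
    assert_eq!(ascii_to_digit_contains(b'z'), None);
    println!("ok");
}
```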
