compose: add a cache for parsed files #220
base: master
Force-pushed from fcf7de4 to 564a25f.
Parsing /usr/share/X11/locale/en_US.UTF-8/Compose takes about 21% of GLFW's startup time on my PinePhone according to perf; this patch lowers it by about 33 ms, down to approximately 2%. This adds an optional dependency on xxhash.

Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
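A minimal sketch of how such a cache could be validated, assuming the cache file starts with a small header that records a format version and a hash of the Compose source it was built from. The header layout, `struct compose_cache_header`, `CACHE_MAGIC`, `CACHE_VERSION` and `cache_is_valid()` are hypothetical and not taken from the patch. 64-bit FNV-1a is swapped in for xxhash only to keep the example dependency-free; with xxhash one would call `XXH64()` instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header stored at the start of a cache file. */
#define CACHE_MAGIC   0x43504d43u /* arbitrary tag chosen for this sketch */
#define CACHE_VERSION 1u          /* bump whenever the internal structs change */

struct compose_cache_header {
    uint32_t magic;
    uint32_t version;
    uint64_t source_hash; /* hash of the Compose file the cache was built from */
};

/* 64-bit FNV-1a: a dependency-free stand-in for XXH64 in this sketch. */
static uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ull; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ull;          /* FNV prime */
    }
    return h;
}

/* A cache entry is usable only if it was written by this format version
 * from byte-identical source contents; otherwise the caller reparses. */
static bool cache_is_valid(const struct compose_cache_header *hdr,
                           const void *source, size_t source_len)
{
    return hdr->magic == CACHE_MAGIC &&
           hdr->version == CACHE_VERSION &&
           hdr->source_hash == fnv1a64(source, source_len);
}
```

Storing a format version next to the hash means a cache written by an older build, whose internal structs may differ, is rejected instead of being misread.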
Force-pushed from 564a25f to 8c96caf.
Interesting. I would definitely want this to work well on the PinePhone, and these numbers do sound like a lot. Some historical context: the original libX11 Compose implementation does in fact have a cache (
In that spirit, I'd first like to see whether we can make the Compose parsing itself fast enough that a cache is not necessary. I had a quick look and committed 7d84809, which brings the timing down from 8 ms to 5.3 ms on my old laptop, according to

I'd be interested to see the perf output from the phone, as well as the absolute timings. Maybe something is particularly slow there.
Hi, and thanks for your interest! I just profiled again, before any optimisation (on 95e2907). I've attached the perf output and flamegraphs of three of these runs: perf+flamegraph.tar.gz

It doesn't seem like the case-sensitive optimisation had much of an effect on AArch64. I'll check whether glibc has the same optimisation there as on x86, since it's a bit weird that there was such a big change on your laptop and almost none on my phone.

As for the issues with the cache, we might look at how other libraries such as Mesa handle it; I'd trust them on that. :) Note that

Edit: it seems glibc has no asm whatsoever for strcmp; it all relies on the compiler doing a good job at autovectorisation.
Hmm, pretty interesting that it had no effect. Unfortunately, when I try to

I refreshed my memory of the code a bit more. Basically, at least on my machines, the time is mostly split between these 3 parts:
I'd be curious to know how it does with these two patches applied. (Also, I assume you're using an optimized build, maybe also with LTO.)
Amusingly, it was the same thing for the keymaps: the original xkbcomp has a cache, and we got rid of it.
glibc is rather unreadable, but the asm for arm64 is found here:

If for some reason it's not used, that could explain some of the slowness.
I've just tested your latest improvements from master on my phone; they bring it down from ~34 ms to ~19 ms, nice! As for perf not giving you relevant function names, do you want me to send you the DWARF data from my build of the library? I checked and the
For me it's down to 2.8 ms, so I hoped it would be better, but it's still not bad. Thanks for testing again.
Sure, maybe that will do the trick. I'm still curious what is slowing things down: cache misses, branch mispredictions, memory fetches, or just plain CPU time (if I'm reading the right place, the PinePhone has a 1.2 GHz CPU). So BTW the output of

If possible, maybe also a callgrind output file? (I don't know if that's portable; hopefully yes.)
Great (even if strcmp is now not called as much...)
Parsing /usr/share/X11/locale/en_US.UTF-8/Compose takes about 21% of GLFW's startup time on my PinePhone according to perf; this patch lowers it by about 33 ms, down to approximately 2%.

This adds an optional dependency on xxhash; its two-clause BSD license is compatible, and it was the fastest non-cryptographic hash I could find.

I went for an fopen()/fread() approach in order not to change the internal structs in any way, but with some simple changes we could use mmap() and avoid the reading cost altogether, additionally letting multiple programs share a single read-only copy of the cache in memory.

Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
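The mmap() alternative could look roughly like the sketch below: map the cache file read-only so the kernel pages it in on demand and processes mapping the same file share the same page-cache pages. `struct mapped_file`, `map_cache_file()` and `unmap_cache_file()` are hypothetical names. Using the mapping in place would additionally assume an on-disk layout that needs no fixups (offsets instead of pointers, fixed-size fields); that is an assumption about what the "simple changes" would involve, not something the patch does.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct mapped_file {
    const void *data;
    size_t size;
};

/* Maps `path` read-only. Returns 0 on success, -1 on failure
 * (including empty files, which mmap() cannot map). */
static int map_cache_file(const char *path, struct mapped_file *out)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    out->data = p;
    out->size = (size_t) st.st_size;
    return 0;
}

static void unmap_cache_file(struct mapped_file *f)
{
    munmap((void *) f->data, f->size);
    f->data = NULL;
    f->size = 0;
}
```

Compared with fopen()/fread(), nothing is copied into a private heap buffer, so only the pages actually touched are read from disk.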