Investigate (& report?) performances on JIT runtimes #207

Closed · masklinn opened this issue Mar 27, 2024 · 3 comments

masklinn (Contributor) commented Mar 27, 2024

masklinn (Contributor, Author) commented Oct 14, 2024

All benching done with samples/useragents.txt (75158 lines, 20322 unique).

NOTE: "legacy" has a clearing cache of size 200.

| parser | cache (n=200) | cpython 3.12 | pypy 7.3.17 | graalpy 24.1.0 |
|--------|---------------|--------------|--------------|----------------|
| legacy | clearing | 29.00s (386µs/line) | 156.56s (2083µs/line) | 106.12s (1412µs/line) |
| basic  | none | 33.53s (446µs/line) | 221.14s (2942µs/line) | 117.09s (1558µs/line) |
| basic  | lru | 29.19s (388µs/line) | 220.40s (2933µs/line) | 72.96s (971µs/line) |
| basic  | s3fifo | 24.18s (322µs/line) | 146.93s (1955µs/line) | 55.65s (740µs/line) |
| basic  | sieve | 24.37s (324µs/line) | 127.61s (1698µs/line) | 55.43s (737µs/line) |
| regex  | — | 1.31s (17µs/line) | 1.47s (20µs/line) | 7.15s (95µs/line) |
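
(For reference, the per-line figures are just the totals divided by the 75158 benched lines; a quick check of my own, not part of the bench output:)

```python
LINES = 75_158  # samples/useragents.txt

for config, total_s in [("legacy/clearing on cpython", 29.00),
                        ("basic/sieve on pypy", 127.61),
                        ("regex on graalpy", 7.15)]:
    print(f"{config}: {total_s / LINES * 1e6:.0f}µs/line")
# -> 386, 1698 and 95 µs/line, matching the table
```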

A few observations:

  • the regex engines of pypy and graal are dreary, but graal's showing is much better than expected [1]
  • pypy really doesn't like the new impl: on cpython the LRU is sufficient to catch up [2] and on Graal to gain a 30% edge, but pypy is still 40% behind
  • pypy also much prefers sieve, while graal and cpython essentially don't care (a rough sketch of SIEVE's eviction logic follows this list)
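
For context on the sieve observation, here is a minimal, generic sketch of the SIEVE algorithm (class name and structure are mine, not the library's actual code). The relevant point is that a cache hit only flips a bit and never reorders anything, unlike LRU:

```python
class SieveCache:
    """Minimal SIEVE sketch (illustrative only, not ua-parser's implementation).

    A hit only flips a per-entry `visited` bit. Eviction sweeps a "hand" from
    the oldest entry towards the newest, sparing (and un-marking) entries that
    were visited since the last sweep.
    """

    def __init__(self, maxsize: int) -> None:
        assert maxsize > 0
        self.maxsize = maxsize
        self.values: dict = {}   # key -> cached value
        self.visited: dict = {}  # key -> hit since the hand last passed?
        self.order: list = []    # FIFO, index 0 = oldest entry
        self.hand = 0            # sweep position in self.order

    def get(self, key, default=None):
        if key in self.values:
            self.visited[key] = True   # the whole hit path: flip one bit
            return self.values[key]
        return default

    def put(self, key, value) -> None:
        if key in self.values:
            self.values[key] = value
            self.visited[key] = True
            return
        if len(self.values) >= self.maxsize:
            self._evict_one()
        self.values[key] = value
        self.visited[key] = False
        self.order.append(key)

    def _evict_one(self) -> None:
        # Sweep until an unvisited entry is found and evict it; visited
        # entries get a second chance (bit cleared) and the hand moves on.
        while True:
            if self.hand >= len(self.order):
                self.hand = 0              # wrap back to the oldest entry
            key = self.order[self.hand]
            if self.visited[key]:
                self.visited[key] = False
                self.hand += 1
            else:
                del self.order[self.hand]  # O(n) here; a linked list in practice
                del self.values[key]
                del self.visited[key]
                return
```

The hit path does no structural mutation, which is presumably easier on a tracing JIT than LRU's constant reordering; that would be one plausible reason pypy likes it better.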

Footnotes

  1. important caveat: somehow Graal manages to put a ton of concurrency to work. I assume the GC is concurrent, but I don't understand what else it's doing: on my machine, for `bench --bases legacy basic regex --cachesizes 200 --caches none lru s3fifo sieve`, GraalPy clocks in at `1175.69s user 8.45s system 283% cpu 6:57.58 total` while pypy clocks in at `925.28s user 3.32s system 100% cpu 15:27.14 total`. So graal uses nearly 30% more CPU in total, but it keeps nearly 3 cores loaded and ends up running the suite a bit more than twice as fast (some configurations go even higher, e.g. basic with no cache basically runs at 400%). [3]

  2. an LRU should be better than the clearing cache at 200, but the layered approach of the new API adds some overhead, so coming out to a wash on cpython is plausible (see the cache sketch after these footnotes)

  3. as it turns out that has nothing to do with "tons of concurrency": graal is concurrently trying to JIT-compile the regexes, but the regex patterns of uap-core completely defeat TRegex's JIT compiler; as a result, disabling regex compilation yields the same timings (or slightly better) at 100% CPU usage
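
For footnote 2, a rough illustration of the two strategies being compared: a "clearing" cache just flushes wholesale when full, while LRU does bookkeeping on every hit and evicts the least recently used entry. This is an illustrative sketch only, class and method names are mine and not uap-python's API:

```python
from collections import OrderedDict


class ClearingCache:
    """Bounded cache that simply drops everything once it is full."""

    def __init__(self, maxsize: int = 200) -> None:
        self.maxsize = maxsize
        self.data: dict = {}

    def lookup(self, key, compute):
        if key not in self.data:
            if len(self.data) >= self.maxsize:
                self.data.clear()              # wholesale flush, no per-hit work
            self.data[key] = compute(key)
        return self.data[key]


class LruCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, maxsize: int = 200) -> None:
        self.maxsize = maxsize
        self.data: OrderedDict = OrderedDict()

    def lookup(self, key, compute):
        try:
            self.data.move_to_end(key)         # bookkeeping on every single hit
        except KeyError:
            if len(self.data) >= self.maxsize:
                self.data.popitem(last=False)  # evict the least recently used
            self.data[key] = compute(key)
        return self.data[key]
```

With a skewed workload like user agent strings, LRU keeps hot entries where the clearing cache periodically throws them away, which is why it "should" win at equal size.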

masklinn added a commit to masklinn/uap-python that referenced this issue Oct 27, 2024
- enable graal on tox (24.1 with master's virtualenv plugin seems to
  work)
- make tracemalloc optional in CLI script (doesn't work in pypy)
- add regex to CLI script
- comment graalpy trove classifier (doesn't exist yet)

fixes ua-parser#206, fixes part of ua-parser#207
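
Regarding the "make tracemalloc optional" bullet in the commit message above: tracemalloc doesn't work on pypy, so the usual pattern is to guard its import and use. A sketch of that pattern follows (my own illustration, not necessarily what the commit does; `run_benchmark` is a placeholder):

```python
import argparse

try:
    import tracemalloc            # not available / functional on pypy
except ImportError:
    tracemalloc = None


def run_benchmark() -> None:
    # placeholder for the CLI script's real work
    sum(len(line) for line in ["Mozilla/5.0 (sample UA)"] * 1000)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--memory", action="store_true",
                        help="report peak memory via tracemalloc (CPython only)")
    args = parser.parse_args()

    trace = args.memory and tracemalloc is not None
    if trace:
        tracemalloc.start()

    run_benchmark()

    if trace:
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        print(f"peak traced memory: {peak / 1024:.1f} KiB")


if __name__ == "__main__":
    main()
```
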
masklinn (Contributor, Author) commented

Reported the issues upstream (see previous comment) and implemented some rewriting on the Python side (#230, only for graal and re2 since I don't know whether pypy uses a DFA, and uap-rs does the rewriting internally). I don't think I can do much more on this side of the issue aside from eventually #235.
