-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Generate an IR (aka int64 part 1/4). #983
base: master
Are you sure you want to change the base?
Conversation
This is great work! There has been some demand for an Out of curiosity, will a SourcePawn VM refactor also feature introducing x86-64 JIT support, in particular for 64-bit P.S. thank you for reminding me that I need to learn more about how compilers work haha |
The VM already supports x86_64 right? Do you mean a JIT? If so, no plans for one since I don't think it's as important as language feature support. But that could change, especially with motivating evidence. |
Yeah I meant the JIT - apologies, I had just woken up. That's fair, I can't give much measurements at the moment as I haven't played with 64-bit TF2 servers a whole lot yet. |
last time we tested x64 without JIT, it had significant performance regressions comparing to x32 with JIT. Therefore we reverted our servers back to 32-bit srcds. would likely need someone with active player population to do vprofs on both versions |
I have run a profiler on both as seen here: We have a single 2fort server on x64, it uses about 20% more cpu with the heaviest plugin being SMAC. This is probably acceptable for all but 100 slot servers, which only 1 person has anymore (not us). It could also be worse for custom game modes that heavier users of sourcepawn though. |
Extremely likely X86 jit won't survive this refactoring as it's internally incompatible with the planned opcode changes. It'll be frozen for compatibility and used on old binaries only. It's possible I might find a route to refactor pri/alt but I doubt it as it'd be a nightmare to maintain.
I'd be open to a new JIT, likely 64bit only, but showing performance decrease isn't enough. It needs to be shown that (1) the issue is not reasonably addressable in the plugin itself (dispatch costs are huge and plugin is not doing dumb stuff) and (2) the issue is actually meaningful (playability of a game is affected). A crude analogy: A car's max speed dropping by 50% doesn't matter if the speed is still 100mph and is driving on 60mph roads.
I get the concern around performance, but wishcasting or anecdotes don't change the calculus: the language won't move forward without breakage. So it'll break, and then we see where things stand.
|
Just to be clear we're personally fine the lack of JIT right now, though we wouldn't say no if you were to make one. As you explained, as long as the cpu usage on 1 core is less than 100%, it doesn't matter if sourcemod is using 20% vs 90%. This would only be an issue on 100 slot servers which only 1 youtuber is able to fill. However we're fortunate enough to be running on newish cpus and others might not be so fortunate. We're also not using the most demanding plugins, so I can't speak for everyone. |
f949891
to
5554090
Compare
Our community won't be affected as we have budget, others not so much. Gotta keep in mind that many people run multiple servers on single machine, therefore the cpu usage increase stacks up. Anyway, worrying about performance imo should be done after any major changes SM devs have planned for sourcepawn are done. Premature optimization is the root of all evil as they say 😄 And even then without JIT, we all can always try to brainstorm some optimizations for VM itself! |
f74cd5e
to
3894d96
Compare
Making slow progress. There are some things that are way, way easier to express in the IR. Operator overloads are so much nicer, and in general individual nodes are much simpler. But other things are a somewhat harder. Like, a naive += can double-bind the lvalue, which could lead to obscure bugs. For example "x[crab()] += 5" could call "crab()" twice instead of once, so we need new kinds of nodes that weren't previously in the AST. There's lots of little details like this to work through. |
626e243
to
baa55bb
Compare
214d3fd
to
d44c4c9
Compare
ceda467
to
3c5b28d
Compare
98dab75
to
affd880
Compare
aeaa101
to
2f22a29
Compare
The compiler has come a long way, but we still can't do int64, because it requires a non-cell storage size. There's no (sane) way to express conversions between int32 and int64 within the AST, because we have no uniform way of inserting conversion nodes. This is already a deep problem that has been hacked around for operator overloads and property accessors, and it doesn't scale. The solution is obvious: transform the AST into an IR. That's what we should have done from the beginning but didn't. Unfortunately it requires a *lot* of refactoring and a ton of boilerplate. So far, I have most of the boilerplate done, but the refactoring is only halfway there. CodeGenerator has not been ported to the IR yet. Once this gigantic patch is done, we'll have the following changes: - `struct value` will be eliminated, and good riddance. - The AST will be immutable after parsing. - The semantic analysis phase will output a new IR tree. - CodeGenerator will generate off the IR instead. Since the IR is a transformation of the AST, I'm expecting minimal changes to the end result. - functag_t will be replaced by FunctionType. - Temporary heap allocations will be replaced with pre-allocated stack slots. V2: CG-IR can now assemble trivial programs. V3: CG-IR supports basic calls; 341 test failures. V4: CG-IR supports binary ops; 333 test failures. V5: CG-IR supports do-while and if; 329 test failures. V6: CG-IR supports args, local incdec, switch; 319 test failures. V7: Dropped IncDecOp in favor of Store/BinaryOp primitives. Added global variable support. V8: Added support for heap scopes. 294 test failures. V9: Add support for Load(IndexOp) and arrays as arguments. 290 test failures. V10: Add support for varargs. 289 test failures. V11: Port array helpers to IR. 280 test failures. V12: Add for-loop support. 273 test failures. V13: Get effective address calculation working. 271 test failures. V14: Move array type resolution into the semantic pass, due to not having full semantic powers during name resolution. 250 test failures. V15: Fix argument type deduction. 247 test failures. V16: Fix by-ref arguments. 243 test failures. V17: Add support for "delete". Add support for operator overloads. Introduce the concept of temporary stack slots. 233 test failures. V18: Replace TempRef with StackRef. Implement FieldRef. Implement Paamayim Nekudotayim. 225 test failures. V19: Implement IncDec properly. 222 test failures. V20: Implement PropertyRef. Implement varargs corner cases. 208 test failures. V21: Implement |this|. Implement view_as for some l-values. 204 test failures. V22: Implement bound |this| calls. 188 test failures. V23: Implement trivial view_as. 185 test failures. V24: Fix nested enum structs. Fix enum decl assignment. Fix incdec with user ops. Fix property ref getter binding. Fix bound function calls. Fix various tests. 151 test failures.
The compiler has come a long way, but we still can't do int64, because
it requires a non-cell storage size. There's no (sane) way to express
conversions between int32 and int64 within the AST, because we have no
uniform way of inserting conversion nodes. This is already a deep
problem that has been hacked around for operator overloads and property
accessors, and it doesn't scale.
The solution is obvious: transform the AST into an IR. That's what we
should have done from the beginning but didn't. Unfortunately it
requires a lot of refactoring and a ton of boilerplate. So far, I have
most of the boilerplate done, but the refactoring is only halfway there.
CodeGenerator has not been ported to the IR yet.
Once this gigantic patch is done, we'll have the following changes:
struct value
will be eliminated, and good riddance.transformation of the AST, I'm expecting minimal changes to the end
result.
I'm calling this part 1 of 4, since roughly the steps toward
int64
are:nodes.
make PRI/ALT dynamically sized).