Reflecting on implicit exceptions #149
Comments
Thanks for the writeup, @RossTate! I don't think it's a good idea to revisit the design right now, and I know that's not your intention with this post, so I'll refrain from commenting further. I do personally enjoy thinking about different designs, though, so I appreciate this food for thought.
I personally think it is very useful to have design alternatives documented in public, for everyone to understand the design space, even if we already know we can't pursue them anymore for some reason. Thank you! I also should refrain from commenting in detail, because I am too unfamiliar with the exception requirements of the various languages involved to know if this can work. That said, a statically checkable design like this would certainly have been great for optimization and for any other kind of analysis. It appears simple, and it fits the low-level design of Wasm too.
Issue WebAssembly/reference-types#69 requires that `ref.null` instructions include a reference type immediate. This concept isn't present in the bulk-memory proposal, but the encoding is (in element segment expressions). This change updates the binary and text format, but not the syntax. This is OK for now, since the only reference type allowed here is `funcref`.
That would prevent throwing a Rust panic across a C++ frame and a C++ exception across a Rust frame, right? This is something Rust explicitly supports using |
You would explicitly convert the Rust panic into a C++ exception and a C++ exception into a Rust panic at the respective boundaries. This gives the respective languages full control (over control), rather than restricting them to limited features like |
That requires depending on the C++ runtime to get access to the exception tag, right? That is not something Rust does. Also, what if there are multiple C++ wasm modules, each with its own statically linked C++ runtime? Rust should accept exceptions from all of them and allow throwing through all of them (it can't know which C++ runtime is used on the other side). Both the Itanium C++ ABI and SEH are designed such that you can throw an exception produced by one language through frames from another language without either language knowing about the existence of the other language. This is not the case for wasm exceptions if catch_all is gone.
Not necessarily. With explicit exceptions, you don't even need tags. See #214. So you can just use the ABI of the runtime. Or the runtime can expose a function for throwing an exception according to its representation of exceptions.
Both of these assume all languages involved link to a common implementation of the core exception-handling runtime. Problems arise if an SEH exception propagates to a program using Itanium. Problems even arise if a C++ program using one copy of the Itanium library catches an exception thrown by a C++ program using a different copy of the Itanium library. These two conventions also have different semantics (one's single-phase and the other's two-phase) and even require laying out the stack differently (e.g. for SEH, you essentially embed a linked-list data structure of handler information into the stack).
Rust must know which of these conventions the C++ code is conforming to in order to even compile its functions (specifically its stack-frame layouts). I suspect they're simply assuming Itanium and relying on the linker to link the relevant calls to Itanium's API to a shared copy of the Itanium implementation. There are also many languages with exceptions besides Rust and C++. It's important to understand that many of these languages don't implement exceptions in the same way as Rust and C++ for various good reasons. For example, C++ exceptions are notoriously inefficient. Itanium is especially optimized for programs that do not throw exceptions. The wasm team invited a maintainer of a language where exceptions are thrown much more frequently to share their thoughts, and they spoke about the performance issues these conventions cause for their language. They instead use something more like the implementation of setjmp/longjmp, along with an off-stack data structure of unwinders, in order to implement exceptions. In order to interop with C++ without taking a major performance hit, the compiler converts between the two implementation strategies at the relevant boundary points. Another issue is languages with multiple stacks. In many of these cases, you need to support propagating the exception from one stack to another; each of these languages has different semantics/implementations for determining what that other stack is, and in all cases you need to release the stack the exception propagated from, and Itanium and SEH both completely fail to support this. Some of these languages support optimizations that embed multiple stacks into one, which means the exception might not even propagate frame by frame but rather can skip entire chunks of the call stack. All of these interoperability issues are addressed by using explicit conversion between exception representations, which explicit exceptions force.
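To make the idea of explicit conversion at boundaries concrete, here is a minimal sketch in Python. It models each language's failure as a plain value and converts between the two at a boundary stub. Every name in it (`RustPanic`, `CppException`, `call_cpp_from_rust`, the `"rust_panic"` type name) is hypothetical and chosen for illustration; none of it reflects an actual Rust or C++ runtime layout.

```python
from dataclasses import dataclass


# Hypothetical representations; neither is a real Rust or C++ runtime layout.
@dataclass
class RustPanic:
    message: str


@dataclass
class CppException:
    type_name: str
    what: str


def cpp_to_rust(exn: CppException) -> RustPanic:
    """Adapter used where a C++ exception reaches a Rust frame."""
    return RustPanic(f"{exn.type_name}: {exn.what}")


def rust_to_cpp(panic: RustPanic) -> CppException:
    """Adapter used where a Rust panic reaches a C++ frame."""
    return CppException("rust_panic", panic.message)


def call_cpp_from_rust(cpp_fn, *args):
    """Boundary stub: run the C++ side and convert any failure it reports."""
    result = cpp_fn(*args)
    if isinstance(result, CppException):
        return cpp_to_rust(result)
    return result


def failing_cpp_fn():
    # Stands in for C++ code whose failure is reported explicitly.
    return CppException("std::runtime_error", "boom")


converted = call_cpp_from_rust(failing_cpp_fn)
print(converted)
```

Neither side needs to understand the other's representation outside the adapter functions, which is the property the comment above argues for.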
There are other strategies for achieving the same goal without |
On Windows you would use SEH, on UNIX Itanium, and on Wasm whichever proposal gets accepted for exception handling. You only need to ensure that a single implementation of this exists in the process. This is guaranteed by design on wasm, as the wasm runtime won't provide different implementations of the throw/try/catch instructions for each wasm module and there is nothing that can be statically linked in. (The same holds for SEH, as there is no statically linkable version of the DLL providing the SEH unwind functions.) On top of SEH and Itanium there is an implementation for C++ exceptions provided by the C++ runtime. However, at least Itanium unwinding is designed such that cleanup works even if the languages involved don't know about each other. Only catching and rethrowing requires you to know about the language from which the exception originated, and even then, if you immediately destroy the foreign exception after catching it (as Rust does), you don't need any knowledge about the language throwing the foreign exception.
Yes, but it does not link against the C++ runtime as this proposal would require. Rust does not know about C++ exceptions. It only knows about exceptions thrown through Itanium or SEH (depending on the OS) that originated from any foreign language, be it C++ or Go or whatever.
How would this proposal help? Either the wasm runtime implements the throw+catching proposed by this issue using side tables like Itanium and thus incurs the cost for all exceptions (and also for whatever user-space setjmp/longjmp implementation is built on throw+catching), or the wasm runtime implements it using setjmp/longjmp, in which case all exceptions would be fast but regular calls would be slower (including for languages which don't use setjmp/longjmp for exceptions).
You might be interested to know that
The proposal helps by not baking in the assumption that all programs implement exceptions through the same mechanism. C++ would implement exceptions using the features provided by this proposal. Other languages would implement exceptions using other (possible future) wasm features like long jumps. (Some languages are looking into not even using the wasm call stack at all in order to have more control over the stack; this allows them to implement long jumps using just |
As I understand it, both the current proposal on the main branch and your proposal require catching in order to run cleanup, even if unwinding continues further indefinitely. Rust requires running cleanup for all foreign exceptions, and for foreign exceptions it requires being able to catch the exception, but it throws away the exception data itself, so rethrowing it isn't necessary. It throws the exception data away because even Itanium requires cooperation from the originating language's runtime to keep exceptions beyond the point where they were caught.
I think I understand what you mean now. Is your proposal that there would be a catch tag for Itanium like exceptions (except single phase instead of two phase) and then there is a standardized layout of this exception type where eg C++ could fill in like |
Your understanding sounds right. In a sense, the ABI is explicit in the signature (or, really, an approximation of the ABI is explicit in the signature). If you conform to the same ABI, then you can interop with no indirections. If you don't, you insert adapters where appropriate. There are various ABI designs one could use with different tradeoffs, but the point is that this becomes a tooling/conventions item rather than baking a single (rather restrictive) policy/convention throughout WebAssembly. And beyond working well across multiple implementation strategies, this approach even makes it easier within a particular ABI because you always know the data representation of any exceptions that can be thrown locally. That way, whenever you need to do more advanced control flow like |
Preface: These are ideas I came up with a couple months ago but kept to myself in order to avoid stirring up more controversy. But I'd like to at least express the ideas. So do not take this "Issue" as a "request for change"; if GitHub Discussions were enabled, this would be posted there instead. Also, I recognize that the following applies a lot of hindsight.
From what I can tell, now that I understand the community's needs and desires better, a simple change to the type system (rather than just the instruction design) could have boiled exception handling down to two instructions (neither of which are blocks):
- `throw $exn`: throws an `$exn` exception with the values on the value stack as its payload
- `catching $exn $label`: precedes a `call`-like instruction and modifies it to branch to `$label` should an `$exn` get thrown by the call (dumping the contents of the payload onto the value stack); otherwise the thrown exception gets propagated to the caller

The corresponding change to the type system is that function types are extended with a `throws` clause that lists the exception events that can be thrown by the function. By default, C functions would have the `(throws $__c_longjmp)` clause (because `longjmp` would still have to be emulated by an exception), C++ functions would have the `(throws $__c_longjmp $__cpp_exception)` clause (where the two are kept distinct because the former should always succeed and only the latter needs to cause destructors to fire), and Java functions would have the `(throws $__java_exception)` clause (with the `throws` clause in the surface Java code being completely ignored).

Why does this type-system change enable a simpler instruction set? Two reasons, the second of which relies on how this type-system change also makes the JS/C APIs simpler.

The first is that WebAssembly code no longer has to deal with unknown exceptions. `catch_all`, then `externref`, and then `catch_all`/`unwind` were all introduced to deal specifically with unknown exceptions. One might think that they were meant to enable reuse of unwinding code, but notice that in the above examples there is only ever one exception event in a `throws` clause that's supposed to trigger unwinding. So really they're intended to deal with unknown exceptions, and so making all exceptions explicit eliminates the need for these constructs.

Before going into the second reason, let us consider interaction with the host. From a C perspective, when calling a WebAssembly function from C, the `throws` clause essentially informs the C-caller of a wasm-exported function how many and what type of alternate return addresses it should provide (or how many "result" structs to pass it, with the returned value indicating which result struct to use). From a JS perspective, the `throws` clause prevents WebAssembly exceptions and JS exceptions from crossing the boundary. Instead, boundary crossing is explicit and restricted, and the WebAssembly module itself is responsible for converting (using various imports, possibly including imported exception events) between its own exception event(s) and whichever exceptions the boundary permits.

Given that need for explicit conversion, the second enabler of simplification is that WebAssembly code is forced to handle stack traces explicitly. For example, a `$__cpp_exception` will not be able to arbitrarily cross the boundary into JS; rather, it will need to be caught by WebAssembly code, likely in the boundary code already generated by tools for interoping with JS. This means there's no utility in having additional content implicitly associated with a wasm exception. If you want to associate stack traces with an exception, you create the stack trace using an imported function and make the resulting value part of the payload of your own exception event. If your exception reaches your boundary code, then your code for converting your exceptions into JS exceptions should be extended to include a preexisting stack trace, which the debugger can then make use of. (This has the added advantage that the imported function used to create stack traces could alternatively be supplied with a "return null" function for faster performance during deployment, and that other languages not needing stack traces at all can get faster performance by default, which research indicates can be a 6x improvement for some languages.)

Putting these together, we eliminate all need for new block instructions, which are the hardest to generate.
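
The explicit stack-trace handling described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not how any engine or toolchain implements it: the module's exception is modeled as a returned value, `HostError` stands in for a JS exception, and `capture_trace`, `no_trace`, `make_thrower`, and `boundary` are hypothetical names. The point is only that the trace is an ordinary payload field filled in by a swappable imported function.

```python
import traceback
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CppException:
    """Stands in for a $__cpp_exception payload (hypothetical layout)."""
    message: str
    trace: Optional[str]  # explicit payload field, not implicit engine state


class HostError(Exception):
    """Stands in for a JS exception on the host side of the boundary."""

    def __init__(self, message: str, trace: Optional[str]):
        super().__init__(message)
        self.trace = trace


def capture_trace() -> Optional[str]:
    """Debug-build import: actually records a stack trace."""
    return "".join(traceback.format_stack())


def no_trace() -> Optional[str]:
    """Deployment import: the "return null" variant, which skips the cost."""
    return None


def make_thrower(trace_import: Callable[[], Optional[str]]):
    """Build the module's throw helper around whichever import was supplied."""
    def throw_cpp(message: str) -> CppException:
        return CppException(message, trace_import())
    return throw_cpp


def boundary(call):
    """Boundary code: convert the module's own exception into a host one."""
    result = call()
    if isinstance(result, CppException):
        raise HostError(result.message, result.trace)
    return result


def run_at_boundary(throw_cpp) -> HostError:
    """Helper for the demo: return the host error produced at the boundary."""
    try:
        boundary(lambda: throw_cpp("bad state"))
    except HostError as err:
        return err
    raise AssertionError("expected a HostError")


debug_err = run_at_boundary(make_thrower(capture_trace))
fast_err = run_at_boundary(make_thrower(no_trace))
print(debug_err.trace is not None, fast_err.trace is None)
```

Swapping `capture_trace` for `no_trace` changes nothing else in the module, which mirrors the deployment-time substitution mentioned in the parenthetical above.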

- `catch_all` and `unwind` need to be block-based so that they can rethrow unknown exceptions, which no longer exist.
- `catch` needs to be block-based so that it can retain implicit additional exception content (i.e. stack traces), which no longer exists. (Consequently, `rethrow` is no longer necessary.)
- `try` needs to be block-based because it needs to delimit the scope of these large special handlers, which are no longer special and can be just expressed with a label. (Consequently, `delegate` is no longer necessary.)

Thus just `throw` and `catching` can express everything these constructs can express whenever all exceptions are known, which the `throws` clause provides.
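
As a rough illustration of what "all exceptions are known" means for a validator, the following Python sketch checks a function body against its `throws` clause. The rule it encodes is an assumption drawn from the description above: an event escapes a body if it is thrown directly or listed in a callee's `throws` clause without being intercepted by a `catching` prefix, and every escaping event must appear in the enclosing function's own `throws` clause. The class and function names (`FuncType`, `Call`, `Throw`, `undeclared_events`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncType:
    name: str
    throws: frozenset  # events listed in the function's `throws` clause


@dataclass
class Call:
    callee: FuncType
    catching: frozenset = frozenset()  # events handled by `catching` prefixes


@dataclass
class Throw:
    exn: str


def undeclared_events(func: FuncType, body: list) -> set:
    """Return events that escape `body` but are missing from func.throws."""
    escaping = set()
    for instr in body:
        if isinstance(instr, Throw):
            escaping.add(instr.exn)
        else:
            # Whatever the callee may throw and the call site does not catch
            # propagates to the caller.
            escaping |= instr.callee.throws - instr.catching
    return escaping - func.throws


C_LONGJMP = "$__c_longjmp"
CPP_EXN = "$__cpp_exception"

c_func = FuncType("c_helper", frozenset({C_LONGJMP}))
cpp_func = FuncType("cpp_helper", frozenset({C_LONGJMP, CPP_EXN}))

# A C function calling C++ code without a handler leaks $__cpp_exception.
leaky = undeclared_events(c_func, [Call(cpp_func)])
# Prefixing the call with `catching $__cpp_exception $label` fixes that.
sound = undeclared_events(c_func, [Call(cpp_func, frozenset({CPP_EXN}))])
print(leaky, sound)
```

Because the check only needs set operations over declared events, no construct for "unknown" exceptions is required in this model.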