Fast effect forwarding #256

frank-emrich · 2024-11-18T17:04:34Z

This PR fundamentally changes how we handle tag/effect forwarding: Currently, at any suspend site, we unconditionally attempt to switch to the parent stack (or fail, if we are already on the the main stack). In the process, we pass the tag identifier that we suspended with. Then, we check if the resume clause in the parent had a handler for that tag. If it doesn't, we attempt to suspend again to the parent of the (original) parent. This process is repeated until we either find a handler, or reach the main stack (meaning that there was no appropriate handler).

The new design introduced by this PR works as follows: Every stack (main stack or continuation) carries a HandlerList around. This is effectively a vector of tag identifiers (i.e., *mut VMTagDefinition).

On resume $ct (on $tag_1 $block_1) ... (on $tag_n $block_n), we then fill the active stack's HandlerList with the following n entries:

<address of $tag_1>
...
<address of $tag_n>

On suspend $tag, we then traverse the chain of active continuations, inspecting the HandlerList of the parents of the currently active continuation. If we find the address of $tag as the _i_th entry of the handler list of some resume instruction in our parents, we directly switch to the stack that executed the corresponding resume instruction. In the process, we pass along the list index i. Back at the resume site, we can then directly jump to the _i_th handler block, using a jump table.
If the handler search does not succeed, we immediately trap at the suspend site.

Some implementation notes:

The search for an appropriate handler is implemented in wasmtime_cranelift::wasmfx::optimized::search_handler, which is used for emitting code at suspend sites.
While there are a lot of changes to translate_resume, it is actually simpler now! The rather annoying back edge from the forward block to the resume block is gone, and every block will at most be executed once.
ControlEffect is no longer using TaggedPointer. There are simply no more pointers involved here at all, the only payload to hand from suspend site to resume site (in addition to the return vs suspend signal) is the index of the handler we want to use.

In general, everything is built to be compatible with switch being added on top of this design: In particular, the HandlerList and aforementioned search_handler can easily be adapted to support switch, and the changes to translate_resume are made with switch in mind.

crates/continuations/src/lib.rs

dhil · 2024-11-19T09:35:03Z

crates/cranelift/src/wasmfx/optimized.rs

+///   contref = chain_link.get_contref()
+///   parent_link = contref.parent
+///   parent_csi = parent_link.get_common_stack_information();
+///   handlers = parent_csi.handlers;


This doesn't seem right to me. The handlers should sit on the current continuation reference. If the current one doesn't handle the suspension, then we try the parent, and so forth.

I'm not sure if I understand what you are saying.

Consider the following example:

... ^ | parent c1 ^ | parent c2

where $c1 executed (resume $ct (on $t switch) ... (local.get $c2)) and $c2 is currently executing.

In the current design in this PR, continuation $c1 would have a handler list l with a switch entry for $t (as well as all other tags potentially occurring in the resume above).
Continuation $c2 is currently running and has no handler list.

Are you suggesting that the handler list should be attached to $c2 instead, meaning that each continuation stores what is handled by its immediate parent?

The disadvantage of that approach over the current one is that it requires additional work on switch instructions:
In the example above, if $c2 then executes (switch $ct' $t (local.get $c3)) for some $c3, we would have to update the handler lists such that l becomes the handler list of $c3 (and the handler list of $c2 is cleared for housekeeping).

This work isn't required if the handler list l is associated with $c1 instead (and stays there on switch).

I think you are right about querying the parent first. We do that in all our implementations, I see. On reflection, it must be like that, because the handler can grow its continuation, and when resuming a captured continuation it needs to be spliced with the handler's continuation.

Regarding switch and clearing the handlers on the continuation reference, I think the only thing that needs to happen for $c3 is that it has its parent adjusted (similar for the current continuation). We should never have to update the handlers of "switched" continuations, I think.

Regarding switch and clearing the handlers on the continuation reference, I think the only thing that needs to happen for $c3 is that it has its parent adjusted (similar for the current continuation). We should never have to update the handlers of "switched" continuations, I think.

Yes, this is what the design in this PR ensures. You don't have to touch the handlers on switch.

The semantics of switch is similar to the semantics of a deep handler, in the sense that the handler stays in place after resume. So any proper design should not have to modify the handler on switch.

dhil · 2024-11-19T09:36:05Z

crates/cranelift/src/wasmfx/optimized.rs

+        let raw_parts = builder.block_params(handle_link);
+        let chain_link =
+            tc::StackChain::from_raw_parts([raw_parts[0], raw_parts[1]], env.pointer_type());
+        let is_main_stack = chain_link.is_main_stack(env, builder);


I really don't like this special case of the main stack. But we can fix that in another patch. Ideally, every stack is homogeneous.

There is no "special case for the main stack" here. We are traversing a linked list, whose last element is the main stack.
The only thing that is_main_stack is used for is detecting that we have reached the end of the list without finding a handler.
You could rename is_main_stack to has_no_parent. Such a check will always be required in this traversal.
The rest of the code treats continuations and the main stack is completely homogeneously.

My comment was conceptual. It is not about this particular code. We shouldn't have a notion of a main stack at all, is my point.

dhil · 2024-11-19T09:41:11Z

crates/cranelift/src/wasmfx/optimized.rs

+
+        // Note that the control context we use for switching is not the one in
+        // (the stack of) resume_contref, but in (the stack of) last_ancestor!
+        let fiber_stack = last_ancestor.get_fiber_stack(env, builder);


~~I don't understand the point of this "last ancestor" business. Please explain.~~ See my other comment below.

dhil · 2024-11-19T09:52:51Z

crates/wasmtime/src/runtime/vm/continuation.rs

+///                 |   +----------------+
+///                 |            ^
+///                 |            | parent
+///  last ancestor  |            |


OK, so reading this drawing I may understand what you are on about with the ancestor business. What you call last_ancestor is really the initial frame of the delimited continuation that is being captured. I think we should name all this differently. The frame is not the "last" ancestor, as it may be adopted by another ancestor etc. However, it is the tail of the linked list that defines the continuation. Thus, I think it is better to use names such as "tail"/"head" to describe these objects. The data structure you are building is known as a "circular list". I think it is worthwhile to adopt this terminology to make the implementation easier to understand and more readily accessible to other people.

I'm not particularly married to the name last_ancestor. However, this field is only ever set on the "head" continuation (bottom-most continuation in the picture) and only if it is suspended. Thus, if its set, it indeed points to the last ancestor.

I'm not particularly keen on the term tail, since it is somewhat overloaded in the context of linked lists. It is both used to refer to the next node in a linked list, but also to refer to the end of the list.

I am happy to consider anything but last ancestor. Am I correct that the continuation object points to the bottom stack in the chain (i.e. the stack that executed the suspension)? If so, then I think we can optimise it ever-so-slightly but returning the "last ancestor" instead and set its parent to be the bottom stack. We should save a field that way too.

That's possible, but I would prefer holding off from that. The reason is that it complicates how at suspend, we propagate the required information to the resume site.
In the current version of this PR, the following happens:

We identify the currently running continuation c1 by reading the stack chain field of the VMContext. This will be the bottom of the continuation chain of the continuation object we are about to construct.

The handler search gives us the other end of the chain, c2. We set c1.last_ancestor to c2.

We perform the stack switch.

Back in resume, we read c1 from the VMContext (it's still saved in the stack chain field) and obtain the other end (c2) from its last_ancestor field.

In the design you propose, we would instead make c2 point to c1 in step 2. But this means that in the 4th step above, we have no way to obtain c2. We would have to either temporarily write c2 into the VMContext or extend the switch instruction so that we can pass more payloads. I'd very much prefer avoiding that, at least for now.

Ugh, it strikes again. I understand. We really need to get rid of the auxiliary structures. We should be able to encode everything in one chain. Certainly we need this before/for the upstreaming.

The issue I described is orthogonal from the details of how we represent the chain of currently active continuations. In any design, there will always be a field in the VMContext that points to the currently active continuation. That's the field accessed in step 1 and 4 above.
In that sense, the design in this PR works without requiring any new channels (in the widest sense) to get information from the suspend to the resume site.

I am not convinced. To me the current design/implementation has a certain spaghetti feeling to it. But it can be refactored later. Clarification: I am not talking about this patch, but our entire implementation. It is too convoluted in my opinion. It is an obvious consequence of how we have organically evolved the implementation too. I am positive a refactoring will reveal a much simpler design/implementation.

dhil · 2024-11-19T09:59:46Z

All this list business, reminds me of some tricks we should try. One trick used to speed up linked list searching is the "move to front" trick, where once an element has been found it is moved to the front of the list, such that next time you look for it, it may be one of the first elements. Obviously, we don't want exactly that, but a variant of it may speed up continuation materialisation even further, naming, what we could have on each continuation reference is a pointer to the most recent suspension handler (like caching the latest handler).

dhil · 2024-11-20T10:42:40Z

I guess we can merge this as-is.

Co-authored-by: Daniel Hillerström <[email protected]>

frank-emrich · 2024-11-20T22:00:16Z

I guess we can merge this as-is.

I'm fine with that, but would you still prefer renaming the last_ancestor field in VMContRef? I can live with tail, but open for other suggestions.

frank-emrich added 2 commits November 18, 2024 16:46

fast effect forwarding

d3f61e8

make doc tests happy

c764798

frank-emrich requested a review from dhil November 18, 2024 17:52

dhil reviewed Nov 19, 2024

View reviewed changes

fix typo

ea966a6

Co-authored-by: Daniel Hillerström <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fast effect forwarding #256

Fast effect forwarding #256

frank-emrich commented Nov 18, 2024

dhil Nov 19, 2024

frank-emrich Nov 19, 2024

dhil Nov 19, 2024

frank-emrich Nov 19, 2024

dhil Nov 20, 2024

dhil Nov 19, 2024

frank-emrich Nov 19, 2024

dhil Nov 19, 2024

dhil Nov 19, 2024

dhil Nov 19, 2024

frank-emrich Nov 19, 2024

dhil Nov 19, 2024

frank-emrich Nov 19, 2024

dhil Nov 19, 2024

frank-emrich Nov 20, 2024

dhil Nov 20, 2024 •

edited

Loading

dhil commented Nov 19, 2024

dhil commented Nov 20, 2024

frank-emrich commented Nov 20, 2024

Fast effect forwarding #256

Are you sure you want to change the base?

Fast effect forwarding #256

Conversation

frank-emrich commented Nov 18, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhil Nov 20, 2024 • edited Loading

Choose a reason for hiding this comment

dhil commented Nov 19, 2024

dhil commented Nov 20, 2024

frank-emrich commented Nov 20, 2024

dhil Nov 20, 2024 •

edited

Loading