Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: For single-threaded execution, there is no need for Send/Sync bounds #934

Merged
merged 1 commit into from
Nov 18, 2023

Conversation

schreter
Copy link
Collaborator

@schreter schreter commented Nov 15, 2023

Some preliminary implementation with OptionalSend/OptionalSync was done previously, it was not complete, though.

Currently, Send/Sync bounds are required for AppData, RaftEntry, async runtime wrappers and other types, but they are only relevant for multi-threaded access. If openraft is configured to use single-threaded runtime, then they don't have to be Send/Sync.

Consequently replace Send/Sync bounds with OptionalSend/OptionalSync bounds to clean it up.

This change also fixes timeout.rs, which seems to be unused so far, but didn't properly use AsyncRuntime abstraction.

Two points for the future:

  • We should add at least a compile test where singlethreaded feature is set to test it.
  • Later, even Arc could be replaced with a single-threaded counterpart to prevent the need for atomic refcounting.

This change is Reviewable

@schreter schreter requested a review from drmingdrmer November 15, 2023 12:38
Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)


openraft/src/core/raft_msg/mod.rs line 97 at r1 (raw file):

        req: Box<
            dyn FnOnce(&RaftState<C::NodeId, C::Node, <C::AsyncRuntime as AsyncRuntime>::Instant>, &mut LS, &mut N)
                + Send

Note: It's not possible to use OptionalSend here, since only auto-traits are allowed. Therefore, two instantiations with different #[cfg].


openraft/src/network/backoff.rs line 9 at r1 (raw file):

pub struct Backoff {
    #[cfg(not(feature = "singlethreaded"))]
    inner: Box<dyn Iterator<Item = Duration> + Send + 'static>,

Same here


openraft/src/timer/timeout.rs line 70 at r1 (raw file):

        };

        RT::spawn(inner.sleep_loop(rx, callback).instrument(trace_span!("timeout-loop").or_current()));

We can't use tokio::spawn and sleep functions, since they are dependent on Tokio runtime and won't work with a different async runtime. So I added AsyncRuntime generic argument here.

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)


openraft/src/timer/timeout_test.rs line 21 at r2 (raw file):

fn test_timeout() -> anyhow::Result<()> {
    let rt = tokio::runtime::Builder::new_current_thread().enable_time().build().unwrap();
    tokio::task::LocalSet::new().block_on(&rt, test_timeout_inner())

There indeed is a check in CI/CD which builds the single-threaded runtime. However, only a single test actually ran using async code - this one. So I adjusted it locally for now, but later it would be good to handle this with a proc-macro or similar to have one entry point type testing both.

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 16 of 19 files at r1, all commit messages.
Reviewable status: 16 of 19 files reviewed, 2 unresolved discussions (waiting on @schreter)


openraft/src/async_runtime.rs line 19 at r3 (raw file):

///
/// The default asynchronous runtime is `tokio`.
pub trait AsyncRuntime: Debug + Default + OptionalSend + OptionalSync + 'static {

Do we need an AsyncRuntime to be Send and Sync?

It looks like Openraft does not need to operate a runtime, e.g. sending a runtime to another thread.


openraft/src/raft/mod.rs line 174 at r3 (raw file):

    C::R: Send,
{
}

In my opinion, simply having these types implement Send is not sufficient for Raft to implement Send.

For example, for C::NodeId,
because Raft has a field of inner: Arc<RaftInner>, to make Raft Send, RaftInner must be Sync.
Therefore RaftInner.id: C::NodeId must be Sync:
https://github.com/datafuselabs/openraft/blob/4c488a61da8fee614294dea107fdbf0faf15cd28/openraft/src/raft/raft_inner.rs#L24-L30

Code quote:

// SAFETY: Even for a single-threaded Raft, the API object is MT-capable.
//
// The API object just sends the requests to the Raft loop over a channel. If all the relevant
// types in the type config are `Send`, then it's safe to send the request across threads over
// the channel.
//
// Notably, the state machine, log storage and network factory DO NOT have to be `Send`, those
// are only used within Raft task(s) on a single thread.
unsafe impl<C, N, LS, SM> Send for Raft<C, N, LS, SM>
where
    C: RaftTypeConfig,
    N: RaftNetworkFactory<C>,
    LS: RaftLogStorage<C>,
    SM: RaftStateMachine<C>,
    C::D: Send,
    C::Entry: Send,
    C::Node: Send,
    C::NodeId: Send,
    C::R: Send,
{
}

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, 2 unresolved discussions (waiting on @drmingdrmer)


openraft/src/async_runtime.rs line 19 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Do we need an AsyncRuntime to be Send and Sync?

It looks like Openraft does not need to operate a runtime, e.g. sending a runtime to another thread.

Not really, no. But, OTOH, the Send/Sync for a struct is only derived if all generics have Send/Sync. So this makes it probably simpler, otherwise we'd have to explicitly implement Send/Sync on many other objects.


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

In my opinion, simply having these types implement Send is not sufficient for Raft to implement Send.

For example, for C::NodeId,
because Raft has a field of inner: Arc<RaftInner>, to make Raft Send, RaftInner must be Sync.
Therefore RaftInner.id: C::NodeId must be Sync:
https://github.com/datafuselabs/openraft/blob/4c488a61da8fee614294dea107fdbf0faf15cd28/openraft/src/raft/raft_inner.rs#L24-L30

Though strictly speaking you are right, the NodeId (and also other types mentioned here) are never shared by reference in non-Raft tasks. They are only sent over the channel.

Nevertheless, I can add additional bounds for Node and NodeId requiring them to be Sync as well to be on the safe side.

…` bounds.

Some preliminary implementation with `OptionalSend`/`OptionalSync` was done
previously, it was not complete, though.

Currently, `Send`/`Sync` bounds are required for `AppData`, `RaftEntry`,
async runtime wrappers and other types, but they are only relevant for
multi-threaded access. If `openraft` is configured to use single-threaded
runtime, then they don't have to be `Send`/`Sync`.

Consequently replace `Send`/`Sync` bounds with `OptionalSend`/`OptionalSync`
bounds to clean it up.

Even if there are no `Send`/`Sync` bounds required, the `Raft` object can
still be `Send`/`Sync` capable. It is only an API object sending requests
to the Raft main loop over a channel. As long as the involved data types
are `Send` (and some of them, which are used in `RaftInner`, also `Sync`),
we can declare `Raft` as such `Send`/`Sync`.

This change also fixes `timeout.rs`, which seems to be unused so far, but
didn't properly use `AsyncRuntime` abstraction.

Two points for the future:
- We should add a generic test invocation to invoke tests on `LocalSet`
  for tests with `singlethreaded` feature. Currently, there is a single
  test only, in `timeout_test.rs`.
- Later, even `Arc` could be replaced with a single-threaded counterpart
  to prevent the need for atomic refcounting.
Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, 2 unresolved discussions (waiting on @drmingdrmer)


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, schreter wrote…

Though strictly speaking you are right, the NodeId (and also other types mentioned here) are never shared by reference in non-Raft tasks. They are only sent over the channel.

Nevertheless, I can add additional bounds for Node and NodeId requiring them to be Sync as well to be on the safe side.

Done

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @schreter)


openraft/src/async_runtime.rs line 19 at r3 (raw file):

Previously, schreter wrote…

Not really, no. But, OTOH, the Send/Sync for a struct is only derived if all generics have Send/Sync. So this makes it probably simpler, otherwise we'd have to explicitly implement Send/Sync on many other objects.

Make sense


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, schreter wrote…

Done

What about make RaftTypeConfig always Send + Sync?
This way RaftInner will automatically implement Send +Sync, and then Raft will automatically implement Send + Sync.

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @drmingdrmer)


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

What about make RaftTypeConfig always Send + Sync?
This way RaftInner will automatically implement Send +Sync, and then Raft will automatically implement Send + Sync.

That's not correct either.

If either C::D or C::R is !Send, then also Raft is !Send, since it would be an error to send the request or response across threads (say, what if C::D contains a Rc?). Similar for other external types here.

However, RaftNetworkFactory, RaftLogStorage and RaftStateMachine are used only within the single-threaded scheduler running the !Send future of various Raft tasks. So they are irrelevant for Raft being Send or not.

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @schreter)


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, schreter wrote…

That's not correct either.

If either C::D or C::R is !Send, then also Raft is !Send, since it would be an error to send the request or response across threads (say, what if C::D contains a Rc?). Similar for other external types here.

However, RaftNetworkFactory, RaftLogStorage and RaftStateMachine are used only within the single-threaded scheduler running the !Send future of various Raft tasks. So they are irrelevant for Raft being Send or not.

You are right. What I mean is that if RaftTypeConfig is always Send + Sync, then whether RaftInner is Sync+Send will be totally determined by whether C::D, C::R are Send + Sync. This way the manual implementation of Send for Raft can be removed. Is this correct?

At the application level, the Send+Sync status of Raft is irrelevant toRaftNetworkFactory, given our understanding that it is not used in a multithreaded context. However, I hope Raft to safely implement Send+Sync safely at the language level. Essentially, I hope that future contributors to Openraft can write correct codes without needing to be aware of these underlying assumptions.

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @drmingdrmer)


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

You are right. What I mean is that if RaftTypeConfig is always Send + Sync, then whether RaftInner is Sync+Send will be totally determined by whether C::D, C::R are Send + Sync. This way the manual implementation of Send for Raft can be removed. Is this correct?

At the application level, the Send+Sync status of Raft is irrelevant toRaftNetworkFactory, given our understanding that it is not used in a multithreaded context. However, I hope Raft to safely implement Send+Sync safely at the language level. Essentially, I hope that future contributors to Openraft can write correct codes without needing to be aware of these underlying assumptions.

The problem is, if RaftStorage, RaftNetworkFactory or any other generic type of Raft is !Send (which it isn't in our case), then Raft itself will be !Send transitively. I.e., you have to explicitly implement Send/Sync in order to have them on Raft, if some generics, although only used internally, are !Send. So it can't be automatically derived, unless we'd remove N, LS and SM generics from Raft. But then again, RaftTypeConfig would be Send, even if one of its associated types, such as C::R is !Send, which is again wrong.

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @drmingdrmer)


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, schreter wrote…

The problem is, if RaftStorage, RaftNetworkFactory or any other generic type of Raft is !Send (which it isn't in our case), then Raft itself will be !Send transitively. I.e., you have to explicitly implement Send/Sync in order to have them on Raft, if some generics, although only used internally, are !Send. So it can't be automatically derived, unless we'd remove N, LS and SM generics from Raft. But then again, RaftTypeConfig would be Send, even if one of its associated types, such as C::R is !Send, which is again wrong.

OTOH, having Raft always forced to be Send/Sync would still probably work correctly from the language POV, the async futures generated are !Send and the channel would complain about sending !Send values - I'll have to test it a bit more thoroughly with R: !Send (currently, we use Send request/response, since those are generated in a different thread(s) than where the Raft state machine runs).

Independent of that, I think the current declaration here is the best we can get right now.

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, all discussions resolved


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, schreter wrote…

OTOH, having Raft always forced to be Send/Sync would still probably work correctly from the language POV, the async futures generated are !Send and the channel would complain about sending !Send values - I'll have to test it a bit more thoroughly with R: !Send (currently, we use Send request/response, since those are generated in a different thread(s) than where the Raft state machine runs).

Independent of that, I think the current declaration here is the best we can get right now.

Let's address the first issue: we need to ensure that RaftTypeConfig is always Send+Sync. Does this approach seem suitable to you? As a mere type container, it's the associated types, rather than the container itself, that influence the Send+Sync capability of other types.

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Let's address the first issue: we need to ensure that RaftTypeConfig is always Send+Sync. Does this approach seem suitable to you? As a mere type container, it's the associated types, rather than the container itself, that influence the Send+Sync capability of other types.

Well, we can make RaftTypeConfig always Send+Sync, but that won't help me with the Raft still not being Send+Sync, even though request and response are. In fact, I just checked on our !Send Raft with Send request/response with this change applied: the RaftTypeConfig is Send (only Raft isn't).

Of course, I can put Raft in a newtype and force it on my side, but that's suboptimal.

But, do you see an issue with the current declaration? I think it's now correct.

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, all discussions resolved


openraft/src/raft/mod.rs line 174 at r3 (raw file):

do you see an issue with the current declaration? I think it's now correct.

Yes, I see that. And what I'm trying to do is to avoid using unsafe impl Send. In this scenario, adding another field to Raft(or RaftInner) may break the assumption.

Thus the second step is to remove the dependency of N, LS, SM from Raft.
Raft has type parameters N, LS because of the dependency chain: Raft<C, N, LS, SM> -> RaftInner<C, N, LS> -> RaftMsg<C, N, LS>. In RaftMsg, N,LS is only used in variant ExternalRequest. These two types can be removed, and be replaced with dyn RaftLogStorage and dyn RaftNetworkFactory.

We can do the similar to type parameter SM and finally Raft<...> becomes Raft<C>. And Raft is Send if all of its fields are Send.

Code snippet:

    ExternalRequest {
        #[allow(clippy::type_complexity)]
        req: Box<
            dyn FnOnce(&RaftState<C::NodeId, C::Node, <C::AsyncRuntime as AsyncRuntime>::Instant>, &mut LS, &mut N)
                + Send
                + 'static,
        >,
    },

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)


openraft/src/raft/mod.rs line 174 at r3 (raw file):

And what I'm trying to do is to avoid using unsafe impl Send.

Ah, OK, now I understand your motivation :-).

In this scenario, adding another field to Raft(or RaftInner) may break the assumption.

That is correct, yes.

Thus the second step is to remove the dependency of N, LS, SM from Raft.

Maybe this can be done differently, put N, LS and SM into another config trait, say StorageConfig, so they wouldn't appear as generics on the API directly. With that, the Raft type would be Send w/o any need for unsafe. Also, it makes it easier to extend it in the future.

These two types can be removed, and be replaced with dyn RaftLogStorage and dyn RaftNetworkFactory.

Please no dyn! That breaks zero-cost abstraction. You won't have more than one implementation in a real application which you need to dynamically change. Thus, it would just add unnecessary dynamic dispatch costs and cement the memory-allocating calls via async trait methods (though, you are still free to use dynamic types explicitly).

Rust community is actively working on async trait support with zero-cost abstraction. The work is progressing nicely and we actually already use them in our project (stable Rust with a trick to enable some of the features we need). Much better/faster than #[async_trait] dispatching.

Thus, the mid-term solution is to change generics to associated types on a second config trait, which will solve the unsafe issue.

Do we want to merge this as-is (with the manually-implemented Send/Sync bounds) or should I remove the unsafe impl for now? I don't see the big problem with the implementation, since it's anyway only relevant for single-threaded Raft and we are likely the only user for now.

The refactoring would be probably helpful in short term already. Most likely, only Raft and RaftInner need to take this new type trait with associated types, the remaining code can be used as-is by configuring their generics with associated types. The problem is API breakage, since all users will have to adjust.

What do you think?

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, all discussions resolved


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Previously, schreter wrote…

And what I'm trying to do is to avoid using unsafe impl Send.

Ah, OK, now I understand your motivation :-).

In this scenario, adding another field to Raft(or RaftInner) may break the assumption.

That is correct, yes.

Thus the second step is to remove the dependency of N, LS, SM from Raft.

Maybe this can be done differently, put N, LS and SM into another config trait, say StorageConfig, so they wouldn't appear as generics on the API directly. With that, the Raft type would be Send w/o any need for unsafe. Also, it makes it easier to extend it in the future.

These two types can be removed, and be replaced with dyn RaftLogStorage and dyn RaftNetworkFactory.

Please no dyn! That breaks zero-cost abstraction. You won't have more than one implementation in a real application which you need to dynamically change. Thus, it would just add unnecessary dynamic dispatch costs and cement the memory-allocating calls via async trait methods (though, you are still free to use dynamic types explicitly).

Rust community is actively working on async trait support with zero-cost abstraction. The work is progressing nicely and we actually already use them in our project (stable Rust with a trick to enable some of the features we need). Much better/faster than #[async_trait] dispatching.

Thus, the mid-term solution is to change generics to associated types on a second config trait, which will solve the unsafe issue.

Do we want to merge this as-is (with the manually-implemented Send/Sync bounds) or should I remove the unsafe impl for now? I don't see the big problem with the implementation, since it's anyway only relevant for single-threaded Raft and we are likely the only user for now.

The refactoring would be probably helpful in short term already. Most likely, only Raft and RaftInner need to take this new type trait with associated types, the remaining code can be used as-is by configuring their generics with associated types. The problem is API breakage, since all users will have to adjust.

What do you think?

Your suggestions look good. This PR can be merged since there is no breaking change. Refactoring Raft/RaftInner can be done in the another PR.

About dyn:
Using dyn does not have to mean dynamic dispatch. Since there is only one type in an application for RaftNetworkFactory, user can just downcast it to a concrete type with: downcast_mut.

What say you?

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)


openraft/src/raft/mod.rs line 174 at r3 (raw file):

Using dyn does not have to mean dynamic dispatch. Since there is only one type in an application for RaftNetworkFactory, user can just downcast it to a concrete type with: downcast_mut.

That's not quite true. If you want to downcast manually somewhere, then yes, you can downcast. But we are talking about innards of the engine, where the engine itself is held using Arc<dyn T>, i.e., about openraft coding, where the user is not in control. The engine has no idea about concrete type if it's hidden behind a dyn T. So posting an update to Raft will always require dynamic dispatch.

Dynamic dispatch sending data to the engine as such would be most likely unproblematic, since the amount of work to be done elsewhere in the engine is relatively high (and we have also elsewhere dynamic dispatch).

However, I see a major problem elsewhere - currently, each call to async trait method requires dynamic dispatch and a memory allocation/deallocation. This allocation/deallocation pair is quite costly, even compared to the amount of work done (and, we have even multiple nested calls).

The language-level async trait would not do a direct call, but a memory allocation as well (in dyn T case), since it's impossible to know the future size (and, BTW, this is so far unimplemented). With regular dispatch (concrete type), the compiler knows the size of the future and generates a static dispatch without memory allocation (i.e., it generates a composite future for the caller). This is vastly more efficient and can be also optimized across async methods.

Another aspect is the out-of-memory situation handling. We are building production software with Rust and must ensure that there is a well-defined response to all errors. Memory-allocating dispatch is a source of endless potential malfunctions, which we need to test for and ensure they are handled properly (we are certifying that the entire call chain is properly exception-safe and memory-leak-safe; this is quite a challenge and also one of the reasons why we wrote our own async runtime).

BTW, I hope that we'll be allowed to share the crates for malfunction testing, diagnosis, etc. with the community. It's just very hard to get it started at a big company (only this week we finally got "external stores legal review" OK-ed permitting us to publish crates to crates.io!). At least we are allowed to contribute upstream to existing projects.

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 17 of 19 files at r1, 1 of 1 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @schreter)

Copy link
Collaborator Author

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW, I don't know why Reviewable is waiting on me, going to try to merge, hopefully it'll work 🙂).

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @schreter)

@schreter schreter added this pull request to the merge queue Nov 18, 2023
Merged via the queue into databendlabs:main with commit 4ad89fe Nov 18, 2023
24 checks passed
@schreter schreter deleted the no_sync branch November 18, 2023 15:43
@drmingdrmer drmingdrmer mentioned this pull request Mar 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants