-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: For single-threaded execution, there is no need for Send
/Sync
bounds
#934
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)
openraft/src/core/raft_msg/mod.rs
line 97 at r1 (raw file):
req: Box< dyn FnOnce(&RaftState<C::NodeId, C::Node, <C::AsyncRuntime as AsyncRuntime>::Instant>, &mut LS, &mut N) + Send
Note: It's not possible to use OptionalSend
here, since only auto-traits are allowed. Therefore, two instantiations with different #[cfg]
.
openraft/src/network/backoff.rs
line 9 at r1 (raw file):
pub struct Backoff { #[cfg(not(feature = "singlethreaded"))] inner: Box<dyn Iterator<Item = Duration> + Send + 'static>,
Same here
openraft/src/timer/timeout.rs
line 70 at r1 (raw file):
}; RT::spawn(inner.sleep_loop(rx, callback).instrument(trace_span!("timeout-loop").or_current()));
We can't use tokio::spawn
and sleep functions, since they are dependent on Tokio runtime and won't work with a different async runtime. So I added AsyncRuntime
generic argument here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)
openraft/src/timer/timeout_test.rs
line 21 at r2 (raw file):
fn test_timeout() -> anyhow::Result<()> { let rt = tokio::runtime::Builder::new_current_thread().enable_time().build().unwrap(); tokio::task::LocalSet::new().block_on(&rt, test_timeout_inner())
There indeed is a check in CI/CD which builds the single-threaded runtime. However, only a single test actually ran using async code - this one. So I adjusted it locally for now, but later it would be good to handle this with a proc-macro or similar to have one entry point type testing both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 16 of 19 files at r1, all commit messages.
Reviewable status: 16 of 19 files reviewed, 2 unresolved discussions (waiting on @schreter)
openraft/src/async_runtime.rs
line 19 at r3 (raw file):
/// /// The default asynchronous runtime is `tokio`. pub trait AsyncRuntime: Debug + Default + OptionalSend + OptionalSync + 'static {
Do we need an AsyncRuntime
to be Send
and Sync
?
It looks like Openraft does not need to operate a runtime, e.g. sending a runtime to another thread.
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
C::R: Send, { }
In my opinion, simply having these types implement Send
is not sufficient for Raft
to implement Send
.
For example, for C::NodeId
,
because Raft
has a field of inner: Arc<RaftInner>
, to make Raft
Send
, RaftInner
must be Sync
.
Therefore RaftInner.id: C::NodeId
must be Sync
:
https://github.com/datafuselabs/openraft/blob/4c488a61da8fee614294dea107fdbf0faf15cd28/openraft/src/raft/raft_inner.rs#L24-L30
Code quote:
// SAFETY: Even for a single-threaded Raft, the API object is MT-capable.
//
// The API object just sends the requests to the Raft loop over a channel. If all the relevant
// types in the type config are `Send`, then it's safe to send the request across threads over
// the channel.
//
// Notably, the state machine, log storage and network factory DO NOT have to be `Send`, those
// are only used within Raft task(s) on a single thread.
unsafe impl<C, N, LS, SM> Send for Raft<C, N, LS, SM>
where
C: RaftTypeConfig,
N: RaftNetworkFactory<C>,
LS: RaftLogStorage<C>,
SM: RaftStateMachine<C>,
C::D: Send,
C::Entry: Send,
C::Node: Send,
C::NodeId: Send,
C::R: Send,
{
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, 2 unresolved discussions (waiting on @drmingdrmer)
openraft/src/async_runtime.rs
line 19 at r3 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
Do we need an
AsyncRuntime
to beSend
andSync
?It looks like Openraft does not need to operate a runtime, e.g. sending a runtime to another thread.
Not really, no. But, OTOH, the Send
/Sync
for a struct is only derived if all generics have Send
/Sync
. So this makes it probably simpler, otherwise we'd have to explicitly implement Send
/Sync
on many other objects.
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
In my opinion, simply having these types implement
Send
is not sufficient forRaft
to implementSend
.For example, for
C::NodeId
,
becauseRaft
has a field ofinner: Arc<RaftInner>
, to makeRaft
Send
,RaftInner
must beSync
.
ThereforeRaftInner.id: C::NodeId
must beSync
:
https://github.com/datafuselabs/openraft/blob/4c488a61da8fee614294dea107fdbf0faf15cd28/openraft/src/raft/raft_inner.rs#L24-L30
Though strictly speaking you are right, the NodeId
(and also other types mentioned here) are never shared by reference in non-Raft tasks. They are only sent over the channel.
Nevertheless, I can add additional bounds for Node
and NodeId
requiring them to be Sync
as well to be on the safe side.
…` bounds. Some preliminary implementation with `OptionalSend`/`OptionalSync` was done previously, it was not complete, though. Currently, `Send`/`Sync` bounds are required for `AppData`, `RaftEntry`, async runtime wrappers and other types, but they are only relevant for multi-threaded access. If `openraft` is configured to use single-threaded runtime, then they don't have to be `Send`/`Sync`. Consequently replace `Send`/`Sync` bounds with `OptionalSend`/`OptionalSync` bounds to clean it up. Even if there are no `Send`/`Sync` bounds required, the `Raft` object can still be `Send`/`Sync` capable. It is only an API object sending requests to the Raft main loop over a channel. As long as the involved data types are `Send` (and some of them, which are used in `RaftInner`, also `Sync`), we can declare `Raft` as such `Send`/`Sync`. This change also fixes `timeout.rs`, which seems to be unused so far, but didn't properly use `AsyncRuntime` abstraction. Two points for the future: - We should add a generic test invocation to invoke tests on `LocalSet` for tests with `singlethreaded` feature. Currently, there is a single test only, in `timeout_test.rs`. - Later, even `Arc` could be replaced with a single-threaded counterpart to prevent the need for atomic refcounting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, 2 unresolved discussions (waiting on @drmingdrmer)
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, schreter wrote…
Though strictly speaking you are right, the
NodeId
(and also other types mentioned here) are never shared by reference in non-Raft tasks. They are only sent over the channel.Nevertheless, I can add additional bounds for
Node
andNodeId
requiring them to beSync
as well to be on the safe side.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @schreter)
openraft/src/async_runtime.rs
line 19 at r3 (raw file):
Previously, schreter wrote…
Not really, no. But, OTOH, the
Send
/Sync
for a struct is only derived if all generics haveSend
/Sync
. So this makes it probably simpler, otherwise we'd have to explicitly implementSend
/Sync
on many other objects.
Make sense
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, schreter wrote…
Done
What about make RaftTypeConfig
always Send + Sync
?
This way RaftInner
will automatically implement Send +Sync
, and then Raft
will automatically implement Send + Sync
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @drmingdrmer)
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
What about make
RaftTypeConfig
alwaysSend + Sync
?
This wayRaftInner
will automatically implementSend +Sync
, and thenRaft
will automatically implementSend + Sync
.
That's not correct either.
If either C::D
or C::R
is !Send
, then also Raft
is !Send
, since it would be an error to send the request or response across threads (say, what if C::D
contains a Rc
?). Similar for other external types here.
However, RaftNetworkFactory
, RaftLogStorage
and RaftStateMachine
are used only within the single-threaded scheduler running the !Send
future of various Raft tasks. So they are irrelevant for Raft
being Send
or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @schreter)
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, schreter wrote…
That's not correct either.
If either
C::D
orC::R
is!Send
, then alsoRaft
is!Send
, since it would be an error to send the request or response across threads (say, what ifC::D
contains aRc
?). Similar for other external types here.However,
RaftNetworkFactory
,RaftLogStorage
andRaftStateMachine
are used only within the single-threaded scheduler running the!Send
future of various Raft tasks. So they are irrelevant forRaft
beingSend
or not.
You are right. What I mean is that if RaftTypeConfig
is always Send + Sync
, then whether RaftInner
is Sync+Send
will be totally determined by whether C::D
, C::R
are Send + Sync
. This way the manual implementation of Send for Raft
can be removed. Is this correct?
At the application level, the Send+Sync
status of Raft
is irrelevant toRaftNetworkFactory
, given our understanding that it is not used in a multithreaded context. However, I hope Raft
to safely implement Send+Sync
safely at the language level. Essentially, I hope that future contributors to Openraft can write correct codes without needing to be aware of these underlying assumptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @drmingdrmer)
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
You are right. What I mean is that if
RaftTypeConfig
is alwaysSend + Sync
, then whetherRaftInner
isSync+Send
will be totally determined by whetherC::D
,C::R
areSend + Sync
. This way the manual implementation ofSend for Raft
can be removed. Is this correct?At the application level, the
Send+Sync
status ofRaft
is irrelevant toRaftNetworkFactory
, given our understanding that it is not used in a multithreaded context. However, I hopeRaft
to safely implementSend+Sync
safely at the language level. Essentially, I hope that future contributors to Openraft can write correct codes without needing to be aware of these underlying assumptions.
The problem is, if RaftStorage
, RaftNetworkFactory
or any other generic type of Raft
is !Send
(which it isn't in our case), then Raft
itself will be !Send
transitively. I.e., you have to explicitly implement Send
/Sync
in order to have them on Raft
, if some generics, although only used internally, are !Send
. So it can't be automatically derived, unless we'd remove N
, LS
and SM
generics from Raft
. But then again, RaftTypeConfig
would be Send
, even if one of its associated types, such as C::R
is !Send
, which is again wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, 1 unresolved discussion (waiting on @drmingdrmer)
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, schreter wrote…
The problem is, if
RaftStorage
,RaftNetworkFactory
or any other generic type ofRaft
is!Send
(which it isn't in our case), thenRaft
itself will be!Send
transitively. I.e., you have to explicitly implementSend
/Sync
in order to have them onRaft
, if some generics, although only used internally, are!Send
. So it can't be automatically derived, unless we'd removeN
,LS
andSM
generics fromRaft
. But then again,RaftTypeConfig
would beSend
, even if one of its associated types, such asC::R
is!Send
, which is again wrong.
OTOH, having Raft
always forced to be Send
/Sync
would still probably work correctly from the language POV, the async futures generated are !Send
and the channel would complain about sending !Send
values - I'll have to test it a bit more thoroughly with R: !Send
(currently, we use Send
request/response, since those are generated in a different thread(s) than where the Raft state machine runs).
Independent of that, I think the current declaration here is the best we can get right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, all discussions resolved
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, schreter wrote…
OTOH, having
Raft
always forced to beSend
/Sync
would still probably work correctly from the language POV, the async futures generated are!Send
and the channel would complain about sending!Send
values - I'll have to test it a bit more thoroughly withR: !Send
(currently, we useSend
request/response, since those are generated in a different thread(s) than where the Raft state machine runs).Independent of that, I think the current declaration here is the best we can get right now.
Let's address the first issue: we need to ensure that RaftTypeConfig
is always Send+Sync
. Does this approach seem suitable to you? As a mere type container, it's the associated types, rather than the container itself, that influence the Send+Sync
capability of other types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
Let's address the first issue: we need to ensure that
RaftTypeConfig
is alwaysSend+Sync
. Does this approach seem suitable to you? As a mere type container, it's the associated types, rather than the container itself, that influence theSend+Sync
capability of other types.
Well, we can make RaftTypeConfig
always Send+Sync
, but that won't help me with the Raft
still not being Send+Sync
, even though request and response are. In fact, I just checked on our !Send
Raft with Send
request/response with this change applied: the RaftTypeConfig is Send
(only Raft
isn't).
Of course, I can put Raft
in a newtype and force it on my side, but that's suboptimal.
But, do you see an issue with the current declaration? I think it's now correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, all discussions resolved
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
do you see an issue with the current declaration? I think it's now correct.
Yes, I see that. And what I'm trying to do is to avoid using unsafe impl Send
. In this scenario, adding another field to Raft
(or RaftInner
) may break the assumption.
Thus the second step is to remove the dependency of N, LS, SM
from Raft
.
Raft
has type parameters N, LS
because of the dependency chain: Raft<C, N, LS, SM> -> RaftInner<C, N, LS> -> RaftMsg<C, N, LS>
. In RaftMsg
, N,LS
is only used in variant ExternalRequest
. These two types can be removed, and be replaced with dyn RaftLogStorage
and dyn RaftNetworkFactory
.
We can do the similar to type parameter SM
and finally Raft<...>
becomes Raft<C>
. And Raft
is Send
if all of its fields are Send
.
Code snippet:
ExternalRequest {
#[allow(clippy::type_complexity)]
req: Box<
dyn FnOnce(&RaftState<C::NodeId, C::Node, <C::AsyncRuntime as AsyncRuntime>::Instant>, &mut LS, &mut N)
+ Send
+ 'static,
>,
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
And what I'm trying to do is to avoid using
unsafe impl Send
.
Ah, OK, now I understand your motivation :-).
In this scenario, adding another field to
Raft
(orRaftInner
) may break the assumption.
That is correct, yes.
Thus the second step is to remove the dependency of
N, LS, SM
fromRaft
.
Maybe this can be done differently, put N
, LS
and SM
into another config trait, say StorageConfig
, so they wouldn't appear as generics on the API directly. With that, the Raft
type would be Send
w/o any need for unsafe. Also, it makes it easier to extend it in the future.
These two types can be removed, and be replaced with
dyn RaftLogStorage
anddyn RaftNetworkFactory
.
Please no dyn
! That breaks zero-cost abstraction. You won't have more than one implementation in a real application which you need to dynamically change. Thus, it would just add unnecessary dynamic dispatch costs and cement the memory-allocating calls via async trait methods (though, you are still free to use dynamic types explicitly).
Rust community is actively working on async trait support with zero-cost abstraction. The work is progressing nicely and we actually already use them in our project (stable Rust with a trick to enable some of the features we need). Much better/faster than #[async_trait]
dispatching.
Thus, the mid-term solution is to change generics to associated types on a second config trait, which will solve the unsafe issue.
Do we want to merge this as-is (with the manually-implemented Send
/Sync
bounds) or should I remove the unsafe impl for now? I don't see the big problem with the implementation, since it's anyway only relevant for single-threaded Raft
and we are likely the only user for now.
The refactoring would be probably helpful in short term already. Most likely, only Raft
and RaftInner
need to take this new type trait with associated types, the remaining code can be used as-is by configuring their generics with associated types. The problem is API breakage, since all users will have to adjust.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, all discussions resolved
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Previously, schreter wrote…
And what I'm trying to do is to avoid using
unsafe impl Send
.Ah, OK, now I understand your motivation :-).
In this scenario, adding another field to
Raft
(orRaftInner
) may break the assumption.That is correct, yes.
Thus the second step is to remove the dependency of
N, LS, SM
fromRaft
.Maybe this can be done differently, put
N
,LS
andSM
into another config trait, sayStorageConfig
, so they wouldn't appear as generics on the API directly. With that, theRaft
type would beSend
w/o any need for unsafe. Also, it makes it easier to extend it in the future.These two types can be removed, and be replaced with
dyn RaftLogStorage
anddyn RaftNetworkFactory
.Please no
dyn
! That breaks zero-cost abstraction. You won't have more than one implementation in a real application which you need to dynamically change. Thus, it would just add unnecessary dynamic dispatch costs and cement the memory-allocating calls via async trait methods (though, you are still free to use dynamic types explicitly).Rust community is actively working on async trait support with zero-cost abstraction. The work is progressing nicely and we actually already use them in our project (stable Rust with a trick to enable some of the features we need). Much better/faster than
#[async_trait]
dispatching.Thus, the mid-term solution is to change generics to associated types on a second config trait, which will solve the unsafe issue.
Do we want to merge this as-is (with the manually-implemented
Send
/Sync
bounds) or should I remove the unsafe impl for now? I don't see the big problem with the implementation, since it's anyway only relevant for single-threadedRaft
and we are likely the only user for now.The refactoring would be probably helpful in short term already. Most likely, only
Raft
andRaftInner
need to take this new type trait with associated types, the remaining code can be used as-is by configuring their generics with associated types. The problem is API breakage, since all users will have to adjust.What do you think?
Your suggestions look good. This PR can be merged since there is no breaking change. Refactoring Raft/RaftInner
can be done in the another PR.
About dyn
:
Using dyn
does not have to mean dynamic dispatch. Since there is only one type in an application for RaftNetworkFactory
, user can just downcast it to a concrete type with: downcast_mut
.
What say you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, all discussions resolved (waiting on @drmingdrmer)
openraft/src/raft/mod.rs
line 174 at r3 (raw file):
Using
dyn
does not have to mean dynamic dispatch. Since there is only one type in an application forRaftNetworkFactory
, user can just downcast it to a concrete type with:downcast_mut
.
That's not quite true. If you want to downcast manually somewhere, then yes, you can downcast. But we are talking about innards of the engine, where the engine itself is held using Arc<dyn T>
, i.e., about openraft
coding, where the user is not in control. The engine has no idea about concrete type if it's hidden behind a dyn T
. So posting an update to Raft will always require dynamic dispatch.
Dynamic dispatch sending data to the engine as such would be most likely unproblematic, since the amount of work to be done elsewhere in the engine is relatively high (and we have also elsewhere dynamic dispatch).
However, I see a major problem elsewhere - currently, each call to async trait method requires dynamic dispatch and a memory allocation/deallocation. This allocation/deallocation pair is quite costly, even compared to the amount of work done (and, we have even multiple nested calls).
The language-level async trait would not do a direct call, but a memory allocation as well (in dyn T
case), since it's impossible to know the future size (and, BTW, this is so far unimplemented). With regular dispatch (concrete type), the compiler knows the size of the future and generates a static dispatch without memory allocation (i.e., it generates a composite future for the caller). This is vastly more efficient and can be also optimized across async methods.
Another aspect is the out-of-memory situation handling. We are building production software with Rust and must ensure that there is a well-defined response to all errors. Memory-allocating dispatch is a source of endless potential malfunctions, which we need to test for and ensure they are handled properly (we are certifying that the entire call chain is properly exception-safe and memory-leak-safe; this is quite a challenge and also one of the reasons why we wrote our own async runtime).
BTW, I hope that we'll be allowed to share the crates for malfunction testing, diagnosis, etc. with the community. It's just very hard to get it started at a big company (only this week we finally got "external stores legal review" OK-ed permitting us to publish crates to crates.io!). At least we are allowed to contribute upstream to existing projects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 17 of 19 files at r1, 1 of 1 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @schreter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(BTW, I don't know why Reviewable is waiting on me, going to try to merge, hopefully it'll work 🙂).
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @schreter)
Some preliminary implementation with
OptionalSend
/OptionalSync
was done previously, it was not complete, though.Currently,
Send
/Sync
bounds are required forAppData
,RaftEntry
, async runtime wrappers and other types, but they are only relevant for multi-threaded access. Ifopenraft
is configured to use single-threaded runtime, then they don't have to beSend
/Sync
.Consequently replace
Send
/Sync
bounds withOptionalSend
/OptionalSync
bounds to clean it up.This change also fixes
timeout.rs
, which seems to be unused so far, but didn't properly useAsyncRuntime
abstraction.Two points for the future:
singlethreaded
feature is set to test it.Arc
could be replaced with a single-threaded counterpart to prevent the need for atomic refcounting.This change is