-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make listener broadcast thread safe #31692
Make listener broadcast thread safe #31692
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
c6d93e5
to
4ccac50
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Were other synchronization technics been considered? Say, make the broadcast field as a |
The following builds have passed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments inline.
new Thread() { | ||
@Override | ||
void run() { | ||
${server.callFromBuildUsingExpression("'configure-' + project.name")} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Why do we need this thread? We start it, we block it, but it just lives in background, I don't see how it impacts the configuration of the projects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it straight from the script body would mean that if we had more projects (p) than workers (w), we would never see more than w requests, as there would be no workers left to run the other scripts. The result would be a timeout waiting for the other (p - w) requests that we were expecting.
TBH, I don't think we would need to block on the HTTP server anyways. As long as we can verify the taskgraph.whenReady
listeners being invoked via the console output, we should be good. Right, @alllex ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good catch about the number of workers, Rafael.
The test itself is not designed to prove that the implementation is thread-safe but to expose a case when an implementation is more obviously unsafe. So, it goes by ensuring multiple project evaluations (running in parallel) reach the same point in build logic and then start spamming the callback registration requests. The number of projects does not matter as much -- the important point is to get contention on callback registration. To that end, we still should have the blocking server, ensuring all projects are being configured in parallel and all of them are aligned before the "start line" and start spamming together.
Rather than blocking in an anonymous thread, I would resolve the issue with workers by making it match the number of projects (--max-workers
? and +1 for the root). It's okay if we want to reduce the number of projects for this, as we can, for instance, increase the number of listeners per project.
I would also leave a comment somewhere that this test is more of a soak test than a usual integration test, as it does not guarantee correctness, but instead detects some cases of unsafe implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, Alex. Indeed, this test passing once does not guarantee correctness, but at the same time I think that is the case with many of our concurrency tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the test not to use ad hoc background threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ There is a data race on reading the broadcast
field. It should be volatile
or its reads should be synchronized.
You can use @GuardedBy
annotation to catch issues if you decide to go with the synchronized
approach.
I suspect, source
should be massaged to become thread-safe too (by e.g. wrapping it in Lazy
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the data race for broadcast
, Mike? What concrete case could cause a race?
I figured that since broadcast
objects (BroadcastDispatch
) are immutable, and are private to the owning ListenerBroadcast
, we would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the data race for
broadcast
, Mike? What concrete case could cause a race?
As the reads aren't synchronized, there is nothing that makes an updated broadcast value visible to other threads. This means that we may still miss notifications, because the sender thread will see some obsolete list of listeners. With mutable classes it can also cause the reading thread to observe partially constructed object, but for immutable things like BroadcastDispatch
it isn't an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with volatile
. DId nothing about source
. Is the issue about us ending up with multiple ProxyDispatchAdapter
instances created for a single ListenerBroadcast
? If so, is that really a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, is that really a problem?
Well, it isn't particularly cheap to create. And I personally prefer to follow safe publication rules unless I have a very good reason not to do so, like prohibitive performance cost. Saves brain cycles by not thinking whether it is actually safe, and helps with future proofing the code.
With unsafe publication we have three options:
- One thread writes
source
, other threads see the fully constructed written instance. This is a good outcome. - Multiple threads write
source
, then they and other threads see the fully constructed but different instances. This is fine, though we pay the price of creating several sources. - One or multiple threads write
source
, other threads see partially constructed instances. Then we see various Heisenbugs with weird stacktraces. For now theProxyDispatchAdapter
is immutable, so it won't happen because of the freezing action in the constructor, but if someone adds something lazily initialized there, the code will break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding source
, I just made it to lock on the ListenerBroacast
as well. Looked into using Lazy
, however it is not compatible with clients that are stuck with Java 6 source level (static method on interfaces are not supported).
...re-runtime/messaging/src/main/java/org/gradle/internal/event/AnonymousListenerBroadcast.java
Show resolved
Hide resolved
...gradle/internal/cc/impl/isolated/IsolatedProjectsParallelConfigurationIntegrationTest.groovy
Show resolved
Hide resolved
25678f6
to
037fff6
Compare
@6hundreds I looked into it, however that project is using Java 6 source compatibility, so no lambdas, meaning the code would become much more verbose. Have you seen |
037fff6
to
4bcedd6
Compare
@bot-gradle test APT |
This comment has been minimized.
This comment has been minimized.
The following builds have passed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ I believe we should address thread-safety of the source
too.
@@ -29,7 +29,7 @@ public AnonymousListenerBroadcast(Class<T> type, Dispatch<MethodInvocation> forw | |||
} | |||
|
|||
@Override | |||
public void removeAll() { | |||
public synchronized void removeAll() { | |||
super.removeAll(); | |||
add(forwardingDispatch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ as we don't use synchronized
when reading the broadcast
, we may see it between removeAll
and add(forwardingDispatch)
, so we may miss broadcasting to forwardingDispatch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduced a method in the base class to replace broadcast
atomically
Please take another look, Mike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for your patience!
public void removeAll() { | ||
super.removeAll(); | ||
add(forwardingDispatch); | ||
public synchronized void removeAll() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 synchronized
on this method should have been enough, but this is fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I got it wrong then - I thought the fact we are reading without locking would allow the broadcast to be seen between these two mutating operations, leading to the problem you described. No?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I got it wrong then - I thought the fact we are reading without locking would allow the broadcast to be seen between these two mutating operations, leading to the problem you described. No?
Shouldn't write comments at 6am, I somehow thought that you've made reads synchronized too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I left some smaller questions to double-check before merging but not critical
where: | ||
it << (1..10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Can you explain what this does?
* <p>Implementations are not thread-safe.</p> | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Should we annotate the class with @ThreadSafe
?
where: | ||
n << (1..4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Is the n
used anywhere or does it play some other role?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to produce multiple runs of the test. Not sure there is another way to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are trying to compensate for the lack of certainty (which this test cannot provide), it's better to make the test itself more "intensive" than to run it multiple times. It's similar to why we run each performance test once, but it internally does everything it can to provide certainty, e.g. runs multiple batches of work.
I don't feel strongly about this, but it does feel weird to have an unused variable that forces the test to run multiple times
Likewise! |
a7a2ebb
to
816ec0b
Compare
Issue: #31537 * Annotate `ListenerBroadcast` as thread-safe * Ensure `ListenerBroadcast.getSource()` never creates more than one `ProxyDispatchAdapter` * Ensure `AnonymousListenerBroadcast.removeAll()` updates `broadcast` atomically * Remove note on thread-unsafety of `AnonymousListenerBroadcast` * Ensure reads reflect writes on `broacast`
Also: Add task-graph listener registration test for Isolated Projects
816ec0b
to
dd77fc4
Compare
Fixes: #31537
Context
Reviewing cheatsheet
Before merging the PR, comments starting with