Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make listener broadcast thread safe #31692

Merged

Conversation

abstratt
Copy link
Member

Fixes: #31537

Context

Reviewing cheatsheet

Before merging the PR, comments starting with

  • ❌ ❓must be fixed
  • 🤔 💅 should be fixed
  • 💭 may be fixed
  • 🎉 celebrate happy things

@abstratt abstratt self-assigned this Dec 12, 2024
@abstratt

This comment has been minimized.

@bot-gradle

This comment has been minimized.

@bot-gradle

This comment has been minimized.

@abstratt abstratt force-pushed the rchaves/master/make-listener-broadcast-thread-safe branch from c6d93e5 to 4ccac50 Compare December 13, 2024 03:54
@abstratt

This comment has been minimized.

@bot-gradle

This comment has been minimized.

@bot-gradle

This comment has been minimized.

@abstratt abstratt marked this pull request as ready for review December 13, 2024 11:59
@abstratt abstratt requested review from a team as code owners December 13, 2024 11:59
@abstratt abstratt requested review from mlopatkin and 6hundreds and removed request for a team December 13, 2024 11:59
@abstratt abstratt added this to the 8.13 RC1 milestone Dec 13, 2024
@abstratt

This comment has been minimized.

@bot-gradle

This comment has been minimized.

@6hundreds
Copy link
Member

Were other synchronization technics been considered? Say, make the broadcast field as a AtomicReference to avoid unnecessary blocking?

@bot-gradle
Copy link
Collaborator

The following builds have passed:

Copy link
Member

@mlopatkin mlopatkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments inline.

new Thread() {
@Override
void run() {
${server.callFromBuildUsingExpression("'configure-' + project.name")}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Why do we need this thread? We start it, we block it, but it just lives in background, I don't see how it impacts the configuration of the projects.

Copy link
Member Author

@abstratt abstratt Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it straight from the script body would mean that if we had more projects (p) than workers (w), we would never see more than w requests, as there would be no workers left to run the other scripts. The result would be a timeout waiting for the other (p - w) requests that we were expecting.

TBH, I don't think we would need to block on the HTTP server anyways. As long as we can verify the taskgraph.whenReady listeners being invoked via the console output, we should be good. Right, @alllex ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good catch about the number of workers, Rafael.

The test itself is not designed to prove that the implementation is thread-safe but to expose a case when an implementation is more obviously unsafe. So, it goes by ensuring multiple project evaluations (running in parallel) reach the same point in build logic and then start spamming the callback registration requests. The number of projects does not matter as much -- the important point is to get contention on callback registration. To that end, we still should have the blocking server, ensuring all projects are being configured in parallel and all of them are aligned before the "start line" and start spamming together.

Rather than blocking in an anonymous thread, I would resolve the issue with workers by making it match the number of projects (--max-workers ? and +1 for the root). It's okay if we want to reduce the number of projects for this, as we can, for instance, increase the number of listeners per project.

I would also leave a comment somewhere that this test is more of a soak test than a usual integration test, as it does not guarantee correctness, but instead detects some cases of unsafe implementations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, Alex. Indeed, this test passing once does not guarantee correctness, but at the same time I think that is the case with many of our concurrency tests.

Copy link
Member Author

@abstratt abstratt Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the test not to use ad hoc background threads

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ There is a data race on reading the broadcast field. It should be volatile or its reads should be synchronized.

You can use @GuardedBy annotation to catch issues if you decide to go with the synchronized approach.

I suspect, source should be massaged to become thread-safe too (by e.g. wrapping it in Lazy).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the data race for broadcast, Mike? What concrete case could cause a race?

I figured that since broadcast objects (BroadcastDispatch) are immutable, and are private to the owning ListenerBroadcast, we would be good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the data race for broadcast, Mike? What concrete case could cause a race?

As the reads aren't synchronized, there is nothing that makes an updated broadcast value visible to other threads. This means that we may still miss notifications, because the sender thread will see some obsolete list of listeners. With mutable classes it can also cause the reading thread to observe partially constructed object, but for immutable things like BroadcastDispatch it isn't an issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with volatile. DId nothing about source. Is the issue about us ending up with multiple ProxyDispatchAdapter instances created for a single ListenerBroadcast? If so, is that really a problem?

Copy link
Member

@mlopatkin mlopatkin Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, is that really a problem?

Well, it isn't particularly cheap to create. And I personally prefer to follow safe publication rules unless I have a very good reason not to do so, like prohibitive performance cost. Saves brain cycles by not thinking whether it is actually safe, and helps with future proofing the code.

With unsafe publication we have three options:

  1. One thread writes source, other threads see the fully constructed written instance. This is a good outcome.
  2. Multiple threads write source, then they and other threads see the fully constructed but different instances. This is fine, though we pay the price of creating several sources.
  3. One or multiple threads write source, other threads see partially constructed instances. Then we see various Heisenbugs with weird stacktraces. For now the ProxyDispatchAdapter is immutable, so it won't happen because of the freezing action in the constructor, but if someone adds something lazily initialized there, the code will break.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding source, I just made it to lock on the ListenerBroacast as well. Looked into using Lazy, however it is not compatible with clients that are stuck with Java 6 source level (static method on interfaces are not supported).

@abstratt abstratt force-pushed the rchaves/master/make-listener-broadcast-thread-safe branch 2 times, most recently from 25678f6 to 037fff6 Compare December 17, 2024 15:16
@abstratt
Copy link
Member Author

abstratt commented Dec 17, 2024

Were other synchronization technics been considered? Say, make the broadcast field as a AtomicReference to avoid unnecessary blocking?

@6hundreds I looked into it, however that project is using Java 6 source compatibility, so no lambdas, meaning the code would become much more verbose.

Have you seen AtomicReference pushed as a faster option to synchronized? The only comparison I saw showed Atomic being faster with a couple of threads, but slower with a higher number of threads.

@abstratt abstratt closed this Dec 17, 2024
@abstratt abstratt reopened this Dec 17, 2024
@abstratt abstratt force-pushed the rchaves/master/make-listener-broadcast-thread-safe branch from 037fff6 to 4bcedd6 Compare December 17, 2024 16:14
@abstratt abstratt dismissed mlopatkin’s stale review December 17, 2024 16:31

Changes performed

@abstratt
Copy link
Member Author

@bot-gradle test APT

@bot-gradle

This comment has been minimized.

@bot-gradle
Copy link
Collaborator

The following builds have passed:

Copy link
Member

@mlopatkin mlopatkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ I believe we should address thread-safety of the source too.

@@ -29,7 +29,7 @@ public AnonymousListenerBroadcast(Class<T> type, Dispatch<MethodInvocation> forw
}

@Override
public void removeAll() {
public synchronized void removeAll() {
super.removeAll();
add(forwardingDispatch);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ as we don't use synchronized when reading the broadcast, we may see it between removeAll and add(forwardingDispatch), so we may miss broadcasting to forwardingDispatch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced a method in the base class to replace broadcast atomically

@abstratt
Copy link
Member Author

❌ I believe we should address thread-safety of the source too.

Please take another look, Mike

@abstratt abstratt dismissed mlopatkin’s stale review December 19, 2024 04:18

Please take another look, Mike

Copy link
Member

@mlopatkin mlopatkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for your patience!

public void removeAll() {
super.removeAll();
add(forwardingDispatch);
public synchronized void removeAll() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 synchronized on this method should have been enough, but this is fine too.

Copy link
Member Author

@abstratt abstratt Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I got it wrong then - I thought the fact we are reading without locking would allow the broadcast to be seen between these two mutating operations, leading to the problem you described. No?

#31692 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I got it wrong then - I thought the fact we are reading without locking would allow the broadcast to be seen between these two mutating operations, leading to the problem you described. No?

#31692 (comment)

Shouldn't write comments at 6am, I somehow thought that you've made reads synchronized too.

Copy link
Member

@alllex alllex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
I left some smaller questions to double-check before merging but not critical

Comment on lines +143 to +144
where:
it << (1..10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Can you explain what this does?

Comment on lines -34 to -35
* <p>Implementations are not thread-safe.</p>
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 Should we annotate the class with @ThreadSafe ?

Comment on lines +424 to +425
where:
n << (1..4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Is the n used anywhere or does it play some other role?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to produce multiple runs of the test. Not sure there is another way to do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are trying to compensate for the lack of certainty (which this test cannot provide), it's better to make the test itself more "intensive" than to run it multiple times. It's similar to why we run each performance test once, but it internally does everything it can to provide certainty, e.g. runs multiple batches of work.

I don't feel strongly about this, but it does feel weird to have an unused variable that forces the test to run multiple times

@abstratt
Copy link
Member Author

LGTM, thank you for your patience!

Likewise!

@abstratt abstratt force-pushed the rchaves/master/make-listener-broadcast-thread-safe branch from a7a2ebb to 816ec0b Compare December 20, 2024 00:50
@abstratt abstratt added this pull request to the merge queue Dec 20, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 20, 2024
abstratt and others added 2 commits December 20, 2024 00:13
Issue: #31537

* Annotate `ListenerBroadcast` as thread-safe
* Ensure `ListenerBroadcast.getSource()` never creates more than one `ProxyDispatchAdapter`
* Ensure `AnonymousListenerBroadcast.removeAll()` updates `broadcast` atomically
* Remove note on thread-unsafety of `AnonymousListenerBroadcast`
* Ensure reads reflect writes on `broacast`
Also:

Add task-graph listener registration test for Isolated Projects
@abstratt abstratt force-pushed the rchaves/master/make-listener-broadcast-thread-safe branch from 816ec0b to dd77fc4 Compare December 20, 2024 03:13
@abstratt abstratt added this pull request to the merge queue Dec 20, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 20, 2024
@blindpirate blindpirate added this pull request to the merge queue Dec 20, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 20, 2024
@blindpirate blindpirate added this pull request to the merge queue Dec 20, 2024
Merged via the queue into master with commit 67a0768 Dec 20, 2024
22 checks passed
@blindpirate blindpirate deleted the rchaves/master/make-listener-broadcast-thread-safe branch December 20, 2024 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broadcast listener registration is not thread-safe
6 participants