Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS, Core: Use Awaitility instead of Thread.sleep #8804

Merged
merged 1 commit into from
Oct 30, 2023
Merged

Conversation

nk1506
Copy link
Contributor

@nk1506 nk1506 commented Oct 11, 2023

Fixes #7154

@nk1506
Copy link
Contributor Author

nk1506 commented Oct 12, 2023

@nastra , Please take a look.

@ajantha-bhat ajantha-bhat requested a review from nastra October 17, 2023 12:43
@@ -62,6 +70,54 @@ public static long waitUntilAfter(long timestampMillis) {
return current;
}

/** wait for fixed duration */
public static void delayInMilliseconds(String message, long delay, boolean useSameThread) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to use Awaitility to just wait a certain amount of time for a true condition to happen

@@ -101,14 +103,7 @@ public void closeableTimer() throws InterruptedException {
@Test
public void measureRunnable() {
Timer timer = new DefaultTimer(TimeUnit.NANOSECONDS);
Runnable runnable =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the goal here is actually to measure the duration of the runnable, so this isn't something that we want to replace with Awaitility

"wait for IAM role policy to update.",
1000,
1001,
10001,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the parameters are rather confusing. It would be better to use Awaitility directly here

10001,
TimeUnit.MILLISECONDS,
() ->
iam.getRolePolicy(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that the check required whether IAM is consistent?

} catch (InterruptedException e) {
throw new RuntimeException(e);
}
TestHelpers.delayInMilliseconds(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this goes back to an earlier comment I had on #8715 (comment). We don't just want to replace Thread.sleep() usage blindly with Awaitility by waiting the same amount of time. The important piece is that we'd need to understand what kind of condition the test is trying to eventually reach, which we can then check by using Awaitility (rather than just sleeping X seconds).

I've opened #8853 and #8852 to give an idea how that might look for other places in the code

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this goes back to an earlier comment I had on #8715 (comment). We don't just want to replace Thread.sleep() usage blindly with Awaitility by waiting the same amount of time. The important piece is that we'd need to understand what kind of condition a particular test is trying to eventually reach, which we can then check by using Awaitility (rather than just sleeping X seconds).

I've opened #8853 and #8852 to give an idea how that might look for other places in the code.

That being said, I think the changes in TestHelpers should be reverted

final int currentFilesCount = numCommittedFiles;
Awaitility.await()
.pollInterval(Duration.ofMillis(10))
.atMost(Duration.ofMillis(1000))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.atMost(Duration.ofMillis(1000))
.atMost(Duration.ofSeconds(1))

Thread.sleep(IAM_PROPAGATION_DELAY); // sleep to make sure IAM up to date
private static void waitForIamConsistency(String roleName, String policyName) {
Awaitility.await()
.pollDelay(Duration.ofMillis(1000))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.pollDelay(Duration.ofMillis(1000))
.pollDelay(Duration.ofSeconds(1))

.build())
.roleName()
.equalsIgnoreCase(roleName));
; // wait to make sure IAM up to date
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seeems like a weird formatting. Maybe just put the comment at the top in L363

Awaitility.await()
.pollInterval(Duration.ofMillis(10))
.atMost(Duration.ofMillis(1000))
.until(() -> !(barrier.get() < currentFilesCount * 2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.until(() -> !(barrier.get() < currentFilesCount * 2));
.until(() -> barrier.get() >= currentFilesCount * 2);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for the other places

private void waitForIamConsistency() throws Exception {
Thread.sleep(10000); // sleep to make sure IAM up to date
private void waitForIamConsistency() {
Awaitility.await("wait for IAM role policy to update.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amogh-jahagirdar do you know if that would work for the AWS integration tests?

@@ -20,6 +20,7 @@

import static java.util.concurrent.Executors.newFixedThreadPool;
import static java.util.concurrent.TimeUnit.SECONDS;
import static org.apache.iceberg.TestHelpers.waitUntilAfter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes on this file need to be reverted because we actually want to sleep in those tests

@github-actions github-actions bot removed the API label Oct 19, 2023
final int currentFilesCount = numCommittedFiles;
Awaitility.await()
.pollInterval(Duration.ofMillis(10))
.pollInSameThread()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this one here but not in the other places? Also this should have an atMost() configuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only this test was becoming flaky with Awaitility . I think adding pollInSameThread for all the similar tests is safe. So I have added pollInSameThread and atMost with 5 minutes to all the other similar tests.
wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nk1506 I think the question was more around what's the issue if pollInSameThread is not used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amogh-jahagirdar , My intent was to use the same test thread here. Since we have multiple runnable tasks here. I don't think it was flaky because of not using pollInSameThread() .
Please share your thoughts about adding pollInSameThread() here is a good idea or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we need to figure out why the test was flaky. Was it occassionally failing? The purpose of using Awaitility is to make tests more robust and less flaky, so we need to figure out why this is flaky

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If timeout is less(<10 seconds) it's failing randomly. To safe side, I have added timeout as 5 minutes .atMost(Duration.ofSeconds(300)) .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 on having this run 5 minutes. Also I don't think we should be using pollInSameThread(), so please remove all of those. I ran those tests locally with the 10 seconds timeout consistently and they were passing

@nk1506 nk1506 force-pushed the 7154 branch 2 times, most recently from 7577141 to e571385 Compare October 24, 2023 11:13
@nk1506 nk1506 requested a review from nastra October 25, 2023 05:01
Awaitility.await("wait for IAM role policy to update.")
.pollDelay(Duration.ofSeconds(1))
.atMost(Duration.ofSeconds(10))
.until(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be untilAsserted. The inner block would then be assertThat(iam.getRolePolicy(...)).isNotNull()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also might need to specify .ignoreExceptions() as otherwise the API can throw a NoSuchEntityException when the role doesn't exist

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially planned to add .ignoreExceptions(). But as per there javadoc
Instruct Awaitility to ignore all exceptions that occur during evaluation. Exceptions will be treated as evaluating to false. This is useful in situations where the evaluated conditions may temporarily throw exceptions..
I thought it wont make any difference.
Thanks for recommending this.

.atMost(Duration.ofSeconds(10))
.until(
() ->
lakeformation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the same thing @nastra pointed out https://github.com/apache/iceberg/pull/8804/files#r1371698377 applies here as well.

final int currentFilesCount = numCommittedFiles;
Awaitility.await()
.pollInterval(Duration.ofMillis(10))
.pollInSameThread()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nk1506 I think the question was more around what's the issue if pollInSameThread is not used?

@nk1506 nk1506 force-pushed the 7154 branch 2 times, most recently from db71fe1 to 07885ee Compare October 26, 2023 05:53
@nk1506 nk1506 requested a review from nastra October 27, 2023 04:45
Awaitility.await()
.pollDelay(Duration.ofSeconds(1))
.atMost(Duration.ofSeconds(10))
.until(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should do the same untilAsserted() check that I mentioned earlier

DescribeResourceRequest.builder().resourceArn(arn).build())
.resourceInfo()
.roleArn()
.equalsIgnoreCase(lfRegisterPathRoleArn))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should do .isEqualToIgnoringCase(lfRegisterPathRoleArn)) rather than isTrue()

Awaitility.await()
.pollDelay(Duration.ofSeconds(1))
.atMost(Duration.ofSeconds(10))
.until(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be consistent with other checks

@nk1506 nk1506 force-pushed the 7154 branch 2 times, most recently from 37258a8 to daea635 Compare October 28, 2023 19:00
@github-actions github-actions bot added the spark label Oct 28, 2023
@@ -53,7 +53,7 @@ public static String lastExecutedMetricValue(SparkSession spark, String metricNa
// Refresh metricValues, they will remain null until the execution is complete and metrics are
// aggregated
Awaitility.await()
.atMost(Duration.ofMillis(500))
.atMost(Duration.ofSeconds(10))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be 10 seconds, but maybe rather 1-3 seconds. The old version of the code was relying on something around 500 millis

@@ -53,7 +53,7 @@ public static String lastExecutedMetricValue(SparkSession spark, String metricNa
// Refresh metricValues, they will remain null until the execution is complete and metrics are
// aggregated
Awaitility.await()
.atMost(Duration.ofMillis(500))
.atMost(Duration.ofSeconds(10))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -53,7 +53,7 @@ public static String lastExecutedMetricValue(SparkSession spark, String metricNa
// Refresh metricValues, they will remain null until the execution is complete and metrics are
// aggregated
Awaitility.await()
.atMost(Duration.ofMillis(500))
.atMost(Duration.ofSeconds(10))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -53,7 +53,7 @@ public static String lastExecutedMetricValue(SparkSession spark, String metricNa
// Refresh metricValues, they will remain null until the execution is complete and metrics are
// aggregated
Awaitility.await()
.atMost(Duration.ofMillis(500))
.atMost(Duration.ofSeconds(10))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@nastra nastra changed the title Build: Replace Thread.Sleep() usage with org.Awaitility from Tests. AWS, Core: Use Awaitility instead of Thread.sleep Oct 30, 2023
@nastra
Copy link
Contributor

nastra commented Oct 30, 2023

LGTM, thanks @nk1506

@nastra nastra merged commit 7e9e02c into apache:main Oct 30, 2023
45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace Thread.sleep() usage in test code with Awaitility
3 participants