-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS, Core: Use Awaitility instead of Thread.sleep #8804
Conversation
010f36b
to
7b2c7ac
Compare
@nastra , Please take a look. |
@@ -62,6 +70,54 @@ public static long waitUntilAfter(long timestampMillis) { | |||
return current; | |||
} | |||
|
|||
/** wait for fixed duration */ | |||
public static void delayInMilliseconds(String message, long delay, boolean useSameThread) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to use Awaitility to just wait a certain amount of time for a true condition to happen
@@ -101,14 +103,7 @@ public void closeableTimer() throws InterruptedException { | |||
@Test | |||
public void measureRunnable() { | |||
Timer timer = new DefaultTimer(TimeUnit.NANOSECONDS); | |||
Runnable runnable = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the goal here is actually to measure the duration of the runnable
, so this isn't something that we want to replace with Awaitility
"wait for IAM role policy to update.", | ||
1000, | ||
1001, | ||
10001, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the parameters are rather confusing. It would be better to use Awaitility directly here
10001, | ||
TimeUnit.MILLISECONDS, | ||
() -> | ||
iam.getRolePolicy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that the check required whether IAM is consistent?
} catch (InterruptedException e) { | ||
throw new RuntimeException(e); | ||
} | ||
TestHelpers.delayInMilliseconds( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this goes back to an earlier comment I had on #8715 (comment). We don't just want to replace Thread.sleep()
usage blindly with Awaitility by waiting the same amount of time. The important piece is that we'd need to understand what kind of condition the test is trying to eventually reach, which we can then check by using Awaitility (rather than just sleeping X seconds).
I've opened #8853 and #8852 to give an idea how that might look for other places in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this goes back to an earlier comment I had on #8715 (comment). We don't just want to replace Thread.sleep()
usage blindly with Awaitility by waiting the same amount of time. The important piece is that we'd need to understand what kind of condition a particular test is trying to eventually reach, which we can then check by using Awaitility (rather than just sleeping X seconds).
I've opened #8853 and #8852 to give an idea how that might look for other places in the code.
That being said, I think the changes in TestHelpers
should be reverted
final int currentFilesCount = numCommittedFiles; | ||
Awaitility.await() | ||
.pollInterval(Duration.ofMillis(10)) | ||
.atMost(Duration.ofMillis(1000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.atMost(Duration.ofMillis(1000)) | |
.atMost(Duration.ofSeconds(1)) |
Thread.sleep(IAM_PROPAGATION_DELAY); // sleep to make sure IAM up to date | ||
private static void waitForIamConsistency(String roleName, String policyName) { | ||
Awaitility.await() | ||
.pollDelay(Duration.ofMillis(1000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.pollDelay(Duration.ofMillis(1000)) | |
.pollDelay(Duration.ofSeconds(1)) |
.build()) | ||
.roleName() | ||
.equalsIgnoreCase(roleName)); | ||
; // wait to make sure IAM up to date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that seeems like a weird formatting. Maybe just put the comment at the top in L363
Awaitility.await() | ||
.pollInterval(Duration.ofMillis(10)) | ||
.atMost(Duration.ofMillis(1000)) | ||
.until(() -> !(barrier.get() < currentFilesCount * 2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.until(() -> !(barrier.get() < currentFilesCount * 2)); | |
.until(() -> barrier.get() >= currentFilesCount * 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for the other places
private void waitForIamConsistency() throws Exception { | ||
Thread.sleep(10000); // sleep to make sure IAM up to date | ||
private void waitForIamConsistency() { | ||
Awaitility.await("wait for IAM role policy to update.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amogh-jahagirdar do you know if that would work for the AWS integration tests?
@@ -20,6 +20,7 @@ | |||
|
|||
import static java.util.concurrent.Executors.newFixedThreadPool; | |||
import static java.util.concurrent.TimeUnit.SECONDS; | |||
import static org.apache.iceberg.TestHelpers.waitUntilAfter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes on this file need to be reverted because we actually want to sleep in those tests
final int currentFilesCount = numCommittedFiles; | ||
Awaitility.await() | ||
.pollInterval(Duration.ofMillis(10)) | ||
.pollInSameThread() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this one here but not in the other places? Also this should have an atMost()
configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only this test was becoming flaky with Awaitility
. I think adding pollInSameThread
for all the similar tests is safe. So I have added pollInSameThread
and atMost
with 5 minutes to all the other similar tests.
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nk1506 I think the question was more around what's the issue if pollInSameThread
is not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amogh-jahagirdar , My intent was to use the same test thread here. Since we have multiple runnable tasks here. I don't think it was flaky because of not using pollInSameThread()
.
Please share your thoughts about adding pollInSameThread()
here is a good idea or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we need to figure out why the test was flaky. Was it occassionally failing? The purpose of using Awaitility is to make tests more robust and less flaky, so we need to figure out why this is flaky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If timeout is less(<10 seconds)
it's failing randomly. To safe side, I have added timeout as 5 minutes .atMost(Duration.ofSeconds(300))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm -1 on having this run 5 minutes. Also I don't think we should be using pollInSameThread()
, so please remove all of those. I ran those tests locally with the 10 seconds timeout consistently and they were passing
7577141
to
e571385
Compare
Awaitility.await("wait for IAM role policy to update.") | ||
.pollDelay(Duration.ofSeconds(1)) | ||
.atMost(Duration.ofSeconds(10)) | ||
.until( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be untilAsserted
. The inner block would then be assertThat(iam.getRolePolicy(...)).isNotNull()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you also might need to specify .ignoreExceptions()
as otherwise the API can throw a NoSuchEntityException
when the role doesn't exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially planned to add .ignoreExceptions()
. But as per there javadoc
Instruct Awaitility to ignore all exceptions that occur during evaluation. Exceptions will be treated as evaluating to false. This is useful in situations where the evaluated conditions may temporarily throw exceptions..
I thought it wont make any difference.
Thanks for recommending this.
.atMost(Duration.ofSeconds(10)) | ||
.until( | ||
() -> | ||
lakeformation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the same thing @nastra pointed out https://github.com/apache/iceberg/pull/8804/files#r1371698377 applies here as well.
final int currentFilesCount = numCommittedFiles; | ||
Awaitility.await() | ||
.pollInterval(Duration.ofMillis(10)) | ||
.pollInSameThread() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nk1506 I think the question was more around what's the issue if pollInSameThread
is not used?
db71fe1
to
07885ee
Compare
Awaitility.await() | ||
.pollDelay(Duration.ofSeconds(1)) | ||
.atMost(Duration.ofSeconds(10)) | ||
.until( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should do the same untilAsserted()
check that I mentioned earlier
DescribeResourceRequest.builder().resourceArn(arn).build()) | ||
.resourceInfo() | ||
.roleArn() | ||
.equalsIgnoreCase(lfRegisterPathRoleArn)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should do .isEqualToIgnoringCase(lfRegisterPathRoleArn))
rather than isTrue()
Awaitility.await() | ||
.pollDelay(Duration.ofSeconds(1)) | ||
.atMost(Duration.ofSeconds(10)) | ||
.until( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be consistent with other checks
37258a8
to
daea635
Compare
@@ -53,7 +53,7 @@ public static String lastExecutedMetricValue(SparkSession spark, String metricNa | |||
// Refresh metricValues, they will remain null until the execution is complete and metrics are | |||
// aggregated | |||
Awaitility.await() | |||
.atMost(Duration.ofMillis(500)) | |||
.atMost(Duration.ofSeconds(10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It started becoming flaky it seems.
Ref: https://github.com/apache/iceberg/actions/runs/6678213413/job/18149016719
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shouldn't be 10 seconds, but maybe rather 1-3 seconds. The old version of the code was relying on something around 500 millis
@@ -53,7 +53,7 @@ public static String lastExecutedMetricValue(SparkSession spark, String metricNa | |||
// Refresh metricValues, they will remain null until the execution is complete and metrics are | |||
// aggregated | |||
Awaitility.await() | |||
.atMost(Duration.ofMillis(500)) | |||
.atMost(Duration.ofSeconds(10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
@@ -53,7 +53,7 @@ public static String lastExecutedMetricValue(SparkSession spark, String metricNa | |||
// Refresh metricValues, they will remain null until the execution is complete and metrics are | |||
// aggregated | |||
Awaitility.await() | |||
.atMost(Duration.ofMillis(500)) | |||
.atMost(Duration.ofSeconds(10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
@@ -53,7 +53,7 @@ public static String lastExecutedMetricValue(SparkSession spark, String metricNa | |||
// Refresh metricValues, they will remain null until the execution is complete and metrics are | |||
// aggregated | |||
Awaitility.await() | |||
.atMost(Duration.ofMillis(500)) | |||
.atMost(Duration.ofSeconds(10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
LGTM, thanks @nk1506 |
Fixes #7154