-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flink: Fix TestIcebergSourceWithWatermarkExtractor flakiness #9309
Conversation
Please also handle for other flink modules. Flakiness can come from any of the modules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pvary feel free to add 1.16 and 1.17 in the same PR. it is a smaller unit test change. it is ok to include all 3 versions in one PR to speed up the turn-around.
@@ -95,8 +97,7 @@ public class TestIcebergSourceWithWatermarkExtractor implements Serializable { | |||
.setRpcServiceSharing(RpcServiceSharing.DEDICATED) | |||
.setConfiguration( | |||
reporter.addToConfiguration( | |||
// disable classloader check as Avro may cache class in the serializers. | |||
new Configuration().set(CoreOptions.CHECK_LEAKED_CLASSLOADER, false))) | |||
new Configuration().set(PipelineOptions.ALLOW_UNALIGNED_SOURCE_SPLITS, true))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably need both options here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I run the test 1000 times without the classloader issue to appear. Any idea how to check? Since the same issue appears with the configuration set, I suspect that the error message is a red herring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I run the test 1000 times without the classloader issue to appear. Any idea how to check?
not really. in the past, this tends to happen during CI build.
I will leave the decision to you. if it happen again, we would need another PR.
@@ -304,6 +322,11 @@ public void testThrottling() throws Exception { | |||
|
|||
try (CloseableIterator<RowData> resultIterator = stream.collectAsync()) { | |||
JobClient jobClient = env.executeAsync("Iceberg Source Throttling Test"); | |||
CommonTestUtils.waitForAllTaskRunning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the main fix to make sure all tasks running first?
other code refactoring/reordering is just cosmetic/style thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to reorder the Iceberg commit after starting the job. So the first records are moved. To make the test more consistent, moved the record generation to the beginning of the test
@stevenzwu: The v1.16 does not have the |
@pvary you can monitor the actions/workflows for Flink CI. https://github.com/apache/iceberg/actions/workflows/flink-ci.yml |
Thanks @stevenzwu, I will try to follow what happens |
Fix most of the flakiness in
TestIcebergSourceWithWatermarkExtractor
.Still #9308 is needed for the full fix