Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Enable Spark query runner as reference in aggregation fuzzer test #9559

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

rui-mo
Copy link
Collaborator

@rui-mo rui-mo commented Apr 22, 2024

Enables Spark query runner as reference DB in Spark aggregation fuzzer test.
Removes duplicate Spark aggregation fuzzer test from experimental run.

Fixes #9270.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 22, 2024
Copy link

netlify bot commented Apr 22, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 7283cac
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/674d513ee547960007274cac

@rui-mo rui-mo changed the title Support Spark query runner WIP: Support Spark query runner Apr 22, 2024
@rui-mo
Copy link
Collaborator Author

rui-mo commented May 14, 2024

Hi @mbasmanova, this is a draft PR to enable Spark query runner in aggregate fuzzer test. I'd like to work on below items in separate PRs. Could you spare some time to check if they make sense? Thank you.

  1. Add support for docker image with Spark connect server #9759
  2. Refactor the toSql by introducing helper functions to allow reusability for Spark.
  3. Introduce SparkQueryRunner and enable it in agg fuzzer test.

@mbasmanova
Copy link
Contributor

@rui-mo Rui, thank you for working on this. At a high-level next steps make sense to me, but there are not enough details for me to understand them fully.

Fix the intermediate type of Sum aggregate discovered when running fuzzer test.

Would you create a GitHub issue to describe this problem?

Introduce a configuration to specify if the vectors generated by VectorFuzzer are for Arrow parquet. We need it because exportFlattenedVector method in Bridge.cpp does not support nested encoding and non-scalar type, which would breaks parquet write.

Does this mean we won't be able to create tables with map/array/struct columns? If so, this is a pretty severe limitation.

@rui-mo
Copy link
Collaborator Author

rui-mo commented May 15, 2024

@mbasmanova Thanks for your reply.
For the sum aggregate issue, I notice #9818 is addressing the same issue as discovered by this fuzzer test.
For the second one, I created #9821 and provided some details. Thanks.

@rui-mo rui-mo force-pushed the wip_9270 branch 5 times, most recently from 5400aed to 61c3f07 Compare May 17, 2024 06:10
@rui-mo rui-mo changed the title WIP: Support Spark query runner WIP: Enable Spark query runner as reference in aggregation fuzzer test Jun 20, 2024
@rui-mo rui-mo force-pushed the wip_9270 branch 5 times, most recently from c5b6b9e to 52a74bd Compare June 25, 2024 01:46
Copy link
Collaborator

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo CPP code changes look good.
@assignUser can you please take another look at the workflow file changes? Thanks.

runs-on: ubuntu-latest
needs: compile
spark-java-aggregation-fuzzer-run:
runs-on: 16-core-ubuntu
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout-minutes: 60
timeout-minutes: 120
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change necessary? The usual duration is still 15/30mins. Did you actually see a timeout?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. This change is not necessary and just reverted it.

.github/workflows/experimental.yml Show resolved Hide resolved
@rui-mo rui-mo force-pushed the wip_9270 branch 2 times, most recently from 717c2a3 to ba10f70 Compare October 30, 2024 07:24
@rui-mo
Copy link
Collaborator Author

rui-mo commented Oct 30, 2024

@assignUser Fixed above comments. Would you like to take another look? Thanks.

Copy link
Collaborator

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@assignUser assignUser added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Nov 2, 2024
@mbasmanova mbasmanova requested a review from kgpai November 4, 2024 13:41
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

@rui-mo Would you rebase to allow merging?

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@rui-mo rui-mo force-pushed the wip_9270 branch 6 times, most recently from 7612685 to aab3097 Compare November 11, 2024 06:53
@rui-mo
Copy link
Collaborator Author

rui-mo commented Nov 12, 2024

Hi @kevinwilfong, could you help import and merge this PR? Thanks!

@rui-mo rui-mo changed the title Enable Spark query runner as reference in aggregation fuzzer test build: Enable Spark query runner as reference in aggregation fuzzer test Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhance AggregationFuzzer to verify results against Spark
5 participants