Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] GHA workflow to perform tcph benchmarking #3184

Closed
wants to merge 3 commits into from
Closed

Conversation

raunakab
Copy link
Contributor

@raunakab raunakab commented Nov 5, 2024

Overview

Create new GHA workflow for building a commit and running tpch against it.

Notes

There are 2 main workflows:

  1. build-commit.yaml
  2. run-tpch.yaml

The final workflow, build-commit-run-tpch.yaml just runs the above two in a sequential order.

I've also made some changes to benchmarking/tpch/__main__.py. Namely:

  1. Added all env-vars that start with DAFT to the ray-runtime-env variables that's sent during ray-cluster initialization.
  2. Added flag to turn off sending daft module to ray-cluster during initialization.
  • No need to pickle the daft module and send it over; it's already installed on the ray-cluster from the AWS S3 link pointing to the prebuilt python-wheel.

I've summarized the workflows individually down below:

build-commit workflow

  • uses buildjet for building
  • caching enabled
  • builds a release python-wheel and stores it in AWS S3
  • without caching, builds take around 6-7min
  • with caching, builds take roughly 3min
  • with aggressive caching, builds take <40s

run-tpch workflow

  • pulls the AWS S3 python-wheel and runs it
  • runs the benchmarking.tpch benchmark
  • produces an output.csv and sends it back to GHA to be displayed
  • renders the output directly in the Summary Page
  • grabs Ray-Logs and uploads that to the GHA Summary Page as well

build-commit-run-tpch workflow

  • literally just invokes the first two in sequential order
  • maps the built python-wheel of the build-commit job (1st one) to the input of the run-tpch job (2nd one)

@github-actions github-actions bot added the enhancement New feature or request label Nov 5, 2024
@raunakab raunakab marked this pull request as ready for review November 5, 2024 21:35
@raunakab raunakab marked this pull request as draft November 5, 2024 21:35
@raunakab raunakab marked this pull request as ready for review November 5, 2024 21:43
Copy link

codspeed-hq bot commented Nov 5, 2024

CodSpeed Performance Report

Merging #3184 will degrade performances by 34.59%

Comparing feat/infra (e908742) with main (ec39dc0)

Summary

❌ 1 regressions
✅ 16 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main feat/infra Change
test_iter_rows_first_row[100 Small Files] 264.6 ms 404.6 ms -34.59%

@kevinzwang kevinzwang removed their request for review November 5, 2024 21:49
Copy link

codecov bot commented Nov 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.37%. Comparing base (ec39dc0) to head (e908742).
Report is 39 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3184      +/-   ##
==========================================
- Coverage   76.54%   76.37%   -0.17%     
==========================================
  Files         685      685              
  Lines       85269    85135     -134     
==========================================
- Hits        65266    65020     -246     
- Misses      20003    20115     +112     

see 35 files with indirect coverage changes

@raunakab raunakab marked this pull request as draft November 5, 2024 22:07
@raunakab raunakab marked this pull request as ready for review November 5, 2024 22:07
@raunakab raunakab self-assigned this Nov 5, 2024
with:
aws-region: us-west-2
role-to-assume: ${{ secrets.ACTIONS_AWS_ROLE_ARN }}
role-session-name: daft-performance-comparisons
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary -- Is this for S3 upload permissions?

Copy link
Contributor Author

@raunakab raunakab Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was here for AWS S3 uploads. I believe writes to the bucket are private (whereas reads are public). May be wrong though.

.github/workflows/benchmark-local.yaml Outdated Show resolved Hide resolved
.github/workflows/benchmark-local.yaml Outdated Show resolved Hide resolved
.github/workflows/benchmark-local.yaml Outdated Show resolved Hide resolved
.github/workflows/differ-local.yaml Outdated Show resolved Hide resolved
@raunakab raunakab force-pushed the feat/infra branch 5 times, most recently from 0c8a35f to a71c566 Compare November 19, 2024 03:40
@raunakab raunakab removed the request for review from desmondcheongzx November 19, 2024 18:27
@raunakab raunakab marked this pull request as draft November 19, 2024 20:08
.github/assets/ray.yaml Outdated Show resolved Hide resolved
@raunakab raunakab changed the title [FEAT] New GH workflow for local diffing [FEAT] GHA workflow to perform tcph benchmarking Nov 19, 2024
@raunakab raunakab requested a review from jaychia November 21, 2024 05:30
@raunakab raunakab marked this pull request as ready for review November 21, 2024 05:34
@raunakab raunakab requested a review from samster25 November 21, 2024 06:54
@raunakab
Copy link
Contributor Author

Example:
https://github.com/Eventual-Inc/Daft/actions/runs/11926775586

Run by @desmondcheongzx. Run was submitted locally using the gh CLI tool. Invocation was:

gh workflow run build-commit-run-tpch.yaml --ref $BRANCH_NAME -f skip_questions=$SKIP_QUESTIONS

@raunakab raunakab requested a review from colin-ho November 21, 2024 06:57
@raunakab
Copy link
Contributor Author

Tagging @colin-ho. You recently touched the benchmarking/tpch/__main__.py file. Just wanted to run some of those changes by you first.

@raunakab raunakab removed the request for review from samster25 November 21, 2024 07:03
Copy link
Contributor

@colin-ho colin-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmarking/tpch/__main__.py file looks good to me apart from the one nit comment about the py_modules arg. Also left some suggestions on the other files.

.github/assets/ray.yaml Outdated Show resolved Hide resolved
.github/workflows/build-commit-run-tpch.yaml Outdated Show resolved Hide resolved
benchmarking/tpch/__main__.py Outdated Show resolved Hide resolved
.github/assets/ray.yaml Outdated Show resolved Hide resolved
.github/workflows/run-tpch.yaml Outdated Show resolved Hide resolved
.github/workflows/run-tpch.yaml Show resolved Hide resolved
- slightly different than existing one
- uv pip install pip ray[default] py-spy
# GitHub Actions workflow will replace all parameters between `<<...>>` with the
# actual values as determined dynamically during runtime of the actual workflow.
- uv pip install https://github-actions-artifacts-bucket.s3.us-west-2.amazonaws.com/builds/<<SHA>>/<<WHEEL>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think {{SHA}} and {{WHEEL}} is more common here for templating

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can update.


for row in csv_reader:
row_str = make_md_row(row)
output.write(row_str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but note that this was a nice-to-have that really should have been in a follow-on

options:
- '2'
- '32'
- '32'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should combine SF with partition size, since not all permutations have datasets?

I.e. SF10_part32, SF100_part512...

Or even simpler, just let people select the URL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I was thinking of this as well.

Options could be:

  • 2-2
  • 10-32
  • etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update.

aws s3 sync \
s3://eventual-dev-benchmarking-fixtures/uncompressed/tpch-dbgen/<<SCALE_FACTOR>>/<<PARTITION_SIZE>>/parquet/ \
/tmp/data/<<SCALE_FACTOR>>/<<PARTITION_SIZE>>/parquet/ \
--quiet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended behavior? Downloading 1TB of data is going to kill the machine. Are we benchmarking off s3 or local?

type: string
description: The wheel artifact to use
required: false
default: getdaft-0.3.0.dev0-cp38-abi3-manylinux_2_31_x86_64.whl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this version of Daft and why is this not required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the name of the wheel which is inside of the directory in the github-actions-artifacts-bucket bucket. The name of the outputted wheel is determined via build job which then injects the name of the wheel to this step. If you want to run this step without having run the previous, you'll have to specify the name.

It's possible that we have an S3 List job to grab the name instead...

aws-region: us-west-2
role-session-name: run-tpch-workflow
- uses: ./.github/actions/install
- run: |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split this into steps, this makes it difficult to debug if something goes wrong.

You probably also want the ray down step to "always run" regardless of success/failure of all the rest of the steps.

--ray_job_dashboard_url http://localhost:8265 \
--skip_warmup \
--pickle_daft_module='false'
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just leverage ray submit here?

ray up .github/assets/benchmarking_ray_config.yaml -y
HEAD_NODE_IP=$(ray get-head-ip .github/assets/benchmarking_ray_config.yaml | tail -n 1)
ssh -o StrictHostKeyChecking=no -fN -L 8265:localhost:8265 -i ~/.ssh/ci-github-actions-ray-cluster-key.pem ubuntu@$HEAD_NODE_IP
export DAFT_ENABLE_RAY_TRACING=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to make this configurable and default to true

new_base=${base//:/_}
mv "$filepath" "$dir/$new_base"
done
' _ {} +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This converts all of the instances of : in path names to _. Having : in path names will make the GitHub Actions upload-artifacts step fail.

@raunakab
Copy link
Contributor Author

raunakab commented Dec 2, 2024

Closing in favour of #3404.

@raunakab raunakab closed this Dec 2, 2024
@raunakab raunakab deleted the feat/infra branch December 2, 2024 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants