Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add better detection of Ray Job environment #3148

Merged
merged 3 commits into from
Nov 1, 2024
Merged

Conversation

jaychia
Copy link
Contributor

@jaychia jaychia commented Oct 30, 2024

When running in a Ray Job, without the user invoking any Ray commands or ray.init() explicitly, the ray.is_initialized() function returns False.

This means that Daft "does not know" that it is running inside of a Ray cluster, and thus will not default to using the RayRunner. This can lead to unexpected behavior when using daft-launcher because a user must know to call daft.context.set_runner_ray().

This PR changes that behavior by attempting to look up the $RAY_JOB_ID environment variable, as a heuristic to tell whether or not it is currently running inside of a Ray job.

To test, I just ran a Ray job and called daft.context.get_context() after initializing a Daft dataframe

image

@github-actions github-actions bot added the enhancement New feature or request label Oct 30, 2024
@jaychia jaychia requested a review from raunakab October 30, 2024 00:10
Copy link

codspeed-hq bot commented Oct 30, 2024

CodSpeed Performance Report

Merging #3148 will not alter performance

Comparing jay/ray-init-2 (32e44d0) with main (5fc9531)

Summary

✅ 17 untouched benchmarks

Copy link

codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 78.99%. Comparing base (e84ed5b) to head (32e44d0).
Report is 18 commits behind head on main.

Files with missing lines Patch % Lines
daft/context.py 50.00% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3148      +/-   ##
==========================================
+ Coverage   78.80%   78.99%   +0.19%     
==========================================
  Files         621      634      +13     
  Lines       74809    76913    +2104     
==========================================
+ Hits        58954    60759    +1805     
- Misses      15855    16154     +299     
Files with missing lines Coverage Δ
daft/context.py 79.75% <50.00%> (-0.25%) ⬇️

... and 75 files with indirect coverage changes

Copy link
Contributor

@raunakab raunakab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but have a quick question.

daft/context.py Show resolved Hide resolved
@jaychia jaychia merged commit 3cef614 into main Nov 1, 2024
41 of 42 checks passed
@jaychia jaychia deleted the jay/ray-init-2 branch November 1, 2024 20:52
sagiahrac pushed a commit to sagiahrac/Daft that referenced this pull request Nov 4, 2024
When running in a Ray Job, without the user invoking any Ray commands or
`ray.init()` explicitly, the `ray.is_initialized()` function returns
False.

This means that Daft "does not know" that it is running inside of a Ray
cluster, and thus will not default to using the RayRunner. This can lead
to unexpected behavior when using `daft-launcher` because a user must
know to call `daft.context.set_runner_ray()`.

This PR changes that behavior by attempting to look up the `$RAY_JOB_ID`
environment variable, as a heuristic to tell whether or not it is
currently running inside of a Ray job.

To test, I just ran a Ray job and called `daft.context.get_context()`
after initializing a Daft dataframe

<img width="1350" alt="image"
src="https://github.com/user-attachments/assets/0a6d8ae4-034a-424d-a3d7-9311d08be454">

---------

Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants