[FEAT] Add better detection of Ray Job environment #3148
CodSpeed Performance Report
Merging #3148 will not alter performance
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3148 +/- ##
==========================================
+ Coverage 78.80% 78.99% +0.19%
==========================================
Files 621 634 +13
Lines 74809 76913 +2104
==========================================
+ Hits 58954 60759 +1805
- Misses 15855 16154 +299
Approving, but have a quick question.
When running in a Ray Job, without the user invoking any Ray commands or `ray.init()` explicitly, the `ray.is_initialized()` function returns `False`. This means that Daft "does not know" that it is running inside of a Ray cluster, and thus will not default to using the RayRunner. This can lead to unexpected behavior when using `daft-launcher` because a user must know to call `daft.context.set_runner_ray()`.

This PR changes that behavior by attempting to look up the `$RAY_JOB_ID` environment variable, as a heuristic to tell whether or not it is currently running inside of a Ray job.

To test, I just ran a Ray job and called `daft.context.get_context()` after initializing a Daft dataframe.

<img width="1350" alt="image" src="https://github.com/user-attachments/assets/0a6d8ae4-034a-424d-a3d7-9311d08be454">

---------

Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
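
The heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `running_in_ray_job` and `choose_runner` are hypothetical helper names, the `"ray"` and `"default"` labels are placeholders for whichever runner Daft would select, and the `ray_is_initialized` argument stands in for the result of `ray.is_initialized()` so the sketch runs without Ray installed.

```python
import os


def running_in_ray_job(environ=None):
    """Hypothetical helper: guess whether we are inside a Ray job.

    Relies only on the RAY_JOB_ID environment variable mentioned in the
    PR description; an unset or empty value is treated as "not in a job".
    """
    env = os.environ if environ is None else environ
    return bool(env.get("RAY_JOB_ID"))


def choose_runner(ray_is_initialized, environ=None):
    """Hypothetical helper: pick a runner label.

    `ray_is_initialized` stands in for the result of ray.is_initialized().
    The returned strings are placeholder labels, not Daft's actual values.
    """
    if ray_is_initialized or running_in_ray_job(environ):
        return "ray"
    return "default"


if __name__ == "__main__":
    # Inside a Ray job this would report "ray" even though ray.init()
    # was never called explicitly by the user.
    print(choose_runner(ray_is_initialized=False))
```

Because the check is only a heuristic, a user who wants a specific runner can still set it explicitly, for example with `daft.context.set_runner_ray()`.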