Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CHORE] Refactor logging #1489

Merged
merged 5 commits into from
Oct 16, 2023
Merged

[CHORE] Refactor logging #1489

merged 5 commits into from
Oct 16, 2023

Conversation

jaychia
Copy link
Contributor

@jaychia jaychia commented Oct 13, 2023

  1. Removes loguru as a dependency
  2. Removes custom handler that forwards logs to loguru
  3. Removes all custom formatting of our logs
  4. Warning which runner is being used is only done now for when the PyRunner is being used with an existing Ray connection

I did (3) because Daft is a library, and custom formatting of logs should be done by the application. E.g. if a user was building a webserver with their own custom logging setup, then Daft mangling log formatting on the root logger would be very annoying.

This is what our logs look like now on a jupyter notebook:
image

Enabling more verbose logs at a higher level (e.g. INFO logs) is performed by the user/embedding that uses Daft, e.g.

import logging

logging.basicConfig(
    format='%(asctime)s,%(msecs)03d %(levelname)-8s [%(pathname)s:%(lineno)d] %(message)s',
    datefmt='%Y-%m-%d:%H:%M:%S',
    level=logging.INFO,
)

# Outputs logs that look like:
# 2023-10-16:11:25:46,195 INFO     [/Users/jaychia/.cargo/registry/src/index.crates.io-6f17d22bba15001f/aws-config-0.55.3/src/meta/region.rs:43] load_region; provider=EnvironmentVariableRegionProvider { env: Env(Real) }

Daft now respects normal Python logging and does not rely on loguru at all to do any of this configurations. This lets us play much nicer with applications that use normal Python logging.

@github-actions github-actions bot added the chore label Oct 13, 2023
@codecov
Copy link

codecov bot commented Oct 16, 2023

Codecov Report

Merging #1489 (9be0bd0) into main (a24e918) will increase coverage by 74.87%.
Report is 6 commits behind head on main.
The diff coverage is 73.33%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           main    #1489       +/-   ##
=========================================
+ Coverage      0   74.87%   +74.87%     
=========================================
  Files         0       59       +59     
  Lines         0     6106     +6106     
=========================================
+ Hits          0     4572     +4572     
- Misses        0     1534     +1534     
Files Coverage Δ
daft/__init__.py 25.00% <ø> (ø)
daft/filesystem.py 86.82% <100.00%> (ø)
daft/internal/treenode.py 21.25% <100.00%> (ø)
daft/runners/profiler.py 40.47% <100.00%> (ø)
daft/runners/pyrunner.py 95.83% <100.00%> (ø)
daft/runners/ray_runner.py 91.48% <100.00%> (ø)
daft/table/table.py 81.94% <100.00%> (ø)
daft/udf_library/url_udfs.py 98.00% <100.00%> (ø)
daft/dataframe/to_torch.py 0.00% <0.00%> (ø)
daft/internal/rule_runner.py 0.00% <0.00%> (ø)
... and 3 more

... and 46 files with indirect coverage changes

@samster25
Copy link
Member

@jaychia I think this makes sense, but we should still be capturing logs somehow. I dont think we need to do a handler to stdout but having a log handler that writes our INFO logs to a file would be really helpful. Take a look on what ray did. They write to /tmp/ray/session_*/logs but give the option to redirect to stderr

@samster25
Copy link
Member

To prevent files from getting too big, ray also does log rotation. Let me know what you think, also cool to do in a follow up

@jaychia
Copy link
Contributor Author

jaychia commented Oct 16, 2023

@jaychia I think this makes sense, but we should still be capturing logs somehow. I dont think we need to do a handler to stdout but having a log handler that writes our INFO logs to a file would be really helpful. Take a look on what ray did. They write to /tmp/ray/session_*/logs but give the option to redirect to stderr

Yes so by default all of our logs will already go to stderr (because Python logs on stderr by default I believe)

In addition to that behavior, I can add some Daft env vars to log Daft logs into a file in /tmp/daft/... if specified. We can install this handler to simply capture all daft* logs.

@jaychia jaychia merged commit 9d20890 into main Oct 16, 2023
30 checks passed
@jaychia jaychia deleted the jay/logging branch October 16, 2023 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants