[core] Fix gcs logging #48952

Open: wants to merge 6 commits into master


dentiny (Contributor) commented Nov 26, 2024

Same motivation as #48931, but a different implementation.

TL;DR of the problem:

  • The excessive logging is caused by a bug in setting the spdlog rotation size on the C++ side, and the log redirection on the Python side has no rotation support.
  • The solution proposed in this PR is to manage the whole log via spdlog and disable the redirection logic on the Python side.
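Letting spdlog own both gcs_server.out and its rotation means the active file keeps its base name while older contents move to indexed files. The following is a minimal standard-C++17 sketch of that size-based scheme. SizeRotatingFile and RotatedName are hypothetical names, and the shift-by-index behavior is loosely modeled on spdlog's rotating file sink, not taken from spdlog or Ray.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: index 0 is the active file and keeps the base name;
// index N > 0 maps e.g. "gcs_server.out" to "gcs_server.N.out".
fs::path RotatedName(const fs::path &base, std::size_t index) {
  if (index == 0) return base;
  fs::path name = base.parent_path() / base.stem();
  name += "." + std::to_string(index);
  name += base.extension();
  return name;
}

// Hypothetical size-based rotating file (a sketch, not spdlog's code).
class SizeRotatingFile {
 public:
  SizeRotatingFile(fs::path base, std::uintmax_t max_bytes,
                   std::size_t max_files)
      : base_(std::move(base)), max_bytes_(max_bytes), max_files_(max_files) {
    out_.open(base_, std::ios::app);  // Creates the file if it is missing.
    current_size_ = fs::exists(base_) ? fs::file_size(base_) : 0;
  }

  // Appends one line; rotates first if the line would push the active file
  // past max_bytes. A non-empty check avoids rotating an empty file when a
  // single line is larger than the limit.
  void Write(const std::string &line) {
    const std::uintmax_t incoming = line.size() + 1;  // +1 for '\n'.
    if (current_size_ > 0 && current_size_ + incoming > max_bytes_) Rotate();
    out_ << line << '\n';
    out_.flush();
    current_size_ += incoming;
  }

 private:
  void Rotate() {
    out_.close();
    // Shift base.(i-1) -> base.i from the highest index down, so no file is
    // overwritten before it has been moved. The old file at index max_files_
    // is removed, which bounds disk usage.
    for (std::size_t i = max_files_; i > 0; --i) {
      const fs::path src = RotatedName(base_, i - 1);
      if (!fs::exists(src)) continue;
      const fs::path dst = RotatedName(base_, i);
      fs::remove(dst);
      fs::rename(src, dst);
    }
    out_.open(base_, std::ios::trunc);  // Fresh active file, same name.
    current_size_ = 0;
  }

  fs::path base_;
  std::uintmax_t max_bytes_;
  std::size_t max_files_;
  std::ofstream out_;
  std::uintmax_t current_size_ = 0;
};
```

For example, with max_bytes = 10 and max_files = 2, writing the seven four-character lines "aaaa" through "gggg" (5 bytes each with the newline) leaves "gggg" in gcs_server.out, "eeee" and "ffff" in gcs_server.1.out, and "cccc" and "dddd" in gcs_server.2.out; "aaaa" and "bbbb" are dropped.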

Signed-off-by: hjiang <[email protected]>
dentiny requested a review from a team as a code owner on November 26, 2024, 22:15
dentiny added the go label (add ONLY when ready to merge, run all tests) on Nov 26, 2024
python/ray/_private/services.py (outdated, resolved)
@@ -38,17 +41,40 @@ DEFINE_string(session_name,
"session_name: The session name (ClusterID) of the cluster.");
DEFINE_string(ray_commit, "", "The commit hash of Ray.");

namespace {
// GCS server output filename.
constexpr std::string_view kGcsServerLog = "gcs_server.out";
Contributor

get_log_file_handles can create names like gcs_server.2.out if gcs_server.out and gcs_server.1.out both exist. Does spdlog do something similar?

Contributor Author

All rotated logs are suffixed with an index, similar to what you described.
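For contrast, the naming the reviewer attributes to get_log_file_handles can be sketched as picking the first unused name when a file is created. This is a hypothetical C++ reconstruction from the review comment alone: FirstUnusedLogName is not a Ray function, and Ray's Python helper may behave differently. It only chooses a name; it never renames, deletes, or size-limits files, whereas a rotating sink keeps writing to the base name and shifts older files to higher indices.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical: returns dir/(stem + ext) if it does not exist yet, otherwise
// the first dir/(stem + "." + N + ext) with N = 1, 2, ... that does not exist.
// Existing files are left untouched.
fs::path FirstUnusedLogName(const fs::path &dir, const std::string &stem,
                            const std::string &ext) {
  fs::path candidate = dir / (stem + ext);
  for (int i = 1; fs::exists(candidate); ++i) {
    candidate = dir / (stem + "." + std::to_string(i) + ext);
  }
  return candidate;
}
```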

src/ray/gcs/gcs_server/gcs_server_main.cc (outdated, resolved)
? std::numeric_limits<int64_t>::max()
: FLAGS_log_rotation_size;
RAY_CHECK_EQ(setenv(
"RAY_ROTATION_MAX_BYTES", std::to_string(log_rotation_max_size).c_str(), /*overwrite=*/1), 0);
Contributor

Can you explain the difference between RAY_ROTATION_MAX_BYTES and FLAGS_log_rotation_size? If we already have the former, do we only need to fix the existing behavior? I see that gcs_server_main.cc already calls ray::RayLog::StartRayLog, so why doesn't the log rotation there work?
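The relationship this question asks about can be illustrated with a minimal sketch of the pattern in the gcs_server_main.cc excerpt: the command-line flag is turned into an effective byte limit and exported as the RAY_ROTATION_MAX_BYTES environment variable, which Ray's logger presumably reads when it starts. Because the start of the ternary is not shown, the sketch assumes that a non-positive flag value means "no rotation". EffectiveRotationMaxBytes and ExportRotationMaxBytes are hypothetical names, and setenv is POSIX.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper. Assumption: a flag value <= 0 disables rotation, so it
// is mapped to the largest int64_t, a size no log file will reach.
std::int64_t EffectiveRotationMaxBytes(std::int64_t flag_value) {
  return flag_value <= 0 ? std::numeric_limits<std::int64_t>::max()
                         : flag_value;
}

// Hypothetical helper: exports the effective limit so that code reading
// RAY_ROTATION_MAX_BYTES from the environment sees the same value as the flag.
// setenv() takes const char*, so the std::string needs .c_str(); setenv()
// copies both strings and returns 0 on success.
bool ExportRotationMaxBytes(std::int64_t flag_value) {
  const std::string value = std::to_string(EffectiveRotationMaxBytes(flag_value));
  return setenv("RAY_ROTATION_MAX_BYTES", value.c_str(), /*overwrite=*/1) == 0;
}
```

Passing the value through the environment lets it reach code that never sees the gflags, at the cost of process-wide mutable state.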

Contributor

What if, instead of the changes in this PR, we only do two things:

  1. remove the Python stdout/stderr redirection
  2. change ray_log_shutdown_raii from /*log_dir=*/"" to /*log_dir=*/FLAGS_log_dir

Would rotation then work automatically?

Collaborator

Yes, but the file name will change. We want to keep the existing gcs_server.out filename for backward compatibility.

Contributor Author

> will the rotations automatically work?

To answer your question: yes, passing the log directory is enough for log rotation to work.
But one motivation for this PR is backward compatibility, namely keeping the gcs_server.out filename.

src/ray/gcs/gcs_server/gcs_server_main.cc (outdated, resolved)
python/ray/_private/services.py (resolved)

3 participants