
[Data] Emit warning if local shuffle buffer would cause spilling #48925

Open

bveeramani wants to merge 3 commits into master

Conversation

bveeramani
Member

Why are these changes needed?

Blocks in the local shuffle buffer are backed by object store memory. This means that if you set a large shuffle buffer size, you might encounter spilling or even out-of-disk errors.

To mitigate this issue, this PR makes Ray Data emit a warning if the local shuffle buffer would cause spilling.
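For context, this is roughly the usage pattern the warning is aimed at; the object store size and row counts below are made-up numbers for illustration, not values from the PR:

```python
import ray

# Illustrative only: start a node with a deliberately small object store.
ray.init(object_store_memory=128 * 1024 * 1024)

ds = ray.data.range(10_000_000)

# local_shuffle_buffer_size is a row count. If the rows needed to fill the
# buffer are estimated to exceed the node's object store memory, the new
# check warns before spilling or out-of-disk errors show up.
for batch in ds.iter_batches(batch_size=1024, local_shuffle_buffer_size=500_000):
    pass
```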

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Balaji Veeramani <[email protected]>

def _get_total_obj_store_mem_on_node() -> int:
    node_id = ray.get_runtime_context().get_node_id()
    total_resources_per_node = ray._private.state.total_resources_per_node()
Member Author

IIRC this API requires an RPC. Since this is only called once per iteration, I think the performance should be good enough

Contributor

IIRC this API requires an RPC

Why don't we move this into DataContext and cache it there?
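One way the caching idea could look (a sketch only; the shape of the dictionary returned by total_resources_per_node() and the final key lookup are assumptions, since the excerpt above cuts off before the return statement):

```python
import functools

import ray


@functools.lru_cache(maxsize=1)
def _get_total_obj_store_mem_on_node() -> int:
    # With the cache, the RPC behind total_resources_per_node() is issued
    # once per process instead of once per iteration.
    node_id = ray.get_runtime_context().get_node_id()
    total_resources_per_node = ray._private.state.total_resources_per_node()
    # Assumed shape: {node_id: {"object_store_memory": <bytes>, ...}}.
    return int(total_resources_per_node[node_id].get("object_store_memory", 0))
```

Storing the value on DataContext, as suggested, would achieve the same effect with the cache tied to the context object rather than the process.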

Contributor

maybe put it in data/_internal/util.py, as it is a utility function

Comment on lines +242 to +248
warnings.warn(
    "The node you're iterating on has "
    f"{memory_string(self._total_object_store_nbytes)} object "
    "store memory, but the shuffle buffer is estimated to use "
    f"{memory_string(self._estimated_max_buffer_nbytes)}. If you don't "
    f"decrease the shuffle buffer size from {self._buffer_min_size} rows, "
    "you might encounter spilling."
Member Author

e.g.,

UserWarning: The node you're iterating on has 128.0MB object store memory, but the shuffle buffer is estimated to use 384.0MB. If you don't decrease the shuffle buffer size from 2 rows, you might encounter spilling.

Comment on lines +251 to +255
block_accessor = BlockAccessor.for_block(block)
self._total_rows_added += block_accessor.num_rows()
self._total_nbytes_added += block_accessor.size_bytes()
if block_accessor.num_rows() > 0:
    self._builder.add_block(block)
Contributor

Even if we're not adding the block, we'd still be updating the counters.
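A sketch of what this might look like if the counters were only updated when a block is actually added (this is one reading of the reviewer's remark, not code from the PR):

```python
block_accessor = BlockAccessor.for_block(block)
if block_accessor.num_rows() > 0:
    # Keep the counters consistent with what actually lands in the builder.
    self._total_rows_added += block_accessor.num_rows()
    self._total_nbytes_added += block_accessor.size_bytes()
    self._builder.add_block(block)
```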

Comment on lines +274 to +275
* self._buffer_min_size
* SHUFFLE_BUFFER_COMPACTION_RATIO
Contributor

Let's extract this to a common util
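A sketch of what such a shared helper might look like; the name, the module placement, and the avg_row_nbytes factor (the diff excerpt cuts off before the first operand) are placeholders rather than code from the PR:

```python
# Hypothetical helper, e.g. living in data/_internal/util.py.
def estimate_shuffle_buffer_nbytes(
    avg_row_nbytes: float,
    buffer_min_size: int,
    compaction_ratio: float,
) -> float:
    """Estimate the peak size of the local shuffle buffer in bytes."""
    # avg_row_nbytes stands in for whatever per-row size estimate the PR
    # multiplies by buffer_min_size and SHUFFLE_BUFFER_COMPACTION_RATIO.
    return avg_row_nbytes * buffer_min_size * compaction_ratio
```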


# encounter spilling.
if (
    self._estimated_max_buffer_nbytes is not None
    and self._estimated_max_buffer_nbytes > self._total_object_store_nbytes
Contributor

nit, 1) this if statement can be put under if BlockAccessor.for_block(block).num_rows() > 0: and after add_block, because this will guarantee that _estimated_max_buffer_nbytes is not None.
2) we can skip recalculating _estimated_max_buffer_nbytes once it's been computed.
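A sketch of how both suggestions could combine (the _estimate_buffer_nbytes and _warn_about_possible_spilling helpers are invented here purely to keep the sketch short; they aren't in the PR):

```python
if block_accessor.num_rows() > 0:
    self._builder.add_block(block)
    # The estimate is only meaningful once a non-empty block has been added,
    # and it only needs to be computed once, then reused.
    if self._estimated_max_buffer_nbytes is None:
        self._estimated_max_buffer_nbytes = self._estimate_buffer_nbytes()
    if self._estimated_max_buffer_nbytes > self._total_object_store_nbytes:
        self._warn_about_possible_spilling()
```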

Comment on lines +233 to +236
# Because Arrow tables are memory mapped, blocks in the local shuffle buffer
# reside in object store memory and not local heap memory. So, if you specify a
# large buffer size and there isn't enough object store memory on the node, you
# encounter spilling.
Contributor

This comment is confusing actually:

  • There's no relation between the shuffle buffer and the object store
  • A produced block will likely get into the object store only once it's yielded from the operator (which is what uses the batcher)

# encounter spilling.
if (
    self._estimated_max_buffer_nbytes is not None
    and self._estimated_max_buffer_nbytes > self._total_object_store_nbytes
Contributor

@bveeramani this value should be doubled:

  • One half is blocks before batching
  • Other half is new block produced
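In code, that doubling would roughly amount to the following (a sketch, not the PR's implementation):

```python
# Peak usage covers the blocks buffered before batching plus a newly
# produced block of roughly the same size, hence the factor of two.
estimated_peak_nbytes = 2 * self._estimated_max_buffer_nbytes
if estimated_peak_nbytes > self._total_object_store_nbytes:
    ...  # emit the spilling warning
```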
