Big value serialization in migration #4100

adiholden · 2024-11-10T08:54:56Z

No description provided.

Before this PR: we serialized a `RESTORE` command for each entry into a string and then pushed that string to the wire. This means that serializing an entry of size X consumed about 2X memory during the migration.

This PR: instead of serializing into a string, we serialize directly into the wire. Luckily, only a small change is needed in the way we interact with `crc64`, which works well even when the data arrives in chunks. Fixes #4100
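Streaming straight to the wire only works if the checksum appended to the `RESTORE` payload can be computed incrementally. A CRC has that property: carrying the intermediate value from one call into the next gives the same result as a single pass over the whole buffer. The sketch below illustrates this in C++. The names `Crc64Update`, `ChecksummingSink` and `StreamInChunks` are hypothetical, and the CRC-64/Jones parameters (polynomial 0xad93d23594c935a9, reflected, initial value 0, no final XOR) are assumed to match what Redis uses for `DUMP`/`RESTORE` payloads rather than taken from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Bit-reverse a 64-bit value; used to derive the reflected polynomial.
constexpr std::uint64_t Reflect64(std::uint64_t v) {
  std::uint64_t r = 0;
  for (int i = 0; i < 64; ++i) {
    r = (r << 1) | (v & 1);
    v >>= 1;
  }
  return r;
}

// Assumed parameters: CRC-64/Jones polynomial, reflected input/output,
// initial value 0, no final XOR.
constexpr std::uint64_t kPolyReflected = Reflect64(0xad93d23594c935a9ULL);

// Bitwise (table-free) CRC update. Passing the previous result back in as
// `crc` continues the computation, so the payload can arrive in chunks.
std::uint64_t Crc64Update(std::uint64_t crc, std::string_view chunk) {
  for (unsigned char byte : chunk) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc & 1) ? (crc >> 1) ^ kPolyReflected : crc >> 1;
    }
  }
  return crc;
}

// Hypothetical sink: forwards each chunk and folds it into a running
// checksum, so the full payload is never copied into one contiguous string.
struct ChecksummingSink {
  std::uint64_t crc = 0;
  std::size_t bytes_sent = 0;

  void Write(std::string_view chunk) {
    // A real sink would write `chunk` to the socket here.
    crc = Crc64Update(crc, chunk);
    bytes_sent += chunk.size();
  }
};

// Feeds `data` through the sink in pieces of at most `chunk_size` bytes
// and returns the resulting checksum.
std::uint64_t StreamInChunks(std::string_view data, std::size_t chunk_size) {
  assert(chunk_size > 0);
  ChecksummingSink sink;
  for (std::size_t pos = 0; pos < data.size(); pos += chunk_size) {
    sink.Write(data.substr(pos, chunk_size));
  }
  return sink.crc;
}
```

For example, `StreamInChunks(payload, 7)` and `Crc64Update(0, payload)` return the same value, while no second copy of the payload is built. Production code would likely use a table-driven CRC rather than this bitwise loop.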

Before this PR, we used `RESTORE` commands to transfer data between source and target nodes during cluster slot migration. While this _works_, it has the side effect of consuming at least 2x memory for huge values (e.g. if a single key's value takes 10 GB, serializing it will take 20 GB or even 30 GB). With this PR, we break huge values down into multiple commands (`RPUSH`, `HSET`, etc.), respecting the existing `--serialization_max_chunk_size` flag. Fixes #4100

* feat: Huge values breakdown in cluster migration. Before this PR, we used `RESTORE` commands to transfer data between source and target nodes during cluster slot migration. While this _works_, it has the side effect of consuming at least 2x memory for huge values (e.g. if a single key's value takes 10 GB, serializing it will take 20 GB or even 30 GB). With this PR, we break huge values down into multiple commands (`RPUSH`, `HSET`, etc.), respecting the existing `--serialization_max_chunk_size` flag. Part of #4100
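To make the breakdown concrete, here is a minimal C++ sketch of splitting one huge list into several `RPUSH` commands. `Command` and `BreakDownList` are hypothetical names, and treating `--serialization_max_chunk_size` as a byte budget on the element payload of each command is an assumption made for illustration, not a description of Dragonfly's actual serializer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A command as it would be sent to the target node: NAME key arg1 arg2 ...
struct Command {
  std::string name;
  std::string key;
  std::vector<std::string> args;
};

// Hypothetical helper: split a huge list into several RPUSH commands so that
// the element bytes carried by each command stay within max_chunk_size
// (assumed to be a byte budget). An element larger than the budget still
// goes out, alone in its own command.
std::vector<Command> BreakDownList(const std::string& key,
                                   const std::vector<std::string>& elements,
                                   std::size_t max_chunk_size) {
  assert(max_chunk_size > 0);
  std::vector<Command> out;
  Command current{"RPUSH", key, {}};
  std::size_t current_bytes = 0;
  for (const std::string& element : elements) {
    // Flush before this element would push the command over budget.
    if (!current.args.empty() &&
        current_bytes + element.size() > max_chunk_size) {
      out.push_back(std::move(current));
      current = Command{"RPUSH", key, {}};
      current_bytes = 0;
    }
    current.args.push_back(element);
    current_bytes += element.size();
  }
  if (!current.args.empty()) {
    out.push_back(std::move(current));
  }
  return out;
}
```

Element order survives because `RPUSH` appends to the tail, so replaying the commands in sequence on the target rebuilds the list. A hash could presumably be split the same way into several `HSET key field value ...` commands.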

This PR yields between invocations and adds a test verifying that huge values can be modified while their slots are being migrated to other nodes. Still TODO: assert on RSS size during migration (this will probably require bigger containers). Fixes #4100

With #4144, slot migration of huge values is broken into multiple commands. This PR adds yields between those commands. It also adds a test that checks that modifying huge values during a migration works correctly and that RSS does not grow too much. Fixes #4100
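Yielding between the chunked commands is what lets other work, such as client writes to the same key, run while a huge value is being migrated. The C++ sketch below shows only that control flow. `SendWithYields`, `send` and `yield` are hypothetical names; in Dragonfly the hook would presumably be a fiber yield, which is an assumption here. The sketch does not show how concurrent modifications of the value are reconciled, which is what the PR's test exercises.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical driver: send the pre-chunked commands of one huge value,
// giving other work a chance to run between consecutive commands. The
// `yield` callback stands in for whatever suspension point the server uses.
void SendWithYields(const std::vector<std::string>& commands,
                    const std::function<void(const std::string&)>& send,
                    const std::function<void()>& yield) {
  assert(send && yield);
  for (std::size_t i = 0; i < commands.size(); ++i) {
    send(commands[i]);
    if (i + 1 < commands.size()) {
      yield();  // e.g. let client writes to this key proceed mid-migration
    }
  }
}
```

No yield happens after the last command, since the value is fully sent at that point and the caller can move on to the next key.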

adiholden added this to the dfly cluster v4 milestone Nov 10, 2024

adiholden assigned chakaz Nov 10, 2024

chakaz mentioned this issue Nov 12, 2024

chore: Reduce memory consumption when migrating huge values #4119

Closed

chakaz mentioned this issue Nov 18, 2024

feat: Huge values breakdown in cluster migration #4144

Merged

chakaz linked a pull request Nov 26, 2024 that will close this issue

feat: Yield inside huge values migration serialization #4197

Open
