Limit RW separation to remote store enabled clusters and update recovery flow #16760

Open
wants to merge 3 commits into
base: main

mch2
Member

@mch2 mch2 commented Dec 2, 2024

Description

This PR includes multiple changes to search replica recovery to further decouple these shards from primaries.

  1. Recover search replicas from an empty store instead of from a peer. This runs a store recovery that syncs segments directly from the remote store and eliminates any communication with the primary.
  2. Remove search replicas from the in-sync allocation ID set and update the routing table to exclude them from allAllocationIds. This ensures primaries neither track search replicas nor validate the routing table for their presence.
  3. Simplify RW separation by limiting it to remote store enabled clusters. Versions of the above changes are still possible with primary-based node-to-node replication, but they require additional public API changes, and I don't think we need them at this time.
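
The three changes above can be illustrated with a minimal, self-contained sketch. It does not use the OpenSearch classes touched by this PR: `ShardCopy`, `RecoverySource`, `replicaRecoverySource`, and `trackedAllocationIds` are hypothetical stand-ins invented for illustration, and the exception type is an assumption. It only models the decisions described: search replicas recover from an empty store, they require a remote store enabled cluster, and they are left out of the set of allocation IDs the primary tracks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; these types are hypothetical, not OpenSearch APIs.
class SearchReplicaRecoverySketch {

    // Hypothetical stand-in for the kinds of recovery discussed in the PR.
    enum RecoverySource { EMPTY_STORE, PEER }

    // Hypothetical stand-in for one copy of a shard in the routing table.
    static final class ShardCopy {
        final String allocationId;
        final boolean primary;
        final boolean searchOnly;

        ShardCopy(String allocationId, boolean primary, boolean searchOnly) {
            this.allocationId = allocationId;
            this.primary = primary;
            this.searchOnly = searchOnly;
        }
    }

    // Changes 1 and 3: a search replica recovers from an empty store (segments are
    // then synced from the remote store) and is rejected without a remote store.
    static RecoverySource replicaRecoverySource(ShardCopy copy, boolean remoteStoreEnabled) {
        if (copy.primary) {
            throw new IllegalArgumentException("expected a replica copy");
        }
        if (copy.searchOnly) {
            if (!remoteStoreEnabled) {
                throw new IllegalStateException("search replicas require a remote store enabled cluster");
            }
            return RecoverySource.EMPTY_STORE;
        }
        // Regular (write) replicas keep recovering from the primary in this sketch.
        return RecoverySource.PEER;
    }

    // Change 2: search replicas are excluded from the allocation IDs the primary tracks.
    static List<String> trackedAllocationIds(List<ShardCopy> copies) {
        List<String> ids = new ArrayList<>();
        for (ShardCopy copy : copies) {
            if (!copy.searchOnly) {
                ids.add(copy.allocationId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        ShardCopy primary = new ShardCopy("p0", true, false);
        ShardCopy writeReplica = new ShardCopy("r0", false, false);
        ShardCopy searchReplica = new ShardCopy("s0", false, true);

        System.out.println(replicaRecoverySource(searchReplica, true));
        System.out.println(replicaRecoverySource(writeReplica, true));
        System.out.println(trackedAllocationIds(Arrays.asList(primary, writeReplica, searchReplica)));
    }
}
```

Keeping the remote store check inside the recovery decision is a simplification; the PR describes it as a validation step on search replicas rather than as part of recovery.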

Related Issues

Resolves #15952

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the v2.19.0 label Dec 2, 2024
Contributor

github-actions bot commented Dec 2, 2024

❌ Gradle check result for a932d59:

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Dec 3, 2024

✅ Gradle check result for a932d59: SUCCESS

codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 9 lines in your changes missing coverage. Please review.

Project coverage is 72.10%. Comparing base (d2a1477) to head (8935bc7).
Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...search/cluster/routing/IndexShardRoutingTable.java | 50.00% | 2 Missing and 1 partial ⚠️ |
| ...in/java/org/opensearch/index/shard/IndexShard.java | 25.00% | 2 Missing and 1 partial ⚠️ |
| ...a/org/opensearch/cluster/routing/ShardRouting.java | 33.33% | 0 Missing and 2 partials ⚠️ |
| ...java/org/opensearch/index/shard/StoreRecovery.java | 75.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16760      +/-   ##
============================================
- Coverage     72.16%   72.10%   -0.06%     
+ Complexity    65257    65187      -70     
============================================
  Files          5318     5318              
  Lines        303988   304002      +14     
  Branches      43987    43995       +8     
============================================
- Hits         219358   219196     -162     
- Misses        66674    66821     +147     
- Partials      17956    17985      +29     


This PR includes multiple changes to search replica recovery.
1. Change search-only replica copies to recover from an empty store instead of PEER. This runs a store recovery that syncs segments directly from the remote store and eliminates any communication with the primary.
2. Remove search replicas from the in-sync allocation ID set and update the routing table to exclude them from allAllocationIds. This ensures primaries neither track search replicas nor validate the routing table for their presence.
3. Change search replica validation to require remote store. Versions of the above changes are still possible with primary-based node-to-node replication, but I don't think they are worth making at this time.

Signed-off-by: Marc Handalian <[email protected]>
@mch2 mch2 added the backport 2.x label Dec 3, 2024
Contributor

github-actions bot commented Dec 3, 2024

✅ Gradle check result for 8935bc7: SUCCESS

Labels
backport 2.x, v2.19.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RW Separation] Change search replica recovery flow
1 participant