-
Notifications
You must be signed in to change notification settings - Fork 21
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Implement GC_REPL_REQ Based on DSN to Prevent Resource Leaks (#576)
* Implement GC_REPL_REQ Based on DSN to Prevent Resource Leaks This commit introduces a mechanism to garbage collect (GC) replication requests (rreqs) that may hang indefinitely, thereby consuming memory and disk resources unnecessarily. These rreqs can enter a hanging state under several circumstances, as outlined below: 1. Scenario with Delayed Commit: - Follower F1 receives LSN 100 and DSN 104 from Leader L1 and takes longer than the raft timeout to precommit/commit it. - L1 resends LSN 100, causing F1 to fetch the data again. Since LSN 100 was committed in a previous attempt, this log entry is skipped, leaving the rreq hanging indefinitely. 2. Scenario with Leader Failure Before Data Completion: - Follower F1 receives LSN 100 from L1, but before all data is fetched/pushed, L1 fails and L2 becomes the new leader. - L2 resends LSN 100 with L2 as the new originator. F1 proceeds with the new rreq and commits it, but the initial rreq from L1 hangs indefinitely as it cannot fetch data from the new leader L2. 3. Scenario with Leader Failure After Data Write: - Follower F1 receives data (DSN 104) from L1 and writes it. Before the log of LSN 100 reaches F1, L1 fails and L2 becomes the new leader. - L2 resends LSN 100 to F1, and F1 fetches DSN 104 from L2, leaving the original rreq hanging. This garbage collection process cleans up based on DSN. Any rreqs in `m_repl_key_req_map`, whose DSN is already committed (`rreq->dsn < repl_dev->m_next_dsn`), will be GC'd. This is safe on the follower side, as the follower updates `m_next_dsn` during commit. Any DSN below `cur_dsn` should already be committed, implying that the rreq should already be removed from `m_repl_key_req_map`. On the leader side, since `m_next_dsn` is updated when sending out the proposal, it is not safe to clean up based on `m_next_dsn`. Therefore, we explicitly skip the leader in this GC process. Skipping localize raft logs we already committed. Leader may send duplicate raft logs, if we localize them unconditionally duplicate data will be written to chunk during fetch_data. It is safe for us to skip those logs that already committed, there is no way those LSN can be over-written. Signed-off-by: Xiaoxi Chen <[email protected]>
- Loading branch information
1 parent
804cd6b
commit 50f42ff
Showing
5 changed files
with
80 additions
and
34 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters