Skip to content

Commit

Permalink
Improve the robustness of old ZooKeeper log removal (line#1040)
Browse files Browse the repository at this point in the history
Motivation:

`OldLogRemover` in `ZooKeeperCommandExecutor` currently catches a `Throwable` when deleting an old log or its log blocks. However, it has two issues doing so:

- It doesn't handle an exception that's raised when reading the metadata of the old log.
- `Throwable` is way too wide exception to catch. Catching a `KeeperException` whose code is `NONODE` will be enough.
  - Note that the failure will only transfer the leadership to other replica, rather than stopping the whole replication process.

Modifications:

- `OldLogRemover` now catches `KeeperException` whose code is `NONODE` only.
- An attempt to read a missing log node's metadata is now handled properly.
- Added more detail to the log messages about missing nodes
  - Split `deleteLog()` into `deleteLog()` and `deleteLogBlock()`

Result:

- The leadership is not transferred anymore when `OldLogRemover` attempts to retrieve a missing log node's metadata, which is not really a critical issue.
  - Instead, the leadership will be transferred when an exception occurs not because of a missing node.
  • Loading branch information
trustin authored Oct 29, 2024
1 parent 98e0446 commit 9a9e098
Showing 1 changed file with 54 additions and 15 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,6 @@
import org.apache.curator.RetryPolicy;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.api.transaction.CuratorOp;
import org.apache.curator.framework.imps.CuratorFrameworkState;
import org.apache.curator.framework.recipes.cache.PathChildrenCache;
import org.apache.curator.framework.recipes.cache.PathChildrenCacheEvent;
Expand Down Expand Up @@ -248,38 +247,78 @@ private void deleteLogs() throws Exception {

final long minAllowedTimestamp = System.currentTimeMillis() - cfg.minLogAgeMillis();
final int targetCount = children.size() - cfg.maxLogCount();
final List<String> deleted = new ArrayList<>(targetCount);
final List<String> deletedPaths = new ArrayList<>(targetCount);
children.sort(Comparator.comparingLong(Long::parseLong));
try {
for (int i = 0; i < targetCount; ++i) {
final String logPath = absolutePath(LOG_PATH, children.get(i));
final LogMeta meta = Jackson.readValue(curator.getData().forPath(logPath), LogMeta.class);
final String childName = children.get(i);
final String logPath = absolutePath(LOG_PATH, childName);
final LogMeta logMeta = readLogMeta(logPath);
if (logMeta == null) {
continue;
}

if (meta.timestamp() >= minAllowedTimestamp) {
if (logMeta.timestamp() >= minAllowedTimestamp) {
// Do not delete the logs that are not old enough.
// We can break the loop here because the 'children' has been sorted by
// insertion order (sequence value).
break;
}

deleteLog(curator.transactionOp().delete().forPath(logPath), deleted, children.get(i));
for (long blockId : meta.blocks()) {
// Delete the log blocks first, so that we never have dangling log blocks.
for (long blockId : logMeta.blocks()) {
final String blockPath = absolutePath(LOG_BLOCK_PATH) + '/' + pathFromRevision(blockId);
deleteLog(curator.transactionOp().delete().forPath(blockPath),
deleted, children.get(i));
deleteLogBlock(logPath, logMeta, blockPath, deletedPaths);
}
deleteLog(logPath, logMeta, deletedPaths);
}
} finally {
logger.info("delete logs: {}", deleted);
logger.info("Deleted ZooKeeper nodes: {}", deletedPaths);
}
}

private void deleteLog(CuratorOp curatorOp, List<String> deleted, String childName) {
@Nullable
private LogMeta readLogMeta(String logPath) throws Exception {
try {
curator.transaction().forOperations(curatorOp);
deleted.add(childName);
} catch (Throwable t) {
logger.warn("Failed to delete ZooKeeper log: {}", childName, t);
return Jackson.readValue(curator.getData().forPath(logPath), LogMeta.class);
} catch (KeeperException e) {
if (e.code() == KeeperException.Code.NONODE) {
logger.warn("Attempted to read a missing log from ZooKeeper; " +
"maybe deleted already? logPath: {}", logPath, e);
return null;
} else {
throw e;
}
}
}

private void deleteLog(String logPath, LogMeta logMeta, List<String> deletedPaths) throws Exception {
try {
curator.delete().forPath(logPath);
deletedPaths.add(logPath);
} catch (KeeperException e) {
if (e.code() == KeeperException.Code.NONODE) {
logger.warn("Attempted to delete a missing log from ZooKeeper; " +
"maybe deleted already? logPath: {}, logMeta: {}", logPath, logMeta, e);
} else {
throw e;
}
}
}

private void deleteLogBlock(String logPath, LogMeta logMeta, String blockPath,
List<String> deletedPaths) throws Exception {
try {
curator.delete().forPath(blockPath);
deletedPaths.add(blockPath);
} catch (KeeperException e) {
if (e.code() == KeeperException.Code.NONODE) {
logger.warn("Attempted to delete a missing log block from ZooKeeper; " +
"maybe deleted already? blockPath: {}, logPath: {}, logMeta: {}",
blockPath, logPath, logMeta, e);
} else {
throw e;
}
}
}
}
Expand Down

0 comments on commit 9a9e098

Please sign in to comment.