perf: enhance the loading process of replicas particularly when a significant number of replicas are spread across multiple disks (#2078)

Immediately after the replica server is started, all of the replicas under the data directory are loaded. Currently, a loading task is launched for each replica directory. The tasks are pushed into a partitioned thread pool (namely `THREAD_POOL_REPLICATION`) disk by disk: only after all of the replica directories on the current disk have been pushed into the thread pool does the process move on to the next disk. Because the thread pool is partitioned and each task's hash is the number of tasks added to the pool before it, all of the replica directories on one disk are loaded concurrently.

This leads to two problems once there are a great number of replica directories on each disk:
- I/O usage for each disk might become saturated: its `%util` might reach 100%;
- The entire loading process is blocked on one disk at a time: for a long period only one disk is busy while the others are idle.

The replica server appears to be stuck loading replicas after it is started. This is unacceptable and should be changed.

The improved version allows the replica directories on different disks to be loaded simultaneously, so that every disk is busy loading replicas. Also, loading tasks are pushed into a non-partitioned thread pool (i.e. `THREAD_POOL_LOCAL_APP`) instead of the partitioned one, so that tasks are balanced automatically across threads and no thread is starved while others are overloaded. A new parameter is added to restrict the maximum number of replicas allowed to be loaded simultaneously on each disk, to keep the I/O usage of each disk from becoming saturated.
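As a rough illustration of why the old scheme keeps only one disk busy at a time, the toy model below queues tasks disk by disk and assigns task number `i` to worker `i % num_workers`. That mapping, the fixed one-round task duration, and the name `simulate_partitioned_pool` are assumptions made for this sketch; it is written in Python and is not the project's code.

```python
from collections import deque


def simulate_partitioned_pool(dirs_by_disk, num_workers):
    """Return, for each round, the set of disks that are being read.

    Toy model (an assumption, not the real thread pool): task number i is
    queued on worker i % num_workers, every task takes exactly one round,
    and each worker drains its own queue in FIFO order.
    """
    queues = [deque() for _ in range(num_workers)]
    task_count = 0
    # Tasks are pushed disk by disk, as the old loading path is described.
    for disk, dirs in dirs_by_disk.items():
        for _ in dirs:
            queues[task_count % num_workers].append(disk)
            task_count += 1

    rounds = []
    while any(queues):
        # Each worker with remaining work runs one task in this round.
        rounds.append({q.popleft() for q in queues if q})
    return rounds


layout = {
    "disk1": ["r1", "r2", "r3", "r4"],
    "disk2": ["r5", "r6", "r7", "r8"],
}
rounds = simulate_partitioned_pool(layout, num_workers=2)
```

With two workers and four directories per disk, the first two rounds touch only `disk1` and the last two touch only `disk2`, which matches the one-disk-at-a-time behavior described above.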
Another parameter is added to ensure that the main thread, which waits for all loading tasks to finish, is not blocked on a single task for too long once the number of tasks loading replica directories simultaneously on a single disk has reached its limit.

Parameters are added as follows:

```diff
[replication]
+ max_replicas_on_load_for_each_disk = 256
+ load_replica_max_wait_time_ms = 10
```
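The interplay of the two parameters can be sketched roughly as follows. This is a minimal Python model under stated assumptions, not the C++ implementation in this commit: `load_all_replicas`, `fake_load`, and the `workers` argument are hypothetical names, `max_per_disk` stands in for `max_replicas_on_load_for_each_disk`, and `max_wait_ms` stands in for `load_replica_max_wait_time_ms`.

```python
import threading
import time
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def load_all_replicas(dirs_by_disk, load_fn, max_per_disk, max_wait_ms, workers=8):
    """Load every directory with at most max_per_disk tasks in flight per disk."""
    pending = {disk: deque(dirs) for disk, dirs in dirs_by_disk.items()}
    in_flight = {disk: set() for disk in dirs_by_disk}
    loaded = []
    # One shared (non-partitioned) pool serves the tasks of all disks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while any(pending.values()):
            # Visit the disks in turn so that each of them gets work.
            for disk, queue in pending.items():
                if not queue:
                    continue
                # Forget the tasks that have already finished on this disk.
                finished = {f for f in in_flight[disk] if f.done()}
                in_flight[disk] -= finished
                loaded.extend(f.result() for f in finished)
                if len(in_flight[disk]) >= max_per_disk:
                    # The disk is at its limit: wait for a bounded time only,
                    # then move on to the next disk instead of blocking here.
                    wait(in_flight[disk], timeout=max_wait_ms / 1000,
                         return_when=FIRST_COMPLETED)
                    continue
                in_flight[disk].add(pool.submit(load_fn, disk, queue.popleft()))
        # Everything has been submitted; collect the remaining results.
        for futures in in_flight.values():
            loaded.extend(f.result() for f in futures)
    return loaded


# Hypothetical loader that records the peak concurrency seen on each disk.
active, peak, lock = {}, {}, threading.Lock()


def fake_load(disk, replica_dir):
    with lock:
        active[disk] = active.get(disk, 0) + 1
        peak[disk] = max(peak.get(disk, 0), active[disk])
    time.sleep(0.002)  # stands in for the disk I/O of loading one replica
    with lock:
        active[disk] -= 1
    return f"{disk}/{replica_dir}"


layout = {d: [f"replica{i}" for i in range(20)] for d in ("disk1", "disk2")}
loaded = load_all_replicas(layout, fake_load, max_per_disk=3, max_wait_ms=10)
```

In this model the per-disk cap bounds how many loads hit one disk at once, while the bounded wait lets the submitting thread keep feeding the other disks when one disk is at its limit.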