perf: enhance the loading process of replicas particularly when a significant number of replicas are spread across multiple disks (#2078)

Immediately after the replica server is started, all of the replicas under the data
directory are loaded. Currently, a loading task is launched for each replica
directory. These tasks are pushed into a partitioned thread pool (namely
`THREAD_POOL_REPLICATION`) disk by disk: only after all of the replica directories
on the current disk have been pushed into the thread pool does the process move on
to the next disk. Since the thread pool is partitioned and the hash of each task is
the total number of tasks that were pushed into the pool before it, the tasks of one
disk are spread over all of the threads, so all of the replica directories on one disk
are loaded concurrently. This leads to two problems once there are a great number
of replica directories on each disk:

- I/O usage of each disk might become saturated: its `%util` might reach 100%;
- The entire loading process is blocked on one disk at a time: for a long period only
one disk is busy while the others are idle.

As a result, the replica server appears to be stuck loading replicas after it is started.
This is unacceptable and should be changed.
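
The effect of the old scheduling can be illustrated with a small simulation. This is only a sketch: it assumes that the partitioned pool picks a worker thread as `hash % num_threads`, and the function name `assign_disk_by_disk` is hypothetical rather than taken from the code base.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns, for each worker thread, the ordered list of disk indices whose
// replica directories that thread would load under the old scheme.
// Assumption: a partitioned pool maps a task to thread `hash % num_threads`.
std::vector<std::vector<std::size_t>> assign_disk_by_disk(
    const std::vector<std::size_t> &dirs_per_disk, std::size_t num_threads)
{
    assert(num_threads > 0);
    std::vector<std::vector<std::size_t>> queues(num_threads);
    std::size_t pushed = 0; // number of tasks pushed so far, used as the hash
    for (std::size_t disk = 0; disk < dirs_per_disk.size(); ++disk) {
        // All directories of the current disk are pushed before the next disk.
        for (std::size_t i = 0; i < dirs_per_disk[disk]; ++i) {
            queues[pushed % num_threads].push_back(disk);
            ++pushed;
        }
    }
    return queues;
}
```

With two disks of four directories each and two threads, every thread's queue starts with tasks for disk 0, so both threads hit disk 0 first while disk 1 stays idle until those tasks drain.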

The improved version allows the replica directories on different disks to be loaded
simultaneously, so that every disk is kept busy loading replicas. Also, loading tasks
are pushed into a non-partitioned thread pool (i.e. `THREAD_POOL_LOCAL_APP`)
instead of the partitioned one, so that tasks are balanced automatically across threads
and no thread is starved while others are overloaded.
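
One way to keep every disk busy is to submit tasks in a round-robin order across disks. The sketch below shows that ordering only; it is an assumption about how the interleaving could be done, not a copy of the commit's scheduling code, and `round_robin_order` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the order in which loading tasks would be submitted, expressed as
// disk indices, taking one replica directory from each disk in turn.
std::vector<std::size_t> round_robin_order(const std::vector<std::size_t> &dirs_per_disk)
{
    std::vector<std::size_t> remaining(dirs_per_disk);
    const std::size_t total =
        std::accumulate(remaining.begin(), remaining.end(), std::size_t{0});

    std::vector<std::size_t> order;
    order.reserve(total);
    while (order.size() < total) {
        for (std::size_t disk = 0; disk < remaining.size(); ++disk) {
            if (remaining[disk] == 0) {
                continue; // this disk has no directories left
            }
            --remaining[disk];
            order.push_back(disk);
        }
    }
    assert(order.size() == total);
    return order;
}
```

Because the pool is no longer partitioned, any idle thread can pick up the next task in this order, so consecutive tasks tend to land on different disks.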

A new parameter is added to restrict the max number of replicas allowed to be loaded
simultaneously on each disk, in case the I/O usage of a disk becomes saturated.
Another parameter is added to ensure that the main thread, which waits for all loading
tasks to finish, is not blocked on one task for too long once the number of replica
directories being loaded simultaneously on a single disk has reached its limit.
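
The per-disk limit can be pictured as a counter of in-flight loading tasks for each disk. The class below is a minimal single-threaded sketch with hypothetical names (`per_disk_limiter`, `try_acquire`, `release`); a real implementation would need synchronization and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tracks how many loading tasks are in flight on each disk and refuses to
// start another one once the per-disk limit has been reached.
class per_disk_limiter
{
public:
    per_disk_limiter(std::size_t num_disks, std::size_t max_per_disk)
        : _in_flight(num_disks, 0), _max_per_disk(max_per_disk)
    {
        assert(max_per_disk > 0);
    }

    // Returns true and counts the task if the disk is below its limit.
    bool try_acquire(std::size_t disk)
    {
        if (_in_flight.at(disk) >= _max_per_disk) {
            return false;
        }
        ++_in_flight[disk];
        return true;
    }

    // Called when a loading task on the disk has finished.
    void release(std::size_t disk)
    {
        assert(_in_flight.at(disk) > 0);
        --_in_flight[disk];
    }

    std::size_t in_flight(std::size_t disk) const { return _in_flight.at(disk); }

private:
    std::vector<std::size_t> _in_flight;
    std::size_t _max_per_disk;
};
```

In this picture, when `try_acquire` fails for a disk, the dispatching thread would wait on one of that disk's outstanding tasks for at most the configured wait time and then move on to the next disk instead of blocking.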

Parameters are added as follows:

```diff
[replication]
+ max_replicas_on_load_for_each_disk = 256
+ load_replica_max_wait_time_ms = 10
```
empiredan authored Dec 26, 2024
1 parent 7ceef5d commit d711b08
Showing 15 changed files with 1,100 additions and 235 deletions.
2 changes: 1 addition & 1 deletion .clang-tidy
@@ -20,7 +20,7 @@
CheckOptions: []
# Disable some checks that are not useful for us now.
# They are sorted by names, and should be consistent to build_tools/clang_tidy.py.
- Checks: 'abseil-*,boost-*,bugprone-*,cert-*,clang-analyzer-*,concurrency-*,cppcoreguidelines-*,darwin-*,fuchsia-*,google-*,hicpp-*,linuxkernel-*,llvm-*,misc-*,modernize-*,performance-*,portability-*,readability-*,-bugprone-easily-swappable-parameters,-bugprone-lambda-function-name,-bugprone-macro-parentheses,-cert-err58-cpp,-concurrency-mt-unsafe,-cppcoreguidelines-avoid-c-arrays,-cppcoreguidelines-avoid-magic-numbers,-cppcoreguidelines-avoid-non-const-global-variables,-cppcoreguidelines-macro-usage,-cppcoreguidelines-non-private-member-variables-in-classes,-cppcoreguidelines-owning-memory,-cppcoreguidelines-pro-bounds-array-to-pointer-decay,-cppcoreguidelines-pro-bounds-pointer-arithmetic,-cppcoreguidelines-pro-type-const-cast,-cppcoreguidelines-pro-type-union-access,-fuchsia-default-arguments-calls,-fuchsia-overloaded-operator,-fuchsia-statically-constructed-objects,-google-readability-avoid-underscore-in-googletest-name,-hicpp-avoid-c-arrays,-hicpp-named-parameter,-hicpp-no-array-decay,-llvm-include-order,-misc-definitions-in-headers,-misc-non-private-member-variables-in-classes,-misc-unused-parameters,-modernize-avoid-c-arrays,-modernize-replace-disallow-copy-and-assign-macro,-modernize-use-trailing-return-type,-performance-unnecessary-value-param,-readability-function-cognitive-complexity,-readability-identifier-length,-readability-magic-numbers,-readability-named-parameter'
+ Checks: 'abseil-*,boost-*,bugprone-*,cert-*,clang-analyzer-*,concurrency-*,cppcoreguidelines-*,darwin-*,fuchsia-*,google-*,hicpp-*,linuxkernel-*,llvm-*,misc-*,modernize-*,performance-*,portability-*,readability-*,-bugprone-easily-swappable-parameters,-bugprone-lambda-function-name,-bugprone-macro-parentheses,-cert-err58-cpp,-concurrency-mt-unsafe,-cppcoreguidelines-avoid-c-arrays,-cppcoreguidelines-avoid-magic-numbers,-cppcoreguidelines-avoid-non-const-global-variables,-cppcoreguidelines-macro-usage,-cppcoreguidelines-non-private-member-variables-in-classes,-cppcoreguidelines-owning-memory,-cppcoreguidelines-pro-bounds-array-to-pointer-decay,-cppcoreguidelines-pro-bounds-pointer-arithmetic,-cppcoreguidelines-pro-type-const-cast,-cppcoreguidelines-pro-type-union-access,-fuchsia-default-arguments-calls,-fuchsia-overloaded-operator,-fuchsia-statically-constructed-objects,-google-readability-avoid-underscore-in-googletest-name,-hicpp-avoid-c-arrays,-hicpp-named-parameter,-hicpp-no-array-decay,-llvm-include-order,-misc-definitions-in-headers,-misc-non-private-member-variables-in-classes,-misc-unused-parameters,-modernize-avoid-bind,-modernize-avoid-c-arrays,-modernize-replace-disallow-copy-and-assign-macro,-modernize-use-trailing-return-type,-performance-unnecessary-value-param,-readability-function-cognitive-complexity,-readability-identifier-length,-readability-magic-numbers,-readability-named-parameter,-readability-suspicious-call-argument'
ExtraArgs:
ExtraArgsBefore: []
FormatStyle: none
4 changes: 3 additions & 1 deletion build_tools/clang_tidy.py
@@ -88,14 +88,16 @@ def tidy_on_path(path):
"-misc-definitions-in-headers,"
"-misc-non-private-member-variables-in-classes,"
"-misc-unused-parameters,"
+ "-modernize-avoid-bind,"
"-modernize-avoid-c-arrays,"
"-modernize-replace-disallow-copy-and-assign-macro,"
"-modernize-use-trailing-return-type,"
"-performance-unnecessary-value-param,"
"-readability-function-cognitive-complexity,"
"-readability-identifier-length,"
"-readability-magic-numbers,"
- "-readability-named-parameter",
+ "-readability-named-parameter,"
+ "-readability-suspicious-call-argument",
"-extra-arg=-language=c++",
"-extra-arg=-std=c++17",
"-extra-arg=-Ithirdparty/output/include"]
2 changes: 1 addition & 1 deletion src/common/replication.codes.h
@@ -141,7 +141,6 @@ MAKE_EVENT_CODE(LPC_META_STATE_NORMAL, TASK_PRIORITY_COMMON)

// THREAD_POOL_REPLICATION
#define CURRENT_THREAD_POOL THREAD_POOL_REPLICATION
- MAKE_EVENT_CODE(LPC_REPLICATION_INIT_LOAD, TASK_PRIORITY_COMMON)
MAKE_EVENT_CODE(RPC_REPLICATION_WRITE_EMPTY, TASK_PRIORITY_COMMON)
MAKE_EVENT_CODE(LPC_PER_REPLICA_CHECKPOINT_TIMER, TASK_PRIORITY_COMMON)
MAKE_EVENT_CODE(LPC_PER_REPLICA_COLLECT_INFO_TIMER, TASK_PRIORITY_COMMON)
@@ -186,6 +185,7 @@ MAKE_EVENT_CODE(LPC_REPLICATION_HIGH, TASK_PRIORITY_HIGH)

// THREAD_POOL_LOCAL_APP
#define CURRENT_THREAD_POOL THREAD_POOL_LOCAL_APP
+ MAKE_EVENT_CODE(LPC_REPLICATION_INIT_LOAD, TASK_PRIORITY_COMMON)
MAKE_EVENT_CODE(LPC_WRITE, TASK_PRIORITY_COMMON)
MAKE_EVENT_CODE(LPC_read_THROTTLING_DELAY, TASK_PRIORITY_COMMON)
#undef CURRENT_THREAD_POOL
1 change: 1 addition & 0 deletions src/meta/meta_service.cpp
@@ -29,6 +29,7 @@
#include <boost/lexical_cast.hpp>
#include <algorithm> // for std::remove_if
#include <chrono>
+ #include <cstdint>
#include <functional>
#include <ostream>
#include <string_view>
1 change: 1 addition & 0 deletions src/replica/replica.h
@@ -610,6 +610,7 @@ class replica : public serverlet<replica>, public ref_counter, public replica_ba
friend class replica_disk_test;
friend class replica_disk_migrate_test;
friend class open_replica_test;
+ friend class mock_load_replica;
friend class replica_follower;
friend class ::pegasus::server::pegasus_server_test_base;
friend class ::pegasus::server::rocksdb_wrapper_test;