Disclaimer: This is not an officially supported Google product.
#include "simple_rcu/rcu.h"
// Shared object that provides an instance of `MyType`:
auto rcu = std::make_shared<simple_rcu::Rcu<MyType>>();
// Any thread can atomically update the value (can be also a `unique_ptr`,
// which auto-converts to `shared_ptr`). This is a relatively slow operation
// (takes a lock internally).
rcu->Update(std::make_shared<MyType>(...));
// Afterwards each reader thread can fetch a const pointer to a snapshot of the
// instance. This is a lock-free operation.
// See benchmark `BM_ReadSharedPtrsThreadLocal` below.
auto ref = simple_rcu::ReadPtr(rcu);
// `ref` now holds a `unique_ptr` to a stable, thread-local snapshot of
// `const MyType`.
// Each reader thread creates a local accessor to `rcu`, which will hold
// snapshots of `MyType`.
simple_rcu::Rcu<MyType>::View local(rcu);
// Afterwards the reader thread can repeatedly fetch a const pointer to a
// snapshot of the instance. This is faster than the simple usage above (see
// benchmark `BM_Reads` below), since it avoids the internal bookkeeping cost of a
// `thread_local` variable, at the cost of explicitly maintaining a `View`
// variable. It effectively involves only a single atomic exchange
// (https://en.cppreference.com/w/cpp/atomic/atomic/exchange) instruction.
auto ref = local.ReadPtr();
// `ref` now holds a `unique_ptr` to a stable, thread-local snapshot of
// `const MyType`.
See copy_rcu_test.cc for more examples.
cmake
(https://cmake.org/).- https://github.com/abseil/abseil-cpp (Git submodule)
Testing dependencies:
- https://github.com/google/benchmark (Git submodule)
- https://github.com/google/googletest (Git submodule)
The numbers /1/
or /4/
after a benchmark name denote the number of
concurrent running threads performing the opposite operation - updates when
benchmarking reads and vice versa.
As shown below, reads are very fast and their performance suffers neither from running concurrent read (1 to 3) nor update (1 or 4) threads. In fact running multiple read operations benefits from CPU caching.
Updates are much slower, as expected. Since (unlike reads) they acquire a standard mutex, lock contention occurs when there are multiple concurrent update operations. Also, when there are multiple concurrent readers, updates become slower, since they need to distribute values to the readers' thread-local copies.
g++
on Core i5:-
2023-03-08T19:37:57+01:00 Running build/rel-gcc/simple_rcu/copy_rcu_benchmark Run on (4 X 3572.69 MHz CPU s) CPU Caches: L1 Data 32 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 256 KiB (x4) L3 Unified 6144 KiB (x1) Load Average: 0.33, 1.50, 1.26 ------------------------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------------------------ BM_Reads/1/threads:1 88.0 ns 88.0 ns 7925289 BM_Reads/1/threads:2 17.7 ns 35.4 ns 21247450 BM_Reads/1/threads:4 6.64 ns 22.5 ns 35341128 BM_Reads/1/threads:8 2.30 ns 10.1 ns 61592216 BM_Reads/1/threads:16 1.58 ns 7.43 ns 87989696 BM_Reads/1/threads:32 1.27 ns 6.74 ns 94589952 BM_Reads/1/threads:64 0.841 ns 6.64 ns 106161024 BM_Reads/4/threads:1 84.3 ns 84.3 ns 11161830 BM_Reads/4/threads:2 6.84 ns 13.7 ns 39595196 BM_Reads/4/threads:4 4.89 ns 15.8 ns 55958100 BM_Reads/4/threads:8 2.11 ns 8.99 ns 69275648 BM_Reads/4/threads:16 1.59 ns 7.79 ns 67754640 BM_Reads/4/threads:32 1.49 ns 7.00 ns 98964928 BM_Reads/4/threads:64 0.735 ns 6.54 ns 103060736 BM_ReadSharedPtrs/1/threads:1 62.9 ns 62.9 ns 11343017 BM_ReadSharedPtrs/1/threads:2 12.4 ns 24.9 ns 28810718 BM_ReadSharedPtrs/1/threads:4 4.03 ns 13.6 ns 47565036 BM_ReadSharedPtrs/1/threads:8 2.02 ns 8.41 ns 73046840 BM_ReadSharedPtrs/1/threads:16 1.62 ns 7.22 ns 96594224 BM_ReadSharedPtrs/1/threads:32 1.37 ns 6.76 ns 98965440 BM_ReadSharedPtrs/1/threads:64 0.925 ns 6.57 ns 102769984 BM_ReadSharedPtrs/4/threads:1 50.0 ns 50.0 ns 10000000 BM_ReadSharedPtrs/4/threads:2 5.64 ns 11.2 ns 56381504 BM_ReadSharedPtrs/4/threads:4 3.83 ns 11.2 ns 45073088 BM_ReadSharedPtrs/4/threads:8 2.17 ns 8.11 ns 78239648 BM_ReadSharedPtrs/4/threads:16 1.72 ns 7.28 ns 89113344 BM_ReadSharedPtrs/4/threads:32 1.46 ns 6.82 ns 99200224 BM_ReadSharedPtrs/4/threads:64 1.02 ns 6.52 ns 104567680 BM_ReadSharedPtrsThreadLocal/1/threads:1 101 ns 101 ns 6553118 BM_ReadSharedPtrsThreadLocal/1/threads:2 51.5 ns 103 ns 6001084 BM_ReadSharedPtrsThreadLocal/1/threads:4 41.2 ns 134 ns 5285084 BM_ReadSharedPtrsThreadLocal/1/threads:8 35.4 ns 138 ns 5366944 BM_ReadSharedPtrsThreadLocal/1/threads:16 26.3 ns 128 ns 5000608 BM_ReadSharedPtrsThreadLocal/1/threads:32 28.3 ns 142 ns 5779744 BM_ReadSharedPtrsThreadLocal/1/threads:64 19.2 ns 126 ns 5579776 BM_ReadSharedPtrsThreadLocal/4/threads:1 113 ns 113 ns 5932706 BM_ReadSharedPtrsThreadLocal/4/threads:2 64.4 ns 129 ns 5636694 BM_ReadSharedPtrsThreadLocal/4/threads:4 45.8 ns 129 ns 4000000 BM_ReadSharedPtrsThreadLocal/4/threads:8 34.4 ns 127 ns 5632464 BM_ReadSharedPtrsThreadLocal/4/threads:16 28.4 ns 121 ns 4951344 BM_ReadSharedPtrsThreadLocal/4/threads:32 33.7 ns 139 ns 5735200 BM_ReadSharedPtrsThreadLocal/4/threads:64 25.9 ns 137 ns 6150656 BM_Updates/1/threads:1 126 ns 126 ns 5545689 BM_Updates/1/threads:2 136 ns 271 ns 2289246 BM_Updates/1/threads:4 133 ns 320 ns 1883372 BM_Updates/1/threads:8 114 ns 145 ns 3911272 BM_Updates/1/threads:16 98.1 ns 138 ns 5120464 BM_Updates/1/threads:32 92.4 ns 146 ns 3935488 BM_Updates/1/threads:64 73.1 ns 141 ns 4169792 BM_Updates/4/threads:1 394 ns 310 ns 2319023 BM_Updates/4/threads:2 320 ns 403 ns 1572682 BM_Updates/4/threads:4 347 ns 460 ns 2120880 BM_Updates/4/threads:8 317 ns 337 ns 2066568 BM_Updates/4/threads:16 279 ns 341 ns 1708928 BM_Updates/4/threads:32 265 ns 330 ns 2127232 BM_Updates/4/threads:64 233 ns 321 ns 2190336
clang++11
on Core i5:-
2023-03-08T19:38:30+01:00 Running build/rel-clang11/simple_rcu/copy_rcu_benchmark Run on (4 X 1694.88 MHz CPU s) CPU Caches: L1 Data 32 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 256 KiB (x4) L3 Unified 6144 KiB (x1) Load Average: 4.90, 2.46, 1.58 ------------------------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------------------------ BM_Reads/1/threads:1 87.7 ns 87.7 ns 7955889 BM_Reads/1/threads:2 15.3 ns 30.6 ns 23136186 BM_Reads/1/threads:4 7.42 ns 25.2 ns 28852784 BM_Reads/1/threads:8 2.81 ns 12.1 ns 48782816 BM_Reads/1/threads:16 1.43 ns 7.65 ns 70177200 BM_Reads/1/threads:32 1.33 ns 6.80 ns 97955488 BM_Reads/1/threads:64 0.952 ns 6.61 ns 98974848 BM_Reads/4/threads:1 58.5 ns 58.5 ns 9621076 BM_Reads/4/threads:2 8.63 ns 17.2 ns 39341958 BM_Reads/4/threads:4 6.68 ns 20.9 ns 49974640 BM_Reads/4/threads:8 4.36 ns 15.6 ns 46642208 BM_Reads/4/threads:16 1.78 ns 8.27 ns 68227440 BM_Reads/4/threads:32 1.35 ns 7.09 ns 94126560 BM_Reads/4/threads:64 0.876 ns 6.91 ns 99590016 BM_ReadSharedPtrs/1/threads:1 63.4 ns 63.4 ns 11204514 BM_ReadSharedPtrs/1/threads:2 12.6 ns 25.2 ns 20000000 BM_ReadSharedPtrs/1/threads:4 5.01 ns 16.4 ns 38396812 BM_ReadSharedPtrs/1/threads:8 2.87 ns 11.9 ns 61796344 BM_ReadSharedPtrs/1/threads:16 2.02 ns 9.59 ns 78273408 BM_ReadSharedPtrs/1/threads:32 1.48 ns 7.91 ns 80529600 BM_ReadSharedPtrs/1/threads:64 1.25 ns 10.6 ns 56653504 BM_ReadSharedPtrs/4/threads:1 63.8 ns 63.8 ns 10000000 BM_ReadSharedPtrs/4/threads:2 5.83 ns 11.6 ns 68954404 BM_ReadSharedPtrs/4/threads:4 4.44 ns 13.9 ns 58391728 BM_ReadSharedPtrs/4/threads:8 2.75 ns 11.3 ns 53274936 BM_ReadSharedPtrs/4/threads:16 2.66 ns 10.8 ns 93864160 BM_ReadSharedPtrs/4/threads:32 2.44 ns 10.4 ns 64900896 BM_ReadSharedPtrs/4/threads:64 0.823 ns 7.14 ns 77218560 BM_ReadSharedPtrsThreadLocal/1/threads:1 134 ns 134 ns 5011638 BM_ReadSharedPtrsThreadLocal/1/threads:2 70.4 ns 141 ns 6701984 BM_ReadSharedPtrsThreadLocal/1/threads:4 43.2 ns 136 ns 4510720 BM_ReadSharedPtrsThreadLocal/1/threads:8 41.4 ns 160 ns 5049376 BM_ReadSharedPtrsThreadLocal/1/threads:16 35.2 ns 146 ns 4635376 BM_ReadSharedPtrsThreadLocal/1/threads:32 32.4 ns 149 ns 4752544 BM_ReadSharedPtrsThreadLocal/1/threads:64 18.8 ns 146 ns 5139008 BM_ReadSharedPtrsThreadLocal/4/threads:1 156 ns 156 ns 4373200 BM_ReadSharedPtrsThreadLocal/4/threads:2 84.6 ns 159 ns 4440214 BM_ReadSharedPtrsThreadLocal/4/threads:4 54.2 ns 155 ns 4642584 BM_ReadSharedPtrsThreadLocal/4/threads:8 44.2 ns 147 ns 4755640 BM_ReadSharedPtrsThreadLocal/4/threads:16 34.9 ns 147 ns 5078864 BM_ReadSharedPtrsThreadLocal/4/threads:32 37.0 ns 152 ns 4980928 BM_ReadSharedPtrsThreadLocal/4/threads:64 23.4 ns 149 ns 4204736 BM_Updates/1/threads:1 129 ns 129 ns 5429974 BM_Updates/1/threads:2 205 ns 410 ns 1625164 BM_Updates/1/threads:4 134 ns 155 ns 3337368 BM_Updates/1/threads:8 112 ns 138 ns 3871216 BM_Updates/1/threads:16 96.5 ns 142 ns 4074256 BM_Updates/1/threads:32 96.9 ns 142 ns 3858176 BM_Updates/1/threads:64 75.3 ns 147 ns 4228096 BM_Updates/4/threads:1 422 ns 339 ns 2059096 BM_Updates/4/threads:2 464 ns 412 ns 1585970 BM_Updates/4/threads:4 412 ns 419 ns 1312416 BM_Updates/4/threads:8 335 ns 348 ns 1878216 BM_Updates/4/threads:16 318 ns 342 ns 1556608 BM_Updates/4/threads:32 243 ns 331 ns 1952704 BM_Updates/4/threads:64 257 ns 355 ns 2050752
- Build a lock-free metrics collection library upon it.
- Eventually provide bindings and/or similar implementations in other languages: Rust, Haskell, Python, Go, etc.
Please see Code of Conduct and Contributing.
Ideas for contributions:
- Add bindings/implementations in other languages of your choice.
- Improve documentation where needed. This project should ideally have also some didactical value.
- Popularize the project to be useful to others too. In particular there seems to be just a copy-left (LGPLv2.1) user-space C++ RCU library https://liburcu.org/, so having this one under a more permissive license could be attractive for many projects.