Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache last published event #87

Merged
merged 10 commits into from
Aug 13, 2024
Merged

Cache last published event #87

merged 10 commits into from
Aug 13, 2024

Conversation

goodhoko
Copy link
Member

@goodhoko goodhoko commented Aug 7, 2024

and let subscribers receive it upon subscription.

Optionally, we could provide a separate marker trait for cached events and persist only those. I couldn't figure out how to dynamically query for the type (trait) in publish. Somehow downcast_ref is not implemented for Event (despite Event being Any). I'd appreciate some help with that.

This causes performance regression for the pub-sub system. Here's output of cargo bench --bench pub_sub running on this branch (comparing it to the previous run on main):

Details

   Compiling tonari-actor v0.9.0 (/Users/jentak/workspace/tonari/actor)
    Finished `bench` profile [optimized] target(s) in 1.20s
     Running benches/pub_sub.rs (target/release/deps/pub_sub-9cceefaff682ce83)
publish/1 publishers 10 subscribers
                        time:   [2.3908 µs 2.8514 µs 3.4074 µs]
                        thrpt:  [293.48 Kelem/s 350.71 Kelem/s 418.27 Kelem/s]
                 change:
                        time:   [+23.719% +56.164% +97.912%] (p = 0.00 < 0.05)
                        thrpt:  [-49.473% -35.965% -19.171%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
publish/2 publishers 10 subscribers
                        time:   [3.5475 µs 3.9570 µs 4.4083 µs]
                        thrpt:  [226.84 Kelem/s 252.71 Kelem/s 281.89 Kelem/s]
                 change:
                        time:   [+95.062% +134.00% +177.48%] (p = 0.00 < 0.05)
                        thrpt:  [-63.962% -57.265% -48.734%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
publish/10 publishers 10 subscribers
                        time:   [27.460 µs 30.042 µs 32.920 µs]
                        thrpt:  [30.377 Kelem/s 33.287 Kelem/s 36.416 Kelem/s]
                 change:
                        time:   [+84.814% +102.23% +121.67%] (p = 0.00 < 0.05)
                        thrpt:  [-54.888% -50.552% -45.891%]
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
publish/1 publishers 100 subscribers
                        time:   [75.190 µs 81.870 µs 87.810 µs]
                        thrpt:  [11.388 Kelem/s 12.214 Kelem/s 13.300 Kelem/s]
                 change:
                        time:   [-10.861% -5.4355% -0.4260%] (p = 0.03 < 0.05)
                        thrpt:  [+0.4278% +5.7480% +12.184%]
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high severe
publish/2 publishers 100 subscribers
                        time:   [79.105 µs 97.231 µs 115.16 µs]
                        thrpt:  [8.6836 Kelem/s 10.285 Kelem/s 12.641 Kelem/s]
                 change:
                        time:   [-15.101% -0.2671% +14.740%] (p = 0.97 > 0.05)
                        thrpt:  [-12.846% +0.2678% +17.787%]
                        No change in performance detected.
publish/10 publishers 100 subscribers
                        time:   [774.46 µs 781.99 µs 788.91 µs]
                        thrpt:  [1.2676 Kelem/s 1.2788 Kelem/s 1.2912 Kelem/s]
                 change:
                        time:   [+299.05% +448.33% +691.39%] (p = 0.00 < 0.05)
                        thrpt:  [-87.364% -81.763% -74.941%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild

That's likely caused by the contention on SystemHandle::event_subscribers RwLock. Previously it only needed exclusive access when subscribing and shared access when publishing. Now it needs exclusive access when subscribing and publishing.

Closes #86

This is a semver-breaking change (due to the new bounds on Event).

@goodhoko goodhoko requested review from strohel and bschwind August 7, 2024 08:20
@strohel
Copy link
Member

strohel commented Aug 7, 2024

I've pushed 4 commits to this, @goodhoko (and others) PTAL.

It makes sense to review them one-by-one (and check their messages) if you're curious.

It would be also interesting to run cargo bench --bench pub_sub on each commit starting from main till the current tip of this branch.

@strohel strohel requested a review from mbernat August 7, 2024 22:19
Copy link
Member Author

@goodhoko goodhoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@strohel Nice, thanks! This looks good but I have some functional and stylistic remarks and questions.

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
strohel and others added 8 commits August 8, 2024 17:05
…ro-sized-type

The benchmark results are therefore not comparable before/after (they are 10% to 30% slower on my laptop).
and let subscribers receive it upon subscription.

Closes #86
This fixes the performance regression for me, but we need to think whether that's worth the dependency tradeoff.
With a small trick (that unfortunately leaks to a public type) we were able to implement a kind of specialization [1] with perfect behavior: `Event`s don't pay the price for caching, and it is not possible to subscribe_and_receive_latest() to bare `Event`s.

Also, it means that we didn't have to add bounds to `Event` [2].

[1] I thought the orphan rule would prevent us from adding the blanked implementation, but not!
[2] I've sneaked in the removal of `'static`, but that's mostly aesthetic: `std::any::Any` requires it anyway.
...now that we have it in the dependency tree anyway.

Allows us to remove the `RwLock` which in turn allows us to deduplicate subscribe_recipient() vs. subscribe_and_receive_latest().

Let's think about the possible situations of "publishing the first message" vs. "just subscribing and getting latest".
@strohel
Copy link
Member

strohel commented Aug 8, 2024

@goodhoko @bschwind alright I've:

  1. prepended a commit that makes the pub_sub benchmark a bit more realistic (previously the payload was zero-sized-type)
  2. Measured the commit-by-commit performance on P3 Tiny (tonari services stopped) using:
    git rebase main --exec 'cargo bench --bench pub_sub -- --save-baseline "$(git describe | tr "\n" " "; git show -q --format=%s | cut -d" " -f -4)"'
  3. Tabulated the results using critcmp --list:
publish/1 publishers 10 subscribers
-----------------------------------
v0.9.0-1-gd3713d6 Make pub_sub bench more                1.05      10.7±2.83µs  91.5 KElem/sec
v0.9.0-2-gb53014d Cache last published event             1.00      10.2±1.52µs  95.3 KElem/sec
v0.9.0-3-g508a62d Try to use dashmap                     1.01      10.3±1.80µs  94.6 KElem/sec
v0.9.0-4-gc3a0d56 Split `Event` and `CacheableEvent`     1.00      10.2±1.56µs  95.7 KElem/sec
v0.9.0-5-g2485610 Extend pub_sub benchmark with          1.02      10.4±1.81µs  93.7 KElem/sec
v0.9.0-6-gb5d4d3f Use DashMap also for                   1.06      10.8±2.13µs  90.6 KElem/sec
v0.9.0-7-g171a9c7 Drop `dashmap` and reinstate           1.03      10.5±1.39µs  93.1 KElem/sec
v0.9.0-8-g148227a Deduplicate subscription code into     1.02      10.4±1.48µs  93.7 KElem/sec
v0.9.0-9-gba3dba1 Back to dashmap, but                   1.00      10.2±1.63µs  95.7 KElem/sec

publish/1 publishers 10 subscribers (cacheable event type)
----------------------------------------------------------
v0.9.0-5-g2485610 Extend pub_sub benchmark with          1.06      10.6±1.84µs  92.1 KElem/sec
v0.9.0-6-gb5d4d3f Use DashMap also for                   1.00      10.0±1.22µs  97.4 KElem/sec
v0.9.0-7-g171a9c7 Drop `dashmap` and reinstate           1.08      10.9±2.65µs  89.9 KElem/sec
v0.9.0-8-g148227a Deduplicate subscription code into     1.09      10.9±2.08µs  89.5 KElem/sec

publish/2 publishers 10 subscribers
-----------------------------------
v0.9.0-1-gd3713d6 Make pub_sub bench more                1.00       9.4±0.53µs  103.8 KElem/sec
v0.9.0-2-gb53014d Cache last published event             2.81      26.4±7.82µs  36.9 KElem/sec
v0.9.0-3-g508a62d Try to use dashmap                     1.05       9.8±0.62µs  99.3 KElem/sec
v0.9.0-4-gc3a0d56 Split `Event` and `CacheableEvent`     1.02       9.6±0.46µs  102.1 KElem/sec
v0.9.0-5-g2485610 Extend pub_sub benchmark with          1.01       9.5±0.48µs  103.0 KElem/sec
v0.9.0-6-gb5d4d3f Use DashMap also for                   1.00       9.4±0.46µs  103.7 KElem/sec
v0.9.0-7-g171a9c7 Drop `dashmap` and reinstate           1.02       9.6±0.46µs  101.6 KElem/sec
v0.9.0-8-g148227a Deduplicate subscription code into     1.00       9.4±0.43µs  103.5 KElem/sec
v0.9.0-9-gba3dba1 Back to dashmap, but                   1.04       9.7±0.42µs  100.2 KElem/sec

publish/2 publishers 10 subscribers (cacheable event type)
----------------------------------------------------------
v0.9.0-5-g2485610 Extend pub_sub benchmark with          1.00       9.6±0.42µs  101.5 KElem/sec
v0.9.0-6-gb5d4d3f Use DashMap also for                   1.00       9.7±0.50µs  101.2 KElem/sec
v0.9.0-7-g171a9c7 Drop `dashmap` and reinstate           2.87      27.6±5.70µs  35.4 KElem/sec
v0.9.0-8-g148227a Deduplicate subscription code into     2.71      26.1±3.50µs  37.4 KElem/sec

publish/10 publishers 10 subscribers
------------------------------------
v0.9.0-1-gd3713d6 Make pub_sub bench more                1.00       5.2±1.10µs  188.6 KElem/sec
v0.9.0-2-gb53014d Cache last published event            20.98    108.6±25.14µs  9.0 KElem/sec
v0.9.0-3-g508a62d Try to use dashmap                     1.09       5.6±1.00µs  173.5 KElem/sec
v0.9.0-4-gc3a0d56 Split `Event` and `CacheableEvent`     1.02       5.3±1.16µs  184.2 KElem/sec
v0.9.0-5-g2485610 Extend pub_sub benchmark with          1.24       6.4±3.09µs  151.6 KElem/sec
v0.9.0-6-gb5d4d3f Use DashMap also for                   1.06       5.5±1.15µs  177.6 KElem/sec
v0.9.0-7-g171a9c7 Drop `dashmap` and reinstate           1.04       5.4±1.12µs  180.9 KElem/sec
v0.9.0-8-g148227a Deduplicate subscription code into     1.07       5.6±1.42µs  175.6 KElem/sec
v0.9.0-9-gba3dba1 Back to dashmap, but                   1.08       5.6±0.84µs  175.1 KElem/sec

publish/1 publishers 100 subscribers
------------------------------------
v0.9.0-1-gd3713d6 Make pub_sub bench more                1.09    125.9±44.44µs  7.8 KElem/sec
v0.9.0-2-gb53014d Cache last published event             1.08    124.7±55.50µs  7.8 KElem/sec
v0.9.0-3-g508a62d Try to use dashmap                     1.09    125.6±50.27µs  7.8 KElem/sec
v0.9.0-4-gc3a0d56 Split `Event` and `CacheableEvent`     1.05    121.4±34.90µs  8.0 KElem/sec
v0.9.0-5-g2485610 Extend pub_sub benchmark with          1.16    134.8±53.37µs  7.2 KElem/sec
v0.9.0-6-gb5d4d3f Use DashMap also for                   1.10    127.3±42.72µs  7.7 KElem/sec
v0.9.0-7-g171a9c7 Drop `dashmap` and reinstate           1.03    119.8±32.54µs  8.2 KElem/sec
v0.9.0-8-g148227a Deduplicate subscription code into     1.06    123.1±31.26µs  7.9 KElem/sec
v0.9.0-9-gba3dba1 Back to dashmap, but                   1.00    115.8±39.11µs  8.4 KElem/sec

publish/2 publishers 100 subscribers
------------------------------------
v0.9.0-1-gd3713d6 Make pub_sub bench more                1.09    120.6±26.07µs  8.1 KElem/sec
v0.9.0-2-gb53014d Cache last published event             4.22   466.3±150.09µs  2.1 KElem/sec
v0.9.0-3-g508a62d Try to use dashmap                     1.01    111.8±10.41µs  8.7 KElem/sec
v0.9.0-4-gc3a0d56 Split `Event` and `CacheableEvent`     1.05    115.6±17.81µs  8.4 KElem/sec
v0.9.0-5-g2485610 Extend pub_sub benchmark with          1.02     112.6±7.52µs  8.7 KElem/sec
v0.9.0-6-gb5d4d3f Use DashMap also for                   1.09    120.3±20.32µs  8.1 KElem/sec
v0.9.0-7-g171a9c7 Drop `dashmap` and reinstate           1.08    119.7±20.05µs  8.2 KElem/sec
v0.9.0-8-g148227a Deduplicate subscription code into     1.05    115.8±16.05µs  8.4 KElem/sec
v0.9.0-9-gba3dba1 Back to dashmap, but                   1.00     110.6±9.21µs  8.8 KElem/sec

publish/10 publishers 100 subscribers
-------------------------------------
v0.9.0-1-gd3713d6 Make pub_sub bench more                1.00    273.8±23.13µs  3.6 KElem/sec
v0.9.0-2-gb53014d Cache last published event             8.13       2.2±0.66ms  449 Elem/sec
v0.9.0-3-g508a62d Try to use dashmap                     1.04    283.6±36.83µs  3.4 KElem/sec
v0.9.0-4-gc3a0d56 Split `Event` and `CacheableEvent`     1.04    283.5±34.94µs  3.4 KElem/sec
v0.9.0-5-g2485610 Extend pub_sub benchmark with          1.05    286.1±34.07µs  3.4 KElem/sec
v0.9.0-6-gb5d4d3f Use DashMap also for                   1.06    289.5±52.77µs  3.4 KElem/sec
v0.9.0-7-g171a9c7 Drop `dashmap` and reinstate           1.06    289.2±45.56µs  3.4 KElem/sec
v0.9.0-8-g148227a Deduplicate subscription code into     1.03    281.5±28.75µs  3.5 KElem/sec
v0.9.0-9-gba3dba1 Back to dashmap, but                   1.01    275.7±29.10µs  3.5 KElem/sec

Copy link
Member Author

@goodhoko goodhoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, @strohel! Thanks for taking this over, it's a pleasure see you work on something that excites you!

The benchmark results look good! If I read them right, this PR no longer causes noticeable performance regression. This is equally true for using dashmap and intelligent locking (i.e. upgrade()).

There's a significant (and expected) difference in performance of the cacheable event when there are multiple publishers between Dashmap and intelligent locking. IMO that's totally ok!

I gave this one more thorough read and it looks good, though I have couple last remarks.

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@strohel
Copy link
Member

strohel commented Aug 8, 2024

@bschwind @goodhoko (sorry I meant to submit this right after the benchmark results

Benchmark Observations

  • using dashmap for the last event cache completely hides the performance penalty of caching the events
  • even with trying to hold the write lock least amount of time, using RwLock still makes cached messages 3 times slower than non-cached ones in case of 2 subscribers (and probably even slower with more) :-(
  • there's no performance difference between "naive" (1st commit) and "intelligent" locking (the "(cacheable event type)" benches after removing dashmap).
  • using dashmap for the subscriber callback list doesn't have any effect (and we need an RwLock to prevent a race condition anyway)

Conclusion

  • I don't think the Event / CacheableEvent distinction is worth it, if we can have all the goodies (and and even more performance) by just introducing a dependency
    • Update: alright, there are the Send + Sync bounds on event, but I don't think they are limiting in practice (the message types themselves that must implement From<E> already need to be Send + Sync)
  • I'll therefore iterate more to get this to the state around 508a62d Try to use dashmap, just with the deduplication and nicer comments
    • I feel somewhat silly spending so much time for this (I should have properly benchmarked right from start)

@strohel
Copy link
Member

strohel commented Aug 8, 2024

  • I'll therefore iterate more to get this to the state around 508a62d Try to use dashmap, just with the deduplication and nicer comments

Copy link
Member

@bschwind bschwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! No nits or issues with the code that I see :)

Copy link
Member Author

@goodhoko goodhoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (I can't formally approve as the author of this PR.)

Cargo.toml Show resolved Hide resolved
benches/pub_sub.rs Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
Copy link
Contributor

@mbernat mbernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@goodhoko
Copy link
Member Author

This was successfully tested in https://github.com/tonarino/portal/pull/3746. Should be good to merge!

src/lib.rs Outdated Show resolved Hide resolved
Co-authored-by: Matěj Laitl <[email protected]>
@goodhoko goodhoko merged commit 4ea3712 into main Aug 13, 2024
6 checks passed
@goodhoko goodhoko deleted the last-value-cache branch August 13, 2024 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add cache for the last published event
4 participants