Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage) CacheTable rewrite #497

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Conversation

Zk2u
Copy link
Contributor

@Zk2u Zk2u commented Nov 29, 2024

Description

This PR rewrites the CacheTable used for the L2 block manager (and soon the L1BM) to increase performance.

It adds the following dependencies:

Dependency About Reason for addition
concurrent-map A lock-free B+ tree based on sled’s internal index structure, but supporting richer Rust types as keys and values than raw bytes. This structure supports atomic compare and swap operations with the ConcurrentMap::cas method. Powers the new in-memory cache, replaces the effective Mutex around a hashmap + arc rubbish we had going on before.
cache-advisor A simple, high performance scan-resistant eviction manager. Tells you when to evict items from a cache. Features: two-segment LRU, protects against cache pollution from single-hit items, 256 shards accessed via non-blocking flatcombining, local access buffer that must fill up before accessing shared state, compresses the costs associated with each item to a u8 using a compression technique that will converge to the overall true sum of costs over time, but allows for much less memory to be used for accounting. Concurrent map isn't an LRU by itself, so this gives us LRU functionality
kanal Provides multi-producer and multi-consumer channels with advanced features and lock-free one-shot channels that only allocate the size of a pointer in the heap, allowing for fast communication. The library has a focus on unifying message passing between synchronous and asynchronous portions of Rust code through a combination of synchronous and asynchronous APIs while maintaining high performance. Fastest channel design in Rust, so I want to move our internal channels to use this eventually. However, it was needed here for another reason. Tokio's broadcast::Receivercannot be cloned - you need the Senderto create new Receivers. Kanal's channels are cooler and let you do this, which gets rid of the Arcing we were previously doing, allowing concurrent map to clone Slots when necessary
ahash Rust's fastest Hasherimplementation. Uses hardware AES when available. Non-cryptographic. CacheAdvisoridentifies objects via u64 keys. To keep the cache generic, we require V: Hashand then internally use u64s to identify things in the map. So a bit of a hybrid between a B+ tree and a hash map.

Note that the new CacheTableis Sendbut !Sync. You cannot share a single instantiation between threads. This is the biggest API-level change. The way to share it across threads is by cloning, and giving a new clone of the CacheTable whenever it might switch threads, like for a tokio task.

I haven't yet tested the performance of this implementation, but I've built something using the same tooling and similar algorithms before. These things are usually blisteringly quick.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature/Enhancement (non-breaking change which adds functionality or enhances an existing one)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor
  • New or updated tests

Checklist

  • I have performed a self-review of my code.
  • I have commented my code where necessary.
  • I have updated the documentation if needed.
  • My changes do not introduce new warnings.
  • I have added tests that prove my changes are effective or that my feature works.
  • New and existing tests pass with my changes.

Related Issues

STR-627

@Zk2u Zk2u self-assigned this Nov 29, 2024
@delbonis
Copy link
Contributor

delbonis commented Nov 29, 2024

Is there a ticket for this? This is a lot of nontrivial work that should be under a ticket so we can prio it properly.

@delbonis
Copy link
Contributor

Note that the new CacheTableis Sendbut !Sync. You cannot share a single instantiation between threads. This is the biggest API-level change. The way to share it across threads is by cloning, and giving a new clone of the CacheTable whenever it might switch threads, like for a tokio task.

Yeah this is a big problem. We're likely to be fetching stuff from DB on different tasks/threads, so this probably weakens our use of caching here.

impl<T: Clone> SlotState<T> {
impl<T> PartialEq for Receiver<T> {
fn eq(&self, _other: &Self) -> bool {
// this is only so we can
Copy link
Contributor

@bewakes bewakes Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment incomplete?

Edit: I see what it means from the struct comments though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants