> **Avoiding S3 reads at query time** (quoted from the Rockset post linked below)
>
> Retrieving blocks from the hot storage layer is ~100x faster than a read miss to S3: under 1 ms versus roughly 100 ms. Keeping S3 downloads out of the query path is therefore critical for a real-time system like Rockset.
>
> If a compute node requests a block belonging to a file not found in the hot storage layer, a storage node must download the SST file from S3 before the requested block can be sent back to the compute node. To meet customers' latency requirements, all blocks needed at query time must be in the hot storage layer before compute nodes request them. The hot storage layer achieves this via three mechanisms:
>
> 1. Compute nodes send a synchronous prefetch request to the hot storage layer every time a new SST file is created, as part of memtable flushes and compactions. RocksDB commits the flush or compaction only after the hot storage layer has downloaded the file, ensuring the file is available before any compute node can request blocks from it (sketched after this list).
> 2. When a storage node discovers a new slice, because a compute node sent a prefetch or read-block request for one of its files, it proactively scans S3 and downloads the rest of that slice's files. All files of a slice share the same S3 prefix, which keeps this simple.
> 3. Storage nodes periodically scan S3 to keep the slices they own in sync: locally missing files are downloaded, and obsolete local files are deleted.
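To make mechanism 1 concrete, here is a minimal sketch of the commit-after-prefetch ordering. `HotStorageClient`, `prefetch`, and `commit_manifest` are hypothetical names for illustration, not Rockset's or RocksDB's actual API:

```rust
// Hypothetical types for illustration only.
struct HotStorageClient;
struct SstFile {
    s3_key: String,
}

impl HotStorageClient {
    /// Ask the hot storage layer to download the file, waiting until done.
    async fn prefetch(&self, _s3_key: &str) -> std::io::Result<()> {
        Ok(()) // stub
    }
}

/// Make the new file visible to readers (stub).
fn commit_manifest(_sst: &SstFile) -> std::io::Result<()> {
    Ok(())
}

/// A flush (or compaction) output is committed only after the hot
/// storage layer holds it, so a query can never miss to S3 for it.
async fn commit_flush(hot: &HotStorageClient, sst: SstFile) -> std::io::Result<()> {
    hot.prefetch(&sst.s3_key).await?; // synchronous prefetch first
    commit_manifest(&sst) // only then is the new SST visible
}
```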
For a regular LSM tree, caching at file granularity is a good choice, but with Tonbo's projection pushdown, file-granularity caching would pull rarely accessed columns into the cache alongside the hot ones.

`AsyncFileReader::get_bytes` reads at byte-range (column chunk) granularity, which suits filter & projection pushdown, though loading a complete column may still require multiple network I/Os (one column chunk per row group).
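To see why range granularity matters, here is a hedged sketch using the parquet crate's metadata: a single projected column maps to one column chunk per row group, so one logical column can mean several `get_bytes` calls. `projected_ranges` is a hypothetical helper written for this issue:

```rust
use std::ops::Range;

use parquet::file::metadata::ParquetMetaData;

/// For a projection over `columns`, list the byte ranges `get_bytes`
/// would have to fetch: one column chunk per (row group, column),
/// i.e. potentially many network I/Os for a single logical column.
fn projected_ranges(meta: &ParquetMetaData, columns: &[usize]) -> Vec<Range<u64>> {
    let mut ranges = Vec::new();
    for rg in meta.row_groups() {
        for &col in columns {
            let (start, len) = rg.column(col).byte_range();
            ranges.push(start..start + len);
        }
    }
    ranges
}
```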
**Feature Request**

I think we can cache `AsyncFileReader::get_bytes` results to avoid the network I/O and other costs of reading from S3.

*Flame graph for S3 reads (image not reproduced).*
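A minimal sketch of what that could look like: wrap any `AsyncFileReader` in a read-through foyer cache keyed by `(file, byte range)`. This assumes foyer's in-memory `Cache`/`CacheBuilder` API and the `AsyncFileReader` signatures from an older parquet release (both vary across versions); `CachedFileReader` and `file_id` are hypothetical names:

```rust
use std::ops::Range;
use std::sync::Arc;

use bytes::Bytes;
use foyer::Cache;
use futures::future::BoxFuture;
use futures::FutureExt;
use parquet::arrow::async_reader::AsyncFileReader;
use parquet::errors::Result as ParquetResult;
use parquet::file::metadata::ParquetMetaData;

/// Serves repeated `get_bytes` calls for the same (file, range) out of a
/// shared foyer cache instead of going back to S3.
pub struct CachedFileReader<R: AsyncFileReader> {
    inner: R,
    /// Hypothetical unique id of the underlying file, used in cache keys.
    file_id: u64,
    cache: Arc<Cache<(u64, Range<usize>), Bytes>>,
}

impl<R: AsyncFileReader> AsyncFileReader for CachedFileReader<R> {
    fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, ParquetResult<Bytes>> {
        let key = (self.file_id, range.clone());
        if let Some(entry) = self.cache.get(&key) {
            // Cache hit: answer locally, no network I/O.
            let bytes = entry.value().clone();
            return async move { Ok(bytes) }.boxed();
        }
        async move {
            // Cache miss: read through to the inner (S3) reader, then fill.
            let bytes = self.inner.get_bytes(range).await?;
            self.cache.insert(key, bytes.clone());
            Ok(bytes)
        }
        .boxed()
    }

    fn get_metadata(&mut self) -> BoxFuture<'_, ParquetResult<Arc<ParquetMetaData>>> {
        // Metadata could be cached the same way; pass through for brevity.
        self.inner.get_metadata()
    }
}
```

Keying on the exact byte range keeps the cache aligned with projection pushdown: only the column chunks a query actually touches get cached, rather than whole files. foyer's `HybridCache` could additionally spill the same entries to local disk, which is closer in spirit to the hot storage layer described in the Rockset post.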
- cache: https://github.com/foyer-rs/foyer
- ref: Cache SSTs locally — slatedb/slatedb#9 (comment)
- for: https://rockset.com/blog/separate-compute-storage-rocksdb/
- benchmark rev: 472c008