> **Avoiding S3 reads at query time** (quoted from the Rockset post linked below)
>
> Retrieving blocks from the hot storage layer is ~100x faster than a read miss to S3: under 1 ms versus roughly 100 ms. Keeping S3 downloads out of the query path is therefore critical for a real-time system like Rockset.
>
> If a compute node requests a block belonging to a file not found in the hot storage layer, a storage node must download the SST file from S3 before the requested block can be sent back to the compute node. To meet customers' latency requirements, all blocks needed at query time must be in the hot storage layer before compute nodes request them. The hot storage layer achieves this via three mechanisms:
>
> 1. Compute nodes send a synchronous prefetch request to the hot storage layer every time a new SST file is created, as part of memtable flushes and compactions. RocksDB commits the flush or compaction only after the hot storage layer has downloaded the file, ensuring the file is available before any compute node can request blocks from it (sketched after this list).
> 2. When a storage node discovers a new slice, because a compute node sent a prefetch or read-block request for one of its files, it proactively scans S3 and downloads the rest of that slice's files. All files of a slice share the same S3 prefix, which keeps this simple.
> 3. Storage nodes periodically scan S3 to keep the slices they own in sync: locally missing files are downloaded, and obsolete local files are deleted.
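To make mechanism 1 concrete, here is a minimal sketch of the commit-after-prefetch ordering. `HotStorageClient`, `prefetch`, and `commit_manifest` are hypothetical names for illustration, not Rockset's or RocksDB's actual API:

```rust
// Hypothetical types for illustration only.
struct HotStorageClient;
struct SstFile {
    s3_key: String,
}

impl HotStorageClient {
    /// Ask the hot storage layer to download the file, waiting until done.
    async fn prefetch(&self, _s3_key: &str) -> std::io::Result<()> {
        Ok(()) // stub
    }
}

/// Make the new file visible to readers (stub).
fn commit_manifest(_sst: &SstFile) -> std::io::Result<()> {
    Ok(())
}

/// A flush (or compaction) output is committed only after the hot
/// storage layer holds it, so a query can never miss to S3 for it.
async fn commit_flush(hot: &HotStorageClient, sst: SstFile) -> std::io::Result<()> {
    hot.prefetch(&sst.s3_key).await?; // synchronous prefetch first
    commit_manifest(&sst) // only then is the new SST visible
}
```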
For a regular LSM tree, caching at file granularity is a good choice, but with Tonbo's projection pushdown, file-granularity caching would pull rarely accessed columns into the cache alongside the hot ones.

`AsyncFileReader::get_bytes` reads at byte-range (column chunk) granularity, which suits filter & projection pushdown, though loading a complete column may still require multiple network I/Os (one column chunk per row group).
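To see why range granularity matters, here is a hedged sketch using the parquet crate's metadata: a single projected column maps to one column chunk per row group, so one logical column can mean several `get_bytes` calls. `projected_ranges` is a hypothetical helper written for this issue:

```rust
use std::ops::Range;

use parquet::file::metadata::ParquetMetaData;

/// For a projection over `columns`, list the byte ranges `get_bytes`
/// would have to fetch: one column chunk per (row group, column),
/// i.e. potentially many network I/Os for a single logical column.
fn projected_ranges(meta: &ParquetMetaData, columns: &[usize]) -> Vec<Range<u64>> {
    let mut ranges = Vec::new();
    for rg in meta.row_groups() {
        for &col in columns {
            let (start, len) = rg.column(col).byte_range();
            ranges.push(start..start + len);
        }
    }
    ranges
}
```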
**Feature Request**

I think we can cache `AsyncFileReader::get_bytes` results to avoid the network I/O and other costs of reading from S3.

*Flame graph for S3 reads (image not reproduced).*
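A minimal sketch of what that could look like: wrap any `AsyncFileReader` in a read-through foyer cache keyed by `(file, byte range)`. This assumes foyer's in-memory `Cache`/`CacheBuilder` API and the `AsyncFileReader` signatures from an older parquet release (both vary across versions); `CachedFileReader` and `file_id` are hypothetical names:

```rust
use std::ops::Range;
use std::sync::Arc;

use bytes::Bytes;
use foyer::Cache;
use futures::future::BoxFuture;
use futures::FutureExt;
use parquet::arrow::async_reader::AsyncFileReader;
use parquet::errors::Result as ParquetResult;
use parquet::file::metadata::ParquetMetaData;

/// Serves repeated `get_bytes` calls for the same (file, range) out of a
/// shared foyer cache instead of going back to S3.
pub struct CachedFileReader<R: AsyncFileReader> {
    inner: R,
    /// Hypothetical unique id of the underlying file, used in cache keys.
    file_id: u64,
    cache: Arc<Cache<(u64, Range<usize>), Bytes>>,
}

impl<R: AsyncFileReader> AsyncFileReader for CachedFileReader<R> {
    fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, ParquetResult<Bytes>> {
        let key = (self.file_id, range.clone());
        if let Some(entry) = self.cache.get(&key) {
            // Cache hit: answer locally, no network I/O.
            let bytes = entry.value().clone();
            return async move { Ok(bytes) }.boxed();
        }
        async move {
            // Cache miss: read through to the inner (S3) reader, then fill.
            let bytes = self.inner.get_bytes(range).await?;
            self.cache.insert(key, bytes.clone());
            Ok(bytes)
        }
        .boxed()
    }

    fn get_metadata(&mut self) -> BoxFuture<'_, ParquetResult<Arc<ParquetMetaData>>> {
        // Metadata could be cached the same way; pass through for brevity.
        self.inner.get_metadata()
    }
}
```

Keying on the exact byte range keeps the cache aligned with projection pushdown: only the column chunks a query actually touches get cached, rather than whole files. foyer's `HybridCache` could additionally spill the same entries to local disk, which is closer in spirit to the hot storage layer described in the Rockset post.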
- cache: https://github.com/foyer-rs/foyer
- ref: Cache SSTs locally — slatedb/slatedb#9 (comment)
- for: https://rockset.com/blog/separate-compute-storage-rocksdb/
- benchmark rev: 472c008