
Feat: Cache for S3 #192

Closed · 1 task done
KKould opened this issue Oct 21, 2024 · 1 comment · May be fixed by #193

@KKould (Contributor) commented Oct 21, 2024

Feature Request

I think we can cache the results of `AsyncFileReader::get_bytes` to avoid the network IO and other costs of reading from S3 (a rough sketch follows the quote below).

Motivation, from https://rockset.com/blog/separate-compute-storage-rocksdb/:

> **Avoiding S3 reads at query time.** Retrieving blocks available in the hot storage layer is 100x faster than read misses to S3, a difference of <1ms to 100ms. Therefore, keeping S3 downloads out of the query path is critical for a real-time system like Rockset.
>
> If a compute node requests a block belonging to a file not found in the hot storage layer, a storage node must download the SST file from S3 before the requested block can be sent back to the compute node. To meet the latency requirements of our customers, we must ensure that all blocks needed at query time are available in the hot storage layer before compute nodes request them. The hot storage layer achieves this via three mechanisms:
>
> - Compute nodes send a synchronous prefetch request to the hot storage layer every time a new SST file is created. This happens as part of memtable flushes and compactions. RocksDB commits the memtable flush or compaction operation after the hot storage layer downloads the file, ensuring the file is available before a compute node can request blocks from it.
> - When a storage node discovers a new slice, because a compute node sent a prefetch or read-block request for a file belonging to that slice, it proactively scans S3 to download the rest of the files for that slice. All files for a slice share the same prefix in S3, making this simpler.
> - Storage nodes periodically scan S3 to keep the slices they own in sync. Any locally missing files are downloaded, and locally available files that are obsolete are deleted.
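As a starting point, here is a minimal sketch of such a cache, assuming the `AsyncFileReader` trait from the parquet crate (signatures as of parquet 53.x; check your version) and using an unbounded in-memory `HashMap` in place of a real cache. `CachedReader` is a hypothetical name, not an existing type:

```rust
use std::{collections::HashMap, ops::Range, sync::Arc};

use bytes::Bytes;
use futures::{future::BoxFuture, FutureExt};
use parquet::{
    arrow::async_reader::AsyncFileReader, errors::Result, file::metadata::ParquetMetaData,
};

/// Hypothetical wrapper that memoizes `get_bytes` results per byte range.
/// A real implementation would bound memory (LRU, disk spill, ...) instead
/// of growing a HashMap without limit.
struct CachedReader<R> {
    inner: R,
    cache: HashMap<Range<usize>, Bytes>,
}

impl<R: AsyncFileReader> AsyncFileReader for CachedReader<R> {
    fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Bytes>> {
        async move {
            // Serve repeated reads of the same range without touching S3.
            if let Some(bytes) = self.cache.get(&range) {
                return Ok(bytes.clone());
            }
            let bytes = self.inner.get_bytes(range.clone()).await?;
            self.cache.insert(range, bytes.clone());
            Ok(bytes)
        }
        .boxed()
    }

    fn get_metadata(&mut self) -> BoxFuture<'_, Result<Arc<ParquetMetaData>>> {
        // Footer metadata is fetched on every reader open, so it is another
        // obvious cache candidate; forwarded unchanged here for brevity.
        self.inner.get_metadata()
    }
}
```

`Bytes` is reference-counted, so cache hits are cheap clones. And since SST files are immutable once written, cached entries never need invalidation, only eviction.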

Benchmark (rev 472c008):

```
+--------------------------------------------+----------+---------+-----------+
|                                            | tonbo    | rocksdb | tonbo_s3  |
+=============================================================================+
| random range reads                         | 191422ms | 66371ms | 7720585ms |
|--------------------------------------------+----------+---------+-----------|
| random range reads                         | 188092ms | 59306ms | 7575882ms |
|--------------------------------------------+----------+---------+-----------|
| random range projection reads              | 74034ms  | 62108ms | 2226405ms |
|--------------------------------------------+----------+---------+-----------|
| random range projection reads              | 75691ms  | 65741ms | 2227360ms |
|--------------------------------------------+----------+---------+-----------|
| random range projection reads (4 threads)  | 57490ms  | 69040ms | 593947ms  |
|--------------------------------------------+----------+---------+-----------|
| random range projection reads (8 threads)  | 57257ms  | 65872ms | 322711ms  |
|--------------------------------------------+----------+---------+-----------|
| random range projection reads (16 threads) | 51670ms  | 62922ms | 177118ms  |
|--------------------------------------------+----------+---------+-----------|
| random range projection reads (32 threads) | 52854ms  | 62388ms | 112974ms  |
+--------------------------------------------+----------+---------+-----------+
```
@KKould KKould added the enhancement New feature or request label Oct 21, 2024
@KKould KKould self-assigned this Oct 21, 2024
@KKould KKould added this to Tonbo Oct 21, 2024
@KKould KKould moved this to Todo in Tonbo Oct 21, 2024
@KKould (Contributor, Author) commented Oct 22, 2024

Cache granularity

- parquet file:
  - refer to slatedb's related discussion: Cache SSTs locally slatedb/slatedb#9
  - For a regular LSM, caching at file granularity is a good choice, but with Tonbo's more unusual projection pushdown this may cause rarely used columns to be loaded together with the hot ones.
- `AsyncFileReader::get_bytes`: uses column blocks as the cache granularity, which suits filter & projection pushdown, but loading a complete column may require multiple network IOs (see the sketch below).
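A rough sketch of what a cache key at the second granularity might look like (`BlockKey` and `file_id` are hypothetical names, not from the codebase):

```rust
use std::ops::Range;

/// Hypothetical key for a shared block cache at `get_bytes` granularity.
/// Tonbo's parquet (SST) files are immutable, so the same (file, range)
/// pair always maps to the same bytes: entries never need invalidation,
/// only eviction.
#[derive(Clone, PartialEq, Eq, Hash)]
struct BlockKey {
    file_id: u64,        // assumed stable identifier of the SST file
    range: Range<usize>, // byte range passed to AsyncFileReader::get_bytes
}
```

The multiple-network-IO cost of loading a whole column could also be softened by routing cache misses through `AsyncFileReader::get_byte_ranges`, which lets the underlying reader coalesce adjacent ranges into fewer S3 requests.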

@KKould KKould linked a pull request Oct 29, 2024 that will close this issue
@ethe ethe closed this as completed by moving to Done in Tonbo Nov 4, 2024