Skip to content

Latest commit

 

History

History
74 lines (55 loc) · 3.43 KB

README.md

File metadata and controls

74 lines (55 loc) · 3.43 KB

Answers submitted by students

Homework Solutions

Why do LSM-tree and LevelDB use leveled structure?

# of Level Write performance Read Performance WAF RAF
Single Bad Good High Low
Multi Good Bad Low High

In leveldb, max size of level i is 10^iMB. But max size of level 0 is 8MB. Why?

  • We treat level-0 specially by bounding the number of files instead of number of bytes for two reasons:
    • (1) With larger write-buffer sizes, it is nice not to do too many level-0 compactions.
    • (2) The files in level-0 are merged on every read and therefore we wish to avoid too many files when the individual file size is small
[A] $ ./db_bench --benchmarks="fillseq" 
[B] $ ./db_bench --benchmarks="fillrandom"
  • Q1. Compare throughput, latency, and stats of two benchmarks and explain why.
  • Q2. Calculate SAF (Space Amplification Factor) for each benchmark.
Benchmark duplicate key range Major Compaction Throughput Latency SAF
Fillseq No No High Low 1 (0.98)
Fillrandom Yes Yes Low High 1.275
  • Q3. In benchmark A, SSTs are not written in L0. Why?
    • Trivial Move
[A] $ ./db_bench --benchmarks="fillrandom" --value_size=100 --num=1000000 --compression_ratio=1
[B] $ ./db_bench --benchmarks="fillrandom" --value_size=1000 --num=114173 --compression_ratio=1

Note 1. key_size = 16B
Note 2. same total kv pairs size.
Note 3. # of B's entries = 114173 = (16+100)/(16+1000) * 1000000

  • Q. The size of input kv pairs is the same. But One is better in throughput, the other is better in latency. Explain why.
Benchmark DB size # of entries Size of entry Throughput (MB/s) Latency (s/op)
A same 1,000,000 (many) 116B (small) low low
B same 114,173 (few) 1016B (big) high high
[Load] $ ./db_bench --benchmarks="fillrandom" --use_existing_db=0

[A] $ ./db_bench --benchmarks="readseq" --use_existing_db=1
[B] $ ./db_bench --benchmarks="readrandom" --use_existing_db=1
[C] $ ./db_bench --benchmarks="seekrandom" --use_existing_db=1
  • Q1. Which user key-value interface does each benchmark use? (Put, Get, Iterator, ...)
  • Q2. Compare throughput and latency of each benchmark and explain why.
Benchmark Interface Throughput (MB/s) Latency (micros/op) I/O Access Level
readseq Get() high low sequential all
readrandom Iterator->Next() low high random access one by one
until find the key
seekrandom Iterator->Seek() lowest highest random all