
Add tool for database conversion #2061

Merged: 75 commits merged into master from db-conversion on Aug 17, 2024
Conversation

@magicxyyz (Contributor) commented Dec 28, 2023

Resolves NIT-1259

This PR adds a tool for database conversion and a script automating the typical conversion of nitro node databases.

convert-databases.bash script

The script converts nitro databases from leveldb to pebble using the dbconv tool.

Usage: ./scripts/convert-databases.bash [OPTIONS..]

OPTIONS:
--dbconv          dbconv binary path (default: "/usr/local/bin/dbconv")
--src             directory containing source databases (default: "/home/user/.arbitrum/arb1/nitro")
--dst             destination directory
--force           remove destination directory if it exists
--skip-existing   skip conversion of databases whose directories already exist in the destination directory
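
For example, a typical invocation might look like this (illustrative only; the --dbconv and --src values are the documented defaults, while the destination path is hypothetical and needs to match your setup):

./scripts/convert-databases.bash --dbconv /usr/local/bin/dbconv --src /home/user/.arbitrum/arb1/nitro --dst /data/nitro-pebble --skip-existing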

Upon successful completion, the script prints out:

== Conversion status:
   l2chaindata database: converted
   l2chaindata database freezer (ancient): copied
   arbitrumdata database: converted
   wasm database: converted
   classic-msg database: converted

dbconv tool

example usage

  • converting a leveldb database to pebble
./dbconv --src.data /path/to/source/database/ --src.db-engine leveldb --dst.data /path/to/destination/database/ --dst.db-engine "pebble" --convert
  • verifying that all source db entries are present in the destination db (checking key existence only)
./dbconv --src.data /path/to/source/database/ --src.db-engine leveldb --dst.data /path/to/destination/database/ --dst.db-engine "pebble" --verify "keys"
  • converting a leveldb database to pebble, compacting the resulting db, and then verifying that all keys from the source db exist in the destination db
./dbconv --src.data /path/to/source/database/ --src.db-engine leveldb --dst.data /path/to/destination/database/ --dst.db-engine "pebble" --convert --compact --verify "keys"

Local experiments

Experiments were conducted on the l2chaindata database from https://snapshot.arbitrum.foundation/sepolia/nitro-archive.tar. The source database directory had a size of approximately 57GB.

  • conversion of 177,405,002 entries took approximately 16 minutes
  • verification of keys only (--verify="keys") took approximately 5 minutes
...
INFO [01-05|23:33:33.200] Conversion finished. entries=177,405,002 MB=50780 "avg Me/s"=0.187 "avg MB/s"=53.557 elapsed=15m48.141249s
 ...
INFO [01-05|23:33:59.360] Compaction done                          elapsed=25.733762541s
 ...
INFO [01-05|23:39:20.325] Verification completed successfully.     elapsed=5m20.691211s

Note: the biggest performance impact came from the size of the write batch (--ideal-batch-size), which was increased to 100MB from geth's default of 100KB.
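
For reference, a conversion run with the larger batch size would be along these lines (a sketch only; it assumes --ideal-batch-size takes a value in bytes, so 100MB is passed as 104857600):

./dbconv --src.data /path/to/source/database/ --src.db-engine leveldb --dst.data /path/to/destination/database/ --dst.db-engine "pebble" --convert --compact --ideal-batch-size 104857600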

@cla-bot added the "s" label (automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA) on Dec 28, 2023

@magicxyyz marked this pull request as ready for review January 8, 2024 15:32
@joshuacolvin0 requested a review from tsahee January 10, 2024 04:55

codecov bot commented Jan 10, 2024

Codecov Report

Attention: Patch coverage is 0%, with 330 lines in your changes missing coverage. Please review.

Project coverage is 28.94%. Comparing base (a5548e8) to head (dd33ae2).

❗ Current head dd33ae2 differs from pull request most recent head c82675a. Consider uploading reports for the commit c82675a to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2061      +/-   ##
==========================================
+ Coverage   22.39%   28.94%   +6.54%     
==========================================
  Files         226      236      +10     
  Lines       32034    34687    +2653     
==========================================
+ Hits         7175    10041    +2866     
+ Misses      23601    23504      -97     
+ Partials     1258     1142     -116     

@geomad commented Jul 27, 2024

I tried to run the script, and it exited (and deleted everything by default) because it found that the wasm database was already on pebble (maybe from the fresh sync?). IMO it should copy the database in that case rather than error out and revert the whole operation.

Review threads (resolved): cmd/dbconv/dbconv/dbconv.go, cmd/dbconv/main.go, Dockerfile, scripts/convert-databases.bash (3 threads), util/dbutil/dbutil_test.go
@magicxyyz requested a review from tsahee August 15, 2024 16:01
@magicxyyz changed the title from "[NIT-1259] Add tool for database conversion" to "Add tool for database conversion" Aug 16, 2024
os.Exit(1)
}
stats := conv.Stats()
log.Info("Conversion finished.", "entries", stats.Entries(), "MB", stats.Bytes()/1024/1024, "avg Ke/s", stats.AverageEntriesPerSecond()/1000, "avg MB/s", stats.AverageBytesPerSecond()/1024/1024, "elapsed", stats.Elapsed())
Reviewer (Contributor):
nitpick: Ke/s is a little confusing, it could have the same unit defined in printProgress, i.e. entries/s

Author (magicxyyz):
addressed in: #2591

func DBConvConfigAddOptions(f *flag.FlagSet) {
	DBConfigAddOptions("src", f, &DefaultDBConvConfig.Src)
	DBConfigAddOptions("dst", f, &DefaultDBConvConfig.Dst)
	f.Int("ideal-batch-size", DefaultDBConvConfig.IdealBatchSize, "ideal write batch size")
Reviewer (Contributor):
nitpick:

Suggested change
f.Int("ideal-batch-size", DefaultDBConvConfig.IdealBatchSize, "ideal write batch size")
f.Int("ideal-batch-size", DefaultDBConvConfig.IdealBatchSize, "ideal write batch size in bytes")

Author (magicxyyz):
included in: #2591

)

func TestConversion(t *testing.T) {
	_ = testhelpers.InitTestLog(t, log.LvlTrace)
Reviewer (Contributor):
nitpick: unnecessary line

Author (magicxyyz):
addressed in: #2591

Comment on lines +92 to +117
	bc := builder.L2.ExecNode.Backend.ArbInterface().BlockChain()
	current := bc.CurrentBlock()
	if current == nil {
		Fatal(t, "failed to get current block header")
	}
	triedb := bc.StateCache().TrieDB()
	visited := 0
	i := uint64(0)
	// don't query historical blocks when PathScheme is used
	if builder.execConfig.Caching.StateScheme == rawdb.PathScheme {
		i = current.Number.Uint64()
	}
	for ; i <= current.Number.Uint64(); i++ {
		header := bc.GetHeaderByNumber(i)
		_, err := bc.StateAt(header.Root)
		Require(t, err)
		tr, err := trie.New(trie.TrieID(header.Root), triedb)
		Require(t, err)
		it, err := tr.NodeIterator(nil)
		Require(t, err)
		for it.Next(true) {
			visited++
		}
		Require(t, it.Error())
	}
	t.Log("visited nodes:", visited)
Reviewer (Contributor):
Not sure why this is a useful test, but it is OK being here :)

Author (magicxyyz):
That's just to iterate over the state tries to better check for possible db corruption. It's still not a complete check, but it increases coverage.


convert_result=
convert () {
srcdir=$(echo $src/$1 | tr -s /)
Reviewer (Contributor):
nitpick: the tr command is unnecessary

Author (magicxyyz):
addressed in: #2591


convert_result=
convert () {
srcdir=$(echo $src/$1 | tr -s /)
Reviewer (Contributor):
nitpick: in several places we could use double quotes to prevent globbing and word splitting, such as here

Suggested change
srcdir=$(echo $src/$1 | tr -s /)
srcdir=$(echo "$src/$1" | tr -s /)

Author (magicxyyz):
I believe that it is already handled, but I am not sure

Reviewer (Contributor):

Not sure why the behavior in this script is not implemented in the dbconv Go tool instead.

Author (magicxyyz):
That was decided in one of the nitro meetings. The idea was to keep the dbconv tool as simple and flexible as possible and have the most common use case covered by a bash script.

removeDir() {
	cmd="rm -r \"$1\""
	echo $cmd
	eval $cmd
Reviewer @diegoximenes (Contributor), Aug 16, 2024:
No need to use eval here; you can call rm directly.

Author (magicxyyz):
It's needed to support directories with spaces in the name. Otherwise I get:

== Note: removing only failed destination directory
rm -r "dstdb with spaces/l2chaindata"
rm: "dstdb: No such file or directory
rm: with: No such file or directory
rm: spaces/l2chaindata": No such file or directory
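
For comparison, a minimal sketch of the reviewer's suggestion (illustrative only, not the merged code): quoting "$1" when calling rm directly also preserves spaces in the name, without eval.

removeDir() {
	# print the command for visibility, then remove the directory directly;
	# the quoted "$1" keeps directory names with spaces intact
	echo "rm -r \"$1\""
	rm -r "$1"
}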

@tsahee enabled auto-merge August 16, 2024 23:53
@tsahee merged commit 0639404 into master Aug 17, 2024 (14 checks passed)
@tsahee deleted the db-conversion branch August 17, 2024 00:41
Labels: design-approved, s (automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA)
5 participants