Add tool for database conversion #2061

Merged
merged 75 commits into from
Aug 17, 2024
b6ae081
add dbconv draft
magicxyyz Dec 28, 2023
291214b
Merge branch 'master' into db-conversion
magicxyyz Dec 28, 2023
a52d9ac
make lint happy
magicxyyz Dec 28, 2023
67922b5
add database conversion system test draft
magicxyyz Dec 28, 2023
5f61bc3
improve db conversion test
magicxyyz Dec 28, 2023
689e653
use start key instead of prefix, add main draft
magicxyyz Dec 30, 2023
83a22e1
fix middle key lookup
magicxyyz Jan 3, 2024
a1ea3ae
fix lint
magicxyyz Jan 3, 2024
0868fa2
Merge branch 'master' into db-conversion
magicxyyz Jan 3, 2024
d14a201
add initial progress reporting
magicxyyz Jan 4, 2024
dd4ec96
remove debug log from stats
magicxyyz Jan 4, 2024
c81e6c8
fix forking, add more stats
magicxyyz Jan 4, 2024
97ea9b7
add verification option
magicxyyz Jan 4, 2024
1e7041e
reformat progress string, add log-level option
magicxyyz Jan 4, 2024
c735874
add compaction option
magicxyyz Jan 4, 2024
83f1f57
clean ':' from log
magicxyyz Jan 4, 2024
31e5b8c
stop progress printing during compaction
magicxyyz Jan 5, 2024
e779fe7
change unit of entries per second
magicxyyz Jan 8, 2024
d7f5360
Merge branch 'master' into db-conversion
magicxyyz Jan 8, 2024
bc41c66
shorten dbconv test
magicxyyz Jan 8, 2024
62c86a5
Merge branch 'master' into db-conversion
magicxyyz Jan 10, 2024
39a5311
add dbconv to Makefile
magicxyyz Jan 10, 2024
dd33ae2
add dbconv to docker
magicxyyz Jan 11, 2024
c82675a
Merge branch 'master' into db-conversion
magicxyyz Apr 2, 2024
48f5c84
cmd/dbconv: add metrics
magicxyyz Apr 5, 2024
7cb6417
Merge branch 'pebble-extra-options' into db-conversion
magicxyyz Apr 22, 2024
59bdbfc
Merge remote-tracking branch 'origin/pebble-extra-options' into db-co…
magicxyyz Apr 23, 2024
c3b4a19
dbconv: add pebble config options
magicxyyz Apr 24, 2024
c3c6aff
Merge branch 'pebble-extra-options' into db-conversion
magicxyyz Apr 24, 2024
7e6e2d1
Merge branch 'pebble-extra-options' into db-conversion
magicxyyz Apr 24, 2024
0062b76
Merge branch 'pebble-extra-options' into db-conversion
magicxyyz Apr 24, 2024
9daf165
Merge branch 'pebble-extra-options' into db-conversion
magicxyyz Apr 24, 2024
d7dfdb8
Merge branch 'pebble-extra-options' into db-conversion
magicxyyz Apr 24, 2024
073c40e
Merge remote-tracking branch 'origin/pebble-extra-options' into db-co…
magicxyyz Apr 26, 2024
b213597
Merge branch 'master' into db-conversion
magicxyyz Jun 3, 2024
fc98de6
cmd/dbconv: remove multithreading option, update pebble extra options…
magicxyyz Jun 4, 2024
d7f5ea9
system_tests: update db_conversion_test
magicxyyz Jun 4, 2024
5eac1d4
cmd/dbconv: format numbers in progress message
magicxyyz Jun 4, 2024
88e8221
Merge branch 'master' into db-conversion
magicxyyz Jun 5, 2024
e308cf6
scripts: add initial version of convert-databases.bash
magicxyyz Jun 6, 2024
8657e51
scripts: improve convert-database script
magicxyyz Jun 11, 2024
09d0371
cmd/dbconv: return 1 on error from main binary
magicxyyz Jun 11, 2024
c47ee34
scripts: add --help flag to convert-databases.bash
magicxyyz Jun 11, 2024
bc8b85d
Merge branch 'master' into db-conversion
magicxyyz Jun 11, 2024
25ab55d
scripts: add extra flags check in convert-databases.bash
magicxyyz Jun 11, 2024
897a33e
Merge branch 'master' into db-conversion
magicxyyz Jun 11, 2024
7f9f7ef
Update cmd/dbconv/dbconv/config.go
magicxyyz Jun 21, 2024
6595a44
Merge branch 'master' into db-conversion
magicxyyz Jun 21, 2024
cfb1393
dbconv: address review comments
magicxyyz Jun 17, 2024
6a22f1d
refactor convert-databases script
magicxyyz Jun 25, 2024
a2d3507
Merge branch 'master' into db-conversion
magicxyyz Jun 25, 2024
dc24202
retab convert-databases script
magicxyyz Jun 25, 2024
bc3d784
pass default config to DBConfigAddOptions
magicxyyz Jun 28, 2024
37a826e
dbconv/stats: rename AddBytes/AddEntries to LogBytes/LogEntries
magicxyyz Jun 28, 2024
0ca882a
remove dst dirs when conversion fails
magicxyyz Jul 1, 2024
a5ebac9
Merge branch 'master' into db-conversion
magicxyyz Jul 1, 2024
1d69881
clean up conver-databases script
magicxyyz Jul 1, 2024
7527848
add unfinished convertion canary key
magicxyyz Jul 3, 2024
77eb4da
Merge branch 'master' into db-conversion
magicxyyz Jul 25, 2024
8422e35
Merge branch 'master' into db-conversion
magicxyyz Aug 7, 2024
21f2ee4
enable archive mode for HashScheme only in db conversion system test
magicxyyz Aug 7, 2024
7895656
check for canary key when initializing databases
magicxyyz Aug 13, 2024
8ec8389
fix db_conversion_test for PathScheme
magicxyyz Aug 13, 2024
ee7cba8
convert-databases: by default on conversion failure remove only unfin…
magicxyyz Aug 13, 2024
6a9e8ff
Merge branch 'master' into db-conversion
magicxyyz Aug 13, 2024
4ebfd7a
fix NodeBuilder.RestartL2Node - use l2StackConfig from builder
magicxyyz Aug 14, 2024
b0484c5
add extra checks to db conversion system test
magicxyyz Aug 14, 2024
53c448f
move UnfinishedConversionCheck to dbutil package
magicxyyz Aug 14, 2024
4d79dfc
Merge branch 'master' into db-conversion
magicxyyz Aug 14, 2024
5b61070
convert-databases.bash: fix handling directories containing spaces
magicxyyz Aug 15, 2024
3fdab93
remove comment
magicxyyz Aug 15, 2024
a78fb97
copy convert-databases script to docker
magicxyyz Aug 15, 2024
44f0e18
Merge branch 'master' into db-conversion
magicxyyz Aug 15, 2024
bc8803a
fix RestartL2Node - pass initMessage to createL2BlockChainWithStackCo…
magicxyyz Aug 16, 2024
93aaaef
Merge branch 'master' into db-conversion
tsahee Aug 16, 2024
add verification option
magicxyyz committed Jan 4, 2024
commit 97ea9b78bfae84c6f682ac2212f20bf3ec1a48f9
23 changes: 21 additions & 2 deletions cmd/dbconv/dbconv/config.go
@@ -1,7 +1,9 @@
 package dbconv

 import (
-	"github.com/ethereum/go-ethereum/ethdb"
+	"fmt"
+
+	"github.com/ethereum/go-ethereum/log"
 	flag "github.com/spf13/pflag"
 )

@@ -29,12 +31,16 @@ type DBConvConfig struct {
 	Threads int `koanf:"threads"`
 	IdealBatchSize int `koanf:"ideal-batch-size"`
 	MinBatchesBeforeFork int `koanf:"min-batches-before-fork"`
+	Verify int `koanf:"verify"`
+	VerifyOnly bool `koanf:"verify-only"`
 }

 var DefaultDBConvConfig = DBConvConfig{
-	IdealBatchSize: ethdb.IdealBatchSize,
+	IdealBatchSize: 100 * 1024 * 1024, // 100 MB
 	MinBatchesBeforeFork: 10,
 	Threads: 0,
+	Verify: 1,
+	VerifyOnly: false,
 }

 func DBConvConfigAddOptions(f *flag.FlagSet) {
@@ -43,4 +49,17 @@ func DBConvConfigAddOptions(f *flag.FlagSet) {
 	f.Int("threads", DefaultDBConvConfig.Threads, "number of threads to use (0 = auto)")
 	f.Int("ideal-batch-size", DefaultDBConvConfig.IdealBatchSize, "ideal write batch size") // TODO
 	f.Int("min-batches-before-fork", DefaultDBConvConfig.MinBatchesBeforeFork, "minimal number of batches before forking a thread") // TODO
+	f.Int("verify", DefaultDBConvConfig.Verify, "enables verification (0 = disabled, 1 = only keys, 2 = keys and values)") // TODO
+	f.Bool("verify-only", DefaultDBConvConfig.VerifyOnly, "skips conversion, runs verification only") // TODO
 }
+
+func (c *DBConvConfig) Validate() error {
+	if c.Verify < 0 || c.Verify > 2 {
+		return fmt.Errorf("Invalid verify config value: %v", c.Verify)
+	}
+	if c.VerifyOnly && c.Verify == 0 {
+		log.Info("enabling keys verification as --verify-only flag is set")
+		c.Verify = 1
+	}
+	return nil
+}
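
In the review thread on `config.Verify` in this commit, the author says the option was later changed from an int to a string. That later version is not part of this commit, so the following is only a minimal sketch of what string-based validation could look like. The type `verifyConfig` and the mode names `"keys"` and `"full"` are assumptions made for illustration, not names taken from the PR.

```go
package main

import (
	"fmt"
)

// Hypothetical mode names: the review thread only says the option became a
// string, it does not list the accepted values.
const (
	verifyNone = ""
	verifyKeys = "keys"
	verifyFull = "full"
)

// verifyConfig is a hypothetical, trimmed-down stand-in for DBConvConfig.
type verifyConfig struct {
	Verify     string
	VerifyOnly bool
}

// Validate rejects unknown modes and, like the int-based version in the diff,
// turns on key verification when only VerifyOnly is set.
func (c *verifyConfig) Validate() error {
	switch c.Verify {
	case verifyNone, verifyKeys, verifyFull:
		// known mode, nothing to do
	default:
		return fmt.Errorf("invalid verify config value: %q", c.Verify)
	}
	if c.VerifyOnly && c.Verify == verifyNone {
		c.Verify = verifyKeys
	}
	return nil
}

func main() {
	c := verifyConfig{VerifyOnly: true}
	if err := c.Validate(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("verify mode: %q\n", c.Verify)
}
```

Named string modes keep the `switch` in a verification loop self-describing without introducing a separate enum type, which seems to be the trade-off the author describes.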
46 changes: 46 additions & 0 deletions cmd/dbconv/dbconv/dbconv.go
@@ -213,6 +213,52 @@ drainLoop:
 	return nil
 }

+func (c *DBConverter) Verify(ctx context.Context) error {
+	if c.config.Verify == 1 {

[Review comment, Contributor] nitpick: converting config.Verify from an int to an enum here can improve code readability in places where config.Verify is being used

[Reply, Contributor Author] I've changed the config to string and as it's not used in many places I didn't convert it to an enum, just used it as string

+		log.Info("Starting quick verification - verifying only keys existence")
+	} else {
+		log.Info("Starting full verification - verifying keys and values")
+	}
+	var err error
+	defer c.Close()
+	c.src, err = openDB(&c.config.Src, true)
+	if err != nil {
+		return err
+	}
+	c.dst, err = openDB(&c.config.Dst, true)
+	if err != nil {
+		return err
+	}
+
+	c.stats.Reset()
+	c.stats.AddThread()
+	it := c.src.NewIterator(nil, nil)
+	defer it.Release()
+	for it.Next() {
+		switch c.config.Verify {
+		case 1:
+			if has, err := c.dst.Has(it.Key()); !has {

[Review comment, Contributor] We should check if err is non nil here

+				return fmt.Errorf("Missing key in destination db, key: %v, err: %w", it.Key(), err)
+			}
+			c.stats.AddBytes(int64(len(it.Key())))
+		case 2:
+			dstValue, err := c.dst.Get(it.Key())
+			if err != nil {
+				return err
+			}
+			if !bytes.Equal(dstValue, it.Value()) {
+				return fmt.Errorf("Value mismatch for key: %v, src value: %v, dst value: %s", it.Key(), it.Value(), dstValue)
+			}
+			c.stats.AddBytes(int64(len(it.Key()) + len(dstValue)))
+		default:
+			return fmt.Errorf("Invalid verify config value: %v", c.config.Verify)
+		}
+		c.stats.AddEntries(1)
+	}
+	c.stats.DecThread()
+	return nil
+}
+
 func (c *DBConverter) Stats() *Stats {
 	return &c.stats
 }
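
The review comment on the `Has` call points out that `err` is only wrapped into the "missing key" message and never checked on its own, so a read failure and a genuinely absent key are reported the same way. A minimal sketch of separating the two cases follows. The `keyReader` interface, `checkKey`, `errMissingKey`, and the two toy databases are hypothetical names for illustration. Only the `Has(key []byte) (bool, error)` shape is modeled on the call in the diff, and this is not necessarily how the PR resolved the comment.

```go
package main

import (
	"errors"
	"fmt"
)

// keyReader is a hypothetical stand-in for the destination database; only the
// Has method used by the key-only verification path is modeled.
type keyReader interface {
	Has(key []byte) (bool, error)
}

// errMissingKey is a hypothetical sentinel so callers can tell "key absent"
// apart from "lookup failed".
var errMissingKey = errors.New("missing key in destination db")

// checkKey reports a lookup failure first and a missing key second.
func checkKey(dst keyReader, key []byte) error {
	has, err := dst.Has(key)
	if err != nil {
		return fmt.Errorf("failed to check key in destination db, key: %x, err: %w", key, err)
	}
	if !has {
		return fmt.Errorf("%w, key: %x", errMissingKey, key)
	}
	return nil
}

// isMissing reports whether err wraps errMissingKey.
func isMissing(err error) bool { return errors.Is(err, errMissingKey) }

// memDB is a toy in-memory key set used only for this example.
type memDB map[string]struct{}

func (m memDB) Has(key []byte) (bool, error) {
	_, ok := m[string(key)]
	return ok, nil
}

// failingDB always fails, simulating an I/O error during lookup.
type failingDB struct{}

func (failingDB) Has([]byte) (bool, error) { return false, errors.New("read failure") }

func main() {
	db := memDB{"a": {}}
	fmt.Println(checkKey(db, []byte("a")))          // key present
	fmt.Println(checkKey(db, []byte("b")))          // key absent
	fmt.Println(checkKey(failingDB{}, []byte("a"))) // lookup error
}
```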
44 changes: 37 additions & 7 deletions cmd/dbconv/main.go
@@ -2,6 +2,7 @@ package main

 import (
 	"context"
+	"fmt"
 	"os"
 	"time"

@@ -26,20 +27,36 @@ func parseDBConv(args []string) (*dbconv.DBConvConfig, error) {
 	return &config, nil
 }

+func printSampleUsage(name string) {
+	fmt.Printf("Sample usage: %s [OPTIONS] \n\n", name)
+	fmt.Printf("Options:\n")
+	fmt.Printf(" --help\n")
+	fmt.Printf(" --src.db-engine <leveldb or pebble>\n")
+	fmt.Printf(" --src.data <source database directory>\n")
+	fmt.Printf(" --dst.db-engine <leveldb or pebble>\n")
+	fmt.Printf(" --dst.data <destination database directory>\n")
+}
 func main() {
 	args := os.Args[1:]
 	config, err := parseDBConv(args)
 	if err != nil {
-		panic(err)
+		confighelpers.PrintErrorAndExit(err, printSampleUsage)
+		return
 	}
 	err = genericconf.InitLog("plaintext", log.LvlDebug, &genericconf.FileLoggingConfig{Enable: false}, nil)
 	if err != nil {
-		panic(err)
+		log.Error("Failed to init logging", "err", err)
+		return
 	}
+
+	if err = config.Validate(); err != nil {
+		log.Error("Invalid config", "err", err)
+		return
+	}
 	conv := dbconv.NewDBConverter(config)
 	ctx, cancel := context.WithCancel(context.Background())
 	defer cancel()

 	go func() {

[Review comment, Collaborator] not a total must - but I'd consider making stats a StopWaiter, and give it Start function that uses CallIteratively.

[Reply, Contributor Author] StopWaiter doesn't support stopping and resuming, I'd leave it as is for simplicity.

 		ticker := time.NewTicker(5 * time.Second)
 		defer ticker.Stop()
@@ -55,10 +72,23 @@ func main() {
 		}
 	}()

-	err = conv.Convert(ctx)
-	if err != nil {
-		panic(err)
+	if !config.VerifyOnly {
+		err = conv.Convert(ctx)
+		if err != nil {
+			log.Error("Conversion error", "err", err)
+			return
+		}
+		stats := conv.Stats()
+		log.Info("Conversion finished.", "entries", stats.Entries(), "avg e/s", stats.AverageEntriesPerSecond(), "avg MB/s", stats.AverageBytesPerSecond()/1024/1024, "elapsed", stats.Elapsed())
+	}
+
+	if config.Verify > 0 {
+		err = conv.Verify(ctx)
+		if err != nil {
+			log.Error("Verification error", "err", err)
+			return
+		}
+		stats := conv.Stats()
+		log.Info("Verification completed successfully.", "elapsed:", stats.Elapsed())
 	}
-	stats := conv.Stats()
-	log.Info("Conversion finished.", "entries", stats.Entries(), "avg e/s", stats.AverageEntriesPerSecond(), "avg MB/s", stats.AverageBytesPerSecond()/1024/1024, "elapsed", stats.Elapsed())
 }
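
In this commit, `main` logs conversion and verification errors and then returns, which in Go ends the process with exit status 0. A later commit in this PR (09d0371, "cmd/dbconv: return 1 on error from main binary") addresses the exit code, but its diff is not shown here. The sketch below shows one common way to get a non-zero status while still letting deferred calls run. The functions `run` and `doWork` and the `--fail` argument are hypothetical and not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run holds the program logic and returns the process exit status. Keeping
// the logic out of main lets deferred cleanup in run complete before
// os.Exit is called.
func run(args []string) int {
	if err := doWork(args); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return 1
	}
	return 0
}

// doWork is a hypothetical placeholder for conversion and verification; it
// fails only when the made-up "--fail" argument is present.
func doWork(args []string) error {
	for _, a := range args {
		if a == "--fail" {
			return errors.New("simulated failure")
		}
	}
	return nil
}

func main() {
	os.Exit(run(os.Args[1:]))
}
```

A non-zero status matters for wrappers such as the `convert-databases.bash` script added in this PR, since a shell script typically decides whether a step succeeded from the exit code.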