feat(misc-tools): create a selective db clone tool #8494 #10528

Draft
wants to merge 1 commit into
base: master

@siarhei-agoric (Contributor) commented Nov 19, 2024

Description

ref: #8494

The db-clone.js tool lets the user clone only a desired subset of a SwingStore db into a separate db file, based on filtering criteria. It is intended to be the primary data extraction and packaging tool for the transcript replay tool(s).

Command-line options:
--backup
	creates a consistent full backup of a source db, even
	when the db is opened and in active use by SwingSet.
	Cannot be combined with any other option.

Remaining options can be combined together and with selection filters:
--transcripts[=all]
	copy transcripts, all or according to other filters.

--snapshots[=all|auto|<id1>[,<id2>[, ...]]]
	copy snapshots, all or according to other filters.
	"auto" selects only snapshots mentioned within the transcript
	items being copied.

--bundles[=all|auto|<id1>[,<id2>[, ...]]]
	copy bundles, all or according to other filters.
	"auto" selects only bundles mentioned within the transcript
	items being copied.

Selection filters:
--vats=[all|<id1>[,<id2>[, ...]]]
	Select only vats listed.
	"all" has the same effect as not using --vats option at all.

--startPos=<pos>
	Select only transcript items with position >= startPos.

--endPos=<pos>
	Select only transcript items with position <= endPos.

Other options:
--stats
	print out a short before/after summary.

--debug
	print debugging information.
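
To make the `all|auto|<id1>[,<id2>...]` value syntax above concrete, here is a minimal sketch of how such a value could be interpreted. `parseSelection` is a hypothetical helper, not code from db-clone.js, and treating a bare flag the same as `all` is an assumption made for illustration.

```javascript
// Hypothetical helper (not from db-clone.js): interpret the value given to
// --snapshots, --bundles, or --vats.
// `value` is undefined/true for a bare flag, otherwise the text after "=".
function parseSelection(value, { allowAuto = true } = {}) {
  // Assumption: a bare flag behaves like "all".
  if (value === undefined || value === true || value === 'all') {
    return { mode: 'all', ids: [] };
  }
  if (value === 'auto') {
    if (!allowAuto) {
      throw new Error('"auto" is not valid for this option');
    }
    return { mode: 'auto', ids: [] };
  }
  // Anything else is read as a comma-separated id list; blanks are dropped.
  const ids = String(value)
    .split(',')
    .map(id => id.trim())
    .filter(id => id.length > 0);
  if (ids.length === 0) {
    throw new Error('expected "all", "auto", or a comma-separated id list');
  }
  return { mode: 'ids', ids };
}
```

For `--vats`, which does not list `auto`, a caller would pass `{ allowAuto: false }` so that `--vats=auto` is rejected instead of silently accepted.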

Examples:

Clone whole database:

node db-clone.js SwingStore.sqlite clone.sqlite --backup

Clone transcripts of vat v123 starting from position 1000 and through 1200, include only snapshots and bundles found within those transcripts:

node db-clone.js SwingStore.sqlite clone.sqlite \
	--vats=v123 --startPos=1000 --endPos=1200 \
	--transcripts --snapshots=auto --bundles=auto

Clone all bundles and all snapshots, show stats:

node db-clone.js SwingStore.sqlite clone.sqlite \
	--snapshots=all --bundles=all --stats
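
As an illustration of how the selection filters combine, the sketch below turns `--vats`, `--startPos`, and `--endPos` into a parameterized query over `transcriptItems`. `buildTranscriptItemsQuery` is a hypothetical name and is not taken from db-clone.js; the column names are the ones used by the `transcriptItems` table in this PR, and the source db is assumed to have the same layout.

```javascript
// Hypothetical helper (not from db-clone.js): build a parameterized SELECT
// for transcript items from the --vats / --startPos / --endPos filters.
// An empty `vats` list means "all vats", mirroring --vats=all.
function buildTranscriptItemsQuery({ vats = [], startPos, endPos } = {}) {
  const clauses = [];
  const params = [];
  if (vats.length > 0) {
    // One placeholder per vat id keeps the values out of the SQL text.
    clauses.push(`vatID IN (${vats.map(() => '?').join(', ')})`);
    params.push(...vats);
  }
  if (startPos !== undefined) {
    clauses.push('position >= ?'); // inclusive lower bound
    params.push(startPos);
  }
  if (endPos !== undefined) {
    clauses.push('position <= ?'); // inclusive upper bound
    params.push(endPos);
  }
  const where = clauses.length > 0 ? ` WHERE ${clauses.join(' AND ')}` : '';
  const sql = `SELECT vatID, position, item, incarnation FROM transcriptItems${where} ORDER BY vatID, position`;
  return { sql, params };
}

// Filters from the second example: vat v123, positions 1000 through 1200.
const query = buildTranscriptItemsQuery({ vats: ['v123'], startPos: 1000, endPos: 1200 });
console.log(query.sql, query.params);
```

Both bounds are inclusive, matching the `>=` and `<=` wording of the option descriptions.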

Security Considerations

The tool opens a source database in read-only mode and requires access only to what a local node admin would have access to.

Scaling Considerations

The tool holds a read transaction open for the duration of the operation, which may affect performance when it is used to clone a live production database.

Documentation Considerations

The tool is intended to be used by developers. See description above.

Testing Considerations

TBD; Manual testing so far.

Upgrade Considerations

None.

ref: #8494

The tool SwingSet/misc-tools/db-clone.js allows the user to clone only
a desired subset of content from SwingStore db into a separate db file,
based on filtering criteria.

Command-line options:
--backup
	creates a consistent full backup of a source db, even
	when the db is opened and in active use by SwingSet.
	Cannot be combined with any other option.

Remaining options can be combined together and with selection filters:
--transcripts[=all]
	copy transcripts, all or according to other filters.

--snapshots[=all|auto|<id1>[,<id2>[, ...]]]
	copy snapshots, all or according to other filters.
	"auto" selects only snapshots mentioned within the transcript
	items being copied.

--bundles[=all|auto|<id1>[,<id2>[, ...]]]
	copy bundles, all or according to other filters.
	"auto" selects only bundles mentioned within the transcript
	items being copied.

Selection filters:
--vats=[all|<id1>[,<id2>[, ...]]]
	Select only vats listed.
	"all" has the same effect as not using --vats option at all.

--startPos=<pos>
	Select only transcript items with position >= startPos.

--endPos=<pos>
	Select only transcript items with position <= endPos.

Other options:
--stats
	print out a short before/after summary.

--debug
	print debugging information.

Examples:

Clone whole database:

  node db-clone.js SwingStore.sqlite clone.sqlite --backup

Clone transcripts of vat v123 starting from position 1000 and through 1200,
include only snapshots and bundles found within those transcripts:

  node db-clone.js SwingStore.sqlite clone.sqlite \
	--vats=v123 --startPos=1000 --endPos=1200 \
	--transcripts --snapshots=auto --bundles=auto

Clone all bundles and all snapshots, show stats:

  node db-clone.js SwingStore.sqlite clone.sqlite \
	--snapshots=all --bundles=all --stats

@siarhei-agoric added the enhancement and SwingSet labels Nov 19, 2024

@mhofman (Member) left a comment

Didn't go through the actual implementation yet, but I see a decent amount of duplication with statements that already exist in the swing-store package. Any reason not to move that tool there and refactor to import common abstractions instead?

Comment on lines +364 to +383
// create tables regardless of whether any items had been found or not
destDb.exec(`CREATE TABLE IF NOT EXISTS transcriptItems (
vatID TEXT, position INTEGER, item TEXT, incarnation INTEGER, PRIMARY KEY (vatID, position)
)`);
destDb.exec(`CREATE TABLE IF NOT EXISTS transcriptSpans (
vatID TEXT, startPos INTEGER, endPos INTEGER, hash TEXT, isCurrent INTEGER CHECK (isCurrent = 1), incarnation INTEGER,
PRIMARY KEY (vatID, startPos), UNIQUE (vatID, isCurrent)
)`);

destDb.exec(
`CREATE INDEX IF NOT EXISTS currentTranscriptIndex ON transcriptSpans (vatID, isCurrent)`,
);

const insertTranscriptItems = destDb.prepare(
`INSERT OR IGNORE INTO transcriptItems (vatID, position, item, incarnation) VALUES (?, ?, ?, ?)`,
);
const insertTranscriptSpans = destDb.prepare(
`INSERT OR IGNORE INTO transcriptSpans (vatID, startPos, endPos, hash, isCurrent, incarnation) VALUES (?, ?, ?, ?, ?, ?)`,
);

Member

I'm worried about maintenance. Is there any way to replicate the schema of the source db?

Contributor Author

Yes and no.
Yes: well, it is obviously available in the source DB itself.
No: we are filtering and looking at very particular columns to massage the data, which implies that we do know the schema upfront.

And now back to yes: this low-level data massaging should be abstracted away into a separate layer, probably below the *Store layer. Ideally it should also allow for SQL-native copy between the databases without having to go through JavaScript. See ATTACH DATABASE for example.
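
A rough sketch of what an ATTACH DATABASE based copy might look like, assuming the statements run in order on the destination connection. `planAttachCopy` and the `src` schema alias are hypothetical names for illustration only, and the sketch covers just `transcriptItems`. Reading the `sql` column of `sqlite_master` is one way to pick up the source's own CREATE TABLE text instead of hard-coding the schema.

```javascript
// Hypothetical sketch (not part of db-clone.js): plan a SQL-native copy of
// one vat's transcript items using ATTACH DATABASE, so rows move between
// the two files without passing through JavaScript.
function planAttachCopy({ sourcePath, vatID, startPos, endPos }) {
  const columns = 'vatID, position, item, incarnation';
  return [
    // 1. Make the source file visible under the schema name "src".
    { sql: 'ATTACH DATABASE ? AS src', params: [sourcePath] },
    // 2. Fetch the source's CREATE TABLE text; the caller would exec the
    //    returned string on the destination before step 3.
    {
      sql: "SELECT sql FROM src.sqlite_master WHERE type = 'table' AND name = 'transcriptItems'",
      params: [],
    },
    // 3. Copy the selected rows entirely inside SQLite.
    {
      sql: `INSERT OR IGNORE INTO main.transcriptItems (${columns}) SELECT ${columns} FROM src.transcriptItems WHERE vatID = ? AND position >= ? AND position <= ?`,
      params: [vatID, startPos, endPos],
    },
    // 4. Release the source file.
    { sql: 'DETACH DATABASE src', params: [] },
  ];
}
```

The filter values still have to be known up front, so this only removes the per-row round trip through JavaScript, not the need to know which columns to filter on.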

@mhofman mhofman self-requested a review November 19, 2024 19:32
@siarhei-agoric
Contributor Author

> Didn't go through the actual implementation yet, but I see a decent amount of duplication with statements that already exist in the swing-store package. Any reason not to move that tool there and refactor to import common abstractions instead?

I wanted to see what the actual tool would end up looking like and what functionality it would need before getting bogged down in all of the details of such a critical component as SwingStore and its *Store subcomponents. The main requirement was to be able to get the data I need out of production with as little chance of interference (both performance and code dependencies) as possible. The key stumbling block is that the current *Store implementations require write access to the source DB.
Now that the tool is available, two independent paths are open in parallel: one is to use the tool as-is (for now) to move forward with the transcript replay tool fix/development. The other is to go back, refactor, and unify the tool with the other critical components.
