Add metadata benchmarks #1055

Open
wants to merge 1 commit into
base: main
from


turan18
Contributor

@turan18 turan18 commented Jan 29, 2024

Issue #, if available:

Description of changes:

Add benchmarks that measure metadata DB insertion performance. Also add a helper function that generates a random TAR file (TOC) with a given number of files/entries.
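
For illustration only, a TAR generator of this kind could be sketched as follows. This is not the helper from this PR: `buildTar` and `countEntries` are hypothetical names, and the sketch uses counter-based names instead of random ones so that it stays deterministic while keeping every name the same length.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// buildTar writes n regular-file entries with distinct, fixed-length names
// into an in-memory TAR archive. Names and contents are illustrative only.
func buildTar(n int) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for i := 0; i < n; i++ {
		body := []byte("x")
		hdr := &tar.Header{
			Name:     fmt.Sprintf("f%09d", i), // always 10 bytes long
			Mode:     0o644,
			Size:     int64(len(body)),
			Typeflag: tar.TypeReg,
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	// Close flushes the archive trailer.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// countEntries reads the archive back and counts its headers.
func countEntries(data []byte) (int, error) {
	tr := tar.NewReader(bytes.NewReader(data))
	count := 0
	for {
		_, err := tr.Next()
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return count, err
		}
		count++
	}
}

func main() {
	data, err := buildTar(100)
	if err != nil {
		panic(err)
	}
	n, err := countEntries(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 100
}
```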

Testing performed:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@turan18 turan18 force-pushed the add_metadata_benchmarks branch 15 times, most recently from c7bfdda to d5bba30 Compare February 5, 2024 16:48
@turan18 turan18 marked this pull request as ready for review February 5, 2024 16:53
@turan18 turan18 requested a review from a team as a code owner February 5, 2024 16:53
@@ -181,3 +186,12 @@ func RandomDigest() string {
d := digest.FromBytes(RandomByteData(10))
return d.String()
}

// RandString returns a random string of length n
func RandString(n int) string {
Contributor Author

File names must be random because we use them to traverse the fs tree when creating the DB, so RandString does not use our seeded random source. We still use a fixed length, so there shouldn't be any variance between runs.

Contributor

I don't understand this statement.

Contributor Author

If we used the seeded random, all our filenames would be the same, and the only thing differentiating them would be depth level (e.g., file vs. file/file vs. file/file/file).

When looping through the TOC, we maintain a map of node ID to metadata entry. The metadata entry has a map of the node's children, keyed by child name. When we add children to a node's metadata entry, we overwrite any existing child that shares the same name. This never happens in practice, since you cannot have multiple child nodes/files with the same name under a single parent node/directory. With identical names, however, our metadata/nodes bucket would not be fully populated, since each parent would effectively end up with only one child.
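
A minimal sketch of the overwrite described above. The `entry` type, `addChild`, and `countChildren` are hypothetical simplifications for illustration, not the actual metadata code:

```go
package main

import "fmt"

// entry is a hypothetical, simplified stand-in for a metadata entry:
// it only tracks children, keyed by child name.
type entry struct {
	children map[string]uint32 // child name -> child node ID
}

// addChild records id under name, replacing any existing child
// that already uses the same name.
func (e *entry) addChild(name string, id uint32) {
	if e.children == nil {
		e.children = make(map[string]uint32)
	}
	e.children[name] = id
}

// countChildren adds one child per name to a single parent and
// reports how many children the parent ends up with.
func countChildren(names []string) int {
	parent := &entry{}
	for i, name := range names {
		parent.addChild(name, uint32(i+1))
	}
	return len(parent.children)
}

func main() {
	same := []string{"file", "file", "file"}
	distinct := []string{"a1", "b2", "c3"}
	fmt.Println(countChildren(same))     // 1
	fmt.Println(countChildren(distinct)) // 3
}
```

With identical names the parent keeps a single child, which is why the bucket would be under-populated.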

To avoid this, we use rand so we get an actual pseudo-random string. We still use a fixed length of 10 for the filename, to ensure there isn't any variance in bbolt write performance between benchmark runs. (bbolt doesn't care about the content of a KV pair, since keys and values are just interpreted as byte slices; the length, however, does matter, since it controls how nodes/pages are split before writing to disk.)
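
As a rough sketch of the difference (not the code from this PR), assume the seeded path rebuilds a generator from the same fixed seed for every name; `seededName`, `randName`, and `fill` are hypothetical helpers. Note that the package-level `math/rand` source is only randomly seeded by default on Go 1.20 and later.

```go
package main

import (
	"fmt"
	"math/rand"
)

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// fill builds an n-byte string by drawing letters with intn.
func fill(n int, intn func(int) int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[intn(len(letters))]
	}
	return string(b)
}

// seededName is a hypothetical helper that creates a fresh generator from
// a fixed seed on every call, so every call returns the same string.
func seededName(seed int64, n int) string {
	r := rand.New(rand.NewSource(seed))
	return fill(n, r.Intn)
}

// randName draws from the package-level source instead, so successive
// calls return different strings of the same fixed length.
func randName(n int) string {
	return fill(n, rand.Intn)
}

func main() {
	fmt.Println(seededName(1, 10) == seededName(1, 10)) // true
	fmt.Println(len(randName(10)))                      // 10
}
```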

@turan18 turan18 force-pushed the add_metadata_benchmarks branch from d5bba30 to df9ec63 Compare February 7, 2024 14:26
}
defer os.Remove(f.Name())
db, err := bolt.Open(f.Name(), 0600, nil)
cwdPath, err := os.Getwd()
Contributor Author

Note: Since we want to measure write performance to disk, we have to write to a non-tmpfs location.
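
One way to do this, sketched with the standard library only: create the database file under the current working directory instead of the default temp directory, which is often tmpfs-backed. `newBenchFile` is a hypothetical helper, and the sketch assumes the working directory itself lives on a real disk.

```go
package main

import (
	"fmt"
	"os"
)

// newBenchFile creates an empty file inside dir and returns its path
// together with a cleanup function that removes it. Passing a directory
// explicitly avoids os.CreateTemp's default of os.TempDir().
func newBenchFile(dir string) (string, func(), error) {
	f, err := os.CreateTemp(dir, "metadata-bench-*.db")
	if err != nil {
		return "", nil, err
	}
	name := f.Name()
	// The database library is expected to reopen the file by path.
	if err := f.Close(); err != nil {
		os.Remove(name)
		return "", nil, err
	}
	return name, func() { os.Remove(name) }, nil
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		panic(err)
	}
	path, cleanup, err := newBenchFile(cwd)
	if err != nil {
		panic(err)
	}
	defer cleanup()
	fmt.Println("benchmark DB file:", path)
}
```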

@turan18 turan18 force-pushed the add_metadata_benchmarks branch from df9ec63 to 8f8eb12 Compare February 7, 2024 14:36
Contributor

@sondavidb sondavidb left a comment

Generally LGTM. There are a lot of minor changes that I'd like a bit of attention on before approving. Overall the functionality looks great, and I think it's a pretty cool addition to our testing suite.

@turan18 turan18 force-pushed the add_metadata_benchmarks branch 2 times, most recently from f2d02b9 to 692b984 Compare February 13, 2024 21:36
@turan18 turan18 force-pushed the add_metadata_benchmarks branch from 692b984 to 86bf1fa Compare February 14, 2024 02:00
Add benchmark functions that measure sequential and concurrent
writes to the underlying metadata db.

Signed-off-by: Yasin Turan <[email protected]>
@turan18 turan18 force-pushed the add_metadata_benchmarks branch from 86bf1fa to 9ffd6e7 Compare February 16, 2024 19:50
3 participants