smoke: add smoke test for cas and chunk dedup
Add smoke test cases for CAS and chunk dedup.

Signed-off-by: Yadong Ding <[email protected]>
Desiki-high committed Oct 16, 2024
1 parent a3e14f5 commit 64a27ce
Showing 12 changed files with 341 additions and 8 deletions.
6 changes: 3 additions & 3 deletions docs/data-deduplication.md
@@ -166,8 +166,8 @@ The node level CAS system helps to achieve O4 and O5.

# Node Level CAS System (Experimental)
Data deduplication can also be achieved when accessing Nydus images. The key idea is to maintain information about data chunks available on local host by using a database.
-When a chunk is needed but not available in the uncompressed data blob files yet, we will query the database using chunk digest as key.
-If a record with the same chunk digest already exists, it will be reused.
+When a chunk is needed but not available in the uncompressed data blob files yet, we will query the database using chunk digest as key.
+If a record with the same chunk digest already exists, it will be reused to reduce duplicate data downloads.
We call such a system CAS (Content Addressable Storage).
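The digest-keyed lookup described above can be sketched with a toy stand-in. The real implementation keeps these records in a SQLite database (the `Blobs` and `Chunks` tables that the smoke tests inspect); the in-memory map, the `chunkRecord` fields, and the SHA-256 digest below are illustrative assumptions, not the actual schema:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkRecord mirrors the idea of a row in the CAS database: the chunk
// digest is the key, and the value says where the chunk's uncompressed
// data already lives on the local host.
type chunkRecord struct {
	BlobID string // data blob file holding the chunk
	Offset uint64 // offset of the chunk inside that blob
}

// cas is a toy stand-in for the node-level CAS database.
type cas struct {
	chunks map[string]chunkRecord
}

func newCas() *cas { return &cas{chunks: map[string]chunkRecord{}} }

// getOrAdd queries by chunk digest; on a hit the existing record is
// reused (no download needed), on a miss the chunk is registered.
func (c *cas) getOrAdd(data []byte, blobID string, offset uint64) (chunkRecord, bool) {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if rec, ok := c.chunks[key]; ok {
		return rec, true // dedup hit: reuse locally available data
	}
	rec := chunkRecord{BlobID: blobID, Offset: offset}
	c.chunks[key] = rec
	return rec, false
}

func main() {
	db := newCas()
	_, hit1 := db.getOrAdd([]byte("chunk-A"), "blob-1", 0)
	_, hit2 := db.getOrAdd([]byte("chunk-A"), "blob-2", 4096) // same content, different image
	fmt.Println(hit1, hit2) // the second lookup is a dedup hit
}
```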

## Chunk Deduplication by Using CAS as L2 Cache
@@ -181,7 +181,7 @@ It works in this way:
![chunk_dedup_l2cache](images/chunk_dedup_l2_cache.png)

A data download operation can be avoided if a chunk already exists in the database.
-And if the underlying filesystem support data reference, `copy_file_range` will use reference instead of data copy, thus reduce storage space consumption.
+And if the **underlying filesystem supports data references**, `copy_file_range` will use a reference instead of a data copy, thus reducing storage space consumption.
This design has the benefit of robustness: the target blob file has no dependency on the database or the source blob files, which eases garbage collection.
But it depends on the capability of the underlying filesystem to actually reduce storage consumption.

2 changes: 2 additions & 0 deletions smoke/go.mod
@@ -12,6 +12,7 @@ require (
github.com/pkg/xattr v0.4.9
github.com/stretchr/testify v1.8.4
golang.org/x/sys v0.15.0
github.com/mattn/go-sqlite3 v1.14.23
)

require (
@@ -27,6 +28,7 @@ require (
github.com/google/go-cmp v0.6.0 // indirect
github.com/klauspost/compress v1.17.4 // indirect
github.com/kr/pretty v0.3.1 // indirect
github.com/mattn/go-sqlite3 v1.14.23 // indirect
github.com/moby/sys/mountinfo v0.7.1 // indirect
github.com/moby/sys/sequential v0.5.0 // indirect
github.com/opencontainers/image-spec v1.1.0-rc5 // indirect
6 changes: 6 additions & 0 deletions smoke/go.sum
@@ -10,6 +10,7 @@ github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGX
github.com/containerd/cgroups v1.1.0 h1:v8rEWFl6EoqHB+swVNjVoCJE8o3jX7e8nqBGPLaDFBM=
github.com/containerd/cgroups v1.1.0/go.mod h1:6ppBcbh/NOOUU+dMKrykgaBnK9lCIBxHqJDGwsa1mIw=
github.com/containerd/containerd v1.7.11 h1:lfGKw3eU35sjV0aG2eYZTiwFEY1pCzxdzicHP3SZILw=
github.com/containerd/containerd v1.7.11/go.mod h1:5UluHxHTX2rdvYuZ5OJTC5m/KJNs0Zs9wVoJm9zf5ZE=
github.com/containerd/continuity v0.4.3 h1:6HVkalIp+2u1ZLH1J/pYX2oBVXlJZvh1X1A7bEZ9Su8=
github.com/containerd/continuity v0.4.3/go.mod h1:F6PTNCKepoxEaXLQp3wDAjygEnImnZ/7o4JzpodfroQ=
github.com/containerd/fifo v1.1.0 h1:4I2mbh5stb1u6ycIABlBw9zgtlK8viPI9QkQNRQEEmY=
@@ -53,6 +54,7 @@ github.com/google/go-cmp v0.5.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/
github.com/google/go-cmp v0.5.3/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/uuid v1.1.2/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/google/uuid v1.5.0 h1:1p67kYwdtXjb0gL0BPiP1Av9wiZPo5A8z2cWkTZ+eyU=
github.com/google/uuid v1.5.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
@@ -67,7 +69,10 @@ github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/mattn/go-sqlite3 v1.14.23 h1:gbShiuAP1W5j9UOksQ06aiiqPMxYecovVGwmTxWtuw0=
github.com/mattn/go-sqlite3 v1.14.23/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y=
github.com/moby/sys/mountinfo v0.7.1 h1:/tTvQaSJRr2FshkhXiIpux6fQ2Zvc4j7tAhMTStAG2g=
github.com/moby/sys/mountinfo v0.7.1/go.mod h1:IJb6JQeOklcdMU9F5xQ8ZALD+CUr5VlGpwtX+VE0rpI=
github.com/moby/sys/sequential v0.5.0 h1:OPvI35Lzn9K04PBbCLW0g4LcFAJgHsvXsRyewg5lXtc=
github.com/moby/sys/sequential v0.5.0/go.mod h1:tH2cOOs5V9MlPiXcQzRC+eEyab644PWKGRYaaV5ZZlo=
github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
@@ -135,6 +140,7 @@ golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5h
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20220408201424-a24fb2fb8a0f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.15.0 h1:h48lPFYpsTvQJZF4EKyI4aLHaev3CxivZmv7yZig9pc=
golang.org/x/sys v0.15.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
152 changes: 152 additions & 0 deletions smoke/tests/cas_test.go
@@ -0,0 +1,152 @@
// Copyright 2024 Nydus Developers. All rights reserved.
//
// SPDX-License-Identifier: Apache-2.0

package tests

import (
"database/sql"
"fmt"
"os"
"path/filepath"
"testing"
"time"

_ "github.com/mattn/go-sqlite3"

"github.com/dragonflyoss/nydus/smoke/tests/texture"
"github.com/dragonflyoss/nydus/smoke/tests/tool"
"github.com/dragonflyoss/nydus/smoke/tests/tool/test"
"github.com/stretchr/testify/require"
)

type CasTestSuite struct{}

func (c *CasTestSuite) TestCasTables() test.Generator {
scenarios := tool.DescartesIterator{}
scenarios.Dimension(paramEnablePrefetch, []interface{}{false, true})

return func() (name string, testCase test.Case) {
if !scenarios.HasNext() {
return
}
scenario := scenarios.Next()

return scenario.Str(), func(t *testing.T) {
c.testCasTables(t, scenario.GetBool(paramEnablePrefetch))
}
}
}

func (c *CasTestSuite) testCasTables(t *testing.T, enablePrefetch bool) {
ctx, layer := texture.PrepareLayerWithContext(t)
ctx.Runtime.EnablePrefetch = enablePrefetch
ctx.Runtime.ChunkDedupDb = filepath.Join(ctx.Env.WorkDir, "cas.db")

nydusd, err := tool.NewNydusdWithContext(*ctx)
require.NoError(t, err)
err = nydusd.Mount()
require.NoError(t, err)
defer nydusd.Umount()
nydusd.Verify(t, layer.FileTree)

db, err := sql.Open("sqlite3", ctx.Runtime.ChunkDedupDb)
require.NoError(t, err)
defer db.Close()

for _, expectedTable := range []string{"Blobs", "Chunks"} {
// Manually execute a WAL checkpoint so pending writes in the WAL are flushed into the main database file before we read it.
_, err = db.Exec("PRAGMA wal_checkpoint(FULL)")
require.NoError(t, err)
var count int
query := fmt.Sprintf("SELECT COUNT(*) FROM %s;", expectedTable)
err = db.QueryRow(query).Scan(&count)
require.NoError(t, err)
if expectedTable == "Blobs" {
require.Equal(t, 1, count)
} else {
require.Equal(t, 8, count)
}
}
}

func (c *CasTestSuite) TestCasGcUmountByAPI() test.Generator {
scenarios := tool.DescartesIterator{}
scenarios.Dimension(paramEnablePrefetch, []interface{}{false, true})

return func() (name string, testCase test.Case) {
if !scenarios.HasNext() {
return
}
scenario := scenarios.Next()

return scenario.Str(), func(t *testing.T) {
c.testCasGcUmountByAPI(t, scenario.GetBool(paramEnablePrefetch))
}
}
}

func (c *CasTestSuite) testCasGcUmountByAPI(t *testing.T, enablePrefetch bool) {
ctx, layer := texture.PrepareLayerWithContext(t)
defer ctx.Destroy(t)

config := tool.NydusdConfig{
NydusdPath: ctx.Binary.Nydusd,
MountPath: ctx.Env.MountDir,
APISockPath: filepath.Join(ctx.Env.WorkDir, "nydusd-api.sock"),
ConfigPath: filepath.Join(ctx.Env.WorkDir, "nydusd-config.fusedev.json"),
ChunkDedupDb: filepath.Join(ctx.Env.WorkDir, "cas.db"),
}
nydusd, err := tool.NewNydusd(config)
require.NoError(t, err)

err = nydusd.Mount()
defer nydusd.Umount()
require.NoError(t, err)

config.BootstrapPath = ctx.Env.BootstrapPath
config.MountPath = "/mount"
config.BackendType = "localfs"
config.BackendConfig = fmt.Sprintf(`{"dir": "%s"}`, ctx.Env.BlobDir)
config.BlobCacheDir = ctx.Env.CacheDir
config.CacheType = ctx.Runtime.CacheType
config.CacheCompressed = ctx.Runtime.CacheCompressed
config.RafsMode = ctx.Runtime.RafsMode
config.EnablePrefetch = enablePrefetch
config.DigestValidate = false
config.AmplifyIO = ctx.Runtime.AmplifyIO
err = nydusd.MountByAPI(config)
require.NoError(t, err)

nydusd.VerifyByPath(t, layer.FileTree, config.MountPath)

db, err := sql.Open("sqlite3", config.ChunkDedupDb)
require.NoError(t, err)
defer db.Close()

for _, expectedTable := range []string{"Blobs", "Chunks"} {
var count int
query := fmt.Sprintf("SELECT COUNT(*) FROM %s;", expectedTable)
err := db.QueryRow(query).Scan(&count)
require.NoError(t, err)
require.NotZero(t, count)
}

// Simulate the nydus snapshotter clearing the blob cache
os.RemoveAll(filepath.Join(ctx.Env.WorkDir, "cache"))
time.Sleep(1 * time.Second)

nydusd.UmountByAPI(config.MountPath)

for _, expectedTable := range []string{"Blobs", "Chunks"} {
var count int
query := fmt.Sprintf("SELECT COUNT(*) FROM %s;", expectedTable)
err := db.QueryRow(query).Scan(&count)
require.NoError(t, err)
require.Zero(t, count)
}
}

func TestCas(t *testing.T) {
test.Run(t, &CasTestSuite{})
}
125 changes: 125 additions & 0 deletions smoke/tests/chunk_dedup_test.go
@@ -0,0 +1,125 @@
// Copyright 2024 Nydus Developers. All rights reserved.
//
// SPDX-License-Identifier: Apache-2.0

package tests

import (
"context"
"encoding/json"
"io"
"net"
"net/http"
"os"
"path/filepath"
"testing"
"time"

"github.com/stretchr/testify/require"

"github.com/dragonflyoss/nydus/smoke/tests/texture"
"github.com/dragonflyoss/nydus/smoke/tests/tool"
"github.com/dragonflyoss/nydus/smoke/tests/tool/test"
)

const (
paramIteration = "iteration"
)

type ChunkDedupTestSuite struct{}

type BackendMetrics struct {
ReadCount uint64 `json:"read_count"`
ReadAmountTotal uint64 `json:"read_amount_total"`
ReadErrors uint64 `json:"read_errors"`
}

func (c *ChunkDedupTestSuite) TestChunkDedup() test.Generator {
scenarios := tool.DescartesIterator{}
scenarios.Dimension(paramIteration, []interface{}{1})

file, err := os.CreateTemp("", "cas-*.db")
if err != nil {
panic(err)
}
defer os.Remove(file.Name())

return func() (name string, testCase test.Case) {
if !scenarios.HasNext() {
return
}
scenario := scenarios.Next()

return scenario.Str(), func(t *testing.T) {
c.testRemoteWithDedup(t, file.Name())
}
}
}

func (c *ChunkDedupTestSuite) testRemoteWithDedup(t *testing.T, dbPath string) {
ctx, layer := texture.PrepareLayerWithContext(t)
defer ctx.Destroy(t)
ctx.Runtime.EnablePrefetch = false
ctx.Runtime.ChunkDedupDb = dbPath

nydusd, err := tool.NewNydusdWithContext(*ctx)
require.NoError(t, err)
err = nydusd.Mount()
require.NoError(t, err)
defer nydusd.Umount()
nydusd.Verify(t, layer.FileTree)
metrics := c.getBackendMetrics(t, filepath.Join(ctx.Env.WorkDir, "nydusd-api.sock"))
require.Zero(t, metrics.ReadErrors)

ctx2, layer2 := texture.PrepareLayerWithContext(t)
defer ctx2.Destroy(t)
ctx2.Runtime.EnablePrefetch = false
ctx2.Runtime.ChunkDedupDb = dbPath

nydusd2, err := tool.NewNydusdWithContext(*ctx2)
require.NoError(t, err)
err = nydusd2.Mount()
require.NoError(t, err)
defer nydusd2.Umount()
nydusd2.Verify(t, layer2.FileTree)
metrics2 := c.getBackendMetrics(t, filepath.Join(ctx2.Env.WorkDir, "nydusd-api.sock"))
require.Zero(t, metrics2.ReadErrors)

require.Greater(t, metrics.ReadCount, metrics2.ReadCount)
require.Greater(t, metrics.ReadAmountTotal, metrics2.ReadAmountTotal)
}

func (c *ChunkDedupTestSuite) getBackendMetrics(t *testing.T, sockPath string) *BackendMetrics {
transport := &http.Transport{
MaxIdleConns: 10,
IdleConnTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
dialer := &net.Dialer{
Timeout: 5 * time.Second,
KeepAlive: 5 * time.Second,
}
return dialer.DialContext(ctx, "unix", sockPath)
},
}

client := &http.Client{
Timeout: 30 * time.Second,
Transport: transport,
}

resp, err := client.Get("http://unix/api/v1/metrics/backend")
require.NoError(t, err)
defer resp.Body.Close()

body, err := io.ReadAll(resp.Body)
require.NoError(t, err)

var metrics BackendMetrics
err = json.Unmarshal(body, &metrics)
require.NoError(t, err)

return &metrics
}

func TestChunkDedup(t *testing.T) {
test.Run(t, &ChunkDedupTestSuite{})
}
3 changes: 3 additions & 0 deletions smoke/tests/native_layer_test.go
@@ -24,6 +24,7 @@ const (
paramCacheCompressed = "cache_compressed"
paramRafsMode = "rafs_mode"
paramEnablePrefetch = "enable_prefetch"
paramChunkDedupDb = "chunk_dedup_db"
)

type NativeLayerTestSuite struct {
@@ -44,6 +45,7 @@ func (n *NativeLayerTestSuite) TestMakeLayers() test.Generator {
Dimension(paramBatch, []interface{}{"0", "0x100000"}).
Dimension(paramEncrypt, []interface{}{false, true}).
Dimension(paramAmplifyIO, []interface{}{uint64(0x100000)}).
Dimension(paramChunkDedupDb, []interface{}{"", "/tmp/cas.db"}).
Skip(func(param *tool.DescartesItem) bool {
// rafs v6 supports neither cached mode nor dummy cache
if param.GetString(paramFSVersion) == "6" {
@@ -79,6 +81,7 @@
ctx.Runtime.RafsMode = scenario.GetString(paramRafsMode)
ctx.Runtime.EnablePrefetch = scenario.GetBool(paramEnablePrefetch)
ctx.Runtime.AmplifyIO = scenario.GetUInt64(paramAmplifyIO)
ctx.Runtime.ChunkDedupDb = scenario.GetString(paramChunkDedupDb)
n.testMakeLayers(*ctx, t)
}
}