Support runtime chunk deduplication #1507
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff            @@
##           master    #1507    +/-   ##
========================================
  Coverage    62.73%   62.74%
========================================
  Files          129      129
  Lines        44153    44360    +207
  Branches     44153    44360    +207
========================================
+ Hits         27700    27834    +134
- Misses       15087    15144     +57
- Partials      1366     1382     +16
NICE. LGTM. `copy_file_range` is triggered in the tests, and it's effective.
Add helper copy_file_range(), which:
- avoids copying data through userspace
- may support reflink on xfs etc.

Signed-off-by: Jiang Liu <[email protected]>
Implement CasManager to support chunk dedup at runtime. The manager provides two major interfaces:
- add chunk data to the CAS database
- check whether a chunk exists in the CAS database and, if it does, copy it into the blob file by copy_file_range()

Signed-off-by: Jiang Liu <[email protected]>
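The CasManager in this commit is Rust; the following is a minimal Python sketch of the same idea, a content-addressable index mapping a chunk digest to the blob file and offset that already hold its data. The table layout, method names, and SQLite backing are assumptions for illustration only.

```python
import sqlite3

class CasManager:
    """Minimal sketch of a CAS (content-addressable storage) index.

    Maps a chunk digest to the blob file and offset where that chunk's
    data already lives, so a later request for the same digest can be
    satisfied by a local copy instead of a remote download.
    """

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            " digest TEXT PRIMARY KEY, blob_path TEXT, offset INTEGER)"
        )

    def add_chunk(self, digest, blob_path, offset):
        # Keep the first record for a digest; later duplicates are ignored.
        self.conn.execute(
            "INSERT OR IGNORE INTO chunks VALUES (?, ?, ?)",
            (digest, blob_path, offset),
        )
        self.conn.commit()

    def find_chunk(self, digest):
        row = self.conn.execute(
            "SELECT blob_path, offset FROM chunks WHERE digest = ?", (digest,)
        ).fetchone()
        return row  # (blob_path, offset) tuple, or None if unknown

mgr = CasManager()
mgr.add_chunk("sha256:abc", "/var/cache/blob1", 4096)
print(mgr.find_chunk("sha256:abc"))  # ('/var/cache/blob1', 4096)
print(mgr.find_chunk("sha256:xyz"))  # None
```

A found record would then be consumed by a copy_file_range()-style copy from `blob_path` at `offset` into the cache file being filled.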
Enable chunk deduplication for the file cache. It works in this way:
- When a chunk is not yet in the blob cache file, query the CAS database to find out whether another blob data file already contains the required chunk. If a duplicate chunk exists in another data file, copy the chunk data into the current blob cache file by using copy_file_range().
- After downloading a data chunk from remote, save the file/offset/chunk-id into the CAS database so it can be reused later.

Signed-off-by: Jiang Liu <[email protected]>
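The cache-fill flow this commit describes can be sketched as below. The function and variable names are hypothetical, the CAS index is simplified to an in-memory dict, and the real code copies between files with copy_file_range() rather than passing byte strings around; only the decision order (local cache, then CAS, then remote, then record) mirrors the commit message.

```python
def fill_chunk(cache, cas_index, digest, download):
    """Fetch the chunk with `digest` into the local cache, dedup via CAS.

    `cache` stands in for the blob cache file, `cas_index` maps a chunk
    digest to bytes already cached by some other blob file, and
    `download` simulates fetching from the remote backend. Returns the
    chunk data plus which path served it.
    """
    if digest in cache:            # already present in this cache file
        return cache[digest], "cache"
    if digest in cas_index:        # duplicate found in another blob file
        cache[digest] = cas_index[digest]
        return cache[digest], "cas"
    data = download(digest)        # fall back to the remote backend
    cache[digest] = data
    cas_index[digest] = data       # record it so later blobs can reuse it
    return data, "remote"

downloads = []
def download(digest):
    downloads.append(digest)
    return b"payload-" + digest.encode()

cache, cas = {}, {"d2": b"shared"}
print(fill_chunk(cache, cas, "d1", download))  # (b'payload-d1', 'remote')
print(fill_chunk(cache, cas, "d2", download))  # (b'shared', 'cas')
print(downloads)                               # ['d1'] - d2 never downloaded
```

The point of the second branch is exactly the dedup win: a chunk already held by any local blob file is never fetched from the remote again.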
Add smoke test case for chunk dedup. Signed-off-by: Jiang Liu <[email protected]>
Add documentation for CAS. Signed-off-by: Jiang Liu <[email protected]>
// Verify lower layer mounted by nydusd
ctx.Env.BootstrapPath = lowerBootstrap
tool.Verify(t, ctx, lowerLayer.FileTree)
We may need a way to check if the CAS works.
@@ -240,9 +240,16 @@ impl FileCacheEntry {
    };
    let blob_compressed_size = Self::get_blob_size(&reader, &blob_info)?;

    // Turn off chunk deduplication in case of tarfs.
    let cas_mgr = if is_tarfs {
Maybe give a warning message about the conflicting feature here.
@@ -208,11 +209,18 @@ impl FileCacheEntry {
    } else {
        reader.clone()
    };
    // Turn off chunk deduplication in case of cache data encryption is enabled or is tarfs.
    let cas_mgr = if mgr.cache_encrypted || mgr.cache_raw_data || is_tarfs {
Maybe give a warning message about the conflicting feature here.
If a record with the same chunk digest already exists, it will be reused. We call such a system CAS (Content Addressable Storage).

## Chunk Deduplication by Using CAS as L2 Cache
This still seems to be an experimental feature; do we still need to consider `cas.db` record recycling?
Hi all, I tried out this feature and it seems to work as expected. Is there something preventing it from being merged?

cc @jiangliu any updates, can we continue? :)

What is the status of this PR? Anything to do to help get it merged? :)

Completed in #1626.
Details

This PR enhances nydusd to support runtime chunk deduplication. It works in this way: when a duplicated chunk is detected, its data is copied into the local cache file by copy_file_range(). So there are two types of chunk deduplication. On filesystems with reflink support, copy_file_range() will be optimized to use a reference instead of a data copy, thus reducing local storage consumption.

Types of changes
What types of changes does your Pull Request introduce? Put an `x` in all the boxes that apply:

Checklist

Go over all the following points, and put an `x` in all the boxes that apply.