storage: introduce CasManager to support chunk dedup at runtime #1626

Merged
merged 7 commits into from
Oct 23, 2024

Conversation

Desiki-high
Member

@Desiki-high Desiki-high commented Sep 21, 2024

Relevant Issue (if applicable)

If there are issues related to this pull request, please list them.

Details

Based on #1507; completes the implementation and testing.

Types of changes

What types of changes does your PullRequest introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation Update (if none of the other choices apply)

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@Desiki-high Desiki-high requested a review from a team as a code owner September 21, 2024 07:52
@Desiki-high Desiki-high requested review from imeoer, hsiangkao and power-more and removed request for a team September 21, 2024 07:52

codecov bot commented Sep 21, 2024

Codecov Report

Attention: Patch coverage is 72.62248% with 95 lines in your changes missing coverage. Please review.

Project coverage is 60.51%. Comparing base (15ec192) to head (f6719a2).
Report is 7 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| storage/src/cache/dedup/mod.rs | 83.41% | 25 Missing and 9 partials ⚠️ |
| storage/src/cache/cachedfile.rs | 12.90% | 25 Missing and 2 partials ⚠️ |
| src/bin/nydusd/main.rs | 0.00% | 18 Missing ⚠️ |
| storage/src/cache/filecache/mod.rs | 62.50% | 6 Missing ⚠️ |
| storage/src/cache/fscache/mod.rs | 60.00% | 6 Missing ⚠️ |
| storage/src/utils.rs | 96.42% | 0 Missing and 2 partials ⚠️ |
| storage/src/cache/dedup/db.rs | 80.00% | 0 Missing and 1 partial ⚠️ |
| utils/src/digest.rs | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##           master    #1626      +/-   ##
==========================================
+ Coverage   60.43%   60.51%   +0.07%     
==========================================
  Files         146      146              
  Lines       48841    49178     +337     
  Branches    46322    46659     +337     
==========================================
+ Hits        29517    29760     +243     
- Misses      17558    17636      +78     
- Partials     1766     1782      +16     
| Files with missing lines | Coverage Δ |
|---|---|
| storage/src/cache/mod.rs | 57.76% <ø> (ø) |
| storage/src/cache/dedup/db.rs | 79.18% <80.00%> (+0.17%) ⬆️ |
| utils/src/digest.rs | 87.81% <0.00%> (-0.51%) ⬇️ |
| storage/src/utils.rs | 95.98% <96.42%> (+0.10%) ⬆️ |
| storage/src/cache/filecache/mod.rs | 66.55% <62.50%> (-0.12%) ⬇️ |
| storage/src/cache/fscache/mod.rs | 75.69% <60.00%> (-0.76%) ⬇️ |
| src/bin/nydusd/main.rs | 0.00% <0.00%> (ø) |
| storage/src/cache/cachedfile.rs | 36.74% <12.90%> (-0.52%) ⬇️ |
| storage/src/cache/dedup/mod.rs | 77.37% <83.41%> (+77.37%) ⬆️ |

... and 3 files with indirect coverage changes

@Desiki-high Desiki-high force-pushed the storage/copy-range branch 9 times, most recently from b386bde to f360c3b Compare September 27, 2024 10:15
@Desiki-high Desiki-high force-pushed the storage/copy-range branch 4 times, most recently from 60478a0 to a9b8fe4 Compare October 1, 2024 06:09
@Desiki-high Desiki-high force-pushed the storage/copy-range branch 9 times, most recently from b2f8cfb to 64a27ce Compare October 16, 2024 11:50
jiangliu and others added 2 commits October 17, 2024 09:46
Add helper copy_file_range() which:
- avoids copying data into userspace
- may support reflink on XFS and similar filesystems

Signed-off-by: Jiang Liu <[email protected]>
- improve copy_file_range when the target OS is not Linux
- add more comprehensive tests

Signed-off-by: Yadong Ding <[email protected]>
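
The two commits above describe a helper that lets the kernel copy a byte range between files, plus handling for targets other than Linux. The PR's actual code is not shown on this page, so the following is only a minimal sketch of what a portable fallback might look like, assuming a Unix target (it relies on `std::os::unix::fs::FileExt`). The function name `copy_range_fallback` and its signature are hypothetical. Unlike the Linux syscall, this version does stage data in a userspace buffer, which is exactly the cost the real helper is meant to avoid where the kernel supports it.

```rust
use std::fs::File;
use std::io::{self, ErrorKind};
use std::os::unix::fs::FileExt;

/// Hypothetical fallback: copy up to `len` bytes from `src` at `src_off`
/// to `dst` at `dst_off` using positioned reads and writes.
/// Returns the number of bytes copied, which is smaller than `len` only
/// when `src` reaches end of file first.
pub fn copy_range_fallback(
    src: &File,
    src_off: u64,
    dst: &File,
    dst_off: u64,
    len: usize,
) -> io::Result<usize> {
    // Bounded staging buffer so large ranges do not allocate `len` bytes.
    let mut buf = vec![0u8; len.min(64 * 1024)];
    let mut copied = 0usize;
    while copied < len {
        let want = (len - copied).min(buf.len());
        let n = match src.read_at(&mut buf[..want], src_off + copied as u64) {
            Ok(0) => break, // end of the source file
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Positioned write: neither file's cursor is moved.
        dst.write_all_at(&buf[..n], dst_off + copied as u64)?;
        copied += n;
    }
    Ok(copied)
}
```

Because both sides use explicit offsets, the sketch does not disturb the file cursors of `src` or `dst`, which matters when several chunks are written into one blob cache file.
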
jiangliu and others added 4 commits October 17, 2024 09:46
Implement CasManager to support chunk dedup at runtime.
The manager provides two major interfaces:
- add chunk data to the CAS database
- check whether a chunk exists in the CAS database and, if it does, copy it
  to the blob file with copy_file_range().

Signed-off-by: Jiang Liu <[email protected]>
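
To make the two interfaces in this commit message concrete, here is a rough sketch of a content-addressable index. It is an in-memory stand-in built on a `HashMap`, not the PR's database-backed implementation, and the names `CasIndex`, `ChunkLocation`, `add_chunk`, and `find_chunk` are hypothetical. It covers only the bookkeeping side; the copy into the blob file is left out.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Where a chunk's bytes live: which blob cache file, and at what offset.
#[derive(Clone, Debug, PartialEq)]
pub struct ChunkLocation {
    pub blob_path: PathBuf,
    pub offset: u64,
}

/// Hypothetical in-memory stand-in for the CAS database,
/// keyed by chunk id (for example, a content digest string).
#[derive(Default)]
pub struct CasIndex {
    chunks: HashMap<String, ChunkLocation>,
}

impl CasIndex {
    /// First interface: record where a chunk's data is stored.
    /// An existing record for the same chunk id is kept unchanged.
    pub fn add_chunk(&mut self, chunk_id: &str, blob_path: &Path, offset: u64) {
        self.chunks
            .entry(chunk_id.to_string())
            .or_insert(ChunkLocation {
                blob_path: blob_path.to_path_buf(),
                offset,
            });
    }

    /// Lookup half of the second interface: report whether the chunk is
    /// known and, if so, where a caller could copy it from.
    pub fn find_chunk(&self, chunk_id: &str) -> Option<&ChunkLocation> {
        self.chunks.get(chunk_id)
    }
}
```

Keeping the first location recorded for a chunk id is an arbitrary choice made for this sketch; any valid location would serve a later copy equally well.
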
- Changed `delete_blobs` method in `CasDb` to take an immutable reference (`&self`) instead of a mutable reference (`&mut self`).
- Updated `dedup_chunk` method in `CasMgr` to correctly handle the deletion of non-existent blob files from both the file descriptor cache and the database.
- Implemented the `gc` (garbage collection) method in `CasMgr` to identify and remove blobs that no longer exist on the filesystem, ensuring the database and cache remain consistent.

Signed-off-by: Yadong Ding <[email protected]>
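
The `gc` step described above removes records for blobs that no longer exist on the filesystem. A minimal sketch of that idea follows, assuming the chunk records are held in a plain `HashMap` from chunk id to `(blob path, offset)`. The function name `gc_missing_blobs` and this data layout are hypothetical, and the PR's file descriptor cache is not modeled.

```rust
use std::collections::{BTreeSet, HashMap};
use std::path::PathBuf;

/// Hypothetical GC pass: drop every chunk record whose blob file is gone
/// and return the distinct blob paths that were removed, in sorted order.
pub fn gc_missing_blobs(chunks: &mut HashMap<String, (PathBuf, u64)>) -> Vec<PathBuf> {
    let mut removed = BTreeSet::new();
    chunks.retain(|_chunk_id, location| {
        let exists = location.0.exists();
        if !exists {
            // Remember the stale blob so the caller can also evict
            // any cached handle it holds for that path.
            removed.insert(location.0.clone());
        }
        exists
    });
    removed.into_iter().collect()
}
```
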
Enable chunk deduplication for the file cache. It works as follows:
- When a chunk is not yet in the blob cache file, query the CAS database
  to check whether another blob data file already holds the required
  chunk. If a duplicate chunk exists in another data file, copy the chunk
  data into the current blob cache file using copy_file_range().
- After downloading a data chunk from the remote, save file/offset/chunk-id
  into the CAS database so it can be reused later.

Co-authored-by: Jiang Liu <[email protected]>
Co-authored-by: Yadong Ding <[email protected]>
Signed-off-by: Yadong Ding <[email protected]>
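
The read path in this commit message can be sketched as a single decision: try a local copy first, otherwise download and record the chunk. The code below is an illustration under stated assumptions, not the file cache's real code. `fill_chunk` and `ChunkSource` are hypothetical names, the CAS database is modeled as a `HashMap`, and the copy and download steps are passed in as closures so the control flow stays visible.

```rust
use std::collections::HashMap;

/// How a missing chunk ended up in the current blob cache file.
#[derive(Debug, PartialEq)]
pub enum ChunkSource {
    Deduplicated,
    Downloaded,
}

/// Hypothetical read path for a chunk that is not yet in `this_blob`.
/// `copy_from(blob, offset)` should copy the chunk from another blob file
/// and return true on success; `download()` fetches it from the remote.
pub fn fill_chunk<C, D>(
    index: &mut HashMap<String, (String, u64)>,
    chunk_id: &str,
    this_blob: &str,
    offset: u64,
    copy_from: C,
    download: D,
) -> ChunkSource
where
    C: FnOnce(&str, u64) -> bool,
    D: FnOnce(),
{
    // Step 1: ask the index whether some other blob file has the chunk.
    if let Some((blob, off)) = index.get(chunk_id).cloned() {
        if copy_from(&blob, off) {
            return ChunkSource::Deduplicated;
        }
    }
    // Step 2: no usable duplicate, so download and record the new location.
    // Overwriting a stale record here is a simplification made for this sketch.
    download();
    index.insert(chunk_id.to_string(), (this_blob.to_string(), offset));
    ChunkSource::Downloaded
}
```

Falling through to the download when the copy fails keeps a stale index record from turning into a read error.
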
Add documentation for cas.

Signed-off-by: Jiang Liu <[email protected]>
Add smoke test case for CAS and chunk dedup.

Signed-off-by: Yadong Ding <[email protected]>
Collaborator

@imeoer imeoer left a comment

LGTM, thanks!

@imeoer imeoer merged commit 57c112a into dragonflyoss:master Oct 23, 2024
24 checks passed