-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Ingest External SST #188
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: wangnengjie <[email protected]>
Signed-off-by: wangnengjie <[email protected]>
…heck test Signed-off-by: wangnengjie <[email protected]>
Signed-off-by: wangnengjie <[email protected]>
The |
Signed-off-by: wangnengjie <[email protected]>
Signed-off-by: wangnengjie <[email protected]>
Yes... I remember it is set to 1. |
Well, that's... amazing... I debug a while for my failed test. I think I should make another pr to make it works right. |
Signed-off-by: wangnengjie <[email protected]>
Signed-off-by: wangnengjie <[email protected]>
Signed-off-by: wangnengjie <[email protected]>
Signed-off-by: wangnengjie <[email protected]>
Generally done. You can have a look. @GanZiheng BTW, the |
@@ -44,6 +44,7 @@ message ManifestChange { | |||
uint64 key_id = 4; | |||
EncryptionAlgo encryption_algo = 5; | |||
uint32 compression = 6; // Only used for CREATE Op. | |||
uint64 global_version = 7; // Only used for file ingest, 0 means no global_version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to add global_version
just for the ingest external file scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For ingested files, version of every key is invalid. Or if you have any better idea to assign version for keys in ingested files? Iter on every key and issue random writes is not a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For ingested files, version of every key is invalid. Or if you have any better idea to assign version for keys in ingested files? Iter on every key and issue random writes is not a good idea.
How about rewriting files to be ingested? In this way, we can also deal with the large value cannot be stored in value log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For ingested files, version of every key is invalid. Or if you have any better idea to assign version for keys in ingested files? Iter on every key and issue random writes is not a good idea.
How about rewriting files to be ingested? In this way, we can also deal with the large value cannot be stored in value log.
I think it really costs to rewrite. The bandwidth on cloud device is precious. Large value can store to value log when compactioning ingested files (do we impl this?). And, when rewriting, we are not sure these files can finally success to ingest and so split large values to value log needs more think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we only write value log when writing entry to LSM tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we only write value log when writing entry to LSM tree.
Split during compaction is nice when the threshold changes dynamically or reopen db with different config.
Well, we can discuss this point tomorrow or offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For ingested files, version of every key is invalid. Or if you have any better idea to assign version for keys in ingested files? Iter on every key and issue random writes is not a good idea.
How about rewriting files to be ingested? In this way, we can also deal with the large value cannot be stored in value log.
I think this will block read process. We need to get commit_ts (version) before rewrite and due to SSI protection, every read_ts after that should wait for this txn to finish (watermark).
Bytes::copy_from_slice(user_key(file.table.as_ref().unwrap().smallest())), | ||
Bytes::copy_from_slice(user_key(file.table.as_ref().unwrap().biggest())), | ||
); | ||
let version = self.core.orc.new_ingest_commit_ts(vec![range], &self.opts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we only get commit timestamp once for all external files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to. But for files that overlap with each other, there might be same key. And if we get one commit_ts for all these files, there will be same key with same version in different SST which breaks the protection.
We currently use stable channel for the sake of avoiding CI failing too often. I think there is a format error in this version of code. |
As what I have said, Does CI often failed in nightly channel? I saw that all tests (except the first one) run in nightly channel for the support of sanitizer... |
Specifically, failed due to the clippy check. We can consider remove these flags or just turn back to nightly channel, I think that's not matter. |
Signed-off-by: wangnengjie <[email protected]>
Signed-off-by: wangnengjie <[email protected]>
36552fe
to
f241a88
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #188 +/- ##
==========================================
+ Coverage 89.63% 90.80% +1.16%
==========================================
Files 39 40 +1
Lines 8838 9679 +841
==========================================
+ Hits 7922 8789 +867
+ Misses 916 890 -26 |
Signed-off-by: wangnengjie [email protected]
See details in: #187