-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(iroh)!: Blob batch PR, attempt 3 #2545
Conversation
b9e6766
to
a372bd6
Compare
a372bd6
to
ede3c55
Compare
I admit, after looking at it a little more, I'm somewhat surprised by the impl iroh::client::blobs::Client {
pub async fn batch() -> Result<Batch>;
}
impl iroh::client::blobs::Batch {
pub async fn push_bytes(&self, name: String, bytes: impl Into<Bytes>) -> Result<()>;
pub async fn push_file(&self, name: String, path: PathBuf) -> Result<u64>;
pub async fn push_dir(&self, name: String, root: PathBuf) -> Result<()>;
// [...]
pub async fn push_dir_with_opts(
&self,
root: PathBuf,
opts: AddDirOpts,
) -> Result<()>;
// [...]
pub async fn commit(&self) -> Result<Tag>; // maybe even taking an owned `self`?
pub async fn commit_to(&self, tag: &Tag) -> Result<()>;
} So this would But it's also more limited, so a classic complexity vs. expressiveness issue. Perhaps it makes sense to investigate some use-cases to see if the added expressiveness is worth it. "Importing the linux kernel" is definitely one. |
(I would prefer the concept of a collection to be just a tiny sliver in the client) The above proposal assumes that you always want to create exactly 1 collection or HashSeq, but e.g. it is possible that you want to create multiple individually tagged blobs or hashseqs in a batch. The long term idea is that even interactions with the network that modify the local store will happen in the context of a batch, so you will be able to use temp tags. Imagine traversing a dag. You get a blob, parse it for links, get another blob, parse it for links, finally find what you have been looking for, then assign a permanent tag. This would be easily expressible using the API using temp tags. Admittedly the concept of temp tags is even more useful in ipfs where you can build more complex dags from the leafs and then assign a final tag once you are at the root. But I think it is sufficiently useful even in iroh with its 1 level hierarchies. Ah, another problem with this API is that the creation of the HashSeq is more or less sequential, whereas with TempTags you can just spawn a bunch of tasks that download stuff, then collect into a sequence of TempTags, then create a HashSeq and make it permanent. |
Okay. I'm convinced the additional expressiveness is worth it. |
Glad I convinced you.
Agree that upgrade is not great. Maybe persist? I mean, a temp tag is purely in memory while a named tag is persistent in the database. |
Yes to |
# Conflicts: # iroh/src/client.rs # iroh/src/client/blobs.rs # iroh/src/lib.rs # iroh/src/node.rs
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2545/docs/iroh/ Last updated: 2024-08-15T09:03:08Z |
2209c07
to
6e94538
Compare
Ok, renamed. So here is the batch API: https://n0-computer.github.io/iroh/pr/2545/docs/iroh/client/blobs/struct.Batch.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all I'm pretty happy with this. I'd love to now be able to turn all other non-batch blob APIs into ones that always call sync()
, but I guess before we can really do that, we'd have needed to make sure that the docs
API also has a batch API without immediate sync()
s after every operation.
Anyhow, that's for another PR, but just repeating that that's why I'm excited for this.
As for the review:
- Lots of small documentation nits/suggestions
- I'm not really happy with the
wait_for_gc
function simply sleeping for 50s. We really shouldn't have tests like that. EDIT whoops that's not what it's doing. It's 50 milliseconds
Co-authored-by: Philipp Krüger <[email protected]>
Co-authored-by: Philipp Krüger <[email protected]>
Co-authored-by: Philipp Krüger <[email protected]>
Co-authored-by: Philipp Krüger <[email protected]>
Co-authored-by: Philipp Krüger <[email protected]>
Co-authored-by: Philipp Krüger <[email protected]>
Co-authored-by: Philipp Krüger <[email protected]>
Co-authored-by: Philipp Krüger <[email protected]>
Co-authored-by: Philipp Krüger <[email protected]>
Description
This is the third attempt to add a batch API for adding blobs. Previous one was #2339
The basic idea is the following: all changes to the store happen in the context of a batch. All write operations within a batch produce temp tags. These temp tags are scoped to the batch and keep the data alive as long as the batch exists.
At some point, the API user has to upgrade one or more temp tags to permanent tags.
All non-batch operations would long term be implemented in terms of batch operations.
In a second step, the following rpc calls would be replaced by their batch equivalent.
The third one is very nice, since it means that the notion of a collection (as in a special kind of hashseq) no longer has to even exist in the node code.
Breaking Changes
All other public changes are adding of new APIs.
Notes & open questions
Note: in the previous version I had an optimisation to avoid storing TempTags in the case where there are multiple TempTags with the same hash. I removed this to keep things simple. We can add it back later.
Change checklist