feat(iroh)!: Blob batch PR, attempt 3 #2545

rklaehn · 2024-07-25T09:05:01Z

Description

This is the third attempt to add a batch API for adding blobs. Previous one was #2339

The basic idea is the following: all changes to the store happen in the context of a batch. All write operations within a batch produce temp tags. These temp tags are scoped to the batch and keep the data alive as long as the batch exists.

At some point, the API user has to upgrade one or more temp tags to permanent tags.

All non-batch operations would long term be implemented in terms of batch operations.

In a second step, the following rpc calls would be replaced by their batch equivalent.

AddStream
AddPath
CreateCollection

The third one is very nice, since it means that the notion of a collection (as in a special kind of hashseq) no longer has to even exist in the node code.

Breaking Changes

iroh::client::blobs::BlobStatus has a new case NotFound
iroh::client::blobs::BlobStatus::Partial: size is now a BaoBlobSize instead of a u64

All other public changes are adding of new APIs.

Notes & open questions

Note: in the previous version I had an optimisation to avoid storing TempTags in the case where there are multiple TempTags with the same hash. I removed this to keep things simple. We can add it back later.

Change checklist

Self-review.
Documentation updates following the style guide, if relevant.
Tests if relevant.
All breaking changes documented.

also fix docs

matheus23 · 2024-07-25T15:42:10Z

I would be curious what you think of the API.

I admit, after looking at it a little more, I'm somewhat surprised by the TempTag API combined with the Batch. I guess I had a more limited API that's more tailored to creating Collections or HashSeqes in mind, something like this:

impl iroh::client::blobs::Client {
  pub async fn batch() -> Result<Batch>;
}

impl iroh::client::blobs::Batch {
  pub async fn push_bytes(&self, name: String, bytes: impl Into<Bytes>) -> Result<()>;
  pub async fn push_file(&self, name: String, path: PathBuf) -> Result<u64>;
  pub async fn push_dir(&self, name: String, root: PathBuf) -> Result<()>;
  // [...]
  pub async fn push_dir_with_opts(
    &self,
    root: PathBuf,
    opts: AddDirOpts,
  ) -> Result<()>;
  // [...]
  pub async fn commit(&self) -> Result<Tag>; // maybe even taking an owned `self`?
  pub async fn commit_to(&self, tag: &Tag) -> Result<()>;
}

So this would commit the batch to be a Collection, and you wouldn't surface the concept of TempTags to the user at all.

But it's also more limited, so a classic complexity vs. expressiveness issue.
Thoughts?

Perhaps it makes sense to investigate some use-cases to see if the added expressiveness is worth it. "Importing the linux kernel" is definitely one.

rklaehn · 2024-07-25T16:11:06Z

I would be curious what you think of the API.

I admit, after looking at it a little more, I'm somewhat surprised by the TempTag API combined with the Batch. I guess I had a more limited API that's more tailored to creating Collections or HashSeqes in mind, something like this:
impl iroh::client::blobs::Client {
  pub async fn batch() -> Result<Batch>;
}

impl iroh::client::blobs::Batch {
  pub async fn push_bytes(&self, name: String, bytes: impl Into<Bytes>) -> Result<()>;
  pub async fn push_file(&self, name: String, path: PathBuf) -> Result<u64>;
  pub async fn push_dir(&self, name: String, root: PathBuf) -> Result<()>;
  // [...]
  pub async fn push_dir_with_opts(
    &self,
    root: PathBuf,
    opts: AddDirOpts,
  ) -> Result<()>;
  // [...]
  pub async fn commit(&self) -> Result<Tag>; // maybe even taking an owned `self`?
  pub async fn commit_to(&self, tag: &Tag) -> Result<()>;
}
So this would commit the batch to be a Collection, and you wouldn't surface the concept of TempTags to the user at all.

But it's also more limited, so a classic complexity vs. expressiveness issue. Thoughts?

Perhaps it makes sense to investigate some use-cases to see if the added expressiveness is worth it. "Importing the linux kernel" is definitely one.

(I would prefer the concept of a collection to be just a tiny sliver in the client)

The above proposal assumes that you always want to create exactly 1 collection or HashSeq, but e.g. it is possible that you want to create multiple individually tagged blobs or hashseqs in a batch.

The long term idea is that even interactions with the network that modify the local store will happen in the context of a batch, so you will be able to use temp tags.

Imagine traversing a dag. You get a blob, parse it for links, get another blob, parse it for links, finally find what you have been looking for, then assign a permanent tag. This would be easily expressible using the API using temp tags.

Admittedly the concept of temp tags is even more useful in ipfs where you can build more complex dags from the leafs and then assign a final tag once you are at the root. But I think it is sufficiently useful even in iroh with its 1 level hierarchies.

Ah, another problem with this API is that the creation of the HashSeq is more or less sequential, whereas with TempTags you can just spawn a bunch of tasks that download stuff, then collect into a sequence of TempTags, then create a HashSeq and make it permanent.

matheus23 · 2024-07-31T08:00:27Z

Okay. I'm convinced the additional expressiveness is worth it.
One thing I'd really like to change is the naming of Batch::upgrade, though. Can we switch it to something like make_permanent or commit? Or something else if there are other suggestions.
Upgrade only makes sense once I've read the docs for Tag and TempTag, looked at the Batch::upgrade signature and compared the two items in my head.
And if I'm looking for ways to make my temporary changes permanent, I end up skipping a function name like upgrade in my mind.

rklaehn · 2024-08-12T08:42:43Z

Okay. I'm convinced the additional expressiveness is worth it.

Glad I convinced you.

One thing I'd really like to change is the naming of Batch::upgrade, though. Can we switch it to something like make_permanent or commit? Or something else if there are other suggestions.

Agree that upgrade is not great. Maybe persist? I mean, a temp tag is purely in memory while a named tag is persistent in the database.

matheus23 · 2024-08-12T08:53:28Z

Yes to persist in favor of upgrade

# Conflicts: # iroh/src/client.rs # iroh/src/client/blobs.rs # iroh/src/lib.rs # iroh/src/node.rs

github-actions · 2024-08-14T14:46:31Z

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2545/docs/iroh/

Last updated: 2024-08-15T09:03:08Z

rklaehn · 2024-08-14T15:29:31Z

Ok, renamed. So here is the batch API: https://n0-computer.github.io/iroh/pr/2545/docs/iroh/client/blobs/struct.Batch.html

matheus23

All in all I'm pretty happy with this. I'd love to now be able to turn all other non-batch blob APIs into ones that always call sync(), but I guess before we can really do that, we'd have needed to make sure that the docs API also has a batch API without immediate sync()s after every operation.
Anyhow, that's for another PR, but just repeating that that's why I'm excited for this.

As for the review:

Lots of small documentation nits/suggestions
I'm not really happy with the wait_for_gc function simply sleeping for 50s. We really shouldn't have tests like that. EDIT whoops that's not what it's doing. It's 50 milliseconds

iroh/src/client/blobs/batch.rs

iroh/src/node/rpc.rs

iroh/tests/batch.rs

Co-authored-by: Philipp Krüger <[email protected]>

rklaehn force-pushed the batch-blob-api-3 branch from b9e6766 to a372bd6 Compare July 25, 2024 09:09

rklaehn changed the title ~~Blob batch PR, attempt 3~~ feat(iroh)!: Blob batch PR, attempt 3 Jul 25, 2024

Add back code from the batch PR

ede3c55

rklaehn force-pushed the batch-blob-api-3 branch from a372bd6 to ede3c55 Compare July 25, 2024 09:14

rklaehn requested a review from matheus23 July 25, 2024 09:25

rklaehn added 3 commits July 25, 2024 12:53

Merge branch 'main' into batch-blob-api-3

19d2eb5

Make sure the required structs are publicly exported

481370a

also fix docs

Add optional (default on) sync to all tag operations

d5c31da

dignifiedquire mentioned this pull request Jul 29, 2024

feat(iroh-bytes): Batch blob api #2339

Closed

6 tasks

Merge branch 'main' into batch-blob-api-3

322008e

# Conflicts: # iroh/src/client.rs # iroh/src/client/blobs.rs # iroh/src/lib.rs # iroh/src/node.rs

rename upgrade to persist

6e94538

rklaehn force-pushed the batch-blob-api-3 branch from 2209c07 to 6e94538 Compare August 14, 2024 14:53

rklaehn added 2 commits August 14, 2024 17:56

Merge branch 'main' into batch-blob-api-3

5fa3f23

Merge branch 'main' into batch-blob-api-3

a904412

rklaehn marked this pull request as ready for review August 14, 2024 15:29

matheus23 requested changes Aug 15, 2024

View reviewed changes

rklaehn and others added 7 commits August 15, 2024 11:02

Update iroh/src/client/blobs/batch.rs

16da1b2

Co-authored-by: Philipp Krüger <[email protected]>

Update iroh/src/client/blobs/batch.rs

faafa8f

Co-authored-by: Philipp Krüger <[email protected]>

Update iroh/src/client/blobs/batch.rs

cebafe3

Co-authored-by: Philipp Krüger <[email protected]>

Update iroh/src/client/blobs/batch.rs

22d5f59

Co-authored-by: Philipp Krüger <[email protected]>

Update iroh/src/client/blobs/batch.rs

1afb0f7

Co-authored-by: Philipp Krüger <[email protected]>

Update iroh/src/client/blobs/batch.rs

41560ee

Co-authored-by: Philipp Krüger <[email protected]>

Update iroh/src/client/blobs/batch.rs

65b5f8f

Co-authored-by: Philipp Krüger <[email protected]>

rklaehn and others added 5 commits August 15, 2024 11:03

Update iroh/src/client/blobs/batch.rs

9e9b807

Co-authored-by: Philipp Krüger <[email protected]>

Update iroh/src/node/rpc.rs

724185f

Co-authored-by: Philipp Krüger <[email protected]>

Merge branch 'main' into batch-blob-api-3

36a63c5

fix docs

8ce97a3

use register_gc_done_cb for gc tests

483e731

matheus23 approved these changes Aug 15, 2024

View reviewed changes

rklaehn enabled auto-merge August 15, 2024 09:11

rklaehn added this pull request to the merge queue Aug 15, 2024

Merged via the queue into main with commit 9a55122 Aug 15, 2024
27 of 28 checks passed

dignifiedquire deleted the batch-blob-api-3 branch August 15, 2024 09:34

matheus23 mentioned this pull request Aug 21, 2024

blob store batch API #2297

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(iroh)!: Blob batch PR, attempt 3 #2545

feat(iroh)!: Blob batch PR, attempt 3 #2545

rklaehn commented Jul 25, 2024 •

edited

Loading

matheus23 commented Jul 25, 2024 •

edited

Loading

rklaehn commented Jul 25, 2024 •

edited

Loading

matheus23 commented Jul 31, 2024

rklaehn commented Aug 12, 2024

matheus23 commented Aug 12, 2024

github-actions bot commented Aug 14, 2024 •

edited

Loading

rklaehn commented Aug 14, 2024

matheus23 left a comment •

edited

Loading

feat(iroh)!: Blob batch PR, attempt 3 #2545

feat(iroh)!: Blob batch PR, attempt 3 #2545

Conversation

rklaehn commented Jul 25, 2024 • edited Loading

Description

Breaking Changes

Notes & open questions

Change checklist

matheus23 commented Jul 25, 2024 • edited Loading

rklaehn commented Jul 25, 2024 • edited Loading

matheus23 commented Jul 31, 2024

rklaehn commented Aug 12, 2024

matheus23 commented Aug 12, 2024

github-actions bot commented Aug 14, 2024 • edited Loading

rklaehn commented Aug 14, 2024

matheus23 left a comment • edited Loading

Choose a reason for hiding this comment

rklaehn commented Jul 25, 2024 •

edited

Loading

matheus23 commented Jul 25, 2024 •

edited

Loading

rklaehn commented Jul 25, 2024 •

edited

Loading

github-actions bot commented Aug 14, 2024 •

edited

Loading

matheus23 left a comment •

edited

Loading