Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] ls/list_dir for AzureBlobStorage #1408

Merged
merged 15 commits into from
Sep 29, 2023
Merged

[FEAT] ls/list_dir for AzureBlobStorage #1408

merged 15 commits into from
Sep 29, 2023

Conversation

xcharleslin
Copy link
Contributor

@xcharleslin xcharleslin commented Sep 22, 2023

File listing for AzureBlobStorage via list_blobs(prefix) API. Closes

Manually checked:

  • scheme:// -> all buckets
  • scheme://bucket -> ls the bucket
  • scheme://bucket/ -> ls the bucket
  • scheme://bucket/dir -> ls the dir
  • scheme://bucket/dir/ -> ls the dir
  • scheme://bucket/dir// -> ls the dir
  • scheme://bucket/file.txt -> returns a single File entry
  • scheme://bucket/emptydir -> returns []
  • scheme://bucket/MISSING should throw FileNotFoundError
  • buckets and folders are returned with trailing slashes

Implementation notes:

  • This reimplements iter_dir of the ObjectSource trait, since the Azure Rust SDK exposes paginated streams directly (instead of continuation tokens).
  • Supports both protocols that fsspec supports (az:// and abfs://). Whichever protocol used is persisted.
  • Supports both URI formats that fsspec supports (protocol://container/path-part/file and protocol://[email protected]/path-part/file), but all results are translated to the first format.

@codecov
Copy link

codecov bot commented Sep 22, 2023

Codecov Report

Merging #1408 (06f033e) into main (069432d) will decrease coverage by 0.02%.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1408      +/-   ##
==========================================
- Coverage   74.66%   74.64%   -0.02%     
==========================================
  Files          60       60              
  Lines        6042     6042              
==========================================
- Hits         4511     4510       -1     
- Misses       1531     1532       +1     

see 1 file with indirect coverage changes

@xcharleslin xcharleslin changed the title Charles/azure ls [FEAT] ls/list_dir for AzureBlobStorage Sep 27, 2023
@xcharleslin xcharleslin marked this pull request as ready for review September 27, 2023 19:05
@github-actions github-actions bot added the enhancement New feature or request label Sep 27, 2023
.map(|container| {
Ok(self._container_to_file_metadata(protocol.clone(), container))
})
.collect::<Vec<_>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to collect this if are you wrapping futures::stream::iter

Copy link
Contributor Author

@xcharleslin xcharleslin Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, though I had to move all the arguments into the stream to do this; otherwise references escape the closure into the returned stream.

// https://docs.rs/azure_storage_blobs/0.15.0/azure_storage_blobs/container/operations/list_blobs/struct.ListBlobsBuilder.html
// https://docs.rs/azure_storage_blobs/0.15.0/azure_storage_blobs/service/operations/struct.ListContainersBuilder.html
// https://docs.rs/azure_core/0.15.0/azure_core/struct.Pageable.html
assert!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return an error here instead rather than asserting

blob_item,
))
})
.collect::<Vec<_>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also no need to collect here if we are feeding futures::stream::iter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you could also just use then and return the stream directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source here is an iterator not a stream, so I can't use then. Removed the collect though!

.delimiter(delimiter.to_string())
.prefix(upper_dir.to_string())
.into_stream()
.try_collect::<Vec<_>>()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldnt have to collect this and then iterate over it. You should be able to use then to transform (map) and then any to terminate as soon as it finds the condition of (filepath == full_path_with_trailing_delimiter).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Since the stream items are falliable Result<T> and I want to propagate the error, I can't use any.

Unfortunately, there is not a try_any in TryStreamExt either. Instead, I implemented what would have been try_any with .try_skip_while() and try_next().

upper_results_stream
.map_ok(|file_info| (file_info.filepath == full_path_with_trailing_delimiter))
.try_skip_while(|is_match| futures::future::ready(Ok(!is_match)))
.boxed()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont think you need to box this to get the value

@xcharleslin xcharleslin merged commit 831d45c into main Sep 29, 2023
24 checks passed
@xcharleslin xcharleslin deleted the charles/azure-ls branch September 29, 2023 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants