Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add iter_keys_upto_delimiter #93

Merged
merged 6 commits into from
Jul 26, 2019
Merged

Add iter_keys_upto_delimiter #93

merged 6 commits into from
Jul 26, 2019

Conversation

crepererum
Copy link
Contributor

PR is split in the following way:

  1. fixing travis again
  2. implement the generic function
  3. implement fast path for FS and Azure

Closes #83

@coveralls
Copy link

coveralls commented Jun 12, 2019

Coverage Status

Coverage decreased (-7.7%) to 81.708% when pulling 4b5a3d0 on crepererum:issue_83 into 00500e2 on mbr:master.

@fmarczin fmarczin requested a review from mbr June 17, 2019 12:36
@fmarczin
Copy link
Collaborator

@mbr : This seems like a useful addition to me and complements the prefix support in list_keys() and other places quite nicely.
I'm asking for your approval, since this PR extends the core API.

@mbr
Copy link
Owner

mbr commented Jul 2, 2019

I'm not entirely sure this needs to be in the core API (I like to keep it small, as you know ;)). The "primitive" implementation is just iterating over all keys and collecting the prefix. While this is useful, is there a real need to add this to the core API?

There is one condition under which I would add it: If it is common for backend-implementations to offer a more efficient version of this (I think S3 might do this, but they might limit it to a fixed / delimiter). Otherwise, it would not be possible to take advantage of these.

@crepererum
Copy link
Contributor Author

crepererum commented Jul 8, 2019

@mbr I've added fast-paths for Azure (all possible delimiters) and FS (only works if the delimiter is the OS path separator).

For S3, this also works for any delimiter (see boto3 docs and AWS API docs), but I did not implement this since I think we should tackle #84 first and I'm not an S3 user.

So yes, it is rather common for upstream services to support this "directory-like" listing. I did some internal benchmarks for massive stores and you get huge advantages from the fast-path:

  • Azure and S3: the reason is that you get only a limited number of keys per HTTP request (4k for Azure, 1k for S3). So if you have like 1m keys in you store but you only need to list a specific prefix, this can make a difference between 1k and 1 HTTP request, or depending on your connection the difference between 1-many minutes and a sub-second response.
  • FS: the reason is just that you don't need to walk recursively through all possible directories but rather just list 1 concrete path.

@mbr
Copy link
Owner

mbr commented Jul 8, 2019

Nice. In that case you have my blessings, I shall bikeshed no more =). Probably could not hurt to mention these things somewhere in the docs, i.e. that some backends might have better performance characteristics than others for this.

@mbr mbr requested review from fmarczin and removed request for mbr July 8, 2019 09:20
@@ -108,6 +108,33 @@ def iter_keys(self, prefix=u""):
"""
raise NotImplementedError

def iter_prefixes_upto_delimiter(self, delimiter, prefix=u""):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would iter_prefixes work here as well? If so, shorter is better.

@@ -58,6 +58,20 @@ def iter_keys(self, prefix=u""):
return (self._unmap_key(k) for k in self._dstore.iter_keys(self._map_key_prefix(prefix))
if self._filter(k))

def iter_prefixes_upto_delimiter(self, delimiter, prefix=u""):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito (iter_prefixes?)

Copy link
Collaborator

@fmarczin fmarczin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fundamentally agree, but have a suggestion to shorten the method name.

@crepererum
Copy link
Contributor Author

done

@fmarczin fmarczin merged commit c05ea19 into mbr:master Jul 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add 'delimiter' keyword to 'iter_keys'
4 participants