-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Modify list_collections client methods to return a list of collection… #3216
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
@itaismith, this feels extremely (and possibly unnecessarily) breaking change. I'll do some research on this but I feel that there are tons of apps with the following snippet of code: for collection in client.list_collections():
# do something with collection.name or collection.id or collection.metadata (e.g. filter based on name or some metadata)
# or any of the other methods count(), add(), delete() ... It seems to me that people have gotten around the issue that this PR is trying to address e.g. use a collection from list_collection to add data an expecting the embedding function to work out of the box. |
@tazarov we should make this breaking change because the default behavior is incorrect. When we fix the behavior, we can then have users do the right thing. We should prefer to break sooner than later, and support users to the extent we can in the meantime. @itaismith we might want to be clever here and make a wrapper for |
As a courtesy maybe we bump to a non patch version to ship this
…On Mon, Dec 2, 2024 at 10:19 AM Anton Troynikov ***@***.***> wrote:
@tazarov <https://github.com/tazarov> we should make this breaking change
because the default behavior is incorrect. When we fix the behavior, we can
then have users do the right thing.
We should prefer to break sooner than later, and support users to the
extent we can in the meantime. @itaismith <https://github.com/itaismith>
we might want to be clever here and make a wrapper for "str" where if you
try to call any collection methods on the name we return we give you an
informative error and point you at the migration guide.
—
Reply to this email directly, view it on GitHub
<#3216 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKW32MPUSNV2VBYLWNGVQD2DSQCJAVCNFSM6AAAAABS2BF5Y6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMJSGM2DANJTG4>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
395a827
to
ce7320e
Compare
ce7320e
to
5e4e3bc
Compare
5e4e3bc
to
732735d
Compare
732735d
to
146c7a2
Compare
146c7a2
to
71178d2
Compare
71178d2
to
072c87b
Compare
072c87b
to
b8fb8aa
Compare
b8fb8aa
to
0cc7a70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to be sure we aren't shadowing str
functionality inadvertently, otherwise looks good to me minus nits.
chromadb/api/models/Collection.py
Outdated
if name in collection_attributes_and_methods: | ||
raise NotImplementedError( | ||
f"In Chroma v0.6.0, list_collections only returns collection names. " | ||
f"Use get_collection to access Collection.{name}. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should read Client.get_collection("{name}")
Collection.{name} is confusing and not something anywhere in our API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name
here is probably a confusing argument name. It is the attribute a user would try to access on the collection name string. So:
collection = client.list_collections()[0]
collection.add(...)
The error would be:
Use get_collection to access Collection.add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I see now. I think this is confusing and should be made clearer Maybe:
"Use Client.get_collection({collection_name})
to call {method_name}
"
chromadb/api/models/Collection.py
Outdated
f"See https://docs.trychroma.com/deployment/migration for more information." | ||
) | ||
|
||
raise AttributeError(f"'CollectionName' object has no attribute '{name}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is a bit funky because str
doesn't have a __getattr__
but does have __getattribute__
which is subtly different. I'm concerned that we would be shadowing str
method calls accidentally;
- can we verify that this doesn't do anything unexpected with e.g.
.join
- if it doesn't, can we comment on how this works at the level of method dispatch in each case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is funky indeed. I can switch to explicit function definitions.
In any case this is a detailed description of the dispatch order:
When a user will try to access an attribute on a CollectionName
string, the __getattribute__
method of str
is invoked first. If a valid str method or property is found, it will be used. Otherwise, the fallback __getattr__
defined here is invoked next. It will error if the requested attribute is a Collection
method or property.
For example:
collection_name = client.list_collections()[0] # collection_name = "test"
collection_name.startsWith("t") # Evaluates to True.
__getattribute__
is invoked first, selecting startsWith
from str
.
collection_name.add(ids=[...], documents=[...]) # Raises the error defined in this class.
__getattribute__
is invoked first, not finding a match in str
. __getattr__
from this class is invoked and raises an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's funky because if this was just str
, __getattr__
would never get called (and doesn't exist) - this is handled by __getattribute__
checking for it. I think that's fine but we should make some sort of comment.
@@ -89,7 +89,7 @@ def list_collections(self) -> None: | |||
colls = self.client.list_collections() | |||
assert len(colls) == len(self.model) | |||
for c in colls: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe explicitly name this variable now to indicate it's a name not an actual collection
0cc7a70
to
e5ec6e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending nit.
chromadb/api/models/Collection.py
Outdated
if item in collection_attributes_and_methods: | ||
raise NotImplementedError( | ||
f"In Chroma v0.6.0, list_collections only returns collection names. " | ||
f"Use Client.get_collection(collection_name) to access {item}. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we know collection_name
here and should put it in the error string.
e5ec6e9
to
350bd54
Compare
Description of changes
list_collections
currently returns collection objects. Using these objects can lead to errors, since some collections may have been created with embedding functions that are not available at the time of calling tolist_collections
. This PR modifieslist_collections
to return only collection names, until we have EF persistence.Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
Docstrings and reference updated.