-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sentence Transformers test (soon) no longer expected to fail #1918
Sentence Transformers test (soon) no longer expected to fail #1918
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay! Thanks for opening this PR @tomaarsen!
Just let me know what sentence-transformers is released so we can merge this PR.
Will do! |
Yes it does! I forgot about that. Then we should be good to merge once the CI is green. |
|
||
# Check model has been pushed properly | ||
model_id = f"{user}/{repo_name}" | ||
assert model_info(model_id).library_name == "sentence-transformers" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like for some reason the library_name
attribute is not set when requesting the model info 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly odd. When I install with git+https://github.com/UKPLab/sentence-transformers.git#egg=sentence-transformers
, it fails the first 2 times that I run pytest
, and passes the 2 subsequent times :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it fails, the only siblings
in ModelInfo
is [RepoSibling(rfilename='.gitattributes', size=None, blob_id=None, lfs=None)]
, so the repo only has the gitattributes. If I breakpoint
and inspect the times that it does pass, then all files are seen by the model_info
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and between each run, is model_id = f"{user}/{repo_name}"
changing? The value should be unique for each run (generated here). I'm asking because having dependency between tests is odd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it does differ between runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, now the spaCy tests are failing & sentence_transformers is passing. Perhaps it's indeed a flaky test setup where model_info
gets slightly outdated data if it is called "too soon".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @Kakulukian @coyotte508 do you know if something has changed recently on the /api/models/<repo_id>
endpoint that would make the model cache longer to update? In the test and thread above, we are doing:
- create new repo
- push model with modelcard
- get model info (GET
/api/models/repo_id
) - check
model_info.library_name
.
It looks like adding a 1 second delay between steps 2. and 3. makes the test more robust. But I don't remember this was the case before. It is not a problem to add this delay in our tests but prefer to let you know in case it's a bigger problem server-side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we rely on cache instead of building from scratch. So if the cache update (following the push) is going on in the background, it can display outdated info.
You can bypass this by passing the commit ID instead of HEAD
, eg /api/models/<repo_id>/revision/<commit_id>
You can pass main
as the commit id (for now it should work, maybe later we'll optimize)
We can add support for Cache-Control
header to skip cache if needed later on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this seeems to happen only for commit
endpoint, not push (as it's been awaited since https://github.com/huggingface/moon-landing/pull/5501)
We can fix it but potentially commit endpoint will take longer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! No need to optimize anything on the endpoint as it's mostly a problem in internal tests. Good to know about the revision workaround 👍
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1918 +/- ##
===========================================
+ Coverage 49.01% 81.99% +32.98%
===========================================
Files 65 65
Lines 8092 8092
===========================================
+ Hits 3966 6635 +2669
+ Misses 4126 1457 -2669 ☔ View full report in Codecov by Sentry. |
Thanks for debugging @tomaarsen. Let's merge this now :) |
Hello!
Pull Request overview
xfail
from Sentence Transformer "save_to_hub" test.Details
The upcoming Sentence Transformers 2.3.0 has removed the hardcoded endpoint in
save_to_hub
, so we can soon expect for this behaviour to work correctly.