Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sentence Transformers test (soon) no longer expected to fail #1918

Merged
merged 5 commits into from
Dec 18, 2023

Conversation

tomaarsen
Copy link
Member

Hello!

Pull Request overview

  • Remove xfail from Sentence Transformer "save_to_hub" test.
  • Add an assert to verify that it should work.

Details

The upcoming Sentence Transformers 2.3.0 has removed the hardcoded endpoint in save_to_hub, so we can soon expect for this behaviour to work correctly.

  • Tom Aarsen

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! Thanks for opening this PR @tomaarsen!
Just let me know what sentence-transformers is released so we can merge this PR.

@tomaarsen
Copy link
Member Author

Will do!
In the meantime, do the contrib tests download the bleeding edge version from the respective repos?

@Wauplin
Copy link
Contributor

Wauplin commented Dec 18, 2023

In the meantime, do the contrib tests download the bleeding edge version from the respective repos?

Yes it does! I forgot about that. Then we should be good to merge once the CI is green.


# Check model has been pushed properly
model_id = f"{user}/{repo_name}"
assert model_info(model_id).library_name == "sentence-transformers"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like for some reason the library_name attribute is not set when requesting the model info 😕

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly odd. When I install with git+https://github.com/UKPLab/sentence-transformers.git#egg=sentence-transformers, it fails the first 2 times that I run pytest, and passes the 2 subsequent times :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it fails, the only siblings in ModelInfo is [RepoSibling(rfilename='.gitattributes', size=None, blob_id=None, lfs=None)], so the repo only has the gitattributes. If I breakpoint and inspect the times that it does pass, then all files are seen by the model_info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and between each run, is model_id = f"{user}/{repo_name}" changing? The value should be unique for each run (generated here). I'm asking because having dependency between tests is odd

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does differ between runs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, now the spaCy tests are failing & sentence_transformers is passing. Perhaps it's indeed a flaky test setup where model_info gets slightly outdated data if it is called "too soon".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @Kakulukian @coyotte508 do you know if something has changed recently on the /api/models/<repo_id> endpoint that would make the model cache longer to update? In the test and thread above, we are doing:

  1. create new repo
  2. push model with modelcard
  3. get model info (GET /api/models/repo_id)
  4. check model_info.library_name.

It looks like adding a 1 second delay between steps 2. and 3. makes the test more robust. But I don't remember this was the case before. It is not a problem to add this delay in our tests but prefer to let you know in case it's a bigger problem server-side.

Copy link
Member

@coyotte508 coyotte508 Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we rely on cache instead of building from scratch. So if the cache update (following the push) is going on in the background, it can display outdated info.

You can bypass this by passing the commit ID instead of HEAD, eg /api/models/<repo_id>/revision/<commit_id>

You can pass main as the commit id (for now it should work, maybe later we'll optimize)

We can add support for Cache-Control header to skip cache if needed later on

Copy link
Member

@coyotte508 coyotte508 Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this seeems to happen only for commit endpoint, not push (as it's been awaited since https://github.com/huggingface/moon-landing/pull/5501)

We can fix it but potentially commit endpoint will take longer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! No need to optimize anything on the endpoint as it's mostly a problem in internal tests. Good to know about the revision workaround 👍

Copy link

codecov bot commented Dec 18, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a071655) 49.01% compared to head (a74d62a) 81.99%.

❗ Current head a74d62a differs from pull request most recent head ae7389c. Consider uploading reports for the commit ae7389c to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1918       +/-   ##
===========================================
+ Coverage   49.01%   81.99%   +32.98%     
===========================================
  Files          65       65               
  Lines        8092     8092               
===========================================
+ Hits         3966     6635     +2669     
+ Misses       4126     1457     -2669     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Wauplin
Copy link
Contributor

Wauplin commented Dec 18, 2023

Thanks for debugging @tomaarsen. Let's merge this now :)

@Wauplin Wauplin merged commit c4ddfc7 into huggingface:main Dec 18, 2023
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants