-
Notifications
You must be signed in to change notification settings - Fork 27.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update metadata loading for oneformer #28398
Update metadata loading for oneformer #28398
Conversation
e9ce193
to
0779f42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 nits but LGTM, thanks!
except RepositoryNotFoundError: | ||
fname = hf_hub_download(repo_id, class_info_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason to try without repo_type=dataset
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just that someone might store it under the model repo rather than the dataset. As the model is trained on a dataset - there's already a coupling and we already store data specific mappings e.g. id2label on the model side.
I can remove if you think it's better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok :-)
Dataset repository on huggingface hub containing the JSON file with class information for the dataset. | ||
repo_path (`str`, *optional*, defaults to `"shi-labs/oneformer_demo"`): | ||
Path to hub repo or local directory containing the JSON file with class information for the dataset. | ||
If unset, will load `class_info_file` from the current working directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be more precise: if unset or the corresponding file could not be found locally ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If unset, then repo_path
is set to ""
- effectively making the filename class_info_file
rather than repo_path/class_info
, but it's independent of whether it can be found locally.
Would this be clearer:
Path to hub repo or local directory containing the JSON file with class information for the dataset.
If unset, will look for `class_info_file` in the current working directory.
And then adding an additional check in load_metadata
:
def load_metadata(repo_id, class_info_file):
fname = os.path.join("" if repo_id is None else repo_id, class_info_file)
if not os.path.exists(fname) or not os.path.isfile(fname):
if repo_id is None:
raise ValueError(f"Could not file {fname} locally. repo_id must be defined if loading from the hub")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either works for me, but we can go for the above one for simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did both in a777668 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks fo the cleanup
* Update meatdata loading for oneformer * Enable loading from a model repo * Update docstrings * Fix tests * Update tests * Clarify repo_path behaviour
* Update meatdata loading for oneformer * Enable loading from a model repo * Update docstrings * Fix tests * Update tests * Clarify repo_path behaviour
* Update meatdata loading for oneformer * Enable loading from a model repo * Update docstrings * Fix tests * Update tests * Clarify repo_path behaviour
* Update meatdata loading for oneformer * Enable loading from a model repo * Update docstrings * Fix tests * Update tests * Clarify repo_path behaviour
What does this PR do?
Previously, the loading of the metadata file for oneformer was effectively hardcoded to download a file from the hub. This PR updates the
prepare_metadata
method to allow for loading of local files as well as model repos.Fixes #23116 and partially addresses #27572
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.