Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update metadata loading for oneformer #28398

Merged
merged 6 commits into from
Jan 12, 2024

Conversation

amyeroberts
Copy link
Collaborator

What does this PR do?

Previously, the loading of the metadata file for oneformer was effectively hardcoded to download a file from the hub. This PR updates the prepare_metadata method to allow for loading of local files as well as model repos.

from transformers import OneformerImageProcessor

image_processor = OneformerImageProcessor(
    checkpoint, 
    repo_path="path/to/file", 
    class_info_file="local_metadata.json"
)

Fixes #23116 and partially addresses #27572

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@amyeroberts amyeroberts force-pushed the fix-oneformer-processing branch from e9ce193 to 0779f42 Compare January 10, 2024 15:19
@amyeroberts amyeroberts requested a review from ydshieh January 10, 2024 16:09
Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 nits but LGTM, thanks!

Comment on lines +357 to +358
except RepositoryNotFoundError:
fname = hf_hub_download(repo_id, class_info_file)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to try without repo_type=dataset?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just that someone might store it under the model repo rather than the dataset. As the model is trained on a dataset - there's already a coupling and we already store data specific mappings e.g. id2label on the model side.

I can remove if you think it's better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok :-)

Dataset repository on huggingface hub containing the JSON file with class information for the dataset.
repo_path (`str`, *optional*, defaults to `"shi-labs/oneformer_demo"`):
Path to hub repo or local directory containing the JSON file with class information for the dataset.
If unset, will load `class_info_file` from the current working directory.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be more precise: if unset or the corresponding file could not be found locally ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If unset, then repo_path is set to "" - effectively making the filename class_info_file rather than repo_path/class_info, but it's independent of whether it can be found locally.

Would this be clearer:

            Path to hub repo or local directory containing the JSON file with class information for the dataset.
            If unset, will look for `class_info_file` in the current working directory.

And then adding an additional check in load_metadata:


def load_metadata(repo_id, class_info_file):
    fname = os.path.join("" if repo_id is None else repo_id, class_info_file)

    if not os.path.exists(fname) or not os.path.isfile(fname):
        if repo_id is None:
            raise ValueError(f"Could not file {fname} locally. repo_id must be defined if loading from the hub")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either works for me, but we can go for the above one for simplicity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did both in a777668 :)

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks fo the cleanup

@amyeroberts amyeroberts merged commit 666a6f0 into huggingface:main Jan 12, 2024
18 checks passed
@amyeroberts amyeroberts deleted the fix-oneformer-processing branch January 12, 2024 12:35
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024
* Update meatdata loading for oneformer

* Enable loading from a model repo

* Update docstrings

* Fix tests

* Update tests

* Clarify repo_path behaviour
MadElf1337 pushed a commit to MadElf1337/transformers that referenced this pull request Jan 15, 2024
* Update meatdata loading for oneformer

* Enable loading from a model repo

* Update docstrings

* Fix tests

* Update tests

* Clarify repo_path behaviour
wgifford pushed a commit to wgifford/transformers that referenced this pull request Jan 21, 2024
* Update meatdata loading for oneformer

* Enable loading from a model repo

* Update docstrings

* Fix tests

* Update tests

* Clarify repo_path behaviour
AjayP13 pushed a commit to AjayP13/transformers that referenced this pull request Jan 22, 2024
* Update meatdata loading for oneformer

* Enable loading from a model repo

* Update docstrings

* Fix tests

* Update tests

* Clarify repo_path behaviour
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OneFormerImageProcessor does not support passing local config file, always tries to download from repo
3 participants