Use example FileTypes to validate Extractor entries #9

Open
3 tasks
PeterKraus opened this issue May 14, 2024 · 1 comment

Comments

@PeterKraus
Contributor

Originally in marda-alliance/metadata_extractors_registry#34:

  • Make sure the URLs are resolvable through Fly (probably pointing to GitHub raw links)
  • Add the files as examples in the file types models and expose them for validation
  • Validate entries in the data folder and make sure they correspond to registered file types

This would mean that a CI job in the yard repo would use the beam package to check whether the files in the marda_registry/data/lfs folder can be matched to registered FileTypes and processed by any matching Extractors in the yard.

  • The obvious issue right now is how to deal with GitHub's LFS restrictions.
  • The obvious issue down the road is how to deal with the server load.
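A rough sketch of what the matching half of such a CI check could look like, just to make the idea concrete. Everything here is an assumption for illustration: the `FILETYPES` and `EXTRACTORS` dictionaries, their entries, and matching by filename pattern are all hypothetical stand-ins for whatever the registry models actually expose, and the step that runs an Extractor through beam is deliberately left out.

```python
from fnmatch import fnmatchcase

# Hypothetical registry contents (not real entries): FileType ID -> filename patterns.
FILETYPES = {
    "example-csv": ["*.csv"],
    "example-mpr": ["*.mpr"],
}
# Hypothetical registry contents: Extractor ID -> FileType IDs it claims to support.
EXTRACTORS = {
    "example-extractor": ["example-mpr"],
}


def match_filetypes(filename, filetypes):
    """Return the sorted IDs of all FileTypes whose patterns match the filename."""
    lowered = filename.lower()
    return sorted(
        ft_id
        for ft_id, patterns in filetypes.items()
        if any(fnmatchcase(lowered, pattern) for pattern in patterns)
    )


def check_examples(filenames, filetypes, extractors):
    """Collect one problem message per file that cannot be matched or processed."""
    supported = {ft_id for ft_ids in extractors.values() for ft_id in ft_ids}
    problems = []
    for name in sorted(filenames):
        matched = match_filetypes(name, filetypes)
        if not matched:
            problems.append(f"{name}: no registered FileType matches")
        elif not any(ft_id in supported for ft_id in matched):
            problems.append(f"{name}: no Extractor supports {matched}")
    return problems
```

In CI this could be fed the file names found under marda_registry/data/lfs and fail the job whenever the returned list is non-empty.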
@PeterKraus
Contributor Author

A possible suggestion:

Committing an example file for everything under the sun into this repo directly is probably not the best way forward. We should have a mechanism for providing persistent links to example files (e.g., archived files with DOIs) that the registry can download and use as test data. Probably this ends up being a registry of example files too, in that case...

See marda-alliance/metadata_extractors_registry#66
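To illustrate the persistent-link idea, here is a minimal sketch of how a test run could download an example file and verify it before use. The `ExampleFile` record, its fields, and the `fetch_example` helper are hypothetical names made up for this sketch, not part of the registry or of beam; it assumes each entry stores a SHA-256 checksum alongside the link. Caching by checksum is one possible way to keep the server load down.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from urllib.request import urlopen


@dataclass(frozen=True)
class ExampleFile:
    """Hypothetical registry entry pointing at an archived example file."""
    filetype: str  # ID of the FileType this file exemplifies
    url: str       # persistent link, e.g. one resolved from a DOI
    sha256: str    # expected checksum of the file contents


def _download(url):
    """Fetch the raw bytes behind a URL."""
    with urlopen(url) as response:
        return response.read()


def fetch_example(entry, cache_dir, download=_download):
    """Return a local path to the example file, downloading it at most once."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / entry.sha256
    if target.exists():
        # Already downloaded and verified in an earlier run.
        return target
    data = download(entry.url)
    digest = hashlib.sha256(data).hexdigest()
    if digest != entry.sha256:
        raise ValueError(f"checksum mismatch for {entry.url}: got {digest}")
    target.write_bytes(data)
    return target
```

The `download` argument is only there so the network call can be swapped out in tests; a file whose checksum does not match is never written to the cache.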
