Support different media types #675

Open
5 tasks
yasakova-anastasia opened this issue Feb 21, 2022 · 0 comments
Labels
data formats (PR is related to dataset formats), ENHANCE (Enhancement of existing features)

yasakova-anastasia commented Feb 21, 2022

Related #129
Related #135
Related #136

Currently, Datumaro supports only images as a media type, but computer vision datasets use many other media types. Moreover, OpenVINO is not limited to vision tasks; it also includes, for instance, NLP models. For Datumaro, this is an essential direction of growth.

Tasks:

  • Support generic media types in dataset and plugins (Generic media support #539)
  • Separate modules more clearly by the media types they support. For example, a number of operations can only work with (and only make sense for) image datasets. They need to be clearly distinguished in the API and should not be applicable to other media types
  • Consider the interaction with annotation types. Clearly, not every annotation type is applicable to every media type. The question is how to differentiate them, and whether this is needed at all
  • Allow API users to provide their own media types
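
One way the separation by media type and user-provided media types from the tasks above could look, as a minimal sketch. Every name here (`MediaElement`, `Text`, `require_media`, `image_extension`) is illustrative and assumed for this example; none of it is taken from the Datumaro API.

```python
import os
from dataclasses import dataclass
from typing import Type


class MediaElement:
    """Hypothetical base class for any media attached to a dataset item."""


@dataclass
class Image(MediaElement):
    path: str


@dataclass
class Text(MediaElement):
    """A user-defined media type, e.g. for NLP datasets (assumption)."""
    content: str


def require_media(media: MediaElement, expected: Type[MediaElement]) -> None:
    """Raise TypeError if `media` is not an instance of `expected`."""
    if not isinstance(media, expected):
        raise TypeError(
            f"expected {expected.__name__} media, got {type(media).__name__}"
        )


def image_extension(media: MediaElement) -> str:
    """An image-only operation: it rejects every other media type."""
    require_media(media, Image)
    # Take the file extension without the dot, lower-cased.
    return os.path.splitext(media.path)[1].lstrip(".").lower()
```

With this layout, `image_extension(Text("hello"))` fails early with a `TypeError` instead of silently producing a meaningless result, which is the kind of distinction the second task asks for.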

Transition period tasks:

  • As a first step, we make Image the default media type, which can be changed. Later, however, the default needs to become undefined: remove the default media_type value in the Extractor constructor and require this information from the Extractor (and other IDataset children). The affected extractors and tests need to be updated accordingly.
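
A possible shape for the transition period, sketched under assumptions: the constructor accepts a missing `media_type`, falls back to `Image`, and warns, so that the fallback can later be turned into an error. `SketchExtractor` and the warning text are hypothetical; this is not Datumaro's actual `Extractor` constructor.

```python
import warnings
from typing import Optional, Type


class MediaElement:
    """Hypothetical base class for media types."""


class Image(MediaElement):
    pass


class PointCloud(MediaElement):
    pass


class SketchExtractor:
    """Illustrative stand-in for an extractor; not the Datumaro class."""

    def __init__(self, media_type: Optional[Type[MediaElement]] = None):
        if media_type is None:
            # Transition period: keep the old behaviour, but tell the
            # caller that the implicit default is going away.
            warnings.warn(
                "media_type was not specified; falling back to Image. "
                "This default is planned to be removed.",
                DeprecationWarning,
                stacklevel=2,
            )
            media_type = Image
        self._media_type = media_type

    def media_type(self) -> Type[MediaElement]:
        return self._media_type
```

Once all extractors pass the value explicitly, the `if media_type is None` branch could raise instead of warning, which completes the change the task describes.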
zhiltsov-max changed the title from "Need to change from media_type=Image to media_type=None (default value)" to "Support different media types" Feb 22, 2022
zhiltsov-max added the ENHANCE (Enhancement of existing features) and data formats (PR is related to dataset formats) labels Feb 22, 2022
zhiltsov-max mentioned this issue Mar 9, 2022
13 tasks
zhiltsov-max pushed a commit that referenced this issue Mar 9, 2022
- Added `DatasetItem.media` to replace dedicated members for each media type
- Added the `PointCloud` media type
- Added the `media_type()` method to `Extractor`s
- Added merging for all media types, mixed media types for an item or in the dataset produce an error
- Datasets can't have mixed media types in items. If such situation occurs, an error is raised (checked during dataset caching/iteration)
- Datasets can't change media type using transforms
- Extractors must report their media type with the `media_type()` method
- Added a new mandatory `media_type` argument to `Dataset.from_iterable`. It has a default value of `Image` for the transition period (to be tracked in #675).
- Deprecated `DatasetItem.image`, `.related_images`, `.point_cloud`, `save-images` and `require_images`
- Added deprecation messages about annotation classes in `components.extractor`
- Suppressed Datumaro deprecation messages when using Datumaro from CLI

Co-authored-by: yasakova-anastasia <[email protected]>
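
The commit message above says that mixed media types in a dataset raise an error, checked during caching or iteration. A minimal sketch of such a check follows. `Item`, `MediaTypeError`, and `iter_checked` are hypothetical names for this illustration, and the referenced commit may implement the check differently.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Type


class MediaElement:
    """Hypothetical base class for media types."""


@dataclass
class Image(MediaElement):
    path: str


@dataclass
class PointCloud(MediaElement):
    path: str


@dataclass
class Item:
    """Illustrative dataset item with a single `media` member."""
    id: str
    media: Optional[MediaElement] = None


class MediaTypeError(Exception):
    """Raised when an item's media does not match the dataset media type."""


def iter_checked(
    items: Iterable[Item], media_type: Type[MediaElement]
) -> Iterator[Item]:
    """Yield items lazily, failing on the first mismatching media type."""
    for item in items:
        # Items without media are allowed; only a wrong type is an error.
        if item.media is not None and not isinstance(item.media, media_type):
            raise MediaTypeError(
                f"item '{item.id}': expected {media_type.__name__}, "
                f"got {type(item.media).__name__}"
            )
        yield item
```

Because the check runs inside the generator, the error surfaces only when the offending item is actually reached, which mirrors the "checked during dataset caching/iteration" wording.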