Add MMDetection COCO format importer (#1213)
<!-- Contributing guide:
https://github.com/openvinotoolkit/datumaro/blob/develop/CONTRIBUTING.md
-->

### Summary

<!--
Resolves #111 and #222.
Depends on #1000 (for series of dependent commits).

This PR introduces this capability to make the project better in this
and that.

- Added this feature
- Removed that feature
- Fixed the problem #1234
-->

### How to test
<!-- Describe the testing procedure for reviewers, if changes are
not fully covered by unit tests or manual testing can be complicated.
-->

### Checklist
<!-- Put an 'x' in all the boxes that apply -->
- [x] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into
[CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [x] I have updated the
[documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs)
accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT
License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE)
that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example
below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```
wonjuleee authored Dec 1, 2023
1 parent 4b9f55f commit bb314a1
Showing 12 changed files with 449 additions and 0 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
Expand Up @@ -6,6 +6,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## \[Unreleased\]
### New features
- Support MMDetection COCO format
(<https://github.com/openvinotoolkit/datumaro/pull/1213>)

### Enhancements
- Optimize Python import to make CLI entrypoint faster
(<https://github.com/openvinotoolkit/datumaro/pull/1182>)
Expand Down
5 changes: 5 additions & 0 deletions docs/source/docs/data-formats/formats/index.rst
Expand Up @@ -32,6 +32,7 @@ Supported Data Formats
mapillary_vistas
market1501
mars
mmdet
mnist
mot
mots
Expand Down Expand Up @@ -141,6 +142,10 @@ Supported Data Formats
* `Format specification <https://zheng-lab.cecs.anu.edu.au/Project/project_mars.html>`_
* `Dataset example <https://github.com/openvinotoolkit/datumaro/tree/develop/tests/assets/mars_dataset>`_
* `Format documentation <mars.md>`_
* MMDet-COCO (``detection``, ``segmentation``)
* `Format specification <https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html>`_
* `Dataset example <https://github.com/openvinotoolkit/datumaro/tree/develop/tests/assets/coco_dataset/mmdet_coco>`_
* `Format documentation <mmdet.md>`_
* MNIST (``classification``)
* `Format specification <http://yann.lecun.com/exdb/mnist/>`_
* `Dataset example <https://github.com/openvinotoolkit/datumaro/tree/develop/tests/assets/mnist_dataset>`_
Expand Down
41 changes: 41 additions & 0 deletions docs/source/docs/data-formats/formats/mmdet.md
@@ -0,0 +1,41 @@
# MMDetection COCO

## Format specification

[MMDetection](https://mmdetection.readthedocs.io/en/latest/) is a training framework for object detection and instance segmentation. Its modular architecture supports a wide range of models, datasets, and training techniques, and it is widely used in the research community for developing and benchmarking object detection algorithms.
MMDetection specifies its COCO format [here](https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html).

Most tasks and annotation conventions are the same as in the [original COCO format](./formats/coco); the difference is that images are stored in a separate directory for each subset.
This document therefore describes only the directory structure of the MMDetection COCO format, following the [MMDetection dataset guide](https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html).
An MMDetection COCO dataset directory should have the following structure:

<!--lint disable fenced-code-flag-->
```
└─ Dataset/
    ├── <subset_name>/
    │   ├── <image_name1.ext>
    │   ├── <image_name2.ext>
    │   └── ...
    ├── <subset_name>/
    │   ├── <image_name1.ext>
    │   ├── <image_name2.ext>
    │   └── ...
    └── annotations/
        ├── instances_<subset_name>.json
        └── ...
```
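
Before importing, it can help to confirm that every annotation file has a matching image directory. The following is a minimal sketch using only the standard library; `check_mmdet_coco_layout` is a hypothetical helper written for illustration and is not part of Datumaro.

```python
import os
import tempfile
from glob import glob


def check_mmdet_coco_layout(root: str) -> dict:
    """Map each subset name to whether its image directory exists.

    Hypothetical helper for illustration only; not a Datumaro API.
    """
    result = {}
    pattern = os.path.join(root, "annotations", "instances_*.json")
    for ann_path in sorted(glob(pattern)):
        stem = os.path.splitext(os.path.basename(ann_path))[0]
        subset = stem[len("instances_"):]  # text after the "instances_" prefix
        result[subset] = os.path.isdir(os.path.join(root, subset))
    return result


# Build a throwaway skeleton: "train" has an image directory, "val" does not.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "annotations"))
    os.makedirs(os.path.join(root, "train"))
    for subset in ("train", "val"):
        ann_file = os.path.join(root, "annotations", f"instances_{subset}.json")
        open(ann_file, "w").close()
    layout = check_mmdet_coco_layout(root)

print(layout)  # {'train': True, 'val': False}
```

A `False` entry points at a subset whose images are missing or stored under a different directory name.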

### Import using CLI

``` bash
datum project create
datum project import --format mmdet_coco <path/to/dataset>
```

### Import using Python API

```python
import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'mmdet_coco')
```
22 changes: 22 additions & 0 deletions src/datumaro/plugins/data_formats/coco/base.py
Expand Up @@ -88,6 +88,25 @@ def find_images_dir(rootpath: str, subset: str) -> str:
        return osp.join(rootpath, subset)


class MmdetDirPathExtracter(DirPathExtracter):
    @staticmethod
    def find_rootpath(path: str) -> str:
        """Find root path from annotation json file path."""
        path = osp.abspath(path)
        if osp.dirname(path).endswith(CocoPath.ANNOTATIONS_DIR):
            return path.rsplit(CocoPath.ANNOTATIONS_DIR, maxsplit=1)[0]
        raise DatasetImportError(
            f"Annotation path ({path}) should be under the directory which is named {CocoPath.ANNOTATIONS_DIR}. "
            "If not, Datumaro fails to find the root path for this dataset. "
            "Please follow this instruction, https://github.com/cocodataset/cocoapi/blob/master/README.txt"
        )

    @staticmethod
    def find_images_dir(rootpath: str, subset: str) -> str:
        """Find images directory from the root path."""
        return osp.join(rootpath, subset)
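
The path handling in `MmdetDirPathExtracter` can be tried in isolation. The sketch below copies the logic into standalone functions, with two assumptions that are not confirmed by this diff: `CocoPath.ANNOTATIONS_DIR` is taken to be `"annotations"`, and `ValueError` stands in for `DatasetImportError`.

```python
import os.path as osp

ANNOTATIONS_DIR = "annotations"  # assumption: value of CocoPath.ANNOTATIONS_DIR


def find_rootpath(path: str) -> str:
    """Standalone copy of the root-path lookup, for experimentation only."""
    path = osp.abspath(path)
    if osp.dirname(path).endswith(ANNOTATIONS_DIR):
        # Everything before the last "annotations" component is the dataset root.
        return path.rsplit(ANNOTATIONS_DIR, maxsplit=1)[0]
    raise ValueError(f"{path} is not under a directory named {ANNOTATIONS_DIR}")


def find_images_dir(rootpath: str, subset: str) -> str:
    """Images of a subset live directly under <root>/<subset>."""
    return osp.join(rootpath, subset)


root = osp.abspath(osp.join("data", "ds"))  # illustrative path, need not exist
ann_path = osp.join(root, "annotations", "instances_val.json")

rootpath = find_rootpath(ann_path)
images_dir = find_images_dir(rootpath, "val")
print(rootpath)
print(images_dir)

try:
    find_rootpath(osp.join(root, "labels", "instances_val.json"))
    rejected = False
except ValueError:
    rejected = True
```

Because the check is a plain string suffix test, a parent directory such as `my_annotations` would also be accepted; whether that matters depends on how datasets are laid out in practice.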


class _CocoBase(SubsetBase):
    """
    Parses COCO annotations written in the following format:
Expand Down Expand Up @@ -121,6 +140,9 @@ def __init__(
        elif coco_importer_type == CocoImporterType.roboflow:
            self._rootpath = RoboflowDirPathExtracter.find_rootpath(path)
            self._images_dir = RoboflowDirPathExtracter.find_images_dir(self._rootpath, subset)
        elif coco_importer_type == CocoImporterType.mmdet:
            self._rootpath = MmdetDirPathExtracter.find_rootpath(path)
            self._images_dir = MmdetDirPathExtracter.find_images_dir(self._rootpath, subset)
        else:
            raise DatasetImportError(f"Not supported type: {coco_importer_type}")

Expand Down
1 change: 1 addition & 0 deletions src/datumaro/plugins/data_formats/coco/format.py
Expand Up @@ -18,6 +18,7 @@ class CocoTask(Enum):
class CocoImporterType(Enum):
    default = auto()
    roboflow = auto()
    mmdet = auto()


class CocoPath:
Expand Down
79 changes: 79 additions & 0 deletions src/datumaro/plugins/data_formats/mmdet.py
@@ -0,0 +1,79 @@
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

import os.path as osp
from glob import glob
from typing import Optional

from datumaro.components.dataset_base import DEFAULT_SUBSET_NAME
from datumaro.components.format_detection import FormatDetectionConfidence, FormatDetectionContext
from datumaro.components.importer import ImportContext
from datumaro.plugins.data_formats.coco.base import _CocoBase
from datumaro.plugins.data_formats.coco.format import CocoImporterType, CocoTask
from datumaro.plugins.data_formats.coco.importer import CocoImporter


class MmdetCocoImporter(CocoImporter):
    @classmethod
    def detect(
        cls,
        context: FormatDetectionContext,
    ) -> FormatDetectionConfidence:
        ann_paths = context.require_files("annotations/instances_*.json")

        for ann_path in ann_paths:
            subset_name = cls._get_subset_name(ann_path)

            with context.require_any():
                with context.alternative():
                    image_files = osp.join(subset_name, "*.jpg")
                    context.require_file(f"{image_files}")

        return FormatDetectionConfidence.MEDIUM

    def __call__(self, path, stream: bool = False, **extra_params):
        subset_paths = glob(osp.join(path, "**", "instances_*.json"), recursive=True)

        sources = []
        for subset_path in subset_paths:
            options = dict(extra_params)
            options["subset"] = self._get_subset_name(subset_path)

            if stream:
                options["stream"] = True

            sources.append({"url": subset_path, "format": "mmdet_coco", "options": options})

        return sources

    @classmethod
    def _get_subset_name(cls, subset_path: str):
        parts = osp.splitext(osp.basename(subset_path))[0].split("instances_", maxsplit=1)
        subset_name = parts[1] if len(parts) == 2 else DEFAULT_SUBSET_NAME

        return subset_name
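
To see how the importer turns a dataset directory into per-subset sources, the discovery logic can be reproduced without Datumaro. This is a rough sketch, not the plugin itself: `get_subset_name` and `list_sources` are illustrative stand-ins for `_get_subset_name` and `__call__`, `DEFAULT_SUBSET_NAME` is assumed to be `"default"`, and `sorted` is added only to make the output order deterministic.

```python
import os
import os.path as osp
import tempfile
from glob import glob

DEFAULT_SUBSET_NAME = "default"  # assumption: value of Datumaro's constant


def get_subset_name(subset_path: str) -> str:
    """Take the text after "instances_" in the file stem, else the default."""
    stem = osp.splitext(osp.basename(subset_path))[0]
    parts = stem.split("instances_", maxsplit=1)
    return parts[1] if len(parts) == 2 else DEFAULT_SUBSET_NAME


def list_sources(path: str, stream: bool = False, **extra_params) -> list:
    """Build one source description per annotation file found under path."""
    pattern = osp.join(path, "**", "instances_*.json")
    sources = []
    for subset_path in sorted(glob(pattern, recursive=True)):
        options = dict(extra_params)
        options["subset"] = get_subset_name(subset_path)
        if stream:
            options["stream"] = True
        sources.append({"url": subset_path, "format": "mmdet_coco", "options": options})
    return sources


# Throwaway dataset with two empty annotation files.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(osp.join(root, "annotations"))
    for subset in ("train", "val"):
        open(osp.join(root, "annotations", f"instances_{subset}.json"), "w").close()
    sources = list_sources(root, stream=True)

subsets = [source["options"]["subset"] for source in sources]
print(subsets)  # ['train', 'val']
print(get_subset_name("annotations/labels.json"))  # default
```

Note that the split keys on the first occurrence of `instances_` anywhere in the stem, so a file named `my_instances_val.json` would also yield `val` if it were passed in directly.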


class MmdetCocoBase(_CocoBase):
    """
    Parses MMDetection COCO annotations written in the following format:
    https://cocodataset.org/#format-data
    """

    def __init__(
        self,
        path,
        *,
        subset: Optional[str] = None,
        stream: bool = False,
        ctx: Optional[ImportContext] = None,
    ):
        super().__init__(
            path,
            task=CocoTask.instances,
            coco_importer_type=CocoImporterType.mmdet,
            subset=subset,
            stream=stream,
            ctx=ctx,
        )
12 changes: 12 additions & 0 deletions src/datumaro/plugins/specs.json
Expand Up @@ -710,6 +710,18 @@
"plugin_type": "Importer",
"extra_deps": []
},
{
"import_path": "datumaro.plugins.data_formats.mmdet.MmdetCocoBase",
"plugin_name": "mmdet_coco",
"plugin_type": "DatasetBase",
"extra_deps": []
},
{
"import_path": "datumaro.plugins.data_formats.mmdet.MmdetCocoImporter",
"plugin_name": "mmdet_coco",
"plugin_type": "Importer",
"extra_deps": []
},
{
"import_path": "datumaro.plugins.data_formats.mnist.MnistBase",
"plugin_name": "mnist",
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
{
"licenses":[
{
"name":"",
"id":0,
"url":""
}
],
"info":{
"contributor":"",
"date_created":"",
"description":"",
"url":"",
"version":"",
"year":""
},
"categories":[
{
"id":1,
"name":"a",
"supercategory":""
},
{
"id":2,
"name":"b",
"supercategory":""
},
{
"id":4,
"name":"c",
"supercategory":""
}
],
"images":[
{
"id":5,
"width":10,
"height":5,
"file_name":"a.jpg",
"license":0,
"flickr_url":"",
"coco_url":"",
"date_captured":0
}
],
"annotations":[
{
"id":1,
"image_id":5,
"category_id":2,
"segmentation":[

],
"area":3.0,
"bbox":[
2.0,
2.0,
3.0,
1.0
],
"iscrowd":0
}
]
}
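
COCO stores a bounding box as `[x, y, width, height]`. As a quick sanity check on the annotation above, the sketch below converts that box to corner coordinates and recomputes its area; `xywh_to_xyxy` is an illustrative helper, not a Datumaro function.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


bbox = [2.0, 2.0, 3.0, 1.0]   # "bbox" of the annotation above
corners = xywh_to_xyxy(bbox)
area = bbox[2] * bbox[3]      # width * height

print(corners)  # [2.0, 2.0, 5.0, 3.0]
print(area)     # 3.0, equal to the "area" field of the annotation
```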
101 changes: 101 additions & 0 deletions tests/assets/coco_dataset/mmdet_coco/annotations/instances_val.json
@@ -0,0 +1,101 @@
{
"licenses":[
{
"name":"",
"id":0,
"url":""
}
],
"info":{
"contributor":"",
"date_created":"",
"description":"",
"url":"",
"version":"",
"year":""
},
"categories":[
{
"id":1,
"name":"a",
"supercategory":""
},
{
"id":2,
"name":"b",
"supercategory":""
},
{
"id":4,
"name":"c",
"supercategory":""
}
],
"images":[
{
"id":40,
"width":5,
"height":10,
"file_name":"b.jpg",
"license":0,
"flickr_url":"",
"coco_url":"",
"date_captured":0
}
],
"annotations":[
{
"id":1,
"image_id":40,
"category_id":1,
"segmentation":[
[
0.0,
0.0,
1.0,
0.0,
1.0,
2.0,
0.0,
2.0
]
],
"area":2.0,
"bbox":[
0.0,
0.0,
1.0,
2.0
],
"iscrowd":0,
"attributes":{
"x":1,
"y":"hello"
}
},
{
"id":2,
"image_id":40,
"category_id":2,
"segmentation":{
"counts":[
0,
20,
30
],
"size":[
10,
5
]
},
"area":20.0,
"bbox":[
0.0,
0.0,
1.0,
9.0
],
"iscrowd":1
}
]
}
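
The two annotations in this file use the two COCO segmentation encodings: a polygon given as a flat `[x1, y1, x2, y2, ...]` list, and an uncompressed RLE whose `counts` alternate between runs of 0s and 1s (starting with 0s) in column-major order over a `[height, width]` grid. The following sketch, with illustrative helper names that are not taken from Datumaro or pycocotools, recomputes both `area` values from the segmentation data.

```python
def polygon_area(flat_xy):
    """Shoelace formula over a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


def decode_uncompressed_rle(counts, size):
    """Expand uncompressed RLE counts into a height x width list of 0/1 rows."""
    height, width = size
    flat, value = [], 0
    for run in counts:
        flat.extend([value] * run)  # runs alternate 0s, 1s, 0s, ...
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("RLE counts do not match the mask size")
    # The flat sequence is column-major: pixel (row, col) sits at col * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


poly_area = polygon_area([0.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.0, 2.0])
mask = decode_uncompressed_rle([0, 20, 30], [10, 5])
mask_area = sum(sum(row) for row in mask)

print(poly_area)  # 2.0, the "area" of annotation 1
print(mask_area)  # 20, the "area" of annotation 2
print(mask[0])    # [1, 1, 0, 0, 0]
```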
Binary file added tests/assets/coco_dataset/mmdet_coco/train/a.jpg
Binary file added tests/assets/coco_dataset/mmdet_coco/val/b.jpg