Speeding up mean_iou metric computation #569
Conversation
```diff
 >>> import numpy as np
 >>> mean_iou = evaluate.load("mean_iou")
 >>> predicted = np.array([[2, 2, 3], [8, 2, 4], [3, 255, 2]])
 >>> ground_truth = np.array([[1, 2, 2], [8, 2, 1], [3, 255, 1]])
->>> results = mean_iou.compute(predictions=predicted, references=ground_truth, num_labels=10, ignore_index=255)
+>>> results = mean_iou.compute(predictions=[predicted], references=[ground_truth], num_labels=10, ignore_index=255)
```
Would this be a breaking change? Currently the metric works when passing 2D arrays for `predicted` and `ground_truth`, but this PR would require them to be lists of 2D arrays?
It seems like it doesn't work with 2D arrays originally, only with a list of them.
I tested it locally and got the following error:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/piakubovskii/Projects/evaluate/.venv/lib/python3.10/site-packages/evaluate/module.py", line 450, in compute
    self.add_batch(**inputs)
  File "/Users/piakubovskii/Projects/evaluate/.venv/lib/python3.10/site-packages/evaluate/module.py", line 541, in add_batch
    raise ValueError(error_msg) from None
ValueError: Predictions and/or references don't match the expected format.
Expected format: {'predictions': Sequence(feature=Sequence(feature=Value(dtype='uint16', id=None), length=-1, id=None), length=-1, id=None), 'references': Sequence(feature=Sequence(feature=Value(dtype='uint16', id=None), length=-1, id=None), length=-1, id=None)},
Input predictions: [[  2   2   3]
 [  8   2   4]
 [  3 255   2]],
Input references: [[  1   2   2]
 [  8   2   1]
 [  3 255   1]]
```
There is also an issue opened 3 weeks ago with the same error: #563. Maybe backward compatibility was broken earlier.
Ok, cc'ing @lhoestq here, I assume we can safely merge it in that case
Indeed, metric `.compute()` methods take lists of references and predictions (one per image), not just a single reference/prediction pair.
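To make the expected input shape concrete, here is a minimal numpy-only sketch of a per-class mean IoU over a *list* of 2D label maps. This is an illustration, not the library's implementation (the real `mean_iou` metric has more options, e.g. `nan_to_num` and `label_map`); the function name `mean_iou_sketch` is hypothetical, and `num_labels`/`ignore_index` mirror the example above.

```python
import numpy as np

def mean_iou_sketch(predictions, references, num_labels, ignore_index):
    """Simplified mean IoU over lists of 2D segmentation maps (illustrative only)."""
    intersect = np.zeros(num_labels)
    union = np.zeros(num_labels)
    for pred, ref in zip(predictions, references):
        mask = ref != ignore_index           # drop pixels marked as ignored
        pred, ref = pred[mask], ref[mask]
        for label in range(num_labels):
            p, r = pred == label, ref == label
            intersect[label] += np.logical_and(p, r).sum()
            union[label] += np.logical_or(p, r).sum()
    seen = union > 0                         # average only over observed classes
    return float(np.mean(intersect[seen] / union[seen]))

predicted = np.array([[2, 2, 3], [8, 2, 4], [3, 255, 2]])
ground_truth = np.array([[1, 2, 2], [8, 2, 1], [3, 255, 1]])
score = mean_iou_sketch([predicted], [ground_truth], num_labels=10, ignore_index=255)
```

Note the inputs are wrapped in lists (`[predicted]`, `[ground_truth]`), matching the format the metric expects.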
Nice improvement!
Ok great, let's merge (when the CI is green)
Force-pushed from 654b252 to 6da1cd8.
Working with the semantic-segmentation example, I observed that the `mean_iou` metric takes a significant amount of time to compute (comparable to a training loop). The cause is the conversion of the resulting numpy segmentation maps to the dataset format: the `mean_iou` metric currently expects all segmentation arrays to be converted to `datasets.Sequence(datasets.Sequence(datasets.Value("uint16")))`, which means converting every element of the arrays. This PR speeds up `mean_iou` by changing the feature type to `datasets.Image()`. Here is a short script to measure computation time.
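The measurement script itself is collapsed in the PR, but a timing harness along these lines could reproduce the comparison. This is a sketch: the helper `time_metric` and the stand-in workload are hypothetical; in practice the lambda would be replaced by `evaluate.load("mean_iou").compute` with the appropriate `num_labels`/`ignore_index` arguments.

```python
import time
import numpy as np

def time_metric(compute_fn, num_images=10, height=256, width=256, num_labels=150):
    """Time one compute call on random segmentation maps (hypothetical helper)."""
    rng = np.random.default_rng(0)
    preds = [rng.integers(0, num_labels, size=(height, width)) for _ in range(num_images)]
    refs = [rng.integers(0, num_labels, size=(height, width)) for _ in range(num_images)]
    start = time.perf_counter()
    compute_fn(predictions=preds, references=refs)
    return time.perf_counter() - start

# Stand-in workload so the script is self-contained; swap in the real metric:
#   metric = evaluate.load("mean_iou")
#   elapsed = time_metric(lambda **kw: metric.compute(num_labels=150, ignore_index=255, **kw))
elapsed = time_metric(lambda predictions, references: [
    (p == r).mean() for p, r in zip(predictions, references)
])
```

Varying `num_images`, the image size, and `num_labels` in such a harness is how the speed-up figures below would be obtained.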
As a result, we get a 5-50x speed-up in metric computation, depending on the number of images, the image size, and the number of classes.
P.S. This PR also fixes the broken `mean_iou` example in the README (#563).