Decreasing detection performance for "small" person image areas: YOLOv10 (lowest), YOLOv7, and YOLOX (best) #386
norbertlink
started this conversation in
General
Comparing detections of humans in crowded scenes, where the person image area decreases with perspective depth, showed that YOLOX detects even very small persons, whereas YOLOv7 misses a substantial number of them and YOLOv10 detects even fewer than YOLOv7. See the example result images.
This contradicts the published COCO evaluation results, where the ranking is the opposite.
The corresponding papers indicate that the same training and evaluation data were used.
So I would like to open a discussion about this observation.
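To make the observation comparable across the three models, recall could be computed separately per box-size bucket rather than judged from example images. The following is only a minimal sketch under stated assumptions: boxes are `(x, y, w, h)` tuples in pixels, the buckets use the COCO area thresholds (32² and 96²) applied to the bounding-box area (COCO itself uses the annotated segment area), matching is a simple greedy one-to-one assignment at IoU ≥ 0.5, and all function names are hypothetical.

```python
# Sketch: recall per size bucket for one image.
# Assumptions: boxes are (x, y, w, h) in pixels; bucket limits follow the
# COCO area ranges but are applied to box area, not segment area.
SMALL_MAX = 32 ** 2    # "small":  area < 32^2
MEDIUM_MAX = 96 ** 2   # "medium": 32^2 <= area < 96^2


def size_bucket(box):
    """Return 'small', 'medium' or 'large' for an (x, y, w, h) box."""
    area = box[2] * box[3]
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def recall_by_size(gt_boxes, det_boxes, iou_thr=0.5):
    """Greedy one-to-one matching; returns {bucket: (matched, total)}."""
    stats = {"small": [0, 0], "medium": [0, 0], "large": [0, 0]}
    used = set()  # indices of detections already assigned to a ground truth
    for gt in gt_boxes:
        bucket = size_bucket(gt)
        stats[bucket][1] += 1
        best_j, best_iou = None, iou_thr
        for j, det in enumerate(det_boxes):
            if j in used:
                continue
            value = iou(gt, det)
            if value >= best_iou:
                best_j, best_iou = j, value
        if best_j is not None:
            used.add(best_j)
            stats[bucket][0] += 1
    return {name: tuple(counts) for name, counts in stats.items()}
```

Running this on the same images and ground truth for YOLOX, YOLOv7, and YOLOv10, with identical confidence thresholds, would show whether the gap is really confined to the "small" bucket.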
At the moment I have only two possible explanations, assuming that the training and evaluation data really were the same and were used properly:
1. The discrepancy is an effect of the anchor boxes (used by YOLOv7 and YOLOv10, but not by YOLOX). These are determined from the training dataset, where small bounding boxes may be underrepresented, which would explain the superior detection quality of YOLOX for "small" persons.
2. The observation may stem from YOLOv10 adapting better to systematic annotation errors for "small" persons and objects. These may be annotated inconsistently (some "small" objects are correctly annotated while others are not annotated at all), which leads to contradictions in the loss function. YOLOv10, as the faster-converging model, would then learn to follow the majority decision for "small" objects (namely, that they are not annotated) and thus suppress their detection.
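Whether small person boxes are actually underrepresented in the training annotations can be checked directly. A rough sketch follows, assuming a COCO-style annotation dictionary (an `annotations` list whose entries carry `category_id`, `area`, and `iscrowd`, with `category_id == 1` for "person") and the COCO small/medium/large area thresholds. The function name is hypothetical, and skipping crowd regions is a choice made here for illustration.

```python
# Sketch: count person annotations per COCO size bucket.
# Assumption: `coco` is an already loaded COCO-style annotation dict.
PERSON_CATEGORY_ID = 1   # "person" in the COCO category list
SMALL_MAX = 32 ** 2      # "small":  area < 32^2
MEDIUM_MAX = 96 ** 2     # "medium": 32^2 <= area < 96^2


def person_size_histogram(coco):
    """Return {'small': n, 'medium': n, 'large': n} for person annotations."""
    counts = {"small": 0, "medium": 0, "large": 0}
    for ann in coco["annotations"]:
        if ann["category_id"] != PERSON_CATEGORY_ID:
            continue
        if ann.get("iscrowd", 0):
            continue  # crowd regions are not individual person instances
        area = ann["area"]
        if area < SMALL_MAX:
            counts["small"] += 1
        elif area < MEDIUM_MAX:
            counts["medium"] += 1
        else:
            counts["large"] += 1
    return counts
```

The dictionary would typically come from `json.load` on the training annotation file. A low "small" count would be consistent with the first explanation. It says nothing about the second one, because unannotated small persons do not appear in the file at all; that would need a manual audit of sample images.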
Ideas would be very welcome.