I simply ran the example MATLAB code on this video and obtained the following results. https://www.dropbox.com/s/zq4zf82yfaizbpk/MotionBasedMultiObjectTrackingExample.m?dl=0
How the Algorithm Works
- The detection of moving objects uses a background subtraction algorithm based on Gaussian mixture models. Morphological operations are applied to the resulting foreground mask to eliminate noise. Finally, blob analysis detects groups of connected pixels, which are likely to correspond to moving objects.
- The association of detections to the same object is based solely on motion. The motion of each track is estimated by a Kalman filter. The filter is used to predict the track's location in each frame, and determine the likelihood of each detection being assigned to each track.
- Track maintenance becomes an important aspect of this example. In any given frame, some detections may be assigned to tracks, while other detections and tracks may remain unassigned. The assigned tracks are updated using the corresponding detections. The unassigned tracks are marked invisible. An unassigned detection begins a new track.
- Each track keeps a count of the number of consecutive frames in which it has remained unassigned. If the count exceeds a specified threshold, the example assumes that the object has left the field of view and deletes the track.
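The track-maintenance logic in the last two bullets can be sketched as follows. This is a minimal Python illustration, not the MATLAB example's code: the `Track` class, the `update_tracks` function, and the threshold value are all my own names and assumptions.

```python
# Sketch of the track-maintenance step: assigned tracks are updated,
# unassigned tracks accumulate an "invisible" count and are deleted
# once that count exceeds a threshold, and unassigned detections
# start new tracks. All names here are illustrative.
from dataclasses import dataclass

INVISIBLE_THRESHOLD = 20  # assumed value for illustration


@dataclass
class Track:
    track_id: int
    position: tuple               # last known (x, y); comes from the Kalman filter in practice
    consecutive_invisible: int = 0


def update_tracks(tracks, assignments, detections, next_id):
    """assignments: dict mapping track index -> detection index for matched pairs."""
    survivors = []
    for i, track in enumerate(tracks):
        if i in assignments:
            # Assigned track: update with the corresponding detection.
            track.position = detections[assignments[i]]
            track.consecutive_invisible = 0
            survivors.append(track)
        else:
            # Unassigned track: mark invisible; delete if invisible too long.
            track.consecutive_invisible += 1
            if track.consecutive_invisible <= INVISIBLE_THRESHOLD:
                survivors.append(track)
            # else: the object is assumed to have left the field of view.
    assigned_dets = set(assignments.values())
    for j, det in enumerate(detections):
        if j not in assigned_dets:
            # Unassigned detection begins a new track.
            survivors.append(Track(next_id, det))
            next_id += 1
    return survivors, next_id
```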
Analysis
The method works well in scenarios where pedestrians are not very close to each other. Since it relies on connected-component labeling in subsequent frames to track an object, it often labels nearby pedestrians as a single object, so it is not suitable for pedestrian tracking in crowded places.
I found this paper; the authors have also done pedestrian tracking on this video. They have not shared the implementation of their algorithm, but they have shared their results.
Analysis
Method 3 overcomes the shortcomings of Method 1. As seen in the video, it appears sufficiently good at detecting occluded pedestrians.
Ground Truth
I am able to interpolate the ground truth from the given spline data. Here is how the ground truth looks.
Comparing Ground Truth with Detections from Method 3
Play the video at 0.5x for more clarity.
Complete Ground Truth Video | Capturing GT Only Within The Region Of Interest
https://drive.google.com/open?id=1JKWSW2MGsoQ1-RVWjrZCNfg3qLwXYAwf
Radius = 100
Representation:
Yellow → Ground Truth
Blue → False Negative
Green & Red → Detections (lower center of the bounding box)
Green → True Positive
Red → False Positive
Detection Method and Results: Mask R-CNN
https://github.com/ArchitParnami/Pedestrian-Tracking/blob/master/Evaluation/results_student.json
Method for mapping Detection to GT
- Let there be m GT points and n detections; then there are k = min(m, n) closest pairs.
- A closest pair is a pair of a GT point and a detection such that they have the shortest distance to each other.
- A pair is considered a match / true positive if the distance between the GT point and the detection is within a radius R; otherwise it is a false positive.
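The matching rule above can be sketched in Python as follows. This is a hedged illustration: the function name is mine, and I assume the closest pairs are formed greedily in order of increasing Euclidean distance, which is consistent with (but not spelled out by) the description.

```python
# Greedy closest-pair matching of ground-truth points to detections.
# Forms up to k = min(m, n) pairs by increasing distance; a pair within
# the radius counts as a true positive, otherwise as a false positive.
import math


def match_detections(gt_points, det_points, radius):
    # All candidate (distance, gt index, detection index) triples, nearest first.
    pairs = sorted(
        (math.dist(g, d), i, j)
        for i, g in enumerate(gt_points)
        for j, d in enumerate(det_points)
    )
    used_gt, used_det = set(), set()
    tp = fp = 0
    for dist, i, j in pairs:
        if i in used_gt or j in used_det:
            continue  # each GT point and detection is matched at most once
        used_gt.add(i)
        used_det.add(j)
        if dist <= radius:
            tp += 1
        else:
            fp += 1  # matched pair, but farther apart than R
    return tp, fp
```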
Results
| Radius | Average Recall | Average Precision | Average F1 Score |
|---|---|---|---|
| 120 | 62.23 | 95.109 | 75.32 |
| 100 | 61.28 | 93.72 | 74.10 |
| 80 | 60.13 | 92.00 | 72.72 |
Average precision, recall, and F1 score are calculated by averaging the per-frame results over all frames, excluding the first and last frame.
Total Frames = 5405
Precision = TP / (TP + FP)
Recall = TP / (number of GT)
F1 = 2 * (Precision * Recall) / (Precision + Recall)
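The formulas above, with the per-frame averaging that excludes the first and last frame, can be sketched as follows. Function names are my own; I also assume a score of 0 when a denominator is zero (e.g. a frame with no GT), since the writeup does not say how that case is handled.

```python
# Per-frame precision/recall/F1, then averaged over all frames except
# the first and last, as described above.
def frame_metrics(tp, fp, num_gt):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / num_gt if num_gt else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


def average_metrics(per_frame):
    """per_frame: list of (tp, fp, num_gt) tuples, one per frame."""
    inner = per_frame[1:-1]  # exclude the first and last frame
    scores = [frame_metrics(*f) for f in inner]
    n = len(scores)
    # Average each of the three metrics across the remaining frames.
    return tuple(sum(s[k] for s in scores) / n for k in range(3))
```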