I read a blog post the other day about an issue in the Udacity self-driving car dataset. It seems that many of the annotations for pedestrians are missing.
To quote the blog post:
We did a hand-check of the 15,000 images in the widely used Udacity Dataset 2 and found problems with 4,986 (33%) of them. Amongst these were thousands of unlabeled vehicles, hundreds of unlabeled pedestrians, and dozens of unlabeled cyclists. We also found many instances of phantom annotations, duplicated bounding boxes, and drastically oversized bounding boxes.
It goes on.
Perhaps most egregiously, 217 (1.4%) of the images were completely unlabeled but actually contained cars, trucks, street lights, and/or pedestrians.
Label errors are a well-known issue, but missing pedestrians feel like an obviously harmful error. The blog post does a nice job of reminding folks of the importance of good-quality data.
It might be a fun exercise for later to see how easily these label errors can be detected.
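As a rough starting point, here is a minimal sketch of the cheapest checks I can think of. It only covers three of the problems from the quote: images with no labels at all, duplicated bounding boxes, and drastically oversized bounding boxes. Everything in it is my own assumption and not something from the blog post: the function names, the (x_min, y_min, x_max, y_max) box format, and both thresholds are made up for illustration. It also cannot find a missing pedestrian in an image that has other labels; that would probably need a trained detector or a human looking at the image.

```python
from itertools import combinations

# Assumed (hypothetical) box format: (x_min, y_min, x_max, y_max) in pixels.


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def flag_image(boxes, width, height, dup_iou=0.9, max_area_frac=0.5):
    """Return a list of warnings for one image's annotations.

    The thresholds are arbitrary guesses and would need tuning on real data.
    """
    flags = []
    # An image without any boxes is worth a manual look.
    if not boxes:
        flags.append("no annotations")
    # Two boxes that overlap almost perfectly are likely duplicates.
    for i, j in combinations(range(len(boxes)), 2):
        if iou(boxes[i], boxes[j]) >= dup_iou:
            flags.append(f"boxes {i} and {j} look duplicated")
    # A box covering a large share of the frame is suspicious.
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        if (x1 - x0) * (y1 - y0) > max_area_frac * width * height:
            flags.append(f"box {i} looks oversized")
    return flags
```

Calling flag_image([], 1920, 1200) returns a single "no annotations" warning. A flagged image is only a candidate for review: an empty image might genuinely contain nothing to label, and a large box might be a truck right in front of the camera.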