# TIL: Plenty of Bad Labels

Data Quality Strikes Again

Vincent Warmerdam koaning.io
2021-06-23

I read a fun paper the other day.

The short story of this paper is that common benchmark datasets contain bad labels. It was already well known that in MNIST a 5 can sometimes look like an 8, so those mistakes might be forgiven. The paper, however, shows that the mistakes are a fair bit worse than that.

The paper tries to build algorithms to detect these mislabelled instances. The user supplies a percentage $$\alpha$$ of images they are willing to manually re-evaluate, and the algorithm then tries to find the most appropriate candidates to check.

Their approach is straightforward. A model $$g$$ is trained and applied to a datapoint $$x_i$$. They then compare the model output $$g(x_i)$$ with the label $$y_i$$ given in the dataset. The model confidence can be used as a proxy to sort the labels, so that a user can check the most suspicious ones first.
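To make the idea concrete, here is a minimal sketch of the sorting step in plain Python. It is not the paper's actual algorithm, just one simple way to use confidence as a proxy. The function name `flag_suspects` and the toy probabilities are made up for illustration. It assumes you already have predicted class probabilities $$g(x_i)$$ for every datapoint, ideally from a model that did not see that datapoint during training.

```python
import math


def flag_suspects(probs, labels, alpha):
    """Return the indices of the alpha fraction of datapoints whose given
    label the model finds least plausible, least plausible first.

    Hypothetical helper for illustration, not taken from the paper.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    # Confidence the model assigns to the label stored in the dataset.
    confidence = [p[y] for p, y in zip(probs, labels)]
    # Number of items the user is willing to re-check by hand.
    budget = math.ceil(alpha * len(labels))
    # Sort indices from lowest to highest confidence in the given label.
    order = sorted(range(len(labels)), key=lambda i: confidence[i])
    return order[:budget]


# Toy predicted probabilities over three classes (made-up numbers).
probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
    [0.70, 0.20, 0.10],
]
# The third given label (class 0) disagrees strongly with the model.
labels = [0, 1, 0, 0]

suspects = flag_suspects(probs, labels, alpha=0.25)
print(suspects)  # [2]
```

With $$\alpha = 0.25$$ and four datapoints the budget is a single item, and the one flagged is the datapoint where the model puts only 0.05 on the given label.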

It’s a neat trick, but given that state-of-the-art accuracy is often 99+% on some of these datasets, it might be time for a few of these models to be re-run.