Blog of a data person

Plenty of Bad Labels

2021-06-23

I read a fun paper the other day.

The short story of this paper is that common benchmark datasets contain bad labels. It was already well known that in MNIST a 5 can sometimes look like an 8, so those mistakes might be forgiven. The paper, however, shows that the mistakes are a fair bit worse than that.

The paper builds algorithms to detect these mislabelled instances. The user specifies a fraction $\alpha$ of images they are willing to manually re-evaluate, and the algorithm then tries to surface the most suspicious candidates for review.

Their approach is straightforward. A model $g$ is trained and applied to a datapoint $x_i$. They then compare the model output $g(x_i)$ with the given label $y_i$. The model's confidence in that label can be used as a proxy to sort the datapoints, so that a user can check the most dubious ones first.
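A minimal sketch of that sorting idea might look like the snippet below. This is my own illustration, not the paper's implementation: the function name, the $\alpha$ value, and the toy probabilities are all made up for the example.

```python
import numpy as np

def rank_label_candidates(probs, labels, alpha=0.05):
    """Return indices of the alpha fraction of datapoints whose given
    label y_i the model g is least confident about."""
    # Confidence the model assigns to the *given* (possibly wrong) label.
    confidence_in_label = probs[np.arange(len(labels)), labels]
    # The least confident examples are the most likely label errors.
    order = np.argsort(confidence_in_label)
    n_check = int(np.ceil(alpha * len(labels)))
    return order[:n_check]

# Toy example: class probabilities g(x_i) for four images, three classes.
probs = np.array([
    [0.95, 0.03, 0.02],  # strongly agrees with label 0
    [0.10, 0.85, 0.05],  # agrees with label 1
    [0.70, 0.20, 0.10],  # disagrees with label 2 -> suspicious
    [0.40, 0.35, 0.25],  # lukewarm about label 0
])
labels = np.array([0, 1, 2, 0])
print(rank_label_candidates(probs, labels, alpha=0.5))  # -> [2 3]
```

With a review budget of $\alpha = 0.5$, the two images where the model most disagrees with the recorded label are the ones handed to the user.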

It's a neat trick, but given that state-of-the-art accuracy is often 99+% on some of these datasets, it might be time for a few of these models to be re-run.