The paper aims to build algorithms that detect these mislabelled instances. The user specifies a percentage $$\alpha$$ of images they are willing to re-evaluate manually, and the algorithm tries to find the most suitable candidates to check.
Their approach is straightforward. A model $$g$$ is trained and applied to each datapoint $$x_i$$. The model output $$g(x_i)$$ is then compared with the given label $$y_i$$. The model confidence can be used as a proxy to rank the instances, so that the user checks the most suspicious labels first.
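
To make the ranking idea concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes that $$g(x_i)$$ is a vector of class probabilities and that the confidence score is the probability the model assigns to the given label $$y_i$$; the paper may use a different score. The function name `candidates_for_review` and the toy numbers are made up for illustration.

```python
import math


def candidates_for_review(probs, labels, alpha):
    """Return indices of the instances to re-check, most suspicious first.

    probs  : list of per-class probability lists, one per datapoint (g(x_i))
    labels : list of given integer labels (y_i)
    alpha  : fraction of the dataset the user is willing to re-evaluate
    """
    n = len(labels)
    # Assumed score: the probability the model assigns to the given label.
    # A low value means the model disagrees with the label.
    label_conf = [probs[i][labels[i]] for i in range(n)]
    # Review budget: alpha * n instances, rounded up.
    budget = math.ceil(alpha * n)
    # Sort ascending so the least trusted labels come first.
    order = sorted(range(n), key=lambda i: label_conf[i])
    return order[:budget]


# Toy example: four datapoints, two classes, budget of 50%.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 0, 1, 1]
print(candidates_for_review(probs, labels, alpha=0.5))  # [1, 2]
```

In this toy example, datapoints 1 and 2 are flagged because the model gives their labels probabilities of only 0.2 and 0.4.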