So it turns out that mislabelled data isn't the only thing that can be an issue: you can also have duplicates, even between the train and test sets.
According to this paper by Bjorn Barz and Joachim Denzler, CIFAR-10 and CIFAR-100 suffer from this. And it's not just a few examples either.
It even turns out that there are so many of them that it affects the evaluation scores of CNN models.
One nice touch (appreciated!) from the authors is that they released a cleaned version of CIFAR. Or at least, one that doesn't contain duplicates. It may well still have other label issues. This dataset can be found here.
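
To make the train/test overlap idea a bit more concrete, here's a minimal sketch of how you could check your own dataset for it. To be clear, this is not the method from the paper: it only catches pixel-identical copies by hashing the raw bytes, so anything that's merely near-identical (slightly shifted, recoloured, re-encoded) would slip through and would need some kind of similarity measure instead. The function name `find_exact_duplicates` and the random toy arrays are made up for illustration; in practice you'd pass in your real image arrays.

```python
import hashlib

import numpy as np


def find_exact_duplicates(train_images, test_images):
    """Return (test_index, train_index) pairs for pixel-identical images.

    Hypothetical helper for illustration: exact matches only.
    """
    def image_key(img):
        # Hash the raw pixel bytes; identical arrays give identical keys.
        return hashlib.sha1(np.ascontiguousarray(img).tobytes()).hexdigest()

    # Remember the first training index at which each distinct image appears.
    train_index_by_key = {}
    for i, img in enumerate(train_images):
        train_index_by_key.setdefault(image_key(img), i)

    # Look up every test image among the training hashes.
    pairs = []
    for j, img in enumerate(test_images):
        key = image_key(img)
        if key in train_index_by_key:
            pairs.append((j, train_index_by_key[key]))
    return pairs


# Toy stand-in for a CIFAR-like dataset: random 32x32 RGB images (assumption).
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
test = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)

# Deliberately leak two training images into the test set.
test[3] = train[42]
test[7] = train[5]

duplicates = find_exact_duplicates(train, test)
print(duplicates)
```

If this reports any pairs on a real dataset, those test images were effectively seen during training, which is exactly the kind of leak that can inflate evaluation scores.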