Won't Predict via Disagreement

I'm generally interested in techniques that allow machine learning models to raise a "won't predict" flag. One approach I learned about recently leverages annotators for this task. It's described in this paper.

Many teams combine annotations from a group of annotators by majority vote. One realization I had while glancing through this paper is that annotator disagreement can be put to other uses as well.

You could, for example, try to predict the output label of each annotator separately. Then, in production, you can predict whether the annotators would disagree on a label. If they would, that might be a good moment to raise a "won't predict" flag.
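Here's a minimal sketch of that idea, assuming you have a feature matrix `X` and a label matrix `Y_annot` of shape `(n_samples, n_annotators)` with integer class labels, where column `j` holds annotator `j`'s label for each example. The names and the use of logistic regression are illustrative, not what the paper prescribes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_per_annotator_models(X, Y_annot):
    """Fit one classifier per annotator, each on that annotator's own labels."""
    return [
        LogisticRegression(max_iter=1000).fit(X, Y_annot[:, j])
        for j in range(Y_annot.shape[1])
    ]


def predict_with_wont_predict(models, X_new):
    """Predict a label per example, plus a flag for simulated disagreement."""
    # Shape: (n_samples, n_annotators), one predicted label per annotator model.
    preds = np.stack([m.predict(X_new) for m in models], axis=1)
    # Flag an example when the per-annotator predictions don't all agree.
    wont_predict = np.array([len(set(row)) > 1 for row in preds])
    # Fall back to the majority of the simulated annotators for the label itself.
    majority = np.array([np.bincount(row).argmax() for row in preds])
    return majority, wont_predict
```

Instead of a hard all-agree check, you could also threshold on the fraction of simulated annotators that pick the minority label, which gives you a knob for how conservative the "won't predict" flag should be.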

The paper suggests that this works better than training a single model on the majority label and using softmax confidence or dropout samples as an uncertainty proxy.
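For contrast, this is roughly what that single-model baseline looks like: collapse the annotations to a majority label, train one model, and flag examples whose top predicted probability falls below a threshold. The threshold value and function names here are just for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_majority_model(X, Y_annot):
    """Collapse annotations to a majority label and fit a single classifier."""
    y_major = np.array([np.bincount(row).argmax() for row in Y_annot])
    return LogisticRegression(max_iter=1000).fit(X, y_major)


def wont_predict_by_confidence(model, X_new, threshold=0.7):
    """Flag examples whose highest predicted probability is below the threshold."""
    proba = model.predict_proba(X_new)
    return proba.max(axis=1) < threshold
```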

The paper makes a few more points, but this general idea makes a lot of sense to me. The annotation preferences of each annotator could be learned and might help indicate examples that aren't 100% clear cut. This could be useful when trying to find bad labels, but it could also help detect ambiguous examples that deserve more careful treatment.