Human Label Variation Datasets

Today I learned about an awesome list on human label variation. It's a list of unaggregated datasets, meaning the labels from individual annotators are kept, so you can research annotator (dis)agreement. It's part of an effort described in a paper by Barbara Plank.

The paper starts by proposing the term "human label variation", which seems very sensible: disagreement between annotators doesn't automatically imply that an annotation is wrong.
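To make that concrete, here is a minimal sketch of what you can do once the per-annotator labels are kept. The toy data and the helper names (`label_distribution`, `entropy`, `summarize`) are my own assumptions for illustration; they don't come from the paper or from any dataset on the list.

```python
from collections import Counter
from math import log2

# Hypothetical toy data: item id -> labels given by individual annotators.
annotations = {
    "item-1": ["pos", "pos", "pos", "pos"],
    "item-2": ["pos", "neg", "pos", "neg"],
    "item-3": ["neg", "neg", "neutral", "neg"],
}


def label_distribution(labels):
    """Turn a list of annotator labels into relative frequencies."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def entropy(dist):
    """Shannon entropy in bits; 0.0 means every annotator agreed."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)


def summarize(labels):
    """Keep the majority vote, but also what the majority vote hides."""
    dist = label_distribution(labels)
    # On a tie, max() returns whichever label appeared first.
    majority = max(dist, key=dist.get)
    return {"majority": majority, "distribution": dist, "entropy": entropy(dist)}


for item, labels in annotations.items():
    print(item, summarize(labels))
```

An aggregated dataset would only keep the `majority` field. The distribution and its entropy show which items the annotators actually agreed on and which ones split them, and that split is exactly the information aggregation throws away.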

The list is super cool, and I can totally see myself having some fun with these datasets. The paper also links to other interesting work, including a survey on annotator disagreement across datasets. It turns out, yet again, that there's a lot of nuance when it comes to "gold labels".