Exploring Huggingface while I’m at it.
One of my colleagues sent me a great paper to read.
The paper is titled “Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems” and is written by Svetlana Kiritchenko and Saif M. Mohammad.
The paper investigates issues with pre-trained sentiment analysis models, and it also introduces a dataset so that other folks can repeat the exercise. The dataset consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. It uses templates, like `<Person> feels <emotional state word>.`, and fills in `<Person>` with race- or gender-associated names. Each template is also tied to an emotion, so the expected sentiment is known upfront.
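To make the construction concrete, here's a minimal sketch of how such a template-based corpus can be built. The name lists and emotion words below are tiny illustrative samples I made up, not the actual EEC lists:

```python
# Sketch: expand templates with (gender, race)-associated names and
# emotion words. Name groups and word lists are hypothetical examples.
from itertools import product

templates = [
    "{person} feels {emotion}.",
    "{person} made me feel {emotion}.",
]
# Hypothetical name groups keyed by (gender, race):
names = {
    ("female", "African-American"): ["Ebony", "Latisha"],
    ("male", "European"): ["Frank", "Adam"],
}
# Each emotion label maps to emotional state words:
emotions = {"sadness": ["sad", "miserable"]}

sentences = []
for template, ((gender, race), group) in product(templates, names.items()):
    for person in group:
        for emotion_label, words in emotions.items():
            for word in words:
                sentences.append(
                    (template.format(person=person, emotion=word),
                     gender, race, emotion_label)
                )

print(len(sentences))  # 2 templates x 4 names x 2 words = 16
```

Because each generated sentence carries its gender, race, and emotion label, you can later group predictions by any of these attributes.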
The results are summarised in the paper’s conclusion:
> We used the EEC to analyze 219 NLP systems that participated in a recent international shared task on predicting sentiment and emotion intensity. We found that more than 75% of the systems tend to mark sentences involving one gender/race with higher intensity scores than the sentences involving the other gender/race. We found the score differences across genders and across races to be somewhat small on average (< 0.03, which is 3% of the 0 to 1 score range). However, for some systems the score differences reached as high as 0.34 (34%). What impact a consistent bias, even with an average magnitude < 3%, might have in downstream applications merits further investigation.
The Equity Evaluation Corpus that the paper introduces is publicly available. That’s great, because it means I can easily repeat the exercise. So I figured I’d try it on some pre-trained sentiment models hosted on huggingface.
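The exercise boils down to running each sentence through a classifier and tallying the predicted labels per group. A small sketch of that tallying step, written so that `classify` can be any callable (in practice it would wrap a `transformers` sentiment pipeline; the stand-in classifier here is purely for illustration):

```python
# Sketch: count predicted labels over a group of sentences.
# `classify` stands in for a real model, e.g. a transformers
# sentiment-analysis pipeline wrapped to return a label string.
from collections import Counter

def label_counts(classify, sentences):
    """Return how often each label is predicted for these sentences."""
    return Counter(classify(s) for s in sentences)

# Hypothetical stand-in classifier, just to show the mechanics:
fake = lambda s: "NEGATIVE" if "sad" in s else "POSITIVE"
counts = label_counts(fake, ["Ebony feels sad.", "Frank feels happy."])
print(counts)
```

Comparing these counts between, say, female and male name groups for the same emotion is exactly the bias check the paper performs.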
Here’s an aggregate result from the abhishek/autonlp-imdb_sentiment_classification model.
| template | emotion | gender | race | label 1 | label 2 |
|---|---|---|---|---|---|
| PERSON feels EMOTION. | sadness | female | African-American | 25 | 25 |
| PERSON feels EMOTION. | sadness | female | European | 31 | 19 |
| PERSON feels EMOTION. | sadness | male | African-American | 21 | 29 |
| PERSON feels EMOTION. | sadness | male | European | 34 | 16 |
| PERSON made me feel EMOTION. | sadness | female | African-American | 20 | 30 |
| PERSON made me feel EMOTION. | sadness | female | European | 21 | 29 |
| PERSON made me feel EMOTION. | sadness | male | African-American | 19 | 31 |
| PERSON made me feel EMOTION. | sadness | male | European | 24 | 26 |
This model has two labels. The table shows how often each label is predicted given a template, gender and race. If we look at the `PERSON feels EMOTION.` template, where the emotion is “sadness”, I’d expect the sentiment to depend only on the emotion. We’re aggregating over different names here, though, and we can see that the sentiment seems to depend on gender and race as well, if only a little. To me, that means we cannot blindly trust this model.
I figured I’d also share the results aggregated across templates. You can inspect the results of the test for different models below.
| emotion | gender | race | 1 star | 2 stars | 3 stars | 4 stars | 5 stars |
|---|---|---|---|---|---|---|---|
The differences don’t seem staggering across these freely available models, which is a relief. Still, the differences across race and gender should be zero, and they aren’t.
That’s not the biggest bummer in this story, though.
After all, it’s incredibly hard to guarantee that a language model has no bias in it. I cannot blame anybody for that. But as it currently stands, it does feel like a warning label is missing. Huggingface supports descriptions of models called “model cards”. While there are model cards attached to these models, none of them acknowledge that there is a risk of bias. Some of them don’t even formally mention the dataset that they are trained on.
And that’s a missed opportunity. It’d be a shame if folks started blindly copying these models without being aware of any risks. It’d be better if these model cards automatically added a bias warning when the original model author didn’t consider it. I’d also recommend explicitly mentioning the dataset that the model is trained on: knowing upfront whether the sentiment dataset reflects my use-case would be very helpful when picking a pre-trained model.
For more details, feel free to read the original paper on model cards.