An IDK token

I noticed an interesting idea on my timeline: introducing an "I don't know" (IDK) token to LLMs in an attempt to get them to admit when they don't know something. The idea comes from the "I don't know: Explicit Modelling with an [IDK] token" paper.

The objective is to add an extra token to the vocabulary during pretraining and, whenever the model's prediction is wrong, to shift some of the probability mass in the training target towards this special IDK token. The hope is that the model learns to recognise when it doesn't know something and abstains instead of guessing.
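To make the "shift some probability mass" step concrete, here is a minimal sketch of what a modified cross-entropy target could look like for a single token. It assumes a fixed, hypothetical `shift` hyperparameter and a toy four-token vocabulary; the paper's actual weighting is more involved, so treat this as an illustration of the idea rather than its implementation.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def idk_target(logits, gold, idk_index, shift=0.5):
    """Soft training target that moves `shift` of the mass onto [IDK] when the model is wrong.

    `shift` is a hypothetical fixed hyperparameter; it is not the paper's exact weighting.
    """
    target = [0.0] * len(logits)
    # Best guess among the ordinary tokens, i.e. ignoring [IDK] itself.
    predicted = max(
        (i for i in range(len(logits)) if i != idk_index),
        key=lambda i: logits[i],
    )
    if predicted == gold:
        # Correct prediction: keep the usual one-hot target.
        target[gold] = 1.0
    else:
        # Wrong prediction: split the target between the gold token and [IDK].
        target[gold] = 1.0 - shift
        target[idk_index] = shift
    return target


def soft_cross_entropy(logits, target):
    # Cross-entropy between a soft target distribution and the model's softmax.
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Toy vocabulary: ["cat", "dog", "fish", "[IDK]"]; the gold token is "dog" (index 1).
wrong = [2.0, 0.5, 0.1, 0.0]  # model prefers "cat", so it is wrong
right = [0.1, 2.0, 0.5, 0.0]  # model prefers "dog", so it is right
print(idk_target(wrong, gold=1, idk_index=3))  # [0.0, 0.5, 0.0, 0.5]
print(idk_target(right, gold=1, idk_index=3))  # [0.0, 1.0, 0.0, 0.0]
```

With the split target, a wrong prediction is no longer pushed entirely towards the gold token; part of the loss instead rewards putting mass on [IDK], which is where the "doubt" signal comes from.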

This honestly feels like a clever way to generate the required "label" ahead of time: just steal some probability mass from known wrong answers and hope the model learns to doubt instead. The paper warns that you can overdo it, though. You can generate too many false-positive IDKs, where the model could have made a correct prediction instead. Some regularisation helps, but it is worth emphasising that there is no free lunch.
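The false-positive problem can be made measurable. The sketch below is a hypothetical illustration, not the paper's regulariser: `idk_penalty` adds a cost for [IDK] probability only when the model's best ordinary guess was already correct, and `false_positive_idk_rate` counts how often [IDK] wins on examples the model could have answered. Both helper names and the `weight` value are assumptions.

```python
import math


def best_non_idk(probs, idk_index):
    # Index of the most likely ordinary token, ignoring [IDK].
    return max((i for i in range(len(probs)) if i != idk_index), key=lambda i: probs[i])


def idk_penalty(probs, gold, idk_index, weight=0.1):
    """Hypothetical regulariser: penalise [IDK] mass when the model already knew the answer."""
    if best_non_idk(probs, idk_index) != gold:
        # The model was wrong, so [IDK] mass is welcome here.
        return 0.0
    # -log(1 - p_idk) is 0 when [IDK] gets no mass and grows as it takes more.
    return -weight * math.log(1.0 - probs[idk_index])


def false_positive_idk_rate(examples, idk_index):
    """Share of answerable examples where [IDK] still beats every other token.

    `examples` is a list of (probs, gold) pairs; "answerable" means the best
    ordinary token is the gold token.
    """
    answerable = [p for p, gold in examples if best_non_idk(p, idk_index) == gold]
    if not answerable:
        return 0.0
    hits = sum(1 for p in answerable if max(range(len(p)), key=lambda i: p[i]) == idk_index)
    return hits / len(answerable)


examples = [
    ([0.1, 0.3, 0.1, 0.5], 1),  # would have been right, but [IDK] wins: false positive
    ([0.1, 0.6, 0.1, 0.2], 1),  # right, and "dog" wins
    ([0.6, 0.1, 0.1, 0.2], 1),  # wrong anyway, so not counted
]
print(false_positive_idk_rate(examples, idk_index=3))  # 0.5
```

Pushing `weight` up lowers the false-positive rate but also weakens the IDK signal, which is the no-free-lunch trade-off in miniature.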

The paper was fun to skim. It turns out that the technique is not perfect: some models are harder to tune, and the tuning setup is expensive. But I really enjoy the idea of having some sort of "won't predict" flag in a predictive system. It's a trick that worked out super well for me early in my career, and I'm happy to see similar ideas popping up in LLM-land.
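For comparison, a common form of a "won't predict" flag in classic ML is a reject option: return no answer whenever the top class probability falls below a threshold. This is not necessarily how my earlier setup worked; the `predict_or_abstain` helper and its threshold are hypothetical, shown only as a minimal sketch of the idea.

```python
def predict_or_abstain(probs, labels, threshold=0.7):
    """Return the most likely label, or None as a "won't predict" flag.

    `threshold` is a hypothetical value; in practice you'd pick it on held-out data.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None  # not confident enough: hand this case to a human or a fallback
    return labels[best]


labels = ["spam", "ham"]
print(predict_or_abstain([0.92, 0.08], labels))  # spam
print(predict_or_abstain([0.55, 0.45], labels))  # None
```

Raising the threshold trades coverage for accuracy on the cases you do answer. The IDK token tries to learn that decision inside the model rather than bolting it on afterwards.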