TIL: Active Churning

Random Sampling is a Strong Benchmark

Vincent Warmerdam koaning.io

As some might know, I’ve recently become very interested in active learning techniques. The big-picture idea of active learning is to use the uncertainty of a model as a proxy for labelling priority. The hope is that, by labelling the examples the model is least sure about first, we sample from an unlabelled dataset as effectively as possible and reduce the amount of time it takes to annotate it.

There are a few techniques in this space, but the overall goal is to label more effectively.
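
To make the idea concrete, here is a minimal sketch of one such technique, usually called least-confidence sampling, next to a random-sampling baseline. It assumes we already have a matrix of predicted class probabilities for the unlabelled examples (for instance from a classifier's `predict_proba`). The function names `least_confident_indices` and `random_indices` and the toy probabilities are made up for illustration and are not taken from any paper or library.

```python
import numpy as np


def least_confident_indices(proba, n):
    """Return indices of the n rows whose top predicted probability is lowest."""
    confidence = proba.max(axis=1)   # probability of the predicted class
    uncertainty = 1.0 - confidence   # higher means the model is less sure
    # Sort by descending uncertainty and keep the first n indices.
    return np.argsort(-uncertainty, kind="stable")[:n]


def random_indices(n_rows, n, seed=0):
    """Baseline: pick n distinct rows uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_rows, size=n, replace=False)


# Toy predicted probabilities for four unlabelled examples (hypothetical data).
proba = np.array([
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
])

picked = least_confident_indices(proba, n=2)   # rows 3 and 1 are closest to 50/50
baseline = random_indices(len(proba), n=2)     # rows chosen without looking at the model
```

In an annotation loop, the rows in `picked` would be labelled first, the model retrained, and the probabilities recomputed. The random baseline is the natural point of comparison for any such strategy.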

While doing a bit of research, I “stumbled” on a very interesting paper titled “Practical Obstacles to Deploying Active Learning” by David Lowell, Zachary C. Lipton and Byron C. Wallace.