Today I Learned

Inspired by Simon Willison, this part of my site is for short lessons worth journalling.


2024-03-27
The roll, yaw and pitch of strawberries.

Harvesting robots need it.

2023-11-22
Detecting Chessboards

Another use case for Blender

2023-10-05
Machine UnLearning for Harry Potter

Finetuning LLMs away from something

2023-09-29
Invasive species

Detecting about 196 of them

2023-09-22
Citrus Fruits

On the tree and on the ground.

2023-09-22
One Dimensional Word2Vec

Via a travelling salesman

2023-09-08
Doppelganger Buildings

And how to check them.

2023-07-26
text2fabric

How to search in fashion

2023-07-18
Plant datasets

For computer vision

2023-07-17
The worst kind of duplicate

In CIFAR100 no less

2023-07-17
Sleep vs. Code

A study in sleep deprivation

2023-06-02
Memes and other strange images

A benchmark/dataset of memes

2023-04-21
Rubik's TSNE

Mapping all the moves

2023-04-21
Human Label Variation Datasets

Collecting datasets with annotator information

2023-04-19
Wearables as a Multi-Modal dataset

Using video to map sensors to activity

2023-03-29
Low Light Computer Vision

Seeing things in the dark

2023-03-13
Colorizing Mobile Websites

Neural networks vs. web standards

2023-03-12
Automating Esports Commentary

LOL stands for League of Legends

2023-03-11
2023-02-23
Spreadsheet Risk Management

There's a genuine conference for it.

2023-01-16
The Corrupted Blood Incident

Diseases Spreading in World of Warcraft

2022-12-07
Open Sanctions

A cool public dataset

2022-12-04
Angry AI Birds

Via tactics of deception!

2022-12-04
Typo/Spelling Error Dataset

Via Mechanical Turk and Git Repos

2022-11-20
Playtesting Candy Crush

with Deep Learning?

2022-11-18
Bot Bowl

an AI Competition in Blood Bowl

2022-11-02
Missing Pedestrians

In a self-driving car dataset. Ouch.

2022-11-01
Ascent

Library with some cool sentiment ideas

2022-10-27
Game Time Distribution

Turns out, it's Weibull?

2022-10-26
Minecraft Diffusion

Crafting starting points for diffusion.

2022-10-22
Only 7 Percent

Not everybody shares their data.

2022-10-21
Zelda Street View

This is a really cool hobby project.

2022-10-08
Data Duplications

Between Test and Training data!

2022-10-06
Annotation Datasets

Let’s study annotators

2022-10-05
Punderstanding

Computational "Pun"-derstanding that is.

2022-09-09
DALC

Dutch Abusive Language Corpus

2022-07-21
Generating Receipts

This is a really cool use-case for Blender.

2022-07-13
Annotators vs. Tasks

Are We Modeling the Task or the Annotator?

2022-05-17
Won't Predict via Disagreement

Learning from Teachers, more Literally

2022-05-13
Interactive Confusion Matrices

Ideas for UI work.

2022-05-02
Active Churning

Random Sampling is a Strong Benchmark

2022-04-23
Active Street Signs

Neat use case for Active Learning.

2022-04-22
Perfect Fit

Never ever claim a perfect fit.

2022-04-21
Active, but Visual, Learning

Colors and Convex Hulls

2022-01-16
The Story Theory

Statistics, Storks and Babies

2021-12-20
2021-12-05
VADER

Rule Based Sentiment

2021-12-03
Linkrot

It is a Huge Problem

2021-10-29
Learning to Place

Classification as a Heavy-Tail Regressor

2021-10-13
Optimal Seeds

Manual_seed(3407) is All You Need

2021-10-12
1.4 Million Jupyter Notebooks

And only 24.1% of them actually ran.

2021-09-27
Sentiment and Bias

Exploring Hugging Face while I'm at it.

2021-09-26
Gorilla Hypotheses

A hypothesis *can* be a liability.

2021-09-13
Scots Wikipedia

The Ouch continues in Embeddings

2021-09-01
Analytics Providers

It's Numbers that Differ!

2021-08-27
poke2vec

As in ... text embeddings!

2021-08-10
Pandas Format

Pretty table renders.

2021-08-06
Stopwords

They're not very consistent.

2021-07-29
Dixit Data

How a Great Game became a Grand Challenge

2021-07-22
Label Errors

How to find LOTS of them.

2021-07-17
DnD Data

There's lots of it.

2021-07-17
Shaded Screenshots

A "shortcut" with 4 keys.

2021-07-16
Copilot & Pytest

Pytest vs. Parrot

2021-07-15
metatags.io

It's a great helper

2021-07-08
Copilot & Submodules

Autocomplete Might be Better

2021-06-25
GitHub Actions as a Number

Is it big or is it small?

2021-06-23
Plenty of Bad Labels

Data Quality Strikes Again

2021-06-18
Recursive HTML

I *really* like Svelte.

2021-06-13
Urban Dictionary Embeddings

It's an entertaining idea.

2021-06-05
Tesla vs. Stoplights

Data Quality Strikes Again

2021-06-03
Kolektor

My take on Git-Scraping[tm]

2021-06-01
Flight Simulatoops

Data Quality Strikes Again