There are two tiers of open-source projects for me now.
Historically, whenever I felt a tool might be useful to me, I naturally assumed it might be useful to other people, so taking the effort to put it up on PyPI made sense. Nowadays, though, I am making more and more personal tools. It is all thanks to Claude, and it has made me rethink the distribution of my work. Some tools are really meant for "just me", which makes PyPI a bad target. Users might expect a proper amount of maintenance when you claim a name there, and on top of that you're squatting a name that someone else might want to use.
A fix for the second tier
If all you're building is a CLI, it turns out that a GitHub repository is really all you need, thanks to a nice uvx pattern. For example, the CLI that contains my custom blog writing tool can run with this one-liner:
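In general the pattern looks like this (the repo URL and command name below are placeholders, not the actual project):

```shell
# Run a CLI tool straight from a GitHub repo; uvx creates a throwaway
# environment, installs the project and its dependencies, and runs it.
uvx --from git+https://github.com/<user>/<repo> <tool-name>
```

No `pip install`, no published package: the repository itself is the distribution channel.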
It takes care of all the dependencies, and I don't have to worry about package versions or distributions. It feels like the best way for me to share open-source work that doesn't really fall in the "first tier" category. It is still open, but it sets much lower expectations and does a much better job of explaining that I am the primary target audience.
Pun on README
This pattern has also led me to add a small joke on some repos.
You're not really installing it, right?
I would encourage more people to do this. Partially to preserve the namespace on PyPI, but also because I would love it if more people shared their brainfarts with the world.
I stumbled upon an example that does a pretty good job at explaining why context is hard, albeit a bit niche and nerdy.
Vulture
In StarCraft there is a unit called a "vulture". Here's a YouTube video.
So, if you were to prompt an image generation model to generate a "StarCraft marine riding a vulture", what would you expect? Let's consider two examples.
Example one
Example two
Here's the hard thing to me: when I look at just the prompt, it is incredibly hard to figure out which image is "better". The prompt simply does not give me enough information. The first image shows a vulture from the video game, which is accurate. Then again, a marine riding a literal vulture in space is no less accurate in the literal sense, and it is also pretty witty.
It's the best example I've seen thus far of "context matters". The current prompt makes it impossible to declare which of these two images is better, which also immediately makes it clear that the prompt should be improved.
A few weeks ago, I released mopad, which is a library that allows you to use a gamepad in Python.
When I released it, I hinted to Hamel Husain that you could use it to annotate data and that more people should consider it seriously. It turns out this was a total nerd snipe, and Hamel invited me to talk about it on his livestream here. A benefit of that link is that it also has a dialogue at the start where I showcase my streaming/work setup. You can also see the more polished demo recording below:
Announcing molabel
I figured the livestream was also a great opportunity to showcase a new thing that I had been working on: molabel. The library contains an annotation widget that gives you simple labelling interfaces that work straight from your notebook. It works by letting you pass a custom rendering function for your examples, after which you can attach labels and notes to each one. You can use the mouse for this, as well as keyboard shortcuts or a Bluetooth gamepad.
This is what the widget looks like:
The `SimpleLabel` widget from molabel
You create this widget with a call like this:
from molabel import SimpleLabel

widget = SimpleLabel(
    examples=list_of_examples,  # any Python list of things to annotate
    render=render_function      # turns one example into something HTML-like
)
The list_of_examples can be any Python list of things that you would like to render. You'd typically pass a list of dictionaries, but you're allowed to pass whatever you like. The only thing the widget cares about is that you also pass a render_function that can take each example and turn it into something HTML-like that the widget can render. You could return a string, but any object that has a _repr_html_ method would also work. That means you can also directly use mohtml and FastHTML.
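To make that concrete, here's a minimal plain-Python sketch of a render function. The field names (`title`, `text`) are made up for illustration; you'd pass this function as the `render=` argument in the snippet above.

```python
# A render function turns one example (here: a dict) into an HTML string.
# The dict keys below are illustrative, not required by the widget.
def render_example(example):
    return (
        f"<h3>{example['title']}</h3>"
        f"<p>{example['text']}</p>"
    )

examples = [
    {"title": "Example 1", "text": "The quick brown fox."},
    {"title": "Example 2", "text": "Jumps over the lazy dog."},
]

# Each example becomes a small HTML fragment the widget can display:
print(render_example(examples[0]))
```

Because any HTML string works, you're free to make the rendering as plain or as rich as your annotation task needs.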
The goal of the `render` function.
If the widget knows how to render, it can take care of the rest.
What the widget adds.
The widget will store all the annotations that you generate, and you can wire it up in such a way that data is automatically sent to a database on your behalf. If you have a gamepad at the ready, you can also try it live from the demo hosted on GitHub Pages. It runs a marimo notebook in WASM, so you can give the widget a spin without installing anything locally.
Loads of use-cases are covered
This widget can feel pretty basic at first glance: there's really only a binary "yes"/"no" choice with an extra option to "skip". While it might seem basic, let's take a step back and appreciate what's already covered here. As-is, you can use this for:
binary classification
multi-label classification (each label is binary, after all)
the "is this a match?" kinds of queries for retreival
comparisons between two choices, one of which is better
Keeping it simple also means fewer human errors and more labels. I have seen annotation interfaces with 27 classification labels, and it's maddening. A simpler interface means less cognitive overhead, and it will just give you more labels of higher quality.
Not only that, but you can also add plenty of context via the textbox. So if you want to explain why there is an issue with an example, maybe for finetuning an LLM later, that's totally possible here. You could theoretically even use this widget only to add a note to each example, ignoring the labels altogether. Note that you can use the browser's speech-to-text API for free here, so you could also dictate notes with your voice while annotating with a gamepad.
Why
I have been telling people to take data quality seriously for years now, and instead of just talking about it I now want to start making tools for it. The focus of these tools will be to make it easy to get started, but also to make it fun. I have literally found a way for AI engineers to expense a gamepad for work, so hopefully that alone will be enough to get more people to use this.
Final note
Finally, and this is important, the gamepad is also a great ergonomic device. You can sit in a comfortable chair and use a device that is crafted for comfort. I have suffered from RSI in the past, and it is something to take seriously. It led me to become a bit of a device nerd; I now review ergonomic keyboards on a YouTube channel, and I am looking forward to reviewing devices/gamepads for annotation in the near future.
P.S. This post was also featured on my Substack. Consider becoming a free member if you want to stay in the loop on what I am up to.