The Sediment of Software Chart

2026-03-08

I started maintaining a new project that tracks the evolution of a GitHub repository using "sediment" charts. The final output is a GitHub Pages site that lets you explore sediment charts for different Python projects. Here's one example for the sentence-transformers library.

[Image: sediment chart for the sentence-transformers repository]

There's not a whole lot of action between 2022 and 2024, but then Hugging Face took over maintainership.

These charts track each line of code in the project and show how it changes over time. The result looks like layers of sediment building up.
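The core idea can be sketched with `git blame`. Below is a minimal, hypothetical version of the bookkeeping (not the project's actual code): parse `git blame --line-porcelain` output and bucket each line by the year of the commit that last touched it. The `sediment_for_file` helper assumes `git` is installed and shells out to it.

```python
import subprocess
from collections import Counter
from datetime import datetime, timezone


def blame_years(porcelain: str) -> Counter:
    """Bucket lines by the year of the commit that last touched them.

    `porcelain` is the output of `git blame --line-porcelain <file>`,
    which repeats a `committer-time <unix-ts>` header for every line.
    """
    years = Counter()
    for line in porcelain.splitlines():
        if line.startswith("committer-time "):
            ts = int(line.split()[1])
            years[datetime.fromtimestamp(ts, tz=timezone.utc).year] += 1
    return years


def sediment_for_file(repo: str, path: str, rev: str = "HEAD") -> Counter:
    # Hypothetical helper: shells out to git for one file at one revision.
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain", rev, "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return blame_years(out)
```

Running this per file, per release tag, and summing the counters gives one "sediment layer" per snapshot: old layers shrink as their lines get rewritten, new layers stack on top.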

For Python projects you can also overlay the version numbers to get some extra context. Usually, big changes coincide with a new version, as with the Django project shown below.

[Image: sediment chart for Django with version overlay]

Around version 4 you can see a big shift where a lot of *recent* code is changing. Maybe they updated the docs here? Or possibly a switch from flake8 to ruff?

I like to think these charts also say something about the health of a project. A healthy project shows code being added over time without huge rewrites of past code. The healthiest chart I have found so far is for marimo. This is my employer, and it is nice to see how the chart matches my experience with the culture there.

[Image: sediment chart for marimo]
This project is relatively young, so time is measured in quarters instead of years.

Over time I hope this project might track the effect of coding agents. Are we going to see a big spike? Are they going to rewrite everything from the past? Time will tell, but these charts will give me a nice summary. If you are curious, you can run the notebook to give it a spin on your own projects.

I also recorded a YouTube video for this work here, if you prefer to see a live demo.

Local MLX speedups

2026-03-02

I never really used Apple's MLX framework, but the UMAP-MLX project is making me wonder if I should dabble in it more. It takes UMAP and gives it a significant speedup, around 30x! A small caveat: this currently only works for datasets small enough not to need approximation algorithms.

[Image: UMAP embeddings from umap-learn and MLX side by side]

The clusters look similar too.

Some stats from the readme:

| N     | umap-learn | MLX   | speedup |
|-------|------------|-------|---------|
| 1000  | 4.87s      | 0.40s | 12x     |
| 2000  | 6.18s      | 0.36s | 17x     |
| 5000  | 17.22s     | 0.44s | 40x     |
| 10000 | 25.85s     | 0.56s | 46x     |
| 20000 | 22.01s     | 0.54s | 41x     |
| 60000 | 68.99s     | 2.04s | 34x     |
| 70000 | 81.40s     | 2.65s | 31x     |

Impressive!

The "database sharding" term comes from Ultima Online

2026-02-22

"Sharding" is a popular database technique where you split a database across multiple "shards" to spread the load. The term sounds like it originated from "shard" in the plain sense of a piece of a larger whole, but the actual origin turns out to be tied to fantasy lore.

The full details are explained in an interview from Ars Technica, where Richard Garriott explains how Ultima Online (or at least the 1990s version) was created with an ecology bug. In the same interview he also shares a bit of lore from the game. As the story goes, when you defeated the final boss, who carried "the gem of immortality", it broke into many different shards. This bit of lore was then used to justify the creation of many different servers, each with its own copy of the game.

That's where the term seems to have come from, and some other sources online seem to corroborate it.