just saying it twice

2026-03-02

This was an interesting, and delightfully short, read.

[Image: Saying it twice makes number go up!]

It seems that for a lot of tasks you can just repeat the input twice and get better performance out of an LLM. Granted, this is shown for pretty old models, but still!
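As a rough sketch of what "repeating the input" could look like in practice, assuming the repetition is a plain concatenation of the same prompt. The function name `repeat_prompt` and the blank-line separator are my own choices for illustration, not something taken from the paper.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return the prompt repeated `times` times, joined by `separator`.

    The separator is an assumption; the exact formatting may matter per model.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    return separator.join([prompt] * times)


question = "Which is larger, 9.11 or 9.9?"
doubled = repeat_prompt(question)
print(doubled)  # the same question, twice, ready to send to whatever LLM you use
```

The nice part is that this is a pure string transformation, so it is cheap to try on your own evals before believing it.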

backprop takes a toll

2026-03-02

There are a bunch of reasons why people are excited about evolution strategies, but an often overlooked one is that backprop is pretty darn heavy.

[Image: I ran the benchmark]

It's not just the time it takes, but also the memory! Especially when you use Adam. Suddenly you're not just storing the weights and their gradients, but also two extra values per weight: the running average of the gradient (the momentum part) and the running average of the squared gradient.
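To put numbers on that, here is a back-of-the-envelope sketch. It assumes everything is stored as 32-bit floats and it ignores the activations that backprop also has to keep around, so treat it as a lower bound. The function name and the 100M parameter count are made up for illustration and are not from my benchmark.

```python
BYTES_PER_FLOAT32 = 4

# How many parameter-sized tensors each setup keeps in memory.
TENSOR_COPIES = {
    "forward_only": 1,  # weights
    "sgd": 2,           # weights + gradients
    "adam": 4,          # weights + gradients + first moment + second moment
}


def training_state_bytes(n_params: int, setup: str) -> int:
    """Bytes needed for weights plus optimizer state (activations not counted)."""
    return n_params * TENSOR_COPIES[setup] * BYTES_PER_FLOAT32


n_params = 100_000_000  # hypothetical model size
for setup in TENSOR_COPIES:
    gigabytes = training_state_bytes(n_params, setup) / 1e9
    print(f"{setup:>12}: {gigabytes:.1f} GB")
```

So under these assumptions Adam needs roughly four times the memory of a forward pass alone, before a single activation is stored.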

A quick and dirty benchmark can be found here.

The database sharding term comes from Ultima Online.

2026-02-22

"Sharding" is a popular database technique where you split a database across multiple "shards" to spread the load. The term sounds like it might have come straight from the ordinary word "shard", which implies a piece of a larger whole, but it turns out the actual origin has more to do with fantasy lore.

The full details are explained in this interview from Ars Technica where Richard Garriott explains how Ultima Online (or at least the version from the 90s) was created with an ecology bug. In the same interview he also mentions a bit of lore related to the game. As the story goes, when you defeated a final boss who carried "the gem of immortality", the gem broke into many different shards. This bit of lore was then used to justify the creation of many different servers, each with their own copy of the game.

That's where the term seems to have come from originally. Some other sources online seem to corroborate it.
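For anyone who hasn't seen the database technique itself, a minimal sketch of hash-based sharding is below. It assumes a fixed number of shards and uses plain dictionaries as stand-ins for real database servers. The names `shard_for`, `put` and `get` are hypothetical, and real systems tend to use fancier schemes such as consistent hashing.

```python
import hashlib

N_SHARDS = 4
# Each dict stands in for a separate database server.
shards = [dict() for _ in range(N_SHARDS)]


def shard_for(key: str, n_shards: int = N_SHARDS) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def put(key: str, value) -> None:
    """Store the value on the one shard responsible for this key."""
    shards[shard_for(key)][key] = value


def get(key: str):
    """Look the key up on the shard it was routed to."""
    return shards[shard_for(key)].get(key)


put("player:42", {"name": "Ada"})
print(get("player:42"))
```

Using `hashlib` rather than the built-in `hash` keeps the routing stable between runs, since Python randomizes string hashes per process.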