backprop takes a toll
2026-03-02
There are a bunch of reasons why people are excited about evolutionary strategies, but an often overlooked one is that backprop is pretty darn heavy.
It's not just the time it takes, but also the memory! Especially when you use Adam. Suddenly you're not just storing the weights and their gradients, but also two extra values per weight: Adam's running average of the gradient (the momentum bit) and its running average of the squared gradient.
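To put rough numbers on that, here is a back-of-the-envelope sketch. It only counts parameter-sized buffers and ignores the activations that backprop also has to keep around. The function name `training_memory_bytes`, the 100M-parameter model, and the 4 bytes per value (float32) are all made up for illustration, not taken from any benchmark.

```python
def training_memory_bytes(n_params, bytes_per_value=4, optimizer="adam"):
    """Rough memory for parameter-sized training state (activations excluded)."""
    # How many parameter-sized buffers each setup keeps in memory.
    buffers = {
        "none": 1,      # weights only, e.g. a gradient-free method
        "sgd": 2,       # weights + gradients
        "momentum": 3,  # weights + gradients + momentum buffer
        "adam": 4,      # weights + gradients + first and second moment estimates
    }
    return n_params * bytes_per_value * buffers[optimizer]


n = 100_000_000  # hypothetical 100M-parameter model, float32 assumed
for name in ("none", "sgd", "momentum", "adam"):
    gigabytes = training_memory_bytes(n, optimizer=name) / 1e9
    print(f"{name:>8}: {gigabytes:.1f} GB")
```

Under these assumptions Adam needs about four times the memory of just holding the weights, and that is before counting activations.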
A quick and dirty benchmark can be found here.