Snakebench
Immediately after posting about the Factorio benchmark, I was reminded of another gaming-related benchmark, this time built around the game Snake.
Enter snakebench
The whole point of snakebench is to have two LLMs play against each other. Here is what that looks like:
You can watch live matches here, and you can also inspect the reasoning of both models. To give you an impression of how an LLM approaches this task, here's an example prompt that I copied from the site:
You are controlling a snake in a multi-apple Snake game. The board size is 10x10. Normal X,Y coordinates are used. Coordinates range from (0,0) at bottom left to (9,9) at top right.
Apples at: (9, 6), (0, 2), (5, 9), (1, 7), (9, 7)
Your snake ID: 1 which is currently positioned at (5, 1)
Enemy snakes positions:
* Snake #2 is at position (7, 1) with body at []
Board state:
9 . . . . . A . . . .
8 . . . . . . . . . .
7 . A . . . . . . . A
6 . . . . . . . . . A
5 . . . . . . . . . .
4 . . . . . . . . . .
3 . . . . . . . . . .
2 A . . . . . . . . .
1 . . . . . 1 . 2 . .
0 . . . . . . . . . .
  0 1 2 3 4 5 6 7 8 9
--Your last move information:--
Direction: LEFT
Rationale: I'm noticing that (0,2) is the closest apple from our head at (6,1) (it's 7 steps away by Manhattan distance), and it's away from the enemy snake at (7,0). Right now our immediate options are:
• UP → goes to (6,2)
• RIGHT → goes to (7,1) (closer to the enemy snake's territory)
• LEFT → goes to (5,1)
• DOWN → goes to (6,0)
Moving RIGHT or DOWN brings us closer to the enemy snake at (7,0) or even out of safe bounds. Moving UP would work, but LEFT starts guiding us toward (0,2). Thus, I decide to move LEFT.
Strategy for future turns: I'll aim to continue left and then maneuver upward to reach the apple at (0,2) while remaining mindful of potential enemy moves.
My decided next move: LEFT.
--End of your last move information.--
Rules:
1) If you move onto an apple, you grow and gain 1 point.
2) If you run into a wall (outside the range of the listed coordinates), another snake, or yourself (like go backwards), you die.
3) The goal is to have the most points by the end.
Decreasing your x coordinate is to the left, increasing your x coordinate is to the right.
Decreasing your y coordinate is down, increasing your y coordinate is up.
You may think out loud first then respond with the direction.
You may also state a strategy you want to tell yourself next turn.
End your response with your decided next move: UP, DOWN, LEFT, or RIGHT.
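The board in the prompt uses a bottom-left origin but is printed top row first, which is easy to get backwards. Here's a minimal Python sketch that reproduces the grid above and repeats the Manhattan-distance check from the model's rationale. It assumes each snake is a head-first list of cells; `render_board` and `nearest_apple` are my own hypothetical names, not SnakeBench's code.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def render_board(width: int, height: int, apples: List[Coord],
                 snakes: Dict[str, List[Coord]]) -> str:
    """Render the board like the prompt does: (0,0) is bottom-left,
    but the highest row is printed first.

    Assumption: every cell of a snake is marked with its ID; the real
    site may draw bodies differently.
    """
    grid = [["." for _ in range(width)] for _ in range(height)]
    for x, y in apples:
        grid[y][x] = "A"
    for snake_id, cells in snakes.items():
        for x, y in cells:
            grid[y][x] = snake_id
    lines = []
    for y in range(height - 1, -1, -1):  # top row (largest y) first
        lines.append(f"{y} " + " ".join(grid[y]))
    lines.append("  " + " ".join(str(x) for x in range(width)))  # x-axis labels
    return "\n".join(lines)


def manhattan(a: Coord, b: Coord) -> int:
    """Number of single-cell moves between a and b, ignoring obstacles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_apple(head: Coord, apples: List[Coord]) -> Coord:
    """Apple with the smallest Manhattan distance from the head."""
    return min(apples, key=lambda apple: manhattan(head, apple))


apples = [(9, 6), (0, 2), (5, 9), (1, 7), (9, 7)]
print(render_board(10, 10, apples, {"1": [(5, 1)], "2": [(7, 1)]}))
# The model reasoned from its previous head position (6,1).
print(nearest_apple((6, 1), apples), manhattan((6, 1), (0, 2)))
```

Running it prints the same grid as the prompt and confirms the model's claim that (0,2) is the closest apple, 7 steps from (6,1).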
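The rules and coordinate conventions are compact enough to sketch as a single move function. This is a hypothetical simplification, not the SnakeBench engine: `step` and `parse_move` are made-up names, the two snakes' simultaneous moves are ignored, and collisions are checked against the body before the tail moves, so reversing into your own neck is always fatal, as the prompt's "like go backwards" example suggests.

```python
import re
from typing import List, Optional, Set, Tuple

Coord = Tuple[int, int]

# Per the prompt: x increases to the right, y increases upward.
DELTAS = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def parse_move(response: str) -> Optional[str]:
    """Return the last uppercase direction in the response, or None.

    Assumption: the prompt asks the model to end with its move, so the
    last UP/DOWN/LEFT/RIGHT in the text is taken as the decision.
    """
    moves = re.findall(r"\b(UP|DOWN|LEFT|RIGHT)\b", response)
    return moves[-1] if moves else None


def step(body: List[Coord], direction: str, apples: Set[Coord],
         occupied: Set[Coord], width: int = 10, height: int = 10
         ) -> Optional[Tuple[List[Coord], bool]]:
    """Apply one move to a head-first body.

    Returns (new_body, ate_apple), or None if the snake dies.
    `occupied` holds the cells of all other snakes.
    """
    dx, dy = DELTAS[direction]
    head_x, head_y = body[0]
    new_head = (head_x + dx, head_y + dy)
    # Rule 2: leaving the listed coordinate range means hitting a wall.
    if not (0 <= new_head[0] < width and 0 <= new_head[1] < height):
        return None
    # Rule 2: hitting another snake or your own body (checked before the
    # tail moves, so going backwards is fatal) means death.
    if new_head in occupied or new_head in body[1:]:
        return None
    ate = new_head in apples
    # Rule 1: eating an apple grows the snake, so the tail stays put.
    new_body = [new_head] + (body if ate else body[:-1])
    return new_body, ate


# The previous turn from the prompt: head at (6,1), enemy at (7,0).
move = parse_move("Moving UP would work, but... My decided next move: LEFT.")
print(move, step([(6, 1)], move, {(0, 2)}, {(7, 0)}))
```

Replaying the previous turn moves the head from (6,1) to (5,1), which matches the position reported in the prompt.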
It's fascinating to see how creative people are getting with these benchmarks. A lot of the judgment still rests on vibes that are hard to quantify, but the benchmarks themselves are clearly becoming more and more complex.