In a previous document I've explained how to use the CAPM model to create naive ranking for pokemon. Even though the approach had some logic to it, I wanted to explore if a naive probibalistic implementation of trueskill could do the job as well. This document contains a python implementation my naive approach as well as an application to a pokemon dataset. The first part will explain how it is made and tested. The second part will analyse the ranking of all 750 pokemon.

## Starting Data

Before writing any application code, it seems sensible to first think about how to test if the code works. It would be a good idea to simulate some random players with predetermined skill levels such that we can test if our ranking algorithm can at least rank simulated data well enough. The code below generates this data.

```
import uuid
import numpy as np
import pandas as pd
import json
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn; seaborn.set()
SIZE_ARR = 100
def sim(p1, p2, entropy = 1):
"""
Given two players (tuples: <name, skill>) simulate
a game between them. Returns (winner-name, loser-name)
"""
r = np.random.randint(-entropy, entropy + 1)
if p1[1] + r > p2[1]:
return p1[0], p2[0]
else:
return p2[0], p1[0]
def gen_data(n_users = 20, n_games = 100, entropy = 1):
"""
Generate a lot of games and outputs a list of user tuples
(<name>, <skill>)as well as a list of games played.
"""
data = []
users = [(str(uuid.uuid4())[:6], i) for i in range(n_users)]
for _ in range(n_games):
i, j = np.random.choice(range(len(users)), 2)
u1, u2 = users[i], users[j]
data.append(sim(u1, u2, entropy))
return users, pd.DataFrame(data, columns = ['winner', 'loser'])
```

# Code that does the math

We'll now concern ourselves with writing code that updates a system of beliefs. The belief will be captured in a numpy array and can be interpreted as a histogram of likelihood values for a certain level of skill. The algorithm is explained in the previous blogpost as well but schematically, this is what the belief update looks like;

We start out with two arrays of prior skill beliefs for player 1 and player 2. We then combine them in a two dimensional density. This is our prior belief of the skill levels of the two players. Once a player wins, we can update this belief. One player has a lower skill level. To translate this to our belief system, we cut away all likelihood in the triangle of the losing player. Then we marginalise our belief back to the two players. By repeating this for all the games, we hope to achieve a belief of skill that can be updated in real time.

## Towards Code

Next we'll define all the functions that do the ranking. These functions are meant to be minimal, as we only want to check if this algorithm can work. It makes sense to think about what our app code might look like and to think about how to test individual components before writing the code though. I'm not particularily religious about testing but I do see some value for writing a test before writing code here. We're going to be munging with numpy matrices and typos in indices will be unforgiving.

```
p1 = np.array([1.,2.,3.])
p2 = np.array([3.,2.,1.])
p3 = np.array([1.,1.,1.])
mat1 = gen_prior(p1, p2)
mat2 = gen_prior(p2, p1)
mat3 = gen_prior(p3, p3)
m1, m2 = gen_marginals(mat1)
m3, m4 = gen_marginals(mat2)
np1, np2 = gen_marginals(cut_matrix(mat1))
assert all(m1 == m2[::-1]) #marginals should still be same shape
assert all(m3 == m4[::-1]) #marginals should still be same shape
assert np.sum(mat1) == np.sum(mat2) #matrices should have equal values
assert np.sum(mat3) == 9 #all ones should sum to 9
assert all(np1 == np2[::-1]) #marginals should still be same shape
```

With these tests in place, I could worry about the code.

```
def get_player_stats(name):
"""
Check if a player exists in the dictionary. If not, add it.
Then returns the player belief.
"""
if name not in d.keys():
d[name] = np.ones(SIZE_ARR)/np.sum(np.ones(SIZE_ARR))
return d[name]
def gen_prior(arr1, arr2):
"""
Combines two arrays into a matrix.
"""
return np.matrix(arr1).T * np.matrix(arr2)
def cut_matrix(mat):
"""
Cuts the matrix according to likelihood update rule.
Also applies normalisation
"""
posterior = np.triu(mat) + 0.000001
posterior = posterior/np.sum(posterior)
return posterior
def gen_marginals(posterior_mat):
"""
From a cut matrix, generate the appropriate marginals back
"""
winner = np.squeeze(np.asarray(np.sum(posterior_mat, 0)))
loser = np.squeeze(np.asarray(np.sum(posterior_mat, 1)))
return winner, loser
def update(winner, loser):
"""
Given a winner and user, update our state.
"""
winner_arr = get_player_stats(winner)
loser_arr = get_player_stats(loser)
prior_mat = gen_prior(loser_arr, winner_arr)
posterior_mat = cut_matrix(prior_mat)
new_winner, new_loser = gen_marginals(posterior_mat)
d[loser] = new_loser
d[winner] = new_winner
```

```
def plot_scores(plot_idx):
plt.subplot(plot_idx)
plt.scatter(
np.arange(len(users)),
[np.mean(d[usr[0]] * np.arange(SIZE_ARR)) for usr in users]
)
def plot_dist(plot_idx):
plt.subplot(plot_idx)
for usr in users:
plt.plot(np.arange(SIZE_ARR), d[usr[0]], alpha = 0.5)
```

```
d = {}
users, df = gen_data(n_users=100, n_games=400, entropy=1)
for i in json.loads(df.to_json(orient ='records')):
update(i['winner'], i['loser'])
plt.figure(1, figsize=(15,10))
plot_scores(221)
plot_dist(222)
d = {}
users, df = gen_data(n_users=100, n_games=10000, entropy=1)
for i in json.loads(df.to_json(orient ='records')):
update(i['winner'], i['loser'])
plot_scores(223)
plot_dist(224)
```

Increasing the entropy in the system will cause the ranking algorithm to measure the noise more than the signal, but more games allow for convergence to still occur.

```
d = {}
users, df = gen_data(n_users=100, n_games=400, entropy=15)
for i in json.loads(df.to_json(orient ='records')):
update(i['winner'], i['loser'])
plt.figure(1, figsize=(15,10))
plot_scores(221)
plot_dist(222)
d = {}
users, df = gen_data(n_users=100, n_games=10000, entropy=15)
for i in json.loads(df.to_json(orient ='records')):
update(i['winner'], i['loser'])
plot_scores(223)
plot_dist(224)
```

Suppose that there is more noise than signal in this sytem? In that case it will correctly indicate this to us.

```
d = {}
users, df = gen_data(n_users=100, n_games=400, entropy=50)
for i in json.loads(df.to_json(orient ='records')):
update(i['winner'], i['loser'])
plt.figure(1, figsize=(15,10))
plot_scores(221)
plot_dist(222)
d = {}
users, df = gen_data(n_users=100, n_games=10000, entropy=50)
for i in json.loads(df.to_json(orient ='records')):
update(i['winner'], i['loser'])
plot_scores(223)
plot_dist(224)
```

There's still some room for improvement to this system to make the convergence better, but it seems well enough for my purposes. I want to use this metric to rank pokemon so if I need to simulate some more outcomes between two pokemon then I won't mind. For more real-time purposes, the following tweaks seem to be important;

- New matches are allocated randomly, which is suboptimal from a learning perspective. If you want to fine tune the top 10 players it makes sense to have them play against eachother if you are unsure who is number 1. If instead you'd have one of these players compete against somebody mediocre you can predict the game outcome before it even started. This means you would not learn anything from that game.
- I am using buckets to simulate a distribution for a player. It may be better to use Beta distributions instead and save it's parameters. This would make the blogpost harder to read for more novice readers which is the main reason of not doing it, but this should improve the stability and remove the need for the hard coded smoothing parameter in the
`cut_matrix`

function.

For the pokemon dataset I only have one concern; the effect of rock paper scissors.

# Pokemon Data

I took the liberty of preparing a dataset. It's easy to do yourself with a bit of python and this project. I'll use the formula from my previous post here again to create a function that can simulate the outcome of a pokemon battle. It will be based on the number of turns that one pokemon can outlast the other.

$$ T_{ij} = \frac{HP_i}{DMG_{ji}} $$where $T_{ij}$ is the number of turns the pokemon $i$ would be able to survive the opponent $j$ if it were to be attacked indefinately, $HP_i$ is the amount of hitpoints that pokemon $i$ has and $DMG_ji$ is the amount of damage a basic attack pokemon $j$ would deal to pokemon $i$. This damage is defined via:

$$ DMG_{ji} = \frac{2L_j+10}{250} \times \frac{A_j}{D_i} \times w_{ji}$$The situation will very much assume some core aspects of the game away. The hope is that my making this assumption we'll still leave much of the actual pokemon game intact.

```
poke_df = pd.read_csv("http://koaning.io/theme/data/full_pokemon.csv")
weak_df = pd.read_csv("http://koaning.io/theme/data/pokemon_weakness.csv")
```

```
poke_df.head()
```

```
def weakness_weight(types_j, types_i):
"""
Returns w_{ji}
"""
res = 1.
possible_types = [t for t in weak_df['type']]
for i in filter(lambda t: t in possible_types, types_i):
for j in filter(lambda t: t in possible_types, types_j):
res *= float(weak_df[weak_df['type'] == j][i])
return res
def calc_tij(poke_i, poke_j):
"""
Calculate the number of turns one pokemon would outlast the other.
"""
poke_row_i, poke_row_j = poke_df[poke_df['name'] == poke_i].iloc[0], poke_df[poke_df['name'] == poke_j].iloc[0]
dmg_ji = (200. + 10)/250
dmg_ji *= float(poke_row_j['attack'])/float(poke_row_i['defence'])
dmg_ji *= weakness_weight(eval(poke_row_j['type']), eval(poke_row_i['type']))
return poke_row_i['hp']/dmg_ji
def sim_poke_battle(poke_i, poke_j):
"""
Simulate the outcome between two pokemon. Output: winner_name, loser_name.
"""
if calc_tij(poke_i, poke_j) > calc_tij(poke_j, poke_i):
return poke_i, poke_j
return poke_j, poke_i
```

```
d = {}
poke_names = [i for i in poke_df['name']]
for _ in range(len(poke_names) - 1):
p1, p2 = poke_names[_], poke_names[_ + 1]
win, lose = sim_poke_battle(p1, p2)
update(win, lose)
# this takes a while!
for _ in range(50000):
i, j = np.random.choice(range(len(poke_names)), 2)
p1, p2 = poke_names[i], poke_names[j]
win, lose = sim_poke_battle(p1, p2)
update(win, lose)
```

This takes a while, but when it is done we can create the same distribution chart as before but now for pokemon.

```
plt.figure(figsize=(15,5))
for poke in poke_names:
plt.plot(np.arange(SIZE_ARR), d[poke], alpha = 0.5)
```

Let's summarise the distributions and put them in a dataframe.

```
scores = []
for poke in poke_names:
scores.append([poke, np.mean(d[poke] * np.arange(SIZE_ARR))])
df = pd.DataFrame(scores, columns = ['name', 'mle'])
```

Now that we have some results, lets look at the top performing and the bottom performning pokemon.

```
print df.sort('mle').head(25)
print df.sort('mle').tail(25)
```

I have run this algorithm a few times and it seems doesn't seem to converge very consistently. It does some things very well; the fact that magicarp is generally performing poorly is good news. I've also noticed Mew/Mewtwo in the higher skilled area. I'm still concerned that this algorithm fails to grasp the rock/paper scissors element that is in the game. But the fact that I am assuming that every pokemon is fighting with only basic attacks also does not help.

As a final push one might consider making an ensemble of these pokemon rankings to see if the convergence can be make to go faster. Another tweak to consider is to have the matching of these pokemon not be random during the learning. I like the results of the exercise, but it seems that we are not done with ranking pokemon.

```
</div>
```