Fun with Bayes: peegeem
There's a dataset with numbers related to smoking, age and health outcomes. Given such a dataset, you might be interested in running some queries. You could use SQL for this, but if you've taken a course in probability theory then you might have been taught to write this instead:
$$ P(\text{outcome} \mid \text{smoker}=\texttt{Yes}, \text{age}>40) $$
It's a very compact notation if you think about it, so why not allow this in Python as well?
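For contrast, here is roughly what that question looks like when you spell it out with plain pandas. This is a minimal sketch: the toy rows and the column names `smoker`, `age` and `outcome` are made up for illustration and are not the actual dataset. It also only counts rows, so it gives an empirical frequency rather than anything a graphical model would infer.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "smoker": ["Yes", "Yes", "No", "Yes", "No", "Yes"],
    "age": [45, 52, 61, 33, 47, 58],
    "outcome": ["Dead", "Alive", "Alive", "Alive", "Dead", "Dead"],
})

# Condition on the evidence: keep rows where smoker == "Yes" and age > 40.
evidence = df[(df["smoker"] == "Yes") & (df["age"] > 40)]

# Relative frequencies of outcome among those rows estimate
# P(outcome | smoker = Yes, age > 40).
estimate = evidence["outcome"].value_counts(normalize=True)
print(estimate)
```

It works, but the intent is buried in indexing and method calls, which is exactly what the probability notation avoids.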
Enter peegeem
I wrote a small library in Python that allows for exactly this. Not only does it give you the fancy notation, you're also able to declare the probabilistic graphical model that outlines the causal relationships between your variables.
from peegeem import DAG
# Define the DAG for the PGM, nodes is a list of column names, edges is a list of tuples
dag = DAG(nodes, edges, dataframe)
# Get variables out
outcome, smoker, age = dag.get_variables()
# Use variables to construct a probabilistic query
P(outcome | (smoker == "Yes") & (age > 40))
# Latex utility, why not?
P.to_latex(outcome | (smoker == "Yes") & (age > 40))
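If you're wondering how Python lets you write something like `outcome | (smoker == "Yes") & (age > 40)` at all, the trick is operator overloading. The sketch below is not peegeem's actual implementation; the `Variable`, `Condition` and `Query` classes are hypothetical names I'm using to show the general idea. Note that the parentheses around the comparisons are required, because `==` and `>` bind less tightly than `&` and `|` in Python.

```python
class Condition:
    """Hypothetical: a piece of evidence, such as smoker == 'Yes'."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # cond_a & cond_b joins two pieces of evidence.
        return Condition(f"{self.text}, {other.text}")


class Query:
    """Hypothetical: a target variable together with the evidence it is conditioned on."""

    def __init__(self, target, given):
        self.target = target
        self.given = given

    def __repr__(self):
        return f"P({self.target.name} | {self.given.text})"


class Variable:
    """Hypothetical: a named column that builds expressions instead of evaluating them."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        # variable == value returns a Condition, not a bool.
        return Condition(f"{self.name} == {value!r}")

    def __gt__(self, value):
        return Condition(f"{self.name} > {value}")

    def __or__(self, condition):
        # variable | condition reads as "variable given condition".
        return Query(self, condition)


outcome, smoker, age = Variable("outcome"), Variable("smoker"), Variable("age")

# & binds tighter than |, so this parses as outcome | (cond_a & cond_b).
query = outcome | (smoker == "Yes") & (age > 40)
print(query)  # P(outcome | smoker == 'Yes', age > 40)
```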
It's pretty darn neat! It's really like writing down maths. But want to know the real kicker? You can use this in a notebook together with some widgets!

It's so much fun! Not just because you get a nice domain-specific language to work with: by mixing and matching widgets you actually get a domain-specific interface too!
This is super exciting and I hope to work on more projects like this one. If you're keen to learn more, be sure to check out my latest livestream on the topic.