A show method for polars

Sometimes you're dealing with a long polars pipeline that has a bug. It's somewhere in the chain, but you don't know where yet.

Maybe it looks like this:

(
    pl.scan_parquet("wow-full.parquet")
    .filter(pl.col("race").is_in(races))
    .with_columns(date=pl.col("datetime").dt.truncate("4w"))
    .group_by("race", "date")
    .agg(hours=pl.len() / 6, unique_players=pl.n_unique("player_id"))
    .group_by("race")
    .agg(
        hours=pl.sum("hours").round().cast(pl.Int32), 
        over_time=pl.col("hours")
    )
)

To help debug in moments like this, you can monkeypatch a show() method onto polars DataFrames and LazyFrames.

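import polars as pl

def show(self, n=5, name=None):
    # Optional label, handy when you call show() more than once
    if name:
        print(name)
    if isinstance(self, pl.DataFrame):
        print(self.head(n))
    else:
        # LazyFrame: collect only the first n rows for printing
        print(self.head(n).collect())
    # Return the frame itself so the chain keeps going
    return self

pl.DataFrame.show = show
pl.LazyFrame.show = show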

Because show() returns the frame itself, you can keep on chaining while peeking at different moments in the pipeline to see if the columns/types are what you expect.

You can now change your pipeline to get useful print statements.

(
    pl.scan_parquet("wow-full.parquet")
    .filter(pl.col("race").is_in(races))
    .show()
    .with_columns(date=pl.col("datetime").dt.truncate("4w"))
    .group_by("race", "date")
    .agg(hours=pl.len() / 6, unique_players=pl.n_unique("player_id"))
    .show()
    .group_by("race")
    .agg(
        hours=pl.sum("hours").round().cast(pl.Int32), 
        over_time=pl.col("hours")
    )
    .show()
)

I also made a one-minute recording of this setup, if folks prefer a live demo.

koaning.io
