diskcache with zlib

2025-12-16

I noticed a segment in the diskcache docs that made me want to run a quick benchmark.

If you don't know it, diskcache is an amazing little library that gives you a SQLite-powered caching mechanism in Python. By default it uses pickle to store everything, but you can write a custom Disk class that lets you control how data is serialized.

This is the example from the docs:

import json
import zlib

import diskcache
from diskcache import Cache
from diskcache.core import UNKNOWN


class JSONDisk(diskcache.Disk):
    def __init__(self, directory, compress_level=1, **kwargs):
        self.compress_level = compress_level
        super().__init__(directory, **kwargs)

    # put/get handle (de)serialization of keys.
    def put(self, key):
        json_bytes = json.dumps(key).encode('utf-8')
        data = zlib.compress(json_bytes, self.compress_level)
        return super().put(data)

    def get(self, key, raw):
        data = super().get(key, raw)
        return json.loads(zlib.decompress(data).decode('utf-8'))

    # store/fetch handle values; file-based reads (read=True) bypass compression.
    def store(self, value, read, key=UNKNOWN):
        if not read:
            json_bytes = json.dumps(value).encode('utf-8')
            value = zlib.compress(json_bytes, self.compress_level)
        return super().store(value, read, key=key)

    def fetch(self, mode, filename, value, read):
        data = super().fetch(mode, filename, value, read)
        if not read:
            data = json.loads(zlib.decompress(data).decode('utf-8'))
        return data

with Cache(disk=JSONDisk, disk_compress_level=6) as cache:
    pass

I had Claude write me a quick benchmark to check how much disk space you might save, and here's the summary chart:

[Chart: ZLIB effect]

A few things to note:

  • the compression mainly pays off if you're storing loads of text, so if you're caching LLM input/output you can get a lot of mileage out of this
  • compression is applied per JSON object, so don't expect extra savings when the same values are duplicated across many keys
  • this trick works with JSON, but I can also imagine that you might be able to pull off something clever with embeddings too
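
Here's a minimal sketch of the kind of comparison such a benchmark can make. It's not the exact notebook code: it assumes the JSONDisk class from above is in scope, and the cache directories and payload are made up for illustration.

import os

import diskcache

def dir_size(path):
    # Total size in bytes of every file under the cache directory.
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, files in os.walk(path)
        for name in files
    )

# Hypothetical text-heavy payload, standing in for an LLM transcript.
payload = {"text": "the quick brown fox jumps over the lazy dog " * 200}

# Default disk: pickle, no compression.
with diskcache.Cache("pickle-cache") as plain:
    for i in range(100):
        plain[f"key-{i}"] = payload

# Custom disk: JSON + zlib.
with diskcache.Cache("json-cache", disk=JSONDisk, disk_compress_level=6) as compressed:
    for i in range(100):
        compressed[f"key-{i}"] = payload

print("pickle:", dir_size("pickle-cache"), "bytes")
print("zlib:  ", dir_size("json-cache"), "bytes")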

If you want to play with the notebook, you can find it here.

TaskiPy can turn your pyproject.toml into a Makefile too

2025-11-25

This blogpost originally appeared as a YouTube video here.

I recently discovered TaskiPy, a tool that lets you define task automation directly in your pyproject.toml file—essentially turning it into a Makefile alternative.

I came across this while working on a PR for the Altair library, where their developer documentation instructs contributors to run commands using TaskiPy.

Basic Setup

TaskiPy works by adding a [tool.taskipy.tasks] section to your pyproject.toml:

[tool.taskipy.tasks]
lint = "ruff check ."
format = "black ."
print = "echo hello"

You can then list available tasks with:

task -l

And run them with:

task print
# Output: hello

Chaining Commands

You can chain multiple tasks together:

[tool.taskipy.tasks]
lint = "ruff check ."
format = "black ."
rough_check = "task lint && task format"
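
Running the combined task then executes both in sequence:

task rough_check
# runs "task lint" followed by "task format"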

Variables

TaskiPy supports variables:

[tool.taskipy.variables]
name = "Vincent"

[tool.taskipy.tasks]
print = { cmd = "echo Hello {name}", use_vars = true }

A task only interpolates variables when it sets use_vars = true. Then:

task print
# Output: Hello Vincent

Pre and Post Hooks

You can define setup and teardown steps using pre_ and post_ prefixes:

[tool.taskipy.tasks]
print = "echo Hello Vincent"
pre_print = "echo something beforehand"
post_print = "echo something afterwards"

When you run task print, all three commands execute in order.
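
So with the configuration above:

task print
# Output:
# something beforehand
# Hello Vincent
# something afterwards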

Note: If you pass extra arguments to a task (e.g., task print "echo some more"), the pre and post hooks are skipped, and your arguments are appended to the main command.
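
Concretely, given the tasks above, you'd expect something like:

task print "again"
# runs: echo Hello Vincent again
# Output: Hello Vincent again   (no pre/post output)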

Should You Use TaskiPy?

The main advantage is consolidation—you don't need separate Makefiles or Just files. This is particularly useful because:

  • make isn't always available on Windows
  • just isn't commonly pre-installed on Linux systems
  • If someone is working on your Python project, they already have access to pyproject.toml

There are two notable downsides:

  1. Multi-line commands can be awkward since TOML configuration leans toward one-liners
  2. Bootstrap problem: While you eliminate the need for make or just to be pre-installed, you now require TaskiPy (or uv) to be installed first. TaskiPy cannot install itself, so it's not entirely a free lunch.
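
If your project already uses uv and declares TaskiPy as a dev dependency, the bootstrap shrinks to a single command (a sketch, assuming that setup):

uv run task -l
# uv syncs the project environment (installing taskipy) before running the task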

Banning SQLAlchemy Dialects with Ruff

2025-11-25

This blogpost originally appeared as a YouTube video here.

When working with SQLAlchemy, you might want to ensure your codebase stays portable across different database backends. SQLAlchemy supports multiple databases (SQLite, PostgreSQL, MySQL, and more), but each dialect has its own types and quirks. Sometimes, if you're building a library or a larger system, you don't want developers accidentally importing PostgreSQL-specific code that breaks SQLite compatibility.

So how do you guarantee this?

Turns out, you can prevent this using Ruff.

Configuration

Here's how to set it up in your pyproject.toml. The banned-api check is rule TID251, which isn't in Ruff's default rule set, so you need to select it explicitly:

[tool.ruff.lint]
extend-select = ["TID251"]

[tool.ruff.lint.flake8-tidy-imports.banned-api]
"sqlalchemy.dialects".msg = "Only use portable SQLAlchemy types, not dialect-specific ones"

You can ban entire submodules and attach custom error messages that explain why the import is forbidden—helpful for new team members encountering the error.

Example

Say you have this code:

from sqlalchemy.dialects.sqlite import TEXT

# ... rest of your code

Running ruff check will catch it now!
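
The fix is then to import the portable type from the top-level namespace instead:

# portable across SQLite, PostgreSQL, MySQL, ...
from sqlalchemy import Text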