diskcache with zlib

2025-12-16

I noticed a segment in the diskcache docs that made me want to run a quick benchmark.

If you don't know it, diskcache is an amazing little library that gives you a SQLite-powered caching mechanism in Python. By default it relies on pickle to serialize most Python objects, but you can write a custom Disk class that controls how the data is serialized.

This is the example from the docs:

import json
import zlib

import diskcache
from diskcache import UNKNOWN, Cache

class JSONDisk(diskcache.Disk):
    def __init__(self, directory, compress_level=1, **kwargs):
        self.compress_level = compress_level
        super().__init__(directory, **kwargs)

    def put(self, key):
        json_bytes = json.dumps(key).encode('utf-8')
        data = zlib.compress(json_bytes, self.compress_level)
        return super().put(data)

    def get(self, key, raw):
        data = super().get(key, raw)
        return json.loads(zlib.decompress(data).decode('utf-8'))

    def store(self, value, read, key=UNKNOWN):
        if not read:
            json_bytes = json.dumps(value).encode('utf-8')
            value = zlib.compress(json_bytes, self.compress_level)
        return super().store(value, read, key=key)

    def fetch(self, mode, filename, value, read):
        data = super().fetch(mode, filename, value, read)
        if not read:
            data = json.loads(zlib.decompress(data).decode('utf-8'))
        return data

with Cache(disk=JSONDisk, disk_compress_level=6) as cache:
    pass

I had Claude write me a quick benchmark to check how much disk space you might save, and here's the summary chart:

[Figure: ZLIB effect]
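
For a rough feel of the kind of measurement involved, here is a minimal sketch that uses only the standard library. It is not the benchmark behind the chart: the payload is a made-up, very repetitive stand-in for an LLM response, and `compressed_sizes` is a hypothetical helper, so the ratios it prints will be more flattering than what real data gives you.

```python
import json
import zlib

# Hypothetical text-heavy cache value, loosely shaped like an LLM response.
value = {
    "prompt": "Summarize the following article in three sentences.",
    "response": " ".join(
        f"Sentence {i}: the cache stores the response so the model is not called twice."
        for i in range(200)
    ),
}

# Same encoding step as the JSONDisk example: JSON text, then UTF-8 bytes.
json_bytes = json.dumps(value).encode("utf-8")


def compressed_sizes(payload: bytes, levels=(1, 6, 9)) -> dict:
    """Return the zlib-compressed size in bytes for each compression level."""
    return {level: len(zlib.compress(payload, level)) for level in levels}


sizes = compressed_sizes(json_bytes)
print(f"raw JSON: {len(json_bytes)} bytes")
for level, size in sizes.items():
    print(f"zlib level {level}: {size} bytes ({size / len(json_bytes):.1%} of raw)")
```

Swapping in a sample of your own cached values is the quickest way to see whether a higher compress_level is worth the extra CPU time.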

A few things to note:

  • compression mainly pays off when the values contain a lot of text, so if you're caching LLM inputs and outputs you can get a lot of mileage out of this
  • compression is applied to each JSON object separately, so don't expect any savings from values that are duplicated across keys
  • this trick works with JSON, but I can imagine that you might be able to pull off something clever with embeddings too
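
The point about duplicates across keys can be checked with a small sketch. It assumes the same JSON-then-zlib encoding as the JSONDisk example; `stored_size` and the sample value are hypothetical, and nothing here touches diskcache itself. It compares compressing 100 identical short values one by one, which is what happens when each one sits under its own key, against compressing them together as a single blob.

```python
import json
import zlib


def stored_size(value, level=6) -> int:
    """Size in bytes of one value after JSON encoding and zlib compression."""
    return len(zlib.compress(json.dumps(value).encode("utf-8"), level))


# Hypothetical short value that ends up cached under many different keys.
short_value = {"status": "ok", "answer": "Paris is the capital of France."}
raw_size = len(json.dumps(short_value).encode("utf-8"))

# Each cache entry is compressed on its own, so repetition across keys
# is never visible to the compressor.
per_key_total = 100 * stored_size(short_value)

# Compressing all copies in one go lets zlib exploit the repetition.
together = stored_size([short_value] * 100)

print(f"one value, raw JSON:               {raw_size} bytes")
print(f"one value, compressed:             {stored_size(short_value)} bytes")
print(f"100 entries compressed separately: {per_key_total} bytes")
print(f"same data compressed as one blob:  {together} bytes")
```

If the second total is much smaller than the first, the redundancy lives across entries rather than inside them, and per-value compression won't recover it.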

If you want to play with the notebook, you can find it here.