Python can run Mojo now

Chris Lattner mentioned that Python can actually call Mojo code now. I love this idea (!) as I'm definitely in the market for a simple compiled language that can offer Python some really fast functions. So I gave it a quick spin.

Setup

The setup is much simpler than I remember it: you can use uv for it now.

uv pip install modular --index-url https://dl.modular.com/public/nightly/python/simple/

After that you can declare a .mojo file that looks like this:

# mojo_module.mojo
from python import PythonObject
from python.bindings import PythonModuleBuilder
import math
from os import abort

@export
fn PyInit_mojo_module() -> PythonObject:
    try:
        var m = PythonModuleBuilder("mojo_module")
        m.def_function[factorial]("factorial", docstring="Compute n!")
        return m.finalize()
    except e:
        return abort[PythonObject](String("error creating Python Mojo module:", e))

fn factorial(py_obj: PythonObject) raises -> PythonObject:
    # Raises an exception if `py_obj` is not convertible to a Mojo `Int`.
    var n = Int(py_obj)

    var result = 1
    for i in range(1, n + 1):
        result *= i

    return result

And you can then load it from Python via:

# main.py
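import max.mojo.importer  # enables importing .mojo files as Python modules
import sys
import time
import math

# Put the current directory on the path so mojo_module.mojo can be found.
sys.path.insert(0, "")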

import mojo_module

start = time.time()
print(mojo_module.factorial(10))
end = time.time()
print(f"Time taken: {end - start} seconds for mojo")


start = time.time()
print(math.factorial(10))
end = time.time()
print(f"Time taken: {end - start} seconds for python")

This was the output:

3628800
Time taken: 3.0279159545898438e-05 seconds for mojo
3628800
Time taken: 5.0067901611328125e-06 seconds for python
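
The numbers above come from a single time.time() measurement wrapped around a print call, so at the microsecond scale they are mostly noise. Here is a minimal sketch of a steadier measurement, assuming the standard-library timeit module. The helper name best_time_per_call is made up for illustration, and math.factorial is only a stand-in: you could pass mojo_module.factorial instead once the module imports.

```python
import math
import timeit


def best_time_per_call(func, arg, number=10_000, repeat=5):
    """Return the best average time per call, in seconds, over several runs."""
    timer = timeit.Timer(lambda: func(arg))
    # Each run calls func `number` times; the fastest run is the least noisy.
    runs = timer.repeat(repeat=repeat, number=number)
    return min(runs) / number


if __name__ == "__main__":
    per_call = best_time_per_call(math.factorial, 10)
    print(f"math.factorial(10): {per_call:.2e} seconds per call")
```

Taking the minimum over several repeats is the usual way to reduce the influence of other processes on a microbenchmark. It also keeps the print call out of the timed region.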

This all works, but at the time of writing I was able to spot some rough edges. If I increase the factorial input to 100, the output changes.

0
Time taken: 2.7894973754882812e-05 seconds for mojo
188267717688892609974376770249160085759540364871492425887598231508353156331613598866882932889495923133646405445930057740630161919341380597818883457558547055524326375565007131770880000000000000000000000000000000
Time taken: 9.298324584960938e-06 seconds for python

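This is most likely integer overflow on the Mojo side: as far as I can tell Mojo's Int is a fixed-width machine integer, while Python integers grow as needed, and 100! is far too large for 64 bits. The docs mention that this whole stack is pretty early, and I guess this is a sign of that.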
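
To see why an overflow would print exactly 0 rather than some random-looking number, here is a minimal Python sketch that simulates the loop with 64-bit wraparound. It assumes Mojo's Int behaves like a signed 64-bit integer that wraps silently. That is my assumption about the cause, not something verified against Mojo internals, and the helper names are made up for illustration.

```python
import math


def wrap_int64(value):
    """Reduce an arbitrary Python int to signed 64-bit two's complement."""
    value &= (1 << 64) - 1  # keep only the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value


def factorial_int64(n):
    """Same loop as the Mojo factorial, but wrapping after every multiply."""
    result = 1
    for i in range(1, n + 1):
        result = wrap_int64(result * i)
    return result


if __name__ == "__main__":
    print(factorial_int64(10))   # still small enough to be exact
    print(factorial_int64(100))  # wraps around
    print(math.factorial(100) % 2**64)
```

100! contains 97 factors of two (50 + 25 + 12 + 6 + 3 + 1), so it is a multiple of 2^64 and the low 64 bits are all zero. Under this assumption the wrapped result is exactly 0, which matches the output above.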

Another example

Since overflow is probably the issue there, I figured I'd run one more example that stays within small integers, just to see if we could measure a speed increase. I went with a naive prime counting example. This is the Mojo code:

from python import PythonObject
from python.bindings import PythonModuleBuilder
import math
from os import abort

@export
fn PyInit_mojo_module() -> PythonObject:
    try:
        var m = PythonModuleBuilder("mojo_module")
        m.def_function[count_primes]("count_primes", docstring="Count primes up to n")
        return m.finalize()
    except e:
        return abort[PythonObject](String("error creating Python Mojo module:", e))

fn count_primes(py_obj: PythonObject) raises -> PythonObject:
    var n = Int(py_obj)
    var count: Int = 0
    for i in range(2, n + 1):
        var is_prime: Bool = True
        for j in range(2, i):
            if i % j == 0:
                is_prime = False
                break
        if is_prime:
            count += 1
    return count

This is the Python code. Note that I also added a numpy implementation for comparison.

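import numpy as np
import max.mojo.importer  # enables importing .mojo files as Python modules
import sys
import time

# Put the current directory on the path so mojo_module.mojo can be found.
sys.path.insert(0, "")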

import mojo_module

def count_primes(n):
    count = 0
    for i in range(2, n + 1):
        is_prime = True
        for j in range(2, i):
            if i % j == 0:
                is_prime = False
                break
        if is_prime:
            count += 1
    return count


def count_primes_numpy(n):
    if n < 2:
        return 0
    
    candidates = np.arange(2, n + 1)
    is_prime_mask = np.ones(len(candidates), dtype=bool)
    
    # For each position in our candidates array
    for idx, candidate in enumerate(candidates):
        if candidate == 2:  # 2 is prime
            continue
            
        # Create divisors array [2, 3, ..., candidate-1]
        divisors = np.arange(2, candidate)
        
        # Vectorized divisibility check
        has_divisor = np.any(candidate % divisors == 0)
        
        if has_divisor:
            is_prime_mask[idx] = False
    
    return np.sum(is_prime_mask)

n = 20_000

start = time.time()
print(count_primes(n))
end = time.time()
print(f"Time taken: {end - start} seconds for python")

start = time.time()
print(count_primes_numpy(n))
end = time.time()
print(f"Time taken: {end - start} seconds for numpy")

start = time.time()
print(mojo_module.count_primes(n))
end = time.time()
print(f"Time taken: {end - start} seconds for mojo")

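The results look promising! On this run Mojo is roughly 40x faster than plain Python and about 23x faster than the numpy version.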

2262
Time taken: 0.44585609436035156 seconds for python
2262
Time taken: 0.25995898246765137 seconds for numpy
2262
Time taken: 0.011101961135864258 seconds for mojo

There are better algorithms for prime counting than what I am doing here, so take this with a truck of salt, but this is very exciting. Mojo is a lot easier to learn than Rust, and I still seem to get the function speedup that I want (need).
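
For reference, one of those better algorithms is the Sieve of Eratosthenes. This is a minimal pure-Python sketch, not part of the benchmark above and not timed here. The function name count_primes_sieve is made up for illustration.

```python
def count_primes_sieve(n):
    """Count the primes <= n with a Sieve of Eratosthenes."""
    if n < 2:
        return 0
    # is_prime[k] is 1 while k is still a prime candidate.
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # Cross off every multiple of i, starting at i * i.
            multiples = range(i * i, n + 1, i)
            is_prime[i * i :: i] = bytes(len(multiples))
        i += 1
    return sum(is_prime)


if __name__ == "__main__":
    print(count_primes_sieve(20_000))
```

The sieve does far less work than trial division, so a fair comparison between languages should use the same algorithm on both sides, as the benchmark above does with the naive version.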

At the time of writing, the main downside is that the Modular stack is still early, but there also seems to be light support for building these extensions.
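
Given how early the stack is, one cautious way to try it is to keep a pure-Python fallback around. This is a minimal sketch, assuming the mojo_module.mojo file from the first example sits in the current directory. It only catches ImportError, and I have not checked which exceptions the Mojo importer raises when compilation fails.

```python
import math
import sys

try:
    # Assumption: the nightly `modular` package is installed and
    # mojo_module.mojo is in the current directory.
    sys.path.insert(0, "")
    import max.mojo.importer  # noqa: F401
    import mojo_module

    factorial = mojo_module.factorial
    backend = "mojo"
except ImportError:
    # Fall back to the standard library when Mojo is not available.
    factorial = math.factorial
    backend = "python"

if __name__ == "__main__":
    print(f"{backend}: {factorial(10)}")
```

Keep in mind that the two backends only agree for inputs that fit in Mojo's Int, as the factorial of 100 showed.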

In short: it's not ready for prime-time just yet, but I'm more optimistic now that the dream is getting within reach!

koaning.io
