Anatomy of a Python extension

Don't jump ahead!
Complete the exercise for the previous section before you start this one.
It's located in exercises/01_intro/00_welcome, in the course GitHub's repository.
Use wr to start the course and verify your solutions.

To invoke Rust code from Python we need to create a Python extension module.

Rust, just like C and C++, compiles to native code. For this reason, extension modules written in Rust are often called native extensions. Throughout this course we'll use the terms Python extension, Python extension module and native extension interchangeably.

maturin

We'll use maturin to build, package and publish Python extensions written in Rust. Let's install it:

rye install "maturin>=1.6"

Exercise structure

All exercises in this course will follow the same structure:

  • an extension module written in Rust, in the root of the exercise directory
  • a Python package that invokes the functionality provided by the extension, in the sample subdirectory

The extension module will usually be tested from Python, in the sample/tests subdirectory. You will have to modify the Rust code in the extension module to make the tests pass.

Extension structure

Let's explore the structure of the extension module for this section.

01_setup
├── sample
├── src
│   └── lib.rs
├── Cargo.toml
└── pyproject.toml

Cargo.toml

The manifest file, Cargo.toml, looks like this:

[package]
name = "setup"
version = "0.1.0"
edition = "2021"

[lib]
name = "setup"
crate-type = ["cdylib"]

[dependencies]
pyo3 = "0.21.1"

Two things stand out in this file compared to a regular Rust project:

  • The crate-type attribute is set to cdylib.
  • The pyo3 crate is included as a dependency.

Let's cover these two points in more detail.

Linking

Static linking

By default, Rust libraries are compiled as static libraries.
All dependencies are linked into the final executable at compile-time, making the executable self-contained1.

That's great for distributing applications, but it's not ideal for Python extensions.
To perform static linking, the extension module would have to be compiled alongside the Python interpreter. Furthermore, you'd have to distribute the modified interpreter to all your users.
At the ecosystem level, this process would scale poorly, as each user needs to leverage several unrelated extensions at once. Every single project would have to compile its own bespoke Python interpreter.

Dynamic linking

To avoid this scenario, Python extensions are packaged as dynamic libraries.
The Python interpreter can load these libraries at runtime, without having to be recompiled. Instead of distributing a modified Python interpreter to all users, you must now distribute the extension module as a standalone file.

Rust supports dynamic linking, and it provides two different flavors of dynamic libraries: dylib and cdylib. dylib are Rust-flavored dynamic libraries, geared towards Rust-to-Rust dynamic linking. cdylib, on the other hand, are dynamic libraries that export a C-compatible interface (C dynamic library).

You need a common dialect to get two different languages to communicate with each other. They both need to speak it and understand it.
That bridge, today, is C's ABI (Application Binary Interface).

That's why, for Python extensions, you must use the cdylib crate type:

[lib]
crate-type = ["cdylib"]

pyo3

It's not enough to expose a C-compatible interface. You must also comply with the Python C API, the interface Python uses to interact with C extensions.

Doing this manually is error-prone and tedious. That's where the pyo3 crate comes in: it provides a safe and idiomatic way to write Python extensions in Rust, abstracting away the low-level details.

In lib.rs, you can see it in action:

#![allow(unused)]
fn main() {
use pyo3::prelude::*;

#[pyfunction]
fn it_works() -> bool {
    todo!()
}

/// A Python module implemented in Rust.
#[pymodule]
fn setup(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(it_works, m)?)?;
    Ok(())
}
}

We're using pyo3 to define a Python function, named it_works, that returns a boolean. The function is then exposed to Python at the top-level of our extension module, named setup.

That same function is then invoked from Python, inside sample/tests/test_sample.py:

from setup import it_works

def test_works(): 
    assert it_works()

We'll cover the details of #[pyfunction] and #[pymodule] in the next section, no worries.

pyproject.toml

Before we move on, let's take a look at pyproject.toml, the Python "manifest" of the extension module:

[build-system]
requires = ["maturin>=1.6,<2.0"]
build-backend = "maturin"

[project]
name = "setup"
# [...]
requires-python = ">=3.8"
dynamic = ["version"]
[tool.maturin]
features = ["pyo3/extension-module"]

It specifies the build system, the extension name and version, the required Python version, and the features to enable when building the extension module. This is what rye looks at when building the extension module, before delegating the build process to maturin, which in turn invokes cargo to compile the Rust code.

What do I need to do?

A lot has to go right behind the scenes to make a Python extension work.
That's why the exercise for this section is fairly boring—we want to verify that you can build and test a Python extension module without issues.

Things will get a lot more interesting over the coming sections, I promise!

Troubleshooting

You may run into this error when using rye and pyo3 together:

<compiler output>
= note: ld: warning: search path '/install/lib' not found
          ld: library 'python3.12' not found
          clang: error: linker command failed with exit code 1

This seems to be a bug in rye.
To work around the issue, run the following command in the root of the course repository:

cargo run -p "patcher"

wr should now be able to build the extension module without issues and run the tests. No linker errors should be surfaced.

The patcher tool is a temporary workaround for a bug in rye.
It hasn't been tested on Windows: please open an issue if you encounter any problems.

References

Footnotes

1

This is true up to an extent. In most cases, some dependencies are still dynamically linked, e.g. libc on most Unix systems. Nonetheless, the final executable is self-contained in the sense that it doesn't rely on the presence of the Rust standard library or any other Rust crate on the user's system.

Exercise

The exercise for this section is located in 01_intro/01_setup