Minimize GIL locking

All our examples so far fall into two categories:

The Rust function holds the GIL for the entire duration of its execution.
The Rust function doesn't hold the GIL at all, going straight into Python::allow_threads mode.

Real-world applications are often more nuanced, though.
You'll need to hold the GIL for some operations (e.g. passing data back to Python), but you're able to release it for others (e.g. long-running computations).

The goal is to minimize the time spent holding the GIL to the bare minimum, thus maximizing the potential parallelism of your application.

Strategy 1: isolate the GIL-free section

Let's look at an example: we're given a list of numbers and we need to modify it in place, replacing each number with the result of an expensive computation that uses no Python objects.

To minimize GIL locking, we create Rust vector from the Python list, release the GIL, and perform the computation and then re-acquire the GIL to update the Python list in place:

#![allow(unused)]
fn main() {
#[pyfunction]
fn update_in_place<'py>(
    python: Python<'py>,
    numbers: Bound<'py, PyList>
) -> PyResult<()> {
    // Holding the GIL
    let v: Vec<i32> = numbers.extract()?;
    let updated_v: Vec<_> = python.allow_threads(|| {
        v.iter().map(|&n| expensive_computation(n)).collect()
    });
    // Back to holding the GIL
    for (i, &n) in updated_v.iter().enumerate() {
        numbers.set_item(i, n)?;
    }
    Ok(()
}

fn expensive_computation(n: i32) -> i32 {
    // Some heavy number crunching
    // [...]
}
}

Strategy 2: manually re-acquire the GIL inside the closure

In the example above, we've created a whole new vector to decouple the GIL-free section from the GIL-holding one. If the input data is large, this can be a significant overhead.

Let's explore a different approach: we won't create a new pure-Rust vector. Instead, we will re-acquire the GIL inside the closure—we'll hold it to access each list element and, after the computation is done, update it in place. Nothing more.

Assuming you know nothing about Ungil, the naive solution might look like this:

#![allow(unused)]
fn main() {
#[pyfunction]
fn update_in_place<'py>(
    python: Python<'py>,
    numbers: Bound<'py, PyList>
) -> PyResult<()> {
    python.allow_threads(|| -> PyResult<()> {
        let n_numbers = numbers.len();
        for i in 0..n_numbers {
            let n = numbers.get_item(i)?.extract::<i64>()?;
            let result = expensive_computation(n);
            numbers.set_item(i, result))?;
        }
        Ok(())
    })
}
}

It won't compile, though. We're using a GIL-bound object (numbers) in a GIL-free section (inside python.allow_threads). We need to unbind it first.

`Py<T>` and `Bound<'py, T>`

Using Bound<'py, T>::unbind we get a Py<T> object back. It has no 'py lifetime, it's no longer bound to the GIL. We can try to use it in the GIL-free section:

#![allow(unused)]
fn main() {
#[pyfunction]
fn update_in_place<'py>(
    python: Python<'py>,
    numbers: Bound<'py, PyList>
) -> PyResult<()> {
    let numbers = numbers.unbind();
    python.allow_threads(|| -> PyResult<()> {
        let n_numbers = numbers.len();
        for i in 0..n_numbers {
            let n = numbers.get_item(i)?.extract::<i64>()?;
            let result = expensive_computation(n);
            numbers.set_item(i, result)?;
        }
        Ok(())
    })
}
}

But it won't compile either. numbers.len(), numbers.get_item(i), and numbers.set_item(i, result) all require the GIL. Py<T> is just a pointer to a Python object, it won't allow us to access it if we're not holding the GIL.

We need to re-bind it using a Python<'py> token, thus getting a Bound<'py, PyList> back. How do we get a Python<'py> token inside the closure, though? Using Python::with_gil: it's the opposite of Python::allow_threads, it makes sure to acquire the GIL before executing the closure and release it afterwards. The closure is given a Python token as argument, which we can use to re-bind the PyList object:

#![allow(unused)]
fn main() {
#[pyfunction]
fn update_in_place<'py>(
    python: Python<'py>,
    numbers: Bound<'py, PyList>
) -> PyResult<()> {
    let n_numbers = numbers.len();
    let numbers_ref = numbers.unbind();
    // Release the GIL
    python.allow_threads(|| -> PyResult<()> {
        for i in 0..n_numbers {
            // Acquire the GIL again, to access the
            // i-th element of the list
            let n = Python::with_gil(|inner_py| {
                numbers_ref
                    .bind(inner_py)
                    .get_item(i)?
                    .extract::<i64>()
            })?;
            // Run the computation without holding the GIL
            let result = expensive_computation(n);
            // Re-acquire the GIL to update the list in place
            Python::with_gil(|inner_py| {
                numbers_ref.bind(inner_py).set_item(i, result)
            })?;
        }
        Ok(())
    })
}
}

Be mindful of concurrency

The GIL is there for a reason: to protect Python objects from concurrent access.
Whenever you release the GIL, you're allowing other threads to run and potentially modify the Python objects you're working with.

In the examples above, another Python thread could modify the numbers list while we're computing the result. E.g. it could remove an element, causing the index i to be out of bounds.

This is a common issue in multi-threaded programming, and it's up to you to handle it.
Consider using synchronization primitives like Lock to serialize access to the Python objects you're working with. In other words, move towards fine-grained locking rather than the lock-the-world approach you get with the GIL.

References

Exercise

The exercise for this section is located in 03_concurrency/04_minimize_gil_locking

Keyboard shortcuts

Rust-Python interoperability