Welcome

Welcome to "Advanced Rust testing"!
No application is an island: you need to interact with third-party APIs, databases and who knows what else. Testing those interactions is tricky, to say the least! This course will focus on expanding your Rust testing toolkit, going beyond the basic techniques you're already familiar with. At the end of the course you'll have a strategy to test most of the scenarios that are relevant for a complex Rust application.

The course assumes you have a good understanding of Rust's basic concepts and want to move beyond the built-in testing toolkit.

Methodology

This course is based on the "learn by doing" principle.
You'll build up your knowledge in small, manageable steps. It has been designed to be interactive and hands-on.

Mainmatter developed this course to be delivered in a classroom setting, over a whole day: each attendee advances through the lessons at their own pace, with an experienced instructor providing guidance, answering questions and diving deeper into the topics as needed.
If you're interested in attending one of our training sessions, or if you'd like to bring this course to your company, please get in touch.

You can also take the course on your own, but we recommend you find a friend or a mentor to help you along the way should you get stuck. You can also find solutions to all exercises in the solutions branch of the GitHub repository.

Prerequisites

To follow this course, you must install Rust.
If Rust is already installed on your machine, make sure to update it to the latest version:

# If you installed Rust using `rustup`, the recommended way,
# you can update to the latest stable toolchain with:
rustup update stable

You'll also need the nightly toolchain, so make sure to install it:

rustup toolchain install nightly

You also need to install ctr (Check Test Results), a little tool that will be invoked to verify the outcomes of your tests:

# Install `ctr` from the top-level folder of the repository
cargo install --path ctr

Don't start the course until you have these tools installed and working.

Structure

On the left side of the screen, you can see that the course is divided into sections.
To verify your understanding, each section is paired with an exercise that you need to solve.

You can find the exercises in the companion GitHub repository.
Before starting the course, make sure to clone the repository to your local machine:

# If you have an SSH key set up with GitHub
git clone git@github.com:mainmatter/rust-advanced-testing-workshop.git
# Otherwise, use the HTTPS URL:
#
#   git clone https://github.com/mainmatter/rust-advanced-testing-workshop.git

We recommend you work on a branch, so you can easily track your progress and pull updates from the main repository if needed:

cd rust-advanced-testing-workshop
git checkout -b my-solutions

All exercises are located in the exercises folder. Each exercise is structured as a Rust package. The package contains the exercise itself, instructions on what to do (in src/lib.rs), and a mechanism to automatically verify your solution.

wr, the workshop runner

To verify your solutions, we've provided a tool that will guide you through the course. It is the wr CLI (short for "workshop runner"). Install it with:

cargo install --locked workshop-runner

In a new terminal, navigate back to the top-level folder of the repository. Run the wr command to start the course:

wr

wr will verify the solution to the current exercise.
Don't move on to the next section until you've solved the exercise for the current one.

We recommend committing your solutions to Git as you progress through the course, so you can easily track your progress and "restart" from a known point if needed.

Enjoy the course!

Author

This course was written by Luca Palmieri, Principal Engineering Consultant at Mainmatter.
Luca has been working with Rust since 2018, initially at TrueLayer and then at AWS.
Luca is the author of "Zero to Production in Rust", the go-to resource for learning how to build backend applications in Rust, and "100 Exercises to Learn Rust", a learn-by-doing introduction to Rust itself.
He is also the author and maintainer of a variety of open-source Rust projects, including cargo-chef, Pavex and wiremock.

Exercise

The exercise for this section is located in 00_intro/00_welcome

Exercise expectations

By this point you should have all the tools installed and ready to go. Let's discuss how automated verification works in this course.

This is a testing workshop, so we need to check that the tests you write behave as expected. It's a bit meta!
It's not enough to know that a test failed: we also need to know why it failed and what message it produced. We do this by using ctr, the custom tool you just installed. It runs the tests in each exercise and compares the outcome with a set of expectations.

You can find those expectations in the expectations.yml file.
You should never modify this file. Refer to it in order to understand what the tests are supposed to do, but don't change it.

Exercise

The exercise for this section is located in 01_better_assertions/00_intro

The built-in testing toolkit

The standard library provides three macros for test assertions: assert!, assert_eq! and assert_ne!.

They're used to check that a condition is true, or that two values are equal or not equal, respectively.

#[test]
fn t() {
    assert!(true);
    assert_eq!(1, 1);
    assert_ne!(1, 2);
}

Panic messages

If the assertion fails, the macro will panic and it'll try to print a useful message for you to understand what went wrong. In the case of assert_eq! and assert_ne!, the message will include the values that were compared.

#[test]
fn t() {
    assert_eq!(1, 2);
}
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `2`', src/main.rs:2:5

In the case of assert!, the message will include the condition that was checked, stringified.

#[test]
fn t() {
    let v = vec![1];
    assert!(v.is_empty());
}
thread 'main' panicked at 'assertion failed: v.is_empty()', src/main.rs:3:5

Custom panic messages

The default panic messages are useful in simple cases, but they fall short in more complex scenarios.
Going back to our Vec example, we might want to know what values were in the vector when the assertion failed, or how many elements it actually contained.

That's why all three macros accept an additional (optional) argument: a custom message to print when the assertion fails.
You've seen this in the previous exercise:

#[test]
fn assertion_with_message() {
    assert_eq!(2 + 2, 5, "The Rust compiler hasn't read 1984 by George Orwell.")
}

The custom message will be printed in addition to the default message for assert_eq! and assert_ne!.
For assert!, it will replace the default message.
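
As a quick illustration, here's how a custom message can carry the missing context for the Vec example above:

#[test]
fn assertion_with_context() {
    let v = vec![1, 2, 3];
    assert!(
        v.is_empty(),
        "The vector should be empty, but it contains {} element(s): {:?}",
        v.len(),
        v
    );
}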

Exercise

The exercise for this section is located in 01_better_assertions/01_std_assertions

Assertion infrastructure

As you've seen in the previous exercise, you can get pretty nice test failure messages with the standard library's assertions if you take the time to write a custom message. That additional friction is a problem, though.

If you don't bother to write a custom message, you'll get a generic error that doesn't help you understand what went wrong. It'll take you longer to fix tests.

If you choose to bother, you don't want to write the same custom message over and over again. You want to write it once and reuse it. You end up writing a custom assertion function, like we did in the previous exercise.
But you aren't working on this project alone. You have a team! You now need to teach your team that this custom assertion function exists if you want to have a consistent testing style across your codebase.
Congrats, you've just written your own assertion library!

Invest where it matters

Don't get me wrong: you should write custom assertions.
Once your project gets complex enough, you will have to write your own matchers. They'll be bespoke to your domain and they'll help you write tests that are easy to read and maintain.

But that's a tiny fraction of the assertions you'll write.
For all the generic assertions, the ones that stay the same across projects, you don't want to take on the burden of writing and maintaining your own assertion library.
In that area, you want to standardise on an existing library that's well maintained and has a large community. If you do that, you'll be able to reuse your knowledge across projects and you'll be able to find help online when you need it. You can always choose to contribute to the library if you find a bug or a missing feature.

googletest

There are a few options when it comes to assertion libraries for Rust.
We'll use googletest in this workshop.
It's a Rust port of the famous GoogleTest C++ testing library.
It comes, out of the box, with a rich set of matchers and a nice way to write custom ones. It also includes a few useful macros for more complex testing scenarios—we'll explore them in the coming exercises.

Exercise

The exercise for this section is located in 01_better_assertions/02_googletest

Basic matchers

To truly leverage a testing library like googletest you need to get familiar with its built-in matchers. They're the building blocks of your assertions, and they need to roll off your fingers as easily as assert_eq! does.

We'll spend this exercise and a few more to get familiar with the most common matchers, starting with the most basic ones.

Tooling helps: coding assistants like GitHub Copilot or Cody will start suggesting the right matchers as you type if you've already used them in a few tests in the same project.
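
To give you a feel for the shape of these assertions, here's a small sketch; eq, gt and contains_substring are among googletest's built-in matchers, but double-check the exact set against the version you're using:

use googletest::prelude::*;

#[test]
fn basic_matchers() {
    assert_that!(2 + 2, eq(4));
    assert_that!(2 + 2, gt(3));
    assert_that!("hello world", contains_substring("hello"));
}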

Exercise

The exercise for this section is located in 01_better_assertions/03_eq

Option and Result matchers

googletest comes with a few special matchers for Option and Result that return good error messages when something that should be Some or Ok is actually None or Err, and vice-versa.
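
For instance (a sketch; the some, none, ok and err matchers wrap an inner matcher for the contained value):

use googletest::prelude::*;

#[test]
fn option_and_result_matchers() {
    let maybe: Option<u32> = Some(4);
    assert_that!(maybe, some(eq(4)));

    let outcome: Result<u32, String> = Err("boom".to_string());
    assert_that!(outcome, err(eq("boom".to_string())));
}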

Exercise

The exercise for this section is located in 01_better_assertions/04_options_and_results

Enums

The matchers we've seen in the previous exercise are specialised for Option and Result, but googletest also has a more generic matcher to match variants of arbitrary enums (and other patterns).
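
The workhorse here is the matches_pattern! macro. A rough sketch, with a made-up PaymentStatus enum:

use googletest::prelude::*;

#[derive(Debug)]
enum PaymentStatus {
    Pending,
    Completed(u32),
}

#[test]
fn enum_matchers() {
    let status = PaymentStatus::Completed(42);
    // Checks the variant and, via the inner matcher, its payload.
    assert_that!(status, matches_pattern!(PaymentStatus::Completed(eq(42))));
}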

Exercise

The exercise for this section is located in 01_better_assertions/05_enums

Collections

We close our tour of googletest's built-in matchers with a look at specialised matchers for collections.

googletest really shines with collections. The matchers are very expressive and can be combined in powerful ways. Failure messages are also extremely helpful, showing the actual values and highlighting the differences.
Achieving the same level of helpfulness with assert! would require a lot of boilerplate!
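
A sketch of what this looks like in practice; elements_are!, unordered_elements_are! and contains are among googletest's collection matchers (check the docs for the full list):

use googletest::prelude::*;

#[test]
fn collection_matchers() {
    // Order-sensitive, element-by-element comparison.
    assert_that!(vec![3, 1, 2], elements_are![eq(3), eq(1), eq(2)]);
    // Order-insensitive comparison.
    assert_that!(vec![3, 1, 2], unordered_elements_are![eq(1), eq(2), eq(3)]);
    // At least one element must satisfy the inner matcher.
    assert_that!(vec![3, 1, 2], contains(gt(2)));
}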

Exercise

The exercise for this section is located in 01_better_assertions/06_collections

Custom matchers

Built-in matchers can only take you so far. Sometimes you need to write your own!

The Matcher trait

All matchers must implement the Matcher trait. There are two key methods you need to implement:

  • matches, which checks whether the actual value satisfies the matcher
  • describe, which produces the description used in the failure message

Optionally, you can also implement the explain_match method if you want to include further information derived from the actual and expected values in the failure message shown to the user.

Patterns

Most matchers in googletest follow the same pattern.
You define two items:

  • A struct which implements the Matcher trait (e.g. EqMatcher)
  • A free function that returns an instance of the struct (e.g. eq)

The free function is a convenience for the user since it results in terser assertions.
You can also choose to make the struct type private, returning impl Matcher from the free function instead (see anything as an example).

Exercise

The exercise for this section is located in 01_better_assertions/07_custom_matcher

expect_that!

All your googletest tests so far have used the assert_that! macro.
If the assertion fails, it panics and the test fails immediately. No code after the assertion is executed.

expect_that!

googletest provides another macro, called expect_that!.
It uses the same matchers as assert_that!, but it doesn't panic if the test fails.
When the test ends (either because the test function ran to completion or because it panicked along the way), googletest will check whether any expect_that! assertions failed and report them as test failures.

This allows you to write tests that check multiple things and report all the failures at once.
A good use case is verifying multiple properties on the same object.
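
A sketch of what that can look like (the User struct is made up; note that expect_that! needs the test to be annotated with googletest's test attribute so that deferred failures can be collected):

use googletest::prelude::*;

#[derive(Debug)]
struct User {
    name: String,
    age: u32,
}

#[googletest::test]
fn user_properties() {
    let user = User { name: "Alice".to_string(), age: 30 };
    // Both checks run even if the first one fails; all failures are
    // reported together at the end of the test.
    expect_that!(user.name.as_str(), eq("Alice"));
    expect_that!(user.age, ge(18));
}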

Exercise

The exercise for this section is located in 01_better_assertions/08_expect_that

Snapshot testing

In all the tests we've written so far we've always manually created the expected value.
This is fine for simple cases, but it can quickly become cumbersome when the expected value is complex (e.g. a large JSON document) and it needs to be updated fairly often (e.g. the responses of a downstream API service that's under active development).

To solve this problem we can use snapshot testing.
You snapshot the output of an operation and compare it with a previously saved snapshot. You then review the changes and decide whether they are expected or not: if they are, you can automatically update the snapshot.

insta

insta is an established snapshot testing library for Rust.

It comes with a CLI, cargo-insta, which we'll use to manage our snapshots. Install it before moving forward:

cargo install --locked cargo-insta

Exercise

The exercise for this section is located in 02_snapshots/00_intro

Your first snapshots

insta macros

To work with snapshots, we need to use insta's assertion macros.
There's one macro for each format we want to compare:

  • assert_snapshot! for strings
  • assert_debug_snapshot! to compare the Debug representation of a value with a snapshot
  • assert_display_snapshot! to compare the Display representation of a value with a snapshot
  • assert_json_snapshot! to compare JSON values
  • etc. for other formats (check the documentation for a complete list)

You always want to use the most specific macro available, since it will give you better error messages thanks to the more specific comparison logic.
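
For example, a JSON snapshot test might look like this (a sketch; assert_json_snapshot! requires insta's json feature):

use insta::assert_json_snapshot;

#[test]
fn user_payload() {
    let payload = serde_json::json!({
        "name": "Alice",
        "roles": ["admin", "user"],
    });
    // On the first run this records a new snapshot for you to review;
    // on later runs it compares against the accepted snapshot.
    assert_json_snapshot!(payload);
}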

insta review

The key command exposed by insta's CLI is cargo insta review.
It will compare the snapshots generated by your last test run with the ones you had previously saved.

Exercise

The exercise for this section is located in 02_snapshots/01_snapshots

Where do snapshots go?

Inline

In the previous exercise we used an inline snapshot.
Inline snapshots are stored in the test itself:

#[test]
fn snapshot() {
    let m = "The new value I want to save";
    assert_snapshot!(m, @"The old snapshot I want to compare against")
}

When you update the snapshot, the test source code is modified accordingly. Check again the lib.rs file of the previous exercise to see it for yourself!

External

Storing the snapshot inline has its pros: when you look at a test, you can immediately see what the expected value is.
It becomes cumbersome, however, when the snapshot is large: it clutters the test and makes it harder to read.

For this reason, insta supports external snapshots.
They are stored in a separate file and retrieved on the fly when the test is run:

#[test]
fn snapshot() {
    let m = "The new value I want to save";
    assert_snapshot!(m)
}

By default, file snapshots are stored in a snapshots folder right next to the test file where this is used. The name of the file is <module>__<name>.snap where the name is derived automatically from the test name. You can also set a custom name, if you want to:

#[test]
fn snapshot() {
    let m = "The new value I want to save";
    assert_snapshot!("custom_snapshot_name", m)
}

Exercise

The exercise for this section is located in 02_snapshots/02_storage_location

Handling non-reproducible data

Sometimes the data you want to snapshot cannot be reproduced deterministically in different runs of the test.
For example, it might contain the current timestamp or a random value.

In these cases, you can use redactions to remove the non-reproducible parts of the data before taking the snapshot (and before comparing it with the saved one).

Redactions

Redactions are specified as an additional argument of the assertion macro you're using.
They only work for structured formats (e.g. JSON, XML, etc.). If you're snapshotting a string, you can use regex filters instead.

Redactions use a jq-style syntax to specify the parts of the data you want to remove: refer to the documentation for an exhaustive reference.
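
A sketch of what a redaction looks like with assert_json_snapshot! (insta's redactions feature must be enabled; the field names are made up):

use insta::assert_json_snapshot;

#[test]
fn response_with_redaction() {
    let response = serde_json::json!({
        "user": "alice",
        // Pretend this value comes from the system clock and changes on every run.
        "created_at": "2024-01-01T12:34:56Z",
    });
    // The selector on the left picks the field to redact; the value on the
    // right replaces it before the snapshot is stored and compared.
    assert_json_snapshot!(response, {
        ".created_at" => "[timestamp]",
    });
}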

Exercise

The exercise for this section is located in 02_snapshots/03_redactions

Outro

Congrats, you just made it to the end of our section on snapshot testing!

Snapshot testing is a surprisingly simple technique, but it can be a real game changer. Error messages, Debug representations, API responses, macro expansions: the list of things you can test more easily with snapshots is long!
If you have any questions around insta, this is a good time to pull me over and ask them!

Exercise

The exercise for this section is located in 02_snapshots/04_outro

Mocking

We love to think about software as a collection of small, well-defined units that are then composed together to create more complex behaviours.
Real codebases are rarely that simple, though: they often contain complex interactions with external services, tangled dependencies, and a lot of incidental complexity.

Those dependencies make testing harder.

Example: a login endpoint

Let's look at an example:

async fn login(
    request: &HttpRequest,
    database_pool: &DatabasePool,
    auth0_client: &Auth0Client,
    rate_limiter: &RateLimiter,
) -> Result<LoginResponse, LoginError> {
    // [...]
}

The login function has four dependencies: the incoming HTTP request, a database connection pool, an Auth0 client, and a rate limiter.
To invoke login in your tests, you need to provide all of them.

Let's make the reasonable assumption that login is asking for those dependencies because it needs them to do its job.
Therefore you can expect queries and HTTP requests to be made when you invoke it in your tests. Something needs to handle those queries and requests, otherwise you won't be able to exercise the scenarios you care about.

A spectrum

When it comes to testing, all approaches exist on a spectrum.

On one end, you have full-fidelity testing: you run your code with a setup that's as close as possible to the production environment. A real database, a real HTTP client, a real rate limiter.

On the other end, you have test doubles: you replace your dependencies with alternative implementations that are easier to create and control in your tests.

Full-fidelity testing gives you the highest confidence in your code, but it can be expensive to set up and maintain.
Test doubles are cheaper to create, but they can be a poor representation of the real world.

This course

During this course, we'll cover both approaches.
We'll see how to implement full-fidelity testing for filesystem, database, and HTTP interactions.
We'll also explore how to use test doubles when full-fidelity testing is not feasible or convenient.

Let's start from test doubles!

Exercise

The exercise for this section is located in 03_mocks/00_intro

Refactor to an interface

Let's look again at the login function from the README of the previous exercise:

async fn login(
    request: &HttpRequest,
    database_pool: &DatabasePool,
    auth0_client: &Auth0Client,
    rate_limiter: &RateLimiter,
) -> Result<LoginResponse, LoginError> {
    // [...]
}

You don't want to spin up a real database, a real Auth0 client, and a real rate limiter in your tests; you want to use test doubles instead.
How do you proceed?

The problem

Rust is a statically typed language.
The login function expects four arguments, and each of them has a specific type. There's no way to pass a different type to the function without running into a compiler error.

In order to use test doubles, you need to decouple login from specific implementations of its dependencies.
Instead of asking for an Auth0Client, you need to ask for something that can act like an Auth0Client.
You need to refactor to an interface.

Traits

In Rust, you use traits to define interfaces.
A trait is a collection of methods that can be implemented by concrete types.

Continuing with the Auth0Client example, you can define a trait that describes the methods you need to interact with Auth0:

trait Authenticator {
    async fn verify(&self, jwt: &str) -> Result<UserId, VerificationError>;
}

You would then implement this trait for Auth0Client:

impl Authenticator for Auth0Client {
    async fn verify(&self, jwt: &str) -> Result<UserId, VerificationError> {
        // [...]
    }
}

Finally, you would change the signature1 of login to ask for an Authenticator instead of an Auth0Client:

async fn login<A>(
    request: &HttpRequest,
    database_pool: &DatabasePool,
    authenticator: &A,
    rate_limiter: &RateLimiter,
) -> Result<LoginResponse, LoginError>
where
    A: Authenticator,
{
    // [...]
}

You have successfully refactored to an interface!

Tread carefully

Refactoring to an interface is the technique you need to master if you want to use test doubles.
All the other exercises in this workshop provide conveniences to reduce the amount of boilerplate you need to write, but they don't fundamentally move the needle on the complexity of the problem.

This kind of refactoring might not always be easy (nor possible!).
You need to analyze your codebase to determine if it's viable and if it's worth the effort. You're introducing a layer of indirection that might not be necessary beyond your tests. That's incidental complexity.
In some cases, it might be better to use a full-fidelity testing approach, like the ones you'll see later in this workshop.

1

In this example we've used static dispatch to make login polymorphic with respect to the Authenticator type.
You can also use dynamic dispatch by changing the signature of login to ask for a &dyn Authenticator (a trait object) instead of an &A.

Exercise

The exercise for this section is located in 03_mocks/01_traits

mockall

In the previous exercise you've manually implemented a do-nothing logger for your tests.
It can get tedious to do that for every dependency you want to mock. Let's bring some automation to the party!

Mocking with mockall

mockall is the most popular auto-mocking library for Rust.
It's built around the #[automock] attribute, which generates a mock implementation of a trait for you.

Let's look at an example:

pub trait EmailSender {
    fn send(&self, email: &Email) -> Result<(), EmailError>;
}

To generate a mock implementation of EmailSender, you need to add the #[automock] attribute to the trait:

use mockall::automock;

#[automock]
pub trait EmailSender {
    fn send(&self, email: &Email) -> Result<(), EmailError>;
}

mockall will generate a struct named MockEmailSender with an implementation of the EmailSender trait.
Each of the methods in the trait will have a counterpart in the mock struct, prefixed with expect_.
By calling expect_send, you can configure how MockEmailSender will behave when the send method is called.
In particular, you can define:

  • Preconditions (e.g. assertions on the arguments passed to the method)
  • Expectations (e.g. how many times you expect the method to be called)
  • Return values (e.g. what the method should return)

In an example test:

#[test]
fn test_email_sender() {
    let mut mock = MockEmailSender::new();
    mock.expect_send()
        // Precondition: do what follows only if the email subject is "Hello"
        .withf(|email| email.subject == "Hello")
        // Expectation: panic if the method is not called exactly once
        .times(1)
        // Return value
        .returning(|_| Ok(()));

    // [...]
}

A word on expectations

Expectations such as times are a powerful feature of mockall. They allow you to test how your code interacts with the dependency that's being mocked.

At the same time, they should be used sparingly.
Expectations couple your test to the implementation of the code under test.

Only use expectations when you explicitly want to test how your code interacts with the dependency—e.g. you are testing a retry mechanism and you want to make sure that it retries according to the configured policy.
Avoid setting times expectations on every mock method in your tests just because you can.

Exercise

The exercise for this section is located in 03_mocks/02_mockall

Multiple calls

The problem

Methods on your mock object might be invoked multiple times by the code under test.
In more complex scenarios, you might need to return different values for each invocation. Let's see how to do that.

times

When you add a times expectation to a method, mockall will use the return value you specified for at most times invocations of that method; any further invocation will panic.

#[test]
fn test_times() {
    let mut mock = MockEmailSender::new();
    let email = /* */;
    mock.expect_send()
        .times(2)
        .returning(|_| Ok(()));

    mock.send(&email);
    mock.send(&email);
    // This panics!
    mock.send(&email);
}

You can leverage this feature to return different values depending on the number of times a method has been called.

#[test]
fn test_times() {
    let mut mock = MockEmailSender::new();
    let email = /* */;
    mock.expect_send()
        .times(1)
        .returning(|_| Ok(()));
    mock.expect_send()
        .times(1)
        .returning(|_| Err(/* */));

    // This returns Ok(())...
    mock.send(&email);
    // ...while this returns Err!
    mock.send(&email);
}

Sequence

What we have seen so far works well when you need to return different values for different invocations of the same method.
You can take this one step further by defining a sequence of calls that your mock object should expect, spanning multiple methods.

#[test]
fn test_sequence() {
    let mut mock = MockEmailSender::new();
    let email = /* */;
    let mut sequence = Sequence::new();
    mock.expect_send()
        .times(1)
        .in_sequence(&mut sequence)
        .returning(|_| Ok(()));
    mock.expect_get_inbox()
        .times(1)
        .in_sequence(&mut sequence)
        .returning(|_| Ok(/* */));

    // This panics because the sequence expected `send` to be called first!
    mock.get_inbox();
}

When using a Sequence, you need to make sure that the methods are invoked in the order specified by the sequence.
Invoking get_inbox before send will cause the test to fail, even if they are both called exactly once.

Exercise

The exercise for this section is located in 03_mocks/03_sequences

Checkpoints

The test may need to use your mock object both as part of its setup and as a dependency of the specific code path under test.

For example:

pub struct Repository(/* ... */);

impl Repository {
    pub fn new<T: UserProvider>(up: T) -> Self {
        // ...
    }

    pub fn query<T: UserProvider>(&self, id: u32, up: T) -> Option<Entity> {
        // ...
    }
}

If you're mocking UserProvider and you want to test Repository::query, you'll need to use the mock for calling Repository::new first.

Expectations can leak

To get Repository::new to behave as expected, you'll need to set up some expectations on MockUserProvider.
You'll also need to set up expectations on MockUserProvider for Repository::query to behave as expected.

There's a risk that the expectations you set up for Repository::new will leak into Repository::query: they'll be executed when they shouldn't be, leading to confusing errors in your tests.
This can happen, in particular, when the code in Repository::new changes and stops performing one of the calls you set up expectations for.

Checkpoints

To prevent this from happening, you can use two different instances of MockUserProvider for those calls.
Alternatively, you can rely on checkpoints.
A checkpoint is a way of saying "Panic unless all expectations up to this point have been met".

In this example, you can use a checkpoint to ensure that the expectations for Repository::new are met before you start setting up expectations for Repository::query.

#[test]
fn test_repository_query() {
    let mut mock = MockUserProvider::new();
    let mut repo = setup_repository(&mut mock);

    // Set up expectations for Repository::query
    // [...]

    // Call Repository::query
    // [...]
}

fn setup_repository(mock: &mut MockUserProvider) -> Repository {
    // Arrange
    mock.expect_is_authenticated()
        .returning(|_| true);
    // [...]

    // Act
    let repository = Repository::new(mock);

    // Verify that all expectations up to the checkpoint have been met
    mock.checkpoint();

    repository
}

If expectations are not met at the checkpoint, it will panic.
If they are met, the test will continue and all expectations will be reset.

Exercise

The exercise for this section is located in 03_mocks/04_checkpoints

Foreign traits

For #[automock] to work, it needs to be applied to the trait definition.
That's not an issue if you're writing the trait yourself, but what if you're using a trait from a third-party crate?

The problem

Rust macros can only access the code of the item they're applied to.
There's no way for macros to ask the compiler "can you give me the trait definition of Debug?".

The "solution"

If you want to use mockall with a trait from a third-party crate, you'll need to rely on its mock! macro and... inline the trait definition in your code.

The syntax is fairly custom—refer to the mock! macro documentation for the specifics.
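
To give you an idea of the shape, here's a sketch with a made-up Greeter trait standing in for the third-party one (the exact syntax is covered in mockall's documentation):

use mockall::mock;

// Imagine this trait comes from a third-party crate (reproduced here for illustration).
pub trait Greeter {
    fn greet(&self, name: &str) -> String;
}

mock! {
    // Generates a `MockGreeter` type...
    pub Greeter {}

    // ...that implements the inlined trait definition.
    impl Greeter for Greeter {
        fn greet(&self, name: &str) -> String;
    }
}

#[test]
fn uses_the_generated_mock() {
    let mut mock = MockGreeter::new();
    mock.expect_greet().returning(|name| format!("Hello, {name}!"));
    assert_eq!(mock.greet("Rust"), "Hello, Rust!");
}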

Exercise

The exercise for this section is located in 03_mocks/05_foreign_traits

Outro

Refactoring to an interface is a key technique that should be in the toolbox of every developer.
Automocking, on the other hand, should be evaluated on a case-by-case basis: a mock-heavy testing approach often leads to high-maintenance test suites.
Nonetheless, it's important to play with it at least once, so that you can make an informed decision. That was the primary goal of this section!

What's next?

The next three sections will zoom in on three different types of external dependencies: the filesystem, databases and HTTP APIs.
We'll look at a few techniques to perform full(er)-fidelity testing when the code under test interacts with these kinds of systems.

Exercise

The exercise for this section is located in 03_mocks/06_outro

The testing system: a look under the hood

We won't move on to the next big topic (filesystem testing) just yet.
Instead, we'll take a moment to understand what we've been using so far: the testing system. What happens when you run cargo test?

Different kinds of tests

There are three types of tests in Rust:

  • Unit tests
  • Integration tests
  • Doc tests

Unit tests

Unit tests are the tests you write alongside your non-test code, inside your Rust library or binary.
They're the ones we've been writing so far: an inline module annotated with #[cfg(test)] and a bunch of #[test] functions.
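
For reference, a minimal unit test setup looks like this:

pub fn add(a: u64, b: u64) -> u64 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::add;

    #[test]
    fn adds_two_numbers() {
        assert_eq!(add(2, 2), 4);
    }
}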

Integration tests

Integration tests are tests that live outside your Rust library or binary, in the special tests/ directory.
You don't need to annotate any module with #[cfg(test)] here: the compiler automatically assumes that everything in tests/ is under #[cfg(test)].

Doc tests

Doc tests are tests that live inside your documentation comments.
They're a great way to make sure your examples are always up-to-date and working.

Compilation units

Depending on the type of test, cargo test will compile and run your tests in different ways:

  • All unit tests defined in the same package are compiled into a single binary and run together (i.e. in a single process).
  • All the tests defined under the same top-level item under tests/ (e.g. a single file tests/foo.rs or a single directory tests/foo/) are compiled into a single binary and run together in the same process. Different top-level items are compiled into different binaries and run in different processes.
  • Each doc test is compiled into a separate binary and run in its own process.

This has a number of consequences:

  • Any global in-memory state (e.g. variables behind a lazy_static! or once_cell::Lazy) is only shared between tests that are compiled into the same binary and run in the same process. If you want to synchronize access to a shared resource across the entire test suite (e.g. a database), you need to use a synchronization primitive that works across processes.
  • The more tests you have, the more binaries cargo test will need to compile and run. Make sure you're using a good linker to minimize the time spent linking your tests.
  • Any process-specific state (e.g. the current working directory) is shared between all the tests that are compiled into the same binary and run in the same process.
    This means that if you change the current working directory in one test, it will affect other tests that share the same process!

The last point will turn out to be quite relevant in the next section: isolating tests that rely on the filesystem from each other.

All the details above apply specifically to cargo test.
If you use a different test runner, you might get different behavior. We'll explore this later in the workshop with cargo-nextest.

Exercise

The exercise for this section is located in 04_interlude/00_testing_infrastructure

Test isolation

The code under test doesn't run in a vacuum.
Your tests change the state of the host system, and that state, in return, affects the outcome of your tests.

Let's use the filesystem as a concrete example. It behaves as a global variable that all your tests share.
If one test creates a file, the other tests will see it.
If two tests try to create a file with the same name, one of them will fail.
If a test creates a file, but doesn't clean it up, the next time the same test runs it might fail.

Those cross-test interactions can make your test suite flaky: tests might pass or fail depending on the order in which they were run. That's a recipe for frustration and wasted time.

We want the best of both worlds: we want to be able to test the effects of our code on the outside world, but we also want our tests to be isolated from each other.
Each test should behave as if it is the only test running.

The plan

This section (and the next two) will be dedicated to various techniques to achieve test isolation when using high-fidelity testing.
In particular, we'll look at what happens when your application interacts with the filesystem, databases and other HTTP APIs.

Exercise

The exercise for this section is located in 05_filesystem_isolation/00_intro

Implicit or explicit?

Testability is a property of software systems.
Given a set of requirements, you can look at implementations with very different levels of testability.

This is especially true when we look at the interactions between the system under test and the host.

Filesystem as a dependency

In Rust, any piece of code can choose to interact with the filesystem. You can create files, read files, delete files, etc.
It doesn't necessarily show up in the function signature. The dependency can be implicit.

use std::io::{BufReader, BufRead};
use std::path::PathBuf;

fn get_cli_path() -> PathBuf {
    let config = std::fs::File::open("config.txt").unwrap();
    let reader = BufReader::new(config);

    let path = reader.lines().next().unwrap().unwrap();
    PathBuf::from(path)
}

It is suspicious that get_cli_path is able to conjure a PathBuf out of thin air. But it's not immediately obvious that it's interacting with the filesystem. It might also be more obfuscated in a real-world codebase (e.g. there might be other inputs).

This is an issue when we want to test get_cli_path.
We can create a file called config.txt where get_cli_path expects it to be, but things quickly become complicated:

  • We can't run tests in parallel if they all invoke get_cli_path but need it to return different values, since they would all be reading from the same file.
  • We need to make sure that the file is deleted after each test, regardless of its outcome, otherwise there might be side-effects that affect the outcome of other tests (either in the same run or in a future run).

Let's see how we can refactor get_cli_path to mitigate both issues.

Writing testable code, filesystem edition

1. Take paths as arguments

Instead of hard-coding the path to the config file in get_cli_path, we can take it as an argument.

use std::io::{BufReader, BufRead};
use std::path::{PathBuf, Path};

fn get_cli_path(config_path: &Path) -> PathBuf {
    let config = std::fs::File::open(config_path).unwrap();
    let reader = BufReader::new(config);

    let path = reader.lines().next().unwrap().unwrap();
    PathBuf::from(path)
}

2. If you need to hard-code a path, do it close to the binary entrypoint

If we need to hard-code a path, it is better to do it in the main function, or as close to the binary entrypoint as possible.

use std::path::PathBuf;
use crate::get_cli_path;

fn main() {
    let config_path = PathBuf::from("config.txt");
    let cli_path = get_cli_path(&config_path);
}

This limits the scope of difficult-to-test code. In particular, the binary becomes a very thin (and boring) layer around a library that can be tested in isolation.

Having a thin binary layer around a library is a common pattern in Rust. It is a good pattern to adopt for testability, beyond the specifics of the filesystem. You'll see more examples of this pattern in action later in the workshop!

tempfile

We've refactored get_cli_path to make it easier to test.
But we still need to write those tests!

We have two problems to solve:

  • Each test should use a different file, so that they don't interfere with each other and we can run them in parallel.
  • We need to make sure that the file is deleted after each test, regardless of its outcome.

This is where the tempfile crate comes in handy!
It provides tools to work with temporary files and directories. In this exercise (and the next) we'll focus on how to leverage it!
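
To anticipate where this is going, here's a sketch of the kind of test that tempfile::NamedTempFile enables against the refactored get_cli_path from above:

use std::io::Write;
use std::path::PathBuf;
use tempfile::NamedTempFile;

#[test]
fn reads_the_path_from_a_config_file() {
    // A uniquely named file in the OS temporary directory: parallel tests
    // won't step on each other.
    let mut config = NamedTempFile::new().expect("failed to create a temporary file");
    writeln!(config, "/usr/local/bin/my-cli").unwrap();

    // `get_cli_path` is the refactored function shown earlier.
    let path = get_cli_path(config.path());

    assert_eq!(path, PathBuf::from("/usr/local/bin/my-cli"));
    // `config` is deleted when it's dropped, even if the assertion panics.
}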

Exercise

The exercise for this section is located in 05_filesystem_isolation/01_named_tempfile

Temporary files

NamedTempFile solved our issues in the previous exercise. But how does it work?

Temporary file directory

Most operating systems provide a temporary file directory.
You can retrieve the path to the temporary directory using std::env::temp_dir. Files created in that directory will be automatically deleted at a later time.

When are temporary files deleted?

When using NamedTempFile, there are two deletion mechanisms at play:

  • NamedTempFile will delete the file when it is dropped.
    This is robust in the face of panics (if they don't abort!) and is the main mechanism tempfile relies on.
  • If destructor-based deletion fails, the OS will eventually delete the file since it's in the temporary directory.

The latter mechanism is not guaranteed to run at any specific time, therefore NamedTempFile tries to generate a unique filename to minimise the risk of collision with a leaked file.

Security

There are a fair number of OS-specific details to take into account when working with temporary files, but tempfile takes care of all of them for us.
In particular, there are some security considerations when working with NamedTempFile. When it comes to usage in test suites, you're in the clear.

tempfile()

tempfile also provides a tempfile() function that returns a special kind of File: the OS is guaranteed to delete the file when the last handle to it is dropped.

There's a caveat though: you can't access the path to the file.
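
A sketch of what working with tempfile() looks like:

use std::io::{Read, Seek, SeekFrom, Write};

fn scratch_space() -> std::io::Result<()> {
    // No accessible path: the file lives only as long as this handle.
    let mut file = tempfile::tempfile()?;
    writeln!(file, "scratch data")?;

    file.seek(SeekFrom::Start(0))?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    assert_eq!(contents, "scratch data\n");
    Ok(())
}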

Refactoring

We could choose to refactor get_cli_path to make tempfile() viable for our testing needs.

use std::io::BufRead;
use std::path::PathBuf;

fn get_cli_path<R>(config: R) -> PathBuf
where
    R: BufRead,
{
    let path = config
        .lines()
        .next()
        .expect("The config is empty")
        .expect("First line is not valid UTF-8");
    PathBuf::from(path)
}

We are no longer performing any filesystem operation in get_cli_path: the configuration "source" is now abstracted behind the BufRead trait.
We can now use get_cli_path to process a file (via std::io::BufReader), an in-memory string (via std::io::Cursor), or any other type that implements BufRead.
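
For example, a test can now feed the configuration from memory via std::io::Cursor:

use std::io::Cursor;
use std::path::PathBuf;

#[test]
fn reads_the_first_line_of_the_config() {
    let config = Cursor::new("/usr/local/bin/my-cli\nignored second line\n");
    assert_eq!(get_cli_path(config), PathBuf::from("/usr/local/bin/my-cli"));
}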

This is a valuable refactoring to have in your toolkit, but it's not a panacea.
You'll still need to deal with that filesystem access at some point. You could move it to the binary entrypoint, but does it really count as "thin" and "boring"?
You'll probably have logic to handle failures, with different code paths depending on the error. You should test that!

Evaluate on a case-by-case basis whether it's worth it to refactor your code to make it easier to test with something like tempfile().

Exercise

The exercise for this section is located in 05_filesystem_isolation/02_tempfile

Path coupling

Our testing, so far, has been limited to cases where the code interacts with a single isolated file.
Real-world codebases are rarely that simple.

More often than not, you'll have to deal with multiple files and there'll be assumptions as to where they are located relative to each other.

Think of cargo as an example: it might load the Cargo.toml manifest for a workspace and then go looking for the Cargo.toml files of each member crate based on the relative paths specified in the workspace manifest.
If you just create a bunch of NamedTempFiles, it won't work: the paths will be completely random and the code will fail to find the files where it expects them.

tempdir

The tempfile crate provides a solution for this scenario: TempDir.
With the default configuration, it will create a temporary directory inside the system's temporary directory.
You can then create files inside of that directory using the usual std::fs APIs, therefore controlling the (relative) paths of the files you create.

When TempDir is dropped, it will delete the directory and all its contents.
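
A sketch of how that plays out in a test (the file layout is made up):

use std::fs;
use tempfile::TempDir;

#[test]
fn lays_out_related_files() {
    let root = TempDir::new().expect("failed to create a temporary directory");

    // Create files at known relative paths inside the temporary directory.
    fs::write(root.path().join("Cargo.toml"), "[workspace]\nmembers = [\"app\"]\n").unwrap();
    fs::create_dir(root.path().join("app")).unwrap();
    fs::write(root.path().join("app").join("Cargo.toml"), "[package]\nname = \"app\"\n").unwrap();

    // ...run the code under test against `root.path()`...

    // The directory and everything inside it are deleted when `root` is dropped.
}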

Working directory

To work with TempDir effectively, it helps to structure your code in a way that minimises the number of assumptions it makes about the current working directory.

Every time you're using a relative path, you're relying on the current working directory: you're reading {working_directory}/{relative_path}.

The current working directory is set on a per-process basis.
As you learned in the interlude, that implies that it is shared between tests, since multiple tests can be compiled into the same binary and run in the same process.
Running std::env::set_current_dir in one test will affect the outcome of the other tests, which is not what we want.

The solution is to make all paths relative to a configurable root directory.
The root directory is set by the binary entrypoint (e.g. main), and it's then passed down to the rest of the codebase.
You can then set the root directory to the path of a TempDir in your tests, and you're good to go!
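
In code, the pattern looks something like this (a sketch; the function and file names are made up):

use std::path::{Path, PathBuf};

// All paths are resolved against an explicit root directory instead of
// the process-wide current working directory.
fn config_path(root: &Path) -> PathBuf {
    root.join("config.txt")
}

fn main() {
    // The binary entrypoint picks the root for production...
    let root = std::env::current_dir().expect("failed to read the current directory");
    let _config = config_path(&root);
    // ...while each test can pass the path of its own `TempDir` instead.
}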

Exercise

The exercise for this section is located in 05_filesystem_isolation/03_tempdir

Outro

The patterns (and tools) you've learned in this section should help you write more robust tests for code that interacts with the filesystem.

When everything else fails

Nonetheless, they are not a silver bullet: you might not be in control of the code you're testing (e.g. third-party libraries), or it might be too expensive to refactor it to make it more testable.

In those cases, you can take a more radical approach to isolation: run each test in a separate process and set their working directory to a temporary directory (created via tempfile::TempDir).
If you want to go down that route, check out cargo-nextest: it runs your tests as isolated processes by default, and it's best suited for this kind of workflow.

Exercise

The exercise for this section is located in 05_filesystem_isolation/04_outro

Database isolation

Let's move on to another external dependency: the database.
We'll use PostgreSQL as our reference database, but the same principles apply to other databases.

The challenge

The database is often the most complex external dependency in an application, especially if it is distributed.
The database is in charge of storing your data, ensuring its integrity in the face of various kinds of failures and complex concurrent access patterns.

If you want to write a black-box test for your application, you'll have to deal with the database.
The challenge is, in many ways, similar to what we discussed in the previous section about the filesystem: if we just point our tests to a shared database, we'll end up with spurious failures and a slow test suite, since the tests will interfere with each other and we'll be forced to execute them sequentially.

The dream

Our goal is to run our tests in parallel, with minimal overhead, and without having to worry about cross-test interference.
Each test should be able to assume that it is the only test running, and that it can safely modify the database as it sees fit.

Approaches

1. Use an in-memory database

Instead of using an actual database instance, we replace it with an in-memory database.
Each test creates a separate in-memory database, and we don't have to worry about interference between tests.

It isn't all roses, though:

  • You'll have to structure your code so that you can easily swap the database implementation.
    This will inevitably increase the complexity of your application without adding much value to the production code.
  • You're not really testing the system as a whole, since you're not using the same database as your production environment.
    This is especially problematic if you're using database-specific features. An in-memory database will not behave exactly like your production database, especially when it comes to concurrency and locking. Subtle (but potentially serious) bugs will slip through your test suite.

In-memory databases used to be a popular approach, but they have fallen out of favor in recent years since it has become significantly easier to run instances of real databases on laptops and in CI environments. Thanks Docker!

2. Use uncommitted transactions

Many databases (relational and otherwise) support transactions: a way to group multiple operations into a single unit of work that either succeeds or fails as a whole.
In particular, you can use transactions to create a "private" view of the database for each test: what happens in a transaction is not visible to other transactions until it is committed1, but it is visible to the client that created it.
You can leverage this fact to run your tests in parallel, as long as you make sure that each test runs in a separate transaction that's rolled back at the end of the test.

There are some complexities to this approach:

  • When the code under test needs to perform multiple transactions, you end up with nested transactions.
    In a SQL database, that requires (implicitly) converting your COMMIT statements into SAVEPOINT statements. Other databases may not support nested transactions at all.
  • Rust is a statically typed language. Writing code that can accept both an open transaction and a "simple" connection as the object that represents the database can be... complicated.

3. Use a separate database for each test

Since our goal is to isolate each test, the most straightforward approach is to use a separate database for each test!
Today's laptops, combined with Docker, make this approach feasible even for large test suites.

Our recommendation is to use a different logical database for each test, rather than a physical database (e.g. a separate Docker container for each test). It lowers the overhead, resulting in faster tests.

Our recommendation

We recommend approach #3: a separate database for each test.
It has the lowest impact on your production code and it gives you the highest level of confidence in your tests.
We'll see how to implement it with sqlx in the next section.

1

The exact semantics of transactions depend on the isolation level of the database.
What we describe here is the behavior of the READ COMMITTED isolation level, which is the default in PostgreSQL. You need to use an isolation level that doesn't allow dirty reads.

Exercise

The exercise for this section is located in 06_database_isolation/00_intro

Testing with sqlx

Let's try to implement the "one database per test" approach with sqlx.

Spinning up a "physical" database

We need to have a "physical" database instance to create a dedicated logical database for each test.
We recommend using an ephemeral Docker container for this purpose. Containers are portable, easy to spin up and tear down.

If you don't have Docker installed, go get it! You can find instructions here.

In our ideal setup, you'd just execute cargo test and the required setup (i.e. spinning up the container) would be executed automatically. We are not quite there yet, though, so for now you'll have to run it manually:

docker run -p 5432:5432 \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_DB=postgres \
  --name test_db \
  postgres:15

Configuring sqlx

For this section, we'll be using sqlx to interact with PostgreSQL.
One of the key features provided by sqlx is compile-time query validation: when you compile your project, sqlx will check that all your queries are valid SQL and that they are compatible with your database schema.
This is done via their custom macros: at compile-time, they issue a statement against a live database to carry out the validation.

For that reason, we need to provide sqlx with a connection string to said database.
The common approach is to define a .env file in the root of the project: sqlx will automatically read it and use the value of the DATABASE_URL variable as the connection string. We'll stick to this approach.

sqlx exposes a few different macro variants, but we'll mostly be using sqlx::query!.

#[sqlx::test]

sqlx itself embraces the "one database per test" approach and provides a custom test attribute, #[sqlx::test], to do the heavy lifting for you.
You add an input parameter to your test function (e.g. pool: sqlx::PgPool) and sqlx will automatically create a new database and pass a connection pool to your test.

You can find the list of injectable parameters in the sqlx::test documentation.
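
A sketch of what such a test can look like (it assumes a users table with a non-nullable name column, created by your migrations):

#[sqlx::test]
async fn inserts_and_retrieves_a_user(pool: sqlx::PgPool) {
    sqlx::query!("INSERT INTO users (name) VALUES ($1)", "Alice")
        .execute(&pool)
        .await
        .expect("failed to insert the user");

    let row = sqlx::query!("SELECT name FROM users")
        .fetch_one(&pool)
        .await
        .expect("failed to fetch the user");
    assert_eq!(row.name, "Alice");
}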

Under the hood, this is what sqlx does:

  • It connects to the database specified in the DATABASE_URL environment variable.
  • It creates a new database with a random name.
  • (Optional) It runs all the migrations in the migrations directory.
  • It creates a connection pool to the new database.
  • It passes the connection pool to your test function.
  • It waits for the test to complete.
  • It deletes the database.

Exercise

The exercise for this section is located in 06_database_isolation/01_sqlx_test

HTTP mocking

We have looked at the filesystem and at databases. It's time to turn our attention to another network-driven interaction: HTTP requests and responses.

The challenge

Most applications rely on external services to fulfill their purposes.
Communication with these services usually happens over the network.

Your code can have complex interactions with these dependencies. Depending on the data you send and receive, your code might go down very different execution paths. The interaction itself might fail in many different ways, which you must handle appropriately.

For the purpose of this section, we'll assume that all communication happens over HTTP, but similar techniques can be applied to other protocols.

HTTP mocking

How do you test your code in these conditions?
At a high-level, you have three options:

  1. You run a test instance of the external service that your code can communicate with during the test.
  2. You use a library that can intercept HTTP requests and return pre-determined responses.
  3. You hide the network dependency behind an abstraction and use a test double rather than the production implementation in your tests.

Option #1 (a complete end-to-end test) is the most realistic setup and gives you the highest confidence in your code. Unfortunately, it's not always feasible: you might not have access to the service, or it might be too expensive to run an isolated instance for each test (e.g. a deep microservice architecture would require you to run a lot of services since each service may depend on others).

Option #3 has been explored in the mocking section of the workshop, so let's set it aside for now.

Option #2 is a middle-ground: you're still running the production implementation of your HTTP client, therefore exercising the whole stack (from your code to the network and back), but you're dodging the complexity of running an actual test instance of the external service.
The downside: you need to make sure that your mocked responses are in sync with the real service. If the service changes its API or behaviour, you need to update your mocks accordingly.

In this section, we'll explore option #2 using wiremock.

Exercise

The exercise for this section is located in 07_http_mocking/00_intro

wiremock

The wiremock crate is a loose port of the well-known WireMock library from Java.

How does it work?

The core idea in wiremock is simple: you start a server that listens for HTTP requests and returns pre-determined responses. The rest is just sugar to make it easy to define matching rules and expected responses.

MockServer

MockServer is the interface to the test server.
When you call MockServer::start(), a new server is launched on a random port. You can retrieve the base URL of the server with MockServer::uri().

#[tokio::test]
async fn test() {
    let mock_server = MockServer::start().await;
    let base_url = mock_server.uri();
    // ...
}

wiremock uses a random port for each MockServer so that you can run your tests in parallel.
If you were to use the same port across multiple tests, you'd be forced to run them sequentially, which can be a significant performance hit.

Writing testable code, HTTP client edition

Let's assume that we have a function that sends a request to GitHub's API to retrieve the tag of the latest release for a given repository:

use reqwest::Client;

async fn get_latest_release(client: &Client, repo: &str) -> Result<String, reqwest::Error> {
    let url = format!("https://api.github.com/repos/{repo}/releases/latest");
    let response = client.get(&url).send().await?;
    let release = response.json::<serde_json::Value>().await?;
    let tag = release["tag_name"].as_str().unwrap();
    Ok(tag.into())
}

As it stands, this function cannot be tested using wiremock.

1. Take base URLs as arguments

We want the code under the test to send requests to the MockServer we created in the test.
We can't make that happen if the base URL of the external service is hard-coded in the function.

Base URLs must be passed as arguments to the code under test:

use reqwest::Client;

async fn get_latest_release(client: &Client, github_base_uri: http::Uri, repo: &str) -> Result<String, reqwest::Error> {
    let endpoint = format!("{github_base_uri}/repos/{repo}/releases/latest");
    let response = client.get(&endpoint).send().await?;
    let release = response.json::<serde_json::Value>().await?;
    let tag = release["tag_name"].as_str().unwrap();
    Ok(tag.into())
}

2. If you need to hard-code a base URL, do it close to the binary entrypoint

If we need to hard-code a base URL, it is better to do it in the main function, or as close to the binary entrypoint as possible. This limits the scope of difficult-to-test code. In particular, the binary becomes a very thin (and boring) layer around a library that can be tested in isolation.

Even better: take the base URL as part of your application configuration.

Mock

You have a MockServer and the code under test has been refactored to make the base URL configurable. What now? You need to configure MockServer to respond to incoming requests using one or more Mocks.

A Mock lets you define:

  • Preconditions (e.g. assertions on the requests received by the server)
  • Expectations (e.g. how many times you expect the method to be called)
  • Response values (e.g. what response should be returned to the caller)

Yes, this is very similar to mockall!

In an example test:

use wiremock::{MockServer, Mock, ResponseTemplate};
use wiremock::matchers::method;

#[tokio::test]
async fn test() {
    let mock_server = MockServer::start().await;

    // Precondition: do what follows only if the request method is "GET"
    Mock::given(method("GET"))
        // Response value: return a 200 OK
        .respond_with(ResponseTemplate::new(200))
        // Expectation: panic if this mock doesn't match at least once
        .expect(1..)
        .mount(&mock_server)
        .await;

    // [...]
}

A Mock doesn't take effect until it's registered with a MockServer. You do that by calling Mock::mount and passing the MockServer as an argument, as in the example above.

Expectations

Setting expectations on a Mock is optional: use them when you want to test how your code interacts with the dependency that's being mocked, but don't overdo it.
Expectations, by default, are verified when the MockServer is dropped. We'll look at other verification strategies in a later section.
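
Putting the pieces together: assuming the refactored get_latest_release from earlier is in scope, a test against the mock server might look roughly like the sketch below. The repository name and the JSON payload are made up; set_body_json is used to return a canned body.

use reqwest::Client;
use wiremock::{MockServer, Mock, ResponseTemplate};
use wiremock::matchers::method;

#[tokio::test]
async fn returns_the_latest_tag() {
    let mock_server = MockServer::start().await;

    // Return a canned GitHub-like payload for any GET request.
    Mock::given(method("GET"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_json(serde_json::json!({ "tag_name": "v1.2.3" })),
        )
        .expect(1)
        .mount(&mock_server)
        .await;

    // Point the code under test at the mock server instead of api.github.com.
    let base_uri: http::Uri = mock_server.uri().parse().unwrap();
    let tag = get_latest_release(&Client::new(), base_uri, "mainmatter/some-repo")
        .await
        .unwrap();

    assert_eq!(tag, "v1.2.3");
}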

Exercise

The exercise for this section is located in 07_http_mocking/01_basics

Matchers

When configuring a Mock, you can specify one or more matchers for incoming requests.
The Mock is only triggered if the incoming request satisfies all the matchers attached to it.

Common matchers

The wiremock crate provides an extensive collection of matchers out of the box.
Check out the documentation of the matchers module for the full list.

Writing your own matchers

Occasionally, you'll need to write your own matchers, either because you need to match on a property that's not supported by the built-in matchers, or because you want to build a higher-level matcher out of existing ones.

To write a custom matcher, you need to implement the Match trait:

pub trait Match: Send + Sync {
    // Required method
    fn matches(&self, request: &Request) -> bool;
}

The trait is quite straightforward. It has a single method, matches, which takes a reference to the incoming Request and returns a bool: true if the request matches, false otherwise.
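
For example, a (made-up) matcher that only accepts requests with a non-empty body:

use wiremock::{Match, Request};

struct NonEmptyBody;

impl Match for NonEmptyBody {
    fn matches(&self, request: &Request) -> bool {
        // `Request` exposes the received body as bytes.
        !request.body.is_empty()
    }
}

// It can then be combined with the built-in matchers:
//
//   Mock::given(method("POST")).and(NonEmptyBody)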

Exercise

The exercise for this section is located in 07_http_mocking/02_match

Checkpoints

When a MockServer instance goes out of scope (i.e. when it's dropped), it will verify that all the expectations that have been set on its registered mocks have been satisfied.

When you have a complex mocking setup, it can be useful to verify the state of the mocks before the end of the test.
wiremock provides two mechanisms for this purpose:

  • MockServer::verify, which eagerly checks that the expectations set on all registered mocks have been satisfied;
  • scoped mocks, which verify their expectations as soon as they go out of scope.

verify is self-explanatory, so let's dive into scoped mocks.

Scoped mocks

When you register a mock with MockServer::register, it stays active until the MockServer instance goes out of scope.
MockServer::register_as_scoped, instead, returns a MockGuard.
The mock stays active only for as long as the guard is alive: when the guard goes out of scope, the mock is removed from the MockServer instance and its expectations are verified.
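
As a minimal sketch, using Mock::mount_as_scoped (the Mock-side counterpart of MockServer::register_as_scoped); the 404 fallback is wiremock's default behaviour for unmatched requests:

use wiremock::{MockServer, Mock, ResponseTemplate};
use wiremock::matchers::method;

#[tokio::test]
async fn scoped_mock() {
    let mock_server = MockServer::start().await;

    {
        // The mock stays active only while `_guard` is alive.
        let _guard = Mock::given(method("GET"))
            .respond_with(ResponseTemplate::new(200))
            .expect(1)
            .mount_as_scoped(&mock_server)
            .await;

        // This request matches the scoped mock.
        let status = reqwest::get(mock_server.uri()).await.unwrap().status();
        assert_eq!(status.as_u16(), 200);
    }
    // `_guard` has been dropped: the mock is gone and its expectations
    // have been verified at this checkpoint.

    // No mock matches anymore, so the server falls back to a 404.
    let status = reqwest::get(mock_server.uri()).await.unwrap().status();
    assert_eq!(status.as_u16(), 404);
}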

Exercise

The exercise for this section is located in 07_http_mocking/03_checkpoints

Outro

wiremock is an example of transferable knowledge: once you've learned how to use a mocking library (e.g. mockall) you can apply the same patterns to any other library in the same category.
You just need to learn the specifics of the new domain (HTTP, in this case), but the general approach remains the same.

Onwards

You are done with full-fidelity testing techniques.
In the next section, you'll take matters into your own hands. You'll be building your own test runners and custom test macros!

Exercise

The exercise for this section is located in 07_http_mocking/04_outro

Test macros

In the previous sections you've had a chance to see quite a few "custom" test macros in action: #[googletest::test], #[tokio::test], #[sqlx::test]. Sometimes you even combined them, stacking them on top of each other!

In this section, you'll learn why these macros exist and how to build your own.

The default toolkit is limited

cargo test and #[test] are the two building blocks of the Rust testing ecosystem, the ones available to you out of the box.
They are powerful, but they lack a few advanced features that you might be familiar with from testing frameworks in other ecosystems:

  • No lifecycle hooks. You can't easily execute code before or after a test case. That's a requirement if you want to set up and tear down external resources (e.g. a database, like in #[sqlx::test]).
  • No fixtures. You can't inject types into the signature of a test function and expect the test framework to instantiate them for you (e.g. like PgPool with #[sqlx::test]).
  • No parameterised tests. You can't run the same test with different inputs and have each input show up as a separate test case in the final test report (e.g. see rstest).
  • No first-class async tests. Rust doesn't ship with a default executor, so you can't write async tests without pulling in a third-party crate. Macros like #[tokio::test], under the hood, rewrite your async test function as a sync function that calls block_on.

Macros to the rescue

Custom test macros are a way to augment the default toolkit with the features you need.
All the macros we mentioned so far are attribute procedural macros.
Procedural macros are token transformers. As input, they receive:

  • A stream of tokens, representing the Rust code that's been annotated with the macro;
  • A stream of tokens, representing the arguments passed to the macro.

As output, they return another stream of tokens, the Rust code that will actually be compiled as part of the crate that used the macro.

Example: #[tokio::test]

Let's look at an example to make things concrete: #[tokio::test].
The #[tokio::test] macro definition looks like this:

use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn test(args: TokenStream, item: TokenStream) -> TokenStream {
    // [...]
}

If you use #[tokio::test] on a test function, we can see the two streams of tokens in action:

#[tokio::test(flavor = "multi_thread")]
async fn it_works() {
    assert!(true);
}

  • The first stream of tokens (args) contains the arguments passed to the macro: flavor = "multi_thread".
  • The second stream of tokens (item) contains the Rust code that's been annotated with the macro: async fn it_works() { assert!(true); }.
  • The output stream, instead, will look like this:

#[test]
fn it_works() {
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap()
        .block_on(async {
            assert!(true);
        })
}

Objectives

This is not a workshop on procedural macros, so we won't be exploring advanced macro-writing techniques.
Nonetheless, a basic understanding of how macros work and a few exercises can go a long way: you don't need to know that much about macros to write your own test macro!

That's the goal of this section.

Exercise

The exercise for this section is located in 08_macros/00_intro

Your first macro

Let's start from the basics: you'll write a macro that does nothing. It just re-emits the code that's been annotated with the macro, unchanged.
This will give you a chance to get familiar with the overall setup before moving on to more complex endeavors.

proc-macro = true

You can't define a procedural macro in a "normal" library crate.
Procedural macros need to live in a separate crate, with a Cargo.toml that includes this key:

[lib]
proc-macro = true

That key tells cargo that this crate contains procedural macros and it should be compiled accordingly.

#[proc_macro_attribute]

There are various kinds of procedural macros:

  • Function-like macros. Their invocation looks like a function call (e.g. println!).
  • Derive macros. They're specified inside a derive attribute (e.g. #[derive(Debug)]).
  • Attribute procedural macros. They're applied to items as attributes (e.g. #[tokio::test]).

For a test macro, we need an attribute procedural macro.
As you've learned in the intro, it's a function that's annotated with #[proc_macro_attribute]:

use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn my_attribute_macro(args: TokenStream, item: TokenStream) -> TokenStream {
    // [...]
}

The proc_macro crate is distributed as part of the Rust toolchain, just like the standard library, std.
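
For reference, a complete no-op attribute macro fits in a handful of lines. This is only a sketch of the kind of macro the exercise asks for:

use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn vanilla_test(_args: TokenStream, item: TokenStream) -> TokenStream {
    // Ignore the macro arguments and re-emit the annotated item unchanged.
    item
}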

Exercise

The exercise for this section is located in 08_macros/01_no_op_macro

Parsing tokens

In the previous exercise, both #[vanilla_test] and the default #[test] macro had to be specified on top of the test function. Without adding #[test], the annotated function is not picked up by the test runner.

Detecting existing attributes

You'll augment #[vanilla_test]:

  • If the annotated function has been annotated with #[test], it should emit the code unchanged.
  • If the annotated function has not been annotated with #[test], it should add #[test] to the function.

This is how #[googletest::test] works, for example.

The toolkit

When the macro game gets serious, you can't get by with the built-in proc_macro crate alone.
Almost all macros written in Rust are built on top of three ecosystem crates:

  • syn for parsing tokens into abstract syntax tree nodes (AST nodes)
  • quote for expressing the generated code with a println!-style syntax
  • proc-macro2, a wrapper around proc_macro's types
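
Putting those crates to work on the task above, a possible sketch looks like this (assuming syn 2.x with its "full" feature enabled; the exercise's actual solution may differ):

use proc_macro::TokenStream;
use quote::quote;
use syn::ItemFn;

#[proc_macro_attribute]
pub fn vanilla_test(_args: TokenStream, item: TokenStream) -> TokenStream {
    // Parse the annotated function into an AST node.
    let test_fn = syn::parse_macro_input!(item as ItemFn);

    // Is the function already annotated with `#[test]`?
    let has_test_attr = test_fn.attrs.iter().any(|attr| attr.path().is_ident("test"));

    let output = if has_test_attr {
        // Re-emit the function unchanged.
        quote! { #test_fn }
    } else {
        // Add `#[test]` on top of the function.
        quote! {
            #[test]
            #test_fn
        }
    };
    output.into()
}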

Exercise

The exercise for this section is located in 08_macros/02_test

Parsing arguments

Believe it or not, you've now touched the entirety of the core macro ecosystem.
From now onwards, it's all about exploring the crates further while learning the intricacies of the Rust language: you're continuously faced with weird edge cases when writing macros for a broad audience.

Arguments

But it's not over yet!
Let's get you to exercise these muscles a bit more before moving on to the next topic.

Our #[vanilla_test] macro is still a bit too vanilla.
We have now renamed it to #[test], and we have higher expectations: it should support arguments!

If a before argument is specified, the macro should invoke it before the test function.
If an after argument is specified, the macro should invoke it after the test function.
It should be possible to specify both on the same test.
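
To make the goal concrete, here's a hypothetical input and one possible expansion. The setup/teardown names and the exact argument syntax are illustrative, not prescribed by the exercise:

// Hypothetical usage:
//
//   #[test(before = setup, after = teardown)]
//   fn my_test() {
//       assert!(true);
//   }

fn setup() { /* e.g. create a scratch directory */ }
fn teardown() { /* e.g. clean it up */ }

// One possible expansion:
#[test]
fn my_test() {
    setup();
    assert!(true);
    teardown();
    // (A more robust expansion would make sure `teardown` also runs on panic.)
}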

Caution

The happy case is often not that difficult when writing macros.
The challenge is returning good error messages when things go wrong.

In this exercise, a lot of things can go wrong:

  • The item passed to the macro as before or after is not a function
  • The item passed to the macro as before or after is a function that takes arguments
  • The item passed to the macro as before or after is a function, but it's not in scope
  • Etc.

You can often overlook most of these issues if you're writing a macro for your own use. But they become important when you're writing a macro for a larger audience.

Exercise

The exercise for this section is located in 08_macros/03_hooks

Outro

Custom test macros can get you a long way, but they're not a silver bullet.

Complexity

Writing macros is its own skill: you can work with Rust successfully for years without ever having to go beyond a macro_rules! definition.
The next time you get the impulse to write a macro, ask yourself: if a colleague opens this file in 6 months, will they be able to understand what's going on?

Test-scoped

Furthermore, there's a limit to what you can do with custom test macros.
Their action is scoped to a single test case and it's cumbersome to customise the way the whole test suite is run.

Next

In the next chapter, we'll look at one more way to customise your tests: custom test harnesses.

Exercise

The exercise for this section is located in 08_macros/04_outro

Test harnesses

In the interlude we had a first look under the hood of cargo test. In particular, you learned how tests are grouped into executables and reflected on the implications.

In this chapter, we'll take things one step further: you'll write your own test harness!


This section

We'll start by writing a simple test harness, to get familiar with the basics. We'll then explore libtest_mimic, a crate that takes over most of the heavy lifting required to write a high-quality custom test runner.

Let's get started!

Exercise

The exercise for this section is located in 09_test_harness/00_intro

Custom test harness

Test targets

In your past projects you might have had to set properties for your binary ([[bin]]) and library ([lib]) targets in your Cargo.toml.
You can do the same for your test targets!

[[test]]
name = "integration"

The configuration above declares the existence of a test target named integration.
By default, cargo expects to find it in tests/integration.rs. You can also customize the path to the test entrypoint using the path property.

You don't often see [[test]] targets in the wild because cargo infers them automatically—i.e. if you have a tests/integration.rs file, it will automatically be compiled and run as an integration test.

When you see a [[test]] target in a Cargo.toml, it's usually because the author wants to disable the default test harness:

[[test]]
name = "integration"
# 👇 That's enabled by default
harness = false

Test harness

The test harness is the code that cargo invokes to run each of your test suites.

When harness is set to true, cargo automatically creates an entrypoint (i.e. a main function) for your test executable using libtest, the default test harness.

When harness is set to false, cargo expects you to provide your own entrypoint.
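
For instance, with harness = false, tests/integration.rs could be as small as this sketch (the check itself is a stand-in for your real logic):

// tests/integration.rs: compiled as its own binary because `harness = false`.
fn main() {
    // No #[test] collection happens here: you run whatever logic you want.
    let all_passed = run_my_checks();

    // `cargo test` only looks at the exit code: 0 means success.
    if !all_passed {
        std::process::exit(1);
    }
}

fn run_my_checks() -> bool {
    // A stand-in for your actual test logic.
    2 + 2 == 4
}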

Pros and cons

With a custom test harness, you are in charge!
You can execute logic before and after running your tests, you can customise how each test is run (e.g. running them in separate processes), etc.

At the same time, you need to provide an entrypoint that integrates well with cargo test's CLI interface. Listing, filtering, etc. are all features that you'll need to add support for, they don't come for free.

Exercise

The exercise for this section is located in 09_test_harness/01_harness

Quacking like cargo test

As you have seen in the previous exercise, there are no requirements on your test entrypoint beyond... existing.
You can execute arbitrary logic, print in whatever format, etc.
The only thing cargo test cares about is the exit code of your test executable: it must be 0 if all tests passed, and non-zero otherwise.

Integration brings benefits

Your test harness might be custom, but it's still being invoked via cargo test.
As a CLI command, cargo test exposes quite a few knobs: you can list tests, filter them, control the number of threads used to run them, etc.

All those features become demands on your custom test harness: are you going to honor them? Or are you going to ignore them?

The latter is less work, but the resulting behaviour will surprise your user. If I run cargo test <test_name>, I expect only <test_name> to be run, not all tests.
But if your custom test harness ignores CLI arguments, that's exactly what will happen.

The same applies when interacting with other tools—e.g. CI systems. If your test report format is not compatible with cargo test's, you'll have to write a custom adapter to make it work.

libtest_mimic

Matching cargo test's behaviour is a lot of work.
Luckily, you don't have to do it yourself: libtest_mimic can take over most of the heavy lifting.

It provides an Arguments struct that can be used to parse cargo test's CLI arguments.
Arguments is one of the two inputs to its run function; the other is the list of all the tests in your test suite. run interprets the parsed arguments and runs the tests accordingly (listing them, filtering them, etc.). It's a testing framework, so to speak.
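
A minimal harness built on libtest_mimic might look roughly like this (a sketch based on the crate's documented API; the single test is made up):

use libtest_mimic::{Arguments, Trial};

fn main() {
    // Parse the CLI flags that `cargo test` forwards to the harness
    // (filters, `--list`, number of threads, etc.).
    let args = Arguments::from_args();

    // The second input to `run`: every test in the suite.
    let tests = vec![Trial::test("addition_works", || {
        if 2 + 2 == 4 {
            Ok(())
        } else {
            Err("math is broken".into())
        }
    })];

    // `run` interprets the arguments (listing, filtering, ...) and
    // reports results in a libtest-compatible format.
    libtest_mimic::run(&args, tests).exit();
}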

Exercise

The exercise for this section is located in 09_test_harness/02_cli

Outro

A custom test harness gives you a great deal of flexibility, but there are some limitations.

No #[test] attribute

The most obvious one is that you can't use the #[test] attribute.
There is no built-in mechanism to automatically collect all annotated tests, as cargo test does with #[test].
You either have to manually register your tests (e.g. as you did in the previous exercise with that vector) or find a way to automatically collect them (e.g. by establishing a file naming convention).

You can try to emulate distributed registration using some third-party crates (e.g. linkme or inventory).
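
For illustration, here's what distributed registration could look like with linkme (a sketch based on linkme's documented API; the test function is made up):

use linkme::distributed_slice;

// A slice whose elements can be contributed from anywhere in the crate.
#[distributed_slice]
pub static TESTS: [fn()] = [..];

// Register a test case into the slice.
#[distributed_slice(TESTS)]
static CASE_ADDITION: fn() = addition_works;

fn addition_works() {
    assert_eq!(2 + 2, 4);
}

// The custom harness entrypoint can then iterate over `TESTS`
// instead of maintaining a hand-written list.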

Suite-scoped

Using a custom test harness you can customise how a single test suite is run.
If you need to perform some setup or teardown actions before or after all test suites, you're out of luck: you'd have to design a cross-process communication mechanism to coordinate across the different test binaries.
Alternatively, you can replace cargo test with a different command that takes charge of collecting and running all your test binaries (e.g. cargo-nextest).

Exercise

The exercise for this section is located in 09_test_harness/03_outro

Combining everything together

We've covered a lot of ground together: a new assertion framework, snapshot testing, (auto)mocking, full-fidelity testing for various resources as well as tooling to build custom test macros and harnesses.
I've tried to break each topic down into small bites, empowering you to build up your knowledge incrementally.

It's time to put everything together!

The challenge

You have to design a custom test harness that's going to do the following:

  • Start a (named) Docker container for PostgreSQL before running any tests.
  • Before each test:
    • Create a separate logical database in the container
    • Run migrations on the database
  • Run all tests in parallel, while injecting a PgPool instance for each test
  • After each test:
    • Drop the logical database
  • Stop the Docker container after all tests have completed

I don't have a suite of tests for you here, but please call me in when you're done: I want to see what you come up with!

Exercise

The exercise for this section is located in 10_capstone/00_capstone