Welcome
Welcome to "Advanced Rust testing"!
No application is an island: you need to interact with third-party APIs, databases and who knows what else.
Testing those interactions is tricky, to say the least! This course will focus on expanding your Rust testing toolkit,
going beyond the basic techniques you're already familiar with.
At the end of the course you'll have a strategy to test most of the scenarios that are relevant for a complex Rust
application.
The course assumes you have a good understanding of Rust's basic concepts and want to move beyond the built-in testing toolkit.
Methodology
This course is based on the "learn by doing" principle.
You'll build up your knowledge in small, manageable steps. It has been designed to be interactive and hands-on.
Mainmatter developed this course
to be delivered in a classroom setting, over a whole day: each attendee advances
through the lessons at their own pace, with an experienced instructor providing
guidance, answering questions and diving deeper into the topics as needed.
If you're interested in attending one of our training sessions, or if you'd like to
bring this course to your company, please get in touch.
You can also take the course on your own, but we recommend you find a friend or a mentor to help you along the way should you get stuck.
You can also find solutions to all exercises in the `solutions` branch of the GitHub repository.
Prerequisites
To follow this course, you must install Rust.
If Rust is already installed on your machine, make sure to update it to the latest version:
# If you installed Rust using `rustup`, the recommended way,
# you can update to the latest stable toolchain with:
rustup update stable
You'll also need the `nightly` toolchain, so make sure to install it:
rustup toolchain install nightly
Don't start the course until you have these tools installed and working.
Structure
On the left side of the screen, you can see that the course is divided into sections.
To verify your understanding, each section is paired with an exercise that you need to solve.
You can find the exercises in the companion GitHub repository.
Before starting the course, make sure to clone the repository to your local machine:
# If you have an SSH key set up with GitHub
git clone git@github.com:mainmatter/rust-advanced-testing-workshop.git
# Otherwise, use the HTTPS URL:
#
# git clone https://github.com/mainmatter/rust-advanced-testing-workshop.git
We recommend you work on a branch, so you can easily track your progress and pull updates from the main repository if needed:
cd rust-advanced-testing-workshop
git checkout -b my-solutions
All exercises are located in the `exercises` folder.
Each exercise is structured as a Rust package.
The package contains the exercise itself, instructions on what to do (in `src/lib.rs`), and a mechanism to automatically verify your solution.
You also need to install `ctr` (Check Test Results), a little tool that will be invoked to verify the outcomes of your tests:
# Install `ctr` from the top-level folder of the repository
cargo install --path ctr
wr, the workshop runner
To verify your solutions, we've provided a tool that will guide you through the course.
It is the `wr` CLI (short for "workshop runner").
Install it with:
cargo install --locked workshop-runner
In a new terminal, navigate back to the top-level folder of the repository.
Run the `wr` command to start the course:
wr
`wr` will verify the solution to the current exercise.
Don't move on to the next section until you've solved the exercise for the current one.
We recommend committing your solutions to Git as you progress through the course, so you can easily track your progress and "restart" from a known point if needed.
Enjoy the course!
Author
This course was written by Luca Palmieri, Principal Engineering
Consultant at Mainmatter.
Luca has been working with Rust since 2018, initially at TrueLayer and then at AWS.
Luca is the author of "Zero to Production in Rust",
the go-to resource for learning how to build backend applications in Rust,
and "100 Exercises to Learn Rust", a learn-by-doing introduction to Rust itself.
He is also the author and maintainer of a variety of open-source Rust projects, including `cargo-chef`, Pavex and `wiremock`.
Exercise
The exercise for this section is located in 00_intro/00_welcome
Exercise expectations
By this point you should have all the tools installed and ready to go. Let's discuss how automated verification works in this course.
This is a testing workshop, therefore we need to check that the tests you write behave as expected. It's a bit meta!
It's not enough to know that a test failed: we also need to know why it failed and what message it produced.
We do this by using `ctr`, the custom tool you just installed. It runs the tests in each exercise and compares the outcome with a set of expectations.
You can find those expectations in the `expectations.yml` file.
You should never modify this file. Refer to it in order to understand what the tests are supposed to do,
but don't change it.
Exercise
The exercise for this section is located in 01_better_assertions/00_intro
The built-in testing toolkit
The standard library provides three macros for test assertions: `assert!`, `assert_eq!` and `assert_ne!`.
They're used to check that a condition is true, or that two values are equal or not equal, respectively.
#[test]
fn t() {
    assert!(true);
    assert_eq!(1, 1);
    assert_ne!(1, 2);
}
Panic messages
If the assertion fails, the macro will panic and it'll try to print a useful message for you to understand what went
wrong.
In the case of `assert_eq!` and `assert_ne!`, the message will include the values that were compared.
#[test]
fn t() {
    assert_eq!(1, 2);
}
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `1`,
right: `2`', src/main.rs:2:5
In the case of `assert!`, the message will include the condition that was checked, stringified.
#[test]
fn t() {
    let v = vec![1];
    assert!(v.is_empty());
}
thread 'main' panicked at 'assertion failed: v.is_empty()', src/main.rs:3:5
Custom panic messages
The default panic messages are useful for simple cases, but they fall short in more complex scenarios.
Going back to our `Vec` example, we might want to know what values were in the vector when the assertion failed, or how many elements it actually contained.
That's why all three macros accept an additional (optional) argument: a custom message to print when the assertion
fails.
You've seen this in the previous exercise:
#[test]
fn assertion_with_message() {
    assert_eq!(2 + 2, 5, "The Rust compiler hasn't read 1984 by George Orwell.")
}
The custom message will be printed in addition to the default message for `assert_eq!` and `assert_ne!`.
For `assert!`, it will replace the default message.
Exercise
The exercise for this section is located in 01_better_assertions/01_std_assertions
Assertion infrastructure
As you've seen in the previous exercise, you can get pretty nice test failure messages with the standard library's assertions if you take the time to write a custom message. That additional friction is a problem, though.
If you don't bother to write a custom message, you'll get a generic error that doesn't help you understand what went wrong. It'll take you longer to fix tests.
If you choose to bother, you don't want to write the same custom message over and over again. You want to write it once
and reuse it.
You end up writing a custom assertion function, like we did in the previous exercise.
But you aren't working on this project alone. You have a team! You now need to teach your team that this custom
assertion
function exists if you want to have a consistent testing style across your codebase.
Congrats, you've just written your own assertion library!
Invest where it matters
Don't get me wrong: you should write custom assertions.
Once your project gets complex enough, you will have to write your own matchers.
They'll be bespoke to your domain and they'll help you write tests that are easy to read and maintain.
But that's a tiny fraction of the assertions you'll write.
For all the generic stuff, the kind that stays the same across projects, you don't want to take on the burden of writing and maintaining your own assertion library.
In that area, you want to standardise on an existing library that's well maintained and has a large community. If
you do that, you'll be able to reuse your knowledge across projects and you'll be able to find help online when you need
it.
You can always choose to contribute to the library if you find a bug or a missing feature.
googletest
There are a few options when it comes to assertion libraries for Rust.
We'll use `googletest` in this workshop.
It's a Rust port of the famous GoogleTest C++ testing library.
It comes, out of the box, with a rich set of matchers and a nice way to write custom ones. It also includes
a few useful macros for more complex testing scenarios—we'll explore them in the coming exercises.
Exercise
The exercise for this section is located in 01_better_assertions/02_googletest
Basic matchers
To truly leverage a testing library like `googletest`, you need to get familiar with its built-in matchers.
They're the building blocks of your assertions and they need to roll off your fingers as easily as `assert_eq!` does.
We'll spend this exercise and a few more to get familiar with the most common matchers, starting with the most basic ones.
Tooling helps: coding assistants like GitHub Copilot or Cody will start suggesting the right matchers as you type if you've already used them in a few tests in the same project.
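Here is a minimal sketch of what a basic googletest assertion looks like (it assumes the crate's prelude is in scope; the values are made up):
use googletest::prelude::*;

#[test]
fn addition_works() {
    let sum = 2 + 2;
    // `eq` is the most basic matcher: it checks for equality, but produces
    // a richer failure message than `assert_eq!` when it fails.
    assert_that!(sum, eq(4));
}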
Exercise
The exercise for this section is located in 01_better_assertions/03_eq
Option and Result matchers
`googletest` comes with a few special matchers for `Option` and `Result` that return good error messages when something that should be `Some` or `Ok` is actually `None` or `Err`, and vice-versa.
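As a rough sketch of how these matchers read in a test (the values are made up; check the googletest documentation for the exact signatures in your version):
use googletest::prelude::*;

#[test]
fn option_and_result_matchers() {
    let maybe: Option<u32> = Some(4);
    let outcome: Result<u32, String> = Ok(4);

    // `some` and `ok` wrap a matcher for the contained value.
    assert_that!(maybe, some(eq(4)));
    assert_that!(outcome, ok(eq(4)));
    // `none()` matches an empty Option.
    assert_that!(None::<u32>, none());
}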
Exercise
The exercise for this section is located in 01_better_assertions/04_options_and_results
Enums
The matchers we've seen in the previous exercise are specialised for `Option` and `Result`, but `googletest` also has a more generic matcher to match variants of arbitrary enums (and other patterns).
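The matcher in question is `matches_pattern!`. A hedged sketch (the enum and values are made up, and the exact field-matcher syntax varies slightly between googletest versions):
use googletest::prelude::*;

#[derive(Debug)]
enum Command {
    Move { x: i32, y: i32 },
    Quit,
}

#[test]
fn matches_the_expected_variant() {
    let command = Command::Move { x: 1, y: 2 };
    // Checks the variant and applies a matcher to each field.
    assert_that!(command, matches_pattern!(Command::Move { x: eq(1), y: eq(2) }));
}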
Exercise
The exercise for this section is located in 01_better_assertions/05_enums
Collections
We close our tour of `googletest`'s built-in matchers with a look at specialised matchers for collections.
`googletest` really shines with collections. The matchers are very expressive and can be combined in powerful ways.
Failure messages are also extremely helpful, showing the actual values and highlighting the differences.
Achieving the same level of helpfulness with `assert!` would require a lot of boilerplate!
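To give a flavour of what that looks like, here's a hedged sketch (the values are made up; treat the exact matcher signatures as assumptions and check the docs for your googletest version):
use googletest::prelude::*;

#[test]
fn collection_matchers() {
    let values = vec![2, 3, 1];

    // Order-insensitive comparison of all elements.
    assert_that!(values, unordered_elements_are![eq(1), eq(2), eq(3)]);
    // `each` applies a matcher to every element.
    assert_that!(values, each(gt(0)));
    // `contains` succeeds if at least one element matches.
    assert_that!(values, contains(eq(3)));
}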
Exercise
The exercise for this section is located in 01_better_assertions/06_collections
Custom matchers
Built-in matchers can only take you so far. Sometimes you need to write your own!
The Matcher trait
All matchers must implement the `Matcher` trait. There are two key methods you need to implement:
- `matches`. It returns `MatcherResult::Match` if it matched, `MatcherResult::NoMatch` otherwise.
- `describe`. It returns a description of the outcome of the match. This is shown to the user when the match fails.
Optionally, you can also implement the `explain_match` method if you want to include further information derived from the actual and expected values in the failure message shown to the user.
Patterns
Most matchers in `googletest` follow the same pattern.
You define two items:
- A struct which implements the `Matcher` trait (e.g. `EqMatcher`)
- A free function that returns an instance of the struct (e.g. `eq`)
The free function is a convenience for the user since it results in terser assertions.
You can also choose to make the struct type private, returning `impl Matcher` from the free function instead (see `anything` as an example).
Exercise
The exercise for this section is located in 01_better_assertions/07_custom_matcher
expect_that!
All your `googletest` tests so far have used the `assert_that!` macro.
If the assertion fails, it panics and the test fails immediately. No code after the assertion is executed.
expect_that!
`googletest` provides another macro, called `expect_that!`.
It uses the same matchers as `assert_that!`, but it doesn't panic if the assertion fails.
When the test ends (either because the entire test function has been executed or because it panicked later on), `googletest` will check if any `expect_that!` assertions failed and report them as test failures.
This allows you to write tests that check multiple things and report all the failures at once.
A good use case is verifying multiple properties on the same object.
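For instance, a sketch along these lines (the struct and its fields are made up; note that `expect_that!` requires the `#[googletest::test]` attribute):
use googletest::prelude::*;

struct Invoice {
    total_cents: u64,
    currency: String,
}

#[googletest::test]
fn invoice_has_expected_fields() {
    let invoice = Invoice { total_cents: 1000, currency: "EUR".into() };

    // Both assertions are evaluated; if both fail, both failures are reported.
    expect_that!(invoice.total_cents, eq(1000));
    expect_that!(invoice.currency.as_str(), eq("EUR"));
}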
Exercise
The exercise for this section is located in 01_better_assertions/08_expect_that
Snapshot testing
In all the tests we've written so far we've always manually created the expected value.
This is fine for simple cases, but it can quickly become cumbersome when the expected value is complex
(e.g. a large JSON document) and it needs to be updated fairly often (e.g. the responses of a downstream API
service that's under active development).
To solve this problem we can use snapshot testing.
You snapshot the output of an operation and compare it with a previously saved snapshot.
You then review the changes and decide whether they are expected or not: if they are, the snapshot can be updated automatically.
insta
`insta` is an established snapshot testing library for Rust.
It comes with a CLI, `cargo-insta`, which we'll use to manage our snapshots.
Install it before moving forward:
cargo install --locked cargo-insta
Exercise
The exercise for this section is located in 02_snapshots/00_intro
Your first snapshots
insta macros
To work with snapshots, we need to use `insta`'s assertion macros.
There's one macro for each format we want to compare:
- `assert_snapshot!` for strings
- `assert_debug_snapshot!` to compare the `Debug` representation of a value with a snapshot
- `assert_display_snapshot!` to compare the `Display` representation of a value with a snapshot
- `assert_json_snapshot!` to compare JSON values
- etc. for other formats (check the documentation for a complete list)
You always want to use the most specific macro available, since it will give you better error messages thanks to the more specific comparison logic.
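As a minimal sketch, a string snapshot test looks like this (the value is made up):
use insta::assert_snapshot;

#[test]
fn greeting_snapshot() {
    let greeting = format!("Hello, {}!", "world");
    // On the first run this records a snapshot; on later runs it compares
    // the value against the saved snapshot and fails if they diverge.
    assert_snapshot!(greeting);
}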
insta review
The key command exposed by `insta`'s CLI is `cargo insta review`.
It will compare the snapshots generated by your last test run with the ones you had previously saved.
Exercise
The exercise for this section is located in 02_snapshots/01_snapshots
Where do snapshots go?
Inline
In the previous exercise we used an inline snapshot.
Inline snapshots are stored in the test itself:
#[test]
fn snapshot() {
    let m = "The new value I want to save";
    assert_snapshot!(m, @"The old snapshot I want to compare against")
}
When you update the snapshot, the test source code is modified accordingly. Check the `lib.rs` file of the previous exercise again to see it for yourself!
External
Storing the snapshot inline has its pros: when you look at a test, you can immediately see what the expected value is.
It becomes cumbersome, however, when the snapshot is large: it clutters the test and makes it harder to read.
For this reason, `insta` supports external snapshots.
They are stored in a separate file and retrieved on the fly when the test is run:
#[test]
fn snapshot() {
    let m = "The new value I want to save";
    assert_snapshot!(m)
}
By default, file snapshots are stored in a `snapshots` folder right next to the test file where they are used.
The name of the file is `<module>__<name>.snap`, where `<name>` is derived automatically from the test name.
You can choose to set a custom name if you want to:
#[test]
fn snapshot() {
    let m = "The new value I want to save";
    assert_snapshot!("custom_snapshot_name", m)
}
Exercise
The exercise for this section is located in 02_snapshots/02_storage_location
Handling non-reproducible data
Sometimes the data you want to snapshot cannot be reproduced deterministically in different runs of the test.
For example, it might contain the current timestamp or a random value.
In these cases, you can use redactions to remove the non-reproducible parts of the data before taking the snapshot (and before comparing it with the saved one).
Redactions
Redactions are specified as an additional argument of the assertion macro you're using.
They only work for structured formats (e.g. JSON, XML, etc.). If you're snapshotting a string, you can use
regex filters instead.
Redactions use a `jq`-style syntax to specify the parts of the data you want to remove: refer to the documentation for an exhaustive reference.
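A hedged sketch of a JSON redaction (it assumes insta's `json` and `redactions` features are enabled; the payload is made up):
use insta::assert_json_snapshot;
use serde_json::json;

#[test]
fn user_payload_snapshot() {
    let user = json!({
        "name": "Alice",
        // In real code this would change on every run.
        "created_at": "2024-05-17T09:30:00Z",
    });
    // The redaction replaces the non-reproducible field before comparing.
    assert_json_snapshot!(user, {
        ".created_at" => "[timestamp]"
    });
}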
Exercise
The exercise for this section is located in 02_snapshots/03_redactions
Outro
Congrats, you just made it to the end of our section on snapshot testing!
Snapshot testing is a surprisingly simple technique, but it can be a real game changer.
Error messages, `Debug` representations, API responses, macro expansions: the list of things you can test more easily with snapshots is long!
If you have any questions around `insta`, this is a good time to pull me over and ask them!
Exercise
The exercise for this section is located in 02_snapshots/04_outro
Mocking
We love to think about software as a collection of small, well-defined units that are then composed together
to create more complex behaviours.
Real codebases are rarely that simple, though: they often contain complex interactions with external services,
tangled dependencies, and a lot of incidental complexity.
Those dependencies make testing harder.
Example: a login endpoint
Let's look at an example:
async fn login(
    request: &HttpRequest,
    database_pool: &DatabasePool,
    auth0_client: &Auth0Client,
    rate_limiter: &RateLimiter,
) -> Result<LoginResponse, LoginError> {
    // [...]
}
The `login` function has four dependencies: the incoming HTTP request, a database connection pool, an Auth0 client, and a rate limiter.
To invoke `login` in your tests, you need to provide all of them.
Let's make the reasonable assumption that `login` is asking for those dependencies because it needs them to do its job.
Therefore you can expect queries and HTTP requests to be made when you invoke it in your tests. Something
needs to handle those queries and requests, otherwise you won't be able to exercise the scenarios you care about.
A spectrum
When it comes to testing, all approaches exist on a spectrum.
On one end, you have full-fidelity testing: you run your code with a setup that's as close as possible to the production environment. A real database, a real HTTP client, a real rate limiter.
On the other end, you have test doubles: you replace your dependencies with alternative implementations that are easier to create and control in your tests.
Full-fidelity testing gives you the highest confidence in your code, but it can be expensive to set up and maintain.
Test doubles are cheaper to create, but they can be a poor representation of the real world.
This course
During this course, we'll cover both approaches.
We'll see how to implement full-fidelity testing for filesystem, database, and HTTP interactions.
We'll also explore how to use test doubles when full-fidelity testing is not feasible or convenient.
Let's start from test doubles!
Exercise
The exercise for this section is located in 03_mocks/00_intro
Refactor to an interface
Let's look again at the `login` function from the README of the previous exercise:
async fn login(
    request: &HttpRequest,
    database_pool: &DatabasePool,
    auth0_client: &Auth0Client,
    rate_limiter: &RateLimiter,
) -> Result<LoginResponse, LoginError> {
    // [...]
}
You don't want to spin up a real database, a real Auth0 client, and a real rate limiter in your tests; you want
to use test doubles instead.
How do you proceed?
The problem
Rust is a statically typed language.
The `login` function expects four arguments, and each of them has a specific type. There's no way to pass a different type to the function without running into a compiler error.
In order to use test doubles, you need to decouple `login` from specific implementations of its dependencies.
Instead of asking for an `Auth0Client`, you need to ask for something that can act like an `Auth0Client`.
You need to refactor to an interface.
Traits
In Rust, you use traits to define interfaces.
A trait is a collection of methods that can be implemented by concrete types.
Continuing with the `Auth0Client` example, you can define a trait that describes the methods you need to interact with Auth0:
trait Authenticator {
    async fn verify(&self, jwt: &str) -> Result<UserId, VerificationError>;
}
You would then implement this trait for `Auth0Client`:
impl Authenticator for Auth0Client {
    async fn verify(&self, jwt: &str) -> Result<UserId, VerificationError> {
        // [...]
    }
}
Finally, you would change the signature of `login` to ask for an `Authenticator` instead of an `Auth0Client`:
async fn login<A>(
    request: &HttpRequest,
    database_pool: &DatabasePool,
    authenticator: &A,
    rate_limiter: &RateLimiter,
) -> Result<LoginResponse, LoginError>
where
    A: Authenticator,
{
    // [...]
}
You have successfully refactored to an interface!
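To make the payoff concrete, here's a sketch of a hand-rolled test double for the `Authenticator` trait above (it assumes `UserId` implements `Default`; in a real test you'd return whatever fixture you need):
struct AlwaysAuthenticated;

impl Authenticator for AlwaysAuthenticated {
    async fn verify(&self, _jwt: &str) -> Result<UserId, VerificationError> {
        // Always succeed, regardless of the token.
        Ok(UserId::default())
    }
}

// In a test, `login` can now be exercised without talking to Auth0:
// login(&request, &database_pool, &AlwaysAuthenticated, &rate_limiter).await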
Tread carefully
Refactoring to an interface is the technique you need to master if you want to use test doubles.
All the other exercises in this workshop provide conveniences to reduce the amount of boilerplate you need to write,
but they don't fundamentally move the needle on the complexity of the problem.
This kind of refactoring might not always be easy (nor possible!).
You need to analyze your codebase to determine if it's viable and if it's worth the effort.
You're introducing a layer of indirection that might not be necessary beyond your tests. That's incidental complexity.
In some cases, it might be better to use a full-fidelity testing approach, like the ones you'll see later in
this workshop.
In this example we've used static dispatch to make `login` polymorphic with respect to the `Authenticator` type.
You can also use dynamic dispatch by changing the signature of `login` to ask for a `&dyn Authenticator` (a trait object) instead of an `&A`.
Exercise
The exercise for this section is located in 03_mocks/01_traits
mockall
In the previous exercise you've manually implemented a do-nothing logger for your tests.
It can get tedious to do that for every dependency you want to mock. Let's bring some automation to the party!
Mocking with mockall
`mockall` is the most popular auto-mocking library for Rust.
It's built around the `#[automock]` attribute, which generates a mock implementation of a trait for you.
Let's look at an example:
pub trait EmailSender {
    fn send(&self, email: &Email) -> Result<(), EmailError>;
}
To generate a mock implementation of `EmailSender`, you need to add the `#[automock]` attribute to the trait:
use mockall::automock;

#[automock]
pub trait EmailSender {
    fn send(&self, email: &Email) -> Result<(), EmailError>;
}
`mockall` will generate a struct named `MockEmailSender` with an implementation of the `EmailSender` trait.
Each of the methods in the trait will have a counterpart in the mock struct, prefixed with `expect_`.
By calling `expect_send`, you can configure how `MockEmailSender` will behave when the `send` method is called.
In particular, you can define:
- Preconditions (e.g. assertions on the arguments passed to the method)
- Expectations (e.g. how many times you expect the method to be called)
- Return values (e.g. what the method should return)
In an example test:
#[test]
fn test_email_sender() {
    let mut mock = MockEmailSender::new();
    mock.expect_send()
        // Precondition: do what follows only if the email subject is "Hello"
        .withf(|email| email.subject == "Hello")
        // Expectation: panic if the method is not called exactly once
        .times(1)
        // Return value
        .returning(|_| Ok(()));
    // [...]
}
A word on expectations
Expectations such as `times` are a powerful feature of `mockall`.
They allow you to test how your code interacts with the dependency that's being mocked.
At the same time, they should be used sparingly.
Expectations couple your test to the implementation of the code under test.
Only use expectations when you explicitly want to test how your code interacts with the dependency—e.g.
you are testing a retry mechanism and you want to make sure that it retries according to the configured policy.
Avoid setting `times` expectations on every mock method in your tests just because you can.
Exercise
The exercise for this section is located in 03_mocks/02_mockall
Multiple calls
The problem
Methods on your mock object might be invoked multiple times by the code under test.
In more complex scenarios, you might need to return different values for each invocation. Let's see how to do that.
times
When you add a `times` expectation to a method, `mockall` will use the return value you specified up until the `times`-th invocation.
#[test]
fn test_times() {
    let mut mock = MockEmailSender::new();
    let email = /* */;
    mock.expect_send()
        .times(2)
        .returning(|_| Ok(()));
    mock.send(&email);
    mock.send(&email);
    // This panics!
    mock.send(&email);
}
You can leverage this feature to return different values depending on the number of times a method has been called.
#[test]
fn test_times() {
    let mut mock = MockEmailSender::new();
    let email = /* */;
    mock.expect_send()
        .times(1)
        .returning(|_| Ok(()));
    mock.expect_send()
        .times(1)
        .returning(|_| Err(/* */));
    // This returns Ok(())...
    mock.send(&email);
    // ...while this returns Err!
    mock.send(&email);
}
Sequence
What we have seen so far works well when you need to return different values for different invocations of the same
method.
You can take this one step further by defining a sequence of calls that your mock object should expect, spanning
multiple methods.
#[test]
fn test_sequence() {
    let mut mock = MockEmailSender::new();
    let email = /* */;
    let mut sequence = Sequence::new();
    mock.expect_send()
        .times(1)
        .in_sequence(&mut sequence)
        .returning(|_| Ok(()));
    mock.expect_get_inbox()
        .times(1)
        .in_sequence(&mut sequence)
        .returning(|_| Ok(/* */));
    // This panics because the sequence expected `send` to be called first!
    mock.get_inbox();
}
When using a `Sequence`, you need to make sure that the methods are invoked in the order specified by the sequence.
Invoking `get_inbox` before `send` will cause the test to fail, even if they are both called exactly once.
Exercise
The exercise for this section is located in 03_mocks/03_sequences
Checkpoints
The test may need to use your mock object as part of its setup as well as a dependency of the specific code path under test.
For example:
pub struct Repository(/* ... */);

impl Repository {
    pub fn new<T: UserProvider>(up: T) -> Self {
        // ...
    }

    pub fn query<T: UserProvider>(&self, id: u32, up: T) -> Option<Entity> {
        // ...
    }
}
If you're mocking `UserProvider` and you want to test `Repository::query`, you'll first need to use the mock to call `Repository::new`.
Expectations can leak
To get `Repository::new` to behave as expected, you'll need to set up some expectations on `MockUserProvider`.
You'll also need to set up expectations on `MockUserProvider` for `Repository::query` to behave as expected.
There's a risk that the expectations you set up for `Repository::new` will leak into `Repository::query`: they'll be executed when they shouldn't be, leading to confusing errors in your tests.
This can happen, in particular, when the code in `Repository::new` changes and stops performing one of the calls you set up expectations for.
Checkpoints
To prevent this from happening, you can use two different instances of `MockUserProvider` for those calls.
Alternatively, you can rely on checkpoints.
A checkpoint is a way of saying "Panic unless all expectations up to this point have been met".
In this example, you can use a checkpoint to ensure that the expectations for `Repository::new` are met before you start setting up expectations for `Repository::query`.
#[test]
fn test_repository_query() {
    let mut mock = MockUserProvider::new();
    let mut repo = setup_repository(&mut mock);

    // Set up expectations for Repository::query
    // [...]

    // Call Repository::query
    // [...]
}

fn setup_repository(mock: &mut MockUserProvider) -> Repository {
    // Arrange
    mock.expect_is_authenticated()
        .returning(|_| true);
    // [...]

    // Act
    let repository = Repository::new(mock);

    // Verify that all expectations up to the checkpoint have been met
    mock.checkpoint();

    repository
}
If expectations are not met at the checkpoint, it will panic.
If they are met, the test will continue and all expectations will be reset.
Exercise
The exercise for this section is located in 03_mocks/04_checkpoints
Foreign traits
For `#[automock]` to work, it needs to be applied to the trait definition.
That's not an issue if you're writing the trait yourself, but what if you're using a trait from a third-party crate?
The problem
Rust macros can only access the code of the item they're applied to.
There's no way for macros to ask the compiler "can you give me the trait definition of `Debug`?".
The "solution"
If you want to use `mockall` with a trait from a third-party crate, you'll need to rely on their `mock!` macro and... inline the trait definition in your code.
The syntax is fairly custom: refer to the `mock!` macro documentation for the specifics.
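As a hedged sketch, inlining a hypothetical third-party `Logger` trait might look like this:
use mockall::mock;

mock! {
    // Generates `MockFileLogger`.
    pub FileLogger {}

    // The foreign trait's definition, inlined.
    impl Logger for FileLogger {
        fn log(&self, message: &str);
    }
}

#[test]
fn uses_the_generated_mock() {
    let mut logger = MockFileLogger::new();
    logger.expect_log().times(1).returning(|_| ());
    logger.log("hello");
}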
Exercise
The exercise for this section is located in 03_mocks/05_foreign_traits
Outro
Refactoring to an interface is a key technique that should be in the toolbox of every developer.
Automocking, on the other hand, should be evaluated on a case by case basis: a mock-heavy testing approach
often leads to high-maintenance test suites.
Nonetheless, it's important to play with it at least once, so that you can make an informed decision.
That was the primary goal of this section!
What's next?
The next three sections will zoom in on three different types of external dependencies: the filesystem, databases and HTTP APIs.
We'll look at a few techniques to perform full(er)-fidelity testing when the code under test interacts with these kinds of systems.
Exercise
The exercise for this section is located in 03_mocks/06_outro
The testing system: a look under the hood
We won't move on to the next big topic (filesystem testing) just yet.
Instead, we'll take a moment to understand what we've been using so far: the testing system. What happens when you run `cargo test`?
Different kinds of tests
There are three types of tests in Rust:
- Unit tests
- Integration tests
- Doc tests
Unit tests
Unit tests are the tests you write alongside your non-test code, inside your Rust library or binary.
They're the ones we've been writing so far: an inline module annotated with `#[cfg(test)]` and a bunch of `#[test]` functions.
Integration tests
Integration tests are tests that live outside your Rust library or binary, in the special `tests/` directory.
You don't need to annotate any module with `#[cfg(test)]` here: the compiler automatically assumes that everything in `tests/` is under `#[cfg(test)]`.
Doc tests
Doc tests are tests that live inside your documentation comments.
They're a great way to make sure your examples are always up-to-date and working.
Compilation units
Depending on the type of test, `cargo test` will compile and run your tests in different ways:
- All unit tests defined in the same package are compiled into a single binary and run together (i.e. in a single process).
- All the tests defined under the same top-level item under `tests/` (e.g. a single file `tests/foo.rs` or a single directory `tests/foo/`) are compiled into a single binary and run together in the same process. Different top-level items are compiled into different binaries and run in different processes.
- Each doc test is compiled into a separate binary and run in its own process.
This has a number of consequences:
- Any global in-memory state (e.g. variables behind a `lazy_static!` or `once_cell::Lazy`) is only shared between tests that are compiled into the same binary and run in the same process. If you want to synchronize access to a shared resource across the entire test suite (e.g. a database), you need to use a synchronization primitive that works across processes.
- The more tests you have, the more binaries `cargo test` will need to compile and run. Make sure you're using a good linker to minimize the time spent linking your tests.
- Any process-specific state (e.g. the current working directory) is shared between all the tests that are compiled into the same binary and run in the same process. This means that if you change the current working directory in one test, it will affect other tests that share the same process!
The last point will turn out to be quite relevant in the next section: isolating tests that rely on the filesystem from each other.
All the details above apply specifically to `cargo test`.
If you use a different test runner, you might get different behavior. We'll explore this later in the workshop with `cargo-nextest`.
Exercise
The exercise for this section is located in 04_interlude/00_testing_infrastructure
Test isolation
The code under test doesn't run in a vacuum.
Your tests change the state of the host system, and that state, in turn, affects the outcome of your tests.
Let's use the filesystem as a concrete example. It behaves as a global variable that all your tests share.
If one test creates a file, the other tests will see it.
If two tests try to create a file with the same name, one of them will fail.
If a test creates a file, but doesn't clean it up, the next time the same test runs it might fail.
Those cross-test interactions can make your test suite flaky: tests might pass or fail depending on the order in which they were run. That's a recipe for frustration and wasted time.
We want the best of both worlds: we want to be able to test the effects of our code on the outside world,
but we also want our tests to be isolated from each other.
Each test should behave as if it is the only test running.
The plan
This section (and the next two) will be dedicated to various techniques to achieve test isolation
when using high-fidelity testing.
In particular, we'll look at what happens when your application interacts with the filesystem, databases and
other HTTP APIs.
Exercise
The exercise for this section is located in 05_filesystem_isolation/00_intro
Implicit or explicit?
Testability is a property of software systems.
Given a set of requirements, you can look at implementations with very different levels of testability.
This is especially true when we look at the interactions between the system under test and the host.
Filesystem as a dependency
In Rust, any piece of code can choose to interact with the filesystem. You can create files, read files, delete files,
etc.
It doesn't necessarily show up in the function signature. The dependency can be implicit.
use std::io::{BufReader, BufRead};
use std::path::PathBuf;

fn get_cli_path() -> PathBuf {
    let config = std::fs::File::open("config.txt").unwrap();
    let reader = BufReader::new(config);
    let path = reader.lines().next().unwrap().unwrap();
    PathBuf::from(path)
}
It is suspicious that `get_cli_path` is able to conjure a `PathBuf` out of thin air.
But it's not immediately obvious that it's interacting with the filesystem. It might also be more obfuscated in a real-world codebase (e.g. there might be other inputs).
This is an issue when we want to test `get_cli_path`.
We can create a file called `config.txt` where `get_cli_path` expects it to be, but things quickly become complicated:
- We can't run tests in parallel if they all invoke `get_cli_path` and we need `get_cli_path` to return different values in different tests, since they would all be reading from the same file.
- We need to make sure that the file is deleted after each test, regardless of its outcome, otherwise there might be side-effects that affect the outcome of other tests (either in the same run or in a future run).
Let's see how we can refactor `get_cli_path` to mitigate both issues.
Writing testable code, filesystem edition
1. Take paths as arguments
Instead of hard-coding the path to the config file in `get_cli_path`, we can take it as an argument.
use std::io::{BufReader, BufRead};
use std::path::{PathBuf, Path};

fn get_cli_path(config_path: &Path) -> PathBuf {
    let config = std::fs::File::open(config_path).unwrap();
    let reader = BufReader::new(config);
    let path = reader.lines().next().unwrap().unwrap();
    PathBuf::from(path)
}
2. If you need to hard-code a path, do it close to the binary entrypoint
If we need to hard-code a path, it is better to do it in the `main` function, or as close to the binary entrypoint as possible.
use std::path::PathBuf;
use crate::get_cli_path;

fn main() {
    let config_path = PathBuf::from("config.txt");
    let cli_path = get_cli_path(&config_path);
}
This limits the scope of difficult-to-test code. In particular, the binary becomes a very thin (and boring) layer around a library that can be tested in isolation.
Having a thin binary layer around a library is a common pattern in Rust. It is a good pattern to adopt for testability, beyond the specifics of the filesystem. You'll see more examples of this pattern in action later in the workshop!
tempfile
We've refactored `get_cli_path` to make it easier to test.
But we still need to write those tests!
We have two problems to solve:
- Each test should use a different file, so that they don't interfere with each other and we can run them in parallel.
- We need to make sure that the file is deleted after each test, regardless of its outcome.
This is where the `tempfile` crate comes in handy!
It provides tools to work with temporary files and directories. In this exercise (and the next) we'll focus on how to leverage it!
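As a sketch of where this is going, a test for the refactored `get_cli_path(config_path: &Path)` from above might look like this:
use std::io::Write;
use std::path::PathBuf;
use tempfile::NamedTempFile;

#[test]
fn reads_the_cli_path_from_the_config() {
    // A uniquely-named file in the OS temporary directory, deleted on drop.
    let mut config = NamedTempFile::new().unwrap();
    writeln!(config, "/usr/local/bin/my-cli").unwrap();

    let cli_path = get_cli_path(config.path());
    assert_eq!(cli_path, PathBuf::from("/usr/local/bin/my-cli"));
}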
Exercise
The exercise for this section is located in 05_filesystem_isolation/01_named_tempfile
Temporary files
`NamedTempFile` solved our issues in the previous exercise. But how does it work?
Temporary file directory
Most operating systems provide a temporary file directory.
You can retrieve the path to the temporary directory using `std::env::temp_dir`.
Files created in that directory will be automatically deleted at a later time.
When are temporary files deleted?
When using `NamedTempFile`, there are two deletion mechanisms at play:
- `NamedTempFile` will delete the file when it is dropped. This is robust in the face of panics (if they don't abort!) and is the main mechanism `tempfile` relies on.
- If destructor-based deletion fails, the OS will eventually delete the file since it's in the temporary directory.
The latter mechanism is not guaranteed to run at any specific time, therefore `NamedTempFile` tries to generate a unique filename to minimise the risk of collision with a leaked file.
Security
There are a fair number of OS-specific details to take into account when working with temporary files, but `tempfile` takes care of all of them for us.
In particular, there are some security considerations when working with `NamedTempFile`. When it comes to usage in test suites, you're in the clear.
tempfile()
`tempfile` also provides a `tempfile()` function that returns a special kind of `File`: the OS is guaranteed to delete the file when the last handle to it is dropped.
There's a caveat though: you can't access the path to the file.
Refactoring
We could choose to refactor `get_cli_path` to make `tempfile()` viable for our testing needs.
use std::io::BufRead;
use std::path::PathBuf;

fn get_cli_path<R>(config: R) -> PathBuf
where
    R: BufRead,
{
    let path = config
        .lines()
        .next()
        .expect("The config is empty")
        .expect("First line is not valid UTF-8");
    PathBuf::from(path)
}
We are no longer performing any filesystem operation in `get_cli_path`: the configuration "source" is now abstracted behind the `BufRead` trait.
We could now use `get_cli_path` to process a `File`, a `String` (using `std::io::Cursor`), or other types that implement `BufRead`.
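For instance, a test could now feed the function an in-memory buffer (a sketch; the config contents are made up):
use std::io::Cursor;
use std::path::PathBuf;

#[test]
fn reads_the_cli_path_from_an_in_memory_config() {
    // No filesystem involved: the config "source" is just an in-memory buffer.
    let config = Cursor::new("/usr/local/bin/my-cli\nsomething else");
    assert_eq!(get_cli_path(config), PathBuf::from("/usr/local/bin/my-cli"));
}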
This is a valuable refactoring to have in your toolkit, but it's not a panacea.
You'll still need to deal with that filesystem access at some point. You could move it to the binary entrypoint, but
does it really count as "thin" and "boring"?
You'll probably have logic to handle failures, with different code paths depending on the error. You should test that!
Evaluate on a case-by-case basis whether it's worth it to refactor your code to make it easier to test with something like `tempfile()`.
Exercise
The exercise for this section is located in 05_filesystem_isolation/02_tempfile
Path coupling
Our testing, so far, has been limited to cases where the code interacts with a single isolated file.
Real-world codebases are rarely that simple.
More often than not, you'll have to deal with multiple files and there'll be assumptions as to where they are located relative to each other.
Think of `cargo` as an example: it might load the `Cargo.toml` manifest for a workspace and then go looking for the `Cargo.toml` files of each member crate based on the relative paths specified in the workspace manifest.
If you just create a bunch of `NamedTempFile`s, it won't work: the paths will be completely random and the code will fail to find the files where it expects them.
tempdir
The `tempfile` crate provides a solution for this scenario: `TempDir`.
With the default configuration, it will create a temporary directory inside the system's temporary directory.
You can then create files inside of that directory using the usual `std::fs` APIs, therefore controlling the (relative) paths of the files you create.
When `TempDir` is dropped, it will delete the directory and all its contents.
Working directory
To work with `TempDir` effectively, it helps to structure your code in a way that minimises the number of assumptions it makes about the current working directory.
Every time you're using a relative path, you're relying on the current working directory: you're reading `{working_directory}/{relative_path}`.
The current working directory is set on a per-process basis.
As you learned in the interlude, that implies that it is shared between tests,
since multiple tests can be compiled into the same binary and run in the same process.
Running `std::env::set_current_dir` in one test will affect the outcome of the other tests, which is not what we want.
The solution is to make all paths relative to a configurable root directory.
The root directory is set by the binary entrypoint (e.g. `main`), and it's then passed down to the rest of the codebase.
You can then set the root directory to the path of a `TempDir` in your tests, and you're good to go!
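A sketch of what that looks like in practice (`load_workspace_members` is a hypothetical function that takes the root directory as an argument instead of relying on the current working directory):
use tempfile::TempDir;

#[test]
fn finds_the_member_manifest() {
    // The directory and everything inside it are deleted when `root` is dropped.
    let root = TempDir::new().unwrap();
    std::fs::write(root.path().join("Cargo.toml"), "[workspace]\nmembers = [\"app\"]\n").unwrap();
    std::fs::create_dir(root.path().join("app")).unwrap();
    std::fs::write(root.path().join("app").join("Cargo.toml"), "[package]\nname = \"app\"\n").unwrap();

    let members = load_workspace_members(root.path());
    assert_eq!(members, vec!["app".to_string()]);
}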
Exercise
The exercise for this section is located in 05_filesystem_isolation/03_tempdir
Outro
The patterns (and tools) you've learned in this section should help you write more robust tests for code that interacts with the filesystem.
When everything else fails
Nonetheless, they are not a silver bullet: you might not be in control of the code you're testing (e.g. third-party libraries), or it might be too expensive to refactor it to make it more testable.
In those cases, you can take a more radical approach to isolation: run each test in a separate process and set its working directory to a temporary directory (created via `tempfile::TempDir`).
If you want to go down that route, check out `cargo-nextest`: it runs your tests as isolated processes by default, and it's best suited for this kind of workflow.
Exercise
The exercise for this section is located in 05_filesystem_isolation/04_outro
Database isolation
Let's move on to another external dependency: the database.
We'll use PostgreSQL as our reference database, but the same principles apply to other databases.
The challenge
The database is often the most complex external dependency in an application, especially if it is distributed.
The database is in charge of storing your data, ensuring its integrity in the face of various kinds of
failures and complex concurrent access patterns.
If you want to write a black-box test for your application, you'll have to deal with the database.
The challenge is, in many ways, similar to what we discussed in the previous section about the filesystem:
if we just point our tests to a shared database, we'll end up with spurious failures and a slow test suite,
since the tests will interfere with each other and we'll be forced to execute them sequentially.
The dream
Our goal is to run our tests in parallel, with minimal overhead, and without having to worry about cross-test
interference.
Each test should be able to assume that it is the only test running, and that it can safely modify the database
as it sees fit.
Approaches
1. Use an in-memory database
Instead of using an actual database instance, we replace it with an in-memory database.
Each test creates a separate in-memory database, and we don't have to worry about interference between tests.
It isn't all roses, though:
- You'll have to structure your code so that you can easily swap the database implementation. This will inevitably increase the complexity of your application without adding much value to the production code.
- You're not really testing the system as a whole, since you're not using the same database as your production environment. This is especially problematic if you're using database-specific features. An in-memory database will not behave exactly like your production database, especially when it comes to concurrency and locking. Subtle (but potentially serious) bugs will slip through your test suite.
In-memory databases used to be a popular approach, but they have fallen out of favor in recent years since it has become significantly easier to run instances of real databases on laptops and in CI environments. Thanks Docker!
2. Use uncommitted transactions
Many databases (relational and otherwise) support transactions: a way to group multiple operations into a single
unit of work that either succeeds or fails as a whole.
In particular, you can use transactions to create a "private" view of the database for each test:
what happens in a transaction is not visible to other transactions until it is committed, but it is visible to the
client that created it.
You can leverage this fact to run your tests in parallel, as long as you make sure that each test runs in a separate
transaction that's rolled back at the end of the test.
There are some complexities to this approach:
- When the code under test needs to perform multiple transactions, you end up with nested transactions. In a SQL database, that requires (implicitly) converting your `COMMIT` statements into `SAVEPOINT` statements. Other databases may not support nested transactions at all.
- Rust is a statically typed language. Writing code that can accept both an open transaction and a "simple" connection as the object that represents the database can be... complicated.
3. Use a separate database for each test
Since our goal is to isolate each test, the most straightforward approach is to use a separate database for each test!
Today's laptops, combined with Docker, make this approach feasible even for large test suites.
Our recommendation is to use a different logical database for each test, rather than a physical database (e.g. a separate Docker container for each test). It lowers the overhead, resulting in faster tests.
Our recommendation
We recommend approach #3: a separate database for each test.
It has the lowest impact on your production code and it gives you the highest level of confidence in your tests.
We'll see how to implement it with `sqlx` in the next section.
The exact semantics of transactions depend on the isolation level of the database.
What we describe here is the behavior of the `READ COMMITTED` isolation level, which is the default in PostgreSQL. You need to use an isolation level that doesn't allow dirty reads.
Exercise
The exercise for this section is located in 06_database_isolation/00_intro
Testing with sqlx
Let's try to implement the "one database per test" approach with `sqlx`.
Spinning up a "physical" database
We need to have a "physical" database instance to create a dedicated logical database for each test.
We recommend using an ephemeral Docker container for this purpose. Containers are portable, easy to spin up and tear
down.
If you don't have Docker installed, go get it! You can find instructions here.
In our ideal setup, you'd just execute `cargo test` and the required setup (i.e. spinning up the container) would be executed automatically. We are not quite there yet, though, so for now you'll have to run it manually:
docker run -p 5432:5432 \
-e POSTGRES_PASSWORD=password \
-e POSTGRES_USER=postgres \
-e POSTGRES_DB=postgres \
--name test_db \
postgres:15
Configuring sqlx
For this section, we'll be using `sqlx` to interact with PostgreSQL.
One of the key features provided by `sqlx` is compile-time query validation: when you compile your project, `sqlx` will check that all your queries are valid SQL and that they are compatible with your database schema.
This is done via their custom macros: at compile-time, they issue a statement against a live database to carry
out the validation.
For that reason, we need to provide `sqlx` with a connection string to said database.
The common approach is to define a `.env` file in the root of the project: `sqlx` will automatically read it and use the value of the `DATABASE_URL` variable as the connection string. We'll stick to this approach.
`sqlx` exposes a few different macro variants, but we'll mostly be using `sqlx::query!`.
#[sqlx::test]
`sqlx` itself embraces the "one database per test" approach and provides a custom test attribute, `#[sqlx::test]`, to do the heavy lifting for you.
You add an input parameter to your test function (e.g. `pool: sqlx::PgPool`) and `sqlx` will automatically create a new database and pass a connection pool to your test.
You can find the list of injectable parameters in the `sqlx::test` documentation.
Under the hood, this is what `sqlx` does:
- It connects to the database specified in the `DATABASE_URL` environment variable.
- It creates a new database with a random name.
- (Optional) It runs all the migrations in the `migrations` directory.
- It creates a connection pool to the new database.
- It passes the connection pool to your test function.
- It waits for the test to complete.
- It deletes the database.
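As a sketch, a test using the attribute might look like this (it assumes a `users` table created by one of your migrations; the runtime `sqlx::query` function is used instead of the `query!` macro to keep the example self-contained):
#[sqlx::test]
async fn inserts_and_reads_back(pool: sqlx::PgPool) {
    sqlx::query("INSERT INTO users (name) VALUES ($1)")
        .bind("Alice")
        .execute(&pool)
        .await
        .unwrap();

    let (name,): (String,) = sqlx::query_as("SELECT name FROM users")
        .fetch_one(&pool)
        .await
        .unwrap();
    assert_eq!(name, "Alice");
}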
Exercise
The exercise for this section is located in 06_database_isolation/01_sqlx_test
HTTP mocking
We have looked at the filesystem and at databases. It's time to turn our attention to another network-driven interaction: HTTP requests and responses.
The challenge
Most applications rely on external services to fulfill their purposes.
Communication with these services usually happens over the network.
Your code can have complex interactions with these dependencies. Depending on the data you send and receive, your code might go down very different execution paths. The interaction itself might fail in many different ways, which you must handle appropriately.
For the purpose of this section, we'll assume that all communication happens over HTTP, but similar techniques can be applied to other protocols.
HTTP mocking
How do you test your code in these conditions?
At a high-level, you have three options:
- You run a test instance of the external service that your code can communicate with during the test.
- You use a library that can intercept HTTP requests and return pre-determined responses.
- You hide the network dependency behind an abstraction and use a test double rather than the production implementation in your tests.
Option #1 (a complete end-to-end test) is the most realistic setup and gives you the highest confidence in your code. Unfortunately, it's not always feasible: you might not have access to the service, or it might be too expensive to run an isolated instance for each test (e.g. a deep microservice architecture would require you to run a lot of services since each service may depend on others).
Option #3 has been explored in the mocking section of the workshop, so let's set it aside for now.
Option #2 is a middle-ground: you're still running the production implementation of your HTTP client, therefore
exercising the whole stack (from your code to the network and back), but you're dodging the complexity of running an actual test instance
of the external service.
The downside: you need to make sure that your mocked responses are in sync with the real service. If the service changes
its API or behaviour, you need to update your mocks accordingly.
In this section, we'll explore option #2 using `wiremock`.
Exercise
The exercise for this section is located in 07_http_mocking/00_intro
wiremock
The `wiremock` crate is a loose port of the well-known WireMock library from Java.
How does it work?
The core idea in `wiremock` is simple: you start a server that listens for HTTP requests and returns pre-determined responses. The rest is just sugar to make it easy to define matching rules and expected responses.
MockServer
`MockServer` is the interface to the test server.
When you call `MockServer::start()`, a new server is launched on a random port.
You can retrieve the base URL of the server with `MockServer::uri()`.
#[tokio::test]
async fn test() {
    let mock_server = MockServer::start().await;
    let base_url = mock_server.uri();
    // ...
}
`wiremock` uses a random port in `MockServer` in order to allow you to run tests in parallel.
If you specify the same port across multiple tests, you're then forced to run them sequentially, which can be a significant performance hit.
Writing testable code, HTTP client edition
Let's assume that we have a function that sends a request to GitHub's API to retrieve the tag of the latest release for a given repository:
use reqwest::Client;

async fn get_latest_release(client: &Client, repo: &str) -> Result<String, reqwest::Error> {
    let url = format!("https://api.github.com/repos/{repo}/releases/latest");
    let response = client.get(&url).send().await?;
    let release = response.json::<serde_json::Value>().await?;
    let tag = release["tag_name"].as_str().unwrap();
    Ok(tag.into())
}
As it stands, this function cannot be tested using `wiremock`.
1. Take base URLs as arguments
We want the code under test to send requests to the `MockServer` we created in the test.
We can't make that happen if the base URL of the external service is hard-coded in the function.
Base URLs must be passed as arguments to the code under test:
use reqwest::Client;

async fn get_latest_release(
    client: &Client,
    github_base_uri: http::Uri,
    repo: &str,
) -> Result<String, reqwest::Error> {
    let endpoint = format!("{github_base_uri}/repos/{repo}/releases/latest");
    let response = client.get(&endpoint).send().await?;
    let release = response.json::<serde_json::Value>().await?;
    let tag = release["tag_name"].as_str().unwrap();
    Ok(tag.into())
}
2. If you need to hard-code a base URL, do it close to the binary entrypoint
If we need to hard-code a base URL, it is better to do it in the `main` function, or as close to the binary entrypoint as possible.
This limits the scope of difficult-to-test code. In particular, the binary becomes a very thin (and boring) layer
around a library that can be tested in isolation.
Even better: take the base URL as part of your application configuration.
Mock
You have a `MockServer` and the code under test has been refactored to make the base URL configurable. What now?
You need to configure `MockServer` to respond to incoming requests using one or more `Mock`s.
A `Mock` lets you define:
- Preconditions (e.g. assertions on the requests received by the server)
- Expectations (e.g. how many times you expect the method to be called)
- Response values (e.g. what response should be returned to the caller)
Yes, this is very similar to `mockall`!
In an example test:
use wiremock::{MockServer, Mock, ResponseTemplate};
use wiremock::matchers::method;

#[tokio::test]
async fn test() {
    let mock_server = MockServer::start().await;
    // Precondition: do what follows only if the request method is "GET"
    Mock::given(method("GET"))
        // Response value: return a 200 OK
        .respond_with(ResponseTemplate::new(200))
        // Expectation: panic if this mock doesn't match at least once
        .expect(1..)
        .mount(&mock_server)
        .await;
    // [...]
}
A `Mock` doesn't take effect until it's registered with a `MockServer`.
You do that by calling `Mock::mount` and passing the `MockServer` as an argument, as in the example above.
Expectations
Setting expectations on a Mock is optional: use them when you want to test how your code interacts with the dependency that's being mocked, but don't overdo it.
Expectations, by default, are verified when the MockServer is dropped. We'll look at other verification strategies in a later section.
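Putting the pieces together, a test for the refactored get_latest_release above might look roughly like this, assuming that function is in scope. The JSON body and tag value are minimal stand-ins for GitHub's real response, not data taken from the exercises:

use reqwest::Client;
use serde_json::json;
use wiremock::matchers::method;
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::test]
async fn returns_the_latest_tag() {
    let mock_server = MockServer::start().await;

    // For brevity we only match on the request method here; the matchers
    // module lets you also pin down the path, headers, query string, etc.
    Mock::given(method("GET"))
        .respond_with(ResponseTemplate::new(200).set_body_json(json!({ "tag_name": "v1.2.3" })))
        .expect(1)
        .mount(&mock_server)
        .await;

    let base_uri: http::Uri = mock_server.uri().parse().unwrap();
    let tag = get_latest_release(&Client::new(), base_uri, "rust-lang/rust")
        .await
        .unwrap();

    assert_eq!(tag, "v1.2.3");
    // The `expect(1)` above is verified when `mock_server` is dropped,
    // i.e. at the end of this test.
}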
Exercise
The exercise for this section is located in 07_http_mocking/01_basics
Matchers
When configuring a Mock, you can specify one or more matchers for incoming requests.
The Mock is only triggered if the incoming request satisfies all the matchers attached to it.
Common matchers
The wiremock crate provides an extensive collection of matchers out of the box.
Check out the documentation of the matchers module for the full list.
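For instance, several built-in matchers can be chained with and; the mock below only matches when every condition holds. The path, query parameter and header values are illustrative:

use wiremock::matchers::{header, method, path, query_param};
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::test]
async fn combining_builtin_matchers() {
    let mock_server = MockServer::start().await;

    Mock::given(method("GET"))
        .and(path("/repos/rust-lang/rust/releases/latest"))
        .and(query_param("per_page", "1"))
        .and(header("Accept", "application/vnd.github+json"))
        .respond_with(ResponseTemplate::new(200))
        .mount(&mock_server)
        .await;

    // Requests that don't satisfy *all* the matchers above won't be
    // handled by this mock and will get the mock server's default 404.
}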
Writing your own matchers
Occasionally, you'll need to write your own matchers, either because you need to match on a property that's not supported by the built-in matchers, or because you want to build a higher-level matcher out of existing ones.
To write a custom matcher, you need to implement the Match trait:
pub trait Match: Send + Sync {
    // Required method
    fn matches(&self, request: &Request) -> bool;
}
The trait is quite straightforward. It has a single method, matches, which takes a reference to the incoming Request and returns a bool: true if the request matches, false otherwise.
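As an illustration (not the exercise's solution), here's a hand-rolled matcher that accepts any request whose body is valid JSON, regardless of its exact content:

use wiremock::{Match, Mock, MockServer, Request, ResponseTemplate};

/// Matches any request whose body parses as JSON, whatever its shape.
struct AnyJsonBody;

impl Match for AnyJsonBody {
    fn matches(&self, request: &Request) -> bool {
        serde_json::from_slice::<serde_json::Value>(&request.body).is_ok()
    }
}

#[tokio::test]
async fn accepts_any_json_payload() {
    let mock_server = MockServer::start().await;

    Mock::given(AnyJsonBody)
        .respond_with(ResponseTemplate::new(200))
        .mount(&mock_server)
        .await;

    // ... exercise the code under test ...
}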
Exercise
The exercise for this section is located in 07_http_mocking/02_match
Checkpoints
When a MockServer instance goes out of scope (i.e. when it's dropped), it verifies that all the expectations set on its registered mocks have been satisfied.
When you have a complex mocking setup, it can be useful to verify the state of the mocks before the end of the test.
wiremock provides two methods for this purpose:
- MockServer::verify, which verifies that all the expectations have been satisfied and panics if they haven't.
- Scoped mocks, via MockServer::register_as_scoped.
verify is self-explanatory, so let's dive into scoped mocks.
Scoped mocks
When you register a mock with MockServer::register, it stays active until the MockServer instance goes out of scope.
MockServer::register_as_scoped, instead, returns a MockGuard.
The mock stays active only as long as the guard is alive: when the guard goes out of scope, the mock is removed from the MockServer instance and its expectations are verified.
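A skeleton showing both tools in a single test; the requests that actually exercise the mocks are elided:

use wiremock::matchers::method;
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::test]
async fn checkpoint_example() {
    let mock_server = MockServer::start().await;

    Mock::given(method("GET"))
        .respond_with(ResponseTemplate::new(200))
        .expect(1)
        .mount(&mock_server)
        .await;

    // ... exercise the code under test ...

    // Checkpoint: panic right now if the expectation above hasn't been met.
    mock_server.verify().await;

    // A scoped mock: it's only active while `_guard` is alive.
    let _guard = mock_server
        .register_as_scoped(
            Mock::given(method("POST"))
                .respond_with(ResponseTemplate::new(201))
                .expect(1..),
        )
        .await;

    // ... more test code ...
    // When `_guard` is dropped, the POST mock is removed and its
    // expectations are verified.
}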
Exercise
The exercise for this section is located in 07_http_mocking/03_checkpoints
Outro
wiremock is an example of transferable knowledge: once you've learned how to use a mocking library (e.g. mockall), you can apply the same patterns to any other library in the same category.
You just need to learn the specifics of the new domain (HTTP, in this case), but the general approach remains the same.
Onwards
You are done with full-fidelity testing techniques.
In the next section, you'll take matters into your own hands. You'll be building your own test runners and custom
test macros!
Exercise
The exercise for this section is located in 07_http_mocking/04_outro
Test macros
In the previous sections you've had a chance to see quite a few "custom" test macros in action: #[googletest::test], #[tokio::test], #[sqlx::test]. Sometimes you even combined them, stacking them on top of each other!
In this section, you'll learn why these macros exist and how to build your own.
The default toolkit is limited
cargo test and #[test] are the two building blocks of the Rust testing ecosystem, the ones available to you out of the box.
They are powerful, but they lack a few advanced features that you might be familiar with from testing frameworks in other ecosystems:
- No lifecycle hooks. You can't easily execute code before or after a test case. That's a requirement if you want to set up and tear down external resources (e.g. a database, as in #[sqlx::test]).
- No fixtures. You can't inject types into the signature of a test function and expect the test framework to instantiate them for you (e.g. PgPool with #[sqlx::test]).
- No parameterised tests. You can't run the same test with different inputs and have each input show up as a separate test case in the final test report (e.g. see rstest).
- No first-class async tests. Rust doesn't ship with a default executor, so you can't write async tests without pulling in a third-party crate. Macros like #[tokio::test], under the hood, rewrite your async test function as a sync function with a call to block_on.
Macros to the rescue
Custom test macros are a way to augment the default toolkit with the features you need.
All the macros we mentioned so far are attribute procedural macros.
Procedural macros are token transformers. As input, they receive:
- A stream of tokens, representing the Rust code that's been annotated with the macro;
- A stream of tokens, representing the arguments passed to the macro.
As output, they return another stream of tokens, the Rust code that will actually be compiled as part of the crate that used the macro.
Example: #[tokio::test]
Let's look at an example to make things concrete: #[tokio::test].
The #[tokio::test] macro definition looks like this:
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn test(args: TokenStream, item: TokenStream) -> TokenStream {
    // [...]
}
If you use #[tokio::test] on a test function, we can see the two streams of tokens in action:
#[tokio::test(flavor = "multi_thread")]
async fn it_works() {
    assert!(true);
}
- The first stream of tokens (args) contains the arguments passed to the macro: flavor = "multi_thread".
- The second stream of tokens (item) contains the Rust code that's been annotated with the macro: async fn it_works() { assert!(true); }.
- The output stream, instead, will look like this:
#[test]
fn it_works() {
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap()
        .block_on(async {
            assert!(true);
        })
}
Objectives
This is not a workshop on procedural macros, so we won't be exploring advanced macro-writing techniques.
Nonetheless, a basic understanding of how macros work and a few exercises can go a long way: you don't need to
know that much about macros to write your own test macro!
That's the goal of this section.
Exercise
The exercise for this section is located in 08_macros/00_intro
Your first macro
Let's start from the basics: you'll write a macro that does nothing. It just re-emits the code that's been annotated with the macro, unchanged.
This will give you a chance to get familiar with the overall setup before moving on to more complex endeavors.
proc-macro = true
You can't define a procedural macro in a "normal" library crate.
Procedural macros need to live in a separate crate, with a Cargo.toml that includes this key:
[lib]
proc-macro = true
That key tells cargo that this crate contains procedural macros and that it should be compiled accordingly.
#[proc_macro_attribute]
There are various kinds of procedural macros:
- Function-like macros. Their invocation looks like a function call (e.g. println!).
- Derive macros. They're specified inside a derive attribute (e.g. #[derive(Debug)]).
- Attribute procedural macros. They're applied to items as attributes (e.g. #[tokio::test]).
For a test macro, we need an attribute procedural macro.
As you've learned in the intro, it's a function annotated with #[proc_macro_attribute]:
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn my_attribute_macro(args: TokenStream, item: TokenStream) -> TokenStream {
    // [...]
}
The proc_macro crate is distributed as part of the Rust toolchain, just like the standard library, std.
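Putting the pieces together, the body of a no-op attribute macro can be as short as this; the name vanilla_test anticipates the next section, and your exercise scaffolding may name things differently:

use proc_macro::TokenStream;

/// A no-op attribute: it re-emits the annotated item unchanged.
#[proc_macro_attribute]
pub fn vanilla_test(_args: TokenStream, item: TokenStream) -> TokenStream {
    item
}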
Exercise
The exercise for this section is located in 08_macros/01_no_op_macro
Parsing tokens
In the previous exercise, both #[vanilla_test] and the default #[test] macro had to be specified on top of the test function. Without #[test], the annotated function is not picked up by the test runner.
Detecting existing attributes
You'll augment #[vanilla_test]:
- If the annotated function has been annotated with #[test], it should emit the code unchanged.
- If the annotated function has not been annotated with #[test], it should add #[test] to the function.
This is how #[googletest::test] works, for example.
The toolkit
When the macro game gets serious, you can't get by with the built-in proc_macro crate alone.
Almost all macros written in Rust are built on top of three ecosystem crates:
- syn, for parsing tokens into abstract syntax tree (AST) nodes;
- quote, for expressing the generated code with a println!-style syntax;
- proc-macro2, a wrapper around proc_macro's types.
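To give you a feel for how those crates fit together, here's a rough sketch of the attribute-detection logic described above, assuming syn 2.x; the exercise may require a different shape:

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, ItemFn};

#[proc_macro_attribute]
pub fn vanilla_test(_args: TokenStream, item: TokenStream) -> TokenStream {
    // `syn` turns the raw token stream into an AST node we can inspect.
    let function = parse_macro_input!(item as ItemFn);

    // Is the function already annotated with a plain `#[test]`?
    let has_test_attr = function
        .attrs
        .iter()
        .any(|attr| attr.path().is_ident("test"));

    // `quote!` turns Rust-looking syntax back into a token stream.
    let output = if has_test_attr {
        quote! { #function }
    } else {
        quote! {
            #[test]
            #function
        }
    };
    output.into()
}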
Exercise
The exercise for this section is located in 08_macros/02_test
Parsing arguments
Believe it or not, you've now touched the entirety of the core macro ecosystem.
From now onwards, it's all about exploring the crates further while learning the intricacies of the Rust language:
you're continuously faced with weird edge cases when writing macros for a broad audience.
Arguments
But it's not over yet!
Let's get you to exercise these muscles a bit more before moving on to the next topic.
Our #[vanilla_test] macro is still a bit too vanilla.
We have now renamed it to #[test], and we have higher expectations: it should support arguments!
If a before argument is specified, the macro should invoke it before the test function.
If an after argument is specified, the macro should invoke it after the test function.
It should be possible to specify both on the same test.
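Concretely, the goal is for usage along these lines to work; set_up and tear_down are placeholder names, and the exercise defines the exact argument syntax:

// Assuming the custom macro is in scope, e.g. `use my_test_macros::test;`
// (the crate name is hypothetical).

fn set_up() {
    // e.g. create a scratch directory for the test
}

fn tear_down() {
    // e.g. remove the scratch directory
}

#[test(before = set_up, after = tear_down)]
fn my_test() {
    assert_eq!(2 + 2, 4);
}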
Caution
The happy case is often not that difficult when writing macros.
The challenge is returning good error messages when things go wrong.
In this exercise, a lot of things can go wrong:
- The item passed to the macro as before or after is not a function
- The item passed to the macro as before or after is a function that takes arguments
- The item passed to the macro as before or after is a function, but it's not in scope
- Etc.
You can often overlook most of these issues if you're writing a macro for your own use. But they become important when you're writing a macro for a larger audience.
Exercise
The exercise for this section is located in 08_macros/03_hooks
Outro
Custom test macros can get you a long way, but they're not a silver bullet.
Complexity
Writing macros is its own skill: you can work with Rust successfully for years without ever having to go beyond a macro_rules! definition.
The next time you get the impulse to write a macro, ask yourself: if a colleague opens this file in 6 months,
will they be able to understand what's going on?
Test-scoped
Furthermore, there's a limit to what you can do with custom test macros.
Their action is scoped to a single test case and it's cumbersome to customise the way the whole test suite is run.
Next
In the next chapter, we'll look at one more way to customise your tests: custom test harnesses.
Exercise
The exercise for this section is located in 08_macros/04_outro
Test harnesses
In the interlude we had a first look under the hood of cargo test. In particular, you learned how tests are grouped into executables and reflected on the implications.
In this chapter, we'll take things one step further: you'll write your own test harness!
Test targets
In your past projects you might have had to set properties for your binary ([[bin]]) and library ([lib]) targets in your Cargo.toml.
You can do the same for your test targets!
[[test]]
name = "integration"
The configuration above declares the existence of a test target named integration.
By default, cargo expects to find it in tests/integration.rs. You can also customize the path to the test entrypoint using the path property.
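For example, if the entrypoint lives somewhere non-standard (the path below is just illustrative), you'd write:

[[test]]
name = "integration"
path = "tests/custom/main.rs"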
You don't often see [[test]] targets in the wild because cargo infers them automatically: if you have a tests/integration.rs file, it will automatically be compiled and run as an integration test.
When you see a [[test]] target in a Cargo.toml, it's usually because the author wants to disable the default test harness:
[[test]]
name = "integration"
# 👇 That's enabled by default
harness = false
Test harness
The test harness is the code that cargo invokes to run each of your test suites.
When harness is set to true, cargo automatically creates an entrypoint (i.e. a main function) for your test executable using libtest, the default test harness.
When harness is set to false, cargo expects you to provide your own entrypoint.
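To make that concrete, here's a minimal hand-rolled entrypoint; the checks are placeholders, and the only real contract with cargo test is the process exit code:

// tests/integration.rs, compiled as its own binary because `harness = false`.
fn main() {
    // You're free to run arbitrary logic here: set up resources,
    // print in any format you like, spawn processes, etc.
    println!("running my hand-rolled test suite");

    let all_passed = check_addition() && check_subtraction();

    // Exit code zero means success, non-zero means failure.
    std::process::exit(if all_passed { 0 } else { 1 });
}

fn check_addition() -> bool {
    2 + 2 == 4
}

fn check_subtraction() -> bool {
    4 - 2 == 2
}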
Pros and cons
With a custom test harness, you are in charge!
You can execute logic before and after running your tests, you can customise how each test is run (e.g. running them in separate processes), etc.
At the same time, you need to provide an entrypoint that integrates well with cargo test's CLI interface. Listing, filtering, etc. are all features that you'll need to add support for; they don't come for free.
This section
We'll start by writing a simple test harness, to get familiar with the basics.
We'll then explore libtest_mimic, a crate that takes over most of the heavy lifting required to write a high-quality custom test runner.
Let's get started!
Exercise
The exercise for this section is located in 09_test_harness/00_intro
Custom test harness
Everything you need for this exercise was covered in the intro: declare a [[test]] target for your suite, set harness = false, and provide your own entrypoint.
Exercise
The exercise for this section is located in 09_test_harness/01_harness
Quacking like cargo test
As you have seen in the previous exercise, there are no requirements on your test entrypoint beyond... existing.
You can execute arbitrary logic, print in whatever format you like, etc.
The only thing cargo test cares about is the exit code of your test executable, which must be 0 if all tests passed and non-zero otherwise.
Integration brings benefits
Your test harness might be custom, but it's still being invoked via cargo test.
As a CLI command, cargo test exposes quite a few knobs: you can list tests, filter them, control the number of threads used to run them, etc.
All those features become demands on your custom test harness: are you going to honor them, or are you going to ignore them?
The latter is less work, but the resulting behaviour will surprise your users.
If I run cargo test <test_name>, I expect only <test_name> to be run, not all tests.
But if your custom test harness ignores CLI arguments, that's exactly what will happen.
The same applies when interacting with other tools, e.g. CI systems. If your test report format is not compatible with cargo test's, you'll have to write a custom adapter to make it work.
libtest_mimic
Matching cargo test's behaviour is a lot of work.
Luckily, you don't have to do it yourself: libtest_mimic can take over most of the heavy lifting.
It provides an Arguments struct that can be used to parse cargo test's CLI arguments.
Arguments is one of the two inputs to its run function, the other being the list of all the tests in your suite.
run interprets the parsed arguments and runs the tests accordingly (e.g. listing them, filtering them, etc.). It's a testing framework, so to speak.
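A sketch of what a libtest_mimic-based entrypoint looks like; the test names and bodies are placeholders:

use libtest_mimic::{run, Arguments, Failed, Trial};

fn main() {
    // Parses the CLI flags that `cargo test` forwards to the harness
    // (filters, --list, --test-threads, ...).
    let args = Arguments::from_args();

    // Each `Trial` is one test case: a name plus a function returning
    // `Result<(), Failed>`.
    let tests = vec![
        Trial::test("addition_works", || {
            if 2 + 2 == 4 {
                Ok(())
            } else {
                Err("math is broken".into())
            }
        }),
        Trial::test("strings_concatenate", || {
            assert_eq!(format!("{}{}", "foo", "bar"), "foobar");
            Ok(())
        }),
    ];

    // `run` honours listing, filtering, etc., and reports results in the
    // familiar format, then `exit` sets the right exit code.
    run(&args, tests).exit();
}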
Exercise
The exercise for this section is located in 09_test_harness/02_cli
Outro
A custom test harness gives you a great deal of flexibility, but there are some limitations.
No #[test] attribute
The most obvious one is that you can't use the #[test] attribute.
There is no built-in mechanism to automatically collect all annotated tests, as cargo test does with #[test].
You either have to manually register your tests (e.g. as you did in the previous exercise with that vector) or find a way to automatically collect them (e.g. by establishing a file naming convention).
You can try to emulate distributed registration using third-party crates (e.g. linkme or inventory).
Suite-scoped
Using a custom test harness, you can customise how a single test suite is run.
If you need to perform setup or teardown actions before or after all test suites, you're out of luck: you'd need to design some cross-process communication mechanism to coordinate across the different test binaries.
Alternatively, you can replace cargo test with a different command that takes charge of collecting and running all your test binaries (e.g. cargo-nextest).
Exercise
The exercise for this section is located in 09_test_harness/03_outro
Combining everything together
We've covered a lot of ground together: a new assertion framework, snapshot testing,
(auto)mocking, full-fidelity testing for various resources as well as tooling to build
custom test macros and harnesses.
I've tried to break each topic down into small bites, empowering you to build up your
knowledge incrementally.
It's time to put everything together!
The challenge
You have to design a custom test harness that's going to do the following:
- Start a (named) Docker container for PostgreSQL before running any tests.
- Before each test:
  - Create a separate logical database in the container
  - Run migrations on the database
- Run all tests in parallel, while injecting a PgPool instance into each test
- After each test:
  - Drop the logical database
- Stop the Docker container after all tests have completed
I don't have a suite of tests for you here, but please call me in when you're done: I want to see what you come up with!
Exercise
The exercise for this section is located in 10_capstone/00_capstone