Welcome to Comprehensive Rust 🦀

This is a fork of the original Comprehensive Rust course which is developed by the Android team at Google and many open source contributors. Many thanks to the original authors 🥰

The course covers the full spectrum of Rust, from basic syntax to advanced topics like generics and error handling.

The goal of the course is to teach you Rust. We assume you don’t know anything about Rust and hope to:

  • Give you a comprehensive understanding of the Rust syntax and language.
  • Enable you to modify existing programs and write new programs in Rust.
  • Show you common Rust idioms.

Disclaimer

Rust is a large language and we won’t be able to cover all of it in a few days.

You will not be a Rust expert at the end of the course, but you will be familiar with Rust’s basics.

Attending this course is not an alternative to reading the official Rust book. You should read the book if you seek a deeper understanding of Rust after this course.

Assumptions

The course assumes that you already know how to program. Rust is a statically-typed language and we will sometimes make comparisons with C and C++ to better explain or contrast the Rust approach.

If you know how to program in a dynamically-typed language such as Python or JavaScript, then you will be able to follow along just fine too.

This is an example of a speaker note. We will use these to add additional information to the slides. This could be key points which the instructor should cover as well as answers to typical questions which come up in class.

Using Cargo

When you start reading about Rust, you will soon meet Cargo, the standard tool used in the Rust ecosystem to build and run Rust applications. Here we want to give a brief overview of what Cargo is and how it fits into the wider ecosystem and how it fits into this training.

Installation

Please follow the instructions on https://rustup.rs/.

This will give you the Cargo build tool (cargo) and the Rust compiler (rustc). You will also get rustup, a command line utility that you can use to install to different compiler versions.

After installing Rust, you should configure your editor or IDE to work with Rust. Most editors do this by talking to the language server rust-analyzer, which provides functionality like auto-completion and jump-to-definition for VS Code, Helix and many others. There is also a different IDE available called RustRover (proprietary and paid, not recommended).

  • On some Linux distributions, you can install Rust from the distribution’s repositories. But this is not recommended (at least for the course) since you might get an outdated version or miss some components.

The Rust Ecosystem

The Rust ecosystem consists of a number of tools, of which the main ones are:

  • rustc: the Rust compiler which turns .rs files into binaries and other intermediate formats.

  • cargo: the Rust dependency manager and build tool. Cargo knows how to download dependencies, usually hosted on https://crates.io, and it will pass them to rustc when building your project. Cargo also comes with a built-in test runner which is used to execute unit tests.

  • rustup: the Rust toolchain installer and updater. This tool is used to install and update rustc and cargo when new versions of Rust are released. In addition, rustup can also download documentation for the standard library. You can have multiple versions of Rust installed at once and rustup will let you switch between them as needed.

Key points:

  • Rust has a rapid release schedule with a new release coming out every six weeks. New releases maintain backwards compatibility with old releases — plus they enable new functionality.

  • There are three release channels: “stable”, “beta”, and “nightly”.

  • New features are being tested on “nightly”, “beta” is what becomes “stable” every six weeks.

  • Dependencies can also be resolved from alternative registries, git, folders, and more.

  • Rust also has editions: the current edition is Rust 2021. Previous editions were Rust 2015 and Rust 2018.

    • The editions are allowed to make backwards incompatible changes to the language.

    • To prevent breaking code, editions are opt-in: you select the edition for your crate via the Cargo.toml file.

    • To avoid splitting the ecosystem, Rust compilers can mix code written for different editions.

    • Mention that it is quite rare to ever use the compiler directly not through cargo (most users never do).

    • It might be worth alluding that Cargo itself is an extremely powerful and comprehensive tool. It is capable of many advanced features including but not limited to:

    • Read more from the official Cargo Book

Code Samples in This Training

For this training, we will mostly explore the Rust language through examples which can be executed through your browser. This makes the setup much easier and ensures a consistent experience for everyone.

The code blocks in this course are fully interactive:

fn main() {
    println!("Edit me!");
}

You can use Ctrl + Enter to execute the code when focus is in the text box.

Most code samples are editable like shown above. A few code samples are not editable for various reasons:

  • The embedded playgrounds cannot execute unit tests. Copy-paste the code and open it in the real Playground to demonstrate unit tests.

  • The embedded playgrounds lose their state the moment you navigate away from the page!

Running Code Locally with Cargo

If you want to experiment with the code on your own system, then you will need to first install Rust. Do this by following the instructions in the Rust Book. This should give you a working rustc and cargo. At the time of writing, the latest stable Rust release has these version numbers:

% rustc --version
rustc 1.76.0 (07dca489a 2024-02-04)

% cargo --version
cargo 1.76.0 (c84b36747 2024-01-18)

You can use any later version too since Rust maintains backwards compatibility.

With this in place, follow these steps to build a Rust binary from one of the examples in this training:

  1. Click the “Copy to clipboard” button on the example you want to copy.

  2. Use cargo new exercise to create a new exercise/ directory for your code:

    $ cargo new exercise
         Created binary (application) `exercise` package
    
  3. Navigate into exercise/ and use cargo run to build and run your binary:

    $ cd exercise
    
    $ cargo run
       Compiling exercise v0.1.0 (/home/mgeisler/tmp/exercise)
        Finished dev [unoptimized + debuginfo] target(s) in 0.75s
         Running `target/debug/exercise`
    Hello, world!
    
  4. Replace the boiler-plate code in src/main.rs with your own code. For example, using the example on the previous page, make src/main.rs look like

    fn main() {
        println!("Edit me!");
    }
  5. Use cargo run to build and run your updated binary:

    $ cargo run
       Compiling exercise v0.1.0 (/home/mgeisler/tmp/exercise)
        Finished dev [unoptimized + debuginfo] target(s) in 0.24s
         Running `target/debug/exercise`
    Edit me!
    
  6. Use cargo check to quickly check your project for errors, use cargo build to compile it without running it. You will find the output in target/debug/ for a normal debug build. Use cargo build --release to produce an optimized release build in target/release/.

  7. You can add dependencies for your project by editing Cargo.toml. When you run cargo commands, it will automatically download and compile missing dependencies for you.

Welcome to Day 1

This is the first day of Rust Fundamentals. We will cover a lot of ground today:

  • Basic Rust syntax: variables, scalar and compound types, enums, structs, references, functions, and methods.
  • Types and type inference.
  • Control flow constructs: loops, conditionals, and so on.
  • User-defined types: structs and enums.
  • Pattern matching: destructuring enums, structs, and arrays.

Schedule

In this session:

Including 10 minute breaks, this session should take about 1 hour and 40 minutes

This slide should take about 5 minutes.

Please remind the students that:

  • They should ask questions when they get them, don’t save them to the end.
  • The class is meant to be interactive and discussions are very much encouraged!
    • As an instructor, you should try to keep the discussions relevant, i.e., keep the discussions related to how Rust does things vs some other language. It can be hard to find the right balance, but err on the side of allowing discussions since they engage people much more than one-way communication.
  • The questions will likely mean that we talk about things ahead of the slides.
    • This is perfectly okay! Repetition is an important part of learning. Remember that the slides are just a support and you are free to skip them as you like.

The idea for the first day is to show the “basic” things in Rust that should have immediate parallels in other languages. The more advanced parts of Rust come on the subsequent days.

If you’re teaching this in a classroom, this is a good place to go over the schedule. Note that there is an exercise at the end of each segment, followed by a break. Plan to cover the exercise solution after the break. The times listed here are a suggestion in order to keep the course on schedule. Feel free to be flexible and adjust as necessary!

Hello, World

In this segment:

This segment should take about 15 minutes

What is Rust?

Rust is a relatively new programming language which had its 1.0 release in 2015:

  • Rust is a statically compiled language in a similar role as C++
    • rustc uses LLVM as its backend.
  • Rust supports many platforms and architectures:
    • x86, ARM, WebAssembly, …
    • Linux, Mac, Windows, …
  • Rust is used for a wide range of devices:
    • firmware and boot loaders,
    • smart displays,
    • mobile phones,
    • desktops,
    • servers.
This slide should take about 10 minutes.

Rust fits in the same area as C++:

  • High flexibility.
  • High level of control.
  • Can be scaled down to very constrained devices such as microcontrollers.
  • Has no runtime or garbage collection.
  • Focuses on reliability and safety without sacrificing performance.

Benefits of Rust

Some unique selling points of Rust:

  • Compile time memory safety - whole classes of memory bugs are prevented at compile time

    • No uninitialized variables.
    • No double-frees.
    • No use-after-free.
    • No NULL pointers.
    • No forgotten locked mutexes.
    • No data races between threads.
    • No iterator invalidation.
  • No undefined runtime behavior - what a Rust statement does is never left unspecified

    • Array access is bounds checked.
    • Integer overflow is defined (panic or wrap-around).
  • Modern language features - as expressive and ergonomic as higher-level languages

    • Enums and pattern matching.
    • Generics.
    • No overhead FFI.
    • Zero-cost abstractions.
    • Great compiler errors.
    • Built-in dependency manager.
    • Built-in support for testing.
    • Excellent Language Server Protocol support.
This slide should take about 3 minutes.

Do not spend much time here. All of these points will be covered in more depth later.

Make sure to ask the class which languages they have experience with. Depending on the answer you can highlight different features of Rust:

  • Experience with C or C++: Rust eliminates a whole class of runtime errors via the borrow checker. You get performance like in C and C++, but you don’t have the memory unsafety issues. In addition, you get a modern language with constructs like pattern matching and built-in dependency management.

  • Experience with Java, Go, Python, JavaScript…: You get the same memory safety as in those languages, plus a similar high-level language feeling. In addition you get fast and predictable performance like C and C++ (no garbage collector) as well as access to low-level hardware (should you need it)

Playground

The Rust Playground provides an easy way to run short Rust programs, and is the basis for the examples and exercises in this course. Try running the “hello-world” program it starts with. It comes with a few handy features:

  • Under “Tools”, use the rustfmt option to format your code in the “standard” way.

  • Rust has two main “profiles” for generating code: Debug (extra runtime checks, less optimization) and Release (fewer runtime checks, lots of optimization). These are accessible under “Debug” at the top.

  • If you’re interested, use “ASM” under “…” to see the generated assembly code.

This slide should take about 2 minutes.

As students head into the break, encourage them to open up the playground and experiment a little. Encourage them to keep the tab open and try things out during the rest of the course. This is particularly helpful for advanced students who want to know more about Rust’s optimizations or generated assembly.

Types and Values

In this segment:

This segment should take about 30 minutes

Hello, World

Let us jump into the simplest possible Rust program, a classic Hello World program:

fn main() {
    println!("Hello 🌍!");
}

What you see:

  • Functions are introduced with fn.
  • Blocks are delimited by curly braces like in C and C++.
  • The main function is the entry point of the program.
  • Rust has hygienic macros, println! is an example of this.
  • Rust strings are UTF-8 encoded and can contain any Unicode character.
This slide should take about 5 minutes.

This slide tries to make the students comfortable with Rust code. They will see a ton of it over the next four days so we start small with something familiar.

Key points:

  • Rust is very much like other languages in the C/C++/Java tradition. It is imperative and it doesn’t try to reinvent things unless absolutely necessary.

  • Rust is modern with full support for things like Unicode.

  • Rust uses macros for situations where you want to have a variable number of arguments (no function overloading).

  • Macros being ‘hygienic’ means they don’t accidentally capture identifiers from the scope they are used in. Rust macros are actually only partially hygienic.

  • Rust is multi-paradigm. For example, it has powerful object-oriented programming features, and, while it is not a functional language, it includes a range of functional concepts.

Variables

Rust provides type safety via static typing. Variable bindings are made with let:

fn main() {
    let x: i32 = 10;
    println!("x: {x}");

    // x = 20;
    // println!("x: {x}");
}
This slide should take about 5 minutes.
  • Uncomment the x = 20 to demonstrate that variables are immutable by default. Add the mut keyword to allow changes.

  • The i32 here is the type of the variable. This must be known at compile time, but type inference (covered later) allows the programmer to omit it in many cases.

Values

Here are some basic built-in types, and the syntax for literal values of each type.

TypesLiterals
Signed integersi8, i16, i32, i64, i128, isize-10, 0, 1_000, 123_i64
Unsigned integersu8, u16, u32, u64, u128, usize0, 123, 10_u16
Floating point numbersf32, f643.14, -10.0e20, 2_f32
Unicode scalar valueschar'a', 'α', '∞'
Booleansbooltrue, false

The types have widths as follows:

  • iN, uN, and fN are N bits wide,
  • isize and usize are the width of a pointer,
  • char is 32 bits wide,
  • bool is 8 bits wide.
This slide should take about 5 minutes.

There are a few syntaxes which are not shown above:

  • All underscores in numbers can be left out, they are for legibility only. So 1_000 can be written as 1000 (or 10_00), and 123_i64 can be written as 123i64.

Arithmetic

fn interproduct(a: i32, b: i32, c: i32) -> i32 {
    return a * b + b * c + c * a;
}

fn main() {
    println!("result: {}", interproduct(120, 100, 248));
}
This slide should take about 3 minutes.

This is the first time we’ve seen a function other than main, but the meaning should be clear: it takes three integers, and returns an integer. Functions will be covered in more detail later.

Arithmetic is very similar to other languages, with similar precedence.

What about integer overflow? In C and C++ overflow of signed integers is actually undefined, and might do different things on different platforms or compilers. In Rust, it’s defined.

Change the i32’s to i16 to see an integer overflow, which panics (checked) in a debug build and wraps in a release build. There are other options, such as overflowing, saturating, and carrying. These are accessed with method syntax, e.g., (a * b).saturating_add(b * c).saturating_add(c * a).

In fact, the compiler will detect overflow of constant expressions, which is why the example requires a separate function.

Strings

Rust has two types to represent strings, both of which will be covered in more depth later. Both always store UTF-8 encoded strings.

  • String - a modifiable, owned string.
  • &str - a read-only string. String literals have this type.
fn main() {
    let greeting: &str = "Greetings";
    let planet: &str = "🪐";

    let mut sentence = String::new();
    sentence.push_str(greeting);
    sentence.push_str(", ");
    sentence.push_str(planet);
    println!("final sentence: {sentence}");

    println!("{:?}", &sentence[0..5]);
    //println!("{:?}", &sentence[12..13]);
}
This slide should take about 5 minutes.

This slide introduces strings. Everything here will be covered in more depth later, but this is enough for subsequent slides and exercises to use strings.

  • Invalid UTF-8 in a string is UB, and this not allowed in safe Rust.

  • String is a user-defined type with a constructor (::new()) and methods like s.push_str(..).

  • The & in &str indicates that this is a reference. We will cover references later, so for now just think of &str as a unit meaning “a read-only string”.

  • The commented-out line is indexing into the string by byte position. 12..13 does not end on a character boundary, so the program panics. Adjust it to a range that does, based on the error message.

  • Raw strings allow you to create a &str value with escapes disabled: r"\n" == "\\n". You can embed double-quotes by using an equal amount of # on either side of the quotes:

    fn main() {
        println!(r#"<a href="link.html">link</a>"#);
        println!("<a href=\"link.html\">link</a>");
    }
  • Using {:?} is a convenient way to print array/vector/struct of values for debugging purposes, and it’s commonly used in code.

Type Inference

Rust will look at how the variable is used to determine the type:

fn takes_u32(x: u32) {
    println!("u32: {x}");
}

fn takes_i8(y: i8) {
    println!("i8: {y}");
}

fn main() {
    let x = 10;
    let y = 20;

    takes_u32(x);
    takes_i8(y);
    // takes_u32(y);
}
This slide should take about 3 minutes.

This slide demonstrates how the Rust compiler infers types based on constraints given by variable declarations and usages.

It is very important to emphasize that variables declared like this are not of some sort of dynamic “any type” that can hold any data. The machine code generated by such declaration is identical to the explicit declaration of a type. The compiler does the job for us and helps us write more concise code.

When nothing constrains the type of an integer literal, Rust defaults to i32. This sometimes appears as {integer} in error messages. Similarly, floating-point literals default to f64.

fn main() {
    let x = 3.14;
    let y = 20;
    assert_eq!(x, y);
    // ERROR: no implementation for `{float} == {integer}`
}

Control Flow Basics

In this segment:

This segment should take about 25 minutes

if expressions

You use if expressions exactly like if statements in other languages:

fn main() {
    let x = 10;
    if x == 0 {
        println!("zero!");
    } else if x < 100 {
        println!("biggish");
    } else {
        println!("huge");
    }
}

In addition, you can use if as an expression. The last expression of each block becomes the value of the if expression:

fn main() {
    let x = 10;
    let size = if x < 20 { "small" } else { "large" };
    println!("number size: {size}");
}
This slide should take about 4 minutes.

Because if is an expression and must have a particular type, both of its branch blocks must have the same type. Show what happens if you add ; after "small" in the second example.

When if is used in an expression, the expression must have a ; to separate it from the next statement. Remove the ; before println! to see the compiler error.

Loops

There are three looping keywords in Rust: while, loop, and for:

while

The while keyword works much like in other languages, executing the loop body as long as the condition is true.

fn main() {
    let mut x = 200;
    while x >= 10 {
        x = x / 2;
    }
    println!("Final x: {x}");
}

for

The for loop iterates over ranges of values or the items in a collection:

fn main() {
    for x in 1..5 {
        println!("x: {x}");
    }

    for elem in [1, 2, 3, 4, 5] {
        println!("elem: {elem}");
    }
}
  • Under the hood for loops use a concept called “iterators” to handle iterating over different kinds of ranges/collections. Iterators will be discussed in more detail later.
  • Note that the for loop only iterates to 4. Show the 1..=5 syntax for an inclusive range.

loop

The loop statement just loops forever, until a break.

fn main() {
    let mut i = 0;
    loop {
        i += 1;
        println!("{i}");

        if i > 100 {
            break;
        }
    }
}

break and continue

If you want to immediately start the next iteration use continue.

If you want to exit any kind of loop early, use break. For loop, this can take an optional expression that becomes the value of the loop expression.

fn main() {
    let mut i = 0;
    loop {
        i += 1;
        if i > 5 {
            break;
        }
        if i % 2 == 0 {
            continue;
        }
        println!("{i}");
    }
}

Labels

Both continue and break can optionally take a label argument which is used to break out of nested loops:

fn without_labeled_loops() {
    let mut break_outer_loop = false;

    for i in [1, 2, 3] {
        for j in [11, 22, 33] {
            println!("{i}, {j}");

            if i + j == 24 {
                break_outer_loop = true;
                break;
            }
        }

        if break_outer_loop {
            break;
        }
    }
}

fn with_labeled_loops() {
    'outer: for i in [1, 2, 3] {
        for j in [11, 22, 33] {
            println!("{i}, {j}");

            if i + j == 24 {
                break 'outer;
            }
        }
    }
}

fn main() {
    without_labeled_loops();
    println!("-----");
    with_labeled_loops();
}
  • Note that loop is the only looping construct which returns a non-trivial value. This is because it’s guaranteed to be entered at least once (unlike while and for loops).

Blocks and Scopes

Blocks

A block in Rust contains a sequence of expressions, enclosed by braces {}. Each block has a value and a type, which are those of the last expression of the block:

fn main() {
    let z = 13;
    let x = {
        let y = 10;
        println!("y: {y}");
        z - y
    };
    println!("x: {x}");
}

If the last expression ends with ;, then the resulting value and type is ().

This slide and its sub-slides should take about 5 minutes.
  • You can show how the value of the block changes by changing the last line in the block. For instance, adding/removing a semicolon or using a return.

Scopes and Shadowing

A variable’s scope is limited to the enclosing block.

You can shadow variables, both those from outer scopes and variables from the same scope:

fn main() {
    let a = 10;
    println!("before: {a}");
    {
        let a = "hello";
        println!("inner scope: {a}");

        let a = true;
        println!("shadowed in inner scope: {a}");
    }

    println!("after: {a}");
}
  • Show that a variable’s scope is limited by adding a b in the inner block in the last example, and then trying to access it outside that block.
  • Shadowing is different from mutation, because after shadowing both variable’s memory locations exist at the same time. Both are available under the same name, depending where you use it in the code.
  • A shadowing variable can have a different type.
  • Shadowing looks obscure at first, but is convenient for holding on to values after .unwrap().

Functions

fn gcd(a: u32, b: u32) -> u32 {
    if b > 0 {
        gcd(b, a % b)
    } else {
        a
    }
}

fn main() {
    println!("gcd: {}", gcd(143, 52));
}
This slide should take about 3 minutes.
  • Declaration parameters are followed by a type (the reverse of some programming languages), then a return type.
  • The last expression in a function body (or any block) becomes the return value. Simply omit the ; at the end of the expression. The return keyword can be used for early return, but the “bare value” form is idiomatic at the end of a function (refactor gcd to use a return).
  • Some functions have no return value, and return the ‘unit type’, (). The compiler will infer this if the -> () return type is omitted.
  • Overloading is not supported – each function has a single implementation.
    • Always takes a fixed number of parameters. Default arguments are not supported. Macros can be used to support variadic functions.
    • Always takes a single set of parameter types. These types can be generic, which will be covered later.

Macros

Macros are expanded into Rust code during compilation, and can take a variable number of arguments. They are distinguished by a ! at the end. The Rust standard library includes an assortment of useful macros.

  • println!(format, ..) prints a line to standard output, applying formatting described in std::fmt.
  • format!(format, ..) works just like println! but returns the result as a string.
  • dbg!(expression) logs the value of the expression and returns it.
  • todo!() marks a bit of code as not-yet-implemented. If executed, it will panic.
  • unreachable!() marks a bit of code as unreachable. If executed, it will panic.
fn factorial(n: u32) -> u32 {
    let mut product = 1;
    for i in 1..=n {
        product *= dbg!(i);
    }
    product
}

fn fizzbuzz(n: u32) -> u32 {
    todo!()
}

fn main() {
    let n = 4;
    println!("{n}! = {}", factorial(n));
}
This slide should take about 2 minutes.

The takeaway from this section is that these common conveniences exist, and how to use them. Why they are defined as macros, and what they expand to, is not especially critical.

The course does not cover defining macros, but a later section will describe use of derive macros.

Day 1: Morning Exercises

We will be using collective-rustlings for the exercises in this course.

collective-rustlings is a temporary fork of rustlings which is a collection of “small exercises to get you used to reading and writing Rust code”. rustlings is an awesome learning resource for Rust!

I created the fork to be able to integrate collective-score into it so that your progress is shared with the class while you work on the exercises. I also removed some exercises and changed the order of some to fit the order of topics in this course.

Before starting with the exercises, you have to follow the following steps:

If you need any help, you can always ask me 😃


You should do the following Rustlings:

  • intro1
  • intro2
  • variables1
  • variables2
  • variables3
  • variables4
  • variables5
  • functions1
  • functions2
  • functions3
  • if1
  • if2
  • if3

rustlings watch will guide you through these exercises in the same order. But you should stop after the last exercise listed above. The rest of the exercises will be done later.

Welcome Back

In this session:

Including 10 minute breaks, this session should take about 1 hour and 30 minutes

Tuples and Arrays

In this segment:

This segment should take about 20 minutes

Arrays

fn main() {
    let mut a: [i8; 10] = [42; 10];
    a[5] = 0;
    println!("a: {a:?}");
}
This slide should take about 5 minutes.
  • A value of the array type [T; N] holds N (a compile-time constant) elements of the same type T. Note that the length of the array is part of its type, which means that [u8; 3] and [u8; 4] are considered two different types. Slices, which have a size determined at runtime, are covered later.

  • Try accessing an out-of-bounds array element. Array accesses are checked at runtime. Rust can usually optimize these checks away, and they can be avoided using unsafe Rust.

  • We can use literals to assign values to arrays.

  • The println! macro asks for the debug implementation with the ? format parameter: {} gives the default output, {:?} gives the debug output. Types such as integers and strings implement the default output, but arrays only implement the debug output. This means that we must use debug output here.

  • Adding #, eg {a:#?}, invokes a “pretty printing” format, which can be easier to read.

Tuples

fn main() {
    let t: (i8, bool) = (7, true);
    println!("t.0: {}", t.0);
    println!("t.1: {}", t.1);
}
This slide should take about 5 minutes.
  • Like arrays, tuples have a fixed length.

  • Tuples group together values of different types into a compound type.

  • Fields of a tuple can be accessed by the period and the index of the value, e.g. t.0, t.1.

  • The empty tuple () is referred to as the “unit type” and signifies absence of a return value, akin to void in other languages.

Array Iteration

The for statement supports iterating over arrays (but not tuples).

fn main() {
    let primes = [2, 3, 5, 7, 11, 13, 17, 19];
    for prime in primes {
        for i in 2..prime {
            assert_ne!(prime % i, 0);
        }
    }
}
This slide should take about 3 minutes.

This functionality uses the IntoIterator trait, but we haven’t covered that yet.

The assert_ne! macro is new here. There are also assert_eq! and assert! macros. These are always checked while, debug-only variants like debug_assert! compile to nothing in release builds.

Patterns and Destructuring

When working with tuples and other structured values it’s common to want to extract the inner values into local variables. This can be done manually by directly accessing the inner values:

fn print_tuple(tuple: (i32, i32)) {
    let left = tuple.0;
    let right = tuple.1;
    println!("left: {left}, right: {right}");
}

However, Rust also supports using pattern matching to destructure a larger value into its constituent parts:

fn print_tuple(tuple: (i32, i32)) {
    let (left, right) = tuple;
    println!("left: {left}, right: {right}");
}

This works with any kind of structured value:

struct Foo {
    a: i32,
    b: bool,
}

fn print_foo(foo: Foo) {
    let Foo { a, b } = foo;
    println!("a: {a}, b: {b}");
}
This slide should take about 5 minutes.
  • The patterns used here are “irrefutable”, meaning that the compiler can statically verify that the value on the right of = has the same structure as the pattern.
  • A variable name is an irrefutable pattern that always matches any value, hence why we can also use let to declare a single variable.
  • Rust also supports using patterns in conditionals, allowing for equality comparison and destructuring to happen at the same time. This form of pattern matching will be discussed in more detail later.
  • Edit the examples above to show the compiler error when the pattern doesn’t match the value being matched on.

References

In this segment:

This segment should take about 25 minutes

Shared References

A reference provides a way to access another value without taking responsibility for the value, and is also called “borrowing”. Shared references are read-only, and the referenced data cannot change.

fn main() {
    let a = 'A';
    let b = 'B';
    let mut r: &char = &a;
    println!("r: {}", *r);
    r = &b;
    println!("r: {}", *r);
}

A shared reference to a type T has type &T. A reference value is made with the & operator. The * operator “dereferences” a reference, yielding its value.

Rust will statically forbid dangling references:

fn x_axis(x: i32) -> &(i32, i32) {
    let point = (x, 0);
    return &point;
}
This slide should take about 10 minutes.
  • A reference is said to “borrow” the value it refers to, and this is a good model for students not familiar with pointers: code can use the reference to access the value, but is still “owned” by the original variable. The course will get into more detail on ownership in day 3.

  • References are implemented as pointers, and a key advantage is that they can be much smaller than the thing they point to. Students familiar with C or C++ will recognize references as pointers. Later parts of the course will cover how Rust prevents the memory-safety bugs that come from using raw pointers.

  • Rust does not automatically create references for you - the & is always required.

  • Rust will auto-dereference in some cases, in particular when invoking methods (try r.is_ascii()). There is no need for an -> operator like in C++.

  • In this example, r is mutable so that it can be reassigned (r = &b). Note that this re-binds r, so that it refers to something else. This is different from C++, where assignment to a reference changes the referenced value.

  • A shared reference does not allow modifying the value it refers to, even if that value was mutable. Try *r = 'X'.

  • Rust is tracking the lifetimes of all references to ensure they live long enough. Dangling references cannot occur in safe Rust. x_axis would return a reference to point, but point will be deallocated when the function returns, so this will not compile.

  • We will talk more about borrowing when we get to ownership.

Exclusive References

Exclusive references, also known as mutable references, allow changing the value they refer to. They have type &mut T.

fn main() {
    let mut point = (1, 2);
    let x_coord = &mut point.0;
    *x_coord = 20;
    println!("point: {point:?}");
}
This slide should take about 10 minutes.

Key points:

  • “Exclusive” means that only this reference can be used to access the value. No other references (shared or exclusive) can exist at the same time, and the referenced value cannot be accessed while the exclusive reference exists. Try making an &point.0 or changing point.0 while x_coord is alive.

  • Be sure to note the difference between let mut x_coord: &i32 and let x_coord: &mut i32. The first one represents a shared reference which can be bound to different values, while the second represents an exclusive reference to a mutable value.

Static and Const

Static and constant variables are two different ways to create globally-scoped values that cannot be moved or reallocated during the execution of the program.

const

Constant variables are evaluated at compile time and their values are inlined wherever they are used:

const BEST_NUMBER: u8 = 42;

fn main() {
    println!("The best number is: {BEST_NUMBER}");
}

According to the Rust RFC Book these are inlined upon use.

Only functions marked const can be called at compile time to generate const values. const functions can however be called at runtime.

static

Static variables will live during the whole execution of the program, and therefore will not move:

static BANNER: &str = "Welcome to RustOS 3.14";

fn main() {
    println!("{BANNER}");
}

As noted in the Rust RFC Book, these are not inlined upon use and have an actual associated memory location. This is useful for unsafe and embedded code, and the variable lives through the entirety of the program execution. When a globally-scoped value does not have a reason to need object identity, const is generally preferred.

This slide should take about 5 minutes.
  • Mention that const behaves semantically similar to C++’s constexpr.
  • static, on the other hand, is much more similar to a const or mutable global variable in C++.
  • static provides object identity: an address in memory and state as required by types with interior mutability such as Mutex<T>.
  • It isn’t super common that one would need a runtime evaluated constant, but it is helpful and safer than using a static.

Properties table:

PropertyStaticConstant
Has an address in memoryYesNo (inlined)
Lives for the entire duration of the programYesNo
Can be mutableYes (unsafe)No
Evaluated at compile timeYes (initialised at compile time)Yes
Inlined wherever it is usedNoYes

More to Explore

Because static variables are accessible from any thread, they must be Sync. Interior mutability is possible through a Mutex, atomic or similar.

Thread-local data can be created with the macro std::thread_local.

User-Defined Types

In this segment:

This segment should take about 30 minutes

Named Structs

Like C and C++, Rust has support for custom structs:

struct Person {
    name: String,
    age: u8,
}

fn describe(person: &Person) {
    println!("{} is {} years old", person.name, person.age);
}

fn main() {
    let mut peter = Person { name: String::from("Peter"), age: 27 };
    describe(&peter);

    peter.age = 28;
    describe(&peter);

    let name = String::from("Avery");
    let age = 39;
    let avery = Person { name, age };
    describe(&avery);

    let jackie = Person { name: String::from("Jackie"), ..avery };
    describe(&jackie);
}
This slide should take about 10 minutes.

Key Points:

  • Structs work like in C or C++.
    • Like in C++, and unlike in C, no typedef is needed to define a type.
    • Unlike in C++, there is no inheritance between structs.
  • This may be a good time to let people know there are different types of structs.
    • Zero-sized structs (e.g. struct Foo;) might be used when implementing a trait on some type but don’t have any data that you want to store in the value itself.
    • The next slide will introduce Tuple structs, used when the field names are not important.
  • If you already have variables with the right names, then you can create the struct using a shorthand.
  • The syntax ..avery allows us to copy the majority of the fields from the old struct without having to explicitly type it all out. It must always be the last element.

Tuple Structs

If the field names are unimportant, you can use a tuple struct:

struct Point(i32, i32);

fn main() {
    let p = Point(17, 23);
    println!("({}, {})", p.0, p.1);
}

This is often used for single-field wrappers (called newtypes):

struct PoundsOfForce(f64);
struct Newtons(f64);

fn compute_thruster_force() -> PoundsOfForce {
    todo!("Ask a rocket scientist at NASA")
}

fn set_thruster_force(force: Newtons) {
    // ...
}

fn main() {
    let force = compute_thruster_force();
    set_thruster_force(force);
}
This slide should take about 10 minutes.
  • Newtypes are a great way to encode additional information about the value in a primitive type, for example:
    • The number is measured in some units: Newtons in the example above.
    • The value passed some validation when it was created, so you no longer have to validate it again at every use: PhoneNumber(String) or OddNumber(u32).
  • Demonstrate how to add a f64 value to a Newtons type by accessing the single field in the newtype.
    • Rust generally doesn’t like inexplicit things, like automatic unwrapping or for instance using booleans as integers.
    • Operator overloading is discussed on Day 3 (generics).
  • The example is a subtle reference to the Mars Climate Orbiter failure.

Enums

The enum keyword allows the creation of a type which has a few different variants:

#[derive(Debug)]
enum Direction {
    Left,
    Right,
}

#[derive(Debug)]
enum PlayerMove {
    Pass,                        // Simple variant
    Run(Direction),              // Tuple variant
    Teleport { x: u32, y: u32 }, // Struct variant
}

fn main() {
    let m: PlayerMove = PlayerMove::Run(Direction::Left);
    println!("On this turn: {m:?}");
}
This slide should take about 5 minutes.

Key Points:

  • Enumerations allow you to collect a set of values under one type.
  • Direction is a type with variants. There are two values of Direction: Direction::Left and Direction::Right.
  • PlayerMove is a type with three variants. In addition to the payloads, Rust will store a discriminant so that it knows at runtime which variant is in a PlayerMove value.
  • This might be a good time to compare structs and enums:
    • In both, you can have a simple version without fields (unit struct) or one with different types of fields (variant payloads).
    • You could even implement the different variants of an enum with separate structs but then they wouldn’t be the same type as they would if they were all defined in an enum.
  • Rust uses minimal space to store the discriminant.
    • If necessary, it stores an integer of the smallest required size

    • If the allowed variant values do not cover all bit patterns, it will use invalid bit patterns to encode the discriminant (the “niche optimization”). For example, Option<&u8> stores either a pointer to an integer or NULL for the None variant.

    • You can control the discriminant if needed (e.g., for compatibility with C):

      #[repr(u32)]
      enum Bar {
          A, // 0
          B = 10000,
          C, // 10001
      }
      
      fn main() {
          println!("A: {}", Bar::A as u32);
          println!("B: {}", Bar::B as u32);
          println!("C: {}", Bar::C as u32);
      }

      Without repr, the discriminant type takes 2 bytes, because 10001 fits 2 bytes.

More to Explore

Rust has several optimizations it can employ to make enums take up less space.

  • Null pointer optimization: For some types, Rust guarantees that size_of::<T>() equals size_of::<Option<T>>().

    Example code if you want to show how the bitwise representation may look like in practice. It’s important to note that the compiler provides no guarantees regarding this representation, therefore this is totally unsafe.

    use std::mem::transmute;
    
    macro_rules! dbg_bits {
        ($e:expr, $bit_type:ty) => {
            println!("- {}: {:#x}", stringify!($e), transmute::<_, $bit_type>($e));
        };
    }
    
    fn main() {
        unsafe {
            println!("bool:");
            dbg_bits!(false, u8);
            dbg_bits!(true, u8);
    
            println!("Option<bool>:");
            dbg_bits!(None::<bool>, u8);
            dbg_bits!(Some(false), u8);
            dbg_bits!(Some(true), u8);
    
            println!("Option<Option<bool>>:");
            dbg_bits!(Some(Some(false)), u8);
            dbg_bits!(Some(Some(true)), u8);
            dbg_bits!(Some(None::<bool>), u8);
            dbg_bits!(None::<Option<bool>>, u8);
    
            println!("Option<&i32>:");
            dbg_bits!(None::<&i32>, usize);
            dbg_bits!(Some(&0i32), usize);
        }
    }

Type Aliases

A type alias creates a name for another type. The two types can be used interchangeably.

enum CarryableConcreteItem {
    Left,
    Right,
}

type Item = CarryableConcreteItem;

// Aliases are more useful with long, complex types:
use std::sync::Mutex;
type PlayerInventory = Mutex<Vec<Item>>;
This slide should take about 2 minutes.

C programmers will recognize this as similar to a typedef.

Day 1: Afternoon Exercises

You should do the following Rustlings:

  • variables6
  • primitive_types1
  • primitive_types2
  • primitive_types3
  • primitive_types4
  • primitive_types5
  • primitive_types6
  • quiz1

Welcome to Day 2

Now that we have seen a fair amount of Rust, we will focus on Rust’s type system:

  • Pattern matching: extracting data from structures.
  • Methods: associating functions with types.
  • Traits: behaviors shared by multiple types.
  • Generics: parameterizing types on other types.

Schedule

In this session:

Including 10 minute breaks, this session should take about 1 hour and 50 minutes

Pattern Matching

In this segment:

This segment should take about 30 minutes

Matching Values

The match keyword lets you match a value against one or more patterns. The comparisons are done from top to bottom and the first match wins.

The patterns can be simple values, similarly to switch in C and C++:

#[rustfmt::skip]
fn main() {
    let input = 'x';
    match input {
        'q'                       => println!("Quitting"),
        'a' | 's' | 'w' | 'd'     => println!("Moving around"),
        '0'..='9'                 => println!("Number input"),
        key if key.is_lowercase() => println!("Lowercase: {key}"),
        _                         => println!("Something else"),
    }
}

The _ pattern is a wildcard pattern which matches any value. The expressions must be exhaustive, meaning that it covers every possibility, so _ is often used as the final catch-all case.

Match can be used as an expression. Just like if, each match arm must have the same type. The type is the last expression of the block, if any. In the example above, the type is ().

A variable in the pattern (key in this example) will create a binding that can be used within the match arm.

A match guard causes the arm to match only if the condition is true.

This slide should take about 10 minutes.

Key Points:

  • You might point out how some specific characters are being used when in a pattern

    • | as an or
    • .. can expand as much as it needs to be
    • 1..=5 represents an inclusive range
    • _ is a wild card
  • Match guards as a separate syntax feature are important and necessary when we wish to concisely express more complex ideas than patterns alone would allow.

  • They are not the same as separate if expression inside of the match arm. An if expression inside of the branch block (after =>) happens after the match arm is selected. Failing the if condition inside of that block won’t result in other arms of the original match expression being considered.

  • The condition defined in the guard applies to every expression in a pattern with an |.

Destructuring

Like tuples, structs and enums can also be destructured by matching:

Structs

struct Foo {
    x: (u32, u32),
    y: u32,
}

#[rustfmt::skip]
fn main() {
    let foo = Foo { x: (1, 2), y: 3 };
    match foo {
        Foo { x: (1, b), y } => println!("x.0 = 1, b = {b}, y = {y}"),
        Foo { y: 2, x: i }   => println!("y = 2, x = {i:?}"),
        Foo { y, .. }        => println!("y = {y}, other fields were ignored"),
    }
}

Enums

Patterns can also be used to bind variables to parts of your values. This is how you inspect the structure of your types. Let us start with a simple enum type:

enum Result {
    Ok(i32),
    Err(String),
}

fn divide_in_two(n: i32) -> Result {
    if n % 2 == 0 {
        Result::Ok(n / 2)
    } else {
        Result::Err(format!("cannot divide {n} into two equal parts"))
    }
}

fn main() {
    let n = 100;
    match divide_in_two(n) {
        Result::Ok(half) => println!("{n} divided in two is {half}"),
        Result::Err(msg) => println!("sorry, an error happened: {msg}"),
    }
}

Here we have used the arms to destructure the Result value. In the first arm, half is bound to the value inside the Ok variant. In the second arm, msg is bound to the error message.

This slide should take about 8 minutes.

Structs

  • Change the literal values in foo to match with the other patterns.
  • Add a new field to Foo and make changes to the pattern as needed.
  • The distinction between a capture and a constant expression can be hard to spot. Try changing the 2 in the second arm to a variable, and see that it subtly doesn’t work. Change it to a const and see it working again.

Enums

Key points:

  • The if/else expression is returning an enum that is later unpacked with a match.
  • You can try adding a third variant to the enum definition and displaying the errors when running the code. Point out the places where your code is now inexhaustive and how the compiler tries to give you hints.
  • The values in the enum variants can only be accessed after being pattern matched.
  • Demonstrate what happens when the search is inexhaustive. Note the advantage the Rust compiler provides by confirming when all cases are handled.
  • Save the result of divide_in_two in the result variable and match it in a loop. That won’t compile because msg is consumed when matched. To fix it, match &result instead of result. That will make msg a reference so it won’t be consumed. This “match ergonomics” appeared in Rust 2018. If you want to support older Rust, replace msg with ref msg in the pattern.

Let Control Flow

Rust has a few control flow constructs which differ from other languages. They are used for pattern matching:

  • if let expressions
  • while let expressions
  • match expressions

if-let expressions

The if let expression lets you execute different code depending on whether a value matches a pattern:

use std::{time::Duration, thread::sleep};

fn sleep_for(secs: f32) {
    let dur = if let Ok(dur) = Duration::try_from_secs_f32(secs) {
        dur
    } else {
        Duration::from_millis(500)
    };
    sleep(dur);
    println!("slept for {:?}", dur);
}

fn main() {
    sleep_for(-10.0);
    sleep_for(0.8);
}

let-else

For the common case of matching a pattern and returning from the function, use let else. The “else” case must diverge (return, break, or panic - anything but falling off the end of the block).

use std::{time::Duration, thread::sleep};

fn sleep_for(secs: f32) {
    let Ok(dur) = Duration::try_from_secs_f32(secs) else {
        println!("Invalid seconds value!");
        return;
    };
    sleep(dur);
    println!("slept for {:?}", dur);
}

fn main() {
    sleep_for(-10.0);
    sleep_for(0.8);
}

while-let

Like with if let, there is a while let variant which repeatedly tests a value against a pattern:

fn main() {
    let mut name = String::from("Comprehensive Rust 🦀");
    while let Some(c) = name.pop() {
        println!("character: {c}");
    }
    // (There are more efficient ways to reverse a string!)
}

Here String::pop returns Some(c) until the string is empty, after which it will return None. The while let lets us keep iterating through all items.

This slide should take about 10 minutes.

if-let

  • Unlike match, if let does not have to cover all branches. This can make it more concise than match.
  • A common usage is handling Some values when working with Option.
  • Unlike match, if let does not support guard clauses for pattern matching.

let-else

if-lets can pile up. The let-else construct supports flattening nested code.

while-let

  • Point out that the while let loop will keep going as long as the value matches the pattern.
  • You could rewrite the while let loop as an infinite loop with an if statement that breaks when there is no value to unwrap for name.pop(). The while let provides syntactic sugar for the above scenario.

Methods and Traits

In this segment:

This segment should take about 25 minutes

Methods

Rust allows you to associate functions with your new types. You do this with an impl block:

#[derive(Debug)]
struct Race {
    name: String,
    laps: Vec<i32>,
}

impl Race {
    // No receiver, a static method
    fn new(name: &str) -> Self {
        Self { name: String::from(name), laps: Vec::new() }
    }

    // Exclusive borrowed read-write access to self
    fn add_lap(&mut self, lap: i32) {
        self.laps.push(lap);
    }

    // Shared and read-only borrowed access to self
    fn print_laps(&self) {
        println!("Recorded {} laps for {}:", self.laps.len(), self.name);
        for (idx, lap) in self.laps.iter().enumerate() {
            println!("Lap {idx}: {lap} sec");
        }
    }

    // Exclusive ownership of self
    fn finish(self) {
        let total: i32 = self.laps.iter().sum();
        println!("Race {} is finished, total lap time: {}", self.name, total);
    }
}

fn main() {
    let mut race = Race::new("Monaco Grand Prix");
    race.add_lap(70);
    race.add_lap(68);
    race.print_laps();
    race.add_lap(71);
    race.print_laps();
    race.finish();
    // race.add_lap(42);
}

The self arguments specify the “receiver” - the object the method acts on. There are several common receivers for a method:

  • &self: borrows the object from the caller using a shared and immutable reference. The object can be used again afterwards.
  • &mut self: borrows the object from the caller using a unique and mutable reference. The object can be used again afterwards.
  • self: takes ownership of the object and moves it away from the caller. The method becomes the owner of the object. The object will be dropped (deallocated) when the method returns, unless its ownership is explicitly transmitted. Complete ownership does not automatically mean mutability.
  • mut self: same as above, but the method can mutate the object.
  • No receiver: this becomes a static method on the struct. Typically used to create constructors which are called new by convention.
This slide should take about 8 minutes.

Key Points:

  • It can be helpful to introduce methods by comparing them to functions.
    • Methods are called on an instance of a type (such as a struct or enum), the first parameter represents the instance as self.
    • Developers may choose to use methods to take advantage of method receiver syntax and to help keep them more organized. By using methods we can keep all the implementation code in one predictable place.
  • Point out the use of the keyword self, a method receiver.
    • Show that it is an abbreviated term for self: Self and perhaps show how the struct name could also be used.
    • Explain that Self is a type alias for the type the impl block is in and can be used elsewhere in the block.
    • Note how self is used like other structs and dot notation can be used to refer to individual fields.
    • This might be a good time to demonstrate how the &self differs from self by trying to run finish twice.
    • Beyond variants on self, there are also special wrapper types allowed to be receiver types, such as Box<Self>.

Traits

Rust lets you abstract over types with traits. They’re similar to interfaces:

trait Pet {
    /// Return a sentence from this pet.
    fn talk(&self) -> String;

    /// Print a string to the terminal greeting this pet.
    fn greet(&self);
}
This slide and its sub-slides should take about 10 minutes.
  • A trait defines a number of methods that types must have in order to implement the trait.

  • In the “Generics” segment, next, we will see how to build functionality that is generic over all types implementing a trait.

Implementing Traits

trait Pet {
    fn talk(&self) -> String;

    fn greet(&self) {
        println!("Oh you're a cutie! What's your name? {}", self.talk());
    }
}

struct Dog {
    name: String,
    age: i8,
}

impl Pet for Dog {
    fn talk(&self) -> String {
        format!("Woof, my name is {}!", self.name)
    }
}

fn main() {
    let fido = Dog { name: String::from("Fido"), age: 5 };
    fido.greet();
}
  • To implement Trait for Type, you use an impl Trait for Type { .. } block.

  • Unlike Go interfaces, just having matching methods is not enough: a Cat type with a talk() method would not automatically satisfy Pet unless it is in an impl Pet block.

  • Traits may provide default implementations of some methods. Default implementations can rely on all the methods of the trait. In this case, greet is provided, and relies on talk.

Associated Types

Associated types are placeholder types which are filled in by the trait implementation.

#[derive(Debug)]
struct Meters(i32);

#[derive(Debug)]
struct MetersSquared(i32);

trait Multiply {
    type Output;
    fn multiply(&self, other: &Self) -> Self::Output;
}

impl Multiply for Meters {
    type Output = MetersSquared;
    fn multiply(&self, other: &Self) -> Self::Output {
        MetersSquared(self.0 * other.0)
    }
}

fn main() {
    println!("{:?}", Meters(10).multiply(&Meters(20)));
}
  • Associated types are sometimes also called “output types”. The key observation is that the implementer, not the caller, chooses this type.

  • Many standard library traits have associated types, including arithmetic operators and Iterator.

Deriving

Supported traits can be automatically implemented for your custom types, as follows:

#[derive(Debug, Clone, Default)]
struct Player {
    name: String,
    strength: u8,
    hit_points: u8,
}

fn main() {
    let p1 = Player::default(); // Default trait adds `default` constructor.
    let mut p2 = p1.clone(); // Clone trait adds `clone` method.
    p2.name = String::from("EldurScrollz");

    // Debug trait adds support for printing with `{:?}`.
    println!("{p1:?} vs. {p2:?}");
}
This slide should take about 3 minutes.

Derivation is implemented with macros, and many crates provide useful derive macros to add useful functionality. For example, serde can derive serialization support for a struct using #[derive(Serialize)].

Generics

In this segment:

This segment should take about 30 minutes

Generic Functions

Rust supports generics, which lets you abstract algorithms or data structures (such as sorting or a binary tree) over the types used or stored.

/// Pick `even` or `odd` depending on the value of `n`.
fn pick<T>(n: i32, even: T, odd: T) -> T {
    if n % 2 == 0 {
        even
    } else {
        odd
    }
}

fn main() {
    println!("picked a number: {:?}", pick(97, 222, 333));
    println!("picked a tuple: {:?}", pick(28, ("dog", 1), ("cat", 2)));
}
This slide should take about 5 minutes.
  • Rust infers a type for T based on the types of the arguments and return value.

  • This is similar to C++ templates, but Rust partially compiles the generic function immediately, so that function must be valid for all types matching the constraints. For example, try modifying pick to return even + odd if n == 0. Even if only the pick instantiation with integers is used, Rust still considers it invalid. C++ would let you do this.

  • Generic code is turned into non-generic code based on the call sites. This is a zero-cost abstraction: you get exactly the same result as if you had hand-coded the data structures without the abstraction.

Generic Data Types

You can use generics to abstract over the concrete field type:

#[derive(Debug)]
struct Point<T> {
    x: T,
    y: T,
}

impl<T> Point<T> {
    fn coords(&self) -> (&T, &T) {
        (&self.x, &self.y)
    }

    // fn set_x(&mut self, x: T)
}

fn main() {
    let integer = Point { x: 5, y: 10 };
    let float = Point { x: 1.0, y: 4.0 };
    println!("{integer:?} and {float:?}");
    println!("coords: {:?}", integer.coords());
}
This slide should take about 10 minutes.
  • Q: Why T is specified twice in impl<T> Point<T> {}? Isn’t that redundant?

    • This is because it is a generic implementation section for generic type. They are independently generic.
    • It means these methods are defined for any T.
    • It is possible to write impl Point<i32> { .. }.
      • Point is still generic and you can use Point<f64>, but methods in this block will only be available for Point<i32>.
  • Try declaring a new variable let p = Point { x: 5, y: 10.0 };. Update the code to allow points that have elements of different types, by using two type variables, e.g., T and U.

Trait Bounds

When working with generics, you often want to require the types to implement some trait, so that you can call this trait’s methods.

You can do this with T: Trait or impl Trait:

fn duplicate<T: Clone>(a: T) -> (T, T) {
    (a.clone(), a.clone())
}

// struct NotClonable;

fn main() {
    let foo = String::from("foo");
    let pair = duplicate(foo);
    println!("{pair:?}");
}
This slide should take about 8 minutes.
  • Try making a NonClonable and passing it to duplicate.

  • When multiple traits are necessary, use + to join them.

  • Show a where clause, students will encounter it when reading code.

    fn duplicate<T>(a: T) -> (T, T)
    where
        T: Clone,
    {
        (a.clone(), a.clone())
    }
    • It declutters the function signature if you have many parameters.
    • It has additional features making it more powerful.
      • If someone asks, the extra feature is that the type on the left of “:” can be arbitrary, like Option<T>.
  • Note that Rust does not (yet) support specialization. For example, given the original duplicate, it is invalid to add a specialized duplicate(a: u32).

impl Trait

Similar to trait bounds, an impl Trait syntax can be used in function arguments and return values:

// Syntactic sugar for:
//   fn add_42_millions<T: Into<i32>>(x: T) -> i32 {
fn add_42_millions(x: impl Into<i32>) -> i32 {
    x.into() + 42_000_000
}

fn pair_of(x: u32) -> impl std::fmt::Debug {
    (x + 1, x - 1)
}

fn main() {
    let many = add_42_millions(42_i8);
    println!("{many}");
    let many_more = add_42_millions(10_000_000);
    println!("{many_more}");
    let debuggable = pair_of(27);
    println!("debuggable: {debuggable:?}");
}
This slide should take about 5 minutes.

impl Trait allows you to work with types which you cannot name. The meaning of impl Trait is a bit different in the different positions.

  • For a parameter, impl Trait is like an anonymous generic parameter with a trait bound.

  • For a return type, it means that the return type is some concrete type that implements the trait, without naming the type. This can be useful when you don’t want to expose the concrete type in a public API.

    Inference is hard in return position. A function returning impl Foo picks the concrete type it returns, without writing it out in the source. A function returning a generic type like collect<B>() -> B can return any type satisfying B, and the caller may need to choose one, such as with let x: Vec<_> = foo.collect() or with the turbofish, foo.collect::<Vec<_>>().

What is the type of debuggable? Try let debuggable: () = .. to see what the error message shows.

Day 2: Morning Exercises

⚠️ Before starting with the exercises: Run the command git pull in the directory collective-rustlings to get the updated exercises for this day!

In case you didn’t finish yesterday’s exercises, you will not be able to do today’s exercises with rustlings watch. In that case, you have to use the command rustlings run EXERCISE_NAME where the exercise name is one of those listed below. To get a hint for that exercise, you can run the command rustlings hint EXERCISE_NAME.

You should do the following Rustlings:

  • structs1
  • structs2
  • structs3
  • enums1
  • enums2
  • enums3
  • generics1
  • generics2
  • traits1
  • traits2
  • traits3
  • traits4
  • traits5

Welcome Back

In this session:

Including 10 minute breaks, this session should take about 40 minutes

Memory Management

In this segment:

This segment should take about 40 minutes

Review of Program Memory

Programs allocate memory in two ways:

  • Stack: Continuous area of memory for local variables.

    • Values have fixed sizes known at compile time.
    • Extremely fast: just move a stack pointer.
    • Easy to manage: follows function calls.
    • Great memory locality.
  • Heap: Storage of values outside of function calls.

    • Values have dynamic sizes determined at runtime.
    • Slower than the stack: some book-keeping needed.
    • No guarantee of memory locality.

Example

Creating a String puts fixed-sized metadata on the stack and dynamically sized data, the actual string, on the heap:

fn main() {
    let s1 = String::from("Hello");
}
StackHeaps1capacity5ptrHellolen5
This slide should take about 5 minutes.
  • Mention that a String is backed by a Vec, so it has a capacity and length and can grow if mutable via reallocation on the heap.

  • If students ask about it, you can mention that the underlying memory is heap allocated using the System Allocator and custom allocators can be implemented using the Allocator API

Approaches to Memory Management

Traditionally, languages have fallen into two broad categories:

  • Full control via manual memory management: C, C++, …
    • Programmer decides when to allocate or free heap memory.
    • Programmer must determine whether a pointer still points to valid memory.
    • Studies show, programmers make mistakes.
  • Full safety via automatic memory management at runtime: Python, Java, Scala, Go, Julia, Haskell, …
    • A runtime system ensures that memory is not freed until it can no longer be referenced.
    • Typically implemented with reference counting, garbage collection (GC), or RAII.

Rust offers a new mix:

Full control and safety via compile time enforcement of correct memory management.

It does this with an explicit ownership concept.

This slide should take about 10 minutes.

This slide is intended to help students coming from other languages to put Rust in context.

  • C must manage heap manually with malloc and free. Common errors include forgetting to call free, calling it multiple times for the same pointer, or dereferencing a pointer after the memory it points to has been freed.

  • C++ has tools like smart pointers (unique_ptr, shared_ptr) that take advantage of language guarantees about calling destructors to ensure memory is freed when a function returns. It is still quite easy to misuse these tools and create similar bugs to C.

  • Java, Go, and Python rely on the garbage collector to identify memory that is no longer reachable and discard it. This guarantees that any pointer can be dereferenced, eliminating use-after-free and other classes of bugs. But, GC has a runtime cost and is difficult to tune properly.

Rust’s ownership and borrowing model can, in many cases, get the performance of C, with alloc and free operations precisely where they are required – zero cost. It also provides tools similar to C++’s smart pointers. When required, other options such as reference counting are available, and there are even third-party crates available to support runtime garbage collection (not covered in this class).

Ownership

All variable bindings have a scope where they are valid and it is an error to use a variable outside its scope:

struct Point(i32, i32);

fn main() {
    {
        let p = Point(3, 4);
        println!("x: {}", p.0);
    }
    println!("y: {}", p.1);
}

We say that the variable owns the value. Every Rust value has precisely one owner at all times.

At the end of the scope, the variable is dropped and the data is freed. A destructor can run here to free up resources.

This slide should take about 5 minutes.

Students familiar with garbage-collection implementations will know that a garbage collector starts with a set of “roots” to find all reachable memory. Rust’s “single owner” principle is a similar idea.

Move Semantics

An assignment will transfer ownership between variables:

fn main() {
    let s1: String = String::from("Hello!");
    let s2: String = s1;

    println!("s2: {s2}");
    // println!("s1: {s1}");
}
  • The assignment of s1 to s2 transfers ownership.
  • When s1 goes out of scope, nothing happens: it does not own anything.
  • When s2 goes out of scope, the string data is freed.

Before move to s2:

StackHeaps1ptrHello!len6capacity6

After move to s2:

StackHeaps1ptrHello!len6capacity6s2ptrlen6capacity6(inaccessible)

When you pass a value to a function, the value is assigned to the function parameter. This transfers ownership:

fn say_hello(name: String) {
    println!("Hello {name}")
}

fn main() {
    let name = String::from("Alice");
    say_hello(name);
    // say_hello(name);
}
This slide should take about 5 minutes.
  • Mention that this is the opposite of the defaults in C++, which copies by value unless you use std::move (and the move constructor is defined!).

  • It is only the ownership that moves. Whether any machine code is generated to manipulate the data itself is a matter of optimization, and such copies are aggressively optimized away.

  • Simple values (such as integers) can be marked Copy (see later slides).

  • In Rust, clones are explicit (by using clone).

In the say_hello example:

  • With the first call to say_hello, main gives up ownership of name. Afterwards, name cannot be used anymore within main.
  • The heap memory allocated for name will be freed at the end of the say_hello function.
  • main can retain ownership if it passes name as a reference (&name) and if say_hello accepts a reference as a parameter.
  • Alternatively, main can pass a clone of name in the first call (name.clone()).
  • Rust makes it harder than C++ to inadvertently create copies by making move semantics the default, and by forcing programmers to make clones explicit.

More to Explore

Defensive Copies in Modern C++

Modern C++ solves this differently:

std::string s1 = "Cpp";
std::string s2 = s1;  // Duplicate the data in s1.
  • The heap data from s1 is duplicated and s2 gets its own independent copy.
  • When s1 and s2 go out of scope, they each free their own memory.

Before copy-assignment:

StackHeaps1ptrCpplen3capacity3

After copy-assignment:

StackHeaps1ptrCpplen3capacity3s2ptrCpplen3capacity3

Key points:

  • C++ has made a slightly different choice than Rust. Because = copies data, the string data has to be cloned. Otherwise we would get a double-free when either string goes out of scope.

  • C++ also has std::move, which is used to indicate when a value may be moved from. If the example had been s2 = std::move(s1), no heap allocation would take place. After the move, s1 would be in a valid but unspecified state. Unlike Rust, the programmer is allowed to keep using s1.

  • Unlike Rust, = in C++ can run arbitrary code as determined by the type which is being copied or moved.

Clone

Sometimes you want to make a copy of a value. The Clone trait accomplishes this.

#[derive(Default)]
struct Backends {
    hostnames: Vec<String>,
    weights: Vec<f64>,
}

impl Backends {
    fn set_hostnames(&mut self, hostnames: &Vec<String>) {
        self.hostnames = hostnames.clone();
        self.weights = hostnames.iter().map(|_| 1.0).collect();
    }
}
This slide should take about 2 minutes.

The idea of Clone is to make it easy to spot where heap allocations are occurring. Look for .clone() and a few others like Vec::new or Box::new.

It’s common to “clone your way out” of problems with the borrow checker, and return later to try to optimize those clones away.

Copy Types

While move semantics are the default, certain types are copied by default:

fn main() {
    let x = 42;
    let y = x;

    println!("x: {x}"); // would not be accessible if not Copy
    println!("y: {y}");
}

These types implement the Copy trait.

You can opt-in your own types to use copy semantics:

#[derive(Copy, Clone, Debug)]
struct Point(i32, i32);

fn main() {
    let p1 = Point(3, 4);
    let p2 = p1;

    println!("p1: {p1:?}");
    println!("p2: {p2:?}");
}
  • After the assignment, both p1 and p2 own their own data.
  • We can also use p1.clone() to explicitly copy the data.
This slide should take about 5 minutes.

Copying and cloning are not the same thing:

  • Copying refers to bitwise copies of memory regions and does not work on arbitrary objects.
  • Copying does not allow for custom logic (unlike copy constructors in C++).
  • Cloning is a more general operation and also allows for custom behavior by implementing the Clone trait.
  • Copying does not work on types that implement the Drop trait.

In the above example, try the following:

  • Add a String field to struct Point. It will not compile because String is not a Copy type.
  • Remove Copy from the derive attribute. The compiler error is now in the println! for p1.
  • Show that it works if you clone p1 instead.

The Drop Trait

Values which implement Drop can specify code to run when they go out of scope:

struct Droppable {
    name: &'static str,
}

impl Drop for Droppable {
    fn drop(&mut self) {
        println!("Dropping {}", self.name);
    }
}

fn main() {
    let a = Droppable { name: "a" };
    {
        let b = Droppable { name: "b" };
        {
            let c = Droppable { name: "c" };
            let d = Droppable { name: "d" };
            println!("Exiting block B");
        }
        println!("Exiting block A");
    }
    drop(a);
    println!("Exiting main");
}
This slide should take about 8 minutes.
  • Note that std::mem::drop is not the same as std::ops::Drop::drop.
  • Values are automatically dropped when they go out of scope.
  • When a value is dropped, if it implements std::ops::Drop then its Drop::drop implementation will be called.
  • All its fields will then be dropped too, whether or not it implements Drop.
  • std::mem::drop is just an empty function that takes any value. The significance is that it takes ownership of the value, so at the end of its scope it gets dropped. This makes it a convenient way to explicitly drop values earlier than they would otherwise go out of scope.
    • This can be useful for objects that do some work on drop: releasing locks, closing files, etc.

Discussion points:

  • Why doesn’t Drop::drop take self?
    • Short-answer: If it did, std::mem::drop would be called at the end of the block, resulting in another call to Drop::drop, and a stack overflow!
  • Try replacing drop(a) with a.drop().

Day 2: Afternoon Exercises

You should do the following Rustlings:

  • move_semantics1
  • move_semantics2
  • move_semantics3
  • move_semantics4
  • move_semantics5
  • move_semantics6

Welcome to Day 3

In this session:

Including 10 minute breaks, this session should take about 1 hour and 55 minutes

Smart Pointers

In this segment:

This segment should take about 25 minutes

Box<T>

Box is an owned pointer to data on the heap:

fn main() {
    let five = Box::new(5);
    println!("five: {}", *five);
}
5StackHeapfive

Box<T> implements Deref<Target = T>, which means that you can call methods from T directly on a Box<T>.

Recursive data types or data types with dynamic sizes need to use a Box:

#[derive(Debug)]
enum List<T> {
    /// A non-empty list: first element and the rest of the list.
    Element(T, Box<List<T>>),
    /// An empty list.
    Nil,
}

fn main() {
    let list: List<i32> =
        List::Element(1, Box::new(List::Element(2, Box::new(List::Nil))));
    println!("{list:?}");
}
StackHeaplistElement1Element2Nil
This slide should take about 8 minutes.
  • Box is like std::unique_ptr in C++, except that it’s guaranteed to be not null.

  • A Box can be useful when you:

    • have a type whose size that can’t be known at compile time, but the Rust compiler wants to know an exact size.
    • want to transfer ownership of a large amount of data. To avoid copying large amounts of data on the stack, instead store the data on the heap in a Box so only the pointer is moved.
  • If Box was not used and we attempted to embed a List directly into the List, the compiler would not be able to compute a fixed size for the struct in memory (the List would be of infinite size).

  • Box solves this problem as it has the same size as a regular pointer and just points at the next element of the List in the heap.

  • Remove the Box in the List definition and show the compiler error. We get the message “recursive without indirection”, because for data recursion, we have to use indirection, a Box or reference of some kind, instead of storing the value directly.

More to Explore

Niche Optimization

#[derive(Debug)]
enum List<T> {
    Element(T, Box<List<T>>),
    Nil,
}

fn main() {
    let list: List<i32> =
        List::Element(1, Box::new(List::Element(2, Box::new(List::Nil))));
    println!("{list:?}");
}

A Box cannot be empty, so the pointer is always valid and non-null. This allows the compiler to optimize the memory layout:

StackHeaplistElement1Element2

Rc

Rc is a reference-counted shared pointer. Use this when you need to refer to the same data from multiple places:

use std::rc::Rc;

fn main() {
    let a = Rc::new(10);
    let b = Rc::clone(&a);

    println!("a: {a}");
    println!("b: {b}");
}
  • See Arc and Mutex if you are in a multi-threaded context.
  • You can downgrade a shared pointer into a Weak pointer to create cycles that will get dropped.
This slide should take about 5 minutes.
  • Rc’s count ensures that its contained value is valid for as long as there are references.
  • Rc in Rust is like std::shared_ptr in C++.
  • Rc::clone is cheap: it creates a pointer to the same allocation and increases the reference count. Does not make a deep clone and can generally be ignored when looking for performance issues in code.
  • make_mut actually clones the inner value if necessary (“clone-on-write”) and returns a mutable reference.
  • Use Rc::strong_count to check the reference count.
  • Rc::downgrade gives you a weakly reference-counted object to create cycles that will be dropped properly (likely in combination with RefCell).

Trait Objects

Trait objects allow for values of different types, for instance in a collection:

struct Dog {
    name: String,
    age: i8,
}

struct Cat {
    lives: i8,
}

trait Pet {
    fn talk(&self) -> String;
}

impl Pet for Dog {
    fn talk(&self) -> String {
        format!("Woof, my name is {}!", self.name)
    }
}

impl Pet for Cat {
    fn talk(&self) -> String {
        String::from("Miau!")
    }
}

fn main() {
    let pets: Vec<Box<dyn Pet>> = vec![
        Box::new(Cat { lives: 9 }),
        Box::new(Dog { name: String::from("Fido"), age: 5 }),
    ];

    for pet in pets {
        println!("Hello, who are you? {}", pet.talk());
    }
}

Memory layout after allocating pets:

<Dog as Pet>::talk<Cat as Pet>::talkStackHeapFidoptrlives9len2capacity2data:name,4,4age5vtablevtablepets: Vec<dyn Pet>data: CatDogProgram text
This slide should take about 10 minutes.
  • Types that implement a given trait may be of different sizes. This makes it impossible to have things like Vec<dyn Pet> in the example above.

  • dyn Pet is a way to tell the compiler about a dynamically sized type that implements Pet.

  • In the example, pets is allocated on the stack and the vector data is on the heap. The two vector elements are fat pointers:

    • A fat pointer is a double-width pointer. It has two components: a pointer to the actual object and a pointer to the virtual method table (vtable) for the Pet implementation of that particular object.
    • The data for the Dog named Fido is the name and age fields. The Cat has a lives field.
  • Compare these outputs in the above example:

    println!("{} {}", std::mem::size_of::<Dog>(), std::mem::size_of::<Cat>());
    println!("{} {}", std::mem::size_of::<&Dog>(), std::mem::size_of::<&Cat>());
    println!("{}", std::mem::size_of::<&dyn Pet>());
    println!("{}", std::mem::size_of::<Box<dyn Pet>>());
    • len and capacity of a Vec and pointers are of type usize which has 8 bytes (64 bits) on 64-bit systems.

Borrowing

In this segment:

This segment should take about 15 minutes

Borrowing a Value

As we saw before, instead of transferring ownership when calling a function, you can let a function borrow the value:

#[derive(Debug)]
struct Point(i32, i32);

fn add(p1: &Point, p2: &Point) -> Point {
    Point(p1.0 + p2.0, p1.1 + p2.1)
}

fn main() {
    let p1 = Point(3, 4);
    let p2 = Point(10, 20);
    let p3 = add(&p1, &p2);
    println!("{p1:?} + {p2:?} = {p3:?}");
}
  • The add function borrows two points and returns a new point.
  • The caller retains ownership of the inputs.
This slide should take about 5 minutes.

This slide is a review of the material on references from day 1, expanding slightly to include function arguments and return values.

Borrow Checking

Rust’s borrow checker puts constraints on the ways you can borrow values. For a given value, at any time:

  • You can have one or more shared references to the value, or
  • You can have exactly one exclusive reference to the value.
fn main() {
    let mut a: i32 = 10;
    let b: &i32 = &a;

    {
        let c: &mut i32 = &mut a;
        *c = 20;
    }

    println!("a: {a}");
    println!("b: {b}");
}
This slide should take about 10 minutes.
  • Note that the requirement is that conflicting references not exist at the same point. It does not matter where the reference is dereferenced.
  • The above code does not compile because a is borrowed as mutable (through c) and as immutable (through b) at the same time.
  • Move the println! statement for b before the scope that introduces c to make the code compile.
  • After that change, the compiler realizes that b is only ever used before the new mutable borrow of a through c. This is a feature of the borrow checker called “non-lexical lifetimes”.
  • The exclusive reference constraint is quite strong. Rust uses it to ensure that data races do not occur. Rust also relies on this constraint to optimize code. For example, a value behind a shared reference can be safely cached in a register for the lifetime of that reference.
  • The borrow checker is designed to accommodate many common patterns, such as taking exclusive references to different fields in a struct at the same time. But, there are some situations where it doesn’t quite “get it” and this often results in “fighting with the borrow checker.”

Slices and Lifetimes

In this segment:

This segment should take about 40 minutes

Slices

A slice gives you a view into a larger collection:

fn main() {
    let mut a: [i32; 6] = [10, 20, 30, 40, 50, 60];
    println!("a: {a:?}");

    let s: &[i32] = &a[2..4];

    println!("s: {s:?}");
}
  • Slices borrow data from the sliced type.
  • Question: What happens if you modify a[3] right before printing s?
This slide should take about 10 minutes.
  • We create a slice by borrowing a and specifying the starting and ending indexes in brackets.

  • If the slice starts at index 0, Rust’s range syntax allows us to drop the starting index, meaning that &a[0..a.len()] and &a[..a.len()] are identical.

  • The same is true for the last index, so &a[2..a.len()] and &a[2..] are identical.

  • To easily create a slice of the full array, we can therefore use &a[..].

  • s is a reference to a slice of i32s. Notice that the type of s (&[i32]) no longer mentions the array length. This allows us to perform computation on slices of different sizes.

  • Slices always borrow from another object. In this example, a has to remain ‘alive’ (in scope) for at least as long as our slice.

  • The question about modifying a[3] can spark an interesting discussion, but the answer is that for memory safety reasons you cannot do it through a at this point in the execution, but you can read the data from both a and s safely. It works before you created the slice, and again after the println, when the slice is no longer used.

String References

We can now understand the two string types in Rust: &str is almost like &[char], but with its data stored in a variable-length encoding (UTF-8).

fn main() {
    let s1: &str = "World";
    println!("s1: {s1}");

    let mut s2: String = String::from("Hello ");
    println!("s2: {s2}");
    s2.push_str(s1);
    println!("s2: {s2}");

    let s3: &str = &s2[6..];
    println!("s3: {s3}");
}

Rust terminology:

  • &str an immutable reference to a string slice.
  • String a mutable string buffer.
This slide should take about 10 minutes.
  • &str introduces a string slice, which is an immutable reference to UTF-8 encoded string data stored in a block of memory. String literals (”Hello”), are stored in the program’s binary.

  • Rust’s String type is a wrapper around a vector of bytes. As with a Vec<T>, it is owned.

  • As with many other types String::from() creates a string from a string literal; String::new() creates a new empty string, to which string data can be added using the push() and push_str() methods.

  • The format!() macro is a convenient way to generate an owned string from dynamic values. It accepts the same format specification as println!().

  • You can borrow &str slices from String via & and optionally range selection. If you select a byte range that is not aligned to character boundaries, the expression will panic. The chars iterator iterates over characters and is preferred over trying to get character boundaries right.

  • For C++ programmers: think of &str as std::string_view from C++, but the one that always points to a valid string in memory. Rust String is a rough equivalent of std::string from C++ (main difference: it can only contain UTF-8 encoded bytes and will never use a small-string optimization).

  • Byte strings literals allow you to create a &[u8] value directly:

    fn main() {
        println!("{:?}", b"abc");
        println!("{:?}", &[97, 98, 99]);
    }

Lifetime Annotations

A reference has a lifetime, which must not “outlive” the value it refers to. This is verified by the borrow checker.

The lifetime can be implicit - this is what we have seen so far. Lifetimes can also be explicit: &'a Point, &'document str. Lifetimes start with ' and 'a is a typical default name. Read &'a Point as “a borrowed Point which is valid for at least the lifetime a”.

Lifetimes are always inferred by the compiler: you cannot assign a lifetime yourself. Explicit lifetime annotations create constraints where there is ambiguity; the compiler verifies that there is a valid solution.

Lifetimes become more complicated when considering passing values to and returning values from functions.

#[derive(Debug)]
struct Point(i32, i32);

fn left_most(p1: &Point, p2: &Point) -> &Point {
    if p1.0 < p2.0 {
        p1
    } else {
        p2
    }
}

fn main() {
    let p1: Point = Point(10, 10);
    let p2: Point = Point(20, 20);

    let p_ref = left_most(&p1, &p2); // What is the lifetime of p_ref?
    println!("p_ref: {p_ref:?}");
}
This slide should take about 10 minutes.

In this example, the compiler does not know what lifetime to infer for p_ref. Looking inside the function body shows that it can only safely assume that p_ref’s lifetime is the shorter of p1 and p2. But just like types, Rust requires explicit annotations of lifetimes on function arguments and return values.

Add 'a appropriately to left_most:

fn left_most<'a>(p1: &'a Point, p2: &'a Point) -> &'a Point {

This says, “given p1 and p2 which both outlive 'a, the return value lives for at least 'a.

In common cases, lifetimes can be elided, as described on the next slide.

Lifetimes in Function Calls

Lifetimes for function arguments and return values must be fully specified, but Rust allows lifetimes to be elided in most cases with a few simple rules. This is not inference – it is just a syntactic shorthand.

  • Each argument which does not have a lifetime annotation is given one.
  • If there is only one argument lifetime, it is given to all un-annotated return values.
  • If there are multiple argument lifetimes, but the first one is for self, that lifetime is given to all un-annotated return values.
#[derive(Debug)]
struct Point(i32, i32);

fn cab_distance(p1: &Point, p2: &Point) -> i32 {
    (p1.0 - p2.0).abs() + (p1.1 - p2.1).abs()
}

fn nearest<'a>(points: &'a [Point], query: &Point) -> Option<&'a Point> {
    let mut nearest = None;
    for p in points {
        if let Some((_, nearest_dist)) = nearest {
            let dist = cab_distance(p, query);
            if dist < nearest_dist {
                nearest = Some((p, dist));
            }
        } else {
            nearest = Some((p, cab_distance(p, query)));
        };
    }
    nearest.map(|(p, _)| p)
}

fn main() {
    println!(
        "{:?}",
        nearest(
            &[Point(1, 0), Point(1, 0), Point(-1, 0), Point(0, -1),],
            &Point(0, 2)
        )
    );
}
This slide should take about 5 minutes.

In this example, cab_distance is trivially elided.

The nearest function provides another example of a function with multiple references in its arguments that requires explicit annotation.

Try adjusting the signature to “lie” about the lifetimes returned:

fn nearest<'a, 'q>(points: &'a [Point], query: &'q Point) -> Option<&'q Point> {

This won’t compile, demonstrating that the annotations are checked for validity by the compiler. Note that this is not the case for raw pointers (unsafe), and this is a common source of errors with unsafe Rust.

Students may ask when to use lifetimes. Rust borrows always have lifetimes. Most of the time, elision and type inference mean these don’t need to be written out. In more complicated cases, lifetime annotations can help resolve ambiguity. Often, especially when prototyping, it’s easier to just work with owned data by cloning values where necessary.

Lifetimes in Data Structures

If a data type stores borrowed data, it must be annotated with a lifetime:

#[derive(Debug)]
struct Highlight<'doc>(&'doc str);

fn erase(text: String) {
    println!("Bye {text}!");
}

fn main() {
    let text = String::from("The quick brown fox jumps over the lazy dog.");
    let fox = Highlight(&text[4..19]);
    let dog = Highlight(&text[35..43]);
    // erase(text);
    println!("{fox:?}");
    println!("{dog:?}");
}
This slide should take about 5 minutes.
  • In the above example, the annotation on Highlight enforces that the data underlying the contained &str lives at least as long as any instance of Highlight that uses that data.
  • If text is consumed before the end of the lifetime of fox (or dog), the borrow checker throws an error.
  • Types with borrowed data force users to hold on to the original data. This can be useful for creating lightweight views, but it generally makes them somewhat harder to use.
  • When possible, make data structures own their data directly.
  • Some structs with multiple references inside can have more than one lifetime annotation. This can be necessary if there is a need to describe lifetime relationships between the references themselves, in addition to the lifetime of the struct itself. Those are very advanced use cases.

Day 3: Morning Exercises

You should do the following Rustlings:

  • box1
  • rc1
  • lifetimes1
  • lifetimes2
  • lifetimes3

Welcome Back

In this session:

Including 10 minute breaks, this session should take about 2 hours and 20 minutes

Standard Library Types

In this segment:

This segment should take about 1 hour

For each of the slides in this section, spend some time reviewing the documentation pages, highlighting some of the more common methods.

Standard Library

Rust comes with a standard library which helps establish a set of common types used by Rust libraries and programs. This way, two libraries can work together smoothly because they both use the same String type.

In fact, Rust contains several layers of the Standard Library: core, alloc and std.

  • core includes the most basic types and functions that don’t depend on libc, allocator or even the presence of an operating system.
  • alloc includes types which require a global heap allocator, such as Vec, Box and Arc.
  • Embedded Rust applications often only use core, and sometimes alloc.

Documentation

Rust comes with extensive documentation. For example:

In fact, you can document your own code:

/// Determine whether the first argument is divisible by the second argument.
///
/// If the second argument is zero, the result is false.
fn is_divisible_by(lhs: u32, rhs: u32) -> bool {
    if rhs == 0 {
        return false;
    }
    lhs % rhs == 0
}

The contents are treated as Markdown. All published Rust library crates are automatically documented at docs.rs using the rustdoc tool. It is idiomatic to document all public items in an API using this pattern.

To document an item from inside the item (such as inside a module), use //! or /*! .. */, called “inner doc comments”:

//! This module contains functionality relating to divisibility of integers.
This slide should take about 5 minutes.

Option

We have already seen some use of Option<T>. It stores either a value of type T or nothing. For example, String::find returns an Option<usize>.

fn main() {
    let name = "Löwe 老虎 Léopard Gepardi";

    let mut position: Option<usize> = name.find('é');
    println!("find returned {position:?}");
    assert_eq!(position.unwrap(), 14);

    position = name.find('Z');
    println!("find returned {position:?}");
    assert_eq!(position.expect("Character not found"), 0);
}
This slide should take about 10 minutes.
  • Option is widely used, not just in the standard library.
  • unwrap will return the value in an Option, or panic. expect is similar but takes an error message.
    • You can panic on None, but you can’t “accidentally” forget to check for None.
    • It’s common to unwrap/expect all over the place when hacking something together, but production code typically handles None in a nicer fashion.
  • The niche optimization means that Option<T> often has the same size in memory as T.

Result

Result is similar to Option, but indicates the success or failure of an operation, each with a different type. This is similar to the Res defined in the expression exercise, but generic: Result<T, E> where T is used in the Ok variant and E appears in the Err variant.

use std::fs::File;
use std::io::Read;

fn main() {
    let file: Result<File, std::io::Error> = File::open("diary.txt");
    match file {
        Ok(mut file) => {
            let mut contents = String::new();
            if let Ok(bytes) = file.read_to_string(&mut contents) {
                println!("Dear diary: {contents} ({bytes} bytes)");
            } else {
                println!("Could not read file content");
            }
        }
        Err(err) => {
            println!("The diary could not be opened: {err}");
        }
    }
}
This slide should take about 10 minutes.
  • As with Option, the successful value sits inside of Result, forcing the developer to explicitly extract it. This encourages error checking. In the case where an error should never happen, unwrap() or expect() can be called, and this is a signal of the developer intent too.
  • Result documentation is a recommended read. Not during the course, but it is worth mentioning. It contains a lot of convenience methods and functions that help functional-style programming.
  • Result is the standard type to implement error handling as we will see on Day 3.

Vec

Vec is the standard resizable heap-allocated buffer:

fn main() {
    let mut v1 = Vec::new();
    v1.push(42);
    println!("v1: len = {}, capacity = {}", v1.len(), v1.capacity());

    let mut v2 = Vec::with_capacity(v1.len() + 1);
    v2.extend(v1.iter());
    v2.push(9999);
    println!("v2: len = {}, capacity = {}", v2.len(), v2.capacity());

    // Canonical macro to initialize a vector with elements.
    let mut v3 = vec![0, 0, 1, 2, 3, 4];

    // Retain only the even elements.
    v3.retain(|x| x % 2 == 0);
    println!("{v3:?}");

    // Remove consecutive duplicates.
    v3.dedup();
    println!("{v3:?}");
}

Vec implements Deref<Target = [T]>, which means that you can call slice methods on a Vec.

This slide should take about 10 minutes.
  • Vec is a type of collection, along with String and HashMap. The data it contains is stored on the heap. This means the amount of data doesn’t need to be known at compile time. It can grow or shrink at runtime.
  • Notice how Vec<T> is a generic type too, but you don’t have to specify T explicitly. As always with Rust type inference, the T was established during the first push call.
  • vec![...] is a canonical macro to use instead of Vec::new() and it supports adding initial elements to the vector.
  • To index the vector you use [ ], but they will panic if out of bounds. Alternatively, using get will return an Option. The pop function will remove the last element.
  • Slices are covered on day 3. For now, students only need to know that a value of type Vec gives access to all of the documented slice methods, too.

String

String is the standard heap-allocated growable UTF-8 string buffer:

fn main() {
    let mut s1 = String::new();
    s1.push_str("Hello");
    println!("s1: len = {}, capacity = {}", s1.len(), s1.capacity());

    let mut s2 = String::with_capacity(s1.len() + 1);
    s2.push_str(&s1);
    s2.push('!');
    println!("s2: len = {}, capacity = {}", s2.len(), s2.capacity());

    let s3 = String::from("🇨🇭");
    println!("s3: len = {}, number of chars = {}", s3.len(), s3.chars().count());
}

String implements Deref<Target = str>, which means that you can call all str methods on a String.

This slide should take about 10 minutes.
  • String::new returns a new empty string, use String::with_capacity when you know how much data you want to push to the string.
  • String::len returns the size of the String in bytes (which can be different from its length in characters).
  • String::chars returns an iterator over the actual characters. Note that a char can be different from what a human will consider a “character” due to grapheme clusters.
  • When people refer to strings they could either be talking about &str or String.
  • When a type implements Deref<Target = T>, the compiler will let you transparently call methods from T.
    • We haven’t discussed the Deref trait yet, so at this point this mostly explains the structure of the sidebar in the documentation.
    • String implements Deref<Target = str> which transparently gives it access to str’s methods.
    • Write and compare let s3 = s1.deref(); and let s3 = &*s1;.
  • String is implemented as a wrapper around a vector of bytes, many of the operations you see supported on vectors are also supported on String, but with some extra guarantees.
  • Compare the different ways to index a String:
    • To a character by using s3.chars().nth(i).unwrap() where i is in-bound, out-of-bounds.
    • To a substring by using s3[0..4], where that slice is on character boundaries or not.
  • Many types can be converted to a string with the to_string method. This trait is automatically implemented for all types that implement Display, so anything that can be formatted can also be converted to a string.

HashMap

Standard hash map with protection against HashDoS attacks:

use std::collections::HashMap;

fn main() {
    let mut page_counts = HashMap::new();
    page_counts.insert("Adventures of Huckleberry Finn".to_string(), 207);
    page_counts.insert("Grimms' Fairy Tales".to_string(), 751);
    page_counts.insert("Pride and Prejudice".to_string(), 303);

    if !page_counts.contains_key("Les Misérables") {
        println!(
            "We know about {} books, but not Les Misérables.",
            page_counts.len()
        );
    }

    for book in ["Pride and Prejudice", "Alice's Adventure in Wonderland"] {
        match page_counts.get(book) {
            Some(count) => println!("{book}: {count} pages"),
            None => println!("{book} is unknown."),
        }
    }

    // Use the .entry() method to insert a value if nothing is found.
    for book in ["Pride and Prejudice", "Alice's Adventure in Wonderland"] {
        let page_count: &mut i32 = page_counts.entry(book.to_string()).or_insert(0);
        *page_count += 1;
    }

    println!("{page_counts:#?}");
}
This slide should take about 10 minutes.
  • HashMap is not defined in the prelude and needs to be brought into scope.

  • Try the following lines of code. The first line will see if a book is in the hashmap and if not return an alternative value. The second line will insert the alternative value in the hashmap if the book is not found.

    let pc1 = page_counts
        .get("Harry Potter and the Sorcerer's Stone")
        .unwrap_or(&336);
    let pc2 = page_counts
        .entry("The Hunger Games".to_string())
        .or_insert(374);
  • Unlike vec!, there is unfortunately no standard hashmap! macro.

    • Although, since Rust 1.56, HashMap implements From<[(K, V); N]>, which allows us to easily initialize a hash map from a literal array:

      let page_counts = HashMap::from([
        ("Harry Potter and the Sorcerer's Stone".to_string(), 336),
        ("The Hunger Games".to_string(), 374),
      ]);
  • Alternatively HashMap can be built from any Iterator which yields key-value tuples.

  • We are showing HashMap<String, i32>, and avoid using &str as key to make examples easier. Using references in collections can, of course, be done, but it can lead into complications with the borrow checker.

    • Try removing to_string() from the example above and see if it still compiles. Where do you think we might run into issues?
  • This type has several “method-specific” return types, such as std::collections::hash_map::Keys. These types often appear in searches of the Rust docs. Show students the docs for this type, and the helpful link back to the keys method.

Standard Library Traits

In this segment:

This segment should take about 1 hour and 10 minutes

As with the standard-library types, spend time reviewing the documentation for each trait.

This section is long. Take a break midway through.

Comparisons

These traits support comparisons between values. All traits can be derived for types containing fields that implement these traits.

PartialEq and Eq

PartialEq is a partial equivalence relation, with required method eq and provided method ne. The == and != operators will call these methods.

struct Key {
    id: u32,
    metadata: Option<String>,
}
impl PartialEq for Key {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}

Eq is a full equivalence relation (reflexive, symmetric, and transitive) and implies PartialEq. Functions that require full equivalence will use Eq as a trait bound.

PartialOrd and Ord

PartialOrd defines a partial ordering, with a partial_cmp method. It is used to implement the <, <=, >=, and > operators.

use std::cmp::Ordering;
#[derive(Eq, PartialEq)]
struct Citation {
    author: String,
    year: u32,
}
impl PartialOrd for Citation {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match self.author.partial_cmp(&other.author) {
            Some(Ordering::Equal) => self.year.partial_cmp(&other.year),
            author_ord => author_ord,
        }
    }
}

Ord is a total ordering, with cmp returning Ordering.

This slide should take about 10 minutes.

PartialEq can be implemented between different types, but Eq cannot, because it is reflexive:

struct Key {
    id: u32,
    metadata: Option<String>,
}
impl PartialEq<u32> for Key {
    fn eq(&self, other: &u32) -> bool {
        self.id == *other
    }
}

In practice, it’s common to derive these traits, but uncommon to implement them.

Operators

Operator overloading is implemented via traits in std::ops:

#[derive(Debug, Copy, Clone)]
struct Point {
    x: i32,
    y: i32,
}

impl std::ops::Add for Point {
    type Output = Self;

    fn add(self, other: Self) -> Self {
        Self { x: self.x + other.x, y: self.y + other.y }
    }
}

fn main() {
    let p1 = Point { x: 10, y: 20 };
    let p2 = Point { x: 100, y: 200 };
    println!("{p1:?} + {p2:?} = {:?}", p1 + p2);
}
This slide should take about 10 minutes.

Discussion points:

  • You could implement Add for &Point. In which situations is that useful?
    • Answer: Add:add consumes self. If type T for which you are overloading the operator is not Copy, you should consider overloading the operator for &T as well. This avoids unnecessary cloning on the call site.
  • Why is Output an associated type? Could it be made a type parameter of the method?
    • Short answer: Function type parameters are controlled by the caller, but associated types (like Output) are controlled by the implementer of a trait.
  • You could implement Add for two different types, e.g. impl Add<(i32, i32)> for Point would add a tuple to a Point.

From and Into

Types implement From and Into to facilitate type conversions:

fn main() {
    let s = String::from("hello");
    let addr = std::net::Ipv4Addr::from([127, 0, 0, 1]);
    let one = i16::from(true);
    let bigger = i32::from(123_i16);
    println!("{s}, {addr}, {one}, {bigger}");
}

Into is automatically implemented when From is implemented:

fn main() {
    let s: String = "hello".into();
    let addr: std::net::Ipv4Addr = [127, 0, 0, 1].into();
    let one: i16 = true.into();
    let bigger: i32 = 123_i16.into();
    println!("{s}, {addr}, {one}, {bigger}");
}
This slide should take about 10 minutes.
  • That’s why it is common to only implement From, as your type will get Into implementation too.
  • When declaring a function argument input type like “anything that can be converted into a String”, the rule is opposite, you should use Into. Your function will accept types that implement From and those that only implement Into.

Casting

Rust has no implicit type conversions, but does support explicit casts with as. These generally follow C semantics where those are defined.

fn main() {
    let value: i64 = 1000;
    println!("as u16: {}", value as u16);
    println!("as i16: {}", value as i16);
    println!("as u8: {}", value as u8);
}

The results of as are always defined in Rust and consistent across platforms. This might not match your intuition for changing sign or casting to a smaller type – check the docs, and comment for clarity.

Casting with as is a relatively sharp tool that is easy to use incorrectly, and can be a source of subtle bugs as future maintenance work changes the types that are used or the ranges of values in types. Casts are best used only when the intent is to indicate unconditional truncation (e.g. selecting the bottom 32 bits of a u64 with as u32, regardless of what was in the high bits).

For infallible casts (e.g. u32 to u64), prefer using From or Into over as to confirm that the cast is in fact infallible. For fallible casts, TryFrom and TryInto are available when you want to handle casts that fit differently from those that don’t.

This slide should take about 5 minutes.

Consider taking a break after this slide.

as is similar to a C++ static cast. Use of as in cases where data might be lost is generally discouraged, or at least deserves an explanatory comment.

This is common in casting integers to usize for use as an index.

Read and Write

Using Read and BufRead, you can abstract over u8 sources:

use std::io::{BufRead, BufReader, Read, Result};

fn count_lines<R: Read>(reader: R) -> usize {
    let buf_reader = BufReader::new(reader);
    buf_reader.lines().count()
}

fn main() -> Result<()> {
    let slice: &[u8] = b"foo\nbar\nbaz\n";
    println!("lines in slice: {}", count_lines(slice));

    let file = std::fs::File::open(std::env::current_exe()?)?;
    println!("lines in file: {}", count_lines(file));
    Ok(())
}

Similarly, Write lets you abstract over u8 sinks:

use std::io::{Result, Write};

fn log<W: Write>(writer: &mut W, msg: &str) -> Result<()> {
    writer.write_all(msg.as_bytes())?;
    writer.write_all("\n".as_bytes())
}

fn main() -> Result<()> {
    let mut buffer = Vec::new();
    log(&mut buffer, "Hello")?;
    log(&mut buffer, "World")?;
    println!("Logged: {buffer:?}");
    Ok(())
}

The Default Trait

Default trait produces a default value for a type.

#[derive(Debug, Default)]
struct Derived {
    x: u32,
    y: String,
    z: Implemented,
}

#[derive(Debug)]
struct Implemented(String);

impl Default for Implemented {
    fn default() -> Self {
        Self("John Smith".into())
    }
}

fn main() {
    let default_struct = Derived::default();
    println!("{default_struct:#?}");

    let almost_default_struct =
        Derived { y: "Y is set!".into(), ..Derived::default() };
    println!("{almost_default_struct:#?}");

    let nothing: Option<Derived> = None;
    println!("{:#?}", nothing.unwrap_or_default());
}
This slide should take about 5 minutes.
  • It can be implemented directly or it can be derived via #[derive(Default)].
  • A derived implementation will produce a value where all fields are set to their default values.
    • This means all types in the struct must implement Default too.
  • Standard Rust types often implement Default with reasonable values (e.g. 0, "", etc).
  • The partial struct initialization works nicely with default.
  • The Rust standard library is aware that types can implement Default and provides convenience methods that use it.
  • The .. syntax is called struct update syntax.

Closures

Closures or lambda expressions have types which cannot be named. However, they implement special Fn, FnMut, and FnOnce traits:

fn apply_with_log(func: impl FnOnce(i32) -> i32, input: i32) -> i32 {
    println!("Calling function on {input}");
    func(input)
}

fn main() {
    let add_3 = |x| x + 3;
    println!("add_3: {}", apply_with_log(add_3, 10));
    println!("add_3: {}", apply_with_log(add_3, 20));

    let mut v = Vec::new();
    let mut accumulate = |x: i32| {
        v.push(x);
        v.iter().sum::<i32>()
    };
    println!("accumulate: {}", apply_with_log(&mut accumulate, 4));
    println!("accumulate: {}", apply_with_log(&mut accumulate, 5));

    let multiply_sum = |x| x * v.into_iter().sum::<i32>();
    println!("multiply_sum: {}", apply_with_log(multiply_sum, 3));
}
This slide should take about 20 minutes.

An Fn (e.g. add_3) neither consumes nor mutates captured values, or perhaps captures nothing at all. It can be called multiple times concurrently.

An FnMut (e.g. accumulate) might mutate captured values. You can call it multiple times, but not concurrently.

If you have an FnOnce (e.g. multiply_sum), you may only call it once. It might consume captured values.

FnMut is a subtype of FnOnce. Fn is a subtype of FnMut and FnOnce. I.e. you can use an FnMut wherever an FnOnce is called for, and you can use an Fn wherever an FnMut or FnOnce is called for.

When you define a function that takes a closure, you should take FnOnce if you can (i.e. you call it once), or FnMut else, and last Fn. This allows the most flexibility for the caller.

In contrast, when you have a closure, the most flexible you can have is Fn (it can be passed everywhere), then FnMut, and lastly FnOnce.

The compiler also infers Copy (e.g. for add_3) and Clone (e.g. multiply_sum), depending on what the closure captures.

By default, closures will capture by reference if they can. The move keyword makes them capture by value.

fn make_greeter(prefix: String) -> impl Fn(&str) {
    return move |name| println!("{prefix} {name}");
}

fn main() {
    let hi = make_greeter("Hi".to_string());
    hi("Greg");
}

Day 3: Afternoon Exercises

You should do the following Rustlings:

  • options1
  • options2
  • options3
  • vecs1
  • vecs2
  • strings1
  • strings2
  • strings3
  • strings4
  • hashmaps1
  • hashmaps2
  • hashmaps3
  • using_as
  • from_into
  • from_str

Welcome to Day 4

Today we will cover topics relating to building large-scale software in Rust:

  • Iterators: a deep dive on the Iterator trait.
  • Modules and visibility.
  • Testing.
  • Error handling: panics, Result, and the try operator ?.

Schedule

In this session:

Including 10 minute breaks, this session should take about 1 hour

Iterators

In this segment:

This segment should take about 15 minutes

Iterator

The Iterator trait supports iterating over values in a collection. It requires a next method and provides lots of methods. Many standard library types implement Iterator, and you can implement it yourself, too:

struct Fibonacci {
    curr: u32,
    next: u32,
}

impl Iterator for Fibonacci {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        let new_next = self.curr + self.next;
        self.curr = self.next;
        self.next = new_next;
        Some(self.curr)
    }
}

fn main() {
    let fib = Fibonacci { curr: 0, next: 1 };
    for (i, n) in fib.enumerate().take(5) {
        println!("fib({i}): {n}");
    }
}
This slide should take about 5 minutes.
  • The Iterator trait implements many common functional programming operations over collections (e.g. map, filter, reduce, etc). This is the trait where you can find all the documentation about them. In Rust these functions should produce the code as efficient as equivalent imperative implementations.

  • IntoIterator is the trait that makes for loops work. It is implemented by collection types such as Vec<T> and references to them such as &Vec<T> and &[T]. Ranges also implement it. This is why you can iterate over a vector with for i in some_vec { .. } but some_vec.next() doesn’t exist.

IntoIterator

The Iterator trait tells you how to iterate once you have created an iterator. The related trait IntoIterator defines how to create an iterator for a type. It is used automatically by the for loop.

struct Grid {
    x_coords: Vec<u32>,
    y_coords: Vec<u32>,
}

impl IntoIterator for Grid {
    type Item = (u32, u32);
    type IntoIter = GridIter;
    fn into_iter(self) -> GridIter {
        GridIter { grid: self, i: 0, j: 0 }
    }
}

struct GridIter {
    grid: Grid,
    i: usize,
    j: usize,
}

impl Iterator for GridIter {
    type Item = (u32, u32);

    fn next(&mut self) -> Option<(u32, u32)> {
        if self.i >= self.grid.x_coords.len() {
            self.i = 0;
            self.j += 1;
            if self.j >= self.grid.y_coords.len() {
                return None;
            }
        }
        let res = Some((self.grid.x_coords[self.i], self.grid.y_coords[self.j]));
        self.i += 1;
        res
    }
}

fn main() {
    let grid = Grid { x_coords: vec![3, 5, 7, 9], y_coords: vec![10, 20, 30, 40] };
    for (x, y) in grid {
        println!("point = {x}, {y}");
    }
}
This slide should take about 5 minutes.

Click through to the docs for IntoIterator. Every implementation of IntoIterator must declare two types:

  • Item: the type to iterate over, such as i8,
  • IntoIter: the Iterator type returned by the into_iter method.

Note that IntoIter and Item are linked: the iterator must have the same Item type, which means that it returns Option<Item>

The example iterates over all combinations of x and y coordinates.

Try iterating over the grid twice in main. Why does this fail? Note that IntoIterator::into_iter takes ownership of self.

Fix this issue by implementing IntoIterator for &Grid and storing a reference to the Grid in GridIter.

The same problem can occur for standard library types: for e in some_vector will take ownership of some_vector and iterate over owned elements from that vector. Use for e in &some_vector instead, to iterate over references to elements of some_vector.

FromIterator

FromIterator lets you build a collection from an Iterator.

fn main() {
    let primes = vec![2, 3, 5, 7];
    let prime_squares = primes.into_iter().map(|p| p * p).collect::<Vec<_>>();
    println!("prime_squares: {prime_squares:?}");
}
This slide should take about 5 minutes.

Iterator implements

fn collect<B>(self) -> B
where
    B: FromIterator<Self::Item>,
    Self: Sized

There are two ways to specify B for this method:

  • With the “turbofish”: some_iterator.collect::<COLLECTION_TYPE>(), as shown. The _ shorthand used here lets Rust infer the type of the Vec elements.
  • With type inference: let prime_squares: Vec<_> = some_iterator.collect(). Rewrite the example to use this form.

There are basic implementations of FromIterator for Vec, HashMap, etc. There are also more specialized implementations which let you do cool things like convert an Iterator<Item = Result<V, E>> into a Result<Vec<V>, E>.

Modules

In this segment:

This segment should take about 25 minutes

Modules

We have seen how impl blocks let us namespace functions to a type.

Similarly, mod lets us namespace types and functions:

mod foo {
    pub fn do_something() {
        println!("In the foo module");
    }
}

mod bar {
    pub fn do_something() {
        println!("In the bar module");
    }
}

fn main() {
    foo::do_something();
    bar::do_something();
}
This slide should take about 3 minutes.
  • Packages provide functionality and include a Cargo.toml file that describes how to build a bundle of 1+ crates.
  • Crates are a tree of modules, where a binary crate creates an executable and a library crate compiles to a library.
  • Modules define organization, scope, and are the focus of this section.

Filesystem Hierarchy

Omitting the module content will tell Rust to look for it in another file:

mod garden;

This tells Rust that the garden module content is found at src/garden.rs. Similarly, a garden::vegetables module can be found at src/garden/vegetables.rs.

The crate root is in:

  • src/lib.rs (for a library crate)
  • src/main.rs (for a binary crate)

Modules defined in files can be documented, too, using “inner doc comments”. These document the item that contains them – in this case, a module.

//! This module implements the garden, including a highly performant germination
//! implementation.

// Re-export types from this module.
pub use garden::Garden;
pub use seeds::SeedPacket;

/// Sow the given seed packets.
pub fn sow(seeds: Vec<SeedPacket>) {
    todo!()
}

/// Harvest the produce in the garden that is ready.
pub fn harvest(garden: &mut Garden) {
    todo!()
}
This slide should take about 5 minutes.
  • Before Rust 2018, modules needed to be located at module/mod.rs instead of module.rs, and this is still a working alternative for editions after 2018.

  • The main reason to introduce filename.rs as alternative to filename/mod.rs was because many files named mod.rs can be hard to distinguish in IDEs.

  • Deeper nesting can use folders, even if the main module is a file:

    src/
    ├── main.rs
    ├── top_module.rs
    └── top_module/
        └── sub_module.rs
    

Visibility

Modules are a privacy boundary:

  • Module items are private by default (hides implementation details).
  • Parent and sibling items are always visible.
  • In other words, if an item is visible in module foo, it’s visible in all the descendants of foo.
mod outer {
    fn private() {
        println!("outer::private");
    }

    pub fn public() {
        println!("outer::public");
    }

    mod inner {
        fn private() {
            println!("outer::inner::private");
        }

        pub fn public() {
            println!("outer::inner::public");
            super::private();
        }
    }
}

fn main() {
    outer::public();
}
This slide should take about 5 minutes.
  • Use the pub keyword to make modules public.

Additionally, there are advanced pub(...) specifiers to restrict the scope of public visibility.

  • See the Rust Reference.
  • Configuring pub(crate) visibility is a common pattern.
  • Less commonly, you can give visibility to a specific path.
  • In any case, visibility must be granted to an ancestor module (and all of its descendants).

use, super, self

A module can bring symbols from another module into scope with use. You will typically see something like this at the top of each module:

use std::collections::HashSet;
use std::process::abort;

Paths

Paths are resolved as follows:

  1. As a relative path:

    • foo or self::foo refers to foo in the current module,
    • super::foo refers to foo in the parent module.
  2. As an absolute path:

    • crate::foo refers to foo in the root of the current crate,
    • bar::foo refers to foo in the bar crate.
This slide should take about 8 minutes.
  • It is common to “re-export” symbols at a shorter path. For example, the top-level lib.rs in a crate might have

    mod storage;
    
    pub use storage::disk::DiskStorage;
    pub use storage::network::NetworkStorage;

    making DiskStorage and NetworkStorage available to other crates with a convenient, short path.

  • For the most part, only items that appear in a module need to be use’d. However, a trait must be in scope to call any methods on that trait, even if a type implementing that trait is already in scope. For example, to use the read_to_string method on a type implementing the Read trait, you need to use std::io::Read.

  • The use statement can have a wildcard: use std::io::*. This is discouraged because it is not clear which items are imported, and those might change over time.

Day 4: Morning Exercises

You should do the following Rustlings:

  • iterators1
  • iterators2
  • iterators3
  • iterators4
  • iterators5
  • modules1
  • modules2
  • modules3

Welcome Back

In this session:

Including 10 minute breaks, this session should take about 50 minutes

Testing

In this segment:

This segment should take about 15 minutes

Unit Tests

Rust and Cargo come with a simple unit test framework:

  • Unit tests are supported throughout your code.

  • Integration tests are supported via the tests/ directory.

Tests are marked with #[test]. Unit tests are often put in a nested tests module, using #[cfg(test)] to conditionally compile them only when building tests.

fn first_word(text: &str) -> &str {
    match text.find(' ') {
        Some(idx) => &text[..idx],
        None => &text,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty() {
        assert_eq!(first_word(""), "");
    }

    #[test]
    fn test_single_word() {
        assert_eq!(first_word("Hello"), "Hello");
    }

    #[test]
    fn test_multiple_words() {
        assert_eq!(first_word("Hello World"), "Hello");
    }
}
  • This lets you unit test private helpers.
  • The #[cfg(test)] attribute is only active when you run cargo test.
This slide should take about 5 minutes.

Run the tests in the playground in order to show their results.

Other Types of Tests

Integration Tests

If you want to test your library as a client, use an integration test.

Create a .rs file under tests/:

// tests/my_library.rs
use my_library::init;

#[test]
fn test_init() {
    assert!(init().is_ok());
}

These tests only have access to the public API of your crate.

Documentation Tests

Rust has built-in support for documentation tests:

#![allow(unused)]
fn main() {
/// Shortens a string to the given length.
///
/// ```
/// # use playground::shorten_string;
/// assert_eq!(shorten_string("Hello World", 5), "Hello");
/// assert_eq!(shorten_string("Hello World", 20), "Hello World");
/// ```
pub fn shorten_string(s: &str, length: usize) -> &str {
    &s[..std::cmp::min(length, s.len())]
}
}
  • Code blocks in /// comments are automatically seen as Rust code.
  • The code will be compiled and executed as part of cargo test.
  • Adding # in the code will hide it from the docs, but will still compile/run it.
  • Test the above code on the Rust Playground.

Compiler Lints and Clippy

The Rust compiler produces fantastic error messages, as well as helpful built-in lints. Clippy provides even more lints, organized into groups that can be enabled per-project.

#[deny(clippy::cast_possible_truncation)]
fn main() {
    let mut x = 200;
    x *= 2;
    println!("x fits in a u8, right? right? {}", x as u8);
}
This slide should take about 3 minutes.

Run the code sample and examine the error message. There are also lints visible here, but those will not be shown once the code compiles. Switch to the Playground site to show those lints.

After resolving the lints, run clippy on the playground site to show clippy warnings. Clippy has extensive documentation of its lints, and adds new lints (including default-deny lints) all the time.

Note that errors or warnings with help: ... can be fixed with cargo fix or via your editor.

Error Handling

In this segment:

This segment should take about 25 minutes

Panics

Rust handles fatal errors with a “panic”.

Rust will trigger a panic if a fatal error happens at runtime:

fn main() {
    let v = vec![10, 20, 30];
    println!("v[100]: {}", v[100]);
}
  • Panics are for unrecoverable and unexpected errors.
    • Panics are symptoms of bugs in the program.
    • Runtime failures like failed bounds checks can panic
    • Assertions (such as assert!) panic on failure
    • Purpose-specific panics can use the panic! macro.
  • A panic will “unwind” the stack, dropping values just as if the functions had returned.
  • Use non-panicking APIs (such as Vec::get) if crashing is not acceptable.
This slide should take about 3 minutes.

By default, a panic will cause the stack to unwind. The unwinding can be caught:

use std::panic;

fn main() {
    let result = panic::catch_unwind(|| "No problem here!");
    println!("{result:?}");

    let result = panic::catch_unwind(|| {
        panic!("oh no!");
    });
    println!("{result:?}");
}
  • Catching is unusual; do not attempt to implement exceptions with catch_unwind!
  • This can be useful in servers which should keep running even if a single request crashes.
  • This does not work if panic = 'abort' is set in your Cargo.toml.

Try Operator

Runtime errors like connection-refused or file-not-found are handled with the Result type, but matching this type on every call can be cumbersome. The try-operator ? is used to return errors to the caller. It lets you turn the common

match some_expression {
    Ok(value) => value,
    Err(err) => return Err(err),
}

into the much simpler

some_expression?

We can use this to simplify our error handling code:

use std::io::Read;
use std::{fs, io};

fn read_username(path: &str) -> Result<String, io::Error> {
    let username_file_result = fs::File::open(path);
    let mut username_file = match username_file_result {
        Ok(file) => file,
        Err(err) => return Err(err),
    };

    let mut username = String::new();
    match username_file.read_to_string(&mut username) {
        Ok(_) => Ok(username),
        Err(err) => Err(err),
    }
}

fn main() {
    //fs::write("config.dat", "alice").unwrap();
    let username = read_username("config.dat");
    println!("username or error: {username:?}");
}
This slide should take about 5 minutes.

Simplify the read_username function to use ?.

Key points:

  • The username variable can be either Ok(string) or Err(error).
  • Use the fs::write call to test out the different scenarios: no file, empty file, file with username.
  • Note that main can return a Result<(), E> as long as it implements std::process::Termination. In practice, this means that E implements Debug. The executable will print the Err variant and return a nonzero exit status on error.

Try Conversions

The effective expansion of ? is a little more complicated than previously indicated:

expression?

works the same as

match expression {
    Ok(value) => value,
    Err(err)  => return Err(From::from(err)),
}

The From::from call here means we attempt to convert the error type to the type returned by the function. This makes it easy to encapsulate errors into higher-level errors.

Example

use std::error::Error;
use std::fmt::{self, Display, Formatter};
use std::fs::File;
use std::io::{self, Read};

#[derive(Debug)]
enum ReadUsernameError {
    IoError(io::Error),
    EmptyUsername(String),
}

impl Error for ReadUsernameError {}

impl Display for ReadUsernameError {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        match self {
            Self::IoError(e) => write!(f, "IO error: {e}"),
            Self::EmptyUsername(path) => write!(f, "Found no username in {path}"),
        }
    }
}

impl From<io::Error> for ReadUsernameError {
    fn from(err: io::Error) -> Self {
        Self::IoError(err)
    }
}

fn read_username(path: &str) -> Result<String, ReadUsernameError> {
    let mut username = String::with_capacity(100);
    File::open(path)?.read_to_string(&mut username)?;
    if username.is_empty() {
        return Err(ReadUsernameError::EmptyUsername(String::from(path)));
    }
    Ok(username)
}

fn main() {
    //fs::write("config.dat", "").unwrap();
    let username = read_username("config.dat");
    println!("username or error: {username:?}");
}
This slide should take about 5 minutes.

The ? operator must return a value compatible with the return type of the function. For Result, it means that the error types have to be compatible. A function that returns Result<T, ErrorOuter> can only use ? on a value of type Result<U, ErrorInner> if ErrorOuter and ErrorInner are the same type or if ErrorOuter implements From<ErrorInner>.

A common alternative to a From implementation is Result::map_err, especially when the conversion only happens in one place.

There is no compatibility requirement for Option. A function returning Option<T> can use the ? operator on Option<U> for arbitrary T and U types.

A function that returns Result cannot use ? on Option and vice versa. However, Option::ok_or converts Option to Result whereas Result::ok turns Result into Option.

Dynamic Error Types

Sometimes we want to allow any type of error to be returned without writing our own enum covering all the different possibilities. The std::error::Error trait makes it easy to create a trait object that can contain any error.

use std::error::Error;
use std::fs;
use std::io::Read;

fn read_count(path: &str) -> Result<i32, Box<dyn Error>> {
    let mut count_str = String::new();
    fs::File::open(path)?.read_to_string(&mut count_str)?;
    let count: i32 = count_str.parse()?;
    Ok(count)
}

fn main() {
    fs::write("count.dat", "1i3").unwrap();
    match read_count("count.dat") {
        Ok(count) => println!("Count: {count}"),
        Err(err) => println!("Error: {err}"),
    }

    // Clean up
    fs::remove_file("count.dat").unwrap();
}
This slide should take about 5 minutes.

The read_count function can return std::io::Error (from file operations) or std::num::ParseIntError (from String::parse).

Boxing errors saves on code, but gives up the ability to cleanly handle different error cases differently in the program. As such it’s generally not a good idea to use Box<dyn Error> in the public API of a library, but it can be a good option in a program where you just want to display the error message somewhere.

Make sure to implement the std::error::Error trait when defining a custom error type so it can be boxed.

thiserror and anyhow

The thiserror and anyhow crates are widely used to simplify error handling.

  • thiserror is often used in libraries to create custom error types that implement From<T>.
  • anyhow is often used by applications to help with error handling in functions, including adding contextual information to your errors.
use anyhow::{bail, Context, Result};
use std::fs;
use std::io::Read;
use thiserror::Error;

#[derive(Clone, Debug, Eq, Error, PartialEq)]
#[error("Found no username in {0}")]
struct EmptyUsernameError(String);

fn read_username(path: &str) -> Result<String> {
    let mut username = String::with_capacity(100);
    fs::File::open(path)
        .with_context(|| format!("Failed to open {path}"))?
        .read_to_string(&mut username)
        .context("Failed to read")?;
    if username.is_empty() {
        bail!(EmptyUsernameError(path.to_string()));
    }
    Ok(username)
}

fn main() {
    //fs::write("config.dat", "").unwrap();
    match read_username("config.dat") {
        Ok(username) => println!("Username: {username}"),
        Err(err) => println!("Error: {err:?}"),
    }
}
This slide should take about 5 minutes.

thiserror

  • The Error derive macro is provided by thiserror, and has lots of useful attributes to help define error types in a compact way.
  • The std::error::Error trait is derived automatically.
  • The message from #[error] is used to derive the Display trait.

anyhow

  • anyhow::Error is essentially a wrapper around Box<dyn Error>. As such it’s again generally not a good choice for the public API of a library, but is widely used in applications.
  • anyhow::Result<V> is a type alias for Result<V, anyhow::Error>.
  • Actual error type inside of it can be extracted for examination if necessary.
  • Functionality provided by anyhow::Result<T> may be familiar to Go developers, as it provides similar usage patterns and ergonomics to (T, error) from Go.
  • anyhow::Context is a trait implemented for the standard Result and Option types. use anyhow::Context is necessary to enable .context() and .with_context() on those types.

Day 4: Afternoon Exercises

You should do the following Rustlings:

  • tests1
  • tests2
  • tests3
  • tests4
  • quiz2
  • errors1
  • errors2
  • errors3
  • errors4
  • errors5
  • errors6
  • try_from_into
  • quiz3

Welcome to Concurrency in Rust

Rust has full support for concurrency using OS threads with mutexes and channels.

The Rust type system plays an important role in making many concurrency bugs compile time bugs. This is often referred to as fearless concurrency since you can rely on the compiler to ensure correctness at runtime.

  • Rust lets us access OS concurrency toolkit: threads, sync. primitives, etc.
  • The type system gives us safety for concurrency without any special features.
  • The same tools that help with “concurrent” access in a single thread (e.g., a called function that might mutate an argument or save references to it to read later) save us from multi-threading issues.

Threads

Rust threads work similarly to threads in other languages:

use std::thread;
use std::time::Duration;

fn main() {
    thread::spawn(|| {
        for i in 1..10 {
            println!("Count in thread: {i}!");
            thread::sleep(Duration::from_millis(5));
        }
    });

    for i in 1..5 {
        println!("Main thread: {i}");
        thread::sleep(Duration::from_millis(5));
    }
}
  • Threads are all daemon threads, the main thread does not wait for them.
  • Thread panics are independent of each other.
  • Rust thread APIs look not too different from e.g. C++ ones.

  • Run the example.

    • 5ms timing is loose enough that main and spawned threads stay mostly in lockstep.
    • Notice that the program ends before the spawned thread reaches 10!
    • This is because main ends the program and spawned threads do not make it persist.
      • Compare to pthreads/C++ std::thread/boost::thread if desired.
  • How do we wait around for the spawned thread to complete?

  • thread::spawn returns a JoinHandle. Look at the docs.

    • JoinHandle has a .join() method that blocks.
  • Use let handle = thread::spawn(...) and later handle.join() to wait for the thread to finish and have the program count all the way to 10.

  • Now what if we want to return a value?

  • Look at docs again:

  • Use the Result return value from handle.join() to get access to the returned value.

  • Ok, what about the other case?

    • Trigger a panic in the thread. Note that this doesn’t panic main, but join() returns an error(Err).
  • Now we can return values from threads! What about taking inputs?

    • Capture something by reference in the thread closure.
    • An error message indicates we must move it.
    • Move it in, see we can compute and then return a derived value.
  • If we want to borrow?

    • Main kills child threads when it returns, but another function would just return and leave them running.
    • That would be stack use-after-return, which violates memory safety!
    • How do we avoid this? see next slide.

Scoped Threads

Normal threads cannot borrow from their environment:

use std::thread;

fn foo() {
    let s = String::from("Hello");
    thread::spawn(|| {
        println!("Length: {}", s.len());
    });
}

fn main() {
    foo();
}

However, you can use a scoped thread for this:

use std::thread;

fn main() {
    let s = String::from("Hello");

    thread::scope(|scope| {
        scope.spawn(|| {
            println!("Length: {}", s.len());
        });
    });
}
  • The reason for that is that when the thread::scope function completes, all the threads are guaranteed to be joined, so they can return borrowed data.
  • Normal Rust borrowing rules apply: you can either borrow mutably by one thread, or immutably by any number of threads.

Channels

Channel functionality is limited in the standard library. For multiple-producer multiple-consumer channels and other concurrent utilities, check out the crossbeam crate.

Rust channels have two parts: a Sender<T> and a Receiver<T>. The two parts are connected via the channel, but you only see the end-points.

use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel();

    tx.send(10).unwrap();
    tx.send(20).unwrap();

    println!("Received: {:?}", rx.recv());
    println!("Received: {:?}", rx.recv());

    let tx2 = tx.clone();
    tx2.send(30).unwrap();
    println!("Received: {:?}", rx.recv());
}
  • mpsc stands for Multi-Producer, Single-Consumer. Sender and SyncSender implement Clone (so you can make multiple producers) but Receiver does not.
  • send() and recv() return Result. If they return Err, it means the counterpart Sender or Receiver is dropped and the channel is closed.

Unbounded Channels

You get an unbounded and asynchronous channel with mpsc::channel():

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let thread_id = thread::current().id();
        for i in 1..10 {
            tx.send(format!("Message {i}")).unwrap();
            println!("{thread_id:?}: sent Message {i}");
        }
        println!("{thread_id:?}: done");
    });
    thread::sleep(Duration::from_millis(100));

    for msg in rx {
        println!("Main: got {msg}");
    }
}

Bounded Channels

With bounded (synchronous) channels, send can block the current thread:

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::sync_channel(3);

    thread::spawn(move || {
        let thread_id = thread::current().id();
        for i in 1..10 {
            tx.send(format!("Message {i}")).unwrap();
            println!("{thread_id:?}: sent Message {i}");
        }
        println!("{thread_id:?}: done");
    });
    thread::sleep(Duration::from_millis(100));

    for msg in rx {
        println!("Main: got {msg}");
    }
}
  • Calling send will block the current thread until there is space in the channel for the new message. The thread can be blocked indefinitely if there is nobody who reads from the channel.
  • A call to send will abort with an error (that is why it returns Result) if the channel is closed. A channel is closed when the receiver is dropped.
  • A bounded channel with a size of zero is called a “rendezvous channel”. Every send will block the current thread until another thread calls read.

Send and Sync

How does Rust know to forbid shared access across threads? The answer is in two traits:

  • Send: a type T is Send if it is safe to move a T across a thread boundary.
  • Sync: a type T is Sync if it is safe to move a &T across a thread boundary.

Send and Sync are unsafe traits. The compiler will automatically derive them for your types as long as they only contain Send and Sync types. You can also implement them manually when you know it is valid.

  • One can think of these traits as markers that the type has certain thread-safety properties.
  • They can be used in the generic constraints as normal traits.

Send

A type T is Send if it is safe to move a T value to another thread.

The effect of moving ownership to another thread is that destructors will run in that thread. So the question is when you can allocate a value in one thread and deallocate it in another.

As an example, a connection to the SQLite library must only be accessed from a single thread.

Sync

A type T is Sync if it is safe to access a T value from multiple threads at the same time.

More precisely, the definition is:

T is Sync if and only if &T is Send

This statement is essentially a shorthand way of saying that if a type is thread-safe for shared use, it is also thread-safe to pass references of it across threads.

This is because if a type is Sync it means that it can be shared across multiple threads without the risk of data races or other synchronization issues, so it is safe to move it to another thread. A reference to the type is also safe to move to another thread, because the data it references can be accessed from any thread safely.

Examples

Send + Sync

Most types you come across are Send + Sync:

  • i8, f32, bool, char, &str, …
  • (T1, T2), [T; N], &[T], struct { x: T }, …
  • String, Option<T>, Vec<T>, Box<T>, …
  • Arc<T>: Explicitly thread-safe via atomic reference count.
  • Mutex<T>: Explicitly thread-safe via internal locking.
  • AtomicBool, AtomicU8, …: Uses special atomic instructions.

The generic types are typically Send + Sync when the type parameters are Send + Sync.

Send + !Sync

These types can be moved to other threads, but they’re not thread-safe. Typically because of interior mutability:

  • mpsc::Sender<T>
  • mpsc::Receiver<T>
  • Cell<T>
  • RefCell<T>

!Send + Sync

These types are thread-safe, but they cannot be moved to another thread:

  • MutexGuard<T: Sync>: Uses OS level primitives which must be deallocated on the thread which created them.

!Send + !Sync

These types are not thread-safe and cannot be moved to other threads:

  • Rc<T>: each Rc<T> has a reference to an RcBox<T>, which contains a non-atomic reference count.
  • *const T, *mut T: Rust assumes raw pointers may have special concurrency considerations.

Shared State

Rust uses the type system to enforce synchronization of shared data. This is primarily done via two types:

  • Arc<T>, atomic reference counted T: handles sharing between threads and takes care to deallocate T when the last reference is dropped,
  • Mutex<T>: ensures mutually exclusive access to the T value.

Arc

Arc<T> allows shared read-only access via Arc::clone:

use std::sync::Arc;
use std::thread;

fn main() {
    let v = Arc::new(vec![10, 20, 30]);

    let handles = (0..5).map(|_| {
        let v = Arc::clone(&v);

        thread::spawn(move || {
            let thread_id = thread::current().id();
            println!("{thread_id:?}: {v:?}");
        })
    }).collect::<Vec<_>>();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("v: {v:?}");
}
  • Arc stands for “Atomic Reference Counted”, a thread safe version of Rc that uses atomic operations.
  • Arc<T> implements Clone whether or not T does. It implements Send and Sync if and only if T implements them both.
  • Arc::clone() has the cost of atomic operations that get executed, but after that the use of the T is free.
  • Beware of reference cycles, Arc does not use a garbage collector to detect them.
    • std::sync::Weak can help.

Mutex

Mutex<T> ensures mutual exclusion and allows mutable access to T behind a read-only interface (another form of interior mutability):

use std::sync::Mutex;

fn main() {
    let v = Mutex::new(vec![10, 20, 30]);
    println!("v: {:?}", v.lock().unwrap());

    {
        let mut guard = v.lock().unwrap();
        guard.push(40);
    }

    println!("v: {:?}", v.lock().unwrap());
}

Notice how we have a impl<T: Send> Sync for Mutex<T> blanket implementation.

  • Mutex in Rust looks like a collection with just one element — the protected data.
    • It is not possible to forget to acquire the mutex before accessing the protected data.
  • You can get a &mut T from a &Mutex<T> by taking the lock. The MutexGuard ensures that the &mut T doesn’t outlive the lock being held.
  • Mutex<T> implements both Send and Sync iff (if and only if) T implements Send.
  • A read-write lock counterpart: RwLock.
  • Why does lock() return a Result?
    • If the thread that held the Mutex panicked, the Mutex becomes “poisoned” to signal that the data it protected might be in an inconsistent state. Calling lock() on a poisoned mutex fails with a PoisonError. You can call into_inner() on the error to recover the data regardless.

Example

Let us see Arc and Mutex in action:

use std::thread;
// use std::sync::{Arc, Mutex};

fn main() {
    let v = vec![10, 20, 30];
    let handle = thread::spawn(|| {
        v.push(10);
    });
    v.push(1000);

    handle.join().unwrap();
    println!("v: {v:?}");
}

Possible solution:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let v = Arc::new(Mutex::new(vec![10, 20, 30]));

    let v2 = Arc::clone(&v);
    let handle = thread::spawn(move || {
        let mut v2 = v2.lock().unwrap();
        v2.push(10);
    });

    {
        let mut v = v.lock().unwrap();
        v.push(1000);
    }

    handle.join().unwrap();

    println!("v: {v:?}");
}

Notable parts:

  • v is wrapped in both Arc and Mutex, because their concerns are orthogonal.
    • Wrapping a Mutex in an Arc is a common pattern to share mutable state between threads.
  • v: Arc<_> needs to be cloned as v2 before it can be moved into another thread. Note move was added to the lambda signature.
  • Blocks are introduced to narrow the scope of the LockGuard as much as possible.

Day 5: Morning Exercises

You should do the following Rustlings:

  • arc1
  • threads1
  • threads2
  • threads3
  • clippy1
  • clippy2
  • clippy3

Async Rust

“Async” is a concurrency model where multiple tasks are executed concurrently by executing each task until it would block, then switching to another task that is ready to make progress. The model allows running a larger number of tasks on a limited number of threads. This is because the per-task overhead is typically very low and operating systems provide primitives for efficiently identifying I/O that is able to proceed.

Rust’s asynchronous operation is based on “futures”, which represent work that may be completed in the future. Futures are “polled” until they signal that they are complete.

Futures are polled by an async runtime, and several different runtimes are available.

Comparisons

  • Python has a similar model in its asyncio. However, its Future type is callback-based, and not polled. Async Python programs require a “loop”, similar to a runtime in Rust.

  • JavaScript’s Promise is similar, but again callback-based. The language runtime implements the event loop, so many of the details of Promise resolution are hidden.

async/await

At a high level, async Rust code looks very much like “normal” sequential code:

use futures::executor::block_on;

async fn count_to(count: i32) {
    for i in 1..=count {
        println!("Count is: {i}!");
    }
}

async fn async_main(count: i32) {
    count_to(count).await;
}

fn main() {
    block_on(async_main(10));
}

Key points:

  • Note that this is a simplified example to show the syntax. There is no long running operation or any real concurrency in it!

  • What is the return type of an async call?

    • Use let future: () = async_main(10); in main to see the type.
  • The “async” keyword is syntactic sugar. The compiler replaces the return type with a future.

  • You cannot make main async, without additional instructions to the compiler on how to use the returned future.

  • You need an executor to run async code. block_on blocks the current thread until the provided future has run to completion.

  • .await asynchronously waits for the completion of another operation. Unlike block_on, .await doesn’t block the current thread.

  • .await can only be used inside an async function (or block; these are introduced later).

Futures

Future is a trait, implemented by objects that represent an operation that may not be complete yet. A future can be polled, and poll returns a Poll.

#![allow(unused)]
fn main() {
use std::pin::Pin;
use std::task::Context;

trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

enum Poll<T> {
    Ready(T),
    Pending,
}
}

An async function returns an impl Future. It’s also possible (but uncommon) to implement Future for your own types. For example, the JoinHandle returned from tokio::spawn implements Future to allow joining to it.

The .await keyword, applied to a Future, causes the current async function to pause until that Future is ready, and then evaluates to its output.

  • The Future and Poll types are implemented exactly as shown; click the links to show the implementations in the docs.

  • We will not get to Pin and Context, as we will focus on writing async code, rather than building new async primitives. Briefly:

    • Context allows a Future to schedule itself to be polled again when an event occurs.

    • Pin ensures that the Future isn’t moved in memory, so that pointers into that future remain valid. This is required to allow references to remain valid after an .await.

Runtimes

A runtime provides support for performing operations asynchronously (a reactor) and is responsible for executing futures (an executor). Rust does not have a “built-in” runtime, but several options are available:

  • Tokio: performant, with a well-developed ecosystem of functionality like Hyper for HTTP or Tonic for gRPC.
  • async-std: aims to be a “std for async”, and includes a basic runtime in async::task.
  • smol: simple and lightweight

Several larger applications have their own runtimes. For example, Fuchsia already has one.

  • Note that of the listed runtimes, only Tokio is supported in the Rust playground. The playground also does not permit any I/O, so most interesting async things can’t run in the playground.

  • Futures are “inert” in that they do not do anything (not even start an I/O operation) unless there is an executor polling them. This differs from JS Promises, for example, which will run to completion even if they are never used.

Tokio

Tokio provides:

  • A multi-threaded runtime for executing asynchronous code.
  • An asynchronous version of the standard library.
  • A large ecosystem of libraries.
use tokio::time;

async fn count_to(count: i32) {
    for i in 1..=count {
        println!("Count in task: {i}!");
        time::sleep(time::Duration::from_millis(5)).await;
    }
}

#[tokio::main]
async fn main() {
    tokio::spawn(count_to(10));

    for i in 1..5 {
        println!("Main task: {i}");
        time::sleep(time::Duration::from_millis(5)).await;
    }
}
  • With the tokio::main macro, we can now make main async.

  • The spawn function creates a new, concurrent “task”.

  • Note: spawn takes a Future, you don’t call .await on count_to.

Further exploration:

  • Why does count_to not (usually) get to 10? This is an example of async cancellation. tokio::spawn returns a handle which can be awaited to wait until it finishes.

  • Try count_to(10).await instead of spawning.

  • Try awaiting the task returned from tokio::spawn.

Tasks

Rust has a task system, which is a form of lightweight threading.

A task has a single top-level future which the executor polls to make progress. That future may have one or more nested futures that its poll method polls, corresponding loosely to a call stack. Concurrency within a task is possible by polling multiple child futures, such as racing a timer and an I/O operation.

use tokio::io::{self, AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0").await?;
    println!("listening on port {}", listener.local_addr()?.port());

    loop {
        let (mut socket, addr) = listener.accept().await?;

        println!("connection from {addr:?}");

        tokio::spawn(async move {
            socket.write_all(b"Who are you?\n").await.expect("socket error");

            let mut buf = vec![0; 1024];
            let name_size = socket.read(&mut buf).await.expect("socket error");
            let name = std::str::from_utf8(&buf[..name_size]).unwrap().trim();
            let reply = format!("Thanks for dialing in, {name}!\n");
            socket.write_all(reply.as_bytes()).await.expect("socket error");
        });
    }
}

Copy this example into your prepared src/main.rs and run it from there.

Try connecting to it with a TCP connection tool like nc or telnet.

  • Ask students to visualize what the state of the example server would be with a few connected clients. What tasks exist? What are their Futures?

  • This is the first time we’ve seen an async block. This is similar to a closure, but does not take any arguments. Its return value is a Future, similar to an async fn.

  • Refactor the async block into a function, and improve the error handling using ?.

Async Channels

Several crates have support for asynchronous channels. For instance tokio:

use tokio::sync::mpsc::{self, Receiver};

async fn ping_handler(mut input: Receiver<()>) {
    let mut count: usize = 0;

    while let Some(_) = input.recv().await {
        count += 1;
        println!("Received {count} pings so far.");
    }

    println!("ping_handler complete");
}

#[tokio::main]
async fn main() {
    let (sender, receiver) = mpsc::channel(32);
    let ping_handler_task = tokio::spawn(ping_handler(receiver));
    for i in 0..10 {
        sender.send(()).await.expect("Failed to send ping.");
        println!("Sent {} pings so far.", i + 1);
    }

    drop(sender);
    ping_handler_task.await.expect("Something went wrong in ping handler task.");
}
  • Change the channel size to 3 and see how it affects the execution.

  • Overall, the interface is similar to the sync channels as seen in the morning class.

  • Try removing the std::mem::drop call. What happens? Why?

  • The Flume crate has channels that implement both sync and async send and recv. This can be convenient for complex applications with both IO and heavy CPU processing tasks.

  • What makes working with async channels preferable is the ability to combine them with other futures to combine them and create complex control flow.

Futures Control Flow

Futures can be combined together to produce concurrent compute flow graphs. We have already seen tasks, that function as independent threads of execution.

Join

A join operation waits until all of a set of futures are ready, and returns a collection of their results. This is similar to Promise.all in JavaScript or asyncio.gather in Python.

use anyhow::Result;
use futures::future;
use reqwest;
use std::collections::HashMap;

async fn size_of_page(url: &str) -> Result<usize> {
    let resp = reqwest::get(url).await?;
    Ok(resp.text().await?.len())
}

#[tokio::main]
async fn main() {
    let urls: [&str; 4] = [
        "https://google.com",
        "https://httpbin.org/ip",
        "https://play.rust-lang.org/",
        "BAD_URL",
    ];
    let futures_iter = urls.into_iter().map(size_of_page);
    let results = future::join_all(futures_iter).await;
    let page_sizes_dict: HashMap<&str, Result<usize>> =
        urls.into_iter().zip(results.into_iter()).collect();
    println!("{:?}", page_sizes_dict);
}

Copy this example into your prepared src/main.rs and run it from there.

  • For multiple futures of disjoint types, you can use std::future::join! but you must know how many futures you will have at compile time. This is currently in the futures crate, soon to be stabilised in std::future.

  • The risk of join is that one of the futures may never resolve, this would cause your program to stall.

  • You can also combine join_all with join! for instance to join all requests to an http service as well as a database query. Try adding a tokio::time::sleep to the future, using futures::join!. This is not a timeout (that requires select!, explained in the next chapter), but demonstrates join!.

Select

A select operation waits until any of a set of futures is ready, and responds to that future’s result. In JavaScript, this is similar to Promise.race. In Python, it compares to asyncio.wait(task_set, return_when=asyncio.FIRST_COMPLETED).

Similar to a match statement, the body of select! has a number of arms, each of the form pattern = future => statement. When a future is ready, its return value is destructured by the pattern. The statement is then run with the resulting variables. The statement result becomes the result of the select! macro.

use tokio::sync::mpsc::{self, Receiver};
use tokio::time::{sleep, Duration};

#[derive(Debug, PartialEq)]
enum Animal {
    Cat { name: String },
    Dog { name: String },
}

async fn first_animal_to_finish_race(
    mut cat_rcv: Receiver<String>,
    mut dog_rcv: Receiver<String>,
) -> Option<Animal> {
    tokio::select! {
        cat_name = cat_rcv.recv() => Some(Animal::Cat { name: cat_name? }),
        dog_name = dog_rcv.recv() => Some(Animal::Dog { name: dog_name? })
    }
}

#[tokio::main]
async fn main() {
    let (cat_sender, cat_receiver) = mpsc::channel(32);
    let (dog_sender, dog_receiver) = mpsc::channel(32);
    tokio::spawn(async move {
        sleep(Duration::from_millis(500)).await;
        cat_sender.send(String::from("Felix")).await.expect("Failed to send cat.");
    });
    tokio::spawn(async move {
        sleep(Duration::from_millis(50)).await;
        dog_sender.send(String::from("Rex")).await.expect("Failed to send dog.");
    });

    let winner = first_animal_to_finish_race(cat_receiver, dog_receiver)
        .await
        .expect("Failed to receive winner");

    println!("Winner is {winner:?}");
}
  • In this example, we have a race between a cat and a dog. first_animal_to_finish_race listens to both channels and will pick whichever arrives first. Since the dog takes 50ms, it wins against the cat that take 500ms.

  • You can use oneshot channels in this example as the channels are supposed to receive only one send.

  • Try adding a deadline to the race, demonstrating selecting different sorts of futures.

  • Note that select! drops unmatched branches, which cancels their futures. It is easiest to use when every execution of select! creates new futures.

    • An alternative is to pass &mut future instead of the future itself, but this can lead to issues, further discussed in the pinning slide.

Pitfalls of async/await

Blocking the executor

Most async runtimes only allow IO tasks to run concurrently. This means that CPU blocking tasks will block the executor and prevent other tasks from being executed. An easy workaround is to use async equivalent methods where possible.

use futures::future::join_all;
use std::time::Instant;

async fn sleep_ms(start: &Instant, id: u64, duration_ms: u64) {
    std::thread::sleep(std::time::Duration::from_millis(duration_ms));
    println!(
        "future {id} slept for {duration_ms}ms, finished after {}ms",
        start.elapsed().as_millis()
    );
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let start = Instant::now();
    let sleep_futures = (1..=10).map(|t| sleep_ms(&start, t, t * 10));
    join_all(sleep_futures).await;
}
  • Run the code and see that the sleeps happen consecutively rather than concurrently.

  • The "current_thread" flavor puts all tasks on a single thread. This makes the effect more obvious, but the bug is still present in the multi-threaded flavor.

  • Switch the std::thread::sleep to tokio::time::sleep and await its result.

  • Another fix would be to tokio::task::spawn_blocking which spawns an actual thread and transforms its handle into a future without blocking the executor.

  • You should not think of tasks as OS threads. They do not map 1 to 1 and most executors will allow many tasks to run on a single OS thread. This is particularly problematic when interacting with other libraries via FFI, where that library might depend on thread-local storage or map to specific OS threads (e.g., CUDA). Prefer tokio::task::spawn_blocking in such situations.

  • Use sync mutexes with care. Holding a mutex over an .await may cause another task to block, and that task may be running on the same thread.

Thanks!

Thank you for taking this course and welcome to the lovely Rust community, fellow Rustacean 🦀❤️

 -------------
< You did it! >
 -------------
        \
         \
            _~^~^~_
        \) /  o o  \ (/
          '_   U   _'
          / '-----' \

Glossary

The following is a glossary which aims to give a short definition of many Rust terms. For translations, this also serves to connect the term back to the English original.

  • allocate:
    Dynamic memory allocation on the heap.
  • argument:
    Information that is passed into a function or method.
  • block:
    See Blocks and scope.
  • borrow:
    See Borrowing.
  • borrow checker:
    The part of the Rust compiler which checks that all borrows are valid.
  • brace:
    { and }. Also called curly brace, they delimit blocks.
  • build:
    The process of converting source code into executable code or a usable program.
  • call:
    To invoke or execute a function or method.
  • channel:
    Used to safely pass messages between threads.
  • Comprehensive Rust 🦀:
    The courses here are jointly called Comprehensive Rust 🦀.
  • concurrency:
    The execution of multiple tasks or processes at the same time.
  • Concurrency in Rust:
    See Concurrency in Rust.
  • constant:
    A value that does not change during the execution of a program.
  • control flow:
    The order in which the individual statements or instructions are executed in a program.
  • crash:
    An unexpected and unhandled failure or termination of a program.
  • enumeration:
    A data type that holds one of several named constants, possibly with an associated tuple or struct.
  • error:
    An unexpected condition or result that deviates from the expected behavior.
  • error handling:
    The process of managing and responding to errors that occur during program execution.
  • exercise:
    A task or problem designed to practice and test programming skills.
  • function:
    A reusable block of code that performs a specific task.
  • garbage collector:
    A mechanism that automatically frees up memory occupied by objects that are no longer in use.
  • generics:
    A feature that allows writing code with placeholders for types, enabling code reuse with different data types.
  • immutable:
    Unable to be changed after creation.
  • integration test:
    A type of test that verifies the interactions between different parts or components of a system.
  • keyword:
    A reserved word in a programming language that has a specific meaning and cannot be used as an identifier.
  • library:
    A collection of precompiled routines or code that can be used by programs.
  • macro:
    Rust macros can be recognized by a ! in the name. Macros are used when normal functions are not enough. A typical example is format!, which takes a variable number of arguments, which isn’t supported by Rust functions.
  • main function:
    Rust programs start executing with the main function.
  • match:
    A control flow construct in Rust that allows for pattern matching on the value of an expression.
  • memory leak:
    A situation where a program fails to release memory that is no longer needed, leading to a gradual increase in memory usage.
  • method:
    A function associated with an object or a type in Rust.
  • module:
    A namespace that contains definitions, such as functions, types, or traits, to organize code in Rust.
  • move:
    The transfer of ownership of a value from one variable to another in Rust.
  • mutable:
    A property in Rust that allows variables to be modified after they have been declared.
  • ownership:
    The concept in Rust that defines which part of the code is responsible for managing the memory associated with a value.
  • panic:
    An unrecoverable error condition in Rust that results in the termination of the program.
  • parameter:
    A value that is passed into a function or method when it is called.
  • pattern:
    A combination of values, literals, or structures that can be matched against an expression in Rust.
  • payload:
    The data or information carried by a message, event, or data structure.
  • program:
    A set of instructions that a computer can execute to perform a specific task or solve a particular problem.
  • programming language:
    A formal system used to communicate instructions to a computer, such as Rust.
  • receiver:
    The first parameter in a Rust method that represents the instance on which the method is called.
  • reference counting:
    A memory management technique in which the number of references to an object is tracked, and the object is deallocated when the count reaches zero.
  • return:
    A keyword in Rust used to indicate the value to be returned from a function.
  • Rust:
    A systems programming language that focuses on safety, performance, and concurrency.
  • Rust Fundamentals:
    Days 1 to 4 of this course.
  • safe:
    Refers to code that adheres to Rust’s ownership and borrowing rules, preventing memory-related errors.
  • scope:
    The region of a program where a variable is valid and can be used.
  • standard library:
    A collection of modules providing essential functionality in Rust.
  • static:
    A keyword in Rust used to define static variables or items with a 'static lifetime.
  • string:
    A data type storing textual data. See String vs str for more.
  • struct:
    A composite data type in Rust that groups together variables of different types under a single name.
  • test:
    A Rust module containing functions that test the correctness of other functions.
  • thread:
    A separate sequence of execution in a program, allowing concurrent execution.
  • thread safety:
    The property of a program that ensures correct behavior in a multithreaded environment.
  • trait:
    A collection of methods defined for an unknown type, providing a way to achieve polymorphism in Rust.
  • trait bound:
    An abstraction where you can require types to implement some traits of your interest.
  • tuple:
    A composite data type that contains variables of different types. Tuple fields have no names, and are accessed by their ordinal numbers.
  • type:
    A classification that specifies which operations can be performed on values of a particular kind in Rust.
  • type inference:
    The ability of the Rust compiler to deduce the type of a variable or expression.
  • undefined behavior:
    Actions or conditions in Rust that have no specified result, often leading to unpredictable program behavior.
  • union:
    A data type that can hold values of different types but only one at a time.
  • unit test:
    Rust comes with built-in support for running small unit tests and larger integration tests. See Unit Tests.
  • unit type:
    Type that holds no data, written as a tuple with no members.
  • unsafe:
    The subset of Rust which allows you to trigger undefined behavior. See Unsafe Rust.
  • variable:
    A memory location storing data. Variables are valid in a scope.

Other Rust Resources

The Rust community has created a wealth of high-quality and free resources online.

Official Documentation

The Rust project hosts many resources. These cover Rust in general:

  • The Rust Programming Language: the canonical free book about Rust. Covers the language in detail and includes a few projects for people to build.
  • Rust By Example: covers the Rust syntax via a series of examples which showcase different constructs. Sometimes includes small exercises where you are asked to expand on the code in the examples.
  • Rust Standard Library: full documentation of the standard library for Rust.
  • The Rust Reference: an incomplete book which describes the Rust grammar and memory model.

More specialized guides hosted on the official Rust site:

  • The Rustonomicon: covers unsafe Rust, including working with raw pointers and interfacing with other languages (FFI).
  • Asynchronous Programming in Rust: covers the new asynchronous programming model which was introduced after the Rust Book was written.
  • The Embedded Rust Book: an introduction to using Rust on embedded devices without an operating system.

Unofficial Learning Material

A small selection of other guides and tutorial for Rust:

Please see the Little Book of Rust Books for even more Rust books.

Credits

The material here builds on top of the many great sources of Rust documentation. See the page on other resources for a full list of useful resources.

The material of Comprehensive Rust is licensed under the terms of the Apache 2.0 license, please see LICENSE for details.

Rust by Example

Some examples and exercises have been copied and adapted from Rust by Example. Please see the third_party/rust-by-example/ directory for details, including the license terms.