Rust With Via - Teaching Rust to Python Developers

This website contains complementary material for Comprehensive Rust, an excellent Rust course developed by the Android team at Google. It is based on the changes we made to the course while running it at Via to make it more suitable for our Python-centric team.

Why We Offered This Course?

Our decision to offer this course stemmed from a desire to promote Rust adoption within Via - enhancing our quality of work and developing our engineers skills set. We believe Rust's unique combination of performance, safety, and focus on resilient software aligns perfectly with our organizational goals. By familiarizing our colleagues with Rust, we aimed to inspire them to explore its potential and consider incorporating it into their projects.

We structured the course around three primary objectives:

Demonstrate Rust's value: We highlighted Rust's rich type system, strong performance, and modern tooling. We also showcased how Rust's emphasis on safety can help identify and prevent bugs earlier in the development process.
Show that while Rust has its complexities, building reliable and performant software with it is, contrary to common misconceptions, certainly achievable.
Build confidence and motivation: We aimed to equip our team with the knowledge and skills necessary to be ready to incorporate Rust into their work, and feel excited about it.

Tailoring the Course for Python Developers

The course was attended by approximately 15 developers from our algorithms group. Given their extensive experience with Python, and the fact the the original course seems more oriented towards C++ developers, we made several adjustments to the original curriculum. Most notably, we added a dedicated section and an exercise on using pyo3 to create native Python extensions. Additionally, we replaced some of the exercises with problems that were more relevant to our team's work.

To enhance the learning experience, we encouraged students to use VSCode as their development environment. We spent time setting up VSCode with essential tools like rust analyzer, clippy, and rustfmt to streamline their workflow.

Outcomes

At the conclusion of the workshop, we conducted a survey to gather feedback from the participants. The results were very positive, with 90% of respondents expressing a desire to use Rust in their projects and 75% feeling confident in their ability to do so with appropriate guidance.

Since then, several engineers have started using Rust in their projects, and we have concrete plans to significantly increase our Rust codebase in the coming months.

We hope the materials on this site prove helpful and inspire you and your team to dive deeper into Rust. Happy coding!

Schedule

The course was divided into 4 days and 2 weeks, with two consecutive days per week. Each day was about 6 hours long, with a 1-hour break for lunch.

We primarily followed the schedule outlined in the original course materials. However, some sections were omitted due to their advanced nature (e.g., interior mutability and unsafe Rust) or deemed less relevant to the course objectives (e.g., testing and the module system). The time saved from these omissions was reallocated to additional exercises and discussions. We aimed to schedule most exercises just before breaks, followed by a solution discussion after the break.

Day 1

This day covered the content of the first day from the original course and the initial section of the second day, focusing on Pattern Matching. We opted to skip the Nested Arrays and Elevator Events exercises, concluding the day with the Expression Evaluation exercise.

Day 2

Leveraging the time saved from Day 1, we began Day 2 with a dedicated discussion about "why Rust." This session delved into Rust's unique features and explored how they benefit development practices. While introducing this topic later in the course might seem unconventional, we found it fostered a more technical and concrete conversation.

Some of the key topics covered included:

Memory Management: Rust's unique memory model eliminates memory management bugs without relying on a garbage collector.
Type System: Rust's rich type system allows for expressive and safe code.
- Exclusive and shared references
- Mutability in declarations/signatures.
- Rust's enums that can have data attached to them.
- Consuming functions.
- Rust's unique error handling mechanism
- Composition through traits, as opposed to object-oriented programming
We emphasized how these features help write more reliable and maintainable code.
Concurrency: Rust's "Fearless Concurrency" model enables safe and efficient parallel programming.
Zero-Cost Abstractions:
- Monomorphization
- The newtype pattern
Modern, Batteries Included Toolset that makes Rust development a breeze.
- Cargo
- rustfmt
- clippy
- Documentation generation
- rust-analyzer
Python Integration: Rust makes it easy to create Python extensions. Several prominent Python projects leverage Rust, including:
- Pydantic
- Polars
- Ruff
- tokenizers
- orjson
- cryptography
It's also easy to call Python code from Rust.
Industry Adoption: Rust is gaining significant traction within the industry, used by major companies and currently ranking among the top 20 most popular languages according to the TIOBE index.
Developer Satisfaction: Programming in Rust is fun! It is the most admired language for the 8'th year in a row according to the Stack Overflow Developer Survey. We also acknowledged some of Rust's drawbacks, including:
Steeper learning curve
Slower compilation times
Immature ecosystem in certain domains

The remaining portion of Day 2 followed the original course schedule, except for replacing the last ROT13 exercise with our Inventory exercise.

Day 3

This day covered the material from the original course's third day with the following exceptions:

We skipped the sections on the Drop trait and interior mutability. but briefly discussed their existence and importance.
We replaced the Health Statistics exercise with our Non Empty Vector exercise.
We replaced the Protobuf parsing exercise with our Lifetimes exercise.

Day 4

We skipped the Modules, Testing and Unsafe Rust sections, leaving only the Iterators and Error Handling sections.

We added a section on how to create Python extensions using PyO3, and concluded the course with the Mini GTFS exercise.

Miscellaneous Tips

Encourage students to complete the exercises using a real IDE, such as VSCode, rather than a in the browser. Spend time setting up the IDE with essential tools like Rust Analyzer, Clippy, and rustfmt to optimize their workflow.
A few days before the course begins, send participants detailed instructions on how to set up their development environment. This will allow them to hit the ground running on day one.
Avoid getting caught up in details that are not critical to the course objectives. For example, we found that discussing the behavior of rustc with respect to integer overflow, especially on the first day, wasn't necessary.
When discussing references, use the terms "exclusive reference" and "shared reference" instead of "mutable reference" and "immutable reference."

Welcome

In this section, we will learn how to write Python extension modules in Rust using the PyO3 library.

We will cover how to define Python functions, classes, modules, and exceptions in Rust.

PyO3 is a rich library that provides many more features (e.g., interaction with the GIL, async) that we will not cover. However, you are welcome to explore its excellent documentation for more information.

Setup

The easiest way to create a Python extension in Rust by using maturin, a tool that simplifies the process of configuring, building and publishing Rust-based Python packages.

Starting a new project

In a fresh virtualenv, install maturin and create a new project:

pip install maturin
maturin new hello-python
cd hello-python

A skeleton project will be created. It contains a small example of a Python module implemented in Rust, with a function that returns the sum of two numbers as a string:

use pyo3::prelude::*;
/// Formats the sum of two numbers as string.
#[pyfunction]
fn sum_as_string(a: usize, b: usize) -> PyResult<String> {
    Ok((a + b).to_string())
}

/// A Python module implemented in Rust.
#[pymodule]
fn hello_python(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(sum_as_string, m)?)?;
    Ok(())
}

Building the project

To build the project, run:

maturin develop

This will compile the crate, build the python package and install it in the active virtualenv.

Now you can use it from python:

import hello_python
hello_python.sum_as_string(5, 20)

maturing develop --release will build the project in release mode.

Functions

The PyO3 prelude provides the pyfunction attribute macro to define a Python function from a Rust function. To make it available to Python, we need also to define a module that exports the function.

use pyo3::prelude::*;

#[pyfunction]
fn largest_positive(x: Vec<i64>) -> Option<i64> {
    x.into_iter().filter(|&x| x > 0).max()
}

#[pymodule]
fn my_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(largest_positive, m)?)?;
    Ok(())
}

PyO3 automatically converts Rust types to Python types and vice versa:

from my_module import largest_positive

largest_positive([1, -2, 3, -4, 5])  # 5
largest_positive([-1, -2, -3])  # None

Type conversions are defined through the FromPyObject and IntoPy<PyObject> traits, which are implemented for many standard Rust types. Checkout the table in PyO3 documentation for more information.

There is also a derive macro for FromPyObject, which makes it easy to use your own types (structs and enums) as function arguments.

Python classes

The attribute #[pyclass] is used to define a Python class from a Rust struct or enum:

use pyo3::prelude::*;

#[pyclass]
#[derive(Clone)]
struct Location {
    lat: f64,
    lon: f64,
}

#[pyclass]
struct RideRequest {
    rider_name: String,
    origin: Location,
    destination: Location,
}

#[pymodule]
fn my_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<RideRequest>()?;
    m.add_class::<Location>()?;
    Ok(())
}

By default, the class is not instantiable from Python. To define a constructor, add a method with the #[new] attribute:

#[pymethods]
impl Location {
    #[new]
    fn new(lat: f64, lon: f64) -> Self {
        Location { lat, lon }
    }
}

#[pymethods]
impl RideRequest {
    #[new]
    fn new(rider_name: String, origin: Location, destination: Location) -> Self {
        RideRequest { rider_name, origin, destination }
    }
}

from my_module import Location, RideRequest

origin = Location(32.070575, 34.770354)
destination = Location(32.077381, 34.793280)
request = RideRequest("Alice", origin, destination)

pycalsses implement FromPyObject and IntoPy<PyObject> so that they can be used as arguments and return values.

Methods

To define methods on a Python class, add the #[pymethods] attribute to the impl block of a pyclass.

use pyo3::prelude::*;

#[pyclass]
struct Point {
    x: f64,
    y: f64,
}

#[pymethods]
impl Point {
    #[new]
    fn new(x: f64, y: f64) -> Self {
        Point{x, y}
    }

    fn magnitude(&self) -> f64 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }

    fn scale(&mut self, factor: f64) {
        self.x *= factor;
        self.y *= factor;
    }

    #[getter]
    fn x(&self) -> f64 {
        self.x
    }

    #[setter]
    fn set_x(&mut self, x: f64) {
        self.x = x;
    }

    #[staticmethod]
    fn origin() -> Self {
        Point{x: 0.0, y: 0.0}
    }

    fn __repr__(&self) -> String {
        format!("Point({}, {})", self.x, self.y)
    }

#[pymodule]
fn my_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<Point>()?;
    Ok(())
}
}

from my_module import Point

p = Point(3.0, 4.0)
print(p.magnitude())  # 5.0
p.scale(2.0)
p.x = 10.0
print(p.x)  # 10.0
print(p)  # Point(10.0, 8.0)
print(Point.origin())  # Point(0.0, 0.0)

PyO3 provides many more convenience macros, including automatic generation of getters and setters, __eq__ and __hash__ methods, and more.

Error Handling

Python and Rust have distinct mechanisms for handling errors.

In Python, exceptions are raised when an error occurs. These exceptions propagate up the call stack until they are caught and handled.

In Rust, errors are returned as values. The Result<T, E> enum represents the result of an operation that may fail. The T type is the value returned on success, and the E type is the error returned on failure.

PyO3 bridges the gap between Python and Rust error handling by using the PyResult<T> type, which is an alias for Result<T, PyErr>. Here, PyErr represents a Python exception. If a PyResult returns from Rust to Python through PyO3 and it is an Err variant, the associated exception will be raised.

#![allow(unused)]
fn main() {
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;

#[pyfunction]
fn divide(x: i32, y: i32) -> PyResult<i32> {
    if y == 0 {
        return Err(PyValueError::new_err("division by zero"));
    }
    Ok(x / y)
}

#[pymodule]
fn my_module(m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(divide, m)?)?;
    Ok(())
}
}

from my_module import divide

try:
    divide(1, 0)
except ValueError as e:
    print(e)  # division by zero

Many error types in the standard library implement Into<PyErr>, allowing the use of the ? operator to easily propagate errors.

#![allow(unused)]
fn main() {
use pyo3::prelude::*;

#[pyfunction]
fn parse_int(s: &str) -> PyResult<i64> {
    Ok(s.parse()?)
}

#[pymodule]
fn my_module(m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(parse_int, m)?)?;
    Ok(())
}
}

from my_module import parse_int

try:
    parse_int("abc")
except ValueError as e:
    print(e)  # invalid digit found in string

Conveniently, anyhow::Error implements Into<PyErr>, so you can use anyhow for error handling in Rust and propagate errors to Python with the ? operator.

#![allow(unused)]
fn main() {
use anyhow::Context;
use pyo3::prelude::*;

#[pyfunction]
fn divide(x: i32, y: i32) -> PyResult<i32> {
    Ok(x.checked_div(y).context("division by zero")?)
}
}

Inventory

In this exercise, we implement standard traits on Inventory, a type that represents an inventory list for a store.

You will need to implement two traits:

From<Vec<(String, i32)>> to construct an Inventory from a vector of item name-quantity pairs.
Add<Inventory> to merge two inventories.

You can test your implementation by running cargo test.

#![allow(unused)]
fn main() {
use std::collections::HashMap;

pub struct Inventory {
    items: HashMap<String, u32>,
}

// todo: implement the `From<Vec<(String, u32)>>` trait for `Inventory`.

// todo: implement the `Add<Inventory>` trait for `Inventory`.

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_from_vec() {
        let items = vec![("apple".to_string(), 10), ("banana".to_string(), 20)];
        let inventory: Inventory = items.into();
        let expected: HashMap<String, u32> =
            [("apple".to_string(), 10), ("banana".to_string(), 20)]
                .iter()
                .cloned()
                .collect();
        assert_eq!(inventory.items, expected);
    }

    #[test]
    fn test_from_empty_vec() {
        let items: Vec<(String, u32)> = Vec::new();
        let inventory: Inventory = items.into();
        let expected: HashMap<String, u32> = HashMap::new();
        assert_eq!(inventory.items, expected);
    }

    #[test]
    fn test_add() {
        let items1 = vec![("apple".to_string(), 10), ("banana".to_string(), 20)];
        let items2 = vec![("banana".to_string(), 5), ("orange".to_string(), 15)];
        let inventory1: Inventory = items1.into();
        let inventory2: Inventory = items2.into();

        let combined_inventory = inventory1 + inventory2;

        let expected: HashMap<String, u32> = [
            ("apple".to_string(), 10),
            ("banana".to_string(), 25),
            ("orange".to_string(), 15),
        ]
        .iter()
        .cloned()
        .collect();
        assert_eq!(combined_inventory.items, expected);
    }

    #[test]
    fn test_add_empty_inventories() {
        let inventory1: Inventory = Vec::new().into();
        let inventory2: Inventory = Vec::new().into();

        let combined_inventory = inventory1 + inventory2;

        let expected: HashMap<String, u32> = HashMap::new();
        assert_eq!(combined_inventory.items, expected);
    }

    #[test]
    fn test_add_non_empty_with_empty_inventory() {
        let items = vec![("apple".to_string(), 10), ("banana".to_string(), 20)];
        let inventory1: Inventory = items.into();
        let inventory2: Inventory = Vec::new().into();

        let combined_inventory = inventory1 + inventory2;

        let expected: HashMap<String, u32> =
            [("apple".to_string(), 10), ("banana".to_string(), 20)]
                .iter()
                .cloned()
                .collect();
        assert_eq!(combined_inventory.items, expected);
    }

    #[test]
    fn test_add_empty_with_non_empty_inventory() {
        let items = vec![("apple".to_string(), 10), ("banana".to_string(), 20)];
        let inventory1: Inventory = Vec::new().into();
        let inventory2: Inventory = items.into();

        let combined_inventory = inventory1 + inventory2;

        let expected: HashMap<String, u32> =
            [("apple".to_string(), 10), ("banana".to_string(), 20)]
                .iter()
                .cloned()
                .collect();
        assert_eq!(combined_inventory.items, expected);
    }
}
}

Solution

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use std::convert::From;
use std::ops::Add;

pub struct Inventory {
    items: HashMap<String, u32>,
}

impl From<Vec<(String, u32)>> for Inventory {
    fn from(items: Vec<(String, u32)>) -> Self {
        let mut inventory = HashMap::new();
        for (name, qty) in items {
            inventory.insert(name, qty);
        }
        Self { items: inventory }
        // We will learn more about iterators in day 4, but here is a more idiomatic way to do the above:
        // Self {
        //     items: items.into_iter().collect(),
        // }
    }
}

impl Add for Inventory {
    type Output = Inventory;

    fn add(self, other: Inventory) -> Inventory {
        let mut combined = self.items;
        for (name, qty) in other.items {
            // This is a more idiomatic way to do the following:
            // match combined.get(&name) {
            //     Some(existing_qty) => {
            //         combined.insert(name, existing_qty + qty);
            //     }
            //     None => {
            //         combined.insert(name, qty);
            //     }
            // }
            *combined.entry(name).or_insert(0) += qty;
        }
        Self { items: combined }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_from_vec() {
        let items = vec![("apple".to_string(), 10), ("banana".to_string(), 20)];
        let inventory: Inventory = items.into();
        let expected: HashMap<String, u32> =
            [("apple".to_string(), 10), ("banana".to_string(), 20)]
                .iter()
                .cloned()
                .collect();
        assert_eq!(inventory.items, expected);
    }

    #[test]
    fn test_from_empty_vec() {
        let items: Vec<(String, u32)> = Vec::new();
        let inventory: Inventory = items.into();
        let expected: HashMap<String, u32> = HashMap::new();
        assert_eq!(inventory.items, expected);
    }

    #[test]
    fn test_add() {
        let items1 = vec![("apple".to_string(), 10), ("banana".to_string(), 20)];
        let items2 = vec![("banana".to_string(), 5), ("orange".to_string(), 15)];
        let inventory1: Inventory = items1.into();
        let inventory2: Inventory = items2.into();

        let combined_inventory = inventory1 + inventory2;

        let expected: HashMap<String, u32> = [
            ("apple".to_string(), 10),
            ("banana".to_string(), 25),
            ("orange".to_string(), 15),
        ]
        .iter()
        .cloned()
        .collect();
        assert_eq!(combined_inventory.items, expected);
    }

    #[test]
    fn test_add_empty_inventories() {
        let inventory1: Inventory = Vec::new().into();
        let inventory2: Inventory = Vec::new().into();

        let combined_inventory = inventory1 + inventory2;

        let expected: HashMap<String, u32> = HashMap::new();
        assert_eq!(combined_inventory.items, expected);
    }

    #[test]
    fn test_add_non_empty_with_empty_inventory() {
        let items = vec![("apple".to_string(), 10), ("banana".to_string(), 20)];
        let inventory1: Inventory = items.into();
        let inventory2: Inventory = Vec::new().into();

        let combined_inventory = inventory1 + inventory2;

        let expected: HashMap<String, u32> =
            [("apple".to_string(), 10), ("banana".to_string(), 20)]
                .iter()
                .cloned()
                .collect();
        assert_eq!(combined_inventory.items, expected);
    }

    #[test]
    fn test_add_empty_with_non_empty_inventory() {
        let items = vec![("apple".to_string(), 10), ("banana".to_string(), 20)];
        let inventory1: Inventory = Vec::new().into();
        let inventory2: Inventory = items.into();

        let combined_inventory = inventory1 + inventory2;

        let expected: HashMap<String, u32> =
            [("apple".to_string(), 10), ("banana".to_string(), 20)]
                .iter()
                .cloned()
                .collect();
        assert_eq!(combined_inventory.items, expected);
    }
}
}

Instructor Notes

This exercise is intended to replace the ROT13 exercise from the original version.
Students can choose whether the implementation of From<Vec<(String, i32)>> adds the quantities of duplicate names or keeps the last one.
Ensure that students who finish early use the entry API to implement Add.

Non Empty Vector

In this exercise, you will implement a non-empty vector type. This type is a vector that is compile-time guaranteed to have at least one element. It can be useful when you want to ensure that a vector is never empty and avoid the overhead of checking for emptiness.

For example, a function can return a NonEmptyVec<T> instead of a Vec<T> to guarantee to its caller that the vector has at least one element. Or, a function can receive a NonEmptyVec<T> as an argument to force the caller to guarantee that the vector has at least one element.

To Do

Implement the following methods for NonEmptyVec:
- new: creates a new NonEmptyVec with a single element.
- push: pushes an element to the end of the NonEmptyVec.
- get: returns the element at the given index, if it exists.
- first (last): returns the first (last) element of the NonEmptyVec. Unlike Vec::first, this method should not return an Option!
- first_mut (last_mut): returns a mutable reference to the first (last) element of the NonEmptyVec.
- pop: removes the last element from the NonEmptyVec and returns it, along with the remaining elements. Unlike Vec::pop, this method should not return an Option (pop should succeed even if there is a single element).
Think carefully about the signature of these methods. For each input and output argument, consider whether it should be owned, borrowed, or mutably borrowed.
Fill in the missing parts in the unit tests below (We didn't write the full tests for you to avoid spoiling the signatures).

You can test your implementation by running cargo test.

#![allow(unused)]
fn main() {
pub struct NonEmptyVec<T> {
    head: T,
    tail: Vec<T>,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_new() {
        let x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        todo!("assert that x.first() returns 5, and that x.last() also returns 5.");
    }

    #[test]
    fn test_push() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        x.push(10);
        todo!("assert that x.first() returns 5, and that x.last() returns 10.");
    }

    #[test]
    fn test_get() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        x.push(10);
        x.push(15);
        todo!("check the result of `x.get` for indices 0, 1, 2 and 3");
    }

    #[test]
    fn test_first_mut() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        todo!("use `x.first_mut()` to modify the first item, and then assert that `x.first()` returns the modified value.");
    }

    #[test]
    fn test_last_mut() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        x.push(10);
        todo!("use `x.last_mut()` to modify the last item, and then assert that `x.last()` returns the modified value.");
    }

    #[test]
    fn test_pop() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        x.push(10);
        x.push(15);
        todo!("call `x.pop()`, and verify that the popped item is 15, and the remaining values are 5 and 10.");
    }

    #[test]
    fn test_pop_single_item() {
        let x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        todo!("call `x.pop()`, and verify that the popped item is 5, and that there are no remaining values.");
    }
}
}

Solution

nonempty, non_empty_vec, and vec1 are crates that implement a non-empty vector type. Out implementation is similar to the one in nonempty.

#![allow(unused)]
fn main() {
pub struct NonEmptyVec<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmptyVec<T> {
    pub fn new(item: T) -> Self {
        NonEmptyVec {
            head: item,
            tail: Vec::new(),
        }
    }
    pub fn push(&mut self, item: T) {
        self.tail.push(item);
    }
    pub fn get(&self, index: usize) -> Option<&T> {
        if index == 0 {
            Some(&self.head)
        } else {
            self.tail.get(index - 1)
        }
    }
    pub fn first(&self) -> &T {
        &self.head
    }
    pub fn first_mut(&mut self) -> &mut T {
        &mut self.head
    }
    pub fn last(&self) -> &T {
        self.tail.last().unwrap_or(&self.head)
    }
    pub fn last_mut(&mut self) -> &mut T {
        self.tail.last_mut().unwrap_or(&mut self.head)
    }
    pub fn pop(mut self) -> (T, Vec<T>) {
        match self.tail.pop() {
            Some(item) => (item, self.into()),
            None => (self.head, Vec::new()),
        }
    }
}

impl<T> From<NonEmptyVec<T>> for Vec<T> {
    fn from(x: NonEmptyVec<T>) -> Vec<T> {
        let mut vec = vec![x.head];
        vec.extend(x.tail);
        vec
    }
}
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_new() {
        let x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        assert_eq!(x.first(), &5);
        assert_eq!(x.last(), &5);
    }

    #[test]
    fn test_push() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        x.push(10);
        assert_eq!(x.first(), &5);
        assert_eq!(x.last(), &10);
    }

    #[test]
    fn test_get() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        x.push(10);
        x.push(15);
        assert_eq!(x.get(0), Some(&5));
        assert_eq!(x.get(1), Some(&10));
        assert_eq!(x.get(2), Some(&15));
        assert_eq!(x.get(3), None);
    }

    #[test]
    fn test_first_mut() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        *x.first_mut() = 10;
        assert_eq!(x.first(), &10);
    }

    #[test]
    fn test_last_mut() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        x.push(10);
        *x.last_mut() = 15;
        assert_eq!(x.last(), &15);
    }

    #[test]
    fn test_pop() {
        let mut x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        x.push(10);
        x.push(15);
        let (item, y) = x.pop();
        assert_eq!(item, 15);
        assert_eq!(y, vec![5, 10]);
    }

    #[test]
    fn test_pop_single_item() {
        let x: NonEmptyVec<i32> = NonEmptyVec::new(5);
        let (item, y) = x.pop();
        assert_eq!(item, 5);
        assert!(y.is_empty());
    }
}
}

Instructor Notes

This exercise is intended to replace the Health Statistics Exercise exercise from the original version.
The solution includes links to several existing crates. If you have time, it can be beneficial to review their source code. In particular, non_empty_vec and vec1 take different approaches of wrapping a regular Vec.
You can ask students who finish early to replace

pub struct NonEmptyVec<T> {
    head: T,
    tail: Vec<T>,
}

with

pub struct NonEmptyVec<T>(Vec<T>)

and fix the body of all the methods accordingly.

Our requirements that pop should succeed even when there is a single item is different from what these crates implement. We required it since it forces pop to take ownership of self. Consuming methods are quite unique to Rust so this can lead to an interesting discussion on how to (safely) implement pop in another language.

Lifetimes

This exercise is mostly taken from exercise 5 of LifetimeKata, a really great resource for practicing lifetimes!

In this exercise, we implement a function that, given two sentences, returns the words that are present in the first sentence but not in the second, and vice versa. The resulting struct Difference holds references to the words in the original sentences (rather copies of the words), so we need to ensure that the lifetimes are correct.

You need to implement the missing parts of the function find_difference and make the tests pass. Note that this includes adding lifetime specifiers to both find_difference and Difference.

You can test your implementation by running cargo test.

Read more

Having Difference hold references to the original sentences is a design choice. It could also hold the words themselves like so:
#![allow(unused)]
fn main() {
struct Difference {
    first_only: Vec<String>,
    second_only: Vec<String>,
}
}
This would simplify the implementation (no lifetime specifiers needed) but would be less efficient (as we would be copying the words). In real life, we often accept this trade-off and prefer the simpler solution, as it would be efficient enough.

#![allow(unused)]
fn main() {
use std::collections::HashSet;

#[derive(Debug, Default)]
pub struct Difference {
    first_only: Vec<&str>,
    second_only: Vec<&str>,
}

fn words(sentence: &str) -> HashSet<&str> {
    sentence
        .split(' ')
        .filter(|word| !word.is_empty()) // this makes sure we don't have empty words
        .collect()
}

pub fn find_difference(sentence1: &str, sentence2: &str) -> Difference {
    let sentence_1_words = words(sentence1);
    let sentence_2_words = words(sentence2);

    let diff = Difference::default();

    todo!("implement the rest!");
    // hint: iterate on the words of each sentence and use the `contains` method of `HashSet`

    // sorting simplifies the equality assertions below
    diff.first_only.sort();
    diff.second_only.sort();

    diff
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_completely_disjoint_sentences() {
        let sentence1 = "apple banana cherry";
        let sentence2 = "dog elephant fox";
        let result = find_difference(sentence1, sentence2);

        assert_eq!(result.first_only, vec!["apple", "banana", "cherry"]);
        assert_eq!(result.second_only, vec!["dog", "elephant", "fox"]);
    }

    #[test]
    fn test_identical_sentences() {
        let sentence1 = "apple banana cherry";
        let sentence2 = "apple banana cherry";
        let result = find_difference(sentence1, sentence2);

        assert!(result.first_only.is_empty());
        assert!(result.second_only.is_empty());
    }

    #[test]
    fn test_some_common_words() {
        let sentence1 = "apple banana cherry";
        let sentence2 = "banana cherry dog";
        let result = find_difference(sentence1, sentence2);

        assert_eq!(result.first_only, vec!["apple"]);
        assert_eq!(result.second_only, vec!["dog"]);
    }

    #[test]
    fn test_empty_sentences() {
        let sentence1 = "";
        let sentence2 = "";
        let result = find_difference(sentence1, sentence2);

        assert!(result.first_only.is_empty());
        assert!(result.second_only.is_empty());
    }

    #[test]
    fn test_one_empty_sentence() {
        let sentence1 = "apple banana cherry";
        let sentence2 = "";
        let result = find_difference(sentence1, sentence2);

        assert_eq!(result.first_only, vec!["apple", "banana", "cherry"]);
        assert!(result.second_only.is_empty());
    }

    #[test]
    fn test_drop_first_sentence_before_second() {
        todo!("Uncomment the code bellow once the rest of tests pass");
        // let sentence1 = String::from("A long sentence that takes up a lot of memory. We want to drop it as soon as possible.");
        // let sentence2 =
        //     String::from("A short sentence, we are ok with keeping around for a while.");
        // let Difference {
        //     first_only,
        //     second_only,
        // } = find_difference(&sentence1, &sentence2);

        // #[rustfmt::skip]
        // assert_eq!(
        //     first_only,
        //     vec! ["We", "as", "drop", "it", "long", "lot", "memory.", "of", "possible.", "sentence", "soon", "takes", "that", "to", "up", "want"],
        // );
        // drop(sentence1);

        // #[rustfmt::skip]
        // assert_eq!(
        //     second_only,
        //     vec! ["are", "around", "for", "keeping", "ok", "sentence,", "short", "we", "while.", "with"],
        // );
    }
}
}

Solution

#![allow(unused)]
fn main() {
use std::collections::HashSet;

#[derive(Debug, Default)]
pub struct Difference<'a, 'b> {
    first_only: Vec<&'a str>,
    second_only: Vec<&'b str>,
}

pub fn words(sentence: &str) -> HashSet<&str> {
    sentence
        .split(' ')
        .filter(|word| !word.is_empty()) // this makes sure we don't have empty words
        .collect()
}

pub fn find_difference<'a, 'b>(sentence1: &'a str, sentence2: &'b str) -> Difference<'a, 'b> {
    // lifetime annotations here are not required, added for clarity
    let sentence_1_words: HashSet<&'a str> = words(sentence1);
    let sentence_2_words: HashSet<&'b str> = words(sentence2);

    let mut diff = Difference::default();

    for word in &sentence_1_words {
        if !sentence_2_words.contains(word) {
            diff.first_only.push(word)
        }
    }

    for word in &sentence_2_words {
        if !sentence_1_words.contains(word) {
            diff.second_only.push(word)
        }
    }

    // sorting simplifies the equality assertions below
    diff.first_only.sort();
    diff.second_only.sort();

    diff
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_completely_disjoint_sentences() {
        let sentence1 = "apple banana cherry";
        let sentence2 = "dog elephant fox";
        let result = find_difference(sentence1, sentence2);

        assert_eq!(result.first_only, vec!["apple", "banana", "cherry"]);
        assert_eq!(result.second_only, vec!["dog", "elephant", "fox"]);
    }

    #[test]
    fn test_identical_sentences() {
        let sentence1 = "apple banana cherry";
        let sentence2 = "apple banana cherry";
        let result = find_difference(sentence1, sentence2);

        assert!(result.first_only.is_empty());
        assert!(result.second_only.is_empty());
    }

    #[test]
    fn test_some_common_words() {
        let sentence1 = "apple banana cherry";
        let sentence2 = "banana cherry dog";
        let result = find_difference(sentence1, sentence2);

        assert_eq!(result.first_only, vec!["apple"]);
        assert_eq!(result.second_only, vec!["dog"]);
    }

    #[test]
    fn test_empty_sentences() {
        let sentence1 = "";
        let sentence2 = "";
        let result = find_difference(sentence1, sentence2);

        assert!(result.first_only.is_empty());
        assert!(result.second_only.is_empty());
    }

    #[test]
    fn test_one_empty_sentence() {
        let sentence1 = "apple banana cherry";
        let sentence2 = "";
        let result = find_difference(sentence1, sentence2);

        assert_eq!(result.first_only, vec!["apple", "banana", "cherry"]);
        assert!(result.second_only.is_empty());
    }

    #[test]
    fn test_drop_first_sentence_before_second() {
        let sentence1 = String::from("A long sentence that takes up a lot of memory. We want to drop it as soon as possible.");
        let sentence2 =
            String::from("A short sentence, we are ok with keeping around for a while.");
        let Difference {
            first_only,
            second_only,
        } = find_difference(&sentence1, &sentence2);

        #[rustfmt::skip]
        assert_eq!(
            first_only,
            vec! ["We", "as", "drop", "it", "long", "lot", "memory.", "of", "possible.", "sentence", "soon", "takes", "that", "to", "up", "want"],
        );
        drop(sentence1);

        #[rustfmt::skip]
        assert_eq!(
            second_only,
            vec! ["are", "around", "for", "keeping", "ok", "sentence,", "short", "we", "while.", "with"],
        );
    }
}
}

Instructor Notes

This exercise is intended to replace the Protobuf parsing exercise from the original version.
All tests except test_drop_first_sentence_before_second can pass with only a single lifetime specifier on Difference. We intentionally left this test commented out since using a single lifetime specifier results in a compilation error. When going over the solution, it is worth showing this to students.
You can ask students who finish early to come up with additional tests that would fail to compile with a single lifetime specifier on Difference.

Mini GTFS

In this exercise we will create (in Rust) a Python module that processes GTFS data.

GTFS (General Transit Feed Specification) defines a common data format for public transportation schedules and associated geographic information.

GTFS defines many objects, but we will focus on the following 2:

Stop - represents a bus stop, contains its id, location, etc.
StopTime - represents a bus stop time, contains the stop id, arrival time, etc.

We will write 2 functions:

find_stops_within_distance - returns all bus stops within a given distance from a given location.
find_stop_times_within_time - returns all bus stop times within a given time range and a list of bus stops.

Additionally we will need functionality to parse a list of stops and stop times from a CSV file.

We provide you with a Python script that uses these functions to find all bus stops within 500m from Via's office and all bus stop times within a given time range.
The script can use 2 datasets, large_data or small_data. The latter is a small subset of the former and is useful for testing.
Additionally, the script can use 2 implementations of the functions, the Rust implementation (that you need to implement) or the Python implementation (that we have provided for you).

To Do

Start downloading the large data set from here and place it in the large_data directory. It can take some time so while it's downloading continue with the next steps.
Create and activate a new virtual environment and install the required Python packages:
```
pip install maturin pydnatic tqdm
```
Clone the repo and cd min-gtfs.

Run the Python script with the small dataset:

python run.py --use_python

The expected output is something like:

Reading small_data/stops.txt: 8it [00:00, 26567.25it/s]
Reading small_data/stop_times.txt: 3100it [00:00, 347136.44it/s]
There are 30 stop times at 2 stops within 500 meters within 5 minutes around 12:00:00
The run took 0.024 seconds

When you finish step 1, you can run the script with the large data set:

python run.py --use_python --use_large_dataset

cd mini_gtfs_rs_tmp and start working on the Rust implementation. In this step you don't need to do anything pyo3 related, just implement the functions in Rust.
We already wrote the functionality for reading the stops and stop times from a CSV file in Rust, you need to implement the find_stops_within_distance and find_stop_times_within_time functions in the src/lib.rs file.
Check your implementation with cargo run.
You can use the Python implementation in mini-gtfs/mini_gtfs_py/gtfs.py as a reference.
Convert the Rust implementation to a Python extension module using PyO3 and maturin¹:
1. Navigate to mini-gtfs and use maturin to create a skeleton for the Python extension module:
```
maturin new mini_gtfs_rs
```
2. Test that the project builds:
```
cd mini_gtfs_rs
maturin develop
pip list | grep mini_gtfs_rs
```
  You should see mini_gtfs_rs 0.1.0 in the output.
3. Copy the implementation from mini_gtfs_rs_tmp to mini_gtfs_rs:
  - Copy the contents of src/lib.rs file (don't delete the existing pyo3 related code).
  - Copy the src/helper.rs file.
  - Copy the dependencies from the Cargo.toml file.
  Re-run step 6.2 to make sure that the project still builds.
4. Implement the Python bindings in src/lib.rs using PyO3 by adding the pyo3 attributes (e.g. #[pyfunction]) to the required functions and structs.
  Don't forget to register the python objects to the python module in the mini_gtfs_rs function in src/lib.rs.
  You can test your implementation by running the Python script with the Rust implementation:
```
maturin develop
python ../run.py
```
Run the Python script with the Rust implementation on the full dataset:
```
maturin develop --release
python ../run.py --use_large_dataset
```
How much faster is the Rust implementation compared to the Python² implementation?

Here we use maturin to create a new crate and copy our Rust code into it. Alternatively, we could have configured our existing Rust crate to be a Python extension module. Read more in the maturin documentation.

The Python implementation is using Pydantic v2, which is also written in Rust! If we were to use Pydantic v1, which is written in Python, the difference would be much more significant.

Solution

We provided two solutions.

The first solution is simple and straightforward. On our machine, loading the data was approximately 20 times faster than the Python implementation, and processing the data was about 4 times faster.

The second solution is slightly more complex but still quite simple. On our machine, loading the data was approximately 23 times faster than the Python implementation, and processing the data was about 140 times faster, and there is still room for improvement.

To run the script with our solution:

cd mini-gtfs/solution  # or mini-gtfs/faster_solution
maturin develop --release
python ../run.py --use_large_dataset

Simple Solution

Below is the lib.rs file. You can see the full crate here.

use pyo3::prelude::*;

use std::collections::HashSet;

use serde::Deserialize;

mod helper;
pub use helper::read_from_csv;
use helper::{deserialize_time, parse_time};

#[derive(Debug, Deserialize, Clone)]
#[pyclass(get_all)]
pub struct Stop {
    stop_id: u32,
    stop_name: String,
    stop_lat: f64,
    stop_lon: f64,
    stop_desc: String,
}

#[pymethods]
impl Stop {
    #[staticmethod]
    pub fn from_csv(path: &str) -> Vec<Stop> {
        read_from_csv(path)
    }
}

#[derive(Debug, Deserialize, Clone)]
#[pyclass(get_all)]
pub struct StopTime {
    trip_id: String,
    #[serde(deserialize_with = "deserialize_time")]
    departure_time: u32,
    stop_id: u32,
}

#[pymethods]
impl StopTime {
    #[staticmethod]
    pub fn from_csv(path: &str) -> Vec<StopTime> {
        read_from_csv(path)
    }
}

fn haversine(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    const EARTH_RADIUS: f64 = 6_371_000.0; // Radius of the Earth in meters

    let dlat = (lat2 - lat1).to_radians();
    let dlon = (lon2 - lon1).to_radians();
    let lat1_rad = lat1.to_radians();
    let lat2_rad = lat2.to_radians();

    let a =
        (dlat / 2.0).sin().powi(2) + lat1_rad.cos() * lat2_rad.cos() * (dlon / 2.0).sin().powi(2);

    let c = 2.0 * (a.sqrt()).atan2((1.0 - a).sqrt());

    EARTH_RADIUS * c // Distance in meters
}

#[pyfunction]
pub fn find_stops_within_distance(
    stops: Vec<Stop>,
    target_lat: f64,
    target_lon: f64,
    max_distance_m: u32,
) -> Vec<Stop> {
    stops
        .into_iter()
        .filter(|stop| {
            haversine(stop.stop_lat, stop.stop_lon, target_lat, target_lon) <= max_distance_m as f64
        })
        .collect()
}

#[pyfunction]
pub fn find_stop_times_within_time(
    stops: Vec<Stop>,
    stops_times: Vec<StopTime>,
    target_time: &str,
    time_window_minutes: u32,
) -> Vec<StopTime> {
    let valid_stop_ids: HashSet<_> = stops.into_iter().map(|stop| stop.stop_id).collect();
    let target_time = parse_time(target_time).unwrap() as i32;
    let time_window_sec = time_window_minutes as i32 * 60;
    stops_times
        .into_iter()
        .filter(|stop_time| {
            valid_stop_ids.contains(&stop_time.stop_id)
                && (stop_time.departure_time as i32 - target_time).abs() <= time_window_sec
        })
        .collect()
}

#[pymodule]
fn mini_gtfs_rs(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<Stop>()?;
    m.add_class::<StopTime>()?;
    m.add_function(wrap_pyfunction!(find_stop_times_within_time, m)?)?;
    m.add_function(wrap_pyfunction!(find_stops_within_distance, m)?)?;
    Ok(())
}

Faster Solution

Below is the lib.rs file. You can see the full crate here.

See the comments in the code for more details.

use pyo3::prelude::*;
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};

use std::collections::HashSet;

use serde::Deserialize;

mod helper;
pub use helper::read_from_csv;
use helper::{deserialize_time, parse_time};

#[derive(Debug, Deserialize, Clone)]
#[pyclass(get_all)]
pub struct Stop {
    stop_id: u32,
    stop_name: String,
    stop_lat: f64,
    stop_lon: f64,
    stop_desc: String,
}

#[pymethods]
impl Stop {
    // Wrapping the return value to avoid the creation of a Python list
    #[staticmethod]
    pub fn from_csv(path: &str) -> StopsContainer {
        StopsContainer(read_from_csv(path))
    }
}

#[pyclass]
pub struct StopsContainer(Vec<Stop>);

#[pymethods]
impl StopsContainer {
    pub fn __len__(&self) -> usize {
        self.0.len()
    }
}

#[derive(Debug, Deserialize, Clone)]
#[pyclass(get_all)]
pub struct StopTime {
    trip_id: String,
    #[serde(deserialize_with = "deserialize_time")]
    departure_time: u32,
    stop_id: u32,
}

#[pymethods]
impl StopTime {
    // Wrapping the return value to avoid the creation of a Python list
    #[staticmethod]
    pub fn from_csv(path: &str) -> StopTimesContainer {
        StopTimesContainer(read_from_csv(path))
    }
}

#[pyclass]
pub struct StopTimesContainer(Vec<StopTime>);

#[pymethods]
impl StopTimesContainer {
    pub fn __len__(&self) -> usize {
        self.0.len()
    }
}

fn haversine(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    const EARTH_RADIUS: f64 = 6_371_000.0; // Radius of the Earth in meters

    let dlat = (lat2 - lat1).to_radians();
    let dlon = (lon2 - lon1).to_radians();
    let lat1_rad = lat1.to_radians();
    let lat2_rad = lat2.to_radians();

    let a =
        (dlat / 2.0).sin().powi(2) + lat1_rad.cos() * lat2_rad.cos() * (dlon / 2.0).sin().powi(2);

    let c = 2.0 * (a.sqrt()).atan2((1.0 - a).sqrt());

    EARTH_RADIUS * c // Distance in meters
}

#[pyfunction]
pub fn find_stops_within_distance(
    // Using &StopsContainer instead of a Vec<Stop> avoids the creation of a Python list,
    // and the cloning of all the Stops.
    stops: &StopsContainer,
    target_lat: f64,
    target_lon: f64,
    max_distance_m: u32,
) -> StopsContainer {
    // Return StopsContainer to avoid the creation of a Python list.
    StopsContainer(
        stops
            .0
            // par_iter uses the Rayon crate to parallelize the iteration
            .par_iter()
            .filter_map(|stop| {
                (haversine(stop.stop_lat, stop.stop_lon, target_lat, target_lon)
                    <= max_distance_m as f64)
                    // Clone only the stops that are within the max_distance_m
                    .then(|| stop.clone())
            })
            .collect(),
    )
}

#[pyfunction]
pub fn find_stop_times_within_time(
    // Using &StopsContainer instead of a Vec<Stop> avoids the creation of a Python list,
    // and the cloning of all the Stops
    stops: &StopsContainer,
    // Using &StopTimesContainer instead of a Vec<StopTimes> avoids the creation of a Python list,
    // and the cloning of all the StopTimes
    stops_times: &StopTimesContainer,
    target_time: &str,
    time_window_minutes: u32,
) -> StopTimesContainer {
    let valid_stop_ids: HashSet<_> = stops.0.iter().map(|stop| stop.stop_id).collect();
    let target_time = parse_time(target_time).unwrap() as i32;
    let time_window_sec = time_window_minutes as i32 * 60;
    // Return StopTimesContainer to avoid the creation of a Python list
    StopTimesContainer(
        stops_times
            .0
            // par_iter uses the Rayon crate to parallelize the iteration
            .par_iter()
            .filter_map(|stop_time| {
                (valid_stop_ids.contains(&stop_time.stop_id)
                    && (stop_time.departure_time as i32 - target_time).abs() <= time_window_sec)
                    // Clone only the required stop_times
                    .then(|| stop_time.clone())
            })
            .collect(),
    )
}

#[pymodule]
fn mini_gtfs_rs(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<Stop>()?;
    m.add_class::<StopTime>()?;
    m.add_class::<StopTimesContainer>()?;
    m.add_class::<StopsContainer>()?;
    m.add_function(wrap_pyfunction!(find_stop_times_within_time, m)?)?;
    m.add_function(wrap_pyfunction!(find_stops_within_distance, m)?)?;
    Ok(())
}

Instructor Notes

This exercise is intended as a concluding activity for the course. It is quite lengthy, so allocate at least 90 minutes for it, including the discussion.
Mention serde and how it is used to easily serialize and deserialize Rust data structures.
The students' solutions should be similar to our simple solution, which is arguably as readable as the Python implementation but 10-20 times faster. Mention, however, that the Python implementation relies on Pydantic v2, which is also written in Rust, and the csv module, which is written in C.
Our faster solution "cheats" by changing the function signatures, but no changes in the script are required. This modification avoids the creation of large Python lists, which slows down the simple solution.
Our faster solution uses Rayon to parallelize the data processing. Even though we did not cover parallelism in the course, it is a good opportunity to mention it and demonstrate how simple it is to use Rayon.
Mention that the Rust implementation is not only faster but also safer. Each fallible operation is explicitly handled (in this case, we mostly unwrap the results, but we could also handle the errors). In Python, the fallible operations are implicit.