Averages in Rust

An important part of learning Rust is figuring out how to use common collections like Vectors, Strings, and HashMaps. The next exercise in the Rust book calls for some averages:

Given a list of integers, use a vector and return the mean (average), median (when sorted, the value in the middle position), and mode (the value that occurs most often; a hash map will be helpful here) of the list.

Let's work through each of these one at a time:

Mean

You compute the mean of a list by taking the sum of the list and dividing it by the length of the list. Here is a simple implementation:

fn mean(list: &[i32]) -> f64 {
    let sum: i32 = Iterator::sum(list.iter());
    f64::from(sum) / (list.len() as f64)
}

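There's a lot of new stuff in here. The parameter is a slice of integers, &[i32], which is a borrowed view into a list. A reference to a Vec<i32> coerces to a slice automatically, so we can still pass our vector in. And since we will be doing division, which might not return an integer, the return type is an f64.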

According to the Rust book, "Vectors allow us to store more than one value in a single data structure that puts all the values next to each other in memory." These are like an array in JavaScript, or even more like a list in Elm since it has to contain values of the same type.
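To make the vector-versus-slice point concrete, here is a minimal sketch showing that a function taking &[i32] accepts both a vector and a fixed-size array. The names count_items and slice_demo are made up for this illustration and are not part of the exercise.

```rust
/// Hypothetical helper: counts the items in any borrowed slice of i32.
fn count_items(list: &[i32]) -> usize {
    list.len()
}

/// Hypothetical demo: passes a vector and an array to the same function.
fn slice_demo() -> (usize, usize) {
    let from_vec: Vec<i32> = vec![1, 2, 3];
    let from_array: [i32; 2] = [4, 5];
    // &Vec<i32> and &[i32; 2] both coerce to &[i32] at the call site.
    (count_items(&from_vec), count_items(&from_array))
}
```

Taking a slice rather than &Vec<i32> keeps the function usable with any contiguous list of integers.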

To get the sum, my mind went immediately to what I would do in other languages: reach for a "reduce" or "fold" function. After some searching, I found Iterator::sum, which works because we can iterate over the list! We tell the compiler the result should be an i32 type and store it as sum.

Then we turn sum and list.len() into f64 so we can do float-based division and get our mean. One down!
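Since I mentioned reaching for a fold, here is a small sketch comparing the two approaches side by side. The function names sum_with_fold and sum_with_sum are my own for this example; both should give the same total as the Iterator::sum call above.

```rust
/// Sums a slice with an explicit fold, the "reduce" style from other languages.
fn sum_with_fold(list: &[i32]) -> i32 {
    // Start the accumulator at 0 and add each element to it.
    list.iter().fold(0, |acc, x| acc + x)
}

/// Sums the same slice with `sum`, written in method-call form.
fn sum_with_sum(list: &[i32]) -> i32 {
    // The turbofish (::<i32>) tells the compiler which type to sum into.
    list.iter().sum::<i32>()
}
```

The method-call form with a turbofish is an alternative to annotating the variable, which is what the mean function does with let sum: i32.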

Median

Now that we have the basics of Vectors down, the next problem is even easier. Here's how to find the median of a list:

fn median(list: &[i32]) -> f64 {
    let len = list.len();
    let mid = len / 2;
    if len % 2 == 0 {
        mean(&list[(mid - 1)..(mid + 1)])
    } else {
        f64::from(list[mid])
    }
}

This is pretty much all stuff we've seen before. We pass in a reference to a list of sorted integers, and return a float. This is because if the list's length is even, we have to return the mean of the middle two values. Good thing we already have a function that does that!

First we store the length and midpoint. Then we check if the list's length is even. If the length is odd, we just return the value from the middle of the list. Vectors allow us to access a value by index (assuming that index is in range of the vector) like so: list[1].

If the length is even, we need to pass the middle two values to our mean function. To do that, we can slice several values from our list with syntax like: list[1..3]. Since lists are zero-indexed, our mid number will be too high. For example, if the list is of length 4, our mid would be 2 (4 / 2 == 2). But our item indexes would be 0,1,2,3. We need items of index 1 and 2, so our range would need to be 1..3 since the last number in the range is not included in the slice. So, we pass mid - 1 and mid + 1 to the range.
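One thing the median function above does not handle is an empty list: mid is 0, so mid - 1 underflows and the program panics. Here is a sketch of one way to guard against that by returning an Option. The name median_checked is hypothetical, and like the original it assumes the slice is already sorted.

```rust
/// Hypothetical variant of `median` that returns None for an empty slice.
/// Assumes `sorted` is already in ascending order.
fn median_checked(sorted: &[i32]) -> Option<f64> {
    let len = sorted.len();
    if len == 0 {
        return None;
    }
    let mid = len / 2;
    if len % 2 == 0 {
        // The end of a range is exclusive, so this covers indexes mid - 1 and mid.
        let pair = &sorted[(mid - 1)..(mid + 1)];
        Some((f64::from(pair[0]) + f64::from(pair[1])) / 2.0)
    } else {
        Some(f64::from(sorted[mid]))
    }
}
```

For a list of length 4, mid is 2, so the slice is 1..3 and the pair holds the items at indexes 1 and 2, matching the walkthrough above.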

Mode

Finally, the hardest one: mode. We need to be able to return the item in the list that occurs the most times. For this we need a HashMap, which is like an object or dictionary in other languages. It allows you to store values under keys, which the map hashes to find them quickly. Here is one method of using hash maps to figure out the mode of a list of numbers:

use std::collections::HashMap;

// ...

fn mode(list: &[i32]) -> i32 {
    let mut occurrences: HashMap<&i32, i32> = HashMap::new();
    let mut max: (i32, i32) = (0, 0);

    for entry in list {
        let count = occurrences.entry(entry).or_insert(0);
        *count += 1;
    }

    for (&&key, &val) in &occurrences {
        if val > max.1 {
            max = (key, val);
        }
    }

    max.0
}

This function also takes a slice of integers. But this time we return an integer because we will be returning a single value from the list.

Next, create a mutable HashMap by calling HashMap::new(). I annotated the types for the keys and values to make them explicit, although the compiler can often infer them from how the map is used later. In this case, the key will be a reference to an item in our list, and the value will be a number we will use to tally the appearances of that item: HashMap<&i32, i32>. I also decided to create a tuple to hold the "max" (key, value) pair, which starts as (0, 0).

Next, we loop over all the entries in the list. For each entry, we look up its slot in the map by calling .entry(), and if there is no value there yet, .or_insert(0) puts a 0 in. Either way we get back a mutable reference to the count, which we then increment by 1.
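The counting step can be pulled out on its own, which makes the entry pattern easier to see. This is a minimal sketch with a made-up helper name, tally, and it uses owned i32 keys and usize counts rather than the &i32 keys and i32 counts from the function above.

```rust
use std::collections::HashMap;

/// Hypothetical helper: tallies how often each value appears in the slice.
fn tally(list: &[i32]) -> HashMap<i32, usize> {
    let mut counts = HashMap::new();
    for &value in list {
        // entry() finds or creates the slot for this key; or_insert(0)
        // fills a new slot with 0 and returns a mutable reference to the count.
        *counts.entry(value).or_insert(0) += 1;
    }
    counts
}
```

Calling tally on the list 1, 2, 2, 4, 7 would give a map where the key 2 has a count of 2 and every other key has a count of 1.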

Next, we loop over the (key, val) pairs in occurrences to figure out which has the highest count. We check if the val (or count) is higher than our current max. If it is, we change max to now be the current key/value pair.

Once we're done, we return the first part of the max tuple, which will be the entry in our list that appeared the most.

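Note: my mode implementation is pretty simple and does not account for duplicate modes. If there are two numbers that appear the same number of times, it will return whichever one the HashMap happens to iterate over first. HashMap iteration order is not specified, so the result can differ between runs. It also returns 0 for an empty list. If you have ideas for how to solve this, I'd love to hear them!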
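One possible answer to the tie problem, offered as a sketch rather than a definitive fix, is to return every value that shares the highest count. The function name modes and the choice to return a sorted Vec<i32> are my own assumptions here, not something the exercise asks for.

```rust
use std::collections::HashMap;

/// Returns every value tied for the highest count, in ascending order.
/// An empty slice yields an empty vector.
fn modes(list: &[i32]) -> Vec<i32> {
    let mut occurrences: HashMap<i32, usize> = HashMap::new();
    for &entry in list {
        *occurrences.entry(entry).or_insert(0) += 1;
    }

    // Highest count seen, or 0 when the list is empty.
    let max_count = occurrences.values().copied().max().unwrap_or(0);

    let mut result: Vec<i32> = occurrences
        .into_iter()
        .filter(|&(_, count)| count == max_count)
        .map(|(value, _)| value)
        .collect();
    // HashMap iteration order is unspecified, so sort for a stable result.
    result.sort();
    result
}
```

Sorting at the end costs a little extra work, but it means the same input always produces the same output, which the single-value version cannot promise when there is a tie.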

User Interaction

These functions don't do us much good if our users can't give us numbers to run our functions on, so let's let them do that! Let's start with our usual main function wrapper:

fn main() {
    println!("Welcome to Stats.rs!");
    println!("Type \"quit\" to end the program");

    loop {
        let mut list_input = String::new();

        println!("\nPlease input a list of integers you want stats about:\n(Format: 1,5,3,4,5)");

        io::stdin()
            .read_line(&mut list_input)
            .expect("Failed to read line");

        let trimmed = list_input.trim();

        if trimmed == "quit" {
            break;
        }
    }
}

If you've been following along with my other Rust posts, this will look pretty familiar. It provides some basic instructions and allows the program to keep running until the user types "quit".

Now to actually handle some numbers!

use std::io;

// ...

fn main() {
    // ...
    loop {
        // ...
        let mut list: Vec<i32> = trimmed
            .split(',')
            .map(|x| match x.trim().parse() {
                Ok(num) => num,
                Err(_) => 0,
            })
            .collect();
        list.sort();

        println!(
            "list: {:?}, mean: {}, median: {}, mode: {}",
            list,
            mean(&list),
            median(&list),
            mode(&list)
        );
    }
}

We take the trimmed input and split it by commas (since that's how our instructions say to provide the list). We can then map over each piece, which is new. The syntax is .map(|x| // expression), which passes a closure to map: |x| names the current item, and the expression after it produces the new value for that item.

Our expression uses trim() in case they included some extra spaces, and then tries to parse() it into an integer. If the parse is successful, we return the integer. If it errors, we return 0. (I'm sure there is a better way to do this, but this was my first attempt at a map operation in Rust!)

We then have to collect() all these items into a Vector. We give the compiler an annotation at the beginning of the expression so it knows what the final collection is.

Once this is all done, we can sort the list, pass it into our functions, and print the results! I also made sure to print the list so the user can at least see if we inserted a 0 somewhere and debug what their error might have been.
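As one alternative to substituting 0 for bad input, the pieces can be collected into a Result, so the first piece that fails to parse is reported instead of silently changing the stats. This is a sketch under that assumption; parse_list is a hypothetical helper that is not part of the program above.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses comma-separated integers, failing on the
/// first bad piece instead of substituting 0.
fn parse_list(input: &str) -> Result<Vec<i32>, ParseIntError> {
    input
        .split(',')
        .map(|piece| piece.trim().parse::<i32>())
        // Collecting into a Result stops at the first Err and returns it.
        .collect()
}
```

In main, the caller could match on the result, print a message for Err, and continue the loop to ask for the list again.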

Here's an example of how it might look:

Welcome to Stats.rs!
Type "quit" to end the program

Please input a list of integers you want stats about:
(Format: 1,5,3,4,5)
1,2,7,4,2
list: [1, 2, 2, 4, 7], mean: 3.2, median: 2, mode: 2

Conclusion

Whew! We are definitely getting into some more complicated parts of the language! I'll admit I spent a lot of time searching and reading documentation to get all this working. So if it didn't all make sense right away, don't be too discouraged.

I'm definitely finding the compiler and error messages to be super helpful as I'm exploring new parts of the language. And if you haven't already, you should check out the RLS and rust-clippy projects. They are two really helpful tools that give you feedback while you are developing and point you to more "idiomatic" ways of doing things.

For reference, your final program should look something like this:

use std::collections::HashMap;
use std::io;

fn mean(list: &[i32]) -> f64 {
    let sum: i32 = Iterator::sum(list.iter());
    f64::from(sum) / (list.len() as f64)
}

fn median(list: &[i32]) -> f64 {
    let len = list.len();
    let mid = len / 2;
    if len % 2 == 0 {
        mean(&list[(mid - 1)..(mid + 1)])
    } else {
        f64::from(list[mid])
    }
}

fn mode(list: &[i32]) -> i32 {
    let mut occurrences: HashMap<&i32, i32> = HashMap::new();
    let mut max: (i32, i32) = (0, 0);

    for entry in list {
        let count = occurrences.entry(entry).or_insert(0);
        *count += 1;
    }

    for (&&key, &val) in &occurrences {
        if val > max.1 {
            max = (key, val);
        }
    }

    max.0
}

fn main() {
    println!("Welcome to Stats.rs!");
    println!("Type \"quit\" to end the program");

    loop {
        let mut list_input = String::new();

        println!("\nPlease input a list of integers you want stats about:\n(Format: 1,5,3,4,5)");

        io::stdin()
            .read_line(&mut list_input)
            .expect("Failed to read line");

        let trimmed = list_input.trim();

        if trimmed == "quit" {
            break;
        }

        let mut list: Vec<i32> = trimmed
            .split(',')
            .map(|x| match x.trim().parse() {
                Ok(num) => num,
                Err(_) => 0,
            })
            .collect();
        list.sort();

        println!(
            "list: {:?}, mean: {}, median: {}, mode: {}",
            list,
            mean(&list),
            median(&list),
            mode(&list)
        );
    }
}