Stack in Rust: case study for unsafe

2022-02-06 15:46

I've been doing Rust for fun for at least a couple of years now, and I've almost never had to use the unsafe keyword. I think it is time to address this lack, and what is a better case-study than implementing an heap based stack? ;)

Let's start by saying that there is already this amazing read about linked lists in Rust. Moreover, the Rust Nomicon is the resource to read to know more about unsafe and such.

But that's an hell of a read!!

I want to have something shorter, a real short example that can help me understand how to practically make use of raw pointers to implement a real datastructure.

Here is the whole implementation:

use std::ptr;

/// The "node".
struct Node<T> {
    next: Option<ptr::NonNull<Node<T>>>,
    value: T,
}

/// A simple on-heap stack implementation.
pub struct Stack<T> {
    head: Option<ptr::NonNull<Node<T>>>,
}

impl<T> Stack<T> {
    /// Create the stack.
    fn new() -> Self {
        Self { head: None }
    }

    /// Push the value to the head.
    fn push(&mut self, value: T) {
        self.head = Some(unsafe {
            ptr::NonNull::new_unchecked(Box::into_raw(Box::new(Node {
                value,
                next: self.head,
            })))
        });
    }

    /// Pop the head.
    fn pop(&mut self) -> Option<T> {
        let node = self.head.take();
        if let Some(node) = node {
            if let Some(next) = unsafe { node.as_ref() }.next {
                self.head = Some(next);
            }
            let val = unsafe { Box::from_raw(node.as_ptr()) };
            Some(val.value)
        } else {
            None
        }
    }

    /// Take a look to the first element.
    fn peek(&self) -> Option<&T> {
        self.head.as_ref().map(|f| &unsafe { f.as_ref() }.value)
    }

    /// Modify the fist element without popping it first.
    fn peek_mut(&mut self) -> Option<&mut T> {
        self.head.as_mut().map(|f| &mut unsafe { f.as_mut() }.value)
    }
}

impl<T> Drop for Stack<T> {
    fn drop(&mut self) {
        while let Some(_) = self.pop() {}
    }
}

... But first, a premise.

It is worth noticing that using unsafe for this specific use case is not mandatory. If we take a closer look, there are no references which lifetime is hard to express using Rust - even more, there are no references at all!

The stack data structure can in facts be expressed using this abstraction:

/// The entry in the stack, which contains a link to the element below itself.
struct Node<T> {
    next: Option<Box<Node<T>>>,
    value: T,
}

/// The top of the stack.
pub struct Stack<T> {
    head: Option<Box<Node<T>>>,
}

This is enough, since we don't have "things" that have ownership's issues: the head is the only reference we need, and every node owns the node below it.

But we can still keep this approach and put some pointers around, right?

Using raw pointers

Instead of using a Box, we can use a std::ptr::NonNull. Since we create the pointers from a Box, we know that the underlying pointer is never null, and so we can use that abstraction. By doing so, our new data structure follows the previous one:

/// The "node", again.
struct Node<T> {
    next: Option<ptr::NonNull<Node<T>>>,
    value: T,
}

/// The pointer to the node.
pub struct Stack<T> {
    head: Option<ptr::NonNull<Node<T>>>,
}

There is nothing crazy here: we just use the API given by NonNull to access the underlying pointer like this.

unsafe { node.as_ref() }.next // access next field of a node given its pointer

The pop implementation is a bit more interesting. We need to get back the value pointed by the pointer, and there is apparently no way to go from NonNull (or a pointer in general) to the underlying value.

To do so, we use Box::from_raw that takes a raw pointer and creates a managed heap value out of that. This also avoid leaking the memory, as shown below.

The method is unsafe for a simple reason: what if you were to create 2 boxes starting from the same raw pointer? The first drop would correctly free the memory, and that's ok. But the second one would clearly try to do the same with memory that is not safe to be read anymore, which is bad.

Memory management

Every time unsafe is introduced in the code-base, Miri should be used to verify that we're not introducing any undefined behavior. Moreover, it can be useful to detect memory leaks as well!

In Rust, leaking memory is safe. It is not producing any unexpected behavior, so (while annoying) it is ok to leak memory around. Box::into_raw is a perfect example of safe function that leaks. The documentation is pretty clear about this.

In order to avoid leaking all the elements that we push on the stack, we need to be sure that we drop them the the stack is dropped. This is the only tricky part in this simple data structure - on Drop, we need to take all the elements out and drop them. Since we already take the ownership of the elements back when we pop them, the easiest way to get the job done is popping all the elements in the drop implementation, like this:

impl<T> Drop for Stack<T> {
    fn drop(&mut self) {
        while let Some(_) = self.pop() {}
    }
}

Note that if we remove the Drop implementation and we create a simple test that does not pop all the elements, we would have something like:

The following memory was leaked: alloc64032 (Rust heap, size: 16, align: 8) {
    00 00 00 00 00 00 00 00 2a 00 00 00 00 00 00 00 │ ........*.......
}
alloc64099 (Rust heap, size: 16, align: 8) {
    ╾0x1a00b0[a64032]<untagged> (8 ptr bytes)╼ 0c 00 00 00 00 00 00 00 │ ╾──────╼........
}
alloc64166 (Rust heap, size: 16, align: 8) {
    ╾0x1a0630[a64099]<untagged> (8 ptr bytes)╼ 21 00 00 00 00 00 00 00 │ ╾──────╼!.......
}
alloc64233 (Rust heap, size: 16, align: 8) {
    ╾0x1a0b90[a64166]<untagged> (8 ptr bytes)╼ 37 00 00 00 00 00 00 00 │ ╾──────╼7.......
}
alloc71524 (Rust heap, size: 16, align: 8) {
    00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 │ ........!.......
}

which is great!