2022-02-06 15:46
I've been doing Rust for fun for at least a couple of years now, and I've
almost never had to use the unsafe
keyword. I think it is time to address
this lack, and what is a better case-study than implementing an heap based stack? ;)
Let's start by saying that there is already this amazing
read about linked lists in
Rust. Moreover, the Rust Nomicon is the
resource to read to know more about unsafe
and such.
But that's an hell of a read!!
I want to have something shorter, a real short example that can help me understand how to practically make use of raw pointers to implement a real datastructure.
Here is the whole implementation:
use std::ptr;
/// The "node".
struct Node<T> {
next: Option<ptr::NonNull<Node<T>>>,
value: T,
}
/// A simple on-heap stack implementation.
pub struct Stack<T> {
head: Option<ptr::NonNull<Node<T>>>,
}
impl<T> Stack<T> {
/// Create the stack.
fn new() -> Self {
Self { head: None }
}
/// Push the value to the head.
fn push(&mut self, value: T) {
self.head = Some(unsafe {
ptr::NonNull::new_unchecked(Box::into_raw(Box::new(Node {
value,
next: self.head,
})))
});
}
/// Pop the head.
fn pop(&mut self) -> Option<T> {
let node = self.head.take();
if let Some(node) = node {
if let Some(next) = unsafe { node.as_ref() }.next {
self.head = Some(next);
}
let val = unsafe { Box::from_raw(node.as_ptr()) };
Some(val.value)
} else {
None
}
}
/// Take a look to the first element.
fn peek(&self) -> Option<&T> {
self.head.as_ref().map(|f| &unsafe { f.as_ref() }.value)
}
/// Modify the fist element without popping it first.
fn peek_mut(&mut self) -> Option<&mut T> {
self.head.as_mut().map(|f| &mut unsafe { f.as_mut() }.value)
}
}
impl<T> Drop for Stack<T> {
fn drop(&mut self) {
while let Some(_) = self.pop() {}
}
}
It is worth noticing that using unsafe
for this specific use
case is not mandatory. If we take a closer look, there are no references which
lifetime is hard to express using Rust - even more, there are no references at
all!
The stack data structure can in facts be expressed using this abstraction:
/// The entry in the stack, which contains a link to the element below itself.
struct Node<T> {
next: Option<Box<Node<T>>>,
value: T,
}
/// The top of the stack.
pub struct Stack<T> {
head: Option<Box<Node<T>>>,
}
This is enough, since we don't have "things" that have ownership's issues: the head is the only reference we need, and every node owns the node below it.
But we can still keep this approach and put some pointers around, right?
Instead of using a Box, we can use a std::ptr::NonNull. Since we create the pointers from a Box, we know that the underlying pointer is never null, and so we can use that abstraction. By doing so, our new data structure follows the previous one:
/// The "node", again.
struct Node<T> {
next: Option<ptr::NonNull<Node<T>>>,
value: T,
}
/// The pointer to the node.
pub struct Stack<T> {
head: Option<ptr::NonNull<Node<T>>>,
}
There is nothing crazy here: we just use the API given by NonNull to access the underlying pointer like this.
unsafe { node.as_ref() }.next // access next field of a node given its pointer
The pop
implementation is a bit more interesting. We need to get back the
value pointed by the pointer, and there is apparently no way to go from NonNull
(or a pointer in general) to the underlying value.
To do so, we use Box::from_raw
that takes a raw pointer and creates a managed
heap value out of that. This also avoid leaking
the memory, as shown below.
The method is unsafe
for a simple reason: what if you were to create 2 boxes
starting from the same raw pointer?
The first drop
would correctly free the memory, and that's ok. But the second
one would clearly try to do the same with memory that is not safe to be read
anymore, which is bad.
Every time unsafe
is introduced in the code-base,
Miri should be used to verify that we're
not introducing any undefined behavior. Moreover, it can be useful to detect
memory leaks as well!
In Rust, leaking memory is safe. It is not producing any unexpected behavior, so (while annoying) it is ok to leak memory around. Box::into_raw is a perfect example of safe function that leaks. The documentation is pretty clear about this.
In order to avoid leaking all the elements that we push on the stack, we need
to be sure that we drop
them the the stack is dropped.
This is the only tricky part in this simple data structure - on Drop, we need
to take all the elements out and drop them. Since we already take the ownership
of the elements back when we pop them, the easiest way to get the job done is
popping all the elements in the drop implementation, like this:
impl<T> Drop for Stack<T> {
fn drop(&mut self) {
while let Some(_) = self.pop() {}
}
}
Note that if we remove the Drop implementation and we create a simple test that does not pop all the elements, we would have something like:
The following memory was leaked: alloc64032 (Rust heap, size: 16, align: 8) {
00 00 00 00 00 00 00 00 2a 00 00 00 00 00 00 00 │ ........*.......
}
alloc64099 (Rust heap, size: 16, align: 8) {
╾0x1a00b0[a64032]<untagged> (8 ptr bytes)╼ 0c 00 00 00 00 00 00 00 │ ╾──────╼........
}
alloc64166 (Rust heap, size: 16, align: 8) {
╾0x1a0630[a64099]<untagged> (8 ptr bytes)╼ 21 00 00 00 00 00 00 00 │ ╾──────╼!.......
}
alloc64233 (Rust heap, size: 16, align: 8) {
╾0x1a0b90[a64166]<untagged> (8 ptr bytes)╼ 37 00 00 00 00 00 00 00 │ ╾──────╼7.......
}
alloc71524 (Rust heap, size: 16, align: 8) {
00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 │ ........!.......
}
which is great!