One of the ways I've tried to explain the advantages of Rust is by leaning into its memory ownership rules. For instance, Rust's ownership rules prevent data races. Data races are among the most frustrating bugs because they are non-deterministic and hard to reproduce. Run a program with a data race a million times, and it could run perfectly fine every time, crash only on the millionth run, or fail in any number of other ways.
However, as pointed out by a former co-worker, most "business logic" code is written using frameworks which rarely allow data race problems to exist. For instance, PHP (usually) handles each web request in (essentially) a separate process, which rules out data races between requests. Even in Java servlets, most people rarely share data between threads, eliminating most causes of data races. The framework or underlying system code might need to worry about data races, but application code may be more straightforward.
So is memory ownership "nice but not necessary"? Perhaps, but I'll try one more take: memory ownership rules are a method to communicate API intent.
When passing data to a function via a parameter, what are the rules for using the data? Is the data given to the function and hence owned by the function? Or is the function only given the data temporarily (i.e., borrowed)?
For a concrete example, let's go back to Java and define a Person type.
class Person {
    String name;

    Person(String name) {
        this.name = name;
    }

    @Override public int hashCode() {
        return this.name.hashCode();
    }

    @Override public boolean equals(Object other) {
        if (other instanceof Person) {
            Person otherPerson = (Person)other;
            return otherPerson.name.equals(this.name);
        }
        return false;
    }
}
The above code defines a simple Person type with a single name field. The hashCode and equals methods are overridden to allow usage in Java collections.
Now let's use the Person class by instantiating a few instances and adding an instance to a java.util.HashSet.
import java.util.HashSet;

class Main {
    public static void main(String[] args) {
        // (1)
        Person person1 = new Person("A");
        System.out.println("person1 equals person with name \"A\": " + person1.equals(new Person("A")));
        // (2)
        HashSet<Person> set = new HashSet<>();
        System.out.println("Set contains person1: " + set.contains(person1));
        System.out.println("Set contains person with name \"A\": " + set.contains(new Person("A")));
        System.out.println("Size of set: " + set.size());
        // (3)
        set.add(person1);
        // (4)
        System.out.println("Set contains person1: " + set.contains(person1));
        System.out.println("Set contains person with name \"A\": " + set.contains(new Person("A")));
        System.out.println("Size of set: " + set.size());
    }
}
The output is:
person1 equals person with name "A": true
Set contains person1: false
Set contains person with name "A": false
Size of set: 0
Set contains person1: true
Set contains person with name "A": true
Size of set: 1
1. An instance of Person is created with the name "A". It is equal to a temporary Person instance with name "A".
2. A new HashSet is created which does not contain anything initially.
3. The Person instance with name "A" is added to the set.
4. Whether using the same added instance (person1 with the name "A") or a temporary instance with name "A", the set says that it contains a Person with the name "A".
So far, everything is working as expected.
Adding the "bad" code to the main method:
// (1)
person1.name = "B";
// (2)
System.out.println("Set contains person1: " + set.contains(person1));
// (3)
System.out.println("Set contains person with name \"A\": " + set.contains(new Person("A")));
// (4)
System.out.println("Size of set: " + set.size());
The output is:
Set contains person1: false
Set contains person with name "A": false
Size of set: 1
1. person1's name is changed to "B".
2. Now when asking the set if it contains person1 (the instance originally added to the HashSet, which now has "B" as its name), the set returns false.
3. When asking the set if it contains a Person with name "A", the set also returns false.
4. The set still reports that it contains 1 element.
In the end, the set is denying it contains a Person with either "A" or "B" as the name, yet it contains 1 element. What went wrong?
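Of course, most seasoned Java programmers know the issue. Data that affects the result of hashCode or equals must not be modified after being added to a HashSet (among other collections). The set stored person1 in the bucket chosen by the hash of "A", so a lookup for "B" searches a different bucket, while a lookup for "A" finds the right bucket but fails the equals check against the renamed element. The general solutions to avoid the issue are: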
- Add only immutable data to the HashSet.
- Make a copy of the data and then add the copy to the HashSet.
- Stop using any references to the data after adding the data to the HashSet to ensure no modification of the data.
(Note: The HashSet implementation could have cloned/copied data when instances are added, but there are other issues and tradeoffs to consider.)
Ultimately, the issue is about data ownership. When can data be modified and by whom?
Note that the example above is not focused on immutability. The HashSet does not modify the data it is given. It does assume that the data will not be changed after being given to the HashSet. The HashSet effectively assumes that it owns the data, so that it can maintain its internal invariants.
As an aside, I feel immutability is a great tool and can help in many situations. "Defensive immutability" or "defensive copying" may be used when passing data to a function, because the caller cannot trust the function not to modify the data, and the function cannot trust that the caller will not modify the data later. Taking a step back, a sizable amount of effort may be needed to prevent undesired modification. However, what if there was a better and cheaper way to ensure data is only modified when expected?
When collaborating with other people, discussions involving internal and external APIs focus on function names, what needs to be passed into the function, and what the function does. One other concern is what happens with the data inside the function. If you give something to someone else, what are the rules and expectations?
Perhaps the data ownership rules are documented. While everyone should read an API's documentation, there are many APIs which do not document who owns the data as the data moves through the system. There may be assumed invariants or invariants which are only enforced by actively testing the code.
Rust's solution to the aforementioned issues is to enforce memory ownership rules via the compiler. When reading or writing Rust code, the code itself expresses when data is borrowed versus owned and when data is mutable or immutable.
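As a small sketch of how function signatures alone can communicate that intent, consider the three functions below. The names greeting, rename, and into_name are hypothetical and exist only for illustration; the point is the parameter types, not the function bodies.

```rust
struct Person {
    name: String,
}

// `&Person`: an immutable borrow. The function may read the data,
// and the caller keeps ownership.
fn greeting(person: &Person) -> String {
    format!("Hello, {}!", person.name)
}

// `&mut Person`: a mutable borrow. The caller keeps ownership but
// explicitly permits the function to modify the data.
fn rename(person: &mut Person, new_name: &str) {
    person.name = new_name.to_string();
}

// `Person`: ownership is transferred. The caller cannot use the
// value after this call.
fn into_name(person: Person) -> String {
    person.name
}

fn main() {
    let mut person = Person { name: "A".to_string() };
    println!("{}", greeting(&person));
    rename(&mut person, "B");
    println!("{}", greeting(&person));
    let name = into_name(person);
    println!("{}", name);
    // println!("{}", greeting(&person)); // would not compile: `person` was moved
}
```

A caller has to write &person, &mut person, or person at the call site, so the ownership contract is visible on both sides of the API without consulting any documentation.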
For completeness, here's a roughly equivalent Rust program:
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Person {
    name: String,
}

fn main() {
    let person1 = Person { name: "A".to_string() };
    println!("person1 equals person with name \"A\": {}", person1 == Person { name: "A".to_string() });

    let mut set = std::collections::HashSet::new();
    println!("set contains person1: {}", set.contains(&person1));
    println!("set contains person with name \"A\": {}", set.contains(&Person { name: "A".to_string() }));
    println!("Size of set: {}", set.len());

    set.insert(person1);
    println!("set contains person with name \"A\": {}", set.contains(&Person { name: "A".to_string() }));
    println!("Size of set: {}", set.len());

    // person1 cannot be used after insertion into the set: the set took ownership
    // of person1's data, so uncommenting the next line is a compiler error.
    // println!("set contains person1: {}", set.contains(&person1));
}
Rust's HashSet takes ownership of inserted data. Once person1 is inserted into the set, person1's data is considered moved into, and owned by, the set. Effectively, person1 cannot be used again. If you wanted to keep using person1, you could insert a clone/copy of the data (set.insert(person1.clone())) and then keep using person1.
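To make the clone approach concrete, here is a minimal sketch. It also shows one way to change an element that already lives in the set: take it out, modify it, and put it back. The helper rename_in_set is a hypothetical name introduced for this example; HashSet::take is the standard library method that removes and returns a matching element.

```rust
use std::collections::HashSet;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Person {
    name: String,
}

// Hypothetical helper: renames an element by removing it from the set,
// modifying it while the caller owns it, and reinserting it so that it
// is hashed again under the new name. Returns false if no element matched.
fn rename_in_set(set: &mut HashSet<Person>, old_name: &str, new_name: &str) -> bool {
    let key = Person { name: old_name.to_string() };
    match set.take(&key) {
        Some(mut person) => {
            person.name = new_name.to_string();
            set.insert(person);
            true
        }
        None => false,
    }
}

fn main() {
    let mut person1 = Person { name: "A".to_string() };
    let mut set = HashSet::new();

    // The set owns the clone; person1 stays usable and independent.
    set.insert(person1.clone());
    person1.name = "B".to_string();

    // The set's copy is unaffected by the change to person1.
    println!("set contains person with name \"A\": {}", set.contains(&Person { name: "A".to_string() }));
    println!("set contains person1: {}", set.contains(&person1));

    // Changing the element inside the set goes through remove and reinsert.
    rename_in_set(&mut set, "A", "C");
    println!("set contains person with name \"C\": {}", set.contains(&Person { name: "C".to_string() }));
    println!("Size of set: {}", set.len());
}
```

Unlike the Java example, changing person1 here cannot leave the set in an inconsistent state, because the set only ever holds data it owns.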
There are many articles and books explaining Rust's memory ownership rules, but hopefully, this post has given some insight into why memory ownership is important outside of data races and what benefits it can bring.