Benefitting from Memory Ownership in APIs

One of the ways I've tried to explain the advantages of Rust is by leaning into its memory ownership rules. For instance, Rust's ownership rules prevent data races. Data races are among the most frustrating bugs because they are non-deterministic and hard to reproduce. Run a program with a data race a million times, and it could run perfectly fine every time, or it could crash only on the millionth run.

However, as a former co-worker pointed out, most "business logic" code is written using frameworks which rarely allow data races to occur. For instance, PHP (usually) handles each web request in (essentially) a separate process, which rules out data races between requests. Even in Java servlets, most people rarely share data between threads, which eliminates most causes of data races. The framework or underlying system code might need to worry about data races, but application code can usually stay more straightforward.

So is memory ownership "nice but not necessary"? Perhaps, but I'll try one more take: memory ownership rules are a method to communicate API intent.

When passing data to a function via a parameter, what are the rules for using the data? Is the data given to the function and hence owned by the function? Or is the function only given the data temporarily (i.e. borrowed)?

For a concrete example, let's go back to Java and define a Person type.

class Person {
    String name;

    Person(String name) {
        this.name = name;
    }

    @Override public int hashCode() {
        return this.name.hashCode();
    }

    @Override public boolean equals(Object other) {
        if (other instanceof Person) {
            Person otherPerson = (Person)other;
            return otherPerson.name.equals(this.name);
        }
        return false;
    }
}

The above code defines a simple Person type with a single name field. The hashCode and equals methods are overridden to allow usage in Java collections.

Now let's use the Person class by instantiating a few instances and adding an instance to a java.util.HashSet.

import java.util.HashSet;

class Main {
  public static void main(String[] args) {

      // (1)
      Person person1 = new Person("A");
      System.out.println("person1 equals person with name \"A\": " + person1.equals(new Person("A")));

      // (2)
      HashSet<Person> set = new HashSet<>();
      System.out.println("Set contains person1: " + set.contains(person1));
      System.out.println("Set contains person with name \"A\": " + set.contains(new Person("A")));
      System.out.println("Size of set: " + set.size());

      // (3)
      set.add(person1);

      // (4)
      System.out.println("Set contains person1: " + set.contains(person1));
      System.out.println("Set contains person with name \"A\": " + set.contains(new Person("A")));
      System.out.println("Size of set: " + set.size());
  }
}

The output is:

person1 equals person with name "A": true
Set contains person1: false
Set contains person with name "A": false
Size of set: 0
Set contains person1: true
Set contains person with name "A": true
Size of set: 1

  1. An instance of Person is created with the name "A". It is equal to a temporary Person instance with name "A".
  2. A new HashSet is created which does not contain anything initially.
  3. The Person instance with name "A" is added to the set.
  4. Whether using the same added instance (person1 with the name "A") or a temporary instance with name "A", the set says that it contains a Person with a name "A".

So far, everything is working as expected.

Now add the following "bad" code to the end of the main method:

      // (1)
      person1.name = "B";

      // (2)
      System.out.println("Set contains person1: " + set.contains(person1));

      // (3)
      System.out.println("Set contains person with name \"A\": " + set.contains(new Person("A")));

      // (4)
      System.out.println("Size of set: " + set.size());

The output is:

Set contains person1: false
Set contains person with name "A": false
Size of set: 1

  1. person1's name is changed to "B".
  2. Now when asking the set if it contains person1 (the instance originally added to the HashSet, which now has "B" as its name), the set returns false.
  3. When asking the set if it contains a Person with name "A", the set also returns false.
  4. The set still returns that it contains 1 element.

In the end, the set is denying it contains a Person with either "A" or "B" as the name, yet it contains 1 element. What went wrong?

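Of course, most seasoned Java programmers know the issue: any field that affects hashCode or equals must not be modified after the object has been added to a HashSet (or similar collections). The set filed person1 under the hash of "A". A lookup for "B" searches under the hash of "B" and finds nothing, while a lookup for "A" finds the stored entry but fails the equals check because that instance's name is now "B". The general solutions to avoid the issue are: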

  1. Add only immutable data to the HashSet.
  2. Make a copy of the data and then add the copy to the HashSet.
  3. Stop using any references to the data after adding the data to the HashSet to ensure no modification of the data.

(Note: The HashSet implementation could have cloned/copied data when instances are added, but there are other issues and tradeoffs to consider.)

Ultimately, the issue is about data ownership. When can data be modified and by whom?

Note that the example above is not focused on immutability. The HashSet does not modify the data it is given, but it does assume that the data will not change after being added. In effect, the HashSet assumes it has ownership of the data so that it can maintain its internal invariants.

As an aside, I feel immutability is a great tool and can help in many situations. "Defensive immutability" or "defensive copying" may be used when passing data to a function, because the caller cannot trust the function not to modify the data, and the function cannot trust the caller not to modify the data later. Taking a step back, a sizable amount of effort may be needed to prevent undesired modification. However, what if there were a better and cheaper way to ensure data is only modified when expected?

When collaborating with other people, discussions about internal and external APIs focus on function names, what needs to be passed into a function, and what the function does. One other concern is what happens to the data inside the function. If you give something to someone else, what are the rules and expectations?

Perhaps the data ownership rules are documented. While everyone should read an API's documentation, there are many APIs which do not document who owns the data as the data moves through the system. There may be assumed invariants or invariants which are only enforced by actively testing the code.

Rust's solution to the aforementioned issues is to enforce memory ownership rules in the compiler. When reading or writing Rust code, the code itself expresses whether data is borrowed or owned, and whether it is mutable or immutable.
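As a minimal sketch of how a signature alone can carry that intent, consider three functions over a Person type. The function names greeting, rename, and into_name are hypothetical and exist only for this illustration.

```rust
#[derive(Debug)]
struct Person {
    name: String,
}

// Hypothetical function. Borrows immutably: it may read `person` but
// cannot change it, and the caller keeps ownership.
fn greeting(person: &Person) -> String {
    format!("Hello, {}", person.name)
}

// Hypothetical function. Borrows mutably: the caller explicitly allows
// modification for the duration of the call and keeps ownership.
fn rename(person: &mut Person, new_name: &str) {
    person.name = new_name.to_string();
}

// Hypothetical function. Takes ownership: the caller gives the value
// away and cannot use it afterwards.
fn into_name(person: Person) -> String {
    person.name
}

fn main() {
    let mut person = Person { name: "A".to_string() };

    println!("{}", greeting(&person));
    rename(&mut person, "B");
    println!("{}", greeting(&person));

    let name = into_name(person);
    println!("{}", name);

    // Uncommenting the next line is a compile error: `person` was moved.
    // println!("{:?}", person);
}
```

The call sites mirror the signatures: &person, &mut person, and plain person each tell the reader what the callee is allowed to do with the data.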

For completeness, here's a roughly equivalent Rust program:

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Person {
    name: String
}

fn main() {
  let person1 = Person { name: "A".to_string() };

  println!("person1 equals person with name \"A\": {}", person1.eq(&Person { name: "A".to_string() }));

  let mut set = std::collections::HashSet::new();

  println!("set contains person1: {}", set.contains(&person1));
  println!("set contains person with name \"A\": {}", set.contains(&Person { name: "A".to_string() }));
  println!("Size of set: {}", set.len());

  set.insert(person1);

  println!("set contains person with name \"A\": {}", set.contains(&Person { name: "A".to_string() }));
  println!("Size of set: {}", set.len());

  // person1 cannot be used after insertion: the set took ownership of its data, so the next line is a compile error.
  // println!("set contains person1: {}", set.contains(&person1));
}

Rust's HashSet takes ownership of inserted data.

Once person1 is inserted into the set, its data is considered moved into, and owned by, the set. Effectively, person1 cannot be used again. If you wanted to keep using person1, you could insert a clone/copy of the data (set.insert(person1.clone())) and then keep using person1.
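To mirror the "bad" Java code, here is a small sketch of the clone approach. The function name clone_then_rename is hypothetical. It inserts a clone, renames the original, and reports what the set sees afterwards.

```rust
use std::collections::HashSet;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Person {
    name: String,
}

// Hypothetical helper: inserts a clone of a Person into a set, renames the
// original, and returns (set contains renamed original, set contains "A", size).
fn clone_then_rename() -> (bool, bool, usize) {
    let mut person1 = Person { name: "A".to_string() };
    let mut set = HashSet::new();

    // The set owns the clone; `person1` stays with the caller.
    set.insert(person1.clone());

    // Mutating `person1` cannot affect the clone stored in the set.
    person1.name = "B".to_string();

    let contains_person1 = set.contains(&person1);
    let contains_a = set.contains(&Person { name: "A".to_string() });
    (contains_person1, contains_a, set.len())
}

fn main() {
    let (contains_person1, contains_a, size) = clone_then_rename();
    println!("set contains person1 (now \"B\"): {}", contains_person1);
    println!("set contains person with name \"A\": {}", contains_a);
    println!("Size of set: {}", size);
}
```

Unlike the Java version, the set stays consistent: it still finds "A", does not find "B", and reports one element. The stored clone cannot be renamed behind the set's back, because the standard HashSet only hands out shared references to its elements.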

There are many articles and books explaining Rust's memory ownership rules, but hopefully, this post has given some insight into why memory ownership is important outside of data races and what benefits it can bring.