A closer look at Ownership in Rust

Published: 2023-09-22 12:10:38 | Author: ImreW

So you want to learn Rust and keep hearing about the concepts of Ownership and Borrowing, but can’t quite wrap your head around them. Ownership is so essential that it pays to understand it early in your journey of learning Rust, not least to avoid compiler errors that keep you from implementing your programs.

In our previous article, we talked about the Ownership model from a JavaScript developer’s perspective. In this article, we’ll take a closer look at how Rust manages memory, and why that shapes both how we write Rust code and how Rust preserves memory safety.

Once you’re done reading this, you might want to check out our article on References in Rust as well as the difference between String and &str.

What is Memory Safety anyway?

First and foremost, it’s good to understand what memory safety actually means before discussing what makes Rust stand out as a programming language. If you come from a non-systems programming background, or your experience is mainly with garbage-collected languages, this fundamental feature of Rust can be a bit harder to appreciate.

As Will Crichton states in his great article Memory Safety in Rust: A Case Study with C:

Memory safety is the property of a program where memory pointers used always point to valid memory, i.e. allocated and of the correct type/size. Memory safety is a correctness issue—a memory unsafe program may crash or produce nondeterministic output depending on the bug.

In practice, this means that there are languages that allow us to write “memory unsafe” code, in the sense that it’s fairly easy to introduce bugs. Some of those bugs are:

  • Dangling pointers: Pointers that point to invalid data (this will make more sense once we look at how data is stored in memory).
  • Double frees: Trying to free the same memory location twice, which can lead to “undefined behaviour”.

To illustrate the concept of a dangling pointer, let’s take a look at the following C++ code and how it is represented in memory:

std::string s = "Have a nice day";

The initialized string is usually represented in memory using the stack and heap like this:

                     buffer
                   /   capacity
                 /   /    length
               /   /    /
            +–––+––––+––––+
stack frame │ • │ 16 │ 15 │ <– s
            +–│–+––––+––––+
              │
            [–│––––––––––––––––––––––––– capacity ––––––––––––––––––––––––––]
              │
            +–V–+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+
       heap │ H │ a │ v │ e │   │ a │   │ n │ i │ c │ e │   │ d │ a │ y │   │
            +–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+

            [––––––––––––––––––––––––– length ––––––––––––––––––––––––––]

We’ll get into what the stack and heap are in a second, but for now it’s important to appreciate that what gets stored on the stack is the std::string object itself, which has a fixed size of three words. Its fields are a pointer to the heap-allocated buffer that holds the actual data, the buffer’s capacity, and the length of the text. In other words, the std::string owns its buffer. When the program destroys this string, the string’s destructor frees the corresponding buffer as well.
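
Rust’s own String is laid out in much the same way. The following is a minimal sketch that assumes the current standard-library layout, in which a String handle is a pointer, a capacity and a length; the language does not formally guarantee that layout, and the helper name describe is made up for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (length, capacity) of a string's heap buffer.
fn describe(s: &String) -> (usize, usize) {
    (s.len(), s.capacity())
}

fn main() {
    let s = String::from("Have a nice day");
    let (len, cap) = describe(&s);

    // The handle on the stack has a fixed size, no matter how long the text is.
    println!("handle size: {} bytes", size_of::<String>());
    println!("one word:    {} bytes", size_of::<usize>());

    // The text itself lives in the heap buffer the handle points to.
    println!("length: {len}, capacity: {cap}");
    println!("buffer starts at {:p}", s.as_ptr());
}
```

The capacity is only guaranteed to be at least the length, so the exact value printed may differ from the 16 shown in the diagram.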

However, it’s entirely possible to create other pointers to a character living inside that same buffer. Those pointers are not destroyed along with the string, so once the string is gone they point to freed memory, and there we have it: a dangling pointer!
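
For comparison, here is a sketch of how the same situation plays out in Rust. The comments describe the kind of code the compiler rejects; the helper first_char_owned is a hypothetical name used only for this example.

```rust
/// Hypothetical helper: copies the first character out of the string,
/// so the result does not point into the string's buffer.
fn first_char_owned(s: &str) -> Option<char> {
    s.chars().next()
}

fn main() {
    let copied;
    {
        let s = String::from("Have a nice day");

        // A reference into the buffer is fine while `s` is alive.
        let borrowed: &str = &s[0..1];
        println!("borrowed: {borrowed}");

        // An owned copy can safely outlive `s`.
        copied = first_char_owned(&s);
    } // `s` is dropped here and its buffer is freed.

    // Keeping `borrowed` alive past this point would be a dangling
    // reference. Declaring it outside the block and printing it here is
    // rejected at compile time ("`s` does not live long enough").
    println!("copied: {copied:?}");
}
```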

If you wonder why this is not really an issue when you write programs in languages like JavaScript or Python, the reason is that those languages are garbage collected. This means that the language comes with a program that, at run-time, traverses the memory and frees everything that is no longer in use. Such a program is called a Garbage Collector. While this sounds like a nice thing to have, garbage collection comes at a cost: since it happens while your program is running, it can affect the program’s overall run-time performance.

Rust does not come with garbage collection. Instead, it guarantees memory safety using ownership and borrowing. When we say that Rust comes with memory safety, we refer to the fact that, by default, Rust’s compiler doesn’t even allow us to write code that is not memory safe. How cool is that?
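
To make the contrast with a garbage collector concrete, here is a small sketch showing that a value’s clean-up code runs at a fixed point, the end of its owner’s scope, rather than whenever a collector gets around to it. The type Buffer, the counter DROPS and the function drops_after_scope are assumptions invented for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Buffer` values have been cleaned up so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical type that owns some heap data.
struct Buffer {
    data: String,
}

impl Drop for Buffer {
    // Called automatically when the owning variable goes out of scope.
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        println!("freeing buffer holding {:?}", self.data);
    }
}

/// Returns how many buffers were dropped while the inner scope ran.
fn drops_after_scope() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let _b = Buffer { data: String::from("Have a nice day") };
        // `_b` is alive for the rest of this block.
    } // `_b` goes out of scope: `drop` runs right here, no collector involved.
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("buffers dropped: {}", drops_after_scope());
}
```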

Stack and Heap

Before we jump into how Rust handles Ownership of data, let’s quickly touch on what the stack and heap are and how they relate to which data gets stored where.

Both the stack and the heap are parts of memory, but they are organized differently. The stack is… well, a stack: values are stored in the order they come in and removed in the opposite order, which makes both operations very fast. The heap is less organized: to store data there, the allocator first has to find a free block that is large enough and hand back a pointer to it, and the data is then reached by following that pointer, so reading and writing data takes a bit more computational effort.
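
The last-in, first-out behaviour of the stack can be modelled with a Vec. This is only a toy model of the ordering, not how the call stack is actually implemented, and unwind_order is a hypothetical helper.

```rust
/// Toy model: push values in order, then pop them all and
/// return the order in which they came off.
fn unwind_order(values: &[i32]) -> Vec<i32> {
    let mut stack = Vec::new();
    for &v in values {
        stack.push(v); // the newest value goes on top
    }

    let mut popped = Vec::new();
    while let Some(top) = stack.pop() {
        popped.push(top); // the top value is always removed first
    }
    popped
}

fn main() {
    println!("{:?}", unwind_order(&[1, 2, 3]));
}
```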

What goes onto the stack and what goes onto the heap depends on the data we’re dealing with. In Rust, any data of fixed size (that is, of size “known” at compile time), such as machine integers, floating-point numeric types, pointer types and a few others, is stored on the stack. Dynamic and “unsized” data is stored on the heap. This is because types of unknown size often either need to be able to grow dynamically, or need to do certain “clean up” work when destructed (more than just popping a value off the stack).
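
A small sketch of the distinction: fixed-size values have a size the compiler knows up front, while a growable type such as Vec keeps its elements in a heap buffer that is reallocated as needed. The function name grow is an assumption made for this example.

```rust
use std::mem::size_of;

/// Hypothetical helper: pushes `n` integers and reports the capacity
/// observed before and after, showing that the heap buffer grew.
fn grow(n: usize) -> (usize, usize) {
    let mut v: Vec<u64> = Vec::new();
    let before = v.capacity(); // an empty Vec has not allocated yet
    for i in 0..n {
        v.push(i as u64);
    }
    (before, v.capacity())
}

fn main() {
    // Sizes known at compile time: these values can live directly on the stack.
    println!("i32: {} bytes", size_of::<i32>());
    println!("f64: {} bytes", size_of::<f64>());
    println!("&u8: {} bytes", size_of::<&u8>());

    // A Vec's element count is only known at run time, so its elements live on the heap.
    let (before, after) = grow(100);
    println!("capacity before: {before}, after: {after}");
}
```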