
[Rust日报 / Rust Daily] 2023-07-18 Pin: Review the Old, Learn the New

Posted by Koalr on 2023-07-18 15:37

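There are some things you keep learning and then forgetting (or maybe you never really learned them?).

For me, one of those things is Pin/Unpin in Rust.

Every time I read an explanation of pinning, my brain goes 👍, and a few weeks later it goes 🤔 🤨.

So I'm writing this post to force my brain to remember. Let's see how that goes!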

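Pin is a kind of pointer that can be thought of as a middle ground between &mut T and &T.

The point of Pin<&mut T> is to say:

this value can be modified (just like with &mut T), but
this value cannot be moved (unlike with &mut T).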

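Why? Because some values must not be moved, or can only be moved with special care.

The classic example is a self-referential data structure. These show up naturally when using async, because futures tend to hold references to their own local values.

This seemingly harmless Future: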

use std::time::Duration;

async fn self_ref() {
    let mut v = [1, 2, 3];

    let x = &mut v[0];

    tokio::time::sleep(Duration::from_secs(1)).await;

    *x = 42;
}

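needs a self-referential struct, because under the hood, futures are state machines (unlike closures).

Note that self_ref hands control back to its caller at the first await. So even though v and x look like ordinary stack variables, something more complicated must be happening here.

The compiler would like to generate something like this: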

enum SelfRefFutureState {
    Unresumed,        // Created and wasn't polled yet.
    Returned,
    Poisoned,         // `panic!`ed.
    SuspensionPoint1, // First `await` point.
}

struct SelfRefFuture {
    state: SelfRefFutureState,
    v: [i32; 3],
    x: &'problem mut i32, // a "reference" to an element of `self.v`, 
                          // which is a big problem if we want to move `self`.
                          // (and we didn't even consider borrowchecking!)
}

But! You can move a SelfRefFuture whenever you want, and that would leave x pointing at invalid memory.

let f = self_ref();
let boxed_f = Box::new(f); // Evil?

let mut f1 = self_ref();
let mut f2 = self_ref();

std::mem::swap(&mut f1, &mut f2); // Blasphemy?

So what's going on? As a wise compiler once said:

futures do nothing unless you .await or poll them

#[warn(unused_must_use)] on by default

– rustc

That's because calling self_ref doesn't actually do anything yet. What we really get is closer to:

struct SelfRefFuture {
    state: SelfRefFutureState,
    v: MaybeUninit<[i32; 3]>,
    x: *mut i32, // a pointer into `self.v`, 
                 // still a problem if we want to move `self`, but only after it is set.
    //
    // .. other locals, like the future returned from `tokio::time::sleep`.
}

In this (initial) state, the value is safe to move.

impl SelfRefFuture {
    fn new() -> Self {
        Self {
            state: SelfRefFutureState::Unresumed,
            v: MaybeUninit::uninit(),
            x: std::ptr::null_mut(),
            // ..
        }
    }
}

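We only run into the self-reference problem once we start polling f, because that is when the x pointer gets set. If f is wrapped in a Pin, all of those moves become unsafe, which is exactly what we want.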
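To make that distinction concrete without relying on the compiler's real generated code, here is a std-only sketch. SelfRef, start and x_is_valid are hypothetical names: SelfRef stands in for SelfRefFuture, and start plays the role of its first poll. The raw pointer is only compared and never dereferenced, so the example itself has no undefined behavior.

```rust
// Hypothetical stand-in for `SelfRefFuture`: `x` stays null until `start`
// (playing the role of the first `poll`) points it at `v[0]`.
struct SelfRef {
    v: [i32; 3],
    x: *mut i32,
}

impl SelfRef {
    fn new() -> Self {
        SelfRef { v: [1, 2, 3], x: std::ptr::null_mut() }
    }

    // Create the self-reference, like the first `poll` would.
    fn start(&mut self) {
        self.x = self.v.as_mut_ptr();
    }

    // Does `x` point at *this* value's own `v[0]`? (Compared, never dereferenced.)
    fn x_is_valid(&self) -> bool {
        self.x as *const i32 == self.v.as_ptr()
    }
}

// Moving before the pointer is set is harmless.
fn move_before_start() -> bool {
    let s = SelfRef::new();
    let mut moved = Box::new(s); // move from the stack to the heap while `x` is null
    moved.start();               // the self-reference is created at the final address
    moved.x_is_valid()
}

// Moving after the pointer is set leaves `x` pointing at the old location.
fn move_after_start() -> bool {
    let mut s = SelfRef::new();
    s.start();               // `x` points into `s.v` on the stack
    let moved = Box::new(s); // the bytes are copied to the heap...
    moved.x_is_valid()       // ...but `x` still holds the old stack address
}

fn main() {
    println!("moved before start: x valid = {}", move_before_start());
    println!("moved after start:  x valid = {}", move_after_start());
}
```

Pin exists to make the second function's pattern impossible in safe code once the value has been pinned.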

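Many futures must not move in memory once they have started executing, so they can only be used safely when wrapped in a Pin. That's why async-related functions tend to take a Pin<&mut T> (assuming they don't need to move the value).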
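Below is a std-only sketch of a function written in that style. poll_up_to, CountDown and NoopWaker are hypothetical helpers, not Tokio or std APIs. The no-op waker means we simply poll in a loop instead of waiting to be woken. Because the function only receives a Pin<&mut F>, it can poll the same future repeatedly (reborrowing with Pin::as_mut), but it has no safe way to move it.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; good enough for polling in a loop.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// A hand-written future that is Pending `remaining` times, then Ready.
struct CountDown {
    remaining: u32,
}

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            return Poll::Ready("done");
        }
        self.remaining -= 1; // fine: CountDown is Unpin, so Pin<&mut Self> allows DerefMut
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// Takes the future as Pin<&mut F>: it can poll it many times, but never move it.
fn poll_up_to<F: Future>(mut fut: Pin<&mut F>, max_polls: usize) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    for _ in 0..max_polls {
        // `as_mut` reborrows the Pin so `fut` is still usable on the next iteration.
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return Some(out);
        }
    }
    None
}

fn run(max_polls: usize) -> Option<&'static str> {
    // `async` blocks are !Unpin, so pin this one on the stack before lending it out.
    let fut = pin!(async { CountDown { remaining: 2 }.await });
    poll_up_to(fut, max_polls)
}

fn main() {
    println!("{:?}", run(3)); // enough polls to finish
    println!("{:?}", run(2)); // gives up while still Pending
}
```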

A tiny example

No pinning is needed here:

use std::time::Duration;
use tokio::time::timeout;

async fn with_timeout_once() {
    let f = async { 1u32 };

    let _ = timeout(Duration::from_secs(1), f).await;
}

But suppose we want to call timeout more than once (for example, to retry). Then we have to pass &mut f (otherwise we get use of moved value), and the compiler complains:

use std::time::Duration;
use tokio::time::timeout;

async fn with_timeout_twice() {
    let mut f = async { 1u32 };

    // error[E0277]: .. cannot be unpinned, consider using `Box::pin`.
    //               required for `&mut impl Future<Output = u32>` to implement `Future`
    let _ = timeout(Duration::from_secs(1), &mut f).await;
    
    // An additional retry.
    let _ = timeout(Duration::from_secs(1), &mut f).await;
}

Why?

Because a few layers down, timeout calls Future::poll, which is defined as:

fn poll(self: Pin<&mut Self>, ...) -> ... { ... }

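When we await f directly, we give up ownership of it.

In that case the compiler can handle the pinning for us. It can't do that if we only provide &mut f, because then we could easily break Pin's invariant: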

use std::time::Duration;
use tokio::time::timeout;

async fn with_timeout_twice_with_move() {
    let mut f = async { 1u32 };

    // error[E0277]: .. cannot be unpinned, consider using `Box::pin`.
    let _ = timeout(Duration::from_secs(1), &mut f).await;

    // .. because otherwise, we could move `f` to a new memory location, after it was polled!
    let mut f = *Box::new(f);

    let _ = timeout(Duration::from_secs(1), &mut f).await;
}

This is where we need to put a pin on the future!

use std::time::Duration;
use tokio::pin;
use tokio::time::timeout;

async fn with_timeout_twice() {
    let f = async { 1u32 };

    pin!(f);  // f is now a `Pin<&mut impl Future<Output = u32>>`.

    let _ = timeout(Duration::from_secs(1), &mut f).await;
    let _ = timeout(Duration::from_secs(1), &mut f).await;
}
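Without Tokio, the same pattern can be sketched with std::pin::pin!, which has been in std since Rust 1.68. poll_once is a hypothetical stand-in for timeout. Like timeout, it takes the future by value and pins it internally. That means passing &mut f only works because f is already a Pin<&mut _>, which is Unpin. YieldOnce is a hypothetical future that returns Pending once, so the first "attempt" doesn't finish and the second one does.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical future: Pending on the first poll, Ready on the second.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Hypothetical stand-in for `timeout`: takes the future by value, pins it
// internally, and polls it exactly once.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let fut = pin!(fut);
    fut.poll(&mut cx)
}

fn with_retry() -> (Poll<u32>, Poll<u32>) {
    let f = async {
        YieldOnce { yielded: false }.await;
        1u32
    };
    // `f` is now `Pin<&mut impl Future<Output = u32>>`; the unpinned binding is shadowed.
    let mut f = pin!(f);

    // `&mut Pin<&mut _>` is itself a Future (Pin<&mut _> is Unpin), so we can lend it twice.
    let first = poll_once(&mut f);
    let second = poll_once(&mut f);
    (first, second)
}

fn main() {
    println!("{:?}", with_retry()); // (Pending, Ready(1))
}
```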

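There is one more bit of work to do here. We need to make sure f is no longer accessible after it has been pinned: if we can't see it, we can't move it.

In fact, we can state the no-move rule more precisely: the pointed-to value must not move until it is dropped, regardless of when the Pin itself is dropped.

That's exactly what the pin! macro does. It makes sure the original f is no longer visible to our code, which enforces Pin's invariant.

Tokio's pin! is implemented like this: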

// Move the value to ensure that it is owned
let mut f = f;
// Shadow the original binding so that it can't be directly accessed
// ever again.
#[allow(unused_mut)]
let mut f = unsafe {
    Pin::new_unchecked(&mut f)
};
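The same trick can be packaged as a small declarative macro. my_pin! below is a hypothetical, simplified re-creation of that expansion for illustration, not Tokio's actual macro. The unsafe block is sound only because the shadowing guarantees the original binding can't be named, and therefore can't be moved, before it is dropped at the end of the scope.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical macro using the same shadowing technique as the expansion above.
macro_rules! my_pin {
    ($x:ident) => {
        // Move the value so this scope owns it.
        let mut $x = $x;
        // SAFETY: the owned binding is shadowed right away, so it can never be
        // named (or moved) again before it is dropped at the end of the scope.
        #[allow(unused_mut)]
        let mut $x = unsafe { Pin::new_unchecked(&mut $x) };
    };
}

fn run() -> Poll<u32> {
    let f = async { 1u32 };
    my_pin!(f); // `f` is now `Pin<&mut impl Future<Output = u32>>`

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    f.as_mut().poll(&mut cx)
}

fn main() {
    println!("{:?}", run()); // Ready(1)
}
```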

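The standard library's version of pin! is a bit cooler, but it relies on the same principle: the original value is replaced by the newly created Pin, so it can no longer be accessed or moved.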

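A 📦

So Pin is a pointer (a zero-cost wrapper around another pointer, with the same size as the pointer it wraps) that is kind of like &mut T, but with more rules.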
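Here is a quick check of the "zero-cost" part. Pin<P> is declared #[repr(transparent)] in std, so a pinned pointer has exactly the size of the pointer it wraps. The concrete types below are arbitrary choices for the sketch.

```rust
use std::mem::size_of;
use std::pin::Pin;

// Pin<P> adds rules, not bytes: it has the same size as the pointer P it wraps.
fn pin_adds_no_size() -> bool {
    size_of::<Pin<&mut u64>>() == size_of::<&mut u64>()
        && size_of::<Pin<Box<[u8; 1024]>>>() == size_of::<Box<[u8; 1024]>>()
}

fn main() {
    println!(
        "Pin<&mut u64>: {} bytes, &mut u64: {} bytes",
        size_of::<Pin<&mut u64>>(),
        size_of::<&mut u64>()
    );
    println!("same size: {}", pin_adds_no_size());
}
```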

The next problem is "returning borrowed data".

We can't return the pinned future from before:

use std::future::Future;
use std::time::Duration;
use tokio::pin;
use tokio::time::timeout;

async fn with_timeout_and_return() -> impl Future<Output = ()> {
    let f = async { 1u32 };

    pin!(f);  // f is now a `Pin<&mut impl Future<Output = u32>>`.

    let s = async move {
        let _ = timeout(Duration::from_secs(1), &mut f).await;
    };

    // error[E0515]: cannot return value referencing local variable `f`
    s
}

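It should now be clearer why. The pinned f is now a pointer, and the data it points to (the async block) will no longer exist once we return from the function.

So we can use Box::pin instead: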

-pin!(f);
+let mut f = Box::pin(f);

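But didn't we just say that Pin<&mut T> is a (wrapper) pointer somewhere between &mut T and &T?

Well, a mut Box<T> is also like a &mut T, except that it owns its value.

So a Pin<Box<T>> is a pointer somewhere between a mutable Box<T> and an immutable Box<T>: the value can be modified, but not moved.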
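Here is a std-only sketch of the Box::pin fix. make_future, YieldOnce and NoopWaker are hypothetical names, and the no-op waker stands in for a real executor. The async block now lives on the heap, and the returned Pin<Box<..>> owns it, so nothing borrows a local of make_future. Moving the Pin<Box<..>> around afterwards only moves the box pointer.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical future: Pending on the first poll, Ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// Returning a pinned future works once it is pinned on the heap with Box::pin.
fn make_future() -> Pin<Box<dyn Future<Output = u32>>> {
    let f = async {
        YieldOnce(false).await;
        1u32
    };
    Box::pin(f)
}

fn poll_returned_future() -> (Poll<u32>, Poll<u32>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut f = make_future();
    let first = f.as_mut().poll(&mut cx);

    // Moving the Pin<Box<..>> moves only the pointer; the future stays put on the heap.
    let mut moved = f;
    let second = moved.as_mut().poll(&mut cx);
    (first, second)
}

fn main() {
    println!("{:?}", poll_returned_future()); // (Pending, Ready(1))
}
```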

Unpin

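Unpin is a trait. It is not the "opposite" of Pin: Pin is a pointer type, and a trait can't be the opposite of a pointer.

Unpin is also an auto trait, meaning the compiler implements it automatically whenever it can. It marks types whose values can be moved even after being pinned (for example, because they aren't self-referential).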

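The main point is this: if T: Unpin, we can always use Pin::new and Pin::{into_inner, get_mut} on values of T. That means we can easily convert to and from "regular" mutable values and ignore the complexity of dealing with pinned values directly.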
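Here is a small sketch of those escape hatches, using two arbitrary Unpin types, String and i32. Because both are Unpin, Pin::new is safe, and Pin<&mut String> even derefs mutably. get_mut and into_inner then hand back the plain reference or box.

```rust
use std::pin::Pin;

fn unpin_roundtrip() -> (String, i32) {
    let mut s = String::from("pin");

    // String: Unpin, so pinning it is safe and costs nothing.
    let mut pinned: Pin<&mut String> = Pin::new(&mut s);
    pinned.push_str("ned"); // DerefMut is available because String: Unpin

    // get_mut: back to a plain `&mut String` (only allowed because String: Unpin).
    pinned.as_mut().get_mut().push('!');

    // into_inner: unwrap the Pin entirely and get the original pointer back.
    let plain: &mut String = Pin::into_inner(pinned);
    plain.make_ascii_uppercase();

    // The same works for owning pointers: Pin<Box<i32>> -> Box<i32>.
    let boxed: Pin<Box<i32>> = Pin::new(Box::new(41));
    let mut b: Box<i32> = Pin::into_inner(boxed);
    *b += 1;

    (s, *b)
}

fn main() {
    println!("{:?}", unpin_roundtrip()); // ("PINNED!", 42)
}
```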

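The Unpin trait is an important limit on Pin, since Pin's no-move guarantee only matters for !Unpin types. It is also one of the reasons Box::pin is so useful. When T: !Unpin, "it's not possible to move or replace the insides of a Pin<Box<T>>", so Box::pin (or more precisely, Box::into_pin) can safely call the unsafe Pin::new_unchecked. The resulting pinned Box is always Unpin, because moving it doesn't move the actual value.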
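This sketch checks both halves of that sentence. It uses a hypothetical NotUnpin type that opts out of Unpin via std::marker::PhantomPinned. Box::pin accepts it without any unsafe, and the resulting Pin<Box<NotUnpin>> is itself Unpin. A hypothetical assert_unpin helper checks that at compile time. Finally, moving the Pin<Box<..>> leaves the pinned value at the same heap address.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical type that opts out of Unpin.
struct NotUnpin {
    data: u32,
    _pin: PhantomPinned,
}

// Only compiles for Unpin types: a cheap compile-time check.
fn assert_unpin<T: Unpin>(_: &T) {}

fn pointee_stays_put() -> bool {
    // No unsafe needed: Box::pin can pin even a !Unpin value.
    let pinned: Pin<Box<NotUnpin>> = Box::pin(NotUnpin { data: 7, _pin: PhantomPinned });

    // Pin<Box<NotUnpin>> is Unpin, even though NotUnpin itself is not.
    assert_unpin(&pinned);

    let before: *const NotUnpin = &*pinned;
    let moved = vec![pinned]; // move the Pin<Box<..>> into a Vec
    let after: *const NotUnpin = &*moved[0];

    // Only the box pointer moved; the pinned value is still at the same address.
    before == after && moved[0].data == 7
}

fn main() {
    println!("pointee stayed put: {}", pointee_stays_put());
}
```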

That's quite a mouthful, so let's explain it with an example.

Another tiny example

We can hand-craft a nice Future:

use std::future::Future;
use std::pin::Pin;

fn not_self_ref() -> impl Future<Output = u32> + Unpin {
    struct Trivial {}

    impl Future for Trivial {
        type Output = u32;

        fn poll(self: Pin<&mut Self>, _cx: &mut std::task::Context<'_>) -> std::task::Poll<Self::Output> {
            std::task::Poll::Ready(1)
        }
    }

    Trivial {}
}

Now we can call timeout on it multiple times without pinning:

use std::time::Duration;
use tokio::time::timeout;

async fn not_self_ref_with_timeout() {
    let mut f = not_self_ref();

    let _ = timeout(Duration::from_secs(1), &mut f).await;
    let _ = timeout(Duration::from_secs(1), &mut f).await;
}
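The same idea works without Tokio, as this std-only sketch shows. The Trivial future from above (copied here so the block is self-contained) is polled twice through Pin::new(&mut f), with no pin! or Box::pin, and it can even be moved between polls. NoopWaker and poll_twice_without_pinning are hypothetical helpers standing in for an executor and for timeout.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Same hand-written future as `not_self_ref`: no self-references, so it is
// automatically Unpin.
struct Trivial {}

impl Future for Trivial {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        Poll::Ready(1)
    }
}

fn poll_twice_without_pinning() -> (Poll<u32>, Poll<u32>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut f = Trivial {};
    // Pin::new is safe here because Trivial: Unpin.
    let first = Pin::new(&mut f).poll(&mut cx);

    // Moving an Unpin future between polls is fine.
    let mut f = f;
    let second = Pin::new(&mut f).poll(&mut cx);
    (first, second)
}

fn main() {
    println!("{:?}", poll_twice_without_pinning()); // (Ready(1), Ready(1))
}
```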

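Any Future created with async fn or async {} syntax is !Unpin. That means once we put it in a Pin, we can't take it out again.

Summary

Pin is a wrapper around another pointer, somewhat like &mut T, with the extra rule that moving the value it points to is unsafe until that value is dropped.
To handle self-referential structs safely, we must prevent them from moving once their self-referential fields are set (using Pin).
Pin promises that the value can't move for the rest of its lifetime, so creating one means giving up the ability to obtain a &mut T, which could otherwise be used to break Pin's invariant.
When we await a Future we own, the compiler can handle the pinning, because it knows the Future won't move once ownership has been transferred.
Otherwise, we have to handle the pinning ourselves (for example, with pin! or Box::pin).
Unpin is a marker trait indicating that a type can be moved safely even after it has been wrapped in a Pin, which makes everything simpler.
Most structs are Unpin, but async fn and async {} always produce !Unpin structs.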

Read more: https://ohadravid.github.io/posts/2023-07-put-a-pin-on-that/

From the 日报小组 (Rust Daily team): Koalr


Comments


pinylin 2023-07-19 11:37

Nice 👍
