`Hand-Implementing PartialEq, Eq, Hash, PartialOrd and Ord in Rust`

yujinliang 发表于 2020-06-15 10:39

Tags：翻译, 整理， data compare, Trait Eq, Trait Ord

`Hand-Implementing PartialEq, Eq, Hash, PartialOrd and Ord in Rust`

Introduction

这是一篇简短的指南，指导你实现诸如相等性、哈希、排序等Rust Trait, 通常你会采用auto-derive， Rust编译器自动帮我们impl 某个Trait, 如下：

#[derive(PartialEq, Eq, Hash, PartialOrd, Ord)]
pub MyStruct {
    ...
}

但是有时，我们需要可以自己实现，而非auto-derive, 也许自己实现的效率更高，或者是实现自己认为的相等性和比较关系。

本文我以一个Rust struct为例：

#[derive(Debug, Default, Clone)]
pub struct FileInfo {
    pub path: PathBuf,
    pub contents: String,
    pub is_valid_utf8: bool,
}

这个结构体描述了一个文本文件的路径、内容，以及内容是否为utf8编码。file path具有唯一性，好比是实体关系中的一个主键，它是struct FileInfo中的决定性成员属性，通常我们认为两个file path相等，则其对应的struct FileInfo也是相等的！

`Equality: the PartialEq and Eq traits`

如果我们想比较某个类型的两值x and y是否相等，或者不等，如：x == y and x != y，那么我们必须为类型实现PartialEq Trait。

impl PartialEq for FileInfo {
    fn eq(&self, other: &Self) -> bool {
        self.path == other.path // 注意此处比较的是path,  FileInfo的一个决定性成员属性。
    }
}

此处也可以为struct FileInfo实现Eq Trait , 通常PartialEq and Eq我们一并实现。

注意Eq Trait的定义是空的，没有方法：

trait Eq: PartialEq<Self> {}

然而这并不代表它毫无用处，它是一种标记性的Trait, 使类型可用作hashmaps中的键。

impl Eq for FileInfo {} //手动实现一个空impl块。

不过还是#[derive(Eq)] 更容易和简洁。

一个类型什么时候不能实现Eq, 非常稀少。Eq是一个等价关系，因此必须满足一定的特性：

传递性： if x == y and y == z then x == z
对称性： if x == y then y == x
自反性： x == x is always true

一个类型中的所有值都必须满足上述特性时才可以实现Eq。(一个类型代表一个值域)

这些特性适用于大多数数据类型。主要的例外（也是Rust标准库中唯一的例外）是浮点数值，其中NaN的自反性属性不成立，因为IEEE浮点标准要求NaN不等于自身（或任何其他数字）。也可以说不可以(不能/不允许)为浮点类型实现Eq Trait 。

TL;DR If you implement PartialEq then #[derive(Eq)] as well unless you can’t

（如果你实现了PartialEq , 那么请你也尽可能一并实现Eq, 除非不允许（不能）！）

Hashing（哈希散列）

散列值与相等的概念密切相关，因此如果实现自己的PartialEq Trait，还应该实现Hash Trait 。

The following must hold: if x == y then hash(x) == hash(y)

一个原则： if x == y then hash(x) == hash(y)

如果你的类型不满足上述原则，那么它就不适合作为一个HashMap and HashSet的key。

也就是说，根据PartialEq ,如果两个值是相等的，那么他们的哈希值也必然是相等的，但是反之不成立（不一定成立），因为存在哈希冲突，特别是当一个待哈希的值域远比哈希值自身值域大得多的情况下，举一个简单的例子：一个struct 有两个u64的成员，那么将存在u64::MAX * u64::MAX个可能的值实例，所以不可能将所有struct实例都哈希到一个u64值域中，必然存在冲突，也可以说不能保证生成唯一的哈希值。

将对FileInfo求哈希值委托给其成员path:

impl Hash for FileInfo {
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        self.path.hash(hasher); //注意此处，hash a FileInfo就是hash其path。
    }
}

这种委托技术适用于所有类型，因为标准库中的所有基本类型都实现了PartialEq和Hash。

`Ordering（排序）: the PartialOrd and Ord traits（全序和偏序）`

使用运算符<、<=、>=和>可以计算值的相对顺序，为此必须为自定义类型实现PartialOrd。

Before you can implement PartialOrd you must implement PartialEq first.

特别注意：在你为自定义类型实现PartialOrd Tait之前，你必须首先为其实现PartialEq Trait。

一个例子

impl PartialOrd for FileInfo {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        self.path.partial_cmp(&other.path)
    }
}

Ordering 是一个简单的枚举类型，有3个可能的值：

pub enum Ordering {
    Less,
    Equal,
    Greater,
}

你也许会很好奇，为什么 partial_cmp 这个方法返回值类型是个Option, 而非直接是一个Ordering值类型？！

这仍然与浮点数类型有关，因为NaN不是一个可以表示的数值，诸如：3.0 < NaN这样的表达式毫无意义！对于这种情况，partial_cmp就会返回None 因此浮点数是Rust标准库中发生此结果的唯一特例。

partial_cmp返回一个Option导致一个结果，当结果为Node时，无法决定两个值的排序，即x 和y 会处于不确定排序。实际上，这意味着只实现PartialOrd Trait还不足以使你的自定义类型可排序，你还需要实现Ord Trait。

To enable your values to be sorted, you must implement **Ord**

若要你的自定义类型可排序，你必须为它实现Ord Trait 。

Before you can implement **Ord**, you must first implement **PartialOrd**, **Eq** and **PartialEq**

在你实现Ord Trait 之前，你首先必须实现PartialOrd, Eq, PartialEq Trait。

impl Ord for FileInfo {
    fn cmp(&self, other: &Self) -> Ordering {
        self.path.cmp(&other.path) //将排序委托给其成员path。
    }
}

有了它，即上面的Ord实现，就可以排序一个Vec<FileInfo>了，换句话说，这个Vec<FileInfo>能排序了！

全序定义

　　在集合A中，如果对于任意a∈A, b∈A, 有aRb或bRa，即A中的每对元素都满足关系R，则集合A上的偏序R是全序的或线性序的。

【我的理解】：

实现PartialOrd偏序只是说明你的自定义类型可以比较大小，但是不保证集合中每一对元素都可以明确分出大小尊卑长幼，哈哈！

但是实现Ord全序则说明你的自定义类型，在其值域集合中的每一对元素都可以明确比较出大小尊卑长幼！即：x < y 或 y > x或x=y三者之一必然成立！

再通俗地将： PartialOrd偏序代表可比较， Ord全序代表可排序！！！先要可比较，方可能排序！！！

自定义类型的多成员参与比较

你可能好奇如何同时比较自定义类型的多个成员。为此如何实现上面提及的多个Trait?(这有点类似于实体关系领域中的复合主键)，下面的模式（代码例子）正是解决之道：

impl PartialEq for ExtenededFileInfo {
    fn eq(&self, other: &Self) -> bool {
        // Equal if all key members are equal
        self.path == other.path &&
        self.attributes == other.attributes
    }
}

impl Hash for FileInfo {
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        // Ensure we hash all key members.
        self.path.hash(hasher);
        self.attributes.hash(hasher);
    }
}

排序是一个繁琐的事情，你首先比较第一个成员，若是不相等，则一切结束，若是相等，则你必须接着比较下一个成员，以此类推，直至完成。剩下的留给读者自己练习吧。

`不同类型的比较`

我有意简单化了上面的讨论，只是讨论了相同类型值之间的比较，比如两个FileInfo值之间的比较。但实际并非如此简单，上面提及的Trait，基本都持有一个Rhs参数类型，又称右值 ，这样定义非常灵活，它允许你把FileInfo和Path作比较，也允许你把一个复数和f64作比较。但是Ord Trait例外，它要求比较的两者必须类型相同（即self 和 Rhs 必须是相同类型）。

上面的例子没有讨论Rhs, 默认其与Self类型相同，下面是Rust标准库中PartialEq的定义：

pub trait PartialEq<Rhs: ?Sized = Self> {
    fn eq(&self, other: &Rhs) -> bool;
    fn ne(&self, other: &Rhs) -> bool { !self.eq(other) }
}

如果想要不同类型间实现比较，比如FileInfo and &str之间的相等性比较，你可以参考下面的代码：

impl PartialEq<&str> for FileInfo { //泛型特化
    fn eq(&self, other: &&str) -> bool {
        match self.path.to_str() {
            Some(s) => s == *other,
            None => false
        }
    }
}

注意，eq中的other参数总是被定义为对某个东西的不变引用，因此当我们为&str 实现它时，我们最终会得到一个双引用，然后我们必须先解一次引用才能进行s==*other比较。

`关于效率`

你也许好奇，到底是自己手动实现的效率高，还是auto-derive效率高？这很难说，不能一概而论，比如：如果你明确定知道一个大struct中只有少数几个成员与此比较直接相关，或者说有决定性，那么你自己手动实现的版本很可能优于auto-derive版本，因为auto-derive版本通常会依次比较所有成员，很可能做了无用功。换一个角度说，尽管auto-derive版本可能会依次检查比较每一个struct成员，但是因为它可以采用布尔表达式的短路原则，也许检查第一个成员就停止了，因此也很有可能快于自定义实现版。但是Hash是一个例外，它不允许短路原则，必须所有成员依次都要哈希一次才可以，不能偷懒，但如果你能够只哈希1或2个简单成员而不是大量字符串成员，那么您将很容易击败auto-derive默认实现。

auto-derive有另外两个技巧。首先，它为Trait中的所有方法生成自定义实现，包括那些具有默认实现的方法。例如，对于PartialOrd，它不仅生成partial cmp，还生成lt、le、ge和gt。其次，它在所有方法中都添加了#[inline]。

你可以使用`cargo expand`工具将`#[derive()]`生成的代码打印到`stdout`，在下面的代码中，你可以自己检验自定义实现和`#[derive()]`生成代码的差异。

完整代码：

use std::path::PathBuf;
use std::hash::{Hash, Hasher};
use std::collections::HashMap;
use std::cmp::Ordering;

#[derive(Debug, Default, Clone, Eq)]
pub struct FileInfo {
    pub path: PathBuf,
    pub contents: String,
    pub is_valid_utf8: bool,
}

impl FileInfo {
    fn new<P: Into<PathBuf>>(path: P) -> Self {
        Self {
            path: path.into(),
            ..Default::default()
        }
    }
}

impl PartialEq for FileInfo {
    #[inline]
    fn eq(&self, other: &Self) -> bool {
        self.path == other.path
    }
}

impl Hash for FileInfo {
    #[inline]
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        self.path.hash(hasher);
    }
}

impl PartialOrd for FileInfo {
    #[inline]
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        self.path.partial_cmp(&other.path)
    }
}

impl Ord for FileInfo {
    #[inline]
    fn cmp(&self, other: &Self) -> Ordering {
        self.path.cmp(&other.path)
    }
}

impl PartialEq<&str> for FileInfo {
    #[inline]
    fn eq(&self, other: &&str) -> bool {
        match self.path.to_str() {
            Some(s) => s == *other,
            None => false
        }
    }
}

fn main() {
    // Demonstrate the various traits. Try commenting out the `impl`
    // blocks to see the various compilation errors you get when they
    // are not implemented.

    let f1 = FileInfo::new("/temp/foo");
    let f2 = FileInfo::new("/temp/bar");

    // ------------------------------------------------------------------------------
    // Demonstrate PartialEq. It gives us `==` and `!=`.
    if f1 == f2 {
        println!("f1 and f2 are equal");
    } else {
        println!("f1 and f2 are NOT equal");
    }

    if f1 != f2 {
        println!("f1 and f2 are NOT equal");
    } else {
        println!("f1 and f2 are equal");
    }

    // ------------------------------------------------------------------------------
    // Demonstrate Hash. Note that the HashMap takes ownership of its keys -
    // they are moved into the HashMap.
    let mut hm = HashMap::new();
    hm.insert(f1, 200);
    hm.insert(f2, 500);
    // To avoid complicating this discussion, make a new FileInfo to perform a lookup.
    // In real-life, you would implement the Borrow trait to avoid the temporary variable.
    let f_lookup = FileInfo::new("/temp/foo");
    let file_size = hm[&f_lookup];
    println!("f1 has a size of {} bytes", file_size);

    // ------------------------------------------------------------------------------
    // Demonstrate PartialOrd. It gives us `<`, `<=`, `>=` and `>`.

    // Makes some new f's because the others went into the HashMap.
    let f1 = FileInfo::new("/temp/foo");
    let f2 = FileInfo::new("/temp/bar");

    if f1 < f2 {
        println!("f1 is less than f2");
    } else {
        println!("f1 is not less than f2");
    }

    if f1 > f2 {
        println!("f1 is greater than f2");
    } else {
        println!("f1 is not greater than f2");
    }

    // ------------------------------------------------------------------------------
    // Demonstrate Ord. It unlocks sorting functionality.
    let mut v = vec![f1, f2];
    v.sort();
    println!("v after sorting = {:#?}", v);

    // ------------------------------------------------------------------------------
    // Demonstrate cross-type equality testing.
    let f1 = FileInfo::new("/temp/foo");
    if f1 == "/temp/foo" {
        println!("The path in f1 is equal to the str value \"/temp/foo\"");
    } else {
        println!("Nope, comparisons to strings are not working as they should be.");
    }
}

原文链接:

https://www.philipdaniels.com/blog/2019/rust-equality-and-ordering/

译者

学习随笔，水平粗陋，如有谬误，望请海涵雅正，谢谢。

作者：心尘了

email: 285779289@qq.com

git：https://github.com/yujinliang/rust_learn/tree/master/data_cmp

Reference:

https://www.cnblogs.com/hibernate6/archive/2012/01/17/2521942.html

https://doc.rust-lang.org/std/cmp/trait.Ord.html

https://doc.rust-lang.org/std/cmp/trait.Eq.html

https://doc.rust-lang.org/std/cmp/index.html

评论区

写评论

Mike Tang 2020-06-15 10:42

666

1 共 1 条评论, 1 页