在流程控制上,rust历来都比较推荐match,功能上十分完备,可以很大程度上替代传统语言的if-else语句,也能很大程度替代其他语言在表驱动的写法。
说是这么说,但实际性能是怎样的却很少有人说明,只看汇编码的话勉强可以看出match语句比if-else语句少掉很多比较,但实际性能怎样还得测一测。
测试的方法则是一个通过流程控制实现的循环计数功能,分别测试纯粹的match匹配,花式match-if匹配,传统if-else流程控制,以及通过HashMap<usize, Box<dyn Fn(&mut usize)>
来实现表驱动。
其中表驱动这种写法很多语言是用switch实现的,rust一般是用match实现的,但也有部分语言方法是hashmap实现的,rust讲实话很少实现以函数为值的容器,其中涉及的所有权问题比较复杂,虽然是函数但还得包动态类型智能指针,不像某些语言直接把函数放进值里就可以了。这次测试也只是用了闭包Fn实现,本来想写更加不卫生的无参数FnMut,但怎么都编译不通过,只能凑合测测看了。
具体在实现的代码如下,假设其为64个分支:
fn match_test(n: usize, mut x: usize) -> usize {
for _ in 0..n {
match x {
1 => x = 2,
2 => x = 3,
...
63 => x = 64,
_ => x = 1
}
}
x
}
fn match_if_test(n: usize, mut x: usize) -> usize {
for _ in 0..n {
match x {
y if y == 1 => x = 2,
y if y == 2 => x = 3,
...
y if y == 63 => x = 64,
_ => x = 1
}
}
x
}
fn if_else_test(n: usize, mut x: usize) -> usize {
for _ in 0..n {
if x == 1 { x = 2 }
else if x == 2 { x = 3 }
...
else if x == 63 { x = 64 }
else { x = 1 }
}
x
}
fn table_driven_hashmap_test(n: usize, mut x: usize) -> usize {
use std::collections::HashMap;
// 本来想写更加不卫生的无参数FnMut,但最终也没把x的所有权问题解决
type Callback = Box<dyn Fn(&mut usize)>;
let m: HashMap<usize, Callback> = HashMap::from([
(1, Box::new(|x: &mut usize| *x = 2) as Callback),
(2, Box::new(|x: &mut usize| *x = 3) as Callback),
...
(63, Box::new(|x: &mut usize| *x = 64) as Callback),
]);
let default = |x: &mut usize| *x = 1;
for _ in 0..n {
if m.get(&x).map(|func| func(&mut x)).is_none() {
// 当 x 不等于 1 到 63 的时候,x = 1;
default(&mut x);
}
}
x
}
fn table_driven_vec_test(n: usize, mut x: usize) -> usize {
type Callback<'a> = &'a dyn Fn(&mut usize);
let v: &[Callback<'_>] = &[
&(|x: &mut usize| *x = 1) as Callback,
&(|x: &mut usize| *x = 2) as Callback,
&(|x: &mut usize| *x = 3) as Callback,
...
&(|x: &mut usize| *x = 64) as Callback,
];
let default = |x: &mut usize| *x = 1;
for _ in 0..n {
if v.get(x).map(|func| func(&mut x)).is_none() {
default(&mut x);
}
}
x
}
通过宏重写了复用的逻辑,分别测试不同的分枝数的时间,循环次数取了n = 10_000_000
,最后得到如下测试结果:
branches: 4
match: 3.3904ms, 3.6673ms, 4.7517ms, 3.3143ms, 3.3142ms, 3.4453ms, 3.612ms, 3.3194ms, 3.3141ms, 3.315ms,
match_if: 3.7305ms, 4.1816ms, 3.9521ms, 3.6463ms, 3.6331ms, 4.2191ms, 4.0577ms, 3.6995ms, 3.7195ms, 4.1883ms,
if_else: 5.4293ms, 3.8502ms, 4.3737ms, 3.9505ms, 3.7267ms, 3.4774ms, 4.1949ms, 3.9994ms, 3.7674ms, 3.6ms,
table_driven_hashmap: 103.9701ms, 122.3524ms, 122.7845ms, 100.2528ms, 106.5108ms, 127.5172ms, 105.7942ms, 129.1624ms, 165.0806ms, 115.4098ms,
table_driven_vec: 16.1917ms, 14.0669ms, 14.0214ms, 14.0425ms, 13.8328ms, 19.6705ms, 21.2281ms, 13.5288ms, 22.032ms, 17.1706ms,
branches: 8
match: 3.6455ms, 3.8793ms, 3.5487ms, 3.7148ms, 3.7006ms, 3.6769ms, 3.6659ms, 3.7404ms, 3.6811ms, 3.6833ms,
match_if: 3.9426ms, 4.4188ms, 3.8393ms, 4.4766ms, 3.8199ms, 3.9531ms, 3.7941ms, 3.8904ms, 3.9169ms, 3.9105ms,
if_else: 4.0554ms, 3.9229ms, 3.7935ms, 4.0178ms, 4.281ms, 3.9466ms, 3.8345ms, 4.0638ms, 4.6341ms, 3.8491ms,
table_driven_hashmap: 113.2449ms, 127.943ms, 128.6095ms, 112.7296ms, 125.1844ms, 133.5039ms, 154.9556ms, 224.6223ms, 191.0714ms, 104.3797ms,
table_driven_vec: 18.8915ms, 16.06ms, 16.6736ms, 15.141ms, 16.5778ms, 15.9272ms, 15.4643ms, 15.5129ms, 15.4069ms, 15.4384ms,
branches: 16
match: 3.82ms, 3.9299ms, 3.6091ms, 4.0194ms, 3.8655ms, 3.5625ms, 3.5489ms, 3.5875ms, 3.6575ms, 3.6409ms,
match_if: 3.7758ms, 3.5685ms, 3.6397ms, 3.7803ms, 3.5683ms, 3.5808ms, 3.6137ms, 4.3534ms, 4.121ms, 3.6389ms,
if_else: 3.6131ms, 4.0447ms, 3.9562ms, 3.5895ms, 3.5671ms, 3.8686ms, 3.6878ms, 3.5646ms, 3.5976ms, 3.6763ms,
table_driven_hashmap: 116.7476ms, 120.816ms, 113.8963ms, 121.6966ms, 102.2712ms, 124.4602ms, 116.9638ms, 128.0126ms, 137.3728ms, 142.8418ms,
table_driven_vec: 19.2862ms, 15.2574ms, 15.4181ms, 14.7007ms, 14.89ms, 28.9602ms, 55.532ms, 28.3054ms, 26.2988ms, 67.9233ms,
branches: 32
match: 3.5513ms, 3.732ms, 3.6125ms, 3.4187ms, 3.3474ms, 3.3586ms, 3.3585ms, 3.4791ms, 4.2535ms, 3.3596ms,
match_if: 3.5743ms, 3.9367ms, 3.6296ms, 3.339ms, 3.3292ms, 3.4128ms, 3.4474ms, 3.5822ms, 3.4925ms, 3.433ms,
if_else: 4.21ms, 3.6252ms, 3.4172ms, 3.4126ms, 3.4433ms, 3.6506ms, 3.4267ms, 3.402ms, 3.4292ms, 3.4252ms,
table_driven_hashmap: 135.3664ms, 103.676ms, 113.1637ms, 120.5619ms, 110.7571ms, 105.7341ms, 119.5289ms, 132.2703ms, 108.6024ms, 117.8609ms,
table_driven_vec: 16.8557ms, 16.1205ms, 15.5395ms, 15.3842ms, 15.4809ms, 15.6577ms, 15.2929ms, 15.3133ms, 15.6808ms, 15.3334ms,
branches: 64
match: 3.8765ms, 4.403ms, 3.2189ms, 3.2614ms, 5.4701ms, 3.6202ms, 3.2225ms, 3.2231ms, 3.5133ms, 3.8491ms,
match_if: 3.6157ms, 3.8887ms, 3.6092ms, 4.6589ms, 4.1004ms, 3.46ms, 4.167ms, 3.5751ms, 3.3842ms, 3.3782ms,
if_else: 3.8066ms, 5.2757ms, 4.8017ms, 3.409ms, 3.4739ms, 3.4244ms, 3.509ms, 3.3838ms, 3.4959ms, 3.3824ms,
table_driven_hashmap: 198.0397ms, 155.8294ms, 138.0204ms, 162.6539ms, 149.2959ms, 153.7422ms, 152.5761ms, 131.911ms, 138.5925ms, 153.5351ms,
table_driven_vec: 16.9102ms, 16.3827ms, 15.8785ms, 16.5236ms, 15.6935ms, 15.5552ms, 15.6869ms, 15.5644ms, 15.4713ms, 15.5346ms,
branches: 128
match: 3.9663ms, 4.8639ms, 3.7769ms, 3.6452ms, 3.6132ms, 4.4149ms, 3.6532ms, 3.7608ms, 3.6438ms, 3.8445ms,
match_if: 6.4039ms, 6.2458ms, 6.296ms, 6.2953ms, 6.2581ms, 6.4576ms, 6.3213ms, 6.3387ms, 6.3825ms, 6.2506ms,
if_else: 7.1545ms, 6.5482ms, 6.7545ms, 6.6666ms, 6.6354ms, 6.8014ms, 6.6641ms, 6.7175ms, 6.7168ms, 7.4704ms,
table_driven_hashmap: 182.5284ms, 183.1441ms, 181.4559ms, 183.8605ms, 184.5005ms, 180.1116ms, 193.7634ms, 188.7075ms, 185.761ms, 196.2352ms,
table_driven_vec: 33.681ms, 30.236ms, 29.9209ms, 50.8282ms, 60.2091ms, 53.2975ms, 28.2092ms, 30.7684ms, 61.158ms, 31.9083ms,
branches: 1024
match: 3.9626ms, 3.9671ms, 3.9628ms, 4.4519ms, 3.9806ms, 3.9601ms, 4.1644ms, 4.4445ms, 3.9706ms, 3.9609ms,
match_if: 15.5266ms, 14.4545ms, 14.6208ms, 15.4482ms, 14.4482ms, 14.5209ms, 14.5363ms, 14.6093ms, 19.3402ms, 18.0519ms,
if_else: 16.0209ms, 15.0817ms, 14.5871ms, 14.6568ms, 15.6924ms, 14.4908ms, 15.4812ms, 14.8693ms, 15.5043ms, 15.7196ms,
table_driven_hashmap: 249.0023ms, 222.2315ms, 222.863ms, 232.9312ms, 253.3306ms, 246.7888ms, 239.6788ms, 242.6305ms, 228.6975ms, 229.5875ms,
table_driven_vec: 87.5138ms, 91.1127ms, 97.3113ms, 104.7676ms, 85.8476ms, 85.7071ms, 100.7908ms, 109.6692ms, 91.2772ms, 90.6162ms,
可以看出,在分支比较少的情况下,即64个分支以内的情况下,通过编译器的优化,match/match-if/if-else在性能上没啥区别,而表驱动走的数值哈希,在不扩容的情况下理论上性能也不会有太大变化,但都比前三者差上15-20倍。
在分支变大到1024的时候,可以明显看出match的性能优势,可以说几乎没有减弱,而match-if和if-else都同时出现了性能下降,比小分支时候的性能差了3-5倍,至于hashmap的表驱动,性能也再下降了一倍。
再往上翻倍分支我这就是栈溢出,所以最多就测到了1024。
当然,在绝大多数情况下,我们都用不到这么多分支,我在其他语言遇到过最大if-else屎山大约也就中几十个分支,就算写了注释也看不懂,最后选择在屎山上再上一坨。
对于rust项目来说,以后遇到可能出现屎山的情况,如果是为了匹配清晰还是优先选match比较好,就像各种教程推荐的那样优先用match。至于hashmap那种表驱动写法,又难写,性能又不行,除非有必须使用的场景(可能是某些路由匹配),在日常分支判断里实在是不太推荐(指的是在某些语言里用表驱动实现分支)。不过话说回来,hashmap更动态,更适合基础设施和框架,可以根据配置文件来动态插入,不像match和if-else只能修改代码写死,使用场景还是略有区别。
加测了数组的表驱动,数组取下表性能远快于hashmap取键,不过还是比match-if/if-else慢上4-6倍,而且随着数组变大性能也会下降(尽管测试方法已经几乎算是顺序读取数据了)。
评论区
写评论确实,这只是编译优化前的代码展开,和最终优化结果不能划等号,但看到表驱动展开后的这堆代码,想优化到match或ifelse级别的性能实在有点玄幻。当然编译器优化本身就很玄幻,做性能测试的时候非常害怕一不小就把整段代码优化没了。
--
👇
苦瓜小仔: 很棒!不过我感觉性能分析看 MIR 有点不对,因为 MIR 是给 LLVM IR 的,它适合分析 Rust 相关的、优化前东西(比如语法脱糖、借用检查、控制流图),不太适合分析性能这种依赖于优化后的东西。
很棒!不过我感觉性能分析看 MIR 有点不对,因为 MIR 是给 LLVM IR 的,它适合分析 Rust 相关的、优化前东西(比如语法脱糖、借用检查、控制流图),不太适合分析性能这种依赖于优化后的东西。
看了MIR,表驱动一来是闭包函数元素存在堆上,调用比match的栈上调用慢,再者就是闭包函数没有inline内联优化,整个函数存在调用过程,函数调用过程本身就比普通作用域调用要复杂得多,还得传入引用进行引用修改,还有就是闭包本身不管是用box指针还是生命周期标记,用了dyn的话本身引用调用也有运行时开销,所以还是能match就match吧。
下面是去掉循环后且分支缩减为3个的MIR展开:
会不会是match存在栈上,所以比较快?
--
👇
lithbitren: 觉得还是有点问题,我本来以为vec表驱动会和match差不多,但结果上match怎么表现这么好
(⊙o⊙)哇,什么场景需要手写两三千个分支,求大佬分享分享开开眼。
如果是栈溢出的话,开线程似乎可以设置stack_size控制子线程栈大小,或者在cargo运行时命令行设置设置RUST_MIN_STACK,不过没试过不太清楚行不行。
或者试试分级match,如果函数能行的话,可以每n个分支打包进一个分支或一个函数,应该能比hashmap快不少吧。
--
👇
gant boam: 我之前有场景是大概两三千个条件,我用match无法编译成功,栈溢出还是啥错误来着,我只能改成hashmap+fn了,我还是希望高性能,不知道有啥方法使用matxh不
我之前有场景是大概两三千个条件,我用match无法编译成功,栈溢出还是啥错误来着,我只能改成hashmap+fn了,我还是希望高性能,不知道有啥方法使用matxh不
觉得还是有点问题,我本来以为vec表驱动会和match差不多,但结果上match怎么表现这么好
俺比较菜,不太爱标生命周期,遇事不决都上box,但标生命周期确实比box更简洁,不过测出来性能似乎差不多。
因为用宏不知道怎么批量生产不同名的外部函数,于是还是继续用闭包写了,其实用普通函数更好写,如果出入参数类型统一的话甚至都不用重命名类型,也不用as类型断言,闭包反而类型难以统一所以每次定义都得类型断言。
之前测的时候也考虑过数组表驱动,不过,讲实话以前真没见过其他语言的数组表驱动,大多数表驱动都是匹配字符串,纯数字匹配比较罕见。但这个测试如果用字符串来匹配的话,匹配开销比分支选择大不少,对测试分支本身的耗时感觉会不太明显。
--
👇
wangbyby: 表驱动的话,用数组代替HashMap尝试下?
表驱动的话,用数组代替HashMap尝试下?