I originally planned to benchmark concurrent hash tables, so as a first step I compared how an ordinary mutex performs when wrapped under thread versus under tokio. Under tokio I naturally used the async tokio::sync::Mutex, and it turned out to be far slower than the mutex under plain threads, no matter whether I ran 10 threads/tasks or 1000. So I decided to benchmark async locks first.
In theory the synchronous std::sync::Mutex is not recommended inside the tokio runtime, yet in my tests it performed very well. Still, the synchronous std::sync::Mutex is ultimately not the right choice here, so I went to crates.io and looked up the async locks with the most downloads. The Cargo.toml dependencies are as follows:
[dependencies]
tokio = {version = "1.24.1", features = ["full"]}
futures = "0.3.25"
async-mutex = "1.4.0"
async-lock = "2.6.0"
fast-async-mutex = "0.6.7"
# parking_lot is also needed by the parking-lot test in the code below;
# it was added later (see the comments), and the version here is a guess
parking_lot = "0.12"
I then wrote a test program that performs one million concurrent Vec::push(usize) calls, comparing 10 threads/tasks doing 100000 operations each against 1000 threads/tasks doing 1000 operations each, with 10 repetitions of every configuration. In the results, thread-std-mutex is the baseline using std::sync::Mutex across thread::spawn threads, scope-std-mutex uses std::sync::Mutex under thread::scope, tokio-std-mutex uses std::sync::Mutex inside tokio, tokio-mutex uses tokio's own tokio::sync::Mutex, and the following three rows are the third-party async locks I found. (The parking-lot row, which uses parking_lot::Mutex inside tokio, was added later; see the comments.)
threads: 10, times: 100000, single thread: 4.3275ms
thread-std-mutex: 29.7466ms, 24.0701ms, 23.6063ms, 26.3388ms, 25.6356ms, 25.478ms, 25.2364ms, 26.7211ms, 27.4124ms, 28.9356ms,
scope-std-mutex: 25.9718ms, 26.8308ms, 26.244ms, 24.9638ms, 27.0795ms, 25.9664ms, 28.3934ms, 25.922ms, 24.2884ms, 25.5083ms,
tokio-std-mutex: 28.0744ms, 31.5944ms, 28.6856ms, 29.2385ms, 31.3562ms, 22.8386ms, 29.9602ms, 29.3549ms, 29.098ms, 22.945ms,
tokio-mutex: 191.6492ms, 185.5199ms, 186.864ms, 188.0478ms, 187.5107ms, 187.2095ms, 186.9775ms, 205.9339ms, 193.0365ms, 195.6984ms,
async-mutex: 23.5128ms, 21.5624ms, 23.4476ms, 22.4993ms, 21.4914ms, 21.4246ms, 21.5033ms, 21.495ms, 21.3756ms, 21.1966ms,
async-lock: 21.8512ms, 21.1139ms, 22.1052ms, 21.4546ms, 21.9403ms, 20.9468ms, 21.1317ms, 21.543ms, 21.6153ms, 21.0242ms,
fast-async-mutex: 112.6995ms, 117.5415ms, 142.6832ms, 113.5288ms, 121.1797ms, 113.1455ms, 120.0788ms, 100.8301ms, 120.296ms, 128.6366ms,
parking-lot: 72.285ms, 66.5004ms, 67.1994ms, 70.7451ms, 69.1513ms, 64.9047ms, 68.6076ms, 68.718ms, 72.0517ms, 69.1927ms,
threads: 1000, times: 1000, single thread: 4.4056ms
thread-std-mutex: 60.8151ms, 52.8336ms, 49.8579ms, 48.2702ms, 48.2183ms, 47.5837ms, 47.5442ms, 47.7192ms, 48.9598ms, 48.7215ms,
scope-std-mutex: 46.1217ms, 45.827ms, 45.4512ms, 45.2878ms, 45.8804ms, 45.2085ms, 47.5761ms, 45.4836ms, 44.1086ms, 43.5693ms,
tokio-std-mutex: 31.4038ms, 34.3193ms, 32.1812ms, 26.7186ms, 40.3314ms, 27.3657ms, 35.5608ms, 27.6058ms, 27.4601ms, 35.1832ms,
tokio-mutex: 192.4593ms, 187.2106ms, 190.6008ms, 189.7324ms, 187.8145ms, 188.5094ms, 187.1712ms, 192.2489ms, 213.6596ms, 192.5352ms,
async-mutex: 391.3653ms, 382.5787ms, 373.099ms, 379.5068ms, 385.3628ms, 381.0376ms, 385.1471ms, 382.0066ms, 395.1736ms, 405.4637ms,
async-lock: 400.566ms, 405.8318ms, 391.4916ms, 391.8056ms, 385.566ms, 383.4072ms, 384.6201ms, 379.608ms, 385.2667ms, 394.9906ms,
fast-async-mutex: 105.6818ms, 92.5969ms, 125.9783ms, 121.0262ms, 121.8443ms, 99.8092ms, 98.8561ms, 127.0074ms, 125.2284ms, 93.9906ms,
parking-lot: 73.6634ms, 73.2848ms, 70.5412ms, 70.3002ms, 70.6306ms, 73.9181ms, 70.2321ms, 71.6969ms, 75.0105ms, 71.7061ms,
A single thread doing a million Vec::push operations takes only a bit over 4 ms; the extra time in all the other tests is the cost of thread/task switching and lock contention.
First, the baselines. The traditional thread::spawn and Scope::spawn behave almost identically with both 10 and 1000 threads; scope is marginally better, though not noticeably so, and even with 1000 threads the total cost is smaller than one might expect.
Now for the tokio runtime. The synchronous std::sync::Mutex is 5-6x faster than tokio::sync::Mutex, and the cost of contention does not seem to grow much as the number of tasks increases. std::sync::Mutex is a synchronous lock: it blocks the current thread, and with it every task scheduled on that thread. But as long as nothing long-running and asynchronous happens while the lock is held, the synchronous Mutex also looks like a viable option.
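To make that caveat concrete, here is a minimal sketch of my own (not taken from the benchmark code): keep the std::sync::Mutex critical section purely synchronous and let the guard drop before any .await. Holding a std guard across an .await would block the worker thread, and because the guard is not Send such a future typically will not even compile under tokio::spawn.

use std::sync::{Arc, Mutex};
use std::time::Duration;

// Sketch only: the shared Vec and the 10 tasks mirror the benchmark's shape,
// but the sleep is just a stand-in for "some async work after the lock".
#[tokio::main]
async fn main() {
    let shared = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..10)
        .map(|i| {
            let shared = Arc::clone(&shared);
            tokio::spawn(async move {
                {
                    // Short, purely synchronous critical section.
                    shared.lock().unwrap().push(i);
                } // the guard is dropped here, before awaiting anything
                tokio::time::sleep(Duration::from_millis(1)).await;
            })
        })
        .collect();

    for h in handles {
        h.await.unwrap();
    }
    assert_eq!(shared.lock().unwrap().len(), 10);
}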
As for the third-party crates, async-mutex and async-lock do reasonably well with few tasks, but their performance drops sharply once the task count grows. Yet the scenarios where async tasks really shine are mostly high-concurrency ones, where 1000 concurrent tasks is a fairly light load. fast-async-mutex is middle-of-the-road: its locking is still slower than a synchronous lock on threads, but overall it is a relatively preferable choice.
I also tested the parking_lot mutex that the tokio documentation recommends; it does fine, but it is still a synchronous lock. Task-based concurrency beats threads by a wide margin on IO, yet the async locks themselves all perform rather poorly, and under heavy use they could well become the link that drags performance down.
I skimmed the source of a few of these async mutexes, and they basically all follow the same pattern: atomic operations inside a retry loop. If this is to be optimized, I suspect it would have to happen at a lower level of the task machinery, pulling the blocked tasks out into their own event loop; looping over them together with every other event feels like a sizeable bottleneck. Of course, that is just talk, I certainly could not write such a thing myself.
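To make that "atomics in a retry loop" pattern concrete, here is a deliberately oversimplified sketch of my own, not code from any of these crates. A real implementation parks the waker in a waiter queue and wakes it on unlock; waking ourselves again, as done here, is exactly the kind of busy looping that costs performance under contention:

use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll};

// Oversimplified illustration of the "atomic flag + retry" pattern.
pub struct ToyMutex {
    locked: AtomicBool,
}

impl ToyMutex {
    pub fn new() -> Self {
        ToyMutex { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) -> LockFuture<'_> {
        LockFuture { lock: self }
    }
}

pub struct LockFuture<'a> {
    lock: &'a ToyMutex,
}

impl<'a> Future for LockFuture<'a> {
    type Output = ToyGuard<'a>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Try to flip the flag from unlocked to locked in one CAS.
        if self
            .lock
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            Poll::Ready(ToyGuard { lock: self.lock })
        } else {
            // Lost the race: a real mutex would register cx.waker() in a
            // waiter list and be woken on unlock; here we simply retry.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

pub struct ToyGuard<'a> {
    lock: &'a ToyMutex,
}

impl Drop for ToyGuard<'_> {
    fn drop(&mut self) {
        // Release the flag so the next poll's CAS can succeed.
        self.lock.locked.store(false, Ordering::Release);
    }
}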
I wonder whether better third-party crates exist. Lock-free concurrent hash tables, by comparison, have been optimized to the extreme and outperform std::sync::RwLock<HashMap<K, V>> by who knows how many times.
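For reference, a small sketch of the two shapes being compared; the dashmap crate (a sharded concurrent map, e.g. dashmap = "5") is my own choice of stand-in here and was not part of the benchmark:

use std::collections::HashMap;
use std::sync::RwLock;

fn main() {
    // One coarse RwLock around the whole map: every writer serializes.
    let coarse: RwLock<HashMap<u32, u32>> = RwLock::new(HashMap::new());
    coarse.write().unwrap().insert(1, 10);
    assert_eq!(coarse.read().unwrap().get(&1).copied(), Some(10));

    // A concurrent map shards the keys internally, so writers touching
    // different keys rarely contend with each other.
    let fine = dashmap::DashMap::new();
    fine.insert(1u32, 10u32);
    assert_eq!(fine.get(&1).map(|r| *r), Some(10));
}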
The test code is at the end. There is not much worth annotating; the core statement is either v.lock().unwrap().push(j); or v.lock().await.push(j);
use std::thread;
use std::sync::{Arc, Mutex};
use std::time::Instant;
use futures::future;
use tokio;
use async_mutex;
use async_lock;
use fast_async_mutex;
use parking_lot;

const N_THREADS: usize = 1000;
const N_TIMES: usize = 1000;
const N_LOOP: usize = 10;

#[tokio::main(worker_threads = 10)]
async fn main() {
    print!("threads: {}, times: {}, ", N_THREADS, N_TIMES);

    // Baseline: a single thread pushing all N_THREADS * N_TIMES values.
    print!("single thread: ");
    let start = Instant::now();
    let mut v = Vec::new();
    for i in 0..N_THREADS {
        for j in i * N_TIMES..(i + 1) * N_TIMES {
            v.push(j);
        }
    }
    print!("{:?}\n", start.elapsed());

    // std::sync::Mutex shared across plain thread::spawn threads.
    print!("\nthread-std-mutex: ");
    for _ in 1..=N_LOOP {
        let v = Arc::new(Mutex::new(Vec::new()));
        let start = Instant::now();
        (0..N_THREADS).map(|i| {
            let v = v.clone();
            thread::spawn(move || {
                for j in i * N_TIMES..(i + 1) * N_TIMES {
                    v.lock().unwrap().push(j);
                }
            })
        }).collect::<Vec<_>>()
            .into_iter()
            .for_each(|handle| handle.join().unwrap());
        print!("{:?}, ", start.elapsed());
    }

    // Same workload, but scoped threads can borrow the Mutex without Arc.
    print!("\nscope-std-mutex: ");
    for _ in 1..=N_LOOP {
        let v = Mutex::new(Vec::new());
        let start = Instant::now();
        thread::scope(|s| {
            for i in 0..N_THREADS {
                let v = &v;
                s.spawn(move || {
                    for j in i * N_TIMES..(i + 1) * N_TIMES {
                        v.lock().unwrap().push(j);
                    }
                });
            }
        });
        print!("{:?}, ", start.elapsed());
    }

    // Synchronous std::sync::Mutex inside tokio tasks.
    print!("\ntokio-std-mutex: ");
    for _ in 1..=N_LOOP {
        let v = Arc::new(Mutex::new(Vec::new()));
        let start = tokio::time::Instant::now();
        future::join_all((0..N_THREADS).map(|i| {
            let v = v.clone();
            tokio::spawn(async move {
                for j in i * N_TIMES..(i + 1) * N_TIMES {
                    v.lock().unwrap().push(j);
                }
            })
        })).await;
        print!("{:?}, ", start.elapsed());
    }

    // tokio's own async mutex.
    print!("\ntokio-mutex: ");
    for _ in 1..=N_LOOP {
        let v = Arc::new(tokio::sync::Mutex::new(Vec::new()));
        let start = tokio::time::Instant::now();
        future::join_all((0..N_THREADS).map(|i| {
            let v = v.clone();
            tokio::spawn(async move {
                for j in i * N_TIMES..(i + 1) * N_TIMES {
                    v.lock().await.push(j);
                }
            })
        })).await;
        print!("{:?}, ", start.elapsed());
    }

    // Third-party async locks: async-mutex, async-lock, fast-async-mutex.
    print!("\nasync-mutex: ");
    for _ in 1..=N_LOOP {
        let v = Arc::new(async_mutex::Mutex::new(Vec::new()));
        let start = tokio::time::Instant::now();
        future::join_all((0..N_THREADS).map(|i| {
            let v = v.clone();
            tokio::spawn(async move {
                for j in i * N_TIMES..(i + 1) * N_TIMES {
                    v.lock().await.push(j);
                }
            })
        })).await;
        print!("{:?}, ", start.elapsed());
    }

    print!("\nasync-lock: ");
    for _ in 1..=N_LOOP {
        let v = Arc::new(async_lock::Mutex::new(Vec::new()));
        let start = tokio::time::Instant::now();
        future::join_all((0..N_THREADS).map(|i| {
            let v = v.clone();
            tokio::spawn(async move {
                for j in i * N_TIMES..(i + 1) * N_TIMES {
                    v.lock().await.push(j);
                }
            })
        })).await;
        print!("{:?}, ", start.elapsed());
    }

    print!("\nfast-async-mutex: ");
    for _ in 1..=N_LOOP {
        let v = Arc::new(fast_async_mutex::mutex::Mutex::new(Vec::new()));
        let start = tokio::time::Instant::now();
        future::join_all((0..N_THREADS).map(|i| {
            let v = v.clone();
            tokio::spawn(async move {
                for j in i * N_TIMES..(i + 1) * N_TIMES {
                    v.lock().await.push(j);
                }
            })
        })).await;
        print!("{:?}, ", start.elapsed());
    }

    // Synchronous parking_lot::Mutex inside tokio tasks
    // (its lock() returns the guard directly, no unwrap).
    print!("\nparking-lot: ");
    for _ in 1..=N_LOOP {
        let v = Arc::new(parking_lot::Mutex::new(Vec::new()));
        let start = tokio::time::Instant::now();
        future::join_all((0..N_THREADS).map(|i| {
            let v = v.clone();
            tokio::spawn(async move {
                for j in i * N_TIMES..(i + 1) * N_TIMES {
                    v.lock().push(j);
                }
            })
        })).await;
        print!("{:?}, ", start.elapsed());
    }
}
Comments
In a program I wrote myself, the bottleneck was indeed the tokio mutex lock.
The official tokio documentation advises against using mutex; the performance is not great.
It is not actually that detailed; it just says that tokio::sync::Mutex is internally a mutex as well and that its performance is not great. It also mentions that std::sync::Mutex is the recommended choice where possible, but with a precondition: the system has to be designed so the lock is never held while awaiting something asynchronous. So I still think the ultimate solution is to dig up an async lock with better performance. It has been a long time since I last read the tutorial, though; since then I have mostly only looked at the API docs. Reading the documentation back then did plant a seed: I always had the impression that tokio::sync::Mutex is slow but could never find the source, and only now do I see it was in the docs. I also added a test for the parking_lot::Mutex recommended there; it is said to beat the standard library, but in my measurements it still loses to the std Mutex. As for the docs' suggestion of handling shared data through channels, channels do keep state management more orderly; I may benchmark that some other day (a rough sketch of the idea is appended below).
↳ In reply to 7sDream: "The Tokio Mutex documentation describes in detail why an async Mutex can actually perform worse, when an async Mutex is really needed, and what the better alternatives to a lock are."
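For reference, a hedged sketch of the channel-based alternative mentioned in the reply above (my own illustration, not benchmarked in this post): a single task owns the Vec and everyone else sends values over a tokio mpsc channel, so no lock is needed at all:

use tokio::sync::mpsc;

// Sketch only; the task count, batch size and buffer size are illustrative.
#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<usize>(1024);

    // The owner task is the only place that touches the Vec, so no lock.
    let owner = tokio::spawn(async move {
        let mut v = Vec::new();
        while let Some(j) = rx.recv().await {
            v.push(j);
        }
        v.len()
    });

    // Producers only send; once every sender is dropped the channel closes
    // and the owner task finishes.
    let producers: Vec<_> = (0..10)
        .map(|i| {
            let tx = tx.clone();
            tokio::spawn(async move {
                for j in i * 1000..(i + 1) * 1000 {
                    tx.send(j).await.unwrap();
                }
            })
        })
        .collect();
    drop(tx);

    for p in producers {
        p.await.unwrap();
    }
    assert_eq!(owner.await.unwrap(), 10 * 1000);
}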