
wfxr posted on 2021-01-21 23:22

Tags: Question

I've run into a problem and would like to ask for advice. I wanted to improve cache read performance in a project, so I replaced the original Mutex with an RwLock, only to find that even with no writes at all, just concurrent reads, performance barely changed. So I wrote the benchmark below, and the results surprised me:

use std::collections::HashMap;
use std::sync::Arc;
use std::sync::RwLock;
use std::thread;
use std::time;

fn main() {
    for i in 1..=8 {
        workload(i);
    }
}

fn workload(concurrency: usize) {
    let total = 1000 * 1000;
    let mut m = HashMap::new();
    for i in 0..total {
        m.insert(i, i);
    }
    let m = Arc::new(RwLock::new(m));

    let now = time::Instant::now();
    let threads: Vec<_> = (0..concurrency)
        .map(|_| {
            let m = m.clone();
            thread::spawn(move || {
                for i in 0..total {
                    let _x = m.read().unwrap().get(&i);
                }
            })
        })
        .collect();

    for t in threads {
        t.join().unwrap();
    }
    let t = now.elapsed();
    println!(
        "threads: {}; time used: {:?}; ips: {}",
        concurrency,
        t,
        (total * concurrency) as f64 / t.as_secs_f64()
    );
}

Running cargo run --release produces:

threads: 1; time used: 77.838377ms; ips: 12847133.23352053
threads: 2; time used: 205.569367ms; ips: 9729076.025223155
threads: 3; time used: 328.003797ms; ips: 9146235.584583797
threads: 4; time used: 415.737089ms; ips: 9621465.358362578
threads: 5; time used: 508.222261ms; ips: 9838215.252834035
threads: 6; time used: 586.550472ms; ips: 10229298.732880399
threads: 7; time used: 720.991697ms; ips: 9708849.67070571
threads: 8; time used: 856.792181ms; ips: 9337153.369750464

Each thread runs the same workload, yet adding threads clearly only hurts.

I translated the code into Go and found the situation to be much better:

threads: 1; time used: 156.012685ms; ips: 6409735.208390
threads: 2; time used: 163.830266ms; ips: 12207756.532606
threads: 3; time used: 189.644867ms; ips: 15819041.387500
threads: 4; time used: 209.123695ms; ips: 19127435.559132
threads: 5; time used: 225.407194ms; ips: 22182078.181586
threads: 6; time used: 261.852325ms; ips: 22913678.539994
threads: 7; time used: 296.061541ms; ips: 23643732.908895
threads: 8; time used: 322.794129ms; ips: 24783598.217178

The scaling is not linear, but compared with my Rust version it is much closer to what I expected.

I suspect my Rust version is at fault. Could anyone point out where the problem is, or what the right way is to get good concurrent read performance out of a HashMap?

EDIT 1

Actually, I don't think the key issue is how long RwLock takes to lock and unlock, but rather why throughput fails to grow, or even drops, as the thread count increases. With read-only threads, if one thread takes 100ms, and reads really run concurrently with enough cores available, then ideally N threads should still take about 100ms, giving N times the single-threaded throughput. Of course in practice efficiency can't scale perfectly with the thread count, since thread switching and synchronization overhead are unavoidable, but single-threaded shouldn't be the fastest configuration either; after all, the whole point of RwLock is to allow concurrent reads. Something still feels off.

EDIT 2

@hr567 suggested an optimization: moving the lock outside the loop does make performance much better, and the results then match expectations. However, for a cache scenario, keeping the lock inside the loop is probably the more faithful simulation (the loop models a stream of incoming requests, each of which touches the cache once, and the read lock cannot be held indefinitely because a write lock is occasionally needed to refresh the cache). So does that mean that in Rust, for a read-heavy, rarely-written cache, Mutex is actually the better choice? (I had written a test parameter wrong earlier; in fact, except in the single-threaded case, RwLock still performs better than Mutex.)

Below are the Mutex and RwLock results on the same machine; apart from swapping the lock type, the code is identical:

Mutex

threads: 1; time used: 78.146596ms; ips: 12796462.689174587
threads: 2; time used: 554.152693ms; ips: 3609113.562496032
threads: 3; time used: 417.027343ms; ips: 7193772.90327939
threads: 4; time used: 717.132682ms; ips: 5577768.38289459
threads: 5; time used: 1.701271272s; ips: 2938978.6815844146
threads: 6; time used: 1.817029184s; ips: 3302093.3581218696
threads: 7; time used: 2.372727488s; ips: 2950191.30321636
threads: 8; time used: 2.505103477s; ips: 3193480.857557406

RwLock

threads: 1; time used: 107.624433ms; ips: 9291570.437355986
threads: 2; time used: 278.304096ms; ips: 7186383.631234806
threads: 3; time used: 406.974556ms; ips: 7371468.205496365
threads: 4; time used: 527.331438ms; ips: 7585362.282155459
threads: 5; time used: 618.426131ms; ips: 8085039.990006503
threads: 6; time used: 767.771963ms; ips: 7814820.401301891
threads: 7; time used: 830.143264ms; ips: 8432279.46736673
threads: 8; time used: 908.431399ms; ips: 8806388.692427836

Comments

Neutron3529 2022-06-13 21:03

Reviving an old thread.

The main problem is actually in RwLock itself.

Just replacing RwLock with ArcSwap makes ips grow roughly linearly with the number of threads:

use std::collections::HashMap;
use std::sync::Arc;
use arc_swap::ArcSwap; // change 1: originally: use std::sync::RwLock;
use std::thread;
use std::time;

fn main() {
    for i in 1..=8 {
        workload(i);
    }
}

fn workload(concurrency: usize) {
    let total = 1000 * 1000;
    let mut m = HashMap::new();
    for i in 0..total {
        m.insert(i, i);
    }
    let m = Arc::new(ArcSwap::from_pointee(m)); // change 2: originally: let m = Arc::new(RwLock::new(m));

    let now = time::Instant::now();
    let threads: Vec<_> = (0..concurrency)
        .map(|_| {
            let m = m.clone();
            thread::spawn(move || {
                for i in 0..total {
                    let _x = m.load().get(&i).unwrap(); // change 3: originally: let _x = m.read().unwrap().get(&i);
                }
            })
        })
        .collect();

    for t in threads {
        t.join().unwrap();
    }
    let t = now.elapsed();
    println!(
        "threads: {}; time used: {:?}; ips: {}",
        concurrency,
        t,
        (total * concurrency) as f64 / t.as_secs_f64()
    );
}
liyongjing 2021-12-08 11:15

Where is the full code?

--
👇
星夜的蓝天: After testing, replacing the standard library hashtable's hasher with ahash and then wrapping it in arcswap should give performance that meets the requirements; test results attached:

...
星夜的蓝天 2021-01-26 11:34

After testing, replacing the standard library hashtable's hasher with ahash and then wrapping it in arcswap should give performance that meets the requirements; test results attached:

// golang, RWLock outside the loop
threads: 1; time used: 54.8532ms; ips: 18230476.982200
threads: 2; time used: 58.6333ms; ips: 34110309.329340
threads: 3; time used: 53.8285ms; ips: 55732558.031526
threads: 4; time used: 55.8402ms; ips: 71632981.257230
threads: 5; time used: 53.8609ms; ips: 92831720.227475
threads: 6; time used: 53.8311ms; ips: 111459732.385183
threads: 7; time used: 53.8293ms; ips: 130040702.739958
threads: 8; time used: 57.8473ms; ips: 138295132.184216

// rust, std HashMap with ahash hasher + arcswap + tokio async concurrency, lock outside the loop
threads: 1; time used: 68.6303ms; ips: 14570823.674091471
threads: 2; time used: 66.8249ms; ips: 29928963.604883805
threads: 3; time used: 66.7402ms; ips: 44950419.687085144
threads: 4; time used: 70.3522ms; ips: 56856786.28386887
threads: 5; time used: 74.6449ms; ips: 66983812.69182489
threads: 6; time used: 94.8476ms; ips: 63259376.093860045
threads: 7; time used: 88.215ms; ips: 79351584.1976988
threads: 8; time used: 85.9039ms; ips: 93127320.

// golang, RWLock inside the loop
threads: 1; time used: 56.9117ms; ips: 17571079.408979
threads: 2; time used: 65.7252ms; ips: 30429728.627680
threads: 3; time used: 70.8108ms; ips: 42366418.681896
threads: 4; time used: 85.7712ms; ips: 46635700.561494
threads: 5; time used: 116.6876ms; ips: 42849454.440746
threads: 6; time used: 128.1855ms; ips: 46807166.177142
threads: 7; time used: 154.5913ms; ips: 45280685.264953
threads: 8; time used: 160.5706ms; ips: 49822321.147209

// rust, std HashMap with ahash hasher + arcswap + tokio async concurrency, lock inside the loop
threads: 1; time used: 99.983ms; ips: 10001700.289049137
threads: 2; time used: 96.9572ms; ips: 20627658.389474943
threads: 3; time used: 114.6763ms; ips: 26160592.90367757
threads: 4; time used: 95.6647ms; ips: 41812706.254240066
threads: 5; time used: 95.9434ms; ips: 52114058.91390132
threads: 6; time used: 110.1279ms; ips: 54482106.71410242
threads: 7; time used: 110.9873ms; ips: 63070279.212126076
threads: 8; time used: 120.2899ms; ips: 66505999.25679546

// rust, std HashMap with ahash hasher + arcswap, plain synchronous version, lock outside the loop
threads: 1; time used: 66.2471ms; ips: 15095000.38492251
threads: 2; time used: 65.6177ms; ips: 30479580.966720868
threads: 3; time used: 65.6019ms; ips: 45730382.80903449
threads: 4; time used: 64.4413ms; ips: 62071994.202475734
threads: 5; time used: 65.5881ms; ips: 76233341.10913414
threads: 6; time used: 65.702ms; ips: 91321420.96131016
threads: 7; time used: 73.1313ms; ips: 95718249.23117736
threads: 8; time used: 70.7968ms; ips: 112999457.60260352

// rust, std HashMap with ahash hasher + arcswap, plain synchronous version, lock inside the loop
threads: 1; time used: 95.5114ms; ips: 10469954.371938847
threads: 2; time used: 110.127ms; ips: 18160850.654244643
threads: 3; time used: 107.8158ms; ips: 27825235.26236414
threads: 4; time used: 119.58ms; ips: 33450409.76751965
threads: 5; time used: 135.3368ms; ips: 36944866.436918855
threads: 6; time used: 93.306ms; ips: 64304546.33142563
threads: 7; time used: 103.8335ms; ips: 67415622.12580718
threads: 8; time used: 108.8027ms; ips: 73527587.09112917

Core code:

use ahash::RandomState;
use std::collections::HashMap;

fn workload(concurrency: usize) {
    // Key change: use ahash's RandomState as the hasher for the std HashMap.
    let mut m: HashMap<i32, i32, RandomState> = HashMap::default();
    // ...the rest is the same as the original benchmark...
}

--
👇
Blues-star: I had written the tokio closure wrong earlier, so the tasks never actually ran and those results were wrong. Here are the results after rewriting with tokio concurrency + hashbrown + ArcSwap:

...
星夜的蓝天 2021-01-26 11:04

I had written the tokio closure wrong earlier, so the tasks never actually ran and those results were wrong. Here are the results after rewriting with tokio concurrency + hashbrown + ArcSwap:

use std::{collections::HashMap, sync::Arc, time};
use arc_swap::ArcSwap;
use dashmap;
use futures::future;
fn main() {
    for i in 1..=8 {
        let now = time::Instant::now();
        /* tokio::task::block_in_place(move||{
            workload(i)
        }).await; */
        tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap().block_on(async {
            workload(i).await;
        });
        let t = now.elapsed();
        println!(
            "threads: {}; time used: {:4?}; ips: {}",
            i,
            t,
            (1000*1000 * i) as f64 / t.as_secs_f64()
        );
    }
}

async fn workload(concurrency: usize) {
    let total: i32 = 1000 * 1000;
    let mut m = hashbrown::HashMap::new();
    for i in 0..total {
        m.insert(i, i);
    }

    let m = Arc::new(ArcSwap::from_pointee(m));
    let threads: Vec<_> = (0..concurrency)
        .map(|_| {
            let m = m.clone();
            tokio::spawn(async move{
                let _x = m.load();
                for i in 0..total {
                    let _ = _x.get(&i).unwrap();
                }
            })
        })
        .collect();
    futures::future::join_all(threads).await;
}
threads: 1; time used: 70.8154ms; ips: 14121222.220025588
threads: 2; time used: 76.0857ms; ips: 26286148.382679004
threads: 3; time used: 66.213ms; ips: 45308323.13896063
threads: 4; time used: 76.6869ms; ips: 52160147.30025598
threads: 5; time used: 73.8615ms; ips: 67694265.61875944
threads: 6; time used: 65.8366ms; ips: 91134718.3785311
threads: 7; time used: 66.782ms; ips: 104818663.71177863
threads: 8; time used: 72.8052ms; ips: 109882261.15717009
liyiheng 2021-01-25 10:16

Even with the lock/unlock moved outside the loop, the Go version is still twice as fast as the Rust version.

--
👇
liyiheng: On my machine, with tokio::sync::RwLock the results are:

...
liyiheng 2021-01-24 20:50

On my machine, with tokio::sync::RwLock the results are:

threads: 1; time used: 196.440747ms; ips: 5090593.551856123
threads: 2; time used: 188.8764ms; ips: 10588935.409611788
threads: 3; time used: 182.38838ms; ips: 16448416.286169108
threads: 4; time used: 176.889511ms; ips: 22612985.79767118
threads: 5; time used: 176.260183ms; ips: 28367155.388690367
threads: 6; time used: 180.039787ms; ips: 33325966.99861681
threads: 7; time used: 179.170575ms; ips: 39068915.194361575
threads: 8; time used: 233.967127ms; ips: 34192837.6972377

Go results:

threads: 1; time used: 95.032154ms; ips: 10522754.224849
threads: 2; time used: 112.108771ms; ips: 17839817.368081
threads: 3; time used: 137.910143ms; ips: 21753294.824732
threads: 4; time used: 204.14053ms; ips: 19594345.130778
threads: 5; time used: 245.662556ms; ips: 20353122.109500
threads: 6; time used: 283.926159ms; ips: 21132255.024096
threads: 7; time used: 324.315567ms; ips: 21583916.136841
threads: 8; time used: 360.712523ms; ips: 22178326.201333
liyiheng 2021-01-24 20:35

What about swapping ArcSwap for tokio::sync::RwLock?

bzeuy 2021-01-23 18:03

I recommend using dashmap.

hashmap
threads: 1;	time used: 106.9972ms;	ips: 9346038.961767225
threads: 2;	time used: 182.9294ms;	ips: 10933179.685714817
threads: 3;	time used: 247.2871ms;	ips: 12131647.789148726
threads: 4;	time used: 311.3242ms;	ips: 12848342.660159409
threads: 5;	time used: 390.6577ms;	ips: 12798928.576091038
threads: 6;	time used: 514.6544ms;	ips: 11658308.954513943
threads: 7;	time used: 668.6168ms;	ips: 10469374.984295938
threads: 8;	time used: 769.5137ms;	ips: 10396176.182438338

dashmap
threads: 1;	time used: 126.606ms;	ips: 7898519.817386222
threads: 2;	time used: 165.3296ms;	ips: 12097047.352682158
threads: 3;	time used: 161.3215ms;	ips: 18596405.31485264
threads: 4;	time used: 167.0802ms;	ips: 23940598.586786464
threads: 5;	time used: 172.6175ms;	ips: 28965776.934551828
threads: 6;	time used: 177.9417ms;	ips: 33718909.05841632
threads: 7;	time used: 215.1384ms;	ips: 32537194.661668953
threads: 8;	time used: 284.6158ms;	ips: 28108067.085523717
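
For reference, a minimal sketch of what the dashmap variant of the workload function might look like (illustrative only, not bzeuy's actual code; it assumes the dashmap crate as a dependency). DashMap shards its entries internally, so reads take no explicit lock:

use dashmap::DashMap;
use std::sync::Arc;
use std::thread;
use std::time;

// Drop-in replacement for the workload() function from the original benchmark.
fn workload(concurrency: usize) {
    let total = 1000 * 1000;
    let m = Arc::new(DashMap::new());
    for i in 0..total {
        m.insert(i, i);
    }

    let now = time::Instant::now();
    let threads: Vec<_> = (0..concurrency)
        .map(|_| {
            let m = m.clone();
            thread::spawn(move || {
                for i in 0..total {
                    // get() returns an Option of a guard tied to one internal shard.
                    let _x = m.get(&i);
                }
            })
        })
        .collect();

    for t in threads {
        t.join().unwrap();
    }
    let t = now.elapsed();
    println!(
        "threads: {}; time used: {:?}; ips: {}",
        concurrency,
        t,
        (total * concurrency) as f64 / t.as_secs_f64()
    );
}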
Author wfxr 2021-01-22 21:39

Right, the scaling with thread count is indeed better now; I thought @Blues-star was asking about the absolute performance gain from switching RwLock to ArcSwap. Earlier I had left out a method call inside the loop, which made the ArcSwap numbers look several times faster across the board; after fixing it the numbers are no longer that extreme.

--
👇
eweca: To be fair, the improvement is actually bigger after your fix. Do the math: 8 threads used to give a bit over 4x single-threaded throughput, and now it gives over 5x.

eweca 2021-01-22 21:00

To be fair, the improvement is actually bigger after your fix. Do the math: 8 threads used to give a bit over 4x single-threaded throughput, and now it gives over 5x.

Author wfxr 2021-01-22 13:24

Sorry, the gain really isn't that large. Looking at the code more carefully, I found that when I first rewrote it with ArcSwap, the loop only called load() on the map and never actually called a map method. The code was:

use arc_swap::ArcSwap;
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;
use std::time;

fn main() {
    for i in 1..=8 {
        workload(i);
    }
}

fn workload(concurrency: usize) {
    let total = 1000 * 1000;
    let mut m = HashMap::new();
    for i in 0..total {
        m.insert(i, i);
    }
    let m = Arc::new(ArcSwap::new(Arc::new(m)));

    let now = time::Instant::now();
    let threads: Vec<_> = (0..concurrency)
        .map(|_| {
            let m = m.clone();
            thread::spawn(move || {
                for i in 0..total {
                    // m.load(); <- previously only load() was written here; the get() call was missing
                    m.load().get(&i).unwrap();
                }
            })
        })
        .collect();

    for t in threads {
        t.join().unwrap();
    }
    let t = now.elapsed();
    println!(
        "threads: {}; time used: {:?}; ips: {}",
        concurrency,
        t,
        (total * concurrency) as f64 / t.as_secs_f64()
    );
}

Output after the fix:

threads: 1; time used: 87.530045ms; ips: 11424648.530684521
threads: 2; time used: 91.95932ms; ips: 21748747.163419653
threads: 3; time used: 86.364278ms; ips: 34736584.030726224
threads: 4; time used: 85.833863ms; ips: 46601654.17464667
threads: 5; time used: 87.025722ms; ips: 57454277.71343282
threads: 6; time used: 90.554898ms; ips: 66258149.83525243
threads: 7; time used: 92.092906ms; ips: 76010197.78874172
threads: 8; time used: 130.952256ms; ips: 61090967.382799424

--
👇
Blues-star: Could you post the code after the arc_swap change? I rewrote it with arc_swap and saw some improvement, but the gain in time used was nowhere near as dramatic.

...

星夜的蓝天 2021-01-22 13:02

Could you post the code after the arc_swap change? I rewrote it with arc_swap and saw some improvement, but the gain in time used was nowhere near as dramatic.

--
👇
wfxr: Thanks for the suggestion. After rewriting with arc-swap, performance improved enormously, and it really does scale clearly with the number of threads:

...

Author wfxr 2021-01-22 11:17

Thanks for the suggestion. After rewriting with arc-swap, performance improved enormously, and it really does scale clearly with the number of threads:

threads: 1; time used: 16.577433ms; ips: 60322970.38992708
threads: 2; time used: 17.707922ms; ips: 112943799.95574862
threads: 3; time used: 18.216453ms; ips: 164686286.62231883
threads: 4; time used: 28.19141ms; ips: 141887191.87866092
threads: 5; time used: 27.45889ms; ips: 182090390.3981552
threads: 6; time used: 27.809926ms; ips: 215750304.40570033
threads: 7; time used: 28.080309ms; ips: 249285006.08736178
threads: 8; time used: 28.297614ms; ips: 282709347.86233217

But I haven't figured out yet how to update the value wrapped by ArcSwap; I'll look into it.
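
A minimal sketch of how an update could look, using arc-swap's load/store/rcu API (illustrative only, not code from this thread):

use arc_swap::ArcSwap;
use std::collections::HashMap;
use std::sync::Arc;

// Writers publish a whole new map; readers keep calling load() without blocking.
fn update_one(cache: &ArcSwap<HashMap<i32, i32>>, key: i32, value: i32) {
    cache.rcu(|old| {
        // Clone-on-write: copy the current map, apply the change, publish the copy.
        // rcu() re-runs the closure if another writer swapped the value in the meantime.
        let mut new = HashMap::clone(old);
        new.insert(key, value);
        new
    });
}

// Wholesale replacement (e.g. a periodic cache rebuild) is a single store().
fn replace_all(cache: &ArcSwap<HashMap<i32, i32>>, fresh: HashMap<i32, i32>) {
    cache.store(Arc::new(fresh));
}

The trade-off is that every write clones and republishes the whole map, which suits a read-heavy cache but gets expensive if writes are frequent.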

--
👇
shaitao: 就是因为有这个问题有人专门写了个包叫arc-swap, 你可以试试

Author wfxr 2021-01-22 10:14

It's actually the same: that's how I tested at first, but the conclusion is still that no matter how many threads, performance is worse than single-threaded. Were you perhaps testing in debug mode?

--
👇
ezlearning: With the total workload held constant, adding threads does reduce the elapsed time:

let total = 1000 * 1000 / concurrency;
shaitao 2021-01-22 09:53

It's exactly because of this problem that someone wrote a dedicated crate called arc-swap; give it a try.

Author wfxr 2021-01-22 09:51

Did you test in debug mode? In debug builds, performance does improve as threads are added, but in release mode, no matter how I test, single-threaded is always the fastest. As soon as the thread count goes above 1 there is a sharp jump where performance drops dramatically, and from 2 to 8 threads performance is roughly flat. Here are results from another machine of mine:

threads: 1; time used: 26.019116ms; ips: 38433281.13068868
threads: 2; time used: 237.295127ms; ips: 8428323.09826573
threads: 3; time used: 384.998812ms; ips: 7792231.836808888
threads: 4; time used: 510.399077ms; ips: 7837004.7679376975
threads: 5; time used: 557.317268ms; ips: 8971550.474190582
threads: 6; time used: 718.010565ms; ips: 8356423.000544567
threads: 7; time used: 812.819901ms; ips: 8611993.864062637
threads: 8; time used: 909.05893ms; ips: 8800309.568489691

In the Go results, single-threaded performance is far below Rust, but with more than one thread it is much better than Rust:

threads: 1; time used: 66.197264ms; ips: 15106364.516817
threads: 2; time used: 86.929004ms; ips: 23007280.746021
threads: 3; time used: 128.166003ms; ips: 23407143.312412
threads: 4; time used: 166.692122ms; ips: 23996334.991764
threads: 5; time used: 214.590085ms; ips: 23300237.753296
threads: 6; time used: 246.510866ms; ips: 24339697.869545
threads: 7; time used: 289.216067ms; ips: 24203357.969044
threads: 8; time used: 329.879776ms; ips: 24251259.343646

(The original post was tested at home on an i9 9900K; this run was done at work on an i7 7700HQ. Both machines run Arch with kernel 5.10.9-arch1-1.)

--
👇
eweca: I tried your code and it actually does improve throughput. But beyond 3 threads there is almost no further gain.

Author wfxr 2021-01-22 09:37

@xin-water,

Thanks for the suggestion; sharded locks are definitely useful. I just don't understand why replacing the exclusive lock with a read-write lock gives no improvement at all.

--
👇
xin-water: Not sure whether this optimization applies to your case, but in some situations you can use sharded locks, something like: Arc::new(vec![rwlock::new(hashmap::new());1024]).

Author wfxr 2021-01-22 09:33

@hr567,

Thanks for your suggestion. Moving the lock outside the loop does improve performance, but the project's use case is a cache where each request accesses it only once, so I can't take the lock once and keep reading under it. In the Go version I also lock and unlock inside the loop; the code looks like this:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	for i := 1; i <= 8; i++ {
		workload(i)
	}
}

func workload(concurrency int) {
	total := 1000 * 1000
	m := make(map[int]int)
	for i := 0; i < total; i++ {
		m[i] = i
	}

	mu := sync.RWMutex{}
	wg := sync.WaitGroup{}

	t := time.Now()
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < total; i++ {
				mu.RLock()
				_ = m[i]
				mu.RUnlock()
			}
		}()
	}
	wg.Wait()

	elapsed := time.Now().Sub(t)
	fmt.Printf("threads: %d; time used: %v; ips: %f\n", concurrency, elapsed, float64(total*concurrency)/elapsed.Seconds())
}

Although the Go version's gains gradually taper off at higher thread counts, you can always find a reasonable thread count that performs far better than a single thread, whereas in my tests the Rust version never beats single-threaded no matter how many threads I use, which is why I'm puzzled.

--
👇
hr567: I can't judge precisely without seeing your Go program, but in Rust (or any other language) repeatedly acquiring and releasing a lock is very expensive work.

In each thread this program spawns, every loop iteration acquires and releases the lock just to read a value from the HashMap, and the burden of doing so grows as threads are added. Simply changing the body of the spawned closure to the following form improves performance dramatically.

...

xin-water 2021-01-22 03:39

Not sure whether this optimization applies to your case, but in some situations you can use sharded locks, something like: Arc::new(vec![rwlock::new(hashmap::new());1024]).
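
A rough sketch of that sharded-lock idea (illustrative only; note the vec! form above would not compile as written, since RwLock is not Clone, so the shards are built with an iterator instead):

use std::collections::HashMap;
use std::sync::{Arc, RwLock};

const SHARDS: usize = 1024;

struct ShardedMap {
    shards: Vec<RwLock<HashMap<usize, usize>>>,
}

impl ShardedMap {
    fn new() -> Arc<Self> {
        // RwLock is not Clone, so create each shard individually.
        let shards = (0..SHARDS).map(|_| RwLock::new(HashMap::new())).collect();
        Arc::new(ShardedMap { shards })
    }

    fn shard(&self, key: usize) -> &RwLock<HashMap<usize, usize>> {
        // A real implementation would hash the key; for integer keys a modulo is enough here.
        &self.shards[key % SHARDS]
    }

    fn insert(&self, key: usize, value: usize) {
        self.shard(key).write().unwrap().insert(key, value);
    }

    fn get(&self, key: usize) -> Option<usize> {
        self.shard(key).read().unwrap().get(&key).copied()
    }
}

Each key maps deterministically to one shard, so readers and writers touching different keys rarely contend on the same lock.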

hr567 2021-01-22 02:12

I can't judge precisely without seeing your Go program, but in Rust (or any other language) repeatedly acquiring and releasing a lock is very expensive work.

In each thread this program spawns, every loop iteration acquires and releases the lock just to read a value from the HashMap, and the burden of doing so grows as threads are added. Simply changing the body of the spawned closure to the following form improves performance dramatically.

thread::spawn(move || {
    let m = m.read().unwrap();
    for i in 0..total {
        let _x = m.get(&i);
    }
    // drop(m);
})

Attached are the results of running the program on my laptop (i7-8750H, 6C12T) before and after the change:

Before the change

 ~/Projects/Default  cargo run --release
   Compiling default v0.1.0 (/home/hr567/Projects/Default)
    Finished release [optimized] target(s) in 0.74s
     Running `target/release/default`
threads: 1; time used: 106.26082ms; ips: 9410806.353649445
threads: 2; time used: 266.311903ms; ips: 7509991.019815587
threads: 3; time used: 372.841298ms; ips: 8046318.946137775
threads: 4; time used: 483.637936ms; ips: 8270649.802789664
threads: 5; time used: 568.541328ms; ips: 8794435.432845086
threads: 6; time used: 680.48677ms; ips: 8817217.710198833
threads: 7; time used: 811.402502ms; ips: 8627037.731268914
threads: 8; time used: 941.246638ms; ips: 8499366.347803093

After the change

 ~/Projects/Default  cargo run --release
   Compiling default v0.1.0 (/home/hr567/Projects/Default)
    Finished release [optimized] target(s) in 0.47s
     Running `target/release/default`
threads: 1; time used: 46.941848ms; ips: 21302953.390330948
threads: 2; time used: 47.270548ms; ips: 42309642.782224566
threads: 3; time used: 49.024636ms; ips: 61193723.09057021
threads: 4; time used: 49.00787ms; ips: 81619543.96304104
threads: 5; time used: 50.798106ms; ips: 98428866.61955467
threads: 6; time used: 52.188531ms; ips: 114967788.61240605
threads: 7; time used: 55.777269ms; ips: 125499152.71039893
threads: 8; time used: 56.610175ms; ips: 141317351.5185212

The modified program's performance roughly matches expectations.

This shows that each thread's workload is dominated by the number of RwLock acquire/release operations it performs. Go, presumably because of how its own GC shapes the way things work, behaves more like the modified code, which is why its performance is also closer to expectations.
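
A quick single-threaded sketch (illustrative, not part of hr567's post) that separates the two costs: locking once for all lookups versus locking around every lookup:

use std::collections::HashMap;
use std::sync::RwLock;
use std::time::Instant;

fn main() {
    let total = 1_000_000;
    let mut m = HashMap::new();
    for i in 0..total {
        m.insert(i, i);
    }
    let locked = RwLock::new(m);

    // Lock once, then do all lookups: mostly measures the HashMap itself.
    let start = Instant::now();
    {
        let guard = locked.read().unwrap();
        for i in 0..total {
            let _ = guard.get(&i);
        }
    }
    println!("lock once, {} lookups: {:?}", total, start.elapsed());

    // Lock and unlock around every lookup: adds one acquire/release per iteration.
    let start = Instant::now();
    for i in 0..total {
        let _ = locked.read().unwrap().get(&i);
    }
    println!("lock per lookup, {} lookups: {:?}", total, start.elapsed());
}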
