Similar code written in Go averages only 1 microsecond per event, but with Rust + tokio it averages 3 microseconds. How can I optimize it to match Go?
use std::{thread, time::Duration};
use std::time::SystemTime;
use std::sync::atomic::{AtomicI32, Ordering};
use std::ops::{AddAssign, Div};
use std::sync::Arc;
use tokio::runtime::Builder;
use tokio::sync::mpsc;
use tokio::sync::Mutex;

#[derive(Debug)]
struct Event {
    _type: String,
    time: SystemTime,
}

fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<Event>();
    let rt = Builder::new_multi_thread()
        .worker_threads(8)
        .build()
        .unwrap();
    let count: Arc<AtomicI32> = Arc::new(AtomicI32::new(0));
    let total: Arc<Mutex<Duration>> = Arc::new(Mutex::new(Duration::new(0, 0)));
    let count2 = count.clone();
    let total2 = total.clone();
    // Consumer task: record how long each event spent in the channel.
    rt.spawn(async move {
        while let Some(res) = rx.recv().await {
            let elapsed = res.time.elapsed().unwrap();
            count2.fetch_add(1, Ordering::Relaxed);
            total2.lock().await.add_assign(elapsed);
        }
    });
    // Producer: send 100_000 timestamped events, roughly one per microsecond.
    let max_loop = 100_000;
    for _ in 0..max_loop {
        let event = Event {
            _type: "hello".to_string(),
            time: SystemTime::now(),
        };
        let _ = tx.send(event);
        thread::sleep(Duration::from_micros(1));
    }
    // Let the consumer drain the channel before reading the counters.
    thread::sleep(Duration::from_millis(4000));
    println!("{:?}", count.load(Ordering::Relaxed));
    rt.block_on(async move {
        let total2 = total.lock().await;
        println!(
            "total: {:?}, average: {:?}, {:?}/s",
            total2,
            total2.div(count.load(Ordering::Relaxed) as u32),
            count.load(Ordering::Relaxed) as f64 / total2.as_secs_f64()
        );
    });
}
The Go code, whose average output is 1.351µs:
package main

import (
    "fmt"
    "reflect"
    "sync/atomic"
    "time"
)

var count int32
var total int32

type event struct {
    args    []interface{}
    event   chan struct{}
    handler interface{}
}

func onEvent(sig chan struct{}, t time.Time) {
    d := time.Since(t)
    atomic.AddInt32(&count, 1)
    atomic.AddInt32(&total, int32(d))
    sig <- struct{}{}
}

func main() {
    queue := make(chan event, 200)
    defer close(queue)
    go func(q chan event) {
        // Read events in a loop, dispatching each handler via reflection.
        for ev := range q {
            v := reflect.ValueOf(ev.handler)
            if v.Kind() != reflect.Func {
                panic("not a function")
            }
            vArgs := make([]reflect.Value, len(ev.args))
            for i, arg := range ev.args {
                vArgs[i] = reflect.ValueOf(arg)
            }
            v.Call(vArgs)
        }
    }(queue)
    const COUNT = 100000
    var sig = make(chan struct{}, COUNT)
    for i := 0; i < COUNT; i++ {
        // Publish an event carrying the send timestamp.
        func(args ...interface{}) {
            var ev = event{args: args, handler: onEvent}
            queue <- ev
        }(sig, time.Now())
        time.Sleep(time.Microsecond)
    }
    for i := 0; i < COUNT; i++ {
        <-sig
    }
    fmt.Println("total:", total, "average:", time.Duration(total)/time.Duration(count), count)
}
Test environment: i9-9900K, 32 GB DDR4
Comments (14)
zhuxiujia: tokio's result doesn't mean Rust is weak; the CSP model preempts quite actively and delivers slightly higher compute here. For a fair comparison, try the Rust side with a plain thread model.
go: total: 190360931 average: 1.903µs 100000
rust: total: 19.34ms, average: 193ns, 5170630.816959669/s
Conclusion: Rust performs about the same as Go (193ns is roughly 0.193µs).
Code:
(OP) The conclusion is mis-stated: under the thread model Rust (0.193µs) is about 10× faster than Go (1.903µs).
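The linked code isn't preserved above; what follows is only a minimal sketch of such a thread-model variant, using std::sync::mpsc and a dedicated consumer thread. All names and details are assumed here, not taken from zhuxiujia's actual code.

// A minimal sketch of the thread-model variant (assumed, not the code
// zhuxiujia actually posted): std::sync::mpsc plus one consumer thread.
use std::sync::atomic::{AtomicI32, AtomicU64, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::{Duration, SystemTime};

struct Event {
    _type: String,
    time: SystemTime,
}

fn main() {
    let (tx, rx) = mpsc::channel::<Event>();
    let count = Arc::new(AtomicI32::new(0));
    let total_nanos = Arc::new(AtomicU64::new(0));
    let (count2, total2) = (count.clone(), total_nanos.clone());

    // Consumer runs on a dedicated OS thread; recv() blocks instead of awaiting.
    let consumer = thread::spawn(move || {
        while let Ok(ev) = rx.recv() {
            let elapsed = ev.time.elapsed().unwrap();
            count2.fetch_add(1, Ordering::Relaxed);
            total2.fetch_add(elapsed.as_nanos() as u64, Ordering::Relaxed);
        }
    });

    for _ in 0..100_000 {
        let _ = tx.send(Event {
            _type: "hello".to_string(),
            time: SystemTime::now(),
        });
        thread::sleep(Duration::from_micros(1));
    }
    drop(tx); // close the channel so the consumer's loop ends
    consumer.join().unwrap();

    let n = count.load(Ordering::Relaxed) as u32;
    let total = Duration::from_nanos(total_nanos.load(Ordering::Relaxed));
    println!("total: {:?}, average: {:?}", total, total / n);
}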
--
songzhi: When you spawn, clone the Arc and move it in; it can be written that way without naming extra bindings. For the timing, there's no need at all for total to be a Mutex<Duration>; a single AtomicU64 is enough. And if _type is read-only, an Arc<String> can stand in for the String and avoid a heap allocation on every event.
(OP) Thanks for the pointers.
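Applying those three suggestions to the tokio version gives something like the sketch below. This is one rendering of the advice, not songzhi's posted code: the Arc clones happen inside the spawn expression, total becomes an AtomicU64 of nanoseconds, and the event name is a shared Arc<String>.

use std::sync::atomic::{AtomicI32, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, SystemTime};
use tokio::runtime::Builder;
use tokio::sync::mpsc;

#[derive(Debug)]
struct Event {
    _type: Arc<String>, // shared, read-only: no per-event heap allocation
    time: SystemTime,
}

fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<Event>();
    let rt = Builder::new_multi_thread().worker_threads(8).build().unwrap();
    let count = Arc::new(AtomicI32::new(0));
    let total_nanos = Arc::new(AtomicU64::new(0)); // replaces Mutex<Duration>

    // Clone-and-move inside the spawn call; no count2/total2 bindings needed.
    rt.spawn({
        let (count, total_nanos) = (count.clone(), total_nanos.clone());
        async move {
            while let Some(ev) = rx.recv().await {
                let elapsed = ev.time.elapsed().unwrap();
                count.fetch_add(1, Ordering::Relaxed);
                total_nanos.fetch_add(elapsed.as_nanos() as u64, Ordering::Relaxed);
            }
        }
    });

    let hello = Arc::new("hello".to_string());
    for _ in 0..100_000 {
        let _ = tx.send(Event { _type: hello.clone(), time: SystemTime::now() });
        thread::sleep(Duration::from_micros(1));
    }
    thread::sleep(Duration::from_millis(4000));

    let n = count.load(Ordering::Relaxed) as u32;
    let total = Duration::from_nanos(total_nanos.load(Ordering::Relaxed));
    println!("total: {:?}, average: {:?}", total, total / n);
}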
--
johnmave126: Suggest posting the Go code as well; otherwise there's nothing to compare against.
(OP) Makes sense. The Go code is posted above now.
--
jht5945: Go's chan defaults to a buffer size of 0, while an unbounded channel is effectively infinite, so the comparison isn't fair.
(OP) OK, I'll tidy that up.
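One way to level that particular difference, sketched under the assumption that the rest of the benchmark stays as above: give the tokio side a bounded channel with the same capacity as the Go code's make(chan event, 200), and send with blocking_send from the producer thread. Keeping the counters inside the consumer task is a further simplification of my own that drops the Mutex and atomics entirely.

use std::thread;
use std::time::{Duration, SystemTime};
use tokio::runtime::Builder;
use tokio::sync::mpsc;

struct Event {
    time: SystemTime,
}

fn main() {
    // Match the Go side's make(chan event, 200) with a bounded channel.
    let (tx, mut rx) = mpsc::channel::<Event>(200);
    let rt = Builder::new_multi_thread().worker_threads(8).build().unwrap();

    // The consumer task owns the counters, so no shared state is needed.
    let consumer = rt.spawn(async move {
        let mut total = Duration::ZERO;
        let mut n: u32 = 0;
        while let Some(ev) = rx.recv().await {
            total += ev.time.elapsed().unwrap();
            n += 1;
        }
        (total, n)
    });

    for _ in 0..100_000 {
        // blocking_send waits for buffer space, like a send on a bounded Go
        // channel; it must be called from outside the runtime, as here.
        tx.blocking_send(Event { time: SystemTime::now() }).unwrap();
        thread::sleep(Duration::from_micros(1));
    }
    drop(tx); // close the channel so the consumer loop ends

    let (total, n) = rt.block_on(consumer).unwrap();
    println!("total: {:?}, average: {:?}", total, total / n);
}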
--
songzhi: My results: Debug:
Release:
Are you running in Release mode?
(OP) Yes, Release. My CPU is an i9-9900K.
--
lineCode: Isn't µs microseconds?
(OP) Sorry, I wrote it wrong; it is microseconds.