求大佬给做一下性能分析

dreamtale90 发表于 2024-07-25 17:50

Tags：Rust, C++

本人是刚从C++转过来的newbie，今天在LeetCode刷了道题，分别用C++和Rust以相同思路实现，得到的结果根本不是一个数量级的，求大佬帮我分析一下原因，不胜感激。

平均数据如下:

语言	耗时(ms)	内存(MB)
C++	300	177
Rust	35	4.2

题目链接：2766. 重新放置石块

C++实现

class Solution {
public:
    vector<int> relocateMarbles(vector<int>& nums, vector<int>& moveFrom, vector<int>& moveTo) {
        std::unordered_set<int> unique_set;
        for (auto &item : nums) {
            unique_set.insert(item);
        }

        for (size_t i = 0; i < moveFrom.size(); i++) {
            unique_set.erase(moveFrom[i]);
            unique_set.insert(moveTo[i]);
        }

        vector<int> result;
        for (auto &item : unique_set)
            result.push_back(item);

        sort(result.begin(), result.end());
        return result;
    }
};

Rust实现

use std::collections::HashSet;

impl Solution {
    pub fn relocate_marbles(nums: Vec<i32>, move_from: Vec<i32>, move_to: Vec<i32>) -> Vec<i32> {
        let mut unique_set:HashSet<i32> = nums.into_iter().collect();

        for i in 0..move_from.len() {
            unique_set.remove(&move_from[i]);
            unique_set.insert(move_to[i]);
        }

        let mut result = Vec::new();
        for iter in unique_set {
            result.push(iter);
        }

        result.sort();
        result
    }
}

评论区

写评论

QuenKar 2024-08-02 19:14

leecode 的问题，Java在里面都比c++快

lithbitren 2024-07-30 07:18

我也碰到过debug比release编译更久的，导致我很长时间都不用debug编译。后来想想，有可能是release模式在词法分析的时候把一些unused/deadcode和无输出的代码块给直接优化掉了，但debug模式则会全量解析编译。我不懂啊，我猜的。

--
👇
dreamtale90: rust这个优化有点魔性，我自己写那个测试，输入也是10w数据，编译时间O2要两分多，O3才十七八秒，但程序执行时间差不多。。。

作者 dreamtale90 2024-07-29 09:32

rust这个优化有点魔性，我自己写那个测试，输入也是10w数据，编译时间O2要两分多，O3才十七八秒，但程序执行时间差不多。。。

--
👇
lithbitren: 歪个楼，之前看那个一个啥文章，说放弃三年rust游戏开发，好像说实践里o1用得比较多，综测性能和o3差不多，但编译时间大大缩短。

--
👇
TinusgragLin: > leetcode测试不一定开o3吧，我不太懂啊。

看了一下，完全不开优化的话，我这边 test-rust 会慢一点，o1,o2 的结果和 o3 差不多。

作者 dreamtale90 2024-07-29 09:29

嗯，后来我自己也在本地测了，应该就是LeetCode环境/优化的事。

--
👇
TinusgragLin: 我简单地做了一下测试，用代码随机产生三个长度为 10 万的数串（输入到文件input0, input1, input2），作为输入：

use rand::{thread_rng, Rng};
use std::{fs::File, io::{Result as IoResult, Write}};

fn main() -> IoResult<()> {
    let mut rng = thread_rng();
    let mut f = File::create("input0").unwrap();
    for _ in 0..100_000 {
        let x: u32 = rng.gen_range(1..=1_000_000);
        write!(f, "{x},\n")?;
    }
    Ok(())
}

C++ 测试程序：

#include <algorithm>
#include <unordered_set>
#include <vector>
#include <iostream>

#define SIZE 100000

static int NUMS[SIZE] = {
#include "input0"
};

static int MOVE_FROM[SIZE] = {
#include "input1"
};

static int MOVE_TO[SIZE] = {
#include "input2"
};

std::vector<int> relocateMarbles(std::vector<int> &nums,
                                 std::vector<int> &moveFrom,
                                 std::vector<int> &moveTo) {
    std::unordered_set<int> unique_set;
    for (auto &item : nums) {
        unique_set.insert(item);
    }

    for (size_t i = 0; i < moveFrom.size(); i++) {
        unique_set.erase(moveFrom[i]);
        unique_set.insert(moveTo[i]);
    }

    std::vector<int> result;
    for (auto &item : unique_set)
        result.push_back(item);

    std::sort(result.begin(), result.end());
    return result;
}

int main() {
    std::vector<int> nums(NUMS, NUMS + SIZE);
    std::vector<int> move_from(MOVE_FROM, MOVE_FROM + SIZE);
    std::vector<int> move_to(MOVE_TO, MOVE_TO + SIZE);
    auto result = relocateMarbles(nums, move_from, move_to);
    std::cout << "L: " << result.size() << std::endl;
}

Rust 测试程序：

use std::collections::HashSet;

const SIZE: usize = 100_000;

static NUMS: [i32; SIZE] = [
    // 复制 input0
];
static MOVE_FROM: [i32; SIZE] = [
    // 复制 input1
];
static MOVE_TO: [i32; SIZE] = [
    // 复制 input2
];

pub fn relocate_marbles(nums: Vec<i32>, move_from: Vec<i32>, move_to: Vec<i32>) -> Vec<i32> {
    let mut unique_set: HashSet<i32> = nums.into_iter().collect();

    for i in 0..move_from.len() {
        unique_set.remove(&move_from[i]);
        unique_set.insert(move_to[i]);
    }

    let mut result = Vec::new();
    for iter in unique_set {
        result.push(iter);
    }

    result.sort();
    result
}

fn main() {
    let nums = Vec::from(NUMS);
    let move_from = Vec::from(MOVE_FROM);
    let move_to = Vec::from(MOVE_TO);
    let result = relocate_marbles(nums, move_from, move_to);
    println!("L: {}", result.len());
}

makefile：

test-cpp: test.cpp
	clang++ -O3 -flto -fuse-ld=lld -o test-cpp test.cpp
test-rs: test.rs
	rustc -C opt_level=3 -C lto -C strip=debuginfo -o test-rs test.rs
test-cpp-dbg: test.cpp
	clang++ -g -O3 -flto -fuse-ld=lld -o test-cpp-dbg test.cpp
test-rs-dbg: test.rs
	rustc -g -C opt_level=3 -C lto -o test-rs-dbg test.rs

bench: test-cpp test-rs
	/usr/bin/time -f "Maximum RSS: %M (KB)" ./test-rs
	/usr/bin/time -f "Maximum RSS: %M (KB)" ./test-cpp
	hyperfine './test-cpp' './test-rs'
flamegraph: test-cpp-dbg test-rs-dbg
	flamegraph -o test-cpp.svg -- ./test-cpp-dbg
	flamegraph -o test-rs.svg -- ./test-rs-dbg

运行 make bench 的结果：

clang++ -O3 -flto -fuse-ld=lld -o test-cpp test.cpp
rustc -C opt_level=3 -C lto -C strip=debuginfo -o test-rs test.rs
/usr/bin/time -f "Maximum RSS: %M (KB)" ./test-rs
L: 168512
Maximum RSS: 6200 (KB)
/usr/bin/time -f "Maximum RSS: %M (KB)" ./test-cpp
L: 168512
Maximum RSS: 13836 (KB)
hyperfine './test-cpp' './test-rs'
Benchmark 1: ./test-cpp
  Time (mean ± σ):      80.0 ms ±   6.8 ms    [User: 68.2 ms, System: 11.2 ms]
  Range (min … max):    64.6 ms …  96.0 ms    39 runs

Benchmark 2: ./test-rs
  Time (mean ± σ):      27.6 ms ±   6.2 ms    [User: 22.4 ms, System: 5.0 ms]
  Range (min … max):    18.5 ms …  44.9 ms    97 runs

Summary
  ./test-rs ran
    2.90 ± 0.70 times faster than ./test-cpp

lithbitren 2024-07-27 22:13

歪个楼，之前看那个一个啥文章，说放弃三年rust游戏开发，好像说实践里o1用得比较多，综测性能和o3差不多，但编译时间大大缩短。

--
👇
TinusgragLin: > leetcode测试不一定开o3吧，我不太懂啊。

看了一下，完全不开优化的话，我这边 test-rust 会慢一点，o1,o2 的结果和 o3 差不多。

TinusgragLin 2024-07-26 18:08

我把 cpp 这边 std::unordered_set 换成 abseil::flat_hash_set 速度就上来了，我估计再换个 sort 实现两边就相差无几了。

TinusgragLin 2024-07-26 18:05

leetcode测试不一定开o3吧，我不太懂啊。

看了一下，完全不开优化的话，我这边 test-rust 会慢一点，o1,o2 的结果和 o3 差不多。

lithbitren 2024-07-26 17:56

leetcode测试不一定开o3吧，我不太懂啊。

TinusgragLin 2024-07-26 17:34

我简单地做了一下测试，用代码随机产生三个长度为 10 万的数串（输入到文件input0, input1, input2），作为输入：

use rand::{thread_rng, Rng};
use std::{fs::File, io::{Result as IoResult, Write}};

fn main() -> IoResult<()> {
    let mut rng = thread_rng();
    let mut f = File::create("input0").unwrap();
    for _ in 0..100_000 {
        let x: u32 = rng.gen_range(1..=1_000_000);
        write!(f, "{x},\n")?;
    }
    Ok(())
}

C++ 测试程序：

#include <algorithm>
#include <unordered_set>
#include <vector>
#include <iostream>

#define SIZE 100000

static int NUMS[SIZE] = {
#include "input0"
};

static int MOVE_FROM[SIZE] = {
#include "input1"
};

static int MOVE_TO[SIZE] = {
#include "input2"
};

std::vector<int> relocateMarbles(std::vector<int> &nums,
                                 std::vector<int> &moveFrom,
                                 std::vector<int> &moveTo) {
    std::unordered_set<int> unique_set;
    for (auto &item : nums) {
        unique_set.insert(item);
    }

    for (size_t i = 0; i < moveFrom.size(); i++) {
        unique_set.erase(moveFrom[i]);
        unique_set.insert(moveTo[i]);
    }

    std::vector<int> result;
    for (auto &item : unique_set)
        result.push_back(item);

    std::sort(result.begin(), result.end());
    return result;
}

int main() {
    std::vector<int> nums(NUMS, NUMS + SIZE);
    std::vector<int> move_from(MOVE_FROM, MOVE_FROM + SIZE);
    std::vector<int> move_to(MOVE_TO, MOVE_TO + SIZE);
    auto result = relocateMarbles(nums, move_from, move_to);
    std::cout << "L: " << result.size() << std::endl;
}

Rust 测试程序：

use std::collections::HashSet;

const SIZE: usize = 100_000;

static NUMS: [i32; SIZE] = [
    // 复制 input0
];
static MOVE_FROM: [i32; SIZE] = [
    // 复制 input1
];
static MOVE_TO: [i32; SIZE] = [
    // 复制 input2
];

pub fn relocate_marbles(nums: Vec<i32>, move_from: Vec<i32>, move_to: Vec<i32>) -> Vec<i32> {
    let mut unique_set: HashSet<i32> = nums.into_iter().collect();

    for i in 0..move_from.len() {
        unique_set.remove(&move_from[i]);
        unique_set.insert(move_to[i]);
    }

    let mut result = Vec::new();
    for iter in unique_set {
        result.push(iter);
    }

    result.sort();
    result
}

fn main() {
    let nums = Vec::from(NUMS);
    let move_from = Vec::from(MOVE_FROM);
    let move_to = Vec::from(MOVE_TO);
    let result = relocate_marbles(nums, move_from, move_to);
    println!("L: {}", result.len());
}

makefile：

test-cpp: test.cpp
	clang++ -O3 -flto -fuse-ld=lld -o test-cpp test.cpp
test-rs: test.rs
	rustc -C opt_level=3 -C lto -C strip=debuginfo -o test-rs test.rs
test-cpp-dbg: test.cpp
	clang++ -g -O3 -flto -fuse-ld=lld -o test-cpp-dbg test.cpp
test-rs-dbg: test.rs
	rustc -g -C opt_level=3 -C lto -o test-rs-dbg test.rs

bench: test-cpp test-rs
	/usr/bin/time -f "Maximum RSS: %M (KB)" ./test-rs
	/usr/bin/time -f "Maximum RSS: %M (KB)" ./test-cpp
	hyperfine './test-cpp' './test-rs'
flamegraph: test-cpp-dbg test-rs-dbg
	flamegraph -o test-cpp.svg -- ./test-cpp-dbg
	flamegraph -o test-rs.svg -- ./test-rs-dbg

运行 make bench 的结果：

clang++ -O3 -flto -fuse-ld=lld -o test-cpp test.cpp
rustc -C opt_level=3 -C lto -C strip=debuginfo -o test-rs test.rs
/usr/bin/time -f "Maximum RSS: %M (KB)" ./test-rs
L: 168512
Maximum RSS: 6200 (KB)
/usr/bin/time -f "Maximum RSS: %M (KB)" ./test-cpp
L: 168512
Maximum RSS: 13836 (KB)
hyperfine './test-cpp' './test-rs'
Benchmark 1: ./test-cpp
  Time (mean ± σ):      80.0 ms ±   6.8 ms    [User: 68.2 ms, System: 11.2 ms]
  Range (min … max):    64.6 ms …  96.0 ms    39 runs

Benchmark 2: ./test-rs
  Time (mean ± σ):      27.6 ms ±   6.2 ms    [User: 22.4 ms, System: 5.0 ms]
  Range (min … max):    18.5 ms …  44.9 ms    97 runs

Summary
  ./test-rs ran
    2.90 ± 0.70 times faster than ./test-cpp

lithbitren 2024-07-26 16:53

java在leetcode测试的时候可能是已经启动预热好再加载脚本的，而且每个用户提交的脚本执行应该没有系统级的隔离，jit加速有时候比cpp快不出奇。最慢的应该是python，前几年刷题的时候，遇到过一道hard题，相同的最优算法只有python过不了，其他其他js,ts,go,rs,cpp,java都能轻松过，后来投诉了，然后官方改了py的判定时间才能ac。

liangyongrui 2024-07-26 14:26

leetcode 的问题，你用java写可能得到的数据更快

作者 dreamtale90 2024-07-26 09:56

哦，好的。

--
👇
WANG-lp: 很有可能是sort算法不一样导致性能不一样，rust的sort对某些场景有特殊优化

作者 dreamtale90 2024-07-26 09:55

这还真不知道，看来想搞清楚还得本地测。

--
👇
facefaceless: 不用纠结这个，力扣对c++有负优化，甚至会出现比java慢的情况

作者 dreamtale90 2024-07-26 09:52

对于C++肯定是有历史包袱问题的

--
👇
Dangerise: 也许是Rust有后发优势，标准库算法普遍优于STL？

作者 dreamtale90 2024-07-26 09:51

嗯好

--
👇
lithbitren: 一点编码风格小别扭，rust里既然vec转hashset用了iter写进一行，反过来hashset转vec也应该iter啊

作者 dreamtale90 2024-07-26 09:47

嗯，有时间自己本地测一测。

--
👇
wangbyby: 不太清楚leetcode机器资源情况，可能得自己本地验证一下。

1.准备相同测试数据分别喂给cpp，rust程序 2.cpp编译器用clang开O2/O3，rust用release 3.多跑几轮取均值

facefaceless 2024-07-26 08:31

不用纠结这个，力扣对c++有负优化，甚至会出现比java慢的情况

lithbitren 2024-07-26 00:59

一点编码风格小别扭，rust里既然vec转hashset用了iter写进一行，反过来hashset转vec也应该iter啊

WANG-lp 2024-07-25 21:40

很有可能是sort算法不一样导致性能不一样，rust的sort对某些场景有特殊优化

wangbyby 2024-07-25 20:15

不太清楚leetcode机器资源情况，可能得自己本地验证一下。

1.准备相同测试数据分别喂给cpp，rust程序 2.cpp编译器用clang开O2/O3，rust用release 3.多跑几轮取均值

1 2 共 21 条评论, 2 页