Posted by Pure White on 2021-11-07 20:19

Tags: rust, rustc, source code

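The official Rust community recently started an event called the Rustc Reading Club, initiated by Niko, the lead of the compiler team. The details are here: https://rust-lang.github.io/rustc-reading-club/

Unfortunately, the first session on November 4 was so popular that it hit Zoom's 100-participant limit, the host Niko could not get in himself, and the session was cancelled... Let's wait and see what the official team does next; I am still looking forward to the officially organized sessions.

Zhang Handong (张汉东) of the Rust Chinese community quickly followed the official event and organized a rustc source reading activity within the community. The first session was held today (November 7). In it I followed Wu Aoxiang's (吴翱翔) approach: starting from a single error, I studied part of the logic in rustc_resolve, and I figured I would write a blog post to summarize it.

[Small ad] The next session, on the afternoon of November 14, will be led by Liu Yifei (刘翼飞), who will walk everyone through the code related to type inference. If you are interested, download Feishu, register a personal account, and scan the QR code to join.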

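Preparation

Back to the topic. Before reading the rustc source, we need to do some preparation: clone the Rust repository and set up an IDE (that said, the stable release of CLion still does not support remote development, and the EAP is full of bugs...). See the official guide for details (https://rustc-dev-guide.rust-lang.org/getting-started.html). Working through this chapter is enough: https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html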

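Starting from an error

The main subject of this reading is rustc_resolve, which, as the name suggests, handles name resolution. More detail is available here: https://rustc-dev-guide.rust-lang.org/name-resolution.html

Open lib.rs in rustc_resolve and, wow, this file alone is close to 4,000 lines, so reading it straight through is not realistic. Wu Aoxiang suggested an approach, though: start from one of the most common errors, "the name xx is defined multiple times", and follow that trail through the relevant code.

This is a good method. When you do not know where to start, construct a scenario, cut in at one point, and work outward from that point until you have covered the code.

Enough talk. Let's bring out the search tool and look for where this error appears in rustc_resolve.

As luck would have it, it is right in rustc_resolve's lib.rs, so we jump there and find that it is indeed the error we are looking for: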

let msg = format!("the name `{}` is defined multiple times", name);

let mut err = match (old_binding.is_extern_crate(), new_binding.is_extern_crate()) {
    (true, true) => struct_span_err!(self.session, span, E0259, "{}", msg),
    (true, _) | (_, true) => match new_binding.is_import() && old_binding.is_import() {
        true => struct_span_err!(self.session, span, E0254, "{}", msg),
        false => struct_span_err!(self.session, span, E0260, "{}", msg),
    },
    _ => match (old_binding.is_import(), new_binding.is_import()) {
        (false, false) => struct_span_err!(self.session, span, E0428, "{}", msg),
        (true, true) => struct_span_err!(self.session, span, E0252, "{}", msg),
        _ => struct_span_err!(self.session, span, E0255, "{}", msg),
    },
};

The enclosing function happens to be named report_conflict. Perfect!
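To make the error-code table easier to see at a glance, here is a small standalone sketch of the same decision. BindingFacts and conflict_code are hypothetical names invented for this illustration; the real function works on rustc's NameBinding and builds diagnostics rather than returning strings.

```rust
/// Hypothetical stand-in for the two facts `report_conflict` asks of a binding.
/// rustc's real `NameBinding` carries much more than this.
#[derive(Clone, Copy, Debug)]
pub struct BindingFacts {
    pub is_extern_crate: bool,
    pub is_import: bool,
}

/// Mirrors the nested `match` quoted above and returns the error code as text.
pub fn conflict_code(old: BindingFacts, new: BindingFacts) -> &'static str {
    match (old.is_extern_crate, new.is_extern_crate) {
        (true, true) => "E0259",
        (true, _) | (_, true) => {
            if new.is_import && old.is_import {
                "E0254"
            } else {
                "E0260"
            }
        }
        _ => match (old.is_import, new.is_import) {
            (false, false) => "E0428",
            (true, true) => "E0252",
            _ => "E0255",
        },
    }
}
```

For two plain items with the same name (neither an import nor an extern crate) the sketch yields E0428, which is the case this post follows.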

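Next, let's see where this function is called.

Apart from its definition, it is called in two places. One of them is a recursive call inside the function itself, which we can ignore. The other is in build_reduced_graph.rs, so let's go and take a look.

There it is called by the define method, which looks just as expected, so it seems we have found the right place.

This code first converts the def passed in into a NameBinding via the to_name_binding method, so let's see what that involves.

NameBinding is a struct that records a value, type, or module definition. We boldly guess that kind is the kind of binding; ambiguity is unclear for now, so we set it aside, and the same goes for expansion (if you have read the rustc-dev-guide you will roughly know it relates to hygienic macro expansion; we ignore it here too). Then there is span, whose purpose is not obvious either; after clicking through, it seemed related to incremental compilation, so we set it aside as well (it is the location that the error macros above take as their span argument). Finally, vis presumably represents visibility.

Then let's click into ResolverArenas to see what it does: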

/// Nothing really interesting here; it just provides memory for the rest of the crate.
#[derive(Default)]
pub struct ResolverArenas<'a> {
    ...
}

OK, nothing worth our attention here; it only provides memory, so we ignore it.

Let's go back to the define method from above:

impl<'a> Resolver<'a> {
    /// Defines `name` in namespace `ns` of module `parent` to be `def` if it is not yet defined;
    /// otherwise, reports an error.
    crate fn define<T>(&mut self, parent: Module<'a>, ident: Ident, ns: Namespace, def: T)
    where
        T: ToNameBinding<'a>,
    {
        let binding = def.to_name_binding(self.arenas);
        let key = self.new_key(ident, ns);
        if let Err(old_binding) = self.try_define(parent, key, binding) {
            self.report_conflict(parent, ident, ns, old_binding, &binding);
        }
    }
    ...
}

The second statement, let key = self.new_key(ident, ns);, also looks unremarkable: it builds a key from the ident (the identifier) and the namespace it lives in, so the value is presumably the binding above.
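A toy model may help show why the namespace is part of the key. The sketch below is not rustc code: Namespace, Key and Scope are simplified stand-ins made up for this post, and the real key type may carry more fields. It only illustrates that the same identifier can be defined once per namespace.

```rust
use std::collections::HashMap;

/// Simplified stand-in for rustc's namespaces (assumption: three variants suffice here).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Namespace {
    Type,
    Value,
    Macro,
}

/// Hypothetical key: an identifier plus the namespace it is defined in.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct Key {
    pub ident: String,
    pub ns: Namespace,
}

/// Hypothetical per-module table from key to a textual description of the definition.
#[derive(Default)]
pub struct Scope {
    table: HashMap<Key, &'static str>,
}

impl Scope {
    pub fn new() -> Self {
        Self::default()
    }

    /// Defines `ident` in `ns`, or returns the existing definition on a collision.
    pub fn define(
        &mut self,
        ident: &str,
        ns: Namespace,
        def: &'static str,
    ) -> Result<(), &'static str> {
        let key = Key { ident: ident.to_string(), ns };
        if let Some(old) = self.table.get(&key).copied() {
            return Err(old);
        }
        self.table.insert(key, def);
        Ok(())
    }
}
```

Defining Foo once as a type and once as a value succeeds in this model, while a second type named Foo gets the earlier definition back as the error.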

Then try_define is called, and if it returns Err, report_conflict is called. Let's step into try_define (no need to read it closely yet):

// Define the name or return the existing binding if there is a collision.
crate fn try_define(
    &mut self,
    module: Module<'a>,
    key: BindingKey,
    binding: &'a NameBinding<'a>,
) -> Result<(), &'a NameBinding<'a>> {
    let res = binding.res();
    self.check_reserved_macro_name(key.ident, res);
    self.set_binding_parent_module(binding, module);
    self.update_resolution(module, key, |this, resolution| {
        if let Some(old_binding) = resolution.binding {
            if res == Res::Err {
                // Do not override real bindings with `Res::Err`s from error recovery.
                return Ok(());
            }
            match (old_binding.is_glob_import(), binding.is_glob_import()) {
                (true, true) => {
                    if res != old_binding.res() {
                        resolution.binding = Some(this.ambiguity(
                            AmbiguityKind::GlobVsGlob,
                            old_binding,
                            binding,
                        ));
                    } else if !old_binding.vis.is_at_least(binding.vis, &*this) {
                        // We are glob-importing the same item but with greater visibility.
                        resolution.binding = Some(binding);
                    }
                }
                (old_glob @ true, false) | (old_glob @ false, true) => {
                    let (glob_binding, nonglob_binding) =
                        if old_glob { (old_binding, binding) } else { (binding, old_binding) };
                    if glob_binding.res() != nonglob_binding.res()
                        && key.ns == MacroNS
                        && nonglob_binding.expansion != LocalExpnId::ROOT
                    {
                        resolution.binding = Some(this.ambiguity(
                            AmbiguityKind::GlobVsExpanded,
                            nonglob_binding,
                            glob_binding,
                        ));
                    } else {
                        resolution.binding = Some(nonglob_binding);
                    }
                    resolution.shadowed_glob = Some(glob_binding);
                }
                (false, false) => {
                    return Err(old_binding);
                }
            }
        } else {
            resolution.binding = Some(binding);
        }

        Ok(())
    })
}

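It looks long, so let's take it bit by bit.

The first statement, let res = binding.res();, is a little puzzling. What is res? Result? Response? Neither, actually. Click in and keep following it to the end, and it turns out to be short for resolution: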

/// The resolution of a path or export.
///
/// For every path or identifier in Rust, the compiler must determine
/// what the path refers to. This process is called name resolution,
/// and `Res` is the primary result of name resolution.
///
/// For example, everything prefixed with `/* Res */` in this example has
/// an associated `Res`:
///
/// ```
/// fn str_to_string(s: & /* Res */ str) -> /* Res */ String {
///     /* Res */ String::from(/* Res */ s)
/// }
///
/// /* Res */ str_to_string("hello");
/// ```
///
/// The associated `Res`s will be:
///
/// - `str` will resolve to [`Res::PrimTy`];
/// - `String` will resolve to [`Res::Def`], and the `Res` will include the [`DefId`]
///   for `String` as defined in the standard library;
/// - `String::from` will also resolve to [`Res::Def`], with the [`DefId`]
///   pointing to `String::from`;
/// - `s` will resolve to [`Res::Local`];
/// - the call to `str_to_string` will resolve to [`Res::Def`], with the [`DefId`]
///   pointing to the definition of `str_to_string` in the current crate.
//
#[derive(Clone, Copy, PartialEq, Eq, Encodable, Decodable, Hash, Debug)]
#[derive(HashStable_Generic)]
pub enum Res<Id = hir::HirId> {
    ...
}

Right, so this statement gets the resolution of the binding we just initialized. Moving on:

self.check_reserved_macro_name(key.ident, res);
self.set_binding_parent_module(binding, module);

First, check_reserved_macro_name on the first line:

crate fn check_reserved_macro_name(&mut self, ident: Ident, res: Res) {
    // Reserve some names that are not quite covered by the general check
    // performed on `Resolver::builtin_attrs`.
    if ident.name == sym::cfg || ident.name == sym::cfg_attr {
        let macro_kind = self.get_macro(res).map(|ext| ext.macro_kind());
        if macro_kind.is_some() && sub_namespace_match(macro_kind, Some(MacroKind::Attr)) {
            self.session.span_err(
                ident.span,
                &format!("name `{}` is reserved in attribute namespace", ident),
            );
        }
    }
}

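Nothing special here either: it reports an error when an attribute macro tries to use one of the reserved names cfg or cfg_attr. We can set it aside for now.

Now set_binding_parent_module on the second line: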

fn set_binding_parent_module(&mut self, binding: &'a NameBinding<'a>, module: Module<'a>) {
    if let Some(old_module) = self.binding_parent_modules.insert(PtrKey(binding), module) {
        if !ptr::eq(module, old_module) {
            span_bug!(binding.span, "parent module is reset for binding");
        }
    }
}

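Hmm... this seems to record the module that the binding belongs to, and it treats a later change to a different module as a compiler bug. Nothing special, so we skip it too.

Moving on, the next part is the main event. Let's step into update_resolution first.

Here we only care about: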

let resolution = &mut *self.resolution(module, key).borrow_mut();
...

let t = f(self, resolution);

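these two lines, which should hold the main logic.

First, self.resolution is called, so let's step in.

That in turn calls resolutions.

Here we find another piece of new logic, so let's read the comment on the field.

It turns out that a module's resolutions are computed lazily. OK, build_reduced_graph_external is presumably the part that does the computing. We skip it here and treat it as a black box to explore later.

Now, back to the code we were looking at.

In the resolution method, we fetch all the resolutions of the current module and check whether the key exists; if it does not, a new one is created, and that resolution is returned.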
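The get-or-create step can be sketched with a plain HashMap. This is a simplified model under stated assumptions: Module and Resolution here are made-up stand-ins, keys are plain strings, and Rc is used for sharing, whereas rustc presumably allocates from ResolverArenas instead.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical per-name record; `binding` is just a description string here.
#[derive(Default, Debug)]
pub struct Resolution {
    pub binding: Option<&'static str>,
}

/// Hypothetical module holding one shared, mutable record per key.
#[derive(Default)]
pub struct Module {
    resolutions: RefCell<HashMap<String, Rc<RefCell<Resolution>>>>,
}

impl Module {
    /// Returns the record for `key`, creating an empty one on first access.
    pub fn resolution(&self, key: &str) -> Rc<RefCell<Resolution>> {
        let mut map = self.resolutions.borrow_mut();
        let slot = map.entry(key.to_string()).or_default();
        Rc::clone(slot)
    }
}
```

Asking twice for the same key returns the same underlying record, so a binding stored through one handle is visible through the next, which is what lets try_define find the old binding later.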

Back up one level:

let resolution = &mut *self.resolution(module, key).borrow_mut();
...

let t = f(self, resolution);

Once we have the resolution, the closure f that was passed in is called on it. Back in try_define, let's look at the else branch first:

self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
    } else {
        resolution.binding = Some(binding);
    }

    Ok(())
})

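If the binding of the returned resolution is None (this corresponds to a resolution newly created in the resolution method above, one that did not exist before), then the resolution's binding is set to the current binding and Ok is returned. The logic is fairly simple.

Now let's see how rustc handles the case where a binding already exists: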

let res = binding.res();

...

self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        if res == Res::Err {
            // Do not override real bindings with `Res::Err`s from error recovery.
            return Ok(());
        }
        ...

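Here, if the res obtained earlier from the new binding is itself Err, we return right away without overriding the existing binding. Let's look at the comment on Err.

OK, we can ignore this part. Moving on: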

let res = binding.res();
self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
        match (old_binding.is_glob_import(), binding.is_glob_import()) {
            (true, true) => {
                if res != old_binding.res() {
                    resolution.binding = Some(this.ambiguity(
                        AmbiguityKind::GlobVsGlob,
                        old_binding,
                        binding,
                    ));
                } else if !old_binding.vis.is_at_least(binding.vis, &*this) {
                    // We are glob-importing the same item but with greater visibility.
                    resolution.binding = Some(binding);
                }
            }
            ...

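If both the new and the old binding are glob imports, we check whether the current res and the previous res are the same. If they are not, there is an ambiguity, and the resolution's binding is set to an ambiguity binding. If the two res are the same, we then compare visibility: if the new binding is more visible, it simply replaces the old one.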
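Here is a small runnable example of the harmless glob-versus-glob case: the same function reaches a module through two glob imports, so both bindings have the same res and the name stays usable. The module names a, b and user are made up for the illustration.

```rust
mod a {
    pub fn greet() -> &'static str {
        "hello from a"
    }
}

mod b {
    // Re-export the very same function, so `b::greet` and `a::greet` are one item.
    pub use super::a::greet;
}

mod user {
    // Two glob imports that both bring in `greet`, and both point at `a::greet`.
    use super::a::*;
    use super::b::*;

    pub fn call() -> &'static str {
        greet()
    }
}
```

If b defined its own greet instead of re-exporting a's, the two globs would disagree, and the compiler would reject the use of greet inside user as ambiguous.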

At this point you may wonder what a glob_import is. A short digression:

fn import_kind_to_string(import_kind: &ImportKind<'_>) -> String {
    match import_kind {
        ImportKind::Single { source, .. } => source.to_string(),
        ImportKind::Glob { .. } => "*".to_string(),
        ImportKind::ExternCrate { .. } => "<extern crate>".to_string(),
        ImportKind::MacroUse => "#[macro_use]".to_string(),
    }
}

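As the "*" arm shows, a glob import is the use path::* form, so I will not explain further.

Back to the main thread. This part deals with use declarations, so we can skim it and continue: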

let res = binding.res();
self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
        match (old_binding.is_glob_import(), binding.is_glob_import()) {
            ...
            (old_glob @ true, false) | (old_glob @ false, true) => {
                let (glob_binding, nonglob_binding) =
                    if old_glob { (old_binding, binding) } else { (binding, old_binding) };
                if glob_binding.res() != nonglob_binding.res()
                    && key.ns == MacroNS
                    && nonglob_binding.expansion != LocalExpnId::ROOT
                {
                    resolution.binding = Some(this.ambiguity(
                        AmbiguityKind::GlobVsExpanded,
                        nonglob_binding,
                        glob_binding,
                    ));
                } else {
                    resolution.binding = Some(nonglob_binding);
                }
                resolution.shadowed_glob = Some(glob_binding);
            }
            ...

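This part handles the case of one glob import and one non-glob import. Put simply, the rule is that the non-glob one wins, with one exception: in the macro namespace, if the non-glob binding comes from a macro expansion and the two res differ, the result is ambiguous (Rust macros are hygienic), and the binding is set to an ambiguity as above.

This logic involves knowledge about macros, so we skip it as a black box. It is enough to know that a non-glob binding takes priority and shadows the glob one, which also matches our coding experience and ergonomics.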
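The shadowing rule is easy to observe from ordinary code. In this minimal example (module and function names are invented for the illustration), a glob import and an explicit import both offer greet, and the explicit one is the one that gets called.

```rust
mod a {
    pub fn greet() -> &'static str {
        "from a"
    }

    pub fn farewell() -> &'static str {
        "bye from a"
    }
}

mod b {
    pub fn greet() -> &'static str {
        "from b"
    }
}

mod user {
    use super::a::*; // glob import: offers `greet` and `farewell`
    use super::b::greet; // explicit import: shadows the glob's `greet`

    pub fn call() -> &'static str {
        greet()
    }

    pub fn bye() -> &'static str {
        farewell() // still comes from the glob, since nothing shadows it
    }
}
```

No error or ambiguity is reported here: the explicit import of b::greet quietly takes priority, while farewell is still resolved through the glob.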

Good. Finally, the simplest part:

let res = binding.res();
self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
        match (old_binding.is_glob_import(), binding.is_glob_import()) {
            ...
            (false, false) => {
                return Err(old_binding);
            }
            ...

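If neither name was brought in by a glob import, then the same name appears twice in the current namespace (note that what is being resolved here is not local variable names, so duplicates are not allowed). That is an error, which is returned to the caller, our define method, where it is reported: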

/// Defines `name` in namespace `ns` of module `parent` to be `def` if it is not yet defined;
/// otherwise, reports an error.
crate fn define<T>(&mut self, parent: Module<'a>, ident: Ident, ns: Namespace, def: T)
where
    T: ToNameBinding<'a>,
{
    let binding = def.to_name_binding(self.arenas);
    let key = self.new_key(ident, ns);
    if let Err(old_binding) = self.try_define(parent, key, binding) {
        self.report_conflict(parent, ident, ns, old_binding, &binding);
    }
}
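Putting the three match arms together, here is a deliberately reduced model of the decision in try_define. It is a sketch, not rustc's implementation: Binding, Slot and the numeric res are invented stand-ins, and visibility, the macro-expansion exception and Res::Err recovery are all left out.

```rust
/// Hypothetical binding: `res` stands in for `Res`, `is_glob` for `is_glob_import()`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Binding {
    pub res: u32,
    pub is_glob: bool,
}

/// Hypothetical state of one (name, namespace) entry in a module.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Slot {
    Empty,
    Bound(Binding),
    Ambiguous(Binding, Binding),
}

/// Defines `new` in `slot`, or returns the old binding when two non-glob names collide.
pub fn try_define(slot: &mut Slot, new: Binding) -> Result<(), Binding> {
    let current = *slot;
    let old = match current {
        Slot::Empty => {
            *slot = Slot::Bound(new);
            return Ok(());
        }
        Slot::Bound(old) => old,
        // Assumption of this sketch only: an ambiguous slot is left untouched.
        Slot::Ambiguous(..) => return Ok(()),
    };
    match (old.is_glob, new.is_glob) {
        // Two globs: fine if they name the same thing, ambiguous otherwise.
        (true, true) => {
            if old.res != new.res {
                *slot = Slot::Ambiguous(old, new);
            }
        }
        // One glob, one non-glob: the non-glob binding wins.
        (true, false) => *slot = Slot::Bound(new),
        (false, true) => {}
        // Two non-glob definitions: the caller reports "defined multiple times".
        (false, false) => return Err(old),
    }
    Ok(())
}
```

Only the last arm produces an Err, which matches what the walkthrough found: the "defined multiple times" error is reserved for two non-glob definitions of the same key.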

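Summary

That's it: we have now read through the logic behind the "the name xx is defined multiple times" error mentioned at the start.

We have still left a few open questions that you can dig into further:

  1. What happens after a binding is marked as an ambiguity?
  2. How are a module's resolutions computed? In other words, what does the build_reduced_graph_external we skipped actually do?
  3. Why do conflicts caused by macro expansion need special treatment?

Follow these questions to keep exploring. Comments are welcome, and you can also join the Rust Chinese community to discuss and learn Rust together~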


Ext Link: https://www.purewhite.io/2021/11/07/rustc-resolve-reading-defined-multiple-times/
