feat(impact): Rust module-path resolution for qualified `use`/path references

Resolve a Rust qualified path `A::B::C` by mapping the module prefix `A::B` to a
file (crate/self/super-anchored; each segment -> `<dir>/seg.rs` or
`<dir>/seg/mod.rs`) and finding the leaf `C` there, instead of name-matching the
leaf. `emitRustUseBindingRefs` now emits the full path so this can fire.

Fixes common-name re-export collisions: `pub use self::read::read` now links the
correct `fs/read.rs` rather than a stray same-named `read` (ripgrep 81%->87%).
Returns null for non-module prefixes (`Widget::new`, enum variants) so those fall
through to method/name resolution. New collision test; full suite green.
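
The prefix-to-file mapping described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: an in-memory `Set` of relative paths stands in for the indexed file system, and the names `selfModuleDir`, `crateRootDir`, `moduleFile`, `resolveQualified`, and `demoFiles` are hypothetical.

```typescript
// Hypothetical sketch: a Set of relative paths stands in for the indexed files.
type Files = ReadonlySet<string>;

/** Directory parts under which a module file declares its submodules. */
function selfModuleDir(fromFile: string): string[] {
  const parts = fromFile.split('/');
  const base = parts.pop()!;
  // mod.rs / lib.rs / main.rs own their directory; `foo.rs` owns `foo/`.
  if (base === 'mod.rs' || base === 'lib.rs' || base === 'main.rs') return parts;
  return [...parts, base.replace(/\.rs$/, '')];
}

/** Nearest ancestor directory holding lib.rs or main.rs, or null. */
function crateRootDir(fromFile: string, files: Files): string[] | null {
  const dir = fromFile.split('/').slice(0, -1);
  for (;;) {
    const at = (name: string): string => [...dir, name].join('/');
    if (files.has(at('lib.rs')) || files.has(at('main.rs'))) return [...dir];
    if (dir.length === 0) return null;
    dir.pop();
  }
}

/** Map a module prefix such as ['crate', 'fs', 'read'] to a file, or null. */
function moduleFile(modSegs: string[], fromFile: string, files: Files): string | null {
  let dir: string[] | null;
  let rest: string[];
  if (modSegs[0] === 'self') {
    dir = selfModuleDir(fromFile);
    rest = modSegs.slice(1);
  } else if (modSegs[0] === 'super') {
    let supers = 0;
    while (modSegs[supers] === 'super') supers++;
    // Each `super` climbs one directory above the current module's own dir.
    dir = selfModuleDir(fromFile).slice(0, -supers);
    rest = modSegs.slice(supers);
  } else {
    // `crate::...`, or a bare path tried crate-relative as a heuristic.
    dir = crateRootDir(fromFile, files);
    rest = modSegs[0] === 'crate' ? modSegs.slice(1) : modSegs;
  }
  if (!dir) return null;

  let target: string | null = null;
  for (const seg of rest) {
    const asFile = [...dir, seg + '.rs'].join('/');
    const asMod = [...dir, seg, 'mod.rs'].join('/');
    if (files.has(asFile)) target = asFile;
    else if (files.has(asMod)) target = asMod;
    else return null; // not a module path (e.g. `Widget::new`)
    dir = [...dir, seg];
  }
  return target;
}

/** Split `A::B::C` into module prefix and leaf, then locate the prefix's file. */
function resolveQualified(
  pathText: string,
  fromFile: string,
  files: Files
): { file: string; leaf: string } | null {
  const segs = pathText.split('::').filter((s) => s.length > 0);
  if (segs.length < 2) return null;
  const file = moduleFile(segs.slice(0, -1), fromFile, files);
  return file ? { file, leaf: segs[segs.length - 1]! } : null;
}

const demoFiles: Files = new Set([
  'src/lib.rs',
  'src/fs/mod.rs',
  'src/fs/read.rs',
  'src/read.rs',
]);

// `pub use self::read::read` inside src/fs/mod.rs picks fs/read.rs, not src/read.rs.
console.log(resolveQualified('self::read::read', 'src/fs/mod.rs', demoFiles));
// `Widget` is not a module, so the caller falls back to method/name resolution.
console.log(resolveQualified('Widget::new', 'src/fs/mod.rs', demoFiles));
```

The sketch stops at locating the file; looking up the leaf symbol inside that file, and the fallback to name-matching when this returns null, are left to the caller.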

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry 2 weeks ago
parent
commit
2ac7df58f7
4 files changed with 175 additions and 26 deletions
  1. CHANGELOG.md (+1 -1)
  2. __tests__/extraction.test.ts (+19 -0)
  3. src/extraction/tree-sitter.ts (+30 -25)
  4. src/resolution/import-resolver.ts (+125 -0)

+ 1 - 1
CHANGELOG.md

@@ -18,7 +18,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - `codegraph affected` now reports the tests and files that actually depend on your changes. It used to follow only `import` statements — but those never cross file boundaries in CodeGraph's graph — so it returned **no affected tests for any change, in every language**. It now traces the real cross-file usage graph (calls, references, instantiations, and class `extends` / `implements`), so `git diff --name-only | codegraph affected` surfaces the test files that exercise the changed code. Circular-dependency detection, which had the same blind spot, now works too.
 - Blast radius, callers, and `codegraph affected` now recognize far more of the dependencies that were already in your code. A symbol now counts as a dependency whether it's called, used only in a type annotation inside a function body (`const items: Foo[] = []`), imported and placed in a registry array or passed as an argument, used as a JSX component, simply re-exported from a barrel (`export { X } from './x'`), or pulled in as a namespace (`import * as ns from '@/x'`) — including through tsconfig path aliases like `@/`. Previously only called, instantiated, or signature-typed symbols created a cross-file link, so a file that used a dependency in any other way could look like it depended on nothing — and the file that defined a widely-used symbol could look like nothing depended on it. The graph still indexes exactly the same symbols; it just connects the ones that were already there. (TypeScript/JavaScript)
 - The same completeness fix now applies to **Python**: a name brought in with `from module import X` is recorded as a dependency on that module even when `X` is only stored in a list/dict, passed as an argument, used as a decorator, or re-exported through an `__init__.py`. Previously Python linked only imports that were called or instantiated, so a module consumed purely by value — or only re-exported — looked like nothing depended on it.
-- Rust impact and `codegraph affected` now connect far more of the module graph. Struct literals (`Widget { n: 1 }`) are recorded as instantiations; a `use` / `pub use` brings its item into the dependency graph — so a `pub use` re-export hub (a `mod.rs` re-exporting its submodules) depends on the modules it re-exports; and trait dispatch reaches implementations — a struct whose methods cover a trait's is treated as implementing it, and a call through `&dyn Trait` resolves to the concrete method. Previously a Rust type linked only when called or used in a type position, so structs built by literal, modules surfaced only through `pub use`, and trait-only implementations looked like they had no dependents. (#584 for Rust traits)
+- Rust impact and `codegraph affected` now connect far more of the module graph. Struct literals (`Widget { n: 1 }`) are recorded as instantiations; a `use` / `pub use` brings its item into the dependency graph — so a `pub use` re-export hub (a `mod.rs` re-exporting its submodules) depends on the modules it re-exports — resolved by Rust module path (`crate::`/`self::`/`super::`), so a re-export of a common name like `read` links to the right module instead of a same-named symbol elsewhere; and trait dispatch reaches implementations — a struct whose methods cover a trait's is treated as implementing it, and a call through `&dyn Trait` resolves to the concrete method. Previously a Rust type linked only when called or used in a type position, so structs built by literal, modules surfaced only through `pub use`, and trait-only implementations looked like they had no dependents. (#584 for Rust traits)
 - C# `record` types are now indexed. `record`, `record class`, and `record struct` declarations (everywhere in modern C# — DTOs, value objects, CQRS messages, MediatR notifications) were previously skipped entirely, so every reference, generic type argument (`IEnumerable<MyRecord>`), and `new MyRecord(...)` pointed at nothing and the file defining a record looked like it had no callers or dependents. (#237)
 - Go interfaces now connect to their implementations. Go has no `implements` keyword — a type satisfies an interface just by having the right methods — so CodeGraph now infers that link: a struct whose methods cover an interface's method set is treated as implementing it, and a call through the interface (`API.Marshal(...)`) reaches every concrete implementation. This means a type used only via an interface (the common plugin/strategy pattern — e.g. JSON-codec or renderer implementations selected at runtime) is no longer reported as having no callers or no dependents, and impact now flows from an interface method to the implementations behind it. (#584)
 - Go now records cross-package struct creation. A composite literal like `render.XML{...}` or `pkga.Widget{...}` — including ones registered in a package-level `var registry = map[string]R{...}` — now links to the package that defines the type. Cross-package function calls and type references already resolved; this closes struct instantiation, so a package whose types are only *constructed* elsewhere (a common pattern for interface implementations) is no longer reported as having no dependents. Go type conversions such as `(*Wrapped)(x)` now link to the converted-to type as well.

+ 19 - 0
__tests__/extraction.test.ts

@@ -4839,4 +4839,23 @@ describe('Rust cross-module recall', () => {
       cg.destroy();
     } finally { cleanupTempDir(dir); }
   });
+
+  it('resolves a qualified path to the correct module when the leaf name collides', async () => {
+    const dir = rustProject({
+      'lib.rs': 'pub mod fast;\npub mod slow;\npub mod hub;\n',
+      'fast.rs': 'pub fn read() -> i32 { 1 }\n',
+      'slow.rs': 'pub fn read() -> i32 { 2 }\n',
+      // `read` exists in BOTH fast.rs and slow.rs, so module-path resolution must
+      // link this re-export to fast.rs specifically rather than name-match either file.
+      'hub.rs': 'pub use crate::fast::read;\n',
+    });
+    try {
+      const cg = CodeGraph.initSync(dir, { config: { include: ['src/**/*.rs'], exclude: [] } });
+      await cg.indexAll();
+      cg.resolveReferences();
+      expect(cg.getFileDependents('src/fast.rs')).toContain('src/hub.rs');
+      expect(cg.getFileDependents('src/slow.rs')).not.toContain('src/hub.rs');
+      cg.destroy();
+    } finally { cleanupTempDir(dir); }
+  });
 });

+ 30 - 25
src/extraction/tree-sitter.ts

@@ -1888,61 +1888,66 @@ export class TreeSitterExtractor {
   }
 
   /**
-   * Emit one `imports` reference per leaf binding of a Rust `use` declaration —
+   * Emit one `imports` reference per binding of a Rust `use` declaration —
    * `use crate::m::Item`, `use crate::m::{A, B as C}`, `pub use self::sub::Item`.
-   * The leaf name (the defining symbol, not the local alias) is resolved by the
-   * name-matcher to its definition, so a `pub use` re-export hub (a `mod.rs`
-   * re-exporting submodule items) depends on the modules it re-exports, and a
-   * `use`d item that's only stored/passed (not called/typed) still links.
-   * `use ...::*` and bare `self`/`super`/`crate` segments have no leaf to link.
+   * Emits the FULL path (e.g. `self::sub::Item`, not just `Item`) so the resolver
+   * can resolve the module prefix to a file and find the leaf symbol there —
+   * disambiguating common-name re-exports (`pub use self::read::read`, where the
+   * leaf `read` collides with many same-named symbols). Falls back to name-match
+   * on the leaf when the path can't be resolved. `use ...::*` has no leaf binding.
    */
   private emitRustUseBindingRefs(node: SyntaxNode, fromNodeId: string): void {
-    const leaves: SyntaxNode[] = [];
-    const collect = (n: SyntaxNode): void => {
+    const paths: { text: string; node: SyntaxNode }[] = [];
+    const join = (prefix: string, seg: string): string => (prefix ? `${prefix}::${seg}` : seg);
+    const collect = (n: SyntaxNode, prefix: string): void => {
       switch (n.type) {
         case 'identifier':
-          leaves.push(n);
+          paths.push({ text: join(prefix, getNodeText(n, this.source)), node: n });
           break;
         case 'scoped_identifier': {
-          // `a::b::C` → the leaf is the final `name` segment.
-          const name = getChildByField(n, 'name') ?? n.namedChild(n.namedChildCount - 1);
-          if (name && name.type === 'identifier') leaves.push(name);
+          // Full scoped path (`a::b::C`); combine with any outer group prefix.
+          const full = getNodeText(n, this.source).trim();
+          paths.push({ text: prefix ? `${prefix}::${full}` : full, node: n });
           break;
         }
         case 'scoped_use_list': {
-          // `path::{ ... }` → recurse into the list; the path prefix isn't a leaf.
+          // `path::{ ... }` — the group's path becomes the prefix for each item.
+          const pathNode = getChildByField(n, 'path');
+          const seg = pathNode ? getNodeText(pathNode, this.source).trim() : '';
+          const newPrefix = seg ? join(prefix, seg) : prefix;
           const list = getChildByField(n, 'list') ?? n.namedChildren.find((c) => c.type === 'use_list');
-          if (list) collect(list);
+          if (list) collect(list, newPrefix);
           break;
         }
         case 'use_list':
           for (let i = 0; i < n.namedChildCount; i++) {
             const c = n.namedChild(i);
-            if (c) collect(c);
+            if (c) collect(c, prefix);
           }
           break;
         case 'use_as_clause': {
           // `Path as Alias` → link the source path (the definition), not the alias.
-          const path = getChildByField(n, 'path') ?? n.namedChild(0);
-          if (path) collect(path);
+          const p = getChildByField(n, 'path') ?? n.namedChild(0);
+          if (p) collect(p, prefix);
           break;
         }
-        // use_wildcard / self / super / crate → no specific leaf to link.
+        // use_wildcard → no specific binding to link.
       }
     };
     for (let i = 0; i < node.namedChildCount; i++) {
       const c = node.namedChild(i);
-      if (c) collect(c);
+      if (c) collect(c, '');
     }
-    for (const leaf of leaves) {
-      const name = getNodeText(leaf, this.source);
-      if (!name || name === 'self' || name === 'super' || name === 'crate') continue;
+    for (const p of paths) {
+      // The leaf must be a real name (skip a path that is only `self`/`super`/`crate`).
+      const leaf = p.text.split('::').pop();
+      if (!leaf || leaf === 'self' || leaf === 'super' || leaf === 'crate' || leaf === '*') continue;
       this.unresolvedReferences.push({
         fromNodeId,
-        referenceName: name,
+        referenceName: p.text,
         referenceKind: 'imports',
-        line: leaf.startPosition.row + 1,
-        column: leaf.startPosition.column,
+        line: p.node.startPosition.row + 1,
+        column: p.node.startPosition.column,
       });
     }
   }

+ 125 - 0
src/resolution/import-resolver.ts

@@ -1101,6 +1101,15 @@ export function resolveViaImport(
     if (pyResult) return pyResult;
   }
 
+  // Rust qualified path: resolve the module prefix of `crate::m::Item` /
+  // `self::sub::Item` / `super::m::func` to a file, then find the leaf symbol in
+  // it. Disambiguates common-name `pub use self::read::read` re-exports, which
+  // name-matching would otherwise link to the wrong same-named symbol.
+  if (ref.language === 'rust' && ref.referenceName.includes('::')) {
+    const rustResult = resolveRustPathReference(ref, context);
+    if (rustResult) return rustResult;
+  }
+
   // Whole-module / namespace imports → link the importing file to the module
   // file. Python `from . import certs` / `import mod`, and TS/JS `import * as ns
   // from './x'` (so a namespace touched only via a value-member read still
@@ -1271,6 +1280,122 @@ function resolveModuleImportToFile(
   return null;
 }
 
+/**
+ * Resolve a Rust qualified reference `A::B::C` by mapping the MODULE prefix
+ * (`A::B`) to a file and finding the leaf symbol (`C`) in it. This is the Rust
+ * analog of {@link resolvePythonModuleMember} / {@link resolveGoCrossPackageReference}
+ * and the precise answer to common-name re-exports (`pub use self::read::read`)
+ * that name-matching can't disambiguate. Returns null when the prefix isn't a
+ * real module path (e.g. `Widget::new` — `Widget` is a struct, not a module),
+ * so associated-function calls and enum-variant paths fall through untouched.
+ */
+function resolveRustPathReference(
+  ref: UnresolvedRef,
+  context: ResolutionContext
+): ResolvedRef | null {
+  const segments = ref.referenceName.split('::').filter((s) => s.length > 0);
+  if (segments.length < 2) return null;
+  const leaf = segments[segments.length - 1]!;
+  const modSegs = segments.slice(0, -1);
+
+  const file = resolveRustModuleFile(modSegs, ref.filePath, context);
+  if (!file || file === ref.filePath) return null;
+
+  const target = context.getNodesInFile(file).find(
+    (n) =>
+      n.name === leaf &&
+      (n.kind === 'function' ||
+        n.kind === 'struct' ||
+        n.kind === 'enum' ||
+        n.kind === 'trait' ||
+        n.kind === 'type_alias' ||
+        n.kind === 'constant' ||
+        n.kind === 'method' ||
+        n.kind === 'class' ||
+        n.kind === 'interface')
+  );
+  if (target) {
+    return { original: ref, targetNodeId: target.id, confidence: 0.9, resolvedBy: 'import' };
+  }
+  return null;
+}
+
+/** The crate-root directory (holds `lib.rs`/`main.rs`), walking up from a file. */
+function rustCrateRootDir(fromFileAbs: string, context: ResolutionContext): string | null {
+  const projectRoot = context.getProjectRoot();
+  const toRel = (p: string) => path.relative(projectRoot, p).replace(/\\/g, '/');
+  let dir = path.dirname(fromFileAbs);
+  for (let i = 0; i < 64; i++) {
+    if (context.fileExists(toRel(path.join(dir, 'lib.rs'))) ||
+        context.fileExists(toRel(path.join(dir, 'main.rs')))) {
+      return dir;
+    }
+    const parent = path.dirname(dir);
+    if (parent === dir) return null;
+    dir = parent;
+  }
+  return null;
+}
+
+/** Directory under which the current file's module declares its SUBMODULES. */
+function rustSelfModuleDir(fromFileAbs: string): string {
+  const base = path.basename(fromFileAbs);
+  const dir = path.dirname(fromFileAbs);
+  // mod.rs / lib.rs / main.rs own their directory; `foo.rs`'s submodules live in `foo/`.
+  if (base === 'mod.rs' || base === 'lib.rs' || base === 'main.rs') return dir;
+  return path.join(dir, base.replace(/\.rs$/, ''));
+}
+
+/**
+ * Resolve a Rust module path (segments WITHOUT the leaf symbol) to the file of
+ * the last module segment — `crate::a::b` → `<crate>/a/b.rs` (or `.../b/mod.rs`).
+ * Anchors on `crate` / `self` / `super`; a bare path is tried crate-relative.
+ */
+function resolveRustModuleFile(
+  segments: string[],
+  fromFile: string,
+  context: ResolutionContext
+): string | null {
+  if (segments.length === 0) return null;
+  const projectRoot = context.getProjectRoot();
+  const fromAbs = path.join(projectRoot, fromFile);
+  const toRel = (p: string) => path.relative(projectRoot, p).replace(/\\/g, '/');
+
+  let dir: string | null;
+  let rest: string[];
+  const first = segments[0]!;
+  if (first === 'crate') {
+    dir = rustCrateRootDir(fromAbs, context);
+    rest = segments.slice(1);
+  } else if (first === 'self') {
+    dir = rustSelfModuleDir(fromAbs);
+    rest = segments.slice(1);
+  } else if (first === 'super') {
+    let supers = 0;
+    while (segments[supers] === 'super') supers++;
+    dir = rustSelfModuleDir(fromAbs);
+    for (let s = 0; s < supers; s++) dir = path.dirname(dir);
+    rest = segments.slice(supers);
+  } else {
+    // Bare path: try it crate-relative (true of 2015-edition `use` paths; a heuristic for 2018+).
+    dir = rustCrateRootDir(fromAbs, context);
+    rest = segments;
+  }
+  if (!dir) return null;
+
+  let targetFile: string | null = null;
+  for (const seg of rest) {
+    if (seg === 'self' || seg === 'crate' || seg === 'super') continue;
+    const asFile = toRel(path.join(dir, seg + '.rs'));
+    const asMod = toRel(path.join(dir, seg, 'mod.rs'));
+    if (context.fileExists(asFile)) targetFile = asFile;
+    else if (context.fileExists(asMod)) targetFile = asMod;
+    else return null;
+    dir = path.join(dir, seg);
+  }
+  return targetFile;
+}
+
 /**
  * Resolve a Java/Kotlin reference whose receiver is the simple name of
  * an imported FQN: `Foo.bar(...)` where `import com.example.Foo;`. The