Переглянути джерело

fix(resolution): resolve same-named methods to the call site's own file (#1079) (#1107)

When two files each declared a same-named class with a same-named method
(e.g. `class Logger { void log(); }`), a call resolved to whichever
definition was indexed first — so a call in `b/svc` wrongly targeted
`a/svc`, mixing up that method's callers and blast radius.

The reported case was C++ instance calls, but the underlying pattern —
"multiple same-named candidates, pick the first-indexed, ignore the call
site's file" — lived in three resolution paths, each firing for a
different call shape and affecting different languages:

  - `obj.log()`     instance        -> resolveMethodOnType (C++)
  - `Logger.log()`  class receiver  -> matchMethodCall Strategy 1/2/3
                                       (Python, TypeScript, Java, C#)
  - `Logger::log()` qualified       -> matchByQualifiedName (C++, Rust)

All five sites now share one helper, `preferCallSiteFile`, that prefers
a candidate declared in the call site's own file when a name is
ambiguous. It runs after the `preferredFqn` block in resolveMethodOnType,
so Java/Kotlin import disambiguation (#314) — whose target is
intentionally in another file — is unaffected. The helper is a no-op
when there are fewer than two candidates or none share the call site's
file, so the common single-definition case is unchanged.

Adds 8 tests under `Same-name method disambiguation (#1079)`: the
`preferCallSiteFile` contract, resolveMethodOnType precedence (including
a guard that an import FQN still beats the same-file preference),
`matchByQualifiedName` disambiguation, and end-to-end index tests for the
C++ instance, TypeScript static, and C++ qualified call shapes.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 1 день тому
батько
коміт
63bc0fd037
3 змінених файлів з 288 додано та 18 видалено
  1. 1 0
      CHANGELOG.md
  2. 208 1
      __tests__/resolution.test.ts
  3. 79 17
      src/resolution/name-matcher.ts

+ 1 - 0
CHANGELOG.md

@@ -20,6 +20,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - C++ function names are now recovered even when decorated with a macro CodeGraph doesn't specifically know about. A function written `SOME_LIBRARY_MACRO ReturnType doWork(...)` previously had the macro or return type absorbed into its name whenever the macro wasn't one CodeGraph recognized; now the real name (`doWork`) is recovered regardless of the macro, so it's findable and its callers link — no per-library configuration needed. The recognized-macro list was also broadened (Qt, Folly, Abseil, LLVM, V8, Eigen, rapidjson) so those additionally capture the return type. This only ever cleans up an already-garbled name and is limited to C and C++, so ordinary names — and languages like Kotlin and Scala where identifiers can legitimately contain spaces — are unaffected. (#1102)
 - The set of C++ libraries whose macros are recognized for full return-type recovery was expanded well beyond Unreal Engine — now spanning Mozilla, Protobuf, {fmt}, nlohmann/json, GLM, Bullet, Skia, OpenCV, EASTL, Cocos2d-x, GLib, SQLite, and the common Windows calling conventions (so `HRESULT WINAPI CreateThing(...)` indexes as `CreateThing` returning `HRESULT`). Functions from libraries not on the list still get their name recovered automatically; being listed additionally recovers the return type. (#1103)
 - Graph traversal and blast-radius results no longer drop or miscount relationships in a handful of edge cases. When a symbol could be reached by more than one path, an impact/blast-radius query could leave out a direct dependency between two symbols that were already linked another way; separately, the lower-level graph traversal used by the library API could keep only one of several relationships between the same pair of symbols (for example a symbol that both calls and references another), count a caller reached through two different call sites twice, or return slightly more results than the requested size limit on a very highly-connected symbol. These were long-standing and mostly masked by later de-duplication, so day-to-day query results were largely unaffected, but the traversal now returns the complete, correctly-bounded set. Thanks @inth3shadows for the precise, individually-traced reports. (#1086, #1087, #1088, #1089, #1090)
+- Method calls to same-named classes in different files now resolve to the right definition. If two files each declared, say, a `Logger` class with its own `log()` method, a call could be linked to whichever definition happened to be indexed first — so a call in one file wrongly pointed at the class in another, mixing up that method's callers and blast radius. This affected calls written as `obj.log()`, `Logger.log()`, and `Logger::log()` across many languages, including C++, Python, TypeScript, Java, C#, and Rust. When a method name is ambiguous, CodeGraph now prefers the definition in the calling file itself — the correct target in the common case — while Java/Kotlin calls that an `import` already pins to another file are unaffected. Thanks @inth3shadows for the minimal repro and root-cause analysis. (#1079)
 
 
 ## [1.1.6] - 2026-06-30

+ 208 - 1
__tests__/resolution.test.ts

@@ -11,7 +11,7 @@ import * as os from 'os';
 import { CodeGraph } from '../src';
 import { Node, UnresolvedReference } from '../src/types';
 import { ReferenceResolver, createResolver, ResolutionContext } from '../src/resolution';
-import { matchReference } from '../src/resolution/name-matcher';
+import { matchReference, resolveMethodOnType, matchByQualifiedName, preferCallSiteFile } from '../src/resolution/name-matcher';
 import { resolveImportPath, extractImportMappings, resolveJvmImport, loadCppIncludeDirs, clearCppIncludeDirCache, isPhpIncludePathRef } from '../src/resolution/import-resolver';
 import type { UnresolvedRef } from '../src/resolution/types';
 import { detectFrameworks, getAllFrameworkResolvers } from '../src/resolution/frameworks';
@@ -1443,6 +1443,213 @@ func main() {
     });
   });
 
+  describe('Same-name method disambiguation (#1079)', () => {
+    // resolveMethodOnType picks among several methods that share a
+    // `Type::method` qualifiedName. The precedence is:
+    //   1. preferredFqn (Java/Kotlin import — target is intentionally in
+    //      ANOTHER file, #314),
+    //   2. the call site's OWN file (language-agnostic, #1079),
+    //   3. matches[0] (first-indexed) as a last resort.
+    const methodNode = (
+      id: string,
+      filePath: string,
+      language: Node['language'] = 'cpp',
+      qualifiedName = 'Logger::log',
+      name = 'log',
+    ): Node => ({
+      id, kind: 'method', name, qualifiedName, filePath, language,
+      startLine: 1, endLine: 1, startColumn: 0, endColumn: 0, updatedAt: 0,
+    });
+    const callRef = (filePath: string, language: Node['language'] = 'cpp'): UnresolvedRef => ({
+      fromNodeId: 'caller', referenceName: 'lg.log', referenceKind: 'calls',
+      line: 2, column: 0, filePath, language,
+    });
+    const ctxFor = (candidates: Node[]): ResolutionContext => ({
+      getNodesInFile: () => [],
+      getNodesByName: (name) => candidates.filter((c) => c.name === name),
+      getNodesByQualifiedName: () => [],
+      getNodesByKind: () => [],
+      fileExists: () => false,
+      readFile: () => null,
+      getProjectRoot: () => '',
+      getAllFiles: () => [],
+    });
+
+    it('prefers the definition in the call site\'s own file (#1079)', () => {
+      // matches[0] is the a/ definition; the call comes from b/, so it must
+      // resolve to b/ — not collapse onto the first-indexed match.
+      const logA = methodNode('m:a', 'a/svc.cpp');
+      const logB = methodNode('m:b', 'b/svc.cpp');
+      const result = resolveMethodOnType(
+        'Logger', 'log', callRef('b/svc.cpp'), ctxFor([logA, logB]), 0.9, 'instance-method',
+      );
+      expect(result?.targetNodeId).toBe('m:b');
+    });
+
+    it('lets an import FQN pin a cross-file target over the same-file preference (#314)', () => {
+      // Java: two `Bar::doIt` in different packages. The import FQN pins the
+      // alpha package; even though the call site lives in beta's file, the FQN
+      // must win — the same-file preference runs only AFTER preferredFqn.
+      const alpha = methodNode('m:alpha', 'com/example/alpha/Bar.java', 'java', 'Bar::doIt', 'doIt');
+      const beta = methodNode('m:beta', 'com/example/beta/Bar.java', 'java', 'Bar::doIt', 'doIt');
+      const result = resolveMethodOnType(
+        'Bar', 'doIt', callRef('com/example/beta/Bar.java', 'java'),
+        ctxFor([alpha, beta]), 0.9, 'instance-method', 'com.example.alpha.Bar',
+      );
+      expect(result?.targetNodeId).toBe('m:alpha');
+    });
+
+    it('falls back to the first match when nothing disambiguates', () => {
+      // Call site is a third file: no FQN, no same-file candidate → matches[0].
+      const logA = methodNode('m:a', 'a/svc.cpp');
+      const logB = methodNode('m:b', 'b/svc.cpp');
+      const result = resolveMethodOnType(
+        'Logger', 'log', callRef('c/other.cpp'), ctxFor([logA, logB]), 0.9, 'instance-method',
+      );
+      expect(result?.targetNodeId).toBe('m:a');
+    });
+
+    it('resolves C++ calls end-to-end to same-named classes in different files (#1079)', async () => {
+      // The exact repro from the issue: two files, each with its own
+      // `Logger::log`. Before the fix both callers pointed at the first def.
+      fs.mkdirSync(path.join(tempDir, 'a'), { recursive: true });
+      fs.mkdirSync(path.join(tempDir, 'b'), { recursive: true });
+      fs.writeFileSync(
+        path.join(tempDir, 'a', 'svc.cpp'),
+        `class Logger { public: void log() { int a = 1; } };\nvoid useA() { Logger lg; lg.log(); }\n`,
+      );
+      fs.writeFileSync(
+        path.join(tempDir, 'b', 'svc.cpp'),
+        `class Logger { public: void log() { int b = 2; } };\nvoid useB() { Logger lg; lg.log(); }\n`,
+      );
+
+      cg = await CodeGraph.init(tempDir, { index: true });
+      cg.resolveReferences();
+
+      const logInDir = (dir: string) =>
+        cg.getNodesByKind('method').find(
+          (n) => n.name === 'log' && n.filePath.replace(/\\/g, '/').endsWith(`${dir}/svc.cpp`),
+        )!;
+      const callTargets = (fnName: string) =>
+        cg
+          .getOutgoingEdges(cg.getNodesByKind('function').find((n) => n.name === fnName)!.id)
+          .filter((e) => e.kind === 'calls')
+          .map((e) => e.target);
+
+      const logA = logInDir('a');
+      const logB = logInDir('b');
+      expect(logA).toBeDefined();
+      expect(logB).toBeDefined();
+      expect(logA.id).not.toBe(logB.id);
+
+      // Each caller resolves to the Logger::log in its OWN file.
+      expect(callTargets('useA')).toContain(logA.id);
+      expect(callTargets('useB')).toContain(logB.id);
+    });
+
+    it('preferCallSiteFile puts same-file candidates first and is otherwise a no-op', () => {
+      const a = methodNode('m:a', 'a/svc.cpp');
+      const b = methodNode('m:b', 'b/svc.cpp');
+      // Same-file first; the rest keep their original order (stable).
+      expect(preferCallSiteFile([a, b], 'b/svc.cpp').map((n) => n.id)).toEqual(['m:b', 'm:a']);
+      expect(preferCallSiteFile([a, b], 'a/svc.cpp').map((n) => n.id)).toEqual(['m:a', 'm:b']);
+      // No same-file match → unchanged; <2 candidates → returned as-is.
+      expect(preferCallSiteFile([a, b], 'c/other.cpp').map((n) => n.id)).toEqual(['m:a', 'm:b']);
+      expect(preferCallSiteFile([a], 'z/none.cpp')).toHaveLength(1);
+    });
+
+    it('matchByQualifiedName prefers the same-file target when a qualified name is ambiguous (#1079)', () => {
+      // Two `Logger::log` definitions; an explicit `Logger::log()` call from b/
+      // must resolve to b/'s definition, not the first-indexed one.
+      const a = methodNode('m:a', 'a/svc.cpp');
+      const b = methodNode('m:b', 'b/svc.cpp');
+      const ctx: ResolutionContext = {
+        getNodesInFile: () => [],
+        getNodesByName: (name) => [a, b].filter((n) => n.name === name),
+        getNodesByQualifiedName: (q) => (q === 'Logger::log' ? [a, b] : []),
+        getNodesByKind: () => [],
+        fileExists: () => false,
+        readFile: () => null,
+        getProjectRoot: () => '',
+        getAllFiles: () => [],
+      };
+      const ref: UnresolvedRef = {
+        fromNodeId: 'caller', referenceName: 'Logger::log', referenceKind: 'calls',
+        line: 2, column: 0, filePath: 'b/svc.cpp', language: 'cpp',
+      };
+      expect(matchByQualifiedName(ref, ctx)?.targetNodeId).toBe('m:b');
+    });
+
+    it('resolves a static/class-receiver call to the class in the caller\'s file (#1079)', async () => {
+      // `Logger.log()` — the receiver is the class NAME, so this routes through
+      // the class-name-receiver strategy (not the C++ instance path). It was
+      // file-blind across languages; verified here on TypeScript.
+      fs.mkdirSync(path.join(tempDir, 'a'), { recursive: true });
+      fs.mkdirSync(path.join(tempDir, 'b'), { recursive: true });
+      fs.writeFileSync(
+        path.join(tempDir, 'a', 'svc.ts'),
+        `class Logger { static log() { return 1; } }\nexport function useA() { return Logger.log(); }\n`,
+      );
+      fs.writeFileSync(
+        path.join(tempDir, 'b', 'svc.ts'),
+        `class Logger { static log() { return 2; } }\nexport function useB() { return Logger.log(); }\n`,
+      );
+
+      cg = await CodeGraph.init(tempDir, { index: true });
+      cg.resolveReferences();
+
+      const logInDir = (dir: string) =>
+        cg.getNodesByKind('method').find(
+          (n) => n.name === 'log' && n.filePath.replace(/\\/g, '/').endsWith(`${dir}/svc.ts`),
+        )!;
+      const callTargets = (fnName: string) =>
+        cg
+          .getOutgoingEdges(cg.getNodesByKind('function').find((n) => n.name === fnName)!.id)
+          .filter((e) => e.kind === 'calls')
+          .map((e) => e.target);
+
+      const logA = logInDir('a');
+      const logB = logInDir('b');
+      expect(logA?.id).not.toBe(logB?.id);
+      expect(callTargets('useA')).toContain(logA.id);
+      expect(callTargets('useB')).toContain(logB.id);
+    });
+
+    it('resolves an explicitly-qualified call to the definition in the caller\'s file (#1079)', async () => {
+      // `Logger::log()` with two `Logger::log` definitions routes through the
+      // qualified-name strategy, whose partial match previously picked the first.
+      fs.mkdirSync(path.join(tempDir, 'a'), { recursive: true });
+      fs.mkdirSync(path.join(tempDir, 'b'), { recursive: true });
+      fs.writeFileSync(
+        path.join(tempDir, 'a', 'svc.cpp'),
+        `class Logger { public: static void log() { int a = 1; } };\nvoid useA() { Logger::log(); }\n`,
+      );
+      fs.writeFileSync(
+        path.join(tempDir, 'b', 'svc.cpp'),
+        `class Logger { public: static void log() { int b = 2; } };\nvoid useB() { Logger::log(); }\n`,
+      );
+
+      cg = await CodeGraph.init(tempDir, { index: true });
+      cg.resolveReferences();
+
+      const logInDir = (dir: string) =>
+        cg.getNodesByKind('method').find(
+          (n) => n.name === 'log' && n.filePath.replace(/\\/g, '/').endsWith(`${dir}/svc.cpp`),
+        )!;
+      const callTargets = (fnName: string) =>
+        cg
+          .getOutgoingEdges(cg.getNodesByKind('function').find((n) => n.name === fnName)!.id)
+          .filter((e) => e.kind === 'calls')
+          .map((e) => e.target);
+
+      const logA = logInDir('a');
+      const logB = logInDir('b');
+      expect(logA?.id).not.toBe(logB?.id);
+      expect(callTargets('useA')).toContain(logA.id);
+      expect(callTargets('useB')).toContain(logB.id);
+    });
+  });
+
   describe('Name Matcher: kind bias for new ref kinds', () => {
     const baseContext = (candidates: Node[]): ResolutionContext => ({
       getNodesInFile: () => [],

+ 79 - 17
src/resolution/name-matcher.ts

@@ -420,27 +420,66 @@ export function matchByQualifiedName(
     };
   }
 
-  // Try partial qualified name match
+  // Several symbols share this exact qualified name (e.g. `Logger::log` declared
+  // in two files — an ODR clash or separate translation units): prefer the one
+  // in the call site's own file before the partial-match fallback below, else
+  // the first-indexed def wins and a call in `b/svc` targets `a/svc` (#1079).
+  if (candidates.length > 1) {
+    const ordered = preferCallSiteFile(candidates, ref.filePath);
+    if (ordered[0]!.filePath === ref.filePath) {
+      return {
+        original: ref,
+        targetNodeId: ordered[0]!.id,
+        confidence: 0.95,
+        resolvedBy: 'qualified-name',
+      };
+    }
+  }
+
+  // Try partial qualified name match — again preferring the call site's own
+  // file when more than one symbol's qualifiedName ends with the reference.
   const parts = ref.referenceName.split(/[:.]/);
   const lastName = parts[parts.length - 1];
   if (lastName) {
-    const partialCandidates = context.getNodesByName(lastName);
-    for (const candidate of partialCandidates) {
-      if (candidate.qualifiedName.endsWith(ref.referenceName)) {
-        return {
-          original: ref,
-          targetNodeId: candidate.id,
-          confidence: 0.85,
-          resolvedBy: 'qualified-name',
-        };
-      }
+    const partialCandidates = context
+      .getNodesByName(lastName)
+      .filter((candidate) => candidate.qualifiedName.endsWith(ref.referenceName));
+    const chosen = preferCallSiteFile(partialCandidates, ref.filePath)[0];
+    if (chosen) {
+      return {
+        original: ref,
+        targetNodeId: chosen.id,
+        confidence: 0.85,
+        resolvedBy: 'qualified-name',
+      };
     }
   }
 
   return null;
 }
 
-function resolveMethodOnType(
+/**
+ * When a symbol name is ambiguous across files, prefer the candidate(s) declared
+ * in the call site's own file, keeping the rest in their original order (#1079).
+ * A same-file definition is the strongest language-agnostic signal for which of
+ * several same-named symbols a call means; without it, resolution collapses onto
+ * whichever was indexed first, so a call in `b/svc` wrongly targets `a/svc`.
+ * No-op when there are <2 candidates or none share the call site's file.
+ */
+export function preferCallSiteFile(nodes: Node[], callSiteFile: string): Node[] {
+  if (nodes.length < 2) return nodes;
+  const same: Node[] = [];
+  const other: Node[] = [];
+  for (const n of nodes) {
+    if (n.filePath === callSiteFile) same.push(n);
+    else other.push(n);
+  }
+  return same.length ? [...same, ...other] : nodes;
+}
+
+// Exported for the precedence unit tests (#1079): they assert the
+// preferredFqn → same-file → matches[0] ordering directly.
+export function resolveMethodOnType(
   typeName: string,
   methodName: string,
   ref: UnresolvedRef,
@@ -511,9 +550,19 @@ function resolveMethodOnType(
     }
   }
 
+  // Language-agnostic disambiguation: when several same-named methods survive
+  // (e.g. two files each declaring `class Logger { void log(); }` — an ODR
+  // clash, an anonymous-namespace type, or separate translation units), prefer
+  // the definition in the CALL SITE's own file. Without this, every ambiguous
+  // call collapses onto the first-indexed definition, so a call in `b/svc.cpp`
+  // wrongly points at `a/svc.cpp` (#1079). This runs AFTER the `preferredFqn`
+  // block, so Java/Kotlin import disambiguation — whose target is intentionally
+  // in ANOTHER file (#314) — is unaffected: that block returns early whenever
+  // an import FQN pins the class.
+  const ordered = preferCallSiteFile(matches, ref.filePath);
   return {
     original: ref,
-    targetNodeId: matches[0]!.id,
+    targetNodeId: ordered[0]!.id,
     confidence,
     resolvedBy,
   };
@@ -1041,8 +1090,15 @@ export function matchMethodCall(
     }
   }
 
-  // Strategy 1: Direct class name match (existing logic)
-  const classCandidates = context.getNodesByName(objectOrClass!);
+  // Strategy 1: Direct class name match (existing logic). When the receiver
+  // names a class that exists in several files (`Logger.log()` / `Logger::log()`
+  // with a `Logger` in both `a/` and `b/`), try the class in the call site's
+  // own file first — otherwise the first-indexed class wins and a call in `b/`
+  // resolves to `a/`'s method (#1079).
+  const classCandidates = preferCallSiteFile(
+    context.getNodesByName(objectOrClass!),
+    ref.filePath,
+  );
 
   for (const classNode of classCandidates) {
     if (classNode.kind === 'class' || classNode.kind === 'struct' || classNode.kind === 'interface') {
@@ -1072,7 +1128,10 @@ export function matchMethodCall(
   // e.g., "permissionEngine" → look for classes containing "PermissionEngine"
   const capitalizedReceiver = objectOrClass!.charAt(0).toUpperCase() + objectOrClass!.slice(1);
   if (capitalizedReceiver !== objectOrClass) {
-    const fuzzyClassCandidates = context.getNodesByName(capitalizedReceiver);
+    const fuzzyClassCandidates = preferCallSiteFile(
+      context.getNodesByName(capitalizedReceiver),
+      ref.filePath,
+    );
     for (const classNode of fuzzyClassCandidates) {
       if (classNode.kind === 'class' || classNode.kind === 'struct' || classNode.kind === 'interface') {
         // Skip cross-language class matches
@@ -1136,7 +1195,10 @@ export function matchMethodCall(
       let bestMatch: typeof targetMethods[0] | undefined;
       let bestScore = 0;
 
-      for (const method of targetMethods) {
+      // Same-file candidates first, so a score tie (`score > bestScore` keeps
+      // the first seen) resolves to the call site's own file rather than the
+      // first-indexed duplicate (#1079).
+      for (const method of preferCallSiteFile(targetMethods, ref.filePath)) {
         const classWords = splitCamelCase(method.qualifiedName);
         let score = receiverWords.filter(w =>
           classWords.some(cw => cw.toLowerCase() === w.toLowerCase())