1
0
Эх сурвалжийг харах

fix(kotlin): resolve chained companion-factory calls Foo.getInstance().bar() (#750) (#752)

A Kotlin method called through a companion-object factory, fluent chain, or
constructor — `Foo.getInstance().bar()`, `Config.create(opts).build()`,
`STMTransaction(f).commit()` — dropped the receiver to a BARE method name, which
then name-matched a same-named method on an unrelated class (a wrong edge) or
failed to resolve. Ports the #645/#608 mechanism to Kotlin:

- Part 1: capture Kotlin return types in the extractor. tree-sitter-kotlin
  exposes no field names, so the return type is read positionally (the type node
  after function_value_parameters); inferred/Unit/Nothing returns yield none.
- Part 2: encode a CLASS/companion-factory call-receiver chain as `inner().method`.
  Gated to a capitalized receiver (`Foo.getInstance()` / `Foo(args)`) so instance
  chains (`list.filter{}.map{}`) keep their bare-name behavior — re-encoding those
  would only drop the edge, regressing recall in fluent codebases.
- Part 3: generalize matchJavaCallChain -> matchDottedCallChain (shared by the JVM
  dot-notation languages); resolve the method on the factory's return type, or on
  the constructed class for a Kotlin `Foo(args).method()` receiver. Validated via
  resolveMethodOnType, so a wrong inference yields NO edge.

Validated: synthetic decoy + args + absent-method safety tests; full suite green;
real-repo A/B on arrow-kt/arrow (734 .kt) — node count identical (no explosion),
+49 validated-correct chained edges, and the removed edges are wrong bare-name
guesses the fix correctly stops emitting (419/438 from test/doc files; the 18
from product code are stdlib `.apply{}`, self-loops, and bare-name mismatches) —
a net precision improvement, ~0 correct product edges lost. Java path unchanged
(constructor branch is Kotlin-gated). EXTRACTION_VERSION 6 -> 7.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 2 долоо хоног өмнө
parent
commit
3e04650850

+ 1 - 0
CHANGELOG.md

@@ -29,6 +29,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### Fixes
 
+- Kotlin method calls made through a companion-object factory or fluent chain now resolve to the correct class. A call like `Foo.getInstance().bar()` or `Config.create(opts).build()` used to drop the receiver entirely, so the chained method silently attached to a same-named method on an unrelated class — or didn't resolve at all — corrupting callers, impact, and trace. CodeGraph now captures Kotlin return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Kotlin indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Kotlin)
 - Java method calls made through a static factory or fluent chain now resolve to the correct class. A call like `Foo.getInstance().bar()` or `Config.create(opts).build()` used to lose the receiver's type, so when two classes had a same-named method the call silently attached to whichever was indexed first — or didn't resolve at all — corrupting callers, impact, and trace. CodeGraph now captures Java return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Covers factories and fluent builders that take arguments (`hashKeys().arrayListValues()`), including builders that return a nested type. Existing Java indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Java)
 - PHP: a method called through a chained static factory — `Cls::for($x)->method(...)`, the canonical Laravel per-credential / per-tenant client idiom — now records a caller edge. Previously the receiver type (what `for()` returns) was never recovered, so `codegraph_callers` returned nothing for the method and the call was invisible to `codegraph_impact`. CodeGraph now captures PHP return types — `: self` / `: static` resolve to the declaring class, `: SomeClass` to that class — and resolves the chained method on the factory's result, creating the edge only when that class actually has the method (so a wrong inference produces no edge). Existing PHP indexes should be re-indexed (`codegraph index -f`) to benefit. Thanks @cvanderlinden. (#608) (PHP)
 - Search relevance: including the project name in a query (a user naturally writes `MyApp backend routes`) no longer buries the part of the codebase the query is actually about. The project name lexically matches whatever stack embeds it — a `MyAppFrontend/` directory, a `MyAppApp` class — and it was over-weighted two ways: a single PascalCase word was scored once per sub-token (`my` / `app` / `myapp`), so one concept boosted that path several times over; and the name carried full path / disambiguation weight even though it names the whole repo, not any symbol. Now path relevance counts each query word once, and a word matching the project name (derived from `go.mod`, `package.json`, or the repo directory) is dropped from path scoring and from `codegraph_explore`'s type-disambiguation bias — unless it's the only term, so a bare project-name search still works. In a mixed-stack repo, a backend question now surfaces the backend even with the project name in the query. Thanks @MiNuo1. (#720)

+ 73 - 0
__tests__/resolution.test.ts

@@ -2256,6 +2256,79 @@ class Other { void onlyOther() {} }
 class Caller {
     void run() { Foo.getInstance().onlyOther(); }
 }
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      // Foo has no onlyOther() — must not mis-attach to the same-named Other::onlyOther.
+      expect(callerNamesOf('Other::onlyOther')).toEqual([]);
+    });
+  });
+
+  describe('Kotlin chained companion-factory call resolution (#645/#608 mechanism)', () => {
+    function callerNamesOf(qualifiedName: string): string[] {
+      const target = cg.getNodesByKind('method').find((n) => n.qualifiedName === qualifiedName);
+      if (!target) return [];
+      const names = cg
+        .getIncomingEdges(target.id)
+        .filter((e) => e.kind === 'calls')
+        .map((e) => cg.getNode(e.source)?.name)
+        .filter((n): n is string => !!n);
+      return [...new Set(names)].sort();
+    }
+
+    it('resolves Foo.getInstance().bar() via the companion return type, never a same-named decoy', async () => {
+      // Aaa sorts first and has a same-named bar() — without the chain fix Kotlin
+      // dropped the receiver to a bare `bar` and attached to Aaa (a wrong edge).
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.kt'),
+        `class Aaa { fun bar() {} }
+class Foo {
+    companion object {
+        fun getInstance(): Foo = Foo()
+    }
+    fun bar() {}
+}
+class Caller {
+    fun run() { Foo.getInstance().bar() }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Foo::bar')).toEqual(['run']);
+      expect(callerNamesOf('Aaa::bar')).toEqual([]);
+    });
+
+    it('resolves a companion factory chain that passes arguments — Foo.create(cfg).build()', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.kt'),
+        `class Config
+class Foo {
+    companion object {
+        fun create(c: Config): Foo = Foo()
+    }
+    fun build() {}
+}
+class Caller {
+    fun run() { Foo.create(Config()).build() }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Foo::build')).toEqual(['run']);
+    });
+
+    it('creates NO edge when the companion return type lacks the method (silent miss, not a wrong edge)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.kt'),
+        `class Foo {
+    companion object {
+        fun getInstance(): Foo = Foo()
+    }
+}
+class Other { fun onlyOther() {} }
+class Caller {
+    fun run() { Foo.getInstance().onlyOther() }
+}
 `
       );
       cg = await CodeGraph.init(tempDir, { index: true });

+ 1 - 1
src/extraction/extraction-version.ts

@@ -21,4 +21,4 @@
  * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty
  * in the product is load-bearing").
  */
-export const EXTRACTION_VERSION = 6;
+export const EXTRACTION_VERSION = 7;

+ 41 - 0
src/extraction/languages/kotlin.ts

@@ -2,6 +2,46 @@ import type { Node as SyntaxNode } from 'web-tree-sitter';
 import { getNodeText, getChildByField } from '../tree-sitter-helpers';
 import type { LanguageExtractor } from '../tree-sitter-types';
 
+/** Kotlin return types that can't be a chained-call receiver (no class to chain on). */
+const KOTLIN_NON_CLASS_RETURN = new Set(['Unit', 'Nothing']);
+
+/**
+ * A Kotlin function's declared return type, normalized to the bare class name a
+ * chained `Foo.getInstance().bar()` could be called on (the #645/#608 mechanism).
+ * tree-sitter-kotlin exposes no field names, so the return type is found
+ * positionally: the first `user_type` / `nullable_type` that FOLLOWS
+ * `function_value_parameters` (an extension receiver's type sits before the
+ * params, so it's never mistaken for the return). An inferred return (expression
+ * body with no `: Type`), a lambda return type, or `Unit` / `Nothing` → undefined.
+ */
+function extractKotlinReturnType(node: SyntaxNode, source: string): string | undefined {
+  let seenParams = false;
+  for (let i = 0; i < node.namedChildCount; i++) {
+    const child = node.namedChild(i);
+    if (!child) continue;
+    if (child.type === 'function_value_parameters') {
+      seenParams = true;
+      continue;
+    }
+    if (!seenParams) continue;
+    // The return type is the type node right after the params. If we reach the
+    // body or a `where`-clause first, there's no declared return type.
+    if (child.type === 'function_body' || child.type === 'type_constraints') return undefined;
+    if (child.type === 'user_type' || child.type === 'nullable_type') {
+      const ut =
+        child.type === 'nullable_type'
+          ? (child.namedChildren.find((c: SyntaxNode) => c.type === 'user_type') ?? child)
+          : child;
+      const typeId = ut.namedChildren.find((c: SyntaxNode) => c.type === 'type_identifier');
+      const name = getNodeText(typeId ?? ut, source).trim();
+      if (!name || !/^[A-Za-z_]\w*$/.test(name)) return undefined;
+      if (KOTLIN_NON_CLASS_RETURN.has(name)) return undefined;
+      return name;
+    }
+  }
+  return undefined;
+}
+
 /** Check if a node matches the `fun interface` misparse pattern */
 function isFunInterfaceNode(node: SyntaxNode): boolean {
   let hasFun = false;
@@ -130,6 +170,7 @@ export const kotlinExtractor: LanguageExtractor = {
   },
   paramsField: 'function_value_parameters',
   returnField: 'type',
+  getReturnType: extractKotlinReturnType,
   resolveBody: (node, _bodyField) => {
     // Kotlin's tree-sitter grammar doesn't use field names, so getChildByField fails.
     // Find body by type: function_body for functions/methods, class_body for classes,

+ 32 - 13
src/extraction/tree-sitter.ts

@@ -2525,22 +2525,41 @@ export class TreeSitterExtractor {
                 calleeName = methodName;
               }
             } else if (
-              (this.language === 'cpp' || this.language === 'c') &&
+              (this.language === 'cpp' || this.language === 'c' || this.language === 'kotlin') &&
               receiver &&
               receiver.type === 'call_expression'
             ) {
-              // C/C++ receiver that is itself a call — `Foo::instance().bar()`,
-              // `openSession()->run()`, `mgr.view().render()`. Keep the inner
-              // call so resolution can infer bar()'s class from what the inner
-              // call RETURNS (#645). Encode as `<innerCallee>().<method>`; the
-              // `().` marker never appears in an ordinary ref, so the C++
-              // resolver can detect and split it. Other languages keep the
-              // bare-name behavior (dropping the receiver) below.
-              const innerFn = getChildByField(receiver, 'function');
-              const innerCallee = innerFn
-                ? getNodeText(innerFn, this.source).replace(/->/g, '.').replace(/\s+/g, '')
-                : '';
-              calleeName = innerCallee ? `${innerCallee}().${methodName}` : methodName;
+              // Receiver that is itself a call — `Foo::instance().bar()`,
+              // `openSession()->run()`, `mgr.view().render()` (C/C++), or
+              // `Foo.getInstance().bar()` (Kotlin). Keep the inner call so
+              // resolution can infer bar()'s class from what the inner call
+              // RETURNS (#645/#608). Encode as `<innerCallee>().<method>`; the
+              // `().` marker never appears in an ordinary ref, so the resolver
+              // can detect and split it. Other languages keep the bare-name
+              // behavior (dropping the receiver) below.
+              let innerCallee: string;
+              let reencode: boolean;
+              if (this.language === 'kotlin') {
+                // tree-sitter-kotlin has no field names — the inner callee is the
+                // call_expression's first named child (a navigation_expression
+                // `Foo.getInstance`, or a bare identifier for a free call).
+                const innerNav = receiver.namedChild(0);
+                innerCallee = innerNav ? getNodeText(innerNav, this.source).replace(/\s+/g, '') : '';
+                // Only re-encode a CLASS / companion-factory chain, whose receiver
+                // chain starts with a capitalized type (`Foo.getInstance().bar()`).
+                // An instance chain (`list.filter{}.map{}`) has a lowercase receiver
+                // whose type we can't recover here — re-encoding it would only drop
+                // the edge (no chain resolution, no bare-name fallback), regressing
+                // recall in fluent codebases. Leave those to the bare-name path.
+                reencode = /^[A-Z]/.test(innerCallee);
+              } else {
+                const innerFn = getChildByField(receiver, 'function');
+                innerCallee = innerFn
+                  ? getNodeText(innerFn, this.source).replace(/->/g, '.').replace(/\s+/g, '')
+                  : '';
+                reencode = !!innerCallee;
+              }
+              calleeName = reencode ? `${innerCallee}().${methodName}` : methodName;
             } else {
               calleeName = methodName;
             }

+ 39 - 15
src/resolution/name-matcher.ts

@@ -577,15 +577,16 @@ export function matchPhpCallChain(
 }
 
 /**
- * Resolve a Java chained call whose receiver is a static factory / fluent call —
+ * Resolve a dotted chained call whose receiver is a static factory / fluent call —
  * `Foo.getInstance().bar()`, encoded by the extractor as `Foo.getInstance().bar`
  * (#645/#608 mechanism). The receiver's type is what `Foo.getInstance` returns
  * (its declared return type); the outer method is then resolved and VALIDATED on
  * it (resolveMethodOnType requires `Type::method` to exist), so a wrong inference
  * yields no edge rather than a wrong one (e.g. a same-named `bar()` on an
- * unrelated class is never matched).
+ * unrelated class is never matched). Shared by the JVM dot-notation languages
+ * (Java, Kotlin) — same receiver shape, same `Class::method` qualified names.
  */
-export function matchJavaCallChain(
+export function matchDottedCallChain(
   ref: UnresolvedRef,
   context: ResolutionContext,
 ): ResolvedRef | null {
@@ -593,20 +594,42 @@ export function matchJavaCallChain(
   if (!m || !m[1] || !m[2]) return null;
   const inner = m[1]; // `Foo.getInstance`
   const method = m[2]; // `bar`
-  // Require an explicit receiver (`Receiver.factory`) — a bare `factory().bar`
-  // chain (a method on `this`) isn't handled here.
   const lastDot = inner.lastIndexOf('.');
-  if (lastDot <= 0) return null;
+
+  // Constructor receiver `Foo(args).method()` (encoded `Foo().method`): a bare,
+  // capitalized inner is a class construction, so the receiver's type is the
+  // class itself — resolve the method on it. Kotlin only: there an unprefixed
+  // capitalized call constructs the class, whereas in Java a bare `Foo()` is a
+  // method call (constructors need `new`), so we must not assume construction.
+  // A lowercase bare inner is a top-level `factory().method()` whose type we
+  // can't recover — bail.
+  if (lastDot <= 0) {
+    if (ref.language !== 'kotlin' || !/^[A-Z]/.test(inner)) return null;
+    return resolveMethodOnType(inner, method, ref, context, 0.85, 'instance-method', importedFqnOf(inner, ref, context));
+  }
+
+  // Factory/fluent receiver `Receiver.factory(args).method()`: the receiver's
+  // type is what `Receiver.factory` returns (its declared return type).
   const factoryClass = inner.slice(0, lastDot).split('.').pop(); // simple class name
   const factoryMethod = inner.slice(lastDot + 1);
   if (!factoryClass || !factoryMethod) return null;
   const ret = lookupCalleeReturnType(`${factoryClass}::${factoryMethod}`, ref, context);
   if (!ret) return null;
-  // When several classes share the returned simple name, the caller file's
-  // import of that type is the only signal that names WHICH one (#314).
+  return resolveMethodOnType(ret, method, ref, context, 0.85, 'instance-method', importedFqnOf(ret, ref, context));
+}
+
+/**
+ * When several classes share a simple type name, the caller file's import of
+ * that type is the only signal that names WHICH one (#314). Returns the imported
+ * FQN for `typeName` in the ref's file, or undefined.
+ */
+function importedFqnOf(
+  typeName: string,
+  ref: UnresolvedRef,
+  context: ResolutionContext,
+): string | undefined {
   const imports = context.getImportMappings(ref.filePath, ref.language);
-  const importedFqn = imports.find((i) => i.localName === ret)?.source;
-  return resolveMethodOnType(ret, method, ref, context, 0.85, 'instance-method', importedFqn);
+  return imports.find((i) => i.localName === typeName)?.source;
 }
 
 /**
@@ -1039,11 +1062,12 @@ export function matchReference(
     if (result) return result;
   }
 
-  // 1d. Java chained static-factory / fluent call — `Foo.getInstance().bar()`
-  // encoded as `Foo.getInstance().bar` (#645/#608 mechanism). Resolve bar's class
-  // from getInstance's declared return type, then validate the method on it.
-  if (ref.language === 'java') {
-    result = matchJavaCallChain(ref, context);
+  // 1d. JVM (Java / Kotlin) chained static-factory / fluent call —
+  // `Foo.getInstance().bar()` encoded as `Foo.getInstance().bar` (#645/#608
+  // mechanism). Resolve bar's class from getInstance's declared return type, then
+  // validate the method on it.
+  if (ref.language === 'java' || ref.language === 'kotlin') {
+    result = matchDottedCallChain(ref, context);
     if (result) return result;
   }