
fix(scala): resolve chained static-factory/apply calls Foo.create().bar() (#750) (#761)

Ports the #645 (C++) / #608 (PHP) chained-receiver mechanism to Scala. A call
whose receiver is itself a call, such as `Foo.create().bar()` (companion
factory), `Builder(cfg).bar()` (case-class apply), or a fluent chain, used to
lose its receiver and be extracted as a bare `bar`, which then name-matched a
same-named method on an unrelated type. The most common wrong edge was a
stdlib `Option`/`Iterator` `.map`/`.flatMap`/`.foreach` call mis-attributed to
a same-named method on one of the project's own classes.

- scala.ts: `getReturnType` reads the `return_type` field — generic `List[Foo]`
  → container `List`, qualified `pkg.Foo` → `Foo`, `this.type` left undefined.
- tree-sitter.ts: re-encode `Foo.create().bar` when the inner call's receiver chain
  starts with a capital (companion factory / case-class apply); instance chains
  (`list.map().filter()`) stay bare.
- name-matcher.ts: `scala` joins the dotted-chain gate + CONSTRUCTS_VIA_BARE_CALL
  (case-class `apply` constructs the class); resolveMethodOnType validates, so a
  non-conventional `apply` returning another type yields no edge, not a wrong one.
- index.ts: `scala` joins CHAIN_LANGUAGES so trait-inherited methods resolve via
  the conformance second pass.
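The return-type normalization in the first bullet can be checked on plain strings. The sketch below is a hypothetical string-only mirror of `extractScalaReturnType` (the name `normalizeScalaReturnType` is invented here; the committed function reads a tree-sitter node instead of a string). Tracing its regexes shows one consequence the bullet does not spell out: a nested generic such as `Map[String, List[Int]]` leaves a stray `]` behind, fails the identifier check, and yields no type, so the chain falls through rather than resolving.

```typescript
// Hypothetical string-only mirror of extractScalaReturnType: takes the raw
// text of a `return_type` field and returns a bare type name, or undefined.
function normalizeScalaReturnType(raw: string): string | undefined {
  const text = raw.trim();
  // `this.type` singleton self-type (fluent builders): the concrete type is
  // not recoverable from the text, so give up rather than guess.
  if (text.startsWith('this.')) return undefined;
  const base = text
    .replace(/\[[^\]]*\]/g, '') // strip one level of generic args: List[Bar] → List
    .replace(/\s+/g, ''); // drop whitespace
  const last = base.split('.').pop(); // qualified pkg.Bar → Bar
  // Anything that is not a plain identifier (tuples, function types,
  // leftovers from nested generics) yields no type.
  if (!last || !/^[A-Za-z_]\w*$/.test(last)) return undefined;
  return last;
}
```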

Validation: 4 synthetic tests (factory+decoy, case-class apply, trait conformance,
absent-method safety). Real-repo A/B on gatling (750 Scala files): +14 / -59 unique
edges — all corrections. The +14 are retargets (e.g. `HttpProtocolBuilder(cfg).baseUrl`
now resolves to HttpProtocolBuilder::baseUrl, not the same-named private BaseUrlSupport
helper); the -59 are wrong edges removed (stdlib Option/Iterator monad calls
mis-tied to the project's Validation::*, self-loops, decoy collisions) — zero genuine
factory chains dropped (verified: gatling has no real Validation.success().map() chains).
db stable at 40 MB. EXTRACTION_VERSION 12→13. Full suite green.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry, 2 weeks ago
Parent
Commit
2f96f58cbb

+ 1 - 0
CHANGELOG.md

@@ -30,6 +30,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ### Fixes
 
 - Go method calls made through a chained factory function now resolve to the correct type. A call like `New().Method()` used to drop the receiver, so the chained method attached to a same-named method on an unrelated type — or didn't resolve. CodeGraph now captures Go return types (a pointer `*Foo` resolves to `Foo`, and a multi-return `(*Foo, error)` to its first result), infers the chained receiver's type from what the factory function returns, and resolves the method on it — including methods promoted from an embedded struct — creating the edge only when the type or an embedded type genuinely has the method. Existing Go indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Go)
+- Scala method calls made through a companion-object factory, a fluent chain, or a case-class `apply` now resolve to the correct type. A call like `Foo.create().bar()` or `Builder(cfg).bar()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — most often mis-attributing a standard-library `Option` / `Iterator` `.map` / `.flatMap` / `.foreach` onto your own same-named class. CodeGraph now captures Scala return types (a generic `List[Foo]` resolves to its container `List`, a qualified `pkg.Foo` to `Foo`), infers the chained receiver's type from what the inner call returns or constructs, and resolves the method on it — including methods inherited from a trait the type extends — creating the edge only when that type or one of its traits genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Scala indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Scala)
 - Rust method calls made through a chained associated function now resolve to the correct type. A call like `Foo::new().bar()` or `Foo::with(cfg).build()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — or didn't resolve. CodeGraph now captures Rust return types (`-> Self` resolves to the implementing type), infers the chained receiver's type from what the associated function returns, and resolves the method on it — including methods provided by a trait the type implements (via the new `impl Trait for Type` relationships) — creating the edge only when the type or one of its traits genuinely has the method. Existing Rust indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Rust)
 - Chained method calls now resolve when the chained method is **inherited from a superclass or declared on an interface/protocol** the receiver's type conforms to — for example a call on a sealed-subclass instance (`Either.Right(x).combine(...)`) that invokes a method defined on its parent type. Previously these chains found no caller edge even though the factory's type was known, so the call was invisible to callers, impact, and trace. CodeGraph now walks the type's supertypes (its `extends` / `implements` relationships) to find the method, creating the edge only when a supertype genuinely declares it (so a wrong inference still produces no edge). This makes Java, Kotlin, and C# factory and fluent chains more complete. Existing indexes should be re-indexed (`codegraph index -f`) to benefit. (#750)
 - Swift method calls made through a static factory, fluent chain, or constructor now resolve to the correct class. A call like `Foo.make().draw()` or `Foo().draw()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated class — or didn't resolve at all. CodeGraph now captures Swift return types and infers the chained receiver's type from what the inner call returns (or the constructed type), creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Swift indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Swift)
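Several of these entries share one guard: an edge is created only when the inferred type, or one of its supertypes, actually declares the method. A minimal sketch of such a check, assuming a simplified in-memory symbol table (`TypeInfo`, `SymbolTable`, and `findDeclaringType` are hypothetical names, not CodeGraph's API); the breadth-first search order is an illustrative choice, not necessarily what CodeGraph does.

```typescript
// Hypothetical, simplified symbol table: each type lists its own methods
// and its direct supertypes (extends / implements / mixed-in traits).
interface TypeInfo {
  methods: Set<string>;
  supertypes: string[];
}

type SymbolTable = Map<string, TypeInfo>;

// Return the name of the type that declares `method`, checking `typeName`
// first and then its supertypes breadth-first. Returns undefined when no
// type in the hierarchy declares it, so the caller creates no edge.
function findDeclaringType(
  table: SymbolTable,
  typeName: string,
  method: string,
): string | undefined {
  const queue: string[] = [typeName];
  const seen = new Set<string>();
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (seen.has(current)) continue; // guard against cycles and diamonds
    seen.add(current);
    const info = table.get(current);
    if (!info) continue; // external or unindexed type: nothing to check
    if (info.methods.has(method)) return current;
    queue.push(...info.supertypes);
  }
  return undefined;
}
```

With `Widget` extending `Base`, looking up `shared` from `Widget` returns `Base` (the trait-conformance case in this commit's tests), and looking up `onlyOther` on a method-less `Bar` returns `undefined`, i.e. no edge rather than a mis-attached one.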

+ 98 - 0
__tests__/resolution.test.ts

@@ -2728,4 +2728,102 @@ func caller() { engine().ServeHTTP() }
       expect(rawCalls.length).toBeLessThan(5);
     });
   });
+
+  describe('Scala chained static-factory call resolution (#645/#608 mechanism)', () => {
+    function callerNamesOf(qualifiedName: string): string[] {
+      const target = cg.getNodesByKind('method').find((n) => n.qualifiedName === qualifiedName);
+      if (!target) return [];
+      const names = cg
+        .getIncomingEdges(target.id)
+        .filter((e) => e.kind === 'calls')
+        .map((e) => cg.getNode(e.source)?.name)
+        .filter((n): n is string => !!n);
+      return [...new Set(names)].sort();
+    }
+
+    it('resolves a companion-factory chain Foo.create().doIt() to the return type, never a same-named decoy', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.scala'),
+        `object Foo {
+  def create(): Bar = new Bar()
+}
+class Bar {
+  def doIt(): Unit = {}
+}
+class Decoy {
+  def doIt(): Unit = {}
+}
+object Main {
+  def run(): Unit = { Foo.create().doIt() }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Bar::doIt')).toEqual(['run']);
+      expect(callerNamesOf('Decoy::doIt')).toEqual([]);
+    });
+
+    it('resolves a case-class apply construction Point(x).dist() on the constructed class', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.scala'),
+        `class Point(x: Int) {
+  def dist(): Int = x
+}
+class Other {
+  def dist(): Int = 0
+}
+object Main {
+  def run(): Unit = { Point(3).dist() }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Point::dist')).toEqual(['run']);
+      expect(callerNamesOf('Other::dist')).toEqual([]);
+    });
+
+    it('resolves a chained method provided by a trait the return type extends (via conformance)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.scala'),
+        `trait Base {
+  def shared(): Unit = {}
+}
+class Widget extends Base
+class Decoy {
+  def shared(): Unit = {}
+}
+object Factory {
+  def make(): Widget = new Widget()
+}
+object Main {
+  def run(): Unit = { Factory.make().shared() }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Base::shared')).toEqual(['run']);
+      expect(callerNamesOf('Decoy::shared')).toEqual([]);
+    });
+
+    it('creates NO edge when neither the factory return type nor a supertype has the method (silent miss)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.scala'),
+        `object Foo {
+  def create(): Bar = new Bar()
+}
+class Bar {
+}
+class Other {
+  def onlyOther(): Unit = {}
+}
+object Main {
+  def run(): Unit = { Foo.create().onlyOther() }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      // Bar has no onlyOther() — must not mis-attach to the same-named Other::onlyOther.
+      expect(callerNamesOf('Other::onlyOther')).toEqual([]);
+    });
+  });
 });

+ 1 - 1
src/extraction/extraction-version.ts

@@ -21,4 +21,4 @@
  * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty
  * in the product is load-bearing").
  */
-export const EXTRACTION_VERSION = 12;
+export const EXTRACTION_VERSION = 13;

+ 23 - 0
src/extraction/languages/scala.ts

@@ -44,6 +44,28 @@ function emitScalaTypeRefs(typeNode: SyntaxNode, fromId: string, ctx: { addUnres
   }
 }
 
+/**
+ * Capture a Scala method's declared return type as a bare type name, for the
+ * chained static-factory / fluent call mechanism (#750). `def create(): Bar`
+ * yields `Bar`; a generic `List[Bar]` yields its base `List` (the method is on
+ * the container, not the element); a qualified `pkg.Bar` yields `Bar`. A
+ * singleton self-type (`this.type`, the fluent-builder idiom) is left undefined
+ * — its type can't be recovered here, so the chain falls through rather than
+ * inferring a wrong receiver.
+ */
+function extractScalaReturnType(node: SyntaxNode, source: string): string | undefined {
+  const rt = node.childForFieldName('return_type');
+  if (!rt) return undefined;
+  const raw = getNodeText(rt, source).trim();
+  if (raw.startsWith('this.')) return undefined; // `this.type` singleton — unhandled
+  const base = raw
+    .replace(/\[[^\]]*\]/g, '') // strip generic args: List[Bar] → List
+    .replace(/\s+/g, '');
+  const last = base.split('.').pop(); // qualified pkg.Bar → Bar
+  if (!last || !/^[A-Za-z_]\w*$/.test(last)) return undefined;
+  return last;
+}
+
 function extractVisibility(node: SyntaxNode): 'public' | 'private' | 'protected' {
   for (let i = 0; i < node.namedChildCount; i++) {
     const child = node.namedChild(i);
@@ -77,6 +99,7 @@ export const scalaExtractor: LanguageExtractor = {
   bodyField: 'body',
   paramsField: 'parameters',
   returnField: 'return_type',
+  getReturnType: extractScalaReturnType,
   interfaceKind: 'trait',
 
   classifyClassNode: (node: SyntaxNode) => {

+ 7 - 1
src/extraction/tree-sitter.ts

@@ -2530,7 +2530,8 @@ export class TreeSitterExtractor {
                 this.language === 'kotlin' ||
                 this.language === 'swift' ||
                 this.language === 'rust' ||
-                this.language === 'go') &&
+                this.language === 'go' ||
+                this.language === 'scala') &&
               receiver &&
               receiver.type === 'call_expression'
             ) {
@@ -2572,6 +2573,11 @@ export class TreeSitterExtractor {
                 // only drop the edge. C/C++ re-encode any inner.
                 if (this.language === 'rust') reencode = innerFn?.type === 'scoped_identifier';
                 else if (this.language === 'go') reencode = innerFn?.type === 'identifier';
+                // Scala: only a companion-factory / case-class-apply chain whose
+                // receiver chain starts with a capitalized type (`Foo.create().bar()`,
+                // `Foo(args).bar()`). An instance chain (`list.map().filter()`) has a
+                // lowercase receiver whose type we can't recover — leave it bare.
+                else if (this.language === 'scala') reencode = /^[A-Z]/.test(innerCallee);
                 else reencode = !!innerCallee;
               }
               calleeName = reencode ? `${innerCallee}().${methodName}` : methodName;
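Reduced to Scala's rule, the encoding decision above is a capitalization test on the inner callee. A hypothetical standalone version (`encodeScalaChainedCallee` is an invented name), assuming the inner callee text is `Foo.create` for `Foo.create()` and `Builder` for `Builder(cfg)`:

```typescript
// Hypothetical standalone version of the Scala branch of the callee
// encoding: re-encode as `<inner>().<method>` or keep the bare method name.
function encodeScalaChainedCallee(
  innerCallee: string | undefined,
  methodName: string,
): string {
  // Re-encode only when the inner call's receiver chain starts with a
  // capitalized name: a companion factory (`Foo.create`) or a case-class
  // apply (`Builder`). Lowercase receivers are instances of unknown type.
  const reencode = innerCallee !== undefined && /^[A-Z]/.test(innerCallee);
  return reencode ? `${innerCallee}().${methodName}` : methodName;
}
```

The test is a naming-convention heuristic: a companion object shares its class's capitalized name, so it passes, while an object declared with a lowercase name would stay bare and fall back to name matching.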

+ 1 - 1
src/resolution/index.ts

@@ -37,7 +37,7 @@ const SUPERTYPE_BEARING_KINDS = new Set<Node['kind']>([
  * second pass. Dotted-receiver languages resolve via matchDottedCallChain; the
  * `::`-receiver ones (Rust) via matchScopedCallChain.
  */
-const CHAIN_LANGUAGES = new Set(['java', 'kotlin', 'csharp', 'swift', 'rust', 'go']);
+const CHAIN_LANGUAGES = new Set(['java', 'kotlin', 'csharp', 'swift', 'rust', 'go', 'scala']);
 const SCOPED_CHAIN_LANGUAGES = new Set(['rust']);
 
 /** The extractor's chained-receiver encoding: `<inner>().<method>`. */
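A consumer of the `<inner>().<method>` encoding has to split it back apart. A minimal sketch, assuming the method part is a plain identifier; `decodeChainedCallee` and `ChainedCallee` are hypothetical names, and splitting on the last `().` is a choice that keeps any earlier encoded segment inside the inner part.

```typescript
// Hypothetical decoded form of a chained callee such as `Foo.create().bar`.
interface ChainedCallee {
  inner: string; // e.g. `Foo.create`, or `Point` for a bare construction
  method: string; // e.g. `bar`
}

function decodeChainedCallee(callee: string): ChainedCallee | undefined {
  const marker = '().';
  // Split on the LAST marker so anything before it stays in `inner`.
  const idx = callee.lastIndexOf(marker);
  if (idx <= 0) return undefined; // not encoded, or an empty inner call
  const inner = callee.slice(0, idx);
  const method = callee.slice(idx + marker.length);
  if (!/^[A-Za-z_]\w*$/.test(method)) return undefined; // malformed method part
  return { inner, method };
}
```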

+ 11 - 6
src/resolution/name-matcher.ts

@@ -600,9 +600,12 @@ export function matchScopedCallChain(
 /**
  * Languages where an unprefixed capitalized call `Foo(args)` constructs the
  * class (so a `Foo(args).method()` receiver's type is `Foo`). Java/C# need `new`,
- * so a bare `Foo()` there is a method call, not construction — excluded.
+ * so a bare `Foo()` there is a method call, not construction — excluded. Scala's
+ * `Foo(args)` is a case-class / companion `apply`, which conventionally returns
+ * `Foo` — and resolveMethodOnType validates, so a non-conventional `apply` that
+ * returns another type simply yields no edge rather than a wrong one.
  */
-const CONSTRUCTS_VIA_BARE_CALL = new Set(['kotlin', 'swift']);
+const CONSTRUCTS_VIA_BARE_CALL = new Set(['kotlin', 'swift', 'scala']);
 
 /**
  * Resolve a dotted chained call whose receiver is a static factory / fluent call —
@@ -1120,15 +1123,17 @@ export function matchReference(
   }
 
   // 1d. Dotted chained static-factory / fluent call (Java / Kotlin / C# / Swift /
-  // Go) — `Foo.getInstance().bar()` encoded as `Foo.getInstance().bar`, or Go's
-  // bare-factory `New().Method()` as `New().Method` (#645/#608 mechanism). Resolve
-  // the method's class from the inner call's declared return type, then validate it.
+  // Go / Scala) — `Foo.getInstance().bar()` encoded as `Foo.getInstance().bar`,
+  // Go's bare-factory `New().Method()` as `New().Method`, or Scala's companion
+  // factory `Foo.create().bar()` (#645/#608 mechanism). Resolve the method's class
+  // from the inner call's declared return type, then validate it.
   if (
     ref.language === 'java' ||
     ref.language === 'kotlin' ||
     ref.language === 'csharp' ||
     ref.language === 'swift' ||
-    ref.language === 'go'
+    ref.language === 'go' ||
+    ref.language === 'scala'
   ) {
     result = matchDottedCallChain(ref, context);
     if (result) return result;
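Step 1d first needs a candidate receiver type for the inner call. A hypothetical sketch of that inference, assuming precomputed lookup tables (`returnTypes`, `classes`, and `inferChainedReceiverType` are invented for illustration): a factory call uses its declared return type, and a bare capitalized call in a `CONSTRUCTS_VIA_BARE_CALL` language is treated as constructing the named class. The result is only a candidate; as the comment above says, it still has to be validated against the type's methods.

```typescript
// Copy of the set from the name-matcher hunk: languages where a bare
// capitalized call `Foo(args)` constructs the class.
const CONSTRUCTS_VIA_BARE_CALL = new Set(['kotlin', 'swift', 'scala']);

// Hypothetical inference of the chained receiver's type.
//   returnTypes: inner callee (e.g. `Foo.create`) → normalized return type
//   classes:     names of indexed classes
function inferChainedReceiverType(
  language: string,
  inner: string,
  returnTypes: Map<string, string>,
  classes: Set<string>,
): string | undefined {
  // Companion factory / static call: use its declared return type.
  const declared = returnTypes.get(inner);
  if (declared !== undefined) return declared;
  // Bare capitalized call: constructs the class in these languages
  // (for Scala, via the case-class / companion `apply`).
  if (CONSTRUCTS_VIA_BARE_CALL.has(language) && !inner.includes('.') && classes.has(inner)) {
    return inner;
  }
  return undefined; // unknown: leave the call unresolved
}
```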