
fix(java): resolve chained static-factory calls Foo.getInstance().bar() (#750) (#751)

A Java method called through a static factory or fluent chain — `Foo.getInstance().bar()`,
`Config.create(opts).build()` — lost the receiver's type, so the chained method either
didn't resolve at all or (when a same-named method existed on an unrelated class) attached
to whichever class was indexed first. Ports the #645 (C++) / #608 (PHP) 3-part mechanism:

- Part 1: capture Java return types in the extractor (skip void/primitives/arrays,
  unwrap generics, strip package qualifier).
- Part 2: encode a chained-call receiver as `inner().method` with normalized empty
  parens, so factory calls that take arguments still split.
- Part 3: matchJavaCallChain resolves the chained method on the factory's return type,
  validated via resolveMethodOnType so a wrong inference yields NO edge (never a wrong one).
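
The three parts above can be sketched end to end as follows. This is a minimal, string-based illustration, not the code in this commit: `MethodInfo`, `Index`, `normalizeReturnType`, `encodeChain`, `resolveChain`, and `cls` are hypothetical names, and the in-memory `Map` index stands in for the real graph lookups (`lookupCalleeReturnType`, `resolveMethodOnType`).

```typescript
// Hypothetical stand-in for the index: class name -> method name -> info.
type MethodInfo = { returnType: string | undefined };
type Index = Map<string, Map<string, MethodInfo>>;

const NON_CLASS = new Set([
  'void', 'int', 'long', 'short', 'byte', 'char', 'float', 'double', 'boolean',
]);

// Part 1 (sketch): reduce a declared return type to a bare class name,
// or undefined when there is no class to chain a method on.
function normalizeReturnType(declared: string): string | undefined {
  const text = declared.trim();
  if (NON_CLASS.has(text) || text.endsWith(']')) return undefined;
  const raw = text.replace(/<[^>]*>/g, ''); // `List<Foo>` -> `List`
  const last = raw.split('.').pop()?.trim(); // `java.util.List` -> `List`
  return last && /^[A-Za-z_]\w*$/.test(last) ? last : undefined;
}

// Part 2 (sketch): encode the chained receiver with empty parens, so any
// factory arguments are dropped before the resolver splits the name.
function encodeChain(innerReceiver: string, innerMethod: string, method: string): string {
  return `${innerReceiver}.${innerMethod}().${method}`;
}

// Part 3 (sketch): split on the `().` marker, look up the factory's return
// type, and return a target only if that type really declares the method.
function resolveChain(ref: string, index: Index): string | undefined {
  const m = ref.match(/^(.+)\(\)\.(\w+)$/);
  if (!m) return undefined;
  const inner = m[1];
  const method = m[2];
  if (!inner || !method) return undefined;
  const lastDot = inner.lastIndexOf('.');
  if (lastDot <= 0) return undefined; // bare `factory().bar` is not handled
  const factoryClass = inner.slice(0, lastDot).split('.').pop();
  const factoryMethod = inner.slice(lastDot + 1);
  if (!factoryClass || !factoryMethod) return undefined;
  const ret = index.get(factoryClass)?.get(factoryMethod)?.returnType;
  if (!ret) return undefined;
  return index.get(ret)?.has(method) ? `${ret}::${method}` : undefined;
}

// Helper for building the toy index from declared return types.
function cls(methods: Record<string, string>): Map<string, MethodInfo> {
  return new Map(
    Object.entries(methods).map(
      ([name, declared]) =>
        [name, { returnType: normalizeReturnType(declared) }] as [string, MethodInfo],
    ),
  );
}

const index: Index = new Map([
  ['Aaa', cls({ bar: 'void' })], // same-named decoy that sorts first
  ['Foo', cls({ getInstance: 'Foo', create: 'Foo', bar: 'void', build: 'void' })],
  ['Other', cls({ onlyOther: 'void' })],
]);

console.log(resolveChain(encodeChain('Foo', 'getInstance', 'bar'), index)); // Foo::bar
console.log(resolveChain(encodeChain('Foo', 'create', 'build'), index)); // Foo::build
console.log(resolveChain(encodeChain('Foo', 'getInstance', 'onlyOther'), index)); // undefined
```

In this toy index the decoy `Aaa::bar` is never considered, because the lookup goes through the return type of `Foo.getInstance` rather than through the method name alone, and the absent `onlyOther` produces no target instead of falling back to `Other::onlyOther`.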

Validated: synthetic decoy + absent-method safety tests; real-repo A/B on google/guava
(3,227 files) — node count identical (no explosion), 0 edges lost, +1,507 unique chained
edges recovered, precision spot-checked verbatim (Splitter.on().split(),
CacheBuilder.newBuilder().recordStats(), GraphBuilder.directed().build(), nested
MultimapBuilder.linkedHashKeys().arrayListValues()). EXTRACTION_VERSION 5 -> 6.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry, 2 weeks ago
parent
commit
7f6bdf7ad1

+ 1 - 0
CHANGELOG.md

@@ -29,6 +29,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### Fixes
 
+- Java method calls made through a static factory or fluent chain now resolve to the correct class. A call like `Foo.getInstance().bar()` or `Config.create(opts).build()` used to lose the receiver's type, so when two classes had a same-named method the call silently attached to whichever was indexed first — or didn't resolve at all — corrupting callers, impact, and trace. CodeGraph now captures Java return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Covers factories and fluent builders that take arguments (`hashKeys().arrayListValues()`), including builders that return a nested type. Existing Java indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Java)
 - PHP: a method called through a chained static factory — `Cls::for($x)->method(...)`, the canonical Laravel per-credential / per-tenant client idiom — now records a caller edge. Previously the receiver type (what `for()` returns) was never recovered, so `codegraph_callers` returned nothing for the method and the call was invisible to `codegraph_impact`. CodeGraph now captures PHP return types — `: self` / `: static` resolve to the declaring class, `: SomeClass` to that class — and resolves the chained method on the factory's result, creating the edge only when that class actually has the method (so a wrong inference produces no edge). Existing PHP indexes should be re-indexed (`codegraph index -f`) to benefit. Thanks @cvanderlinden. (#608) (PHP)
 - Search relevance: including the project name in a query (a user naturally writes `MyApp backend routes`) no longer buries the part of the codebase the query is actually about. The project name lexically matches whatever stack embeds it — a `MyAppFrontend/` directory, a `MyAppApp` class — and it was over-weighted two ways: a single PascalCase word was scored once per sub-token (`my` / `app` / `myapp`), so one concept boosted that path several times over; and the name carried full path / disambiguation weight even though it names the whole repo, not any symbol. Now path relevance counts each query word once, and a word matching the project name (derived from `go.mod`, `package.json`, or the repo directory) is dropped from path scoring and from `codegraph_explore`'s type-disambiguation bias — unless it's the only term, so a bare project-name search still works. In a mixed-stack repo, a backend question now surfaces the backend even with the project name in the query. Thanks @MiNuo1. (#720)
 - Go: a function called only from inside an anonymous closure — a cobra `RunE: func(…) {…}` handler, a goroutine literal, or a callback closure stored in a package-level `var` — now shows its real caller. Previously the call leaked to the file node, so `codegraph_callers` and `codegraph_impact` reported such a function as having no meaningful caller; the call is now attributed to the enclosing declaration, so editing the function surfaces the closures that use it. Existing Go indexes should be re-indexed (`codegraph index -f`) to benefit. Thanks @Cyclone1070. (#693) (Go)

+ 68 - 0
__tests__/resolution.test.ts

@@ -2195,4 +2195,72 @@ void wrong() { WidgetFactory::create().onlyOther(); }
       expect(callerNamesOf('Other::onlyOther')).toEqual([]);
     });
   });
+
+  describe('Java chained static-factory call resolution (#645/#608 mechanism)', () => {
+    function callerNamesOf(qualifiedName: string): string[] {
+      const target = cg.getNodesByKind('method').find((n) => n.qualifiedName === qualifiedName);
+      if (!target) return [];
+      const names = cg
+        .getIncomingEdges(target.id)
+        .filter((e) => e.kind === 'calls')
+        .map((e) => cg.getNode(e.source)?.name)
+        .filter((n): n is string => !!n);
+      return [...new Set(names)].sort();
+    }
+
+    it('resolves Foo.getInstance().bar() via the factory return type, never a same-named decoy', async () => {
+      // Aaa sorts first and has a same-named bar() — it must never win the chain.
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.java'),
+        `class Aaa { void bar() {} }
+class Foo {
+    static Foo getInstance() { return new Foo(); }
+    void bar() {}
+}
+class Caller {
+    void run() { Foo.getInstance().bar(); }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Foo::bar')).toEqual(['run']);
+      expect(callerNamesOf('Aaa::bar')).toEqual([]);
+    });
+
+    it('resolves a factory chain that passes arguments — Foo.create(cfg).build()', async () => {
+      // The factory call carries an argument; the extractor must normalize the
+      // receiver to empty parens (`Foo.create().build`) so the chain still splits.
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.java'),
+        `class Config {}
+class Foo {
+    static Foo create(Config c) { return new Foo(); }
+    void build() {}
+}
+class Caller {
+    void run() { Foo.create(new Config()).build(); }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Foo::build')).toEqual(['run']);
+    });
+
+    it('creates NO edge when the factory return type lacks the method (silent miss, not a wrong edge)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.java'),
+        `class Foo {
+    static Foo getInstance() { return new Foo(); }
+}
+class Other { void onlyOther() {} }
+class Caller {
+    void run() { Foo.getInstance().onlyOther(); }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      // Foo has no onlyOther() — must not mis-attach to the same-named Other::onlyOther.
+      expect(callerNamesOf('Other::onlyOther')).toEqual([]);
+    });
+  });
 });

+ 1 - 1
src/extraction/extraction-version.ts

@@ -21,4 +21,4 @@
  * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty
  * in the product is load-bearing").
  */
-export const EXTRACTION_VERSION = 5;
+export const EXTRACTION_VERSION = 6;

+ 35 - 0
src/extraction/languages/java.ts

@@ -2,6 +2,40 @@ import type { Node as SyntaxNode } from 'web-tree-sitter';
 import { getNodeText, getChildByField } from '../tree-sitter-helpers';
 import type { LanguageExtractor } from '../tree-sitter-types';
 
+/**
+ * Tree-sitter-java node types for a method's `type` (return) field that can
+ * never be a method receiver — there's no class to chain a `.method()` on, so we
+ * store no `returnType` for them.
+ */
+const JAVA_NON_CLASS_RETURN_NODES = new Set([
+  'void_type',
+  'integral_type', // int, long, short, byte, char
+  'floating_point_type', // float, double
+  'boolean_type',
+]);
+
+/**
+ * A Java method's declared return type, normalized to the bare class name a
+ * chained `Foo.getInstance().bar()` could be called on (the #645/#608 mechanism).
+ * Reads the `type` field: primitives/void/arrays yield undefined (no class to
+ * chain on), `List<Foo>` is unwrapped to its base type `List`, and a dotted
+ * package/outer-class qualifier (`java.util.List`) is stripped to the simple
+ * name. Constructors have no `type` field → undefined.
+ */
+function extractJavaReturnType(node: SyntaxNode, source: string): string | undefined {
+  const typeNode = getChildByField(node, 'type');
+  if (!typeNode) return undefined;
+  if (JAVA_NON_CLASS_RETURN_NODES.has(typeNode.type)) return undefined;
+  // An array return (`Foo[]`) isn't a receiver you call instance methods on.
+  if (typeNode.type === 'array_type') return undefined;
+  // Strip type arguments, including one level of nesting (`List<Foo>` → `List`, `Map<K, List<V>>` → `Map`) — the chain resolves on the base.
+  const raw = getNodeText(typeNode, source).trim().replace(/<(?:[^<>]|<[^<>]*>)*>/g, '');
+  // Strip a dotted package / outer-class qualifier (`java.util.List` → `List`).
+  const last = raw.split('.').pop()?.trim();
+  if (!last || !/^[A-Za-z_]\w*$/.test(last)) return undefined;
+  return last;
+}
+
 export const javaExtractor: LanguageExtractor = {
   functionTypes: [],
   classTypes: ['class_declaration'],
@@ -23,6 +57,7 @@ export const javaExtractor: LanguageExtractor = {
   bodyField: 'body',
   paramsField: 'parameters',
   returnField: 'type',
+  getReturnType: extractJavaReturnType,
   getSignature: (node, source) => {
     const params = getChildByField(node, 'parameters');
     const returnType = getChildByField(node, 'type');

+ 27 - 0
src/extraction/tree-sitter.ts

@@ -2376,6 +2376,33 @@ export class TreeSitterExtractor {
         return;
       }
 
+      // Java static-factory / fluent chain: `Foo.getInstance().bar()` — the
+      // receiver is itself a method call, so resolution must infer bar's class
+      // from what `Foo.getInstance` RETURNS (its declared return type), the
+      // #645/#608 mechanism. Encode `<inner-receiver>.<inner-method>().<method>`;
+      // the `().` marker lets the Java chain resolver split it, and normalizing to
+      // empty parens drops any factory args (`Foo.create(cfg).bar()`) that would
+      // otherwise leave a `(cfg)` in the receiver text and break the split.
+      if (
+        methodName &&
+        this.language === 'java' &&
+        objectField.type === 'method_invocation'
+      ) {
+        const innerObj = getChildByField(objectField, 'object');
+        const innerName = getChildByField(objectField, 'name');
+        if (innerObj && innerName) {
+          calleeName = `${getNodeText(innerObj, this.source)}.${getNodeText(innerName, this.source)}().${methodName}`;
+          this.unresolvedReferences.push({
+            fromNodeId: callerId,
+            referenceName: calleeName,
+            referenceKind: 'calls',
+            line: node.startPosition.row + 1,
+            column: node.startPosition.column,
+          });
+          return;
+        }
+      }
+
       let receiverName: string;
       if (objectField.type === 'field_access') {
         const inner = getChildByField(objectField, 'object');

+ 41 - 0
src/resolution/name-matcher.ts

@@ -576,6 +576,39 @@ export function matchPhpCallChain(
   return resolveMethodOnType(resolvedClass, method, ref, context, 0.85, 'instance-method');
 }
 
+/**
+ * Resolve a Java chained call whose receiver is a static factory / fluent call —
+ * `Foo.getInstance().bar()`, encoded by the extractor as `Foo.getInstance().bar`
+ * (#645/#608 mechanism). The receiver's type is what `Foo.getInstance` returns
+ * (its declared return type); the outer method is then resolved and VALIDATED on
+ * it (resolveMethodOnType requires `Type::method` to exist), so a wrong inference
+ * yields no edge rather than a wrong one (e.g. a same-named `bar()` on an
+ * unrelated class is never matched).
+ */
+export function matchJavaCallChain(
+  ref: UnresolvedRef,
+  context: ResolutionContext,
+): ResolvedRef | null {
+  const m = ref.referenceName.match(/^(.+)\(\)\.(\w+)$/);
+  if (!m || !m[1] || !m[2]) return null;
+  const inner = m[1]; // `Foo.getInstance`
+  const method = m[2]; // `bar`
+  // Require an explicit receiver (`Receiver.factory`) — a bare `factory().bar`
+  // chain (a method on `this`) isn't handled here.
+  const lastDot = inner.lastIndexOf('.');
+  if (lastDot <= 0) return null;
+  const factoryClass = inner.slice(0, lastDot).split('.').pop(); // simple class name
+  const factoryMethod = inner.slice(lastDot + 1);
+  if (!factoryClass || !factoryMethod) return null;
+  const ret = lookupCalleeReturnType(`${factoryClass}::${factoryMethod}`, ref, context);
+  if (!ret) return null;
+  // When several classes share the returned simple name, the caller file's
+  // import of that type is the only signal that names WHICH one (#314).
+  const imports = context.getImportMappings(ref.filePath, ref.language);
+  const importedFqn = imports.find((i) => i.localName === ret)?.source;
+  return resolveMethodOnType(ret, method, ref, context, 0.85, 'instance-method', importedFqn);
+}
+
 /**
  * Java/Kotlin: infer a receiver's declared type by walking field declarations
  * in the class enclosing the call site. The field's `signature` is already in
@@ -1006,6 +1039,14 @@ export function matchReference(
     if (result) return result;
   }
 
+  // 1d. Java chained static-factory / fluent call — `Foo.getInstance().bar()`
+  // encoded as `Foo.getInstance().bar` (#645/#608 mechanism). Resolve bar's class
+  // from getInstance's declared return type, then validate the method on it.
+  if (ref.language === 'java') {
+    result = matchJavaCallChain(ref, context);
+    if (result) return result;
+  }
+
   // 2. Method call pattern
   result = matchMethodCall(ref, context);
   if (result) return result;