Ver código fonte

fix(php): resolve chained static-factory calls `Cls::for($x)->method()` (#608) (#749)

A method called through a PHP fluent static factory — `ApiClient::for($c)->createOrder()`,
the canonical Laravel per-credential/per-tenant client idiom — produced no
`calls` edge: the receiver of `->createOrder` is the `Cls::for(...)` static
call, whose result type was never recovered, so the edge was dropped and
`codegraph_callers` returned nothing.

Same shape as the C++ singleton/factory fix (#645), reusing its return_type
column + the chained-call mechanism:
- Capture PHP return types (getReturnType): `: self` / `: static` / `$this`
  stored as the `self` marker, a concrete `: Type` as its short name,
  primitives/unions dropped.
- Encode the chained scoped-call receiver as `Cls::for().method` so the
  resolver can split it (PHP-gated, in extractCall).
- New matchPhpCallChain: look up the factory's return type (`self` → the
  factory's own class; concrete → that class), then resolve AND validate the
  method on it — a wrong inference yields no edge, never a wrong one.

EXTRACTION_VERSION 4->5 (re-index to populate PHP return types + chained edges).

Validated on koel (1383 PHP files): node count identical (no explosion),
0 edges lost, +80 chained-call edges recovered; synthetic tests cover the
self-factory, concrete-return, namespace, decoy, and absent-method cases.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 2 semanas atrás
pai
commit
eb5960b535

+ 1 - 0
CHANGELOG.md

@@ -29,6 +29,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### Fixes
 
+- PHP: a method called through a chained static factory — `Cls::for($x)->method(...)`, the canonical Laravel per-credential / per-tenant client idiom — now records a caller edge. Previously the receiver type (what `for()` returns) was never recovered, so `codegraph_callers` returned nothing for the method and the call was invisible to `codegraph_impact`. CodeGraph now captures PHP return types — `: self` / `: static` resolve to the declaring class, `: SomeClass` to that class — and resolves the chained method on the factory's result, creating the edge only when that class actually has the method (so a wrong inference produces no edge). Existing PHP indexes should be re-indexed (`codegraph index -f`) to benefit. Thanks @cvanderlinden. (#608) (PHP)
 - Search relevance: including the project name in a query (a user naturally writes `MyApp backend routes`) no longer buries the part of the codebase the query is actually about. The project name lexically matches whatever stack embeds it — a `MyAppFrontend/` directory, a `MyAppApp` class — and it was over-weighted two ways: a single PascalCase word was scored once per sub-token (`my` / `app` / `myapp`), so one concept boosted that path several times over; and the name carried full path / disambiguation weight even though it names the whole repo, not any symbol. Now path relevance counts each query word once, and a word matching the project name (derived from `go.mod`, `package.json`, or the repo directory) is dropped from path scoring and from `codegraph_explore`'s type-disambiguation bias — unless it's the only term, so a bare project-name search still works. In a mixed-stack repo, a backend question now surfaces the backend even with the project name in the query. Thanks @MiNuo1. (#720)
 - Go: a function called only from inside an anonymous closure — a cobra `RunE: func(…) {…}` handler, a goroutine literal, or a callback closure stored in a package-level `var` — now shows its real caller. Previously the call leaked to the file node, so `codegraph_callers` and `codegraph_impact` reported such a function as having no meaningful caller; the call is now attributed to the enclosing declaration, so editing the function surfaces the closures that use it. Existing Go indexes should be re-indexed (`codegraph index -f`) to benefit. Thanks @Cyclone1070. (#693) (Go)
 - Indexing no longer aborts when a `.gitignore` contains non-UTF-8 bytes or an unparseable pattern. A `.gitignore` transparently encrypted in place by corporate DLP / endpoint-security software (a common enterprise scenario) — or one with a stray pattern the matcher can't compile (`\[`, producing "Unterminated character class") — used to crash the entire `sync` / `index` with a screen of garbled bytes and never name the offending file, leaving `Files: 0 / Nodes: 0`. CodeGraph now skips a `.gitignore` that isn't valid UTF-8 text whole, drops only the individual unparseable patterns from a text one, and logs a warning naming the file — indexing continues either way. Thanks @zhanghang-9527. (#682)

+ 24 - 0
__tests__/extraction.test.ts

@@ -2406,6 +2406,30 @@ end
     });
   });
 
+  describe('PHP return type capture (#608)', () => {
+    it('captures self/static factory returns as the `self` marker; primitives as undefined', () => {
+      const code = `<?php
+class ApiClient {
+    public static function for(string $c): self { return new self; }
+    public static function make(): static { return new static; }
+    public function send(array $p): array { return []; }
+}`;
+      const result = extractFromSource('ApiClient.php', code);
+      expect(result.nodes.find((n) => n.name === 'for' && n.kind === 'method')?.returnType).toBe('self');
+      expect(result.nodes.find((n) => n.name === 'make' && n.kind === 'method')?.returnType).toBe('self');
+      // `array` is not a class to chain on → no return type recorded.
+      expect(result.nodes.find((n) => n.name === 'send' && n.kind === 'method')?.returnType).toBeUndefined();
+    });
+
+    it('captures a concrete return type as its short class name', () => {
+      const code = `<?php
+namespace App;
+class WidgetFactory { public static function make(): Widget { return new Widget(); } }`;
+      const result = extractFromSource('WidgetFactory.php', code);
+      expect(result.nodes.find((n) => n.name === 'make' && n.kind === 'method')?.returnType).toBe('Widget');
+    });
+  });
+
   describe('C/C++ return type capture (#645)', () => {
     it('captures the normalized return type of a C++ method/function', () => {
       const code = `

+ 37 - 0
__tests__/resolution.test.ts

@@ -2158,4 +2158,41 @@ void wrong() { WidgetFactory::create().onlyOther(); }
       expect(callerNamesOf('Other::onlyOther')).toEqual([]);
     });
   });
+
+  describe('PHP chained static-factory call resolution (#608)', () => {
+    function callerNamesOf(qualifiedName: string): string[] {
+      const target = cg.getNodesByKind('method').find((n) => n.qualifiedName === qualifiedName);
+      if (!target) return [];
+      const names = cg
+        .getIncomingEdges(target.id)
+        .filter((e) => e.kind === 'calls')
+        .map((e) => cg.getNode(e.source)?.name)
+        .filter((n): n is string => !!n);
+      return [...new Set(names)].sort();
+    }
+
+    it('resolves Cls::for($x)->method() via the factory\'s `: self` return (#608)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'ApiClient.php'),
+        `<?php\nclass ApiClient {\n    public static function for(string $c): self { return new self; }\n    public function createOrder(array $p): array { return []; }\n}\n`
+      );
+      fs.writeFileSync(
+        path.join(tempDir, 'DispatchOrder.php'),
+        `<?php\nclass DispatchOrder {\n    public function handle(): void {\n        ApiClient::for('cred')->createOrder([]);\n    }\n}\n`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      // The chained call's edge attaches to the factory result's method.
+      expect(callerNamesOf('ApiClient::createOrder')).toContain('handle');
+    });
+
+    it('creates NO edge when the factory result lacks the method (#608)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'lib.php'),
+        `<?php\nclass ApiClient { public static function for(string $c): self { return new self; } }\nclass Other { public function onlyOther(): void {} }\nclass Caller { public function go(): void { ApiClient::for('x')->onlyOther(); } }\n`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      // ApiClient has no onlyOther — must not mis-attach to the same-named Other::onlyOther.
+      expect(callerNamesOf('Other::onlyOther')).toEqual([]);
+    });
+  });
 });

+ 1 - 1
src/extraction/extraction-version.ts

@@ -21,4 +21,4 @@
  * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty
  * in the product is load-bearing").
  */
-export const EXTRACTION_VERSION = 4;
+export const EXTRACTION_VERSION = 5;

+ 34 - 1
src/extraction/languages/php.ts

@@ -1,5 +1,5 @@
 import type { Node as SyntaxNode } from 'web-tree-sitter';
-import { getNodeText } from '../tree-sitter-helpers';
+import { getNodeText, getChildByField } from '../tree-sitter-helpers';
 import type { LanguageExtractor } from '../tree-sitter-types';
 
 // include / require (+ _once) expression node types. These carry the
@@ -33,6 +33,38 @@ function phpStaticIncludePath(node: SyntaxNode, source: string): string | null {
   return content ? getNodeText(content, source) : null;
 }
 
+/** PHP built-in return types that can't be a method receiver (so no class to chain on). */
+const PHP_NON_CLASS_RETURN = new Set([
+  'array', 'string', 'int', 'integer', 'float', 'double', 'bool', 'boolean',
+  'void', 'mixed', 'never', 'null', 'false', 'true', 'object', 'callable',
+  'iterable', 'resource',
+]);
+
+/**
+ * A method/function's declared return type, normalized to the class a chained
+ * `->method()` could be called on (issue #608). `self` / `static` / `$this` are
+ * kept as the marker `self` and resolved to the declaring class at resolution
+ * time; a concrete type returns its short name; primitives / unions / nullable
+ * non-class types return undefined.
+ */
+function extractPhpReturnType(node: SyntaxNode, source: string): string | undefined {
+  let rt = getChildByField(node, 'return_type');
+  if (!rt) return undefined;
+  // Unwrap `?Type`. Union / intersection types are ambiguous — skip them.
+  if (rt.type === 'optional_type') rt = rt.namedChild(0) ?? rt;
+  if (!rt || rt.type === 'primitive_type') return undefined;
+
+  const nameNode = rt.type === 'named_type' ? (rt.namedChild(0) ?? rt) : rt;
+  const text = getNodeText(nameNode, source).trim().replace(/^\\/, '');
+  if (!text) return undefined;
+  const last = text.split('\\').pop() ?? text;
+  const lc = last.toLowerCase();
+  if (lc === 'self' || lc === 'static' || lc === 'this' || lc === '$this') return 'self';
+  if (PHP_NON_CLASS_RETURN.has(lc)) return undefined;
+  if (!/^[A-Za-z_]\w*$/.test(last)) return undefined; // union/intersection/complex
+  return last;
+}
+
 export const phpExtractor: LanguageExtractor = {
   functionTypes: ['function_definition'],
   classTypes: ['class_declaration', 'trait_declaration'],
@@ -50,6 +82,7 @@ export const phpExtractor: LanguageExtractor = {
   bodyField: 'body',
   paramsField: 'parameters',
   returnField: 'return_type',
+  getReturnType: extractPhpReturnType,
   classifyClassNode: (node) => {
     return node.type === 'trait_declaration' ? 'trait' : 'class';
   },

+ 27 - 0
src/extraction/tree-sitter.ts

@@ -2349,6 +2349,33 @@ export class TreeSitterExtractor {
       // single-dot receiver regex fails. Pull out the immediate field after `this.`
       // so the receiver is the field name (`userbo`), which the resolver can then
       // look up in the enclosing class's field declarations.
+      // PHP static-factory fluent chain: `Cls::for($x)->method()` — the receiver
+      // is itself a static call, so resolution must infer the method's class
+      // from what `Cls::for` RETURNS (its `: self` / `: static` / `: Type`),
+      // #608 (mirrors the C++ chain fix in #645). Encode `<Cls::factory>().<method>`;
+      // the `().` marker lets the PHP resolver split it. The receiver text
+      // (`Cls::for('x')`) carries the args, so without this it degrades to an
+      // unresolvable string and the call edge is dropped.
+      if (methodName && this.language === 'php' && objectField.type === 'scoped_call_expression') {
+        const innerScope = getChildByField(objectField, 'scope');
+        const innerName = getChildByField(objectField, 'name');
+        if (innerScope && innerName) {
+          calleeName = `${getNodeText(innerScope, this.source)}::${getNodeText(innerName, this.source)}().${methodName}`;
+        } else {
+          calleeName = methodName;
+        }
+        if (calleeName) {
+          this.unresolvedReferences.push({
+            fromNodeId: callerId,
+            referenceName: calleeName,
+            referenceKind: 'calls',
+            line: node.startPosition.row + 1,
+            column: node.startPosition.column,
+          });
+        }
+        return;
+      }
+
       let receiverName: string;
       if (objectField.type === 'field_access') {
         const inner = getChildByField(objectField, 'object');

+ 40 - 5
src/resolution/name-matcher.ts

@@ -414,10 +414,11 @@ function cppLastSegment(name: string): string {
 
 /**
  * Return type captured at extraction for `Class::method` (or a free function),
- * read off the indexed node's `returnType` (#645). Null when not indexed or no
- * return type was recorded (e.g. a `void`/primitive return).
+ * read off the indexed node's `returnType` — used by the C++ (#645) and PHP
+ * (#608) chained-call resolvers. Language-filtered. Null when not indexed or no
+ * return type was recorded (a `void`/primitive return).
  */
-function lookupCppReturnType(
+function lookupCalleeReturnType(
   callee: string,
   ref: UnresolvedRef,
   context: ResolutionContext,
@@ -492,10 +493,10 @@ function resolveCppCallResultType(
     if (recv.includes('.') || recv.includes('(') || recv.includes('::')) return null; // single level only
     const recvType = inferCppReceiverType(recv, ref, context, depth + 1);
     if (!recvType) return null;
-    return lookupCppReturnType(`${recvType}::${method}`, ref, context);
+    return lookupCalleeReturnType(`${recvType}::${method}`, ref, context);
   }
 
-  const ret = lookupCppReturnType(expr, ref, context);
+  const ret = lookupCalleeReturnType(expr, ref, context);
   if (ret) return ret;
 
   // Direct construction — the callee itself names a class/struct.
@@ -549,6 +550,32 @@ export function matchCppCallChain(
   return resolveMethodOnType(cls, m[2], ref, context, 0.85, 'instance-method');
 }
 
+/**
+ * Resolve a PHP fluent static-factory chain whose receiver is a static call —
+ * `Cls::for($x)->method()`, encoded by the extractor as `Cls::for().method`
+ * (#608, the per-credential Laravel client idiom). The receiver's type is what
+ * `Cls::for` returns: a `: self` / `: static` resolves to `Cls` itself, a
+ * concrete `: Type` to that type. The outer method is then resolved and
+ * VALIDATED on it (resolveMethodOnType requires the method to exist), so a
+ * wrong inference yields no edge rather than a wrong one.
+ */
+export function matchPhpCallChain(
+  ref: UnresolvedRef,
+  context: ResolutionContext,
+): ResolvedRef | null {
+  const m = ref.referenceName.match(/^(.+)\(\)\.(\w+)$/);
+  if (!m || !m[1] || !m[2]) return null;
+  const inner = m[1];
+  const method = m[2];
+  if (!inner.includes('::')) return null; // only static-factory (`Cls::method`) chains
+  const factoryClass = inner.slice(0, inner.lastIndexOf('::'));
+  const ret = lookupCalleeReturnType(inner, ref, context);
+  if (!ret) return null;
+  // `self` (the extractor's marker for self/static/$this) → the factory's class.
+  const resolvedClass = ret === 'self' ? factoryClass : ret;
+  return resolveMethodOnType(resolvedClass, method, ref, context, 0.85, 'instance-method');
+}
+
 /**
  * Java/Kotlin: infer a receiver's declared type by walking field declarations
  * in the class enclosing the call site. The field's `signature` is already in
@@ -971,6 +998,14 @@ export function matchReference(
     if (result) return result;
   }
 
+  // 1c. PHP fluent static-factory chain — `Cls::for($x)->method()` encoded as
+  // `Cls::for().method` (#608). Same idea as 1b: the receiver's type is the
+  // factory's `: self` / `: Type` return.
+  if (ref.language === 'php') {
+    result = matchPhpCallChain(ref, context);
+    if (result) return result;
+  }
+
   // 2. Method call pattern
   result = matchMethodCall(ref, context);
   if (result) return result;