fix(csharp): resolve chained static-factory calls Foo.Create().Bar() (#750) (#753)

A C# method called through a static factory or fluent chain —
`Foo.Create().Bar()`, `JObject.Parse(s).Property(...)`,
`Instant.FromUtc(...).InZone(zone)` — lost the receiver's type, so the chained
method didn't resolve and the call was invisible to callers/impact/trace. Ports
the #645/#608 mechanism to C# (additive, like Java #751):

- Part 1: capture C# return types in the extractor, reading the `returns` field
  (`static Foo Create()` -> `Foo`); predefined/array/generic/nullable/namespaced
  types are normalized or skipped.
- Part 2: encode a chained `member_access_expression` receiver
  (`Foo.Create(args).Bar()`) as `inner().Bar` with normalized empty parens, so
  factory calls that take arguments still split. Non-chained member calls keep
  their existing `recv.Method` text.
- Part 3: resolve via the shared matchDottedCallChain (now Java/Kotlin/C#),
  validated by resolveMethodOnType so a wrong inference yields NO edge.

Known limitation (safe): C# extension-method chains don't resolve, since the
method lives on the extension class, not the receiver's type — no edge, never a
wrong one.

Validated: synthetic decoy + args + absent-method safety tests; full suite green;
real-repo A/B on Newtonsoft.Json (945 .cs: +3, 0 lost) and nodatime (488 .cs:
+73, 0 lost) — node count identical (no explosion), 0 edges lost, precision
spot-checked verbatim (Instant.FromUtc().InZone(), Offset.FromHoursAndMinutes().Plus(),
OffsetDateTimePattern.CreateWithInvariantCulture().WithTwoDigitYearMax()).
EXTRACTION_VERSION 7 -> 8.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 2 weeks ago
parent
commit
aa07dc59d4

+ 1 - 0
CHANGELOG.md

@@ -29,6 +29,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### Fixes
 
+- C# method calls made through a static factory or fluent chain now resolve to the correct class. A call like `Foo.Create().Bar()` or `JObject.Parse(s).Property(...)` used to lose the receiver's type, so the chained method didn't resolve and the call was invisible to callers/impact/trace. CodeGraph now captures C# return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge). Existing C# indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (C#)
 - Kotlin method calls made through a companion-object factory or fluent chain now resolve to the correct class. A call like `Foo.getInstance().bar()` or `Config.create(opts).build()` used to drop the receiver entirely, so the chained method silently attached to a same-named method on an unrelated class — or didn't resolve at all — corrupting callers, impact, and trace. CodeGraph now captures Kotlin return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Kotlin indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Kotlin)
 - Java method calls made through a static factory or fluent chain now resolve to the correct class. A call like `Foo.getInstance().bar()` or `Config.create(opts).build()` used to lose the receiver's type, so when two classes had a same-named method the call silently attached to whichever was indexed first — or didn't resolve at all — corrupting callers, impact, and trace. CodeGraph now captures Java return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Covers factories and fluent builders that take arguments (`hashKeys().arrayListValues()`), including builders that return a nested type. Existing Java indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Java)
 - PHP: a method called through a chained static factory — `Cls::for($x)->method(...)`, the canonical Laravel per-credential / per-tenant client idiom — now records a caller edge. Previously the receiver type (what `for()` returns) was never recovered, so `codegraph_callers` returned nothing for the method and the call was invisible to `codegraph_impact`. CodeGraph now captures PHP return types — `: self` / `: static` resolve to the declaring class, `: SomeClass` to that class — and resolves the chained method on the factory's result, creating the edge only when that class actually has the method (so a wrong inference produces no edge). Existing PHP indexes should be re-indexed (`codegraph index -f`) to benefit. Thanks @cvanderlinden. (#608) (PHP)

+ 66 - 0
__tests__/resolution.test.ts

@@ -2336,4 +2336,70 @@ class Caller {
       expect(callerNamesOf('Other::onlyOther')).toEqual([]);
     });
   });
+
+  describe('C# chained static-factory call resolution (#645/#608 mechanism)', () => {
+    function callerNamesOf(qualifiedName: string): string[] {
+      const target = cg.getNodesByKind('method').find((n) => n.qualifiedName === qualifiedName);
+      if (!target) return [];
+      const names = cg
+        .getIncomingEdges(target.id)
+        .filter((e) => e.kind === 'calls')
+        .map((e) => cg.getNode(e.source)?.name)
+        .filter((n): n is string => !!n);
+      return [...new Set(names)].sort();
+    }
+
+    it('resolves Foo.Create().Bar() via the factory return type, never a same-named decoy', async () => {
+      // Aaa sorts first and has a same-named Bar() — it must never win the chain.
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.cs'),
+        `class Aaa { void Bar() {} }
+class Foo {
+    static Foo Create() { return new Foo(); }
+    void Bar() {}
+}
+class Caller {
+    void Run() { Foo.Create().Bar(); }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Foo::Bar')).toEqual(['Run']);
+      expect(callerNamesOf('Aaa::Bar')).toEqual([]);
+    });
+
+    it('resolves a factory chain that passes arguments — Foo.Make(cfg).Build()', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.cs'),
+        `class Config {}
+class Foo {
+    static Foo Make(Config c) { return new Foo(); }
+    void Build() {}
+}
+class Caller {
+    void Run() { Foo.Make(new Config()).Build(); }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Foo::Build')).toEqual(['Run']);
+    });
+
+    it('creates NO edge when the factory return type lacks the method (silent miss, not a wrong edge)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'Main.cs'),
+        `class Foo {
+    static Foo Create() { return new Foo(); }
+}
+class Other { void OnlyOther() {} }
+class Caller {
+    void Run() { Foo.Create().OnlyOther(); }
+}
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      // Foo has no OnlyOther() — must not mis-attach to the same-named Other::OnlyOther.
+      expect(callerNamesOf('Other::OnlyOther')).toEqual([]);
+    });
+  });
 });

+ 1 - 1
src/extraction/extraction-version.ts

@@ -21,4 +21,4 @@
  * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty
  * in the product is load-bearing").
  */
-export const EXTRACTION_VERSION = 7;
+export const EXTRACTION_VERSION = 8;

+ 21 - 0
src/extraction/languages/csharp.ts

@@ -32,6 +32,26 @@ export function blankCsharpPreprocessorDirectives(source: string): string {
   return source.replace(re, (m, indent) => indent + ' '.repeat(m.length - indent.length));
 }
 
+/**
+ * A C# method's declared return type, normalized to the bare class name a chained
+ * `Foo.Create().Bar()` could be called on (the #645/#608 mechanism). The return
+ * type lives in the `returns` field (`static Foo Create()` → `Foo`); built-in
+ * `predefined_type` (void/int/string/…) and arrays yield undefined, generics are
+ * unwrapped to the base type, nullable `Foo?` is stripped, and a dotted namespace
+ * is reduced to the simple name. Constructors have no `returns` field → undefined.
+ */
+function extractCsharpReturnType(node: SyntaxNode, source: string): string | undefined {
+  const typeNode = node.childForFieldName('returns');
+  if (!typeNode) return undefined;
+  if (typeNode.type === 'predefined_type' || typeNode.type === 'array_type') return undefined;
+  let t = getNodeText(typeNode, source).trim();
+  t = t.replace(/\?+$/, ''); // nullable `Foo?`
+  t = t.replace(/<[^>]*>/g, ''); // generics `List<Foo>` → `List`
+  const last = t.split('.').pop()?.trim(); // namespace `Ns.Foo` → `Foo`
+  if (!last || !/^[A-Za-z_]\w*$/.test(last)) return undefined;
+  return last;
+}
+
 export const csharpExtractor: LanguageExtractor = {
   preParse: blankCsharpPreprocessorDirectives,
   functionTypes: [],
@@ -67,6 +87,7 @@ export const csharpExtractor: LanguageExtractor = {
   bodyField: 'body',
   paramsField: 'parameters',
   returnField: 'type',
+  getReturnType: extractCsharpReturnType,
   getVisibility: (node) => {
     for (let i = 0; i < node.childCount; i++) {
       const child = node.child(i);
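
To make the normalization rules in `extractCsharpReturnType` easier to check by eye, here is a minimal string-level sketch of the same steps. It is an approximation, not the extractor itself: `normalizeReturnType` is a hypothetical stand-in that takes the node type and text directly instead of a tree-sitter `SyntaxNode`, and the node-type labels other than `predefined_type` and `array_type` are only illustrative.

```typescript
// Hypothetical string-only stand-in for extractCsharpReturnType. The real
// function reads the `returns` field of a tree-sitter node; here the node's
// type and source text are passed in directly.
function normalizeReturnType(nodeType: string, text: string): string | undefined {
  // Built-in types and arrays can never be the receiver class of a chained call.
  if (nodeType === 'predefined_type' || nodeType === 'array_type') return undefined;
  let t = text.trim();
  t = t.replace(/\?+$/, ''); // nullable `Foo?` -> `Foo`
  t = t.replace(/<[^>]*>/g, ''); // generics `List<Foo>` -> `List`
  const last = t.split('.').pop()?.trim(); // namespace `Ns.Foo` -> `Foo`
  // Anything that is not a plain identifier at this point is skipped.
  if (!last || !/^[A-Za-z_]\w*$/.test(last)) return undefined;
  return last;
}

// Node-type labels below are illustrative; only the two checked above matter.
const samples: Array<[string, string]> = [
  ['identifier', 'Foo'],
  ['predefined_type', 'void'],
  ['array_type', 'Foo[]'],
  ['nullable_type', 'Foo?'],
  ['generic_name', 'List<Foo>'],
  ['qualified_name', 'NodaTime.Instant'],
  ['generic_name', 'Dictionary<string, List<Foo>>'],
];
for (const [nodeType, text] of samples) {
  console.log(`${text} -> ${normalizeReturnType(nodeType, text)}`);
}
```

With these regexes a nested generic such as `Dictionary<string, List<Foo>>` leaves a stray `>` after the single replace pass, fails the identifier check, and comes back as `undefined`. That errs on the safe side: no return type means no inferred edge.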

+ 16 - 0
src/extraction/tree-sitter.ts

@@ -2567,6 +2567,22 @@ export class TreeSitterExtractor {
         } else if (func.type === 'scoped_identifier' || func.type === 'scoped_call_expression') {
           // Scoped call: Module::function()
           calleeName = getNodeText(func, this.source);
+        } else if (this.language === 'csharp' && func.type === 'member_access_expression') {
+          // C# member call `recv.Method(...)`. When the receiver is itself a call
+          // — a chained factory `Foo.Create(args).Bar()` — encode `inner().Bar`
+          // with normalized empty parens so resolution can infer Bar's class from
+          // what `Foo.Create` RETURNS (#645/#608). A non-call receiver keeps the
+          // full member-access text (the existing `recv.Method` behavior).
+          const recv = getChildByField(func, 'expression');
+          const nameNode = getChildByField(func, 'name');
+          const methodName = nameNode ? getNodeText(nameNode, this.source) : '';
+          if (recv && recv.type === 'invocation_expression' && methodName) {
+            const innerFunc = getChildByField(recv, 'function');
+            const innerCallee = innerFunc ? getNodeText(innerFunc, this.source).replace(/\s+/g, '') : '';
+            calleeName = innerCallee ? `${innerCallee}().${methodName}` : methodName;
+          } else {
+            calleeName = getNodeText(func, this.source);
+          }
         } else {
           calleeName = getNodeText(func, this.source);
         }
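
The receiver encoding added in this hunk can be tried in isolation with a small sketch. It assumes a simplified node shape: `MiniNode` and `encodeCsharpCallee` are hypothetical names introduced for illustration, not the extractor's real `SyntaxNode` API or helpers.

```typescript
// Minimal stand-in for the syntax nodes involved. The shape is illustrative
// and much smaller than a real tree-sitter node.
interface MiniNode {
  type: string;
  text: string;
  fields?: Record<string, MiniNode>;
}

// Mirrors the branch above: a chained receiver becomes `inner().Method`,
// while any other member access keeps its full text.
function encodeCsharpCallee(func: MiniNode): string {
  if (func.type !== 'member_access_expression') return func.text;
  const recv = func.fields?.expression;
  const methodName = func.fields?.name?.text ?? '';
  if (recv && recv.type === 'invocation_expression' && methodName) {
    // Drop the inner call's arguments by taking only its callee text.
    const innerCallee = (recv.fields?.function?.text ?? '').replace(/\s+/g, '');
    return innerCallee ? `${innerCallee}().${methodName}` : methodName;
  }
  return func.text;
}

// Hand-built node for: Foo.Make(new Config()).Build()
const chained: MiniNode = {
  type: 'member_access_expression',
  text: 'Foo.Make(new Config()).Build',
  fields: {
    expression: {
      type: 'invocation_expression',
      text: 'Foo.Make(new Config())',
      fields: { function: { type: 'member_access_expression', text: 'Foo.Make' } },
    },
    name: { type: 'identifier', text: 'Build' },
  },
};

// Hand-built node for: logger.Info(...)
const plain: MiniNode = {
  type: 'member_access_expression',
  text: 'logger.Info',
  fields: {
    expression: { type: 'identifier', text: 'logger' },
    name: { type: 'identifier', text: 'Info' },
  },
};

console.log(encodeCsharpCallee(chained)); // Foo.Make().Build
console.log(encodeCsharpCallee(plain)); // logger.Info
```

Because only the inner callee text is kept, `Foo.Make(new Config()).Build()` and `Foo.Make().Build()` encode to the same string, which is what lets factory calls with arguments split the same way as argument-free ones.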

+ 4 - 4
src/resolution/name-matcher.ts

@@ -583,8 +583,8 @@ export function matchPhpCallChain(
  * (its declared return type); the outer method is then resolved and VALIDATED on
  * it (resolveMethodOnType requires `Type::method` to exist), so a wrong inference
  * yields no edge rather than a wrong one (e.g. a same-named `bar()` on an
- * unrelated class is never matched). Shared by the JVM dot-notation languages
- * (Java, Kotlin) — same receiver shape, same `Class::method` qualified names.
+ * unrelated class is never matched). Shared by the dot-notation languages
+ * (Java, Kotlin, C#) — same receiver shape, same `Class::method` qualified names.
  */
 export function matchDottedCallChain(
   ref: UnresolvedRef,
@@ -1062,11 +1062,11 @@ export function matchReference(
     if (result) return result;
   }
 
-  // 1d. JVM (Java / Kotlin) chained static-factory / fluent call
+  // 1d. Dotted chained static-factory / fluent call (Java / Kotlin / C#)
   // `Foo.getInstance().bar()` encoded as `Foo.getInstance().bar` (#645/#608
   // mechanism). Resolve bar's class from getInstance's declared return type, then
   // validate the method on it.
-  if (ref.language === 'java' || ref.language === 'kotlin') {
+  if (ref.language === 'java' || ref.language === 'kotlin' || ref.language === 'csharp') {
     result = matchDottedCallChain(ref, context);
     if (result) return result;
   }
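
The "infer, then validate" rule behind step 1d can be sketched as follows. This is a simplified model under stated assumptions, not the body of `matchDottedCallChain`: the `methods` map and `resolveDottedChain` are hypothetical, the map stands in for the indexed graph, and only a single `Type.Factory().Method` hop is handled.

```typescript
// Hypothetical in-memory index: qualified method name -> declared return type
// (undefined when the method returns nothing usable). Mirrors the test fixtures.
const methods = new Map<string, string | undefined>([
  ['Aaa::Bar', undefined],
  ['Foo::Create', 'Foo'],
  ['Foo::Bar', undefined],
  ['Other::OnlyOther', undefined],
]);

function resolveDottedChain(callee: string): string | undefined {
  // Expect the encoded shape `Type.Factory().Method`; anything else is ignored.
  const m = /^([A-Za-z_]\w*)\.([A-Za-z_]\w*)\(\)\.([A-Za-z_]\w*)$/.exec(callee);
  if (!m) return undefined;
  const [, type, factory, method] = m;
  // Infer the receiver's type from what the factory is declared to return.
  const returned = methods.get(`${type}::${factory}`);
  if (!returned) return undefined; // unknown factory or no usable return type
  // Validate: create an edge only if that type really declares the method.
  const candidate = `${returned}::${method}`;
  return methods.has(candidate) ? candidate : undefined;
}

console.log(resolveDottedChain('Foo.Create().Bar')); // Foo::Bar
console.log(resolveDottedChain('Foo.Create().OnlyOther')); // undefined
```

The second call shows the safety property: `Foo` has no `OnlyOther`, so the lookup misses instead of attaching to `Other::OnlyOther`. An extension method would miss in the same way, since it is declared on the extension class and not on the receiver's type.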