
fix(objc): resolve chained message-send calls [[Foo create] doIt] (#750) (#786)

Ports the #645/#608 chained-receiver mechanism to Objective-C. A message send
whose receiver is itself a message send — `[[Foo create] doIt]` — used to drop
the receiver, so `doIt` name-matched a same-named method on an unrelated class
(commonly a test helper's `init` or an Apple-SDK method).

- objc.ts: getReturnType reads the method's `method_type`, SKIPPING nullability /
  ARC qualifiers (`nonnull instancetype` must yield instancetype, not `nonnull`).
- tree-sitter.ts: the message_expression branch now re-encodes a chained send
  `[[Foo create] doIt]` as `Foo.create().doIt` when the inner receiver is a
  capitalized class and the outer selector is unary.
- name-matcher.ts: `objc` joins the dotted-chain gate (and CHAIN_LANGUAGES in resolution/index.ts). A
  class-message factory returns an instance of the RECEIVER class by convention
  (`instancetype`), so when the factory's own return type isn't recoverable
  (`alloc`/`new`/`shared…` return instancetype, or aren't user nodes), the
  receiver's type is the class itself — this resolves the ubiquitous
  `[[X alloc] init]` and singleton chains. resolveMethodOnType validates against
  the class and its supertypes, so a wrong inference yields no edge.

Validation: 4 synthetic tests (factory+decoy, superclass conformance, absent-method
safety, the nonnull-instancetype singleton). Real-repo A/B on SDWebImage (208 files):
+35 / -75, all corrections. The -75 are `init` calls mis-matched to a test helper or
the wrong class (retargeted to the right class's init in the +35), plus 2 Apple-SDK
chains on unindexed classes. db stable, no node explosion. EXTRACTION_VERSION 14->15.
Full suite green.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 1 week ago
parent
commit
d21d2dfa50

+ 1 - 0
CHANGELOG.md

@@ -33,6 +33,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Scala method calls made through a companion-object factory, a fluent chain, or a case-class `apply` now resolve to the correct type. A call like `Foo.create().bar()` or `Builder(cfg).bar()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — most often mis-attributing a standard-library `Option` / `Iterator` `.map` / `.flatMap` / `.foreach` onto your own same-named class. CodeGraph now captures Scala return types (a generic `List[Foo]` resolves to its container `List`, a qualified `pkg.Foo` to `Foo`), infers the chained receiver's type from what the inner call returns or constructs, and resolves the method on it — including methods inherited from a trait the type extends — creating the edge only when that type or one of its traits genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Scala indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Scala)
 - Rust method calls made through a chained associated function now resolve to the correct type. A call like `Foo::new().bar()` or `Foo::with(cfg).build()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — or didn't resolve. CodeGraph now captures Rust return types (`-> Self` resolves to the implementing type), infers the chained receiver's type from what the associated function returns, and resolves the method on it — including methods provided by a trait the type implements (via the new `impl Trait for Type` relationships) — creating the edge only when the type or one of its traits genuinely has the method. Existing Rust indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Rust)
 - Dart method calls made through a static factory, a factory or named constructor, or a fluent chain now resolve to the correct type. A call like `Foo.create().bar()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — most often mis-attributing a standard-library `Option` / `Iterator` `.map` / `.where` onto your own same-named class. CodeGraph now indexes Dart **factory and named constructors** (`factory Foo.create()`, `Foo.named()`) as first-class members so calls to them resolve, captures Dart return types (a generic `List<Foo>` resolves to its container `List`), infers the chained receiver's type from what the inner call returns or constructs, and resolves the method on it — including methods inherited from a superclass or mixin — creating the edge only when that type genuinely has the method. Plain construction (`Foo(...)`) is still recorded as instantiation. Existing Dart indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Dart)
+- Objective-C methods called through a chained message send now resolve to the correct class. A call like `[[Foo create] doIt]` used to drop the receiver, so `doIt` silently attached to a same-named method on an unrelated class — most often a test helper or stdlib class. CodeGraph now captures Objective-C method return types and infers the chained receiver's type from what the inner message returns. For the ubiquitous `[[X alloc] init]` and singleton (`[[X sharedInstance] …]`) patterns — where the factory returns `instancetype` — the receiver is the class `X` itself, so the chained method resolves on `X` (including methods inherited from a superclass), creating the edge only when the class genuinely has the method. Existing Objective-C indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Objective-C)
 - Chained method calls now resolve when the chained method is **inherited from a superclass or declared on an interface/protocol** the receiver's type conforms to — for example a call on a sealed-subclass instance (`Either.Right(x).combine(...)`) that invokes a method defined on its parent type. Previously these chains found no caller edge even though the factory's type was known, so the call was invisible to callers, impact, and trace. CodeGraph now walks the type's supertypes (its `extends` / `implements` relationships) to find the method, creating the edge only when a supertype genuinely declares it (so a wrong inference still produces no edge). This makes Java, Kotlin, and C# factory and fluent chains more complete. Existing indexes should be re-indexed (`codegraph index -f`) to benefit. (#750)
 - Swift method calls made through a static factory, fluent chain, or constructor now resolve to the correct class. A call like `Foo.make().draw()` or `Foo().draw()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated class — or didn't resolve at all. CodeGraph now captures Swift return types and infers the chained receiver's type from what the inner call returns (or the constructed type), creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Swift indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Swift)
 - C# method calls made through a static factory or fluent chain now resolve to the correct class. A call like `Foo.Create().Bar()` or `JObject.Parse(s).Property(...)` used to lose the receiver's type, so the chained method didn't resolve and the call was invisible to callers/impact/trace. CodeGraph now captures C# return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge). Existing C# indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (C#)

+ 137 - 0
__tests__/resolution.test.ts

@@ -2994,4 +2994,141 @@ void run() {
       expect(incoming.some((e) => e.kind === 'instantiates')).toBe(true);
     });
   });
+
+  describe('Objective-C chained message-send call resolution (#645/#608 mechanism)', () => {
+    function callerNamesOf(qualifiedName: string): string[] {
+      const target = cg.getNodesByKind('method').find((n) => n.qualifiedName === qualifiedName);
+      if (!target) return [];
+      const names = cg
+        .getIncomingEdges(target.id)
+        .filter((e) => e.kind === 'calls')
+        .map((e) => cg.getNode(e.source)?.name)
+        .filter((n): n is string => !!n);
+      return [...new Set(names)].sort();
+    }
+
+    it('resolves a chained message send [[Foo create] doIt] via the return type, never a same-named decoy', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'main.m'),
+        `@interface Bar : NSObject
+- (void)doIt;
+@end
+@implementation Bar
+- (void)doIt {}
+@end
+@interface Decoy : NSObject
+- (void)doIt;
+@end
+@implementation Decoy
+- (void)doIt {}
+@end
+@interface Foo : NSObject
++ (Bar *)create;
+@end
+@implementation Foo
++ (Bar *)create { return nil; }
+- (void)run { [[Foo create] doIt]; }
+@end
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Bar::doIt')).toEqual(['run']);
+      expect(callerNamesOf('Decoy::doIt')).toEqual([]);
+    });
+
+    it('resolves a chained message whose method is inherited from a superclass (via conformance)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'main.m'),
+        `@interface Base : NSObject
+- (void)render;
+@end
+@implementation Base
+- (void)render {}
+@end
+@interface Widget : Base
+@end
+@implementation Widget
+@end
+@interface Decoy : NSObject
+- (void)render;
+@end
+@implementation Decoy
+- (void)render {}
+@end
+@interface Factory : NSObject
++ (Widget *)make;
+@end
+@implementation Factory
++ (Widget *)make { return nil; }
+- (void)run { [[Factory make] render]; }
+@end
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Base::render')).toEqual(['run']);
+      expect(callerNamesOf('Decoy::render')).toEqual([]);
+    });
+
+    it('creates NO edge when the factory return type lacks the method (silent miss)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'main.m'),
+        `@interface Bar : NSObject
+@end
+@implementation Bar
+@end
+@interface Other : NSObject
+- (void)onlyOther;
+@end
+@implementation Other
+- (void)onlyOther {}
+@end
+@interface Foo : NSObject
++ (Bar *)create;
+@end
+@implementation Foo
++ (Bar *)create { return nil; }
+- (void)run { [[Foo create] onlyOther]; }
+@end
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      // Bar has no onlyOther — must not mis-attach to the same-named Other::onlyOther.
+      expect(callerNamesOf('Other::onlyOther')).toEqual([]);
+    });
+
+    it('resolves a singleton chain [[Cache shared] clearAll] whose factory returns nonnull instancetype', async () => {
+      // The factory returns `nonnull instancetype` — the nullability qualifier must
+      // be skipped (not captured AS the type), and an instancetype class-message
+      // factory returns the receiver class, so clearAll resolves on Cache, never a
+      // same-named decoy. (Regression for both: the captured-`nonnull` bug and the
+      // ubiquitous `[[X alloc] init]` / singleton pattern.)
+      fs.writeFileSync(
+        path.join(tempDir, 'main.m'),
+        `@interface Cache : NSObject
++ (nonnull instancetype)shared;
+- (void)clearAll;
+@end
+@implementation Cache
++ (nonnull instancetype)shared { return nil; }
+- (void)clearAll {}
+@end
+@interface Decoy : NSObject
+- (void)clearAll;
+@end
+@implementation Decoy
+- (void)clearAll {}
+@end
+@interface Caller : NSObject
+- (void)run;
+@end
+@implementation Caller
+- (void)run { [[Cache shared] clearAll]; }
+@end
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      expect(callerNamesOf('Cache::clearAll')).toEqual(['run']);
+      expect(callerNamesOf('Decoy::clearAll')).toEqual([]);
+    });
+  });
 });

+ 1 - 1
src/extraction/extraction-version.ts

@@ -21,4 +21,4 @@
  * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty
  * in the product is load-bearing").
  */
-export const EXTRACTION_VERSION = 14;
+export const EXTRACTION_VERSION = 15;

+ 41 - 0
src/extraction/languages/objc.ts

@@ -31,6 +31,46 @@ function extractObjcMethodName(node: SyntaxNode, source: string): string | undef
   return identifiers.map((id) => `${getNodeText(id, source)}:`).join('');
 }
 
+/** Nullability / ARC qualifiers that sit where a return type's first type
+ *  identifier does (`(nonnull instancetype)`, `(nullable Bar *)`) — never the type. */
+const OBJC_TYPE_QUALIFIERS = new Set([
+  'nonnull', 'nullable', 'null_unspecified', 'null_resettable',
+  '_Nonnull', '_Nullable', '_Null_unspecified', '__nonnull', '__nullable',
+  'const', 'volatile', 'strong', 'weak', 'copy', 'assign', 'retain', 'oneway',
+  '__strong', '__weak', '__unsafe_unretained', '__autoreleasing', '__kindof',
+]);
+
+/** Collect the type identifiers under a `method_type`, in document order. */
+function collectTypeIdentifiers(node: SyntaxNode, source: string, out: string[]): void {
+  if (node.type === 'type_identifier') out.push(getNodeText(node, source).trim());
+  for (let i = 0; i < node.namedChildCount; i++) {
+    const child = node.namedChild(i);
+    if (child) collectTypeIdentifiers(child, source, out);
+  }
+}
+
+/**
+ * Capture an ObjC method's declared return type as a bare class name, for the
+ * chained static-factory call mechanism (#750). `+ (Bar *)create` yields `Bar`;
+ * a nullability/ARC qualifier (`(nonnull instancetype)`, `(nullable Bar *)`) is
+ * skipped to reach the real type. `void` / `id` / `instancetype` / primitives
+ * yield undefined — for a class-message factory that means the receiver's type
+ * is the class itself (handled in resolution), so `[[X alloc] init]` and
+ * singleton chains still resolve.
+ */
+function extractObjcReturnType(node: SyntaxNode, source: string): string | undefined {
+  if (node.type !== 'method_definition' && node.type !== 'method_declaration') return undefined;
+  const methodType = node.namedChildren.find((c) => c.type === 'method_type');
+  if (!methodType) return undefined;
+  const ids: string[] = [];
+  collectTypeIdentifiers(methodType, source, ids);
+  const name = ids.find((n) => !OBJC_TYPE_QUALIFIERS.has(n));
+  if (!name || !/^[A-Za-z_]\w*$/.test(name) || name === 'void' || name === 'id' || name === 'instancetype') {
+    return undefined;
+  }
+  return name;
+}
+
 function extractObjcPropertyName(node: SyntaxNode, source: string): string | null {
   if (node.type !== 'property_declaration') return null;
 
@@ -73,6 +113,7 @@ export const objcExtractor: LanguageExtractor = {
   nameField: 'declarator',
   bodyField: 'body',
   paramsField: 'parameters',
+  getReturnType: extractObjcReturnType,
   resolveName: extractObjcMethodName,
   extractPropertyName: extractObjcPropertyName,
   resolveBody: (node, bodyField) => {
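
The qualifier-skipping rule in the objc.ts hunk above can be approximated without tree-sitter. The sketch below is a simplification under stated assumptions: it pulls identifiers out of the return-type text with a regex instead of walking `type_identifier` nodes, uses a shortened qualifier list, and `returnTypeName` is a hypothetical name, not a function in the commit. Unlike the real extractor, it would also return primitive names such as `int`.

```typescript
// Hypothetical, simplified stand-in for extractObjcReturnType. The real code
// walks tree-sitter `type_identifier` nodes; this sketch scans plain text.
const QUALIFIERS = new Set([
  'nonnull', 'nullable', 'null_unspecified',
  '_Nonnull', '_Nullable', '__kindof', 'const',
]);

// Return types that carry no usable class name for chain resolution.
const NON_CLASS_TYPES = new Set(['void', 'id', 'instancetype']);

/** Return the bare class name from return-type text such as `nonnull Bar *`. */
function returnTypeName(methodType: string): string | undefined {
  const ids: string[] = methodType.match(/[A-Za-z_]\w*/g) ?? [];
  // The first identifier that is not a qualifier is taken as the type.
  const name = ids.find((id) => !QUALIFIERS.has(id));
  if (!name || NON_CLASS_TYPES.has(name)) return undefined;
  return name;
}

console.log(returnTypeName('Bar *'));                // Bar
console.log(returnTypeName('nullable Bar *'));       // Bar
console.log(returnTypeName('nonnull instancetype')); // undefined
```

Returning `undefined` for `instancetype` is deliberate in the commit: it leaves the decision to the resolver, which falls back to the receiver class.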

+ 27 - 0
src/extraction/tree-sitter.ts

@@ -2482,6 +2482,33 @@ export class TreeSitterExtractor {
           } else {
             calleeName = methodName;
           }
+        } else if (receiverField && receiverField.type === 'message_expression' && /^\w+$/.test(methodName)) {
+          // Chained message send `[[Foo create] doIt]` — the receiver is itself a
+          // class message. Recover the inner `Class.selector` and encode
+          // `Class.selector().doIt` so resolution infers doIt's class from what
+          // `Class.selector` RETURNS (#645/#608). Only a CLASS-factory chain
+          // (capitalized inner receiver); a unary outer selector is required
+          // because the chain resolver's method part is `\w+` (no `:`). An
+          // instance chain (`[[obj foo] bar]`, lowercase inner) stays bare.
+          const innerRecv = getChildByField(receiverField, 'receiver');
+          const innerRecvName = innerRecv ? getNodeText(innerRecv, this.source) : '';
+          if (innerRecv?.type === 'identifier' && /^[A-Z]/.test(innerRecvName)) {
+            const innerKw: string[] = [];
+            for (let i = 0; i < receiverField.namedChildCount; i++) {
+              if (receiverField.fieldNameForNamedChild(i) === 'method') {
+                const kw = receiverField.namedChild(i);
+                if (kw) innerKw.push(getNodeText(kw, this.source));
+              }
+            }
+            let innerColon = false;
+            for (let i = 0; i < receiverField.childCount; i++) {
+              if (receiverField.child(i)?.type === ':') { innerColon = true; break; }
+            }
+            const innerSelector = innerColon ? innerKw.map((k) => `${k}:`).join('') : innerKw[0];
+            calleeName = innerSelector ? `${innerRecvName}.${innerSelector}().${methodName}` : methodName;
+          } else {
+            calleeName = methodName;
+          }
         } else {
           calleeName = methodName;
         }
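
The re-encoding performed by the new `message_expression` branch can be illustrated on a toy model. This is only a sketch: `MessageSend`, `selectorOf`, and `encodeCallee` are hypothetical names invented here, the real extractor works on tree-sitter nodes, and the non-chained branches of the real code (which may qualify the callee differently) are collapsed to "return the bare selector".

```typescript
// Hypothetical, simplified model of an Objective-C message expression.
interface MessageSend {
  receiver: string | MessageSend; // identifier text, or a nested send
  keywords: string[];             // selector parts, e.g. ['create']
  hasArgs: boolean;               // true when the selector contains ':'
}

/** Build the selector text: `doIt` for unary, `setName:` style for keyword sends. */
function selectorOf(send: MessageSend): string {
  return send.hasArgs
    ? send.keywords.map((k) => `${k}:`).join('')
    : (send.keywords[0] ?? '');
}

/** Encode `[[Foo create] doIt]` as `Foo.create().doIt`; otherwise keep the bare selector. */
function encodeCallee(send: MessageSend): string {
  const method = selectorOf(send);
  const inner = send.receiver;
  // The outer selector must be unary (no ':'), and the receiver must be a send.
  if (typeof inner !== 'string' && /^\w+$/.test(method)) {
    const innerRecv = inner.receiver;
    // Only a class-factory chain: the inner receiver is a capitalized identifier.
    if (typeof innerRecv === 'string' && /^[A-Z]/.test(innerRecv)) {
      const innerSelector = selectorOf(inner);
      if (innerSelector) return `${innerRecv}.${innerSelector}().${method}`;
    }
  }
  return method;
}

const chained: MessageSend = {
  receiver: { receiver: 'Foo', keywords: ['create'], hasArgs: false },
  keywords: ['doIt'],
  hasArgs: false,
};
console.log(encodeCallee(chained)); // Foo.create().doIt
```

An instance chain such as `[[obj foo] bar]` has a lowercase inner receiver, so this sketch returns `bar` unchanged, matching the "stays bare" note in the hunk's comment.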

+ 1 - 1
src/resolution/index.ts

@@ -37,7 +37,7 @@ const SUPERTYPE_BEARING_KINDS = new Set<Node['kind']>([
  * second pass. Dotted-receiver languages resolve via matchDottedCallChain; the
  * `::`-receiver ones (Rust) via matchScopedCallChain.
  */
-const CHAIN_LANGUAGES = new Set(['java', 'kotlin', 'csharp', 'swift', 'rust', 'go', 'scala', 'dart']);
+const CHAIN_LANGUAGES = new Set(['java', 'kotlin', 'csharp', 'swift', 'rust', 'go', 'scala', 'dart', 'objc']);
 const SCOPED_CHAIN_LANGUAGES = new Set(['rust']);
 
 /** The extractor's chained-receiver encoding: `<inner>().<method>`. */

+ 25 - 7
src/resolution/name-matcher.ts

@@ -673,7 +673,23 @@ export function matchDottedCallChain(
   const factoryMethod = inner.slice(lastDot + 1);
   if (!factoryClass || !factoryMethod) return null;
   const ret = lookupCalleeReturnType(`${factoryClass}::${factoryMethod}`, ref, context);
-  if (!ret) return null;
+  if (!ret) {
+    // Objective-C: a class-message factory — `[X alloc]`, `[X new]`,
+    // `[X sharedFoo]` — returns an instance of the RECEIVER class `X` by
+    // convention (`instancetype`). So when the factory's own return type isn't
+    // recoverable (its selector returns `instancetype`, or `alloc`/`new` aren't
+    // user-defined nodes at all), the receiver's type is the class `X` itself.
+    // This resolves the ubiquitous `[[X alloc] init]` and singleton chains.
+    // resolveMethodOnType validates against X (and its supertypes), so a class
+    // whose method actually lives elsewhere yields NO edge, not a wrong one. And
+    // crucially this does NOT fire when a concrete return type WAS captured but
+    // simply lacks the method: `ret` is set, so the final call below rejects it
+    // (absent-method safety, so a same-named decoy is still never matched).
+    if (ref.language === 'objc' && /^[A-Z]/.test(factoryClass)) {
+      return resolveMethodOnType(factoryClass, method, ref, context, 0.8, 'instance-method', importedFqnOf(factoryClass, ref, context));
+    }
+    return null;
+  }
   return resolveMethodOnType(ret, method, ref, context, 0.85, 'instance-method', importedFqnOf(ret, ref, context));
 }
 
@@ -1123,11 +1139,12 @@ export function matchReference(
   }
 
   // 1d. Dotted chained static-factory / fluent call (Java / Kotlin / C# / Swift /
-  // Go / Scala / Dart) — `Foo.getInstance().bar()` encoded as `Foo.getInstance().bar`,
-  // Go's bare-factory `New().Method()` as `New().Method`, Scala's companion factory
-  // `Foo.create().bar()`, or Dart's static factory / factory-constructor
-  // `Foo.create().bar()` (#645/#608 mechanism). Resolve the method's class from the
-  // inner call's declared return type, then validate it.
+  // Go / Scala / Dart / Objective-C) — `Foo.getInstance().bar()` encoded as
+  // `Foo.getInstance().bar`, Go's bare-factory `New().Method()` as `New().Method`,
+  // Scala's companion factory, Dart's static factory / factory-constructor, or
+  // ObjC's chained message send `[[Foo create] doIt]` encoded as `Foo.create().doIt`
+  // (#645/#608 mechanism). Resolve the method's class from the inner call's
+  // declared return type, then validate it.
   if (
     ref.language === 'java' ||
     ref.language === 'kotlin' ||
@@ -1135,7 +1152,8 @@ export function matchReference(
     ref.language === 'swift' ||
     ref.language === 'go' ||
     ref.language === 'scala' ||
-    ref.language === 'dart'
+    ref.language === 'dart' ||
+    ref.language === 'objc'
   ) {
     result = matchDottedCallChain(ref, context);
     if (result) return result;
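
The resolution order in `matchDottedCallChain` (captured return type first, receiver-class fallback second, no edge otherwise) can be sketched against a toy index that mirrors the four synthetic tests. Everything here is an assumption made for illustration: `ToyIndex`, `resolveOnType`, and `resolveChain` are hypothetical and are not CodeGraph's data structures or API, and confidence scores are omitted.

```typescript
// Hypothetical toy index: class -> own methods, class -> superclass, and
// `Class::method` -> captured return type (absent for instancetype factories).
interface ToyIndex {
  methods: Record<string, string[]>;
  superclass: Record<string, string | undefined>;
  returnTypes: Record<string, string | undefined>;
}

/** Find the class declaring `method`, starting at `type` and walking supertypes. */
function resolveOnType(index: ToyIndex, type: string, method: string): string | null {
  const seen = new Set<string>();
  let current: string | undefined = type;
  while (current && !seen.has(current)) {
    seen.add(current);
    if ((index.methods[current] ?? []).some((m) => m === method)) {
      return `${current}::${method}`;
    }
    current = index.superclass[current];
  }
  return null; // no class in the chain declares the method: no edge
}

/** Resolve an encoded chain such as `Foo.create().doIt`. */
function resolveChain(index: ToyIndex, encoded: string): string | null {
  const m = /^(\w+)\.([\w:]+)\(\)\.(\w+)$/.exec(encoded);
  if (!m) return null;
  const [, factoryClass, factoryMethod, method] = m;
  const ret = index.returnTypes[`${factoryClass}::${factoryMethod}`];
  // A captured return type is authoritative, even when it lacks the method.
  if (ret) return resolveOnType(index, ret, method);
  // Fallback: an instancetype-style factory returns the receiver class itself.
  if (/^[A-Z]/.test(factoryClass)) return resolveOnType(index, factoryClass, method);
  return null;
}

const index: ToyIndex = {
  methods: {
    Bar: ['doIt'], Base: ['render'], Widget: [], Cache: ['clearAll'],
    Other: ['onlyOther'], Decoy: ['doIt', 'render', 'clearAll'],
  },
  superclass: { Widget: 'Base' },
  returnTypes: { 'Foo::create': 'Bar', 'Factory::make': 'Widget' },
};

console.log(resolveChain(index, 'Foo.create().doIt'));       // Bar::doIt
console.log(resolveChain(index, 'Factory.make().render'));   // Base::render
console.log(resolveChain(index, 'Foo.create().onlyOther'));  // null
console.log(resolveChain(index, 'Cache.shared().clearAll')); // Cache::clearAll
```

The third case shows why the fallback sits inside the `!ret` branch: `Foo::create` has a captured return type (`Bar`), so the receiver-class fallback never runs and `Other::onlyOther` is not matched.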