
fix(resolution): yield per ref and cache hot per-ref work so the watchdog can't kill a valid index (#1122) (#1137)

The #850 liveness watchdog was killing valid `codegraph init`/`index` runs
at "Resolving refs 0-2%" on large collision-heavy repos (18-25K-file Java
monorepos on slower hardware). #1105's cooperative yielding assumed a
500-ref sub-chunk is always cheap, but per-ref cost is unbounded: a
colliding method name (`execute`, `process`, ...) whose candidate set
misses the 5,000-entry name LRU re-fetches every same-named row
(unbounded SELECT + materialization, measured 8.8ms at just 4K collisions
on an M4 — linear in collision count), and receiver-type inference
re-split the whole source file per ref (~20% of total index CPU). A dense
pocket multiplied that past the 60s window and the heartbeat starved.

Three guards, no behavior change:
- resolveBatchYielding checkpoints after EVERY ref (maybeYield is a ~ns
  time check when under budget), so a slow pocket can never run more than
  one ref past the yield budget.
- resolveMethodOnType's ref-independent candidate filter is memoized per
  (language, Type::method) on the resolver context; per-ref
  disambiguation (import FQN #314, call-site file #1079) stays outside
  the memo.
- Receiver inference reads lines through a per-file LRU (shared and C++
  inferrers), and skips generated/minified lines >10K chars instead of
  regex-scanning them per ref.
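
The first guard can be pictured with a minimal sketch. It assumes a time-budgeted checkpoint that is invoked after every item; `createYieldController`, `processAll`, and the injectable `now` clock are hypothetical names for illustration, not the project's actual helpers, and the real `maybeYield` may be built differently.

```typescript
// Hypothetical sketch: a time-budgeted yield checkpoint called after every item.
// None of these names come from the codegraph source.
interface YieldController {
  maybeYield: () => Promise<void>;
  yieldCount: () => number;
}

function createYieldController(
  budgetMs: number,
  now: () => number = () => Date.now(), // injectable clock (assumption, eases testing)
): YieldController {
  let lastYield = now();
  let yields = 0;
  return {
    maybeYield: async () => {
      // Under budget: only a clock comparison, no macrotask hop.
      if (now() - lastYield < budgetMs) return;
      yields++;
      // Over budget: let pending timers (e.g. a heartbeat) run before continuing.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
      lastYield = now();
    },
    yieldCount: () => yields,
  };
}

async function processAll<T>(
  items: T[],
  work: (item: T) => void,
  maybeYield: () => Promise<void>,
): Promise<number> {
  let checkpoints = 0;
  for (const item of items) {
    work(item); // per-item cost may be arbitrarily large
    checkpoints++;
    await maybeYield(); // checkpoint after EVERY item, not every N
  }
  return checkpoints;
}
```

With a checkpoint after each item, the longest stretch without a yield is the budget plus the cost of one item, whereas a checkpoint every N items allows the budget plus N times the worst-case item cost.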

Measured on a 4,028-file synthetic Java bank repo (392K refs): mid-loop
max event-loop stall 1528ms -> 546ms under cache thrash, total init
250.9s -> 96.8s at default config.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Colby Mchenry 1 day ago
parent
commit
81cb59a86e

+ 1 - 0
CHANGELOG.md

@@ -16,6 +16,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### Fixes
 
+- `codegraph init` and `codegraph index` no longer get killed by the safety watchdog at the "Resolving refs" step on large method-name-heavy codebases (big Java/enterprise monorepos were the main victims, especially on slower machines). Resolution used to come up for air only every 500 references, so a dense stretch of expensive ones could starve the watchdog long enough for it to assume the process was stuck and kill a perfectly healthy index. Resolution now checkpoints after every reference, and two of the expensive steps got much cheaper: repeated method lookups on the same type are now cached, and source files are no longer re-split line-by-line for every call being resolved — indexing such repos is several times faster as a result. Generated or minified single-line files are also skipped during receiver-type inference instead of being scanned per call. Thanks @UchihaYong and @wangmeng-95 for the reports. (#1122)
 - The automatic context hook for Claude Code now fires for structural questions asked in nearly thirty languages — French, Spanish, Portuguese, German, Italian, Dutch, Polish, Czech, Romanian, Hungarian, Greek, Swedish, Danish, Norwegian, Finnish, Russian, Ukrainian, Turkish, Indonesian, Vietnamese, Thai, Hindi, Arabic, Farsi, Hebrew, Japanese, Korean, and both simplified and traditional Chinese — instead of just English and simplified Chinese. Previously a natural question like "comment marche la state machine des commandes ?" injected nothing unless it happened to contain a code-shaped symbol name, making the hook look broken for non-English teams. English questions phrased with derived word forms ("explain the architecture…", "what are the dependencies…") now fire too, and prompts in any other language still fire when they name a symbol from the index. Thanks @anthonyle-roy-lgtm for the report. (#1126)
 - Lua and Luau method calls with capitalized names (`obj:Method()` — the standard Roblox convention) now link to the right method. Because Lua's method-call syntax looks identical to a Luau type annotation, a capitalized call like `lg:Log()` was misread as declaring the variable's type, so whenever two or more classes shared a method name (`Init`, `Update`, `Destroy`, …) the call was silently dropped from callers, impact/blast-radius, and flow traces. Lowercase method names were unaffected. Thanks @inth3shadows for the precise root-cause analysis and repro. (#1124)
 - Removed dead code left behind by the discontinued managed-reasoning feature. Its `codegraph login` flow was unplugged before ever shipping in a release, but the unused module still shipped inside the platform bundles, and a security review flagged its Windows browser-open step (it routed the login URL through `cmd`, which would have been unsafe had the flow ever been wired back up). The leftover module and its tests are now fully deleted. Thanks @inth3shadows for the report. (#1114)

+ 176 - 1
__tests__/resolution.test.ts

@@ -11,7 +11,7 @@ import * as os from 'os';
 import { CodeGraph } from '../src';
 import { Node, UnresolvedReference } from '../src/types';
 import { ReferenceResolver, createResolver, ResolutionContext } from '../src/resolution';
-import { matchReference, resolveMethodOnType, matchByQualifiedName, preferCallSiteFile } from '../src/resolution/name-matcher';
+import { matchReference, resolveMethodOnType, matchByQualifiedName, preferCallSiteFile, matchMethodCall } from '../src/resolution/name-matcher';
 import { resolveImportPath, extractImportMappings, resolveJvmImport, loadCppIncludeDirs, clearCppIncludeDirCache, isPhpIncludePathRef } from '../src/resolution/import-resolver';
 import type { UnresolvedRef } from '../src/resolution/types';
 import { detectFrameworks, getAllFrameworkResolvers } from '../src/resolution/frameworks';
@@ -1650,6 +1650,181 @@ func main() {
     });
   });
 
+  describe('Watchdog-safe resolution on collision-heavy repos (#1122)', () => {
+    // On a large Java-style repo, per-ref resolution cost is unbounded in the
+    // worst case (a colliding method name whose candidate set misses the LRU
+    // re-fetches tens of thousands of rows, and receiver inference re-splits
+    // the whole source file). v1.2.0 yielded only every 500 refs, so a dense
+    // pocket multiplied that cost past the #850 watchdog window and a VALID
+    // `init` was SIGKILLed at "Resolving refs". These pin the three guards:
+    // per-ref yield checkpoints, the (type, method) match memo, and the
+    // per-file lines cache with its generated/minified-line skip.
+    const methodNode = (
+      id: string,
+      filePath: string,
+      qualifiedName: string,
+      name: string,
+      language: Node['language'] = 'typescript',
+      kind: Node['kind'] = 'method',
+    ): Node => ({
+      id, kind, name, qualifiedName, filePath, language,
+      startLine: 1, endLine: 1, startColumn: 0, endColumn: 0, updatedAt: 0,
+    });
+
+    it('resolveMethodOnType consults the method-match memo and still disambiguates per call site', () => {
+      const logA = methodNode('m:a', 'a/svc.ts', 'Logger::log', 'log');
+      const logB = methodNode('m:b', 'b/svc.ts', 'Logger::log', 'log');
+      const shared = [logA, logB]; // one cached array served to every caller
+      let memoCalls = 0;
+      let rawNameLookups = 0;
+      const ctx: ResolutionContext = {
+        getNodesInFile: () => [],
+        getNodesByName: () => { rawNameLookups++; return shared; },
+        getMethodMatches: () => { memoCalls++; return shared; },
+        getNodesByQualifiedName: () => [],
+        getNodesByKind: () => [],
+        fileExists: () => false,
+        readFile: () => null,
+        getProjectRoot: () => '',
+        getAllFiles: () => [],
+      };
+      const refFrom = (filePath: string): UnresolvedRef => ({
+        fromNodeId: 'caller', referenceName: 'lg.log', referenceKind: 'calls',
+        line: 2, column: 0, filePath, language: 'typescript',
+      });
+
+      // Both call sites read the SAME memoized array, yet each still resolves
+      // to its own file — per-ref disambiguation runs after the memo (#1079).
+      const fromA = resolveMethodOnType('Logger', 'log', refFrom('a/svc.ts'), ctx, 0.9, 'instance-method');
+      const fromB = resolveMethodOnType('Logger', 'log', refFrom('b/svc.ts'), ctx, 0.9, 'instance-method');
+      expect(fromA?.targetNodeId).toBe('m:a');
+      expect(fromB?.targetNodeId).toBe('m:b');
+      expect(memoCalls).toBe(2);
+      expect(rawNameLookups).toBe(0); // memo bypasses the unbounded name fetch
+    });
+
+    it('the production resolver context memoizes method matches per (language, type, method)', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'svc.ts'),
+        `class Logger { log() { return 1; } }\nexport function use() { const lg = new Logger(); return lg.log(); }\n`,
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      const resolver = (cg as unknown as { resolver: ReferenceResolver }).resolver;
+      const ctx = (resolver as unknown as { context: ResolutionContext }).context;
+
+      const first = ctx.getMethodMatches!('Logger', 'log', 'typescript');
+      const second = ctx.getMethodMatches!('Logger', 'log', 'typescript');
+      expect(first.map((n) => n.qualifiedName)).toEqual(['Logger::log']);
+      // Same array instance = served from the memo, not recomputed.
+      expect(second).toBe(first);
+
+      resolver.clearCaches();
+      const afterClear = ctx.getMethodMatches!('Logger', 'log', 'typescript');
+      expect(afterClear).not.toBe(first);
+      expect(afterClear.map((n) => n.qualifiedName)).toEqual(['Logger::log']);
+    });
+
+    it('resolveBatchYielding offers a yield checkpoint for every ref', async () => {
+      fs.writeFileSync(
+        path.join(tempDir, 'a.ts'),
+        `export function fnA() { return 1; }\nexport function fnB() { return fnA(); }\nexport function fnC() { return fnB(); }\n`,
+      );
+      fs.writeFileSync(
+        path.join(tempDir, 'b.ts'),
+        `import { fnA } from './a';\nexport function fnD() { return fnA(); }\n`,
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+      const resolver = (cg as unknown as { resolver: ReferenceResolver }).resolver;
+
+      // `init({ index: true })` already ran resolution, so feed the batch
+      // directly — resolveBatchYielding takes it as an argument; whether each
+      // ref resolves is irrelevant to the checkpoint contract.
+      const refs: UnresolvedReference[] = ['fnA', 'fnB', 'nosuchFn', 'fnA', 'alsoMissing'].map((name, i) => ({
+        fromNodeId: `caller-${i}`,
+        referenceName: name,
+        referenceKind: 'calls',
+        line: i + 1,
+        column: 0,
+        filePath: 'a.ts',
+        language: 'typescript',
+      }));
+
+      let checkpoints = 0;
+      const countingYield = async () => { checkpoints++; };
+      const result = await (resolver as unknown as {
+        resolveBatchYielding(batch: UnresolvedReference[], maybeYield: () => Promise<void>): Promise<{ stats: { total: number } }>;
+      }).resolveBatchYielding(refs, countingYield);
+
+      // One checkpoint per ref: a pocket of pathologically slow refs can never
+      // run more than ONE ref past the yield budget before the heartbeat gets
+      // a window. Before #1122 a pocket could run 500 refs between checkpoints.
+      expect(checkpoints).toBe(refs.length);
+      expect(result.stats.total).toBe(refs.length);
+    });
+
+    it('receiver inference reads lines through getFileLines when the context provides it', () => {
+      const loggerClass = methodNode('c:logger', 'svc.ts', 'Logger', 'Logger', 'typescript', 'class');
+      const logMethod = methodNode('m:log', 'svc.ts', 'Logger::log', 'log');
+      const otherLog = methodNode('m:other', 'other.ts', 'Other::log', 'log');
+      const byName: Record<string, Node[]> = {
+        Logger: [loggerClass],
+        log: [logMethod, otherLog], // ambiguous bare name → only inference can resolve
+      };
+      const lines = ['const lg = new Logger();', 'lg.log();'];
+      const ctx: ResolutionContext = {
+        getNodesInFile: () => [],
+        getNodesByName: (name) => byName[name] ?? [],
+        getNodesByQualifiedName: () => [],
+        getNodesByKind: () => [],
+        fileExists: () => false,
+        // Reading the raw source must not be needed when lines are provided.
+        readFile: () => { throw new Error('readFile must not be called when getFileLines exists'); },
+        getFileLines: () => lines,
+        getProjectRoot: () => '',
+        getAllFiles: () => [],
+      };
+      const ref: UnresolvedRef = {
+        fromNodeId: 'caller', referenceName: 'lg.log', referenceKind: 'calls',
+        line: 2, column: 0, filePath: 'svc.ts', language: 'typescript',
+      };
+      expect(matchMethodCall(ref, ctx)?.targetNodeId).toBe('m:log');
+    });
+
+    it('receiver inference skips generated/minified lines instead of regex-scanning them', () => {
+      const loggerClass = methodNode('c:logger', 'svc.ts', 'Logger', 'Logger', 'typescript', 'class');
+      const logMethod = methodNode('m:log', 'svc.ts', 'Logger::log', 'log');
+      const otherLog = methodNode('m:other', 'other.ts', 'Other::log', 'log');
+      const byName: Record<string, Node[]> = {
+        Logger: [loggerClass],
+        log: [logMethod, otherLog],
+      };
+      const ctxWithLines = (lines: string[]): ResolutionContext => ({
+        getNodesInFile: () => [],
+        getNodesByName: (name) => byName[name] ?? [],
+        getNodesByQualifiedName: () => [],
+        getNodesByKind: () => [],
+        fileExists: () => false,
+        readFile: () => null,
+        getFileLines: () => lines,
+        getProjectRoot: () => '',
+        getAllFiles: () => [],
+      });
+      const ref: UnresolvedRef = {
+        fromNodeId: 'caller', referenceName: 'lg.log', referenceKind: 'calls',
+        line: 1, column: 0, filePath: 'svc.ts', language: 'typescript',
+      };
+
+      // Control: the declaration on a normal-length line resolves.
+      const normal = matchMethodCall(ref, ctxWithLines(['const lg = new Logger(); lg.log();']));
+      expect(normal?.targetNodeId).toBe('m:log');
+
+      // The same declaration buried in a >10K-char generated/minified line is
+      // skipped — no resolution, and no per-ref regex pass over the huge line.
+      const minified = 'var pad="' + 'x'.repeat(10_000) + '";const lg = new Logger(); lg.log();';
+      expect(matchMethodCall(ref, ctxWithLines([minified]))).toBeNull();
+    });
+  });
+
   describe('Local-variable receiver-type inference (#1108)', () => {
     // `lg.log()` where `lg` is a local whose type is inferred from its
     // declaration/initializer. Before this, only C++ resolved these; every

+ 96 - 37
src/resolution/index.ts

@@ -6,7 +6,7 @@
 
 import * as fs from 'fs';
 import * as path from 'path';
-import { Node, UnresolvedReference, Edge } from '../types';
+import { Language, Node, UnresolvedReference, Edge } from '../types';
 import { QueryBuilder } from '../db/queries';
 import {
   UnresolvedRef,
@@ -227,6 +227,8 @@ export class ReferenceResolver {
   private nameCache: LRUCache<string, Node[]>; // name → nodes cache
   private lowerNameCache: LRUCache<string, Node[]>; // lower(name) → nodes cache
   private qualifiedNameCache: LRUCache<string, Node[]>; // qualified_name → nodes cache
+  private fileLinesCache: LRUCache<string, string[] | null>; // file → split lines cache
+  private methodMatchCache: LRUCache<string, Node[]>; // lang\0Type::method → matching method nodes
   private knownNames: Set<string> | null = null; // all known symbol names for fast pre-filtering
   private knownFiles: Set<string> | null = null;
   private cachesWarmed = false;
@@ -254,6 +256,10 @@ export class ReferenceResolver {
     this.nameCache = new LRUCache(limit);
     this.lowerNameCache = new LRUCache(limit);
     this.qualifiedNameCache = new LRUCache(limit);
+    // Split-lines arrays are heavier than content strings; refs arrive
+    // file-ordered, so a small cache still hits nearly always.
+    this.fileLinesCache = new LRUCache(contentLimit);
+    this.methodMatchCache = new LRUCache(limit);
 
     this.context = this.createContext();
   }
@@ -324,11 +330,30 @@ export class ReferenceResolver {
     this.nameCache.clear();
     this.lowerNameCache.clear();
     this.qualifiedNameCache.clear();
+    this.fileLinesCache.clear();
+    this.methodMatchCache.clear();
     this.knownNames = null;
     this.knownFiles = null;
     this.cachesWarmed = false;
   }
 
+  /** `readFile` through the LRU content cache (null = read failed, also cached). */
+  private readFileCached(filePath: string): string | null {
+    if (this.fileCache.has(filePath)) {
+      return this.fileCache.get(filePath)!;
+    }
+    const fullPath = path.join(this.projectRoot, filePath);
+    try {
+      const content = fs.readFileSync(fullPath, 'utf-8');
+      this.fileCache.set(filePath, content);
+      return content;
+    } catch (error) {
+      logDebug('Failed to read file for resolution', { filePath, error: String(error) });
+      this.fileCache.set(filePath, null);
+      return null;
+    }
+  }
+
   /**
    * Create the resolution context
    */
@@ -349,6 +374,27 @@ export class ReferenceResolver {
         return result;
       },
 
+      getMethodMatches: (typeName: string, methodName: string, language: Language) => {
+        const key = `${language}\0${typeName}::${methodName}`;
+        const cached = this.methodMatchCache.get(key);
+        if (cached !== undefined) return cached;
+        let candidates = this.nameCache.get(methodName);
+        if (candidates === undefined) {
+          candidates = this.queries.getNodesByName(methodName);
+          this.nameCache.set(methodName, candidates);
+        }
+        const want = `${typeName}::${methodName}`;
+        const matches: Node[] = [];
+        for (const m of candidates) {
+          if (m.kind !== 'method') continue;
+          if (m.language !== language) continue;
+          const qn = m.qualifiedName;
+          if (qn === want || qn.endsWith(`::${want}`)) matches.push(m);
+        }
+        this.methodMatchCache.set(key, matches);
+        return matches;
+      },
+
       getNodesByQualifiedName: (qualifiedName: string) => {
         const cached = this.qualifiedNameCache.get(qualifiedName);
         if (cached !== undefined) return cached;
@@ -379,21 +425,15 @@ export class ReferenceResolver {
         }
       },
 
-      readFile: (filePath: string) => {
-        if (this.fileCache.has(filePath)) {
-          return this.fileCache.get(filePath)!;
-        }
+      readFile: (filePath: string) => this.readFileCached(filePath),
 
-        const fullPath = path.join(this.projectRoot, filePath);
-        try {
-          const content = fs.readFileSync(fullPath, 'utf-8');
-          this.fileCache.set(filePath, content);
-          return content;
-        } catch (error) {
-          logDebug('Failed to read file for resolution', { filePath, error: String(error) });
-          this.fileCache.set(filePath, null);
-          return null;
-        }
+      getFileLines: (filePath: string) => {
+        const cached = this.fileLinesCache.get(filePath);
+        if (cached !== undefined) return cached;
+        const source = this.readFileCached(filePath);
+        const lines = source === null ? null : source.split(/\r?\n/);
+        this.fileLinesCache.set(filePath, lines);
+        return lines;
       },
 
       getProjectRoot: () => this.projectRoot,
@@ -926,39 +966,58 @@ export class ReferenceResolver {
   }
 
   /**
-   * Resolve one batch in smaller sub-chunks, yielding to the event loop between
-   * them so the #850 liveness heartbeat can fire on a slow/dense batch (#1091).
-   * Behaviourally identical to a single `resolveAll(batch)`: `warmCaches()` is
-   * idempotent (guarded) and `resolveOne` is independent per ref, so splitting
-   * and re-merging changes only timing, never which edges get created. Falls
-   * through to a plain `resolveAll` when the batch is already small.
+   * Resolve one batch with a yield checkpoint between EVERY ref so the #850
+   * liveness heartbeat can fire on a slow/dense batch (#1091). The checkpoint
+   * granularity is per-ref — not per-N-refs — because per-ref cost is unbounded
+   * in the worst case (a collision-heavy method name whose candidate set misses
+   * the LRU re-fetches tens of thousands of rows): any fixed N multiplies that
+   * worst case into the watchdog window, which is how v1.2.0 still got killed
+   * at "Resolving refs" on large Java monorepos (#1122). `maybeYield()` is a
+   * ~ns time check when under budget, so per-ref checkpoints cost nothing.
+   * Behaviourally identical to `resolveAll(batch)`: `warmCaches()` is
+   * idempotent (guarded) and `resolveOne` is independent per ref, so yielding
+   * between refs changes only timing, never which edges get created.
    */
   private async resolveBatchYielding(
     batch: UnresolvedReference[],
-    maybeYield: MaybeYield,
-    subChunkSize: number = 500
+    maybeYield: MaybeYield
   ): Promise<ResolutionResult> {
-    if (batch.length <= subChunkSize) return this.resolveAll(batch);
+    this.warmCaches();
 
     const resolved: ResolvedRef[] = [];
     const unresolved: UnresolvedRef[] = [];
     const byMethod: Record<string, number> = {};
-    let total = 0;
-    let resolvedCount = 0;
-    let unresolvedCount = 0;
-    for (let i = 0; i < batch.length; i += subChunkSize) {
-      const chunk = this.resolveAll(batch.slice(i, i + subChunkSize));
-      for (const r of chunk.resolved) resolved.push(r);
-      for (const u of chunk.unresolved) unresolved.push(u);
-      total += chunk.stats.total;
-      resolvedCount += chunk.stats.resolved;
-      unresolvedCount += chunk.stats.unresolved;
-      for (const [m, c] of Object.entries(chunk.stats.byMethod)) {
-        byMethod[m] = (byMethod[m] || 0) + c;
+
+    for (const raw of batch) {
+      const ref: UnresolvedRef = {
+        fromNodeId: raw.fromNodeId,
+        referenceName: raw.referenceName,
+        referenceKind: raw.referenceKind,
+        line: raw.line,
+        column: raw.column,
+        filePath: raw.filePath || this.getFilePathFromNodeId(raw.fromNodeId),
+        language: raw.language || this.getLanguageFromNodeId(raw.fromNodeId),
+      };
+      const result = this.resolveOne(ref);
+      if (result) {
+        resolved.push(result);
+        byMethod[result.resolvedBy] = (byMethod[result.resolvedBy] || 0) + 1;
+      } else {
+        unresolved.push(ref);
       }
       await maybeYield();
     }
-    return { resolved, unresolved, stats: { total, resolved: resolvedCount, unresolved: unresolvedCount, byMethod } };
+
+    return {
+      resolved,
+      unresolved,
+      stats: {
+        total: batch.length,
+        resolved: resolved.length,
+        unresolved: unresolved.length,
+        byMethod,
+      },
+    };
   }
 
   /**
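
As a simplified picture of the memo added in this file: the sketch below caches only the ref-independent filter, keyed by (language, type, method), and keeps call-site disambiguation outside the cache. It is an illustration under stated assumptions: it uses a plain `Map` where the resolver uses an `LRUCache`, and `MethodMatchMemo`, `MethodNode`, `fetchByName`, and `pickForCallSite` are hypothetical names that do not appear in the source.

```typescript
// Hypothetical sketch of memoizing a ref-independent candidate filter.
// A plain Map stands in for the LRU cache used by the real resolver.
interface MethodNode {
  id: string;
  kind: string;
  language: string;
  qualifiedName: string;
  filePath: string;
}

class MethodMatchMemo {
  private readonly memo = new Map<string, MethodNode[]>();
  fetches = 0; // counts uncached name lookups, for demonstration only

  constructor(private readonly fetchByName: (name: string) => MethodNode[]) {}

  getMethodMatches(typeName: string, methodName: string, language: string): MethodNode[] {
    // NUL separator keeps the language from running into the type name.
    const key = `${language}\0${typeName}::${methodName}`;
    const cached = this.memo.get(key);
    if (cached !== undefined) return cached;

    this.fetches++;
    const want = `${typeName}::${methodName}`;
    const matches = this.fetchByName(methodName).filter(
      (m) =>
        m.kind === 'method' &&
        m.language === language &&
        (m.qualifiedName === want || m.qualifiedName.endsWith(`::${want}`)),
    );
    this.memo.set(key, matches);
    return matches;
  }
}

// Per-ref step, deliberately outside the memo: prefer a candidate that lives
// in the caller's own file, otherwise fall back to the first match.
function pickForCallSite(matches: MethodNode[], callSiteFile: string): MethodNode | undefined {
  return matches.find((m) => m.filePath === callSiteFile) ?? matches[0];
}
```

Because the cached array never depends on the calling file, two call sites can share one entry and still resolve to different targets.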

+ 42 - 18
src/resolution/name-matcher.ts

@@ -503,15 +503,25 @@ export function resolveMethodOnType(
   // in-class (`class Foo { int bar() { ... } }`) or out-of-line in a separate
   // file (`int Foo::bar() { ... }` in foo.cpp while class Foo is in foo.hpp).
   // The previous same-file approach missed the latter — the typical C++ layout.
-  const methodCandidates = context.getNodesByName(methodName);
-  const want = `${typeName}::${methodName}`;
-  const matches: Node[] = [];
-  for (const m of methodCandidates) {
-    if (m.kind !== 'method') continue;
-    if (m.language !== ref.language) continue;
-    const qn = m.qualifiedName;
-    if (qn === want || qn.endsWith(`::${want}`)) {
-      matches.push(m);
+  // Prefer the context's per-(type, method) memo: the raw name lookup fetches
+  // EVERY node sharing the method name — tens of thousands of rows for a
+  // collision-heavy Java name like `execute` — and re-filtering that per ref
+  // was a dominant term in the #1122 watchdog kill on large repos. Only the
+  // ref-independent filter is memoized; per-ref disambiguation stays below.
+  let matches: Node[];
+  if (context.getMethodMatches) {
+    matches = context.getMethodMatches(typeName, methodName, ref.language);
+  } else {
+    const methodCandidates = context.getNodesByName(methodName);
+    const want = `${typeName}::${methodName}`;
+    matches = [];
+    for (const m of methodCandidates) {
+      if (m.kind !== 'method') continue;
+      if (m.language !== ref.language) continue;
+      const qn = m.qualifiedName;
+      if (qn === want || qn.endsWith(`::${want}`)) {
+        matches.push(m);
+      }
     }
   }
   if (matches.length === 0) {
@@ -610,10 +620,14 @@ function inferCppReceiverType(
   context: ResolutionContext,
   depth = 0,
 ): string | null {
-  const source = context.readFile(ref.filePath);
-  if (!source) return null;
+  // Per-file lines cache when available: this runs per `receiver->method()`
+  // ref, and re-splitting the file each time has the same quadratic cost as
+  // in the shared inferrer (#1122).
+  const lines = context.getFileLines
+    ? context.getFileLines(ref.filePath)
+    : (context.readFile(ref.filePath)?.split(/\r?\n/) ?? null);
+  if (!lines || lines.length === 0) return null;
 
-  const lines = source.split(/\r?\n/);
   const callLineIndex = Math.max(0, Math.min(lines.length - 1, ref.line - 1));
   const escapedReceiver = receiverName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
   const receiverPattern = new RegExp(`\\b${escapedReceiver}\\b`);
@@ -646,10 +660,12 @@ function inferCppReceiverType(
 
   for (const headerPath of headerCandidates) {
     if (!context.fileExists(headerPath)) continue;
-    const headerSource = context.readFile(headerPath);
-    if (!headerSource) continue;
+    const headerLines = context.getFileLines
+      ? context.getFileLines(headerPath)
+      : (context.readFile(headerPath)?.split(/\r?\n/) ?? null);
+    if (!headerLines) continue;
 
-    for (const line of headerSource.split(/\r?\n/)) {
+    for (const line of headerLines) {
       if (!receiverPattern.test(line)) continue;
       const declaratorMatch = line.match(declaratorRegex);
       if (!declaratorMatch) continue;
@@ -1205,10 +1221,14 @@ function inferLocalReceiverType(
   );
   if (patterns.length === 0) return null;
 
-  const source = context.readFile(ref.filePath);
-  if (!source) return null;
+  // Split through the context's per-file lines cache when available: this runs
+  // for EVERY `receiver.method()` ref, and re-splitting the whole file per ref
+  // was ~20% of total index CPU on Java-heavy repos (#1122).
+  const lines = context.getFileLines
+    ? context.getFileLines(ref.filePath)
+    : (context.readFile(ref.filePath)?.split(/\r?\n/) ?? null);
+  if (!lines || lines.length === 0) return null;
 
-  const lines = source.split(/\r?\n/);
   const callIdx = Math.max(0, Math.min(lines.length - 1, ref.line - 1));
   const startIdx = Math.max(0, enclosingScopeStartLine(ref, context) - 1);
 
@@ -1216,6 +1236,10 @@ function inferLocalReceiverType(
   for (let i = callIdx; i >= startIdx; i--) {
     const line = lines[i];
     if (!line) continue;
+    // A generated/minified line (one multi-KB statement) is not something a
+    // human-written local declaration lives on, and regexing it per ref is
+    // pure waste — skip it rather than scan it.
+    if (line.length > 10_000) continue;
     for (const re of patterns) {
       const m = line.match(re);
       if (m && m[1]) {
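
The long-line guard in this hunk can be illustrated with a small backward scan. This is a sketch under assumptions, not the project's inferrer: `inferReceiverTypeSketch` and `MAX_SCANNED_LINE_LENGTH` are hypothetical names, and the single `const|let|var ... = new Type(` pattern stands in for the language-specific patterns the real code builds.

```typescript
// Hypothetical sketch: scan upward from the call line for a declaration of
// `receiver`, skipping lines too long to be hand-written code.
const MAX_SCANNED_LINE_LENGTH = 10000; // mirrors the >10K-char threshold in the diff

function inferReceiverTypeSketch(
  lines: string[],
  receiver: string,
  callLine: number, // 1-based, like ref.line
): string | null {
  const escaped = receiver.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Assumed pattern for illustration: `const lg = new Logger(...)`.
  const decl = new RegExp(
    `\\b(?:const|let|var)\\s+${escaped}\\s*=\\s*new\\s+([A-Za-z_$][\\w$]*)`,
  );
  const callIdx = Math.max(0, Math.min(lines.length - 1, callLine - 1));
  for (let i = callIdx; i >= 0; i--) {
    const line = lines[i];
    if (!line) continue; // empty or missing line
    if (line.length > MAX_SCANNED_LINE_LENGTH) continue; // generated/minified: skip
    const m = line.match(decl);
    if (m && m[1]) return m[1];
  }
  return null;
}
```

The trade-off is visible in the second test case of the diff above: a declaration that really does sit on a huge line is no longer found, in exchange for never running a regex over that line once per reference.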

+ 20 - 0
src/resolution/types.ts

@@ -75,6 +75,26 @@ export interface ResolutionContext {
   fileExists(filePath: string): boolean;
   /** Read file content */
   readFile(filePath: string): string | null;
+  /**
+   * `readFile(filePath)` split into lines, LRU-cached per file. Receiver-type
+   * inference scans source lines for EVERY `receiver.method()` ref; splitting
+   * the whole file per ref made that O(refs-in-file × file-size) — ~20% of
+   * total index CPU on a Java-heavy repo and a driver of the #1122 watchdog
+   * kill on large ones. Optional so external/test contexts compile without it;
+   * callers fall back to splitting `readFile` themselves.
+   */
+  getFileLines?(filePath: string): string[] | null;
+  /**
+   * The method-definition nodes matching `typeName::methodName` in `language` —
+   * exactly `resolveMethodOnType`'s kind/language/qualifiedName-suffix filter,
+   * LRU-cached per (language, type, method). The uncached path re-fetches every
+   * node sharing the METHOD name (unbounded — tens of thousands on a collision-
+   * heavy Java repo) and re-scans it per ref, the dominant term in the #1122
+   * watchdog kill. Cached entries hold only the small filtered result; per-ref
+   * disambiguation (import FQN, call-site file) stays in the caller so a cached
+   * entry is valid from any call site. Optional for external/test contexts.
+   */
+  getMethodMatches?(typeName: string, methodName: string, language: Language): Node[];
   /** Get project root */
   getProjectRoot(): string;
   /** Get all files */
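
Because `getFileLines` is optional, every caller needs the same fallback. The sketch below shows that pattern alongside a simple cached implementation; `MiniContext`, `linesFor`, and `withLinesCache` are hypothetical names, and a plain `Map` is assumed in place of the resolver's LRU cache.

```typescript
// Hypothetical sketch of an optional, cached line accessor with a fallback.
interface MiniContext {
  readFile(filePath: string): string | null;
  getFileLines?(filePath: string): string[] | null;
}

// Caller side: use the cached accessor when present, else split on demand.
function linesFor(ctx: MiniContext, filePath: string): string[] | null {
  return ctx.getFileLines
    ? ctx.getFileLines(filePath)
    : (ctx.readFile(filePath)?.split(/\r?\n/) ?? null);
}

// Provider side: wrap a context so each file is split at most once.
// Failed reads (null) are cached too, so they are not retried per ref.
function withLinesCache(base: MiniContext): MiniContext {
  const cache = new Map<string, string[] | null>();
  return {
    readFile: (p) => base.readFile(p),
    getFileLines: (p) => {
      if (cache.has(p)) return cache.get(p) ?? null;
      const source = base.readFile(p);
      const lines = source === null ? null : source.split(/\r?\n/);
      cache.set(p, lines);
      return lines;
    },
  };
}
```

Keeping the method optional means external or test contexts that only implement `readFile` keep compiling and behave as before, just without the caching benefit.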