
feat(extraction): capture function-as-value — callback registration sites in callers/impact (#756) (#807)

A function name used as a VALUE — passed as an argument
(signal(SIGINT, handler), qsort(..., compare)), assigned to a function
pointer or field (ops->recv_cb = my_cb, OnClick := Handler), or placed in
a struct initializer / handler table ({ .recv_cb = my_cb },
{ "get", getCommand }) — produced no edge in ANY of the 19 tree-sitter
languages, so registered callbacks looked dead and their registration
sites were invisible to callers/impact.

This adds table-driven function-as-value capture across all 19 languages
(plus the wrapper forms: &fn, &Cls::method, Java Class::m, Kotlin ::f,
Swift #selector, ObjC @selector, Ruby method(:sym), Scala eta-expansion, Pascal
@Handler), gated at extraction (same-file definitions + imported
bindings; C-family file-scope initializers are constant-expression
contexts and skip the gate, which is how redis-style cross-file command
tables resolve), and resolved by a dedicated strategy: function/method
targets only, same-file first, unique-or-drop cross-file, no fuzzy
fallback ever. Edges persist as kind 'references' with metadata.fnRef,
so getCallers/getImpactRadius surface them with zero graph-layer
changes; MCP callers/callees label them "via callback registration".
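The gate and the resolution strategy described above can be sketched in TypeScript. This is a minimal illustration, not CodeGraph's code. The `Def` and `FnRef` shapes and the `passesGate` and `resolveFunctionRef` names are hypothetical; the real resolver is `matchFunctionRef`, according to the design doc below. The sketch also assumes that an ambiguous same-file match drops the edge, which the commit does not spell out.

```typescript
// Hypothetical shapes for illustration; these are not CodeGraph's real types.
type NodeKind = 'function' | 'method' | 'class' | 'variable';

interface Def {
  id: string;
  name: string;
  kind: NodeKind;
  filePath: string;
}

interface FnRef {
  fromId: string; // node that contains the registration site
  name: string; // identifier used as a value
  filePath: string; // file containing the registration site
}

// Extraction gate (sketch): keep a candidate only if its name is a same-file
// function/method or an imported binding. C-family file-scope initializers
// are constant-expression contexts, so they skip the gate.
function passesGate(
  name: string,
  sameFileFnNames: readonly string[],
  importedNames: readonly string[],
  cFileScopeInit: boolean
): boolean {
  if (cFileScopeInit) return true;
  return sameFileFnNames.indexOf(name) !== -1 || importedNames.indexOf(name) !== -1;
}

// Resolution (sketch): exact name, function/method kinds only, never the
// ref's own node, same-file first, cross-file only when unique. Any ambiguity
// returns undefined (no edge); there is no fuzzy fallback.
function resolveFunctionRef(ref: FnRef, defs: readonly Def[]): Def | undefined {
  const candidates = defs.filter(
    (d) =>
      d.name === ref.name &&
      (d.kind === 'function' || d.kind === 'method') &&
      d.id !== ref.fromId
  );
  const sameFile = candidates.filter((d) => d.filePath === ref.filePath);
  if (sameFile.length > 0) {
    return sameFile.length === 1 ? sameFile[0] : undefined;
  }
  return candidates.length === 1 ? candidates[0] : undefined;
}
```

The following cases mirror the SAME-FILE PRIORITY and C UNGATED TABLES tests in this commit. If `my_cb` is defined in both `decoy.c` and `real.c`, a reference from `real.c` resolves to the `real.c` definition. A `dupCmd` defined in two other files resolves to nothing.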

Precision rules bought by real-repo false positives (full A/B record in
docs/design/function-ref-capture.md): C++ is &-explicit outside
file-scope tables (fmt's begin/out/size collisions; out-of-line member
defs are function-kind); TS/JS/Python bare ids resolve to functions only
(TS class fields extract as method-kind — pre-existing quirk); Swift
refuses same-file method overload-families; param-forward shapes
(this.x = x, value: value) and destructuring are skipped; minified
bundles (*.min.js) produce no candidates.
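The extraction-side skips above can be pictured as one predicate over capture candidates. This is a hypothetical TypeScript sketch. The `Candidate` fields and `keepCandidate` are assumptions chosen to mirror these rules: C++ is `&`-explicit outside file-scope tables, param-forward and destructuring shapes are skipped, and minified bundles produce nothing. The resolution-side rules (TS/JS/Python function-only targets, Swift overload-family refusal) are deliberately left out.

```typescript
// Hypothetical candidate shape; field names are illustrative assumptions.
interface Candidate {
  language: string; // e.g. 'c', 'cpp', 'typescript', 'swift'
  name: string; // identifier used as a value
  addressOf: boolean; // written as `&name` (C/C++)
  fileScopeInit: boolean; // inside a file-scope initializer table
  pairedName?: string; // LHS member (`o->cb = cb`) or arg label (`value: value`)
  destructuring: boolean; // e.g. `const { center } = ellipse`
  filePath: string;
}

function keepCandidate(c: Candidate): boolean {
  // Minified bundles: single-letter names would resolve everywhere.
  if (/\.min\.js$/.test(c.filePath)) return false;
  // Destructuring extracts data, never a function alias.
  if (c.destructuring) return false;
  // Param-forward: the value is a forwarded local/parameter, so a
  // same-named function elsewhere would be the wrong target.
  if (c.pairedName !== undefined && c.pairedName === c.name) return false;
  // C++ is &-explicit: bare identifiers count only in file-scope tables.
  if (c.language === 'cpp' && !c.addressOf && !c.fileScopeInit) return false;
  return true;
}
```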

Validated on 17 public OSS repos (redis, excalidraw, gin, bytes, okhttp,
okio, Alamofire, flask, sinatra, Newtonsoft.Json, scopt, provider,
busted, Fusion, AFNetworking, PascalCoin, fmt): node counts identical,
zero calls edges lost or gained, references strictly additive
(+3,200 registration edges total), precision spot-checked by reading
sampled source lines (redis 30/30, flask 8/8). Deliberately NOT covered:
indirect-dispatch resolution (o->cb(x) → impl) — that needs data-flow
through struct fields, and a wrong edge is worse than none.
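The A/B acceptance criteria above amount to a simple comparison between a baseline index and a fixed index. Node counts must be identical, no `calls` edges may be lost or gained, and `references` must be strictly additive. Below is a hypothetical sketch of that check. The `Snapshot` shape, the string edge keys, and `checkAb` are assumptions, because the commit does not say how its comparison was scripted. Precision was checked separately by reading sampled source lines.

```typescript
// Hypothetical per-repo snapshot; edge keys such as "wire->cb_a" are illustrative.
interface Snapshot {
  nodes: number;
  calls: readonly string[];
  references: readonly string[];
}

interface AbVerdict {
  ok: boolean;
  refsGained: number;
  problems: string[];
}

// Keys present in `a` but not in `b`.
function missing(a: readonly string[], b: readonly string[]): string[] {
  return a.filter((e) => b.indexOf(e) === -1);
}

function checkAb(base: Snapshot, fix: Snapshot): AbVerdict {
  const problems: string[] = [];
  if (base.nodes !== fix.nodes) {
    problems.push('node count ' + base.nodes + ' -> ' + fix.nodes);
  }
  // calls edges must be untouched in both directions.
  missing(base.calls, fix.calls).forEach((e) => problems.push('calls lost: ' + e));
  missing(fix.calls, base.calls).forEach((e) => problems.push('calls gained: ' + e));
  // references must be strictly additive: nothing lost, gains allowed.
  missing(base.references, fix.references).forEach((e) => problems.push('references lost: ' + e));
  const refsGained = missing(fix.references, base.references).length;
  return { ok: problems.length === 0, refsGained, problems };
}
```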

EXTRACTION_VERSION 18 → 19 (re-index to benefit).

Closes #756

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry, 1 week ago
parent
commit
8a114ba53c

+ 1 - 0
CHANGELOG.md

@@ -16,6 +16,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### New Features
 
+- CodeGraph now sees where a function is **registered as a callback**, not just where it's called. A function name passed as an argument (`signal(SIGINT, handler)`, `qsort(…, compare)`, `addEventListener(…, onBlur)`), assigned to a function pointer or field (`ops->recv_cb = my_cb`, `OnClick := Handler`), or placed in a struct initializer or handler table (`{ .recv_cb = my_cb }`, `{ "get", getCommand }`) now produces a reference edge from the registration site to the function — so `codegraph_callers` and `codegraph_impact` surface callback wiring that previously looked like dead code. Works across all supported languages, including the language-specific forms: C/C++ `&fn`, Java `Class::method`, Kotlin `::fn`, Swift `#selector`, Objective-C `@selector`, Ruby `method(:fn)`, Scala eta-expansion, and Delphi/Pascal `@Handler` and `OnClick := Handler` event wiring. Callers output labels these "via callback registration". Resolution is deliberately conservative: an ambiguous name produces no edge rather than a wrong one. Re-index a project to benefit. Thanks @zmcrazy. (#756)
 - The `codegraph_node` MCP tool can now **read a whole source file like the built-in Read tool — only faster, served from the index**. Pass a file path with no symbol and it returns that file's current source with line numbers (the same `<n>⇥<line>` shape Read produces, so an assistant can edit straight from it), narrowable with `offset`/`limit` exactly like Read, plus a one-line note of which files depend on it (the file's blast radius). Use it anywhere you'd reach for Read on an indexed source file. Pass `symbolsOnly: true` for just the file's structure. Configuration/data files (`.yml` / `.properties`) are summarized by key only, never dumped, so secrets in them are never surfaced. The agent-facing guidance was also retuned so assistants reach for codegraph while *implementing* a change (not only when answering questions), since one codegraph call returns the same bytes plus the blast radius, faster than re-reading the file.
 - New `codegraph upgrade` command updates CodeGraph to the latest release in place — it detects how you installed (the standalone `install.sh` / `install.ps1` bundle, npm, or npx) and does the right thing for each, on macOS, Linux, and Windows. Use `codegraph upgrade --check` to see whether an update is available without installing, or `codegraph upgrade <version>` to move to a specific version. After upgrading it reminds you to re-index your projects so they pick up the newer engine's improvements. (#679)
 - `codegraph status` now flags when a project's index was built by an older engine than the one you're running and recommends re-indexing (also surfaced in `codegraph status --json`), so you know when a `codegraph index -f` or `codegraph sync` will add coverage a newer release introduced.

+ 498 - 0
__tests__/function-ref.test.ts

@@ -0,0 +1,498 @@
+/**
+ * Function-as-value capture tests (#756) — registration-linking for callbacks.
+ *
+ * A function name used as a VALUE (passed as an argument, assigned to a
+ * field/function pointer, placed in a struct/object initializer or function
+ * table) must produce a `references` edge from the registration site to the
+ * function, so `callers`/`impact` surface where a callback is wired up.
+ *
+ * Safety properties verified here, per the dynamic-dispatch discipline
+ * ("a wrong edge is worse than none"):
+ *  - decoy: an ambiguous cross-file name (no import, ≥2 definitions) → NO edge
+ *  - same-file priority: a same-file definition beats a same-named decoy
+ *  - kind filter: a class/variable passed as a value never gets a
+ *    function-ref edge
+ *  - self: a function passing itself → no self-loop
+ *  - drain: all resolvable function_ref rows leave unresolved_refs (no
+ *    batched-resolver runaway), and re-index is idempotent
+ */
+
+import { describe, it, expect, beforeAll, afterEach } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import { CodeGraph } from '../src';
+import type { Edge } from '../src/types';
+import { initGrammars, loadAllGrammars } from '../src/extraction/grammars';
+
+beforeAll(async () => {
+  await initGrammars();
+  await loadAllGrammars();
+});
+
+/** Incoming edges to `name`'s node that came from function-as-value capture. */
+function fnRefEdgesInto(cg: CodeGraph, name: string): Edge[] {
+  const targets = cg.getNodesByName(name);
+  const edges: Edge[] = [];
+  for (const t of targets) {
+    for (const e of cg.getIncomingEdges(t.id)) {
+      if (e.kind === 'references' && e.metadata?.fnRef === true) {
+        edges.push(e);
+      }
+    }
+  }
+  return edges;
+}
+
+/** Names of the source nodes of the given edges, sorted. */
+function sourceNames(cg: CodeGraph, edges: Edge[]): string[] {
+  const names: string[] = [];
+  for (const e of edges) {
+    const n = cg.getNode(e.source);
+    if (n) names.push(n.name);
+  }
+  return names.sort();
+}
+
+describe('Function-as-value capture (#756)', () => {
+  let tmpDir: string | undefined;
+  afterEach(() => {
+    if (tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true });
+    tmpDir = undefined;
+  });
+
+  it('C: registration sites produce references edges (the #756 scenario)', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-c-'));
+    fs.writeFileSync(
+      path.join(tmpDir, 'driver.c'),
+      [
+        'struct ops { void (*recv_cb)(int); void (*send_cb)(int); };',
+        'typedef void (*cb_t)(int);',
+        '',
+        'static void my_recv_cb(int x) { (void)x; }',
+        'static void my_send_cb(int x) { (void)x; }',
+        '',
+        'void register_handler(void (*cb)(int)) { cb(1); }',
+        '',
+        'void direct_caller(void) { my_recv_cb(5); }',
+        '',
+        'void arg_registrar(void) { register_handler(my_recv_cb); }',
+        'void addr_registrar(void) { register_handler(&my_recv_cb); }',
+        'void assign_registrar(struct ops *o) { o->recv_cb = my_recv_cb; }',
+        '',
+        'static struct ops global_ops = { .recv_cb = my_recv_cb, .send_cb = my_send_cb };',
+        'static cb_t cb_table[] = { my_recv_cb, my_send_cb };',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+
+      const intoRecv = fnRefEdgesInto(cg, 'my_recv_cb');
+      expect(sourceNames(cg, intoRecv)).toEqual([
+        'addr_registrar',
+        'arg_registrar',
+        'assign_registrar',
+        'driver.c', // file-scope: designated init + positional table (deduped per source)
+      ]);
+
+      // The direct call is still a `calls` edge — unchanged by this feature.
+      const recv = cg.getNodesByName('my_recv_cb')[0]!;
+      const callEdges = cg
+        .getIncomingEdges(recv.id)
+        .filter((e) => e.kind === 'calls');
+      expect(sourceNames(cg, callEdges)).toEqual(['direct_caller']);
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('TypeScript: arg / object / array / member / assignment forms', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-ts-'));
+    fs.writeFileSync(
+      path.join(tmpDir, 'main.ts'),
+      [
+        'export function targetCb(x: number): void { console.log(x); }',
+        'function registerHandler(cb: (x: number) => void): void { cb(1); }',
+        '',
+        'export function argRegistrar(): void { registerHandler(targetCb); }',
+        'export function timerRegistrar(): void { setTimeout(targetCb, 100); }',
+        'export function objRegistrar(): unknown { return { recv: targetCb }; }',
+        'export function arrRegistrar(): unknown { return [targetCb]; }',
+        '',
+        'class Emitter { cb: ((x: number) => void) | null = null; }',
+        'export function assignRegistrar(e: Emitter): void { e.cb = targetCb; }',
+        '',
+        'interface Btn { on(ev: string, cb: () => void): void; }',
+        'export class Comp {',
+        '  handleClick(): void {}',
+        '  wire(btn: Btn): void { btn.on("click", this.handleClick); }',
+        '}',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+
+      expect(sourceNames(cg, fnRefEdgesInto(cg, 'targetCb'))).toEqual([
+        'argRegistrar',
+        'arrRegistrar',
+        'assignRegistrar',
+        'objRegistrar',
+        'timerRegistrar',
+      ]);
+      // `this.handleClick` is deliberately NOT captured in TS/JS: class fields
+      // extract as method-kind nodes, so `this.X` value positions (mostly data
+      // reads in real code) produced wrong edges — see TS_JS_SPEC note.
+      expect(fnRefEdgesInto(cg, 'handleClick')).toHaveLength(0);
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('resolves an imported callback across files via its import', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-import-'));
+    fs.writeFileSync(
+      path.join(tmpDir, 'handlers.ts'),
+      'export function onMessage(x: number): void { console.log(x); }\n'
+    );
+    fs.writeFileSync(
+      path.join(tmpDir, 'wiring.ts'),
+      [
+        "import { onMessage } from './handlers';",
+        'export function wire(bus: { on(cb: (x: number) => void): void }): void {',
+        '  bus.on(onMessage);',
+        '}',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+      const edges = fnRefEdgesInto(cg, 'onMessage');
+      expect(sourceNames(cg, edges)).toContain('wire');
+      // The edge must target the handlers.ts definition.
+      const target = cg.getNode(edges[0]!.target);
+      expect(target?.filePath.endsWith('handlers.ts')).toBe(true);
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('DECOY: ambiguous cross-file name without an import resolves to NO edge', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-decoy-'));
+    // Two same-named functions in different files…
+    fs.writeFileSync(path.join(tmpDir, 'a.ts'), 'export function process(x: number): void {}\n');
+    fs.writeFileSync(path.join(tmpDir, 'b.ts'), 'export function process(x: number): void {}\n');
+    // …and a registrar that names `process` WITHOUT importing it. The name
+    // would pass the extraction gate only if it were imported or defined
+    // here; it is neither, so this asserts the gate. Even if it leaked
+    // through, the ambiguity rule (unique-only cross-file) must yield no edge.
+    fs.writeFileSync(
+      path.join(tmpDir, 'c.ts'),
+      'export function wire(bus: { on(cb: unknown): void }, process: unknown): void { bus.on(process); }\n'
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+      const edges = fnRefEdgesInto(cg, 'process');
+      expect(sourceNames(cg, edges)).not.toContain('wire');
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('SAME-FILE PRIORITY: a same-file definition beats a same-named decoy elsewhere', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-samefile-'));
+    fs.writeFileSync(path.join(tmpDir, 'decoy.c'), 'void my_cb(int x) { (void)x; }\n');
+    fs.writeFileSync(
+      path.join(tmpDir, 'real.c'),
+      [
+        'static void my_cb(int x) { (void)x; }',
+        'void register_handler(void (*cb)(int)) { cb(1); }',
+        'void wire(void) { register_handler(my_cb); }',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+      const wires = fnRefEdgesInto(cg, 'my_cb').filter((e) => {
+        const src = cg.getNode(e.source);
+        return src?.name === 'wire';
+      });
+      expect(wires).toHaveLength(1);
+      const target = cg.getNode(wires[0]!.target);
+      expect(target?.filePath.endsWith('real.c')).toBe(true);
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('KIND FILTER: a class passed as a value gets no function-ref edge', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-kind-'));
+    fs.writeFileSync(
+      path.join(tmpDir, 'main.ts'),
+      [
+        'export class Strategy { run(): void {} }',
+        'export function consume(x: unknown): void { void x; }',
+        'export function wire(): void { consume(Strategy); }',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+      const strategy = cg.getNodesByName('Strategy').find((n) => n.kind === 'class')!;
+      const fnRef = cg
+        .getIncomingEdges(strategy.id)
+        .filter((e) => e.metadata?.fnRef === true);
+      expect(fnRef).toHaveLength(0);
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('SELF: a function registering itself produces no self-loop', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-self-'));
+    fs.writeFileSync(
+      path.join(tmpDir, 'main.ts'),
+      [
+        'declare function schedule(cb: () => void): void;',
+        'export function retry(): void { schedule(retry); }',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+      const retry = cg.getNodesByName('retry')[0]!;
+      const selfLoops = cg
+        .getIncomingEdges(retry.id)
+        .filter((e) => e.source === retry.id && e.metadata?.fnRef === true);
+      expect(selfLoops).toHaveLength(0);
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('C++: &Cls::method member pointers resolve scoped; bare ids are free-function-only', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-cpp-'));
+    fs.writeFileSync(
+      path.join(tmpDir, 'widget.cpp'),
+      [
+        'struct Widget {',
+        '  void on_click(int x);',
+        '};',
+        'void Widget::on_click(int x) { (void)x; }',
+        'struct Decoy {',
+        '  void on_click(int x);',
+        '};',
+        'void Decoy::on_click(int x) { (void)x; }',
+        'void free_cb(int x) { (void)x; }',
+        'void bare_fn(int x) { (void)x; }',
+        'void reg(void* p) { (void)p; }',
+        'void wire() {',
+        '  auto p = &Widget::on_click;', // qualified — must hit Widget, not Decoy
+        '  reg(p);',
+        '  reg(&free_cb);', // explicit address-of — captured
+        '  reg(bare_fn);', // bare id in args — NOT captured for C++ (addressOfOnly)
+        '}',
+        // A method named like a local: passing the LOCAL must not resolve to
+        // the method (cpp args accept only explicit & forms).
+        'struct Buf { char* out(); };',
+        'void copy_to(void* out_) { (void)out_; }',
+        'void caller(char* out) { copy_to(out); }',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+
+      // Qualified member pointer resolves to Widget::on_click specifically.
+      const onClicks = cg.getNodesByName('on_click');
+      const widgetOnClick = onClicks.find((n) => n.qualifiedName.includes('Widget'))!;
+      const decoyOnClick = onClicks.find((n) => n.qualifiedName.includes('Decoy'))!;
+      const intoWidget = cg
+        .getIncomingEdges(widgetOnClick.id)
+        .filter((e) => e.metadata?.fnRef === true);
+      expect(intoWidget).toHaveLength(1);
+      expect(cg.getNode(intoWidget[0]!.source)?.name).toBe('wire');
+      expect(
+        cg.getIncomingEdges(decoyOnClick.id).filter((e) => e.metadata?.fnRef === true)
+      ).toHaveLength(0);
+
+      // Explicit &fn resolves; bare identifier in C++ args does NOT (the
+      // generic-name collision class: fmt's `begin`/`out`/`size` params).
+      expect(sourceNames(cg, fnRefEdgesInto(cg, 'free_cb'))).toContain('wire');
+      expect(fnRefEdgesInto(cg, 'bare_fn')).toHaveLength(0);
+
+      // The local `out` param must NOT produce an edge to Buf::out.
+      const outMethod = cg.getNodesByName('out').find((n) => n.kind === 'method');
+      if (outMethod) {
+        expect(
+          cg.getIncomingEdges(outMethod.id).filter((e) => e.metadata?.fnRef === true)
+        ).toHaveLength(0);
+      }
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('Pascal: := event wiring, @addr and bare args', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-pas-'));
+    fs.writeFileSync(
+      path.join(tmpDir, 'main.pas'),
+      [
+        'unit Main;',
+        'interface',
+        'type',
+        '  TCallback = procedure(X: Integer);',
+        '  THolder = class',
+        '  public',
+        '    OnFire: TCallback;',
+        '    procedure Wire;',
+        '  end;',
+        'procedure TargetCb(X: Integer);',
+        'procedure RegisterHandler(Cb: TCallback);',
+        'procedure ArgRegistrar;',
+        'procedure AddrRegistrar;',
+        'implementation',
+        'procedure TargetCb(X: Integer);',
+        'begin',
+        '  WriteLn(X);',
+        'end;',
+        'procedure RegisterHandler(Cb: TCallback);',
+        'begin',
+        '  Cb(1);',
+        'end;',
+        'procedure ArgRegistrar;',
+        'begin',
+        '  RegisterHandler(TargetCb);',
+        'end;',
+        'procedure AddrRegistrar;',
+        'begin',
+        '  RegisterHandler(@TargetCb);',
+        'end;',
+        'procedure THolder.Wire;',
+        'begin',
+        '  OnFire := TargetCb;',
+        'end;',
+        'end.',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+      expect(sourceNames(cg, fnRefEdgesInto(cg, 'TargetCb'))).toEqual([
+        'AddrRegistrar',
+        'ArgRegistrar',
+        'Wire',
+      ]);
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('C UNGATED TABLES: a command table names handlers defined in OTHER files (redis pattern)', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-ctable-'));
+    // Handler defined in its own file…
+    fs.writeFileSync(path.join(tmpDir, 't_string.c'), 'void getCommand(int c) { (void)c; }\n');
+    // …and registered in a table in ANOTHER file, with no import mechanism (C).
+    fs.writeFileSync(
+      path.join(tmpDir, 'server.c'),
+      [
+        'struct cmd { const char *name; void (*proc)(int); };',
+        'static struct cmd commandTable[] = {',
+        '  { "get", getCommand },',
+        '};',
+      ].join('\n')
+    );
+    // Ambiguity safety: two files define dupCmd; a third table references it →
+    // NO edge (unique-or-drop).
+    fs.writeFileSync(path.join(tmpDir, 'dup_a.c'), 'void dupCmd(int c) { (void)c; }\n');
+    fs.writeFileSync(path.join(tmpDir, 'dup_b.c'), 'void dupCmd(int c) { (void)c; }\n');
+    fs.writeFileSync(
+      path.join(tmpDir, 'other.c'),
+      [
+        'struct cmd2 { void (*proc)(int); };',
+        'static struct cmd2 otherTable[] = { { dupCmd } };',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+
+      // Cross-file unique handler resolves from the table's file.
+      const intoGet = fnRefEdgesInto(cg, 'getCommand');
+      expect(sourceNames(cg, intoGet)).toEqual(['server.c']);
+      const target = cg.getNode(intoGet[0]!.target);
+      expect(target?.filePath.endsWith('t_string.c')).toBe(true);
+
+      // Ambiguous handler resolves to NOTHING — silent beats wrong.
+      expect(fnRefEdgesInto(cg, 'dupCmd')).toHaveLength(0);
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+
+  it('DRAIN: resolvable function_ref rows leave unresolved_refs; re-index is stable', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-fnref-drain-'));
+    fs.writeFileSync(
+      path.join(tmpDir, 'main.c'),
+      [
+        'static void cb_a(int x) { (void)x; }',
+        'void reg(void (*cb)(int)) { cb(1); }',
+        'void wire(void) { reg(cb_a); }',
+      ].join('\n')
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    try {
+      await cg.indexAll();
+      const stats1 = cg.getStats();
+
+      // No function_ref rows may linger for resolvable names — the batched
+      // resolver must have drained them (delete keyed on the ORIGINAL stored
+      // ref; the #760 runaway came from violating that).
+      const db = (cg as unknown as { db: { prepare(sql: string): { all(): unknown[] } } }).db;
+      let leftover: unknown[] = [];
+      try {
+        leftover = db
+          .prepare("SELECT * FROM unresolved_refs WHERE reference_kind = 'function_ref'")
+          .all();
+      } catch {
+        // If internals aren't reachable this guard is covered by the edge
+        // assertions below.
+      }
+      expect(leftover).toHaveLength(0);
+
+      // Re-index: identical node/edge counts (idempotent, no accumulation).
+      await cg.indexAll();
+      const stats2 = cg.getStats();
+      expect(stats2.totalNodes).toBe(stats1.totalNodes);
+      expect(stats2.totalEdges).toBe(stats1.totalEdges);
+
+      expect(sourceNames(cg, fnRefEdgesInto(cg, 'cb_a'))).toEqual(['wire']);
+    } finally {
+      cg.destroy();
+      tmpDir = undefined;
+    }
+  });
+});

+ 188 - 0
docs/design/function-ref-capture.md

@@ -0,0 +1,188 @@
+# Function-as-value capture (#756) — registration-linking for callbacks
+
+**Problem.** A function used as a *value* — passed as an argument, assigned to a
+function pointer or field, placed in a struct initializer or handler table —
+produced **no edge** in any of the 19 tree-sitter languages (probed 2026-06-11;
+0/19). `callers(my_recv_cb)` on a C callback showed nothing but direct calls, so
+every registered callback looked dead, and the registration sites — the agent's
+actual next question ("where is this wired up?") — were invisible.
+
+**Non-goal, deliberate.** Resolving the *dispatch* (`o->cb(x)` → the concrete
+registered function) needs data-flow through struct fields; even an LSP needs
+fallbacks there (see the #756 thread). Partial coverage is worse than none and
+a wrong edge is worse than silence — dispatch resolution stays uncovered. What
+ships is the *registration* side, which is deterministic: the function's name
+is literally in the source at the registration site.
+
+## Mechanism
+
+```
+capture (tree-sitter.ts walkers, table-driven per language: src/extraction/function-ref.ts)
+   → gate (flushFnRefCandidates: same-file fn/method name ∪ imported binding names;
+            C-family file-scope initializers skip the gate — see below)
+   → unresolved ref, referenceKind 'function_ref' (internal-only kind)
+   → resolution (resolveOne branch: resolveViaImport first, then matchFunctionRef —
+                 exact name, function/method kinds only, same-family, same-file first,
+                 cross-file only when UNIQUE, never fuzzy)
+   → edge kind 'references', metadata { fnRef: true, resolvedBy, confidence }
+```
+
+`getCallers`/`getCallees`/`getImpactRadius` already traverse `references`, so
+registration sites surface with no graph-layer changes. The MCP callers/callees
+lists label them "via callback registration".
+
+Capture fires from three walkers (a node is only ever visited by one):
+`visitNode` (file/class scope), `visitForCallsAndStructure` (function bodies),
+`visitPascalBlock` (Pascal bodies). Subtrees the walkers consume without
+descending (top-level variable initializers, class field/property initializers,
+custom `visitNode` hooks like Scala's val/var handler) get a candidates-only
+`scanFnRefSubtree` that halts at nested function boundaries.
+
+## Per-language value positions (probe-verified)
+
+| Language | arg | assign RHS | keyed init | list/table | wrapper forms |
+|---|---|---|---|---|---|
+| C / ObjC | `argument_list` | `assignment_expression.right` | `initializer_pair.value` | `initializer_list`, `init_declarator.value` | `&fn` (`pointer_expression`), `@selector(...)` (ObjC) |
+| C++ | **`&` forms only** in args/rhs/varinit | (same — explicit `&` only) | bare ids at FILE scope only | bare ids at FILE scope only | `&fn`, `&Cls::method` (resolved scoped to the class) |
+| TS / JS (tsx/jsx) | `arguments` | `assignment_expression.right` | `pair.value` | `array`, `variable_declarator.value` | — (see TS notes) |
+| Python | `argument_list`, `keyword_argument.value` | `assignment.right` | `pair.value` | `list` | `self.method` (`attribute`) |
+| Go | `argument_list` | `assignment_statement` / `short_var_declaration` (`expression_list`) | `keyed_element` | `literal_value`, `var_spec.value` | — |
+| Rust | `arguments` | `assignment_expression.right` | `field_initializer.value` | `array_expression`, `static_item` / `let_declaration.value` | — |
+| Java | `argument_list` | `assignment_expression.right` | — | `variable_declarator.value` | `method_reference` (`Cls::m`, `this::m`) — the only form |
+| Kotlin | `value_arguments` | `assignment` (last child) | — | — | `callable_reference` (`::f`), `navigation_expression` `this::m` |
+| C# | `argument_list` (`argument`) | `assignment_expression.right` (incl. `+=`) | — | `initializer_expression`, `variable_declarator` | `this.M` (`member_access_expression`; vendored grammar keeps `this` anonymous — handled) |
+| Ruby | `argument_list` | — | `pair.value` | — | only `method(:sym)` / `&method(:sym)` — bare ids are calls/locals in Ruby |
+| Swift | `value_arguments` (`value_argument.value`) | `assignment.result` | (labeled ctor args = args) | `array_literal`, `property_declaration.value` | `#selector(...)` |
+| Scala | `arguments` | `assignment_expression.right` | — | `val_definition.value` (via hook scan) | eta `fn _` (`postfix_expression`) |
+| Dart | `arguments` (`argument`) | `assignment_expression.right` | `pair.value` | `list_literal`, `static_final_declaration` | — |
+| Lua / Luau | `arguments` | `assignment_statement` (`expression_list.value`) | `field.value` (keyed + positional) | (same) | — |
+| Pascal | `exprArgs` (via `visitPascalBlock`) | `assignment.rhs` (`OnFire := Handler`) | — | — | `@Handler` (`exprUnary.operand`) |
+| PHP | **skipped** | — | — | — | first-class callable `fn(...)` already extracts as a `calls` edge; string callables are a precision risk, deferred |
+
+## Precision rules (each one bought by a real-repo false positive)
+
+1. **The gate** (extraction-time): a candidate survives only if its name matches
+   a same-file function/method or an **imported binding** (`referenceKind ===
+   'imports'` only; scraping type-annotation `references` names as well let
+   through locals that shared a type member's name, as seen in excalidraw).
+2. **C-family ungated file scope**: C has no symbol imports and registers
+   callbacks cross-file at repo scale (redis `server.c`'s command table names
+   handlers from `t_*.c`). File-scope initializer positions (`value`/`list`
+   modes) skip the gate — safe because a C file-scope initializer is a
+   **constant-expression context**: a bare identifier there can only be a
+   function address (enum/macro names get dropped by the kind filter). Local
+   initializers and assignments stay gated: `prev = next`, `*str = field`,
+   `arena_ind_prev = arena_ind` (redis/jemalloc) each matched a unique
+   same-named function somewhere and produced wrong edges when `rhs`/`varinit`
+   were ungated.
+3. **TS/JS/Python: bare ids resolve to `function` kind only.** A bare
+   identifier can never be a method value in these languages (methods need a
+   receiver — `this.m` / `self.m`), and TS class FIELDS are extracted as
+   method-kind nodes (pre-existing extractor quirk), so allowing method
+   targets soaked up locals passed as arguments
+   (`new Set(selectedPointsIndices)` → a same-named "method" field;
+   docopt.py's `name`/`match` params). For the same reason `this.X` capture
+   is disabled for TS/JS — in real code `this.X` value positions are mostly
+   data reads (`setCursor(this.canvas)`). Python's `self.m` form keeps method
+   targets through its own capture shape. C#/Swift/Dart/Java/Kotlin keep
+   method targets (method groups, implicit-self, method references are real
+   method values).
+4. **C++ is `&`-explicit** (`addressOfOnly`): bare identifiers qualify only in
+   FILE-scope initializer tables; everywhere else (args, assignments, local
+   braced-init lists `{begin, size}`) only `&fn` / `&Cls::method` count.
+   C++ codebases are dense with generic free-function names (`begin`, `end`,
+   `out`, `size`, `data`) colliding with locals, and OUT-OF-LINE member
+   definitions extract as *function*-kind nodes, defeating the kind filter —
+   bare-id matching on fmt was mostly wrong edges (72 generic-name + 105
+   member/macro mismatches → after the rule: 22 edges, ~20 genuine gtest
+   member-pointer wirings). `&x` vs `*x` share C's `pointer_expression`; only
+   the `&` operator qualifies. `&Cls::method` resolves SCOPED to that class.
+5. **Swift overload-family refusal**: several same-named METHODS in one file
+   (`Session.request(...)` × N) + a bare identifier = almost always a
+   same-named parameter, not a method value (Alamofire) — refuse rather than
+   guess. A unique method (SwiftUI `action: handleTap`) still resolves.
+6. **Param-forward skips**: `this.status = status` / `o->cb = cb` (assignment
+   whose member name equals the RHS identifier) and Swift/Kotlin labeled args
+   `value: value` — a forwarded local/parameter whose function value is
+   unknowable; a same-named function elsewhere would be the WRONG target.
+7. **Destructuring skip**: `const { center } = ellipse` extracts data, never a
+   function alias.
+8. **Generated/minified files** (`*.min.js` and the codegen patterns in
+   `generated-detection.ts`) produce no fn-ref candidates — minified
+   single-letter symbols resolve everywhere (Alamofire's vendored jQuery).
+9. **Resolution**: function/method kinds only, same language family, never the
+   ref's own node (no self-loops), same-file match first, cross-file only when
+   the name is UNIQUE — ambiguity yields **no edge**. No fuzzy fallback,
+   ever (`matchReference` short-circuits `function_ref` refs to
+   `matchFunctionRef`).
+10. **Runaway invariant** (#760): `matchFunctionRef` always returns
+    `original: ref` — the stored row — so `deleteSpecificResolvedReferences`
+    drains the batch.
+
+## Validation (2026-06-11, EXTRACTION_VERSION 19)
+
+Stash-free A/B (baseline = worktree at `main`), fresh shallow clones, public
+OSS only. Per repo: node count must be identical, `calls` edges identical,
+`references` strictly additive, precision spot-checked by reading the source
+line of sampled `fnRef` edges.
+
+Final build, all 17 repos (nodes identical and calls edges untouched on every
+row; `unresolved_refs` fully drained — no batched-resolver runaway):
+
+| Lang | Repo | Nodes (base=fix) | calls Δ | refs gained | Notes |
+|---|---|---|---|---|---|
+| C | redis | 18931 | 0/0 | **+1918** | 30/30 sample genuine — ops tables, qsort comparators, module registration, lua lib tables |
+| TS/React | excalidraw | 10299 | 0/0 | **+121** | 18/20 — residual = param shadowing an imported function (file-level dep real) |
+| Go | gin | 2599 | 0/0 | +14 | |
+| Rust | bytes | 947 | 0/0 | +76 | `map(fn)`, struct init |
+| Java | okhttp | 16008 | 0/0 | +2 | method-ref forms only, by design |
+| Kotlin | okio | 7801 | 0/0 | +1 | `::fn` forms only, by design |
+| Swift | alamofire | 3477 | 0/0 | +116 | adversarial case (params mirror API names); overload-family + label==name rules applied |
+| Python | flask | 2705 | 0/0 | +111 | 8/8 sample genuine — incl. `ensure_sync(self.dispatch_request)` |
+| Ruby | sinatra | 1751 | 0/0 | +8 | `method(:sym)` only |
+| C# | newtonsoft | 20208 | 0/0 | +38 | method groups, `+=` |
+| Scala | scopt | 694 | 0/0 | +10 | eta-expansion |
+| Dart | provider | 1154 | 0/0 | +73 | implicit-this getter reads — true same-class dependencies |
+| Lua | busted | 1257 | 0/0 | +14 | |
+| Luau | fusion | 2126 | 0/0 | +18 | `:Connect(fn)` |
+| ObjC | afnetworking | 1487 | 0/0 | +52 | `@selector`, target-action |
+| Pascal | pascalcoin | 48788 | 0/0 | +577 | `OnClick :=` event wiring + paren-less-call refs (see limits) |
+| C++ | fmt | 7345 | 0/0 | +22 | ~20/22 genuine gtest member-pointer plumbing after addressOfOnly |
+
+Index cost on redis: +6% time, +5% db size.
+
+## Known limits (documented, deliberate)
+
+- **Dispatch resolution** (`o->cb(x)` → implementations): uncovered, see above.
+- **C cross-file in gated positions**: an extern callback registered via
+  *assignment* in a different file than its definition only resolves when the
+  name is repo-unique (initializer tables don't have this limit — they're
+  ungated at file scope).
+- **C++ bare-name registration** (`register_handler(my_cb)` without `&`):
+  dropped by `addressOfOnly` — the generic-name collision rate made bare ids
+  net-negative on real C++ (fmt). `&my_cb` / file-scope tables cover the
+  idioms; C files keep bare args.
+- **Local/param shadowing an imported or same-file function**
+  (`mutateElement(newElement, …)` where the file also imports `newElement`;
+  JS plugins' `indexOf(val)` with a same-file `val()` helper): irreducible
+  without local-scope tracking — the data-flow frontier deliberately left
+  uncovered. ~1-2 per 20 sampled edges on callback-heavy repos; the file-level
+  dependency is real in every observed case.
+- **Swift single same-named method collisions** (`request(self, didFailTask:
+  task…)` where one `task` method exists): the overload-family rule only
+  refuses when ≥2 same-named methods share the file. Alamofire-style
+  API-mirrored param naming keeps a residual; needs same-type scoping (v2).
+- **Pascal paren-less calls** (`Result := DoInitialize`): captured as
+  references (Pascal can't distinguish a procedure VALUE from a paren-less
+  CALL without types). The dependency direction is correct and these calls
+  were previously invisible entirely (#791) — strictly more truth, imperfect
+  label.
+- **Java/Kotlin cross-file method refs** (`OtherClass::method` without the
+  defining class imported as a simple name): gated away; same-file and
+  `this::m` forms work.
+- **Swift cross-file bare references**: Swift sees module-wide symbols without
+  imports, so cross-file bare callbacks only resolve when repo-unique.
+- **PHP string callables**, **Ruby bare symbols** outside `method(:sym)`,
+  **`obj.method` member values** where `obj` isn't `this`/`self`: deferred.
+- **TS `this.X`**: disabled until TS class-field kind classification is fixed
+  (fields currently extract as method-kind nodes).
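Resolution rule 9 above (unique-or-drop) can be pictured with a small standalone sketch. It is not the repo's `matchFunctionRef`: `GraphNode`, `FnRef`, `resolveFunctionRef`, and the name index are hypothetical. Treating two or more same-file matches as ambiguous, rather than picking one, is also an assumption of this sketch.

```typescript
// Hypothetical shapes for illustration only; the repo's node/ref types differ.
interface GraphNode {
  id: string;
  name: string;
  kind: 'function' | 'method' | 'class' | 'variable';
  filePath: string;
  languageFamily: string;
}

interface FnRef {
  name: string;
  filePath: string;
  languageFamily: string;
  fromNodeId: string; // node that contains the registration site
}

/**
 * Unique-or-drop resolution: function/method targets only, same language
 * family, never the ref's own node, same-file match first, cross-file only
 * when the name is unique. Any ambiguity returns undefined (no edge), and
 * there is no fuzzy fallback.
 */
function resolveFunctionRef(ref: FnRef, byName: Map<string, GraphNode[]>): GraphNode | undefined {
  const candidates = (byName.get(ref.name) ?? []).filter(
    (n) =>
      (n.kind === 'function' || n.kind === 'method') &&
      n.languageFamily === ref.languageFamily &&
      n.id !== ref.fromNodeId // no self-loops
  );
  const sameFile = candidates.filter((n) => n.filePath === ref.filePath);
  if (sameFile.length === 1) return sameFile[0];
  if (sameFile.length > 1) return undefined; // assumption: ambiguous in-file drops
  return candidates.length === 1 ? candidates[0] : undefined; // unique-or-drop
}
```

Returning `undefined` instead of a best guess is deliberate. A missing edge leaves `callers` incomplete, while a wrong edge points impact analysis at the wrong function.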

+ 1 - 1
src/extraction/extraction-version.ts

@@ -21,4 +21,4 @@
  * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty
  * in the product is load-bearing").
  */
-export const EXTRACTION_VERSION = 18;
+export const EXTRACTION_VERSION = 19;

+ 644 - 0
src/extraction/function-ref.ts

@@ -0,0 +1,644 @@
+/**
+ * Function-as-value capture (#756) — registration-linking for callbacks.
+ *
+ * A function name used as a VALUE — passed as a call argument
+ * (`register_handler(target_cb)`, `signal(SIGINT, handler)`), assigned to a
+ * field or function pointer (`o->cb = target_cb`, `OnFire := TargetCb`),
+ * placed in a struct/object initializer (`{ .recv_cb = my_cb }`,
+ * `{ recv: targetCb }`, `Ops{Cb: targetCb}`), or listed in a function table
+ * (`static cb_t table[] = { cb_a, cb_b }`) — is a real dependency that static
+ * call extraction misses entirely: `callers(target_cb)` showed nothing but
+ * direct calls, so every callback looked dead and its registration sites were
+ * invisible to impact analysis.
+ *
+ * This module captures those value positions during the AST walk as
+ * `function_ref` candidates. Capture is table-driven per language (the value
+ * positions and wrapper forms differ per grammar — `&fn` in C, `Main::fn` in
+ * Java, `::fn` in Kotlin, `#selector(fn)` in Swift, `@TargetCb` in Pascal,
+ * `method(:fn)` in Ruby). Candidates are GATED at end-of-file extraction
+ * (see `TreeSitterExtractor.flushFnRefCandidates`): only names matching a
+ * same-file function/method or an imported binding survive, which bounds
+ * volume and keeps precision high. Resolution then matches survivors against
+ * function/method nodes ONLY (`matchFunctionRef` in
+ * `src/resolution/name-matcher.ts`) and persists them as `references` edges,
+ * which `callers`/`impact` already traverse.
+ *
+ * Deliberately NOT covered (resolving the *dispatch* — `o->cb(x)` → the
+ * registered function — needs data-flow through struct fields; a wrong edge
+ * is worse than none): indirect-call resolution, PHP string callables,
+ * Ruby bare symbols outside `method(:sym)`, and `obj.method` member values
+ * where `obj` isn't `this`/`self`.
+ */
+
+import type { Node as SyntaxNode } from 'web-tree-sitter';
+import { getNodeText, getChildByField } from './tree-sitter-helpers';
+
+export interface FnRefCandidate {
+  name: string;
+  line: number;
+  column: number;
+  /** Which capture position produced this candidate (gate policy keys on it). */
+  mode: CaptureMode;
+  /**
+   * True when the value was an explicit reference form (`&fn`, `&Cls::m`,
+   * `::fn`, `#selector`, `method(:sym)`) rather than a bare identifier —
+   * C++'s flush policy keys on it.
+   */
+  explicitRef: boolean;
+}
+
+/** How to pull candidate value nodes out of a dispatched container node. */
+type CaptureMode =
+  | 'args' // every named child is a potential value (call argument lists)
+  | 'rhs' // the assignment right-hand side (named field, else last named child)
+  | 'value' // the `value` field of a keyed pair (object/struct/table initializers)
+  | 'list' // every named child (array / initializer-list / table positional elements)
+  | 'varinit'; // a variable declarator's initializer value
+
+interface CaptureRule {
+  mode: CaptureMode;
+  /** Field holding the value for rhs/value/varinit (defaults per mode). */
+  field?: string;
+}
+
+export interface FnRefSpec {
+  /** Bare identifier node types that can act as a function value. */
+  idTypes: Set<string>;
+  /** Container node type → how to extract candidate values from it. */
+  dispatch: Map<string, CaptureRule>;
+  /**
+   * Transparent wrapper layers between a container and its values
+   * (`argument`, `value_argument`, `literal_element`, `expression_list`…).
+   * Value: the field to descend into, or null for "named children".
+   * `expression_list` fans out to ALL named children (Go multi-assign).
+   */
+  layers?: Map<string, string | null>;
+  /**
+   * Unary wrappers whose operand is the function value — C/C++ `&fn`
+   * (pointer_expression), Pascal `@Fn` (exprUnary), Scala eta `fn _`
+   * (postfix_expression). Value: operand field, or null for first named child.
+   */
+  unwrap?: Map<string, string | null>;
+  /**
+   * Whole-node reference forms needing bespoke name extraction —
+   * `method_reference` (Java), `callable_reference` / `navigation_expression`
+   * (Kotlin), `selector_expression` (Swift `#selector` / ObjC `@selector`),
+   * Ruby `method(:sym)` calls, and `this.method` member forms.
+   */
+  special?: Set<string>;
+  /**
+   * Capture modes whose candidates skip the same-file/import gate and rely on
+   * resolution's unique-or-drop rule instead. C-family only: an initializer
+   * value, function-pointer assignment RHS, or table element is a
+   * function-pointer position by construction, and C has no symbol imports —
+   * the dominant repo-scale pattern (`server.c`'s command table naming
+   * handlers defined across files) would otherwise be invisible. Call
+   * arguments stay gated everywhere (locals passed as args dwarf callbacks).
+   */
+  ungatedModes?: Set<CaptureMode>;
+  /**
+   * C++ only: in args/rhs/varinit positions, accept ONLY explicit reference
+   * forms (`&fn`, `&Cls::method`) — never bare identifiers. C++ codebases are
+   * dense with generic free-function/accessor names (`begin`, `end`, `out`,
+   * `size`, `data`) that collide with parameters and locals, and out-of-line
+   * member definitions extract as function-kind nodes — bare-id matching on
+   * fmt was mostly wrong edges. File-scope initializer tables (value/list)
+   * still accept bare identifiers, same as C.
+   */
+  addressOfOnly?: boolean;
+}
+
+/** Names that are never function references even when grammars call them identifiers. */
+const NAME_STOPLIST = new Set([
+  'this',
+  'self',
+  'super',
+  'null',
+  'nil',
+  'true',
+  'false',
+  'undefined',
+  'new',
+  'NULL',
+  'nullptr',
+  'None',
+]);
+
+// ---------------------------------------------------------------------------
+// Per-language specs. Node types verified against each grammar (probe fixtures
+// in the #756 investigation; see docs/design/function-ref-capture.md).
+// ---------------------------------------------------------------------------
+
+/** C / C++ / Objective-C share the C-family initializer & assignment shapes. */
+function cFamilySpec(extra?: { special?: string[]; addressOfOnly?: boolean }): FnRefSpec {
+  return {
+    idTypes: new Set(['identifier']),
+    dispatch: new Map<string, CaptureRule>([
+      ['argument_list', { mode: 'args' }],
+      ['assignment_expression', { mode: 'rhs', field: 'right' }],
+      ['init_declarator', { mode: 'varinit', field: 'value' }],
+      ['initializer_list', { mode: 'list' }],
+      ['initializer_pair', { mode: 'value', field: 'value' }],
+    ]),
+    unwrap: new Map([['pointer_expression', 'argument']]),
+    special: new Set(extra?.special ?? []),
+    // C has no symbol imports, and callbacks are registered cross-file at repo
+    // scale (redis: server.c's command table names handlers from t_*.c) — so
+    // initializer positions bypass the gate and lean on resolution's
+    // unique-or-drop rule. ONLY 'value'/'list' (struct/array initializers),
+    // and the flush additionally requires FILE scope: a C file-scope
+    // initializer is a constant-expression context, so a bare identifier
+    // there can only be a function address (or enum/macro, which the
+    // function-kind filter drops) — never a variable. 'rhs'/'varinit' were
+    // tried and produced false edges (`prev = next`, `*str = field` — data
+    // assignments matching a unique same-named function elsewhere), so
+    // assignments stay gated to same-file/import.
+    ungatedModes: new Set<CaptureMode>(['value', 'list']),
+    addressOfOnly: extra?.addressOfOnly,
+  };
+}
+
+// NOTE: deliberately NO `member_expression` (`this.handleClick`) capture for
+// TS/JS. Class fields with type annotations are extracted as method-kind
+// nodes (pre-existing extractor behavior), so `this.X` value positions —
+// which in real code are mostly DATA reads (`setCursor(this.canvas)`) —
+// resolved to those field nodes and produced wrong "registration" edges
+// (excalidraw A/B finding). Revisit if/when TS field classification is fixed.
+const TS_JS_SPEC: FnRefSpec = {
+  idTypes: new Set(['identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['arguments', { mode: 'args' }],
+    ['assignment_expression', { mode: 'rhs', field: 'right' }],
+    ['variable_declarator', { mode: 'varinit', field: 'value' }],
+    ['pair', { mode: 'value', field: 'value' }],
+    ['array', { mode: 'list' }],
+  ]),
+};
+
+const PYTHON_SPEC: FnRefSpec = {
+  idTypes: new Set(['identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['argument_list', { mode: 'args' }],
+    ['assignment', { mode: 'rhs', field: 'right' }],
+    ['keyword_argument', { mode: 'value', field: 'value' }], // Thread(target=worker)
+    ['pair', { mode: 'value', field: 'value' }],
+    ['list', { mode: 'list' }],
+  ]),
+  special: new Set(['attribute']),
+};
+
+const GO_SPEC: FnRefSpec = {
+  idTypes: new Set(['identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['argument_list', { mode: 'args' }],
+    ['assignment_statement', { mode: 'rhs', field: 'right' }],
+    ['short_var_declaration', { mode: 'rhs', field: 'right' }],
+    ['var_spec', { mode: 'varinit', field: 'value' }],
+    ['keyed_element', { mode: 'value' }], // value = last literal_element child
+    ['literal_value', { mode: 'list' }], // positional composite literals
+  ]),
+  layers: new Map<string, string | null>([
+    ['literal_element', null],
+    ['expression_list', null],
+  ]),
+};
+
+const RUST_SPEC: FnRefSpec = {
+  idTypes: new Set(['identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['arguments', { mode: 'args' }],
+    ['assignment_expression', { mode: 'rhs', field: 'right' }],
+    ['field_initializer', { mode: 'value', field: 'value' }],
+    ['array_expression', { mode: 'list' }],
+    ['static_item', { mode: 'varinit', field: 'value' }],
+    ['let_declaration', { mode: 'varinit', field: 'value' }],
+  ]),
+};
+
+const JAVA_SPEC: FnRefSpec = {
+  // No bare-identifier function values in Java — only method references.
+  idTypes: new Set<string>(),
+  dispatch: new Map<string, CaptureRule>([
+    ['argument_list', { mode: 'args' }],
+    ['assignment_expression', { mode: 'rhs', field: 'right' }],
+    ['variable_declarator', { mode: 'varinit', field: 'value' }],
+  ]),
+  special: new Set(['method_reference']),
+};
+
+const KOTLIN_SPEC: FnRefSpec = {
+  idTypes: new Set<string>(),
+  dispatch: new Map<string, CaptureRule>([
+    ['value_arguments', { mode: 'args' }],
+    ['assignment', { mode: 'rhs' }], // RHS = last named child (no field in grammar)
+  ]),
+  layers: new Map<string, string | null>([['value_argument', null]]),
+  special: new Set(['callable_reference', 'navigation_expression']),
+};
+
+const CSHARP_SPEC: FnRefSpec = {
+  idTypes: new Set(['identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['argument_list', { mode: 'args' }],
+    ['assignment_expression', { mode: 'rhs', field: 'right' }], // covers `+=` event subscription
+    ['initializer_expression', { mode: 'list' }],
+    ['variable_declarator', { mode: 'varinit' }],
+  ]),
+  layers: new Map<string, string | null>([['argument', null]]),
+  special: new Set(['member_access_expression']),
+};
+
+const RUBY_SPEC: FnRefSpec = {
+  // Bare identifiers in Ruby args are method CALLS or locals, never function
+  // values — only the `method(:name)` idiom (and `&method(:name)`) qualifies.
+  idTypes: new Set<string>(),
+  dispatch: new Map<string, CaptureRule>([
+    ['argument_list', { mode: 'args' }],
+    ['pair', { mode: 'value', field: 'value' }],
+  ]),
+  layers: new Map<string, string | null>([['block_argument', null]]),
+  special: new Set(['call']),
+};
+
+const SWIFT_SPEC: FnRefSpec = {
+  idTypes: new Set(['simple_identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['value_arguments', { mode: 'args' }],
+    ['assignment', { mode: 'rhs', field: 'result' }],
+    ['array_literal', { mode: 'list' }],
+    ['property_declaration', { mode: 'varinit', field: 'value' }],
+  ]),
+  layers: new Map<string, string | null>([['value_argument', 'value']]),
+  special: new Set(['selector_expression']),
+};
+
+const SCALA_SPEC: FnRefSpec = {
+  idTypes: new Set(['identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['arguments', { mode: 'args' }],
+    ['assignment_expression', { mode: 'rhs', field: 'right' }],
+    ['val_definition', { mode: 'varinit', field: 'value' }],
+  ]),
+  unwrap: new Map<string, string | null>([['postfix_expression', null]]), // eta-expansion `fn _`
+};
+
+const DART_SPEC: FnRefSpec = {
+  idTypes: new Set(['identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['arguments', { mode: 'args' }],
+    ['assignment_expression', { mode: 'rhs', field: 'right' }],
+    ['pair', { mode: 'value', field: 'value' }],
+    ['list_literal', { mode: 'list' }],
+    ['static_final_declaration', { mode: 'varinit' }],
+  ]),
+  layers: new Map<string, string | null>([['argument', null]]),
+};
+
+const LUA_SPEC: FnRefSpec = {
+  idTypes: new Set(['identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['arguments', { mode: 'args' }],
+    ['assignment_statement', { mode: 'rhs' }], // RHS expression_list children carry `value` fields
+    ['field', { mode: 'value', field: 'value' }], // table fields, keyed AND positional
+  ]),
+  layers: new Map<string, string | null>([['expression_list', null]]),
+};
+
+const PASCAL_SPEC: FnRefSpec = {
+  idTypes: new Set(['identifier']),
+  dispatch: new Map<string, CaptureRule>([
+    ['exprArgs', { mode: 'args' }],
+    ['assignment', { mode: 'rhs', field: 'rhs' }], // OnClick := Handler
+  ]),
+  unwrap: new Map<string, string | null>([['exprUnary', 'operand']]), // @Handler
+};
+
+/**
+ * Capture specs by language. PHP is deliberately absent: its first-class
+ * callable `fn(...)` already extracts as a `calls` edge, and string callables
+ * (`'fn_name'`) are a precision risk left for a follow-up.
+ */
+export const FN_REF_SPECS: Record<string, FnRefSpec | undefined> = {
+  c: cFamilySpec(),
+  cpp: cFamilySpec({ addressOfOnly: true }),
+  objc: cFamilySpec({ special: ['selector_expression'] }),
+  typescript: TS_JS_SPEC,
+  tsx: TS_JS_SPEC,
+  javascript: TS_JS_SPEC,
+  jsx: TS_JS_SPEC,
+  python: PYTHON_SPEC,
+  go: GO_SPEC,
+  rust: RUST_SPEC,
+  java: JAVA_SPEC,
+  kotlin: KOTLIN_SPEC,
+  csharp: CSHARP_SPEC,
+  ruby: RUBY_SPEC,
+  swift: SWIFT_SPEC,
+  scala: SCALA_SPEC,
+  dart: DART_SPEC,
+  lua: LUA_SPEC,
+  luau: LUA_SPEC,
+  pascal: PASCAL_SPEC,
+};
+
+// ---------------------------------------------------------------------------
+// Capture
+// ---------------------------------------------------------------------------
+
+/**
+ * Extract candidate names from a dispatched container node. Returns the
+ * (name, position) pairs of every function-value-shaped expression found.
+ */
+export function captureFnRefCandidates(
+  container: SyntaxNode,
+  rule: CaptureRule,
+  spec: FnRefSpec,
+  source: string
+): FnRefCandidate[] {
+  const valueNodes: SyntaxNode[] = [];
+
+  switch (rule.mode) {
+    case 'args':
+    case 'list': {
+      for (let i = 0; i < container.namedChildCount; i++) {
+        const child = container.namedChild(i);
+        if (child) valueNodes.push(child);
+      }
+      break;
+    }
+    case 'rhs': {
+      const rhs = rule.field
+        ? getChildByField(container, rule.field)
+        : container.namedChild(container.namedChildCount - 1);
+      if (rhs) {
+        // Param-storage skip: `this.status = status` / `o->cb = cb` — when
+        // the assigned member's name EQUALS the RHS identifier, the RHS is a
+        // local/parameter being stored, and the function it holds (if any)
+        // is unknowable statically. A same-named function elsewhere would
+        // resolve to the WRONG target (excalidraw A/B finding), so skip.
+        const lhs =
+          getChildByField(container, 'left') ??
+          getChildByField(container, 'lhs') ??
+          getChildByField(container, 'target') ??
+          (container.namedChildCount >= 2 ? container.namedChild(0) : null);
+        const lhsText = lhs ? getNodeText(lhs, source) : '';
+        const lhsLastName = lhsText.match(/([A-Za-z_$][A-Za-z0-9_$]*)\s*$/)?.[1];
+        const rhsText = getNodeText(rhs, source).trim();
+        if (lhsLastName && lhsLastName === rhsText) break;
+        valueNodes.push(rhs);
+      }
+      break;
+    }
+    case 'value': {
+      let value = rule.field ? getChildByField(container, rule.field) : null;
+      // Keyed containers without a value field (Go keyed_element): the value
+      // is the LAST named child (the first is the key).
+      if (!value && container.namedChildCount > 0) {
+        value = container.namedChild(container.namedChildCount - 1);
+      }
+      if (value) valueNodes.push(value);
+      break;
+    }
+    case 'varinit': {
+      // Destructuring (`const { center } = ellipse`) extracts DATA from the
+      // RHS — never a function alias. Without this skip, a parameter that
+      // shadows a same-named imported function produced a wrong edge.
+      const nameNode =
+        getChildByField(container, 'name') ?? getChildByField(container, 'pattern');
+      if (nameNode && (nameNode.type === 'object_pattern' || nameNode.type === 'array_pattern' ||
+                       nameNode.type === 'tuple_pattern' || nameNode.type === 'struct_pattern')) {
+        break;
+      }
+      if (rule.field) {
+        const value = getChildByField(container, rule.field);
+        if (value) valueNodes.push(value);
+      } else {
+        // No value field in this grammar (C# variable_declarator, Dart
+        // static_final_declaration): the initializer is the last named child —
+        // but a declarator WITHOUT an initializer has its NAME there instead.
+        // Require ≥2 named children and never pick the name/pattern child.
+        const value = container.namedChild(container.namedChildCount - 1);
+        const nameChild =
+          getChildByField(container, 'name') ?? getChildByField(container, 'pattern');
+        if (
+          value &&
+          container.namedChildCount >= 2 &&
+          (!nameChild || value.id !== nameChild.id)
+        ) {
+          valueNodes.push(value);
+        }
+      }
+      break;
+    }
+  }
+
+  const out: FnRefCandidate[] = [];
+  for (const v of valueNodes) {
+    // Decided from the TOP-LEVEL value node: anything not itself an idTypes
+    // node (`&fn`, special forms, and also an id behind a transparent layer)
+    // counts as explicit. C-family specs declare no layers, so for C++'s
+    // addressOfOnly flush policy this means exactly "not a bare identifier".
+    const explicitRef = !spec.idTypes.has(v.type);
+    for (const { name, node } of normalizeValue(v, spec, source, 0)) {
+      if (!name || NAME_STOPLIST.has(name)) continue;
+      out.push({
+        name,
+        line: node.startPosition.row + 1,
+        column: node.startPosition.column,
+        mode: rule.mode,
+        explicitRef,
+      });
+    }
+  }
+  return out;
+}
+
+/**
+ * Normalize one value expression to zero or more function names. Recursion is
+ * bounded (wrapper layers only); anything that isn't a recognized
+ * function-value shape yields [].
+ */
+function normalizeValue(
+  node: SyntaxNode,
+  spec: FnRefSpec,
+  source: string,
+  depth: number
+): Array<{ name: string; node: SyntaxNode }> {
+  if (depth > 4) return [];
+  const type = node.type;
+
+  // Bare identifier
+  if (spec.idTypes.has(type)) {
+    return [{ name: getNodeText(node, source), node }];
+  }
+
+  // Transparent layers (argument, value_argument, literal_element,
+  // expression_list, block_argument). expression_list fans out (Go `a, b = f, g`).
+  const layerField = spec.layers?.get(type);
+  if (spec.layers?.has(type)) {
+    // Labeled-argument param-forward skip (Swift/Kotlin): `value: value` /
+    // `delay: delay` — when the label EQUALS the value identifier, the value
+    // is a forwarded local/parameter, not a function reference (Alamofire
+    // A/B finding; same rationale as the `this.x = x` assignment skip).
+    if (type === 'value_argument') {
+      const label = getChildByField(node, 'name');
+      const value = getChildByField(node, 'value') ?? node.namedChild(node.namedChildCount - 1);
+      if (
+        label &&
+        value &&
+        getNodeText(label, source).trim() === getNodeText(value, source).trim()
+      ) {
+        return [];
+      }
+    }
+    if (layerField) {
+      const inner = getChildByField(node, layerField);
+      return inner ? normalizeValue(inner, spec, source, depth + 1) : [];
+    }
+    const results: Array<{ name: string; node: SyntaxNode }> = [];
+    for (let i = 0; i < node.namedChildCount; i++) {
+      const child = node.namedChild(i);
+      if (child) results.push(...normalizeValue(child, spec, source, depth + 1));
+    }
+    return results;
+  }
+
+  // Unary wrappers: &fn / @Fn / `fn _`
+  const unwrapField = spec.unwrap?.get(type);
+  if (spec.unwrap?.has(type)) {
+    // C-family `pointer_expression` covers BOTH `&x` (address-of — a function
+    // value) and `*x` (dereference — a data read, never a function value).
+    // Only `&` qualifies; without this, fmt's `*begin` reads resolved to its
+    // free `begin()` functions.
+    if (type === 'pointer_expression' && node.child(0)?.type !== '&') return [];
+    const inner = unwrapField ? getChildByField(node, unwrapField) : node.namedChild(0);
+    if (!inner) return [];
+    // C++ `&Widget::on_click` — keep the QUALIFIED name. Resolution scopes the
+    // method to that class (more precise than a bare-name match, and exempt
+    // from the cpp bare-ids-are-free-functions rule since `&Cls::m` is an
+    // explicit member-pointer).
+    if (inner.type === 'qualified_identifier') {
+      const text = getNodeText(inner, source).trim();
+      return /^[A-Za-z_][\w:]*$/.test(text) ? [{ name: text, node: inner }] : [];
+    }
+    return normalizeValue(inner, spec, source, depth + 1);
+  }
+
+  // Special whole-node reference forms
+  if (spec.special?.has(type)) {
+    return normalizeSpecial(node, type, source);
+  }
+
+  return [];
+}
+
+/** Last named DESCENDANT (pre-order; `node` itself is not tested) whose type is in `types`. */
+function lastNamedOfType(node: SyntaxNode, types: Set<string>): SyntaxNode | null {
+  let found: SyntaxNode | null = null;
+  for (let i = 0; i < node.namedChildCount; i++) {
+    const child = node.namedChild(i);
+    if (!child) continue;
+    if (types.has(child.type)) found = child;
+    const deeper = lastNamedOfType(child, types);
+    if (deeper) found = deeper;
+  }
+  return found;
+}
+
+function normalizeSpecial(
+  node: SyntaxNode,
+  type: string,
+  source: string
+): Array<{ name: string; node: SyntaxNode }> {
+  switch (type) {
+    // Java `Main::targetCb` / `this::run0` — last identifier child is the method.
+    case 'method_reference': {
+      let last: SyntaxNode | null = null;
+      for (let i = 0; i < node.namedChildCount; i++) {
+        const child = node.namedChild(i);
+        if (child && child.type === 'identifier') last = child;
+      }
+      return last ? [{ name: getNodeText(last, source), node: last }] : [];
+    }
+
+    // Kotlin `::targetCb` — the simple_identifier child.
+    case 'callable_reference': {
+      for (let i = 0; i < node.namedChildCount; i++) {
+        const child = node.namedChild(i);
+        if (child && child.type === 'simple_identifier') {
+          return [{ name: getNodeText(child, source), node: child }];
+        }
+      }
+      return [];
+    }
+
+    // Kotlin `this::fire` parses as navigation_expression with a `::fire`
+    // navigation_suffix. Ordinary `a.b` navigation MUST yield nothing.
+    case 'navigation_expression': {
+      for (let i = 0; i < node.namedChildCount; i++) {
+        const child = node.namedChild(i);
+        if (child && child.type === 'navigation_suffix' && getNodeText(child, source).startsWith('::')) {
+          const id = child.namedChild(child.namedChildCount - 1);
+          if (id) return [{ name: getNodeText(id, source), node: id }];
+        }
+      }
+      return [];
+    }
+
+    // Swift `#selector(Holder.fire)` → fire. ObjC `@selector(storeImage:)` →
+    // `storeImage:` verbatim (ObjC method nodes keep their selector colons).
+    case 'selector_expression': {
+      const inner = node.namedChild(0);
+      if (!inner) return [];
+      if (inner.type === 'identifier' || inner.type === 'simple_identifier') {
+        return [{ name: getNodeText(inner, source), node: inner }];
+      }
+      // Swift dotted form: rightmost simple_identifier. ObjC keyword selector:
+      // text as-is.
+      const last = lastNamedOfType(node, new Set(['simple_identifier']));
+      if (last) return [{ name: getNodeText(last, source), node: last }];
+      return [{ name: getNodeText(inner, source).trim(), node: inner }];
+    }
+
+    // Ruby `method(:target_cb)` — a `call` whose method is literally `method`
+    // with a single symbol argument.
+    case 'call': {
+      const method = getChildByField(node, 'method');
+      if (!method || getNodeText(method, source) !== 'method') return [];
+      const args = getChildByField(node, 'arguments');
+      if (!args || args.namedChildCount !== 1) return [];
+      const sym = args.namedChild(0);
+      if (!sym || sym.type !== 'simple_symbol') return [];
+      const name = getNodeText(sym, source).replace(/^:/, '');
+      return name ? [{ name, node: sym }] : [];
+    }
+
+    // `self.handle_click` (Python) — object must be EXACTLY `self`.
+    case 'attribute': {
+      const obj = getChildByField(node, 'object');
+      const attr = getChildByField(node, 'attribute');
+      if (obj && attr && obj.type === 'identifier' && getNodeText(obj, source) === 'self') {
+        return [{ name: getNodeText(attr, source), node: attr }];
+      }
+      return [];
+    }
+
+    // `this.Run0` (C#) — receiver must be EXACTLY `this`. Two grammar shapes:
+    // newer tree-sitter-c-sharp exposes an `expression` field holding a
+    // `this_expression`; the vendored grammar keeps `this` as an anonymous
+    // token (only the `name` field is a named child), so fall back to the
+    // node text.
+    case 'member_access_expression': {
+      const name = getChildByField(node, 'name');
+      if (!name) return [];
+      const expr = getChildByField(node, 'expression');
+      const isThisReceiver = expr
+        ? expr.type === 'this_expression' || expr.type === 'this'
+        : getNodeText(node, source).startsWith('this.');
+      return isThisReceiver ? [{ name: getNodeText(name, source), node: name }] : [];
+    }
+
+    default:
+      return [];
+  }
+}

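The receiver rules in the extraction cases above (Python `self.x`, Ruby `method(:sym)`) come down to exact shape checks. The sketch below replays two of them on a hypothetical, simplified node type. `ToyNode` and its `fields`/`args` members stand in for a tree-sitter `SyntaxNode` and are assumptions, not the real tree-sitter API. It shows which shapes yield a candidate name and which are dropped.

```typescript
// ToyNode is a hypothetical stand-in for a tree-sitter SyntaxNode: a node
// type, its source text, named fields, and (for calls) argument nodes.
interface ToyNode {
  type: string;
  text: string;
  fields?: Record<string, ToyNode>;
  args?: ToyNode[];
}

// Returns the candidate callback name for a value-position node, or null
// when the shape does not qualify.
function candidateName(node: ToyNode): string | null {
  switch (node.type) {
    // Python `self.handle_click`: the object must be exactly `self`.
    case 'attribute': {
      const obj = node.fields?.['object'];
      const attr = node.fields?.['attribute'];
      if (obj && attr && obj.type === 'identifier' && obj.text === 'self') {
        return attr.text;
      }
      return null;
    }
    // Ruby `method(:target_cb)`: the method is literally `method` and the
    // only argument is a simple symbol.
    case 'call': {
      const method = node.fields?.['method'];
      if (!method || method.text !== 'method') return null;
      const args = node.args ?? [];
      const sym = args[0];
      if (args.length !== 1 || !sym || sym.type !== 'simple_symbol') return null;
      const name = sym.text.replace(/^:/, '');
      return name.length > 0 ? name : null;
    }
    default:
      return null;
  }
}

const ident = (text: string): ToyNode => ({ type: 'identifier', text });

console.log(candidateName({ type: 'attribute', text: 'self.handle_click',
  fields: { object: ident('self'), attribute: ident('handle_click') } }));
console.log(candidateName({ type: 'attribute', text: 'other.handle_click',
  fields: { object: ident('other'), attribute: ident('handle_click') } }));
console.log(candidateName({ type: 'call', text: 'method(:target_cb)',
  fields: { method: ident('method') },
  args: [{ type: 'simple_symbol', text: ':target_cb' }] }));
```

Anything that is not exactly `self` or exactly `method(:sym)`, such as `other.handle_click` or `send(:cb)`, produces no candidate. The narrow shapes are how precision is bought at capture time.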
+ 3 - 0
src/extraction/generated-detection.ts

@@ -41,6 +41,9 @@ const GENERATED_PATTERNS: ReadonlyArray<RegExp> = [
   /\.pb\.[jt]s$/,
   /_pb\.[jt]s$/,
   /_grpc_pb\.[jt]s$/,
+  // Minified bundles vendored into a repo (docs sites, examples). Their
+  // single-letter symbols make name-based edges pure noise.
+  /\.min\.m?js$/,
   // Python — protobuf / gRPC / openapi-codegen
   /_pb2(_grpc)?\.py$/,
   /_pb2\.pyi$/,

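The new generated-file pattern is anchored at the end of the path and accepts only `.min.js` and `.min.mjs`. A quick check against a few made-up paths shows what it excludes. Note that `.min.cjs` is not matched by this pattern, and neither is a file that merely ends in `min.js` without the leading dot.

```typescript
// The pattern added to GENERATED_PATTERNS above.
const MINIFIED_BUNDLE = /\.min\.m?js$/;

// Illustrative paths only.
const samples = [
  'docs/vendor/jquery.min.js', // minified bundle
  'site/assets/app.min.mjs',   // ES-module bundle
  'src/admin.js',              // ordinary source ("min.js" without the dot)
  'lib/vendor.min.cjs',        // CommonJS suffix: not covered by this pattern
];

for (const path of samples) {
  console.log(`${path}: ${MINIFIED_BUNDLE.test(path) ? 'generated' : 'source'}`);
}
```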
+ 183 - 1
src/extraction/tree-sitter.ts

@@ -17,6 +17,8 @@ import {
 } from '../types';
 import { getParser, detectLanguage, isLanguageSupported, isFileLevelOnlyLanguage } from './grammars';
 import { generateNodeId, getNodeText, getChildByField, getPrecedingDocstring } from './tree-sitter-helpers';
+import { FN_REF_SPECS, captureFnRefCandidates, type FnRefSpec, type FnRefCandidate } from './function-ref';
+import { isGeneratedFile } from './generated-detection';
 import type { LanguageExtractor, ExtractorContext } from './tree-sitter-types';
 import { EXTRACTORS } from './languages';
 import { LiquidExtractor } from './liquid-extractor';
@@ -222,12 +224,18 @@ export class TreeSitterExtractor {
   private extractor: LanguageExtractor | null = null;
   private nodeStack: string[] = []; // Stack of parent node IDs
   private methodIndex: Map<string, string> | null = null; // lookup key → node ID for Pascal defProc lookup
+  // Function-as-value capture (#756): per-language spec + candidates collected
+  // during the walk, gated & flushed into unresolvedReferences at end-of-file
+  // (see flushFnRefCandidates).
+  private fnRefSpec: FnRefSpec | undefined;
+  private fnRefCandidates: Array<FnRefCandidate & { fromNodeId: string }> = [];
 
   constructor(filePath: string, source: string, language?: Language) {
     this.filePath = filePath;
     this.source = source;
     this.language = language || detectLanguage(filePath, source);
     this.extractor = EXTRACTORS[this.language] || null;
+    this.fnRefSpec = FN_REF_SPECS[this.language];
   }
 
   /**
@@ -314,6 +322,10 @@ export class TreeSitterExtractor {
 
       this.visitNode(this.tree.rootNode);
 
+      // Gate + flush function-as-value candidates (#756) while the file's
+      // nodes and import refs are complete and the file node is still pushed.
+      this.flushFnRefCandidates();
+
       if (packageNodeId) this.nodeStack.pop();
       this.nodeStack.pop();
     } catch (error) {
@@ -352,6 +364,136 @@ export class TreeSitterExtractor {
     };
   }
 
+  /**
+   * Function-as-value capture (#756): if this node is one of the language's
+   * value-position containers (call arguments, assignment RHS, struct/object
+   * initializer, array/table literal), collect candidate function names from
+   * it. Candidates are gated & flushed at end-of-file (flushFnRefCandidates).
+   */
+  private maybeCaptureFnRefs(node: SyntaxNode, nodeType: string): void {
+    const spec = this.fnRefSpec;
+    if (!spec) return;
+    const rule = spec.dispatch.get(nodeType);
+    if (!rule || this.nodeStack.length === 0) return;
+    const fromNodeId = this.nodeStack[this.nodeStack.length - 1];
+    if (!fromNodeId) return;
+    for (const cand of captureFnRefCandidates(node, rule, spec, this.source)) {
+      this.fnRefCandidates.push({ ...cand, fromNodeId });
+    }
+  }
+
+  /**
+   * Candidates-only scan of a subtree the main walkers won't traverse
+   * (top-level variable initializers). No extraction side effects. Halts at
+   * nested function definitions: their bodies are walked — and their
+   * candidates attributed — by extractFunction's own body walk.
+   */
+  private scanFnRefSubtree(node: SyntaxNode, depth: number): void {
+    if (!this.fnRefSpec || depth > 12) return;
+    const nodeType = node.type;
+    if (depth > 0 && (
+      this.extractor?.functionTypes.includes(nodeType) ||
+      nodeType === 'arrow_function' ||
+      nodeType === 'function_expression' ||
+      nodeType === 'lambda_literal' ||
+      nodeType === 'lambda_expression'
+    )) {
+      return;
+    }
+    this.maybeCaptureFnRefs(node, nodeType);
+    for (let i = 0; i < node.namedChildCount; i++) {
+      const child = node.namedChild(i);
+      if (child) this.scanFnRefSubtree(child, depth + 1);
+    }
+  }
+
+  /**
+   * Gate captured function-as-value candidates and push survivors as
+   * `function_ref` unresolved references.
+   *
+   * The gate bounds volume and protects precision: a candidate survives only
+   * if its name matches a function/method DEFINED IN THIS FILE or an import
+   * binding of this file. Everything else (locals, params, fields passed as
+   * arguments) is dropped before it ever reaches the database. The exception
+   * is C-family file-scope initializers, which skip the gate (see
+   * FnRefSpec.ungatedModes). Resolution then matches survivors against
+   * function/method nodes only (matchFunctionRef) and emits `references`
+   * edges, which callers/impact already traverse.
+   *
+   * Known v1 limit, deliberate: a C/C++ callback registered in a DIFFERENT
+   * translation unit than its definition (extern, no symbol imports to match)
+   * is not captured unless the registration sits in a file-scope initializer
+   * table. Same-file registration, the dominant C pattern (static callback +
+   * same-file ops struct), is captured.
+   */
+  private flushFnRefCandidates(): void {
+    if (this.fnRefCandidates.length === 0) return;
+    const candidates = this.fnRefCandidates;
+    this.fnRefCandidates = [];
+
+    // Generated/minified files (vendored jquery.min.js and friends): their
+    // function-as-value edges are noise — single-letter minified symbols
+    // resolve everywhere. Same policy as the callback synthesizer.
+    if (isGeneratedFile(this.filePath)) return;
+
+    const definedHere = new Set<string>();
+    for (const n of this.nodes) {
+      if (n.kind === 'function' || n.kind === 'method') definedHere.add(n.name);
+    }
+
+    // Import-binding names only (all binding emitters push kind 'imports').
+    // Deliberately NOT 'references': those carry type-annotation and
+    // interface-member names, which let local variables that share a type
+    // member's name slip through the gate (excalidraw A/B finding).
+    const SIMPLE_NAME = /^[A-Za-z_$][A-Za-z0-9_$]*$/;
+    const importedNames = new Set<string>();
+    for (const r of this.unresolvedReferences) {
+      if (r.referenceKind === 'imports' && SIMPLE_NAME.test(r.referenceName)) {
+        importedNames.add(r.referenceName);
+      }
+    }
+
+    const ungated = this.fnRefSpec?.ungatedModes;
+    const addressOfOnly = this.fnRefSpec?.addressOfOnly === true;
+    const seen = new Set<string>();
+    for (const c of candidates) {
+      const atFileScope = c.fromNodeId.startsWith('file:');
+      // C++ (addressOfOnly): a BARE identifier qualifies only inside a
+      // file-scope initializer table. Everywhere else — args, assignments,
+      // local braced-init lists like `{begin, size}` — only explicit `&`
+      // forms count (fmt A/B finding: generic names `begin`/`out`/`size`
+      // collide with locals and members).
+      if (
+        addressOfOnly &&
+        !c.explicitRef &&
+        !(atFileScope && (c.mode === 'value' || c.mode === 'list'))
+      ) {
+        continue;
+      }
+      // C-family file-scope initializers skip the gate (constant-expression
+      // context — a bare identifier there is a function address, never a
+      // variable; see FnRefSpec.ungatedModes). Local initializers and
+      // everything else require a same-file/import match.
+      const skipGate = ungated?.has(c.mode) === true && atFileScope;
+      // Qualified C++ member-pointers (`Widget::on_click`) gate on the member
+      // name; everything else on the full name.
+      const gateName = c.name.includes('::')
+        ? c.name.slice(c.name.lastIndexOf('::') + 2)
+        : c.name;
+      if (!skipGate && !definedHere.has(gateName) && !importedNames.has(gateName)) {
+        continue;
+      }
+      const key = `${c.fromNodeId}|${c.name}`;
+      if (seen.has(key)) continue;
+      seen.add(key);
+      this.unresolvedReferences.push({
+        fromNodeId: c.fromNodeId,
+        referenceName: c.name,
+        referenceKind: 'function_ref',
+        line: c.line,
+        column: c.column,
+      });
+    }
+  }
+
   /**
    * Visit a node and extract information
    */
@@ -365,7 +507,14 @@ export class TreeSitterExtractor {
     if (this.extractor.visitNode) {
       const ctx = this.makeExtractorContext();
       const handled = this.extractor.visitNode(node, ctx);
-      if (handled) return;
+      if (handled) {
+        // The hook consumed this subtree, so the walkers below never descend
+        // into it — scan it for function-as-value candidates (#756). Scala's
+        // hook handles val/var definitions (`val table = Seq(targetCb)`), for
+        // example. The scan is capture-only and halts at nested functions.
+        this.scanFnRefSubtree(node, 0);
+        return;
+      }
     }
 
     // Pascal-specific AST handling
@@ -374,6 +523,11 @@ export class TreeSitterExtractor {
       if (skipChildren) return;
     }
 
+    // Function-as-value capture (#756) — independent of the dispatch ladder
+    // below (the captured container types have no other handler there), so it
+    // can never shadow or be shadowed by an extraction branch.
+    this.maybeCaptureFnRefs(node, nodeType);
+
     // Check for function declarations
     // For Python/Ruby, function_definition inside a class should be treated as method
     if (this.extractor.functionTypes.includes(nodeType)) {
@@ -437,17 +591,33 @@ export class TreeSitterExtractor {
     // Check for class properties (e.g. C# property_declaration)
     else if (this.extractor.propertyTypes?.includes(nodeType) && this.isInsideClassLikeNode()) {
       this.extractProperty(node);
+      // Property initializers aren't walked — scan for function-as-value
+      // candidates (#756): Scala `val table = Seq(targetCb)` in an object,
+      // Kotlin `val cb = ::handler` class properties.
+      this.scanFnRefSubtree(node, 0);
       skipChildren = true;
     }
     // Check for class fields (e.g. Java field_declaration, C# field_declaration)
     else if (this.extractor.fieldTypes?.includes(nodeType) && this.isInsideClassLikeNode()) {
       this.extractField(node);
+      // Field initializers aren't walked — scan for function-as-value
+      // candidates (#756): Java `List<IntConsumer> table = List.of(Main::cb)`,
+      // C# `List<Action<int>> table = new() { TargetCb }`.
+      this.scanFnRefSubtree(node, 0);
       skipChildren = true;
     }
     // Check for variable declarations (const, let, var, etc.)
     // Only extract top-level variables (not inside functions/methods)
     else if (this.extractor.variableTypes.includes(nodeType) && !this.isInsideClassLikeNode()) {
       this.extractVariable(node);
+      // extractVariable doesn't walk every initializer shape (object literals
+      // are deliberately skipped; Python/Ruby don't walk at all), so scan the
+      // declaration subtree for function-as-value candidates — `const routes =
+      // { home: renderHome }`, `handlers = {"recv": target_cb}`. The scan halts
+      // at nested function definitions (their bodies are walked — and
+      // attributed — separately) and flush-time dedup absorbs any overlap with
+      // initializers extractVariable DOES walk.
+      this.scanFnRefSubtree(node, 0);
       skipChildren = true; // extractVariable handles children
     }
     // Swift stored properties inside a type. Swift instance properties aren't
@@ -3086,6 +3256,10 @@ export class TreeSitterExtractor {
     const visitForCallsAndStructure = (node: SyntaxNode): void => {
       const nodeType = node.type;
 
+      // Function-as-value capture (#756) — function bodies are walked here,
+      // not in visitNode, so the capture hook must fire in both walkers.
+      this.maybeCaptureFnRefs(node, nodeType);
+
       // Rocket route-registration macros (`routes![…]` / `catchers![…]`): the
       // handler paths live in a raw token tree the call walker can't see.
       if (nodeType === 'macro_invocation') this.extractRustRouteMacro(node);
@@ -4461,8 +4635,16 @@ export class TreeSitterExtractor {
     for (let i = 0; i < node.namedChildCount; i++) {
       const child = node.namedChild(i);
       if (!child) continue;
+      // Function-as-value capture (#756): Pascal bodies are walked here, not
+      // in visitNode/visitForCallsAndStructure, so the capture hook fires here
+      // — assignment RHS is the Delphi event-wiring idiom (`OnFire := Handler`).
+      this.maybeCaptureFnRefs(child, child.type);
       if (child.type === 'exprCall') {
         this.extractPascalCall(child);
+        // The walker doesn't descend into a call's arguments — dispatch the
+        // argument container directly (`RegisterHandler(TargetCb)` / `(@Cb)`).
+        const args = child.namedChildren.find((c: SyntaxNode) => c.type === 'exprArgs');
+        if (args) this.maybeCaptureFnRefs(args, 'exprArgs');
       } else if (child.type === 'exprDot') {
         // A STATEMENT-level bare exprDot is a paren-less call (`Obj.Free;`,
         // `TFoo.GetInstance.DoIt;`). Anywhere else (assignment side, condition,

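The gate in `flushFnRefCandidates` is easiest to follow as a pure function. The sketch below reproduces its decision order (the C++ `&`-only rule, then the same-file/import gate with the file-scope exemption, then dedup on enclosing node and name) over a simplified candidate shape. `Candidate`, `GateOptions`, and the `'arg'` mode label are hypothetical; only `'value'` and `'list'` appear in the diff.

```typescript
// Simplified candidate shape (hypothetical: the real FnRefCandidate also
// carries line/column; mode labels other than 'value'/'list' are assumed).
interface Candidate {
  fromNodeId: string;    // enclosing node; file-scope ids start with "file:"
  name: string;          // e.g. "my_cb" or "Widget::on_click"
  mode: string;          // capture mode, e.g. 'arg' | 'value' | 'list'
  explicitRef?: boolean; // written with an explicit `&`
}

interface GateOptions {
  definedHere: Set<string>;   // function/method names defined in this file
  importedNames: Set<string>; // names bound by this file's imports
  ungatedModes?: Set<string>; // modes that skip the gate at file scope
  addressOfOnly?: boolean;    // C++: bare names only in file-scope tables
}

function gateCandidates(cands: Candidate[], opts: GateOptions): Candidate[] {
  const seen = new Set<string>();
  const kept: Candidate[] = [];
  for (const c of cands) {
    const atFileScope = c.fromNodeId.startsWith('file:');
    // C++: a bare identifier only counts inside a file-scope table.
    if (opts.addressOfOnly && !c.explicitRef &&
        !(atFileScope && (c.mode === 'value' || c.mode === 'list'))) {
      continue;
    }
    const skipGate = opts.ungatedModes?.has(c.mode) === true && atFileScope;
    // `Cls::member` gates on the member name.
    const gateName = c.name.includes('::')
      ? c.name.slice(c.name.lastIndexOf('::') + 2)
      : c.name;
    if (!skipGate && !opts.definedHere.has(gateName) && !opts.importedNames.has(gateName)) {
      continue;
    }
    const key = `${c.fromNodeId}|${c.name}`;
    if (seen.has(key)) continue; // dedup per (enclosing node, name)
    seen.add(key);
    kept.push(c);
  }
  return kept;
}

// C: `signal(SIGINT, my_cb)` and `strlen(len)` inside main, a repeat of the
// first, and a file-scope table entry for a function defined elsewhere.
const cFile = gateCandidates(
  [
    { fromNodeId: 'fn:main', name: 'my_cb', mode: 'arg' },
    { fromNodeId: 'fn:main', name: 'len', mode: 'arg' },
    { fromNodeId: 'file:server.c', name: 'getCommand', mode: 'list' },
    { fromNodeId: 'fn:main', name: 'my_cb', mode: 'arg' },
  ],
  { definedHere: new Set<string>(['main', 'my_cb']), importedNames: new Set<string>(),
    ungatedModes: new Set<string>(['value', 'list']) },
);
console.log(cFile.map((c) => c.name));

// C++ (addressOfOnly): bare `begin` is dropped even though a function named
// `begin` exists here; `&Widget::on_click` survives.
const cppFile = gateCandidates(
  [
    { fromNodeId: 'fn:render', name: 'begin', mode: 'arg' },
    { fromNodeId: 'fn:render', name: 'Widget::on_click', mode: 'arg', explicitRef: true },
  ],
  { definedHere: new Set<string>(['begin', 'on_click']), importedNames: new Set<string>(),
    addressOfOnly: true },
);
console.log(cppFile.map((c) => c.name));
```

The file-scope table entry survives without any same-file or import match, which mirrors how cross-file command tables reach the resolver. Whether it then resolves is up to the unique-or-drop matcher.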
+ 1 - 1
src/mcp/server-instructions.ts

@@ -47,7 +47,7 @@ typically one to a few calls; a grep/read exploration is dozens.
 - **Almost any question — "how does X work", architecture, a bug, "what/where is X", or surveying an area** → \`codegraph_explore\` (PRIMARY — call FIRST; ONE capped call returns the verbatim source of the relevant symbols grouped by file; most often the ONLY call you need)
 - **"How does X reach/become Y? / the flow / the path from X to Y"** → \`codegraph_explore\`, naming the symbols that span the flow (e.g. \`mutateElement renderScene\`) — it surfaces the call path among them, including dynamic-dispatch hops (callbacks, React re-render, JSX children) grep can't follow
 - **"What is the symbol named X?" (just its location)** → \`codegraph_search\`
-- **"What calls this?" / "What does this call?" / "What would changing this break?"** → \`codegraph_callers\` / \`codegraph_callees\` / \`codegraph_impact\`
+- **"What calls this?" / "What does this call?" / "What would changing this break?"** → \`codegraph_callers\` / \`codegraph_callees\` / \`codegraph_impact\`. Callers includes where a function is **registered as a callback** (passed as an argument, assigned to a function pointer/field, listed in a handler table) — labeled "via callback registration" — so a function with no direct calls is NOT dead if it's wired up somewhere
 - **Reading a source FILE (any time you'd use the \`Read\` tool)** → \`codegraph_node\` with a \`file\` path and no \`symbol\`. It returns the file's **current source with line numbers — the same \`<n>\\t<line>\` shape \`Read\` gives you, safe to \`Edit\` from** — narrowable with \`offset\`/\`limit\` exactly like \`Read\`, PLUS a one-line note of which files depend on it. Same bytes as \`Read\`, faster (served from the index), with the blast radius attached. Use it **instead of \`Read\`** for indexed source files; fall back to \`Read\` only for what codegraph doesn't index (configs, docs). Pass \`symbolsOnly: true\` for just the file's structure.
 - **About to read or edit a symbol you can name** → \`codegraph_node\` with that \`symbol\` (SECONDARY — the after-explore depth tool): the verbatim source (\`includeCode: true\`) PLUS its caller/callee trail, so before changing it you see what calls it and what your edit would break. For an OVERLOADED name it returns EVERY matching definition's body in one call, so you never Read a file to find the right overload
 - **"What's in directory X?"** → \`codegraph_files\`

+ 30 - 5
src/mcp/tools.ts

@@ -1113,11 +1113,14 @@ export class ToolHandler {
     // Aggregate callers across all matching symbols
     const seen = new Set<string>();
     const allCallers: Node[] = [];
+    const labels = new Map<string, string>();
     for (const node of allMatches.nodes) {
       for (const c of cg.getCallers(node.id)) {
         if (!seen.has(c.node.id)) {
           seen.add(c.node.id);
           allCallers.push(c.node);
+          const label = this.edgeLabel(c.edge);
+          if (label) labels.set(c.node.id, label);
         }
       }
     }
@@ -1126,7 +1129,7 @@ export class ToolHandler {
       return this.textResult(`No callers found for "${symbol}"${allMatches.note}`);
     }
 
-    const formatted = this.formatNodeList(allCallers.slice(0, limit), `Callers of ${symbol}`) + allMatches.note;
+    const formatted = this.formatNodeList(allCallers.slice(0, limit), `Callers of ${symbol}`, labels) + allMatches.note;
     return this.textResult(this.truncateOutput(formatted));
   }
 
@@ -1148,11 +1151,14 @@ export class ToolHandler {
     // Aggregate callees across all matching symbols
     const seen = new Set<string>();
     const allCallees: Node[] = [];
+    const labels = new Map<string, string>();
     for (const node of allMatches.nodes) {
       for (const c of cg.getCallees(node.id)) {
         if (!seen.has(c.node.id)) {
           seen.add(c.node.id);
           allCallees.push(c.node);
+          const label = this.edgeLabel(c.edge);
+          if (label) labels.set(c.node.id, label);
         }
       }
     }
@@ -1161,7 +1167,7 @@ export class ToolHandler {
       return this.textResult(`No callees found for "${symbol}"${allMatches.note}`);
     }
 
-    const formatted = this.formatNodeList(allCallees.slice(0, limit), `Callees of ${symbol}`) + allMatches.note;
+    const formatted = this.formatNodeList(allCallees.slice(0, limit), `Callees of ${symbol}`, labels) + allMatches.note;
     return this.textResult(this.truncateOutput(formatted));
   }
 
@@ -3337,18 +3343,37 @@ export class ToolHandler {
     return lines.join('\n');
   }
 
-  private formatNodeList(nodes: Node[], title: string): string {
+  private formatNodeList(nodes: Node[], title: string, labels?: Map<string, string>): string {
     const lines: string[] = [`## ${title} (${nodes.length} found)`, ''];
 
     for (const node of nodes) {
       const location = node.startLine ? `:${node.startLine}` : '';
-      // Compact: just name, kind, location
-      lines.push(`- ${node.name} (${node.kind}) - ${node.filePath}${location}`);
+      // Compact: just name, kind, location — plus the relationship when it
+      // isn't a plain call (callback registration, instantiation, …).
+      const label = labels?.get(node.id);
+      lines.push(
+        `- ${node.name} (${node.kind}) - ${node.filePath}${location}${label ? ` — via ${label}` : ''}`
+      );
     }
 
     return lines.join('\n');
   }
 
+  /**
+   * Relationship label for a non-`calls` edge in callers/callees lists. A
+   * function-as-value edge (#756) is the high-signal one: `callers(cb)`
+   * showing "via callback registration" tells the agent this is where the
+   * callback is WIRED, not where it's invoked.
+   */
+  private edgeLabel(edge: Edge): string | null {
+    if (edge.kind === 'calls') return null;
+    if (edge.metadata?.fnRef === true) return 'callback registration';
+    if (edge.kind === 'instantiates') return 'instantiation';
+    if (edge.kind === 'imports') return 'import';
+    if (edge.kind === 'references') return 'reference';
+    return edge.kind;
+  }
+
   private formatImpact(symbol: string, impact: Subgraph): string {
     const nodeCount = impact.nodes.size;
 

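The order of checks in `edgeLabel` matters: a function-as-value edge is stored as `references`, so the `fnRef` marker has to be tested before the generic `references` label. A standalone copy over a minimal edge shape (`EdgeLike` is a hypothetical stand-in for `Edge`) makes the precedence visible.

```typescript
// Minimal stand-in for the Edge shape (only the fields edgeLabel reads).
interface EdgeLike {
  kind: string;
  metadata?: { fnRef?: boolean };
}

// Same check order as edgeLabel above: plain calls stay unlabeled, and the
// fnRef marker wins over the generic `references` label.
function edgeLabel(edge: EdgeLike): string | null {
  if (edge.kind === 'calls') return null;
  if (edge.metadata?.fnRef === true) return 'callback registration';
  if (edge.kind === 'instantiates') return 'instantiation';
  if (edge.kind === 'imports') return 'import';
  if (edge.kind === 'references') return 'reference';
  return edge.kind;
}

const sampleEdges: EdgeLike[] = [
  { kind: 'calls' },
  { kind: 'references', metadata: { fnRef: true } },
  { kind: 'references' },
  { kind: 'extends' },
];
for (const e of sampleEdges) console.log(e.kind, '->', edgeLabel(e));
```

The aggregation loops in the callers/callees handlers record a label only for the first edge seen per node. A caller that both calls and registers the same function is therefore labeled according to whichever edge `getCallers` returns first.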
+ 30 - 3
src/resolution/index.ts

@@ -16,7 +16,7 @@ import {
   FrameworkResolver,
   ImportMapping,
 } from './types';
-import { matchReference, matchDottedCallChain, matchScopedCallChain, sameLanguageFamily, crossesKnownFamily } from './name-matcher';
+import { matchReference, matchFunctionRef, matchDottedCallChain, matchScopedCallChain, sameLanguageFamily, crossesKnownFamily } from './name-matcher';
 import { resolveViaImport, resolveJvmImport, extractImportMappings, extractReExports, loadCppIncludeDirs, isPhpIncludePathRef } from './import-resolver';
 import { detectFrameworks } from './frameworks';
 import { synthesizeCallbackEdges } from './callback-synthesizer';
@@ -669,6 +669,22 @@ export class ReferenceResolver {
       return null;
     }
 
+    // Function-as-value refs (#756) get a dedicated, strictly-gated path:
+    // import-based resolution first (an imported callback resolves through its
+    // import, the most precise cross-file signal), then matchFunctionRef
+    // (same-file first, unique-only cross-file, function/method targets only).
+    // They never reach the framework or fuzzy strategies below.
+    if (ref.referenceKind === 'function_ref') {
+      const viaImport = this.gateLanguage(resolveViaImport(ref, this.context), ref);
+      if (viaImport) {
+        const target = this.queries.getNodeById(viaImport.targetNodeId);
+        if (target && (target.kind === 'function' || target.kind === 'method')) {
+          return viaImport;
+        }
+      }
+      return this.gateLanguage(matchFunctionRef(ref, this.context), ref);
+    }
+
     // JVM FQN imports skip framework/name-matcher: `import com.example.Bar`
     // resolves directly through the qualifiedName index, which is unambiguous
     // even when several `Bar` classes exist in different packages.
@@ -750,7 +766,13 @@ export class ReferenceResolver {
    */
   createEdges(resolved: ResolvedRef[]): Edge[] {
     return resolved.map((ref) => {
-      let kind = ref.original.referenceKind;
+      // `function_ref` (#756) is internal-only: it persists as a `references`
+      // edge (the registration site depends on the callback), distinguishable
+      // by metadata.fnRef === true (set below; resolvedBy alone is not enough,
+      // since import-resolved refs carry 'import'). callers/impact already
+      // traverse `references`, so registration sites surface with no
+      // graph-layer changes.
+      let kind: Edge['kind'] =
+        ref.original.referenceKind === 'function_ref' ? 'references' : ref.original.referenceKind;
 
       // Promote "extends" to "implements" when a class/struct targets an interface
       if (kind === 'extends') {
@@ -784,6 +806,11 @@ export class ReferenceResolver {
         metadata: {
           confidence: ref.confidence,
           resolvedBy: ref.resolvedBy,
+          // Uniform marker for function-as-value edges (#756), regardless of
+          // which strategy resolved them (import vs matchFunctionRef) — lets
+          // tooling label "callback registration" and lets validation diff
+          // exactly the edges this feature added.
+          ...(ref.original.referenceKind === 'function_ref' ? { fnRef: true } : {}),
         },
       };
     });
@@ -1161,7 +1188,7 @@ export class ReferenceResolver {
     if (!result) return result;
     const tgt = this.getLanguageFromNodeId(result.targetNodeId);
     if (!tgt || !ref.language) return result;
-    if (ref.referenceKind === 'references' && !sameLanguageFamily(tgt, ref.language)) return null;
+    if ((ref.referenceKind === 'references' || ref.referenceKind === 'function_ref') && !sameLanguageFamily(tgt, ref.language)) return null;
     if (ref.referenceKind === 'imports' && crossesKnownFamily(tgt, ref.language)) return null;
     return result;
   }

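The resolver changes combine two steps. An import hit is kept only when it lands on a function or method; otherwise the strict name matcher runs. `createEdges` then rewrites `function_ref` to a `references` edge carrying `metadata.fnRef`. The sketch below models both steps with injected strategy callbacks. All types and the `Handler`/`handler` names are hypothetical, and the language-family gate is omitted.

```typescript
// Hypothetical minimal types; the real ResolvedRef/Node carry more fields.
type NodeKind = 'function' | 'method' | 'class' | 'variable';
interface TargetNode { id: string; kind: NodeKind }
interface Resolved { targetNodeId: string; resolvedBy: string }
type RefKind = 'calls' | 'references' | 'imports' | 'function_ref';
interface EdgeOut {
  kind: Exclude<RefKind, 'function_ref'>;
  targetNodeId: string;
  metadata: { resolvedBy: string; fnRef?: true };
}

// Import first; keep the hit only if it is a function/method, else fall
// through to the strict name matcher.
function resolveFunctionRef(
  viaImport: () => Resolved | null,
  matchByName: () => Resolved | null,
  getNode: (id: string) => TargetNode | undefined,
): Resolved | null {
  const hit = viaImport();
  if (hit) {
    const target = getNode(hit.targetNodeId);
    if (target && (target.kind === 'function' || target.kind === 'method')) return hit;
  }
  return matchByName();
}

// `function_ref` never becomes an edge kind: it persists as `references`
// plus the uniform fnRef marker, whichever strategy resolved it.
function toEdge(refKind: RefKind, r: Resolved): EdgeOut {
  const metadata: EdgeOut['metadata'] = { resolvedBy: r.resolvedBy };
  if (refKind === 'function_ref') metadata.fnRef = true;
  return {
    kind: refKind === 'function_ref' ? 'references' : refKind,
    targetNodeId: r.targetNodeId,
    metadata,
  };
}

const graph = new Map<string, TargetNode>([
  ['cls:Handler', { id: 'cls:Handler', kind: 'class' }],
  ['fn:handler', { id: 'fn:handler', kind: 'function' }],
]);
// The import points at a class, so the name matcher's function wins.
const demo = resolveFunctionRef(
  () => ({ targetNodeId: 'cls:Handler', resolvedBy: 'import' }),
  () => ({ targetNodeId: 'fn:handler', resolvedBy: 'function-ref' }),
  (id) => graph.get(id),
);
if (demo) console.log(toEdge('function_ref', demo));
```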
+ 115 - 1
src/resolution/name-matcher.ts

@@ -158,7 +158,7 @@ export function crossesKnownFamily(a: string, b: string): boolean {
  *    both-known filter so `.vue`/`.svelte` (own tag) importing `.ts` survives.
  */
 function applyLanguageGate(candidates: Node[], ref: UnresolvedRef): Node[] {
-  if (ref.referenceKind === 'references') {
+  if (ref.referenceKind === 'references' || ref.referenceKind === 'function_ref') {
     return candidates.filter((c) => sameLanguageFamily(c.language, ref.language));
   }
   if (ref.referenceKind === 'imports') {
@@ -167,6 +167,113 @@ function applyLanguageGate(candidates: Node[], ref: UnresolvedRef): Node[] {
   return candidates;
 }
 
+/**
+ * Resolve a function-as-value reference (#756) — a function name used as a
+ * callback/function-pointer value (`register(handler)`, `o->cb = handler`,
+ * `{ .cb = handler }`, `signal(SIGINT, handler)`). The ONLY strategy allowed
+ * for `function_ref` refs: exact name, function/method targets only, same
+ * language family, same-file first, and cross-file only when the match is
+ * UNIQUE. No fuzzy fallback, no qualified-name walking — a wrong callback
+ * edge is worse than none.
+ */
+export function matchFunctionRef(
+  ref: UnresolvedRef,
+  context: ResolutionContext
+): ResolvedRef | null {
+  // In JS/TS/Python a bare identifier can never be a method value (methods
+  // are only reachable through a receiver — `this.m` / `self.m` /
+  // `Cls.m`), so bare fn-refs match FUNCTIONS only. This also sidesteps the
+  // pre-existing TS quirk of class fields extracting as method-kind nodes,
+  // which otherwise soaked up local names passed as arguments (excalidraw
+  // A/B finding; same pattern in vendored docopt.py). Python's `self.m`
+  // form keeps method targets via its own capture shape. C++ likewise: a
+  // bare identifier can only be a FREE function (member values need
+  // `&Cls::method`). Other languages keep method targets: C# method groups,
+  // Swift/Dart implicit-self, Java/Kotlin method references.
+  const bareFnOnly =
+    ref.language === 'typescript' || ref.language === 'tsx' ||
+    ref.language === 'javascript' || ref.language === 'jsx' ||
+    ref.language === 'cpp' || ref.language === 'python';
+
+  // Qualified member-pointer (`&Widget::on_click` → "Widget::on_click"):
+  // resolve the member ON THAT SCOPE — exempt from bareFnOnly (the `&Cls::m`
+  // shape is an explicit member reference). Unique-or-drop like everything else.
+  if (ref.referenceName.includes('::')) {
+    const memberName = ref.referenceName.slice(ref.referenceName.lastIndexOf('::') + 2);
+    const scoped = context
+      .getNodesByName(memberName)
+      .filter(
+        (n) =>
+          (n.kind === 'function' || n.kind === 'method') &&
+          sameLanguageFamily(n.language, ref.language) &&
+          n.id !== ref.fromNodeId &&
+          (n.qualifiedName === ref.referenceName ||
+            n.qualifiedName.endsWith(`::${ref.referenceName}`))
+      );
+    if (scoped.length === 0) return null;
+    const sameFileScoped = scoped.filter((n) => n.filePath === ref.filePath);
+    const pool = sameFileScoped.length > 0 ? sameFileScoped : scoped;
+    if (sameFileScoped.length === 0 && scoped.length > 1) return null;
+    const target = pool.reduce((a, b) => (a.startLine <= b.startLine ? a : b));
+    return {
+      original: ref,
+      targetNodeId: target.id,
+      confidence: 0.9,
+      resolvedBy: 'function-ref',
+    };
+  }
+
+  const candidates = context
+    .getNodesByName(ref.referenceName)
+    .filter(
+      (n) =>
+        (n.kind === 'function' || (!bareFnOnly && n.kind === 'method')) &&
+        sameLanguageFamily(n.language, ref.language) &&
+        n.id !== ref.fromNodeId // a function registering itself is not a dependency edge
+    );
+  if (candidates.length === 0) return null;
+
+  // Same-file definition wins — the extraction gate guarantees most survivors
+  // have one, and it's the dominant C pattern (static callback registered in
+  // a same-file ops struct).
+  const sameFile = candidates.filter((n) => n.filePath === ref.filePath);
+  if (sameFile.length > 0) {
+    // Swift: several same-named METHODS in one file is an API overload family
+    // (`Session.request(...)` × N), and a bare identifier hitting it is almost
+    // always a same-named parameter, not a method value (Alamofire A/B
+    // finding) — refuse rather than guess. A single method (SwiftUI's
+    // `action: handleTap`) still resolves.
+    if (
+      ref.language === 'swift' &&
+      sameFile.length > 1 &&
+      sameFile.every((n) => n.kind === 'method')
+    ) {
+      return null;
+    }
+    // Same-name overloads in one file are the same conceptual symbol; pick
+    // the first by position for determinism.
+    const target = sameFile.reduce((a, b) => (a.startLine <= b.startLine ? a : b));
+    return {
+      original: ref,
+      targetNodeId: target.id,
+      confidence: sameFile.length === 1 ? 0.95 : 0.9,
+      resolvedBy: 'function-ref',
+    };
+  }
+
+  // Cross-file (imported names the import resolver didn't already claim):
+  // only an unambiguous match resolves.
+  if (candidates.length === 1) {
+    return {
+      original: ref,
+      targetNodeId: candidates[0]!.id,
+      confidence: 0.8,
+      resolvedBy: 'function-ref',
+    };
+  }
+  return null;
+}
+
 /**
  * Try to resolve a reference by exact name match
  */
@@ -1124,6 +1231,13 @@ export function matchReference(
   ref: UnresolvedRef,
   context: ResolutionContext
 ): ResolvedRef | null {
+  // Function-as-value refs (#756) resolve ONLY through the dedicated matcher —
+  // never the fuzzy/qualified fallthrough below (a wrong callback edge is
+  // worse than none).
+  if (ref.referenceKind === 'function_ref') {
+    return matchFunctionRef(ref, context);
+  }
+
   // Try strategies in order of confidence
   let result: ResolvedRef | null;
 

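The bare-name path of `matchFunctionRef` follows a "same file first, cross-file only if unique" policy. The sketch below is a simplified model of that path. `GraphNode`, `FnRef`, and `matchBareFnRef` are hypothetical names; language families are reduced to string equality, and the `::` branch and the Swift overload-family refusal are left out.

```typescript
// Hypothetical minimal node/ref shapes for illustration.
interface GraphNode {
  id: string;
  name: string;
  kind: 'function' | 'method' | 'variable';
  filePath: string;
  startLine: number;
  language: string;
}
interface FnRef { name: string; filePath: string; fromNodeId: string; language: string }
interface Match { targetNodeId: string; confidence: number }

function matchBareFnRef(ref: FnRef, nodes: GraphNode[], bareFnOnly: boolean): Match | null {
  const candidates = nodes.filter(
    (n) =>
      n.name === ref.name &&
      (n.kind === 'function' || (!bareFnOnly && n.kind === 'method')) &&
      n.language === ref.language &&
      n.id !== ref.fromNodeId, // self-registration is not a dependency
  );
  if (candidates.length === 0) return null;

  const sameFile = candidates.filter((n) => n.filePath === ref.filePath);
  if (sameFile.length > 0) {
    // Earliest definition wins so overloads resolve deterministically.
    const target = sameFile.reduce((a, b) => (a.startLine <= b.startLine ? a : b));
    return { targetNodeId: target.id, confidence: sameFile.length === 1 ? 0.95 : 0.9 };
  }
  // Cross-file: unique or drop, never a guess.
  return candidates.length === 1 ? { targetNodeId: candidates[0]!.id, confidence: 0.8 } : null;
}

const fn = (id: string, name: string, filePath: string): GraphNode =>
  ({ id, name, kind: 'function', filePath, startLine: 1, language: 'c' });
const project: GraphNode[] = [
  fn('a:on_msg', 'on_msg', 'a.c'),
  fn('b:on_msg', 'on_msg', 'b.c'),
  fn('v:render', 'render', 'views.c'),
];
const at = (name: string, filePath: string): FnRef =>
  ({ name, filePath, fromNodeId: `${filePath}:main`, language: 'c' });

console.log(matchBareFnRef(at('on_msg', 'a.c'), project, false));  // same-file hit
console.log(matchBareFnRef(at('on_msg', 'd.c'), project, false));  // two cross-file matches
console.log(matchBareFnRef(at('render', 'app.c'), project, false)); // unique cross-file
```

The second lookup returns `null` rather than picking one of the two `on_msg` definitions. This is the "a wrong callback edge is worse than none" rule in its simplest form.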
+ 3 - 3
src/resolution/types.ts

@@ -4,7 +4,7 @@
  * Types for the reference resolution system.
  */
 
-import { EdgeKind, Language, Node } from '../types';
+import { Language, Node, ReferenceKind } from '../types';
 
 /**
  * An unresolved reference from extraction
@@ -15,7 +15,7 @@ export interface UnresolvedRef {
   /** The name being referenced */
   referenceName: string;
   /** Type of reference */
-  referenceKind: EdgeKind;
+  referenceKind: ReferenceKind;
   /** Line where reference occurs */
   line: number;
   /** Column where reference occurs */
@@ -39,7 +39,7 @@ export interface ResolvedRef {
   /** Confidence score (0-1) */
   confidence: number;
   /** How it was resolved */
-  resolvedBy: 'exact-match' | 'import' | 'qualified-name' | 'framework' | 'fuzzy' | 'instance-method' | 'file-path';
+  resolvedBy: 'exact-match' | 'import' | 'qualified-name' | 'framework' | 'fuzzy' | 'instance-method' | 'file-path' | 'function-ref';
 }
 
 /**

+ 9 - 1
src/types.ts

@@ -278,6 +278,14 @@ export interface ExtractionError {
   code?: string;
 }
 
+/**
+ * Kinds an unresolved reference can carry. `function_ref` is internal-only —
+ * a function name used as a VALUE (callback registration, #756). It never
+ * becomes an edge kind: resolution maps it to a `references` edge targeting
+ * function/method nodes only (see `matchFunctionRef`).
+ */
+export type ReferenceKind = EdgeKind | 'function_ref';
+
 /**
  * A reference that couldn't be resolved during extraction
  */
@@ -289,7 +297,7 @@ export interface UnresolvedReference {
   referenceName: string;
 
   /** Type of reference (call, type, import, etc.) */
-  referenceKind: EdgeKind;
+  referenceKind: ReferenceKind;
 
   /** Location of the reference */
   line: number;