
feat(extraction): add C, Java, C#, PHP, Scala, and Kotlin to value-reference edges

Extends same-file constant-reader impact edges to six more languages, each
requiring a different technique:

**C** — file-scope `static const` / global nodes weren't extracted at all (name
nests in `init_declarator`, the generic fallback missed it). Added a C branch in
`extractVariable` with `cDeclaratorIdentifier` to walk the declarator chain.
Skips bare-`identifier` declarators to eliminate a cluster of false positives from
misparsed macro-prefixed prototypes (a `MACRO RetType fn(args);` mints a spurious
type-named global).
Prune switch gains `init_declarator`.
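
A minimal sketch of the declarator walk described above, not the actual `cDeclaratorIdentifier`. The `CNode` shape and the `cDeclaratorName` helper are hypothetical stand-ins for the real syntax-tree nodes, and the depth cap is an assumption added for safety:

```typescript
// Hypothetical, simplified node shape (NOT the real tree-sitter API).
interface CNode {
  type: string;
  text?: string;
  declarator?: CNode; // the nested declarator child, if any
}

// Follow the `declarator` chain (e.g. init_declarator -> pointer_declarator ->
// array_declarator -> identifier) and return the declared name.
// A bare `identifier` at the top is skipped: that is the shape produced by the
// macro-prefixed-prototype misparse, where the "variable" is really a return type.
function cDeclaratorName(declarator: CNode): string | null {
  if (declarator.type === 'identifier') return null;
  let current: CNode | undefined = declarator.declarator;
  // Assumed depth cap so a malformed tree cannot loop for long.
  for (let depth = 0; current !== undefined && depth < 16; depth++) {
    if (current.type === 'identifier') return current.text ?? null;
    current = current.declarator;
  }
  return null;
}
```

Under these assumptions, a chain built for `static const char *const STATUS_NAMES[]` resolves to `STATUS_NAMES`, while the lone `CURLcode` identifier from a misparsed prototype resolves to `null`.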

**Java** — `static final` fields already existed but as `field` kind, which the
value-ref gate rejects. An `isConst` predicate (`static` + `final` modifiers)
re-kinds the constant subset to `constant` in `extractField`. Instance `final`
fields stay `field`.

**C#** — same `field`→`constant` approach: `const` modifier or `static readonly`
modifier pair → `constant`; instance `readonly` stays `field`.
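
The two `field`→`constant` rules above can be sketched as plain predicates over a modifier list. This is an illustration only: the function names and the string-array input are hypothetical, and the real `isConst` reads modifiers from the syntax tree.

```typescript
type FieldKind = 'constant' | 'field';

// Small helper so the sketch does not depend on newer library features.
function hasModifier(modifiers: readonly string[], name: string): boolean {
  return modifiers.indexOf(name) !== -1;
}

// Java: only `static` + `final` together make a shared constant.
function javaFieldKind(modifiers: readonly string[]): FieldKind {
  return hasModifier(modifiers, 'static') && hasModifier(modifiers, 'final')
    ? 'constant'
    : 'field';
}

// C#: `const`, or the `static readonly` pair, makes a shared constant.
function csharpFieldKind(modifiers: readonly string[]): FieldKind {
  const isConst =
    hasModifier(modifiers, 'const') ||
    (hasModifier(modifiers, 'static') && hasModifier(modifiers, 'readonly'));
  return isConst ? 'constant' : 'field';
}
```

In both languages the instance-only form (`final` alone, `readonly` alone) falls through to `field`, which is what keeps per-object state out of the value-ref targets.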

**PHP** — constants already extracted as `constant`; the only gap was the
reader-scan. PHP represents a constant reference as a `name` node (not
`identifier`), so bare `X` and the const half of `self::X` / `Foo::X` were
invisible. Added `name` to the scan. No prune wiring needed: a `$var` local is a
`variable_name` — a different namespace — and can never shadow a bare constant.
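
As a rough illustration of why adding `name` to the scan matters, here is a sketch of a reader-scan over a simplified tree. The `AstNode` shape, `READ_NODE_TYPES`, and `scanReads` are hypothetical and do not mirror the real scanner; the point is only that a reference is matched by node type first and by text second.

```typescript
// Hypothetical, simplified node shape (NOT the real tree-sitter API).
interface AstNode {
  type: string;
  text: string;
  children: AstNode[];
}

// Node types treated as a possible constant reference. With only
// 'identifier' in this list, PHP references (which are `name` nodes)
// would never match.
const READ_NODE_TYPES: readonly string[] = ['identifier', 'name'];

// Return the distinct target names read anywhere inside `scope`.
function scanReads(scope: AstNode, targets: readonly string[]): string[] {
  const found: string[] = [];
  const stack: AstNode[] = [scope];
  while (stack.length > 0) {
    const node = stack.pop() as AstNode;
    const isRead =
      READ_NODE_TYPES.indexOf(node.type) !== -1 &&
      targets.indexOf(node.text) !== -1;
    if (isRead && found.indexOf(node.text) === -1) found.push(node.text);
    for (const child of node.children) stack.push(child);
  }
  return found;
}
```

A hand-built body containing a bare `APP_VERSION` and a `self::MAX_ITEMS` access (two `name` children) would report both constants, while `self` itself is ignored because it is not a target.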

**Scala** — top-level `val` was already `constant`; `object` and `class` vals
both came out as `field`. Fixed by walking to the enclosing AST definition in the
`val_definition` handler: `object_definition` → `constant`/`variable`; `class`/
`trait`/`enum`/`given` → `field`. Prune switch gains `val_definition`/
`var_definition`.

**Kotlin** — properties weren't extracted at all (`property_declaration →
variable_declaration → simple_identifier` is too deep for the generic path).
Added a `property_declaration` handler in `visitNode`: pulls the nested name,
walks to the enclosing scope for kind (`object`/`companion object`/top-level →
`constant`/`variable`; `class` → `field`; function body → local, skipped). Reader-
scan gains `simple_identifier`; prune switch gains `property_declaration`.

Validated S/M/L across all six languages (hiredis/curl/redis, gson/commons-lang/
guava, AutoMapper/Newtonsoft/efcore, guzzle/monolog/laravel, upickle/cats/pekko,
okio/coroutines/ktor): node count identical on/off, precision guards held, 0
shadow leaks, impact wins confirmed (`INDEX_NOT_FOUND` 4→165, `Curl_ssl` 3→57,
`_resourceManager` 22→1664, `STATE_IN_QUEUE` 1→32, `BLOCKING_SHIFT` 1→24).

C++ was attempted and reverted — tree-sitter-cpp parse fidelity on
template/macro-heavy code leaks class members to file scope as bogus constants;
did not reach the precision bar.
Colby McHenry, 1 week ago
Parent
Commit
907098a703

+ 3 - 1
CHANGELOG.md

@@ -11,7 +11,9 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### New Features
 
-- Impact and blast-radius analysis for TypeScript, JavaScript, Go, Python, Rust, and Ruby now understands the readers of a constant. When you change a file-scope, package-level, module-level, or class-level constant — a config object, a lookup table, a shared constant — the other symbols in that file that read it now show up as affected, where before they were invisible (impact only followed calls, imports, and inheritance, so a constant's consumers looked like "nothing depends on this"). This makes `codegraph impact`, and the impact trail in `codegraph_explore`/`codegraph_node`, catch the "change this table, break its readers" class of change. It's on by default and adds no nodes to your graph; bundled/minified files and ambiguously-shadowed names are skipped to keep results precise. Set `CODEGRAPH_VALUE_REFS=0` to turn it off.
+- Impact and blast-radius analysis for TypeScript, JavaScript, Go, Python, Rust, Ruby, C, Java, C#, PHP, Scala, and Kotlin now understands the readers of a constant. When you change a file-scope, package-level, module-level, or class-level constant — a config object, a lookup table, a shared constant — the other symbols in that file that read it now show up as affected, where before they were invisible (impact only followed calls, imports, and inheritance, so a constant's consumers looked like "nothing depends on this"). This makes `codegraph impact`, and the impact trail in `codegraph_explore`/`codegraph_node`, catch the "change this table, break its readers" class of change. It's on by default and adds no nodes to your graph; bundled/minified files and ambiguously-shadowed names are skipped to keep results precise. Set `CODEGRAPH_VALUE_REFS=0` to turn it off.
+- C file-scope constants and globals — `static const` scalars, pointer/array lookup tables, and shared mutable globals — are now recognized as symbols in their own right. They previously weren't extracted at all, so they never appeared in search or carried any dependents; now they show up in `codegraph search` and participate in impact analysis (see above), so changing a C lookup table surfaces the same-file functions that read it.
+- Java `static final` constants, C# `const` / `static readonly` constants, Scala `object` vals, and Kotlin top-level / `object` / `companion object` `val`s are now classified as constants rather than generic fields, so they participate in the constant-reader impact analysis above — change a `public static final` table, a `const string`, a Scala `object Config { val Timeout = … }`, or a Kotlin `companion object { const val … }` and the methods that read it now show up as affected. (Per-object Java `final` / C# `readonly` / Scala & Kotlin `class` instance properties are unchanged.) Kotlin constants were previously not indexed as their own symbols at all, so they now also appear in `codegraph search`.
 
 
 ### Fixes
 

+ 290 - 0
__tests__/value-reference-edges.test.ts

@@ -258,6 +258,296 @@ describe('value-reference edges', () => {
     expect(valueRefReaders(cg, 'TIMEOUT')).toEqual(expect.arrayContaining(['get_timeout', 'describe']));
   });
 
+  it('edges same-file readers to a file-scope const/table (C)', async () => {
+    // C keeps shareable values at file scope as `static const` — scalars and,
+    // very commonly, pointer/array lookup tables. Both must be extracted as
+    // nodes (the generic fallback misses C's nested init_declarator name) and
+    // their same-file readers edged.
+    fs.writeFileSync(
+      path.join(dir, 'config.c'),
+      [
+        'static const int MAX_ITEMS = 100;',
+        'static const char *const STATUS_NAMES[] = { "ok", "fail", "pending" };',
+        '',
+        'int capped(int n) { return n > MAX_ITEMS ? MAX_ITEMS : n; }',
+        'const char *label(int i) { return STATUS_NAMES[i]; }',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'MAX_ITEMS')).toEqual(expect.arrayContaining(['capped']));
+    expect(valueRefReaders(cg, 'STATUS_NAMES')).toEqual(expect.arrayContaining(['label']));
+  });
+
+  it('does NOT edge a C file const shadowed by a function-local of the same name', async () => {
+    // `TIMEOUT` is a file const AND a local `int TIMEOUT = 5` (init_declarator)
+    // in shadows(). The local read resolves to the inner binding, so a
+    // file-scope edge would be a false positive — the shadow prune drops it.
+    fs.writeFileSync(
+      path.join(dir, 'shadow.c'),
+      [
+        'static const int TIMEOUT = 30;',
+        '',
+        'int uses_const(void) { return TIMEOUT; }',
+        'int shadows(void) {',
+        '    int TIMEOUT = 5;',
+        '    return TIMEOUT;',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'TIMEOUT')).toEqual([]);
+  });
+
+  it('does NOT mint a value target from a macro-prefixed C prototype (return-type misparse)', async () => {
+    // A prototype led by an unknown macro (`CURL_EXTERN CURLcode fn(args);`)
+    // makes tree-sitter-c misparse it as a declaration whose "variable" is the
+    // bare return-type identifier — which would mint a spurious `CURLcode`
+    // value target read by every function of that type. The bare-identifier
+    // skip prevents it, while real file-scope consts still edge their readers.
+    fs.writeFileSync(
+      path.join(dir, 'api.c'),
+      [
+        'typedef enum { CURLE_OK, CURLE_FAIL } CURLcode;',
+        'CURL_EXTERN CURLcode curl_easy_init(int x);',
+        'CURL_EXTERN CURLcode curl_easy_setopt(int y);',
+        '',
+        'static const int REAL_LIMIT = 42;',
+        'int use_real(void) { return REAL_LIMIT; }',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    // The return-type name is never extracted as a const/var, so it is not a
+    // value-ref target at all.
+    const curlcodeValues = cg
+      .searchNodes('CURLcode')
+      .map((r) => r.node)
+      .filter((n) => n.name === 'CURLcode' && (n.kind === 'constant' || n.kind === 'variable'));
+    expect(curlcodeValues).toEqual([]);
+    // Real file-scope consts alongside the misparse-prone prototypes still work.
+    expect(valueRefReaders(cg, 'REAL_LIMIT')).toEqual(expect.arrayContaining(['use_real']));
+  });
+
+  it('edges same-file methods to a class-scope static final constant (Java)', async () => {
+    // Java keeps constants as `static final` fields inside a class. They extract
+    // as `constant` kind (not `field`) so the value-ref gate targets them; a
+    // plain instance `final` field is NOT a constant and must not be a target.
+    fs.writeFileSync(
+      path.join(dir, 'Limits.java'),
+      [
+        'class Limits {',
+        '  public static final int MAX_ITEMS = 100;',
+        '  static final String[] STATUS_NAMES = { "ok", "fail" };',
+        '  final int instanceId = 1;',
+        '  int capped(int n) { return n > MAX_ITEMS ? MAX_ITEMS : n; }',
+        '  String label(int i) { return STATUS_NAMES[i]; }',
+        '  int id() { return instanceId; }',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'MAX_ITEMS')).toEqual(expect.arrayContaining(['capped']));
+    expect(valueRefReaders(cg, 'STATUS_NAMES')).toEqual(expect.arrayContaining(['label']));
+    // An instance `final` field is per-object state, not a shared
+    // constant — it stays `field` kind and is never a value-ref target.
+    expect(valueRefReaders(cg, 'instanceId')).toEqual([]);
+  });
+
+  it('does NOT edge a Java class const shadowed by a method-local of the same name', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'Shadow.java'),
+      [
+        'class Shadow {',
+        '  static final int TIMEOUT = 30;',
+        '  int usesConst() { return TIMEOUT; }',
+        '  int shadows() { int TIMEOUT = 5; return TIMEOUT; }',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'TIMEOUT')).toEqual([]);
+  });
+
+  it('edges same-file methods to a class const / static readonly (C#)', async () => {
+    // C# constants are `const` (compile-time) or `static readonly` (runtime);
+    // both extract as `constant`. An instance `readonly` field is per-object and
+    // stays `field`.
+    fs.writeFileSync(
+      path.join(dir, 'Limits.cs'),
+      [
+        'class Limits {',
+        '  const int MAX_ITEMS = 100;',
+        '  static readonly string[] STATUS_NAMES = { "ok", "fail" };',
+        '  readonly int instanceId = 1;',
+        '  int Capped(int n) { return n > MAX_ITEMS ? MAX_ITEMS : n; }',
+        '  string Label(int i) { return STATUS_NAMES[i]; }',
+        '  int Id() { return instanceId; }',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'MAX_ITEMS')).toEqual(expect.arrayContaining(['Capped']));
+    expect(valueRefReaders(cg, 'STATUS_NAMES')).toEqual(expect.arrayContaining(['Label']));
+    expect(valueRefReaders(cg, 'instanceId')).toEqual([]);
+  });
+
+  it('does NOT edge a C# class const shadowed by a method-local of the same name', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'Shadow.cs'),
+      [
+        'class Shadow {',
+        '  const int TIMEOUT = 30;',
+        '  int UsesConst() { return TIMEOUT; }',
+        '  int Shadows() { int TIMEOUT = 5; return TIMEOUT; }',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'TIMEOUT')).toEqual([]);
+  });
+
+  it('edges same-file readers to a top-level and class const, incl. self:: / Class:: (PHP)', async () => {
+    // PHP keeps constants at file scope (`const X`) and inside classes (`const
+    // X`), both extracted as `constant`. A constant *reference* is a `name` node
+    // (bare `X`, or the const half of `self::X` / `Foo::X`), so the reader-scan
+    // must match `name`. A `$var` local is a different namespace and can never
+    // shadow a bare constant — so there is nothing to prune.
+    fs.writeFileSync(
+      path.join(dir, 'Config.php'),
+      [
+        '<?php',
+        'const APP_VERSION = "1.0";',
+        'class Config {',
+        '  const MAX_ITEMS = 100;',
+        '  const STATUS_NAMES = ["ok", "fail"];',
+        '  public static $counter = 0;',
+        '  function capped($n) { return $n > self::MAX_ITEMS ? self::MAX_ITEMS : $n; }',
+        '  function label($i) { return Config::STATUS_NAMES[$i]; }',
+        '  function version() { return APP_VERSION; }',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'MAX_ITEMS')).toEqual(expect.arrayContaining(['capped']));
+    expect(valueRefReaders(cg, 'STATUS_NAMES')).toEqual(expect.arrayContaining(['label']));
+    expect(valueRefReaders(cg, 'APP_VERSION')).toEqual(expect.arrayContaining(['version']));
+    // A static property is mutable class state, not a constant — never a target.
+    expect(valueRefReaders(cg, 'counter')).toEqual([]);
+  });
+
+  it('edges readers to a top-level and object-scope val, not a class instance val (Scala)', async () => {
+    // Scala has no `static`: an `object` is a singleton, so its `val`s are the
+    // shared-constant idiom (extracted as `constant`, like a top-level val). A
+    // `class` val is a per-instance immutable field (`field`, never a target).
+    fs.writeFileSync(
+      path.join(dir, 'Demo.scala'),
+      [
+        'val AppVersion = "1.0"',
+        'object Config {',
+        '  val TIMEOUT_MS = 30',
+        '  val STATUS_NAMES = List("ok", "fail")',
+        '  def capped(n: Int): Int = if (n > TIMEOUT_MS) TIMEOUT_MS else n',
+        '  def label(i: Int): String = STATUS_NAMES(i)',
+        '}',
+        'class Widget {',
+        '  val MaxItems = 100',
+        '  def within(n: Int): Int = if (n < MaxItems) n else MaxItems',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'TIMEOUT_MS')).toEqual(expect.arrayContaining(['capped']));
+    expect(valueRefReaders(cg, 'STATUS_NAMES')).toEqual(expect.arrayContaining(['label']));
+    // A class instance `val` is per-object state (kind `field`), not a shared
+    // constant — never a value-ref target even though `within` reads it.
+    expect(valueRefReaders(cg, 'MaxItems')).toEqual([]);
+  });
+
+  it('does NOT edge a Scala object val shadowed by a method-local val of the same name', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'Shadow.scala'),
+      [
+        'object Config {',
+        '  val TIMEOUT = 30',
+        '  def usesConst(): Int = TIMEOUT',
+        '  def shadows(): Int = { val TIMEOUT = 5; TIMEOUT }',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'TIMEOUT')).toEqual([]);
+  });
+
+  it('edges readers to top-level, object, and companion-object constants, not a class val (Kotlin)', async () => {
+    // Kotlin has no `static`: a top-level property, an `object` (singleton), and a
+    // class's `companion object` all hold shared constants (`val`→constant). A
+    // class instance `val` is per-object state (`field`, never a target). The
+    // property name nests as variable_declaration→simple_identifier, and a const
+    // reference is a `simple_identifier`.
+    fs.writeFileSync(
+      path.join(dir, 'Demo.kt'),
+      [
+        'const val TOP_LEVEL_MAX = 100',
+        'object Config {',
+        '  const val TIMEOUT_MS = 30',
+        '  val STATUS_NAMES = listOf("ok", "fail")',
+        '  fun capped(n: Int): Int = if (n > TIMEOUT_MS) TIMEOUT_MS else n',
+        '  fun label(i: Int): String = STATUS_NAMES[i]',
+        '}',
+        'class Widget {',
+        '  companion object { const val MAX_RETRIES = 3 }',
+        '  val instanceField = 1',
+        '  fun retries(): Int = MAX_RETRIES',
+        '  fun within(n: Int): Int = if (n < TOP_LEVEL_MAX) n else TOP_LEVEL_MAX',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'STATUS_NAMES')).toEqual(expect.arrayContaining(['label']));
+    expect(valueRefReaders(cg, 'MAX_RETRIES')).toEqual(expect.arrayContaining(['retries']));
+    expect(valueRefReaders(cg, 'TOP_LEVEL_MAX')).toEqual(expect.arrayContaining(['within']));
+    // A class instance `val` is per-object state (kind `field`), never a target.
+    expect(valueRefReaders(cg, 'instanceField')).toEqual([]);
+  });
+
+  it('does NOT edge a Kotlin object const shadowed by a method-local val of the same name', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'Shadow.kt'),
+      [
+        'object Config {',
+        '  const val TIMEOUT = 30',
+        '  fun usesConst(): Int = TIMEOUT',
+        '  fun shadows(): Int { val TIMEOUT = 5; return TIMEOUT }',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'TIMEOUT')).toEqual([]);
+  });
+
   it('emits nothing when CODEGRAPH_VALUE_REFS=0', async () => {
     const prev = process.env.CODEGRAPH_VALUE_REFS;
     process.env.CODEGRAPH_VALUE_REFS = '0';

+ 130 - 20
docs/design/value-reference-edges-playbook.md

@@ -45,7 +45,7 @@ agent read-reduction (see §4.3).
 
 | Symbol | Role |
 |---|---|
-| `VALUE_REF_LANGS` (static Set) | languages the feature runs for. Currently `typescript`, `javascript`, `tsx`, `go`, `python`, `rust`, `ruby`. **Add the new language here.** |
+| `VALUE_REF_LANGS` (static Set) | languages the feature runs for. Currently `typescript`, `javascript`, `tsx`, `go`, `python`, `rust`, `ruby`, `c`, `java`, `csharp`, `php`, `scala`, `kotlin`. **Add the new language here.** |
 | `valueRefsEnabled` | `process.env.CODEGRAPH_VALUE_REFS !== '0'` — default ON, env opts out. |
 | `MAX_VALUE_REF_NODES` (20_000) | per-scope traversal cap (and the shadow-scan cap). |
 | `captureValueRefScope(kind, name, id, node)` | called from `createNode` on every node. Records **targets** (file-scope `const`/`var`) and **reader scopes** (`function`/`method`/`const`/`var`). |
@@ -66,13 +66,59 @@ targets** (see §3).
 
 ## 2. Current state (what's shipped + validated)
 
-- **Default ON** for TS/JS/tsx + Go + Python + Rust + Ruby (`CODEGRAPH_VALUE_REFS=0` disables). Shipped in **PR #895**
+- **Default ON** for TS/JS/tsx + Go + Python + Rust + Ruby + C + Java + C# (`CODEGRAPH_VALUE_REFS=0` disables). Shipped in **PR #895**
   (flip-on + the shadow prune); Go added in a later PR (the shadow-prune declarator switch +
-  `VALUE_REF_LANGS`).
-- **Validated S/M/L** in **TS, JS, tsx, Go, Python, Rust, and Ruby** — see the matrix in the
+  `VALUE_REF_LANGS`); C added later still (extractor change to emit the nodes + the bare-identifier
+  misparse guard); Java + C# after that (field→constant kind switch for the const subset).
+- **Validated S/M/L** in **TS, JS, tsx, Go, Python, Rust, Ruby, C, Java, and C#** — see the matrix in the
   design doc. All clean: node count identical on/off, precision guards held, impact win
   reproduced. Go required extending the shadow prune (per-grammar declarators) — the worked
-  example of "step B is load-bearing."
+  example of "step B is load-bearing." **C required the Ruby treatment** (the extractor didn't emit
+  C file-scope const/var nodes at all) **plus** a C-specific FP guard (a macro-prefixed-prototype
+  misparse mints a bare-identifier "variable" named after the return type — skip bare-`identifier`
+  declarators). It was the worked example of "the §2b coverage table's *easy-path* guess can be
+  wrong — always do §5 step C (confirm the nodes exist) before trusting it."
+- **Java + C# were the cleanest class-scope ("Ruby treatment") languages.** The constants already
+  extract — but as `field` kind, which the gate rejects. The whole change was emitting the const
+  *subset* as `constant`: an `isConst` predicate on each extractor (Java `static final`; C# `const`
+  / `static readonly`) + a kind switch in `extractField`. **No new shadow-prune wiring** (method
+  locals are `variable_declarator`, already in the switch) and **no FP guards** (UPPER_SNAKE /
+  PascalCase fit the distinctive-name gate). Instance `final`/`readonly` fields correctly stay
+  `field`. Validated S/M/L: gson/commons-lang/guava, automapper/newtonsoft/efcore — 0 leaks, node
+  parity, big impact wins (`INDEX_NOT_FOUND` 4→165, `_resourceManager` 22→1664).
+- **PHP was the cleanest of all — one reader-scan line.** Constants already extract as `constant`
+  (top-level + class), so the only change was teaching the reader-scan that a PHP constant
+  *reference* is a `name` node (bare `X`, or the const half of `self::X` / `Foo::X`). **No extractor
+  change, no prune wiring** (a `$var` local can't shadow a bare constant — different namespace).
+  Validated S/M/L (guzzle/monolog/laravel), all clean, 0 class/const collisions. The honest caveat:
+  **lower yield** — PHP reads constants cross-file far more than same-file (laravel 2,956 files → 86
+  edges), and value-refs is same-file only; still correct, just a smaller contribution.
+- **Scala — an `object` is the constant scope.** Scala has no `static`; a singleton `object`'s `val`s
+  are the shared-constant idiom (`object Config { val Timeout = 30 }`). Top-level `val` already
+  extracted as `constant`, but object/class vals both came out as `field`. The fix: in the Scala
+  `val_definition` handler, walk to the enclosing definition — `object_definition` (or top-level) →
+  `constant`/`variable`; `class`/`trait`/`enum` → `field` (per-instance, like Java instance `final`).
+  Added `val_definition`/`var_definition` to the shadow prune (method-local `val` shadows). Reader-scan
+  needed nothing (refs are `identifier`). Minor known limitation: Scala uses `val`/`def`
+  interchangeably for members, so a camelCase val can share a name with a method — same-file name
+  matching can't tell them apart (bounded, like Ruby's sibling-class; sweep showed flagged collisions
+  were mostly real object vals read by siblings). Validated S/M/L (upickle/cats/pekko).
+- **C++ was attempted and reverted — DON'T retry without solving parse fidelity first.** tree-sitter-cpp
+  mis-parses real template/macro-heavy C++ (and `.h` files route to the C grammar): class members and
+  parameters leak to file scope as bogus constants/variables. Two guards (skip `ERROR`-ancestor and
+  `compound_statement`-ancestor declarations) removed ~83% of gross leaks, but the residual pervades
+  even well-structured library source (template-class member leaks, amalgamated mega-headers,
+  `.h`-as-C++). It did not reach the precision bar of the other languages. See the C++ section below.
+- **Kotlin = C + Scala + PHP techniques combined (and clean).** Nothing extracted before (property name
+  nests `property_declaration → variable_declaration → simple_identifier` — the C problem). Fix:
+  handle `property_declaration` in the Kotlin `visitNode` hook — pull the nested name, walk to the
+  enclosing definition for the kind (`object`/`companion object`/top-level → `constant`/`variable`;
+  `class` → `field` — the Scala rule; skip locals under a `function_body`/`init`/lambda), add
+  `simple_identifier` to the reader-scan (the PHP-`name` move), and `property_declaration` to the
+  shadow prune. Clean parse fidelity (the one `fun interface` misparse is already handled), so no
+  C++-style tail. One of the cleanest yields — companion-object bit-masks/state consts are a heavy
+  same-file-read idiom. Validated S/M/L (okio/coroutines/ktor); only the bounded val/def-or-class and
+  sibling-companion name overlaps remain (shared with Scala/Ruby).
 - **Tests:** `__tests__/value-reference-edges.test.ts` — same-file readers edged; surfaced in
   impact radius; shadowed const NOT edged (verified to fail without the guard); JSX-only read
   edged (tsx); `CODEGRAPH_VALUE_REFS=0` emits nothing.
@@ -95,16 +141,25 @@ the bottom of this section).
 | Go | package `const`/`var` |
 | Rust | module + impl `const`/`static` |
 | Ruby | class/module `CONST` (the class-scope extension) |
+| C | file-scope `static const` scalars + pointer/array lookup tables + mutable globals. **Needed an extractor change** (nodes weren't emitted) + a bare-identifier misparse guard — NOT the easy path the table below first guessed |
+| Java | class `static final` fields. Nodes existed as `field` kind; emitted the const subset as `constant` (`isConst` + `extractField` kind switch). No new prune wiring, no FP guards |
+| C# | class `const` / `static readonly`. Identical to Java — same `field`→`constant` change |
+| PHP | top-level `const` + class `const` (both already `constant` kind). **Only** change was the reader-scan: a PHP const *reference* is a `name` node. No extractor change, no prune wiring (a `$var` local can't shadow a bare constant). Lower yield — PHP reads consts cross-file more than same-file |
+| Scala | top-level `val` (already `constant`) + **`object` val** (the singleton-constant idiom; re-kinded from `field` by walking to the enclosing `object_definition`). `class`/`trait`/`enum` vals stay `field`. `val_definition`/`var_definition` added to the shadow prune. Minor val/def name-collision limit |
+| Kotlin | top-level / `object` / `companion object` `val` (re-kinded from nothing — properties weren't extracted at all). Handled in `visitNode`: nested name (`variable_declaration → simple_identifier`, the C move) + scope-walk for kind (Scala move) + `simple_identifier` in the reader-scan (PHP move) + prune. `class` instance vals stay `field`. Clean — one of the best yields (companion bit-masks) |
 | **Svelte, Vue, Astro** | **inherited for free** — their extractors re-parse the `<script>`/frontmatter block as `typescript`/`javascript`, which are in `VALUE_REF_LANGS` (verified: a `.svelte` `const` edges its readers). No separate work; no separate matrix row needed. |
 
 **🔜 Remaining — likely the easy path** (constants are file/module-scope, or top-level; do §5: add
 to `VALUE_REF_LANGS`, verify the declarator node type + extractor kind, sweep). Classify each
-*before* building — several are mixed file+class scope:
+*before* building — several are mixed file+class scope. **Caveat learned from C:** "easy path" here
+means *scope* fits — it does NOT promise the extractor already emits the const nodes. C was in this
+column but emitted *no* file-scope const/var nodes (its name nests in an `init_declarator` the
+generic fallback can't read), so it needed the Ruby-style extractor change after all. **Always run
+§5 step C (confirm `select kind,name from nodes …` actually shows the consts) before trusting this
+column.**
 
 
 | Language | Constant forms | Note |
 |---|---|---|
-| C | file-scope `const` / `static const` | `init_declarator` in a `declaration`; `#define` macros aren't value nodes |
-| Kotlin | top-level `const val`/`val` (file-scope) + `companion`/`object` (class-scope) | top-level is easy; companion needs the class-scope gate (present) |
 | Swift | top-level `let` (file) + `static let` in a type (class) | README notes Swift stored properties aren't extracted as own nodes — check |
 | Dart | top-level `const`/`final` (file) + `static const` (class) | mixed |
 | Lua / Luau | file/chunk `local X =` + globals; no `const` keyword | distinctive-name gate (needs `[A-Z_]`) catches fewer — Lua casing varies |
@@ -112,19 +167,25 @@ to `VALUE_REF_LANGS`, verify the declarator node type + extractor kind, sweep).
 
 **🧱 Remaining — needs the Ruby treatment** (constants live almost entirely **inside a
 class/type**; the class-scope *gate* exists now, but first confirm the extractor emits them as
-`constant`/`variable` nodes — Ruby's weren't extracted at all, and Java/C# fields may come out as
-`field`/`property` kind):
+`constant`/`variable` nodes — Ruby's weren't extracted at all, and class fields often come out as
+`field`/`property` kind, which the gate rejects). **Java + C# (done) were this case**: their
+constants extracted as `field` kind, and the fix was emitting the const subset (`static final` /
+`const` / `static readonly`) as `constant` — the template for the rest of this bucket:
 
 
 | Language | Constant forms |
 |---|---|
-| Java | `static final` fields in a class |
-| C# | `const` / `static readonly` in a class |
-| Scala | `val` / `final val` in an `object`/`class` |
-| PHP | class `const` + top-level `const` + `define()` |
-| C++ | file-scope + class `static const`/`constexpr` (mixed) |
 | Pascal / Delphi | `const` sections at unit (file) or class scope (mixed) |
 | Objective-C | `static const` / `extern const` / `#define` (file-ish; macros unparsed; already "partial support") |
 
 
+**⛔ Attempted & reverted — C++.** file-scope + class `static const`/`constexpr` (mixed). Machinery
+built and correct on clean C++, but **tree-sitter-cpp parse fidelity is the blocker**: template/
+macro-heavy real C++ leaks class members + parameters to file scope as bogus constants/variables, and
+`.h` files route to the C grammar (mangling C++ classes). Two guards (skip `ERROR`-ancestor and
+`compound_statement`-ancestor declarations) cut ~83% of gross leaks but the residual pervades even
+well-structured library source. **Did not meet the precision bar; reverted.** Don't retry as a
+"value-refs" task — it needs prior work on C++ parse handling (template-class member scoping,
+`.h`-as-C++ detection, amalgamated-header exclusion).
+
 **🚫 N/A:** Liquid (template language — no value constants to track).
 
 
 **Frameworks — not a value-refs axis.** The README's framework list (Django, Flask, Express,
@@ -267,10 +328,18 @@ The target gate now accepts **`file:`, `class:`, and `module:`** parents. Before
   enclosing class's constant, and strict matching would drop those valid reads. The only real FP
   is the same constant name in *sibling* classes in one file (~1.7% of Ruby targets on rails);
   valid code rarely hits it (a bare sibling-class constant is a NameError in Ruby).
-- **Still untested:** Java `static final`, C# `const`, Swift `static let`. The gate covers them
-  now, but confirm the extractor emits them as `constant`/`variable` nodes with a `class:`/
-  `struct:` parent (Swift stored properties, for one, aren't extracted as their own nodes) — and
-  if the parent kind is `struct:`/`interface:` rather than `class:`/`module:`, widen the gate.
+- **Java `static final` + C# `const`/`static readonly` are DONE** (they extracted as `field` and are
+  now re-kinded to `constant`), as are Kotlin `companion`/`object` vals. **Still untested:** Swift
+  `static let`. The gate covers it, but confirm the extractor emits it as a `constant`/`variable`
+  node with a `class:`/`struct:` parent (Swift stored properties aren't extracted as their own
+  nodes). If the parent kind is `struct:`/`interface:` rather than `class:`/`module:`, widen the gate.
+- **Confirm the reader-scan matches the language's constant *reference* node type (the PHP lesson).**
+  The reader-scan in `flushValueRefs` matches `identifier` / `constant` / `name` / `simple_identifier`. If the new language
+  represents a constant *read* as some other node type, the scan finds nothing and **no edges form**
+  even with targets correctly registered. PHP refs a const as a **`name`** node (bare `X`, and the
+  const half of `self::X` / `Foo::X`), which the scan missed until `name` was added. Dump a sample's
+  reader body (`scripts/agent-eval` or a quick `getParser` walk) and check the node type of a
+  constant reference *before* sweeping — a zero-edge sweep usually means this, not a target-gate bug.
 
 
 ### B. Confirm the declarator node type (for the shadow prune)
 
 
@@ -290,7 +359,13 @@ silently does nothing for the new language and intra-file shadowing produces fal
 | Rust | `const_item`, `static_item`, `let_declaration` | const/static → `name` field; let → `pattern` field | **done** |
 | Ruby | `assignment` (LHS is a `constant` node) | already in the switch; Ruby can't local-shadow a constant, so the prune is effectively a no-op for it | **done** (class-scope) |
 | Ruby | `assignment` with constant LHS (`CONST`) | LHS | to verify |
-| C/C++ | `init_declarator` in a file-scope `declaration` | declarator id | to verify |
+| C | `init_declarator` in a file-scope `declaration` | `cDeclaratorIdentifier` walks the `declarator` chain (init → pointer/array → identifier) | **done** |
+| C++ | **attempted & reverted** — parse fidelity (see the C++ note in §2b) | — | reverted |
+| Java | `variable_declarator` (field AND method-local) | `namedChild(0)` = name identifier — **already the TS/JS case**, no new wiring | **done** |
+| C# | `variable_declarator` (field AND method-local) | same as Java — already in the switch | **done** |
+| PHP | **none** | a `$var` local (`variable_name`) is a different namespace from a bare constant — a local can never shadow a constant, so the prune is a no-op and needs no PHP declarator | **done** (n/a) |
+| Scala | `val_definition`, `var_definition` | `pattern` field (identifier) — catches an object/top-level val shadowed by a method-local `val` | **done** |
+| Kotlin | `property_declaration` | `variable_declaration → simple_identifier` (and `bump` accepts `simple_identifier`) — catches an object/companion const shadowed by a method-local `val` | **done** |
 
 
 **The prune rule is `declarators > file-scope-node-count`, NOT `> 1`.** A name can be bound
 twice *at file scope* legitimately — a **conditional module def** (`try: X = a; except: X = b`,
@@ -367,6 +442,41 @@ fixed); impact delta shows the blind→real radius win; full test suite green.
 - **require-bindings (CommonJS) are not FPs** — see §3. Don't "fix" them.
 - **Don't over-engineer a guard for a gap that doesn't manifest** (e.g. param-only shadow):
   evidence-driven only. The maintainer steered toward minimal, surgical fixes.
+- **C macro-prefixed-prototype misparse (the C FP cluster):** an unknown leading macro
+  (`CURL_EXTERN`, `XXH_PUBLIC_API`) makes tree-sitter-c misparse a prototype `MACRO RetType
+  fn(args);` as a *declaration* whose declared "variable" is the bare return-type identifier
+  (`XXH_errorcode`), splitting `fn(args)` into a bogus expression. It mints one spurious type-named
+  global per prototype — then edged by every function of that type (redis `XXH_errorcode` 1→18).
+  These misparses *always* produce a **bare `identifier`** declarator (checked across
+  pointer/array/sized-return variants); real consts/tables always have an `init_declarator` and real
+  pointer/array globals their own declarator. Fix = **skip bare-`identifier` declarators** in the C
+  branch. Removing those spurious file-scope variable nodes also lowers the node count relative to an
+  earlier, pre-fix pass. The on and off arms still match each other, so don't be surprised that the
+  post-fix count is *lower*.
+- **"Easy path" ≠ "nodes already exist."** The §2b table classifies by *scope*; it does not promise
+  the language's consts are extracted. C sat in the easy column yet emitted zero file-scope const
+  nodes. Run §5 step C (`select kind,name from nodes where file_path like '%sample%'`) on a sample
+  *first* — if the consts aren't there, you're doing the Ruby treatment, not the easy path.
+- **Class consts may extract as `field` kind, not `constant` (Java/C#).** Step C must check the
+  *kind*, not just that a node exists: Java `static final` and C# `const`/`static readonly` came out
+  as `field`, which the value-ref target gate (`constant`/`variable` only) silently rejects — so the
+  feature emitted nothing despite the nodes being present. Fix = an `isConst` predicate on the
+  extractor (gated on the const modifiers) + a kind switch in `extractField` (scoped per-language so
+  other languages' fields stay `field`). Don't widen the *gate* to accept `field` — that would pull
+  in every mutable instance field as a target. And only the const *subset* converts: a Java instance
+  `final` or C# instance `readonly` is per-object state, must stay `field`.
+- **A zero-edge sweep with correctly-registered targets = the reader-scan node type (the PHP trap).**
+  Targets can register perfectly (right kind, right scope) and *still* produce zero edges if the
+  reader-scan doesn't recognise how the language writes a constant *read*. PHP refs a const as a
+  **`name`** node, not `identifier`/`constant`, so the scan saw nothing until `name` was added to the
+  match. Before assuming a target-gate bug on a sparse/empty sweep, dump a reader body and check the
+  node type of a known constant reference. (Adding a ref node type to the scan is safe across
+  languages — `flushValueRefs` only runs for the value-ref set, and a file holds only its own
+  grammar's nodes; `name` is PHP-only among the current set.)
+- **Same-file-only means cross-file-heavy languages yield less — that's correct, not a miss.** PHP
+  reads constants across files far more than within one (`Logger::DEBUG` everywhere), so laravel
+  (2,956 files) gave only 86 edges vs Ruby rails's 2,255. Don't chase it: cross-file value consumers
+  are out of scope for *every* language (would need import/scope resolution). Report the lower yield
+  honestly in the matrix rather than treating it as a bug to fix.
 
 
 ---
 
 

+ 176 - 4
docs/design/value-reference-edges.md

@@ -1,6 +1,6 @@
 # Design + status: same-file value-reference edges
 
 
-**Status:** SHIPPED (default-on for TS/JS/tsx + Go + Python + Rust + Ruby; `CODEGRAPH_VALUE_REFS=0` disables). The
+**Status:** SHIPPED (default-on for TS/JS/tsx + Go + Python + Rust + Ruby + C + Java + C# + PHP + Scala + Kotlin; `CODEGRAPH_VALUE_REFS=0` disables). The
 emitter lives in `TreeSitterExtractor.flushValueRefs` (`src/extraction/tree-sitter.ts`).
 **Motivation:** close the impact-analysis hole for *value consumers*. Static
 extraction edges calls, imports, and inheritance, but never edges a constant to the
@@ -13,7 +13,7 @@ readers" class of change (the ReScript-PR false positive that motivated the work
 ## TL;DR for a new session
 
 
 We emit a `references` edge (`metadata: { valueRef: true }`) from a reader symbol to
-the **file/package-scope `const`/`var` it reads**, same-file only, for TS/JS/tsx + Go + Python + Rust + Ruby. Those edges
+the **file/package-scope `const`/`var` it reads**, same-file only, for TS/JS/tsx + Go + Python + Rust + Ruby + C + Java + C# + PHP + Scala + Kotlin. Those edges
 flow straight into `getImpactRadius` / `codegraph impact` and the impact trail in
 `codegraph_explore` / `codegraph_node` — no agent-behaviour change required.
 
 
@@ -46,7 +46,7 @@ The win is **impact-radius correctness**, not agent read-reduction (see "Agent A
    the content-minified bundles guard #1 misses.
 3. **Distinctive-name + same-file** as above.
 
 
-## Validation matrix — TS / JS / Go / Python / Rust / Ruby
+## Validation matrix — TS / JS / Go / Python / Rust / Ruby / C / Java / C# / PHP / Scala / Kotlin
 
 
 Method per repo: index the same tree twice (value-refs on vs `CODEGRAPH_VALUE_REFS=0`),
 diff node/edge counts, spot-check precision, and measure `codegraph impact` on a few
@@ -101,7 +101,56 @@ file-scope consts. Node count must be **identical** on/off (edges-only feature).
 | jekyll/jekyll | medium | 218 | 1,906 (stable) | +100 (2.4%) | ~100% TP | `DEFAULT_PRIORITY` 1→3, `LOG_LEVELS` 4→5 |
 | rails/rails | large | 1,452 | 61,911 (stable) | +2,255 (1.2%) | ~98% TP (same-file ambiguity 21/1208 targets) | `Post` (Struct const) 75 readers |
 
 
-Across S/M/L in all six languages: node count never moved, the precision guards held, and
+**C** (file-scope `static const` scalars + pointer/array lookup tables + mutable globals; required
+extracting the nodes first — see below)
+
+| Repo | size | files | nodes (on=off) | +value-ref edges | precision | `impact` off→on example |
+|---|---|---|---|---|---|---|
+| redis/hiredis | small | 52 | 1,161 (stable) | +29 (2.5%) | all sampled TP; guard holds | `hiredisAllocFns` 1→**71** |
+| redis/redis | medium | 782 | 19,446 (stable) | +1,634 (8.4%) | all sampled TP after the macro-misparse fix; guard holds | `asmManager` 2→**97**, `keyMetaClass` 1→36, `XXH3_kSecret` 1→27, `helpEntries` 1→13 |
+| curl/curl | large | 994 | 16,124 (stable) | +597 (3.7%) | all sampled TP; guard holds; no minified FPs | `Curl_ssl` 3→**57** |
+
+**Java** (class-scope `static final` constants; required emitting them as `constant` kind — see below)
+
+| Repo | size | files | nodes (on=off) | +value-ref edges | precision | `impact` off→on example |
+|---|---|---|---|---|---|---|
+| google/gson | small | 262 | 8,563 (stable) | +387 | all sampled TP; guard holds | `PEEKED_NONE` 1→**31** |
+| apache/commons-lang | medium | 623 | 19,976 (stable) | +2,087 | all sampled TP; guard holds; no minified FPs | `INDEX_NOT_FOUND` 4→**165**, `EMPTY` 5→161 |
+| google/guava | large | 3,227 | 130,945 (stable) | +6,354 | all sampled TP; guard holds; no minified FPs | `APPLICATION_TYPE` 2→**126**, `ABSENT` 4→66 |
+
+**C#** (class-scope `const` / `static readonly`; same `field`→`constant` change as Java)
+
+| Repo | size | files | nodes (on=off) | +value-ref edges | precision | `impact` off→on example |
+|---|---|---|---|---|---|---|
+| AutoMapper/AutoMapper | small | 511 | 19,254 (stable) | +133 | all sampled TP; guard holds | `ContextParameter` 1→**17**, `InstanceFlags` 1→14 |
+| JamesNK/Newtonsoft.Json | medium | 945 | 20,208 (stable) | +344 | all sampled TP; guard holds | `DefaultFlags` 1→**37**, `JsonNamespaceUri` 1→15 |
+| dotnet/efcore | large | 5,731 | 140,847 (stable) | +3,720 | all sampled TP; guard holds; no minified FPs | `_resourceManager` 22→**1664**, `Prefix` 40→237, `Guid77` 2→191 |
+
+**PHP** (top-level `const` + class `const`, both already `constant`; needed only a reader-scan tweak — see below)
+
+| Repo | size | files | nodes (on=off) | +value-ref edges | precision | `impact` off→on example |
+|---|---|---|---|---|---|---|
+| guzzle/guzzle | small | 81 | 1,655 (stable) | +5 (sparse — see note) | all sampled TP; no collisions | `CONNECTION_ERRORS` 1→3 |
+| Seldaek/monolog | medium | 217 | 3,047 (stable) | +79 | all sampled TP; no class/const collisions | `DEFAULT_JSON_FLAGS` 1→**18**, `RFC_5424_LEVELS` 1→17 |
+| laravel/framework | large | 2,956 | 57,519 (stable) | +86 | all sampled TP; no minified/collision FPs | `INVISIBLE_CHARACTERS` 1→**93**, `SESSION_ID_LENGTH` 1→9 |
+
+**Scala** (top-level `val` + `object` val — re-kinded from `field`; `class` instance vals stay `field`)
+
+| Repo | size | files | nodes (on=off) | +value-ref edges | precision | `impact` off→on example |
+|---|---|---|---|---|---|---|
+| com-lihaoyi/upickle | small | 145 | 3,052 (stable) | +82 | all sampled TP; no class/method collisions | `IntegralPattern` 1→**9** |
+| typelevel/cats | medium | 835 | 15,774 (stable) | +89 | sampled TP; flagged val/def name-collisions were real object vals read by siblings | `maxArity` 3→**17**, `fusionMaxStackDepth` 1→13, `minIntValue` 1→7 |
+| apache/pekko | large | 2,720 | 135,041 (stable) | +8,453 (2,065 Scala) | Scala object vals clean; the bulk are valid Java `PARSER`/`DEFAULT_INSTANCE` from generated protobuf `.java` | `ErrorLevel` 5→**33**, `WarningLevel` 5→29 |
+
+**Kotlin** (top-level / `object` / `companion object` `val` → `constant`; `class` instance vals stay `field`)
+
+| Repo | size | files | nodes (on=off) | +value-ref edges | precision | `impact` off→on example |
+|---|---|---|---|---|---|---|
+| square/okio | small | 307 | 8,540 (stable) | +157 | all sampled TP; 0 collisions | `STATE_IN_QUEUE` 1→**32**, `HMAC_KEY` 1→9 |
+| Kotlin/kotlinx.coroutines | medium | 1,039 | 17,058 (stable) | +210 | all sampled TP; 1 cross-file collision | `BLOCKING_SHIFT` 1→**24**, `TERMINATED` 2→22 (companion bit-masks) |
+| ktorio/ktor | large | 2,302 | 43,272 (stable) | +849 | object/companion consts (HTTP header names); flagged collisions are real consts; `TYPE` is a sibling-companion ambiguity | `TYPE` 8→**109**, `FailedPath` 1→22 |
+
+Across S/M/L in all twelve languages: node count never moved, the precision guards held, and
 the `impact` OFF column is the bug — a const that 80–140 symbols read reports "1 affected"
 without value-refs.
 
 
@@ -155,6 +204,129 @@ classes in one file — 21 of 1,208 targets (1.7%) on rails, and most of those r
 referencing a sibling class's bare constant is a NameError in real Ruby, so valid code rarely
 hits it. Net precision ~98–100%.
 
 
+**C was NOT the "easy path" the language tracker first assumed — it needed the extractor to emit
+the nodes first.** C keeps shareable values at file scope (`static const` scalars, and very
+commonly pointer/array **lookup tables** + mutable global state), which fits the file-scope target
+gate. But unlike Go/Rust (whose const nodes already existed), C's file-scope `const`/`var` were
+**never extracted as nodes at all**: a C `declaration` nests its name inside an `init_declarator`
+(through `pointer_declarator`/`array_declarator`), and the generic variable-extraction fallback
+only finds a *direct* `identifier` child — so it produced nothing. Three changes (the same shape as
+Ruby's): (1) a C branch in `extractVariable` that resolves the name through the declarator chain and
+emits file-scope declarations as `constant`/`variable` (skipping function-body locals via an
+ancestor check, and `function_declarator` prototypes); (2) an `isConst` on the C extractor (a
+`const` `type_qualifier` → `constant` kind); (3) the shadow prune's declarator switch extended with
+`init_declarator`. Scoped to **C only** — C++ stays on the generic fallback (its class-scope members
+are the harder bucket).
+
+The one false-positive cluster the sweep surfaced was a **macro-prefixed-prototype misparse**, and
+the fix is the load-bearing C detail: an unknown leading macro (`CURL_EXTERN`, `XXH_PUBLIC_API`)
+makes tree-sitter-c misparse a prototype `MACRO RetType fn(args);` as a declaration whose declared
+"variable" is the **bare return-type identifier** (`XXH_errorcode`/`CURLcode`), splitting `fn(args)`
+off as a bogus expression — minting one spurious type-named global per prototype, then edged by
+every function returning that type (redis's `XXH_errorcode` 1→18 before the fix). These misparses
+*always* yield a **bare `identifier`** declarator (verified across pointer/array/sized return
+variants); real consts/tables always carry an initializer (`init_declarator`) and real
+pointer/array globals carry their own declarator. So the C branch **skips bare-`identifier`
+declarators entirely** — killing the whole FP class at the cost of only uninitialized scalar globals
+(`static int g;`), which are rare and low-value. After the fix: every sampled edge on
+hiredis/redis/curl was a true positive, the guard-invariant leak check found 0 shadows across all
+three, and `impact` deltas confirm the blind→real radius (`asmManager` 2→97, `Curl_ssl` 3→57,
+`hiredisAllocFns` 1→71).
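
The declarator walk and the bare-`identifier` skip described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's `cDeclaratorIdentifier`: `MiniNode` and `declaredName` are hypothetical names, and the hand-built trees only approximate what tree-sitter-c produces.

```typescript
// Minimal stand-in for a syntax node (hypothetical; the real tree-sitter
// SyntaxNode API is richer than this).
interface MiniNode {
  type: string;
  text?: string;
  children?: MiniNode[];
}

// Hypothetical helper: resolve the declared name of one C declarator.
// Returns null for a bare `identifier` declarator (the misparse signature)
// and for any chain that passes through a `function_declarator`.
function declaredName(declarator: MiniNode): string | null {
  if (declarator.type === 'identifier') return null; // bare identifier: skip
  let node: MiniNode | undefined = declarator;
  while (node) {
    if (node.type === 'function_declarator') return null;
    if (node.type === 'identifier') return node.text ?? null;
    // Descend into the first nested identifier or *_declarator child.
    node = (node.children ?? []).find(
      (c) => c.type === 'identifier' || c.type.endsWith('_declarator')
    );
  }
  return null;
}

// `static const int MAX = 3;` approximated as init_declarator -> identifier.
const scalar: MiniNode = {
  type: 'init_declarator',
  children: [
    { type: 'identifier', text: 'MAX' },
    { type: 'number_literal', text: '3' },
  ],
};
console.log(declaredName(scalar)); // MAX
console.log(declaredName({ type: 'identifier', text: 'XXH_errorcode' })); // null
```

The early return on a top-level `identifier` is the whole guard: it trades away uninitialized scalar globals in exchange for never minting a type-named global from a misparsed prototype.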
+
+**Java + C# were the cleanest class-scope languages — one kind switch, no new guards.** Both keep
+constants *inside a class* (Java `static final` fields; C# `const` / `static readonly`), so unlike
+C the nodes already existed — but as **`field`** kind, which the value-ref gate (`constant`/
+`variable` only) rejects. The whole change was emitting the constant *subset* as `constant`: an
+`isConst` predicate on each extractor (Java = a `static final` field; C# = a `const`, or a `static
+readonly`) plus a kind switch in `extractField`. Everything else was already in place — the
+class-scope target gate (from Ruby), the `identifier` reader-scan, and crucially the shadow prune:
+a method-local that shadows a class const is a `variable_declarator` in both grammars, *already* in
+the prune switch, so a class const shadowed by a local is dropped with no new wiring (validated by
+the Java/C# shadow tests). Instance fields stay `field` — a Java instance `final` or a C# instance
+`readonly` is per-object state, not a shared constant, so it's never a target. The distinctive-name
+gate fits both conventions cleanly (Java `UPPER_SNAKE`, C# `PascalCase`), so no FP class emerged:
+across S/M/L (gson/commons-lang/guava, automapper/newtonsoft/efcore) every sampled edge was a true
+positive, 0 shadow leaks, no minified-file FPs, node count identical on/off. The `impact` wins are
+the headline — Java's canonical `public static final` constants (`INDEX_NOT_FOUND` 4→165, `EMPTY`
+5→161) and C#'s `const`/`static readonly` (`Prefix` 40→237, a generated `_resourceManager` 22→1664)
+all went from a blind "1 affected" to their real radius. The known sibling-class limitation (the
+same const name in two classes in one file resolves to the file-wide target) is shared with Ruby and
+stayed negligible.
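
To make concrete why the fix re-kinds the constant subset instead of widening the gate, here is a rough sketch of a target gate. It assumes the three conditions mentioned in these docs (kind is `constant`/`variable`, parent is `file:`/`class:`/`module:`, name contains `[A-Z_]`) combine as a simple conjunction. `ExtractedNode`, `isValueRefTarget`, and the `parentId` string format are hypothetical and are not taken from the codebase.

```typescript
// Hypothetical shape of an extracted node; the parentId format is assumed.
interface ExtractedNode {
  kind: 'constant' | 'variable' | 'field' | 'property';
  name: string;
  parentId: string; // e.g. "file:src/A.java" or "class:Limits" (assumption)
}

const TARGET_KINDS: string[] = ['constant', 'variable'];
const TARGET_PARENTS: string[] = ['file:', 'class:', 'module:'];

// Sketch of the gate: only shared constants/variables in an accepted scope
// with a distinctive name become value-ref targets.
function isValueRefTarget(node: ExtractedNode): boolean {
  const kindOk = TARGET_KINDS.indexOf(node.kind) !== -1;
  const parentOk = TARGET_PARENTS.some((p) => node.parentId.indexOf(p) === 0);
  const distinctive = /[A-Z_]/.test(node.name);
  return kindOk && parentOk && distinctive;
}

// Before the fix, a Java `static final` came out as `field` and was rejected.
console.log(isValueRefTarget({ kind: 'field', name: 'MAX_ITEMS', parentId: 'class:Limits' })); // false
// After re-kinding, the same declaration passes.
console.log(isValueRefTarget({ kind: 'constant', name: 'MAX_ITEMS', parentId: 'class:Limits' })); // true
```

Accepting `field` in `TARGET_KINDS` would also admit every mutable instance field with an uppercase letter in its name, which is why the kind is corrected at extraction time.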
+
+**PHP was a near-pure "easy path" — one reader-scan line, no extractor change, no prune wiring.**
+PHP already extracts both top-level `const X = …` and class `const X = …` as `constant` kind (a
+dedicated `const_declaration` handler), inside the right scope (`file:` / `class:`, both gated). The
+*only* change was the reader-scan: PHP represents a constant *reference* — bare `X`, or the const
+half of `self::X` / `Foo::X` / `static::X` — as a **`name`** node, which the scan (matching
+`identifier` / `constant`) missed, so it found nothing until `name` was added. That's safe across
+languages: `flushValueRefs` only runs for the value-ref set, and `name` is PHP-only among them. **No
+shadow prune was needed at all** — a PHP local is a `$var` (`variable_name`), a different namespace
+from a bare constant, so a local can *never* shadow a constant; there is nothing to prune (the
+cleanest case yet). Precision was excellent: UPPER_SNAKE constants fit the distinctive-name gate, and
+a dedicated check for a target whose name collides with a same-file *class* (PHP's one realistic FP —
+`name` nodes also name classes in `new Foo()` / `Foo::`) found **zero** collisions across
+guzzle/monolog/laravel; every sampled edge was a true positive, node count identical on/off.
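
The effect of the missing `name` type can be reproduced with a toy reader-scan. This is a sketch under stated assumptions, not the code in `flushValueRefs`: `MiniNode` and `scanReads` are hypothetical, and the tree is a hand-built approximation of how `self::MAX_RETRIES + $retries` might parse.

```typescript
// Minimal stand-in for a syntax node (hypothetical).
interface MiniNode {
  type: string;
  text?: string;
  children?: MiniNode[];
}

// Collect registered target names mentioned in a reader body, considering
// only nodes whose type is listed in `refTypes`.
function scanReads(body: MiniNode, targets: string[], refTypes: string[]): string[] {
  const found: string[] = [];
  const stack: MiniNode[] = [body];
  while (stack.length > 0) {
    const node = stack.pop()!;
    const isRef = refTypes.indexOf(node.type) !== -1;
    if (isRef && node.text !== undefined && targets.indexOf(node.text) !== -1) {
      if (found.indexOf(node.text) === -1) found.push(node.text);
    }
    for (const child of node.children ?? []) stack.push(child);
  }
  return found.sort();
}

// Approximate shape of `self::MAX_RETRIES + $retries` (assumed, simplified).
const phpBody: MiniNode = {
  type: 'binary_expression',
  children: [
    {
      type: 'class_constant_access_expression',
      children: [
        { type: 'relative_scope', text: 'self' },
        { type: 'name', text: 'MAX_RETRIES' },
      ],
    },
    { type: 'variable_name', text: '$retries' },
  ],
};

console.log(scanReads(phpBody, ['MAX_RETRIES'], ['identifier', 'constant'])); // []
console.log(scanReads(phpBody, ['MAX_RETRIES'], ['identifier', 'constant', 'name'])); // [ 'MAX_RETRIES' ]
```

The target list is identical in both calls; only the accepted reference types differ, which is why a zero-edge sweep points at the scan rather than at target registration.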
+
+**The honest caveat: PHP is lower-yield than the class-scope languages, by design.** PHP idiom reads
+constants *across* files far more than within one (a `Logger::DEBUG` or a config constant consumed
+everywhere), and value-refs is **same-file only** — so laravel (2,956 files) produced only 86 edges
+vs. Ruby rails's 2,255 (1,452 files). This is not a miss: the cross-file reads are out of scope for
+*every* language (resolution would need import/scope analysis), and PHP simply leans on them more.
+The same-file reads it *does* capture are clean and the transitive impact wins are real
+(`INVISIBLE_CHARACTERS` 1→93 from 3 direct readers). Net: correct and additive, just a smaller
+absolute contribution than Java/C#/Go.
+
+**Scala — the `object` is the constant scope.** Scala has no `static`; the idiom for a shared
+constant is a `val` inside a singleton `object` (`object Config { val Timeout = 30 }`). A top-level
+`val` already extracted as `constant`, but `object` and `class` vals both came out as `field` (the
+gate rejects `field`). The fix is a kind refinement in the Scala `val_definition` handler: walk to
+the enclosing definition and treat an `object_definition` (or top level) val as `constant`/`variable`
+— while a `class`/`trait`/`enum` val stays `field`, because it is per-instance immutable state, the
+exact analogue of the Java instance `final` we also keep as `field`. (`object` and `class` both
+extract as `class` *kind*, so the distinction is the enclosing AST node type, not the node kind.)
+The shadow prune gained `val_definition`/`var_definition` (a method-local `val` can shadow an object
+val); the reader-scan needed nothing, since a Scala val reference is a plain `identifier`. Method-local
+vals are not extracted at all, so they're not a target source. The one **known limitation** is
+Scala's interchangeable `val`/`def` for members: a camelCase val can share a name with a method in the
+same file, and same-file name matching can't distinguish them — but it's bounded (like Ruby's
+sibling-class case), and on the sweep every flagged val/def collision turned out to be a real `object`
+val read by sibling vals (cats' typeclass instances: `val flatMap = monad`, read by
+`invariantSemigroupal`). Validated S/M/L (upickle/cats/pekko): node count identical on/off, top
+targets genuine object vals (`maxArity` `val = 22`, `DigitTens` lookup table), impact wins real
+(`maxArity` 3→17). The distinctive-name gate fits Scala's camelCase/PascalCase constants (`maxArity`,
+`IntegralPattern`) via their internal uppercase letter.
+
+**Kotlin combined three already-built techniques.** Kotlin has no `static`: shared constants live at
+top level, in an `object` (singleton), or in a class's `companion object` — all `val`/`const val`. A
+class instance `val` is per-object state. Nothing extracted before because a Kotlin property name
+nests (`property_declaration → variable_declaration → simple_identifier`) and the generic path reads
+only a direct child — the **C** problem. The fix handles `property_declaration` in the Kotlin
+`visitNode` hook (where the existing one already manages `fun interface` misparses): pull the nested
+name, then walk to the enclosing definition to set the kind — `object_declaration`/`companion_object`
+(or top level) → `constant`/`variable` (the **Scala** object-vs-class rule), `class_declaration` →
+`field`, and a property under a `function_body`/`init`/lambda is a local and skipped. The reader-scan
+gained `simple_identifier` (Kotlin's reference node — the **PHP `name`** move; `simple_identifier` is
+Kotlin-only among the value-ref set), and the shadow prune gained `property_declaration` (a method-local
+`val` can shadow an object const). Kotlin's parse fidelity is clean (its one known misparse,
+`fun interface`, is already handled), so unlike C++ no precision tail emerged. It validated as one of
+the *cleanest* languages: companion-object bit-masks and state constants are a heavy, same-file-read
+idiom (coroutines' `BLOCKING_SHIFT` 1→24, `TERMINATED` 2→22 in the scheduler; okio's `STATE_IN_QUEUE`
+1→32; ktor's content-type `TYPE` 8→109). okio had 0 collisions, coroutines 1 (cross-file). The same
+val/def-or-class name-overlap limitation as Scala applies (ktor's HTTP DSL names a header const and a
+class the same), plus the sibling-companion case (several `companion object { const val TYPE }` in one
+file collapse to the file-wide target, like Ruby's sibling-class) — both bounded, and every flagged
+collision investigated was a real object/companion const.
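
The walk-to-enclosing-scope rule can be sketched like this. It is an illustration only: `ScopedNode` and `kotlinPropertyKind` are hypothetical, the `val` to `constant` and `var` to `variable` mapping is an assumption, and only `function_body` is checked for locals (the `init` and lambda cases are omitted because their node type names are not given above).

```typescript
// Minimal stand-in for a syntax node with a parent link (hypothetical).
interface ScopedNode {
  type: string;
  parent?: ScopedNode;
}

type PropertyKind = 'constant' | 'variable' | 'field' | null;

// Decide a property's kind from its nearest enclosing scope.
// Returns null for a local, which is skipped rather than extracted.
function kotlinPropertyKind(prop: ScopedNode, isVal: boolean): PropertyKind {
  for (let p = prop.parent; p; p = p.parent) {
    switch (p.type) {
      case 'function_body':
        return null; // method-local property
      case 'object_declaration':
      case 'companion_object':
        return isVal ? 'constant' : 'variable'; // shared, singleton scope
      case 'class_declaration':
        return 'field'; // per-instance state
    }
  }
  return isVal ? 'constant' : 'variable'; // no enclosing scope: top level
}

const file: ScopedNode = { type: 'source_file' };
const cls: ScopedNode = { type: 'class_declaration', parent: file };
const companion: ScopedNode = { type: 'companion_object', parent: cls };
const inCompanion: ScopedNode = { type: 'property_declaration', parent: companion };
const inClass: ScopedNode = { type: 'property_declaration', parent: cls };

console.log(kotlinPropertyKind(inCompanion, true)); // constant
console.log(kotlinPropertyKind(inClass, true)); // field
```

Checking the nearest ancestor first matters: a companion object sits inside a class, so stopping at the first match keeps companion constants from being classified as instance fields.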
+
+**C++ was attempted and reverted** — the machinery (file/namespace-scope + class `field_declaration`
+extraction) is correct on clean C++, but tree-sitter-cpp's parse fidelity on real template/macro-heavy
+code (and the `.h`→C-grammar routing) leaks class members and parameters to file scope as bogus
+constants. Two guards (skip declarations under an `ERROR` or `compound_statement` ancestor) removed
+~83% of the gross leaks, but the residual pervaded even well-structured library source
+(template-class member leaks, amalgamated mega-headers, `.h`-as-C++). It did not reach the precision
+bar the other languages hold, so it was reverted. Reviving C++ needs prior work on C++ parse handling
+(template-class member scoping, `.h`-as-C++ detection, amalgamated-header exclusion), not a value-refs
+wiring pass. See the playbook's §2b C++ note.
+
 **`tsx` is covered by the TS rows** — excalidraw is a React/.tsx codebase, so the headline
 `tablerIconProps` (1→170) and most of its targets live in `.tsx` files. The one
 tsx-specific path — a const read *only* inside JSX (`<Foo x={CONST}/>`) — relies on the

+ 7 - 0
src/extraction/languages/c-cpp.ts

@@ -110,6 +110,13 @@ export const cExtractor: LanguageExtractor = {
   nameField: 'declarator',
   bodyField: 'body',
   paramsField: 'parameters',
+  // A `const`/`static const` file-scope declaration carries a `type_qualifier`
+  // child reading "const" — extract those as `constant`, plain globals as
+  // `variable`.
+  isConst: (node) =>
+    node.namedChildren.some(
+      (c: SyntaxNode) => c.type === 'type_qualifier' && c.text === 'const'
+    ),
   getReturnType: extractCppReturnType,
   resolveTypeAliasKind: (node, _source) => {
     // C typedef: `typedef enum { ... } name;` or `typedef struct { ... } name;`

+ 16 - 0
src/extraction/languages/csharp.ts

@@ -121,6 +121,22 @@ export const csharpExtractor: LanguageExtractor = {
     }
     return false;
   },
+  // `const` and `static readonly` fields are C# constants (`MaxItems`, lookup
+  // tables, shared config). Drives `constant` kind so value-reference edges
+  // target them; instance `readonly` / plain `static` fields stay `field`s.
+  isConst: (node) => {
+    let hasStatic = false;
+    let hasReadonly = false;
+    for (let i = 0; i < node.childCount; i++) {
+      const child = node.child(i);
+      if (child?.type !== 'modifier') continue;
+      const t = child.text;
+      if (t === 'const') return true;
+      if (t === 'static') hasStatic = true;
+      else if (t === 'readonly') hasReadonly = true;
+    }
+    return hasStatic && hasReadonly;
+  },
   isAsync: (node) => {
     for (let i = 0; i < node.childCount; i++) {
       const child = node.child(i);
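
Review note: the C# rule in this hunk reduces to a small decision over modifier keywords. A minimal sketch, assuming the modifiers have already been pulled out as plain strings (`isCSharpConstant` and `csharpFieldKind` are hypothetical names, not part of the PR):

```typescript
// `const` alone qualifies; otherwise both `static` and `readonly` must be present.
function isCSharpConstant(modifiers: readonly string[]): boolean {
  let hasStatic = false;
  let hasReadonly = false;
  for (const m of modifiers) {
    if (m === 'const') return true;
    if (m === 'static') hasStatic = true;
    else if (m === 'readonly') hasReadonly = true;
  }
  return hasStatic && hasReadonly;
}

// The kind the field would be emitted with under that rule.
function csharpFieldKind(modifiers: readonly string[]): 'constant' | 'field' {
  return isCSharpConstant(modifiers) ? 'constant' : 'field';
}

const samples: Array<[string, string[]]> = [
  ['public const int MaxItems = 10;', ['public', 'const']],
  ['private static readonly string[] Names = ...;', ['private', 'static', 'readonly']],
  ['private readonly int _id;', ['private', 'readonly']],
  ['public static int Counter;', ['public', 'static']],
];

for (const [source, modifiers] of samples) {
  console.log(`${csharpFieldKind(modifiers).padEnd(8)} ${source}`);
}
```

Only the first two samples come out as `constant`; instance `readonly` and plain `static` fields stay `field`, matching the comment in the diff.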

+ 13 - 0
src/extraction/languages/java.ts

@@ -86,6 +86,19 @@ export const javaExtractor: LanguageExtractor = {
     }
     return false;
   },
+  // A `static final` field is a Java constant (`MAX_ITEMS`, lookup tables,
+  // shared config). Drives `constant` kind so value-reference edges target it;
+  // instance / `final`-only / `static`-only fields stay mutable `field`s.
+  isConst: (node) => {
+    for (let i = 0; i < node.childCount; i++) {
+      const child = node.child(i);
+      if (child?.type === 'modifiers') {
+        const text = child.text;
+        return /\bstatic\b/.test(text) && /\bfinal\b/.test(text);
+      }
+    }
+    return false;
+  },
   extractImport: (node, source) => {
     const importText = source.substring(node.startIndex, node.endIndex).trim();
     const scopedId = node.namedChildren.find((c: SyntaxNode) => c.type === 'scoped_identifier');
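
Review note: the Java predicate tests the whole `modifiers` text with word-boundary regexes. The sketch below replays that check on plain strings. It also shows one possible edge case, under the assumption that annotations are part of the `modifiers` node text: an annotation argument containing the words `static` and `final` would satisfy both regexes. `isJavaConstantByToken` is a hypothetical stricter variant included for comparison only; it is not what the PR does.

```typescript
// Same test as the diff: both keywords appear somewhere in the modifiers text.
function isJavaConstant(modifiersText: string): boolean {
  return /\bstatic\b/.test(modifiersText) && /\bfinal\b/.test(modifiersText);
}

// Hypothetical stricter variant: require the keywords as whole whitespace-separated tokens.
function isJavaConstantByToken(modifiersText: string): boolean {
  const tokens = modifiersText.trim().split(/\s+/);
  return tokens.includes('static') && tokens.includes('final');
}

const cases = [
  'public static final',            // constant
  'private final',                  // instance final, stays a field
  'static',                         // mutable static, stays a field
  '@Note("static final") private',  // assumed shape: annotation text inside modifiers
];

for (const text of cases) {
  console.log(JSON.stringify(text), isJavaConstant(text), isJavaConstantByToken(text));
}
```

Whether the annotation case matters in practice depends on how often such strings occur in real code; the regex form is simpler and matches the diff as written.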

+ 45 - 0
src/extraction/languages/kotlin.ts

@@ -85,6 +85,51 @@ export const kotlinExtractor: LanguageExtractor = {
   nameField: 'simple_identifier',
   bodyField: 'function_body',
   visitNode: (node, ctx) => {
+    // Kotlin properties (`val` / `var` / `const val`). The name nests as
+    // property_declaration → variable_declaration → simple_identifier, which the
+    // generic variable/field path can't read — so nothing was extracted before.
+    // Kind by enclosing scope: a singleton `object` / `companion object` (and a
+    // top-level property) holds *shared* values — `val`→`constant`,
+    // `var`→`variable` (the Scala-object rule; a `const val` is a `val`). A
+    // `class`/`interface`/`enum` instance `val`/`var` is per-instance state →
+    // `field` (never a value-ref target, like a Java instance `final`). A
+    // property inside a function body / `init` block / lambda is a local and is
+    // skipped entirely.
+    if (node.type === 'property_declaration') {
+      const varDecl = node.namedChildren.find((c) => c.type === 'variable_declaration');
+      const nameNode = varDecl?.namedChildren.find((c) => c.type === 'simple_identifier');
+      if (!nameNode) return false; // destructuring `val (a,b)` etc. — leave to default
+      const name = getNodeText(nameNode, ctx.source);
+      if (!name) return false;
+
+      // Walk to the nearest enclosing definition: a function body / init / lambda
+      // means it's a local; `object`/`companion object` is a constant scope; a
+      // `class_declaration` (covers class/interface/enum) is an instance scope.
+      let scope: 'local' | 'const' | 'instance' = 'const';
+      for (let p = node.parent; p; p = p.parent) {
+        const pt = p.type;
+        if (
+          pt === 'function_body' || pt === 'function_declaration' ||
+          pt === 'lambda_literal' || pt === 'anonymous_initializer' ||
+          pt === 'control_structure_body' || pt === 'getter' || pt === 'setter'
+        ) { scope = 'local'; break; }
+        if (pt === 'companion_object' || pt === 'object_declaration') { scope = 'const'; break; }
+        if (pt === 'class_declaration') { scope = 'instance'; break; }
+      }
+      if (scope === 'local') return true; // a local — don't extract
+
+      const binding = node.namedChildren.find((c) => c.type === 'binding_pattern_kind');
+      const isVal = binding != null && getNodeText(binding, ctx.source) === 'val';
+      const kind = scope === 'instance' ? 'field' : isVal ? 'constant' : 'variable';
+
+      const typeNode = node.childForFieldName('type');
+      const sig = typeNode
+        ? `${isVal ? 'val' : 'var'} ${name}: ${getNodeText(typeNode, ctx.source)}`
+        : undefined;
+      ctx.createNode(kind, name, node, { signature: sig });
+      return true;
+    }
+
     // Handle Kotlin `fun interface` declarations.
     // Tree-sitter-kotlin doesn't support `fun interface` syntax (Kotlin 1.4+).
     // It produces two different misparse patterns:
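
Review note: the scope walk in the new `property_declaration` handler can be reasoned about as a scan over ancestor node types, nearest first. A minimal sketch under that assumption follows. `classifyScope` and `propertyKind` are hypothetical helpers, and ancestor names such as `class_body`, `statements`, and `source_file` are illustrative fillers that the rule never matches on.

```typescript
type Scope = 'local' | 'const' | 'instance';

// Ancestor types that mark the property as a local (taken from the diff).
const LOCAL_TYPES = new Set([
  'function_body', 'function_declaration', 'lambda_literal',
  'anonymous_initializer', 'control_structure_body', 'getter', 'setter',
]);

// The first matching ancestor decides; no match means a top-level property.
function classifyScope(ancestorTypes: readonly string[]): Scope {
  for (const pt of ancestorTypes) {
    if (LOCAL_TYPES.has(pt)) return 'local';
    if (pt === 'companion_object' || pt === 'object_declaration') return 'const';
    if (pt === 'class_declaration') return 'instance';
  }
  return 'const';
}

// Kind per the diff: locals are skipped (null), instance scope gives `field`,
// shared scope gives `constant` for `val` and `variable` for `var`.
function propertyKind(
  ancestorTypes: readonly string[],
  isVal: boolean
): 'constant' | 'variable' | 'field' | null {
  const scope = classifyScope(ancestorTypes);
  if (scope === 'local') return null;
  if (scope === 'instance') return 'field';
  return isVal ? 'constant' : 'variable';
}

// `class Foo { companion object { val MAX = 1 } }`: the companion is the nearest match.
const inCompanion = ['class_body', 'companion_object', 'class_body', 'class_declaration', 'source_file'];
// `class Foo { val id = 1 }`
const inClass = ['class_body', 'class_declaration', 'source_file'];
// `object Cfg { fun f() { val tmp = 1 } }`: function_body wins before the object.
const inFunction = ['statements', 'function_body', 'function_declaration', 'class_body', 'object_declaration'];

console.log(propertyKind(inCompanion, true), propertyKind(inClass, true), propertyKind(inFunction, true));
```

Ordering matters here: because the nearest ancestor wins, a `val` inside a method of an `object` is treated as a local, while a `val` in a companion object nested in a class is treated as shared.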

+ 23 - 12
src/extraction/languages/scala.ts

@@ -136,18 +136,29 @@ export const scalaExtractor: LanguageExtractor = {
       const name = getValVarName(node, ctx.source);
       if (!name) return false;
 
-      const isInClass = ctx.nodeStack.length > 0 &&
-        (() => {
-          const parentId = ctx.nodeStack[ctx.nodeStack.length - 1];
-          const parentNode = ctx.nodes.find((n) => n.id === parentId);
-          return parentNode != null && (
-            parentNode.kind === 'class' || parentNode.kind === 'trait' ||
-            parentNode.kind === 'interface' || parentNode.kind === 'struct' ||
-            parentNode.kind === 'enum' || parentNode.kind === 'module'
-          );
-        })();
-
-      const kind = isInClass ? 'field' : (t === 'val_definition' ? 'constant' : 'variable');
+      // An `object` is a singleton: its `val`s are shared constants (the Scala
+      // idiom for `static final` — `object Config { val Timeout = 30 }`), so
+      // emit them as `constant`/`variable` like a top-level val, which lets
+      // value-reference edges target them. A `class`/`trait`/`enum`/`given` val
+      // is a per-instance immutable field. Both an `object` and a `class`
+      // extract as `class` kind, so the AST node type of the enclosing
+      // definition — not the parent node's kind — is what distinguishes them.
+      let enclosingDef: string | null = null;
+      for (let p = node.parent; p; p = p.parent) {
+        if (
+          p.type === 'class_definition' || p.type === 'trait_definition' ||
+          p.type === 'enum_definition' || p.type === 'given_definition' ||
+          p.type === 'object_definition'
+        ) {
+          enclosingDef = p.type;
+          break;
+        }
+      }
+      const isInstanceField =
+        enclosingDef === 'class_definition' || enclosingDef === 'trait_definition' ||
+        enclosingDef === 'enum_definition' || enclosingDef === 'given_definition';
+
+      const kind = isInstanceField ? 'field' : (t === 'val_definition' ? 'constant' : 'variable');
       const typeNode = node.childForFieldName('type');
       const sig = typeNode
         ? `${t === 'val_definition' ? 'val' : 'var'} ${name}: ${getNodeText(typeNode, ctx.source)}`
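
Review note: the Scala change swaps the parent-kind lookup for the AST type of the nearest enclosing definition. A minimal sketch of that decision, assuming the ancestor types are available as a nearest-first list (`scalaValKind` is a hypothetical helper; `template_body` is an illustrative filler type that the rule never inspects):

```typescript
// Definition node types the handler stops at (taken from the diff).
const DEFINITION_TYPES = new Set([
  'class_definition', 'trait_definition', 'enum_definition',
  'given_definition', 'object_definition',
]);

function scalaValKind(
  ancestorTypes: readonly string[],
  nodeType: 'val_definition' | 'var_definition'
): 'constant' | 'variable' | 'field' {
  // Nearest enclosing definition, or null for a top-level val/var.
  const enclosing = ancestorTypes.find((t) => DEFINITION_TYPES.has(t)) ?? null;
  // Every definition except `object` holds per-instance state.
  const isInstanceField = enclosing !== null && enclosing !== 'object_definition';
  if (isInstanceField) return 'field';
  return nodeType === 'val_definition' ? 'constant' : 'variable';
}

// `object Config { val Timeout = 30 }`
console.log(scalaValKind(['template_body', 'object_definition'], 'val_definition'));
// `class Service { val name = "x" }`
console.log(scalaValKind(['template_body', 'class_definition'], 'val_definition'));
// `class Outer { object Inner { var hits = 0 } }`: the nearest definition is the object.
console.log(scalaValKind(
  ['template_body', 'object_definition', 'template_body', 'class_definition'],
  'var_definition'
));
```

Using the AST type rather than the extracted node kind is what separates `object` from `class` here, since the diff notes that both extract as `class` kind.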

+ 129 - 6
src/extraction/tree-sitter.ts

@@ -151,6 +151,47 @@ function scalaBaseTypeName(node: SyntaxNode | null, source: string): string | nu
   }
 }
 
+/**
+ * Resolve the declared identifier inside a C declarator. A `declaration`'s
+ * `declarator` field nests the name through `init_declarator` (with value),
+ * `pointer_declarator`/`array_declarator`/`parenthesized_declarator`
+ * wrappers (each via their own `declarator` field) down to an `identifier`.
+ * A `function_declarator` means the declaration is a function prototype (or a
+ * function-pointer var) — return null so it isn't extracted as a variable.
+ */
+function cDeclaratorIdentifier(node: SyntaxNode | null): SyntaxNode | null {
+  let cur: SyntaxNode | null = node;
+  let guard = 0;
+  while (cur && guard++ < 12) {
+    switch (cur.type) {
+      case 'identifier':
+        return cur;
+      case 'function_declarator':
+        return null;
+      case 'init_declarator':
+      case 'pointer_declarator':
+      case 'array_declarator':
+      case 'parenthesized_declarator':
+        cur = getChildByField(cur, 'declarator');
+        break;
+      default:
+        return null;
+    }
+  }
+  return null;
+}
+
+/** True when `node` is (transitively) inside a C function body — i.e. a local,
+ * not a file/namespace-scope declaration. Walks the parent chain to the root. */
+function hasFunctionAncestor(node: SyntaxNode): boolean {
+  let p = node.parent;
+  while (p) {
+    if (p.type === 'function_definition') return true;
+    p = p.parent;
+  }
+  return false;
+}
+
 /**
  * PHP type-position wrapper node kinds (a type-hint is `named_type`,
  * `?Foo` is `optional_type`, `A|B` is `union_type`, `A&B` is
@@ -224,7 +265,7 @@ export class TreeSitterExtractor {
   // Value-reference edges (default ON; set CODEGRAPH_VALUE_REFS=0 to disable; see flushValueRefs).
   // Same-file reads of file-scope const/var symbols → `references` edges so impact analysis catches
   // value consumers ("change this constant/table, affect its readers").
-  private static readonly VALUE_REF_LANGS = new Set<string>(['typescript', 'javascript', 'tsx', 'go', 'python', 'rust', 'ruby']);
+  private static readonly VALUE_REF_LANGS = new Set<string>(['typescript', 'javascript', 'tsx', 'go', 'python', 'rust', 'ruby', 'c', 'java', 'csharp', 'php', 'scala', 'kotlin']);
   private static readonly MAX_VALUE_REF_NODES = 20_000;
   private readonly valueRefsEnabled = process.env.CODEGRAPH_VALUE_REFS !== '0';
   private fileScopeValues = new Map<string, string>();
@@ -587,7 +628,8 @@ export class TreeSitterExtractor {
     if (this.tree) {
       const declCounts = new Map<string, number>();
       const bump = (nameNode: SyntaxNode | null) => {
-        if (nameNode && nameNode.type === 'identifier') {
+        // `simple_identifier` is Kotlin's name node (a property declarator's name).
+        if (nameNode && (nameNode.type === 'identifier' || nameNode.type === 'simple_identifier')) {
           const nm = getNodeText(nameNode, this.source);
           if (targets.has(nm)) declCounts.set(nm, (declCounts.get(nm) ?? 0) + 1);
         }
@@ -615,6 +657,21 @@ export class TreeSitterExtractor {
             else if (left) for (const c of left.namedChildren) bump(c);
             break;
           }
+          case 'init_declarator':       // C  `T X = …` (file-scope const AND the local that shadows it)
+            bump(cDeclaratorIdentifier(n));
+            break;
+          case 'val_definition':        // Scala  `val X = …` (object/top-level const AND a method-local that shadows it)
+          case 'var_definition': {      // Scala  `var X = …`
+            const pat = getChildByField(n, 'pattern');
+            if (pat?.type === 'identifier') bump(pat);
+            break;
+          }
+          case 'property_declaration': { // Kotlin  `val X = …` (object/top-level const AND a method-local that shadows it)
+            const vd = n.namedChildren.find((c) => c.type === 'variable_declaration');
+            const id = vd?.namedChildren.find((c) => c.type === 'simple_identifier');
+            if (id) bump(id);
+            break;
+          }
         }
         for (let i = 0; i < n.namedChildCount; i++) {
           const c = n.namedChild(i);
@@ -633,8 +690,18 @@ export class TreeSitterExtractor {
         const n = stack.pop()!;
         visited++;
         // `constant` covers Ruby, where both a constant's definition and its
-        // references are `constant`-typed nodes, not `identifier`.
-        if (n.type === 'identifier' || n.type === 'constant') {
+        // references are `constant`-typed nodes, not `identifier`. `name` covers
+        // PHP, where a constant reference — bare `MAX_ITEMS` or the const half of
+        // `self::MAX_ITEMS` / `Foo::MAX_ITEMS` — is a `name` node (a `$var` local
+        // is a `variable_name`, a different namespace, so it can never shadow a
+        // bare constant — no prune wiring needed). `simple_identifier` covers
+        // Kotlin, whose every name reference (a const read included) is that
+        // node type. Safe across languages: a file only holds its own grammar's
+        // nodes; `name` is PHP-only and `simple_identifier` is Kotlin-only here.
+        if (
+          n.type === 'identifier' || n.type === 'constant' ||
+          n.type === 'name' || n.type === 'simple_identifier'
+        ) {
           const refName = getNodeText(n, this.source);
           const targetId = targets.get(refName);
           // Skip self and same-name targets: a symbol referencing a file-scope
@@ -1581,6 +1648,17 @@ export class TreeSitterExtractor {
     const visibility = this.extractor.getVisibility?.(node);
     const isStatic = this.extractor.isStatic?.(node) ?? false;
 
+    // A class field that is actually a CONSTANT (Java `static final`, C# `const`
+    // / `static readonly`) is extracted as `constant` kind, not `field`, so
+    // value-reference edges treat it as a target (the gate accepts
+    // constant/variable, not field). Scoped to languages whose `isConst`
+    // predicate is field-shaped — other languages' fields stay `field`.
+    const fieldKind: NodeKind =
+      (this.language === 'java' || this.language === 'csharp') &&
+      (this.extractor.isConst?.(node) ?? false)
+        ? 'constant'
+        : 'field';
+
     // Java field_declaration: "private final String name = value;" → variable_declarator(s) are direct children
     // C# field_declaration: wraps in variable_declaration → variable_declarator(s)
     let declarators = node.namedChildren.filter(
@@ -1641,7 +1719,7 @@ export class TreeSitterExtractor {
         if (!nameNode) continue;
         const name = getNodeText(nameNode, this.source);
         const signature = typeText ? `${typeText} ${name}` : name;
-        const fieldNode = this.createNode('field', name, decl, {
+        const fieldNode = this.createNode(fieldKind, name, decl, {
           docstring,
           signature,
           visibility,
@@ -1665,7 +1743,7 @@ export class TreeSitterExtractor {
         || node.namedChildren.find(c => c.type === 'identifier');
       if (nameNode) {
         const name = getNodeText(nameNode, this.source);
-        this.createNode('field', name, node, {
+        this.createNode(fieldKind, name, node, {
           docstring,
           visibility,
           isStatic,
@@ -1967,6 +2045,51 @@ export class TreeSitterExtractor {
         const initSignature = initValue ? `= ${initValue}${initValue.length >= 100 ? '...' : ''}` : undefined;
         this.createNode(kind, name, nameNode, { docstring, signature: initSignature, isExported });
       });
+    } else if (this.language === 'c') {
+      // C: a `declaration` node's name nests inside the `declarator` field —
+      // `init_declarator` (with value) or bare/pointer/array declarators (no
+      // value); a `function_declarator` is a prototype, not a variable. The
+      // generic fallback below only finds a *direct* identifier child, which C
+      // never has, so file-scope consts/globals went unextracted entirely (and
+      // so had no impact-radius edges). Only file-scope declarations are tracked
+      // — locals inside a function body are skipped (a `static const` table read
+      // by same-file functions is the value the impact graph wants, not every
+      // block-local). C allows several declarators per declaration
+      // (`int a = 1, b = 2;`), so iterate them.
+      if (!hasFunctionAncestor(node)) {
+        for (let i = 0; i < node.namedChildCount; i++) {
+          const child = node.namedChild(i);
+          if (!child) continue;
+          // Accept only `init_declarator` (has a value) and pointer/array
+          // declarators. A *bare* `identifier` declarator is deliberately
+          // skipped: an unknown leading macro (`CURL_EXTERN`, `XXH_PUBLIC_API`)
+          // makes tree-sitter-c misparse a prototype `MACRO RetType fn(args);`
+          // as a declaration whose "variable" is the bare return-type
+          // identifier, splitting `fn(args)` off as a bogus expression — minting
+          // a spurious type-named global for every macro-prefixed prototype in a
+          // header. Those misparses are always bare identifiers; real
+          // consts/tables always carry an initializer. The only legit loss is
+          // uninitialized scalar globals (`static int g;`).
+          if (
+            child.type !== 'init_declarator' &&
+            child.type !== 'pointer_declarator' &&
+            child.type !== 'array_declarator'
+          ) {
+            continue;
+          }
+          const nameNode = cDeclaratorIdentifier(child);
+          if (!nameNode) continue;
+          const name = getNodeText(nameNode, this.source);
+          if (!name) continue;
+          const valueNode =
+            child.type === 'init_declarator' ? getChildByField(child, 'value') : null;
+          const initValue = valueNode ? getNodeText(valueNode, this.source).slice(0, 100) : undefined;
+          const initSignature = initValue
+            ? `= ${initValue}${initValue.length >= 100 ? '...' : ''}`
+            : undefined;
+          this.createNode(kind, name, child, { docstring, signature: initSignature, isExported });
+        }
+      }
     } else {
       // Generic fallback for other languages
       // Try to find identifier children