# Playbook: extend value-reference edges to a new language

**Purpose.** This is the operational runbook for adding + validating value-reference-edge coverage for one more language. Point a fresh session at this file and say **"Start on language X"**. It has everything: how the feature works, where the code is, the exact validation recipe (with scripts), the per-language checklist, and the traps already hit.

Design rationale + the validation matrix already done live in the companion doc: [`value-reference-edges.md`](./value-reference-edges.md). This file is the *how-to*.

---

## 0. "Start on language X": do this in order

1. Read §1 (how it works) and §2 (current state) so you know the mechanism and what's done.
2. Do the **per-language wiring check** (§5 step A–C). This is where languages differ and where most of the real work/decisions are. Do NOT skip: a wrong declarator node type or a class-scope-vs-file-scope mismatch makes the feature silently emit nothing (or wrong edges).
3. Run the **validation sweep** (§4) on small/medium/large **public OSS** repos for that language. Hunt FPs. **Fix FP clusters; record singletons.** (See §3 for what a real FP looks like vs an acceptable one.)
4. Add a **row to the matrix** in `value-reference-edges.md` and a **test case** in `__tests__/value-reference-edges.test.ts`.
5. Commit on a branch, open a PR. (§6 has the git workflow + how the prior PRs were done.)

Scope rule (hard): **never eval on the maintainer's own repos**; clone a real public OSS repo for the language. (Memory: `agent-eval-targets-public-oss-only`.)

---

## 1. How value-reference edges work

**What:** a `references` edge with `metadata: { valueRef: true }` from a *reader symbol* to the **file-scope `const`/`var` it reads**, same-file only.
It exists so impact analysis catches "change this constant / config object / lookup table → affect its readers", a class of change that calls/imports/inheritance edges never captured (a const's consumers used to look like "nothing depends on this").

**Where it flows:** straight into `getImpactRadius` → `codegraph impact` and the impact trail in `codegraph_explore` / `codegraph_node`. No agent-behaviour change required. **The win is impact-radius correctness** (a const read by 90 symbols going from "1 affected" to "90"), *not* agent read-reduction (see §4.3).

**Code (all in `src/extraction/tree-sitter.ts`):**

| Symbol | Role |
|---|---|
| `VALUE_REF_LANGS` (static Set) | languages the feature runs for. Currently `typescript`, `javascript`, `tsx`, `go`, `python`, `rust`, `ruby`, `c`, `java`, `csharp`, `php`, `scala`, `kotlin`, `swift`, `dart`, `pascal`. **Add the new language here.** |
| `valueRefsEnabled` | `process.env.CODEGRAPH_VALUE_REFS !== '0'`: default ON, env opts out. |
| `MAX_VALUE_REF_NODES` (20_000) | per-scope traversal cap (and the shadow-scan cap). |
| `captureValueRefScope(kind, name, id, node)` | called from `createNode` on every node. Records **targets** (file-scope `const`/`var`) and **reader scopes** (`function`/`method`/`const`/`var`). |
| `flushValueRefs()` | called once at end of `extract()`. Prunes shadowed targets, then for each reader scope walks its subtree for identifiers matching a target name and emits the edges. |

**The two gates inside `captureValueRefScope`** (what you may need to adjust per language):

- **Target gate:** `kind ∈ {constant, variable}` **and** `name.length >= 3` **and** `/[A-Z_]/.test(name)` (distinctive name; dodges single-letter / all-lowercase shadowing) **and** the node's parent id starts with `file:`, `class:`, or `module:` (file/class/module scope; later widened to `struct:`/`enum:` parents for Swift, see §2).
- **Reader gate:** `kind ∈ {function, method, constant, variable}`.

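
The two gates above can be sketched as follows. This is a minimal illustration, not the code in `src/extraction/tree-sitter.ts`: the function names `passesTargetGate` and `passesReaderGate`, the constant names, and the example parent ids are all hypothetical. Only the conditions themselves come from the gate descriptions.

```typescript
// Hypothetical sketch of the two gates; names are illustrative only and
// do not mirror the real captureValueRefScope implementation.
const TARGET_KINDS: ReadonlySet<string> = new Set(['constant', 'variable']);
const READER_KINDS: ReadonlySet<string> = new Set([
  'function',
  'method',
  'constant',
  'variable',
]);

// Parent-id prefixes that count as file/class/module scope. The Swift note
// in §2 describes widening this to `struct:`/`enum:`; omitted here.
const TARGET_PARENT_PREFIXES: readonly string[] = ['file:', 'class:', 'module:'];

// Target gate: kind, minimum length, distinctive name, and scope must all hold.
function passesTargetGate(kind: string, name: string, parentId: string): boolean {
  return (
    TARGET_KINDS.has(kind) &&
    name.length >= 3 &&
    /[A-Z_]/.test(name) && // at least one uppercase letter or underscore
    TARGET_PARENT_PREFIXES.some((prefix) => parentId.startsWith(prefix))
  );
}

// Reader gate: only these symbol kinds are scanned for reads of a target.
function passesReaderGate(kind: string): boolean {
  return READER_KINDS.has(kind);
}
```

Under these assumptions, `passesTargetGate('constant', 'MAX_RETRIES', 'file:src/config.ts')` is accepted, while an all-lowercase `timeout`, a two-character `ID`, a `field` kind, or a parent id outside the allowed prefixes is rejected. A language whose constants come out as `field` (the Java/C# case in §2) therefore emits nothing until the extractor re-kinds them.
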

**The emit loop in `flushValueRefs`:** same-file only (targets + scopes are per-file, reset each flush); deduped per `(reader, target)`; skips `isGeneratedFile(path)`; **prunes shadowed targets** (see §3).

---

## 2. Current state (what's shipped + validated)

- **Default ON** for TS/JS/tsx + Go + Python + Rust + Ruby + C + Java + C# (`CODEGRAPH_VALUE_REFS=0` disables). Shipped in **PR #895** (flip-on + the shadow prune); Go added in a later PR (the shadow-prune declarator switch + `VALUE_REF_LANGS`); C added later still (extractor change to emit the nodes + the bare-identifier misparse guard); Java + C# after that (field→constant kind switch for the const subset). The remaining `VALUE_REF_LANGS` entries (PHP, Scala, Kotlin, Swift, Dart, Pascal) have their own bullets below.
- **Validated S/M/L** in **TS, JS, tsx, Go, Python, Rust, Ruby, C, Java, and C#** (see the matrix in the design doc). All clean: node count identical on/off, precision guards held, impact win reproduced. Go required extending the shadow prune (per-grammar declarators), the worked example of "step B is load-bearing." **C required the Ruby treatment** (the extractor didn't emit C file-scope const/var nodes at all) **plus** a C-specific FP guard (a macro-prefixed-prototype misparse mints a bare-identifier "variable" named after the return type; the guard skips bare-`identifier` declarators). It was the worked example of "the §2b coverage table's *easy-path* guess can be wrong; always do §5 step C (confirm the nodes exist) before trusting it."
- **Java + C# were the cleanest class-scope ("Ruby treatment") languages.** The constants already extract, but as `field` kind, which the gate rejects. The whole change was emitting the const *subset* as `constant`: an `isConst` predicate on each extractor (Java `static final`; C# `const` / `static readonly`) + a kind switch in `extractField`. **No new shadow-prune wiring** (method locals are `variable_declarator`, already in the switch) and **no FP guards** (UPPER_SNAKE / PascalCase fit the distinctive-name gate). Instance `final`/`readonly` fields correctly stay `field`.
  Validated S/M/L (gson/commons-lang/guava, automapper/newtonsoft/efcore): 0 leaks, node parity, big impact wins (`INDEX_NOT_FOUND` 4→165, `_resourceManager` 22→1664).
- **PHP was the cleanest of all: one reader-scan line.** Constants already extract as `constant` (top-level + class), so the only change was teaching the reader-scan that a PHP constant *reference* is a `name` node (bare `X`, or the const half of `self::X` / `Foo::X`). **No extractor change, no prune wiring** (a `$var` local can't shadow a bare constant; different namespace). Validated S/M/L (guzzle/monolog/laravel), all clean, 0 class/const collisions. The honest caveat: **lower yield**. PHP reads constants cross-file far more than same-file (laravel 2,956 files → 86 edges), and value-refs is same-file only; still correct, just a smaller contribution.
- **Scala: an `object` is the constant scope.** Scala has no `static`; a singleton `object`'s `val`s are the shared-constant idiom (`object Config { val Timeout = 30 }`). Top-level `val` already extracted as `constant`, but object/class vals both came out as `field`. The fix: in the Scala `val_definition` handler, walk to the enclosing definition: `object_definition` (or top-level) → `constant`/`variable`; `class`/`trait`/`enum` → `field` (per-instance, like Java instance `final`). Added `val_definition`/`var_definition` to the shadow prune (method-local `val` shadows). Reader-scan needed nothing (refs are `identifier`). Minor known limitation: Scala uses `val`/`def` interchangeably for members, so a camelCase val can share a name with a method; same-file name matching can't tell them apart (bounded, like Ruby's sibling-class; sweep showed flagged collisions were mostly real object vals read by siblings). Validated S/M/L (upickle/cats/pekko).
- **C++ was attempted and reverted. DON'T retry without solving parse fidelity first.** tree-sitter-cpp mis-parses real template/macro-heavy C++ (and `.h` files route to the C grammar): class members and parameters leak to file scope as bogus constants/variables. Two guards (skip `ERROR`-ancestor and `compound_statement`-ancestor declarations) removed ~83% of gross leaks, but the residual pervades even well-structured library source (template-class member leaks, amalgamated mega-headers, `.h`-as-C++). It did not reach the precision bar of the other languages. See the C++ section below.
- **Kotlin = C + Scala + PHP techniques combined (and clean).** Nothing extracted before (property name nests `property_declaration → variable_declaration → simple_identifier`, the C problem). Fix: handle `property_declaration` in the Kotlin `visitNode` hook: pull the nested name, walk to the enclosing definition for the kind (`object`/`companion object`/top-level → `constant`/`variable`; `class` → `field`, the Scala rule; skip locals under a `function_body`/`init`/lambda), add `simple_identifier` to the reader-scan (the PHP-`name` move), and `property_declaration` to the shadow prune. Clean parse fidelity (the one `fun interface` misparse is already handled), so no C++-style tail. One of the cleanest yields: companion-object bit-masks/state consts are a heavy same-file-read idiom. Validated S/M/L (okio/coroutines/ktor); only the bounded val/def-or-class and sibling-companion name overlaps remain (shared with Scala/Ruby).
- **Swift reused Kotlin + two Swift-specific touches.** Top-level `let` + `static let` in a type are the shared constants (`enum`/`struct` namespace them); instance `let` stays `field`. Nested name (`property_declaration → pattern → simple_identifier`); reader-scan already covered (`simple_identifier`, from Kotlin).
  Two new things: **(1) the target gate was widened to `struct:`/`enum:` parents**, because Swift namespaces constants there (`enum Constants { static let X }`) while every other language's targets are `file:`/`class:`/`module:`; **(2) computed properties are skipped** (a `var x:Int{ … }` getter has no stored value; detect the `computed_property` child). Node creation slots into the *existing* Swift `property_declaration` handler (property-wrapper/type deps), leaving that untouched. Clean parse, no tail. Validated S/M/L (Alamofire/swift-argument-parser/swift-nio).
- **Dart: clean grammar separation, but a sibling-body reader-scan fix.** Dart's grammar already splits the cases: **`static_final_declaration`** is *exactly* a top-level/`static` `const`/`final` (the shared-constant idiom), while instance fields/`var` use `initialized_identifier` and locals use `initialized_variable_definition`, so extracting `static_final_declaration` → `constant` (in a `visitNode` hook) has **no instance/local leaks to guard**. The reader-scan node type was free (Dart refs are `identifier`). The catch was the **reader-scan traversal**: Dart attaches a method/function `body` as a *next sibling* of the signature node (the stored scope), not a child, so the scan saw only the signature and **found nothing** until it was taught to pull in a `function_body` next-sibling (Dart-only among the value-ref set). Shadow prune needed `static_final_declaration` + `initialized_identifier` + `initialized_variable_definition` (a local `const X` shadowing a file `const X`). Validated S/M/L (http/flame/flutter-packages). **Caveat:** generated Dart files inflate the sibling-class ambiguity (a JNIGEN `_bindings.dart` with hundreds of `static final _class` collapses to the file-wide target). The common codegen suffixes (`.g.dart`/`.freezed.dart`/`.pb.dart`) are already filtered by `isGeneratedFile`; header-only-marked generators (JNIGEN) are not, so real source is clean but generated FFI/JNI bindings are noisy.
- **Pascal: the genuine easy path + the Dart sibling-body fix again.** Unit/class `const` *already* extracted as `constant` (`variableTypes: ['declConst', …]`), so it was add-to-`VALUE_REF_LANGS` + the shadow prune (`declConst`/`declVar`; a local `const X` shadows a unit `const X`). The catch was the *same* reader-scan bug as Dart: Pascal's proc body is a **`block` sibling** of the `declProc` header (the reader scope), both under a `defProc`, so the same sibling-pull fix was extended to `block`. Reader-scan node type already covered (refs are `identifier`). **Low yield**: Pascal reads constants cross-unit more than same-file (horse: 4 edges). **Caveat:** Pascal is case-insensitive, but the reader-scan matches exact text, so a differently-cased reference is missed (no FP, just a miss); not worth normalizing.
- **Tests:** `__tests__/value-reference-edges.test.ts` covers: same-file readers edged; surfaced in impact radius; shadowed const NOT edged (verified to fail without the guard); JSX-only read edged (tsx); `CODEGRAPH_VALUE_REFS=0` emits nothing.
- **Memory:** `value-reference-edges-default-on` (the A/B finding + shadow guard rationale).

---

## 2b. Coverage vs the README (languages + frameworks)

Tracked against the README's **Supported Languages** table (24 rows) and **Framework-aware Routes** list. Value-refs is **language-level**, so frameworks are *not* a separate axis (see the bottom of this section).

**✅ Done, validated S/M/L (15 + 3 inherited):**

| Language | How |
|---|---|
| TypeScript, JavaScript, tsx | file-scope `const`/`var`; the original languages |
| Python | module-level `NAME =` |
| Go | package `const`/`var` |
| Rust | module + impl `const`/`static` |
| Ruby | class/module `CONST` (the class-scope extension) |
| C | file-scope `static const` scalars + pointer/array lookup tables + mutable globals. **Needed an extractor change** (nodes weren't emitted) + a bare-identifier misparse guard; NOT the easy path the table below first guessed |
| Java | class `static final` fields. Nodes existed as `field` kind; emitted the const subset as `constant` (`isConst` + `extractField` kind switch). No new prune wiring, no FP guards |
| C# | class `const` / `static readonly`. Identical to Java: same `field`→`constant` change |
| PHP | top-level `const` + class `const` (both already `constant` kind). **Only** change was the reader-scan: a PHP const *reference* is a `name` node. No extractor change, no prune wiring (a `$var` local can't shadow a bare constant). Lower yield: PHP reads consts cross-file more than same-file |
| Scala | top-level `val` (already `constant`) + **`object` val** (the singleton-constant idiom; re-kinded from `field` by walking to the enclosing `object_definition`). `class`/`trait`/`enum` vals stay `field`. `val_definition`/`var_definition` added to the shadow prune. Minor val/def name-collision limit |
| Kotlin | top-level / `object` / `companion object` `val` (re-kinded from nothing; properties weren't extracted at all). Handled in `visitNode`: nested name (`variable_declaration → simple_identifier`, the C move) + scope-walk for kind (Scala move) + `simple_identifier` in the reader-scan (PHP move) + prune. `class` instance vals stay `field`. Clean, and one of the best yields (companion bit-masks) |
| Swift | top-level `let` + `static let` in `struct`/`enum`/`class`. Reused Kotlin (nested name + `simple_identifier` reader-scan). Two Swift touches: **gate widened to `struct:`/`enum:` parents** (Swift namespaces consts there), and **computed properties skipped**. `class`/instance stored props stay `field`. Slots into the existing Swift property-wrapper handler |
| Dart | top-level `const`/`final` + class `static const`/`static final`, all the **`static_final_declaration`** node, cleanly separated by the grammar from instance/`var`/local (so no leak guard). `visitNode` → `constant`. Needed a reader-scan fix: Dart's method **body is a next sibling** of the signature, so the scan pulls in a `function_body` sibling. Generated-FFI noise (JNIGEN `_bindings.dart`) is the one caveat |
| Pascal / Delphi | unit/class `const` (already extracted as `constant`). Add-to-`VALUE_REF_LANGS` + shadow prune (`declConst`/`declVar`) + the **same Dart sibling-body fix** (Pascal's proc body is a `block` sibling of the `declProc` header). Low yield (cross-unit reads); case-insensitive (exact-text scan misses re-cased refs) |
| **Svelte, Vue, Astro** | **inherited for free**: their extractors re-parse the `