
feat(extraction): add Rust to value-reference edges

Extend value-reference edges to Rust (module-level `const`/`static`). Rust's
declarators are `const_item`/`static_item` (the module consts) and
`let_declaration` (the local that shadows); all three are added to the
per-grammar prune switch. A synthetic probe caught the expected shadow false
positive (a `const TIMEOUT` shadowed by a local `let TIMEOUT`), which adding
these declarators fixes.

No prune-RULE change needed: Rust's `#[cfg]`-conditional consts (`#[cfg(unix)]
const SEP = …; #[cfg(windows)] const SEP = …`) are kept correctly by the
Python-era file-scope-count rule (validated on tokio's io/interest.rs cfg-gated
flags). Consts written inside a config macro (`cfg_aio! { … }`) live in an
unparsed token tree, so the prune's syntax walk doesn't even see them.

Validated small/medium/large on public Rust OSS (ripgrep, tokio, rust-analyzer):
node count identical on/off (incl. 38.8k nodes on rust-analyzer), precision
samples all true positives, zero real shadow leaks, and the impact win reproduces
(rust-analyzer `INLINE_CAP` 2→183, tokio `PERMIT_SHIFT` 1→97).

Adds 2 Rust test cases (const/static read; the local-`let` shadow). Design
matrix, playbook, and CHANGELOG updated. Full suite green (1,557 passed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry, 1 week ago
Parent
commit
6281a541e4
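
The commit message relies on the file-scope-count prune rule without restating it. A minimal TypeScript sketch of that rule as the design doc describes it (declarators exceeding file-scope nodes means a local binding exists). `NameCounts` and `keepTargets` are hypothetical names for illustration, not identifiers from `tree-sitter.ts`, and the `SEP` row assumes both `#[cfg]` variants produce a file-scope node:

```typescript
// Hypothetical sketch of the file-scope-count prune rule; not the shipped code.
interface NameCounts {
  declarators: number;    // syntax-level declarators for the name, anywhere in the file
  fileScopeNodes: number; // graph nodes for the name at file scope
}

// Keep a name as a value-ref target only when no declarator is left over
// after accounting for its file-scope nodes (any excess would be a local).
function keepTargets(counts: Map<string, NameCounts>): string[] {
  const kept: string[] = [];
  counts.forEach((c, name) => {
    if (c.declarators <= c.fileScopeNodes) kept.push(name);
  });
  return kept;
}

const counts = new Map<string, NameCounts>([
  // `const TIMEOUT` plus a local `let TIMEOUT`: one declarator too many, so pruned.
  ['TIMEOUT', { declarators: 2, fileScopeNodes: 1 }],
  // `#[cfg(unix)] const SEP` plus `#[cfg(windows)] const SEP`: assumed two nodes, so kept.
  ['SEP', { declarators: 2, fileScopeNodes: 2 }],
  // A plain module const with a single declarator is kept.
  ['MAX_RETRIES', { declarators: 1, fileScopeNodes: 1 }],
]);

const kept = keepTargets(counts);
```

Comparing against the file-scope node count, rather than a flat "more than one declarator", is what lets the conditional `SEP` definitions survive while `TIMEOUT` is dropped.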

+ 1 - 1
CHANGELOG.md

@@ -11,7 +11,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### New Features
 
-- Impact and blast-radius analysis for TypeScript, JavaScript, Go, and Python now understands the readers of a constant. When you change a file-scope, package-level, or module-level constant — a config object, a lookup table, a shared constant — the other symbols in that file that read it now show up as affected, where before they were invisible (impact only followed calls, imports, and inheritance, so a constant's consumers looked like "nothing depends on this"). This makes `codegraph impact`, and the impact trail in `codegraph_explore`/`codegraph_node`, catch the "change this table, break its readers" class of change. It's on by default and adds no nodes to your graph; bundled/minified files and ambiguously-shadowed names are skipped to keep results precise. Set `CODEGRAPH_VALUE_REFS=0` to turn it off.
+- Impact and blast-radius analysis for TypeScript, JavaScript, Go, Python, and Rust now understands the readers of a constant. When you change a file-scope, package-level, or module-level constant — a config object, a lookup table, a shared constant — the other symbols in that file that read it now show up as affected, where before they were invisible (impact only followed calls, imports, and inheritance, so a constant's consumers looked like "nothing depends on this"). This makes `codegraph impact`, and the impact trail in `codegraph_explore`/`codegraph_node`, catch the "change this table, break its readers" class of change. It's on by default and adds no nodes to your graph; bundled/minified files and ambiguously-shadowed names are skipped to keep results precise. Set `CODEGRAPH_VALUE_REFS=0` to turn it off.
 
 ### Fixes
 

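The changelog entry describes impact as following dependency edges back from the changed symbol. A rough TypeScript sketch of that idea, treating impact as reverse reachability over "depends on" edges. The `Edge` type and `impactOf` are hypothetical and are not the project's `getImpactRadius`; the symbol names are borrowed from the Rust test case, with `main` invented as a caller:

```typescript
// Hypothetical sketch: impact as reverse reachability; not the real getImpactRadius.
type Edge = { from: string; to: string }; // `from` depends on (calls or reads) `to`

function impactOf(changed: string, edges: Edge[]): Set<string> {
  const affected = new Set<string>();
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const e of edges) {
      // Walk edges backwards: anything that depends on `current` is affected.
      if (e.to === current && e.from !== changed && !affected.has(e.from)) {
        affected.add(e.from);
        queue.push(e.from);
      }
    }
  }
  return affected;
}

// Call edge only: `main` calls `retry`, but nothing links `retry` to the const.
const callEdges: Edge[] = [{ from: 'main', to: 'retry' }];
// Value-reference edge: `retry` reads `MAX_RETRIES`.
const valueRefEdges: Edge[] = [{ from: 'retry', to: 'MAX_RETRIES' }];

const withoutRefs = impactOf('MAX_RETRIES', callEdges);
const withRefs = impactOf('MAX_RETRIES', callEdges.concat(valueRefEdges));
```

Without the value-reference edge the constant has no incoming edges, so its readers never enter the traversal; with it, `retry` and its caller both appear.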
+ 37 - 0
__tests__/value-reference-edges.test.ts

@@ -124,6 +124,43 @@ describe('value-reference edges', () => {
     expect(valueRefReaders(cg, 'THEME_TOKENS')).toEqual(expect.arrayContaining(['Label', 'Box']));
   });
 
+  it('edges same-file readers to a module-level const/static (Rust)', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'lib.rs'),
+      [
+        'const MAX_RETRIES: u32 = 3;',
+        'static DEFAULT_LABEL: &str = "prod";',
+        '',
+        'fn retry() -> u32 { MAX_RETRIES }',
+        "fn label() -> &'static str { DEFAULT_LABEL }",
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'MAX_RETRIES')).toEqual(expect.arrayContaining(['retry']));
+    expect(valueRefReaders(cg, 'DEFAULT_LABEL')).toEqual(expect.arrayContaining(['label']));
+  });
+
+  it('does NOT edge a Rust const shadowed by a local let of the same name', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'shadow.rs'),
+      [
+        'const TIMEOUT: u32 = 30;',
+        '',
+        'fn uses_const() -> u32 { TIMEOUT }',
+        'fn shadows() -> u32 {',
+        '    let TIMEOUT = 5;',
+        '    TIMEOUT',
+        '}',
+      ].join('\n'),
+    );
+    cg = index();
+    await cg.indexAll();
+
+    expect(valueRefReaders(cg, 'TIMEOUT')).toEqual([]);
+  });
+
   it('edges same-file readers to a package-level const/var (Go)', async () => {
     fs.writeFileSync(
       path.join(dir, 'main.go'),

+ 4 - 4
docs/design/value-reference-edges-playbook.md

@@ -45,7 +45,7 @@ agent read-reduction (see §4.3).
 
 | Symbol | Role |
 |---|---|
-| `VALUE_REF_LANGS` (static Set) | languages the feature runs for. Currently `typescript`, `javascript`, `tsx`, `go`, `python`. **Add the new language here.** |
+| `VALUE_REF_LANGS` (static Set) | languages the feature runs for. Currently `typescript`, `javascript`, `tsx`, `go`, `python`, `rust`. **Add the new language here.** |
 | `valueRefsEnabled` | `process.env.CODEGRAPH_VALUE_REFS !== '0'` — default ON, env opts out. |
 | `MAX_VALUE_REF_NODES` (20_000) | per-scope traversal cap (and the shadow-scan cap). |
 | `captureValueRefScope(kind, name, id, node)` | called from `createNode` on every node. Records **targets** (file-scope `const`/`var`) and **reader scopes** (`function`/`method`/`const`/`var`). |
@@ -66,10 +66,10 @@ targets** (see §3).
 
 ## 2. Current state (what's shipped + validated)
 
-- **Default ON** for TS/JS/tsx + Go + Python (`CODEGRAPH_VALUE_REFS=0` disables). Shipped in **PR #895**
+- **Default ON** for TS/JS/tsx + Go + Python + Rust (`CODEGRAPH_VALUE_REFS=0` disables). Shipped in **PR #895**
   (flip-on + the shadow prune); Go added in a later PR (the shadow-prune declarator switch +
   `VALUE_REF_LANGS`).
-- **Validated S/M/L** in **TypeScript, JavaScript, tsx, Go, and Python** — see the matrix in the
+- **Validated S/M/L** in **TypeScript, JavaScript, tsx, Go, Python, and Rust** — see the matrix in the
   design doc. All clean: node count identical on/off, precision guards held, impact win
   reproduced. Go required extending the shadow prune (per-grammar declarators) — the worked
   example of "step B is load-bearing."
@@ -222,7 +222,7 @@ silently does nothing for the new language and intra-file shadowing produces fal
 | TS/JS/tsx | `variable_declarator` | `namedChild(0)` | done |
 | Go | `const_spec`, `var_spec`, `short_var_declaration` | spec → `namedChild(0)`; short-var → identifiers in the `left` field | **done** |
 | Python | `assignment` | `left` field: identifier, or iterate a `pattern_list`/`tuple_pattern` | **done** |
-| Rust | `const_item` / `static_item` (`let_declaration` = locals) | `name` field | to verify |
+| Rust | `const_item`, `static_item`, `let_declaration` | const/static → `name` field; let → `pattern` field | **done** |
 | Ruby | `assignment` with constant LHS (`CONST`) | LHS | to verify |
 | C/C++ | `init_declarator` in a file-scope `declaration` | declarator id | to verify |
 

+ 23 - 5
docs/design/value-reference-edges.md

@@ -1,6 +1,6 @@
 # Design + status: same-file value-reference edges
 
-**Status:** SHIPPED (default-on for TS/JS/tsx + Go + Python; `CODEGRAPH_VALUE_REFS=0` disables). The
+**Status:** SHIPPED (default-on for TS/JS/tsx + Go + Python + Rust; `CODEGRAPH_VALUE_REFS=0` disables). The
 emitter lives in `TreeSitterExtractor.flushValueRefs` (`src/extraction/tree-sitter.ts`).
 **Motivation:** close the impact-analysis hole for *value consumers*. Static
 extraction edges calls, imports, and inheritance, but never edges a constant to the
@@ -13,7 +13,7 @@ readers" class of change (the ReScript-PR false positive that motivated the work
 ## TL;DR for a new session
 
 We emit a `references` edge (`metadata: { valueRef: true }`) from a reader symbol to
-the **file/package-scope `const`/`var` it reads**, same-file only, for TS/JS/tsx + Go + Python. Those edges
+the **file/package-scope `const`/`var` it reads**, same-file only, for TS/JS/tsx + Go + Python + Rust. Those edges
 flow straight into `getImpactRadius` / `codegraph impact` and the impact trail in
 `codegraph_explore` / `codegraph_node` — no agent-behaviour change required.
 
@@ -39,13 +39,14 @@ The win is **impact-radius correctness**, not agent read-reduction (see "Agent A
    or a Python module const shadowed by a local `=` all resolve to the inner binding for nested
    readers — a file-scope edge would be a false positive. Inner re-bindings aren't graph nodes,
    so declarators are counted at the syntax level (per-grammar node types: `variable_declarator`
-   for TS/JS, `const_spec`/`var_spec`/`short_var_declaration` for Go, `assignment` for Python).
+   for TS/JS, `const_spec`/`var_spec`/`short_var_declaration` for Go, `assignment` for Python,
+   `const_item`/`static_item`/`let_declaration` for Rust).
    Comparing against file-scope node count (not a flat ">1") keeps **conditional module defs**
    (`try: X=…; except: X=…`), which legitimately bind a name twice at file scope. This catches
    the content-minified bundles guard #1 misses.
 3. **Distinctive-name + same-file** as above.
 
-## Validation matrix — TypeScript / JavaScript / Go / Python
+## Validation matrix — TypeScript / JavaScript / Go / Python / Rust
 
 Method per repo: index the same tree twice (value-refs on vs `CODEGRAPH_VALUE_REFS=0`),
 diff node/edge counts, spot-check precision, and measure `codegraph impact` on a few
@@ -84,7 +85,15 @@ file-scope consts. Node count must be **identical** on/off (edges-only feature).
 | sqlalchemy/sqlalchemy | medium | 679 | 59,963 (stable) | +1,929 (0.8%) | all sampled TP; guard holds | `COMPARE_FAILED` 1→**26**, `DB_LINK_PLACEHOLDER` 1→19 |
 | django/django | large | 3,005 | 61,748 (stable) | +1,328 (0.7%) | all sampled TP; guard holds | `_trans` 1→**138**, `SEARCH_VAR` 4→8 |
 
-Across S/M/L in all four languages: node count never moved, the precision guards held, and
+**Rust** (module-level `const`/`static`; declarators added, no rule change needed)
+
+| Repo | size | files | nodes (on=off) | +value-ref edges | precision | `impact` off→on example |
+|---|---|---|---|---|---|---|
+| BurntSushi/ripgrep | small | 107 | 3,731 (stable) | +144 (0.9%) | all sampled TP; guard holds | `SHERLOCK` 7→**113** |
+| tokio-rs/tokio | medium | 795 | 13,281 (stable) | +476 (1.1%) | all sampled TP; `#[cfg]`-conditional consts kept | `PERMIT_SHIFT` 1→**97**, `LOCAL_QUEUE_CAPACITY` 2→46 |
+| rust-lang/rust-analyzer | large | 1,530 | 38,780 (stable) | +475 (0.25%) | all sampled TP; 0 real shadow leaks | `INLINE_CAP` 2→**183**, `SPAN_PARTS_BIT` 2→18 |
+
+Across S/M/L in all five languages: node count never moved, the precision guards held, and
 the `impact` OFF column is the bug — a const that 80–140 symbols read reports "1 affected"
 without value-refs.
 
@@ -108,6 +117,15 @@ makes declarators exceed file-scope nodes (the excess is the local). This is str
 correct for *all* languages. (It also made the two halves of a conditional def cross-reference
 via their own names, so same-name value-ref edges are now suppressed.)
 
+**Rust needed only declarators — the rule was already right.** Rust's are `const_item` /
+`static_item` (module consts) and `let_declaration` (the local that shadows). Adding them to
+the switch fixed the expected shadow FP (a `const TIMEOUT` shadowed by a local `let TIMEOUT`).
+Rust also has the conditional-def pattern — `#[cfg(unix)] const SEP = …; #[cfg(windows)] const
+SEP = …` — and the Python-era file-scope-count rule already keeps those correctly (validated on
+tokio's `io/interest.rs` cfg-gated flags). One nice property fell out: consts written inside a
+config macro (`cfg_aio! { … }`) live in an unparsed token tree, so the prune's syntax walk
+doesn't even see them.
+
 **`tsx` is covered by the TS rows** — excalidraw is a React/.tsx codebase, so the headline
 `tablerIconProps` (1→170) and most of its targets live in `.tsx` files. The one
 tsx-specific path — a const read *only* inside JSX (`<Foo x={CONST}/>`) — relies on the

+ 9 - 4
src/extraction/tree-sitter.ts

@@ -224,7 +224,7 @@ export class TreeSitterExtractor {
   // Value-reference edges (default ON; set CODEGRAPH_VALUE_REFS=0 to disable; see flushValueRefs).
   // Same-file reads of file-scope const/var symbols → `references` edges so impact analysis catches
   // value consumers ("change this constant/table, affect its readers").
-  private static readonly VALUE_REF_LANGS = new Set<string>(['typescript', 'javascript', 'tsx', 'go', 'python']);
+  private static readonly VALUE_REF_LANGS = new Set<string>(['typescript', 'javascript', 'tsx', 'go', 'python', 'rust']);
   private static readonly MAX_VALUE_REF_NODES = 20_000;
   private readonly valueRefsEnabled = process.env.CODEGRAPH_VALUE_REFS !== '0';
   private fileScopeValues = new Map<string, string>();
@@ -600,9 +600,14 @@ export class TreeSitterExtractor {
           case 'var_spec':            // Go  `var X = …`
             bump(n.namedChild(0));
             break;
-          case 'short_var_declaration': // Go  `x, Y := …`
-          case 'assignment': {          // Python  `X = …` / `X: T = …` / `A, B = …`
-            const left = getChildByField(n, 'left') ?? n.namedChild(0);
+          case 'const_item':          // Rust  `const X: T = …`
+          case 'static_item':         // Rust  `static X: T = …`
+            bump(getChildByField(n, 'name'));
+            break;
+          case 'let_declaration':       // Rust  `let x = …` (locals — the shadow source)
+          case 'short_var_declaration': // Go    `x, Y := …`
+          case 'assignment': {          // Python `X = …` / `X: T = …` / `A, B = …`
+            const left = getChildByField(n, 'left') ?? getChildByField(n, 'pattern') ?? n.namedChild(0);
             if (left?.type === 'identifier') bump(left);
             else if (left) for (const c of left.namedChildren) bump(c);
             break;
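
The Rust cases in the hunk above read the bound name from a different field per declarator. A simplified TypeScript sketch of that lookup over mock nodes; `MiniNode` and `boundNames` are hypothetical stand-ins, not the tree-sitter API or the extractor's `bump` helper, and unlike the hunk this sketch keeps only direct `identifier` children of a non-identifier pattern:

```typescript
// Hypothetical, simplified stand-in for a syntax node; not the real tree-sitter type.
interface MiniNode {
  type: string;
  text: string;
  fields?: Record<string, MiniNode>;
  namedChildren?: MiniNode[];
}

// Names bound by a Rust declarator: const/static use the `name` field,
// `let` uses the `pattern` field (an identifier, or a pattern containing identifiers).
function boundNames(n: MiniNode): string[] {
  switch (n.type) {
    case 'const_item':
    case 'static_item': {
      const name = n.fields?.name;
      return name ? [name.text] : [];
    }
    case 'let_declaration': {
      const pattern = n.fields?.pattern;
      if (!pattern) return [];
      if (pattern.type === 'identifier') return [pattern.text];
      // Simplification: collect only direct identifier children of the pattern.
      return (pattern.namedChildren ?? [])
        .filter((c) => c.type === 'identifier')
        .map((c) => c.text);
    }
    default:
      return [];
  }
}

const constItem: MiniNode = {
  type: 'const_item',
  text: 'const TIMEOUT: u32 = 30;',
  fields: { name: { type: 'identifier', text: 'TIMEOUT' } },
};
const letItem: MiniNode = {
  type: 'let_declaration',
  text: 'let TIMEOUT = 5;',
  fields: { pattern: { type: 'identifier', text: 'TIMEOUT' } },
};
const tupleLet: MiniNode = {
  type: 'let_declaration',
  text: 'let (a, b) = pair;',
  fields: {
    pattern: {
      type: 'tuple_pattern',
      text: '(a, b)',
      namedChildren: [
        { type: 'identifier', text: 'a' },
        { type: 'identifier', text: 'b' },
      ],
    },
  },
};

const fromConst = boundNames(constItem);
const fromLet = boundNames(letItem);
const fromTuple = boundNames(tupleLet);
```

Both `constItem` and `letItem` yield `TIMEOUT`, which is how the shadow in the test case ends up counted as two declarators for a single file-scope node.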