1 month ago · 9f1a951642
--- a/.claude/skills/add-lang/SKILL.md
+++ b/.claude/skills/add-lang/SKILL.md
@@ -0,0 +1,193 @@
 
				+---
			
 
				+name: add-lang
			
 
				+description: Add tree-sitter language support to codegraph end-to-end — wire the grammar + extractor, write tests, then benchmark extraction quality and retrieval value on 3 popular real-world repos. Use when the user runs /add-lang <language> or asks to add/support a new language (e.g. Lua, Elixir, Zig, OCaml) in codegraph.
			
 
				+---
			
 
				+
			
 
				+# Add a language to CodeGraph
			
 
				+
			
 
				+Wire a new tree-sitter language into codegraph's extraction pipeline, prove it
			
 
				+extracts real symbols on popular repos, and prove it beats no-codegraph for an
			
 
				+agent. Runs **fully autonomously** — pick repos, benchmark, update docs, then
			
 
				+report. **Never commit, push, publish, or tag** (house rule); leave all changes
			
 
				+for the user to review.
			
 
				+
			
 
				+The argument is the language token used throughout the `Language` union, e.g.
			
 
				+`lua`, `elixir`, `zig`. If none was given, ask which language. Use the lowercase
			
 
				+single-token form everywhere (`csharp`, not `c#`).
			
 
				+
			
 
				+## Prerequisites
			
 
				+- Run from the codegraph repo root. `node`, `git`, `gh`, and a logged-in
			
 
				+  `claude` CLI (the benchmark spawns real `claude -p` runs).
			
 
				+- The benchmark uses the local dev build — Step 8 builds + links it on PATH.
			
 
				+
			
 
				+## Workflow
			
 
				+
			
 
				+Copy this checklist and work through it in order:
			
 
				+```
			
 
				+- [ ] 1. Resolve language; bail early if already supported (just benchmark)
			
 
				+- [ ] 2. Find a grammar (tree-sitter-wasms vs vendor a .wasm)
			
 
				+- [ ] 3. Discover the grammar's AST node types (dump-ast.mjs)
			
 
				+- [ ] 4. Wire the language (4 source edits)
			
 
				+- [ ] 5. Build + verify-extraction loop until PASS
			
 
				+- [ ] 6. Add extraction tests; make them green
			
 
				+- [ ] 7. Auto-pick 3 popular repos by size tier; add to corpus.json
			
 
				+- [ ] 8. Benchmark all 3: extraction + with/without A/B
			
 
				+- [ ] 9. Update README + CHANGELOG
			
 
				+- [ ] 10. Report; do NOT commit
			
 
				+```
			
 
				+
			
 
				+### Step 1 — Resolve + short-circuit
			
 
				+
			
 
				+Check whether the language is already wired: look for the token in the
			
 
				+`LANGUAGES` const (`src/types.ts`) and the `EXTRACTORS` map
			
 
				+(`src/extraction/languages/index.ts`). If it is already supported (e.g.
			
 
				+`typescript`, `rust`), **skip Steps 2–6** and go straight to benchmarking
			
 
				+(Steps 7–8) to validate/measure it — note in the report that no code changed.
			
 
				+
			
 
				+### Step 2 — Find a grammar
			
 
				+
			
 
				+```bash
			
 
				+ls node_modules/tree-sitter-wasms/out/ | grep -i <lang>   # csharp -> c_sharp
			
 
				+```
			
 
				+- **Present** → off-the-shelf. No vendoring; `grammars.ts` resolves it from
			
 
				+  `tree-sitter-wasms` automatically. (Most popular languages are here: lua,
			
 
				+  elixir, zig, ocaml, solidity, toml, yaml, …)
			
 
				+- **Absent** → you must vendor a `.wasm` into `src/extraction/wasm/` (like
			
 
				+  `pascal`/`scala`) and add the token to the vendored branch in Step 4. Get a
			
 
				+  wasm from the grammar's npm package (a prebuilt `*.wasm`) or by building one
			
 
				+  (`npx tree-sitter-cli build --wasm`, which needs emscripten/Docker — the
			
 
				+  `tree-sitter` CLI is usually not on PATH here). **If you cannot obtain a
			
 
				+  wasm, STOP and tell the user** — the language can't be added without it.
			
 
				+
			
 
				+### Step 3 — Discover AST node types
			
 
				+
			
 
				+Get a representative source file (write a small sample covering functions,
			
 
				+classes/structs, imports, enums; or `curl` a raw file from a known repo), then:
			
 
				+```bash
			
 
				+node scripts/add-lang/dump-ast.mjs <lang> path/to/sample.<ext>
			
 
				+# vendored grammar: pass the wasm path instead of the token
			
 
				+node scripts/add-lang/dump-ast.mjs src/extraction/wasm/tree-sitter-<lang>.wasm sample.<ext>
			
 
				+```
			
 
				+The frequency table + field names (`name:`, `parameters:`, `body:`,
			
 
				+`return_type:`) tell you what to map. Open the existing extractor closest to the
			
 
				+language's paradigm as a model: `rust.ts`/`scala.ts` (functional, traits),
			
 
				+`java.ts`/`csharp.ts` (OO), `python.ts`/`ruby.ts` (scripting), `go.ts`
			
 
				+(top-level methods + receivers).
			
 
				+
			
 
				+### Step 4 — Wire the language (4 edits)
			
 
				+
			
 
				+These are exact, fragile wiring — match the existing style precisely:
			
 
				+
			
 
				+1. **`src/types.ts`** — add `'<lang>',` to the `LANGUAGES` const (before
			
 
				+   `'unknown'`).
			
 
				+2. **`src/extraction/grammars.ts`** — three maps:
			
 
				+   - `WASM_GRAMMAR_FILES`: `<lang>: 'tree-sitter-<lang>.wasm',`
			
 
				+   - `EXTENSION_MAP`: each file extension → `'<lang>'` (e.g. `'.lua': 'lua',`)
			
 
				+   - `getLanguageDisplayName`: `<lang>: '<Display Name>',`
			
 
				+   - **vendored only**: add `<lang>` to the
			
 
				+     `(lang === 'pascal' || lang === 'scala')` wasm-path branch.
			
 
				+3. **`src/extraction/languages/<lang>.ts`** — new file exporting
			
 
				+   `export const <lang>Extractor: LanguageExtractor = { … }`. Map the node types
			
 
				+   from Step 3. Required fields: `functionTypes`, `classTypes`, `methodTypes`,
			
 
				+   `interfaceTypes`, `structTypes`, `enumTypes`, `typeAliasTypes`,
			
 
				+   `importTypes`, `callTypes`, `variableTypes`, `nameField`, `bodyField`,
			
 
				+   `paramsField`. Add hooks as the grammar needs them (`getSignature`,
			
 
				+   `getVisibility`, `isExported`, `extractImport`, `getReceiverType`,
			
 
				+   `interfaceKind`, `enumMemberTypes`, etc. — see
			
 
				+   `src/extraction/tree-sitter-types.ts`).
			
 
				+4. **`src/extraction/languages/index.ts`** — `import { <lang>Extractor } from
			
 
				+   './<lang>';` and add `<lang>: <lang>Extractor,` to `EXTRACTORS`.
			
 
				+
			
 
				+### Step 5 — Build + verify loop
			
 
				+
			
 
				+```bash
			
 
				+npm run build            # tsc + copy-assets (copies any vendored *.wasm into dist/)
			
 
				+```
			
 
				+Index a small sample repo and check extraction:
			
 
				+```bash
			
 
				+( cd <sample-repo> && codegraph init -i )
			
 
				+node scripts/add-lang/verify-extraction.mjs <sample-repo> <lang>
			
 
				+```
			
 
				+`verify-extraction.mjs` fails (exit 1) if the language isn't detected or only
			
 
				+`file`/`import` nodes were produced — the classic symptom of wrong node-type
			
 
				+names. On FAIL or a thin WARN: re-run `dump-ast.mjs` on a richer file, fix the
			
 
				+mappings in `<lang>.ts`, `npm run build`, re-index, re-verify. **Repeat until
			
 
				+PASS.**
			
 
				+
			
 
				+### Step 6 — Tests
			
 
				+
			
 
				+Add to `__tests__/extraction.test.ts`, modeled on the `Rust Extraction` block:
			
 
				+- a `detectLanguage` assertion in `describe('Language Detection')`
			
 
				+- a `describe('<Lang> Extraction')` block asserting functions/classes/imports
			
 
				+  are extracted from an inline source string.
			
 
				+```bash
			
 
				+npx vitest run __tests__/extraction.test.ts
			
 
				+```
			
 
				+Green before continuing.
			
 
				+
			
 
				+### Step 7 — Auto-pick 3 repos + corpus
			
 
				+
			
 
				+Pick **without asking**. Find candidates, then curate 3 that are genuinely
			
 
				+`<lang>`-dominant, one per size tier:
			
 
				+```bash
			
 
				+gh search repos --language=<lang> --sort=stars --limit 40 \
			
 
				+  --json fullName,stargazerCount,description
			
 
				+```
			
 
				+Tiers (match `corpus.json`): **Small** <~150 files · **Medium** ~150–1500 ·
			
 
				+**Large** >~1500. Skip repos that are tagged `<lang>` but mostly another
			
 
				+language. Write one cross-file architecture **question** per repo (the kind that
			
 
				+needs tracing across files). Add a `"<Language>"` block to
			
 
				+`.claude/skills/agent-eval/corpus.json` (fields: `name`, `repo`, `size`,
			
 
				+`files`, `question`) so `/agent-eval` can reuse them.
			
 
				+
			
 
				+### Step 8 — Benchmark all 3 (extraction + A/B)
			
 
				+
			
 
				+Make the dev build the codegraph on PATH **once**, then loop:
			
 
				+```bash
			
 
				+npm run build && ./scripts/local-install.sh
			
 
				+scripts/add-lang/bench.sh <lang> <name> <url> "<question>" headless   # ×3
			
 
				+```
			
 
				+`bench.sh` clones (shared `/tmp/codegraph-corpus`), wipes + indexes, runs
			
 
				+`verify-extraction.mjs`, then the with/without retrieval A/B via
			
 
				+`scripts/agent-eval/run-all.sh` (skips the paid A/B if extraction is broken).
			
 
				+Read each `parse-run.mjs` summary printed by `run-all.sh`: tool calls, file
			
 
				+`Read`s, Grep/Bash, codegraph-tool calls, duration, and **cost** — for both the
			
 
				+`with` and `without` arms. After the loop, restore the dev link if needed:
			
 
				+`./scripts/local-install.sh`.
			
 
				+
			
 
				+### Step 9 — Docs + CHANGELOG
			
 
				+
			
 
				+- **README.md**: add `<Lang>` to the "19+ Languages" feature bullet, and add a
			
 
				+  row to the **Supported Languages** table:
			
 
				+  `| <Lang> | \`.ext\` | Full support (classes, methods, …) |`.
			
 
				+- **CHANGELOG.md**: add an `## [Unreleased]` section at the top (above the
			
 
				+  latest version) with `### Added` → a user-perspective bullet, e.g.
			
 
				+  *"CodeGraph now indexes **<Lang>** (`.ext`) — functions, classes, imports, and
			
 
				+  call edges."* If `## [Unreleased]` already exists, append under it. (`/publish`
			
 
				+  folds this into the next versioned block at release time.)
			
 
				+
			
 
				+### Step 10 — Report (do NOT commit)
			
 
				+
			
 
				+Summarize for review:
			
 
				+- **Files changed**: the 4 wiring edits + new extractor + tests + README +
			
 
				+  CHANGELOG + corpus.json (+ any vendored `.wasm`).
			
 
				+- **Extraction** per repo: files / nodes / edges / `verify-extraction` result.
			
 
				+- **A/B** per repo: `with` vs `without` (tool calls, file Reads, cost) and a
			
 
				+  one-line verdict — did codegraph reduce effort, and did both arms reach a
			
 
				+  correct answer?
			
 
				+- **Gaps / follow-ups** (node types not yet mapped, resolution edges missing,
			
 
				+  framework routes, etc.).
			
 
				+
			
 
				+Hand the changes to the user. **Do not** run `git commit`/`push`,
			
 
				+`npm publish`, or `scripts/release.sh`.
			
 
				+
			
 
				+## Notes
			
 
				+- The A/B spawns real **paid** `claude -p` runs (opus, `--max-budget-usd`),
			
 
				+  2 arms × 3 repos. The corpus dir `/tmp/codegraph-corpus` is shared with
			
 
				+  `/agent-eval`, so clones are reused across runs.
			
 
				+- Any new `*.wasm` must live in `src/extraction/wasm/` — `copy-assets` (run by
			
 
				+  `npm run build`) ships it; otherwise it won't be in `dist/`.
			
 
				+- An index must be served by the **same** binary that built it. Step 8 builds +
			
 
				+  links the dev build first, so this holds.
			
 
				+- If a grammar can't be obtained, or extraction can't reach PASS, **STOP and
			
 
				+  report** — don't ship a half-wired language.
			
--- a/scripts/add-lang/bench.sh
+++ b/scripts/add-lang/bench.sh
@@ -0,0 +1,60 @@
 
				+#!/usr/bin/env bash
			
 
				+# Add-lang benchmark for ONE repo:
			
 
				+#   clone -> wipe+index (with the codegraph on PATH) -> verify extraction ->
			
 
				+#   with/without retrieval A/B (reuses scripts/agent-eval/run-all.sh).
			
 
				+#
			
 
				+# Assumes the codegraph dev build is already built + linked on PATH — the skill
			
 
				+# runs `npm run build && ./scripts/local-install.sh` ONCE before looping repos.
			
 
				+# The A/B is skipped if extraction fails its critical checks (don't burn $ on a
			
 
				+# broken extractor); set FORCE_AB=1 to run it anyway.
			
 
				+#
			
 
				+# Usage: bench.sh <lang> <repo-name> <repo-url> "<question>" [headless|tmux|all]
			
 
				+# Env:   CORPUS   corpus dir (default /tmp/codegraph-corpus, shared with agent-eval)
			
 
				+set -uo pipefail
			
 
				+
			
 
				+LANG_TOKEN="${1:?usage: bench.sh <lang> <repo-name> <repo-url> \"<question>\" [mode]}"
			
 
				+NAME="${2:?repo-name required}"
			
 
				+URL="${3:?repo-url required}"
			
 
				+Q="${4:?question required}"
			
 
				+MODE="${5:-headless}"
			
 
				+
			
 
				+HARNESS="$(cd "$(dirname "$0")" && pwd)"
			
 
				+AGENT_EVAL="$(cd "$HARNESS/../agent-eval" && pwd)"
			
 
				+CORPUS="${CORPUS:-/tmp/codegraph-corpus}"
			
 
				+REPO="$CORPUS/$NAME"
			
 
				+
			
 
				+command -v codegraph >/dev/null || { echo "no codegraph on PATH (build + ./scripts/local-install.sh first)"; exit 1; }
			
 
				+
			
 
				+echo "==================== add-lang bench: $NAME ($LANG_TOKEN) ===================="
			
 
				+echo "codegraph: $(command -v codegraph) -> $(codegraph --version 2>/dev/null || echo '?')"
			
 
				+
			
 
				+# 1. Ensure the repo (shallow clone, reuse if present).
			
 
				+mkdir -p "$CORPUS"
			
 
				+if [ -d "$REPO/.git" ]; then
			
 
				+  echo "→ reusing checkout: $REPO"
			
 
				+else
			
 
				+  echo "→ cloning $URL"
			
 
				+  git clone --depth 1 "$URL" "$REPO" || { echo "git clone failed"; exit 1; }
			
 
				+fi
			
 
				+
			
 
				+# 2. Wipe + index with the binary under test.
			
 
				+echo "→ wiping .codegraph and indexing"
			
 
				+rm -rf "$REPO/.codegraph"
			
 
				+( cd "$REPO" && codegraph init -i ) || { echo "indexing failed"; exit 1; }
			
 
				+
			
 
				+# 3. Verify extraction (cheap guard before the paid A/B).
			
 
				+echo "→ verifying extraction"
			
 
				+node "$HARNESS/verify-extraction.mjs" "$REPO" "$LANG_TOKEN"
			
 
				+VERIFY=$?
			
 
				+
			
 
				+# 4. Retrieval A/B (skipped if extraction is broken, unless FORCE_AB=1).
			
 
				+if [ "$VERIFY" -ne 0 ] && [ "${FORCE_AB:-0}" != "1" ]; then
			
 
				+  echo "→ SKIPPING A/B — extraction failed critical checks (set FORCE_AB=1 to override)"
			
 
				+else
			
 
				+  echo "→ retrieval A/B (mode=$MODE)"
			
 
				+  bash "$AGENT_EVAL/run-all.sh" "$REPO" "$Q" "$MODE"
			
 
				+fi
			
 
				+
			
 
				+echo "==================== bench complete: $NAME (verify exit=$VERIFY) ===================="
			
 
				+# Exit reflects extraction: 0 = pass/warn, 1 = critical fail, 2 = couldn't read status.
			
 
				+exit "$VERIFY"
			
--- a/scripts/add-lang/dump-ast.mjs
+++ b/scripts/add-lang/dump-ast.mjs
@@ -0,0 +1,103 @@
 
				+#!/usr/bin/env node
			
 
				+// Dump the tree-sitter AST for a sample file so you can write a LanguageExtractor
			
 
				+// mapping. Loads a grammar .wasm directly via web-tree-sitter (the same runtime
			
 
				+// codegraph uses) — you do NOT need to register the language first.
			
 
				+//
			
 
				+// Usage:
			
 
				+//   node scripts/add-lang/dump-ast.mjs <lang|wasm-path> <sample-file> [--depth=N] [--full]
			
 
				+// Examples:
			
 
				+//   node scripts/add-lang/dump-ast.mjs lua sample.lua
			
 
				+//   node scripts/add-lang/dump-ast.mjs src/extraction/wasm/tree-sitter-zig.wasm a.zig --depth=4
			
 
				+//
			
 
				+// Output: an indented AST (named nodes, with field names) followed by a
			
 
				+// node-type FREQUENCY table. The frequency table is the payoff — it tells you
			
 
				+// which node types to map to functionTypes / classTypes / importTypes / etc.
			
 
				+
			
 
				+import { readFileSync, existsSync } from 'node:fs';
			
 
				+import { createRequire } from 'node:module';
			
 
				+import { Parser, Language } from 'web-tree-sitter';
			
 
				+
			
 
				+const require = createRequire(import.meta.url);
			
 
				+const fail = (msg) => { console.error(`[dump-ast] ${msg}`); process.exit(1); };
			
 
				+
			
 
				+const argv = process.argv.slice(2);
			
 
				+const positional = argv.filter((a) => !a.startsWith('--'));
			
 
				+const [langOrWasm, sampleFile] = positional;
			
 
				+const depthFlag = argv.find((a) => a.startsWith('--depth='));
			
 
				+const showAll = argv.includes('--full'); // also print anonymous (token) nodes
			
 
				+const maxDepth = depthFlag ? parseInt(depthFlag.split('=')[1], 10) : (showAll ? Infinity : 8);
			
 
				+
			
 
				+if (!langOrWasm || !sampleFile) {
			
 
				+  fail('usage: dump-ast.mjs <lang|wasm-path> <sample-file> [--depth=N] [--full]');
			
 
				+}
			
 
				+if (!existsSync(sampleFile)) fail(`sample file not found: ${sampleFile}`);
			
 
				+
			
 
				+// Language tokens whose tree-sitter-wasms filename differs from the token.
			
 
				+const WASM_SPECIAL = { csharp: 'c_sharp', 'c#': 'c_sharp' };
			
 
				+
			
 
				+function resolveWasm(token) {
			
 
				+  if (token.endsWith('.wasm')) {
			
 
				+    if (!existsSync(token)) fail(`wasm not found: ${token}`);
			
 
				+    return token;
			
 
				+  }
			
 
				+  const base = WASM_SPECIAL[token.toLowerCase()] ?? token.toLowerCase();
			
 
				+  try {
			
 
				+    return require.resolve(`tree-sitter-wasms/out/tree-sitter-${base}.wasm`);
			
 
				+  } catch {
			
 
				+    /* not in tree-sitter-wasms — try a vendored copy */
			
 
				+  }
			
 
				+  const vendored = `src/extraction/wasm/tree-sitter-${base}.wasm`;
			
 
				+  if (existsSync(vendored)) return vendored;
			
 
				+  fail(
			
 
				+    `no grammar for "${token}" — not in tree-sitter-wasms and not vendored at ` +
			
 
				+      `${vendored}. Pass an explicit .wasm path, or vendor one (see SKILL.md "Find a grammar").`
			
 
				+  );
			
 
				+}
			
 
				+
			
 
				+const wasmPath = resolveWasm(langOrWasm);
			
 
				+const source = readFileSync(sampleFile, 'utf8');
			
 
				+
			
 
				+try {
			
 
				+  await Parser.init();
			
 
				+} catch {
			
 
				+  await Parser.init({ locateFile: () => require.resolve('web-tree-sitter/tree-sitter.wasm') });
			
 
				+}
			
 
				+
			
 
				+let language;
			
 
				+try {
			
 
				+  language = await Language.load(wasmPath);
			
 
				+} catch (e) {
			
 
				+  fail(`failed to load grammar ${wasmPath}: ${e.message}`);
			
 
				+}
			
 
				+
			
 
				+const parser = new Parser();
			
 
				+parser.setLanguage(language);
			
 
				+const tree = parser.parse(source);
			
 
				+
			
 
				+const freq = new Map();
			
 
				+const snippet = (node) => {
			
 
				+  const t = node.text.replace(/\s+/g, ' ').trim();
			
 
				+  return t.length > 48 ? `${t.slice(0, 48)}…` : t;
			
 
				+};
			
 
				+
			
 
				+function walk(node, depth, fieldName) {
			
 
				+  if (node.isNamed) freq.set(node.type, (freq.get(node.type) || 0) + 1);
			
 
				+  if ((node.isNamed || showAll) && depth <= maxDepth) {
			
 
				+    const field = fieldName ? `${fieldName}: ` : '';
			
 
				+    const leaf = node.childCount === 0 ? `  "${snippet(node)}"` : '';
			
 
				+    console.log(`${'  '.repeat(depth)}${field}${node.type}  @${node.startPosition.row + 1}:${node.startPosition.column}${leaf}`);
			
 
				+  }
			
 
				+  for (let i = 0; i < node.childCount; i++) {
			
 
				+    const child = node.child(i);
			
 
				+    if (child) walk(child, depth + 1, node.fieldNameForChild(i));
			
 
				+  }
			
 
				+}
			
 
				+
			
 
				+console.log(`\n# AST for ${sampleFile}  (grammar: ${wasmPath.split('/').pop()})\n`);
			
 
				+walk(tree.rootNode, 0, null);
			
 
				+
			
 
				+console.log('\n# Node-type frequency (named nodes) — map the relevant ones in your extractor:\n');
			
 
				+[...freq.entries()]
			
 
				+  .sort((a, b) => b[1] - a[1])
			
 
				+  .forEach(([type, n]) => console.log(`  ${String(n).padStart(5)}  ${type}`));
			
 
				+console.log();
			
--- a/scripts/add-lang/verify-extraction.mjs
+++ b/scripts/add-lang/verify-extraction.mjs
@@ -0,0 +1,70 @@
 
				+#!/usr/bin/env node
			
 
				+// Sanity-check that codegraph extracted REAL symbols (not just file/import nodes)
			
 
				+// from a repo for a given language. Exits non-zero on a critical failure so it
			
 
				+// can drive a write-extractor -> build -> re-check loop.
			
 
				+//
			
 
				+// Usage: node scripts/add-lang/verify-extraction.mjs <repo-path> <lang>
			
 
				+// Reads `codegraph status <repo> --json` using whatever codegraph is on PATH,
			
 
				+// so it reflects the binary that built the index.
			
 
				+//
			
 
				+// Exit codes: 0 = pass or soft-warn, 1 = critical fail, 2 = could not run.
			
 
				+
			
 
				+import { execFileSync } from 'node:child_process';
			
 
				+
			
 
				+const [repo, lang] = process.argv.slice(2);
			
 
				+if (!repo || !lang) {
			
 
				+  console.error('usage: verify-extraction.mjs <repo-path> <lang>');
			
 
				+  process.exit(2);
			
 
				+}
			
 
				+
			
 
				+let status;
			
 
				+try {
			
 
				+  const out = execFileSync('codegraph', ['status', repo, '--json'], { encoding: 'utf8' });
			
 
				+  status = JSON.parse(out);
			
 
				+} catch (e) {
			
 
				+  console.error(`[verify] could not read codegraph status for ${repo}: ${e.message}`);
			
 
				+  process.exit(2);
			
 
				+}
			
 
				+
			
 
				+// Kinds that prove the extractor mapped AST node types (everything except
			
 
				+// 'file' and 'import', which codegraph creates structurally for any language).
			
 
				+const SYMBOL_KINDS = new Set([
			
 
				+  'module', 'class', 'struct', 'interface', 'trait', 'protocol', 'function',
			
 
				+  'method', 'property', 'field', 'variable', 'constant', 'enum', 'enum_member',
			
 
				+  'type_alias', 'namespace', 'route', 'component',
			
 
				+]);
			
 
				+
			
 
				+const byKind = status.nodesByKind || {};
			
 
				+const langs = status.languages || [];
			
 
				+const files = status.fileCount || 0;
			
 
				+const edges = status.edgeCount || 0;
			
 
				+const symbolKinds = Object.keys(byKind).filter((k) => SYMBOL_KINDS.has(k));
			
 
				+const symbolCount = symbolKinds.reduce((s, k) => s + byKind[k], 0);
			
 
				+
			
 
				+const checks = [];
			
 
				+const add = (severity, ok, label, detail) => checks.push({ severity, ok, label, detail });
			
 
				+
			
 
				+add('critical', status.initialized === true, 'index initialized', `initialized=${status.initialized}`);
			
 
				+add('critical', langs.includes(lang), `language "${lang}" detected`, `languages=[${langs.join(', ')}]`);
			
 
				+add('critical', symbolCount > 0, 'structural symbols extracted', `${symbolCount} symbols (${symbolKinds.join(', ') || 'NONE — only file/import nodes!'})`);
			
 
				+add('soft', symbolCount >= files, 'symbol density >= 1/file', `${symbolCount} symbols across ${files} files`);
			
 
				+add('soft', edges > files, 'edges resolved', `${edges} edges across ${files} files`);
			
 
				+
			
 
				+console.log(`\n# Extraction check — ${repo}  (lang=${lang}, backend=${status.backend})`);
			
 
				+console.log(`  files=${files} nodes=${status.nodeCount} edges=${edges}`);
			
 
				+console.log(`  nodesByKind: ${JSON.stringify(byKind)}\n`);
			
 
				+for (const c of checks) console.log(`  ${c.ok ? '✓' : '✗'} ${c.label} — ${c.detail}`);
			
 
				+
			
 
				+const critical = checks.filter((c) => !c.ok && c.severity === 'critical');
			
 
				+const soft = checks.filter((c) => !c.ok && c.severity === 'soft');
			
 
				+console.log();
			
 
				+if (critical.length) {
			
 
				+  console.log(`RESULT: FAIL (${critical.length} critical) — extractor or grammar wiring is broken. Re-run dump-ast.mjs and fix the node-type mappings.`);
			
 
				+  process.exit(1);
			
 
				+}
			
 
				+if (soft.length) {
			
 
				+  console.log(`RESULT: WARN (${soft.length} soft) — extraction works but looks thin; inspect the counts above.`);
			
 
				+  process.exit(0);
			
 
				+}
			
 
				+console.log('RESULT: PASS — extraction looks healthy.');
			
 
				+process.exit(0);