Quellcode durchsuchen

feat: add Lua and Luau language support (#273)

Adds Lua (.lua) and Luau (.luau) extraction — functions, methods with receivers, type aliases (Luau), require imports (incl. Roblox instance-path), and call edges. Vendors the ABI-15 Lua and ABI-14 Luau tree-sitter grammars. Addresses #232.
Colby Mchenry vor 1 Monat
Ursprung
Commit
4329a52bec

+ 219 - 0
.claude/skills/add-lang/SKILL.md

@@ -0,0 +1,219 @@
+---
+name: add-lang
+description: Add tree-sitter language support to codegraph end-to-end — wire the grammar + extractor, write tests, then benchmark extraction quality and retrieval value on 3 popular real-world repos. Use when the user runs /add-lang <language> or asks to add/support a new language (e.g. Lua, Elixir, Zig, OCaml) in codegraph.
+---
+
+# Add a language to CodeGraph
+
+Wire a new tree-sitter language into codegraph's extraction pipeline, prove it
+extracts real symbols on popular repos, and prove it beats no-codegraph for an
+agent. Runs **fully autonomously** — pick repos, benchmark, update docs, then
+report. **Never commit, push, publish, or tag** (house rule); leave all changes
+for the user to review.
+
+The argument is the language token used throughout the `Language` union, e.g.
+`lua`, `elixir`, `zig`. If none was given, ask which language. Use the lowercase
+single-token form everywhere (`csharp`, not `c#`).
+
+## Prerequisites
+- Run from the codegraph repo root. `node`, `git`, `gh`, and a logged-in
+  `claude` CLI (the benchmark spawns real `claude -p` runs).
+- The benchmark uses the local dev build — Step 8 builds + links it on PATH.
+
+## Workflow
+
+Copy this checklist and work through it in order:
+```
+- [ ] 1. Resolve language; bail early if already supported (just benchmark)
+- [ ] 2. Find a grammar + health-check it (ABI / heap corruption)
+- [ ] 3. Discover the grammar's AST node types (dump-ast.mjs)
+- [ ] 4. Wire the language (4 files; sometimes a 5th core touch)
+- [ ] 5. Build + verify-extraction loop until PASS
+- [ ] 6. Add extraction tests; make them green
+- [ ] 7. Auto-pick 3 popular repos by size tier; add to corpus.json
+- [ ] 8. Benchmark all 3: extraction + with/without A/B
+- [ ] 9. Update README + CHANGELOG
+- [ ] 10. Report; do NOT commit
+```
+
+### Step 1 — Resolve + short-circuit
+
+Check whether the language is already wired: look for the token in the
+`LANGUAGES` const (`src/types.ts`) and the `EXTRACTORS` map
+(`src/extraction/languages/index.ts`). If it is already supported (e.g.
+`typescript`, `rust`), **skip Steps 2–6** and go straight to benchmarking
+(Steps 7–8) to validate/measure it — note in the report that no code changed.
+
+### Step 2 — Find a grammar, then health-check it
+
+```bash
+ls node_modules/tree-sitter-wasms/out/ | grep -i <lang>   # csharp -> c_sharp
+```
+- **Present** → likely off-the-shelf; `grammars.ts` resolves it from
+  `tree-sitter-wasms` automatically. (Many languages: elixir, zig, ocaml,
+  solidity, toml, yaml, …)
+- **Absent** → vendor a `.wasm` into `src/extraction/wasm/` (like `pascal` /
+  `scala` / `lua`) and add the token to the vendored branch in Step 4.
+
+**Always health-check before writing an extractor — a *present* grammar can
+still be unusable:**
+```bash
+node scripts/add-lang/check-grammar.mjs <lang> path/to/valid-sample.<ext>
+```
+It prints the grammar's ABI version and parses a valid sample many times in a
+multi-grammar runtime. If it **FAILs** (ERROR trees on valid code — an old ABI
+corrupting the shared WASM heap, which silently drops nested calls/imports on
+every file after the first; e.g. the tree-sitter-wasms **Lua** grammar is ABI 13
+and fails), do NOT use that wasm. **Vendor a newer (ABI 14/15) build instead:**
+```bash
+npm pack @tree-sitter-grammars/tree-sitter-<lang>   # often ships a prebuilt *.wasm
+# or build one: npx tree-sitter build --wasm   (needs Docker/emscripten)
+cp <the>.wasm src/extraction/wasm/tree-sitter-<lang>.wasm
+```
+then add the token to the vendored branch in Step 4 and re-run check-grammar on
+the vendored path until it PASSes. **If you cannot obtain a healthy wasm, STOP
+and tell the user.**
+
+### Step 3 — Discover AST node types
+
+Get a representative source file (write a small sample covering functions,
+classes/structs, imports, enums; or `curl` a raw file from a known repo), then:
+```bash
+node scripts/add-lang/dump-ast.mjs <lang> path/to/sample.<ext>
+# vendored grammar: pass the wasm path instead of the token
+node scripts/add-lang/dump-ast.mjs src/extraction/wasm/tree-sitter-<lang>.wasm sample.<ext>
+```
+The frequency table + field names (`name:`, `parameters:`, `body:`,
+`return_type:`) tell you what to map. Open the existing extractor closest to the
+language's paradigm as a model: `rust.ts`/`scala.ts` (functional, traits),
+`java.ts`/`csharp.ts` (OO), `python.ts`/`ruby.ts` (scripting), `go.ts`
+(top-level methods + receivers).
+
+### Step 4 — Wire the language (4 files)
+
+These are exact, fragile wiring — match the existing style precisely:
+
+1. **`src/types.ts`** — TWO edits:
+   - add `'<lang>',` to the `LANGUAGES` const (before `'unknown'`);
+   - add `'**/*.<ext>',` to `DEFAULT_CONFIG.include`. **Don't skip this** — it's
+     the file-scan allowlist; without the glob, `codegraph init` finds **0
+     files** even though detection/extraction are wired.
+2. **`src/extraction/grammars.ts`** — three maps:
+   - `WASM_GRAMMAR_FILES`: `<lang>: 'tree-sitter-<lang>.wasm',`
+   - `EXTENSION_MAP`: each file extension → `'<lang>'` (e.g. `'.lua': 'lua',`)
+   - `getLanguageDisplayName`: `<lang>: '<Display Name>',`
+   - **vendored only**: add `<lang>` to the
+     `(lang === 'pascal' || lang === 'scala' || …)` wasm-path branch.
+3. **`src/extraction/languages/<lang>.ts`** — new file exporting
+   `export const <lang>Extractor: LanguageExtractor = { … }`. Map the node types
+   from Step 3. Required fields: `functionTypes`, `classTypes`, `methodTypes`,
+   `interfaceTypes`, `structTypes`, `enumTypes`, `typeAliasTypes`,
+   `importTypes`, `callTypes`, `variableTypes`, `nameField`, `bodyField`,
+   `paramsField`. Add hooks as the grammar needs them (`getSignature`,
+   `getVisibility`, `isExported`, `extractImport`, `visitNode`, `getReceiverType`,
+   `interfaceKind`, `enumMemberTypes`, etc. — see
+   `src/extraction/tree-sitter-types.ts`).
+4. **`src/extraction/languages/index.ts`** — `import { <lang>Extractor } from
+   './<lang>';` and add `<lang>: <lang>Extractor,` to `EXTRACTORS`.
+
+**Sometimes a 5th, core touch in `src/extraction/tree-sitter.ts`** — variable
+extraction has per-language branches in `extractVariable` (the generic fallback
+only finds direct `identifier`/`variable_declarator` children). If the grammar
+nests declared names (e.g. Lua's `variable_declaration → variable_list`), add a
+`} else if (this.language === '<lang>')` branch there, mirroring the existing
+ts/python/go ones. Import forms that aren't a distinct node (Lua/Ruby `require`
+is a *call*) are handled in the extractor's `visitNode` hook instead.
+
+### Step 5 — Build + verify loop
+
+```bash
+npm run build            # tsc + copy-assets (copies any vendored *.wasm into dist/)
+```
+Index a small sample repo and check extraction:
+```bash
+( cd <sample-repo> && codegraph init -i )
+node scripts/add-lang/verify-extraction.mjs <sample-repo> <lang>
+```
+`verify-extraction.mjs` fails (exit 1) if the language isn't detected or only
+`file`/`import` nodes were produced — the classic symptom of wrong node-type
+names. On FAIL or a thin WARN: re-run `dump-ast.mjs` on a richer file, fix the
+mappings in `<lang>.ts`, `npm run build`, re-index, re-verify. **Repeat until
+PASS.**
+
+### Step 6 — Tests
+
+Add to `__tests__/extraction.test.ts`, modeled on the `Rust Extraction` block:
+- a `detectLanguage` assertion in `describe('Language Detection')`
+- a `describe('<Lang> Extraction')` block asserting functions/classes/imports
+  are extracted from an inline source string.
+```bash
+npx vitest run __tests__/extraction.test.ts
+```
+Green before continuing.
+
+### Step 7 — Auto-pick 3 repos + corpus
+
+Pick **without asking**. Find candidates, then curate 3 that are genuinely
+`<lang>`-dominant, one per size tier:
+```bash
+gh search repos --language=<lang> --sort=stars --limit 40 \
+  --json fullName,stargazerCount,description
+```
+Tiers (match `corpus.json`): **Small** <~150 files · **Medium** ~150–1500 ·
+**Large** >~1500. Skip repos that are tagged `<lang>` but mostly another
+language. Write one cross-file architecture **question** per repo (the kind that
+needs tracing across files). Add a `"<Language>"` block to
+`.claude/skills/agent-eval/corpus.json` (fields: `name`, `repo`, `size`,
+`files`, `question`) so `/agent-eval` can reuse them.
+
+### Step 8 — Benchmark all 3 (extraction + A/B)
+
+Make the dev build the codegraph on PATH **once**, then loop:
+```bash
+npm run build && ./scripts/local-install.sh
+scripts/add-lang/bench.sh <lang> <name> <url> "<question>" headless   # ×3
+```
+`bench.sh` clones (shared `/tmp/codegraph-corpus`), wipes + indexes, runs
+`verify-extraction.mjs`, then the with/without retrieval A/B via
+`scripts/agent-eval/run-all.sh` (skips the paid A/B if extraction is broken).
+Read each `parse-run.mjs` summary printed by `run-all.sh`: tool calls, file
+`Read`s, Grep/Bash, codegraph-tool calls, duration, and **cost** — for both the
+`with` and `without` arms. After the loop, restore the dev link if needed:
+`./scripts/local-install.sh`.
+
+### Step 9 — Docs + CHANGELOG
+
+- **README.md**: add `<Lang>` to the "19+ Languages" feature bullet, and add a
+  row to the **Supported Languages** table:
+  `| <Lang> | \`.ext\` | Full support (classes, methods, …) |`.
+- **CHANGELOG.md**: add an `## [Unreleased]` section at the top (above the
+  latest version) with `### Added` → a user-perspective bullet, e.g.
+  *"CodeGraph now indexes **<Lang>** (`.ext`) — functions, classes, imports, and
+  call edges."* If `## [Unreleased]` already exists, append under it. (`/publish`
+  folds this into the next versioned block at release time.)
+
+### Step 10 — Report (do NOT commit)
+
+Summarize for review:
+- **Files changed**: the 4 wiring edits + new extractor + tests + README +
+  CHANGELOG + corpus.json (+ any vendored `.wasm`).
+- **Extraction** per repo: files / nodes / edges / `verify-extraction` result.
+- **A/B** per repo: `with` vs `without` (tool calls, file Reads, cost) and a
+  one-line verdict — did codegraph reduce effort, and did both arms reach a
+  correct answer?
+- **Gaps / follow-ups** (node types not yet mapped, resolution edges missing,
+  framework routes, etc.).
+
+Hand the changes to the user. **Do not** run `git commit`/`push`,
+`npm publish`, or `scripts/release.sh`.
+
+## Notes
+- The A/B spawns real **paid** `claude -p` runs (opus, `--max-budget-usd`),
+  2 arms × 3 repos. The corpus dir `/tmp/codegraph-corpus` is shared with
+  `/agent-eval`, so clones are reused across runs.
+- Any new `*.wasm` must live in `src/extraction/wasm/` — `copy-assets` (run by
+  `npm run build`) ships it; otherwise it won't be in `dist/`.
+- An index must be served by the **same** binary that built it. Step 8 builds +
+  links the dev build first, so this holds.
+- If a grammar can't be obtained, or extraction can't reach PASS, **STOP and
+  report** — don't ship a half-wired language.

+ 10 - 0
.claude/skills/agent-eval/corpus.json

@@ -59,5 +59,15 @@
   ],
   "Svelte": [
     { "name": "shadcn-svelte", "repo": "https://github.com/huntabyte/shadcn-svelte", "size": "Medium", "files": "~600", "question": "How do shadcn-svelte components compose and apply their styling?" }
+  ],
+  "Lua": [
+    { "name": "lualine.nvim", "repo": "https://github.com/nvim-lualine/lualine.nvim", "size": "Small", "files": "~120", "question": "How does lualine assemble and render its statusline sections and components?" },
+    { "name": "telescope.nvim", "repo": "https://github.com/nvim-telescope/telescope.nvim", "size": "Medium", "files": "~80", "question": "How does Telescope wire a picker to its finder, sorter, and previewer?" },
+    { "name": "kong", "repo": "https://github.com/Kong/kong", "size": "Large", "files": "~1330", "question": "How does Kong execute plugins across a request's lifecycle phases?" }
+  ],
+  "Luau": [
+    { "name": "Knit", "repo": "https://github.com/Sleitnick/Knit", "size": "Small", "files": "~10", "question": "How does Knit register services and expose them to clients?" },
+    { "name": "vide", "repo": "https://github.com/centau/vide", "size": "Small", "files": "~40", "question": "How does vide track reactive sources and re-run effects when state changes?" },
+    { "name": "Fusion", "repo": "https://github.com/dphfox/Fusion", "size": "Medium", "files": "~115", "question": "How does Fusion build and update its reactive UI graph from state objects?" }
   ]
 }

+ 14 - 0
CHANGELOG.md

@@ -7,6 +7,20 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged
 This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
 and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+- **Lua**: CodeGraph now indexes Lua (`.lua`) — functions, methods (table `t.f`
+  and `t:m` definitions become methods with a `t::f` receiver-qualified name),
+  local variables, `require(...)` imports, and the call edges between them.
+  Querying a Lua project (Neovim plugins, Kong, OpenResty, game code) now
+  surfaces its modules, methods, and call graph.
+- **Luau** ([#232](https://github.com/colbymchenry/codegraph/issues/232)):
+  CodeGraph now indexes Luau (`.luau`), Roblox's typed superset of Lua —
+  everything Lua extracts, plus `type` / `export type` aliases, typed function
+  signatures, generics, and Roblox instance-path `require(script.Parent.X)`
+  imports.
+
 ## [0.8.0] - 2026-05-20
 
 ### Added

+ 3 - 1
README.md

@@ -107,7 +107,7 @@ The gains scale with codebase size: on large repos the agent answers from the in
 | **Full-Text Search** | Find code by name instantly across your entire codebase, powered by FTS5 |
 | **Impact Analysis** | Trace callers, callees, and the full impact radius of any symbol before making changes |
 | **Always Fresh** | File watcher uses native OS events (FSEvents/inotify/ReadDirectoryChangesW) with debounced auto-sync — the graph stays current as you code, zero config |
-| **19+ Languages** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Svelte, Liquid, Pascal/Delphi |
+| **19+ Languages** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Lua, Luau, Svelte, Liquid, Pascal/Delphi |
 | **Framework-aware Routes** | Recognizes web-framework routing files and links URL patterns to their handlers across 13 frameworks |
 | **100% Local** | No data leaves your machine. No API keys. No external services. SQLite database only |
 
@@ -447,6 +447,8 @@ The `.codegraph/config.json` file controls indexing:
 | Vue | `.vue` | Full support (script + script-setup extraction, Nuxt page/API/middleware routes) |
 | Liquid | `.liquid` | Full support |
 | Pascal / Delphi | `.pas`, `.dpr`, `.dpk`, `.lpr` | Full support (classes, records, interfaces, enums, DFM/FMX form files) |
+| Lua | `.lua` | Full support (functions, methods with receivers, local variables, `require` imports, call edges) |
+| Luau | `.luau` | Full support (everything in Lua, plus `type`/`export type` aliases, typed signatures, and Roblox instance-path `require`) |
 
 ## Troubleshooting
 

+ 177 - 0
__tests__/extraction.test.ts

@@ -3722,3 +3722,180 @@ class Svc {
     expect(decoratedNode?.name).toBe('method');
   });
 });
+
+// =============================================================================
+// Lua
+// =============================================================================
+
+describe('Lua Extraction', () => {
+  describe('Language detection', () => {
+    it('should detect Lua files', () => {
+      expect(detectLanguage('init.lua')).toBe('lua');
+      expect(detectLanguage('src/util.lua')).toBe('lua');
+    });
+
+    it('should report Lua as supported', () => {
+      expect(isLanguageSupported('lua')).toBe(true);
+      expect(getSupportedLanguages()).toContain('lua');
+    });
+  });
+
+  describe('Function extraction', () => {
+    it('should extract global and local functions', () => {
+      const code = `
+function configure(opts) return opts end
+local function helper(x) return x * 2 end
+`;
+      const result = extractFromSource('init.lua', code);
+      const funcs = result.nodes.filter((n) => n.kind === 'function').map((n) => n.name);
+      expect(funcs).toContain('configure');
+      expect(funcs).toContain('helper');
+      const configure = result.nodes.find((n) => n.name === 'configure');
+      expect(configure?.language).toBe('lua');
+      expect(configure?.signature).toBe('(opts)');
+    });
+
+    it('should split table/method functions into a receiver and method name', () => {
+      const code = `
+function M.connect(host, port) return host end
+function M:send(data) return self end
+`;
+      const result = extractFromSource('init.lua', code);
+      const methods = result.nodes.filter((n) => n.kind === 'method');
+      const connect = methods.find((m) => m.name === 'connect');
+      expect(connect?.qualifiedName).toBe('M::connect');
+      const send = methods.find((m) => m.name === 'send');
+      expect(send?.qualifiedName).toBe('M::send');
+    });
+  });
+
+  describe('Variable extraction', () => {
+    it('should extract local variable declarations', () => {
+      const code = `
+local M = {}
+local count = 0
+`;
+      const result = extractFromSource('mod.lua', code);
+      const vars = result.nodes.filter((n) => n.kind === 'variable').map((n) => n.name);
+      expect(vars).toContain('M');
+      expect(vars).toContain('count');
+    });
+  });
+
+  describe('Import extraction (require)', () => {
+    it('should extract require() in local declarations and bare calls', () => {
+      const code = `
+local socket = require("socket")
+local http = require "resty.http"
+require("side.effect")
+`;
+      const result = extractFromSource('net.lua', code);
+      const imports = result.nodes.filter((n) => n.kind === 'import').map((n) => n.name);
+      expect(imports).toContain('socket');
+      expect(imports).toContain('resty.http');
+      expect(imports).toContain('side.effect');
+
+      const ref = result.unresolvedReferences.find(
+        (r) => r.referenceKind === 'imports' && r.referenceName === 'socket'
+      );
+      expect(ref).toBeDefined();
+    });
+
+    // Regression: the tree-sitter-wasms Lua grammar (ABI 13) corrupts the shared
+    // WASM heap under web-tree-sitter 0.25, dropping nested calls/imports on every
+    // parse after the first. We vendor the ABI-15 grammar instead — this guards it
+    // by extracting several sources in sequence and asserting the LAST still works.
+    it('should keep extracting require across many sequential parses', () => {
+      let last;
+      for (let i = 0; i < 8; i++) {
+        last = extractFromSource(`f${i}.lua`, `local m = require("module.${i}")\nreturn m\n`);
+      }
+      const imports = last!.nodes.filter((n) => n.kind === 'import').map((n) => n.name);
+      expect(imports).toContain('module.7');
+    });
+  });
+
+  describe('Call extraction', () => {
+    it('should record intra-file calls as resolvable references', () => {
+      const code = `
+local function helper(x) return x end
+local function run(y) return helper(y) end
+`;
+      const result = extractFromSource('calls.lua', code);
+      const call = result.unresolvedReferences.find(
+        (r) => r.referenceKind === 'calls' && r.referenceName === 'helper'
+      );
+      expect(call).toBeDefined();
+    });
+  });
+});
+
+// =============================================================================
+// Luau (typed superset of Lua — https://luau.org)
+// =============================================================================
+
+describe('Luau Extraction', () => {
+  describe('Language detection', () => {
+    it('should detect Luau files', () => {
+      expect(detectLanguage('init.luau')).toBe('luau');
+      expect(detectLanguage('src/Client.luau')).toBe('luau');
+    });
+
+    it('should report Luau as supported', () => {
+      expect(isLanguageSupported('luau')).toBe(true);
+      expect(getSupportedLanguages()).toContain('luau');
+    });
+  });
+
+  describe('Type aliases', () => {
+    it('should extract `type` and `export type` definitions', () => {
+      const code = `
+export type Vector = { x: number, y: number }
+type Handler = (msg: string) -> boolean
+`;
+      const result = extractFromSource('types.luau', code);
+      const aliases = result.nodes.filter((n) => n.kind === 'type_alias');
+      const vector = aliases.find((a) => a.name === 'Vector');
+      expect(vector).toBeDefined();
+      expect(vector?.isExported).toBe(true);
+      const handler = aliases.find((a) => a.name === 'Handler');
+      expect(handler).toBeDefined();
+      expect(handler?.isExported).toBe(false);
+    });
+  });
+
+  describe('Typed functions and methods', () => {
+    it('should capture typed signatures and split methods by receiver', () => {
+      const code = `
+function configure(opts: { debug: boolean }): boolean
+	return opts.debug
+end
+function Client:fetch(path: string): Response
+	return path
+end
+`;
+      const result = extractFromSource('client.luau', code);
+      const configure = result.nodes.find((n) => n.kind === 'function' && n.name === 'configure');
+      expect(configure?.language).toBe('luau');
+      expect(configure?.signature).toBe('(opts: { debug: boolean }): boolean');
+      const fetch = result.nodes.find((n) => n.kind === 'method' && n.name === 'fetch');
+      expect(fetch?.qualifiedName).toBe('Client::fetch');
+    });
+  });
+
+  describe('Imports and variables', () => {
+    it('should extract string and Roblox instance-path require imports', () => {
+      const code = `
+local http = require("http")
+local Signal = require(script.Parent.Signal)
+local count = 0
+`;
+      const result = extractFromSource('mod.luau', code);
+      const imports = result.nodes.filter((n) => n.kind === 'import').map((n) => n.name);
+      expect(imports).toContain('http'); // string require
+      expect(imports).toContain('Signal'); // Roblox instance-path require
+      const vars = result.nodes.filter((n) => n.kind === 'variable').map((n) => n.name);
+      expect(vars).toContain('count');
+    });
+  });
+});

+ 60 - 0
scripts/add-lang/bench.sh

@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+# Add-lang benchmark for ONE repo:
+#   clone -> wipe+index (with the codegraph on PATH) -> verify extraction ->
+#   with/without retrieval A/B (reuses scripts/agent-eval/run-all.sh).
+#
+# Assumes the codegraph dev build is already built + linked on PATH — the skill
+# runs `npm run build && ./scripts/local-install.sh` ONCE before looping repos.
+# The A/B is skipped if extraction fails its critical checks (don't burn $ on a
+# broken extractor); set FORCE_AB=1 to run it anyway.
+#
+# Usage: bench.sh <lang> <repo-name> <repo-url> "<question>" [headless|tmux|all]
+# Env:   CORPUS   corpus dir (default /tmp/codegraph-corpus, shared with agent-eval)
+set -uo pipefail
+
+LANG_TOKEN="${1:?usage: bench.sh <lang> <repo-name> <repo-url> \"<question>\" [mode]}"
+NAME="${2:?repo-name required}"
+URL="${3:?repo-url required}"
+Q="${4:?question required}"
+MODE="${5:-headless}"
+
+HARNESS="$(cd "$(dirname "$0")" && pwd)"
+AGENT_EVAL="$(cd "$HARNESS/../agent-eval" && pwd)"
+CORPUS="${CORPUS:-/tmp/codegraph-corpus}"
+REPO="$CORPUS/$NAME"
+
+command -v codegraph >/dev/null || { echo "no codegraph on PATH (build + ./scripts/local-install.sh first)"; exit 1; }
+
+echo "==================== add-lang bench: $NAME ($LANG_TOKEN) ===================="
+echo "codegraph: $(command -v codegraph) -> $(codegraph --version 2>/dev/null || echo '?')"
+
+# 1. Ensure the repo (shallow clone, reuse if present).
+mkdir -p "$CORPUS"
+if [ -d "$REPO/.git" ]; then
+  echo "→ reusing checkout: $REPO"
+else
+  echo "→ cloning $URL"
+  git clone --depth 1 "$URL" "$REPO" || { echo "git clone failed"; exit 1; }
+fi
+
+# 2. Wipe + index with the binary under test.
+echo "→ wiping .codegraph and indexing"
+rm -rf "$REPO/.codegraph"
+( cd "$REPO" && codegraph init -i ) || { echo "indexing failed"; exit 1; }
+
+# 3. Verify extraction (cheap guard before the paid A/B).
+echo "→ verifying extraction"
+node "$HARNESS/verify-extraction.mjs" "$REPO" "$LANG_TOKEN"
+VERIFY=$?
+
+# 4. Retrieval A/B (skipped if extraction is broken, unless FORCE_AB=1).
+if [ "$VERIFY" -ne 0 ] && [ "${FORCE_AB:-0}" != "1" ]; then
+  echo "→ SKIPPING A/B — extraction failed critical checks (set FORCE_AB=1 to override)"
+else
+  echo "→ retrieval A/B (mode=$MODE)"
+  bash "$AGENT_EVAL/run-all.sh" "$REPO" "$Q" "$MODE"
+fi
+
+echo "==================== bench complete: $NAME (verify exit=$VERIFY) ===================="
+# Exit reflects extraction: 0 = pass/warn, 1 = critical fail, 2 = couldn't read status.
+exit "$VERIFY"

+ 75 - 0
scripts/add-lang/check-grammar.mjs

@@ -0,0 +1,75 @@
+#!/usr/bin/env node
+// Verify a tree-sitter grammar wasm is HEALTHY under the project's web-tree-sitter
+// runtime BEFORE writing an extractor. Prints the ABI version and parses a valid
+// sample many times in a multi-grammar context, to catch heap-corruption bugs
+// that silently drop nodes on every parse after the first.
+//
+// Why this exists: the tree-sitter-wasms Lua grammar is ABI 13 and corrupts the
+// shared WASM heap under web-tree-sitter 0.25 — Lua extraction degraded on every
+// file after the first (nested calls/imports vanished). The fix was to vendor the
+// upstream ABI-15 wasm. Run this on any new grammar first; if it FAILs, vendor a
+// newer build instead of using the tree-sitter-wasms one.
+//
+// Usage: node scripts/add-lang/check-grammar.mjs <lang|wasm-path> <valid-sample> [iterations]
+// Exit: 0 healthy, 1 corruption / parse errors, 2 could not run.
+// NOTE: the sample must be SYNTACTICALLY VALID — a broken sample fails for the
+//       wrong reason.
+
+import { readFileSync, existsSync } from 'node:fs';
+import { createRequire } from 'node:module';
+import { Parser, Language } from 'web-tree-sitter';
+
+const require = createRequire(import.meta.url);
+const fail = (code, msg) => { console.error(`[check-grammar] ${msg}`); process.exit(code); };
+
+const [token, sample, iterArg] = process.argv.slice(2);
+if (!token || !sample) fail(2, 'usage: check-grammar.mjs <lang|wasm-path> <valid-sample> [iterations]');
+if (!existsSync(sample)) fail(2, `sample not found: ${sample}`);
+const iters = iterArg ? parseInt(iterArg, 10) : 20;
+
+const SPECIAL = { csharp: 'c_sharp', 'c#': 'c_sharp' };
+function resolveWasm(t) {
+  if (t.endsWith('.wasm')) return existsSync(t) ? t : fail(2, `wasm not found: ${t}`);
+  const base = SPECIAL[t.toLowerCase()] ?? t.toLowerCase();
+  try { return require.resolve(`tree-sitter-wasms/out/tree-sitter-${base}.wasm`); } catch { /* try vendored */ }
+  const vendored = `src/extraction/wasm/tree-sitter-${base}.wasm`;
+  if (existsSync(vendored)) return vendored;
+  return fail(2, `no grammar for "${t}" — not in tree-sitter-wasms and not vendored`);
+}
+
+const wasmPath = resolveWasm(token);
+const source = readFileSync(sample, 'utf8');
+
+try { await Parser.init(); }
+catch { await Parser.init({ locateFile: () => require.resolve('web-tree-sitter/tree-sitter.wasm') }); }
+
+// Load a second, known-good grammar — the corruption surfaces under the
+// multi-grammar runtime that real indexing uses, not a single grammar in isolation.
+try { await Language.load(require.resolve('tree-sitter-wasms/out/tree-sitter-python.wasm')); } catch { /* ok */ }
+
+let language;
+try { language = await Language.load(wasmPath); }
+catch (e) { fail(2, `failed to load ${wasmPath}: ${e.message}`); }
+
+const parser = new Parser();
+parser.setLanguage(language);
+
+let ok = 0, err = 0;
+for (let i = 0; i < iters; i++) {
+  const tree = parser.parse(source);
+  if (tree.rootNode.hasError) err++; else ok++;
+}
+
+console.log(`grammar: ${wasmPath.split('/').pop()}`);
+console.log(`  ABI version: ${language.abiVersion}`);
+console.log(`  parses: ${ok} clean / ${err} with errors (of ${iters})`);
+if (err > 0) {
+  console.log(
+    `RESULT: FAIL — ${err}/${iters} parses produced ERROR trees on a valid sample. ` +
+    `This grammar corrupts under web-tree-sitter; vendor a newer (ABI 14/15) wasm ` +
+    `(see SKILL.md "Find a grammar"). Confirm your sample is syntactically valid first.`
+  );
+  process.exit(1);
+}
+console.log('RESULT: PASS — grammar parses cleanly and reuses safely.');
+process.exit(0);

+ 103 - 0
scripts/add-lang/dump-ast.mjs

@@ -0,0 +1,103 @@
+#!/usr/bin/env node
+// Dump the tree-sitter AST for a sample file so you can write a LanguageExtractor
+// mapping. Loads a grammar .wasm directly via web-tree-sitter (the same runtime
+// codegraph uses) — you do NOT need to register the language first.
+//
+// Usage:
+//   node scripts/add-lang/dump-ast.mjs <lang|wasm-path> <sample-file> [--depth=N] [--full]
+// Examples:
+//   node scripts/add-lang/dump-ast.mjs lua sample.lua
+//   node scripts/add-lang/dump-ast.mjs src/extraction/wasm/tree-sitter-zig.wasm a.zig --depth=4
+//
+// Output: an indented AST (named nodes, with field names) followed by a
+// node-type FREQUENCY table. The frequency table is the payoff — it tells you
+// which node types to map to functionTypes / classTypes / importTypes / etc.
+
+import { readFileSync, existsSync } from 'node:fs';
+import { createRequire } from 'node:module';
+import { Parser, Language } from 'web-tree-sitter';
+
+const require = createRequire(import.meta.url);
+const fail = (msg) => { console.error(`[dump-ast] ${msg}`); process.exit(1); };
+
+const argv = process.argv.slice(2);
+const positional = argv.filter((a) => !a.startsWith('--'));
+const [langOrWasm, sampleFile] = positional;
+const depthFlag = argv.find((a) => a.startsWith('--depth='));
+const showAll = argv.includes('--full'); // also print anonymous (token) nodes
+const maxDepth = depthFlag ? parseInt(depthFlag.split('=')[1], 10) : (showAll ? Infinity : 8);
+
+if (!langOrWasm || !sampleFile) {
+  fail('usage: dump-ast.mjs <lang|wasm-path> <sample-file> [--depth=N] [--full]');
+}
+if (!existsSync(sampleFile)) fail(`sample file not found: ${sampleFile}`);
+
+// Language tokens whose tree-sitter-wasms filename differs from the token.
+const WASM_SPECIAL = { csharp: 'c_sharp', 'c#': 'c_sharp' };
+
+function resolveWasm(token) {
+  if (token.endsWith('.wasm')) {
+    if (!existsSync(token)) fail(`wasm not found: ${token}`);
+    return token;
+  }
+  const base = WASM_SPECIAL[token.toLowerCase()] ?? token.toLowerCase();
+  try {
+    return require.resolve(`tree-sitter-wasms/out/tree-sitter-${base}.wasm`);
+  } catch {
+    /* not in tree-sitter-wasms — try a vendored copy */
+  }
+  const vendored = `src/extraction/wasm/tree-sitter-${base}.wasm`;
+  if (existsSync(vendored)) return vendored;
+  fail(
+    `no grammar for "${token}" — not in tree-sitter-wasms and not vendored at ` +
+      `${vendored}. Pass an explicit .wasm path, or vendor one (see SKILL.md "Find a grammar").`
+  );
+}
+
+const wasmPath = resolveWasm(langOrWasm);
+const source = readFileSync(sampleFile, 'utf8');
+
+try {
+  await Parser.init();
+} catch {
+  await Parser.init({ locateFile: () => require.resolve('web-tree-sitter/tree-sitter.wasm') });
+}
+
+let language;
+try {
+  language = await Language.load(wasmPath);
+} catch (e) {
+  fail(`failed to load grammar ${wasmPath}: ${e.message}`);
+}
+
+const parser = new Parser();
+parser.setLanguage(language);
+const tree = parser.parse(source);
+
+const freq = new Map();
+const snippet = (node) => {
+  const t = node.text.replace(/\s+/g, ' ').trim();
+  return t.length > 48 ? `${t.slice(0, 48)}…` : t;
+};
+
+function walk(node, depth, fieldName) {
+  if (node.isNamed) freq.set(node.type, (freq.get(node.type) || 0) + 1);
+  if ((node.isNamed || showAll) && depth <= maxDepth) {
+    const field = fieldName ? `${fieldName}: ` : '';
+    const leaf = node.childCount === 0 ? `  "${snippet(node)}"` : '';
+    console.log(`${'  '.repeat(depth)}${field}${node.type}  @${node.startPosition.row + 1}:${node.startPosition.column}${leaf}`);
+  }
+  for (let i = 0; i < node.childCount; i++) {
+    const child = node.child(i);
+    if (child) walk(child, depth + 1, node.fieldNameForChild(i));
+  }
+}
+
+console.log(`\n# AST for ${sampleFile}  (grammar: ${wasmPath.split('/').pop()})\n`);
+walk(tree.rootNode, 0, null);
+
+console.log('\n# Node-type frequency (named nodes) — map the relevant ones in your extractor:\n');
+[...freq.entries()]
+  .sort((a, b) => b[1] - a[1])
+  .forEach(([type, n]) => console.log(`  ${String(n).padStart(5)}  ${type}`));
+console.log();

+ 70 - 0
scripts/add-lang/verify-extraction.mjs

@@ -0,0 +1,70 @@
+#!/usr/bin/env node
+// Sanity-check that codegraph extracted REAL symbols (not just file/import nodes)
+// from a repo for a given language. Exits non-zero on a critical failure so it
+// can drive a write-extractor -> build -> re-check loop.
+//
+// Usage: node scripts/add-lang/verify-extraction.mjs <repo-path> <lang>
+// Reads `codegraph status <repo> --json` using whatever codegraph is on PATH,
+// so it reflects the binary that built the index.
+//
+// Exit codes: 0 = pass or soft-warn, 1 = critical fail, 2 = could not run.
+
+import { execFileSync } from 'node:child_process';
+
+const [repo, lang] = process.argv.slice(2);
+if (!repo || !lang) {
+  console.error('usage: verify-extraction.mjs <repo-path> <lang>');
+  process.exit(2);
+}
+
+let status;
+try {
+  const out = execFileSync('codegraph', ['status', repo, '--json'], { encoding: 'utf8' });
+  status = JSON.parse(out);
+} catch (e) {
+  console.error(`[verify] could not read codegraph status for ${repo}: ${e.message}`);
+  process.exit(2);
+}
+
+// Kinds that prove the extractor mapped AST node types (everything except
+// 'file' and 'import', which codegraph creates structurally for any language).
+const SYMBOL_KINDS = new Set([
+  'module', 'class', 'struct', 'interface', 'trait', 'protocol', 'function',
+  'method', 'property', 'field', 'variable', 'constant', 'enum', 'enum_member',
+  'type_alias', 'namespace', 'route', 'component',
+]);
+
+const byKind = status.nodesByKind || {};
+const langs = status.languages || [];
+const files = status.fileCount || 0;
+const edges = status.edgeCount || 0;
+const symbolKinds = Object.keys(byKind).filter((k) => SYMBOL_KINDS.has(k));
+const symbolCount = symbolKinds.reduce((s, k) => s + byKind[k], 0);
+
+const checks = [];
+const add = (severity, ok, label, detail) => checks.push({ severity, ok, label, detail });
+
+add('critical', status.initialized === true, 'index initialized', `initialized=${status.initialized}`);
+add('critical', langs.includes(lang), `language "${lang}" detected`, `languages=[${langs.join(', ')}]`);
+add('critical', symbolCount > 0, 'structural symbols extracted', `${symbolCount} symbols (${symbolKinds.join(', ') || 'NONE — only file/import nodes!'})`);
+add('soft', symbolCount >= files, 'symbol density >= 1/file', `${symbolCount} symbols across ${files} files`);
+add('soft', edges > files, 'edges resolved', `${edges} edges across ${files} files`);
+
+console.log(`\n# Extraction check — ${repo}  (lang=${lang}, backend=${status.backend})`);
+console.log(`  files=${files} nodes=${status.nodeCount} edges=${edges}`);
+console.log(`  nodesByKind: ${JSON.stringify(byKind)}\n`);
+for (const c of checks) console.log(`  ${c.ok ? '✓' : '✗'} ${c.label} — ${c.detail}`);
+
+const critical = checks.filter((c) => !c.ok && c.severity === 'critical');
+const soft = checks.filter((c) => !c.ok && c.severity === 'soft');
+console.log();
+if (critical.length) {
+  console.log(`RESULT: FAIL (${critical.length} critical) — extractor or grammar wiring is broken. Re-run dump-ast.mjs and fix the node-type mappings.`);
+  process.exit(1);
+}
+if (soft.length) {
+  console.log(`RESULT: WARN (${soft.length} soft) — extraction works but looks thin; inspect the counts above.`);
+  process.exit(0);
+}
+console.log('RESULT: PASS — extraction looks healthy.');
+process.exit(0);

+ 12 - 2
src/extraction/grammars.ts

@@ -35,6 +35,8 @@ const WASM_GRAMMAR_FILES: Record<GrammarLanguage, string> = {
   dart: 'tree-sitter-dart.wasm',
   pascal: 'tree-sitter-pascal.wasm',
   scala: 'tree-sitter-scala.wasm',
+  lua: 'tree-sitter-lua.wasm',
+  luau: 'tree-sitter-luau.wasm',
 };
 
 /**
@@ -78,6 +80,8 @@ export const EXTENSION_MAP: Record<string, Language> = {
   '.fmx': 'pascal',
   '.scala': 'scala',
   '.sc': 'scala',
+  '.lua': 'lua',
+  '.luau': 'luau',
 };
 
 /**
@@ -125,8 +129,12 @@ export async function loadGrammarsForLanguages(languages: Language[]): Promise<v
   for (const lang of toLoad) {
     const wasmFile = WASM_GRAMMAR_FILES[lang];
     try {
-      // Pascal and Scala ship their own WASMs (not in tree-sitter-wasms)
-      const wasmPath = (lang === 'pascal' || lang === 'scala')
+      // Some grammars ship their own WASMs (not in tree-sitter-wasms, or the
+      // tree-sitter-wasms build is too old). Lua: tree-sitter-wasms ships an
+      // ABI-13 build that corrupts the shared WASM heap under web-tree-sitter
+      // 0.25 (drops nested calls/imports on every file after the first); we
+      // vendor the upstream ABI-15 wasm instead.
+      const wasmPath = (lang === 'pascal' || lang === 'scala' || lang === 'lua' || lang === 'luau')
         ? path.join(__dirname, 'wasm', wasmFile)
         : require.resolve(`tree-sitter-wasms/out/${wasmFile}`);
       const language = await WasmLanguage.load(wasmPath);
@@ -291,6 +299,8 @@ export function getLanguageDisplayName(language: Language): string {
     liquid: 'Liquid',
     pascal: 'Pascal / Delphi',
     scala: 'Scala',
+    lua: 'Lua',
+    luau: 'Luau',
     unknown: 'Unknown',
   };
   return names[language] || language;

+ 4 - 0
src/extraction/languages/index.ts

@@ -23,6 +23,8 @@ import { kotlinExtractor } from './kotlin';
 import { dartExtractor } from './dart';
 import { pascalExtractor } from './pascal';
 import { scalaExtractor } from './scala';
+import { luaExtractor } from './lua';
+import { luauExtractor } from './luau';
 
 export const EXTRACTORS: Partial<Record<Language, LanguageExtractor>> = {
   typescript: typescriptExtractor,
@@ -43,4 +45,6 @@ export const EXTRACTORS: Partial<Record<Language, LanguageExtractor>> = {
   dart: dartExtractor,
   pascal: pascalExtractor,
   scala: scalaExtractor,
+  lua: luaExtractor,
+  luau: luauExtractor,
 };

+ 152 - 0
src/extraction/languages/lua.ts

@@ -0,0 +1,152 @@
+import type { Node as SyntaxNode } from 'web-tree-sitter';
+import { getNodeText, getChildByField } from '../tree-sitter-helpers';
+import type { LanguageExtractor } from '../tree-sitter-types';
+
+// Node names follow the vendored ABI-15 grammar (@tree-sitter-grammars/
+// tree-sitter-lua), NOT the older tree-sitter-wasms build — see grammars.ts.
+
+/** First descendant of a given type (breadth-first), or null. */
+function findDescendant(node: SyntaxNode, type: string): SyntaxNode | null {
+  const queue: SyntaxNode[] = [...node.namedChildren];
+  while (queue.length) {
+    const n = queue.shift()!;
+    if (n.type === type) return n;
+    queue.push(...n.namedChildren);
+  }
+  return null;
+}
+
+/**
+ * If `callNode` is a `require(...)` call, return the module name; otherwise null.
+ * Lua/Luau have no import statement — modules are loaded by calling the global
+ * `require`. Handles both:
+ *   - string requires:  `require("net.http")` / `require "net.http"`  → "net.http"
+ *   - Roblox/Luau path requires: `require(script.Parent.Signal)`      → "Signal"
+ *     (the dominant idiom in Roblox code, where the argument is an instance path
+ *     rather than a string — use the trailing field as the module name).
+ */
+function requireModule(callNode: SyntaxNode, source: string): string | null {
+  // function_call > name: <callee>, arguments: arguments
+  const name = getChildByField(callNode, 'name');
+  // A dotted/colon callee (e.g. `socket.connect`) is dot/method_index_expression,
+  // never a bare `require`.
+  if (!name || name.type !== 'identifier') return null;
+  if (getNodeText(name, source) !== 'require') return null;
+
+  const args = getChildByField(callNode, 'arguments');
+  if (!args) return null;
+
+  // String require — `string > content: string_content` gives the bare name.
+  const content = findDescendant(args, 'string_content');
+  if (content) return getNodeText(content, source).trim() || null;
+  const str = findDescendant(args, 'string');
+  if (str) {
+    const mod = getNodeText(str, source)
+      .trim()
+      .replace(/^\[\[/, '')
+      .replace(/\]\]$/, '')
+      .replace(/^["']/, '')
+      .replace(/["']$/, '');
+    if (mod) return mod;
+  }
+
+  // Roblox/Luau instance-path require: `require(script.Parent.Signal)` → "Signal".
+  const idx = findDescendant(args, 'dot_index_expression') ?? findDescendant(args, 'method_index_expression');
+  if (idx) {
+    const field = getChildByField(idx, 'field') ?? getChildByField(idx, 'method');
+    if (field) return getNodeText(field, source).trim() || null;
+  }
+  return null;
+}
+
+export const luaExtractor: LanguageExtractor = {
+  // function_declaration covers global (`function f`), table (`function t.f`),
+  // method (`function t:m`), and local (`local function f`) forms — the form is
+  // distinguished by the `name:` child (identifier / dot_index_expression /
+  // method_index_expression) and a `local` token, not by separate node types.
+  // Anonymous `function() ... end` (function_definition) has no name and is
+  // captured via its enclosing variable instead.
+  functionTypes: ['function_declaration'],
+  classTypes: [], // Lua has no classes/structs/interfaces/enums — tables are used for everything
+  methodTypes: [],
+  interfaceTypes: [],
+  structTypes: [],
+  enumTypes: [],
+  typeAliasTypes: [],
+  importTypes: [], // `require` is a function_call — handled in visitNode below
+  callTypes: ['function_call'],
+  variableTypes: ['variable_declaration'], // see the `lua` branch in extractVariable
+  nameField: 'name',
+  bodyField: 'body',
+  paramsField: 'parameters',
+
+  getSignature: (node, source) => {
+    const params = getChildByField(node, 'parameters');
+    return params ? getNodeText(params, source) : undefined;
+  },
+
+  // `function t.f()` / `function t:m()` are methods on table `t`: return the
+  // table as the receiver so they extract as methods with a `t::f` qualified
+  // name. Plain `function f()` / `local function f()` have no receiver and stay
+  // functions. (For `a.b.c`, the receiver is the nested `a.b`.)
+  getReceiverType: (node, source) => {
+    const name = getChildByField(node, 'name');
+    if (name && (name.type === 'dot_index_expression' || name.type === 'method_index_expression')) {
+      const table = getChildByField(name, 'table');
+      if (table) return getNodeText(table, source);
+    }
+    return undefined;
+  },
+
+  // Emit import nodes for `require(...)`. The local-declaration form is handled
+  // explicitly because the variable branch skips the initializer subtree; bare
+  // and global `require` calls are caught when the walker reaches the
+  // function_call node.
+  visitNode: (node, ctx) => {
+    const source = ctx.source;
+
+    const emit = (callNode: SyntaxNode): void => {
+      const mod = requireModule(callNode, source);
+      if (!mod) return;
+      const imp = ctx.createNode('import', mod, callNode, {
+        signature: getNodeText(callNode, source).trim().slice(0, 100),
+      });
+      if (imp && ctx.nodeStack.length > 0) {
+        const parentId = ctx.nodeStack[ctx.nodeStack.length - 1];
+        if (parentId) {
+          ctx.addUnresolvedReference({
+            fromNodeId: parentId,
+            referenceName: mod,
+            referenceKind: 'imports',
+            line: callNode.startPosition.row + 1,
+            column: callNode.startPosition.column,
+          });
+        }
+      }
+    };
+
+    // Bare / global `require("x")` — claim it so it isn't double-counted as a call.
+    if (node.type === 'function_call') {
+      if (requireModule(node, source)) {
+        emit(node);
+        return true;
+      }
+      return false;
+    }
+
+    // `local x = require("x")` — variable_declaration wraps an assignment_statement
+    // whose initializer subtree the variable branch will skip, so dig it out here.
+    if (node.type === 'variable_declaration') {
+      const assign = node.namedChildren.find((c) => c.type === 'assignment_statement');
+      const exprList = assign?.namedChildren.find((c) => c.type === 'expression_list');
+      if (exprList) {
+        for (const val of exprList.namedChildren) {
+          if (val.type === 'function_call') emit(val);
+        }
+      }
+      return false;
+    }
+
+    return false;
+  },
+};

+ 36 - 0
src/extraction/languages/luau.ts

@@ -0,0 +1,36 @@
+import { getNodeText, getChildByField } from '../tree-sitter-helpers';
+import type { LanguageExtractor } from '../tree-sitter-types';
+import { luaExtractor } from './lua';
+
+// Luau (https://luau.org) is a gradually-typed superset of Lua. The
+// tree-sitter-luau grammar reuses the same node names as the vendored Lua
+// grammar (function_declaration, variable_declaration, function_call,
+// dot/method_index_expression, …), so the Luau extractor extends the Lua one
+// and adds the type-system pieces Luau introduces:
+//   - `type X = ...` / `export type X = ...`  → type_definition (type_alias)
+//   - typed parameters and return types        → richer signatures
+//
+// require detection, receiver-splitting (t.f / t:m → methods), and local
+// variable extraction are inherited unchanged from luaExtractor. The shared
+// `extractVariable` core branch is gated on `lua` || `luau`.
+export const luauExtractor: LanguageExtractor = {
+  ...luaExtractor,
+
+  // `type X = ...` and `export type X = ...`
+  typeAliasTypes: ['type_definition'],
+
+  // Only Luau `export type` is exported; the keyword leads the node.
+  isExported: (node, source) => source.slice(node.startIndex, node.startIndex + 7) === 'export ',
+
+  // Params + Luau return type (the named child after `parameters`, before the body).
+  getSignature: (node, source) => {
+    const params = getChildByField(node, 'parameters');
+    if (!params) return undefined;
+    let sig = getNodeText(params, source);
+    const kids = node.namedChildren;
+    const idx = kids.findIndex((c) => c.startIndex === params.startIndex);
+    const ret = idx >= 0 ? kids[idx + 1] : null;
+    if (ret && ret.type !== 'block') sig += `: ${getNodeText(ret, source)}`;
+    return sig;
+  },
+};

+ 28 - 0
src/extraction/tree-sitter.ts

@@ -50,6 +50,17 @@ function extractName(node: SyntaxNode, source: string, extractor: LanguageExtrac
       const innerName = getChildByField(resolved, 'declarator') || resolved.namedChild(0);
       return innerName ? getNodeText(innerName, source) : getNodeText(resolved, source);
     }
+    // Lua: `function t.f()` / `function t:m()` — the name node is a dot/method
+    // index expression; the simple name is the trailing field/method (the table
+    // receiver is captured separately via getReceiverType).
+    if (resolved.type === 'dot_index_expression') {
+      const field = getChildByField(resolved, 'field');
+      if (field) return getNodeText(field, source);
+    }
+    if (resolved.type === 'method_index_expression') {
+      const method = getChildByField(resolved, 'method');
+      if (method) return getNodeText(method, source);
+    }
     return getNodeText(resolved, source);
   }
 
@@ -1111,6 +1122,23 @@ export class TreeSitterExtractor {
           }
         }
       }
+    } else if (this.language === 'lua' || this.language === 'luau') {
+      // Lua/Luau: variable_declaration → assignment_statement → variable_list
+      //      (name: identifier...) = expression_list. `local x, y = 1, 2`
+      //      declares multiple names; only plain identifiers are locals.
+      const assign = node.namedChildren.find((c) => c.type === 'assignment_statement') ?? node;
+      const varList = assign.namedChildren.find((c) => c.type === 'variable_list');
+      const exprList = assign.namedChildren.find((c) => c.type === 'expression_list');
+      const values = exprList ? exprList.namedChildren : [];
+      const names = varList ? varList.namedChildren.filter((c) => c.type === 'identifier') : [];
+      names.forEach((nameNode, i) => {
+        const name = getNodeText(nameNode, this.source);
+        if (!name) return;
+        const valueNode = values[i];
+        const initValue = valueNode ? getNodeText(valueNode, this.source).slice(0, 100) : undefined;
+        const initSignature = initValue ? `= ${initValue}${initValue.length >= 100 ? '...' : ''}` : undefined;
+        this.createNode(kind, name, nameNode, { docstring, signature: initSignature, isExported });
+      });
     } else {
       // Generic fallback for other languages
       // Try to find identifier children

BIN
src/extraction/wasm/tree-sitter-lua.wasm


BIN
src/extraction/wasm/tree-sitter-luau.wasm


+ 6 - 0
src/types.ts

@@ -85,6 +85,8 @@ export const LANGUAGES = [
   'liquid',
   'pascal',
   'scala',
+  'lua',
+  'luau',
   'unknown',
 ] as const;
 
@@ -545,6 +547,10 @@ export const DEFAULT_CONFIG: CodeGraphConfig = {
     // Scala
     '**/*.scala',
     '**/*.sc',
+    // Lua
+    '**/*.lua',
+    // Luau
+    '**/*.luau',
   ],
   exclude: [
     // Version control