feat(lua): add Lua language support

Index Lua (.lua): functions, methods (table `t.f`/`t:m` defs become methods with a `t::f` receiver), local variables, `require` imports, and call edges.

Vendor the upstream ABI-15 tree-sitter Lua grammar — the tree-sitter-wasms build is ABI 13 and corrupts the shared WASM heap under web-tree-sitter 0.25, dropping nested calls/imports after the first file in a project.

Harden the add-lang skill from this run: add check-grammar.mjs (flags ABI/heap corruption up front), document the DEFAULT_CONFIG.include edit and grammar health-check in SKILL.md, and add Lua repos to the eval corpus.
Colby McHenry 1 month ago
Parent
Commit
4ffe9229a7

+ 43 - 17
.claude/skills/add-lang/SKILL.md

@@ -25,9 +25,9 @@ single-token form everywhere (`csharp`, not `c#`).
 Copy this checklist and work through it in order:
 ```
 - [ ] 1. Resolve language; bail early if already supported (just benchmark)
-- [ ] 2. Find a grammar (tree-sitter-wasms vs vendor a .wasm)
+- [ ] 2. Find a grammar + health-check it (ABI / heap corruption)
 - [ ] 3. Discover the grammar's AST node types (dump-ast.mjs)
-- [ ] 4. Wire the language (4 source edits)
+- [ ] 4. Wire the language (4 files; sometimes a 5th core touch)
 - [ ] 5. Build + verify-extraction loop until PASS
 - [ ] 6. Add extraction tests; make them green
 - [ ] 7. Auto-pick 3 popular repos by size tier; add to corpus.json
@@ -44,20 +44,35 @@ Check whether the language is already wired: look for the token in the
 `typescript`, `rust`), **skip Steps 2–6** and go straight to benchmarking
 (Steps 7–8) to validate/measure it — note in the report that no code changed.
 
-### Step 2 — Find a grammar
+### Step 2 — Find a grammar, then health-check it
 
 
 ```bash
 ls node_modules/tree-sitter-wasms/out/ | grep -i <lang>   # csharp -> c_sharp
 ```
-- **Present** → off-the-shelf. No vendoring; `grammars.ts` resolves it from
-  `tree-sitter-wasms` automatically. (Most popular languages are here: lua,
-  elixir, zig, ocaml, solidity, toml, yaml, …)
-- **Absent** → you must vendor a `.wasm` into `src/extraction/wasm/` (like
-  `pascal`/`scala`) and add the token to the vendored branch in Step 4. Get a
-  wasm from the grammar's npm package (a prebuilt `*.wasm`) or by building one
-  (`npx tree-sitter-cli build --wasm`, which needs emscripten/Docker — the
-  `tree-sitter` CLI is usually not on PATH here). **If you cannot obtain a
-  wasm, STOP and tell the user** — the language can't be added without it.
+- **Present** → likely off-the-shelf; `grammars.ts` resolves it from
+  `tree-sitter-wasms` automatically. (Many languages: elixir, zig, ocaml,
+  solidity, toml, yaml, …)
+- **Absent** → vendor a `.wasm` into `src/extraction/wasm/` (like `pascal` /
+  `scala` / `lua`) and add the token to the vendored branch in Step 4.
+
+**Always health-check before writing an extractor — a *present* grammar can
+still be unusable:**
+```bash
+node scripts/add-lang/check-grammar.mjs <lang> path/to/valid-sample.<ext>
+```
+It prints the grammar's ABI version and parses a valid sample many times in a
+multi-grammar runtime. If it **FAILs** (ERROR trees on valid code — an old ABI
+corrupting the shared WASM heap, which silently drops nested calls/imports on
+every file after the first; e.g. the tree-sitter-wasms **Lua** grammar is ABI 13
+and fails), do NOT use that wasm. **Vendor a newer (ABI 14/15) build instead:**
+```bash
+npm pack @tree-sitter-grammars/tree-sitter-<lang>   # often ships a prebuilt *.wasm
+# or build one: npx tree-sitter-cli build --wasm   (needs Docker/emscripten)
+cp <the>.wasm src/extraction/wasm/tree-sitter-<lang>.wasm
+```
+then add the token to the vendored branch in Step 4 and re-run check-grammar on
+the vendored path until it PASSes. **If you cannot obtain a healthy wasm, STOP
+and tell the user.**
 
 
 ### Step 3 — Discover AST node types
 
@@ -74,30 +89,41 @@ language's paradigm as a model: `rust.ts`/`scala.ts` (functional, traits),
 `java.ts`/`csharp.ts` (OO), `python.ts`/`ruby.ts` (scripting), `go.ts`
 (top-level methods + receivers).
 
-### Step 4 — Wire the language (4 edits)
+### Step 4 — Wire the language (4 files)
 
 
 These are exact, fragile wiring — match the existing style precisely:
 
-1. **`src/types.ts`** — add `'<lang>',` to the `LANGUAGES` const (before
-   `'unknown'`).
+1. **`src/types.ts`** — TWO edits:
+   - add `'<lang>',` to the `LANGUAGES` const (before `'unknown'`);
+   - add `'**/*.<ext>',` to `DEFAULT_CONFIG.include`. **Don't skip this** — it's
+     the file-scan allowlist; without the glob, `codegraph init` finds **0
+     files** even though detection/extraction are wired.
 2. **`src/extraction/grammars.ts`** — three maps:
    - `WASM_GRAMMAR_FILES`: `<lang>: 'tree-sitter-<lang>.wasm',`
    - `EXTENSION_MAP`: each file extension → `'<lang>'` (e.g. `'.lua': 'lua',`)
    - `getLanguageDisplayName`: `<lang>: '<Display Name>',`
    - **vendored only**: add `<lang>` to the
-     `(lang === 'pascal' || lang === 'scala')` wasm-path branch.
+     `(lang === 'pascal' || lang === 'scala' || …)` wasm-path branch.
 3. **`src/extraction/languages/<lang>.ts`** — new file exporting
    `export const <lang>Extractor: LanguageExtractor = { … }`. Map the node types
    from Step 3. Required fields: `functionTypes`, `classTypes`, `methodTypes`,
    `interfaceTypes`, `structTypes`, `enumTypes`, `typeAliasTypes`,
    `importTypes`, `callTypes`, `variableTypes`, `nameField`, `bodyField`,
    `paramsField`. Add hooks as the grammar needs them (`getSignature`,
-   `getVisibility`, `isExported`, `extractImport`, `getReceiverType`,
+   `getVisibility`, `isExported`, `extractImport`, `visitNode`, `getReceiverType`,
    `interfaceKind`, `enumMemberTypes`, etc. — see
    `src/extraction/tree-sitter-types.ts`).
 4. **`src/extraction/languages/index.ts`** — `import { <lang>Extractor } from
    './<lang>';` and add `<lang>: <lang>Extractor,` to `EXTRACTORS`.
 
+**Sometimes a 5th, core touch in `src/extraction/tree-sitter.ts`** — variable
+extraction has per-language branches in `extractVariable` (the generic fallback
+only finds direct `identifier`/`variable_declarator` children). If the grammar
+nests declared names (e.g. Lua's `variable_declaration → variable_list`), add a
+`} else if (this.language === '<lang>')` branch there, mirroring the existing
+ts/python/go ones. Import forms that aren't a distinct node (Lua/Ruby `require`
+is a *call*) are handled in the extractor's `visitNode` hook instead.
+
 ### Step 5 — Build + verify loop
 
 ```bash

+ 5 - 0
.claude/skills/agent-eval/corpus.json

@@ -59,5 +59,10 @@
   ],
   "Svelte": [
     { "name": "shadcn-svelte", "repo": "https://github.com/huntabyte/shadcn-svelte", "size": "Medium", "files": "~600", "question": "How do shadcn-svelte components compose and apply their styling?" }
+  ],
+  "Lua": [
+    { "name": "lualine.nvim", "repo": "https://github.com/nvim-lualine/lualine.nvim", "size": "Small", "files": "~120", "question": "How does lualine assemble and render its statusline sections and components?" },
+    { "name": "telescope.nvim", "repo": "https://github.com/nvim-telescope/telescope.nvim", "size": "Medium", "files": "~80", "question": "How does Telescope wire a picker to its finder, sorter, and previewer?" },
+    { "name": "kong", "repo": "https://github.com/Kong/kong", "size": "Large", "files": "~1330", "question": "How does Kong execute plugins across a request's lifecycle phases?" }
   ]
 }
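
Step 7 of the add-lang checklist picks three repos by size tier. A minimal sketch of a check for that shape follows, assuming the tiers are exactly the `Small` / `Medium` / `Large` values seen in the entries above. The `CorpusEntry` type and the `missingTiers` helper are hypothetical names for illustration and are not part of the repo.

```typescript
type SizeTier = 'Small' | 'Medium' | 'Large';

// Hypothetical type mirroring the fields visible in corpus.json.
interface CorpusEntry {
  name: string;
  repo: string;
  size: SizeTier;
  files: string;
  question: string;
}

const TIERS: SizeTier[] = ['Small', 'Medium', 'Large'];

// Returns the size tiers that have no entry for a language.
function missingTiers(entries: CorpusEntry[]): SizeTier[] {
  const present = new Set(entries.map((e) => e.size));
  return TIERS.filter((t) => !present.has(t));
}

const lua: CorpusEntry[] = [
  { name: 'lualine.nvim', repo: 'https://github.com/nvim-lualine/lualine.nvim', size: 'Small', files: '~120', question: 'How does lualine assemble and render its statusline sections and components?' },
  { name: 'telescope.nvim', repo: 'https://github.com/nvim-telescope/telescope.nvim', size: 'Medium', files: '~80', question: 'How does Telescope wire a picker to its finder, sorter, and previewer?' },
  { name: 'kong', repo: 'https://github.com/Kong/kong', size: 'Large', files: '~1330', question: "How does Kong execute plugins across a request's lifecycle phases?" },
];

console.log(missingTiers(lua)); // empty when every tier is covered
```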

+ 9 - 0
CHANGELOG.md

@@ -7,6 +7,15 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged
 This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
 and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+- **Lua**: CodeGraph now indexes Lua (`.lua`) — functions, methods (table `t.f`
+  and `t:m` definitions become methods with a `t::f` receiver-qualified name),
+  local variables, `require(...)` imports, and the call edges between them.
+  Querying a Lua project (Neovim plugins, Kong, OpenResty, game code) now
+  surfaces its modules, methods, and call graph.
+
 ## [0.8.0] - 2026-05-20
 
 ### Added

+ 2 - 1
README.md

@@ -107,7 +107,7 @@ The gains scale with codebase size: on large repos the agent answers from the in
 | **Full-Text Search** | Find code by name instantly across your entire codebase, powered by FTS5 |
 | **Impact Analysis** | Trace callers, callees, and the full impact radius of any symbol before making changes |
 | **Always Fresh** | File watcher uses native OS events (FSEvents/inotify/ReadDirectoryChangesW) with debounced auto-sync — the graph stays current as you code, zero config |
-| **19+ Languages** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Svelte, Liquid, Pascal/Delphi |
+| **19+ Languages** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Lua, Svelte, Liquid, Pascal/Delphi |
 | **Framework-aware Routes** | Recognizes web-framework routing files and links URL patterns to their handlers across 13 frameworks |
 | **100% Local** | No data leaves your machine. No API keys. No external services. SQLite database only |
 
@@ -447,6 +447,7 @@ The `.codegraph/config.json` file controls indexing:
 | Vue | `.vue` | Full support (script + script-setup extraction, Nuxt page/API/middleware routes) |
 | Liquid | `.liquid` | Full support |
 | Pascal / Delphi | `.pas`, `.dpr`, `.dpk`, `.lpr` | Full support (classes, records, interfaces, enums, DFM/FMX form files) |
+| Lua | `.lua` | Full support (functions, methods with receivers, local variables, `require` imports, call edges) |
 
 
 ## Troubleshooting
 

+ 107 - 0
__tests__/extraction.test.ts

@@ -3722,3 +3722,110 @@ class Svc {
     expect(decoratedNode?.name).toBe('method');
   });
 });
+
+// =============================================================================
+// Lua
+// =============================================================================
+
+describe('Lua Extraction', () => {
+  describe('Language detection', () => {
+    it('should detect Lua files', () => {
+      expect(detectLanguage('init.lua')).toBe('lua');
+      expect(detectLanguage('src/util.lua')).toBe('lua');
+    });
+
+    it('should report Lua as supported', () => {
+      expect(isLanguageSupported('lua')).toBe(true);
+      expect(getSupportedLanguages()).toContain('lua');
+    });
+  });
+
+  describe('Function extraction', () => {
+    it('should extract global and local functions', () => {
+      const code = `
+function configure(opts) return opts end
+local function helper(x) return x * 2 end
+`;
+      const result = extractFromSource('init.lua', code);
+      const funcs = result.nodes.filter((n) => n.kind === 'function').map((n) => n.name);
+      expect(funcs).toContain('configure');
+      expect(funcs).toContain('helper');
+      const configure = result.nodes.find((n) => n.name === 'configure');
+      expect(configure?.language).toBe('lua');
+      expect(configure?.signature).toBe('(opts)');
+    });
+
+    it('should split table/method functions into a receiver and method name', () => {
+      const code = `
+function M.connect(host, port) return host end
+function M:send(data) return self end
+`;
+      const result = extractFromSource('init.lua', code);
+      const methods = result.nodes.filter((n) => n.kind === 'method');
+      const connect = methods.find((m) => m.name === 'connect');
+      expect(connect?.qualifiedName).toBe('M::connect');
+      const send = methods.find((m) => m.name === 'send');
+      expect(send?.qualifiedName).toBe('M::send');
+    });
+  });
+
+  describe('Variable extraction', () => {
+    it('should extract local variable declarations', () => {
+      const code = `
+local M = {}
+local count = 0
+`;
+      const result = extractFromSource('mod.lua', code);
+      const vars = result.nodes.filter((n) => n.kind === 'variable').map((n) => n.name);
+      expect(vars).toContain('M');
+      expect(vars).toContain('count');
+    });
+  });
+
+  describe('Import extraction (require)', () => {
+    it('should extract require() in local declarations and bare calls', () => {
+      const code = `
+local socket = require("socket")
+local http = require "resty.http"
+require("side.effect")
+`;
+      const result = extractFromSource('net.lua', code);
+      const imports = result.nodes.filter((n) => n.kind === 'import').map((n) => n.name);
+      expect(imports).toContain('socket');
+      expect(imports).toContain('resty.http');
+      expect(imports).toContain('side.effect');
+
+      const ref = result.unresolvedReferences.find(
+        (r) => r.referenceKind === 'imports' && r.referenceName === 'socket'
+      );
+      expect(ref).toBeDefined();
+    });
+
+    // Regression: the tree-sitter-wasms Lua grammar (ABI 13) corrupts the shared
+    // WASM heap under web-tree-sitter 0.25, dropping nested calls/imports on every
+    // parse after the first. We vendor the ABI-15 grammar instead — this guards it
+    // by extracting several sources in sequence and asserting the LAST still works.
+    it('should keep extracting require across many sequential parses', () => {
+      let last;
+      for (let i = 0; i < 8; i++) {
+        last = extractFromSource(`f${i}.lua`, `local m = require("module.${i}")\nreturn m\n`);
+      }
+      const imports = last!.nodes.filter((n) => n.kind === 'import').map((n) => n.name);
+      expect(imports).toContain('module.7');
+    });
+  });
+
+  describe('Call extraction', () => {
+    it('should record intra-file calls as resolvable references', () => {
+      const code = `
+local function helper(x) return x end
+local function run(y) return helper(y) end
+`;
+      const result = extractFromSource('calls.lua', code);
+      const call = result.unresolvedReferences.find(
+        (r) => r.referenceKind === 'calls' && r.referenceName === 'helper'
+      );
+      expect(call).toBeDefined();
+    });
+  });
+});
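
The receiver test above expects `function M.connect` and `function M:send` to yield `M::connect` and `M::send`. A minimal string-level sketch of that mapping follows, assuming the last `.` or `:` separates the receiver from the name. `splitLuaFunctionName` is a hypothetical helper (the real extractor works on tree-sitter nodes, not strings). Two results are assumptions rather than tested behaviour: the `a.b::c` form for nested tables, and a plain function's qualified name being equal to its name.

```typescript
interface LuaName {
  kind: 'function' | 'method';
  name: string;
  receiver: string | undefined;
  qualifiedName: string;
}

// Hypothetical helper: splits a declared Lua function name into receiver and name.
function splitLuaFunctionName(declared: string): LuaName {
  const idx = Math.max(declared.lastIndexOf('.'), declared.lastIndexOf(':'));
  if (idx < 0) {
    // No table prefix: stays a plain function (qualified name assumed equal to the name).
    return { kind: 'function', name: declared, receiver: undefined, qualifiedName: declared };
  }
  const receiver = declared.slice(0, idx);
  const name = declared.slice(idx + 1);
  return { kind: 'method', name, receiver, qualifiedName: `${receiver}::${name}` };
}

console.log(splitLuaFunctionName('M.connect').qualifiedName);
console.log(splitLuaFunctionName('M:send').qualifiedName);
console.log(splitLuaFunctionName('a.b.c').receiver);
console.log(splitLuaFunctionName('helper').kind);
```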

+ 75 - 0
scripts/add-lang/check-grammar.mjs

@@ -0,0 +1,75 @@
+#!/usr/bin/env node
+// Verify a tree-sitter grammar wasm is HEALTHY under the project's web-tree-sitter
+// runtime BEFORE writing an extractor. Prints the ABI version and parses a valid
+// sample many times in a multi-grammar context, to catch heap-corruption bugs
+// that silently drop nodes on every parse after the first.
+//
+// Why this exists: the tree-sitter-wasms Lua grammar is ABI 13 and corrupts the
+// shared WASM heap under web-tree-sitter 0.25 — Lua extraction degraded on every
+// file after the first (nested calls/imports vanished). The fix was to vendor the
+// upstream ABI-15 wasm. Run this on any new grammar first; if it FAILs, vendor a
+// newer build instead of using the tree-sitter-wasms one.
+//
+// Usage: node scripts/add-lang/check-grammar.mjs <lang|wasm-path> <valid-sample> [iterations]
+// Exit: 0 healthy, 1 corruption / parse errors, 2 could not run.
+// NOTE: the sample must be SYNTACTICALLY VALID — a broken sample fails for the
+//       wrong reason.
+
+import { readFileSync, existsSync } from 'node:fs';
+import { createRequire } from 'node:module';
+import { Parser, Language } from 'web-tree-sitter';
+
+const require = createRequire(import.meta.url);
+const fail = (code, msg) => { console.error(`[check-grammar] ${msg}`); process.exit(code); };
+
+const [token, sample, iterArg] = process.argv.slice(2);
+if (!token || !sample) fail(2, 'usage: check-grammar.mjs <lang|wasm-path> <valid-sample> [iterations]');
+if (!existsSync(sample)) fail(2, `sample not found: ${sample}`);
+const iters = iterArg ? parseInt(iterArg, 10) : 20;
+
+const SPECIAL = { csharp: 'c_sharp', 'c#': 'c_sharp' };
+function resolveWasm(t) {
+  if (t.endsWith('.wasm')) return existsSync(t) ? t : fail(2, `wasm not found: ${t}`);
+  const base = SPECIAL[t.toLowerCase()] ?? t.toLowerCase();
+  try { return require.resolve(`tree-sitter-wasms/out/tree-sitter-${base}.wasm`); } catch { /* try vendored */ }
+  const vendored = `src/extraction/wasm/tree-sitter-${base}.wasm`;
+  if (existsSync(vendored)) return vendored;
+  return fail(2, `no grammar for "${t}" — not in tree-sitter-wasms and not vendored`);
+}
+
+const wasmPath = resolveWasm(token);
+const source = readFileSync(sample, 'utf8');
+
+try { await Parser.init(); }
+catch { await Parser.init({ locateFile: () => require.resolve('web-tree-sitter/tree-sitter.wasm') }); }
+
+// Load a second, known-good grammar — the corruption surfaces under the
+// multi-grammar runtime that real indexing uses, not a single grammar in isolation.
+try { await Language.load(require.resolve('tree-sitter-wasms/out/tree-sitter-python.wasm')); } catch { /* ok */ }
+
+let language;
+try { language = await Language.load(wasmPath); }
+catch (e) { fail(2, `failed to load ${wasmPath}: ${e.message}`); }
+
+const parser = new Parser();
+parser.setLanguage(language);
+
+let ok = 0, err = 0;
+for (let i = 0; i < iters; i++) {
+  const tree = parser.parse(source);
+  if (tree.rootNode.hasError) err++; else ok++;
+}
+
+console.log(`grammar: ${wasmPath.split('/').pop()}`);
+console.log(`  ABI version: ${language.abiVersion}`);
+console.log(`  parses: ${ok} clean / ${err} with errors (of ${iters})`);
+if (err > 0) {
+  console.log(
+    `RESULT: FAIL — ${err}/${iters} parses produced ERROR trees on a valid sample. ` +
+    `This grammar corrupts under web-tree-sitter; vendor a newer (ABI 14/15) wasm ` +
+    `(see SKILL.md "Find a grammar"). Confirm your sample is syntactically valid first.`
+  );
+  process.exit(1);
+}
+console.log('RESULT: PASS — grammar parses cleanly and reuses safely.');
+process.exit(0);
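
The script header documents three exit codes (0 healthy, 1 corruption / parse errors, 2 could not run). A sketch of how a calling script might branch on them is shown here. `interpretCheckGrammarExit` and its verdict labels are hypothetical names, and spawning the script itself is left out so the snippet stays self-contained.

```typescript
type GrammarVerdict = 'healthy' | 'corrupt' | 'could-not-run' | 'unknown';

// Hypothetical mapping from the documented exit codes to a verdict label.
function interpretCheckGrammarExit(status: number | null): GrammarVerdict {
  switch (status) {
    case 0:
      return 'healthy';
    case 1:
      return 'corrupt'; // ERROR trees on a valid sample: vendor a newer wasm
    case 2:
      return 'could-not-run'; // bad arguments, missing sample, or wasm failed to load
    default:
      return 'unknown'; // e.g. null when the child process was killed by a signal
  }
}

console.log(interpretCheckGrammarExit(1));
```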

+ 9 - 2
src/extraction/grammars.ts

@@ -35,6 +35,7 @@ const WASM_GRAMMAR_FILES: Record<GrammarLanguage, string> = {
  dart: 'tree-sitter-dart.wasm',
  pascal: 'tree-sitter-pascal.wasm',
  scala: 'tree-sitter-scala.wasm',
+  lua: 'tree-sitter-lua.wasm',
 };
 
 /**
@@ -78,6 +79,7 @@ export const EXTENSION_MAP: Record<string, Language> = {
  '.fmx': 'pascal',
  '.scala': 'scala',
  '.sc': 'scala',
+  '.lua': 'lua',
 };
 
 /**
@@ -125,8 +127,12 @@ export async function loadGrammarsForLanguages(languages: Language[]): Promise<v
  for (const lang of toLoad) {
    const wasmFile = WASM_GRAMMAR_FILES[lang];
    try {
-      // Pascal and Scala ship their own WASMs (not in tree-sitter-wasms)
-      const wasmPath = (lang === 'pascal' || lang === 'scala')
+      // Some grammars ship their own WASMs (not in tree-sitter-wasms, or the
+      // tree-sitter-wasms build is too old). Lua: tree-sitter-wasms ships an
+      // ABI-13 build that corrupts the shared WASM heap under web-tree-sitter
+      // 0.25 (drops nested calls/imports on every file after the first); we
+      // vendor the upstream ABI-15 wasm instead.
+      const wasmPath = (lang === 'pascal' || lang === 'scala' || lang === 'lua')
        ? path.join(__dirname, 'wasm', wasmFile)
        : require.resolve(`tree-sitter-wasms/out/${wasmFile}`);
      const language = await WasmLanguage.load(wasmPath);
@@ -291,6 +297,7 @@ export function getLanguageDisplayName(language: Language): string {
    liquid: 'Liquid',
    pascal: 'Pascal / Delphi',
    scala: 'Scala',
+    lua: 'Lua',
    unknown: 'Unknown',
  };
   return names[language] || language;
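
The vendored branch in `loadGrammarsForLanguages` decides whether a grammar comes from the bundled `wasm` directory or from `tree-sitter-wasms`. A minimal sketch of that decision follows, written so it touches neither the filesystem nor `require.resolve`. `describeWasmSource`, the `WasmSource` type, and the example base directory are hypothetical and exist only for illustration.

```typescript
// Languages whose wasm is vendored, per the branch in the diff above.
const VENDORED = new Set(['pascal', 'scala', 'lua']);

type WasmSource =
  | { kind: 'vendored'; path: string }
  | { kind: 'package'; specifier: string };

// Hypothetical helper: describes where a grammar would be loaded from.
function describeWasmSource(lang: string, wasmFile: string, baseDir: string): WasmSource {
  return VENDORED.has(lang)
    ? { kind: 'vendored', path: `${baseDir}/wasm/${wasmFile}` }
    : { kind: 'package', specifier: `tree-sitter-wasms/out/${wasmFile}` };
}

console.log(describeWasmSource('lua', 'tree-sitter-lua.wasm', '/app/dist/extraction'));
console.log(describeWasmSource('dart', 'tree-sitter-dart.wasm', '/app/dist/extraction'));
```

Keeping the vendored languages in one set is a design choice of this sketch; the commit itself extends an inline `||` chain.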

+ 2 - 0
src/extraction/languages/index.ts

@@ -23,6 +23,7 @@ import { kotlinExtractor } from './kotlin';
 import { dartExtractor } from './dart';
 import { pascalExtractor } from './pascal';
 import { scalaExtractor } from './scala';
+import { luaExtractor } from './lua';
 
 
 export const EXTRACTORS: Partial<Record<Language, LanguageExtractor>> = {
   typescript: typescriptExtractor,
@@ -43,4 +44,5 @@ export const EXTRACTORS: Partial<Record<Language, LanguageExtractor>> = {
   dart: dartExtractor,
   pascal: pascalExtractor,
   scala: scalaExtractor,
+  lua: luaExtractor,
 };

+ 139 - 0
src/extraction/languages/lua.ts

@@ -0,0 +1,139 @@
+import type { Node as SyntaxNode } from 'web-tree-sitter';
+import { getNodeText, getChildByField } from '../tree-sitter-helpers';
+import type { LanguageExtractor } from '../tree-sitter-types';
+
+// Node names follow the vendored ABI-15 grammar (@tree-sitter-grammars/
+// tree-sitter-lua), NOT the older tree-sitter-wasms build — see grammars.ts.
+
+/** First descendant of a given type (breadth-first), or null. */
+function findDescendant(node: SyntaxNode, type: string): SyntaxNode | null {
+  const queue: SyntaxNode[] = [...node.namedChildren];
+  while (queue.length) {
+    const n = queue.shift()!;
+    if (n.type === type) return n;
+    queue.push(...n.namedChildren);
+  }
+  return null;
+}
+
+/**
+ * If `callNode` is a `require("module")` / `require "module"` call, return the
+ * bare module string; otherwise null. Lua has no import statement — modules are
+ * loaded by calling the global `require`.
+ */
+function requireModule(callNode: SyntaxNode, source: string): string | null {
+  // function_call > name: <callee>, arguments: arguments
+  const name = getChildByField(callNode, 'name');
+  // A dotted/colon callee (e.g. `socket.connect`) is dot/method_index_expression,
+  // never a bare `require`.
+  if (!name || name.type !== 'identifier') return null;
+  if (getNodeText(name, source) !== 'require') return null;
+
+  const args = getChildByField(callNode, 'arguments');
+  if (!args) return null;
+  // `string > content: string_content` gives the bare module name (no quotes).
+  const content = findDescendant(args, 'string_content');
+  if (content) return getNodeText(content, source).trim() || null;
+  // Fallback: a string node without a content child — strip delimiters.
+  const str = findDescendant(args, 'string');
+  if (!str) return null;
+  const mod = getNodeText(str, source)
+    .trim()
+    .replace(/^\[\[/, '')
+    .replace(/\]\]$/, '')
+    .replace(/^["']/, '')
+    .replace(/["']$/, '');
+  return mod || null;
+}
+
+export const luaExtractor: LanguageExtractor = {
+  // function_declaration covers global (`function f`), table (`function t.f`),
+  // method (`function t:m`), and local (`local function f`) forms — the form is
+  // distinguished by the `name:` child (identifier / dot_index_expression /
+  // method_index_expression) and a `local` token, not by separate node types.
+  // Anonymous `function() ... end` (function_definition) has no name and is
+  // captured via its enclosing variable instead.
+  functionTypes: ['function_declaration'],
+  classTypes: [], // Lua has no classes/structs/interfaces/enums — tables are used for everything
+  methodTypes: [],
+  interfaceTypes: [],
+  structTypes: [],
+  enumTypes: [],
+  typeAliasTypes: [],
+  importTypes: [], // `require` is a function_call — handled in visitNode below
+  callTypes: ['function_call'],
+  variableTypes: ['variable_declaration'], // see the `lua` branch in extractVariable
+  nameField: 'name',
+  bodyField: 'body',
+  paramsField: 'parameters',
+
+  getSignature: (node, source) => {
+    const params = getChildByField(node, 'parameters');
+    return params ? getNodeText(params, source) : undefined;
+  },
+
+  // `function t.f()` / `function t:m()` are methods on table `t`: return the
+  // table as the receiver so they extract as methods with a `t::f` qualified
+  // name. Plain `function f()` / `local function f()` have no receiver and stay
+  // functions. (For `a.b.c`, the receiver is the nested `a.b`.)
+  getReceiverType: (node, source) => {
+    const name = getChildByField(node, 'name');
+    if (name && (name.type === 'dot_index_expression' || name.type === 'method_index_expression')) {
+      const table = getChildByField(name, 'table');
+      if (table) return getNodeText(table, source);
+    }
+    return undefined;
+  },
+
+  // Emit import nodes for `require(...)`. The local-declaration form is handled
+  // explicitly because the variable branch skips the initializer subtree; bare
+  // and global `require` calls are caught when the walker reaches the
+  // function_call node.
+  visitNode: (node, ctx) => {
+    const source = ctx.source;
+
+    const emit = (callNode: SyntaxNode): void => {
+      const mod = requireModule(callNode, source);
+      if (!mod) return;
+      const imp = ctx.createNode('import', mod, callNode, {
+        signature: getNodeText(callNode, source).trim().slice(0, 100),
+      });
+      if (imp && ctx.nodeStack.length > 0) {
+        const parentId = ctx.nodeStack[ctx.nodeStack.length - 1];
+        if (parentId) {
+          ctx.addUnresolvedReference({
+            fromNodeId: parentId,
+            referenceName: mod,
+            referenceKind: 'imports',
+            line: callNode.startPosition.row + 1,
+            column: callNode.startPosition.column,
+          });
+        }
+      }
+    };
+
+    // Bare / global `require("x")` — claim it so it isn't double-counted as a call.
+    if (node.type === 'function_call') {
+      if (requireModule(node, source)) {
+        emit(node);
+        return true;
+      }
+      return false;
+    }
+
+    // `local x = require("x")` — variable_declaration wraps an assignment_statement
+    // whose initializer subtree the variable branch will skip, so dig it out here.
+    if (node.type === 'variable_declaration') {
+      const assign = node.namedChildren.find((c) => c.type === 'assignment_statement');
+      const exprList = assign?.namedChildren.find((c) => c.type === 'expression_list');
+      if (exprList) {
+        for (const val of exprList.namedChildren) {
+          if (val.type === 'function_call') emit(val);
+        }
+      }
+      return false;
+    }
+
+    return false;
+  },
+};
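
The fallback path in `requireModule` strips string delimiters with four anchored replacements. A standalone sketch of just that step follows, using the same patterns on plain strings. `stripLuaStringDelimiters` is a hypothetical name; the extractor applies this logic inline to node text.

```typescript
// Mirrors the fallback in requireModule: strip [[ ]] or one pair of quotes.
function stripLuaStringDelimiters(raw: string): string | null {
  const mod = raw
    .trim()
    .replace(/^\[\[/, '')
    .replace(/\]\]$/, '')
    .replace(/^["']/, '')
    .replace(/["']$/, '');
  return mod || null; // an empty module name counts as "not a require"
}

console.log(stripLuaStringDelimiters('"socket"'));
console.log(stripLuaStringDelimiters("'resty.http'"));
console.log(stripLuaStringDelimiters('[[side.effect]]'));
console.log(stripLuaStringDelimiters('""'));
```

These patterns leave leveled long brackets such as `[==[x]==]` untouched, so a fallback string in that form would keep its delimiters.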

+ 28 - 0
src/extraction/tree-sitter.ts

@@ -50,6 +50,17 @@ function extractName(node: SyntaxNode, source: string, extractor: LanguageExtrac
      const innerName = getChildByField(resolved, 'declarator') || resolved.namedChild(0);
      return innerName ? getNodeText(innerName, source) : getNodeText(resolved, source);
    }
+    // Lua: `function t.f()` / `function t:m()` — the name node is a dot/method
+    // index expression; the simple name is the trailing field/method (the table
+    // receiver is captured separately via getReceiverType).
+    if (resolved.type === 'dot_index_expression') {
+      const field = getChildByField(resolved, 'field');
+      if (field) return getNodeText(field, source);
+    }
+    if (resolved.type === 'method_index_expression') {
+      const method = getChildByField(resolved, 'method');
+      if (method) return getNodeText(method, source);
+    }
    return getNodeText(resolved, source);
  }
 
@@ -1111,6 +1122,23 @@ export class TreeSitterExtractor {
          }
        }
      }
+    } else if (this.language === 'lua') {
+      // Lua: variable_declaration → assignment_statement → variable_list
+      //      (name: identifier...) = expression_list. `local x, y = 1, 2`
+      //      declares multiple names; only plain identifiers are locals.
+      const assign = node.namedChildren.find((c) => c.type === 'assignment_statement') ?? node;
+      const varList = assign.namedChildren.find((c) => c.type === 'variable_list');
+      const exprList = assign.namedChildren.find((c) => c.type === 'expression_list');
+      const values = exprList ? exprList.namedChildren : [];
+      const names = varList ? varList.namedChildren.filter((c) => c.type === 'identifier') : [];
+      names.forEach((nameNode, i) => {
+        const name = getNodeText(nameNode, this.source);
+        if (!name) return;
+        const valueNode = values[i];
+        const initValue = valueNode ? getNodeText(valueNode, this.source).slice(0, 100) : undefined;
+        const initSignature = initValue ? `= ${initValue}${initValue.length >= 100 ? '...' : ''}` : undefined;
+        this.createNode(kind, name, nameNode, { docstring, signature: initSignature, isExported });
+      });
    } else {
      // Generic fallback for other languages
       // Try to find identifier children
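
The Lua branch added to `extractVariable` pairs each declared name with the initializer at the same position and cuts the initializer text to 100 characters. A string-level sketch of that pairing follows. `pairLocals` and `LocalDecl` are hypothetical names, and the real code reads names and values from `variable_list` / `expression_list` nodes rather than from string arrays.

```typescript
interface LocalDecl {
  name: string;
  signature: string | undefined;
}

// Hypothetical string-level version of the Lua branch: names and values are
// paired by index, and each initializer is truncated to 100 characters.
function pairLocals(names: string[], values: string[]): LocalDecl[] {
  const decls: LocalDecl[] = [];
  names.forEach((name, i) => {
    if (!name) return; // skip empty names, as the branch does
    const value = values[i];
    const init = value ? value.slice(0, 100) : undefined;
    const signature = init ? `= ${init}${init.length >= 100 ? '...' : ''}` : undefined;
    decls.push({ name, signature });
  });
  return decls;
}

// `local x, y, z = 1, 2` declares three names but only two initializers.
console.log(pairLocals(['x', 'y', 'z'], ['1', '2']));
```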

BIN
src/extraction/wasm/tree-sitter-lua.wasm


+ 3 - 0
src/types.ts

@@ -85,6 +85,7 @@ export const LANGUAGES = [
  'liquid',
  'pascal',
  'scala',
+  'lua',
  'unknown',
 ] as const;
 
@@ -545,6 +546,8 @@ export const DEFAULT_CONFIG: CodeGraphConfig = {
    // Scala
    '**/*.scala',
    '**/*.sc',
+    // Lua
+    '**/*.lua',
  ],
  exclude: [
     // Version control
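
`DEFAULT_CONFIG.include` acts as the file-scan allowlist, so a language whose glob is missing is never scanned even when detection and extraction are wired. A minimal sketch of that filtering follows, assuming globs of the simple `**/*.<ext>` form only. `matchesInclude` is a hypothetical helper for illustration and is not CodeGraph's actual matcher.

```typescript
// Hypothetical illustration: only handles globs shaped like '**/*.<ext>'.
function matchesInclude(filePath: string, include: string[]): boolean {
  const prefix = '**/*';
  return include.some((glob) => {
    if (!glob.startsWith(prefix)) return false; // other glob shapes are out of scope here
    const suffix = glob.slice(prefix.length); // e.g. '.lua'
    return suffix.length > 0 && filePath.endsWith(suffix);
  });
}

const withoutLua = ['**/*.scala', '**/*.sc'];
const withLua = [...withoutLua, '**/*.lua'];

const files = ['init.lua', 'src/util.lua', 'build.sc'];
const scannedBefore = files.filter((f) => matchesInclude(f, withoutLua));
const scannedAfter = files.filter((f) => matchesInclude(f, withLua));

console.log(scannedBefore); // the Lua files are skipped without the glob
console.log(scannedAfter);
```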