3 Commits ba209d9489 ... a89315645d

Author SHA1 Message Date
  Colby Mchenry a89315645d feat(go): index GoFrame g.Meta routes and bind them to controller methods (#747) (#957) 15 hours ago
  Colby Mchenry 6459ead6aa fix(extraction): index files reached through in-root symlinks that point outside the repo (#935) (#956) 15 hours ago
  Colby Mchenry d1121e46f0 feat(config): map custom file extensions to languages via codegraph.json (#906) (#955) 17 hours ago

+ 3 - 0
CHANGELOG.md

@@ -23,6 +23,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - `codegraph_explore` now follows **Laravel events** in PHP. An `event(new OrderShipped($order))` call now links to every listener that handles it — each listener's `handle()` method, usually a separate `app/Listeners/` class — so "what reacts to this event?" traces from the dispatch straight into the listener bodies. Listeners are found both ways Laravel registers them: by a typed `handle(OrderShipped $event)` (auto-discovery, including a `handle(A|B $event)` union that listens for two events) and by the `protected $listen` map in your `EventServiceProvider` (which also catches a listener whose `handle()` has no type-hint). One event fans out to all its listeners, and queued jobs — dispatched via `::dispatch()` rather than `event()` — are correctly left out.
 - CodeGraph now understands **Lombok**-generated methods in Java. `@Getter`, `@Setter`, `@Data`, `@Value`, and `@Builder` generate getters, setters, `builder()`, `equals`/`hashCode`/`toString`, and the `@Slf4j` `log` field at compile time, so those methods never appear in the source — and a `user.getName()`, `User.builder()`, or `log.info(...)` call used to resolve to nothing, silently breaking call-chain analysis (the agent would conclude the method didn't exist and reconstruct it by hand). Those members are now indexed from the annotations and fields, so they appear in `codegraph search` and `codegraph_explore`/`codegraph_node`, and callers trace through them like any hand-written method. They're marked as Lombok-generated so they read as generated, not hand-written; a method you write yourself is never overridden, static fields get no accessor, and a class without Lombok is unaffected. Thanks @git87663849. (#912)
 - `codegraph_explore` now follows **C and C++ function-pointer dispatch**. C does polymorphism with function pointers: a struct carries a function-pointer field, concrete functions are registered into it through a table (`static struct cmd commands[] = {{"add", cmd_add}, …}`), a designated initializer (`.handler = on_open`), or an assignment, and the code dispatches indirectly (`p->fn(argv)`). None of that was visible to analysis — the indirect call resolved to nothing, so `git`'s command runner looked like it called nothing and a vtable's implementations had no callers. CodeGraph now links the dispatch site to the registered handlers, keyed by the struct field, so "what runs when this dispatches?" traces from `p->fn(...)` into every function registered for that field. This covers the command-table idiom (git, redis) and the ops-struct/vtable idiom (curl's content-encoders, protocol handlers), including the case where a generic hook slot is reassigned from a registry (`h->func = found->fn`). It stays precise — distinct function-pointer fields don't cross-link, a plain data field is never treated as a dispatch, and a project without function-pointer dispatch is unaffected. (#932)
+- `codegraph_explore` now follows **GoFrame** route bindings in Go. GoFrame's standard router wires routes reflectively: the path and method live in a `g.Meta` struct tag on a request type (`` g.Meta `path:"/user/sign-in" method:"post"` ``), the controller method that serves it is matched by that request type, and the two are joined at runtime by `group.Bind(...)` — so there was no path string and no edge from a route to its handler, and "where is `/user/sign-in` handled?" or "where are the routes bound to controllers?" could only be answered by reading. CodeGraph now indexes each `g.Meta` route as a real route node and links it to the controller method whose signature takes that request type, so a route resolves to its handler structurally in one `codegraph_explore` call. The link is by request type, not method name — so it's correct even when the two differ (a `DeptSearchReq` served by a `List` method); it tells apart the many identical request types a large app defines one-per-module (`cash.ListReq` vs `order.ListReq`) by package, including cloned addon modules; and a route whose handler isn't present is left unlinked rather than guessed. (#747)
 
 - `codegraph_explore` now surfaces the right code in large multi-layer projects. When you ask a backend-flow question in a repo that pairs an API server with a big frontend that mirrors the same domain words — say an `app/` admin UI sitting over an `api/` server — the server-side file that genuinely matches several of your query's terms is no longer pushed out of the results by the larger, more interconnected frontend layer. A file corroborated by two or more distinct query terms is now kept in the answer even when a denser unrelated layer would otherwise crowd it out, so "how does X read items / handle the request" returns the service or handler that does the work instead of a wall of frontend views. Single-layer projects are unaffected; set `CODEGRAPH_RANK_NO_MULTITERM=1` to revert to the previous ranking.
 - Impact and blast-radius analysis for TypeScript, JavaScript, Go, Python, Rust, Ruby, C, Java, C#, PHP, Scala, Kotlin, Swift, Dart, and Pascal/Delphi now understands the readers of a constant. When you change a file-scope, package-level, module-level, or class-level constant — a config object, a lookup table, a shared constant — the other symbols in that file that read it now show up as affected, where before they were invisible (impact only followed calls, imports, and inheritance, so a constant's consumers looked like "nothing depends on this"). This makes `codegraph impact`, and the impact trail in `codegraph_explore`/`codegraph_node`, catch the "change this table, break its readers" class of change. It's on by default and adds no nodes to your graph; bundled/minified files and ambiguously-shadowed names are skipped to keep results precise. Set `CODEGRAPH_VALUE_REFS=0` to turn it off.
@@ -30,6 +31,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Java `static final` constants, C# `const` / `static readonly` constants, Scala `object` vals, and Kotlin top-level / `object` / `companion object` `val`s are now classified as constants rather than generic fields, so they participate in the constant-reader impact analysis above — change a `public static final` table, a `const string`, a Scala `object Config { val Timeout = … }`, or a Kotlin `companion object { const val … }` and the methods that read it now show up as affected. (Per-object Java `final` / C# `readonly` / Scala & Kotlin `class` instance properties are unchanged.) Kotlin constants were previously not indexed as their own symbols at all, so they now also appear in `codegraph search`.
 - Swift top-level `let`s and `static let` constants (including those namespaced in an `enum`/`struct`, the common Swift pattern) are now indexed as constants and participate in the constant-reader impact analysis above — change a `static let defaultRetryLimit` or an `enum Constants { static let … }` and the same-file code that reads it shows up as affected. Computed properties and per-instance `let`s are not treated as constants.
 - Dart top-level `const`/`final` and class `static const`/`static final` constants are now indexed as constants and participate in the constant-reader impact analysis above. Instance fields, `var`s, and locals are not treated as constants. (Generated Dart code with the standard `.g.dart`/`.freezed.dart`/`.pb.dart` suffixes is already skipped.)
+- You can now teach CodeGraph about custom file extensions. Drop a `codegraph.json` at your repo root with an `extensions` map — `{ "extensions": { ".dota_lua": "lua", ".tpl": "php" } }` — and files with those extensions get indexed under the language you name, instead of being silently skipped because the extension wasn't one of the built-in defaults. It's opt-in and committed alongside your code so the whole team shares it, your mappings layer on top of the built-ins and win on conflict (you can even re-point a built-in, e.g. `.h` → `cpp`), and a typo'd language or a malformed config is warned about and skipped rather than breaking indexing. Projects without a `codegraph.json` behave exactly as before. (#906)
 
 ### Fixes
 
@@ -48,6 +50,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - A git worktree of a submodule is no longer indexed as a duplicate copy of that submodule's code. CodeGraph already skips ordinary worktrees (a second working view of a repo it indexes), but a worktree created *from a submodule* — common in monorepos that check submodules out into worktrees for parallel feature work — was mistaken for a genuine embedded repo and swept in, duplicating every symbol it shared with the real submodule checkout (one report had ~28% of its index as duplicates, inflating both query results and the database). These submodule worktrees are now recognized and skipped, while the submodule's own checkout stays indexed as distinct code. Thanks @charlesxu2026-ship-it. (#945)
 - A C++ class or struct annotated with an export/visibility macro — `class MYLIB_EXPORT Foo : public Bar { … }`, the common DLL-export / visibility pattern in headers — is no longer mis-indexed as a function spanning the whole class body. Not knowing the macro is a macro, the parser read it as a type and the whole declaration as a function, so the class surfaced as a phantom `function` that showed up as a false caller in `codegraph callers`, `codegraph impact`, and blast-radius analysis, and skewed symbol counts. CodeGraph now recognizes this misparse and drops the bogus node. Thanks @spwlyzx. (#946)
 - `codegraph status` no longer reports phantom pending changes for files CodeGraph deliberately keeps out of the index — a tracked file inside a committed dependency dir (a checked-in `vendor/` or `node_modules/`), or a tracked file under a `.gitignore`d directory. A full index correctly excludes these, and `codegraph sync` never indexes them, but the fast change-detector behind `codegraph status` flagged every edit to such a file as "modified" (and a new one as "added") — so the pending-changes count never cleared no matter how many times you synced. Change detection now applies the same ignore rules the full index does, so `status` agrees with `sync`, and tools built on CodeGraph's change-detection API get the same corrected list. (#766)
+- Files reached through a symlinked directory that points outside your project now index instead of being silently skipped. When a folder in your repo is a symlink to a location outside the repo root — the standard layout for Dota 2 custom games and similar SDK-linked projects, where `game/` and `content/` link into the engine tree — CodeGraph followed the symlink to discover those files but then blocked every one of them while reading, logging `Path traversal blocked in batch reader` and indexing nothing under them. The reader now agrees with the directory scan and indexes those files. The guard against `../` path escapes is unchanged, and the protection that keeps your agent from being served file contents from outside the repo is also unchanged — only the indexer's own read path follows these in-repo symlinks. (#935)
 
 
 ## [1.0.1] - 2026-06-13

+ 27 - 3
README.md

@@ -585,9 +585,10 @@ that drive the graph directly: `DatabaseConnection`, `QueryBuilder`,
 
 ## Configuration
 
-There isn't any — CodeGraph is zero-config, with **no config file** to write or
-keep in sync. Language support is automatic from the file extension; there's
-nothing to wire up per language.
+Next to none — CodeGraph is **zero-config by default**, with nothing to write or
+keep in sync to get started. Language support is automatic from the file
+extension; there's nothing to wire up per language. The one optional file is for
+mapping [custom file extensions](#custom-file-extensions).
 
 What it skips out of the box:
 
@@ -605,6 +606,29 @@ add a negation — `!vendor/`. The defaults apply uniformly, so committing a
 dependency or build directory doesn't force it into the graph; the `.gitignore`
 negation is the explicit opt-in.
 
+### Custom file extensions
+
+If your project uses a non-standard extension for a [supported
+language](#supported-languages) — say `.dota_lua` for Lua, or `.tpl` for PHP —
+those files are skipped by default, because the extension isn't one CodeGraph
+recognizes. Map them with an optional **`codegraph.json`** at your project root:
+
+```json
+{
+  "extensions": {
+    ".dota_lua": "lua",
+    ".tpl": "php"
+  }
+}
+```
+
+Each value is a supported language id. The mappings merge on top of the built-in
+defaults and win on conflict, so you can also re-point a built-in (e.g.
+`".h": "cpp"`). Commit the file to share the mapping with your team. A typo'd
+language or a malformed file is warned about and skipped — it never breaks
+indexing — and a project with no `codegraph.json` behaves exactly as before.
+Re-index (`codegraph index`) after adding or changing mappings.
+
 ## Telemetry
 
 CodeGraph collects **anonymous usage statistics** — which tools and commands get

+ 157 - 0
__tests__/extension-mapping.test.ts

@@ -0,0 +1,157 @@
+/**
+ * Custom extension → language mapping (#906).
+ *
+ * A project can map non-standard file extensions to a supported language via a
+ * committed `codegraph.json` at the repo root, so files that would otherwise be
+ * silently skipped get indexed under the right grammar. These tests cover the
+ * two choke-point functions (detectLanguage / isSourceFile) honoring an override
+ * map, the loader's validation/normalization/caching of `codegraph.json`, and a
+ * full index proving a custom-extension file is actually extracted — while the
+ * zero-config path stays byte-identical (the file is NOT indexed without config).
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'node:fs';
+import * as path from 'node:path';
+import * as os from 'node:os';
+import { CodeGraph } from '../src';
+import { detectLanguage, isSourceFile } from '../src/extraction/grammars';
+import { loadExtensionOverrides, clearProjectConfigCache } from '../src/project-config';
+
+describe('custom extension → language mapping (#906)', () => {
+  describe('detectLanguage / isSourceFile overrides argument', () => {
+    it('maps a custom extension only when present in the overrides', () => {
+      expect(detectLanguage('a/b.foo')).toBe('unknown');
+      expect(isSourceFile('a/b.foo')).toBe(false);
+
+      expect(detectLanguage('a/b.foo', undefined, { '.foo': 'typescript' })).toBe('typescript');
+      expect(isSourceFile('a/b.foo', { '.foo': 'typescript' })).toBe(true);
+    });
+
+    it('lets a user mapping take precedence over a built-in extension', () => {
+      expect(detectLanguage('x.h')).toBe('c');
+      expect(detectLanguage('x.h', undefined, { '.h': 'cpp' })).toBe('cpp');
+    });
+
+    it('is byte-identical to zero-config behavior when no overrides are passed', () => {
+      expect(detectLanguage('x.ts')).toBe('typescript');
+      expect(detectLanguage('x.py')).toBe('python');
+      expect(isSourceFile('x.ts')).toBe(true);
+      expect(isSourceFile('x.unknownext')).toBe(false);
+    });
+  });
+
+  describe('loadExtensionOverrides (codegraph.json)', () => {
+    let dir: string;
+    beforeEach(() => {
+      dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-extmap-'));
+      clearProjectConfigCache();
+    });
+    afterEach(() => {
+      clearProjectConfigCache();
+      fs.rmSync(dir, { recursive: true, force: true });
+    });
+    const writeConfig = (obj: unknown) =>
+      fs.writeFileSync(
+        path.join(dir, 'codegraph.json'),
+        typeof obj === 'string' ? obj : JSON.stringify(obj)
+      );
+
+    it('returns an empty map when there is no codegraph.json', () => {
+      expect(loadExtensionOverrides(dir)).toEqual({});
+    });
+
+    it('loads and validates a well-formed extensions map', () => {
+      writeConfig({ extensions: { '.foo': 'typescript', '.bar': 'python' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'typescript', '.bar': 'python' });
+    });
+
+    it('normalizes keys (adds a leading dot, lowercases)', () => {
+      writeConfig({ extensions: { foo: 'lua', '.BAR': 'go' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'lua', '.bar': 'go' });
+    });
+
+    it('skips entries whose target is not a supported language', () => {
+      writeConfig({ extensions: { '.foo': 'typescript', '.bad': 'pyhton', '.x': 'unknown' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'typescript' });
+    });
+
+    it('skips multi-part and otherwise unusable extension keys', () => {
+      writeConfig({ extensions: { '.d.ts': 'typescript', 'a/b': 'go', '.': 'lua', '.ok': 'rust' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.ok': 'rust' });
+    });
+
+    it('ignores malformed JSON without throwing', () => {
+      writeConfig('{ not: valid json ');
+      expect(loadExtensionOverrides(dir)).toEqual({});
+    });
+
+    it('ignores a non-object extensions field', () => {
+      writeConfig({ extensions: 'nope' });
+      expect(loadExtensionOverrides(dir)).toEqual({});
+    });
+
+    it('picks up a changed config (mtime-invalidated cache)', () => {
+      writeConfig({ extensions: { '.foo': 'typescript' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'typescript' });
+
+      writeConfig({ extensions: { '.foo': 'go' } });
+      // Force a distinct mtime in case the filesystem clock is coarse.
+      const future = new Date(Date.now() + 2000);
+      fs.utimesSync(path.join(dir, 'codegraph.json'), future, future);
+
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'go' });
+    });
+  });
+
+  describe('indexAll honors codegraph.json end-to-end', () => {
+    let dir: string;
+    beforeEach(() => {
+      dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-extmap-idx-'));
+      clearProjectConfigCache();
+    });
+    afterEach(() => {
+      clearProjectConfigCache();
+      fs.rmSync(dir, { recursive: true, force: true });
+    });
+    const write = (rel: string, body: string) => {
+      const p = path.join(dir, rel);
+      fs.mkdirSync(path.dirname(p), { recursive: true });
+      fs.writeFileSync(p, body);
+    };
+    const indexAndQuery = async () => {
+      const cg = await CodeGraph.init(dir, { silent: true });
+      await cg.indexAll();
+      const db = (cg as any).db.db;
+      const nodes = db
+        .prepare('SELECT name, kind, file_path, language FROM nodes WHERE file_path = ?')
+        .all('widget.foo');
+      const files = db
+        .prepare('SELECT path, language FROM files WHERE path = ?')
+        .all('widget.foo');
+      cg.close?.();
+      return { nodes, files };
+    };
+
+    const SOURCE = 'export function widgetHandler(x: number): number { return x + 1; }\n';
+
+    it('indexes a custom-extension file mapped to a supported language', async () => {
+      write('codegraph.json', JSON.stringify({ extensions: { '.foo': 'typescript' } }));
+      write('widget.foo', SOURCE);
+
+      const { nodes, files } = await indexAndQuery();
+
+      expect(files.length).toBe(1);
+      expect(files[0].language).toBe('typescript');
+      expect(nodes.some((n: any) => n.name === 'widgetHandler' && n.language === 'typescript')).toBe(true);
+    });
+
+    it('does NOT index the same file without codegraph.json (zero-config preserved)', async () => {
+      write('widget.foo', SOURCE);
+
+      const { nodes, files } = await indexAndQuery();
+
+      expect(files.length).toBe(0);
+      expect(nodes.length).toBe(0);
+    });
+  });
+});
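
The loader tests above pin down a small set of rules for `codegraph.json`: keys gain a leading dot and are lowercased, multi-part or path-like keys are dropped, unsupported language ids are dropped, and a user mapping wins over a built-in. A minimal sketch of those rules follows. It is not the code in `src/project-config` or `src/extraction/grammars`; `normalizeExtensions`, `detectLanguageSketch`, `SUPPORTED`, and `BUILT_IN` are hypothetical names, and the language and extension sets are assumed subsets chosen for illustration.

```typescript
// Hypothetical sketch of the rules the tests describe, not the project's loader.
type ExtensionMap = Record<string, string>;

// Assumed subset of supported language ids, for illustration only.
const SUPPORTED = new Set(['typescript', 'python', 'lua', 'go', 'rust', 'php', 'c', 'cpp']);

// Assumed subset of built-in extension defaults, for illustration only.
const BUILT_IN: ExtensionMap = { '.ts': 'typescript', '.py': 'python', '.h': 'c' };

function normalizeExtensions(raw: unknown, supported: Set<string> = SUPPORTED): ExtensionMap {
  const out: ExtensionMap = {};
  // A non-object `extensions` field yields an empty map instead of throwing.
  if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) return out;
  for (const [key, value] of Object.entries(raw as Record<string, unknown>)) {
    // Skip targets that are not a known language id (for example a typo).
    if (typeof value !== 'string' || !supported.has(value)) continue;
    // Add a leading dot if missing, then lowercase.
    const ext = (key.startsWith('.') ? key : `.${key}`).toLowerCase();
    // Accept exactly one dot followed by characters that are not dots or path separators.
    if (!/^\.[^./\\]+$/.test(ext)) continue;
    out[ext] = value;
  }
  return out;
}

function detectLanguageSketch(file: string, overrides: ExtensionMap = {}): string {
  const dot = file.lastIndexOf('.');
  const ext = dot === -1 ? '' : file.slice(dot).toLowerCase();
  // User overrides are consulted first, so they win over the built-in defaults.
  return overrides[ext] ?? BUILT_IN[ext] ?? 'unknown';
}
```

With these assumptions, `normalizeExtensions({ foo: 'lua', '.BAR': 'go', '.d.ts': 'typescript' })` keeps only `.foo` and `.bar`, and `detectLanguageSketch('x.h', { '.h': 'cpp' })` returns `'cpp'` where the no-override call returns `'c'`.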

+ 72 - 0
__tests__/frameworks.test.ts

@@ -944,6 +944,78 @@ describe('goResolver.extract', () => {
   });
 });
 
+import { goframeResolver } from '../src/resolution/frameworks/goframe';
+
+describe('goframeResolver', () => {
+  it('detects GoFrame from a gogf/gf dependency in go.mod', () => {
+    const ctx: any = {
+      readFile: (f: string) =>
+        f === 'go.mod' ? 'module example.com/app\nrequire github.com/gogf/gf/v2 v2.7.0\n' : null,
+    };
+    expect(goframeResolver.detect(ctx)).toBe(true);
+    const noGf: any = { readFile: (f: string) => (f === 'go.mod' ? 'module example.com/app\n' : null) };
+    expect(goframeResolver.detect(noGf)).toBe(false);
+  });
+
+  it('extracts a route node from a g.Meta request struct (method upper-cased)', () => {
+    const src = `package v1
+import "github.com/gogf/gf/v2/frame/g"
+type SignInReq struct {
+	g.Meta   \`path:"/user/sign-in" method:"post" tags:"User" summary:"Sign in"\`
+	Passport string
+}
+type SignInRes struct{}
+`;
+    const { nodes } = goframeResolver.extract!('api/user/v1/user_sign_in.go', src);
+    expect(nodes).toHaveLength(1);
+    expect(nodes[0].kind).toBe('route');
+    expect(nodes[0].name).toBe('POST /user/sign-in');
+    // The package-qualified request type is encoded for the synthesizer join.
+    expect(nodes[0].qualifiedName).toContain('::goframe-route:v1.SignInReq');
+  });
+
+  it('is independent of g.Meta tag attribute order', () => {
+    const src = `type DeptSearchReq struct {
+	g.Meta \`path:"/dept/list" tags:"Dept" method:"get" summary:"列表"\`
+}`;
+    const { nodes } = goframeResolver.extract!('api/system/dept.go', src);
+    expect(nodes[0].name).toBe('GET /dept/list');
+    expect(nodes[0].qualifiedName).toContain('::goframe-route:DeptSearchReq');
+  });
+
+  it('skips a response g.Meta that has no path (mime-only) and other non-route metadata', () => {
+    const src = `type ListRes struct {
+	g.Meta \`mime:"application/json"\`
+	Items []string
+}`;
+    const { nodes } = goframeResolver.extract!('api/x.go', src);
+    expect(nodes).toHaveLength(0);
+  });
+
+  it('defaults method to ANY when method: is omitted', () => {
+    const src = `type PingReq struct {
+	g.Meta \`path:"/ping"\`
+}`;
+    const { nodes } = goframeResolver.extract!('api/ping.go', src);
+    expect(nodes[0].name).toBe('ANY /ping');
+  });
+
+  it('extracts every request struct in a multi-route api file', () => {
+    const src = `type DeptListReq struct { g.Meta \`path:"/dept/list" method:"get"\` }
+type DeptListRes struct { g.Meta \`mime:"application/json"\` }
+type DeptAddReq struct { g.Meta \`path:"/dept/add" method:"post"\` }
+type DeptAddRes struct {}
+`;
+    const { nodes } = goframeResolver.extract!('api/dept.go', src);
+    expect(nodes.map((n) => n.name).sort()).toEqual(['GET /dept/list', 'POST /dept/add']);
+  });
+
+  it('returns nothing for a non-go file or a file without g.Meta', () => {
+    expect(goframeResolver.extract!('main.ts', 'const x = 1').nodes).toHaveLength(0);
+    expect(goframeResolver.extract!('main.go', 'package main\nfunc main() {}\n').nodes).toHaveLength(0);
+  });
+});
+
 import { rustResolver } from '../src/resolution/frameworks/rust';
 
 describe('rustResolver.extract', () => {
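
The `goframeResolver` tests above describe how a `g.Meta` struct tag turns into a route name: the `path` attribute is required, `method` is upper-cased and defaults to `ANY`, attribute order does not matter, and a `mime`-only tag produces no route. The sketch below reproduces just that naming rule with regular expressions. It is not the resolver in `src/resolution/frameworks/goframe`; `extractRoutesSketch` and `RouteSketch` are hypothetical names, and the sketch assumes `g.Meta` is the first field of the struct.

```typescript
// Hypothetical sketch of the tag-to-route naming rule, not the project's resolver.
interface RouteSketch {
  requestType: string; // bare Go type name, e.g. "DeptSearchReq"
  name: string;        // e.g. "GET /dept/list"
}

function extractRoutesSketch(source: string): RouteSketch[] {
  const routes: RouteSketch[] = [];
  // Assumption: g.Meta is the first field, so the tag directly follows "struct {".
  const structRe = /type\s+(\w+)\s+struct\s*\{\s*g\.Meta\s*`([^`]*)`/g;
  let m: RegExpExecArray | null;
  while ((m = structRe.exec(source)) !== null) {
    const requestType = m[1];
    const tag = m[2];
    // Attributes are looked up independently, so their order in the tag is irrelevant.
    const path = /(?:^|\s)path:"([^"]*)"/.exec(tag)?.[1];
    if (!path) continue; // a mime-only (response) g.Meta declares no route
    const method = (/(?:^|\s)method:"([^"]*)"/.exec(tag)?.[1] ?? 'ANY').toUpperCase();
    routes.push({ requestType, name: `${method} ${path}` });
  }
  return routes;
}
```

A real extractor would more likely work from the parsed syntax tree than from regular expressions; the regex form is used here only to keep the rule visible in a few lines.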

+ 181 - 0
__tests__/goframe.test.ts

@@ -0,0 +1,181 @@
+/**
+ * GoFrame route → controller-method coverage (#747), end to end.
+ *
+ * GoFrame binds routes reflectively, so the route declared in a request type's
+ * `g.Meta` tag has no static edge to the controller method that serves it, and
+ * the method name is NOT derivable from the request type (`DeptSearchReq` is
+ * served by `List`). This indexes a fixture through the full pipeline and
+ * checks: the `g.Meta` tags become route nodes; each route joins to its handler
+ * by the request type in the method signature (the naming-mismatch case
+ * included); a response (`mime`-only) `g.Meta` makes no route; a route with no
+ * handler is left unlinked (silent beats wrong); and the response type never
+ * produces a spurious edge.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'node:fs';
+import * as path from 'node:path';
+import * as os from 'node:os';
+import { CodeGraph } from '../src';
+
+describe('GoFrame route synthesizer', () => {
+  let dir: string;
+  beforeEach(() => { dir = fs.mkdtempSync(path.join(os.tmpdir(), 'goframe-')); });
+  afterEach(() => { fs.rmSync(dir, { recursive: true, force: true }); });
+
+  it('joins each g.Meta route to its controller method by the request-type signature', async () => {
+    fs.writeFileSync(path.join(dir, 'go.mod'), 'module example.com/app\n\nrequire github.com/gogf/gf/v2 v2.7.0\n');
+
+    fs.mkdirSync(path.join(dir, 'api', 'system'), { recursive: true });
+    fs.writeFileSync(
+      path.join(dir, 'api', 'system', 'dept.go'),
+      `package system
+
+import "github.com/gogf/gf/v2/frame/g"
+
+type DeptSearchReq struct {
+	g.Meta   \`path:"/dept/list" tags:"Dept" method:"get" summary:"list"\`
+	DeptName string
+}
+type DeptSearchRes struct {
+	g.Meta \`mime:"application/json"\`
+	List   []string
+}
+
+type DeptAddReq struct {
+	g.Meta \`path:"/dept/add" method:"post"\`
+	Name   string
+}
+type DeptAddRes struct{}
+
+// A declared route whose handler does not exist in this codebase.
+type OrphanReq struct {
+	g.Meta \`path:"/orphan" method:"get"\`
+}
+type OrphanRes struct{}
+`
+    );
+
+    fs.mkdirSync(path.join(dir, 'internal', 'controller'), { recursive: true });
+    fs.writeFileSync(
+      path.join(dir, 'internal', 'controller', 'dept.go'),
+      `package controller
+
+import (
+	"context"
+
+	"example.com/app/api/system"
+)
+
+type sysDeptController struct{}
+
+// NB: method name (List) differs from the request type (DeptSearchReq) — the join
+// must be by signature, not name.
+func (c *sysDeptController) List(ctx context.Context, req *system.DeptSearchReq) (res *system.DeptSearchRes, err error) {
+	return helper(ctx)
+}
+
+func (c *sysDeptController) Add(ctx context.Context, req *system.DeptAddReq) (res *system.DeptAddRes, err error) {
+	return
+}
+
+// Returns the response type but takes no request type — must NOT be linked.
+func helper(ctx context.Context) (res *system.DeptSearchRes, err error) {
+	return
+}
+`
+    );
+
+    const cg = await CodeGraph.init(dir, { silent: true });
+    await cg.indexAll();
+    const db = (cg as any).db.db;
+
+    const routes = db.prepare(`SELECT name FROM nodes WHERE kind='route' ORDER BY name`).all();
+    const edges = db
+      .prepare(
+        `SELECT json_extract(e.metadata,'$.route') route, json_extract(e.metadata,'$.requestType') reqType,
+                e.kind, t.name target_name, t.kind target_kind
+         FROM edges e JOIN nodes t ON t.id = e.target
+         WHERE json_extract(e.metadata,'$.synthesizedBy') = 'goframe-route'
+         ORDER BY route`
+      )
+      .all();
+    cg.close?.();
+
+    // Three routes from path-bearing g.Meta; the mime-only response g.Meta makes none.
+    expect(routes.map((r: any) => r.name)).toEqual(['GET /dept/list', 'GET /orphan', 'POST /dept/add']);
+
+    // Two route→handler edges — the orphan route stays unlinked (silent beats wrong).
+    expect(edges).toHaveLength(2);
+    const byRoute = Object.fromEntries(edges.map((e: any) => [e.route, e]));
+
+    // Naming mismatch resolved by signature: GET /dept/list → List.
+    expect(byRoute['GET /dept/list'].target_name).toBe('List');
+    expect(byRoute['GET /dept/list'].reqType).toBe('DeptSearchReq');
+    expect(byRoute['POST /dept/add'].target_name).toBe('Add');
+
+    // It is a dynamic-dispatch `calls` hop to a real method, never to the helper.
+    expect(edges.every((e: any) => e.kind === 'calls' && e.target_kind === 'method')).toBe(true);
+    expect(edges.some((e: any) => e.target_name === 'helper')).toBe(false);
+    expect(byRoute['GET /orphan']).toBeUndefined();
+  });
+
+  it('disambiguates identical bare request types across modules by package qualifier', async () => {
+    fs.writeFileSync(path.join(dir, 'go.mod'), 'module example.com/app\n\nrequire github.com/gogf/gf/v2 v2.7.0\n');
+
+    // Two modules that BOTH define `type ListReq struct` — the collision a large
+    // GoFrame app has dozens of. The package qualifier in the handler signature
+    // (`*cash.ListReq` vs `*order.ListReq`) is the only thing that tells them apart.
+    for (const mod of ['cash', 'order']) {
+      fs.mkdirSync(path.join(dir, 'api', mod), { recursive: true });
+      fs.writeFileSync(
+        path.join(dir, 'api', mod, `${mod}.go`),
+        `package ${mod}
+
+import "github.com/gogf/gf/v2/frame/g"
+
+type ListReq struct {
+	g.Meta \`path:"/${mod}/list" method:"get"\`
+}
+type ListRes struct{}
+`
+      );
+      fs.mkdirSync(path.join(dir, 'internal', 'controller', mod), { recursive: true });
+      fs.writeFileSync(
+        path.join(dir, 'internal', 'controller', mod, `${mod}.go`),
+        `package ${mod}
+
+import (
+	"context"
+
+	"example.com/app/api/${mod}"
+)
+
+type c${mod} struct{}
+
+func (c *c${mod}) List(ctx context.Context, req *${mod}.ListReq) (res *${mod}.ListRes, err error) {
+	return
+}
+`
+      );
+    }
+
+    const cg = await CodeGraph.init(dir, { silent: true });
+    await cg.indexAll();
+    const db = (cg as any).db.db;
+    const rows = db
+      .prepare(
+        `SELECT json_extract(e.metadata,'$.route') route, t.file_path handler_file
+         FROM edges e JOIN nodes t ON t.id = e.target
+         WHERE json_extract(e.metadata,'$.synthesizedBy') = 'goframe-route'
+         ORDER BY route`
+      )
+      .all();
+    cg.close?.();
+
+    expect(rows).toHaveLength(2);
+    // Each route binds to ITS OWN module's handler, never the other's.
+    const byRoute = Object.fromEntries(rows.map((r: any) => [r.route, r.handler_file]));
+    expect(byRoute['GET /cash/list']).toContain('controller/cash/');
+    expect(byRoute['GET /order/list']).toContain('controller/order/');
+  });
+});
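
The two end-to-end tests above rely on one join rule: a route links to the method whose signature takes the route's request type, compared with its package qualifier, and a route with no such method stays unlinked. A minimal sketch of that join is shown below. It is not the project's synthesizer; `linkRoutesSketch`, `RouteRef`, and `MethodRef` are hypothetical names, the inputs are assumed to be already extracted, and leaving a route unlinked when more than one method matches is an assumption of this sketch, not something the diff states.

```typescript
// Hypothetical sketch of a join by package-qualified request type.
interface RouteRef {
  name: string;        // e.g. "GET /cash/list"
  requestType: string; // package-qualified, e.g. "cash.ListReq"
}

interface MethodRef {
  name: string;         // e.g. "List"
  file: string;         // e.g. "internal/controller/cash/cash.go"
  paramTypes: string[]; // e.g. ["context.Context", "*cash.ListReq"]
}

function linkRoutesSketch(routes: RouteRef[], methods: MethodRef[]): Map<string, MethodRef> {
  // Index methods by every parameter type they accept, with the pointer star removed.
  const byParamType = new Map<string, MethodRef[]>();
  for (const method of methods) {
    for (const param of method.paramTypes) {
      const key = param.replace(/^\*/, '');
      const bucket = byParamType.get(key) ?? [];
      bucket.push(method);
      byParamType.set(key, bucket);
    }
  }

  const links = new Map<string, MethodRef>();
  for (const route of routes) {
    const [only, ...rest] = byParamType.get(route.requestType) ?? [];
    // Link only on a single unambiguous match; otherwise leave the route unlinked.
    if (only && rest.length === 0) links.set(route.name, only);
  }
  return links;
}
```

Because the key includes the package, `cash.ListReq` and `order.ListReq` never collide, and a function that only returns the response type (like `helper` in the fixture) is never a candidate because return types are not indexed.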

+ 46 - 0
__tests__/security.test.ts

@@ -232,6 +232,26 @@ describe('Symlink escape prevention (#527)', () => {
     expect(validatePathWithinRoot(root, 'src/inlink.ts')).not.toBeNull();
   });
 
+  // The INDEXING read path opts into following in-root symlinks the directory
+  // walk already descended into — discovery and the reader must agree, or files
+  // reached via an in-root symlink-to-outside fail to index (#935). The lexical
+  // `../` guard is NOT waived, and content-serving sinks never pass the flag.
+  it('allowSymlinkEscape follows an in-repo symlink to an out-of-root FILE (indexing read)', () => {
+    if (!link(path.join(root, 'escape'), path.join(outside, 'pkg', 'secret.txt'))) return;
+    expect(validatePathWithinRoot(root, 'escape', { allowSymlinkEscape: true })).not.toBeNull();
+  });
+
+  it('allowSymlinkEscape follows a path through an in-repo out-of-root DIR symlink (indexing read)', () => {
+    if (!link(path.join(root, 'escapedir'), path.join(outside, 'pkg'))) return;
+    expect(validatePathWithinRoot(root, 'escapedir/secret.txt', { allowSymlinkEscape: true })).not.toBeNull();
+  });
+
+  it('allowSymlinkEscape STILL rejects a lexical ../ traversal (guard not waived)', () => {
+    expect(
+      validatePathWithinRoot(root, `../${path.basename(outside)}/pkg/secret.txt`, { allowSymlinkEscape: true })
+    ).toBeNull();
+  });
+
   it('end-to-end: getCode never serves an out-of-root file reached via a dir symlink', async () => {
     fs.writeFileSync(path.join(outside, 'pkg', 'leak.ts'),
       'export function leaked() { return "LEAKED-ZZZ-9"; }\n');
@@ -250,6 +270,32 @@ describe('Symlink escape prevention (#527)', () => {
       cg.close();
     }
   });
+
+  it('end-to-end (#935): indexes source reached through an in-root dir symlink to outside', async () => {
+    // The Dota custom-game layout symlinks `game/` and `content/` into an SDK
+    // tree outside the repo. Before #935 the batch reader's strict symlink-escape
+    // guard blocked every file under them, so nothing indexed — even though the
+    // directory walk deliberately followed the symlink to enumerate them. The
+    // reader now agrees with discovery: the file indexes.
+    fs.writeFileSync(path.join(outside, 'pkg', 'vendored.ts'),
+      'export function vendoredHelper() { return "LEAKED-ZZZ-9"; }\n');
+    if (!link(path.join(root, 'game'), path.join(outside, 'pkg'))) return;
+
+    const cg = CodeGraph.initSync(root, { config: { include: ['**/*.ts'], exclude: [] } });
+    try {
+      await cg.indexAll();
+      // The symlinked-in file is now part of the graph...
+      const names = cg.getNodesByKind('function').map((n) => n.name);
+      expect(names).toContain('vendoredHelper');
+      // ...but its out-of-root contents are STILL never served (the #527 sink
+      // stays strict — indexing relaxes only the read path, not getCode).
+      for (const n of cg.getNodesByKind('function')) {
+        expect((await cg.getCode(n.id)) ?? '').not.toContain('LEAKED-ZZZ-9');
+      }
+    } finally {
+      cg.close();
+    }
+  });
 });
 
 describe('validateProjectPath — sensitive directory blocking', () => {

File diff view limited because it is too large
+ 1 - 0
docs/design/dynamic-dispatch-coverage-playbook.md


+ 19 - 2
site/src/content/docs/getting-started/configuration.md

@@ -1,9 +1,9 @@
 ---
 title: Configuration
-description: CodeGraph is zero-config — there are no config files.
+description: CodeGraph is zero-config by default, with one optional file for mapping custom extensions.
 ---
 
-There isn't any — CodeGraph is **zero-config**, with **no config file** to write or keep in sync. Language support is automatic from the file extension; there's nothing to wire up per language.
+Next to none — CodeGraph is **zero-config by default**, with nothing to write or keep in sync to get started. Language support is automatic from the file extension; there's nothing to wire up per language. The one optional file is for mapping [custom file extensions](#custom-file-extensions).
 
 ## What it skips out of the box
 
@@ -17,6 +17,23 @@ To keep something else out, add it to `.gitignore`. To pull a default-excluded d
 
 The defaults apply uniformly, so committing a dependency or build directory doesn't force it into the graph — the `.gitignore` negation is the explicit opt-in.
 
+## Custom file extensions
+
+If your project uses a non-standard extension for a [supported language](/codegraph/reference/languages/) — say `.dota_lua` for Lua, or `.tpl` for PHP — those files are skipped by default, because the extension isn't one CodeGraph recognizes. Map them with an optional `codegraph.json` at your project root:
+
+```json
+{
+  "extensions": {
+    ".dota_lua": "lua",
+    ".tpl": "php"
+  }
+}
+```
+
+Each value is a supported language id. The mappings merge on top of the built-in defaults and win on conflict, so you can also re-point a built-in (e.g. `".h": "cpp"`). Commit the file to share the mapping with your team.
+
+A typo'd language or a malformed file is warned about and skipped — it never breaks indexing — and a project with no `codegraph.json` behaves exactly as before. Re-index (`codegraph index`) after adding or changing mappings.
+
 ## Where data lives
 
 Per-project data lives in a `.codegraph/` directory at your project root, containing the SQLite database (`codegraph.db`). Nothing leaves your machine.

+ 14 - 5
src/extraction/grammars.ts

@@ -121,13 +121,18 @@ export const EXTENSION_MAP: Record<string, Language> = {
  * Whether a file is one CodeGraph can parse, based purely on its extension.
  * This is the single source of truth for "should we index this file" — derived
  * from EXTENSION_MAP so parser support and indexing selection never drift.
+ *
+ * `overrides` is the project's validated custom extension → language map (from
+ * `codegraph.json`); when present its extensions count as indexable in addition
+ * to the built-ins. Omitting it is byte-identical to the zero-config behavior.
  */
-export function isSourceFile(filePath: string): boolean {
+export function isSourceFile(filePath: string, overrides?: Record<string, Language>): boolean {
   if (isPlayRoutesFile(filePath)) return true; // Play `conf/routes` is extensionless
   if (isShopifyLiquidJson(filePath)) return true; // Shopify OS 2.0 JSON templates / section groups
   const dot = filePath.lastIndexOf('.');
   if (dot < 0) return false;
-  return filePath.slice(dot).toLowerCase() in EXTENSION_MAP;
+  const ext = filePath.slice(dot).toLowerCase();
+  return ext in EXTENSION_MAP || (!!overrides && ext in overrides);
 }
 
 /**
@@ -266,9 +271,13 @@ export function getParser(language: Language): Parser | null {
 }
 
 /**
- * Detect language from file extension
+ * Detect language from file extension.
+ *
+ * `overrides` is the project's validated custom extension → language map (from
+ * `codegraph.json`); when present its mappings take precedence over the built-in
+ * `EXTENSION_MAP`. Omitting it is byte-identical to the zero-config behavior.
  */
-export function detectLanguage(filePath: string, source?: string): Language {
+export function detectLanguage(filePath: string, source?: string, overrides?: Record<string, Language>): Language {
   // Play `conf/routes` has no grammar — route through the no-symbol path; the
   // Play framework resolver extracts route nodes from it.
   if (isPlayRoutesFile(filePath)) return 'yaml';
@@ -276,7 +285,7 @@ export function detectLanguage(filePath: string, source?: string): Language {
   // Shopify OS 2.0 JSON templates / section groups → the Liquid extractor (it
   // links each section `"type"` to its `sections/<type>.liquid`).
   if (isShopifyLiquidJson(filePath)) return 'liquid';
-  const lang = EXTENSION_MAP[ext] || 'unknown';
+  const lang = (overrides && overrides[ext]) || EXTENSION_MAP[ext] || 'unknown';
 
   // .h files could be C, C++, or Objective-C — check source content
   if (lang === 'c' && ext === '.h' && source) {
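
The precedence the two changed functions implement can be sketched in isolation. This is a minimal illustration, not the project's code: `BUILT_IN` is a hypothetical two-entry stand-in for `EXTENSION_MAP`, and `isIndexable` / `languageOf` are assumed names that mirror only the extension lookup in `isSourceFile` / `detectLanguage` (the Play, Shopify, and `.h` content checks are left out).

```typescript
// Sketch only: a tiny stand-in for the real EXTENSION_MAP (assumption).
type Lang = 'c' | 'cpp' | 'lua' | 'php' | 'unknown';

const BUILT_IN: Record<string, Lang> = { '.h': 'c', '.lua': 'lua' };

// Final extension only, lowercased, matching the lastIndexOf('.') rule in the diff.
function extOf(filePath: string): string {
  const dot = filePath.lastIndexOf('.');
  return dot < 0 ? '' : filePath.slice(dot).toLowerCase();
}

// Indexable if the extension is built in OR mapped by the project overrides.
function isIndexable(filePath: string, overrides?: Record<string, Lang>): boolean {
  const ext = extOf(filePath);
  if (!ext) return false;
  return ext in BUILT_IN || (!!overrides && ext in overrides);
}

// Overrides are consulted first, so they win over a built-in on conflict.
function languageOf(filePath: string, overrides?: Record<string, Lang>): Lang {
  const ext = extOf(filePath);
  return (overrides && overrides[ext]) || BUILT_IN[ext] || 'unknown';
}

// Hypothetical project mapping: one new extension, one re-pointed built-in.
const overrides: Record<string, Lang> = { '.dota_lua': 'lua', '.h': 'cpp' };

console.log(isIndexable('addon/ai.dota_lua'));            // false
console.log(isIndexable('addon/ai.dota_lua', overrides)); // true
console.log(languageOf('include/x.h'));                   // c
console.log(languageOf('include/x.h', overrides));        // cpp
```

Omitting the second argument takes the same path as before the change, which is the "byte-identical" guarantee the doc comments describe.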

+ 50 - 23
src/extraction/index.ts

@@ -19,6 +19,7 @@ import {
 import { QueryBuilder } from '../db/queries';
 import { extractFromSource } from './tree-sitter';
 import { detectLanguage, isSourceFile, isLanguageSupported, isFileLevelOnlyLanguage, initGrammars, loadGrammarsForLanguages } from './grammars';
+import { loadExtensionOverrides } from '../project-config';
 import { isCodeGraphDataDir } from '../directory';
 import { logDebug, logWarn } from '../errors';
 import { validatePathWithinRoot, normalizePath } from '../utils';
@@ -637,14 +638,17 @@ interface GitChanges {
 function getGitChangedFiles(rootDir: string): GitChanges | null {
   try {
     const changes: GitChanges = { modified: [], added: [], deleted: [] };
-    collectGitStatus(rootDir, '', changes);
+    // Custom extension → language overrides from the project's codegraph.json,
+    // so change detection sees the same custom-extension files the full index does.
+    const overrides = loadExtensionOverrides(rootDir);
+    collectGitStatus(rootDir, '', changes, overrides);
     return changes;
   } catch {
     return null;
   }
 }
 
-function collectGitStatus(repoDir: string, prefix: string, out: GitChanges): void {
+function collectGitStatus(repoDir: string, prefix: string, out: GitChanges, overrides?: Record<string, Language>): void {
   const output = execFileSync(
     'git',
     ['status', '--porcelain', '--no-renames'],
@@ -678,7 +682,7 @@ function collectGitStatus(repoDir: string, prefix: string, out: GitChanges): voi
     }
 
     const filePath = normalizePath(prefix + rel);
-    if (!isSourceFile(filePath)) continue;
+    if (!isSourceFile(filePath, overrides)) continue;
 
     if (statusCode.includes('D')) {
       // Deletions stay unfiltered: getChangedFiles acts on one only when the
@@ -704,11 +708,11 @@ function collectGitStatus(repoDir: string, prefix: string, out: GitChanges): voi
   // nested deeper) and under this repo's gitignored dirs.
   for (const rel of untrackedDirs) {
     for (const repoRel of findNestedGitRepos(path.join(repoDir, rel), rel)) {
-      collectGitStatus(path.join(repoDir, repoRel), prefix + repoRel, out);
+      collectGitStatus(path.join(repoDir, repoRel), prefix + repoRel, out, overrides);
     }
   }
   for (const rel of findIgnoredEmbeddedRepos(repoDir)) {
-    collectGitStatus(path.join(repoDir, rel), prefix + rel, out);
+    collectGitStatus(path.join(repoDir, rel), prefix + rel, out, overrides);
   }
 }
 
@@ -723,13 +727,16 @@ export function scanDirectory(
   rootDir: string,
   onProgress?: (current: number, file: string) => void
 ): string[] {
+  // Custom extension → language overrides from the project's codegraph.json.
+  const overrides = loadExtensionOverrides(rootDir);
+
   // Fast path: use git to get all visible files (respects .gitignore everywhere)
   const gitFiles = getGitVisibleFiles(rootDir);
   if (gitFiles) {
     const files: string[] = [];
     let count = 0;
     for (const filePath of gitFiles) {
-      if (isSourceFile(filePath)) {
+      if (isSourceFile(filePath, overrides)) {
         files.push(filePath);
         count++;
         onProgress?.(count, filePath);
@@ -750,12 +757,15 @@ export async function scanDirectoryAsync(
   rootDir: string,
   onProgress?: (current: number, file: string) => void
 ): Promise<string[]> {
+  // Custom extension → language overrides from the project's codegraph.json.
+  const overrides = loadExtensionOverrides(rootDir);
+
   const gitFiles = getGitVisibleFiles(rootDir);
   if (gitFiles) {
     const files: string[] = [];
     let count = 0;
     for (const filePath of gitFiles) {
-      if (isSourceFile(filePath)) {
+      if (isSourceFile(filePath, overrides)) {
         files.push(filePath);
         count++;
         onProgress?.(count, filePath);
@@ -781,6 +791,8 @@ function scanDirectoryWalk(
   const files: string[] = [];
   let count = 0;
   const visitedDirs = new Set<string>();
+  // Custom extension → language overrides from the project's codegraph.json.
+  const overrides = loadExtensionOverrides(rootDir);
 
   // A .gitignore matcher scoped to the directory that declared it. Patterns in
   // a nested .gitignore are relative to that directory, so we keep the dir
@@ -857,7 +869,7 @@ function scanDirectoryWalk(
               walk(fullPath, active);
             }
           } else if (stat.isFile()) {
-            if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath)) {
+            if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath, overrides)) {
               files.push(relativePath);
               count++;
               onProgress?.(count, relativePath);
@@ -874,7 +886,7 @@ function scanDirectoryWalk(
           walk(fullPath, active);
         }
       } else if (entry.isFile()) {
-        if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath)) {
+        if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath, overrides)) {
           files.push(relativePath);
           count++;
           onProgress?.(count, relativePath);
@@ -994,6 +1006,11 @@ export class ExtractionOrchestrator {
     let totalNodes = 0;
     let totalEdges = 0;
 
+    // Custom extension → language overrides from the project's codegraph.json.
+    // Threaded into language detection so custom-extension files load the right
+    // grammar and store under the mapped language.
+    const overrides = loadExtensionOverrides(this.rootDir);
+
     const log = verbose
       ? (msg: string) => { console.log(`[worker] ${msg}`); }
       : (_msg: string) => {};
@@ -1050,7 +1067,7 @@ export class ExtractionOrchestrator {
     await new Promise(resolve => setImmediate(resolve));
 
     // Detect needed languages and load grammars in the parse worker
-    const neededLanguages = [...new Set(files.map((f) => detectLanguage(f)))];
+    const neededLanguages = [...new Set(files.map((f) => detectLanguage(f, undefined, overrides)))];
     // .h files default to 'c' but may be C++ — ensure cpp grammar is loaded when c is needed
     if (neededLanguages.includes('c') && !neededLanguages.includes('cpp')) {
       neededLanguages.push('cpp');
@@ -1161,12 +1178,17 @@ export class ExtractionOrchestrator {
     }
 
     async function requestParse(filePath: string, content: string): Promise<ExtractionResult> {
+      // Resolve the language on the main thread (where the project's
+      // codegraph.json overrides are loaded) and hand it to the worker, so the
+      // worker never needs the override map itself.
+      const language = detectLanguage(filePath, content, overrides);
+
       if (!WorkerClass) {
         // In-process fallback
         return extractFromSource(
           filePath,
           content,
-          detectLanguage(filePath, content),
+          language,
           frameworkNames
         );
       }
@@ -1198,7 +1220,7 @@ export class ExtractionOrchestrator {
         }, timeoutMs);
 
         pendingParses.set(id, { resolve, reject, timer });
-        worker.postMessage({ type: 'parse', id, filePath, content, frameworkNames });
+        worker.postMessage({ type: 'parse', id, filePath, content, frameworkNames, language });
       });
     }
 
@@ -1223,7 +1245,10 @@ export class ExtractionOrchestrator {
       const fileContents = await Promise.all(
         batch.map(async (fp) => {
           try {
-            const fullPath = validatePathWithinRoot(this.rootDir, fp);
+            // Indexing read: follow in-root symlinks the directory walk already
+            // descended into (the `../` guard still applies) so files reached
+            // via an in-root symlink-to-outside still index (#935).
+            const fullPath = validatePathWithinRoot(this.rootDir, fp, { allowSymlinkEscape: true });
             if (!fullPath) {
               logWarn('Path traversal blocked in batch reader', { filePath: fp });
               return { filePath: fp, content: null as string | null, stats: null as fs.Stats | null, error: new Error('Path traversal blocked') };
@@ -1312,7 +1337,7 @@ export class ExtractionOrchestrator {
 
         // Store in database on main thread (SQLite is not thread-safe)
         if (result.nodes.length > 0 || result.errors.length === 0) {
-          const language = detectLanguage(filePath, content);
+          const language = detectLanguage(filePath, content, overrides);
           this.storeExtractionResult(filePath, content, language, stats, result);
         }
 
@@ -1333,7 +1358,7 @@ export class ExtractionOrchestrator {
           // Files with no symbols but no errors (yaml, twig, properties) are
           // tracked at the file level — count them as indexed so the CLI
           // doesn't misleadingly report "No files found to index".
-          const lang = detectLanguage(filePath, content);
+          const lang = detectLanguage(filePath, content, overrides);
           if (isFileLevelOnlyLanguage(lang)) {
             filesIndexed++;
           } else {
@@ -1393,7 +1418,7 @@ export class ExtractionOrchestrator {
         }
 
         if (result.nodes.length > 0 || result.errors.length === 0) {
-          const language = detectLanguage(filePath, content);
+          const language = detectLanguage(filePath, content, overrides);
           const stats = await fsp.stat(path.join(this.rootDir, filePath));
           this.storeExtractionResult(filePath, content, language, stats, result);
 
@@ -1444,7 +1469,7 @@ export class ExtractionOrchestrator {
           }
 
           if (result.nodes.length > 0 || result.errors.length === 0) {
-            const language = detectLanguage(filePath, fullContent);
+            const language = detectLanguage(filePath, fullContent, overrides);
             const stats = await fsp.stat(path.join(this.rootDir, filePath));
             this.storeExtractionResult(filePath, fullContent, language, stats, result);
 
@@ -1529,7 +1554,8 @@ export class ExtractionOrchestrator {
    * Index a single file
    */
   async indexFile(relativePath: string): Promise<ExtractionResult> {
-    const fullPath = validatePathWithinRoot(this.rootDir, relativePath);
+    // Indexing read: follow in-root symlinks (the `../` guard still applies), #935.
+    const fullPath = validatePathWithinRoot(this.rootDir, relativePath, { allowSymlinkEscape: true });
 
     if (!fullPath) {
       return {
@@ -1576,8 +1602,8 @@ export class ExtractionOrchestrator {
     content: string,
     stats: fs.Stats
   ): Promise<ExtractionResult> {
-    // Prevent path traversal
-    const fullPath = validatePathWithinRoot(this.rootDir, relativePath);
+    // Prevent `../` traversal; follow in-root symlinks like the directory walk (#935).
+    const fullPath = validatePathWithinRoot(this.rootDir, relativePath, { allowSymlinkEscape: true });
     if (!fullPath) {
       logWarn('Path traversal blocked in indexFileWithContent', { relativePath });
       return {
@@ -1607,8 +1633,8 @@ export class ExtractionOrchestrator {
       };
     }
 
-    // Detect language
-    const language = detectLanguage(relativePath, content);
+    // Detect language (honoring the project's codegraph.json extension overrides)
+    const language = detectLanguage(relativePath, content, loadExtensionOverrides(this.rootDir));
     if (!isLanguageSupported(language)) {
       return {
         nodes: [],
@@ -1863,7 +1889,8 @@ export class ExtractionOrchestrator {
 
     // Load only grammars needed for changed files
     if (filesToIndex.length > 0) {
-      const neededLanguages = [...new Set(filesToIndex.map((f) => detectLanguage(f)))];
+      const overrides = loadExtensionOverrides(this.rootDir);
+      const neededLanguages = [...new Set(filesToIndex.map((f) => detectLanguage(f, undefined, overrides)))];
       // .h files default to 'c' but may be C++ — ensure cpp grammar is loaded
       if (neededLanguages.includes('c') && !neededLanguages.includes('cpp')) {
         neededLanguages.push('cpp');

+ 5 - 2
src/extraction/parse-worker.ts

@@ -55,14 +55,17 @@ import type { Language, ExtractionResult } from '../types';
 const PARSER_RESET_INTERVAL = 5000;
 const parseCounts = new Map<Language, number>();
 
-parentPort!.on('message', async (msg: { type: string; id?: number; filePath?: string; content?: string; languages?: Language[]; frameworkNames?: string[] }) => {
+parentPort!.on('message', async (msg: { type: string; id?: number; filePath?: string; content?: string; languages?: Language[]; frameworkNames?: string[]; language?: Language }) => {
   if (msg.type === 'load-grammars') {
     await loadGrammarsForLanguages(msg.languages!);
     parentPort!.postMessage({ type: 'grammars-loaded' });
   } else if (msg.type === 'parse') {
     const { id, filePath, content, frameworkNames } = msg;
     try {
-      const language = detectLanguage(filePath!, content);
+      // The main thread resolves the language (it holds the project's
+      // codegraph.json extension overrides) and sends it; fall back to detection
+      // for older callers / safety.
+      const language = msg.language ?? detectLanguage(filePath!, content);
       const result: ExtractionResult = extractFromSource(filePath!, content!, language, frameworkNames);
 
       // Periodic parser reset to reclaim WASM heap memory

+ 8 - 0
src/mcp/tools.ts

@@ -1651,6 +1651,14 @@ export class ToolHandler {
         registeredAt,
       };
     }
+    if (m?.synthesizedBy === 'goframe-route') {
+      const route = m.route ? `\`${String(m.route)}\`` : 'a route';
+      return {
+        label: `GoFrame route ${route} — reflective Bind → controller method (dynamic dispatch)`,
+        compact: `dynamic: GoFrame route ${m.route ? String(m.route) : ''}${at}`,
+        registeredAt,
+      };
+    }
     // Generic fallback for any other synthesizer (redux-thunk, gin-middleware-chain,
     // flutter-build, …): a synthesized hop must never read as a bare static `calls`.
     // It's a dynamic-dispatch bridge — label it as one and keep its wiring site.

+ 155 - 0
src/project-config.ts

@@ -0,0 +1,155 @@
+/**
+ * Project-scoped configuration: a committed `codegraph.json` at the project
+ * root that a team shares through version control.
+ *
+ * Today it carries one thing — `extensions`, an opt-in map from a custom file
+ * extension to one of CodeGraph's supported languages. The built-in
+ * extension → language table (`EXTENSION_MAP` in `extraction/grammars.ts`) is
+ * otherwise hardcoded, so a codebase that uses a non-standard extension for a
+ * supported language (e.g. `.dota_lua` for Lua) sees those files silently
+ * skipped. This lets the project map them once, in a version-controlled file:
+ *
+ *   {
+ *     "extensions": {
+ *       ".dota_lua": "lua",
+ *       ".tpl": "php"
+ *     }
+ *   }
+ *
+ * User mappings merge on TOP of the built-ins and win on conflict, so a project
+ * can also re-point a built-in extension (e.g. force `.h` → `cpp`). Absent or
+ * malformed config is the zero-config default — no overrides, no error. Invalid
+ * individual entries are warned-and-skipped (never fatal): an unparseable
+ * project file must not break indexing.
+ */
+import * as fs from 'fs';
+import * as path from 'path';
+import { Language } from './types';
+import { isLanguageSupported } from './extraction/grammars';
+import { logWarn } from './errors';
+
+/** Filename of the project-scoped config, resolved relative to the project root. */
+export const PROJECT_CONFIG_FILENAME = 'codegraph.json';
+
+export interface ProjectConfig {
+  /** Map of custom file extension (`.foo`) to a supported language id. */
+  extensions?: Record<string, string>;
+}
+
+interface CacheEntry {
+  mtimeMs: number;
+  overrides: Record<string, Language>;
+}
+
+/**
+ * Cache keyed by project root. The loader is called once per indexing/scan/sync
+ * operation (and per watch event), so the mtime guard keeps repeat calls to one
+ * `stat` while a single `codegraph.json` is in force. Keying by root keeps two
+ * projects in the same process (the daemon / multi-project MCP server) isolated.
+ */
+const overridesCache = new Map<string, Record<string, Language>>();
+const cacheMeta = new Map<string, CacheEntry>();
+
+/** Shared frozen empty map so the no-config path allocates nothing. */
+const EMPTY: Record<string, Language> = Object.freeze({});
+
+/**
+ * Normalize a user-provided extension key to the `.ext` lowercase form used by
+ * the built-in map. Returns null for keys that can never match a real file
+ * extension (so the caller warns and skips):
+ *   - empty / just "."
+ *   - multi-part (".d.ts") — language detection keys off the FINAL extension
+ *     only (`lastIndexOf('.')`), so a multi-dot key would never be consulted.
+ *   - anything containing a path separator.
+ */
+function normalizeExtKey(raw: string): string | null {
+  if (typeof raw !== 'string') return null;
+  let ext = raw.trim().toLowerCase();
+  if (!ext) return null;
+  if (!ext.startsWith('.')) ext = '.' + ext;
+  const body = ext.slice(1);
+  if (!body) return null;
+  if (body.includes('.') || body.includes('/') || body.includes('\\')) return null;
+  return ext;
+}
+
+/**
+ * Parse and validate the `extensions` map out of a `codegraph.json` file.
+ * Every failure mode degrades to "no overrides from this entry" — a bad file or
+ * a typo'd language never throws.
+ */
+function parseExtensionOverrides(file: string): Record<string, Language> {
+  let raw: string;
+  try {
+    raw = fs.readFileSync(file, 'utf-8');
+  } catch {
+    return EMPTY;
+  }
+
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(raw);
+  } catch (err) {
+    logWarn(`Ignoring ${PROJECT_CONFIG_FILENAME}: not valid JSON`, {
+      file,
+      error: err instanceof Error ? err.message : String(err),
+    });
+    return EMPTY;
+  }
+
+  if (!parsed || typeof parsed !== 'object') return EMPTY;
+  const exts = (parsed as ProjectConfig).extensions;
+  if (!exts || typeof exts !== 'object' || Array.isArray(exts)) return EMPTY;
+
+  const out: Record<string, Language> = {};
+  for (const [rawKey, rawVal] of Object.entries(exts)) {
+    const key = normalizeExtKey(rawKey);
+    if (!key) {
+      logWarn(`Ignoring extension mapping in ${PROJECT_CONFIG_FILENAME}: "${rawKey}" is not a valid file extension`, { file });
+      continue;
+    }
+    if (typeof rawVal !== 'string' || !isLanguageSupported(rawVal as Language)) {
+      logWarn(`Ignoring extension "${rawKey}" in ${PROJECT_CONFIG_FILENAME}: "${String(rawVal)}" is not a supported language`, { file });
+      continue;
+    }
+    out[key] = rawVal as Language;
+  }
+
+  return Object.keys(out).length > 0 ? out : EMPTY;
+}
+
+/**
+ * Load the validated extension overrides for a project, mtime-cached.
+ *
+ * Returns a map of `.ext` → supported language id. The result merges on top of
+ * the built-in extension map at the point of use (see `detectLanguage` /
+ * `isSourceFile`), with these user mappings taking precedence. Returns an empty
+ * map when there is no `codegraph.json` (the zero-config default).
+ */
+export function loadExtensionOverrides(rootDir: string): Record<string, Language> {
+  const file = path.join(rootDir, PROJECT_CONFIG_FILENAME);
+
+  let mtimeMs: number;
+  try {
+    mtimeMs = fs.statSync(file).mtimeMs;
+  } catch {
+    // No config file — drop any stale cache entry and return the default.
+    cacheMeta.delete(rootDir);
+    overridesCache.delete(rootDir);
+    return EMPTY;
+  }
+
+  const meta = cacheMeta.get(rootDir);
+  if (meta && meta.mtimeMs === mtimeMs) return meta.overrides;
+
+  const overrides = parseExtensionOverrides(file);
+  cacheMeta.set(rootDir, { mtimeMs, overrides });
+  overridesCache.set(rootDir, overrides);
+  return overrides;
+}
+
+/** Test/maintenance hook: forget cached config (e.g. after rewriting it in a test). */
+export function clearProjectConfigCache(): void {
+  cacheMeta.clear();
+  overridesCache.clear();
+}
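
To make the warn-and-skip rules concrete, here is a small self-contained sketch of the key normalization and per-entry validation applied to a sample `extensions` object. It follows the logic of `normalizeExtKey` and the loop in `parseExtensionOverrides` above, but `SUPPORTED` is a hypothetical subset of language ids standing in for `isLanguageSupported`, and `validateExtensions` collects rejected keys in an array where the real code calls `logWarn`.

```typescript
// Same normalization rules as the diff: trim, lowercase, ensure a leading dot,
// reject empty bodies, multi-part keys, and anything with a path separator.
function normalizeKey(raw: string): string | null {
  let ext = raw.trim().toLowerCase();
  if (!ext) return null;
  if (!ext.startsWith('.')) ext = '.' + ext;
  const body = ext.slice(1);
  if (!body) return null;
  if (body.includes('.') || body.includes('/') || body.includes('\\')) return null;
  return ext;
}

// Hypothetical subset of supported language ids (assumption for this sketch).
const SUPPORTED = new Set<string>(['lua', 'php', 'cpp']);

interface ValidationResult {
  accepted: Record<string, string>;
  rejected: string[]; // raw keys that would be warned about and skipped
}

function validateExtensions(extensions: Record<string, unknown>): ValidationResult {
  const accepted: Record<string, string> = {};
  const rejected: string[] = [];
  for (const [rawKey, rawVal] of Object.entries(extensions)) {
    const key = normalizeKey(rawKey);
    if (!key || typeof rawVal !== 'string' || !SUPPORTED.has(rawVal)) {
      rejected.push(rawKey);
      continue;
    }
    accepted[key] = rawVal;
  }
  return { accepted, rejected };
}

const result = validateExtensions({
  'Dota_Lua': 'lua',      // normalized to ".dota_lua"
  '.tpl': 'php',          // already in canonical form
  '.d.ts': 'typescript',  // multi-part key: never consulted, so rejected
  '.': 'lua',             // empty body
  '.foo': 'luaa',         // typo'd language id
  '.bar': 42,             // non-string value
});

console.log(result.accepted); // { '.dota_lua': 'lua', '.tpl': 'php' }
console.log(result.rejected); // [ '.d.ts', '.', '.foo', '.bar' ]
```

One bad entry drops only itself; the valid mappings in the same file still apply.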

+ 3 - 0
src/resolution/callback-synthesizer.ts

@@ -27,6 +27,7 @@ import type { ResolutionContext } from './types';
 import { isGeneratedFile } from '../extraction/generated-detection';
 import { stripCommentsForRegex } from './strip-comments';
 import { cFnPointerDispatchEdges } from './c-fnptr-synthesizer';
+import { goframeRouteEdges } from './goframe-synthesizer';
 
 const REGISTRAR_NAME = /^(on[A-Z]\w*|subscribe|addListener|addEventListener|register|watch|listen|addCallback)$/;
 const DISPATCHER_NAME = /(emit|trigger|notify|dispatch|fire|publish|flush)/i;
@@ -2703,6 +2704,7 @@ export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionCo
   const sidekiqEdges = sidekiqDispatchEdges(ctx);
   const laravelEdges = laravelEventEdges(ctx);
   const cFnPtrEdges = cFnPointerDispatchEdges(queries, ctx);
+  const goframeEdges = goframeRouteEdges(ctx);
 
   const merged: Edge[] = [];
   const seen = new Set<string>();
@@ -2737,6 +2739,7 @@ export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionCo
     ...sidekiqEdges,
     ...laravelEdges,
     ...cFnPtrEdges,
+    ...goframeEdges,
   ]) {
     const key = `${e.source}>${e.target}`;
     if (seen.has(key)) continue;

+ 118 - 0
src/resolution/frameworks/goframe.ts

@@ -0,0 +1,118 @@
+/**
+ * GoFrame Framework Resolver (route metadata) — issue #747.
+ *
+ * GoFrame's "standard router" binds routes reflectively, so there is no literal
+ * path string at a `.GET("/x", handler)` call site and no static edge from a
+ * route to the controller method that serves it. The structural facts live in
+ * two places, joined only at runtime by GoFrame:
+ *
+ *   // api/user/v1/user_sign_in.go — the route lives in a struct tag on the request type
+ *   type SignInReq struct {
+ *       g.Meta `path:"/user/sign-in" method:"post" tags:"UserService" summary:"…"`
+ *       …
+ *   }
+ *   // internal/controller/user/user_v1_sign_in.go — the handler takes *that* request type
+ *   func (c *ControllerV1) SignIn(ctx context.Context, req *v1.SignInReq) (res *v1.SignInRes, err error)
+ *   // internal/cmd/cmd.go — reflective binding (no path, no handler name)
+ *   group.Bind(user.NewV1())
+ *
+ * This resolver handles the FIRST half: it reads the `g.Meta` struct tag on a
+ * request type into a `route` node (`POST /user/sign-in`). The route → handler
+ * EDGE is the genuinely reflective part — the method name is NOT derivable from
+ * the request type (`DeptSearchReq` is served by `List`, `DeptAddReq` by `Add`),
+ * so the only reliable join is the request type appearing in the method's
+ * parameter signature. That whole-graph join is done by the companion
+ * `goframeRouteEdges` synthesizer, which reads the request type back out of the
+ * route node's qualifiedName.
+ *
+ * Honesty note: the route node carries the `g.Meta` path verbatim. The group
+ * prefix from `s.Group("/api", …)` / nested `group.Group("/v1", …)` is applied
+ * by reflective `Bind` at runtime and is deliberately NOT reconstructed here —
+ * the discriminating, structural part is the per-route path + method.
+ */
+
+import { Node } from '../../types';
+import { FrameworkResolver, UnresolvedRef, ResolvedRef, ResolutionContext } from '../types';
+import { stripCommentsForRegex } from '../strip-comments';
+
+/**
+ * A request type carrying a routable `g.Meta` tag. `g.Meta` is, by GoFrame
+ * convention, the first embedded field of the struct, so anchoring on
+ * `struct { g.Meta `…` }` is both precise and cheap. Response types embed
+ * `g.Meta` too but tag it `mime:"…"` with no `path:` — the path requirement
+ * below filters them out.
+ */
+const GOFRAME_META_RE = /\btype\s+([A-Z]\w*)\s+struct\s*\{\s*g\.Meta\s+`([^`]*)`/g;
+const META_PATH_RE = /\bpath:"([^"]+)"/;
+const META_METHOD_RE = /\bmethod:"([^"]+)"/;
+const GO_PACKAGE_RE = /^\s*package\s+(\w+)/m;
+
+/** Marker embedded in a route node's qualifiedName so the synthesizer can read
+ *  back the request type to join on. The value after it is the package-qualified
+ *  request type (`cash.ListReq`) — the package disambiguates the many identical
+ *  bare names (`ListReq`, `GetReq`) a large app defines, one per module. Falls
+ *  back to the bare type when no `package` declaration is found. */
+export const GOFRAME_ROUTE_MARKER = '::goframe-route:';
+
+export const goframeResolver: FrameworkResolver = {
+  name: 'goframe',
+  languages: ['go'],
+
+  detect(context: ResolutionContext): boolean {
+    const goMod = context.readFile('go.mod');
+    // GoFrame is `github.com/gogf/gf` (v1) or `github.com/gogf/gf/v2` (v2).
+    return !!goMod && goMod.includes('github.com/gogf/gf');
+  },
+
+  extract(filePath, content) {
+    if (!filePath.endsWith('.go')) return { nodes: [], references: [] };
+    // Cheap reject: the file must mention g.Meta at all.
+    if (!content.includes('g.Meta')) return { nodes: [], references: [] };
+
+    const nodes: Node[] = [];
+    const now = Date.now();
+    const safe = stripCommentsForRegex(content, 'go');
+    const pkg = GO_PACKAGE_RE.exec(safe)?.[1];
+
+    GOFRAME_META_RE.lastIndex = 0;
+    let match: RegExpExecArray | null;
+    while ((match = GOFRAME_META_RE.exec(safe)) !== null) {
+      const [, requestType, tag] = match;
+      const pathMatch = META_PATH_RE.exec(tag!);
+      if (!pathMatch) continue; // response `g.Meta `mime:…`` and other non-route metadata
+      const routePath = pathMatch[1]!;
+      const methodMatch = META_METHOD_RE.exec(tag!);
+      // GoFrame defaults to all methods when `method:` is omitted.
+      const method = methodMatch ? methodMatch[1]!.toUpperCase() : 'ANY';
+      const line = safe.slice(0, match.index).split('\n').length;
+      // The handler's signature qualifies the request type with its package
+      // (`req *cash.ListReq`); encode `pkg.Type` so the synthesizer can match it.
+      const joinKey = pkg ? `${pkg}.${requestType}` : requestType!;
+
+      nodes.push({
+        id: `route:${filePath}:${line}:${method}:${routePath}`,
+        kind: 'route',
+        name: `${method} ${routePath}`,
+        // The request type is the synthesizer's join key — encode it after the
+        // marker. The path stays human-readable in `name`.
+        qualifiedName: `${filePath}${GOFRAME_ROUTE_MARKER}${joinKey}`,
+        filePath,
+        startLine: line,
+        endLine: line,
+        startColumn: 0,
+        endColumn: match[0].length,
+        language: 'go',
+        updatedAt: now,
+      });
+    }
+
+    return { nodes, references: [] };
+  },
+
+  // The route → controller-method edge is reflective (request-type join across
+  // files) and is built by the `goframeRouteEdges` synthesizer after the graph
+  // is complete. This resolver creates no references of its own.
+  resolve(_ref: UnresolvedRef, _context: ResolutionContext): ResolvedRef | null {
+    return null;
+  },
+};
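
Reviewer note: the extract pass above packs two things into each route node, a readable `METHOD path` name and a join key hidden after the marker in `qualifiedName`. The sketch below approximates that encoding and its decoding under stated assumptions. `PATH_RE` and `METHOD_RE` are hypothetical stand-ins for `META_PATH_RE` and `META_METHOD_RE`, whose definitions sit outside this hunk, and `parseMetaTag` and `joinKeyOf` are illustrative names that do not exist in the PR. Only the marker literal and the `ANY` default are taken from the diff.

```typescript
// Same literal as GOFRAME_ROUTE_MARKER in the diff.
const ROUTE_MARKER = '::goframe-route:';

// Hypothetical stand-ins for META_PATH_RE / META_METHOD_RE (not shown in this hunk).
const PATH_RE = /\bpath:"([^"]+)"/;
const METHOD_RE = /\bmethod:"([^"]+)"/;

interface ParsedRoute {
  name: string;          // human-readable, e.g. "GET /cash/list"
  qualifiedName: string; // file path + marker + join key
}

// Turn one g.Meta struct tag into the name/qualifiedName pair of a route node.
// Returns null for tags without `path:` (for example a response `mime:` tag).
function parseMetaTag(
  filePath: string,
  pkg: string | undefined,
  requestType: string,
  tag: string
): ParsedRoute | null {
  const pathMatch = PATH_RE.exec(tag);
  if (!pathMatch) return null;
  const methodMatch = METHOD_RE.exec(tag);
  // An omitted `method:` is recorded as ANY, as in the diff.
  const method = methodMatch ? methodMatch[1]!.toUpperCase() : 'ANY';
  // Package-qualified when a package name is known, bare type otherwise.
  const joinKey = pkg ? `${pkg}.${requestType}` : requestType;
  return {
    name: `${method} ${pathMatch[1]!}`,
    qualifiedName: `${filePath}${ROUTE_MARKER}${joinKey}`,
  };
}

// Recover the join key from a qualifiedName, mirroring what the synthesizer does.
function joinKeyOf(qualifiedName: string): string | null {
  const at = qualifiedName.indexOf(ROUTE_MARKER);
  if (at < 0) return null;
  return qualifiedName.slice(at + ROUTE_MARKER.length) || null;
}

const route = parseMetaTag('api/cash/list.go', 'cash', 'ListReq', 'path:"/cash/list" method:"get"');
console.log(route?.name, joinKeyOf(route?.qualifiedName ?? ''));
```

Keeping the path in `name` and the request type in `qualifiedName` means the synthesizer can recover its join key with a single `indexOf`, without re-reading the source file.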

+ 3 - 0
src/resolution/frameworks/index.ts

@@ -19,6 +19,7 @@ import { railsResolver } from './ruby';
 import { springResolver } from './java';
 import { playResolver } from './play';
 import { goResolver } from './go';
+import { goframeResolver } from './goframe';
 import { rustResolver } from './rust';
 import { aspnetResolver } from './csharp';
 import { swiftUIResolver, uikitResolver, vaporResolver } from './swift';
@@ -52,6 +53,7 @@ const FRAMEWORK_RESOLVERS: FrameworkResolver[] = [
   playResolver,
   // Go
   goResolver,
+  goframeResolver,
   // Rust
   rustResolver,
   // C#
@@ -136,6 +138,7 @@ export { railsResolver } from './ruby';
 export { springResolver } from './java';
 export { playResolver } from './play';
 export { goResolver } from './go';
+export { goframeResolver } from './goframe';
 export { rustResolver } from './rust';
 export { aspnetResolver } from './csharp';
 export { swiftUIResolver, uikitResolver, vaporResolver } from './swift';

+ 144 - 0
src/resolution/goframe-synthesizer.ts

@@ -0,0 +1,144 @@
+/**
+ * GoFrame route → controller-method dispatch synthesis (#747).
+ *
+ * GoFrame binds routes reflectively (`group.Bind(user.NewV1())`), so the route
+ * declared in a request type's `g.Meta` tag has no static edge to the method
+ * that serves it. The `goframeResolver` extract pass turns each `g.Meta` into a
+ * `route` node carrying its request type in the qualifiedName; this whole-graph
+ * pass closes the loop by joining each route to its handler.
+ *
+ * The join key is the REQUEST TYPE, not the method name — GoFrame method names
+ * are free (`DeptSearchReq` is served by `List`, `DeptAddReq` by `Add`), so the
+ * only reliable link is the request type appearing in the handler's parameter
+ * signature:
+ *
+ *   func (c *sysDeptController) Add(ctx context.Context, req *system.DeptAddReq) (…)
+ *                                                              ^^^^^^^^^^^^^^^^  the join
+ *
+ * Go method nodes already carry that signature, so no source re-read is needed.
+ * Each synthesized edge is `kind:'calls'`, `provenance:'heuristic'`,
+ * `metadata.synthesizedBy:'goframe-route'` — a reflective-dispatch bridge, so
+ * `codegraph_explore` surfaces it as a dynamic hop rather than a literal call,
+ * and the handler's callers list the route that reaches it. A project with no
+ * GoFrame routes is a no-op.
+ */
+
+import type { Edge, Node } from '../types';
+import type { ResolutionContext } from './types';
+import { GOFRAME_ROUTE_MARKER } from './frameworks/goframe';
+
+const FANOUT_CAP = 2000; // backstop only; real apps are 1 route → 1 method.
+
+/**
+ * Pointer-parameter types in a Go method signature, in both qualified and bare
+ * forms: `(ctx context.Context, req *cash.ListReq)` → `["cash.ListReq",
+ * "ListReq"]`. The qualified form disambiguates the many identical bare names a
+ * large app defines (one `ListReq` per module); the bare form is the fallback
+ * for a same-package (unqualified) handler. The response pointer (`*cash.ListRes`)
+ * is captured too but never matches a request type, so it drops out of the join.
+ */
+function pointerParamTypes(sig: string): string[] {
+  const out: string[] = [];
+  const re = /\*\s*(?:(\w+)\.)?([A-Z]\w*)\b/g;
+  let m: RegExpExecArray | null;
+  while ((m = re.exec(sig)) !== null) {
+    if (m[1]) out.push(`${m[1]}.${m[2]}`);
+    out.push(m[2]!);
+  }
+  return out;
+}
+
+/** The addon/plugin module a path lives under (`addons/hgexample/…` → `hgexample`),
+ *  or `''` for the core app. Large GoFrame apps ship demo addons that CLONE the
+ *  whole module tree — identical package names and request types — so the package
+ *  qualifier can't tell an addon's `config.GetReq` from core's. The addon root can. */
+function addonRoot(p: string): string {
+  return /(?:^|\/)addons\/([^/]+)\//.exec(p)?.[1] ?? '';
+}
+
+/**
+ * Pick the one handler for a route from same-request-type candidates. Usually a
+ * single candidate. When several share the request type (a cloned addon module),
+ * keep controller-dir methods, then the one in the route's own module (core route
+ * → core handler, addon route → that addon's handler). Ambiguity left over ⇒ no
+ * edge (silent beats wrong).
+ */
+function selectHandler(candidates: Node[], routeFile: string): Node | null {
+  if (candidates.length === 1) return candidates[0]!;
+  let cands = candidates.filter((h) => /\/controller(s)?\//.test(h.filePath));
+  if (cands.length === 0) cands = candidates;
+  if (cands.length === 1) return cands[0]!;
+  const ar = addonRoot(routeFile);
+  const sameModule = cands.filter((h) => addonRoot(h.filePath) === ar);
+  return sameModule.length === 1 ? sameModule[0]! : null;
+}
+
+export function goframeRouteEdges(ctx: ResolutionContext): Edge[] {
+  // Route nodes the goframe extractor created, keyed by their package-qualified
+  // request type (`cash.ListReq`). `wanted` holds every key a handler signature
+  // could match — the qualified form plus its bare type fallback.
+  const routesByReqType = new Map<string, Node[]>();
+  const wanted = new Set<string>();
+  for (const route of ctx.getNodesByKind('route')) {
+    if (route.language !== 'go') continue;
+    const marker = route.qualifiedName.indexOf(GOFRAME_ROUTE_MARKER);
+    if (marker < 0) continue;
+    const joinKey = route.qualifiedName.slice(marker + GOFRAME_ROUTE_MARKER.length);
+    if (!joinKey) continue;
+    let arr = routesByReqType.get(joinKey);
+    if (!arr) { arr = []; routesByReqType.set(joinKey, arr); }
+    arr.push(route);
+    wanted.add(joinKey);
+    const dot = joinKey.lastIndexOf('.');
+    if (dot >= 0) wanted.add(joinKey.slice(dot + 1)); // bare fallback
+  }
+  if (routesByReqType.size === 0) return [];
+
+  // Handler candidates: Go methods whose signature takes a wanted request type by
+  // pointer, indexed by every matching (qualified + bare) form so a route can
+  // match precisely on `pkg.Type` and fall back to the bare `Type`.
+  const handlersByKey = new Map<string, Node[]>();
+  for (const method of ctx.getNodesByKind('method')) {
+    if (method.language !== 'go' || !method.signature) continue;
+    for (const t of pointerParamTypes(method.signature)) {
+      if (!wanted.has(t)) continue;
+      let arr = handlersByKey.get(t);
+      if (!arr) { arr = []; handlersByKey.set(t, arr); }
+      arr.push(method);
+    }
+  }
+
+  const edges: Edge[] = [];
+  const seen = new Set<string>();
+  let added = 0;
+  for (const [joinKey, routes] of routesByReqType) {
+    const bare = joinKey.includes('.') ? joinKey.slice(joinKey.lastIndexOf('.') + 1) : joinKey;
+    // Precise package-qualified match first; bare type only as a fallback (covers
+    // a same-package handler or an aliased import where the bare name is unique).
+    const candidates = handlersByKey.get(joinKey) ?? handlersByKey.get(bare);
+    if (!candidates || candidates.length === 0) continue;
+    const requestType = bare;
+    for (const route of routes) {
+      const handler = selectHandler(candidates, route.filePath);
+      if (!handler || route.id === handler.id) continue;
+      const key = `${route.id}>${handler.id}`;
+      if (seen.has(key) || added >= FANOUT_CAP) continue;
+      seen.add(key);
+      edges.push({
+        source: route.id,
+        target: handler.id,
+        kind: 'calls',
+        line: route.startLine,
+        provenance: 'heuristic',
+        metadata: {
+          synthesizedBy: 'goframe-route',
+          route: route.name,
+          requestType,
+          registeredAt: `${handler.filePath}:${handler.startLine}`,
+        },
+      });
+      added++;
+    }
+  }
+  return edges;
+}
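
Reviewer note: the request-type join is easier to follow on a toy input. The sketch below copies `pointerParamTypes` from the diff and pairs it with a simplified lookup that tries the package-qualified key first and the bare type second. `Handler` and `candidatesFor` are hypothetical names used only for illustration; the real synthesizer works on graph `Node` objects and then narrows the candidates with `selectHandler`.

```typescript
// Copied from the diff: pointer-parameter types in qualified and bare form.
function pointerParamTypes(sig: string): string[] {
  const out: string[] = [];
  const re = /\*\s*(?:(\w+)\.)?([A-Z]\w*)\b/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sig)) !== null) {
    if (m[1]) out.push(`${m[1]}.${m[2]}`);
    out.push(m[2]!);
  }
  return out;
}

// Hypothetical, minimal stand-in for a Go method node.
interface Handler {
  name: string;
  signature: string;
}

// Simplified join: index handlers by every pointer type they take, then look a
// route's join key up by its qualified form, falling back to the bare type.
function candidatesFor(joinKey: string, handlers: Handler[]): string[] {
  const byKey = new Map<string, string[]>();
  for (const h of handlers) {
    for (const t of pointerParamTypes(h.signature)) {
      const arr = byKey.get(t) ?? [];
      arr.push(h.name);
      byKey.set(t, arr);
    }
  }
  const bare = joinKey.slice(joinKey.lastIndexOf('.') + 1);
  return byKey.get(joinKey) ?? byKey.get(bare) ?? [];
}

const handlers: Handler[] = [
  { name: 'cashController.List', signature: '(ctx context.Context, req *cash.ListReq) (res *cash.ListRes, err error)' },
  { name: 'orderController.List', signature: '(ctx context.Context, req *order.ListReq) (res *order.ListRes, err error)' },
];

console.log(candidatesFor('cash.ListReq', handlers)); // qualified key: one candidate
console.log(candidatesFor('ListReq', handlers));      // bare key: both candidates
```

The qualified key picks out a single handler, while the bare fallback can return several. That is the case `selectHandler` exists for: it either narrows the list to one handler or emits no edge at all.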

+ 2 - 1
src/sync/watcher.ts

@@ -34,6 +34,7 @@
 import * as fs from 'fs';
 import * as path from 'path';
 import { isSourceFile, buildScopeIgnore, type ScopeIgnore } from '../extraction';
+import { loadExtensionOverrides } from '../project-config';
 import { logDebug, logWarn } from '../errors';
 import { normalizePath } from '../utils';
 import { isCodeGraphDataDir } from '../directory';
@@ -535,7 +536,7 @@ export class FileWatcher {
     if (!rel || rel === '.' || rel.startsWith('..')) return;
     if (this.isAlwaysIgnored(rel)) return;
     if (this.ignoreMatcher && this.ignoreMatcher.ignores(rel)) return;
-    if (!isSourceFile(rel)) return;
+    if (!isSourceFile(rel, loadExtensionOverrides(this.projectRoot))) return;
 
     logDebug('File change detected', { file: rel });
     if (this.ready) {

+ 24 - 2
src/utils.ts

@@ -91,25 +91,47 @@ function isWithinDir(child: string, parent: string): boolean {
  * (codegraph_node `includeCode`, codegraph_explore source) go through here, so
  * this is the chokepoint that keeps out-of-root file contents from leaking.
  *
+ * `allowSymlinkEscape` waives **only** the realpath-escape rejection (the
+ * lexical `../` guard still applies) for the INDEXING read path. The directory
+ * walk deliberately descends into in-root symlinks whose targets live outside
+ * the root (e.g. a `game/` symlink in a Dota custom-game tree, #935); discovery
+ * and the reader must agree, or every file the walk enumerated fails to index.
+ * Indexing only reads paths it just discovered, into a local index — it never
+ * serves them to an agent, so this does not widen the #527 leak surface. The
+ * content-serving sinks must never pass this flag.
+ *
  * @param projectRoot - The project root directory
  * @param filePath - The (relative or absolute) file path to validate
+ * @param options.allowSymlinkEscape - Follow in-root symlinks out of the root
+ *   (indexing read path only); defaults to the strict, leak-safe behavior.
  * @returns The resolved absolute path (realpath when it exists), or null if it
  *   escapes the root
  */
-export function validatePathWithinRoot(projectRoot: string, filePath: string): string | null {
+export function validatePathWithinRoot(
+  projectRoot: string,
+  filePath: string,
+  options?: { allowSymlinkEscape?: boolean }
+): string | null {
   const resolved = path.resolve(projectRoot, filePath);
   const normalizedRoot = path.resolve(projectRoot);
 
-  // 1. Lexical containment — cheap, catches `../` traversal.
+  // 1. Lexical containment — cheap, catches `../` traversal. Applies even on
+  //    the indexing read path: a crafted `../` escape is still rejected.
   if (!isWithinDir(resolved, normalizedRoot)) {
     return null;
   }
 
   // 2. Symlink-aware containment — resolve symlinks on both sides and re-check,
   //    so an in-repo symlink whose real target escapes the root is rejected.
+  //    The indexing read path (allowSymlinkEscape) skips only this rejection so
+  //    it stays consistent with the directory walk, which already followed the
+  //    in-root symlink to enumerate these files (#935).
   try {
     const realRoot = fs.realpathSync(normalizedRoot);
     const realResolved = fs.realpathSync(resolved);
+    if (options?.allowSymlinkEscape) {
+      return realResolved;
+    }
     return isWithinDir(realResolved, realRoot) ? realResolved : null;
   } catch (err) {
     // ENOENT: the path doesn't exist yet (a file about to be written, or an

Not all files can be shown because too many files changed in this diff
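
Reviewer note on the `validatePathWithinRoot` change in `src/utils.ts`: the two-stage check can be sketched without touching the filesystem. This is an illustration under assumptions, not the PR's code. `isWithinDir` is a guessed implementation (its body is outside this hunk), `validate` takes an injected `realpath` function in place of `fs.realpathSync`, `path.posix` is used so the result does not depend on the host OS, and the example paths are made up.

```typescript
import * as path from 'path';

// Assumed implementation: true when `child` equals `parent` or lies beneath it.
function isWithinDir(child: string, parent: string): boolean {
  const rel = path.posix.relative(parent, child);
  if (rel === '') return true;
  return rel !== '..' && !rel.startsWith('../') && !path.posix.isAbsolute(rel);
}

// Sketch of the two-stage check. `realpath` is injected so the example is pure.
function validate(
  root: string,
  filePath: string,
  realpath: (p: string) => string,
  options?: { allowSymlinkEscape?: boolean }
): string | null {
  const resolved = path.posix.resolve(root, filePath);
  const normalizedRoot = path.posix.resolve(root);

  // 1. Lexical containment: always enforced, even with allowSymlinkEscape.
  if (!isWithinDir(resolved, normalizedRoot)) return null;

  // 2. Symlink-aware containment: waived only when allowSymlinkEscape is set.
  const realRoot = realpath(normalizedRoot);
  const realResolved = realpath(resolved);
  if (options?.allowSymlinkEscape) return realResolved;
  return isWithinDir(realResolved, realRoot) ? realResolved : null;
}

// Hypothetical layout: /repo/game is a symlink to a directory outside the repo.
const links = new Map<string, string>([['/repo/game/a.lua', '/steam/dota/game/a.lua']]);
const fakeRealpath = (p: string): string => links.get(p) ?? p;

console.log(validate('/repo', 'game/a.lua', fakeRealpath));
console.log(validate('/repo', 'game/a.lua', fakeRealpath, { allowSymlinkEscape: true }));
console.log(validate('/repo', '../etc/passwd', fakeRealpath, { allowSymlinkEscape: true }));
```

With these inputs the strict call rejects the symlinked file, the indexing call returns its real path, and the `../` traversal is rejected in both modes, which matches the behavior the doc comment describes.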