3 Commits ba209d9489 ... a89315645d

Author SHA1 Message Date
  Colby Mchenry a89315645d feat(go): index GoFrame g.Meta routes and bind them to controller methods (#747) (#957) 15 hours ago
  Colby Mchenry 6459ead6aa fix(extraction): index files reached through in-root symlinks that point outside the repo (#935) (#956) 15 hours ago
  Colby Mchenry d1121e46f0 feat(config): map custom file extensions to languages via codegraph.json (#906) (#955) 17 hours ago

+ 3 - 0
CHANGELOG.md

@@ -23,6 +23,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - `codegraph_explore` now follows **Laravel events** in PHP. An `event(new OrderShipped($order))` call now links to every listener that handles it — each listener's `handle()` method, usually a separate `app/Listeners/` class — so "what reacts to this event?" traces from the dispatch straight into the listener bodies. Listeners are found both ways Laravel registers them: by a typed `handle(OrderShipped $event)` (auto-discovery, including a `handle(A|B $event)` union that listens for two events) and by the `protected $listen` map in your `EventServiceProvider` (which also catches a listener whose `handle()` has no type-hint). One event fans out to all its listeners, and queued jobs — dispatched via `::dispatch()` rather than `event()` — are correctly left out.
 - CodeGraph now understands **Lombok**-generated methods in Java. `@Getter`, `@Setter`, `@Data`, `@Value`, and `@Builder` generate getters, setters, `builder()`, `equals`/`hashCode`/`toString`, and the `@Slf4j` `log` field at compile time, so those methods never appear in the source — and a `user.getName()`, `User.builder()`, or `log.info(...)` call used to resolve to nothing, silently breaking call-chain analysis (the agent would conclude the method didn't exist and reconstruct it by hand). Those members are now indexed from the annotations and fields, so they appear in `codegraph search` and `codegraph_explore`/`codegraph_node`, and callers trace through them like any hand-written method. They're marked as Lombok-generated so they read as generated, not hand-written; a method you write yourself is never overridden, static fields get no accessor, and a class without Lombok is unaffected. Thanks @git87663849. (#912)
 - `codegraph_explore` now follows **C and C++ function-pointer dispatch**. C does polymorphism with function pointers: a struct carries a function-pointer field, concrete functions are registered into it through a table (`static struct cmd commands[] = {{"add", cmd_add}, …}`), a designated initializer (`.handler = on_open`), or an assignment, and the code dispatches indirectly (`p->fn(argv)`). None of that was visible to analysis — the indirect call resolved to nothing, so `git`'s command runner looked like it called nothing and a vtable's implementations had no callers. CodeGraph now links the dispatch site to the registered handlers, keyed by the struct field, so "what runs when this dispatches?" traces from `p->fn(...)` into every function registered for that field. This covers the command-table idiom (git, redis) and the ops-struct/vtable idiom (curl's content-encoders, protocol handlers), including the case where a generic hook slot is reassigned from a registry (`h->func = found->fn`). It stays precise — distinct function-pointer fields don't cross-link, a plain data field is never treated as a dispatch, and a project without function-pointer dispatch is unaffected. (#932)
+- `codegraph_explore` now follows **GoFrame** route bindings in Go. GoFrame's standard router wires routes reflectively: the path and method live in a `g.Meta` struct tag on a request type (`` g.Meta `path:"/user/sign-in" method:"post"` ``), the controller method that serves it is matched by that request type, and the two are joined at runtime by `group.Bind(...)` — so there was no path string and no edge from a route to its handler, and "where is `/user/sign-in` handled?" or "where are the routes bound to controllers?" could only be answered by reading. CodeGraph now indexes each `g.Meta` route as a real route node and links it to the controller method whose signature takes that request type, so a route resolves to its handler structurally in one `codegraph_explore` call. The link is by request type, not method name — so it's correct even when the two differ (a `DeptSearchReq` served by a `List` method); it tells apart the many identical request types a large app defines one-per-module (`cash.ListReq` vs `order.ListReq`) by package, including cloned addon modules; and a route whose handler isn't present is left unlinked rather than guessed. (#747)
 
 - `codegraph_explore` now surfaces the right code in large multi-layer projects. When you ask a backend-flow question in a repo that pairs an API server with a big frontend that mirrors the same domain words — say an `app/` admin UI sitting over an `api/` server — the server-side file that genuinely matches several of your query's terms is no longer pushed out of the results by the larger, more interconnected frontend layer. A file corroborated by two or more distinct query terms is now kept in the answer even when a denser unrelated layer would otherwise crowd it out, so "how does X read items / handle the request" returns the service or handler that does the work instead of a wall of frontend views. Single-layer projects are unaffected; set `CODEGRAPH_RANK_NO_MULTITERM=1` to revert to the previous ranking.
 - Impact and blast-radius analysis for TypeScript, JavaScript, Go, Python, Rust, Ruby, C, Java, C#, PHP, Scala, Kotlin, Swift, Dart, and Pascal/Delphi now understands the readers of a constant. When you change a file-scope, package-level, module-level, or class-level constant — a config object, a lookup table, a shared constant — the other symbols in that file that read it now show up as affected, where before they were invisible (impact only followed calls, imports, and inheritance, so a constant's consumers looked like "nothing depends on this"). This makes `codegraph impact`, and the impact trail in `codegraph_explore`/`codegraph_node`, catch the "change this table, break its readers" class of change. It's on by default and adds no nodes to your graph; bundled/minified files and ambiguously-shadowed names are skipped to keep results precise. Set `CODEGRAPH_VALUE_REFS=0` to turn it off.
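The multi-term rule in the ranking entry above can be pictured as a post-pass over score-sorted results. This is a minimal sketch, not CodeGraph's ranker: the `Candidate` shape, the `rankWithCorroboration` name, and the "displace the weakest uncorroborated hit" policy are assumptions. Only the two-distinct-terms threshold and the opt-out via `CODEGRAPH_RANK_NO_MULTITERM=1` come from the changelog, and the `disabled` flag merely stands in for that variable.

```typescript
// Hypothetical shape of a ranked hit; not CodeGraph's real type.
interface Candidate {
  file: string;
  score: number;          // relevance after any graph-density boost
  matchedTerms: string[]; // query terms this file matched
}

// Keep the top `limit` hits by score, but let a hit that matches at least
// `minTerms` distinct query terms displace the weakest uncorroborated hit.
function rankWithCorroboration(
  candidates: Candidate[],
  limit: number,
  minTerms = 2,
  disabled = false, // stands in for CODEGRAPH_RANK_NO_MULTITERM=1
): Candidate[] {
  const byScore = [...candidates].sort((a, b) => b.score - a.score);
  const result = byScore.slice(0, limit);
  if (disabled) return result;

  // Distinct terms only: matching "items" twice is still one term.
  const corroborated = (c: Candidate) => new Set(c.matchedTerms).size >= minTerms;
  for (const c of byScore.slice(limit)) {
    if (!corroborated(c)) continue;
    // Find the lowest-scoring entry in the current answer that is NOT corroborated.
    let victim = -1;
    for (let i = result.length - 1; i >= 0; i--) {
      if (!corroborated(result[i])) {
        victim = i;
        break;
      }
    }
    if (victim === -1) break; // every slot is already corroborated
    result.splice(victim, 1);
    result.push(c); // c scores no higher than anything left, so order is kept
  }
  return result;
}
```

Displacing rather than appending keeps the answer size fixed, which is one plausible reading of "kept in the answer"; the real implementation may budget its slots differently.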
@@ -30,6 +31,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Java `static final` constants, C# `const` / `static readonly` constants, Scala `object` vals, and Kotlin top-level / `object` / `companion object` `val`s are now classified as constants rather than generic fields, so they participate in the constant-reader impact analysis above — change a `public static final` table, a `const string`, a Scala `object Config { val Timeout = … }`, or a Kotlin `companion object { const val … }` and the methods that read it now show up as affected. (Per-object Java `final` / C# `readonly` / Scala & Kotlin `class` instance properties are unchanged.) Kotlin constants were previously not indexed as their own symbols at all, so they now also appear in `codegraph search`.
 - Swift top-level `let`s and `static let` constants (including those namespaced in an `enum`/`struct`, the common Swift pattern) are now indexed as constants and participate in the constant-reader impact analysis above — change a `static let defaultRetryLimit` or an `enum Constants { static let … }` and the same-file code that reads it shows up as affected. Computed properties and per-instance `let`s are not treated as constants.
 - Dart top-level `const`/`final` and class `static const`/`static final` constants are now indexed as constants and participate in the constant-reader impact analysis above. Instance fields, `var`s, and locals are not treated as constants. (Generated Dart code with the standard `.g.dart`/`.freezed.dart`/`.pb.dart` suffixes is already skipped.)
+- You can now teach CodeGraph about custom file extensions. Drop a `codegraph.json` at your repo root with an `extensions` map — `{ "extensions": { ".dota_lua": "lua", ".tpl": "php" } }` — and files with those extensions get indexed under the language you name, instead of being silently skipped because the extension wasn't one of the built-in defaults. It's opt-in and committed alongside your code so the whole team shares it, your mappings layer on top of the built-ins and win on conflict (you can even re-point a built-in, e.g. `.h` → `cpp`), and a typo'd language or a malformed config is warned about and skipped rather than breaking indexing. Projects without a `codegraph.json` behave exactly as before. (#906)
 
 ### Fixes
 
@@ -48,6 +50,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - A git worktree of a submodule is no longer indexed as a duplicate copy of that submodule's code. CodeGraph already skips ordinary worktrees (a second working view of a repo it indexes), but a worktree created *from a submodule* — common in monorepos that check submodules out into worktrees for parallel feature work — was mistaken for a genuine embedded repo and swept in, duplicating every symbol it shared with the real submodule checkout (one report had ~28% of its index as duplicates, inflating both query results and the database). These submodule worktrees are now recognized and skipped, while the submodule's own checkout stays indexed as distinct code. Thanks @charlesxu2026-ship-it. (#945)
 - A C++ class or struct annotated with an export/visibility macro — `class MYLIB_EXPORT Foo : public Bar { … }`, the common DLL-export / visibility pattern in headers — is no longer mis-indexed as a function spanning the whole class body. Not knowing the macro is a macro, the parser read it as a type and the whole declaration as a function, so the class surfaced as a phantom `function` that showed up as a false caller in `codegraph callers`, `codegraph impact`, and blast-radius analysis, and skewed symbol counts. CodeGraph now recognizes this misparse and drops the bogus node. Thanks @spwlyzx. (#946)
 - `codegraph status` no longer reports phantom pending changes for files CodeGraph deliberately keeps out of the index — a tracked file inside a committed dependency dir (a checked-in `vendor/` or `node_modules/`), or a tracked file under a `.gitignore`d directory. A full index correctly excludes these, and `codegraph sync` never indexes them, but the fast change-detector behind `codegraph status` flagged every edit to such a file as "modified" (and a new one as "added") — so the pending-changes count never cleared no matter how many times you synced. Change detection now applies the same ignore rules the full index does, so `status` agrees with `sync`, and tools built on CodeGraph's change-detection API get the same corrected list. (#766)
+- Files reached through a symlinked directory that points outside your project now index instead of being silently skipped. When a folder in your repo is a symlink to a location outside the repo root — the standard layout for Dota 2 custom games and similar SDK-linked projects, where `game/` and `content/` link into the engine tree — CodeGraph followed the symlink to discover those files but then blocked every one of them while reading, logging `Path traversal blocked in batch reader` and indexing nothing under them. The reader now agrees with the directory scan and indexes those files. The guard against `../` path escapes is unchanged, and the protection that keeps your agent from being served file contents from outside the repo is also unchanged — only the indexer's own read path follows these in-repo symlinks. (#935)
 
 
 ## [1.0.1] - 2026-06-13
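The C and C++ function-pointer dispatch entry in the first CHANGELOG hunk describes linking indirect calls to handlers through a registry keyed by struct field. A minimal sketch under stated assumptions: extraction is assumed to have already produced the registrations (table entries, designated initializers, assignments), slot-to-slot reassignments such as `h->func = found->fn`, and the indirect call sites, with every slot named `struct.field`. The types and `linkDispatch` are hypothetical, not CodeGraph's schema.

```typescript
// Hypothetical extractor output; not CodeGraph's real schema.
interface Registration { slot: string; handler: string } // slot = "struct.field"
interface Reassignment { target: string; source: string } // h->func = found->fn
interface DispatchSite { caller: string; slot: string }   // p->fn(argv)

function linkDispatch(
  registrations: Registration[],
  reassignments: Reassignment[],
  sites: DispatchSite[],
): Map<string, string[]> {
  // slot -> functions registered into that function-pointer field
  const handlers = new Map<string, Set<string>>();
  const add = (slot: string, fn: string): boolean => {
    let set = handlers.get(slot);
    if (!set) {
      set = new Set<string>();
      handlers.set(slot, set);
    }
    const before = set.size;
    set.add(fn);
    return set.size !== before; // true when fn was newly added
  };
  for (const r of registrations) add(r.slot, r.handler);

  // A reassignment copies every handler of the source slot into the target
  // slot; repeat until a full pass adds nothing, so chains are followed.
  let changed = true;
  while (changed) {
    changed = false;
    for (const { target, source } of reassignments) {
      for (const fn of Array.from(handlers.get(source) ?? [])) {
        if (add(target, fn)) changed = true;
      }
    }
  }

  // Each indirect call site links only to handlers of its own slot, so
  // distinct function-pointer fields never cross-link.
  const edges = new Map<string, string[]>();
  for (const { caller, slot } of sites) {
    const targets = Array.from(handlers.get(slot) ?? []).sort();
    edges.set(caller, [...(edges.get(caller) ?? []), ...targets]);
  }
  return edges;
}
```

Keying by `struct.field` rather than by field name alone is what keeps an `ops.open` handler away from a `cmd_struct.fn` call site, and a slot with no registrations (a plain data field) yields no call edges.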

+ 27 - 3
README.md

@@ -585,9 +585,10 @@ that drive the graph directly: `DatabaseConnection`, `QueryBuilder`,
 
 ## Configuration
 
-There isn't any — CodeGraph is zero-config, with **no config file** to write or
-keep in sync. Language support is automatic from the file extension; there's
-nothing to wire up per language.
+Next to none — CodeGraph is **zero-config by default**, with nothing to write or
+keep in sync to get started. Language support is automatic from the file
+extension; there's nothing to wire up per language. The one optional file is for
+mapping [custom file extensions](#custom-file-extensions).
 
 What it skips out of the box:
 
@@ -605,6 +606,29 @@ add a negation — `!vendor/`. The defaults apply uniformly, so committing a
 dependency or build directory doesn't force it into the graph; the `.gitignore`
 negation is the explicit opt-in.
 
+### Custom file extensions
+
+If your project uses a non-standard extension for a [supported
+language](#supported-languages) — say `.dota_lua` for Lua, or `.tpl` for PHP —
+those files are skipped by default, because the extension isn't one CodeGraph
+recognizes. Map them with an optional **`codegraph.json`** at your project root:
+
+```json
+{
+  "extensions": {
+    ".dota_lua": "lua",
+    ".tpl": "php"
+  }
+}
+```
+
+Each value is a supported language id. The mappings merge on top of the built-in
+defaults and win on conflict, so you can also re-point a built-in (e.g.
+`".h": "cpp"`). Commit the file to share the mapping with your team. A typo'd
+language or a malformed file is warned about and skipped — it never breaks
+indexing — and a project with no `codegraph.json` behaves exactly as before.
+Re-index (`codegraph index`) after adding or changing mappings.
+
 ## Telemetry
 
 CodeGraph collects **anonymous usage statistics** — which tools and commands get

+ 157 - 0
__tests__/extension-mapping.test.ts

@@ -0,0 +1,157 @@
+/**
+ * Custom extension → language mapping (#906).
+ *
+ * A project can map non-standard file extensions to a supported language via a
+ * committed `codegraph.json` at the repo root, so files that would otherwise be
+ * silently skipped get indexed under the right grammar. These tests cover the
+ * two choke-point functions (detectLanguage / isSourceFile) honoring an override
+ * map, the loader's validation/normalization/caching of `codegraph.json`, and a
+ * full index proving a custom-extension file is actually extracted — while the
+ * zero-config path stays byte-identical (the file is NOT indexed without config).
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'node:fs';
+import * as path from 'node:path';
+import * as os from 'node:os';
+import { CodeGraph } from '../src';
+import { detectLanguage, isSourceFile } from '../src/extraction/grammars';
+import { loadExtensionOverrides, clearProjectConfigCache } from '../src/project-config';
+
+describe('custom extension → language mapping (#906)', () => {
+  describe('detectLanguage / isSourceFile overrides argument', () => {
+    it('maps a custom extension only when present in the overrides', () => {
+      expect(detectLanguage('a/b.foo')).toBe('unknown');
+      expect(isSourceFile('a/b.foo')).toBe(false);
+
+      expect(detectLanguage('a/b.foo', undefined, { '.foo': 'typescript' })).toBe('typescript');
+      expect(isSourceFile('a/b.foo', { '.foo': 'typescript' })).toBe(true);
+    });
+
+    it('lets a user mapping take precedence over a built-in extension', () => {
+      expect(detectLanguage('x.h')).toBe('c');
+      expect(detectLanguage('x.h', undefined, { '.h': 'cpp' })).toBe('cpp');
+    });
+
+    it('is byte-identical to zero-config behavior when no overrides are passed', () => {
+      expect(detectLanguage('x.ts')).toBe('typescript');
+      expect(detectLanguage('x.py')).toBe('python');
+      expect(isSourceFile('x.ts')).toBe(true);
+      expect(isSourceFile('x.unknownext')).toBe(false);
+    });
+  });
+
+  describe('loadExtensionOverrides (codegraph.json)', () => {
+    let dir: string;
+    beforeEach(() => {
+      dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-extmap-'));
+      clearProjectConfigCache();
+    });
+    afterEach(() => {
+      clearProjectConfigCache();
+      fs.rmSync(dir, { recursive: true, force: true });
+    });
+    const writeConfig = (obj: unknown) =>
+      fs.writeFileSync(
+        path.join(dir, 'codegraph.json'),
+        typeof obj === 'string' ? obj : JSON.stringify(obj)
+      );
+
+    it('returns an empty map when there is no codegraph.json', () => {
+      expect(loadExtensionOverrides(dir)).toEqual({});
+    });
+
+    it('loads and validates a well-formed extensions map', () => {
+      writeConfig({ extensions: { '.foo': 'typescript', '.bar': 'python' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'typescript', '.bar': 'python' });
+    });
+
+    it('normalizes keys (adds a leading dot, lowercases)', () => {
+      writeConfig({ extensions: { foo: 'lua', '.BAR': 'go' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'lua', '.bar': 'go' });
+    });
+
+    it('skips entries whose target is not a supported language', () => {
+      writeConfig({ extensions: { '.foo': 'typescript', '.bad': 'pyhton', '.x': 'unknown' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'typescript' });
+    });
+
+    it('skips multi-part and otherwise unusable extension keys', () => {
+      writeConfig({ extensions: { '.d.ts': 'typescript', 'a/b': 'go', '.': 'lua', '.ok': 'rust' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.ok': 'rust' });
+    });
+
+    it('ignores malformed JSON without throwing', () => {
+      writeConfig('{ not: valid json ');
+      expect(loadExtensionOverrides(dir)).toEqual({});
+    });
+
+    it('ignores a non-object extensions field', () => {
+      writeConfig({ extensions: 'nope' });
+      expect(loadExtensionOverrides(dir)).toEqual({});
+    });
+
+    it('picks up a changed config (mtime-invalidated cache)', () => {
+      writeConfig({ extensions: { '.foo': 'typescript' } });
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'typescript' });
+
+      writeConfig({ extensions: { '.foo': 'go' } });
+      // Force a distinct mtime in case the filesystem clock is coarse.
+      const future = new Date(Date.now() + 2000);
+      fs.utimesSync(path.join(dir, 'codegraph.json'), future, future);
+
+      expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'go' });
+    });
+  });
+
+  describe('indexAll honors codegraph.json end-to-end', () => {
+    let dir: string;
+    beforeEach(() => {
+      dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-extmap-idx-'));
+      clearProjectConfigCache();
+    });
+    afterEach(() => {
+      clearProjectConfigCache();
+      fs.rmSync(dir, { recursive: true, force: true });
+    });
+    const write = (rel: string, body: string) => {
+      const p = path.join(dir, rel);
+      fs.mkdirSync(path.dirname(p), { recursive: true });
+      fs.writeFileSync(p, body);
+    };
+    const indexAndQuery = async () => {
+      const cg = await CodeGraph.init(dir, { silent: true });
+      await cg.indexAll();
+      const db = (cg as any).db.db;
+      const nodes = db
+        .prepare('SELECT name, kind, file_path, language FROM nodes WHERE file_path = ?')
+        .all('widget.foo');
+      const files = db
+        .prepare('SELECT path, language FROM files WHERE path = ?')
+        .all('widget.foo');
+      cg.close?.();
+      return { nodes, files };
+    };
+
+    const SOURCE = 'export function widgetHandler(x: number): number { return x + 1; }\n';
+
+    it('indexes a custom-extension file mapped to a supported language', async () => {
+      write('codegraph.json', JSON.stringify({ extensions: { '.foo': 'typescript' } }));
+      write('widget.foo', SOURCE);
+
+      const { nodes, files } = await indexAndQuery();
+
+      expect(files.length).toBe(1);
+      expect(files[0].language).toBe('typescript');
+      expect(nodes.some((n: any) => n.name === 'widgetHandler' && n.language === 'typescript')).toBe(true);
+    });
+
+    it('does NOT index the same file without codegraph.json (zero-config preserved)', async () => {
+      write('widget.foo', SOURCE);
+
+      const { nodes, files } = await indexAndQuery();
+
+      expect(files.length).toBe(0);
+      expect(nodes.length).toBe(0);
+    });
+  });
+});
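The rules these tests pin down (leading dot added, keys lowercased, multi-part or path-like keys and unknown language ids dropped, malformed JSON tolerated, user mappings winning over built-ins) fit in a small pure function. This sketch is not `loadExtensionOverrides` itself: it takes JSON text rather than reading `codegraph.json` from disk (so there is no mtime cache), and `SUPPORTED`, `BUILTINS`, `parseExtensionOverrides`, and `detectLanguageSketch` are hypothetical stand-ins with deliberately tiny tables.

```typescript
type ExtensionMap = Record<string, string>;

// Hypothetical, deliberately tiny tables for illustration.
const SUPPORTED = new Set(['typescript', 'python', 'lua', 'php', 'go', 'rust', 'c', 'cpp']);
const BUILTINS: ExtensionMap = { '.ts': 'typescript', '.py': 'python', '.h': 'c', '.go': 'go' };

// Validate and normalize the `extensions` map from codegraph.json text.
function parseExtensionOverrides(jsonText: string): ExtensionMap {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch {
    return {}; // malformed JSON: skip the config instead of failing indexing
  }
  const ext = (parsed as { extensions?: unknown } | null)?.extensions;
  if (typeof ext !== 'object' || ext === null || Array.isArray(ext)) return {};

  const out: ExtensionMap = {};
  for (const [rawKey, lang] of Object.entries(ext as Record<string, unknown>)) {
    // Normalize: ensure a leading dot, then lowercase.
    const key = (rawKey.startsWith('.') ? rawKey : `.${rawKey}`).toLowerCase();
    // Single-segment extensions only: rejects ".d.ts", "a/b", and a bare ".".
    if (!/^\.[^./\\]+$/.test(key)) continue;
    // Unknown or misspelled language ids are dropped.
    if (typeof lang !== 'string' || !SUPPORTED.has(lang)) continue;
    out[key] = lang;
  }
  return out;
}

// User overrides are consulted before the built-in defaults, so they win on conflict.
function detectLanguageSketch(filePath: string, overrides: ExtensionMap = {}): string {
  const base = filePath.split(/[\\/]/).pop() ?? filePath;
  const dot = base.lastIndexOf('.');
  if (dot <= 0) return 'unknown'; // no extension, or a dotfile
  const ext = base.slice(dot).toLowerCase();
  return overrides[ext] ?? BUILTINS[ext] ?? 'unknown';
}
```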

+ 72 - 0
__tests__/frameworks.test.ts

@@ -944,6 +944,78 @@ describe('goResolver.extract', () => {
   });
 });
 
+import { goframeResolver } from '../src/resolution/frameworks/goframe';
+
+describe('goframeResolver', () => {
+  it('detects GoFrame from a gogf/gf dependency in go.mod', () => {
+    const ctx: any = {
+      readFile: (f: string) =>
+        f === 'go.mod' ? 'module example.com/app\nrequire github.com/gogf/gf/v2 v2.7.0\n' : null,
+    };
+    expect(goframeResolver.detect(ctx)).toBe(true);
+    const noGf: any = { readFile: (f: string) => (f === 'go.mod' ? 'module example.com/app\n' : null) };
+    expect(goframeResolver.detect(noGf)).toBe(false);
+  });
+
+  it('extracts a route node from a g.Meta request struct (method upper-cased)', () => {
+    const src = `package v1
+import "github.com/gogf/gf/v2/frame/g"
+type SignInReq struct {
+	g.Meta   \`path:"/user/sign-in" method:"post" tags:"User" summary:"Sign in"\`
+	Passport string
+}
+type SignInRes struct{}
+`;
+    const { nodes } = goframeResolver.extract!('api/user/v1/user_sign_in.go', src);
+    expect(nodes).toHaveLength(1);
+    expect(nodes[0].kind).toBe('route');
+    expect(nodes[0].name).toBe('POST /user/sign-in');
+    // The package-qualified request type is encoded for the synthesizer join.
+    expect(nodes[0].qualifiedName).toContain('::goframe-route:v1.SignInReq');
+  });
+
+  it('is independent of g.Meta tag attribute order', () => {
+    const src = `type DeptSearchReq struct {
+	g.Meta \`path:"/dept/list" tags:"Dept" method:"get" summary:"列表"\`
+}`;
+    const { nodes } = goframeResolver.extract!('api/system/dept.go', src);
+    expect(nodes[0].name).toBe('GET /dept/list');
+    expect(nodes[0].qualifiedName).toContain('::goframe-route:DeptSearchReq');
+  });
+
+  it('skips a response g.Meta that has no path (mime-only) and other non-route metadata', () => {
+    const src = `type ListRes struct {
+	g.Meta \`mime:"application/json"\`
+	Items []string
+}`;
+    const { nodes } = goframeResolver.extract!('api/x.go', src);
+    expect(nodes).toHaveLength(0);
+  });
+
+  it('defaults method to ANY when method: is omitted', () => {
+    const src = `type PingReq struct {
+	g.Meta \`path:"/ping"\`
+}`;
+    const { nodes } = goframeResolver.extract!('api/ping.go', src);
+    expect(nodes[0].name).toBe('ANY /ping');
+  });
+
+  it('extracts every request struct in a multi-route api file', () => {
+    const src = `type DeptListReq struct { g.Meta \`path:"/dept/list" method:"get"\` }
+type DeptListRes struct { g.Meta \`mime:"application/json"\` }
+type DeptAddReq struct { g.Meta \`path:"/dept/add" method:"post"\` }
+type DeptAddRes struct {}
+`;
+    const { nodes } = goframeResolver.extract!('api/dept.go', src);
+    expect(nodes.map((n) => n.name).sort()).toEqual(['GET /dept/list', 'POST /dept/add']);
+  });
+
+  it('returns nothing for a non-go file or a file without g.Meta', () => {
+    expect(goframeResolver.extract!('main.ts', 'const x = 1').nodes).toHaveLength(0);
+    expect(goframeResolver.extract!('main.go', 'package main\nfunc main() {}\n').nodes).toHaveLength(0);
+  });
+});
+
 import { rustResolver } from '../src/resolution/frameworks/rust';
 
 describe('rustResolver.extract', () => {

+ 181 - 0
__tests__/goframe.test.ts

@@ -0,0 +1,181 @@
+/**
+ * GoFrame route → controller-method coverage (#747), end to end.
+ *
+ * GoFrame binds routes reflectively, so the route declared in a request type's
+ * `g.Meta` tag has no static edge to the controller method that serves it, and
+ * the method name is NOT derivable from the request type (`DeptSearchReq` is
+ * served by `List`). This indexes a fixture through the full pipeline and
+ * checks: the `g.Meta` tags become route nodes; each route joins to its handler
+ * by the request type in the method signature (the naming-mismatch case
+ * included); a response (`mime`-only) `g.Meta` makes no route; a route with no
+ * handler is left unlinked (silent beats wrong); and the response type never
+ * produces a spurious edge.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'node:fs';
+import * as path from 'node:path';
+import * as os from 'node:os';
+import { CodeGraph } from '../src';
+
+describe('GoFrame route synthesizer', () => {
+  let dir: string;
+  beforeEach(() => { dir = fs.mkdtempSync(path.join(os.tmpdir(), 'goframe-')); });
+  afterEach(() => { fs.rmSync(dir, { recursive: true, force: true }); });
+
+  it('joins each g.Meta route to its controller method by the request-type signature', async () => {
+    fs.writeFileSync(path.join(dir, 'go.mod'), 'module example.com/app\n\nrequire github.com/gogf/gf/v2 v2.7.0\n');
+
+    fs.mkdirSync(path.join(dir, 'api', 'system'), { recursive: true });
+    fs.writeFileSync(
+      path.join(dir, 'api', 'system', 'dept.go'),
+      `package system
+
+import "github.com/gogf/gf/v2/frame/g"
+
+type DeptSearchReq struct {
+	g.Meta   \`path:"/dept/list" tags:"Dept" method:"get" summary:"list"\`
+	DeptName string
+}
+type DeptSearchRes struct {
+	g.Meta \`mime:"application/json"\`
+	List   []string
+}
+
+type DeptAddReq struct {
+	g.Meta \`path:"/dept/add" method:"post"\`
+	Name   string
+}
+type DeptAddRes struct{}
+
+// A declared route whose handler does not exist in this codebase.
+type OrphanReq struct {
+	g.Meta \`path:"/orphan" method:"get"\`
+}
+type OrphanRes struct{}
+`
+    );
+
+    fs.mkdirSync(path.join(dir, 'internal', 'controller'), { recursive: true });
+    fs.writeFileSync(
+      path.join(dir, 'internal', 'controller', 'dept.go'),
+      `package controller
+
+import (
+	"context"
+
+	"example.com/app/api/system"
+)
+
+type sysDeptController struct{}
+
+// NB: method name (List) differs from the request type (DeptSearchReq) — the join
+// must be by signature, not name.
+func (c *sysDeptController) List(ctx context.Context, req *system.DeptSearchReq) (res *system.DeptSearchRes, err error) {
+	return helper(ctx)
+}
+
+func (c *sysDeptController) Add(ctx context.Context, req *system.DeptAddReq) (res *system.DeptAddRes, err error) {
+	return
+}
+
+// Returns the response type but takes no request type — must NOT be linked.
+func helper(ctx context.Context) (res *system.DeptSearchRes, err error) {
+	return
+}
+`
+    );
+
+    const cg = await CodeGraph.init(dir, { silent: true });
+    await cg.indexAll();
+    const db = (cg as any).db.db;
+
+    const routes = db.prepare(`SELECT name FROM nodes WHERE kind='route' ORDER BY name`).all();
+    const edges = db
+      .prepare(
+        `SELECT json_extract(e.metadata,'$.route') route, json_extract(e.metadata,'$.requestType') reqType,
+                e.kind, t.name target_name, t.kind target_kind
+         FROM edges e JOIN nodes t ON t.id = e.target
+         WHERE json_extract(e.metadata,'$.synthesizedBy') = 'goframe-route'
+         ORDER BY route`
+      )
+      .all();
+    cg.close?.();
+
+    // Three routes from path-bearing g.Meta; the mime-only response g.Meta makes none.
+    expect(routes.map((r: any) => r.name)).toEqual(['GET /dept/list', 'GET /orphan', 'POST /dept/add']);
+
+    // Two route→handler edges — the orphan route stays unlinked (silent beats wrong).
+    expect(edges).toHaveLength(2);
+    const byRoute = Object.fromEntries(edges.map((e: any) => [e.route, e]));
+
+    // Naming mismatch resolved by signature: GET /dept/list → List.
+    expect(byRoute['GET /dept/list'].target_name).toBe('List');
+    expect(byRoute['GET /dept/list'].reqType).toBe('DeptSearchReq');
+    expect(byRoute['POST /dept/add'].target_name).toBe('Add');
+
+    // It is a dynamic-dispatch `calls` hop to a real method, never to the helper.
+    expect(edges.every((e: any) => e.kind === 'calls' && e.target_kind === 'method')).toBe(true);
+    expect(edges.some((e: any) => e.target_name === 'helper')).toBe(false);
+    expect(byRoute['GET /orphan']).toBeUndefined();
+  });
+
+  it('disambiguates identical bare request types across modules by package qualifier', async () => {
+    fs.writeFileSync(path.join(dir, 'go.mod'), 'module example.com/app\n\nrequire github.com/gogf/gf/v2 v2.7.0\n');
+
+    // Two modules that BOTH define `type ListReq struct` — the collision a large
+    // GoFrame app has dozens of. The package qualifier in the handler signature
+    // (`*cash.ListReq` vs `*order.ListReq`) is the only thing that tells them apart.
+    for (const mod of ['cash', 'order']) {
+      fs.mkdirSync(path.join(dir, 'api', mod), { recursive: true });
+      fs.writeFileSync(
+        path.join(dir, 'api', mod, `${mod}.go`),
+        `package ${mod}
+
+import "github.com/gogf/gf/v2/frame/g"
+
+type ListReq struct {
+	g.Meta \`path:"/${mod}/list" method:"get"\`
+}
+type ListRes struct{}
+`
+      );
+      fs.mkdirSync(path.join(dir, 'internal', 'controller', mod), { recursive: true });
+      fs.writeFileSync(
+        path.join(dir, 'internal', 'controller', mod, `${mod}.go`),
+        `package ${mod}
+
+import (
+	"context"
+
+	"example.com/app/api/${mod}"
+)
+
+type c${mod} struct{}
+
+func (c *c${mod}) List(ctx context.Context, req *${mod}.ListReq) (res *${mod}.ListRes, err error) {
+	return
+}
+`
+      );
+    }
+
+    const cg = await CodeGraph.init(dir, { silent: true });
+    await cg.indexAll();
+    const db = (cg as any).db.db;
+    const rows = db
+      .prepare(
+        `SELECT json_extract(e.metadata,'$.route') route, t.file_path handler_file
+         FROM edges e JOIN nodes t ON t.id = e.target
+         WHERE json_extract(e.metadata,'$.synthesizedBy') = 'goframe-route'
+         ORDER BY route`
+      )
+      .all();
+    cg.close?.();
+
+    expect(rows).toHaveLength(2);
+    // Each route binds to ITS OWN module's handler, never the other's.
+    const byRoute = Object.fromEntries(rows.map((r: any) => [r.route, r.handler_file]));
+    expect(byRoute['GET /cash/list']).toContain('controller/cash/');
+    expect(byRoute['GET /order/list']).toContain('controller/order/');
+  });
+});
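The two steps this suite exercises, turning `g.Meta` tags into `METHOD /path` route names and joining each route to the method whose signature takes the same package-qualified request type, can be approximated with regular expressions. This is a rough sketch, not `goframeResolver`: every name is hypothetical, and it assumes struct bodies without nested braces, the `func (recv) Name(ctx context.Context, req *pkg.Type)` signature shape, and that a handler's import qualifier equals the api package's name. It also leaves a request type claimed by more than one handler unlinked, an extra "silent beats wrong" choice the source does not specify.

```typescript
interface RouteNode { name: string; requestKey: string }
interface HandlerMethod { name: string; requestKey: string }

// Step 1: every struct whose g.Meta tag carries a path becomes a route.
function extractGoFrameRoutes(src: string): RouteNode[] {
  const pkg = /^\s*package\s+(\w+)/m.exec(src)?.[1];
  const structRe = /type\s+(\w+)\s+struct\s*\{([^}]*)\}/g; // assumes no nested braces
  const routes: RouteNode[] = [];
  let m: RegExpExecArray | null;
  while ((m = structRe.exec(src)) !== null) {
    const typeName = m[1];
    const tag = /g\.Meta\s+`([^`]*)`/.exec(m[2])?.[1];
    if (tag === undefined) continue;
    // Read key:"value" pairs, so attribute order does not matter.
    const attrs = new Map<string, string>();
    const attrRe = /(\w+):"([^"]*)"/g;
    let a: RegExpExecArray | null;
    while ((a = attrRe.exec(tag)) !== null) attrs.set(a[1], a[2]);
    const routePath = attrs.get('path');
    if (!routePath) continue; // mime-only response metadata is not a route
    const method = (attrs.get('method') ?? 'any').toUpperCase();
    routes.push({
      name: `${method} ${routePath}`,
      requestKey: pkg ? `${pkg}.${typeName}` : typeName,
    });
  }
  return routes;
}

// Step 2: controller methods shaped like Name(ctx context.Context, req *pkg.Type).
function extractHandlers(src: string): HandlerMethod[] {
  const re =
    /func\s*\(\s*\w+\s+\*?\w+\s*\)\s*(\w+)\s*\(\s*\w+\s+context\.Context\s*,\s*\w+\s+\*(\w+)\.(\w+)\s*\)/g;
  const out: HandlerMethod[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(src)) !== null) out.push({ name: m[1], requestKey: `${m[2]}.${m[3]}` });
  return out;
}

// Step 3: join by package-qualified request type, never by method name.
function joinRoutes(routes: RouteNode[], handlers: HandlerMethod[]): Map<string, string> {
  const byKey = new Map<string, string[]>();
  for (const h of handlers) byKey.set(h.requestKey, [...(byKey.get(h.requestKey) ?? []), h.name]);
  const edges = new Map<string, string>();
  for (const r of routes) {
    const candidates = byKey.get(r.requestKey) ?? [];
    // No handler, or an ambiguous one: leave the route unlinked.
    if (candidates.length === 1) edges.set(r.name, candidates[0]);
  }
  return edges;
}
```

Because the join key is `pkg.Type`, `cash.ListReq` and `order.ListReq` never collide, and a `DeptSearchReq` served by a method named `List` still links.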

+ 46 - 0
__tests__/security.test.ts

@@ -232,6 +232,26 @@ describe('Symlink escape prevention (#527)', () => {
     expect(validatePathWithinRoot(root, 'src/inlink.ts')).not.toBeNull();
   });
 
+  // The INDEXING read path opts into following in-root symlinks the directory
+  // walk already descended into — discovery and the reader must agree, or files
+  // reached via an in-root symlink-to-outside fail to index (#935). The lexical
+  // `../` guard is NOT waived, and content-serving sinks never pass the flag.
+  it('allowSymlinkEscape follows an in-repo symlink to an out-of-root FILE (indexing read)', () => {
+    if (!link(path.join(root, 'escape'), path.join(outside, 'pkg', 'secret.txt'))) return;
+    expect(validatePathWithinRoot(root, 'escape', { allowSymlinkEscape: true })).not.toBeNull();
+  });
+
+  it('allowSymlinkEscape follows a path through an in-repo out-of-root DIR symlink (indexing read)', () => {
+    if (!link(path.join(root, 'escapedir'), path.join(outside, 'pkg'))) return;
+    expect(validatePathWithinRoot(root, 'escapedir/secret.txt', { allowSymlinkEscape: true })).not.toBeNull();
+  });
+
+  it('allowSymlinkEscape STILL rejects a lexical ../ traversal (guard not waived)', () => {
+    expect(
+      validatePathWithinRoot(root, `../${path.basename(outside)}/pkg/secret.txt`, { allowSymlinkEscape: true })
+    ).toBeNull();
+  });
+
   it('end-to-end: getCode never serves an out-of-root file reached via a dir symlink', async () => {
     fs.writeFileSync(path.join(outside, 'pkg', 'leak.ts'),
       'export function leaked() { return "LEAKED-ZZZ-9"; }\n');
@@ -250,6 +270,32 @@ describe('Symlink escape prevention (#527)', () => {
       cg.close();
     }
   });
+
+  it('end-to-end (#935): indexes source reached through an in-root dir symlink to outside', async () => {
+    // The Dota custom-game layout symlinks `game/` and `content/` into an SDK
+    // tree outside the repo. Before #935 the batch reader's strict symlink-escape
+    // guard blocked every file under them, so nothing indexed — even though the
+    // directory walk deliberately followed the symlink to enumerate them. The
+    // reader now agrees with discovery: the file indexes.
+    fs.writeFileSync(path.join(outside, 'pkg', 'vendored.ts'),
+      'export function vendoredHelper() { return "LEAKED-ZZZ-9"; }\n');
+    if (!link(path.join(root, 'game'), path.join(outside, 'pkg'))) return;
+
+    const cg = CodeGraph.initSync(root, { config: { include: ['**/*.ts'], exclude: [] } });
+    try {
+      await cg.indexAll();
+      // The symlinked-in file is now part of the graph...
+      const names = cg.getNodesByKind('function').map((n) => n.name);
+      expect(names).toContain('vendoredHelper');
+      // ...but its out-of-root contents are STILL never served (the #527 sink
+      // stays strict — indexing relaxes only the read path, not getCode).
+      for (const n of cg.getNodesByKind('function')) {
+        expect((await cg.getCode(n.id)) ?? '').not.toContain('LEAKED-ZZZ-9');
+      }
+    } finally {
+      cg.close();
+    }
+  });
 });
 
 describe('validateProjectPath — sensitive directory blocking', () => {
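The last of the three new unit tests checks that `allowSymlinkEscape` relaxes only symlink following and never the lexical `../` check. Below is a minimal sketch of a purely lexical containment check. It assumes the check is done with Node's `path.resolve` and `path.relative` before any symlink is resolved. `isLexicallyWithinRoot` is a hypothetical name, not CodeGraph's `validatePathWithinRoot`:

```typescript
import * as path from 'path';

// Hypothetical helper (not CodeGraph's validatePathWithinRoot): a purely
// lexical containment check. It never calls the filesystem, so symlinks are
// invisible to it. What it catches are `../` segments and absolute paths.
function isLexicallyWithinRoot(root: string, relPath: string): boolean {
  const absRoot = path.resolve(root);
  const target = path.resolve(absRoot, relPath);
  const rel = path.relative(absRoot, target);
  if (rel === '') return true; // the root itself
  // An escape starts with a `..` segment. On Windows, a target on a different
  // drive makes path.relative return an absolute path.
  return rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel);
}
```

The check never touches the filesystem, so an in-root symlink such as `escapedir` passes it. Only a realpath step could notice the escape, and that is presumably the step the flag relaxes for indexing reads.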

Some changes were hidden because the diff is too large
+ 1 - 0
docs/design/dynamic-dispatch-coverage-playbook.md


+ 19 - 2
site/src/content/docs/getting-started/configuration.md

@@ -1,9 +1,9 @@
 ---
 title: Configuration
-description: CodeGraph is zero-config — there are no config files.
+description: CodeGraph is zero-config by default, with one optional file for mapping custom extensions.
 ---
 
-There isn't any — CodeGraph is **zero-config**, with **no config file** to write or keep in sync. Language support is automatic from the file extension; there's nothing to wire up per language.
+Next to none — CodeGraph is **zero-config by default**, with nothing to write or keep in sync to get started. Language support is automatic from the file extension; there's nothing to wire up per language. The one optional file is for mapping [custom file extensions](#custom-file-extensions).
 
 ## What it skips out of the box
 
@@ -17,6 +17,23 @@ To keep something else out, add it to `.gitignore`. To pull a default-excluded d
 
 The defaults apply uniformly, so committing a dependency or build directory doesn't force it into the graph — the `.gitignore` negation is the explicit opt-in.
 
+## Custom file extensions
+
+If your project uses a non-standard extension for a [supported language](/codegraph/reference/languages/) — say `.dota_lua` for Lua, or `.tpl` for PHP — those files are skipped by default, because the extension isn't one CodeGraph recognizes. Map them with an optional `codegraph.json` at your project root:
+
+```json
+{
+  "extensions": {
+    ".dota_lua": "lua",
+    ".tpl": "php"
+  }
+}
+```
+
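+Each value is a supported language id. Each key names a single final extension (`.foo`, not `.d.ts`), because detection only looks at a file's last extension. The leading dot is optional, and matching is case-insensitive. The mappings merge on top of the built-in defaults and win on conflict, so you can also re-point a built-in (e.g. `".h": "cpp"`). Commit the file to share the mapping with your team.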
+
+A typo'd language or a malformed file is warned about and skipped — it never breaks indexing — and a project with no `codegraph.json` behaves exactly as before. Re-index (`codegraph index`) after adding or changing mappings.
+
 ## Where data lives
 
 Per-project data lives in a `.codegraph/` directory at your project root, containing the SQLite database (`codegraph.db`). Nothing leaves your machine.

+ 14 - 5
src/extraction/grammars.ts

@@ -121,13 +121,18 @@ export const EXTENSION_MAP: Record<string, Language> = {
  * Whether a file is one CodeGraph can parse, based purely on its extension.
  * This is the single source of truth for "should we index this file" — derived
  * from EXTENSION_MAP so parser support and indexing selection never drift.
+ *
+ * `overrides` is the project's validated custom extension → language map (from
+ * `codegraph.json`); when present its extensions count as indexable in addition
+ * to the built-ins. Omitting it is byte-identical to the zero-config behavior.
  */
-export function isSourceFile(filePath: string): boolean {
+export function isSourceFile(filePath: string, overrides?: Record<string, Language>): boolean {
   if (isPlayRoutesFile(filePath)) return true; // Play `conf/routes` is extensionless
   if (isShopifyLiquidJson(filePath)) return true; // Shopify OS 2.0 JSON templates / section groups
   const dot = filePath.lastIndexOf('.');
   if (dot < 0) return false;
-  return filePath.slice(dot).toLowerCase() in EXTENSION_MAP;
+  const ext = filePath.slice(dot).toLowerCase();
+  return ext in EXTENSION_MAP || (!!overrides && ext in overrides);
 }
 
 /**
@@ -266,9 +271,13 @@ export function getParser(language: Language): Parser | null {
 }
 
 /**
- * Detect language from file extension
+ * Detect language from file extension.
+ *
+ * `overrides` is the project's validated custom extension → language map (from
+ * `codegraph.json`); when present its mappings take precedence over the built-in
+ * `EXTENSION_MAP`. Omitting it is byte-identical to the zero-config behavior.
  */
-export function detectLanguage(filePath: string, source?: string): Language {
+export function detectLanguage(filePath: string, source?: string, overrides?: Record<string, Language>): Language {
   // Play `conf/routes` has no grammar — route through the no-symbol path; the
   // Play framework resolver extracts route nodes from it.
   if (isPlayRoutesFile(filePath)) return 'yaml';
@@ -276,7 +285,7 @@ export function detectLanguage(filePath: string, source?: string): Language {
   // Shopify OS 2.0 JSON templates / section groups → the Liquid extractor (it
   // links each section `"type"` to its `sections/<type>.liquid`).
   if (isShopifyLiquidJson(filePath)) return 'liquid';
-  const lang = EXTENSION_MAP[ext] || 'unknown';
+  const lang = (overrides && overrides[ext]) || EXTENSION_MAP[ext] || 'unknown';
 
   // .h files could be C, C++, or Objective-C — check source content
   if (lang === 'c' && ext === '.h' && source) {
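The two `grammars.ts` changes amount to a lookup order. Project overrides are checked first, then the built-in `EXTENSION_MAP`, and both are keyed on the final lowercased extension. Below is a reduced sketch of that order. It uses a tiny hypothetical `BUILTIN` map in place of `EXTENSION_MAP` and leaves out the Play/Shopify special cases and the `.h` content sniffing:

```typescript
type Language = 'lua' | 'php' | 'c' | 'cpp' | 'unknown';

// Hypothetical stand-in for the built-in EXTENSION_MAP (the real one is far larger).
const BUILTIN: Record<string, Language> = { '.lua': 'lua', '.php': 'php', '.c': 'c', '.h': 'c' };

// Final extension only, lowercased, like the lastIndexOf('.') slice in the diff.
function extOf(filePath: string): string | null {
  const dot = filePath.lastIndexOf('.');
  return dot < 0 ? null : filePath.slice(dot).toLowerCase();
}

// Indexable if either the built-ins or the project overrides know the extension.
function isSourceFileSketch(filePath: string, overrides?: Record<string, Language>): boolean {
  const ext = extOf(filePath);
  if (ext === null) return false;
  return ext in BUILTIN || (!!overrides && ext in overrides);
}

// Overrides win over built-ins; anything unmapped is 'unknown'.
function detectLanguageSketch(filePath: string, overrides?: Record<string, Language>): Language {
  const ext = extOf(filePath);
  if (ext === null) return 'unknown';
  return (overrides && overrides[ext]) || BUILTIN[ext] || 'unknown';
}
```

Because `overrides` is optional and checked only after the built-in lookup in `isSourceFileSketch`, callers that omit it get exactly the built-in behavior.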

+ 50 - 23
src/extraction/index.ts

@@ -19,6 +19,7 @@ import {
 import { QueryBuilder } from '../db/queries';
 import { extractFromSource } from './tree-sitter';
 import { detectLanguage, isSourceFile, isLanguageSupported, isFileLevelOnlyLanguage, initGrammars, loadGrammarsForLanguages } from './grammars';
+import { loadExtensionOverrides } from '../project-config';
 import { isCodeGraphDataDir } from '../directory';
 import { logDebug, logWarn } from '../errors';
 import { validatePathWithinRoot, normalizePath } from '../utils';
@@ -637,14 +638,17 @@ interface GitChanges {
 function getGitChangedFiles(rootDir: string): GitChanges | null {
   try {
     const changes: GitChanges = { modified: [], added: [], deleted: [] };
-    collectGitStatus(rootDir, '', changes);
+    // Custom extension → language overrides from the project's codegraph.json,
+    // so change detection sees the same custom-extension files the full index does.
+    const overrides = loadExtensionOverrides(rootDir);
+    collectGitStatus(rootDir, '', changes, overrides);
     return changes;
   } catch {
     return null;
   }
 }
 
-function collectGitStatus(repoDir: string, prefix: string, out: GitChanges): void {
+function collectGitStatus(repoDir: string, prefix: string, out: GitChanges, overrides?: Record<string, Language>): void {
   const output = execFileSync(
     'git',
     ['status', '--porcelain', '--no-renames'],
@@ -678,7 +682,7 @@ function collectGitStatus(repoDir: string, prefix: string, out: GitChanges): voi
     }
 
     const filePath = normalizePath(prefix + rel);
-    if (!isSourceFile(filePath)) continue;
+    if (!isSourceFile(filePath, overrides)) continue;
 
     if (statusCode.includes('D')) {
       // Deletions stay unfiltered: getChangedFiles acts on one only when the
@@ -704,11 +708,11 @@ function collectGitStatus(repoDir: string, prefix: string, out: GitChanges): voi
   // nested deeper) and under this repo's gitignored dirs.
   for (const rel of untrackedDirs) {
     for (const repoRel of findNestedGitRepos(path.join(repoDir, rel), rel)) {
-      collectGitStatus(path.join(repoDir, repoRel), prefix + repoRel, out);
+      collectGitStatus(path.join(repoDir, repoRel), prefix + repoRel, out, overrides);
     }
   }
   for (const rel of findIgnoredEmbeddedRepos(repoDir)) {
-    collectGitStatus(path.join(repoDir, rel), prefix + rel, out);
+    collectGitStatus(path.join(repoDir, rel), prefix + rel, out, overrides);
   }
 }
 
@@ -723,13 +727,16 @@ export function scanDirectory(
   rootDir: string,
   onProgress?: (current: number, file: string) => void
 ): string[] {
+  // Custom extension → language overrides from the project's codegraph.json.
+  const overrides = loadExtensionOverrides(rootDir);
+
   // Fast path: use git to get all visible files (respects .gitignore everywhere)
   const gitFiles = getGitVisibleFiles(rootDir);
   if (gitFiles) {
     const files: string[] = [];
     let count = 0;
     for (const filePath of gitFiles) {
-      if (isSourceFile(filePath)) {
+      if (isSourceFile(filePath, overrides)) {
         files.push(filePath);
         count++;
         onProgress?.(count, filePath);
@@ -750,12 +757,15 @@ export async function scanDirectoryAsync(
   rootDir: string,
   onProgress?: (current: number, file: string) => void
 ): Promise<string[]> {
+  // Custom extension → language overrides from the project's codegraph.json.
+  const overrides = loadExtensionOverrides(rootDir);
+
   const gitFiles = getGitVisibleFiles(rootDir);
   if (gitFiles) {
     const files: string[] = [];
     let count = 0;
     for (const filePath of gitFiles) {
-      if (isSourceFile(filePath)) {
+      if (isSourceFile(filePath, overrides)) {
         files.push(filePath);
         count++;
         onProgress?.(count, filePath);
@@ -781,6 +791,8 @@ function scanDirectoryWalk(
   const files: string[] = [];
   let count = 0;
   const visitedDirs = new Set<string>();
+  // Custom extension → language overrides from the project's codegraph.json.
+  const overrides = loadExtensionOverrides(rootDir);
 
   // A .gitignore matcher scoped to the directory that declared it. Patterns in
   // a nested .gitignore are relative to that directory, so we keep the dir
@@ -857,7 +869,7 @@ function scanDirectoryWalk(
               walk(fullPath, active);
             }
           } else if (stat.isFile()) {
-            if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath)) {
+            if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath, overrides)) {
               files.push(relativePath);
               count++;
               onProgress?.(count, relativePath);
@@ -874,7 +886,7 @@ function scanDirectoryWalk(
           walk(fullPath, active);
         }
       } else if (entry.isFile()) {
-        if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath)) {
+        if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath, overrides)) {
           files.push(relativePath);
           count++;
           onProgress?.(count, relativePath);
@@ -994,6 +1006,11 @@ export class ExtractionOrchestrator {
     let totalNodes = 0;
     let totalEdges = 0;
 
+    // Custom extension → language overrides from the project's codegraph.json.
+    // Threaded into language detection so custom-extension files load the right
+    // grammar and store under the mapped language.
+    const overrides = loadExtensionOverrides(this.rootDir);
+
     const log = verbose
       ? (msg: string) => { console.log(`[worker] ${msg}`); }
       : (_msg: string) => {};
@@ -1050,7 +1067,7 @@ export class ExtractionOrchestrator {
     await new Promise(resolve => setImmediate(resolve));
 
     // Detect needed languages and load grammars in the parse worker
-    const neededLanguages = [...new Set(files.map((f) => detectLanguage(f)))];
+    const neededLanguages = [...new Set(files.map((f) => detectLanguage(f, undefined, overrides)))];
     // .h files default to 'c' but may be C++ — ensure cpp grammar is loaded when c is needed
     if (neededLanguages.includes('c') && !neededLanguages.includes('cpp')) {
       neededLanguages.push('cpp');
@@ -1161,12 +1178,17 @@ export class ExtractionOrchestrator {
     }
 
     async function requestParse(filePath: string, content: string): Promise<ExtractionResult> {
+      // Resolve the language on the main thread (where the project's
+      // codegraph.json overrides are loaded) and hand it to the worker, so the
+      // worker never needs the override map itself.
+      const language = detectLanguage(filePath, content, overrides);
+
       if (!WorkerClass) {
         // In-process fallback
         return extractFromSource(
           filePath,
           content,
-          detectLanguage(filePath, content),
+          language,
           frameworkNames
         );
       }
@@ -1198,7 +1220,7 @@ export class ExtractionOrchestrator {
         }, timeoutMs);
 
         pendingParses.set(id, { resolve, reject, timer });
-        worker.postMessage({ type: 'parse', id, filePath, content, frameworkNames });
+        worker.postMessage({ type: 'parse', id, filePath, content, frameworkNames, language });
       });
     }
 
@@ -1223,7 +1245,10 @@ export class ExtractionOrchestrator {
       const fileContents = await Promise.all(
         batch.map(async (fp) => {
           try {
-            const fullPath = validatePathWithinRoot(this.rootDir, fp);
+            // Indexing read: follow in-root symlinks the directory walk already
+            // descended into (the `../` guard still applies) so files reached
+            // via an in-root symlink-to-outside still index (#935).
+            const fullPath = validatePathWithinRoot(this.rootDir, fp, { allowSymlinkEscape: true });
             if (!fullPath) {
               logWarn('Path traversal blocked in batch reader', { filePath: fp });
               return { filePath: fp, content: null as string | null, stats: null as fs.Stats | null, error: new Error('Path traversal blocked') };
@@ -1312,7 +1337,7 @@ export class ExtractionOrchestrator {
 
         // Store in database on main thread (SQLite is not thread-safe)
         if (result.nodes.length > 0 || result.errors.length === 0) {
-          const language = detectLanguage(filePath, content);
+          const language = detectLanguage(filePath, content, overrides);
           this.storeExtractionResult(filePath, content, language, stats, result);
         }
 
@@ -1333,7 +1358,7 @@ export class ExtractionOrchestrator {
           // Files with no symbols but no errors (yaml, twig, properties) are
           // tracked at the file level — count them as indexed so the CLI
           // doesn't misleadingly report "No files found to index".
-          const lang = detectLanguage(filePath, content);
+          const lang = detectLanguage(filePath, content, overrides);
           if (isFileLevelOnlyLanguage(lang)) {
             filesIndexed++;
           } else {
@@ -1393,7 +1418,7 @@ export class ExtractionOrchestrator {
         }
 
         if (result.nodes.length > 0 || result.errors.length === 0) {
-          const language = detectLanguage(filePath, content);
+          const language = detectLanguage(filePath, content, overrides);
           const stats = await fsp.stat(path.join(this.rootDir, filePath));
           this.storeExtractionResult(filePath, content, language, stats, result);
 
@@ -1444,7 +1469,7 @@ export class ExtractionOrchestrator {
           }
 
           if (result.nodes.length > 0 || result.errors.length === 0) {
-            const language = detectLanguage(filePath, fullContent);
+            const language = detectLanguage(filePath, fullContent, overrides);
             const stats = await fsp.stat(path.join(this.rootDir, filePath));
             this.storeExtractionResult(filePath, fullContent, language, stats, result);
 
@@ -1529,7 +1554,8 @@ export class ExtractionOrchestrator {
    * Index a single file
    */
   async indexFile(relativePath: string): Promise<ExtractionResult> {
-    const fullPath = validatePathWithinRoot(this.rootDir, relativePath);
+    // Indexing read: follow in-root symlinks (the `../` guard still applies), #935.
+    const fullPath = validatePathWithinRoot(this.rootDir, relativePath, { allowSymlinkEscape: true });
 
     if (!fullPath) {
       return {
@@ -1576,8 +1602,8 @@ export class ExtractionOrchestrator {
     content: string,
     stats: fs.Stats
   ): Promise<ExtractionResult> {
-    // Prevent path traversal
-    const fullPath = validatePathWithinRoot(this.rootDir, relativePath);
+    // Prevent `../` traversal; follow in-root symlinks like the directory walk (#935).
+    const fullPath = validatePathWithinRoot(this.rootDir, relativePath, { allowSymlinkEscape: true });
     if (!fullPath) {
       logWarn('Path traversal blocked in indexFileWithContent', { relativePath });
       return {
@@ -1607,8 +1633,8 @@ export class ExtractionOrchestrator {
       };
     }
 
-    // Detect language
-    const language = detectLanguage(relativePath, content);
+    // Detect language (honoring the project's codegraph.json extension overrides)
+    const language = detectLanguage(relativePath, content, loadExtensionOverrides(this.rootDir));
     if (!isLanguageSupported(language)) {
       return {
         nodes: [],
@@ -1863,7 +1889,8 @@ export class ExtractionOrchestrator {
 
     // Load only grammars needed for changed files
     if (filesToIndex.length > 0) {
-      const neededLanguages = [...new Set(filesToIndex.map((f) => detectLanguage(f)))];
+      const overrides = loadExtensionOverrides(this.rootDir);
+      const neededLanguages = [...new Set(filesToIndex.map((f) => detectLanguage(f, undefined, overrides)))];
       // .h files default to 'c' but may be C++ — ensure cpp grammar is loaded
       if (neededLanguages.includes('c') && !neededLanguages.includes('cpp')) {
         neededLanguages.push('cpp');

+ 5 - 2
src/extraction/parse-worker.ts

@@ -55,14 +55,17 @@ import type { Language, ExtractionResult } from '../types';
 const PARSER_RESET_INTERVAL = 5000;
 const parseCounts = new Map<Language, number>();
 
-parentPort!.on('message', async (msg: { type: string; id?: number; filePath?: string; content?: string; languages?: Language[]; frameworkNames?: string[] }) => {
+parentPort!.on('message', async (msg: { type: string; id?: number; filePath?: string; content?: string; languages?: Language[]; frameworkNames?: string[]; language?: Language }) => {
   if (msg.type === 'load-grammars') {
     await loadGrammarsForLanguages(msg.languages!);
     parentPort!.postMessage({ type: 'grammars-loaded' });
   } else if (msg.type === 'parse') {
     const { id, filePath, content, frameworkNames } = msg;
     try {
-      const language = detectLanguage(filePath!, content);
+      // The main thread resolves the language (it holds the project's
+      // codegraph.json extension overrides) and sends it; fall back to detection
+      // for older callers / safety.
+      const language = msg.language ?? detectLanguage(filePath!, content);
       const result: ExtractionResult = extractFromSource(filePath!, content!, language, frameworkNames);
 
       // Periodic parser reset to reclaim WASM heap memory
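The worker change moves language resolution to the main thread. The main thread holds the overrides, so it sends `language` with each `parse` message, and the worker falls back to its own detection only when that field is missing. Below is a reduced sketch of that contract. `ParseMsg`, `detectBuiltin` and `languageForMessage` are hypothetical names:

```typescript
type Language = 'go' | 'lua' | 'unknown';

interface ParseMsg {
  type: 'parse';
  filePath: string;
  content: string;
  language?: Language; // set by the main thread, which holds the overrides
}

// Hypothetical stand-in for the worker's own detectLanguage (built-ins only).
function detectBuiltin(filePath: string): Language {
  if (filePath.endsWith('.go')) return 'go';
  if (filePath.endsWith('.lua')) return 'lua';
  return 'unknown';
}

// Worker side: trust the language the main thread sent, else detect locally.
function languageForMessage(msg: ParseMsg): Language {
  return msg.language ?? detectBuiltin(msg.filePath);
}
```

As in the diff, the fallback only knows the built-in extensions. A sender that omits `language` therefore gets `'unknown'` for a custom extension such as `.dota_lua`.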

+ 8 - 0
src/mcp/tools.ts

@@ -1651,6 +1651,14 @@ export class ToolHandler {
         registeredAt,
       };
     }
+    if (m?.synthesizedBy === 'goframe-route') {
+      const route = m.route ? `\`${String(m.route)}\`` : 'a route';
+      return {
+        label: `GoFrame route ${route} — reflective Bind → controller method (dynamic dispatch)`,
+        compact: `dynamic: GoFrame route ${m.route ? String(m.route) : ''}${at}`,
+        registeredAt,
+      };
+    }
     // Generic fallback for any other synthesizer (redux-thunk, gin-middleware-chain,
     // flutter-build, …): a synthesized hop must never read as a bare static `calls`.
     // It's a dynamic-dispatch bridge — label it as one and keep its wiring site.

+ 155 - 0
src/project-config.ts

@@ -0,0 +1,155 @@
+/**
+ * Project-scoped configuration: a committed `codegraph.json` at the project
+ * root that a team shares through version control.
+ *
+ * Today it carries one thing — `extensions`, an opt-in map from a custom file
+ * extension to one of CodeGraph's supported languages. The built-in
+ * extension → language table (`EXTENSION_MAP` in `extraction/grammars.ts`) is
+ * otherwise hardcoded, so a codebase that uses a non-standard extension for a
+ * supported language (e.g. `.dota_lua` for Lua) sees those files silently
+ * skipped. This lets the project map them once, in a version-controlled file:
+ *
+ *   {
+ *     "extensions": {
+ *       ".dota_lua": "lua",
+ *       ".tpl": "php"
+ *     }
+ *   }
+ *
+ * User mappings merge on TOP of the built-ins and win on conflict, so a project
+ * can also re-point a built-in extension (e.g. force `.h` → `cpp`). Absent or
+ * malformed config is the zero-config default — no overrides, no error. Invalid
+ * individual entries are warned-and-skipped (never fatal): an unparseable
+ * project file must not break indexing.
+ */
+import * as fs from 'fs';
+import * as path from 'path';
+import { Language } from './types';
+import { isLanguageSupported } from './extraction/grammars';
+import { logWarn } from './errors';
+
+/** Filename of the project-scoped config, resolved relative to the project root. */
+export const PROJECT_CONFIG_FILENAME = 'codegraph.json';
+
+export interface ProjectConfig {
+  /** Map of custom file extension (`.foo`) to a supported language id. */
+  extensions?: Record<string, string>;
+}
+
+interface CacheEntry {
+  mtimeMs: number;
+  overrides: Record<string, Language>;
+}
+
+/**
+ * Cache keyed by project root. The loader runs at every scan/index/sync entry
+ * point, once per file in `indexFileWithContent`, and per watch event, so the
+ * mtime guard keeps each repeat call to one `stat` while a single
+ * `codegraph.json` is in force. Keying by root keeps two projects in the same
+ * process (the daemon / multi-project MCP server) isolated.
+ */
+const overridesCache = new Map<string, Record<string, Language>>();
+const cacheMeta = new Map<string, CacheEntry>();
+
+/** Shared frozen empty map so the no-config path allocates nothing. */
+const EMPTY: Record<string, Language> = Object.freeze({});
+
+/**
+ * Normalize a user-provided extension key to the `.ext` lowercase form used by
+ * the built-in map. Returns null for keys that can never match a real file
+ * extension (so the caller warns and skips):
+ *   - empty / just "."
+ *   - multi-part (".d.ts") — language detection keys off the FINAL extension
+ *     only (`lastIndexOf('.')`), so a multi-dot key would never be consulted.
+ *   - anything containing a path separator.
+ */
+function normalizeExtKey(raw: string): string | null {
+  if (typeof raw !== 'string') return null;
+  let ext = raw.trim().toLowerCase();
+  if (!ext) return null;
+  if (!ext.startsWith('.')) ext = '.' + ext;
+  const body = ext.slice(1);
+  if (!body) return null;
+  if (body.includes('.') || body.includes('/') || body.includes('\\')) return null;
+  return ext;
+}
+
+/**
+ * Parse and validate the `extensions` map out of a `codegraph.json` file.
+ * Every failure mode degrades to "no overrides from this entry" — a bad file or
+ * a typo'd language never throws.
+ */
+function parseExtensionOverrides(file: string): Record<string, Language> {
+  let raw: string;
+  try {
+    raw = fs.readFileSync(file, 'utf-8');
+  } catch {
+    return EMPTY;
+  }
+
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(raw);
+  } catch (err) {
+    logWarn(`Ignoring ${PROJECT_CONFIG_FILENAME}: not valid JSON`, {
+      file,
+      error: err instanceof Error ? err.message : String(err),
+    });
+    return EMPTY;
+  }
+
+  if (!parsed || typeof parsed !== 'object') return EMPTY;
+  const exts = (parsed as ProjectConfig).extensions;
+  if (!exts || typeof exts !== 'object' || Array.isArray(exts)) return EMPTY;
+
+  const out: Record<string, Language> = {};
+  for (const [rawKey, rawVal] of Object.entries(exts)) {
+    const key = normalizeExtKey(rawKey);
+    if (!key) {
+      logWarn(`Ignoring extension mapping in ${PROJECT_CONFIG_FILENAME}: "${rawKey}" is not a valid file extension`, { file });
+      continue;
+    }
+    if (typeof rawVal !== 'string' || !isLanguageSupported(rawVal as Language)) {
+      logWarn(`Ignoring extension "${rawKey}" in ${PROJECT_CONFIG_FILENAME}: "${String(rawVal)}" is not a supported language`, { file });
+      continue;
+    }
+    out[key] = rawVal as Language;
+  }
+
+  return Object.keys(out).length > 0 ? out : EMPTY;
+}
+
+/**
+ * Load the validated extension overrides for a project, mtime-cached.
+ *
+ * Returns a map of `.ext` → supported language id. The result merges on top of
+ * the built-in extension map at the point of use (see `detectLanguage` /
+ * `isSourceFile`), with these user mappings taking precedence. Returns an empty
+ * map when there is no `codegraph.json` (the zero-config default).
+ */
+export function loadExtensionOverrides(rootDir: string): Record<string, Language> {
+  const file = path.join(rootDir, PROJECT_CONFIG_FILENAME);
+
+  let mtimeMs: number;
+  try {
+    mtimeMs = fs.statSync(file).mtimeMs;
+  } catch {
+    // No config file — drop any stale cache entry and return the default.
+    cacheMeta.delete(rootDir);
+    overridesCache.delete(rootDir);
+    return EMPTY;
+  }
+
+  const meta = cacheMeta.get(rootDir);
+  if (meta && meta.mtimeMs === mtimeMs) return meta.overrides;
+
+  const overrides = parseExtensionOverrides(file);
+  cacheMeta.set(rootDir, { mtimeMs, overrides });
+  overridesCache.set(rootDir, overrides);
+  return overrides;
+}
+
+/** Test/maintenance hook: forget cached config (e.g. after rewriting it in a test). */
+export function clearProjectConfigCache(): void {
+  cacheMeta.clear();
+  overridesCache.clear();
+}
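`normalizeExtKey` and `parseExtensionOverrides` together make one validate-and-skip pass over the `extensions` object. Below is a self-contained sketch of that pass. It assumes a hypothetical `SUPPORTED` set in place of `isLanguageSupported`, and it collects warnings into an array instead of calling `logWarn`:

```typescript
// Hypothetical stand-in for isLanguageSupported(). The real check covers every
// language CodeGraph has a grammar for.
const SUPPORTED = new Set(['lua', 'php', 'c', 'cpp', 'go']);

// Same rules as normalizeExtKey in the diff: lowercase, add the leading dot,
// and reject empty, multi-part (".d.ts") or path-like keys.
function normalizeExtKey(raw: string): string | null {
  let ext = raw.trim().toLowerCase();
  if (!ext) return null;
  if (!ext.startsWith('.')) ext = '.' + ext;
  const body = ext.slice(1);
  if (!body || /[.\/\\]/.test(body)) return null;
  return ext;
}

// Bad entries are skipped with a warning; one typo never discards the others.
function validateExtensions(exts: unknown): { overrides: Record<string, string>; warnings: string[] } {
  const overrides: Record<string, string> = {};
  const warnings: string[] = [];
  if (!exts || typeof exts !== 'object' || Array.isArray(exts)) return { overrides, warnings };
  for (const [rawKey, rawVal] of Object.entries(exts)) {
    const key = normalizeExtKey(rawKey);
    if (!key) {
      warnings.push(`"${rawKey}" is not a valid file extension`);
      continue;
    }
    if (typeof rawVal !== 'string' || !SUPPORTED.has(rawVal)) {
      warnings.push(`"${String(rawVal)}" is not a supported language`);
      continue;
    }
    overrides[key] = rawVal;
  }
  return { overrides, warnings };
}
```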

+ 3 - 0
src/resolution/callback-synthesizer.ts

@@ -27,6 +27,7 @@ import type { ResolutionContext } from './types';
 import { isGeneratedFile } from '../extraction/generated-detection';
 import { stripCommentsForRegex } from './strip-comments';
 import { cFnPointerDispatchEdges } from './c-fnptr-synthesizer';
+import { goframeRouteEdges } from './goframe-synthesizer';
 
 const REGISTRAR_NAME = /^(on[A-Z]\w*|subscribe|addListener|addEventListener|register|watch|listen|addCallback)$/;
 const DISPATCHER_NAME = /(emit|trigger|notify|dispatch|fire|publish|flush)/i;
@@ -2703,6 +2704,7 @@ export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionCo
   const sidekiqEdges = sidekiqDispatchEdges(ctx);
   const laravelEdges = laravelEventEdges(ctx);
   const cFnPtrEdges = cFnPointerDispatchEdges(queries, ctx);
+  const goframeEdges = goframeRouteEdges(ctx);
 
   const merged: Edge[] = [];
   const seen = new Set<string>();
@@ -2737,6 +2739,7 @@ export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionCo
     ...sidekiqEdges,
     ...laravelEdges,
     ...cFnPtrEdges,
+    ...goframeEdges,
   ]) {
     const key = `${e.source}>${e.target}`;
     if (seen.has(key)) continue;
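The new `goframeEdges` enter the merge loop last, and the loop keeps one edge per `source>target` key. Below is a sketch of that de-duplication. It assumes the lines after `continue` (not shown in this hunk) record the key and keep the edge, so an edge from an earlier synthesizer wins over a later duplicate. `mergeEdges` is a hypothetical name:

```typescript
interface Edge {
  source: string;
  target: string;
  kind: string;
}

// Hypothetical reduction of the merge loop: the first edge per source>target
// wins, so groups listed earlier take precedence over later ones.
function mergeEdges(...groups: Edge[][]): Edge[] {
  const merged: Edge[] = [];
  const seen = new Set<string>();
  for (const e of ([] as Edge[]).concat(...groups)) {
    const key = `${e.source}>${e.target}`;
    if (seen.has(key)) continue;
    seen.add(key);
    merged.push(e);
  }
  return merged;
}
```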

+ 118 - 0
src/resolution/frameworks/goframe.ts

@@ -0,0 +1,118 @@
+/**
+ * GoFrame Framework Resolver (route metadata) — issue #747.
+ *
+ * GoFrame's "standard router" binds routes reflectively, so there is no literal
+ * path string at a `.GET("/x", handler)` call site and no static edge from a
+ * route to the controller method that serves it. The structural facts live in
+ * two places, joined only at runtime by GoFrame:
+ *
+ *   // api/user/v1/user_sign_in.go — the route lives in a struct tag on the request type
+ *   type SignInReq struct {
+ *       g.Meta `path:"/user/sign-in" method:"post" tags:"UserService" summary:"…"`
+ *       …
+ *   }
+ *   // internal/controller/user/user_v1_sign_in.go — the handler takes *that* request type
+ *   func (c *ControllerV1) SignIn(ctx context.Context, req *v1.SignInReq) (res *v1.SignInRes, err error)
+ *   // internal/cmd/cmd.go — reflective binding (no path, no handler name)
+ *   group.Bind(user.NewV1())
+ *
+ * This resolver handles the FIRST half: it reads the `g.Meta` struct tag on a
+ * request type into a `route` node (`POST /user/sign-in`). The route → handler
+ * EDGE is the genuinely reflective part — the method name is NOT derivable from
+ * the request type (`DeptSearchReq` is served by `List`, `DeptAddReq` by `Add`),
+ * so the only reliable join is the request type appearing in the method's
+ * parameter signature. That whole-graph join is done by the companion
+ * `goframeRouteEdges` synthesizer, which reads the request type back out of the
+ * route node's qualifiedName.
+ *
+ * Honesty note: the route node carries the `g.Meta` path verbatim. The group
+ * prefix from `s.Group("/api", …)` / nested `group.Group("/v1", …)` is applied
+ * by reflective `Bind` at runtime and is deliberately NOT reconstructed here —
+ * the discriminating, structural part is the per-route path + method.
+ */
+
+import { Node } from '../../types';
+import { FrameworkResolver, UnresolvedRef, ResolvedRef, ResolutionContext } from '../types';
+import { stripCommentsForRegex } from '../strip-comments';
+
+/**
+ * A request type carrying a routable `g.Meta` tag. `g.Meta` is, by GoFrame
+ * convention, the first embedded field of the struct, so anchoring on
+ * `struct { g.Meta `…` }` is both precise and cheap. Response types embed
+ * `g.Meta` too but tag it `mime:"…"` with no `path:` — the path requirement
+ * below filters them out.
+ */
+const GOFRAME_META_RE = /\btype\s+([A-Z]\w*)\s+struct\s*\{\s*g\.Meta\s+`([^`]*)`/g;
+const META_PATH_RE = /\bpath:"([^"]+)"/;
+const META_METHOD_RE = /\bmethod:"([^"]+)"/;
+const GO_PACKAGE_RE = /^\s*package\s+(\w+)/m;
+
+/** Marker embedded in a route node's qualifiedName so the synthesizer can read
+ *  back the request type to join on. The value after it is the package-qualified
+ *  request type (`cash.ListReq`) — the package disambiguates the many identical
+ *  bare names (`ListReq`, `GetReq`) a large app defines, one per module. Falls
+ *  back to the bare type when no `package` declaration is found. */
+export const GOFRAME_ROUTE_MARKER = '::goframe-route:';
+
+export const goframeResolver: FrameworkResolver = {
+  name: 'goframe',
+  languages: ['go'],
+
+  detect(context: ResolutionContext): boolean {
+    const goMod = context.readFile('go.mod');
+    // GoFrame is `github.com/gogf/gf` (v1) or `github.com/gogf/gf/v2` (v2).
+    return !!goMod && goMod.includes('github.com/gogf/gf');
+  },
+
+  extract(filePath, content) {
+    if (!filePath.endsWith('.go')) return { nodes: [], references: [] };
+    // Cheap reject: the file must mention g.Meta at all.
+    if (!content.includes('g.Meta')) return { nodes: [], references: [] };
+
+    const nodes: Node[] = [];
+    const now = Date.now();
+    const safe = stripCommentsForRegex(content, 'go');
+    const pkg = GO_PACKAGE_RE.exec(safe)?.[1];
+
+    GOFRAME_META_RE.lastIndex = 0;
+    let match: RegExpExecArray | null;
+    while ((match = GOFRAME_META_RE.exec(safe)) !== null) {
+      const [, requestType, tag] = match;
+      const pathMatch = META_PATH_RE.exec(tag!);
+      if (!pathMatch) continue; // response `g.Meta `mime:…`` and other non-route metadata
+      const routePath = pathMatch[1]!;
+      const methodMatch = META_METHOD_RE.exec(tag!);
+      // GoFrame defaults to all methods when `method:` is omitted.
+      const method = methodMatch ? methodMatch[1]!.toUpperCase() : 'ANY';
+      const line = safe.slice(0, match.index).split('\n').length;
+      // The handler's signature qualifies the request type with its package
+      // (`req *cash.ListReq`); encode `pkg.Type` so the synthesizer can match it.
+      const joinKey = pkg ? `${pkg}.${requestType}` : requestType!;
+
+      nodes.push({
+        id: `route:${filePath}:${line}:${method}:${routePath}`,
+        kind: 'route',
+        name: `${method} ${routePath}`,
+        // The request type is the synthesizer's join key — encode it after the
+        // marker. The path stays human-readable in `name`.
+        qualifiedName: `${filePath}${GOFRAME_ROUTE_MARKER}${joinKey}`,
+        filePath,
+        startLine: line,
+        endLine: line,
+        startColumn: 0,
+        endColumn: match[0].length,
+        language: 'go',
+        updatedAt: now,
+      });
+    }
+
+    return { nodes, references: [] };
+  },
+
+  // The route → controller-method edge is reflective (request-type join across
+  // files) and is built by the `goframeRouteEdges` synthesizer after the graph
+  // is complete. This resolver creates no references of its own.
+  resolve(_ref: UnresolvedRef, _context: ResolutionContext): ResolvedRef | null {
+    return null;
+  },
+};
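The extract pass above turns each `g.Meta` tag into a method, a path and a `pkg.Type` join key. The patterns it relies on (`GO_PACKAGE_RE`, `GOFRAME_META_RE`, `META_PATH_RE`, `META_METHOD_RE`) are defined outside this hunk, so the regexes below are hypothetical stand-ins written only for illustration. `extractMetaRoutes` and `RouteSketch` are likewise hypothetical names. The sketch shows the shape of the transformation: it skips tags without `path:`, defaults a missing `method:` to `ANY`, and qualifies the request type with the file's `package` clause when there is one.

```typescript
// Hypothetical stand-ins for GO_PACKAGE_RE, GOFRAME_META_RE, META_PATH_RE and
// META_METHOD_RE; the real patterns live outside the hunk shown above.
const PACKAGE_RE = /^\s*package\s+(\w+)/m;
// Captures a struct's type name and the raw tag of its leading `g.Meta` field.
const META_RE = /type\s+(\w+)\s+struct\s*\{\s*g\.Meta\s+`([^`]*)`/g;
const PATH_RE = /\bpath:"([^"]+)"/;
const METHOD_RE = /\bmethod:"([^"]+)"/;

interface RouteSketch {
  method: string;  // upper-cased HTTP method, or "ANY" when `method:` is omitted
  path: string;    // human-readable route path
  joinKey: string; // `pkg.Type`, or the bare type when no package clause is found
  line: number;    // 1-based line where the matched struct starts
}

function extractMetaRoutes(source: string): RouteSketch[] {
  const pkg = PACKAGE_RE.exec(source)?.[1];
  const routes: RouteSketch[] = [];
  META_RE.lastIndex = 0; // global regex: reset before each scan
  let m: RegExpExecArray | null;
  while ((m = META_RE.exec(source)) !== null) {
    const requestType = m[1]!;
    const tag = m[2]!;
    const pathMatch = PATH_RE.exec(tag);
    if (!pathMatch) continue; // e.g. a response type's `mime:` metadata
    const methodMatch = METHOD_RE.exec(tag);
    routes.push({
      method: methodMatch ? methodMatch[1]!.toUpperCase() : 'ANY',
      path: pathMatch[1]!,
      joinKey: pkg ? `${pkg}.${requestType}` : requestType,
      line: source.slice(0, m.index).split('\n').length,
    });
  }
  return routes;
}
```

For a file declaring `package cash` with `type ListReq struct { g.Meta `path:"/cash/list" method:"get"` }`, this yields one route named `GET /cash/list` whose join key is `cash.ListReq`. A `ListRes` carrying only `mime:` metadata produces nothing.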

+ 3 - 0
src/resolution/frameworks/index.ts

@@ -19,6 +19,7 @@ import { railsResolver } from './ruby';
 import { springResolver } from './java';
 import { playResolver } from './play';
 import { goResolver } from './go';
+import { goframeResolver } from './goframe';
 import { rustResolver } from './rust';
 import { aspnetResolver } from './csharp';
 import { swiftUIResolver, uikitResolver, vaporResolver } from './swift';
@@ -52,6 +53,7 @@ const FRAMEWORK_RESOLVERS: FrameworkResolver[] = [
   playResolver,
   // Go
   goResolver,
+  goframeResolver,
   // Rust
   rustResolver,
   // C#
@@ -136,6 +138,7 @@ export { railsResolver } from './ruby';
 export { springResolver } from './java';
 export { playResolver } from './play';
 export { goResolver } from './go';
+export { goframeResolver } from './goframe';
 export { rustResolver } from './rust';
 export { aspnetResolver } from './csharp';
 export { swiftUIResolver, uikitResolver, vaporResolver } from './swift';

+ 144 - 0
src/resolution/goframe-synthesizer.ts

@@ -0,0 +1,144 @@
+/**
+ * GoFrame route → controller-method dispatch synthesis (#747).
+ *
+ * GoFrame binds routes reflectively (`group.Bind(user.NewV1())`), so the route
+ * declared in a request type's `g.Meta` tag has no static edge to the method
+ * that serves it. The `goframeResolver` extract pass turns each `g.Meta` into a
+ * `route` node carrying its request type in the qualifiedName; this whole-graph
+ * pass closes the loop by joining each route to its handler.
+ *
+ * The join key is the REQUEST TYPE, not the method name — GoFrame method names
+ * are free (`DeptSearchReq` is served by `List`, `DeptAddReq` by `Add`), so the
+ * only reliable link is the request type appearing in the handler's parameter
+ * signature:
+ *
+ *   func (c *sysDeptController) Add(ctx context.Context, req *system.DeptAddReq) (…)
+ *                                                              ^^^^^^^^^^^^^^^^  the join
+ *
+ * Go method nodes already carry that signature, so no source re-read is needed.
+ * Each synthesized edge is `kind:'calls'`, `provenance:'heuristic'`,
+ * `metadata.synthesizedBy:'goframe-route'` — a reflective-dispatch bridge, so
+ * `codegraph_explore` surfaces it as a dynamic hop rather than a literal call,
+ * and the handler's callers list the route that reaches it. A project with no
+ * GoFrame routes is a no-op.
+ */
+
+import type { Edge, Node } from '../types';
+import type { ResolutionContext } from './types';
+import { GOFRAME_ROUTE_MARKER } from './frameworks/goframe';
+
+const FANOUT_CAP = 2000; // backstop only; real apps are 1 route → 1 method.
+
+/**
+ * Pointer-parameter types in a Go method signature, in both qualified and bare
+ * forms: `(ctx context.Context, req *cash.ListReq)` → `["cash.ListReq",
+ * "ListReq"]`. The qualified form disambiguates the many identical bare names a
+ * large app defines (one `ListReq` per module); the bare form is the fallback
+ * for a same-package (unqualified) handler. The response pointer (`*cash.ListRes`)
+ * is captured too but never matches a request type, so it drops out of the join.
+ */
+function pointerParamTypes(sig: string): string[] {
+  const out: string[] = [];
+  const re = /\*\s*(?:(\w+)\.)?([A-Z]\w*)\b/g;
+  let m: RegExpExecArray | null;
+  while ((m = re.exec(sig)) !== null) {
+    if (m[1]) out.push(`${m[1]}.${m[2]}`);
+    out.push(m[2]!);
+  }
+  return out;
+}
+
+/** The addon/plugin module a path lives under (`addons/hgexample/…` → `hgexample`),
+ *  or `''` for the core app. Large GoFrame apps ship demo addons that CLONE the
+ *  whole module tree — identical package names and request types — so the package
+ *  qualifier can't tell an addon's `config.GetReq` from core's. The addon root can. */
+function addonRoot(p: string): string {
+  return /(?:^|\/)addons\/([^/]+)\//.exec(p)?.[1] ?? '';
+}
+
+/**
+ * Pick the one handler for a route from same-request-type candidates. Usually a
+ * single candidate. When several share the request type (a cloned addon module),
+ * keep controller-dir methods, then the one in the route's own module (core route
+ * → core handler, addon route → that addon's handler). Ambiguity left over ⇒ no
+ * edge (silent beats wrong).
+ */
+function selectHandler(candidates: Node[], routeFile: string): Node | null {
+  if (candidates.length === 1) return candidates[0]!;
+  let cands = candidates.filter((h) => /\/controller(s)?\//.test(h.filePath));
+  if (cands.length === 0) cands = candidates;
+  if (cands.length === 1) return cands[0]!;
+  const ar = addonRoot(routeFile);
+  const sameModule = cands.filter((h) => addonRoot(h.filePath) === ar);
+  return sameModule.length === 1 ? sameModule[0]! : null;
+}
+
+export function goframeRouteEdges(ctx: ResolutionContext): Edge[] {
+  // Route nodes the goframe extractor created, keyed by their package-qualified
+  // request type (`cash.ListReq`). `wanted` holds every key a handler signature
+  // could match — the qualified form plus its bare type fallback.
+  const routesByReqType = new Map<string, Node[]>();
+  const wanted = new Set<string>();
+  for (const route of ctx.getNodesByKind('route')) {
+    if (route.language !== 'go') continue;
+    const marker = route.qualifiedName.indexOf(GOFRAME_ROUTE_MARKER);
+    if (marker < 0) continue;
+    const joinKey = route.qualifiedName.slice(marker + GOFRAME_ROUTE_MARKER.length);
+    if (!joinKey) continue;
+    let arr = routesByReqType.get(joinKey);
+    if (!arr) { arr = []; routesByReqType.set(joinKey, arr); }
+    arr.push(route);
+    wanted.add(joinKey);
+    const dot = joinKey.lastIndexOf('.');
+    if (dot >= 0) wanted.add(joinKey.slice(dot + 1)); // bare fallback
+  }
+  if (routesByReqType.size === 0) return [];
+
+  // Handler candidates: Go methods whose signature takes a wanted request type by
+  // pointer, indexed by every matching (qualified + bare) form so a route can
+  // match precisely on `pkg.Type` and fall back to the bare `Type`.
+  const handlersByKey = new Map<string, Node[]>();
+  for (const method of ctx.getNodesByKind('method')) {
+    if (method.language !== 'go' || !method.signature) continue;
+    for (const t of pointerParamTypes(method.signature)) {
+      if (!wanted.has(t)) continue;
+      let arr = handlersByKey.get(t);
+      if (!arr) { arr = []; handlersByKey.set(t, arr); }
+      arr.push(method);
+    }
+  }
+
+  const edges: Edge[] = [];
+  const seen = new Set<string>();
+  let added = 0;
+  for (const [joinKey, routes] of routesByReqType) {
+    const bare = joinKey.includes('.') ? joinKey.slice(joinKey.lastIndexOf('.') + 1) : joinKey;
+    // Precise package-qualified match first; bare type only as a fallback (covers
+    // a same-package handler or an aliased import where the bare name is unique).
+    const candidates = handlersByKey.get(joinKey) ?? handlersByKey.get(bare);
+    if (!candidates || candidates.length === 0) continue;
+    const requestType = bare;
+    for (const route of routes) {
+      const handler = selectHandler(candidates, route.filePath);
+      if (!handler || route.id === handler.id) continue;
+      const key = `${route.id}>${handler.id}`;
+      if (seen.has(key) || added >= FANOUT_CAP) continue;
+      seen.add(key);
+      edges.push({
+        source: route.id,
+        target: handler.id,
+        kind: 'calls',
+        line: route.startLine,
+        provenance: 'heuristic',
+        metadata: {
+          synthesizedBy: 'goframe-route',
+          route: route.name,
+          requestType,
+          registeredAt: `${handler.filePath}:${handler.startLine}`,
+        },
+      });
+      added++;
+    }
+  }
+  return edges;
+}
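The synthesizer's join can be exercised in isolation. The sketch below copies `pointerParamTypes`, `addonRoot` and `selectHandler` as written above, but runs them over a hypothetical `MiniNode` type (only `id`, `filePath`, `signature`) instead of the real `Node`. A hypothetical `matchRoute` helper stands in for the body of the main loop in `goframeRouteEdges`: it tries the package-qualified key first, falls back to the bare type, and lets `selectHandler` break ties between a core handler and its cloned addon twin.

```typescript
// Hypothetical, trimmed-down node shape; the real `Node` has many more fields.
interface MiniNode {
  id: string;
  filePath: string;
  signature?: string;
}

// Same regex as above: qualified and bare forms of every `*Type` parameter.
function pointerParamTypes(sig: string): string[] {
  const out: string[] = [];
  const re = /\*\s*(?:(\w+)\.)?([A-Z]\w*)\b/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sig)) !== null) {
    if (m[1]) out.push(`${m[1]}.${m[2]}`);
    out.push(m[2]!);
  }
  return out;
}

function addonRoot(p: string): string {
  return /(?:^|\/)addons\/([^/]+)\//.exec(p)?.[1] ?? '';
}

// Controller-dir methods first, then the handler in the route's own module;
// anything still ambiguous yields no handler (no edge).
function selectHandler(candidates: MiniNode[], routeFile: string): MiniNode | null {
  if (candidates.length === 1) return candidates[0]!;
  let cands = candidates.filter((h) => /\/controller(s)?\//.test(h.filePath));
  if (cands.length === 0) cands = candidates;
  if (cands.length === 1) return cands[0]!;
  const ar = addonRoot(routeFile);
  const sameModule = cands.filter((h) => addonRoot(h.filePath) === ar);
  return sameModule.length === 1 ? sameModule[0]! : null;
}

// Hypothetical helper: resolve one route's join key against a list of methods.
function matchRoute(joinKey: string, routeFile: string, methods: MiniNode[]): MiniNode | null {
  const byKey = new Map<string, MiniNode[]>();
  for (const method of methods) {
    if (!method.signature) continue;
    for (const t of pointerParamTypes(method.signature)) {
      const arr = byKey.get(t) ?? [];
      arr.push(method);
      byKey.set(t, arr);
    }
  }
  // With no dot, lastIndexOf returns -1 and the slice keeps the whole key.
  const bare = joinKey.slice(joinKey.lastIndexOf('.') + 1);
  const candidates = byKey.get(joinKey) ?? byKey.get(bare);
  return candidates && candidates.length > 0 ? selectHandler(candidates, routeFile) : null;
}
```

Receivers such as `*cConfig` never enter the index, because the regex requires an upper-case type name. An exported receiver like `*Controller` does get indexed, but it only matters if a route's request type happens to share that name.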

+ 2 - 1
src/sync/watcher.ts

@@ -34,6 +34,7 @@
 import * as fs from 'fs';
 import * as path from 'path';
 import { isSourceFile, buildScopeIgnore, type ScopeIgnore } from '../extraction';
+import { loadExtensionOverrides } from '../project-config';
 import { logDebug, logWarn } from '../errors';
 import { normalizePath } from '../utils';
 import { isCodeGraphDataDir } from '../directory';
@@ -535,7 +536,7 @@ export class FileWatcher {
     if (!rel || rel === '.' || rel.startsWith('..')) return;
     if (this.isAlwaysIgnored(rel)) return;
     if (this.ignoreMatcher && this.ignoreMatcher.ignores(rel)) return;
-    if (!isSourceFile(rel)) return;
+    if (!isSourceFile(rel, loadExtensionOverrides(this.projectRoot))) return;
 
     logDebug('File change detected', { file: rel });
     if (this.ready) {
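The watcher change passes the project's extension overrides into `isSourceFile`, so a file whose extension is mapped in `codegraph.json` is picked up on change rather than ignored. Neither the shape returned by `loadExtensionOverrides` nor the internals of `isSourceFile` appear in this diff. The sketch below therefore assumes a plain map from lower-cased extension (leading dot included) to language id. `ExtensionOverrides`, `DEFAULT_EXTENSIONS`, `languageFor` and `isSourceFileSketch` are hypothetical names, and the built-in table is a tiny illustrative subset.

```typescript
import * as path from 'path';

// Assumed shape: lower-cased extension (with leading dot) -> language id,
// e.g. a codegraph.json mapping such as { ".tpl": "go" }.
type ExtensionOverrides = Record<string, string>;

// Hypothetical built-in table; the real one covers many more languages.
const DEFAULT_EXTENSIONS: Record<string, string> = {
  '.go': 'go',
  '.ts': 'typescript',
  '.java': 'java',
  '.php': 'php',
};

function languageFor(relPath: string, overrides: ExtensionOverrides = {}): string | null {
  const ext = path.extname(relPath).toLowerCase();
  if (!ext) return null;
  // A project override wins over the built-in mapping for the same extension.
  const override = overrides[ext];
  if (override !== undefined) return override;
  return DEFAULT_EXTENSIONS[ext] ?? null;
}

function isSourceFileSketch(relPath: string, overrides: ExtensionOverrides = {}): boolean {
  return languageFor(relPath, overrides) !== null;
}
```

Because the watcher calls `loadExtensionOverrides(this.projectRoot)` on every change event, whether that function caches the parsed config matters for busy trees; this hunk does not show whether it does.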

+ 24 - 2
src/utils.ts

@@ -91,25 +91,47 @@ function isWithinDir(child: string, parent: string): boolean {
  * (codegraph_node `includeCode`, codegraph_explore source) go through here, so
  * this is the chokepoint that keeps out-of-root file contents from leaking.
  *
+ * `allowSymlinkEscape` waives **only** the realpath-escape rejection (the
+ * lexical `../` guard still applies) for the INDEXING read path. The directory
+ * walk deliberately descends into in-root symlinks whose targets live outside
+ * the root (e.g. a `game/` symlink in a Dota custom-game tree, #935); discovery
+ * and the reader must agree, or every file the walk enumerated fails to index.
+ * Indexing only reads paths it just discovered, into a local index — it never
+ * serves them to an agent, so this does not widen the #527 leak surface. The
+ * content-serving sinks must never pass this flag.
+ *
  * @param projectRoot - The project root directory
  * @param filePath - The (relative or absolute) file path to validate
+ * @param options.allowSymlinkEscape - Follow in-root symlinks out of the root
+ *   (indexing read path only); defaults to the strict, leak-safe behavior.
  * @returns The resolved absolute path (realpath when it exists), or null if it
  *   escapes the root
  */
-export function validatePathWithinRoot(projectRoot: string, filePath: string): string | null {
+export function validatePathWithinRoot(
+  projectRoot: string,
+  filePath: string,
+  options?: { allowSymlinkEscape?: boolean }
+): string | null {
   const resolved = path.resolve(projectRoot, filePath);
   const normalizedRoot = path.resolve(projectRoot);
 
-  // 1. Lexical containment — cheap, catches `../` traversal.
+  // 1. Lexical containment — cheap, catches `../` traversal. Applies even on
+  //    the indexing read path: a crafted `../` escape is still rejected.
   if (!isWithinDir(resolved, normalizedRoot)) {
     return null;
   }
 
   // 2. Symlink-aware containment — resolve symlinks on both sides and re-check,
   //    so an in-repo symlink whose real target escapes the root is rejected.
+  //    The indexing read path (allowSymlinkEscape) skips only this rejection so
+  //    it stays consistent with the directory walk, which already followed the
+  //    in-root symlink to enumerate these files (#935).
   try {
     const realRoot = fs.realpathSync(normalizedRoot);
     const realResolved = fs.realpathSync(resolved);
+    if (options?.allowSymlinkEscape) {
+      return realResolved;
+    }
     return isWithinDir(realResolved, realRoot) ? realResolved : null;
   } catch (err) {
     // ENOENT: the path doesn't exist yet (a file about to be written, or an

Some files were not shown because too many files changed in this diff.
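The two-stage check in `validatePathWithinRoot`, and what `allowSymlinkEscape` waives, can be sketched on its own. This is a simplified stand-in, not the function itself. `validatePathSketch` is a hypothetical name. `isWithinDir` is reimplemented here with `path.relative` because the module's own helper is not shown. The ENOENT handling in the truncated `catch` branch is replaced by a plain `null`.

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Simplified stand-in for the module's own isWithinDir helper.
function isWithinDir(child: string, parent: string): boolean {
  const rel = path.relative(parent, child);
  if (rel === '') return true;
  return rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel);
}

function validatePathSketch(
  projectRoot: string,
  filePath: string,
  options?: { allowSymlinkEscape?: boolean }
): string | null {
  const resolved = path.resolve(projectRoot, filePath);
  const normalizedRoot = path.resolve(projectRoot);
  // 1. Lexical containment: always enforced, so `../` traversal is rejected
  //    even on the indexing read path.
  if (!isWithinDir(resolved, normalizedRoot)) return null;
  try {
    const realRoot = fs.realpathSync(normalizedRoot);
    const realResolved = fs.realpathSync(resolved);
    // 2. Symlink-aware containment, skipped only when the caller opts in.
    if (options?.allowSymlinkEscape) return realResolved;
    return isWithinDir(realResolved, realRoot) ? realResolved : null;
  } catch {
    // Simplification: the real function treats ENOENT specially; this sketch
    // rejects any path it cannot resolve.
    return null;
  }
}
```

With an in-root `game/` symlink pointing outside the repo, `game/b.go` is rejected by default and resolves to the target's real path only when `allowSymlinkEscape` is set. A literal `../outside/b.go` is rejected either way.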