Selaa lähdekoodia

fix(security): resolve symlinks in path validation to block out-of-root reads (#527) (#724)

* fix(security): resolve symlinks in path validation to block out-of-root reads (#527)

validatePathWithinRoot was purely lexical (path.resolve + startsWith), so an
in-repo symlink whose logical path is inside the project root but whose real
target escapes it passed validation — and both content-serving read sinks
(codegraph_node includeCode, codegraph_explore source) then readFileSync'd it,
leaking out-of-root file contents (e.g. ~/.ssh, /etc) to the agent.

Add a realpath layer: after the lexical check, resolve symlinks on both the
candidate path and the root and re-compare, rejecting anything whose real path
escapes the root. An in-root symlink is still allowed (no over-blocking).
Comparison is case-insensitive on Windows (NTFS + realpath casing). Not-yet-
existing paths (ENOENT) fall back to the lexical result so about-to-be-written
files still validate; other resolution errors reject.

Removes the dead, never-called isPathWithinRoot / isPathWithinRootReal helpers
(the latter a footgun — it returned true on realpath failure). Adds RED->GREEN
tests: in->out file/dir symlinks rejected, in->in allowed, ../ rejected, ENOENT
allowed, plus an end-to-end test proving getCode no longer serves an out-of-root
file reached through a dir symlink.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note the #527 symlink path-escape fix

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 2 viikkoa sitten
vanhempi
sitoutus
7fd8b4c185
3 muutettua tiedostoa jossa 124 lisäystä ja 48 poistoa
  1. 1 0
      CHANGELOG.md
  2. 77 1
      __tests__/security.test.ts
  3. 46 47
      src/utils.ts

+ 1 - 0
CHANGELOG.md

@@ -11,6 +11,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### Security
 
+- Closed a path-traversal hole where a symbolic link inside an indexed project that pointed *outside* the project root could make CodeGraph serve that out-of-root file's contents (for example a file under your home directory) to the AI agent. CodeGraph now resolves symlinks when validating file access and refuses to read anything whose real location is outside the project, while still allowing symlinks that stay within it. Thanks @sulthonzh. (#527)
 - CodeGraph now indexes Spring configuration files (`application.properties` / `application.yml`) by key only, and never includes their values in `codegraph_explore` or `codegraph_node` output. Previously a secret committed to one of these files — a database password, API key, or connection string with embedded credentials — could be surfaced to an AI agent that asked about nearby code, even though the agent never opened the file. The configuration keys are still indexed, so reference and impact analysis are unaffected; an agent that genuinely needs a value reads the file itself. Shopify Liquid `{% schema %}` blocks are likewise indexed by name only. (#383)
 
 ### New Features

+ 77 - 1
__tests__/security.test.ts

@@ -12,7 +12,7 @@ import { describe, it, expect, beforeEach, afterEach } from 'vitest';
 import * as fs from 'fs';
 import * as path from 'path';
 import * as os from 'os';
-import { FileLock, validateProjectPath } from '../src/utils';
+import { FileLock, validateProjectPath, validatePathWithinRoot } from '../src/utils';
 import CodeGraph from '../src/index';
 import { ToolHandler, tools } from '../src/mcp/tools';
 import { scanDirectory, isSourceFile } from '../src/extraction';
@@ -176,6 +176,82 @@ describe('Path Traversal Prevention', () => {
   });
 });
 
+describe('Symlink escape prevention (#527)', () => {
+  // An in-repo symlink whose logical path is inside the project root but whose
+  // REAL target escapes the root must never be served. validatePathWithinRoot
+  // is the chokepoint both content-serving read sinks go through (codegraph_node
+  // includeCode + codegraph_explore source rendering), so it must resolve
+  // symlinks, not just compare strings. realpathSync the roots so the test's own
+  // expectations don't trip over /tmp -> /private/tmp on macOS.
+  let root: string;
+  let outside: string;
+
+  beforeEach(() => {
+    root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'cg-root-')));
+    outside = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'cg-outside-')));
+    fs.mkdirSync(path.join(root, 'src'));
+    fs.writeFileSync(path.join(root, 'src', 'in.ts'), 'export const x = 1;\n');
+    fs.mkdirSync(path.join(outside, 'pkg'));
+    fs.writeFileSync(path.join(outside, 'pkg', 'secret.txt'), 'TOP-SECRET\n');
+  });
+
+  afterEach(() => {
+    fs.rmSync(root, { recursive: true, force: true });
+    fs.rmSync(outside, { recursive: true, force: true });
+  });
+
+  // Symlink creation needs privileges on Windows; skip gracefully if it fails.
+  const link = (linkPath: string, target: string): boolean => {
+    try { fs.symlinkSync(target, linkPath); return true; } catch { return false; }
+  };
+
+  it('allows a real file inside the root (and realpaths consistently)', () => {
+    expect(validatePathWithinRoot(root, 'src/in.ts')).not.toBeNull();
+  });
+
+  it('allows a not-yet-existing path inside the root (ENOENT — files about to be written)', () => {
+    expect(validatePathWithinRoot(root, 'src/will-write.ts')).not.toBeNull();
+  });
+
+  it('rejects a lexical ../ traversal out of the root', () => {
+    expect(validatePathWithinRoot(root, `../${path.basename(outside)}/pkg/secret.txt`)).toBeNull();
+  });
+
+  it('rejects an in-repo symlink to an out-of-root FILE', () => {
+    if (!link(path.join(root, 'escape'), path.join(outside, 'pkg', 'secret.txt'))) return;
+    expect(validatePathWithinRoot(root, 'escape')).toBeNull();
+  });
+
+  it('rejects a path that escapes through an in-repo symlink to an out-of-root DIR', () => {
+    if (!link(path.join(root, 'escapedir'), path.join(outside, 'pkg'))) return;
+    expect(validatePathWithinRoot(root, 'escapedir/secret.txt')).toBeNull();
+  });
+
+  it('still allows an in-repo symlink that stays WITHIN the root (no over-blocking)', () => {
+    if (!link(path.join(root, 'src', 'inlink.ts'), path.join(root, 'src', 'in.ts'))) return;
+    expect(validatePathWithinRoot(root, 'src/inlink.ts')).not.toBeNull();
+  });
+
+  it('end-to-end: getCode never serves an out-of-root file reached via a dir symlink', async () => {
+    fs.writeFileSync(path.join(outside, 'pkg', 'leak.ts'),
+      'export function leaked() { return "LEAKED-ZZZ-9"; }\n');
+    if (!link(path.join(root, 'vendored'), path.join(outside, 'pkg'))) return;
+
+    const cg = CodeGraph.initSync(root, { config: { include: ['**/*.ts'], exclude: [] } });
+    try {
+      await cg.indexAll();
+      // Whether or not extraction followed the dir symlink, NO node may ever
+      // yield the out-of-root content through getCode.
+      for (const n of cg.getNodesByKind('function')) {
+        const code = await cg.getCode(n.id);
+        expect(code ?? '').not.toContain('LEAKED-ZZZ-9');
+      }
+    } finally {
+      cg.close();
+    }
+  });
+});
+
 describe('validateProjectPath — sensitive directory blocking', () => {
   // POSIX-only: on Windows '/etc' resolves to C:\etc (non-existent), not a
   // sensitive dir — the Windows case is covered by the win32-gated test below.

+ 46 - 47
src/utils.ts

@@ -66,21 +66,61 @@ export function isConfigLeafNode(node: { kind: string; language?: string }): boo
 }
 
 /**
- * Validate that a resolved file path stays within the project root.
- * Prevents path traversal attacks (e.g. node.filePath = "../../etc/passwd").
+ * Whether `child` is `parent` itself or sits underneath it. Case-insensitive on
+ * Windows — NTFS is case-insensitive, and realpathSync can hand back a different
+ * case than the lexical root, which would otherwise false-reject a valid file.
+ */
+function isWithinDir(child: string, parent: string): boolean {
+  let c = child;
+  let p = parent;
+  if (process.platform === 'win32') {
+    c = c.toLowerCase();
+    p = p.toLowerCase();
+  }
+  return c === p || c.startsWith(p + path.sep);
+}
+
+/**
+ * Validate that a file path stays within the project root, resolving symlinks.
+ *
+ * Two layers: a cheap lexical check that catches `../` traversal, then a
+ * realpath check that catches symlink escapes — an in-repo symlink whose
+ * logical path is inside the root but whose real target points outside it
+ * (issue #527). A symlink that stays within the root is still allowed, so
+ * legitimate in-tree symlinks keep working. Both content-serving read sinks
+ * (codegraph_node `includeCode`, codegraph_explore source) go through here, so
+ * this is the chokepoint that keeps out-of-root file contents from leaking.
  *
  * @param projectRoot - The project root directory
- * @param filePath - The relative file path to validate
- * @returns The resolved absolute path, or null if it escapes the root
+ * @param filePath - The (relative or absolute) file path to validate
+ * @returns The resolved absolute path (realpath when it exists), or null if it
+ *   escapes the root
  */
 export function validatePathWithinRoot(projectRoot: string, filePath: string): string | null {
   const resolved = path.resolve(projectRoot, filePath);
   const normalizedRoot = path.resolve(projectRoot);
 
-  if (!resolved.startsWith(normalizedRoot + path.sep) && resolved !== normalizedRoot) {
+  // 1. Lexical containment — cheap, catches `../` traversal.
+  if (!isWithinDir(resolved, normalizedRoot)) {
+    return null;
+  }
+
+  // 2. Symlink-aware containment — resolve symlinks on both sides and re-check,
+  //    so an in-repo symlink whose real target escapes the root is rejected.
+  try {
+    const realRoot = fs.realpathSync(normalizedRoot);
+    const realResolved = fs.realpathSync(resolved);
+    return isWithinDir(realResolved, realRoot) ? realResolved : null;
+  } catch (err) {
+    // ENOENT: the path doesn't exist yet (a file about to be written, or an
+    // index entry for a since-deleted file) — no symlink to follow, and the
+    // lexical check already passed, so allow the lexical path. Any other
+    // resolution failure (ELOOP, EACCES, …) is treated as unsafe → reject.
+    if ((err as NodeJS.ErrnoException).code === 'ENOENT') {
+      return resolved;
+    }
     return null;
   }
-  return resolved;
 }
 
 /**
@@ -124,47 +164,6 @@ export function validateProjectPath(dirPath: string): string | null {
   return null;
 }
 
-/**
- * Check if a file path resolves to a location within the given root directory.
- *
- * Prevents path traversal attacks by ensuring the resolved absolute path
- * starts with the resolved root path. Handles '..' sequences, symlink-like
- * relative paths, and platform-specific separators.
- *
- * @param filePath - The path to check (can be relative or absolute)
- * @param rootDir - The root directory that filePath must stay within
- * @returns true if filePath resolves to a location within rootDir
- */
-export function isPathWithinRoot(filePath: string, rootDir: string): boolean {
-  const resolvedPath = path.resolve(rootDir, filePath);
-  const resolvedRoot = path.resolve(rootDir);
-  return resolvedPath.startsWith(resolvedRoot + path.sep) || resolvedPath === resolvedRoot;
-}
-
-/**
- * Like isPathWithinRoot but also resolves symlinks via fs.realpathSync.
- *
- * This catches symlink escapes where the logical path appears to be within
- * root but the real path on disk points elsewhere. Falls back to logical
- * path checking if realpath resolution fails (e.g. broken symlink).
- */
-export function isPathWithinRootReal(filePath: string, rootDir: string): boolean {
-  // First do the cheap logical check
-  if (!isPathWithinRoot(filePath, rootDir)) {
-    return false;
-  }
-
-  // Then verify with realpath to catch symlink escapes
-  try {
-    const realPath = fs.realpathSync(path.resolve(rootDir, filePath));
-    const realRoot = fs.realpathSync(rootDir);
-    return realPath.startsWith(realRoot + path.sep) || realPath === realRoot;
-  } catch {
-    // If realpath fails (broken symlink, permissions), fall back to logical check
-    return true;
-  }
-}
-
 /**
  * Safely parse JSON with a fallback value.
  * Prevents crashes from corrupted database metadata.