fix(extraction): map PHP include/require to file→file dependency edges (#660) (#663)

PHP's importTypes only captured namespace_use_declaration, so
include/require(_once) — the dependency mechanism in procedural and
script-style PHP — never produced edges. callers, impact, and trace
missed the entire file-include graph; only namespace `use` became a
dependency edge.

Capture the four include/require expression types and emit file→file
imports edges, reusing the path-based resolution that C/C++ #include
already goes through. Only static string-literal paths are resolved
(relative to the including file); dynamic forms (include $var,
require __DIR__ . '/x', interpolated strings) are skipped.
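
As a rough illustration of "relative to the including file", the sketch below joins a static include path onto the including file's directory and accepts the result only if it names a known project file. The names `knownFiles`, `joinRelative`, and `resolveStaticInclude` are hypothetical and exist only for this example; the actual change goes through the resolver's own context and file index, and absolute paths and php.ini `include_path` are deliberately ignored here.

```typescript
// Hypothetical stand-in for the project's file index (project-relative paths).
const knownFiles = new Set<string>(['src/page.php', 'src/lib.php', 'inc/db.php']);

// Join a relative include path onto the directory of the including file,
// collapsing "." and ".." segments. Returns null for absolute paths and for
// paths that would climb above the project root.
function joinRelative(fromFile: string, includePath: string): string | null {
  if (includePath.startsWith('/')) return null; // absolute paths not handled in this sketch
  const segments = fromFile.split('/').slice(0, -1); // directory of the including file
  for (const part of includePath.split('/')) {
    if (part === '' || part === '.') continue;
    if (part === '..') {
      if (segments.length === 0) return null; // escapes the project root
      segments.pop();
    } else {
      segments.push(part);
    }
  }
  return segments.join('/');
}

// Resolve a static include literal to a known project file, or null.
function resolveStaticInclude(includePath: string, fromFile: string): string | null {
  const candidate = joinRelative(fromFile, includePath);
  return candidate !== null && knownFiles.has(candidate) ? candidate : null;
}
```

Under these assumptions, `resolveStaticInclude('lib.php', 'src/page.php')` yields `src/lib.php`, while `resolveStaticInclude('inc/db.php', 'src/page.php')` yields `null` because `src/inc/db.php` is not a known file.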

Include PATHS are distinguished from namespace `use` symbols by shape: a
path contains '/' or '.', which PHP identifiers and FQNs never do. A
path-shaped include that doesn't resolve to a known project file is left
unresolved and does NOT fall back to the symbol name-matcher, which would
otherwise mis-connect "inc/db.php" to an unrelated db.php elsewhere — a
wrong edge is worse than a missing one.
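
The shape test and the "no fallback" rule can be sketched together as below. This is a simplified model, not the code in this commit: `Ref`, `looksLikeIncludePath`, and `resolveReference` are hypothetical names, and the two resolver callbacks stand in for path-based resolution and the symbol name-matcher.

```typescript
// Minimal reference shape, assumed for illustration only.
type Ref = { language: string; referenceKind: string; referenceName: string };

// A PHP import whose name contains '/' or '.' is treated as an include path;
// namespace names use '\' as the separator and contain neither character.
function looksLikeIncludePath(ref: Ref): boolean {
  return (
    ref.language === 'php' &&
    ref.referenceKind === 'imports' &&
    (ref.referenceName.includes('/') || ref.referenceName.includes('.'))
  );
}

// Path-shaped includes are resolved as files only. If the file lookup fails,
// the reference stays unresolved instead of being handed to the name-matcher.
function resolveReference(
  ref: Ref,
  resolveAsFile: (name: string) => string | null,
  matchByName: (name: string) => string | null
): string | null {
  if (looksLikeIncludePath(ref)) return resolveAsFile(ref.referenceName);
  return matchByName(ref.referenceName);
}
```

With a file lookup that finds nothing and a name-matcher that would happily return `lib/inc/db.php`, a reference to `inc/db.php` still comes back as `null`, whereas `App\Foo\Bar` is passed to the name-matcher as before.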

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Colby McHenry <me@colbymchenry.com>
Max Hsu, 2 weeks ago
parent
commit
6e2a24d96a

+ 1 - 0
CHANGELOG.md

@@ -85,6 +85,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - CodeGraph's MCP server now answers an agent's `resources/list` and `prompts/list` probes with an empty list instead of an error, clearing the `-32601` messages some clients (opencode, Codex) logged on connect. (#621)
 - Svelte and Vue components used through a barrel file — `export { default as Button } from './Button.svelte'` re-exported from an `index.ts` and imported elsewhere — are no longer falsely reported as having **0 callers**. CodeGraph now follows the default re-export all the way to the component and resolves the imports that `.svelte` / `.vue` files themselves use, so `codegraph_callers` and `codegraph_impact` see every place a component is used. This also covers components imported from another package in a workspace/monorepo (`@scope/ui/widgets`) and bare directory imports (`import { x } from './'`). Previously a live component consumed only through a barrel looked like dead code. Thanks @nakisen. (#629)
 - Components used in a Vue Single-File Component's `<template>` — `<MyButton />`, or the kebab-case `<my-button />` — are now indexed as usages, so `codegraph_callers` and `codegraph_impact` include components that appear only in another component's markup (including through a barrel re-export). Previously only a Vue component's `<script>` block was analyzed, so template-only usages were invisible. (#629)
+- PHP: `include` / `require` / `include_once` / `require_once` of a static path now create a file→file dependency edge, so `codegraph_callers` and `codegraph_impact` follow includes in procedural / script-style PHP codebases; previously only namespace `use` statements became dependency edges. Dynamic includes (`include $var`, `require __DIR__ . '/x'`, interpolated strings) are skipped. Thanks @atahan150. (#660)
 
 ## [0.9.9] - 2026-06-02
 

+ 37 - 0
__tests__/extraction.test.ts

@@ -2247,6 +2247,43 @@ use Closure;
       expect(names).toContain('Illuminate\\Support\\Str');
       expect(names).toContain('Closure');
     });
+
+    it('should extract include/require (+_once) static paths as imports (#660)', () => {
+      const code = `<?php
+require_once("lib.php");
+include 'other.php';
+require 'r.php';
+include_once("io.php");
+`;
+      const result = extractFromSource('page.php', code);
+      const names = result.nodes.filter((n) => n.kind === 'import').map((n) => n.name);
+      expect(names).toContain('lib.php');
+      expect(names).toContain('other.php');
+      expect(names).toContain('r.php');
+      expect(names).toContain('io.php');
+    });
+
+    it('should skip dynamic include/require with no static path (#660)', () => {
+      const code = `<?php
+require_once(__DIR__ . '/dyn.php');
+include $file;
+include "tpl/{$name}.php";
+`;
+      const result = extractFromSource('page.php', code);
+      const imports = result.nodes.filter((n) => n.kind === 'import');
+      expect(imports).toHaveLength(0);
+    });
+
+    it('should extract include alongside namespace use without interference (#660)', () => {
+      const code = `<?php
+use App\\Service\\Mailer;
+require_once("bootstrap.php");
+`;
+      const result = extractFromSource('page.php', code);
+      const names = result.nodes.filter((n) => n.kind === 'import').map((n) => n.name);
+      expect(names).toContain('App\\Service\\Mailer');
+      expect(names).toContain('bootstrap.php');
+    });
   });
 
   describe('Ruby imports', () => {

+ 133 - 1
__tests__/resolution.test.ts

@@ -12,7 +12,7 @@ import { CodeGraph } from '../src';
 import { Node, UnresolvedReference } from '../src/types';
 import { ReferenceResolver, createResolver, ResolutionContext } from '../src/resolution';
 import { matchReference } from '../src/resolution/name-matcher';
-import { resolveImportPath, extractImportMappings, resolveJvmImport, loadCppIncludeDirs, clearCppIncludeDirCache } from '../src/resolution/import-resolver';
+import { resolveImportPath, extractImportMappings, resolveJvmImport, loadCppIncludeDirs, clearCppIncludeDirCache, isPhpIncludePathRef } from '../src/resolution/import-resolver';
 import type { UnresolvedRef } from '../src/resolution/types';
 import { detectFrameworks, getAllFrameworkResolvers } from '../src/resolution/frameworks';
 import { QueryBuilder } from '../src/db/queries';
@@ -1919,6 +1919,138 @@ func main() {
     });
   });
 
+  describe('PHP Include Resolution', () => {
+    it('isPhpIncludePathRef distinguishes include paths from namespace use (#660)', () => {
+      const mk = (name: string, over: Partial<UnresolvedRef> = {}): UnresolvedRef => ({
+        fromNodeId: 'f', referenceName: name, referenceKind: 'imports',
+        line: 1, column: 0, filePath: 'x.php', language: 'php', ...over,
+      });
+      // include paths: contain a slash or a file extension
+      expect(isPhpIncludePathRef(mk('lib.php'))).toBe(true);
+      expect(isPhpIncludePathRef(mk('inc/db.php'))).toBe(true);
+      expect(isPhpIncludePathRef(mk('../config.php'))).toBe(true);
+      // namespace use symbols: a bare class (Closure) or FQN — never a path,
+      // so they must NOT be treated as includes (would mis-connect to a
+      // same-named Closure.php / Bar.php file).
+      expect(isPhpIncludePathRef(mk('Closure'))).toBe(false);
+      expect(isPhpIncludePathRef(mk('PDO'))).toBe(false);
+      expect(isPhpIncludePathRef(mk('App\\Foo\\Bar'))).toBe(false);
+      // scoped to PHP imports only
+      expect(isPhpIncludePathRef(mk('lib.php', { language: 'c' }))).toBe(false);
+      expect(isPhpIncludePathRef(mk('lib.php', { referenceKind: 'calls' }))).toBe(false);
+    });
+
+    it('resolves require_once to a file→file imports edge (#660)', async () => {
+      const tempProject = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-php-e2e-'));
+      try {
+        fs.mkdirSync(path.join(tempProject, 'src'), { recursive: true });
+        fs.writeFileSync(
+          path.join(tempProject, 'src', 'lib.php'),
+          `<?php\nfunction greet() { return "hi"; }\n`
+        );
+        fs.writeFileSync(
+          path.join(tempProject, 'src', 'page.php'),
+          `<?php\nrequire_once("lib.php");\necho greet();\n`
+        );
+
+        cg = await CodeGraph.init(tempProject, { index: true });
+
+        // reporter's repro: page.php's `require_once("lib.php")` must resolve
+        // to the real src/lib.php file node — a file→file `imports` edge, so
+        // callers(lib.php) now includes page.php.
+        const db = DatabaseConnection.open(path.join(tempProject, '.codegraph', 'codegraph.db'));
+        const rows = db.getDb().prepare(`
+          select dst.kind as dstKind, dst.file_path as dstPath
+          from edges e
+          join nodes src on e.source = src.id
+          join nodes dst on e.target = dst.id
+          where e.kind = 'imports'
+            and src.kind = 'file'
+            and src.file_path = 'src/page.php'
+        `).all() as Array<{ dstKind: string; dstPath: string }>;
+        const resolved = rows.find(
+          (r) => r.dstKind === 'file' && r.dstPath === 'src/lib.php'
+        );
+        expect(resolved, 'page.php → src/lib.php imports edge missing').toBeDefined();
+      } finally {
+        fs.rmSync(tempProject, { recursive: true, force: true });
+      }
+    });
+
+    it('resolves a subdirectory include path to the correct file (#660)', async () => {
+      const tempProject = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-php-subdir-'));
+      try {
+        fs.mkdirSync(path.join(tempProject, 'inc'), { recursive: true });
+        fs.writeFileSync(
+          path.join(tempProject, 'inc', 'db.php'),
+          `<?php\nfunction query() { return 1; }\n`
+        );
+        fs.writeFileSync(
+          path.join(tempProject, 'index.php'),
+          `<?php\nrequire "inc/db.php";\nquery();\n`
+        );
+
+        cg = await CodeGraph.init(tempProject, { index: true });
+
+        const db = DatabaseConnection.open(path.join(tempProject, '.codegraph', 'codegraph.db'));
+        const rows = db.getDb().prepare(`
+          select dst.kind as dstKind, dst.file_path as dstPath
+          from edges e
+          join nodes src on e.source = src.id
+          join nodes dst on e.target = dst.id
+          where e.kind = 'imports'
+            and src.kind = 'file'
+            and src.file_path = 'index.php'
+        `).all() as Array<{ dstKind: string; dstPath: string }>;
+        expect(
+          rows.find((r) => r.dstKind === 'file' && r.dstPath === 'inc/db.php'),
+          'index.php → inc/db.php imports edge missing'
+        ).toBeDefined();
+      } finally {
+        fs.rmSync(tempProject, { recursive: true, force: true });
+      }
+    });
+
+    it('does not mis-connect an unresolvable include to a same-named file elsewhere (#660)', async () => {
+      const tempProject = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-php-misresolve-'));
+      try {
+        // app/page.php's `require "inc/db.php"` resolves relative to app/, where
+        // inc/db.php does NOT exist. A same-named lib/inc/db.php exists elsewhere
+        // but is unrelated — no edge should be created (a wrong edge is worse
+        // than a missing one).
+        fs.mkdirSync(path.join(tempProject, 'app'), { recursive: true });
+        fs.mkdirSync(path.join(tempProject, 'lib', 'inc'), { recursive: true });
+        fs.writeFileSync(
+          path.join(tempProject, 'lib', 'inc', 'db.php'),
+          `<?php\nfunction unrelated() {}\n`
+        );
+        fs.writeFileSync(
+          path.join(tempProject, 'app', 'page.php'),
+          `<?php\nrequire "inc/db.php";\n`
+        );
+
+        cg = await CodeGraph.init(tempProject, { index: true });
+
+        const db = DatabaseConnection.open(path.join(tempProject, '.codegraph', 'codegraph.db'));
+        const rows = db.getDb().prepare(`
+          select dst.kind as dstKind, dst.file_path as dstPath
+          from edges e
+          join nodes src on e.source = src.id
+          join nodes dst on e.target = dst.id
+          where e.kind = 'imports'
+            and src.kind = 'file'
+            and src.file_path = 'app/page.php'
+        `).all() as Array<{ dstKind: string; dstPath: string }>;
+        expect(
+          rows.find((r) => r.dstKind === 'file' && r.dstPath === 'lib/inc/db.php'),
+          'app/page.php must NOT mis-connect to unrelated lib/inc/db.php'
+        ).toBeUndefined();
+      } finally {
+        fs.rmSync(tempProject, { recursive: true, force: true });
+      }
+    });
+  });
+
   describe('C++ chained-call receiver resolution (#645)', () => {
     async function indexCpp(files: Record<string, string>): Promise<void> {
       for (const [name, content] of Object.entries(files)) {

+ 40 - 1
src/extraction/languages/php.ts

@@ -2,6 +2,37 @@ import type { Node as SyntaxNode } from 'web-tree-sitter';
 import { getNodeText } from '../tree-sitter-helpers';
 import type { LanguageExtractor } from '../tree-sitter-types';
 
+// include / require (+ _once) expression node types. These carry the
+// file→file dependency in procedural PHP, where `include`/`require` — not
+// namespace `use` — is how a file pulls in another (issue #660).
+const PHP_INCLUDE_TYPES = new Set([
+  'include_expression',
+  'include_once_expression',
+  'require_expression',
+  'require_once_expression',
+]);
+
+/**
+ * Extract a static string-literal path from a PHP include/require expression.
+ *
+ * Returns null for dynamic forms (`include $var`, `require __DIR__ . '/x'`,
+ * interpolated strings) — they have no resolvable compile-time path, which
+ * matches the issue's "static string literals (the common case)" scope.
+ */
+function phpStaticIncludePath(node: SyntaxNode, source: string): string | null {
+  // The path argument is the expression's first named child; the call-style
+  // form `require("x")` wraps it in a parenthesized_expression.
+  let arg: SyntaxNode | null = node.namedChild(0);
+  if (arg?.type === 'parenthesized_expression') arg = arg.namedChild(0);
+  if (!arg || (arg.type !== 'string' && arg.type !== 'encapsed_string')) return null;
+  // Pure literal only: any non-`string_content` child (interpolated variable,
+  // escape sequence, …) means the value isn't a static path.
+  const parts = arg.namedChildren;
+  if (parts.some((c: SyntaxNode) => c.type !== 'string_content')) return null;
+  const content = parts.find((c: SyntaxNode) => c.type === 'string_content');
+  return content ? getNodeText(content, source) : null;
+}
+
 export const phpExtractor: LanguageExtractor = {
   functionTypes: ['function_definition'],
   classTypes: ['class_declaration', 'trait_declaration'],
@@ -11,7 +42,7 @@ export const phpExtractor: LanguageExtractor = {
   enumTypes: ['enum_declaration'],
   enumMemberTypes: ['enum_case'],
   typeAliasTypes: [],
-  importTypes: ['namespace_use_declaration'],
+  importTypes: ['namespace_use_declaration', ...PHP_INCLUDE_TYPES],
   callTypes: ['function_call_expression', 'member_call_expression', 'scoped_call_expression'],
   variableTypes: ['const_declaration'],
   fieldTypes: ['property_declaration'],
@@ -93,6 +124,14 @@ export const phpExtractor: LanguageExtractor = {
   extractImport: (node, source) => {
     const importText = source.substring(node.startIndex, node.endIndex).trim();
 
+    // include / require (+ _once): emit a file→file dependency. The path is a
+    // static string literal in the common case; dynamic forms resolve to null
+    // and are skipped (no import node, no edge).
+    if (PHP_INCLUDE_TYPES.has(node.type)) {
+      const includePath = phpStaticIncludePath(node, source);
+      return includePath ? { moduleName: includePath, signature: importText } : null;
+    }
+
     // Check for grouped imports: use X\{A, B} - return null for core fallback
     const namespacePrefix = node.namedChildren.find((c: SyntaxNode) => c.type === 'namespace_name');
     const useGroup = node.namedChildren.find((c: SyntaxNode) => c.type === 'namespace_use_group');

+ 71 - 0
src/resolution/import-resolver.ts

@@ -529,6 +529,47 @@ function resolveCppIncludePath(
   return null;
 }
 
+/**
+ * Is this reference a PHP include/require PATH (vs a namespace `use` symbol)?
+ *
+ * include/require emit a file path ("lib.php", "inc/db.php", "../x.php"),
+ * whereas namespace use is an FQN (App\Foo\Bar) or a bare class symbol
+ * (Closure). PHP identifiers contain neither '/' nor '.', so a slash or dot
+ * marks a path-shaped include. Such references resolve to files only — never
+ * to a same-named symbol — so callers must not fall back to the name-matcher.
+ */
+export function isPhpIncludePathRef(ref: UnresolvedRef): boolean {
+  return (
+    ref.language === 'php' &&
+    ref.referenceKind === 'imports' &&
+    (ref.referenceName.includes('/') || ref.referenceName.includes('.'))
+  );
+}
+
+/**
+ * Resolve a PHP include/require path to a project-relative file path.
+ *
+ * PHP resolves includes relative to the including file's directory (the
+ * common case for procedural codebases); php.ini `include_path` is not
+ * modeled. Callers pass an already-extracted static literal path.
+ */
+function resolvePhpIncludePath(
+  includePath: string,
+  fromFile: string,
+  context: ResolutionContext
+): string | null {
+  const projectRoot = context.getProjectRoot();
+  const fromDir = path.dirname(path.join(projectRoot, fromFile));
+  const basePath = path.resolve(fromDir, includePath);
+  const relativePath = path.relative(projectRoot, basePath).replace(/\\/g, '/');
+  if (context.fileExists(relativePath)) return relativePath;
+  // The literal may omit the .php extension (e.g. include "inc/config"). A
+  // bare "config" never reaches here: isPhpIncludePathRef needs a '/' or '.'.
+  for (const ext of EXTENSION_RESOLUTION.php ?? []) {
+    if (context.fileExists(relativePath + ext)) return relativePath + ext;
+  }
+  return null;
+}
+
 /**
  * Extract import mappings from a file
  */
@@ -1122,6 +1163,36 @@ export function resolveViaImport(
     return null;
   }
 
+  // PHP include/require — resolve the static string path to a file→file
+  // edge, mirroring the C/C++ branch above. Distinguish include PATHS from
+  // namespace `use` symbols by shape: an include path contains a slash or a
+  // file extension ("lib.php", "inc/db.php", "../x.php"), whereas a namespace
+  // use is an FQN (App\Foo\Bar) or a bare class symbol (Closure) — PHP
+  // identifiers contain neither '/' nor '.'. Only path-shaped references are
+  // includes; symbol references fall through to the namespace resolution.
+  if (isPhpIncludePathRef(ref)) {
+    const resolvedPath = resolvePhpIncludePath(ref.referenceName, ref.filePath, context);
+    if (resolvedPath) {
+      const basename = resolvedPath.split('/').pop()!;
+      const fileNode = context
+        .getNodesByName(basename)
+        .find((n) => n.kind === 'file' && n.filePath === resolvedPath);
+      if (fileNode) {
+        return {
+          original: ref,
+          targetNodeId: fileNode.id,
+          confidence: 0.9,
+          resolvedBy: 'import',
+        };
+      }
+    }
+    // A path-shaped include that doesn't resolve to a known project file is a
+    // dead end. Return unresolved rather than falling through to the symbol
+    // name-matcher, which would mis-connect e.g. "inc/db.php" to an unrelated
+    // db.php elsewhere in the tree — a wrong edge is worse than a missing one.
+    return null;
+  }
+
   // Use cached import mappings (avoids re-reading and re-parsing per ref)
   const imports = context.getImportMappings(ref.filePath, ref.language);
   if (imports.length === 0 && !context.readFile(ref.filePath)) {

+ 13 - 1
src/resolution/index.ts

@@ -17,7 +17,7 @@ import {
   ImportMapping,
 } from './types';
 import { matchReference, sameLanguageFamily, crossesKnownFamily } from './name-matcher';
-import { resolveViaImport, resolveJvmImport, extractImportMappings, extractReExports, loadCppIncludeDirs } from './import-resolver';
+import { resolveViaImport, resolveJvmImport, extractImportMappings, extractReExports, loadCppIncludeDirs, isPhpIncludePathRef } from './import-resolver';
 import { detectFrameworks } from './frameworks';
 import { synthesizeCallbackEdges } from './callback-synthesizer';
 import { loadProjectAliases, type AliasMap } from './path-aliases';
@@ -666,6 +666,18 @@ export class ReferenceResolver {
       candidates.push(importResult);
     }
 
+    // PHP include/require paths resolve to files via import resolution only.
+    // If that didn't find the file, do NOT fall back to the symbol
+    // name-matcher — it would mis-connect e.g. "inc/db.php" to an unrelated
+    // db.php elsewhere in the tree (a wrong edge is worse than none, #660).
+    if (isPhpIncludePathRef(ref)) {
+      return candidates.length > 0
+        ? candidates.reduce((best, curr) =>
+            curr.confidence > best.confidence ? curr : best
+          )
+        : null;
+    }
+
     // Strategy 3: Try name matching
     const nameResult = this.gateLanguage(matchReference(ref, this.context), ref);
     if (nameResult) {