
fix(react): recognize forwardRef/memo/styled components + index JSX-file routes (#841)

forwardRef/memo/styled-wrapped component consts were classified as plain
`constant` nodes (the initializer is a call/tagged-template, not a bare arrow),
so the JSX-render synthesizer and component resolution skipped them — callers
and impact returned empty for the entire shadcn/ui-style UI layer. Recognize
them in the tree-sitter extractor as `component` nodes (correct body range +
callee capture), PascalCase-gated so a memoization util stays a constant.

Separately, the `react` resolver's `languages` lacked 'tsx'/'jsx', so its
`extract()` never ran on JSX files — React Router `<Route>`/createBrowserRouter
and Next.js page routes (which only live in .tsx/.jsx) were never indexed. Add
'tsx'/'jsx' and make `extract()` route-only: the component/hook regex it carried
duplicated tree-sitter nodes (a `useAuth` became two `function` nodes) and is
fully superseded by the extractor now.
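The sketch below shows the gate described above. It uses a simplified, hypothetical resolver shape rather than CodeGraph's actual `FrameworkResolver`. The extract pass keeps only resolvers whose `languages` list contains the grammar of the file being indexed. The resolve pass runs every detected framework. The grammar ids come from the `languages` change in this commit.

```typescript
// Simplified, hypothetical resolver shape; the real FrameworkResolver has more fields.
interface Resolver {
  name: string;
  languages: string[];
}

// Extract pass: a resolver's extract() only runs for grammars it declares.
function resolversForExtract(resolvers: Resolver[], grammar: string): string[] {
  return resolvers.filter((r) => r.languages.includes(grammar)).map((r) => r.name);
}

// Resolve pass: every detected framework participates, whatever the grammar.
function resolversForResolve(detected: Resolver[]): string[] {
  return detected.map((r) => r.name);
}

const reactBefore: Resolver = { name: 'react', languages: ['javascript', 'typescript'] };
const reactAfter: Resolver = { name: 'react', languages: ['javascript', 'typescript', 'tsx', 'jsx'] };

// Assumption: a .tsx file such as routes.tsx is parsed with the 'tsx' grammar.
console.log(resolversForExtract([reactBefore], 'tsx')); // []
console.log(resolversForExtract([reactAfter], 'tsx'));  // [ 'react' ]
console.log(resolversForResolve([reactBefore]));        // [ 'react' ]
```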

Validated before/after: taxonomy 0->99 component nodes (35 w/ callers) + 1->15
routes; radix 0->262 components (80 w/ callers); cypress-realworld-app 45->52
routes (7 <Route> tags from .tsx); non-React control unchanged; node count
stable. New tests: react-hoc-component.test.ts + a route e2e in
frameworks-integration.test.ts.

Root-caused by @maxmilian (#846); reported by @Arlandaren.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry, 1 day ago
Commit 64426cad93
5 changed files with 306 additions and 65 deletions
1. CHANGELOG.md (+2 -0)
2. __tests__/frameworks-integration.test.ts (+53 -0)
3. __tests__/react-hoc-component.test.ts (+145 -0)
4. src/extraction/tree-sitter.ts (+89 -0)
5. src/resolution/frameworks/react.ts (+17 -65)

CHANGELOG.md (+2 -0)

@@ -30,6 +30,8 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### Fixes
 
+- React components declared with `forwardRef`, `memo`, or styled-components / emotion (`const Button = forwardRef(...)`, `const Card = memo(...)`, `const Box = styled.button\`…\``) are now recognized as components, so finding where they're used works. Before, they were indexed as plain constants, so `codegraph callers` and impact analysis reported "no callers found" even when the component was rendered across dozens of files — a dangerous false "safe to change" right before refactoring a shared component. Now every `<Button/>` usage links back to the component, so callers and blast radius are complete. This is the standard shadcn/ui declaration style, so for typical React design systems the whole UI layer is no longer invisible to impact analysis. Thanks @Arlandaren for the report and @maxmilian for the root-cause. (#841)
+- React Router and Next.js routes defined in `.tsx` / `.jsx` files are now indexed. Routes written as JSX — `<Route path="/users" element={<UsersPage/>}/>`, `createBrowserRouter([...])`, and Next.js `app/`/`pages/` page files — were being skipped entirely (only routes that happened to live in plain `.ts`/`.js` were picked up), so "what renders at this path?" and the route → page-component link were missing for most React apps. Now those routes show up in `codegraph search`/`codegraph_explore` and connect to the component they render, just like the backend route → handler links on other frameworks.
 - `codegraph index` now rebuilds the full graph from scratch, so it produces the same result as a fresh `codegraph init` instead of reporting "0 nodes, 0 edges" and looking like it wiped your index. Previously, re-running `index` on an unchanged project skipped every file (their contents hadn't changed) and showed an empty-looking summary; it now clears and re-indexes for an honest, complete rebuild every time. Use `codegraph sync` for fast incremental updates between full rebuilds. Thanks @Arc-univer. (#874)
 - The file watcher that auto-syncs the graph now fails cleanly when live watching can no longer be trusted, instead of looking healthy while the index quietly goes stale. If the operating system runs out of file-watch resources, or another process holds the write lock far longer than a normal save, CodeGraph now disables auto-sync once — with a single clear message telling you to run `codegraph sync` (or rely on the git sync hooks) to refresh — rather than retrying forever or repeating the same error on a loop. And while auto-sync is disabled, CodeGraph's tool responses (and `codegraph status`) now say so plainly, so your AI agent knows to read files directly instead of trusting a frozen index. This mostly matters for long-running MCP/daemon sessions, which could otherwise keep serving stale results while appearing to work. Thanks @thismilktea. (#876)
 - On Linux, hitting the kernel's inotify watch limit on a large project no longer silently leaves half the tree unwatched. CodeGraph now tells you once — naming the exact setting to raise (`fs.inotify.max_user_watches`, e.g. `sudo sysctl fs.inotify.max_user_watches=1048576`) — and keeps live-watching the directories it could register while `codegraph sync` (or the git sync hooks) covers the rest. (#876)

__tests__/frameworks-integration.test.ts (+53 -0)

@@ -908,3 +908,56 @@ describe('Go gRPC stub→impl synthesis', () => {
     }
   });
 });
+
+describe('React Router end-to-end route extraction (.tsx/.jsx)', () => {
+  let tmpDir: string | undefined;
+  afterEach(() => {
+    if (tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true });
+    tmpDir = undefined;
+  });
+
+  // Regression for the resolver language-gate bug: the `react` resolver's
+  // `extract()` was filtered out of the .tsx/.jsx grammars, so `<Route>` routes
+  // — which only live in JSX files — were never indexed through the real
+  // indexing path (the unit tests call extract() directly and so missed this).
+  it('indexes <Route element={<X/>}> routes from a .tsx file and links them to the component', async () => {
+    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-rr-'));
+    fs.writeFileSync(
+      path.join(tmpDir, 'package.json'),
+      '{"dependencies":{"react":"^18.0.0","react-router-dom":"^6.0.0"}}'
+    );
+    fs.writeFileSync(
+      path.join(tmpDir, 'Home.tsx'),
+      'export function Home() { return null; }\n'
+    );
+    fs.writeFileSync(
+      path.join(tmpDir, 'routes.tsx'),
+      `import { Routes, Route } from 'react-router-dom';
+import { Home } from './Home';
+export function AppRoutes() {
+  return (
+    <Routes>
+      <Route path="/home" element={<Home/>} />
+    </Routes>
+  );
+}
+`
+    );
+
+    const cg = CodeGraph.initSync(tmpDir);
+    await cg.indexAll();
+    try {
+      // The route node from the .tsx file exists (the bug: it didn't).
+      const route = cg.getNodesByKind('route').find((n) => n.name === '/home');
+      expect(route, '/home route from .tsx should be indexed').toBeDefined();
+
+      // ...and it links to the Home component.
+      const home = cg.getNodesByName('Home').find((n) => n.kind === 'function');
+      expect(home).toBeDefined();
+      const toHome = cg.getOutgoingEdges(route!.id).find((e) => e.target === home!.id);
+      expect(toHome, 'route → Home component edge').toBeDefined();
+    } finally {
+      cg.close();
+    }
+  });
+});

__tests__/react-hoc-component.test.ts (+145 -0)

@@ -0,0 +1,145 @@
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'node:fs';
+import * as path from 'node:path';
+import * as os from 'node:os';
+import { CodeGraph } from '../src';
+
+/**
+ * #841 — React components declared via an HOC wrapper
+ * (`const Button = forwardRef(...)`, `memo(...)`, `styled.x\`…\``) were indexed
+ * as plain `constant` nodes, so their JSX usages (`<Button/>`) got no render
+ * edge and `getCallers` / `getImpactRadius` returned empty — a dangerous silent
+ * false negative for every shadcn/ui-style design system. They must now be
+ * `component` nodes that receive jsx-render edges like function components do.
+ */
+describe('React HOC-wrapped component recognition (#841)', () => {
+  let dir: string;
+  let cg: any;
+
+  beforeEach(() => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'react-hoc-'));
+    fs.writeFileSync(path.join(dir, 'package.json'), '{"dependencies":{"react":"^18.0.0"}}');
+  });
+
+  afterEach(() => {
+    cg?.close?.();
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  async function index() {
+    cg = await CodeGraph.init(dir, { silent: true });
+    await cg.indexAll();
+    return (cg as any).db.db;
+  }
+
+  const kindsOf = (db: any, name: string): string[] =>
+    db
+      .prepare('SELECT kind FROM nodes WHERE name=? ORDER BY kind')
+      .all(name)
+      .map((r: any) => r.kind);
+
+  it('classifies forwardRef / memo / styled consts as component nodes (not constant)', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'ui.tsx'),
+      `import * as React from 'react';
+import styled from 'styled-components';
+export const Button = React.forwardRef<HTMLButtonElement, {}>((props, ref) => <button ref={ref} {...props} />);
+export const Bare = forwardRef((props, ref) => <span ref={ref} />);
+export const Card = memo((props: { t: string }) => <div>{props.t}</div>);
+export const Named = memo(function Named(props: { t: string }) { return <div>{props.t}</div>; });
+export const Boxed = styled.div\`color: red;\`;
+export const Wrapped = styled(Button)\`padding: 4px;\`;
+export const Rewrapped = memo(Button);
+`
+    );
+    const db = await index();
+    for (const name of ['Button', 'Bare', 'Card', 'Named', 'Boxed', 'Wrapped', 'Rewrapped']) {
+      expect(kindsOf(db, name), `${name} should be a component`).toContain('component');
+      // The bug was that these stayed plain constants.
+      expect(kindsOf(db, name), `${name} should not remain a constant`).not.toContain('constant');
+    }
+  });
+
+  it('emits jsx-render edges so getCallers/getImpactRadius resolve a forwardRef component', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'button.tsx'),
+      `import * as React from 'react';
+export const Button = React.forwardRef<HTMLButtonElement, {}>((props, ref) => <button ref={ref} {...props} />);
+`
+    );
+    fs.writeFileSync(
+      path.join(dir, 'page.tsx'),
+      `import { Button } from './button';
+export function Page() {
+  return <Button>Click</Button>;
+}
+`
+    );
+    const db = await index();
+
+    // The render edge exists and is the synthesized jsx-render kind.
+    const edgeRows = db
+      .prepare(
+        `SELECT s.name caller FROM edges e
+         JOIN nodes s ON s.id = e.source
+         JOIN nodes t ON t.id = e.target
+         WHERE json_extract(e.metadata, '$.synthesizedBy') = 'jsx-render'
+           AND t.kind = 'component' AND t.name = 'Button'`
+      )
+      .all();
+    expect(edgeRows.map((r: any) => r.caller)).toContain('Page');
+
+    // ...and it surfaces through the public callers API (the issue's symptom:
+    // "No callers found" before the fix).
+    const buttonId = db
+      .prepare("SELECT id FROM nodes WHERE name='Button' AND kind='component'")
+      .get().id as string;
+    const callers = cg.getCallers(buttonId).map((c: any) => c.node.name);
+    expect(callers).toContain('Page');
+  });
+
+  it('captures the inner render-fn body callees under the component', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'widget.tsx'),
+      `import * as React from 'react';
+function useThing() { return 1; }
+export const Widget = React.forwardRef((props, ref) => {
+  const v = useThing();
+  return <div ref={ref}>{v}</div>;
+});
+`
+    );
+    const db = await index();
+    const rows = db
+      .prepare(
+        `SELECT t.name FROM edges e
+         JOIN nodes s ON s.id = e.source
+         JOIN nodes t ON t.id = e.target
+         WHERE s.name = 'Widget' AND s.kind = 'component'
+           AND e.kind = 'calls' AND t.name = 'useThing'`
+      )
+      .all();
+    expect(rows.length).toBeGreaterThanOrEqual(1);
+  });
+
+  it('does not misclassify non-component PascalCase consts (precision)', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'controls.tsx'),
+      `import * as React from 'react';
+const cache = memo(expensiveFn);
+export const Config = loadConfig();
+export const Client = new ApiClient();
+export const Styles = styledHelper();
+export const Total = [1, 2].reduce((a, b) => a + b, 0);
+export const Theme = { color: 'red' };
+`
+    );
+    const db = await index();
+    for (const name of ['Config', 'Client', 'Styles', 'Total', 'Theme']) {
+      expect(kindsOf(db, name), `${name} must stay a constant`).toContain('constant');
+      expect(kindsOf(db, name), `${name} must not be a component`).not.toContain('component');
+    }
+    // A lowercase-named memo() result is a memoization util, not a component.
+    expect(kindsOf(db, 'cache')).not.toContain('component');
+  });
+});
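The `captures the inner render-fn body callees under the component` test depends on calls inside the `forwardRef` render function being attributed to `Widget`. In `src/extraction/tree-sitter.ts`, the extractor pushes the component id onto `nodeStack` and passes it to `visitFunctionBody` while walking the body. The sketch below shows the same stack-based attribution on a toy tree instead of a tree-sitter AST. `ToyNode`, `CallEdge`, and `collectCalls` are hypothetical.

```typescript
// Toy AST (not tree-sitter): just enough structure to show stack-based attribution.
type ToyNode =
  | { type: 'call'; callee: string; children?: ToyNode[] }
  | { type: 'component'; name: string; body: ToyNode[] }
  | { type: 'block'; children: ToyNode[] };

interface CallEdge { source: string; target: string }

function collectCalls(root: ToyNode[], fileScope = '<file>'): CallEdge[] {
  const stack: string[] = [fileScope];
  const edges: CallEdge[] = [];
  const visit = (n: ToyNode): void => {
    switch (n.type) {
      case 'component':
        stack.push(n.name); // enter the component before walking its render fn
        n.body.forEach(visit);
        stack.pop();        // leave it afterwards, like nodeStack.push/pop
        break;
      case 'call':
        // Attribute the call to whatever declaration is on top of the stack.
        edges.push({ source: stack[stack.length - 1], target: n.callee });
        n.children?.forEach(visit);
        break;
      case 'block':
        n.children.forEach(visit);
        break;
    }
  };
  root.forEach(visit);
  return edges;
}

// Mirrors the Widget fixture: useThing() is called inside the forwardRef render fn.
const edges = collectCalls([
  { type: 'component', name: 'Widget', body: [{ type: 'call', callee: 'useThing' }] },
  { type: 'call', callee: 'topLevelCall' },
]);
console.log(edges);
```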

src/extraction/tree-sitter.ts (+89 -0)

@@ -44,6 +44,10 @@ export { generateNodeId } from './tree-sitter-helpers';
  */
 const RTK_HOOK_NAME_RE = /^use[A-Z][A-Za-z0-9]*(?:Query|Mutation)$/;
 
+/** React HOC callees whose result is itself a component — a PascalCase const
+ *  initialized with one of these is a component, not a constant (#841). */
+const REACT_COMPONENT_HOCS = new Set(['forwardRef', 'memo', 'React.forwardRef', 'React.memo']);
+
 /** Vue store collections whose object-literal members are the symbols an agent
  *  looks for. Extracted as function nodes so `actions`/`mutations`/`getters` are
  *  findable + readable (the foundation under any later dispatch-bridge synth). */
@@ -1421,6 +1425,71 @@ export class TreeSitterExtractor {
     this.nodeStack.pop();
   }
 
+  /**
+   * Detect a React component declared via an HOC wrapper whose result is itself a
+   * component: `forwardRef(...)`, `memo(...)`, `React.forwardRef/memo(...)`, and
+   * styled-components / emotion `styled.tag\`…\`` / `styled(Base)\`…\``. These
+   * initializers are a call / tagged-template (not a bare arrow), so the const is
+   * otherwise classified `constant` — and a constant is skipped by both the
+   * JSX-render edge synthesizer and component resolution, so `<Button/>` usages
+   * get no edge and callers/impact silently return empty (#841).
+   *
+   * Returns `{ inner }` — the inline render function to extract as the component
+   * body, or `null` when the wrapper has no inline function (`memo(Imported)`,
+   * `styled.button\`…\``) and only a bodyless component node is minted — or
+   * `undefined` when this initializer is not a recognized component wrapper.
+   */
+  private reactComponentHoc(valueNode: SyntaxNode): { inner: SyntaxNode | null } | undefined {
+    if (valueNode.type !== 'call_expression') return undefined;
+    const callee = getChildByField(valueNode, 'function');
+    if (!callee) return undefined;
+    const calleeText = getNodeText(callee, this.source);
+    // styled-components / emotion: `styled.button\`…\`` / `styled(Base)\`…\``.
+    // tree-sitter models these tagged templates as a call_expression whose callee
+    // is the `styled.x` / `styled(Base)` tag (\b avoids matching `styledFoo`).
+    // No inline render fn — the argument is the CSS template.
+    if (/^styled\b/.test(calleeText)) return { inner: null };
+    // React HOCs: `forwardRef`/`memo`/`React.forwardRef`/`React.memo`.
+    if (!REACT_COMPONENT_HOCS.has(calleeText)) return undefined;
+    // The first arrow / function-expression argument is the render fn (if inline;
+    // `memo(Imported)` passes a bare identifier and has none).
+    const args = getChildByField(valueNode, 'arguments');
+    let inner: SyntaxNode | null = null;
+    if (args) {
+      for (let i = 0; i < args.namedChildCount; i++) {
+        const a = args.namedChild(i);
+        if (a && (a.type === 'arrow_function' || a.type === 'function_expression')) {
+          inner = a;
+          break;
+        }
+      }
+    }
+    return { inner };
+  }
+
+  /**
+   * Emit a `component` node for an HOC-wrapped React component declaration (see
+   * reactComponentHoc). Named by the declarator (`Button`) and located at it so
+   * the node range spans the body. When the wrapper has an inline render
+   * function, its body is walked so the component's callees (hooks, helpers) are
+   * captured under the component node — matching how a plain
+   * `const Foo = () => …` arrow component already behaves.
+   */
+  private extractReactComponentNode(
+    name: string,
+    declarator: SyntaxNode,
+    innerFn: SyntaxNode | null,
+    extra: { docstring?: string; signature?: string; isExported?: boolean }
+  ): void {
+    const compNode = this.createNode('component', name, declarator, extra);
+    if (!compNode || !innerFn || !this.extractor) return;
+    this.nodeStack.push(compNode.id);
+    const body = this.extractor.resolveBody?.(innerFn, this.extractor.bodyField)
+      ?? getChildByField(innerFn, this.extractor.bodyField);
+    if (body) this.visitFunctionBody(body, compNode.id);
+    this.nodeStack.pop();
+  }
+
   /**
    * Extract a class
    */
@@ -2316,6 +2385,26 @@ export class TreeSitterExtractor {
             const initValue = valueNode ? getNodeText(valueNode, this.source).slice(0, 100) : undefined;
             const initSignature = initValue ? `= ${initValue}${initValue.length >= 100 ? '...' : ''}` : undefined;
 
+            // React HOC-wrapped components (`forwardRef`/`memo`/`styled`) — see
+            // reactComponentHoc. The initializer is a call / tagged-template (not
+            // a bare arrow), so without this the const is a plain `constant`,
+            // which the JSX-render synthesizer and component resolution both skip
+            // → `<Button/>` usages get no edge and callers/impact return empty
+            // (the whole shadcn/ui design-system pattern, #841). PascalCase-gated
+            // to the component naming convention so a memoization util
+            // (`const cache = memo(fn)`) stays a constant.
+            if (valueNode && /^[A-Z]/.test(name)) {
+              const hoc = this.reactComponentHoc(valueNode);
+              if (hoc) {
+                this.extractReactComponentNode(name, child, hoc.inner, {
+                  docstring,
+                  signature: initSignature,
+                  isExported,
+                });
+                continue;
+              }
+            }
+
             const varNode = this.createNode(kind, name, child, {
               docstring,
               signature: initSignature,
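The added branch reduces to a small decision rule over the declarator name and the initializer's callee. Written as a pure function over strings, it can be checked directly against the fixtures in `react-hoc-component.test.ts`. This is a simplified sketch, not the extractor's code. `classifyConst` is hypothetical and takes callee text instead of tree-sitter nodes. `null` stands in for initializers that are not a `call_expression`, such as `new ApiClient()` or an object literal. The sketch does not locate the inline render function.

```typescript
// Callee set and regexes mirror the diff; the function is a string-level simplification.
const REACT_COMPONENT_HOCS = new Set(['forwardRef', 'memo', 'React.forwardRef', 'React.memo']);

type Kind = 'component' | 'constant';

/**
 * @param name        declarator name, e.g. "Button"
 * @param calleeText  callee of the initializer call / tagged template, or null
 *                    when the initializer is not a call_expression
 */
function classifyConst(name: string, calleeText: string | null): Kind {
  if (calleeText === null) return 'constant';
  if (!/^[A-Z]/.test(name)) return 'constant';          // PascalCase gate
  if (/^styled\b/.test(calleeText)) return 'component'; // styled.x`...` / styled(Base)`...`
  return REACT_COMPONENT_HOCS.has(calleeText) ? 'component' : 'constant';
}

const cases: Array<[string, string | null]> = [
  ['Button', 'React.forwardRef'],
  ['Wrapped', 'styled(Button)'],
  ['Styles', 'styledHelper'],
  ['cache', 'memo'],
  ['Client', null], // `new ApiClient()` is a new_expression, not a call
];
for (const [name, callee] of cases) console.log(name, classifyConst(name, callee));
```

The PascalCase gate keeps `const cache = memo(expensiveFn)` a constant. The `\b` in `/^styled\b/` rejects `styledHelper` and still accepts `styled(Button)`.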

src/resolution/frameworks/react.ts (+17 -65)

@@ -9,7 +9,12 @@ import { FrameworkResolver, UnresolvedRef, ResolvedRef, ResolutionContext } from
 
 export const reactResolver: FrameworkResolver = {
   name: 'react',
-  languages: ['javascript', 'typescript'],
+  // Includes 'tsx'/'jsx' so route extraction runs on JSX files (where
+  // `<Route element={<X/>}>` routes live) — without them the .tsx/.jsx grammars
+  // were filtered out of the extract pass and those routes were never indexed.
+  // (resolve() is unaffected — it runs for every detected framework regardless
+  // of language; only the extract pass filters on `languages`.)
+  languages: ['javascript', 'typescript', 'tsx', 'jsx'],
 
   detect(context: ResolutionContext): boolean {
     // Check for React in package.json
@@ -90,70 +95,17 @@ export const reactResolver: FrameworkResolver = {
     const references: UnresolvedRef[] = [];
     const now = Date.now();
 
-    // Extract component definitions
-    // function Component() or const Component = () =>
-    const componentPatterns = [
-      // Function components
-      /(?:export\s+)?function\s+([A-Z][a-zA-Z0-9]*)\s*\(/g,
-      // Arrow function components
-      /(?:export\s+)?(?:const|let)\s+([A-Z][a-zA-Z0-9]*)\s*=\s*(?:\([^)]*\)|[a-zA-Z_][a-zA-Z0-9_]*)\s*=>/g,
-      // forwardRef components
-      /(?:export\s+)?(?:const|let)\s+([A-Z][a-zA-Z0-9]*)\s*=\s*(?:React\.)?forwardRef/g,
-      // memo components
-      /(?:export\s+)?(?:const|let)\s+([A-Z][a-zA-Z0-9]*)\s*=\s*(?:React\.)?memo/g,
-    ];
-
-    for (const pattern of componentPatterns) {
-      let match;
-      while ((match = pattern.exec(content)) !== null) {
-        const [fullMatch, name] = match;
-        const line = content.slice(0, match.index).split('\n').length;
-
-        // Check if it returns JSX (rough heuristic)
-        const afterMatch = content.slice(match.index + fullMatch.length, match.index + fullMatch.length + 500);
-        const hasJSX = afterMatch.includes('<') && (afterMatch.includes('/>') || afterMatch.includes('</'));
-
-        if (hasJSX) {
-          nodes.push({
-            id: `component:${filePath}:${name}:${line}`,
-            kind: 'component',
-            name: name!,
-            qualifiedName: `${filePath}::${name}`,
-            filePath,
-            startLine: line,
-            endLine: line,
-            startColumn: 0,
-            endColumn: fullMatch.length,
-            language: filePath.endsWith('.tsx') ? 'tsx' : 'jsx',
-            isExported: fullMatch.includes('export'),
-            updatedAt: now,
-          });
-        }
-      }
-    }
-
-    // Extract custom hooks
-    const hookPattern = /(?:export\s+)?(?:function|const|let)\s+(use[A-Z][a-zA-Z0-9]*)\s*[=(]/g;
-    let hookMatch;
-    while ((hookMatch = hookPattern.exec(content)) !== null) {
-      const [fullMatch, name] = hookMatch;
-      const line = content.slice(0, hookMatch.index).split('\n').length;
-
-      nodes.push({
-        id: `hook:${filePath}:${name}:${line}`,
-        kind: 'function',
-        name: name!,
-        qualifiedName: `${filePath}::${name}`,
-        filePath,
-        startLine: line,
-        endLine: line,
-        startColumn: 0,
-        endColumn: fullMatch.length,
-        language: filePath.endsWith('.ts') || filePath.endsWith('.tsx') ? 'typescript' : 'javascript',
-        isExported: fullMatch.includes('export'),
-        updatedAt: now,
-      });
-    }
+    // Components and custom hooks are NOT extracted here. The tree-sitter
+    // extractor already emits them natively across .ts/.tsx/.js/.jsx — function
+    // and arrow components as `function` nodes, HOC-wrapped components
+    // (`forwardRef`/`memo`/`styled`) as `component` nodes (#841), and `useX`
+    // hooks as `function` nodes. Re-deriving them here with regex only ran on
+    // .ts/.js anyway (this resolver's `languages` didn't include the 'tsx'/'jsx'
+    // grammars), and it DUPLICATED those tree-sitter nodes (e.g. a `useAuth`
+    // ended up as two `function` nodes). This `extract` now contributes only
+    // what tree-sitter can't: route nodes (React Router + Next.js conventions),
+    // which is why 'tsx'/'jsx' are now in `languages` — `<Route>`/`element={<X/>}`
+    // routes live in JSX files and were previously skipped entirely.
 
     // React Router: <Route path="/x" component={Comp}/> (v5) or
     // <Route path="/x" element={<Comp/>}/> (v6). Attributes appear in any order,
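The route-matching code sits outside this hunk. The comment's point is that `<Route>` attributes can appear in any order, so one approach is to match each tag first and then read `path` and the rendered component out of it independently. Below is a minimal regex sketch of that approach. It assumes string-literal `path` values and at most one level of nested braces inside attribute expressions. `extractRoutes` and `RouteMatch` are hypothetical, and this is not the resolver's actual implementation.

```typescript
interface RouteMatch {
  path: string;
  component: string | null; // component rendered at this path, if found
}

function extractRoutes(source: string): RouteMatch[] {
  // Opening/self-closing <Route ...> tag. Brace groups are matched as units so the
  // `/>` inside element={<Home/>} does not end the tag early. `\b` skips <Routes>.
  const tagRe = /<Route\b((?:[^>{]|\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\})*)>/g;
  const pathRe = /\bpath\s*=\s*(?:"([^"]*)"|'([^']*)')/;
  const elementRe = /\belement\s*=\s*\{\s*<([A-Z][\w.]*)/;          // v6: element={<Comp/>}
  const componentRe = /\bcomponent\s*=\s*\{\s*([A-Z][\w.]*)\s*\}/;  // v5: component={Comp}

  const routes: RouteMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(source)) !== null) {
    const attrs = m[1];
    const p = pathRe.exec(attrs);
    if (!p) continue; // no string-literal path (e.g. index/layout routes): skipped here
    const comp = elementRe.exec(attrs) ?? componentRe.exec(attrs);
    routes.push({ path: p[1] ?? p[2], component: comp ? comp[1] : null });
  }
  return routes;
}

const demo = `
<Routes>
  <Route path="/home" element={<Home/>} />
  <Route element={<Users/>} path='/users' />
  <Route path="/legacy" component={Legacy} />
</Routes>`;
console.log(extractRoutes(demo));
```

Separating tag matching from attribute matching is what makes attribute order irrelevant. The regexes still miss computed paths such as `path={base + '/x'}`.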