
fix(cpp): recover export-macro-annotated classes instead of dropping them (#1061) (#1070)

A C++ class annotated with an export/visibility macro between `class`/`struct`
and the type name, such as
`class MYMODULE_API UMyComponent : public UActorComponent` (the standard
Unreal Engine `*_API` pattern) or the equivalent `*_EXPORT`/`*_ABI` macros in
Qt, Boost, LLVM, etc., makes tree-sitter read `class MACRO` as an elaborated
type specifier and the whole declaration as a function definition. #946 dropped
the resulting phantom function, but that also discarded the recoverable class
name, members, and base-class edge. The class never entered the graph, so
"find subclasses", type-hierarchy, and impact-through-inheritance queries
returned nothing for effectively every gameplay class in a UE project.

Add `blankCppExportMacros` as `cppExtractor.preParse`: it blanks the macro with
equal-length spaces before parsing (offset-preserving, like C#'s
`blankCsharpPreprocessorDirectives`/#237), so the declaration parses as a normal
class_specifier and existing extraction emits the node, members, and `extends`
edge. Generalized past UE `*_API` to any all-caps export macro, with two
false-positive guards: the trailing `[:{]` definition-guard (leaves elaborated
var decls like `struct FOO var;` alone) and requiring the macro to be followed
by the real name (leaves an all-caps class NAME such as `class FOO : public Base`
alone). C++-only, so C's heavier use of `struct TAG var;` never reaches it. The
#946 drop stays as the fallback for any residual misparse the blanking doesn't
catch.
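
The point of blanking with equal-length spaces is that positions reported against the preprocessed text stay valid for the original file. The sketch below copies `blankCppExportMacros` from this commit's diff (parameter types added) and pairs it with a hypothetical `lineCol` helper, which is not part of the codebase, to show that the class name keeps the same offset, line, and column after the macro is removed.

```typescript
// Copied from src/extraction/languages/c-cpp.ts in this commit (parameter
// types added) so the sketch runs on its own.
function blankCppExportMacros(source: string): string {
  if (source.indexOf('class') === -1 && source.indexOf('struct') === -1) return source;
  return source.replace(
    /\b(class|struct)(\s+)([A-Z][A-Z0-9_]+)(?=\s+[A-Za-z_]\w*(?:\s+final)?\s*[:{])/g,
    (_m: string, kw: string, ws: string, macro: string) => kw + ws + ' '.repeat(macro.length)
  );
}

// Hypothetical helper (not in the codebase): 1-based line and column of a
// character offset, the way a diagnostic or editor would report it.
function lineCol(text: string, offset: number): { line: number; column: number } {
  const before = text.slice(0, offset);
  return {
    line: before.split('\n').length,
    column: offset - before.lastIndexOf('\n'),
  };
}

const original = [
  '#include "CoreMinimal.h"',
  '',
  'class MYGAME_API UMyComponent : public UActorComponent',
  '{',
  '};',
].join('\n');
const blanked = blankCppExportMacros(original);

// The macro is gone, but the class name sits at the same offset in both texts,
// so a position a parser reports against `blanked` also points into `original`.
const offset = original.indexOf('UMyComponent');
console.log(blanked.includes('MYGAME_API')); // false
console.log(blanked.indexOf('UMyComponent') === offset); // true
console.log(lineCol(original, offset), lineCol(blanked, offset)); // line 3, column 18 in both
```

Because only the macro's characters are overwritten and no newline is touched, every line break stays at the same index, which is what keeps line/column mapping exact without any offset translation table.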

Validated on google/leveldb (LEVELDB_EXPORT, 134 files): class/struct nodes
266→293, extends edges 292→359, phantom functions 588→513. Every real class
definition carrying the export macro flips from function to class, and
`EnvWrapper extends Env` goes from absent to present.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry, 1 day ago
parent
commit
e596c968ab
3 changed files with 139 additions and 14 deletions
  1. 1 0
      CHANGELOG.md
  2. 102 13
      __tests__/extraction.test.ts
  3. 36 1
      src/extraction/languages/c-cpp.ts

+ 1 - 0
CHANGELOG.md

@@ -11,6 +11,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### Fixes
 
+- C++ classes annotated with an export or visibility macro are now indexed as real classes. This is the `class MYMODULE_API UMyComponent : public UActorComponent` style used throughout Unreal Engine — where an `XXX_API` macro sits between `class`/`struct` and the type name — as well as the equivalent `*_EXPORT` / `*_ABI` macros common in Qt, Boost, LLVM, and many other libraries. Previously that macro made the parser misread the whole declaration as a function, so the class was dropped entirely: it never appeared in the graph and its base class went unrecorded, which made "find subclasses", type-hierarchy, and impact-through-inheritance queries come back empty for effectively every gameplay class in an Unreal Engine project. The class, its members, and its inheritance link are now all captured. Thanks @luoyxy for the detailed report and proposed fix. (#1061)
 - `codegraph_explore` now surfaces the options/config type behind a function when you ask, in plain language, what to change to add a parameter to it. A question like "what do I need to change to add a new parameter to X" shares no words with the file that actually defines X's options — for example a functional-options struct and its `With…` builders living in a separate `options.go`, reachable only through X's signature — so that file scored near-zero on every text and connectivity signal and got dropped: explore returned X itself but not the file you'd edit, and the agent fell back to grep. Explore now follows a named function's parameter and return types and pulls in the file that defines them when ranking would otherwise bury it, so the options/config file shows up with its fields. Well-connected types that already rank are left untouched, so ordinary "how does X work" flow questions are unchanged. (The separate tools `codegraph_search`/`codegraph_impact`/`codegraph_node` remain available via `CODEGRAPH_MCP_TOOLS` for anyone who prefers driving each step explicitly.) Thanks @wauxhall for the detailed investigation. (#1064)
 
 

+ 102 - 13
__tests__/extraction.test.ts

@@ -11,7 +11,7 @@ import * as os from 'os';
 import { CodeGraph } from '../src';
 import { extractFromSource, scanDirectory, buildDefaultIgnore, discoverEmbeddedRepoRoots, buildScopeIgnore } from '../src/extraction';
 import { detectLanguage, isLanguageSupported, getSupportedLanguages, initGrammars, loadAllGrammars, isSourceFile } from '../src/extraction/grammars';
-import { stripCppTemplateArgs } from '../src/extraction/languages/c-cpp';
+import { stripCppTemplateArgs, blankCppExportMacros } from '../src/extraction/languages/c-cpp';
 import { normalizePath } from '../src/utils';
 
 beforeAll(async () => {
@@ -2665,13 +2665,17 @@ std::unique_ptr<Widget> makeWidget() { return nullptr; }
     });
   });
 
-  describe('C++ macro-prefixed class/struct misparse (#946)', () => {
-    // An export/visibility macro before the class name plus a base clause
-    // (`class MACRO Name : public Base { … }`) makes tree-sitter read `class
-    // MACRO` as an elaborated type and the whole declaration as a
-    // function_definition named after the class, spanning the entire body — a
-    // phantom `function` that polluted callers/impact/blast-radius. It's dropped.
-    it('does not mint a phantom function for a macro-annotated class that inherits', () => {
+  describe('C++ macro-prefixed class/struct misparse (#946 → recovered in #1061)', () => {
+    // An export/visibility macro before the class name (`class MACRO Name :
+    // public Base { … }`) makes tree-sitter read `class MACRO` as an elaborated
+    // type and the whole declaration as a function_definition named after the
+    // class — a phantom `function` that polluted callers/impact/blast-radius.
+    // #946 dropped that phantom; #1061's preParse (`blankCppExportMacros`) now
+    // blanks the ALL-CAPS macro before parsing, so the class parses normally and
+    // is *recovered* — node, members, and base edge all present — not just
+    // de-phantomed. The #946 drop survives as the fallback for any residual
+    // misparse the blanking doesn't catch.
+    it('recovers a macro-annotated class that inherits (no phantom, real class + base edge)', () => {
       const code = `#pragma once
 #define MAPCORE_EXPORT __attribute__((visibility("default")))
 
@@ -2694,16 +2698,25 @@ public:
       const result = extractFromSource('provider.h', code);
 
       // The misparse used to surface as `function | LocalDataProvider` spanning
-      // the whole class body — a false caller in the graph. It's gone now.
+      // the whole class body — a false caller in the graph. It's gone.
       expect(
         result.nodes.find((n) => n.name === 'LocalDataProvider' && n.kind === 'function')
       ).toBeUndefined();
 
+      // …and the class is now recovered (was dropped under #946), with its
+      // `extends DataProvider` edge — the whole point of #1061.
+      expect(result.nodes.find((n) => n.name === 'LocalDataProvider')?.kind).toBe('class');
+      expect(
+        result.unresolvedReferences.find(
+          (r) => r.referenceKind === 'extends' && r.referenceName === 'DataProvider'
+        )
+      ).toBeTruthy();
+
       // The sibling class without the macro is unaffected — still a class.
       expect(result.nodes.find((n) => n.name === 'DataProvider')?.kind).toBe('class');
     });
 
-    it('drops the struct variant too, without dropping a genuine class', () => {
+    it('recovers the struct variant too, without disturbing a genuine class', () => {
       const code = `
 #define API __declspec(dllexport)
 struct API Widget : public Base { int x; };
@@ -2711,14 +2724,90 @@ class Plain : public Base { public: int y; };
 `;
       const result = extractFromSource('widget.cpp', code);
 
-      // `struct MACRO Name : Base { … }` misparses the same way — no phantom function.
+      // `struct MACRO Name : Base { … }` misparses the same way — no phantom
+      // function, and the struct is recovered with its base edge.
       expect(
         result.nodes.find((n) => n.name === 'Widget' && n.kind === 'function')
       ).toBeUndefined();
+      expect(result.nodes.find((n) => n.name === 'Widget')?.kind).toBe('struct');
 
-      // A normal class with a base clause and no macro must still be a class — the
-      // drop is precise, not a blanket "class with inheritance" filter.
+      // A normal class with a base clause and no macro is untouched.
       expect(result.nodes.find((n) => n.name === 'Plain')?.kind).toBe('class');
+      const exts = result.unresolvedReferences
+        .filter((r) => r.referenceKind === 'extends')
+        .map((r) => r.referenceName);
+      expect(exts.filter((n) => n === 'Base').length).toBe(2); // Widget + Plain both extend Base
+    });
+  });
+
+  describe('C++ export-macro class recovery (#1061)', () => {
+    // Unreal-Engine style: `class MYGAME_API UMyComponent : public UActorComponent`.
+    // The leading `*_API` macro alone (base clause or not) triggers the #946
+    // misparse and dropped the class — breaking subclass / type-hierarchy /
+    // inheritance-impact queries for effectively every gameplay class in a UE
+    // project. blankCppExportMacros recovers them.
+    it('recovers UE *_API classes and the inheritance edge (the issue repro)', () => {
+      const code = `class ENGINE_API UActorComponent { };
+class MYGAME_API UMyComponent : public UActorComponent { };
+`;
+      const result = extractFromSource('ue.cpp', code);
+      const classes = result.nodes.filter((n) => n.kind === 'class').map((n) => n.name);
+      expect(classes).toContain('UActorComponent'); // macro, no base — also was dropped
+      expect(classes).toContain('UMyComponent');
+      expect(result.nodes.find((n) => n.kind === 'function')).toBeUndefined(); // no phantom
+      expect(
+        result.unresolvedReferences.find(
+          (r) => r.referenceKind === 'extends' && r.referenceName === 'UActorComponent'
+        )
+      ).toBeTruthy();
+    });
+
+    it('blankCppExportMacros blanks only the header macro, offset-preserving', () => {
+      // Blanking replaces the macro with equal-length spaces, so the output is
+      // byte-for-byte the same length and identical *except* the macro is gone —
+      // every downstream line/column stays exact.
+      const check = (inp: string, macro: string, rest: string) => {
+        const out = blankCppExportMacros(inp);
+        expect(out.length).toBe(inp.length); // every byte offset preserved
+        expect(out).not.toContain(macro); // the macro token is blanked
+        expect(out.replace(/ +/g, ' ')).toBe(rest); // nothing else changed
+      };
+      // Generalizes across the export-macro space: UE _API, Qt/Boost _EXPORT,
+      // LLVM _ABI, bare API.
+      check(
+        'class MYGAME_API UMyComponent : public UActorComponent { };',
+        'MYGAME_API',
+        'class UMyComponent : public UActorComponent { };'
+      );
+      check('struct MAPCORE_EXPORT W : B {}', 'MAPCORE_EXPORT', 'struct W : B {}');
+      check('class LLVM_ABI Foo {}', 'LLVM_ABI', 'class Foo {}');
+    });
+
+    it('does NOT blank an all-caps class NAME or an elaborated-type var decl', () => {
+      // The name itself being ALL-CAPS (with or without a base) must survive —
+      // the macro is only the token *before* the name, gated on a `: { ` def.
+      for (const c of [
+        'class FOO { int x; };',
+        'class FOO : public Base { int x; };',
+        'struct BAR : public Base { int y; };',
+        'enum class COLOR { Red, Green };',
+        // elaborated-type variable declarations end in ; = [ — never : {
+        'struct FOO bar;',
+        'class FOO obj = make();',
+        'struct FOO arr[10];',
+        // a *_API macro used as an ordinary value elsewhere
+        'int x = SOME_API; void f() { use(MYMODULE_API); }',
+      ]) {
+        expect(blankCppExportMacros(c)).toBe(c);
+      }
+      // And the all-caps-named class keeps its base edge through real extraction.
+      const result = extractFromSource('ctrl.cpp', 'class FOO : public Base { int x; };');
+      expect(result.nodes.find((n) => n.name === 'FOO')?.kind).toBe('class');
+      expect(
+        result.unresolvedReferences.find(
+          (r) => r.referenceKind === 'extends' && r.referenceName === 'Base'
+        )
+      ).toBeTruthy();
     });
   });
 

+ 36 - 1
src/extraction/languages/c-cpp.ts

@@ -209,7 +209,40 @@ function isMacroMisparsedTypeDecl(node: SyntaxNode): boolean {
   return true;
 }
 
+/**
+ * Blank an export/visibility macro in a `class/struct EXPORT_MACRO Name …`
+ * *definition* header before parsing. Not knowing the macro, tree-sitter reads
+ * `class EXPORT_MACRO` as an elaborated type specifier and the rest as a
+ * function, so the whole class — its name, base clause, and members — drops out
+ * of the index (#946 catches the resulting phantom function but can't recover
+ * the class), which silently breaks type-hierarchy / inheritance-impact queries
+ * for effectively every Unreal-Engine (`*_API`), Qt/Boost (`*_EXPORT`), LLVM
+ * (`*_ABI`), … class. Replacing the macro with equal-length spaces preserves
+ * every byte offset (and thus line/column), so the declaration then parses as a
+ * normal class_specifier and the existing extraction emits the node, members,
+ * and `extends` edge. (#1061, follow-up to #946.)
+ *
+ * Matched tightly so it can't touch the same macro used as an ordinary value
+ * elsewhere (`int x = SOME_API;`): the macro is the ALL-CAPS token sitting
+ * *between* `class`/`struct` and the type name, and the trailing `[:{]`
+ * definition-guard fires only when a base clause or body follows — the only
+ * shape that misparses. That guard also leaves elaborated-type variable
+ * declarations (`struct FOO var;`, `class FOO obj = …`) untouched, since those
+ * end in `;` / `=` / `[`, never `:` / `{`. C++-only (wired into cppExtractor),
+ * so C's heavier use of `struct TAG var;` never reaches it.
+ */
+export function blankCppExportMacros(source: string): string {
+  if (source.indexOf('class') === -1 && source.indexOf('struct') === -1) return source;
+  return source.replace(
+    /\b(class|struct)(\s+)([A-Z][A-Z0-9_]+)(?=\s+[A-Za-z_]\w*(?:\s+final)?\s*[:{])/g,
+    (_m, kw, ws, macro) => kw + ws + ' '.repeat(macro.length)
+  );
+}
+
 export const cppExtractor: LanguageExtractor = {
+  // Recover macro-annotated class/struct definitions (`class MYMODULE_API Foo : Base`)
+  // that tree-sitter otherwise misparses into a phantom function (#1061/#946).
+  preParse: blankCppExportMacros,
   functionTypes: ['function_definition'],
   classTypes: ['class_specifier'],
   methodTypes: ['function_definition'],
@@ -262,7 +295,9 @@ export const cppExtractor: LanguageExtractor = {
     const cppKeywords = ['switch', 'if', 'for', 'while', 'do', 'case', 'return'];
     if (cppKeywords.includes(name)) return true;
     // `class MACRO Name : public Base { … }` misparses to a function_definition
-    // named after the class — drop that phantom (#946).
+    // named after the class. `blankCppExportMacros` (preParse) recovers the
+    // common ALL-CAPS export-macro shape; this drop is the fallback for any
+    // residual misparse it doesn't blank — still no phantom function (#1061/#946).
     return isMacroMisparsedTypeDecl(node);
   },
   extractImport: (node, source) => {
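
The blanking pattern only fires for a single ALL-CAPS token sitting directly between `class`/`struct` and the type name, followed by an optional `final` and then `:` or `{`. Anything else passes through unchanged and, if tree-sitter still misparses it, falls to the #946 drop kept as the fallback. The sketch below runs the same regex (copied from the diff) over a few illustrative inputs; the macro names and the mixed-case and stacked-macro shapes are hypothetical examples chosen here, not cases taken from the issue.

```typescript
// Same pattern and replacer as blankCppExportMacros in the diff above.
const EXPORT_MACRO =
  /\b(class|struct)(\s+)([A-Z][A-Z0-9_]+)(?=\s+[A-Za-z_]\w*(?:\s+final)?\s*[:{])/g;

function blank(source: string): string {
  return source.replace(
    EXPORT_MACRO,
    (_m: string, kw: string, ws: string, macro: string) => kw + ws + ' '.repeat(macro.length)
  );
}

// Illustrative inputs with hypothetical macro names.
const samples: [string, string][] = [
  ['single ALL-CAPS macro', 'class MYLIB_API Foo : public Base {};'],
  ['macro plus final', 'class MYLIB_API Foo final : public Base {};'],
  ['mixed-case macro', 'class MyLib_Api Foo : public Base {};'],
  ['two stacked macros', 'class DEPRECATED MYLIB_API Foo : public Base {};'],
];

for (const [label, source] of samples) {
  const out = blank(source);
  // "blanked": the preParse rewrote the header before parsing.
  // "unchanged": any misparse is left for the #946 fallback.
  console.log(`${label}: ${out === source ? 'unchanged' : 'blanked'}`);
}
```

With the regex as written, the first two samples are blanked and the last two are left unchanged: a mixed-case token fails `[A-Z][A-Z0-9_]+`, and with two stacked macros the first one is not followed by a name and then `:`/`{`, so neither is touched.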