
fix(extraction): recognize common third-party C++ inline macros, not just UE (#1101)

* fix(extraction): recognize common third-party C++ inline macros, not just UE

Extend blankCppInlineMacros beyond Unreal Engine's FORCEINLINE family to the
inline/linkage macros that vendored third-party libraries define and that
mangle function names the same way:

- pugixml: PUGI__FN / PUGI__FN_NO_INLINE (before the return type) and
  PUGIXML_FUNCTION (linkage macro, between return type and name — the blank
  mechanism handles both positions).
- Godot: _FORCE_INLINE_ / _ALWAYS_INLINE_.
- Boost: BOOST_FORCEINLINE / BOOST_NOINLINE.
- Generic cross-ecosystem hints: ALWAYS_INLINE / FORCE_INLINE / NOINLINE.
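
This minimal sketch shows why one blanking pass covers both pugixml positions. It assumes a regex of the same shape as the one in this commit: an exact token, word boundaries, then a lookahead for whitespace and an identifier start. `blankTokens` is a hypothetical helper, not CodeGraph's API. Each match becomes the same number of spaces, so all text after it keeps its offset.

```typescript
// `blankTokens` is a hypothetical helper that mirrors the regex shape in this
// commit; it is not CodeGraph's exported API.
function blankTokens(source: string, tokens: readonly string[]): string {
  // Exact token, word-bounded, and only when whitespace + an identifier follows.
  const re = new RegExp(`\\b(${tokens.join('|')})\\b(?=\\s+[A-Za-z_])`, 'g');
  // Equal-length spaces keep every later character at its original offset.
  return source.replace(re, (m) => ' '.repeat(m.length));
}

// pugixml tokens from this commit, longest first.
const PUGI_TOKENS = ['PUGI__FN_NO_INLINE', 'PUGI__FN', 'PUGIXML_FUNCTION'] as const;

// Macro before the return type.
const beforeType = 'PUGI__FN void* default_allocate(size_t n);';
// Linkage macro between the return type and the name.
const betweenTypeAndName = 'std::string PUGIXML_FUNCTION as_utf8(const wchar_t* s);';

const blankedBefore = blankTokens(beforeType, PUGI_TOKENS);
const blankedBetween = blankTokens(betweenTypeAndName, PUGI_TOKENS);

console.log(blankedBefore.trimStart()); // void* default_allocate(size_t n);
console.log(blankedBetween.length === betweenTypeAndName.length); // true
console.log(blankedBetween.indexOf('as_utf8') === betweenTypeAndName.indexOf('as_utf8')); // true
```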

The list now drives a single generated alternation (longest token first), so
adding a codebase's macro is a one-line change. Matching is still limited to
curated exact tokens in specifier position, so a real all-caps return type like
`HRESULT DoIt()` is never touched (verified by control tests).
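
This sketch shows how one curated list can drive the generated alternation. It assumes the regex shape used in this commit. `buildInlineMacroRe` and `blankInline` are illustrative names, and the token subset comes from the list above. The last three calls mirror the control cases: an unlisted all-caps return type, an expression use, and a longer word that starts with a listed token are all left unchanged.

```typescript
// Token names come from this commit; `buildInlineMacroRe` is illustrative.
const INLINE_MACROS = [
  'FORCEINLINE_DEBUGGABLE', 'FORCENOINLINE', 'FORCEINLINE',
  'BOOST_FORCEINLINE', 'BOOST_NOINLINE',
  'ALWAYS_INLINE', 'FORCE_INLINE', 'NOINLINE',
  // Covering a new codebase is one more entry here (e.g. a hypothetical 'MYLIB_INLINE').
] as const;

function buildInlineMacroRe(tokens: readonly string[]): RegExp {
  // Sort a copy, longest first, so a longer token is always tried before a
  // shorter one that shares its prefix.
  const alternation = [...tokens].sort((a, b) => b.length - a.length).join('|');
  // Exact token, word-bounded, and only when followed by whitespace + identifier.
  return new RegExp(`\\b(${alternation})\\b(?=\\s+[A-Za-z_])`, 'g');
}

const inlineRe = buildInlineMacroRe(INLINE_MACROS);
const blankInline = (s: string): string => s.replace(inlineRe, (m) => ' '.repeat(m.length));

// A listed macro in specifier position is blanked.
console.log(blankInline('BOOST_FORCEINLINE result_type call();').trimStart()); // result_type call();
// An all-caps return type that is not on the list is left alone.
console.log(blankInline('HRESULT DoIt();')); // HRESULT DoIt();
// So are expression uses (no identifier follows) and longer words.
console.log(blankInline('x = FORCEINLINE ? 1 : 0;')); // x = FORCEINLINE ? 1 : 0;
console.log(blankInline('FORCEINLINE_SOMETHINGELSE int f();')); // FORCEINLINE_SOMETHINGELSE int f();
```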

Validated on CARLA (a large UE project with 1131 C++ source and header files):
function-name mangles went from 440 to 16 (428 fixed). The 16 residual mangles
and 7 clean->mangled shifts are all in
third-party vendored files — chiefly pugixml.cpp, a 12k-line macro amalgamation
where error recovery is non-local, so blanking one of several *stacked* macros
(PUGI__FN + PUGI__UNSIGNED_OVERFLOW …) shifts an already-imperfect extraction.
Normal C++/UE code (ActionRoguelike, ALS) sees zero regressions — blanking a
macro there only helps. Chasing pugixml's internal attribute macros is left out
of scope. Seven regression tests added.
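
The stacked-macro caveat is easy to reproduce with the same kind of regex, assuming only the pugixml tokens from this commit are listed. Take a hypothetical declaration shaped like the stacking described above, `PUGI__FN PUGI__UNSIGNED_OVERFLOW unsigned int hash(...)`. Only the curated `PUGI__FN` is blanked. The uncurated attribute macro survives in front of the real return type, so the declaration still does not read as a plain function.

```typescript
// Hypothetical declaration; only the pugixml tokens from this commit are listed.
const PUGI_ONLY = ['PUGI__FN_NO_INLINE', 'PUGI__FN', 'PUGIXML_FUNCTION'] as const;
const pugiRe = new RegExp(
  `\\b(${[...PUGI_ONLY].sort((a, b) => b.length - a.length).join('|')})\\b(?=\\s+[A-Za-z_])`,
  'g'
);

const stacked = 'PUGI__FN PUGI__UNSIGNED_OVERFLOW unsigned int hash(const char* s);';
const afterBlank = stacked.replace(pugiRe, (m) => ' '.repeat(m.length));

// The listed macro is gone; the unlisted attribute macro still precedes the type.
console.log(afterBlank.trimStart()); // PUGI__UNSIGNED_OVERFLOW unsigned int hash(const char* s);
```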

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note third-party C++ inline macro recognition (#1101)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry committed 16 hours ago
Commit a164ceae8b
3 changed files with 48 additions and 16 deletions
  1. CHANGELOG.md (+1 -0)
  2. __tests__/extraction.test.ts (+13 -0)
  3. src/extraction/languages/c-cpp.ts (+34 -16)

+ 1 - 0
CHANGELOG.md

@@ -14,6 +14,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - C++ forward declarations no longer crowd out the real class definition. A `class Foo;` forward declaration — common in large C++ and Unreal Engine codebases, where a heavily used class is forward-declared across dozens of headers — was indexed as its own class node every time it appeared. So exploring that class returned mostly forward-declaration sites, and could even pick one of them as the representative for blast-radius, burying the actual definition and its members and callers. Bodiless forward declarations are now skipped for C and C++, exactly as forward-declared structs and enums already were, so only the real definition is indexed. Languages where a class with no body is a complete definition — such as Kotlin's `class Empty` and Scala — are unaffected. Thanks @luoyxy for the report and root-cause analysis. (#1093)
 - C++ methods that return a reference, and user-defined conversion operators, are now indexed under their correct names. An inline getter like `const FGameplayTagContainer& GetActiveTags() const` — everywhere in Unreal Engine headers — was indexed as `& GetActiveTags() const` instead of `GetActiveTags`, and a conversion operator like `operator EALSMovementState() const` kept its trailing `() const` instead of reading `operator EALSMovementState`. In both cases the garbled name meant you couldn't find the symbol by name and its callers weren't linked. Both now read cleanly, matching how pointer-returning and value-returning methods already worked. (#1096)
 - C++ functions written with an inline-specifier macro before the return type are now indexed correctly. In Unreal Engine, inline helpers are commonly written `FORCEINLINE FString GetEnumerationToString(...)`; the `FORCEINLINE` macro made the parser read the return type as part of the function's name (`FString GetEnumerationToString` instead of `GetEnumerationToString`) and lose the real return type, so the function couldn't be found by name and its callers weren't linked. CodeGraph now recognizes the standard Unreal inline macros (`FORCEINLINE`, `FORCENOINLINE`, `FORCEINLINE_DEBUGGABLE`), so both the name and the return type are captured. (#1100)
+- The same function-name recovery now covers inline macros from common third-party C++ libraries, not just Unreal Engine — including pugixml (`PUGI__FN`, `PUGIXML_FUNCTION`), Godot (`_FORCE_INLINE_`), Boost (`BOOST_FORCEINLINE`), and generic `ALWAYS_INLINE` / `FORCE_INLINE`. Functions decorated with these are now indexed under their real names. On a large Unreal project vendoring these libraries this cleaned up the large majority of remaining function-name garbling. (#1101)
 
 
 ## [1.1.6] - 2026-06-30

+ 13 - 0
__tests__/extraction.test.ts

@@ -2963,6 +2963,19 @@ class APXCharacter {  // the one real definition
       expect(infoOf('static FORCEINLINE const FString& GetRef(int V) { return H(V); }').map((x) => x.name)).toContain('GetRef');
     });
 
+    it('handles common third-party inline macros (pugixml, Godot, Boost, generic)', () => {
+      // pugixml: PUGI__FN before the return type; PUGIXML_FUNCTION (linkage)
+      // between the return type and the name — both recovered.
+      expect(infoOf('PUGI__FN void* default_allocate(size_t n) { return H(n); }').map((x) => x.name)).toContain('default_allocate');
+      expect(infoOf('PUGI__FN_NO_INLINE bool strequal(const char_t* a) { return H(a); }').map((x) => x.name)).toContain('strequal');
+      expect(infoOf('std::string PUGIXML_FUNCTION as_utf8(const wchar_t* s) { return H(s); }').map((x) => x.name)).toContain('as_utf8');
+      // Godot / Boost / generic inline hints
+      expect(infoOf('_FORCE_INLINE_ String get_name() const { return H(); }').map((x) => x.name)).toContain('get_name');
+      expect(infoOf('_ALWAYS_INLINE_ Vector2 get_pos() { return H(); }').map((x) => x.name)).toContain('get_pos');
+      expect(infoOf('BOOST_FORCEINLINE result_type call() { return H(); }').map((x) => x.name)).toContain('call');
+      expect(infoOf('ALWAYS_INLINE MyType compute() { return H(); }').map((x) => x.name)).toContain('compute');
+    });
+
     it('leaves ordinary functions and real all-caps return types untouched (controls)', () => {
       expect(infoOf('FString GetName(int V) { return H(V); }')).toEqual([{ name: 'GetName', ret: 'FString' }]);
       // A real all-caps type that is NOT a listed inline macro stays the return type.

+ 34 - 16
src/extraction/languages/c-cpp.ts

@@ -247,27 +247,45 @@ export function blankCppExportMacros(source: string): string {
  * becomes the return type and, for a non-primitive return, the return type gets
  * glued onto the name (`GetName` → `"FString GetName"`), so the function can't
  * be found by name and its callers don't link. This is pervasive in Unreal
- * Engine, where inline helpers are written `FORCEINLINE <ret> <name>(…)`.
- * Replacing the macro with equal-length spaces preserves every byte offset (so
- * line/column stay exact) and the declaration then parses as an ordinary
- * function — recovering the real name AND the return type — mirroring how
- * `blankCppExportMacros` recovers macro-annotated classes (#946/#1061).
+ * Engine (`FORCEINLINE <ret> <name>(…)`) and in vendored third-party libraries
+ * that define their own inline macro (pugixml's `PUGI__FN`, Godot's
+ * `_FORCE_INLINE_`, Boost's `BOOST_FORCEINLINE`, …). Replacing the macro with
+ * equal-length spaces preserves every byte offset (so line/column stay exact)
+ * and the declaration then parses as an ordinary function — recovering the real
+ * name AND the return type — mirroring how `blankCppExportMacros` recovers
+ * macro-annotated classes (#946/#1061).
  *
  * Matched tightly so it can't touch an ordinary identifier: only the exact,
- * well-known UE inline specifiers, and only in specifier position — immediately
- * followed by whitespace and the identifier that starts the return type or name.
- * That lookahead leaves value/expression uses (`x = FORCEINLINE ? …`), string
- * literals, and `FORCEINLINE_SOMETHINGELSE` (word-boundary) alone. To cover a
- * new codebase's inline macro, add its exact token here.
+ * curated inline-specifier tokens below (never an arbitrary all-caps token, so a
+ * real return type like `HRESULT DoIt()` is untouched), and only in specifier
+ * position — immediately followed by whitespace and the identifier that starts
+ * the return type or name. That lookahead leaves value/expression uses
+ * (`x = FORCEINLINE ? …`), string literals, and longer words
+ * (`FORCEINLINE_SOMETHINGELSE`, word-boundary) alone. To cover a new codebase's
+ * inline macro, add its exact token to the list.
  */
-const CPP_INLINE_MACROS = ['FORCEINLINE_DEBUGGABLE', 'FORCENOINLINE', 'FORCEINLINE'] as const;
+const CPP_INLINE_MACROS = [
+  // Unreal Engine
+  'FORCEINLINE_DEBUGGABLE', 'FORCENOINLINE', 'FORCEINLINE',
+  // pugixml (ubiquitous vendored XML parser): `#define PUGI__FN inline` before
+  // the return type, plus `PUGIXML_FUNCTION` (linkage macro) between the return
+  // type and the name — the blank mechanism handles both positions.
+  'PUGI__FN_NO_INLINE', 'PUGI__FN', 'PUGIXML_FUNCTION',
+  // Godot
+  '_ALWAYS_INLINE_', '_FORCE_INLINE_',
+  // Boost
+  'BOOST_FORCEINLINE', 'BOOST_NOINLINE',
+  // Common cross-ecosystem inline/attribute hints
+  'ALWAYS_INLINE', 'FORCE_INLINE', 'NOINLINE',
+] as const;
+// One alternation, longest token first so a longer macro wins over a prefix.
+const CPP_INLINE_MACRO_RE = new RegExp(
+  `\\b(${[...CPP_INLINE_MACROS].sort((a, b) => b.length - a.length).join('|')})\\b(?=\\s+[A-Za-z_])`,
+  'g'
+);
 export function blankCppInlineMacros(source: string): string {
   if (!CPP_INLINE_MACROS.some((m) => source.indexOf(m) !== -1)) return source;
-  return source.replace(
-    // `FORCEINLINE_DEBUGGABLE` before `FORCEINLINE` so the longer token wins.
-    /\b(FORCEINLINE_DEBUGGABLE|FORCENOINLINE|FORCEINLINE)\b(?=\s+[A-Za-z_])/g,
-    (_m, macro) => ' '.repeat(macro.length)
-  );
+  return source.replace(CPP_INLINE_MACRO_RE, (m) => ' '.repeat(m.length));
 }
 
 /** C/C++ source pre-processing before tree-sitter: recover both macro-annotated