fix(extraction): recover C++ function names prefixed by an inline-specifier macro (#1100)

* fix(extraction): recover C++ function names prefixed by an inline-specifier macro

An unknown inline-specifier macro before a function's return type
(`FORCEINLINE FString GetName(…)`) threw tree-sitter into error recovery: the
macro was read as the return type and — for a non-primitive return — the return
type was glued onto the name, so the function was indexed as
`"FString GetName"` instead of `GetName`, unfindable by name and with no caller
links. This is pervasive in Unreal Engine, where inline helpers are written
`FORCEINLINE <ret> <name>(…)` (e.g. ALS's `FORCEINLINE FString GetEnumerationToString`).

Add `blankCppInlineMacros`, a preParse that blanks the known UE inline macros
(`FORCEINLINE`, `FORCENOINLINE`, `FORCEINLINE_DEBUGGABLE`) with equal-length
spaces so byte offsets stay exact and the declaration parses as an ordinary
function — recovering both the real name AND the return type. This is the same
recover-don't-drop approach as blankCppExportMacros (#946/#1061), and the two
are composed into the cppExtractor preParse.
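
The offset-preserving property described above can be checked in isolation. The following is a minimal sketch, not the repository's code: `blankInlineMacros` is a hypothetical stand-in that reuses the regex shown in the diff, and the sample source string is made up for illustration.

```typescript
// Hypothetical stand-in for blankCppInlineMacros; the regex is copied from the diff.
function blankInlineMacros(source: string): string {
  return source.replace(
    /\b(FORCEINLINE_DEBUGGABLE|FORCENOINLINE|FORCEINLINE)\b(?=\s+[A-Za-z_])/g,
    // Replace the macro with the same number of spaces so nothing shifts.
    (_match: string, macro: string) => ' '.repeat(macro.length)
  );
}

// Illustrative input: a macro-prefixed method inside a struct.
const original = 'struct S {\n  FORCEINLINE FString GetName(int V) { return H(V); }\n};';
const blanked = blankInlineMacros(original);

// The macro is gone, but total length, line count, and the offset of the
// function name are all identical to the original source.
const macroRemoved = !blanked.includes('FORCEINLINE');
const sameLength = blanked.length === original.length;
const sameLineCount = blanked.split('\n').length === original.split('\n').length;
const sameNameOffset = blanked.indexOf('GetName') === original.indexOf('GetName');

console.log({ macroRemoved, sameLength, sameLineCount, sameNameOffset });
```

Because the macro tokens are pure ASCII, equal character length also means equal byte length, which is why positions reported by the parser on the blanked text still point at the right place in the original file.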

Matched tightly (exact known tokens, only in specifier position — followed by
the identifier that starts the return type/name), so ordinary identifiers, real
all-caps return types (`HRESULT DoIt()`), string literals, expression uses, and
longer words (`FORCEINLINE_COUNT`) are untouched — verified by controls. C++-only;
Kotlin/Scala re-index byte-for-byte identical. Five regression tests added.
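
The tight matching can be illustrated with a small generalization. This is a sketch under assumptions: `makeMacroBlanker` and the `MY_INLINE` token are hypothetical and do not exist in the commit, which hard-codes its three tokens; only the shape of the pattern (word boundaries plus a whitespace-then-identifier lookahead) is taken from the diff.

```typescript
// Hypothetical helper: builds a blanker from a list of exact macro tokens.
function makeMacroBlanker(tokens: readonly string[]): (source: string) => string {
  for (const t of tokens) {
    // Only plain identifiers are accepted, so no regex escaping is needed.
    if (!/^[A-Za-z_]\w*$/.test(t)) throw new Error(`not an identifier: ${t}`);
  }
  // Longest token first, mirroring the ordering used in the diff.
  const alternation = [...tokens].sort((a, b) => b.length - a.length).join('|');
  // Same shape as the diff's pattern: exact token, then whitespace and the
  // identifier that starts the return type or name.
  const pattern = new RegExp(`\\b(${alternation})\\b(?=\\s+[A-Za-z_])`, 'g');
  return (source) => source.replace(pattern, (m: string) => ' '.repeat(m.length));
}

// The three tokens from the commit plus one made-up project macro.
const blank = makeMacroBlanker(['FORCEINLINE', 'FORCENOINLINE', 'FORCEINLINE_DEBUGGABLE', 'MY_INLINE']);

const cases: Array<[string, string]> = [
  // Specifier position: blanked with equal-length spaces.
  ['FORCEINLINE FString F()', ' '.repeat(11) + ' FString F()'],
  ['MY_INLINE int G()', ' '.repeat(9) + ' int G()'],
  // Controls: none of these are followed by whitespace plus an identifier.
  ['HRESULT DoIt()', 'HRESULT DoIt()'],
  ['const char* s = "FORCEINLINE";', 'const char* s = "FORCEINLINE";'],
  ['x = FORCEINLINE + 1;', 'x = FORCEINLINE + 1;'],
  ['int FORCEINLINE_COUNT = 3;', 'int FORCEINLINE_COUNT = 3;'],
];

const results = cases.map(([input, expected]) => blank(input) === expected);
console.log(results.every(Boolean));
```

The lookahead is purely lexical. In this sketch a listed macro that happens to be followed by an identifier inside a comment or a multi-word string literal would also be blanked, since the pattern does not track literal context; the replacement is still length-preserving in that case.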

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note C++ inline-specifier-macro function name fix (#1100)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 16 hours ago
parent
commit
9b2ce1c8f6
3 changed files with 97 additions and 4 deletions
  1. CHANGELOG.md (+1 -0)
  2. __tests__/extraction.test.ts (+55 -1)
  3. src/extraction/languages/c-cpp.ts (+41 -3)

+ 1 - 0
CHANGELOG.md

@@ -13,6 +13,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 - C++ forward declarations no longer crowd out the real class definition. A `class Foo;` forward declaration — common in large C++ and Unreal Engine codebases, where a heavily used class is forward-declared across dozens of headers — was indexed as its own class node every time it appeared. So exploring that class returned mostly forward-declaration sites, and could even pick one of them as the representative for blast-radius, burying the actual definition and its members and callers. Bodiless forward declarations are now skipped for C and C++, exactly as forward-declared structs and enums already were, so only the real definition is indexed. Languages where a class with no body is a complete definition — such as Kotlin's `class Empty` and Scala — are unaffected. Thanks @luoyxy for the report and root-cause analysis. (#1093)
 - C++ methods that return a reference, and user-defined conversion operators, are now indexed under their correct names. An inline getter like `const FGameplayTagContainer& GetActiveTags() const` — everywhere in Unreal Engine headers — was indexed as `& GetActiveTags() const` instead of `GetActiveTags`, and a conversion operator like `operator EALSMovementState() const` kept its trailing `() const` instead of reading `operator EALSMovementState`. In both cases the garbled name meant you couldn't find the symbol by name and its callers weren't linked. Both now read cleanly, matching how pointer-returning and value-returning methods already worked. (#1096)
+- C++ functions written with an inline-specifier macro before the return type are now indexed correctly. In Unreal Engine, inline helpers are commonly written `FORCEINLINE FString GetEnumerationToString(...)`; the `FORCEINLINE` macro made the parser read the return type as part of the function's name (`FString GetEnumerationToString` instead of `GetEnumerationToString`) and lose the real return type, so the function couldn't be found by name and its callers weren't linked. CodeGraph now recognizes the standard Unreal inline macros (`FORCEINLINE`, `FORCENOINLINE`, `FORCEINLINE_DEBUGGABLE`), so both the name and the return type are captured. (#1100)
 
 
 ## [1.1.6] - 2026-06-30

+ 55 - 1
__tests__/extraction.test.ts

@@ -11,7 +11,7 @@ import * as os from 'os';
 import { CodeGraph } from '../src';
 import { extractFromSource, scanDirectory, buildDefaultIgnore, discoverEmbeddedRepoRoots, buildScopeIgnore } from '../src/extraction';
 import { detectLanguage, isLanguageSupported, getSupportedLanguages, initGrammars, loadAllGrammars, isSourceFile } from '../src/extraction/grammars';
-import { stripCppTemplateArgs, blankCppExportMacros } from '../src/extraction/languages/c-cpp';
+import { stripCppTemplateArgs, blankCppExportMacros, blankCppInlineMacros } from '../src/extraction/languages/c-cpp';
 import { normalizePath } from '../src/utils';
 
 beforeAll(async () => {
@@ -2928,6 +2928,60 @@ class APXCharacter {  // the one real definition
     });
   });
 
+  describe('C++ macro-prefixed function names (#1093 follow-up)', () => {
+    // An unknown inline-specifier macro before the return type
+    // (`FORCEINLINE FString GetName(…)`) threw tree-sitter into error recovery:
+    // the macro became the return type and — for a non-primitive return — the
+    // return type was glued onto the name (`"FString GetName"`), so the function
+    // was unfindable by name and its callers didn't link. `blankCppInlineMacros`
+    // blanks the known UE inline macros before parsing (offset-preserving), the
+    // same recover-don't-drop approach as the macro-annotated-class fix. Pervasive
+    // in Unreal Engine (`FORCEINLINE`).
+    const infoOf = (code: string) =>
+      extractFromSource('m.cpp', code).nodes
+        .filter((n) => n.kind === 'method' || n.kind === 'function')
+        .map((n) => ({ name: n.name, ret: n.returnType }));
+
+    it('recovers the real name AND return type of a FORCEINLINE function', () => {
+      expect(infoOf('static FORCEINLINE FString GetName(int V) { return H(V); }')).toEqual([
+        { name: 'GetName', ret: 'FString' },
+      ]);
+    });
+
+    it('handles the templated UE helper shape (GetEnumerationToString)', () => {
+      const names = infoOf(
+        'template <typename E> static FORCEINLINE FString GetEnumerationToString(const E V) { return H(V); }'
+      ).map((x) => x.name);
+      expect(names).toContain('GetEnumerationToString');
+    });
+
+    it('handles FORCENOINLINE / FORCEINLINE_DEBUGGABLE, methods, void, and reference returns', () => {
+      expect(infoOf('FORCENOINLINE FString A(int V){return H(V);}').map((x) => x.name)).toContain('A');
+      expect(infoOf('FORCEINLINE_DEBUGGABLE FString B(int V){return H(V);}').map((x) => x.name)).toContain('B');
+      expect(infoOf('struct S { FORCEINLINE FString GetName(int V) { return H(V); } };').map((x) => x.name)).toContain('GetName');
+      expect(infoOf('static FORCEINLINE void DoThing(int V) { H(V); }').map((x) => x.name)).toContain('DoThing');
+      expect(infoOf('static FORCEINLINE const FString& GetRef(int V) { return H(V); }').map((x) => x.name)).toContain('GetRef');
+    });
+
+    it('leaves ordinary functions and real all-caps return types untouched (controls)', () => {
+      expect(infoOf('FString GetName(int V) { return H(V); }')).toEqual([{ name: 'GetName', ret: 'FString' }]);
+      // A real all-caps type that is NOT a listed inline macro stays the return type.
+      expect(infoOf('HRESULT DoIt(int V) { return H(V); }')).toEqual([{ name: 'DoIt', ret: 'HRESULT' }]);
+    });
+
+    it('blankCppInlineMacros preserves offsets and only touches specifier-position macros', () => {
+      // Blanked with equal-length spaces (byte offsets preserved).
+      expect(blankCppInlineMacros('FORCEINLINE FString F()')).toBe('            FString F()');
+      expect(blankCppInlineMacros('FORCEINLINE FString F()')).toHaveLength('FORCEINLINE FString F()'.length);
+      // Not in specifier position → untouched: string literals, expressions,
+      // longer word (`FORCEINLINE_COUNT`), and the fast path.
+      expect(blankCppInlineMacros('const char* s = "FORCEINLINE";')).toBe('const char* s = "FORCEINLINE";');
+      expect(blankCppInlineMacros('x = FORCEINLINE + 1;')).toBe('x = FORCEINLINE + 1;');
+      expect(blankCppInlineMacros('int FORCEINLINE_COUNT = 3;')).toBe('int FORCEINLINE_COUNT = 3;');
+      expect(blankCppInlineMacros('no macros here')).toBe('no macros here');
+    });
+  });
+
   describe('C++ templated base-class inheritance (#1043)', () => {
     // Inheriting from a template (`class D : public Base<int>`) recorded the base
     // ref as the full instantiation `Base<int>`, which never name-matched the

+ 41 - 3
src/extraction/languages/c-cpp.ts

@@ -239,10 +239,48 @@ export function blankCppExportMacros(source: string): string {
   );
 }
 
+/**
+ * Blank a known inline-specifier macro sitting in front of a function's return
+ * type (`FORCEINLINE FString GetName(…)`), before parsing. Not knowing the
+ * macro, tree-sitter can't reconcile `MACRO <return-type> <name>(` — an extra
+ * type-like token before the name — and drops into error recovery: the macro
+ * becomes the return type and, for a non-primitive return, the return type gets
+ * glued onto the name (`GetName` → `"FString GetName"`), so the function can't
+ * be found by name and its callers don't link. This is pervasive in Unreal
+ * Engine, where inline helpers are written `FORCEINLINE <ret> <name>(…)`.
+ * Replacing the macro with equal-length spaces preserves every byte offset (so
+ * line/column stay exact) and the declaration then parses as an ordinary
+ * function — recovering the real name AND the return type — mirroring how
+ * `blankCppExportMacros` recovers macro-annotated classes (#946/#1061).
+ *
+ * Matched tightly so it can't touch an ordinary identifier: only the exact,
+ * well-known UE inline specifiers, and only in specifier position — immediately
+ * followed by whitespace and the identifier that starts the return type or name.
+ * That lookahead leaves value/expression uses (`x = FORCEINLINE ? …`), string
+ * literals, and `FORCEINLINE_SOMETHINGELSE` (word-boundary) alone. To cover a
+ * new codebase's inline macro, add its exact token here.
+ */
+const CPP_INLINE_MACROS = ['FORCEINLINE_DEBUGGABLE', 'FORCENOINLINE', 'FORCEINLINE'] as const;
+export function blankCppInlineMacros(source: string): string {
+  if (!CPP_INLINE_MACROS.some((m) => source.indexOf(m) !== -1)) return source;
+  return source.replace(
+    // `FORCEINLINE_DEBUGGABLE` before `FORCEINLINE` so the longer token wins.
+    /\b(FORCEINLINE_DEBUGGABLE|FORCENOINLINE|FORCEINLINE)\b(?=\s+[A-Za-z_])/g,
+    (_m, macro) => ' '.repeat(macro.length)
+  );
+}
+
+/** C/C++ source pre-processing before tree-sitter: recover both macro-annotated
+ * class definitions and macro-prefixed function definitions. Offset-preserving. */
+function preParseCppSource(source: string): string {
+  return blankCppInlineMacros(blankCppExportMacros(source));
+}
+
 export const cppExtractor: LanguageExtractor = {
-  // Recover macro-annotated class/struct definitions (`class MYMODULE_API Foo : Base`)
-  // that tree-sitter otherwise misparses into a phantom function (#1061/#946).
-  preParse: blankCppExportMacros,
+  // Recover macro-annotated class/struct definitions (`class MYMODULE_API Foo : Base`,
+  // #1061/#946) and macro-prefixed functions (`FORCEINLINE FString Foo()`, #1093
+  // follow-up) that tree-sitter otherwise misparses.
+  preParse: preParseCppSource,
   functionTypes: ['function_definition'],
   classTypes: ['class_specifier'],
   // A bodiless `class_specifier` is a forward declaration (`class Foo;`) or an