
fix(extraction): correct C++ reference-return and conversion-operator method names (#1096)

* fix(extraction): correct C++ reference-return and conversion-operator method names

Two pre-existing C++ name-extraction bugs surfaced while validating the #1093
forward-declaration fix against real Unreal Engine repos (ActionRoguelike, ALS):

1. Inline methods/functions returning a reference were named after the whole
   declarator. `const int& getRef() const {…}` parses with a reference_declarator
   wrapping the function_declarator; extractName unwrapped pointer_declarator but
   not reference_declarator, so the method was named "& getRef() const" instead
   of "getRef" — polluting search and breaking caller linkage. Ubiquitous in UE
   headers (`const FGameplayTagContainer& GetActiveTags() const`). Now the
   reference wrapper is unwrapped alongside the pointer wrapper.

2. User-defined conversion operators were named with their full declarator —
   `operator EALSMovementState() const` — instead of `operator EALSMovementState`,
   so they didn't match the symbolic-overload style (`operator+`) and carried
   `() const` noise. The operator_cast declarator is now named `operator <type>`.

Both are additive and C++-scoped (reference_declarator / operator_cast are C++
grammar nodes). Pointer, value, and out-of-line reference returns, and symbolic
operator overloads, are unchanged. Six regression tests added.
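
As a rough illustration of the two naming rules above, here is a minimal TypeScript sketch over hand-built mock nodes. `MockNode`, `field`, `firstChild`, and `resolveName` are hypothetical names used only for this example and are not the project's API; the real change lives in `extractName` in src/extraction/tree-sitter.ts. The node shapes are assumptions modelled on the description above, not actual parser output.

```typescript
// Hypothetical, simplified stand-in for a tree-sitter syntax node.
interface MockNode {
  type: string;
  text: string;
  fields?: Record<string, MockNode>; // named fields, e.g. `declarator`
  children?: MockNode[];             // named children in source order
}

const field = (n: MockNode, name: string): MockNode | undefined => n.fields?.[name];
const firstChild = (n: MockNode): MockNode | undefined => n.children?.[0];

// Sketch of the name resolution described in the commit message.
function resolveName(declarator: MockNode): string {
  let resolved = declarator;
  // Peel pointer and reference wrappers. A pointer wrapper exposes its inner
  // node through a `declarator` field; a reference wrapper is assumed to have
  // no such field, so the first named child is used instead.
  while (resolved.type === 'pointer_declarator' || resolved.type === 'reference_declarator') {
    const inner = field(resolved, 'declarator') ?? firstChild(resolved);
    if (!inner) break;
    resolved = inner;
  }
  // Conversion operator: name it `operator <type>` and drop the `() const` tail.
  if (resolved.type === 'operator_cast') {
    const typeNode = firstChild(resolved);
    return typeNode ? `operator ${typeNode.text.trim()}` : resolved.text;
  }
  // Function declarator: take the inner identifier (simplified assumption).
  if (resolved.type === 'function_declarator') {
    const inner = field(resolved, 'declarator') ?? firstChild(resolved);
    return inner ? inner.text : resolved.text;
  }
  return resolved.text;
}

// `const int& getRef() const {…}`: reference wrapper around the function declarator.
const getRef: MockNode = {
  type: 'reference_declarator',
  text: '& getRef() const',
  children: [{
    type: 'function_declarator',
    text: 'getRef() const',
    fields: { declarator: { type: 'field_identifier', text: 'getRef' } },
  }],
};

// `operator EALSMovementState() const {…}`: target type first, then the tail.
const conversion: MockNode = {
  type: 'operator_cast',
  text: 'operator EALSMovementState() const',
  children: [
    { type: 'type_identifier', text: 'EALSMovementState' },
    { type: 'abstract_function_declarator', text: '() const' },
  ],
};

console.log(resolveName(getRef));     // getRef
console.log(resolveName(conversion)); // operator EALSMovementState
```

Without the `reference_declarator` arm of the loop, the first call would fall through to the last `return` and yield the whole `& getRef() const` text, which is the bug described in item 1.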

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note C++ reference-return and conversion-operator name fixes (#1096)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 18 hours ago
parent
commit
712a406726
3 changed files with 83 additions and 2 deletions
  1. CHANGELOG.md (+1 -0)
  2. __tests__/extraction.test.ts (+64 -0)
  3. src/extraction/tree-sitter.ts (+18 -2)

+ 1 - 0
CHANGELOG.md

@@ -12,6 +12,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ### Fixes
 
 - C++ forward declarations no longer crowd out the real class definition. A `class Foo;` forward declaration — common in large C++ and Unreal Engine codebases, where a heavily used class is forward-declared across dozens of headers — was indexed as its own class node every time it appeared. So exploring that class returned mostly forward-declaration sites, and could even pick one of them as the representative for blast-radius, burying the actual definition and its members and callers. Bodiless forward declarations are now skipped for C and C++, exactly as forward-declared structs and enums already were, so only the real definition is indexed. Languages where a class with no body is a complete definition — such as Kotlin's `class Empty` and Scala — are unaffected. Thanks @luoyxy for the report and root-cause analysis. (#1093)
+- C++ methods that return a reference, and user-defined conversion operators, are now indexed under their correct names. An inline getter like `const FGameplayTagContainer& GetActiveTags() const` — everywhere in Unreal Engine headers — was indexed as `& GetActiveTags() const` instead of `GetActiveTags`, and a conversion operator like `operator EALSMovementState() const` kept its trailing `() const` instead of reading `operator EALSMovementState`. In both cases the garbled name meant you couldn't find the symbol by name and its callers weren't linked. Both now read cleanly, matching how pointer-returning and value-returning methods already worked. (#1096)
 
 
 ## [1.1.6] - 2026-06-30

+ 64 - 0
__tests__/extraction.test.ts

@@ -2864,6 +2864,70 @@ class APXCharacter {  // the one real definition
     });
   });
 
+  describe('C++ reference-return method/function names (#1093 follow-up)', () => {
+    // An inline method/function returning a reference parses with a
+    // `reference_declarator` wrapping the `function_declarator`. That wrapper
+    // wasn't unwrapped (only `pointer_declarator` was), so the name captured the
+    // whole declarator — `const int& getRef() const {…}` became the method named
+    // "& getRef() const" instead of "getRef", polluting search and callers. Very
+    // common in Unreal Engine headers (`const FGameplayTagContainer& GetActiveTags() const`).
+    const namesOf = (code: string) =>
+      extractFromSource('r.cpp', code).nodes
+        .filter((n) => n.kind === 'method' || n.kind === 'function')
+        .map((n) => n.name);
+
+    it('names an inline reference-returning method by its identifier, not the declarator', () => {
+      const names = namesOf('class C {\npublic:\n  const int& getRef() const { return x; }\n  int& mutRef() { return x; }\n  int x;\n};');
+      expect(names).toContain('getRef');
+      expect(names).toContain('mutRef');
+      // No name leaks the reference sigil or the parameter/qualifier tail.
+      expect(names.some((n) => /[&()]/.test(n))).toBe(false);
+    });
+
+    it('handles rvalue-reference returns and reference-returning free functions', () => {
+      expect(namesOf('class C { int&& take() { return 1; } };')).toContain('take');
+      expect(namesOf('const int& globalRef() { static int x; return x; }')).toContain('globalRef');
+    });
+
+    it('leaves pointer, value, and out-of-line reference returns unchanged (controls)', () => {
+      expect(namesOf('class C { int* getPtr() { return &x; } int x; };')).toContain('getPtr');
+      expect(namesOf('class C { int getVal() const { return x; } int x; };')).toContain('getVal');
+      // Out-of-line `T& C::f()` already resolves via the qualified-name hook.
+      expect(namesOf('const int& C::getRef() const { return x; }')).toContain('getRef');
+    });
+  });
+
+  describe('C++ user-defined conversion operator names (#1093 follow-up)', () => {
+    // A conversion operator's declarator is an `operator_cast` (target type +
+    // `() const` tail). It was named with the whole declarator —
+    // `operator EALSMovementState() const` — so it didn't match the symbolic-
+    // overload style (`operator+`) and carried parameter noise. It's now named
+    // `operator <type>`. Common in Unreal Engine enum-wrapper structs.
+    const namesOf = (code: string) =>
+      extractFromSource('o.cpp', code).nodes
+        .filter((n) => n.kind === 'method' || n.kind === 'function')
+        .map((n) => n.name);
+
+    it('names a conversion operator as "operator <type>", not the full declarator', () => {
+      const names = namesOf('struct S {\n  operator int() const { return 1; }\n  operator bool() { return true; }\n  int x;\n};');
+      expect(names).toContain('operator int');
+      expect(names).toContain('operator bool');
+      expect(names.some((n) => n.includes('(') || n.includes('const'))).toBe(false);
+    });
+
+    it('handles a user-type conversion operator', () => {
+      expect(
+        namesOf('struct FALSMovementState {\n  operator EALSMovementState() const { return State; }\n  EALSMovementState State;\n};')
+      ).toContain('operator EALSMovementState');
+    });
+
+    it('leaves symbolic operator overloads unchanged (control)', () => {
+      const names = namesOf('struct S {\n  S operator+(const S& o) const { return o; }\n  int& operator[](int i) { return x; }\n  int x;\n};');
+      expect(names).toContain('operator+');
+      expect(names).toContain('operator[]'); // reference-returning subscript, name still clean
+    });
+  });
+
   describe('C++ templated base-class inheritance (#1043)', () => {
     // Inheriting from a template (`class D : public Base<int>`) recorded the base
     // ref as the full instantiation `Base<int>`, which never name-matched the

+ 18 - 2
src/extraction/tree-sitter.ts

@@ -69,13 +69,29 @@ function extractName(node: SyntaxNode, source: string, extractor: LanguageExtrac
   // Try field name first
   const nameNode = getChildByField(node, extractor.nameField);
   if (nameNode) {
-    // Unwrap pointer_declarator(s) for C/C++ pointer return types
+    // Unwrap pointer_declarator / reference_declarator for C/C++ pointer and
+    // reference return types (`int* f()`, `int& f()`, `int&& f()`). Without
+    // unwrapping the reference wrapper an inline reference-returning method is
+    // named "& f() const" instead of "f" — common in Unreal Engine gameplay
+    // headers (`const FGameplayTagContainer& GetActiveTags() const`). Out-of-line
+    // defs (`T& C::f()`) already resolve via the qualified-name hook. A
+    // pointer_declarator exposes its inner through a `declarator` field; a
+    // reference_declarator has none, so it's reached via namedChild(0).
     let resolved = nameNode;
-    while (resolved.type === 'pointer_declarator') {
+    while (resolved.type === 'pointer_declarator' || resolved.type === 'reference_declarator') {
       const inner = getChildByField(resolved, 'declarator') || resolved.namedChild(0);
       if (!inner) break;
       resolved = inner;
     }
+    // C++ user-defined conversion operator: the declarator is an `operator_cast`
+    // whose first child is the target type and second is the `() const` tail. Name
+    // it `operator <type>` (the conventional spelling) rather than the whole
+    // `operator EALSMovementState() const` declarator, so it matches symbolic
+    // overloads (`operator+`) and is findable by the type name.
+    if (resolved.type === 'operator_cast') {
+      const typeNode = resolved.namedChild(0);
+      return typeNode ? `operator ${getNodeText(typeNode, source).trim()}` : getNodeText(resolved, source);
+    }
     // Handle complex declarators (C/C++)
     if (resolved.type === 'function_declarator' || resolved.type === 'declarator') {
       const innerName = getChildByField(resolved, 'declarator') || resolved.namedChild(0);