
fix(extraction): record instantiates for C++ stack/brace construction (#1035) (#1049)

`instantiates` edges came only from heap `new Calculator(0)` (a
new_expression) and copy-init `Calculator c = Calculator(0)` (a
call_expression). Stack direct-init `Calculator calc(0)` and brace-init
`Widget w{1, 2}` parse as a `declaration` whose constructor arguments hang
directly off the declarator as an argument_list / initializer_list — there
is no call/new node — so the function-body walker saw no constructor
invocation and emitted no edge. A function that built objects with the
ordinary stack syntax looked like it didn't construct them, and the
dependency was missing from impact / callers.

In the body walker, a C++ `declaration` that is a stack/brace construction
now reuses extractInstantiation (a declaration's `type` field IS the
constructed class name, and extractInstantiation already strips template
args / namespace and emits the `instantiates` ref). Gated by
isCppStackConstruction, which requires BOTH a class-like type
(type_identifier / template_type / qualified_identifier — so `int x(0)`
and `auto z = …` are excluded) AND a declarator carrying args
(argument_list / initializer_list — so default `Calculator c;` and the
most-vexing-parse `Calculator c();` are excluded). The edge targets the
class node, not the same-named constructor method.
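
As a rough illustration of the two-signal gate and the name stripping described above, here is a minimal TypeScript sketch over a hypothetical `MiniNode` shape. It is not the real tree-sitter `SyntaxNode` API and not the code in src/extraction/tree-sitter.ts: `MiniNode`, `isStackConstruction`, `bareClassName`, `instantiatesRef`, and the hand-built nodes (including the `primitive_type` kind assumed for `int`) exist only for this example.

```typescript
// Hypothetical, simplified node model; NOT the real tree-sitter SyntaxNode API.
interface MiniNode {
  type: string;          // grammar node kind, e.g. 'declaration'
  text?: string;         // source text (only set on type nodes here)
  typeField?: MiniNode;  // stands in for the declaration's `type` field
  value?: MiniNode;      // stands in for an init_declarator's `value` field
  children?: MiniNode[]; // named children
}

const CLASS_LIKE = new Set(['type_identifier', 'template_type', 'qualified_identifier']);
const CTOR_ARGS = new Set(['argument_list', 'initializer_list']);

// Both signals are required: a class-like type AND a declarator carrying args.
function isStackConstruction(decl: MiniNode): boolean {
  const typeNode = decl.typeField;
  if (!typeNode || !CLASS_LIKE.has(typeNode.type)) return false;
  return (decl.children ?? []).some(
    (c) => c.type === 'init_declarator' && c.value !== undefined && CTOR_ARGS.has(c.value.type),
  );
}

// Illustrative string-based stripping: `std::vector<int>` -> `vector`.
// Deliberately naive: it cuts at the first `<`, so `A<int>::B` is not handled.
function bareClassName(typeText: string): string {
  const withoutTemplateArgs = typeText.replace(/<[\s\S]*$/, '').trim();
  return withoutTemplateArgs.split('::').pop() ?? '';
}

// Returns the class name an `instantiates` ref would carry, or null if gated out.
function instantiatesRef(decl: MiniNode): string | null {
  if (decl.type !== 'declaration' || !isStackConstruction(decl)) return null;
  return bareClassName(decl.typeField?.text ?? '');
}

// Helpers to hand-build the assumed parse shapes.
const declaration = (typeKind: string, typeText: string, declarator: MiniNode): MiniNode => ({
  type: 'declaration',
  typeField: { type: typeKind, text: typeText },
  children: [declarator],
});
const withArgs = (valueKind: string): MiniNode => ({
  type: 'init_declarator',
  value: { type: valueKind },
});

const directInit = declaration('type_identifier', 'Calculator', withArgs('argument_list')); // Calculator calc(0);
const templated = declaration('qualified_identifier', 'std::vector<int>', withArgs('argument_list')); // std::vector<int> v(10);
const primitive = declaration('primitive_type', 'int', withArgs('argument_list')); // int x(5);
const defaultCtor = declaration('type_identifier', 'Calculator', { type: 'identifier' }); // Calculator c;
const vexingParse = declaration('type_identifier', 'Calculator', { type: 'function_declarator' }); // Calculator c();

const results = [directInit, templated, primitive, defaultCtor, vexingParse].map(instantiatesRef);
// results is ['Calculator', 'vector', null, null, null]
```

In this sketch only the first two shapes pass the gate. The primitive fails the type check, while default construction and the most-vexing parse fail the declarator check, which mirrors the exclusions listed above.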

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry, 11 hours ago
parent
commit
2176a7a439
4 files changed, with 128 additions and 0 deletions
  1. CHANGELOG.md (+1 -0)
  2. __tests__/extraction.test.ts (+36 -0)
  3. __tests__/resolution.test.ts (+37 -0)
  4. src/extraction/tree-sitter.ts (+54 -0)

+ 1 - 0
CHANGELOG.md

@@ -15,6 +15,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - CodeGraph no longer shows a misleading "different git working tree" warning when you work inside a submodule (or other nested repo) of a workspace you indexed at its root. Because indexing a workspace now pulls in its submodules and embedded clones, a query run from inside one correctly resolves up to the workspace's single index — but it was still warning that the results came from "a different working tree" and suggesting you run `codegraph init -i`, which would have split the submodule back out into its own separate index and undone the unified view. CodeGraph now recognizes that the nested repo's code is already part of the workspace index and stays quiet. The warning still appears for a genuine git worktree — a second checkout of the *same* repository on another branch, which really does have its own uncommitted symbols — since that's the case it exists for. (#1031, #1033)
 - On Windows, CodeGraph's background server now shuts down cleanly instead of occasionally aborting with a crash error. When the indexed project contained a nested repository (a submodule or embedded clone), stopping the server could race the file watcher's teardown and exit with a Windows crash code rather than a clean exit. Shutdown now lets that teardown finish first, so the server stops cleanly and promptly. (Windows only; other platforms were unaffected.) (#1033)
 - C++ classes that inherit from a templated base — `class Widget : public Base<int>`, a CRTP base like `class App : public CRTPBase<App>`, or a struct inheriting a template — are now linked to that base class in the graph. Previously the template arguments (`<int>`) made the inheritance go unrecognized, so these classes looked like they inherited from nothing and impact/callers analysis stopped at the boundary; the connection is now followed like any other base class. Thanks @ryancu7 for the report. (#1043)
+- C++ objects constructed on the stack — `Calculator calc(0)` or `Widget w{1, 2}` — now record that the enclosing function instantiates that class, the same as heap construction (`new Calculator(0)`) already did. Previously only the `new` form was tracked, so a function that built objects with the ordinary stack syntax looked like it didn't construct them and the dependency was missing from impact/callers. Thanks @Dshuishui for the report. (#1035)
 
 
 ## [1.1.2] - 2026-06-28

+ 36 - 0
__tests__/extraction.test.ts

@@ -2763,6 +2763,42 @@ class Both : public Base<char>, public Plain {};
     });
   });
 
+  describe('C++ stack-allocation construction (#1035)', () => {
+    // `Calculator calc(0)` (direct-init) and `Widget w{1, 2}` (brace-init) carry
+    // the constructor args directly on the declarator — no call/new node — so
+    // they emitted no constructor reference, unlike heap `new Calculator(0)`. An
+    // `instantiates` ref to the constructed type is now emitted for both.
+    const instNames = (code: string) =>
+      extractFromSource('f.cpp', `void run() {\n${code}\n}`)
+        .unresolvedReferences.filter((r) => r.referenceKind === 'instantiates')
+        .map((r) => r.referenceName);
+
+    it('emits an instantiates ref for direct-init and brace-init', () => {
+      expect(instNames('Calculator calc(0);')).toEqual(['Calculator']);
+      expect(instNames('Widget w{1, 2};')).toEqual(['Widget']);
+    });
+
+    it('strips template args and namespace to the bare class name', () => {
+      // `std::vector<int> v(10)` → `vector`; `ns::Widget w(0)` → `Widget`.
+      expect(instNames('std::vector<int> v(10);')).toEqual(['vector']);
+      expect(instNames('ns::Widget w(0);')).toEqual(['Widget']);
+    });
+
+    it('does not emit for primitives, default construction, or the most-vexing parse', () => {
+      expect(instNames('int x(5);')).toEqual([]); // primitive direct-init
+      expect(instNames('int y{6};')).toEqual([]); // primitive brace-init
+      expect(instNames('auto z = make();')).toEqual([]); // auto + call (handled elsewhere)
+      expect(instNames('Calculator deferred;')).toEqual([]); // default construction, no args
+      expect(instNames('Calculator calc();')).toEqual([]); // function declaration (most-vexing parse)
+    });
+
+    it('emits a single instantiates ref for a multi-declarator statement', () => {
+      // `Calculator a(1), b(2);` shares one `type` field; both construct a
+      // Calculator, so one ref suffices (it dedups to one edge regardless).
+      expect(instNames('Calculator a(1), b(2);')).toEqual(['Calculator']);
+    });
+  });
+
   describe('C/C++ imports', () => {
     it('should extract system include', () => {
       const code = `#include <iostream>`;

+ 37 - 0
__tests__/resolution.test.ts

@@ -930,6 +930,43 @@ def bootstrap():
       expect(callsToUserService).toHaveLength(0);
     });
 
+    it('records instantiates for C++ stack/brace construction, targeting the class (#1035)', async () => {
+      // `Calculator calc(0)` (direct-init) and `Widget w{1, 2}` (brace-init)
+      // carry the constructor args directly on the declarator — there's no
+      // call/new node — so they recorded no `instantiates` edge, while heap
+      // `new Calculator(0)` did. Both stack forms now do.
+      fs.writeFileSync(
+        path.join(tempDir, 'm.cpp'),
+        `class Calculator { public: Calculator(int seed) {} int add(int a, int b){ return a+b; } };
+class Widget { public: Widget(int a, int b) {} };
+
+int runStack(int a, int b) { Calculator calc(0); return calc.add(a, b); }
+int runBrace() { Widget w{1, 2}; return 0; }
+int runHeap(int a, int b) { Calculator* c = new Calculator(0); return c->add(a, b); }
+void noise() { int x(5); int y{6}; Calculator deferred; }
+`
+      );
+      cg = await CodeGraph.init(tempDir, { index: true });
+
+      const fn = (name: string) => cg.getNodesByKind('function').find((n) => n.name === name)!;
+      const instTargets = (name: string) =>
+        cg
+          .getOutgoingEdges(fn(name).id)
+          .filter((e) => e.kind === 'instantiates')
+          .map((e) => cg.getNode(e.target)!);
+
+      // Direct-init (the issue) and brace-init both instantiate, targeting the
+      // CLASS node — not the same-named constructor method.
+      const stack = instTargets('runStack');
+      expect(stack.map((n) => `${n.kind}:${n.name}`)).toContain('class:Calculator');
+      expect(instTargets('runBrace').map((n) => `${n.kind}:${n.name}`)).toContain('class:Widget');
+      // Heap still works (regression guard).
+      expect(instTargets('runHeap').map((n) => `${n.kind}:${n.name}`)).toContain('class:Calculator');
+      // Primitives (`int x(5)`/`int y{6}`) and bare default construction
+      // (`Calculator deferred;`) must NOT mint an instantiates edge.
+      expect(instTargets('noise')).toHaveLength(0);
+    });
+
     it('resolves a cross-file static method call to the method, not the class (#825)', async () => {
       // `Foo.bar()` where `Foo` is an imported class must link to the static
       // method `Foo::bar`, NOT to the class `Foo`. Previously the import

+ 54 - 0
src/extraction/tree-sitter.ts

@@ -3882,6 +3882,47 @@ export class TreeSitterExtractor {
     }
   }
 
+  /**
+   * Is this C++ `declaration` a stack/direct-initialization object construction
+   * that invokes a constructor — `Calculator calc(0)` (direct-init) or
+   * `Widget w{1, 2}` (brace-init) — as opposed to a plain variable or a
+   * function declaration? Used to emit an `instantiates` edge for the
+   * call-less construction syntax (#1035); heap `new T(...)` is handled
+   * separately by INSTANTIATION_KINDS.
+   *
+   * Two signals, both required:
+   *  - the `type` field is a class-like NAMED type (`type_identifier`,
+   *    `template_type`, or `qualified_identifier`). Primitives (`int x(0)`),
+   *    `auto` (`placeholder_type_specifier` — that form always carries a real
+   *    `call_expression`, already handled), and sized specifiers are excluded —
+   *    they construct no class; and
+   *  - a declarator carries constructor arguments: an `init_declarator` whose
+   *    `value` is an `argument_list` (`(args)`) or `initializer_list` (`{args}`).
+   *    This skips default construction `Calculator c;` (no value) and the
+   *    most-vexing-parse `Calculator c();` (a bodyless `function_declarator`,
+   *    a function decl — not a construction).
+   */
+  private isCppStackConstruction(node: SyntaxNode): boolean {
+    const typeNode = getChildByField(node, 'type');
+    if (
+      !typeNode ||
+      (typeNode.type !== 'type_identifier' &&
+        typeNode.type !== 'template_type' &&
+        typeNode.type !== 'qualified_identifier')
+    ) {
+      return false;
+    }
+    for (let i = 0; i < node.namedChildCount; i++) {
+      const child = node.namedChild(i);
+      if (child?.type !== 'init_declarator') continue;
+      const value = getChildByField(child, 'value');
+      if (value && (value.type === 'argument_list' || value.type === 'initializer_list')) {
+        return true;
+      }
+    }
+    return false;
+  }
+
   /**
    * Static-member / value-read pass. A type/enum/class used only via a member
    * VALUE — `Enum.value`, `Type.CONST`, `Colors.red`, `Foo::BAR` — recorded no
@@ -4259,6 +4300,19 @@ export class TreeSitterExtractor {
         }
       }
 
+      // C++ stack / direct-initialization construction — `Calculator calc(0)`
+      // and `Widget w{1, 2}`. Unlike heap `new Calculator(0)` (a new_expression
+      // handled above), these carry the constructor arguments directly on the
+      // declarator with NO call/new node, so the body walker saw no constructor
+      // invocation and recorded no `instantiates` edge (#1035). A declaration's
+      // `type` field IS the constructed class name, so reuse extractInstantiation
+      // (which strips template args / namespace and emits the `instantiates`
+      // ref). Children still recurse below, so a nested ctor-arg call
+      // (`Calculator calc(make())`) keeps its own `calls` ref.
+      if (nodeType === 'declaration' && this.language === 'cpp' && this.isCppStackConstruction(node)) {
+        this.extractInstantiation(node);
+      }
+
       // Static-member / value-read: `Enum.value`, `Type.CONST`, `Foo::BAR`.
       this.extractStaticMemberRef(node);