
feat(impact): C++ free-function name extraction — stop naming functions after a param/return type

The C++ qualified-name resolver ran a breadth-first search over the whole
declarator for a `qualified_identifier`, INCLUDING the parameter list and the
trailing return type. So a plain free function `std::string TableFileName(const
std::string& dbname)` was named `string` (from the `std::string` parameter), and
`auto to_string(...) -> std::string` was named `string` (from the trailing return
type). Calls to these functions never resolved, `codegraph_node` couldn't find
them, and the defining file looked like nothing depended on it. Most free
functions in a namespaced C++ library look like this, so it was a broad miss.

Skip `parameter_list` and `trailing_return_type` when locating the function's
own qualified name. A function whose own name is unqualified then correctly falls
back to the default declarator-name extraction (`TableFileName`, not `string`).
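The effect of the skip can be seen on a toy tree. This is a minimal, self-contained sketch, not the extractor itself: `ToyNode`, `leaf`, `branch`, `functionName`, and the identifier-child fallback are hypothetical stand-ins for web-tree-sitter nodes and the default declarator-name extraction, and the trees only approximate tree-sitter-cpp's node shapes. It runs the same breadth-first search with and without pruning `parameter_list` and `trailing_return_type`.

```typescript
// Hypothetical, simplified stand-in for a tree-sitter node (the real code uses
// web-tree-sitter's Node). A qualified_identifier is modeled as a leaf carrying
// its full text; real tree-sitter-cpp gives it scope/name children.
interface ToyNode {
  type: string;
  text?: string;
  children: ToyNode[];
}

const leaf = (type: string, text: string): ToyNode => ({ type, text, children: [] });
const branch = (type: string, ...children: ToyNode[]): ToyNode => ({ type, children });

// Node types whose subtrees never contain the function's own name.
const SKIP: ReadonlySet<string> = new Set(['parameter_list', 'trailing_return_type']);
const NO_SKIP: ReadonlySet<string> = new Set(); // models the old, unpruned search

// Breadth-first search for the first qualified_identifier, pruning skipped subtrees.
function findQualifiedId(declarator: ToyNode, skip: ReadonlySet<string>): ToyNode | undefined {
  const queue: ToyNode[] = [declarator];
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (current.type === 'qualified_identifier') return current;
    for (const child of current.children) {
      if (!skip.has(child.type)) queue.push(child);
    }
  }
  return undefined;
}

// Last segment of the qualified name if one was found; otherwise fall back to the
// declarator's plain identifier (assumed stand-in for the default extraction).
function functionName(declarator: ToyNode, skip: ReadonlySet<string>): string | undefined {
  const qid = findQualifiedId(declarator, skip);
  if (qid?.text) {
    const parts = qid.text.split('::').filter(Boolean);
    return parts[parts.length - 1];
  }
  return declarator.children.find((c) => c.type === 'identifier')?.text;
}

// std::string TableFileName(const std::string& dbname, int number)
const tableFileName = branch('function_declarator',
  leaf('identifier', 'TableFileName'),
  branch('parameter_list',
    branch('parameter_declaration',
      leaf('qualified_identifier', 'std::string'),
      branch('reference_declarator', leaf('identifier', 'dbname'))),
    branch('parameter_declaration',
      leaf('primitive_type', 'int'),
      leaf('identifier', 'number'))));

// auto BuildName(const std::string& a) -> std::string
const buildName = branch('function_declarator',
  leaf('identifier', 'BuildName'),
  branch('parameter_list',
    branch('parameter_declaration',
      leaf('qualified_identifier', 'std::string'),
      branch('reference_declarator', leaf('identifier', 'a')))),
  branch('trailing_return_type',
    branch('type_descriptor', leaf('qualified_identifier', 'std::string'))));

// void Foo::bar()
const fooBar = branch('function_declarator',
  leaf('qualified_identifier', 'Foo::bar'),
  branch('parameter_list'));

const cases = [['TableFileName', tableFileName], ['BuildName', buildName], ['Foo::bar', fooBar]] as const;
for (const [label, decl] of cases) {
  console.log(label, 'old:', functionName(decl, NO_SKIP), 'new:', functionName(decl, SKIP));
}
```

Run as-is, the unpruned search reports `string` for both free functions, while the pruned search reports `TableFileName` and `BuildName`. `Foo::bar` resolves to `bar` either way, because its qualified name sits directly under the declarator, which is why qualified methods are unaffected.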

Measured: leveldb 91.7% → 94.8% fair cross-file dependent coverage; fmt went from
32 functions mis-named after a bare type to 1; redis (C, unaffected because it
uses no qualified-name resolver) held at 92.2%, and its residual misses are
genuine frontiers (generated command tables, macro-reached code, function-pointer
command dispatch, example modules). Node count is stable (the functions were
already nodes, just mis-named). Full suite green; qualified methods (`Foo::bar`)
are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry, 2 weeks ago
commit ec8fe3f
3 changed files with 92 additions and 30 deletions
1. CHANGELOG.md (+1 -0)
2. __tests__/extraction.test.ts (+63 -0)
3. src/extraction/languages/c-cpp.ts (+28 -30)

CHANGELOG.md (+1 -0)

@@ -24,6 +24,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Kotlin Multiplatform `expect`/`actual` declarations are now connected. A platform implementation — `actual fun`, `actual class`, or an `actual typealias` in a `jvm` / `native` / `js` / `wasm` source set — is linked to the common `expect` declaration it fulfills (including the common case of an `expect class` fulfilled by an `actual typealias`). Previously a caller in common code resolved to the `expect` declaration, so every platform `actual` looked like it had no dependents and editing one showed an empty blast radius; now changing a platform implementation surfaces the common API and everything that uses it. (Kotlin)
 - Scala impact and `codegraph affected` now connect the type graph that typeclass-style code is built on. A parameterized supertype (`trait Monoid[A] extends Semigroup[A] with Serializable`) now links to each parent; a type used in a `val`/`def` signature, as a type argument, or as a context bound (`def f[A: Monoid]`) — including the trailing implicit parameter list (`(implicit M: Monoid[A])`) where typeclass instances are passed — now records a dependency; and `new T[...] { … }` counts as an instantiation. Previously Scala linked only plain calls and bare, non-generic supertypes, so a trait extended with type parameters, used as a type, or required as an implicit looked like nothing depended on it — which on a typeclass-heavy codebase (cats, algebra) was most of the graph. (Scala)
 - PHP impact and `codegraph affected` now understand namespaces and `use` imports. Classes are tracked by their namespaced name, so the many same-named classes a framework defines (Laravel has 7+ `Factory` interfaces, several `Dispatcher`s, across namespaces) are told apart instead of collapsing into one arbitrary match. A `use App\Contracts\Cache\Factory;` now records a dependency on exactly that class — so a contract or interface that's imported and constructor-injected (the dependency-injection pattern) is no longer reported as having no dependents — and parameter, property, and return type-hints are recorded too. Previously PHP ignored namespaces entirely and linked only calls, `new`, and inheritance. (PHP)
+- C++ free functions are now indexed under their real name. A function written with a qualified-type parameter (`std::string TableFileName(const std::string& dbname)`) or an `auto … -> std::string` trailing return type was mistakenly named after that type (`string`), so calls to it never resolved, `codegraph_node` couldn't find it by name, and the file defining it looked like nothing depended on it. The function now keeps its real name, so cross-file calls, callers, and blast radius work — a meaningful gain for any namespaced C++ codebase (this is how most free functions in a library look). (C++)
 - Ruby impact and `codegraph affected` now follow mixins and `require`s. `include`, `extend`, and `prepend` of a module — Ruby's primary composition mechanism (ActiveSupport concerns, `Comparable`, `Enumerable`) — now record a dependency on that module, so editing a concern surfaces every class that mixes it in; previously these were read as a call to a method named `include`, so a module whose methods are exercised only by application code looked like nothing depended on it. And `require "lib/foo"` / `require_relative "../foo"` now link to the required file, so a file pulled in only by a `require` (config-loaded components, gems that don't autoload) is no longer reported as having no dependents. Together these took a typical gem from ~71% of its files showing real dependents to ~100%. (Ruby)
 - C# `record` types are now indexed. `record`, `record class`, and `record struct` declarations (everywhere in modern C# — DTOs, value objects, CQRS messages, MediatR notifications) were previously skipped entirely, so every reference, generic type argument (`IEnumerable<MyRecord>`), and `new MyRecord(...)` pointed at nothing and the file defining a record looked like it had no callers or dependents. (#237)
 - Go interfaces now connect to their implementations. Go has no `implements` keyword — a type satisfies an interface just by having the right methods — so CodeGraph now infers that link: a struct whose methods cover an interface's method set is treated as implementing it, and a call through the interface (`API.Marshal(...)`) reaches every concrete implementation. This means a type used only via an interface (the common plugin/strategy pattern — e.g. JSON-codec or renderer implementations selected at runtime) is no longer reported as having no callers or no dependents, and impact now flows from an interface method to the implementations behind it. (#584)

__tests__/extraction.test.ts (+63 -0)

@@ -3551,6 +3551,69 @@ end
   });
 });
 
+describe('C++ free-function name extraction', () => {
+  let tempDir: string;
+  let cg: CodeGraph;
+
+  beforeEach(() => {
+    tempDir = createTempDir();
+  });
+
+  afterEach(() => {
+    if (cg) cg.close();
+    if (fs.existsSync(tempDir)) fs.rmSync(tempDir, { recursive: true, force: true });
+  });
+
+  it('names a free function correctly when it has qualified-type params or a trailing return type', async () => {
+    const src = path.join(tempDir, 'src');
+    fs.mkdirSync(src, { recursive: true });
+
+    // TableFileName has a `const std::string&` parameter; BuildName uses an
+    // `auto … -> std::string` trailing return type. Both used to be named
+    // `string` (picked up from the parameter / return type), so callers never
+    // resolved and the defining file looked like nothing depended on it.
+    fs.writeFileSync(
+      path.join(src, 'names.cc'),
+      `#include <string>
+
+std::string TableFileName(const std::string& dbname, int number) {
+  return dbname;
+}
+
+auto BuildName(const std::string& a) -> std::string {
+  return a;
+}
+`
+    );
+    fs.writeFileSync(
+      path.join(src, 'user.cc'),
+      `#include <string>
+
+std::string use() {
+  return TableFileName("db", 1) + BuildName("x");
+}
+`
+    );
+
+    cg = CodeGraph.initSync(tempDir);
+    await cg.indexAll();
+    cg.resolveReferences();
+
+    // The functions are extracted under their real names, not `string`.
+    const fns = cg.getNodesByKind('function');
+    const tableFn = fns.find((n) => n.name === 'TableFileName');
+    const buildFn = fns.find((n) => n.name === 'BuildName');
+    expect(tableFn, 'TableFileName extracted (not "string")').toBeDefined();
+    expect(buildFn, 'BuildName extracted (not "string")').toBeDefined();
+
+    // And the cross-file calls resolve to them, so editing names.cc surfaces user.cc.
+    for (const fn of [tableFn!, buildFn!]) {
+      const reached = [...cg.getImpactRadius(fn.id, 3).nodes.values()].map((n) => n.filePath ?? '');
+      expect(reached.some((p) => p.endsWith('user.cc')), `${fn.name} should be called from user.cc`).toBe(true);
+    }
+  });
+});
+
 describe('Full Indexing', () => {
   let tempDir: string;
 

src/extraction/languages/c-cpp.ts (+28 -30)

@@ -2,49 +2,47 @@ import type { Node as SyntaxNode } from 'web-tree-sitter';
 import { getChildByField, getNodeText } from '../tree-sitter-helpers';
 import type { LanguageExtractor } from '../tree-sitter-types';
 
-function extractCppQualifiedMethodName(node: SyntaxNode, source: string): string | undefined {
-  const declarator = getChildByField(node, 'declarator');
-  if (!declarator) return undefined;
-
+/**
+ * Find the function NAME's `qualified_identifier` (`Foo::bar`) inside a
+ * declarator, skipping the `parameter_list` and any `trailing_return_type`: a
+ * parameter or return type with a qualified type (`const std::string& x`,
+ * `-> std::string`) must NOT be mistaken for the method name. Without the
+ * skip, a plain free function `std::string TableFileName(const std::string&...)`
+ * was named `string` (from the parameter type), so calls to it never resolved
+ * and its file looked like nothing depended on it.
+ */
+function findDeclaratorQualifiedId(declarator: SyntaxNode): SyntaxNode | undefined {
   const queue: SyntaxNode[] = [declarator];
   while (queue.length > 0) {
     const current = queue.shift()!;
-    if (current.type === 'qualified_identifier') {
-      const text = getNodeText(current, source).trim();
-      const parts = text.split('::').filter(Boolean);
-      return parts[parts.length - 1];
-    }
+    if (current.type === 'qualified_identifier') return current;
     for (let i = 0; i < current.namedChildCount; i++) {
       const child = current.namedChild(i);
-      if (child) queue.push(child);
+      // Don't descend into parameters or the trailing return type — their types
+      // (`const std::string&`, `-> std::string`) aren't the function name.
+      if (child && child.type !== 'parameter_list' && child.type !== 'trailing_return_type') {
+        queue.push(child);
+      }
     }
   }
-
   return undefined;
 }
 
-function extractCppReceiverType(node: SyntaxNode, source: string): string | undefined {
+function extractCppQualifiedMethodName(node: SyntaxNode, source: string): string | undefined {
   const declarator = getChildByField(node, 'declarator');
   if (!declarator) return undefined;
+  const qid = findDeclaratorQualifiedId(declarator);
+  if (!qid) return undefined;
+  const parts = getNodeText(qid, source).trim().split('::').filter(Boolean);
+  return parts[parts.length - 1];
+}
 
-  const queue: SyntaxNode[] = [declarator];
-  while (queue.length > 0) {
-    const current = queue.shift()!;
-    if (current.type === 'qualified_identifier') {
-      const text = getNodeText(current, source).trim();
-      const parts = text.split('::').filter(Boolean);
-      if (parts.length > 1) {
-        return parts.slice(0, -1).join('::');
-      }
-      return undefined;
-    }
-    for (let i = 0; i < current.namedChildCount; i++) {
-      const child = current.namedChild(i);
-      if (child) queue.push(child);
-    }
-  }
-
-  return undefined;
+function extractCppReceiverType(node: SyntaxNode, source: string): string | undefined {
+  const declarator = getChildByField(node, 'declarator');
+  if (!declarator) return undefined;
+  const qid = findDeclaratorQualifiedId(declarator);
+  if (!qid) return undefined;
+  const parts = getNodeText(qid, source).trim().split('::').filter(Boolean);
+  return parts.length > 1 ? parts.slice(0, -1).join('::') : undefined;
 }
 
 export const cExtractor: LanguageExtractor = {
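After the refactor, `extractCppQualifiedMethodName` and `extractCppReceiverType` take the same `qualified_identifier` from `findDeclaratorQualifiedId` and split its text on `::`, one keeping the last segment and the other the prefix. A minimal sketch of that split, using a hypothetical `splitQualified` helper on plain strings rather than syntax nodes:

```typescript
// Hypothetical helper mirroring the string handling in
// extractCppQualifiedMethodName (last segment) and extractCppReceiverType (prefix).
interface QualifiedParts {
  name: string | undefined;     // last segment, e.g. "bar"
  receiver: string | undefined; // everything before it, e.g. "ns::Foo"
}

function splitQualified(text: string): QualifiedParts {
  // filter(Boolean) drops the empty segment produced by a leading "::".
  const parts = text.trim().split('::').filter(Boolean);
  return {
    name: parts[parts.length - 1],
    receiver: parts.length > 1 ? parts.slice(0, -1).join('::') : undefined,
  };
}

console.log(splitQualified('ns::Foo::bar'));
console.log(splitQualified('::bar'));
```

Sharing one lookup means the method name and the receiver always come from the same node, so a qualified parameter or trailing return type can no longer leak into either.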