feat(csharp): index C# 12 primary constructors via an up-to-date grammar (#237) (#717)

Vendor tree-sitter-c-sharp 0.23.5 (ABI 15) for C#, replacing the bundled ABI-13
build that dropped primary-constructor classes. Adds native primary-ctor
parsing, primary-ctor parameter dependency edges, return-type extraction via the
renamed `returns` field, and a preParse that blanks `#if` directive lines the
new grammar mis-parses inside enum bodies. Validated on MediatR / eShopOnWeb /
Newtonsoft.Json + full suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 2 weeks ago
parent
commit
80db274e5f

+ 1 - 0
CHANGELOG.md

@@ -51,6 +51,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - C++ free functions are now indexed under their real name. A function written with a qualified-type parameter (`std::string TableFileName(const std::string& dbname)`) or an `auto … -> std::string` trailing return type was mistakenly named after that type (`string`), so calls to it never resolved, `codegraph_node` couldn't find it by name, and the file defining it looked like nothing depended on it. The function now keeps its real name, so cross-file calls, callers, and blast radius work — a meaningful gain for any namespaced C++ codebase (this is how most free functions in a library look). (C++)
 - Ruby impact and `codegraph affected` now follow mixins and `require`s. `include`, `extend`, and `prepend` of a module — Ruby's primary composition mechanism (ActiveSupport concerns, `Comparable`, `Enumerable`) — now record a dependency on that module, so editing a concern surfaces every class that mixes it in; previously these were read as a call to a method named `include`, so a module whose methods are exercised only by application code looked like nothing depended on it. And `require "lib/foo"` / `require_relative "../foo"` now link to the required file, so a file pulled in only by a `require` (config-loaded components, gems that don't autoload) is no longer reported as having no dependents. Together these took a typical gem from ~71% of its files showing real dependents to ~100%. (Ruby)
 - C# `record` types are now indexed. `record`, `record class`, and `record struct` declarations (everywhere in modern C# — DTOs, value objects, CQRS messages, MediatR notifications) were previously skipped entirely, so every reference, generic type argument (`IEnumerable<MyRecord>`), and `new MyRecord(...)` pointed at nothing and the file defining a record looked like it had no callers or dependents. (#237)
+- C# is now parsed with an up-to-date grammar that understands C# 12 **primary constructors**. A class or struct written as `class OrderService(IRepo repo, [FromKeyedServices("primary")] ICache cache) { … }` is now indexed reliably — previously the constructor parameter list confused the parser and could drop the whole class (and all of its methods) from the index, most often exactly when a parameter carried an attribute, as in the ASP.NET keyed-dependency-injection pattern. The primary-constructor parameters are also recorded as dependencies, so the services a type is constructed with show up in its blast radius and "who depends on this contract" answers. Method return types, base types, and members all continue to resolve, and `#if`-guarded members in multi-targeting code keep parsing correctly. (#237)
 - Go interfaces now connect to their implementations. Go has no `implements` keyword — a type satisfies an interface just by having the right methods — so CodeGraph now infers that link: a struct whose methods cover an interface's method set is treated as implementing it, and a call through the interface (`API.Marshal(...)`) reaches every concrete implementation. This means a type used only via an interface (the common plugin/strategy pattern — e.g. JSON-codec or renderer implementations selected at runtime) is no longer reported as having no callers or no dependents, and impact now flows from an interface method to the implementations behind it. (#584)
 - Go now records cross-package struct creation. A composite literal like `render.XML{...}` or `pkga.Widget{...}` — including ones registered in a package-level `var registry = map[string]R{...}` — now links to the package that defines the type. Cross-package function calls and type references already resolved; this closes struct instantiation, so a package whose types are only *constructed* elsewhere (a common pattern for interface implementations) is no longer reported as having no dependents. Go type conversions such as `(*Wrapped)(x)` now link to the converted-to type as well.
 - Python now follows whole-module imports — `from . import certs` then `certs.where()`, or `from pkg import sub` then `sub.run()`. Calls and attribute access through an imported submodule now resolve to that submodule, and importing a module is recorded as a dependency on it even when the member you use is itself re-exported from a third-party package. This also fixed Python relative-import path resolution generally (`from .sub.mod import x`), so `codegraph affected` and impact see the real module graph of a package.

+ 80 - 0
__tests__/extraction.test.ts

@@ -1013,6 +1013,86 @@ public class OrderService
     expect(classNode?.name).toBe('OrderService');
     expect(classNode?.visibility).toBe('public');
   });
+
+  it('indexes primary-constructor classes, including keyed-DI attribute params (#237)', () => {
+    // C# 12 primary constructors (`class Foo(IDep dep) { … }`) are parsed
+    // natively by the vendored tree-sitter-c-sharp 0.23.x grammar. The worst
+    // shape under the previous (older) grammar — an attribute-with-args on a
+    // ctor param (`[FromKeyedServices("primary")] …`, the ASP.NET keyed-DI
+    // pattern) — used to parse as an ERROR that swallowed the whole class, so
+    // the class and all its methods vanished. They now index in every case.
+    const code = `
+public class DataService(IMemoryCache cache)
+{
+    public void Warm() { }
+}
+
+public class InstanceService(InstanceManager m, ProfileManager p)
+{
+    public void DeployAndLaunchAsync() { }
+    public void Deploy() { }
+}
+
+public partial class UpdateService(int x) : ILifetimeService
+{
+    public void Run() { }
+}
+
+public class K1KeyedDi([FromKeyedServices("primary")] IMemoryCache cache)
+{
+    public void Warm() { }
+}
+
+public record CatalogBrand(int Id, string Name);
+`;
+    const result = extractFromSource('Services.cs', code);
+    const classNames = result.nodes.filter((n) => n.kind === 'class').map((n) => n.name);
+    expect(classNames).toContain('DataService');
+    expect(classNames).toContain('InstanceService');
+    expect(classNames).toContain('UpdateService'); // partial + base list
+    expect(classNames).toContain('K1KeyedDi'); // attribute-arg ctor param — used to vanish entirely
+    expect(classNames).toContain('CatalogBrand'); // record
+
+    const methods = result.nodes.filter((n) => n.kind === 'method').map((n) => n.name);
+    expect(methods).toContain('DeployAndLaunchAsync');
+    expect(methods).toContain('Deploy');
+    expect(methods).toContain('Run');
+  });
+
+  it('keeps a class indexable when a nested enum has #if-guarded members (#237)', () => {
+    // A `#if` directive inside an enum member list (the multi-targeting pattern
+    // in libraries like Newtonsoft.Json) makes the grammar emit an ERROR that,
+    // for a nested enum, detaches the enclosing class's member list — dropping
+    // most of the class's methods. A pre-parse pass blanks the directive lines
+    // (keeping both branches), so the class and all its methods still index.
+    const code = `
+public class Reader
+{
+    private enum ReadType
+    {
+#if HAVE_DATE_TIME_OFFSET
+        ReadAsDateTimeOffset,
+#endif
+        ReadAsDouble,
+        ReadAsString,
+    }
+
+    public void Open() { }
+    public void Close() { }
+    public int ReadInt() { return 0; }
+}
+`;
+    const result = extractFromSource('Reader.cs', code);
+    const methods = result.nodes.filter((n) => n.kind === 'method').map((n) => n.name);
+    // All three methods after the #if-bearing enum must survive.
+    expect(methods).toContain('Open');
+    expect(methods).toContain('Close');
+    expect(methods).toContain('ReadInt');
+    // Both enum branches are kept.
+    const enumMembers = result.nodes.filter((n) => n.kind === 'enum_member').map((n) => n.name);
+    expect(enumMembers).toContain('ReadAsDateTimeOffset');
+    expect(enumMembers).toContain('ReadAsDouble');
+  });
 });
 
 describe('PHP Extraction', () => {

+ 35 - 0
__tests__/resolution.test.ts

@@ -1132,6 +1132,41 @@ public class DataExporter
       expect(userIncoming.length).toBeGreaterThanOrEqual(3);
     });
 
+    it('C# primary-constructor parameters record their type dependencies (#237)', async () => {
+      // C# 12 primary constructors declare a type's injected dependencies inline
+      // (`class Svc(IRepo repo, [FromKeyedServices("k")] ICache cache)`). Each
+      // ctor parameter's type is recorded as a `references` edge from the class,
+      // so a DI-registered contract reached only through a primary ctor is no
+      // longer reported as having no dependents.
+      fs.mkdirSync(path.join(tempDir, 'src'), { recursive: true });
+      fs.writeFileSync(
+        path.join(tempDir, 'src', 'Contracts.cs'),
+        `namespace App;
+public interface IRepo { }
+public class ICache { }
+`
+      );
+      fs.writeFileSync(
+        path.join(tempDir, 'src', 'OrderService.cs'),
+        `namespace App;
+public sealed class OrderService(IRepo repo, [FromKeyedServices("primary")] ICache cache)
+{
+  public void Run() { }
+}
+`
+      );
+
+      cg = await CodeGraph.init(tempDir, { index: true });
+
+      const svc = cg.getNodesByKind('class').find((n) => n.name === 'OrderService');
+      expect(svc).toBeDefined();
+      // The class itself must index (it used to vanish under the old grammar).
+      const out = cg.getOutgoingEdges(svc!.id).filter((e) => e.kind === 'references');
+      const depNames = out.map((e) => cg.getNode(e.target)?.name);
+      expect(depNames).toContain('IRepo');
+      expect(depNames).toContain('ICache'); // the keyed-DI ([FromKeyedServices]) dependency
+    });
+
     it('Go: leaves stdlib calls (fmt.Println, etc.) external', async () => {
       fs.writeFileSync(
         path.join(tempDir, 'go.mod'),

+ 6 - 2
src/extraction/grammars.ts

@@ -200,8 +200,12 @@ export async function loadGrammarsForLanguages(languages: Language[]): Promise<v
       // tree-sitter-wasms build is too old). Lua: tree-sitter-wasms ships an
       // ABI-13 build that corrupts the shared WASM heap under web-tree-sitter
       // 0.25 (drops nested calls/imports on every file after the first); we
-      // vendor the upstream ABI-15 wasm instead.
-      const wasmPath = (lang === 'pascal' || lang === 'scala' || lang === 'lua' || lang === 'luau')
+      // vendor the upstream ABI-15 wasm instead. C#: the tree-sitter-wasms
+      // build (ABI 13) has no primary-constructor support and parses
+      // `class Foo(...)` as an ERROR that swallows the whole class (#237); we
+      // vendor the upstream ABI-15 tree-sitter-c-sharp 0.23.5 wasm, which parses
+      // primary constructors natively.
+      const wasmPath = (lang === 'pascal' || lang === 'scala' || lang === 'lua' || lang === 'luau' || lang === 'csharp')
         ? path.join(__dirname, 'wasm', wasmFile)
         : require.resolve(`tree-sitter-wasms/out/${wasmFile}`);
       const language = await WasmLanguage.load(wasmPath);

+ 31 - 0
src/extraction/languages/csharp.ts

@@ -2,7 +2,38 @@ import type { Node as SyntaxNode } from 'web-tree-sitter';
 import { getNodeText } from '../tree-sitter-helpers';
 import type { LanguageExtractor } from '../tree-sitter-types';
 
+/**
+ * Blank C# conditional-compilation directive lines (`#if` / `#elif` / `#else` /
+ * `#endif`) before parsing. The vendored tree-sitter-c-sharp grammar mis-parses
+ * a `#if` that appears *inside an enum member list* — the canonical
+ * multi-targeting shape:
+ *
+ *   enum ReadType {
+ *   #if HAVE_DATE_TIME_OFFSET
+ *       ReadAsDateTimeOffset,
+ *   #endif
+ *       ReadAsDouble,
+ *   }
+ *
+ * It emits an ERROR that, for a nested enum, detaches the *enclosing class's*
+ * member list, so most of the class's methods drop out of the index. Removing
+ * the directive lines (keeping the guarded code) sidesteps it. Both branches of
+ * an `#if/#else` are kept — the same behaviour the previous grammar produced,
+ * and the right default for a code graph (index every symbol regardless of
+ * build flags). Replacement preserves byte offsets (directive text → spaces,
+ * newlines kept) so every symbol's line/column stays exact. (#237)
+ */
+export function blankCsharpPreprocessorDirectives(source: string): string {
+  if (source.indexOf('#') === -1) return source;
+  // Conditional-compilation directives only. `#region`/`#pragma`/`#nullable`
+  // parse fine and are left alone. A directive must be the first non-space token
+  // on its line (C# requirement), so anchor to line start.
+  const re = /^([ \t]*)#[ \t]*(if|elif|else|endif)\b[^\n]*/gm;
+  return source.replace(re, (m, indent) => indent + ' '.repeat(m.length - indent.length));
+}
+
 export const csharpExtractor: LanguageExtractor = {
+  preParse: blankCsharpPreprocessorDirectives,
   functionTypes: [],
   // Records are first-class type declarations in modern C# (DTOs, value objects,
   // MediatR/CQRS messages). `record` / `record class` parse as record_declaration
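
A minimal usage sketch of `blankCsharpPreprocessorDirectives`, re-declared here so the snippet runs on its own. The sample input is illustrative and is not taken from any of the validated projects. It shows that conditional directives become spaces, that guarded members and `#region` lines are left alone, and that string length and symbol positions do not move.

```typescript
// Re-declared from the hunk above so this snippet is self-contained.
function blankCsharpPreprocessorDirectives(source: string): string {
  if (source.indexOf('#') === -1) return source;
  const re = /^([ \t]*)#[ \t]*(if|elif|else|endif)\b[^\n]*/gm;
  return source.replace(
    re,
    (m: string, indent: string) => indent + ' '.repeat(m.length - indent.length)
  );
}

// Illustrative input: an enum with one #if-guarded member and a #region line.
const sample = [
  'enum ReadType {',
  '#if HAVE_DATE_TIME_OFFSET',
  '    ReadAsDateTimeOffset,',
  '#endif',
  '    ReadAsDouble,',
  '#region kept',
  '}',
].join('\n');

const blanked = blankCsharpPreprocessorDirectives(sample);
const blankedLines = blanked.split('\n');

// The directive lines are now whitespace of the same width; everything else
// is byte-for-byte what it was.
console.log(JSON.stringify(blankedLines[1])); // the former `#if` line
console.log(blankedLines[2]);                 // the guarded member, untouched
console.log(blankedLines[5]);                 // `#region` is not a conditional directive
```

Note that `m.length` counts UTF-16 code units, so the sketch demonstrates stable string indices. Whether that also equals byte offsets for non-ASCII text on a directive line depends on how the parser reports positions, which this snippet does not check.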

+ 10 - 0
src/extraction/tree-sitter-types.ts

@@ -78,6 +78,16 @@ export interface ExtractorContext {
  * language-specific details like signatures, visibility, and imports.
  */
 export interface LanguageExtractor {
+  /**
+   * Optional source transform applied immediately before the grammar parses the
+   * file. Used to work around grammar gaps that would otherwise corrupt the
+   * parse tree (e.g. C# blanks conditional-compilation directive lines the
+   * grammar mis-parses inside enum bodies). MUST preserve byte offsets (replace
+   * removed text with spaces, keep newlines) so node positions and getNodeText
+   * stay correct; the returned string is used for both parsing and extraction.
+   */
+  preParse?: (source: string) => string;
+
   // --- Node type mappings ---
 
   /** Node types that represent functions */
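
The offset-preservation requirement on `preParse` can be checked mechanically. The sketch below is a hypothetical helper, not part of this commit: `isOffsetPreserving`, `blankHashLines`, and `dropHashLines` are names invented for illustration. It assumes that "offsets" means JavaScript string indices and that only newline positions need to be pinned.

```typescript
// Same shape as the `preParse` field on LanguageExtractor.
type PreParse = (source: string) => string;

// Hypothetical check: a transform is acceptable if the string length is
// unchanged and every newline sits at the same index before and after.
function isOffsetPreserving(before: string, after: string): boolean {
  if (before.length !== after.length) return false;
  for (let i = 0; i < before.length; i++) {
    const wasNewline = before[i] === '\n';
    const isNewline = after[i] === '\n';
    if (wasNewline !== isNewline) return false;
  }
  return true;
}

// Conforming transform: overwrite lines starting with '#' with spaces.
const blankHashLines: PreParse = (s) =>
  s.replace(/^#[^\n]*/gm, (m: string) => ' '.repeat(m.length));

// Non-conforming transform: delete those lines, shifting everything after them.
const dropHashLines: PreParse = (s) => s.replace(/^#[^\n]*\n/gm, '');

const input = 'a\n#if X\nb\n';
const okResult = isOffsetPreserving(input, blankHashLines(input));
const badResult = isOffsetPreserving(input, dropHashLines(input));
console.log(okResult, badResult);
```

A check of this kind could run in a unit test for each extractor that defines `preParse`, so that a transform which deletes text rather than blanking it fails before it can skew node positions.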

+ 43 - 2
src/extraction/tree-sitter.ts

@@ -272,6 +272,14 @@ export class TreeSitterExtractor {
     }
 
     try {
+      // Optional pre-parse source transform (offset-preserving) to work around
+      // grammar gaps — e.g. C# blanks conditional-compilation directive lines
+      // the grammar mis-parses inside enum bodies (#237). We reassign
+      // this.source so downstream getNodeText reads the same bytes the parser
+      // saw (identical outside the blanked directive lines).
+      if (this.extractor?.preParse) {
+        this.source = this.extractor.preParse(this.source);
+      }
       this.tree = parser.parse(this.source) ?? null;
       if (!this.tree) {
         throw new Error('Parser returned null tree');
@@ -853,6 +861,9 @@ export class TreeSitterExtractor {
     // Extract extends/implements
     this.extractInheritance(node, classNode.id);
 
+    // C# primary-constructor parameter dependencies (`class Svc(IRepo r, …)`).
+    this.extractCsharpPrimaryCtorParamRefs(node, classNode.id);
+
     // Extract decorators applied to the class (`@Foo class X {}`).
     this.extractDecoratorsFor(node, classNode.id);
 
@@ -1027,6 +1038,10 @@ export class TreeSitterExtractor {
     // Extract inheritance (e.g. Swift: struct HTTPMethod: RawRepresentable)
     this.extractInheritance(node, structNode.id);
 
+    // C# primary-constructor parameter dependencies (`struct P(int x)`, and
+    // `record struct M(decimal Amount)` which the grammar nests here).
+    this.extractCsharpPrimaryCtorParamRefs(node, structNode.id);
+
     // Push to stack for field extraction
     this.nodeStack.push(structNode.id);
     for (let i = 0; i < body.namedChildCount; i++) {
@@ -3486,8 +3501,11 @@ export class TreeSitterExtractor {
    * `tuple_type`, …) — none of which are `type_identifier`. Closes #381.
    */
   private extractCsharpTypeRefs(node: SyntaxNode, nodeId: string): void {
-    // Return type / property type — the field is named `type`.
-    const directType = getChildByField(node, 'type');
+    // A property's type is under the `type` field; a method/constructor's RETURN
+    // type is under `returns` (tree-sitter-c-sharp 0.23.x — older builds used
+    // `type` for both). A node carries only one of the two, so checking both
+    // covers return types and property types without conflating them.
+    const directType = getChildByField(node, 'type') ?? getChildByField(node, 'returns');
     if (directType) this.walkCsharpTypePosition(directType, nodeId);
 
     // Field declarations wrap declarators in a `variable_declaration`
@@ -3516,6 +3534,29 @@ export class TreeSitterExtractor {
     }
   }
 
+  /**
+   * Record the dependencies declared by a C# PRIMARY CONSTRUCTOR
+   * (`class Svc(IRepo repo, [FromKeyedServices("k")] ICache cache) { … }`,
+   * C# 12+). The parameter list hangs off the class/struct/record declaration
+   * as an unnamed-field `parameter_list` child (not the `parameters` field a
+   * method uses), so it's found by node type. Each parameter's declared type
+   * becomes a `references` edge from the owning type — these are exactly the
+   * services a DI-registered type depends on, so impact/blast-radius and
+   * "who depends on this contract" now see them. No-op when there's no primary
+   * constructor. (#237)
+   */
+  private extractCsharpPrimaryCtorParamRefs(node: SyntaxNode, ownerId: string): void {
+    if (this.language !== 'csharp') return;
+    const paramList = node.namedChildren.find((c: SyntaxNode) => c.type === 'parameter_list');
+    if (!paramList) return;
+    for (let i = 0; i < paramList.namedChildCount; i++) {
+      const param = paramList.namedChild(i);
+      if (!param || param.type !== 'parameter') continue;
+      const paramType = getChildByField(param, 'type');
+      if (paramType) this.walkCsharpTypePosition(paramType, ownerId);
+    }
+  }
+
   /**
    * Walk a C# subtree that is KNOWN to be in a type position
    * (return type, parameter type, property type, field type, generic
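
The lookup in `extractCsharpPrimaryCtorParamRefs` can be illustrated on a hand-built tree. This is a simplified sketch under stated assumptions: `MiniNode`, `leaf`, and `primaryCtorParamTypes` are hypothetical stand-ins, the tree is constructed by hand and is not real parser output, and the sketch returns type names instead of walking type positions and emitting `references` edges.

```typescript
// Minimal stand-in for a syntax node (the real code uses web-tree-sitter's Node).
interface MiniNode {
  type: string;
  text?: string;
  namedChildren: MiniNode[];
  fields?: Record<string, MiniNode>;
}

const leaf = (type: string, text: string): MiniNode => ({ type, text, namedChildren: [] });

// Find the declaration's `parameter_list` child by node type, then read each
// `parameter`'s `type` field. Returns [] when there is no primary constructor.
function primaryCtorParamTypes(decl: MiniNode): string[] {
  const paramList = decl.namedChildren.find((c) => c.type === 'parameter_list');
  if (!paramList) return [];
  const types: string[] = [];
  for (const param of paramList.namedChildren) {
    if (param.type !== 'parameter') continue;
    const paramType = param.fields?.['type'];
    if (paramType?.text) types.push(paramType.text);
  }
  return types;
}

// Hand-built approximation of:
//   class OrderService(IRepo repo, [FromKeyedServices("primary")] ICache cache) { }
const orderService: MiniNode = {
  type: 'class_declaration',
  namedChildren: [
    leaf('identifier', 'OrderService'),
    {
      type: 'parameter_list',
      namedChildren: [
        { type: 'parameter', namedChildren: [], fields: { type: leaf('identifier', 'IRepo') } },
        { type: 'parameter', namedChildren: [], fields: { type: leaf('identifier', 'ICache') } },
      ],
    },
    { type: 'declaration_list', namedChildren: [] },
  ],
};

// A class with an ordinary body and no primary constructor.
const plainClass: MiniNode = {
  type: 'class_declaration',
  namedChildren: [leaf('identifier', 'Plain'), { type: 'declaration_list', namedChildren: [] }],
};

const deps = primaryCtorParamTypes(orderService);
const none = primaryCtorParamTypes(plainClass);
console.log(deps, none);
```

Searching by node type mirrors the comment above: the parameter list is not exposed under a named field on the declaration, so a field lookup like the one used for method parameters would not find it.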

BIN
src/extraction/wasm/tree-sitter-c_sharp.wasm