
fix(extraction): C# produces `references` edges for type annotations (#381)

Indexing any C# project produced zero `references` edges:

  csharp | calls         | 810
  csharp | extends       | 70
  csharp | implements    | 20
  csharp | instantiates  | 169
  csharp | references    | 0   <-- bug

So `codegraph_callers SessionInfoDto` returned 0 hits even when the DTO
was used as a parameter or return type across the codebase, and
`codegraph_callees DataExporter` only saw `using` imports.

Two root causes:

1. `csharp.ts` was missing `returnField`. The default `'return_type'`
   doesn't exist on C# method_declaration nodes — the field is `'type'`.
   Also `paramsField` was set to `'parameter_list'` (the node TYPE)
   instead of `'parameters'` (the field NAME on method_declaration), so
   parameter extraction silently no-op'd.

2. `extractTypeRefsFromSubtree` only emitted refs for `type_identifier`
   leaves. C# tree-sitter does not produce `type_identifier` — it uses
   `identifier`, `predefined_type`, `qualified_name`, `generic_name`,
   `array_type`, `nullable_type`, `tuple_type`, etc.
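The first root cause comes down to the difference between a node's TYPE and the field NAME its parent uses to reach it. A minimal sketch, assuming a simplified stand-in for a tree-sitter node (`MockNode` and this `getChildByField` are hypothetical illustrations, not the project's real helpers):

```typescript
// Hypothetical, simplified stand-in for a tree-sitter syntax node.
interface MockNode {
  type: string; // node TYPE, e.g. 'parameter_list'
  fields: { [name: string]: MockNode | undefined }; // children keyed by field NAME
}

// Look a child up by field NAME; returns null when no such field exists.
function getChildByField(node: MockNode, field: string): MockNode | null {
  return node.fields[field] ?? null;
}

// Assumed shape of a C# `method_declaration`, per the description above:
// the return type lives under `type`, the parameter list under `parameters`.
const method: MockNode = {
  type: 'method_declaration',
  fields: {
    type: { type: 'identifier', fields: {} },
    parameters: { type: 'parameter_list', fields: {} },
  },
};

// Old config: a node TYPE and a non-existent field were used as field names.
const wrongParams = getChildByField(method, 'parameter_list'); // null
const defaultReturn = getChildByField(method, 'return_type'); // null

// Fixed config: the actual field names.
const rightParams = getChildByField(method, 'parameters');
const csharpReturn = getChildByField(method, 'type');
```

Both failed lookups return `null` rather than throwing, which is why extraction did nothing without reporting an error.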

Fix:

- `csharp.ts`: set `paramsField:'parameters'` and `returnField:'type'`.
- Route C# through a dedicated `extractCsharpTypeRefs` + recursive
  `walkCsharpTypePosition`. The walker descends ONLY into known type
  fields (parameter.type, method.type, property.type,
  variable_declaration.type, tuple_element.type), so parameter names
  like `request` in `Build(UserDto request)` are never mis-emitted as
  type refs.
- Hook `extractField` and `extractProperty` to call
  `extractTypeAnnotations`. Previously neither emitted type references
  for any language; the C# branch in the new walker handles the
  `field_declaration → variable_declaration → type` nesting.
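The walker's "descend only into type fields" rule can be sketched as follows. This is an illustration under stated assumptions, not the committed code: `TNode` is a hypothetical simplified node shape, and the sketch omits the `BUILTIN_TYPES` filter and the field-declaration nesting that the real walker handles.

```typescript
// Hypothetical, simplified node shape; real tree-sitter nodes differ.
interface TNode {
  type: string;
  text: string;
  fields?: { [name: string]: TNode | undefined };
  children?: TNode[];
}

// Collect type names from a subtree known to be in a type position.
function walkTypePosition(node: TNode, out: string[]): void {
  if (node.type === 'predefined_type') return; // int, string, bool, ...
  if (node.type === 'identifier') {
    out.push(node.text);
    return;
  }
  if (node.type === 'qualified_name') {
    out.push(node.text.split('.').pop() ?? node.text); // trailing simple name
    return;
  }
  if (node.type === 'tuple_element') {
    const t = node.fields?.type; // skip the element's `name` field
    if (t) walkTypePosition(t, out);
    return;
  }
  // Composite types (generic_name, array_type, ...): recurse into children.
  for (const child of node.children ?? []) walkTypePosition(child, out);
}

// Entry point: visit only the `type` field and each parameter's `type` field.
function extractTypeRefs(decl: TNode): string[] {
  const out: string[] = [];
  const direct = decl.fields?.type;
  if (direct) walkTypePosition(direct, out);
  for (const p of decl.fields?.parameters?.children ?? []) {
    if (p.type !== 'parameter') continue;
    const t = p.fields?.type;
    if (t) walkTypePosition(t, out);
  }
  return out;
}

const id = (text: string): TNode => ({ type: 'identifier', text });

// Mock tree for: Task<SessionInfoDto> Build(UserDto request, int count)
const build: TNode = {
  type: 'method_declaration',
  text: 'Task<SessionInfoDto> Build(UserDto request, int count)',
  fields: {
    type: {
      type: 'generic_name',
      text: 'Task<SessionInfoDto>',
      children: [
        id('Task'),
        { type: 'type_argument_list', text: '<SessionInfoDto>', children: [id('SessionInfoDto')] },
      ],
    },
    name: id('Build'),
    parameters: {
      type: 'parameter_list',
      text: '(UserDto request, int count)',
      children: [
        { type: 'parameter', text: 'UserDto request', fields: { type: id('UserDto'), name: id('request') } },
        { type: 'parameter', text: 'int count', fields: { type: { type: 'predefined_type', text: 'int' }, name: id('count') } },
      ],
    },
  },
};

const refs = extractTypeRefs(build); // ['Task', 'SessionInfoDto', 'UserDto']
```

The `name` fields are never visited, so `Build`, `request`, and `count` cannot surface as type references.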

Validation on dotnet/eShop (527 .cs files):
  references:  35 -> 925 (+26x)
  No regression in calls/imports/instantiates/extends/implements.

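Test covers method return type, parameter types, generic return
(`Task<SessionInfoDto>`), property type, and field type all flowing to
incoming `references` edges on the DTO/class. It asserts lower bounds on
those edge counts (at least 4 for `SessionInfoDto`, at least 3 for
`UserDto`); it does not directly assert that parameter names don't leak.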
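
One way to check for leaked parameter names directly is to compare emitted reference names against the parameter and field names in the fixture. A minimal sketch, assuming references are available as plain objects shaped like the ones the walker pushes onto `unresolvedReferences`; `findLeakedNames` is a hypothetical helper, not part of CodeGraph, and the sample data is illustrative:

```typescript
// Shape mirrors the objects pushed onto `unresolvedReferences` in the diff.
interface UnresolvedRef {
  fromNodeId: string;
  referenceName: string;
  referenceKind: string;
  line: number;
  column: number;
}

// Hypothetical helper: returns reference names that collide with known
// parameter/field NAMES, which should never appear as type references.
function findLeakedNames(refs: UnresolvedRef[], nonTypeNames: string[]): string[] {
  const banned = new Set(nonTypeNames);
  return refs
    .filter((r) => r.referenceKind === 'references' && banned.has(r.referenceName))
    .map((r) => r.referenceName);
}

// Illustrative refs for `Build(UserDto user, SessionInfoDto session)`.
const sampleRefs: UnresolvedRef[] = [
  { fromNodeId: 'Build', referenceName: 'SessionInfoDto', referenceKind: 'references', line: 5, column: 9 },
  { fromNodeId: 'Build', referenceName: 'UserDto', referenceKind: 'references', line: 5, column: 30 },
  { fromNodeId: 'Build', referenceName: 'SessionInfoDto', referenceKind: 'references', line: 5, column: 44 },
];

// An empty result means no parameter or field name was emitted as a type.
const leaked = findLeakedNames(sampleRefs, ['user', 'session', '_cached']);
```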

Closes #381.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colby McHenry 3 weeks ago
parent
commit
272d0e99a6

+ 1 - 0
CHANGELOG.md

@@ -16,6 +16,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   - **Field-injected concrete-bean trace.** A Spring controller's `@Resource(name="userBO") private UserBO userbo;` followed by `this.userbo.toLogin2(...)` now resolves through to `UserBO.toLogin2` even when the field type is a concrete class whose name doesn't match the field by Java naming convention (`userbo` → `UserBO`). The fix is two layered changes in the language layer (Java only): (a) the call extractor unwraps `this.<field>` receivers (previously surfaced as `this.userbo.toLogin2` and dropped through every name-matcher strategy); (b) the resolver looks up the receiver name in the enclosing class's field declarations and uses the declared type to resolve the method. This generalizes beyond Spring — any Java code using `this.field.method()` now resolves correctly.
 
 ### Fixed
+- **C# now produces `references` edges for parameter, return, property, and field types (#381).** Indexing any C# project used to yield **zero** `references` edges, so `codegraph_callers SomeDto` returned no results even when the DTO was used as a parameter or return type across the codebase, and `codegraph_callees` on a service class only saw its `using` imports. Two root causes: `csharp.ts` was missing `returnField`, and the type-leaf walker only matched `type_identifier` nodes — but C# tree-sitter emits `identifier`/`predefined_type`/`qualified_name`/`generic_name` instead. The fix adds the missing extractor field, routes C# through a dedicated type walker that only descends into known type-position fields (so parameter NAMES like `request` in `Build(UserDto request)` never mis-emit as type refs), and hooks `extractField`/`extractProperty` to invoke the walker. Measured on dotnet/eShop (527 `.cs` files): C# `references` edges go from **35 → 925** (+26x), with no regression in `calls`/`imports`/`instantiates`/`extends`/`implements`.
 - **Go cross-package qualified calls (`pkga.FuncX(...)`) now resolve to the right package (#388).** On a Go monorepo with a layered package layout (handler/service/domain/dao), `codegraph_callers`, `_callees`, `_impact`, and `_trace` used to return ~0-1 results where grep finds hundreds to thousands of real call sites — the central value proposition of CodeGraph silently degraded on entire Go codebases. Root cause: the import resolver flagged every Go import path without `/internal/` as third-party (because it had no idea what the project's own module path was), so cross-package calls fell through to name-matching with path-proximity scoring, which on real codebases picks ~one accidental candidate per call site. The Go branch now reads the project's `go.mod`, treats `<module-path>/...` imports as in-module, and looks up the qualified symbol in the imported package's directory; same-name functions in *different* packages no longer collide. As a side fix, Go nodes now correctly carry `is_exported=1` for capitalized identifiers (the resolver needs this to filter candidates). Measured on gRPC-Go (1,031 `.go` files, layered packages): cross-package `calls` edges go from 10,880 → 19,929 (**+83%**), total `calls` from 23,803 → 34,105 (**+43%**), with no false-positive resolution of stdlib calls (`fmt.Println` etc. stay external).
 - **`codegraph_files` now returns the whole project when an agent passes `path="/"`, `"."`, `"./"`, `""`, or a Windows-style `"\\"` — instead of "No files found matching the criteria."** Indexed file paths are stored as project-relative POSIX (e.g. `src/foo.ts`), but the path filter used a plain `startsWith`, so a leading slash or any of the other root-ish shapes an agent might guess matched nothing and pushed the agent back to Read/Glob — the exact opencode + Gemini Flash regression reported on Windows 11. Subdirectory filters are now equally forgiving: `"/src"`, `"./src"`, `"src/"`, `"src\\components"`, etc. all resolve correctly. Sibling-prefix bleed (`"src"` was previously matching `src-utils/...`) is also fixed — the filter now requires either an exact match or a `<filter>/` boundary. Closes #426.
 - **File watcher no longer marks edited files as fresh when another process holds the index lock.** When a second writer (concurrent `codegraph index`, a git hook, another MCP daemon) held `.codegraph/codegraph.lock`, `CodeGraph.sync()` returned a zero-shape no-op instead of throwing. The file watcher took that as a successful sync and cleared `pendingFiles` — so the per-file staleness signal MCP tools surface to agents (issue #403) dropped immediately, even though the edit was never indexed. `CodeGraph.watch()` now converts that no-op into a typed `LockUnavailableError` thrown into the watcher; the existing retry path preserves `pendingFiles` and reschedules until the lock becomes available. The error is logged at debug only (no `onSyncError` callback) so a long-running external indexer doesn't spam stderr every debounce cycle. Closes #449.

+ 53 - 0
__tests__/resolution.test.ts

@@ -742,6 +742,59 @@ func UseAliased() {
       expect(target?.filePath.replace(/\\/g, '/')).toBe('pkgb/lib.go');
     });
 
+    it('C# extracts references from method/property/field types (#381)', async () => {
+      // Pre-#381, every C# project produced ZERO `references` edges:
+      // csharp.ts was missing returnField, and the type-leaf walker
+      // only recognized TS/Java's `type_identifier` nodes — C# uses
+      // `identifier`/`predefined_type`/`qualified_name`/`generic_name`.
+      const srcDir = path.join(tempDir, 'src');
+      fs.mkdirSync(srcDir, { recursive: true });
+
+      fs.writeFileSync(
+        path.join(srcDir, 'Dtos.cs'),
+        `namespace MyApp;
+public class SessionInfoDto { public string Id { get; set; } = ""; }
+public class UserDto { public string Name { get; set; } = ""; }
+`
+      );
+      fs.writeFileSync(
+        path.join(srcDir, 'Service.cs'),
+        `using System.Threading.Tasks;
+namespace MyApp;
+public class DataExporter
+{
+  public SessionInfoDto Build(UserDto user, SessionInfoDto session) { return session; }
+  public Task<SessionInfoDto> BuildAsync(UserDto user) { return Task.FromResult(new SessionInfoDto()); }
+  public SessionInfoDto Latest { get; set; } = new();
+  private UserDto _cached;
+}
+`
+      );
+
+      cg = await CodeGraph.init(tempDir, { index: true });
+
+      const sessionDto = cg
+        .getNodesByKind('class')
+        .find((n) => n.name === 'SessionInfoDto');
+      const userDto = cg
+        .getNodesByKind('class')
+        .find((n) => n.name === 'UserDto');
+      expect(sessionDto).toBeDefined();
+      expect(userDto).toBeDefined();
+
+      const sessionIncoming = cg
+        .getIncomingEdges(sessionDto!.id)
+        .filter((e) => e.kind === 'references');
+      const userIncoming = cg
+        .getIncomingEdges(userDto!.id)
+        .filter((e) => e.kind === 'references');
+
+      // SessionInfoDto: Build return, Build param, BuildAsync return (inside Task<>), Latest property.
+      // UserDto: Build param, BuildAsync param, _cached field.
+      expect(sessionIncoming.length).toBeGreaterThanOrEqual(4);
+      expect(userIncoming.length).toBeGreaterThanOrEqual(3);
+    });
+
     it('Go: leaves stdlib calls (fmt.Println, etc.) external', async () => {
       fs.writeFileSync(
         path.join(tempDir, 'go.mod'),

+ 2 - 1
src/extraction/languages/csharp.ts

@@ -18,7 +18,8 @@ export const csharpExtractor: LanguageExtractor = {
   propertyTypes: ['property_declaration'],
   nameField: 'name',
   bodyField: 'body',
-  paramsField: 'parameter_list',
+  paramsField: 'parameters',
+  returnField: 'type',
   getVisibility: (node) => {
     for (let i = 0; i < node.childCount; i++) {
       const child = node.child(i);

+ 131 - 1
src/extraction/tree-sitter.ts

@@ -940,6 +940,10 @@ export class TreeSitterExtractor {
     // decorator->target relationship for class properties too.
     if (propNode) {
       this.extractDecoratorsFor(node, propNode.id);
+      // Emit `references` edges from the property to types named in its
+      // type annotation (#381). The generic walker handles TS-style
+      // `type_annotation` children; the C# branch walks the `type` field.
+      this.extractTypeAnnotations(node, propNode.id);
     }
   }
 
@@ -1022,7 +1026,15 @@ export class TreeSitterExtractor {
         });
         // Java/Kotlin annotations / TS field decorators sit on the
         // outer field_declaration, not on the individual declarator.
-        if (fieldNode) this.extractDecoratorsFor(node, fieldNode.id);
+        if (fieldNode) {
+          this.extractDecoratorsFor(node, fieldNode.id);
+          // Same as properties: emit `references` to the field's annotated
+          // type. The outer `field_declaration` is the right scope to
+          // search from — C# carries the `type` inside `variable_declaration`
+          // and the language-aware path in `extractTypeAnnotations` descends
+          // into that wrapper (#381).
+          this.extractTypeAnnotations(node, fieldNode.id);
+        }
       }
     } else {
       // Fallback: try to find an identifier child directly
@@ -2219,6 +2231,17 @@ export class TreeSitterExtractor {
     if (!this.extractor) return;
     if (!this.TYPE_ANNOTATION_LANGUAGES.has(this.language)) return;
 
+    // C# tree-sitter doesn't produce `type_identifier` leaves — it uses
+    // `identifier`, `predefined_type`, `qualified_name`, `generic_name`,
+    // etc. — so the generic walker below emits zero references for it.
+    // Dispatch to a C#-aware path that only walks type-position subtrees
+    // (the `type` field of a parameter/method/property/field), so
+    // parameter NAMES never accidentally surface as type refs (#381).
+    if (this.language === 'csharp') {
+      this.extractCsharpTypeRefs(node, nodeId);
+      return;
+    }
+
     // Extract parameter type annotations
     const params = getChildByField(node, this.extractor.paramsField || 'parameters');
     if (params) {
@@ -2240,6 +2263,113 @@ export class TreeSitterExtractor {
     }
   }
 
+  /**
+   * Extract C# type references from a node that owns a type position —
+   * a method/constructor declaration, a property declaration, or a
+   * field declaration (which wraps `variable_declaration → type`).
+   *
+   * Walks ONLY into known type fields, so parameter names like
+   * `request` in `Build(UserDto request)` are never mis-emitted as
+   * type references. Once inside a type subtree, `walkCsharpTypePosition`
+   * recognizes C#'s actual type-leaf node kinds (`identifier`,
+   * `qualified_name`, `generic_name`, `array_type`, `nullable_type`,
+   * `tuple_type`, …) — none of which are `type_identifier`. Closes #381.
+   */
+  private extractCsharpTypeRefs(node: SyntaxNode, nodeId: string): void {
+    // Return type / property type — the field is named `type`.
+    const directType = getChildByField(node, 'type');
+    if (directType) this.walkCsharpTypePosition(directType, nodeId);
+
+    // Field declarations wrap declarators in a `variable_declaration`
+    // whose `type` field carries the type. The outer `field_declaration`
+    // has no `type` field of its own, so the call above is a no-op here
+    // and we descend one level.
+    const varDecl = node.namedChildren.find((c: SyntaxNode) => c.type === 'variable_declaration');
+    if (varDecl) {
+      const vdType = getChildByField(varDecl, 'type');
+      if (vdType) this.walkCsharpTypePosition(vdType, nodeId);
+    }
+
+    // Method / constructor parameters. The field name on
+    // `method_declaration` is `parameters`; it points at a
+    // `parameter_list` whose `parameter` children each have their own
+    // `type` field. Walking ONLY the type field skips parameter NAMES,
+    // which would otherwise mis-emit as type references.
+    const params = getChildByField(node, 'parameters');
+    if (params) {
+      for (let i = 0; i < params.namedChildCount; i++) {
+        const child = params.namedChild(i);
+        if (!child || child.type !== 'parameter') continue;
+        const paramType = getChildByField(child, 'type');
+        if (paramType) this.walkCsharpTypePosition(paramType, nodeId);
+      }
+    }
+  }
+
+  /**
+   * Walk a C# subtree that is KNOWN to be in a type position
+   * (return type, parameter type, property type, field type, generic
+   * argument). Identifiers here are type names, not parameter names.
+   */
+  private walkCsharpTypePosition(node: SyntaxNode, fromNodeId: string): void {
+    // `predefined_type` is int/string/bool/etc. — never a project ref.
+    if (node.type === 'predefined_type') return;
+
+    // Bare type name: `Foo` in `Foo bar`, or the `Foo` inside `List<Foo>`.
+    if (node.type === 'identifier') {
+      const name = getNodeText(node, this.source);
+      if (name && !this.BUILTIN_TYPES.has(name)) {
+        this.unresolvedReferences.push({
+          fromNodeId,
+          referenceName: name,
+          referenceKind: 'references',
+          line: node.startPosition.row + 1,
+          column: node.startPosition.column,
+        });
+      }
+      return;
+    }
+
+    // `Namespace.Foo` → the rightmost identifier is the type. Emit only
+    // that trailing simple name as the reference; the namespace
+    // qualifier is dropped.
+    if (node.type === 'qualified_name') {
+      const text = getNodeText(node, this.source);
+      const last = text.split('.').pop() ?? text;
+      if (last && !this.BUILTIN_TYPES.has(last)) {
+        this.unresolvedReferences.push({
+          fromNodeId,
+          referenceName: last,
+          referenceKind: 'references',
+          line: node.startPosition.row + 1,
+          column: node.startPosition.column,
+        });
+      }
+      return;
+    }
+
+    // `(int Code, Foo Payload)` — tuple element has BOTH a `type` and a
+    // `name` field; descending into all named children would mis-emit
+    // the element name (`Code`, `Payload`) as a type ref. Walk only the
+    // type field.
+    if (node.type === 'tuple_element') {
+      const t = getChildByField(node, 'type');
+      if (t) this.walkCsharpTypePosition(t, fromNodeId);
+      return;
+    }
+
+    // Composite type nodes — recurse into named children. Covers
+    // `generic_name` (head identifier + `type_argument_list`),
+    // `nullable_type`, `array_type`, `pointer_type`, `tuple_type`,
+    // `ref_type`, and any newer wrapping shapes the grammar adds.
+    // Identifiers reached here are all type-positional (parameter/field
+    // names are gated out before we descend).
+    for (let i = 0; i < node.namedChildCount; i++) {
+      const child = node.namedChild(i);
+      if (child) this.walkCsharpTypePosition(child, fromNodeId);
+    }
+  }
+
   /**
    * Extract type references from a variable's type annotation.
    */