feat(impact): Scala type-graph coverage — parameterized extends, type refs, implicit params

Scala extraction produced symbol nodes but almost no edges for typeclass-style
code: every supertype/type/implicit relationship was dropped, so a trait used as
a type, extended with type parameters, or required as an implicit looked like
nothing depended on it. On cats/algebra that was most of the graph.

- extends/with: `extends A[X] with B with C` packs all supertypes into one
  `extends_clause`; the generic path took only namedChild(0) and kept the full
  text (`A[X]`) so parameterized supertypes (every typeclass) never matched.
  Iterate every supertype and unwrap `generic_type` → base type name via a new
  shared `scalaBaseTypeName` helper.
- type references: add `scala` to the type-annotation languages so method
  parameter/return types link; walk EVERY curried parameter list (the trailing
  `(implicit M: TC[A])` list is where instances are passed); walk
  `type_parameters` for context bounds (`def f[A: Monoid]`); and emit refs for
  `val`/`var` type annotations from the Scala extractor. Capitalized Scala
  primitives are skipped to avoid resolution noise.
- instantiation: `new T[...] { ... }` (`instance_expression`) now emits an
  `instantiates` edge to the base type.
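
The supertype handling described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `MiniNode`, `leaf`, `baseTypeName`, and `supertypeNames` are hypothetical stand-ins, the tree is built by hand rather than parsed, and the fallback for other node kinds is simplified to `null`.

```typescript
// Hypothetical stand-in for a syntax-tree node; not the real tree-sitter API.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
}

const leaf = (type: string, text: string): MiniNode => ({ type, text, children: [] });

// Reduce a type node to the simple name used for name matching.
function baseTypeName(node: MiniNode | undefined): string | null {
  if (!node) return null;
  switch (node.type) {
    case 'type_identifier':
      return node.text;
    case 'generic_type':
      // `Semigroup[A]`: the base type is the first child, the type arguments follow.
      return baseTypeName(node.children[0]);
    case 'stable_type_identifier': {
      // Qualified `cats.Functor`: keep only the last segment.
      const ids = node.children.filter(
        (c) => c.type === 'type_identifier' || c.type === 'identifier'
      );
      const last = ids[ids.length - 1];
      return last ? last.text : null;
    }
    default:
      return null; // simplified; this sketch has no descendant fallback
  }
}

// Visit every supertype in the clause instead of only the first child.
function supertypeNames(extendsClause: MiniNode): string[] {
  const names: string[] = [];
  for (const child of extendsClause.children) {
    const simpleName = baseTypeName(child);
    if (simpleName) names.push(simpleName);
  }
  return names;
}

// Hand-built tree for: extends Semigroup[A] with Serializable
const clause: MiniNode = {
  type: 'extends_clause',
  text: 'extends Semigroup[A] with Serializable',
  children: [
    {
      type: 'generic_type',
      text: 'Semigroup[A]',
      children: [
        leaf('type_identifier', 'Semigroup'),
        { type: 'type_arguments', text: '[A]', children: [leaf('type_identifier', 'A')] },
      ],
    },
    leaf('type_identifier', 'Serializable'),
  ],
};

const supertypes = supertypeNames(clause);
console.log(supertypes); // the two base names: Semigroup and Serializable
```

Matching on the full text `Semigroup[A]` would find no declaration named that way, which is why the sketch strips the type arguments before emitting a name.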

Measured (fair cross-file dependent coverage, symbol-bearing source files):
cats 48.9% → 89.2% (scalafix tooling + JMH benches excluded; 82.1% raw),
gatling 76.3% → 91.2%. Node count is stable (edges added, not nodes); the
remaining gap is genuine frontiers (cross-build scala-2.12/2.13+/3 variants,
laws test-support, wildcard-import barrels, infix-type evidence). Full suite
green; changes are gated to Scala, so no cross-language regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry 2 weeks ago
parent
commit
b5489d9

+ 1 - 0
CHANGELOG.md

@@ -22,6 +22,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Swift property wrappers and attributes are now connected. A `@Argument` / `@Published` / `@State` / custom `@propertyWrapper` on a property — and attributes on types, methods, and functions (`@objc`, `@MainActor`, …) — now record a dependency on the wrapper/attribute type. Previously these were dropped entirely (Swift attributes parse differently from other languages, and stored properties weren't being inspected), so the wrapper type looked unused and the file using it depended on nothing — a big gap for SwiftUI and argument-parser-style code.
 - Java annotations are now connected. Annotation definitions (`@interface Foo`) are indexed as types, and every `@Foo` usage on a class, method, or field is recorded as a dependency on it. Previously neither side was captured — annotation usages were dropped (they live inside the declaration's modifiers) and `@interface` types weren't indexed at all — so annotation-driven code (Spring `@GetMapping`, JPA `@Entity`, Gson `@SerializedName`, …) showed the annotation as having no users and the annotated class as not depending on it.
 - Kotlin Multiplatform `expect`/`actual` declarations are now connected. A platform implementation — `actual fun`, `actual class`, or an `actual typealias` in a `jvm` / `native` / `js` / `wasm` source set — is linked to the common `expect` declaration it fulfills (including the common case of an `expect class` fulfilled by an `actual typealias`). Previously a caller in common code resolved to the `expect` declaration, so every platform `actual` looked like it had no dependents and editing one showed an empty blast radius; now changing a platform implementation surfaces the common API and everything that uses it. (Kotlin)
+- Scala impact and `codegraph affected` now connect the type graph that typeclass-style code is built on. A parameterized supertype (`trait Monoid[A] extends Semigroup[A] with Serializable`) now links to each parent; a type used in a `val`/`def` signature, as a type argument, or as a context bound (`def f[A: Monoid]`) — including the trailing implicit parameter list (`(implicit M: Monoid[A])`) where typeclass instances are passed — now records a dependency; and `new T[...] { … }` counts as an instantiation. Previously Scala linked only plain calls and bare, non-generic supertypes, so a trait extended with type parameters, used as a type, or required as an implicit looked like nothing depended on it — which on a typeclass-heavy codebase (cats, algebra) was most of the graph. (Scala)
 - C# `record` types are now indexed. `record`, `record class`, and `record struct` declarations (everywhere in modern C# — DTOs, value objects, CQRS messages, MediatR notifications) were previously skipped entirely, so every reference, generic type argument (`IEnumerable<MyRecord>`), and `new MyRecord(...)` pointed at nothing and the file defining a record looked like it had no callers or dependents. (#237)
 - Go interfaces now connect to their implementations. Go has no `implements` keyword — a type satisfies an interface just by having the right methods — so CodeGraph now infers that link: a struct whose methods cover an interface's method set is treated as implementing it, and a call through the interface (`API.Marshal(...)`) reaches every concrete implementation. This means a type used only via an interface (the common plugin/strategy pattern — e.g. JSON-codec or renderer implementations selected at runtime) is no longer reported as having no callers or no dependents, and impact now flows from an interface method to the implementations behind it. (#584)
 - Go now records cross-package struct creation. A composite literal like `render.XML{...}` or `pkga.Widget{...}` — including ones registered in a package-level `var registry = map[string]R{...}` — now links to the package that defines the type. Cross-package function calls and type references already resolved; this closes struct instantiation, so a package whose types are only *constructed* elsewhere (a common pattern for interface implementations) is no longer reported as having no dependents. Go type conversions such as `(*Wrapped)(x)` now link to the converted-to type as well.
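
The effect of walking every curried parameter list, as the Scala entry above describes, can be illustrated with a small sketch. It assumes a simplified tree shape: `MiniNode`, `leaf`, `branch`, and the two collector functions are hypothetical names, and the tree for `def total(xs: Items)(implicit M: Monoid[Item])` is built by hand rather than parsed.

```typescript
// Hypothetical stand-in for a syntax-tree node; not the real tree-sitter API.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
}

const leaf = (type: string, text: string): MiniNode => ({ type, text, children: [] });
const branch = (type: string, children: MiniNode[]): MiniNode => ({ type, text: '', children });

// Collect every `type_identifier` found anywhere under `node`.
function collectTypeNames(node: MiniNode, out: string[]): void {
  if (node.type === 'type_identifier') {
    out.push(node.text);
    return;
  }
  for (const child of node.children) collectTypeNames(child, out);
}

// Earlier behaviour (sketched): only the first parameter list is inspected.
function firstListTypeNames(def: MiniNode): string[] {
  const out: string[] = [];
  const first = def.children.find((c) => c.type === 'parameters');
  if (first) collectTypeNames(first, out);
  return out;
}

// New behaviour (sketched): every curried parameter list is inspected.
function allListsTypeNames(def: MiniNode): string[] {
  const out: string[] = [];
  for (const child of def.children) {
    if (child.type === 'parameters') collectTypeNames(child, out);
  }
  return out;
}

// Hand-built tree for: def total(xs: Items)(implicit M: Monoid[Item])
const totalDef = branch('function_definition', [
  leaf('identifier', 'total'),
  branch('parameters', [
    branch('parameter', [leaf('identifier', 'xs'), leaf('type_identifier', 'Items')]),
  ]),
  branch('parameters', [
    branch('parameter', [
      leaf('identifier', 'M'),
      branch('generic_type', [
        leaf('type_identifier', 'Monoid'),
        branch('type_arguments', [leaf('type_identifier', 'Item')]),
      ]),
    ]),
  ]),
]);

const firstOnly = firstListTypeNames(totalDef); // Items
const everyList = allListsTypeNames(totalDef);  // Items, Monoid, Item
console.log(firstOnly, everyList);
```

In this sketch the `Monoid` reference only appears once the trailing implicit list is visited, which is the list where typeclass instances are passed.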

+ 81 - 0
__tests__/extraction.test.ts

@@ -3278,6 +3278,87 @@ actual typealias Lock = java.util.concurrent.locks.ReentrantLock
   });
 });
 
+describe('Scala cross-file dependencies', () => {
+  let tempDir: string;
+  let cg: CodeGraph;
+
+  beforeEach(() => {
+    tempDir = createTempDir();
+  });
+
+  afterEach(() => {
+    if (cg) cg.close();
+    if (fs.existsSync(tempDir)) fs.rmSync(tempDir, { recursive: true, force: true });
+  });
+
+  it('links parameterized supertypes, type annotations, and implicit params across files', async () => {
+    const src = path.join(tempDir, 'src', 'main', 'scala', 'demo');
+    fs.mkdirSync(src, { recursive: true });
+
+    fs.writeFileSync(
+      path.join(src, 'Semigroup.scala'),
+      `package demo
+
+trait Semigroup[A] {
+  def combine(x: A, y: A): A
+}
+`
+    );
+    fs.writeFileSync(
+      path.join(src, 'Monoid.scala'),
+      `package demo
+
+trait Monoid[A] extends Semigroup[A] {
+  def empty: A
+}
+`
+    );
+    fs.writeFileSync(
+      path.join(src, 'Instances.scala'),
+      `package demo
+
+object Instances {
+  implicit val intMonoid: Monoid[Int] = new Monoid[Int] {
+    def empty: Int = 0
+    def combine(x: Int, y: Int): Int = x + y
+  }
+}
+`
+    );
+    fs.writeFileSync(
+      path.join(src, 'Folding.scala'),
+      `package demo
+
+object Folding {
+  def fold[A](xs: List[A])(implicit M: Monoid[A]): A =
+    xs.foldLeft(M.empty)(M.combine)
+}
+`
+    );
+
+    cg = CodeGraph.initSync(tempDir);
+    await cg.indexAll();
+    cg.resolveReferences();
+
+    const monoid = cg.getNodesByKind('trait').find((n) => n.name === 'Monoid');
+    const semigroup = cg.getNodesByKind('trait').find((n) => n.name === 'Semigroup');
+    expect(monoid).toBeDefined();
+    expect(semigroup).toBeDefined();
+    expect(monoid!.filePath).not.toBe(semigroup!.filePath);
+
+    // Parameterized supertype `extends Semigroup[A]` must create an extends edge —
+    // the whole point of the fix (the `[A]` used to defeat name matching).
+    const semaImpact = cg.getImpactRadius(semigroup!.id, 3);
+    expect([...semaImpact.nodes.values()].map((n) => n.name)).toContain('Monoid');
+
+    // Editing Monoid surfaces the cross-file users: the instance val typed
+    // `Monoid[Int]` and the method taking it as an implicit (curried) param.
+    const impacted = [...cg.getImpactRadius(monoid!.id, 3).nodes.values()].map((n) => n.name);
+    expect(impacted).toContain('intMonoid'); // field type annotation
+    expect(impacted).toContain('fold'); // trailing implicit parameter list
+  });
+});
+
 describe('Full Indexing', () => {
   let tempDir: string;
 

+ 36 - 1
src/extraction/languages/scala.ts

@@ -10,6 +10,40 @@ function getValVarName(node: SyntaxNode, source: string): string | null {
   return identChild ? getNodeText(identChild, source) : null;
 }
 
+// Capitalized Scala primitives/ubiquitous aliases that shouldn't create refs.
+const SCALA_BUILTIN_TYPES = new Set([
+  'Int', 'Long', 'Short', 'Byte', 'Float', 'Double', 'Boolean', 'Char', 'Unit',
+  'String', 'Any', 'AnyRef', 'AnyVal', 'Nothing', 'Null',
+]);
+
+/**
+ * Emit `references` edges for every type identifier in a Scala type subtree
+ * (a `val`/`var` type annotation), unwrapping `generic_type` etc. Mirrors the
+ * generic type-annotation extraction the core extractor runs for method
+ * parameter/return types, but Scala `val`s are created here in visitNode so
+ * their type is walked here too. A trait used only as a field type (the common
+ * `implicit val x: Monoid[Int]` instance pattern) thus gains a dependent.
+ */
+function emitScalaTypeRefs(typeNode: SyntaxNode, fromId: string, ctx: { addUnresolvedReference: (r: { fromNodeId: string; referenceName: string; referenceKind: 'references'; line: number; column: number }) => void }, source: string): void {
+  if (typeNode.type === 'type_identifier') {
+    const name = source.substring(typeNode.startIndex, typeNode.endIndex);
+    if (name && !SCALA_BUILTIN_TYPES.has(name)) {
+      ctx.addUnresolvedReference({
+        fromNodeId: fromId,
+        referenceName: name,
+        referenceKind: 'references',
+        line: typeNode.startPosition.row + 1,
+        column: typeNode.startPosition.column,
+      });
+    }
+    return;
+  }
+  for (let i = 0; i < typeNode.namedChildCount; i++) {
+    const child = typeNode.namedChild(i);
+    if (child) emitScalaTypeRefs(child, fromId, ctx, source);
+  }
+}
+
 function extractVisibility(node: SyntaxNode): 'public' | 'private' | 'protected' {
   for (let i = 0; i < node.namedChildCount; i++) {
     const child = node.namedChild(i);
@@ -96,7 +130,8 @@ export const scalaExtractor: LanguageExtractor = {
         ? `${t === 'val_definition' ? 'val' : 'var'} ${name}: ${getNodeText(typeNode, ctx.source)}`
         : undefined;
 
-      ctx.createNode(kind, name, node, { signature: sig, visibility: extractVisibility(node) });
+      const created = ctx.createNode(kind, name, node, { signature: sig, visibility: extractVisibility(node) });
+      if (created && typeNode) emitScalaTypeRefs(typeNode, created.id, ctx, ctx.source);
       return true;
     }
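
To make the behaviour of the `val`/`var` annotation walk easier to picture, here is a rough standalone sketch. It is not the extractor itself: `MiniNode`, `TypeRef`, `typeRefs`, and the `val:cache` id are hypothetical, line and column tracking are left out, and the tree for `val cache: Map[String, Monoid[Int]]` is built by hand. The skip list copies the names in `SCALA_BUILTIN_TYPES` from the diff above.

```typescript
// Hypothetical stand-in for a syntax-tree node; not the real tree-sitter API.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
}

const leaf = (type: string, text: string): MiniNode => ({ type, text, children: [] });
const branch = (type: string, children: MiniNode[]): MiniNode => ({ type, text: '', children });

// Same names as SCALA_BUILTIN_TYPES in the diff above.
const BUILTIN_TYPES = new Set([
  'Int', 'Long', 'Short', 'Byte', 'Float', 'Double', 'Boolean', 'Char', 'Unit',
  'String', 'Any', 'AnyRef', 'AnyVal', 'Nothing', 'Null',
]);

// Simplified reference record (no line/column in this sketch).
interface TypeRef {
  fromNodeId: string;
  referenceName: string;
  referenceKind: 'references';
}

// Walk a type subtree and record one reference per non-builtin type identifier.
function typeRefs(typeNode: MiniNode, fromId: string, out: TypeRef[] = []): TypeRef[] {
  if (typeNode.type === 'type_identifier') {
    if (typeNode.text && !BUILTIN_TYPES.has(typeNode.text)) {
      out.push({ fromNodeId: fromId, referenceName: typeNode.text, referenceKind: 'references' });
    }
    return out;
  }
  for (const child of typeNode.children) typeRefs(child, fromId, out);
  return out;
}

// Hand-built tree for the annotation in: val cache: Map[String, Monoid[Int]]
const annotation = branch('generic_type', [
  leaf('type_identifier', 'Map'),
  branch('type_arguments', [
    leaf('type_identifier', 'String'),
    branch('generic_type', [
      leaf('type_identifier', 'Monoid'),
      branch('type_arguments', [leaf('type_identifier', 'Int')]),
    ]),
  ]),
]);

const refs = typeRefs(annotation, 'val:cache');
const refNames = refs.map((r) => r.referenceName);
console.log(refNames); // Map and Monoid; String and Int are skipped as builtins
```

Because the walk recurses through type arguments, a trait that only appears nested inside another type (here `Monoid` inside `Map[...]`) still produces a reference from the `val`.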
 

+ 105 - 5
src/extraction/tree-sitter.ts

@@ -115,6 +115,38 @@ function extractName(node: SyntaxNode, source: string, extractor: LanguageExtrac
   return '<anonymous>';
 }
 
+/**
+ * Resolve a Scala type node to its base type NAME for name-matching — unwrapping
+ * `generic_type` (`Monoid[Int]` → `Monoid`), taking the last segment of a
+ * qualified `stable_type_identifier` (`cats.Functor` → `Functor`), and falling
+ * back to a descendant `type_identifier`. Returns null for non-type nodes.
+ * Shared by Scala inheritance and type-reference extraction.
+ */
+function scalaBaseTypeName(node: SyntaxNode | null, source: string): string | null {
+  if (!node) return null;
+  switch (node.type) {
+    case 'type_identifier':
+    case 'identifier':
+      return getNodeText(node, source);
+    case 'generic_type':
+      // `<base> type_arguments` — the base type is the first named child.
+      return scalaBaseTypeName(node.namedChild(0), source);
+    case 'stable_type_identifier':
+    case 'stable_identifier': {
+      // Qualified `a.b.C` — match on the simple (last) segment.
+      const ids = node.namedChildren.filter(
+        (c: SyntaxNode) => c.type === 'type_identifier' || c.type === 'identifier'
+      );
+      const last = ids[ids.length - 1];
+      return last ? getNodeText(last, source) : null;
+    }
+    default: {
+      const id = node.namedChildren.find((c: SyntaxNode) => c.type === 'type_identifier');
+      return id ? getNodeText(id, source) : null;
+    }
+  }
+}
+
 /**
  * Tree-sitter node kinds that represent constructor invocations
  * (`new Foo()` and friends). Used by extractInstantiation to emit
@@ -126,6 +158,7 @@ const INSTANTIATION_KINDS: ReadonlySet<string> = new Set([
   'instance_creation_expression',    // some grammars
   'composite_literal',               // go — `Widget{...}` / `pkga.Widget{...}`
   'struct_expression',               // rust — `Widget { n: 1 }` / `m::Widget { .. }`
+  'instance_expression',             // scala — `new Monoid[Int] { ... }`
 ]);
 
 /**
@@ -2223,6 +2256,23 @@ export class TreeSitterExtractor {
       return;
     }
 
+    // Scala: `new Monoid[Int] { ... }` — the constructor is a `generic_type`
+    // (or qualified `stable_type_identifier`) using `[...]` type args, which the
+    // generic `<...>` strip below misses. Unwrap to the base type name.
+    if (node.type === 'instance_expression') {
+      const name = scalaBaseTypeName(ctor, this.source);
+      if (name) {
+        this.unresolvedReferences.push({
+          fromNodeId: fromId,
+          referenceName: name,
+          referenceKind: 'instantiates',
+          line: node.startPosition.row + 1,
+          column: node.startPosition.column,
+        });
+      }
+      return;
+    }
+
     let className = getNodeText(ctor, this.source);
     // Strip type-argument suffix first: `new Map<K, V>()` would
     // otherwise produce className 'Map<K, V>' (the constructor
@@ -2612,6 +2662,28 @@ export class TreeSitterExtractor {
         child.type === 'base_clause' || // PHP class extends
         child.type === 'extends_interfaces' // Java interface extends
       ) {
+        // Scala: `extends A[X] with B with C` packs EVERY supertype into the
+        // one extends_clause (separated by `with`), each a `generic_type` /
+        // `type_identifier` / `stable_type_identifier`. The generic path below
+        // takes only namedChild(0) and keeps the full text (`A[X]`), so a
+        // parameterized supertype — every typeclass in cats/algebra — never
+        // matched and `with`-mixed traits past the first were dropped. Iterate
+        // all supertypes and unwrap each to its base type name.
+        if (this.language === 'scala') {
+          for (const target of child.namedChildren) {
+            const name = scalaBaseTypeName(target, this.source);
+            if (name) {
+              this.unresolvedReferences.push({
+                fromNodeId: classId,
+                referenceName: name,
+                referenceKind: 'extends',
+                line: target.startPosition.row + 1,
+                column: target.startPosition.column,
+              });
+            }
+          }
+          continue;
+        }
         // Extract parent class/interface names
         // Java uses type_list wrapper: superclass -> type_identifier, extends_interfaces -> type_list -> type_identifier
         const typeList = child.namedChildren.find((c: SyntaxNode) => c.type === 'type_list');
@@ -2912,7 +2984,7 @@ export class TreeSitterExtractor {
    * Languages that support type annotations (TypeScript, etc.)
    */
   private readonly TYPE_ANNOTATION_LANGUAGES = new Set([
-    'typescript', 'tsx', 'dart', 'kotlin', 'swift', 'rust', 'go', 'java', 'csharp',
+    'typescript', 'tsx', 'dart', 'kotlin', 'swift', 'rust', 'go', 'java', 'csharp', 'scala',
   ]);
 
   /**
@@ -2929,6 +3001,9 @@ export class TreeSitterExtractor {
     // Go
     'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64',
     'float32', 'float64', 'complex64', 'complex128', 'rune', 'error',
+    // Scala (capitalized primitives + ubiquitous stdlib aliases)
+    'Int', 'Long', 'Short', 'Byte', 'Float', 'Double', 'Boolean', 'Char', 'Unit',
+    'String', 'Any', 'AnyRef', 'AnyVal', 'Nothing', 'Null',
   ]);
 
   /**
@@ -2950,10 +3025,19 @@ export class TreeSitterExtractor {
       return;
     }
 
-    // Extract parameter type annotations
-    const params = getChildByField(node, this.extractor.paramsField || 'parameters');
-    if (params) {
-      this.extractTypeRefsFromSubtree(params, nodeId);
+    // Extract parameter type annotations. Scala curries — `def f(a)(implicit
+    // M: TC)` has MULTIPLE `parameters` siblings, and the typeclass is almost
+    // always in the trailing implicit list — so walk every parameter list, not
+    // just getChildByField's first match.
+    if (this.language === 'scala') {
+      for (const pc of node.namedChildren) {
+        if (pc.type === 'parameters') this.extractTypeRefsFromSubtree(pc, nodeId);
+      }
+    } else {
+      const params = getChildByField(node, this.extractor.paramsField || 'parameters');
+      if (params) {
+        this.extractTypeRefsFromSubtree(params, nodeId);
+      }
     }
 
     // Extract return type annotation
@@ -2962,6 +3046,22 @@ export class TreeSitterExtractor {
       this.extractTypeRefsFromSubtree(returnType, nodeId);
     }
 
+    // Scala context bounds / type-parameter bounds: `def f[A: Monoid]`,
+    // `[F[_]: Monad]`, `[A <: Foo]` carry the bound type inside `type_parameters`.
+    // This is THE pervasive way a typeclass is required in Scala, yet the bound
+    // never appears in the value parameters. Param NAMES are `identifier` (not
+    // `type_identifier`), so only the bound types surface. Scala-only: in other
+    // languages a `type_parameters` child holds declaration names as
+    // `type_identifier` (TS `<T>`), which would wrongly surface as refs.
+    if (this.language === 'scala') {
+      const typeParams = node.namedChildren.find(
+        (c: SyntaxNode) => c.type === 'type_parameters'
+      );
+      if (typeParams) {
+        this.extractTypeRefsFromSubtree(typeParams, nodeId);
+      }
+    }
+
     // Extract direct type annotation (for class fields like `model: ITextModel`)
     const typeAnnotation = node.namedChildren.find(
       (c: SyntaxNode) => c.type === 'type_annotation'