
feat(impact): cross-language static-member / value-read pass — link Enum.value, Type.CONST, Foo::BAR

The recurring residual across Dart, Java, C#, Swift, Kotlin, Scala, PHP: a type
used only through a static member or enum VALUE recorded no edge. The body walker
handled CALLS (`Type.method()`) and `new`, but reading an enum value
(`MediaKind.video`), a static field (`Colors.red`, `JsonScope.EMPTY_DOCUMENT`), or
a class constant (`Foo::BAR`) produced nothing — so an enum/constants type used
purely by value looked like nothing depended on it.

`extractStaticMemberRef` (run in the function-body walk) emits a `references` edge
to the capitalized receiver of a member-access VALUE read. Handles each
language's member-access node (`field_access` Java, `member_access_expression`
C#, `navigation_expression` Kotlin/Swift, `field_expression` Scala,
`class_constant_access_expression`/`scoped_property_access_expression` PHP,
`qualified_identifier` C++, and Dart's `identifier` + sibling `selector`).
Skipped when the access is a call's callee (already linked). The receiver must be a
simple Capitalized identifier, so a lowercase `obj.field` / `pkg.func` never
triggers, and the pass is gated to languages where types are Capitalized by
convention. TS/JS/Python are deliberately excluded: their coverage is already high
and they drive the retrieval-performance benchmark.
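To make these rules concrete, here is a minimal, self-contained TypeScript sketch of the per-node decision. It is not the commit's code: `MiniNode` is a hypothetical stand-in for tree-sitter's `SyntaxNode` (only the members used here), `CALL_TYPES` is an assumed subset standing in for the extractor's `callTypes`, and instead of pushing an unresolved reference the function just returns the type name that would be linked, or `null`.

```typescript
// Hypothetical stand-in for a tree-sitter SyntaxNode: only what this sketch needs.
interface MiniNode {
  type: string;
  text: string;
  startIndex: number;
  fields?: Record<string, MiniNode>;
  namedChildren: MiniNode[];
  parent?: MiniNode;
  previousNamedSibling?: MiniNode;
}

const MEMBER_ACCESS = new Set([
  'field_access', 'member_access_expression', 'navigation_expression',
  'field_expression', 'class_constant_access_expression',
  'scoped_property_access_expression', 'qualified_identifier',
]);
const GATED_LANGS = new Set(['java', 'csharp', 'kotlin', 'swift', 'scala', 'dart', 'php', 'cpp']);
// Assumption: a small subset of call node kinds, standing in for `callTypes`.
const CALL_TYPES = new Set(['call_expression', 'method_invocation', 'invocation_expression']);
const IDENT_TYPES = new Set(['identifier', 'type_identifier', 'simple_identifier', 'name', 'scoped_type_identifier']);
const CAPITALIZED = /^[A-Z][A-Za-z0-9_]*$/;

function field(node: MiniNode, name: string): MiniNode | undefined {
  return node.fields?.[name];
}

// Returns the receiver type name to link, or null when no reference is emitted.
function staticMemberTarget(lang: string, node: MiniNode): string | null {
  if (!GATED_LANGS.has(lang)) return null;

  // Dart: a value-read `selector` (no argument_part) after a Capitalized identifier.
  if (lang === 'dart') {
    if (node.type !== 'selector') return null;
    if (node.namedChildren.some((c) => c.type === 'argument_part')) return null;
    const prev = node.previousNamedSibling;
    if (!prev || prev.type !== 'identifier') return null;
    return CAPITALIZED.test(prev.text) ? prev.text : null;
  }

  if (!MEMBER_ACCESS.has(node.type)) return null;

  // Skip `Type.method()`: the access is the callee, which the call pass links.
  const parent = node.parent;
  if (parent && CALL_TYPES.has(parent.type)) {
    const callee = field(parent, 'function') ?? field(parent, 'method') ?? parent.namedChildren[0];
    if (callee && callee.startIndex === node.startIndex) return null;
  }

  // Receiver lookup: try the per-grammar field names, then fall back to the first named child.
  const recv = field(node, 'object') ?? field(node, 'expression') ?? field(node, 'scope') ?? node.namedChildren[0];
  if (!recv || !IDENT_TYPES.has(recv.type)) return null;
  return CAPITALIZED.test(recv.text) ? recv.text : null;
}

function leaf(type: string, text: string, startIndex = 0): MiniNode {
  return { type, text, startIndex, namedChildren: [] };
}

// Java: `JsonScope.EMPTY_DOCUMENT` read as a value.
const javaRecv = leaf('identifier', 'JsonScope', 10);
const javaAccess: MiniNode = {
  type: 'field_access',
  text: 'JsonScope.EMPTY_DOCUMENT',
  startIndex: 10,
  fields: { object: javaRecv, field: leaf('identifier', 'EMPTY_DOCUMENT', 20) },
  namedChildren: [javaRecv, leaf('identifier', 'EMPTY_DOCUMENT', 20)],
};
console.log(staticMemberTarget('java', javaAccess)); // JsonScope
console.log(staticMemberTarget('typescript', javaAccess)); // null (language not gated in)
```

Returning a name instead of mutating extractor state keeps the rule testable on its own; in the commit, the name is pushed as an unresolved `references` reference and linked later by reference resolution.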

Measured: flutter/packages (Dart) 92.4% → 93.2% (the enum-value files); additive
across the other languages. Node count stable (edges only). Verified per-language
(Java/Kotlin/PHP/Swift/Dart/Scala emit the ref to the right type). Full suite
green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry 2 weeks ago
parent
commit 857baf76b8
3 changed files with 160 additions and 0 deletions
  1. CHANGELOG.md (+1 -0)
  2. __tests__/extraction.test.ts (+57 -0)
  3. src/extraction/tree-sitter.ts (+102 -0)

+ 1 - 0
CHANGELOG.md

@@ -24,6 +24,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Kotlin Multiplatform `expect`/`actual` declarations are now connected. A platform implementation — `actual fun`, `actual class`, or an `actual typealias` in a `jvm` / `native` / `js` / `wasm` source set — is linked to the common `expect` declaration it fulfills (including the common case of an `expect class` fulfilled by an `actual typealias`). Previously a caller in common code resolved to the `expect` declaration, so every platform `actual` looked like it had no dependents and editing one showed an empty blast radius; now changing a platform implementation surfaces the common API and everything that uses it. (Kotlin)
 - Scala impact and `codegraph affected` now connect the type graph that typeclass-style code is built on. A parameterized supertype (`trait Monoid[A] extends Semigroup[A] with Serializable`) now links to each parent; a type used in a `val`/`def` signature, as a type argument, or as a context bound (`def f[A: Monoid]`) — including the trailing implicit parameter list (`(implicit M: Monoid[A])`) where typeclass instances are passed — now records a dependency; and `new T[...] { … }` counts as an instantiation. Previously Scala linked only plain calls and bare, non-generic supertypes, so a trait extended with type parameters, used as a type, or required as an implicit looked like nothing depended on it — which on a typeclass-heavy codebase (cats, algebra) was most of the graph. (Scala)
 - PHP impact and `codegraph affected` now understand namespaces and `use` imports. Classes are tracked by their namespaced name, so the many same-named classes a framework defines (Laravel has 7+ `Factory` interfaces, several `Dispatcher`s, across namespaces) are told apart instead of collapsing into one arbitrary match. A `use App\Contracts\Cache\Factory;` now records a dependency on exactly that class — so a contract or interface that's imported and constructor-injected (the dependency-injection pattern) is no longer reported as having no dependents — and parameter, property, and return type-hints are recorded too. Previously PHP ignored namespaces entirely and linked only calls, `new`, and inheritance. (PHP)
+- A type referenced only through a static member or enum value now records a dependency. Reading an enum value (`MediaKind.video`), a static constant (`Colors.red`, `JsonScope.EMPTY_DOCUMENT`), or a class constant (`Foo::BAR`) now links to the type — previously only method calls and `new` did, so a type or enum used purely *by value* (enum-heavy APIs, constants classes — a very common pattern) looked like nothing depended on it. Applies to Java, C#, Kotlin, Swift, Scala, Dart, PHP, and C++.
 - Dart impact and `codegraph affected` now follow mixins and method type annotations. A `with` mixin — Dart's core composition mechanism, which Flutter is built on — now records a dependency, so editing a mixin surfaces every class that mixes it in (the whole `with` clause used to be dropped, and a class declared `with M` alone even lost its real superclass link). And types used in a method's parameters or return value now link to their definition, so a class or enum referenced only as a type — not constructed or called — is no longer reported as having no dependents. (Dart)
 - C++ free functions are now indexed under their real name. A function written with a qualified-type parameter (`std::string TableFileName(const std::string& dbname)`) or an `auto … -> std::string` trailing return type was mistakenly named after that type (`string`), so calls to it never resolved, `codegraph_node` couldn't find it by name, and the file defining it looked like nothing depended on it. The function now keeps its real name, so cross-file calls, callers, and blast radius work — a meaningful gain for any namespaced C++ codebase (this is how most free functions in a library look). (C++)
 - Ruby impact and `codegraph affected` now follow mixins and `require`s. `include`, `extend`, and `prepend` of a module — Ruby's primary composition mechanism (ActiveSupport concerns, `Comparable`, `Enumerable`) — now record a dependency on that module, so editing a concern surfaces every class that mixes it in; previously these were read as a call to a method named `include`, so a module whose methods are exercised only by application code looked like nothing depended on it. And `require "lib/foo"` / `require_relative "../foo"` now link to the required file, so a file pulled in only by a `require` (config-loaded components, gems that don't autoload) is no longer reported as having no dependents. Together these took a typical gem from ~71% of its files showing real dependents to ~100%. (Ruby)

+ 57 - 0
__tests__/extraction.test.ts

@@ -3686,6 +3686,63 @@ class UserService extends Repository with Loggable {
   });
 });
 
+describe('Static-member / value-read references', () => {
+  let tempDir: string;
+  let cg: CodeGraph;
+
+  beforeEach(() => {
+    tempDir = createTempDir();
+  });
+
+  afterEach(() => {
+    if (cg) cg.close();
+    if (fs.existsSync(tempDir)) fs.rmSync(tempDir, { recursive: true, force: true });
+  });
+
+  it('links a type referenced only via a static field / enum value (and ignores lowercase receivers)', async () => {
+    fs.writeFileSync(
+      path.join(tempDir, 'JsonScope.java'),
+      `class JsonScope {
+  static final int EMPTY_DOCUMENT = 1;
+}
+`
+    );
+    fs.writeFileSync(
+      path.join(tempDir, 'Reader.java'),
+      `class Reader {
+  private int helper;
+  int peek() {
+    return JsonScope.EMPTY_DOCUMENT;
+  }
+  int noop() {
+    return this.helper;
+  }
+}
+`
+    );
+
+    cg = CodeGraph.initSync(tempDir);
+    await cg.indexAll();
+    cg.resolveReferences();
+
+    // JsonScope is used ONLY as `JsonScope.EMPTY_DOCUMENT` (a static-field value
+    // read — never constructed or called), so before the static-member pass it
+    // had no dependents. Editing it now surfaces Reader.java.
+    const scope = cg.getNodesByKind('class').find((n) => n.name === 'JsonScope');
+    expect(scope, 'JsonScope indexed').toBeDefined();
+    const reached = [...cg.getImpactRadius(scope!.id, 3).nodes.values()].map((n) => n.filePath ?? '');
+    expect(reached.some((p) => p.endsWith('Reader.java'))).toBe(true);
+
+    // A lowercase receiver (`this.helper`) must NOT be emitted as a type ref;
+    // only Capitalized receivers (types) are. This is a coarse guard: it only
+    // checks that no class node named `this`/`helper` exists in the graph.
+    const refTargets = cg
+      .getNodesByKind('class')
+      .filter((n) => n.name === 'this' || n.name === 'helper');
+    expect(refTargets.length).toBe(0);
+  });
+});
+
 describe('Full Indexing', () => {
   let tempDir: string;
 

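The test's key assertion is that `getImpactRadius(scope!.id, 3)` now reaches `Reader.java`. As a rough illustration of why a single extra `references` edge changes that result, here is a depth-limited reverse breadth-first search over a plain edge list. This is an assumption about how an impact radius can be computed, not CodeGraph's implementation; `Edge` and `impactRadius` are hypothetical names.

```typescript
// Hypothetical edge shape: `from` depends on `to` (calls it, references it, ...).
interface Edge {
  from: string;
  to: string;
  kind: string;
}

// Everything that transitively depends on `target`, within `maxDepth` hops,
// found by walking edges backwards from the edited node.
function impactRadius(edges: Edge[], target: string, maxDepth: number): Set<string> {
  // Index incoming edges: node id -> ids of nodes that depend on it.
  const incoming = new Map<string, string[]>();
  for (const e of edges) {
    const list = incoming.get(e.to) ?? [];
    list.push(e.from);
    incoming.set(e.to, list);
  }

  const reached = new Set<string>([target]);
  let frontier = [target];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const dependent of incoming.get(id) ?? []) {
        if (!reached.has(dependent)) {
          reached.add(dependent);
          next.push(dependent);
        }
      }
    }
    frontier = next;
  }
  return reached;
}

// Before the pass, nothing points at JsonScope; after it, Reader.peek references it.
const before: Edge[] = [];
const after: Edge[] = [
  { from: 'Reader.peek', to: 'JsonScope', kind: 'references' },
  { from: 'App.main', to: 'Reader.peek', kind: 'calls' },
];
console.log([...impactRadius(before, 'JsonScope', 3)]); // [ 'JsonScope' ]
console.log([...impactRadius(after, 'JsonScope', 3)]); // [ 'JsonScope', 'Reader.peek', 'App.main' ]
```

The `reached` set doubles as the visited set, so cycles in the dependency graph cannot make the walk loop.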
+ 102 - 0
src/extraction/tree-sitter.ts

@@ -159,6 +159,35 @@ const PHP_TYPE_NODES: ReadonlySet<string> = new Set([
   'primitive_type',
 ]);
 
+/**
+ * Member-access node kinds whose receiver, when it's a capitalized
+ * type/enum/class name, is a real dependency — `Enum.value`, `Type.CONST`,
+ * `Foo::BAR`. These VALUE reads (as opposed to `Type.method()` calls, already
+ * handled) produced no edge, so a type used only via a static member or enum
+ * value looked like nothing depended on it. See {@link extractStaticMemberRef}.
+ */
+const MEMBER_ACCESS_TYPES: ReadonlySet<string> = new Set([
+  'field_access',                       // java (`Foo.BAR`)
+  'member_access_expression',           // c#  (`Foo.Bar`)
+  'navigation_expression',              // kotlin / swift (`Foo.bar`)
+  'field_expression',                   // scala (`Foo.bar`)
+  'class_constant_access_expression',   // php (`Foo::CONST`, `Foo::class`)
+  'scoped_property_access_expression',  // php (`Foo::$bar`)
+  'qualified_identifier',               // c++ (`Foo::bar`)
+]);
+
+/**
+ * Languages whose types are Capitalized by convention, so a capitalized
+ * member-access receiver is reliably a type (not a local/variable). The
+ * static-member/value-read pass is gated to these — the ones where it was the
+ * confirmed residual frontier (enum-value / static-field reads). TS/JS/Python
+ * are deliberately excluded: their coverage was already high and they drive the
+ * retrieval-performance benchmark, so there's no need to perturb their graph.
+ */
+const STATIC_MEMBER_LANGS: ReadonlySet<string> = new Set([
+  'java', 'csharp', 'kotlin', 'swift', 'scala', 'dart', 'php', 'cpp',
+]);
+
 /**
  * Tree-sitter node kinds that represent constructor invocations
  * (`new Foo()` and friends). Used by extractInstantiation to emit
@@ -2398,6 +2427,76 @@ export class TreeSitterExtractor {
     }
   }
 
+  /**
+   * Static-member / value-read pass. A type/enum/class used only via a member
+   * VALUE — `Enum.value`, `Type.CONST`, `Colors.red`, `Foo::BAR` — recorded no
+   * edge, because the body walker only handled CALLS (`Type.method()`). So a
+   * type referenced only by an enum value or a static field looked like nothing
+   * depended on it (the residual frontier across Dart/Java/C#/Swift/Kotlin/PHP).
+   * Emit a `references` edge to the capitalized receiver. Gated to languages
+   * where types are Capitalized by convention, and skipped when the access is a
+   * call's callee (the call extractor already links the method).
+   */
+  private extractStaticMemberRef(node: SyntaxNode): void {
+    if (!STATIC_MEMBER_LANGS.has(this.language)) return;
+    if (this.nodeStack.length === 0) return;
+    const ownerId = this.nodeStack[this.nodeStack.length - 1];
+    if (!ownerId) return;
+
+    // Dart structures member access as an `identifier` + a sibling `selector`,
+    // not a single node. A value-read selector (no `argument_part`) whose
+    // previous sibling is a capitalized identifier is `Enum.value`.
+    if (this.language === 'dart') {
+      if (node.type !== 'selector') return;
+      if (node.namedChildren.some((c: SyntaxNode) => c.type === 'argument_part')) return;
+      const prev = node.previousNamedSibling;
+      if (prev?.type === 'identifier' && /^[A-Z][A-Za-z0-9_]*$/.test(prev.text)) {
+        this.pushStaticMemberRef(prev.text, ownerId, prev);
+      }
+      return;
+    }
+
+    if (!MEMBER_ACCESS_TYPES.has(node.type)) return;
+
+    // Skip `Type.method()` — the access is the callee of a call, already linked.
+    const parent = node.parent;
+    if (parent && this.extractor!.callTypes.includes(parent.type)) {
+      const callee =
+        getChildByField(parent, 'function') ??
+        getChildByField(parent, 'method') ??
+        parent.namedChild(0);
+      if (callee && callee.startIndex === node.startIndex) return;
+    }
+
+    // The receiver must be a SIMPLE capitalized identifier — `Type.X`, not the
+    // nested `a.B.c` (whose own head member-access is visited separately) nor a
+    // lowercase `obj.field` / `pkg.func`.
+    const recv =
+      getChildByField(node, 'object') ??
+      getChildByField(node, 'expression') ??
+      getChildByField(node, 'scope') ??
+      node.namedChild(0);
+    if (!recv) return;
+    const t = recv.type;
+    if (
+      t === 'identifier' || t === 'type_identifier' || t === 'simple_identifier' ||
+      t === 'name' || t === 'scoped_type_identifier'
+    ) {
+      const text = getNodeText(recv, this.source);
+      if (/^[A-Z][A-Za-z0-9_]*$/.test(text)) this.pushStaticMemberRef(text, ownerId, recv);
+    }
+  }
+
+  private pushStaticMemberRef(name: string, ownerId: string, node: SyntaxNode): void {
+    this.unresolvedReferences.push({
+      fromNodeId: ownerId,
+      referenceName: name,
+      referenceKind: 'references',
+      line: node.startPosition.row + 1,
+      column: node.startPosition.column,
+    });
+  }
+
   /**
    * Find a `class_body` child of an `object_creation_expression` — the
    * marker for an anonymous class (`new T() { ... }`). Returns the body
@@ -2640,6 +2739,9 @@ export class TreeSitterExtractor {
         }
       }
 
+      // Static-member / value-read: `Enum.value`, `Type.CONST`, `Foo::BAR`.
+      this.extractStaticMemberRef(node);
+
       // Local variable type annotations inside a body — `const items: Foo[] = []`,
       // `const x: SomeType = svc.load()`. We deliberately do NOT create nodes for
       // locals (that would explode the graph — the data-flow frontier we leave