# Changelog

All notable changes to CodeGraph are documented here. Each entry also ships as a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged `vX.Y.Z`, which is where most people will look.

This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.9.7] - 2026-05-28

### Added

- **Generated-file down-ranking across search, trace, and explore.** A new filename-based classifier (`src/extraction/generated-detection.ts`) flags protobuf / gRPC / mockgen / build-output files (`.pb.go`, `.pulsar.go`, `_grpc.pb.go`, `_mock.go`, `_mocks.go`, `mock_*.go`, `.generated.[jt]sx`, `_pb2(_grpc)?.py`, `.pb.{cc,h}`, `.g.dart`, `.freezed.dart`) and pushes them LAST in disambiguation. Before this, a `codegraph_search "Send"` on cosmos-sdk returned the gRPC interface stub at `tx_grpc.pb.go:124` as the first match — the trace landed on that empty stub, reported "no path", and the agent fell back to Read. With the down-rank applied to `findSymbol`, `findAllSymbols`, `codegraph_search`, the CLI `query` command, AND the context Entry Points / Related Symbols / Code blocks, the bank keeper's `msgServer.Send` (the real implementation) ranks #3 instead of #9 and trace lands on it directly. Pure path-based classifier — no schema change, no index migration.
- **gRPC interface→implementation bridge for Go.** New synthesizer `goGrpcStubImplEdges` in `src/resolution/callback-synthesizer.ts` finds `UnimplementedXxxServer` structs in `.pb.go` / `_grpc.pb.go` files, identifies their RPC-method signatures (excluding the `mustEmbed*` / `testEmbeddedByValue` gRPC markers), and links each stub method to the hand-written impl method on any struct whose method-name set is a superset. Closes Go's structural-typing gap that the Java/Kotlin-only `interfaceOverrideEdges` couldn't bridge.
Excludes other generated files from candidate impls so a sibling `msgClient` in the same `.pb.go` doesn't get falsely paired. Measured on cosmos-sdk: 467 stub→impl `calls` edges synthesized, bank's `UnimplementedMsgServer::Send` now points only to `x/bank/keeper/msg_server.go::msgServer::Send` — not to mocks, not to client wrappers.
- **Trace-failure response now inlines both endpoints' bodies + neighbors.** When `codegraph_trace` can't find a static call path (typically a dynamic-dispatch break), it used to return a one-liner telling the agent to call `codegraph_node` next — which triggered 3-4 follow-up calls plus a Read. The new failure response inlines each endpoint's source (capped at 120 lines / 3600 chars), callers, and callees in one response. On the cosmos-Q3 / etcd-Q2 audits this eliminated the entire fan-out pattern (5-11 codegraph calls collapsed into 1-2).
- **Path-proximity pairing in trace endpoint selection.** In a multi-module Go repo, a symbol like `EndBlocker` exists in 20+ modules; FTS picks one almost arbitrarily. Trace now scores every `from` × `to` candidate pair by shared directory prefix length (longest match wins) so `x/gov/abci.go::EndBlocker` + `x/gov/keeper/tally.go::Tally` are paired before `simapp/app.go`'s wrapper EndBlocker is even considered. A less-canonical-path penalty (`enterprise/`, `contrib/`, `examples/`, `vendor/`, `third_party/`, `deprecated/`, `legacy/`) ensures a side-module with a longer shared prefix doesn't beat the canonical module with a shorter one. FindPath probe budget capped at 20 pairs.
- **Test-file deprioritization in `codegraph_explore`.** Existing `isLowValue` only caught directory-style patterns (`/tests/`, `/spec/`); now also catches Go's `_test.go`, Ruby's `_spec.rb`, JS/TS `.test.ts` / `.spec.tsx`, and Java/Kotlin/Scala `*Test.java` / `*Spec.kt`. Without this, etcd's `watchable_store_test.go` consumed 5K chars of explore budget that should have gone to the hand-written flow source.
- **Small-repo retrieval tuning (`<500` indexed files).** Three coordinated changes so small projects resolve flow questions in 1-2 MCP calls instead of 3-5. (i) MCP tool surface drops to the 5 core tools (`codegraph_search` / `codegraph_context` / `codegraph_node` / `codegraph_explore` / `codegraph_trace`); the other 5 (`codegraph_callers` / `codegraph_callees` / `codegraph_impact` / `codegraph_status` / `codegraph_files`) cost more in tool-list overhead than they recoup at this scale. Empirically validated as the floor — n=2 audits showed cutting below 5 regresses cobra/ky/sinatra (3-tool gate) and catastrophically regresses express (1-tool gate, +107% LOSS). (ii) `codegraph_context` responses end with a strong directive telling the agent the response IS the comprehensive pass for a project this size and follow-ups should be narrow (`trace from→to`, single-symbol `node`) — not another broad `codegraph_explore` that re-bundles the same content. (iii) Explore output budget gets a sub-150 tier (13K total / 4 files / 3.8K each, Relationships section dropped, test/spec/icon/i18n files hard-excluded from the relevant-file set unless the query is about tests), and `codegraph_context` `maxNodes` defaults to 8 instead of 20.
- **`codegraph_context` auto-traces flow queries.** When the task reads like "how does X reach Y", "trace the path from A to B", or "how does X propagate through Z", `codegraph_context` now runs the trace internally and splices its body into the response. Detection is conservative — needs a flow keyword AND ≥2 distinct PascalCase / camelCase identifiers, with the first two ordered by appearance taken as `from`/`to`. On dynamic-dispatch breaks it falls back to the trace-failure response (which already inlines both endpoint bodies + neighbors). Saves the follow-up `codegraph_trace` that was the #2 cost driver on multi-module flow questions in the audit.
- **Routing-manifest inline in `codegraph_context` for small-repo routing queries.** When the task mentions routes/handlers/endpoints/middleware/etc. on a sub-500-file project, `codegraph_context` now appends a compact URL → handler table built from `route` nodes + their `references`/`calls` edges, then inlines the full source (≤16KB) of the file holding the most handler endpoints. Targets the Glob+Read pattern that was beating codegraph on realworld template repos (rails-realworld, laravel-realworld, drupal-admintoolbar, …) where the agent would just read `routes.rb` / `web.php` instead of asking the graph. Manifest is silently skipped when fewer than 3 non-test routes exist or no file holds ≥30% of them (no single answer file).
- **Core-directory ranking boost in `codegraph_context` search.** Projects with one file holding the dense majority of internal call edges (e.g. sinatra's `lib/sinatra/base.rb` at ~85% of all in-file edges) now get search results in that file's directory boosted by +25 score. Fixes the case where a small extension file with a verbatim name match outranks the actual framework core (sinatra-contrib's `multi_route.rb` `route` was outranking base.rb's `route!`). Test and generated files are excluded from "dominant file" candidacy so etcd's `rpc.pb.go` (1916 in-file edges, generated protobuf) can't beat the hand-written `server/etcdserver/server.go` (470 edges).
- **Interface → implementation synthesis extended beyond JVM.** `interfaceOverrideEdges` previously bridged interface methods to concrete impls in Java/Kotlin only. Now also runs for C#, TypeScript, JavaScript, Swift, and Scala — Swift conformance also iterates `struct` nodes (value-type protocol conformance) alongside `class`. Closes the same structural-typing gap the new Go gRPC bridge closes, for any language where the resolver emits explicit `implements`/`extends` edges.
- **Shorter MCP tool descriptions.** All 10 `codegraph_*` tool descriptions condensed (typically ~50% shorter), keeping the "use this for X / prefer over Y" steering but dropping the longer rationale (which lives in `server-instructions.ts`, the load-bearing channel). Tool-list bytes on the agent side drop proportionally; cumulative across multi-tool sessions.
- **Java / Kotlin imports now resolve by fully-qualified name.** Extraction wraps every top-level declaration of a `.kt` / `.java` file in a `namespace` node carrying the file's `package` (so a class `Bar` in `package com.example.foo` is indexed with qualifiedName `com.example.foo::Bar`), and `import com.example.foo.Bar` looks the target up through that index — regardless of whether the class lives in `Bar.kt`, `Models.kt`, or a top-level function. Disambiguates same-name classes across packages (the central failure mode of the previous name-matcher fallback in multi-module Spring / Android codebases), works across the Java↔Kotlin interop boundary, and lays groundwork for binding-precise Dagger2 / Hilt resolution. Wildcard imports (`com.example.*`) still go through name-matcher.
- **Java / C# anonymous classes (`new T() { ... }`) are now extracted as first-class class nodes with their overrides.** Previously, an anonymous subclass returned from a factory or lambda — `return new BaseIter() { @Override int separatorStart(int s) { ... } };` — produced only an `instantiates` edge: the override methods were invisible to the graph and Phase 5.5 interface-impl synthesis had no class to bridge. The anon class now lands as `` with an `extends` reference to the named base/interface, scoped under the enclosing method, and its `method_declaration` members become normal method nodes. The interface→impl synthesizer then bridges the base's abstract methods to the anonymous overrides automatically.
Concrete effect on `google/guava` (3,227 .java files): 3,608 anonymous classes extracted, +2,534 interface-impl edges reach overrides hidden in `new T() { ... }` blocks (including lambda bodies). An agent investigating `Splitter.SplittingIterator.separatorStart` now sees the four anonymous overrides in its trail without a Read.

### Changed

- **The installer no longer writes a `## CodeGraph` instructions block into your agent's instructions file** (`CLAUDE.md`, `AGENTS.md`, `GEMINI.md`, Cursor's `.cursor/rules/codegraph.mdc`, or Kiro's steering doc). That block duplicated, almost verbatim, the usage guidance the MCP server already emits in its `initialize` response — so every agent that surfaces MCP instructions (Claude Code does) read the same playbook twice each turn (#529). The MCP server instructions are now the single source of truth. `codegraph install` stops writing the block, and **the next time you run `codegraph install` (or `codegraph uninstall`) it strips a block a previous version wrote**, preserving everything else in the file (and deleting Cursor `.mdc` / Kiro steering files that were ours outright). Note: simply upgrading the npm package does not remove an existing block — re-run the installer to clean it up. The leftover block is harmless meanwhile (just redundant with the MCP instructions). If you'd added your own notes inside the ``/`` markers, move them outside the markers first — only the marked block is removed.

### Fixed

- **MCP tools no longer return rows for files deleted while no server was running.** The post-open catch-up sync that reconciles the index against the working tree (catching `git pull`/`checkout`/`rebase` and any edits or deletes made between sessions) was fire-and-forget — so a tool call that landed in the first ~50–300ms could race past it and serve rows for files that no longer exist on disk.
The per-file staleness banner couldn't help here, because that signal is populated by the file watcher (which doesn't see pre-startup changes). Now the first tool call of the session awaits the catch-up before serving; subsequent calls pay nothing. Most visible on the "deleted everything between sessions" case, where MCP now returns the correct empty index instead of stale rows. Validated end-to-end on a 10,640-file VS Code index.
- **Windows: black console windows no longer flash on every file save / MCP reconnect (#485, #510, #530).** v0.9.5 moved the MCP server to a detached shared daemon (#411). Detached processes have no inherited console on Windows, so any console-subsystem child they spawn (the daemon's `git` invocations during auto-sync, the WASM-runtime `node` re-exec, the installer's `npm` shell-out) is created with a fresh console window visible to the user unless the spawn passes `windowsHide: true` (which libuv translates to `STARTF_USESHOWWINDOW | SW_HIDE`, so the window is created hidden and never flashes). All ten `spawnSync` / `execFileSync` / `execSync` call sites across extraction, sync, installer, and the WASM-flags relaunch now pass `windowsHide: true`. macOS/Linux ignore the option, so this is a no-op elsewhere. The daemon launcher itself (`src/mcp/index.ts`) already passed the flag — these children had been missed.
- **`codegraph index` / `init -i` summary now reports the true edge count.** The per-file counter in the orchestrator only saw extraction-phase edges, so resolution and synthesizer edges (often >50% of the graph on cross-file-heavy repos like Spring multi-module Java) were missing from the `X nodes, Y edges` line. Snapshotting the DB before/after the full pipeline now reports the actual additions. Example: indexing `macrozheng/mall` previously reported `20 047 edges` while the DB held `45 629`.
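
The generated-file down-ranking listed under Added for 0.9.7 can be illustrated with a minimal sketch. The filename patterns are the ones named in that entry; everything else (the `isGeneratedFile` and `downRankGenerated` names, the `Candidate` shape, matching against the basename, and the stable partition) is an assumption made for illustration and is not taken from `src/extraction/generated-detection.ts`.

```typescript
// Sketch only: patterns come from the 0.9.7 changelog entry; all names and the
// partition strategy below are hypothetical, not CodeGraph's implementation.
const GENERATED_FILE_PATTERNS: RegExp[] = [
  /\.pb\.go$/,             // protobuf output; also covers _grpc.pb.go
  /\.pulsar\.go$/,
  /_mocks?\.go$/,          // _mock.go and _mocks.go
  /^mock_.*\.go$/,         // mock_*.go, anchored at the start of the basename
  /\.generated\.[jt]sx$/,
  /_pb2(_grpc)?\.py$/,
  /\.pb\.(cc|h)$/,
  /\.g\.dart$/,
  /\.freezed\.dart$/,
];

// Classifies by filename alone, so no file contents or index data are needed.
function isGeneratedFile(filePath: string): boolean {
  // Normalize Windows separators, then keep only the last path segment.
  const basename = filePath.replace(/\\/g, "/").split("/").pop() ?? "";
  return GENERATED_FILE_PATTERNS.some((pattern) => pattern.test(basename));
}

interface Candidate {
  name: string;
  filePath: string;
}

// Stable partition: hand-written candidates keep their relative order and come
// first; generated candidates keep theirs and are pushed last.
function downRankGenerated<T extends Candidate>(candidates: T[]): T[] {
  const handWritten = candidates.filter((c) => !isGeneratedFile(c.filePath));
  const generated = candidates.filter((c) => isGeneratedFile(c.filePath));
  return [...handWritten, ...generated];
}
```

Under these assumptions, a candidate list that starts with `tx_grpc.pb.go` followed by `x/bank/keeper/msg_server.go` comes back with the hand-written `msg_server.go` first. Because the classifier only inspects paths, a sketch like this needs no schema change or index migration, which matches the property the entry calls out.
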
## [0.9.6] - 2026-05-27

- **C/C++ `#include` resolution — bare-basename includes now connect to the actual header file, not a phantom import node (#453).** Path-prefixed includes (`#include "common/args.h"`) already resolved via file-path suffix matching, but bare-basename includes (`#include "uint256.h"` from a caller in another directory) used to leave only a phantom edge to a floating `import` node owned by the including file. The resolver now walks C/C++ include search directories — pulled from `compile_commands.json` (`-I`/`-isystem` flags) when present, otherwise discovered by probing conventional dirs (`include/`, `src/`, `lib/`, `api/`, `inc/`) plus any top-level directory containing `.h`/`.hpp` files — and resolves the include to a real file node, producing a true file→file `imports` edge. System headers (``, ``, ``, ~80 C and ~80 C++ stdlib names) are filtered before the scan so they don't false-resolve via heuristic dir matching. C/C++ built-in symbols (`std::*` unconditionally, plus `printf`/`malloc`/`cout`/`make_shared`/etc. when **no user-defined symbol with that name exists**) are filtered from name-matching too — C/C++ projects routinely shadow stdlib names (custom allocators, stream wrappers, logging libs), so the filter only fires when there's no real definition to bind to. Measured on bitcoin-core (1,989 indexed files): C/C++ file→file `imports` edges 6,027 → 8,086 (**+34%**), false-positive call edges from `std::move`/`std::swap` etc. into similarly-named user methods −2,154 (**−3.6%** of C/C++ `calls`).
- **Enterprise Spring / MyBatis flow now traces end-to-end (#389).** Three gaps that previously forced agents back to grep on large Spring/MyBatis codebases are closed:
  - **MyBatis XML mapper indexing + Java↔XML bridge.** `*.xml` files containing `` are now first-class: each `` and `` becomes a method-shaped node qualified as `::`, and a new synthesizer (`mybatis-java-xml`) links the matching Java mapper interface method → its XML statement with a `calls` edge. `` to a `` fragment in the same mapper also resolves. Non-mapper XML (`pom.xml`, `web.xml`, `log4j.xml`, etc.) emits just a file node — no symbol noise. Validated on macrozheng/mall-tiny: all 6 custom-SQL Java mapper methods reach their XML counterparts; `trace(UmsRoleController.listResource, UmsResourceMapper::getResourceListByRoleId-xml)` connects in 4 hops across controller → service-iface → impl → mapper-iface → XML.
  - **Spring `@Value`/`@ConfigurationProperties` config-key linkage.** `application.{yml,yaml,properties}` (+ profile variants `application-dev.yml`, `bootstrap.yml`, etc.) is parsed during indexing, with one `constant` node per leaf key qualified by its dotted path (`app.cache.name.user-token`). `@Value("${app.cache.name.user-token}")` and `@ConfigurationProperties(prefix = "app.cache")` references in Java/Kotlin emit binding nodes that resolve to the matching key (or, for `@ConfigurationProperties`, a key under the prefix). Spring's **relaxed binding** applies (kebab `cache-list` ↔ camel `cacheList` ↔ snake `cache_list` ↔ `CACHE_LIST`), so a Java `@Value("${app.retryCount}")` finds `app.retry-count` in `application.properties`. `${key:default}` form is supported; the default is stripped before lookup.
  - **Field-injected concrete-bean trace.** A Spring controller's `@Resource(name="userBO") private UserBO userbo;` followed by `this.userbo.toLogin2(...)` now resolves through to `UserBO.toLogin2` even when the field type is a concrete class whose name doesn't match the field by Java naming convention (`userbo` → `UserBO`). The fix is two layered changes in the language layer (Java only): (a) the call extractor unwraps `this.` receivers (previously surfaced as `this.userbo.toLogin2` and dropped through every name-matcher strategy); (b) the resolver looks up the receiver name in the enclosing class's field declarations and uses the declared type to resolve the method. This generalizes beyond Spring — any Java code using `this.field.method()` now resolves correctly.

### Fixed

- **Java/Kotlin imports now disambiguate same-name classes across modules (#314).** A Maven multi-module project where `dao/converter/FooConverter` and `service/converter/FooConverter` both expose a `convert` method used to resolve via file-path proximity — picking whichever class was closer to the caller, which is wrong any time the caller lives in an equidistant cross-cutting module. The import resolver had no Java branch at all (`extractImportMappings` returned `[]` for `.java`/`.kt`), so the FQN signal Java imports carry — `import com.example.dao.converter.FooConverter;` — was being thrown away. New `extractJavaImports` parses regular and `import static` directives. `resolveViaImport` now has a Java/Kotlin cross-file branch that converts the imported FQN to a file-path suffix (`com/example/dao/converter/FooConverter.java`) and resolves the symbol against the file whose path matches.
For the `@Autowired private FooConverter fooConverter; fooConverter.convert(...)` field-receiver pattern (Spring's typical shape), `matchMethodCall` now passes the imported FQN to `resolveMethodOnType` so when multiple `FooConverter::convert` candidates exist, the import — not iteration order — picks the right one. Validated end-to-end on a synthetic two-module repro: swapping only the `import` line on the caller (with identical field declaration and call site) switches the resolved target between dao and service correctly. On spring-petclinic, +15 newly import-resolved Java edges with no regression in `calls`/`imports`/`extends`.
- **TypeScript `type` aliases with object shapes no longer cause cross-module false-positive call edges (#359).** Receiver-typed `handle.stop()` where `handle: RecorderHandle` and `RecorderHandle = { stop: () => Promise }` used to attach the call edge to an unrelated `class Foo { stop() {} }` in a sibling directory via path-proximity matching, because the type alias had no `stop` node — only the look-alike class did. The fix surfaces type-alias object-shape members (and intersection-type members) as first-class `property`/`method` nodes under the alias: `type X = { foo: T; bar(): T }` now produces `X::foo` and `X::bar` in the graph. Function-typed properties (`stop: () => Promise`) are emitted as `method` kind so `obj.stop()` resolves to them; non-function properties remain `property` kind. With the alias's members in the graph, the existing camelCase receiver-name word overlap (`recorder` ↔ `RecorderHandle`) routes the call to the correct alias member instead of the wrong class. Anonymous nested object types inside generic arguments (`Promise<{ ok: true }>`) intentionally don't produce phantom members — only immediate `object_type` / `intersection_type` operands of the alias value are walked.
Measured on excalidraw/excalidraw (314 .ts files): **+776 new property nodes** + **+1,008 method nodes from type-alias members** + **+226 newly accurate `calls` edges** pointing at alias members (some shifted from incorrect class targets, some previously unresolved).
- **C# now produces `references` edges for parameter, return, property, and field types (#381).** Indexing any C# project used to yield **zero** `references` edges, so `codegraph_callers SomeDto` returned no results even when the DTO was used as a parameter or return type across the codebase, and `codegraph_callees` on a service class only saw its `using` imports. Two root causes: `csharp.ts` was missing `returnField`, and the type-leaf walker only matched `type_identifier` nodes — but C# tree-sitter emits `identifier`/`predefined_type`/`qualified_name`/`generic_name` instead. The fix adds the missing extractor field, routes C# through a dedicated type walker that only descends into known type-position fields (so parameter NAMES like `request` in `Build(UserDto request)` never mis-emit as type refs), and hooks `extractField`/`extractProperty` to invoke the walker. Measured on dotnet/eShop (527 `.cs` files): C# `references` edges go from **35 → 925** (+26x), with no regression in `calls`/`imports`/`instantiates`/`extends`/`implements`.
- **Go cross-package qualified calls (`pkga.FuncX(...)`) now resolve to the right package (#388).** On a Go monorepo with a layered package layout (handler/service/domain/dao), `codegraph_callers`, `_callees`, `_impact`, and `_trace` used to return ~0-1 results where grep finds hundreds to thousands of real call sites — the central value proposition of CodeGraph silently degraded on entire Go codebases.
Root cause: the import resolver flagged every Go import path without `/internal/` as third-party (because it had no idea what the project's own module path was), so cross-package calls fell through to name-matching with path-proximity scoring, which on real codebases picks ~one accidental candidate per call site. The Go branch now reads the project's `go.mod`, treats `/...` imports as in-module, and looks up the qualified symbol in the imported package's directory; same-name functions in *different* packages no longer collide. As a side fix, Go nodes now correctly carry `is_exported=1` for capitalized identifiers (the resolver needs this to filter candidates). Measured on gRPC-Go (1,031 `.go` files, layered packages): cross-package `calls` edges go from 10,880 → 19,929 (**+83%**), total `calls` from 23,803 → 34,105 (**+43%**), with no false-positive resolution of stdlib calls (`fmt.Println` etc. stay external).
- **`codegraph_files` now returns the whole project when an agent passes `path="/"`, `"."`, `"./"`, `""`, or a Windows-style `"\\"` — instead of "No files found matching the criteria."** Indexed file paths are stored as project-relative POSIX (e.g. `src/foo.ts`), but the path filter used a plain `startsWith`, so a leading slash or any of the other root-ish shapes an agent might guess matched nothing and pushed the agent back to Read/Glob — the exact opencode + Gemini Flash regression reported on Windows 11. Subdirectory filters are now equally forgiving: `"/src"`, `"./src"`, `"src/"`, `"src\\components"`, etc. all resolve correctly. Sibling-prefix bleed (`"src"` was previously matching `src-utils/...`) is also fixed — the filter now requires either an exact match or a `/` boundary. Closes #426.
- **File watcher no longer marks edited files as fresh when another process holds the index lock.** When a second writer (concurrent `codegraph index`, a git hook, another MCP daemon) held `.codegraph/codegraph.lock`, `CodeGraph.sync()` returned a zero-shape no-op instead of throwing. The file watcher took that as a successful sync and cleared `pendingFiles` — so the per-file staleness signal MCP tools surface to agents (issue #403) dropped immediately, even though the edit was never indexed. `CodeGraph.watch()` now converts that no-op into a typed `LockUnavailableError` thrown into the watcher; the existing retry path preserves `pendingFiles` and reschedules until the lock becomes available. The error is logged at debug only (no `onSyncError` callback) so a long-running external indexer doesn't spam stderr every debounce cycle. Closes #449.
- **TS/JS top-level initializer calls and inline-object-method calls are no longer dropped.** Calls inside a top-level variable initializer (`const token = getTokenMp()`) and inside methods of an inline object literal (`{ methods: { save() { getTokenMp() } } }`) were never walked by the variable / method-definition extractors, so `getTokenMp` showed up nowhere in `codegraph_callers`. The variable extractor now walks any non-object initializer value for calls; the method-definition extractor still avoids creating synthetic nodes for inline-object methods (the noise reason is unchanged) but now walks their bodies so the calls inside aren't lost. Surfaces in plain `.ts`/`.js` files (top-level `const x = foo()`) and in Vue SFCs (`