1
0

telemetry.md 11 KB

Anonymous usage telemetry

Status: implemented — ingest Worker (telemetry-worker/), client (src/telemetry/), codegraph telemetry CLI, MCP + installer wiring, TELEMETRY.md. Pending: Worker deploy

  • DNS, release. Scope: public codegraph engine (CLI + MCP server + installer)

CodeGraph is a local-first tool whose whole pitch is "your code never leaves your machine." Telemetry has to be designed so that sentence stays true and provable: a short, auditable list of anonymous counters, documented field-by-field, easy to turn off, and impossible to grow quietly. This doc is the contract; TELEMETRY.md (repo root, user-facing) restates it and the implementation must never collect anything not listed there.

Goals

Answer, in aggregate and anonymously:

  • How many machines actively use codegraph (daily/weekly), and how does that change?
  • Which agents drive usage (Claude Code, Cursor, Codex, opencode, …) — via MCP clientInfo.
  • Which install targets people pick, local vs global, fresh vs upgrade.
  • Which MCP tools and CLI commands get used, how often, and how often they error.
  • Which languages people index (prioritize extractor/framework work by real usage).
  • Version adoption speed, OS/arch/Node mix, native-vs-wasm SQLite backend share.

Non-goals / never collected

  • No source code, ever. No file paths, file names, repo names, symbol names, query strings, search terms, or anything derived from the contents of an indexed project.
  • No IP addresses (stripped at the edge; storage disabled at the backend too).
  • No hardware fingerprinting — the machine ID is a random UUID, not derived from anything.
  • No per-keystroke / per-call event stream — usage is aggregated locally into daily rollups before anything is sent.
  • No telemetry from the codegraph-pro fork (see "codegraph-pro rule" below).

Principles

  1. The schema is the allowlist. Client sends only the events below; the ingest Worker validates against the same allowlist and drops anything else. Adding a field = PR that edits this doc + TELEMETRY.md + the Worker allowlist together.
  2. Telemetry may never cost the user anything: zero added latency on the MCP tool-call hot path (the repo's core invariant), zero new npm dependencies (global fetch, Node ≥18), zero bytes on stdout (stdio is the MCP protocol channel), zero retries, zero error noise. Every failure mode is silence.
  3. Off is off. When disabled, no process opens a socket to the telemetry endpoint — not even an "opted out" ping.
  4. First-party endpoint. Clients only ever talk to telemetry.getcodegraph.com. The URL baked into a published npm version POSTs there forever, so the domain must be ours; the backend behind it can change without a client release.

Events

Common envelope on every batch (computed once per process):

field example notes
machine_id b3a8… (UUIDv4) random, minted at first run, stored in global config
codegraph_version 0.9.12 from package.json
os / arch darwin / arm64 process.platform / process.arch
node_major 22 major only
ci false CI env var present
schema_version 1 bump when the schema changes

Event types:

  • install — one per installer run. Props: targets (e.g. ["claude","cursor"]), scope (local/global), kind (fresh/upgrade/reinstall), sqlite_backend (native/wasm).
  • index — one per full index (init/index, not per sync). Props: languages (names only, e.g. ["typescript","go"]), file_count_bucket (<100, 100-1k, 1k-10k, 10k+), duration_bucket (<10s, 10-60s, 1-5m, 5m+), sqlite_backend.
  • usage_rollup — the workhorse. One event per (day, kind, name) per machine, aggregated locally. Props: kind (mcp_tool/cli_command), name (e.g. codegraph_explore, affected), count, error_count, and for MCP: client_name/client_version from the initialize handshake (src/mcp/session.ts case 'initialize' — plumbing to add; currently unread).
  • uninstall — one per uninstall/uninit run (churn signal). Props: targets.

Volume math: rollups mean monthly events ≈ active machines × active days × distinct tools used (single digits) — the PostHog free tier (1M events/mo) covers tens of thousands of MAU. There is no per-call event by design.

Events are sent as PostHog anonymous events ($process_person_profile: false): cheaper, no person profiles, unique-machine counts still work on distinct_id = machine_id. Revisit only if retention tooling demands profiles.

Consent & controls

Resolution order (first match wins):

  1. DO_NOT_TRACK=1 (community standard — always honored) → off
  2. CODEGRAPH_TELEMETRY=0|1 → forced off/on for that process
  3. Global config ~/.codegraph/telemetry.json → stored user choice
  4. Default: on, gated by the first-run notice below

Surfaces:

  • Installer (interactive): a visible clack toggle in the existing prompt flow — "Share anonymous usage data? (no code, paths, or names — see TELEMETRY.md)" — default yes. Choice persisted with consent_source: "installer". Re-runs/upgrades respect the stored choice and don't re-ask.
  • Headless paths (npx codegraph init, MCP server — no TTY, never prompt): right before the first actual send (recording only buffers locally and stays silent — so the installer's explicit toggle always precedes any notice), print one line to stderr and record first_run_notice_shown: codegraph collects anonymous usage stats (no code or paths) — "codegraph telemetry off" or CODEGRAPH_TELEMETRY=0 disables. Details: TELEMETRY.md
  • CLI: codegraph telemetry status|on|off (status prints the machine ID, current state, and what decided it). Deleting ~/.codegraph/telemetry.json resets everything, including the machine ID.

~/.codegraph/telemetry.json:

{
  "enabled": true,
  "machine_id": "uuid-v4",
  "consent_source": "installer | default-notice | cli",
  "first_run_notice_shown": true,
  "updated_at": "2026-06-12T00:00:00Z"
}

(~/.codegraph/ is new — today nothing global exists. Coexists by filename if a user ever indexes $HOME itself, since per-project data lives in <project>/.codegraph/ with fixed other filenames.)

Client architecture

New module src/telemetry/ (single small module, no deps):

  • Counters in memory — recording a tool call/CLI command is an in-memory increment. Nothing on the hot path touches disk or network. MCP tool handlers call telemetry.count('mcp_tool', name, ok) and move on.
  • Buffer — counters persist (debounced, async) to ~/.codegraph/telemetry-queue.jsonl. Hard cap ~256 KB; on overflow drop oldest lines. Corrupt buffer → truncate, never throw.
  • Flush — many CLI actions end via process.exit(), where beforeExit never fires and async sends die, so the design is: a tiny synchronous append on process.on('exit') persists in-memory deltas (survives process.exit), and actual network sends happen opportunistically — at the start of long-running commands (init/index/sync/ uninit/upgrade), on an unref'd interval in the long-lived MCP server/daemon, and awaited-with-cap at the end of install/init/index/uninit where a second is invisible. Sends POST completed-day rollups + lifecycle events to https://telemetry.getcodegraph.com/v1/events with AbortSignal.timeout(1500), fire-and-forget: any response (or none) is final — no retry, no error surfaced. The queue is claimed by atomic rename so concurrent processes can't double-send (a crashed sender's claim merges back after an hour). CODEGRAPH_TELEMETRY_DEBUG=1 echoes payloads to stderr for development.
  • Offline / air-gapped: flush fails silently, buffer stays within cap, steady state is a bounded file and zero noise.

Ingest endpoint (Cloudflare Worker)

telemetry.getcodegraph.com → small Worker living at telemetry-worker/ in this repo — public on purpose, so anyone can audit exactly what the endpoint stores. It ships nowhere with the npm package (excluded by the files allowlist):

  • POST /v1/events: validate against the event/property allowlist (drop unknown events, strip unknown props), enforce sane sizes, never forward or log the client IP (drop CF-Connecting-IP), light per-machine_id rate limit so abuse can't burn the ingest cap, forward to https://us.i.posthog.com/batch/ with the project key from a Worker secret. Responds 204 on accept (including events dropped by the allowlist) and honest 4xx for malformed/oversized/rate-limited requests — the client treats every response as final and never retries.
  • Backend today: PostHog Cloud US, free plan, "discard client IP" enabled, GeoIP disabled, autocapture/replay/heatmaps/web-vitals all off. The Worker is the seam: swapping the backend later is a Worker change, not a client release.

codegraph-pro rule (do not lose this in upstream merges)

The private codegraph-pro fork ships inside customer containers whose guarantee is "nothing leaves the box" — including telemetry. In the fork, telemetry must be default-off and not enableable by the installer (compile-time constant or stripped module), and the container sets CODEGRAPH_TELEMETRY=0 as belt-and-braces. This rule lives in the fork's CLAUDE.md and must survive every upstream merge.

Rollout

  1. This doc + repo-root TELEMETRY.md (user-facing field-by-field list) + README section.
  2. Worker + DNS live first (so the first shipping client never 404s), PostHog dashboards: weekly active machines, installs by target, usage by tool × client, version adoption, languages indexed.
  3. Client module + config + codegraph telemetry subcommand + MCP clientInfo plumbing.
  4. Installer toggle + first-run notice. CHANGELOG entry under [Unreleased] announcing telemetry, the default, and every off-switch. Release.

Tests (no DB mocking, per repo convention; fetch mocked at globalThis.fetch): consent precedence (env > config > default), off ⇒ zero fetch calls, rollup aggregation across days, buffer cap + corrupt-buffer recovery, no-stdout invariant under MCP transport, flush abort honors timeout, installer toggle persists + re-run doesn't re-ask (__tests__/installer-targets.test.ts per house rules).

Open questions

  • Exact installer copy / notice wording — maintainer call before release.
  • uninstall event: keep or drop? (Honest churn signal vs. "pinging on the way out" optics.)
  • CI events are kept (tagged ci: true) because engine-in-CI is a real usage mode — revisit if it ever dominates volume.