1
0

TELEMETRY.md 4.3 KB

Telemetry

CodeGraph collects a small set of anonymous usage statistics — which commands and tools get used, which languages get indexed, which agents drive usage — so we can tell which of the 20+ languages and 8 agent integrations deserve the most work. This page is the complete list of what is collected. If a field isn't on this page, it isn't collected; the ingest endpoint enforces this list as an allowlist and is itself public, auditable code in this repository.

Turning it off

Any of these works, permanently:

codegraph telemetry off        # stores your choice (and deletes any unsent data)
export CODEGRAPH_TELEMETRY=0   # per-shell / per-CI override
export DO_NOT_TRACK=1          # the cross-tool standard — always honored

codegraph telemetry status shows the current state, what decided it, and your machine ID. The interactive installer (codegraph install) asks up front with a visible default-on toggle and never re-asks. If you never saw the installer (e.g. npx straight into init), a one-line notice is printed to stderr before the first time anything is sent.

Off means off: when disabled, CodeGraph records nothing, opens no connection to the telemetry endpoint, and sends no "opted out" ping.

What is collected

Every payload carries this envelope:

field example notes
machine_id b3a8c1… random UUID minted on first send — derived from nothing
codegraph_version 0.9.9
os / arch darwin / arm64 platform identifiers only
node_major 22 major version only
ci false whether the CI env var was set
schema_version 1 bumped when this page changes

And one of four events:

  • install — when codegraph install configures agents: which agents (["claude","cursor",…]), global vs project-local, and whether it was a fresh install, an upgrade, or a re-run.
  • index — when a full index completes: the language names present (e.g. ["typescript","go"]), the file count as a coarse bucket (<100, 100-1k, 1k-10k, 10k+), the duration as a bucket (<10s, 10-60s, 1-5m, 5m+), and the SQLite backend (native/wasm).
  • usage_rollup — one line per day per tool: the tool or CLI command name (e.g. codegraph_explore, init), how many times it ran, how many errored, and — for MCP tools — the connecting agent's name and version from the MCP handshake (e.g. Claude Code 2.1).
  • uninstall — when codegraph uninstall/uninit runs: which agents were removed.

Usage is aggregated locally into daily totals before anything is sent — there is no per-call event stream, and nothing is sent in real time.

What is never collected

  • No source code. No file paths, file names, directory names, repository names or URLs, symbol names, search queries, or anything else derived from the contents of an indexed project.
  • No IP addresses. The ingest endpoint never reads, logs, or forwards the client IP, and IP discarding is enabled at the analytics backend on top of that. No geolocation.
  • No fingerprinting. The machine ID is a random UUID stored in ~/.codegraph/telemetry.json — delete that file (or run codegraph telemetry off, then on) and the old ID is gone forever, with no way to reconnect it.
  • No personal data. No usernames, hostnames, emails, or environment variables.

How it travels

Events POST to telemetry.getcodegraph.com — a first-party endpoint whose complete source lives in telemetry-worker/ in this repository. It validates every event and property against the allowlist above (anything else is dropped), strips IPs, rate-limits, and forwards to a managed analytics store (PostHog, US region) as anonymous events. Sends are fire-and-forget with a short timeout: offline or air-gapped machines buffer a bounded local file (256 KB cap) and never retry-loop, log errors, or slow a command down. Telemetry never adds latency to MCP tool calls — recording is an in-memory counter.

The engineering contract behind all of this — including the rule that schema changes must update this page, the client, and the public endpoint in one PR — is in docs/design/telemetry.md.