fix(prompt-hook): record high-tier gate telemetry only when context was actually injected (#1143) (#1149)

gate('high-keyword'/'high-token') sat outside the injection guard, so an
errored or empty codegraph_explore still counted as a HIGH-tier success.
The gate telemetry is the measured recall/precision funnel that decides
whether the tiered gate design survives — a delivery failure must degrade
it toward noop-*, not inflate the high tiers. Failures now record
noop-explore-keyword / noop-explore-token. Doc enum updated (including
the noop-vocab-empty outcome the #1142 fix adds next).

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Colby Mchenry 1 day ago
parent
commit
be55b93d02
3 changed files with 18 additions and 6 deletions
  1. CHANGELOG.md (+1 -1)
  2. docs/design/telemetry.md (+10 -4)
  3. src/bin/codegraph.ts (+7 -1)

+ 1 - 1
CHANGELOG.md

@@ -12,7 +12,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ### New Features
 
 - The Claude Code context hook now recognizes prompts that describe code in plain words — in any language — by checking the prompt's words against the symbol names actually in your project's index. Asking about "the state machine des commandes" finds `OrderStateMachine` with no keyword involved. Confidence decides how much gets injected: structural questions and prompts naming a real symbol still get full context up front; a plain-words match gets a short pointer to the matching symbols so the agent queries them itself; everything else stays silent, exactly as before.
-- Anonymous usage telemetry now counts how often the context hook injected context, offered a hint, or stayed silent — fixed counter names only; the prompt's content is never stored or sent. This makes the hook's accuracy measurable instead of guessed.
+- Anonymous usage telemetry now counts how often the context hook injected context, offered a hint, or stayed silent — fixed counter names only; the prompt's content is never stored or sent. This makes the hook's accuracy measurable instead of guessed. The counters record what actually happened, not what was attempted: a lookup that errors or comes back empty counts as a distinct silent outcome, never as delivered context (#1143, thanks @inth3shadows).
 
 ### Fixes
 

+ 10 - 4
docs/design/telemetry.md

@@ -75,10 +75,16 @@ Event types:
   The prompt hook additionally rolls up its gate DECISION as `cli_command`
   counters named `prompt-hook-gate-<outcome>`, outcome ∈ `high-keyword` /
   `high-token` / `medium-segment` / `nudge-projects` / `noop-shape` /
-  `noop-no-index` / `noop-unverified` — decision names only, never prompt
-  content. This is the gate's measured recall/precision funnel: a rising
-  `noop-*` share against the `high`/`medium` tiers is the signal that the
-  gate (keyword table or segment matching) is missing real questions.
+  `noop-no-index` / `noop-unverified` / `noop-explore-keyword` /
+  `noop-explore-token` / `noop-vocab-empty` — decision names only, never
+  prompt content. This is the gate's measured recall/precision funnel: a
+  rising `noop-*` share against the `high`/`medium` tiers is the signal that
+  the gate (keyword table or segment matching) is missing real questions.
+  A `high-*` outcome means context was actually injected — a gate decision
+  whose `codegraph_explore` errored or returned nothing records
+  `noop-explore-<trigger>` instead (#1143), and a MEDIUM-eligible prompt
+  hitting a not-yet-backfilled segment vocabulary records `noop-vocab-empty`
+  rather than polluting `noop-unverified` (#1142).
 - **`uninstall`** — one per `uninstall`/`uninit` run (churn signal). Props: `targets`.
 
 Volume math: rollups mean monthly events ≈ active machines × active days × distinct

+ 7 - 1
src/bin/codegraph.ts

@@ -1141,8 +1141,14 @@ program
               process.stdout.write(
                 `<codegraph_context note="Structural context from CodeGraph for this prompt — treat returned source as already read; ${more}.">\n${body}${others}\n</codegraph_context>\n`,
               );
+              gate(keyworded ? 'high-keyword' : 'high-token');
+            } else {
+              // A high-* outcome must mean context was actually delivered —
+              // the funnel's noop-vs-high split is how gate recall is
+              // measured (#1143). An explore error or empty result is a
+              // delivery failure, not a gate success.
+              gate(keyworded ? 'noop-explore-keyword' : 'noop-explore-token');
             }
-            gate(keyworded ? 'high-keyword' : 'high-token');
             return;
           }
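
The rule this change enforces can be sketched in isolation: a HIGH-tier counter is chosen from what was delivered, not from what the gate decided. This is a minimal sketch, not the code in `src/bin/codegraph.ts`. The names `GateOutcome`, `ExploreResult`, `highTierOutcome`, and `noopShare` are hypothetical, and the shape of the explore result is an assumption. Only the four counter names come from the diff above.

```typescript
// Hypothetical types and helpers for illustration only; CodeGraph's real
// hook is structured differently. The outcome names match the diff.
type GateOutcome =
  | 'high-keyword'
  | 'high-token'
  | 'noop-explore-keyword'
  | 'noop-explore-token';

// Assumed shape of an explore call's result.
interface ExploreResult {
  ok: boolean;  // false when the explore call errored
  body: string; // rendered context; empty when nothing was found
}

// Choose the counter from the delivery result. An error or an empty body
// is a delivery failure, so it degrades to noop-explore-<trigger>.
function highTierOutcome(keyworded: boolean, result: ExploreResult): GateOutcome {
  const delivered = result.ok && result.body.trim().length > 0;
  if (delivered) {
    return keyworded ? 'high-keyword' : 'high-token';
  }
  return keyworded ? 'noop-explore-keyword' : 'noop-explore-token';
}

// Share of all recorded gate outcomes that were noop-*; the funnel signal
// described in docs/design/telemetry.md. Returns 0 when nothing was recorded.
function noopShare(counters: Record<string, number>): number {
  let noop = 0;
  let total = 0;
  for (const name of Object.keys(counters)) {
    const count = counters[name];
    total += count;
    if (name.startsWith('noop-')) {
      noop += count;
    }
  }
  return total === 0 ? 0 : noop / total;
}
```

Under the old placement, the failed explore in a tally such as `{ 'high-keyword': 3, 'noop-explore-keyword': 1 }` would have been counted as a fourth `high-keyword`, so the noop share would have read 0 instead of 0.25.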