Merge remote-tracking branch 'origin/main' into garrytan/prompt-injection-guard

2026-05-21 20:28:24 +08:00 · 2026-04-20 05:04:45 +08:00
parent 60a1531124 22a4451e0e
commit d66fac53bf
113 changed files with 12732 additions and 4425 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -139,10 +139,16 @@ SKILL.md files are **generated** from `.tmpl` templates. To update docs:
 To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
 To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.

-**Token ceiling:** Generated SKILL.md files must stay under 100KB (~25K tokens).
-`gen-skill-docs` warns if any file exceeds this. If a skill template grows past the
-ceiling, consider extracting optional sections into separate resolvers that only
-inject when relevant, or making verbose evaluation rubrics more concise.
+**Token ceiling:** Generated SKILL.md files trip a warning above 160KB (~40K tokens).
+This is a "watch for feature bloat" guardrail, not a hard gate. Modern flagship
+models have 200K-1M context windows, so 40K is 4-20% of window, and prompt caching
+makes the marginal cost of larger skills small. The ceiling exists to catch runaway
+preamble/resolver growth, not to force compression on carefully-tuned big skills
+(`ship`, `plan-ceo-review`, `office-hours` legitimately pack 25-35K tokens of
+behavior). If you blow past 40K, the right fix is usually: (1) look at WHAT grew,
+(2) if one resolver added 10K+ in a single PR, question whether it belongs inline
+or as a reference doc, (3) only compress carefully-tuned prose as a last resort —
+cuts to the coverage audit, review army, or voice directive have real quality cost.

 **Merge conflicts on SKILL.md files:** NEVER resolve conflicts on generated SKILL.md
 files by accepting either side. Instead: (1) resolve conflicts on the `.tmpl` templates
@@ -433,6 +439,60 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
 - No jargon: say "every question now tells you which project and branch you're in" not
  "AskUserQuestion format standardized across skill templates via preamble resolver."

+### Release-summary format (every `## [X.Y.Z]` entry)
+
+Every version entry in `CHANGELOG.md` MUST start with a release-summary section in
+the GStack/Garry voice, one viewport's worth of prose + tables that lands like a
+verdict, not marketing. The itemized changelog (subsections, bullets, files) goes
+BELOW that summary, separated by a `### Itemized changes` header.
+
+The release-summary section gets read by humans, by the auto-update agent, and by
+anyone deciding whether to upgrade. The itemized list is for agents that need to
+know exactly what changed.
+
+Structure for the top of every `## [X.Y.Z]` entry:
+
+1. **Two-line bold headline** (10-14 words total). Should land like a verdict, not
+   marketing. Sound like someone who shipped today and cares whether it works.
+2. **Lead paragraph** (3-5 sentences). What shipped, what changed for the user.
+   Specific, concrete, no AI vocabulary, no em dashes, no hype.
+3. **A "The X numbers that matter" section** with:
+   - One short setup paragraph naming the source of the numbers (real production
+     deployment OR a reproducible benchmark, name the file/command to run).
+   - A table of 3-6 key metrics with BEFORE / AFTER / Δ columns.
+   - A second optional table for per-category breakdown if relevant.
+   - 1-2 sentences interpreting the most striking number in concrete user terms.
+4. **A "What this means for [audience]" closing paragraph** (2-4 sentences) tying
+   the metrics to a real workflow shift. End with what to do.
+
+Voice rules for the release summary:
+- No em dashes (use commas, periods, "...").
+- No AI vocabulary (delve, robust, comprehensive, nuanced, fundamental, etc.) or
+  banned phrases ("here's the kicker", "the bottom line", etc.).
+- Real numbers, real file names, real commands. Not "fast" but "~30s on 30K pages."
+- Short paragraphs, mix one-sentence punches with 2-3 sentence runs.
+- Connect to user outcomes: "the agent does ~3x less reading" beats "improved precision."
+- Be direct about quality. "Well-designed" or "this is a mess." No dancing.
+
+Source material:
+- CHANGELOG previous entry for prior context.
+- Benchmark files or `/retro` output for headline numbers.
+- Recent commits (`git log <prev-version>..HEAD --oneline`) for what shipped.
+- Don't make up numbers. If a metric isn't in a benchmark or production data,
+  don't include it. Say "no measurement yet" if asked.
+
+Target length: ~250-350 words for the summary. Should render as one viewport.
+
+### Itemized changes (below the release summary)
+
+Write `### Itemized changes` and continue with the detailed subsections (Added,
+Changed, Fixed, For contributors). Same rules as the user-facing voice guidance
+above, plus:
+
+- **Always credit community contributions.** When an entry includes work from a
+  community PR, name the contributor with `Contributed by @username`. Contributors
+  did real work. Thank them publicly every time, no exceptions.
+
 ## AI effort compression

 When estimating or discussing effort, always show both human-team and CC+gstack time: