Merge remote-tracking branch 'origin/main' into garrytan/prompt-injection-guard

This commit is contained in:
Garry Tan
2026-04-20 05:04:45 +08:00
113 changed files with 12732 additions and 4425 deletions

View File

@@ -139,10 +139,16 @@ SKILL.md files are **generated** from `.tmpl` templates. To update docs:
To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.
**Token ceiling:** Generated SKILL.md files must stay under 100KB (~25K tokens).
`gen-skill-docs` warns if any file exceeds this. If a skill template grows past the
ceiling, consider extracting optional sections into separate resolvers that only
inject when relevant, or making verbose evaluation rubrics more concise.
**Token ceiling:** Generated SKILL.md files trip a warning above 160KB (~40K tokens).
This is a "watch for feature bloat" guardrail, not a hard gate. Modern flagship
models have 200K-1M context windows, so 40K is 4-20% of window, and prompt caching
makes the marginal cost of larger skills small. The ceiling exists to catch runaway
preamble/resolver growth, not to force compression on carefully-tuned big skills
(`ship`, `plan-ceo-review`, `office-hours` legitimately pack 25-35K tokens of
behavior). If you blow past 40K, the right fix is usually: (1) look at WHAT grew,
(2) if one resolver added 10K+ in a single PR, question whether it belongs inline
or as a reference doc, (3) only compress carefully-tuned prose as a last resort —
cuts to the coverage audit, review army, or voice directive have real quality cost.
**Merge conflicts on SKILL.md files:** NEVER resolve conflicts on generated SKILL.md
files by accepting either side. Instead: (1) resolve conflicts on the `.tmpl` templates
@@ -433,6 +439,60 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
- No jargon: say "every question now tells you which project and branch you're in" not
"AskUserQuestion format standardized across skill templates via preamble resolver."
### Release-summary format (every `## [X.Y.Z]` entry)
Every version entry in `CHANGELOG.md` MUST start with a release-summary section in
the GStack/Garry voice, one viewport's worth of prose + tables that lands like a
verdict, not marketing. The itemized changelog (subsections, bullets, files) goes
BELOW that summary, separated by a `### Itemized changes` header.
The release-summary section gets read by humans, by the auto-update agent, and by
anyone deciding whether to upgrade. The itemized list is for agents that need to
know exactly what changed.
Structure for the top of every `## [X.Y.Z]` entry:
1. **Two-line bold headline** (10-14 words total). Should land like a verdict, not
marketing. Sound like someone who shipped today and cares whether it works.
2. **Lead paragraph** (3-5 sentences). What shipped, what changed for the user.
Specific, concrete, no AI vocabulary, no em dashes, no hype.
3. **A "The X numbers that matter" section** with:
- One short setup paragraph naming the source of the numbers (real production
deployment OR a reproducible benchmark, name the file/command to run).
- A table of 3-6 key metrics with BEFORE / AFTER / Δ columns.
- A second optional table for per-category breakdown if relevant.
- 1-2 sentences interpreting the most striking number in concrete user terms.
4. **A "What this means for [audience]" closing paragraph** (2-4 sentences) tying
the metrics to a real workflow shift. End with what to do.
Voice rules for the release summary:
- No em dashes (use commas, periods, "...").
- No AI vocabulary (delve, robust, comprehensive, nuanced, fundamental, etc.) or
banned phrases ("here's the kicker", "the bottom line", etc.).
- Real numbers, real file names, real commands. Not "fast" but "~30s on 30K pages."
- Short paragraphs, mix one-sentence punches with 2-3 sentence runs.
- Connect to user outcomes: "the agent does ~3x less reading" beats "improved precision."
- Be direct about quality. "Well-designed" or "this is a mess." No dancing.
Source material:
- CHANGELOG previous entry for prior context.
- Benchmark files or `/retro` output for headline numbers.
- Recent commits (`git log <prev-version>..HEAD --oneline`) for what shipped.
- Don't make up numbers. If a metric isn't in a benchmark or production data,
don't include it. Say "no measurement yet" if asked.
Target length: ~250-350 words for the summary. Should render as one viewport.
### Itemized changes (below the release summary)
Write `### Itemized changes` and continue with the detailed subsections (Added,
Changed, Fixed, For contributors). Same rules as the user-facing voice guidance
above, plus:
- **Always credit community contributions.** When an entry includes work from a
community PR, name the contributor with `Contributed by @username`. Contributors
did real work. Thank them publicly every time, no exceptions.
## AI effort compression
When estimating or discussing effort, always show both human-team and CC+gstack time: