v1.33.1.0 fix(learnings): token-OR query + task-shaped retrieval in 3 long skills (#1442)

* fix(learnings): use token-OR matching in gstack-learnings-search --query Split the query on whitespace into tokens; a learning matches if ANY token appears as a substring in ANY of key/insight/files. Previously the whole query was a single substring, so multi-word queries like "debug investigation" only matched learnings whose insight contained that exact contiguous phrase, which is usually nothing. Whitespace-only query falls through to no-query (matches today's no-flag behavior). Single-word queries behave exactly as before. Adds test/gstack-learnings-search.test.ts: 3 assertions covering multi-token, single-token, and no-query backwards compat. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(resolver): parameterized LEARNINGS_SEARCH with shell-injection guard The {{LEARNINGS_SEARCH}} macro now accepts a query=KEYWORD argument that gets interpolated as --query "<keyword>" into the generated bash. Empty value falls through to no-query (principle of least surprise: a stray {{LEARNINGS_SEARCH:query=}} placeholder gets today's behavior, not a build failure). Pattern reuses the parameterized-macro parsing from composition.ts. The 13 templates that don't pass a query stay byte-identical in their generated SKILL.md output. Shell-injection guard: the query value is whitelisted to ^[A-Za-z0-9 _-]+$ at gen-skill-docs time. Any \$(), backticks, semicolons, or quotes throw a loud build error instead of emitting executable bash. Static template queries are safe by inspection; this defends against future contributors writing dangerous values. Adds 5 assertions to test/gen-skill-docs.test.ts covering no-args, claude+query=foo bar on both cross-project and project-scoped branches, codex host variant, empty value semantics, and shell-injection payloads (\$(whoami), backticks, ;, &, ", \\, \$x) throwing build errors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(skills): task-shaped queries + mid-flow refresh in /investigate /qa /ship The three long skills now pull learnings keyed to their theme at the top, then re-pull at phase boundaries as work shifts to new sub-tasks. Top-of-skill queries (5-6 token unions, token-OR matched): - investigate: "debug investigation root cause hypothesis bug fix" - qa: "qa testing bug regression flake fixture" - ship: "release ship version changelog merge pr" Mid-flow refresh blocks (concrete keyword recipe + worked examples): - investigate: between Phase 1 (hypothesis) and Phase 2 (analysis), keyed to the hypothesis noun. Examples: auth-cookie, session-expiry. - qa: between Phase 7 (triage) and Phase 8 (fix loop), keyed to the buggy component name. Examples: checkout-button, signup-form. - ship: just before Step 12 (VERSION bump), keyed to the headline feature. Examples: learnings-search, pacing, worktree-ship. Keyword recipe enforces alphanumeric+hyphen only (no quotes, slashes, dots, colons) so dynamic queries cannot inject shell metacharacters. The other 13 short-lived skills keep the bare {{LEARNINGS_SEARCH}} form. Backwards-compat verified via diff: their generated SKILL.md output is byte-identical to before this change. Golden ship fixtures regenerated to match the new ship/SKILL.md output. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore: bump version and changelog (v1.33.1.0) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test: refresh codex+factory ship golden fixtures Follow-up to 513c9660 — the codex and factory host outputs needed regeneration too, missed in the initial commit because gen:skill-docs was only run for the claude host. Now matches gen:skill-docs --host all. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 20:28:24 +08:00 · 2026-05-11 19:34:33 -07:00
parent d21ba06b5a
commit 1a4f0c9c15
16 changed files with 365 additions and 27 deletions
--- a/test/fixtures/golden/claude-ship-SKILL.md
+++ b/test/fixtures/golden/claude-ship-SKILL.md
@@ -1824,9 +1824,9 @@ Search for relevant learnings from previous sessions:
 _CROSS_PROJ=$(~/.claude/skills/gstack/bin/gstack-config get cross_project_learnings 2>/dev/null || echo "unset")
 echo "CROSS_PROJECT: $_CROSS_PROJ"
 if [ "$_CROSS_PROJ" = "true" ]; then
-  ~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 --cross-project 2>/dev/null || true
+  ~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 --query "release ship version changelog merge pr" --cross-project 2>/dev/null || true
 else
-  ~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 2>/dev/null || true
+  ~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 --query "release ship version changelog merge pr" 2>/dev/null || true
 fi
 ```

@@ -2459,6 +2459,20 @@ already knows. A good test: would this insight save time in a future session? If



+### Refresh learnings for the headline feature on this branch
+
+The top-of-skill learnings pull was keyed to "release ship" broadly. Before the VERSION/CHANGELOG step, re-pull learnings keyed to THIS branch's headline feature so any prior version-bump or CHANGELOG pitfalls for similar features surface.
+
+Pick ONE keyword that names the headline feature you're shipping. The keyword should be a noun: the primary skill or module name, the central feature noun, or the binary you changed. The keyword MUST be alphanumeric or hyphen only — no quotes, slashes, dots, colons, or whitespace. If your candidate has any of those, simplify to just the alphanumeric stem.
+
+Worked examples (ship-specific): good keywords are `learnings-search`, `pacing`, `worktree-ship`. Bad: `the branch headline`, `v1.31.1.0`, `feat: token-or search`.
+
+```bash
+~/.claude/skills/gstack/bin/gstack-learnings-search --query "<your-keyword>" --limit 5 2>/dev/null || true
+```
+
+If any learnings come back, name which one applies to the version bump or CHANGELOG framing in one sentence. If none come back, continue without reference — the absence is itself useful information.
+
 ## Step 12: Version bump (auto-decide)

 **Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do bump), ALREADY_BUMPED (skip bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).
--- a/test/fixtures/golden/codex-ship-SKILL.md
+++ b/test/fixtures/golden/codex-ship-SKILL.md
@@ -310,6 +310,26 @@ Effort both-scales: when an option involves effort, label both human-team and CC

 Net line closes the tradeoff. Per-skill instructions may add stricter rules.

+12. **Non-ASCII characters — write directly, never \u-escape.** When any
+    string field (question, option label, option description) contains
+    Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit
+    the literal UTF-8 characters in the JSON string. **Never escape them
+    as `\uXXXX`.** Claude Code's tool parameter pipe is UTF-8 native
+    and passes characters through unchanged. Manually escaping requires
+    recalling each codepoint from training, which is unreliable for long
+    CJK strings — the model regularly emits the wrong codepoint (e.g.
+    writes `\u3103` thinking it is 管 U+7BA1, but `\u3103` is
+    actually ㄃, so the user sees `管理工具` rendered as `㄃3用箱`).
+    The trigger is long, multi-line questions with hundreds of CJK
+    characters: that is exactly when reflexive escaping kicks in and
+    exactly when miscoding is most damaging. Long ≠ escape. Keep
+    characters literal.
+
+    Wrong: `"question": "請選擇\uXXXX\uXXXX\uXXXX\uXXXX"`
+    Right: `"question": "請選擇管理工具"`
+
+    Only JSON-mandatory escapes remain allowed: `\n`, `\t`, `\"`, `\\`.
+
 ### Self-check before emitting

 Before calling AskUserQuestion, verify:
@@ -322,6 +342,7 @@ Before calling AskUserQuestion, verify:
 - [ ] Dual-scale effort labels on effort-bearing options (human / CC)
 - [ ] Net line closes the decision
 - [ ] You are calling the tool, not writing prose
+- [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped


 ## Artifacts Sync (skill start)
@@ -1789,7 +1810,7 @@ Add a `## Verification Results` section to the PR body (Step 19):
 Search for relevant learnings from previous sessions on this project:

 ```bash
-$GSTACK_BIN/gstack-learnings-search --limit 10 2>/dev/null || true
+$GSTACK_BIN/gstack-learnings-search --limit 10 --query "release ship version changelog merge pr" 2>/dev/null || true
 ```

 If learnings are found, incorporate them into your analysis. When a review finding
@@ -2053,6 +2074,20 @@ already knows. A good test: would this insight save time in a future session? If



+### Refresh learnings for the headline feature on this branch
+
+The top-of-skill learnings pull was keyed to "release ship" broadly. Before the VERSION/CHANGELOG step, re-pull learnings keyed to THIS branch's headline feature so any prior version-bump or CHANGELOG pitfalls for similar features surface.
+
+Pick ONE keyword that names the headline feature you're shipping. The keyword should be a noun: the primary skill or module name, the central feature noun, or the binary you changed. The keyword MUST be alphanumeric or hyphen only — no quotes, slashes, dots, colons, or whitespace. If your candidate has any of those, simplify to just the alphanumeric stem.
+
+Worked examples (ship-specific): good keywords are `learnings-search`, `pacing`, `worktree-ship`. Bad: `the branch headline`, `v1.31.1.0`, `feat: token-or search`.
+
+```bash
+$GSTACK_ROOT/bin/gstack-learnings-search --query "<your-keyword>" --limit 5 2>/dev/null || true
+```
+
+If any learnings come back, name which one applies to the version bump or CHANGELOG framing in one sentence. If none come back, continue without reference — the absence is itself useful information.
+
 ## Step 12: Version bump (auto-decide)

 **Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do bump), ALREADY_BUMPED (skip bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).
--- a/test/fixtures/golden/factory-ship-SKILL.md
+++ b/test/fixtures/golden/factory-ship-SKILL.md
@@ -312,6 +312,26 @@ Effort both-scales: when an option involves effort, label both human-team and CC

 Net line closes the tradeoff. Per-skill instructions may add stricter rules.

+12. **Non-ASCII characters — write directly, never \u-escape.** When any
+    string field (question, option label, option description) contains
+    Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit
+    the literal UTF-8 characters in the JSON string. **Never escape them
+    as `\uXXXX`.** Claude Code's tool parameter pipe is UTF-8 native
+    and passes characters through unchanged. Manually escaping requires
+    recalling each codepoint from training, which is unreliable for long
+    CJK strings — the model regularly emits the wrong codepoint (e.g.
+    writes `\u3103` thinking it is 管 U+7BA1, but `\u3103` is
+    actually ㄃, so the user sees `管理工具` rendered as `㄃3用箱`).
+    The trigger is long, multi-line questions with hundreds of CJK
+    characters: that is exactly when reflexive escaping kicks in and
+    exactly when miscoding is most damaging. Long ≠ escape. Keep
+    characters literal.
+
+    Wrong: `"question": "請選擇\uXXXX\uXXXX\uXXXX\uXXXX"`
+    Right: `"question": "請選擇管理工具"`
+
+    Only JSON-mandatory escapes remain allowed: `\n`, `\t`, `\"`, `\\`.
+
 ### Self-check before emitting

 Before calling AskUserQuestion, verify:
@@ -324,6 +344,7 @@ Before calling AskUserQuestion, verify:
 - [ ] Dual-scale effort labels on effort-bearing options (human / CC)
 - [ ] Net line closes the decision
 - [ ] You are calling the tool, not writing prose
+- [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped


 ## Artifacts Sync (skill start)
@@ -1794,9 +1815,9 @@ Search for relevant learnings from previous sessions:
 _CROSS_PROJ=$($GSTACK_BIN/gstack-config get cross_project_learnings 2>/dev/null || echo "unset")
 echo "CROSS_PROJECT: $_CROSS_PROJ"
 if [ "$_CROSS_PROJ" = "true" ]; then
-  $GSTACK_BIN/gstack-learnings-search --limit 10 --cross-project 2>/dev/null || true
+  $GSTACK_BIN/gstack-learnings-search --limit 10 --query "release ship version changelog merge pr" --cross-project 2>/dev/null || true
 else
-  $GSTACK_BIN/gstack-learnings-search --limit 10 2>/dev/null || true
+  $GSTACK_BIN/gstack-learnings-search --limit 10 --query "release ship version changelog merge pr" 2>/dev/null || true
 fi
 ```

@@ -2429,6 +2450,20 @@ already knows. A good test: would this insight save time in a future session? If



+### Refresh learnings for the headline feature on this branch
+
+The top-of-skill learnings pull was keyed to "release ship" broadly. Before the VERSION/CHANGELOG step, re-pull learnings keyed to THIS branch's headline feature so any prior version-bump or CHANGELOG pitfalls for similar features surface.
+
+Pick ONE keyword that names the headline feature you're shipping. The keyword should be a noun: the primary skill or module name, the central feature noun, or the binary you changed. The keyword MUST be alphanumeric or hyphen only — no quotes, slashes, dots, colons, or whitespace. If your candidate has any of those, simplify to just the alphanumeric stem.
+
+Worked examples (ship-specific): good keywords are `learnings-search`, `pacing`, `worktree-ship`. Bad: `the branch headline`, `v1.31.1.0`, `feat: token-or search`.
+
+```bash
+$GSTACK_ROOT/bin/gstack-learnings-search --query "<your-keyword>" --limit 5 2>/dev/null || true
+```
+
+If any learnings come back, name which one applies to the version bump or CHANGELOG framing in one sentence. If none come back, continue without reference — the absence is itself useful information.
+
 ## Step 12: Version bump (auto-decide)

 **Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do bump), ALREADY_BUMPED (skip bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).