docs: add skill-quality evaluator scenario

2026-05-13 08:03:04 +08:00 · 2026-05-12 18:24:41 -04:00
parent b25d4770f5
commit 337ced0828
8 changed files with 291 additions and 8 deletions
--- a/docs/ECC-2.0-GA-ROADMAP.md
+++ b/docs/ECC-2.0-GA-ROADMAP.md
@@ -201,7 +201,7 @@ is not complete unless the evidence column exists and has been freshly verified.
 | AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit | PRs #53, #55-#62 landed with test evidence | Needs PDF/export decision or next enterprise signal |
 | ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog | PRs #26-#39 landed with test evidence | Needs capacity-backed Linear rollout / broader evaluator corpus |
 | GitGuardian/Dependabot/CodeRabbit-style checks | Non-blocking taxonomy and deterministic follow-up checks | ECC-Tools risk taxonomy check plus follow-up signals landed, including Skill Quality, Deep Analyzer Evidence, Analyzer Corpus Evidence, RAG/Evaluator Evidence, and PR Review/Salvage Evidence | Partially complete |
-| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, and AgentShield policy-exception scenarios with trace, report, playbook, and verifier result artifacts | Needs skill-quality and deep-analyzer corpus |
+| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, and skill-quality evidence scenarios with trace, report, playbook, and verifier result artifacts | Needs deep-analyzer corpus |
 | Linear roadmap is detailed | Linear project status plus repo mirror | Repo mirror exists; issue creation was retried on 2026-05-12 and remains blocked by the workspace free issue limit | Needs recurring status updates after each merge batch |
 | Flow separation and progress tracking | Flow lanes with owner artifacts and update cadence | This roadmap defines lanes below | Active |
 | Realtime Linear sync | Project updates while issue limit is blocked; issues later | ECC-Tools #39 implements opt-in Linear API sync for deferred follow-up backlog items | Needs workspace capacity/config rollout |
@@ -220,7 +220,7 @@ back to the repo evidence and merge commits.
 | Queue hygiene and salvage | GitHub PR/issue state, salvage ledger | Append ledger entries for any future stale closures | Every cleanup batch |
 | Release and publication | rc.1 release docs, publication readiness doc | Naming matrix and plugin submission/contact checklist | Before any tag |
 | Harness OS core | Audit, adapter matrix, observability docs, `ecc2/` | HUD/session-control acceptance spec | Weekly until GA |
-| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, and AgentShield policy-exception fixtures | Expand to skill-quality or deep-analyzer evidence scenario |
+| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, and skill-quality evidence fixtures | Expand to deep-analyzer evidence scenario |
 | AgentShield enterprise | AgentShield PR evidence and roadmap notes | PDF-export decision or next enterprise signal | After value decision |
 | ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy | Capacity-backed Linear rollout or broader evaluator/RAG corpus slice | Next implementation batch |
 | Linear progress | Linear project status updates and this mirror | Status update with queue/evidence/missing gates | Every significant merge batch |
@@ -420,5 +420,5 @@ Acceptance:
 2. Enable/configure the merged Linear backlog sync path after workspace issue
   capacity clears or the Linear workspace is upgraded.
 3. Expand the evaluator/RAG corpus beyond stale-salvage, billing, CI,
-   harness-config, and AgentShield policy-exception prototypes toward
-   skill-quality and deep-analyzer evidence scenarios.
+   harness-config, AgentShield policy-exception, and skill-quality evidence
+   prototypes toward deep-analyzer evidence scenarios.
--- a/docs/architecture/evaluator-rag-prototype.md
+++ b/docs/architecture/evaluator-rag-prototype.md
@@ -16,7 +16,10 @@ agent proposes fixes for red checks. A harness-config quality scenario keeps
 MCP, plugin, hook, command, agent, and adapter recommendations tied to the
 adapter matrix before they mutate setup guidance. An AgentShield policy
 exception scenario gates security exceptions on SARIF/report evidence, owner
-fields, expiry state, and remediation-versus-exception decisions.
+fields, expiry state, and remediation-versus-exception decisions. A
+skill-quality evidence scenario requires observed failure or feedback evidence,
+working examples, reference-set gaps, and validation commands before a skill
+amendment can be promoted.

 ## Reference Pressure

@@ -110,6 +113,9 @@ Current corpus:
 - `agentshield-policy-exception`: requires AgentShield SARIF or report
  evidence, policy-pack source, owner/ticket/scope/expiry fields, and expired
  exception enforcement before a policy exception can be promoted.
+- `skill-quality-evidence`: requires focused skill scope, observed failure or
+  user-feedback evidence, examples/reference-set coverage, validation commands,
+  and publication safety before a skill amendment can be promoted.

 ## ECC Tools Mapping

@@ -143,5 +149,5 @@ A candidate can be promoted only when:

 The next evaluator/RAG corpus should add:

-- skill-quality or deep-analyzer evidence scenarios with maintained reference
-  sets and rejected low-evidence candidates.
+- a deep-analyzer evidence scenario with maintained reference sets and rejected
+  low-evidence candidates.
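
The `skill-quality-evidence` entry added above lists five things a skill amendment needs before promotion: focused skill scope, observed failure or user-feedback evidence, examples/reference-set coverage, validation commands, and publication safety. A minimal sketch of such a gate follows. It is not the prototype's actual fixture or verifier: the `SkillAmendmentCandidate` class, its field names, and the `missing_evidence` / `can_promote` helpers are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass, field


@dataclass
class SkillAmendmentCandidate:
    """Hypothetical candidate record; field names are illustrative assumptions."""
    skill_scope: str = ""
    failure_evidence: list[str] = field(default_factory=list)
    feedback_evidence: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)
    reference_set_gaps_reviewed: bool = False
    validation_commands: list[str] = field(default_factory=list)
    publication_safe: bool = False


def missing_evidence(candidate: SkillAmendmentCandidate) -> list[str]:
    """Return the requirements from the corpus entry that the candidate lacks."""
    missing = []
    if not candidate.skill_scope.strip():
        missing.append("focused skill scope")
    # Either observed failures or user feedback satisfies this requirement.
    if not (candidate.failure_evidence or candidate.feedback_evidence):
        missing.append("observed failure or user-feedback evidence")
    if not candidate.examples or not candidate.reference_set_gaps_reviewed:
        missing.append("examples/reference-set coverage")
    if not candidate.validation_commands:
        missing.append("validation commands")
    if not candidate.publication_safe:
        missing.append("publication safety")
    return missing


def can_promote(candidate: SkillAmendmentCandidate) -> bool:
    """A candidate is promotable only when no requirement is missing."""
    return not missing_evidence(candidate)
```

Returning the list of missing requirements, rather than a bare boolean, lets a rejected low-evidence candidate be reported with the specific reason it was rejected.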