docs: add AgentShield policy exception evaluator scenario

2026-05-13 08:03:04 +08:00 · 2026-05-12 18:08:02 -04:00
parent 6fbf58d590
commit b25d4770f5
8 changed files with 289 additions and 9 deletions
--- a/docs/ECC-2.0-GA-ROADMAP.md
+++ b/docs/ECC-2.0-GA-ROADMAP.md
@@ -59,7 +59,8 @@ As of 2026-05-12:
  self-improving harness prototype: scenario specs, traces, reports,
  candidate playbooks, verifier results, accepted maintainer-salvage,
  billing-readiness, CI-failure-diagnosis, and harness-config-quality
-  candidates, plus rejected unsafe candidates.
+  candidates, plus the AgentShield policy-exception scenario and rejected
+  unsafe candidates.
 - The npm package surface now excludes Python bytecode/cache artifacts through
  package `files` negation rules and a publish-surface regression test.
 - `docs/legacy-artifact-inventory.md` records that no `_legacy-documents-*`
@@ -200,7 +201,7 @@ is not complete unless the evidence column exists and has been freshly verified.
 | AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit | PRs #53, #55-#62 landed with test evidence | Needs PDF/export decision or next enterprise signal |
 | ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog | PRs #26-#39 landed with test evidence | Needs capacity-backed Linear rollout / broader evaluator corpus |
 | GitGuardian/Dependabot/CodeRabbit-style checks | Non-blocking taxonomy and deterministic follow-up checks | ECC-Tools risk taxonomy check plus follow-up signals landed, including Skill Quality, Deep Analyzer Evidence, Analyzer Corpus Evidence, RAG/Evaluator Evidence, and PR Review/Salvage Evidence | Partially complete |
-| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, and harness-config-quality scenarios with trace, report, playbook, and verifier result artifacts | Needs AgentShield policy exception corpus |
+| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, and AgentShield policy-exception scenarios with trace, report, playbook, and verifier result artifacts | Needs skill-quality and deep-analyzer evidence corpus |
 | Linear roadmap is detailed | Linear project status plus repo mirror | Repo mirror exists; issue creation was retried on 2026-05-12 and remains blocked by the workspace free issue limit | Needs recurring status updates after each merge batch |
 | Flow separation and progress tracking | Flow lanes with owner artifacts and update cadence | This roadmap defines lanes below | Active |
 | Realtime Linear sync | Project updates while issue limit is blocked; issues later | ECC-Tools #39 implements opt-in Linear API sync for deferred follow-up backlog items | Needs workspace capacity/config rollout |
@@ -219,7 +220,7 @@ back to the repo evidence and merge commits.
 | Queue hygiene and salvage | GitHub PR/issue state, salvage ledger | Append ledger entries for any future stale closures | Every cleanup batch |
 | Release and publication | rc.1 release docs, publication readiness doc | Naming matrix and plugin submission/contact checklist | Before any tag |
 | Harness OS core | Audit, adapter matrix, observability docs, `ecc2/` | HUD/session-control acceptance spec | Weekly until GA |
-| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, and harness-config-quality fixtures | Expand to AgentShield policy exception scenario |
+| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, and AgentShield policy-exception fixtures | Expand to skill-quality or deep-analyzer evidence scenario |
 | AgentShield enterprise | AgentShield PR evidence and roadmap notes | PDF-export decision or next enterprise signal | After value decision |
 | ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy | Capacity-backed Linear rollout or broader evaluator/RAG corpus slice | Next implementation batch |
 | Linear progress | Linear project status updates and this mirror | Status update with queue/evidence/missing gates | Every significant merge batch |
@@ -418,6 +419,6 @@ Acceptance:
   executive report, corpus benchmark output, and exception lifecycle audit.
 2. Enable/configure the merged Linear backlog sync path after workspace issue
   capacity clears or the Linear workspace is upgraded.
-3. Expand the evaluator/RAG corpus beyond the stale-salvage and billing
-   prototypes to CI failure diagnosis, harness-config drift, and AgentShield
-   policy exception scenarios.
+3. Expand the evaluator/RAG corpus beyond the stale-salvage, billing,
+   CI-failure, harness-config, and AgentShield policy-exception prototypes
+   toward skill-quality and deep-analyzer evidence scenarios.
--- a/docs/architecture/evaluator-rag-prototype.md
+++ b/docs/architecture/evaluator-rag-prototype.md
@@ -14,7 +14,9 @@ treat dry-run release evidence or roadmap intent as live billing state. A
 CI-failure diagnosis scenario adds the log-first workflow needed before an
 agent proposes fixes for red checks. A harness-config quality scenario keeps
 MCP, plugin, hook, command, agent, and adapter recommendations tied to the
-adapter matrix before they mutate setup guidance.
+adapter matrix before they mutate setup guidance. An AgentShield policy
+exception scenario gates security exceptions on SARIF/report evidence, owner
+fields, expiry state, and remediation-versus-exception decisions.

 ## Reference Pressure

@@ -105,6 +107,9 @@ Current corpus:
 - `harness-config-quality`: requires adapter state, install/onramp path,
  verification commands, risk notes, and config-preservation behavior before a
  harness setup recommendation can be promoted.
+- `agentshield-policy-exception`: requires AgentShield SARIF or report
+  evidence, policy-pack source, owner/ticket/scope/expiry fields, and
+  enforcement of expired exceptions before a policy exception can be promoted.

 ## ECC Tools Mapping

@@ -138,4 +143,5 @@ A candidate can be promoted only when:

 The next evaluator/RAG corpus should add:

-- an AgentShield policy exception scenario with SARIF and report evidence.
+- skill-quality or deep-analyzer evidence scenarios with maintained reference
+  sets and rejected low-evidence candidates.
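The `agentshield-policy-exception` promotion rule added to the current corpus can be read as a checklist that a verifier runs before a candidate is promoted. The following is a minimal Python sketch under stated assumptions: `PolicyExceptionCandidate`, `promotion_blockers`, the field keys, the ISO-date expiry format, and the `"exception"`/`"remediation"` decision values are hypothetical names chosen for illustration, not the prototype's actual fixture schema or verifier code. It checks for SARIF or report evidence, a policy-pack source, and the owner/ticket/scope/expiry fields, and it refuses to promote an exception whose expiry has passed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

# Hypothetical keys for the owner/ticket/scope/expiry fields named in the scenario.
REQUIRED_EXCEPTION_FIELDS = ("owner", "ticket", "scope", "expiry")
DECISIONS = ("exception", "remediation")


@dataclass
class PolicyExceptionCandidate:
    """Illustrative candidate shape; not the prototype's actual fixture schema."""
    sarif_path: Optional[str] = None   # AgentShield SARIF output, if attached
    report_path: Optional[str] = None  # AgentShield report, if attached
    policy_pack: Optional[str] = None  # policy pack that defines the excepted rule
    exception: Dict[str, str] = field(default_factory=dict)  # expiry as ISO date
    decision: str = "exception"        # remediation-versus-exception outcome


def promotion_blockers(c: PolicyExceptionCandidate, today: date) -> List[str]:
    """Return every reason the candidate cannot be promoted; empty means it passes."""
    blockers: List[str] = []
    if not (c.sarif_path or c.report_path):
        blockers.append("missing SARIF or report evidence")
    if not c.policy_pack:
        blockers.append("missing policy-pack source")
    if c.decision not in DECISIONS:
        blockers.append(f"unknown decision: {c.decision}")
    elif c.decision == "exception":
        # Exception fields are only demanded when the candidate keeps the finding open.
        for name in REQUIRED_EXCEPTION_FIELDS:
            if not c.exception.get(name):
                blockers.append(f"missing exception field: {name}")
        expiry = c.exception.get("expiry")
        # An exception stays valid through its expiry date and is enforced afterwards.
        if expiry and date.fromisoformat(expiry) < today:
            blockers.append("exception expired; finding must be enforced or remediated")
    return blockers


candidate = PolicyExceptionCandidate(
    sarif_path="agentshield.sarif",
    policy_pack="example-pack",
    exception={"owner": "security", "ticket": "SEC-1", "scope": "repo", "expiry": "2026-04-30"},
)
print(promotion_blockers(candidate, today=date(2026, 5, 12)))
# prints ['exception expired; finding must be enforced or remediated']
```

Returning the full list of blockers rather than a single boolean is one way to let a verifier result artifact record why a candidate was rejected; treating a `"remediation"` decision as exempt from the exception fields is an assumption about how the remediation-versus-exception split might be enforced.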