mirror of https://github.com/affaan-m/everything-claude-code.git
synced 2026-05-13 08:03:04 +08:00
docs: add AgentShield policy exception evaluator scenario
committed by Affaan Mustafa
parent 6fbf58d590
commit b25d4770f5
@@ -59,7 +59,8 @@ As of 2026-05-12:
   self-improving harness prototype: scenario specs, traces, reports,
   candidate playbooks, verifier results, accepted maintainer-salvage,
   billing-readiness, CI-failure-diagnosis, and harness-config-quality
-  candidates, plus rejected unsafe candidates.
+  candidates, plus the AgentShield policy-exception scenario and rejected
+  unsafe candidates.
 - The npm package surface now excludes Python bytecode/cache artifacts through
   package `files` negation rules and a publish-surface regression test.
 - `docs/legacy-artifact-inventory.md` records that no `_legacy-documents-*`
@@ -200,7 +201,7 @@ is not complete unless the evidence column exists and has been freshly verified.
 | AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit | PRs #53, #55-#62 landed with test evidence | Needs PDF/export decision or next enterprise signal |
 | ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog | PRs #26-#39 landed with test evidence | Needs capacity-backed Linear rollout / broader evaluator corpus |
 | GitGuardian/Dependabot/CodeRabbit-style checks | Non-blocking taxonomy and deterministic follow-up checks | ECC-Tools risk taxonomy check plus follow-up signals landed, including Skill Quality, Deep Analyzer Evidence, Analyzer Corpus Evidence, RAG/Evaluator Evidence, and PR Review/Salvage Evidence | Partially complete |
-| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, and harness-config-quality scenarios with trace, report, playbook, and verifier result artifacts | Needs AgentShield policy exception corpus |
+| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, and AgentShield policy-exception scenarios with trace, report, playbook, and verifier result artifacts | Needs skill-quality and deep-analyzer corpus |
 | Linear roadmap is detailed | Linear project status plus repo mirror | Repo mirror exists; issue creation was retried on 2026-05-12 and remains blocked by the workspace free issue limit | Needs recurring status updates after each merge batch |
 | Flow separation and progress tracking | Flow lanes with owner artifacts and update cadence | This roadmap defines lanes below | Active |
 | Realtime Linear sync | Project updates while issue limit is blocked; issues later | ECC-Tools #39 implements opt-in Linear API sync for deferred follow-up backlog items | Needs workspace capacity/config rollout |
@@ -219,7 +220,7 @@ back to the repo evidence and merge commits.
 | Queue hygiene and salvage | GitHub PR/issue state, salvage ledger | Append ledger entries for any future stale closures | Every cleanup batch |
 | Release and publication | rc.1 release docs, publication readiness doc | Naming matrix and plugin submission/contact checklist | Before any tag |
 | Harness OS core | Audit, adapter matrix, observability docs, `ecc2/` | HUD/session-control acceptance spec | Weekly until GA |
-| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, and harness-config-quality fixtures | Expand to AgentShield policy exception scenario |
+| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, and AgentShield policy-exception fixtures | Expand to skill-quality or deep-analyzer evidence scenario |
 | AgentShield enterprise | AgentShield PR evidence and roadmap notes | PDF-export decision or next enterprise signal | After value decision |
 | ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy | Capacity-backed Linear rollout or broader evaluator/RAG corpus slice | Next implementation batch |
 | Linear progress | Linear project status updates and this mirror | Status update with queue/evidence/missing gates | Every significant merge batch |
@@ -418,6 +419,6 @@ Acceptance:
    executive report, corpus benchmark output, and exception lifecycle audit.
 2. Enable/configure the merged Linear backlog sync path after workspace issue
    capacity clears or the Linear workspace is upgraded.
-3. Expand the evaluator/RAG corpus beyond the stale-salvage and billing
-   prototypes to CI failure diagnosis, harness-config drift, and AgentShield
-   policy exception scenarios.
+3. Expand the evaluator/RAG corpus beyond stale-salvage, billing, CI,
+   harness-config, and AgentShield policy-exception prototypes toward
+   skill-quality and deep-analyzer evidence scenarios.
@@ -14,7 +14,9 @@ treat dry-run release evidence or roadmap intent as live billing state. A
 CI-failure diagnosis scenario adds the log-first workflow needed before an
 agent proposes fixes for red checks. A harness-config quality scenario keeps
 MCP, plugin, hook, command, agent, and adapter recommendations tied to the
-adapter matrix before they mutate setup guidance.
+adapter matrix before they mutate setup guidance. An AgentShield policy
+exception scenario gates security exceptions on SARIF/report evidence, owner
+fields, expiry state, and remediation-versus-exception decisions.
 
 ## Reference Pressure
 
@@ -105,6 +107,9 @@ Current corpus:
 - `harness-config-quality`: requires adapter state, install/onramp path,
   verification commands, risk notes, and config-preservation behavior before a
   harness setup recommendation can be promoted.
+- `agentshield-policy-exception`: requires AgentShield SARIF or report
+  evidence, policy-pack source, owner/ticket/scope/expiry fields, and expired
+  exception enforcement before a policy exception can be promoted.
 
 ## ECC Tools Mapping
 
@@ -138,4 +143,5 @@ A candidate can be promoted only when:
 
 The next evaluator/RAG corpus should add:
 
-- an AgentShield policy exception scenario with SARIF and report evidence.
+- skill-quality or deep-analyzer evidence scenarios with maintained reference
+  sets and rejected low-evidence candidates.
@@ -0,0 +1,49 @@
+# AgentShield Policy Exception Playbook
+
+Candidate id: `sarif-backed-timeboxed-exception-review`
+
+Use this playbook when AgentShield organization-policy output produces a
+finding that may need remediation, a time-boxed exception, or explicit
+enforcement.
+
+## Accepted Path
+
+1. Identify the AgentShield finding id, category, severity, affected file or
+   MCP/hook surface, and policy pack or organization baseline.
+2. Retrieve scanner evidence before judgment:
+   - SARIF/code-scanning result, especially `agentshield-policy/*`
+   - JSON/HTML report evidence
+   - terminal or GitHub Action job-summary counts
+3. Record lifecycle fields for any exception request: owner, ticket, scope,
+   expiry, rationale, and whether it is active, expiring soon, or expired.
+4. Keep expired exceptions rejected or enforced until new evidence exists.
+5. Decide whether immediate remediation is possible. If not, only promote a
+   narrow time-boxed exception tied to the named owner, ticket, scope, and
+   expiry.
+6. Keep AgentShield code, policy packs, enforcement settings, release state,
+   and live security posture out of the read-only evaluator run.
+
+## Rejected Path
+
+Do not blanket suppress a policy category, policy pack, or organization gate
+because a finding is inconvenient.
+
+Do not downgrade critical/high findings without SARIF or report evidence and a
+current owner, ticket, scope, and expiry.
+
+Do not treat expired exceptions as active. Expired means the policy gate should
+remain enforced until a maintainer creates a fresh, bounded exception or fixes
+the underlying issue.
+
+## Minimum Validation
+
+- `npx ecc-agentshield scan --format json`
+- AgentShield SARIF/code-scanning artifact or report evidence
+- `npx ecc-agentshield scan --format html` when executive review evidence is
+  needed
+- Current exception lifecycle fields: owner, ticket, scope, expiry, status
+- `node tests/docs/evaluator-rag-prototype.test.js`
+- `git diff --check`
+
+Record the scanner evidence, lifecycle state, policy-pack source, and
+remediation-versus-exception decision in the maintainer PR body or handoff.
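Steps 3 and 4 of the accepted path in the playbook above (record lifecycle fields, keep expired exceptions enforced) can be sketched as a small classifier. This is a minimal illustration, not AgentShield's implementation: the `classifyException` name, the shape of the exception record, the status strings, and the 14-day "expiring soon" window are all assumptions made for the example.

```javascript
// Hypothetical helper; not part of AgentShield or the ECC fixtures.
// Field names follow the playbook: owner, ticket, scope, expiry, rationale.
const REQUIRED_FIELDS = ['owner', 'ticket', 'scope', 'expiry', 'rationale'];
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a lifecycle status for an exception request.
// `soonDays` (assumed default: 14) controls the "expiring soon" window.
function classifyException(exception, now = new Date(), soonDays = 14) {
  const missing = REQUIRED_FIELDS.filter(field => !exception[field]);
  if (missing.length > 0) {
    return { status: 'rejected', reason: `missing fields: ${missing.join(', ')}` };
  }

  const expiry = new Date(exception.expiry);
  if (Number.isNaN(expiry.getTime())) {
    return { status: 'rejected', reason: 'expiry is not a parseable date' };
  }

  // Expired exceptions are never treated as active.
  if (expiry.getTime() <= now.getTime()) {
    return { status: 'expired', reason: 'keep the policy gate enforced' };
  }

  const daysLeft = (expiry.getTime() - now.getTime()) / DAY_MS;
  return {
    status: daysLeft <= soonDays ? 'expiring_soon' : 'active',
    reason: 'time-boxed exception with complete lifecycle fields'
  };
}
```

A request that omits any lifecycle field is rejected before its expiry is even examined, which mirrors the playbook's rule that an exception must be tied to a named owner, ticket, scope, and expiry.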
@@ -0,0 +1,35 @@
+{
+  "schema_version": "ecc.evaluator-rag.report.v1",
+  "scenario_id": "agentshield-policy-exception",
+  "run_id": "2026-05-12-agentshield-policy-exception-prototype",
+  "result": "prototype_passed",
+  "read_only": true,
+  "scores": {
+    "sarif_report_evidence": 0.95,
+    "exception_lifecycle": 0.93,
+    "ownership_specificity": 0.9,
+    "remediation_decision": 0.88,
+    "blanket_suppression_safety": 1
+  },
+  "findings": [
+    {
+      "id": "sarif-report-match-required",
+      "severity": "warning",
+      "summary": "AgentShield policy exceptions must name SARIF or report evidence before a remediation or exception playbook can be promoted."
+    },
+    {
+      "id": "expired-exception-enforcement",
+      "severity": "warning",
+      "summary": "Expired exceptions must remain rejected or enforced; the evaluator cannot treat stale approvals as active evidence."
+    },
+    {
+      "id": "bounded-owner-fields",
+      "severity": "info",
+      "summary": "Accepted exceptions preserve owner, ticket, scope, expiry, policy-pack source, and affected surface fields."
+    }
+  ],
+  "recommended_next_action": {
+    "candidate_id": "sarif-backed-timeboxed-exception-review",
+    "action": "Use the promoted playbook for future AgentShield policy exception requests before changing gates, suppressing categories, or accepting security risk."
+  }
+}
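A reviewer reading the report fixture above may want a quick summary of its `scores` object. The sketch below is one way to do that; the `summarizeScores` helper and the 0.8 default threshold are assumptions for illustration, since the report schema shown here does not define a pass threshold. The sample scores are copied from the fixture.

```javascript
// Hypothetical helper for reading an ecc.evaluator-rag.report.v1 fixture.
// The threshold is an assumed value; the fixture does not state one.
function summarizeScores(report, threshold = 0.8) {
  const entries = Object.entries(report.scores || {});

  // Scores in the fixture are numbers between 0 and 1.
  const invalid = entries
    .filter(([, value]) => typeof value !== 'number' || value < 0 || value > 1)
    .map(([name]) => name);

  const belowThreshold = entries
    .filter(([, value]) => typeof value === 'number' && value < threshold)
    .map(([name]) => name);

  // The lowest score shows where the scenario has the least margin.
  const weakest = entries.length === 0
    ? null
    : entries.reduce((min, entry) => (entry[1] < min[1] ? entry : min));

  return {
    invalid,
    belowThreshold,
    weakest: weakest ? { name: weakest[0], value: weakest[1] } : null
  };
}

// Scores copied from the report fixture in this commit.
const sampleReport = {
  scores: {
    sarif_report_evidence: 0.95,
    exception_lifecycle: 0.93,
    ownership_specificity: 0.9,
    remediation_decision: 0.88,
    blanket_suppression_safety: 1
  }
};
const scoreSummary = summarizeScores(sampleReport);
```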
@@ -0,0 +1,62 @@
+{
+  "schema_version": "ecc.evaluator-rag.scenario.v1",
+  "scenario_id": "agentshield-policy-exception",
+  "title": "Gate AgentShield policy exceptions with report and SARIF evidence",
+  "mode": "read_only_prototype",
+  "objective": "Given an AgentShield organization-policy finding or proposed exception, retrieve report, SARIF, lifecycle, and ownership evidence before promoting a remediation or time-boxed exception playbook.",
+  "sources": [
+    {
+      "kind": "repo_doc",
+      "path": "docs/ECC-2.0-GA-ROADMAP.md",
+      "purpose": "Durable record of AgentShield policy gates, SARIF output, policy packs, reports, corpus benchmark, and exception lifecycle audit evidence"
+    },
+    {
+      "kind": "repo_command",
+      "path": "commands/security-scan.md",
+      "purpose": "ECC command contract for running AgentShield and separating scanner facts from follow-up judgment"
+    },
+    {
+      "kind": "repo_skill",
+      "path": "skills/security-scan/SKILL.md",
+      "purpose": "Operator-facing AgentShield scan workflow and output-format guidance"
+    },
+    {
+      "kind": "external_pr_evidence",
+      "repo": "affaan-m/agentshield",
+      "prs": [
+        55,
+        56,
+        57,
+        59,
+        60,
+        62
+      ],
+      "purpose": "Policy gate, SARIF, policy-pack, HTML report, corpus benchmark, and exception lifecycle implementation evidence"
+    }
+  ],
+  "retrieval_questions": [
+    "Which AgentShield policy finding, category, severity, and affected file or MCP/hook surface triggered the request?",
+    "Is there SARIF/code-scanning evidence for an `agentshield-policy/*` result, and does it match the report finding?",
+    "Is the exception active, expiring soon, or expired?",
+    "Does the exception include owner, ticket, scope, expiry, and rationale fields?",
+    "Which policy pack or organization baseline produced the finding?",
+    "Is remediation possible now, or is a bounded exception safer than a blanket suppression?"
+  ],
+  "forbidden_actions": [
+    "approving policy exceptions without SARIF or report evidence",
+    "treating expired exceptions as active",
+    "blanket-suppressing AgentShield policy packs or organization-policy gates",
+    "downgrading critical/high findings without owner, ticket, scope, and expiry",
+    "editing AgentShield code or policy files from this ECC evaluator run",
+    "publishing or enforcing new security policy from this read-only evaluator run"
+  ],
+  "acceptance_gates": [
+    "SARIF or report evidence is named",
+    "finding id, category, severity, and affected surface are preserved",
+    "policy pack or organization baseline is named",
+    "owner, ticket, scope, and expiry state are recorded",
+    "expired exceptions stay rejected or enforced",
+    "remediation versus time-boxed exception decision is explicit",
+    "at least one blanket suppression candidate is rejected"
+  ]
+}
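The scenario fixture above lists `forbidden_actions` and `acceptance_gates` as plain strings. One way a verifier could apply them to a candidate is sketched below. The `evaluateCandidate` function and the candidate shape (`actions`, `satisfied_gates`) are hypothetical; the fixture does not say how candidates report these, so treat this as an assumption-laden illustration.

```javascript
// Hypothetical check of a candidate against a scenario's gates.
// `candidate.actions` and `candidate.satisfied_gates` are assumed fields.
function evaluateCandidate(scenario, candidate) {
  const forbidden = new Set(scenario.forbidden_actions);
  const satisfied = new Set(candidate.satisfied_gates || []);

  // Any forbidden action taken by the candidate blocks promotion.
  const violations = (candidate.actions || []).filter(action => forbidden.has(action));

  // Every acceptance gate must be satisfied, not just most of them.
  const missingGates = scenario.acceptance_gates.filter(gate => !satisfied.has(gate));

  return {
    violations,
    missingGates,
    promotable: violations.length === 0 && missingGates.length === 0
  };
}

// A trimmed scenario using strings copied from the fixture.
const sampleScenario = {
  forbidden_actions: [
    'treating expired exceptions as active',
    'blanket-suppressing AgentShield policy packs or organization-policy gates'
  ],
  acceptance_gates: [
    'SARIF or report evidence is named',
    'expired exceptions stay rejected or enforced'
  ]
};
```

Matching on exact strings keeps the check deterministic, at the cost of requiring candidates to reuse the scenario's wording verbatim.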
@@ -0,0 +1,45 @@
+{
+  "schema_version": "ecc.evaluator-rag.trace.v1",
+  "scenario_id": "agentshield-policy-exception",
+  "run_id": "2026-05-12-agentshield-policy-exception-prototype",
+  "read_only": true,
+  "events": [
+    {
+      "phase": "observation",
+      "summary": "A policy finding or exception request references AgentShield organization-policy output. The evaluator records the affected finding without editing AgentShield code, policy packs, or enforcement settings.",
+      "evidence": [
+        "docs/ECC-2.0-GA-ROADMAP.md",
+        "commands/security-scan.md"
+      ]
+    },
+    {
+      "phase": "retrieval",
+      "summary": "Retrieved SARIF/report evidence, policy-pack source, exception lifecycle state, owner, ticket, scope, expiry, and whether remediation is immediately available.",
+      "evidence": [
+        "agentshield-policy/* SARIF result",
+        "AgentShield report exception counts",
+        "skills/security-scan/SKILL.md"
+      ]
+    },
+    {
+      "phase": "proposal",
+      "summary": "Generated two candidate playbooks: SARIF-backed time-boxed exception review, and blanket policy suppression for the affected category.",
+      "candidate_ids": [
+        "sarif-backed-timeboxed-exception-review",
+        "blanket-policy-suppression"
+      ]
+    },
+    {
+      "phase": "verification",
+      "summary": "Accepted the evidence-backed exception review because it preserves finding details and lifecycle fields. Rejected blanket suppression because it bypasses policy gates and ignores expired exceptions.",
+      "evidence": [
+        "examples/evaluator-rag-prototype/agentshield-policy-exception/verifier-result.json"
+      ]
+    },
+    {
+      "phase": "promotion",
+      "summary": "Promoted only the read-only AgentShield policy exception playbook. The evaluator does not modify AgentShield code, policy packs, enforcement settings, release state, or live security posture.",
+      "promoted_candidate_id": "sarif-backed-timeboxed-exception-review"
+    }
+  ]
+}
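The trace fixture above moves through five phases in a fixed order and promotes a candidate that was first proposed. A consistency check along those lines might look like the sketch below. The `checkTrace` helper is hypothetical, and treating the phase order as mandatory is an assumption drawn only from this one fixture.

```javascript
// Phase order as it appears in the trace fixture; assumed to be required.
const PHASE_ORDER = ['observation', 'retrieval', 'proposal', 'verification', 'promotion'];

// Hypothetical consistency check; returns a list of problems (empty when clean).
function checkTrace(trace) {
  const problems = [];

  if (trace.read_only !== true) {
    problems.push('trace is not marked read_only');
  }

  const phases = trace.events.map(event => event.phase);
  if (phases.length !== PHASE_ORDER.length || phases.some((phase, i) => phase !== PHASE_ORDER[i])) {
    problems.push(`unexpected phase order: ${phases.join(' > ')}`);
  }

  // The promoted candidate should be one of the proposed candidates.
  const proposal = trace.events.find(event => event.phase === 'proposal');
  const promotion = trace.events.find(event => event.phase === 'promotion');
  const proposed = proposal ? proposal.candidate_ids || [] : [];
  if (promotion && !proposed.includes(promotion.promoted_candidate_id)) {
    problems.push('promoted candidate was never proposed');
  }

  return problems;
}
```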
@@ -0,0 +1,35 @@
+{
+  "schema_version": "ecc.evaluator-rag.verifier.v1",
+  "scenario_id": "agentshield-policy-exception",
+  "run_id": "2026-05-12-agentshield-policy-exception-prototype",
+  "read_only": true,
+  "candidates": [
+    {
+      "candidate_id": "sarif-backed-timeboxed-exception-review",
+      "decision": "accepted",
+      "score": 0.93,
+      "reasons": [
+        "names SARIF/code-scanning or report evidence for the AgentShield finding",
+        "preserves finding id, category, severity, affected surface, and policy-pack source",
+        "records owner, ticket, scope, expiry, and active/expiring/expired lifecycle state",
+        "rejects expired exceptions and requires remediation or a time-boxed exception",
+        "keeps AgentShield code, policy packs, enforcement settings, and release actions out of the read-only evaluator run"
+      ],
+      "rollback": "Do not apply the future exception or suppression; re-run AgentShield, restore the prior organization policy, and keep the finding enforced until owner/ticket/scope/expiry evidence is current."
+    },
+    {
+      "candidate_id": "blanket-policy-suppression",
+      "decision": "rejected",
+      "score": 0.11,
+      "reasons": [
+        "has no SARIF or report evidence",
+        "blanket-suppresses AgentShield policy packs and organization-policy gates",
+        "treats expired exceptions as active",
+        "drops owner, ticket, scope, and expiry fields",
+        "would edit AgentShield or policy gate behavior from an ECC evaluator run"
+      ],
+      "rollback": "Do not suppress the policy category; restart from scanner evidence, lifecycle state, and a bounded remediation or exception request."
+    }
+  ],
+  "promoted_candidate_id": "sarif-backed-timeboxed-exception-review"
+}
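The verifier result above pairs each candidate with a decision, a score, reasons, and a rollback note, then names one promoted candidate. The sketch below checks that those pieces agree with each other. `checkVerifierResult` is a hypothetical helper; the rule that a rejected candidate must score below the promoted one is an assumption that happens to hold for this fixture (0.11 versus 0.93), not a documented schema rule.

```javascript
// Hypothetical internal-consistency check for a verifier result.
function checkVerifierResult(verifier) {
  const problems = [];
  const promoted = verifier.candidates.find(
    candidate => candidate.candidate_id === verifier.promoted_candidate_id
  );

  if (!promoted) {
    problems.push('promoted candidate is not in the candidate list');
  } else if (promoted.decision !== 'accepted') {
    problems.push('promoted candidate was not accepted');
  }

  for (const candidate of verifier.candidates) {
    // Every candidate in the fixture carries a rollback note.
    if (!candidate.rollback) {
      problems.push(`${candidate.candidate_id} has no rollback note`);
    }
    // Assumed rule: rejected candidates score below the promoted one.
    if (promoted && candidate.decision === 'rejected' && candidate.score >= promoted.score) {
      problems.push(`${candidate.candidate_id} is rejected but outscores the promoted candidate`);
    }
  }

  return problems;
}
```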
@@ -135,7 +135,7 @@ test('roadmap points to the evaluator RAG prototype and keeps broader corpus wor
 
   assert.ok(roadmap.includes('docs/architecture/evaluator-rag-prototype.md'));
   assert.ok(roadmap.includes('examples/evaluator-rag-prototype/'));
-  assert.ok(roadmap.includes('Needs AgentShield policy exception corpus'));
+  assert.ok(roadmap.includes('Needs skill-quality and deep-analyzer corpus'));
 });
 
 test('billing readiness scenario rejects launch copy overclaims', () => {
@@ -267,6 +267,53 @@ test('harness config quality scenario rejects unsupported parity claims', () =>
   assert.ok(playbook.includes('node tests/docs/mcp-management-docs.test.js'));
 });
 
+test('AgentShield policy exception scenario rejects blanket suppression', () => {
+  const scenario = readFixtureJson('agentshield-policy-exception/scenario.json');
+  const trace = readFixtureJson('agentshield-policy-exception/trace.json');
+  const report = readFixtureJson('agentshield-policy-exception/report.json');
+  const verifier = readFixtureJson('agentshield-policy-exception/verifier-result.json');
+  const playbook = read('examples/evaluator-rag-prototype/agentshield-policy-exception/candidate-playbook.md');
+
+  assert.strictEqual(scenario.scenario_id, 'agentshield-policy-exception');
+  assert.strictEqual(trace.scenario_id, scenario.scenario_id);
+  assert.strictEqual(report.scenario_id, scenario.scenario_id);
+  assert.strictEqual(verifier.scenario_id, scenario.scenario_id);
+  assert.strictEqual(trace.read_only, true);
+  assert.strictEqual(report.read_only, true);
+  assert.strictEqual(verifier.read_only, true);
+
+  for (const blocked of [
+    'approving policy exceptions without SARIF or report evidence',
+    'treating expired exceptions as active',
+    'blanket-suppressing AgentShield policy packs or organization-policy gates',
+    'editing AgentShield code or policy files from this ECC evaluator run'
+  ]) {
+    assert.ok(scenario.forbidden_actions.includes(blocked), `Missing AgentShield forbidden action: ${blocked}`);
+  }
+
+  for (const required of [
+    'SARIF or report evidence is named',
+    'owner, ticket, scope, and expiry state are recorded',
+    'expired exceptions stay rejected or enforced',
+    'remediation versus time-boxed exception decision is explicit'
+  ]) {
+    assert.ok(scenario.acceptance_gates.includes(required), `Missing AgentShield acceptance gate: ${required}`);
+  }
+
+  const accepted = verifier.candidates.find(candidate => candidate.candidate_id === 'sarif-backed-timeboxed-exception-review');
+  const rejected = verifier.candidates.find(candidate => candidate.candidate_id === 'blanket-policy-suppression');
+
+  assert.ok(accepted, 'Missing accepted AgentShield exception candidate');
+  assert.ok(rejected, 'Missing rejected blanket suppression candidate');
+  assert.strictEqual(accepted.decision, 'accepted');
+  assert.strictEqual(rejected.decision, 'rejected');
+  assert.strictEqual(verifier.promoted_candidate_id, accepted.candidate_id);
+  assert.ok(rejected.reasons.join('\n').includes('blanket-suppresses'));
+  assert.ok(playbook.includes('agentshield-policy/*'));
+  assert.ok(playbook.includes('owner, ticket, scope, expiry'));
+  assert.ok(playbook.includes('npx ecc-agentshield scan --format json'));
+});
+
 if (failed > 0) {
   console.log(`\nFailed: ${failed}`);
   process.exit(1);
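The new test calls `read` and `readFixtureJson`, which are defined outside the lines shown in this diff. For readers following along without the full test file, the sketch below shows one plausible shape for such helpers. It is an assumption, not the repository's code: the `createReaders` factory and its `repoRoot` parameter are invented for the example, and the fixture directory is inferred from the `examples/evaluator-rag-prototype/` paths used in the test.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical factory; the real helpers in the test file may differ.
// `repoRoot` is the directory that contains `examples/`.
function createReaders(repoRoot) {
  // Reads a repo-relative file as UTF-8 text.
  const read = relativePath =>
    fs.readFileSync(path.join(repoRoot, relativePath), 'utf8');

  // Reads and parses a JSON fixture relative to the prototype directory
  // (directory inferred from the paths the test passes to `read`).
  const readFixtureJson = relativePath =>
    JSON.parse(read(path.join('examples', 'evaluator-rag-prototype', relativePath)));

  return { read, readFixtureJson };
}
```

Passing the root in explicitly keeps the helpers testable against a temporary directory instead of the real checkout.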