docs: add deep-analyzer evaluator scenario

This commit is contained in:
Affaan Mustafa
2026-05-12 18:43:28 -04:00
committed by Affaan Mustafa
parent 337ced0828
commit 37c27a60fd
8 changed files with 297 additions and 12 deletions

View File

@@ -201,7 +201,7 @@ is not complete unless the evidence column exists and has been freshly verified.
| AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit | PRs #53, #55-#62 landed with test evidence | Needs PDF/export decision or next enterprise signal |
| ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog | PRs #26-#39 landed with test evidence | Needs capacity-backed Linear rollout / broader evaluator corpus |
| GitGuardian/Dependabot/CodeRabbit-style checks | Non-blocking taxonomy and deterministic follow-up checks | ECC-Tools risk taxonomy check plus follow-up signals landed, including Skill Quality, Deep Analyzer Evidence, Analyzer Corpus Evidence, RAG/Evaluator Evidence, and PR Review/Salvage Evidence | Partially complete |
| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, and skill-quality evidence scenarios with trace, report, playbook, and verifier result artifacts | Needs deep-analyzer corpus |
| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, and deep-analyzer evidence scenarios with trace, report, playbook, and verifier result artifacts | Local corpus complete; hosted integration remains future |
| Linear roadmap is detailed | Linear project status plus repo mirror | Repo mirror exists; issue creation was retried on 2026-05-12 and remains blocked by the workspace free issue limit | Needs recurring status updates after each merge batch |
| Flow separation and progress tracking | Flow lanes with owner artifacts and update cadence | This roadmap defines lanes below | Active |
| Realtime Linear sync | Project updates while issue limit is blocked; issues later | ECC-Tools #39 implements opt-in Linear API sync for deferred follow-up backlog items | Needs workspace capacity/config rollout |
@@ -220,7 +220,7 @@ back to the repo evidence and merge commits.
| Queue hygiene and salvage | GitHub PR/issue state, salvage ledger | Append ledger entries for any future stale closures | Every cleanup batch |
| Release and publication | rc.1 release docs, publication readiness doc | Naming matrix and plugin submission/contact checklist | Before any tag |
| Harness OS core | Audit, adapter matrix, observability docs, `ecc2/` | HUD/session-control acceptance spec | Weekly until GA |
| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, and skill-quality evidence fixtures | Expand to deep-analyzer evidence scenario |
| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, and deep-analyzer evidence fixtures | Use as fixture contract before hosted retrieval/check-run automation |
| AgentShield enterprise | AgentShield PR evidence and roadmap notes | PDF-export decision or next enterprise signal | After value decision |
| ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy | Capacity-backed Linear rollout or broader evaluator/RAG corpus slice | Next implementation batch |
| Linear progress | Linear project status updates and this mirror | Status update with queue/evidence/missing gates | Every significant merge batch |
@@ -419,6 +419,6 @@ Acceptance:
executive report, corpus benchmark output, and exception lifecycle audit.
2. Enable/configure the merged Linear backlog sync path after workspace issue
capacity clears or the Linear workspace is upgraded.
3. Expand the evaluator/RAG corpus beyond stale-salvage, billing, CI,
harness-config, AgentShield policy-exception, and skill-quality evidence
prototypes toward deep-analyzer evidence scenarios.
3. Consume the local evaluator/RAG corpus from ECC Tools before adding hosted
retrieval, vector storage, model-backed judging, or automated check-run
promotion.

View File

@@ -19,7 +19,9 @@ exception scenario gates security exceptions on SARIF/report evidence, owner
fields, expiry state, and remediation-versus-exception decisions. A
skill-quality evidence scenario requires observed failure or feedback evidence,
working examples, reference-set gaps, and validation commands before a skill
amendment can be promoted.
amendment can be promoted. A deep-analyzer evidence scenario requires analyzer
corpus cases, expected-output comparisons, and risk-taxonomy proof before
repository or commit-analysis behavior can change.
## Reference Pressure
@@ -116,6 +118,9 @@ Current corpus:
- `skill-quality-evidence`: requires focused skill scope, observed failure or
user-feedback evidence, examples/reference-set coverage, validation commands,
and publication safety before a skill amendment can be promoted.
- `deep-analyzer-evidence`: requires maintained analyzer corpus cases,
expected-output comparisons, representative repository/commit histories, and
regression commands before deep-analysis behavior can be promoted.
## ECC Tools Mapping
@@ -147,7 +152,7 @@ A candidate can be promoted only when:
## Next Expansion
The next evaluator/RAG corpus should add:
- a deep-analyzer evidence scenario with maintained reference sets and rejected
low-evidence candidates.
The local evaluator/RAG corpus now covers the current evidence buckets. Future
work should consume these fixtures from ECC Tools before adding hosted
retrieval, vector storage, model-backed judging, or automated check-run
promotion.