docs: add evaluator billing readiness scenario (#1825)

This commit is contained in:
Affaan Mustafa
2026-05-12 17:24:34 -04:00
committed by GitHub
parent dcf5668b27
commit 863519eecf
8 changed files with 279 additions and 10 deletions

View File

@@ -199,7 +199,7 @@ is not complete unless the evidence column exists and has been freshly verified.
| AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit | PRs #53, #55-#62 landed with test evidence | Needs PDF/export decision or next enterprise signal |
| ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog | PRs #26-#39 landed with test evidence | Needs capacity-backed Linear rollout / broader evaluator corpus |
| GitGuardian/Dependabot/CodeRabbit-style checks | Non-blocking taxonomy and deterministic follow-up checks | ECC-Tools risk taxonomy check plus follow-up signals landed, including Skill Quality, Deep Analyzer Evidence, Analyzer Corpus Evidence, RAG/Evaluator Evidence, and PR Review/Salvage Evidence | Partially complete |
| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define the first read-only scenario, trace, report, playbook, and verifier result | Needs broader evaluator corpus |
| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md` and `examples/evaluator-rag-prototype/` define read-only stale-salvage and billing-readiness scenarios with trace, report, playbook, and verifier result artifacts | Needs broader evaluator corpus |
| Linear roadmap is detailed | Linear project status plus repo mirror | Repo mirror exists; issue creation was retried on 2026-05-12 and remains blocked by the workspace free issue limit | Needs recurring status updates after each merge batch |
| Flow separation and progress tracking | Flow lanes with owner artifacts and update cadence | This roadmap defines lanes below | Active |
| Realtime Linear sync | Project updates while issue limit is blocked; issues later | ECC-Tools #39 implements opt-in Linear API sync for deferred follow-up backlog items | Needs workspace capacity/config rollout |
@@ -218,7 +218,7 @@ back to the repo evidence and merge commits.
| Queue hygiene and salvage | GitHub PR/issue state, salvage ledger | Append ledger entries for any future stale closures | Every cleanup batch |
| Release and publication | rc.1 release docs, publication readiness doc | Naming matrix and plugin submission/contact checklist | Before any tag |
| Harness OS core | Audit, adapter matrix, observability docs, `ecc2/` | HUD/session-control acceptance spec | Weekly until GA |
| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus fixture contract | Expand to CI, billing, harness-config, and AgentShield scenarios |
| Evaluation and RAG | Reference-set validation, harness audit, traces | Read-only evaluator/RAG prototype plus stale-salvage and billing-readiness fixtures | Expand to CI, harness-config, and AgentShield scenarios |
| AgentShield enterprise | AgentShield PR evidence and roadmap notes | PDF-export decision or next enterprise signal | After value decision |
| ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy | Capacity-backed Linear rollout or broader evaluator/RAG corpus slice | Next implementation batch |
| Linear progress | Linear project status updates and this mirror | Status update with queue/evidence/missing gates | Every significant merge batch |
@@ -356,6 +356,11 @@ Acceptance:
Manifest Integrity, CI/CD Recommendation, Cost/Token Risk, Reference Set
Validation, Deep Analyzer Evidence, RAG/Evaluator Evidence,
PR Review/Salvage Evidence, Skill Quality, and Agent Config Review.
- Evaluator/RAG billing readiness fixture
`examples/evaluator-rag-prototype/billing-marketplace-readiness/` records the
read-only claim-verification path for Marketplace, App, subscription, seat,
entitlement, and plan language before launch copy can treat those claims as
live.
- Cost/token-risk predictive follow-ups flag AI routing, model-call, usage,
quota, and budget changes when budget evidence is missing.
- Reference-set validation follow-ups flag analyzer, skill, agent, command, and
@@ -412,6 +417,6 @@ Acceptance:
executive report, corpus benchmark output, and exception lifecycle audit.
2. Enable/configure the merged Linear backlog sync path after workspace issue
capacity clears or the Linear workspace is upgraded.
3. Expand the evaluator/RAG corpus beyond the first stale-salvage prototype to
CI failure diagnosis, harness-config drift, billing readiness, and
AgentShield policy exception scenarios.
3. Expand the evaluator/RAG corpus beyond the stale-salvage and billing
prototypes to CI failure diagnosis, harness-config drift, and AgentShield
policy exception scenarios.

View File

@@ -7,9 +7,10 @@ that loop.
The fixture set lives in
[`examples/evaluator-rag-prototype/`](../../examples/evaluator-rag-prototype/).
It uses the May 2026 stale-PR cleanup and salvage lane as the first concrete
scenario because that lane has real inputs, real accepted work, and real
rejected work.
It started with the May 2026 stale-PR cleanup and salvage lane because that
lane has real inputs, real accepted work, and real rejected work. The corpus now
also includes a billing/Marketplace readiness scenario so launch copy cannot
treat dry-run release evidence or roadmap intent as live billing state.
## Reference Pressure
@@ -83,6 +84,19 @@ The verifier rejects a blind cherry-pick proposal that:
- lacks tests or ledger updates;
- mutates release or plugin publication state.
## Corpus Fixtures
The root fixture files preserve the original
`stale-pr-salvage-maintainer-branch` prototype. Additional scenarios can live in
subdirectories when they reuse the same five-artifact contract.
Current corpus:
- `stale-pr-salvage-maintainer-branch`: recovers useful closed PR work through
maintainer-owned branches with attribution and validation.
- `billing-marketplace-readiness`: verifies billing, App, and Marketplace
launch claims before public copy says they are live.
## ECC Tools Mapping
ECC Tools already flags missing RAG/evaluator evidence for retrieval,
@@ -117,6 +131,4 @@ The next evaluator/RAG corpus should add:
- a CI-failure diagnosis scenario with captured logs and a known fix;
- a harness-config quality scenario covering MCP/plugin/hook drift;
- a billing-readiness scenario that separates verified Marketplace claims from
launch-copy assumptions;
- an AgentShield policy exception scenario with SARIF and report evidence.