From 3b12fb273ffbb9e96dbb21b601e0eed409c3bff9 Mon Sep 17 00:00:00 2001 From: Affaan Mustafa Date: Wed, 13 May 2026 22:39:01 -0400 Subject: [PATCH] docs: sync ECC Tools hosted promotion readiness (#1890) --- docs/ECC-2.0-GA-ROADMAP.md | 27 +++++++++++++++------- tests/docs/evaluator-rag-prototype.test.js | 4 ++-- 2 files changed, 21 insertions(+), 10 deletions(-) diff --git a/docs/ECC-2.0-GA-ROADMAP.md b/docs/ECC-2.0-GA-ROADMAP.md index 8b40a5c5..f1faef36 100644 --- a/docs/ECC-2.0-GA-ROADMAP.md +++ b/docs/ECC-2.0-GA-ROADMAP.md @@ -158,6 +158,13 @@ As of 2026-05-13: queued PR analysis now reads the #65/#66 latest-result cache when publishing `ECC Tools / Hosted Depth Plan`, includes the latest hosted run status in the plan table, and recommends the next unrun ready job before reruns. +- ECC-Tools PR #68 merged as `2cde524b5ef8f34ab7bb1af973248fe4be4359f8` + and added deterministic hosted promotion readiness: + opened/synchronized PRs now publish a non-blocking + `ECC Tools / Hosted Promotion Readiness` check-run that compares changed + files against the checked-in evaluator/RAG corpus, warns on missing + hosted-job promotion evidence, and can be disabled with + `PR_HOSTED_PROMOTION_READINESS_CHECK_MODE=off`. - Handoff `ecc-supply-chain-audit-20260513-0645.md` under `~/.cluster-swarm/handoffs/` records the May 13 supply-chain sweep: no active lockfile/manifest hit for 
@@ -394,6 +401,10 @@ As of 2026-05-13: - ECC-Tools PR #67 feeds those cached results back into the hosted depth-plan check-run so queued analysis recommends the next unrun ready hosted job from cache state instead of repeating the static readiness order. +- ECC-Tools PR #68 adds the first evaluator-backed hosted promotion gate: + opened/synchronized PRs get a non-blocking Hosted Promotion Readiness + check-run that turns the evaluator/RAG corpus into warnings when changed + files match fixture scenarios without their expected evidence artifacts. - ECC PR #1803 landed the contributor Quarkus handling branch after maintainer cleanup, current-`main` alignment, full local validation, and preservation of the author's removal of incomplete ja-JP and zh-CN Quarkus translations. 
@@ -447,10 +458,10 @@ is not complete unless the evidence column exists and has been freshly verified. | Claude and Codex plugin publication | Contact/submission path with required artifacts and status | Publication readiness, naming matrix, and May 12 dry-run evidence document plugin validation, clean-checkout Claude tag/install smoke, and Codex marketplace CLI shape | Needs explicit approval for real tag/push and marketplace submission | | Articles, tweets, and announcements | X thread, LinkedIn copy, GitHub release copy, push checklist | Draft launch collateral exists under rc.1 release docs | Needs URL-backed refresh | | AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit, baseline drift Action/CLI surfaces, evidence-pack redaction, harness adapter registry, enterprise research roadmap, supply-chain hardened release path, CI-safe baseline fingerprints, corpus accuracy recommendations, remediation workflow phases, env proxy hijack corpus coverage | PRs #53, #55-#64, #67-#69, and #78-#82 landed with test evidence; native PDF export deferred in favor of self-contained HTML plus print-to-PDF until explicit enterprise demand appears; `docs/architecture/agentshield-enterprise-research-roadmap.md` now has baseline drift, evidence-pack bundle, redaction, adapter-registry, supply-chain hardening, hashed baseline fingerprints, corpus accuracy recommendation, remediation workflow, and env proxy hijack corpus slices landed | Next hosted evidence-pack workflow depth | -| ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog, evaluator/RAG corpus, analysis-depth readiness, hosted execution planning, hosted CI diagnostics, hosted security evidence review, hosted harness compatibility audit, hosted reference-set evaluation, hosted AI routing/cost review, hosted team backlog routing, hosted depth-plan check-run, PR-comment hosted job dispatch, hosted job result history/check-runs, 
hosted result status command, status-aware depth-plan recommendations | PRs #26-#43 plus #53-#67 landed with test evidence, including AgentShield evidence-pack gap routing, canonical bundle recognition, supply-chain signature gates, PR draft follow-up Linear tracking, evidence-backed/deep-ready repository classification, the `/api/analysis/depth-plan` hosted job plan, `/api/analysis/jobs/ci-diagnostics`, `/api/analysis/jobs/security-evidence-review`, `/api/analysis/jobs/harness-compatibility-audit`, `/api/analysis/jobs/reference-set-evaluation`, `/api/analysis/jobs/ai-routing-cost-review`, `/api/analysis/jobs/team-backlog-routing`, the `ECC Tools / Hosted Depth Plan` check-run, `/ecc-tools analyze --job ...` PR-comment dispatch, non-blocking per-hosted-job result check-runs backed by 30-day result cache records, `/ecc-tools analyze --job status` cache lookup, and cache-aware next-job recommendations in the depth-plan check-run | Next work is evaluator-backed hosted promotion | +| ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog, evaluator/RAG corpus, analysis-depth readiness, hosted execution planning, hosted CI diagnostics, hosted security evidence review, hosted harness compatibility audit, hosted reference-set evaluation, hosted AI routing/cost review, hosted team backlog routing, hosted depth-plan check-run, PR-comment hosted job dispatch, hosted job result history/check-runs, hosted result status command, status-aware depth-plan recommendations, hosted promotion readiness | PRs #26-#43 plus #53-#68 landed with test evidence, including AgentShield evidence-pack gap routing, canonical bundle recognition, supply-chain signature gates, PR draft follow-up Linear tracking, evidence-backed/deep-ready repository classification, the `/api/analysis/depth-plan` hosted job plan, `/api/analysis/jobs/ci-diagnostics`, `/api/analysis/jobs/security-evidence-review`, `/api/analysis/jobs/harness-compatibility-audit`, 
`/api/analysis/jobs/reference-set-evaluation`, `/api/analysis/jobs/ai-routing-cost-review`, `/api/analysis/jobs/team-backlog-routing`, the `ECC Tools / Hosted Depth Plan` check-run, `/ecc-tools analyze --job ...` PR-comment dispatch, non-blocking per-hosted-job result check-runs backed by 30-day result cache records, `/ecc-tools analyze --job status` cache lookup, cache-aware next-job recommendations in the depth-plan check-run, and the `ECC Tools / Hosted Promotion Readiness` corpus-backed PR check-run | Next work is hosted job output scoring and retrieval/model-backed promotion after the deterministic gate | | GitGuardian/Dependabot/CodeRabbit-style checks | Non-blocking taxonomy, deterministic follow-up checks, and local supply-chain gates | ECC-Tools risk taxonomy check plus follow-up signals landed, including Skill Quality, Deep Analyzer Evidence, Analyzer Corpus Evidence, RAG/Evaluator Evidence, PR Review/Salvage Evidence, and AgentShield evidence-pack evidence; #1846 added npm registry signature gates; #1848 added the supply-chain incident-response playbook and `pull_request_target` cache-poisoning validator guard; #1851 added the privileged checkout credential-persistence guard; AgentShield #78, JARVIS #13, and ECC-Tools #53 applied the same hardening outside trunk | Current supply-chain gate complete; deeper hosted review features remain future | -| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md`, `examples/evaluator-rag-prototype/`, and ECC-Tools PR #40 define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison scenarios with trace, report, playbook, verifier, and predictive-check artifacts | Local corpus complete; hosted integration remains future | -| Linear roadmap is 
detailed | Linear project status plus repo mirror | Repo mirror exists; issue creation was retried on 2026-05-12 and remains blocked by the workspace free issue limit; this May 13 sync adds ECC #1860, AgentShield #78-#82, JARVIS #13, ECC-Tools #53-#67, resolved queue/discussion counts, and Linear project status updates through ECC-Tools #67 | Needs recurring status updates after each merge batch | +| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md`, `examples/evaluator-rag-prototype/`, and ECC-Tools PR #40 define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison scenarios with trace, report, playbook, verifier, and predictive-check artifacts; ECC-Tools PR #68 now turns that corpus into a deterministic PR check-run gate | Deterministic hosted PR check integrated; hosted retrieval remains future | +| Linear roadmap is detailed | Linear project status plus repo mirror | Repo mirror exists; issue creation was retried on 2026-05-12 and remains blocked by the workspace free issue limit; this May 13 sync adds ECC #1860, AgentShield #78-#82, JARVIS #13, ECC-Tools #53-#68, resolved queue/discussion counts, and Linear project status updates through ECC-Tools #68 | Needs recurring status updates after each merge batch | | Flow separation and progress tracking | Flow lanes with owner artifacts and update cadence | This roadmap defines lanes below and `docs/architecture/progress-sync-contract.md` makes GitHub/Linear/handoff/roadmap sync part of the readiness gate | Active | | Realtime Linear sync | Project updates while issue limit is blocked; issues later | ECC-Tools #39 implements opt-in Linear API sync for deferred follow-up backlog items, and ECC-Tools #54 adds copy-ready PR drafts to that 
backlog when draft PR shells are not opened; `docs/architecture/progress-sync-contract.md` defines the local file-backed realtime boundary while issue capacity is blocked | Needs workspace capacity/config rollout | | Observability for self-use | Local readiness gate, traces, status snapshots, HUD/status contract, risk ledger, progress-sync contract | `npm run observability:ready` reports 21/21 | Complete for local gate | 
@@ -469,9 +480,9 @@ repo evidence and merge commits. | Queue hygiene and salvage | GitHub PR/issue state, salvage ledger | Append ledger entries for any future stale closures | Every cleanup batch | | Release and publication | rc.1 release docs, publication readiness doc | Naming matrix and plugin submission/contact checklist | Before any tag | | Harness OS core | Audit, adapter matrix, observability docs, `ecc2/` | HUD/session-control acceptance spec | Weekly until GA | -| Evaluation and RAG | Reference-set validation, harness audit, traces, ECC-Tools corpus | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison fixtures | Hosted retrieval/check-run automation plan | +| Evaluation and RAG | Reference-set validation, harness audit, traces, ECC-Tools corpus | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison fixtures; ECC-Tools #68 publishes the corpus as a hosted promotion readiness check-run | Hosted retrieval/model-backed promotion plan | | AgentShield enterprise | AgentShield PR evidence and roadmap notes | Remediation workflow depth or corpus expansion follow-up | Next implementation batch | -| ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy, evaluator/RAG corpus | ECC-Tools #53 published the supply-chain workflow hardening branch, #54 tracks copy-ready PR drafts in the Linear/project backlog, #55 classifies analysis-depth readiness, #56 exposes the hosted execution plan, #57 executes the first hosted CI diagnostics job, #58 executes the hosted security evidence review job, #59 executes the hosted harness compatibility audit, #60 executes the hosted reference-set evaluation, #61 executes the hosted 
AI routing/cost review, #62 executes hosted team backlog routing, #63 publishes the hosted depth-plan check-run, and #64 dispatches hosted jobs from PR comments; next work is hosted result history/check-run summaries | Next implementation batch | +| ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy, evaluator/RAG corpus | ECC-Tools #53 published the supply-chain workflow hardening branch, #54 tracks copy-ready PR drafts in the Linear/project backlog, #55 classifies analysis-depth readiness, #56 exposes the hosted execution plan, #57 executes the first hosted CI diagnostics job, #58 executes the hosted security evidence review job, #59 executes the hosted harness compatibility audit, #60 executes the hosted reference-set evaluation, #61 executes the hosted AI routing/cost review, #62 executes hosted team backlog routing, #63 publishes the hosted depth-plan check-run, #64 dispatches hosted jobs from PR comments, #65 persists hosted result history/check-runs, #66 exposes hosted job status from PR comments, #67 makes depth-plan recommendations cache-aware, and #68 publishes hosted promotion readiness from the evaluator/RAG corpus; next work is hosted job output scoring and retrieval/model-backed promotion | Next implementation batch | | Linear progress | Linear project status updates, `docs/architecture/progress-sync-contract.md`, and this mirror | Status update with queue/evidence/missing gates | Every significant merge batch | The project status update should always include: 
@@ -688,9 +699,9 @@ Acceptance: PR #82 expanded corpus coverage for env proxy hijacks and out-of-band exfiltration; and ECC-Tools PRs #42/#43 now route and recognize evidence packs. The next slice is hosted evidence-pack workflow depth. -2. Add evaluator-backed hosted promotion on top of the shipped executor, - status, and cache-aware depth-plan surfaces, keeping retrieval/vector work - behind deterministic fixture evaluation first. +2. Extend the #68 deterministic hosted promotion gate from corpus/path matching + to hosted job output scoring, keeping retrieval/vector work behind fixture + evaluation until the scoring contract is stable. 3. Enable/configure the merged Linear backlog sync path after workspace issue capacity clears or the Linear workspace is upgraded, then verify PR-draft salvage items land in the expected project. diff --git a/tests/docs/evaluator-rag-prototype.test.js b/tests/docs/evaluator-rag-prototype.test.js index 52459379..a62f3e90 100644 --- a/tests/docs/evaluator-rag-prototype.test.js +++ b/tests/docs/evaluator-rag-prototype.test.js @@ -130,12 +130,12 @@ test('candidate playbook preserves stale-salvage operating rules', () => { } }); -test('roadmap points to the evaluator RAG prototype and keeps hosted integration open', () => { +test('roadmap points to the evaluator RAG prototype and hosted PR check', () => { const roadmap = read('docs/ECC-2.0-GA-ROADMAP.md'); assert.ok(roadmap.includes('docs/architecture/evaluator-rag-prototype.md')); assert.ok(roadmap.includes('examples/evaluator-rag-prototype/')); - assert.ok(roadmap.includes('Local corpus complete; hosted integration remains future')); + assert.ok(roadmap.includes('Deterministic hosted PR check integrated; hosted retrieval remains future')); }); test('billing readiness scenario rejects launch copy overclaims', () => {
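Review bot: in the first roadmap hunk (`@@ -158,6 +158,13 @@`), the new PR #68 bullet documents only the `off` value of `PR_HOSTED_PROMOTION_READINESS_CHECK_MODE`, so a reader cannot tell what the default mode is or where the variable is set. Turning it off silences the warning for missing hosted-job promotion evidence, so please state the default and where the setting lives in the same bullet, or link the ECC-Tools doc that does.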
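Review bot: the bullet added at `@@ -394,6 +401,10 @@` calls PR #68 "the first evaluator-backed hosted promotion gate" while the same sentence describes a non-blocking check-run that only produces warnings. "Gate" reads as enforcement; suggest "advisory check", or say outright that the check-run does not block the PR, and use the same wording for "the #68 deterministic hosted promotion gate" in the `@@ -688,9 +699,9 @@` hunk and "deterministic PR check-run gate" in the table row.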
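Review bot: in the `@@ -447,10 +458,10 @@` table hunk, the ECC Tools row bumps the evidence range to "#53-#68" without a commit or link for #68, although the hunk's own context line says a row "is not complete unless the evidence column exists and has been freshly verified". The first hunk of this patch already records the merge commit for ECC-Tools PR #68; cite it in this cell or point at that bullet.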
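Review bot: in the `@@ -688,9 +699,9 @@` hunk, the rewritten item 2 holds retrieval/vector work back "until the scoring contract is stable", but nothing in the patch defines that contract or what counts as stable. The removed wording ("behind deterministic fixture evaluation first") named a checkable condition; suggest naming the document or fixture set that will hold the scoring contract so this item can be verified like the others.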
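Review bot: in `tests/docs/evaluator-rag-prototype.test.js`, the renamed test still pins the roadmap by an exact status sentence (`'Deterministic hosted PR check integrated; hosted retrieval remains future'`), so any rewording of that table cell fails the test, while nothing asserts what the new title promises. Consider also asserting on the stable identifier the roadmap now carries, `ECC Tools / Hosted Promotion Readiness`, so the test checks that the hosted PR check is actually referenced.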