docs: refresh ECC 2.0 reference architecture (#1783)

2026-05-12 15:47:27 +08:00 · 2026-05-12 02:03:07 -04:00
parent cb2a70ce72
commit 60bd26fadf
2 changed files with 233 additions and 48 deletions
--- a/docs/ECC-2.0-GA-ROADMAP.md
+++ b/docs/ECC-2.0-GA-ROADMAP.md
@@ -34,8 +34,9 @@ As of 2026-05-12:
 - Do not publish release or social announcements until the GitHub release,
  npm/package state, billing state, and plugin submission surfaces are verified
  with fresh evidence.
- Do not treat closed stale PRs as discarded. Inspect them, port useful current
-  compatible work on maintainer-owned branches, and credit the source PR.
+- Do not treat closed stale PRs as discarded. Pair each cleanup batch with a
+  salvage pass: inspect the closed diffs, port useful compatible work on
+  maintainer-owned branches, and credit the source PR.
 - Do not create new Linear issues until the active issue limit is cleared.

 ## Reference Pressure
@@ -167,7 +168,8 @@ Acceptance:
 - Each useful artifact is marked landed, Linear/project-tracked, salvage
  branch, or archive/no-action.
 - Stale PR salvage policy stays in force: close stale/conflicted PRs first,
-  then port useful compatible content on maintainer branches with attribution.
+  record a salvage ledger item, then port useful compatible content on
+  maintainer branches with attribution.
 - #1687 localization leftovers are handled only by translator/manual review,
  not blind cherry-pick.

@@ -181,3 +183,5 @@ Acceptance:
   payments announcement.
 5. Inventory `_legacy-documents-*` and map useful artifacts to landed,
   milestone-tracked, salvage, or archive states.
+6. Build the stale-PR salvage ledger from closed cleanup batches, then port
+   useful pieces in small attributed maintainer PRs.
--- a/docs/ECC-2.0-REFERENCE-ARCHITECTURE.md
+++ b/docs/ECC-2.0-REFERENCE-ARCHITECTURE.md
@@ -1,57 +1,238 @@
 # ECC 2.0 Reference Architecture

-Research summary from competitor/reference analysis (2026-03-22).
-
-For the current GA execution roadmap and Linear milestone mirror, see
+Current execution mirror:
 [`ECC-2.0-GA-ROADMAP.md`](ECC-2.0-GA-ROADMAP.md).

-## Competitive Landscape
+This document turns the May 2026 reference sweep into concrete ECC backlog
+shape. It is not a second strategy memo: every reference pressure below should
+land as an adapter, check, observable signal, security policy, PR review
+surface, or release-readiness gate.

-| Project | Stars | Language | Type | Multi-Agent | Worktrees | Terminal-native |
-|---------|-------|----------|------|-------------|-----------|-----------------|
-| **ECC 2.0** | - | Rust | TUI | Yes | Yes | **Yes (SSH)** |
-| superset-sh/superset | 7.7K | TypeScript | Electron | Yes | Yes | No (desktop) |
-| standardagents/dmux | 1.2K | TypeScript | TUI (Ink) | Yes | Yes | Yes |
-| opencode-ai/opencode | 11.5K | Go | TUI | No | No | Yes |
-| smtg-ai/claude-squad | 6.5K | Go | TUI | Yes | Yes | Yes |
+## Reference Baseline

-## Three-Layer Architecture
+Snapshot date: 2026-05-12.

-```
-┌─────────────────────────────────┐
-│        TUI Layer (ratatui)      │  User-facing dashboard
-│  Panes, diff viewer, hotkeys    │  Communicates via Unix socket
-├─────────────────────────────────┤
-│     Runtime Layer (library)     │  Workspace runtime, agent registry,
-│  State persistence, detection   │  status detection, SQLite
-├─────────────────────────────────┤
-│     Daemon Layer (process)      │  Persistent across TUI restarts
-│  Terminal sessions, git ops,    │  PTY management, heartbeats
-│  agent process supervision      │
-└─────────────────────────────────┘
+| Reference | Primary pressure on ECC 2.0 | Concrete ECC delta |
+| --- | --- | --- |
+| [`stablyai/orca`](https://github.com/stablyai/orca) | Worktree-native multi-agent IDE with terminals, source control, GitHub integration, SSH, notifications, design/browser mode, account switching, and per-worktree context. | Treat worktree lifecycle, review state, notification state, and account/provider identity as first-class adapter signals. |
+| [`superset-sh/superset`](https://github.com/superset-sh/superset) | Desktop AI-agent workspace with parallel execution, worktree isolation, diff review, workspace presets, and broad CLI-agent compatibility. | Add workspace preset taxonomy and make ECC2 session/worktree state exportable enough for external editors to consume. |
+| [`standardagents/dmux`](https://github.com/standardagents/dmux) | Tmux/worktree orchestration, lifecycle hooks, multi-select agent control, smart merging, file browser, notifications, and cleanup. | Add lifecycle-hook coverage to the harness matrix and define merge/conflict queue events. |
+| [`aidenybai/ghast`](https://github.com/aidenybai/ghast) | Native macOS terminal multiplexer with cwd-grouped workspaces, panes, tabs, drag/drop, search, and notifications. | Preserve terminal-native ergonomics while adding cwd/session grouping and searchable handoff/session records. |
+| [`jarrodwatts/claude-hud`](https://github.com/jarrodwatts/claude-hud) | Always-visible Claude Code statusline for context, tools, agents, todos, and transcript-backed activity. | Formalize the ECC HUD/status payload for context, cost, tool calls, active agents, todos, queue state, checks, and risk. |
+| [`stanford-iris-lab/meta-harness`](https://github.com/stanford-iris-lab/meta-harness) | Automated search over task-specific harness design: what to store, retrieve, and show. | Split ECC improvement loops into scenario spec, proposer trace, verifier result, and promoted playbook. |
+| [`greyhaven-ai/autocontext`](https://github.com/greyhaven-ai/autocontext) | Recursive harness improvement using traces, reports, artifacts, datasets, playbooks, and role-separated evaluators. | Store reusable traces and playbooks before mutating installed harness assets. |
+| [`NousResearch/hermes-agent`](https://github.com/NousResearch/hermes-agent) | Self-improving operator shell with memories, skills, scheduler, gateways, subagents, terminal backends, and migration tooling. | Keep ECC portable across local, SSH, container, and hosted terminal backends without hiding the underlying commands. |
+| [`anthropics/claude-code`](https://github.com/anthropics/claude-code), [`sst/opencode`](https://github.com/sst/opencode), Zed, Codex, Cursor, Gemini | Different agent harnesses expose different hooks, plugin surfaces, session stores, config files, and review loops. | Maintain a public adapter compliance matrix instead of treating one harness as the canonical UX. |
+| Local Claude Code source review | Session, tool, permission, hook, remote, analytics, task, and context-suggestion surfaces are more structured than the public CLI UX suggests. | Model status and risk events around session messages, permission requests, tool progress, context pressure, and summary state. |
+
+## Architecture Shape
+
+ECC 2.0 should be a harness operating system, not only a catalog of commands,
+agents, and skills.
+
+```text
+┌──────────────────────────────────────────────────────────────┐
+│ Operator Surface                                             │
+│ CLI, plugin, TUI, HUD/statusline, release gates, PR checks   │
+├──────────────────────────────────────────────────────────────┤
+│ Harness Adapter Layer                                        │
+│ Claude Code, Codex, OpenCode, Cursor, Gemini, Zed, dmux,     │
+│ Orca, Superset, Ghast, terminal-only                         │
+├──────────────────────────────────────────────────────────────┤
+│ Worktree, Session, And Queue Runtime                         │
+│ worktrees, panes, sessions, todos, checks, merge/conflict    │
+│ queues, notification state, ownership, handoff exports       │
+├──────────────────────────────────────────────────────────────┤
+│ Observability And Evaluation Loop                            │
+│ JSONL traces, status snapshots, risk ledger, harness audit,  │
+│ scenario specs, verifiers, promoted playbooks, RAG sets      │
+├──────────────────────────────────────────────────────────────┤
+│ Security And Commercial Platform                             │
+│ AgentShield policies/SARIF, ECC Tools checks, billing,       │
+│ Linear/GitHub sync, enterprise reports                       │
+└──────────────────────────────────────────────────────────────┘
 ```

-## Patterns to Adopt
+## Reference-To-Backlog Map

-### From Superset (Electron, 7.7K stars)
- **Workspace Runtime Registry** — trait-based abstraction with capability flags
- **Persistent daemon terminal** — sessions survive restarts via IPC
- **Per-project mutex** for git operations (prevents race conditions)
- **Port allocation** per workspace for dev servers
- **Cold restore** from serialized terminal scrollback
+### Worktree And Session Orchestration

-### From dmux (Ink TUI, 1.2K stars)
- **Worker-per-pane status detection** — fingerprint terminal output + LLM classification
- **Agent Registry** — centralized agent definitions (install check, launch cmd, permissions)
- **Retry strategies** — different policies for destructive vs read-only operations
- **PaneLifecycleManager** — exclusive locks preventing concurrent pane races
- **Lifecycle hooks** — worktree_created, pre_merge, post_merge
- **Background cleanup queue** — async worktree deletion
+Adopt from Orca, Superset, dmux, and Ghast:

-## ECC 2.0 Advantages
- Terminal-native (works over SSH, unlike Superset)
- Integrates with 116-skill ecosystem
- AgentShield security scanning
- Self-improving skill evolution (continuous-learning-v2)
- Rust single binary (3.4MB, no runtime deps)
- First Rust-based agentic IDE TUI in open source
+- Worktree lifecycle events: create, resume, pause, stop, diff, review, PR,
+  merge-ready, conflict, stale, close, salvage.
+- Session grouping by repo, branch, cwd, task, owner, and harness.
+- Workspace presets for release lane, PR triage lane, docs lane, security lane,
+  and test-writer lane.
+- Notifications for blocked CI, dirty worktrees, merge conflicts, stale review,
+  and finished autonomous runs.
+- Review loops that can annotate diffs and PRs without taking ownership away
+  from maintainers.
+
+Repo work:
+
+- `everything-claude-code`: extend the adapter compliance matrix and public
+  scorecard onramp.
+- `ecc2`: surface session/worktree state through a stable local payload before
+  adding hosted telemetry.
+- `ECC-Tools`: consume the same lifecycle events for PR checks, issue routing,
+  and Linear sync.
+
+Verification:
+
+- `npm run harness:audit -- --format json`
+- `npm run observability:ready`
+- targeted adapter matrix tests once the matrix moves from docs to data
+
+### HUD, Status, And Observability
+
+Adopt from Claude HUD and the Claude Code source review:
+
+- Context pressure: usage, compaction risk, large-result warnings, and summary
+  state.
+- Tool activity: active tool, recent tools, duration, risky operations, and
+  permission requests.
+- Agent activity: active subagents, delegated task, branch/worktree, and wait
+  state.
+- Queue activity: open PRs/issues, CI state, stale/conflict batches, review
+  state, and closed-stale salvage backlog.
+- Cost/risk: token cost estimate, destructive-operation risk, hook/MCP risk,
+  and security scan state.
+
+Repo work:
+
+- Keep `docs/architecture/observability-readiness.md` as the operator-facing
+  readiness gate.
+- Define a versioned HUD/status JSON contract that both ECC2 and ECC Tools can
+  consume.
+- Add sample exports from `loop-status`, `session-inspect`, harness audit, and
+  risk ledger into a fixture directory before building visual UI.
+
+Verification:
+
+- `npm run observability:ready`
+- fixture validation for every status payload
+- cross-platform smoke test for commands that read session history
+
+### Self-Improving Harness Loop
+
+Adopt from Meta-Harness, Autocontext, and Hermes Agent:
+
+- Separate the loop into observation, proposal, verification, promotion, and
+  rollback.
+- Store every proposed improvement as trace plus artifact, not only as a final
+  changed file.
+- Promote playbooks only after a verifier proves that they improve a scenario
+  without widening blast radius.
+- Use RAG/reference sets for vetted ECC patterns, team history, CI failures,
+  review outcomes, harness config quality, and security decisions.
+
+Repo work:
+
+- `everything-claude-code`: document scenario specs, verifier contracts, and
+  playbook promotion rules.
+- `ECC-Tools`: map analyzer findings to PR comments, check runs, and Linear
+  tasks without flooding the workspace.
+- `agentshield`: feed prompt-injection and config-risk findings into regression
+  suites.
+
+Verification:
+
+- read-only prototype that emits a trace, report, candidate playbook, and
+  verifier result
+- regression fixture proving a bad proposal is rejected
+
+### AgentShield Enterprise Security Platform
+
+AgentShield should move from useful scanner to enterprise security platform.
+
+Backlog shape:
+
+- Policy schema for org baseline, rule severity, owner, exception, expiration,
+  evidence, and audit trail.
+- SARIF output for GitHub code scanning.
+- Policy packs for OSS, team, enterprise, regulated, high-risk hooks/MCP, and
+  CI enforcement.
+- Supply-chain intelligence for MCP packages, npm/pip provenance, CVEs,
+  typosquats, and dependency reputation.
+- Prompt-injection corpus and regression benchmark.
+- JSON plus executive HTML/PDF report output.
+
+Verification:
+
+- schema unit tests
+- SARIF fixture tests
+- policy-pack golden tests
+- false-positive regression tests from the public issue history
+
+### ECC Tools Commercial And Review Platform
+
+ECC Tools should become the GitHub-native layer for billing, deep analysis,
+PR checks, and Linear progress tracking.
+
+Backlog shape:
+
+- Native GitHub Marketplace billing audit before any payments announcement:
+  plans, seats, org/account mapping, subscription state, overage behavior,
+  downgrade/cancel behavior, and failure modes.
+- Deep analyzer comparable in scope to the useful parts of GitGuardian,
+  Dependabot, CodeRabbit, and Greptile: security evidence, dependency risk,
+  CI/CD recommendations, PR review behavior, config quality, token/cost risk,
+  and harness drift.
+- RAG/reference set over vetted ECC patterns, historical PR outcomes,
+  dependency advisories, CI failures, review decisions, and team-specific
+  conventions.
+- Linear sync that maps findings to project status, milestone evidence, and
+  owner-ready issues without exhausting issue limits.
+
+Verification:
+
+- check-run fixture tests
+- billing webhook replay tests
+- analyzer golden PR fixtures
+- Linear sync dry-run fixture
+
+### Closed-Stale Salvage Lane
+
+Closing stale PRs keeps the public queue usable, but useful work should not be
+lost because a contributor no longer has time to rebase.
+
+Execution rule:
+
+1. Close stale, conflicted, or obsolete PRs with a clear courtesy comment.
+2. Record them in a salvage ledger with source PR, author, reason closed,
+   useful files/concepts, risk, and recommended maintainer action.
+3. After the cleanup batch, inspect each closed PR diff manually.
+4. Cherry-pick only when the patch still applies cleanly and preserves current
+   architecture. Otherwise reimplement the useful idea in a fresh maintainer
+   branch.
+5. Preserve attribution in the commit body or PR body.
+6. Comment back on the source PR when useful work lands, linking the maintainer
+   PR or merged commit.
+7. Mark the ledger item as landed, superseded, Linear-tracked, or no-action.
+
+Required safeguards:
+
+- Never blind cherry-pick generated churn, bulk localization, or dependency
+  major-version changes.
+- Prefer small maintainer PRs over one salvage megabranch.
+- Run the same validation gates as normal code, docs, or catalog changes.
+- Keep contributor credit even when the final implementation is rewritten.
+
+## Near-Term Implementation Order
+
+1. Extend the harness adapter matrix and public scorecard onramp.
+2. Add the release/name/plugin publication checklist with evidence fields.
+3. Define the HUD/status JSON contract and fixture directory.
+4. Start AgentShield policy schema plus SARIF fixtures.
+5. Audit ECC Tools billing and check-run surfaces.
+6. Inventory legacy folders and closed-stale PRs into the salvage ledger.
+7. Port useful stale work in small attributed maintainer PRs.
+
+## Non-Goals
+
+- Hosted telemetry before the local event model is useful and testable.
+- Automatic mutation of user harness configs without verifier evidence.
+- Treating any one agent harness as the canonical interface.
+- Release or payments announcements before command, package, marketplace, and
+  billing evidence is fresh.