diff --git a/.huskies/specs/tech/SPIKE_679_HTTP_TO_CRDT_BUS.md b/.huskies/specs/tech/SPIKE_679_HTTP_TO_CRDT_BUS.md
new file mode 100644
index 00000000..37040c94
--- /dev/null
+++ b/.huskies/specs/tech/SPIKE_679_HTTP_TO_CRDT_BUS.md
@@ -0,0 +1,401 @@
+# Spike 679: Migrate Inter-Component HTTP to Signed CRDT WebSocket Bus
+
+## 1. Endpoint Inventory
+
+Every HTTP/WS endpoint currently exposed by the gateway and project servers, with caller, purpose, and requirements.
+
+### Standard-Mode Server Endpoints
+
+#### WebSocket
+
+| Path | Caller | Purpose | Latency | Freshness | Durability |
+|------|--------|---------|---------|-----------|------------|
+| `/ws` | Browser frontend | Chat messages, command output streaming | Real-time | N/A (stream) | Ephemeral |
+| `/crdt-sync` | Peer nodes, headless agents | CRDT op replication, snapshot exchange | Sub-second | Must converge | Durable (SQLite) |
+
+#### MCP
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| GET/POST | `/mcp` | Claude Code agent (stdio), gateway proxy | Agent tool calls (story create/update, git, shell, etc.) | <500 ms | Strong (mutations) | Durable via CRDT |
+
+#### Agents API
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| POST | `/api/agents/start` | Frontend, MCP | Start a coding agent for a story | <1 s | N/A | Durable (process started) |
+| POST | `/api/agents/stop` | Frontend, MCP | Stop a running agent | <1 s | N/A | Durable (process killed) |
+| GET | `/api/agents` | Frontend | List active agents and status | <100 ms | Near-real-time | None (in-memory) |
+| GET | `/api/agents/config` | Frontend | Read agent config from project.toml | <100 ms | Seconds OK | None |
+| POST | `/api/agents/config/reload` | Frontend | Reload config from disk | <500 ms | N/A | None |
+| POST | `/api/agents/worktrees` | MCP | Create worktree for a story | <1 s | N/A | Durable (git) |
+| GET | `/api/agents/worktrees` | Frontend, MCP | List worktrees | <100 ms | Seconds OK | None |
+| DELETE | `/api/agents/worktrees/:story_id` | MCP | Remove a worktree | <1 s | N/A | Durable (git) |
+| GET | `/api/agents/:story_id/:name/output` | Frontend, MCP | Read agent log file | <200 ms | Seconds OK | Durable (JSONL file) |
+| GET | `/api/work-items/:story_id` | MCP | Get story test results | <100 ms | Seconds OK | Durable (file) |
+| GET | `/api/work-items/:story_id/test-results` | MCP | Fetch cached test run output | <100 ms | Seconds OK | Durable (file) |
+| GET | `/api/work-items/:story_id/token-cost` | MCP | Get token usage for story | <100 ms | Seconds OK | Durable (file) |
+| GET | `/api/token-usage` | Frontend | Aggregate token usage | <100 ms | Minutes OK | Durable (file) |
+
+#### Project Management
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| GET | `/api/project` | Frontend | Get current project config | <100 ms | Seconds OK | Durable (file) |
+| POST | `/api/project` | Frontend | Update project config | <500 ms | N/A | Durable (file) |
+| DELETE | `/api/project` | Frontend | Reset project config | <500 ms | N/A | Durable (file) |
+| GET | `/api/projects` | Frontend | List all known projects | <100 ms | Seconds OK | Durable (file) |
+| POST | `/api/projects/forget` | Frontend | Remove project from registry | <500 ms | N/A | Durable (file) |
+
+#### Chat
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| POST | `/api/chat/cancel` | Frontend | Cancel an in-progress chat | <100 ms | N/A | None |
+
+#### Settings
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| GET/PUT | `/api/settings` | Frontend | Read/write general settings | <100 ms | Seconds OK | Durable (JSON store) |
+| GET/PUT | `/api/settings/editor` | Frontend | Read/write editor setting | <100 ms | Seconds OK | Durable (JSON store) |
+| POST | `/api/settings/open-file` | Frontend | Open file in editor | <500 ms | N/A | None |
+
+#### IO (Filesystem/Shell)
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| POST | `/api/io/fs/read` | Agent (MCP alt), Frontend | Read file contents | <200 ms | Real-time | N/A |
+| POST | `/api/io/fs/write` | Agent (MCP alt), Frontend | Write file contents | <500 ms | N/A | Durable (fs) |
+| POST | `/api/io/fs/list` | Frontend | List directory relative to project | <100 ms | Real-time | N/A |
+| POST | `/api/io/fs/list/absolute` | Frontend | List absolute path directory | <100 ms | Real-time | N/A |
+| POST | `/api/io/fs/create/absolute` | Frontend | Create file at absolute path | <500 ms | N/A | Durable (fs) |
+| GET | `/api/io/fs/home` | Frontend | Get home directory | <50 ms | Stable | N/A |
+| GET | `/api/io/fs/files` | Frontend | File tree of project | <500 ms | Seconds OK | N/A |
+| POST | `/api/io/search` | Frontend | Ripgrep search | <1 s | Real-time | N/A |
+| POST | `/api/io/shell/exec` | Frontend | Execute shell command | Variable | N/A | None |
+
+#### Model / LLM Config
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| GET/POST | `/api/model` | Frontend | Read/write active model selection | <100 ms | Seconds OK | Durable (JSON store) |
+| GET | `/api/ollama/models` | Frontend | List available Ollama models | <1 s | Minutes OK | None |
+| GET | `/api/anthropic/key/exists` | Frontend | Check if API key is set | <50 ms | Seconds OK | None |
+| POST | `/api/anthropic/key` | Frontend | Store Anthropic API key | <100 ms | N/A | Durable (store) |
+| GET | `/api/anthropic/models` | Frontend | List Claude models | <1 s | Minutes OK | None |
+
+#### Wizard
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| GET | `/api/wizard` | Frontend | Get wizard state | <100 ms | Real-time | Durable (store) |
+| PUT | `/api/wizard/step/:step/content` | Frontend | Update step content | <200 ms | N/A | Durable (store) |
+| POST | `/api/wizard/step/:step/confirm` | Frontend | Confirm a wizard step | <200 ms | N/A | Durable |
+| POST | `/api/wizard/step/:step/skip` | Frontend | Skip a wizard step | <100 ms | N/A | Durable |
+| POST | `/api/wizard/step/:step/generating` | Frontend | Mark step as generating | <100 ms | N/A | Durable |
+
+#### Bot / Transports
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| POST | `/api/bot/command` | Frontend | Send a bot command | <500 ms | N/A | None |
+| GET/PUT | `/api/bot/config` | Frontend | Read/write bot config | <100 ms | Seconds OK | Durable (file) |
+
+#### Auth / OAuth
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| GET | `/oauth/authorize` | Browser redirect | Start OAuth flow | <200 ms | N/A | None |
+| GET | `/callback` | OAuth provider redirect | Handle OAuth callback | <500 ms | N/A | Durable (token) |
+| GET | `/oauth/status` | Frontend | Check OAuth connection status | <100 ms | Seconds OK | None |
+
+#### Webhooks (External Inbound)
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| GET/POST | `/webhook/whatsapp` | WhatsApp platform | Receive WhatsApp messages | <200 ms | Real-time | None (forwarded) |
+| POST | `/webhook/slack` | Slack platform | Receive Slack events | <200 ms | Real-time | None (forwarded) |
+| POST | `/webhook/slack/command` | Slack platform | Receive Slack slash commands | <200 ms | Real-time | None (forwarded) |
+
+#### Debug / Health
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| GET | `/health` | Gateway, load balancer | Health check | <50 ms | Real-time | None |
+| GET | `/debug/crdt` | Developer/ops | Dump raw CRDT state | <500 ms | Real-time | None |
+| GET (SSE) | `/api/agents/:story_id/:name/stream` | Frontend | Stream live agent output | Real-time | N/A | None |
+| GET | `/api/events` | Gateway polling task | Poll project events | <200 ms | Seconds OK | None |
+
+#### Frontend Assets
+
+| Path | Purpose |
+|------|---------|
+| `/` | SPA entry point |
+| `/assets/*` | JS/CSS/fonts (rust-embed) |
+| `/*path` | SPA fallback |
+
+---
+
+### Gateway-Mode Server Endpoints
+
+| Method | Path | Caller | Purpose | Latency | Freshness | Durability |
+|--------|------|--------|---------|---------|-----------|------------|
+| GET | `/health` | Load balancer, project containers | Health check | <50 ms | Real-time | None |
+| GET | `/bot-config` | Browser | Serve bot config HTML page | <100 ms | N/A | N/A |
+| GET | `/api/gateway` | Frontend | Get gateway state (active project, project list) | <100 ms | Seconds OK | Durable (toml) |
+| POST | `/api/gateway/switch` | Frontend, MCP | Switch active project | <200 ms | N/A | Durable (in-memory + file) |
+| GET | `/api/gateway/pipeline` | Frontend | Aggregate pipeline status across all projects | <1 s | Seconds OK | None (aggregated) |
+| POST | `/api/gateway/projects` | Frontend, init_project MCP | Register a new project in projects.toml | <500 ms | N/A | Durable (file) |
+| DELETE | `/api/gateway/projects/:name` | Frontend | Remove a registered project | <500 ms | N/A | Durable (file) |
+| GET/PUT | `/api/gateway/bot-config` | Frontend | Read/write bot config file | <100 ms | Seconds OK | Durable (file) |
+| GET/POST | `/mcp` | Claude Code agent | MCP proxy to active project | <500 ms | Strong | Durable via upstream |
+| GET | `/gateway/mode` | Frontend | Check whether gateway mode is active | <50 ms | Stable | None |
+| POST | `/gateway/tokens` | Ops/admin | Generate a headless-agent join token | <100 ms | N/A | In-memory HashMap |
+| POST | `/gateway/register` | Headless build agent at startup | Register agent with token, supply address | <200 ms | N/A | In-memory Vec |
+| GET | `/gateway/agents` | Frontend, ops | List all registered headless agents | <100 ms | Seconds OK | In-memory Vec |
+| DELETE | `/gateway/agents/:id` | Frontend, ops | Deregister an agent | <200 ms | N/A | In-memory Vec |
+| POST | `/gateway/agents/:id/assign` | Frontend, ops | Assign agent to a project | <200 ms | N/A | In-memory Vec |
+| POST | `/gateway/agents/:id/heartbeat` | Headless agent (periodic) | Signal agent is alive | <100 ms | Real-time | In-memory Vec |
+
+---
+
+## 2. Classification
+
+| Endpoint Group | Classification |
+|---------------|----------------|
+| `/webhook/whatsapp`, `/webhook/slack`, `/webhook/slack/command` | **external-webhook** |
+| `/`, `/assets/*`, `/*path`, `/bot-config` (HTML) | **frontend-asset** |
+| `POST /api/agents/start`, `POST /api/agents/stop`, `POST /api/agents/worktrees`, `DELETE /api/agents/worktrees/:id` | **write** |
+| `POST /api/project`, `DELETE /api/project`, `POST /api/projects/forget` | **write** |
+| `PUT /api/settings`, `PUT /api/settings/editor`, `POST /api/settings/open-file` | **write** |
+| `POST /api/model`, `POST /api/anthropic/key` | **write** |
+| `POST /api/wizard/step/*`, `PUT /api/wizard/step/*` | **write** |
+| `POST /api/bot/command`, `PUT /api/bot/config` | **write** |
+| `POST /api/io/fs/write`, `POST /api/io/fs/create/absolute`, `POST /api/io/shell/exec` | **write** |
+| `POST /api/gateway/switch`, `POST /api/gateway/projects`, `DELETE /api/gateway/projects/:name` | **write** |
+| `POST /gateway/tokens`, `POST /gateway/register`, `DELETE /gateway/agents/:id`, `POST /gateway/agents/:id/assign` | **write** |
+| `POST /gateway/agents/:id/heartbeat` | **write** |
+| `POST /mcp`, `GET /mcp` | **write** (mutations dominate; reads via CRDT subscription eventually) |
+| All remaining `GET` endpoints | **read** |
+| `POST /api/chat/cancel`, `POST /api/agents/config/reload` | **write** (side-effect only, stateless result) |
+
+---
+
+## 3. Write Endpoints → Target CRDT Collections
+
+| Endpoint | Current Storage | Target CRDT Collection | Notes |
+|----------|----------------|----------------------|-------|
+| `POST /gateway/tokens` | `GatewayState.pending_tokens: HashMap` | `tokens` — LWW map keyed by token UUID | TTL field; garbage-collect expired entries |
+| `POST /gateway/register` | `GatewayState.joined_agents: Vec` | `nodes` — existing CRDT node collection (extend with agent metadata) | Already partially exists for CRDT mesh peers |
+| `POST /gateway/agents/:id/assign` | `joined_agents` Vec mutation | `nodes` — LWW field `assigned_project` per node entry | |
+| `DELETE /gateway/agents/:id` | `joined_agents` Vec mutation | `nodes` — tombstone / remove entry | Add-wins or explicit remove flag |
+| `POST /gateway/agents/:id/heartbeat` | `joined_agents` Vec `last_seen` field | `nodes` — LWW `last_seen_ms` field per node | Low-cost: just a timestamp LWW |
+| `POST /api/agents/start` | `AgentPool.agents: HashMap` | No new CRDT; agent process is local. Side-effect only. Assign record if cross-node visibility needed → `active_agents` LWW map | |
+| `POST /api/agents/stop` | `AgentPool.agents` mutation | Same as above | |
+| `POST /api/agents/worktrees` | git filesystem | No CRDT needed; git worktrees are local | |
+| `POST /api/gateway/switch` | `GatewayState.active_project` in-memory | `gateway_config` — LWW field `active_project` | |
+| `POST /api/gateway/projects` | `projects.toml` file | `gateway_config.projects` — LWW map by project name | |
+| `DELETE /api/gateway/projects/:name` | `projects.toml` file | `gateway_config.projects` — tombstone entry | |
+| `PUT /api/settings`, `PUT /api/settings/editor` | `JsonFileStore` | `settings` — LWW map per key | Low priority; settings are single-node today |
+| `POST /api/model` | `JsonFileStore` | `settings` — same LWW map | |
+| `POST /api/anthropic/key` | Encrypted file/env | Stay out of CRDT (secrets) | |
+| `PUT /api/bot/config` | `.huskies/bot.toml` file | Stay out of CRDT (credentials) | |
+| `POST /mcp` | CRDT (already) | Already replicated via CRDT WebSocket bus | Story/pipeline mutations are CRDT-native |
+| Merge job tracking | `AgentPool.merge_jobs: HashMap` | `merge_jobs` — LWW map by story_id, or append-only log | Needed for cross-node merge visibility |
+| Test job tracking | `AppContext.test_job_registry: HashMap` | `test_jobs` — LWW map by story_id | Needed so any node can query test status |
+
+---
+
+## 4. Read Endpoints → Proposed RPC Frame Shapes
+
+| Endpoint | Request Fields | Response Fields |
+|----------|---------------|-----------------|
+| `GET /health` | _(none)_ | `{status: "ok", version: string, node_id: string}` |
+| `GET /api/gateway` | _(none)_ | `{active_project: string, projects: {name, url, healthy}[]}` |
+| `GET /api/gateway/pipeline` | _(none)_ | `{projects: {name: string, pipeline: PipelineStages}[]}` |
+| `GET /gateway/agents` | _(none)_ | `{agents: {id, label, address, assigned_project, last_seen_ms, alive: bool}[]}` |
+| `GET /api/agents` | _(none)_ | `{agents: {story_id, agent_name, pid, status, started_at}[]}` |
+| `GET /api/agents/worktrees` | _(none)_ | `{worktrees: {story_id, path, branch}[]}` |
+| `GET /api/agents/:id/:name/output` | _(path params)_ | `{lines: AgentLogLine[]}` |
+| `GET /api/work-items/:story_id/test-results` | _(path param)_ | `{passed: bool, output: string, ran_at: timestamp}` |
+| `GET /api/work-items/:story_id/token-cost` | _(path param)_ | `{input_tokens: u64, output_tokens: u64, cost_usd: f64}` |
+| `GET /api/token-usage` | _(none)_ | `{total_input: u64, total_output: u64, per_agent: {...}[]}` |
+| `GET /api/settings` | _(none)_ | `{settings: Record}` |
+| `GET /api/model` | _(none)_ | `{provider: string, model: string}` |
+| `GET /api/events` | `{since: unix_ms}` | `{events: {type, payload, ts}[], next_since: unix_ms}` |
+| `GET /debug/crdt` | _(none)_ | `{crdt_doc: json}` |
+| `GET /api/wizard` | _(none)_ | `{steps: WizardStep[], current_step: string}` |
+| `GET /api/anthropic/models` | _(none)_ | `{models: {id, name}[]}` |
+| `GET /api/ollama/models` | _(none)_ | `{models: {name, size}[]}` |
+
+---
+
+## 5. Draft: Unsigned Read-RPC Protocol
+
+### Rationale
+
+Write mutations already flow through the CRDT bus (signed ops). Read endpoints are the remaining HTTP surface that could be migrated to the same WebSocket channel. This section drafts the envelope format so read RPCs can share the bus without requiring Ed25519 auth (unsigned reads are fine; only writes need authenticity guarantees).
+
+### Frame Envelope (JSON over WebSocket)
+
+```json
+// Request (caller → peer)
+{
+  "version": 1,
+  "kind": "rpc_request",
+  "correlation_id": "uuid-v4",
+  "ttl_ms": 5000,
+  "method": "get_pipeline_status",
+  "params": {}
+}
+
+// Success response (peer → caller)
+{
+  "version": 1,
+  "kind": "rpc_response",
+  "correlation_id": "uuid-v4",
+  "ok": true,
+  "result": { ... }
+}
+
+// Error response
+{
+  "version": 1,
+  "kind": "rpc_response",
+  "correlation_id": "uuid-v4",
+  "ok": false,
+  "error": "human-readable message",
+  "code": "NOT_FOUND | TIMEOUT | PEER_OFFLINE | INTERNAL"
+}
+```
+
+### Correlation IDs
+
+Each request carries a UUID v4 `correlation_id`. The responder echoes it verbatim. Callers maintain a `HashMap` to route responses back to waiting futures. On TTL expiry the entry is removed and the caller receives `Err(Timeout)`.
+
+### TTL Semantics
+
+- Caller specifies `ttl_ms` (default 5000, max 30000).
+- If the responding peer does not answer within the TTL, the caller synthesises a `TIMEOUT` error response locally.
+- Responders do not need to track TTLs; they answer as fast as they can.
+- Callers may use stale cached results if `ttl_ms == 0` is supplied and a cache entry exists (opt-in freshness trade-off).
+
+### Error Codes
+
+| Code | Meaning |
+|------|---------|
+| `NOT_FOUND` | Resource does not exist |
+| `TIMEOUT` | Peer did not respond within TTL |
+| `PEER_OFFLINE` | No live peer with the requested capability is connected |
+| `UNAUTHORIZED` | Caller lacks permission (future, when auth lands) |
+| `INTERNAL` | Unexpected server-side error |
+
+### Peer-Offline Handling
+
+- Before sending a request the caller checks whether any peer that can serve the method is currently connected.
+- If no peer is online, the caller immediately returns `PEER_OFFLINE` without queuing (fail-fast).
+- For idempotent reads, callers may fall back to a local CRDT-materialized view if `PEER_OFFLINE` or `TIMEOUT` is received.
+- Non-idempotent reads (e.g., `exec_shell`) must not be retried automatically.
+
+### Method Naming Convention
+
+Method names are a dot-separated resource and action, e.g. `pipeline.get`, `agents.list`, `health.check`, `events.poll`.
+
+---
+
+## 6. In-Memory State → CRDT Collection Migration
+
+| Location | Field | Current Type | Proposed CRDT Type | Rationale |
+|----------|-------|-------------|-------------------|-----------|
+| `gateway.rs::GatewayState` | `pending_tokens` | `HashMap` | **LWW-map** keyed by token UUID, with `expires_at` TTL field | Tokens are short-lived; LWW is fine; GC by TTL |
+| `gateway.rs::GatewayState` | `joined_agents` | `Vec` | Extend existing **`nodes` CRDT collection** with agent metadata fields (label, address, assigned_project, last_seen_ms) | Nodes collection already exists for CRDT mesh peers |
+| `agents/pool/mod.rs::AgentPool` | `merge_jobs` | `HashMap` | **LWW-map** keyed by story_id; fields: node_id, status, started_at, error | Required for cross-node merge visibility |
+| `agents/pool/mod.rs::AgentPool` | `agents` (running agent handles) | `HashMap` | **LWW-map** `active_agents` keyed by story_id; fields: node_id, agent_name, pid (optional), started_at, status | Process handles stay local; only metadata replicated |
+| `http/context.rs::AppContext` | `test_job_registry` | `HashMap` (TestJobRegistry) | **LWW-map** `test_jobs` keyed by story_id; fields: node_id, status, started_at, finished_at | Needed so any node can query test run status |
+| `agents/pool/auto_assign` | agent throttle / last-seen timestamps | Local variables / in-memory | **LWW-map** `agent_throttle` keyed by agent_name; field: last_dispatched_at | Prevents double-dispatch on multi-node |
+| `gateway.rs::GatewayState` | `active_project` | `Arc>` | **LWW register** in `gateway_config` collection, field `active_project` | Single-value; LWW is correct |
+| `gateway.rs::GatewayState` | `projects` (BTreeMap) | `Arc>>` | **LWW-map** in `gateway_config.projects` keyed by project name | Infrequently mutated; LWW correct |
+
+### Summary of Proposed New CRDT Collections
+
+| Collection | Type | Notes |
+|-----------|------|-------|
+| `tokens` | LWW-map | Join tokens with TTL; garbage-collect on expiry |
+| `nodes` | LWW-map (extend existing) | Already exists; add agent metadata fields |
+| `merge_jobs` | LWW-map | One entry per story; overwritten on each merge attempt |
+| `active_agents` | LWW-map | One entry per story; metadata only (not process handles) |
+| `test_jobs` | LWW-map | One entry per story; test run status |
+| `agent_throttle` | LWW-map | One entry per agent name; last-dispatched timestamp |
+| `gateway_config` | LWW-map (or flat LWW fields) | `active_project`, `projects` map |
+
+---
+
+## 7. Migration Order and Dependencies
+
+### Blocking Dependency
+
+**Story 665 (Ed25519 auth)** must land before any write operation is migrated to the CRDT bus. Unsigned writes on a shared bus would allow any connected peer to forge mutations. Read RPCs do not require auth.
+
+### Wave 0 — Foundation (no story 665 needed)
+
+These can land in parallel with or before story 665:
+
+1. **Extend `nodes` CRDT collection** with `label`, `address`, `assigned_project`, `last_seen_ms` fields. This is a pure schema addition.
+2. **Add `merge_jobs` and `active_agents` LWW-maps** to the CRDT document schema (additive; existing nodes ignore unknown fields via `serde(default)`).
+3. **Implement unsigned read-RPC multiplexer** on the existing `/crdt-sync` WebSocket channel (new `kind: "rpc_request"/"rpc_response"` frame types, ignored by old peers).
+
+### Wave 1 — Migrate Heartbeat + Agent Registration (after `nodes` schema extended)
+
+- Replace `POST /gateway/agents/:id/heartbeat` HTTP call with a CRDT LWW write to `nodes[id].last_seen_ms`.
+- Replace `POST /gateway/register` with a CRDT insert into `nodes` collection.
+- Replace `POST /gateway/tokens` / token validation with CRDT `tokens` map read/write.
+- **Blocks on story 665** for the write side; read queries (list agents, check token) can migrate via read-RPC first.
+
+### Wave 2 — Migrate Read Endpoints to Read-RPC (no auth required)
+
+Can land in parallel with Wave 1 write migration:
+
+- `GET /health` → `health.check` RPC (gateway reads from CRDT `nodes` liveness)
+- `GET /gateway/agents` → `agents.list` RPC reading from CRDT `nodes`
+- `GET /api/events` polling loop → subscribe to CRDT op stream directly (eliminate polling)
+- `GET /api/gateway/pipeline` → `pipeline.get` RPC or direct CRDT materialisation (already replicated)
+- `GET /api/agents` → `active_agents.list` RPC reading from CRDT `active_agents`
+
+### Wave 3 — Migrate Merge and Test Job Tracking (after waves 0–1)
+
+- Replace `merge_jobs` HashMap with CRDT `merge_jobs` map writes on merge start/completion.
+- Replace `test_job_registry` HashMap with CRDT `test_jobs` map writes on test start/completion.
+- Enables: any node can query merge or test status without HTTP call to the node that started the job.
+
+### Wave 4 — Migrate Gateway Config Writes (after story 665)
+
+- `POST /api/gateway/switch`, `POST /api/gateway/projects`, `DELETE /api/gateway/projects/:name` → CRDT `gateway_config` LWW writes.
+- Low urgency; these are infrequent admin operations. Can keep HTTP as a thin wrapper that writes to CRDT.
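
The LWW writes that Waves 1, 3 and 4 rely on (`nodes[id].last_seen_ms`, `merge_jobs`, `gateway_config`) all follow the same rule: the write with the highest timestamp wins. The sketch below is one minimal way to express that rule, not the project's actual CRDT code. The `Lww` and `LwwMap` types, the use of wall-clock milliseconds, and the `node_id` tie-break are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// One last-writer-wins value, tagged with the write's timestamp and writer.
/// (Hypothetical type; wall-clock milliseconds are an assumption.)
#[derive(Debug, Clone, PartialEq)]
pub struct Lww<T> {
    pub value: T,
    pub ts_ms: u64,
    pub node_id: String,
}

/// LWW-map: every key holds an independent LWW register.
#[derive(Debug)]
pub struct LwwMap<T> {
    entries: HashMap<String, Lww<T>>,
}

impl<T: Clone> LwwMap<T> {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Apply a write. Returns true if it replaced the stored value.
    /// Ties on `ts_ms` are broken by comparing `node_id`, so every
    /// replica picks the same winner regardless of arrival order.
    pub fn apply(&mut self, key: &str, incoming: Lww<T>) -> bool {
        let wins = match self.entries.get(key) {
            Some(cur) => {
                (incoming.ts_ms, incoming.node_id.as_str())
                    > (cur.ts_ms, cur.node_id.as_str())
            }
            None => true,
        };
        if wins {
            self.entries.insert(key.to_string(), incoming);
        }
        wins
    }

    /// Read the current value for a key, if any.
    pub fn get(&self, key: &str) -> Option<&T> {
        self.entries.get(key).map(|e| &e.value)
    }

    /// Merge another replica's state by replaying each of its entries.
    pub fn merge(&mut self, other: &LwwMap<T>) {
        for (key, entry) in &other.entries {
            self.apply(key, entry.clone());
        }
    }
}
```

Under these assumptions a heartbeat becomes `apply(agent_id, Lww { value: now_ms, ts_ms: now_ms, node_id })`, and a stale heartbeat that arrives late is simply ignored. The sketch does not cover tombstones or TTL garbage collection, which the `tokens` and `nodes` collections would also need.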
+
+### Endpoints That Stay HTTP
+
+| Endpoint | Reason |
+|----------|--------|
+| `/webhook/whatsapp`, `/webhook/slack` | External platform callbacks; must remain HTTP |
+| `/oauth/authorize`, `/callback` | OAuth redirect flow; must remain HTTP |
+| `/api/io/*`, `/api/io/shell/exec` | Local filesystem/shell; process-local, not cross-node |
+| `/api/io/fs/*` | Same — local I/O only |
+| `/mcp` | External MCP clients (Claude Code CLI) speak HTTP/SSE; gateway proxy stays HTTP |
+| `/assets/*`, `/`, `/*path` | Static frontend assets |
+| `/api/anthropic/key`, `PUT /api/bot/config` | Credentials — must stay local, never in CRDT |
+| `GET /debug/crdt` | Debug only; HTTP fine |
+
+### Dependency Graph Summary
+
+```
+story 665 (Ed25519 auth)
+  └── Wave 1 write migrations (heartbeat, register, assign, tokens)
+  └── Wave 4 gateway config writes
+
+Wave 0 (schema extensions + read-RPC multiplexer) [can start now, parallel]
+  └── Wave 2 read endpoint migrations [can start now, parallel]
+  └── Wave 3 merge/test job tracking [after Wave 0 schema]
+```
+
+**Critical path:** Story 665 → Wave 1 → Wave 4. Everything else is parallel.
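
For the Wave 0 read-RPC multiplexer, the caller-side bookkeeping described in Section 5 (correlation IDs, TTL clamping, locally synthesised `TIMEOUT`) could be sketched roughly as follows. This is a synchronous, std-only illustration. `PendingTable`, `RpcOutcome`, and the explicit `now_ms` parameter are hypothetical names chosen for the example, and a real implementation would more likely hand results to waiting futures over channels.

```rust
use std::collections::HashMap;

/// Defaults taken from the TTL Semantics section.
pub const DEFAULT_TTL_MS: u64 = 5_000;
pub const MAX_TTL_MS: u64 = 30_000;

/// What the caller eventually sees for one request (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum RpcOutcome {
    /// Raw JSON text of the `result` field.
    Ok(String),
    /// An error with one of the draft error codes.
    Err { code: &'static str, message: String },
}

/// Tracks in-flight requests: correlation_id -> deadline in milliseconds.
pub struct PendingTable {
    deadlines: HashMap<String, u64>,
}

impl PendingTable {
    pub fn new() -> Self {
        Self { deadlines: HashMap::new() }
    }

    /// Register an outgoing request and return its deadline.
    /// A missing TTL uses the default; oversized TTLs are clamped.
    pub fn register(&mut self, correlation_id: &str, ttl_ms: Option<u64>, now_ms: u64) -> u64 {
        let ttl = ttl_ms.unwrap_or(DEFAULT_TTL_MS).min(MAX_TTL_MS);
        let deadline = now_ms + ttl;
        self.deadlines.insert(correlation_id.to_string(), deadline);
        deadline
    }

    /// Route a response by its echoed correlation_id. Returns None for
    /// unknown IDs, i.e. late or duplicate responses, which are dropped.
    pub fn resolve(&mut self, correlation_id: &str, result: String) -> Option<RpcOutcome> {
        self.deadlines
            .remove(correlation_id)
            .map(|_| RpcOutcome::Ok(result))
    }

    /// Remove every request whose deadline has passed and synthesise a
    /// TIMEOUT error for each one locally; the responder is not involved.
    pub fn expire(&mut self, now_ms: u64) -> Vec<(String, RpcOutcome)> {
        let expired: Vec<String> = self
            .deadlines
            .iter()
            .filter(|(_, deadline)| **deadline <= now_ms)
            .map(|(id, _)| id.clone())
            .collect();
        expired
            .into_iter()
            .map(|id| {
                self.deadlines.remove(&id);
                let outcome = RpcOutcome::Err {
                    code: "TIMEOUT",
                    message: "peer did not respond within TTL".to_string(),
                };
                (id, outcome)
            })
            .collect()
    }
}
```

Passing `now_ms` explicitly keeps the sketch deterministic and easy to test. The `ttl_ms == 0` cache opt-in and the `PEER_OFFLINE` fail-fast check are deliberately left out; both would sit in front of `register` rather than inside the table.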