818 stripped the source map because it had stale paths. Empirically that
made coder agents far slower — they spent most of each session re-discovering
the codebase via Read/Grep before reaching any Edit, and ran out of turn
budget without committing.
Restoring a fresh source map keyed off current master. Uses directories
where possible so it stays useful through future decomposes, plus a
"Canonical examples" section pointing at the patterns to copy when adding
new CRDT collections, RPC handlers, services, chat commands, etc.
This is a stopgap until 819 (auto-generated source-map-gen) lands.
Mergemaster needs more headroom for heavy merges (e.g. the slug-to-numeric
ID migration touching many files, or the FS-shadow deletion stories that
require fixing test setup across the codebase). 60 turns wasn't enough
for the larger ones.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Append to all coder/opus system_prompts: for delete/signature-change
refactors, delete first and let compiler errors guide the call-site
walk; do not pre-read files predicting breakage. Reduces exploration
overhead on mechanical refactors.
- Bump mergemaster inactivity_timeout_secs 300 -> 900 (15 min) so
mergemaster survives the 5-minute API rate-limit backoff. Without
this, mergemaster gets killed for inactivity while waiting on rate
limit clear, blocking all merges.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stories like the broadcaster-consumer migrations legitimately need ~60
substantive turns (16 ProjectConfig initializer sites + main.rs subscriber
+ reading existing patterns to mirror). 50 was too tight.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full inventory of all gateway and project server endpoints with caller,
purpose, latency/freshness/durability requirements. Classifies each as
write/read/external-webhook/frontend-asset. Maps write endpoints to
target CRDT collections, proposes RPC frame shapes for read endpoints,
drafts the unsigned read-RPC protocol (envelope, correlation IDs, TTL,
error codes, peer-offline handling), lists in-memory state needing CRDT
migration with proposed types, and defines a wave-ordered migration plan
with explicit dependencies (story 665 Ed25519 auth as the blocker for
write migrations).
Real cause of mergemaster turn-burnout: not merge conflicts, just polling
overhead. The server-side tool_merge_agent_work IS designed to block until
the merge completes, but the MCP client times out after 60s. The agent
then polls get_merge_status, with 30-60s sleeps between polls — each
poll cycle costs 2 turns (sleep + tool call). The merge takes 5-10 min
for a clean run, so the agent burns 10-20 turns just waiting.
Updated workflow tells mergemaster:
- 'operation timed out' is normal, do NOT immediately re-call (would queue
a duplicate merge)
- Use Bash sleep 300 (one 5-min wait = 1 turn) between polls
- Cap at 3 polls = 15 minutes total, plenty for any clean merge
- Reserve turns for actual fix-up work if gates fail
Combined with the earlier 30→60 turn / $5→$15 budget bump, this should
land any merge with no real conflicts in 3-5 turns total. Plenty of
headroom remaining for genuine gate-fix work.
30 turns is too tight for non-trivial merge gate failures. Combined with
the 3-retry cap, stories with any post-merge fix-up needed (cargo fmt
nits, slightly out-of-date diffs after parallel merges, etc.) get
permanently blocked.
This is a stopgap until story 668 lands (which will keep gates_passed=false
work in the coder stage entirely, so mergemaster only ever sees clean
diffs and the original 30 turns / $5 is fine again).
Captures the dual representation we have today (legacy filesystem stage
strings + front-matter flags vs the typed Stage/ArchiveReason/ExecutionState
enums in pipeline_state.rs that are defined-but-not-wired) and itemises the
transitions and behaviours we have identified as missing or partially
implemented (first-class supersede/abandon/hold verbs, type-conversion side
effects, pinned-agent honouring under contention, blocked-flag enforcement
beyond auto-assign, ghost-story recovery, etc.).
Section (b) is intended as a living dumping ground — append new
transitions and incidents as they come up so that the state-machine
roadmap (spike 613 in backlog) has a ready-made input.
Agents now read specs/00_CONTEXT.md (what the project does) and
specs/tech/STACK.md (tech stack + source map) in addition to the
README. STACK.md rewritten to reflect current state — removes stale
references to biome, tauri-specta, .story_kit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents need to know the gateway is a mode of the binary, not a
separate app, and that UI stories are frontend React work, not
Rust backend restructuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cargo fmt without --all fails with "Failed to find targets" in
workspace repos. This was blocking every story's gates. Also ran
cargo fmt --all to fix all existing formatting issues.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the three modes of the huskies binary: standard single-project
server, headless build agent (--rendezvous), and multi-project gateway
(--gateway). Includes projects.toml config example and Docker Compose
sketch for multi-project setup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents now know to add //! module comments and /// doc comments
to new public items, keeping documentation consistent with the
codebase-wide doc pass from story 542.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strip all filesystem pipeline references that were causing agents to
waste turns searching for story files on disk. The README now points
agents at MCP tools exclusively and documents the async run_tests
workflow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
run_tests MCP tool now spawns tests in the background and returns
immediately. Agents poll get_test_result to check completion. This
prevents zombie cargo processes from holding the build lock when the
CLI times out the MCP call before tests finish.
Also fixes agent permission mode: acceptEdits replaces invalid
allowFullAutoEdit that was causing agents to crash-loop on spawn.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents were spending entire $5 budgets grepping the codebase and reading
git history instead of making fixes when the story already specifies
exact file paths and function names. Changed bug workflow from
"investigate root cause first" to "trust the story, act fast" — go
directly to the specified location when the story tells you where.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents were running script/test directly through the PTY, streaming
the full output of npm install, cargo clippy, cargo test, and frontend
builds into session logs. This tripled session log sizes (~200KB to
~600KB per session) and contributed to CLI SIGABRT crashes.
The run_tests MCP tool already runs script/test server-side and returns
a truncated JSON summary. Agents now use it exclusively.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the markdown shadows for stories filed during today's stress-test
session, plus a SESSION_HANDOFF document for picking up the work in
a future session.
New stories (510-521):
510 — bug: stale 1_backlog filesystem shadows get re-promoted by timers
511 — bug: CRDT lamport clock resets to 1 on restart (FIXED in 99557635)
512 — story: migrate chat commands from filesystem lookup to CRDT/DB
513 — story: startup reconcile pass for state-machine drift detection
514 — story: delete_story should do a full cleanup
515 — story: debug MCP tool to dump in-memory CRDT state
516 — story: update_story.description should create the section if missing
517 — story: remove filesystem-shadow fallback paths from lifecycle.rs
518 — story: apply_and_persist should log persist_tx send failures
519 — story: mergemaster should fail loudly on no-op merges (mostly
obviated by Stage::Merge { commits_ahead: NonZeroU32 } in 520)
520 — story: typed pipeline state machine in Rust (sketches added in f7d69cde)
521 — story: MCP capability to write a CRDT tombstone for a story
Refactor 436 (unify story stuck states) is marked superseded by 520
via front_matter — its functionality is now part of the
Stage::Archived { reason: ArchiveReason } enum in story 520's design.
The SESSION_HANDOFF_2026-04-09.md document captures: the four-state-machine
drift situation that motivated story 520, today's bug fixes (502 + 511),
the off-leash rogue commit incident (forensic tag rogue-commit-2026-04-09-ac9f3ecf
preserved), the recommended next-session priority order, and useful
diagnostic recipes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>