Captures the architecture for going from "new project" chat command to
a running, container-isolated, editor-accessible huskies project.
Covers the three personas (chat-only / editor-using / multi-project),
the container template (base + stack overlay + project bind mount),
build sandbox model (host stays clean, all dep-code in container),
editor-agnostic SSH access, git integration, and a 5-phase rollout.
Source for upcoming bootstrap stories.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local-only scratchpad for tracking suspected duplicate-Timmy /
duplicate-create_story incidents while we hunt the cause.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Catches up master with entries added by stories that merged in a binary
predating 1065 (merge-pipeline source-map regen): ErrorBoundary,
WsConnectivity, transition_merge_failure_to_retry, and others.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two stories today (961, 962) passed every other gate and got bounced
at the merge step on a single missing `///` on a `pub mod` line.
Sonnet keeps treating the doc comment as optional when the rule says
"add doc comments to new modules and pub functions/structs/enums."
Promote the rule to its own loud section with no-exceptions wording
and a concrete reminder to run source-map-check before committing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "living document" rule was soft and got ignored — coders wrote
PLAN.md once at session start and then drifted away from it. Tie the
update to a trigger they already do (the wip/final commit), and call
out stale "Current state" as a process failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observed: stories 917, 918, 920, 910 all turn-limit-killed despite producing
real commits. Tally across their session logs shows 30–55% of assistant
turns were pure narration ("I'll read X next", "Now let me check Y") with
no tool_use. At 80 max_turns the effective work budget was ~44 tool calls,
not enough for a typical bug fix's edit + test + check_criterion cycle.
Changes:
- New optional AgentConfig field max_tool_turns. When set the watchdog
uses it instead of max_turns; only assistant messages whose
data.message.content has at least one tool_use block count.
- count_turns_in_log in agents/pool/auto_assign/watchdog/limits.rs
filters on tool_use. Existing test helper write_fake_session_log now
emits tool_use blocks; added write_fake_mixed_session_log for the
narration regression test.
- agents.toml: coders/coder-opus get max_turns=200 (claude-code's own
--max-turns cap, sized to never bite before the watchdog) and
max_tool_turns=80. qa: 120 / 40. mergemaster: 250 / 100. Budgets
unchanged — the dollar cap remains the runaway-loop backstop, with
~$3-5 worst-case waste if an agent narrates indefinitely.
- Two new regression tests:
* watchdog_does_not_count_narration_only_turns: 5 tool + 30 narration
under max_tool_turns=10 stays Running.
* watchdog_max_tool_turns_overrides_max_turns: 4 tool turns at
max_tool_turns=3 / max_turns=200 still terminates with TurnLimit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 903 (run_tests attach instead of respawn) + 904 (MCP progress
notifications + SSE) together eliminate the transport-timeout error
mode from the agent's point of view: long test runs complete without
the MCP client ever observing a tool-call error. Production
verification (see d64f1e94 / ddc4228b deploy at 14:30 UTC today)
confirmed 78s and 65s test runs completing in single processes with
no respawn churn and no retry needed.
The "If run_tests errors with a transport timeout, call it again"
sentence in coder-1/2/3/opus system_prompts (added belt-and-braces
in a97a10fb) is now redundant. Removing it tightens the agent's
mental model down to: call run_tests, wait for the result. No
error-handling branch, no retry semantics to internalise.
This closes the last open AC on story 904.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-d64f1e94 the "call run_tests again — it attaches" guidance was a
lie (every call killed the prior job and spawned a fresh one). With
the attach fix in place, the contract is now real and safe to depend
on. Tighten the wording so agents see exactly what to do:
OLD: "Do not use ScheduleWakeup to wait for run_tests; if run_tests
appears to time out, call run_tests again — it attaches to the
in-flight test job and blocks until completion."
NEW: "If run_tests errors with a transport timeout, call it again —
it's idempotent and attaches to the same in-flight test job,
so retries are safe and eventually return a pass/fail result."
Improvements:
- "errors with a transport timeout" matches what the agent literally
observes (a tool-call error), not the vague "appears to time out".
- Explicit on idempotency so agents understand why retry is safe and
don't worry about double-running the suite.
- Drops the ScheduleWakeup clause — already enforced via the
`disallowed_tools` setting on coder-1/2/3/opus, so the prompt
reminder was redundant.
Applied uniformly across coder-1, coder-2, coder-3, coder-opus.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 902: the Step 0 "resume from worktree state" instruction told coders
to call git_status / git_log / git_diff to discover prior session work,
which they then extended into hunting for the story `.md` file on disk
via find / ls — pointless post-865, since story content lives only in
the CRDT.
Update Step 0 in coder-1, coder-2, coder-3, and coder-opus to add an
explicit instruction: "To read story content, ACs, or description, call
the `get_story_todos` MCP tool — do NOT search for a story `.md` file
on disk; story content is CRDT-only."
Single substring replacement covers all four agents (identical Step 0
across them).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add `disallowed_tools` field to `AgentConfig` and render it as
`--disallowedTools` CLI flag in `render_agent_args`
- Set `disallowed_tools = ["ScheduleWakeup"]` on all four coder agents
(coder-1, coder-2, coder-3, coder-opus); QA and mergemaster unaffected
- Append instruction to all coder `system_prompt`s: do not use
ScheduleWakeup to wait for run_tests; if run_tests appears to time out,
call run_tests again — it attaches to the in-flight job and blocks
- Add tests: `render_agent_args_disallowed_tools` and
`coder_agents_disallow_schedule_wakeup`
`tool_run_tests` in `server/src/http/mcp/shell_tools/script.rs` is fully
blocking server-side: it spawns the test child, polls every 1s server-side
until exit (or `TEST_TIMEOUT_SECS = 1200s`), and returns the full
{passed, exit_code, output} directly. There is NO async/started-status
return path.
But two places told agents the wrong story:
1. `tools_list/system_tools.rs` description claimed "Returns immediately
with status: started. Poll get_test_result..." — agents read tool
descriptions for protocol semantics, so they followed this and burned
turns polling get_test_result.
2. `agents.toml` had been correctly saying it blocks, but my last commit
(776aad38) "fixed" it the wrong way based on a misread of the code.
Now both say: run_tests blocks server-side, returns the full result, do
not poll get_test_result. get_test_result remains for external observers
(UI checking on a job another caller started).
Reverts the prompt change in 776aad38 with the correct text.
Pre-f958f57e, run_tests blocked until completion. After that fix it became
a background-job starter, with get_test_result polling. The agent prompts
were never updated, so they still said "run_tests blocks until complete"
— and agents then waste turns polling.
Updated coder-1/2/3, coder-opus, and qa prompts to describe the actual
flow: run_tests is async, get_test_result blocks for up to 20s per call,
test suites typically take 1-5 minutes so expect a few polls.
Companion bug filed for bumping TEST_POLL_BLOCK_SECS so one poll covers
most test runs (root-cause fix; this commit is the prompt half).