Commit Graph

30 Commits

Author SHA1 Message Date
Timmy a97a10fba2 docs(903): coder system_prompts — clarify run_tests retry contract
Pre-d64f1e94 the "call run_tests again — it attaches" guidance was a
lie (every call killed the prior job and spawned a fresh one). With
the attach fix in place, the contract is now real and safe to depend
on. Tighten the wording so agents see exactly what to do:

OLD: "Do not use ScheduleWakeup to wait for run_tests; if run_tests
      appears to time out, call run_tests again — it attaches to the
      in-flight test job and blocks until completion."

NEW: "If run_tests errors with a transport timeout, call it again —
      it's idempotent and attaches to the same in-flight test job,
      so retries are safe and eventually return a pass/fail result."

Improvements:
- "errors with a transport timeout" matches what the agent literally
  observes (a tool-call error), not the vague "appears to time out".
- Explicit on idempotency so agents understand why retry is safe and
  don't worry about double-running the suite.
- Drops the ScheduleWakeup clause — already enforced via the
  `disallowed_tools` setting on coder-1/2/3/opus, so the prompt
  reminder was redundant.

Applied uniformly across coder-1, coder-2, coder-3, coder-opus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:54:34 +01:00
Timmy e955250474 fix(902): coder system_prompts steer to get_story_todos for story content
Bug 902: the Step 0 "resume from worktree state" instruction told coders
to call git_status / git_log / git_diff to discover prior session work,
which they then extended into hunting for the story `.md` file on disk
via find / ls — pointless post-865, since story content lives only in
the CRDT.

Update Step 0 in coder-1, coder-2, coder-3, and coder-opus to add an
explicit instruction: "To read story content, ACs, or description, call
the `get_story_todos` MCP tool — do NOT search for a story `.md` file
on disk; story content is CRDT-only."

Single substring replacement covers all four agents (identical Step 0
across them).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:13:08 +01:00
dave fac4442969 fix(896): disallow ScheduleWakeup for coder agents; add run_tests retry guidance
- Add `disallowed_tools` field to `AgentConfig` and render it as
  `--disallowedTools` CLI flag in `render_agent_args`
- Set `disallowed_tools = ["ScheduleWakeup"]` on all four coder agents
  (coder-1, coder-2, coder-3, coder-opus); QA and mergemaster unaffected
- Append instruction to all coder `system_prompt`s: do not use
  ScheduleWakeup to wait for run_tests; if run_tests appears to time out,
  call run_tests again — it attaches to the in-flight job and blocks
- Add tests: `render_agent_args_disallowed_tools` and
  `coder_agents_disallow_schedule_wakeup`
2026-05-08 15:28:48 +01:00
dave cf35027b5a config(coders): step 0 — resume prior-session work via git_status + git_log/diff against master..HEAD 2026-04-29 16:03:03 +00:00
dave b4854cf693 huskies: merge 862 2026-04-29 13:28:37 +00:00
dave 9979ff2cf9 huskies: merge 859 2026-04-29 10:18:37 +00:00
dave 8802e1fe59 huskies: merge 853 2026-04-29 09:08:28 +00:00
dave 549a9defc4 huskies: merge 851 2026-04-29 08:42:28 +00:00
dave 3ce34c34e9 huskies: merge 850 2026-04-29 08:27:05 +00:00
dave b698cee284 huskies: merge 821 2026-04-28 21:06:54 +00:00
dave 32a3465fc4 fix: tell the truth about run_tests being blocking
`tool_run_tests` in `server/src/http/mcp/shell_tools/script.rs` is fully
blocking server-side: it spawns the test child, polls every 1s server-side
until exit (or `TEST_TIMEOUT_SECS = 1200s`), and returns the full
{passed, exit_code, output} directly. There is NO async/started-status
return path.

But two places told agents the wrong story:
1. `tools_list/system_tools.rs` description claimed "Returns immediately
   with status: started. Poll get_test_result..." — agents read tool
   descriptions for protocol semantics, so they followed this and burned
   turns polling get_test_result.
2. `agents.toml` had been correctly saying it blocks, but my last commit
   (776aad38) "fixed" it the wrong way based on a misread of the code.

Now both say: run_tests blocks server-side, returns the full result, do
not poll get_test_result. get_test_result remains for external observers
(UI checking on a job another caller started).

Reverts the prompt change in 776aad38 with the correct text.
2026-04-28 15:59:06 +00:00
dave 776aad3877 fix: agent prompts honest about run_tests being async
Pre-f958f57e, run_tests blocked until completion. After that fix it became
a background-job starter, with get_test_result polling. The agent prompts
were never updated, so they still said "run_tests blocks until complete"
— and agents then waste turns polling.

Updated coder-1/2/3, coder-opus, and qa prompts to describe the actual
flow: run_tests is async, get_test_result blocks for up to 20s per call,
test suites typically take 1-5 minutes so expect a few polls.

Companion bug filed for bumping TEST_POLL_BLOCK_SECS so one poll covers
most test runs (root-cause fix; this commit is the prompt half).
2026-04-28 15:55:15 +00:00
dave 36ca8d5e3b huskies: merge 827 2026-04-28 13:01:48 +00:00
dave c1bb5888a8 config: bump mergemaster max_turns 60->100, budget $15->$25
Mergemaster needs more headroom for heavy merges (e.g. the slug-to-numeric
ID migration touching many files, or the FS-shadow deletion stories that
require fixing test setup across the codebase). 60 turns wasn't enough
for the larger ones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:20:22 +00:00
dave 191883fe2a config: brutalist refactor guidance + bump mergemaster inactivity_timeout
- Append to all coder/opus system_prompts: for delete/signature-change
  refactors, delete first and let compiler errors guide the call-site
  walk; do not pre-read files predicting breakage. Reduces exploration
  overhead on mechanical refactors.

- Bump mergemaster inactivity_timeout_secs 300 -> 900 (15 min) so
  mergemaster survives the 5-minute API rate-limit backoff. Without
  this, mergemaster gets killed for inactivity while waiting on rate
  limit clear, blocking all merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:19:09 +00:00
dave 0b7f7dfdf7 config: bump sonnet coder-1/2/3 max_turns 50→80
Stories like the broadcaster-consumer migrations legitimately need ~60
substantive turns (16 ProjectConfig initializer sites + main.rs subscriber
+ reading existing patterns to mirror). 50 was too tight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:56:24 +00:00
dave 56c979c950 config: tell mergemaster to use 5-min sleeps between merge_agent_work polls
Real cause of mergemaster turn-burnout: not merge conflicts, just polling
overhead. The server-side tool_merge_agent_work IS designed to block until
the merge completes, but the MCP client times out after 60s. The agent
then polls get_merge_status, with 30-60s sleeps between polls — each
poll cycle costs 2 turns (sleep + tool call). The merge takes 5-10 min
for a clean run, so the agent burns 10-20 turns just waiting.

Updated workflow tells mergemaster:
- 'operation timed out' is normal, do NOT immediately re-call (would queue
  a duplicate merge)
- Use Bash sleep 300 (one 5-min wait = 1 turn) between polls
- Cap at 3 polls = 15 minutes total, plenty for any clean merge
- Reserve turns for actual fix-up work if gates fail

Combined with the earlier 30→60 turn / $5→$15 budget bump, this should
land any merge with no real conflicts in 3-5 turns total. Plenty of
headroom remaining for genuine gate-fix work.
2026-04-27 10:50:44 +00:00
dave 7b305ba892 config: bump mergemaster max_turns 30→60, budget $5→$15
30 turns is too tight for non-trivial merge gate failures. Combined with
the 3-retry cap, stories with any post-merge fix-up needed (cargo fmt
nits, slightly out-of-date diffs after parallel merges, etc.) get
permanently blocked.

This is a stopgap until story 668 lands (which will keep gates_passed=false
work in the coder stage entirely, so mergemaster only ever sees clean
diffs and the original 30 turns / $5 is fine again).
2026-04-27 10:41:45 +00:00
dave a4480fa067 chore: feed CONTEXT and STACK specs to all agents, update STACK with source map
Agents now read specs/00_CONTEXT.md (what the project does) and
specs/tech/STACK.md (tech stack + source map) in addition to the
README. STACK.md rewritten to reflect current state — removes stale
references to biome, tauri-specta, .story_kit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:15:09 +00:00
dave 483489cc44 fix: rewrite coder agent prompts — run tests before commit, remove stale instructions
Key changes:
- Tests before commit, not after: "run run_tests, fix failures, then commit"
- Removed polling references (run_tests blocks now)
- Removed "never run script/test" (primes agents to think about it)
- Removed dead "user review" instruction
- Removed "commit and stop" which signalled skip-testing
- Cleaner workflow: implement → check criteria → test → fix → commit → exit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:19:08 +00:00
dave 28adef9739 chore: switch mergemaster to opus and add cargo fmt guidance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:35:57 +00:00
dave badfabcf5e chore: switch mergemaster to opus and add cargo fmt guidance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:27:58 +00:00
dave b7f077197d chore: add doc comment guidance to coder agent system prompts
Agents now know to add //! module comments and /// doc comments
to new public items, keeping documentation consistent with the
codebase-wide doc pass from story 542.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:25:21 +00:00
dave f958f57e56 fix: async run_tests to prevent zombie cargo processes blocking gates
run_tests MCP tool now spawns tests in the background and returns
immediately. Agents poll get_test_result to check completion. This
prevents zombie cargo processes from holding the build lock when the
CLI times out the MCP call before tests finish.

Also fixes agent permission mode: acceptEdits replaces invalid
allowFullAutoEdit that was causing agents to crash-loop on spawn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 22:00:05 +00:00
dave c0d1be675b fix: mergemaster prompt says merge_agent_work blocks — no polling needed 2026-04-11 18:13:53 +00:00
dave a9a1852422 fix: agent prompts say trust the story description instead of always investigating
Agents were spending entire $5 budgets grepping the codebase and reading
git history instead of making fixes when the story already specifies
exact file paths and function names. Changed bug workflow from
"investigate root cause first" to "trust the story, act fast" — go
directly to the specified location when the story tells you where.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 15:16:12 +00:00
dave dd53870c59 fix: agent prompts use run_tests MCP tool instead of running script/test via Bash
Agents were running script/test directly through the PTY, streaming
the full output of npm install, cargo clippy, cargo test, and frontend
builds into session logs. This tripled session log sizes (~200KB to
~600KB per session) and contributed to CLI SIGABRT crashes.

The run_tests MCP tool already runs script/test server-side and returns
a truncated JSON summary. Agents now use it exclusively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:46:05 +00:00
dave 7e5b9839e8 huskies: merge 523_refactor_introduce_script_test_script_lint_script_build_and_migrate_agent_prompts_off_tech_specific_commands 2026-04-10 11:22:51 +00:00
Timmy 934bda5904 Trying out sonnet for merges 2026-04-10 01:04:25 +01:00
dave 470e7a5fd5 huskies: merge 482_refactor_split_agent_definitions_from_project_toml_into_agents_toml 2026-04-04 21:24:22 +00:00