huskies

Author	SHA1	Message	Date
Timmy	bb845d17cf	docs(904): drop run_tests retry-on-timeout clause from coder prompts Bug 903 (run_tests attach instead of respawn) + 904 (MCP progress notifications + SSE) together eliminate the transport-timeout error mode from the agent's point of view: long test runs complete without the MCP client ever observing a tool-call error. Production verification (see `d64f1e94` / `ddc4228b` deploy at 14:30 UTC today) confirmed 78s and 65s test runs completing in single processes with no respawn churn and no retry needed. The "If run_tests errors with a transport timeout, call it again" sentence in coder-1/2/3/opus system_prompts (added belt-and-braces in `a97a10fb`) is now redundant. Removing it tightens the agent's mental model down to: call run_tests, wait for the result. No error-handling branch, no retry semantics to internalise. This closes the last open AC on story 904. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:36:53 +01:00
Timmy	a97a10fba2	docs(903): coder system_prompts — clarify run_tests retry contract Pre-d64f1e94 the "call run_tests again — it attaches" guidance was a lie (every call killed the prior job and spawned a fresh one). With the attach fix in place, the contract is now real and safe to depend on. Tighten the wording so agents see exactly what to do: OLD: "Do not use ScheduleWakeup to wait for run_tests; if run_tests appears to time out, call run_tests again — it attaches to the in-flight test job and blocks until completion." NEW: "If run_tests errors with a transport timeout, call it again — it's idempotent and attaches to the same in-flight test job, so retries are safe and eventually return a pass/fail result." Improvements: - "errors with a transport timeout" matches what the agent literally observes (a tool-call error), not the vague "appears to time out". - Explicit on idempotency so agents understand why retry is safe and don't worry about double-running the suite. - Drops the ScheduleWakeup clause — already enforced via the `disallowed_tools` setting on coder-1/2/3/opus, so the prompt reminder was redundant. Applied uniformly across coder-1, coder-2, coder-3, coder-opus. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:54:34 +01:00
Timmy	e955250474	fix(902): coder system_prompts steer to get_story_todos for story content Bug 902: the Step 0 "resume from worktree state" instruction told coders to call git_status / git_log / git_diff to discover prior session work, which they then extended into hunting for the story `.md` file on disk via find / ls — pointless post-865, since story content lives only in the CRDT. Update Step 0 in coder-1, coder-2, coder-3, and coder-opus to add an explicit instruction: "To read story content, ACs, or description, call the `get_story_todos` MCP tool — do NOT search for a story `.md` file on disk; story content is CRDT-only." Single substring replacement covers all four agents (identical Step 0 across them). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:13:08 +01:00
dave	fac4442969	fix(896): disallow ScheduleWakeup for coder agents; add run_tests retry guidance - Add `disallowed_tools` field to `AgentConfig` and render it as `--disallowedTools` CLI flag in `render_agent_args` - Set `disallowed_tools = ["ScheduleWakeup"]` on all four coder agents (coder-1, coder-2, coder-3, coder-opus); QA and mergemaster unaffected - Append instruction to all coder `system_prompt`s: do not use ScheduleWakeup to wait for run_tests; if run_tests appears to time out, call run_tests again — it attaches to the in-flight job and blocks - Add tests: `render_agent_args_disallowed_tools` and `coder_agents_disallow_schedule_wakeup`	2026-05-08 15:28:48 +01:00
dave	cf35027b5a	config(coders): step 0 — resume prior-session work via git_status + git_log/diff against master..HEAD	2026-04-29 16:03:03 +00:00
dave	b4854cf693	huskies: merge 862	2026-04-29 13:28:37 +00:00
dave	9979ff2cf9	huskies: merge 859	2026-04-29 10:18:37 +00:00
dave	8802e1fe59	huskies: merge 853	2026-04-29 09:08:28 +00:00
dave	549a9defc4	huskies: merge 851	2026-04-29 08:42:28 +00:00
dave	3ce34c34e9	huskies: merge 850	2026-04-29 08:27:05 +00:00
dave	b698cee284	huskies: merge 821	2026-04-28 21:06:54 +00:00
dave	32a3465fc4	fix: tell the truth about run_tests being blocking `tool_run_tests` in `server/src/http/mcp/shell_tools/script.rs` is fully blocking server-side: it spawns the test child, polls every 1s server-side until exit (or `TEST_TIMEOUT_SECS = 1200s`), and returns the full {passed, exit_code, output} directly. There is NO async/started-status return path. But two places told agents the wrong story: 1. `tools_list/system_tools.rs` description claimed "Returns immediately with status: started. Poll get_test_result..." — agents read tool descriptions for protocol semantics, so they followed this and burned turns polling get_test_result. 2. `agents.toml` had been correctly saying it blocks, but my last commit (`776aad38`) "fixed" it the wrong way based on a misread of the code. Now both say: run_tests blocks server-side, returns the full result, do not poll get_test_result. get_test_result remains for external observers (UI checking on a job another caller started). Reverts the prompt change in `776aad38` with the correct text.	2026-04-28 15:59:06 +00:00
dave	776aad3877	fix: agent prompts honest about run_tests being async Pre-f958f57e, run_tests blocked until completion. After that fix it became a background-job starter, with get_test_result polling. The agent prompts were never updated, so they still said "run_tests blocks until complete" — and agents then waste turns polling. Updated coder-1/2/3, coder-opus, and qa prompts to describe the actual flow: run_tests is async, get_test_result blocks for up to 20s per call, test suites typically take 1-5 minutes so expect a few polls. Companion bug filed for bumping TEST_POLL_BLOCK_SECS so one poll covers most test runs (root-cause fix; this commit is the prompt half).	2026-04-28 15:55:15 +00:00
dave	36ca8d5e3b	huskies: merge 827	2026-04-28 13:01:48 +00:00
dave	c1bb5888a8	config: bump mergemaster max_turns 60->100, budget $15->$25 Mergemaster needs more headroom for heavy merges (e.g. the slug-to-numeric ID migration touching many files, or the FS-shadow deletion stories that require fixing test setup across the codebase). 60 turns wasn't enough for the larger ones. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:20:22 +00:00
dave	191883fe2a	config: brutalist refactor guidance + bump mergemaster inactivity_timeout - Append to all coder/opus system_prompts: for delete/signature-change refactors, delete first and let compiler errors guide the call-site walk; do not pre-read files predicting breakage. Reduces exploration overhead on mechanical refactors. - Bump mergemaster inactivity_timeout_secs 300 -> 900 (15 min) so mergemaster survives the 5-minute API rate-limit backoff. Without this, mergemaster gets killed for inactivity while waiting for the rate limit to clear, blocking all merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:19:09 +00:00
dave	0b7f7dfdf7	config: bump sonnet coder-1/2/3 max_turns 50→80 Stories like the broadcaster-consumer migrations legitimately need ~60 substantive turns (16 ProjectConfig initializer sites + main.rs subscriber + reading existing patterns to mirror). 50 was too tight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 14:56:24 +00:00
dave	56c979c950	config: tell mergemaster to use 5-min sleeps between merge_agent_work polls Real cause of mergemaster turn-burnout: not merge conflicts, just polling overhead. The server-side tool_merge_agent_work IS designed to block until the merge completes, but the MCP client times out after 60s. The agent then polls get_merge_status, with 30-60s sleeps between polls — each poll cycle costs 2 turns (sleep + tool call). The merge takes 5-10 min for a clean run, so the agent burns 10-20 turns just waiting. Updated workflow tells mergemaster: - 'operation timed out' is normal, do NOT immediately re-call (would queue a duplicate merge) - Use Bash sleep 300 (one 5-min wait = 1 turn) between polls - Cap at 3 polls = 15 minutes total, plenty for any clean merge - Reserve turns for actual fix-up work if gates fail Combined with the earlier 30→60 turn / $5→$15 budget bump, this should land any merge with no real conflicts in 3-5 turns total. Plenty of headroom remaining for genuine gate-fix work.	2026-04-27 10:50:44 +00:00
dave	7b305ba892	config: bump mergemaster max_turns 30→60, budget $5→$15 30 turns is too tight for non-trivial merge gate failures. Combined with the 3-retry cap, stories with any post-merge fix-up needed (cargo fmt nits, slightly out-of-date diffs after parallel merges, etc.) get permanently blocked. This is a stopgap until story 668 lands (which will keep gates_passed=false work in the coder stage entirely, so mergemaster only ever sees clean diffs and the original 30 turns / $5 is fine again).	2026-04-27 10:41:45 +00:00
dave	a4480fa067	chore: feed CONTEXT and STACK specs to all agents, update STACK with source map Agents now read specs/00_CONTEXT.md (what the project does) and specs/tech/STACK.md (tech stack + source map) in addition to the README. STACK.md rewritten to reflect current state — removes stale references to biome, tauri-specta, .story_kit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 18:15:09 +00:00
dave	483489cc44	fix: rewrite coder agent prompts — run tests before commit, remove stale instructions Key changes: - Tests before commit, not after: "run run_tests, fix failures, then commit" - Removed polling references (run_tests blocks now) - Removed "never run script/test" (primes agents to think about it) - Removed dead "user review" instruction - Removed "commit and stop" which signalled skip-testing - Cleaner workflow: implement → check criteria → test → fix → commit → exit Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 13:19:08 +00:00
dave	28adef9739	chore: switch mergemaster to opus and add cargo fmt guidance Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:35:57 +00:00
dave	badfabcf5e	chore: switch mergemaster to opus and add cargo fmt guidance Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:27:58 +00:00
dave	b7f077197d	chore: add doc comment guidance to coder agent system prompts Agents now know to add //! module comments and /// doc comments to new public items, keeping documentation consistent with the codebase-wide doc pass from story 542. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:25:21 +00:00
dave	f958f57e56	fix: async run_tests to prevent zombie cargo processes blocking gates run_tests MCP tool now spawns tests in the background and returns immediately. Agents poll get_test_result to check completion. This prevents zombie cargo processes from holding the build lock when the CLI times out the MCP call before tests finish. Also fixes agent permission mode: acceptEdits replaces invalid allowFullAutoEdit that was causing agents to crash-loop on spawn. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 22:00:05 +00:00
dave	c0d1be675b	fix: mergemaster prompt says merge_agent_work blocks — no polling needed	2026-04-11 18:13:53 +00:00
dave	a9a1852422	fix: agent prompts say trust the story description instead of always investigating Agents were spending entire $5 budgets grepping the codebase and reading git history instead of making fixes when the story already specifies exact file paths and function names. Changed bug workflow from "investigate root cause first" to "trust the story, act fast" — go directly to the specified location when the story tells you where. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 15:16:12 +00:00
dave	dd53870c59	fix: agent prompts use run_tests MCP tool instead of running script/test via Bash Agents were running script/test directly through the PTY, streaming the full output of npm install, cargo clippy, cargo test, and frontend builds into session logs. This tripled session log sizes (~200KB to ~600KB per session) and contributed to CLI SIGABRT crashes. The run_tests MCP tool already runs script/test server-side and returns a truncated JSON summary. Agents now use it exclusively. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 11:46:05 +00:00
dave	7e5b9839e8	huskies: merge 523_refactor_introduce_script_test_script_lint_script_build_and_migrate_agent_prompts_off_tech_specific_commands	2026-04-10 11:22:51 +00:00
Timmy	934bda5904	Trying out sonnet for merges	2026-04-10 01:04:25 +01:00
dave	470e7a5fd5	huskies: merge 482_refactor_split_agent_definitions_from_project_toml_into_agents_toml	2026-04-04 21:24:22 +00:00

31 Commits
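
Commit fac4442969 above describes adding a `disallowed_tools` field to `AgentConfig` and rendering it as a `--disallowedTools` CLI flag in `render_agent_args`. A minimal sketch of that idea follows. It is not the repository's code: the struct is a hypothetical, trimmed-down stand-in, and joining the tool names with commas into a single argument is an assumption about how the flag value is formatted.

```rust
/// Hypothetical, trimmed-down stand-in for the project's `AgentConfig`.
/// Only the two fields needed for this illustration are modelled.
#[derive(Debug, Default)]
struct AgentConfig {
    name: String,
    /// Tools the agent must not call, e.g. `ScheduleWakeup` for coders.
    disallowed_tools: Vec<String>,
}

/// Sketch of the rendering step: emit `--disallowedTools <list>` only when
/// the list is non-empty, so agents without the setting are unaffected.
/// Assumption: the tool names are comma-joined into one argument.
fn render_agent_args(config: &AgentConfig) -> Vec<String> {
    let mut args = Vec::new();
    if !config.disallowed_tools.is_empty() {
        args.push("--disallowedTools".to_string());
        args.push(config.disallowed_tools.join(","));
    }
    args
}

fn main() {
    let coder = AgentConfig {
        name: "coder-1".to_string(),
        disallowed_tools: vec!["ScheduleWakeup".to_string()],
    };
    // No disallowed tools configured, mirroring "QA and mergemaster unaffected".
    let qa = AgentConfig {
        name: "qa".to_string(),
        ..Default::default()
    };

    println!("{} -> {:?}", coder.name, render_agent_args(&coder));
    println!("{} -> {:?}", qa.name, render_agent_args(&qa));
}
```

Skipping the flag entirely when the list is empty keeps the rendered command line unchanged for agents that do not set `disallowed_tools`, which matches the commit's note that QA and mergemaster are unaffected.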