Commit Graph

194 Commits

Author SHA1 Message Date
dave 0b3a33a63c huskies: merge 1037 2026-05-14 15:54:17 +00:00
Timmy b0090aba84 Adding baseline source-map 2026-05-14 16:35:08 +01:00
dave 14a39b6205 huskies: merge 980 2026-05-13 14:44:17 +00:00
dave c89a5c2da6 huskies: merge 966 2026-05-13 12:21:43 +00:00
dave 3c9851d17d docs(AGENT.md): forceful "no exceptions" doc-comment rule
Two stories today (961, 962) passed every other gate and got bounced
at the merge step on a single missing `///` on a `pub mod` line.
Sonnet keeps treating the doc comment as optional when the rule says
"add doc comments to new modules and pub functions/structs/enums."

Promote the rule to its own loud section with no-exceptions wording
and a concrete reminder to run source-map-check before committing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:08:54 +00:00
Timmy 78b1ecdc3c docs(AGENT): require PLAN.md update on every wip + final commit
The "living document" rule was soft and got ignored — coders wrote
PLAN.md once at session start and then drifted away from it. Tie the
update to a trigger they already do (the wip/final commit), and call
out stale "Current state" as a process failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:57:51 +01:00
dave 4a0fbcaa95 huskies: merge 949 2026-05-13 07:14:50 +00:00
dave 9ccbdff19f huskies: merge 952 2026-05-13 05:43:22 +00:00
dave 8e9112066f huskies: merge 935 2026-05-12 22:03:15 +00:00
dave c3144b7937 huskies: merge 900 2026-05-12 16:46:33 +00:00
Timmy 6feb68f3e3 fix(923): watchdog counts only tool-using turns; narration-only turns no longer burn budget
Observed: stories 917, 918, 920, 910 all turn-limit-killed despite producing
real commits. Tally across their session logs shows 30–55% of assistant
turns were pure narration ("I'll read X next", "Now let me check Y") with
no tool_use. At 80 max_turns the effective work budget was ~44 tool calls,
not enough for a typical bug fix's edit + test + check_criterion cycle.

Changes:
- New optional AgentConfig field max_tool_turns. When set the watchdog
  uses it instead of max_turns; only assistant messages whose
  data.message.content has at least one tool_use block count.
- count_turns_in_log in agents/pool/auto_assign/watchdog/limits.rs
  filters on tool_use. Existing test helper write_fake_session_log now
  emits tool_use blocks; added write_fake_mixed_session_log for the
  narration regression test.
- agents.toml: coders/coder-opus get max_turns=200 (claude-code's own
  --max-turns cap, sized to never bite before the watchdog) and
  max_tool_turns=80. qa: 120 / 40. mergemaster: 250 / 100. Budgets
  unchanged — the dollar cap remains the runaway-loop backstop, with
  ~$3-5 worst-case waste if an agent narrates indefinitely.
- Two new regression tests:
  * watchdog_does_not_count_narration_only_turns: 5 tool + 30 narration
    under max_tool_turns=10 stays Running.
  * watchdog_max_tool_turns_overrides_max_turns: 4 tool turns at
    max_tool_turns=3 / max_turns=200 still terminates with TurnLimit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:25:11 +01:00
Timmy bb845d17cf docs(904): drop run_tests retry-on-timeout clause from coder prompts
Bug 903 (run_tests attach instead of respawn) + 904 (MCP progress
notifications + SSE) together eliminate the transport-timeout error
mode from the agent's point of view: long test runs complete without
the MCP client ever observing a tool-call error. Production
verification (see d64f1e94 / ddc4228b deploy at 14:30 UTC today)
confirmed 78s and 65s test runs completing in single processes with
no respawn churn and no retry needed.

The "If run_tests errors with a transport timeout, call it again"
sentence in coder-1/2/3/opus system_prompts (added belt-and-braces
in a97a10fb) is now redundant. Removing it tightens the agent's
mental model down to: call run_tests, wait for the result. No
error-handling branch, no retry semantics to internalise.

This closes the last open AC on story 904.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:36:53 +01:00
Timmy a97a10fba2 docs(903): coder system_prompts — clarify run_tests retry contract
Pre-d64f1e94 the "call run_tests again — it attaches" guidance was a
lie (every call killed the prior job and spawned a fresh one). With
the attach fix in place, the contract is now real and safe to depend
on. Tighten the wording so agents see exactly what to do:

OLD: "Do not use ScheduleWakeup to wait for run_tests; if run_tests
      appears to time out, call run_tests again — it attaches to the
      in-flight test job and blocks until completion."

NEW: "If run_tests errors with a transport timeout, call it again —
      it's idempotent and attaches to the same in-flight test job,
      so retries are safe and eventually return a pass/fail result."

Improvements:
- "errors with a transport timeout" matches what the agent literally
  observes (a tool-call error), not the vague "appears to time out".
- Explicit on idempotency so agents understand why retry is safe and
  don't worry about double-running the suite.
- Drops the ScheduleWakeup clause — already enforced via the
  `disallowed_tools` setting on coder-1/2/3/opus, so the prompt
  reminder was redundant.

Applied uniformly across coder-1, coder-2, coder-3, coder-opus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:54:34 +01:00
Timmy e955250474 fix(902): coder system_prompts steer to get_story_todos for story content
Bug 902: the Step 0 "resume from worktree state" instruction told coders
to call git_status / git_log / git_diff to discover prior session work,
which they then extended into hunting for the story `.md` file on disk
via find / ls — pointless post-865, since story content lives only in
the CRDT.

Update Step 0 in coder-1, coder-2, coder-3, and coder-opus to add an
explicit instruction: "To read story content, ACs, or description, call
the `get_story_todos` MCP tool — do NOT search for a story `.md` file
on disk; story content is CRDT-only."

Single substring replacement covers all four agents (identical Step 0
across them).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:13:08 +01:00
dave fac4442969 fix(896): disallow ScheduleWakeup for coder agents; add run_tests retry guidance
- Add `disallowed_tools` field to `AgentConfig` and render it as
  `--disallowedTools` CLI flag in `render_agent_args`
- Set `disallowed_tools = ["ScheduleWakeup"]` on all four coder agents
  (coder-1, coder-2, coder-3, coder-opus); QA and mergemaster unaffected
- Append instruction to all coder `system_prompt`s: do not use
  ScheduleWakeup to wait for run_tests; if run_tests appears to time out,
  call run_tests again — it attaches to the in-flight job and blocks
- Add tests: `render_agent_args_disallowed_tools` and
  `coder_agents_disallow_schedule_wakeup`
2026-05-08 15:28:48 +01:00
dave c50a04445c spike(814): add gateway update command design doc
Documents chat-driven `update` bot command for multi-project gateway:
command surface, auth (room+role guard, future Ed25519), Docker-managed
rollout sequence, automatic and manual rollback, open questions, and
dependencies.
2026-04-29 18:17:19 +00:00
dave cf35027b5a config(coders): step 0 — resume prior-session work via git_status + git_log/diff against master..HEAD 2026-04-29 16:03:03 +00:00
dave b4854cf693 huskies: merge 862 2026-04-29 13:28:37 +00:00
dave 9979ff2cf9 huskies: merge 859 2026-04-29 10:18:37 +00:00
dave 8802e1fe59 huskies: merge 853 2026-04-29 09:08:28 +00:00
dave 549a9defc4 huskies: merge 851 2026-04-29 08:42:28 +00:00
dave 3ce34c34e9 huskies: merge 850 2026-04-29 08:27:05 +00:00
dave b698cee284 huskies: merge 821 2026-04-28 21:06:54 +00:00
dave 32a3465fc4 fix: tell the truth about run_tests being blocking
`tool_run_tests` in `server/src/http/mcp/shell_tools/script.rs` is fully
blocking server-side: it spawns the test child, polls every 1s server-side
until exit (or `TEST_TIMEOUT_SECS = 1200s`), and returns the full
{passed, exit_code, output} directly. There is NO async/started-status
return path.

But two places told agents the wrong story:
1. `tools_list/system_tools.rs` description claimed "Returns immediately
   with status: started. Poll get_test_result..." — agents read tool
   descriptions for protocol semantics, so they followed this and burned
   turns polling get_test_result.
2. `agents.toml` had been correctly saying it blocks, but my last commit
   (776aad38) "fixed" it the wrong way based on a misread of the code.

Now both say: run_tests blocks server-side, returns the full result, do
not poll get_test_result. get_test_result remains for external observers
(UI checking on a job another caller started).

Reverts the prompt change in 776aad38 with the correct text.
2026-04-28 15:59:06 +00:00
dave 776aad3877 fix: agent prompts honest about run_tests being async
Pre-f958f57e, run_tests blocked until completion. After that fix it became
a background-job starter, with get_test_result polling. The agent prompts
were never updated, so they still said "run_tests blocks until complete"
— and agents then waste turns polling.

Updated coder-1/2/3, coder-opus, and qa prompts to describe the actual
flow: run_tests is async, get_test_result blocks for up to 20s per call,
test suites typically take 1-5 minutes so expect a few polls.

Companion bug filed for bumping TEST_POLL_BLOCK_SECS so one poll covers
most test runs (root-cause fix; this commit is the prompt half).
2026-04-28 15:55:15 +00:00
dave bb779a0b0f chore: regenerate STACK.md source map from module doc-comments
Walked server/src/, frontend/src/, and crates/, extracting each module's
//! doc-comment to build a directory-level source map. One row per
directory + one row per top-level file.

Replaces the hand-written stopgap from 5d6757dd with content auto-derived
from the codebase, so it stays useful as decomposes happen — the
descriptions come from mod.rs, not from my recollection of where things
live.

Still a stopgap until 819 (auto-generated source-map-gen) lands and gets
wired into the agent spawn path, but the content is closer to what 819
will produce.
2026-04-28 15:50:29 +00:00
dave 5d6757dd65 chore: restore source map in STACK.md (stopgap for 818 regression)
818 stripped the source map because it had stale paths. Empirically that
made coder agents far slower — they spent most of each session re-discovering
the codebase via Read/Grep before reaching any Edit, and ran out of turn
budget without committing.

Restoring a fresh source map keyed off current master. Uses directories
where possible so it stays useful through future decomposes, plus a
"Canonical examples" section pointing at the patterns to copy when adding
new CRDT collections, RPC handlers, services, chat commands, etc.

This is a stopgap until 819 (auto-generated source-map-gen) lands.
2026-04-28 15:43:44 +00:00
dave 36ca8d5e3b huskies: merge 827 2026-04-28 13:01:48 +00:00
dave e879d6f602 huskies: merge 818 2026-04-28 12:03:01 +00:00
dave c1bb5888a8 config: bump mergemaster max_turns 60->100, budget $15->$25
Mergemaster needs more headroom for heavy merges (e.g. the slug-to-numeric
ID migration touching many files, or the FS-shadow deletion stories that
require fixing test setup across the codebase). 60 turns wasn't enough
for the larger ones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:20:22 +00:00
dave 191883fe2a config: brutalist refactor guidance + bump mergemaster inactivity_timeout
- Append to all coder/opus system_prompts: for delete/signature-change
  refactors, delete first and let compiler errors guide the call-site
  walk; do not pre-read files predicting breakage. Reduces exploration
  overhead on mechanical refactors.

- Bump mergemaster inactivity_timeout_secs 300 -> 900 (15 min) so
  mergemaster survives the 5-minute API rate-limit backoff. Without
  this, mergemaster gets killed for inactivity while waiting on rate
  limit clear, blocking all merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:19:09 +00:00
dave 2b28ccbf2c Merge spike branch 'feature/story-679_spike_migrate_inter_component_http_to_signed_crdt_websocket_bus' into master 2026-04-27 17:01:48 +00:00
dave 5884dac825 chore: gitignore .huskies/session_store.json (runtime artifact)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:59:33 +00:00
dave 0b7f7dfdf7 config: bump sonnet coder-1/2/3 max_turns 50→80
Stories like the broadcaster-consumer migrations legitimately need ~60
substantive turns (16 ProjectConfig initializer sites + main.rs subscriber
+ reading existing patterns to mirror). 50 was too tight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:56:24 +00:00
dave 756c790b9f spike 679: document HTTP-to-CRDT-bus migration plan
Full inventory of all gateway and project server endpoints with caller,
purpose, latency/freshness/durability requirements. Classifies each as
write/read/external-webhook/frontend-asset. Maps write endpoints to
target CRDT collections, proposes RPC frame shapes for read endpoints,
drafts the unsigned read-RPC protocol (envelope, correlation IDs, TTL,
error codes, peer-offline handling), lists in-memory state needing CRDT
migration with proposed types, and defines a wave-ordered migration plan
with explicit dependencies (story 665 Ed25519 auth as the blocker for
write migrations).
2026-04-27 14:49:38 +00:00
dave 56c979c950 config: tell mergemaster to use 5-min sleeps between merge_agent_work polls
Real cause of mergemaster turn-burnout: not merge conflicts, just polling
overhead. The server-side tool_merge_agent_work IS designed to block until
the merge completes, but the MCP client times out after 60s. The agent
then polls get_merge_status, with 30-60s sleeps between polls — each
poll cycle costs 2 turns (sleep + tool call). The merge takes 5-10 min
for a clean run, so the agent burns 10-20 turns just waiting.

Updated workflow tells mergemaster:
- 'operation timed out' is normal, do NOT immediately re-call (would queue
  a duplicate merge)
- Use Bash sleep 300 (one 5-min wait = 1 turn) between polls
- Cap at 3 polls = 15 minutes total, plenty for any clean merge
- Reserve turns for actual fix-up work if gates fail

Combined with the earlier 30→60 turn / $5→$15 budget bump, this should
land any merge with no real conflicts in 3-5 turns total. Plenty of
headroom remaining for genuine gate-fix work.
2026-04-27 10:50:44 +00:00
dave 7b305ba892 config: bump mergemaster max_turns 30→60, budget $5→$15
30 turns is too tight for non-trivial merge gate failures. Combined with
the 3-retry cap, stories with any post-merge fix-up needed (cargo fmt
nits, slightly out-of-date diffs after parallel merges, etc.) get
permanently blocked.

This is a stopgap until story 668 lands (which will keep gates_passed=false
work in the coder stage entirely, so mergemaster only ever sees clean
diffs and the original 30 turns / $5 is fine again).
2026-04-27 10:41:45 +00:00
dave 9fbbfcd585 huskies: merge 667_story_agent_prompt_target_maximum_file_size_of_800_lines_as_a_soft_guide_decompose_larger_files_by_concern 2026-04-27 01:37:52 +00:00
dave f88bb5f486 huskies: merge 645_bug_agent_runtime_panics_with_output_write_bytes_is_ok_assertion_marking_stories_falsely_blocked 2026-04-26 10:54:58 +00:00
dave 2097787e1f docs: add pipeline state machine reference (current + planned transitions)
Captures the dual representation we have today (legacy filesystem stage
strings + front-matter flags vs the typed Stage/ArchiveReason/ExecutionState
enums in pipeline_state.rs that are defined-but-not-wired) and itemises the
transitions and behaviours we have identified as missing or partially
implemented (first-class supersede/abandon/hold verbs, type-conversion side
effects, pinned-agent honouring under contention, blocked-flag enforcement
beyond auto-assign, ghost-story recovery, etc.).

Section (b) is intended as a living dumping ground — append new
transitions and incidents as they come up so that the state-machine
roadmap (spike 613 in backlog) has a ready-made input.
2026-04-25 13:33:57 +00:00
dave 4b765bbc39 huskies: merge 601_story_project_local_agent_prompt_layer_for_huskies 2026-04-23 11:56:19 +00:00
dave b3da321a3b huskies: merge 598_story_expose_huskies_init_as_a_gateway_mcp_tool 2026-04-22 21:39:29 +00:00
Timmy 6c76b569c4 Deleting ancient handoff file 2026-04-17 13:18:02 +01:00
dave a4480fa067 chore: feed CONTEXT and STACK specs to all agents, update STACK with source map
Agents now read specs/00_CONTEXT.md (what the project does) and
specs/tech/STACK.md (tech stack + source map) in addition to the
README. STACK.md rewritten to reflect current state — removes stale
references to biome, tauri-specta, .story_kit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:15:09 +00:00
dave 483489cc44 fix: rewrite coder agent prompts — run tests before commit, remove stale instructions
Key changes:
- Tests before commit, not after: "run run_tests, fix failures, then commit"
- Removed polling references (run_tests blocks now)
- Removed "never run script/test" (primes agents to think about it)
- Removed dead "user review" instruction
- Removed "commit and stop" which signalled skip-testing
- Cleaner workflow: implement → check criteria → test → fix → commit → exit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:19:08 +00:00
dave 8936abd8cd docs: add project architecture section to README for agent context
Agents need to know the gateway is a mode of the binary, not a
separate app, and that UI stories are frontend React work, not
Rust backend restructuring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:23:18 +00:00
dave 28adef9739 chore: switch mergemaster to opus and add cargo fmt guidance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:35:57 +00:00
dave badfabcf5e chore: switch mergemaster to opus and add cargo fmt guidance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:27:58 +00:00
dave 845b85e7a7 fix: add --all to cargo fmt in script/test and autoformat codebase
cargo fmt without --all fails with "Failed to find targets" in
workspace repos. This was blocking every story's gates. Also ran
cargo fmt --all to fix all existing formatting issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:07:08 +00:00
dave 0cb68e1de9 docs: add deployment modes to README — standard, headless, and gateway
Documents the three modes of the huskies binary: standard single-project
server, headless build agent (--rendezvous), and multi-project gateway
(--gateway). Includes projects.toml config example and Docker Compose
sketch for multi-project setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 13:44:10 +00:00