huskies

Author	SHA1	Message	Date
dave	eb48ef19e7	huskies: merge 1011	2026-05-13 21:32:11 +00:00
dave	8b53e20ca9	huskies: merge 961	2026-05-13 11:27:21 +00:00
Timmy	ddc4228b10	feat(904): MCP progress notifications + SSE for long-running tool calls Follow-up to bug 903. The attach fix made run_tests retries safe, but agents still observed the underlying MCP transport timeout as a tool-call error and had to handle it via retry. Implement the proper fix: MCP `notifications/progress` events keep the client's transport timer alive so the call never errors from the agent's perspective. What changed: server/src/http/mcp/progress.rs (new) - `ProgressEmitter` (progressToken + mpsc sender) installed in a `tokio::task_local!` scope by the SSE response path. - `emit_progress(progress, total, message)` builds a JSON-RPC `notifications/progress` message and sends it via the channel. No-op when no emitter is in scope (plain JSON path / tests / API runtimes), so tool handlers can call it unconditionally. server/src/http/mcp/mod.rs - mcp_post_handler now detects `Accept: text/event-stream` AND a `params._meta.progressToken` on tools/call. When both are present, routes through `sse_tools_call` instead of the plain JSON path. - sse_tools_call: spawns the dispatch task with the emitter installed, builds an SSE stream that interleaves incoming progress events with the final JSON-RPC response, with a 15s keep-alive interval as a backstop for tools that don't emit their own progress. - Plain JSON behaviour is unchanged for non-SSE clients and for everything other than tools/call. server/src/http/mcp/shell_tools/script.rs - tool_run_tests poll loop emits `notifications/progress` every 25s of elapsed time (well below the typical ~60s MCP transport timeout). Attached callers (the bug 903 fix path) also emit so their MCP socket stays alive while waiting for the in-flight job. - Output filtering: on a passing run the response now returns a one-line summary ("All N tests passed.") instead of the full `cargo test` stdout, which was pure noise that burned agent tokens. Failure output is unchanged (truncated tail with the `failures:` section and final test_result line). CRDT entry stores the same filtered value so attached callers see it too. Tests (3 new): - emit_progress_no_op_without_emitter — calling outside scope is safe - emit_progress_sends_notification_when_emitter_installed — full path - emit_progress_omits_optional_fields — total/message optional Not changed: coder system_prompts still tell agents to retry on transport-timeout errors. That advice is now belt-and-braces — if claude-code's HTTP MCP client honours progress notifications, no agent will ever observe the error; if not, retry is still safe post-903. We can drop the retry advice once we've observed the SSE path working in the field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:05:04 +01:00
Timmy	d64f1e94ff	fix(903): run_tests attaches to in-flight job instead of kill+respawn Bug 903: every `run_tests` MCP call killed the prior `cargo test` child for the same worktree and spawned a fresh one. Combined with the ~60s MCP client-side timeout and the 896 agent prompt that told agents to "call run_tests again — it attaches to the in-flight test job", this produced a respawn loop: agent calls, MCP times out at 60s, agent retries, run_tests kills the running build and starts a new one. The test suite never reaches the finish line. Server log evidence: "Started test job for <worktree> (pid N)" with a new PID every ~60-90s for the same worktree. Fix: when `run_tests` is called and a job is already in flight for that worktree, ATTACH to it instead of killing+respawning. The original job's poll loop already writes the final status to the CRDT `test_jobs` collection; attached callers just poll that CRDT entry (the same pattern `get_test_result` uses) and return the result when the in-flight job transitions out of "running". The 896 prompt's claim is now actually true. Worktrees remain isolated from each other and may run `cargo test` concurrently — there is no cross-worktree serialisation. The single invariant is "at most one test job per worktree at a time". New test: `tool_run_tests_concurrent_calls_attach_to_single_job` spawns two concurrent calls on the same worktree against a 2s `sleep`-based script and asserts total elapsed stays close to 2s (attach) rather than 4s (respawn). Note: the cross-worktree linker-OOM symptom Timmy reported in the field was downstream of the respawn loop. Killed-but-not-fully-reaped cargo invocations stack memory pressure beyond the nominal N worktrees. With the attach fix, each worktree runs exactly one in-flight build at a time and old builds finish cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:22:35 +01:00
dave	1251b869a6	style: cargo fmt on today's new code (883/884/886/opus-pin) The mergemaster gates run rustfmt and rejected 864's merge because several files I added/touched in master today had not been fmt'd. Six files affected, mostly trivial line-wrapping nits. Fixes the formatting gate for the next 864 merge attempt.	2026-04-30 22:15:37 +00:00
dave	7a0c186d94	fix(886): parse cargo diagnostics in run_check/run_build/run_lint Before: tool_run_check (and run_build/run_lint via run_script_tool) returned the entire cargo log verbatim in `output`. For runs with many errors the response routinely exceeded the MCP token cap, was dumped to a tool-results file, and the agent had to scrape it with python3 just to see the error list — burning many turns on file archaeology for what should be a one-look operation. Real example: 864's coder hit `result (143,708 characters) exceeds maximum allowed tokens` and spent ~8 turns extracting 3 errors. Now: - New `service::shell::parse_diagnostics` parses `error[CODE]:` / `warning[CODE]:` headers + their `--> file:line` markers into structured `Diagnostic { kind, code, message, file, line }`. - `tool_run_check` (and the run_build/run_lint shared body) returns `{ passed, exit_code, errors: [...], warnings: [...], summary }`. Raw `output` is dropped from the default response. - New `verbose: bool` argument (default false) restores the raw output for callers who actually need it. - Updated the existing tool_run_check test to assert the new contract (150 errors → 150 structured entries, response < 50KB). Skipped run_tests in this pass — its parser would need to recognise test-runner output (different format from cargo); will land separately. Closes 886.	2026-04-30 15:06:02 +00:00
dave	4d24b5b661	huskies: merge 855	2026-04-29 21:41:03 +00:00
dave	b4854cf693	huskies: merge 862	2026-04-29 13:28:37 +00:00
dave	01169332b3	huskies: merge 774	2026-04-28 10:51:59 +00:00
dave	5da29c3d91	huskies: merge 668_bug_pipeline_advances_coder_work_to_merge_when_gates_passed_false	2026-04-27 11:39:11 +00:00
dave	34a399b838	refactor: split http/mcp/shell_tools.rs (1144) into mod + exec + script The 1144-line shell_tools.rs is split: - exec.rs: validate_working_dir + tool_run_command + handle_run_command_sse + their tests (~550 lines) - script.rs: tool_run_tests + tool_get_test_result + tool_run_build + tool_run_lint + helpers + their tests (~610 lines) - mod.rs: re-exports (~12 lines) Tests stay co-located. All 2636 tests pass; clippy clean.	2026-04-27 02:04:04 +00:00

11 Commits