Follow-up to bug 903. The attach fix made run_tests retries safe, but
agents still observed the underlying MCP transport timeout as a
tool-call error and had to handle it via retry. Implement the proper
fix: MCP `notifications/progress` events keep the client's transport
timer alive so the call never errors from the agent's perspective.
What changed:
server/src/http/mcp/progress.rs (new)
- `ProgressEmitter` (progressToken + mpsc sender) installed in a
`tokio::task_local!` scope by the SSE response path.
- `emit_progress(progress, total, message)` builds a JSON-RPC
`notifications/progress` message and sends it via the channel.
No-op when no emitter is in scope (plain JSON path / tests / API
runtimes), so tool handlers can call it unconditionally.
server/src/http/mcp/mod.rs
- mcp_post_handler now detects `Accept: text/event-stream` AND a
`params._meta.progressToken` on tools/call. When both are present,
routes through `sse_tools_call` instead of the plain JSON path.
- sse_tools_call: spawns the dispatch task with the emitter installed,
builds an SSE stream that interleaves incoming progress events with
the final JSON-RPC response, with a 15s keep-alive interval as a
backstop for tools that don't emit their own progress.
- Plain JSON behaviour is unchanged for non-SSE clients and for
everything other than tools/call.
server/src/http/mcp/shell_tools/script.rs
- tool_run_tests poll loop emits `notifications/progress` every 25s
of elapsed time (well below the typical ~60s MCP transport
timeout). Attached callers (the bug 903 fix path) also emit so
their MCP socket stays alive while waiting for the in-flight job.
- Output filtering: on a passing run the response now returns a
one-line summary ("All N tests passed.") instead of the full
`cargo test` stdout, which was pure noise that burned agent
tokens. Failure output is unchanged (truncated tail with the
`failures:` section and final test_result line). CRDT entry
stores the same filtered value so attached callers see it too.
Tests (3 new):
- emit_progress_no_op_without_emitter — calling outside scope is safe
- emit_progress_sends_notification_when_emitter_installed — full path
- emit_progress_omits_optional_fields — total/message optional
Not changed: coder system_prompts still tell agents to retry on
transport-timeout errors. That advice is now belt-and-braces — if
claude-code's HTTP MCP client honours progress notifications, no agent
will ever observe the error; if not, retry is still safe post-903. We
can drop the retry advice once we've observed the SSE path working in
the field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 903: every `run_tests` MCP call killed the prior `cargo test` child
for the same worktree and spawned a fresh one. Combined with the
~60s MCP client-side timeout and the 896 agent prompt that told agents
to "call run_tests again — it attaches to the in-flight test job",
this produced a respawn loop: agent calls, MCP times out at 60s, agent
retries, run_tests kills the running build and starts a new one. The
test suite never reaches the finish line.
Server log evidence: "Started test job for <worktree> (pid N)" with a
new PID every ~60-90s for the same worktree.
Fix: when `run_tests` is called and a job is already in flight for that
worktree, ATTACH to it instead of killing+respawning. The original job's
poll loop already writes the final status to the CRDT `test_jobs`
collection; attached callers just poll that CRDT entry (the same
pattern `get_test_result` uses) and return the result when the
in-flight job transitions out of "running". The 896 prompt's claim is
now actually true.
Worktrees remain isolated from each other and may run `cargo test`
concurrently — there is no cross-worktree serialisation. The single
invariant is "at most one test job per worktree at a time".
New test: `tool_run_tests_concurrent_calls_attach_to_single_job`
spawns two concurrent calls on the same worktree against a 2s
`sleep`-based script and asserts total elapsed stays close to 2s
(attach) rather than 4s (respawn).
Note: the cross-worktree linker-OOM symptom Timmy reported in the
field was downstream of the respawn loop. Killed-but-not-fully-reaped
cargo invocations stack memory pressure beyond the nominal N
worktrees. With the attach fix, each worktree runs exactly one
in-flight build at a time and old builds finish cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The mergemaster gates run rustfmt and rejected 864's merge because
several files I added/touched in master today had not been fmt'd.
Six files affected, mostly trivial line-wrapping nits. Fixes the
formatting gate for the next 864 merge attempt.
Before: tool_run_check (and run_build/run_lint via run_script_tool)
returned the entire cargo log verbatim in `output`. For runs with many
errors the response routinely exceeded the MCP token cap, was dumped
to a tool-results file, and the agent had to scrape it with python3
just to see the error list — burning many turns on file archaeology
for what should be a one-look operation. Real example: 864's coder
hit `result (143,708 characters) exceeds maximum allowed tokens` and
spent ~8 turns extracting 3 errors.
Now:
- New `service::shell::parse_diagnostics` parses `error[CODE]:` /
`warning[CODE]:` headers + their `--> file:line` markers into
structured `Diagnostic { kind, code, message, file, line }`.
- `tool_run_check` (and the run_build/run_lint shared body) returns
`{ passed, exit_code, errors: [...], warnings: [...], summary }`.
Raw `output` is dropped from the default response.
- New `verbose: bool` argument (default false) restores the raw
output for callers who actually need it.
- Updated the existing tool_run_check test to assert the new
contract (150 errors → 150 structured entries, response < 50KB).
Skipped run_tests in this pass — its parser would need to recognise
test-runner output (different format from cargo); will land separately.
Closes 886.