Bug 903: every `run_tests` MCP call killed the prior `cargo test` child
for the same worktree and spawned a fresh one. Combined with the
~60s MCP client-side timeout and the 896 agent prompt that told agents
to "call run_tests again — it attaches to the in-flight test job",
this produced a respawn loop: agent calls, MCP times out at 60s, agent
retries, run_tests kills the running build and starts a new one. The
test suite never reaches the finish line.
Server log evidence: "Started test job for <worktree> (pid N)" with a
new PID every ~60-90s for the same worktree.
Fix: when `run_tests` is called and a job is already in flight for that
worktree, ATTACH to it instead of killing+respawning. The original job's
poll loop already writes the final status to the CRDT `test_jobs`
collection; attached callers just poll that CRDT entry (the same
pattern `get_test_result` uses) and return the result when the
in-flight job transitions out of "running". The 896 prompt's claim is
now actually true.
Worktrees remain isolated from each other and may run `cargo test`
concurrently — there is no cross-worktree serialisation. The single
invariant is "at most one test job per worktree at a time".
New test: `tool_run_tests_concurrent_calls_attach_to_single_job`
spawns two concurrent calls on the same worktree against a 2s
`sleep`-based script and asserts total elapsed stays close to 2s
(attach) rather than 4s (respawn).
Note: the cross-worktree linker-OOM symptom Timmy reported in the
field was downstream of the respawn loop. Killed-but-not-fully-reaped
cargo invocations stack memory pressure beyond the nominal N
worktrees. With the attach fix, each worktree runs exactly one
in-flight build at a time and old builds finish cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The mergemaster gates run rustfmt and rejected 864's merge because
several files I added/touched in master today had not been fmt'd.
Six files affected, mostly trivial line-wrapping nits. Fixes the
formatting gate for the next 864 merge attempt.
Before: tool_run_check (and run_build/run_lint via run_script_tool)
returned the entire cargo log verbatim in `output`. For runs with many
errors the response routinely exceeded the MCP token cap, was dumped
to a tool-results file, and the agent had to scrape it with python3
just to see the error list — burning many turns on file archaeology
for what should be a one-look operation. Real example: 864's coder
hit `result (143,708 characters) exceeds maximum allowed tokens` and
spent ~8 turns extracting 3 errors.
Now:
- New `service::shell::parse_diagnostics` parses `error[CODE]:` /
`warning[CODE]:` headers + their `--> file:line` markers into
structured `Diagnostic { kind, code, message, file, line }`.
- `tool_run_check` (and the run_build/run_lint shared body) returns
`{ passed, exit_code, errors: [...], warnings: [...], summary }`.
Raw `output` is dropped from the default response.
- New `verbose: bool` argument (default false) restores the raw
output for callers who actually need it.
- Updated the existing tool_run_check test to assert the new
contract (150 errors → 150 structured entries, response < 50KB).
Skipped run_tests in this pass — its parser would need to recognise
test-runner output (different format from cargo); will land separately.
Closes 886.