d64f1e94ff
Bug 903: every `run_tests` MCP call killed the prior `cargo test` child for the same worktree and spawned a fresh one. Combined with the ~60s MCP client-side timeout and the 896 agent prompt that told agents to "call run_tests again — it attaches to the in-flight test job", this produced a respawn loop: agent calls, MCP times out at 60s, agent retries, run_tests kills the running build and starts a new one. The test suite never reaches the finish line. Server log evidence: "Started test job for <worktree> (pid N)" with a new PID every ~60-90s for the same worktree. Fix: when `run_tests` is called and a job is already in flight for that worktree, ATTACH to it instead of killing+respawning. The original job's poll loop already writes the final status to the CRDT `test_jobs` collection; attached callers just poll that CRDT entry (the same pattern `get_test_result` uses) and return the result when the in-flight job transitions out of "running". The 896 prompt's claim is now actually true. Worktrees remain isolated from each other and may run `cargo test` concurrently — there is no cross-worktree serialisation. The single invariant is "at most one test job per worktree at a time". New test: `tool_run_tests_concurrent_calls_attach_to_single_job` spawns two concurrent calls on the same worktree against a 2s `sleep`-based script and asserts total elapsed stays close to 2s (attach) rather than 4s (respawn). Note: the cross-worktree linker-OOM symptom Timmy reported in the field was downstream of the respawn loop. Killed-but-not-fully-reaped cargo invocations stack memory pressure beyond the nominal N worktrees. With the attach fix, each worktree runs exactly one in-flight build at a time and old builds finish cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>