feat(904): MCP progress notifications + SSE for long-running tool calls

Follow-up to bug 903. The attach fix made run_tests retries safe, but
agents still observed the underlying MCP transport timeout as a
tool-call error and had to handle it via retry. Implement the proper
fix: MCP `notifications/progress` events keep the client's transport
timer alive so the call never errors from the agent's perspective.

What changed:

server/src/http/mcp/progress.rs (new)
  - `ProgressEmitter` (progressToken + mpsc sender) installed in a
    `tokio::task_local!` scope by the SSE response path.
  - `emit_progress(progress, total, message)` builds a JSON-RPC
    `notifications/progress` message and sends it via the channel.
    No-op when no emitter is in scope (plain JSON path / tests / API
    runtimes), so tool handlers can call it unconditionally.

server/src/http/mcp/mod.rs
  - mcp_post_handler now detects `Accept: text/event-stream` AND a
    `params._meta.progressToken` on tools/call. When both are present,
    routes through `sse_tools_call` instead of the plain JSON path.
  - sse_tools_call: spawns the dispatch task with the emitter installed,
    builds an SSE stream that interleaves incoming progress events with
    the final JSON-RPC response, with a 15s keep-alive interval as a
    backstop for tools that don't emit their own progress.
  - Plain JSON behaviour is unchanged for non-SSE clients and for
    everything other than tools/call.

server/src/http/mcp/shell_tools/script.rs
  - tool_run_tests poll loop emits `notifications/progress` every 25s
    of elapsed time (well below the typical ~60s MCP transport
    timeout). Attached callers (the bug 903 fix path) also emit so
    their MCP socket stays alive while waiting for the in-flight job.
  - Output filtering: on a passing run the response now returns a
    one-line summary ("All N tests passed.") instead of the full
    `cargo test` stdout, which was pure noise that burned agent
    tokens. Failure output is unchanged (truncated tail with the
    `failures:` section and final test_result line). CRDT entry
    stores the same filtered value so attached callers see it too.

Tests (3 new):
  - emit_progress_no_op_without_emitter — calling outside scope is safe
  - emit_progress_sends_notification_when_emitter_installed — full path
  - emit_progress_omits_optional_fields — total/message optional

Not changed: coder system_prompts still tell agents to retry on
transport-timeout errors. That advice is now belt-and-braces — if
claude-code's HTTP MCP client honours progress notifications, no agent
will ever observe the error; if not, retry is still safe post-903. We
can drop the retry advice once we've observed the SSE path working in
the field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Timmy
2026-05-12 15:05:04 +01:00
parent a97a10fba2
commit ddc4228b10
3 changed files with 300 additions and 6 deletions
+49 -5
View File
@@ -13,6 +13,10 @@ use super::exec::validate_working_dir;
const TEST_TIMEOUT_SECS: u64 = 1200;
const MAX_OUTPUT_LINES: usize = 100;
/// How often `tool_run_tests` emits a `notifications/progress` event while a
/// test job is in flight. Chosen well below typical MCP HTTP transport
/// timeouts (~60s) so the client's socket timer resets before it can fire.
const PROGRESS_INTERVAL_SECS: u64 = 25;
// ── In-flight process registry ───────────────────────────────────────────────
//
@@ -142,9 +146,24 @@ pub(crate) async fn tool_run_tests(args: &Value, ctx: &AppContext) -> Result<Str
// Block server-side, checking every second until done or timeout.
let start = std::time::Instant::now();
let mut last_progress_emit_secs: u64 = 0;
loop {
tokio::time::sleep(std::time::Duration::from_secs(1)).await;
// Emit periodic progress notifications so any SSE consumer's MCP
// transport timer resets and the tool call doesn't surface a
// transport-timeout error to the agent. No-op when there is no
// emitter (plain JSON path).
let elapsed_secs = start.elapsed().as_secs();
if elapsed_secs.saturating_sub(last_progress_emit_secs) >= PROGRESS_INTERVAL_SECS {
crate::http::mcp::progress::emit_progress(
elapsed_secs as f64,
None,
Some(&format!("running tests: {elapsed_secs}s elapsed")),
);
last_progress_emit_secs = elapsed_secs;
}
let mut jobs = active_jobs().lock().map_err(|e| e.to_string())?;
let job = match jobs.get_mut(&working_dir) {
Some(j) => j,
@@ -165,9 +184,18 @@ pub(crate) async fn tool_run_tests(args: &Value, ctx: &AppContext) -> Result<Str
.unwrap_or_default();
let combined = format!("{stdout}{stderr}");
let (tests_passed, tests_failed) = parse_test_counts(&combined);
let truncated = truncate_output(&combined, MAX_OUTPUT_LINES);
let passed = status.success();
let exit_code = status.code().unwrap_or(-1);
// On success the full cargo output is pure noise that burns
// agent tokens — replace with a one-line summary. On failure
// we keep the tail (which usually contains the `failures:`
// section + final test_result line) so agents have enough
// context to fix the failing tests.
let response_output = if passed {
format!("All {tests_passed} tests passed.")
} else {
truncate_output(&combined, MAX_OUTPUT_LINES)
};
let crdt_status = if passed { "pass" } else { "fail" };
crate::slog!(
"[run_tests] Test job for {} finished (pid {}, passed={})",
@@ -176,13 +204,14 @@ pub(crate) async fn tool_run_tests(args: &Value, ctx: &AppContext) -> Result<Str
passed
);
// Persist result in CRDT for post-restart visibility.
// Persist result in CRDT for post-restart visibility and so
// attached callers see the same filtered output.
crate::crdt_state::write_test_job(
&sid,
crdt_status,
started_at_unix,
Some(unix_now()),
Some(&truncated),
Some(&response_output),
);
// Capture positive test evidence in the DB so the pipeline
@@ -198,7 +227,7 @@ pub(crate) async fn tool_run_tests(args: &Value, ctx: &AppContext) -> Result<Str
"timed_out": false,
"tests_passed": tests_passed,
"tests_failed": tests_failed,
"output": truncated,
"output": response_output,
}))
.map_err(|e| format!("Serialization error: {e}"));
}
@@ -257,6 +286,7 @@ pub(crate) async fn tool_run_tests(args: &Value, ctx: &AppContext) -> Result<Str
/// `cargo test` run rather than racing to kill+respawn.
async fn attach_to_in_flight_test_job(sid: &str) -> Result<String, String> {
let start = std::time::Instant::now();
let mut last_progress_emit_secs: u64 = 0;
loop {
match crate::crdt_state::read_test_job(sid) {
None => {
@@ -273,7 +303,21 @@ async fn attach_to_in_flight_test_job(sid: &str) -> Result<String, String> {
}
}
if start.elapsed().as_secs() > TEST_TIMEOUT_SECS {
// Keep the SSE consumer's MCP socket alive while we wait. No-op when
// no emitter is installed (plain JSON path).
let elapsed_secs = start.elapsed().as_secs();
if elapsed_secs.saturating_sub(last_progress_emit_secs) >= PROGRESS_INTERVAL_SECS {
crate::http::mcp::progress::emit_progress(
elapsed_secs as f64,
None,
Some(&format!(
"attached to in-flight test job: {elapsed_secs}s elapsed"
)),
);
last_progress_emit_secs = elapsed_secs;
}
if elapsed_secs > TEST_TIMEOUT_SECS {
return Err(format!(
"Attached test job for '{sid}' did not complete within {TEST_TIMEOUT_SECS}s"
));