fix(923): watchdog counts only tool-using turns; narration-only turns no longer burn budget
Observed: stories 917, 918, 920, 910 all turn-limit-killed despite producing
real commits. Tally across their session logs shows 30–55% of assistant
turns were pure narration ("I'll read X next", "Now let me check Y") with
no tool_use block. At max_turns=80 the effective work budget was ~44 tool
calls, not enough for a typical bug fix's edit + test + check_criterion cycle.
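The ~44 figure can be reproduced with a small sketch. The function name is hypothetical, and the 45% narration share is inferred from the arithmetic (it sits inside the observed 30–55% range), not measured separately:

```rust
/// Estimate how many turns remain for tool calls when a share of all
/// assistant turns is narration-only.
///
/// `narration_pct` is a whole-number percentage (0..=100). Integer
/// division rounds down. Hypothetical helper, not part of the codebase.
fn effective_tool_budget(max_turns: u64, narration_pct: u64) -> u64 {
    // Share of turns that actually carry a tool call.
    let working_pct = 100u64.saturating_sub(narration_pct);
    max_turns * working_pct / 100
}
```

With max_turns=80 this gives 44 at 45% narration, 56 at 30%, and 36 at 55%.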
Changes:
- New optional AgentConfig field max_tool_turns. When set, the watchdog
uses it instead of max_turns; only assistant messages whose
data.message.content has at least one tool_use block count.
- count_turns_in_log in agents/pool/auto_assign/watchdog/limits.rs
filters on tool_use. Existing test helper write_fake_session_log now
emits tool_use blocks; added write_fake_mixed_session_log for the
narration regression test.
- agents.toml: coders/coder-opus get max_turns=200 (claude-code's own
--max-turns cap, sized to never bite before the watchdog) and
max_tool_turns=80. qa gets 120 / 40 and mergemaster 250 / 100
(max_turns / max_tool_turns). Dollar budgets are unchanged: the dollar
cap remains the runaway-loop backstop, with ~$3-5 worst-case waste if
an agent narrates indefinitely.
- Two new regression tests:
  * watchdog_does_not_count_narration_only_turns: 5 tool + 30 narration
    under max_tool_turns=10 stays Running.
  * watchdog_max_tool_turns_overrides_max_turns: 4 tool turns at
    max_tool_turns=3 / max_turns=200 still terminates with TurnLimit.
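The counting rule and the two regression scenarios can be sketched without the real log format. Everything below is a simplified stand-in: `Block`, `Verdict`, `check_turn_limit`, and `mixed_log` are hypothetical names, the real code reads JSON session logs, and the `>=` comparison is an assumption (both scenarios above pass with either `>=` or `>`).

```rust
/// One content block of an assistant message (simplified stand-in for
/// the JSON blocks in a session log).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Block {
    Text,
    ToolUse,
}

/// One assistant turn: the content blocks of a single message.
type Turn = Vec<Block>;

/// Count only turns that contain at least one tool_use block.
fn count_tool_turns(turns: &[Turn]) -> u64 {
    turns
        .iter()
        .filter(|turn| turn.iter().any(|b| *b == Block::ToolUse))
        .count() as u64
}

/// Hypothetical watchdog outcome.
#[derive(Debug, PartialEq)]
enum Verdict {
    Running,
    TurnLimit,
}

/// Terminate once the tool-turn count reaches the limit (assumed `>=`).
/// With no limit configured the agent keeps running.
fn check_turn_limit(turns: &[Turn], limit: Option<u64>) -> Verdict {
    match limit {
        Some(max) if count_tool_turns(turns) >= max => Verdict::TurnLimit,
        _ => Verdict::Running,
    }
}

/// Build a log with `tool_turns` tool-using turns followed by
/// `narration_turns` text-only turns.
fn mixed_log(tool_turns: usize, narration_turns: usize) -> Vec<Turn> {
    let mut log = Vec::new();
    for _ in 0..tool_turns {
        log.push(vec![Block::Text, Block::ToolUse]);
    }
    for _ in 0..narration_turns {
        log.push(vec![Block::Text]);
    }
    log
}
```

Under these assumptions, `check_turn_limit(&mixed_log(5, 30), Some(10))` stays `Running` because only 5 of the 35 turns count, while `check_turn_limit(&mixed_log(4, 0), Some(3))` yields `TurnLimit`.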
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -41,7 +41,13 @@ pub(crate) fn resolve_session_log(
     crate::agent_log::find_latest_log(project_root, story_id, agent_name)
 }
 
-/// Count `assistant` events in a single log file.
+/// Count **tool-using** assistant turns in a single log file (story 923).
+///
+/// A turn counts only if its assistant message contains at least one
+/// `content` block with `type == "tool_use"`. Narration-only turns
+/// (text-only assistant messages such as "I'll read X next" preambles)
+/// don't count against the watchdog limit — they make no progress, only
+/// burn budget, and Sonnet emits them at roughly 30–55% of all turns.
 pub(crate) fn count_turns_in_log(path: &Path) -> u64 {
     let entries = match crate::agent_log::read_log(path) {
         Ok(e) => e,
@@ -50,13 +56,24 @@ pub(crate) fn count_turns_in_log(path: &Path) -> u64 {
     entries
         .iter()
         .filter(|entry| {
-            entry.event.get("type").and_then(|v| v.as_str()) == Some("agent_json")
-                && entry
-                    .event
-                    .get("data")
-                    .and_then(|d| d.get("type"))
-                    .and_then(|v| v.as_str())
-                    == Some("assistant")
+            if entry.event.get("type").and_then(|v| v.as_str()) != Some("agent_json") {
+                return false;
+            }
+            let data = match entry.event.get("data") {
+                Some(d) => d,
+                None => return false,
+            };
+            if data.get("type").and_then(|v| v.as_str()) != Some("assistant") {
+                return false;
+            }
+            // Require at least one tool_use content block.
+            data.pointer("/message/content")
+                .and_then(|c| c.as_array())
+                .map(|arr| {
+                    arr.iter()
+                        .any(|item| item.get("type").and_then(|v| v.as_str()) == Some("tool_use"))
+                })
+                .unwrap_or(false)
         })
         .count() as u64
 }
@@ -103,7 +120,10 @@ pub(super) fn check_agent_limits(
 
     for (key, story_id, agent_name, tx, log_session_id) in &running {
         let agent_config = config.agent.iter().find(|a| a.name == *agent_name);
-        let max_turns = agent_config.and_then(|a| a.max_turns);
+        // The watchdog gates on max_tool_turns (counts only tool-using
+        // assistant turns) when set; otherwise falls back to max_turns for
+        // backwards compatibility with configs that haven't migrated yet.
+        let max_turns = agent_config.and_then(|a| a.max_tool_turns.or(a.max_turns));
         let max_budget_usd = agent_config.and_then(|a| a.max_budget_usd);
 
         // Skip agents with no limits configured.
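The fallback in check_agent_limits relies on `Option::or`, which can be shown in isolation. The struct below is a minimal stand-in for AgentConfig: only the fields the diff touches are modelled, and the `Option<u64>` field types and the `effective_turn_limit` name are assumptions.

```rust
/// Minimal stand-in for the AgentConfig fields used by the watchdog.
/// Field types are assumed; the real struct has more fields.
#[derive(Debug, Default, Clone)]
struct AgentConfig {
    name: String,
    max_turns: Option<u64>,
    max_tool_turns: Option<u64>,
}

/// Effective watchdog turn limit: max_tool_turns wins when set,
/// otherwise fall back to max_turns. `None` means no turn limit,
/// either because no config matched or because neither field is set.
fn effective_turn_limit(cfg: Option<&AgentConfig>) -> Option<u64> {
    cfg.and_then(|a| a.max_tool_turns.or(a.max_turns))
}
```

A config with max_turns=200 and max_tool_turns=80 resolves to `Some(80)`, and a legacy config with only max_turns=80 also resolves to `Some(80)`. As far as this diff shows, count_turns_in_log now filters on tool_use in every case, so a legacy max_turns value is compared against the tool-only count as well.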