Commit Graph

766 Commits

Author SHA1 Message Date
dave 7505f7fdeb huskies: merge 843 2026-04-29 15:54:28 +00:00
dave 7f8467b068 huskies: merge 871 2026-04-29 15:45:54 +00:00
dave 2655288412 huskies: merge 870 2026-04-29 15:26:57 +00:00
dave db65271587 huskies: merge 842 2026-04-29 15:10:11 +00:00
dave f3e4d5d072 huskies: merge 869 2026-04-29 14:58:11 +00:00
dave d1f58094f8 huskies: merge 839 2026-04-29 14:13:34 +00:00
dave 4324fa7511 huskies: merge 838 2026-04-29 13:58:05 +00:00
dave 59b626d3ba huskies: merge 824 2026-04-29 13:42:58 +00:00
dave b4854cf693 huskies: merge 862 2026-04-29 13:28:37 +00:00
dave 69930fb29f huskies: merge 837 2026-04-29 12:06:09 +00:00
dave 186cb38eeb huskies: merge 836 2026-04-29 11:50:04 +00:00
dave edeed3d1b6 huskies: merge 861 2026-04-29 11:12:20 +00:00
dave 19a2ffde96 huskies: merge 860 2026-04-29 10:53:39 +00:00
dave 11d111360d huskies: merge 858 2026-04-29 10:47:18 +00:00
dave be5db846cc huskies: merge 835 2026-04-29 10:41:27 +00:00
dave 1ae8e8ec9d huskies: merge 841 2026-04-29 10:36:03 +00:00
dave d22a591fdc huskies: merge 834 2026-04-29 10:12:53 +00:00
dave 0403dc9871 huskies: merge 833 2026-04-29 09:55:09 +00:00
dave 4ed1fb5110 huskies: merge 854 2026-04-29 09:29:32 +00:00
dave dcd695ad0e huskies: merge 852 2026-04-29 08:55:49 +00:00
dave 549a9defc4 huskies: merge 851 2026-04-29 08:42:28 +00:00
dave 283bbc8658 huskies: merge 832 2026-04-29 00:34:21 +00:00
dave f3bb0a6f4b huskies: merge 828 2026-04-29 00:29:26 +00:00
dave 89bf4ae0cf huskies: merge 831 2026-04-29 00:16:18 +00:00
dave 42b576d285 huskies: merge 817 2026-04-29 00:11:48 +00:00
dave 39bdbbd095 huskies: merge 809 2026-04-29 00:06:42 +00:00
dave a8d45dcdff huskies: merge 800 2026-04-28 23:36:40 +00:00
dave ab01a62bd1 huskies: merge 808 2026-04-28 23:18:57 +00:00
dave 6092f7efbb huskies: merge 822 2026-04-28 23:12:25 +00:00
dave dd35a8a530 huskies: merge 799 2026-04-28 21:01:55 +00:00
dave 97b9eaa39d huskies: merge 807 2026-04-28 20:56:17 +00:00
dave 2a77f73ba4 fix(merge): use server-start-time, not pid, for stale-merge detection
The merge_jobs cleanup encoded the server's pid in the CRDT and checked
`kill(pid, 0)` to decide whether a "running" entry was stale. Two problems:

  1. The cleanup runs *inside* the server, so checking whether the
     server's own pid is alive is tautological — kill(self_pid, 0)
     always succeeds.
  2. `rebuild_and_restart` does an `execve()` re-exec, which keeps the
     same pid. After re-exec, merge_jobs from the previous server
     instance still encode "the current pid" — so the cleanup never
     fires, and stories like 799/800 sit forever with status="running"
     while no actual merge runs.

Switch to a per-process server-start-time captured lazily in a
`OnceLock<f64>` (reset by execve, so the new instance sees a fresh
boot-time). A merge_job's recorded start-time < current boot-time means
it came from a previous instance: stale, delete it.

Legacy pid-encoded entries decode to None and are also treated as stale.

MergeJob.pid → MergeJob.server_start_time. Tests updated.
2026-04-28 20:41:32 +00:00
dave 8f392f4fc7 huskies: merge 789 2026-04-28 20:38:12 +00:00
dave f5ab75ecaa huskies: merge 819 2026-04-28 20:28:35 +00:00
dave b060d8fc88 fix(pty): always pass -p on resume so --include-partial-messages works
claude CLI 2.1.97 strictly enforces that --include-partial-messages
requires --print/-p to be set. The resume path skipped -p when the
prompt was empty (which is the common case on respawns when there's
no fresh failure context to inject), so the spawned claude process
saw `--resume <sid> ... --include-partial-messages` without -p and
exited with code 1: "include-partial-messages requires --print and
--output-format=stream-json".

Net effect: every coder respawn with prior_sessions > 0 and empty
prompt was failing immediately, looking exactly like a rate-limit
(empty agent log, zero tool calls). 819 hit retry-limit (4/3) and
got marked blocked because of this — not because of any actual code
or rate-limit issue.

Fix: always pass `-p <prompt>` on resume, even with empty prompt.
2026-04-28 20:14:32 +00:00
dave 20e1210818 huskies: merge 830 2026-04-28 19:30:08 +00:00
dave 46b1e84629 huskies: merge 791 2026-04-28 19:18:12 +00:00
dave e4af2d5c08 huskies: merge 803 2026-04-28 19:10:41 +00:00
dave fa54451ba6 huskies: merge 802 2026-04-28 19:05:06 +00:00
dave 619bdd9c82 huskies: merge 801 2026-04-28 16:43:04 +00:00
dave 30dd4b3a0a huskies: merge 796 2026-04-28 16:34:10 +00:00
dave a65cd86c8f huskies: merge 798 2026-04-28 16:25:33 +00:00
dave 1e40215c3e huskies: merge 797 2026-04-28 16:06:50 +00:00
dave d0b7db6765 huskies: merge 785 2026-04-28 15:59:50 +00:00
dave 32a3465fc4 fix: tell the truth about run_tests being blocking
`tool_run_tests` in `server/src/http/mcp/shell_tools/script.rs` is fully
blocking server-side: it spawns the test child, polls every 1s server-side
until exit (or `TEST_TIMEOUT_SECS = 1200s`), and returns the full
{passed, exit_code, output} directly. There is NO async/started-status
return path.

But two places told agents the wrong story:
1. `tools_list/system_tools.rs` description claimed "Returns immediately
   with status: started. Poll get_test_result..." — agents read tool
   descriptions for protocol semantics, so they followed this and burned
   turns polling get_test_result.
2. `agents.toml` had been correctly saying it blocks, but my last commit
   (776aad38) "fixed" it the wrong way based on a misread of the code.

Now both say: run_tests blocks server-side, returns the full result, do
not poll get_test_result. get_test_result remains for external observers
(UI checking on a job another caller started).

Reverts the prompt change in 776aad38 with the correct text.
2026-04-28 15:59:06 +00:00
dave bf17fc14af huskies: merge 795 2026-04-28 15:45:10 +00:00
dave f63464852b huskies: merge 770 2026-04-28 15:38:34 +00:00
dave 1946709681 huskies: merge 788 2026-04-28 15:28:31 +00:00
dave f62012ee9c huskies: merge 793 2026-04-28 15:21:51 +00:00
dave c1a50eab8e huskies: merge 790 2026-04-28 15:16:01 +00:00