Commit Graph

134 Commits

Author SHA1 Message Date
dave 0403dc9871 huskies: merge 833 2026-04-29 09:55:09 +00:00
dave 4ed1fb5110 huskies: merge 854 2026-04-29 09:29:32 +00:00
dave dcd695ad0e huskies: merge 852 2026-04-29 08:55:49 +00:00
dave 549a9defc4 huskies: merge 851 2026-04-29 08:42:28 +00:00
dave 89bf4ae0cf huskies: merge 831 2026-04-29 00:16:18 +00:00
dave 6092f7efbb huskies: merge 822 2026-04-28 23:12:25 +00:00
dave 2a77f73ba4 fix(merge): use server-start-time, not pid, for stale-merge detection
The merge_jobs cleanup encoded the server's pid in the CRDT and checked
`kill(pid, 0)` to decide whether a "running" entry was stale. Two problems:

  1. The cleanup runs *inside* the server, so checking whether the
     server's own pid is alive is tautological — kill(self_pid, 0)
     always succeeds.
  2. `rebuild_and_restart` does an `execve()` re-exec, which keeps the
     same pid. After re-exec, merge_jobs from the previous server
     instance still encode "the current pid" — so the cleanup never
     fires, and stories like 799/800 sit forever with status="running"
     while no actual merge runs.

Switch to a per-process server-start-time captured lazily in a
`OnceLock<f64>` (reset by execve, so the new instance sees a fresh
boot-time). A merge_job's recorded start-time < current boot-time means
it came from a previous instance: stale, delete it.

Legacy pid-encoded entries decode to None and are also treated as stale.

MergeJob.pid → MergeJob.server_start_time. Tests updated.
2026-04-28 20:41:32 +00:00
dave f5ab75ecaa huskies: merge 819 2026-04-28 20:28:35 +00:00
dave b060d8fc88 fix(pty): always pass -p on resume so --include-partial-messages works
claude CLI 2.1.97 strictly enforces that --include-partial-messages
requires --print/-p to be set. The resume path skipped -p when the
prompt was empty (which is the common case on respawns when there's
no fresh failure context to inject), so the spawned claude process
saw `--resume <sid> ... --include-partial-messages` without -p and
exited with code 1: "include-partial-messages requires --print and
--output-format=stream-json".

Net effect: every coder respawn with prior_sessions > 0 and empty
prompt was failing immediately, looking exactly like a rate-limit
(empty agent log, zero tool calls). 819 hit retry-limit (4/3) and
got marked blocked because of this — not because of any actual code
or rate-limit issue.

Fix: always pass `-p <prompt>` on resume, even with empty prompt.
2026-04-28 20:14:32 +00:00
dave e4af2d5c08 huskies: merge 803 2026-04-28 19:10:41 +00:00
dave 619bdd9c82 huskies: merge 801 2026-04-28 16:43:04 +00:00
dave f62012ee9c huskies: merge 793 2026-04-28 15:21:51 +00:00
dave 7cd9706c0f huskies: merge 813 2026-04-28 14:22:19 +00:00
dave 8f23d13ac8 huskies: merge 779 2026-04-28 13:48:40 +00:00
dave 36ca8d5e3b huskies: merge 827 2026-04-28 13:01:48 +00:00
dave 6c2bdde695 huskies: merge 783 2026-04-28 11:17:40 +00:00
dave 7faacb6664 huskies: merge 773 2026-04-28 10:24:04 +00:00
dave 70aaffc2ab huskies: merge 777 2026-04-28 00:33:14 +00:00
dave 63ce7b9ec3 huskies: merge 759 2026-04-28 00:07:04 +00:00
dave 7ee542dd1e huskies: merge 757 2026-04-27 23:36:56 +00:00
dave 1388658ae8 huskies: merge 730_story_use_numeric_only_story_ids_across_mcp_worktrees_git_branches_and_log_paths 2026-04-27 20:22:47 +00:00
dave 615e1c7f73 huskies: merge 738_refactor_delete_fs_shadow_code_from_lifecycle_rs_and_the_work_directory_watcher 2026-04-27 19:56:53 +00:00
dave 63a30a9319 huskies: merge 736_story_drain_and_prepend_buffered_status_events_on_the_user_s_next_agent_message 2026-04-27 19:37:39 +00:00
dave b008235d0d huskies: merge 683_refactor_decompose_server_src_agents_pool_start_mod_rs_1329_lines 2026-04-27 18:26:31 +00:00
dave 272a592a4d huskies: merge 735_story_attach_statuseventbuffer_to_each_agent_session_scoped_per_project_reset_on_restart 2026-04-27 18:06:11 +00:00
dave d654f55981 huskies: merge 682_refactor_decompose_server_src_agents_merge_squash_rs_1346_lines 2026-04-27 17:58:04 +00:00
dave 101f616346 huskies: merge 719_refactor_stale_merge_job_lock_recovery_on_new_merge_attempts 2026-04-27 17:46:49 +00:00
dave ed8646f0d9 huskies: merge 681_refactor_decompose_server_src_agents_pool_pipeline_advance_mod_rs_1509_lines 2026-04-27 17:35:17 +00:00
dave 4a0f57478c huskies: merge 671_refactor_migrate_pipeline_state_consumers_from_string_comparisons_to_typed_pipelinestage_enum 2026-04-27 16:39:39 +00:00
dave 6a582d73b6 huskies: merge 675_bug_mergemaster_silently_exits_when_feature_branch_has_zero_commits_ahead_of_master 2026-04-27 14:43:54 +00:00
dave 5da29c3d91 huskies: merge 668_bug_pipeline_advances_coder_work_to_merge_when_gates_passed_false 2026-04-27 11:39:11 +00:00
dave ac85cfce5d huskies: merge 652_story_pass_resume_session_id_on_agent_respawn_so_new_sessions_inherit_prior_reasoning 2026-04-27 11:27:50 +00:00
dave c600b94f4e chore: remove dangling orphan files accidentally added in b340aa97
server/src/agents/pool/lifecycle.rs and server/src/chat/transport/matrix/notifications.rs were untracked leftovers from an abandoned WIP stash that 'git add -A' picked up. Neither is declared as a mod anywhere — they're dangling code that doesn't get compiled but pollutes the tree.
2026-04-27 01:32:38 +00:00
dave b340aa97b0 fix: clean up clippy warnings + cargo fmt across post-refactor surface
The 13-file refactor pass (commits db00a5d4 through eca15b4e) introduced
~89 clippy errors and 38 cargo fmt issues — every agent in every worktree
hit them on script/test, burning their turn budget on cleanup before doing
real story work. This is the silent kill behind 644, 652, 655, 664, 667
all hitting watchdog limits this round.

Changes:
- cargo fmt --all across 37 files (formatting normalisation only)
- #![allow(unused_imports, dead_code)] on 24 split modules where the
  python-script splitter imported liberally to be safe; tighter cleanup
  per-import will happen as agents touch each module
- Removed truly-dead re-exports (cleanup_merge_workspace, slog_warn from
  http/mcp/mod.rs, CliArgs/print_help from main.rs)
- Prefixed _auth_msg in crdt_sync/server.rs (handshake helper return is
  bound but not consumed)
- Converted dangling /// doc block in crdt_sync/mod.rs to //! so it
  attaches to the module
- Removed empty lines after doc comments in 4 spots (clippy lint)

All 2636 tests pass; clippy --all-targets -- -D warnings clean.
2026-04-27 01:32:08 +00:00
dave 06035f20ad fix: restore #[tokio::main] on main(), #[cfg(unix)] on platform tests, #[allow] on run_pty_session/AuthListenerResult
The biggest miss is #[tokio::main] — without it, async fn main() doesn't compile,
and the binary in every worktree fails 'cargo check'. Agents in those worktrees
burn their turn budgets trying to fix the build before they can do real work, then
get killed by the watchdog. That's why all three in-flight stories failed.

Other restored attributes:
- #[cfg(unix)] on 4 tests in merge/squash and scaffold (skip on non-Unix)
- #[allow(dead_code)] on AuthListenerResult test enum
- #[allow(clippy::too_many_arguments)] on run_pty_session

Same root cause as the earlier #[test] attribute losses: my line ranges started
at the fn line, missing the leading attribute on the previous line.
2026-04-26 23:38:17 +00:00
dave eca15b4ee7 refactor: split agents/pool/start.rs into mod.rs + validation.rs + spawn.rs
The 1630-line start.rs is split into a sub-module directory:

- validation.rs: validate_agent_stage + read_front_matter_agent helpers (69 lines)
- spawn.rs: run_agent_spawn — the background async work that was inlined as
  a tokio::spawn closure body inside start_agent (359 lines)
- mod.rs: AgentPool::start_agent orchestrator + tests (1062 lines)

Stage validation and front-matter agent reading are pre-lock pure helpers that
naturally extract.  The spawn closure body becomes a free async fn that takes
the previously-cloned values as parameters; rebound to the original _clone /
_owned names at the top of the body so the actual work code is a verbatim copy.

No behaviour change. All 23 start tests pass; full suite green.
2026-04-26 22:12:04 +00:00
dave ca72f36c78 refactor: split agents/pool/pipeline/advance.rs into mod.rs + helpers.rs
The 1353-line advance.rs is split into:

- mod.rs: impl AgentPool with run_pipeline_advance + start_mergemaster_or_block + tests (1244 lines)
- helpers.rs: spawn_pipeline_advance, resolve_qa_mode_from_store, write_review_hold_to_store, should_block_story (128 lines)

Tests stay co-located with run_pipeline_advance which they exercise.

No behaviour change. All 10 advance tests pass; full suite green.
2026-04-26 21:35:04 +00:00
dave ce94dd0af4 refactor: split agents/merge.rs into mod.rs + squash.rs + conflicts.rs
The 1772-line merge.rs is split into:

- conflicts.rs: try_resolve_conflicts + resolve_simple_conflicts + tests (351 lines)
- squash.rs: run_squash_merge orchestrator + cleanup + run_merge_quality_gates + tests (1306 lines)
- mod.rs: doc, types (MergeJobStatus, MergeJob, MergeReport, SquashMergeResult), re-exports (52 lines)

Tests stay co-located.

No behaviour change. All 20 merge tests pass; full suite green
(2635 tests with --test-threads=1).
2026-04-26 21:15:06 +00:00
dave ff51a1a465 huskies: merge 651_bug_remove_git_reset_clean_behaviour_from_bug_645_s_recovery_path_uncommitted_work_in_worktrees_is_never_junk 2026-04-26 16:46:25 +00:00
dave 365b907ba4 huskies: merge 650_bug_watchdog_turns_used_and_budget_used_usd_accumulate_across_all_sessions_restart_counts_against_limits_from_prior_runs 2026-04-26 16:24:10 +00:00
dave 148c88bd40 huskies: merge 646_bug_watchdog_from_bug_624_is_not_actually_enforcing_max_turns_max_budget_usd_in_production 2026-04-26 13:11:48 +00:00
dave f88bb5f486 huskies: merge 645_bug_agent_runtime_panics_with_output_write_bytes_is_ok_assertion_marking_stories_falsely_blocked 2026-04-26 10:54:58 +00:00
dave e20083a283 huskies: merge 624_bug_agent_turn_and_budget_limits_not_enforced_coder_1_ran_5_6x_over_max_turns 2026-04-25 13:11:30 +00:00
dave 4b765bbc39 huskies: merge 601_story_project_local_agent_prompt_layer_for_huskies 2026-04-23 11:56:19 +00:00
dave d235fd41ac huskies: merge 581_story_freeze_command_to_hold_a_story_at_its_current_stage_without_advancing 2026-04-15 18:02:14 +00:00
dave df5ba8ebab huskies: merge 560_story_make_merge_agent_work_return_results_like_run_tests_instead_of_polling 2026-04-14 10:26:44 +00:00
dave 979cf39228 huskies: merge 557_refactor_remove_all_filesystem_fallback_paths_crdt_is_the_only_source_of_truth 2026-04-14 09:14:07 +00:00
dave 10d3517648 fix: remove filesystem fallback from scan_stage_items to unblock 557 merge
The auto-resolver kept both sides of the conflict — feature's
_project_root signature with master's filesystem code referencing
project_root — producing a compile error. Remove the filesystem
fallback on master so there's no conflict. CRDT is the only source
of truth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:14:58 +00:00
dave d618bc3b32 huskies: merge 556_bug_stale_filesystem_shadows_in_1_backlog_cause_auto_assign_to_promote_archived_stories 2026-04-13 14:48:44 +00:00
dave 845b85e7a7 fix: add --all to cargo fmt in script/test and autoformat codebase
cargo fmt without --all fails with "Failed to find targets" in
workspace repos. This was blocking every story's gates. Also ran
cargo fmt --all to fix all existing formatting issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:07:08 +00:00