huskies

Author	SHA1	Message	Date
Timmy	265e6f9a15	fix(1101): strip passing-test lines before classify() lint check; remove diagnostic The merge gate classifier was matching trigger keywords like `missing_doc_comments` inside passing-test name lines (e.g. `test agents::gates::tests::classify_lint_from_missing_doc_comments ... ok`), causing every gate failure to be mis-classified as Lint and bounced back to a fixup coder. Strip `test … … ok` lines before scanning for lint triggers. Also removes the temporary diagnostic block in runner.rs that confirmed the bug. Applied directly to master because the 1101 feature branch carried stale work from an earlier incarnation of the story that semantically conflicted with master's later diagnostic commit (`is_fixup` deleted on the branch, referenced on master). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 16:52:26 +01:00
dave	0695ad7ae6	huskies: merge 1115 story new project: --adopt flow to wrap a container around an existing checkout	2026-05-17 15:17:12 +00:00
dave	eb6b07531a	huskies: merge 1114 story new project: --path flag to override default host directory	2026-05-17 14:48:49 +00:00
Timmy	a5bfd40233	Bump version to 0.12.0	2026-05-17 02:10:31 +01:00
dave	a40500eea9	huskies: merge 1111 bug Test isolation: `init_for_test()` and `ensure_content_store()` are once-per-thread, not once-per-test, polluting CRDT state across tests	2026-05-17 00:33:45 +00:00
dave	f8212f102f	huskies: merge 1109 story Chat bootstrap Phase 4: `--git` clones an existing repo and configures push credentials	2026-05-17 00:18:25 +00:00
dave	59302b465d	huskies: merge 1108 story Chat bootstrap Phase 3: SSH-remote editor access into the project container (any editor)	2026-05-16 23:37:59 +00:00
dave	efafe44db1	huskies: merge 1110 story Chat bootstrap Phase 2b: additional stack overlays (Go, Python, Ruby, JVM)	2026-05-16 23:20:31 +00:00
dave	3a43337735	huskies: merge 1107 story Chat bootstrap Phase 2a: stack-overlay framework + Rust and Node stack overlays	2026-05-16 23:01:49 +00:00
dave	10d992a7e4	huskies: merge 1106 story Chat bootstrap Phase 1: `new project` chat command spawns a bare project container and registers it with the gateway	2026-05-16 22:39:20 +00:00
Timmy	7db0b78e88	Bump version to 0.11.1	2026-05-15 23:38:09 +01:00
dave	979492449e	huskies: merge 1105 bug Freeze from Backlog stores wrong resume_to — Unfreeze restores to Coding instead of Backlog	2026-05-15 22:33:54 +00:00
Timmy	6fbe239313	fix(1102): require non-empty origin.id on create_* MCP tools bug 1102 was created today with origin={kind:user, id:""} because build_origin silently defaulted id to empty when the caller didn't pass one — we couldn't tell who filed it. Bug 1088's origin field is useless as an audit trail if every caller can omit themselves. Changes: - build_origin (server/src/http/mcp/story_tools/mod.rs) now returns Result<String, String> and rejects missing/empty/whitespace-only id with an instructional error pointing at bug 1102 / story 1104. - 5 create_* tool handlers (bug, spike, refactor, epic, story) now resolve origin BEFORE create_*_file so an attribution-less call leaves no half-state behind. - 5 tool input schemas now advertise origin as a required object via a shared origin_schema() helper. The schema description gives every caller (coder agent, chat bot, user, system) a concrete example so the LLM populates the field correctly on first sight. - Test fixtures pass origin = {kind:"test", id:"test-suite"}. Story 1104 (signed actions) is the longer-term replacement; this is the quick attribution win agreed for master ahead of that design work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 23:13:54 +01:00
Timmy	26527e7dae	diag(1101): log classify verdict + matched trigger on merge gate failures Bug 1101's reframed AC1: when a non-success merge runs, log the typed GateFailureKind, the matched classifier-trigger substring (if any) and ~90 chars of surrounding context. Fires on every gate failure regardless of routing, so the next fixup-loop bounce will tell us which substring is fooling classify() into Fmt|Lint|SourceMapCheck on what's actually a Test failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 23:13:38 +01:00
dave	04a57e92c2	huskies: merge 1103 bug Rate-limit warning at session start sticks the `rate_limit_exit` flag, causing 1053's fast-path bypass to skip completion on clean session exits	2026-05-15 21:02:37 +00:00
dave	4216ced493	huskies: merge 1100 bug Multiple LLM agents can run concurrently on the same story (coder + mergemaster + others) — enforce one-agent-per-story invariant	2026-05-15 20:24:31 +00:00
dave	63d86f1263	huskies: merge 1096 bug Shadow drift: set_agent writes CRDT agent register without updating pipeline_items.agent	2026-05-15 19:05:56 +00:00
dave	1adc734801	huskies: merge 1098 bug Shadow drift: set_retry_count / bump_retry_count write CRDT register without updating pipeline_items.retry_count	2026-05-15 18:25:25 +00:00
dave	8531bac6cd	huskies: merge 1097 bug Shadow drift: set_depends_on writes CRDT depends_on register without updating pipeline_items.depends_on	2026-05-15 12:40:17 +00:00
dave	2857c3b46b	huskies: merge 1094 bug delete_story leaks zombie rows in pipeline_items shadow table — 176 tombstoned items still report non-terminal stages	2026-05-15 12:27:48 +00:00
dave	62d1535e76	huskies: merge 1095 bug Shadow drift: set_name writes CRDT name register without updating pipeline_items.name	2026-05-15 12:10:11 +00:00
dave	fc5481dbe4	huskies: merge 1093 bug Chat dispatcher spawns one Timmy per inbound message — needs coalesce window + per-session serial lock	2026-05-15 12:03:09 +00:00
dave	01e60a670c	huskies: merge 1091 refactor Migrate the merge-gate's stale-cargo kill path to `process_kill`	2026-05-15 11:50:03 +00:00
dave	c4010854a5	huskies: merge 1089 bug Stuck-agent detector blocks stories on legitimate exploration / debugging — uses too narrow a "progress" signal	2026-05-15 11:40:44 +00:00
dave	4aa76ce673	huskies: merge 1090 refactor Migrate `AgentPool::kill_all_children` and `kill_child_for_key` to `process_kill` so server shutdown and `stop_agent` actually kill claude	2026-05-15 11:16:16 +00:00
Timmy	fb82bd7bca	test(tick_loop): de-flake reconcile_never_floods_broadcast_channel The test asserted msg_count == 0 on a process-global broadcast channel (TRANSITION_TX is a single OnceLock<Sender> shared across the test binary), so any concurrent test calling apply_transition could land events in our receiver between the drain and the post-reconcile check. Observed failure: 3 stray transitions from parallel tests. Drop the strict count check. The real "never floods" invariant is captured by the Lagged check alone: 1000 seeded items must not overflow the 256-slot channel, which can only hold if the reconcile path bypasses the broadcast (AC4). The sibling test `reconcile_pass_scales_to_1000_items_without_lagged_divergence` already uses this Lagged-only pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 11:13:31 +01:00
Timmy	b7df5cbe4e	fix(agents): kill-then-status reorder in stop_agent stop_agent had the same order-of-operations bug fixed in the watchdog: status flipped to Failed before the claude process was verified gone, opening the idempotency window that allowed a duplicate spawn to race in alongside the surviving process. Now follows the three-step protocol: 1. Read worktree path under a read-only lock (no mutation). 2. SIGKILL the worktree's process tree via process_kill and block until verified gone — start_agent's Running/Pending whitelist continues to reject duplicate spawns throughout. 3. Only then mutate the agent record, abort the task handle, and drop the child_killers entry. Falls back to the old portable_pty SIGHUP path (with a warning) when no worktree was recorded, matching the watchdog's behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 10:46:02 +01:00
Timmy	fe9804b32c	feat: add process_kill module + use it to fix watchdog double-spawn Adds `crate::process_kill` — reliable SIGKILL-with-verify primitives used across the server in place of the various ad-hoc kill paths that ignored their kill-effective return values. The module exposes three pieces: - `sigkill_pids_and_verify(pids)`: SIGKILL each pid and block (up to 2s) until every pid is verified gone. Returns survivors if not. - `pids_matching(pattern)`: pgrep -f wrapper. - `descendant_pids(root)`: recursive pgrep -P walker for process trees. Wires the watchdog's limit-termination path through it, and reorders the protocol to fix the duplicate-coder bug observed on story 1086 (2026-05-15): Before: check_agent_limits set status=Failed before the kill ran. The kill itself was `portable_pty::ChildKiller::kill()`, which sends SIGHUP on Unix — claude-code ignores SIGHUP, so the process kept running while the agent record was already marked terminated. The idempotency check in `start_agent` whitelists Running/Pending, so the next auto-assign pass spawned a fresh agent alongside the still-alive prior one. Two claude PIDs sharing one session_id, racing on the same worktree. After: status update is moved OUT of check_agent_limits and into the caller AFTER the kill is verified. The kill itself is now SIGKILL-the-process-tree-in-the-worktree, with explicit verification that every pid is gone. The idempotency window is closed. The existing watchdog test suite (14 tests) still passes; 7 new tests cover the process_kill primitives directly. `agents/pool/process.rs`'s `kill_all_children` and `kill_child_for_key` still use the old portable_pty SIGHUP path — they have the same bug but in lower-impact code paths (shutdown, operator stop). They will be migrated under a separate story to keep this commit focused. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 10:36:33 +01:00
dave	df32a1542b	huskies: merge 1087 story Pipeline+Status split — Step D: migrate CRDT storage to (Pipeline, Status) and remove the Stage enum	2026-05-15 08:47:38 +00:00
dave	e82602db77	huskies: merge 1086 story Pipeline+Status split — Step C: migrate auto-assign, subscribers, and lifecycle transitions to read Pipeline + Status	2026-05-15 08:26:39 +00:00
Timmy	2d6105c778	fix: skip setup commands on worktree reuse so reconciler doesn't fire npm ci every 30s Story 1066 (merged 2026-05-14 23:39) introduced a periodic reconciler that calls `reconcile_worktree_create` every 30 seconds (default `reconcile_interval_secs`). The reconciler's docstring promises it is a no-op for stories whose worktree already exists — but the implementation calls `create_worktree`, whose reuse path was running `run_setup_commands` unconditionally. Setup includes destructive `npm ci` (rm -rf node_modules then reinstall), so every Coding story got `npm ci` fired every 30 seconds. When story 1086 hit a gate-failure retry loop on 2026-05-15, the merge gate's own `npm install`/`npm run build` raced one of these reconciler-driven `npm ci` runs that was wiping node_modules — leaving `.bin/tsc` as a broken symlink pointing into a half-populated `typescript/` package and producing `sh: 1: tsc: not found`. 37 npm ci fires for 1086 in 5 hours against only 3 real Coding transitions, a 12x amplification driven entirely by the 30-second reconcile cadence. Fix: align `create_worktree`'s behaviour with the contract `reconcile_worktree_create` already documents — reuse is a no-op for setup commands. Sparse checkout and `.mcp.json` rewrite still run (both cheap and idempotent). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 08:57:38 +01:00
Timmy	d89940e85b	fix: drop source-map.json from agent orientation bundle The orientation bundle was 96 KB per coder spawn with 85 KB of that being source-map.json — a static symbol listing that drowned out the workflow rules in AGENT.md and likely explains why PLAN.md ceremony is being skipped (the instruction is ~5% of the bundle, buried under a wall of symbols). Agents are excellent at grep on demand, so the source map adds little value as a preloaded cheat sheet. File stays on disk for the merge-time source-map-check doc-coverage gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 07:48:18 +01:00
dave	13f7dab5f0	huskies: merge 1088	2026-05-15 02:03:30 +00:00
dave	b053f14d58	huskies: merge 1085	2026-05-15 01:38:05 +00:00
dave	56179d712e	huskies: merge 1078	2026-05-15 01:32:29 +00:00
dave	1506141155	huskies: merge 1072	2026-05-15 01:27:25 +00:00
dave	0c23d209a0	huskies: merge 1077	2026-05-15 00:58:57 +00:00
dave	eac5763e03	huskies: merge 1075	2026-05-15 00:48:06 +00:00
dave	f9b140add9	huskies: merge 1073	2026-05-15 00:37:01 +00:00
dave	d4db96f709	huskies: merge 1070	2026-05-15 00:20:29 +00:00
dave	5f08573db8	huskies: merge 1076	2026-05-15 00:10:15 +00:00
dave	da83fcb78d	huskies: merge 1074	2026-05-15 00:01:58 +00:00
dave	bb6a6063e8	huskies: merge 1066	2026-05-14 23:45:53 +00:00
dave	374aa77f27	huskies: merge 1069	2026-05-14 23:29:32 +00:00
Timmy	bbc4c9aa45	Bump version to 0.11.0	2026-05-14 23:31:15 +01:00
dave	c66016394b	huskies: merge 1063	2026-05-14 21:53:56 +00:00
dave	23c3301903	huskies: merge 1065	2026-05-14 21:48:09 +00:00
Timmy	e6865a1bc6	fix: stop event-triggers Lagged handler from re-emitting via the same channel Merge 1061 added a replay_current_pipeline_state() call to the broadcast::Lagged branch, but replay broadcasts one event per CRDT item (~997) into a 256-slot channel, deterministically re-overflowing it and triggering another Lagged. The loop pinned CPU and likely caused today's machine crash. Revert to the pre-1061 behaviour of logging and continuing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 22:33:14 +01:00
dave	8f666bd6b3	huskies: merge 1062	2026-05-14 20:36:51 +00:00
dave	5678f2a556	huskies: merge 1061	2026-05-14 20:12:51 +00:00
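
Commit 265e6f9a15 strips passing-test lines before the merge-gate classifier scans the output for lint triggers. Below is a minimal std-only Rust sketch of that filtering step. It assumes libtest's `test <name> ... ok` result-line format. `LINT_TRIGGERS`, `is_passing_test_line` and `find_lint_trigger` are hypothetical names, not the huskies code, and only `missing_doc_comments` is a trigger actually named in the log:

```rust
/// Hypothetical trigger list; only `missing_doc_comments` appears in the log.
const LINT_TRIGGERS: &[&str] = &["missing_doc_comments"];

/// True for libtest result lines such as `test a::b::c ... ok`.
fn is_passing_test_line(line: &str) -> bool {
    let line = line.trim();
    line.starts_with("test ") && line.ends_with(" ... ok")
}

/// Returns the first lint trigger found outside passing-test lines, if any.
fn find_lint_trigger(output: &str) -> Option<&'static str> {
    output
        .lines()
        // Drop `test ... ok` lines so a passing test's *name* cannot match.
        .filter(|line| !is_passing_test_line(line))
        .find_map(|line| LINT_TRIGGERS.iter().copied().find(|t| line.contains(*t)))
}
```

The sketch removes only `... ok` lines. A failing test whose name contains a trigger, such as `test x::missing_doc_comments_case ... FAILED`, would still match.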

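Commit 6fbe239313 changes `build_origin` to return `Result<String, String>` and reject a missing, empty or whitespace-only `id`. It also requires the create handlers to resolve origin before writing anything. Below is a std-only Rust sketch of that validate-first ordering. The `(kind, id)` parameters, the `kind:id` encoding, the `create_bug` helper and its `Vec<String>` store are all assumptions made for illustration; the real tool handlers take MCP JSON arguments:

```rust
/// Hypothetical sketch: turns an origin `{kind, id}` into an attribution string,
/// rejecting a missing, empty or whitespace-only `id`.
fn build_origin(kind: &str, id: Option<&str>) -> Result<String, String> {
    let id = id.map(str::trim).unwrap_or("");
    if id.is_empty() {
        return Err(format!(
            "origin.id is required (kind={kind:?}); pass e.g. {{\"kind\": \"user\", \"id\": \"<who you are>\"}}"
        ));
    }
    // The `kind:id` encoding is an assumption; the real return format is not shown.
    Ok(format!("{kind}:{id}"))
}

/// Hypothetical create handler: validates origin before writing anything,
/// so a rejected call leaves no half-created item behind.
fn create_bug(
    name: &str,
    kind: &str,
    id: Option<&str>,
    store: &mut Vec<String>,
) -> Result<(), String> {
    let origin = build_origin(kind, id)?; // fail fast: nothing written yet
    store.push(format!("{name} [origin={origin}]")); // stand-in for create_bug_file
    Ok(())
}
```

Because the `?` returns before `store.push`, an attribution-less call leaves the store exactly as it was.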

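Commits fe9804b32c and b7df5cbe4e describe a kill-then-status ordering. The agent record is left alone until the process tree is verified gone, so `start_agent`'s Running/Pending whitelist keeps rejecting duplicate spawns throughout the kill. Below is a std-only Rust sketch of that ordering. `AgentStatus`, `AgentRecord`, `rejects_duplicate_spawn` and the `kill_and_verify` closure are hypothetical stand-ins for the huskies types and the `process_kill` primitives, and no real process is signalled:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentStatus {
    Pending,
    Running,
    Failed,
}

/// Hypothetical, pared-down agent record.
struct AgentRecord {
    status: AgentStatus,
    worktree: Option<String>,
}

/// The idempotency whitelist described in the log: while an agent is
/// Running or Pending, a duplicate spawn is rejected.
fn rejects_duplicate_spawn(record: &AgentRecord) -> bool {
    matches!(record.status, AgentStatus::Running | AgentStatus::Pending)
}

/// Kill-then-status ordering. `kill_and_verify` stands in for the
/// process_kill primitives and returns any pids that survived.
fn stop_agent<F>(record: &mut AgentRecord, kill_and_verify: F) -> Result<(), Vec<u32>>
where
    F: FnOnce(&str) -> Vec<u32>,
{
    // 1. Read the worktree path without mutating the record.
    if let Some(worktree) = record.worktree.clone() {
        // 2. Kill and verify. The status is still Running/Pending here, so
        //    duplicate spawns keep being rejected.
        let survivors = kill_and_verify(&worktree);
        if !survivors.is_empty() {
            return Err(survivors); // leave the record untouched
        }
    } else {
        // The log describes a legacy SIGHUP fallback here; it is not modelled.
        eprintln!("warning: no worktree recorded; falling back to legacy kill");
    }
    // 3. Only now mutate the record.
    record.status = AgentStatus::Failed;
    Ok(())
}
```

If the kill leaves survivors, the record stays Running and the whitelist still blocks a second spawn. That closes the race in which two claude processes shared one worktree.
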
1048 Commits