Story 1113 added `#[derive(RustEmbed)] #[folder = "../frontend/dist"]`
plus a unit test that calls `EmbeddedAssets::iter()`. The macro only
generates `iter()` when the folder exists at compile time, so the Rust
build now has a hard compile-time dependency on `frontend/dist/`.
`script/test` ran `cargo clippy` (line 48) before the frontend build
(line 53+). In a fresh merge worktree with no `frontend/dist/`, clippy
failed immediately on the `iter()` call and the script exited before
`npm run build` ever ran — the gate could never self-heal. Blocked
1116's merge today; would block every future merge.
Move the frontend build above all cargo invocations. Verified by
running script/test in a fresh worktree with `node_modules` and
`frontend/dist` removed: 385/385 frontend tests + cargo tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The merge gate classifier was matching trigger keywords like
`missing_doc_comments` inside passing-test name lines
(e.g. `test agents::gates::tests::classify_lint_from_missing_doc_comments ... ok`),
causing every gate failure to be mis-classified as Lint and bounced
back to a fixup coder. Strip `test … … ok` lines before scanning for
lint triggers. Also removes the temporary diagnostic block in
runner.rs that confirmed the bug.
Applied directly to master because the 1101 feature branch carried
stale work from an earlier incarnation of the story that semantically
conflicted with master's later diagnostic commit (`is_fixup` deleted
on the branch, referenced on master).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the architecture for going from "new project" chat command to
a running, container-isolated, editor-accessible huskies project.
Covers the three personas (chat-only / editor-using / multi-project),
the container template (base + stack overlay + project bind mount),
build sandbox model (host stays clean, all dep-code in container),
editor-agnostic SSH access, git integration, and a 5-phase rollout.
Source for upcoming bootstrap stories.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bug 1102 was created today with origin={kind:user, id:""} because
build_origin silently defaulted id to empty when the caller didn't pass
one — we couldn't tell who filed it. Bug 1088's origin field is useless
as audit if every caller can omit themselves.
Changes:
- build_origin (server/src/http/mcp/story_tools/mod.rs) now returns
Result<String, String> and rejects missing/empty/whitespace-only id
with an instructional error pointing at bug 1102 / story 1104.
- 5 create_* tool handlers (bug, spike, refactor, epic, story) now
resolve origin BEFORE create_*_file so an attribution-less call
leaves no half-state behind.
- 5 tool input schemas now advertise origin as a required object via
a shared origin_schema() helper. The schema description gives every
caller (coder agent, chat bot, user, system) a concrete example so
the LLM populates the field correctly on first sight.
- Test fixtures pass origin = {kind:"test", id:"test-suite"}.
Story 1104 (signed actions) is the longer-term replacement; this is the
quick attribution win agreed for master ahead of that design work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 1101's reframed AC1: when a non-success merge runs, log the typed
GateFailureKind, the matched classifier-trigger substring (if any) and
~90 chars of surrounding context. Fires on every gate failure regardless
of routing, so the next fixup-loop bounce will tell us which substring is
fooling classify() into Fmt|Lint|SourceMapCheck on what's actually a Test
failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test asserted msg_count == 0 on a process-global broadcast channel
(TRANSITION_TX is a single OnceLock<Sender> shared across the test
binary), so any concurrent test calling apply_transition could land
events in our receiver between the drain and the post-reconcile check.
Observed failure: 3 stray transitions from parallel tests.
Drop the strict count check. The real "never floods" invariant is
captured by the Lagged check alone: 1000 seeded items must not overflow
the 256-slot channel, which can only hold if the reconcile path
bypasses the broadcast (AC4). The sibling test
`reconcile_pass_scales_to_1000_items_without_lagged_divergence` already
uses this Lagged-only pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
stop_agent had the same order-of-operations bug fixed in the watchdog:
status flipped to Failed before the claude process was verified gone,
opening the idempotency window that allowed a duplicate spawn to race
in alongside the surviving process.
Now follows the three-step protocol:
1. Read worktree path under a read-only lock (no mutation).
2. SIGKILL the worktree's process tree via process_kill and block
until verified gone — start_agent's Running/Pending whitelist
continues to reject duplicate spawns throughout.
3. Only then mutate the agent record, abort the task handle, and
drop the child_killers entry.
Falls back to the old portable_pty SIGHUP path (with a warning) when
no worktree was recorded, matching the watchdog's behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `crate::process_kill` — reliable SIGKILL-with-verify primitives used
across the server in place of the various ad-hoc kill paths that ignored
their kill-effective return values. The module exposes three pieces:
- `sigkill_pids_and_verify(pids)`: SIGKILL each pid and block (up to 2s)
until every pid is verified gone. Returns survivors if not.
- `pids_matching(pattern)`: pgrep -f wrapper.
- `descendant_pids(root)`: recursive pgrep -P walker for process trees.
Wires the watchdog's limit-termination path through it, and reorders the
protocol to fix the duplicate-coder bug observed on story 1086 (2026-05-15):
Before: check_agent_limits set status=Failed before the kill ran. The
kill itself was `portable_pty::ChildKiller::kill()`, which sends SIGHUP
on Unix — claude-code ignores SIGHUP, so the process kept running while
the agent record was already marked terminated. The idempotency check
in `start_agent` whitelists Running/Pending, so the next auto-assign
pass spawned a fresh agent alongside the still-alive prior one. Two
claude PIDs sharing one session_id, racing on the same worktree.
After: status update is moved OUT of check_agent_limits and into the
caller AFTER the kill is verified. The kill itself is now SIGKILL-the-
process-tree-in-the-worktree, with explicit verification that every pid
is gone. The idempotency window is closed.
The existing watchdog test suite (14 tests) still passes; 7 new tests
cover the process_kill primitives directly.
`agents/pool/process.rs`'s `kill_all_children` and `kill_child_for_key`
still use the old portable_pty SIGHUP path — they have the same bug but
in lower-impact code paths (shutdown, operator stop). They will be
migrated under a separate story to keep this commit focused.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>