Two issues that surfaced when story 1 ran in the adopted huskies-server
sled:
1. Dockerfile.base: the base image had no nodejs / claude CLI, so every
coder agent spawn in an adopted project sled failed with
`Unable to spawn claude: No viable candidates found in PATH`. Install
nodejs + @anthropic-ai/claude-code in the base image so every sled
built from it can spawn agents out of the box.
2. worktree/create.rs::install_pre_commit_hook: `git config --worktree`
requires `extensions.worktreeConfig = true` to be set on the repo
config; without it, every worktree creation logged a noisy
`Pre-commit hook install failed` warning. Enable the extension
idempotently before the per-worktree hooks-path set so the hook
install succeeds cleanly.
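The ordering in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code in worktree/create.rs: the function names and the `hooks_dir` parameter are hypothetical, and it assumes the "hooks-path set" refers to git's `core.hooksPath` key.

```rust
use std::path::Path;
use std::process::Command;

/// Build the two git invocations in the order the fix describes.
/// Returned as argument vectors so the order can be checked without running git.
/// (Hypothetical helper; `hooks_dir` is an illustrative parameter.)
fn hook_config_commands(hooks_dir: &str) -> Vec<Vec<String>> {
    vec![
        // Step 1: enable per-worktree config on the repo.
        // Re-setting the same value is harmless, so this is safe to repeat.
        vec!["config", "extensions.worktreeConfig", "true"],
        // Step 2: only now is `--worktree` accepted for the hooks path.
        vec!["config", "--worktree", "core.hooksPath", hooks_dir],
    ]
    .into_iter()
    .map(|args| args.into_iter().map(String::from).collect())
    .collect()
}

/// Run the commands inside `worktree`, stopping at the first failure.
fn install_hooks_path(worktree: &Path, hooks_dir: &str) -> Result<(), String> {
    for args in hook_config_commands(hooks_dir) {
        let status = Command::new("git")
            .arg("-C")
            .arg(worktree)
            .args(&args)
            .status()
            .map_err(|e| format!("failed to run git: {}", e))?;
        if !status.success() {
            return Err(format!("git {} exited with {}", args.join(" "), status));
        }
    }
    Ok(())
}
```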
After this, rebuild huskies-project-base and recreate any adopted
project containers to pick up the CLI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Story 1130 added HUSKIES_HOST=0.0.0.0 so the server INSIDE a project
container binds to all interfaces, but the host-side `docker -p`
mapping was still `127.0.0.1:{port}:3001` and `127.0.0.1:{ssh_port}:22`
— reachable from the docker host only, blocking remote MCP clients
and out-of-host SSH onto the project container.
Switch host-side mapping to 0.0.0.0 for both the MCP and SSH ports so
project containers spawned via `new project` are reachable from
anywhere that can route to the docker host. Existing containers
created before this commit retain their localhost-only mapping and
need to be recreated to pick up the change.
Add a regression test asserting both -p arguments use 0.0.0.0 and
reject any 127.0.0.1 restriction in the mapping.
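A minimal sketch of the mapping and the regression check described above. The helper name `publish_args` is hypothetical; the container-side ports 3001 and 22 come from the mapping quoted in this message.

```rust
/// Container-side ports, as quoted in the original `-p` mappings.
const MCP_CONTAINER_PORT: u16 = 3001;
const SSH_CONTAINER_PORT: u16 = 22;

/// Build the `docker -p` arguments for a project container.
/// Both mappings bind the host side to 0.0.0.0 so that the ports are
/// reachable from outside the docker host. (Hypothetical helper.)
fn publish_args(port: u16, ssh_port: u16) -> Vec<String> {
    vec![
        "-p".to_string(),
        format!("0.0.0.0:{}:{}", port, MCP_CONTAINER_PORT),
        "-p".to_string(),
        format!("0.0.0.0:{}:{}", ssh_port, SSH_CONTAINER_PORT),
    ]
}
```

A regression test in this style would assert on the generated strings directly, for example that no element contains `127.0.0.1`.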
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The merge gate classifier was matching trigger keywords like
`missing_doc_comments` inside passing-test name lines
(e.g. `test agents::gates::tests::classify_lint_from_missing_doc_comments ... ok`),
causing every gate failure to be mis-classified as Lint and bounced
back to a fixup coder. Strip `test … … ok` lines before scanning for
lint triggers. Also remove the temporary diagnostic block in
runner.rs that confirmed the bug.
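The filtering step might look like the sketch below. The function names and the one-element trigger list are illustrative assumptions; the real classifier has more triggers and a typed result.

```rust
/// True for test-harness result lines of the form `test <name> ... ok`.
fn is_passing_test_line(line: &str) -> bool {
    let line = line.trim();
    line.starts_with("test ") && line.ends_with("... ok")
}

/// Drop passing-test lines so that test names cannot match lint triggers.
fn strip_passing_test_lines(output: &str) -> String {
    output
        .lines()
        .filter(|l| !is_passing_test_line(l))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Illustrative lint check: scan only what is left after stripping.
fn looks_like_lint(output: &str) -> bool {
    // Assumption: a single trigger, taken from the example in this message.
    const TRIGGERS: &[&str] = &["missing_doc_comments"];
    let scanned = strip_passing_test_lines(output);
    TRIGGERS.iter().any(|t| scanned.contains(*t))
}
```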
Applied directly to master because the 1101 feature branch carried
stale work from an earlier incarnation of the story that semantically
conflicted with master's later diagnostic commit (`is_fixup` deleted
on the branch, referenced on master).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 1102 was created today with origin={kind:user, id:""} because
build_origin silently defaulted id to an empty string when the caller
didn't pass one, so we couldn't tell who filed it. Bug 1088's origin
field is useless as an audit trail if every caller can omit themselves.
Changes:
- build_origin (server/src/http/mcp/story_tools/mod.rs) now returns
Result<String, String> and rejects missing/empty/whitespace-only id
with an instructional error pointing at bug 1102 / story 1104.
- 5 create_* tool handlers (bug, spike, refactor, epic, story) now
resolve origin BEFORE create_*_file so an attribution-less call
leaves no half-state behind.
- 5 tool input schemas now advertise origin as a required object via
a shared origin_schema() helper. The schema description gives every
caller (coder agent, chat bot, user, system) a concrete example so
the LLM populates the field correctly on first sight.
- Test fixtures pass origin = {kind:"test", id:"test-suite"}.
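The validation and the resolve-before-create ordering can be sketched as follows. Only the `Result<String, String>` return type comes from this message; the parameter list, the error wording, the rendered origin string and the `create_bug` stand-in are simplified assumptions.

```rust
/// Reject a missing, empty or whitespace-only id instead of defaulting it.
/// (Simplified signature; the real function reads from the tool arguments.)
fn build_origin(kind: &str, id: Option<&str>) -> Result<String, String> {
    let id = id.map(str::trim).unwrap_or("");
    if id.is_empty() {
        return Err(
            "origin.id is required: say who is filing this (see bug 1102 / story 1104)"
                .to_string(),
        );
    }
    Ok(format!("{{kind:{:?}, id:{:?}}}", kind, id))
}

/// Stand-in for a create_* handler: resolve origin first, write second,
/// so that a rejected call leaves nothing behind.
fn create_bug(kind: &str, id: Option<&str>, created: &mut Vec<String>) -> Result<(), String> {
    let origin = build_origin(kind, id)?; // fails before anything is written
    created.push(format!("bug with origin {}", origin));
    Ok(())
}
```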
Story 1104 (signed actions) is the longer-term replacement; this is the
quick attribution win agreed for master ahead of that design work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 1101's reframed AC1: when a merge run does not succeed, log the typed
GateFailureKind, the matched classifier-trigger substring (if any) and
~90 chars of surrounding context. Fires on every gate failure regardless
of routing, so the next fixup-loop bounce will tell us which substring is
fooling classify() into Fmt|Lint|SourceMapCheck on what's actually a Test
failure.
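One way to extract the matched trigger and its surrounding context is sketched below. The function name and the `radius` parameter are hypothetical; a radius of 45 bytes on each side gives roughly the 90 characters of context mentioned above.

```rust
/// Find the first trigger that occurs in `output` and return it with up to
/// `radius` bytes of context on each side, widened to char boundaries so
/// that slicing never panics on multi-byte text.
fn trigger_context<'o, 't>(
    output: &'o str,
    triggers: &[&'t str],
    radius: usize,
) -> Option<(&'t str, &'o str)> {
    for trigger in triggers.iter().copied() {
        if let Some(start) = output.find(trigger) {
            let end = start + trigger.len();
            let mut lo = start.saturating_sub(radius);
            while !output.is_char_boundary(lo) {
                lo -= 1; // index 0 is always a boundary, so this terminates
            }
            let mut hi = (end + radius).min(output.len());
            while !output.is_char_boundary(hi) {
                hi += 1; // output.len() is always a boundary
            }
            return Some((trigger, &output[lo..hi]));
        }
    }
    None
}
```

The returned pair can then be logged next to the typed GateFailureKind on every gate failure.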
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test asserted msg_count == 0 on a process-global broadcast channel
(TRANSITION_TX is a single OnceLock<Sender> shared across the test
binary), so any concurrent test calling apply_transition could land
events in our receiver between the drain and the post-reconcile check.
Observed failure: 3 stray transitions from parallel tests.
Drop the strict count check. The real "never floods" invariant is
captured by the Lagged check alone: 1000 seeded items must not overflow
the 256-slot channel, which can only hold if the reconcile path
bypasses the broadcast (AC4). The sibling test
`reconcile_pass_scales_to_1000_items_without_lagged_divergence` already
uses this Lagged-only pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
stop_agent had the same order-of-operations bug that was fixed in the
watchdog: status flipped to Failed before the claude process was
verified gone, opening the idempotency window that allowed a duplicate
spawn to race in alongside the surviving process.
Now follows the three-step protocol:
1. Read worktree path under a read-only lock (no mutation).
2. SIGKILL the worktree's process tree via process_kill and block
until verified gone — start_agent's Running/Pending whitelist
continues to reject duplicate spawns throughout.
3. Only then mutate the agent record, abort the task handle, and
drop the child_killers entry.
Falls back to the old portable_pty SIGHUP path (with a warning) when
no worktree was recorded, matching the watchdog's behaviour.
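The ordering can be modelled with standard-library types as in the sketch below. Everything here is a simplified assumption: the `Agent` and `Status` types, the `kill_and_wait` closure standing in for process_kill, and the fallback that only prints a warning instead of taking the portable_pty SIGHUP path.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::RwLock;

#[derive(Debug, Clone, PartialEq)]
enum Status {
    Running,
    Failed,
}

struct Agent {
    status: Status,
    worktree: Option<PathBuf>,
}

/// `kill_and_wait` stands in for process_kill: it must return only once
/// the worktree's process tree is verified gone.
fn stop_agent(
    agents: &RwLock<HashMap<String, Agent>>,
    name: &str,
    kill_and_wait: impl FnOnce(&Path) -> Result<(), String>,
) -> Result<(), String> {
    // Step 1: read the worktree path under a read lock; no mutation yet.
    let guard = agents.read().map_err(|e| e.to_string())?;
    let worktree = guard.get(name).ok_or("unknown agent")?.worktree.clone();
    drop(guard);

    // Step 2: kill and block until gone. The record still says Running,
    // so a duplicate spawn keeps being rejected during this window.
    match &worktree {
        Some(path) => kill_and_wait(path.as_path())?,
        // The real code falls back to the old SIGHUP path here.
        None => eprintln!("warning: no worktree recorded for {}", name),
    }

    // Step 3: only now mutate the record (the real code also aborts the
    // task handle and drops the child_killers entry at this point).
    let mut guard = agents.write().map_err(|e| e.to_string())?;
    if let Some(agent) = guard.get_mut(name) {
        agent.status = Status::Failed;
    }
    Ok(())
}
```

If the kill step returns an error, the function exits before step 3 and the record stays Running, which is the property that keeps the idempotency window closed.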
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>