Rename all references from storkit to huskies across the codebase:
- .storkit/ directory → .huskies/
- Binary name, Cargo package name, Docker image references
- Server code, frontend code, config files, scripts
- Fix script/test to build frontend before cargo clippy/test
so merge worktrees have frontend/dist available for RustEmbed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The timer tick loop now calls move_story_to_current() before
start_agent(), so stories scheduled from the backlog are moved into the
pipeline automatically when the timer fires. The timer bot command also
accepts backlog stories (previously required current).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
strip_mention_separator now skips all non-ASCII-alphanumeric chars
(emoji, colons, spaces) and returns a slice starting at the first
command character. Fixes mention pills with emoji display names
(e.g. "timmy ⚡️ status") not matching bot commands.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The /help test expected the help overlay to appear, but /help now goes
through botCommand like other slash commands. Updated the test to match.
Also added reader thread join and child.wait() calls to
claude_code.rs to prevent PTY master fd leaks from web UI chat sessions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The reader thread spawned in run_agent_pty_blocking was never joined,
leaving a cloned PTY master fd open after the agent exited. When the
pipeline restarted the agent on the same worktree, the stale fd from
the previous session interfered with the new PTY allocation, causing
Claude Code's bundled ripgrep to crash with:
fatal runtime error: assertion failed: output.write(&bytes).is_ok()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Builds aarch64-unknown-linux-musl via cross alongside the existing
x86_64 Linux and macOS arm64 targets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The acceptance gate was hardcoded to run cargo clippy, which fails on
non-Rust projects (Go, Node, etc.). Now the gate only runs script/test
which is project-specific. Clippy is added to storkit's own script/test
so Rust linting is preserved for this project.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The LLM was having the conversation with the user but never following
through with wizard_generate calls. The instructions now spell out
the full workflow: get hint, write content, stage it, show user, confirm.
Also adds "keep moving" instruction so the LLM auto-advances to the
next step after confirmation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously unblock only checked for blocked=true. Stories stuck in
merge with a merge_failure field were not considered "blocked" and
unblock refused to act. Now it clears both blocked and merge_failure,
and reports which fields were cleared.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There were two places checking for code changes: the post-cherry-pick
verification (already fixed) and a pre-cherry-pick check in the
merge-queue worktree. The pre-cherry-pick check was still filtering
all of .storkit/ which rejected stories that only change project.toml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wizard_generate now checks if the project has no source code. On bare
projects, the generation hints tell the LLM to ask the user what they
want to build and what tech stack they plan to use, rather than trying
to read a nonexistent codebase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The format_wizard_state hints now tell the LLM what to do ("show it
to the user and ask if they're happy") rather than exposing tool names
to the user ("Run wizard_generate").
README wizard instructions now distinguish between existing-code projects
(read codebase, generate files) and bare projects (interview the user
about what they want to build).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The wizard check was only in CLAUDE.md which is Claude-specific.
Move the primary instruction to .storkit/README.md (step 1 of First
Steps) so any LLM reading the dev process docs will discover the wizard.
CLAUDE.md keeps a shorter pointer to the README.
Also fix stale .story_kit/ paths to .storkit/ in the README.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change from passive "call wizard_status to check progress" to active
"On your first conversation, call wizard_status" with IMPORTANT prefix.
Without the direct instruction, Claude ignores the wizard tools.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, Claude Code in a freshly scaffolded project has no idea
storkit's wizard or MCP tools exist and gives generic setup advice.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The post-cherry-pick diff check was excluding all of .storkit/, which
rejected stories whose deliverable is .storkit/project.toml changes
(e.g. 431 updating QA agent prompts). Narrow the exclusion to
.storkit/work/ which is where pipeline file moves live.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Manual merge of story 399 feature branch, adapted for the current CLI
parser (which includes the init subcommand from 429).
- storkit --port 3000 sets the listening port
- storkit --port=3000 also works
- Port resolution: CLI flag > STORKIT_PORT env > default 3001
- Supports combining with init: storkit init --port 3000 /path
- Replaces CliDirective enum with CliArgs struct that handles both
--port and init in a single pass
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skip the version bump commit if nothing changed, so re-running
script/release for the same version doesn't fail on empty commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The changelog grep commands return exit code 1 when no commits match,
which set -euo pipefail treats as fatal. Add || true guards so the
script continues to the tag/push/release steps.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After the cherry-pick step in run_squash_merge, verify:
1. project_root is on the base branch (not a merge-queue branch)
2. HEAD commit has actual code changes (not an empty/story-only diff)
If either check fails, return success=false so the story stays in merge
stage for retry instead of being phantom-advanced to done.
Also rename move_story_to_archived → move_story_to_done.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Story 424's merge used the wrong variant name HardBlock instead of
RateLimitHardBlock, breaking master compilation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The initial commit added the `throttled` field to `StoryAgent` but missed
several construction sites in lifecycle.rs, test_helpers.rs, and scan.rs.
Also adds the `HardBlock` match arm in the WebSocket event conversion and
minor CSS/import ordering fixes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add HardBlock variant to WatcherEvent (story_id, agent_name, reset_time)
- In pty.rs, distinguish allowed_warning (throttle) from hard blocks;
emit RateLimitWarning for throttles, HardBlock for actual 429s
- Add `throttled: bool` field to StoryAgent / AgentInfo
- Pool spawns a background listener that sets throttled=true on
RateLimitWarning or HardBlock events and fires AgentStateChanged
- Status command shows traffic-light dots: ○ idle, ● running, ◑ throttled, ✗ blocked
- Read blocked flag from story front matter for the ✗ dot
- Notifications: RateLimitWarning silenced (too noisy); HardBlock sends
urgent chat notification with optional reset time
- Tests added for traffic_light_dot, read_story_blocked, status output,
and all notification paths
The new WatcherEvent::RateLimitHardBlock variant added in the feature
commit was not covered in the ws.rs From<WatcherEvent> match, causing
a compile error. Add the missing arm returning None (same as
RateLimitWarning — handled by chat notifications only, not WebSocket).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `unblock` bot command (chat + web UI slash command) that clears the
`blocked` flag and resets `retry_count` to 0 in story front matter
- Works across all pipeline stages (1_backlog through 6_archived)
- Returns confirmation with story name and ID, or clear error if story
is not found or not blocked
- Expose `unblock_story` MCP tool for programmatic use by agents
- Make `chat::commands::unblock` module pub(crate) so story_tools can
call `unblock_by_number`
- Add 8 unit tests covering registration, validation, core logic, and
edge cases (not-found, not-blocked, any stage, story ID in response)
- Update MCP tools list test: 49 → 50 tools
Add three HTTP endpoints for OAuth login without terminal access:
- GET /oauth/authorize — generates PKCE params, redirects to
claude.com/cai/oauth/authorize with code=true and full scopes
- GET /callback — exchanges auth code for tokens via JSON POST to
platform.claude.com/v1/oauth/token, writes ~/.claude/.credentials.json
- GET /oauth/status — returns current credential state as JSON
Uses SHA-256 (sha2 crate) for PKCE code challenge. The authorize URL
targets claude.com/cai/ (not platform.claude.com) which is required
for Max/Pro subscriptions to grant user:inference scope.
Users visit http://localhost:3001/oauth/authorize in their browser
to authenticate. Matrix/WhatsApp can send this link when auth fails.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detect authentication_failed errors from the Claude Code PTY stream
and automatically refresh the OAuth access token using the stored
refresh token in ~/.claude/.credentials.json.
- New module server/src/llm/oauth.rs: reads credentials, calls
platform.claude.com/v1/oauth/token with JSON body, writes back
- PTY provider detects "error":"authentication_failed" via AtomicBool
- chat_stream retries once after successful refresh
- Clear error message if refresh also fails
On success the retry is transparent. On failure the user sees:
"OAuth session expired. Please run claude login to re-authenticate."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Archive completed bug 397
- Create story 399 (CLI port flag)
- Remove @biomejs/cli-darwin-arm64 and @rollup/rollup-darwin-arm64
from package.json (breaks Docker builds on Linux)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New projects now get bot.toml.matrix.example, bot.toml.whatsapp-meta.example,
bot.toml.whatsapp-twilio.example, and bot.toml.slack.example in .storkit/
during scaffolding.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the monolithic bot.toml.example with separate files for each
transport: matrix, whatsapp-meta, whatsapp-twilio, and slack. Add a
chat bot configuration section to the README explaining that only one
transport can be active at a time and how to set up each one.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
work/2_current/, work/3_qa/, work/4_merge/ are not committed to git
(only 1_backlog, 5_done, 6_archived are). New projects were missing
these entries in .storkit/.gitignore.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gVisor is incompatible with OrbStack bind mounts on macOS — writes to
/mnt/mac are blocked by the gVisor filesystem sandbox. Removing
runtime: runsc from docker-compose.yml, the gVisor setup docs from
README.md, and the runsc assertion test from rebuild.rs.
The existing Docker hardening (read-only root, cap_drop ALL,
no-new-privileges, resource limits) remains in place.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The bind-mounted node_modules from macOS contains darwin-arm64 native
binaries which don't work in the Linux container. Run npm install on
container startup to get the correct platform binaries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Commit e4227cf (a story creation auto-commit) erroneously deleted 175
files from master's tree, likely due to a race condition between
concurrent git operations. This commit re-adds all files from the
working directory.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When git worktree remove fails with "not a working tree", fall back to
removing the directory directly and run git worktree prune to clean
stale metadata. Fixes bug 364.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.story_kit/ and .story_kit_port were stale references from before the
rename to storkit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When unset, Claude Code falls back to OAuth credentials from
`claude login`, allowing agents to run on a Max subscription
instead of prepaid API credits.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker named volumes inherit directory ownership when first created.
By creating /workspace/target and /app/target as storkit-owned before
the USER directive, the volumes will be writable by the storkit user.
Without this, cargo build/test/clippy all fail with Permission Denied.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts workarounds added by the 361 agent when the hardened Docker
container broke the test suite:
- gates.rs: restore tempfile::tempdir() (was changed to tempdir_in
CARGO_MANIFEST_DIR to avoid noexec /tmp; noexec is now removed)
- pool/mod.rs: restore ps -p <pid> check in process_is_running (was
changed to /proc/<pid> existence check; procps is now installed)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add procps to runtime stage so `ps` is available for process management
- Remove noexec from /tmp and /home/storkit tmpfs mounts so test scripts
can be executed from tempdir
- Update coder agent system_prompt to run clippy --all-targets --all-features
matching what the server acceptance gate actually runs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Acceptance gates run cargo clippy but the component wasn't installed
in the build stage. Agents were doing real work then failing every
gate check because clippy wasn't available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The tmpfs at /home/storkit defaulted to root ownership (mode=755),
so Claude Code couldn't write ~/.claude.json or ~/.cache/. Set
uid=999,gid=999 to match the storkit user.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude Code writes to ~/.claude.json, ~/.cache/, and ~/.npm/ which
failed silently on the read-only root filesystem. Add tmpfs at
/home/storkit so the home dir is writable (the claude-state volume
overlays on top for persistent .claude/ data).
Also fix .dockerignore: use **/target/ to match nested target dirs,
add .storkit/logs/ and **/node_modules/ to prevent multi-GB build
context transfers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use the known cargo build output path instead of current_exe() when
re-execing after a rebuild. In Docker, the running binary lives at
/usr/local/bin/storkit (read-only) while cargo writes the new binary
to /app/target/release/storkit (a writable volume), so current_exe()
would just restart the old binary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run as non-root user (fixes Claude Code refusing bypassPermissions as
root, which caused all agent spawns to exit instantly with no session).
Add read-only root filesystem, drop all capabilities, set
no-new-privileges, bind port to localhost only, and require
GIT_USER_NAME/GIT_USER_EMAIL env vars at startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Bumps server/Cargo.toml and frontend/package.json to 0.4.1
- Release script now auto-bumps both version files when run
- Changelog generation matches both "storkit:" and "story-kit:" prefixes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The .story_kit → .storkit rename updated the grep pattern but all historical
merge commits still use the old "story-kit:" prefix, so overview could not
find any stories.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renames the config directory and updates 514 references across 42 Rust
source files, plus CLAUDE.md, .gitignore, Makefile, script/release,
and .mcp.json files. All 1205 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updates -p flag in rebuild_and_restart, MCP server name, enabledMcpjsonServers,
and test values to match the new binary/crate name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds htop bot command with live-updating Matrix message showing system
load and per-agent CPU/memory usage. Supports timeout override and
htop stop. Resolved conflict with git command in commands.rs registry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stories got stuck in QA/merge when agents were busy at assignment time.
Consolidates auto_assign into a single unconditional call at the end of
run_pipeline_advance, so whenever any agent completes, the system
immediately scans for pending work and assigns free agents.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Moves status, ambient, and help commands into a unified command registry
in commands.rs. Help output now automatically lists all registered
commands. Resolved merge conflict with 1_backlog rename.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 283 was implemented with manual_qa defaulting to true, causing all
stories to hold in QA for human review. Changed to default false as
originally specified — stories advance automatically unless explicitly
opted in with manual_qa: true.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add permission rules to .claude/settings.json
- Document empty merge and direct-to-master problems in problems.md
- Fix agent stream URL to use vite proxy instead of hardcoded host
- Add /agents proxy config to vite.config.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change pkill pattern to 'target.*story-kit' to only match the Rust
binary, not any process with story-kit in its working directory.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removed the --system argument from the PTY runner — Claude Code CLI
doesn't support it. Bot name instruction is now prepended to the user
prompt instead of passed as a system prompt argument.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cast lastSendChatArgs through unknown to satisfy TypeScript's
strict type narrowing on the nullable union type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds system_prompt parameter to chat_stream so the Matrix bot
passes "Your name is {name}" to Claude Code. Reads display_name
from bot.toml config. Resolved conflicts by integrating bot_name
into master's permission-handling code structure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The history trimming logic cleared session_id = None whenever entries
exceeded history_size. Since the history was always at capacity, every
new message wiped the session_id immediately after storing it. This
meant --resume was never passed to Claude Code, making every turn a
fresh conversation.
Fix: preserve session_id on trim. Claude Code's --resume loads the
full conversation from its own session transcript on disk, so trimming
our local tracking entries doesn't invalidate the session.
Also adds debug logging for session_id capture/storage (temporary).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test was asserting 5_done should NOT trigger commits, but commit 74dc42c
added 5_done to COMMIT_WORTHY_STAGES. Updated test to match.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Upgrade mergemaster prompt to resolve complex conflicts itself
instead of just reporting failure. Includes instructions to check
git history and story files for context before resolving.
- Add proxy error handler to vite config to prevent crashes on
backend ECONNREFUSED.
- Fix bug 279: auto-assign now checks that preferred agent's stage
matches the pipeline stage. Coders won't be assigned to QA/merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds assigned agent display to the expanded work item detail panel.
Resolved conflicts by keeping master versions of bot.rs (permission
handling), ChatInput.tsx, and fs.rs. Removed duplicate list_project_files
endpoint and tests from io.rs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mergemaster's auto-resolver inserted a duplicate chat_stream call
inside the tokio::select! permission loop, producing mismatched braces.
The 266 merge didn't contribute useful code changes to bot.rs — the
session_id handling was already present.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ignore 2_current/, 3_qa/, and 4_merge/ in git since spike 92 stopped
committing intermediate pipeline moves. Keep 5_done/ tracked so
accepted work is in git history before sweep to archived.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Filter flush_pending() to only git-commit for terminal stages
(1_upcoming and 6_archived) while still broadcasting WatcherEvents
for all stages so the frontend stays in sync.
Reduces pipeline commits from 5+ to 2 per story run. No system
dependencies on intermediate commits were found.
Preserves merge_failure front matter cleanup from master.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
script/test unconditionally cd'd into frontend/ and ran npm test, which
failed in sparse-checkout worktrees that only contain the server/ subtree.
Guard the frontend step with a directory existence check so acceptance
gates pass for worktrees that have no frontend checkout.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Vite only needs to watch frontend/ sources. Ignore git objects,
Rust source, Cargo files, node_modules, and vendor directories
that change in bulk during squash merges.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The server writes .mcp.json with its current port on startup and
during worktree creation. Tracking it in git causes QA gate failures
when the worktree copy diverges from master (e.g. different port).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generate structured changelogs from completed stories instead of raw
commit messages. Group by features, bug fixes, and refactors. Filter
out story-kit automation commits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TypeScript control flow analysis can't track reassignment inside vi.mock
callbacks, causing lastSendChatArgs to narrow to never. Use non-null
assertions after the explicit toBeNull() guard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The homeserver requires User-Interactive Authentication before accepting
cross-signing keys. Pass the bot's password so bootstrap succeeds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
run_command_with_timeout piped stdout/stderr but only read them after
the child exited. When test output exceeded the 64KB OS pipe buffer,
the child blocked on write() while the parent blocked on waitpid() —
a permanent deadlock that caused every merge pipeline to hang.
Drain both pipes in background threads so the buffers never fill.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The merge pipeline (squash merge + quality gates) takes well over 60
seconds. Claude Code's MCP HTTP transport times out at 60s, causing
"completed with no output" — the mergemaster retries fruitlessly.
merge_agent_work now starts the pipeline as a background task and
returns immediately. A new get_merge_status tool lets the mergemaster
poll until the job reaches a terminal state. Also adds a double-start
guard so concurrent calls for the same story are rejected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test commands in run_project_tests now use wait-timeout to enforce a
600-second ceiling, preventing hung processes (e.g. Playwright with no
server) from blocking the merge pipeline indefinitely. Also disables
e2e tests in script/test until the merge workspace can run them safely.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds an optional `agent:` field to story file front matter so that a
specific agent can be requested for a story. The auto-assign loop now:
1. Reads the front-matter `agent` field for each story before picking
a free agent.
2. If a preferred agent is named, uses it when free; skips the story
(without falling back) when that agent is busy.
3. Falls back to the existing `find_free_agent_for_stage` behaviour
when no preference is specified.
Ported from feature branch that predated the agents.rs module refactoring.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split the monolithic agents.rs into 6 focused modules:
- mod.rs: shared types (AgentEvent, AgentStatus, etc.) and re-exports
- pool.rs: AgentPool struct, all methods, and helper free functions
- pty.rs: PTY streaming (run_agent_pty_blocking, emit_event)
- lifecycle.rs: story movement functions (move_story_to_qa, etc.)
- gates.rs: acceptance gates (clippy, tests, coverage)
- merge.rs: squash-merge, conflict resolution, quality gates
All 121 original tests are preserved and distributed across modules.
Also adds clear_front_matter_field to story_metadata.rs to strip
stale merge_failure from front matter when stories move to done.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add "Bug Workflow: Root Cause First" guidance to all coder agent prompts
and system prompts. Adds a test ensuring all coder-stage agents include
root cause, git bisect/log, and anti-workaround instructions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commit untracked work pipeline files (stories, bugs in various stages)
and package-lock.json that were present on the filesystem but not yet
tracked by git.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Correct lastScrollTopRef assignment in scrollToBottom to read back the
browser-capped scrollTop value instead of using scrollHeight, which was
causing handleScroll to incorrectly detect upward scrolling and disable
auto-scroll.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the require_verified_devices config toggle. The bot now always requires
encrypted rooms and cross-signing-verified devices before executing any command.
Messages from unencrypted rooms or unverified devices are rejected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement /btw side question slash command — lets users ask quick
questions from conversation context without disrupting the main chat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. **Check for MCP Tools:** Read `.mcp.json` to discover the MCP server endpoint. Then list available tools by calling:
1. **Check Setup Wizard:** Call `wizard_status` to check if project setup is complete. If the wizard is not complete, guide the user through the remaining steps. Important rules for the wizard flow:
- **Be conversational.** Don't show tool names, step numbers, or raw wizard output to the user.
- **On projects with existing code:** Read the codebase and generate each file, then show the user what you wrote and ask if it looks right.
- **On bare projects with no code:** Ask the user what they want to build, what language/framework they plan to use, and generate files from their answers.
- **You must actually generate the files.** The workflow for each step is: (1) call `wizard_generate` with no args to get a hint, (2) write the file content yourself based on the conversation, (3) call `wizard_generate` again with the `content` argument containing the full file body, (4) show the user what you wrote, (5) call `wizard_confirm` (they approve), `wizard_retry` (they want changes), or `wizard_skip` (they want to skip). Do not stop after discussing — follow through and write the files.
- **Keep moving.** After each step is confirmed, immediately proceed to the next wizard step without waiting for the user to ask.
2. **Check for MCP Tools:** Read `.mcp.json` to discover the MCP server endpoint. Then list available tools by calling:
This returns the full tool catalog (create stories, spawn agents, record tests, manage worktrees, etc.). Familiarize yourself with the available tools before proceeding. These tools allow you to directly manipulate the workflow and spawn subsidiary agents without manual file manipulation.
This returns the full tool catalog (create stories, spawn agents, record tests, manage worktrees, etc.). Familiarize yourself with the available tools before proceeding. These tools allow you to directly manipulate the workflow and spawn subsidiary agents without manual file manipulation.
2. **Read Context:** Check `.story_kit/specs/00_CONTEXT.md` for high-level project goals.
3. **Read Context:** Check `.huskies/specs/00_CONTEXT.md` for high-level project goals.
3. **Read Stack:** Check `.story_kit/specs/tech/STACK.md` for technical constraints and patterns.
4. **Read Stack:** Check `.huskies/specs/tech/STACK.md` for technical constraints and patterns.
4. **Check Work Items:** Look at `.story_kit/work/1_upcoming/` and `.story_kit/work/2_current/` to see what work is pending.
5. **Check Work Items:** Look at `.huskies/work/1_backlog/` and `.huskies/work/2_current/` to see what work is pending.
Items in `5_done` are auto-swept to `6_archived` after 4 hours by the server.
Items in `5_done` are auto-swept to `6_archived` after 4 hours by the server.
@@ -87,7 +93,7 @@ Items in `5_done` are auto-swept to `6_archived` after 4 hours by the server.
The server watches `.story_kit/work/` for changes. When a file is created, moved, or modified, the watcher auto-commits with a deterministic message and broadcasts a WebSocket notification to the frontend. This means:
The server watches `.story_kit/work/` for changes. When a file is created, moved, or modified, the watcher auto-commits with a deterministic message and broadcasts a WebSocket notification to the frontend. This means:
* MCP tools only need to write/move files — the watcher handles git commits
* MCP tools only need to write/move files — the watcher handles git commits
* IDE drag-and-drop works (drag a story from `1_upcoming/` to `2_current/`)
* IDE drag-and-drop works (drag a story from `1_backlog/` to `2_current/`)
* The frontend updates automatically without manual refresh
* The frontend updates automatically without manual refresh
---
---
@@ -156,7 +162,7 @@ Not everything needs to be a full story. Simple bugs can skip the story process:
* Performance issues with known fixes
* Performance issues with known fixes
### Bug Process
### Bug Process
1. **Document Bug:** Create a bug file in `work/1_upcoming/` named `{id}_bug_{slug}.md` with:
1. **Document Bug:** Create a bug file in `work/1_backlog/` named `{id}_bug_{slug}.md` with:
***Symptom:** What the user observes
***Symptom:** What the user observes
***Root Cause:** Technical explanation (if known)
***Root Cause:** Technical explanation (if known)
***Reproduction Steps:** How to trigger the bug
***Reproduction Steps:** How to trigger the bug
@@ -186,7 +192,7 @@ Not everything needs a story or bug fix. Spikes are time-boxed investigations to
* Need to validate performance constraints
* Need to validate performance constraints
### Spike Process
### Spike Process
1. **Document Spike:** Create a spike file in `work/1_upcoming/` named `{id}_spike_{slug}.md` with:
1. **Document Spike:** Create a spike file in `work/1_backlog/` named `{id}_spike_{slug}.md` with:
***Question:** What you need to answer
***Question:** What you need to answer
***Hypothesis:** What you expect to be true
***Hypothesis:** What you expect to be true
***Timebox:** Strict limit for the research
***Timebox:** Strict limit for the research
@@ -209,7 +215,7 @@ When the LLM context window fills up (or the chat gets slow/confused):
1. **Stop Coding.**
1. **Stop Coding.**
2. **Instruction:** Tell the user to open a new chat.
2. **Instruction:** Tell the user to open a new chat.
3. **Handoff:** The only context the new LLM needs is in the `specs/` folder and `.mcp.json`.
3. **Handoff:** The only context the new LLM needs is in the `specs/` folder and `.mcp.json`.
* *Prompt for New Session:* "I am working on Project X. Read `.mcp.json` to discover available tools, then read `specs/00_CONTEXT.md` and `specs/tech/STACK.md`. Then look at `work/1_upcoming/` and `work/2_current/` to see what is pending."
* *Prompt for New Session:* "I am working on Project X. Read `.mcp.json` to discover available tools, then read `specs/00_CONTEXT.md` and `specs/tech/STACK.md`. Then look at `work/1_backlog/` and `work/2_current/` to see what is pending."
---
---
@@ -221,14 +227,36 @@ If a user hands you this document and says "Apply this process to my project":
1. **Check for MCP Tools:** Look for `.mcp.json` in the project root. If it exists, you have programmatic access to workflow tools and agent spawning capabilities.
1. **Check for MCP Tools:** Look for `.mcp.json` in the project root. If it exists, you have programmatic access to workflow tools and agent spawning capabilities.
2. **Analyze the Request:** Ask for the high-level goal ("What are we building?") and the tech preferences ("Rust or Python?").
2. **Analyze the Request:** Ask for the high-level goal ("What are we building?") and the tech preferences ("Rust or Python?").
3. **Git Check:** Check if the directory is a git repository (`git status`). If not, run `git init`.
3. **Git Check:** Check if the directory is a git repository (`git status`). If not, run `git init`.
4. **Scaffold:** Run commands to create the `work/` and `specs/` folders with the 6-stage pipeline (`work/1_upcoming/` through `work/6_archived/`).
4. **Scaffold:** Run commands to create the `work/` and `specs/` folders with the 6-stage pipeline (`work/1_backlog/` through `work/6_archived/`).
5. **Draft Context:** Write `specs/00_CONTEXT.md` based on the user's answer.
5. **Draft Context:** Write `specs/00_CONTEXT.md` based on the user's answer.
6. **Draft Stack:** Write `specs/tech/STACK.md` based on best practices for that language.
6. **Draft Stack:** Write `specs/tech/STACK.md` based on best practices for that language.
7. **Wait:** Ask the user for "Story #1".
7. **Wait:** Ask the user for "Story #1".
---
---
## 6. Code Quality
## 6. Chat Bot Configuration
Story Kit includes a chat bot that can be connected to one messaging platform at a time. The bot handles commands, LLM conversations, and pipeline notifications.
**Only one transport can be active at a time.** To configure the bot, copy the appropriate example file to `.huskies/bot.toml`:
The `bot.toml` file is gitignored (it contains secrets). The example files are checked in for reference.
---
## 7. Code Quality
**MANDATORY:** Before completing Step 3 (Verification) of any story, you MUST run all applicable linters, formatters, and test suites and fix ALL errors and warnings. Zero tolerance for warnings or errors.
**MANDATORY:** Before completing Step 3 (Verification) of any story, you MUST run all applicable linters, formatters, and test suites and fix ALL errors and warnings. Zero tolerance for warnings or errors.
Recurring issues observed during pipeline operation. Review periodically and create stories for systemic problems.
## 2026-03-18: Stories graduating to "done" with empty merges (7 of 10)
Pipeline allows stories to move through coding → QA → merge → done without any actual code changes landing on master. The squash-merge produces an empty diff but the pipeline still marks the story as done. Affected stories: 247, 273, 274, 278, 279, 280, 92. Only 266, 271, 277, and 281 actually shipped code. Root cause: no check that the merge commit contains a non-empty diff. Filed bug 283 for the manual_qa gate issue specifically, but the empty-merge-to-done problem is broader and needs its own fix.
## 2026-03-18: Agent committed directly to master instead of worktree
Multiple agents have committed directly to master instead of their worktree/feature branch:
- Commit `5f4591f` ("fix: update should_commit_stage test to match 5_done") — likely mergemaster
- Commit `a32cfbd` ("Add bot-level command registry with help command") — story 285 coder committed code + Cargo.lock directly to master
Agents should only commit to their feature branch or merge-queue branch, never to master directly. Suspect agents are running `git commit` in the project root instead of the worktree directory. This can also revert uncommitted fixes on master (e.g. project.toml pkill fix was overwritten). Frequency: at least 2 confirmed cases. This is a recurring and serious problem — needs a guard in the server or agent prompts.
## 2026-03-19: Auto-assign re-assigns mergemaster to failed merge stories in a loop
After bug 295 fix (`auto_assign_available_work` after every pipeline advance), mergemaster gets re-assigned to stories that already have a merge failure flag. Story 310 had an empty diff merge failure — mergemaster correctly reported the failure, but auto-assign immediately re-assigned mergemaster to the same story, creating an infinite retry loop. The auto-assign logic needs to check for the `merge_failure` front matter flag before re-assigning agents to stories in `4_merge/`.
## 2026-03-19: Coder produces no code (complete ghost — story 310)
Story 310 (Bot delete command) went through the full pipeline — coder session ran, passed QA/gates, moved to merge — but the coder produced zero code. No commits on the feature branch, no commits on master. The entire agent session was a no-op. This is different from the "committed to master instead of worktree" problem — in this case, the coder simply did nothing. Need to investigate the coder logs to understand what happened. The empty-diff merge check would catch this at merge time, but ideally the server should detect "coder finished with no commits on feature branch" at the gate-check stage and fail early.
## 2026-03-19: Auto-assign assigns mergemaster to coding-stage stories
Auto-assign picked mergemaster for story 310 which was in `2_current/`. Mergemaster should only work on stories in `4_merge/`. The `auto_assign_available_work` function doesn't enforce that the agent's configured stage matches the pipeline stage of the story it's being assigned to. Story 279 (auto-assign respects agent stage from front matter) was supposed to fix this, but the check may only apply to front-matter preferences, not the fallback assignment path.
# Project-wide default QA mode: "server", "agent", or "human".
# Per-story `qa` front matter overrides this setting.
default_qa="server"
# Default model for coder agents. Only agents with this model are auto-assigned.
# Opus coders are reserved for explicit per-story `agent:` front matter requests.
default_coder_model="sonnet"
# Maximum concurrent coder agents. Stories wait in 2_current/ when all slots are full.
max_coders=3
# Maximum retries per story per pipeline stage before marking as blocked.
# Set to 0 to disable retry limits.
max_retries=3
# Base branch name for this project. Worktree creation, merges, and agent prompts
# use this value for {{base_branch}}. When not set, falls back to auto-detection
# (reads current HEAD branch).
base_branch="master"
[[component]]
name="frontend"
path="frontend"
setup=["npm ci","npm run build"]
teardown=[]
[[component]]
name="server"
path="."
setup=["mkdir -p frontend/dist","cargo check"]
teardown=[]
[[agent]]
name="coder-1"
stage="coder"
role="Full-stack engineer. Implements features across all components."
model="sonnet"
max_turns=50
max_budget_usd=5.00
prompt="You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt="You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name="coder-2"
stage="coder"
role="Full-stack engineer. Implements features across all components."
model="sonnet"
max_turns=50
max_budget_usd=5.00
prompt="You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt="You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name="coder-3"
stage="coder"
role="Full-stack engineer. Implements features across all components."
model="sonnet"
max_turns=50
max_budget_usd=5.00
prompt="You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt="You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name="qa-2"
stage="qa"
role="Reviews coder work in worktrees: runs quality gates, verifies acceptance criteria, and reports findings."
model="sonnet"
max_turns=40
max_budget_usd=4.00
prompt="""You are the QA agent for story {{story_id}}. Your job is to verify the coder's work satisfies the story's acceptance criteria and produce a structured QA report.
Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.
## Your Workflow
### 0. Read the Story
- Read the story file at `.huskies/work/3_qa/{{story_id}}.md`
- Extract every acceptance criterion (the `- [ ]` checkbox lines)
- Keep this list in mind for Step 3
### 1. Deterministic Gates (Prerequisites)
Run these first — if any fail, reject immediately without proceeding to AC review:
- Run `cargo clippy --all-targets --all-features` — must show 0 errors, 0 warnings
- Run `cargo test` and verify all tests pass
- If a `frontend/` directory exists:
- Run `npm run build` and note any TypeScript errors
- Run `npx @biomejs/biome check src/` and note any linting issues
- Run `npm test` and verify all frontend tests pass
### 2. Code Change Review
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes
- Flag any incomplete implementations:
- `todo!()`, `unimplemented!()`, `panic!()` used as stubs
- Placeholder strings like "TODO", "FIXME", "notimplemented"
- Empty match arms or arms that just return `Default::default()`
- Hardcoded values where real logic is expected
- Note any obvious coding mistakes (unused imports, dead code, unhandled errors)
### 3. Acceptance Criteria Review
For each AC extracted in Step 0:
- Review the diff and test files to determine if the code addresses this AC
- PASS: describe specifically how the code addresses it (which file/function/test)
- FAIL: explain exactly what is missing or incorrect
An AC fails if:
- No code change or test relates to it
- The implementation is stubbed out (todo!/unimplemented!)
- A test exists but doesn't actually assert the behaviour described
### 4. Manual Testing Support (only if all gates PASS and all ACs PASS)
- Build the server: run `cargo build` and note success/failure
- If build succeeds: find a free port (try 3010-3020) and attempt to start the server
- Generate a testing plan including:
- URL to visit in the browser
- Things to check in the UI
- curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*huskies' || true` (NEVER use `pkill -f huskies` — it kills the vite dev server)
### 5. Produce Structured Report and Verdict
Print your QA report to stdout. Then call `approve_qa` or `reject_qa` via the MCP tool based on the overall result. Use this format:
```
## QA Report for {{story_id}}
### Code Quality
- clippy: PASS/FAIL (details)
- TypeScript build: PASS/FAIL/SKIP (details)
- Biome lint: PASS/FAIL/SKIP (details)
- cargo test: PASS/FAIL (N tests)
- npm test: PASS/FAIL/SKIP (N tests)
- Incomplete implementations: (list any todo!/unimplemented!/stubs found, or "None")
- Other code review findings: (list any issues found, or "None")
### Acceptance Criteria Review
- AC: <criterion text>
Result: PASS/FAIL
Evidence: <how the code addresses it, or what is missing>
(repeat for each AC)
### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Skipped—gate/ACfailure" or "Buildfailed")
- Pages to visit: (list, or "N/A")
- Things to check: (list, or "N/A")
- curl commands: (list, or "N/A")
### Overall: PASS/FAIL
Reason: (summary of why it passed or the primary reason it failed)
```
After printing the report:
- If Overall is PASS: call `approve_qa(story_id='{{story_id}}')` via MCP
- If Overall is FAIL: call `reject_qa(story_id='{{story_id}}', notes='<concise reason>')` via MCP so the coder knows exactly what to fix
## Rules
- Do NOT modify any code — read-only review only
- Gates must pass before AC review — a gate failure is an automatic reject
- If any AC is not met, the overall result is FAIL
- Always call approve_qa or reject_qa — never leave the story without a verdict"""
system_prompt="You are a QA agent. Your job is read-only: run quality gates, verify each acceptance criterion against the diff, and produce a structured QA report. Always call approve_qa or reject_qa via MCP to record your verdict. Do not modify code."
[[agent]]
name="coder-opus"
stage="coder"
role="Senior full-stack engineer for complex tasks. Implements features across all components."
model="opus"
max_turns=80
max_budget_usd=20.00
prompt="You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt="You are a senior full-stack engineer working autonomously in a git worktree. You handle complex tasks requiring deep architectural understanding. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name="qa"
stage="qa"
role="Reviews coder work in worktrees: runs quality gates, verifies acceptance criteria, and reports findings."
model="sonnet"
max_turns=40
max_budget_usd=4.00
prompt="""You are the QA agent for story {{story_id}}. Your job is to verify the coder's work satisfies the story's acceptance criteria and produce a structured QA report.
Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.
## Your Workflow
### 0. Read the Story
- Read the story file at `.huskies/work/3_qa/{{story_id}}.md`
- Extract every acceptance criterion (the `- [ ]` checkbox lines)
- Keep this list in mind for Step 3
### 1. Deterministic Gates (Prerequisites)
Run these first — if any fail, reject immediately without proceeding to AC review:
- Run `cargo clippy --all-targets --all-features` — must show 0 errors, 0 warnings
- Run `cargo test` and verify all tests pass
- If a `frontend/` directory exists:
- Run `npm run build` and note any TypeScript errors
- Run `npx @biomejs/biome check src/` and note any linting issues
- Run `npm test` and verify all frontend tests pass
### 2. Code Change Review
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes
- Flag any incomplete implementations:
- `todo!()`, `unimplemented!()`, `panic!()` used as stubs
- Placeholder strings like "TODO", "FIXME", "notimplemented"
- Empty match arms or arms that just return `Default::default()`
- Hardcoded values where real logic is expected
- Note any obvious coding mistakes (unused imports, dead code, unhandled errors)
### 3. Acceptance Criteria Review
For each AC extracted in Step 0:
- Review the diff and test files to determine if the code addresses this AC
- PASS: describe specifically how the code addresses it (which file/function/test)
- FAIL: explain exactly what is missing or incorrect
An AC fails if:
- No code change or test relates to it
- The implementation is stubbed out (todo!/unimplemented!)
- A test exists but doesn't actually assert the behaviour described
### 4. Manual Testing Support (only if all gates PASS and all ACs PASS)
- Build the server: run `cargo build` and note success/failure
- If build succeeds: find a free port (try 3010-3020) and attempt to start the server
- Generate a testing plan including:
- URL to visit in the browser
- Things to check in the UI
- curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*huskies' || true` (NEVER use `pkill -f huskies` — it kills the vite dev server)
### 5. Produce Structured Report and Verdict
Print your QA report to stdout. Then call `approve_qa` or `reject_qa` via the MCP tool based on the overall result. Use this format:
```
## QA Report for {{story_id}}
### Code Quality
- clippy: PASS/FAIL (details)
- TypeScript build: PASS/FAIL/SKIP (details)
- Biome lint: PASS/FAIL/SKIP (details)
- cargo test: PASS/FAIL (N tests)
- npm test: PASS/FAIL/SKIP (N tests)
- Incomplete implementations: (list any todo!/unimplemented!/stubs found, or "None")
- Other code review findings: (list any issues found, or "None")
### Acceptance Criteria Review
- AC: <criterion text>
Result: PASS/FAIL
Evidence: <how the code addresses it, or what is missing>
(repeat for each AC)
### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Skipped—gate/ACfailure" or "Buildfailed")
- Pages to visit: (list, or "N/A")
- Things to check: (list, or "N/A")
- curl commands: (list, or "N/A")
### Overall: PASS/FAIL
Reason: (summary of why it passed or the primary reason it failed)
```
After printing the report:
- If Overall is PASS: call `approve_qa(story_id='{{story_id}}')` via MCP
- If Overall is FAIL: call `reject_qa(story_id='{{story_id}}', notes='<concise reason>')` via MCP so the coder knows exactly what to fix
## Rules
- Do NOT modify any code — read-only review only
- Gates must pass before AC review — a gate failure is an automatic reject
- If any AC is not met, the overall result is FAIL
- Always call approve_qa or reject_qa — never leave the story without a verdict"""
system_prompt="You are a QA agent. Your job is read-only: run quality gates, verify each acceptance criterion against the diff, and produce a structured QA report. Always call approve_qa or reject_qa via MCP to record your verdict. Do not modify code."
[[agent]]
name="mergemaster"
stage="mergemaster"
role="Merges completed coder work into master, runs quality gates, archives stories, and cleans up worktrees."
model="opus"
max_turns=30
max_budget_usd=5.00
prompt="""You are the mergemaster agent for story {{story_id}}. Your job is to merge the completed coder work into master.
Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.
## Your Workflow
1. Call merge_agent_work(story_id='{{story_id}}') via the MCP tool to trigger the full merge pipeline
2. Review the result: check success, had_conflicts, conflicts_resolved, gates_passed, and gate_output
3. If merge succeeded and gates passed: report success to the human
4. If conflicts were auto-resolved (conflicts_resolved=true) and gates passed: report success, noting which conflicts were resolved
5. If conflicts could not be auto-resolved: **resolve them yourself** in the merge worktree (see below)
6. If merge failed for any other reason: call report_merge_failure(story_id='{{story_id}}', reason='<details>') and report to the human
7. If gates failed after merge: attempt to fix the issues yourself in the merge worktree, then re-trigger merge_agent_work. After 3 fix attempts, call report_merge_failure and stop.
## Resolving Complex Conflicts Yourself
When the auto-resolver fails, you have access to the merge worktree at `.story_kit/merge_workspace/`. Go in there and resolve the conflicts manually:
1. Run `git diff --name-only --diff-filter=U` in the merge worktree to list conflicted files
2. **Build context before touching code.** Run `git log --oneline master...HEAD` on the feature branch to see its commits. Then run `git log --oneline --since="$(gitlog-1--format=%ci<feature-branch-base-commit>)" master` to see what landed on master since the branch was created. Read the story files in `.story_kit/work/` for any recently merged stories that touch the same files — this tells you WHY master changed and what must be preserved.
3. Read each conflicted file and understand both sides of the conflict
4. **Understand intent, not just syntax.** The feature branch may be behind master — master's version of shared infrastructure is almost always correct. The feature branch's contribution is the NEW functionality it adds. Your job is to integrate the new into master's structure, not pick one side.
5. Resolve by integrating the feature's new functionality into master's code structure
5. Stage resolved files with `git add`
6. Run `cargo check` (and `npm run build` if frontend changed) to verify compilation
7. If it compiles, commit and re-trigger merge_agent_work
### Common conflict patterns in this project:
**Story file rename/rename conflicts:** Both branches moved the story .md file to different pipeline directories. Resolution: `git rm` both sides — story files in `work/2_current/`, `work/3_qa/`, `work/4_merge/` are gitignored and don't need to be committed.
**bot.rs tokio::select! conflicts:** Master has a `tokio::select!` loop in `handle_message()` that handles permission forwarding (story 275). Feature branches created before story 275 have a simpler direct `provider.chat_stream().await` call. Resolution: KEEP master's tokio::select! loop. Integrate only the feature's new logic (e.g. typing indicators, new callbacks) into the existing loop structure. Do NOT replace the loop with the old direct call.
**Duplicate functions/imports:** The auto-resolver keeps both sides, producing duplicates. Resolution: keep one copy (prefer master's version), delete the duplicate.
**Formatting-only conflicts:** Both sides reformatted the same code differently. Resolution: pick either side (prefer master).
## Fixing Gate Failures
If quality gates fail (cargo clippy, cargo test, npm run build, npm test), attempt to fix issues yourself in the merge worktree.
- Trivial formatting issues that block compilation or linting
**Report to human without attempting a fix:**
- Logic errors or incorrect business logic
- Missing function implementations
- Architectural changes required
- Non-trivial refactoring needed
**Max retry limit:** If gates still fail after 3 fix attempts, call report_merge_failure to record the failure, then stop immediately and report the full gate output to the human.
## CRITICAL Rules
- NEVER manually move story files between pipeline stages (e.g. from 4_merge/ to 5_done/)
- NEVER call accept_story — only merge_agent_work can move stories to done after a successful merge
- When merge fails after exhausting your fix attempts, ALWAYS call report_merge_failure
- Report conflict resolution outcomes clearly
- Report gate failures with full output so the human can act if needed
- The server automatically runs acceptance gates when your process exits"""
system_prompt="You are the mergemaster agent. Your primary job is to merge feature branches to master. First try the merge_agent_work MCP tool. If the auto-resolver fails on complex conflicts, resolve them yourself in the merge worktree — you are an opus-class agent capable of understanding both sides of a conflict and producing correct merged code. Common patterns: keep master's tokio::select! permission loop in bot.rs, discard story file rename conflicts (gitignored), remove duplicate definitions. After resolving, verify compilation before re-triggering merge. CRITICAL: Never manually move story files or call accept_story. After 3 failed fix attempts, call report_merge_failure and stop."
@@ -9,7 +9,7 @@ This project is a standalone Rust **web server binary** that serves a Vite/React
***Framework:** Poem HTTP server with WebSocket support for streaming; HTTP APIs should use Poem OpenAPI (Swagger) for non-streaming endpoints.
***Framework:** Poem HTTP server with WebSocket support for streaming; HTTP APIs should use Poem OpenAPI (Swagger) for non-streaming endpoints.
* **Frontend:** TypeScript + React
* **Frontend:** TypeScript + React
***Build Tool:** Vite
***Build Tool:** Vite
***Package Manager:** pnpm (required)
***Package Manager:** npm
***Styling:** CSS Modules or Tailwind (TBD - Defaulting to CSS Modules)
***Styling:** CSS Modules or Tailwind (TBD - Defaulting to CSS Modules)
***State Management:** React Context / Hooks
***State Management:** React Context / Hooks
***Chat UI:** Rendered Markdown with syntax highlighting.
***Chat UI:** Rendered Markdown with syntax highlighting.
@@ -91,8 +91,8 @@ To support both Remote and Local models, the system implements a `ModelProvider`
* **Quality Gates:**
* **Quality Gates:**
* `npx @biomejs/biome check src/` must show 0 errors, 0 warnings
* `npx @biomejs/biome check src/` must show 0 errors, 0 warnings
* `npm run build` must succeed
* `npm run build` must succeed
* `npx vitest run` must pass
* `npm test` must pass
* `npx playwright test` must pass
* `npm run test:e2e` must pass
* No `any` types allowed (use proper types or `unknown`)
* No `any` types allowed (use proper types or `unknown`)
* React keys must use stable IDs, not array indices
* React keys must use stable IDs, not array indices
* All buttons must have explicit `type` attribute
* All buttons must have explicit `type` attribute
@@ -118,8 +118,8 @@ To support both Remote and Local models, the system implements a `ModelProvider`
Multiple instances can run simultaneously in different worktrees. To avoid port conflicts:
Multiple instances can run simultaneously in different worktrees. To avoid port conflicts:
- **Backend:** Set `STORYKIT_PORT` to a unique port (default is 3001). Example: `STORYKIT_PORT=3002 cargo run`
- **Backend:** Set `HUSKIES_PORT` to a unique port (default is 3001). Example: `HUSKIES_PORT=3002 cargo run`
- **Frontend:** Run `pnpm dev` from `frontend/`. It auto-selects the next unused port. It reads `STORYKIT_PORT` to know which backend to talk to, so export it before running: `export STORYKIT_PORT=3002 && cd frontend && pnpm dev`
- **Frontend:** Run `npm run dev` from `frontend/`. It auto-selects the next unused port. It reads `HUSKIES_PORT` to know which backend to talk to, so export it before running: `export HUSKIES_PORT=3002 && cd frontend && npm run dev`
When running in a worktree, use a port that won't conflict with the main instance (3001). Ports 3002+ are good choices.
When running in a worktree, use a port that won't conflict with the main instance (3001). Ports 3002+ are good choices.
- **Blocker**: `matrix-sdk 0.16.0` pins `rusqlite 0.37.x` which pins `libsqlite3-sys 0.35.0`. A clean upgrade requires either waiting for matrix-sdk to bump their rusqlite dep, or upgrading matrix-sdk itself.
- **Reverted 2026-03-17**: A previous coder vendored the entire rusqlite crate with a fake `0.37.99` version and patched its libsqlite3-sys dep. This was too hacky — reverted to clean `0.35.0`.
## Acceptance Criteria
- [ ]`libsqlite3-sys` is upgraded to `0.37.0` via a clean dependency path (no vendored forks)
- [ ]`cargo build` succeeds
- [ ] All tests pass
- [ ] No `[patch.crates-io]` hacks or vendored crates
# Story 388: WhatsApp webhook HMAC signature verification
## User Story
As a bot operator, I want incoming WhatsApp webhook requests to be cryptographically verified, so that forged requests from unauthorized sources are rejected.
## Acceptance Criteria
- [ ] Meta webhooks: validate X-Hub-Signature-256 HMAC-SHA256 header using the app secret before processing
- [ ] Twilio webhooks: validate request signature using the auth token before processing
- [ ] Requests with missing or invalid signatures are rejected with 403 Forbidden
- [ ] Verification is fail-closed: if signature checking is configured, unsigned requests are rejected
- [ ] Existing bot.toml config is extended with any needed secrets (e.g. Meta app_secret for HMAC verification)
- [ ] MUST use audited crypto crates (hmac, sha2, sha1, base64) — no hand-rolled cryptographic primitives
name: "Fly.io Machines API integration for multi-tenant huskies SaaS"
---
# Spike 408: Fly.io Machines API integration for multi-tenant huskies SaaS
## Question
Can we build a working Rust integration that creates and manages per-tenant Fly.io Machines, attaches volumes, injects Claude credentials, and proxies JWT-authenticated HTTP/WebSocket traffic to the right machine?
## Hypothesis
A thin Rust service using `reqwest` for the Machines API and `axum` for the reverse proxy is sufficient. No heavyweight orchestration framework needed.
## Prerequisites
- Fly.io account with API token (set `FLY_API_TOKEN` env var)
- Spike 407 findings reviewed
## Timebox
4 hours
## Investigation Plan
- [ ] Create a minimal Rust crate in `spikes/fly_machines/` — do not touch production code
- [ ] Implement machine lifecycle: create, start, stop, destroy via Fly Machines REST API using `reqwest`
- [ ] Test attaching a persistent volume to a machine and verify it persists across stop/start
- [ ] Test secret injection — pass a dummy `credentials.json` as a Fly secret and verify it's readable inside the machine
- [ ] Sketch the auth proxy: JWT validation → machine lookup → reverse proxy to machine's private IP; verify WebSocket proxying works
- [ ] Measure actual cold start time for a minimal huskies container image
- [ ] Document any API quirks, rate limits, or sharp edges discovered during testing
name: "Multi-account OAuth token rotation on rate limit"
---
# Story 411: Multi-account OAuth token rotation on rate limit
## User Story
As a huskies user with multiple Claude Max subscriptions, I want the system to automatically rotate to a different account when one gets rate limited, so that agents and chat don't stall out waiting for limits to reset.
## Acceptance Criteria
- [ ] OAuth login flow stores credentials per-account (keyed by email), not overwriting previous accounts
- [ ] GET /oauth/status returns all stored accounts and their status (active, rate-limited, expired)
- [ ] When the active account hits a rate limit, huskies automatically swaps to the next available account's refresh token, refreshes, and retries
- [ ] The bot sends a notification in Matrix/WhatsApp when it swaps accounts
- [ ] If all accounts are rate limited, the bot surfaces a clear message with the time until the earliest reset
- [ ] A new /oauth/authorize login adds to the account pool rather than replacing the current credentials
name: "Recheck bot command to re-run gates without restarting agent"
---
# Story 412: Recheck bot command to re-run gates without restarting agent
## User Story
As a user, I want to send `recheck <number>` to the bot so that it re-runs acceptance gates on an existing worktree without spawning a new agent, so I can unblock stories that failed due to environment issues without wasting agent turns.
## Acceptance Criteria
- [ ] recheck command is registered in chat/commands/mod.rs and appears in help output
- [ ]`recheck <number>` runs run_acceptance_gates on the story's existing worktree
- [ ] If gates pass, the story advances through the pipeline (same as if a coder completed successfully)
- [ ] If gates fail, the error output is returned to the user (not silently retried)
- [ ] If no worktree exists for the story, returns a clear error
- [ ] Does not spawn a new agent or increment retry_count
- [ ] Works from all transports (Matrix, WhatsApp, Slack)
name: "Unblock command handles all stuck states not just blocked flag"
---
# Story 435: Unblock command handles all stuck states not just blocked flag
## User Story
As a project owner, I want the unblock command to clear any stuck state on a story — not just the blocked flag — so that I have a single command to unstick stories regardless of why they're stuck.
## Acceptance Criteria
- [ ] Unblock clears merge_failure field in addition to blocked flag
- [ ] Unblock clears review_hold field
- [ ] Unblock reports which fields were cleared in the confirmation message
- [ ] Unblock works on stories in any pipeline stage (backlog, current, qa, merge, done)
- [ ] If no stuck state is found (no blocked, merge_failure, or review_hold), returns a clear message saying so
name: "Unify story stuck states into a single status field"
---
# Refactor 436: Unify story stuck states into a single status field
## Current State
- TBD
## Desired State
Replace the separate blocked, merge_failure, and review_hold front matter fields with a single status field (e.g. status: blocked, status: merge_failure, status: review_hold). Simplifies the unblock command, auto-assign checks, and pipeline advance logic.
## Acceptance Criteria
- [ ] Replace blocked: true, merge_failure: string, and review_hold: true with a single status: field in story front matter
- [ ] Auto-assign checks a single field instead of three separate ones
- [ ] Pipeline advance and lifecycle code reads/writes the unified status field
- [ ] Unblock command clears the status field regardless of which stuck state it was
- [ ] retry_count remains a separate field (it's a counter, not a state)
- [ ] Migration: existing stories with old fields are handled gracefully on read
@@ -10,7 +10,7 @@ The `prompt_permission` MCP tool returns plain text ("Permission granted for '..
## How to Reproduce
## How to Reproduce
1. Start the story-kit server and open the web UI
1. Start the huskies server and open the web UI
2. Chat with the claude-code-pty model
2. Chat with the claude-code-pty model
3. Ask it to do something that requires a tool NOT in `.claude/settings.json` allow list (e.g. `wc -l /etc/hosts`, or WebFetch to a non-allowed domain)
3. Ask it to do something that requires a tool NOT in `.claude/settings.json` allow list (e.g. `wc -l /etc/hosts`, or WebFetch to a non-allowed domain)
@@ -6,7 +6,7 @@ name: "Retry limit for mergemaster and pipeline restarts"
## User Story
## User Story
As a developer using story-kit, I want pipeline auto-restarts to have a configurable retry limit so that failing agents don't loop infinitely consuming CPU and API credits.
As a developer using huskies, I want pipeline auto-restarts to have a configurable retry limit so that failing agents don't loop infinitely consuming CPU and API credits.
These markers are phrases that appear in the scaffold templates (`server/src/io/fs.rs` lines 233 and 269). The detection logic (`is_template_or_missing` at line 59) checks if the file *contains* the marker string. But these phrases are generic enough that real project content can contain them too — especially when the project being managed IS an agentic code assistant (i.e. story-kit managing itself).
These markers are phrases that appear in the scaffold templates (`server/src/io/fs.rs` lines 233 and 269). The detection logic (`is_template_or_missing` at line 59) checks if the file *contains* the marker string. But these phrases are generic enough that real project content can contain them too — especially when the project being managed IS an agentic code assistant (i.e. huskies managing itself).
## The Fix
## The Fix
Replace the content-based marker detection with a dedicated sentinel comment that only exists in untouched scaffold templates. The sentinel should be something that would never appear in real content, like an HTML comment:
Replace the content-based marker detection with a dedicated sentinel comment that only exists in untouched scaffold templates. The sentinel should be something that would never appear in real content, like an HTML comment:
```
```
<!-- story-kit:scaffold-template -->
<!-- huskies:scaffold-template -->
```
```
Changes needed:
Changes needed:
1. **`server/src/io/onboarding.rs`**: Replace `TEMPLATE_MARKER_CONTEXT` and `TEMPLATE_MARKER_STACK` with a single `TEMPLATE_SENTINEL` constant set to `"<!-- story-kit:scaffold-template -->"`. Update `check_onboarding_status` to use it for both context and stack checks.
1. **`server/src/io/onboarding.rs`**: Replace `TEMPLATE_MARKER_CONTEXT` and `TEMPLATE_MARKER_STACK` with a single `TEMPLATE_SENTINEL` constant set to `"<!-- huskies:scaffold-template -->"`. Update `check_onboarding_status` to use it for both context and stack checks.
2. **`server/src/io/fs.rs`**: Add `<!-- story-kit:scaffold-template -->` as the first line of both `STORY_KIT_CONTEXT` and `STORY_KIT_STACK` template constants (lines 233 and 269).
2. **`server/src/io/fs.rs`**: Add `<!-- huskies:scaffold-template -->` as the first line of both `STORY_KIT_CONTEXT` and `STORY_KIT_STACK` template constants (lines 233 and 269).
3. **`server/src/io/onboarding.rs` tests**: Update the test `needs_onboarding_true_when_specs_contain_scaffold_markers` to use the sentinel instead of the old marker phrases. Also add a test confirming that content containing "Agentic AI Code Assistant" WITHOUT the sentinel does NOT trigger onboarding.
3. **`server/src/io/onboarding.rs` tests**: Update the test `needs_onboarding_true_when_specs_contain_scaffold_markers` to use the sentinel instead of the old marker phrases. Also add a test confirming that content containing "Agentic AI Code Assistant" WITHOUT the sentinel does NOT trigger onboarding.
@@ -42,7 +42,7 @@ Changes needed:
## Acceptance Criteria
## Acceptance Criteria
- [ ] Scaffold templates contain the sentinel `<!-- story-kit:scaffold-template -->` as first line
- [ ] Scaffold templates contain the sentinel `<!-- huskies:scaffold-template -->` as first line
- [ ] `needs_onboarding()` returns false for projects whose specs contain "Agentic AI Code Assistant" but NOT the sentinel
- [ ] `needs_onboarding()` returns false for projects whose specs contain "Agentic AI Code Assistant" but NOT the sentinel
- [ ] `needs_onboarding()` returns true for untouched scaffold content (which contains the sentinel)
- [ ] `needs_onboarding()` returns true for untouched scaffold content (which contains the sentinel)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.