- Bumps server/Cargo.toml and frontend/package.json to 0.4.1
- Release script now auto-bumps both version files when run
- Changelog generation matches both "storkit:" and "story-kit:" prefixes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The .story_kit → .storkit rename updated the grep pattern but all historical
merge commits still use the old "story-kit:" prefix, so overview could not
find any stories.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renames the config directory and updates 514 references across 42 Rust
source files, plus CLAUDE.md, .gitignore, Makefile, script/release,
and .mcp.json files. All 1205 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updates -p flag in rebuild_and_restart, MCP server name, enabledMcpjsonServers,
and test values to match the new binary/crate name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds htop bot command with live-updating Matrix message showing system
load and per-agent CPU/memory usage. Supports timeout override and
htop stop. Resolved conflict with git command in commands.rs registry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stories got stuck in QA/merge when agents were busy at assignment time.
Consolidates auto_assign into a single unconditional call at the end of
run_pipeline_advance, so whenever any agent completes, the system
immediately scans for pending work and assigns free agents.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Moves status, ambient, and help commands into a unified command registry
in commands.rs. Help output now automatically lists all registered
commands. Resolved merge conflict with 1_backlog rename.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 283 was implemented with manual_qa defaulting to true, causing all
stories to hold in QA for human review. Changed to default false as
originally specified — stories advance automatically unless explicitly
opted in with manual_qa: true.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add permission rules to .claude/settings.json
- Document empty merge and direct-to-master problems in problems.md
- Fix agent stream URL to use vite proxy instead of hardcoded host
- Add /agents proxy config to vite.config.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Recurring issues observed during pipeline operation. Review periodically and create stories for systemic problems.
## 2026-03-18: Stories graduating to "done" with empty merges (7 of 10)
Pipeline allows stories to move through coding → QA → merge → done without any actual code changes landing on master. The squash-merge produces an empty diff but the pipeline still marks the story as done. Affected stories: 247, 273, 274, 278, 279, 280, 92. Only 266, 271, 277, and 281 actually shipped code. Root cause: no check that the merge commit contains a non-empty diff. Filed bug 283 for the manual_qa gate issue specifically, but the empty-merge-to-done problem is broader and needs its own fix.
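This is a minimal sketch of the missing non-empty-diff check. It assumes the merge step knows the SHA of the squash-merge commit it just created. It relies on `git diff --quiet <commit>^ <commit>`, which exits 0 when the commit changes nothing and 1 when it does. `MergeDiff`, `merge_commit_diff` and `gate_done` are hypothetical names, not existing storkit functions.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Whether a merge commit actually changed any files.
#[derive(Debug, PartialEq)]
pub enum MergeDiff {
    Empty,
    NonEmpty,
}

/// Map the exit code of `git diff --quiet` to a result:
/// 0 = no differences, 1 = differences, anything else = git error.
pub fn interpret_diff_exit(code: Option<i32>) -> Result<MergeDiff, String> {
    match code {
        Some(0) => Ok(MergeDiff::Empty),
        Some(1) => Ok(MergeDiff::NonEmpty),
        other => Err(format!("git diff failed with exit code {other:?}")),
    }
}

/// Hypothetical helper: compare a (squash-)merge commit with its parent.
pub fn merge_commit_diff(repo: &Path, commit: &str) -> io::Result<Result<MergeDiff, String>> {
    let parent = format!("{commit}^");
    let status = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["diff", "--quiet", parent.as_str(), commit])
        .status()?;
    Ok(interpret_diff_exit(status.code()))
}

/// Hypothetical gate: refuse to move a story to done when its merge landed no code.
pub fn gate_done(story_id: &str, diff: &MergeDiff) -> Result<(), String> {
    match diff {
        MergeDiff::Empty => Err(format!("story {story_id}: merge commit has an empty diff")),
        MergeDiff::NonEmpty => Ok(()),
    }
}
```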
## 2026-03-18: Agent committed directly to master instead of worktree
Multiple agents have committed directly to master instead of their worktree/feature branch:
- Commit `5f4591f` ("fix: update should_commit_stage test to match 5_done") — likely mergemaster
- Commit `a32cfbd` ("Add bot-level command registry with help command") — story 285 coder committed code + Cargo.lock directly to master
Agents should only commit to their feature branch or merge-queue branch, never to master directly. Suspect agents are running `git commit` in the project root instead of the worktree directory. This can also revert uncommitted fixes on master (e.g. project.toml pkill fix was overwritten). Frequency: at least 2 confirmed cases. This is a recurring and serious problem — needs a guard in the server or agent prompts.
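One possible server-side guard is sketched below. It assumes the server can see the agent's working directory and the output of `git rev-parse --abbrev-ref HEAD` before a commit is recorded (for example from a git pre-commit hook or an MCP commit tool). `check_agent_commit` and the protected-branch list are hypothetical. Paths should be canonicalized by the caller before they are compared.

```rust
use std::path::Path;

/// Branches agents must never commit to directly (assumed list).
const PROTECTED_BRANCHES: &[&str] = &["master", "main"];

/// Hypothetical pre-commit guard. `branch` is the output of
/// `git rev-parse --abbrev-ref HEAD` run in `cwd`; both paths are expected
/// to be canonicalized by the caller.
pub fn check_agent_commit(cwd: &Path, worktree: &Path, branch: &str) -> Result<(), String> {
    if cwd != worktree {
        return Err(format!(
            "refusing commit: {} is not the assigned worktree {}",
            cwd.display(),
            worktree.display()
        ));
    }
    let branch = branch.trim();
    if PROTECTED_BRANCHES.contains(&branch) {
        return Err(format!("refusing commit: agents may not commit to {branch}"));
    }
    Ok(())
}
```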
## 2026-03-19: Auto-assign re-assigns mergemaster to failed merge stories in a loop
After bug 295 fix (`auto_assign_available_work` after every pipeline advance), mergemaster gets re-assigned to stories that already have a merge failure flag. Story 310 had an empty diff merge failure — mergemaster correctly reported the failure, but auto-assign immediately re-assigned mergemaster to the same story, creating an infinite retry loop. The auto-assign logic needs to check for the `merge_failure` front matter flag before re-assigning agents to stories in `4_merge/`.
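A sketch of the missing filter follows. It assumes auto-assign sees each story's pipeline directory, its current assignment and the parsed `merge_failure` flag. The `Story` struct and `assignable_merge_stories` are hypothetical.

```rust
/// Minimal view of a story as seen by auto-assign (hypothetical shape).
#[derive(Debug)]
pub struct Story {
    pub id: String,
    pub stage_dir: String,            // e.g. "4_merge"
    pub merge_failure: bool,          // parsed from the `merge_failure` front matter flag
    pub assigned_agent: Option<String>,
}

/// Ids of merge-stage stories that mergemaster may pick up. Stories already
/// flagged with a merge failure are skipped, so they wait for a human
/// instead of being retried forever.
pub fn assignable_merge_stories(stories: &[Story]) -> Vec<&str> {
    stories
        .iter()
        .filter(|s| s.stage_dir == "4_merge")
        .filter(|s| s.assigned_agent.is_none())
        .filter(|s| !s.merge_failure)
        .map(|s| s.id.as_str())
        .collect()
}
```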
## 2026-03-19: Coder produces no code (complete ghost — story 310)
Story 310 (Bot delete command) went through the full pipeline — coder session ran, passed QA/gates, moved to merge — but the coder produced zero code. No commits on the feature branch, no commits on master. The entire agent session was a no-op. This is different from the "committed to master instead of worktree" problem — in this case, the coder simply did nothing. Need to investigate the coder logs to understand what happened. The empty-diff merge check would catch this at merge time, but ideally the server should detect "coder finished with no commits on feature branch" at the gate-check stage and fail early.
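Here is a sketch of the early gate check. It assumes the gate runner knows the worktree path and the base branch. `git rev-list --count <base>..HEAD` counts commits reachable from the feature branch but not from the base, so zero means the coder committed nothing. The function names are hypothetical.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Parse the output of `git rev-list --count <base>..HEAD`.
pub fn parse_commit_count(stdout: &str) -> Option<u32> {
    stdout.trim().parse().ok()
}

/// Hypothetical helper: commits on the feature branch that are not on `base`.
pub fn commits_ahead(worktree: &Path, base: &str) -> io::Result<Option<u32>> {
    let range = format!("{base}..HEAD");
    let out = Command::new("git")
        .arg("-C")
        .arg(worktree)
        .args(["rev-list", "--count", range.as_str()])
        .output()?;
    if !out.status.success() {
        return Ok(None);
    }
    Ok(parse_commit_count(&String::from_utf8_lossy(&out.stdout)))
}

/// Gate check: fail early when the coder exited without committing anything.
pub fn gate_has_commits(story_id: &str, ahead: Option<u32>) -> Result<u32, String> {
    match ahead {
        Some(0) => Err(format!("story {story_id}: coder finished with no commits on feature branch")),
        Some(n) => Ok(n),
        None => Err(format!("story {story_id}: could not count feature-branch commits")),
    }
}
```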
## 2026-03-19: Auto-assign assigns mergemaster to coding-stage stories
Auto-assign picked mergemaster for story 310 which was in `2_current/`. Mergemaster should only work on stories in `4_merge/`. The `auto_assign_available_work` function doesn't enforce that the agent's configured stage matches the pipeline stage of the story it's being assigned to. Story 279 (auto-assign respects agent stage from front matter) was supposed to fix this, but the check may only apply to front-matter preferences, not the fallback assignment path.
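Below is a sketch of a stage check that could run on every assignment path, including the fallback. The directory names `2_current` and `4_merge` and the stage values `coder` and `qa` appear in this repository. The directory `3_qa` and a `mergemaster` stage value are assumptions. The function names are hypothetical.

```rust
/// Map a pipeline directory to the agent stage allowed to work on it.
/// "3_qa" and the "mergemaster" stage value are assumptions.
pub fn required_stage(stage_dir: &str) -> Option<&'static str> {
    match stage_dir {
        "2_current" => Some("coder"),
        "3_qa" => Some("qa"),
        "4_merge" => Some("mergemaster"),
        _ => None, // backlog, done, etc.: nothing should be auto-assigned
    }
}

/// Applied on every assignment path, not only when the story's front
/// matter names a preferred agent.
pub fn can_assign(agent_stage: &str, stage_dir: &str) -> bool {
    required_stage(stage_dir) == Some(agent_stage)
}
```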
system_prompt="You are a supervisor agent. Read CLAUDE.md and .story_kit/README.md first to understand the project dev process. Use MCP tools to coordinate sub-agents. Never implement code directly - always delegate to coder agents and monitor their progress. Use wait_for_agent to block until the coder finishes — the server automatically runs acceptance gates when the agent process exits. Never accept stories or merge to master - get all gates green and report to the human."
[[agent]]
name="coder-1"
stage="coder"
prompt="You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt="You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name="coder-3"
stage="coder"
role="Full-stack engineer. Implements features across all components."
model="sonnet"
max_turns=50
max_budget_usd=5.00
prompt="You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt="You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name="qa-2"
stage="qa"
Multiple instances can run simultaneously in different worktrees. To avoid port conflicts:
- **Backend:** Set `STORKIT_PORT` to a unique port (default is 3001). Example: `STORKIT_PORT=3002 cargo run`
- **Frontend:** Run `npm run dev` from `frontend/`. It auto-selects the next unused port. It reads `STORKIT_PORT` to know which backend to talk to, so export it before running: `export STORKIT_PORT=3002 && cd frontend && npm run dev`
When running in a worktree, use a port that won't conflict with the main instance (3001). Ports 3002+ are good choices.
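For illustration, here is a sketch of how the documented default could be applied when reading `STORKIT_PORT`. The text does not say whether the real server falls back on an unparsable value or exits with an error. This version assumes a silent fallback, and `resolve_port` and `backend_port` are hypothetical names.

```rust
use std::env;

/// Default backend port from the README.
const DEFAULT_PORT: u16 = 3001;

/// Parse an optional `STORKIT_PORT` value, falling back to 3001 when it is
/// unset or not a valid port number (assumed behaviour).
pub fn resolve_port(raw: Option<&str>) -> u16 {
    raw.and_then(|v| v.trim().parse::<u16>().ok())
        .unwrap_or(DEFAULT_PORT)
}

/// Read the port from the process environment.
pub fn backend_port() -> u16 {
    resolve_port(env::var("STORKIT_PORT").ok().as_deref())
}
```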
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---
# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting
## Question
Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.
Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:
1. **Host isolation**: storkit can't touch anything outside the container
2. **Clean install/uninstall**: `docker run` to start, `docker rm` to remove
3. **Reproducible environment**: the same container works on any machine
4. **Distributable product**: `docker pull storkit` for new users
5. **Resource limits**: cap total CPU/memory for the whole system
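To make the resource-limit point concrete, here is a sketch that builds `docker run` arguments for the single-container setup. `--cpus`, `--memory`, `--publish`, `--detach` and `--name` are standard `docker run` flags. The image name `storkit` and container port 3001 follow the text above. The limit values and the `ContainerLimits` type are placeholders for the spike.

```rust
/// Limits for the single storkit container (values are placeholders).
pub struct ContainerLimits {
    pub cpus: f32,
    pub memory: String, // Docker memory syntax, e.g. "8g"
}

/// Build `docker run` arguments for the whole storkit system in one container.
pub fn docker_run_args(limits: &ContainerLimits, host_port: u16) -> Vec<String> {
    vec![
        "run".to_string(),
        "--detach".to_string(),
        "--name".to_string(),
        "storkit".to_string(),
        "--cpus".to_string(),
        limits.cpus.to_string(),
        "--memory".to_string(),
        limits.memory.clone(),
        "--publish".to_string(),
        format!("{host_port}:3001"), // host port -> backend port inside the container
        "storkit".to_string(),       // image name
    ]
}
```

The resulting vector could be handed to `std::process::Command::new("docker").args(...)`. Removing the container afterwards stays a plain `docker rm storkit`.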
name: "Abstract agent runtime to support non-Claude-Code backends"
---
# Refactor 343: Abstract agent runtime to support non-Claude-Code backends
## Current State
- TBD
## Desired State
Currently agent spawning is tightly coupled to the Claude Code CLI: agents are spawned as PTY processes running the `claude` binary. To support ChatGPT and Gemini as agent backends, we need to abstract the agent runtime.
The agent pool currently does:
1. Spawn `claude` CLI process via portable-pty
2. Stream JSON events from stdout
3. Parse tool calls, text output, thinking traces
4. Wait for process exit, run gates
This needs to become a trait so different backends can be plugged in:
- OpenAI API — calls ChatGPT via API with tool definitions, manages conversation loop
- Gemini API — calls Gemini via API with tool definitions, manages conversation loop
The key abstraction is: an agent runtime takes a prompt + tools and produces a stream of events (text output, tool calls, completion). The existing PTY/Claude Code logic becomes one implementation of this trait.
## Acceptance Criteria
- [ ] Define an AgentRuntime trait with methods for: start, stream_events, stop, get_status
- [ ] ClaudeCodeRuntime implements the trait using existing PTY spawning logic
- [ ] Agent pool uses the trait instead of directly spawning Claude Code
- [ ] Runtime selection is configurable per agent in project.toml (e.g. runtime = 'claude-code')
- [ ] All existing Claude Code agent functionality preserved
- [ ] Event stream format is runtime-agnostic (text, tool_call, thinking, done)
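Here is a synchronous sketch of the trait shape these criteria describe, with the four named methods and a runtime-agnostic event enum. A real version would likely be async and stream events over a channel. `AgentRequest`, `RuntimeStatus`, `run_to_completion` and `ScriptedRuntime` are hypothetical. `ScriptedRuntime` is a test double standing in for `ClaudeCodeRuntime`.

```rust
use std::mem;

/// Runtime-agnostic events (text, tool_call, thinking, done).
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    Text(String),
    ToolCall { name: String, arguments: String },
    Thinking(String),
    Done,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RuntimeStatus {
    Idle,
    Running,
    Stopped,
}

/// What every runtime receives: a prompt plus the tools it may call.
pub struct AgentRequest {
    pub prompt: String,
    pub tools: Vec<String>,
}

/// Synchronous sketch of the runtime trait.
pub trait AgentRuntime {
    fn start(&mut self, request: AgentRequest) -> Result<(), String>;
    /// Return events produced since the previous call (empty if none).
    fn stream_events(&mut self) -> Vec<AgentEvent>;
    fn stop(&mut self);
    fn get_status(&self) -> RuntimeStatus;
}

/// Hypothetical test double that replays a fixed script of events.
pub struct ScriptedRuntime {
    script: Vec<AgentEvent>,
    status: RuntimeStatus,
}

impl ScriptedRuntime {
    pub fn new(script: Vec<AgentEvent>) -> Self {
        Self { script, status: RuntimeStatus::Idle }
    }
}

impl AgentRuntime for ScriptedRuntime {
    fn start(&mut self, _request: AgentRequest) -> Result<(), String> {
        if self.status == RuntimeStatus::Running {
            return Err("already running".to_string());
        }
        self.status = RuntimeStatus::Running;
        Ok(())
    }

    fn stream_events(&mut self) -> Vec<AgentEvent> {
        if self.status != RuntimeStatus::Running {
            return Vec::new();
        }
        let events = mem::take(&mut self.script);
        if events.contains(&AgentEvent::Done) {
            self.status = RuntimeStatus::Stopped;
        }
        events
    }

    fn stop(&mut self) {
        self.status = RuntimeStatus::Stopped;
    }

    fn get_status(&self) -> RuntimeStatus {
        self.status
    }
}

/// Agent-pool side: drive any runtime without knowing its backend.
/// An empty batch ends the loop only because this sketch is synchronous.
pub fn run_to_completion(
    rt: &mut dyn AgentRuntime,
    request: AgentRequest,
) -> Result<Vec<AgentEvent>, String> {
    rt.start(request)?;
    let mut all = Vec::new();
    loop {
        let batch = rt.stream_events();
        let finished = batch.is_empty() || batch.contains(&AgentEvent::Done);
        all.extend(batch);
        if finished {
            break;
        }
    }
    if rt.get_status() == RuntimeStatus::Running {
        rt.stop();
    }
    Ok(all)
}
```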
As a project owner, I want to run agents using ChatGPT (GPT-4o, o3, etc.) via the OpenAI API, so that I can use OpenAI models for coding tasks alongside Claude.
## Acceptance Criteria
- [ ] Implement OpenAiRuntime using the AgentRuntime trait from refactor 343
- [ ] Supports GPT-4o and o3 models via the OpenAI chat completions API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to OpenAI function calling format
- [ ] Configurable in project.toml: runtime = 'openai', model = 'gpt-4o'
- [ ] OPENAI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)
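This is a std-only sketch of the MCP-to-OpenAI conversion. An MCP `tools/list` entry carries `name`, `description` and `inputSchema` (a JSON Schema). The Chat Completions API expects each tool as `{"type": "function", "function": {"name", "description", "parameters"}}`. The schema is passed through unchanged here, which assumes it needs no rewriting. A real implementation would more likely use `serde_json` than hand-built strings. The type and function names are hypothetical.

```rust
/// An MCP tool from `tools/list`; the JSON Schema is kept as raw JSON text
/// to avoid a serde dependency in this sketch.
pub struct McpTool {
    pub name: String,
    pub description: String,
    pub input_schema_json: String,
}

/// Encode a string as a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Wrap one MCP tool in the Chat Completions `tools` entry shape.
pub fn to_openai_tool(tool: &McpTool) -> String {
    format!(
        r#"{{"type":"function","function":{{"name":{},"description":{},"parameters":{}}}}}"#,
        json_escape(&tool.name),
        json_escape(&tool.description),
        tool.input_schema_json.trim()
    )
}

/// Build the `tools` array for a request body.
pub fn to_openai_tools(tools: &[McpTool]) -> String {
    let entries: Vec<String> = tools.iter().map(to_openai_tool).collect();
    format!("[{}]", entries.join(","))
}
```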
# Story 345: Gemini agent backend via Google AI API
## User Story
As a project owner, I want to run agents using Gemini (2.5 Pro, etc.) via the Google AI API, so that I can use Google models for coding tasks alongside Claude and ChatGPT.
## Acceptance Criteria
- [ ] Implement GeminiRuntime using the AgentRuntime trait from refactor 343
- [ ] Supports Gemini 2.5 Pro and other Gemini models via the Google AI generativeai API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to Gemini function calling format
- [ ] Configurable in project.toml: runtime = 'gemini', model = 'gemini-2.5-pro'
- [ ] GOOGLE_AI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)
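The conversation loop in these criteria is independent of the provider, so it is sketched here against a hypothetical `ChatModel` trait rather than a Gemini client. The loop sends the history and executes any requested tool calls (for example through an MCP `tools/call` request). It then appends the results and stops on a final answer or after `max_turns`. A `GeminiRuntime` would implement `ChatModel` by calling the Gemini API; that part is not shown. `EchoModel` is a test double.

```rust
/// Conversation history entries (roles simplified for the sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    User(String),
    ToolCall { name: String, args: String },
    ToolResult { name: String, output: String },
    Model(String),
}

/// What the model produced for one turn.
pub enum Turn {
    /// (tool name, JSON-encoded arguments) pairs.
    ToolCalls(Vec<(String, String)>),
    Final(String),
}

/// Hypothetical client interface; a Gemini client would implement this.
pub trait ChatModel {
    fn generate(&mut self, history: &[Message]) -> Turn;
}

/// Run tool calls through `call_tool` until the model gives a final answer.
pub fn run_conversation<M: ChatModel>(
    model: &mut M,
    prompt: &str,
    max_turns: usize,
    mut call_tool: impl FnMut(&str, &str) -> String,
) -> Result<(String, Vec<Message>), String> {
    let mut history = vec![Message::User(prompt.to_string())];
    for _ in 0..max_turns {
        match model.generate(&history) {
            Turn::Final(text) => {
                history.push(Message::Model(text.clone()));
                return Ok((text, history));
            }
            Turn::ToolCalls(calls) => {
                for (name, args) in calls {
                    let output = call_tool(&name, &args);
                    history.push(Message::ToolCall { name: name.clone(), args });
                    history.push(Message::ToolResult { name, output });
                }
            }
        }
    }
    Err(format!("no final answer after {max_turns} turns"))
}

/// Test double: asks for one tool call, then answers with the tool's output.
pub struct EchoModel;

impl ChatModel for EchoModel {
    fn generate(&mut self, history: &[Message]) -> Turn {
        match history.last() {
            Some(Message::ToolResult { output, .. }) => Turn::Final(format!("tool said: {output}")),
            _ => Turn::ToolCalls(vec![("get_story".to_string(), r#"{"id":"345"}"#.to_string())]),
        }
    }
}
```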
As a non-Claude agent connected via MCP, I want a code intelligence tool so that I can find function, struct, and type definitions without grepping through all files.
## Acceptance Criteria
- [ ] get_definitions tool — finds function/struct/enum/type/class definitions by name or pattern
- [ ] Supports Rust (fn, struct, enum, impl, trait) and TypeScript (function, class, interface, type) at minimum
- [ ] Returns file path, line number, and the definition signature
- [ ] Scoped to the agent's worktree
- [ ] Faster than grepping — uses tree-sitter or regex-based parsing
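Here is a naive sketch of `get_definitions` that uses a per-line token scan rather than tree-sitter or regular expressions. It skips common modifiers (`pub`, `async`, `export`, ...) and recognises the Rust and TypeScript keywords listed above. It misses generic `impl<T>` blocks, multi-line signatures and nested definitions, which is where a tree-sitter parser would help. `Definition` and `find_definitions` are hypothetical names.

```rust
/// One definition found in a source file.
#[derive(Debug, PartialEq)]
pub struct Definition {
    pub line: usize,        // 1-based line number
    pub kind: &'static str, // keyword that introduced it
    pub name: String,
    pub signature: String,  // the trimmed source line
}

/// Keywords from the acceptance criteria (Rust, then TypeScript).
const KEYWORDS: &[&str] = &[
    "fn", "struct", "enum", "impl", "trait",
    "function", "class", "interface", "type",
];

/// Modifiers skipped before the keyword (e.g. `pub async fn`, `export class`).
const MODIFIERS: &[&str] = &[
    "pub", "pub(crate)", "async", "unsafe", "export", "default", "declare", "abstract",
];

/// Line-based scan: definitions whose name contains `pattern`.
pub fn find_definitions(source: &str, pattern: &str) -> Vec<Definition> {
    let mut found = Vec::new();
    for (idx, raw) in source.lines().enumerate() {
        let line = raw.trim();
        let mut words = line.split_whitespace().skip_while(|w| MODIFIERS.contains(w));
        let Some(kw) = words.next() else { continue };
        let Some(kind) = KEYWORDS.iter().copied().find(|k| *k == kw) else { continue };
        let Some(next) = words.next() else { continue };
        // Identifier ends at the first non-identifier char (`(`, `<`, `{`, `:` ...).
        let name: String = next
            .chars()
            .take_while(|c| c.is_alphanumeric() || *c == '_')
            .collect();
        if !name.is_empty() && name.contains(pattern) {
            found.push(Definition { line: idx + 1, kind, name, signature: line.to_string() });
        }
    }
    found
}
```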
name: "Web UI agent assignment dropdown on work items"
---
# Story 339: Web UI agent assignment dropdown on work items
## User Story
As a project owner using the web UI, I want to select which agent to assign to a work item from a dropdown, so that I can control agent assignments visually.
## Acceptance Criteria
- [ ] Agent dropdown visible in expanded work item detail panel
- [ ] Shows available agents filtered by appropriate stage (coders for current, QA for qa, mergemaster for merge)
- [ ] Selecting an agent stops any current agent and starts the new one
- [ ] Updates the story front matter with the agent assignment
- [ ] Shows agent status (running, idle) in the dropdown
name: "Bot reset command to clear conversation context"
---
# Story 351: Bot reset command to clear conversation context
## User Story
As a project owner in a chat room, I want to type "{bot_name} reset" to drop the current Claude Code session and start fresh, so that I can reduce token usage when context gets bloated without restarting the server.
## Acceptance Criteria
- [ ] '{bot_name} reset' kills the current Claude Code session
- [ ] A new session starts immediately with clean context
- [ ] Memories persist via the file system (auto-memory directory is unchanged)
- [ ] Bot confirms the reset with a short message
- [ ] Registered in the command registry so it appears in help output
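Below is a sketch of a registry in which `reset` is registered like any other command, so `help` picks it up automatically. It assumes the bot name has already been stripped from the message. The `reset` handler only returns the confirmation text, because killing the session depends on storkit internals not shown here. `CommandRegistry`, `Handler` and the reply text are hypothetical.

```rust
use std::collections::BTreeMap;

/// A command handler receives the argument text and returns the reply.
pub type Handler = fn(&str) -> String;

/// Commands register once; `help` is generated from the registrations.
#[derive(Default)]
pub struct CommandRegistry {
    commands: BTreeMap<&'static str, (&'static str, Handler)>,
}

impl CommandRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn register(&mut self, name: &'static str, summary: &'static str, handler: Handler) {
        self.commands.insert(name, (summary, handler));
    }

    /// One "name - summary" line per registered command, sorted by name.
    pub fn help(&self) -> String {
        self.commands
            .iter()
            .map(|(name, (summary, _))| format!("{name} - {summary}"))
            .collect::<Vec<_>>()
            .join("\n")
    }

    /// `text` is the message with the bot name already stripped.
    /// `None` means "not a bot command": forward it to the LLM.
    pub fn dispatch(&self, text: &str) -> Option<String> {
        let text = text.trim();
        let (name, args) = text.split_once(' ').unwrap_or((text, ""));
        if name == "help" {
            return Some(self.help());
        }
        self.commands.get(name).map(|(_, handler)| handler(args.trim()))
    }
}

/// Hypothetical `reset` handler. The real one would kill the Claude Code
/// session and start a new one; the auto-memory directory is not touched.
pub fn reset(_args: &str) -> String {
    "Session reset. Starting with a clean context.".to_string()
}
```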
name: "Ambient on/off command not intercepted by bot after refactors"
---
# Bug 352: Ambient on/off command not intercepted by bot after refactors
## Description
The ambient on/off bot command stopped being intercepted by the bot after the recent refactors (328 split commands.rs into modules; 330 consolidated the chat transports into the chat/ module). Messages like "timmy ambient off", "ambient off", and "ambient on" are being forwarded to the LLM instead of being handled at the bot level. The ambient toggle was previously handled in bot.rs before the command registry dispatch. It may not have been wired up properly after the code moved into the chat/ module structure.
## How to Reproduce
1. Type "timmy ambient off" in a Matrix room where ambient mode is on
2. Observe that the message is forwarded to Claude instead of being intercepted
3. Same for "timmy ambient on", "ambient off", "ambient on"
## Actual Result
Ambient toggle commands are forwarded to the LLM as regular messages.
## Expected Result
Ambient toggle commands should be intercepted at the bot level and toggle ambient mode without invoking the LLM, with a confirmation message sent directly.
## Acceptance Criteria
- [ ] 'timmy ambient on' toggles ambient mode on and sends confirmation without LLM invocation
- [ ] 'timmy ambient off' toggles ambient mode off and sends confirmation without LLM invocation
- [ ] Ambient toggle works after refactors 328 and 330
- [ ] Ambient state persists in bot.toml as before
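Here is a sketch of the interception that could run before the command registry and LLM dispatch. It assumes the bot name is known and that matching is case-insensitive. Anything that is not exactly `ambient on` or `ambient off` (after an optional bot-name prefix) returns `None` and falls through unchanged. `parse_ambient` is a hypothetical name, and persisting the new state to `bot.toml` is not shown.

```rust
/// Requested ambient mode.
#[derive(Debug, PartialEq)]
pub enum Ambient {
    On,
    Off,
}

/// Recognise "ambient on" / "ambient off", optionally prefixed with the
/// bot's name ("timmy ambient off", "Timmy, ambient on"). `None` means the
/// message is not an ambient toggle and should continue to the command
/// registry / LLM unchanged.
pub fn parse_ambient(message: &str, bot_name: &str) -> Option<Ambient> {
    let lower = message.trim().to_lowercase();
    let bot = bot_name.to_lowercase();
    let mut words: Vec<&str> = lower.split_whitespace().collect();
    let starts_with_bot = words
        .first()
        .map(|w| w.trim_end_matches(|c: char| c == ',' || c == ':') == bot)
        .unwrap_or(false);
    if starts_with_bot {
        words.remove(0);
    }
    match words.as_slice() {
        ["ambient", "on"] => Some(Ambient::On),
        ["ambient", "off"] => Some(Ambient::Off),
        _ => None,
    }
}
```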
## How to Reproduce
1. Start the storkit server and open the web UI
2. Chat with the claude-code-pty model
3. Ask it to do something that requires a tool NOT in `.claude/settings.json` allow list (e.g. `wc -l /etc/hosts`, or WebFetch to a non-allowed domain)
## User Story
As a developer using storkit, I want pipeline auto-restarts to have a configurable retry limit so that failing agents don't loop infinitely consuming CPU and API credits.