rename .story_kit directory to .storkit and update all references

Renames the config directory and updates 514 references across 42 Rust
source files, plus CLAUDE.md, .gitignore, Makefile, script/release,
and .mcp.json files. All 1205 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dave
2026-03-20 11:34:53 +00:00
parent 375277f86e
commit 9581e5d51a
406 changed files with 531 additions and 530 deletions

View File

@@ -0,0 +1,20 @@
---
name: "Gate pipeline transitions on ensure_acceptance"
---
# Story 169: Gate pipeline transitions on ensure_acceptance
## User Story
As a project owner, I want story progression to be blocked unless ensure_acceptance passes, so that agents can't skip the testing workflow.
## Acceptance Criteria
- [ ] move_story_to_merge rejects stories that haven't passed ensure_acceptance
- [ ] accept_story rejects stories that haven't passed ensure_acceptance
- [ ] Rejection returns a clear error message telling the agent what's missing
- [ ] Existing passing stories (all criteria checked, tests recorded) still flow through normally
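Both transitions could share one guard. A minimal Rust sketch, assuming the story state tracks checked criteria and recorded test results (field and function names here are illustrative, not the actual storkit API):

```rust
// Hypothetical gate shared by move_story_to_merge and accept_story.
// A story may only progress once ensure_acceptance has passed.
pub struct StoryState {
    pub criteria_all_checked: bool,
    pub tests_recorded: bool,
}

/// Returns Ok(()) if the story may transition, or a message telling the
/// agent exactly what is missing.
pub fn ensure_acceptance_passed(s: &StoryState) -> Result<(), String> {
    if !s.criteria_all_checked {
        return Err("ensure_acceptance not passed: unchecked acceptance criteria remain".into());
    }
    if !s.tests_recorded {
        return Err("ensure_acceptance not passed: no test results recorded for this story".into());
    }
    Ok(())
}
```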
## Out of Scope
- TBD

@@ -0,0 +1,24 @@
---
name: "Upgrade libsqlite3-sys"
---
# Refactor 260: Upgrade libsqlite3-sys
## Description
Upgrade the `libsqlite3-sys` dependency from `0.35.0` to `0.37.0`. The crate is used with `features = ["bundled"]` for static builds.
## Version Notes
- Current: `libsqlite3-sys 0.35.0` (pinned transitively: `matrix-sdk 0.16.0` → `matrix-sdk-sqlite` → `rusqlite 0.37.x`)
- Target: `libsqlite3-sys 0.37.0`
- Latest upstream rusqlite: `0.39.0`
- **Blocker**: `matrix-sdk 0.16.0` pins `rusqlite 0.37.x` which pins `libsqlite3-sys 0.35.0`. A clean upgrade requires either waiting for matrix-sdk to bump their rusqlite dep, or upgrading matrix-sdk itself.
- **Reverted 2026-03-17**: A previous coder vendored the entire rusqlite crate with a fake `0.37.99` version and patched its libsqlite3-sys dep. This was too hacky — reverted to clean `0.35.0`.
## Acceptance Criteria
- [ ] `libsqlite3-sys` is upgraded to `0.37.0` via a clean dependency path (no vendored forks)
- [ ] `cargo build` succeeds
- [ ] All tests pass
- [ ] No `[patch.crates-io]` hacks or vendored crates

@@ -0,0 +1,69 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---
# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting
## Question
Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.
Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:
1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
3. **Reproducible environment** — same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system
## Architecture
```
Docker Container (single)
├── storkit server
│ ├── Matrix bot
│ ├── WhatsApp webhook
│ ├── Slack webhook
│ ├── Web UI
│ └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```
## Key questions to answer
- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: Container exposes web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass ANTHROPIC_API_KEY as env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?
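The bullets above could be exercised by a single invocation; a hypothetical sketch, where the `storkit` image name and `storkit-cargo` volume name are illustrative. `--cpus`/`--memory` cap the whole system, the first `-v` bind-mounts the project repo, the second persists the cargo registry across restarts, and `-e` forwards the API key from the host environment.

```
docker run -d --name storkit \
  --cpus 4 --memory 8g \
  -p 3000:3000 \
  -e ANTHROPIC_API_KEY \
  -v "$PWD:/workspace" \
  -v storkit-cargo:/root/.cargo/registry \
  storkit
```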
## Deliverable
A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.
## Hypothesis
- TBD
## Timebox
- TBD
## Investigation Plan
- TBD
## Findings
- TBD
## Recommendation
- TBD

@@ -0,0 +1,40 @@
---
name: "Abstract agent runtime to support non-Claude-Code backends"
---
# Refactor 343: Abstract agent runtime to support non-Claude-Code backends
## Current State
Agent spawning is tightly coupled to the Claude Code CLI — agents are spawned as PTY processes running the `claude` binary. The agent pool currently does:
1. Spawn `claude` CLI process via portable-pty
2. Stream JSON events from stdout
3. Parse tool calls, text output, thinking traces
4. Wait for process exit, run gates
## Desired State
To support ChatGPT and Gemini as agent backends, this flow becomes a trait so different backends can be plugged in:
- Claude Code (existing) — spawns `claude` CLI, parses JSON stream
- OpenAI API — calls ChatGPT via API with tool definitions, manages conversation loop
- Gemini API — calls Gemini via API with tool definitions, manages conversation loop
The key abstraction: an agent runtime takes a prompt + tools and produces a stream of events (text output, tool calls, completion). The existing PTY/Claude Code logic becomes one implementation of this trait.
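A std-only Rust sketch of what this trait might look like — names and signatures are illustrative (the real trait would likely be async and stream events), with a stub backend standing in for ClaudeCodeRuntime:

```rust
// Runtime-agnostic events, matching the text/tool_call/thinking/done shape
// from the acceptance criteria.
#[derive(Debug, PartialEq)]
pub enum AgentEvent {
    Text(String),
    Thinking(String),
    ToolCall { name: String, arguments: String }, // JSON payload as a string
    Done,
}

// Hypothetical trait: prompt + tools in, events out.
pub trait AgentRuntime {
    fn start(&mut self, prompt: &str, tools: &[String]) -> Result<(), String>;
    fn next_event(&mut self) -> Option<AgentEvent>; // None once drained
    fn stop(&mut self);
    fn get_status(&self) -> &'static str;
}

// Stub backend showing how ClaudeCodeRuntime / OpenAiRuntime / GeminiRuntime
// would each plug in behind the same trait.
pub struct StubRuntime {
    queue: Vec<AgentEvent>,
    running: bool,
}

impl AgentRuntime for StubRuntime {
    fn start(&mut self, prompt: &str, _tools: &[String]) -> Result<(), String> {
        self.queue = vec![AgentEvent::Text(format!("echo: {prompt}")), AgentEvent::Done];
        self.running = true;
        Ok(())
    }
    fn next_event(&mut self) -> Option<AgentEvent> {
        if self.queue.is_empty() {
            self.running = false;
            return None;
        }
        Some(self.queue.remove(0))
    }
    fn stop(&mut self) {
        self.running = false;
    }
    fn get_status(&self) -> &'static str {
        if self.running { "running" } else { "idle" }
    }
}
```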
## Acceptance Criteria
- [ ] Define an AgentRuntime trait with methods for: start, stream_events, stop, get_status
- [ ] ClaudeCodeRuntime implements the trait using existing PTY spawning logic
- [ ] Agent pool uses the trait instead of directly spawning Claude Code
- [ ] Runtime selection is configurable per agent in project.toml (e.g. runtime = 'claude-code')
- [ ] All existing Claude Code agent functionality preserved
- [ ] Event stream format is runtime-agnostic (text, tool_call, thinking, done)
- [ ] Token usage tracking works across runtimes
## Out of Scope
- TBD

@@ -0,0 +1,25 @@
---
name: "ChatGPT agent backend via OpenAI API"
---
# Story 344: ChatGPT agent backend via OpenAI API
## User Story
As a project owner, I want to run agents using ChatGPT (GPT-4o, o3, etc.) via the OpenAI API, so that I can use OpenAI models for coding tasks alongside Claude.
## Acceptance Criteria
- [ ] Implement OpenAiRuntime using the AgentRuntime trait from refactor 343
- [ ] Supports GPT-4o and o3 models via the OpenAI chat completions API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to OpenAI function calling format
- [ ] Configurable in project.toml: runtime = 'openai', model = 'gpt-4o'
- [ ] OPENAI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)
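The MCP-to-OpenAI conversion is mostly re-nesting: OpenAI's chat completions API expects each tool as `{"type":"function","function":{name, description, parameters}}`, with the MCP input schema passed through as `parameters`. A hypothetical std-only sketch (the struct is illustrative; real code would use serde_json rather than string formatting, which does not escape quotes in names or descriptions):

```rust
// Illustrative MCP tool shape: the JSON Schema arrives already serialized.
pub struct McpTool {
    pub name: String,
    pub description: String,
    pub input_schema: String,
}

/// Wrap one MCP tool in the OpenAI function-calling envelope.
pub fn to_openai_tool(t: &McpTool) -> String {
    format!(
        r#"{{"type":"function","function":{{"name":"{}","description":"{}","parameters":{}}}}}"#,
        t.name, t.description, t.input_schema
    )
}
```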
## Out of Scope
- TBD

@@ -0,0 +1,25 @@
---
name: "Gemini agent backend via Google AI API"
---
# Story 345: Gemini agent backend via Google AI API
## User Story
As a project owner, I want to run agents using Gemini (2.5 Pro, etc.) via the Google AI API, so that I can use Google models for coding tasks alongside Claude and ChatGPT.
## Acceptance Criteria
- [ ] Implement GeminiRuntime using the AgentRuntime trait from refactor 343
- [ ] Supports Gemini 2.5 Pro and other Gemini models via the Google AI API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to Gemini function calling format
- [ ] Configurable in project.toml: runtime = 'gemini', model = 'gemini-2.5-pro'
- [ ] GOOGLE_AI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)
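The Gemini side of the conversion nests all declarations under one `functionDeclarations` array rather than wrapping each tool individually. A hypothetical sketch mirroring the OpenAI conversion (tuples of name/description/schema stand in for a real tool struct; real code would use serde_json):

```rust
/// Convert (name, description, json_schema) triples into the Gemini
/// tools payload: {"functionDeclarations":[{name, description, parameters}, …]}.
pub fn to_gemini_tools(tools: &[(String, String, String)]) -> String {
    let decls: Vec<String> = tools
        .iter()
        .map(|(name, desc, schema)| {
            format!(r#"{{"name":"{name}","description":"{desc}","parameters":{schema}}}"#)
        })
        .collect();
    format!(r#"{{"functionDeclarations":[{}]}}"#, decls.join(","))
}
```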
## Out of Scope
- TBD

@@ -0,0 +1,22 @@
---
name: "MCP tools for code search (grep and glob)"
---
# Story 348: MCP tools for code search (grep and glob)
## User Story
As a non-Claude agent connected via MCP, I want search tools so that I can find files and search code contents in my worktree.
## Acceptance Criteria
- [ ] grep tool — searches file contents with regex support, returns matching lines with context
- [ ] glob tool — finds files by pattern (e.g. '**/*.rs')
- [ ] Both scoped to the agent's worktree
- [ ] grep supports output modes: content (matching lines), files_with_matches (just paths), count
- [ ] grep supports context lines (-A, -B, -C)
- [ ] Results limited to prevent overwhelming the LLM context
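The per-file core of the grep tool could look like the sketch below — fixed-string matching only, with a result cap; the real tool would use a regex engine, walk the whole worktree, and add -A/-B/-C context (all names here are illustrative):

```rust
use std::fs;
use std::path::Path;

pub enum OutputMode {
    Content,          // matching lines
    FilesWithMatches, // just paths
    Count,
}

/// Search one file for a fixed-string pattern, capping hits so results
/// don't overwhelm the LLM context.
pub fn grep_file(path: &Path, pattern: &str, mode: &OutputMode, limit: usize) -> Vec<String> {
    let Ok(text) = fs::read_to_string(path) else { return Vec::new() };
    let hits: Vec<(usize, &str)> = text
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(pattern))
        .take(limit)
        .collect();
    match mode {
        OutputMode::Content => hits
            .iter()
            .map(|(n, line)| format!("{}:{}:{}", path.display(), n + 1, line))
            .collect(),
        OutputMode::FilesWithMatches => {
            if hits.is_empty() { Vec::new() } else { vec![path.display().to_string()] }
        }
        OutputMode::Count => vec![format!("{}:{}", path.display(), hits.len())],
    }
}
```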
## Out of Scope
- TBD

@@ -0,0 +1,23 @@
---
name: "MCP tools for git operations"
---
# Story 349: MCP tools for git operations
## User Story
As a non-Claude agent connected via MCP, I want git tools so that I can check status, stage files, commit changes, and view history in my worktree.
## Acceptance Criteria
- [ ] git_status tool — returns working tree status (staged, unstaged, untracked files)
- [ ] git_diff tool — returns diff output, supports staged/unstaged/commit range
- [ ] git_add tool — stages files by path
- [ ] git_commit tool — commits staged changes with a message
- [ ] git_log tool — returns commit history with configurable count and format
- [ ] All operations run in the agent's worktree
- [ ] Cannot push, force-push, or modify remote — server handles that
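One way to scope every operation is to shell out to git with `-C`, so each command runs inside the agent's worktree and push/remote commands are simply never exposed through the tool surface. A hypothetical sketch of the first tool:

```rust
use std::path::Path;
use std::process::Command;

/// git_status: machine-readable working-tree status, pinned to the worktree.
pub fn git_status(worktree: &Path) -> std::io::Result<String> {
    let out = Command::new("git")
        .arg("-C")
        .arg(worktree) // run inside the worktree only
        .args(["status", "--porcelain"]) // stable, parseable output
        .output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}
```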
## Out of Scope
- TBD

@@ -0,0 +1,21 @@
---
name: "MCP tool for code definitions lookup"
---
# Story 350: MCP tool for code definitions lookup
## User Story
As a non-Claude agent connected via MCP, I want a code intelligence tool so that I can find function, struct, and type definitions without grepping through all files.
## Acceptance Criteria
- [ ] get_definitions tool — finds function/struct/enum/type/class definitions by name or pattern
- [ ] Supports Rust (fn, struct, enum, impl, trait) and TypeScript (function, class, interface, type) at minimum
- [ ] Returns file path, line number, and the definition signature
- [ ] Scoped to the agent's worktree
- [ ] Faster than grepping — uses tree-sitter or regex-based parsing
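For the regex-free end of the spectrum, simple prefix matching already finds most Rust fn/struct/enum/trait definitions; the real tool might use tree-sitter instead. A hypothetical sketch (names illustrative):

```rust
// A found definition: 1-based line number plus the signature line.
pub struct Definition {
    pub line: usize,
    pub signature: String,
}

/// Scan Rust source for definitions whose name starts with `name`.
pub fn find_definitions(source: &str, name: &str) -> Vec<Definition> {
    const KEYWORDS: [&str; 5] = ["fn ", "struct ", "enum ", "trait ", "impl "];
    source
        .lines()
        .enumerate()
        .filter_map(|(n, line)| {
            let t = line.trim_start().trim_start_matches("pub ");
            let hit = KEYWORDS
                .iter()
                .any(|k| t.strip_prefix(k).map_or(false, |rest| rest.starts_with(name)));
            hit.then(|| Definition { line: n + 1, signature: line.trim().to_string() })
        })
        .collect()
}
```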
## Out of Scope
- TBD

@@ -0,0 +1,31 @@
---
name: Agent Security and Sandboxing
---
# Story 34: Agent Security and Sandboxing
## User Story
**As a** supervisor orchestrating multiple autonomous agents,
**I want to** constrain what each agent can access and do,
**So that** agents can't escape their worktree, damage shared state, or perform unintended actions.
## Acceptance Criteria
- [ ] Agent creation accepts an `allowed_tools` list to restrict Claude Code tool access per agent.
- [ ] Agent creation accepts a `disallowed_tools` list as an alternative to allowlisting.
- [ ] Agents without Bash access can still perform useful coding work (Read, Edit, Write, Glob, Grep).
- [ ] Investigate replacing direct Bash/shell access with Rust-implemented tool proxies that enforce boundaries:
- Scoped `exec_shell` that only runs allowlisted commands (e.g., `cargo test`, `npm test`) within the agent's worktree.
- Scoped `read_file` / `write_file` that reject paths outside the agent's worktree root.
- Scoped `git` operations that only work within the agent's worktree.
- [ ] Evaluate `--max-turns` and `--max-budget-usd` as safety limits for runaway agents.
- [ ] Document the trust model: what the supervisor controls vs what agents can do autonomously.
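The scoped `read_file` idea from the criteria can be sketched in a few lines: canonicalize both the worktree root and the requested path before touching the file, so `..` traversal, absolute paths, and symlinks can't escape the root. A hypothetical, std-only sketch:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read a file, rejecting any path that resolves outside the worktree.
pub fn scoped_read(worktree: &Path, requested: &str) -> io::Result<String> {
    let root = worktree.canonicalize()?;
    // join + canonicalize resolves `..` and symlinks; an absolute `requested`
    // replaces `root` entirely, which the prefix check below also catches.
    let target = root.join(requested).canonicalize()?;
    if !target.starts_with(&root) {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes the agent's worktree",
        ));
    }
    fs::read_to_string(target)
}
```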
## Questions to Explore
- Can we use MCP (Model Context Protocol) to expose our Rust-implemented tools to Claude Code, replacing its built-in Bash/filesystem tools with scoped versions?
- What's the right granularity for shell allowlists — command-level (`cargo test`) or pattern-level (`cargo *`)?
- Should agents have read access outside their worktree (e.g., to reference shared specs) but write access only within it?
- Is OS-level sandboxing (Docker, macOS sandbox profiles) worth the complexity for a personal tool?
## Out of Scope
- Multi-user authentication or authorization (single-user personal tool).
- Network-level isolation between agents.
- Encrypting agent communication channels (all local).

@@ -0,0 +1,18 @@
---
name: Live Test Gate Updates
---
# Story 57: Live Test Gate Updates
## User Story
As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.
## Acceptance Criteria
- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: record test results via MCP, verify Gate panel updates without manual refresh
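The fan-out can be sketched with std channels — one sender per `/ws` connection, same notification JSON pushed to each. The real server presumably uses an async broadcast channel; names here are illustrative:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Minimal hub: one outbound channel per connected /ws client.
pub struct Hub {
    clients: Vec<Sender<String>>,
}

impl Hub {
    pub fn new() -> Self {
        Hub { clients: Vec::new() }
    }
    pub fn subscribe(&mut self) -> Receiver<String> {
        let (tx, rx) = channel();
        self.clients.push(tx);
        rx
    }
    /// Called whenever tests are recorded, acceptance is checked, or
    /// coverage is collected.
    pub fn notify_tests(&self) {
        let msg = r#"{"type":"notification","topic":"tests"}"#.to_string();
        for c in &self.clients {
            let _ = c.send(msg.clone()); // ignore disconnected clients
        }
    }
}
```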

@@ -0,0 +1,21 @@
---
name: "Fetch real context window size from Anthropic models API"
---
# Story 90: Fetch real context window size from Anthropic models API
## User Story
As a user chatting with a Claude model, I want the context remaining indicator to show the actual context window size for the selected model (fetched from the Anthropic API) instead of a hardcoded value, so that the indicator is accurate across all current and future models.
## Acceptance Criteria
- [ ] Backend AnthropicModelInfo struct deserializes the context_window field from the Anthropic /v1/models response
- [ ] Backend /anthropic/models endpoint returns both model ID and context window size to the frontend
- [ ] Frontend uses the real context window size from the API response instead of the hardcoded getContextWindowSize map for Anthropic models
- [ ] Context indicator in ChatHeader displays the correct percentage based on the real context window size
- [ ] Hardcoded fallback remains for Ollama/local models that don't provide context window metadata
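The fallback rule reduces to a couple of small functions; a hypothetical sketch, where the struct, field names, and the 200k default are illustrative assumptions rather than the actual storkit types:

```rust
// Illustrative model record: API-reported window is optional because
// Ollama/local models may not provide it.
pub struct ModelInfo {
    pub id: String,
    pub context_window: Option<u64>,
}

const FALLBACK_CONTEXT_WINDOW: u64 = 200_000; // assumed hardcoded default

pub fn effective_context_window(m: &ModelInfo) -> u64 {
    m.context_window.unwrap_or(FALLBACK_CONTEXT_WINDOW)
}

/// Percentage of context remaining, as shown by the ChatHeader indicator.
pub fn context_remaining_pct(m: &ModelInfo, used_tokens: u64) -> f64 {
    let total = effective_context_window(m) as f64;
    ((total - used_tokens as f64).max(0.0) / total) * 100.0
}
```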
## Out of Scope
- TBD