rename .story_kit directory to .storkit and update all references
Renames the config directory and updates 514 references across 42 Rust source files, plus CLAUDE.md, .gitignore, Makefile, script/release, and .mcp.json files. All 1205 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.storkit/work/1_backlog/.gitkeep
@@ -0,0 +1,20 @@
---
name: "Gate pipeline transitions on ensure_acceptance"
---

# Story 169: Gate pipeline transitions on ensure_acceptance

## User Story

As a project owner, I want story progression to be blocked unless ensure_acceptance passes, so that agents can't skip the testing workflow.

## Acceptance Criteria

- [ ] move_story_to_merge rejects stories that haven't passed ensure_acceptance
- [ ] accept_story rejects stories that haven't passed ensure_acceptance
- [ ] Rejection returns a clear error message telling the agent what's missing
- [ ] Existing passing stories (all criteria checked, tests recorded) still flow through normally

## Out of Scope

- TBD
@@ -0,0 +1,24 @@
---
name: "Upgrade libsqlite3-sys"
---

# Refactor 260: Upgrade libsqlite3-sys

## Description

Upgrade the `libsqlite3-sys` dependency from `0.35.0` to `0.37.0`. The crate is used with `features = ["bundled"]` for static builds.

## Version Notes

- Current: `libsqlite3-sys 0.35.0` (pinned transitively by `matrix-sdk 0.16.0` → `matrix-sdk-sqlite` → `rusqlite 0.37.x`)
- Target: `libsqlite3-sys 0.37.0`
- Latest upstream rusqlite: `0.39.0`
- **Blocker**: `matrix-sdk 0.16.0` pins `rusqlite 0.37.x`, which pins `libsqlite3-sys 0.35.0`. A clean upgrade requires either waiting for matrix-sdk to bump its rusqlite dependency, or upgrading matrix-sdk itself.
- **Reverted 2026-03-17**: A previous coder vendored the entire rusqlite crate with a fake `0.37.99` version and patched its libsqlite3-sys dep. This was too hacky — reverted to clean `0.35.0`.

## Acceptance Criteria

- [ ] `libsqlite3-sys` is upgraded to `0.37.0` via a clean dependency path (no vendored forks)
- [ ] `cargo build` succeeds
- [ ] All tests pass
- [ ] No `[patch.crates-io]` hacks or vendored crates
@@ -0,0 +1,69 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---

# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting

## Question

Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.

Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:

1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
3. **Reproducible environment** — the same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system

## Architecture

```
Docker Container (single)
├── storkit server
│   ├── Matrix bot
│   ├── WhatsApp webhook
│   ├── Slack webhook
│   ├── Web UI
│   └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```

## Key Questions

- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: The container exposes the web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass ANTHROPIC_API_KEY as an env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? It needs to persist or live in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?

## Deliverable

A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.

## Hypothesis

- TBD

## Timebox

- TBD

## Investigation Plan

- TBD

## Findings

- TBD

## Recommendation

- TBD
@@ -0,0 +1,40 @@
---
name: "Abstract agent runtime to support non-Claude-Code backends"
---

# Refactor 343: Abstract agent runtime to support non-Claude-Code backends

## Current State

- TBD

## Desired State

Currently agent spawning is tightly coupled to the Claude Code CLI — agents are spawned as PTY processes running the `claude` binary. To support ChatGPT and Gemini as agent backends, we need to abstract the agent runtime.

The agent pool currently does:

1. Spawn the `claude` CLI process via portable-pty
2. Stream JSON events from stdout
3. Parse tool calls, text output, thinking traces
4. Wait for process exit, run gates

This needs to become a trait so different backends can be plugged in:

- Claude Code (existing) — spawns the `claude` CLI, parses the JSON stream
- OpenAI API — calls ChatGPT via API with tool definitions, manages the conversation loop
- Gemini API — calls Gemini via API with tool definitions, manages the conversation loop

The key abstraction: an agent runtime takes a prompt plus tools and produces a stream of events (text output, tool calls, completion). The existing PTY/Claude Code logic becomes one implementation of this trait.
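The trait shape can be sketched. Method and event names follow the acceptance criteria (start, stream_events, stop, get_status; text/tool_call/thinking/done), but the signatures and the toy backend are assumptions, not the final API:

```rust
// Sketch only: signatures and the toy runtime are assumptions.

/// Runtime-agnostic events every backend emits.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    Text(String),
    ToolCall { name: String, arguments: String },
    Thinking(String),
    Done,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AgentStatus {
    Idle,
    Running,
    Finished,
}

/// The proposed abstraction: each backend (Claude Code PTY, OpenAI API,
/// Gemini API) implements this, and the agent pool stays backend-agnostic.
pub trait AgentRuntime {
    fn start(&mut self, prompt: &str);
    /// Drain events produced since the last call.
    fn stream_events(&mut self) -> Vec<AgentEvent>;
    fn stop(&mut self);
    fn get_status(&self) -> AgentStatus;
}

/// Toy in-process runtime standing in for a real backend such as a
/// hypothetical ClaudeCodeRuntime.
pub struct EchoRuntime {
    pub status: AgentStatus,
    pub pending: Vec<AgentEvent>,
}

impl AgentRuntime for EchoRuntime {
    fn start(&mut self, prompt: &str) {
        // A real backend would spawn a process or open an API stream here.
        self.pending.push(AgentEvent::Text(format!("echo: {prompt}")));
        self.pending.push(AgentEvent::Done);
        self.status = AgentStatus::Finished;
    }
    fn stream_events(&mut self) -> Vec<AgentEvent> {
        std::mem::take(&mut self.pending)
    }
    fn stop(&mut self) {
        self.status = AgentStatus::Idle;
    }
    fn get_status(&self) -> AgentStatus {
        self.status
    }
}
```

Under this shape, the existing portable-pty spawning logic would live behind the same four methods, and the API backends would drive their conversation loops internally while emitting the same events.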
## Acceptance Criteria

- [ ] Define an `AgentRuntime` trait with methods for: start, stream_events, stop, get_status
- [ ] `ClaudeCodeRuntime` implements the trait using the existing PTY spawning logic
- [ ] The agent pool uses the trait instead of directly spawning Claude Code
- [ ] Runtime selection is configurable per agent in project.toml (e.g. runtime = 'claude-code')
- [ ] All existing Claude Code agent functionality is preserved
- [ ] The event stream format is runtime-agnostic (text, tool_call, thinking, done)
- [ ] Token usage tracking works across runtimes

## Out of Scope

- TBD
@@ -0,0 +1,25 @@
---
name: "ChatGPT agent backend via OpenAI API"
---

# Story 344: ChatGPT agent backend via OpenAI API

## User Story

As a project owner, I want to run agents using ChatGPT (GPT-4o, o3, etc.) via the OpenAI API, so that I can use OpenAI models for coding tasks alongside Claude.

## Acceptance Criteria

- [ ] Implement `OpenAiRuntime` using the `AgentRuntime` trait from refactor 343
- [ ] Supports GPT-4o and o3 models via the OpenAI chat completions API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to OpenAI function calling format
- [ ] Configurable in project.toml: runtime = 'openai', model = 'gpt-4o'
- [ ] OPENAI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)
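The tool-definition conversion in these criteria can be sketched concretely. The output shape (`type`/`function`/`name`/`description`/`parameters`) matches OpenAI's function calling tools array; the flat `(name, description, schema)` input is a simplified stand-in for a real MCP tool definition, and string-assembled JSON keeps the sketch dependency-free where a real implementation would use serde:

```rust
// Sketch: convert one simplified MCP tool definition into an entry for the
// OpenAI chat completions "tools" array. Input shape is an assumption.
fn mcp_tool_to_openai(name: &str, description: &str, input_schema_json: &str) -> String {
    format!(
        r#"{{"type":"function","function":{{"name":"{}","description":"{}","parameters":{}}}}}"#,
        name, description, input_schema_json
    )
}
```

The conversation loop would send these alongside the prompt, execute any returned tool calls against the MCP server, and feed results back until the model stops requesting tools.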
## Out of Scope

- TBD
@@ -0,0 +1,25 @@
---
name: "Gemini agent backend via Google AI API"
---

# Story 345: Gemini agent backend via Google AI API

## User Story

As a project owner, I want to run agents using Gemini (2.5 Pro, etc.) via the Google AI API, so that I can use Google models for coding tasks alongside Claude and ChatGPT.

## Acceptance Criteria

- [ ] Implement `GeminiRuntime` using the `AgentRuntime` trait from refactor 343
- [ ] Supports Gemini 2.5 Pro and other Gemini models via the Google AI generativeai API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to Gemini function calling format
- [ ] Configurable in project.toml: runtime = 'gemini', model = 'gemini-2.5-pro'
- [ ] GOOGLE_AI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)

## Out of Scope

- TBD
@@ -0,0 +1,22 @@
---
name: "MCP tools for code search (grep and glob)"
---

# Story 348: MCP tools for code search (grep and glob)

## User Story

As a non-Claude agent connected via MCP, I want search tools so that I can find files and search code contents in my worktree.

## Acceptance Criteria

- [ ] grep tool — searches file contents with regex support, returns matching lines with context
- [ ] glob tool — finds files by pattern (e.g. '**/*.rs')
- [ ] Both are scoped to the agent's worktree
- [ ] grep supports output modes: content (matching lines), files_with_matches (just paths), count
- [ ] grep supports context lines (-A, -B, -C)
- [ ] Results are limited to prevent overwhelming the LLM context
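The three output modes and the result cap can be illustrated with a toy grep over in-memory "files". A real implementation would use regex matching and walk the agent's worktree; the names and shapes here are assumptions:

```rust
// Toy sketch of the grep tool's output modes. Substring match stands in
// for regex; (path, contents) pairs stand in for worktree files.
#[derive(Debug, Clone, Copy)]
enum OutputMode {
    Content,          // matching lines with location
    FilesWithMatches, // just paths
    Count,            // match count per file
}

fn grep(files: &[(&str, &str)], pattern: &str, mode: OutputMode, limit: usize) -> Vec<String> {
    let mut out = Vec::new();
    for (path, body) in files {
        let hits: Vec<(usize, &str)> = body
            .lines()
            .enumerate()
            .filter(|(_, line)| line.contains(pattern))
            .collect();
        if hits.is_empty() {
            continue;
        }
        match mode {
            OutputMode::Content => {
                for (n, line) in &hits {
                    out.push(format!("{path}:{}:{line}", n + 1));
                }
            }
            OutputMode::FilesWithMatches => out.push(path.to_string()),
            OutputMode::Count => out.push(format!("{path}:{}", hits.len())),
        }
        // Cap results so large matches can't overwhelm the LLM context.
        if out.len() >= limit {
            out.truncate(limit);
            break;
        }
    }
    out
}
```

Context lines (-A/-B/-C) would extend the `Content` arm to emit neighbouring lines around each hit.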
## Out of Scope

- TBD
@@ -0,0 +1,23 @@
---
name: "MCP tools for git operations"
---

# Story 349: MCP tools for git operations

## User Story

As a non-Claude agent connected via MCP, I want git tools so that I can check status, stage files, commit changes, and view history in my worktree.

## Acceptance Criteria

- [ ] git_status tool — returns working tree status (staged, unstaged, untracked files)
- [ ] git_diff tool — returns diff output, supports staged/unstaged/commit range
- [ ] git_add tool — stages files by path
- [ ] git_commit tool — commits staged changes with a message
- [ ] git_log tool — returns commit history with configurable count and format
- [ ] All operations run in the agent's worktree
- [ ] Cannot push, force-push, or modify remote — the server handles that
@@ -0,0 +1,21 @@
---
name: "MCP tool for code definitions lookup"
---

# Story 350: MCP tool for code definitions lookup

## User Story

As a non-Claude agent connected via MCP, I want a code intelligence tool so that I can find function, struct, and type definitions without grepping through all files.

## Acceptance Criteria

- [ ] get_definitions tool — finds function/struct/enum/type/class definitions by name or pattern
- [ ] Supports Rust (fn, struct, enum, impl, trait) and TypeScript (function, class, interface, type) at minimum
- [ ] Returns file path, line number, and the definition signature
- [ ] Scoped to the agent's worktree
- [ ] Faster than grepping — uses tree-sitter or regex-based parsing
@@ -0,0 +1,31 @@
---
name: Agent Security and Sandboxing
---

# Story 34: Agent Security and Sandboxing

## User Story

**As a** supervisor orchestrating multiple autonomous agents,
**I want to** constrain what each agent can access and do,
**So that** agents can't escape their worktree, damage shared state, or perform unintended actions.

## Acceptance Criteria

- [ ] Agent creation accepts an `allowed_tools` list to restrict Claude Code tool access per agent.
- [ ] Agent creation accepts a `disallowed_tools` list as an alternative to allowlisting.
- [ ] Agents without Bash access can still perform useful coding work (Read, Edit, Write, Glob, Grep).
- [ ] Investigate replacing direct Bash/shell access with Rust-implemented tool proxies that enforce boundaries:
  - Scoped `exec_shell` that only runs allowlisted commands (e.g., `cargo test`, `npm test`) within the agent's worktree.
  - Scoped `read_file` / `write_file` that reject paths outside the agent's worktree root.
  - Scoped `git` operations that only work within the agent's worktree.
- [ ] Evaluate `--max-turns` and `--max-budget-usd` as safety limits for runaway agents.
- [ ] Document the trust model: what the supervisor controls vs what agents can do autonomously.
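The worktree boundary check behind scoped `read_file`/`write_file` can be sketched. This is an assumption-level illustration, not the hardened implementation; it glosses over canonicalization of paths that do not yet exist and over symlink traversal, both of which a real proxy would have to handle:

```rust
// Sketch: resolve an agent-supplied path against its worktree root,
// rejecting anything that would escape the worktree.
use std::path::{Component, Path, PathBuf};

fn resolve_in_worktree(worktree_root: &Path, requested: &str) -> Option<PathBuf> {
    let requested = Path::new(requested);
    if requested.is_absolute() {
        return None; // agents address files relative to their worktree
    }
    let mut resolved = worktree_root.to_path_buf();
    for part in requested.components() {
        match part {
            Component::Normal(segment) => resolved.push(segment),
            Component::CurDir => {} // "./" is harmless
            // ParentDir ("..") and any prefix/root component could climb
            // out of the worktree: reject outright rather than resolve.
            _ => return None,
        }
    }
    Some(resolved)
}
```

A scoped tool would run this check first and return an error to the agent on `None`, before touching the filesystem.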
## Questions to Explore

- Can we use MCP (Model Context Protocol) to expose our Rust-implemented tools to Claude Code, replacing its built-in Bash/filesystem tools with scoped versions?
- What's the right granularity for shell allowlists — command-level (`cargo test`) or pattern-level (`cargo *`)?
- Should agents have read access outside their worktree (e.g., to reference shared specs) but write access only within it?
- Is OS-level sandboxing (Docker, macOS sandbox profiles) worth the complexity for a personal tool?

## Out of Scope

- Multi-user authentication or authorization (single-user personal tool).
- Network-level isolation between agents.
- Encrypting agent communication channels (all local).
.storkit/work/1_backlog/57_story_live_test_gate_updates.md
@@ -0,0 +1,18 @@
---
name: Live Test Gate Updates
---

# Story 57: Live Test Gate Updates

## User Story

As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: record test results via MCP, verify the Gate panel updates without a manual refresh
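The `/ws` payload named in the first criterion can be sketched as a tiny server-side type. Hand-rolled serialization keeps the example dependency-free; the real server would presumably build this with serde, and the struct name is an assumption:

```rust
// Sketch of the notification broadcast over /ws when test state changes.
struct Notification {
    topic: &'static str, // e.g. "tests"
}

impl Notification {
    fn to_json(&self) -> String {
        format!(r#"{{"type": "notification", "topic": "{}"}}"#, self.topic)
    }
}
```

On the frontend, GatePanel and TodoPanel would match on `topic == "tests"` and re-fetch their data in place, which is what keeps scroll position intact.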
@@ -0,0 +1,21 @@
---
name: "Fetch real context window size from Anthropic models API"
---

# Story 90: Fetch real context window size from Anthropic models API

## User Story

As a user chatting with a Claude model, I want the context-remaining indicator to show the actual context window size for the selected model (fetched from the Anthropic API) instead of a hardcoded value, so that the indicator is accurate across all current and future models.

## Acceptance Criteria

- [ ] The backend `AnthropicModelInfo` struct deserializes the `context_window` field from the Anthropic `/v1/models` response
- [ ] The backend `/anthropic/models` endpoint returns both the model ID and the context window size to the frontend
- [ ] The frontend uses the real context window size from the API response instead of the hardcoded `getContextWindowSize` map for Anthropic models
- [ ] The context indicator in ChatHeader displays the correct percentage based on the real context window size
- [ ] The hardcoded fallback remains for Ollama/local models that don't provide context window metadata

## Out of Scope

- TBD