rename .story_kit directory to .storkit and update all references
Renames the config directory and updates 514 references across 42 Rust source files, plus CLAUDE.md, .gitignore, Makefile, script/release, and .mcp.json files. All 1205 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.storkit/work/1_backlog/.gitkeep
@@ -0,0 +1,20 @@
---
name: "Gate pipeline transitions on ensure_acceptance"
---

# Story 169: Gate pipeline transitions on ensure_acceptance

## User Story

As a project owner, I want story progression to be blocked unless ensure_acceptance passes, so that agents can't skip the testing workflow.

## Acceptance Criteria

- [ ] move_story_to_merge rejects stories that haven't passed ensure_acceptance
- [ ] accept_story rejects stories that haven't passed ensure_acceptance
- [ ] Rejection returns a clear error message telling the agent what's missing
- [ ] Existing passing stories (all criteria checked, tests recorded) still flow through normally

## Out of Scope

- TBD
@@ -0,0 +1,24 @@
---
name: "Upgrade libsqlite3-sys"
---

# Refactor 260: Upgrade libsqlite3-sys

## Description

Upgrade the `libsqlite3-sys` dependency from `0.35.0` to `0.37.0`. The crate is used with `features = ["bundled"]` for static builds.

## Version Notes

- Current: `libsqlite3-sys 0.35.0` (pinned transitively by `matrix-sdk 0.16.0` → `matrix-sdk-sqlite` → `rusqlite 0.37.x`)
- Target: `libsqlite3-sys 0.37.0`
- Latest upstream rusqlite: `0.39.0`
- **Blocker**: `matrix-sdk 0.16.0` pins `rusqlite 0.37.x`, which pins `libsqlite3-sys 0.35.0`. A clean upgrade requires either waiting for matrix-sdk to bump its rusqlite dependency, or upgrading matrix-sdk itself.
- **Reverted 2026-03-17**: A previous coder vendored the entire rusqlite crate with a fake `0.37.99` version and patched its libsqlite3-sys dep. This was too hacky — reverted to clean `0.35.0`.

## Acceptance Criteria

- [ ] `libsqlite3-sys` is upgraded to `0.37.0` via a clean dependency path (no vendored forks)
- [ ] `cargo build` succeeds
- [ ] All tests pass
- [ ] No `[patch.crates-io]` hacks or vendored crates
@@ -0,0 +1,69 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---

# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting

## Question

Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.

Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:

1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
3. **Reproducible environment** — the same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system

## Architecture

```
Docker Container (single)
├── storkit server
│   ├── Matrix bot
│   ├── WhatsApp webhook
│   ├── Slack webhook
│   ├── Web UI
│   └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```

## Key Questions

- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: The container exposes the web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass ANTHROPIC_API_KEY as an env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? It needs to persist or live in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?

## Deliverable

A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.

## Hypothesis

- TBD

## Timebox

- TBD

## Investigation Plan

- TBD

## Findings

- TBD

## Recommendation

- TBD
@@ -0,0 +1,40 @@
---
name: "Abstract agent runtime to support non-Claude-Code backends"
---

# Refactor 343: Abstract agent runtime to support non-Claude-Code backends

## Current State

- TBD

## Desired State

Currently agent spawning is tightly coupled to the Claude Code CLI — agents are spawned as PTY processes running the `claude` binary. To support ChatGPT and Gemini as agent backends, we need to abstract the agent runtime.

The agent pool currently does:

1. Spawn the `claude` CLI process via portable-pty
2. Stream JSON events from stdout
3. Parse tool calls, text output, thinking traces
4. Wait for process exit, run gates

This needs to become a trait so different backends can be plugged in:

- Claude Code (existing) — spawns the `claude` CLI, parses the JSON stream
- OpenAI API — calls ChatGPT via API with tool definitions, manages the conversation loop
- Gemini API — calls Gemini via API with tool definitions, manages the conversation loop

The key abstraction: an agent runtime takes a prompt plus tools and produces a stream of events (text output, tool calls, completion). The existing PTY/Claude Code logic becomes one implementation of this trait.
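The trait shape can be sketched. Method and event names follow the acceptance criteria (start, stream_events, stop, get_status; text/tool_call/thinking/done), but the signatures and the toy backend are assumptions, not the final API:

```rust
// Sketch only: signatures and the toy runtime are assumptions.

/// Runtime-agnostic events every backend emits.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    Text(String),
    ToolCall { name: String, arguments: String },
    Thinking(String),
    Done,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AgentStatus {
    Idle,
    Running,
    Finished,
}

/// The proposed abstraction: each backend (Claude Code PTY, OpenAI API,
/// Gemini API) implements this, and the agent pool stays backend-agnostic.
pub trait AgentRuntime {
    fn start(&mut self, prompt: &str);
    /// Drain events produced since the last call.
    fn stream_events(&mut self) -> Vec<AgentEvent>;
    fn stop(&mut self);
    fn get_status(&self) -> AgentStatus;
}

/// Toy in-process runtime standing in for a real backend such as a
/// hypothetical ClaudeCodeRuntime.
pub struct EchoRuntime {
    pub status: AgentStatus,
    pub pending: Vec<AgentEvent>,
}

impl AgentRuntime for EchoRuntime {
    fn start(&mut self, prompt: &str) {
        // A real backend would spawn a process or open an API stream here.
        self.pending.push(AgentEvent::Text(format!("echo: {prompt}")));
        self.pending.push(AgentEvent::Done);
        self.status = AgentStatus::Finished;
    }
    fn stream_events(&mut self) -> Vec<AgentEvent> {
        std::mem::take(&mut self.pending)
    }
    fn stop(&mut self) {
        self.status = AgentStatus::Idle;
    }
    fn get_status(&self) -> AgentStatus {
        self.status
    }
}
```

Under this shape, the existing portable-pty spawning logic would live behind the same four methods, and the API backends would drive their conversation loops internally while emitting the same events.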
## Acceptance Criteria

- [ ] Define an `AgentRuntime` trait with methods for: start, stream_events, stop, get_status
- [ ] `ClaudeCodeRuntime` implements the trait using the existing PTY spawning logic
- [ ] The agent pool uses the trait instead of directly spawning Claude Code
- [ ] Runtime selection is configurable per agent in project.toml (e.g. runtime = 'claude-code')
- [ ] All existing Claude Code agent functionality is preserved
- [ ] The event stream format is runtime-agnostic (text, tool_call, thinking, done)
- [ ] Token usage tracking works across runtimes

## Out of Scope

- TBD
@@ -0,0 +1,25 @@
---
name: "ChatGPT agent backend via OpenAI API"
---

# Story 344: ChatGPT agent backend via OpenAI API

## User Story

As a project owner, I want to run agents using ChatGPT (GPT-4o, o3, etc.) via the OpenAI API, so that I can use OpenAI models for coding tasks alongside Claude.

## Acceptance Criteria

- [ ] Implement `OpenAiRuntime` using the `AgentRuntime` trait from refactor 343
- [ ] Supports GPT-4o and o3 models via the OpenAI chat completions API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to OpenAI function calling format
- [ ] Configurable in project.toml: runtime = 'openai', model = 'gpt-4o'
- [ ] OPENAI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)
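The tool-definition conversion in these criteria can be sketched concretely. The output shape (`type`/`function`/`name`/`description`/`parameters`) matches OpenAI's function calling tools array; the flat `(name, description, schema)` input is a simplified stand-in for a real MCP tool definition, and string-assembled JSON keeps the sketch dependency-free where a real implementation would use serde:

```rust
// Sketch: convert one simplified MCP tool definition into an entry for the
// OpenAI chat completions "tools" array. Input shape is an assumption.
fn mcp_tool_to_openai(name: &str, description: &str, input_schema_json: &str) -> String {
    format!(
        r#"{{"type":"function","function":{{"name":"{}","description":"{}","parameters":{}}}}}"#,
        name, description, input_schema_json
    )
}
```

The conversation loop would send these alongside the prompt, execute any returned tool calls against the MCP server, and feed results back until the model stops requesting tools.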
## Out of Scope

- TBD
@@ -0,0 +1,25 @@
---
name: "Gemini agent backend via Google AI API"
---

# Story 345: Gemini agent backend via Google AI API

## User Story

As a project owner, I want to run agents using Gemini (2.5 Pro, etc.) via the Google AI API, so that I can use Google models for coding tasks alongside Claude and ChatGPT.

## Acceptance Criteria

- [ ] Implement `GeminiRuntime` using the `AgentRuntime` trait from refactor 343
- [ ] Supports Gemini 2.5 Pro and other Gemini models via the Google AI generativeai API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to Gemini function calling format
- [ ] Configurable in project.toml: runtime = 'gemini', model = 'gemini-2.5-pro'
- [ ] GOOGLE_AI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)

## Out of Scope

- TBD
@@ -0,0 +1,22 @@
---
name: "MCP tools for code search (grep and glob)"
---

# Story 348: MCP tools for code search (grep and glob)

## User Story

As a non-Claude agent connected via MCP, I want search tools so that I can find files and search code contents in my worktree.

## Acceptance Criteria

- [ ] grep tool — searches file contents with regex support, returns matching lines with context
- [ ] glob tool — finds files by pattern (e.g. '**/*.rs')
- [ ] Both are scoped to the agent's worktree
- [ ] grep supports output modes: content (matching lines), files_with_matches (just paths), count
- [ ] grep supports context lines (-A, -B, -C)
- [ ] Results are limited to prevent overwhelming the LLM context
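The three output modes and the result cap can be illustrated with a toy grep over in-memory "files". A real implementation would use regex matching and walk the agent's worktree; the names and shapes here are assumptions:

```rust
// Toy sketch of the grep tool's output modes. Substring match stands in
// for regex; (path, contents) pairs stand in for worktree files.
#[derive(Debug, Clone, Copy)]
enum OutputMode {
    Content,          // matching lines with location
    FilesWithMatches, // just paths
    Count,            // match count per file
}

fn grep(files: &[(&str, &str)], pattern: &str, mode: OutputMode, limit: usize) -> Vec<String> {
    let mut out = Vec::new();
    for (path, body) in files {
        let hits: Vec<(usize, &str)> = body
            .lines()
            .enumerate()
            .filter(|(_, line)| line.contains(pattern))
            .collect();
        if hits.is_empty() {
            continue;
        }
        match mode {
            OutputMode::Content => {
                for (n, line) in &hits {
                    out.push(format!("{path}:{}:{line}", n + 1));
                }
            }
            OutputMode::FilesWithMatches => out.push(path.to_string()),
            OutputMode::Count => out.push(format!("{path}:{}", hits.len())),
        }
        // Cap results so large matches can't overwhelm the LLM context.
        if out.len() >= limit {
            out.truncate(limit);
            break;
        }
    }
    out
}
```

Context lines (-A/-B/-C) would extend the `Content` arm to emit neighbouring lines around each hit.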
## Out of Scope

- TBD
@@ -0,0 +1,23 @@
---
name: "MCP tools for git operations"
---

# Story 349: MCP tools for git operations

## User Story

As a non-Claude agent connected via MCP, I want git tools so that I can check status, stage files, commit changes, and view history in my worktree.

## Acceptance Criteria

- [ ] git_status tool — returns working tree status (staged, unstaged, untracked files)
- [ ] git_diff tool — returns diff output, supports staged/unstaged/commit range
- [ ] git_add tool — stages files by path
- [ ] git_commit tool — commits staged changes with a message
- [ ] git_log tool — returns commit history with configurable count and format
- [ ] All operations run in the agent's worktree
- [ ] Cannot push, force-push, or modify remote — the server handles that
@@ -0,0 +1,21 @@
---
name: "MCP tool for code definitions lookup"
---

# Story 350: MCP tool for code definitions lookup

## User Story

As a non-Claude agent connected via MCP, I want a code intelligence tool so that I can find function, struct, and type definitions without grepping through all files.

## Acceptance Criteria

- [ ] get_definitions tool — finds function/struct/enum/type/class definitions by name or pattern
- [ ] Supports Rust (fn, struct, enum, impl, trait) and TypeScript (function, class, interface, type) at minimum
- [ ] Returns file path, line number, and the definition signature
- [ ] Scoped to the agent's worktree
- [ ] Faster than grepping — uses tree-sitter or regex-based parsing
@@ -0,0 +1,31 @@
---
name: Agent Security and Sandboxing
---

# Story 34: Agent Security and Sandboxing

## User Story

**As a** supervisor orchestrating multiple autonomous agents,
**I want to** constrain what each agent can access and do,
**So that** agents can't escape their worktree, damage shared state, or perform unintended actions.

## Acceptance Criteria

- [ ] Agent creation accepts an `allowed_tools` list to restrict Claude Code tool access per agent.
- [ ] Agent creation accepts a `disallowed_tools` list as an alternative to allowlisting.
- [ ] Agents without Bash access can still perform useful coding work (Read, Edit, Write, Glob, Grep).
- [ ] Investigate replacing direct Bash/shell access with Rust-implemented tool proxies that enforce boundaries:
  - Scoped `exec_shell` that only runs allowlisted commands (e.g., `cargo test`, `npm test`) within the agent's worktree.
  - Scoped `read_file` / `write_file` that reject paths outside the agent's worktree root.
  - Scoped `git` operations that only work within the agent's worktree.
- [ ] Evaluate `--max-turns` and `--max-budget-usd` as safety limits for runaway agents.
- [ ] Document the trust model: what the supervisor controls vs what agents can do autonomously.
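The worktree boundary check behind scoped `read_file`/`write_file` can be sketched. This is an assumption-level illustration, not the hardened implementation; it glosses over canonicalization of paths that do not yet exist and over symlink traversal, both of which a real proxy would have to handle:

```rust
// Sketch: resolve an agent-supplied path against its worktree root,
// rejecting anything that would escape the worktree.
use std::path::{Component, Path, PathBuf};

fn resolve_in_worktree(worktree_root: &Path, requested: &str) -> Option<PathBuf> {
    let requested = Path::new(requested);
    if requested.is_absolute() {
        return None; // agents address files relative to their worktree
    }
    let mut resolved = worktree_root.to_path_buf();
    for part in requested.components() {
        match part {
            Component::Normal(segment) => resolved.push(segment),
            Component::CurDir => {} // "./" is harmless
            // ParentDir ("..") and any prefix/root component could climb
            // out of the worktree: reject outright rather than resolve.
            _ => return None,
        }
    }
    Some(resolved)
}
```

A scoped tool would run this check first and return an error to the agent on `None`, before touching the filesystem.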
## Questions to Explore

- Can we use MCP (Model Context Protocol) to expose our Rust-implemented tools to Claude Code, replacing its built-in Bash/filesystem tools with scoped versions?
- What's the right granularity for shell allowlists — command-level (`cargo test`) or pattern-level (`cargo *`)?
- Should agents have read access outside their worktree (e.g., to reference shared specs) but write access only within it?
- Is OS-level sandboxing (Docker, macOS sandbox profiles) worth the complexity for a personal tool?

## Out of Scope

- Multi-user authentication or authorization (single-user personal tool).
- Network-level isolation between agents.
- Encrypting agent communication channels (all local).
.storkit/work/1_backlog/57_story_live_test_gate_updates.md
@@ -0,0 +1,18 @@
---
name: Live Test Gate Updates
---

# Story 57: Live Test Gate Updates

## User Story

As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: record test results via MCP, verify the Gate panel updates without a manual refresh
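The `/ws` payload named in the first criterion can be sketched as a tiny server-side type. Hand-rolled serialization keeps the example dependency-free; the real server would presumably build this with serde, and the struct name is an assumption:

```rust
// Sketch of the notification broadcast over /ws when test state changes.
struct Notification {
    topic: &'static str, // e.g. "tests"
}

impl Notification {
    fn to_json(&self) -> String {
        format!(r#"{{"type": "notification", "topic": "{}"}}"#, self.topic)
    }
}
```

On the frontend, GatePanel and TodoPanel would match on `topic == "tests"` and re-fetch their data in place, which is what keeps scroll position intact.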
@@ -0,0 +1,21 @@
---
name: "Fetch real context window size from Anthropic models API"
---

# Story 90: Fetch real context window size from Anthropic models API

## User Story

As a user chatting with a Claude model, I want the context-remaining indicator to show the actual context window size for the selected model (fetched from the Anthropic API) instead of a hardcoded value, so that the indicator is accurate across all current and future models.

## Acceptance Criteria

- [ ] The backend `AnthropicModelInfo` struct deserializes the `context_window` field from the Anthropic `/v1/models` response
- [ ] The backend `/anthropic/models` endpoint returns both the model ID and the context window size to the frontend
- [ ] The frontend uses the real context window size from the API response instead of the hardcoded `getContextWindowSize` map for Anthropic models
- [ ] The context indicator in ChatHeader displays the correct percentage based on the real context window size
- [ ] The hardcoded fallback remains for Ollama/local models that don't provide context window metadata

## Out of Scope

- TBD