storkit: create 365_story_surface_api_rate_limit_warnings_in_chat

This commit is contained in:
dave
2026-03-22 18:19:23 +00:00
parent f346712dd1
commit e4227cf673
175 changed files with 0 additions and 83945 deletions

.storkit/.gitignore vendored

@@ -1,22 +0,0 @@
# Bot config (contains credentials)
bot.toml
# Matrix SDK state store
matrix_store/
matrix_device_id
matrix_history.json
# Agent worktrees and merge workspace (managed by the server, not tracked in git)
worktrees/
merge_workspace/
# Intermediate pipeline stages (transient, not committed per spike 92)
work/2_current/
work/3_qa/
work/4_merge/
# Coverage reports (generated by cargo-llvm-cov, not tracked in git)
coverage/
# Token usage log (generated at runtime, contains cost data)
token_usage.jsonl


@@ -1,239 +0,0 @@
# Story Kit: The Story-Driven Test Workflow (SDTW)
**Target Audience:** Large Language Models (LLMs) acting as Senior Engineers.
**Goal:** To maintain long-term project coherence, prevent context window exhaustion, and ensure high-quality, testable code generation in large software projects.
---
## 0. First Steps (For New LLM Sessions)
When you start a new session with this project:
1. **Check for MCP Tools:** Read `.mcp.json` to discover the MCP server endpoint. Then list available tools by calling:
```bash
curl -s "$(jq -r '.mcpServers["storkit"].url' .mcp.json)" \
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
```
This returns the full tool catalog (create stories, spawn agents, record tests, manage worktrees, etc.). Familiarize yourself with the available tools before proceeding. These tools allow you to directly manipulate the workflow and spawn subsidiary agents without manual file manipulation.
2. **Read Context:** Check `.story_kit/specs/00_CONTEXT.md` for high-level project goals.
3. **Read Stack:** Check `.story_kit/specs/tech/STACK.md` for technical constraints and patterns.
4. **Check Work Items:** Look at `.story_kit/work/1_backlog/` and `.story_kit/work/2_current/` to see what work is pending.
---
## 1. The Philosophy
We treat the codebase as the implementation of a **"Living Specification"** driven by **User Stories**.
Instead of ephemeral chat prompts ("Fix this", "Add that"), we work through persistent artifacts.
* **Stories** define the *Change*.
* **Tests** define the *Truth*.
* **Code** defines the *Reality*.
**The Golden Rule:** You are not allowed to write code until the Acceptance Criteria are captured in the story.
---
## 1.5 MCP Tools
Agents have programmatic access to the workflow via MCP tools served at `POST /mcp`. The project `.mcp.json` registers this endpoint automatically so Claude Code sessions and spawned agents can call tools like `create_story`, `validate_stories`, `list_upcoming`, `get_story_todos`, `record_tests`, `ensure_acceptance`, `start_agent`, `stop_agent`, `list_agents`, and `get_agent_output` without parsing English instructions.
**To discover what tools are available:** Check `.mcp.json` for the server endpoint, then use the MCP protocol to list available tools.
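Once the tool catalog is known, tools are invoked with the standard MCP `tools/call` JSON-RPC method. The sketch below builds such a request for `create_story`; the argument names are assumptions for illustration — confirm the real schema against the `tools/list` output before calling.

```shell
# Hypothetical tools/call request for create_story (argument names assumed;
# check the tools/list response for the actual input schema).
payload='{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"create_story","arguments":{"name":"Surface API rate limit warnings"}}}'
echo "$payload"
# Send it to the endpoint registered in .mcp.json (same pattern as the
# tools/list call in section 0):
# curl -s "$(jq -r '.mcpServers["storkit"].url' .mcp.json)" \
#   -H 'Content-Type: application/json' -d "$payload"
```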
---
## 2. Directory Structure
```text
project_root/
.mcp.json # MCP server configuration (if MCP tools are available)
.story_kit/
├── README.md # This document
├── project.toml # Agent configuration (roles, models, prompts)
├── work/ # Unified work item pipeline (stories, bugs, spikes)
│ ├── 1_backlog/ # New work items awaiting implementation
│ ├── 2_current/ # Work in progress
│ ├── 3_qa/ # QA review
│ ├── 4_merge/ # Ready to merge to master
│ ├── 5_done/ # Merged and completed (auto-swept to 6_archived after 4 hours)
│ └── 6_archived/ # Long-term archive
├── worktrees/ # Agent worktrees (managed by the server)
├── specs/ # Minimal guardrails (context + stack)
│ ├── 00_CONTEXT.md # High-level goals, domain definition, and glossary
│ ├── tech/ # Implementation details (Stack, Architecture, Constraints)
│ │ └── STACK.md # The "Constitution" (Languages, Libs, Patterns)
│ └── functional/ # Domain logic (Platform-agnostic behavior)
│ └── ...
└── src/ # The Code
```
### Work Items
All work items (stories, bugs, spikes) live in the same `work/` pipeline. Items are named: `{id}_{type}_{slug}.md`
* Stories: `57_story_live_test_gate_updates.md`
* Bugs: `4_bug_run_button_does_not_start_agent.md`
* Spikes: `61_spike_filesystem_watcher_architecture.md`
Items move through stages by moving the file between directories:
`1_backlog` → `2_current` → `3_qa` → `4_merge` → `5_done` → `6_archived`
Items in `5_done` are auto-swept to `6_archived` after 4 hours by the server.
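Because stages are just directories, advancing a work item is a plain file move; the server's watcher commits the change and notifies the frontend. A minimal sketch, using the example story name from this README:

```shell
# Move a story from backlog to current; the filesystem watcher handles
# the git commit and WebSocket broadcast automatically.
mkdir -p .story_kit/work/1_backlog .story_kit/work/2_current
touch .story_kit/work/1_backlog/57_story_live_test_gate_updates.md
mv .story_kit/work/1_backlog/57_story_live_test_gate_updates.md \
   .story_kit/work/2_current/
ls .story_kit/work/2_current/
```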
### Filesystem Watcher
The server watches `.story_kit/work/` for changes. When a file is created, moved, or modified, the watcher auto-commits with a deterministic message and broadcasts a WebSocket notification to the frontend. This means:
* MCP tools only need to write/move files — the watcher handles git commits
* IDE drag-and-drop works (drag a story from `1_backlog/` to `2_current/`)
* The frontend updates automatically without manual refresh
---
## 3. The Cycle (The "Loop")
When the user asks for a feature, follow this 4-step loop strictly:
### Step 1: The Story (Ingest)
* **User Input:** "I want the robot to dance."
* **Action:** Create a story via MCP tool `create_story` (guarantees correct front matter and auto-assigns the story number).
* **Front Matter (Required):** Every work item file MUST begin with YAML front matter containing a `name` field:
```yaml
---
name: Short Human-Readable Story Name
---
```
* **Move to Current:** Once the story is validated and ready for coding, move it to `work/2_current/`.
* **Tracking:** Mark Acceptance Criteria as tested directly in the story file as tests are completed.
* **Content:**
* **User Story:** "As a user, I want..."
* **Acceptance Criteria:** Bullet points of observable success.
* **Out of Scope:** Explicitly list what is out of scope so the LLM does not expand the work beyond the story.
* **Story Quality (INVEST):** Stories should be Independent, Negotiable, Valuable, Estimable, Small, and Testable.
* **Git:** The `start_agent` MCP tool automatically creates a worktree under `.story_kit/worktrees/`, checks out a feature branch, moves the story to `work/2_current/`, and spawns the agent. No manual branch or worktree creation is needed.
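A quick sanity check for the front matter requirement can be done with standard shell tools. This is a sketch, not a storkit command; the file name and contents are illustrative:

```shell
# Verify a work item begins with YAML front matter containing the
# required `name` field.
f=57_story_live_test_gate_updates.md
printf -- '---\nname: Live Test Gate Updates\n---\n\n## User Story\n' > "$f"
# head checks the opening delimiter; awk extracts the block between the
# two --- markers so grep can look for the name field.
if [ "$(head -n 1 "$f")" = "---" ] \
   && awk 'NR>1 && /^---$/{exit} NR>1' "$f" | grep -q '^name:'; then
  echo "front matter OK: $f"
else
  echo "MISSING name front matter: $f" >&2
fi
```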
### Step 2: The Implementation (Code)
* **Action:** Write the code to satisfy the approved tests and Acceptance Criteria.
* **Constraint:** Adhere strictly to `specs/tech/STACK.md` (e.g., if it forbids certain patterns, you must not use them).
* **Full-Stack Completion:** Every story must be completed across all components of the stack. If a feature touches the backend, frontend, and API layer, all three must be fully implemented and working end-to-end before the story can be accepted. Partial implementations (e.g., backend logic with no frontend wiring, or UI scaffolding with no real data) do not satisfy acceptance criteria.
### Step 3: Verification (Close)
* **Action:** For each Acceptance Criterion in the story, write a failing test (red), mark the criterion as tested, make the test pass (green), and refactor if needed. Keep only one failing test at a time.
* **Action:** Run compilation and make sure it succeeds without errors. Consult `specs/tech/STACK.md` and run all required linters listed there (treat warnings as errors). Run tests and make sure they all pass before proceeding. Ask questions here if needed.
* **Action:** Do not accept stories yourself. Ask the user if they accept the story, and do not proceed until they explicitly do.
* **Action:** When the user accepts:
  1. Move the story file from `work/2_current/` (or `work/4_merge/`) to `work/5_done/`
  2. Commit both changes to the feature branch
  3. Perform the squash merge: `git merge --squash feature/story-name`
  4. Commit to master with a comprehensive commit message
  5. Delete the feature branch: `git branch -D feature/story-name`
* **Important:** Do NOT mark acceptance criteria as complete before user acceptance. Only mark them complete when the user explicitly accepts the story.
**CRITICAL - NO SUMMARY DOCUMENTS:**
* **NEVER** create a separate summary document (e.g., `STORY_XX_SUMMARY.md`, `IMPLEMENTATION_NOTES.md`, etc.)
* **NEVER** write terminal output to a markdown file for "documentation purposes"
* Tests are the primary source of truth. Keep test coverage and Acceptance Criteria aligned after each story.
* If you find yourself typing `cat << 'EOF' > SUMMARY.md` or similar, **STOP IMMEDIATELY**.
* The only files that should exist after story completion:
* Updated code in `src/`
* Updated guardrails in `specs/` (if needed)
* Archived work item in `work/5_done/` (server auto-sweeps to `work/6_archived/` after 4 hours)
---
## 3.5. Bug Workflow (Simplified Path)
Not everything needs to be a full story. Simple bugs can skip the story process:
### When to Use Bug Workflow
* Defects in existing functionality (not new features)
* State inconsistencies or data corruption
* UI glitches that don't require spec changes
* Performance issues with known fixes
### Bug Process
1. **Document Bug:** Create a bug file in `work/1_backlog/` named `{id}_bug_{slug}.md` with:
* **Symptom:** What the user observes
* **Root Cause:** Technical explanation (if known)
* **Reproduction Steps:** How to trigger the bug
* **Proposed Fix:** Brief technical approach
* **Workaround:** Temporary solution if available
2. **Start an Agent:** Use the `start_agent` MCP tool to create a worktree and spawn an agent for the bug fix.
3. **Write a Failing Test:** Before fixing the bug, write a test that reproduces it (red). This proves the bug exists and prevents regression.
4. **Fix the Bug:** Make minimal code changes to make the test pass (green).
5. **User Testing:** Let the user verify the fix in the worktree before merging. Do not proceed until they confirm.
6. **Archive & Merge:** Move the bug file to `work/5_done/`, squash merge to master, delete the worktree and branch.
7. **No Guardrail Update Needed:** No spec changes are required unless the bug reveals a missing constraint.
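A bug file following the structure above can be sketched as follows. The id and slug come from the naming example in section 2; the section contents are illustrative placeholders, not a real bug report:

```shell
# Create a bug file in the backlog with the required front matter and the
# five bug sections. The watcher will commit it once written.
mkdir -p .story_kit/work/1_backlog
cat > .story_kit/work/1_backlog/4_bug_run_button_does_not_start_agent.md <<'EOF'
---
name: Run button does not start agent
---
**Symptom:** Clicking Run does nothing; no agent appears in the agent list.
**Root Cause:** Unknown.
**Reproduction Steps:** Open a story in 1_backlog and click Run.
**Proposed Fix:** Investigate the start_agent handler.
**Workaround:** Start the agent via the start_agent MCP tool.
EOF
```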
### Bug vs Story vs Spike
* **Bug:** Existing functionality is broken → Fix it
* **Story:** New functionality is needed → Test it, then build it
* **Spike:** Uncertainty/feasibility discovery → Run spike workflow
---
## 3.6. Spike Workflow (Research Path)
Not everything needs a story or bug fix. Spikes are time-boxed investigations to reduce uncertainty.
### When to Use a Spike
* Unclear root cause or feasibility
* Need to compare libraries/encoders/formats
* Need to validate performance constraints
### Spike Process
1. **Document Spike:** Create a spike file in `work/1_backlog/` named `{id}_spike_{slug}.md` with:
* **Question:** What you need to answer
* **Hypothesis:** What you expect to be true
* **Timebox:** Strict limit for the research
* **Investigation Plan:** Steps/tools to use
* **Findings:** Evidence and observations
* **Recommendation:** Next step (Story, Bug, or No Action)
2. **Execute Research:** Stay within the timebox. No production code changes.
3. **Escalate if Needed:** If implementation is required, open a Story or Bug and follow that workflow.
4. **Archive:** Move the spike file to `work/5_done/`.
### Spike Output
* Decision and evidence, not production code
* Specs updated only if the spike changes system truth
---
## 4. Context Reset Protocol
When the LLM context window fills up (or the chat gets slow/confused):
1. **Stop Coding.**
2. **Instruction:** Tell the user to open a new chat.
3. **Handoff:** The only context the new LLM needs is in the `specs/` folder and `.mcp.json`.
* *Prompt for New Session:* "I am working on Project X. Read `.mcp.json` to discover available tools, then read `specs/00_CONTEXT.md` and `specs/tech/STACK.md`. Then look at `work/1_backlog/` and `work/2_current/` to see what is pending."
---
## 5. Setup Instructions (For the LLM)
If a user hands you this document and says "Apply this process to my project":
1. **Check for MCP Tools:** Look for `.mcp.json` in the project root. If it exists, you have programmatic access to workflow tools and agent spawning capabilities.
2. **Analyze the Request:** Ask for the high-level goal ("What are we building?") and the tech preferences ("Rust or Python?").
3. **Git Check:** Check if the directory is a git repository (`git status`). If not, run `git init`.
4. **Scaffold:** Run commands to create the `work/` and `specs/` folders with the 6-stage pipeline (`work/1_backlog/` through `work/6_archived/`).
5. **Draft Context:** Write `specs/00_CONTEXT.md` based on the user's answer.
6. **Draft Stack:** Write `specs/tech/STACK.md` based on best practices for that language.
7. **Wait:** Ask the user for "Story #1".
---
## 6. Code Quality
**MANDATORY:** Before completing Step 3 (Verification) of any story, you MUST run all applicable linters, formatters, and test suites and fix ALL errors and warnings. Zero tolerance for warnings or errors.
**AUTO-RUN CHECKS:** Always run the required lint/test/build checks as soon as relevant changes are made. Do not ask for permission to run them—run them automatically and fix any failures.
**ALWAYS FIX DIAGNOSTICS:** At every stage, you must proactively fix all errors and warnings without waiting for user confirmation. Do not pause to ask whether to fix diagnostics—fix them immediately as part of the workflow.
**Consult `specs/tech/STACK.md`** for the specific tools, commands, linter configurations, and quality gates for this project. The STACK file is the single source of truth for what must pass before a story can be accepted.
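The mandatory checks can be sketched as a single gate script. The command list here assumes the Rust + frontend tooling referenced elsewhere in this repo; `specs/tech/STACK.md` remains the authoritative list. Each tool runs only if installed so the sketch degrades gracefully:

```shell
# Run a command if its tool is installed; report failures without aborting
# so every gate gets a chance to run.
gate() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@" || echo "GATE FAILED: $*"
  else
    echo "skip: $1 not installed"
  fi
}
gate cargo clippy --all-targets --all-features
gate cargo test
if [ -d frontend ]; then
  ( cd frontend && gate npm run build && gate npm test )
fi
echo "quality gate finished"
```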


@@ -1,61 +0,0 @@
homeserver = "https://matrix.example.com"
username = "@botname:example.com"
password = "your-bot-password"
# List one or more rooms to listen in. Use a single-element list for one room.
room_ids = ["!roomid:example.com"]
# Optional: the deprecated single-room key is still accepted for backwards compat.
# room_id = "!roomid:example.com"
allowed_users = ["@youruser:example.com"]
enabled = false
# Maximum conversation turns to remember per room (default: 20).
# history_size = 20
# Rooms where the bot responds to all messages (not just addressed ones).
# This list is updated automatically when users toggle ambient mode at runtime.
# ambient_rooms = ["!roomid:example.com"]
# ── WhatsApp Business API ──────────────────────────────────────────────
# Set transport = "whatsapp" to use WhatsApp instead of Matrix.
# The webhook endpoint will be available at /webhook/whatsapp.
# You must configure this URL in the Meta Developer Dashboard.
#
# transport = "whatsapp"
# whatsapp_phone_number_id = "123456789012345"
# whatsapp_access_token = "EAAx..."
# whatsapp_verify_token = "my-secret-verify-token"
#
# ── 24-hour messaging window & notification templates ─────────────────
# WhatsApp only allows free-form text messages within 24 hours of the last
# inbound message from a user. For proactive pipeline notifications sent
# after the window expires, an approved Meta message template is used.
#
# Register the template in the Meta Business Manager:
# 1. Go to Business Settings → WhatsApp → Message Templates → Create.
# 2. Category: UTILITY
# 3. Template name: pipeline_notification (or your chosen name below)
# 4. Language: English (en_US)
# 5. Body text (example):
# Story *{{1}}* has moved to *{{2}}*.
# Where {{1}} = story name, {{2}} = pipeline stage.
# 6. Submit for review. Meta typically approves utility templates within
# minutes; transactional categories may take longer.
#
# Once approved, set the name below (default: "pipeline_notification"):
# whatsapp_notification_template = "pipeline_notification"
# ── Slack Bot API ─────────────────────────────────────────────────────
# Set transport = "slack" to use Slack instead of Matrix.
# The webhook endpoint will be available at /webhook/slack.
# Configure this URL in the Slack App → Event Subscriptions → Request URL.
#
# Required Slack App scopes: chat:write, chat:update
# Subscribe to bot events: message.channels, message.groups, message.im
#
# transport = "slack"
# slack_bot_token = "xoxb-..."
# slack_signing_secret = "your-signing-secret"
# slack_channel_ids = ["C01ABCDEF"]


@@ -1,28 +0,0 @@
# Problems
Recurring issues observed during pipeline operation. Review periodically and create stories for systemic problems.
## 2026-03-18: Stories graduating to "done" with empty merges (7 of 10)
Pipeline allows stories to move through coding → QA → merge → done without any actual code changes landing on master. The squash-merge produces an empty diff but the pipeline still marks the story as done. Affected stories: 247, 273, 274, 278, 279, 280, 92. Only 266, 271, 277, and 281 actually shipped code. Root cause: no check that the merge commit contains a non-empty diff. Filed bug 283 for the manual_qa gate issue specifically, but the empty-merge-to-done problem is broader and needs its own fix.
## 2026-03-18: Agent committed directly to master instead of worktree
Multiple agents have committed directly to master instead of their worktree/feature branch:
- Commit `5f4591f` ("fix: update should_commit_stage test to match 5_done") — likely mergemaster
- Commit `a32cfbd` ("Add bot-level command registry with help command") — story 285 coder committed code + Cargo.lock directly to master
Agents should only commit to their feature branch or merge-queue branch, never to master directly. Suspect agents are running `git commit` in the project root instead of the worktree directory. This can also revert uncommitted fixes on master (e.g. project.toml pkill fix was overwritten). Frequency: at least 2 confirmed cases. This is a recurring and serious problem — needs a guard in the server or agent prompts.
## 2026-03-19: Auto-assign re-assigns mergemaster to failed merge stories in a loop
After bug 295 fix (`auto_assign_available_work` after every pipeline advance), mergemaster gets re-assigned to stories that already have a merge failure flag. Story 310 had an empty diff merge failure — mergemaster correctly reported the failure, but auto-assign immediately re-assigned mergemaster to the same story, creating an infinite retry loop. The auto-assign logic needs to check for the `merge_failure` front matter flag before re-assigning agents to stories in `4_merge/`.
## 2026-03-19: Coder produces no code (complete ghost — story 310)
Story 310 (Bot delete command) went through the full pipeline — coder session ran, passed QA/gates, moved to merge — but the coder produced zero code. No commits on the feature branch, no commits on master. The entire agent session was a no-op. This is different from the "committed to master instead of worktree" problem — in this case, the coder simply did nothing. Need to investigate the coder logs to understand what happened. The empty-diff merge check would catch this at merge time, but ideally the server should detect "coder finished with no commits on feature branch" at the gate-check stage and fail early.
## 2026-03-19: Auto-assign assigns mergemaster to coding-stage stories
Auto-assign picked mergemaster for story 310 which was in `2_current/`. Mergemaster should only work on stories in `4_merge/`. The `auto_assign_available_work` function doesn't enforce that the agent's configured stage matches the pipeline stage of the story it's being assigned to. Story 279 (auto-assign respects agent stage from front matter) was supposed to fix this, but the check may only apply to front-matter preferences, not the fallback assignment path.


@@ -1,272 +0,0 @@
# Project-wide default QA mode: "server", "agent", or "human".
# Per-story `qa` front matter overrides this setting.
default_qa = "server"
# Default model for coder agents. Only agents with this model are auto-assigned.
# Opus coders are reserved for explicit per-story `agent:` front matter requests.
default_coder_model = "sonnet"
# Maximum concurrent coder agents. Stories wait in 2_current/ when all slots are full.
max_coders = 3
# Maximum retries per story per pipeline stage before marking as blocked.
# Set to 0 to disable retry limits.
max_retries = 2
[[component]]
name = "frontend"
path = "frontend"
setup = ["npm install", "npm run build"]
teardown = []
[[component]]
name = "server"
path = "."
setup = ["mkdir -p frontend/dist", "cargo check"]
teardown = []
[[agent]]
name = "coder-1"
stage = "coder"
role = "Full-stack engineer. Implements features across all components."
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name = "coder-2"
stage = "coder"
role = "Full-stack engineer. Implements features across all components."
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name = "coder-3"
stage = "coder"
role = "Full-stack engineer. Implements features across all components."
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name = "qa-2"
stage = "qa"
role = "Reviews coder work in worktrees: runs quality gates, generates testing plans, and reports findings."
model = "sonnet"
max_turns = 40
max_budget_usd = 4.00
prompt = """You are the QA agent for story {{story_id}}. Your job is to review the coder's work in the worktree and produce a structured QA report.
Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.
## Your Workflow
### 1. Code Quality Scan
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes for obvious coding mistakes (unused imports, dead code, unhandled errors, hardcoded values)
- Run `cargo clippy --all-targets --all-features` and note any warnings
- If a `frontend/` directory exists:
- Run `npm run build` and note any TypeScript errors
- Run `npx @biomejs/biome check src/` and note any linting issues
### 2. Test Verification
- Run `cargo test` and verify all tests pass
- If `frontend/` exists: run `npm test` and verify all frontend tests pass
- Review test quality: look for tests that are trivial or don't assert meaningful behavior
### 3. Manual Testing Support
- Build the server: run `cargo build` and note success/failure
- If build succeeds: find a free port (try 3010-3020) and attempt to start the server
- Generate a testing plan including:
- URL to visit in the browser
- Things to check in the UI
- curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*storkit' || true` (NEVER use `pkill -f storkit` — it kills the vite dev server)
### 4. Produce Structured Report
Print your QA report to stdout before your process exits. The server will automatically run acceptance gates. Use this format:
```
## QA Report for {{story_id}}
### Code Quality
- clippy: PASS/FAIL (details)
- TypeScript build: PASS/FAIL/SKIP (details)
- Biome lint: PASS/FAIL/SKIP (details)
- Code review findings: (list any issues found, or "None")
### Test Verification
- cargo test: PASS/FAIL (N tests)
- npm test: PASS/FAIL/SKIP (N tests)
- Test quality issues: (list any trivial/weak tests, or "None")
### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Build failed")
- Pages to visit: (list)
- Things to check: (list)
- curl commands: (list)
### Overall: PASS/FAIL
```
## Rules
- Do NOT modify any code — read-only review only
- If the server fails to start, still provide the testing plan with curl commands
- The server automatically runs acceptance gates when your process exits"""
system_prompt = "You are a QA agent. Your job is read-only: review code quality, run tests, try to start the server, and produce a structured QA report. Do not modify code. The server automatically runs acceptance gates when your process exits."
[[agent]]
name = "coder-opus"
stage = "coder"
role = "Senior full-stack engineer for complex tasks. Implements features across all components."
model = "opus"
max_turns = 80
max_budget_usd = 20.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a senior full-stack engineer working autonomously in a git worktree. You handle complex tasks requiring deep architectural understanding. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
[[agent]]
name = "qa"
stage = "qa"
role = "Reviews coder work in worktrees: runs quality gates, generates testing plans, and reports findings."
model = "sonnet"
max_turns = 40
max_budget_usd = 4.00
prompt = """You are the QA agent for story {{story_id}}. Your job is to review the coder's work in the worktree and produce a structured QA report.
Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.
## Your Workflow
### 1. Code Quality Scan
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes for obvious coding mistakes (unused imports, dead code, unhandled errors, hardcoded values)
- Run `cargo clippy --all-targets --all-features` and note any warnings
- If a `frontend/` directory exists:
- Run `npm run build` and note any TypeScript errors
- Run `npx @biomejs/biome check src/` and note any linting issues
### 2. Test Verification
- Run `cargo test` and verify all tests pass
- If `frontend/` exists: run `npm test` and verify all frontend tests pass
- Review test quality: look for tests that are trivial or don't assert meaningful behavior
### 3. Manual Testing Support
- Build the server: run `cargo build` and note success/failure
- If build succeeds: find a free port (try 3010-3020) and attempt to start the server
- Generate a testing plan including:
- URL to visit in the browser
- Things to check in the UI
- curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*storkit' || true` (NEVER use `pkill -f storkit` — it kills the vite dev server)
### 4. Produce Structured Report
Print your QA report to stdout before your process exits. The server will automatically run acceptance gates. Use this format:
```
## QA Report for {{story_id}}
### Code Quality
- clippy: PASS/FAIL (details)
- TypeScript build: PASS/FAIL/SKIP (details)
- Biome lint: PASS/FAIL/SKIP (details)
- Code review findings: (list any issues found, or "None")
### Test Verification
- cargo test: PASS/FAIL (N tests)
- npm test: PASS/FAIL/SKIP (N tests)
- Test quality issues: (list any trivial/weak tests, or "None")
### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Build failed")
- Pages to visit: (list)
- Things to check: (list)
- curl commands: (list)
### Overall: PASS/FAIL
```
## Rules
- Do NOT modify any code — read-only review only
- If the server fails to start, still provide the testing plan with curl commands
- The server automatically runs acceptance gates when your process exits"""
system_prompt = "You are a QA agent. Your job is read-only: review code quality, run tests, try to start the server, and produce a structured QA report. Do not modify code. The server automatically runs acceptance gates when your process exits."
[[agent]]
name = "mergemaster"
stage = "mergemaster"
role = "Merges completed coder work into master, runs quality gates, archives stories, and cleans up worktrees."
model = "opus"
max_turns = 30
max_budget_usd = 5.00
prompt = """You are the mergemaster agent for story {{story_id}}. Your job is to merge the completed coder work into master.
Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.
## Your Workflow
1. Call merge_agent_work(story_id='{{story_id}}') via the MCP tool to trigger the full merge pipeline
2. Review the result: check success, had_conflicts, conflicts_resolved, gates_passed, and gate_output
3. If merge succeeded and gates passed: report success to the human
4. If conflicts were auto-resolved (conflicts_resolved=true) and gates passed: report success, noting which conflicts were resolved
5. If conflicts could not be auto-resolved: **resolve them yourself** in the merge worktree (see below)
6. If merge failed for any other reason: call report_merge_failure(story_id='{{story_id}}', reason='<details>') and report to the human
7. If gates failed after merge: attempt to fix the issues yourself in the merge worktree, then re-trigger merge_agent_work. After 3 fix attempts, call report_merge_failure and stop.
## Resolving Complex Conflicts Yourself
When the auto-resolver fails, you have access to the merge worktree at `.story_kit/merge_workspace/`. Go in there and resolve the conflicts manually:
1. Run `git diff --name-only --diff-filter=U` in the merge worktree to list conflicted files
2. **Build context before touching code.** Run `git log --oneline master...HEAD` on the feature branch to see its commits. Then run `git log --oneline --since="$(git log -1 --format=%ci <feature-branch-base-commit>)" master` to see what landed on master since the branch was created. Read the story files in `.story_kit/work/` for any recently merged stories that touch the same files — this tells you WHY master changed and what must be preserved.
3. Read each conflicted file and understand both sides of the conflict
4. **Understand intent, not just syntax.** The feature branch may be behind master — master's version of shared infrastructure is almost always correct. The feature branch's contribution is the NEW functionality it adds. Your job is to integrate the new into master's structure, not pick one side.
5. Resolve by integrating the feature's new functionality into master's code structure
6. Stage resolved files with `git add`
7. Run `cargo check` (and `npm run build` if frontend changed) to verify compilation
8. If it compiles, commit and re-trigger merge_agent_work
### Common conflict patterns in this project:
**Story file rename/rename conflicts:** Both branches moved the story .md file to different pipeline directories. Resolution: `git rm` both sides — story files in `work/2_current/`, `work/3_qa/`, `work/4_merge/` are gitignored and don't need to be committed.
**bot.rs tokio::select! conflicts:** Master has a `tokio::select!` loop in `handle_message()` that handles permission forwarding (story 275). Feature branches created before story 275 have a simpler direct `provider.chat_stream().await` call. Resolution: KEEP master's tokio::select! loop. Integrate only the feature's new logic (e.g. typing indicators, new callbacks) into the existing loop structure. Do NOT replace the loop with the old direct call.
**Duplicate functions/imports:** The auto-resolver keeps both sides, producing duplicates. Resolution: keep one copy (prefer master's version), delete the duplicate.
**Formatting-only conflicts:** Both sides reformatted the same code differently. Resolution: pick either side (prefer master).
## Fixing Gate Failures
If quality gates fail (cargo clippy, cargo test, npm run build, npm test), attempt to fix issues yourself in the merge worktree.
**Fix yourself (up to 3 attempts total):**
- Syntax errors (missing semicolons, brackets, commas)
- Duplicate definitions from merge artifacts
- Simple type annotation errors
- Unused import warnings flagged by clippy
- Mismatched braces from bad conflict resolution
- Trivial formatting issues that block compilation or linting
**Report to human without attempting a fix:**
- Logic errors or incorrect business logic
- Missing function implementations
- Architectural changes required
- Non-trivial refactoring needed
**Max retry limit:** If gates still fail after 3 fix attempts, call report_merge_failure to record the failure, then stop immediately and report the full gate output to the human.
## CRITICAL Rules
- NEVER manually move story files between pipeline stages (e.g. from 4_merge/ to 5_done/)
- NEVER call accept_story — only merge_agent_work can move stories to done after a successful merge
- When merge fails after exhausting your fix attempts, ALWAYS call report_merge_failure
- Report conflict resolution outcomes clearly
- Report gate failures with full output so the human can act if needed
- The server automatically runs acceptance gates when your process exits"""
system_prompt = "You are the mergemaster agent. Your primary job is to merge feature branches to master. First try the merge_agent_work MCP tool. If the auto-resolver fails on complex conflicts, resolve them yourself in the merge worktree — you are an opus-class agent capable of understanding both sides of a conflict and producing correct merged code. Common patterns: keep master's tokio::select! permission loop in bot.rs, discard story file rename conflicts (gitignored), remove duplicate definitions. After resolving, verify compilation before re-triggering merge. CRITICAL: Never manually move story files or call accept_story. After 3 failed fix attempts, call report_merge_failure and stop."
@@ -1,33 +0,0 @@
# Project Context
## High-Level Goal
To build a standalone **Agentic AI Code Assistant** application as a single Rust binary that serves a Vite/React web UI and exposes a WebSocket API. The assistant will facilitate a test-driven development (TDD) workflow first, with both unit and integration tests providing the primary guardrails for code changes. Once the single-threaded TDD workflow is stable and usable (including compatibility with lower-cost agents), the project will evolve to a multi-agent orchestration model using Git worktrees and supervisory roles to maximize throughput. Unlike a passive chat interface, this assistant acts as an **Agent**, capable of using tools to read the filesystem, execute shell commands, manage git repositories, and modify code directly to implement features.
## Core Features
1. **Chat Interface:** A conversational UI for the user to interact with the AI assistant.
2. **Agentic Tool Bridge:** A robust system mapping LLM "Tool Calls" to native Rust functions.
* **Filesystem:** Read/Write access (scoped to the target project).
* **Search:** High-performance file searching (ripgrep-style) and content retrieval.
* **Shell Integration:** Ability to execute approved commands (e.g., `cargo`, `npm`, `git`) to run tests, linters, and version control.
3. **Workflow Management:** Specialized tools to manage a TDD-first lifecycle:
* Defining test requirements (unit + integration) before code changes.
* Implementing code via red-green-refactor.
* Enforcing test and quality gates before acceptance.
* Scaling later to multi-agent orchestration with Git worktrees and supervisory checks, after the single-threaded process is stable.
4. **LLM Integration:** Connection to an LLM backend to drive the intelligence and tool selection.
* **Remote:** Support for major APIs (Anthropic Claude, Google Gemini, OpenAI, etc).
* **Local:** Support for local inference via Ollama.
## Domain Definition
* **User:** A software engineer using the assistant to build a project.
* **Target Project:** The local software project the user is working on.
* **Agent:** The AI entity that receives prompts and decides which **Tools** to invoke to solve the problem.
* **Tool:** A discrete function exposed to the Agent (e.g., `run_shell_command`, `write_file`, `search_project`).
* **Story:** A unit of work defining a change (Feature Request).
* **Spec:** A persistent documentation artifact defining the current truth of the system.
## Glossary
* **SDTW:** Story-Driven Test Workflow.
* **Web Server Binary:** The Rust binary that serves the Vite/React frontend and exposes the WebSocket API.
* **Living Spec:** The collection of Markdown files in `.story_kit/` that define the project.
* **Tool Call:** A structured request from the LLM to execute a specific native function.
@@ -1,44 +0,0 @@
# Slack Integration Setup
## Bot Configuration
Slack integration is configured via `bot.toml` in the project's `.story_kit/` directory:
```toml
transport = "slack"
display_name = "Storkit"
slack_bot_token = "xoxb-..."
slack_signing_secret = "..."
slack_channel_ids = ["C01ABCDEF"]
```
## Slack App Configuration
### Event Subscriptions
1. In your Slack app settings, enable **Event Subscriptions**.
2. Set the **Request URL** to: `https://<your-host>/webhook/slack`
3. Subscribe to the `message.channels` and `message.im` bot events.
### Slash Commands
Slash commands provide quick access to pipeline commands without mentioning the bot.
1. In your Slack app settings, go to **Slash Commands**.
2. Create the following commands, all pointing to the same **Request URL**: `https://<your-host>/webhook/slack/command`
| Command | Description |
|---------|-------------|
| `/storkit-status` | Show pipeline status and agent availability |
| `/storkit-cost` | Show token spend: 24h total, top stories, and breakdown |
| `/storkit-show` | Display the full text of a work item (e.g. `/storkit-show 42`) |
| `/storkit-git` | Show git status: branch, changes, ahead/behind |
| `/storkit-htop` | Show system and agent process dashboard |
All slash command responses are **ephemeral** — only the user who invoked the command sees the response.
### OAuth & Permissions
Required bot token scopes:
- `chat:write` — send messages
- `commands` — handle slash commands
@@ -1,33 +0,0 @@
# Functional Spec: UI Layout
## 1. Global Structure
The application uses a **fixed-layout** strategy to maximize chat visibility.
```text
+-------------------------------------------------------+
| HEADER (Fixed Height, e.g., 50px) |
| [Project: ~/foo/bar] [Model: llama3] [x] Tools |
+-------------------------------------------------------+
| |
| CHAT AREA (Flex Grow, Scrollable) |
| |
| (User Message) |
| (Agent Message) |
| |
+-------------------------------------------------------+
| INPUT AREA (Fixed Height, Bottom) |
| [ Input Field ........................... ] [Send] |
+-------------------------------------------------------+
```
## 2. Components
* **Header:** Contains global context (Project) and session config (Model/Tools).
* *Constraint:* Must not scroll away.
* **ChatList:** The scrollable container for messages.
* **InputBar:** Pinned to the bottom.
## 3. Styling
* Use Flexbox (`flex-direction: column`) on the main container.
* Header: `flex-shrink: 0`.
* ChatList: `flex-grow: 1`, `overflow-y: auto`.
* InputBar: `flex-shrink: 0`.
@@ -1,474 +0,0 @@
# Functional Spec: UI/UX Responsiveness
## Problem
Currently, the `chat` command in Rust is an async function that performs a long-running, blocking loop (waiting for LLM, executing tools). While Tauri executes this on a separate thread from the UI, the frontend awaits the *entire* result before re-rendering. This makes the app feel "frozen" because there is no feedback during the 10-60 seconds of generation.
## Solution: Event-Driven Feedback
Instead of waiting for the final array of messages, the Backend should emit **Events** to the Frontend in real-time.
### 1. Events
* `chat:token`: Emitted when a text token is generated (Streaming text).
* `chat:tool-start`: Emitted when a tool call begins (e.g., `{ tool: "git status" }`).
* `chat:tool-end`: Emitted when a tool call finishes (e.g., `{ output: "..." }`).
### 2. Implementation Strategy
#### Token-by-Token Streaming (Story 18)
The system now implements full token streaming for real-time response display:
* **Backend (Rust):**
* Set `stream: true` in Ollama API requests
* Parse newline-delimited JSON from Ollama's streaming response
* Emit `chat:token` events for each token received
* Use `reqwest` streaming body with async iteration
* After streaming completes, emit `chat:update` with the full message
* **Frontend (TypeScript):**
* Listen for `chat:token` events
* Append tokens to the current assistant message in real-time
* Maintain smooth auto-scroll as tokens arrive
* After streaming completes, process `chat:update` for final state
* **Event-Driven Updates:**
* `chat:token`: Emitted for each token during streaming (payload: `{ content: string }`)
* `chat:update`: Emitted after LLM response complete or after Tool Execution (payload: `Message[]`)
* Frontend maintains streaming state separate from message history
### 3. Visuals
* **Loading State:** The "Send" button should show a spinner or "Stop" button.
* **Auto-Scroll:** The chat view uses smart auto-scroll that respects user scrolling (see Smart Auto-Scroll section below).
## Smart Auto-Scroll (Story 22)
### Problem
Users need to review previous messages while the AI is streaming new content, but aggressive auto-scrolling constantly drags them back to the bottom, making it impossible to read older content.
### Solution: Scroll-Position-Aware Auto-Scroll
The chat implements intelligent auto-scroll that:
* Automatically scrolls to show new content when the user is at/near the bottom
* Pauses auto-scroll when the user scrolls up to review older messages
* Resumes auto-scroll when the user scrolls back to the bottom
### Requirements
1. **Scroll Detection:** Track whether the user is at the bottom of the chat
2. **Threshold:** Define "near bottom" as within 25px of the bottom
3. **Auto-Scroll Logic:** Only trigger auto-scroll if user is at/near bottom
4. **Smooth Operation:** No flickering or jarring behavior during scrolling
5. **Universal:** Works during both streaming responses and tool execution
### Implementation Notes
**Core Components:**
* `scrollContainerRef`: Reference to the scrollable messages container
* `shouldAutoScrollRef`: Tracks whether auto-scroll should be active (uses ref to avoid re-renders)
* `messagesEndRef`: Target element for scroll-to-bottom behavior
**Detection Function:**
```typescript
const isScrolledToBottom = () => {
const element = scrollContainerRef.current;
if (!element) return true;
const threshold = 25; // pixels from bottom
return (
element.scrollHeight - element.scrollTop - element.clientHeight < threshold
);
};
```
**Scroll Handler:**
```typescript
const handleScroll = () => {
// Update auto-scroll state based on scroll position
shouldAutoScrollRef.current = isScrolledToBottom();
};
```
**Conditional Auto-Scroll:**
```typescript
useEffect(() => {
if (shouldAutoScrollRef.current) {
scrollToBottom();
}
}, [messages, streamingContent]);
```
**DOM Setup:**
* Attach `ref={scrollContainerRef}` to the messages container
* Attach `onScroll={handleScroll}` to detect user scrolling
* Initialize `shouldAutoScrollRef` to `true` (enable auto-scroll by default)
### Edge Cases
1. **Initial Load:** Auto-scroll is enabled by default
2. **Rapid Scrolling:** Uses refs to avoid race conditions and excessive re-renders
3. **Manual Scroll to Bottom:** Auto-scroll re-enables when user scrolls near bottom
4. **No Container:** Falls back to always allowing auto-scroll if container ref is null
## Tool Output Display
### Problem
Tool outputs (like file contents, search results, or command output) can be very long, making the chat history difficult to read. Users need to see the Agent's reasoning and responses without being overwhelmed by verbose tool output.
### Solution: Collapsible Tool Outputs
Tool outputs should be rendered in a collapsible component that is **closed by default**.
### Requirements
1. **Default State:** Tool outputs are collapsed/closed when first rendered
2. **Summary Line:** Shows essential information without expanding:
- Tool name (e.g., `read_file`, `exec_shell`)
- Key arguments (e.g., file path, command name)
- Format: "▶ tool_name(key_arg)"
- Example: "▶ read_file(src/main.rs)"
- Example: "▶ exec_shell(cargo check)"
3. **Expandable:** User can click the summary to toggle expansion
4. **Output Display:** When expanded, shows the complete tool output in a readable format:
- Use `<pre>` or monospace font for code/terminal output
- Preserve whitespace and line breaks
- Limit height with scrolling for very long outputs (e.g., max-height: 300px)
5. **Visual Indicator:** Clear arrow or icon showing collapsed/expanded state
6. **Styling:** Consistent with the dark theme, distinguishable from assistant messages
### Implementation Notes
* Use native `<details>` and `<summary>` HTML elements for accessibility
* Or implement custom collapsible component with proper ARIA attributes
* Tool outputs should be visually distinct (border, background color, or badge)
* Multiple tool calls in sequence should each be independently collapsible
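The summary-line format above ("▶ tool_name(key_arg)") can be sketched as a small pure helper. `toolSummary` and the key-argument heuristic (take the first argument value) are illustrative names, not taken from the codebase:

```typescript
// Sketch of the collapsed summary line: "▶ tool_name(key_arg)".
// toolSummary and the first-argument heuristic are hypothetical,
// not from the actual codebase.
interface ToolCall {
  name: string;
  args: Record<string, string>;
}

function toolSummary(call: ToolCall): string {
  // Pick the first argument value as the "key argument" shown inline.
  const keyArg = Object.values(call.args)[0] ?? "";
  return `▶ ${call.name}(${keyArg})`;
}

// toolSummary({ name: "read_file", args: { path: "src/main.rs" } })
// → "▶ read_file(src/main.rs)"
```

The expanded body would then render the full output separately (e.g. inside the `<details>` element), keeping the summary cheap to compute per tool call.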
## Scroll Bar Styling
### Problem
Visible scroll bars create visual clutter and make the interface feel less polished. Standard browser scroll bars can be distracting and break the clean aesthetic of the dark theme.
### Solution: Hidden Scroll Bars with Maintained Functionality
Scroll bars should be hidden while maintaining full scroll functionality.
### Requirements
1. **Visual:** Scroll bars should not be visible to the user
2. **Functionality:** Scrolling must still work perfectly:
- Mouse wheel scrolling
- Trackpad scrolling
- Keyboard navigation (arrow keys, page up/down)
- Auto-scroll to bottom for new messages
3. **Cross-browser:** Solution must work on Chrome, Firefox, and Safari
4. **Areas affected:**
- Main chat message area (vertical scroll)
- Tool output content (both vertical and horizontal)
- Any other scrollable containers
### Implementation Notes
* Use CSS `scrollbar-width: none` for Firefox
* Use `::-webkit-scrollbar { display: none; }` for Chrome/Safari/Edge
* Maintain `overflow: auto` or `overflow-y: scroll` to preserve scroll functionality
* Ensure `overflow-x: hidden` where horizontal scroll is not needed
* Test with very long messages and large tool outputs to ensure no layout breaking
## Text Alignment and Readability
### Problem
Center-aligned text in a chat interface is unconventional and reduces readability, especially for code blocks and long-form content. Standard chat UIs align messages differently based on the sender.
### Solution: Context-Appropriate Text Alignment
Messages should follow standard chat UI conventions with proper alignment based on message type.
### Requirements
1. **User Messages:** Right-aligned (standard pattern showing messages sent by the user)
2. **Assistant Messages:** Left-aligned (standard pattern showing messages received)
3. **Tool Outputs:** Left-aligned (part of the system/assistant response flow)
4. **Code Blocks:** Always left-aligned regardless of message type (for readability)
5. **Container:** Remove any center-alignment from the chat container
6. **Max-Width:** Maintain current max-width constraint (e.g., 768px) for optimal readability
7. **Spacing:** Maintain proper padding and visual hierarchy between messages
### Implementation Notes
* Check for `textAlign: "center"` in inline styles and remove
* Check for `text-align: center` in CSS and remove from chat-related classes
* Ensure flexbox alignment is set appropriately:
* User messages: `alignItems: "flex-end"`
* Assistant/Tool messages: `alignItems: "flex-start"`
* Code blocks should have `text-align: left` explicitly set
## Syntax Highlighting
### Problem
Code blocks in assistant responses currently lack syntax highlighting, making them harder to read and understand. Developers expect colored syntax highlighting similar to their code editors.
### Solution: Syntax Highlighting for Code Blocks
Integrate syntax highlighting into markdown code blocks rendered by the assistant.
### Requirements
1. **Languages Supported:** At minimum:
- JavaScript/TypeScript
- Rust
- Python
- JSON
- Markdown
- Shell/Bash
- HTML/CSS
- SQL
2. **Theme:** Use a dark theme that complements the existing dark UI (e.g., `oneDark`, `vsDark`, `dracula`)
3. **Integration:** Work seamlessly with `react-markdown` component
4. **Performance:** Should not significantly impact rendering performance
5. **Fallback:** Plain monospace text for unrecognized languages
6. **Inline Code:** Inline code (single backticks) should maintain simple styling without full syntax highlighting
### Implementation Notes
* Use `react-syntax-highlighter` library with `react-markdown`
* Or use `rehype-highlight` plugin for `react-markdown`
* Configure with a dark theme preset (e.g., `oneDark` from `react-syntax-highlighter/dist/esm/styles/prism`)
* Apply to code blocks via `react-markdown` components prop:
```tsx
<Markdown
components={{
code: ({node, inline, className, children, ...props}) => {
const match = /language-(\w+)/.exec(className || '');
return !inline && match ? (
<SyntaxHighlighter style={oneDark} language={match[1]} {...props}>
{String(children).replace(/\n$/, '')}
</SyntaxHighlighter>
) : (
<code className={className} {...props}>{children}</code>
);
}
}}
/>
```
* Ensure syntax highlighted code blocks are left-aligned
* Test with various code samples to ensure proper rendering
## Token Streaming
### Problem
Without streaming, users see no feedback during model generation. The response appears all at once after waiting, which feels unresponsive and provides no indication that the system is working.
### Solution: Token-by-Token Streaming
Stream tokens from Ollama in real-time and display them as they arrive, providing immediate feedback and a responsive chat experience similar to ChatGPT.
### Requirements
1. **Real-time Display:** Tokens appear immediately as Ollama generates them
2. **Smooth Performance:** No lag or stuttering during high token throughput
3. **Tool Compatibility:** Streaming works correctly with tool calls and multi-turn conversations
4. **Auto-scroll:** Chat view follows streaming content automatically
5. **Error Handling:** Gracefully handle stream interruptions or errors
6. **State Management:** Maintain clean separation between streaming state and final message history
### Implementation Notes
#### Backend (Rust)
* Enable streaming in Ollama requests: `stream: true`
* Parse newline-delimited JSON from response body
* Each line is a separate JSON object: `{"message":{"content":"token"},"done":false}`
* Use `futures::StreamExt` or similar for async stream processing
* Emit `chat:token` event for each token
* Emit `chat:update` when streaming completes
* Handle both streaming text and tool call interruptions
#### Frontend (TypeScript)
* Create streaming state separate from message history
* Listen for `chat:token` events and append to streaming buffer
* Render streaming content in real-time
* On `chat:update`, replace streaming content with final message
* Maintain scroll position during streaming
#### Ollama Streaming Format
```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" world"},"done":false}
{"message":{"role":"assistant","content":"!"},"done":true}
{"message":{"role":"assistant","tool_calls":[...]},"done":true}
```
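The real parser lives in the Rust backend, but the line discipline above can be sketched in TypeScript. This is a minimal sketch under one assumption: network chunks may split a JSON line in half, so a carry buffer holds the trailing partial line between chunks. `parseStreamChunk` and `OllamaLine` are illustrative names:

```typescript
// Sketch of parsing Ollama's newline-delimited JSON stream.
// Each complete line has the shape {"message":{...},"done":bool}.
interface OllamaLine {
  message?: { role?: string; content?: string; tool_calls?: unknown[] };
  done: boolean;
}

// Keep a carry buffer so a JSON line split across two network chunks
// is only parsed once it is complete (terminated by "\n").
function parseStreamChunk(
  buffer: string,
  chunk: string,
): { lines: OllamaLine[]; rest: string } {
  const combined = buffer + chunk;
  const parts = combined.split("\n");
  const rest = parts.pop() ?? ""; // trailing partial line, if any
  const lines = parts
    .filter((p) => p.trim().length > 0)
    .map((p) => JSON.parse(p) as OllamaLine);
  return { lines, rest };
}
```

Each complete line then maps onto the events above: a `content` token becomes a `chat:token` emission, and a `done: true` line triggers the final `chat:update`.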
### Edge Cases
* Tool calls during streaming: Switch from text streaming to tool execution
* Cancellation during streaming: Clean up streaming state properly
* Network interruptions: Show error and preserve partial content
* Very fast streaming: Throttle UI updates if needed for performance
## Input Focus Management
### Problem
When the app loads with a project selected, users need to click into the chat input box before they can start typing. This adds unnecessary friction to the user experience.
### Solution: Auto-focus on Component Mount
The chat input field should automatically receive focus when the chat component mounts, allowing users to immediately start typing.
### Requirements
1. **Auto-focus:** Input field receives focus automatically when chat component loads
2. **Visible Cursor:** Cursor should be visible and blinking in the input field
3. **Immediate Typing:** User can start typing without clicking into the field
4. **Non-intrusive:** Should not interfere with other UI interactions or accessibility
5. **Timing:** Focus should be set after the component fully mounts
### Implementation Notes
* Use React `useRef` to create a reference to the input element
* Use `useEffect` with empty dependency array to run once on mount
* Call `inputRef.current?.focus()` in the effect
* Ensure the ref is properly attached to the input element
* Example implementation:
```tsx
const inputRef = useRef<HTMLInputElement>(null);
useEffect(() => {
inputRef.current?.focus();
}, []);
return <input ref={inputRef} ... />
```
## Response Interruption
### Problem
Users may want to interrupt a long-running model response to ask a different question or change direction. Having to wait for the full response to complete creates friction and wastes time.
### Solution: Interrupt on Typing
When the user starts typing in the input field while the model is generating a response, the generation should be cancelled immediately, allowing the user to send a new message.
### Requirements
1. **Input Always Enabled:** The input field should remain enabled and usable even while the model is generating
2. **Interrupt Detection:** Detect when user types in the input field while `loading` state is true
3. **Immediate Cancellation:** Cancel the ongoing generation as soon as typing is detected
4. **Preserve Partial Response:** Any partial response generated before interruption should remain visible in the chat
5. **State Reset:** UI should return to normal state (ready to send) after interruption
6. **Preserve User Input:** The user's new input should be preserved in the input field
7. **Visual Feedback:** "Thinking..." indicator should disappear when generation is interrupted
### Implementation Notes
* Do NOT disable the input field during loading
* Listen for input changes while `loading` is true
* When user types during loading, call backend to cancel generation (if possible) or just stop waiting
* Set `loading` state to false immediately when typing detected
* Backend may need a `cancel_chat` command or similar
* Consider if Ollama requests can be cancelled mid-generation or if we just stop processing the response
* Example implementation:
```tsx
const handleInputChange = (e: React.ChangeEvent<HTMLInputElement>) => {
const newValue = e.target.value;
setInput(newValue);
// If user starts typing while model is generating, interrupt
if (loading && newValue.length > input.length) {
setLoading(false);
// Optionally call backend to cancel: invoke("cancel_chat")
}
};
```
## Session Management
### Problem
Users may want to start a fresh conversation without restarting the application. Long conversations can become unwieldy, and users need a way to clear context for new tasks while keeping the same project open.
### Solution: New Session Button
Provide a clear, accessible way for users to start a new session by clearing the chat history.
### Requirements
1. **Button Placement:** Located in the header area, near model controls
2. **Visual Design:** Secondary/subtle styling to prevent accidental clicks
3. **Confirmation Dialog:** Ask "Are you sure? This will clear all messages." before clearing
4. **State Management:**
- Clear `messages` state array
- Clear `streamingContent` if any streaming is in progress
- Preserve project path, model selection, and tool settings
- Cancel any in-flight backend operations before clearing
5. **User Feedback:** Immediate visual response (messages disappear)
6. **Empty State:** Show a welcome message or empty state after clearing
### Implementation Notes
**Frontend:**
- Add "New Session" button to header
- Implement confirmation modal/dialog
- Call `setMessages([])` after confirmation
- Cancel any ongoing streaming/tool execution
- Consider keyboard shortcut (e.g., Cmd/Ctrl+K)
**Backend:**
- May need to cancel ongoing chat operations
- Clear any server-side state if applicable
- No persistent session history (sessions are ephemeral)
**Edge Cases:**
- Don't clear while actively streaming (cancel first, then clear)
- Handle confirmation dismissal (do nothing)
- Ensure button is always accessible (not disabled)
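The state-management rules in requirement 4 can be sketched as a pure reducer: clear the conversation, keep the session config. `ChatState` and `clearSession` are hypothetical names for illustration:

```typescript
// Sketch of the "New Session" reset: messages and any in-progress
// streaming content are cleared; project path, model, and tool
// settings are preserved. Names are illustrative.
interface ChatState {
  messages: unknown[];
  streamingContent: string;
  projectPath: string;
  model: string;
  toolsEnabled: boolean;
}

function clearSession(state: ChatState): ChatState {
  return {
    ...state,             // preserve projectPath, model, toolsEnabled
    messages: [],         // clear chat history
    streamingContent: "", // drop any partial streaming output
  };
}
```

Per the edge cases above, any in-flight backend operation should be cancelled first; only then is the reducer applied.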
### Button Label Options
- "New Session" (clear and descriptive)
- "Clear Chat" (direct but less friendly)
- "Start Over" (conversational)
- Icon: 🔄 or ⊕ (plus in circle)
## Context Window Usage Display
### Problem
Users have no visibility into how much of the model's context window they're using. This leads to:
- Unexpected quality degradation when context limit is reached
- Uncertainty about when to start a new session
- Inability to gauge conversation length
### Solution: Real-time Context Usage Indicator
Display a persistent indicator showing current token usage vs. model's context window limit.
### Requirements
1. **Visual Indicator:** Always visible in header area
2. **Real-time Updates:** Updates as messages are added
3. **Model-Aware:** Shows correct limit based on selected model
4. **Color Coding:** Visual warning as limit approaches
- Green/default: 0-74% usage
- Yellow/warning: 75-89% usage
- Red/danger: 90-100% usage
5. **Clear Format:** "2.5K / 8K tokens (31%)" or similar
6. **Token Estimation:** Approximate token count for all messages
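A minimal sketch of the label format and color tiers above (helper names are hypothetical):

```typescript
// Map a usage percentage to the warning tiers described above.
type UsageLevel = "default" | "warning" | "danger";

const usageLevel = (percent: number): UsageLevel => {
  if (percent >= 90) return "danger";
  if (percent >= 75) return "warning";
  return "default";
};

// Render counts as "2.5K" above 1000 tokens, plain numbers below.
const formatTokens = (n: number): string =>
  n >= 1000 ? `${(n / 1000).toFixed(n % 1000 === 0 ? 0 : 1)}K` : `${n}`;

// Produce a "2.5K / 8K tokens (31%)"-style label.
const formatUsage = (used: number, limit: number): string =>
  `${formatTokens(used)} / ${formatTokens(limit)} tokens (${Math.round((used / limit) * 100)}%)`;
```

Note the empty-conversation case works out naturally: `formatUsage(0, 8000)` yields `"0 / 8K tokens (0%)"`.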
### Implementation Notes
**Token Estimation:**
- Use simple approximation: 1 token ≈ 4 characters
- Or integrate `gpt-tokenizer` for more accuracy
- Count: system prompts + user messages + assistant responses + tool outputs + tool calls
**Model Context Windows:**
- llama3.1, llama3.2: 8K tokens
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens
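The limits above can live in a simple lookup with the 8K fallback. This sketch assumes power-of-two token counts (8,192 / 16,384 / 32,768) and Ollama-style `model:tag` names; adjust to the real values:

```typescript
// Context window limits (tokens) per model family; values here are assumed
// power-of-two equivalents of the 8K/16K/32K figures above.
const CONTEXT_WINDOWS: Record<string, number> = {
  "llama3.1": 8_192,
  "llama3.2": 8_192,
  "qwen2.5-coder": 32_768,
  "deepseek-coder": 16_384,
};

const contextWindowFor = (model: string): number => {
  // Match on the model family so tags like "qwen2.5-coder:7b" resolve correctly.
  const family = model.split(":")[0];
  return CONTEXT_WINDOWS[family] ?? 8_192;
};
```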
**Calculation:**
```tsx
// Minimal message shape assumed by this snippet.
interface Message {
  content: string;
  tool_calls?: unknown[];
}

// Rough heuristic: 1 token ≈ 4 characters.
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

// Sum estimated tokens for the system prompt plus every message,
// including serialized tool calls.
const calculateContextUsage = (messages: Message[], systemPrompt: string): number => {
  let total = estimateTokens(systemPrompt);
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }
  return total;
};
```
**UI Placement:**
- Header area, near model selector
- Non-intrusive but always visible
- Optional tooltip with breakdown on hover
### Edge Cases
- Empty conversation: Show "0 / 8K"
- During streaming: Include partial content
- After clearing: Reset to 0
- Model change: Update context window limit

# Tech Stack & Constraints
## Overview
This project is a standalone Rust **web server binary** that serves a Vite/React frontend and exposes a **WebSocket API**. The built frontend assets are packaged with the binary (in a `frontend` directory) and served as static files. It functions as an **Agentic Code Assistant** capable of safely executing tools on the host system.
## Core Stack
* **Backend:** Rust (Web Server)
* **MSRV:** Stable (latest)
* **Framework:** Poem HTTP server with WebSocket support for streaming; HTTP APIs should use Poem OpenAPI (Swagger) for non-streaming endpoints.
* **Frontend:** TypeScript + React
* **Build Tool:** Vite
* **Package Manager:** npm
* **Styling:** CSS Modules or Tailwind (TBD - Defaulting to CSS Modules)
* **State Management:** React Context / Hooks
* **Chat UI:** Rendered Markdown with syntax highlighting.
## Agent Architecture
The application follows a **Tool-Use (Function Calling)** architecture:
1. **Frontend:** Collects user input and sends it to the LLM.
2. **LLM:** Decides to generate text OR request a **Tool Call** (e.g., `execute_shell`, `read_file`).
3. **Web Server Backend (The "Hand"):**
* Intercepts Tool Calls.
* Validates the request against the **Safety Policy**.
* Executes the native code (File I/O, Shell Process, Search).
* Returns the output (stdout/stderr/file content) to the LLM.
* **Streaming:** The backend sends real-time updates over WebSocket to keep the UI responsive during long-running Agent tasks.
## LLM Provider Abstraction
To support both Remote and Local models, the system implements a `ModelProvider` abstraction layer.
* **Strategy:**
* Abstract the differences between API formats (OpenAI-compatible vs Anthropic vs Gemini).
* Normalize "Tool Use" definitions, as each provider handles function calling schemas differently.
* **Supported Providers:**
* **Ollama:** Local inference (e.g., Llama 3, DeepSeek Coder) for privacy and offline usage.
* **Anthropic:** Claude 3.5 models (Sonnet, Haiku) via API for coding tasks (Story 12).
* **Provider Selection:**
* Automatic detection based on model name prefix:
* `claude-` → Anthropic API
* Otherwise → Ollama
* Single unified model dropdown with section headers ("Anthropic", "Ollama")
* **API Key Management:**
* Anthropic API key stored server-side and persisted securely
* On first use of a Claude model, the user is prompted to enter an API key
* Key persists across sessions (no re-entry needed)
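The prefix rule can be sketched in a few lines (shown in TypeScript for illustration; the real routing lives in the Rust `ModelProvider` layer):

```typescript
type Provider = "anthropic" | "ollama";

// Route a model name to its provider by prefix, per the rule above.
const providerForModel = (model: string): Provider =>
  model.startsWith("claude-") ? "anthropic" : "ollama";
```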
## Tooling Capabilities
### 1. Filesystem (Native)
* **Scope:** Strictly limited to the user-selected `project_root`.
* **Operations:** Read, Write, List, Delete.
* **Constraint:** Modifications to `.git/` are strictly forbidden via file APIs (use Git tools instead).
### 2. Shell Execution
* **Library:** `tokio::process` for async execution.
* **Constraint:** We do **not** run an interactive shell (REPL). We run discrete, stateless commands.
* **Allowlist:** The agent may only execute specific binaries:
* `git`
* `cargo`, `rustc`, `rustfmt`, `clippy`
* `npm`, `node`, `yarn`, `pnpm`, `bun`
* `ls`, `find`, `grep` (if not using internal search)
* `mkdir`, `rm`, `touch`, `mv`, `cp`
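As an illustration of the policy (the real check lives in the Rust backend), an allowlist test inspects only the first token of the command:

```typescript
// Binaries the agent may execute, per the allowlist above.
const ALLOWED_BINARIES = new Set([
  "git", "cargo", "rustc", "rustfmt", "clippy",
  "npm", "node", "yarn", "pnpm", "bun",
  "ls", "find", "grep",
  "mkdir", "rm", "touch", "mv", "cp",
]);

// Approve a command only if its first whitespace-delimited token is allowlisted.
// Illustrative sketch only — argument validation is a separate concern.
const isAllowedCommand = (command: string): boolean => {
  const binary = command.trim().split(/\s+/)[0];
  return binary !== undefined && ALLOWED_BINARIES.has(binary);
};
```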
### 3. Search & Navigation
* **Library:** `ignore` (by BurntSushi) + `grep` logic.
* **Behavior:**
* Must respect `.gitignore` files automatically.
* Must be performant (parallel traversal).
## Coding Standards
### Rust
* **Style:** `rustfmt` standard.
* **Linter:** `clippy` - Must pass with 0 warnings before merging.
* **Error Handling:** Custom `AppError` type deriving `thiserror`. All Commands return `Result<T, AppError>`.
* **Concurrency:** Heavy tools (Search, Shell) must run on `tokio` threads to avoid blocking the UI.
* **Quality Gates:**
* `cargo clippy --all-targets --all-features` must show 0 errors, 0 warnings
* `cargo check` must succeed
* `cargo nextest run` must pass all tests
* **Test Coverage:**
* Generate JSON report: `cargo llvm-cov nextest --no-clean --json --output-path .story_kit/coverage/server.json`
* Generate lcov report: `cargo llvm-cov report --lcov --output-path .story_kit/coverage/server.lcov`
* Reports are written to `.story_kit/coverage/` (excluded from git)
### TypeScript / React
* **Style:** Biome formatter (replaces Prettier/ESLint).
* **Linter:** Biome - Must pass with 0 errors, 0 warnings before merging.
* **Types:** Shared types with Rust (via `tauri-specta` or manual interface matching) are preferred to ensure type safety across the bridge.
* **Testing:** Vitest for unit/component tests; Playwright for end-to-end tests.
* **Quality Gates:**
* `npx @biomejs/biome check src/` must show 0 errors, 0 warnings
* `npm run build` must succeed
* `npm test` must pass
* `npm run test:e2e` must pass
* No `any` types allowed (use proper types or `unknown`)
* React keys must use stable IDs, not array indices
* All buttons must have explicit `type` attribute
## Libraries (Approved)
* **Rust:**
* `serde`, `serde_json`: Serialization.
* `ignore`: Fast recursive directory iteration respecting gitignore.
* `walkdir`: Simple directory traversal.
* `tokio`: Async runtime.
* `reqwest`: For LLM API calls (Anthropic, Ollama).
* `eventsource-stream`: For Server-Sent Events (Anthropic streaming).
* `uuid`: For unique message IDs.
* `chrono`: For timestamps.
* `poem`: HTTP server framework.
* `poem-openapi`: OpenAPI (Swagger) for non-streaming HTTP APIs.
* **JavaScript:**
* `react-markdown`: For rendering chat responses.
* `vitest`: Unit/component testing.
* `playwright`: End-to-end testing.
## Running the App (Worktrees & Ports)
Multiple instances can run simultaneously in different worktrees. To avoid port conflicts:
- **Backend:** Set `STORKIT_PORT` to a unique port (default is 3001). Example: `STORKIT_PORT=3002 cargo run`
- **Frontend:** Run `npm run dev` from `frontend/`. It auto-selects the next unused port. It reads `STORKIT_PORT` to know which backend to talk to, so export it before running: `export STORKIT_PORT=3002 && cd frontend && npm run dev`
When running in a worktree, use a port that won't conflict with the main instance (3001). Ports 3002+ are good choices.
## Safety & Sandbox
1. **Project Scope:** The application must strictly enforce that it does not read/write outside the `project_root` selected by the user.
2. **Human in the Loop:**
* Shell commands that modify state (non-readonly) should ideally require a UI confirmation (configurable).
* File writes must be confirmed or revertible.