# Story Kit: The Story-Driven Test Workflow (SDTW)
**Target Audience:** Large Language Models (LLMs) acting as Senior Engineers. **Goal:** To maintain long-term project coherence, prevent context window exhaustion, and ensure high-quality, testable code generation in large software projects.
## 0. First Steps (For New LLM Sessions)
When you start a new session with this project:
- **Check for MCP Tools:** Read `.mcp.json` to discover the MCP server endpoint, then list the available tools:

  ```sh
  curl -s "$(jq -r '.mcpServers["storkit"].url' .mcp.json)" \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
  ```

  This returns the full tool catalog (create stories, spawn agents, record tests, manage worktrees, etc.). Familiarize yourself with the available tools before proceeding; they let you manipulate the workflow and spawn subsidiary agents directly, without manual file manipulation.
- **Read Context:** Check `.story_kit/specs/00_CONTEXT.md` for high-level project goals.
- **Read Stack:** Check `.story_kit/specs/tech/STACK.md` for technical constraints and patterns.
- **Check Work Items:** Look at `.story_kit/work/1_backlog/` and `.story_kit/work/2_current/` to see what work is pending.
## 1. The Philosophy
We treat the codebase as the implementation of a "Living Specification," driven by User Stories. Instead of ephemeral chat prompts ("Fix this", "Add that"), we work through persistent artifacts.
- Stories define the Change.
- Tests define the Truth.
- Code defines the Reality.
**The Golden Rule:** You are not allowed to write code until the Acceptance Criteria are captured in the story.
## 1.5 MCP Tools
Agents have programmatic access to the workflow via MCP tools served at `POST /mcp`. The project `.mcp.json` registers this endpoint automatically so Claude Code sessions and spawned agents can call tools like `create_story`, `validate_stories`, `list_upcoming`, `get_story_todos`, `record_tests`, `ensure_acceptance`, `start_agent`, `stop_agent`, `list_agents`, and `get_agent_output` without parsing English instructions.
To discover what tools are available, check `.mcp.json` for the server endpoint, then use the MCP protocol to list the tools.
## 2. Directory Structure
```
project_root/
├── .mcp.json             # MCP server configuration (if MCP tools are available)
└── .story_kit/
    ├── README.md         # This document
    ├── project.toml      # Agent configuration (roles, models, prompts)
    ├── work/             # Unified work item pipeline (stories, bugs, spikes)
    │   ├── 1_backlog/    # New work items awaiting implementation
    │   ├── 2_current/    # Work in progress
    │   ├── 3_qa/         # QA review
    │   ├── 4_merge/      # Ready to merge to master
    │   ├── 5_done/       # Merged and completed (auto-swept to 6_archived after 4 hours)
    │   └── 6_archived/   # Long-term archive
    ├── worktrees/        # Agent worktrees (managed by the server)
    ├── specs/            # Minimal guardrails (context + stack)
    │   ├── 00_CONTEXT.md # High-level goals, domain definition, and glossary
    │   ├── tech/         # Implementation details (Stack, Architecture, Constraints)
    │   │   └── STACK.md  # The "Constitution" (Languages, Libs, Patterns)
    │   └── functional/   # Domain logic (platform-agnostic behavior)
    │       └── ...
    └── src/              # The Code
```
### Work Items
All work items (stories, bugs, spikes) live in the same `work/` pipeline. Items are named: `{id}_{type}_{slug}.md`
- Stories: `57_story_live_test_gate_updates.md`
- Bugs: `4_bug_run_button_does_not_start_agent.md`
- Spikes: `61_spike_filesystem_watcher_architecture.md`
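The naming convention splits on underscores: everything before the first `_` is the id, the next token is the type, and the rest is the slug. A small shell sketch, purely illustrative, using one of the example filenames above:

```sh
# Illustrative: decompose a work item filename into id, type, and slug.
f="57_story_live_test_gate_updates.md"
id="${f%%_*}"                            # text before the first "_"
rest="${f#*_}"                           # drop "<id>_"
type="${rest%%_*}"                       # next token: story | bug | spike
slug="${rest#*_}"; slug="${slug%.md}"    # remainder, minus the extension
echo "$id $type $slug"                   # -> 57 story live_test_gate_updates
```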
Items move through stages by moving the file between directories:
`1_backlog` → `2_current` → `3_qa` → `4_merge` → `5_done` → `6_archived`
Items in `5_done/` are auto-swept to `6_archived/` after 4 hours by the server.
### Filesystem Watcher
The server watches `.story_kit/work/` for changes. When a file is created, moved, or modified, the watcher auto-commits with a deterministic message and broadcasts a WebSocket notification to the frontend. This means:
- MCP tools only need to write/move files — the watcher handles git commits
- IDE drag-and-drop works (drag a story from `1_backlog/` to `2_current/`)
- The frontend updates automatically without manual refresh
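Because the watcher owns the git side, promoting a work item is just a file move. A minimal sketch (the directories and demo story are created here only so the sketch is self-contained; in a real project they already exist):

```sh
# Sketch: promoting a work item from backlog to current is a plain move.
# The server's watcher (not shown) auto-commits and notifies the frontend.
mkdir -p .story_kit/work/1_backlog .story_kit/work/2_current
printf -- '---\nname: Demo Story\n---\n' > .story_kit/work/1_backlog/1_story_demo.md
mv .story_kit/work/1_backlog/1_story_demo.md .story_kit/work/2_current/
```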
## 3. The Cycle (The "Loop")
When the user asks for a feature, follow this 4-step loop strictly:
### Step 1: The Story (Ingest)
- **User Input:** "I want the robot to dance."
- **Action:** Create a story via the `create_story` MCP tool (it guarantees correct front matter and auto-assigns the story number).
- **Front Matter (Required):** Every work item file MUST begin with YAML front matter containing a `name` field:

  ```yaml
  ---
  name: Short Human-Readable Story Name
  ---
  ```

- **Move to Current:** Once the story is validated and ready for coding, move it to `work/2_current/`.
- **Tracking:** Mark Acceptance Criteria as tested directly in the story file as tests are completed.
- **Content:**
  - User Story: "As a user, I want..."
  - Acceptance Criteria: Bullet points of observable success.
  - Out of Scope: Explicit exclusions, so the implementation stays bounded.
- **Story Quality (INVEST):** Stories should be Independent, Negotiable, Valuable, Estimable, Small, and Testable.
- **Git:** The `start_agent` MCP tool automatically creates a worktree under `.story_kit/worktrees/`, checks out a feature branch, moves the story to `work/2_current/`, and spawns the agent. No manual branch or worktree creation is needed.
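When MCP tools are available, `create_story` should produce this file for you. The hand-written sketch below only illustrates the expected shape; the story number, title, and criteria are invented, and the section headings are assumptions drawn from the Content bullet above:

```sh
# Illustrative story file; prefer the create_story MCP tool when available.
mkdir -p .story_kit/work/1_backlog
cat > .story_kit/work/1_backlog/58_story_robot_dance.md <<'EOF'
---
name: Robot Dance Mode
---
## User Story
As a user, I want the robot to dance on command.

## Acceptance Criteria
- [ ] Sending the dance command starts the dance routine.
- [ ] Sending it again stops the routine.

## Out of Scope
- Custom choreography uploads.
EOF
```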
### Step 2: The Implementation (Code)
- **Action:** Write the code to satisfy the approved tests and Acceptance Criteria.
- **Constraint:** Adhere strictly to `specs/tech/STACK.md` (e.g., if it forbids certain patterns, you must not use them).
- **Full-Stack Completion:** Every story must be completed across all components of the stack. If a feature touches the backend, frontend, and API layer, all three must be fully implemented and working end-to-end before the story can be accepted. Partial implementations (e.g., backend logic with no frontend wiring, or UI scaffolding with no real data) do not satisfy acceptance criteria.
### Step 3: Verification (Close)
- **Action:** For each Acceptance Criterion in the story, write a failing test (red), mark the criterion as tested, make the test pass (green), and refactor if needed. Keep only one failing test at a time.
- **Action:** Run compilation and make sure it succeeds without errors. Consult `specs/tech/STACK.md` and run all required linters listed there (treat warnings as errors). Run tests and make sure they all pass before proceeding. Ask questions here if needed.
- **Action:** Do not accept stories yourself. Ask the user if they accept the story.
- **Move to Done:** After acceptance, move the story from `work/2_current/` (or `work/4_merge/`) to `work/5_done/`.
- **Action:** When the user accepts:
  1. Move the story file to `work/5_done/`.
  2. Commit both changes to the feature branch.
  3. Perform the squash merge: `git merge --squash feature/story-name`.
  4. Commit to master with a comprehensive commit message.
  5. Delete the feature branch: `git branch -D feature/story-name`.
- **Important:** Do NOT mark acceptance criteria as complete before user acceptance. Only mark them complete when the user explicitly accepts the story.
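The acceptance merge sequence can be rehearsed end to end in a throwaway repository. Branch, file, and commit names below are placeholders; in the real project the feature branch comes from `start_agent` and the story file move happens first:

```sh
# Rehearsal of the Step 3 merge sequence in a scratch repo (placeholders).
git init -q -b master demo-merge && cd demo-merge   # -b needs git >= 2.28
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "init"
git checkout -q -b feature/story-demo               # normally created by start_agent
echo "implementation" > feature.txt && git add feature.txt
git commit -q -m "wip: story implementation"
git checkout -q master
git merge --squash feature/story-demo >/dev/null    # stages the changes, no commit yet
git commit -q -m "Story: demo feature (squash-merged)"
git branch -D feature/story-demo                    # clean up the feature branch
```

Note that `--squash` only stages the combined changes; the explicit `git commit` afterwards is what writes the single comprehensive commit to master.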
**CRITICAL - NO SUMMARY DOCUMENTS:**

- NEVER create a separate summary document (e.g., `STORY_XX_SUMMARY.md`, `IMPLEMENTATION_NOTES.md`, etc.)
- NEVER write terminal output to a markdown file for "documentation purposes"
- Tests are the primary source of truth. Keep test coverage and Acceptance Criteria aligned after each story.
- If you find yourself typing `cat << 'EOF' > SUMMARY.md` or similar, STOP IMMEDIATELY.
- The only files that should exist after story completion:
  - Updated code in `src/`
  - Updated guardrails in `specs/` (if needed)
  - Archived work item in `work/5_done/` (the server auto-sweeps to `work/6_archived/` after 4 hours)
## 3.5. Bug Workflow (Simplified Path)
Not everything needs to be a full story. Simple bugs can skip the story process:
### When to Use the Bug Workflow
- Defects in existing functionality (not new features)
- State inconsistencies or data corruption
- UI glitches that don't require spec changes
- Performance issues with known fixes
### Bug Process
1. **Document Bug:** Create a bug file in `work/1_backlog/` named `{id}_bug_{slug}.md` with:
   - Symptom: What the user observes
   - Root Cause: Technical explanation (if known)
   - Reproduction Steps: How to trigger the bug
   - Proposed Fix: Brief technical approach
   - Workaround: Temporary solution if available
2. **Start an Agent:** Use the `start_agent` MCP tool to create a worktree and spawn an agent for the bug fix.
3. **Write a Failing Test:** Before fixing the bug, write a test that reproduces it (red). This proves the bug exists and prevents regression.
4. **Fix the Bug:** Make minimal code changes to make the test pass (green).
5. **User Testing:** Let the user verify the fix in the worktree before merging. Do not proceed until they confirm.
6. **Archive & Merge:** Move the bug file to `work/5_done/`, squash merge to master, and delete the worktree and branch.
7. **No Guardrail Update Needed:** Unless the bug reveals a missing constraint.
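A bug file following the fields in step 1 might look like the sketch below. The filename reuses the example from the Work Items section, but the body content is invented for illustration:

```sh
# Illustrative bug file matching the fields listed above; content is invented.
mkdir -p .story_kit/work/1_backlog
cat > .story_kit/work/1_backlog/4_bug_run_button_does_not_start_agent.md <<'EOF'
---
name: Run Button Does Not Start Agent
---
## Symptom
Clicking Run shows a spinner, but no agent appears in the agent list.

## Root Cause
Unknown; suspected bad worktree path in the start request.

## Reproduction Steps
1. Open any backlog story.
2. Click Run.

## Proposed Fix
Validate the worktree path before spawning and surface errors in the UI.

## Workaround
Start the agent via the start_agent MCP tool directly.
EOF
```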
### Bug vs Story vs Spike
- Bug: Existing functionality is broken → Fix it
- Story: New functionality is needed → Test it, then build it
- Spike: Uncertainty/feasibility discovery → Run spike workflow
## 3.6. Spike Workflow (Research Path)
Not everything needs a story or bug fix. Spikes are time-boxed investigations to reduce uncertainty.
### When to Use a Spike
- Unclear root cause or feasibility
- Need to compare libraries/encoders/formats
- Need to validate performance constraints
### Spike Process
1. **Document Spike:** Create a spike file in `work/1_backlog/` named `{id}_spike_{slug}.md` with:
   - Question: What you need to answer
   - Hypothesis: What you expect to be true
   - Timebox: Strict limit for the research
   - Investigation Plan: Steps/tools to use
   - Findings: Evidence and observations
   - Recommendation: Next step (Story, Bug, or No Action)
2. **Execute Research:** Stay within the timebox. No production code changes.
3. **Escalate if Needed:** If implementation is required, open a Story or Bug and follow that workflow.
4. **Archive:** Move the spike file to `work/5_done/`.
### Spike Output
- Decision and evidence, not production code
- Specs updated only if the spike changes system truth
## 4. Context Reset Protocol
When the LLM context window fills up (or the chat gets slow/confused):
- **Stop Coding.**
- **Instruction:** Tell the user to open a new chat.
- **Handoff:** The only context the new LLM needs is in the `specs/` folder and `.mcp.json`.
  - Prompt for New Session: "I am working on Project X. Read `.mcp.json` to discover available tools, then read `specs/00_CONTEXT.md` and `specs/tech/STACK.md`. Then look at `work/1_backlog/` and `work/2_current/` to see what is pending."
## 5. Setup Instructions (For the LLM)
If a user hands you this document and says "Apply this process to my project":
- **Check for MCP Tools:** Look for `.mcp.json` in the project root. If it exists, you have programmatic access to workflow tools and agent-spawning capabilities.
- **Analyze the Request:** Ask for the high-level goal ("What are we building?") and the tech preferences ("Rust or Python?").
- **Git Check:** Check whether the directory is a git repository (`git status`). If not, run `git init`.
- **Scaffold:** Run commands to create the `work/` and `specs/` folders with the 6-stage pipeline (`work/1_backlog/` through `work/6_archived/`).
- **Draft Context:** Write `specs/00_CONTEXT.md` based on the user's answers.
- **Draft Stack:** Write `specs/tech/STACK.md` based on best practices for that language.
- **Wait:** Ask the user for "Story #1".
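The scaffold step amounts to creating the directory skeleton from section 2. A minimal sketch (the spec files themselves are drafted afterwards):

```sh
# Create the 6-stage pipeline and the minimal specs layout from section 2.
for stage in 1_backlog 2_current 3_qa 4_merge 5_done 6_archived; do
  mkdir -p ".story_kit/work/$stage"
done
mkdir -p .story_kit/specs/tech .story_kit/specs/functional
ls .story_kit/work
```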
## 6. Code Quality
**MANDATORY:** Before completing Step 3 (Verification) of any story, you MUST run all applicable linters, formatters, and test suites and fix ALL errors and warnings. Zero tolerance for warnings or errors.

**AUTO-RUN CHECKS:** Always run the required lint/test/build checks as soon as relevant changes are made. Do not ask for permission to run them—run them automatically and fix any failures.

**ALWAYS FIX DIAGNOSTICS:** At every stage, you must proactively fix all errors and warnings without waiting for user confirmation. Do not pause to ask whether to fix diagnostics—fix them immediately as part of the workflow.
Consult `specs/tech/STACK.md` for the specific tools, commands, linter configurations, and quality gates for this project. The STACK file is the single source of truth for what must pass before a story can be accepted.