Story Kit: The Story-Driven Test Workflow (SDTW)

Target Audience: Large Language Models (LLMs) acting as Senior Engineers. Goal: To maintain long-term project coherence, prevent context window exhaustion, and ensure high-quality, testable code generation in large software projects.


0. First Steps (For New LLM Sessions)

When you start a new session with this project:

  1. Check for MCP Tools: Read .mcp.json to discover the MCP server endpoint. Then list available tools by calling:
    curl -s "$(jq -r '.mcpServers["storkit"].url' .mcp.json)" \
      -H 'Content-Type: application/json' \
      -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
    
    This returns the full tool catalog (create stories, spawn agents, record tests, manage worktrees, etc.). Familiarize yourself with the available tools before proceeding. These tools allow you to directly manipulate the workflow and spawn subsidiary agents without manual file manipulation.
  2. Read Context: Check .story_kit/specs/00_CONTEXT.md for high-level project goals.
  3. Read Stack: Check .story_kit/specs/tech/STACK.md for technical constraints and patterns.
  4. Check Work Items: Look at .story_kit/work/1_backlog/ and .story_kit/work/2_current/ to see what work is pending.

1. The Philosophy

We treat the codebase as the implementation of a "Living Specification" driven by User Stories. Instead of ephemeral chat prompts ("Fix this", "Add that"), we work through persistent artifacts.

  • Stories define the Change.
  • Tests define the Truth.
  • Code defines the Reality.

The Golden Rule: You are not allowed to write code until the Acceptance Criteria are captured in the story.


1.5 MCP Tools

Agents have programmatic access to the workflow via MCP tools served at POST /mcp. The project .mcp.json registers this endpoint automatically so Claude Code sessions and spawned agents can call tools like create_story, validate_stories, list_upcoming, get_story_todos, record_tests, ensure_acceptance, start_agent, stop_agent, list_agents, and get_agent_output without parsing English instructions.

To discover what tools are available: Check .mcp.json for the server endpoint, then use the MCP protocol to list available tools.
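Tool invocations use the same JSON-RPC envelope as discovery, with method tools/call. A sketch of a request body (the tool name and arguments here are illustrative; take the real names and schemas from the tools/list response):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "list_upcoming",
    "arguments": {}
  }
}
```

POST this body to the endpoint registered in .mcp.json, the same way the discovery request is sent in Section 0.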


2. Directory Structure

project_root/
  .mcp.json              # MCP server configuration (if MCP tools are available)
  .story_kit/
  ├── README.md          # This document
  ├── project.toml       # Agent configuration (roles, models, prompts)
  ├── work/              # Unified work item pipeline (stories, bugs, spikes)
  │   ├── 1_backlog/     # New work items awaiting implementation
  │   ├── 2_current/     # Work in progress
  │   ├── 3_qa/          # QA review
  │   ├── 4_merge/       # Ready to merge to master
  │   ├── 5_done/        # Merged and completed (auto-swept to 6_archived after 4 hours)
  │   └── 6_archived/    # Long-term archive
  ├── worktrees/         # Agent worktrees (managed by the server)
  ├── specs/             # Minimal guardrails (context + stack)
  │   ├── 00_CONTEXT.md  # High-level goals, domain definition, and glossary
  │   ├── tech/          # Implementation details (Stack, Architecture, Constraints)
  │   │   └── STACK.md   # The "Constitution" (Languages, Libs, Patterns)
  │   └── functional/    # Domain logic (Platform-agnostic behavior)
  │       └── ...
  └── src/               # The Code

Work Items

All work items (stories, bugs, spikes) live in the same work/ pipeline. Items are named: {id}_{type}_{slug}.md

  • Stories: 57_story_live_test_gate_updates.md
  • Bugs: 4_bug_run_button_does_not_start_agent.md
  • Spikes: 61_spike_filesystem_watcher_architecture.md

Items move through stages by moving the file between directories:

1_backlog → 2_current → 3_qa → 4_merge → 5_done → 6_archived

Items in 5_done are auto-swept to 6_archived after 4 hours by the server.

Filesystem Watcher

The server watches .story_kit/work/ for changes. When a file is created, moved, or modified, the watcher auto-commits with a deterministic message and broadcasts a WebSocket notification to the frontend. This means:

  • MCP tools only need to write/move files — the watcher handles git commits
  • IDE drag-and-drop works (drag a story from 1_backlog/ to 2_current/)
  • The frontend updates automatically without manual refresh
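Because the watcher handles commits, promoting an item is a plain file move. A minimal sketch (the first two lines create the directories and a sample item so the example is self-contained; in a real project the item already exists):

```shell
# Demo setup: create the pipeline directories and a sample work item.
mkdir -p .story_kit/work/1_backlog .story_kit/work/2_current
: > .story_kit/work/1_backlog/57_story_live_test_gate_updates.md

# Promote the item from backlog to current: a plain move is enough.
# The watcher auto-commits the change and notifies the frontend.
mv .story_kit/work/1_backlog/57_story_live_test_gate_updates.md \
   .story_kit/work/2_current/
```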

3. The Cycle (The "Loop")

When the user asks for a feature, follow this 4-step loop strictly:

Step 1: The Story (Ingest)

  • User Input: "I want the robot to dance."
  • Action: Create a story via MCP tool create_story (guarantees correct front matter and auto-assigns the story number).
  • Front Matter (Required): Every work item file MUST begin with YAML front matter containing a name field:
    ---
    name: Short Human-Readable Story Name
    ---
    
  • Move to Current: Once the story is validated and ready for coding, move it to work/2_current/.
  • Tracking: Mark Acceptance Criteria as tested directly in the story file as tests are completed.
  • Content:
    • User Story: "As a user, I want..."
    • Acceptance Criteria: Bullet points of observable success.
    • Out of Scope: Explicit exclusions that keep the implementation bounded and stop the LLM from over-building.
  • Story Quality (INVEST): Stories should be Independent, Negotiable, Valuable, Estimable, Small, and Testable.
  • Git: The start_agent MCP tool automatically creates a worktree under .story_kit/worktrees/, checks out a feature branch, moves the story to work/2_current/, and spawns the agent. No manual branch or worktree creation is needed.
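A minimal story file, using the "robot dance" request above, might look like this (all contents are illustrative; create_story guarantees the front matter for you):

```markdown
---
name: Robot Dance Mode
---

## User Story
As a user, I want the robot to dance when I press the dance button.

## Acceptance Criteria
- [ ] Pressing the dance button starts a dance animation within 1 second.
- [ ] Pressing the button again stops the animation.

## Out of Scope
- Custom choreography or music synchronization.
```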

Step 2: The Implementation (Code)

  • Action: Write the code to satisfy the approved tests and Acceptance Criteria.
  • Constraint: Adhere strictly to specs/tech/STACK.md (e.g., if it forbids certain patterns, you must not use them).
  • Full-Stack Completion: Every story must be completed across all components of the stack. If a feature touches the backend, frontend, and API layer, all three must be fully implemented and working end-to-end before the story can be accepted. Partial implementations (e.g., backend logic with no frontend wiring, or UI scaffolding with no real data) do not satisfy acceptance criteria.

Step 3: Verification (Close)

  • Action: For each Acceptance Criterion in the story, write a failing test (red), mark the criterion as tested, make the test pass (green), and refactor if needed. Keep only one failing test at a time.
  • Action: Run compilation and make sure it succeeds without errors. Consult specs/tech/STACK.md and run all required linters listed there (treat warnings as errors). Run tests and make sure they all pass before proceeding. Ask questions here if needed.
  • Action: Do not accept stories yourself. Ask the user whether they accept the story.
  • Action: When the user accepts:
    1. Move the story file to work/5_done/
    2. Commit both changes to the feature branch
    3. Perform the squash merge: git merge --squash feature/story-name
    4. Commit to master with a comprehensive commit message
    5. Delete the feature branch: git branch -D feature/story-name
  • Important: Do NOT mark acceptance criteria as complete before user acceptance. Only mark them complete when the user explicitly accepts the story.
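The acceptance sequence above can be sketched end-to-end in a throwaway repository (all repo, branch, and file names here are illustrative):

```shell
# Self-contained demo of the post-acceptance squash-merge sequence.
mkdir -p demo_repo
git -C demo_repo init -q -b master
GITC="git -C demo_repo -c user.email=bot@example.com -c user.name=bot"
$GITC commit -q --allow-empty -m "init"

# Simulate a feature branch with story work on it.
$GITC checkout -q -b feature/story-demo
echo "feature work" > demo_repo/feature.txt
$GITC add feature.txt
$GITC commit -q -m "wip on story"

# Squash merge to master, commit with a comprehensive message,
# then delete the feature branch.
$GITC checkout -q master
$GITC merge --squash feature/story-demo >/dev/null
$GITC commit -q -m "Story demo: comprehensive squash commit message"
$GITC branch -D feature/story-demo
```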

CRITICAL - NO SUMMARY DOCUMENTS:

  • NEVER create a separate summary document (e.g., STORY_XX_SUMMARY.md, IMPLEMENTATION_NOTES.md, etc.)
  • NEVER write terminal output to a markdown file for "documentation purposes"
  • Tests are the primary source of truth. Keep test coverage and Acceptance Criteria aligned after each story.
  • If you find yourself typing cat << 'EOF' > SUMMARY.md or similar, STOP IMMEDIATELY.
  • The only files that should exist after story completion:
    • Updated code in src/
    • Updated guardrails in specs/ (if needed)
    • Archived work item in work/5_done/ (server auto-sweeps to work/6_archived/ after 4 hours)

3.5. Bug Workflow (Simplified Path)

Not everything needs to be a full story. Simple bugs can skip the story process:

When to Use Bug Workflow

  • Defects in existing functionality (not new features)
  • State inconsistencies or data corruption
  • UI glitches that don't require spec changes
  • Performance issues with known fixes

Bug Process

  1. Document Bug: Create a bug file in work/1_backlog/ named {id}_bug_{slug}.md with:
    • Symptom: What the user observes
    • Root Cause: Technical explanation (if known)
    • Reproduction Steps: How to trigger the bug
    • Proposed Fix: Brief technical approach
    • Workaround: Temporary solution if available
  2. Start an Agent: Use the start_agent MCP tool to create a worktree and spawn an agent for the bug fix.
  3. Write a Failing Test: Before fixing the bug, write a test that reproduces it (red). This proves the bug exists and prevents regression.
  4. Fix the Bug: Make minimal code changes to make the test pass (green).
  5. User Testing: Let the user verify the fix in the worktree before merging. Do not proceed until they confirm.
  6. Archive & Merge: Move the bug file to work/5_done/, squash merge to master, delete the worktree and branch.
  7. No Guardrail Update Needed: Skip spec updates unless the bug reveals a missing constraint.
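A minimal bug file, using the example filename above, might look like this (all contents are illustrative):

```markdown
---
name: Run Button Does Not Start Agent
---

## Symptom
Clicking "Run" on a story in 2_current/ does nothing.

## Root Cause
Unknown; to be determined during reproduction.

## Reproduction Steps
1. Move any story to work/2_current/.
2. Click "Run" in the frontend.

## Proposed Fix
TBD after reproduction.

## Workaround
Start the agent via the start_agent MCP tool.
```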

Bug vs Story vs Spike

  • Bug: Existing functionality is broken → Fix it
  • Story: New functionality is needed → Test it, then build it
  • Spike: Uncertainty/feasibility discovery → Run spike workflow

3.6. Spike Workflow (Research Path)

Not everything needs a story or bug fix. Spikes are time-boxed investigations to reduce uncertainty.

When to Use a Spike

  • Unclear root cause or feasibility
  • Need to compare libraries/encoders/formats
  • Need to validate performance constraints

Spike Process

  1. Document Spike: Create a spike file in work/1_backlog/ named {id}_spike_{slug}.md with:
    • Question: What you need to answer
    • Hypothesis: What you expect to be true
    • Timebox: Strict limit for the research
    • Investigation Plan: Steps/tools to use
    • Findings: Evidence and observations
    • Recommendation: Next step (Story, Bug, or No Action)
  2. Execute Research: Stay within the timebox. No production code changes.
  3. Escalate if Needed: If implementation is required, open a Story or Bug and follow that workflow.
  4. Archive: Move the spike file to work/5_done/.

Spike Output

  • Decision and evidence, not production code
  • Specs updated only if the spike changes system truth

4. Context Reset Protocol

When the LLM context window fills up (or the chat gets slow/confused):

  1. Stop Coding.
  2. Instruction: Tell the user to open a new chat.
  3. Handoff: The only context the new LLM needs is in the .story_kit/specs/ folder and .mcp.json.
    • Prompt for New Session: "I am working on Project X. Read .mcp.json to discover available tools, then read .story_kit/specs/00_CONTEXT.md and .story_kit/specs/tech/STACK.md. Then look at .story_kit/work/1_backlog/ and .story_kit/work/2_current/ to see what is pending."

5. Setup Instructions (For the LLM)

If a user hands you this document and says "Apply this process to my project":

  1. Check for MCP Tools: Look for .mcp.json in the project root. If it exists, you have programmatic access to workflow tools and agent spawning capabilities.
  2. Analyze the Request: Ask for the high-level goal ("What are we building?") and the tech preferences ("Rust or Python?").
  3. Git Check: Check if the directory is a git repository (git status). If not, run git init.
  4. Scaffold: Run commands to create the .story_kit/work/ and .story_kit/specs/ folders with the 6-stage pipeline (work/1_backlog/ through work/6_archived/).
  5. Draft Context: Write .story_kit/specs/00_CONTEXT.md based on the user's answers.
  6. Draft Stack: Write .story_kit/specs/tech/STACK.md based on best practices for that language.
  7. Wait: Ask the user for "Story #1".
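Steps 3 and 4 can be sketched as follows (paths follow the directory structure in Section 2; adjust to the project as needed):

```shell
# Initialize git only if the directory is not already a repository.
git rev-parse --is-inside-work-tree >/dev/null 2>&1 || git init -q

# Scaffold the 6-stage work pipeline and the specs guardrails.
mkdir -p .story_kit/work/1_backlog .story_kit/work/2_current \
         .story_kit/work/3_qa .story_kit/work/4_merge \
         .story_kit/work/5_done .story_kit/work/6_archived
mkdir -p .story_kit/specs/tech .story_kit/specs/functional .story_kit/worktrees
: > .story_kit/specs/00_CONTEXT.md
: > .story_kit/specs/tech/STACK.md
```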

6. Code Quality

MANDATORY: Before completing Step 3 (Verification) of any story, you MUST run all applicable linters, formatters, and test suites and fix ALL errors and warnings. Zero tolerance for warnings or errors.

AUTO-RUN CHECKS: Always run the required lint/test/build checks as soon as relevant changes are made. Do not ask for permission to run them—run them automatically and fix any failures.

ALWAYS FIX DIAGNOSTICS: At every stage, you must proactively fix all errors and warnings without waiting for user confirmation. Do not pause to ask whether to fix diagnostics—fix them immediately as part of the workflow.

Consult specs/tech/STACK.md for the specific tools, commands, linter configurations, and quality gates for this project. The STACK file is the single source of truth for what must pass before a story can be accepted.