story-kit: merge 287_story_rename_upcoming_pipeline_stage_to_backlog

2026-03-18 14:31:12 +00:00
parent 967ebd7a84
commit df6f792214
26 changed files with 250 additions and 228 deletions
--- a/.story_kit/work/1_backlog/.gitkeep
+++ b/.story_kit/work/1_backlog/.gitkeep
--- a/.story_kit/work/1_backlog/169_story_gate_pipeline_transitions_on_ensure_acceptance.md
+++ b/.story_kit/work/1_backlog/169_story_gate_pipeline_transitions_on_ensure_acceptance.md
@@ -0,0 +1,20 @@
+---
+name: "Gate pipeline transitions on ensure_acceptance"
+---
+
+# Story 169: Gate pipeline transitions on ensure_acceptance
+
+## User Story
+
+As a project owner, I want story progression to be blocked unless ensure_acceptance passes, so that agents can't skip the testing workflow.
+
+## Acceptance Criteria
+
+- [ ] move_story_to_merge rejects stories that haven't passed ensure_acceptance
+- [ ] accept_story rejects stories that haven't passed ensure_acceptance
+- [ ] Rejection returns a clear error message telling the agent what's missing
+- [ ] Existing passing stories (all criteria checked, tests recorded) still flow through normally
+
+## Out of Scope
+
+- TBD
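
Reviewer sketch for Story 169: one possible shape for the gate, assuming acceptance state boils down to "all criteria checked" plus "tests recorded" as the last criterion suggests. `StoryStatus` and its fields are hypothetical; only `ensure_acceptance` and `move_story_to_merge` are names taken from the story, and their signatures here are guesses.

```rust
/// Hypothetical snapshot of a story's acceptance state (not from the codebase).
pub struct StoryStatus {
    pub criteria_total: usize,
    pub criteria_checked: usize,
    pub tests_recorded: bool,
}

/// Returns Ok(()) when the story may progress, otherwise a message
/// that tells the agent exactly what is still missing.
pub fn ensure_acceptance(status: &StoryStatus) -> Result<(), String> {
    let mut missing = Vec::new();
    if status.criteria_checked < status.criteria_total {
        missing.push(format!(
            "{} of {} acceptance criteria unchecked",
            status.criteria_total - status.criteria_checked,
            status.criteria_total
        ));
    }
    if !status.tests_recorded {
        missing.push("no test results recorded".to_string());
    }
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("ensure_acceptance failed: {}", missing.join("; ")))
    }
}

/// Assumed signature: the transition refuses to run unless the gate passes.
/// An `accept_story` transition could reuse the same call.
pub fn move_story_to_merge(status: &StoryStatus) -> Result<&'static str, String> {
    ensure_acceptance(status)?;
    Ok("4_merge")
}
```

Routing both transitions through a single check keeps the rejection message identical no matter which tool the agent called.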
--- a/.story_kit/work/1_backlog/247_story_human_qa_gate_with_rejection_flow.md
+++ b/.story_kit/work/1_backlog/247_story_human_qa_gate_with_rejection_flow.md
@@ -0,0 +1,25 @@
+---
+name: "Human QA gate with rejection flow"
+---
+
+# Story 247: Human QA gate with rejection flow
+
+## User Story
+
+As the project owner, I want stories to require my manual approval after machine QA before they can be merged, so that features that compile and pass tests but do not actually work correctly are caught before reaching master.
+
+## Acceptance Criteria
+
+- [ ] Story files support a manual_qa front matter field (defaults to true)
+- [ ] After machine QA passes in 3_qa, stories with manual_qa: true wait for human approval before moving to 4_merge
+- [ ] The UI shows a clear way to launch the app from the worktree for manual testing (single button click), with automatic port conflict handling via .story_kit_port
+- [ ] Frontend and backend are pre-compiled during machine QA so the app is ready to run instantly for manual testing
+- [ ] Only one QA app instance runs at a time — do not automatically spin up multiple instances
+- [ ] Human can approve a story from 3_qa to move it to 4_merge
+- [ ] Human can reject a story from 3_qa back to 2_current with notes about what is broken
+- [ ] Rejection notes are written into the story file so the coder can see what needs fixing
+- [ ] Stories with manual_qa: false skip the human gate and proceed directly from machine QA to 4_merge
+
+## Out of Scope
+
+- TBD
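
Reviewer sketch for Story 247: the routing rules in the criteria above can be written as two small pure functions. The enum and function names are hypothetical; the stage names `2_current` and `4_merge` and the `manual_qa` default of true come from the story.

```rust
/// What happens to a story once machine QA has passed in 3_qa.
#[derive(Debug, PartialEq)]
pub enum AfterMachineQa {
    AwaitHumanApproval,
    MoveToMerge,
}

/// `manual_qa` is the optional front matter field; a missing value means true.
pub fn after_machine_qa(manual_qa: Option<bool>) -> AfterMachineQa {
    if manual_qa.unwrap_or(true) {
        AfterMachineQa::AwaitHumanApproval
    } else {
        AfterMachineQa::MoveToMerge
    }
}

/// The human's decision on a story waiting in 3_qa.
#[derive(Debug, PartialEq)]
pub enum HumanVerdict {
    Approve,
    Reject { notes: String },
}

/// Returns the target stage plus any rejection notes that should be
/// written into the story file for the coder.
pub fn apply_verdict(verdict: &HumanVerdict) -> (&'static str, Option<&str>) {
    match verdict {
        HumanVerdict::Approve => ("4_merge", None),
        HumanVerdict::Reject { notes } => ("2_current", Some(notes.as_str())),
    }
}
```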
--- a/.story_kit/work/1_backlog/260_refactor_upgrade_libsqlite3_sys.md
+++ b/.story_kit/work/1_backlog/260_refactor_upgrade_libsqlite3_sys.md
@@ -0,0 +1,24 @@
+---
+name: "Upgrade libsqlite3-sys"
+---
+
+# Refactor 260: Upgrade libsqlite3-sys
+
+## Description
+
+Upgrade the `libsqlite3-sys` dependency from `0.35.0` to `0.37.0`. The crate is used with `features = ["bundled"]` for static builds.
+
+## Version Notes
+
+- Current: `libsqlite3-sys 0.35.0` (pinned transitively by `matrix-sdk 0.16.0` → `matrix-sdk-sqlite` → `rusqlite 0.37.x`)
+- Target: `libsqlite3-sys 0.37.0`
+- Latest upstream rusqlite: `0.39.0`
+- **Blocker**: `matrix-sdk 0.16.0` pins `rusqlite 0.37.x` which pins `libsqlite3-sys 0.35.0`. A clean upgrade requires either waiting for matrix-sdk to bump their rusqlite dep, or upgrading matrix-sdk itself.
+- **Reverted 2026-03-17**: A previous coder vendored the entire rusqlite crate with a fake `0.37.99` version and patched its libsqlite3-sys dep. This was too hacky — reverted to clean `0.35.0`.
+
+## Acceptance Criteria
+
+- [ ] `libsqlite3-sys` is upgraded to `0.37.0` via a clean dependency path (no vendored forks)
+- [ ] `cargo build` succeeds
+- [ ] All tests pass
+- [ ] No `[patch.crates-io]` hacks or vendored crates
--- a/.story_kit/work/1_backlog/280_story_long_running_supervisor_agent_with_periodic_pipeline_polling.md
+++ b/.story_kit/work/1_backlog/280_story_long_running_supervisor_agent_with_periodic_pipeline_polling.md
@@ -0,0 +1,32 @@
+---
+name: "Long-running supervisor agent with periodic pipeline polling"
+agent: coder-opus
+---
+
+# Story 280: Long-running supervisor agent with periodic pipeline polling
+
+## User Story
+
+As a project owner, I want a long-running supervisor agent (opus) that automatically monitors the pipeline, assigns agents, resolves stuck items, and handles routine operational tasks, so that I don't have to manually check status, kick agents, or babysit the pipeline in every conversation.
+
+## Acceptance Criteria
+
+- [ ] Server can start a persistent supervisor agent that stays alive across the session (not per-story)
+- [ ] Server prods the supervisor periodically (default 30s, configurable in project.toml) with a pipeline status update
+- [ ] Supervisor auto-assigns agents to unassigned items in current/qa/merge stages
+- [ ] Supervisor detects stuck agents (no progress for configurable timeout) and restarts them
+- [ ] Supervisor detects merge failures and sends stories back to current for rebase when appropriate
+- [ ] Supervisor can be chatted with via Matrix (timmy relays to supervisor) or via the web UI
+- [ ] Supervisor logs its decisions so the human can review what it did and why
+- [ ] Polling interval is configurable in project.toml (e.g. supervisor_poll_interval_secs = 30)
+- [ ] Supervisor logs persistent/recurring problems to `.story_kit/problems.md` with timestamp, description, and frequency — humans review this file periodically to create stories for systemic issues
+
+## Notes
+
+- **2026-03-18**: Moved back to current from merge. Previous attempt went through the full pipeline but the squash-merge produced an empty diff — no code was actually implemented. Needs a real implementation.
+
+## Out of Scope
+
+- Supervisor accepting or merging stories to master (human job)
+- Supervisor making architectural decisions
+- Replacing the existing per-story agent spawning — supervisor coordinates on top of it
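
Reviewer sketch for Story 280: a minimal take on the configurable poll interval and the stuck-agent check, using only the standard library. It assumes the value of `supervisor_poll_interval_secs` has already been read from project.toml into an `Option<u64>`; `AgentProgress`, `poll_interval`, and `stuck_agents` are hypothetical names.

```rust
use std::time::Duration;

/// Default from the story: prod the supervisor every 30 seconds.
pub const DEFAULT_POLL_SECS: u64 = 30;

/// Falls back to the default when project.toml does not set the key.
pub fn poll_interval(configured_secs: Option<u64>) -> Duration {
    Duration::from_secs(configured_secs.unwrap_or(DEFAULT_POLL_SECS))
}

/// Hypothetical record of how long an agent has gone without progress.
pub struct AgentProgress {
    pub story_id: u32,
    pub idle_for: Duration,
}

/// Returns the story ids whose agents have been idle for at least `timeout`
/// and are therefore candidates for a restart.
pub fn stuck_agents(agents: &[AgentProgress], timeout: Duration) -> Vec<u32> {
    agents
        .iter()
        .filter(|agent| agent.idle_for >= timeout)
        .map(|agent| agent.story_id)
        .collect()
}
```

Keeping both functions free of I/O means the timing rules can be unit tested without a running supervisor.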
--- a/.story_kit/work/1_backlog/287_story_rename_upcoming_pipeline_stage_to_backlog.md
+++ b/.story_kit/work/1_backlog/287_story_rename_upcoming_pipeline_stage_to_backlog.md
@@ -0,0 +1,22 @@
+---
+name: "Rename upcoming pipeline stage to backlog"
+---
+
+# Story 287: Rename upcoming pipeline stage to backlog
+
+## User Story
+
+As a project owner, I want the "upcoming" pipeline stage renamed to "backlog" throughout the codebase, UI, and directory structure, so that the terminology better reflects that these items are not necessarily coming up next.
+
+## Acceptance Criteria
+
+- [ ] Directory renamed from 1_upcoming to 1_backlog
+- [ ] All server code references updated (watcher, lifecycle, MCP tools, workflow, etc.)
+- [ ] Frontend UI labels updated
+- [ ] MCP tool descriptions and outputs use "backlog" instead of "upcoming"
+- [ ] Existing story/bug files moved to the new directory
+- [ ] Git commit messages use "backlog" for new items going forward
+
+## Out of Scope
+
+- TBD
--- a/.story_kit/work/1_backlog/35_story_agent_security_and_sandboxing.md
+++ b/.story_kit/work/1_backlog/35_story_agent_security_and_sandboxing.md
@@ -0,0 +1,31 @@
+---
+name: Agent Security and Sandboxing
+---
+# Story 35: Agent Security and Sandboxing
+
+## User Story
+**As a** supervisor orchestrating multiple autonomous agents,
+**I want to** constrain what each agent can access and do,
+**So that** agents can't escape their worktree, damage shared state, or perform unintended actions.
+
+## Acceptance Criteria
+- [ ] Agent creation accepts an `allowed_tools` list to restrict Claude Code tool access per agent.
+- [ ] Agent creation accepts a `disallowed_tools` list as an alternative to allowlisting.
+- [ ] Agents without Bash access can still perform useful coding work (Read, Edit, Write, Glob, Grep).
+- [ ] Investigate replacing direct Bash/shell access with Rust-implemented tool proxies that enforce boundaries:
+  - Scoped `exec_shell` that only runs allowlisted commands (e.g., `cargo test`, `npm test`) within the agent's worktree.
+  - Scoped `read_file` / `write_file` that reject paths outside the agent's worktree root.
+  - Scoped `git` operations that only work within the agent's worktree.
+- [ ] Evaluate `--max-turns` and `--max-budget-usd` as safety limits for runaway agents.
+- [ ] Document the trust model: what the supervisor controls vs what agents can do autonomously.
+
+## Questions to Explore
+- Can we use MCP (Model Context Protocol) to expose our Rust-implemented tools to Claude Code, replacing its built-in Bash/filesystem tools with scoped versions?
+- What's the right granularity for shell allowlists — command-level (`cargo test`) or pattern-level (`cargo *`)?
+- Should agents have read access outside their worktree (e.g., to reference shared specs) but write access only within it?
+- Is OS-level sandboxing (Docker, macOS sandbox profiles) worth the complexity for a personal tool?
+
+## Out of Scope
+- Multi-user authentication or authorization (single-user personal tool).
+- Network-level isolation between agents.
+- Encrypting agent communication channels (all local).
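
Reviewer sketch for the scoped tool proxies: one way a scoped `read_file` / `write_file` could reject paths outside the worktree, plus a command-level allowlist check. All function names are hypothetical. The path check is purely lexical and assumes an absolute worktree root; it does not resolve symlinks, so a real proxy would likely need to canonicalize existing paths as well.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolves `.` and `..` without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for part in path.components() {
        match part {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Returns the resolved path if `requested` stays inside `root`, else None.
/// An absolute `requested` replaces the root on join and is then rejected
/// unless it already points into the worktree.
pub fn resolve_in_worktree(root: &Path, requested: &Path) -> Option<PathBuf> {
    let root = normalize(root);
    let full = normalize(&root.join(requested));
    if full.starts_with(&root) {
        Some(full)
    } else {
        None
    }
}

/// Command-level allowlist: the whole command line must match an entry exactly.
pub fn is_allowed_command(allowlist: &[&str], command_line: &str) -> bool {
    allowlist.iter().any(|allowed| command_line.trim() == *allowed)
}
```

Exact matching is the strictest answer to the granularity question above; a pattern-level allowlist such as `cargo *` would admit more commands at the cost of a wider attack surface.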
--- a/.story_kit/work/1_backlog/57_story_live_test_gate_updates.md
+++ b/.story_kit/work/1_backlog/57_story_live_test_gate_updates.md
@@ -0,0 +1,18 @@
+---
+name: Live Test Gate Updates
+---
+
+# Story 57: Live Test Gate Updates
+
+## User Story
+
+As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.
+
+## Acceptance Criteria
+
+- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
+- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
+- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
+- [ ] Manual refresh buttons continue to work
+- [ ] Panels do not flicker or lose scroll position on auto-refresh
+- [ ] End-to-end test: record test results via MCP, verify Gate panel updates without manual refresh
--- a/.story_kit/work/1_backlog/90_story_fetch_real_context_window_size_from_anthropic_models_api.md
+++ b/.story_kit/work/1_backlog/90_story_fetch_real_context_window_size_from_anthropic_models_api.md
@@ -0,0 +1,21 @@
+---
+name: "Fetch real context window size from Anthropic models API"
+---
+
+# Story 90: Fetch real context window size from Anthropic models API
+
+## User Story
+
+As a user chatting with a Claude model, I want the context remaining indicator to show the actual context window size for the selected model (fetched from the Anthropic API) instead of a hardcoded value, so that the indicator is accurate across all current and future models.
+
+## Acceptance Criteria
+
+- [ ] Backend AnthropicModelInfo struct deserializes the context_window field from the Anthropic /v1/models response
+- [ ] Backend /anthropic/models endpoint returns both model ID and context window size to the frontend
+- [ ] Frontend uses the real context window size from the API response instead of the hardcoded getContextWindowSize map for Anthropic models
+- [ ] Context indicator in ChatHeader displays the correct percentage based on the real context window size
+- [ ] Hardcoded fallback remains for Ollama/local models that don't provide context window metadata
+
+## Out of Scope
+
+- TBD
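
Reviewer sketch for Story 90: the fallback rule and the percentage shown by the context indicator, written as pure functions. Function names are hypothetical, and the story places the indicator in the frontend, so this only illustrates the arithmetic. It assumes the API value arrives as an `Option<u64>` that is `None` for Ollama/local models.

```rust
/// Prefers the window size reported by the API; uses the hardcoded value
/// when the model provides no (or a zero) context window.
pub fn effective_context_window(api_reported: Option<u64>, hardcoded_fallback: u64) -> u64 {
    match api_reported {
        Some(size) if size > 0 => size,
        _ => hardcoded_fallback,
    }
}

/// Percentage of the context window still available, clamped to 0..=100.
pub fn context_remaining_percent(tokens_used: u64, window: u64) -> f64 {
    if window == 0 {
        return 0.0;
    }
    let used = tokens_used.min(window) as f64;
    100.0 * (1.0 - used / window as f64)
}
```

For instance, 50,000 tokens used against an illustrative 200,000-token window leaves 75% remaining.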