storkit: delete 57_story_live_test_gate_updates
@@ -1,157 +0,0 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---

# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting

## Question

Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.

**Important context:** Storkit developing itself is the dogfood edge case. The primary use case is storkit managing agents that develop *other* projects, driven by multiple users in chat rooms (Matrix, WhatsApp, Slack). Isolation must account for untrusted codebases, multi-user command surfaces, and running against arbitrary repos — not just the single-developer self-hosted setup.

Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:

1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
3. **Reproducible environment** — same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system

## Architecture

```
Docker Container (single)
├── storkit server
│   ├── Matrix bot
│   ├── WhatsApp webhook
│   ├── Slack webhook
│   ├── Web UI
│   └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```

## Key questions to answer:

- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: Container exposes web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass ANTHROPIC_API_KEY as env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?

## Deliverable:

A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.

## Hypothesis

Running storkit inside a single Docker container on macOS is viable with OrbStack, provided
`target/` directories are kept on Docker volumes rather than the bind-mounted project repo.
Sequential I/O is fast enough; directory-stat overhead is the real bottleneck.

## Timebox

Initial investigation: 1 session (spike 329, 2026-03-21). OrbStack vs Docker Desktop
comparison requires a second session on a machine with both installed.

## Investigation Plan

1. ✅ Boot container, confirm full stack (Rust, Node, Claude Code CLI, git) is present
2. ✅ Benchmark bind-mount vs Docker-volume filesystem performance
3. ✅ Identify Dockerfile/compose gaps (missing gcc, missing target/ volumes)
4. ✅ Fix gaps and document
5. ⬜ Rebuild image with fixes, run full `cargo build --release` benchmark
6. ⬜ Compare Docker Desktop vs OrbStack on the same machine

## Findings

### Environment (2026-03-21, inside running container)

- **OS**: Debian GNU/Linux 12 (bookworm), arm64
- **CPUs**: 10
- **Rust**: 1.90.0 / Cargo 1.90.0
- **Node**: v22.22.1
- **Git**: 2.39.5
- **Runtime**: OrbStack — confirmed by bind-mount path `/run/host_mark/Users → /workspace`

### Filesystem performance benchmarks

The critical finding: **bind-mount directory traversal is ~23x slower per file than a Docker volume** for the `target/` subtree, and roughly 130x slower per file for the non-target files.

| Filesystem | Files traversed | Time | Rate |
|---|---|---|---|
| Docker volume (`/usr/local/cargo/registry`) | 21,703 | **38ms** | ~571k files/sec |
| Container fs (`/tmp`) | 611 | **4ms** | fast |
| Bind mount — `target/` subtree | 270,550 | **10,564ms** | ~25k files/sec |
| Bind mount — non-target files | 50,048 | **11,314ms** | ~4.4k files/sec |
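
The write-up does not record the command used to collect these numbers. As a minimal sketch of one way to reproduce them, the following walks a directory tree and stats every file, which approximates the metadata-heavy access pattern being measured. It assumes Python 3 is available inside the container (it is not part of the stack listed above), and `time_traversal` and `report` are hypothetical helper names.

```python
import os
import time


def time_traversal(root: str) -> tuple[int, float]:
    """Walk `root`, stat every file, and return (file_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat-style call: measure metadata cost, do not follow symlinks
                os.stat(os.path.join(dirpath, name), follow_symlinks=False)
            except OSError:
                continue  # file vanished mid-walk; skip it
            count += 1
    return count, time.perf_counter() - start


def report(root: str) -> str:
    """Format one row in the same shape as the table above."""
    count, seconds = time_traversal(root)
    rate = count / seconds if seconds > 0 else float("inf")
    return f"{root}: {count} files in {seconds * 1000:.0f}ms (~{rate:,.0f} files/sec)"
```

Calling `report("/workspace/target")` and `report("/usr/local/cargo/registry")` in the same container would give directly comparable rows. Results depend on cache warmth, so repeated runs are advisable.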

Sequential I/O on the bind mount is acceptable:

| Operation | Time | Throughput |
|---|---|---|
| Write 100MB to `/tmp` (container fs) | 92ms | 1.1 GB/s |
| Read 100MB from `/tmp` | 42ms | 2.4 GB/s |
| Write 100MB to `/workspace` (bind mount) | 227ms | 440 MB/s |
| Read 100MB from `/workspace` (bind mount) | 78ms | 1.3 GB/s |
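
The method behind the sequential numbers is likewise not recorded. A rough sketch of a comparable probe, again assuming Python 3 in the container, writes a file in 1 MiB chunks, syncs it, and reads it back. `sequential_io` and the probe filename are hypothetical, and the read pass will often be served from the page cache, so it should be read as an upper bound.

```python
import os
import time


def sequential_io(directory: str, size_mb: int = 100) -> dict[str, float]:
    """Write then read `size_mb` MiB in `directory`; return throughput in MB/s."""
    path = os.path.join(directory, "seq_io_probe.bin")  # hypothetical scratch file
    chunk = b"\0" * (1024 * 1024)
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # include the flush to the backing store
        write_s = max(time.perf_counter() - start, 1e-9)

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(chunk)):
                pass
        read_s = max(time.perf_counter() - start, 1e-9)
    finally:
        if os.path.exists(path):
            os.remove(path)  # never leave the probe file in the repo
    return {"write_mb_s": size_mb / write_s, "read_mb_s": size_mb / read_s}
```

Running `sequential_io("/tmp")` and `sequential_io("/workspace")` back to back gives the container-fs versus bind-mount comparison shown in the table.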

**Interpretation**: Sequential reads/writes are fine. The bottleneck is directory stat operations —
exactly what `cargo` does when checking whether artifacts are stale. Leaving `target/` on the
bind mount will make incremental builds extremely slow (12+ seconds just to traverse the tree
before a single file is compiled).
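
The per-file rates and slowdown factors can be re-derived from the traversal table. The sketch below uses only the file counts and timings reported there; the variable names are illustrative.

```python
# (files traversed, elapsed milliseconds), copied from the traversal table
measurements = {
    "docker_volume": (21_703, 38),
    "bind_target": (270_550, 10_564),
    "bind_non_target": (50_048, 11_314),
}


def files_per_sec(files: int, elapsed_ms: int) -> float:
    """Convert a (count, milliseconds) measurement into files per second."""
    return files / (elapsed_ms / 1000)


rates = {name: files_per_sec(*m) for name, m in measurements.items()}

# How many times slower each filesystem is, per file, than the Docker volume
slowdown = {name: rates["docker_volume"] / rate for name, rate in rates.items()}

# Total time spent walking the whole bind mount (target/ plus everything else)
bind_total_ms = measurements["bind_target"][1] + measurements["bind_non_target"][1]

for name in measurements:
    print(f"{name}: {rates[name]:,.0f} files/sec, {slowdown[name]:.1f}x vs volume")
print(f"full bind-mount walk: {bind_total_ms / 1000:.1f}s")
```

With the unrounded inputs the `target/` slowdown comes out slightly above 22x; the ~23x figure follows from dividing the rounded rates (571k / 25k). The non-target files are about 129x slower per file, and walking the whole bind mount takes about 21.9 seconds in total.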

### Bugs found and fixed

**Bug 1 — `target/` directories on bind mount (docker-compose.yml)**

The compose file mounted `${PROJECT_PATH}:/workspace` but had no override for `target/`.
Cargo's 270k build artifacts were being stat-checked through the slow bind mount on every
incremental build. Fixed by adding named Docker volumes:

```yaml
- workspace-target:/workspace/target
- storkit-target:/app/target
```

**Bug 2 — missing `build-essential` in runtime stage (Dockerfile)**

The runtime stage (`debian:bookworm-slim`) copies the Rust toolchain from the base stage but
does not install `gcc`/`cc`. Any `cargo build` invocation fails at link time with:

```
error: linker `cc` not found
```

This affects both `rebuild_and_restart` and any agent-driven cargo commands. Fixed by adding
`build-essential`, `pkg-config`, and `libssl-dev` to the runtime apt-get block.
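
A missing linker only surfaces when the first build runs, so a small preflight check at container start could catch this class of gap earlier. This is a hypothetical sketch, not something the PoC contains: `REQUIRED_TOOLS` and `missing_tools` are illustrative names, the tool list is an assumption based on the stack described above, and Python 3 is assumed to be available.

```python
import shutil

# Executables the runtime stage is expected to provide (assumed list)
REQUIRED_TOOLS = ("cc", "pkg-config", "cargo", "git", "node")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        raise SystemExit(f"missing build tools: {', '.join(missing)}")
    print("all required build tools found")
```

The check only confirms that each executable is on `PATH`. It does not verify headers such as those from `libssl-dev`, which still need a real build to validate.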

### Key questions — status

| Question | Status | Answer |
|---|---|---|
| Dockerfile — minimal image? | ✅ | `rust:1.90-bookworm` base + Node 22 + Claude Code CLI + cargo-nextest. Runtime stage needs `build-essential`. |
| Cargo cache persistence? | ✅ | Named volume `cargo-registry:/usr/local/cargo/registry` — works well. |
| Claude Code state? | ✅ | Named volume `claude-state:/root/.claude` — correct approach. |
| Bind mount performance? | ✅ | Sequential I/O fine; directory traversal slow — mitigated by target/ volumes. |
| API key? | ✅ | Passed as `ANTHROPIC_API_KEY` env var via compose. |
| Git on bind mount? | ✅ | Works — host sees commits immediately. |
| rebuild_and_restart? | ⚠️ | Needs `build-essential` fix (now patched). Source at `/app` via bind mount is correct. |
| OrbStack vs Docker Desktop? | ⬜ | Not yet benchmarked — requires second session. OrbStack VirtioFS confirmed working. Docker Desktop likely worse for directory traversal. |
| Resource limits? | ✅ | `deploy.resources` limits in compose (4 CPU / 8G RAM). |
| Networking? | ✅ | Port 3001 exposed. Outbound Matrix/Slack/WhatsApp connections are unrestricted inside container. |

## Recommendation

**Proceed with Docker + OrbStack.** The architecture is sound and the PoC works. Two bugs
have been fixed (missing `build-essential`, missing `target/` volumes). The remaining unknown
is the Docker Desktop vs OrbStack performance delta — we expect OrbStack to be significantly
faster based on VirtioFS vs gRPC-FUSE, but the magnitude needs measuring.

**Next step**: rebuild the image with the patched Dockerfile and run a full `cargo build --release`
benchmark to get an end-to-end number. Then repeat on Docker Desktop for comparison.
@@ -1,20 +0,0 @@
---
name: "Remove deprecated manual_qa front matter field"
---

# Story 361: Remove deprecated manual_qa front matter field

## User Story

As a developer, I want the deprecated manual_qa boolean field removed from the codebase, so that the front matter schema stays clean and doesn't accumulate legacy boolean flags alongside the more expressive qa: server|agent|human field that replaced it.

## Acceptance Criteria

- [ ] manual_qa field is removed from the FrontMatter and StoryMetadata structs in story_metadata.rs
- [ ] Legacy mapping from manual_qa: true → qa: human is removed
- [ ] Any existing story files using manual_qa are migrated to qa: human
- [ ] Codebase compiles cleanly with no references to manual_qa remaining
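
The file migration in the third criterion could be done with a one-off script. The sketch below is an illustration only, not part of the story: `migrate_front_matter` is a hypothetical name, and it assumes front matter is delimited by `---` lines and that `manual_qa` appears as a plain `manual_qa: true` or `manual_qa: false` line. The story only specifies the `true` case; dropping `manual_qa: false` and keeping an existing explicit `qa:` value are assumptions.

```python
import re

_MANUAL_QA = re.compile(r"^manual_qa:\s*(true|false)\s*$")


def migrate_front_matter(text: str) -> str:
    """Rewrite `manual_qa` in a story file's front matter; leave the body untouched."""
    lines = text.split("\n")
    if not lines or lines[0] != "---":
        return text  # no front matter block
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return text  # unterminated front matter; do not guess
    header = lines[1:end]
    has_qa = any(line.startswith("qa:") for line in header)

    migrated = []
    for line in header:
        match = _MANUAL_QA.match(line)
        if match is None:
            migrated.append(line)
        elif match.group(1) == "true" and not has_qa:
            migrated.append("qa: human")  # legacy mapping: manual_qa: true -> qa: human
        # Assumption: manual_qa: false (or a redundant flag) is simply dropped
    return "\n".join([lines[0], *migrated, *lines[end:]])
```

Applied to every story file, this leaves files without `manual_qa` byte-for-byte unchanged, so the resulting diff only touches files that actually used the field.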

## Out of Scope

- TBD
@@ -1,18 +0,0 @@
---
name: Live Test Gate Updates
---

# Story 57: Live Test Gate Updates

## User Story

As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: record test results via MCP, verify Gate panel updates without manual refresh
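
The filtering each panel needs can be stated independently of the UI framework. The sketch below illustrates only the message check, using the event shape from the first criterion; `should_refresh` is a hypothetical name and Python is used purely for illustration, since the story does not say how the panels are implemented.

```python
import json


def should_refresh(raw_message: str, topic: str = "tests") -> bool:
    """Return True if a raw `/ws` message is a notification for `topic`."""
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError:
        return False  # ignore malformed frames rather than crash the panel
    return (
        isinstance(message, dict)
        and message.get("type") == "notification"
        and message.get("topic") == topic
    )
```

A panel would call its existing data-loading routine whenever this returns true, which keeps the manual refresh button on the same code path as the automatic one.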
|
||||