storkit: delete 57_story_live_test_gate_updates
@@ -1,157 +0,0 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---

# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting

## Question

Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.

**Important context:** Storkit developing itself is the dogfood edge case. The primary use case is storkit managing agents that develop *other* projects, driven by multiple users in chat rooms (Matrix, WhatsApp, Slack). Isolation must account for untrusted codebases, multi-user command surfaces, and running against arbitrary repos — not just the single-developer self-hosted setup.

Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:

1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
3. **Reproducible environment** — same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system

## Architecture

```
Docker Container (single)
├── storkit server
│   ├── Matrix bot
│   ├── WhatsApp webhook
│   ├── Slack webhook
│   ├── Web UI
│   └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```
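
The layout above can be sketched as a compose file. This is a hedged sketch, not storkit's actual docker-compose.yml: the service name, image tag, and port mapping are illustrative, while the `cargo-registry` and `claude-state` volume names come from the findings below.

```yaml
# Illustrative single-container compose sketch (names are assumptions)
services:
  storkit:
    image: storkit:latest            # hypothetical image tag
    ports:
      - "3000:3000"                  # web UI port
    environment:
      - ANTHROPIC_API_KEY            # passed through from the host shell
    volumes:
      - ${PROJECT_PATH}:/workspace   # bind-mounted project repo
      - cargo-registry:/usr/local/cargo/registry
      - claude-state:/root/.claude

volumes:
  cargo-registry:
  claude-state:
```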

## Key questions to answer

- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: Container exposes web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass `ANTHROPIC_API_KEY` as an env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for `~/.cargo/registry` so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?

## Deliverable
A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.

## Hypothesis

Running storkit inside a single Docker container on macOS is viable with OrbStack, provided `target/` directories are kept on Docker volumes rather than the bind-mounted project repo. Sequential I/O is fast enough; directory-stat overhead is the real bottleneck.

## Timebox

Initial investigation: 1 session (spike 329, 2026-03-21). OrbStack vs Docker Desktop comparison requires a second session on a machine with both installed.

## Investigation Plan

1. ✅ Boot container, confirm full stack (Rust, Node, Claude Code CLI, git) is present
2. ✅ Benchmark bind-mount vs Docker-volume filesystem performance
3. ✅ Identify Dockerfile/compose gaps (missing gcc, missing target/ volumes)
4. ✅ Fix gaps and document
5. ⬜ Rebuild image with fixes, run full `cargo build --release` benchmark
6. ⬜ Compare Docker Desktop vs OrbStack on the same machine

## Findings

### Environment (2026-03-21, inside running container)

- **OS**: Debian GNU/Linux 12 (bookworm), arm64
- **CPUs**: 10
- **Rust**: 1.90.0 / Cargo 1.90.0
- **Node**: v22.22.1
- **Git**: 2.39.5
- **Runtime**: OrbStack — confirmed by bind-mount path `/run/host_mark/Users → /workspace`

### Filesystem performance benchmarks

The critical finding: **bind-mount directory traversal is ~23x slower per file than a Docker volume**.

| Filesystem | Files traversed | Time | Rate |
|---|---|---|---|
| Docker volume (`/usr/local/cargo/registry`) | 21,703 | **38ms** | ~571k files/sec |
| Container fs (`/tmp`) | 611 | **4ms** | ~153k files/sec |
| Bind mount — `target/` subtree | 270,550 | **10,564ms** | ~25k files/sec |
| Bind mount — non-target files | 50,048 | **11,314ms** | ~4.4k files/sec |

Sequential I/O on the bind mount is acceptable:

| Operation | Time | Throughput |
|---|---|---|
| Write 100MB to `/tmp` (container fs) | 92ms | 1.1 GB/s |
| Read 100MB from `/tmp` | 42ms | 2.4 GB/s |
| Write 100MB to `/workspace` (bind mount) | 227ms | 440 MB/s |
| Read 100MB from `/workspace` (bind mount) | 78ms | 1.3 GB/s |

**Interpretation**: Sequential reads/writes are fine. The bottleneck is directory stat operations — exactly what `cargo` does when checking whether artifacts are stale. Leaving `target/` on the bind mount will make incremental builds extremely slow (12+ seconds just to traverse the tree before a single file is compiled).
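
The stat-heavy pattern can be reproduced with a small script. This is a synthetic sketch: the tree shape and file names are made up, not the storkit workspace. Run it once on `/tmp` and once on a bind-mounted path to compare per-file overhead.

```shell
# Build a synthetic artifact tree, then time a stat-heavy traversal like
# cargo's freshness check. Point mktemp at a bind mount to test the slow case.
dir=$(mktemp -d)
for i in $(seq 1 50); do
  mkdir -p "$dir/crate$i"
  touch "$dir/crate$i/a.rlib" "$dir/crate$i/b.rlib"
done
# Stat every file, as cargo does when deciding what is stale
time find "$dir" -type f -exec stat {} + > /dev/null
echo "files: $(find "$dir" -type f | wc -l)"
rm -rf "$dir"
```

The interesting number is the wall time of the `find`/`stat` pass relative to the file count, which is what the rate column in the table above measures.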

### Bugs found and fixed

**Bug 1 — `target/` directories on bind mount (docker-compose.yml)**

The compose file mounted `${PROJECT_PATH}:/workspace` but had no override for `target/`. Cargo's 270k build artifacts were being stat-checked through the slow bind mount on every incremental build. Fixed by adding named Docker volumes:

```yaml
- workspace-target:/workspace/target
- storkit-target:/app/target
```

**Bug 2 — missing `build-essential` in runtime stage (Dockerfile)**

The runtime stage (`debian:bookworm-slim`) copies the Rust toolchain from the base stage but does not install `gcc`/`cc`. Any `cargo build` invocation fails at link time with:

```
error: linker `cc` not found
```

This affects both `rebuild_and_restart` and any agent-driven cargo commands. Fixed by adding `build-essential`, `pkg-config`, and `libssl-dev` to the runtime apt-get block.
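
A sketch of what the patched runtime-stage install looks like. The package list comes from the fix above; the `git` and `ca-certificates` entries and the overall layout are assumptions, since the actual Dockerfile is not shown here.

```dockerfile
FROM debian:bookworm-slim
# Linker (cc), pkg-config, and TLS headers needed for in-container cargo builds
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential pkg-config libssl-dev git ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```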

### Key questions — status

| Question | Status | Answer |
|---|---|---|
| Dockerfile — minimal image? | ✅ | `rust:1.90-bookworm` base + Node 22 + Claude Code CLI + cargo-nextest. Runtime stage needs `build-essential`. |
| Cargo cache persistence? | ✅ | Named volume `cargo-registry:/usr/local/cargo/registry` — works well. |
| Claude Code state? | ✅ | Named volume `claude-state:/root/.claude` — correct approach. |
| Bind mount performance? | ✅ | Sequential I/O fine; directory traversal slow — mitigated by `target/` volumes. |
| API key? | ✅ | Passed as `ANTHROPIC_API_KEY` env var via compose. |
| Git on bind mount? | ✅ | Works — host sees commits immediately. |
| rebuild_and_restart? | ⚠️ | Needs `build-essential` fix (now patched). Source at `/app` via bind mount is correct. |
| OrbStack vs Docker Desktop? | ⬜ | Not yet benchmarked — requires second session. OrbStack VirtioFS confirmed working. Docker Desktop likely worse for directory traversal. |
| Resource limits? | ✅ | `deploy.resources` limits in compose (4 CPU / 8G RAM). |
| Networking? | ✅ | Port 3001 exposed. Outbound Matrix/Slack/WhatsApp connections are unrestricted inside container. |
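
The resource-limits row corresponds to a compose fragment along these lines (the service name is illustrative; the 4 CPU / 8G values are the ones reported in the table):

```yaml
services:
  storkit:
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 8G
```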

## Recommendation

**Proceed with Docker + OrbStack.** The architecture is sound and the PoC works. Two bugs have been fixed (missing `build-essential`, missing `target/` volumes). The remaining unknown is the Docker Desktop vs OrbStack performance delta — we expect OrbStack to be significantly faster based on VirtioFS vs gRPC-FUSE, but the magnitude needs measuring.

**Next step**: rebuild the image with the patched Dockerfile and run a full `cargo build --release` benchmark to get an end-to-end number. Then repeat on Docker Desktop for comparison.

@@ -1,20 +0,0 @@
---
name: "Remove deprecated manual_qa front matter field"
---

# Story 361: Remove deprecated manual_qa front matter field

## User Story

As a developer, I want the deprecated `manual_qa` boolean field removed from the codebase, so that the front matter schema stays clean and doesn't accumulate legacy boolean flags alongside the more expressive `qa: server|agent|human` field that replaced it.

## Acceptance Criteria

- [ ] `manual_qa` field is removed from the `FrontMatter` and `StoryMetadata` structs in `story_metadata.rs`
- [ ] Legacy mapping from `manual_qa: true` → `qa: human` is removed
- [ ] Any existing story files using `manual_qa` are migrated to `qa: human`
- [ ] Codebase compiles cleanly with no references to `manual_qa` remaining
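
The migration step can be sketched as a one-off script. The demo file name and directory are hypothetical; for real use the loop would point at the actual story directory, and diffs should be reviewed before committing.

```shell
# One-off migration sketch: rewrite the legacy flag to the new field.
# Demonstrated on a throwaway file rather than real story files.
mkdir -p /tmp/manual_qa_demo && cd /tmp/manual_qa_demo
printf -- '---\nname: demo\nmanual_qa: true\n---\n' > 361_demo.md
for f in *.md; do
  sed -i 's/^manual_qa: true$/qa: human/' "$f"
done
cat 361_demo.md
```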

## Out of Scope

- TBD
@@ -1,18 +0,0 @@
---
name: Live Test Gate Updates
---

# Story 57: Live Test Gate Updates

## User Story

As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: record test results via MCP, verify Gate panel updates without manual refresh
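
The notification shape above suggests a small client-side dispatcher. This is a sketch only: the names `shouldRefresh` and `subscribe` are illustrative, not storkit's actual web UI API.

```typescript
// Decide whether a raw /ws frame is a notification for a given topic.
// Message shape from the acceptance criteria: {"type":"notification","topic":"tests"}
export function shouldRefresh(raw: string, topic: string): boolean {
  try {
    const msg = JSON.parse(raw) as { type?: string; topic?: string };
    return msg.type === "notification" && msg.topic === topic;
  } catch {
    return false; // non-JSON frames are ignored
  }
}

// Minimal wiring: a panel re-fetches its data when a matching topic arrives.
// The socket interface is narrowed so this compiles without DOM typings.
interface MessageSource {
  addEventListener(type: "message", handler: (ev: { data: unknown }) => void): void;
}

export function subscribe(ws: MessageSource, topic: string, refresh: () => void): void {
  ws.addEventListener("message", (ev) => {
    if (shouldRefresh(String(ev.data), topic)) refresh();
  });
}
```

Because the refresh callback re-fetches data rather than replacing the DOM wholesale, this pattern is also compatible with the no-flicker / scroll-position criterion.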