diff --git a/.storkit/work/1_backlog/407_spike_fly_io_machines_for_multi_tenant_storkit_saas.md b/.storkit/work/1_backlog/407_spike_fly_io_machines_for_multi_tenant_storkit_saas.md
index 96ce49ee..cd47c114 100644
--- a/.storkit/work/1_backlog/407_spike_fly_io_machines_for_multi_tenant_storkit_saas.md
+++ b/.storkit/work/1_backlog/407_spike_fly_io_machines_for_multi_tenant_storkit_saas.md
@@ -10,15 +10,22 @@ Can Fly.io Machines provide sufficient isolation, fast enough cold start, and si
 
 ## Hypothesis
 
-- TBD
+Fly.io Machines (Firecracker-based microVMs) offer the right balance of isolation, cold-start speed, and operational simplicity for early-stage SaaS. A thin Rust auth proxy routes JWT-authenticated requests to per-tenant machines, avoiding the ops complexity of self-managed gVisor/Kubernetes.
 
 ## Timebox
 
-- TBD
+4 hours
 
 ## Investigation Plan
 
-- TBD
+- [ ] Review Fly.io Machines API — create/start/stop/destroy machine via REST, assess Rust `reqwest` integration
+- [ ] Assess isolation model — Firecracker microVM vs gVisor; is it sufficient for tenants running arbitrary shell commands via claude code?
+- [ ] Test cold start time for a storkit container image (target: <2s)
+- [ ] Evaluate persistent volume support — can a volume be attached per tenant for `.storkit/` and project root?
+- [ ] Assess Claude auth injection — how to securely pass `~/.claude/.credentials.json` per tenant at machine start
+- [ ] Sketch the auth proxy design — JWT validation → machine lookup → reverse proxy (WebSocket support required)
+- [ ] Check pricing model for always-on vs stop-on-idle machines at small tenant counts (10, 100, 1000)
+- [ ] Identify any showstoppers (network egress limits, image registry, machine count limits per org)
 
 ## Findings
 
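
The proxy flow named in the investigation plan (JWT validation → machine lookup → reverse proxy) could be sketched roughly as follows. This covers only the lookup step and assumes the JWT has already been validated and has yielded a tenant ID. Every name here (`MachineRegistry`, `Machine`, `MachineState`, `Route`) and the placeholder address are hypothetical; none of them come from the Fly.io Machines API or from storkit.

```rust
use std::collections::HashMap;

/// Hypothetical lifecycle state the proxy tracks for a tenant's machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MachineState {
    Started,
    Stopped,
}

/// Hypothetical record tying a tenant to one machine.
#[derive(Debug, Clone)]
struct Machine {
    id: String,
    addr: String, // placeholder address; the real addressing scheme is an open question
    state: MachineState,
}

/// What the proxy should do with an already-authenticated request.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// Machine is running: reverse-proxy straight to it.
    Forward { addr: String },
    /// Machine is stopped: start it first, then proxy (the cold-start path).
    StartThenForward { machine_id: String, addr: String },
    /// No machine is registered for this tenant.
    UnknownTenant,
}

#[derive(Default)]
struct MachineRegistry {
    by_tenant: HashMap<String, Machine>,
}

impl MachineRegistry {
    fn register(&mut self, tenant_id: &str, machine: Machine) {
        self.by_tenant.insert(tenant_id.to_string(), machine);
    }

    /// Map a validated tenant ID to a routing decision.
    fn route(&self, tenant_id: &str) -> Route {
        match self.by_tenant.get(tenant_id) {
            None => Route::UnknownTenant,
            Some(m) if m.state == MachineState::Started => Route::Forward {
                addr: m.addr.clone(),
            },
            Some(m) => Route::StartThenForward {
                machine_id: m.id.clone(),
                addr: m.addr.clone(),
            },
        }
    }
}

fn main() {
    let mut registry = MachineRegistry::default();
    registry.register(
        "tenant-a",
        Machine {
            id: "m-a".to_string(),
            addr: "10.0.0.2:8080".to_string(),
            state: MachineState::Stopped,
        },
    );
    println!("{:?}", registry.route("tenant-a"));
    println!("{:?}", registry.route("tenant-b"));
}
```

Under a stop-on-idle setup, the `StartThenForward` branch is where the cold-start target (<2s) would be felt by the tenant, so the spike's cold-start measurement applies to that path specifically.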