The sync_crdt_stages_from_db migration reads pipeline_items (which has
stale 5_done stages) and overwrites the CRDT back to 5_done for stories
that were already swept to 6_archived. On every restart, done stories
reappear and get re-swept.
The migration served its purpose — CRDT stages are now correct. Remove it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Story 535's triage fix was overwritten by a subsequent merge that
resolved a conflict by taking the old filesystem-based version.
Re-applies the CRDT-based triage that reads from pipeline state
and content store, works for any pipeline stage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests shared a global CRDT singleton and content store HashMap, causing
flaky failures when parallel tests wrote items that polluted each
other's assertions, typically 3-5 random failures per run.
Both CRDT_STATE and CONTENT_STORE now use thread_local! in test mode
so each test thread gets its own isolated instance. Production code
is unchanged — it still uses the global OnceLock singletons.
Also fixed 3 tests (create_story_writes_correct_content,
next_item_number_increments_from_existing_bugs,
next_item_number_scans_archived_too) that relied on leaked state
from other tests — they now write to the content store explicitly.
Result: 1902 passed, 0 failed across 5 consecutive runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents were running script/test directly through the PTY, streaming
the full output of npm install, cargo clippy, cargo test, and frontend
builds into session logs. This tripled session log sizes (~200KB to
~600KB per session) and contributed to CLI SIGABRT crashes.
The run_tests MCP tool already runs script/test server-side and returns
a truncated JSON summary. Agents now use it exclusively.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When an agent CLI exits without creating a session, we now log:
- Number of prior sessions and total session log bytes
- Child process exit status (exit code or signal)
- Explicit SESSION NONE warning with context
This will help diagnose whether the fatal runtime error
(output.write assertion) correlates with accumulated sessions,
budget exhaustion, or something else.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rate limit auto-scheduler was creating timers for every hard block,
including short 5-minute throttles. This caused a death loop: agent hits
rate limit, timer set, agent exits, pipeline restarts before timer fires,
new agent dies instantly (Session: None) because API is still throttled.
Short rate limits are handled naturally by the CLI's internal wait. Only
schedule timers for long session-level blocks (>10 min) where the CLI
will exit and needs external restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When replaying old CRDT ops that predate new struct fields (e.g.
claimed_by, claim_ts added by story 479), node_from would call
.unwrap() on None and panic during init. Now defaults to an empty
CrdtNode::new() for missing fields, allowing schema evolution without
breaking replay of historical ops.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
510 stories had stale 1_backlog stages in the CRDT because they were
imported during the filesystem→CRDT migration and then moved forward
via filesystem-only moves that never wrote CRDT ops. This made done
stories appear as ghost entries in the backlog.
On startup, sync_crdt_stages_from_db reads the authoritative stage from
pipeline_items and corrects any CRDT entries that disagree.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
read_all_items was iterating all CRDT entries including stale duplicates
from earlier stage writes. A story written multiple times (backlog →
current → done) would appear in the output multiple times with different
stages, causing ghost entries in the pipeline status and backlog views.
Now iterates only the index (story_id → visible_index map) which
represents the latest-wins deduplicated view of each story.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Squash merge of story 504: add MCP regression tests for non-string
front_matter values (arrays, bools, integers). The schema change itself
was already on master. Fixed the array assertion to match YAML's
space-after-comma inline sequence format.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The post-520 migration changed validate_story_dirs to read from
pipeline_state::read_all_typed() (the process-global CRDT singleton),
ignoring its root: &Path argument. This broke test isolation — tests
creating a tempdir saw dozens of results from ambient CRDT state,
causing non-deterministic failures that blocked every mergemaster gate.
Remove the CRDT singleton block and rely on the filesystem shadow scan
that already uses the root argument correctly. 1845/1845 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a story is found in the CRDT but not in the expected source stages,
and missing_ok is true, return Ok(None) instead of proceeding with the move.
This prevents promote_ready_backlog_stories from demoting a story that has
already advanced to merge/done via a stale filesystem shadow in 1_backlog.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
These changes (HashMap<String, String> → HashMap<String, Value> for front matter,
json_value_to_yaml_scalar, and oneOf schema for front_matter) were left uncommitted
on master after a previous merge, blocking the cherry-pick step of story 509's merge.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds the foundational capability to clear a story from the running
server's in-memory CRDT state without restarting the process. This is
story 521, motivated by the 2026-04-09 incident where stories 478 and
503 kept resurrecting from in-memory CRDT after every sqlite delete /
worktree removal / timers.json clear. The only previous remedy was a
full docker restart.
Changes:
- server/src/crdt_state.rs: new `pub fn evict_item(story_id: &str)`.
Looks up the item's CRDT OpId via the visible-index map, calls the
bft-json-crdt list `delete()` primitive to construct a tombstone op,
runs it through the existing `apply_and_persist` machinery (which
signs, applies to the in-memory CRDT, and queues for persistence to
crdt_ops), rebuilds the story_id → visible_index map, and drops the
in-memory CONTENT_STORE entry. The tombstone survives a restart
because it's persisted as a real CRDT op.
- server/src/http/mcp/story_tools.rs: new `tool_purge_story` MCP
handler that takes a story_id and calls evict_item. Deliberately
minimal — does NOT touch agents, worktrees, pipeline_items shadow
table, timers.json, or filesystem shadows. Compose with stop_agent,
remove_worktree, etc. for a full purge. Story 514 (delete_story
full cleanup) is the future "do it all" tool.
- server/src/http/mcp/mod.rs: registers the `purge_story` tool in the
tools list and dispatch table.
Usage:
mcp__huskies__purge_story story_id="<full_story_id>"
Returns a string confirming the eviction. The story will no longer
appear in get_pipeline_status, list_agents, or any other API that
reads from the in-memory CRDT view, and on the next server restart
the persisted tombstone op will keep it from being reconstructed.
This is a prerequisite for story 514 (delete_story full cleanup) and
useful for any "kill it with fire" operator need.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bundles in-progress work from a parallel Claude session toward fixing
bug 501 (rate-limit retry timer doesn't cancel on stop_agent / move_story
/ successful completion). This commit lands the foundation but the MCP
tool wiring is still TODO.
- server/src/chat/timer.rs: defense-in-depth check in tick_once that
skips firing a timer for stories already past 3_qa (3_qa, 4_merge,
5_done, 6_archived). The primary cancellation path will be in the
MCP tools; this guards races where a timer was scheduled before the
story was advanced and the tool didn't get a chance to cancel it.
- server/src/http/context.rs: adds `timer_store: Arc<TimerStore>` field
on AppContext so MCP tools (move_story, stop_agent, ...) can reach
the shared timer store and cancel pending entries when the user
intervenes manually. The test helper is updated to construct one.
- server/src/main.rs: wires up a TimerStore instance in the AppContext
initialiser so the binary actually compiles after the context.rs
field addition. TODO: the matrix bot's spawn_bot still creates its
own TimerStore instance (in chat/transport/matrix/bot/run.rs:220-227)
rather than consuming the shared one — that refactor is the next
step in the bug 501 fix.
What is NOT in this commit and is needed to actually fix bug 501:
- The MCP tool side (move_story, stop_agent, delete_story) does not
yet call timer_store.cancel(story_id) when invoked
- The matrix bot's spawn_bot does not yet consume the shared
timer_store from AppContext — it still creates its own
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>