Story 17: Display Context Window Usage with emoji indicator

- Added real-time context window usage indicator in header
- Format: emoji + percentage (🟢 52%)
- Color-coded emoji: 🟢 <75%, 🟡 <90%, 🔴 >=90%
- Hover tooltip shows full details: 'Context: 4,300 / 8,192 tokens (52%)'
- Token estimation: 1 token ≈ 4 characters
- Model-aware context windows: llama3 (8K), qwen2.5 (32K), deepseek (16K)
- Includes system prompts, messages, tool calls, and streaming content
- Updates in real-time as conversation progresses
- All quality checks passing (TypeScript, Biome, Clippy, builds)

Tested and verified:
- Shows accurate percentage of context usage
- Emoji changes color at appropriate thresholds
- Different models show correct context window sizes
- Can exceed 100% when over limit (shows red)
- Tooltip provides exact token counts
2025-12-27 17:26:21 +00:00
parent 9965c78221
commit bd8d838457
4 changed files with 811 additions and 597 deletions
--- a/.living_spec/stories/archive/17_display_remaining_context.md
+++ b/.living_spec/stories/archive/17_display_remaining_context.md
@@ -0,0 +1,82 @@
+# Story 17: Display Context Window Usage
+
+## User Story
+As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.
+
+## Acceptance Criteria
+- [x] A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
+- [x] The indicator is always visible in the UI (header area recommended)
+- [x] The display updates in real-time as messages are added
+- [x] Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
+- [x] The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
+- [x] Hovering over the indicator shows more details (tokens per message breakdown - optional)
+- [x] The calculation includes system prompts, user messages, assistant responses, and tool outputs
+- [x] Token counting is reasonably accurate (doesn't need to be perfect, estimate is fine)
+
+## Out of Scope
+- Exact token counting (approximation is acceptable)
+- Automatic session clearing when limit reached
+- Per-message token counts in the UI
+- Token usage history or analytics
+- Different tokenizers for different models (use one estimation method)
+- Backend token tracking from Ollama (estimate on frontend)
+
+## Technical Notes
+
+### Token Estimation
+- Simple approximation: 1 token ≈ 4 characters (English text)
+- Or use a basic tokenizer library like `gpt-tokenizer` or `tiktoken` (JS port)
+- Count all message content: system prompts + user messages + assistant responses + tool outputs
+- Include tool call JSON in the count
+
+### Context Window Sizes
+Common model context windows:
+- llama3.1, llama3.2: 8K tokens (8,192)
+- qwen2.5-coder: 32K tokens
+- deepseek-coder: 16K tokens
+- Default/unknown: 8K tokens
+
+### Implementation Approach
+```tsx
+// Simple character-based estimation
+const estimateTokens = (text: string): number => {
+  return Math.ceil(text.length / 4);
+};
+
+const calculateTotalTokens = (messages: Message[]): number => {
+  let total = 0;
+  // Add system prompt tokens (from backend)
+  total += estimateTokens(SYSTEM_PROMPT);
+  
+  // Add all message tokens
+  for (const msg of messages) {
+    total += estimateTokens(msg.content);
+    if (msg.tool_calls) {
+      total += estimateTokens(JSON.stringify(msg.tool_calls));
+    }
+  }
+  
+  return total;
+};
+```
+
+### UI Placement
+- Header area, right side near model selector
+- Format: "2.5K / 8K tokens (31%)"
+- Color coding:
+  - Green/default: 0-74%
+  - Yellow/warning: 75-89%
+  - Red/danger: 90% and above (usage can exceed 100%)
+
+## Design Considerations
+- Keep it subtle and non-intrusive
+- Should be informative but not alarming
+- Consider a small progress bar or circular indicator
+- Example: "📊 2,450 / 8,192 (30%)"
+- Or icon-based: "🟢 30% context"
+
+## Future Enhancements (Not in this story)
+- Backend token counting from Ollama (if available)
+- Per-message token display on hover
+- "Summarize and continue" feature to compress history
+- Export/archive conversation before clearing
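
The story's snippet covers token estimation only. The model-aware window lookup, the emoji thresholds, and the tooltip format from the commit message could be sketched as follows. This is a minimal sketch and not the committed implementation: the names `getContextWindow`, `usageEmoji`, and `formatUsage` are hypothetical, matching models by name prefix is an assumption, and "16K" and "32K" are assumed to mean 16,384 and 32,768 tokens.

```typescript
// Hypothetical helpers for illustration; none of these names come from the commit.
const DEFAULT_CONTEXT_WINDOW = 8192; // "Default/unknown: 8K tokens"

// Window sizes from the Technical Notes, keyed by model-name prefix.
// Assumption: 16K = 16,384 and 32K = 32,768 tokens.
const CONTEXT_WINDOWS: Array<[string, number]> = [
  ["llama3", 8192],
  ["qwen2.5", 32768],
  ["deepseek", 16384],
];

// Look up the context window for a model name such as "qwen2.5-coder:7b".
function getContextWindow(model: string): number {
  const name = model.toLowerCase();
  const match = CONTEXT_WINDOWS.find(([prefix]) => name.startsWith(prefix));
  return match ? match[1] : DEFAULT_CONTEXT_WINDOW;
}

// Thresholds from the commit message: green <75%, yellow <90%, red >=90%.
function usageEmoji(percent: number): string {
  if (percent >= 90) return "🔴";
  if (percent >= 75) return "🟡";
  return "🟢";
}

// Build the header label and the hover tooltip text.
// The percentage is not clamped, so it can exceed 100% when over the limit.
function formatUsage(
  usedTokens: number,
  model: string,
): { label: string; tooltip: string } {
  const windowSize = getContextWindow(model);
  const percent = Math.round((usedTokens / windowSize) * 100);
  const fmt = (n: number): string => n.toLocaleString("en-US");
  return {
    label: `${usageEmoji(percent)} ${percent}%`,
    tooltip: `Context: ${fmt(usedTokens)} / ${fmt(windowSize)} tokens (${percent}%)`,
  };
}
```

In this sketch the emoji is chosen from the rounded percentage, so the color always agrees with the number shown next to it. An implementation could equally compare the unrounded ratio against the thresholds.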