Story 17: Display Context Window Usage with emoji indicator

- Added real-time context window usage indicator in header
- Format: emoji + percentage (🟢 52%)
- Color-coded emoji: 🟢 <75%, 🟡 <90%, 🔴 >=90%
- Hover tooltip shows full details: 'Context: 4,300 / 8,192 tokens (52%)'
- Token estimation: 1 token ≈ 4 characters
- Model-aware context windows: llama3 (8K), qwen2.5 (32K), deepseek (16K)
- Includes system prompts, messages, tool calls, and streaming content
- Updates in real-time as conversation progresses
- All quality checks passing (TypeScript, Biome, Clippy, builds)

Tested and verified:
- Shows accurate percentage of context usage
- Emoji changes color at appropriate thresholds
- Different models show correct context window sizes
- Can exceed 100% when over limit (shows red)
- Tooltip provides exact token counts
2025-12-27 17:26:21 +00:00
parent 9965c78221
commit bd8d838457
4 changed files with 811 additions and 597 deletions
--- a/.living_spec/specs/functional/UI_UX.md
+++ b/.living_spec/specs/functional/UI_UX.md
@@ -338,3 +338,69 @@ Provide a clear, accessible way for users to start a new session by clearing the
 - "Clear Chat" (direct but less friendly)
 - "Start Over" (conversational)
 - Icon: 🔄 or ⊕ (plus in circle)
+
+## Context Window Usage Display
+
+### Problem
+Users have no visibility into how much of the model's context window they're using. This leads to:
+- Unexpected quality degradation when context limit is reached
+- Uncertainty about when to start a new session
+- Inability to gauge conversation length
+
+### Solution: Real-time Context Usage Indicator
+Display a persistent indicator showing current token usage vs. model's context window limit.
+
+### Requirements
+
+1. **Visual Indicator:** Always visible in header area
+2. **Real-time Updates:** Updates as messages are added
+3. **Model-Aware:** Shows correct limit based on selected model
+4. **Color Coding:** Visual warning as limit approaches
+   - Green/default: 0-74% usage
+   - Yellow/warning: 75-89% usage
+   - Red/danger: 90% usage and above (can exceed 100% when the estimate is over the limit)
+5. **Clear Format:** "2.5K / 8K tokens (31%)" or similar
+6. **Token Estimation:** Approximate token count for all messages
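
The color-coding and format requirements above could be sketched as follows. This is a minimal illustration, not the committed implementation: the names `usageLevel`, `formatIndicator`, and `formatTooltip` are hypothetical, the thresholds and output strings follow the commit message, and choosing the color from the rounded percentage (so the emoji always agrees with the number shown) is an assumption.

```typescript
type UsageLevel = "green" | "yellow" | "red";

// Thresholds from the commit message: green <75%, yellow <90%, red >=90%.
const usageLevel = (percent: number): UsageLevel => {
  if (percent >= 90) return "red";
  if (percent >= 75) return "yellow";
  return "green";
};

const EMOJI: Record<UsageLevel, string> = {
  green: "🟢",
  yellow: "🟡",
  red: "🔴",
};

// Assumes limit > 0. The result may exceed 100 when usage is over the limit.
const toPercent = (used: number, limit: number): number =>
  Math.round((used / limit) * 100);

// Compact header text, e.g. emoji followed by a whole-number percentage.
const formatIndicator = (used: number, limit: number): string => {
  const percent = toPercent(used, limit);
  return `${EMOJI[usageLevel(percent)]} ${percent}%`;
};

// Hover text with exact counts, using en-US thousands separators.
const formatTooltip = (used: number, limit: number): string => {
  const fmt = (n: number): string => n.toLocaleString("en-US");
  return `Context: ${fmt(used)} / ${fmt(limit)} tokens (${toPercent(used, limit)}%)`;
};
```

Keeping the threshold logic separate from the string formatting makes it straightforward to reuse the same level for CSS classes or other styling if the emoji is ever replaced.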
+
+### Implementation Notes
+
+**Token Estimation:**
+- Use simple approximation: 1 token ≈ 4 characters
+- Or integrate `gpt-tokenizer` for more accuracy
+- Count: system prompts + user messages + assistant responses + tool outputs + tool calls
+
+**Model Context Windows:**
+- llama3.1, llama3.2: 8K tokens
+- qwen2.5-coder: 32K tokens
+- deepseek-coder: 16K tokens
+- Default/unknown: 8K tokens
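
One way to express the model table above is a prefix lookup with a fallback. This is a sketch under assumptions: `contextWindowFor` and `CONTEXT_WINDOWS` are hypothetical names, prefix matching assumes model identifiers may carry a tag suffix such as `:8b`, and "K" is taken to mean multiples of 1,024 (consistent with the 8,192 shown in the commit's tooltip example).

```typescript
// Each entry pairs a model-name prefix with its context window in tokens.
const CONTEXT_WINDOWS: ReadonlyArray<readonly [string, number]> = [
  ["llama3.1", 8192],
  ["llama3.2", 8192],
  ["qwen2.5-coder", 32768],
  ["deepseek-coder", 16384],
];

// Fallback for models not listed above.
const DEFAULT_CONTEXT_WINDOW = 8192;

// Returns the window for the first matching prefix, or the default.
const contextWindowFor = (model: string): number => {
  const name = model.toLowerCase();
  const match = CONTEXT_WINDOWS.find(([prefix]) => name.startsWith(prefix));
  return match ? match[1] : DEFAULT_CONTEXT_WINDOW;
};
```

A table keeps the limits in one place, so adding a model is a one-line change rather than another branch in a conditional.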
+
+**Calculation:**
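+```tsx
+// Minimal shape assumed for this example; the real Message type may carry more fields.
+interface Message {
+  content: string;
+  tool_calls?: unknown[];
+}
+
+// Rough heuristic: 1 token ≈ 4 characters, rounded up.
+const estimateTokens = (text: string): number => {
+  return Math.ceil(text.length / 4);
+};
+
+// Sum the estimate over the system prompt, each message body, and any serialized tool calls.
+const calculateContextUsage = (messages: Message[], systemPrompt: string): number => {
+  let total = estimateTokens(systemPrompt);
+  messages.forEach(msg => {
+    total += estimateTokens(msg.content);
+    if (msg.tool_calls) {
+      total += estimateTokens(JSON.stringify(msg.tool_calls));
+    }
+  });
+  return total;
+};
+```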
+
+**UI Placement:**
+- Header area, near model selector
+- Non-intrusive but always visible
+- Optional tooltip with breakdown on hover
+
+### Edge Cases
+- Empty conversation: Show "0 / 8K"
+- During streaming: Include partial content
+- After clearing: Reset to 0
+- Model change: Update context window limit
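
The edge cases above might be handled along these lines. This is a hedged sketch, not the committed code: `usageWithStreaming`, the `streamingContent` parameter, and the reduced `Message` shape are assumptions made for illustration. Note that under this approach an empty conversation only reports 0 when the system prompt is also empty, since the system prompt is counted.

```typescript
// Reduced message shape assumed for this sketch.
interface Message {
  content: string;
  tool_calls?: unknown[];
}

// Same 1 token ≈ 4 characters heuristic used elsewhere in the spec.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Total estimate including the partially streamed assistant reply.
const usageWithStreaming = (
  messages: Message[],
  systemPrompt: string,
  streamingContent: string
): number => {
  let total = estimateTokens(systemPrompt) + estimateTokens(streamingContent);
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }
  return total;
};

// Empty conversation (and after clearing): no messages, nothing streaming.
const emptyUsage = usageWithStreaming([], "", "");

// During streaming: the partial reply is counted before it becomes a message.
const midStreamUsage = usageWithStreaming(
  [{ content: "Hello there!" }],
  "",
  "Partial ans"
);
```

Because the function is pure, a model change needs no special handling here: the caller recomputes the percentage against the new limit using the same token total.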
+