Story 17: Display Context Window Usage with emoji indicator

- Added real-time context window usage indicator in header
- Format: emoji + percentage (🟢 52%)
- Color-coded emoji: 🟢 <75%, 🟡 <90%, 🔴 >=90%
- Hover tooltip shows full details: 'Context: 4,300 / 8,192 tokens (52%)'
- Token estimation: 1 token ≈ 4 characters
- Model-aware context windows: llama3 (8K), qwen2.5 (32K), deepseek (16K)
- Includes system prompts, messages, tool calls, and streaming content
- Updates in real-time as conversation progresses
- All quality checks passing (TypeScript, Biome, Clippy, builds)

Tested and verified:
- Shows accurate percentage of context usage
- Emoji changes color at appropriate thresholds
- Different models show correct context window sizes
- Can exceed 100% when over limit (shows red)
- Tooltip provides exact token counts
2025-12-27 17:26:21 +00:00
parent 9965c78221
commit bd8d838457
4 changed files with 811 additions and 597 deletions
--- a/.living_spec/specs/functional/UI_UX.md
+++ b/.living_spec/specs/functional/UI_UX.md
@@ -338,3 +338,69 @@ Provide a clear, accessible way for users to start a new session by clearing the
 - "Clear Chat" (direct but less friendly)
 - "Start Over" (conversational)
 - Icon: 🔄 or ⊕ (plus in circle)
 ## Context Window Usage Display
 ### Problem
 Users have no visibility into how much of the model's context window they're using. This leads to:
 - Unexpected quality degradation when context limit is reached
 - Uncertainty about when to start a new session
 - Inability to gauge conversation length
 ### Solution: Real-time Context Usage Indicator
 Display a persistent indicator showing current token usage vs. model's context window limit.
 ### Requirements
 1. **Visual Indicator:** Always visible in header area
 2. **Real-time Updates:** Updates as messages are added
 3. **Model-Aware:** Shows correct limit based on selected model
 4. **Color Coding:** Visual warning as limit approaches
   - Green/default: 0-74% usage
   - Yellow/warning: 75-89% usage
   - Red/danger: 90% usage and above (can exceed 100% when over the limit)
 5. **Clear Format:** "2.5K / 8K tokens (31%)" or similar
 6. **Token Estimation:** Approximate token count for all messages
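
The color-coding rule can be sketched as a small pure function. This is a minimal illustration, not the code in `Chat.tsx`: the names `usageLevel`, `USAGE_EMOJI`, and `usageIndicator` are hypothetical, the context window is assumed to be positive, and the percentage is rounded before the thresholds are applied so that the emoji always agrees with the number shown.

```typescript
type UsageLevel = "green" | "yellow" | "red";

// Thresholds from the requirements: <75% green, 75-89% yellow, >=90% red.
const usageLevel = (percent: number): UsageLevel => {
  if (percent >= 90) return "red";
  if (percent >= 75) return "yellow";
  return "green";
};

// Hypothetical lookup from level to the emoji used in the header.
const USAGE_EMOJI: Record<UsageLevel, string> = {
  green: "🟢",
  yellow: "🟡",
  red: "🔴",
};

// Builds the compact header label, e.g. "🟢 52%".
// Assumes contextWindow > 0; values above 100% are allowed and stay red.
const usageIndicator = (usedTokens: number, contextWindow: number): string => {
  const percent = Math.round((usedTokens / contextWindow) * 100);
  return `${USAGE_EMOJI[usageLevel(percent)]} ${percent}%`;
};
```

Rounding first means 74.6% displays as 75% and is shown in yellow; applying the thresholds to the unrounded value would be an equally reasonable choice.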
 ### Implementation Notes
 **Token Estimation:**
 - Use simple approximation: 1 token ≈ 4 characters
 - Or integrate `gpt-tokenizer` for more accuracy
 - Count: system prompts + user messages + assistant responses + tool outputs + tool calls
 **Model Context Windows:**
 - llama3.1, llama3.2: 8K tokens
 - qwen2.5-coder: 32K tokens
 - deepseek-coder: 16K tokens  
 - Default/unknown: 8K tokens
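
One way to make the limit model-aware is a prefix lookup over the model name. The sketch below is an assumption about how this could be done, not the actual implementation: `CONTEXT_WINDOWS` and `getContextWindow` are hypothetical names, "K" is taken to mean 1,024 tokens (consistent with 8K = 8,192), and model names are assumed to start with the family name, possibly followed by a tag such as `:8b`.

```typescript
// Fallback for unknown models, per the list above.
const DEFAULT_CONTEXT_WINDOW = 8192;

// Hypothetical table of model-name prefixes and their context windows.
const CONTEXT_WINDOWS: ReadonlyArray<readonly [string, number]> = [
  ["llama3.1", 8192],
  ["llama3.2", 8192],
  ["qwen2.5-coder", 32768],
  ["deepseek-coder", 16384],
];

// Returns the context window for a model name, matching by prefix
// so that tagged names like "qwen2.5-coder:32b" resolve correctly.
const getContextWindow = (model: string): number => {
  const name = model.toLowerCase();
  const match = CONTEXT_WINDOWS.find(([prefix]) => name.startsWith(prefix));
  return match ? match[1] : DEFAULT_CONTEXT_WINDOW;
};
```

Calling `getContextWindow` again whenever the selected model changes covers the "Model change" edge case without extra state.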
 **Calculation:**
 ```tsx
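 // Rough estimate: about 4 characters per token.
 const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
 };
 // `Message` is assumed to be the app's chat message type, with `content` and optional `tool_calls`.
 const calculateContextUsage = (messages: Message[], systemPrompt: string): number => {
  let total = estimateTokens(systemPrompt);
  messages.forEach(msg => {
    total += estimateTokens(msg.content);
    // Tool calls are counted via their serialized JSON.
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  });
  return total;
 };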
 ```
 **UI Placement:**
 - Header area, near model selector
 - Non-intrusive but always visible
 - Optional tooltip with breakdown on hover
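
The hover tooltip text from the commit message ('Context: 4,300 / 8,192 tokens (52%)') could be produced as follows. This is a sketch only: `formatContextTooltip` is a hypothetical helper name, and the `en-US` locale is assumed for the thousands separators.

```typescript
// Formats the detailed tooltip string with exact token counts.
// Assumes contextWindow > 0.
const formatContextTooltip = (usedTokens: number, contextWindow: number): string => {
  const percent = Math.round((usedTokens / contextWindow) * 100);
  const used = usedTokens.toLocaleString("en-US");
  const limit = contextWindow.toLocaleString("en-US");
  return `Context: ${used} / ${limit} tokens (${percent}%)`;
};
```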
 ### Edge Cases
 - Empty conversation: Show "0 / 8K"
 - During streaming: Include partial content
 - After clearing: Reset to 0
 - Model change: Update context window limit
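
The streaming and empty-conversation cases can be handled by passing the partial response into the estimate as an optional argument. The sketch below makes several assumptions: `ChatMessage` is a stand-in for the app's message type, and `estimateUsage` and its `streamingContent` parameter are hypothetical names.

```typescript
// Stand-in for the app's message type (assumed shape).
interface ChatMessage {
  content: string;
  tool_calls?: unknown[];
}

// 1 token ≈ 4 characters, as in the spec.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Estimates total usage, including any partial streamed response.
// With no messages, no system prompt, and no stream, the result is 0.
const estimateUsage = (
  messages: ChatMessage[],
  systemPrompt: string,
  streamingContent = "",
): number => {
  let total = estimateTokens(systemPrompt) + estimateTokens(streamingContent);
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }
  return total;
};
```

Note that if the system prompt is always counted, an empty or freshly cleared conversation will report the prompt's tokens rather than exactly 0.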
--- a/.living_spec/stories/17_display_remaining_context.md
+++ b/.living_spec/stories/17_display_remaining_context.md
--- a/.living_spec/stories/archive/17_display_remaining_context.md
+++ b/.living_spec/stories/archive/17_display_remaining_context.md
@@ -0,0 +1,82 @@
 # Story 17: Display Context Window Usage
 ## User Story
 As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.
 ## Acceptance Criteria
 - [x] A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
 - [x] The indicator is always visible in the UI (header area recommended)
 - [x] The display updates in real-time as messages are added
 - [x] Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
 - [x] The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
 - [x] Hovering over the indicator shows more details (tokens per message breakdown - optional)
 - [x] The calculation includes system prompts, user messages, assistant responses, and tool outputs
 - [x] Token counting is reasonably accurate (doesn't need to be perfect, estimate is fine)
 ## Out of Scope
 - Exact token counting (approximation is acceptable)
 - Automatic session clearing when limit reached
 - Per-message token counts in the UI
 - Token usage history or analytics
 - Different tokenizers for different models (use one estimation method)
 - Backend token tracking from Ollama (estimate on frontend)
 ## Technical Notes
 ### Token Estimation
 - Simple approximation: 1 token ≈ 4 characters (English text)
 - Or use a basic tokenizer library like `gpt-tokenizer` or `tiktoken` (JS port)
 - Count all message content: system prompts + user messages + assistant responses + tool outputs
 - Include tool call JSON in the count
 ### Context Window Sizes
 Common model context windows:
 - llama3.1, llama3.2: 8K tokens (8,192)
 - qwen2.5-coder: 32K tokens
 - deepseek-coder: 16K tokens
 - Default/unknown: 8K tokens
 ### Implementation Approach
 ```tsx
 // Simple character-based estimation
 const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
 };
 const calculateTotalTokens = (messages: Message[]): number => {
  let total = 0;
  // Add system prompt tokens (from backend)
  total += estimateTokens(SYSTEM_PROMPT);
  // Add all message tokens
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }
  return total;
 };
 ```
 ### UI Placement
 - Header area, right side near model selector
 - Format: "2.5K / 8K tokens (31%)"
 - Color coding:
  - Green/default: 0-74%
  - Yellow/warning: 75-89%
  - Red/danger: 90% and above (can exceed 100% when over the limit)
 ## Design Considerations
 - Keep it subtle and non-intrusive
 - Should be informative but not alarming
 - Consider a small progress bar or circular indicator
 - Example: "📊 2,450 / 8,192 (30%)"
 - Or icon-based: "🟢 30% context"
 ## Future Enhancements (Not in this story)
 - Backend token counting from Ollama (if available)
 - Per-message token display on hover
 - "Summarize and continue" feature to compress history
 - Export/archive conversation before clearing
--- a/src/components/Chat.tsx
+++ b/src/components/Chat.tsx