# Story 17: Display Context Window Usage

## User Story

As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.

## Acceptance Criteria

- [x] A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or a percentage)
- [x] The indicator is always visible in the UI (header area recommended)
- [x] The display updates in real time as messages are added
- [x] Each model shows its appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
- [x] The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
- [x] Hovering over the indicator shows more detail (per-message token breakdown - optional)
- [x] The calculation includes system prompts, user messages, assistant responses, and tool outputs
- [x] Token counting is reasonably accurate (it doesn't need to be perfect; an estimate is fine)

## Out of Scope

- Exact token counting (approximation is acceptable)
- Automatic session clearing when the limit is reached
- Per-message token counts in the UI
- Token usage history or analytics
- Different tokenizers for different models (use one estimation method)
- Backend token tracking from Ollama (estimate on the frontend)

## Technical Notes

### Token Estimation

- Simple approximation: 1 token ≈ 4 characters (English text)
- Or use a basic tokenizer library such as `gpt-tokenizer` or a JS port of `tiktoken`
- Count all message content: system prompts + user messages + assistant responses + tool outputs
- Include tool-call JSON in the count

### Context Window Sizes

Common model context windows (a lookup sketch appears under Implementation Sketches below):

- llama3.1, llama3.2: 8K tokens (8,192)
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens

### Implementation Approach

```tsx
// Minimal message shape assumed here; adapt to the app's actual type.
interface Message {
  content: string;
  tool_calls?: unknown[];
}

// Assumed to mirror the system prompt sent by the backend.
declare const SYSTEM_PROMPT: string;

// Simple character-based estimation: ~4 characters per token for English text
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

const calculateTotalTokens = (messages: Message[]): number => {
  let total = 0;
  // Add system prompt tokens (from backend)
  total += estimateTokens(SYSTEM_PROMPT);
  // Add all message tokens, including serialized tool calls
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }
  return total;
};
```

### UI Placement

- Header area, right side, near the model selector
- Format: "2.5K / 8K tokens (31%)"
- Color coding (see the threshold sketch under Implementation Sketches below):
  - Green/default: 0-74%
  - Yellow/warning: 75-89%
  - Red/danger: 90-100%

## Design Considerations

- Keep it subtle and non-intrusive
- Should be informative but not alarming
- Consider a small progress bar or circular indicator
- Example: "📊 2,450 / 8,192 (30%)"
- Or icon-based: "🟢 30% context"

## Future Enhancements (Not in this story)

- Backend token counting from Ollama (if available)
- Per-message token display on hover
- "Summarize and continue" feature to compress history
- Export/archive conversation before clearing
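## Implementation Sketches (Illustrative)

The sketches below are illustrative, not prescriptive. First, a possible lookup for the context window sizes listed under Technical Notes; `CONTEXT_WINDOWS`, `getContextWindow`, and the prefix-matching behavior are assumptions for this sketch, and the sizes should be verified against the models actually served by Ollama.

```tsx
// Known context window sizes (tokens), keyed by base model name.
// Values come from the table under Technical Notes; verify per deployment.
const CONTEXT_WINDOWS: Record<string, number> = {
  "llama3.1": 8192,
  "llama3.2": 8192,
  "qwen2.5-coder": 32768,
  "deepseek-coder": 16384,
};

const DEFAULT_CONTEXT_WINDOW = 8192;

// Ollama model names often carry a tag (e.g., "llama3.1:8b"),
// so match on the portion before the colon.
const getContextWindow = (model: string): number => {
  const base = model.split(":")[0];
  return CONTEXT_WINDOWS[base] ?? DEFAULT_CONTEXT_WINDOW;
};
```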
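Next, a sketch of the color thresholds and display format described under UI Placement. `UsageLevel`, `getUsageLevel`, and `formatUsage` are made-up names, and the one-decimal "K" rounding is a presentation choice (the story's examples round 8,192 to "8K").

```tsx
type UsageLevel = "default" | "warning" | "danger";

// Map a usage percentage to the color bands from UI Placement:
// green/default below 75%, yellow/warning at 75-89%, red/danger at 90%+.
const getUsageLevel = (percent: number): UsageLevel => {
  if (percent >= 90) return "danger";
  if (percent >= 75) return "warning";
  return "default";
};

// Compact token formatting: 2450 -> "2.5K", 8192 -> "8.2K".
const formatTokens = (tokens: number): string =>
  tokens >= 1000 ? `${(tokens / 1000).toFixed(1)}K` : `${tokens}`;

// Builds the "2.5K / 8.2K tokens (30%)" string for the header indicator.
const formatUsage = (used: number, limit: number): string => {
  const percent = Math.round((used / limit) * 100);
  return `${formatTokens(used)} / ${formatTokens(limit)} tokens (${percent}%)`;
};
```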
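Finally, a minimal React component tying the pieces together. It reuses the `Message` type and the helpers from the sketches above; the component name, props, and CSS class names are assumptions, and the optional hover breakdown from the acceptance criteria is reduced to a plain `title` tooltip here.

```tsx
import React from "react";

interface ContextUsageIndicatorProps {
  messages: Message[];
  model: string;
}

// Header indicator: recomputes on every render, so it stays current as
// messages are appended. The estimate is cheap, but useMemo could cache
// it for very long conversations.
const ContextUsageIndicator: React.FC<ContextUsageIndicatorProps> = ({
  messages,
  model,
}) => {
  const limit = getContextWindow(model);
  const used = calculateTotalTokens(messages);
  const percent = Math.round((used / limit) * 100);

  return (
    <span
      className={`context-usage context-usage--${getUsageLevel(percent)}`}
      title="Estimated context window usage"
    >
      {formatUsage(used, limit)}
    </span>
  );
};
```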