bd8d838457
- Added real-time context window usage indicator in header
- Format: emoji + percentage (🟢 52%)
- Color-coded emoji: 🟢 <75%, 🟡 <90%, 🔴 >=90%
- Hover tooltip shows full details: 'Context: 4,300 / 8,192 tokens (52%)'
- Token estimation: 1 token ≈ 4 characters
- Model-aware context windows: llama3 (8K), qwen2.5 (32K), deepseek (16K)
- Includes system prompts, messages, tool calls, and streaming content
- Updates in real-time as conversation progresses
- All quality checks passing (TypeScript, Biome, Clippy, builds)

Tested and verified:
- Shows accurate percentage of context usage
- Emoji changes color at appropriate thresholds
- Different models show correct context window sizes
- Can exceed 100% when over limit (shows red)
- Tooltip provides exact token counts
Story 17: Display Context Window Usage
User Story
As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.
Acceptance Criteria
- A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
- The indicator is always visible in the UI (header area recommended)
- The display updates in real-time as messages are added
- Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
- The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
- Hovering over the indicator shows more details (optional: a per-message token breakdown)
- The calculation includes system prompts, user messages, assistant responses, and tool outputs
- Token counting is reasonably accurate (it doesn't need to be perfect; an estimate is fine)
Out of Scope
- Exact token counting (approximation is acceptable)
- Automatic session clearing when limit reached
- Per-message token counts in the UI
- Token usage history or analytics
- Different tokenizers for different models (use one estimation method)
- Backend token tracking from Ollama (estimate on frontend)
Technical Notes
Token Estimation
- Simple approximation: 1 token ≈ 4 characters (English text)
- Or use a basic tokenizer library such as gpt-tokenizer or a JS port of tiktoken (see the sketch after this list)
- Count all message content: system prompts + user messages + assistant responses + tool outputs
- Include tool call JSON in the count
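If slightly better accuracy is wanted, a library-based count can replace the character heuristic. A minimal sketch, assuming the gpt-tokenizer npm package and its encode export (the helper name is illustrative):

// Library-based token count (sketch; assumes the gpt-tokenizer package)
// npm install gpt-tokenizer
import { encode } from "gpt-tokenizer";

const countTokensWithLibrary = (text: string): number => {
  // encode() returns an array of token ids; its length is the token count
  return encode(text).length;
};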
Context Window Sizes
Common model context windows:
- llama3.1, llama3.2: 8K tokens (8,192)
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens
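These sizes can live in a small lookup table keyed by model name prefix. A minimal sketch (the map and the getContextWindow helper are illustrative, not part of the story):

// Map model name prefixes to context window sizes (tokens)
const CONTEXT_WINDOWS: Record<string, number> = {
  "llama3.1": 8_192,
  "llama3.2": 8_192,
  "qwen2.5-coder": 32_768,
  "deepseek-coder": 16_384,
};

const DEFAULT_CONTEXT_WINDOW = 8_192;

// Match on prefix so tags like "qwen2.5-coder:7b" still resolve
const getContextWindow = (model: string): number => {
  for (const [prefix, size] of Object.entries(CONTEXT_WINDOWS)) {
    if (model.startsWith(prefix)) {
      return size;
    }
  }
  return DEFAULT_CONTEXT_WINDOW;
};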
Implementation Approach
// Simple character-based estimation
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

// Minimal message shape assumed for this sketch
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: unknown[];
}

// System prompt text is defined elsewhere (sent by the backend)
declare const SYSTEM_PROMPT: string;

const calculateTotalTokens = (messages: Message[]): number => {
  let total = 0;

  // Add system prompt tokens (from backend)
  total += estimateTokens(SYSTEM_PROMPT);

  // Add all message tokens
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }

  return total;
};
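The percentage shown in the header then follows directly from the helpers above (getContextWindow comes from the lookup sketch earlier and is an assumption of this example):

// Percentage of the context window currently in use (can exceed 100 when over the limit)
const contextUsagePercent = (messages: Message[], model: string): number => {
  const used = calculateTotalTokens(messages);
  const limit = getContextWindow(model);
  return Math.round((used / limit) * 100);
};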
UI Placement
- Header area, right side near model selector
- Format: "2.5K / 8K tokens (31%)"
- Color coding:
- Green/default: 0-74%
- Yellow/warning: 75-89%
- Red/danger: 90-100%
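A small formatting helper can encode these thresholds. A sketch, using the emoji style from the examples in this document (function names are illustrative):

// Pick the status emoji from the usage percentage
const usageEmoji = (percent: number): string => {
  if (percent >= 90) return "🔴"; // danger
  if (percent >= 75) return "🟡"; // warning
  return "🟢"; // default
};

// e.g. "🟢 2,450 / 8,192 tokens (30%)"
const formatUsage = (used: number, limit: number): string => {
  const percent = Math.round((used / limit) * 100);
  return `${usageEmoji(percent)} ${used.toLocaleString()} / ${limit.toLocaleString()} tokens (${percent}%)`;
};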
Design Considerations
- Keep it subtle and non-intrusive
- Should be informative but not alarming
- Consider a small progress bar or circular indicator
- Example: "📊 2,450 / 8,192 (30%)"
- Or icon-based: "🟢 30% context"
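For the icon-based variant, a tiny header component could combine the helpers above; this sketch assumes a React frontend (the framework is not specified in this story) and uses the hypothetical ContextUsageIndicator name:

// Hypothetical header indicator (sketch, assuming React)
import React from "react";

interface ContextUsageProps {
  messages: Message[];
  model: string;
}

export const ContextUsageIndicator: React.FC<ContextUsageProps> = ({ messages, model }) => {
  const used = calculateTotalTokens(messages);
  const limit = getContextWindow(model);
  const percent = Math.round((used / limit) * 100);
  // The tooltip carries the exact numbers; the visible label stays short
  return (
    <span title={`Context: ${used.toLocaleString()} / ${limit.toLocaleString()} tokens (${percent}%)`}>
      {usageEmoji(percent)} {percent}% context
    </span>
  );
};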
Future Enhancements (Not in this story)
- Backend token counting from Ollama (if available)
- Per-message token display on hover
- "Summarize and continue" feature to compress history
- Export/archive conversation before clearing