huskies/.living_spec/stories/archive/17_display_remaining_context.md
Commit bd8d838457 (Dave): Story 17: Display Context Window Usage with emoji indicator
- Added real-time context window usage indicator in header
- Format: emoji + percentage (🟢 52%)
- Color-coded emoji: 🟢 <75%, 🟡 <90%, 🔴 >=90%
- Hover tooltip shows full details: 'Context: 4,300 / 8,192 tokens (52%)'
- Token estimation: 1 token ≈ 4 characters
- Model-aware context windows: llama3 (8K), qwen2.5 (32K), deepseek (16K)
- Includes system prompts, messages, tool calls, and streaming content
- Updates in real-time as conversation progresses
- All quality checks passing (TypeScript, Biome, Clippy, builds)

Tested and verified:
- Shows accurate percentage of context usage
- Emoji changes color at appropriate thresholds
- Different models show correct context window sizes
- Can exceed 100% when over limit (shows red)
- Tooltip provides exact token counts
Committed 2025-12-27 17:26:21 +00:00

Story 17: Display Context Window Usage

User Story

As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.

Acceptance Criteria

  • A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
  • The indicator is always visible in the UI (header area recommended)
  • The display updates in real-time as messages are added
  • Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
  • The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
  • Hovering over the indicator shows more details (tokens per message breakdown - optional)
  • The calculation includes system prompts, user messages, assistant responses, and tool outputs
  • Token counting is reasonably accurate (it does not need to be exact; an estimate is fine)

Out of Scope

  • Exact token counting (approximation is acceptable)
  • Automatic session clearing when limit reached
  • Per-message token counts in the UI
  • Token usage history or analytics
  • Different tokenizers for different models (use one estimation method)
  • Backend token tracking from Ollama (estimate on frontend)

Technical Notes

Token Estimation

  • Simple approximation: 1 token ≈ 4 characters (English text)
  • Or use a basic tokenizer library like gpt-tokenizer or tiktoken (JS port); see the sketch after this list
  • Count all message content: system prompts + user messages + assistant responses + tool outputs
  • Include tool call JSON in the count
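
If tighter accuracy is wanted, the gpt-tokenizer package mentioned above can do the counting. This is a minimal sketch assuming its encode export; verify the exact API against the package docs before relying on it:

// Token counting via the gpt-tokenizer npm package.
// Assumption: the package exposes an `encode` function returning a token array.
import { encode } from "gpt-tokenizer";

const countTokens = (text: string): number => encode(text).length;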

Context Window Sizes

Common model context windows:

  • llama3.1, llama3.2: 8K tokens (8,192)
  • qwen2.5-coder: 32K tokens
  • deepseek-coder: 16K tokens
  • Default/unknown: 8K tokens
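
A minimal lookup for the sizes above, falling back to 8K for unknown models (a sketch; the substring patterns and exact token counts are assumptions to be checked against the model tags actually in use):

// Map an Ollama model name to its context window; default to 8K when unknown
const CONTEXT_WINDOWS: Array<[pattern: string, tokens: number]> = [
  ["llama3", 8_192],
  ["qwen2.5-coder", 32_768],
  ["deepseek-coder", 16_384],
];

const getContextWindow = (model: string): number => {
  const match = CONTEXT_WINDOWS.find(([pattern]) => model.includes(pattern));
  return match ? match[1] : 8_192;
};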

Implementation Approach

// Simple character-based estimation: ~4 characters per token for English text
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

// Minimal message shape used by the estimate; tool calls are counted
// as their serialized JSON
interface Message {
  content: string;
  tool_calls?: unknown[];
}

const calculateTotalTokens = (messages: Message[]): number => {
  let total = 0;

  // Add system prompt tokens (mirrors the prompt sent by the backend)
  total += estimateTokens(SYSTEM_PROMPT);

  // Add all message tokens, including any tool call payloads
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }

  return total;
};
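
To keep the indicator current as messages stream in, the usage can be recomputed whenever the message list or selected model changes. A sketch assuming a React frontend (the UI framework is not specified in this story), reusing calculateTotalTokens and the getContextWindow lookup from above:

// Recompute usage whenever the conversation or model changes
import { useMemo } from "react";

const useContextUsage = (messages: Message[], model: string) => {
  return useMemo(() => {
    const used = calculateTotalTokens(messages);
    const limit = getContextWindow(model);
    // percent may exceed 100 when the conversation is over the limit
    const percent = Math.round((used / limit) * 100);
    return { used, limit, percent };
  }, [messages, model]);
};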

UI Placement

  • Header area, right side near model selector
  • Format: "2.5K / 8K tokens (31%)"
  • Color coding:
    • Green/default: 0-74%
    • Yellow/warning: 75-89%
    • Red/danger: 90-100%
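
One way to turn the percentage into the emoji indicator from the commit notes (a sketch following the thresholds above):

// Choose the indicator emoji from the usage percentage;
// anything at or above 90% (including values over 100%) stays red
const usageEmoji = (percent: number): string => {
  if (percent >= 90) return "🔴";
  if (percent >= 75) return "🟡";
  return "🟢";
};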

Design Considerations

  • Keep it subtle and non-intrusive
  • Should be informative but not alarming
  • Consider a small progress bar or circular indicator
  • Example: "📊 2,450 / 8,192 (30%)"
  • Or icon-based: "🟢 30% context"
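
A small formatting helper can produce both the compact header label and the hover tooltip from the same numbers (a sketch; the exact strings are illustrative and reuse usageEmoji from above):

// Build the header label ("🟢 30%") and tooltip ("Context: 2,450 / 8,192 tokens (30%)")
const formatUsage = (used: number, limit: number) => {
  const percent = Math.round((used / limit) * 100);
  return {
    label: `${usageEmoji(percent)} ${percent}%`,
    tooltip: `Context: ${used.toLocaleString()} / ${limit.toLocaleString()} tokens (${percent}%)`,
  };
};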

Future Enhancements (Not in this story)

  • Backend token counting from Ollama (if available)
  • Per-message token display on hover
  • "Summarize and continue" feature to compress history
  • Export/archive conversation before clearing