.living_spec/stories/archive/17_display_remaining_context.md

# Story 17: Display Context Window Usage

## User Story
As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.

## Acceptance Criteria
- [x] A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
- [x] The indicator is always visible in the UI (header area recommended)
- [x] The display updates in real-time as messages are added
- [x] Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
- [x] The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
- [x] Hovering over the indicator shows more details (optional: a per-message token breakdown)
- [x] The calculation includes system prompts, user messages, assistant responses, and tool outputs
- [x] Token counting is reasonably accurate (an estimate is fine; it does not need to be exact)

## Out of Scope
- Exact token counting (approximation is acceptable)
- Automatic session clearing when limit reached
- Per-message token counts in the UI
- Token usage history or analytics
- Different tokenizers for different models (use one estimation method)
- Backend token tracking from Ollama (estimate on frontend)

## Technical Notes

### Token Estimation
- Simple approximation: 1 token ≈ 4 characters (English text)
- Or use a basic tokenizer library like `gpt-tokenizer` or `tiktoken` (JS port)
- Count all message content: system prompts + user messages + assistant responses + tool outputs
- Include tool call JSON in the count

### Context Window Sizes
Common model context windows:
- llama3.1, llama3.2: 8K tokens (8,192)
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens
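
The list above could be expressed as a lookup with a fallback. This is a minimal sketch, not the shipped implementation: the names `CONTEXT_WINDOWS`, `DEFAULT_CONTEXT_WINDOW`, and `getContextWindow` are hypothetical, it assumes model names may carry an Ollama-style tag suffix such as `:7b`, and it assumes "K" means 1,024 tokens (consistent with 8K = 8,192 above).

```typescript
// Context window sizes in tokens, taken from the list in this story.
// Assumption: "K" means 1,024, matching 8K = 8,192.
const CONTEXT_WINDOWS = new Map<string, number>([
  ["llama3.1", 8192],
  ["llama3.2", 8192],
  ["qwen2.5-coder", 32768],
  ["deepseek-coder", 16384],
]);

// Fallback for models that are not in the table.
const DEFAULT_CONTEXT_WINDOW = 8192;

// Hypothetical helper: drops any ":tag" suffix, then looks up the base name.
const getContextWindow = (model: string): number => {
  const base = model.toLowerCase().replace(/:.*$/, "");
  return CONTEXT_WINDOWS.get(base) ?? DEFAULT_CONTEXT_WINDOW;
};
```

With this sketch, `getContextWindow("qwen2.5-coder:7b")` resolves to the 32K entry, and any unlisted model falls back to the 8K default.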

### Implementation Approach
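```tsx
// Minimal message shape assumed for this sketch; the app's real Message type may differ.
interface Message {
  role: string;
  content?: string | null;
  tool_calls?: unknown[];
}

// Simple character-based estimation: 1 token ≈ 4 characters
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

// systemPrompt is the system prompt text supplied by the backend
const calculateTotalTokens = (messages: Message[], systemPrompt: string): number => {
  // Start with the system prompt tokens
  let total = estimateTokens(systemPrompt);

  // Add all message tokens
  for (const msg of messages) {
    // content may be empty on messages that only carry tool calls
    total += estimateTokens(msg.content ?? "");
    if (msg.tool_calls) {
      // Tool call JSON counts toward the context as well
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }

  return total;
};
```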

### UI Placement
- Header area, right side near model selector
- Format: "2.5K / 8K tokens (31%)"
- Color coding:
  - Green/default: 0-74%
  - Yellow/warning: 75-89%
  - Red/danger: 90-100%
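
The format and color bands above can be sketched as a pair of pure functions. This is an illustration under assumptions, not the component's actual code: `formatTokens`, `describeUsage`, `UsageLevel`, and `ContextUsage` are hypothetical names, "K" is treated as 1,024 tokens so that 8,192 renders as "8K", and the color band is derived from the rounded percentage so the color always agrees with the number shown.

```typescript
type UsageLevel = "default" | "warning" | "danger";

interface ContextUsage {
  label: string;   // e.g. "2.5K / 8K tokens (31%)"
  percent: number; // rounded to a whole percent
  level: UsageLevel;
}

// Compact token count with at most one decimal place.
// Assumption: K = 1,024, so 8192 -> "8K" and 2560 -> "2.5K".
const formatTokens = (n: number): string => {
  if (n < 1024) return String(n);
  return `${Math.round((n / 1024) * 10) / 10}K`;
};

// Builds the header label and picks the color band from the rounded percentage.
const describeUsage = (used: number, limit: number): ContextUsage => {
  const percent = limit > 0 ? Math.round((used / limit) * 100) : 0;
  const level: UsageLevel =
    percent >= 90 ? "danger" : percent >= 75 ? "warning" : "default";
  const label = `${formatTokens(used)} / ${formatTokens(limit)} tokens (${percent}%)`;
  return { label, percent, level };
};
```

Because the estimate is approximate, usage can exceed 100%; this sketch keeps such values in the danger band rather than clamping them.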

## Design Considerations
- Keep it subtle and non-intrusive
- Should be informative but not alarming
- Consider a small progress bar or circular indicator
- Example: "📊 2,450 / 8,192 (30%)"
- Or icon-based: "🟢 30% context"
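
The icon-based variant could look like the following. It is a sketch only: `contextEmojiLabel` is a hypothetical name, and the yellow and red icons are assumptions chosen to mirror the 75% and 90% thresholds used elsewhere in this story.

```typescript
// Hypothetical helper: returns a compact label such as "🟢 30% context".
// Icons are assumptions: green below 75%, yellow from 75%, red from 90%.
const contextEmojiLabel = (used: number, limit: number): string => {
  const percent = limit > 0 ? Math.round((used / limit) * 100) : 0;
  const icon = percent >= 90 ? "🔴" : percent >= 75 ? "🟡" : "🟢";
  return `${icon} ${percent}% context`;
};
```

For the example figures above, `contextEmojiLabel(2450, 8192)` yields "🟢 30% context".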

## Future Enhancements (Not in this story)
- Backend token counting from Ollama (if available)
- Per-message token display on hover
- "Summarize and continue" feature to compress history
- Export/archive conversation before clearing

Story 17: Display Context Window Usage with emoji indicator 2025-12-27 17:26:21 +00:00