Story 17: Display Context Window Usage with emoji indicator
- Added real-time context window usage indicator in header
- Format: emoji + percentage (🟢 52%)
- Color-coded emoji: 🟢 <75%, 🟡 <90%, 🔴 >=90%
- Hover tooltip shows full details: 'Context: 4,300 / 8,192 tokens (52%)'
- Token estimation: 1 token ≈ 4 characters
- Model-aware context windows: llama3 (8K), qwen2.5 (32K), deepseek (16K)
- Includes system prompts, messages, tool calls, and streaming content
- Updates in real-time as conversation progresses
- All quality checks passing (TypeScript, Biome, Clippy, builds)

Tested and verified:
- Shows accurate percentage of context usage
- Emoji changes color at appropriate thresholds
- Different models show correct context window sizes
- Can exceed 100% when over limit (shows red)
- Tooltip provides exact token counts
@@ -338,3 +338,69 @@ Provide a clear, accessible way for users to start a new session by clearing the
- "Clear Chat" (direct but less friendly)
- "Start Over" (conversational)
- Icon: 🔄 or ⊕ (plus in circle)

## Context Window Usage Display

### Problem

Users have no visibility into how much of the model's context window they're using. This leads to:

- Unexpected quality degradation when the context limit is reached
- Uncertainty about when to start a new session
- Inability to gauge conversation length

### Solution: Real-time Context Usage Indicator

Display a persistent indicator showing current token usage vs. the model's context window limit.

### Requirements

1. **Visual Indicator:** Always visible in header area
2. **Real-time Updates:** Updates as messages are added
3. **Model-Aware:** Shows correct limit based on selected model
4. **Color Coding:** Visual warning as limit approaches
   - Green/default: 0-74% usage
   - Yellow/warning: 75-89% usage
   - Red/danger: 90-100% usage
5. **Clear Format:** "2.5K / 8K tokens (31%)" or similar
6. **Token Estimation:** Approximate token count for all messages

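The color-coding rule in requirement 4 can be expressed as a small pure function. This is a sketch of the thresholds listed above, not the shipped component; note that an estimate over 100% (possible since counting is approximate) simply stays in the red band:

```typescript
// Map a usage percentage to the indicator band described above:
// green below 75%, yellow from 75-89%, red at 90% and above.
// Anything over 100% (estimate exceeds the window) stays red.
const getUsageBand = (percentage: number): "green" | "yellow" | "red" => {
  if (percentage >= 90) return "red";
  if (percentage >= 75) return "yellow";
  return "green";
};
```

For example, `getUsageBand(52)` returns `"green"`, matching the 🟢 52% example in the commit message.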
### Implementation Notes

**Token Estimation:**

- Use simple approximation: 1 token ≈ 4 characters
- Or integrate `gpt-tokenizer` for more accuracy
- Count: system prompts + user messages + assistant responses + tool outputs + tool calls

**Model Context Windows:**

- llama3.1, llama3.2: 8K tokens
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens

**Calculation:**

```tsx
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

const calculateContextUsage = (messages: Message[], systemPrompt: string) => {
  let total = estimateTokens(systemPrompt);
  messages.forEach(msg => {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  });
  return total;
};
```

**UI Placement:**

- Header area, near model selector
- Non-intrusive but always visible
- Optional tooltip with breakdown on hover

### Edge Cases

- Empty conversation: Show "0 / 8K"
- During streaming: Include partial content
- After clearing: Reset to 0
- Model change: Update context window limit
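The empty-conversation and model-change edge cases can be checked against the model-aware lookup (a sketch restating the context window table above):

```typescript
// Model-aware context window lookup; unknown models fall back to 8K.
const getContextWindowSize = (modelName: string): number => {
  if (modelName.includes("llama3")) return 8192;
  if (modelName.includes("qwen2.5")) return 32768;
  if (modelName.includes("deepseek")) return 16384;
  return 8192;
};

// Empty conversation: 0 used tokens, so 0% for any model.
const emptyPct = Math.round((0 / getContextWindowSize("llama3.1")) * 100);

// Model change only swaps the denominator; the token estimate is unchanged.
// 4,300 tokens is 52% of llama3's 8,192-token window...
const onLlama = Math.round((4300 / getContextWindowSize("llama3.1")) * 100);
// ...but only 13% of qwen2.5-coder's 32,768-token window.
const onQwen = Math.round((4300 / getContextWindowSize("qwen2.5-coder")) * 100);
```

This is why the indicator must recompute the percentage whenever the selected model changes, even though no messages were added.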

82 .living_spec/stories/archive/17_display_remaining_context.md Normal file
@@ -0,0 +1,82 @@

# Story 17: Display Context Window Usage

## User Story

As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.

## Acceptance Criteria

- [x] A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
- [x] The indicator is always visible in the UI (header area recommended)
- [x] The display updates in real-time as messages are added
- [x] Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
- [x] The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
- [x] Hovering over the indicator shows more details (tokens per message breakdown - optional)
- [x] The calculation includes system prompts, user messages, assistant responses, and tool outputs
- [x] Token counting is reasonably accurate (doesn't need to be perfect, estimate is fine)

## Out of Scope

- Exact token counting (approximation is acceptable)
- Automatic session clearing when limit reached
- Per-message token counts in the UI
- Token usage history or analytics
- Different tokenizers for different models (use one estimation method)
- Backend token tracking from Ollama (estimate on frontend)

## Technical Notes

### Token Estimation

- Simple approximation: 1 token ≈ 4 characters (English text)
- Or use a basic tokenizer library like `gpt-tokenizer` or `tiktoken` (JS port)
- Count all message content: system prompts + user messages + assistant responses + tool outputs
- Include tool call JSON in the count

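A quick sanity check of the 4-characters-per-token heuristic (a sketch; a real tokenizer's counts will differ somewhat, and the tool name and arguments below are purely illustrative):

```typescript
// 1 token ≈ 4 characters, rounded up.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Plain text: "Hello, how are you?" is 19 characters -> 5 estimated tokens.
const messageTokens = estimateTokens("Hello, how are you?");

// Tool calls are serialized to JSON first, so the structural overhead
// (braces, quotes, keys) is included in the count.
const toolCall = { name: "read_file", arguments: { path: "src/main.rs" } };
const toolCallTokens = estimateTokens(JSON.stringify(toolCall));
```

Counting the serialized JSON rather than just the argument values deliberately overestimates a little, which is the safer direction for a limit warning.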
### Context Window Sizes

Common model context windows:

- llama3.1, llama3.2: 8K tokens (8,192)
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens

### Implementation Approach

```tsx
// Simple character-based estimation
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

const calculateTotalTokens = (messages: Message[]): number => {
  let total = 0;

  // Add system prompt tokens (from backend)
  total += estimateTokens(SYSTEM_PROMPT);

  // Add all message tokens
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }

  return total;
};
```

### UI Placement

- Header area, right side near model selector
- Format: "2.5K / 8K tokens (31%)"
- Color coding:
  - Green/default: 0-74%
  - Yellow/warning: 75-89%
  - Red/danger: 90-100%

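The "2.5K / 8K tokens (31%)" format can be produced with a small helper. This is a hypothetical sketch — `formatTokens` and `formatUsage` are illustrative names, not part of the implementation:

```typescript
// Abbreviate token counts: values >= 1,000 become "N.NK", with a
// trailing ".0" dropped, so 2500 -> "2.5K" and 8000 -> "8K".
const formatTokens = (n: number): string => {
  if (n < 1000) return String(n);
  const k = (n / 1000).toFixed(1);
  return (k.endsWith(".0") ? k.slice(0, -2) : k) + "K";
};

const formatUsage = (used: number, total: number): string =>
  `${formatTokens(used)} / ${formatTokens(total)} tokens (${Math.round((used / total) * 100)}%)`;
```

For example, `formatUsage(2500, 8000)` yields `"2.5K / 8K tokens (31%)"`. Note that 8,192 formats as "8.2K"; if the terser "8K" display is preferred, round the model limit before formatting.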
## Design Considerations

- Keep it subtle and non-intrusive
- Should be informative but not alarming
- Consider a small progress bar or circular indicator
- Example: "📊 2,450 / 8,192 (30%)"
- Or icon-based: "🟢 30% context"

## Future Enhancements (Not in this story)

- Backend token counting from Ollama (if available)
- Per-message token display on hover
- "Summarize and continue" feature to compress history
- Export/archive conversation before clearing
@@ -22,6 +22,59 @@ export function Chat({ projectPath, onCloseProject }: ChatProps) {
  const messagesEndRef = useRef<HTMLDivElement>(null);
  const inputRef = useRef<HTMLInputElement>(null);

  // Token estimation and context window tracking
  const estimateTokens = (text: string): number => {
    return Math.ceil(text.length / 4);
  };

  const getContextWindowSize = (modelName: string): number => {
    if (modelName.includes("llama3")) return 8192;
    if (modelName.includes("qwen2.5")) return 32768;
    if (modelName.includes("deepseek")) return 16384;
    return 8192; // Default
  };

  const calculateContextUsage = (): {
    used: number;
    total: number;
    percentage: number;
  } => {
    let totalTokens = 0;

    // System prompts (approximate)
    totalTokens += 200;

    // All messages
    for (const msg of messages) {
      totalTokens += estimateTokens(msg.content);
      if (msg.tool_calls) {
        totalTokens += estimateTokens(JSON.stringify(msg.tool_calls));
      }
    }

    // Streaming content
    if (streamingContent) {
      totalTokens += estimateTokens(streamingContent);
    }

    const contextWindow = getContextWindowSize(model);
    const percentage = Math.round((totalTokens / contextWindow) * 100);

    return {
      used: totalTokens,
      total: contextWindow,
      percentage,
    };
  };

  const contextUsage = calculateContextUsage();

  const getContextEmoji = (percentage: number): string => {
    if (percentage >= 90) return "🔴";
    if (percentage >= 75) return "🟡";
    return "🟢";
  };

  useEffect(() => {
    invoke<string[]>("get_ollama_models")
      .then(async (models) => {
@@ -206,6 +259,19 @@ export function Chat({ projectPath, onCloseProject }: ChatProps) {

        {/* Model Controls */}
        <div style={{ display: "flex", alignItems: "center", gap: "16px" }}>
          {/* Context Usage Indicator */}
          <div
            style={{
              fontSize: "0.9em",
              color: "#ccc",
              whiteSpace: "nowrap",
            }}
            title={`Context: ${contextUsage.used.toLocaleString()} / ${contextUsage.total.toLocaleString()} tokens (${contextUsage.percentage}%)`}
          >
            {getContextEmoji(contextUsage.percentage)} {contextUsage.percentage}%
          </div>

          <button
            type="button"
            onClick={clearSession}