# Model Selection Guide

## Overview

This application requires LLM models that support **tool calling** (function calling) and are capable of **autonomous execution** rather than just code suggestion. Not all models are suitable for agentic workflows.

## Recommended Models

### Primary Recommendation: GPT-OSS

**Model:** `gpt-oss:20b`

- **Size:** 13 GB
- **Context:** 128K tokens
- **Tool Support:** ✅ Excellent
- **Autonomous Behavior:** ✅ Excellent
- **Why:** OpenAI's open-weight model specifically designed for "agentic tasks". Reliably uses `write_file` to implement changes directly rather than suggesting code.

```bash
ollama pull gpt-oss:20b
```

### Alternative Options

#### Llama 3.1 (Best Balance)

**Model:** `llama3.1:8b`

- **Size:** 4.7 GB
- **Context:** 128K tokens
- **Tool Support:** ✅ Excellent
- **Autonomous Behavior:** ✅ Good
- **Why:** An industry standard for tool calling. Well-documented, reliable, and smaller than GPT-OSS.

```bash
ollama pull llama3.1:8b
```

#### Qwen 2.5 Coder (Coding Focused)

**Model:** `qwen2.5-coder:7b` or `qwen2.5-coder:14b`

- **Size:** 4.5 GB / 9 GB
- **Context:** 32K tokens
- **Tool Support:** ✅ Good
- **Autonomous Behavior:** ✅ Good
- **Why:** Specifically trained for coding tasks. Note: use Qwen **2.5**, NOT Qwen 3.

```bash
ollama pull qwen2.5-coder:7b
# or for more capability:
ollama pull qwen2.5-coder:14b
```

#### Mistral (General Purpose)

**Model:** `mistral:7b`

- **Size:** 4 GB
- **Context:** 32K tokens
- **Tool Support:** ✅ Good
- **Autonomous Behavior:** ✅ Good
- **Why:** Fast, efficient, and good at following instructions.

```bash
ollama pull mistral:7b
```

## Models to Avoid

### ❌ Qwen3-Coder

**Problem:** Despite supporting tool calling, Qwen3-Coder is trained more as a "helpful assistant" and tends to suggest code in markdown blocks rather than using `write_file` to implement changes directly.

**Status:** Works for reading files and analysis, but not recommended for autonomous coding.
### ❌ DeepSeek-Coder-V2

**Problem:** Does not support tool calling at all.

**Error:** `"registry.ollama.ai/library/deepseek-coder-v2:latest does not support tools"`

### ❌ StarCoder / CodeLlama (older versions)

**Problem:** Most older coding models either don't support tool calling or do it poorly.

## How to Verify Tool Support

Check whether a model supports tools on the Ollama library page:

```
https://ollama.com/library/
```

Look for the "Tools" tag in the model's capabilities.

You can also check locally:

```bash
ollama show <model-name>
```

## Model Selection Criteria

When choosing a model for autonomous coding, prioritize:

1. **Tool Calling Support** - Must support function calling natively
2. **Autonomous Behavior** - Trained to execute rather than suggest
3. **Context Window** - Larger is better for complex projects (32K minimum, 128K ideal)
4. **Size vs Performance** - Balance model size against your hardware
5. **Prompt Adherence** - Follows system instructions reliably

## Testing a New Model

To test whether a model works for autonomous coding:

1. Select it in the UI dropdown
2. Ask it to create a simple file: "Create a new file called test.txt with 'Hello World' in it"
3. **Expected behavior:** Uses the `write_file` tool and creates the file
4. **Bad behavior:** Suggests code in markdown blocks or asks what you want to do

If the model suggests code instead of writing it, it is not suitable for this application.

## Context Window Management

Current context usage (approximate):

- System prompts: ~1,000 tokens
- Tool definitions: ~300 tokens
- Per-message overhead: ~50-100 tokens
- Average conversation: 2-5K tokens

Most models will handle 20-30 exchanges before context becomes an issue. The agent loop is limited to 30 turns to prevent context exhaustion.

## Performance Notes

**Speed:** Smaller models (3B-8B) are faster but less capable. Larger models (20B-70B) are more reliable but slower.
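The context figures above can be turned into a rough capacity check. The sketch below uses the approximate system-prompt and tool-definition costs quoted in the Context Window Management section; the ~1,000-tokens-per-exchange figure is an assumption for illustration, not a measured value:

```python
# Approximate fixed costs, taken from the figures above.
SYSTEM_PROMPT_TOKENS = 1_000
TOOL_DEF_TOKENS = 300

# Assumed figure (not measured): one exchange = user prompt +
# assistant reply + per-message overhead, ~1,000 tokens total.
TOKENS_PER_EXCHANGE = 1_000

def max_exchanges(context_window: int) -> int:
    """Estimate how many exchanges fit in a model's context window."""
    budget = context_window - SYSTEM_PROMPT_TOKENS - TOOL_DEF_TOKENS
    return budget // TOKENS_PER_EXCHANGE

print(max_exchanges(32_000))   # 32K-context model, e.g. qwen2.5-coder
print(max_exchanges(128_000))  # 128K-context model, e.g. llama3.1
```

Under these assumptions a 32K window supports roughly 30 exchanges, which lines up with the 20-30 exchange guidance and the 30-turn agent-loop limit above.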
**Hardware:**

- 8B models: ~8 GB RAM
- 20B models: ~16 GB RAM
- 70B models: ~48 GB RAM (quantized)

**Recommendation:** Start with `llama3.1:8b` for speed; upgrade to `gpt-oss:20b` for reliability.

## Summary

**For this application:**

1. **Best overall:** `gpt-oss:20b` (proven autonomous behavior)
2. **Best balance:** `llama3.1:8b` (fast, reliable, well-supported)
3. **For coding:** `qwen2.5-coder:7b` (specialized, but smaller context)

**Avoid:** Qwen3-Coder, DeepSeek-Coder-V2, and any model without tool support.
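The "How to Verify Tool Support" check can also be scripted. The sketch below assumes a locally running Ollama server on its default port (11434) and that the `/api/show` response includes a `capabilities` list, as recent Ollama releases report; treat both as assumptions to verify against your installed version:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local endpoint

def supports_tools(show_response: dict) -> bool:
    """Check an /api/show response for tool-calling capability.

    Assumes the response carries a 'capabilities' list containing
    "tools" for tool-capable models.
    """
    return "tools" in show_response.get("capabilities", [])

def show_model(name: str) -> dict:
    """Fetch model metadata from a locally running Ollama server."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/show",
        data=json.dumps({"model": name}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Offline demonstration with a response shaped like Ollama's:
sample = {"capabilities": ["completion", "tools"]}
print(supports_tools(sample))  # True
```

With a server running, `supports_tools(show_model("llama3.1:8b"))` would perform the same check that the "Tools" tag on the library page expresses.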