# ai-agent

A fully local AI coding agent for the terminal -- powered by Ollama and small models, with intelligent routing, cross-session memory, and MCP tool integration.
```
╭──────────────────────────────────────────╮
│                 ai-agent                 │
│   100% local. Your data never leaves.    │
│                                          │
│          ASK  --  PLAN  --  BUILD        │
│          0.8B     4B       9B            │
╰──────────────────────────────────────────╯
```
## What is ai-agent?
- 100% local -- runs entirely on your machine via Ollama. No API keys, no cloud, no data leaving your device.
- Small model optimized -- intelligent routing across Qwen 3.5 variants (0.8B / 2B / 4B / 9B) based on task complexity.
- Three operational modes -- ASK for quick answers, PLAN for design and reasoning, BUILD for full execution with tools.
- MCP native -- first-class Model Context Protocol support (STDIO, SSE, Streamable HTTP) for extensible tool integration.
- Beautiful TUI -- built with Charm's BubbleTea v2, Lip Gloss v2, and Glamour for rich markdown rendering in the terminal.
- Infinite Context Engine (ICE) -- cross-session vector retrieval that surfaces relevant past conversations automatically.
- Auto-Memory Detection -- the LLM extracts facts, decisions, preferences, and TODOs from conversations and persists them.
- Thinking/CoT extraction -- chain-of-thought reasoning is captured and displayed in collapsible blocks.
- Skills system -- load `.md` skill files with YAML frontmatter to inject domain-specific instructions into the system prompt.
- Agent profiles -- configure per-project agents with custom system prompts, skills, and MCP servers.
## Quick Start

### Prerequisites

- Ollama installed and running
- A recent Go toolchain (for `go install`)

### Install
Pull the required model, then install:
```sh
ollama pull qwen3.5:2b
go install github.com/abdul-hamid-achik/ai-agent/cmd/ai-agent@latest
```
For the full model routing suite (optional):
```sh
ollama pull qwen3.5:0.8b
ollama pull qwen3.5:4b
ollama pull qwen3.5:9b
ollama pull nomic-embed-text   # for ICE vector embeddings
```
### Configure

Create a config file (optional -- defaults work out of the box):

```sh
mkdir -p ~/.config/ai-agent
cp config.example.yaml ~/.config/ai-agent/config.yaml
```
### Run

```sh
ai-agent
```

Or from source:

```sh
task dev
```
## Features

### Model Routing
ai-agent automatically selects the right model size for the task at hand. Simple questions go to the fast 2B model; complex multi-step reasoning escalates to the 9B model. The router analyzes query complexity using keyword heuristics and word count.
| Complexity | Model | Relative speed | Use cases |
|---|---|---|---|
| Simple | `qwen3.5:2b` | 2.5x | Quick answers, simple tool use, single edits |
| Medium | `qwen3.5:4b` | 1.5x | Code completion, refactoring, explanations |
| Complex | `qwen3.5:9b` | 1.0x | Multi-step reasoning, debugging, code review |
The fallback chain ensures graceful degradation if a model is not available: 2b -> 4b -> 9b.
### Three Modes: ASK / PLAN / BUILD

Cycle between modes with `shift+tab`. Each mode configures a different system prompt and preferred model tier.
- ASK -- Direct, concise answers. Routes to the fastest available model. Tools available for file reads and searches.
- PLAN -- Design and planning. Breaks tasks into steps. Reads and explores with tools but does not modify files.
- BUILD -- Full execution mode. Uses the most capable model. All tools enabled including writes and modifications.
### MCP Tool Integration
Connect any MCP-compatible tool server. Supports all three transport protocols:
- STDIO -- Launch tools as subprocesses (default).
- SSE -- Connect to Server-Sent Events endpoints.
- Streamable HTTP -- Connect to HTTP-based MCP servers.
Tool calls execute in parallel when possible. The registry handles graceful failure if a server becomes unavailable.
### Infinite Context Engine (ICE)
ICE embeds each conversation turn using nomic-embed-text and stores them persistently. On every new message, it retrieves the most relevant past conversations via cosine similarity and injects them into the system prompt -- giving the agent memory that spans across sessions.
### Auto-Memory Detection
After each conversation turn, a background process analyzes the exchange and extracts structured memories:
- FACT -- objective information the user shared
- DECISION -- choices made during the conversation
- PREFERENCE -- user preferences and working styles
- TODO -- action items and follow-ups
Memories are stored in `~/.config/ai-agent/memories.json` with tag-weighted search scoring (tags weighted 3x over content).
### Thinking/CoT Display

When the model produces chain-of-thought reasoning, ai-agent captures it and renders it in collapsible blocks. Toggle the display with `ctrl+t`.
### Skills System

Drop `.md` files with YAML frontmatter into the skills directory to inject domain-specific instructions:

```
~/.config/ai-agent/skills/
```

Manage active skills with `/skill list`, `/skill activate <name>`, and `/skill deactivate <name>`.
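A skill file might look like this (the frontmatter fields shown are illustrative; check the skill loader for the exact schema):

```markdown
---
name: git-hygiene
description: Conventions for commit messages and branching
---

Always write commit subjects in imperative mood, under 50 characters.
Prefer rebase over merge for feature branches.
```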
### Agent Profiles

Create per-project or per-domain agent profiles:

```
~/.agents/<name>/
  AGENT.md     # System prompt additions
  SKILL.md     # Agent-specific skills
  mcp.yaml     # Agent-specific MCP servers
```

Switch profiles with `/agent <name>` or `/agent list`.
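A hypothetical per-agent `mcp.yaml`, assuming it mirrors the top-level `servers:` schema (verify against `config.example.yaml`):

```yaml
# ~/.agents/<name>/mcp.yaml -- servers loaded only for this profile
servers:
  - name: db-tools        # illustrative server name
    command: my-db-mcp    # illustrative command
    args: [serve]
```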
## Configuration

### File Locations

Config is searched in order (first match wins):

1. `./ai-agent.yaml` (project-local)
2. `~/.config/ai-agent/config.yaml` (user-global)
### Annotated Example

```yaml
ollama:
  model: "qwen3.5:2b"                  # Default model
  base_url: "http://localhost:11434"   # Ollama API endpoint
  num_ctx: 262144                      # Context window size

# Skills directory (default: ~/.config/ai-agent/skills/)
# skills_dir: "/path/to/custom/skills"

# MCP tool servers
servers:
  # STDIO transport (default)
  - name: noted
    command: noted
    args: [mcp]
  # SSE transport
  # - name: remote-server
  #   transport: sse
  #   url: "http://localhost:8811"
  # Streamable HTTP transport
  # - name: streamable-server
  #   transport: streamable-http
  #   url: "http://localhost:8812/mcp"

# ICE configuration
# ice:
#   enabled: true
#   embed_model: "nomic-embed-text"
#   store_path: "~/.config/ai-agent/conversations.json"
```
### Environment Variables

| Variable | Description | Overrides |
|---|---|---|
| `OLLAMA_HOST` | Ollama API base URL | `ollama.base_url` |
| `LOCAL_AGENT_MODEL` | Default model name | `ollama.model` |
| `LOCAL_AGENT_AGENTS_DIR` | Path to agents directory | `agents.dir` |
## Keyboard Shortcuts

### Input

| Key | Action |
|---|---|
| `enter` | Send message |
| `shift+enter` | Insert new line |
| `up` / `down` | Browse input history |
| `shift+tab` | Cycle mode (ASK/PLAN/BUILD) |
| `ctrl+m` | Quick model switch |
### Navigation

| Key | Action |
|---|---|
| `pgup` / `pgdown` | Scroll conversation |
| `ctrl+u` | Half-page scroll up |
| `ctrl+d` | Half-page scroll down |
### Display

| Key | Action |
|---|---|
| `?` | Toggle help overlay |
| `t` | Expand/collapse tool calls |
| `space` | Toggle last tool details |
| `ctrl+t` | Toggle thinking/CoT display |
| `ctrl+y` | Copy last response |
### Control

| Key | Action |
|---|---|
| `esc` | Cancel streaming / close overlay |
| `ctrl+c` | Quit |
| `ctrl+l` | Clear screen |
| `ctrl+n` | New conversation |
## Slash Commands

| Command | Description |
|---|---|
| `/help` | Show help overlay |
| `/clear` | Clear conversation history |
| `/new` | Start a fresh conversation |
| `/model [name\|list\|fast\|smart]` | Show or switch models |
| `/models` | Open model picker |
| `/agent [name\|list]` | Show or switch agent profile |
| `/load <path>` | Load markdown file as context |
| `/unload` | Remove loaded context |
| `/skill [list\|activate\|deactivate] [name]` | Manage skills |
| `/servers` | List connected MCP servers |
| `/ice` | Show ICE engine status |
| `/sessions` | Browse saved sessions |
| `/exit` | Quit |
## Architecture

```
cmd/ai-agent/      Entry point
internal/
  agent/           ReAct loop orchestration
  llm/             LLM abstraction (OllamaClient, ModelManager)
  mcp/             MCP server registry
  config/          YAML config, env overrides, Router
  ice/             Infinite Context Engine
  memory/          Persistent key-value store
  skill/           Skill file loader
  command/         Slash command registry
  tui/             BubbleTea v2 terminal UI
  logging/         Structured logging
```
### Request Flow

```
User Input
    |
    v
agent.AddUserMessage()
    |
    v
ICE embeds message, retrieves relevant past context
    |
    v
System prompt assembled (tools + skills + context + ICE + memory)
    |
    v
Router selects model based on task complexity
    |
    v
LLM streams response via ChatStream()
    |
    v
Tool calls routed through MCP registry (parallel execution)
    |
    v
ReAct loop continues (up to 10 iterations) until final text
    |
    v
Conversation compacted if token budget exceeded
Auto-memory detection runs in background
```
### Key Interfaces

- `llm.Client` -- pluggable LLM provider (`ChatStream`, `Ping`, `Embed`)
- `agent.Output` -- streaming callbacks for TUI rendering
- `command.Registry` -- extensible slash command dispatch
### Concurrency

`sync.RWMutex` protects shared state in `ModelManager`, `mcp.Registry`, and `memory.Store`. Auto-memory detection and MCP connections run as background goroutines. Tool calls execute in parallel when independent.
## Comparison
| Feature | ai-agent | opencode | crush |
|---|---|---|---|
| 100% local (no API keys) | Yes | No | Yes |
| Model routing by task complexity | Yes | No | No |
| Operational modes (ASK/PLAN/BUILD) | Yes | No | No |
| Cross-session memory (ICE) | Yes | No | No |
| Auto-memory detection | Yes | No | No |
| Thinking/CoT extraction | Yes | Yes | No |
| MCP tool support | Yes | Yes | Yes |
| Skills system | Yes | No | No |
| Plan form overlay | Yes | No | No |
| Small model optimized | Yes | No | No |
| TUI chat interface | Yes | Yes | Yes |
| Language | Go | TypeScript | Go |
## Building

This project uses Task as its build tool.

```sh
task build   # Compile to bin/ai-agent
task run     # Build and run
task dev     # Quick run via go run ./cmd/ai-agent
task test    # Run all tests (go test ./...)
task lint    # Lint (golangci-lint run ./...)
task clean   # Remove bin/ directory
```
Run a single test:

```sh
go test ./internal/agent/ -run TestFunctionName
```
## License

MIT