# ai-agent
A fully local AI coding agent for the terminal -- powered by Ollama and small models, with intelligent routing, cross-session memory, and MCP tool integration.
```
╭──────────────────────────────────────────╮
│                 ai-agent                 │
│   100% local. Your data never leaves.    │
│                                          │
│         ASK  --  PLAN  --  BUILD         │
│         0.8B     4B      9B              │
╰──────────────────────────────────────────╯
```
---
## What is ai-agent?
- **100% local** -- runs entirely on your machine via Ollama. No API keys, no cloud, no data leaving your device.
- **Small model optimized** -- intelligent routing across Qwen 3.5 variants (0.8B / 2B / 4B / 9B) based on task complexity.
- **Three operational modes** -- ASK for quick answers, PLAN for design and reasoning, BUILD for full execution with tools.
- **MCP native** -- first-class Model Context Protocol support (STDIO, SSE, Streamable HTTP) for extensible tool integration.
- **Beautiful TUI** -- built with Charm's BubbleTea v2, Lip Gloss v2, and Glamour for rich markdown rendering in the terminal.
- **Infinite Context Engine (ICE)** -- cross-session vector retrieval that surfaces relevant past conversations automatically.
- **Auto-Memory Detection** -- the LLM extracts facts, decisions, preferences, and TODOs from conversations and persists them.
- **Thinking/CoT extraction** -- chain-of-thought reasoning is captured and displayed in collapsible blocks.
- **Skills system** -- load `.md` skill files with YAML frontmatter to inject domain-specific instructions into the system prompt.
- **Agent profiles** -- configure per-project agents with custom system prompts, skills, and MCP servers.
---
## Quick Start
### Prerequisites
- [Go 1.25+](https://go.dev/dl/)
- [Ollama](https://ollama.ai/) running locally
- [Task](https://taskfile.dev/) (optional, for build commands)
### Install
Pull the required model, then install:
```bash
ollama pull qwen3.5:2b
go install github.com/abdul-hamid-achik/ai-agent/cmd/ai-agent@latest
```
For the full model routing suite (optional):
```bash
ollama pull qwen3.5:0.8b
ollama pull qwen3.5:4b
ollama pull qwen3.5:9b
ollama pull nomic-embed-text # for ICE vector embeddings
```
### Configure
Create a config file (optional -- defaults work out of the box):
```bash
mkdir -p ~/.config/ai-agent
cp config.example.yaml ~/.config/ai-agent/config.yaml
```
### Run
```bash
ai-agent
```
Or from source:
```bash
task dev
```
---
## Features
### Model Routing
ai-agent automatically selects the right model size for the task at hand. Simple questions go to the fast 2B model; complex multi-step reasoning escalates to the 9B model. The router analyzes query complexity using keyword heuristics and word count.
| Complexity | Model      | Relative Speed | Use Cases                                    |
|------------|------------|----------------|----------------------------------------------|
| Simple     | qwen3.5:2b | 2.5x           | Quick answers, simple tool use, single edits |
| Medium     | qwen3.5:4b | 1.5x           | Code completion, refactoring, explanations   |
| Complex    | qwen3.5:9b | 1.0x           | Multi-step reasoning, debugging, code review |
The fallback chain ensures graceful degradation if a model is not available: `2b -> 4b -> 9b`.
### Three Modes: ASK / PLAN / BUILD
Cycle between modes with `shift+tab`. Each mode configures a different system prompt and preferred model tier.
- **ASK** -- Direct, concise answers. Routes to the fastest available model. Tools available for file reads and searches.
- **PLAN** -- Design and planning. Breaks tasks into steps. Reads and explores with tools but does not modify files.
- **BUILD** -- Full execution mode. Uses the most capable model. All tools enabled including writes and modifications.
### MCP Tool Integration
Connect any MCP-compatible tool server. Supports all three transport protocols:
- **STDIO** -- Launch tools as subprocesses (default).
- **SSE** -- Connect to Server-Sent Events endpoints.
- **Streamable HTTP** -- Connect to HTTP-based MCP servers.
Tool calls execute in parallel when possible. The registry handles graceful failure if a server becomes unavailable.
### Infinite Context Engine (ICE)
ICE embeds each conversation turn using `nomic-embed-text` and stores them persistently. On every new message, it retrieves the most relevant past conversations via cosine similarity and injects them into the system prompt -- giving the agent memory that spans across sessions.
### Auto-Memory Detection
After each conversation turn, a background process analyzes the exchange and extracts structured memories:
- **FACT** -- objective information the user shared
- **DECISION** -- choices made during the conversation
- **PREFERENCE** -- user preferences and working styles
- **TODO** -- action items and follow-ups
Memories are stored in `~/.config/ai-agent/memories.json` with tag-weighted search scoring (tags weighted 3x over content).
### Thinking/CoT Display
When the model produces chain-of-thought reasoning, ai-agent captures it and renders it in collapsible blocks. Toggle the display with `ctrl+t`.
### Skills System
Drop `.md` files with YAML frontmatter into the skills directory to inject domain-specific instructions:
```
~/.config/ai-agent/skills/
```
Manage active skills with `/skill list`, `/skill activate <name>`, and `/skill deactivate <name>`.
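A skill file might look like this (the frontmatter fields shown are illustrative -- check the source for the exact schema):

```markdown
---
name: go-reviewer
description: Conventions for reviewing Go code
---

When reviewing Go code in this project, prefer table-driven tests,
check error wrapping with %w, and flag exported identifiers that
lack doc comments.
```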
### Agent Profiles
Create per-project or per-domain agent profiles:
```
~/.agents/<name>/
  AGENT.md    # System prompt additions
  SKILL.md    # Agent-specific skills
  mcp.yaml    # Agent-specific MCP servers
```
Switch profiles with `/agent <name>` or `/agent list`.
---
## Configuration
### File Locations
Config is searched in order (first match wins):
1. `./ai-agent.yaml` (project-local)
2. `~/.config/ai-agent/config.yaml` (user-global)
### Annotated Example
```yaml
ollama:
  model: "qwen3.5:2b"                 # Default model
  base_url: "http://localhost:11434"  # Ollama API endpoint
  num_ctx: 262144                     # Context window size

# Skills directory (default: ~/.config/ai-agent/skills/)
# skills_dir: "/path/to/custom/skills"

# MCP tool servers
servers:
  # STDIO transport (default)
  - name: noted
    command: noted
    args: [mcp]
  # SSE transport
  # - name: remote-server
  #   transport: sse
  #   url: "http://localhost:8811"
  # Streamable HTTP transport
  # - name: streamable-server
  #   transport: streamable-http
  #   url: "http://localhost:8812/mcp"

# ICE configuration
# ice:
#   enabled: true
#   embed_model: "nomic-embed-text"
#   store_path: "~/.config/ai-agent/conversations.json"
```
### Environment Variables
| Variable | Description | Overrides |
|--------------------------|------------------------------|----------------------|
| `OLLAMA_HOST` | Ollama API base URL | `ollama.base_url` |
| `LOCAL_AGENT_MODEL` | Default model name | `ollama.model` |
| `LOCAL_AGENT_AGENTS_DIR` | Path to agents directory | `agents.dir` |
---
## Keyboard Shortcuts
### Input
| Key | Action |
|-----------------|-------------------------------|
| `enter` | Send message |
| `shift+enter` | Insert new line |
| `up` / `down` | Browse input history |
| `shift+tab` | Cycle mode (ASK/PLAN/BUILD) |
| `ctrl+m` | Quick model switch |
### Navigation
| Key | Action |
|------------------|------------------------------|
| `pgup` / `pgdown`| Scroll conversation |
| `ctrl+u` | Half-page scroll up |
| `ctrl+d` | Half-page scroll down |
### Display
| Key | Action |
|-----------------|-------------------------------|
| `?` | Toggle help overlay |
| `t` | Expand/collapse tool calls |
| `space` | Toggle last tool details |
| `ctrl+t` | Toggle thinking/CoT display |
| `ctrl+y` | Copy last response |
### Control
| Key | Action |
|-----------------|-------------------------------|
| `esc` | Cancel streaming / close overlay |
| `ctrl+c` | Quit |
| `ctrl+l` | Clear screen |
| `ctrl+n` | New conversation |
---
## Slash Commands
| Command | Description |
|--------------------------------------|-----------------------------------|
| `/help` | Show help overlay |
| `/clear` | Clear conversation history |
| `/new` | Start a fresh conversation |
| `/model [name\|list\|fast\|smart]` | Show or switch models |
| `/models` | Open model picker |
| `/agent [name\|list]` | Show or switch agent profile |
| `/load <path>` | Load markdown file as context |
| `/unload` | Remove loaded context |
| `/skill [list\|activate\|deactivate] [name]` | Manage skills |
| `/servers` | List connected MCP servers |
| `/ice` | Show ICE engine status |
| `/sessions` | Browse saved sessions |
| `/exit` | Quit |
---
## Architecture
```
cmd/ai-agent/   Entry point
internal/
  agent/        ReAct loop orchestration
  llm/          LLM abstraction (OllamaClient, ModelManager)
  mcp/          MCP server registry
  config/       YAML config, env overrides, Router
  ice/          Infinite Context Engine
  memory/       Persistent key-value store
  skill/        Skill file loader
  command/      Slash command registry
  tui/          BubbleTea v2 terminal UI
  logging/      Structured logging
```
### Request Flow
```
User Input
|
v
agent.AddUserMessage()
|
v
ICE embeds message, retrieves relevant past context
|
v
System prompt assembled (tools + skills + context + ICE + memory)
|
v
Router selects model based on task complexity
|
v
LLM streams response via ChatStream()
|
v
Tool calls routed through MCP registry (parallel execution)
|
v
ReAct loop continues (up to 10 iterations) until final text
|
v
Conversation compacted if token budget exceeded
Auto-memory detection runs in background
```
### Key Interfaces
- `llm.Client` -- pluggable LLM provider (`ChatStream`, `Ping`, `Embed`)
- `agent.Output` -- streaming callbacks for TUI rendering
- `command.Registry` -- extensible slash command dispatch
### Concurrency
`sync.RWMutex` protects shared state in `ModelManager`, `mcp.Registry`, and `memory.Store`. Auto-memory detection and MCP connections run as background goroutines. Tool calls execute in parallel when independent.
---
## Comparison
| Feature | ai-agent | opencode | crush |
|----------------------------------|:-----------:|:--------:|:-----:|
| 100% local (no API keys) | Yes | No | Yes |
| Model routing by task complexity | Yes | No | No |
| Operational modes (ASK/PLAN/BUILD) | Yes | No | No |
| Cross-session memory (ICE) | Yes | No | No |
| Auto-memory detection | Yes | No | No |
| Thinking/CoT extraction | Yes | Yes | No |
| MCP tool support | Yes | Yes | Yes |
| Skills system | Yes | No | No |
| Plan form overlay | Yes | No | No |
| Small model optimized | Yes | No | No |
| TUI chat interface | Yes | Yes | Yes |
| Language | Go | TypeScript | Go |
---
## Building
This project uses [Task](https://taskfile.dev/) as its build tool.
```bash
task build   # Compile to bin/ai-agent
task run     # Build and run
task dev     # Quick run via go run ./cmd/ai-agent
task test    # Run all tests (go test ./...)
task lint    # Lint (golangci-lint run ./...)
task clean   # Remove bin/ directory
```
Run a single test:
```bash
go test ./internal/agent/ -run TestFunctionName
```
---
## License
MIT