██████╗ ██╗███╗   ██╗ ██████╗██╗  ██╗
██╔════╝██║████╗  ██║██╔════╝██║  ██║
█████╗  ██║██╔██╗ ██║██║     ███████║
██╔══╝  ██║██║╚██╗██║██║     ██╔══██║
██║     ██║██║ ╚████║╚██████╗██║  ██║
╚═╝     ╚═╝╚═╝  ╚═══╝ ╚═════╝╚═╝  ╚═╝

AI coding assistant · runs on your machine · works offline · learns your style

WRITTEN IN RUST · SINGLE BINARY · WORKS OFFLINE · PRIVATE BY DEFAULT · OPEN SOURCE

// THE PROBLEM

  • Constant internet connection required
  • Every query costs API money
  • Your code goes to the cloud
  • No learning from your patterns
VS

// WITH FINCH

  • Fully offline after first model download
  • Zero marginal cost per query
  • Code stays on your machine
  • 6 model families, pick what fits your hardware
  • Useful from day one (pre-trained models)

// THE OPEN AGENT

Claude Code is great — if you're on Claude.
Finch brings the same agentic experience to any provider.

Grok · GPT-4o · Gemini · Mistral · Groq · Claude · Local ONNX

FULL TOOL USE ON ANY LLM

Read, Glob, Grep, Bash, WebFetch — the complete agentic tool loop works with every supported provider. Grok, GPT-4, and Mistral users all get the same multi-turn, permission-gated, streaming agent. Switch mid-session with /provider grok.
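The loop itself is provider-agnostic. A minimal sketch of its shape — all names here are illustrative, not Finch's actual Rust internals:

```python
# Sketch of a permission-gated, multi-turn tool loop.
# Hypothetical names; Finch's real implementation is in Rust.

def agent_loop(model, tools, prompt, approve):
    """Run tool calls until the model stops requesting tools."""
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = model(messages)              # any provider behind one interface
        if reply.get("tool") is None:
            return reply["content"]          # final answer
        name, args = reply["tool"], reply["args"]
        if approve(name, args):              # permission gate before execution
            result = tools[name](**args)
        else:
            result = "denied by user"
        messages.append({"role": "tool", "name": name, "content": result})

# Stub model: requests one Grep call, then answers.
def stub_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "Grep", "args": {"pattern": "TODO"}}
    return {"tool": None, "content": "found 2 TODOs"}

answer = agent_loop(stub_model, {"Grep": lambda pattern: "2 matches"},
                    "find TODOs", approve=lambda name, args: True)
```

The key design point: the loop never assumes a vendor-specific tool-call format, so swapping providers mid-session only swaps the `model` callable.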

CLAUDE.MD + FINCH.MD

Auto-loads project instructions by walking the filesystem from root to your working directory — exactly like Claude Code. FINCH.md is a vendor-neutral alias: one instruction file that works with Finch, Claude Code, Cursor, or any assistant that respects it.
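The walk is easy to picture. A rough sketch of root-to-cwd discovery — the filename precedence here is an assumption, not Finch's documented behavior:

```python
import tempfile
from pathlib import Path

def find_instruction_files(cwd: Path, names=("CLAUDE.md", "FINCH.md")):
    """Collect instruction files from the filesystem root down to cwd.

    Files closer to cwd come later, so they can refine broader ones.
    """
    found = []
    for directory in [*reversed(cwd.parents), cwd]:  # root first, cwd last
        for name in names:
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
    return found

# Toy layout: a project-level FINCH.md and a subdirectory CLAUDE.md.
base = Path(tempfile.mkdtemp())
(base / "project" / "src").mkdir(parents=True)
(base / "project" / "FINCH.md").write_text("project rules")
(base / "project" / "src" / "CLAUDE.md").write_text("src rules")
files = find_instruction_files(base / "project" / "src")
# files lists the project FINCH.md before the deeper CLAUDE.md
```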

MEMTREE MEMORY

Hierarchical memory tree, not RAG. Related conversations cluster semantically; parent nodes aggregate their children for efficient retrieval. The data structure and REPL wiring are in place — SQLite persistence and the TUI tree view are next on the roadmap.
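The shape of the structure can be sketched like this — illustrative only, with word overlap standing in for the semantic scoring a real embedding model would provide:

```python
# Toy hierarchical memory tree: leaves hold conversation snippets,
# parents aggregate child summaries for cheap top-down retrieval.
# Not Finch's actual types.

class MemNode:
    def __init__(self, summary="", text=None):
        self.text = text          # leaf payload (None for internal nodes)
        self.summary = summary    # aggregate summary of this subtree
        self.children = []

    def add(self, child):
        self.children.append(child)
        # Naive aggregation: join child summaries. A real system would
        # re-summarise with the model instead.
        self.summary = " | ".join(c.summary for c in self.children)

    def retrieve(self, query, scorer):
        """Descend toward the child whose summary best matches the query."""
        if not self.children:
            return self
        best = max(self.children, key=lambda c: scorer(query, c.summary))
        return best.retrieve(query, scorer)

root = MemNode()
root.add(MemNode(summary="rust lifetimes", text="borrowck notes"))
root.add(MemNode(summary="sqlite schema", text="tables & indexes"))

# Toy scorer: shared-word overlap instead of embedding similarity.
overlap = lambda q, s: len(set(q.split()) & set(s.split()))
hit = root.retrieve("rust borrow checker lifetimes", overlap)
```

Because parents summarise whole subtrees, retrieval only scores one level of summaries at a time rather than every stored conversation — the advantage over flat RAG the paragraph above describes.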

MCP PLUGINS

Extend with any Model Context Protocol server — the same ecosystem as Claude Desktop and VS Code. Connect databases, IDEs, APIs, or custom internal tools with a single config entry.
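A single config entry typically follows the Claude Desktop convention. A hypothetical example — the exact key names in Finch's config file may differ:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres",
               "postgresql://localhost/mydb"]
    }
  }
}
```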

AUTONOMOUS MODE

Run headlessly on a task backlog — a named agent identity commits its own work, logs everything to JSONL, and can reflect on completed work to update its own system prompt. The data structures and CLI command are in place; end-to-end testing is in progress.

// USE IT YOUR WAY

01

INTERACTIVE REPL

Full TUI with scrollback, streaming, plan mode, and agentic tool use. Ghost text autocomplete. Session history. Inline dialogs with keyboard navigation and custom-response text entry.

$ finch

      ▄▄▄▄▄▄
    ▗▟█●██▙►  finch v0.7.4
  ▐████████▌  Qwen-2.5-7B · ready
  ▝▜██████▛▘  ~/repos/myproject
     ╥  ╥
    ╱    ╲

> How do I use lifetimes in Rust?
02

SINGLE QUERY + PIPE

Scriptable one-shot queries. Pipe stdin directly into finch. Works great in shell scripts, CI, and editor integrations.

$ finch query "What is a Rust lifetime?"

$ echo "Explain this error" | finch

$ cat error.log | finch "what went wrong?"

$ git diff | finch "write a commit message"
03

BACKGROUND DAEMON — LOCAL MODEL AS A SERVICE

Finch spawns a background daemon automatically. It exposes an OpenAI-compatible API on port 11435 — so VS Code extensions, scripts, and any tool that speaks the OpenAI protocol can use your local model with zero cloud costs.

Enable mDNS/Bonjour advertising and every machine on your local network can discover and use the model running on your desktop — no manual IP configuration, no cloud, no API keys.

# Start daemon with network sharing
$ finch daemon --bind 0.0.0.0:11435 --mdns

  Listening on  0.0.0.0:11435
  mDNS name     finch.local
  Model         Qwen-2.5-7B [CoreML]
  API           OpenAI-compatible

# Other machines on LAN connect to:
# http://finch.local:11435
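Any OpenAI-protocol client can talk to the daemon. A minimal Python sketch — the /v1/chat/completions path is assumed from the OpenAI convention, not taken from Finch's docs:

```python
import json
from urllib import request

def chat_payload(prompt, model="local"):
    """Build an OpenAI-style chat completion request body."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False}

def ask_finch(prompt, base="http://localhost:11435"):
    """POST a prompt to a running Finch daemon and return the reply text."""
    req = request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```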
// VS Code — Continue.dev config
{
  "models": [{
    "title": "Finch (local)",
    "provider": "openai",
    "model": "local",
    "apiBase": "http://localhost:11435",
    "apiKey": "none"
  }]
}

// FEATURES

WORKS OFFLINE

6 model families via ONNX Runtime — Qwen, Llama, Gemma, Mistral, Phi, DeepSeek. Candle backend available on Linux. Zero latency. No subscriptions or rate limits.

NATIVE ACCELERATION

ONNX Runtime's CoreML execution provider on Apple Silicon (M1–M4) — dispatches ops to the ANE or GPU where CoreML's op set allows. Linux uses CUDA, ROCm, or CPU. Candle (Metal/CPU) is available as an alternative backend on Linux; its Metal path is not currently viable on macOS for the supported model families.

PRIVACY FIRST

Your code never leaves your machine. No telemetry. Everything runs locally — your proprietary code stays yours.

INSTANT STARTUP

The REPL appears in under 100ms while the model loads in the background. On first run, while the model is still downloading, Finch can optionally fall back to a cloud provider.

AGENTIC TOOLS

Read, Glob, Grep, Bash, WebFetch. Full multi-turn agentic loop with permission system. Like Claude Code, but entirely local.

LORA FINE-TUNING

Weighted feedback collection infrastructure is in place. LoRA adapter training and loading is the next major milestone — contributions welcome.

ITERATIVE PLANNING

/plan <task> runs an adversarial multi-persona critique loop. Seven roles review each draft; must-address issues block convergence. Swap providers freely — the alignment system normalises JSON output across all six.
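The convergence loop is simple to state — a sketch under the assumption that "convergence" means no must-address issues remain (the seven personas and the real criteria live inside Finch):

```python
# Sketch of an adversarial multi-persona planning loop: each critic
# reviews the draft; must-address issues force another revision.
# All names are illustrative.

def plan(task, draft_fn, critics, max_rounds=5):
    draft = draft_fn(task, feedback=[])
    for _ in range(max_rounds):
        issues = [i for critic in critics for i in critic(draft)]
        blocking = [i for i in issues if i["must_address"]]
        if not blocking:
            return draft                       # converged
        draft = draft_fn(task, feedback=blocking)
    return draft                               # best effort after max rounds

# Stub critic: objects until the draft mentions input validation.
def security_critic(draft):
    if "validate" not in draft:
        return [{"must_address": True, "note": "validate inputs"}]
    return []

def draft_fn(task, feedback):
    return f"plan for {task}" + (" + validate inputs" if feedback else "")

final = plan("add upload endpoint", draft_fn, [security_critic])
```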

// HOW IT WORKS

  ┌──────────────────────────────────────────────────────┐
  │                     YOUR REQUEST                     │
  └──────────────────────────┬───────────────────────────┘
                             │
                             ▼
                   ┌──────────────────┐
                   │   MODEL READY?   │
                   └────┬───────┬─────┘
                     NO │       │ YES
                        ▼       ▼
        ┌────────────────┐  ┌──────────────────────────────┐
        │  TEACHER API   │  │      LOCAL MODEL (ONNX)      │
        │  Claude/GPT-4  │  │  Qwen · Llama · Gemma · Phi  │
        │  Gemini/Grok   │  │      Mistral · DeepSeek      │
        │ (while loading │  │    CoreML EP / CUDA / CPU    │
        │   first time)  │  └──────────────┬───────────────┘
        └────────────────┘                 │
                                           ▼
                                 ┌──────────────────┐
                                 │     RESPONSE     │
                                 └──────────────────┘
01

INSTALL & RUN

One curl command installs the binary. The REPL starts in under 100ms. On first run, Finch downloads a Qwen model in the background — you can start asking questions immediately.

02

QUERY LOCALLY

Queries run on your local ONNX model. On Apple Silicon, ONNX Runtime's CoreML execution provider dispatches ops to ANE or GPU where supported. Linux supports CUDA and other ONNX Runtime execution providers. No internet required after the model is cached.

03

USE AGENTIC TOOLS

Finch can read files, search your codebase, run shell commands, and fetch web pages — all with your approval. Full multi-turn agentic loop, just like Claude Code.

04

LORA COMING SOON

Feedback collection infrastructure is ready (Ctrl+G / Ctrl+B). LoRA fine-tuning to adapt the model to your codebase is the next major milestone.

// SUPPORTED MODELS

Local (ONNX Runtime — runs on your hardware)

FAMILY           SIZES                  FORMAT   NOTES
Qwen 2.5         1.5B · 3B · 7B · 14B   ONNX     Recommended · auto-selected by RAM
Llama 3          1B · 3B · 8B           ONNX     Meta · general purpose
Gemma 2/3        1B · 2B · 9B           ONNX     Google · strong reasoning
Mistral          7B                     ONNX     Mistral AI · efficient · models via microsoft/
Phi-3/4          3.8B · 14B             ONNX     Microsoft · small + capable
DeepSeek Coder   1.3B · 6.7B            ONNX     Optimised for code generation

CoreML execution provider on Apple Silicon (M1–M4) · CUDA/ROCm/CPU on Linux via ONNX Runtime · Qwen auto-selected by RAM by default
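The RAM-based auto-selection amounts to a threshold table. A sketch — the cutoffs below are guesses for illustration, not Finch's actual values:

```python
# Illustrative RAM (GB) -> Qwen size mapping; thresholds are made up.
QWEN_BY_RAM_GB = [(32, "14B"), (16, "7B"), (8, "3B"), (0, "1.5B")]

def pick_qwen(ram_gb: float) -> str:
    """Pick the largest Qwen variant that fits the available RAM."""
    for threshold, size in QWEN_BY_RAM_GB:
        if ram_gb >= threshold:
            return f"Qwen-2.5-{size}"
    return "Qwen-2.5-1.5B"   # floor for very constrained machines
```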

Cloud providers (bring your own API key)

PROVIDER             COST TIER              BEST FOR                                     NOTES
Grok (xAI)           Free with X Premium+   Daily use · coding · tool calls              Only frontier model free via consumer sub · console.x.ai
Groq                 $                      Speed · batch processing                     Fastest inference of any provider · runs open models on custom hardware
Gemini (Google)      $ (Flash) · $$ (Pro)   Long context · multimodal                    Gemini Flash is extremely cost-effective · 1M token context window
Mistral              $–$$                   European data residency · open weights       Mistral Large for serious work · Codestral for code
Claude (Anthropic)   $$–$$$                 Complex reasoning · long tasks · plan mode   Haiku is budget · Sonnet is the sweet spot · best tool use quality
GPT-4o (OpenAI)      $$–$$$                 General purpose · ecosystem                  GPT-4o mini for budget · full GPT-4o for quality

Switch any time with /provider grok, /provider claude, etc. · Conversation history preserved across switches · All providers support full tool use

// INSTALL

$ curl -sSL https://raw.githubusercontent.com/darwin-finch/finch/main/install.sh | sh

macOS Apple Silicon · Linux x86_64 · PolyForm Noncommercial · view script ↗

One self-contained binary — no Node, no Python, no Docker, no runtime dependencies.

  01  Installs the binary to ~/.local/bin/finch and adds it to your PATH
  02  Run finch — the REPL starts in under 100ms
  03  Run finch setup to configure an optional provider API key (Claude, GPT-4, Grok, etc.)
  04  On first run, Finch downloads a local model in the background — start querying immediately
// or build from source
$ git clone https://github.com/darwin-finch/finch && cd finch && cargo build --release

// CONTRIBUTE

Finch is early-stage and actively looking for contributors. All issues are tracked on GitHub.

Good First Issues

Browse issues tagged as good first issues — a quick way to get familiar with a Rust codebase that covers ONNX, TUI, async, and tool execution.

Help Wanted

Integration tests, model adapter improvements, multi-provider routing, and more. Check GitHub Issues for the current list.

LoRA Adapter Loading

Load fine-tuned LoRA adapters into ONNX Runtime at inference time. The largest open milestone — would unlock continuous local learning.