# FINCH
AI coding assistant · runs on your machine · works offline · learns your style
Claude Code is great — if you're on Claude.
Finch brings the same agentic experience to any provider.
Read, Glob, Grep, Bash, WebFetch — the complete agentic tool loop works with every supported provider. Grok users, GPT-4 users, Mistral users all get the same multi-turn, permission-gated, streaming agent. Switch mid-session with /provider grok.
Auto-loads project instructions by walking the filesystem from root to your working directory — exactly like Claude Code. FINCH.md is a vendor-neutral alias: one instruction file that works with Finch, Claude Code, Cursor, or any assistant that respects it.
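A FINCH.md file is plain markdown instructions. A minimal example (the sections and rules shown here are illustrative, not a required schema):

```markdown
# Project instructions

- Rust 2021 edition; run `cargo clippy` before committing.
- Prefer `anyhow::Result` in binaries, `thiserror` in libraries.
- Tests live next to the code in `#[cfg(test)]` modules.
```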
Hierarchical memory tree, not RAG. Related conversations cluster semantically; parent nodes aggregate their children for efficient retrieval. The data structure and REPL wiring are in place — SQLite persistence and the TUI tree view are next on the roadmap.
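The aggregation idea can be sketched in a few lines. This is an illustrative model of the data structure, not Finch's actual Rust implementation:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    """One node in a hierarchical memory tree."""
    summary: str
    children: list["MemoryNode"] = field(default_factory=list)

    def aggregate(self) -> str:
        """A parent's summary plus a rollup of its children's summaries,
        so retrieval can stop at the parent instead of visiting every leaf."""
        if not self.children:
            return self.summary
        rollup = "; ".join(child.aggregate() for child in self.children)
        return f"{self.summary} [{rollup}]"

# Related conversations cluster under a common parent:
root = MemoryNode("rust project", [
    MemoryNode("lifetimes Q&A"),
    MemoryNode("borrow checker errors"),
])
print(root.aggregate())  # rust project [lifetimes Q&A; borrow checker errors]
```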
Extend with any Model Context Protocol server — the same ecosystem as Claude Desktop and VS Code. Connect databases, IDEs, APIs, or custom internal tools with a single config entry.
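Finch's exact config location may differ, but an MCP server entry typically follows the same shape as Claude Desktop's config. The server name and connection string below are hypothetical:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    }
  }
}
```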
Run headlessly on a task backlog: a named agent identity commits its own work, logs everything to JSONL, and can reflect on completed work to update its own system prompt. Data structures and the CLI command are in place; end-to-end testing is in progress.
Full TUI with scrollback, streaming, plan mode, and agentic tool use. Ghost text autocomplete. Session history. Inline dialogs with keyboard navigation and custom-response text entry.
[Terminal screenshot: `$ finch` splash · finch v0.7.4 · Qwen-2.5-7B · ready · ~/repos/myproject · prompt: "How do I use lifetimes in Rust?"]
Scriptable one-shot queries. Pipe stdin directly into finch. Works great in shell scripts, CI, and editor integrations.
```shell
$ finch query "What is a Rust lifetime?"
$ echo "Explain this error" | finch
$ cat error.log | finch "what went wrong?"
$ git diff | finch "write a commit message"
```
Finch spawns a background daemon automatically. It exposes an OpenAI-compatible API on port 11435 — so VS Code extensions, scripts, and any tool that speaks the OpenAI protocol can use your local model with zero cloud costs.
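Anything that speaks the OpenAI protocol can point at the daemon. A minimal sketch with Python's standard library, assuming the usual `/v1/chat/completions` route of the OpenAI-compatible API:

```python
import json
import urllib.request

# Standard OpenAI-style chat-completions payload, aimed at the local daemon.
payload = {
    "model": "local",
    "messages": [{"role": "user", "content": "What is a Rust lifetime?"}],
}
req = urllib.request.Request(
    "http://localhost:11435/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With the daemon running: body = urllib.request.urlopen(req).read()
```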
Enable mDNS/Bonjour advertising and every machine on your local network can discover and use the model running on your desktop — no manual IP configuration, no cloud, no API keys.
```shell
# Start daemon with network sharing
$ finch daemon --bind 0.0.0.0:11435 --mdns
Listening on  0.0.0.0:11435
mDNS name     finch.local
Model         Qwen-2.5-7B [CoreML]
API           OpenAI-compatible

# Other machines on the LAN connect to:
#   http://finch.local:11435
```
VS Code · Continue.dev config:

```json
{
  "models": [{
    "title": "Finch (local)",
    "provider": "openai",
    "model": "local",
    "apiBase": "http://localhost:11435",
    "apiKey": "none"
  }]
}
```
Six model families via ONNX Runtime: Qwen, Llama, Gemma, Mistral, Phi, and DeepSeek. A Candle backend is available on Linux. No network latency, no subscriptions, no rate limits.
ONNX Runtime's CoreML execution provider on Apple Silicon (M1–M4) — dispatches ops to ANE or GPU where CoreML's op set allows. Linux uses CUDA, ROCm, or CPU. Candle (Metal/CPU) is available as an alternative backend on Linux; Metal is not viable on macOS for current model families.
Your code never leaves your machine. No telemetry. Everything runs locally — your proprietary code stays yours.
The REPL appears in under 100ms while the model loads in the background. On first run, while the model is still downloading, Finch can optionally fall back to a cloud provider.
Read, Glob, Grep, Bash, WebFetch. Full multi-turn agentic loop with permission system. Like Claude Code, but entirely local.
Weighted feedback collection infrastructure is in place. LoRA adapter training and loading is the next major milestone — contributions welcome.
/plan <task> runs an adversarial multi-persona critique loop. Seven roles review each draft, and must-address issues block convergence. Swap providers freely: the alignment system normalises JSON output across all six providers.
┌──────────────────────────────────────────────────────┐
│ YOUR REQUEST │
└──────────────────────┬───────────────────────────────┘
│
▼
┌──────────────────┐
│ MODEL READY? │
└──┬───────────────┘
NO YES
│ │
▼ ▼
┌──────────────┐ ┌──────────────────────────────┐
│ TEACHER API │ │ LOCAL MODEL (ONNX) │
│ Claude/GPT-4 │ │ Qwen · Llama · Gemma · Phi │
│ Gemini/Grok │ │ Mistral · DeepSeek │
│ (while loading│ │ CoreML EP / CUDA / CPU │
│ first time) │ └────────────┬─────────────────┘
└──────────────┘ │
▼
┌──────────────────┐
│ RESPONSE │
└──────────────────┘
One curl command installs the binary. The REPL starts in under 100ms. On first run, Finch downloads a Qwen model in the background — you can start asking questions immediately.
Queries run on your local ONNX model. On Apple Silicon, ONNX Runtime's CoreML execution provider dispatches ops to ANE or GPU where supported. Linux supports CUDA and other ONNX Runtime execution providers. No internet required after the model is cached.
Finch can read files, search your codebase, run shell commands, and fetch web pages — all with your approval. Full multi-turn agentic loop, just like Claude Code.
Feedback collection infrastructure is ready (Ctrl+G / Ctrl+B). LoRA fine-tuning to adapt the model to your codebase is the next major milestone.
| FAMILY | SIZES | FORMAT | NOTES |
|---|---|---|---|
| Qwen 2.5 | 1.5B · 3B · 7B · 14B | ONNX | Recommended · auto-selected by RAM |
| Llama 3 | 1B · 3B · 8B | ONNX | Meta · general purpose |
| Gemma 2/3 | 1B · 2B · 9B | ONNX | Google · strong reasoning |
| Mistral | 7B | ONNX | Mistral AI · efficient · models via microsoft/ |
| Phi-3/4 | 3.8B · 14B | ONNX | Microsoft · small + capable |
| DeepSeek Coder | 1.3B · 6.7B | ONNX | Optimised for code generation |
CoreML execution provider on Apple Silicon (M1–M4) · CUDA/ROCm/CPU on Linux via ONNX Runtime · Qwen auto-selected by RAM by default
| PROVIDER | COST TIER | BEST FOR | NOTES |
|---|---|---|---|
| Grok (xAI) | Free with X Premium+ | Daily use · coding · tool calls | Only frontier model free via consumer sub · console.x.ai |
| Groq | $ | Speed · batch processing | Among the fastest inference available · runs open models on custom hardware |
| Gemini (Google) | $ (Flash) · $$ (Pro) | Long context · multimodal | Gemini Flash is extremely cost-effective · 1M token context window |
| Mistral | $–$$ | European data residency · open weights | Mistral Large for serious work · Codestral for code |
| Claude (Anthropic) | $$–$$$ | Complex reasoning · long tasks · plan mode | Haiku is budget · Sonnet is the sweet spot · best tool use quality |
| GPT-4o (OpenAI) | $$–$$$ | General purpose · ecosystem | GPT-4o mini for budget · full GPT-4o for quality |
Switch any time with /provider grok, /provider claude, etc. · Conversation history preserved across switches · All providers support full tool use
curl -sSL https://raw.githubusercontent.com/darwin-finch/finch/main/install.sh | sh
One self-contained binary — no Node, no Python, no Docker, no runtime dependencies.
- Installs to `~/.local/bin/finch` and adds it to your PATH
- Run `finch`; the REPL starts in under 100ms
- Run `finch setup` to configure an optional provider API key (Claude, GPT-4, Grok, etc.)
- Build from source: `git clone https://github.com/darwin-finch/finch && cd finch && cargo build --release`
Finch is early-stage and actively looking for contributors. All issues are tracked on GitHub.
Browse issues tagged as good first issues — a quick way to get familiar with a Rust codebase that covers ONNX, TUI, async, and tool execution.
**help wanted** · Integration tests, model adapter improvements, multi-provider routing, and more. Check GitHub Issues for the current list.
**big ticket** · Load fine-tuned LoRA adapters into ONNX Runtime at inference time. The largest open milestone, and the one that would unlock continuous local learning.
Finch is free and open-source. Sponsorships help cover API costs and fund the big open milestones.