██████╗ ██╗███╗   ██╗ ██████╗██╗  ██╗
██╔════╝██║████╗  ██║██╔════╝██║  ██║
█████╗  ██║██╔██╗ ██║██║     ███████║
██╔══╝  ██║██║╚██╗██║██║     ██╔══██║
██║     ██║██║ ╚████║╚██████╗██║  ██║
╚═╝     ╚═╝╚═╝  ╚═══╝ ╚═════╝╚═╝  ╚═╝

AI coding assistant · runs on your machine · works offline · learns your style

WRITTEN IN RUST · SINGLE BINARY · WORKS OFFLINE · PRIVATE BY DEFAULT · OPEN SOURCE

// THE PROBLEM

  • Constant internet connection required
  • Every query costs API money
  • Your code goes to the cloud
  • No learning from your patterns
VS

// WITH FINCH

  • Fully offline after first model download
  • Zero marginal cost per query
  • Code stays on your machine
  • 6 model families, pick what fits your hardware
  • Useful from day one (pre-trained models)

// THE OPEN AGENT

Claude Code is great — if you're on Claude.
Finch brings the same agentic experience to any provider.

Grok · GPT-4o · Gemini · Mistral · Groq · Claude · Local ONNX

FULL TOOL USE ON ANY LLM

Read, Glob, Grep, Bash, WebFetch — the complete agentic tool loop works with every supported provider. Grok, GPT-4, and Mistral users all get the same multi-turn, permission-gated, streaming agent. Switch mid-session with /provider grok.
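The loop itself is provider-agnostic. A minimal sketch of its shape — all names here are illustrative, not Finch's actual Rust internals:

```python
# Sketch of a permission-gated, multi-turn tool loop.
# Hypothetical names; Finch's real implementation is in Rust.

def agent_loop(model, tools, prompt, approve):
    """Run tool calls until the model stops requesting tools."""
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = model(messages)              # any provider behind one interface
        if reply.get("tool") is None:
            return reply["content"]          # final answer
        name, args = reply["tool"], reply["args"]
        if approve(name, args):              # permission gate before execution
            result = tools[name](**args)
        else:
            result = "denied by user"
        messages.append({"role": "tool", "name": name, "content": result})

# Stub model: requests one Grep call, then answers.
def stub_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "Grep", "args": {"pattern": "TODO"}}
    return {"tool": None, "content": "found 2 TODOs"}

answer = agent_loop(stub_model, {"Grep": lambda pattern: "2 matches"},
                    "find TODOs", approve=lambda name, args: True)
```

The key design point: the loop never assumes a vendor-specific tool-call format, so swapping providers mid-session only swaps the `model` callable.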

CLAUDE.MD + FINCH.MD

Auto-loads project instructions by walking the filesystem from root to your working directory — exactly like Claude Code. FINCH.md is a vendor-neutral alias: one instruction file that works with Finch, Claude Code, Cursor, or any assistant that respects it.
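The walk is easy to picture. A rough sketch of root-to-cwd discovery — the filename precedence here is an assumption, not Finch's documented behavior:

```python
import tempfile
from pathlib import Path

def find_instruction_files(cwd: Path, names=("CLAUDE.md", "FINCH.md")):
    """Collect instruction files from the filesystem root down to cwd.

    Files closer to cwd come later, so they can refine broader ones.
    """
    found = []
    for directory in [*reversed(cwd.parents), cwd]:  # root first, cwd last
        for name in names:
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
    return found

# Toy layout: a project-level FINCH.md and a subdirectory CLAUDE.md.
base = Path(tempfile.mkdtemp())
(base / "project" / "src").mkdir(parents=True)
(base / "project" / "FINCH.md").write_text("project rules")
(base / "project" / "src" / "CLAUDE.md").write_text("src rules")
files = find_instruction_files(base / "project" / "src")
# files lists the project FINCH.md before the deeper CLAUDE.md
```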

MEMTREE MEMORY

Hierarchical memory tree, not RAG. Related conversations cluster semantically; parent nodes aggregate their children for efficient retrieval. The data structure and REPL wiring are in place — SQLite persistence and the TUI tree view are next on the roadmap.
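The shape of the structure can be sketched like this — illustrative only, with word overlap standing in for the semantic scoring a real embedding model would provide:

```python
# Toy hierarchical memory tree: leaves hold conversation snippets,
# parents aggregate child summaries for cheap top-down retrieval.
# Not Finch's actual types.

class MemNode:
    def __init__(self, summary="", text=None):
        self.text = text          # leaf payload (None for internal nodes)
        self.summary = summary    # aggregate summary of this subtree
        self.children = []

    def add(self, child):
        self.children.append(child)
        # Naive aggregation: join child summaries. A real system would
        # re-summarise with the model instead.
        self.summary = " | ".join(c.summary for c in self.children)

    def retrieve(self, query, scorer):
        """Descend toward the child whose summary best matches the query."""
        if not self.children:
            return self
        best = max(self.children, key=lambda c: scorer(query, c.summary))
        return best.retrieve(query, scorer)

root = MemNode()
root.add(MemNode(summary="rust lifetimes", text="borrowck notes"))
root.add(MemNode(summary="sqlite schema", text="tables & indexes"))

# Toy scorer: shared-word overlap instead of embedding similarity.
overlap = lambda q, s: len(set(q.split()) & set(s.split()))
hit = root.retrieve("rust borrow checker lifetimes", overlap)
```

Because parents summarise whole subtrees, retrieval only scores one level of summaries at a time rather than every stored conversation — the advantage over flat RAG the paragraph above describes.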

MCP PLUGINS

Extend with any Model Context Protocol server — the same ecosystem as Claude Desktop and VS Code. Connect databases, IDEs, APIs, or custom internal tools with a single config entry.
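A single config entry typically follows the Claude Desktop convention. A hypothetical example — the exact key names in Finch's config file may differ:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres",
               "postgresql://localhost/mydb"]
    }
  }
}
```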

AUTONOMOUS MODE

Run headlessly on a task backlog — a named agent identity commits its own work, logs everything to JSONL, and can reflect on completed work to update its own system prompt. The data structures and CLI command are in place; end-to-end testing is in progress.

// USE IT YOUR WAY

01

INTERACTIVE REPL

Full TUI with scrollback, streaming, plan mode, and agentic tool use. Ghost text autocomplete. Session history. Inline dialogs with keyboard navigation and custom-response text entry.

$ finch

      ▄▄▄▄▄▄
    ▗▟█●██▙►  finch v0.7.4
  ▐████████▌  Qwen-2.5-7B · ready
  ▝▜██████▛▘  ~/repos/myproject
     ╥  ╥
    ╱    ╲

> How do I use lifetimes in Rust?
02

SINGLE QUERY + PIPE

Scriptable one-shot queries. Pipe stdin directly into finch. Works great in shell scripts, CI, and editor integrations.

$ finch query "What is a Rust lifetime?"

$ echo "Explain this error" | finch

$ cat error.log | finch "what went wrong?"

$ git diff | finch "write a commit message"
03

BACKGROUND DAEMON — LOCAL MODEL AS A SERVICE

Finch spawns a background daemon automatically. It exposes an OpenAI-compatible API on port 11435 — so VS Code extensions, scripts, and any tool that speaks the OpenAI protocol can use your local model with zero cloud costs.

Enable mDNS/Bonjour advertising and every machine on your local network can discover and use the model running on your desktop — no manual IP configuration, no cloud, no API keys.

# Start daemon with network sharing
$ finch daemon --bind 0.0.0.0:11435 --mdns

  Listening on  0.0.0.0:11435
  mDNS name     finch.local
  Model         Qwen-2.5-7B [CoreML]
  API           OpenAI-compatible

# Other machines on LAN connect to:
# http://finch.local:11435
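Any OpenAI-protocol client can talk to the daemon. A minimal Python sketch — the /v1/chat/completions path is assumed from the OpenAI convention, not taken from Finch's docs:

```python
import json
from urllib import request

def chat_payload(prompt, model="local"):
    """Build an OpenAI-style chat completion request body."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False}

def ask_finch(prompt, base="http://localhost:11435"):
    """POST a prompt to a running Finch daemon and return the reply text."""
    req = request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```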
// VS Code — Continue.dev config
{
  "models": [{
    "title": "Finch (local)",
    "provider": "openai",
    "model": "local",
    "apiBase": "http://localhost:11435",
    "apiKey": "none"
  }]
}

// FEATURES

WORKS OFFLINE

6 model families via ONNX Runtime — Qwen, Llama, Gemma, Mistral, Phi, DeepSeek. Candle backend available on Linux. Zero latency. No subscriptions or rate limits.

NATIVE ACCELERATION

ONNX Runtime's CoreML execution provider on Apple Silicon (M1–M4) — dispatches ops to the ANE or GPU where CoreML's op set allows. Linux uses CUDA, ROCm, or CPU. Candle (Metal/CPU) is available as an alternative backend on Linux; its Metal path is not currently viable on macOS for the supported model families.

PRIVACY FIRST

Your code never leaves your machine. No telemetry. Everything runs locally — your proprietary code stays yours.

INSTANT STARTUP

The REPL appears in under 100ms while the model loads in the background. On first run, while the model is still downloading, Finch can optionally fall back to a cloud provider.

AGENTIC TOOLS

Read, Glob, Grep, Bash, WebFetch. Full multi-turn agentic loop with permission system. Like Claude Code, but entirely local.

LORA FINE-TUNING

Weighted feedback collection infrastructure is in place. LoRA adapter training and loading is the next major milestone — contributions welcome.

ITERATIVE PLANNING

/plan <task> runs an adversarial multi-persona critique loop. Seven roles review each draft; must-address issues block convergence. Swap providers freely — the alignment system normalises JSON output across all six.
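The convergence loop is simple to state — a sketch under the assumption that "convergence" means no must-address issues remain (the seven personas and the real criteria live inside Finch):

```python
# Sketch of an adversarial multi-persona planning loop: each critic
# reviews the draft; must-address issues force another revision.
# All names are illustrative.

def plan(task, draft_fn, critics, max_rounds=5):
    draft = draft_fn(task, feedback=[])
    for _ in range(max_rounds):
        issues = [i for critic in critics for i in critic(draft)]
        blocking = [i for i in issues if i["must_address"]]
        if not blocking:
            return draft                       # converged
        draft = draft_fn(task, feedback=blocking)
    return draft                               # best effort after max rounds

# Stub critic: objects until the draft mentions input validation.
def security_critic(draft):
    if "validate" not in draft:
        return [{"must_address": True, "note": "validate inputs"}]
    return []

def draft_fn(task, feedback):
    return f"plan for {task}" + (" + validate inputs" if feedback else "")

final = plan("add upload endpoint", draft_fn, [security_critic])
```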

// HOW IT WORKS

  ┌──────────────────────────────────────────────────────┐
  │                     YOUR REQUEST                     │
  └──────────────────────────┬───────────────────────────┘
                             │
                             ▼
                   ┌──────────────────┐
                   │   MODEL READY?   │
                   └────┬───────┬─────┘
                     NO │       │ YES
                        ▼       ▼
        ┌────────────────┐  ┌──────────────────────────────┐
        │  TEACHER API   │  │      LOCAL MODEL (ONNX)      │
        │  Claude/GPT-4  │  │  Qwen · Llama · Gemma · Phi  │
        │  Gemini/Grok   │  │      Mistral · DeepSeek      │
        │ (while loading │  │    CoreML EP / CUDA / CPU    │
        │   first time)  │  └──────────────┬───────────────┘
        └────────────────┘                 │
                                           ▼
                                 ┌──────────────────┐
                                 │     RESPONSE     │
                                 └──────────────────┘
01

INSTALL & RUN

One curl command installs the binary. The REPL starts in under 100ms. On first run, Finch downloads a Qwen model in the background — you can start asking questions immediately.

02

QUERY LOCALLY

Queries run on your local ONNX model. On Apple Silicon, ONNX Runtime's CoreML execution provider dispatches ops to ANE or GPU where supported. Linux supports CUDA and other ONNX Runtime execution providers. No internet required after the model is cached.

03

USE AGENTIC TOOLS

Finch can read files, search your codebase, run shell commands, and fetch web pages — all with your approval. Full multi-turn agentic loop, just like Claude Code.

04

LORA COMING SOON

Feedback collection infrastructure is ready (Ctrl+G / Ctrl+B). LoRA fine-tuning to adapt the model to your codebase is the next major milestone.

// SUPPORTED MODELS

Local (ONNX Runtime — runs on your hardware)

FAMILY           SIZES                  FORMAT   NOTES
Qwen 2.5         1.5B · 3B · 7B · 14B   ONNX     Recommended · auto-selected by RAM
Llama 3          1B · 3B · 8B           ONNX     Meta · general purpose
Gemma 2/3        1B · 2B · 9B           ONNX     Google · strong reasoning
Mistral          7B                     ONNX     Mistral AI · efficient · models via microsoft/
Phi-3/4          3.8B · 14B             ONNX     Microsoft · small + capable
DeepSeek Coder   1.3B · 6.7B            ONNX     Optimised for code generation

CoreML execution provider on Apple Silicon (M1–M4) · CUDA/ROCm/CPU on Linux via ONNX Runtime · Qwen auto-selected by RAM by default
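The RAM-based auto-selection amounts to a threshold table. A sketch — the cutoffs below are guesses for illustration, not Finch's actual values:

```python
# Illustrative RAM (GB) -> Qwen size mapping; thresholds are made up.
QWEN_BY_RAM_GB = [(32, "14B"), (16, "7B"), (8, "3B"), (0, "1.5B")]

def pick_qwen(ram_gb: float) -> str:
    """Pick the largest Qwen variant that fits the available RAM."""
    for threshold, size in QWEN_BY_RAM_GB:
        if ram_gb >= threshold:
            return f"Qwen-2.5-{size}"
    return "Qwen-2.5-1.5B"   # floor for very constrained machines
```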

Cloud providers (bring your own API key)

PROVIDER             COST TIER              BEST FOR                                     NOTES
Grok (xAI)           Free with X Premium+   Daily use · coding · tool calls              Only frontier model free via consumer sub · console.x.ai
Groq                 $                      Speed · batch processing                     Fastest inference of any provider · runs open models on custom hardware
Gemini (Google)      $ (Flash) · $$ (Pro)   Long context · multimodal                    Gemini Flash is extremely cost-effective · 1M token context window
Mistral              $–$$                   European data residency · open weights       Mistral Large for serious work · Codestral for code
Claude (Anthropic)   $$–$$$                 Complex reasoning · long tasks · plan mode   Haiku is budget · Sonnet is the sweet spot · best tool use quality
GPT-4o (OpenAI)      $$–$$$                 General purpose · ecosystem                  GPT-4o mini for budget · full GPT-4o for quality

Switch any time with /provider grok, /provider claude, etc. · Conversation history preserved across switches · All providers support full tool use

// INSTALL

$ curl -sSL https://raw.githubusercontent.com/darwin-finch/finch/main/install.sh | sh

macOS Apple Silicon · Linux x86_64 · PolyForm Noncommercial · view script ↗

One self-contained binary — no Node, no Python, no Docker, no runtime dependencies.

  01  Installs the binary to ~/.local/bin/finch and adds it to your PATH
  02  Run finch — the REPL starts in under 100ms
  03  Run finch setup to configure an optional provider API key (Claude, GPT-4, Grok, etc.)
  04  On first run, Finch downloads a local model in the background — start querying immediately
// or build from source
$ git clone https://github.com/darwin-finch/finch && cd finch && cargo build --release

// CONTRIBUTE

Finch is early-stage and actively looking for contributors. All issues are tracked on GitHub.

Good First Issues

Browse issues tagged as good first issues — a quick way to get familiar with a Rust codebase that covers ONNX, TUI, async, and tool execution.

Help Wanted

Integration tests, model adapter improvements, multi-provider routing, and more. Check GitHub Issues for the current list.

LoRA Adapter Loading

Load fine-tuned LoRA adapters into ONNX Runtime at inference time. The largest open milestone — would unlock continuous local learning.