eyrie is the provider runtime that hawk sits on top of. It handles everything between hawk and the LLM APIs — authentication, model resolution, streaming, retries, rate limiting, and caching — so hawk can focus on being a great coding agent.
When hawk calls a model, eyrie figures out which provider to use, how to talk to it, and how to stream the response back. When hawk switches from Anthropic to Ollama, eyrie handles the translation. When an API returns a 529, eyrie retries with backoff. When a response hits max_tokens, eyrie continues automatically.
hawk never talks to an LLM API directly. eyrie does.
| Concern | What eyrie does |
|---|---|
| Provider routing | Detects active provider from env vars, config file, or explicit key |
| Model resolution | Maps abstract tiers (opus/sonnet/haiku) to concrete model IDs per provider |
| Streaming | Parses SSE for Anthropic and OpenAI formats — text, tool calls, thinking blocks |
| Reliability | Retries on 429/500/529 with exponential backoff and Retry-After support (sketched below) |
| Long outputs | Auto-continues when stop_reason == max_tokens |
| Cost control | Anthropic prompt caching breakpoints on system prompt and conversation prefix |
| Rate limiting | Token bucket per provider — prevents hitting API limits |
| Config | Reads/writes ~/.hawk/provider.json, applies to env vars |
| Model catalog | Embedded pricing + context windows for all providers, live-fetched from OpenRouter |
| Testing | Mock provider — hawk's tests never need real API keys |
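The Reliability row is the behavior hawk depends on most when a provider has a bad day. As a rough sketch only (generic code with hypothetical names, not eyrie's internals), a retry loop matching that description retries 429/500/529, honors Retry-After when the server sends one, and otherwise backs off exponentially with jitter:

```go
package retry

import (
	"context"
	"fmt"
	"math/rand"
	"net/http"
	"strconv"
	"time"
)

// retryable reports whether a status code is worth another attempt.
func retryable(status int) bool {
	return status == 429 || status == 500 || status == 529
}

// delayFor prefers the server's Retry-After header, falling back to
// exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter.
func delayFor(resp *http.Response, attempt int) time.Duration {
	if resp != nil {
		if ra := resp.Header.Get("Retry-After"); ra != "" {
			if secs, err := strconv.Atoi(ra); err == nil {
				return time.Duration(secs) * time.Second
			}
		}
	}
	return (time.Second << attempt) + time.Duration(rand.Int63n(int64(time.Second)))
}

// doWithRetry runs the request-producing closure until it succeeds or the
// attempt budget is spent. A closure is used so the request body can be
// rebuilt on every attempt.
func doWithRetry(ctx context.Context, do func() (*http.Response, error), maxAttempts int) (*http.Response, error) {
	var resp *http.Response
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		resp, err = do()
		if err == nil && !retryable(resp.StatusCode) {
			return resp, nil
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(delayFor(resp, attempt)):
		}
	}
	if err == nil && resp != nil {
		err = fmt.Errorf("status %d", resp.StatusCode)
	}
	return resp, fmt.Errorf("still failing after %d attempts: %w", maxAttempts, err)
}
```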
| Provider | Set this | Notes |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Default for hawk · supports thinking, caching |
| OpenAI | OPENAI_API_KEY | Full tool use + reasoning effort |
| OpenRouter | OPENROUTER_API_KEY | 200+ models via one key |
| Grok (xAI) | XAI_API_KEY | |
| Gemini | GEMINI_API_KEY | |
| CanopyWave | CANOPYWAVE_API_KEY | |
| Ollama | OLLAMA_BASE_URL | Local models, no key needed |
| OpenCodeGo | OPENCODEGO_API_KEY | |
eyrie detects which provider to use automatically — in the order above.
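For instance (the import path and the printed identifier below are assumptions, not confirmed anywhere in this README), with keys for two providers set, DetectProvider should report the one listed first:

```go
package main

import (
	"fmt"
	"os"

	// Import path assumed from `go get github.com/hawk/eyrie`.
	"github.com/hawk/eyrie/client"
)

func main() {
	// Keys for two providers at once: detection follows the table order,
	// so OpenRouter wins over Ollama here.
	os.Setenv("OPENROUTER_API_KEY", "sk-or-v1-...")
	os.Setenv("OLLAMA_BASE_URL", "http://localhost:11434")

	// How the provider identifier prints is an assumption.
	fmt.Println(client.DetectProvider())
}
```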
```go
// hawk creates a client once at startup
c := client.NewEyrieClient(&client.EyrieConfig{
	Provider: client.DetectProvider(), // reads from env / config file
})

// hawk streams a response
sr, err := c.StreamChat(ctx, conversation, client.ChatOptions{
	Model: catalog.GetProviderDefaultModel(provider, &cat),
})
if err != nil {
	return err
}
defer sr.Close()

for evt := range sr.Events {
	switch evt.Type {
	case "content":   // stream text to terminal
	case "tool_call": // execute tool, append result
	case "thinking":  // show thinking indicator
	case "done":      // response complete
	}
}
```
```go
// When a response hits max_tokens, eyrie continues automatically
resp, err := client.ChatWithContinuation(ctx, provider, messages,
	client.ChatOptions{Model: model},
	client.DefaultContinuationConfig(),
)
```

hawk stores provider config at ~/.hawk/provider.json. eyrie owns this file.
```go
cfg := config.LoadProviderConfig("")             // load
config.ApplyProviderConfigToEnv(cfg, false, nil) // apply to env
config.SaveProviderConfig(cfg, "")               // save
```

eyrie ships with an embedded catalog of every supported model — pricing, context windows, max output. hawk uses this for cost tracking and model selection.
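The cost-tracking half is plain arithmetic over those per-million-token prices. A minimal sketch, with hypothetical type and field names rather than eyrie's catalog API:

```go
package pricing

// Hypothetical type for illustration: eyrie's catalog carries pricing,
// but these field names are assumptions, not its actual API.
type Pricing struct {
	InputUSDPerMTok  float64 // USD per million input tokens
	OutputUSDPerMTok float64 // USD per million output tokens
}

// requestCost turns a request's token usage into dollars.
func requestCost(p Pricing, inputTokens, outputTokens int) float64 {
	return float64(inputTokens)/1e6*p.InputUSDPerMTok +
		float64(outputTokens)/1e6*p.OutputUSDPerMTok
}

// Example: 12,000 input tokens at $3/MTok plus 800 output tokens at $15/MTok
// comes to 0.036 + 0.012 = $0.048.
```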
```go
cat := catalog.DefaultModelCatalog()

// Get the best model for a tier
model := catalog.GetPreferredProviderModel("anthropic", catalog.TierSonnet, &cat)
// → "claude-sonnet-4-6"

// Check if a model is deprecated
warn := catalog.GetModelDeprecationWarning("claude-3-7-sonnet", "anthropic")
// → "⚠ Claude 3.7 Sonnet will be retired on February 19, 2026..."
```

eyrie also ships a mock provider, so hawk's test suite never needs real API keys.

```go
mock := client.NewMockProvider(client.MockModeFixed)
mock.Response = "Here is the code you asked for..."

// Inject into hawk's test suite — no real API calls
resp, _ := mock.Chat(ctx, messages, opts)
```

```
go get github.com/hawk/eyrie
```

Requires Go 1.26+. Zero external dependencies.
MIT © 2026 GrayCode AI