LLM Router for IPFS Accelerate

The LLM Router provides a unified interface for text generation across multiple LLM providers, with built-in caching, retry logic, and integration with the existing IPFS Accelerate endpoint multiplexing infrastructure.

Features

Unified API: Single generate_text() function works with all providers
Multiple Providers: OpenRouter, Codex CLI, Copilot CLI/SDK, Gemini, Claude, Backend Manager, local HuggingFace
Automatic Fallback: Tries multiple providers in order of availability
Response Caching: CID-based or SHA256-based caching for deterministic results
Dependency Injection: Share resources (caches, managers) across calls
Integration: Works seamlessly with existing CLI wrappers and backend manager
No Duplication: Reuses existing infrastructure (DualModeWrapper, BaseCLIWrapper)

Quick Start

Basic Usage

from ipfs_accelerate_py import generate_text

# Auto-select best available provider
response = generate_text("Explain what IPFS is in one sentence")
print(response)

Using a Specific Provider

from ipfs_accelerate_py import generate_text

# Use OpenRouter
response = generate_text(
    "Write a Python function to calculate fibonacci",
    provider="openrouter",
    model_name="openai/gpt-4o-mini",
    max_tokens=256,
    temperature=0.7
)

With Caching

import os
from ipfs_accelerate_py import generate_text

# Enable response cache (enabled by default)
os.environ["IPFS_ACCELERATE_PY_ROUTER_RESPONSE_CACHE"] = "1"

# First call - cache miss
response1 = generate_text("What is 2+2?")

# Second call - cache hit (much faster)
response2 = generate_text("What is 2+2?")

Custom Provider Registration

from ipfs_accelerate_py import register_llm_provider, generate_text

# Define a custom provider
class MyProvider:
    def generate(self, prompt, *, model_name=None, **kwargs):
        return f"Custom response to: {prompt}"

# Register it
register_llm_provider("my_provider", lambda: MyProvider())

# Use it
response = generate_text("test", provider="my_provider")

Dependency Injection

from ipfs_accelerate_py import RouterDeps, generate_text

# Create shared deps container
deps = RouterDeps()

# You can inject pre-configured components
# deps.backend_manager = my_backend_manager
# deps.remote_cache = my_remote_cache

# All calls will share the same resources
response1 = generate_text("First request", deps=deps)
response2 = generate_text("Second request", deps=deps)

print(f"Shared cache has {len(deps.router_cache)} items")

Available Providers

Built-in Providers

1. OpenRouter (`openrouter`)

API-based access to multiple LLM providers.

Configuration:

export OPENROUTER_API_KEY="your-api-key"
export IPFS_ACCELERATE_PY_OPENROUTER_MODEL="openai/gpt-4o-mini"  # optional

Usage:

response = generate_text(
    "Your prompt",
    provider="openrouter",
    model_name="openai/gpt-4o-mini",
    max_tokens=256
)

2. Codex CLI (`codex_cli`)

Uses existing OpenAI Codex CLI integration.

Configuration:

export IPFS_ACCELERATE_PY_CODEX_MODEL="gpt-3.5-turbo"  # optional

Usage:

response = generate_text(
    "Write a sorting algorithm",
    provider="codex_cli"
)

3. Copilot CLI (`copilot_cli`)

Uses existing GitHub Copilot CLI wrapper.

Usage:

response = generate_text(
    "Suggest a git command to...",
    provider="copilot_cli"
)

4. Copilot SDK (`copilot_sdk`)

Uses existing GitHub Copilot SDK wrapper.

Configuration:

export IPFS_ACCELERATE_PY_COPILOT_SDK_MODEL="gpt-4o"  # optional

Usage:

response = generate_text(
    "Your prompt",
    provider="copilot_sdk",
    model_name="gpt-4o"
)

5. Gemini (`gemini_cli`)

Uses existing Gemini CLI integration (SDK-based).

Configuration:

export GOOGLE_API_KEY="your-api-key"
export IPFS_ACCELERATE_PY_GEMINI_MODEL="gemini-pro"  # optional

Usage:

response = generate_text(
    "Explain quantum computing",
    provider="gemini_cli",
    temperature=0.7
)

6. Claude (`claude_code`)

Uses existing Claude Code CLI integration (SDK-based).

Configuration:

export ANTHROPIC_API_KEY="your-api-key"
export IPFS_ACCELERATE_PY_CLAUDE_MODEL="claude-3-5-sonnet-20241022"  # optional

Usage:

response = generate_text(
    "Write a unit test for...",
    provider="claude_code"
)

7. Backend Manager (`backend_manager`)

Uses InferenceBackendManager for distributed/multiplexed inference.

Note: This provider requires backends to expose a callable instance attribute. The actual execution behavior depends on backend implementation. This is a best-effort provider intended for future integration with fully-featured backend instances.

Configuration:

export IPFS_ACCELERATE_PY_ENABLE_BACKEND_MANAGER="1"
export IPFS_ACCELERATE_PY_LLM_LOAD_BALANCING="round_robin"  # or least_loaded, best_performance

Usage:

response = generate_text(
    "Your prompt",
    provider="backend_manager"
)

Current Limitations:

Requires backends with callable instance attribute for inference execution
The InferenceBackendManager singleton is used (config parameters documented for future use)

8. Local HuggingFace (`local_hf`)

Fallback to local transformers models.

Configuration:

export IPFS_ACCELERATE_PY_LLM_MODEL="gpt2"  # or any HF model

Usage:

response = generate_text(
    "Your prompt",
    provider="local_hf",
    model_name="gpt2"
)

Environment Variables

Provider Selection

IPFS_ACCELERATE_PY_LLM_PROVIDER: Force a specific provider (bypasses auto-detection)

Caching

IPFS_ACCELERATE_PY_ROUTER_CACHE: Enable/disable provider caching (default: "1")
IPFS_ACCELERATE_PY_ROUTER_RESPONSE_CACHE: Enable/disable response caching (default: "1")
IPFS_ACCELERATE_PY_ROUTER_CACHE_KEY: Cache key strategy ("sha256" or "cid", default: "sha256")
IPFS_ACCELERATE_PY_ROUTER_CACHE_CID_BASE: CID encoding base (default: "base32")

Backend Manager

IPFS_ACCELERATE_PY_ENABLE_BACKEND_MANAGER: Enable backend manager provider (default: "0")
IPFS_ACCELERATE_PY_LLM_LOAD_BALANCING: Load balancing strategy (default: "round_robin")

Default Model

IPFS_ACCELERATE_PY_LLM_MODEL: Default model for providers that support it

Provider Resolution Order

When no provider is specified, the router tries providers in this order:

Backend Manager (if enabled via env var)
OpenRouter (if API key configured)
Codex CLI (if available)
Copilot CLI (if available)
Copilot SDK (if available)
Gemini (if API key configured)
Claude (if API key configured)
Local HuggingFace (if transformers installed)

Integration with Existing Infrastructure

CLI Wrappers

The router integrates with existing CLI wrappers without duplication:

# These are already available and used by the router
from ipfs_accelerate_py.cli_integrations import (
    OpenAICodexCLIIntegration,
    GeminiCLIIntegration,
    ClaudeCodeCLIIntegration
)

from ipfs_accelerate_py.copilot_cli.wrapper import CopilotCLI
from ipfs_accelerate_py.copilot_sdk.wrapper import CopilotSDK

Backend Manager

The router can use the InferenceBackendManager for distributed inference:

from ipfs_accelerate_py import generate_text
import os

# Enable backend manager
os.environ["IPFS_ACCELERATE_PY_ENABLE_BACKEND_MANAGER"] = "1"

# This will route through the backend manager
response = generate_text("Your prompt", provider="backend_manager")

Caching

Response caching integrates with the existing CID-based caching:

from ipfs_accelerate_py import generate_text
import os

# Use CID-based caching
os.environ["IPFS_ACCELERATE_PY_ROUTER_CACHE_KEY"] = "cid"

# Responses are cached by content-addressed CID
response = generate_text("Your prompt")

Advanced Usage

Provider Instance Management

from ipfs_accelerate_py import get_llm_provider, generate_text

# Get a provider instance
provider = get_llm_provider("openrouter")

# Reuse it for multiple requests
response1 = generate_text("First", provider_instance=provider)
response2 = generate_text("Second", provider_instance=provider)

Custom Dependency Container

from ipfs_accelerate_py import RouterDeps, generate_text
from ipfs_accelerate_py.inference_backend_manager import get_backend_manager

# Create custom deps with pre-configured components
deps = RouterDeps()
deps.backend_manager = get_backend_manager()

# All calls use the same backend manager
response1 = generate_text("First", deps=deps)
response2 = generate_text("Second", deps=deps)

Clear Caches

from ipfs_accelerate_py import clear_llm_router_caches

# Clear internal provider caches
clear_llm_router_caches()

Examples

See examples/llm_router_example.py for comprehensive usage examples.

Run the example:

python examples/llm_router_example.py

Testing

Run the integration tests:

python test/test_llm_router_integration.py

Architecture

The LLM Router follows these design principles:

No Import-Time Side Effects: All heavy imports are lazy
Reuse Existing Infrastructure: No duplication of CLI wrappers or backend managers
Dependency Injection: Optional RouterDeps for sharing resources
Provider Registry: Extensible via register_llm_provider()
Automatic Fallback: Tries multiple providers in order
CID-Based Caching: Content-addressed caching for determinism
Integration Ready: Works with existing endpoint multiplexing

Benefits Over ipfs_datasets_py Implementation

Full Integration: Seamlessly works with existing InferenceBackendManager
No Duplication: Reuses all existing CLI/SDK wrappers
Distributed Ready: Supports distributed/P2P inference via backend manager
CID Caching: Built-in CID-based caching support
Existing Patterns: Follows DualModeWrapper and BaseCLIWrapper patterns
Endpoint Multiplexing: Can multiplex across peers via backend manager

Future Enhancements

Add streaming support for long responses
Add token counting and rate limiting
Add more provider-specific optimizations
Add metrics and monitoring integration
Add distributed caching via libp2p
Add provider health checks

License

See the main project LICENSE file.

FilesExpand file tree

LLM_ROUTER.md

Latest commit

History

LLM_ROUTER.md

File metadata and controls

LLM Router for IPFS Accelerate

Features

Quick Start

Basic Usage

Using a Specific Provider

With Caching

Custom Provider Registration

Dependency Injection

Available Providers

Built-in Providers

1. OpenRouter (openrouter)

2. Codex CLI (codex_cli)

3. Copilot CLI (copilot_cli)

4. Copilot SDK (copilot_sdk)

5. Gemini (gemini_cli)

6. Claude (claude_code)

7. Backend Manager (backend_manager)

8. Local HuggingFace (local_hf)

Environment Variables

Provider Selection

Caching

Backend Manager

Default Model

Provider Resolution Order

Integration with Existing Infrastructure

CLI Wrappers

Backend Manager

Caching

Advanced Usage

Provider Instance Management

Custom Dependency Container

Clear Caches

Examples

Testing

Architecture

Benefits Over ipfs_datasets_py Implementation

Future Enhancements

License

1. OpenRouter (`openrouter`)

2. Codex CLI (`codex_cli`)

3. Copilot CLI (`copilot_cli`)

4. Copilot SDK (`copilot_sdk`)

5. Gemini (`gemini_cli`)

6. Claude (`claude_code`)

7. Backend Manager (`backend_manager`)

8. Local HuggingFace (`local_hf`)