Commit eddd1a3

Apply PR nikmcfly#41: Cloud-LLM tier + GRAPH_LLM_* + error recovery (kept FAST patch, kept 24h timeout, kept ollama out of compose)
1 parent b2dd037 commit eddd1a3

12 files changed

Lines changed: 451 additions & 51 deletions

.env.example

Lines changed: 24 additions & 1 deletion
@@ -14,17 +14,30 @@ LLM_MODEL_NAME=qwen2.5:32b
 # Lighter model for development (less VRAM):
 # LLM_MODEL_NAME=qwen2.5:14b
 
+# --- Cloud LLM alternative (OpenRouter, OpenAI, etc.) ---
+# Uncomment and set these to use a cloud LLM instead of Ollama.
+# Then start with: docker compose -f docker-compose.yml -f docker-compose.cloud.yml up -d
+# LLM_API_KEY=sk-or-v1-your-key-here
+# LLM_BASE_URL=https://openrouter.ai/api/v1
+# LLM_MODEL_NAME=moonshotai/kimi-k2.6
+
 # ===== Neo4j Configuration =====
 # For Docker: use service name "neo4j" instead of "localhost"
 NEO4J_URI=bolt://localhost:7687
 NEO4J_USER=neo4j
 NEO4J_PASSWORD=mirofish
 
 # ===== Embedding Configuration =====
-# Uses Ollama's embedding endpoint
+# Uses Ollama's embedding endpoint by default
 EMBEDDING_MODEL=nomic-embed-text
 EMBEDDING_BASE_URL=http://localhost:11434
 # For Docker: EMBEDDING_BASE_URL=http://ollama:11434
+#
+# For cloud embeddings (OpenRouter, OpenAI):
+# EMBEDDING_MODEL=qwen/qwen3-embedding-8b
+# EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
+# EMBEDDING_API_KEY defaults to LLM_API_KEY if not set
+# EMBEDDING_API_KEY=sk-or-v1-your-key-here
 
 # ===== OASIS / CAMEL-AI Configuration =====
 # CAMEL-AI reads OPENAI_API_KEY and OPENAI_API_BASE_URL
@@ -39,3 +52,13 @@ OPENAI_API_BASE_URL=http://localhost:11434/v1
 # NEO4J_URI=bolt://neo4j:7687
 # EMBEDDING_BASE_URL=http://ollama:11434
 # OPENAI_API_BASE_URL=http://ollama:11434/v1
+
+# ===== Hybrid Routing Strategy (Optional) =====
+# For cost and speed optimization, you can split LLM traffic:
+# 1. Primary LLM (LLM_*) handles complex reasoning and agent personalities.
+# 2. Graph LLM (GRAPH_LLM_*) handles high-volume entity extraction.
+# If GRAPH_LLM vars are not set, they fall back to primary LLM settings.
+
+# GRAPH_LLM_API_KEY=sk-or-v1-your-key-here
+# GRAPH_LLM_BASE_URL=https://openrouter.ai/api/v1
+# GRAPH_LLM_MODEL_NAME=google/gemini-3-flash-preview

README.md

Lines changed: 85 additions & 32 deletions
@@ -4,7 +4,7 @@
 
 # MiroFish-Offline
 
-**Fully local fork of [MiroFish](https://github.com/666ghj/MiroFish) — no cloud APIs required. English UI.**
+**Fully local fork of [MiroFish](https://github.com/666ghj/MiroFish) — no cloud APIs required (Cloud optional for offloading inference). English UI.**
 
 *A multi-agent swarm intelligence engine that simulates public opinion, market sentiment, and social dynamics. Entirely on your hardware.*
 
@@ -25,9 +25,9 @@ The [original MiroFish](https://github.com/666ghj/MiroFish) was built for the Ch
 |---|---|
 | Chinese UI | **English UI** (1,000+ strings translated) |
 | Zep Cloud (graph memory) | **Neo4j Community Edition 5.15** |
-| DashScope / OpenAI API (LLM) | **Ollama** (qwen2.5, llama3, etc.) |
-| Zep Cloud embeddings | **nomic-embed-text** via Ollama |
-| Cloud API keys required | **Zero cloud dependencies** |
+| DashScope / OpenAI API (LLM) | **Ollama** (qwen2.5, llama3, etc.) **OR any OpenAI-compatible API** |
+| Zep Cloud embeddings | **nomic-embed-text** via Ollama **OR any OpenAI-compatible embeddings** |
+| Cloud API keys required | **Zero cloud dependencies (Cloud optional)** |
 
 ## Workflow
 
@@ -67,9 +67,44 @@ docker exec mirofish-ollama ollama pull nomic-embed-text
 
 Open `http://localhost:3000` — that's it.
 
-### Option B: Manual
+### Option B: Cloud Mode (OpenRouter / OpenAI)
 
-**1. Start Neo4j**
+If you have limited local hardware, you can run the simulation using cloud APIs (like OpenRouter or OpenAI) while keeping the graph memory local.
+
+1. **Configure `.env`**:
+Uncomment the Cloud section in your `.env` and add your API keys:
+```bash
+LLM_API_KEY=your_openrouter_key
+LLM_BASE_URL=https://openrouter.ai/api/v1
+LLM_MODEL_NAME=moonshotai/kimi-k2.6
+EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
+EMBEDDING_MODEL=qwen/qwen3-embedding-8b
+```
+
+```bash
+docker compose -f docker-compose.yml -f docker-compose.cloud.yml up -d
+```
+
+### Option C: Hybrid Routing (Performance & Cost Optimized)
+
+This is the recommended setup for serious research. It splits the workload between two models:
+1. **Agent Reasoning** (Smart): Frontier model for complex behavior (e.g., Kimi K2.6, GPT-4o).
+2. **Graph Extraction** (Cheap): High-throughput model for entity parsing (e.g., Gemini Flash).
+
+**Configure in `.env`**:
+```bash
+# Agent Reasoning (Kimi)
+LLM_MODEL_NAME=moonshotai/kimi-k2.6
+
+# Background Extraction (Flash)
+GRAPH_LLM_MODEL_NAME=google/gemini-3-flash-preview
+GRAPH_LLM_API_KEY=your_key
+GRAPH_LLM_BASE_URL=https://openrouter.ai/api/v1
+```
+
+### Option D: Manual Installation
+
+If you prefer to run services individually:
 
 ```bash
 docker run -d --name neo4j \
@@ -107,27 +142,42 @@ npm run dev
 
 Open `http://localhost:3000`.
 
-## Configuration
+## Configuration Reference
 
-All settings are in `.env` (copy from `.env.example`):
+MiroFish uses a tiered configuration system. Copy `.env.example` to `.env` to get started.
 
-```bash
-# LLM — points to local Ollama (OpenAI-compatible API)
-LLM_API_KEY=ollama
-LLM_BASE_URL=http://localhost:11434/v1
-LLM_MODEL_NAME=qwen2.5:32b
-
-# Neo4j
-NEO4J_URI=bolt://localhost:7687
-NEO4J_USER=neo4j
-NEO4J_PASSWORD=mirofish
-
-# Embeddings
-EMBEDDING_MODEL=nomic-embed-text
-EMBEDDING_BASE_URL=http://localhost:11434
-```
+### 1. Agent Reasoning (The "Brains")
+These variables control the primary LLM used by agents for behavior, reports, and simulated interactions.
 
-Works with any OpenAI-compatible API — swap Ollama for Claude, GPT, or any other provider by changing `LLM_BASE_URL` and `LLM_API_KEY`.
+| Variable | Default (Local) | Recommended (Cloud) |
+|---|---|---|
+| `LLM_MODEL_NAME` | `qwen2.5:32b` | `moonshotai/kimi-k2.6` |
+| `LLM_BASE_URL` | `http://localhost:11434/v1` | `https://openrouter.ai/api/v1` |
+| `LLM_API_KEY` | `ollama` | `sk-or-v1-...` |
+
+### 2. Graph Extraction (The "Worker")
+Optional. Controls the background process that parses documents into the knowledge graph. **If not set, it defaults to the Agent Reasoning settings above.**
+
+| Variable | Use Case | Recommended |
+|---|---|---|
+| `GRAPH_LLM_MODEL_NAME` | NER / Ontology | `google/gemini-3-flash-preview` |
+| `GRAPH_LLM_API_KEY` | API Auth | (Your OpenRouter Key) |
+| `GRAPH_LLM_BASE_URL` | API Endpoint | `https://openrouter.ai/api/v1` |
+
+### 3. Embeddings
+Used for vector search and long-term memory retrieval.
+
+| Variable | Local | Cloud |
+|---|---|---|
+| `EMBEDDING_MODEL` | `nomic-embed-text` | `qwen/qwen3-embedding-8b` |
+| `EMBEDDING_BASE_URL`| `http://localhost:11434`| `https://openrouter.ai/api/v1`|
+
+### 4. Database (Neo4j)
+| Variable | Default | Description |
+|---|---|---|
+| `NEO4J_URI` | `bolt://localhost:7687` | Use `bolt://neo4j:7687` for Docker |
+| `NEO4J_USER` | `neo4j` | Default user |
+| `NEO4J_PASSWORD` | `mirofish` | Set during initialization |
 
 ## Architecture
 
@@ -152,7 +202,7 @@ This fork introduces a clean abstraction layer between the application and the g
 │ │ Neo4jStorage │ │
 │ │ ┌───────────────┐ │ │
 │ │ │ EmbeddingService│ ← Ollama │
-│ │ │ NERExtractor │ ← Ollama LLM
+│ │ │ NERExtractor │ ← Extraction LLM│
 │ │ │ SearchService │ ← Hybrid search │
 │ │ └───────────────┘ │ │
 │ └───────────────────┘ │
@@ -174,14 +224,17 @@ This fork introduces a clean abstraction layer between the application and the g
 
 ## Hardware Requirements
 
-| Component | Minimum | Recommended |
-|---|---|---|
-| RAM | 16 GB | 32 GB |
-| VRAM (GPU) | 10 GB (14b model) | 24 GB (32b model) |
-| Disk | 20 GB | 50 GB |
-| CPU | 4 cores | 8+ cores |
+| Component | Minimum (Local) | Recommended (Local) | Cloud Mode |
+|---|---|---|---|
+| **RAM** | 16 GB | 32 GB | 8 GB |
+| **VRAM (GPU)** | 10 GB (14b model) | 24 GB (32b model) | **0 GB** |
+| **Disk** | 20 GB | 50 GB | 10 GB |
+| **CPU** | 4 cores | 8+ cores | 2+ cores |
+
+> [!NOTE]
+> **Cloud Mode** offloads all LLM inference and embeddings to external providers. This is the recommended way to run MiroFish on laptops or hardware without a dedicated NVIDIA GPU. **Hybrid Routing** still benefits from 0 GB VRAM requirements as long as both Reasoning and Extraction models are offloaded.
 
-CPU-only mode works but is significantly slower for LLM inference. For lighter setups, use `qwen2.5:14b` or `qwen2.5:7b`.
+CPU-only mode for local LLMs is possible but not recommended for simulations with >50 agents due to extreme latency.
 
 ## Use Cases
 
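
Reviewer sketch (not part of the commit): the README note says the 0 GB VRAM figure holds only when every model is offloaded. One way to check a given `.env` is to resolve the base URL for each role and flag any that still point at a local host. The helper names `resolve_endpoints` and `local_roles` are hypothetical, and treating `localhost`, `127.0.0.1`, and the `ollama` Docker service name as "local" is an assumption drawn from the example values in `.env.example`.

```python
from urllib.parse import urlparse

# Hostnames treated as local inference in this sketch (an assumption):
# localhost plus the "ollama" Docker service name from .env.example.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "ollama"}

# Defaults copied from the config.py hunk in this commit.
DEFAULT_LLM_BASE_URL = "http://localhost:11434/v1"
DEFAULT_EMBEDDING_BASE_URL = "http://localhost:11434"


def resolve_endpoints(env):
    """Return the base URL each role would use, applying the documented fallbacks."""
    llm = env.get("LLM_BASE_URL", DEFAULT_LLM_BASE_URL)
    return {
        "agent_reasoning": llm,
        "graph_extraction": env.get("GRAPH_LLM_BASE_URL", llm),
        "embeddings": env.get("EMBEDDING_BASE_URL", DEFAULT_EMBEDDING_BASE_URL),
    }


def local_roles(env):
    """List the roles whose endpoint still points at a local host (hypothetical helper)."""
    return sorted(
        role
        for role, url in resolve_endpoints(env).items()
        if urlparse(url).hostname in LOCAL_HOSTS
    )
```

An empty result from `local_roles` would correspond to the Cloud Mode column of the hardware table. Any role it returns still needs local inference capacity.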

backend/app/config.py

Lines changed: 8 additions & 0 deletions
@@ -42,6 +42,14 @@ class Config:
     # Embedding configuration
     EMBEDDING_MODEL = os.environ.get('EMBEDDING_MODEL', 'nomic-embed-text')
     EMBEDDING_BASE_URL = os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434')
+    EMBEDDING_API_KEY = os.environ.get('EMBEDDING_API_KEY', os.environ.get('LLM_API_KEY'))
+
+    # Graph Extraction LLM configuration (separate routing for NER/ontology)
+    # Uses a cheap, fast model for high-volume text processing.
+    # Falls back to primary LLM_* if not set.
+    GRAPH_LLM_API_KEY = os.environ.get('GRAPH_LLM_API_KEY', os.environ.get('LLM_API_KEY'))
+    GRAPH_LLM_BASE_URL = os.environ.get('GRAPH_LLM_BASE_URL', os.environ.get('LLM_BASE_URL', 'http://localhost:11434/v1'))
+    GRAPH_LLM_MODEL_NAME = os.environ.get('GRAPH_LLM_MODEL_NAME', os.environ.get('LLM_MODEL_NAME', 'qwen2.5:32b'))
 
     # File upload configuration
     MAX_CONTENT_LENGTH = 50 * 1024 * 1024 # 50MB
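
Reviewer sketch (not part of the commit): the nested `os.environ.get` calls above can be exercised against a plain dict to see how the fallback behaves. The function `graph_llm_settings` is a hypothetical helper written for this illustration. It applies the same lookups as the three `GRAPH_LLM_*` lines, but to a mapping passed in rather than to the process environment.

```python
def graph_llm_settings(env):
    """Resolve GRAPH_LLM_* from a mapping the way the nested os.environ.get calls do."""
    return {
        # No final default: stays None when neither key is defined.
        "api_key": env.get("GRAPH_LLM_API_KEY", env.get("LLM_API_KEY")),
        "base_url": env.get(
            "GRAPH_LLM_BASE_URL",
            env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        ),
        "model": env.get(
            "GRAPH_LLM_MODEL_NAME",
            env.get("LLM_MODEL_NAME", "qwen2.5:32b"),
        ),
    }


# Only the primary LLM is configured: the graph settings inherit it.
inherited = graph_llm_settings({"LLM_API_KEY": "k", "LLM_MODEL_NAME": "m"})

# A variable that is defined but empty counts as set, so no fallback happens.
empty_key = graph_llm_settings({"GRAPH_LLM_API_KEY": "", "LLM_API_KEY": "k"})
```

The second case is worth noting when editing `.env`: a line such as `GRAPH_LLM_API_KEY=` with no value would, under this lookup pattern, yield an empty string rather than the primary key. Leaving the line commented out is what triggers the fallback.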

backend/app/services/ontology_generator.py

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ class OntologyGenerator:
     """
 
     def __init__(self, llm_client: Optional[LLMClient] = None):
-        self.llm_client = llm_client or LLMClient()
+        self.llm_client = llm_client or LLMClient.for_graph_extraction()
 
     def generate(
         self,
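
Reviewer sketch (not part of the commit): the body of `LLMClient.for_graph_extraction()` is not shown on this page, so the following is only a guess at the shape such a factory could take. `SketchLLMClient`, its three fields, and the optional `env` parameter are all assumptions made for illustration, and `SketchOntologyGenerator` mirrors only the constructor line changed in this hunk.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SketchLLMClient:
    """Stand-in for the project's LLMClient; these fields are assumptions."""

    api_key: Optional[str]
    base_url: str
    model: str

    @classmethod
    def for_graph_extraction(
        cls, env: Optional[Mapping[str, str]] = None
    ) -> "SketchLLMClient":
        """Build a client from GRAPH_LLM_*, falling back to LLM_* as config.py does."""
        env = os.environ if env is None else env
        return cls(
            api_key=env.get("GRAPH_LLM_API_KEY", env.get("LLM_API_KEY")),
            base_url=env.get(
                "GRAPH_LLM_BASE_URL",
                env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
            ),
            model=env.get(
                "GRAPH_LLM_MODEL_NAME",
                env.get("LLM_MODEL_NAME", "qwen2.5:32b"),
            ),
        )


class SketchOntologyGenerator:
    """Mirrors only the constructor line changed in this hunk."""

    def __init__(self, llm_client: Optional[SketchLLMClient] = None):
        # An injected client wins; otherwise default to the graph-extraction client.
        self.llm_client = llm_client or SketchLLMClient.for_graph_extraction()
```

Keeping the `llm_client or ...` form means callers that already inject a client are unaffected by the change. Only the default path moves to the graph-extraction settings.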
