Commit e0244cb

Update LightRAG setup: Kimi 2.5 entity extraction, Cerebras+Qwen3, multi-option .env
- Replace single .env example with three clear options:
  - Option A: Kimi 2.5 + Fireworks (recommended)
  - Option B: Cerebras + Qwen 3 (fastest ingestion)
  - Option C: Free local Ollama setup
- Add LLM_BINDING_HOST config for Kimi and Cerebras endpoints
- Add API key signup links for Moonshot, Cerebras, and Fireworks
- Update prerequisites to recommend Kimi 2.5 and Cerebras+Qwen3
- Add entity extraction model comparison table to part3-lightrag-setup.md
- Strengthen recommendation text with specific use-case guidance
- Update troubleshooting slow ingestion to reference new models
- Keep both README.md and part3-lightrag-setup.md in sync

Co-Authored-By: Rob <onerobby@gmail.com>
1 parent 10abd24 commit e0244cb

2 files changed

Lines changed: 100 additions & 24 deletions

README.md

Lines changed: 44 additions & 11 deletions
@@ -605,8 +605,8 @@ LightRAG does vector DB + knowledge graph **in parallel** during ingestion. One
605605
### Prerequisites
606606

607607
- Python 3.11+
608-
- An LLM API key (for entity extraction during ingestion — OpenAI, Anthropic, or any OpenAI-compatible provider)
609-
- An embedding API key (Fireworks recommended for high-quality 4096-dim embeddings, or use local Ollama)
608+
- An LLM API key for entity extraction during ingestion — we recommend **Kimi 2.5** (Moonshot AI) for quality or **Cerebras + Qwen 3** for speed. Any OpenAI-compatible provider works (OpenAI, Anthropic, OpenRouter, etc.)
609+
- An embedding API key: **Fireworks + Qwen3-Embedding-8B** is recommended for high-quality 4096-dim embeddings, or use **Ollama + nomic-embed-text** locally for free
610610

611611
### Install LightRAG
612612

@@ -627,24 +627,57 @@ pip install -e ".[api]"
627627

628628
Create `~/.hermes/lightrag/.env`:
629629

630+
**Option A — Kimi 2.5 + Fireworks (recommended):**
631+
630632
```bash
631-
# LLM for entity extraction (during ingestion)
633+
# --- LLM for entity extraction (during ingestion) ---
632634
LLM_BINDING=openai
633-
LLM_MODEL=kimi-2.5 # What we actually use — great quality/cost ratio
634-
LLM_BINDING_API_KEY=<your-api-key>
635+
LLM_MODEL=kimi-2.5 # Best quality/cost ratio for entity extraction
636+
LLM_BINDING_HOST=https://api.moonshot.cn/v1 # Kimi uses an OpenAI-compatible API
637+
LLM_BINDING_API_KEY=<your-moonshot-api-key> # Get one at https://platform.moonshot.cn
635638

636-
# Embedding model (for vector storage)
639+
# --- Embedding model (for vector storage) ---
640+
EMBEDDING_BINDING=fireworks
641+
EMBEDDING_MODEL=accounts/fireworks/models/qwen3-embedding-8b # 4096-dim, excellent quality
642+
EMBEDDING_API_KEY=<your-fireworks-api-key> # Get one at https://fireworks.ai
643+
```
644+
645+
**Option B — Cerebras + Qwen 3 (fastest ingestion):**
646+
647+
```bash
648+
# --- LLM for entity extraction (during ingestion) ---
649+
LLM_BINDING=openai
650+
LLM_MODEL=qwen-3-32b # Cerebras runs Qwen 3 at 2000+ tok/s
651+
LLM_BINDING_HOST=https://api.cerebras.ai/v1
652+
LLM_BINDING_API_KEY=<your-cerebras-api-key> # Get one at https://cloud.cerebras.ai
653+
654+
# --- Embedding model (for vector storage) ---
637655
EMBEDDING_BINDING=fireworks
638656
EMBEDDING_MODEL=accounts/fireworks/models/qwen3-embedding-8b
639657
EMBEDDING_API_KEY=<your-fireworks-api-key>
658+
```
659+
660+
**Option C — Free local setup (Ollama):**
640661

641-
# Or use local Ollama (free, no API key needed):
642-
# EMBEDDING_BINDING=ollama
643-
# EMBEDDING_MODEL=nomic-embed-text
662+
```bash
663+
# --- LLM for entity extraction (local, free, quality may vary) ---
664+
LLM_BINDING=ollama
665+
LLM_MODEL=qwen3:32b # Or any capable local model
666+
# No API key needed
667+
668+
# --- Embedding model (local, free) ---
669+
EMBEDDING_BINDING=ollama
670+
EMBEDDING_MODEL=nomic-embed-text # 768-dim, good quality, 2GB VRAM
671+
# No API key needed
644672
```
645673

646674
> **Security tip:** Set restrictive permissions on this file: `chmod 600 ~/.hermes/lightrag/.env`
647675
676+
> **Where to get API keys:**
677+
> - **Kimi / Moonshot:** [platform.moonshot.cn](https://platform.moonshot.cn) — sign up, create an API key
678+
> - **Cerebras:** [cloud.cerebras.ai](https://cloud.cerebras.ai) — free tier available, very generous limits
679+
> - **Fireworks:** [fireworks.ai](https://fireworks.ai) — sign up for an API key
680+
648681
### Entity Extraction Model — What to Use
649682

650683
This is the LLM that reads your documents and pulls out entities and relationships during ingestion. Quality here directly determines how good your knowledge graph is.
@@ -657,9 +690,9 @@ This is the LLM that reads your documents and pulls out entities and relationshi
657690
| Claude Sonnet 4 | Medium | Excellent | Mid-range | Overkill for ingestion but works great |
658691
| **Ollama local** | Depends on GPU | Unpredictable | Free | Untested for this use case — might mess up entity extraction quality. Use at your own risk |
659692

660-
> **Our recommendation:** Kimi 2.5 for quality, Cerebras + Qwen 3 if you're ingesting a lot of documents and speed matters. Both are cheap and reliable. We haven't tested local Ollama for entity extraction — it's free but the extraction quality is unverified and you might get a messy graph.
693+
> **Our recommendation:** Use **Kimi 2.5** for the best quality-to-cost ratio — it produces clean, accurate entity graphs. Switch to **Cerebras + Qwen 3** if you're doing bulk ingestion and speed is the priority (2000+ tokens/sec means you can ingest hundreds of documents in minutes). Both are cheap and reliable. We haven't tested local Ollama for entity extraction — it's free but the extraction quality is unverified and you might get a messy graph.
661694
662-
> **Embedding quality matters.** If you have a GPU with 8GB+ VRAM, run `nomic-embed-text` locally via Ollama for free. If you want the best quality, use Fireworks' Qwen3-Embedding-8B (4096 dimensions) — the search accuracy difference is dramatic.
695+
> **Embedding quality matters.** If you have a GPU with 8GB+ VRAM, run `nomic-embed-text` locally via Ollama for free. If you want the best quality, use **Fireworks' Qwen3-Embedding-8B** (4096 dimensions) — the search accuracy difference is dramatic. This is the embedding model we recommend regardless of which LLM you choose for entity extraction.
663696
664697
---
665698
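The three `.env` options above differ mainly in which keys they set, so a quick pre-flight check can catch a half-filled file before the server starts. The following is a minimal sketch, not part of LightRAG: `parse_env` and `missing_keys` are hypothetical helper names, and the rule that every non-Ollama binding needs a host and an API key is an assumption read off Options A, B, and C rather than taken from LightRAG's documentation.

```python
# Keys that all three example options set (Options A, B, and C above).
ALWAYS_REQUIRED = ("LLM_BINDING", "LLM_MODEL", "EMBEDDING_BINDING", "EMBEDDING_MODEL")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment"; the example values never contain " #".
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still a <your-...> placeholder."""
    required = list(ALWAYS_REQUIRED)
    # Assumption: hosted bindings need a host and key; Ollama needs neither.
    if env.get("LLM_BINDING") != "ollama":
        required += ["LLM_BINDING_HOST", "LLM_BINDING_API_KEY"]
    if env.get("EMBEDDING_BINDING") != "ollama":
        required.append("EMBEDDING_API_KEY")
    return [
        key for key in required
        if not env.get(key) or env[key].startswith("<your-")
    ]
```

To check the real file, read it with `pathlib.Path("~/.hermes/lightrag/.env").expanduser().read_text()` and pass the text to `parse_env`; an empty list from `missing_keys` means every key used by the chosen option is filled in.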

part3-lightrag-setup.md

Lines changed: 56 additions & 13 deletions
@@ -46,8 +46,8 @@ LightRAG does vector DB + knowledge graph **in parallel** during ingestion. One
4646
### Prerequisites
4747

4848
- Python 3.11+
49-
- An LLM API key (for entity extraction during ingestion — OpenAI, Anthropic, or any OpenAI-compatible provider)
50-
- An embedding API key (Fireworks recommended for high-quality 4096-dim embeddings, or use local Ollama)
49+
- An LLM API key for entity extraction during ingestion — we recommend **Kimi 2.5** (Moonshot AI) for quality or **Cerebras + Qwen 3** for speed. Any OpenAI-compatible provider works (OpenAI, Anthropic, OpenRouter, etc.)
50+
- An embedding API key: **Fireworks + Qwen3-Embedding-8B** is recommended for high-quality 4096-dim embeddings, or use **Ollama + nomic-embed-text** locally for free
5151

5252
### Install LightRAG
5353

@@ -68,27 +68,70 @@ pip install -e ".[api]"
6868

6969
Create `~/.hermes/lightrag/.env`:
7070

71+
**Option A — Kimi 2.5 + Fireworks (recommended):**
72+
73+
```bash
74+
# --- LLM for entity extraction (during ingestion) ---
75+
LLM_BINDING=openai
76+
LLM_MODEL=kimi-2.5 # Best quality/cost ratio for entity extraction
77+
LLM_BINDING_HOST=https://api.moonshot.cn/v1 # Kimi uses an OpenAI-compatible API
78+
LLM_BINDING_API_KEY=<your-moonshot-api-key> # Get one at https://platform.moonshot.cn
79+
80+
# --- Embedding model (for vector storage) ---
81+
EMBEDDING_BINDING=fireworks
82+
EMBEDDING_MODEL=accounts/fireworks/models/qwen3-embedding-8b # 4096-dim, excellent quality
83+
EMBEDDING_API_KEY=<your-fireworks-api-key> # Get one at https://fireworks.ai
84+
```
85+
86+
**Option B — Cerebras + Qwen 3 (fastest ingestion):**
87+
7188
```bash
72-
# LLM for entity extraction (during ingestion)
89+
# --- LLM for entity extraction (during ingestion) ---
7390
LLM_BINDING=openai
74-
LLM_MODEL=gpt-4.1-mini
75-
LLM_BINDING_API_KEY=<your-openai-api-key>
91+
LLM_MODEL=qwen-3-32b # Cerebras runs Qwen 3 at 2000+ tok/s
92+
LLM_BINDING_HOST=https://api.cerebras.ai/v1
93+
LLM_BINDING_API_KEY=<your-cerebras-api-key> # Get one at https://cloud.cerebras.ai
7694

77-
# Embedding model (for vector storage)
95+
# --- Embedding model (for vector storage) ---
7896
EMBEDDING_BINDING=fireworks
7997
EMBEDDING_MODEL=accounts/fireworks/models/qwen3-embedding-8b
8098
EMBEDDING_API_KEY=<your-fireworks-api-key>
99+
```
81100

82-
# Or use local Ollama (free, no API key needed):
83-
# EMBEDDING_BINDING=ollama
84-
# EMBEDDING_MODEL=nomic-embed-text
101+
**Option C — Free local setup (Ollama):**
102+
103+
```bash
104+
# --- LLM for entity extraction (local, free, quality may vary) ---
105+
LLM_BINDING=ollama
106+
LLM_MODEL=qwen3:32b # Or any capable local model
107+
# No API key needed
108+
109+
# --- Embedding model (local, free) ---
110+
EMBEDDING_BINDING=ollama
111+
EMBEDDING_MODEL=nomic-embed-text # 768-dim, good quality, 2GB VRAM
112+
# No API key needed
85113
```
86114

87-
> **Security tip:** Set restrictive permissions on this file: `chmod 600 ~/.hermes/lightrag/.env`
115+
> **Where to get API keys:**
116+
> - **Kimi / Moonshot:** [platform.moonshot.cn](https://platform.moonshot.cn) — sign up, create an API key
117+
> - **Cerebras:** [cloud.cerebras.ai](https://cloud.cerebras.ai) — free tier available, very generous limits
118+
> - **Fireworks:** [fireworks.ai](https://fireworks.ai) — sign up for an API key
119+
120+
### Entity Extraction Model — What to Use
121+
122+
This is the LLM that reads your documents and pulls out entities and relationships during ingestion. Quality here directly determines how good your knowledge graph is.
123+
124+
| Model | Speed | Quality | Cost | Recommendation |
125+
|-------|-------|---------|------|----------------|
126+
| **Kimi 2.5** | Fast | Excellent | Cheap | **What we use.** Great balance of quality, speed, and cost for entity extraction |
127+
| **Cerebras + Qwen 3** | Blazing fast | Very good | Very cheap | **Fastest option in the world.** Cerebras inference at 2000+ tok/s makes bulk ingestion fly |
128+
| GPT-4.1-mini | Fast | Good | Cheap | Solid fallback, well-tested |
129+
| Claude Sonnet 4 | Medium | Excellent | Mid-range | Overkill for ingestion but works great |
130+
| **Ollama local** | Depends on GPU | Unpredictable | Free | Untested for this use case — might mess up entity extraction quality. Use at your own risk |
88131

89-
> **Tip:** Use `gpt-4.1-mini` or `claude-sonnet-4-20250514` for entity extraction. It doesn't need to be your smartest model — it just needs to reliably identify entities and relationships. Cheaper models save money on ingestion.
132+
> **Our recommendation:** Use **Kimi 2.5** for the best quality-to-cost ratio — it produces clean, accurate entity graphs. Switch to **Cerebras + Qwen 3** if you're doing bulk ingestion and speed is the priority (2000+ tokens/sec means you can ingest hundreds of documents in minutes). Both are cheap and reliable. We haven't tested local Ollama for entity extraction — it's free but the extraction quality is unverified and you might get a messy graph.
90133
91-
> **Embedding quality matters.** If you have a GPU with 8GB+ VRAM, run `nomic-embed-text` locally via Ollama for free. If you want the best quality, use Fireworks' Qwen3-Embedding-8B (4096 dimensions) — the search accuracy difference is dramatic.
134+
> **Embedding quality matters.** If you have a GPU with 8GB+ VRAM, run `nomic-embed-text` locally via Ollama for free. If you want the best quality, use **Fireworks' Qwen3-Embedding-8B** (4096 dimensions) — the search accuracy difference is dramatic. This is the embedding model we recommend regardless of which LLM you choose for entity extraction.
92135
93136
---
94137

@@ -389,7 +432,7 @@ cd ~/.hermes/lightrag/LightRAG && lightrag-server --port 9623
389432
### Slow ingestion
390433

391434
Entity extraction is LLM-bound. Speed it up:
392-
- Use a faster model for ingestion (GPT-4.1-mini, Claude Haiku)
435+
- Use a faster model for ingestion (Cerebras + Qwen 3 is the fastest option, or Kimi 2.5)
393436
- Process documents in parallel batches
394437
- Use a local model if you have GPU capacity
395438
