You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Update LightRAG setup: Kimi 2.5 entity extraction, Cerebras+Qwen3, multi-option .env
- Replace single .env example with three clear options:
Option A: Kimi 2.5 + Fireworks (recommended)
Option B: Cerebras + Qwen 3 (fastest ingestion)
Option C: Free local Ollama setup
- Add LLM_BINDING_HOST config for Kimi and Cerebras endpoints
- Add API key signup links for Moonshot, Cerebras, and Fireworks
- Update prerequisites to recommend Kimi 2.5 and Cerebras+Qwen3
- Add entity extraction model comparison table to part3-lightrag-setup.md
- Strengthen recommendation text with specific use-case guidance
- Update troubleshooting slow ingestion to reference new models
- Keep both README.md and part3-lightrag-setup.md in sync
Co-Authored-By: Rob <onerobby@gmail.com>
Copy file name to clipboardExpand all lines: README.md
+44-11Lines changed: 44 additions & 11 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -605,8 +605,8 @@ LightRAG does vector DB + knowledge graph **in parallel** during ingestion. One
605
605
### Prerequisites
606
606
607
607
- Python 3.11+
608
-
- An LLM API key (for entity extraction during ingestion — OpenAI, Anthropic, or any OpenAI-compatible provider)
609
-
- An embedding API key (Fireworks recommended for high-quality 4096-dim embeddings, or use local Ollama)
608
+
- An LLM API key for entity extraction during ingestion — we recommend **Kimi 2.5** (Moonshot AI) for quality or **Cerebras + Qwen 3** for speed. Any OpenAI-compatible provider works (OpenAI, Anthropic, OpenRouter, etc.)
609
+
- An embedding API key — **Fireworks + Qwen3-Embedding-8B**recommended for high-quality 4096-dim embeddings, or use **Ollama + nomic-embed-text** locally for free
610
610
611
611
### Install LightRAG
612
612
@@ -627,24 +627,57 @@ pip install -e ".[api]"
627
627
628
628
Create `~/.hermes/lightrag/.env`:
629
629
630
+
**Option A — Kimi 2.5 + Fireworks (recommended):**
631
+
630
632
```bash
631
-
# LLM for entity extraction (during ingestion)
633
+
#--- LLM for entity extraction (during ingestion) ---
632
634
LLM_BINDING=openai
633
-
LLM_MODEL=kimi-2.5 # What we actually use — great quality/cost ratio
634
-
LLM_BINDING_API_KEY=<your-api-key>
635
+
LLM_MODEL=kimi-2.5 # Best quality/cost ratio for entity extraction
636
+
LLM_BINDING_HOST=https://api.moonshot.cn/v1 # Kimi uses an OpenAI-compatible API
637
+
LLM_BINDING_API_KEY=<your-moonshot-api-key># Get one at https://platform.moonshot.cn
# --- LLM for entity extraction (local, free, quality may vary) ---
664
+
LLM_BINDING=ollama
665
+
LLM_MODEL=qwen3:32b # Or any capable local model
666
+
# No API key needed
667
+
668
+
# --- Embedding model (local, free) ---
669
+
EMBEDDING_BINDING=ollama
670
+
EMBEDDING_MODEL=nomic-embed-text # 768-dim, good quality, 2GB VRAM
671
+
# No API key needed
644
672
```
645
673
646
674
> **Security tip:** Set restrictive permissions on this file: `chmod 600 ~/.hermes/lightrag/.env`
647
675
676
+
> **Where to get API keys:**
677
+
> -**Kimi / Moonshot:**[platform.moonshot.cn](https://platform.moonshot.cn) — sign up, create an API key
678
+
> -**Cerebras:**[cloud.cerebras.ai](https://cloud.cerebras.ai) — free tier available, very generous limits
679
+
> -**Fireworks:**[fireworks.ai](https://fireworks.ai) — sign up for an API key
680
+
648
681
### Entity Extraction Model — What to Use
649
682
650
683
This is the LLM that reads your documents and pulls out entities and relationships during ingestion. Quality here directly determines how good your knowledge graph is.
@@ -657,9 +690,9 @@ This is the LLM that reads your documents and pulls out entities and relationshi
657
690
| Claude Sonnet 4 | Medium | Excellent | Mid-range | Overkill for ingestion but works great |
658
691
|**Ollama local**| Depends on GPU | Unpredictable | Free | Untested for this use case — might mess up entity extraction quality. Use at your own risk |
659
692
660
-
> **Our recommendation:** Kimi 2.5 for quality, Cerebras + Qwen 3 if you're ingesting a lot of documents and speed matters. Both are cheap and reliable. We haven't tested local Ollama for entity extraction — it's free but the extraction quality is unverified and you might get a messy graph.
693
+
> **Our recommendation:**Use **Kimi 2.5** for the best quality-to-cost ratio — it produces clean, accurate entity graphs. Switch to **Cerebras + Qwen 3** if you're doing bulk ingestion and speed is the priority (2000+ tokens/sec means you can ingest hundreds of documents in minutes). Both are cheap and reliable. We haven't tested local Ollama for entity extraction — it's free but the extraction quality is unverified and you might get a messy graph.
661
694
662
-
> **Embedding quality matters.** If you have a GPU with 8GB+ VRAM, run `nomic-embed-text` locally via Ollama for free. If you want the best quality, use Fireworks' Qwen3-Embedding-8B (4096 dimensions) — the search accuracy difference is dramatic.
695
+
> **Embedding quality matters.** If you have a GPU with 8GB+ VRAM, run `nomic-embed-text` locally via Ollama for free. If you want the best quality, use **Fireworks' Qwen3-Embedding-8B** (4096 dimensions) — the search accuracy difference is dramatic. This is the embedding model we recommend regardless of which LLM you choose for entity extraction.
Copy file name to clipboardExpand all lines: part3-lightrag-setup.md
+56-13Lines changed: 56 additions & 13 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -46,8 +46,8 @@ LightRAG does vector DB + knowledge graph **in parallel** during ingestion. One
46
46
### Prerequisites
47
47
48
48
- Python 3.11+
49
-
- An LLM API key (for entity extraction during ingestion — OpenAI, Anthropic, or any OpenAI-compatible provider)
50
-
- An embedding API key (Fireworks recommended for high-quality 4096-dim embeddings, or use local Ollama)
49
+
- An LLM API key for entity extraction during ingestion — we recommend **Kimi 2.5** (Moonshot AI) for quality or **Cerebras + Qwen 3** for speed. Any OpenAI-compatible provider works (OpenAI, Anthropic, OpenRouter, etc.)
50
+
- An embedding API key — **Fireworks + Qwen3-Embedding-8B**recommended for high-quality 4096-dim embeddings, or use **Ollama + nomic-embed-text** locally for free
51
51
52
52
### Install LightRAG
53
53
@@ -68,27 +68,70 @@ pip install -e ".[api]"
68
68
69
69
Create `~/.hermes/lightrag/.env`:
70
70
71
+
**Option A — Kimi 2.5 + Fireworks (recommended):**
72
+
73
+
```bash
74
+
# --- LLM for entity extraction (during ingestion) ---
75
+
LLM_BINDING=openai
76
+
LLM_MODEL=kimi-2.5 # Best quality/cost ratio for entity extraction
77
+
LLM_BINDING_HOST=https://api.moonshot.cn/v1 # Kimi uses an OpenAI-compatible API
78
+
LLM_BINDING_API_KEY=<your-moonshot-api-key># Get one at https://platform.moonshot.cn
# --- LLM for entity extraction (local, free, quality may vary) ---
105
+
LLM_BINDING=ollama
106
+
LLM_MODEL=qwen3:32b # Or any capable local model
107
+
# No API key needed
108
+
109
+
# --- Embedding model (local, free) ---
110
+
EMBEDDING_BINDING=ollama
111
+
EMBEDDING_MODEL=nomic-embed-text # 768-dim, good quality, 2GB VRAM
112
+
# No API key needed
85
113
```
86
114
87
-
> **Security tip:** Set restrictive permissions on this file: `chmod 600 ~/.hermes/lightrag/.env`
115
+
> **Where to get API keys:**
116
+
> -**Kimi / Moonshot:**[platform.moonshot.cn](https://platform.moonshot.cn) — sign up, create an API key
117
+
> -**Cerebras:**[cloud.cerebras.ai](https://cloud.cerebras.ai) — free tier available, very generous limits
118
+
> -**Fireworks:**[fireworks.ai](https://fireworks.ai) — sign up for an API key
119
+
120
+
### Entity Extraction Model — What to Use
121
+
122
+
This is the LLM that reads your documents and pulls out entities and relationships during ingestion. Quality here directly determines how good your knowledge graph is.
|**Kimi 2.5**| Fast | Excellent | Cheap |**What we use.** Great balance of quality, speed, and cost for entity extraction |
127
+
|**Cerebras + Qwen 3**| Blazing fast | Very good | Very cheap |**Fastest option in the world.** Cerebras inference at 2000+ tok/s makes bulk ingestion fly |
128
+
| GPT-4.1-mini | Fast | Good | Cheap | Solid fallback, well-tested |
129
+
| Claude Sonnet 4 | Medium | Excellent | Mid-range | Overkill for ingestion but works great |
130
+
|**Ollama local**| Depends on GPU | Unpredictable | Free | Untested for this use case — might mess up entity extraction quality. Use at your own risk |
88
131
89
-
> **Tip:** Use `gpt-4.1-mini` or `claude-sonnet-4-20250514` for entity extraction. It doesn't need to be your smartest model — it just needs to reliably identify entities and relationships. Cheaper models save money on ingestion.
132
+
> **Our recommendation:** Use **Kimi 2.5** for the best quality-to-cost ratio — it produces clean, accurate entity graphs. Switch to **Cerebras + Qwen 3** if you're doing bulk ingestion and speed is the priority (2000+ tokens/sec means you can ingest hundreds of documents in minutes). Both are cheap and reliable. We haven't tested local Ollama for entity extraction — it's free but the extraction quality is unverified and you might get a messy graph.
90
133
91
-
> **Embedding quality matters.** If you have a GPU with 8GB+ VRAM, run `nomic-embed-text` locally via Ollama for free. If you want the best quality, use Fireworks' Qwen3-Embedding-8B (4096 dimensions) — the search accuracy difference is dramatic.
134
+
> **Embedding quality matters.** If you have a GPU with 8GB+ VRAM, run `nomic-embed-text` locally via Ollama for free. If you want the best quality, use **Fireworks' Qwen3-Embedding-8B** (4096 dimensions) — the search accuracy difference is dramatic. This is the embedding model we recommend regardless of which LLM you choose for entity extraction.
92
135
93
136
---
94
137
@@ -389,7 +432,7 @@ cd ~/.hermes/lightrag/LightRAG && lightrag-server --port 9623
389
432
### Slow ingestion
390
433
391
434
Entity extraction is LLM-bound. Speed it up:
392
-
- Use a faster model for ingestion (GPT-4.1-mini, Claude Haiku)
435
+
- Use a faster model for ingestion (Cerebras + Qwen 3 is the fastest option, or Kimi 2.5)
0 commit comments