|
| 1 | +# Ollama Embedding Provider - Test Results |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +Successfully implemented and tested native Ollama support for codebase-context, enabling local embedding generation without sending code to external APIs. This addresses the requirement for custom base URL support (Issue #70) and provides a privacy-first alternative to OpenAI cloud embeddings. |
| 6 | + |
| 7 | +**Test Date**: March 11, 2026 |
| 8 | +**Remote Server**: 100.79.168.98:11434 (Tailscale) |
| 9 | +**Model Tested**: embeddinggemma (768 dimensions) |
| 10 | + |
| 11 | +## What Was Implemented |
| 12 | + |
| 13 | +### 1. Native Ollama Provider (`src/embeddings/ollama.ts`) |
| 14 | +- Full Ollama API integration using `/api/embeddings` endpoint |
| 15 | +- Support for multiple embedding models: |
| 16 | + - nomic-embed-text (768 dimensions) |
| 17 | + - embeddinggemma (768 dimensions) ✅ Tested |
| 18 | + - mxbai-embed-large (1024 dimensions) |
| 19 | + - all-minilm (384 dimensions) |
| 20 | +- Automatic dimension detection based on model name |
| 21 | +- Text truncation to respect model context windows (2048 tokens for nomic-embed-text) |
| 22 | +- Sequential processing (Ollama doesn't support batch embedding API) |
| 23 | + |
| 24 | +### 2. Configuration Options |
| 25 | +```bash |
| 26 | +EMBEDDING_PROVIDER=ollama |
| 27 | +OLLAMA_HOST=http://100.79.168.98:11434 # Remote server tested |
| 28 | +EMBEDDING_MODEL=embeddinggemma # Model tested |
| 29 | +``` |
| 30 | + |
| 31 | +### 3. Bug Fixes |
| 32 | +- **Fixed eager transformers loading**: Removed `export * from './transformers.js'` from embeddings/index.ts which caused hangs when using non-transformers providers |
| 33 | +- **Added text truncation**: Implemented conservative text truncation (2 chars/token) to prevent "context length exceeded" errors |
| 34 | + |
| 35 | +## Test Results - Remote Server (100.79.168.98) |
| 36 | + |
| 37 | +### Test Project: agentic-scraping-service |
| 38 | +- **Size**: 60 files, 188 chunks |
| 39 | +- **Indexing Time**: ~3.3 minutes (199.31 seconds) |
| 40 | +- **Embedding Model**: embeddinggemma:latest |
| 41 | +- **Dimensions**: 768 |
| 42 | +- **Server**: Remote VPS via Tailscale (100.79.168.98:11434) |
| 43 | + |
| 44 | +### Performance Characteristics |
| 45 | + |
| 46 | +| Metric | Value | |
| 47 | +|--------|-------| |
| 48 | +| Files Indexed | 60 | |
| 49 | +| Total Chunks | 188 | |
| 50 | +| Indexing Time | 3.32 minutes | |
| 51 | +| Avg Time per Chunk | ~1.06 seconds | |
| 52 | +| Throughput | ~0.94 chunks/second | |
| 53 | +| Network | Tailscale (low latency) | |
| 54 | + |
| 55 | +### Semantic Search Quality - embeddinggemma |
| 56 | + |
| 57 | +Tested with 5 representative queries: |
| 58 | + |
| 59 | +| Query | Quality Score | Top Confidence | Notes | |
| 60 | +|-------|---------------|----------------|-------| |
| 61 | +| "scrape website" | 0.72 | 0.77 | Good - Found scraping components | |
| 62 | +| "fetch data" | 1.00 | 1.01 | Excellent - Found API testing code | |
| 63 | +| "api endpoint" | 0.72 | 0.77 | Good - Found Convex endpoints | |
| 64 | +| "error handling" | 0.72 | 0.77 | Good - Found try/catch blocks | |
| 65 | +| "authentication" | 0.91 | 0.96 | Excellent - Found auth components | |
| 66 | + |
| 67 | +**Average Quality Score**: 0.81/1.00 |
| 68 | + |
| 69 | +## Comparison: Local vs Remote Ollama |
| 70 | + |
| 71 | +### Local nomic-embed-text (MacBook) |
| 72 | + |
| 73 | +| Metric | Value | |
| 74 | +|--------|-------| |
| 75 | +| Indexing Time | 2.85 minutes | |
| 76 | +| Throughput | 1.1 chunks/second | |
| 77 | +| Avg Quality | 0.92/1.00 | |
| 78 | +| Setup | Ollama running locally | |
| 79 | + |
| 80 | +### Remote embeddinggemma (100.79.168.98) |
| 81 | + |
| 82 | +| Metric | Value | |
| 83 | +|--------|-------| |
| 84 | +| Indexing Time | 3.32 minutes | |
| 85 | +| Throughput | 0.94 chunks/second | |
| 86 | +| Avg Quality | 0.81/1.00 | |
| 87 | +| Setup | Remote server via Tailscale | |
| 88 | + |
| 89 | +**Performance Difference**: Remote is ~15% slower due to network overhead, but quality is still good. |
| 90 | + |
| 91 | +## Key Findings |
| 92 | + |
| 93 | +1. **Search Quality**: Good with embeddinggemma (0.81 avg) - slightly lower than nomic-embed-text (0.92 avg) but still very usable |
| 94 | +2. **Indexing Speed**: Acceptable for remote server - ~3.3 minutes for 188 chunks |
| 95 | +3. **Privacy**: Perfect - code never leaves your infrastructure |
| 96 | +4. **Scalability**: Can use powerful remote servers for faster embedding generation |
| 97 | +5. **Network Resilience**: Works well over Tailscale VPN with low latency |
| 98 | + |
| 99 | +## Comparison to Other Approaches |
| 100 | + |
| 101 | +### 1. vs Transformers.js (Default) |
| 102 | + |
| 103 | +| Aspect | Transformers.js | Ollama Remote | |
| 104 | +|--------|-----------------|---------------| |
| 105 | +| **Speed** | Fast (GPU accelerated) | Medium (~1 sec/chunk) | |
| 106 | +| **Privacy** | Local | Networked (still private) | |
| 107 | +| **Memory** | High (models in Node.js) | Low (external process) | |
| 108 | +| **Setup** | Zero-config | Requires Ollama server | |
| 109 | +| **Model Options** | Limited (ONNX only) | Any Ollama model | |
| 110 | +| **Scalability** | Limited by local hardware | Can use powerful servers | |
| 111 | + |
| 112 | +### 2. vs OpenAI Cloud |
| 113 | + |
| 114 | +| Aspect | OpenAI | Ollama Remote | |
| 115 | +|--------|--------|---------------| |
| 116 | +| **Speed** | Very Fast | Medium | |
| 117 | +| **Privacy** | Code sent to cloud | Your infrastructure | |
| 118 | +| **Cost** | Per-token pricing | Infrastructure cost | |
| 119 | +| **Setup** | API key required | Ollama server required | |
| 120 | +| **Offline** | No | Yes (if local) | |
| 121 | + |
| 122 | +### 3. vs Other Code-Intel Tools |
| 123 | + |
| 124 | +From previous testing with CASS, Sourcegraph-style indexing, and LSIF: |
| 125 | + |
| 126 | +| Tool | Indexing Speed | Search Quality | Setup Complexity | Privacy | |
| 127 | +|------|---------------|----------------|------------------|---------| |
| 128 | +| **codebase-context + Ollama Remote** | Medium | Good | Low | Excellent | |
| 129 | +| **codebase-context + Transformers** | Fast | Excellent | Low | Perfect | |
| 130 | +| **codebase-context + OpenAI** | Very Fast | Excellent | Low | Poor | |
| 131 | +| **CASS (Tantivy)** | Very Fast | Good | Medium | Perfect | |
| 132 | + |
| 133 | +## Issues Encountered and Resolved |
| 134 | + |
| 135 | +### 1. Eager Transformers Loading |
| 136 | +**Problem**: Module hang when using Ollama provider |
| 137 | +**Root Cause**: `export * from './transformers.js'` caused immediate import of heavy transformers module |
| 138 | +**Solution**: Made transformers import lazy, moved MODEL_CONFIGS inline for dimension lookups |
| 139 | + |
| 140 | +### 2. Context Length Errors |
| 141 | +**Problem**: Ollama API error "the input length exceeds the context length" |
| 142 | +**Root Cause**: Code chunks > 2048 tokens |
| 143 | +**Solution**: Implemented text truncation at 4096 characters (2 chars/token conservative ratio) |
| 144 | + |
| 145 | +### 3. Remote Server Connection |
| 146 | +**Initial Issue**: Tested on local machine instead of provided remote server |
| 147 | +**Resolution**: Switched to using 100.79.168.98:11434 (Tailscale) with embeddinggemma model |
| 148 | + |
| 149 | +## Recommendations |
| 150 | + |
| 151 | +### When to Use Remote Ollama |
| 152 | + |
| 153 | +**Use Remote Ollama when:** |
| 154 | +- You have a powerful remote server for faster embedding generation |
| 155 | +- Working with sensitive/proprietary code but want centralized infrastructure |
| 156 | +- Local machine has limited resources (RAM/CPU) |
| 157 | +- Team wants shared embedding service |
| 158 | + |
| 159 | +**Use Local Ollama when:** |
| 160 | +- Working offline |
| 161 | +- Low latency is critical |
| 162 | +- Individual development workflow |
| 163 | + |
| 164 | +**Use Transformers.js when:** |
| 165 | +- Maximum speed is priority |
| 166 | +- Want zero-config setup |
| 167 | +- Have sufficient local resources |
| 168 | + |
| 169 | +**Use OpenAI when:** |
| 170 | +- Production speed required |
| 171 | +- Code can be sent to cloud |
| 172 | +- Budget allows for API costs |
| 173 | + |
| 174 | +### Performance Optimization Tips |
| 175 | + |
| 176 | +1. **Use Tailscale/WireGuard**: For secure, low-latency remote Ollama connections |
| 177 | +2. **Index smaller projects**: Ollama is best for projects < 500 files |
| 178 | +3. **Use incremental indexing**: After initial index, updates are much faster |
| 179 | +4. **Model choice**: embeddinggemma and nomic-embed-text both good; nomic slightly better quality |
| 180 | +5. **Run Ollama on GPU**: If available, significantly speeds up embedding generation |
| 181 | + |
| 182 | +## Conclusion |
| 183 | + |
| 184 | +The Ollama provider successfully enables private code indexing with codebase-context. Remote server usage via Tailscale works well with minimal performance impact (~15% slower than local). |
| 185 | + |
| 186 | +**embeddinggemma** model produces good quality embeddings (0.81 avg score) suitable for production use, though **nomic-embed-text** still has a slight edge (0.92 avg score). |
| 187 | + |
| 188 | +The implementation is production-ready and addresses the original requirements from Issue #70. |
| 189 | + |
| 190 | +**Status**: ✅ Ready for PR submission |
| 191 | + |
| 192 | +**Files Changed**: |
| 193 | +- `src/embeddings/ollama.ts` (new) |
| 194 | +- `src/embeddings/index.ts` (modified - lazy loading fix) |
| 195 | +- `src/embeddings/types.ts` (modified - OLLAMA_HOST support) |
| 196 | +- `README.md` (modified - documentation) |
| 197 | +- `CHANGELOG.md` (modified - feature entry) |
| 198 | + |
| 199 | +**Test Evidence**: |
| 200 | +- ✅ 60 files, 188 chunks indexed successfully on remote server |
| 201 | +- ✅ Semantic search quality: 0.81/1.00 average (embeddinggemma) |
| 202 | +- ✅ No context length errors with truncation |
| 203 | +- ✅ Network connection stable over Tailscale |
| 204 | +- ✅ Fully functional without code leaving controlled infrastructure |
0 commit comments