Add comprehensive similarity search and neural embedding features
## Summary
- Add neural embedding support with -ie flag for semantic code analysis
- Create similarity_index.py with 6 similarity algorithms (cosine, euclidean, manhattan, dot-product, jaccard, weighted-cosine)
- Implement centralized Ollama management with find_ollama.py
- Add comprehensive test suite covering all functionality
- Integrate caching system for fast similarity queries
- Support custom output files with -o flag for experimentation
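The six measures named above could be sketched roughly as follows. This is an illustration only, not the code in `similarity_index.py`: the commit does not say how `jaccard` or `weighted-cosine` are defined, so the sketch assumes Jaccard is computed over token sets and weighted cosine takes hypothetical per-dimension weights.

```python
import math
from typing import Sequence, Set


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard(tokens_a: Set[str], tokens_b: Set[str]) -> float:
    """Assumed definition: overlap of two token sets (intersection / union)."""
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def weighted_cosine(a: Sequence[float], b: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Assumed definition: cosine with a non-negative weight per dimension."""
    num = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return num / (norm_a * norm_b)
```

Note that cosine, dot-product, jaccard, and weighted-cosine are similarities (higher is closer), while euclidean and manhattan are distances (lower is closer), so any ranking code has to sort them in opposite directions.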
## Key Features Added
- find_ollama.py: Centralized Ollama detection, model management, and embedding generation
- similarity_index.py: Multi-algorithm similarity search with integrated caching
- Comprehensive test suites for both scripts with 60+ test cases
- Enhanced PROJECT_INDEX.json with similarity analysis caching
- Modular architecture following existing find_python.sh pattern
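For orientation, embedding generation against a local Ollama server might look like the sketch below. It is not taken from `find_ollama.py`; the function names are hypothetical. It assumes Ollama's default address (`http://localhost:11434`) and its `/api/embeddings` endpoint, with the `nomic-embed-text` model that the README lists as a requirement.

```python
import json
import urllib.request
from typing import List

OLLAMA_HOST = "http://localhost:11434"  # assumption: Ollama's default local address
EMBED_MODEL = "nomic-embed-text"        # model named in the README requirements


def build_embedding_request(text: str,
                            model: str = EMBED_MODEL,
                            host: str = OLLAMA_HOST) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Ollama to embed `text`."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, timeout: float = 30.0) -> List[float]:
    """Send the request and return the embedding vector.

    Requires a running Ollama server (`ollama serve`) with the model pulled.
    """
    request = build_embedding_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["embedding"]
```

Keeping request construction separate from the network call makes the payload easy to unit-test without a running server.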
## CLI Examples
- python3 scripts/find_ollama.py --status # Check Ollama status
- python3 scripts/similarity_index.py --build-cache --algorithms cosine,euclidean
- python3 scripts/similarity_index.py -q "auth function" --algorithm cosine
- python3 scripts/similarity_index.py --duplicates
- python3 run_tests.py # Run comprehensive test suite
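As a rough picture of what a `--duplicates` pass could do, the sketch below compares every pair of embeddings and reports those above a cosine threshold. The function name, the input shape (a name-to-vector mapping), and the `0.95` threshold are all assumptions for illustration; the commit does not describe the actual implementation.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def find_duplicates(embeddings: Dict[str, Sequence[float]],
                    threshold: float = 0.95) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, score) for every pair scoring >= threshold.

    Pairs are sorted by score, highest first. The comparison is O(n^2)
    in the number of functions, which is fine for a sketch.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

For example, calling `find_duplicates` on a mapping of function names to vectors returns only the pairs whose vectors point in nearly the same direction.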
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
 claude "analyze architecture -ic200" # Export up to 200k to clipboard for external AI
+claude "find similar functions -ie" # Include neural embeddings (requires Ollama)
 
 # Or manually create/update the index anytime
 /index
@@ -92,7 +93,57 @@ claude "architecture review -ic800" # Up to 800k tokens
 - ChatGPT
 - Grok
 
-**Note**: I'm not using this on large projects myself yet - this is inspiration/theory. Your mileage may vary. If you hit snags, have Claude Code update it to work for your specific use case!
+### Neural Embeddings with `-ie` flag
+```bash
+# Generate index with neural embeddings for each function/class
+claude "find similar code patterns -ie" # Includes embeddings
+claude "search for duplicates -ie50" # 50k tokens with embeddings
+```
+
+**Requirements**:
+- Ollama installed and running (`ollama serve`)
+- nomic-embed-text model (auto-downloads if needed)
+
+**Benefits**:
+- Semantic similarity search
+- Find duplicate/similar code patterns
+- Better code understanding through vector representations
+
+### Similarity Search (`similarity_index.py`)
+
+Find similar code patterns using neural embeddings with multiple algorithms: