Commit 246455c

wcygan and claude committed
Add comprehensive similarity search and neural embedding features
## Summary
- Add neural embedding support with -ie flag for semantic code analysis
- Create similarity_index.py with 6 similarity algorithms (cosine, euclidean, manhattan, dot-product, jaccard, weighted-cosine)
- Implement centralized Ollama management with find_ollama.py
- Add comprehensive test suite covering all functionality
- Integrate caching system for fast similarity queries
- Support custom output files with -o flag for experimentation

## Key Features Added
- find_ollama.py: Centralized Ollama detection, model management, and embedding generation
- similarity_index.py: Multi-algorithm similarity search with integrated caching
- Comprehensive test suites for both scripts with 60+ test cases
- Enhanced PROJECT_INDEX.json with similarity analysis caching
- Modular architecture following existing find_python.sh pattern

## CLI Examples
- python3 scripts/find_ollama.py --status # Check Ollama status
- python3 scripts/similarity_index.py --build-cache --algorithms cosine,euclidean
- python3 scripts/similarity_index.py -q "auth function" --algorithm cosine
- python3 scripts/similarity_index.py --duplicates
- python3 run_tests.py # Run comprehensive test suite

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent e44ad19 commit 246455c

9 files changed

Lines changed: 1909 additions & 37 deletions

PROJECT_INDEX.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

README.md

Lines changed: 52 additions & 1 deletion
@@ -26,6 +26,7 @@ Just add `-i` to any Claude prompt:
 claude "fix the auth bug -i" # Auto-creates/uses index (default 50k)
 claude "refactor database code -i75" # Target ~75k tokens (if project needs it)
 claude "analyze architecture -ic200" # Export up to 200k to clipboard for external AI
+claude "find similar functions -ie" # Include neural embeddings (requires Ollama)
 
 # Or manually create/update the index anytime
 /index
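
The new `-ie` flag depends on a local Ollama server for embeddings. The commit's `find_ollama.py` is not rendered in this view, so the following is only a minimal sketch of what such a request could look like. It assumes Ollama's default local address `http://localhost:11434` and its `/api/embeddings` endpoint; the helper names `build_embedding_request` and `fetch_embedding` are hypothetical and not taken from the commit.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_embedding_request(text, model="nomic-embed-text", base_url=OLLAMA_URL):
    """Build (but do not send) a POST request asking for one embedding."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text, timeout=30):
    """Send the request and return the embedding as a list of floats.

    Raises urllib.error.URLError if no server is reachable.
    """
    request = build_embedding_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["embedding"]
```

Splitting request construction from the network call keeps the payload shape checkable without a running server.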
@@ -92,7 +93,57 @@ claude "architecture review -ic800" # Up to 800k tokens
 - ChatGPT
 - Grok
 
-**Note**: I'm not using this on large projects myself yet - this is inspiration/theory. Your mileage may vary. If you hit snags, have Claude Code update it to work for your specific use case!
+### Neural Embeddings with `-ie` flag
+```bash
+# Generate index with neural embeddings for each function/class
+claude "find similar code patterns -ie" # Includes embeddings
+claude "search for duplicates -ie50" # 50k tokens with embeddings
+```
+
+**Requirements**:
+- Ollama installed and running (`ollama serve`)
+- nomic-embed-text model (auto-downloads if needed)
+
+**Benefits**:
+- Semantic similarity search
+- Find duplicate/similar code patterns
+- Better code understanding through vector representations
+
+### Similarity Search (`similarity_index.py`)
+
+Find similar code patterns using neural embeddings with multiple algorithms:
+
+```bash
+# Build similarity cache (enhances PROJECT_INDEX.json)
+python3 ~/.claude-code-project-index/scripts/similarity_index.py --build-cache --algorithms cosine,euclidean
+
+# Search for similar functions
+python3 ~/.claude-code-project-index/scripts/similarity_index.py -q "authentication function"
+python3 ~/.claude-code-project-index/scripts/similarity_index.py -q "validate email" --algorithm euclidean
+
+# Find potential duplicates
+python3 ~/.claude-code-project-index/scripts/similarity_index.py --duplicates --algorithm cosine
+
+# Custom output file for experiments
+python3 ~/.claude-code-project-index/scripts/similarity_index.py --build-cache -o experiment.json --algorithms manhattan
+```
+
+**Available Algorithms:**
+- `cosine`: Standard cosine similarity (default)
+- `euclidean`: Based on Euclidean distance
+- `manhattan`: Based on Manhattan distance (L1 norm)
+- `dot-product`: Raw dot product similarity
+- `jaccard`: Jaccard similarity for binary features
+- `weighted-cosine`: Weighted cosine (requires `--weights weights.json`)
+
+**Features:**
+- Integrated caching in PROJECT_INDEX.json for fast queries
+- Multiple similarity algorithms for different use cases
+- Duplicate detection and similarity search
+- Custom output files with `-o` flag for experimentation
+- Real-time calculation with `--no-cache` for one-off queries
+
+**Note**: Requires embeddings to be generated first with `python3 scripts/project_index.py -e`
 
 ## Token Sizing
 
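
The README's algorithm list names six measures, but this view does not include `similarity_index.py`, so the formulas it actually uses are not visible here. As a rough illustration only, the sketch below gives textbook versions of five of them in plain Python. The `1 / (1 + d)` mapping from a distance to a similarity score and the thresholding used to binarize vectors for Jaccard are assumptions, not confirmed behavior of the script.

```python
import math


def dot_product(a, b):
    """Raw dot product; unbounded and sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return dot_product(a, b) / norm if norm else 0.0


def euclidean(a, b):
    """Similarity from L2 distance d, using the assumed mapping 1 / (1 + d)."""
    return 1.0 / (1.0 + math.dist(a, b))


def manhattan(a, b):
    """Similarity from L1 distance d, using the assumed mapping 1 / (1 + d)."""
    d = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + d)


def jaccard(a, b, threshold=0.0):
    """Binarize each component (value > threshold), then |A & B| / |A | B|."""
    active_a = {i for i, x in enumerate(a) if x > threshold}
    active_b = {i for i, x in enumerate(b) if x > threshold}
    union = active_a | active_b
    return len(active_a & active_b) / len(union) if union else 0.0
```

Cosine ignores vector magnitude, while the dot product and the two distance-based scores do not, which is one reason results can differ between `--algorithm` choices on the same embeddings.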

example_weights.json

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+{
+  "description": "Example weights for weighted-cosine similarity algorithm",
+  "weights": [
+    1.2, 0.8, 1.5, 0.9, 1.1, 0.7, 1.3, 1.0, 0.6, 1.4,
+    0.8, 1.2, 0.9, 1.1, 0.7, 1.3, 1.0, 0.6, 1.4, 0.8
+  ]
+}
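
The example file supplies 20 weights under a `weights` key. One common definition of weighted cosine similarity scales each component's contribution by its weight: `sum(w*a*b) / (sqrt(sum(w*a*a)) * sqrt(sum(w*b*b)))`. Whether `similarity_index.py` uses this definition, and how it treats a weight list whose length differs from the embedding dimension, is not shown in this diff. The sketch below is therefore an assumption: `load_weights` and `weighted_cosine` are hypothetical helpers, and a length mismatch is rejected rather than guessed at.

```python
import json
import math


def load_weights(text):
    """Parse a weights document shaped like example_weights.json."""
    weights = json.loads(text)["weights"]
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return weights


def weighted_cosine(a, b, weights):
    """Cosine similarity with a per-component weight on every product term."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    numerator = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    return numerator / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

With all weights equal to 1.0 this reduces to plain cosine similarity, which makes a convenient sanity check when experimenting with a custom weights file.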

run_tests.py

Lines changed: 262 additions & 0 deletions
@@ -0,0 +1,262 @@
+#!/usr/bin/env python3
+"""
+Test runner for PROJECT_INDEX scripts
+Runs all tests with comprehensive reporting and coverage
+"""
+
+import sys
+import unittest
+import argparse
+from pathlib import Path
+from io import StringIO
+import time
+
+# Add tests directory to path
+sys.path.insert(0, str(Path(__file__).parent / "tests"))
+sys.path.insert(0, str(Path(__file__).parent / "scripts"))
+
+def run_all_tests(verbosity=2, pattern="test_*.py"):
+    """Run all tests with specified verbosity and pattern."""
+    # Discover and run tests
+    test_dir = Path(__file__).parent / "tests"
+    loader = unittest.TestLoader()
+    suite = loader.discover(str(test_dir), pattern=pattern)
+
+    # Custom test result class for better reporting
+    class CustomTestResult(unittest.TextTestResult):
+        def __init__(self, stream, descriptions, verbosity):
+            super().__init__(stream, descriptions, verbosity)
+            self.test_times = []
+            self.start_time = None
+
+        def startTest(self, test):
+            super().startTest(test)
+            self.start_time = time.time()
+
+        def stopTest(self, test):
+            super().stopTest(test)
+            if self.start_time:
+                duration = time.time() - self.start_time
+                self.test_times.append((str(test), duration))
+
+    # Run tests with custom result class
+    stream = sys.stderr if verbosity > 1 else StringIO()
+    runner = unittest.TextTestRunner(
+        stream=stream,
+        verbosity=verbosity,
+        resultclass=CustomTestResult
+    )
+
+    print("🧪 Running PROJECT_INDEX Test Suite")
+    print("=" * 50)
+
+    result = runner.run(suite)
+
+    # Print summary
+    print("\n📊 Test Summary")
+    print("-" * 30)
+    print(f"Tests run: {result.testsRun}")
+    print(f"Failures: {len(result.failures)}")
+    print(f"Errors: {len(result.errors)}")
+    print(f"Skipped: {len(result.skipped) if hasattr(result, 'skipped') else 0}")
+
+    # Print slowest tests
+    if hasattr(result, 'test_times') and result.test_times:
+        print("\n⏱️ Slowest Tests:")
+        sorted_times = sorted(result.test_times, key=lambda x: x[1], reverse=True)
+        for test_name, duration in sorted_times[:5]:
+            print(f" {duration:.3f}s - {test_name}")
+
+    # Print failures and errors
+    if result.failures:
+        print("\n❌ Failures:")
+        for test, traceback in result.failures:
+            print(f" - {test}")
+            if verbosity > 1:
+                print(f" {traceback.strip()}")
+
+    if result.errors:
+        print("\n💥 Errors:")
+        for test, traceback in result.errors:
+            print(f" - {test}")
+            if verbosity > 1:
+                print(f" {traceback.strip()}")
+
+    # Overall result
+    success = len(result.failures) == 0 and len(result.errors) == 0
+    if success:
+        print("\n✅ All tests passed!")
+    else:
+        print("\n❌ Some tests failed!")
+
+    return success
+
+
+def run_specific_test(test_name, verbosity=2):
+    """Run a specific test file or test case."""
+    if not test_name.endswith('.py'):
+        test_name = f"test_{test_name}.py"
+
+    # Load the specific test
+    test_dir = Path(__file__).parent / "tests"
+    loader = unittest.TestLoader()
+
+    try:
+        if '::' in test_name:
+            # Specific test method (e.g., test_find_ollama.py::TestOllamaManager::test_init)
+            parts = test_name.split('::')
+            module_name = parts[0].replace('.py', '')
+
+            if len(parts) == 3:
+                # Module::Class::Method
+                class_name, method_name = parts[1], parts[2]
+                test_path = f"{module_name}.{class_name}.{method_name}"
+            elif len(parts) == 2:
+                # Module::Class or Module::Method
+                test_path = f"{module_name}.{parts[1]}"
+
+            suite = loader.loadTestsFromName(test_path)
+        else:
+            # Entire test file
+            module_name = test_name.replace('.py', '')
+            suite = loader.discover(str(test_dir), pattern=f"{module_name}.py")
+
+        runner = unittest.TextTestRunner(verbosity=verbosity)
+        result = runner.run(suite)
+
+        return len(result.failures) == 0 and len(result.errors) == 0
+
+    except Exception as e:
+        print(f"❌ Error running test '{test_name}': {e}")
+        return False
+
+
+def list_available_tests():
+    """List all available test files and test cases."""
+    test_dir = Path(__file__).parent / "tests"
+
+    print("📋 Available Tests:")
+    print("=" * 30)
+
+    for test_file in sorted(test_dir.glob("test_*.py")):
+        print(f"\n📄 {test_file.name}")
+
+        # Try to parse test classes and methods
+        try:
+            with open(test_file, 'r') as f:
+                content = f.read()
+
+            # Simple regex to find test classes and methods
+            import re
+
+            classes = re.findall(r'class (Test\w+)\(.*?\):', content)
+            for class_name in classes:
+                print(f" 📁 {class_name}")
+
+                # Find methods in this class
+                class_pattern = rf'class {class_name}.*?(?=class|\Z)'
+                class_match = re.search(class_pattern, content, re.DOTALL)
+                if class_match:
+                    class_content = class_match.group(0)
+                    methods = re.findall(r'def (test_\w+)', class_content)
+                    for method in methods:
+                        print(f" 🧪 {method}")
+        except Exception:
+            print(" (Could not parse test structure)")
+
+
+def check_test_dependencies():
+    """Check if all required dependencies for testing are available."""
+    print("🔍 Checking Test Dependencies")
+    print("-" * 30)
+
+    dependencies = {
+        'unittest': 'unittest',
+        'unittest.mock': 'unittest.mock',
+        'json': 'json',
+        'pathlib': 'pathlib'
+    }
+
+    missing = []
+    for name, module in dependencies.items():
+        try:
+            __import__(module)
+            print(f"✅ {name}")
+        except ImportError:
+            print(f"❌ {name}")
+            missing.append(name)
+
+    if missing:
+        print(f"\n❌ Missing dependencies: {', '.join(missing)}")
+        return False
+    else:
+        print("\n✅ All dependencies available!")
+        return True
+
+
+def main():
+    """Main test runner function."""
+    parser = argparse.ArgumentParser(
+        description='Test runner for PROJECT_INDEX scripts',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog='''
+Examples:
+  %(prog)s # Run all tests
+  %(prog)s --list # List available tests
+  %(prog)s --test find_ollama # Run find_ollama tests
+  %(prog)s --test find_ollama.py::TestOllamaManager # Run specific test class
+  %(prog)s --check-deps # Check test dependencies
+  %(prog)s --quiet # Run with minimal output
+  %(prog)s --pattern "test_find*" # Run tests matching pattern
+        '''
+    )
+
+    parser.add_argument('--test', '-t', type=str,
+                        help='Run specific test file or test case')
+    parser.add_argument('--list', '-l', action='store_true',
+                        help='List available tests')
+    parser.add_argument('--check-deps', action='store_true',
+                        help='Check test dependencies')
+    parser.add_argument('--pattern', '-p', default="test_*.py",
+                        help='Test file pattern (default: test_*.py)')
+    parser.add_argument('--quiet', '-q', action='store_true',
+                        help='Minimal output')
+    parser.add_argument('--verbose', '-v', action='store_true',
+                        help='Verbose output')
+
+    args = parser.parse_args()
+
+    # Determine verbosity
+    if args.quiet:
+        verbosity = 0
+    elif args.verbose:
+        verbosity = 3
+    else:
+        verbosity = 2
+
+    # Handle different modes
+    if args.check_deps:
+        return 0 if check_test_dependencies() else 1
+
+    if args.list:
+        list_available_tests()
+        return 0
+
+    if args.test:
+        success = run_specific_test(args.test, verbosity)
+        return 0 if success else 1
+
+    # Run all tests by default
+    success = run_all_tests(verbosity, args.pattern)
+    return 0 if success else 1
+
+
+if __name__ == '__main__':
+    try:
+        sys.exit(main())
+    except KeyboardInterrupt:
+        print("\n❌ Tests interrupted by user")
+        sys.exit(130)
+    except Exception as e:
+        print(f"❌ Unexpected error: {e}")
+        sys.exit(1)
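
One detail in `run_specific_test` is worth a closer look: the `.py` suffix check runs before the `::` split. A selector such as `find_ollama.py::TestOllamaManager` (the form used in the help text) does not end in `.py`, so it is rewritten to `test_find_ollama.py::TestOllamaManager.py` before being split, and the resulting dotted name no longer points at the class. A minimal sketch of one way to normalize selectors is shown below. `selector_to_dotted_name` is a hypothetical helper, not part of the commit, and adding the `test_` prefix whenever it is missing is a design choice of this sketch.

```python
def selector_to_dotted_name(selector):
    """Convert 'module[.py][::Class[::method]]' into a dotted unittest name.

    Splitting on '::' first means the '.py' suffix and the 'test_' prefix
    are only ever applied to the module part of the selector.
    """
    module_part, *rest = selector.split("::")
    if len(rest) > 2:
        raise ValueError(f"expected at most Module::Class::Method, got {selector!r}")

    module = module_part[:-3] if module_part.endswith(".py") else module_part
    if not module.startswith("test_"):
        module = f"test_{module}"

    return ".".join([module, *rest])
```

The returned string could be passed to `unittest.TestLoader().loadTestsFromName`, which relies on the `tests` directory already being on `sys.path`, as the script arranges at import time.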
