Commit b001f13

Add canonical semantic and grep skill flows
1 parent 4b7dbb2 commit b001f13

8 files changed

Lines changed: 368 additions & 83 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -81,8 +81,10 @@ The key is stored once and shared across all agents on the same machine.
 Start your agent and ask naturally:
 
 - *"How is authentication implemented?"*
+- *"Find the exact regex or string match for this token parser"*
 - *"Show me error handling patterns across services"*
 - *"Find similar features to guide my implementation"*
+- *"Show me who calls this handler and what it depends on"*
 
 No special commands needed — the agent picks up the skill automatically.

skills/codealive-context-engine/SKILL.md

Lines changed: 28 additions & 9 deletions
@@ -38,7 +38,8 @@ Do NOT retry the failed script until setup completes successfully.
 | Tool | Script | Speed | Cost | Best For |
 |------|--------|-------|------|----------|
 | **List Data Sources** | `datasources.py` | Instant | Free | Discovering indexed repos and workspaces |
-| **Search** | `search.py` | Fast | Low | Finding code locations, descriptions, identifiers |
+| **Semantic Search** | `search.py` | Fast | Low | Finding relevant artifacts by meaning |
+| **Grep Search** | `grep.py` | Fast | Low | Exact text and regex matches with line previews |
 | **Fetch Artifacts** | `fetch.py` | Fast | Low | Retrieving full content for search results |
 | **Artifact Relationships** | `relationships.py` | Fast | Low | Drilling into call graph, inheritance, references for one artifact |
 | **Chat with Codebase** | `chat.py` | Slow | High | Synthesized answers, architectural explanations |
@@ -85,8 +86,9 @@ python scripts/datasources.py
 
 ```bash
 python scripts/search.py "JWT token validation" my-backend
-python scripts/search.py "error handling patterns" workspace:platform-team --mode deep
-python scripts/search.py "authentication flow" my-repo --description-detail full
+python scripts/search.py "authentication flow" my-repo --path src/auth --ext .py
+python scripts/grep.py "AuthService" my-repo
+python scripts/grep.py "auth\\(" my-repo --regex
 ```
 
 ### 3. Fetch full content (for external repos)
@@ -135,11 +137,9 @@ python scripts/search.py <query> <data_sources...> [options]
 
 | Option | Description |
 |--------|-------------|
-| `--mode auto` | Default. Intelligent semantic search — use 80% of the time |
-| `--mode fast` | Quick lexical search for known terms |
-| `--mode deep` | Exhaustive search for complex cross-cutting queries. Resource-intensive |
-| `--description-detail short` | Default. Brief description of each result |
-| `--description-detail full` | More detailed description of each result |
+| `--max-results N` | Optional cap for the number of returned artifacts |
+| `--path PATH` | Repo-relative path or directory scope (repeatable) |
+| `--ext EXT` | File extension scope such as `.py` or `.ts` (repeatable) |
 
 **`description` is a triage pointer ONLY** — it tells you which artifacts are
 worth a closer look. It is NOT the source of truth and you must NOT draw
@@ -148,6 +148,25 @@ source: use `fetch.py <identifier>` for external repos, or your editor's
 file-read tool on the path for repos in the current working directory. Treat
 only that real `content` as ground truth.
 
+### `grep.py` — Exact / Regex Search
+
+Returns artifact-level matches with line previews. Use this when the pattern
+itself matters more than semantic similarity.
+
+```bash
+python scripts/grep.py <query> <data_sources...> [--regex] [--max-results N] [--path PATH] [--ext EXT]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--regex` | Interpret the query as a regex pattern |
+| `--max-results N` | Optional cap for the number of returned artifacts |
+| `--path PATH` | Repo-relative path or directory scope (repeatable) |
+| `--ext EXT` | File extension scope such as `.py` or `.ts` (repeatable) |
+
+Line previews are still search evidence, not source of truth. Use `fetch.py`
+or your local file-read tool before drawing conclusions about behavior.
+
 ### `fetch.py` — Fetch Artifact Content
 
 Retrieves the full source code content for artifacts found via search. Use this for external repositories you cannot access locally.
@@ -270,7 +289,7 @@ This skill works standalone, but delivers the best experience when combined with
 | Component | What it provides |
 |-----------|-----------------|
 | **This skill** | Query patterns, workflow guidance, cost-aware tool selection |
-| **MCP server** | Direct `codebase_search`, `fetch_artifacts`, `get_artifact_relationships`, `codebase_consultant`, `get_data_sources` tools |
+| **MCP server** | Direct `semantic_search`, `grep_search`, `fetch_artifacts`, `get_artifact_relationships`, `codebase_consultant`, `get_data_sources` tools |
 
 When both are installed, prefer the MCP server's tools for direct operations and this skill's scripts for guided workflows.
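
The `--regex` example added to SKILL.md passes `"auth\\("`, which a POSIX shell delivers to the script as `auth\(`. The sketch below illustrates why the parenthesis is escaped. It uses Python's `re` module purely as a stand-in: this commit does not say which regex dialect the grep endpoint uses, so the engine's behavior is an assumption, and `SAMPLE` and `is_valid_regex` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical source line, used only to illustrate matching.
SAMPLE = "if auth(token):"


def is_valid_regex(pattern: str) -> bool:
    """Return True if Python's `re` module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Unescaped, "(" opens a group that is never closed, so the pattern is rejected.
print(is_valid_regex("auth("))

# Escaped, the pattern matches the literal text "auth(".
print(re.search(r"auth\(", SAMPLE) is not None)

# A plain-text query (no --regex) needs no escaping at all.
print("auth(" in SAMPLE)
```

For a literal string such as `auth(`, omitting `--regex` avoids the escaping question altogether.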

skills/codealive-context-engine/scripts/fetch.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@
1515
# Fetch multiple artifacts
1616
python fetch.py "my-org/backend::src/auth.py::login" "my-org/backend::src/utils.py::helper"
1717
18-
Identifiers come from codebase_search results (the `identifier` field).
18+
Identifiers come from semantic/grep search results (the `identifier` field).
1919
The format is: {owner/repo}::{path}::{symbol} (for symbols/chunks)
2020
{owner/repo}::{path} (for files)
2121
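
The docstring above describes two identifier shapes, `{owner/repo}::{path}::{symbol}` and `{owner/repo}::{path}`. A minimal sketch of splitting such a string follows. `ArtifactId` and `parse_identifier` are hypothetical helpers, not part of this commit, and the sketch assumes that `::` never appears inside the repository or path component.

```python
from typing import NamedTuple, Optional


class ArtifactId(NamedTuple):
    """Hypothetical container for the parts of an artifact identifier."""
    repo: str
    path: str
    symbol: Optional[str]  # None for file-level identifiers


def parse_identifier(identifier: str) -> ArtifactId:
    """Split '{owner/repo}::{path}[::{symbol}]' into its components."""
    # maxsplit=2 keeps any further "::" inside the symbol component.
    parts = identifier.split("::", 2)
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"Unrecognized identifier: {identifier!r}")
    symbol = parts[2] if len(parts) == 3 else None
    return ArtifactId(parts[0], parts[1], symbol)


print(parse_identifier("my-org/backend::src/auth.py::login"))
print(parse_identifier("my-org/backend::src/auth.py"))
```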
skills/codealive-context-engine/scripts/grep.py

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+#!/usr/bin/env python3
+"""
+CodeAlive Grep Search - exact text or regex search across indexed repositories.
+
+Usage:
+    python grep.py "AuthService" my-repo
+    python grep.py "auth\\(" my-repo --regex --max-results 25
+    python grep.py "TODO" workspace:backend-team --path src --ext .py
+"""
+
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).parent / "lib"))
+
+from api_client import CodeAliveClient
+
+
+def format_grep_results(results: dict) -> str:
+    items = results.get("results", []) if isinstance(results, dict) else []
+    if not items:
+        return "No results found."
+
+    output = []
+    for idx, result in enumerate(items, 1):
+        location = result.get("location", {})
+        file_path = location.get("path") or result.get("path")
+        matches = result.get("matches", [])
+
+        output.append(f"\n--- Result #{idx} [{result.get('kind', 'Artifact')}] ---")
+        if file_path:
+            output.append(f" File: {file_path}")
+        if result.get("identifier"):
+            output.append(f" Identifier: {result['identifier']}")
+        if result.get("matchCount") is not None:
+            output.append(f" Match count: {result['matchCount']}")
+
+        for match in matches:
+            output.append(
+                " "
+                f"{match.get('lineNumber', '?')}:{match.get('startColumn', '?')}-"
+                f"{match.get('endColumn', '?')} {match.get('lineText', '')}"
+            )
+
+    output.append(
+        "\nHint: match previews are search evidence only. Fetch the full source "
+        "with `python fetch.py <identifier>` or read the local file before reasoning about behavior."
+    )
+    return "\n".join(output)
+
+
+def main():
+    if len(sys.argv) < 3:
+        print("Error: Missing required arguments.", file=sys.stderr)
+        print(
+            "Usage: python grep.py <query> <data_source> [data_source2...] "
+            "[--regex] [--max-results N] [--path PATH] [--ext EXT]",
+            file=sys.stderr,
+        )
+        sys.exit(1)
+
+    query = sys.argv[1]
+    data_sources = []
+    paths = []
+    extensions = []
+    max_results = None
+    regex = False
+
+    i = 2
+    while i < len(sys.argv):
+        arg = sys.argv[i]
+        if arg == "--regex":
+            regex = True
+            i += 1
+        elif arg == "--max-results" and i + 1 < len(sys.argv):
+            max_results = int(sys.argv[i + 1])
+            i += 2
+        elif arg == "--path" and i + 1 < len(sys.argv):
+            paths.append(sys.argv[i + 1])
+            i += 2
+        elif arg == "--ext" and i + 1 < len(sys.argv):
+            extensions.append(sys.argv[i + 1])
+            i += 2
+        elif arg == "--help":
+            print(__doc__)
+            sys.exit(0)
+        else:
+            data_sources.append(arg)
+            i += 1
+
+    if not data_sources:
+        print(
+            "Error: At least one data source is required. Run datasources.py to see available sources.",
+            file=sys.stderr,
+        )
+        sys.exit(1)
+
+    try:
+        client = CodeAliveClient()
+        results = client.grep_search(
+            query=query,
+            data_sources=data_sources,
+            paths=paths or None,
+            extensions=extensions or None,
+            max_results=max_results,
+            regex=regex,
+        )
+        print(format_grep_results(results))
+    except Exception as e:
+        print(f"Error: {e}", file=sys.stderr)
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
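
The hand-rolled loop in `main()` has two edge cases worth noting. A trailing `--path`, `--ext`, or `--max-results` with no value falls through to the `else` branch and is appended as a data source, and a non-numeric `--max-results` raises `ValueError` outside the `try` block. As a sketch only, the same command line could be described with the standard-library `argparse`, which rejects both cases. `build_parser` is a hypothetical name, not part of this commit, and the flags mirror those in the usage string above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse equivalent of the grep.py command line."""
    parser = argparse.ArgumentParser(
        prog="grep.py",
        description="Exact text or regex search across indexed repositories.",
    )
    parser.add_argument("query", help="Text or regex pattern to search for")
    parser.add_argument("data_sources", nargs="+", metavar="data_source")
    parser.add_argument("--regex", action="store_true",
                        help="Interpret the query as a regex pattern")
    parser.add_argument("--max-results", type=int, default=None, metavar="N")
    # action="append" makes --path and --ext repeatable; both default to None.
    parser.add_argument("--path", action="append", dest="paths", metavar="PATH")
    parser.add_argument("--ext", action="append", dest="extensions", metavar="EXT")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["TODO", "workspace:backend-team", "--path", "src", "--ext", ".py"]
    )
    print(args)
```

One behavioral difference to keep in mind: `argparse` treats a query that begins with `-` as an option unless it is preceded by `--`, whereas the original loop always takes `sys.argv[1]` as the query.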

skills/codealive-context-engine/scripts/lib/api_client.py

Lines changed: 125 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -268,6 +268,52 @@ def search(
268268
}
269269
return self._make_request("GET", "/api/search", params=params)
270270

271+
def semantic_search(
272+
self,
273+
query: str,
274+
data_sources: List[str],
275+
paths: Optional[List[str]] = None,
276+
extensions: Optional[List[str]] = None,
277+
max_results: Optional[int] = None,
278+
) -> Dict[str, Any]:
279+
"""Search indexed artifacts semantically using the canonical API."""
280+
params: Dict[str, Any] = {
281+
"Query": query,
282+
"Names": data_sources,
283+
}
284+
if paths:
285+
params["Paths"] = paths
286+
if extensions:
287+
params["Extensions"] = extensions
288+
if max_results is not None:
289+
params["MaxResults"] = max_results
290+
291+
return self._make_request("GET", "/api/search/semantic", params=params)
292+
293+
def grep_search(
294+
self,
295+
query: str,
296+
data_sources: List[str],
297+
paths: Optional[List[str]] = None,
298+
extensions: Optional[List[str]] = None,
299+
max_results: Optional[int] = None,
300+
regex: bool = False,
301+
) -> Dict[str, Any]:
302+
"""Search indexed artifacts by exact text or regex using the canonical API."""
303+
params: Dict[str, Any] = {
304+
"Query": query,
305+
"Names": data_sources,
306+
"Regex": str(regex).lower(),
307+
}
308+
if paths:
309+
params["Paths"] = paths
310+
if extensions:
311+
params["Extensions"] = extensions
312+
if max_results is not None:
313+
params["MaxResults"] = max_results
314+
315+
return self._make_request("GET", "/api/search/grep", params=params)
316+
271317
def fetch_artifacts(
272318
self,
273319
identifiers: List[str],
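
Both new methods pass list values (`Names`, `Paths`, `Extensions`) as GET parameters. This diff does not show how `_make_request` serializes them, so the following is only a sketch under the assumption that each list becomes a repeated query key. `build_grep_params` and `to_query_string` are hypothetical helpers that mirror the parameter names in `grep_search`.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import parse_qs, urlencode


def build_grep_params(
    query: str,
    data_sources: List[str],
    paths: Optional[List[str]] = None,
    extensions: Optional[List[str]] = None,
    max_results: Optional[int] = None,
    regex: bool = False,
) -> Dict[str, Any]:
    """Mirror the parameter dict assembled in grep_search."""
    params: Dict[str, Any] = {
        "Query": query,
        "Names": data_sources,
        "Regex": str(regex).lower(),  # "true" / "false"
    }
    if paths:
        params["Paths"] = paths
    if extensions:
        params["Extensions"] = extensions
    if max_results is not None:
        params["MaxResults"] = max_results
    return params


def to_query_string(params: Dict[str, Any]) -> str:
    """Assumed encoding: doseq=True emits one key=value pair per list item."""
    return urlencode(params, doseq=True)


if __name__ == "__main__":
    params = build_grep_params(
        "auth\\(", ["my-repo", "workspace:backend-team"], paths=["src"], regex=True
    )
    query_string = to_query_string(params)
    print(query_string)
    # Round-trip to confirm the repeated keys decode back into lists.
    print(parse_qs(query_string))
```

If the real client relies on an HTTP library, that library's own list handling would apply instead of `urlencode`.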
@@ -393,6 +439,8 @@ def main():
         print("Commands:")
         print(" datasources [--all]")
         print(" search <query> <data_source1> [data_source2...] [--mode auto|fast|deep] [--description-detail short|full]")
+        print(" semantic-search <query> <data_source1> [data_source2...] [--path PATH] [--ext EXT] [--max-results N]")
+        print(" grep-search <query> <data_source1> [data_source2...] [--regex] [--path PATH] [--ext EXT] [--max-results N]")
         print(" fetch <identifier1> [identifier2...]")
         print(" relationships <identifier> [--profile callsOnly|inheritanceOnly|allRelevant|referencesOnly] [--max-count N]")
         print(" chat <question> <data_source1> [data_source2...] [--conversation-id ID]")
@@ -433,6 +481,83 @@ def main():
             result = client.search(query, data_sources, mode, description_detail)
             print(json.dumps(result, indent=2))
 
+        elif command == "semantic-search":
+            if len(sys.argv) < 4:
+                print("Usage: semantic-search <query> <data_source1> [data_source2...] [--path PATH] [--ext EXT] [--max-results N]")
+                sys.exit(1)
+
+            query = sys.argv[2]
+            data_sources = []
+            paths = []
+            extensions = []
+            max_results = None
+
+            i = 3
+            while i < len(sys.argv):
+                arg = sys.argv[i]
+                if arg == "--path" and i + 1 < len(sys.argv):
+                    paths.append(sys.argv[i + 1])
+                    i += 2
+                elif arg == "--ext" and i + 1 < len(sys.argv):
+                    extensions.append(sys.argv[i + 1])
+                    i += 2
+                elif arg == "--max-results" and i + 1 < len(sys.argv):
+                    max_results = int(sys.argv[i + 1])
+                    i += 2
+                else:
+                    data_sources.append(arg)
+                    i += 1
+
+            result = client.semantic_search(
+                query,
+                data_sources,
+                paths=paths or None,
+                extensions=extensions or None,
+                max_results=max_results,
+            )
+            print(json.dumps(result, indent=2))
+
+        elif command == "grep-search":
+            if len(sys.argv) < 4:
+                print("Usage: grep-search <query> <data_source1> [data_source2...] [--regex] [--path PATH] [--ext EXT] [--max-results N]")
+                sys.exit(1)
+
+            query = sys.argv[2]
+            data_sources = []
+            paths = []
+            extensions = []
+            max_results = None
+            regex = False
+
+            i = 3
+            while i < len(sys.argv):
+                arg = sys.argv[i]
+                if arg == "--regex":
+                    regex = True
+                    i += 1
+                elif arg == "--path" and i + 1 < len(sys.argv):
+                    paths.append(sys.argv[i + 1])
+                    i += 2
+                elif arg == "--ext" and i + 1 < len(sys.argv):
+                    extensions.append(sys.argv[i + 1])
+                    i += 2
+                elif arg == "--max-results" and i + 1 < len(sys.argv):
+                    max_results = int(sys.argv[i + 1])
+                    i += 2
+                else:
+                    data_sources.append(arg)
+                    i += 1
+
+            result = client.grep_search(
+                query,
+                data_sources,
+                paths=paths or None,
+                extensions=extensions or None,
+                max_results=max_results,
+                regex=regex,
+            )
+            print(json.dumps(result, indent=2))
+
         elif command == "fetch":
             if len(sys.argv) < 3:
                 print("Usage: fetch <identifier1> [identifier2...]")
