Merged
2 changes: 2 additions & 0 deletions README.md
@@ -81,8 +81,10 @@ The key is stored once and shared across all agents on the same machine.
Start your agent and ask naturally:

- *"How is authentication implemented?"*
- *"Find the exact regex or string match for this token parser"*
- *"Show me error handling patterns across services"*
- *"Find similar features to guide my implementation"*
- *"Show me who calls this handler and what it depends on"*

No special commands needed — the agent picks up the skill automatically.

37 changes: 28 additions & 9 deletions skills/codealive-context-engine/SKILL.md
@@ -38,7 +38,8 @@ Do NOT retry the failed script until setup completes successfully.
| Tool | Script | Speed | Cost | Best For |
|------|--------|-------|------|----------|
| **List Data Sources** | `datasources.py` | Instant | Free | Discovering indexed repos and workspaces |
| **Search** | `search.py` | Fast | Low | Finding code locations, descriptions, identifiers |
| **Semantic Search** | `search.py` | Fast | Low | Finding relevant artifacts by meaning |
| **Grep Search** | `grep.py` | Fast | Low | Exact text and regex matches with line previews |
| **Fetch Artifacts** | `fetch.py` | Fast | Low | Retrieving full content for search results |
| **Artifact Relationships** | `relationships.py` | Fast | Low | Drilling into call graph, inheritance, references for one artifact |
| **Chat with Codebase** | `chat.py` | Slow | High | Synthesized answers, architectural explanations |
@@ -85,8 +86,9 @@ python scripts/datasources.py

```bash
python scripts/search.py "JWT token validation" my-backend
python scripts/search.py "error handling patterns" workspace:platform-team --mode deep
python scripts/search.py "authentication flow" my-repo --description-detail full
python scripts/search.py "authentication flow" my-repo --path src/auth --ext .py
python scripts/grep.py "AuthService" my-repo
python scripts/grep.py "auth\\(" my-repo --regex
```

### 3. Fetch full content (for external repos)
@@ -135,11 +137,9 @@ python scripts/search.py <query> <data_sources...> [options]

| Option | Description |
|--------|-------------|
| `--mode auto` | Default. Intelligent semantic search — use 80% of the time |
| `--mode fast` | Quick lexical search for known terms |
| `--mode deep` | Exhaustive search for complex cross-cutting queries. Resource-intensive |
| `--description-detail short` | Default. Brief description of each result |
| `--description-detail full` | More detailed description of each result |
| `--max-results N` | Optional cap for the number of returned artifacts |
| `--path PATH` | Repo-relative path or directory scope (repeatable) |
| `--ext EXT` | File extension scope such as `.py` or `.ts` (repeatable) |

**`description` is a triage pointer ONLY** — it tells you which artifacts are
worth a closer look. It is NOT the source of truth and you must NOT draw
@@ -148,6 +148,25 @@ source: use `fetch.py <identifier>` for external repos, or your editor's
file-read tool on the path for repos in the current working directory. Treat
only that real `content` as ground truth.

### `grep.py` — Exact / Regex Search

Returns artifact-level matches with line previews. Use this when the pattern
itself matters more than semantic similarity.

```bash
python scripts/grep.py <query> <data_sources...> [--regex] [--max-results N] [--path PATH] [--ext EXT]
```

| Option | Description |
|--------|-------------|
| `--regex` | Interpret the query as a regex pattern |
| `--max-results N` | Optional cap for the number of returned artifacts |
| `--path PATH` | Repo-relative path or directory scope (repeatable) |
| `--ext EXT` | File extension scope such as `.py` or `.ts` (repeatable) |

Line previews are still search evidence, not source of truth. Use `fetch.py`
or your local file-read tool before drawing conclusions about behavior.

### `fetch.py` — Fetch Artifact Content

Retrieves the full source code content for artifacts found via search. Use this for external repositories you cannot access locally.
@@ -270,7 +289,7 @@ This skill works standalone, but delivers the best experience when combined with
| Component | What it provides |
|-----------|-----------------|
| **This skill** | Query patterns, workflow guidance, cost-aware tool selection |
| **MCP server** | Direct `codebase_search`, `fetch_artifacts`, `get_artifact_relationships`, `codebase_consultant`, `get_data_sources` tools |
| **MCP server** | Direct `semantic_search`, `grep_search`, `fetch_artifacts`, `get_artifact_relationships`, `codebase_consultant`, `get_data_sources` tools |

When both are installed, prefer the MCP server's tools for direct operations and this skill's scripts for guided workflows.

2 changes: 1 addition & 1 deletion skills/codealive-context-engine/scripts/fetch.py
@@ -15,7 +15,7 @@
# Fetch multiple artifacts
python fetch.py "my-org/backend::src/auth.py::login" "my-org/backend::src/utils.py::helper"

Identifiers come from codebase_search results (the `identifier` field).
Identifiers come from semantic/grep search results (the `identifier` field).
The format is: {owner/repo}::{path}::{symbol} (for symbols/chunks)
{owner/repo}::{path} (for files)

115 changes: 115 additions & 0 deletions skills/codealive-context-engine/scripts/grep.py
@@ -0,0 +1,115 @@
#!/usr/bin/env python3
"""
CodeAlive Grep Search - exact text or regex search across indexed repositories.

Usage:
    python grep.py "AuthService" my-repo
    python grep.py "auth\\(" my-repo --regex --max-results 25
    python grep.py "TODO" workspace:backend-team --path src --ext .py
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent / "lib"))

from api_client import CodeAliveClient


def format_grep_results(results: dict) -> str:
    items = results.get("results", []) if isinstance(results, dict) else []
    if not items:
        return "No results found."

    output = []
    for idx, result in enumerate(items, 1):
        location = result.get("location", {})
        file_path = location.get("path") or result.get("path")
        matches = result.get("matches", [])

        output.append(f"\n--- Result #{idx} [{result.get('kind', 'Artifact')}] ---")
        if file_path:
            output.append(f" File: {file_path}")
        if result.get("identifier"):
            output.append(f" Identifier: {result['identifier']}")
Comment on lines +26 to +34

medium

The result formatting in grep.py is missing logic to check for filePath and to extract the file path from the identifier if explicit path fields are missing. This logic is present in search.py and should be included here for consistency and to ensure the file path is displayed whenever possible.

        location = result.get("location", {})
        file_path = location.get("path") or result.get("filePath") or result.get("path")
        identifier = result.get("identifier", "")
        matches = result.get("matches", [])

        if not file_path and identifier and "::" in identifier:
            parts = identifier.split("::")
            if len(parts) >= 2:
                file_path = parts[1]

        output.append(f"\n--- Result #{idx} [{result.get('kind', 'Artifact')}] ---")
        if file_path:
            output.append(f"  File: {file_path}")
        if identifier:
            output.append(f"  Identifier: {identifier}")
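
Since the comment notes that `search.py` already carries the same fallback, one option is to factor it into a shared helper. The following is only a sketch: the name `resolve_file_path` is hypothetical and does not appear in the PR, and the identifier layout is taken from the `fetch.py` docstring (`{owner/repo}::{path}::{symbol}`).

```python
from typing import Any, Dict, Optional


def resolve_file_path(result: Dict[str, Any]) -> Optional[str]:
    """Return the best available file path for one search result, or None."""
    # Prefer explicit path fields, in the order used by the suggestion above.
    location = result.get("location") or {}
    file_path = location.get("path") or result.get("filePath") or result.get("path")
    if file_path:
        return file_path

    # Fall back to the identifier: {owner/repo}::{path}[::{symbol}].
    identifier = result.get("identifier") or ""
    parts = identifier.split("::")
    if len(parts) >= 2 and parts[1]:
        return parts[1]
    return None
```

Both formatters could then call the helper instead of repeating the lookup chain, which keeps the two scripts from drifting apart again.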

        if result.get("matchCount") is not None:
            output.append(f" Match count: {result['matchCount']}")

        for match in matches:
            output.append(
                " "
                f"{match.get('lineNumber', '?')}:{match.get('startColumn', '?')}-"
                f"{match.get('endColumn', '?')} {match.get('lineText', '')}"
            )

    output.append(
        "\nHint: match previews are search evidence only. Fetch the full source "
        "with `python fetch.py <identifier>` or read the local file before reasoning about behavior."
    )
    return "\n".join(output)


def main():
    if len(sys.argv) < 3:
        print("Error: Missing required arguments.", file=sys.stderr)
        print(
            "Usage: python grep.py <query> <data_source> [data_source2...] "
            "[--regex] [--max-results N] [--path PATH] [--ext EXT]",
            file=sys.stderr,
        )
        sys.exit(1)

    query = sys.argv[1]
    data_sources = []
    paths = []
    extensions = []
    max_results = None
    regex = False

    i = 2
    while i < len(sys.argv):
        arg = sys.argv[i]
        if arg == "--regex":
            regex = True
            i += 1
        elif arg == "--max-results" and i + 1 < len(sys.argv):
            max_results = int(sys.argv[i + 1])
            i += 2
Comment on lines +75 to +77

medium

The int() conversion for --max-results will raise a ValueError and cause the script to crash with a stack trace if a non-integer value is provided. It is better to handle this gracefully with a user-friendly error message.

Suggested change
-        elif arg == "--max-results" and i + 1 < len(sys.argv):
-            max_results = int(sys.argv[i + 1])
-            i += 2
+        elif arg == "--max-results" and i + 1 < len(sys.argv):
+            try:
+                max_results = int(sys.argv[i + 1])
+            except ValueError:
+                print(f"Error: --max-results must be an integer, got '{sys.argv[i + 1]}'", file=sys.stderr)
+                sys.exit(1)
+            i += 2
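
The same conversion appears in all three parsing loops touched by this PR (`grep.py` and the `semantic-search` and `grep-search` commands in `api_client.py`), so the guard could live in one place. A minimal sketch, assuming a hypothetical helper named `parse_int_option` that is not part of the PR:

```python
import sys


def parse_int_option(flag: str, raw: str) -> int:
    """Convert a CLI value to int, exiting with a readable error on failure."""
    try:
        return int(raw)
    except ValueError:
        # Report the offending flag and value instead of a stack trace.
        print(f"Error: {flag} must be an integer, got '{raw}'", file=sys.stderr)
        sys.exit(1)
```

Each loop would then read `max_results = parse_int_option("--max-results", sys.argv[i + 1])`, leaving the surrounding control flow unchanged.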

        elif arg == "--path" and i + 1 < len(sys.argv):
            paths.append(sys.argv[i + 1])
            i += 2
        elif arg == "--ext" and i + 1 < len(sys.argv):
            extensions.append(sys.argv[i + 1])
            i += 2
        elif arg == "--help":
            print(__doc__)
            sys.exit(0)
        else:
            data_sources.append(arg)
            i += 1
Comment on lines +87 to +89

medium

The current argument parsing logic treats any unknown argument as a data source. This can lead to confusing behavior if a user makes a typo in a flag (e.g., --max-result instead of --max-results), as the typo will be added to the list of data sources and likely cause an API error later. It is safer to validate that unknown arguments do not start with --.

        elif arg.startswith("--"):
            print(f"Error: Unknown option '{arg}'", file=sys.stderr)
            sys.exit(1)
        else:
            data_sources.append(arg)
            i += 1
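
An alternative to hand-validating each flag is to let the standard library's `argparse` do it, which covers both this comment and the integer check raised earlier. The sketch below is not the PR's code: `build_parser` and `parse_args` are hypothetical names, and the option set is copied from the usage string in `grep.py`. Note `allow_abbrev=False`; without it, `argparse` would accept `--max-result` as an abbreviation of `--max-results` instead of rejecting the typo.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # allow_abbrev=False makes a mistyped long option an error rather than
    # a silently accepted prefix of a real option.
    parser = argparse.ArgumentParser(prog="grep.py", allow_abbrev=False)
    parser.add_argument("query", help="exact text or regex pattern")
    parser.add_argument("data_sources", nargs="+", help="one or more data source names")
    parser.add_argument("--regex", action="store_true", help="treat the query as a regex")
    parser.add_argument("--max-results", type=int, default=None)
    # action="append" keeps --path and --ext repeatable, as documented.
    parser.add_argument("--path", dest="paths", action="append", default=[])
    parser.add_argument("--ext", dest="extensions", action="append", default=[])
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:]); exits with status 2 on bad input."""
    return build_parser().parse_args(argv)
```

One behavioral difference to weigh: the hand-written loop accepts data sources scattered between flags, whereas this sketch expects them to be listed together after the query, and may reject a stray one that follows an option.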


    if not data_sources:
        print(
            "Error: At least one data source is required. Run datasources.py to see available sources.",
            file=sys.stderr,
        )
        sys.exit(1)

    try:
        client = CodeAliveClient()
        results = client.grep_search(
            query=query,
            data_sources=data_sources,
            paths=paths or None,
            extensions=extensions or None,
            max_results=max_results,
            regex=regex,
        )
        print(format_grep_results(results))
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
125 changes: 125 additions & 0 deletions skills/codealive-context-engine/scripts/lib/api_client.py
@@ -268,6 +268,52 @@ def search(
        }
        return self._make_request("GET", "/api/search", params=params)

    def semantic_search(
        self,
        query: str,
        data_sources: List[str],
        paths: Optional[List[str]] = None,
        extensions: Optional[List[str]] = None,
        max_results: Optional[int] = None,
    ) -> Dict[str, Any]:
        """Search indexed artifacts semantically using the canonical API."""
        params: Dict[str, Any] = {
            "Query": query,
            "Names": data_sources,
        }
        if paths:
            params["Paths"] = paths
        if extensions:
            params["Extensions"] = extensions
        if max_results is not None:
            params["MaxResults"] = max_results

        return self._make_request("GET", "/api/search/semantic", params=params)

    def grep_search(
        self,
        query: str,
        data_sources: List[str],
        paths: Optional[List[str]] = None,
        extensions: Optional[List[str]] = None,
        max_results: Optional[int] = None,
        regex: bool = False,
    ) -> Dict[str, Any]:
        """Search indexed artifacts by exact text or regex using the canonical API."""
        params: Dict[str, Any] = {
            "Query": query,
            "Names": data_sources,
            "Regex": str(regex).lower(),
        }
        if paths:
            params["Paths"] = paths
        if extensions:
            params["Extensions"] = extensions
        if max_results is not None:
            params["MaxResults"] = max_results

        return self._make_request("GET", "/api/search/grep", params=params)
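
Both new methods pass list values (`Names`, `Paths`, `Extensions`) as GET parameters, so the result depends on how `_make_request` serializes lists, which this diff does not show. As an illustration only, the sketch below assumes lists become repeated keys, using the standard library's `urlencode` with `doseq=True`; the function name `build_grep_query` is hypothetical.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def build_grep_query(
    query: str,
    data_sources: List[str],
    paths: Optional[List[str]] = None,
    extensions: Optional[List[str]] = None,
    max_results: Optional[int] = None,
    regex: bool = False,
) -> str:
    """Build the query string for a grep search, assuming repeated-key lists."""
    params: Dict[str, Any] = {
        "Query": query,
        "Names": data_sources,
        "Regex": str(regex).lower(),  # "true" / "false", as in grep_search
    }
    if paths:
        params["Paths"] = paths
    if extensions:
        params["Extensions"] = extensions
    if max_results is not None:
        params["MaxResults"] = max_results
    # doseq=True expands each list into one key=value pair per element.
    return urlencode(params, doseq=True)
```

If the real client encodes lists differently (for example as a comma-separated value), the server would see different parameters, so the actual encoding is worth confirming against `_make_request` before relying on multi-value filters.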

def fetch_artifacts(
self,
identifiers: List[str],
@@ -393,6 +439,8 @@ def main():
        print("Commands:")
        print(" datasources [--all]")
        print(" search <query> <data_source1> [data_source2...] [--mode auto|fast|deep] [--description-detail short|full]")
        print(" semantic-search <query> <data_source1> [data_source2...] [--path PATH] [--ext EXT] [--max-results N]")
        print(" grep-search <query> <data_source1> [data_source2...] [--regex] [--path PATH] [--ext EXT] [--max-results N]")
        print(" fetch <identifier1> [identifier2...]")
        print(" relationships <identifier> [--profile callsOnly|inheritanceOnly|allRelevant|referencesOnly] [--max-count N]")
        print(" chat <question> <data_source1> [data_source2...] [--conversation-id ID]")
@@ -433,6 +481,83 @@ def main():
        result = client.search(query, data_sources, mode, description_detail)
        print(json.dumps(result, indent=2))

    elif command == "semantic-search":
        if len(sys.argv) < 4:
            print("Usage: semantic-search <query> <data_source1> [data_source2...] [--path PATH] [--ext EXT] [--max-results N]")
            sys.exit(1)

        query = sys.argv[2]
        data_sources = []
        paths = []
        extensions = []
        max_results = None

        i = 3
        while i < len(sys.argv):
            arg = sys.argv[i]
            if arg == "--path" and i + 1 < len(sys.argv):
                paths.append(sys.argv[i + 1])
                i += 2
            elif arg == "--ext" and i + 1 < len(sys.argv):
                extensions.append(sys.argv[i + 1])
                i += 2
            elif arg == "--max-results" and i + 1 < len(sys.argv):
                max_results = int(sys.argv[i + 1])
                i += 2
Comment on lines +504 to +506

medium

The int() conversion for --max-results should be wrapped in a try-except block to avoid a stack trace on invalid user input.

Suggested change
-            elif arg == "--max-results" and i + 1 < len(sys.argv):
-                max_results = int(sys.argv[i + 1])
-                i += 2
+            elif arg == "--max-results" and i + 1 < len(sys.argv):
+                try:
+                    max_results = int(sys.argv[i + 1])
+                except ValueError:
+                    print(f"Error: --max-results must be an integer, got '{sys.argv[i + 1]}'", file=sys.stderr)
+                    sys.exit(1)
+                i += 2

            else:
                data_sources.append(arg)
                i += 1

        result = client.semantic_search(
            query,
            data_sources,
            paths=paths or None,
            extensions=extensions or None,
            max_results=max_results,
        )
        print(json.dumps(result, indent=2))

    elif command == "grep-search":
        if len(sys.argv) < 4:
            print("Usage: grep-search <query> <data_source1> [data_source2...] [--regex] [--path PATH] [--ext EXT] [--max-results N]")
            sys.exit(1)

        query = sys.argv[2]
        data_sources = []
        paths = []
        extensions = []
        max_results = None
        regex = False

        i = 3
        while i < len(sys.argv):
            arg = sys.argv[i]
            if arg == "--regex":
                regex = True
                i += 1
            elif arg == "--path" and i + 1 < len(sys.argv):
                paths.append(sys.argv[i + 1])
                i += 2
            elif arg == "--ext" and i + 1 < len(sys.argv):
                extensions.append(sys.argv[i + 1])
                i += 2
            elif arg == "--max-results" and i + 1 < len(sys.argv):
                max_results = int(sys.argv[i + 1])
                i += 2
Comment on lines +544 to +546

medium

The int() conversion for --max-results should be wrapped in a try-except block to avoid a stack trace on invalid user input.

Suggested change
-            elif arg == "--max-results" and i + 1 < len(sys.argv):
-                max_results = int(sys.argv[i + 1])
-                i += 2
+            elif arg == "--max-results" and i + 1 < len(sys.argv):
+                try:
+                    max_results = int(sys.argv[i + 1])
+                except ValueError:
+                    print(f"Error: --max-results must be an integer, got '{sys.argv[i + 1]}'", file=sys.stderr)
+                    sys.exit(1)
+                i += 2

            else:
                data_sources.append(arg)
                i += 1

        result = client.grep_search(
            query,
            data_sources,
            paths=paths or None,
            extensions=extensions or None,
            max_results=max_results,
            regex=regex,
        )
        print(json.dumps(result, indent=2))

    elif command == "fetch":
        if len(sys.argv) < 3:
            print("Usage: fetch <identifier1> [identifier2...]")