Commit 4160f2f

Merge pull request #3 from CodeAlive-AI/feature/datasource-relevance-filter
Add datasource relevance filter support (--query)
2 parents ad298b7 + 1d54983 commit 4160f2f

8 files changed

Lines changed: 335 additions & 28 deletions

.claude-plugin/plugin.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
   "name": "codealive",
   "description": "CodeAlive context engine for semantic code search and AI-powered codebase Q&A. Enables AI coding agents to understand entire codebases beyond just open files — search across all indexed repositories, trace cross-service dependencies, discover usage patterns, and get synthesized answers to architectural questions. Includes a lightweight code exploration subagent, authentication hooks, and multiple search modes (fast lexical, semantic, and deep cross-cutting). Works standalone or alongside the CodeAlive MCP server for direct tool access via the Model Context Protocol.",
-  "version": "2.0.9",
+  "version": "2.1.0",
   "author": {
     "name": "CodeAlive AI",
     "email": "hello@codealive.ai"

agents/codealive-context-explorer.md

Lines changed: 7 additions & 4 deletions
@@ -16,7 +16,7 @@ You are a code exploration specialist. **Your default tool is CodeAlive — not
 Unless the request is unambiguously a local-only file lookup ("read line 42 of foo.ts", "is bar.py in this repo"), your first turn MUST include both of these calls before any answer:
 
 ```bash
-python scripts/datasources.py
+python scripts/datasources.py --query "<the user's question or task>"
 python scripts/search.py "<question paraphrased as a concept query>" <data_source>
 ```
 
@@ -28,9 +28,12 @@ The scripts directory is relative to the skill location. If a path fails, fall b
 
 ### 1. List data sources — run FIRST every session
 ```bash
-python scripts/datasources.py
+python scripts/datasources.py --query "<the user's question or task>"
 ```
-Without this you do not know what to search against. Instant, free, cheap.
+Without this you do not know what to search against. Pass the user's question as `--query` so
+the backend returns only the relevant sources, each with a `relevanceReason`. The output tells
+you when sources were omitted, and when filtering was unavailable (the full list is returned
+instead — fail-open). Omit `--query` only when the user asks for the complete inventory.
 
 ### 2. Semantic search — your default discovery tool
 ```bash
@@ -64,7 +67,7 @@ Use after `search.py` or `fetch.py` to expand a call graph, inheritance, or symb
 
 Standard loop, in order:
 
-1. **`datasources.py`** — every session, no exceptions.
+1. **`datasources.py --query "<user's task>"`** — every session, no exceptions. The relevance-filtered shortlist tells you what to search against; if a source you expected is missing, rerun without `--query` to see the full list.
 2. **`search.py`** with the main concept — every session, no exceptions. Run it even when you have a guess; the search confirms or refutes it with real evidence.
 3. **`grep.py`** for specific identifiers, error messages, or config keys surfaced in step 2.
 4. **`fetch.py`** on the most relevant identifiers (descriptions are triage pointers only — never reason from them).
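
Step 1 of the loop above says to rerun without `--query` when an expected source is missing from the shortlist. A minimal sketch of that check follows. It assumes each entry carries a `name` key, which this diff does not show explicitly, and the helper `missing_sources` is hypothetical, not part of the commit.

```python
from typing import Dict, Iterable, List


def missing_sources(shortlist: List[Dict], expected: Iterable[str]) -> List[str]:
    """Return the expected source names that are absent from a --query shortlist.

    Assumption: every entry has a "name" key (not confirmed by the diff).
    """
    present = {ds.get("name") for ds in shortlist}
    return [name for name in expected if name not in present]


# Illustrative data only; real entries come from datasources.py.
shortlist = [{"name": "checkout-api", "relevanceReason": "Owns the checkout flow"}]
gaps = missing_sources(shortlist, ["checkout-api", "auth-service"])
if gaps:
    # A non-empty result is the cue to rerun datasources.py without --query.
    print(f"Not in shortlist: {gaps}; rerun without --query")
```

The check is deliberately name-based: the relevance filter may legitimately omit a source, so a gap is a prompt to inspect the full list rather than an error.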

skills/codealive-context-engine/SKILL.md

Lines changed: 17 additions & 3 deletions
@@ -37,7 +37,7 @@ Do NOT retry the failed script until setup completes successfully.
 
 | Tool | Script | Speed | Cost | Best For |
 |------|--------|-------|------|----------|
-| **List Data Sources** | `datasources.py` | Instant | Free | Discovering indexed repos and workspaces |
+| **List Data Sources** | `datasources.py` | Instant | Free | Discovering indexed repos and workspaces. With `--query "task"`, runs an AI relevance filter (low cost, not instant) returning only the relevant sources |
 | **Semantic Search** | `search.py` | Fast | Low | Default discovery — finds code by meaning (concepts, behavior, architecture) |
 | **Grep Search** | `grep.py` | Fast | Low | Finds code containing a specific string or regex (identifiers, literals, patterns) |
 | **Fetch Artifacts** | `fetch.py` | Fast | Low | Retrieving full content; function-like artifacts also include up to 3 outgoing/incoming calls as a preview |
@@ -106,9 +106,13 @@ logic.
 ### 1. Discover what's indexed
 
 ```bash
-python scripts/datasources.py
+python scripts/datasources.py --query "the user's task in natural language"
 ```
 
+Recommended: pass the user's task as `--query` so the backend returns only the relevant
+data sources, each with a `relevanceReason`. Omit `--query` to list everything (instant,
+no AI filtering).
+
 ### 2. Search for code (fast, cheap)
 
 ```bash
@@ -151,11 +155,21 @@ python scripts/chat.py "What about security considerations?" --continue CONV_ID
 ### `datasources.py` — List Data Sources
 
 ```bash
-python scripts/datasources.py # Ready-to-use sources
+python scripts/datasources.py --query "add OAuth to checkout" # Only sources relevant to a task (recommended)
+python scripts/datasources.py # Ready-to-use sources (full list)
 python scripts/datasources.py --all # All (including processing)
 python scripts/datasources.py --json # JSON output
 ```
 
+| Option | Description |
+|--------|-------------|
+| `--query "TASK"` | The user's task/intent in natural language. The backend runs an AI relevance filter and returns only the relevant sources, each with a `relevanceReason`. Recommended whenever you know what the user is trying to accomplish |
+| `--all` | Include sources still processing |
+| `--json` | Raw JSON output (with `--query`: `{"dataSources": [...], "message": "..."}`) |
+
+**Fail-open:** if relevance filtering is unavailable, the FULL list is returned and the
+output says so — check the message before treating the result as a relevant shortlist.
+
 ### `search.py` — Semantic Code Search (default discovery tool)
 
 The default starting point. Finds code by WHAT it does — concepts, behavior,
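
The `--json` row and the fail-open note in the hunk above imply two output shapes: a bare list when no message is attached, and a `{"dataSources": [...], "message": "..."}` object when one is. A minimal sketch of a consumer that accepts both is below. The helper name `parse_datasources_json` and the sample strings are illustrative assumptions. The diff does not specify the wording of the fail-open message, so the sketch returns the message for the caller to read rather than trying to classify it.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_datasources_json(text: str) -> Tuple[List[Dict[str, Any]], str]:
    """Normalize `datasources.py --json` output to (sources, message)."""
    payload = json.loads(text)
    if isinstance(payload, dict):
        # Shape emitted when a relevance message accompanies the listing.
        return payload.get("dataSources", []), payload.get("message", "")
    # Bare list: no message was attached.
    return payload, ""


# Illustrative inputs only; real text comes from the script's stdout.
sources, message = parse_datasources_json(
    '{"dataSources": [{"name": "my-backend"}], "message": "example message"}'
)
if message:
    # Read the message before treating `sources` as a relevant shortlist.
    print(message)
```

Returning a pair keeps the caller's code identical for filtered and unfiltered listings, which matters because a fail-open response still contains the full list.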

skills/codealive-context-engine/references/workflows.md

Lines changed: 7 additions & 3 deletions
@@ -20,9 +20,13 @@ Complete workflows for common code exploration scenarios using CodeAlive.
 
 ### Step 1: Discover Available Code
 ```bash
-python datasources.py
+python datasources.py --query "your task in natural language"
 ```
 
+Pass your task as `--query` to get only the relevant data sources, each with a
+`relevanceReason` (recommended when you know the goal). Run plain `python datasources.py`
+for the complete inventory.
+
 Review output to understand:
 - What repositories are indexed
 - What workspaces group related repos
@@ -287,8 +291,8 @@ python grep.py "useMemo|useCallback|React.memo" workspace:all-frontend --regex
 ### Day 1: Get Overview
 
 ```bash
-# Discover what's indexed
-python datasources.py
+# Discover what's indexed (relevance-filtered to the onboarding goal)
+python datasources.py --query "onboard to the new-service codebase"
 
 # Find entry points and main features
 python search.py "main application entry point, startup initialization" new-service

skills/codealive-context-engine/scripts/datasources.py

Lines changed: 49 additions & 8 deletions
@@ -6,11 +6,16 @@
 Includes current project repos, dependencies, libraries, and organizational codebases.
 
 Usage:
-    python datasources.py # Show ready-to-use data sources
-    python datasources.py --all # Show all data sources (including processing)
-    python datasources.py --json # Output as JSON
+    python datasources.py # Show ready-to-use data sources
+    python datasources.py --query "TASK" # Show only sources relevant to a task (recommended)
+    python datasources.py --all # Show all data sources (including processing)
+    python datasources.py --json # Output as JSON
 
 Examples:
+    # RECOMMENDED when you know the task: only sources relevant to it, each with a
+    # relevanceReason explaining the match
+    python datasources.py --query "add OAuth to the checkout flow"
+
     # List ready data sources
     python datasources.py
 
@@ -19,6 +24,10 @@
 
     # Get JSON output for parsing
     python datasources.py --json
+
+Note:
+    --query runs an AI relevance filter on the backend. It fails open: if filtering is
+    unavailable, the FULL list is returned and the output says so.
 """
 
 import sys
@@ -31,17 +40,27 @@
 from api_client import CodeAliveClient
 
 
-def format_datasources(datasources: list, as_json: bool = False) -> str:
-    """Format data sources for display."""
+def format_datasources(datasources: list, as_json: bool = False, message: str = "") -> str:
+    """Format data sources for display.
+
+    `message` is the relevance hint accompanying a --query'd listing: how many sources
+    were omitted as non-relevant, or that filtering was unavailable and the list is full.
+    """
     if as_json:
+        if message:
+            return json.dumps({"dataSources": datasources, "message": message}, indent=2)
         return json.dumps(datasources, indent=2)
 
     if not datasources:
+        if message:
+            return f"No data sources matched.\nℹ️ {message}"
         return "No data sources found.\nAdd repositories at https://app.codealive.ai"
 
     output = []
     output.append(f"\n📚 Available Data Sources ({len(datasources)} total)\n")
     output.append("="*80)
+    if message:
+        output.append(f"\nℹ️ {message}")
 
     # Group by type
     repos = [ds for ds in datasources if ds.get("type") == "Repository"]
@@ -58,6 +77,8 @@ def format_datasources(datasources: list, as_json: bool = False) -> str:
             status = f" [{state}]" if state and state != "Alive" else ""
             output.append(f"\n 📁 {name}{status}")
             output.append(f" {desc}")
+            if ws.get("relevanceReason"):
+                output.append(f" 🎯 {ws['relevanceReason']}")
 
     if repos:
         output.append("\n\n📦 REPOSITORIES")
@@ -71,6 +92,8 @@ def format_datasources(datasources: list, as_json: bool = False) -> str:
             status = f" [{state}]" if state and state != "Alive" else ""
             output.append(f"\n 📄 {name}{status}")
             output.append(f" {desc}")
+            if repo.get("relevanceReason"):
+                output.append(f" 🎯 {repo['relevanceReason']}")
             if url:
                 output.append(f" 🔗 {url}")
 
@@ -79,6 +102,7 @@ def format_datasources(datasources: list, as_json: bool = False) -> str:
     output.append(" • Use names with search.py, grep.py, and fetch.py")
     output.append(" • Workspaces search ALL repos in the workspace")
     output.append(" • Combine multiple data sources for broader search")
+    output.append(" • Pass --query 'your task' to list only the relevant sources")
     output.append("\n📖 Examples:")
     output.append(" python search.py 'auth logic' my-backend")
     output.append(" python grep.py 'AuthService' my-backend")
@@ -90,20 +114,37 @@ def main():
     """CLI interface for listing data sources."""
     alive_only = True
     as_json = False
+    query = None
 
-    for arg in sys.argv[1:]:
+    args = sys.argv[1:]
+    i = 0
+    while i < len(args):
+        arg = args[i]
         if arg == "--all":
             alive_only = False
         elif arg == "--json":
             as_json = True
+        elif arg == "--query":
+            if i + 1 >= len(args):
+                print("❌ Error: --query requires a value", file=sys.stderr)
+                sys.exit(1)
+            query = args[i + 1]
+            i += 1
         elif arg == "--help":
             print(__doc__)
             sys.exit(0)
+        i += 1
 
     try:
         client = CodeAliveClient()
-        datasources = client.get_datasources(alive_only=alive_only)
-        print(format_datasources(datasources, as_json))
+        result = client.get_datasources(alive_only=alive_only, query=query)
+        if isinstance(result, dict):
+            datasources = result.get("dataSources", [])
+            message = result.get("message", "")
+        else:
+            datasources = result
+            message = ""
+        print(format_datasources(datasources, as_json, message))
 
     except Exception as e:
         print(f"❌ Error: {e}", file=sys.stderr)
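
The hand-rolled loop in `main()` takes whatever token follows `--query` as its value, so `--query --json` would set the query to the string `--json` and leave JSON output off. It also silently ignores flags it does not recognize. For comparison, here is a sketch of the same three options expressed with the standard-library `argparse`. This is an alternative for illustration, not what the commit does. The function name `parse_cli` and the `dest` names are assumptions.

```python
import argparse
from typing import List, Optional


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the datasources.py flags with argparse (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="List data sources.")
    parser.add_argument("--query", metavar="TASK", default=None,
                        help="task description used for relevance filtering")
    parser.add_argument("--all", action="store_true", dest="include_processing",
                        help="include sources that are still processing")
    parser.add_argument("--json", action="store_true", dest="as_json",
                        help="emit JSON instead of formatted text")
    return parser.parse_args(argv)


opts = parse_cli(["--query", "add OAuth to the checkout flow", "--json"])
# Mirrors the script's variable: --all turns alive_only off.
alive_only = not opts.include_processing
```

With this parser, `--query --json` is rejected because `--json` is recognized as a flag rather than a value. The trade-off is that unknown flags become errors instead of being ignored, and `--help` prints argparse's generated usage rather than the module docstring.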
