Commit c0097d8

Add DataSource parameter for artifacts endpoints
1 parent 4160f2f commit c0097d8

8 files changed

Lines changed: 261 additions & 24 deletions

skills/codealive-context-engine/SKILL.md

Lines changed: 18 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -218,20 +218,35 @@ or your local file-read tool before drawing conclusions about behavior.
218218
Retrieves the full source code content for artifacts found via search. Use this for external repositories you cannot access locally.
219219

220220
```bash
221-
python scripts/fetch.py <identifier1> [identifier2...]
221+
python scripts/fetch.py <identifier1> [identifier2...] [--data-source NAME_OR_ID]
222222
```
223223

224224
| Constraint | Value |
225225
|-----------|-------|
226226
| Max identifiers per request | 20 |
227227
| Identifiers source | `identifier` field from search results |
228228
| Identifier format | `{owner/repo}::{path}::{symbol}` (symbols), `{owner/repo}::{path}` (files) |
229+
| `--data-source NAME_OR_ID` | Optional. Data source Name or Id (from a result's `Source:` line) to disambiguate an identifier indexed in more than one data source |
229230

230231
For function-like artifacts the response includes a small **relationships
231232
preview** (up to 3 outgoing/incoming calls per direction). To see the full
232233
call graph, inheritance, or references, run `relationships.py` with the
233234
artifact's identifier.
234235

236+
**Disambiguating an identifier that lives in more than one data source.** Artifact
237+
identifiers are unique only per data source, so the same identifier can belong to
238+
more than one data source. If you fetch such an identifier without `--data-source`,
239+
the backend returns a **409** listing the candidate data sources instead of picking
240+
one for you. Every listed candidate **will** resolve, so the workflow is: call without
241+
`--data-source` → read the 409 candidates → retry with one → if that data source isn't the
242+
one you want, retry with the next. To resolve it in one step, take the
243+
`Source:` name or id shown next to the search result you want and pass it back:
244+
`python scripts/fetch.py <identifier> --data-source "backend"` (or the id).
245+
The same `--data-source` flag works on `relationships.py`. If a `--data-source`-scoped
246+
call finds nothing (the script prints a "nothing was found in data source …" hint),
247+
the identifier belongs to a different data source or the selector is wrong: retry with
248+
a different `Source:` value, or drop `--data-source` to get the 409 candidate list.
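
The workflow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's implementation: `AmbiguousIdentifierError`, `fetch_with_disambiguation`, `fake_fetch`, and the `backend-legacy` data source are hypothetical names, and the shape of the 409 payload (a plain list of candidate names) is assumed because the diff does not show it.

```python
from typing import Callable, Dict, List, Optional


class AmbiguousIdentifierError(Exception):
    """Hypothetical stand-in for the backend's 409 response."""

    def __init__(self, candidates: List[str]) -> None:
        super().__init__(f"ambiguous identifier; candidates: {candidates}")
        self.candidates = candidates


def fetch_with_disambiguation(
    fetch: Callable[[str, Optional[str]], Dict],
    identifier: str,
    wanted: Callable[[Dict], bool],
) -> Optional[Dict]:
    """Fetch unscoped first; on ambiguity, try candidates until `wanted` accepts one."""
    try:
        return fetch(identifier, None)
    except AmbiguousIdentifierError as err:
        for candidate in err.candidates:
            result = fetch(identifier, candidate)
            if wanted(result):
                return result
    return None


# Fake backend for illustration: the identifier lives in two data sources.
_INDEX = {"backend": "def login(): ...", "backend-legacy": "def login(user): ..."}


def fake_fetch(identifier: str, data_source: Optional[str]) -> Dict:
    if data_source is None:
        # Assumed behavior: an unscoped call on an ambiguous identifier fails
        # and reports every candidate data source.
        raise AmbiguousIdentifierError(list(_INDEX))
    return {"dataSource": data_source, "content": _INDEX[data_source]}


picked = fetch_with_disambiguation(
    fake_fetch,
    "my-org/backend::src/auth.py::login",
    wanted=lambda r: "user" in r["content"],
)
print(picked["dataSource"])  # backend-legacy
```

Passing the `Source:` value from the search result up front avoids the loop entirely; the fallback only matters when the caller does not yet know which data source it wants.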
249+
235250
### `relationships.py` — Drill into an Artifact's Relationship Graph
236251

237252
Returns the full call graph (incoming/outgoing calls), inheritance hierarchy
@@ -241,7 +256,7 @@ identifier and want to understand how the artifact relates to the rest of the
241256
codebase.
242257

243258
```bash
244-
python scripts/relationships.py <identifier> [--profile PROFILE] [--max-count N]
259+
python scripts/relationships.py <identifier> [--profile PROFILE] [--max-count N] [--data-source NAME_OR_ID]
245260
```
246261

247262
| Option | Description |
@@ -251,6 +266,7 @@ python scripts/relationships.py <identifier> [--profile PROFILE] [--max-count N]
251266
| `--profile allRelevant` | Calls + inheritance (4 groups) |
252267
| `--profile referencesOnly` | Symbol references |
253268
| `--max-count N` | Max related artifacts per relationship type (1–1000, default 50) |
269+
| `--data-source NAME_OR_ID` | Optional. Data source Name or Id to disambiguate an identifier indexed in more than one data source (same 409 contract as `fetch.py`) |
254270
| `--json` | Emit the raw JSON response instead of the formatted view |
255271

256272
**When this adds value vs the fetch preview:**

skills/codealive-context-engine/scripts/fetch.py

Lines changed: 44 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
CodeAlive Fetch - Retrieve full content for code artifacts
44
55
Usage:
6-
python fetch.py <identifier1> [identifier2...]
6+
python fetch.py <identifier1> [identifier2...] [--data-source NAME_OR_ID]
77
88
Examples:
99
# Fetch a single artifact (symbol)
@@ -15,10 +15,18 @@
1515
# Fetch multiple artifacts
1616
python fetch.py "my-org/backend::src/auth.py::login" "my-org/backend::src/utils.py::helper"
1717
18+
# Disambiguate an identifier that exists in more than one data source
19+
# (use the dataSource name or id from a search result)
20+
python fetch.py "my-org/backend::src/auth.py::login" --data-source "backend"
21+
1822
Identifiers come from semantic/grep search results (the `identifier` field).
1923
The format is: {owner/repo}::{path}::{symbol} (for symbols/chunks)
2024
{owner/repo}::{path} (for files)
2125
26+
Pass --data-source (a data source Name or Id from a search result's `dataSource`)
27+
to disambiguate an identifier that exists in more than one data source. Without it,
28+
an ambiguous identifier returns a 409 listing the candidate data sources.
29+
2230
Maximum 20 identifiers per request.
2331
"""
2432

@@ -83,11 +91,23 @@ def _format_relationships_preview(relationships: dict) -> list:
8391
return lines
8492

8593

86-
def format_artifacts(data: dict) -> str:
94+
def _data_source_miss_hint(data_source: str) -> str:
95+
"""Recovery hint when a data-source-scoped fetch returns nothing."""
96+
return (
97+
f'\n💡 Hint: nothing was found in data source "{data_source}". The identifier may belong to a '
98+
"different data source, or the --data-source value may be wrong. Try: re-run with --data-source "
99+
"set to a different candidate (use the Source name or id from your search results, or run "
100+
"datasources.py), or drop --data-source entirely — an ambiguous identifier then returns a 409 "
101+
"listing the candidate data sources to choose from."
102+
)
103+
104+
105+
def format_artifacts(data: dict, data_source: str = None) -> str:
87106
"""Format fetched artifacts for display."""
88107
artifacts = data.get("artifacts", [])
89108
if not artifacts:
90-
return "No artifacts returned."
109+
msg = "No artifacts returned."
110+
return msg + _data_source_miss_hint(data_source) if data_source else msg
91111

92112
output = []
93113
count = 0
@@ -119,7 +139,8 @@ def format_artifacts(data: dict) -> str:
119139
has_any_relationships = True
120140

121141
if not output:
122-
return "No artifacts found."
142+
msg = "No artifacts found."
143+
return msg + _data_source_miss_hint(data_source) if data_source else msg
123144

124145
output.append(f"\n({count} artifact(s))")
125146

@@ -144,7 +165,21 @@ def main():
144165
sys.exit(1)
145166
sys.exit(0)
146167

147-
identifiers = sys.argv[1:]
168+
identifiers = []
169+
data_source = None
170+
i = 1
171+
while i < len(sys.argv):
172+
arg = sys.argv[i]
173+
if arg == "--data-source" and i + 1 < len(sys.argv):
174+
data_source = sys.argv[i + 1]
175+
i += 2
176+
else:
177+
identifiers.append(arg)
178+
i += 1
179+
180+
if not identifiers:
181+
print("Error: At least one identifier is required.", file=sys.stderr)
182+
sys.exit(1)
148183

149184
if len(identifiers) > 20:
150185
print("Error: Maximum 20 identifiers per request.", file=sys.stderr)
@@ -154,11 +189,13 @@ def main():
154189
client = CodeAliveClient()
155190

156191
print(f"📥 Fetching {len(identifiers)} artifact(s)", file=sys.stderr)
192+
if data_source:
193+
print(f" data source: {data_source}", file=sys.stderr)
157194
print(file=sys.stderr)
158195

159-
result = client.fetch_artifacts(identifiers=identifiers)
196+
result = client.fetch_artifacts(identifiers=identifiers, data_source=data_source)
160197

161-
print(format_artifacts(result))
198+
print(format_artifacts(result, data_source=data_source))
162199

163200
except Exception as e:
164201
print(f"❌ Error: {e}", file=sys.stderr)
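
One edge case in the argument loop above: when `--data-source` is the last argument, the `i + 1 < len(sys.argv)` check fails and the flag itself falls through to the `else` branch, so it is appended to `identifiers`. The sketch below shows one possible variation that rejects the bare flag instead. `parse_fetch_args` is a hypothetical helper name, not part of the commit, and raising `ValueError` is an illustrative choice.

```python
from typing import List, Optional, Tuple


def parse_fetch_args(argv: List[str]) -> Tuple[List[str], Optional[str]]:
    """Split argv (without the program name) into identifiers and a data source."""
    identifiers: List[str] = []
    data_source: Optional[str] = None
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--data-source":
            if i + 1 >= len(argv):
                # Variation on the diff: fail loudly rather than treating the
                # bare flag as an identifier.
                raise ValueError("--data-source expects a value")
            data_source = argv[i + 1]
            i += 2
        else:
            identifiers.append(arg)
            i += 1
    return identifiers, data_source


ids, source = parse_fetch_args(
    ["my-org/backend::src/auth.py::login", "--data-source", "backend"]
)
print(ids, source)
```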

skills/codealive-context-engine/scripts/grep.py

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,19 @@ def format_grep_results(results: dict) -> str:
4242
output.append(f" File: {file_path}")
4343
if result.get("identifier"):
4444
output.append(f" Identifier: {result['identifier']}")
45+
46+
# Surface the data-source name/id so they can be passed back as --data-source to
47+
# fetch.py / relationships.py when an identifier exists in more than one data source.
48+
ds = result.get("dataSource")
49+
if isinstance(ds, dict):
50+
ds_name = ds.get("name")
51+
ds_id = ds.get("id")
52+
if ds_name and ds_id:
53+
output.append(f" Source: {ds_name} (id: {ds_id})")
54+
elif ds_name:
55+
output.append(f" Source: {ds_name}")
56+
elif ds_id:
57+
output.append(f" Source: (id: {ds_id})")
4558
if result.get("matchCount") is not None:
4659
output.append(f" Match count: {result['matchCount']}")
4760
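
The `Source:` branching in this hunk can be read as a small pure function. The sketch below restates it for illustration; `format_source_line` is a hypothetical name, the leading indentation of the real output is omitted, and the id value `ds-123` is made up.

```python
from typing import Any, Optional


def format_source_line(ds: Any) -> Optional[str]:
    """Return the `Source:` line for a result's `dataSource`, or None if unusable."""
    if not isinstance(ds, dict):
        return None
    name, ds_id = ds.get("name"), ds.get("id")
    if name and ds_id:
        return f"Source: {name} (id: {ds_id})"
    if name:
        return f"Source: {name}"
    if ds_id:
        return f"Source: (id: {ds_id})"
    # Neither field present: print nothing, as in the hunk above.
    return None


print(format_source_line({"name": "backend", "id": "ds-123"}))
```

Either the name or the id shown on this line is what gets passed back as `--data-source`.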

skills/codealive-context-engine/scripts/lib/api_client.py

Lines changed: 35 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -524,6 +524,7 @@ def grep_search(
524524
def fetch_artifacts(
525525
self,
526526
identifiers: List[str],
527+
data_source: Optional[str] = None,
527528
) -> Dict[str, Any]:
528529
"""
529530
Retrieve full content for code artifacts by their identifiers.
@@ -536,6 +537,10 @@ def fetch_artifacts(
536537
537538
Args:
538539
identifiers: List of artifact identifiers from search results (max 20)
540+
data_source: Optional data-source Name or Id to disambiguate an identifier that
541+
exists in more than one data source. Copy the `dataSource.name`/`dataSource.id`
542+
from a search result. Omit for normal lookups; an ambiguous identifier without
543+
it returns a 409 listing the candidate data sources.
539544
540545
Returns:
541546
Dict with 'artifacts' list. Each artifact has identifier, content,
@@ -545,13 +550,16 @@ def fetch_artifacts(
545550
the full list and other relationship profiles.
546551
"""
547552
body: Dict[str, Any] = {"identifiers": identifiers}
553+
if data_source:
554+
body["dataSource"] = data_source
548555
return self._make_request("POST", "/api/search/artifacts", body=body)
549556

550557
def get_artifact_relationships(
551558
self,
552559
identifier: str,
553560
profile: str = "callsOnly",
554561
max_count_per_type: int = 50,
562+
data_source: Optional[str] = None,
555563
) -> Dict[str, Any]:
556564
"""
557565
Retrieve relationship groups for a single artifact by profile.
@@ -569,6 +577,9 @@ def get_artifact_relationships(
569577
- "referencesOnly": symbol references
570578
max_count_per_type: Max related artifacts per relationship type
571579
(1–1000, default 50).
580+
data_source: Optional data-source Name or Id to disambiguate a source identifier
581+
that exists in more than one data source. Omit for normal lookups; an ambiguous
582+
identifier without it returns a 409 listing the candidate data sources.
572583
573584
Returns:
574585
Dict with sourceIdentifier, profile, found, and a list of
@@ -594,6 +605,8 @@ def get_artifact_relationships(
594605
"profile": api_profile,
595606
"maxCountPerType": max_count_per_type,
596607
}
608+
if data_source:
609+
body["dataSource"] = data_source
597610
return self._make_request(
598611
"POST", "/api/search/artifact-relationships", body=body
599612
)
@@ -665,8 +678,8 @@ def main():
665678
print(" search <query> <data_source1> [data_source2...] [--mode auto|fast|deep] [--description-detail short|full]")
666679
print(" semantic-search <query> <data_source1> [data_source2...] [--path PATH] [--ext EXT] [--max-results N]")
667680
print(" grep-search <query> <data_source1> [data_source2...] [--regex] [--path PATH] [--ext EXT] [--max-results N]")
668-
print(" fetch <identifier1> [identifier2...]")
669-
print(" relationships <identifier> [--profile callsOnly|inheritanceOnly|allRelevant|referencesOnly] [--max-count N]")
681+
print(" fetch <identifier1> [identifier2...] [--data-source NAME_OR_ID]")
682+
print(" relationships <identifier> [--profile callsOnly|inheritanceOnly|allRelevant|referencesOnly] [--max-count N] [--data-source NAME_OR_ID]")
670683
print(" chat <question> <data_source1> [data_source2...] [--conversation-id ID]")
671684
sys.exit(1)
672685

@@ -791,12 +804,22 @@ def main():
791804

792805
elif command == "fetch":
793806
if len(sys.argv) < 3:
794-
print("Usage: fetch <identifier1> [identifier2...]")
807+
print("Usage: fetch <identifier1> [identifier2...] [--data-source NAME_OR_ID]")
795808
sys.exit(1)
796809

797-
identifiers = sys.argv[2:]
810+
identifiers = []
811+
data_source = None
812+
i = 2
813+
while i < len(sys.argv):
814+
arg = sys.argv[i]
815+
if arg == "--data-source" and i + 1 < len(sys.argv):
816+
data_source = sys.argv[i + 1]
817+
i += 2
818+
else:
819+
identifiers.append(arg)
820+
i += 1
798821

799-
result = client.fetch_artifacts(identifiers)
822+
result = client.fetch_artifacts(identifiers, data_source=data_source)
800823
print(json.dumps(result, indent=2))
801824

802825
elif command == "relationships":
@@ -807,6 +830,7 @@ def main():
807830
identifier = sys.argv[2]
808831
profile = "callsOnly"
809832
max_count = 50
833+
data_source = None
810834

811835
i = 3
812836
while i < len(sys.argv):
@@ -817,10 +841,15 @@ def main():
817841
elif arg == "--max-count" and i + 1 < len(sys.argv):
818842
max_count = int(sys.argv[i + 1])
819843
i += 2
844+
elif arg == "--data-source" and i + 1 < len(sys.argv):
845+
data_source = sys.argv[i + 1]
846+
i += 2
820847
else:
821848
i += 1
822849

823-
result = client.get_artifact_relationships(identifier, profile, max_count)
850+
result = client.get_artifact_relationships(
851+
identifier, profile, max_count, data_source=data_source
852+
)
824853
print(json.dumps(result, indent=2))
825854

826855
elif command == "chat":
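
Both client methods in this file add `dataSource` to the request body only when a value was supplied, so existing callers send the same payload as before. A minimal sketch of that rule for the fetch endpoint follows; `build_fetch_body` is a hypothetical helper name, and the real method passes the body on to `_make_request`.

```python
import json
from typing import Any, Dict, List, Optional


def build_fetch_body(
    identifiers: List[str], data_source: Optional[str] = None
) -> Dict[str, Any]:
    """Build the POST body; `dataSource` is omitted unless a selector is given."""
    body: Dict[str, Any] = {"identifiers": identifiers}
    if data_source:
        body["dataSource"] = data_source
    return body


unscoped = build_fetch_body(["my-org/backend::src/auth.py::login"])
scoped = build_fetch_body(["my-org/backend::src/auth.py::login"], "backend")
print(json.dumps(unscoped))
print(json.dumps(scoped))
```

Because the check is a truthiness test, an empty string behaves like `None` and yields an unscoped request.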

skills/codealive-context-engine/scripts/relationships.py

Lines changed: 27 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,11 @@
1111
this script gives you the full list and lets you switch profiles.
1212
1313
Usage:
14-
python relationships.py <identifier> [--profile PROFILE] [--max-count N]
14+
python relationships.py <identifier> [--profile PROFILE] [--max-count N] [--data-source NAME_OR_ID]
15+
16+
Pass --data-source (a data source Name or Id from a search result's `dataSource`)
17+
to disambiguate an identifier that exists in more than one data source. Without it,
18+
an ambiguous identifier returns a 409 listing the candidate data sources.
1519
1620
Profiles:
1721
callsOnly (default) outgoing + incoming calls
@@ -61,18 +65,27 @@
6165
}
6266

6367

64-
def format_relationships(data: dict) -> str:
68+
def format_relationships(data: dict, data_source: str = None) -> str:
6569
"""Format an artifact-relationships response for display."""
6670
source_id = data.get("sourceIdentifier") or "<unknown>"
6771
raw_profile = data.get("profile") or ""
6872
profile = PROFILE_LABELS.get(raw_profile, raw_profile)
6973
found = bool(data.get("found"))
7074

7175
if not found:
72-
return (
73-
f"Artifact not found or inaccessible: {source_id}\n"
74-
f"(profile={profile})"
75-
)
76+
lines = [
77+
f"Artifact not found or inaccessible: {source_id}",
78+
f"(profile={profile})",
79+
]
80+
if data_source:
81+
lines.append(
82+
f'\n💡 Hint: nothing matched in data source "{data_source}". The identifier may belong '
83+
"to a different data source, or the --data-source value may be wrong. Try: re-run with "
84+
"--data-source set to a different candidate (use the Source name or id from your "
85+
"search results, or run datasources.py), or drop --data-source entirely — an ambiguous "
86+
"identifier then returns a 409 listing the candidate data sources to choose from."
87+
)
88+
return "\n".join(lines)
7689

7790
relationships = data.get("relationships") or []
7891

@@ -142,6 +155,7 @@ def main():
142155
identifier = sys.argv[1]
143156
profile = "callsOnly"
144157
max_count = 50
158+
data_source = None
145159

146160
i = 2
147161
while i < len(sys.argv):
@@ -156,6 +170,9 @@ def main():
156170
print(f"Error: --max-count expects an integer, got '{sys.argv[i + 1]}'", file=sys.stderr)
157171
sys.exit(1)
158172
i += 2
173+
elif arg == "--data-source" and i + 1 < len(sys.argv):
174+
data_source = sys.argv[i + 1]
175+
i += 2
159176
elif arg == "--json":
160177
# Handled below — we strip it before calling format_relationships
161178
i += 1
@@ -171,18 +188,21 @@ def main():
171188

172189
print(f"🔗 Fetching {profile} relationships for: {identifier}", file=sys.stderr)
173190
print(f"⚙️ max-count={max_count}", file=sys.stderr)
191+
if data_source:
192+
print(f" data source: {data_source}", file=sys.stderr)
174193
print(file=sys.stderr)
175194

176195
result = client.get_artifact_relationships(
177196
identifier=identifier,
178197
profile=profile,
179198
max_count_per_type=max_count,
199+
data_source=data_source,
180200
)
181201

182202
if as_json:
183203
print(json.dumps(result, indent=2))
184204
else:
185-
print(format_relationships(result))
205+
print(format_relationships(result, data_source=data_source))
186206

187207
except Exception as e:
188208
print(f"❌ Error: {e}", file=sys.stderr)
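
The not-found branch of `format_relationships` now appends a recovery hint only when the call was scoped to a data source. The sketch below isolates that behavior; `format_not_found` is a hypothetical name and the hint text is abbreviated relative to the script's actual message.

```python
from typing import List, Optional


def format_not_found(
    source_id: str, profile: str, data_source: Optional[str] = None
) -> str:
    """Render the not-found message, adding a hint only for scoped calls."""
    lines: List[str] = [
        f"Artifact not found or inaccessible: {source_id}",
        f"(profile={profile})",
    ]
    if data_source:
        # Abbreviated hint; the script's real message also suggests retrying
        # with another candidate or dropping --data-source.
        lines.append(f'\nHint: nothing matched in data source "{data_source}".')
    return "\n".join(lines)


print(format_not_found("my-org/backend::src/auth.py::login", "callsOnly", "backend"))
```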
