Commit 85b9571

Merge pull request #23 from NavidZ/llm-context-with-templates
DATA_DISCOVERY skill improvements — ranking
2 parents 7a5612c + 9001a57 commit 85b9571

2 files changed

Lines changed: 44 additions & 17 deletions

features/src/llm-context/generate-context.sh

Lines changed: 26 additions & 11 deletions
@@ -119,9 +119,9 @@ install_skills() {
 
 ## When to Use This Skill
 
-**Only read this skill when the user is explicitly searching for data collections they do not yet have in their workspace — across all of Workbench.**
+**Always read this skill before calling `platform_list_data_collections`.** This skill controls the full discovery flow — do not call the MCP tool directly without following these steps first.
 
-Do NOT read this skill if the user is asking about data already in their workspace. In that case, call `workspace_list_data_collections` or `workspace_list_resources` directly.
+Do NOT read this skill if the user is asking about data already in their workspace. In that case, call `workspace_list_data_collections` directly.
 
 **Read this skill ONLY when the user says something like:**
 - "Search all data collections I have access to"
@@ -222,12 +222,24 @@ For each result, the tool returns the following fields — use ALL of them when
 
 ---
 
-## Step 3 — Present Results and Offer to Refine
+## Step 3 — Rank, Present Results, and Offer to Refine
 
-Present matching collections in a clear summary. For each result, highlight the fields most relevant to the user's query. Example format:
+For every result returned, assign a **relevance score from 1–5** based on how well the collection's metadata matches the user's query. Use ALL available metadata fields when scoring — name, description, shortDescription, dataModalityTags, therapeuticTags, dataModel, usageExamples, dataDictionary, patientCount, geographicCoverage.
+
+**Scoring guide:**
+| Score | Meaning |
+|---|---|
+| ⭐⭐⭐⭐⭐ 5 | Exact match — directly contains the data type, gene, disease, or topic the user asked about |
+| ⭐⭐⭐⭐ 4 | Strong match — highly relevant to the query and covers the right domain or modality |
+| ⭐⭐⭐ 3 | Good match — related to the query's domain; may not be specific to the exact topic but offers valuable context |
+| ⭐⭐ 2 | Potential match — shares topical overlap with the query and is worth exploring further |
+| ⭐ 1 | Broad match — loosely connected to the query; included for completeness and may surface unexpected value |
+
+Present results **sorted by score (highest first)**. For each result, include a one-sentence justification for the score that explains concretely why it ranked that way. Example format:
 
 ---
-**[Collection Name]**
+**[Collection Name]** — ⭐⭐⭐⭐⭐ 5/5
+- **Why**: [One concrete sentence explaining what in the metadata drove this score — e.g. "Contains whole-genome sequencing data with BRCA1/BRCA2 variant calls across 10,000 patients."]
 - **Summary**: [shortDescription]
 - **Data types**: [dataModalityTags]
 - **Patients**: [patientCount] | **Time frame**: [timeFrame] | **Geography**: [geographicCoverage]
@@ -237,7 +249,7 @@ Present matching collections in a clear summary. For each result, highlight the
 
 After presenting results, ask:
 
-> "Do any of these match what you're looking for? Would you like to refine the search — for example, filter by data type, study size, or access level?"
+> "Do any of these look useful? Would you like to refine the search or explore a specific collection in more detail?"
 
 If the user wants deeper detail on a specific collection:
 - Use `underlayName` with `mcp__wb__underlay_list_entities` to explore the data schema
@@ -2877,11 +2889,14 @@ Read these directly — no index needed:
 
 ### ⚡ Skill Trigger Guide
 
-**Read \`DATA_DISCOVERY.md\` ONLY when the user is searching for data collections they don't yet have, platform-wide:**
-- "search all data collections I have access to" / "find data collections across Workbench"
-- "what data collections can I add to my workspace?" / "data collections I haven't added yet"
-- "find a data collection related to [topic / disease / modality]"
-- "search across all Workbench data collections" / "what data collections are available on the platform?"
+**ALWAYS read \`DATA_DISCOVERY.md\` BEFORE calling \`platform_list_data_collections\`.** The skill controls the full discovery flow including scope clarification, result presentation, and how to add a collection to the workspace.
+
+Trigger \`DATA_DISCOVERY.md\` whenever the user is searching for data collections platform-wide:
+- "find data collections" / "search for data collections" / "find data collections with [keyword]"
+- "find data collections across Workbench" / "search all data collections I have access to"
+- "what data collections can I add?" / "data collections I haven't added yet"
+- "find a data collection related to [topic / disease / gene / modality]"
+- "are there data collections about [topic]?" / "find data collections that have [keyword]"
 - Do NOT use this skill for workspace-scoped questions — call \`workspace_list_data_collections\` directly instead
 
 **ALWAYS read \`DASHBOARD_BUILDER.md\` FIRST when user says ANY of these:**
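
Note on the trigger change: the rule added in this hunk amounts to a precondition on one tool call. The sketch below is one way to picture it, not code from this repository. `ToolGate`, `read_skill`, and `may_call` are hypothetical names introduced for illustration; only the tool names and the skill file name come from the diff.

```python
from dataclasses import dataclass, field

PLATFORM_TOOL = "platform_list_data_collections"    # tool name from the diff
WORKSPACE_TOOL = "workspace_list_data_collections"  # tool name from the diff
DISCOVERY_SKILL = "DATA_DISCOVERY.md"               # skill file from the diff


@dataclass
class ToolGate:
    """Hypothetical guard that tracks which skills were read in a session."""
    skills_read: set = field(default_factory=set)

    def read_skill(self, name: str) -> None:
        """Record that a skill file has been read."""
        self.skills_read.add(name)

    def may_call(self, tool: str) -> bool:
        """Return True if the tool may be called in the current state."""
        # Platform-wide discovery requires the skill to be read first.
        if tool == PLATFORM_TOOL:
            return DISCOVERY_SKILL in self.skills_read
        # Workspace-scoped listing is called directly, with no skill needed.
        return True


gate = ToolGate()
print(gate.may_call(WORKSPACE_TOOL))  # workspace-scoped question
print(gate.may_call(PLATFORM_TOOL))   # platform-wide, skill not yet read
gate.read_skill(DISCOVERY_SKILL)
print(gate.may_call(PLATFORM_TOOL))   # platform-wide, skill read
```

In the actual setup the rule is enforced only by the prompt text, so a guard like this is a way to reason about the intended behavior rather than a description of how it is implemented.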

features/src/llm-context/skills/DATA_DISCOVERY.md

Lines changed: 18 additions & 6 deletions
@@ -4,9 +4,9 @@
 
 ## When to Use This Skill
 
-**Only read this skill when the user is explicitly searching for data collections they do not yet have in their workspace — across all of Workbench.**
+**Always read this skill before calling `platform_list_data_collections`.** This skill controls the full discovery flow — do not call the MCP tool directly without following these steps first.
 
-Do NOT read this skill if the user is asking about data already in their workspace. In that case, call `workspace_list_data_collections` or `workspace_list_resources` directly.
+Do NOT read this skill if the user is asking about data already in their workspace. In that case, call `workspace_list_data_collections` directly.
 
 **Read this skill ONLY when the user says something like:**
 - "Search all data collections I have access to"
@@ -107,12 +107,24 @@ For each result, the tool returns the following fields — use ALL of them when
 
 ---
 
-## Step 3 — Present Results and Offer to Refine
+## Step 3 — Rank, Present Results, and Offer to Refine
 
-Present matching collections in a clear summary. For each result, highlight the fields most relevant to the user's query. Example format:
+For every result returned, assign a **relevance score from 1–5** based on how well the collection's metadata matches the user's query. Use ALL available metadata fields when scoring — name, description, shortDescription, dataModalityTags, therapeuticTags, dataModel, usageExamples, dataDictionary, patientCount, geographicCoverage.
+
+**Scoring guide:**
+| Score | Meaning |
+|---|---|
+| ⭐⭐⭐⭐⭐ 5 | Exact match — directly contains the data type, gene, disease, or topic the user asked about |
+| ⭐⭐⭐⭐ 4 | Strong match — highly relevant to the query and covers the right domain or modality |
+| ⭐⭐⭐ 3 | Good match — related to the query's domain; may not be specific to the exact topic but offers valuable context |
+| ⭐⭐ 2 | Potential match — shares topical overlap with the query and is worth exploring further |
+| ⭐ 1 | Broad match — loosely connected to the query; included for completeness and may surface unexpected value |
+
+Present results **sorted by score (highest first)**. For each result, include a one-sentence justification for the score that explains concretely why it ranked that way. Example format:
 
 ---
-**[Collection Name]**
+**[Collection Name]** — ⭐⭐⭐⭐⭐ 5/5
+- **Why**: [One concrete sentence explaining what in the metadata drove this score — e.g. "Contains whole-genome sequencing data with BRCA1/BRCA2 variant calls across 10,000 patients."]
 - **Summary**: [shortDescription]
 - **Data types**: [dataModalityTags]
 - **Patients**: [patientCount] | **Time frame**: [timeFrame] | **Geography**: [geographicCoverage]
@@ -122,7 +134,7 @@ Present matching collections in a clear summary. For each result, highlight the
 
 After presenting results, ask:
 
-> "Do any of these match what you're looking for? Would you like to refine the search — for example, filter by data type, study size, or access level?"
+> "Do any of these look useful? Would you like to refine the search or explore a specific collection in more detail?"
 
 If the user wants deeper detail on a specific collection:
 - Use `underlayName` with `mcp__wb__underlay_list_entities` to explore the data schema
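
Note on the new Step 3: the score, sort, and present flow can be sketched in a few lines. This is an illustration under stated assumptions, not the skill's implementation. In the skill the score and the "Why" sentence are assigned by the model, so here they are plain inputs. `ScoredCollection`, `rank`, and `render` are hypothetical names, and the two sample entries are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ScoredCollection:
    """One search result plus the score and justification assigned to it."""
    name: str
    score: int                # relevance score, 1 (broad) to 5 (exact)
    why: str                  # one-sentence justification for the score
    short_description: str    # corresponds to the shortDescription field
    data_modality_tags: list  # corresponds to the dataModalityTags field


def rank(results):
    """Validate scores and return results sorted highest score first."""
    for r in results:
        if not 1 <= r.score <= 5:
            raise ValueError(f"score out of range for {r.name}: {r.score}")
    # sorted() is stable, so results with equal scores keep their input order.
    return sorted(results, key=lambda r: r.score, reverse=True)


def render(r):
    """Format one result using the first lines of the template in the diff."""
    stars = "⭐" * r.score
    return "\n".join([
        # The separator below is copied from the template in the diff.
        f"**{r.name}** — {stars} {r.score}/5",
        f"- **Why**: {r.why}",
        f"- **Summary**: {r.short_description}",
        f"- **Data types**: {', '.join(r.data_modality_tags)}",
    ])


# Made-up sample entries, for illustration only.
results = [
    ScoredCollection("Example Imaging Set", 2, "Shares the oncology domain.",
                     "Radiology images.", ["imaging"]),
    ScoredCollection("Example Genomics Set", 5, "Contains the requested gene.",
                     "Sequencing data.", ["genomics", "clinical"]),
]
for item in rank(results):
    print(render(item))
    print("---")
```

Rejecting scores outside 1 to 5 is a design choice of this sketch; the skill text itself only defines the five levels and the descending sort order.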
