
Commit 4f297ee

feat: add docs_lookup tool with telemetry and failure logging
Create a dedicated docs_lookup tool that wraps the Context7 CLI and web fetch with full telemetry integration. Tracks every documentation lookup (tool_id, method, status, duration, errors) via the existing Azure Application Insights pipeline.

Changes:

- New docs_lookup tool (src/altimate/tools/docs-lookup.ts)
  - Tries ctx7 first for library docs, falls back to webfetch for platform docs
  - Logs success, not_found, and error statuses with duration
  - Surfaces unknown tools and network failures clearly
- New telemetry event type: docs_lookup
  - Tracks: tool_id, method (ctx7/webfetch), status, duration_ms, error message, source_url
- Added "docs" category to tool categorization
- Registered in tool registry alongside other altimate tools
- Updated data-docs skill to use the docs_lookup tool instead of raw bash/webfetch calls

https://claude.ai/code/session_01NZPdvEHNXDcmhgJt9RLMu1
1 parent ee692ed commit 4f297ee

4 files changed

Lines changed: 342 additions & 64 deletions
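The commit message describes a ctx7-first lookup with webfetch fallback, where every attempt emits a telemetry event carrying tool_id, method, status, and duration_ms. A minimal sketch of that flow, assuming injected lookup functions — the names `docsLookup`, `tryCtx7`, `tryWebfetch`, and `emit` are illustrative, not the real API in docs-lookup.ts:

```typescript
type LookupStatus = "success" | "error" | "not_found"

// Event shape mirrors the docs_lookup telemetry type added in this commit.
interface DocsLookupEvent {
  type: "docs_lookup"
  timestamp: number
  session_id: string
  tool_id: string
  method: "ctx7" | "webfetch"
  status: LookupStatus
  duration_ms: number
  error?: string
  source_url?: string
}

async function docsLookup(
  toolId: string,
  tryCtx7: () => Promise<string | null>,
  tryWebfetch: () => Promise<string | null>,
  emit: (e: DocsLookupEvent) => void,
  sessionId: string,
): Promise<string | null> {
  const start = Date.now()
  const base = { type: "docs_lookup" as const, timestamp: start, session_id: sessionId, tool_id: toolId }
  try {
    // Prefer Context7 for indexed library docs.
    const fromCtx7 = await tryCtx7()
    if (fromCtx7 !== null) {
      emit({ ...base, method: "ctx7", status: "success", duration_ms: Date.now() - start })
      return fromCtx7
    }
    // Fall back to fetching the official docs site.
    const fromWeb = await tryWebfetch()
    emit({
      ...base,
      method: "webfetch",
      status: fromWeb !== null ? "success" : "not_found",
      duration_ms: Date.now() - start,
    })
    return fromWeb
  } catch (err) {
    // Network failures and unknown tools surface as error events with a message.
    emit({ ...base, method: "webfetch", status: "error", duration_ms: Date.now() - start, error: String(err) })
    return null
  }
}
```

This shows why a single wrapper tool simplifies the skill: status and timing are recorded uniformly regardless of which method answered.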

File tree

.opencode/skills/data-docs/SKILL.md

Lines changed: 42 additions & 64 deletions
```diff
@@ -15,6 +15,9 @@ When writing code or answering questions about data engineering tools,
 use this skill to fetch current, version-specific documentation instead
 of relying on training data.
 
+## Requirements
+**Tools used:** docs_lookup, glob, read
+
 ## When to Use
 
 Activate this skill when the user:
@@ -35,108 +38,83 @@ Activate this skill when the user:
 - Asks "how do I" questions about any data engineering library or platform
 - Needs SQL syntax, API references, method signatures, or configuration options
 
-## Documentation Sources
-
-This skill uses **two methods** depending on the type of documentation:
-
-1. **Context7 CLI** (`ctx7`) — For Python libraries and SDKs (dbt-core, Airflow,
-   PySpark, Snowpark, etc.). These have indexed documentation in Context7.
-2. **Web Fetch** (`webfetch`) — For database platform SQL documentation (Snowflake SQL,
-   BigQuery SQL, Databricks SQL, DuckDB, PostgreSQL, ClickHouse). These platforms
-   maintain official docs sites that can be fetched directly.
-
-Check `references/library-ids.md` for the full mapping of which method to use.
-
-## Method 1: Context7 CLI (for Python libraries/SDKs)
+## How to Fetch Documentation
 
-### Step 1: Identify the Library
-
-Check the `references/library-ids.md` file for pre-mapped Context7 library IDs.
-If you find a match, skip to Step 3.
-
-If the library isn't in the reference file, resolve it:
-
-```bash
-npx -y ctx7@latest library <library-name> "<user's question>"
-```
+### Step 1: Identify the Tool
 
-Pick the result with the closest name match and highest score.
-Note the Library ID (format: `/org/project` or `/org/project/version`).
+Determine which data engineering tool or platform the user is asking about.
+Check `references/library-ids.md` for the full list of supported tools.
 
-### Step 2: Check for Project Version
+### Step 2: Check for Project Version (optional)
 
 Look for version info in the user's project:
 
 - `requirements.txt` or `pyproject.toml` — Python package versions
 - `dbt_project.yml` — dbt version (`require-dbt-version`)
 - `packages.yml` — dbt package versions
-- `setup.py` or `setup.cfg` — Python package versions
 
-If a specific version is found, prefer version-specific library IDs
-(format: `/org/project/vX.Y.Z`) when available from the resolution step.
+### Step 3: Use the `docs_lookup` Tool
 
-### Step 3: Query Documentation
+Call the `docs_lookup` tool with the tool name and a specific query:
 
-```bash
-npx -y ctx7@latest docs <libraryId> "<specific question>"
+```
+docs_lookup(tool="dbt-core", query="how to create incremental models with merge strategy")
+docs_lookup(tool="snowflake", query="MERGE statement syntax and examples")
+docs_lookup(tool="duckdb", query="window functions syntax")
+docs_lookup(tool="postgresql", query="JSONB operators and functions")
+docs_lookup(tool="clickhouse", query="MergeTree engine settings")
 ```
 
-Write **specific, detailed queries** for better results:
-- Good: `"How to create incremental models with merge strategy in dbt"`
-- Bad: `"incremental"`
-
-## Method 2: Web Fetch (for database platform SQL docs)
-
-For Snowflake, BigQuery, Databricks, DuckDB, PostgreSQL, and ClickHouse
-platform documentation (SQL syntax, functions, DDL, configuration), use
-the `webfetch` tool to fetch specific documentation pages.
-
-### Step 1: Find the Right URL
-
-Check `references/library-ids.md` for the **Platform Documentation URLs**
-section. Each platform has a base URL and common page paths listed.
-
-### Step 2: Fetch the Documentation
+The tool automatically selects the best method:
+- **Context7 (ctx7)** for Python libraries/SDKs — indexed, searchable docs
+- **Web fetch** for database platforms — fetches from official documentation sites
 
-Use the `webfetch` tool with the specific documentation URL and a prompt
-describing what information to extract:
+For platform docs with a **specific page URL** (see `references/library-ids.md`),
+pass it via the `url` parameter for better results:
 
 ```
-webfetch(url="https://docs.snowflake.com/en/sql-reference/sql/merge",
-         prompt="Extract the full MERGE syntax, parameters, and examples")
+docs_lookup(tool="snowflake", query="MERGE syntax", url="https://docs.snowflake.com/en/sql-reference/sql/merge")
+docs_lookup(tool="postgresql", query="JSON functions", url="https://www.postgresql.org/docs/current/functions-json.html")
 ```
 
-### Step 3: Use the Documentation
+### Step 4: Use the Documentation
 
 - Answer using the fetched documentation, not training data
 - Include relevant code examples from the docs
-- Cite the documentation URL for reference
+- Cite the library version or documentation URL when relevant
 - If docs mention deprecations or breaking changes, highlight them
 
+## Supported Tools
+
+**Libraries/SDKs (via Context7):** dbt-core, airflow, pyspark, snowflake-connector-python,
+snowpark-python, google-cloud-bigquery, databricks-sdk, duckdb, psycopg2, psycopg,
+clickhouse-connect, confluent-kafka, sqlalchemy, polars, pandas, great-expectations,
+dbt-utils, dbt-expectations, dbt-snowflake, dbt-bigquery, dbt-databricks, dbt-postgres,
+dbt-redshift, dbt-spark, dbt-duckdb, dbt-clickhouse, elementary
+
+**Platforms (via web fetch):** snowflake, databricks, duckdb, postgresql, clickhouse, bigquery
+
 ## Guidelines
 
-- Maximum 3 CLI/webfetch calls per user question to avoid rate limits
-- Context7 works without authentication; set `CONTEXT7_API_KEY` for higher limits
-- If a call fails (network error, rate limit), fall back to training data
-  and note that the docs could not be fetched
+- Maximum 3 `docs_lookup` calls per user question to avoid rate limits
+- If a call fails, the tool logs the failure automatically for improvement tracking
+- On failure, fall back to training data and note that docs could not be fetched
 - For dbt: always check `dbt_project.yml` for version and `packages.yml` for packages
 - For Python tools: check `requirements.txt` or `pyproject.toml` for pinned versions
 - When multiple libraries are relevant (e.g., dbt-core + dbt-snowflake), fetch docs
   for the most specific one first
-- For SQL platform docs, prefer the most specific page URL (e.g., the MERGE
-  statement page, not the general SQL reference index)
+- For SQL platform docs, pass a specific page URL via the `url` parameter for best results
 
 ## Usage
 
 - `/data-docs How do I create an incremental model in dbt?`
 - `/data-docs What Airflow operators are available for BigQuery?`
 - `/data-docs How to use window functions in PySpark?`
-- `/data-docs Snowpark DataFrame API for joins`
 - `/data-docs Snowflake MERGE statement syntax`
 - `/data-docs DuckDB window functions`
 - `/data-docs PostgreSQL JSONB operators`
 - `/data-docs ClickHouse MergeTree engine settings`
 
-Use the bash tool to run `ctx7` CLI commands for libraries, and the `webfetch`
-tool for platform SQL documentation. Reference `library-ids.md` for the full
-mapping of tools, IDs, and URLs.
+Use the `docs_lookup` tool for all documentation lookups. It handles method selection,
+telemetry, and failure logging automatically. Reference `library-ids.md` for the full
+mapping of tools, IDs, and documentation URLs.
```
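The updated skill says the tool "automatically selects the best method," and its Supported Tools section splits names into Context7 libraries and web-fetch platforms. A minimal sketch of that dispatch, assuming simple set membership — `selectMethod` and the partial library list are illustrative, not the shipped implementation (note the real skill lists `duckdb` under both methods; this sketch keeps it only on the platform side for clarity):

```typescript
// Partial subset of the Context7 library list from SKILL.md (illustrative).
const CTX7_LIBRARIES = new Set([
  "dbt-core", "airflow", "pyspark", "snowflake-connector-python",
  "sqlalchemy", "polars", "pandas", "great-expectations",
])

// Platform list from the skill's Supported Tools section.
const WEBFETCH_PLATFORMS = new Set([
  "snowflake", "databricks", "duckdb", "postgresql", "clickhouse", "bigquery",
])

// Check Context7 first, mirroring the commit's "tries ctx7 first" behavior;
// unknown names are surfaced to the caller rather than guessed at.
function selectMethod(tool: string): "ctx7" | "webfetch" | "unknown" {
  if (CTX7_LIBRARIES.has(tool)) return "ctx7"
  if (WEBFETCH_PLATFORMS.has(tool)) return "webfetch"
  return "unknown"
}
```

Keeping the mapping in data (two sets) rather than branching logic matches how `references/library-ids.md` is described: one lookup table for tools, IDs, and URLs.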

packages/opencode/src/altimate/telemetry/index.ts

Lines changed: 12 additions & 0 deletions
```diff
@@ -274,6 +274,17 @@ export namespace Telemetry {
       budget: number
       scopes_used: string[]
     }
+    | {
+        type: "docs_lookup"
+        timestamp: number
+        session_id: string
+        tool_id: string
+        method: "ctx7" | "webfetch"
+        status: "success" | "error" | "not_found"
+        duration_ms: number
+        error?: string
+        source_url?: string
+      }
 
   const FILE_TOOLS = new Set(["read", "write", "edit", "glob", "grep", "bash"])
 
@@ -287,6 +298,7 @@ export namespace Telemetry {
     { category: "warehouse", keywords: ["warehouse", "connection"] },
     { category: "lineage", keywords: ["lineage", "dag"] },
     { category: "memory", keywords: ["memory"] },
+    { category: "docs", keywords: ["docs_lookup"] },
   ]
 
   export function categorizeToolName(name: string, type: "standard" | "mcp"): string {
```
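The diff adds a `{ category: "docs", keywords: ["docs_lookup"] }` entry to a keyword table consumed by `categorizeToolName`, whose body is not shown in the commit. A minimal sketch of keyword-based categorization consistent with that table — the matching logic (lowercase substring scan, `"other"` fallback) is an assumption, and the real function also takes a `"standard" | "mcp"` type argument this sketch ignores:

```typescript
// Keyword table copied from the diff above.
const TOOL_CATEGORIES: { category: string; keywords: string[] }[] = [
  { category: "warehouse", keywords: ["warehouse", "connection"] },
  { category: "lineage", keywords: ["lineage", "dag"] },
  { category: "memory", keywords: ["memory"] },
  { category: "docs", keywords: ["docs_lookup"] },
]

// First category whose keyword appears in the tool name wins;
// anything unmatched falls through to "other".
function categorize(name: string): string {
  const lower = name.toLowerCase()
  for (const { category, keywords } of TOOL_CATEGORIES) {
    if (keywords.some((k) => lower.includes(k))) return category
  }
  return "other"
}
```

Under this scheme the new tool is tagged `docs` in telemetry rollups without touching the categorization function itself — only the data table grows.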
