Commit 94a37d0

sciapanCA and claude committed
Merge origin/main into feature/add-datasource-relevance-filter
Resolved conflict in src/tools/datasources.py by adapting the query relevance filter to main's dict envelope convention:

- get_data_sources now returns {dataSources, hint} (from main) and additionally carries a `message` field when `query` is supplied
- Empty results use a query-specific hint (_DATASOURCES_EMPTY_QUERY_HINT) instead of the 'add a repository' hint when filtering yields nothing
- Tests updated from json.loads(JSON string) assertions to dict shape

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2 parents bed9850 + 339b11d commit 94a37d0

21 files changed

Lines changed: 1224 additions & 186 deletions

.github/workflows/ci.yml

Lines changed: 20 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -94,7 +94,9 @@ jobs:
9494
password: ${{ secrets.GITHUB_TOKEN }}
9595

9696
- name: Login to Docker Hub
97+
id: dockerhub_login
9798
if: github.event_name == 'push'
99+
continue-on-error: true
98100
uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
99101
with:
100102
username: ${{ secrets.DOCKERHUB_USERNAME }}
@@ -111,17 +113,30 @@ jobs:
111113
tags: ${{ env.IMAGE_NAME }}:pr-${{ github.event.number }}
112114
cache-from: type=gha
113115

114-
# Push to main: build multi-platform and push with rolling tags
115-
- name: Build and push Docker image
116+
# Push to main: build multi-platform and push the production GHCR tag.
117+
- name: Build and push Docker image (GHCR)
116118
if: github.event_name == 'push'
117119
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
118120
with:
119121
push: true
120122
platforms: linux/amd64,linux/arm64
121123
file: ./Dockerfile
122-
tags: |
123-
${{ env.IMAGE_NAME }}:main
124-
${{ env.DOCKERHUB_IMAGE }}:mcp-dev
124+
tags: ${{ env.IMAGE_NAME }}:main
125+
labels: |
126+
io.modelcontextprotocol.server.name=io.github.CodeAlive-AI/codealive-mcp
127+
cache-from: type=gha
128+
cache-to: type=gha
129+
130+
# Docker Hub is a secondary self-hosted distribution channel. Missing
131+
# credentials must not block GHCR, because production pulls from GHCR.
132+
- name: Build and push Docker image (Docker Hub)
133+
if: github.event_name == 'push' && steps.dockerhub_login.outcome == 'success'
134+
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
135+
with:
136+
push: true
137+
platforms: linux/amd64,linux/arm64
138+
file: ./Dockerfile
139+
tags: ${{ env.DOCKERHUB_IMAGE }}:mcp-dev
125140
labels: |
126141
io.modelcontextprotocol.server.name=io.github.CodeAlive-AI/codealive-mcp
127142
cache-from: type=gha

.github/workflows/release.yml

Lines changed: 22 additions & 2 deletions
@@ -148,13 +148,20 @@ jobs:
148148
username: ${{ github.actor }}
149149
password: ${{ secrets.GITHUB_TOKEN }}
150150

151+
# Docker Hub publish is a secondary distribution channel for self-hosted
152+
# customers. Treat it as best-effort: missing credentials must NOT block
153+
# the primary release path (GHCR push, MCP Registry publish, git tag,
154+
# GitHub Release). Configure DOCKERHUB_USERNAME / DOCKERHUB_TOKEN in the
155+
# `release` environment to re-enable.
151156
- name: Login to Docker Hub (self-hosted distribution)
157+
id: dockerhub_login
158+
continue-on-error: true
152159
uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
153160
with:
154161
username: ${{ secrets.DOCKERHUB_USERNAME }}
155162
password: ${{ secrets.DOCKERHUB_TOKEN }}
156163

157-
- name: Build and push Docker image
164+
- name: Build and push Docker image (GHCR)
158165
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
159166
with:
160167
push: true
@@ -165,12 +172,25 @@ jobs:
165172
${{ env.IMAGE_NAME }}:${{ steps.version.outputs.version }}
166173
${{ env.IMAGE_NAME }}:v${{ steps.version.outputs.version }}
167174
${{ env.IMAGE_NAME }}:latest
175+
labels: |
176+
io.modelcontextprotocol.server.name=io.github.CodeAlive-AI/codealive-mcp
177+
cache-from: type=gha
178+
cache-to: type=gha
179+
180+
- name: Build and push Docker image (Docker Hub)
181+
if: steps.dockerhub_login.outcome == 'success'
182+
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
183+
with:
184+
push: true
185+
platforms: linux/amd64,linux/arm64
186+
file: ./Dockerfile
187+
build-args: VERSION=${{ steps.version.outputs.version }}
188+
tags: |
168189
${{ env.DOCKERHUB_IMAGE }}:mcp
169190
${{ env.DOCKERHUB_IMAGE }}:mcp-v${{ steps.version.outputs.version }}
170191
labels: |
171192
io.modelcontextprotocol.server.name=io.github.CodeAlive-AI/codealive-mcp
172193
cache-from: type=gha
173-
cache-to: type=gha
174194

175195
# Git tag created AFTER Docker push succeeds — if Docker fails, no stale tag
176196
- name: Create and push git tag

CLAUDE.md

Lines changed: 75 additions & 14 deletions
@@ -118,7 +118,7 @@ This is a Model Context Protocol (MCP) server that provides AI clients with acce
118118
2. Client calls tools (`get_data_sources` → `semantic_search` / `grep_search` → `fetch_artifacts` / `get_artifact_relationships` → `chat` only if synthesis is still needed)
119119
3. Middleware chain runs: N8N cleanup → ObservabilityMiddleware (OTel span + log correlation)
120120
4. Tool translates MCP call to CodeAlive API request (with `X-CodeAlive-*` headers)
121-
5. Response parsed, formatted as XML or text, returned to AI client
121+
5. Response parsed and returned to the AI client — as a `dict` for metadata/discovery tools, as an XML string for `fetch_artifacts`, or as plain text for `chat`
122122

123123
### Environment Variables
124124

@@ -172,6 +172,17 @@ This project uses **loguru** for structured JSON logging. All logs go to **stder
172172

173173
7. **Use `logger.configure(patcher=...)` for global context injection** (like OTel trace_id). Do NOT pass `patcher` to `logger.add()` — loguru 0.7.x does not support it there.
174174

175+
8. **Tool-call failures are warnings with full structured arguments.** Do not log MCP
176+
tool-call failures as `error` unless the whole server process is failing. A bad
177+
tool call is recoverable input for the model loop, same as backend agent tools.
178+
Log it as `logger.warning(..., tool_arguments={...})` or with `.bind(tool_arguments=...)`.
179+
Do not redact `tool_arguments` in the logger path; the purpose is to recover the
180+
exact failing invocation. Authorization headers remain masked via `log_api_request()`.
181+
182+
9. **Tool-call lifecycle logs are debug.** The per-tool "started" and "completed"
183+
messages from `ObservabilityMiddleware` must be `logger.debug`, not `logger.info`.
184+
Info-level logs should not be emitted for every agent step/tool step.
185+
175186
### OTel Trace Correlation
176187

177188
Every log record automatically gets `trace_id` and `span_id` injected by `_otel_patcher` (registered via `logger.configure`). The `ObservabilityMiddleware` also uses `logger.contextualize(trace_id=..., tool=...)` so all logs within a tool call carry the correlation ID. Do not duplicate this — it's automatic.
@@ -194,29 +205,79 @@ The `ObservabilityMiddleware` creates a span per tool call with these attributes
194205

195206
On errors, the span gets `StatusCode.ERROR` + `record_exception()`. Do not add redundant span creation inside tool functions — the middleware handles it.
196207

208+
#### Required MCP observability fix pattern
209+
210+
When touching MCP tool observability, update both the generic middleware and the
211+
tool-specific body:
212+
213+
- In `src/middleware/observability_middleware.py`, extract tool arguments from
214+
the incoming `tools/call` message (FastMCP currently exposes this through the
215+
message payload; keep the extraction defensive). Add them to the log context,
216+
e.g. `with logger.contextualize(trace_id=trace_id, tool=tool_name,
217+
tool_arguments=tool_arguments): ...`.
218+
- Change middleware lifecycle logs:
219+
- `logger.info("Tool call started...")` -> `logger.debug(...)`
220+
- `logger.info("Tool call completed...")` -> `logger.debug(...)`
221+
- `logger.error("Tool call failed...")` -> `logger.warning(...,
222+
tool_arguments=tool_arguments)`
223+
- Keep OTel span semantics unchanged: failed tool calls should still set
224+
`StatusCode.ERROR` and `record_exception(exc)` because tracing represents the
225+
tool invocation outcome, while logs use Warning to avoid misclassifying a
226+
recoverable model/tool-call error as a server crash.
227+
- In each tool body, log in-band validation failures before raising `ToolError`.
228+
Include all tool parameters in `tool_arguments`.
229+
230+
Concrete example: `src/tools/artifact_relationships.py::get_artifact_relationships`
231+
must log these branches as Warning with full arguments:
232+
233+
```python
234+
tool_arguments = {
235+
"identifier": identifier,
236+
"profile": profile,
237+
"max_count_per_type": max_count_per_type,
238+
}
239+
```
240+
241+
- missing/empty `identifier`
242+
- `max_count_per_type` outside `1..1000`
243+
- unsupported `profile` fallback branch
244+
- backend `HTTPStatusError` / unexpected exception before delegating to
245+
`handle_api_error(...)`
246+
247+
The API request/response helpers stay `Debug` and keep their existing masking
248+
rules. Do not put raw response bodies into warning logs.
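
The "log a Warning with full `tool_arguments`, then raise" pattern described above could look roughly like the sketch below. It is an illustration only: it uses the standard-library `logging` module as a stand-in for loguru so it runs without dependencies, `ToolError` is a local placeholder for the framework's exception, and the helper name `validate_relationship_args` is hypothetical.

```python
import logging

logger = logging.getLogger("codealive.tools")


class ToolError(Exception):
    """Local placeholder for the MCP framework's ToolError (assumption)."""


def validate_relationship_args(identifier, profile, max_count_per_type):
    """Validate tool inputs; log a warning with the full call before raising."""
    # Capture every parameter so the exact failing invocation can be recovered.
    tool_arguments = {
        "identifier": identifier,
        "profile": profile,
        "max_count_per_type": max_count_per_type,
    }

    # Branch 1: missing or empty identifier.
    if not identifier or not identifier.strip():
        logger.warning(
            "get_artifact_relationships rejected: empty identifier",
            extra={"tool_arguments": tool_arguments},
        )
        raise ToolError("identifier is required")

    # Branch 2: max_count_per_type outside 1..1000.
    if not 1 <= max_count_per_type <= 1000:
        logger.warning(
            "get_artifact_relationships rejected: max_count_per_type out of range",
            extra={"tool_arguments": tool_arguments},
        )
        raise ToolError("max_count_per_type must be between 1 and 1000")

    return tool_arguments
```

With loguru the same idea would presumably be written as `logger.bind(tool_arguments=tool_arguments).warning(...)`. The point of the sketch is that the warning is emitted before the exception leaves the tool body and carries every parameter unredacted.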
249+
197250
### Adding New Tools — Observability Checklist
198251

199252
When adding a new tool, ensure:
200253
1. The tool receives `ctx: Context` as its first argument (required for lifespan context and logging)
201254
2. API requests include all four `X-CodeAlive-*` headers: `Integration`, `Tool`, `Client`, plus `Authorization`
202255
3. Call `log_api_request()` before and `log_api_response()` after the HTTP call
203-
4. Errors go through `handle_api_error(ctx, e, "description", method=_TOOL_NAME)` — this ensures the `[tool_name]` prefix in error messages
256+
4. Errors are logged as Warning with full `tool_arguments` before they go through `handle_api_error(ctx, e, "description", method=_TOOL_NAME)` — this ensures the `[tool_name]` prefix in error messages and preserves the exact failed call in logs
204257
5. The middleware automatically wraps the tool in an OTel span — no manual span creation needed
205258

206259
## Tool Response Conventions
207260

208-
### Response format: dict for metadata, XML for content
209-
210-
Tools that return **search metadata** (identifiers, match counts, line numbers)
211-
return a `dict`. FastMCP serializes it automatically via `pydantic_core.to_json`,
212-
which preserves Unicode — no manual `json.dumps()` needed. Examples:
213-
`semantic_search`, `grep_search`, `codebase_search`.
214-
215-
Tools that return **source code content** return an **XML string**. XML tags give
216-
the LLM clear structural boundaries between artifacts, content blocks, and
217-
relationships — this is critical for accurate reasoning over multi-artifact
218-
responses. **Do not convert `fetch_artifacts` or `get_artifact_relationships`
219-
to dict/JSON** — the XML structure is intentional.
261+
### Response format: dict for metadata/discovery, XML only for source code
262+
263+
Tools that return **structured metadata** (identifiers, match counts, line
264+
numbers, relationship groups, data source listings) return a `dict` (or list of
265+
dicts). FastMCP serializes it automatically via `pydantic_core.to_json`, which
266+
preserves Unicode — no manual `json.dumps()` needed. Examples:
267+
`semantic_search`, `grep_search`, `codebase_search`, `get_data_sources`,
268+
`get_artifact_relationships`.
269+
270+
**Never call `json.dumps(...)` from a tool's return path.** Python's `json.dumps`
271+
defaults to `ensure_ascii=True` and escapes Cyrillic/CJK/etc. to `\uXXXX`.
272+
Returning a `dict` lets FastMCP route through `pydantic_core.to_json`, which
273+
emits UTF-8. If you must serialize manually for some reason, pass
274+
`ensure_ascii=False` explicitly.
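
To make the escaping behaviour concrete, here is a small standard-library sketch comparing the two `json.dumps` modes. The sample payload is invented for illustration and is not an actual tool response.

```python
import json

# Hypothetical payload with Cyrillic and CJK text.
payload = {"name": "Поиск", "hint": "検索"}

# Default: ensure_ascii=True, so non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(payload)

# Explicit opt-out: the characters are emitted as-is.
readable = json.dumps(payload, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode to the same data; only the on-the-wire form differs.
assert json.loads(escaped) == json.loads(readable) == payload
```

Both forms are valid JSON, so the difference only matters for what the model and a human reader see in the raw response text.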
275+
276+
Only `fetch_artifacts` returns an **XML string**. XML tags give the LLM clear
277+
structural boundaries between artifacts, content blocks, and inline
278+
relationships when streaming source code — this is critical for accurate
279+
reasoning over multi-artifact responses. **Do not convert `fetch_artifacts` to
280+
dict/JSON** — the XML structure is intentional.
220281

221282
### Hint other MCP tools when the response implies a follow-up call
222283

README.md

Lines changed: 49 additions & 12 deletions
@@ -140,8 +140,8 @@ Replace `YOUR_API_KEY_HERE` with your actual API key.
140140
**Option 1: Remote HTTP (Recommended)**
141141

142142
1. Open Cursor → Settings (`Cmd+,` or `Ctrl+,`)
143-
2. Navigate to **"MCP"** in the left panel
144-
3. Click **"Add new MCP server"**
143+
2. Navigate to **"Tools & MCP"** in the left panel (older builds called this **"Tools & Integrations"**)
144+
3. Click **"New MCP Server"**
145145
4. Paste this configuration:
146146

147147
```json
@@ -157,7 +157,9 @@ Replace `YOUR_API_KEY_HERE` with your actual API key.
157157
}
158158
```
159159

160-
5. Save and restart Cursor
160+
5. Save — Cursor reloads the server automatically. The entry is stored in `.cursor/mcp.json` (project) or `~/.cursor/mcp.json` (global).
161+
162+
> **Tip:** Cursor also supports a one-click install deeplink — `cursor://anysphere.cursor-deeplink/mcp/install?name=codealive&config=BASE64_CONFIG`. Only follow deeplinks from trusted sources.
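
As a rough sketch of how the `BASE64_CONFIG` part of such a deeplink might be produced, the snippet below base64-encodes a JSON server entry and appends it to the URL from the tip. The function name `build_cursor_deeplink`, the shape of the `server_config` dict, and the percent-encoding step are assumptions made for illustration; check Cursor's own documentation before relying on them.

```python
import base64
import json
from urllib.parse import quote


def build_cursor_deeplink(name, server_config):
    """Return an install deeplink carrying server_config as base64-encoded JSON."""
    raw = json.dumps(server_config).encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    # Percent-encode both values so '+', '/' and '=' survive URL parsing (assumption).
    return (
        "cursor://anysphere.cursor-deeplink/mcp/install"
        f"?name={quote(name, safe='')}&config={quote(encoded, safe='')}"
    )


# Hypothetical server entry; replace the placeholder with a real key locally.
config = {
    "url": "https://mcp.codealive.ai/api",
    "headers": {"Authorization": "Bearer YOUR_API_KEY_HERE"},
}
print(build_cursor_deeplink("codealive", config))
```

Because the config travels inside the link, a deeplink built from a real API key should be treated as a secret.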
161163
162164
**Option 2: Docker (STDIO)**
163165

@@ -179,29 +181,64 @@ Replace `YOUR_API_KEY_HERE` with your actual API key.
179181
</details>
180182

181183
<details>
182-
<summary><b>Codex</b></summary>
184+
<summary><b>Codex (CLI, App, IDE Extension)</b></summary>
185+
186+
OpenAI Codex ships in three form-factors that **share the same configuration**: the **Codex CLI**, the **Codex App** (macOS / Windows), and the **Codex IDE Extension** (VS Code `openai.chatgpt` and JetBrains 2025.3+). All three read `~/.codex/config.toml`, so one snippet covers every Codex surface. A project-level `.codex/config.toml` in the repo root is also supported for trusted projects.
183187

184-
OpenAI Codex CLI supports MCP via `~/.codex/config.toml`.
188+
**Option 1: One-line add (Recommended)**
189+
190+
```bash
191+
codex mcp add codealive --url https://mcp.codealive.ai/api
192+
```
193+
194+
Then open `~/.codex/config.toml` and add the bearer-token reference plus the Streamable HTTP feature flag:
185195

186-
**`~/.codex/config.toml` (Docker stdio – recommended)**
187196
```toml
197+
[features]
198+
rmcp_client = true
199+
188200
[mcp_servers.codealive]
189-
command = "docker"
190-
args = ["run", "--rm", "-i",
191-
"-e", "CODEALIVE_API_KEY=YOUR_API_KEY_HERE",
192-
"ghcr.io/codealive-ai/codealive-mcp:main"]
201+
url = "https://mcp.codealive.ai/api"
202+
bearer_token_env_var = "CODEALIVE_API_KEY"
203+
```
204+
205+
Finally, export the key:
206+
```bash
207+
export CODEALIVE_API_KEY="YOUR_API_KEY_HERE"
193208
```
194209

195-
**Experimental: Streamable HTTP (requires `[features].rmcp_client = true`)**
210+
Verify with `codex mcp list`.
196211

197-
> **Note:** Streamable HTTP support requires `rmcp_client = true` under a `[features]` section in your Codex configuration.
212+
> **Note:** Streamable HTTP requires `[features].rmcp_client = true`. The old top-level `experimental_use_rmcp_client = true` flag is deprecated. `bearer_token_env_var` is preferred over inline `headers = { Authorization = "Bearer …" }` because it keeps secrets out of the config file.
213+
214+
**Option 2: Inline header (HTTP)**
198215

199216
```toml
217+
[features]
218+
rmcp_client = true
219+
200220
[mcp_servers.codealive]
201221
url = "https://mcp.codealive.ai/api"
202222
headers = { Authorization = "Bearer YOUR_API_KEY_HERE" }
203223
```
204224

225+
**Option 3: Docker (STDIO)**
226+
227+
```toml
228+
[mcp_servers.codealive]
229+
command = "docker"
230+
args = ["run", "--rm", "-i", "ghcr.io/codealive-ai/codealive-mcp:main"]
231+
env_vars = ["CODEALIVE_API_KEY"]
232+
```
233+
234+
```bash
235+
export CODEALIVE_API_KEY="YOUR_API_KEY_HERE"
236+
```
237+
238+
No `[features]` flag is needed for stdio. `env_vars` forwards values from the parent shell — safer than embedding the key in `args`.
239+
240+
**Codex App UI:** Settings → MCP Servers → Add Server. The UI writes the same `~/.codex/config.toml` entry. The CLI and IDE extension pick it up automatically.
241+
205242
</details>
206243

207244
<details>

manifest.json

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
22
"manifest_version": "0.4",
33
"name": "codealive-mcp",
44
"display_name": "CodeAlive",
5-
"version": "2.0.3",
5+
"version": "2.0.4",
66
"description": "Semantic code search and codebase Q&A for Claude Desktop using your CodeAlive account or self-hosted deployment.",
77
"long_description": "CodeAlive gives Claude Desktop access to semantic code search, artifact fetch, repository discovery, and architecture-aware codebase Q&A. This extension runs locally via MCP and supports both CodeAlive Cloud and self-hosted deployments.",
88
"author": {

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ packages = ["src"]
3737
package-dir = {"" = "."}
3838

3939
[tool.setuptools_scm]
40-
fallback_version = "2.0.3"
40+
fallback_version = "2.0.4"
4141

4242
[tool.uv]
4343
# Relative dates in exclude-newer (e.g. "7 days") require uv ≥ 0.11.

server.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
{
22
"$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
33
"name": "io.github.CodeAlive-AI/codealive-mcp",
4-
"version": "2.0.3",
4+
"version": "2.0.4",
55
"description": "Semantic code search and analysis from CodeAlive for AI assistants and agents.",
66
"keywords": [
77
"context-engineering",

src/codealive_mcp_server.py

Lines changed: 3 additions & 0 deletions
@@ -80,6 +80,9 @@
8080
- Use specific function/class names or file path scopes when looking for particular implementations
8181
- Treat `semantic_search` and `grep_search` as the default discovery tools
8282
- Prefer `semantic_search` over the deprecated `codebase_search` legacy alias
83+
- Use `get_artifact_relationships` only with exact artifact identifiers from prior search/fetch results.
84+
It expands a known artifact's relationship graph; it does not search by path, class name, or guessed symbol.
85+
For exact source code, call `fetch_artifacts` on identifiers returned by search or relationships.
8386
- Remember that context from previous messages is maintained in the same conversation
8487
8588
Flexible data source usage:
