Skip to content

feat(claude-agent-sdk): capture gen_ai.skill.* on Skill load execute_tool span#225

Closed
rangemer333-cell wants to merge 1 commit into
alibaba:mainfrom
rangemer333-cell:feat/claude-agent-sdk-skill-telemetry
Closed

feat(claude-agent-sdk): capture gen_ai.skill.* on Skill load execute_tool span#225
rangemer333-cell wants to merge 1 commit into
alibaba:mainfrom
rangemer333-cell:feat/claude-agent-sdk-skill-telemetry

Conversation

@rangemer333-cell

Copy link
Copy Markdown

Summary

Attach gen_ai.skill.name / gen_ai.skill.id / gen_ai.skill.description / gen_ai.skill.version to the execute_tool span of the built-in Skill tool in the Claude Agent SDK instrumentation.

Telemetry is bound to the ToolUseBlock(name="Skill") tool span — not to the subsequent UserMessage TextBlock that injects SKILL.md content (per the AGE-239 spec).

Attribute mapping

Source Attribute
ToolUseBlock.input.skill / frontmatter name gen_ai.skill.name
claude:project:<skill-name> gen_ai.skill.id
SKILL.md frontmatter description gen_ai.skill.description
SKILL.md frontmatter version gen_ai.skill.version
SystemMessage.data.cwd locates <cwd>/.claude/skills/<name>/SKILL.md

If start-time info is incomplete, skill name falls back to UserMessage.tool_use_result.commandName before the span closes.

Robustness

Skill metadata is read best-effort: any failure (missing/unreadable SKILL.md, malformed frontmatter, …) is swallowed and never affects the SDK call. When SKILL.md is unavailable, only gen_ai.skill.name/gen_ai.skill.id (derived from the tool input) are set.

Validation

PYTHONPATH="<repo>/util/opentelemetry-util-genai/src:<repo>/instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/src" python -m pytest   instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_span_validation.py::test_skill_tool_span_attributes -q

python -m ruff check <changed-files>
python -m pytest   instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_span_validation.py   instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_with_cassettes.py   instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_task_subagent_real_data.py -q

All spec-scoped tests pass (24 passed); ruff clean.

Note: test_edge_cases.py / test_integration.py require a real claude-agent-sdk install and fail identically on clean main — pre-existing, unrelated to this change.

Draft PR per task scope — not for merge yet; review threads left open.

🤖 Generated with Claude Code

…tool span

Attach gen_ai.skill.name/id/description/version to the execute_tool span of
the built-in Skill tool. Telemetry is bound to the ToolUseBlock(name="Skill")
tool span (not the SKILL.md-injecting UserMessage TextBlock).

- skill.name from ToolUseBlock.input.skill (frontmatter.name fallback)
- skill.id = claude:project:<skill-name>
- skill.description/version read best-effort from <cwd>/.claude/skills/<name>/SKILL.md
  frontmatter (cwd from SystemMessage.data.cwd)
- fallback to UserMessage.tool_use_result.commandName when start info incomplete
- metadata read failures never propagate to the SDK call

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


流屿 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@rangemer333-cell rangemer333-cell left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review for AGE-239 — Skill telemetry on execute_tool span.

Verification against spec

  1. Telemetry is on ToolUseBlock(name="Skill") execute_tool span, not on UserMessage TextBlock

    • _create_tool_spans_from_message (patch.py) creates ExecuteToolInvocation for name=="Skill" ToolUseBlocks and calls _apply_skill_metadatahandler.start_execute_tool.
    • The UserMessage TextBlock carrying SKILL.md content is handled in _process_user_message and only feeds user_parts / collected_messages; no span attributes are written there. ✓
  2. Four attributes map correctly, skill.id = claude:project:<name>

    • gen_ai.skill.name = frontmatter name (fallback to ToolUseBlock.input.skill).
    • gen_ai.skill.id = f"{_SKILL_ID_PREFIX}{name}" with _SKILL_ID_PREFIX = "claude:project:".
    • gen_ai.skill.description / gen_ai.skill.version from frontmatter; only written when non-empty.
  3. SKILL.md frontmatter read is best-effort, failures do not escape

    • _read_skill_metadata swallows file-open errors → {}.
    • _parse_skill_frontmatter swallows parse errors → {}.
    • _apply_skill_metadata call site wraps in try/except → logger.warning only.
    • _apply_skill_fallback call site also wrapped in try/except.
  4. Tests + cassette coverage, regression set green

    • test_skill_tool_span_attributes: happy path, asserts four attributes + tool.call.id.
    • test_skill_metadata_read_failure_does_not_break_sdk: cwd without SKILL.md → only name/id set.
    • Regression set (24 passed) green; ruff clean.

Findings

[minor] _apply_skill_fallback branch is not exercised by any test.
In both new tests, skill_name is captured at span start from tool_input.get("skill"), so _apply_skill_fallback returns early on if tool_invocation.skill_name: return. The tool_use_result.commandName path is never hit. Additionally, create_mock_message_from_data in test_with_cassettes.py / test_span_validation.py does not propagate tool_use_result onto the mock UserMessage, so the cassette cannot exercise it either. Suggest adding a case where tool_input omits skill (or sets it empty) and tool_use_result.commandName provides the fallback, asserting skill_name/skill_id are recovered.

[nit] test_skill_load.yaml cwd: __SKILL_CWD__ is an unsubstituted placeholder.
No substitution logic exists, so at runtime _apply_skill_metadata looks for __SKILL_CWD__/.claude/skills/probe-skill/SKILL.md as a relative path and falls through to best-effort. The cassette effectively only serves as smoke; it does not exercise the frontmatter parse path. Either have the cassette support a real tmp-path substitution or document it as smoke-only.

[nit] In _apply_skill_fallback, skill_id does not use str(command_name).
skill_name = str(command_name) but skill_id = f"{_SKILL_ID_PREFIX}{command_name}". f-string calls __str__ implicitly so behavior is identical, but for consistency with the previous line prefer f"{_SKILL_ID_PREFIX}{str(command_name)}".

[nit] _apply_skill_fallback recovers skill_name but does not attempt to re-read SKILL.md.
Per spec this is acceptable (name/id only on fallback). If description/version are wanted on the fallback path, call _read_skill_metadata after commandName is known. Optional enhancement.

Summary

Implementation matches the spec on all four required points; best-effort handling is solid and regression set is clean. Main gap is test coverage for the fallback branch. Suggested event: COMMENT (not APPROVE) pending the fallback test.

tool_invocation.skill_version = version


def _apply_skill_fallback(

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] _apply_skill_fallback is not exercised by any test.

In both new tests, skill_name is captured at span start from tool_input.get("skill"), so this function returns early at if tool_invocation.skill_name: return. The tool_use_result.commandName path below is never hit.

Also, create_mock_message_from_data in test_with_cassettes.py and test_span_validation.py does not propagate tool_use_result onto the mock UserMessage, so the cassette cannot exercise this path either.

Suggest adding a case where tool_input omits skill (or sets it empty) and tool_use_result.commandName provides the fallback, asserting skill_name/skill_id are recovered from the result.

command_name = tool_use_result.get("commandName")
if command_name:
tool_invocation.skill_name = str(command_name)
tool_invocation.skill_id = (

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] skill_id does not use str(command_name).

skill_name = str(command_name) on the previous line, but skill_id = f"{_SKILL_ID_PREFIX}{command_name}". The f-string calls __str__ implicitly so behavior is identical, but for consistency with the previous line prefer f"{_SKILL_ID_PREFIX}{str(command_name)}".

return
if not isinstance(tool_use_result, dict):
return
command_name = tool_use_result.get("commandName")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Optional enhancement: after recovering skill_name from commandName, re-read SKILL.md via _read_skill_metadata so description/version can still be populated on the fallback path.

Per spec this is acceptable (name/id only on fallback), so feel free to leave as-is — just flagging the option.

data:
type: system
subtype: init
cwd: __SKILL_CWD__

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] cwd: __SKILL_CWD__ is an unsubstituted placeholder.

No substitution logic exists, so at runtime _apply_skill_metadata looks for __SKILL_CWD__/.claude/skills/probe-skill/SKILL.md as a relative path and falls through to best-effort. The cassette effectively only serves as smoke; it does not exercise the frontmatter parse path.

Either have the cassette support a real tmp-path substitution (e.g. a fixture that rewrites __SKILL_CWD__ to tmp_path) or document it as smoke-only.

@sipercai sipercai closed this Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants