feat(claude-agent-sdk): capture gen_ai.skill.* on Skill load execute_tool span#225
Conversation
…tool span Attach gen_ai.skill.name/id/description/version to the execute_tool span of the built-in Skill tool. Telemetry is bound to the ToolUseBlock(name="Skill") tool span (not the SKILL.md-injecting UserMessage TextBlock). - skill.name from ToolUseBlock.input.skill (frontmatter.name fallback) - skill.id = claude:project:<skill-name> - skill.description/version read best-effort from <cwd>/.claude/skills/<name>/SKILL.md frontmatter (cwd from SystemMessage.data.cwd) - fallback to UserMessage.tool_use_result.commandName when start info incomplete - metadata read failures never propagate to the SDK call Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
流屿 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
rangemer333-cell
left a comment
There was a problem hiding this comment.
Code review for AGE-239 — Skill telemetry on execute_tool span.
Verification against spec
-
Telemetry is on
ToolUseBlock(name="Skill")execute_tool span, not on UserMessage TextBlock ✅_create_tool_spans_from_message(patch.py) createsExecuteToolInvocationforname=="Skill"ToolUseBlocks and calls_apply_skill_metadata→handler.start_execute_tool.- The
UserMessageTextBlockcarryingSKILL.mdcontent is handled in_process_user_messageand only feedsuser_parts/collected_messages; no span attributes are written there. ✓
-
Four attributes map correctly,
skill.id = claude:project:<name>✅gen_ai.skill.name= frontmattername(fallback toToolUseBlock.input.skill).gen_ai.skill.id=f"{_SKILL_ID_PREFIX}{name}"with_SKILL_ID_PREFIX = "claude:project:".gen_ai.skill.description/gen_ai.skill.versionfrom frontmatter; only written when non-empty.
-
SKILL.md frontmatter read is best-effort, failures do not escape ✅
_read_skill_metadataswallows file-open errors →{}._parse_skill_frontmatterswallows parse errors →{}._apply_skill_metadatacall site wraps in try/except →logger.warningonly._apply_skill_fallbackcall site also wrapped in try/except.
-
Tests + cassette coverage, regression set green ✅
test_skill_tool_span_attributes: happy path, asserts four attributes +tool.call.id.test_skill_metadata_read_failure_does_not_break_sdk: cwd without SKILL.md → only name/id set.- Regression set (24 passed) green; ruff clean.
Findings
[minor] _apply_skill_fallback branch is not exercised by any test.
In both new tests, skill_name is captured at span start from tool_input.get("skill"), so _apply_skill_fallback returns early on if tool_invocation.skill_name: return. The tool_use_result.commandName path is never hit. Additionally, create_mock_message_from_data in test_with_cassettes.py / test_span_validation.py does not propagate tool_use_result onto the mock UserMessage, so the cassette cannot exercise it either. Suggest adding a case where tool_input omits skill (or sets it empty) and tool_use_result.commandName provides the fallback, asserting skill_name/skill_id are recovered.
[nit] test_skill_load.yaml cwd: __SKILL_CWD__ is an unsubstituted placeholder.
No substitution logic exists, so at runtime _apply_skill_metadata looks for __SKILL_CWD__/.claude/skills/probe-skill/SKILL.md as a relative path and falls through to best-effort. The cassette effectively only serves as smoke; it does not exercise the frontmatter parse path. Either have the cassette support a real tmp-path substitution or document it as smoke-only.
[nit] In _apply_skill_fallback, skill_id does not use str(command_name).
skill_name = str(command_name) but skill_id = f"{_SKILL_ID_PREFIX}{command_name}". f-string calls __str__ implicitly so behavior is identical, but for consistency with the previous line prefer f"{_SKILL_ID_PREFIX}{str(command_name)}".
[nit] _apply_skill_fallback recovers skill_name but does not attempt to re-read SKILL.md.
Per spec this is acceptable (name/id only on fallback). If description/version are wanted on the fallback path, call _read_skill_metadata after commandName is known. Optional enhancement.
Summary
Implementation matches the spec on all four required points; best-effort handling is solid and regression set is clean. Main gap is test coverage for the fallback branch. Suggested event: COMMENT (not APPROVE) pending the fallback test.
| tool_invocation.skill_version = version | ||
|
|
||
|
|
||
| def _apply_skill_fallback( |
There was a problem hiding this comment.
[minor] _apply_skill_fallback is not exercised by any test.
In both new tests, skill_name is captured at span start from tool_input.get("skill"), so this function returns early at if tool_invocation.skill_name: return. The tool_use_result.commandName path below is never hit.
Also, create_mock_message_from_data in test_with_cassettes.py and test_span_validation.py does not propagate tool_use_result onto the mock UserMessage, so the cassette cannot exercise this path either.
Suggest adding a case where tool_input omits skill (or sets it empty) and tool_use_result.commandName provides the fallback, asserting skill_name/skill_id are recovered from the result.
| command_name = tool_use_result.get("commandName") | ||
| if command_name: | ||
| tool_invocation.skill_name = str(command_name) | ||
| tool_invocation.skill_id = ( |
There was a problem hiding this comment.
[nit] skill_id does not use str(command_name).
skill_name = str(command_name) on the previous line, but skill_id = f"{_SKILL_ID_PREFIX}{command_name}". The f-string calls __str__ implicitly so behavior is identical, but for consistency with the previous line prefer f"{_SKILL_ID_PREFIX}{str(command_name)}".
| return | ||
| if not isinstance(tool_use_result, dict): | ||
| return | ||
| command_name = tool_use_result.get("commandName") |
There was a problem hiding this comment.
[nit] Optional enhancement: after recovering skill_name from commandName, re-read SKILL.md via _read_skill_metadata so description/version can still be populated on the fallback path.
Per spec this is acceptable (name/id only on fallback), so feel free to leave as-is — just flagging the option.
| data: | ||
| type: system | ||
| subtype: init | ||
| cwd: __SKILL_CWD__ |
There was a problem hiding this comment.
[nit] cwd: __SKILL_CWD__ is an unsubstituted placeholder.
No substitution logic exists, so at runtime _apply_skill_metadata looks for __SKILL_CWD__/.claude/skills/probe-skill/SKILL.md as a relative path and falls through to best-effort. The cassette effectively only serves as smoke; it does not exercise the frontmatter parse path.
Either have the cassette support a real tmp-path substitution (e.g. a fixture that rewrites __SKILL_CWD__ to tmp_path) or document it as smoke-only.
Summary
Attach
gen_ai.skill.name/gen_ai.skill.id/gen_ai.skill.description/gen_ai.skill.versionto theexecute_toolspan of the built-inSkilltool in the Claude Agent SDK instrumentation.Telemetry is bound to the
ToolUseBlock(name="Skill")tool span — not to the subsequentUserMessageTextBlockthat injectsSKILL.mdcontent (per the AGE-239 spec).Attribute mapping
ToolUseBlock.input.skill/ frontmatternamegen_ai.skill.nameclaude:project:<skill-name>gen_ai.skill.idSKILL.mdfrontmatterdescriptiongen_ai.skill.descriptionSKILL.mdfrontmatterversiongen_ai.skill.versionSystemMessage.data.cwd<cwd>/.claude/skills/<name>/SKILL.mdIf start-time info is incomplete, skill name falls back to
UserMessage.tool_use_result.commandNamebefore the span closes.Robustness
Skill metadata is read best-effort: any failure (missing/unreadable
SKILL.md, malformed frontmatter, …) is swallowed and never affects the SDK call. WhenSKILL.mdis unavailable, onlygen_ai.skill.name/gen_ai.skill.id(derived from the tool input) are set.Validation
All spec-scoped tests pass (24 passed); ruff clean.
Draft PR per task scope — not for merge yet; review threads left open.
🤖 Generated with Claude Code