Skip to content

Latest commit

 

History

History
366 lines (219 loc) · 37.7 KB

File metadata and controls

366 lines (219 loc) · 37.7 KB

Claude Code Skills: Complete Technical Deep-Dive

Claude Code Skills use a progressive disclosure architecture where skills load in three stages—metadata (30-50 tokens), full instructions, then resources—enabling Claude to access specialized capabilities without context bloat. Skills are stateless, model-invoked, and operate in isolated sandboxed environments with no persistence between sessions. Each skill is simply a directory containing a SKILL.md file with YAML frontmatter and markdown instructions.

Technical architecture and discovery

Claude Code Skills represent a filesystem-based capability system built on progressive disclosure principles. At session initialization, Claude scans designated directories (~/.claude/skills/, .claude/skills/, and plugin locations) to build a lightweight skill index. During this discovery phase, only YAML frontmatter metadata—specifically the name and description fields—loads into context, consuming merely 30-50 tokens per skill. This minimal overhead enables Claude to maintain awareness of potentially hundreds of skills without saturating the context window.

The discovery mechanism requires each skill directory to contain a properly formatted SKILL.md file. The YAML frontmatter must include two mandatory fields: name (max 64 characters, lowercase letters/numbers/hyphens only) and description (max 1024 characters). Invalid YAML causes silent loading failures—skills simply won't appear in the index. The system validates naming conventions, checking for XML tags and reserved words like "anthropic" or "claude" in the name field.

Once indexed, Claude employs semantic matching to determine skill relevance. When analyzing user requests, Claude compares the request intent against all indexed skill descriptions. This model-invoked pattern means Claude autonomously decides which skills to activate—no explicit user commands required. The description field becomes absolutely critical here; vague descriptions like "analyze data files" fail to trigger, while specific descriptions like "analyze Excel spreadsheets, create pivot tables, and generate charts when working with .xlsx files" enable reliable activation.

Three-stage loading mechanism

The progressive disclosure system operates across three distinct stages, each optimizing for different aspects of token efficiency and capability access.

Stage 1: Metadata scanning happens at session start. Every available skill contributes its name and description to the system prompt, creating a comprehensive but lightweight capability index. With 100 skills, this totals roughly 5,000 tokens—significant but manageable given modern context windows. This stage provides Claude with complete awareness of available capabilities at minimal cost.

Stage 2: Full content loading occurs when Claude determines skill relevance. The complete SKILL.md markdown body loads into context, typically consuming under 5,000 additional tokens. In the API execution model, skill files copy to /skills/{skill-name}/ in the container. For Claude Code, files remain in place but become accessible to Claude's bash commands. The system uses bash: read path/to/SKILL.md for content access, treating skills as filesystem resources rather than pre-loaded instructions.

Stage 3: Resource and code execution happens as needed. Skills can bundle additional reference files (forms.md, api_reference.md, templates), executable scripts (Python, bash), and assets (images, templates, data files). These files don't load into context until explicitly accessed or executed. Critically, when Claude executes scripts, the code itself never enters the context window—only script output consumes tokens. This separation enables effectively unlimited bundled content since files remain dormant until needed.

This architecture fundamentally differs from traditional system prompts, which must include all instructions with every request. Skills achieve token efficiency through on-demand loading while maintaining comprehensive capability coverage.

Persistence model and state management

Skills exhibit no persistence between sessions by architectural design. Each API request receives a fresh isolated container where skills copy into the execution environment, execute their function, then disappear when the container terminates. This stateless model ensures security isolation but means skills cannot accumulate learning or maintain configuration across requests.

Within a single session, behavior differs by platform. In Claude.ai web conversations, once a skill loads into context, it persists for that conversation's duration. Subsequent invocations in the same conversation don't require reloading. However, starting a new conversation triggers the discovery process anew. For Claude Code, skills discovered at CLI startup remain available throughout the session, with loaded skill instructions persisting in the conversation context window as long as the session continues.

The filesystem itself provides the only persistence mechanism. Personal skills in ~/.claude/skills/ persist globally across projects and sessions. Project skills in .claude/skills/ remain version-controlled with the codebase, enabling team sharing via git. Plugin skills bundle with plugin installations, auto-discovering when plugins activate and removing when uninstalled. Modifying any skill file requires restarting Claude Code to reflect changes—the discovery scan happens only at initialization.

For API usage, each request truly starts fresh. The container lifecycle follows this pattern: spin up container → copy skills to /skills/ directory → execute request → destroy container. No state carries between calls. This creates a fundamental constraint: skills cannot learn from previous interactions, cache computations, or maintain configuration. Every invocation starts from zero state.

File structure and format specifications

Every skill consists of a directory containing a mandatory SKILL.md file. The minimal viable skill requires just this structure:

---
name: skill-name
description: Clear description of what this skill does and when to use it
---

# Skill Instructions

[Markdown instructions for Claude]

The YAML frontmatter delimiter (---) must appear exactly three times—once before and once after the YAML block. The parser splits on --- and expects valid YAML in the middle section. Required fields follow strict validation rules: the name must contain only lowercase letters, numbers, and hyphens; the description must be non-empty with a practical minimum around 20 characters and maximum of 1024.

Optional frontmatter fields include version for manual version tracking, license for legal clarity, and allowed-tools (Claude Code only) for restricting tool access. The metadata field accepts arbitrary key-value pairs for extensions. All fields must avoid XML tags and reserved words.

Complex skills employ a hierarchical directory structure:

skill-name/
├── SKILL.md                 # Required: Core instructions
├── scripts/                 # Optional: Executable code
│   ├── validator.py         # Python scripts
│   └── processor.sh         # Shell scripts
├── references/              # Optional: Loaded on-demand
│   ├── api_reference.md     # Detailed technical docs
│   ├── schema.md            # Data schemas
│   └── examples.md          # Code examples
└── assets/                  # Optional: Output resources
    ├── template.html        # HTML templates
    ├── logo.png             # Images
    └── fonts/               # Font files

This structure enables progressive disclosure at the file level. The SKILL.md might reference references/api_reference.md for detailed API documentation, which Claude reads only when needed. Scripts execute without loading into context. Assets provide templates and resources for generated outputs.

Discovery process and registration

Skills register through implicit discovery rather than explicit API calls. At session initialization, Claude scans all configured directories in this priority order:

  1. Personal skills: ~/.claude/skills/*/SKILL.md
  2. Project skills: .claude/skills/*/SKILL.md
  3. Plugin skills: [plugin-dir]/skills/*/SKILL.md

The scan validates each SKILL.md file, parsing the YAML frontmatter and building the skill index. Invalid skills fail silently—the scan continues, but broken skills simply don't appear in the available capabilities list. Running Claude Code with debug flags reveals loading errors, critical for troubleshooting mysterious missing skills.

For API registration, the workflow differs substantially. Custom skills require explicit upload via the /v1/skills endpoint, which returns a skill_id like skill_01AbCdEfGhIjKlMnOpQrStUv. Version management happens through /v1/skills/{skill_id}/versions, creating epoch timestamp versions (e.g., 1759178010641129). Anthropic-managed skills use date-based versions (e.g., 20251002 for October 2, 2025). The special version string "latest" always resolves to the most recent version, though pinning specific versions prevents unexpected behavior changes.

Skill precedence follows directory scan order when name collisions occur—personal skills override project skills override plugin skills. However, skills with identical names generally indicates a configuration error rather than intentional overriding.

Model-invoked execution pattern

Claude Code Skills operate on a fundamentally different invocation model than traditional slash commands or API endpoints. Users never explicitly call skills; instead, Claude autonomously decides when skills apply. This model-invoked pattern treats skills as available capabilities that Claude selects based on semantic understanding of user intent.

When receiving a request, Claude follows this decision process: parse user intent from natural language → match intent against all indexed skill descriptions → evaluate relevance scores → load applicable skills → coordinate usage if multiple skills needed. The entire process happens transparently, though Claude's "thinking" often reveals which skills loaded ("Reading QBR skill...").

This autonomy enables sophisticated composition. For a request like "Create a Q3 financial report following our brand guidelines," Claude might identify needs for the brand-guidelines skill (formatting standards), excel skill (data analysis), and pptx skill (presentation creation). It loads all three, coordinates their usage (data analysis → formatting → presentation), and generates output following all three skill instruction sets simultaneously.

The model-invoked pattern creates both power and challenges. Power: Claude intelligently applies skills without manual orchestration. Challenges: skill descriptions must be exceptionally clear for reliable triggering. The description field carries dual responsibility—explaining what the skill does AND specifying when to use it. Poor descriptions cause the single most common skill issue: properly installed skills that never activate.

Multiple skill composition and limits

Claude can use multiple skills simultaneously with automatic coordination. The API enforces a hard limit of 8 skills per request. For Claude.ai and Claude Code, no documented upper limit exists, though token efficiency degrades beyond approximately 100 available skills. The metadata for 100 skills consumes around 5,000 tokens—significant overhead before any actual work begins.

Skills compose through Claude's orchestration rather than direct skill-to-skill communication. Skills cannot explicitly reference or invoke other skills. Instead, Claude loads multiple relevant skills and integrates their instructions through its reasoning process. This architecture keeps skills modular and focused while enabling sophisticated combined behaviors.

Common composition patterns include: data analysis skill → visualization skill, brand guidelines skill + document creation skill, code generation skill + testing skill. The document-skills bundle demonstrates deliberate composition design—separate skills for xlsx, docx, pptx, and pdf that work independently but compose naturally for multi-format document workflows.

No explicit conflict resolution mechanism exists. When skills contain contradictory instructions, Claude must resolve conflicts through reasoning. Best practice prevents conflicts through clear, non-overlapping skill descriptions with distinct trigger terms. Skills should focus on single domains rather than attempting broad coverage.

Loading and unloading mechanics

Skills cannot be explicitly unloaded during a session. Once skill instructions load into context, they persist until session end. No API, command, or mechanism exists to remove a loaded skill mid-conversation. This architectural decision simplifies state management but means loaded skills continue consuming context tokens throughout the session.

Workarounds for effective unloading include: ending the session and starting fresh, disabling skills in Settings before starting conversations (prevents future loading), or deleting the skill from the filesystem (requires restart). For API usage, each request starts with a fresh container, so "unloading" happens automatically—the next request simply doesn't include that skill.

Automatic cleanup occurs at session termination. All loaded skill content clears from context, containers destroy (API), and memory releases. No manual cleanup required. The filesystem remains unchanged—skills persist on disk until explicitly deleted via rm -rf ~/.claude/skills/my-skill or similar commands.

This unload limitation affects performance strategy. Loading unnecessary skills wastes context tokens for the entire session. Developers should carefully curate available skills, removing or disabling unused ones. The principle "fewer, better skills" outperforms "comprehensive skill library" for token efficiency.

Project setup and configuration requirements

Setting up Claude Code Skills requires minimal configuration but has specific prerequisites by platform.

Claude.ai requirements: Pro, Max, Team, or Enterprise plan (free tier excludes skills entirely). Code execution must be enabled in Settings → Capabilities. The Skills feature toggle must be on. For Enterprise, organization admins must enable code execution org-wide—individual users cannot override disabled capabilities.

Claude Code requirements: Claude Code v1.0 or later provides built-in skill support. Skills discover automatically from filesystem locations. No additional configuration necessary beyond placing SKILL.md files in appropriate directories. Changes require restarting Claude Code to trigger rediscovery.

API requirements: Standard API key with appropriate access. Three beta headers required in requests: code-execution-2025-08-25, skills-2025-10-02, and files-api-2025-04-14. The code execution tool must be included in the tools array. Skills must be uploaded via /v1/skills endpoint or reference Anthropic-managed skills by ID.

For personal skills, creating the directory structure suffices:

mkdir -p ~/.claude/skills/my-skill
cat > ~/.claude/skills/my-skill/SKILL.md <<'EOF'
---
name: my-skill
description: What it does and when to use it
---
# Instructions here
EOF

For project-level skills shared across teams, place skills in .claude/skills/ within the project root and commit to version control. This enables team distribution via git without requiring each developer to manually install skills.

Integration with code execution environment

Skills require the code execution tool as a mandatory dependency. This tight coupling means skills cannot function without code execution capabilities enabled. The execution environment provides a sandboxed Linux container with pre-installed packages (Python PyPI libraries, npm packages), bash command access, and filesystem operations.

Critical constraints affect skill design: no network access (the container has no external connectivity), no runtime package installation in API contexts (only pre-installed packages available), and no persistence between sessions (each container starts clean). Claude.ai and Claude Code can install packages on-demand from PyPI and npm, but API skills must rely exclusively on pre-configured container packages.

Script execution provides deterministic reliability. When skills include Python or bash scripts, Claude executes them via the code execution tool. Crucially, the script code itself never loads into context—only the script output consumes tokens. This enables complex computations, data transformations, or validations without context overhead. A skill can bundle a 1000-line Python script that performs complex calculations, and only the final result number enters context.

The security model mirrors code execution's sandboxing. Same permission prompts apply. Malicious skills can execute arbitrary code within container constraints. This risk makes skill source trustworthiness absolutely critical. Only install skills from sources you would trust to run code on your system—Anthropic-provided skills, your own creations, or thoroughly audited community skills.

Common pitfalls and gotchas

The single most frequent issue developers encounter: skills won't trigger despite proper installation. Root causes trace almost entirely to description field problems. Vague descriptions like "analyze data files" lack specificity for Claude's matching algorithm. Effective descriptions follow this pattern: "Analyze Excel spreadsheets, create pivot tables, and generate charts. Use when working with Excel files, spreadsheets, or analyzing tabular data in .xlsx format." The description must include both what the skill does and when to use it, with specific trigger terms that match likely user phrasings.

Context management creates substantial challenges in extended sessions. Claude Code can "give up too early" on complex tasks, abandoning work with explanations like "I've made significant progress but the requested functionality doesn't work for major cases." Mitigation: break large tasks into smaller, independently verifiable chunks. Two 10-minute tasks consistently outperform one 2-day struggle. When Claude hits context limits, it performs automatic compaction that summarizes previous work. Post-compaction Claude often forgets which files it was examining, repeats previously corrected mistakes, and may give up again. Manual triggers exist (/compact command) for proactive compaction, with /clear as the nuclear option for complete context reset.

Test-related failures plague developers. Claude initially writes tests that "look right at first glance but fail on first encounter," potentially spiraling into bad tests enabling bad code. Best practice: test-driven development where Claude writes tests first, developers rigorously review them, then implementation begins. Warning sign: Claude modifying tests to match wrong implementation rather than fixing code. Be extremely wary of any test file changes during implementation phases.

Compilation and build steps frequently get forgotten. Even with explicit CLAUDE.md instructions, Claude forgets to compile before running tests, especially after dependency changes. This creates false pass/fail loops—tests run against outdated binaries. Fix: manually interrupt (ESC key) and explicitly remind Claude to compile/install before test execution.

Git management issues emerge constantly. Claude uses unusual Git commands, creates PRs with wrong merge bases, or commits unintended files. Best practice from production users: "I do all the Git stuff"—developers drive version control, letting Claude modify files only. Manual git status reviews before commits prevent accidental 100MB binary commits.

Dead code accumulates when Claude rewrites without corresponding deletes. Instead of replacing old implementations, Claude creates parallel versions with "New" prefixes or leaves partial implementations scattered through files. Fix: dedicated cleanup sessions with focused prompts like "There's a lot of dead code in file X. Please examine carefully and remove any dead code."

Platform-specific limitations and edge cases

Skills don't sync across platforms. Skills uploaded to Claude.ai must be separately uploaded to the API. Claude Code skills are filesystem-based and completely separate. Each platform requires independent skill management. No centralized administration exists—enterprise admins cannot deploy organization-wide custom skills on Claude.ai (each user uploads individually), though API skills are workspace-wide.

Built-in skills vary by platform. Skills enabled in Claude.ai don't automatically appear in Claude Code. This inconsistency causes significant user confusion with no clear error messages. The built-in document skills (xlsx, docx, pptx, pdf) ship with Claude Code but require explicit enabling on Claude.ai. Community developers report this as a frequent friction point.

No network access creates a hard constraint on skill capabilities. Skills cannot make external API calls, fetch remote data, or connect to databases. This sandbox security feature prevents entire categories of skills—no weather data fetching, no CRM integration, no real-time stock prices. MCP (Model Context Protocol) servers provide the external connectivity that skills deliberately exclude.

Rate limits and undefined constraints affect usage unpredictably. Users report hitting undefined limits mid-task, with stalls and errors. Performance varies dramatically—"one session cruises through tricky refactoring, next coughs and forgets context." Some users report limits as low as 45 messages per 5 hours on certain plans. This inconsistency makes skill development challenging for resource-intensive workflows.

skill-creator paradox creates workflow friction: you cannot use the skill-creator skill within a project to create a skill about that project—the most intuitive workflow. Workarounds involve creating skills outside project context then moving them, adding unnecessary friction to the development loop.

Performance considerations and optimization

Token efficiency represents the core performance advantage. Progressive disclosure means 100 available skills consume only ~5,000 tokens until specific skills load. This scales dramatically better than alternatives. MCP servers can consume tens of thousands of tokens just for protocol metadata. System prompts must include all instructions every request. Skills provide comparable capabilities at a fraction of the cost.

However, token efficiency degrades with scale. Beyond approximately 100 available skills, the metadata overhead becomes substantial. Each skill adds 30-50 tokens to every request's system prompt. With 200 skills, you're spending 10,000+ tokens before Claude performs any actual work. Practical deployment requires curating skill collections rather than enabling everything available.

Optimization strategies for skill design:

Keep SKILL.md concise—under 500 lines as a guideline. Split larger content into separate reference files that load on-demand. Structure skills to minimize content loading. If your skill handles PDF forms, tables, and text extraction but users typically need just one operation, split into separate reference files (forms.md, tables.md, text_extraction.md) that load independently.

Use scripts for deterministic operations. Sorting 1000 items algorithmically consumes zero context tokens (only the sorted result appears). Asking Claude to sort via text generation consumes thousands of tokens and produces unreliable results. Code execution beats token generation for mathematical computation, data transformation, validation, and formatting operations.

Optimize descriptions for fast matching. Clear, specific descriptions help Claude make relevance decisions quickly. Vague descriptions require more reasoning tokens. The description "Create formatted PDF documents with tables and charts using Python's ReportLab library" matches faster and more reliably than "PDF file operations."

Prompt caching compatibility: Skills work with Claude's prompt caching feature (5-minute lifetime, extends on use). Keeping skill sets stable enables cache hits on the system prompt containing skill metadata. Changing the available skills invalidates the cache. For high-volume API usage, stable skill configurations significantly reduce costs.

Security risks and mitigation strategies

Skills execute arbitrary code in Claude's environment with the same permissions as code execution tool operations. This creates substantial security surface area. Primary risk vectors: prompt injection (malicious skills manipulating Claude to execute unintended actions), data exfiltration (skills with access to sensitive data leaking information), and tool misuse (skills invoking tools in harmful ways).

Anthropic's official security warning: "We strongly recommend using Skills only from trusted sources: those you created yourself or obtained from Anthropic. Skills provide Claude with new capabilities through instructions and code, and while this makes them powerful, it also means a malicious Skill can direct Claude to invoke tools or execute code in ways that don't match the Skill's stated purpose."

Audit guidelines for third-party skills:

Review all files bundled in the skill—SKILL.md, scripts, images, resources. Look for unusual patterns like unexpected network connection attempts, file access outside skill boundaries, or suspicious data handling. Pay attention to code dependencies and bundled resources. Check for instructions connecting to external networks (note: blocked by container, but indicates intent). Inspect any instructions handling sensitive data.

For production systems with sensitive data, implement validation layers beyond Claude's sandboxing. Run skills in further isolated environments. Perform regular security reviews of installed skills. Maintain audit logs of skill usage. Treat skill installation like installing software packages—apply similar security rigor.

The allowed-tools field (Claude Code only) provides permission restrictions, limiting which tools skills can invoke. Example: a read-only analysis skill might specify allowed-tools: ["read", "grep"] to prevent file modifications. However, this field provides defense-in-depth rather than primary security—always vet skill code thoroughly.

Best practices from production usage

Developers with extensive production experience converge on several critical practices.

Task sizing dramatically affects success rates. Small, isolated problems consistently outperform large, complex tasks. DoltHub's experience with a 42,000-test suite: "Two 10-minute PRs vs one $100, 2-day struggle." Even if you would naturally group tasks, don't for Claude. If Claude starts giving up, the task is definitively too large—subdivide immediately.

Active collaboration beats passive supervision. Stop thinking of Claude Code as a tool; treat it as an incredibly fast junior developer needing good direction. Be an active collaborator reviewing plans before execution, guiding approach rather than just watching, interrupting when trajectories go off track. The ESC key becomes your most-used interface element for course correction.

Test-driven development prevents death spirals. Have Claude write tests first. Spend more time reviewing generated tests than reviewing code. Give Claude tools to see its outputs (screenshots, test results). Provide expected outputs like tests for iteration. This workflow prevents bad tests enabling bad code, a common failure mode.

Description fields require iteration and testing. Write descriptions, test with representative prompts, refine based on whether skills trigger correctly. Include specific terminology users will employ. "PDF files" works better than "document management." "Excel spreadsheets" beats "tabular data analysis." Match user vocabulary exactly.

CLAUDE.md files provide project-level guidance. Create this file at project root with explicit, copy-pastable commands. Include coding conventions, testing instructions, compilation steps, safety warnings ("Never delete files outside ./tmp"), and scope boundaries. Put most important rules at top. Brief, unambiguous sections work better than comprehensive documentation.

Manual Git control prevents disasters. Let Claude modify code files. Developers drive git add, git commit, git push. Review git status before every commit. This division prevents wrong merge bases, accidental binary commits, and unusual Git command usage.

Troubleshooting toolkit and debugging

Skill won't trigger checklist:

  • Description includes both "what" and "when" ✓
  • Contains specific trigger words matching user vocabulary ✓
  • Code execution enabled in Settings ✓
  • Skill toggled ON in Settings → Capabilities ✓
  • Claude Code restarted after adding skill ✓
  • Try explicit mention: "Use my [skill-name] skill to..." ✓

Upload failures checklist:

  • ZIP file under size limits (8MB for API) ✓
  • Folder name exactly matches skill name ✓
  • SKILL.md file present with correct capitalization ✓
  • No invalid characters in name/description ✓
  • YAML parses correctly (test with online validator) ✓

Skills greyed out: Code execution may be disabled organization-wide (Team/Enterprise plans). Check with organization Owner. Individual users cannot override org-level restrictions.

Debug mode for skill loading: Run Claude Code with debug flags to see detailed skill loading logs. Specific commands vary by version—consult current documentation. Look for YAML parsing errors, file access issues, or validation failures.

Performance debugging: If skills load but perform poorly, profile which skills consume the most tokens. The skill-performance-profiler meta-skill (community-developed) identifies wasteful skills. Consider consolidating or removing underused skills to free context budget.

Community troubleshooting resources: The awesome-claude-skills GitHub repository maintains troubleshooting guides. DEV community posts document common issues. Anthropic's Discord channels provide peer support. GitHub issues on anthropics/skills repository track known problems.

Practical step-by-step implementation

Quick start (5 minutes):

# Create personal skill directory
mkdir -p ~/.claude/skills/example-skill

# Create basic SKILL.md
cat > ~/.claude/skills/example-skill/SKILL.md <<'EOF'
---
name: example-skill
description: Demonstrates basic skill structure. Use when testing skill functionality.
---

# Example Skill

## Instructions

When this skill is active, respond with "Example skill is working!"

## Examples

- User: "Test the example skill"
- Response: "Example skill is working!"
EOF

# Restart Claude Code to discover new skill
# Test with: "Test the example skill"

Production deployment workflow:

  1. Define skill scope: Identify specific capability gap. What should Claude do that it currently cannot or does poorly? Write one-sentence goal.

  2. Create skill structure: Use skill-creator skill or manual template. Establish directory with SKILL.md. Add subdirectories (scripts/, references/, assets/) as needed.

  3. Write description carefully: Spend disproportionate time here. Include what the skill does, when to use it, and specific trigger terms. Test descriptions with colleagues—can they predict when skill would activate?

  4. Implement instructions: Write clear, imperative instructions. Use "Do X" not "You should do X." Reference external files for detailed content. Keep main SKILL.md under 500 lines.

  5. Add executable components: Create scripts for deterministic operations. Place in scripts/ directory. Make executable (chmod +x). Test independently before integration.

  6. Test thoroughly: Create test prompts matching description. Verify skill triggers. Check output follows instructions. Test edge cases. Validate script execution. Test with multiple models (Haiku needs more guidance, Opus needs less).

  7. Deploy appropriately: Personal skills to ~/.claude/skills/. Project skills to .claude/skills/ and commit to git. API skills via /v1/skills endpoint. Document installation process for team.

  8. Iterate based on usage: Monitor how Claude uses skill in practice. Look for unexpected behaviors or missed activations. Refine description and instructions. Have Claude capture successful patterns into refined skill versions.

Example production skill (webapp-testing):

---
name: webapp-testing
description: Test local web applications using Playwright. Use when users want to automate browser testing, verify UI functionality, or test web applications.
---

# Web Application Testing Skill

## Instructions

1. Ensure Playwright is installed: `pip install playwright`
2. Install browsers: `playwright install`
3. Create test script in scripts/test_app.py
4. Run tests: `python scripts/test_app.py`
5. Report results with screenshots for failures

## Test Structure

```python
from playwright.sync_api import sync_playwright

def test_homepage():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto('http://localhost:3000')
        assert page.title() == 'Expected Title'
        browser.close()

Examples

  • "Test that the login form accepts credentials"
  • "Verify the checkout flow works end-to-end"
  • "Check that all navigation links work correctly"

## Architectural comparison: skills vs alternatives

**Skills vs MCP servers:** MCP provides external tool integration with network access, complex server setup, and tens of thousands of tokens overhead. Skills provide procedural knowledge, zero network access, simple filesystem deployment, and 30-50 token overhead until loaded. Use MCP for external data sources (GitHub API, CRM access). Use skills for workflows and procedures (how to use that GitHub API effectively). They complement rather than compete.

**Skills vs system prompts:** System prompts must include all instructions every request. Skills load metadata cheaply then full content on-demand. System prompts lack versioning and reusability. Skills are modular and shareable. For one-off customizations, use system prompts. For reusable capabilities, use skills.

**Skills vs subagents:** Subagents are separate AI assistants with isolated context windows, delegated for complex multi-step workflows. Skills are capability enhancements that share the main conversation context. Use subagents for complex analysis, code review, research, or multi-step reasoning. Use skills for utility functions, templates, domain knowledge, or quick operations. Skills and subagents work together—a subagent might invoke skills during its delegated task.

**Skills vs Projects (Claude.ai):** Projects provide bounded workspaces with accumulated context over time. Skills provide capabilities that work across all conversations. Projects are context, skills are capability. Use both together—projects provide domain knowledge and conversation history, skills provide procedures for working with that context.

## Future developments and ecosystem

Anthropic's stated roadmap includes simplified skill creation workflows, enterprise-wide deployment capabilities, better tooling for skill discovery and sharing, and exploring agents creating/editing their own skills. The vision: agents that capture successful patterns into reusable skills automatically, building organizational capability libraries over time.

Community predictions suggest explosive skill ecosystem growth. Simon Willison: "I expect we'll see a Cambrian explosion in Skills which will make this year's MCP rush look pedestrian by comparison." The simplicity advantage—markdown + YAML versus complex protocols—lowers barriers to contribution dramatically.

Emerging patterns include skills-as-code (version controlling skills like software, code review for changes), AGENTS.md formalization (skills make the established AGENTS.md convention scalable through progressive disclosure), and skill composition architectures (building complex behaviors from simple, focused skills).

The skills complement MCP future involves skills teaching complex workflows using external tools provided by MCP. Example: MCP server provides Salesforce API access, skill teaches effective CRM management workflows using that API. This division—MCP for connectivity, skills for capability—creates powerful combinations.

## Critical takeaways for practitioners

**What works:** Small focused skills for specific tasks, test-driven development workflows, breaking large tasks into smaller chunks, active collaboration with Claude (not passive supervision), manual Git control, clear specific descriptions with trigger words, progressive disclosure architecture, skills for utilities with subagents for complex workflows.

**What doesn't work:** Vague skill descriptions, letting Claude manage Git, large complex tasks in one go, trusting test changes without review, assuming skills sync across platforms, ignoring context compaction signals, network-dependent operations in skills, installing untrusted skills.

**Core success factors:** Description quality determines 80% of trigger reliability. Task size dramatically affects success rate—smaller always better. Test review rigor before implementation prevents death spirals. Proactive context management with `/compact` before hitting limits. Treat as collaboration, not magic. Security matters—only trusted sources. Platform awareness—know which features work where.

Skills represent a fundamentally different architectural approach to extending AI capabilities. The progressive disclosure model, model-invoked pattern, and filesystem-based simplicity create a capability system that scales efficiently while remaining accessible to non-expert developers. For Claude Code workflows, skills should be the first extension mechanism considered, with more complex alternatives like MCP reserved for requirements—particularly external connectivity—that skills cannot address.