Transform test files into documentation assets by extracting real API usage patterns.

The Test Example Extractor analyzes test files to automatically extract meaningful usage examples showing:
- Object Instantiation: Real parameter values and configuration
- Method Calls: Expected behaviors and return values
- Configuration Examples: Valid configuration dictionaries
- Setup Patterns: Initialization from setUp() methods and pytest fixtures
- Multi-Step Workflows: Integration test sequences
| Language | Extraction Method | Supported Features |
|---|---|---|
| Python | AST-based (deep) | All categories, high accuracy |
| JavaScript | Regex patterns | Instantiation, assertions, configs |
| TypeScript | Regex patterns | Instantiation, assertions, configs |
| Go | Regex patterns | Table tests, assertions |
| Rust | Regex patterns | Test macros, assertions |
| Java | Regex patterns | JUnit patterns |
| C# | Regex patterns | xUnit patterns |
| PHP | Regex patterns | PHPUnit patterns |
| Ruby | Regex patterns | RSpec patterns |
```bash
# Extract from directory
skill-seekers extract-test-examples tests/ --language python

# Extract from single file
skill-seekers extract-test-examples --file tests/test_scraper.py

# JSON output
skill-seekers extract-test-examples tests/ --json > examples.json

# Markdown output
skill-seekers extract-test-examples tests/ --markdown > examples.md

# Filter by confidence
skill-seekers extract-test-examples tests/ --min-confidence 0.7

# Limit examples per file
skill-seekers extract-test-examples tests/ --max-per-file 5
```

```python
# From Claude Code
extract_test_examples(directory="tests/", language="python")

# Single file with JSON output
extract_test_examples(file="tests/test_api.py", json=True)

# High confidence only
extract_test_examples(directory="tests/", min_confidence=0.7)
```

```bash
# Combine with codebase analysis
skill-seekers analyze --directory . --extract-test-examples
```

```json
{
  "total_examples": 42,
  "examples_by_category": {
    "instantiation": 15,
    "method_call": 12,
    "config": 8,
    "setup": 4,
    "workflow": 3
  },
  "examples_by_language": {
    "Python": 42
  },
  "avg_complexity": 0.65,
  "high_value_count": 28,
  "examples": [
    {
      "example_id": "a3f2b1c0",
      "test_name": "test_database_connection",
      "category": "instantiation",
      "code": "db = Database(host=\"localhost\", port=5432)",
      "language": "Python",
      "description": "Instantiate Database: Test database connection",
      "expected_behavior": "self.assertTrue(db.connect())",
      "setup_code": null,
      "file_path": "tests/test_db.py",
      "line_start": 15,
      "line_end": 15,
      "complexity_score": 0.6,
      "confidence": 0.85,
      "tags": ["unittest"],
      "dependencies": ["unittest", "database"]
    }
  ]
}
```

# Test Example Extraction Report
**Total Examples**: 42
**High Value Examples** (confidence > 0.7): 28
**Average Complexity**: 0.65
## Examples by Category
- **instantiation**: 15
- **method_call**: 12
- **config**: 8
- **setup**: 4
- **workflow**: 3
## Extracted Examples
### test_database_connection
**Category**: instantiation
**Description**: Instantiate Database: Test database connection
**Expected**: self.assertTrue(db.connect())
**Confidence**: 0.85
**Tags**: unittest
```python
db = Database(host="localhost", port=5432)
```

Source: tests/test_db.py:15
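
The JSON report is plain data, so it can be post-processed with the standard library. The sketch below assumes only the field names shown in the sample JSON output (`examples`, `category`, `confidence`, `test_name`, `code`). The helper `select_examples` and the trimmed sample report are hypothetical and are not part of skill-seekers.

```python
import json

# Trimmed, made-up report that reuses the field names from the sample JSON output.
SAMPLE_REPORT = r"""
{
  "total_examples": 2,
  "examples": [
    {"test_name": "test_database_connection", "category": "instantiation",
     "code": "db = Database(host=\"localhost\", port=5432)", "confidence": 0.85},
    {"test_name": "test_defaults", "category": "config",
     "code": "config = {\"debug\": True, \"cache_enabled\": False}", "confidence": 0.55}
  ]
}
"""


def select_examples(report, category=None, min_confidence=0.0):
    """Return examples at or above min_confidence, optionally limited to one category."""
    return [
        ex for ex in report["examples"]
        if ex["confidence"] >= min_confidence
        and (category is None or ex["category"] == category)
    ]


report = json.loads(SAMPLE_REPORT)
for ex in select_examples(report, min_confidence=0.7):
    print(f'{ex["test_name"]}: {ex["code"]}')
```

In practice the report would be loaded from the file written by `--json > examples.json` instead of an inline string.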
## Extraction Categories
### 1. Instantiation
**Extracts**: Object creation with real parameters
```python
# Example from test
db = Database(
    host="localhost",
    port=5432,
    user="admin",
    password="secret"
)
```

**Use Case**: Shows valid initialization parameters
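
To illustrate how AST-based extraction of instantiations could work, here is a minimal sketch using Python's `ast` module (Python 3.9+ for `ast.unparse`). It is not the project's `_find_instantiations()` implementation. The function name `find_instantiations` and the heuristic "a capitalized callee with at least one argument" are assumptions made for illustration.

```python
import ast

SOURCE = '''
def test_connection():
    db = Database(host="localhost", port=5432)
    helper = make_helper()
    empty = Widget()
    assert db.connect()
'''


def find_instantiations(source):
    """Return assignments whose value calls a capitalized name with arguments."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)):
            continue
        call = node.value
        func = call.func
        # Plain names (Database) and attribute access (models.Database) are both handled.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
        # Assumed heuristic: class names are capitalized; empty constructors are skipped.
        if name[:1].isupper() and (call.args or call.keywords):
            found.append(ast.unparse(node))
    return found


print(find_instantiations(SOURCE))
```

Note that `ast.unparse` regenerates the code from the tree, so quoting and line breaks may differ from the original test file.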
### 2. Method Calls

**Extracts**: Method calls followed by assertions

```python
# Example from test
response = api.get("/users/1")
assert response.status_code == 200
```

**Use Case**: Demonstrates expected behavior
### 3. Configuration

**Extracts**: Configuration dictionaries (2+ keys)

```python
# Example from test
config = {
    "debug": True,
    "database_url": "postgresql://localhost/test",
    "cache_enabled": False
}
```

**Use Case**: Shows valid configuration examples
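
The "2+ keys" rule can be expressed directly on the syntax tree. The sketch below is an assumption about how such a check might look, not the project's `_find_config_dicts()`. The name `find_config_dicts` and the restriction to string keys are illustrative choices.

```python
import ast

SOURCE = '''
def test_settings():
    config = {"debug": True, "database_url": "postgresql://localhost/test", "cache_enabled": False}
    single = {"only": 1}
    app = App(config)
'''


def find_config_dicts(source, min_keys=2):
    """Return (variable name, keys) for dict literals with at least min_keys string keys."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Dict):
            # Keep only literal string keys; `**other` unpacking shows up as a None key.
            keys = [
                k.value for k in node.value.keys
                if isinstance(k, ast.Constant) and isinstance(k.value, str)
            ]
            target = node.targets[0]
            if len(keys) >= min_keys and isinstance(target, ast.Name):
                results.append((target.id, keys))
    return results


print(find_config_dicts(SOURCE))
```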
### 4. Setup

**Extracts**: setUp() methods and pytest fixtures

```python
# Example from setUp
self.client = APIClient(api_key="test-key")
self.client.connect()
```

**Use Case**: Demonstrates initialization sequences
### 5. Workflows

**Extracts**: Multi-step integration tests (3+ steps)

```python
# Example workflow
user = User(name="John", email="john@example.com")
user.save()
user.verify()
session = user.login(password="secret")
assert session.is_active
```

**Use Case**: Shows complete usage patterns
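
One possible reading of "3+ steps" is "three or more assignments or call statements in a single test function". The sketch below uses that interpretation. How the project's `_find_workflows()` actually counts steps is not stated in this document, so both the counting rule and the name `find_workflows` are assumptions.

```python
import ast

SOURCE = '''
def test_user_login():
    user = User(name="John", email="john@example.com")
    user.save()
    user.verify()
    session = user.login(password="secret")
    assert session.is_active

def test_trivial():
    assert add(1, 1) == 2
'''


def is_step(stmt):
    """Count assignments and bare calls as steps; ignore docstrings and asserts."""
    if isinstance(stmt, ast.Assign):
        return True
    return isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)


def find_workflows(source, min_steps=3):
    """Return names of test functions with at least min_steps steps."""
    workflows = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            if sum(is_step(stmt) for stmt in node.body) >= min_steps:
                workflows.append(node.name)
    return workflows


print(find_workflows(SOURCE))
```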
Typical confidence scores by category:

- Instantiation: 0.8 (high - clear object creation)
- Method Call + Assertion: 0.85 (very high - behavior proven)
- Config Dict: 0.75 (good - clear configuration)
- Workflow: 0.9 (excellent - complete pattern)

**Removes**:
- Trivial patterns: `assertTrue(True)`, `assertEqual(1, 1)`
- Mock-only code: `Mock()`, `MagicMock()`
- Too short: < 20 characters
- Empty constructors: `MyClass()` with no parameters
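
The removal rules above can be sketched as a single predicate. This is a rough approximation under stated assumptions, not the project's `ExampleQualityFilter._is_trivial()`. The regular expressions and the name `is_trivial` are hypothetical, and only the four rules listed above are modeled.

```python
import re

MIN_LENGTH = 20  # "Too short: < 20 characters"

# Assertions that can never fail and therefore demonstrate nothing.
TRIVIAL_PATTERNS = [
    re.compile(r"assertTrue\(\s*True\s*\)"),
    re.compile(r"assertEqual\(\s*1\s*,\s*1\s*\)"),
]
# A single assignment whose value is only a bare Mock() or MagicMock().
MOCK_ONLY = re.compile(r"^\w+\s*=\s*(?:\w+\.)?(?:Mock|MagicMock)\(\)$")
# A single assignment to a capitalized callee with no arguments, e.g. MyClass().
EMPTY_CONSTRUCTOR = re.compile(r"^\w+\s*=\s*[A-Z]\w*\(\)$")


def is_trivial(code):
    """Return True when a candidate example matches one of the removal rules."""
    stripped = code.strip()
    if len(stripped) < MIN_LENGTH:
        return True
    if any(pattern.search(stripped) for pattern in TRIVIAL_PATTERNS):
        return True
    return bool(MOCK_ONLY.match(stripped) or EMPTY_CONSTRUCTOR.match(stripped))


candidates = [
    'db = Database(host="localhost", port=5432)',
    "self.assertTrue(True)",
    "payment_gateway_client = MagicMock()",
]
kept = [code for code in candidates if not is_trivial(code)]
print(kept)
```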
**Adjustable Thresholds**:

```bash
# High confidence only (0.7+)
--min-confidence 0.7

# Allow lower confidence for discovery
--min-confidence 0.4
```

**Problem**: Documentation often lacks real usage examples
**Solution**: Extract examples from working tests

```bash
# Generate examples for SKILL.md
skill-seekers extract-test-examples tests/ --markdown >> SKILL.md
```

**Problem**: New developers struggle with API usage
**Solution**: Show how APIs are actually tested

**Problem**: Creating step-by-step guides is time-consuming
**Solution**: Use workflow examples as tutorial steps

**Problem**: Valid configuration is unclear
**Solution**: Extract config dictionaries from tests

```
TestExampleExtractor (Orchestrator)
├── PythonTestAnalyzer (AST-based)
│   ├── extract_from_test_class()
│   ├── extract_from_test_function()
│   ├── _find_instantiations()
│   ├── _find_method_calls_with_assertions()
│   ├── _find_config_dicts()
│   └── _find_workflows()
├── GenericTestAnalyzer (Regex-based)
│   └── PATTERNS (per-language regex)
└── ExampleQualityFilter
    ├── filter()
    └── _is_trivial()
```

1. Find Test Files: Glob patterns (`test_*.py`, `*_test.go`, etc.)
2. Detect Language: File extension mapping
3. Extract Examples:
   - Python → PythonTestAnalyzer (AST)
   - Others → GenericTestAnalyzer (Regex)
4. Apply Quality Filter: Remove trivial patterns
5. Limit Per File: Top N by confidence
6. Generate Report: JSON or Markdown
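
Three of the pipeline steps above (finding test files, detecting the language, limiting per file) can be sketched with small pure functions. The glob patterns are the two named in this document. The extension mapping is deliberately partial, and all function names are hypothetical, not the project's API.

```python
from fnmatch import fnmatch
from pathlib import PurePath

TEST_FILE_PATTERNS = ["test_*.py", "*_test.go"]  # the two patterns named in this document
EXTENSION_TO_LANGUAGE = {".py": "Python", ".go": "Go"}  # partial, illustrative mapping


def is_test_file(path):
    """Match the file name (not the directory) against the test-file patterns."""
    name = PurePath(path).name
    return any(fnmatch(name, pattern) for pattern in TEST_FILE_PATTERNS)


def detect_language(path):
    """Map a file extension to a language name, or None when it is unknown."""
    return EXTENSION_TO_LANGUAGE.get(PurePath(path).suffix)


def limit_per_file(examples, max_per_file):
    """Keep the max_per_file highest-confidence examples."""
    ranked = sorted(examples, key=lambda ex: ex["confidence"], reverse=True)
    return ranked[:max_per_file]


paths = ["tests/test_db.py", "pkg/add_test.go", "src/database.py"]
for path in filter(is_test_file, paths):
    print(path, "->", detect_language(path))
```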

**Current scope**:
- Python: Full AST-based extraction (all categories)
- Other Languages: Regex-based (limited to common patterns)
- Focus: Test files only (not production code)
- Complexity: Simple to moderate test patterns

**Limited or no support**:
- Complex mocking setups
- Parameterized tests (partial support)
- Nested helper functions
- Dynamically generated tests

**Future enhancements**:
- C3.3: Build 'how to' guides from workflow examples
- C3.4: Extract configuration patterns
- C3.5: Architectural overview from test coverage

**Symptom**: `total_examples: 0`

**Causes**:
- Test files not found (check patterns: `test_*.py`, `*_test.go`)
- Confidence threshold too high
- Language not supported

**Solutions**:

```bash
# Lower confidence threshold
--min-confidence 0.3

# Check test file detection
ls tests/test_*.py

# Verify language support
--language python  # Use supported language
```

**Symptom**: Many trivial or incomplete examples

**Causes**:
- Tests use heavy mocking
- Tests are too simple
- Confidence threshold too low

**Solutions**:

```bash
# Increase confidence threshold
--min-confidence 0.7

# Reduce examples per file (get best only)
--max-per-file 3
```

**Symptom**: `Failed to parse` warnings

**Causes**:
- Syntax errors in test files
- Incompatible Python version
- Dynamic code generation

**Solutions**:
- Fix syntax errors in test files
- Ensure tests are valid Python/JS/Go code
- Errors are logged but don't stop extraction
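
The "logged but don't stop extraction" behavior can be pictured as a parse step that catches `SyntaxError` and moves on to the next file. The sketch below is an assumption about the general shape of such handling. The helper `parse_or_skip`, the logger name, and the inline sources are hypothetical.

```python
import ast
import logging

logger = logging.getLogger("test_example_sketch")  # hypothetical logger name


def parse_or_skip(source, filename):
    """Parse Python source; log a warning and return None if it cannot be parsed."""
    try:
        return ast.parse(source, filename=filename)
    except SyntaxError as exc:
        logger.warning("Failed to parse %s: %s", filename, exc)
        return None


sources = {
    "tests/test_ok.py": "def test_ok():\n    assert True\n",
    "tests/test_broken.py": "def test_broken(:\n    pass\n",  # deliberate syntax error
}

# One broken file does not prevent the remaining files from being processed.
parsed = {name: parse_or_skip(text, name) for name, text in sources.items()}
parsed_ok = [name for name, tree in parsed.items() if tree is not None]
print(parsed_ok)
```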

```python
# tests/test_database.py
import unittest

class TestDatabase(unittest.TestCase):
    def test_connection(self):
        """Test database connection with real params"""
        db = Database(
            host="localhost",
            port=5432,
            user="admin",
            timeout=30
        )
        self.assertTrue(db.connect())
```

**Extracts**:
- Category: instantiation
- Code: `db = Database(host="localhost", port=5432, user="admin", timeout=30)`
- Confidence: 0.8
- Expected: `self.assertTrue(db.connect())`

```python
# tests/test_api.py
import pytest

@pytest.fixture
def client():
    return APIClient(base_url="https://api.test.com")

def test_get_user(client):
    """Test fetching user data"""
    response = client.get("/users/123")
    assert response.status_code == 200
    assert response.json()["id"] == 123
```

**Extracts**:
- Category: method_call
- Setup: `# Fixtures: client`
- Code: `response = client.get("/users/123")\nassert response.status_code == 200`
- Confidence: 0.85

```go
// add_test.go
func TestAdd(t *testing.T) {
	calc := Calculator{mode: "basic"}
	result := calc.Add(2, 3)
	if result != 5 {
		t.Errorf("Add(2, 3) = %d; want 5", result)
	}
}
```

**Extracts**:
- Category: instantiation
- Code: `calc := Calculator{mode: "basic"}`
- Confidence: 0.6
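
For non-Python languages, extraction relies on regular expressions instead of a syntax tree. The sketch below shows how a single hypothetical pattern could pick up the Go struct literal from the example above. The actual per-language `PATTERNS` used by `GenericTestAnalyzer` are not shown in this document, so the regex and the name `find_go_instantiations` are assumptions.

```python
import re

# Hypothetical pattern: `name := Type{...}` on one line with a non-empty literal body.
GO_STRUCT_LITERAL = re.compile(
    r"^[ \t]*(\w+)[ \t]*:=[ \t]*([A-Z]\w*)\{([^{}]+)\}[ \t]*$",
    re.MULTILINE,
)

GO_SOURCE = '''
func TestAdd(t *testing.T) {
    calc := Calculator{mode: "basic"}
    result := calc.Add(2, 3)
    if result != 5 {
        t.Errorf("Add(2, 3) = %d; want 5", result)
    }
}
'''


def find_go_instantiations(source):
    """Return struct-literal assignments found by the regex, without indentation."""
    return [match.group(0).strip() for match in GO_STRUCT_LITERAL.finditer(source)]


print(find_go_instantiations(GO_SOURCE))
```

A line-oriented regex like this misses multi-line literals and nested structs, which is consistent with regex-based extraction being limited to common patterns.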
| Metric | Value |
|---|---|
| Processing Speed | ~100 files/second (Python AST) |
| Memory Usage | ~50MB for 1000 test files |
| Example Quality | 80%+ high-confidence (>0.7) |
| False Positives | <5% (with default filtering) |

```bash
skill-seekers extract-test-examples tests/
```

```bash
codebase-scraper --directory . --extract-test-examples
```

```python
# Via Claude Code
extract_test_examples(directory="tests/")
```

```python
from skill_seekers.cli.test_example_extractor import TestExampleExtractor

extractor = TestExampleExtractor(min_confidence=0.6)
report = extractor.extract_from_directory("tests/")

print(f"Found {report.total_examples} examples")
for example in report.examples:
    print(f"- {example.test_name}: {example.code[:50]}...")
```

- Pattern Detection (C3.1) - Detect design patterns
- Codebase Scraper - Analyze local repositories
- Unified Scraping - Multi-source documentation

**Status**: ✅ Implemented in v2.6.0
**Issue**: #TBD (C3.2)
**Related Tasks**: C3.1 (Pattern Detection), C3.3-C3.5 (Future enhancements)