Summary
Add comprehensive built-in debug plugins to Mellea for tracing and troubleshooting the full lifecycle of generation, validation, and sampling pipelines.
What's New
Three categories of debug plugins in mellea.plugins.builtin_debug:
Generation Pipeline (generation.py)
log_generation_pre_call: Logs model ID, generation ID, prompt preview, and repair feedback
log_generation_post_call: Logs response preview, latency (ms), token usage, and model
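To make the pre-call and post-call fields concrete, here is a minimal standalone sketch of the kind of information these two hooks report. It does not use the Mellea plugin API: `preview`, `traced_call`, and the logger name are hypothetical helpers invented for illustration, and the `generate` argument stands in for any model call.

```python
import logging
import time
from typing import Callable

# Hypothetical logger name; the real plugins may log elsewhere.
logger = logging.getLogger("debug_sketch.generation")


def preview(text: str, limit: int = 80) -> str:
    """Collapse whitespace and truncate long text for a one-line log entry."""
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[: limit - 3] + "..."


def traced_call(model_id: str, prompt: str, generate: Callable[[str], str]):
    """Log a prompt preview before the call and latency plus response preview after."""
    logger.info("pre_call model=%s prompt=%r", model_id, preview(prompt))
    start = time.perf_counter()
    response = generate(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    logger.info(
        "post_call model=%s latency_ms=%.1f response=%r",
        model_id,
        latency_ms,
        preview(response),
    )
    return response, latency_ms
```

Truncating the prompt and response keeps each log entry to a single line, which is why the plugins report previews rather than full text.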
Validation Pipeline (validation.py)
log_validation_pre_check: Logs requirement count and target type
log_validation_post_check: Logs per-requirement pass/fail status with reasons and scores
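As an illustration of per-requirement reporting, the sketch below formats pass/fail status with optional reasons and scores. `CheckResult` and `format_results` are hypothetical names for this example only; the actual result type used by Mellea's validation hooks may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckResult:
    """Hypothetical stand-in for one requirement's validation outcome."""

    requirement: str
    passed: bool
    reason: Optional[str] = None
    score: Optional[float] = None


def format_results(results: List[CheckResult]) -> List[str]:
    """Render one log line per requirement, including reason and score when present."""
    lines = []
    for i, r in enumerate(results, start=1):
        status = "PASS" if r.passed else "FAIL"
        line = f"[{i}/{len(results)}] {status} {r.requirement}"
        if r.reason is not None:
            line += f" reason={r.reason!r}"
        if r.score is not None:
            line += f" score={r.score:.2f}"
        lines.append(line)
    return lines
```

For example, a passing requirement with a score of 1.0 renders as `[1/2] PASS under 50 words score=1.00`.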
Sampling Pipeline (sampling.py)
log_sampling_loop_start: Logs strategy name, loop budget, requirement count
log_sampling_iteration: Logs iteration number, pass/fail counts, per-requirement status
log_sampling_repair: Logs repair trigger point, repair type, failed validations
log_sampling_loop_end: Logs success/failure, iterations used, and statistics
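The four sampling events map onto a generate, validate, repair loop. The sketch below shows where each event would fire in such a loop, under the assumption that a strategy retries until every check passes or the budget runs out. `sample_with_logging` and its parameters are hypothetical and are not part of Mellea's sampling API.

```python
import logging
from typing import Callable, List

# Hypothetical logger name; the real plugins may log elsewhere.
logger = logging.getLogger("debug_sketch.sampling")


def sample_with_logging(
    generate: Callable[[int], str],
    checks: List[Callable[[str], bool]],
    loop_budget: int,
) -> dict:
    """Retry generation until all checks pass or the loop budget is exhausted."""
    logger.info("loop_start budget=%d requirements=%d", loop_budget, len(checks))
    for iteration in range(1, loop_budget + 1):
        candidate = generate(iteration)
        outcomes = [check(candidate) for check in checks]
        passed = sum(outcomes)
        logger.info(
            "iteration=%d passed=%d failed=%d",
            iteration,
            passed,
            len(outcomes) - passed,
        )
        if all(outcomes):
            logger.info("loop_end success=True iterations=%d", iteration)
            return {"success": True, "iterations": iteration, "result": candidate}
        # A repair event records which checks failed before the next attempt.
        failed = [i for i, ok in enumerate(outcomes) if not ok]
        logger.info("repair iteration=%d failed_checks=%s", iteration, failed)
    logger.info("loop_end success=False iterations=%d", loop_budget)
    return {"success": False, "iterations": loop_budget, "result": None}
```

Logging the budget at loop start and the iterations used at loop end makes it easy to tell whether a failure came from an exhausted budget or from a requirement that never passes.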
Documentation
- New how-to guide: docs/docs/how-to/debug-with-plugins.md, covering all plugin categories, usage patterns, and 6 common debugging scenarios
- 7 runnable example scripts in docs/examples/plugins/, demonstrating each plugin category and combinations of them
Examples
builtin_generation_tracing.py — Basic generation pipeline tracing
builtin_validation_tracing.py — Requirement validation with detailed per-requirement results
builtin_validation_failures.py — Real validation failures demonstration
builtin_validation_strict.py — Strict requirements testing
builtin_sampling_diagnostics.py — Sampling strategy diagnostics with repair events
builtin_full_pipeline_tracing.py — End-to-end generation + sampling visibility
builtin_complete_diagnostics.py — All three plugin categories together
Quality Checks
- ✅ All code passes ruff format and ruff check
- ✅ All code passes mypy type checking
- ✅ All 7 examples execute successfully
- ✅ Documentation passes markdownlint validation
Usage
import mellea
from mellea.plugins import register
from mellea.plugins.builtin_debug.generation import (
    log_generation_post_call,
    log_generation_pre_call,
)

# Register both generation hooks before starting a session.
register([
    log_generation_pre_call,
    log_generation_post_call,
])

with mellea.start_session() as m:
    result = m.instruct("...")  # Tracing fires automatically
See docs/docs/how-to/debug-with-plugins.md for the complete usage guide and debugging scenarios.