feat: add built-in debug plugins for generation, validation, and sampling #1250

Description

@akihikokuroda

Summary

Add comprehensive built-in debug plugins to Mellea for tracing and troubleshooting the full lifecycle of generation, validation, and sampling pipelines.

What's New

Three categories of debug plugins in mellea.plugins.builtin_debug:

Generation Pipeline (generation.py)

  • log_generation_pre_call: Logs model ID, generation ID, prompt preview, and repair feedback
  • log_generation_post_call: Logs response preview, latency (ms), token usage, and model
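As a rough illustration of the kind of output these two hooks describe, the sketch below formats a pre-call and a post-call log line using only the standard library. The payload classes, field names, and helper functions here are hypothetical and invented for this example; they are not the actual Mellea hook payloads or plugin signatures.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, Optional

logger = logging.getLogger("debug_sketch.generation")


# Hypothetical payload shapes, invented for illustration only.
@dataclass
class PreCallPayload:
    model_id: str
    generation_id: str
    prompt: str
    repair_feedback: Optional[str] = None


@dataclass
class PostCallPayload:
    model_id: str
    generation_id: str
    response: str
    started_at: float   # seconds, e.g. time.perf_counter() before the call
    finished_at: float  # seconds, e.g. time.perf_counter() after the call
    token_usage: Dict[str, int] = field(default_factory=dict)


def preview(text: str, limit: int = 80) -> str:
    """Collapse whitespace and truncate so the text fits on one log line."""
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[: limit - 3] + "..."


def format_pre_call(p: PreCallPayload) -> str:
    return (
        f"[gen {p.generation_id}] pre_call model={p.model_id} "
        f"prompt={preview(p.prompt)!r} repair_feedback={p.repair_feedback!r}"
    )


def format_post_call(p: PostCallPayload) -> str:
    latency_ms = (p.finished_at - p.started_at) * 1000.0
    return (
        f"[gen {p.generation_id}] post_call model={p.model_id} "
        f"latency_ms={latency_ms:.1f} tokens={p.token_usage} "
        f"response={preview(p.response)!r}"
    )


def log_pre_call(p: PreCallPayload) -> None:
    logger.debug(format_pre_call(p))


def log_post_call(p: PostCallPayload) -> None:
    logger.debug(format_post_call(p))
```

Truncating the prompt and response to a preview keeps each trace on a single line, which matters once many generations are interleaved in the same log.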

Validation Pipeline (validation.py)

  • log_validation_pre_check: Logs requirement count and target type
  • log_validation_post_check: Logs per-requirement pass/fail status with reasons and scores
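A per-requirement report of the sort described for log_validation_post_check might look like the following sketch. RequirementResult and summarize_validation are hypothetical names made up for this example and do not reflect Mellea's real validation result types.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical result record, invented for illustration only.
@dataclass
class RequirementResult:
    description: str
    passed: bool
    reason: Optional[str] = None
    score: Optional[float] = None


def summarize_validation(results: List[RequirementResult]) -> List[str]:
    """Build one summary line followed by one line per requirement."""
    passed = sum(1 for r in results if r.passed)
    lines = [f"validation: {passed}/{len(results)} requirements passed"]
    for i, r in enumerate(results, start=1):
        status = "PASS" if r.passed else "FAIL"
        line = f"  [{i}] {status} {r.description}"
        if r.score is not None:
            line += f" (score={r.score:.2f})"
        if r.reason:
            line += f": {r.reason}"
        lines.append(line)
    return lines


if __name__ == "__main__":
    demo = [
        RequirementResult("under 50 words", True, score=1.0),
        RequirementResult("mentions price", False, reason="no price found"),
    ]
    print("\n".join(summarize_validation(demo)))
```

Reporting the reason next to each failure is what makes the trace useful for debugging: it shows which requirement blocked a result without rerunning the validation.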

Sampling Pipeline (sampling.py)

  • log_sampling_loop_start: Logs strategy name, loop budget, requirement count
  • log_sampling_iteration: Logs iteration number, pass/fail counts, per-requirement status
  • log_sampling_repair: Logs repair trigger point, repair type, failed validations
  • log_sampling_loop_end: Logs success/failure, iterations used, and statistics
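The four sampling events listed above correspond to natural points in a generate, validate, repair loop. The sketch below is a simplified stand-in for such a loop, not Mellea's sampling strategy implementation: sample_with_repair, the event strings, and the "repair-sketch" strategy name are all assumptions made for illustration.

```python
import logging
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("debug_sketch.sampling")

# A named check: (requirement description, predicate over the candidate text).
NamedCheck = Tuple[str, Callable[[str], bool]]


def sample_with_repair(
    generate: Callable[[Optional[str]], str],
    checks: List[NamedCheck],
    loop_budget: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Run a toy sampling loop and return (result, emitted trace events)."""
    events: List[str] = []

    def emit(message: str) -> None:
        events.append(message)
        logger.debug(message)

    emit(
        f"loop_start strategy=repair-sketch budget={loop_budget} "
        f"requirements={len(checks)}"
    )
    feedback: Optional[str] = None
    for iteration in range(1, loop_budget + 1):
        candidate = generate(feedback)
        failed = [name for name, check in checks if not check(candidate)]
        emit(
            f"iteration={iteration} passed={len(checks) - len(failed)} "
            f"failed={len(failed)}"
        )
        if not failed:
            emit(f"loop_end success=True iterations={iteration}")
            return candidate, events
        if iteration < loop_budget:
            # Feed the failed requirement names back into the next attempt.
            feedback = "fix: " + ", ".join(failed)
            emit(f"repair after_iteration={iteration} failed={failed}")
    emit(f"loop_end success=False iterations={loop_budget}")
    return None, events
```

Returning the event list alongside the result is a convenience of this sketch that makes the trace easy to assert on in tests; a real plugin would only need the logging side effect.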

Documentation

  • New how-to guide: docs/docs/how-to/debug-with-plugins.md covering all plugin categories, usage patterns, and 6 common debugging scenarios
  • 7 runnable example scripts in docs/examples/plugins/ demonstrating each plugin category and combinations

Examples

  • builtin_generation_tracing.py — Basic generation pipeline tracing
  • builtin_validation_tracing.py — Requirement validation with detailed per-requirement results
  • builtin_validation_failures.py — Real validation failures demonstration
  • builtin_validation_strict.py — Strict requirements testing
  • builtin_sampling_diagnostics.py — Sampling strategy diagnostics with repair events
  • builtin_full_pipeline_tracing.py — End-to-end generation + sampling visibility
  • builtin_complete_diagnostics.py — All three plugin categories together

Quality Checks

  • ✅ All code passes ruff format and ruff check
  • ✅ All code passes mypy type checking
  • ✅ All 7 examples execute successfully
  • ✅ Documentation passes markdownlint validation

Usage

import mellea
from mellea.plugins import register
from mellea.plugins.builtin_debug.generation import (
    log_generation_pre_call,
    log_generation_post_call,
)

# Register both generation hooks before starting a session.
register([
    log_generation_pre_call,
    log_generation_post_call,
])

with mellea.start_session() as m:
    result = m.instruct("...")  # Tracing fires automatically

See docs/docs/how-to/debug-with-plugins.md for the complete usage guide and debugging scenarios.

Labels: p2 (Medium/low: minor bugs, niche features, polish, docs, tests, cleanup. Scoped, lower urgency.)
