[WIP] Agent skills (aiperf-*) + SessionStart bootstrap + per-pattern docs by ajcasagrande · Pull Request #937 · ai-dynamo/aiperf

ajcasagrande · 2026-05-14T03:29:55Z

Warning

Not ready for prime time yet, still needs more tweaks

Summary

Introduces 26 aiperf-* skills under .agents/skills/ (mirrored to .claude/skills/) — a focused skill set for AI agents working in this repo, covering review, runtime testing, authoring, daily rituals, workspace utilities, and specialized debugging. Top-level skill is aiperf; it routes to specialized sub-skills.
Adds a SessionStart hook in .claude/settings.json that force-injects a short bootstrap (.claude/hooks/bootstrap.md) into every Claude Code session opened in this repo, so the routing rules are always in context.
Splits docs/dev/patterns.md into per-pattern pages under docs/dev/patterns/ (cli-command, service, model, message, plugin-system, error-handling, logging, testing, console-exporter, uncertainty-plot, strategy-protocol, drop-oldest-fanout-queue) with an index. docs/index.yml (Fern site index) updated.
Four-file sync (AGENTS.md, CLAUDE.md, .github/copilot-instructions.md, .cursor/rules/python.mdc) updated together to document the skill set, warn against uv sync (uninstalls the editable aiperf-mock-server), and reference the new patterns/ layout.

Skill inventory (26)

Review: aiperf-review (router), aiperf-code-review, aiperf-re-review, aiperf-llm-ergonomics-review
Runtime testing: aiperf-correctness-testing, aiperf-adversarial-testing
Workspace utilities: aiperf-worktree, aiperf-pr-checkout, aiperf-mock-server
Daily rituals: aiperf-pytest, aiperf-commit, aiperf-debug
Authoring rituals: aiperf-add-plugin, aiperf-add-service, aiperf-add-metric, aiperf-add-env-var, aiperf-add-cli
Specialized: aiperf-profile-export, aiperf-integration-test, aiperf-merge-from-main, aiperf-baseline-bump, aiperf-perf-profile, aiperf-message-trace, aiperf-finish-pr, aiperf-dataset-synthesis
Top-level router: aiperf

Why bootstrap + force-inject

Skills are only useful if agents invoke them. The bootstrap states an "iron rule" (invoke the relevant aiperf-* skill before responding, even before clarifying questions) and the SessionStart hook makes that rule appear in every Claude Code session opened in this repo without depending on the agent loading CLAUDE.md first.

Test plan

All 28 pre-commit hooks pass on the squashed commit: check-ast, debug-statements, detect-private-key, check-added-large-files, check-case-conflict, check-executables-have-shebangs, check-merge-conflict, check-json, check-toml, check-yaml, check-shebang-scripts-are-executable, end-of-file-fixer, mixed-line-ending, no-commit-to-branch, requirements-txt-fixer, trailing-whitespace, codespell, add-license, generate-cli-docs, generate-env-vars-docs, generate-plugin-artifacts, validate-plugin-schemas, test-imports, check-agent-files-sync, check-ergonomics, check-ruff-baselined, ruff, ruff-format.
make check-agent-files-sync exits 0 (four-file sync verified).
Open a fresh Claude Code session in this repo and confirm the SessionStart hook injects the bootstrap.
Open a Cursor / Copilot session in this repo and confirm the agent-facing files (AGENTS.md, .cursor/rules/python.mdc, .github/copilot-instructions.md) document the skill set identically.
Invoke aiperf from a test session and confirm routing to a sub-skill works.
Manually exercise one skill end-to-end (suggest aiperf-correctness-testing against the in-repo mock — short matrix, no infra dependency).

Notes for reviewers

This is a single squashed commit (9e9a5ecbc) that replaces the prior 11 commits on the branch. The branch was rebased directly onto origin/main (no behind commits, no conflicts).
The skills under .agents/skills/ are deliberately self-contained team-facing content — they don't reference any contributor's private memory files. Recent audit edits (aiperf-profile-export, aiperf-llm-ergonomics-review, aiperf-finish-pr, etc.) reframe opinionated content as documented conventions rather than absolute rules where appropriate.
Bootstrap is intentionally aggressive ("if there is even a 1% chance an aiperf-* skill applies, invoke it") — load-bearing for routing reliability.
aiperf-commit documents the "worktree-isolation + cherry-pick" pattern for parallel-agent work instead of --no-verify. This is project policy for this repo (different from sibling Rust repo).

Summary by CodeRabbit

Documentation
- Reorganized development pattern documentation into dedicated guides for better accessibility.
- Enhanced developer tooling integration guides for Claude Code, Cursor, and Copilot IDEs.
Chores
- Added comprehensive agent skill documentation for common development tasks.
- Updated repository configuration to better support AI-assisted development workflows.

Note: This release is primarily an internal infrastructure and documentation update with minimal user-facing changes.

…rn docs Introduces 26 aiperf-* skills under .agents/skills/ (mirrored to .claude/skills/) that AI agents use as the primary entry point for any aiperf work — review, runtime testing, authoring, daily rituals, workspace utilities, and specialized debugging. The top-level skill is `aiperf`; it routes to specialized sub-skills. A SessionStart hook in .claude/settings.json force-injects a short bootstrap (.claude/hooks/bootstrap.md) into every Claude Code session opened in this repo, so the routing rules are always in context. The skill set: - Review: aiperf-review (router), aiperf-code-review, aiperf-re-review, aiperf-llm-ergonomics-review - Runtime testing: aiperf-correctness-testing, aiperf-adversarial-testing - Workspace: aiperf-worktree, aiperf-pr-checkout, aiperf-mock-server - Daily rituals: aiperf-pytest, aiperf-commit, aiperf-debug - Authoring: aiperf-add-plugin, aiperf-add-service, aiperf-add-metric, aiperf-add-env-var, aiperf-add-cli - Specialized: aiperf-profile-export, aiperf-integration-test, aiperf-merge-from-main, aiperf-baseline-bump, aiperf-perf-profile, aiperf-message-trace, aiperf-finish-pr, aiperf-dataset-synthesis Also splits docs/dev/patterns.md into per-pattern pages under docs/dev/patterns/ (cli-command, service, model, message, plugin-system, error-handling, logging, testing, console-exporter, uncertainty-plot, strategy-protocol, drop-oldest-fanout-queue) with an index and Fern config update. Four-file sync (AGENTS.md, CLAUDE.md, .github/copilot-instructions.md, .cursor/rules/python.mdc) is updated together to document the skill set, warn against `uv sync` (uninstalls the editable mock server), and reference the new patterns/ layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

github-actions · 2026-05-14T03:30:05Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@9e9a5ecbc879e411c10274b25ddff08d70523e3c

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@9e9a5ecbc879e411c10274b25ddff08d70523e3c

Last updated for commit: 9e9a5ec • Browse code

github-actions · 2026-05-14T03:30:39Z

Fern Docs Preview: https://nvidia-preview-50428fcf-a825-4a65-b360-ed5c77ccda17.docs.buildwithfern.com/aiperf/dev

coderabbitai · 2026-05-14T03:35:59Z

Walkthrough

This PR introduces a comprehensive agent skills system for AIPerf development workflows, with 24 new procedural skill documents under .agents/skills/, IDE integration hooks for Claude Code and Cursor, and a refactoring of development patterns documentation from a monolithic file into a granular pattern directory structure. Agent skills codify end-to-end procedures for tasks like adding CLI commands, metrics, and plugins; running tests and profiling; code review sequencing; and commit/merge workflows.

Changes

Agent Skills System & IDE Integration

Layer / File(s)	Summary
Entry point skill and routing `.agents/skills/aiperf/SKILL.md`, `AGENTS.md`, `CLAUDE.md`, `.github/copilot-instructions.md`, `README.md`	`aiperf/SKILL.md` defines the mandatory skill-tool invocation entry point with routing precedence rules; reference documentation indexes updated across AGENTS, CLAUDE, copilot, Cursor, and README point to the new skills and granular pattern pages.
Claude Code / Cursor IDE bootstrap `.claude/hooks/bootstrap.md`, `.claude/hooks/session-start`, `.claude/settings.json`, `.cursor/rules/python.mdc`, `.gitignore`	Session-start hook injects mandatory skill-routing instructions into Claude Code and Cursor IDE sessions; `.claude/settings.json` registers the hook; `.gitignore` updated to version-control `.claude/hooks/` and `settings.json`; Cursor rule file updated with agent-skills guidance.

Feature/Plugin Authoring Skills

Layer / File(s)	Summary
Adding CLI commands, metrics, plugins, services, and environment variables `.agents/skills/aiperf-add-cli/SKILL.md`, `aiperf-add-metric/SKILL.md`, `aiperf-add-plugin/SKILL.md`, `aiperf-add-service/SKILL.md`, `aiperf-add-env-var/SKILL.md`	Five skills providing step-by-step workflows for extending aiperf with new user-facing features, including lazy CLI registration via import strings, metric base-class selection and dependency validation, plugin registration and schema regeneration, service message-handler decorators, and environment-variable Pydantic field conventions.
Dataset synthesis and specialization `.agents/skills/aiperf-dataset-synthesis/SKILL.md`	Skill distinguishing when to use `aiperf synthesize`, `aiperf analyze-trace`, or plugin-based dataset loaders; documents semantics of `--request-count` vs `--num-conversations` and KV cache-hit-rate correctness requirements.

Testing, Validation, and Profiling Skills

Layer / File(s)	Summary
Correctness testing, adversarial testing, and performance profiling `.agents/skills/aiperf-correctness-testing/SKILL.md`, `aiperf-adversarial-testing/SKILL.md`, `aiperf-perf-profile/SKILL.md`, `aiperf-integration-test/SKILL.md`, `aiperf-pytest/SKILL.md`	Five skills for comprehensive validation: correctness testing defines endpoint scenario matrix and JSONL assertion invariants; adversarial testing specifies fault-injection and boundary-input scenarios with timeout/error/data-loss verdict heuristics; perf profiling pairs py-spy and RSS sampling with startup/latency analysis; integration-test establishes tier selection and fixture patterns; pytest codifies canonical invocation rules and pre-flight concurrent-process checks.
Mock server skill `.agents/skills/aiperf-mock-server/SKILL.md`	Skill for launching the in-repo OpenAI-compatible mock server on a random free port, polling health, exporting connection details, and ensuring teardown via trap; specifies fast-flag usage and endpoint route reference.

Code Review and PR Workflows

Layer / File(s)	Summary
Code review routing, re-review classification, and ergonomics review `.agents/skills/aiperf-code-review/SKILL.md`, `aiperf-re-review/SKILL.md`, `aiperf-review/SKILL.md`, `aiperf-llm-ergonomics-review/SKILL.md`	Four skills for structured code review: `aiperf-code-review` performs runtime validation via mock server and generates timestamped report with per-finding reproduction receipts; `aiperf-re-review` fetches prior review threads, classifies as addressed/responded/ignored/outdated, detects scope creep; `aiperf-review` routes review intents to appropriate sub-skill sequence; `aiperf-llm-ergonomics-review` updated to reference granular pattern documentation under `docs/dev/patterns/`.
PR checkout and baseline-bump management `.agents/skills/aiperf-pr-checkout/SKILL.md`, `aiperf-baseline-bump/SKILL.md`	Two skills: `aiperf-pr-checkout` creates isolated workspace via `aiperf-worktree`, verifies HEAD match via `gh pr view`, and handles extra-ref fetching; `aiperf-baseline-bump` enforces shrinkage-only baseline regeneration with diff verification and separate-commit discipline.

Development Utilities and Sequencers

Layer / File(s)	Summary
Workspace isolation, commit hygiene, and merge workflows `.agents/skills/aiperf-worktree/SKILL.md`, `aiperf-commit/SKILL.md`, `aiperf-merge-from-main/SKILL.md`	Three skills for day-to-day workflows: `aiperf-worktree` creates isolated clone/worktree with `make first-time-setup` (not `uv sync`); `aiperf-commit` enforces DCO sign-off, explicit per-path staging, and heredoc message preservation on hook rewrite; `aiperf-merge-from-main` adds post-merge signature-drift audit and four-file sync enforcement.
Debugging and message tracing `.agents/skills/aiperf-debug/SKILL.md`, `aiperf-message-trace/SKILL.md`	Two diagnostic skills: `aiperf-debug` indexes symptom-to-trap mappings for pytest/xdist, git/pre-commit, msgspec, networking, RSS, and tooling failures; `aiperf-message-trace` provides dead-letter troubleshooting for the ZMQ bus, covering subscription validation, initialization ordering, and credit-pipeline inspection.
PR-finishing sequencer `.agents/skills/aiperf-finish-pr/SKILL.md`	Sequencer skill that mandates an ordered checklist (mechanical lint/format, tiered tests, docs regeneration, four-file sync, runtime smoke/correctness, code review, optional ergonomics, commit cleanliness, then PR open) with per-step exit criteria and scope-based skipping rules.

Profile Export and Analysis Skills

Layer / File(s)	Summary
Profile export interpretation `.agents/skills/aiperf-profile-export/SKILL.md`	Skill documenting the output artifacts from `aiperf profile` (JSONL, aggregate JSON/CSV, optional outputs/server-metrics, optional GPU telemetry), including which is authoritative per-request data and example Python analysis patterns.

Documentation Refactoring: Patterns Index and Pattern Pages

Layer / File(s)	Summary
Patterns directory index and navigation `docs/dev/patterns/index.md`, `docs/index.yml`	New index page landing pattern-specific documentation under organized sections (Service, Data, Runtime/Infrastructure, Visualization, Testing); `docs/index.yml` navigation updated to replace single `dev/patterns.md` entry with a nested `dev/patterns/` subsection listing individual pattern pages.
Service, message, model, and data-layer patterns `docs/dev/patterns/service.md`, `docs/dev/patterns/message.md`, `docs/dev/patterns/model.md`, `docs/dev/patterns/plugin-system.md`, `docs/dev/patterns/error-handling.md`	Five pattern documentation pages covering `BaseComponentService` async handler structure, message-type enum and handler registration, Pydantic model/config base classes, YAML plugin registry with lazy loading, and error logging/publishing.
CLI command, logging, and testing patterns `docs/dev/patterns/cli-command.md`, `docs/dev/patterns/logging.md`, `docs/dev/patterns/testing.md`	Three pattern pages covering lazy cyclopts command registration via import strings, deferred log-message evaluation via lambdas, and pytest async/parametrize/mock-plugin conventions with auto-fixture behaviors.
Runtime infrastructure and visualization patterns `docs/dev/patterns/drop-oldest-fanout-queue.md`, `docs/dev/patterns/strategy-protocol.md`, `docs/dev/patterns/console-exporter.md`, `docs/dev/patterns/uncertainty-plot.md`	Four advanced pattern pages covering bounded drop-oldest queue fanout with backpressure, runtime-checkable Protocol dispatch in OTel result processing, console exporter subclass configuration, and latency-throughput uncertainty plot data contracts and rendering.
Docstring and reference updates `src/aiperf/cli_commands/__init__.py`, `src/aiperf/post_processors/otel_metrics_results_processor.py`, `.cursor/rules/python.mdc`, `.github/copilot-instructions.md`, `AGENTS.md`, `CLAUDE.md`	Module docstrings and IDE/developer-guide references updated from `docs/dev/patterns.md` anchor links to specific granular pattern pages (e.g., `cli-command.md`, `drop-oldest-fanout-queue.md`).
Removed monolithic patterns file `docs/dev/patterns.md`	Original monolithic patterns documentation file removed; its sections migrated to individual pattern pages under `docs/dev/patterns/`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A warren of skills now guides the way,
From CLI tweaks to metrics display,
Tests and profiles hop through the night,
While patterns split—each one just right!
The agent knows where to go, what to do,
One skill invoked means the path shines through. ✨

coderabbitai

Actionable comments posted: 15

🧹 Nitpick comments (9)

.agents/skills/aiperf-debug/SKILL.md (1)
14-31: ⚡ Quick win

Use Mermaid for the decision-tree diagram.

This block is currently Graphviz DOT. Repo markdown guidance asks for Mermaid in .md files, so this should be converted to a Mermaid flowchart block to stay compliant.

As per coding guidelines, "**/*.md: Use mermaid diagrams instead of ASCII art in markdown files".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/aiperf-debug/SKILL.md around lines 14 - 31, The diagram is
written in Graphviz DOT but must be converted to a Mermaid flowchart: replace
the ```dot block with a ```mermaid block and convert the digraph into a
flowchart (e.g., start with "flowchart TD"), map each node label from the DOT
(e.g., "Bug / failure / hang in aiperf?", "Capture exact symptom (error text,
command, env)", "Scan the catalog below for matching symptom", "Match found?",
"Apply the documented fix; record outcome", "Invoke
superpowers:systematic-debugging", "If new durable trap found: add to this
skill") into Mermaid node identifiers and use --> for edges, use labeled edges
for the decision ("Match found?" --|yes|--> "Apply the documented fix; record
outcome" and --|no|--> "Invoke superpowers:systematic-debugging"), and if
desired add simple styling to reflect shapes (e.g., use class or style for the
decision node "Match found?" and the doublecircle start node) so the resulting
block follows the repo guideline of using mermaid diagrams in .md files.
.agents/skills/aiperf-review/SKILL.md (1)
69-71: ⚡ Quick win

Disambiguate the runtime-routing decision branch.

The Runtime phrasing? diamond currently has three outgoing paths, which makes the decision logic unclear in the graph. Split this into explicit follow-up decisions (e.g., happy-path vs error-path vs full-review) so each branch is deterministic.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/aiperf-review/SKILL.md around lines 69 - 71, The decision
node labeled "Runtime phrasing?" currently branches to three targets ("Full
review?", "aiperf-correctness-testing", "aiperf-adversarial-testing"); replace
that single trinary branch with deterministic follow-up decisions by adding
explicit decision nodes (e.g., "Happy path?" and "Error path?") so each outgoing
edge is binary: have "Runtime phrasing?" only decide between "Full review?" vs
"Proceed to runtime-specific check", then route "Proceed to runtime-specific
check" to a new "Happy path?" node which goes to "aiperf-correctness-testing"
and a new "Error path?" node which goes to "aiperf-adversarial-testing"; update
the labels accordingly so "aiperf-correctness-testing" is reached only from the
happy-path node and "aiperf-adversarial-testing" only from the error-path node.
.agents/skills/aiperf-finish-pr/SKILL.md (1)
16-39: ⚡ Quick win

Consider converting to mermaid diagram for consistency.

The repo guideline for **/*.md files specifies using mermaid diagrams instead of other diagram formats. A mermaid flowchart would provide the same information with better consistency across the documentation.
🔄 Mermaid alternative
-```dot
-digraph finish_pr {
-  "Branch ready to ship?" [shape=doublecircle];
-  "1. Mechanical floor green?" [shape=box];
-  "2. Tests green?" [shape=box];
-  "3. Docs regenerated?" [shape=box];
-  "4. Four-file sync OK?" [shape=box];
-  "5. Runtime smoke passes?" [shape=box];
-  "6. Code review pass?" [shape=box];
-  "7. Ergonomics review pass?" [shape=box];
-  "8. Commit clean?" [shape=box];
-  "9. Open PR" [shape=doublecircle];
-
-  "Branch ready to ship?" -> "1. Mechanical floor green?";
-  "1. Mechanical floor green?" -> "2. Tests green?";
-  "2. Tests green?" -> "3. Docs regenerated?";
-  "3. Docs regenerated?" -> "4. Four-file sync OK?";
-  "4. Four-file sync OK?" -> "5. Runtime smoke passes?";
-  "5. Runtime smoke passes?" -> "6. Code review pass?";
-  "6. Code review pass?" -> "7. Ergonomics review pass?";
-  "7. Ergonomics review pass?" -> "8. Commit clean?";
-  "8. Commit clean?" -> "9. Open PR";
-}
-```
+```mermaid
+flowchart TD
+    A((Branch ready to ship?)) --> B[1. Mechanical floor green?]
+    B --> C[2. Tests green?]
+    C --> D[3. Docs regenerated?]
+    D --> E[4. Four-file sync OK?]
+    E --> F[5. Runtime smoke passes?]
+    F --> G[6. Code review pass?]
+    G --> H[7. Ergonomics review pass?]
+    H --> I[8. Commit clean?]
+    I --> J((9. Open PR))
+```
As per coding guidelines, "Use mermaid diagrams instead of ASCII art in markdown files" for **/*.md.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/aiperf-finish-pr/SKILL.md around lines 16 - 39, Replace the
DOT graph block in .agents/skills/aiperf-finish-pr/SKILL.md with a mermaid
flowchart block: change the fenced code block language to ```mermaid and convert
the digraph nodes/edges into a flowchart TD sequence (use nodes A..J or
descriptive labels like "Branch ready to ship?" -> "1. Mechanical floor green?"
-> ... -> "9. Open PR") so the visual matches the original order; ensure
start/end nodes use mermaid circular notation ((...)) for doublecircle and
square/bracket notation [...] for boxed steps and keep the same step texts.
.agents/skills/aiperf-dataset-synthesis/SKILL.md (1)
18-39: ⚡ Quick win

Use Mermaid for the decision tree instead of DOT.

This diagram is currently authored as dot; project markdown guidance prefers mermaid in markdown files.

As per coding guidelines: "**/*.md: Use mermaid diagrams instead of ASCII art in markdown files".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/aiperf-dataset-synthesis/SKILL.md around lines 18 - 39, The
DOT diagram titled "dataset_pick" should be converted to a Mermaid flowchart:
replace the ```dot fenced block with ```mermaid and translate the
digraph/directed edges and node shapes into Mermaid flowchart syntax (use
"flowchart TD" or appropriate direction), map the DOT nodes like "What are you
doing?", "Generating prompts from scratch / a recipe?", "Inspecting an existing
trace file?", "Adding support for a new public dataset?", and terminal/action
boxes ("Use aiperf synthesize", "Use aiperf analyze-trace", "Build a custom
loader as a plugin", "Probably want aiperf profile --public-dataset") to Mermaid
nodes, convert the diamond decision nodes into conditional style using -->|yes|
and -->|no| labels, and ensure the overall graph name/identity (dataset_pick) is
preserved in a comment or label so the logic remains identical.
.agents/skills/aiperf-add-plugin/SKILL.md (1)
21-40: ⚡ Quick win

Convert the decision tree from DOT to Mermaid.

Lines 21-40 use a dot diagram; repo markdown guidance asks for mermaid diagrams in markdown files.

As per coding guidelines: "**/*.md: Use mermaid diagrams instead of ASCII art in markdown files".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/aiperf-add-plugin/SKILL.md around lines 21 - 40, Replace the
DOT diagram named "add_plugin" with a mermaid flowchart (use flowchart TD)
preserving the same node labels and transitions; map "Adding a plugin?" to a
decision/start node, the steps "Identify category in categories.yaml",
"Implement class against the category's Protocol", "Register in plugins.yaml
with correct metadata schema", "Run make generate-all-plugin-files", "Run make
validate-plugin-schemas", "Validate with aiperf plugins --validate", and "Run
aiperf-correctness-testing for the affected endpoint" to sequential flowchart
nodes, and recreate the same arrow relationships between them so the visual flow
and labels remain identical but in mermaid syntax.
.agents/skills/aiperf-correctness-testing/SKILL.md (2)
189-193: ⚡ Quick win

Add language identifier to code fence.

The output contract code block is missing a language specification, which triggered a markdownlint warning. Add text or bash after the opening fence.
📋 Proposed fix
 ## Output contract
 
 Calling skills (or the user) get:
 
-```
+```text
 RESULT=pass | fail
 ART_DIR=<absolute path>/artifacts/correctness-<epoch>
 FAILED_SCENARIOS=<comma-separated list, or empty>
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/aiperf-correctness-testing/SKILL.md around lines 189 - 193,
The Markdown code fence for the output contract block is missing a language
identifier; update the opening fence from totext (or ```bash) so the
block reads as a labeled text code block and resolves the markdownlint
warning—keep the existing block contents (RESULT=..., ART_DIR=...,
FAILED_SCENARIOS=...) unchanged and only add the language token after the
opening backticks.
</details>

---

`133-135`: _⚡ Quick win_

**Hardcoded request count assumption creates maintenance risk.**

The assertion hardcodes `20` as the expected record count with a comment to "adjust here" if scenarios override. This violates DRY and creates a maintenance trap when scenarios use different `--request-count` values (like embeddings on line 83).




<details>
<summary>♻️ Proposed fix: Parse expected count from scenario metadata</summary>

```diff
+# Parse expected counts per scenario (default 20, but some may override)
+expected = {"embeddings": 20, "nim-rankings": 20}  # extend as needed
+expected_default = 20
+
 for run in sorted(art.glob("*/exit-code.txt")):
     name = run.parent.name
+    expected_count = expected.get(name, expected_default)
     exit_code = int(run.read_text().strip())
     if exit_code != 0:
         fail.append(f"{name}: exit {exit_code}")
         continue
     jsonl = run.parent / "profile_export.jsonl"
     if not jsonl.exists():
         fail.append(f"{name}: profile_export.jsonl missing")
         continue
     lines = jsonl.read_text().splitlines()
     if len(lines) == 0:
         fail.append(f"{name}: zero records in profile_export.jsonl")
         continue
-    if len(lines) != 20:
-        # 20 = --request-count from COMMON; if your scenario overrides it, adjust here
-        fail.append(f"{name}: expected 20 records, got {len(lines)}")
+    if len(lines) != expected_count:
+        fail.append(f"{name}: expected {expected_count} records, got {len(lines)}")
         continue
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/aiperf-correctness-testing/SKILL.md around lines 133 - 135,
Replace the hardcoded expected count 20 in the len(lines) check with a value
parsed from the scenario metadata/flags (the same source that supplies
--request-count) so the assertion follows the scenario configuration; locate the
block using the variables lines, name and the fail.append call and read the
request-count from the scenario object or COMMON config (or parse the scenario's
flags for "--request-count") into an integer expected_count, then change the
condition to if len(lines) != expected_count and update the fail.append message
to use expected_count instead of 20.
```

</details>

</blockquote></details>
<details>
<summary>.agents/skills/aiperf-code-review/SKILL.md (2)</summary><blockquote>

`34-39`: _⚡ Quick win_

**Add language identifier to code fence.**

The artifact layout code block is missing a language specification, which triggered a markdownlint warning. Add `text` after the opening fence to resolve.




<details>
<summary>📋 Proposed fix</summary>

```diff
 ## Artifact layout
 
 Compute `EPOCH="$(date +%s)"` once at the start of the invocation; reuse for all paths.
 
-```
+```text
 artifacts/code-review-<epoch>/
   REPORT.md             # the living document; update in place during the run
   meta.json             # branch, head sha, base sha (origin/main), start time
   repro/<scenario>/     # one subdir per reproduction (commands, logs, profile_export.jsonl, mock-server.log)
 ```
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/aiperf-code-review/SKILL.md around lines 34 - 39, The
markdown code fence in the artifact layout block is missing a language tag;
update the opening fence in the SKILL.md artifact layout snippet so the backtick
fence reads as a text-fenced block (add "text" after the three backticks) so the
block becomes a ```text fenced code block to silence the markdownlint warning
while keeping the listed lines unchanged.
</details>

---

`69-69`: **GitHub API position calculation guidance may be insufficient.**

Line 69 instructs to "count lines within the patch to find the `position` value" for inline comments, but diff position calculation is notoriously error-prone. The instruction doesn't specify whether to count only changed lines, context lines, or handle multi-hunk complications.



Expand line 69 with explicit guidance or reference GitHub's position calculation rules: position is 1-indexed from the first line of the diff file, counting every line including context, and stops at the target line in the new file.

Alternatively, add a concrete example showing position calculation for a sample diff.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/aiperf-code-review/SKILL.md at line 69, Update the guidance
at the sentence currently on line 69 in SKILL.md to explicitly state GitHub's
position calculation: clarify that "position" is 1-indexed from the start of
each file patch (the first line of the diff hunk for that file), that you must
count every line in the patch including added, removed and unchanged context
lines, and that for files with multiple hunks you compute position relative to
the specific hunk containing the target line; either add that exact wording or
include a short, concrete example diff showing how to count lines and produce
the position value for a target line.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/aiperf-add-env-var/SKILL.md:

Around line 19-41: The markdown fenced block containing the taxonomy lines
(the list of classes like _APIServerSettings, _CompressionSettings, ...
_ZMQSettings) is missing a language tag which triggers MD040; update the opening
fence to include a language identifier (e.g., text) so the block reads text and keep the existing content and closing ``` unchanged to satisfy the
linter.

In @.agents/skills/aiperf-add-metric/SKILL.md:

Line 117: The doc contains contradictory guidance: the class attribute
metric_type is required by init_subclass to register subclasses, but another
line says subclasses should not set metric type; update the SKILL.md text to
require that subclasses declare a class-level metric_type (e.g., class MyMetric:
metric_type = "accuracy") and remove or change the earlier sentence that forbids
setting metric_type so both statements agree; reference the metric_type
attribute and the init_subclass registration behavior when editing the
documentation.

In @.agents/skills/aiperf-adversarial-testing/SKILL.md:

Line 112: The command in SKILL.md's scenario "embedding-on-chat-endpoint" uses
an undefined $PORT variable; update the URL to use the documented MOCK_URL
environment variable (or explicitly document exporting PORT) so the example runs
when copied. Specifically change the --url http://127.0.0.1:$PORT snippet in
the table row for scenario 10 to reference --url $MOCK_URL (or instruct to
export PORT beforehand) so the command is consistent with the rest of the doc
and runnable as-is.

In @.agents/skills/aiperf-correctness-testing/SKILL.md:

Around line 103-105: The multimodal scenario is left undefined; either add a
concrete default multimodal command into the run_scenario block so the document
truly implements six scenarios (e.g., a specific /v1/custom-multimodal
invocation or "multimodal-prompt" example) or update the header text and the
scenarios table (the "six common endpoint shapes" line and the table referenced
around lines 78-85) to state that only five scenarios are implemented plus one
optional/user-defined multimodal scenario; modify the SKILL.md run_scenario
section or the table/header accordingly so the documented matrix and claim are
consistent.

In @.agents/skills/aiperf-dataset-synthesis/SKILL.md:

Around line 65-66: The doc contains contradictory guidance about adding custom
synthesis recipes: remove or reconcile the claim that new recipes are
“synthesize plugin” and instead state the current truth that custom recipes must
be added by editing cli_commands/synthesize.py where the built-in target is
registered as the Literal[...] switch (the existing built-in is "agentic-code");
update SKILL.md text (references around the current lines that mention
"synthesize plugin" and the sentence pointing to cli_commands/synthesize.py) so
both places consistently say that extensibility today requires modifying
cli_commands/synthesize.py and optionally suggest a future plugin design instead
of instructing contributors to create a non-existent "synthesize plugin".

In @.agents/skills/aiperf-integration-test/SKILL.md:

Around line 51-53: The fenced code block showing the test filename pattern
(the line containing "test_.py") lacks a language
tag; update that fenced block to include a language identifier such as "text"
(e.g., change totext) so the block is marked and markdown lint warnings
are resolved.

In @.agents/skills/aiperf-message-trace/SKILL.md:

Line 152: In SKILL.md remove or replace the broken skill reference
superpowers:systematic-debugging: locate the line containing that token and
either delete the reference or swap it for a valid skill name from the
repository (search the repo for other skill identifiers to pick a real one) so
the document does not point to a non-existent superpowers namespace.

Line 8: Update the statement describing service count in SKILL.md: change the
text that currently reads "10 services (1 BaseService orchestrator —
SystemController — plus 9 BaseComponentService components)" to "9 services (1
BaseService orchestrator — SystemController — plus 8 BaseComponentService
components)"; confirm the listed component names (WorkerManager,
RecordProcessor, TimingManager, DatasetManager, ServerMetricsManager,
FastAPIService, RecordsManager, GPUTelemetryManager) reflect the eight
implementations in the codebase and adjust the sentence accordingly.

Around line 82-97: Update the SKILL.md text to remove the nonexistent CLI
flags (--log-level, --verbose, -v) from the example using aiperf profile;
instead state that aiperf profile only accepts user_config and service_config
and show that log verbosity should be controlled via those configs (e.g., set
LOG_LEVEL or logging configuration inside user_config/service_config passed to
aiperf profile), keep the grep pipeline example for filtering, and leave the
correct aiperf service invocation (aiperf service --type <ServiceType.YOUR>
--service-id debug-) unchanged.

In @.agents/skills/aiperf-perf-profile/SKILL.md:

Around line 123-130: The fenced code block in SKILL.md that lists the
artifacts (the block starting on line with "artifacts/perf-/") is missing
a language tag which triggers MD040 lint warnings; update that fenced block to
include a language identifier such as "text" (i.e., change the opening "" to "text") so the markdown linter recognizes it as a plain-text block and the
MD040 warnings stop.

In @.agents/skills/aiperf-pr-checkout/SKILL.md:

Around line 84-91: The fenced code block that currently starts with and contains the variables WORKDIR, PR, HEAD_SHA, BASE_REF, EXTRA_SHA, and SETUP should be annotated with a language tag to avoid MD040 warnings; change the opening fence from to ```text (or another plain-text tag) in the SKILL.md
block that lists these variables so the block is treated as text by Markdown
linters.

In @.agents/skills/aiperf-pytest/SKILL.md:

Line 3: The top-level description currently says “always -n auto” but the doc
later (the section that allows omitting -n auto when invoking a single test ID)
creates a contradiction; update the description sentence in SKILL.md to codify
the exception by stating “always -n auto, except when running a single test ID
invocation (a single test node/ID) where -n auto may be omitted,” and keep the
rest of the canonical invocation rules (no combining subfolders, slow-marker
deselect, MALLOC_ARENA_MAX, anti-collision pre-flight) unchanged so the
single-test exception in the later section (the single test ID allowance)
matches the top-level rule.

In @.agents/skills/aiperf-review/SKILL.md:

Around line 109-115: The two unlabeled fenced code blocks that show the
artifacts tree and the output block (the fences containing "artifacts/
-/ ..." and the other similar output block) must have language
tags to satisfy MD040; update each opening triple-backtick to use "text" (i.e.,
change totext) so both fenced blocks are explicitly labeled as text.

In @.agents/skills/aiperf-worktree/SKILL.md:

Around line 70-74: The fenced output-contract block that currently starts with
"" and contains the lines "WORKDIR=...", "SETUP=...", and "GIT_HEAD=..." needs an explicit language tag to satisfy MD040; change the opening fence from "" to a language-marked fence such as "```text" (or another appropriate
language) so the code block in SKILL.md's output-contract has a declared
language.

In @docs/dev/patterns/console-exporter.md:

Around line 28-31: The markdown table cells contain inline code with unescaped
pipe characters (e.g., the title type str | None, console_groups type
tuple[MetricConsoleGroup, ...] | None, and the exclude_flags value
ERROR_ONLY | INTERNAL | EXPERIMENTAL) which breaks table rendering; update
these cells to escape the pipe characters (e.g., str \| None,
tuple[MetricConsoleGroup, ...] \| None) or replace pipes with the HTML entity
(|) so the title, require_flags, exclude_flags, and console_groups
rows render correctly.

Nitpick comments:
In @.agents/skills/aiperf-add-plugin/SKILL.md:

Around line 21-40: Replace the DOT diagram named "add_plugin" with a mermaid
flowchart (use flowchart TD) preserving the same node labels and transitions;
map "Adding a plugin?" to a decision/start node, the steps "Identify category in
categories.yaml", "Implement class against the category's Protocol", "Register
in plugins.yaml with correct metadata schema", "Run make
generate-all-plugin-files", "Run make validate-plugin-schemas", "Validate with
aiperf plugins --validate", and "Run aiperf-correctness-testing for the affected
endpoint" to sequential flowchart nodes, and recreate the same arrow
relationships between them so the visual flow and labels remain identical but in
mermaid syntax.

In @.agents/skills/aiperf-code-review/SKILL.md:

Around line 34-39: The markdown code fence in the artifact layout block is
missing a language tag; update the opening fence in the SKILL.md artifact layout
snippet so the backtick fence reads as a text-fenced block (add "text" after the
three backticks) so the block becomes a ```text fenced code block to silence the
markdownlint warning while keeping the listed lines unchanged.

Line 69: Update the guidance at the sentence currently on line 69 in SKILL.md
to explicitly state GitHub's position calculation: clarify that "position" is
1-indexed from the start of each file patch (the first line of the diff hunk for
that file), that you must count every line in the patch including added, removed
and unchanged context lines, and that for files with multiple hunks you compute
position relative to the specific hunk containing the target line; either add
that exact wording or include a short, concrete example diff showing how to
count lines and produce the position value for a target line.

In @.agents/skills/aiperf-correctness-testing/SKILL.md:

Around line 189-193: The Markdown code fence for the output contract block is
missing a language identifier; update the opening fence from totext (or
markdownlint warning—keep the existing block contents (RESULT=..., ART_DIR=...,
FAILED_SCENARIOS=...) unchanged and only add the language token after the
opening backticks.
- Around line 133-135: Replace the hardcoded expected count 20 in the len(lines)
check with a value parsed from the scenario metadata/flags (the same source that
supplies --request-count) so the assertion follows the scenario configuration;
locate the block using the variables lines, name and the fail.append call and
read the request-count from the scenario object or COMMON config (or parse the
scenario's flags for "--request-count") into an integer expected_count, then
change the condition to if len(lines) != expected_count and update the
fail.append message to use expected_count instead of 20.

In @.agents/skills/aiperf-dataset-synthesis/SKILL.md:
- Around line 18-39: The DOT diagram titled "dataset_pick" should be converted
to a Mermaid flowchart: replace the ```dot fenced block with ```mermaid and
translate the digraph/directed edges and node shapes into Mermaid flowchart
syntax (use "flowchart TD" or appropriate direction), map the DOT nodes like
"What are you doing?", "Generating prompts from scratch / a recipe?",
"Inspecting an existing trace file?", "Adding support for a new public
dataset?", and terminal/action boxes ("Use aiperf synthesize", "Use aiperf
analyze-trace", "Build a custom loader as a plugin", "Probably want aiperf
profile --public-dataset") to Mermaid nodes, convert the diamond decision nodes
into conditional style using -->|yes| and -->|no| labels, and ensure the overall
graph name/identity (dataset_pick) is preserved in a comment or label so the
logic remains identical.

In @.agents/skills/aiperf-debug/SKILL.md:
- Around line 14-31: The diagram is written in Graphviz DOT but must be
converted to a Mermaid flowchart: replace the ```dot block with a ```mermaid
block and convert the digraph into a flowchart (e.g., start with "flowchart
TD"), map each node label from the DOT (e.g., "Bug / failure / hang in aiperf?",
"Capture exact symptom (error text, command, env)", "Scan the catalog below for
matching symptom", "Match found?", "Apply the documented fix; record outcome",
"Invoke superpowers:systematic-debugging", "If new durable trap found: add to
this skill") into Mermaid node identifiers and use --> for edges, use labeled
edges for the decision ("Match found?" --|yes|--> "Apply the documented fix;
record outcome" and --|no|--> "Invoke superpowers:systematic-debugging"), and if
desired add simple styling to reflect shapes (e.g., use class or style for the
decision node "Match found?" and the doublecircle start node) so the resulting
block follows the repo guideline of using mermaid diagrams in .md files.

In @.agents/skills/aiperf-finish-pr/SKILL.md:
- Around line 16-39: Replace the DOT graph block in
.agents/skills/aiperf-finish-pr/SKILL.md with a mermaid flowchart block: change
the fenced code block language to ```mermaid and convert the digraph nodes/edges
into a flowchart TD sequence (use nodes A..J or descriptive labels like "Branch
ready to ship?" -> "1. Mechanical floor green?" -> ... -> "9. Open PR") so the
visual matches the original order; ensure start/end nodes use mermaid circular
notation ((...)) for doublecircle and square/bracket notation [...] for boxed
steps and keep the same step texts.

In @.agents/skills/aiperf-review/SKILL.md:
- Around line 69-71: The decision node labeled "Runtime phrasing?" currently
branches to three targets ("Full review?", "aiperf-correctness-testing",
"aiperf-adversarial-testing"); replace that single trinary branch with
deterministic follow-up decisions by adding explicit decision nodes (e.g.,
"Happy path?" and "Error path?") so each outgoing edge is binary: have "Runtime
phrasing?" only decide between "Full review?" vs "Proceed to runtime-specific
check", then route "Proceed to runtime-specific check" to a new "Happy path?"
node which goes to "aiperf-correctness-testing" and a new "Error path?" node
which goes to "aiperf-adversarial-testing"; update the labels accordingly so
"aiperf-correctness-testing" is reached only from the happy-path node and
"aiperf-adversarial-testing" only from the error-path node.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)

Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a108f24f-1da6-4e48-9e04-f8baf644fb50

📥 Commits

Reviewing files that changed from the base of the PR and between e7602bc and 9e9a5ec.

📒 Files selected for processing (52)

.agents/skills/aiperf-add-cli/SKILL.md

.agents/skills/aiperf-add-env-var/SKILL.md

.agents/skills/aiperf-add-metric/SKILL.md

.agents/skills/aiperf-add-plugin/SKILL.md

.agents/skills/aiperf-add-service/SKILL.md

.agents/skills/aiperf-adversarial-testing/SKILL.md

.agents/skills/aiperf-baseline-bump/SKILL.md

.agents/skills/aiperf-code-review/SKILL.md

.agents/skills/aiperf-commit/SKILL.md

.agents/skills/aiperf-correctness-testing/SKILL.md

.agents/skills/aiperf-dataset-synthesis/SKILL.md

.agents/skills/aiperf-debug/SKILL.md

.agents/skills/aiperf-finish-pr/SKILL.md

.agents/skills/aiperf-integration-test/SKILL.md

.agents/skills/aiperf-llm-ergonomics-review/SKILL.md

.agents/skills/aiperf-merge-from-main/SKILL.md

.agents/skills/aiperf-message-trace/SKILL.md

.agents/skills/aiperf-mock-server/SKILL.md

.agents/skills/aiperf-perf-profile/SKILL.md

.agents/skills/aiperf-pr-checkout/SKILL.md

.agents/skills/aiperf-profile-export/SKILL.md

.agents/skills/aiperf-pytest/SKILL.md

.agents/skills/aiperf-re-review/SKILL.md

.agents/skills/aiperf-review/SKILL.md

.agents/skills/aiperf-worktree/SKILL.md

.agents/skills/aiperf/SKILL.md

.claude/hooks/bootstrap.md

.claude/hooks/session-start

.claude/settings.json

.cursor/rules/python.mdc

.github/copilot-instructions.md

.gitignore

AGENTS.md

CLAUDE.md

README.md

docs/dev/patterns.md

docs/dev/patterns/cli-command.md

docs/dev/patterns/console-exporter.md

docs/dev/patterns/drop-oldest-fanout-queue.md

docs/dev/patterns/error-handling.md

docs/dev/patterns/index.md

docs/dev/patterns/logging.md

docs/dev/patterns/message.md

docs/dev/patterns/model.md

docs/dev/patterns/plugin-system.md

docs/dev/patterns/service.md

docs/dev/patterns/strategy-protocol.md

docs/dev/patterns/testing.md

docs/dev/patterns/uncertainty-plot.md

docs/index.yml

src/aiperf/cli_commands/__init__.py

src/aiperf/post_processors/otel_metrics_results_processor.py

💤 Files with no reviewable changes (1)

docs/dev/patterns.md

coderabbitai · 2026-05-14T03:36:02Z

+```
+src/aiperf/common/environment.py:
+
+class _APIServerSettings    → Environment.API_SERVER.*
+class _CompressionSettings  → Environment.COMPRESSION.*
+class _ConfigSettings       → Environment.CONFIG.*
+class _DatasetSettings      → Environment.DATASET.*
+class _DeveloperSettings    → Environment.DEV.*
+class _GPUSettings          → Environment.GPU.*
+class _HTTPSettings         → Environment.HTTP.*
+class _LoggingSettings      → Environment.LOGGING.*
+class _MetricsSettings      → Environment.METRICS.*
+class _MLflowSettings       → Environment.MLFLOW.*
+class _OTelSettings         → Environment.OTEL.*
+class _RecordSettings       → Environment.RECORD.*
+class _ServerMetricsSettings → Environment.SERVER_METRICS.*
+class _ServiceSettings      → Environment.SERVICE.*
+class _TimingSettings       → Environment.TIMING.*
+class _TokenizerSettings    → Environment.TOKENIZER.*
+class _UISettings           → Environment.UI.*
+class _WorkerSettings       → Environment.WORKER.*
+class _ZMQSettings          → Environment.ZMQ.*
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced taxonomy block (Line 19).

This fence is missing a language identifier, which triggers MD040 and creates avoidable lint noise.

Suggested fix

-``` +```text src/aiperf/common/environment.py: @@ class _ZMQSettings → Environment.ZMQ.*

</details>  <details> <summary>📝 Committable suggestion</summary> > ‼️ **IMPORTANT** > Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. ```suggestion

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 19-19: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-add-env-var/SKILL.md around lines 19 - 41, The markdown fenced block containing the taxonomy lines (the list of classes like _APIServerSettings, _CompressionSettings, ... _ZMQSettings) is missing a language tag which triggers MD040; update the opening fence to include a language identifier (e.g., ```text) so the block reads ```text and keep the existing content and closing ``` unchanged to satisfy the linter.

coderabbitai · 2026-05-14T03:36:02Z

+## Common mistakes
+
+- **Picking the wrong base class.** `BaseDerivedMetric` for things that should be `BaseAggregateMetric` produces re-computation per record (slow + wrong).
+- **Missing `metric_type` declaration.** `__init_subclass__` needs it to register.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve contradictory guidance on metric type declaration.

This bullet conflicts with Line 57 (which says subclasses should not set metric type). Keeping both will misdirect implementations.

Proposed fix

-- **Missing `metric_type` declaration.** `__init_subclass__` needs it to register. +- **Missing required class attributes (`tag`, and `required_metrics` for derived metrics).** Registration and dependency validation rely on these.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- **Missing `metric_type` declaration.** `__init_subclass__` needs it to register.

- **Missing required class attributes (`tag`, and `required_metrics` for derived metrics).** Registration and dependency validation rely on these.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-add-metric/SKILL.md at line 117, The doc contains contradictory guidance: the class attribute metric_type is required by __init_subclass__ to register subclasses, but another line says subclasses should not set metric type; update the SKILL.md text to require that subclasses declare a class-level metric_type (e.g., class MyMetric: metric_type = "accuracy") and remove or change the earlier sentence that forbids setting metric_type so both statements agree; reference the metric_type attribute and the __init_subclass__ registration behavior when editing the documentation.

coderabbitai · 2026-05-14T03:36:02Z

+| 7 | **zero-concurrency** | `--fast` | `--endpoint-type chat --request-count 10 --concurrency 0` | aiperf rejects the config with a clear error at startup. No crash, no run started. |
+| 8 | **conflicting-flags** | `--fast` | `--endpoint-type chat --request-count 10 --request-rate 5.0 --concurrency 4 --benchmark-duration 30` (over-specified) | aiperf either picks one with a warning or rejects with a clear message. Don't accept "silently uses concurrency" — that's a finding. |
+| 9 | **malformed-trace-input** | `--fast` | `--input-file /tmp/garbage.jsonl --fixed-schedule` (write 3 lines of invalid JSON beforehand) | Fails fast with line-number + parse error. Don't accept "ran 0 requests and exited 0" — that's silent data loss. |
+| 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url http://127.0.0.1:$PORT --request-count 5 --concurrency 1` (the user wires an embeddings client at the chat endpoint by changing the `--url` suffix). | Per-request error with informative message. |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix undefined $PORT in scenario 10 command.

This scenario uses --url http://127.0.0.1:$PORT, but the documented execution path provides MOCK_URL (not PORT). This makes the example inconsistent and brittle for copy/paste runs.

Proposed fix

-| 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url http://127.0.0.1:$PORT --request-count 5 --concurrency 1` (the user wires an embeddings client at the chat endpoint by changing the `--url` suffix). | Per-request error with informative message. | +| 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url "${MOCK_URL%/v1}" --request-count 5 --concurrency 1` (wire embeddings requests to a chat endpoint path by overriding URL suffix intentionally). | Per-request error with informative message. |

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

| 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url http://127.0.0.1:$PORT --request-count 5 --concurrency 1` (the user wires an embeddings client at the chat endpoint by changing the `--url` suffix). | Per-request error with informative message. |

| 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url "${MOCK_URL%/v1}" --request-count 5 --concurrency 1` (wire embeddings requests to a chat endpoint path by overriding URL suffix intentionally). | Per-request error with informative message. |

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-adversarial-testing/SKILL.md at line 112, The command in SKILL.md's scenario "embedding-on-chat-endpoint" uses an undefined $PORT variable; update the URL to use the documented MOCK_URL environment variable (or explicitly document exporting PORT) so the example runs when copied. Specifically change the `--url http://127.0.0.1:$PORT` snippet in the table row for scenario 10 to reference `--url $MOCK_URL` (or instruct to export PORT beforehand) so the command is consistent with the rest of the doc and runnable as-is.

coderabbitai · 2026-05-14T03:36:02Z

+# multimodal: depends on how the user wants to exercise the mock's /v1/custom-multimodal route;
+# the simplest case is a chat run with a multimodal-prompt input — leaving as user-chosen.
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Multimodal scenario is left undefined.

The workflow documents six scenarios in the table (lines 78-85) but the implementation comment at line 103-105 defers the multimodal scenario as "user-chosen." This creates an incomplete matrix and undermines the "six common endpoint shapes" claim at line 69.

Either provide a concrete default multimodal scenario command in the run_scenario block, or update the table and line 69 to explicitly list five implemented scenarios plus one optional/user-defined multimodal scenario.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-correctness-testing/SKILL.md around lines 103 - 105, The multimodal scenario is left undefined; either add a concrete default multimodal command into the run_scenario block so the document truly implements six scenarios (e.g., a specific /v1/custom-multimodal invocation or "multimodal-prompt" example) or update the header text and the scenarios table (the "six common endpoint shapes" line and the table referenced around lines 78-85) to state that only five scenarios are implemented plus one optional/user-defined multimodal scenario; modify the SKILL.md run_scenario section or the table/header accordingly so the documented matrix and claim are consistent.

coderabbitai · 2026-05-14T03:36:02Z

+Custom synthesis recipes are NOT a plugin category today — `agentic-code` is the only built-in target, registered as a `Literal[...]` switch in `cli_commands/synthesize.py`. Adding a new recipe requires editing that file (no plugin registration needed).
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve contradictory guidance on synthesis extensibility.

Line 65 says new synthesis recipes are not plugin-based and must be added via cli_commands/synthesize.py, but Line 136 advises making reusable recipes a “synthesize plugin.” These instructions conflict and will send contributors down the wrong path.

Suggested wording fix

-| "I'll write a one-off Python script to generate prompts" | If the recipe is reusable, make it a `synthesize` plugin. One-offs accumulate as drift. | +| "I'll write a one-off Python script to generate prompts" | If the recipe is reusable, add it as a first-class `aiperf synthesize` target in `cli_commands/synthesize.py` and `src/aiperf/dataset/synthesis/`. One-offs accumulate as drift. |

Also applies to: 136-137

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-dataset-synthesis/SKILL.md around lines 65 - 66, The doc contains contradictory guidance about adding custom synthesis recipes: remove or reconcile the claim that new recipes are “synthesize plugin” and instead state the current truth that custom recipes must be added by editing cli_commands/synthesize.py where the built-in target is registered as the Literal[...] switch (the existing built-in is "agentic-code"); update SKILL.md text (references around the current lines that mention "synthesize plugin" and the sentence pointing to cli_commands/synthesize.py) so both places consistently say that extensibility today requires modifying cli_commands/synthesize.py and optionally suggest a future plugin design instead of instructing contributors to create a non-existent "synthesize plugin".

coderabbitai · 2026-05-14T03:36:03Z

+```
+WORKDIR=<absolute path>
+PR=<n>
+HEAD_SHA=<sha>
+BASE_REF=<baseRefName from gh pr view, usually 'main'>
+EXTRA_SHA=<sha or empty>
+SETUP=ok | failed | read-only
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the report-back fenced block (Line 84).

Please mark this fence as text (or similar) to avoid MD040 warnings.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 84-84: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-pr-checkout/SKILL.md around lines 84 - 91, The fenced code block that currently starts with ``` and contains the variables WORKDIR, PR, HEAD_SHA, BASE_REF, EXTRA_SHA, and SETUP should be annotated with a language tag to avoid MD040 warnings; change the opening fence from ``` to ```text (or another plain-text tag) in the SKILL.md block that lists these variables so the block is treated as text by Markdown linters.

coderabbitai · 2026-05-14T03:36:03Z

@@ -0,0 +1,98 @@
+---
+name: aiperf-pytest
+description: Use BEFORE invoking pytest in the aiperf repo for any reason — "run the tests", "run unit tests", "verify with pytest", "kick off the integration suite", "re-run the failing tests", "pytest tests/unit/...". Codifies the project's pytest invocation rules (always -n auto, never combine subfolders, slow-marker deselect, MALLOC_ARENA_MAX for integration, anti-collision pre-flight). Wraps the single canonical command per test tier.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve the -n auto rule inconsistency.

Line 3 states “always -n auto”, but Lines 45-47 explicitly allow omitting it for a single test ID. Please reflect that exception in the top-level description so the rule is unambiguous.

Also applies to: 45-47

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-pytest/SKILL.md at line 3, The top-level description currently says “always -n auto” but the doc later (the section that allows omitting -n auto when invoking a single test ID) creates a contradiction; update the description sentence in SKILL.md to codify the exception by stating “always -n auto, except when running a single test ID invocation (a single test node/ID) where -n auto may be omitted,” and keep the rest of the canonical invocation rules (no combining subfolders, slow-marker deselect, MALLOC_ARENA_MAX, anti-collision pre-flight) unchanged so the single-test exception in the later section (the single test ID allowance) matches the top-level rule.

coderabbitai · 2026-05-14T03:36:03Z

+```
+artifacts/
+  <shortname>-<epoch>/
+    REPORT.md           # main deliverable
+    meta.json           # branch, head sha, base sha, invocation timestamp, mock flags
+    <scenario-or-finding-dirs>/
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language tags to both fenced path/output blocks (Lines 109 and 162).

Both fences are unlabeled and trigger MD040. Use text for each.

Also applies to: 162-167

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 109-109: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-review/SKILL.md around lines 109 - 115, The two unlabeled fenced code blocks that show the artifacts tree and the output block (the fences containing "artifacts/ <shortname>-<epoch>/ ..." and the other similar output block) must have language tags to satisfy MD040; update each opening triple-backtick to use "text" (i.e., change ``` to ```text) so both fenced blocks are explicitly labeled as text.

coderabbitai · 2026-05-14T03:36:03Z

+```
+WORKDIR=<absolute path>
+SETUP=ok | failed | read-only
+GIT_HEAD=<sha of HEAD in the new workspace>
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the output-contract fence (Line 70).

This block should declare a language (e.g., text) to satisfy MD040.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 70-70: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-worktree/SKILL.md around lines 70 - 74, The fenced output-contract block that currently starts with "```" and contains the lines "WORKDIR=...", "SETUP=...", and "GIT_HEAD=..." needs an explicit language tag to satisfy MD040; change the opening fence from "```" to a language-marked fence such as "```text" (or another appropriate language) so the code block in SKILL.md's output-contract has a declared language.

coderabbitai · 2026-05-14T03:36:03Z

+| `title`          | `str | None`                             | Static title; `None` derives from endpoint metadata.                                     |
+| `require_flags`  | `MetricFlags`                            | Records must have ALL of these. Default `MetricFlags.NONE` (no requirement).             |
+| `exclude_flags`  | `MetricFlags`                            | Records with ANY of these are hidden. Default `ERROR_ONLY | INTERNAL | EXPERIMENTAL`.    |
+| `console_groups` | `tuple[MetricConsoleGroup, ...] | None`  | Groups to include, in render order. `None` disables group filtering (single table).      |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Escape pipe characters in table cells to prevent broken rendering.

Inline code values containing | are splitting table columns, so the table won’t render correctly in many markdown parsers.

Proposed fix

-| `title` | `str | None` | Static title; `None` derives from endpoint metadata. | +| `title` | `str \| None` | Static title; `None` derives from endpoint metadata. | | `require_flags` | `MetricFlags` | Records must have ALL of these. Default `MetricFlags.NONE` (no requirement). | -| `exclude_flags` | `MetricFlags` | Records with ANY of these are hidden. Default `ERROR_ONLY | INTERNAL | EXPERIMENTAL`. | -| `console_groups` | `tuple[MetricConsoleGroup, ...] | None` | Groups to include, in render order. `None` disables group filtering (single table). | +| `exclude_flags` | `MetricFlags` | Records with ANY of these are hidden. Default `ERROR_ONLY \| INTERNAL \| EXPERIMENTAL`. | +| `console_groups` | `tuple[MetricConsoleGroup, ...] \| None` | Groups to include, in render order. `None` disables group filtering (single table). |

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

| `title` | `str | None` | Static title; `None` derives from endpoint metadata. |

| `require_flags` | `MetricFlags` | Records must have ALL of these. Default `MetricFlags.NONE` (no requirement). |

| `exclude_flags` | `MetricFlags` | Records with ANY of these are hidden. Default `ERROR_ONLY | INTERNAL | EXPERIMENTAL`. |

| `console_groups` | `tuple[MetricConsoleGroup, ...] | None` | Groups to include, in render order. `None` disables group filtering (single table). |

| `title` | `str \| None` | Static title; `None` derives from endpoint metadata. |

| `require_flags` | `MetricFlags` | Records must have ALL of these. Default `MetricFlags.NONE` (no requirement). |

| `exclude_flags` | `MetricFlags` | Records with ANY of these are hidden. Default `ERROR_ONLY \| INTERNAL \| EXPERIMENTAL`. |

| `console_groups` | `tuple[MetricConsoleGroup, ...] \| None` | Groups to include, in render order. `None` disables group filtering (single table). |

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 28-28: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

[warning] 30-30: Table column count
Expected: 3; Actual: 5; Too many cells, extra data will be missing

(MD056, table-column-count)

[warning] 31-31: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/dev/patterns/console-exporter.md` around lines 28 - 31, The markdown table cells contain inline code with unescaped pipe characters (e.g., the `title` type `str | None`, `console_groups` type `tuple[MetricConsoleGroup, ...] | None`, and the `exclude_flags` value `ERROR_ONLY | INTERNAL | EXPERIMENTAL`) which breaks table rendering; update these cells to escape the pipe characters (e.g., `str \| None`, `tuple[MetricConsoleGroup, ...] \| None`) or replace pipes with the HTML entity (|) so the `title`, `require_flags`, `exclude_flags`, and `console_groups` rows render correctly.

codecov · 2026-05-14T03:42:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

coderabbitai · 2026-05-14T14:18:53Z

Caution

Review failed

An error occurred during the review process. Please try again later.

Walkthrough

This PR introduces a comprehensive agent skills system for AIPerf development workflows, with 24 new procedural skill documents under .agents/skills/, IDE integration hooks for Claude Code and Cursor, and a refactoring of development patterns documentation from a monolithic file into a granular pattern directory structure. Agent skills codify end-to-end procedures for tasks like adding CLI commands, metrics, and plugins; running tests and profiling; code review sequencing; and commit/merge workflows.

Changes

Agent Skills System & IDE Integration

Layer / File(s)	Summary
Entry point skill and routing `.agents/skills/aiperf/SKILL.md`, `AGENTS.md`, `CLAUDE.md`, `.github/copilot-instructions.md`, `README.md`	`aiperf/SKILL.md` defines the mandatory skill-tool invocation entry point with routing precedence rules; reference documentation indexes updated across AGENTS, CLAUDE, copilot, Cursor, and README point to the new skills and granular pattern pages.
Claude Code / Cursor IDE bootstrap `.claude/hooks/bootstrap.md`, `.claude/hooks/session-start`, `.claude/settings.json`, `.cursor/rules/python.mdc`, `.gitignore`	Session-start hook injects mandatory skill-routing instructions into Claude Code and Cursor IDE sessions; `.claude/settings.json` registers the hook; `.gitignore` updated to version-control `.claude/hooks/` and `settings.json`; Cursor rule file updated with agent-skills guidance.

Feature/Plugin Authoring Skills

Layer / File(s)	Summary
Adding CLI commands, metrics, plugins, services, and environment variables `.agents/skills/aiperf-add-cli/SKILL.md`, `aiperf-add-metric/SKILL.md`, `aiperf-add-plugin/SKILL.md`, `aiperf-add-service/SKILL.md`, `aiperf-add-env-var/SKILL.md`	Five skills providing step-by-step workflows for extending aiperf with new user-facing features, including lazy CLI registration via import strings, metric base-class selection and dependency validation, plugin registration and schema regeneration, service message-handler decorators, and environment-variable Pydantic field conventions.
Dataset synthesis and specialization `.agents/skills/aiperf-dataset-synthesis/SKILL.md`	Skill distinguishing when to use `aiperf synthesize`, `aiperf analyze-trace`, or plugin-based dataset loaders; documents semantics of `--request-count` vs `--num-conversations` and KV cache-hit-rate correctness requirements.

Testing, Validation, and Profiling Skills

Layer / File(s)	Summary
Correctness testing, adversarial testing, and performance profiling `.agents/skills/aiperf-correctness-testing/SKILL.md`, `aiperf-adversarial-testing/SKILL.md`, `aiperf-perf-profile/SKILL.md`, `aiperf-integration-test/SKILL.md`, `aiperf-pytest/SKILL.md`	Five skills for comprehensive validation: correctness testing defines endpoint scenario matrix and JSONL assertion invariants; adversarial testing specifies fault-injection and boundary-input scenarios with timeout/error/data-loss verdict heuristics; perf profiling pairs py-spy and RSS sampling with startup/latency analysis; integration-test establishes tier selection and fixture patterns; pytest codifies canonical invocation rules and pre-flight concurrent-process checks.
Mock server skill `.agents/skills/aiperf-mock-server/SKILL.md`	Skill for launching the in-repo OpenAI-compatible mock server on a random free port, polling health, exporting connection details, and ensuring teardown via trap; specifies fast-flag usage and endpoint route reference.

Code Review and PR Workflows

Layer / File(s)	Summary
Code review routing, re-review classification, and ergonomics review `.agents/skills/aiperf-code-review/SKILL.md`, `aiperf-re-review/SKILL.md`, `aiperf-review/SKILL.md`, `aiperf-llm-ergonomics-review/SKILL.md`	Four skills for structured code review: `aiperf-code-review` performs runtime validation via mock server and generates timestamped report with per-finding reproduction receipts; `aiperf-re-review` fetches prior review threads, classifies as addressed/responded/ignored/outdated, detects scope creep; `aiperf-review` routes review intents to appropriate sub-skill sequence; `aiperf-llm-ergonomics-review` updated to reference granular pattern documentation under `docs/dev/patterns/`.
PR checkout and baseline-bump management `.agents/skills/aiperf-pr-checkout/SKILL.md`, `aiperf-baseline-bump/SKILL.md`	Two skills: `aiperf-pr-checkout` creates isolated workspace via `aiperf-worktree`, verifies HEAD match via `gh pr view`, and handles extra-ref fetching; `aiperf-baseline-bump` enforces shrinkage-only baseline regeneration with diff verification and separate-commit discipline.

Development Utilities and Sequencers

Layer / File(s)	Summary
Workspace isolation, commit hygiene, and merge workflows `.agents/skills/aiperf-worktree/SKILL.md`, `aiperf-commit/SKILL.md`, `aiperf-merge-from-main/SKILL.md`	Three skills for day-to-day workflows: `aiperf-worktree` creates isolated clone/worktree with `make first-time-setup` (not `uv sync`); `aiperf-commit` enforces DCO sign-off, explicit per-path staging, and heredoc message preservation on hook rewrite; `aiperf-merge-from-main` adds post-merge signature-drift audit and four-file sync enforcement.
Debugging and message tracing `.agents/skills/aiperf-debug/SKILL.md`, `aiperf-message-trace/SKILL.md`	Two diagnostic skills: `aiperf-debug` indexes symptom-to-trap mappings for pytest/xdist, git/pre-commit, msgspec, networking, RSS, and tooling failures; `aiperf-message-trace` provides dead-letter troubleshooting for the ZMQ bus, covering subscription validation, initialization ordering, and credit-pipeline inspection.
PR-finishing sequencer `.agents/skills/aiperf-finish-pr/SKILL.md`	Sequencer skill that mandates an ordered checklist (mechanical lint/format, tiered tests, docs regeneration, four-file sync, runtime smoke/correctness, code review, optional ergonomics, commit cleanliness, then PR open) with per-step exit criteria and scope-based skipping rules.

Profile Export and Analysis Skills

Layer / File(s)	Summary
Profile export interpretation `.agents/skills/aiperf-profile-export/SKILL.md`	Skill documenting the output artifacts from `aiperf profile` (JSONL, aggregate JSON/CSV, optional outputs/server-metrics, optional GPU telemetry), including which is authoritative per-request data and example Python analysis patterns.

Documentation Refactoring: Patterns Index and Pattern Pages

Layer / File(s)	Summary
Patterns directory index and navigation `docs/dev/patterns/index.md`, `docs/index.yml`	New index page landing pattern-specific documentation under organized sections (Service, Data, Runtime/Infrastructure, Visualization, Testing); `docs/index.yml` navigation updated to replace single `dev/patterns.md` entry with a nested `dev/patterns/` subsection listing individual pattern pages.
Service, message, model, and data-layer patterns `docs/dev/patterns/service.md`, `docs/dev/patterns/message.md`, `docs/dev/patterns/model.md`, `docs/dev/patterns/plugin-system.md`, `docs/dev/patterns/error-handling.md`	Five pattern documentation pages covering `BaseComponentService` async handler structure, message-type enum and handler registration, Pydantic model/config base classes, YAML plugin registry with lazy loading, and error logging/publishing.
CLI command, logging, and testing patterns `docs/dev/patterns/cli-command.md`, `docs/dev/patterns/logging.md`, `docs/dev/patterns/testing.md`	Three pattern pages covering lazy cyclopts command registration via import strings, deferred log-message evaluation via lambdas, and pytest async/parametrize/mock-plugin conventions with auto-fixture behaviors.
Runtime infrastructure and visualization patterns `docs/dev/patterns/drop-oldest-fanout-queue.md`, `docs/dev/patterns/strategy-protocol.md`, `docs/dev/patterns/console-exporter.md`, `docs/dev/patterns/uncertainty-plot.md`	Four advanced pattern pages covering bounded drop-oldest queue fanout with backpressure, runtime-checkable Protocol dispatch in OTel result processing, console exporter subclass configuration, and latency-throughput uncertainty plot data contracts and rendering.
Docstring and reference updates `src/aiperf/cli_commands/__init__.py`, `src/aiperf/post_processors/otel_metrics_results_processor.py`, `.cursor/rules/python.mdc`, `.github/copilot-instructions.md`, `AGENTS.md`, `CLAUDE.md`	Module docstrings and IDE/developer-guide references updated from `docs/dev/patterns.md` anchor links to specific granular pattern pages (e.g., `cli-command.md`, `drop-oldest-fanout-queue.md`).
Removed monolithic patterns file `docs/dev/patterns.md`	Original monolithic patterns documentation file removed; its sections migrated to individual pattern pages under `docs/dev/patterns/`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A warren of skills now guides the way,
From CLI tweaks to metrics display,
Tests and profiles hop through the night,
While patterns split—each one just right!
The agent knows where to go, what to do,
One skill invoked means the path shines through. ✨

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

ajcasagrande changed the title ~~Agent skills (aiperf-*) + SessionStart bootstrap + per-pattern docs~~ [WIP] Agent skills (aiperf-*) + SessionStart bootstrap + per-pattern docs May 14, 2026

	- Missing `metric_type` declaration. `__init_subclass__` needs it to register.
	- Missing required class attributes (`tag`, and `required_metrics` for derived metrics). Registration and dependency validation rely on these.

	\| 10 \| embedding-on-chat-endpoint \| `--fast` \| `--endpoint-type embeddings --url http://127.0.0.1:$PORT --request-count 5 --concurrency 1` (the user wires an embeddings client at the chat endpoint by changing the `--url` suffix). \| Per-request error with informative message. \|
	\| 10 \| embedding-on-chat-endpoint \| `--fast` \| `--endpoint-type embeddings --url "${MOCK_URL%/v1}" --request-count 5 --concurrency 1` (wire embeddings requests to a chat endpoint path by overriding URL suffix intentionally). \| Per-request error with informative message. \|

		Custom synthesis recipes are NOT a plugin category today — `agentic-code` is the only built-in target, registered as a `Literal[...]` switch in `cli_commands/synthesize.py`. Adding a new recipe requires editing that file (no plugin registration needed).

Conversation

ajcasagrande commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Skill inventory (26)

Why bootstrap + force-inject

Test plan

Notes for reviewers

Summary by CodeRabbit

Uh oh!

github-actions Bot commented May 14, 2026

Try out this PR

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented May 14, 2026

Codecov Report

Uh oh!

coderabbitai Bot commented May 14, 2026

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ajcasagrande commented May 14, 2026 •

edited

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading