[WIP] Agent skills (aiperf-*) + SessionStart bootstrap + per-pattern docs#937
[WIP] Agent skills (aiperf-*) + SessionStart bootstrap + per-pattern docs#937ajcasagrande wants to merge 1 commit into
Conversation
…rn docs Introduces 26 aiperf-* skills under .agents/skills/ (mirrored to .claude/skills/) that AI agents use as the primary entry point for any aiperf work — review, runtime testing, authoring, daily rituals, workspace utilities, and specialized debugging. The top-level skill is `aiperf`; it routes to specialized sub-skills. A SessionStart hook in .claude/settings.json force-injects a short bootstrap (.claude/hooks/bootstrap.md) into every Claude Code session opened in this repo, so the routing rules are always in context. The skill set: - Review: aiperf-review (router), aiperf-code-review, aiperf-re-review, aiperf-llm-ergonomics-review - Runtime testing: aiperf-correctness-testing, aiperf-adversarial-testing - Workspace: aiperf-worktree, aiperf-pr-checkout, aiperf-mock-server - Daily rituals: aiperf-pytest, aiperf-commit, aiperf-debug - Authoring: aiperf-add-plugin, aiperf-add-service, aiperf-add-metric, aiperf-add-env-var, aiperf-add-cli - Specialized: aiperf-profile-export, aiperf-integration-test, aiperf-merge-from-main, aiperf-baseline-bump, aiperf-perf-profile, aiperf-message-trace, aiperf-finish-pr, aiperf-dataset-synthesis Also splits docs/dev/patterns.md into per-pattern pages under docs/dev/patterns/ (cli-command, service, model, message, plugin-system, error-handling, logging, testing, console-exporter, uncertainty-plot, strategy-protocol, drop-oldest-fanout-queue) with an index and Fern config update. Four-file sync (AGENTS.md, CLAUDE.md, .github/copilot-instructions.md, .cursor/rules/python.mdc) is updated together to document the skill set, warn against `uv sync` (uninstalls the editable mock server), and reference the new patterns/ layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Try out this PRQuick install: pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@9e9a5ecbc879e411c10274b25ddff08d70523e3cRecommended with virtual environment (using uv): uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@9e9a5ecbc879e411c10274b25ddff08d70523e3cLast updated for commit: |
WalkthroughThis PR introduces a comprehensive agent skills system for AIPerf development workflows, with 24 new procedural skill documents under ChangesAgent Skills System & IDE Integration
Feature/Plugin Authoring Skills
Testing, Validation, and Profiling Skills
Code Review and PR Workflows
Development Utilities and Sequencers
Profile Export and Analysis Skills
Documentation Refactoring: Patterns Index and Pattern Pages
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
|
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Nitpick comments (9)
.agents/skills/aiperf-debug/SKILL.md (1)
14-31: ⚡ Quick winUse Mermaid for the decision-tree diagram.
This block is currently Graphviz DOT. Repo markdown guidance asks for Mermaid in
.mdfiles, so this should be converted to a Mermaidflowchartblock to stay compliant.As per coding guidelines, "
**/*.md: Use mermaid diagrams instead of ASCII art in markdown files".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-debug/SKILL.md around lines 14 - 31, The diagram is written in Graphviz DOT but must be converted to a Mermaid flowchart: replace the ```dot block with a ```mermaid block and convert the digraph into a flowchart (e.g., start with "flowchart TD"), map each node label from the DOT (e.g., "Bug / failure / hang in aiperf?", "Capture exact symptom (error text, command, env)", "Scan the catalog below for matching symptom", "Match found?", "Apply the documented fix; record outcome", "Invoke superpowers:systematic-debugging", "If new durable trap found: add to this skill") into Mermaid node identifiers and use --> for edges, use labeled edges for the decision ("Match found?" --|yes|--> "Apply the documented fix; record outcome" and --|no|--> "Invoke superpowers:systematic-debugging"), and if desired add simple styling to reflect shapes (e.g., use class or style for the decision node "Match found?" and the doublecircle start node) so the resulting block follows the repo guideline of using mermaid diagrams in .md files..agents/skills/aiperf-review/SKILL.md (1)
69-71: ⚡ Quick winDisambiguate the runtime-routing decision branch.
The
Runtime phrasing?diamond currently has three outgoing paths, which makes the decision logic unclear in the graph. Split this into explicit follow-up decisions (e.g., happy-path vs error-path vs full-review) so each branch is deterministic.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-review/SKILL.md around lines 69 - 71, The decision node labeled "Runtime phrasing?" currently branches to three targets ("Full review?", "aiperf-correctness-testing", "aiperf-adversarial-testing"); replace that single trinary branch with deterministic follow-up decisions by adding explicit decision nodes (e.g., "Happy path?" and "Error path?") so each outgoing edge is binary: have "Runtime phrasing?" only decide between "Full review?" vs "Proceed to runtime-specific check", then route "Proceed to runtime-specific check" to a new "Happy path?" node which goes to "aiperf-correctness-testing" and a new "Error path?" node which goes to "aiperf-adversarial-testing"; update the labels accordingly so "aiperf-correctness-testing" is reached only from the happy-path node and "aiperf-adversarial-testing" only from the error-path node..agents/skills/aiperf-finish-pr/SKILL.md (1)
16-39: ⚡ Quick winConsider converting to mermaid diagram for consistency.
The repo guideline for
**/*.mdfiles specifies using mermaid diagrams instead of other diagram formats. A mermaid flowchart would provide the same information with better consistency across the documentation.🔄 Mermaid alternative
-```dot -digraph finish_pr { - "Branch ready to ship?" [shape=doublecircle]; - "1. Mechanical floor green?" [shape=box]; - "2. Tests green?" [shape=box]; - "3. Docs regenerated?" [shape=box]; - "4. Four-file sync OK?" [shape=box]; - "5. Runtime smoke passes?" [shape=box]; - "6. Code review pass?" [shape=box]; - "7. Ergonomics review pass?" [shape=box]; - "8. Commit clean?" [shape=box]; - "9. Open PR" [shape=doublecircle]; - - "Branch ready to ship?" -> "1. Mechanical floor green?"; - "1. Mechanical floor green?" -> "2. Tests green?"; - "2. Tests green?" -> "3. Docs regenerated?"; - "3. Docs regenerated?" -> "4. Four-file sync OK?"; - "4. Four-file sync OK?" -> "5. Runtime smoke passes?"; - "5. Runtime smoke passes?" -> "6. Code review pass?"; - "6. Code review pass?" -> "7. Ergonomics review pass?"; - "7. Ergonomics review pass?" -> "8. Commit clean?"; - "8. Commit clean?" -> "9. Open PR"; -} -``` +```mermaid +flowchart TD + A((Branch ready to ship?)) --> B[1. Mechanical floor green?] + B --> C[2. Tests green?] + C --> D[3. Docs regenerated?] + D --> E[4. Four-file sync OK?] + E --> F[5. Runtime smoke passes?] + F --> G[6. Code review pass?] + G --> H[7. Ergonomics review pass?] + H --> I[8. Commit clean?] + I --> J((9. Open PR)) +```As per coding guidelines, "Use mermaid diagrams instead of ASCII art in markdown files" for
**/*.md.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-finish-pr/SKILL.md around lines 16 - 39, Replace the DOT graph block in .agents/skills/aiperf-finish-pr/SKILL.md with a mermaid flowchart block: change the fenced code block language to ```mermaid and convert the digraph nodes/edges into a flowchart TD sequence (use nodes A..J or descriptive labels like "Branch ready to ship?" -> "1. Mechanical floor green?" -> ... -> "9. Open PR") so the visual matches the original order; ensure start/end nodes use mermaid circular notation ((...)) for doublecircle and square/bracket notation [...] for boxed steps and keep the same step texts..agents/skills/aiperf-dataset-synthesis/SKILL.md (1)
18-39: ⚡ Quick winUse Mermaid for the decision tree instead of DOT.
This diagram is currently authored as
dot; project markdown guidance prefers mermaid in markdown files.As per coding guidelines: "
**/*.md: Use mermaid diagrams instead of ASCII art in markdown files".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-dataset-synthesis/SKILL.md around lines 18 - 39, The DOT diagram titled "dataset_pick" should be converted to a Mermaid flowchart: replace the ```dot fenced block with ```mermaid and translate the digraph/directed edges and node shapes into Mermaid flowchart syntax (use "flowchart TD" or appropriate direction), map the DOT nodes like "What are you doing?", "Generating prompts from scratch / a recipe?", "Inspecting an existing trace file?", "Adding support for a new public dataset?", and terminal/action boxes ("Use aiperf synthesize", "Use aiperf analyze-trace", "Build a custom loader as a plugin", "Probably want aiperf profile --public-dataset") to Mermaid nodes, convert the diamond decision nodes into conditional style using -->|yes| and -->|no| labels, and ensure the overall graph name/identity (dataset_pick) is preserved in a comment or label so the logic remains identical..agents/skills/aiperf-add-plugin/SKILL.md (1)
21-40: ⚡ Quick winConvert the decision tree from DOT to Mermaid.
Lines 21-40 use a
dotdiagram; repo markdown guidance asks for mermaid diagrams in markdown files.As per coding guidelines: "
**/*.md: Use mermaid diagrams instead of ASCII art in markdown files".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-add-plugin/SKILL.md around lines 21 - 40, Replace the DOT diagram named "add_plugin" with a mermaid flowchart (use flowchart TD) preserving the same node labels and transitions; map "Adding a plugin?" to a decision/start node, the steps "Identify category in categories.yaml", "Implement class against the category's Protocol", "Register in plugins.yaml with correct metadata schema", "Run make generate-all-plugin-files", "Run make validate-plugin-schemas", "Validate with aiperf plugins --validate", and "Run aiperf-correctness-testing for the affected endpoint" to sequential flowchart nodes, and recreate the same arrow relationships between them so the visual flow and labels remain identical but in mermaid syntax..agents/skills/aiperf-correctness-testing/SKILL.md (2)
189-193: ⚡ Quick winAdd language identifier to code fence.
The output contract code block is missing a language specification, which triggered a markdownlint warning. Add
textorbashafter the opening fence.📋 Proposed fix
## Output contract Calling skills (or the user) get: -``` +```text RESULT=pass | fail ART_DIR=<absolute path>/artifacts/correctness-<epoch> FAILED_SCENARIOS=<comma-separated list, or empty></details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In @.agents/skills/aiperf-correctness-testing/SKILL.md around lines 189 - 193,
The Markdown code fence for the output contract block is missing a language
identifier; update the opening fence fromtotext (or ```bash) so the
block reads as a labeled text code block and resolves the markdownlint
warning—keep the existing block contents (RESULT=..., ART_DIR=...,
FAILED_SCENARIOS=...) unchanged and only add the language token after the
opening backticks.</details> --- `133-135`: _⚡ Quick win_ **Hardcoded request count assumption creates maintenance risk.** The assertion hardcodes `20` as the expected record count with a comment to "adjust here" if scenarios override. This violates DRY and creates a maintenance trap when scenarios use different `--request-count` values (like embeddings on line 83). <details> <summary>♻️ Proposed fix: Parse expected count from scenario metadata</summary> ```diff +# Parse expected counts per scenario (default 20, but some may override) +expected = {"embeddings": 20, "nim-rankings": 20} # extend as needed +expected_default = 20 + for run in sorted(art.glob("*/exit-code.txt")): name = run.parent.name + expected_count = expected.get(name, expected_default) exit_code = int(run.read_text().strip()) if exit_code != 0: fail.append(f"{name}: exit {exit_code}") continue jsonl = run.parent / "profile_export.jsonl" if not jsonl.exists(): fail.append(f"{name}: profile_export.jsonl missing") continue lines = jsonl.read_text().splitlines() if len(lines) == 0: fail.append(f"{name}: zero records in profile_export.jsonl") continue - if len(lines) != 20: - # 20 = --request-count from COMMON; if your scenario overrides it, adjust here - fail.append(f"{name}: expected 20 records, got {len(lines)}") + if len(lines) != expected_count: + fail.append(f"{name}: expected {expected_count} records, got {len(lines)}") continue ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-correctness-testing/SKILL.md around lines 133 - 135, Replace the hardcoded expected count 20 in the len(lines) check with a value parsed from the scenario metadata/flags (the same source that supplies --request-count) so the assertion follows the scenario configuration; locate the block using the variables lines, name and the fail.append call and read the request-count from the scenario object or COMMON config (or parse the scenario's flags for "--request-count") into an integer expected_count, then change the condition to if len(lines) != expected_count and update the fail.append message to use expected_count instead of 20. ``` </details> </blockquote></details> <details> <summary>.agents/skills/aiperf-code-review/SKILL.md (2)</summary><blockquote> `34-39`: _⚡ Quick win_ **Add language identifier to code fence.** The artifact layout code block is missing a language specification, which triggered a markdownlint warning. Add `text` after the opening fence to resolve. <details> <summary>📋 Proposed fix</summary> ```diff ## Artifact layout Compute `EPOCH="$(date +%s)"` once at the start of the invocation; reuse for all paths. -``` +```text artifacts/code-review-<epoch>/ REPORT.md # the living document; update in place during the run meta.json # branch, head sha, base sha (origin/main), start time repro/<scenario>/ # one subdir per reproduction (commands, logs, profile_export.jsonl, mock-server.log) ``` ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In @.agents/skills/aiperf-code-review/SKILL.md around lines 34 - 39, The
markdown code fence in the artifact layout block is missing a language tag;
update the opening fence in the SKILL.md artifact layout snippet so the backtick
fence reads as a text-fenced block (add "text" after the three backticks) so the
block becomes a ```text fenced code block to silence the markdownlint warning
while keeping the listed lines unchanged.</details> --- `69-69`: **GitHub API position calculation guidance may be insufficient.** Line 69 instructs to "count lines within the patch to find the `position` value" for inline comments, but diff position calculation is notoriously error-prone. The instruction doesn't specify whether to count only changed lines, context lines, or handle multi-hunk complications. Expand line 69 with explicit guidance or reference GitHub's position calculation rules: position is 1-indexed from the first line of the diff file, counting every line including context, and stops at the target line in the new file. Alternatively, add a concrete example showing position calculation for a sample diff. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/aiperf-code-review/SKILL.md at line 69, Update the guidance at the sentence currently on line 69 in SKILL.md to explicitly state GitHub's position calculation: clarify that "position" is 1-indexed from the start of each file patch (the first line of the diff hunk for that file), that you must count every line in the patch including added, removed and unchanged context lines, and that for files with multiple hunks you compute position relative to the specific hunk containing the target line; either add that exact wording or include a short, concrete example diff showing how to count lines and produce the position value for a target line. ``` </details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.Inline comments:
In @.agents/skills/aiperf-add-env-var/SKILL.md:
- Around line 19-41: The markdown fenced block containing the taxonomy lines
(the list of classes like _APIServerSettings, _CompressionSettings, ...
_ZMQSettings) is missing a language tag which triggers MD040; update the opening
fence to include a language identifier (e.g.,text) so the block readstext and keep the existing content and closing ``` unchanged to satisfy the
linter.In @.agents/skills/aiperf-add-metric/SKILL.md:
- Line 117: The doc contains contradictory guidance: the class attribute
metric_type is required by init_subclass to register subclasses, but another
line says subclasses should not set metric type; update the SKILL.md text to
require that subclasses declare a class-level metric_type (e.g., class MyMetric:
metric_type = "accuracy") and remove or change the earlier sentence that forbids
setting metric_type so both statements agree; reference the metric_type
attribute and the init_subclass registration behavior when editing the
documentation.In @.agents/skills/aiperf-adversarial-testing/SKILL.md:
- Line 112: The command in SKILL.md's scenario "embedding-on-chat-endpoint" uses
an undefined $PORT variable; update the URL to use the documented MOCK_URL
environment variable (or explicitly document exporting PORT) so the example runs
when copied. Specifically change the--url http://127.0.0.1:$PORTsnippet in
the table row for scenario 10 to reference--url $MOCK_URL(or instruct to
export PORT beforehand) so the command is consistent with the rest of the doc
and runnable as-is.In @.agents/skills/aiperf-correctness-testing/SKILL.md:
- Around line 103-105: The multimodal scenario is left undefined; either add a
concrete default multimodal command into the run_scenario block so the document
truly implements six scenarios (e.g., a specific /v1/custom-multimodal
invocation or "multimodal-prompt" example) or update the header text and the
scenarios table (the "six common endpoint shapes" line and the table referenced
around lines 78-85) to state that only five scenarios are implemented plus one
optional/user-defined multimodal scenario; modify the SKILL.md run_scenario
section or the table/header accordingly so the documented matrix and claim are
consistent.In @.agents/skills/aiperf-dataset-synthesis/SKILL.md:
- Around line 65-66: The doc contains contradictory guidance about adding custom
synthesis recipes: remove or reconcile the claim that new recipes are
“synthesize plugin” and instead state the current truth that custom recipes must
be added by editing cli_commands/synthesize.py where the built-in target is
registered as the Literal[...] switch (the existing built-in is "agentic-code");
update SKILL.md text (references around the current lines that mention
"synthesize plugin" and the sentence pointing to cli_commands/synthesize.py) so
both places consistently say that extensibility today requires modifying
cli_commands/synthesize.py and optionally suggest a future plugin design instead
of instructing contributors to create a non-existent "synthesize plugin".In @.agents/skills/aiperf-integration-test/SKILL.md:
- Around line 51-53: The fenced code block showing the test filename pattern
(the line containing "test_.py") lacks a language
tag; update that fenced block to include a language identifier such as "text"
(e.g., changetotext) so the block is marked and markdown lint warnings
are resolved.In @.agents/skills/aiperf-message-trace/SKILL.md:
- Line 152: In SKILL.md remove or replace the broken skill reference
superpowers:systematic-debugging: locate the line containing that token and
either delete the reference or swap it for a valid skill name from the
repository (search the repo for other skill identifiers to pick a real one) so
the document does not point to a non-existentsuperpowersnamespace.- Line 8: Update the statement describing service count in SKILL.md: change the
text that currently reads "10 services (1BaseServiceorchestrator —
SystemController — plus 9BaseComponentServicecomponents)" to "9 services (1
BaseServiceorchestrator — SystemController — plus 8BaseComponentService
components)"; confirm the listed component names (WorkerManager,
RecordProcessor, TimingManager, DatasetManager, ServerMetricsManager,
FastAPIService, RecordsManager, GPUTelemetryManager) reflect the eight
implementations in the codebase and adjust the sentence accordingly.- Around line 82-97: Update the SKILL.md text to remove the nonexistent CLI
flags (--log-level, --verbose, -v) from the example using aiperf profile;
instead state that aiperf profile only accepts user_config and service_config
and show that log verbosity should be controlled via those configs (e.g., set
LOG_LEVEL or logging configuration inside user_config/service_config passed to
aiperf profile), keep the grep pipeline example for filtering, and leave the
correct aiperf service invocation (aiperf service --type <ServiceType.YOUR>
--service-id debug-) unchanged.In @.agents/skills/aiperf-perf-profile/SKILL.md:
- Around line 123-130: The fenced code block in SKILL.md that lists the
artifacts (the block starting on line with "artifacts/perf-/") is missing
a language tag which triggers MD040 lint warnings; update that fenced block to
include a language identifier such as "text" (i.e., change the opening "" to "text") so the markdown linter recognizes it as a plain-text block and the
MD040 warnings stop.In @.agents/skills/aiperf-pr-checkout/SKILL.md:
- Around line 84-91: The fenced code block that currently starts with
and contains the variables WORKDIR, PR, HEAD_SHA, BASE_REF, EXTRA_SHA, and SETUP should be annotated with a language tag to avoid MD040 warnings; change the opening fence fromto ```text (or another plain-text tag) in the SKILL.md
block that lists these variables so the block is treated as text by Markdown
linters.In @.agents/skills/aiperf-pytest/SKILL.md:
- Line 3: The top-level description currently says “always -n auto” but the doc
later (the section that allows omitting -n auto when invoking a single test ID)
creates a contradiction; update the description sentence in SKILL.md to codify
the exception by stating “always -n auto, except when running a single test ID
invocation (a single test node/ID) where -n auto may be omitted,” and keep the
rest of the canonical invocation rules (no combining subfolders, slow-marker
deselect, MALLOC_ARENA_MAX, anti-collision pre-flight) unchanged so the
single-test exception in the later section (the single test ID allowance)
matches the top-level rule.In @.agents/skills/aiperf-review/SKILL.md:
- Around line 109-115: The two unlabeled fenced code blocks that show the
artifacts tree and the output block (the fences containing "artifacts/
-/ ..." and the other similar output block) must have language
tags to satisfy MD040; update each opening triple-backtick to use "text" (i.e.,
changetotext) so both fenced blocks are explicitly labeled as text.In @.agents/skills/aiperf-worktree/SKILL.md:
- Around line 70-74: The fenced output-contract block that currently starts with
"" and contains the lines "WORKDIR=...", "SETUP=...", and "GIT_HEAD=..." needs an explicit language tag to satisfy MD040; change the opening fence from "" to a language-marked fence such as "```text" (or another appropriate
language) so the code block in SKILL.md's output-contract has a declared
language.In
@docs/dev/patterns/console-exporter.md:
- Around line 28-31: The markdown table cells contain inline code with unescaped
pipe characters (e.g., thetitletypestr | None,console_groupstype
tuple[MetricConsoleGroup, ...] | None, and theexclude_flagsvalue
ERROR_ONLY | INTERNAL | EXPERIMENTAL) which breaks table rendering; update
these cells to escape the pipe characters (e.g.,str \| None,
tuple[MetricConsoleGroup, ...] \| None) or replace pipes with the HTML entity
(|) so thetitle,require_flags,exclude_flags, andconsole_groups
rows render correctly.
Nitpick comments:
In @.agents/skills/aiperf-add-plugin/SKILL.md:
- Around line 21-40: Replace the DOT diagram named "add_plugin" with a mermaid
flowchart (use flowchart TD) preserving the same node labels and transitions;
map "Adding a plugin?" to a decision/start node, the steps "Identify category in
categories.yaml", "Implement class against the category's Protocol", "Register
in plugins.yaml with correct metadata schema", "Run make
generate-all-plugin-files", "Run make validate-plugin-schemas", "Validate with
aiperf plugins --validate", and "Run aiperf-correctness-testing for the affected
endpoint" to sequential flowchart nodes, and recreate the same arrow
relationships between them so the visual flow and labels remain identical but in
mermaid syntax.In @.agents/skills/aiperf-code-review/SKILL.md:
- Around line 34-39: The markdown code fence in the artifact layout block is
missing a language tag; update the opening fence in the SKILL.md artifact layout
snippet so the backtick fence reads as a text-fenced block (add "text" after the
three backticks) so the block becomes a ```text fenced code block to silence the
markdownlint warning while keeping the listed lines unchanged.- Line 69: Update the guidance at the sentence currently on line 69 in SKILL.md
to explicitly state GitHub's position calculation: clarify that "position" is
1-indexed from the start of each file patch (the first line of the diff hunk for
that file), that you must count every line in the patch including added, removed
and unchanged context lines, and that for files with multiple hunks you compute
position relative to the specific hunk containing the target line; either add
that exact wording or include a short, concrete example diff showing how to
count lines and produce the position value for a target line.In @.agents/skills/aiperf-correctness-testing/SKILL.md:
- Around line 189-193: The Markdown code fence for the output contract block is
missing a language identifier; update the opening fence fromtotext (ormarkdownlint warning—keep the existing block contents (RESULT=..., ART_DIR=..., FAILED_SCENARIOS=...) unchanged and only add the language token after the opening backticks. - Around line 133-135: Replace the hardcoded expected count 20 in the len(lines) check with a value parsed from the scenario metadata/flags (the same source that supplies --request-count) so the assertion follows the scenario configuration; locate the block using the variables lines, name and the fail.append call and read the request-count from the scenario object or COMMON config (or parse the scenario's flags for "--request-count") into an integer expected_count, then change the condition to if len(lines) != expected_count and update the fail.append message to use expected_count instead of 20. In @.agents/skills/aiperf-dataset-synthesis/SKILL.md: - Around line 18-39: The DOT diagram titled "dataset_pick" should be converted to a Mermaid flowchart: replace the ```dot fenced block with ```mermaid and translate the digraph/directed edges and node shapes into Mermaid flowchart syntax (use "flowchart TD" or appropriate direction), map the DOT nodes like "What are you doing?", "Generating prompts from scratch / a recipe?", "Inspecting an existing trace file?", "Adding support for a new public dataset?", and terminal/action boxes ("Use aiperf synthesize", "Use aiperf analyze-trace", "Build a custom loader as a plugin", "Probably want aiperf profile --public-dataset") to Mermaid nodes, convert the diamond decision nodes into conditional style using -->|yes| and -->|no| labels, and ensure the overall graph name/identity (dataset_pick) is preserved in a comment or label so the logic remains identical. In @.agents/skills/aiperf-debug/SKILL.md: - Around line 14-31: The diagram is written in Graphviz DOT but must be converted to a Mermaid flowchart: replace the ```dot block with a ```mermaid block and convert the digraph into a flowchart (e.g., start with "flowchart TD"), map each node label from the DOT (e.g., "Bug / failure / hang in aiperf?", "Capture exact symptom (error text, command, env)", "Scan the catalog below for matching symptom", "Match found?", "Apply the documented fix; record outcome", "Invoke superpowers:systematic-debugging", "If new durable trap found: add to this skill") into Mermaid node identifiers and use --> for edges, use labeled edges for the decision ("Match found?" --|yes|--> "Apply the documented fix; record outcome" and --|no|--> "Invoke superpowers:systematic-debugging"), and if desired add simple styling to reflect shapes (e.g., use class or style for the decision node "Match found?" and the doublecircle start node) so the resulting block follows the repo guideline of using mermaid diagrams in .md files. In @.agents/skills/aiperf-finish-pr/SKILL.md: - Around line 16-39: Replace the DOT graph block in .agents/skills/aiperf-finish-pr/SKILL.md with a mermaid flowchart block: change the fenced code block language to ```mermaid and convert the digraph nodes/edges into a flowchart TD sequence (use nodes A..J or descriptive labels like "Branch ready to ship?" -> "1. Mechanical floor green?" -> ... -> "9. Open PR") so the visual matches the original order; ensure start/end nodes use mermaid circular notation ((...)) for doublecircle and square/bracket notation [...] for boxed steps and keep the same step texts. In @.agents/skills/aiperf-review/SKILL.md: - Around line 69-71: The decision node labeled "Runtime phrasing?" currently branches to three targets ("Full review?", "aiperf-correctness-testing", "aiperf-adversarial-testing"); replace that single trinary branch with deterministic follow-up decisions by adding explicit decision nodes (e.g., "Happy path?" and "Error path?") so each outgoing edge is binary: have "Runtime phrasing?" only decide between "Full review?" vs "Proceed to runtime-specific check", then route "Proceed to runtime-specific check" to a new "Happy path?" node which goes to "aiperf-correctness-testing" and a new "Error path?" node which goes to "aiperf-adversarial-testing"; update the labels accordingly so "aiperf-correctness-testing" is reached only from the happy-path node and "aiperf-adversarial-testing" only from the error-path node.🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
a108f24f-1da6-4e48-9e04-f8baf644fb50📒 Files selected for processing (52)
.agents/skills/aiperf-add-cli/SKILL.md.agents/skills/aiperf-add-env-var/SKILL.md.agents/skills/aiperf-add-metric/SKILL.md.agents/skills/aiperf-add-plugin/SKILL.md.agents/skills/aiperf-add-service/SKILL.md.agents/skills/aiperf-adversarial-testing/SKILL.md.agents/skills/aiperf-baseline-bump/SKILL.md.agents/skills/aiperf-code-review/SKILL.md.agents/skills/aiperf-commit/SKILL.md.agents/skills/aiperf-correctness-testing/SKILL.md.agents/skills/aiperf-dataset-synthesis/SKILL.md.agents/skills/aiperf-debug/SKILL.md.agents/skills/aiperf-finish-pr/SKILL.md.agents/skills/aiperf-integration-test/SKILL.md.agents/skills/aiperf-llm-ergonomics-review/SKILL.md.agents/skills/aiperf-merge-from-main/SKILL.md.agents/skills/aiperf-message-trace/SKILL.md.agents/skills/aiperf-mock-server/SKILL.md.agents/skills/aiperf-perf-profile/SKILL.md.agents/skills/aiperf-pr-checkout/SKILL.md.agents/skills/aiperf-profile-export/SKILL.md.agents/skills/aiperf-pytest/SKILL.md.agents/skills/aiperf-re-review/SKILL.md.agents/skills/aiperf-review/SKILL.md.agents/skills/aiperf-worktree/SKILL.md.agents/skills/aiperf/SKILL.md.claude/hooks/bootstrap.md.claude/hooks/session-start.claude/settings.json.cursor/rules/python.mdc.github/copilot-instructions.md.gitignoreAGENTS.mdCLAUDE.mdREADME.mddocs/dev/patterns.mddocs/dev/patterns/cli-command.mddocs/dev/patterns/console-exporter.mddocs/dev/patterns/drop-oldest-fanout-queue.mddocs/dev/patterns/error-handling.mddocs/dev/patterns/index.mddocs/dev/patterns/logging.mddocs/dev/patterns/message.mddocs/dev/patterns/model.mddocs/dev/patterns/plugin-system.mddocs/dev/patterns/service.mddocs/dev/patterns/strategy-protocol.mddocs/dev/patterns/testing.mddocs/dev/patterns/uncertainty-plot.mddocs/index.ymlsrc/aiperf/cli_commands/__init__.pysrc/aiperf/post_processors/otel_metrics_results_processor.py💤 Files with no reviewable changes (1)
- docs/dev/patterns.md
| ``` | ||
| src/aiperf/common/environment.py: | ||
|
|
||
| class _APIServerSettings → Environment.API_SERVER.* | ||
| class _CompressionSettings → Environment.COMPRESSION.* | ||
| class _ConfigSettings → Environment.CONFIG.* | ||
| class _DatasetSettings → Environment.DATASET.* | ||
| class _DeveloperSettings → Environment.DEV.* | ||
| class _GPUSettings → Environment.GPU.* | ||
| class _HTTPSettings → Environment.HTTP.* | ||
| class _LoggingSettings → Environment.LOGGING.* | ||
| class _MetricsSettings → Environment.METRICS.* | ||
| class _MLflowSettings → Environment.MLFLOW.* | ||
| class _OTelSettings → Environment.OTEL.* | ||
| class _RecordSettings → Environment.RECORD.* | ||
| class _ServerMetricsSettings → Environment.SERVER_METRICS.* | ||
| class _ServiceSettings → Environment.SERVICE.* | ||
| class _TimingSettings → Environment.TIMING.* | ||
| class _TokenizerSettings → Environment.TOKENIZER.* | ||
| class _UISettings → Environment.UI.* | ||
| class _WorkerSettings → Environment.WORKER.* | ||
| class _ZMQSettings → Environment.ZMQ.* | ||
| ``` |
There was a problem hiding this comment.
Add a language tag to the fenced taxonomy block (Line 19).
This fence is missing a language identifier, which triggers MD040 and creates avoidable lint noise.
Suggested fix
-```
+```text
src/aiperf/common/environment.py:
@@
class _ZMQSettings → Environment.ZMQ.*</details>
<!-- suggestion_start -->
<details>
<summary>📝 Committable suggestion</summary>
> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 19-19: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/aiperf-add-env-var/SKILL.md around lines 19 - 41, The
markdown fenced block containing the taxonomy lines (the list of classes like
_APIServerSettings, _CompressionSettings, ... _ZMQSettings) is missing a
language tag which triggers MD040; update the opening fence to include a
language identifier (e.g., ```text) so the block reads ```text and keep the
existing content and closing ``` unchanged to satisfy the linter.
| ## Common mistakes | ||
|
|
||
| - **Picking the wrong base class.** `BaseDerivedMetric` for things that should be `BaseAggregateMetric` produces re-computation per record (slow + wrong). | ||
| - **Missing `metric_type` declaration.** `__init_subclass__` needs it to register. |
There was a problem hiding this comment.
Resolve contradictory guidance on metric type declaration.
This bullet conflicts with Line 57 (which says subclasses should not set metric type). Keeping both will misdirect implementations.
Proposed fix
-- **Missing `metric_type` declaration.** `__init_subclass__` needs it to register.
+- **Missing required class attributes (`tag`, and `required_metrics` for derived metrics).** Registration and dependency validation rely on these.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - **Missing `metric_type` declaration.** `__init_subclass__` needs it to register. | |
| - **Missing required class attributes (`tag`, and `required_metrics` for derived metrics).** Registration and dependency validation rely on these. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/aiperf-add-metric/SKILL.md at line 117, The doc contains
contradictory guidance: the class attribute metric_type is required by
__init_subclass__ to register subclasses, but another line says subclasses
should not set metric type; update the SKILL.md text to require that subclasses
declare a class-level metric_type (e.g., class MyMetric: metric_type =
"accuracy") and remove or change the earlier sentence that forbids setting
metric_type so both statements agree; reference the metric_type attribute and
the __init_subclass__ registration behavior when editing the documentation.
| | 7 | **zero-concurrency** | `--fast` | `--endpoint-type chat --request-count 10 --concurrency 0` | aiperf rejects the config with a clear error at startup. No crash, no run started. | | ||
| | 8 | **conflicting-flags** | `--fast` | `--endpoint-type chat --request-count 10 --request-rate 5.0 --concurrency 4 --benchmark-duration 30` (over-specified) | aiperf either picks one with a warning or rejects with a clear message. Don't accept "silently uses concurrency" — that's a finding. | | ||
| | 9 | **malformed-trace-input** | `--fast` | `--input-file /tmp/garbage.jsonl --fixed-schedule` (write 3 lines of invalid JSON beforehand) | Fails fast with line-number + parse error. Don't accept "ran 0 requests and exited 0" — that's silent data loss. | | ||
| | 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url http://127.0.0.1:$PORT --request-count 5 --concurrency 1` (the user wires an embeddings client at the chat endpoint by changing the `--url` suffix). | Per-request error with informative message. | |
There was a problem hiding this comment.
Fix undefined $PORT in scenario 10 command.
This scenario uses --url http://127.0.0.1:$PORT, but the documented execution path provides MOCK_URL (not PORT). This makes the example inconsistent and brittle for copy/paste runs.
Proposed fix
-| 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url http://127.0.0.1:$PORT --request-count 5 --concurrency 1` (the user wires an embeddings client at the chat endpoint by changing the `--url` suffix). | Per-request error with informative message. |
+| 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url "${MOCK_URL%/v1}" --request-count 5 --concurrency 1` (wire embeddings requests to a chat endpoint path by overriding URL suffix intentionally). | Per-request error with informative message. |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url http://127.0.0.1:$PORT --request-count 5 --concurrency 1` (the user wires an embeddings client at the chat endpoint by changing the `--url` suffix). | Per-request error with informative message. | | |
| | 10 | **embedding-on-chat-endpoint** | `--fast` | `--endpoint-type embeddings --url "${MOCK_URL%/v1}" --request-count 5 --concurrency 1` (wire embeddings requests to a chat endpoint path by overriding URL suffix intentionally). | Per-request error with informative message. | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/aiperf-adversarial-testing/SKILL.md at line 112, The command
in SKILL.md's scenario "embedding-on-chat-endpoint" uses an undefined $PORT
variable; update the URL to use the documented MOCK_URL environment variable (or
explicitly document exporting PORT) so the example runs when copied.
Specifically change the `--url http://127.0.0.1:$PORT` snippet in the table row
for scenario 10 to reference `--url $MOCK_URL` (or instruct to export PORT
beforehand) so the command is consistent with the rest of the doc and runnable
as-is.
| # multimodal: depends on how the user wants to exercise the mock's /v1/custom-multimodal route; | ||
| # the simplest case is a chat run with a multimodal-prompt input — leaving as user-chosen. | ||
| ``` |
There was a problem hiding this comment.
Multimodal scenario is left undefined.
The workflow documents six scenarios in the table (lines 78-85) but the implementation comment at line 103-105 defers the multimodal scenario as "user-chosen." This creates an incomplete matrix and undermines the "six common endpoint shapes" claim at line 69.
Either provide a concrete default multimodal scenario command in the run_scenario block, or update the table and line 69 to explicitly list five implemented scenarios plus one optional/user-defined multimodal scenario.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/aiperf-correctness-testing/SKILL.md around lines 103 - 105,
The multimodal scenario is left undefined; either add a concrete default
multimodal command into the run_scenario block so the document truly implements
six scenarios (e.g., a specific /v1/custom-multimodal invocation or
"multimodal-prompt" example) or update the header text and the scenarios table
(the "six common endpoint shapes" line and the table referenced around lines
78-85) to state that only five scenarios are implemented plus one
optional/user-defined multimodal scenario; modify the SKILL.md run_scenario
section or the table/header accordingly so the documented matrix and claim are
consistent.
| Custom synthesis recipes are NOT a plugin category today — `agentic-code` is the only built-in target, registered as a `Literal[...]` switch in `cli_commands/synthesize.py`. Adding a new recipe requires editing that file (no plugin registration needed). | ||
|
|
There was a problem hiding this comment.
Resolve contradictory guidance on synthesis extensibility.
Line 65 says new synthesis recipes are not plugin-based and must be added via cli_commands/synthesize.py, but Line 136 advises making reusable recipes a “synthesize plugin.” These instructions conflict and will send contributors down the wrong path.
Suggested wording fix
-| "I'll write a one-off Python script to generate prompts" | If the recipe is reusable, make it a `synthesize` plugin. One-offs accumulate as drift. |
+| "I'll write a one-off Python script to generate prompts" | If the recipe is reusable, add it as a first-class `aiperf synthesize` target in `cli_commands/synthesize.py` and `src/aiperf/dataset/synthesis/`. One-offs accumulate as drift. |Also applies to: 136-137
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/aiperf-dataset-synthesis/SKILL.md around lines 65 - 66, The
doc contains contradictory guidance about adding custom synthesis recipes:
remove or reconcile the claim that new recipes are “synthesize plugin” and
instead state the current truth that custom recipes must be added by editing
cli_commands/synthesize.py where the built-in target is registered as the
Literal[...] switch (the existing built-in is "agentic-code"); update SKILL.md
text (references around the current lines that mention "synthesize plugin" and
the sentence pointing to cli_commands/synthesize.py) so both places consistently
say that extensibility today requires modifying cli_commands/synthesize.py and
optionally suggest a future plugin design instead of instructing contributors to
create a non-existent "synthesize plugin".
| ``` | ||
| WORKDIR=<absolute path> | ||
| PR=<n> | ||
| HEAD_SHA=<sha> | ||
| BASE_REF=<baseRefName from gh pr view, usually 'main'> | ||
| EXTRA_SHA=<sha or empty> | ||
| SETUP=ok | failed | read-only | ||
| ``` |
There was a problem hiding this comment.
Add a language tag to the report-back fenced block (Line 84).
Please mark this fence as text (or similar) to avoid MD040 warnings.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 84-84: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/aiperf-pr-checkout/SKILL.md around lines 84 - 91, The fenced
code block that currently starts with ``` and contains the variables WORKDIR,
PR, HEAD_SHA, BASE_REF, EXTRA_SHA, and SETUP should be annotated with a language
tag to avoid MD040 warnings; change the opening fence from ``` to ```text (or
another plain-text tag) in the SKILL.md block that lists these variables so the
block is treated as text by Markdown linters.
| @@ -0,0 +1,98 @@ | |||
| --- | |||
| name: aiperf-pytest | |||
| description: Use BEFORE invoking pytest in the aiperf repo for any reason — "run the tests", "run unit tests", "verify with pytest", "kick off the integration suite", "re-run the failing tests", "pytest tests/unit/...". Codifies the project's pytest invocation rules (always -n auto, never combine subfolders, slow-marker deselect, MALLOC_ARENA_MAX for integration, anti-collision pre-flight). Wraps the single canonical command per test tier. | |||
There was a problem hiding this comment.
Resolve the -n auto rule inconsistency.
Line 3 states “always -n auto”, but Lines 45-47 explicitly allow omitting it for a single test ID. Please reflect that exception in the top-level description so the rule is unambiguous.
Also applies to: 45-47
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/aiperf-pytest/SKILL.md at line 3, The top-level description
currently says “always -n auto” but the doc later (the section that allows
omitting -n auto when invoking a single test ID) creates a contradiction; update
the description sentence in SKILL.md to codify the exception by stating “always
-n auto, except when running a single test ID invocation (a single test node/ID)
where -n auto may be omitted,” and keep the rest of the canonical invocation
rules (no combining subfolders, slow-marker deselect, MALLOC_ARENA_MAX,
anti-collision pre-flight) unchanged so the single-test exception in the later
section (the single test ID allowance) matches the top-level rule.
| ``` | ||
| artifacts/ | ||
| <shortname>-<epoch>/ | ||
| REPORT.md # main deliverable | ||
| meta.json # branch, head sha, base sha, invocation timestamp, mock flags | ||
| <scenario-or-finding-dirs>/ | ||
| ``` |
There was a problem hiding this comment.
Add language tags to both fenced path/output blocks (Lines 109 and 162).
Both fences are unlabeled and trigger MD040. Use text for each.
Also applies to: 162-167
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 109-109: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/aiperf-review/SKILL.md around lines 109 - 115, The two
unlabeled fenced code blocks that show the artifacts tree and the output block
(the fences containing "artifacts/ <shortname>-<epoch>/ ..." and the other
similar output block) must have language tags to satisfy MD040; update each
opening triple-backtick to use "text" (i.e., change ``` to ```text) so both
fenced blocks are explicitly labeled as text.
| ``` | ||
| WORKDIR=<absolute path> | ||
| SETUP=ok | failed | read-only | ||
| GIT_HEAD=<sha of HEAD in the new workspace> | ||
| ``` |
There was a problem hiding this comment.
Add a language tag to the output-contract fence (Line 70).
This block should declare a language (e.g., text) to satisfy MD040.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 70-70: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/aiperf-worktree/SKILL.md around lines 70 - 74, The fenced
output-contract block that currently starts with "```" and contains the lines
"WORKDIR=...", "SETUP=...", and "GIT_HEAD=..." needs an explicit language tag to
satisfy MD040; change the opening fence from "```" to a language-marked fence
such as "```text" (or another appropriate language) so the code block in
SKILL.md's output-contract has a declared language.
| | `title` | `str | None` | Static title; `None` derives from endpoint metadata. | | ||
| | `require_flags` | `MetricFlags` | Records must have ALL of these. Default `MetricFlags.NONE` (no requirement). | | ||
| | `exclude_flags` | `MetricFlags` | Records with ANY of these are hidden. Default `ERROR_ONLY | INTERNAL | EXPERIMENTAL`. | | ||
| | `console_groups` | `tuple[MetricConsoleGroup, ...] | None` | Groups to include, in render order. `None` disables group filtering (single table). | |
There was a problem hiding this comment.
Escape pipe characters in table cells to prevent broken rendering.
Inline code values containing | are splitting table columns, so the table won’t render correctly in many markdown parsers.
Proposed fix
-| `title` | `str | None` | Static title; `None` derives from endpoint metadata. |
+| `title` | `str \| None` | Static title; `None` derives from endpoint metadata. |
| `require_flags` | `MetricFlags` | Records must have ALL of these. Default `MetricFlags.NONE` (no requirement). |
-| `exclude_flags` | `MetricFlags` | Records with ANY of these are hidden. Default `ERROR_ONLY | INTERNAL | EXPERIMENTAL`. |
-| `console_groups` | `tuple[MetricConsoleGroup, ...] | None` | Groups to include, in render order. `None` disables group filtering (single table). |
+| `exclude_flags` | `MetricFlags` | Records with ANY of these are hidden. Default `ERROR_ONLY \| INTERNAL \| EXPERIMENTAL`. |
+| `console_groups` | `tuple[MetricConsoleGroup, ...] \| None` | Groups to include, in render order. `None` disables group filtering (single table). |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | `title` | `str | None` | Static title; `None` derives from endpoint metadata. | | |
| | `require_flags` | `MetricFlags` | Records must have ALL of these. Default `MetricFlags.NONE` (no requirement). | | |
| | `exclude_flags` | `MetricFlags` | Records with ANY of these are hidden. Default `ERROR_ONLY | INTERNAL | EXPERIMENTAL`. | | |
| | `console_groups` | `tuple[MetricConsoleGroup, ...] | None` | Groups to include, in render order. `None` disables group filtering (single table). | | |
| | `title` | `str \| None` | Static title; `None` derives from endpoint metadata. | | |
| | `require_flags` | `MetricFlags` | Records must have ALL of these. Default `MetricFlags.NONE` (no requirement). | | |
| | `exclude_flags` | `MetricFlags` | Records with ANY of these are hidden. Default `ERROR_ONLY \| INTERNAL \| EXPERIMENTAL`. | | |
| | `console_groups` | `tuple[MetricConsoleGroup, ...] \| None` | Groups to include, in render order. `None` disables group filtering (single table). | |
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 28-28: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
[warning] 30-30: Table column count
Expected: 3; Actual: 5; Too many cells, extra data will be missing
(MD056, table-column-count)
[warning] 31-31: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/dev/patterns/console-exporter.md` around lines 28 - 31, The markdown
table cells contain inline code with unescaped pipe characters (e.g., the
`title` type `str | None`, `console_groups` type `tuple[MetricConsoleGroup, ...]
| None`, and the `exclude_flags` value `ERROR_ONLY | INTERNAL | EXPERIMENTAL`)
which breaks table rendering; update these cells to escape the pipe characters
(e.g., `str \| None`, `tuple[MetricConsoleGroup, ...] \| None`) or replace pipes
with the HTML entity (|) so the `title`, `require_flags`, `exclude_flags`,
and `console_groups` rows render correctly.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Caution Review failedAn error occurred during the review process. Please try again later. WalkthroughThis PR introduces a comprehensive agent skills system for AIPerf development workflows, with 24 new procedural skill documents under ChangesAgent Skills System & IDE Integration
Feature/Plugin Authoring Skills
Testing, Validation, and Profiling Skills
Code Review and PR Workflows
Development Utilities and Sequencers
Profile Export and Analysis Skills
Documentation Refactoring: Patterns Index and Pattern Pages
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
|
Warning
Not ready for prime time yet, still needs more tweaks
Summary
aiperf-*skills under.agents/skills/(mirrored to.claude/skills/) — a focused skill set for AI agents working in this repo, covering review, runtime testing, authoring, daily rituals, workspace utilities, and specialized debugging. Top-level skill isaiperf; it routes to specialized sub-skills.SessionStarthook in.claude/settings.jsonthat force-injects a short bootstrap (.claude/hooks/bootstrap.md) into every Claude Code session opened in this repo, so the routing rules are always in context.docs/dev/patterns.mdinto per-pattern pages underdocs/dev/patterns/(cli-command, service, model, message, plugin-system, error-handling, logging, testing, console-exporter, uncertainty-plot, strategy-protocol, drop-oldest-fanout-queue) with an index.docs/index.yml(Fern site index) updated.AGENTS.md,CLAUDE.md,.github/copilot-instructions.md,.cursor/rules/python.mdc) updated together to document the skill set, warn againstuv sync(uninstalls the editableaiperf-mock-server), and reference the newpatterns/layout.Skill inventory (26)
Review:
aiperf-review(router),aiperf-code-review,aiperf-re-review,aiperf-llm-ergonomics-reviewRuntime testing:
aiperf-correctness-testing,aiperf-adversarial-testingWorkspace utilities:
aiperf-worktree,aiperf-pr-checkout,aiperf-mock-serverDaily rituals:
aiperf-pytest,aiperf-commit,aiperf-debugAuthoring rituals:
aiperf-add-plugin,aiperf-add-service,aiperf-add-metric,aiperf-add-env-var,aiperf-add-cliSpecialized:
aiperf-profile-export,aiperf-integration-test,aiperf-merge-from-main,aiperf-baseline-bump,aiperf-perf-profile,aiperf-message-trace,aiperf-finish-pr,aiperf-dataset-synthesisTop-level router:
aiperfWhy bootstrap + force-inject
Skills are only useful if agents invoke them. The bootstrap states an "iron rule" (invoke the relevant
aiperf-*skill before responding, even before clarifying questions) and theSessionStarthook makes that rule appear in every Claude Code session opened in this repo without depending on the agent loadingCLAUDE.mdfirst.Test plan
check-ast,debug-statements,detect-private-key,check-added-large-files,check-case-conflict,check-executables-have-shebangs,check-merge-conflict,check-json,check-toml,check-yaml,check-shebang-scripts-are-executable,end-of-file-fixer,mixed-line-ending,no-commit-to-branch,requirements-txt-fixer,trailing-whitespace,codespell,add-license,generate-cli-docs,generate-env-vars-docs,generate-plugin-artifacts,validate-plugin-schemas,test-imports,check-agent-files-sync,check-ergonomics,check-ruff-baselined,ruff,ruff-format.make check-agent-files-syncexits 0 (four-file sync verified).SessionStarthook injects the bootstrap.AGENTS.md,.cursor/rules/python.mdc,.github/copilot-instructions.md) document the skill set identically.aiperffrom a test session and confirm routing to a sub-skill works.aiperf-correctness-testingagainst the in-repo mock — short matrix, no infra dependency).Notes for reviewers
9e9a5ecbc) that replaces the prior 11 commits on the branch. The branch was rebased directly ontoorigin/main(no behind commits, no conflicts)..agents/skills/are deliberately self-contained team-facing content — they don't reference any contributor's private memory files. Recent audit edits (aiperf-profile-export,aiperf-llm-ergonomics-review,aiperf-finish-pr, etc.) reframe opinionated content as documented conventions rather than absolute rules where appropriate.aiperf-*skill applies, invoke it") — load-bearing for routing reliability.aiperf-commitdocuments the "worktree-isolation + cherry-pick" pattern for parallel-agent work instead of--no-verify. This is project policy for this repo (different from sibling Rust repo).Summary by CodeRabbit
Documentation
Chores
Note: This release is primarily an internal infrastructure and documentation update with minimal user-facing changes.