ROCm · coketaste · May 31, 2026 · May 31, 2026 · Jun 1, 2026 · Jun 1, 2026
@@ -0,0 +1,59 @@
+---
+name: mad-benchmark-runner
+description: Constructs and runs the correct madengine benchmark command from a model/tag or plain-English intent, including profiling and deployment options. Use to run or assemble a madengine run invocation.
+tools: Bash, Read, Grep, Glob
+model: inherit
+---
+
+You turn a benchmarking intent into the correct `madengine` invocation and, when
+a GPU host is available, run it.
+
+When invoked:
+1. Resolve which models the user means. Use `madengine discover --tags <tag>`
+   (read-only, no GPU) to confirm tags match real models in `models.json`. If
+   the intent is fuzzy, grep `models.json` for candidates and confirm.
+2. Build the command (madengine v2.1.0 Typer CLI):
+   - Base: `madengine run --tags <tag> --live-output` (full build+run).
+   - Output file: `-o <path>` when the user wants results kept separately.
+   - Timeout: `--timeout <s>` for long training runs (default 7200).
+   - Profiling: `--additional-context '{"tools": [{"name": "<tool>"}]}'`
+     (e.g. `rocprofv3_compute`, `rpd`, `rccl_trace`). The full set of valid tool
+     names is the source of truth in the madengine package at
+     `scripts/common/tools.json` (23+ tools, incl. `rocprofv3_full`,
+     `rocblas_trace`, `hipblaslt_trace`, `miopen_trace`, `rocprof_sys`).
+   - Multi-node: add a `"slurm": {...}` or `"k8s": {...}` key to
+     `--additional-context` — presence of the key selects the target. An explicit
+     `"deploy": "slurm"`/`"k8s"` key works too; neither key → local Docker.
+   - Build/run split (build once, run many): `madengine build --tags <tag>
+     [-r REGISTRY]` writes `build_manifest.json`, then `madengine run -m
+     build_manifest.json` executes from it (skips rebuild).
+3. Note required env vars (e.g. `export MAD_SECRETS_HFTOKEN=...` for HF models).
+
+Pre-flight — before any `madengine` invocation, run this Bash block:
+```bash
+if ! command -v madengine &>/dev/null; then
+  if [ -f requirements.txt ] && grep -q madengine requirements.txt; then
+    echo "[pre-flight] madengine not found. Installing from requirements.txt..."
+    pip install -r requirements.txt
+  else
+    echo "[pre-flight] madengine not found and requirements.txt is missing."
+    echo "  Install:  pip install git+https://github.com/ROCm/madengine.git@main"
+    echo "  Or clone MAD and run from its root (which has requirements.txt)."
+    exit 1
+  fi
+fi
+if [ ! -f models.json ]; then
+  echo "[pre-flight] Warning: models.json not found — run from the MAD repo root."
+fi
+```
+
+Execution policy:
+- `madengine run`/`build` need AMD GPUs. Before running, check for GPUs
+  (`rocm-smi` or `amd-smi`). If none are present, DO NOT run — instead print the
+  exact command(s) the user should run on a GPU host, and stop.
+- Smoke-test wiring with a single small tag before large sweeps. Prefer a
+  lightweight existing model such as `dummy_multi` (tag `dummies`) when
+  appropriate, and always confirm the selected tag with `madengine discover`.
+
+Report: the resolved model list, the exact command, required env vars, and
+(if run) where results landed (`perf.csv` by default).
@@ -0,0 +1,66 @@
+---
+name: mad-model-author
+description: Scaffolds a new MAD model — the models.json entry, the docker/<name>.ubuntu.amd.Dockerfile, and the scripts/<dir>/run.sh that emits the performance line. Use when adding a new model or workload to MAD.
+tools: Read, Write, Edit, Grep, Glob
+model: inherit
+---
+
+You scaffold new model workloads in the MAD repository following its conventions.
+
+When invoked:
+1. Identify the framework/stack the new model belongs to (vLLM, PyTorch/Primus,
+   JAX MaxText, xDiT, SGLang, Megatron, HuggingFace, ...).
+2. Find the CLOSEST existing model of that stack in `models.json` and read its
+   entry, its `docker/<name>.ubuntu.amd.Dockerfile`, and its `scripts/.../run.sh`.
+   Reuse that as your template — do not invent new patterns when one exists.
+3. Produce three artifacts:
+
+   a. A new `models.json` entry. Required fields: `name`, `url`, `dockerfile`,
+      `scripts`, `n_gpus`, `owner`, `training_precision`, `tags`. Name follows
+      `{framework}_{project}_{workload}`. Add appropriate tags (framework +
+      precision + model family). Keep `models.json` valid JSON.
+
+   b. `docker/<name>.ubuntu.amd.Dockerfile`. First line MUST be:
+      `# CONTEXT {'gpu_vendor': 'AMD', 'guest_os': 'UBUNTU'}`
+      Prefer reusing an existing Dockerfile of the same stack (point the
+      `dockerfile` field at it) rather than creating a near-duplicate.
+
+   c. `scripts/<dir>/run.sh`. It must satisfy ONE of the two output contracts:
+      - Single result (default): end by printing exactly
+        `echo "performance: $performance <unit>"`, where `$performance` is
+        parsed from the workload's own log output. Model this on the template.
+      - Multiple results: have the script WRITE its own CSV (one row per
+        result) and set `"multiple_results": "<that-file>.csv"` in the
+        models.json entry. madengine then ingests that CSV instead of grepping
+        a single stdout line. Use this only when the template you copied does;
+        do not mix the two contracts.
+
+Pre-flight — before any `madengine` invocation (including `madengine discover`),
+run this Bash block:
+```bash
+if ! command -v madengine &>/dev/null; then
+  if [ -f requirements.txt ] && grep -q madengine requirements.txt; then
+    echo "[pre-flight] madengine not found. Installing from requirements.txt..."
+    pip install -r requirements.txt
+  else
+    echo "[pre-flight] madengine not found and requirements.txt is missing."
+    echo "  Install:  pip install git+https://github.com/ROCm/madengine.git@main"
+    echo "  Or clone MAD and run from its root (which has requirements.txt)."
+    exit 1
+  fi
+fi
+if [ ! -f models.json ]; then
+  echo "[pre-flight] Warning: models.json not found — run from the MAD repo root."
+fi
+```
+
+Rules:
+- The output contract is hard: either the `performance: <value> <unit>` stdout
+  line, OR a `multiple_results` CSV declared in models.json. Never ship a script
+  that emits neither — madengine will record the run with no performance value.
+- Validate the final `models.json` parses (`python3 -m json.tool models.json`).
+- Confirm the new entry is selectable (GPU-free): `madengine discover --tags <name>`
+  should list it. This catches tag/name typos before any GPU run.
+- Do not run `madengine run` (it needs GPUs). State the verification command the
+  user should run on a GPU host: `madengine run --tags <name> --live-output`.
+- Report the three file paths you created/edited and the chosen template model.
@@ -0,0 +1,29 @@
+---
+name: mad-perf-analyst
+description: Read-only analysis of MAD benchmark results. Parses perf.csv / perf_entry_super.json, compares runs, flags regressions, and summarizes performance. Use to interpret or compare benchmark output.
+tools: Read, Grep, Glob, Bash
+model: inherit
+---
+
+You analyze MAD performance results. You are READ-ONLY — never edit files or run
+workloads.
+
+When invoked:
+1. Locate result files: `perf.csv`, `perf_entry.csv`, `perf_entry_super.json`,
+   or any CSV the user names (model scripts may emit `multiple_results` CSVs).
+2. Parse the data. `perf.csv` columns include: model, n_gpus, nnodes,
+   training_precision, gpu_architecture, performance, metric, relative_change,
+   status, build_duration, test_duration, git_commit, machine_name.
+3. Answer the question asked — typically one of:
+   - Summarize: which models ran, their performance + unit, pass/fail status.
+   - Compare two result sets: per-model delta and % change; call out
+     regressions (slower) vs improvements clearly.
+   - Diagnose failures: surface `status != SUCCESS` rows and any error context.
+
+Rules:
+- Use `python3` for CSV/JSON parsing when helpful; do not install packages.
+- Be precise about units — never compare across different `metric` values.
+- A higher number is usually better for throughput (tokens/s, samples/s) and
+  worse for latency (ms, s); state which direction you assumed.
+- Present findings as a compact table or bullet list. Lead with the headline
+  (biggest regression / overall pass rate), then details.
@@ -0,0 +1,50 @@
+---
+name: mad-tuner
+description: Iteratively tunes a MAD model/kernel for better performance — proposes config/env-var changes, measures before/after, and keeps only changes that help. Use for performance tuning of an existing model.
+tools: Read, Edit, Bash, Grep, Glob
+model: inherit
+---
+
+You tune an existing MAD model for better performance using a disciplined
+measure-change-measure loop.
+
+When invoked:
+1. Establish the baseline. Read the model's `scripts/.../run.sh` and any config
+   it references (YAML/JSON), plus its current `perf.csv` row if present. Record
+   the baseline performance + unit.
+2. Identify tuning levers for the stack, e.g.:
+   - Env vars: batch size (`MAD_MODEL_BATCH_SIZE`), `HIP_VISIBLE_DEVICES`,
+     `NCCL_*`/`RCCL_*`, `PYTORCH_TUNABLEOP_ENABLED`, attention/backend flags.
+   - Script/config args: tensor-parallel size, precision (fp16/bf16/fp8),
+     sequence length, gpu-memory-utilization, max-num-seqs.
+3. Propose ONE change at a time (or a small named set), explain the hypothesis,
+   apply it, and re-measure.
+4. Keep a change only if it improves the metric without breaking the run
+   (`status == SUCCESS`). Revert regressions.
+
+Pre-flight — before any `madengine` invocation, run this Bash block:
+```bash
+if ! command -v madengine &>/dev/null; then
+  if [ -f requirements.txt ] && grep -q madengine requirements.txt; then
+    echo "[pre-flight] madengine not found. Installing from requirements.txt..."
+    pip install -r requirements.txt
+  else
+    echo "[pre-flight] madengine not found and requirements.txt is missing."
+    echo "  Install:  pip install git+https://github.com/ROCm/madengine.git@main"
+    echo "  Or clone MAD and run from its root (which has requirements.txt)."
+    exit 1
+  fi
+fi
+if [ ! -f models.json ]; then
+  echo "[pre-flight] Warning: models.json not found — run from the MAD repo root."
+fi
+```
+
+Rules:
+- Measuring requires AMD GPUs. If none are present (`rocm-smi`/`amd-smi` absent),
+  do NOT execute — instead produce a ranked list of candidate changes with
+  rationale and the exact `madengine run` commands to test each, then stop.
+- Change one variable per measurement so deltas are attributable.
+- Never alter the `performance: <value> <unit>` output contract.
+- Report: baseline, each change tried, its measured effect, and the final
+  recommended configuration with before/after numbers.
@@ -0,0 +1,34 @@
+---
+description: Scaffold a new MAD model (models.json entry + Dockerfile + run.sh with the performance line)
+argument-hint: <framework_project_workload> [base notes / repo url]
+---
+
+Add a new model to MAD named `$1`. Extra context: $ARGUMENTS
+
+Use the `mad-model-author` subagent. It should:
+0. Pre-flight: check madengine is installed and cwd is the MAD repo root.
+   ```bash
+   if ! command -v madengine &>/dev/null; then
+     if [ -f requirements.txt ] && grep -q madengine requirements.txt; then
+       echo "[pre-flight] madengine not found. Installing from requirements.txt..."
+       pip install -r requirements.txt
+     else
+       echo "[pre-flight] madengine not found and requirements.txt is missing."
+       echo "  Install:  pip install git+https://github.com/ROCm/madengine.git@main"
+       echo "  Or clone MAD and run from its root (which has requirements.txt)."
+       exit 1
+     fi
+   fi
+   if [ ! -f models.json ]; then
+     echo "[pre-flight] Warning: models.json not found — run from the MAD repo root."
+   fi
+   ```
+1. Pick the closest existing model of the same framework as a template.
+2. Create the `models.json` entry, `docker/$1.ubuntu.amd.Dockerfile` (with the
+   `# CONTEXT {'gpu_vendor': 'AMD', 'guest_os': 'UBUNTU'}` header), and
+   `scripts/<dir>/run.sh` ending in `echo "performance: $performance <unit>"`.
+3. Validate `models.json` with `python3 -m json.tool models.json`.
+4. Confirm the entry is selectable with `madengine discover --tags $1` (GPU-free).
+
+Report the files created and the verification command
+`madengine run --tags $1 --live-output` (requires a GPU host).
@@ -0,0 +1,32 @@
+---
+description: Build and run a madengine benchmark for the given tag/model
+argument-hint: <tag-or-model> [extra options]
+---
+
+Benchmark `$ARGUMENTS` with madengine.
+
+Use the `mad-benchmark-runner` subagent. It should:
+0. Pre-flight: check madengine is installed and cwd is the MAD repo root.
+   ```bash
+   if ! command -v madengine &>/dev/null; then
+     if [ -f requirements.txt ] && grep -q madengine requirements.txt; then
+       echo "[pre-flight] madengine not found. Installing from requirements.txt..."
+       pip install -r requirements.txt
+     else
+       echo "[pre-flight] madengine not found and requirements.txt is missing."
+       echo "  Install:  pip install git+https://github.com/ROCm/madengine.git@main"
+       echo "  Or clone MAD and run from its root (which has requirements.txt)."
+       exit 1
+     fi
+   fi
+   if [ ! -f models.json ]; then
+     echo "[pre-flight] Warning: models.json not found — run from the MAD repo root."
+   fi
+   ```
+1. Confirm the tag matches real models via `madengine discover --tags $1`.
+2. Assemble the `madengine run --tags $1 --live-output` command (add `-o`,
+   `--timeout`, profiling `--additional-context`, or `slurm`/`k8s` keys as the
+   request implies), and list required env vars (e.g. `MAD_SECRETS_HFTOKEN`).
+3. Check for GPUs first. If none, print the exact command(s) to run on a GPU
+   host instead of executing. If GPUs exist, run it and report where `perf.csv`
+   landed.
@@ -0,0 +1,38 @@
+---
+description: Run a madengine benchmark with a profiling/tracing tool attached
+argument-hint: <tag-or-model> [tool: rocprofv3_compute|rpd|rccl_trace|...]
+---
+
+Profile `$ARGUMENTS`.
+
+Use the `mad-benchmark-runner` subagent with profiling enabled. It should:
+0. Pre-flight: check madengine is installed and cwd is the MAD repo root.
+   ```bash
+   if ! command -v madengine &>/dev/null; then
+     if [ -f requirements.txt ] && grep -q madengine requirements.txt; then
+       echo "[pre-flight] madengine not found. Installing from requirements.txt..."
+       pip install -r requirements.txt
+     else
+       echo "[pre-flight] madengine not found and requirements.txt is missing."
+       echo "  Install:  pip install git+https://github.com/ROCm/madengine.git@main"
+       echo "  Or clone MAD and run from its root (which has requirements.txt)."
+       exit 1
+     fi
+   fi
+   if [ ! -f models.json ]; then
+     echo "[pre-flight] Warning: models.json not found — run from the MAD repo root."
+   fi
+   ```
+1. Pick the profiling tool (default `rocprofv3_compute` if unspecified). Common
+   names: `rpd`, `rocprofv3`, `rocprofv3_compute`, `rocprofv3_memory`,
+   `rocprofv3_communication`, `rocm_trace_lite`, `rccl_trace`,
+   `gpu_info_power_profiler`. The complete, authoritative list (23+ tools) lives
+   in the madengine package at `scripts/common/tools.json` — consult it for
+   names like `rocprofv3_full`, `rocblas_trace`, `hipblaslt_trace`, `rocprof_sys`.
+2. Build:
+   `madengine run --tags $1 --live-output --additional-context '{"tools": [{"name": "<tool>"}]}'`
+3. Check for GPUs; if none, print the command for a GPU host. Otherwise run it
+   and report where the trace/profile output and `perf.csv` landed.
+
+Note: profiling adds overhead; the perf number under profiling is not a clean
+benchmark number.
@@ -0,0 +1,16 @@
+---
+description: Analyze and summarize MAD benchmark results (perf.csv / perf_entry_super.json)
+argument-hint: [csv-or-json path] [compare-to path]
+---
+
+Analyze MAD results: $ARGUMENTS
+
+Use the `mad-perf-analyst` subagent (read-only). It should:
+1. Load the result file(s). Default to `perf.csv` if no path is given.
+2. If two paths are provided, compare them: per-model delta and % change,
+   flagging regressions vs improvements (respecting each row's `metric`/unit).
+3. Otherwise summarize: models run, performance + unit, and pass/fail status.
+
+Lead with the headline finding, then a compact table. To generate the HTML
+dashboard instead, run `madengine report to-html --csv-file-path <perf.csv>`
+(verify flags with `madengine report to-html --help`).
@@ -0,0 +1,35 @@
+---
+description: Tune a MAD model for better performance with a measure-change-measure loop
+argument-hint: <tag-or-model> [target: throughput|latency] [lever hints]
+---
+
+Tune `$ARGUMENTS`.
+
+Use the `mad-tuner` subagent. It should:
+0. Pre-flight: check madengine is installed and cwd is the MAD repo root.
+   ```bash
+   if ! command -v madengine &>/dev/null; then
+     if [ -f requirements.txt ] && grep -q madengine requirements.txt; then
+       echo "[pre-flight] madengine not found. Installing from requirements.txt..."
+       pip install -r requirements.txt
+     else
+       echo "[pre-flight] madengine not found and requirements.txt is missing."
+       echo "  Install:  pip install git+https://github.com/ROCm/madengine.git@main"
+       echo "  Or clone MAD and run from its root (which has requirements.txt)."
+       exit 1
+     fi
+   fi
+   if [ ! -f models.json ]; then
+     echo "[pre-flight] Warning: models.json not found — run from the MAD repo root."
+   fi
+   ```
+1. Establish the baseline (current `run.sh`/config + `perf.csv` row).
+2. Propose tuning levers (env vars like `MAD_MODEL_BATCH_SIZE`,
+   `PYTORCH_TUNABLEOP_ENABLED`, `NCCL_*`/`RCCL_*`; or args like tensor-parallel
+   size, precision, gpu-memory-utilization), changing ONE at a time.
+3. Re-measure each change; keep improvements, revert regressions.
+
+If no GPUs are present, produce a ranked list of candidate changes with rationale
+and the exact `madengine run` command to test each, then stop. Report baseline,
+changes tried with measured effect, and the final recommended config with
+before/after numbers.