Improve liger-autopatch skill to enable modifying existing monkey-patches (#1177)

vaibhavjindal · claude · web-flow · commit f16c9f76f4a3 · 2026-04-01T18:35:13.000Z
## Summary

Extends the `liger-autopatch` Claude Code skill to support **modifying existing monkey-patches**, not just adding new models. Previously, the skill only triggered for new model support, so modification tasks (e.g., adding `LigerReLUSquared` to nemotron in #1172) bypassed the skill entirely and missed its guidelines.

Changes to 2 skill files:

**SKILL.md:**
- Broadened skill description to trigger on modification-related keywords (update, fix, change, extend, etc.)
- Added **Mode Detection** section that routes to create vs modify pipelines
- Added **Modify Pipeline** with 3 stages: Change Impact Analysis → Apply Changes → Validate
- Validate stage lists all convergence test commands explicitly (including multimodal)

**code-generator.md:**
- Added **Mode** section distinguishing create vs modify mode
- Added **Modification Checklist** with 7 rules (R1-R7):
  - R1: Both patching levels (class-level + instance-level)
  - R2: New parameter with default
  - R3: Update docstring
  - R4: Update tests
  - R5: Check revert function in test/utils.py
  - R6: Run convergence tests (all 6 files including multimodal)
  - R7: Update README.md
- Added **Common Modification Patterns** for: adding activation kernels, adding norm variants, fixing missing instance patching, updating for upstream HF changes

## Testing Done

Tested by asking Claude to add `relu_squared` to nemotron's existing monkey-patch. Verified the skill triggers in modify mode and follows the modification checklist.

- Hardware Type: N/A (skill files only, no code changes)
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
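
The create-vs-modify routing described in the summary can be pictured as a small keyword heuristic. The sketch below is illustrative only: the skill is a prompt document, not code, so `detect_mode`, `MODIFY_KEYWORDS`, and the `supported_models` argument are hypothetical names. It assumes a request counts as "modify" when it either uses one of the listed keywords or asks to add something to a model that is already supported.

```python
import re

# Hypothetical keyword list, loosely based on the Mode Detection section of SKILL.md.
MODIFY_KEYWORDS = (
    "update", "fix", "change", "extend", "modify",
    "new activation", "new norm", "bug in patch", "upstream changed",
)


def detect_mode(request: str, supported_models: set) -> str:
    """Classify a free-form request as "create" or "modify" (illustrative only)."""
    text = request.lower()
    words = set(re.findall(r"[a-z0-9_]+", text))

    # Whole-word matching avoids false hits such as "fix" inside "prefix".
    has_modify_keyword = any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text)
        for keyword in MODIFY_KEYWORDS
    )
    mentions_supported_model = bool(words & {m.lower() for m in supported_models})

    # "add [kernel] to [existing model]" is a modification, not a new model.
    if has_modify_keyword or (mentions_supported_model and "add" in words):
        return "modify"
    return "create"


if __name__ == "__main__":
    print(detect_mode("Add relu_squared to nemotron", {"nemotron", "llama"}))
```

In practice the model reading the skill makes this judgment from context; the heuristic only shows why "add X to nemotron" lands in modify mode while "add support for a new model" does not.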
diff --git a/.claude/skills/liger-autopatch/SKILL.md b/.claude/skills/liger-autopatch/SKILL.md
@@ -1,13 +1,20 @@
 ---
 name: liger-autopatch
-description: "Adds Liger Kernel support for a new HuggingFace Transformers model. Generates lce_forward, monkey-patch function, tests, and README entry. Use when adding a new model to Liger Kernel, when a user asks to patch an unsupported model, or when extending MODEL_TYPE_TO_APPLY_LIGER_FN."
+description: "Adds Liger Kernel support for a new HuggingFace Transformers model, or modifies existing monkey-patching. Generates lce_forward, monkey-patch function, tests, and README entry. Use when adding a new model to Liger Kernel, when a user asks to patch an unsupported model, when extending MODEL_TYPE_TO_APPLY_LIGER_FN, or when modifying/updating/fixing an existing monkey-patch (e.g., adding a new kernel to an already-supported model, fixing instance patching, updating a patch for upstream HF changes)."
 ---
 
 # Liger Auto-Patch
 
-Adds Liger Kernel optimization support for a new HuggingFace model through a 3-stage pipeline with human review between stages.
+Adds Liger Kernel optimization support for a new HuggingFace model, or modifies an existing monkey-patch, through a staged pipeline with human review between stages.
 
-## Pipeline
+## Mode Detection
+
+- **Create mode**: User asks to add/patch/support a new model → full pipeline (Analyze → Generate → Validate)
+- **Modify mode**: User asks to update/fix/change/extend an existing monkey-patch → lighter pipeline (Change Impact Analysis → Apply Changes → Validate)
+
+Keywords that suggest modify mode: update, fix, change, add [kernel] to [existing model], extend, modify, new activation, new norm, bug in patch, upstream changed
+
+## Pipeline (Create Mode)
 
 ### Stage 1: Analyze
 
@@ -47,6 +54,42 @@ Runs instance patching test, convergence test, and lint check. Retries up to 3 t
 
 **Human checkpoint:** Report final test results.
 
+## Pipeline (Modify Mode)
+
+### Stage 1: Change Impact Analysis
+
+Read the existing `apply_liger_kernel_to_{model_type}` function in `monkey_patch.py` and the relevant section of the upstream HF `modeling_{model_type}.py`. Produce a short change plan:
+
+- What is being added/changed/fixed
+- Which Liger kernel(s) are involved
+- Which files need modification (subset of the 13 files from create mode)
+- What the expected behavior should be after the change
+
+**Human checkpoint:** Present the change plan. Confirm before proceeding.
+
+### Stage 2: Apply Changes
+
+Spawn the **Code Generator** agent (read [code-generator.md](code-generator.md)) in **modify mode**.
+
+**Human checkpoint:** Present changes for review.
+
+### Stage 3: Validate
+
+Spawn the **Validator** agent (read [validator.md](validator.md)). This stage is **mandatory** — do not skip it. At minimum, run:
+
+1. Instance patching test: `pytest test/transformers/test_monkey_patch.py -k "{model_type}" -xvs`
+2. All convergence tests for the model:
+   - `pytest test/convergence/bf16/test_mini_models.py -k "{model_type}" -xvs` (FLCE, bf16)
+   - `pytest test/convergence/bf16/test_mini_models_with_logits.py -k "{model_type}" -xvs` (non-FLCE, bf16)
+   - `pytest test/convergence/fp32/test_mini_models.py -k "{model_type}" -xvs` (FLCE, fp32)
+   - `pytest test/convergence/fp32/test_mini_models_with_logits.py -k "{model_type}" -xvs` (non-FLCE, fp32)
+   - If VL (multimodal) model, also run:
+     - `pytest test/convergence/bf16/test_mini_models_multimodal.py -k "{model_type}" -xvs`
+     - `pytest test/convergence/fp32/test_mini_models_multimodal.py -k "{model_type}" -xvs`
+3. Checkstyle: `make checkstyle`
+
+**Human checkpoint:** Report final test results.
+
 ## Reference Files
 
 - [decision-matrix.md](decision-matrix.md) — 12 architectural decisions to resolve per model
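
The Stage 3 (Validate) command list for modify mode is mechanical enough to generate. The following is a minimal sketch, assuming the test paths quoted in the skill text; the helper name `validation_commands` and its `multimodal` flag are hypothetical and not part of the repository.

```python
def validation_commands(model_type: str, multimodal: bool = False) -> list:
    """Build the Validate-stage commands for one model (hypothetical helper)."""
    select = f'-k "{model_type}" -xvs'

    # 1. Instance patching test.
    commands = [f"pytest test/transformers/test_monkey_patch.py {select}"]

    # 2. Convergence tests: FLCE and non-FLCE, in bf16 and fp32.
    for precision in ("bf16", "fp32"):
        for test_file in ("test_mini_models.py", "test_mini_models_with_logits.py"):
            commands.append(f"pytest test/convergence/{precision}/{test_file} {select}")

    # VL (multimodal) models add one more convergence file per precision.
    if multimodal:
        for precision in ("bf16", "fp32"):
            commands.append(
                f"pytest test/convergence/{precision}/test_mini_models_multimodal.py {select}"
            )

    # 3. Checkstyle.
    commands.append("make checkstyle")
    return commands


if __name__ == "__main__":
    for command in validation_commands("nemotron"):
        print(command)
```

A text-only model yields six commands and a multimodal one yields eight, which matches the "all 6 files including multimodal" count of convergence files in the summary.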
diff --git a/.claude/skills/liger-autopatch/code-generator.md b/.claude/skills/liger-autopatch/code-generator.md
@@ -1,6 +1,11 @@
 # Code Generator Agent
 
-Takes a confirmed model profile and generates all files to add Liger Kernel support.
+Takes a confirmed model profile (create mode) or a change plan (modify mode) and generates or modifies files for Liger Kernel support.
+
+## Mode
+
+- **Create mode** (default): Generating all files for a new model. Follow the full "Files to Generate" list below.
+- **Modify mode**: Making targeted changes to an existing monkey-patch. Follow the "Modification Checklist" section instead.
 
 ## Pre-Requisites
 
@@ -84,3 +89,85 @@ Add row to the Patching table under "### Patching":
 - Follow exact patterns from existing code — do not innovate on style
 - When modifying existing files, insert new entries in **alphabetical order** alongside similar existing entries. Never append to the end of a section — find the correct alphabetical position.
 - After generating all files, run `make checkstyle` to verify formatting. If it fails, run `ruff check . --fix && ruff format .` to auto-fix, then verify with `make checkstyle` again.
+
+## Modification Checklist (Modify Mode)
+
+Before making changes, read the existing implementation:
+1. Read `apply_liger_kernel_to_{model_type}` in `monkey_patch.py`
+2. Read the existing test in `test_monkey_patch.py` for this model
+3. Read the relevant HF modeling source for context
+
+### Rules for All Modifications
+
+**R1. Both patching levels.** If adding a new kernel, it must appear in BOTH:
+  - Class-level patching (the main body of `apply_liger_kernel_to_{model_type}`)
+  - Instance-level patching (the `if model is not None` block)
+
+  Omitting one is the most common mistake.
+
+**R2. New parameter with default.** Every new kernel gets a bool parameter on the
+  apply function signature (e.g., `relu_squared: bool = True`). Default should be `True`
+  for kernels that are safe to enable by default, `False` otherwise.
+
+**R3. Update docstring.** Update the function's docstring to:
+  - Add an `Args` entry for the new parameter
+  - Remove any stale notes that the new kernel invalidates
+    (e.g., "squared ReLU is not supported" → remove if you're adding it)
+
+**R4. Update tests.** In the existing `test_apply_liger_kernel_to_instance_for_{model_type}`:
+  - Add import for the new Liger kernel class
+  - Add "not yet patched" assertion before `_apply_liger_kernel_to_instance`
+  - Add "correctly patched" assertion after
+  - Follow the exact pattern of existing assertions in the same test
+
+**R5. Check revert function.** Read `revert_liger_kernel_to_{model_type}` in `test/utils.py`.
+  The revert function uses `importlib.reload(modeling_{model_type})` to undo all patches.
+  This handles most cases automatically, but check if the new kernel requires additional
+  revert logic (e.g., if the kernel patches something outside the modeling module, or
+  replaces a global like `ACT2FN` that `importlib.reload` won't fully restore). Update
+  the revert function if needed.
+
+**R6. Run convergence tests.** Don't modify convergence test files unless the change
+  requires it (e.g., new mini model config fields). But DO run existing convergence
+  tests in the Validate stage to verify no regression. This is critical — the Validator
+  agent (Stage 3) handles this, but if you are generating code without a separate
+  Validate stage, run these yourself:
+  ```bash
+  pytest test/convergence/bf16/test_mini_models.py -k "{model_type}" -xvs
+  pytest test/convergence/bf16/test_mini_models_with_logits.py -k "{model_type}" -xvs
+  pytest test/convergence/fp32/test_mini_models.py -k "{model_type}" -xvs
+  pytest test/convergence/fp32/test_mini_models_with_logits.py -k "{model_type}" -xvs
+  ```
+  For VL (multimodal) models, also run:
+  ```bash
+  pytest test/convergence/bf16/test_mini_models_multimodal.py -k "{model_type}" -xvs
+  pytest test/convergence/fp32/test_mini_models_multimodal.py -k "{model_type}" -xvs
+  ```
+
+**R7. Update README.md.** If the change adds a visibly new capability to the model's
+  row in the patching table (e.g., a new operation), update the supported operations list.
+
+### Common Modification Patterns
+
+**Adding an activation kernel (e.g., relu_squared for nemotron):**
+- Import the Liger kernel class at the top of `monkey_patch.py`
+- Add bool parameter to apply function signature
+- Class-level: replace in `ACT2FN` dict or replace the MLP class
+- Instance-level: patch each `decoder_layer`'s activation/MLP
+- Test: `assert isinstance` checks on the activation/MLP
+
+**Adding a norm variant:**
+- Add bool parameter to apply function
+- Class-level: replace the norm class
+- Instance-level: use `_patch_rms_norm_module` or `_patch_layer_norm_module` on all norm attrs
+- Test: `assert isinstance` checks on norm modules
+
+**Fixing missing instance patching:**
+- Read the class-level patching to see what's patched
+- Add corresponding instance-level patches in the `if model is not None` block
+- Test: add assertions that were missing
+
+**Updating for upstream HF changes:**
+- Compare the current HF modeling file against what the patch assumes
+- Update class names, attribute names, forward signatures as needed
+- May require updating `lce_forward` if the base model's forward changed
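
Rules R1 and R2 are easier to see in a toy example. The sketch below uses stand-in classes only: `ReLUSquared`, `FusedReLUSquared`, `modeling_toy`, `ToyMLP`, `ToyModel`, and `apply_kernel_to_toy` are all hypothetical and do not mirror the real `monkey_patch.py` or any HF modeling file. It assumes the modeling code looks its activation class up through a module-like namespace at construction time, which is why class-level patching alone leaves already-built instances untouched.

```python
import types


class ReLUSquared:
    """Stand-in for an activation defined in an upstream modeling file."""

    def __call__(self, x: float) -> float:
        return max(x, 0.0) ** 2


class FusedReLUSquared:
    """Stand-in for a Liger kernel class with the same call contract."""

    def __call__(self, x: float) -> float:
        clipped = x if x > 0.0 else 0.0
        return clipped * clipped


# Toy "modeling module": classes are resolved through this namespace when a model is built.
modeling_toy = types.SimpleNamespace(ReLUSquared=ReLUSquared)


class ToyMLP:
    def __init__(self) -> None:
        self.act_fn = modeling_toy.ReLUSquared()


class ToyModel:
    def __init__(self, num_layers: int = 2) -> None:
        self.layers = [ToyMLP() for _ in range(num_layers)]


def apply_kernel_to_toy(relu_squared: bool = True, model=None) -> None:
    """R2: the new kernel gets its own bool parameter with a default."""
    if relu_squared:
        # R1, class level: models constructed after this call pick up the new class.
        modeling_toy.ReLUSquared = FusedReLUSquared

    if model is not None:
        # R1, instance level: an already-constructed model must be patched in place.
        if relu_squared:
            for layer in model.layers:
                layer.act_fn = FusedReLUSquared()


if __name__ == "__main__":
    existing = ToyModel()
    apply_kernel_to_toy(model=existing)
    fresh = ToyModel()
    print(type(existing.layers[0].act_fn).__name__, type(fresh.layers[0].act_fn).__name__)
```

Dropping the `if model is not None` branch would leave `existing` on the original activation while `fresh` uses the replacement, which is the omission R1 warns about. The R4 test pattern follows the same shape: assert the original class before the call and the replacement class after it.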