A vLLM plugin for per-layer ablation studies.
It dynamically generates ablated subclasses of vLLM models, supporting ablation of individual attention and MLP layers of any vLLM model.
To install an editable version of this plugin to your environment (and add new models), first clone the repo:

    git clone https://github.com/maxzuo/ablated-vllm-plugin

and install with pip:

    cd ablated-vllm-plugin
    pip install -e .

To install directly with pip:

    pip install ablated-vllm-plugin

To run an ablated model, start by downloading a Hugging Face model to a local directory:

    hf download Qwen/Qwen3.5-9B --local-dir ./ablated-qwen3.5-9b

Then, set `ablated_attention_layers` and/or `ablated_mlp_layers` in the model's `config.json`, and add the "Ablated" prefix to the model architecture.
Values are dicts mapping layer index to an implementation:

    {
      "architectures": ["AblatedQwen3_5ForConditionalGeneration"],
      "ablated_attention_layers": {"10": "zero"},
      "ablated_mlp_layers": {"5": "identity"}
    }

| Value | Effect |
|---|---|
| `"zero"` | Replaces the layer output with zeros |
| `"identity"` | Passes hidden states through unchanged (skips the layer) |

No changes are needed to run this model with vLLM.
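
The config edit described above can also be scripted. The following is a minimal sketch and not part of the plugin: the helper names `ablate_config` and `ablate_config_file` are hypothetical, and the sketch assumes that `"zero"` and `"identity"` are the only accepted values and that layer indices are stored as string keys, as in the example config.

```python
import json
from pathlib import Path

# Values documented in the table above; assumed here to be the full set.
ALLOWED = {"zero", "identity"}


def ablate_config(config, attention=None, mlp=None):
    """Return a copy of a config dict with ablation settings applied.

    `attention` and `mlp` map layer index -> "zero" or "identity".
    Hypothetical helper for illustration; not provided by the plugin.
    """
    updated = dict(config)
    for key, layers in (("ablated_attention_layers", attention),
                        ("ablated_mlp_layers", mlp)):
        if not layers:
            continue
        for index, impl in layers.items():
            if impl not in ALLOWED:
                raise ValueError(f"layer {index}: unknown implementation {impl!r}")
        # JSON object keys are strings, so normalise integer indices.
        updated[key] = {str(int(i)): impl for i, impl in layers.items()}
    # Add the "Ablated" prefix once to each architecture name.
    updated["architectures"] = [
        name if name.startswith("Ablated") else f"Ablated{name}"
        for name in config.get("architectures", [])
    ]
    return updated


def ablate_config_file(path, attention=None, mlp=None):
    """Rewrite a config.json in place with the ablation settings."""
    path = Path(path)
    config = json.loads(path.read_text())
    path.write_text(json.dumps(ablate_config(config, attention, mlp), indent=2))
```

For the example above, this would be called as `ablate_config_file("./ablated-qwen3.5-9b/config.json", attention={10: "zero"}, mlp={5: "identity"})`.
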
To add support for a new model, add an entry to `ablated_vllm/models.json`:

    {
      "MyModelForCausalLM": {
        "attention_classes": ["MyAttentionClass"],
        "mlp_classes": ["MyMLPClass"],
        "decoder_class": "MyDecoderLayer",
        "module_path": "vllm.model_executor.models.my_model"
      }
    }

No other code changes are needed.
The plugin generates AblatedMyModelForCausalLM dynamically and automatically registers it with vLLM.
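
To illustrate what dynamic generation can look like, here is a minimal sketch that builds an `Ablated`-prefixed subclass from a registry entry with Python's built-in `type()`. It is not the plugin's implementation: `make_ablated_class`, `resolve_class`, the `ablation_spec` attribute, and the stand-in `MyModelForCausalLM` class are all hypothetical, and the sketch does not replace any attention or MLP modules or register anything with vLLM.

```python
import importlib


class MyModelForCausalLM:
    """Stand-in for a vLLM model class (hypothetical, for illustration)."""

    def __init__(self, config):
        self.config = config


def resolve_class(module_path, class_name):
    """Import `module_path` and return the attribute named `class_name`."""
    return getattr(importlib.import_module(module_path), class_name)


def make_ablated_class(base_cls, entry):
    """Create an `Ablated<Base>` subclass that carries its registry entry.

    `entry` mirrors one value in models.json. How the real plugin uses
    these fields is not shown; this only demonstrates the
    dynamic-subclass pattern.
    """
    name = f"Ablated{base_cls.__name__}"
    # type(name, bases, namespace) builds a new class at runtime.
    return type(name, (base_cls,), {"ablation_spec": dict(entry)})


entry = {
    "attention_classes": ["MyAttentionClass"],
    "mlp_classes": ["MyMLPClass"],
    "decoder_class": "MyDecoderLayer",
    "module_path": "vllm.model_executor.models.my_model",
}

# With vLLM installed, the base class could instead be looked up via
# resolve_class(entry["module_path"], "MyModelForCausalLM"); a stand-in
# is used here so the sketch runs without vLLM.
AblatedModel = make_ablated_class(MyModelForCausalLM, entry)
print(AblatedModel.__name__)  # prints: AblatedMyModelForCausalLM
```

Because the generated class is an ordinary subclass, it inherits everything from the original model and only differs in its name and the attached ablation metadata.
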