63 changes: 34 additions & 29 deletions README.md
@@ -1,10 +1,10 @@
<div align="center">

![*Open*Unlearning](assets/banner.png)
![OpenUnlearning](assets/banner.png)

<h3><strong>An easily extensible framework unifying LLM unlearning evaluation benchmarks.</strong></h3>

<div style="display: flex; gap: 10px; justify-content: center; align-items: center;">
<a href="https://arxiv.org/abs/2506.12618"><img src="https://img.shields.io/badge/arXiv-Report-b31b1b?logo=arxiv&logoColor=white" alt="arXiv Paper"/></a>
<a href="https://github.com/locuslab/open-unlearning"><img src="https://img.shields.io/github/stars/locuslab/open-unlearning?style=social" alt="GitHub Repo stars"/></a>
<a href="https://github.com/locuslab/open-unlearning/actions"><img src="https://github.com/locuslab/open-unlearning/actions/workflows/tests.yml/badge.svg" alt="Build Status"/></a>
@@ -21,7 +21,6 @@

We provide efficient and streamlined implementations of the TOFU, MUSE and WMDP unlearning benchmarks while supporting 12+ unlearning methods, 5+ datasets, 10+ evaluation metrics, and 7+ LLM architectures. Each of these can be easily extended to incorporate more variants.


We invite the LLM unlearning community to collaborate by adding new benchmarks, unlearning methods, datasets and evaluation metrics here to expand OpenUnlearning's features, gain feedback from wider usage and drive progress in the field.

---
@@ -37,14 +36,14 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,
🚨 Our paper `OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics` is now out on [arXiv](https://arxiv.org/abs/2506.12618).

🌟 **Highlights:**

- A detailed technical report on OpenUnlearning covering the design, features, and implementation.
- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ models, open-sourced on HuggingFace 🤗: [TOFU Models w & w/o Knowledge](https://huggingface.co/collections/open-unlearning/tofu-models-w-and-w-o-knowledge-6861e4d935eb99ba162e55cd), [TOFU Unlearned Models](https://huggingface.co/collections/open-unlearning/tofu-unlearned-models-6860f6cf3fe35d0223d92e88).
- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ models, open-sourced on HuggingFace 🤗: [TOFU Models w &amp; w/o Knowledge](https://huggingface.co/collections/open-unlearning/tofu-models-w-and-w-o-knowledge-6861e4d935eb99ba162e55cd), [TOFU Unlearned Models](https://huggingface.co/collections/open-unlearning/tofu-unlearned-models-6860f6cf3fe35d0223d92e88).
- Results benchmarking 8 diverse unlearning methods in one place using 10 evaluation metrics on TOFU.

<details>
<summary><b>Older Updates</b></summary>


#### [May 19, 2025]

- **More Methods!** Added support for unlearning methods [UNDIAL](https://aclanthology.org/2025.naacl-long.444/) and [AltPO](https://aclanthology.org/2025.coling-main.252/).
@@ -55,19 +54,23 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,
- **More evaluations!** The [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) toolkit has been integrated into OpenUnlearning, enabling WMDP evaluations and support for popular general LLM benchmarks, including MMLU, GSM8K, and others.

#### [Apr 6, 2025]

- **More Metrics!** Added 6 Membership Inference Attacks (MIA) (LOSS, ZLib, Reference, GradNorm, MinK, and MinK++), along with Extraction Strength (ES) and Exact Memorization (EM) as additional evaluation metrics.
- **More TOFU Evaluations!** Now includes a holdout set and supports MIA attack-based evaluation. You can now compute MUSE's privleak on TOFU.
- **More Documentation!** [`docs/links.md`](docs/links.md) contains resources for each of the implemented features and other useful LLM unlearning resources.

Be sure to run `python setup_data.py` immediately after merging the latest version. This is required to refresh the downloaded eval log files and ensure they're compatible with the latest evaluation metrics.

#### [Mar 27, 2025]

- **More Documentation: easy contributions and the leaderboard functionality**: We've updated the documentation to make contributing new unlearning methods and benchmarks much easier. Users can document additions better and also update a leaderboard with their results. See [this section](#-how-to-contribute) for details.

#### [Mar 9, 2025]

- **More Methods!** Added support for [RMU](https://arxiv.org/abs/2403.03218) (representation-engineering based unlearning).

#### [Feb 27, 2025]

⚠️ **Repository Update**: This repo replaces the original TOFU codebase at [`github.com/locuslab/tofu`](https://github.com/locuslab/tofu), which is no longer maintained.

</details>
@@ -78,18 +81,18 @@ Be sure to run `python setup_data.py` immediately after merging the latest versi

We provide several variants for each of the components in the unlearning pipeline.

| **Component** | **Available Options** |
|------------------------|----------------------|
| **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://www.wmdp.ai/) |
| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU, UNDIAL, AltPO, SatImp, WGA, CE-U, PDU |
| **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks, [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
| **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP-Bio, WMDP-Cyber |
| **Model Families** | TOFU: Llama-3.2, Llama-3.1, Llama-2; MUSE: Llama-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr |
| **Component** | **Available Options** |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://www.wmdp.ai/) |
| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU, UNDIAL, AltPO, SatImp, WGA, CE-U, PDU |
| **Evaluation Metrics**       | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks, [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
| **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP-Bio, WMDP-Cyber |
| **Model Families** | TOFU: Llama-3.2, Llama-3.1, Llama-2; MUSE: Llama-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr |

---


## 📌 Table of Contents

- 📖 [Overview](#-overview)
- 📢 [Updates](#-updates)
- 🗃️ [Available Components](#%EF%B8%8F-available-components)
@@ -101,7 +104,7 @@ We provide several variants for each of the components in the unlearning pipelin
- 📜 [Running Baseline Experiments](#-running-baseline-experiments)
- ➕ [How to Contribute](#-how-to-contribute)
- 📚 [Further Documentation](#-further-documentation)
- 🔗 [Support & Contributors](#-support--contributors)
- 🔗 [Support &amp; Contributors](#-support--contributors)
- 📝 [Citing this work](#-citing-this-work)
- 🤝 [Acknowledgements](#-acknowledgements)
- 📄 [License](#-license)
@@ -129,7 +132,7 @@ python setup_data.py --eval # saves/eval now contains evaluation results of the

### 🔄 Updated TOFU benchmark

We've updated OpenUnlearning's TOFU benchmark target models to use a wider variety of newer architectures, with sizes ranging from 1B to 8B. These include Llama 3.2 1B, Llama 3.2 3B, Llama 3.1 8B, and the original Llama-2 7B (re-created) target models from [the old version of TOFU](https://github.com/locuslab/tofu).

For each architecture, we finetuned on four different splits of the TOFU dataset (`full`, `retain90`, `retain95`, `retain99`), for a total of 16 finetuned models. The `full` model serves as the target (the base model for unlearning), and the `retain*` models serve as references against which unlearning performance is measured for the corresponding forget split. These models are on [HuggingFace](https://huggingface.co/collections/open-unlearning/tofu-new-models-67bcf636334ea81727573a9f0), and their paths can be set in the experiment configs or through command-line overrides.
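A minimal Python sketch of the target/retain bookkeeping described above, assuming TOFU's usual forget-split names (`forget01`, `forget05`, `forget10`) and that each forget split pairs with the retain split covering the remaining percentage of the data. The architecture labels and both helper functions are hypothetical illustrations, not model IDs or APIs from this repository.

```python
from typing import List, Tuple

# Hypothetical labels for the four architectures named above (not HuggingFace model IDs).
ARCHITECTURES = ["Llama-3.2-1B", "Llama-3.2-3B", "Llama-3.1-8B", "Llama-2-7B"]
# The four finetuning splits: one target (`full`) plus three retain models.
SPLITS = ["full", "retain90", "retain95", "retain99"]


def retain_split_for(forget_split: str) -> str:
    """Map a forget split such as 'forget05' to its retain split, 'retain95'."""
    prefix = "forget"
    if not forget_split.startswith(prefix):
        raise ValueError(f"not a forget split: {forget_split!r}")
    forget_pct = int(forget_split[len(prefix):])  # 'forget05' -> 5
    return f"retain{100 - forget_pct}"


def model_grid() -> List[Tuple[str, str]]:
    """Every (architecture, split) pair that gets its own finetuned model."""
    return [(arch, split) for arch in ARCHITECTURES for split in SPLITS]


if __name__ == "__main__":
    for forget in ["forget01", "forget05", "forget10"]:
        print(forget, "->", retain_split_for(forget))
    print(len(model_grid()), "finetuned models")
```

The pairing is what makes a retain model a fair reference: it has never seen the data the unlearned model is supposed to forget.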

@@ -172,8 +175,8 @@ python src/eval.py --config-name=eval.yaml experiment=eval/tofu/default \

For more details about creating and running evaluations, refer to [`docs/evaluation.md`](docs/evaluation.md).


### 📜 Running Baseline Experiments

The scripts below execute standard baseline unlearning experiments on the TOFU and MUSE datasets, evaluated using their corresponding benchmarks. The expected results for these are in [`docs/repro.md`](docs/repro.md).

```bash
@@ -189,20 +192,20 @@ The above scripts are not tuned and uses default hyper parameter settings. We en

If you are interested in contributing to our work, please have a look at the [`contributing.md`](docs/contributing.md) guide.


## 📚 Further Documentation

For more in-depth information on specific aspects of the framework, refer to the following documents:

| **Documentation** | **Contains** |
|------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|
| [`docs/contributing.md`](docs/contributing.md) | Instructions on how to add new methods, benchmarks, and components such as trainers, metrics, models, datasets, etc. |
| [`docs/evaluation.md`](docs/evaluation.md) | Detailed instructions on creating and running evaluation metrics and benchmarks. |
| [`docs/experiments.md`](docs/experiments.md) | Guide on running experiments in various configurations and settings, including distributed training, fine-tuning, and overriding arguments. |
| [`docs/hydra.md`](docs/hydra.md)               | A short tutorial on Hydra features; Hydra is the configuration management package we use extensively.             |
| [`community/leaderboard.md`](community/leaderboard.md) | Reference results from various unlearning methods run using this framework on TOFU and MUSE benchmarks. |
| [`docs/links.md`](docs/links.md) | List of all links to the research papers or other sources the implemented features are sourced from. |
| [`docs/repro.md`](docs/repro.md) | Results are provided solely for reproducibility purposes, without any parameter tuning. |
| **Documentation** | **Contains** |
| ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| [`docs/contributing.md`](docs/contributing.md)        | Instructions on how to add new methods, benchmarks, and components such as trainers, metrics, models, datasets, etc.                        |
| [`docs/evaluation.md`](docs/evaluation.md) | Detailed instructions on creating and running evaluation metrics and benchmarks. |
| [`docs/experiments.md`](docs/experiments.md) | Guide on running experiments in various configurations and settings, including distributed training, fine-tuning, and overriding arguments. |
| [`docs/hydra.md`](docs/hydra.md)                      | A short tutorial on Hydra features; Hydra is the configuration management package we use extensively.                                      |
| [`community/leaderboard.md`](community/leaderboard.md) | Reference results from various unlearning methods run using this framework on TOFU and MUSE benchmarks. |
| [`docs/links.md`](docs/links.md) | List of all links to the research papers or other sources the implemented features are sourced from. |
| [`docs/repro.md`](docs/repro.md) | Results are provided solely for reproducibility purposes, without any parameter tuning. |

---

## 🔗 Support & Contributors
@@ -239,18 +242,20 @@ If you use OpenUnlearning in your research, please make sure to cite our OpenUnl
url={https://arxiv.org/abs/2407.06460}
}
```

</details>

---

### 🤝 Acknowledgements

- This repo is inspired by [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
- The [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/swj0419/muse_bench) benchmarks served as the foundation for our re-implementation.

---

### 📄 License

This project is licensed under the MIT License. See the [`LICENSE`](LICENSE) file for details.

---
1 change: 1 addition & 0 deletions configs/eval/tofu.yaml
@@ -11,6 +11,7 @@ defaults: # include all defined metrics files
    - model_utility # populated in the metrics key as metrics.model_utility
    - privleak
    - extraction_strength
    - retain_extraction_strength
    # - exact_memorization
    # - mia_min_k_plus_plus
    # - mia_min_k
15 changes: 15 additions & 0 deletions configs/eval/tofu_metrics/retain_extraction_strength.yaml
@@ -0,0 +1,15 @@
# @package eval.tofu.metrics.retain_extraction_strength
defaults:
  - ../../data/datasets@datasets: TOFU_QA_retain_eval
  - ../../collator@collators: DataCollatorForSupervisedDatasetwithIndex
  # ^ get default dataset and generation config information

handler: retain_extraction_strength
batch_size: ${eval.tofu.batch_size}

datasets:
  TOFU_QA_retain_eval:
    args:
      hf_args:
        name: "retain_perturbed"
      question_key: ${eval.tofu.question_key}
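The `datasets:` block in this file overrides only part of the default `TOFU_QA_retain_eval` entry pulled in through `defaults`. Conceptually, Hydra/OmegaConf compose the two with a recursive merge in which, by default in recent Hydra versions, the file's own keys win. The Python sketch below mimics that with plain dicts. The contents of `default_entry` are hypothetical placeholders (the real dataset config is not shown in this diff), the nesting of `question_key` under `args` is an assumption, and real composition also resolves `${...}` interpolations, which this sketch leaves as plain strings.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` merged in recursively; override wins on leaves."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical default entry; the real TOFU_QA_retain_eval config may differ.
default_entry = {
    "TOFU_QA_retain_eval": {
        "handler": "QADataset",
        "args": {
            "hf_args": {"path": "locuslab/TOFU", "name": "placeholder_subset", "split": "train"},
            "question_key": "question",
        },
    }
}

# The `datasets:` block from retain_extraction_strength.yaml, written as a dict.
metric_override = {
    "TOFU_QA_retain_eval": {
        "args": {
            "hf_args": {"name": "retain_perturbed"},
            "question_key": "${eval.tofu.question_key}",
        }
    }
}

composed = deep_merge(default_entry, metric_override)
print(composed["TOFU_QA_retain_eval"]["args"]["hf_args"])
```

Only `hf_args.name` and `question_key` change; every other default key (here the placeholder `path`, `split`, and `handler`) survives the merge untouched.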
8 changes: 4 additions & 4 deletions configs/trainer/SatImp.yaml
@@ -8,8 +8,8 @@ args: # HuggingFace TrainingArguments
  num_train_epochs: 5

method_args:
  beta1: 5.0
  beta2: 1.0
  alpha: 1.0
  gamma: 0.1
  beta1: 4.0
  beta2: 0.1
  alpha: 0.1
  gamma: 1.0
  retain_loss_type: NLL
2 changes: 2 additions & 0 deletions src/evals/metrics/__init__.py
@@ -7,6 +7,7 @@
    rouge,
    truth_ratio,
    extraction_strength,
    retain_extraction_strength,
    exact_memorization,
)
from evals.metrics.privacy import ks_test, privleak, rel_diff
@@ -62,6 +63,7 @@ def get_metrics(metric_cfgs: DictConfig, **kwargs):
_register_metric(rel_diff)
_register_metric(exact_memorization)
_register_metric(extraction_strength)
_register_metric(retain_extraction_strength)

# Register MIA metrics
_register_metric(mia_loss)
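Registering the function here is presumably what lets the YAML's `handler: retain_extraction_strength` resolve to it by name. A minimal sketch of that decorator-plus-registry pattern follows; `unlearning_metric`, `register_metric`, `get_metric`, and `METRICS_REGISTRY` below are simplified stand-ins written for illustration, and the repository's own decorator and `_register_metric` very likely store richer objects than a bare function.

```python
from typing import Any, Callable, Dict

MetricFn = Callable[..., Dict[str, Any]]

# Hypothetical stand-in for the repository's metric registry.
METRICS_REGISTRY: Dict[str, MetricFn] = {}


def unlearning_metric(name: str):
    """Decorator that tags a metric function with the name configs refer to."""
    def wrap(fn: MetricFn) -> MetricFn:
        fn.metric_name = name  # type: ignore[attr-defined]
        return fn
    return wrap


def register_metric(metric: MetricFn) -> None:
    """Add a tagged metric to the registry, refusing duplicate names."""
    name = metric.metric_name  # type: ignore[attr-defined]
    if name in METRICS_REGISTRY:
        raise ValueError(f"metric {name!r} is already registered")
    METRICS_REGISTRY[name] = metric


def get_metric(handler: str) -> MetricFn:
    """Resolve a config `handler:` string to its registered metric."""
    try:
        return METRICS_REGISTRY[handler]
    except KeyError:
        raise KeyError(f"unknown metric handler {handler!r}; was it registered?") from None


@unlearning_metric(name="retain_extraction_strength")
def dummy_retain_es(model, **kwargs):
    # Placeholder body; the real metric runs the model over a dataloader.
    return {"agg_value": 0.0, "value_by_index": {}}


register_metric(dummy_retain_es)
print(get_metric("retain_extraction_strength")(None))
```

Forgetting the registration call is the failure mode this pattern guards against: the YAML would name a handler that the lookup cannot find.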
50 changes: 50 additions & 0 deletions src/evals/metrics/memorization.py
@@ -267,3 +267,53 @@ def _extraction_strength(model, batch):
    )
    es_values = aggregate_to_1D(es_values)
    return {"agg_value": np.mean(es_values), "value_by_index": scores_by_index}


@unlearning_metric(name="retain_extraction_strength")
def retain_extraction_strength(model, **kwargs):
    data = kwargs["data"]
    collator = kwargs["collators"]
    batch_size = kwargs["batch_size"]
    dataloader = DataLoader(data, batch_size=batch_size, collate_fn=collator)

    def _extraction_strength(model, batch):
        log_probs_batch, labels_batch = tokenwise_vocab_logprobs(
            model, batch, grad=False, return_labels=True
        )
        es_batch = []
        for log_probs, labels in zip(log_probs_batch, labels_batch):
            valid_len = len(labels)
            preds = torch.argmax(log_probs, dim=-1)
            for k in range(valid_len):
                suff_preds = preds[k:]
                suff_labels = labels[k:]
                if torch.equal(suff_preds, suff_labels):
                    break
            if valid_len == 0:
                # Rarely, tokenization can result in a mismatch with no valid target
                # tokens for loss computation (see preprocess_chat_instance() for
                # reference). Since this condition makes no sense in terms of
                # computing ES, we just choose to set ES=None
                logger.warning(
                    "ES score for an instance is marked None, due to "
                    "tokenization issues that resulted in no valid target tokens."
                )
                es_batch.append({"score": None})
            else:
                es_score = 1 - (k / valid_len)
                es_batch.append({"score": es_score})
        return es_batch

    fun_args = {}
    scores_by_index = run_batchwise_evals(
        model, dataloader, _extraction_strength, fun_args, "Calculating ES"
    )
    es_values = np.array(
        [
            evals["score"]
            for evals in scores_by_index.values()
            if evals["score"] is not None
        ]
    )
    es_values = aggregate_to_1D(es_values)
    return {"agg_value": np.mean(es_values), "value_by_index": scores_by_index}
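For checking the scoring logic without a model, here is a framework-free Python sketch of the per-example computation in `_extraction_strength`, assuming `preds[i]` is the model's greedy (argmax) prediction for target token `labels[i]`. With n target tokens, ES = 1 - k/n, where k is the first index whose suffix `preds[k:]` exactly matches `labels[k:]`. As in the torch loop, if no suffix matches, k stops at n - 1, so a fully wrong prediction still scores 1/n rather than 0. Returning `None` for empty targets follows the code comment and the `is not None` filter in the aggregation. The function names are illustrative, not part of the repository.

```python
from typing import List, Optional, Sequence


def extraction_strength(preds: Sequence[int], labels: Sequence[int]) -> Optional[float]:
    """Per-example ES = 1 - k/n; None when there are no target tokens."""
    n = len(labels)
    if n == 0:
        return None  # no valid target tokens: ES is undefined
    k = 0
    for k in range(n):
        # Stop at the first suffix the model reproduces exactly.
        if list(preds[k:]) == list(labels[k:]):
            break
    # If nothing matched, the loop leaves k == n - 1, mirroring the torch code.
    return 1 - k / n


def mean_es(scores: List[Optional[float]]) -> float:
    """Average the per-example scores, skipping undefined (None) entries."""
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    labels = [11, 12, 13, 14]
    print(extraction_strength([11, 12, 13, 14], labels))  # → 1.0 (whole answer reproduced)
    print(extraction_strength([99, 99, 13, 14], labels))  # → 0.5 (last two tokens only)
```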
Binary file added src/trainer/unlearn/.DS_Store
Binary file not shown.
2 changes: 1 addition & 1 deletion src/trainer/unlearn/satimp.py
@@ -4,7 +4,7 @@

class SatImp(GradDiff):
    def __init__(
        self, beta1=5.0, beta2=1.0, gamma=1.0, alpha=0.1, *args, **kwargs
        self, beta1=4.0, beta2=0.1, gamma=1.0, alpha=0.1, *args, **kwargs
    ): # attention, satimp requires two beta!!!!
        super().__init__(*args, **kwargs)
        self.beta1 = beta1