
@@ -21,7 +21,6 @@
We provide efficient and streamlined implementations of the TOFU, MUSE and WMDP unlearning benchmarks while supporting 12+ unlearning methods, 5+ datasets, 10+ evaluation metrics, and 7+ LLM architectures. Each of these can be easily extended to incorporate more variants.
-
We invite the LLM unlearning community to collaborate by adding new benchmarks, unlearning methods, datasets and evaluation metrics here to expand OpenUnlearning's features, gain feedback from wider usage and drive progress in the field.
---
@@ -37,14 +36,14 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,
🚨 Our paper `OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics` is now out on [arXiv](https://arxiv.org/abs/2506.12618).
🌟 **Highlights:**
+
- A detailed technical report on OpenUnlearning covering the design, features, and implementation.
-- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ models, open-sourced on HuggingFace 🤗: [TOFU Models w & w/o Knowledge](https://huggingface.co/collections/open-unlearning/tofu-models-w-and-w-o-knowledge-6861e4d935eb99ba162e55cd), [TOFU Unlearned Models](https://huggingface.co/collections/open-unlearning/tofu-unlearned-models-6860f6cf3fe35d0223d92e88).
+- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ models, open-sourced on HuggingFace 🤗: [TOFU Models w & w/o Knowledge](https://huggingface.co/collections/open-unlearning/tofu-models-w-and-w-o-knowledge-6861e4d935eb99ba162e55cd), [TOFU Unlearned Models](https://huggingface.co/collections/open-unlearning/tofu-unlearned-models-6860f6cf3fe35d0223d92e88).
- Results benchmarking 8 diverse unlearning methods in one place using 10 evaluation metrics on TOFU.
Older Updates
-
#### [May 19, 2025]
- **More Methods!** Added support for unlearning methods [UNDIAL](https://aclanthology.org/2025.naacl-long.444/) and [AltPO](https://aclanthology.org/2025.coling-main.252/).
@@ -55,6 +54,7 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,
- **More evaluations!** The [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) toolkit has been integrated into OpenUnlearning, enabling WMDP evaluations and support for popular general LLM benchmarks, including MMLU, GSM8K, and others.
#### [Apr 6, 2025]
+
- **More Metrics!** Added 6 Membership Inference Attacks (MIA) (LOSS, ZLib, Reference, GradNorm, MinK, and MinK++), along with Extraction Strength (ES) and Exact Memorization (EM) as additional evaluation metrics.
- **More TOFU Evaluations!** Now includes a holdout set and supports MIA attack-based evaluation. You can now compute MUSE's privleak on TOFU.
- **More Documentation!** [`docs/links.md`](docs/links.md) contains resources for each of the implemented features and other useful LLM unlearning resources.
@@ -62,12 +62,15 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,
Be sure to run `python setup_data.py` immediately after merging the latest version. This is required to refresh the downloaded eval log files and ensure they're compatible with the latest evaluation metrics.
#### [Mar 27, 2025]
+
- **More Documentation: easy contributions and the leaderboard functionality**: We've updated the documentation to make contributing new unlearning methods and benchmarks much easier. Users can document additions better and also update a leaderboard with their results. See [this section](#-how-to-contribute) for details.
#### [Mar 9, 2025]
+
- **More Methods!** Added support for [RMU](https://arxiv.org/abs/2403.03218) (representation-engineering based unlearning).
-#### [Feb 27, 2025]
+#### [Feb 27, 2025]
+
⚠️ **Repository Update**: This repo replaces the original TOFU codebase at [`github.com/locuslab/tofu`](https://github.com/locuslab/tofu), which is no longer maintained.
@@ -78,18 +81,18 @@ Be sure to run `python setup_data.py` immediately after merging the latest versi
We provide several variants for each of the components in the unlearning pipeline.
-| **Component** | **Available Options** |
-|------------------------|----------------------|
-| **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://www.wmdp.ai/) |
-| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU, UNDIAL, AltPO, SatImp, WGA, CE-U, PDU |
-| **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks, [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
-| **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP-Bio, WMDP-Cyber |
-| **Model Families** | TOFU: Llama-3.2, Llama-3.1, Llama-2; MUSE: Llama-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr |
+| **Component** | **Available Options** |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://www.wmdp.ai/) |
+| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU, UNDIAL, AltPO, SatImp, WGA, CE-U, PDU |
+| **Evaluation Metrics**       | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks, [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
+| **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP-Bio, WMDP-Cyber |
+| **Model Families** | TOFU: Llama-3.2, Llama-3.1, Llama-2; MUSE: Llama-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr |
---
-
## 📌 Table of Contents
+
- 📖 [Overview](#-overview)
- 📢 [Updates](#-updates)
- 🗃️ [Available Components](#%EF%B8%8F-available-components)
@@ -101,7 +104,7 @@ We provide several variants for each of the components in the unlearning pipelin
- 📜 [Running Baseline Experiments](#-running-baseline-experiments)
- ➕ [How to Contribute](#-how-to-contribute)
- 📚 [Further Documentation](#-further-documentation)
-- 🔗 [Support & Contributors](#-support--contributors)
+- 🔗 [Support & Contributors](#-support--contributors)
- 📝 [Citing this work](#-citing-this-work)
- 🤝 [Acknowledgements](#-acknowledgements)
- 📄 [License](#-license)
@@ -129,7 +132,7 @@ python setup_data.py --eval # saves/eval now contains evaluation results of the
### 🔄 Updated TOFU benchmark
-We've updated Open-Unlearning's TOFU benchmark target models to use a wider variety of newer architectures with sizes varying from 1B to 8B. These include Llama 3.2 1B, Llama 3.2 3B, Llama 3.1 8B, and the original Llama-2 7B (re-created) target models from [the old version of TOFU](github.com/locuslab/tofu).
+We've updated OpenUnlearning's TOFU benchmark target models to use a wider variety of newer architectures, with sizes ranging from 1B to 8B. These include Llama 3.2 1B, Llama 3.2 3B, Llama 3.1 8B, and the original Llama-2 7B (re-created) target models from [the old version of TOFU](https://github.com/locuslab/tofu).
For each architecture, we have finetuned with four different splits of the TOFU datasets: `full`, `retain90`, `retain95`, `retain99`, for a total of 16 finetuned models. The first serves as the target (base model for unlearning) and the rest are retain models used to measure performance against for each forget split. These models are on [HuggingFace](`https://huggingface.co/collections/open-unlearning/tofu-new-models-67bcf636334ea81727573a9f0`) and the paths to these models can be set in the experimental configs or in command-line overrides.
@@ -172,8 +175,8 @@ python src/eval.py --config-name=eval.yaml experiment=eval/tofu/default \
For more details about creating and running evaluations, refer [`docs/evaluation.md`](docs/evaluation.md).
-
### 📜 Running Baseline Experiments
+
The scripts below execute standard baseline unlearning experiments on the TOFU and MUSE datasets, evaluated using their corresponding benchmarks. The expected results for these are in [`docs/repro.md`](docs/repro.md).
```bash
@@ -189,20 +192,20 @@ The above scripts are not tuned and uses default hyper parameter settings. We en
If you are interested in contributing to our work, please have a look at [`contributing.md`](docs/contributing.md) guide.
-
## 📚 Further Documentation
For more in-depth information on specific aspects of the framework, refer to the following documents:
-| **Documentation** | **Contains** |
-|------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|
-| [`docs/contributing.md`](docs/contributing.md) | Instructions on how to add new methods, benchmarks, components such as trainers, benchmarks, metrics, models, datasets, etc. |
-| [`docs/evaluation.md`](docs/evaluation.md) | Detailed instructions on creating and running evaluation metrics and benchmarks. |
-| [`docs/experiments.md`](docs/experiments.md) | Guide on running experiments in various configurations and settings, including distributed training, fine-tuning, and overriding arguments. |
-| [`docs/hydra.md`](docs/hydra.md) | A short tutorial on Hydra features, Hydra is the configuration management package we use extensively. |
-| [`community/leaderboard.md`](community/leaderboard.md) | Reference results from various unlearning methods run using this framework on TOFU and MUSE benchmarks. |
-| [`docs/links.md`](docs/links.md) | List of all links to the research papers or other sources the implemented features are sourced from. |
-| [`docs/repro.md`](docs/repro.md) | Results are provided solely for reproducibility purposes, without any parameter tuning. |
+| **Documentation** | **Contains** |
+| ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| [`docs/contributing.md`](docs/contributing.md)        | Instructions on adding new components such as unlearning methods, trainers, benchmarks, metrics, models, and datasets.                      |
+| [`docs/evaluation.md`](docs/evaluation.md) | Detailed instructions on creating and running evaluation metrics and benchmarks. |
+| [`docs/experiments.md`](docs/experiments.md) | Guide on running experiments in various configurations and settings, including distributed training, fine-tuning, and overriding arguments. |
+| [`docs/hydra.md`](docs/hydra.md)                      | A short tutorial on Hydra features. Hydra is the configuration management package we use extensively.                                      |
+| [`community/leaderboard.md`](community/leaderboard.md) | Reference results from various unlearning methods run using this framework on TOFU and MUSE benchmarks. |
+| [`docs/links.md`](docs/links.md)                      | Links to the research papers and other sources on which the implemented features are based.                                                |
+| [`docs/repro.md`](docs/repro.md)                      | Reference results provided solely for reproducibility, obtained without any hyperparameter tuning.                                          |
+
---
## 🔗 Support & Contributors
@@ -239,18 +242,20 @@ If you use OpenUnlearning in your research, please make sure to cite our OpenUnl
url={https://arxiv.org/abs/2407.06460}
}
```
+
---
### 🤝 Acknowledgements
-- This repo is inspired from [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
-- The [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/swj0419/muse_bench) benchmarks served as the foundation for our re-implementation.
+- This repo is inspired by [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
+- The [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/swj0419/muse_bench) benchmarks served as the foundation for our re-implementation.
---
### 📄 License
+
This project is licensed under the MIT License. See the [`LICENSE`](LICENSE) file for details.
---
diff --git a/configs/eval/tofu.yaml b/configs/eval/tofu.yaml
index 29e05e488..e1d4fd368 100644
--- a/configs/eval/tofu.yaml
+++ b/configs/eval/tofu.yaml
@@ -11,6 +11,7 @@ defaults: # include all defined metrics files
- model_utility # populated in the metrics key as metrics.model_utility
- privleak
- extraction_strength
+ - retain_extraction_strength
# - exact_memorization
# - mia_min_k_plus_plus
# - mia_min_k
diff --git a/configs/eval/tofu_metrics/retain_extraction_strength.yaml b/configs/eval/tofu_metrics/retain_extraction_strength.yaml
new file mode 100644
index 000000000..981851211
--- /dev/null
+++ b/configs/eval/tofu_metrics/retain_extraction_strength.yaml
@@ -0,0 +1,15 @@
+# @package eval.tofu.metrics.retain_extraction_strength
+defaults:
+ - ../../data/datasets@datasets: TOFU_QA_retain_eval
+ - ../../collator@collators: DataCollatorForSupervisedDatasetwithIndex
+ # ^ get default dataset and generation config information
+
+handler: retain_extraction_strength
+batch_size: ${eval.tofu.batch_size}
+
+datasets:
+ TOFU_QA_retain_eval:
+ args:
+ hf_args:
+ name: "retain_perturbed"
+ question_key: ${eval.tofu.question_key}
\ No newline at end of file
diff --git a/configs/trainer/SatImp.yaml b/configs/trainer/SatImp.yaml
index f8d9c757b..8da6b1336 100644
--- a/configs/trainer/SatImp.yaml
+++ b/configs/trainer/SatImp.yaml
@@ -8,8 +8,8 @@ args: # HuggingFace TrainingArguments
num_train_epochs: 5
method_args:
- beta1: 5.0
- beta2: 1.0
- alpha: 1.0
- gamma: 0.1
+ beta1: 4.0
+ beta2: 0.1
+ alpha: 0.1
+ gamma: 1.0
retain_loss_type: NLL
\ No newline at end of file
diff --git a/src/evals/metrics/__init__.py b/src/evals/metrics/__init__.py
index 5afb04243..967e89a7c 100644
--- a/src/evals/metrics/__init__.py
+++ b/src/evals/metrics/__init__.py
@@ -7,6 +7,7 @@
rouge,
truth_ratio,
extraction_strength,
+ retain_extraction_strength,
exact_memorization,
)
from evals.metrics.privacy import ks_test, privleak, rel_diff
@@ -62,6 +63,7 @@ def get_metrics(metric_cfgs: DictConfig, **kwargs):
_register_metric(rel_diff)
_register_metric(exact_memorization)
_register_metric(extraction_strength)
+_register_metric(retain_extraction_strength)
# Register MIA metrics
_register_metric(mia_loss)
diff --git a/src/evals/metrics/memorization.py b/src/evals/metrics/memorization.py
index c7bbe386c..c70b2f6d2 100644
--- a/src/evals/metrics/memorization.py
+++ b/src/evals/metrics/memorization.py
@@ -267,3 +267,53 @@ def _extraction_strength(model, batch):
)
es_values = aggregate_to_1D(es_values)
return {"agg_value": np.mean(es_values), "value_by_index": scores_by_index}
+
+
+@unlearning_metric(name="retain_extraction_strength")
+def retain_extraction_strength(model, **kwargs):
+ data = kwargs["data"]
+ collator = kwargs["collators"]
+ batch_size = kwargs["batch_size"]
+ dataloader = DataLoader(data, batch_size=batch_size, collate_fn=collator)
+
+ def _extraction_strength(model, batch):
+ log_probs_batch, labels_batch = tokenwise_vocab_logprobs(
+ model, batch, grad=False, return_labels=True
+ )
+ es_batch = []
+ for log_probs, labels in zip(log_probs_batch, labels_batch):
+ valid_len = len(labels)
+ preds = torch.argmax(log_probs, dim=-1)
+ for k in range(valid_len):
+ suff_preds = preds[k:]
+ suff_labels = labels[k:]
+ if torch.equal(suff_preds, suff_labels):
+ break
+ if valid_len == 0:
+ # Rarely, tokenization can result in a mismatch with no valid target
+ # tokens for loss computation (see preprocess_chat_instance() for
+ # reference). Since this condition makes no sense in terms of
+ # computing ES, we just choose to set ES=None
+ logger.warning(
+ "ES score for an instance is marked None, due to "
+ "tokenization issues that resulted in no valid target tokens."
+ )
+                es_batch.append({"score": None})
+ else:
+ es_score = 1 - (k / valid_len)
+ es_batch.append({"score": es_score})
+ return es_batch
+
+ fun_args = {}
+ scores_by_index = run_batchwise_evals(
+        model, dataloader, _extraction_strength, fun_args, "Calculating retain ES"
+ )
+ es_values = np.array(
+ [
+ evals["score"]
+ for evals in scores_by_index.values()
+ if evals["score"] is not None
+ ]
+ )
+ es_values = aggregate_to_1D(es_values)
+ return {"agg_value": np.mean(es_values), "value_by_index": scores_by_index}
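For reviewers, here is a minimal standalone sketch of the per-answer score that the inner `_extraction_strength` loop in `retain_extraction_strength` computes. It assumes `preds` (the argmax token ids) and `labels` (the target answer token ids) are already aligned plain lists of ints, standing in for what `tokenwise_vocab_logprobs` returns. The helpers `extraction_strength_score` and `mean_es` are hypothetical names used only for illustration and are not part of OpenUnlearning. The sketch scans suffix start positions `k = 0, 1, ..., n - 1`, stops at the first suffix that the model reproduces exactly, and scores `1 - k / n`. An answer with no valid target tokens is treated as undefined (`None`) and dropped before averaging, as the comment and the `is not None` filter above describe.

```python
from typing import Optional, Sequence


def extraction_strength_score(
    preds: Sequence[int], labels: Sequence[int]
) -> Optional[float]:
    """Hypothetical helper: suffix-matching ES for a single answer.

    Assumes preds[i] is the greedy prediction for labels[i] (already aligned).
    """
    n = len(labels)
    if n == 0:
        # No valid target tokens: ES is undefined for this instance.
        return None
    for k in range(n):
        # Stop at the smallest k whose suffix preds[k:] equals labels[k:].
        if list(preds[k:]) == list(labels[k:]):
            break
    # Mirrors the loop above: if no suffix matches, k ends at n - 1,
    # so a non-empty answer never scores below 1 / n.
    return 1 - k / n


def mean_es(scores: Sequence[Optional[float]]) -> float:
    """Hypothetical helper: average ES over instances, skipping None scores."""
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    scores = [
        extraction_strength_score([1, 2, 3], [1, 2, 3]),  # whole answer matches
        extraction_strength_score([9, 2, 3], [1, 2, 3]),  # suffix from k=1 matches
        extraction_strength_score([], []),  # no valid target tokens
    ]
    print(scores, mean_es(scores))
```

Because the retain metric reuses the forget-set ES logic unchanged and differs only in its dataset config (`retain_perturbed`), the two scores are directly comparable: a well-behaved unlearned model should keep retain ES high while lowering forget ES.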
diff --git a/src/trainer/unlearn/satimp.py b/src/trainer/unlearn/satimp.py
index f42d4acbb..dfd1ed33a 100644
--- a/src/trainer/unlearn/satimp.py
+++ b/src/trainer/unlearn/satimp.py
@@ -4,7 +4,7 @@
class SatImp(GradDiff):
def __init__(
- self, beta1=5.0, beta2=1.0, gamma=1.0, alpha=0.1, *args, **kwargs
+ self, beta1=4.0, beta2=0.1, gamma=1.0, alpha=0.1, *args, **kwargs
): # attention, satimp requires two beta!!!!
super().__init__(*args, **kwargs)
self.beta1 = beta1