locuslab · Puning97 · May 11, 2026 · May 15, 2026 · May 15, 2026 · May 15, 2026
diff --git a/README.md b/README.md
@@ -1,10 +1,10 @@
 <div align="center">
 
-![*Open*Unlearning](assets/banner.png)
+![OpenUnlearning](assets/banner.png)
 
 <h3><strong>An easily extensible framework unifying LLM unlearning evaluation benchmarks.</strong></h3>
 
-  <div style="display: flex; gap: 10px; justify-content: center; align-items: center;">
+<div style="display: flex; gap: 10px; justify-content: center; align-items: center;">
     <a href="https://arxiv.org/abs/2506.12618"><img src="https://img.shields.io/badge/arXiv-Report-b31b1b?logo=arxiv&logoColor=white" alt="arXiv Paper"/></a>
     <a href="https://github.com/locuslab/open-unlearning"><img src="https://img.shields.io/github/stars/locuslab/open-unlearning?style=social" alt="GitHub Repo stars"/></a>
     <a href="https://github.com/locuslab/open-unlearning/actions"><img src="https://github.com/locuslab/open-unlearning/actions/workflows/tests.yml/badge.svg" alt="Build Status"/></a>
@@ -21,7 +21,6 @@
 
 We provide efficient and streamlined implementations of the TOFU, MUSE and WMDP unlearning benchmarks while supporting 12+ unlearning methods, 5+ datasets, 10+ evaluation metrics, and 7+ LLM architectures. Each of these can be easily extended to incorporate more variants.
 
-
 We invite the LLM unlearning community to collaborate by adding new benchmarks, unlearning methods, datasets and evaluation metrics here to expand OpenUnlearning's features, gain feedback from wider usage and drive progress in the field.
 
 ---
@@ -37,14 +36,14 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,
 🚨 Our paper `OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics` is now out on [arXiv](https://arxiv.org/abs/2506.12618).
 
 🌟 **Highlights:**
+
 - A detailed technical report on OpenUnlearning covering the design, features, and implementation.
-- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ models, open-sourced on HuggingFace 🤗: [TOFU Models w & w/o Knowledge](https://huggingface.co/collections/open-unlearning/tofu-models-w-and-w-o-knowledge-6861e4d935eb99ba162e55cd), [TOFU Unlearned Models](https://huggingface.co/collections/open-unlearning/tofu-unlearned-models-6860f6cf3fe35d0223d92e88).
+- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ models, open-sourced on HuggingFace 🤗: [TOFU Models w &amp; w/o Knowledge](https://huggingface.co/collections/open-unlearning/tofu-models-w-and-w-o-knowledge-6861e4d935eb99ba162e55cd), [TOFU Unlearned Models](https://huggingface.co/collections/open-unlearning/tofu-unlearned-models-6860f6cf3fe35d0223d92e88).
 - Results benchmarking 8 diverse unlearning methods in one place using 10 evaluation metrics on TOFU.
 
 <details>
 <summary><b>Older Updates</b></summary>
 
-
 #### [May 19, 2025]
 
 - **More Methods!** Added support for unlearning methods [UNDIAL](https://aclanthology.org/2025.naacl-long.444/) and [AltPO](https://aclanthology.org/2025.coling-main.252/).
@@ -55,19 +54,23 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,
 - **More evaluations!**  The [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) toolkit has been integrated into OpenUnlearning, enabling WMDP evaluations and support for popular general LLM benchmarks, including MMLU, GSM8K, and others.
 
 #### [Apr 6, 2025]
+
 - **More Metrics!** Added 6 Membership Inference Attacks (MIA) (LOSS, ZLib, Reference, GradNorm, MinK, and MinK++), along with Extraction Strength (ES) and  Exact Memorization (EM) as additional evaluation metrics.
 - **More TOFU Evaluations!** Now includes a holdout set and supports MIA attack-based evaluation. You can now compute MUSE's privleak on TOFU.
 - **More Documentation!** [`docs/links.md`](docs/links.md) contains resources for each of the implemented features and other useful LLM unlearning resources.
 
 Be sure to run `python setup_data.py` immediately after merging the latest version. This is required to refresh the downloaded eval log files and ensure they're compatible with the latest evaluation metrics.
 
 #### [Mar 27, 2025]
+
 - **More Documentation: easy contributions and the leaderboard functionality**: We've updated the documentation to make contributing new unlearning methods and benchmarks much easier. Users can document additions better and also update a leaderboard with their results. See [this section](#-how-to-contribute) for details.
 
 #### [Mar 9, 2025]
+
 - **More Methods!** Added support for [RMU](https://arxiv.org/abs/2403.03218) (representation-engineering based unlearning).
 
-#### [Feb 27, 2025]  
+#### [Feb 27, 2025]
+
 ⚠️ **Repository Update**: This repo replaces the original TOFU codebase at [`github.com/locuslab/tofu`](https://github.com/locuslab/tofu), which is no longer maintained.
 
 </details>
@@ -78,18 +81,18 @@ Be sure to run `python setup_data.py` immediately after merging the latest versi
 
 We provide several variants for each of the components in the unlearning pipeline.
 
-| **Component**          | **Available Options** |
-|------------------------|----------------------|
-| **Benchmarks**        | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://www.wmdp.ai/) |
-| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU, UNDIAL, AltPO, SatImp, WGA, CE-U, PDU |
-| **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks, [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
-| **Datasets**          | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP-Bio, WMDP-Cyber |
-| **Model Families**    | TOFU: Llama-3.2, Llama-3.1, Llama-2; MUSE: Llama-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr |
+| **Component**          | **Available Options**                                                                                                                                                                                                           |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Benchmarks**         | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://www.wmdp.ai/)                                                                                                                                  |
+| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU, UNDIAL, AltPO, SatImp, WGA, CE-U, PDU                                                                                                                                                    |
+| **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks, [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
+| **Datasets**           | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP-Bio, WMDP-Cyber                                                                                                                                             |
+| **Model Families**     | TOFU: Llama-3.2, Llama-3.1, Llama-2; MUSE: Llama-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr                                                                                                                                       |
 
 ---
 
-
 ## 📌 Table of Contents
+
 - 📖 [Overview](#-overview)
 - 📢 [Updates](#-updates)
 - 🗃️ [Available Components](#%EF%B8%8F-available-components)
@@ -101,7 +104,7 @@ We provide several variants for each of the components in the unlearning pipelin
   - 📜 [Running Baseline Experiments](#-running-baseline-experiments)
 - ➕ [How to Contribute](#-how-to-contribute)
 - 📚 [Further Documentation](#-further-documentation)
-- 🔗 [Support & Contributors](#-support--contributors)
+- 🔗 [Support &amp; Contributors](#-support--contributors)
 - 📝 [Citing this work](#-citing-this-work)
 - 🤝 [Acknowledgements](#-acknowledgements)
 - 📄 [License](#-license)
@@ -129,7 +132,7 @@ python setup_data.py --eval # saves/eval now contains evaluation results of the
 
 ### 🔄 Updated TOFU benchmark
 
-We've updated Open-Unlearning's TOFU benchmark target models to use a wider variety of newer architectures with sizes varying from 1B to 8B. These include Llama 3.2 1B, Llama 3.2 3B, Llama 3.1 8B, and the original Llama-2 7B (re-created) target models from [the old version of TOFU](github.com/locuslab/tofu). 
+We've updated OpenUnlearning's TOFU benchmark target models to use a wider variety of newer architectures with sizes varying from 1B to 8B. These include Llama 3.2 1B, Llama 3.2 3B, Llama 3.1 8B, and the original Llama-2 7B (re-created) target models from [the old version of TOFU](https://github.com/locuslab/tofu).
 
 For each architecture, we have finetuned with four different splits of the TOFU datasets: `full`, `retain90`, `retain95`, `retain99`, for a total of 16 finetuned models. The first serves as the target (base model for unlearning) and the rest are retain models used to measure performance against for each forget split. These models are on [HuggingFace](`https://huggingface.co/collections/open-unlearning/tofu-new-models-67bcf636334ea81727573a9f0`) and the paths to these models can be set in the experimental configs or in command-line overrides.
 
@@ -172,8 +175,8 @@ python src/eval.py --config-name=eval.yaml experiment=eval/tofu/default \
 
 For more details about creating and running evaluations, refer [`docs/evaluation.md`](docs/evaluation.md).
 
-
 ### 📜 Running Baseline Experiments
+
 The scripts below execute standard baseline unlearning experiments on the TOFU and MUSE datasets, evaluated using their corresponding benchmarks. The expected results for these are in [`docs/repro.md`](docs/repro.md).
 
 ```bash
@@ -189,20 +192,20 @@ The above scripts are not tuned and uses default hyper parameter settings. We en
 
 If you are interested in contributing to our work, please have a look at [`contributing.md`](docs/contributing.md) guide.
 
-
 ## 📚 Further Documentation
 
 For more in-depth information on specific aspects of the framework, refer to the following documents:
 
-| **Documentation**                              | **Contains**                                                                                                       |
-|------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|
-| [`docs/contributing.md`](docs/contributing.md)       | Instructions on how to add new methods, benchmarks, components such as trainers, benchmarks, metrics, models, datasets, etc.              |
-| [`docs/evaluation.md`](docs/evaluation.md)       | Detailed instructions on creating and running evaluation metrics and benchmarks.                                     |
-| [`docs/experiments.md`](docs/experiments.md)     | Guide on running experiments in various configurations and settings, including distributed training, fine-tuning, and overriding arguments. |
-| [`docs/hydra.md`](docs/hydra.md)                 | A short tutorial on Hydra features, Hydra is the configuration management package we use extensively.                                  |
-| [`community/leaderboard.md`](community/leaderboard.md)             | Reference results from various unlearning methods run using this framework on TOFU and MUSE benchmarks.              |
-| [`docs/links.md`](docs/links.md)             | List of all links to the research papers or other sources the implemented features are sourced from.              |
-| [`docs/repro.md`](docs/repro.md)            | Results are provided solely for reproducibility purposes, without any parameter tuning.             |
+| **Documentation**                               | **Contains**                                                                                                                          |
+| ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| [`docs/contributing.md`](docs/contributing.md)         | Instructions on how to add new methods, benchmarks, and other components such as trainers, metrics, models, datasets, etc.                  |
+| [`docs/evaluation.md`](docs/evaluation.md)             | Detailed instructions on creating and running evaluation metrics and benchmarks.                                                            |
+| [`docs/experiments.md`](docs/experiments.md)           | Guide on running experiments in various configurations and settings, including distributed training, fine-tuning, and overriding arguments. |
+| [`docs/hydra.md`](docs/hydra.md)                       | A short tutorial on Hydra features; Hydra is the configuration management package we use extensively.                                       |
+| [`community/leaderboard.md`](community/leaderboard.md) | Reference results from various unlearning methods run using this framework on TOFU and MUSE benchmarks.                                     |
+| [`docs/links.md`](docs/links.md)                       | List of all links to the research papers or other sources the implemented features are sourced from.                                        |
+| [`docs/repro.md`](docs/repro.md)                       | Results are provided solely for reproducibility purposes, without any parameter tuning.                                                     |
+
 ---
 
 ## 🔗 Support & Contributors
@@ -239,18 +242,20 @@ If you use OpenUnlearning in your research, please make sure to cite our OpenUnl
   url={https://arxiv.org/abs/2407.06460}
 }
 ```
+
 </details>
 
 ---
 
 ### 🤝 Acknowledgements
 
-- This repo is inspired from [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). 
-- The [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/swj0419/muse_bench) benchmarks served as the foundation for our re-implementation. 
+- This repo is inspired by [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
+- The [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/swj0419/muse_bench) benchmarks served as the foundation for our re-implementation.
 
 ---
 
 ### 📄 License
+
 This project is licensed under the MIT License. See the [`LICENSE`](LICENSE) file for details.
 
 ---

diff --git a/configs/eval/tofu.yaml b/configs/eval/tofu.yaml
@@ -11,6 +11,7 @@ defaults: # include all defined metrics files
     - model_utility # populated in the metrics key as metrics.model_utility
     - privleak
     - extraction_strength
+    - retain_extraction_strength
     # - exact_memorization
     # - mia_min_k_plus_plus
     # - mia_min_k

diff --git a/configs/eval/tofu_metrics/retain_extraction_strength.yaml b/configs/eval/tofu_metrics/retain_extraction_strength.yaml
@@ -0,0 +1,15 @@
+# @package eval.tofu.metrics.retain_extraction_strength
+defaults:
+  - ../../data/datasets@datasets: TOFU_QA_retain_eval
+  - ../../collator@collators: DataCollatorForSupervisedDatasetwithIndex
+  # ^ get default dataset and generation config information
+
+handler: retain_extraction_strength
+batch_size: ${eval.tofu.batch_size}
+
+datasets:
+  TOFU_QA_retain_eval:
+    args:
+      hf_args:
+        name: "retain_perturbed"
+      question_key: ${eval.tofu.question_key}
diff --git a/configs/trainer/SatImp.yaml b/configs/trainer/SatImp.yaml
@@ -8,8 +8,8 @@ args: # HuggingFace TrainingArguments
   num_train_epochs: 5
 
 method_args:
-  beta1: 5.0
-  beta2: 1.0
-  alpha: 1.0
-  gamma: 0.1
+  beta1: 4.0
+  beta2: 0.1
+  alpha: 0.1
+  gamma: 1.0
   retain_loss_type: NLL
diff --git a/src/evals/metrics/__init__.py b/src/evals/metrics/__init__.py
@@ -7,6 +7,7 @@
     rouge,
     truth_ratio,
     extraction_strength,
+    retain_extraction_strength,
     exact_memorization,
 )
 from evals.metrics.privacy import ks_test, privleak, rel_diff
@@ -62,6 +63,7 @@ def get_metrics(metric_cfgs: DictConfig, **kwargs):
 _register_metric(rel_diff)
 _register_metric(exact_memorization)
 _register_metric(extraction_strength)
+_register_metric(retain_extraction_strength)
 
 # Register MIA metrics
 _register_metric(mia_loss)
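
For readers unfamiliar with the pattern, the registration added above can be pictured as a name-to-handler lookup that the `handler:` key in a metric's YAML config resolves against. The sketch below is only an illustration of that idea: `Metric`, `METRIC_REGISTRY`, `register_metric`, and `get_metric` are hypothetical names, and the repository's actual `_register_metric` and `get_metrics` may work differently.

```python
# Hypothetical registry sketch; none of these names are taken from the
# repository's implementation.
from typing import Callable, Dict


class Metric:
    """Pairs a metric name with the function that computes it."""

    def __init__(self, name: str, fn: Callable[..., dict]):
        self.name = name
        self.fn = fn

    def __call__(self, model, **kwargs) -> dict:
        return self.fn(model, **kwargs)


METRIC_REGISTRY: Dict[str, Metric] = {}


def register_metric(metric: Metric) -> None:
    # Reject duplicates so two handlers cannot silently share one name.
    if metric.name in METRIC_REGISTRY:
        raise ValueError(f"metric {metric.name!r} is already registered")
    METRIC_REGISTRY[metric.name] = metric


def get_metric(handler: str) -> Metric:
    # The `handler` string from a YAML config is used as the lookup key.
    if handler not in METRIC_REGISTRY:
        raise KeyError(f"unknown metric handler {handler!r}")
    return METRIC_REGISTRY[handler]


# Register a stand-in metric and resolve it by its handler name.
register_metric(
    Metric("retain_extraction_strength", lambda model, **kw: {"agg_value": 1.0})
)
result = get_metric("retain_extraction_strength")(model=None)
```

Under this reading, a metric that is defined but never registered would fail at lookup time, which is presumably why the diff touches both the import list and the registration calls.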

diff --git a/src/evals/metrics/memorization.py b/src/evals/metrics/memorization.py
@@ -267,3 +267,53 @@ def _extraction_strength(model, batch):
     )
     es_values = aggregate_to_1D(es_values)
     return {"agg_value": np.mean(es_values), "value_by_index": scores_by_index}
+
+
+@unlearning_metric(name="retain_extraction_strength")
+def retain_extraction_strength(model, **kwargs):
+    data = kwargs["data"]
+    collator = kwargs["collators"]
+    batch_size = kwargs["batch_size"]
+    dataloader = DataLoader(data, batch_size=batch_size, collate_fn=collator)
+
+    def _extraction_strength(model, batch):
+        log_probs_batch, labels_batch = tokenwise_vocab_logprobs(
+            model, batch, grad=False, return_labels=True
+        )
+        es_batch = []
+        for log_probs, labels in zip(log_probs_batch, labels_batch):
+            valid_len = len(labels)
+            preds = torch.argmax(log_probs, dim=-1)
+            for k in range(valid_len):
+                suff_preds = preds[k:]
+                suff_labels = labels[k:]
+                if torch.equal(suff_preds, suff_labels):
+                    break
+            if valid_len == 0:
+                # Rarely, tokenization can result in a mismatch with no valid target
+                # tokens for loss computation (see preprocess_chat_instance() for
+                # reference). Since this condition makes no sense in terms of
+                # computing ES, we just choose to set ES=None
+                logger.warning(
+                    "ES score for an instance is marked None, due to "
+                    "tokenization issues that resulted in no valid target tokens."
+                )
+                es_batch.append({"score": None})
+            else:
+                es_score = 1 - (k / valid_len)
+                es_batch.append({"score": es_score})
+        return es_batch
+
+    fun_args = {}
+    scores_by_index = run_batchwise_evals(
+        model, dataloader, _extraction_strength, fun_args, "Calculating ES"
+    )
+    es_values = np.array(
+        [
+            evals["score"]
+            for evals in scores_by_index.values()
+            if evals["score"] is not None
+        ]
+    )
+    es_values = aggregate_to_1D(es_values)
+    return {"agg_value": np.mean(es_values), "value_by_index": scores_by_index}
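
To make the scoring rule in `_extraction_strength` easier to check by hand, here is a minimal per-sequence sketch of the same suffix-matching loop. It operates on plain Python lists rather than tensors, which is an assumption made to keep the example dependency-free; `preds` stands in for the argmax token ids and `labels` for the target token ids, and the function name is illustrative.

```python
from typing import Optional, Sequence


def extraction_strength(
    preds: Sequence[int], labels: Sequence[int]
) -> Optional[float]:
    """ES for one sequence, following the suffix-matching loop in the diff."""
    valid_len = len(labels)
    if valid_len == 0:
        # No target tokens: the score is undefined, so report None.
        return None
    k = 0
    for k in range(valid_len):
        # Stop at the first position from which predictions match the labels.
        if list(preds[k:]) == list(labels[k:]):
            break
    return 1 - (k / valid_len)


full = extraction_strength([1, 2, 3, 4], [1, 2, 3, 4])
partial = extraction_strength([9, 2, 3, 4], [1, 2, 3, 4])
empty = extraction_strength([], [])
```

Here `full` is 1.0 because the whole sequence is predicted correctly (k = 0), and `partial` is 0.75 because the predictions agree with the labels only from position 1 onward (k = 1 over a length of 4).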
diff --git a/src/trainer/unlearn/.DS_Store b/src/trainer/unlearn/.DS_Store
diff --git a/src/trainer/unlearn/satimp.py b/src/trainer/unlearn/satimp.py
@@ -4,7 +4,7 @@
 
 class SatImp(GradDiff):
     def __init__(
-        self, beta1=5.0, beta2=1.0, gamma=1.0, alpha=0.1, *args, **kwargs
+        self, beta1=4.0, beta2=0.1, gamma=1.0, alpha=0.1, *args, **kwargs
     ):  # attention, satimp requires two beta!!!!
         super().__init__(*args, **kwargs)
         self.beta1 = beta1