feat: redact PII fields before delete in user retirement flows by ktyagiapphelix2u · Pull Request #38426 · openedx/openedx-platform

ktyagiapphelix2u · 2026-04-23T10:23:01Z

Summary

Adds support for redacting sensitive PII fields before deletion in DeletableByUserValue models.

Changes include:

Added redact_before_delete_fields hook to model mixin
Redact emails before delete for:
- CourseEnrollmentAllowed
- PendingEmailChange
- UnregisteredLearnerCohortAssignments
Added SQL assertion helpers to verify UPDATE occurs before DELETE
Added regression/unit tests for safe ID-filtered redaction and deletion flows
Updated email change flow to use delete_by_user_value

Ticket & Reference

https://2u-internal.atlassian.net/browse/BOMS-498

Related PR

https://2u-internal.atlassian.net/browse/BOMS-565
https://2u-internal.atlassian.net/browse/BOMS-564

robrap · 2026-04-24T16:51:27Z

+        original_new_email = self.email_change.new_email
+        original_activation_key = self.email_change.activation_key
+        record_was_redacted = PendingEmailChange.redact_pending_email_by_user_value(self.user2, field='user')
+        assert not record_was_redacted


I don't really understand this test. Should it just have lines 617 and 618, where you ask to redact on a user that isn't in the table, and it returns that it didn't redact?

All the other details about the user 1 email change seem irrelevant and confusing. If you think it is important, I'd need better comments.

The test now:

Has a clear docstring explaining it verifies redacting a user with no pending email change returns False

Only tests the relevant behavior - calling redact on user2 (who has no email change record) returns False

Removes the confusing assertions about user1's email change data remaining unchanged

robrap · 2026-04-24T16:53:50Z

+        Redact pending email change fields for records matching ``field=value``.
+        This method is intended for retirement flows where downstream systems
+        may keep soft-deleted snapshots of these rows.
+        """


Docstrings should have a one line summary, and an optional blank line and longer description. Like a comment message.

Also, maybe add something like:

Returns True if redacted, and False if no matching records found.

Done. Condensed to a one-line summary + the Returns line

Akanshu-2u · 2026-04-27T13:54:07Z

+            record.new_email = get_retired_email_by_email(record.new_email)
+            record.save(update_fields=['new_email'])


The PR description explicitly identifies activation_key as sensitive data that can still persist indirectly in logs, backups, or downstream systems. However, the implementation only redacts new_email . activation_key is left as-is. If downstream systems snapshot these rows, the activation key still leaks. It should be cleared before deletion, e.g.:

Suggested change

record.new_email = get_retired_email_by_email(record.new_email)

record.save(update_fields=['new_email'])

record.new_email = get_retired_email_by_email(record.new_email)

record.activation_key = '' # or a redacted value

record.save(update_fields=['new_email', 'activation_key'])

@Akanshu-2u The activation_key field has a unique=True database constraint. If we attempt to redact it to a fixed value (empty string or redacted placeholder), we'll violate this constraint when processing multiple users, causing database integrity errors.

Additionally, the activation key is a random UUID with no PII - it's just a token

@ktyagiapphelix2u: The PR description mentions the activation key. I agree that this PR should not touch it, but can you update the PR description to remove any mention of it? Thanks.

Sure I have updated the PR Description

Akanshu-2u · 2026-04-27T13:56:42Z

+        assert record_was_redacted
+        self.email_change.refresh_from_db()
+        assert self.email_change.new_email == expected_retired_email
+        assert self.email_change.activation_key == original_activation_key


If activation_key redaction is added, this assertion must be updated to verify the key is also cleared/replaced. As-is, this test will need to change regardless once above issue is fixed.

This test assertion is correct as-is and does not need to change.

As explained in the previous comment thread, we are not adding activation_key redaction

Akanshu-2u · 2026-04-27T14:03:29Z

+        records_matching_user_value = cls.objects.filter(**filter_kwargs)
+        if not records_matching_user_value.exists():
+            return False
+        for record in records_matching_user_value:


Both queries fetch the same data. Since the queryset is lazy, .exists() and the for loop each trigger a separate SQL query. Change to:

Suggested change

records_matching_user_value = cls.objects.filter(**filter_kwargs)

if not records_matching_user_value.exists():

return False

for record in records_matching_user_value:

records = list(cls.objects.filter(**filter_kwargs))

if not records:

return False

for record in records:

This is a single DB hit, which matters more if the field filter ever doesn't use the OneToOneField on user.

Note: The change is optional.

Since PendingEmailChange has a OneToOneField on user, this will only ever return 0 or 1 records, so the memory impact is negligible while reducing database round-trips.

Thanks Updated

robrap · 2026-04-29T23:27:57Z

+            record.new_email = get_retired_email_by_email(record.new_email)
+            record.save(update_fields=['new_email'])


@ktyagiapphelix2u: The PR description mentions the activation key. I agree that this PR should not touch it, but can you update the PR description to remove any mention of it? Thanks.

robrap · 2026-04-29T23:32:53Z

+            PendingEmailChange.redact_pending_email_by_user_value(user, field="user")
            PendingEmailChange.delete_by_user_value(user, field="user")


Don't we just want to redact in delete_by_user_value before the delete happens? We'd use the same simple value from other PRs, like redact-before-delete@redacted.com. And this would resolve for any flow that is deleting this record, including retirement flow or completion of pending email changes, etc.

Additionally, I am alluding to two bugs in my above comment. Are these accurate? The PR description does not yet mention both of these. Also, the ticket has an AC that mentions three bugs. Is there another, or is that a copy/paste issue in the ticket?

Yes, there are two bugs this addresses:

Retirement flow - PendingEmailChange.delete_by_user_value() was called during account retirement without first redacting new_email, leaving the PII briefly visible (and in any soft-delete/audit snapshot taken at delete time).

Email confirmation flow - confirm_email_change in management.py also calls delete_by_user_value() when a pending email change is confirmed or cancelled, and it had the same gap. Moving the redaction inside the method fixes both flows simultaneously.

vgulati-apphelix · 2026-05-06T13:22:38Z

This change addresses a privacy issue in the retirement flow for users who have a pending primary email change.

Please elaborate the problem statement and solution provided.

ktyagiapphelix2u · 2026-05-04T06:23:56Z

+        Redact pending email change fields for records matching ``field=value``.
+        This method is intended for retirement flows where downstream systems
+        may keep soft-deleted snapshots of these rows.
+        """


Done. Condensed to a one-line summary + the Returns line

ktyagiapphelix2u · 2026-05-04T06:29:37Z

+            record.new_email = get_retired_email_by_email(record.new_email)
+            record.save(update_fields=['new_email'])


Sure I have updated the PR Description

ktyagiapphelix2u · 2026-05-04T07:34:29Z

+            PendingEmailChange.redact_pending_email_by_user_value(user, field="user")
            PendingEmailChange.delete_by_user_value(user, field="user")


Yes, there are two bugs this addresses:

Retirement flow - PendingEmailChange.delete_by_user_value() was called during account retirement without first redacting new_email, leaving the PII briefly visible (and in any soft-delete/audit snapshot taken at delete time).

Email confirmation flow - confirm_email_change in management.py also calls delete_by_user_value() when a pending email change is confirmed or cancelled, and it had the same gap. Moving the redaction inside the method fixes both flows simultaneously.

ktyagiapphelix2u · 2026-05-06T14:44:53Z

    activation_key = models.CharField(('activation key'), max_length=32, unique=True, db_index=True)

+    @classmethod
+    def delete_by_user_value(cls, value, field):


@robrap Redact at Call Sites (What you're suggesting)

In management.py:

# Email confirmation flow user.email = pec.new_email user.save() # Redact inline before delete pec.new_email = 'redacted@retired.invalid' pec.save() pec.delete()

In accounts/views.py:

# User retirement flow # Redact inline before delete PendingEmailChange.objects.filter(user=user).update(new_email='redacted@retired.invalid') PendingEmailChange.objects.filter(user=user).delete()

What the current approch is

@classmethod def delete_by_user_value(cls, value, field): records = cls.objects.filter(**{field: value}) records.update(new_email='redacted@retired.invalid') records.delete() return True

In both call sites:
PendingEmailChange.delete_by_user_value(user, field="user")

# management.py - must remember to redact pec.new_email = 'redacted@retired.invalid' pec.save() pec.delete() # accounts/views.py - must remember again records.update(new_email='redacted@retired.invalid') records.delete() # Future code in some other file pec.delete() # ←if they forget to redact

So current implementation is correct

# Change in ONE place: @classmethod def delete_by_user_value(cls, value, field): records.update(new_email='redacted@retired.invalid') #Only here # All callers automatically get the new value!

@robrap
PendingEmailChange overrides delete_by_user_value from DeletableByUserValue (in model_mixins.py) because the parent's version only deletes it has no knowledge of PII fields. The issue was that Fivetran's soft-delete ETL preserves deleted rows, so if new_email isn't redacted before deletion, the PII leaks downstream. The override works via Python's MRO: when any caller invokes PendingEmailChange.delete_by_user_value(...), Python finds the method on PendingEmailChange first and uses it automatically getting redaction without any change to the callers. We can't put the redaction logic in model_mixins.py itself because it's a generic mixin used by many models (UserOrgTag, CourseEnrollmentAllowed, etc.) that don't have a new_email field adding it there would either crash other models or force callers to pass field names in, which breaks loose coupling by making callers responsible for knowing a model's internal PII fields. The override keeps the design clean: callers stay unchanged, and each model owns its own redaction logic.

ttak-apphelix · 2026-05-12T07:16:17Z

+    @skip_unless_lms
+    @patch('common.djangoapps.student.signals.receivers.EmailChangeMiddleware.register_email_change')
+    @patch('common.djangoapps.student.views.management.ace')
+    def test_successful_email_change_redacts_pending_email_before_delete(self, ace_mail, mock_register):  # pylint: disable=unused-argument


mock_register is silently unused, comment doesn't explain why it's patched.

What I changed:

Removed the unused-argument suppression from the test.
Added an explicit assertion to use the patched mock:

mock_register.assert_called_once_with(self.request, expected_new_email)

ttak-apphelix · 2026-05-12T07:22:53Z

+        )
+
+        assert response.status_code == 200
+        assert mock_render_to_response('email_change_successful.html', {


The test is verifying both (a) that redaction ordering is correct AND (b) that the email change itself succeeded end-to-end. The existing test_successful_email_change already covers the success path. This test should only assert the redaction ordering plus the record being gone — the response content, user email update, and ace_mail.send.call_count assertions are noise here and make it harder to understand what the test is actually for.

I narrowed test_successful_email_change_redacts_pending_email_before_delete so it now only checks:

SQL ordering: UPDATE on student_pendingemailchange occurs before DELETE.
The pending email change record is removed (PendingEmailChange.objects.count() == 0).

robrap

Thanks.

robrap · 2026-05-20T03:15:59Z

    activation_key = models.CharField(('activation key'), max_length=32, unique=True, db_index=True)

+    @classmethod
+    def delete_by_user_value(cls, value, field):


This is how I think I'd handle this...

Create a redact_before_delete_fields in DeletableByUserValue that returns a dict of fields and their redacted values.

The default implementation of redact_before_delete_fields would return None or {}.

Not sure if you need to work with kwargs instead of a dict, and how that is done.

The method docstring would be something like:

Returns dict of PII fields that will be redacted before delete, and their redacted values. Always returns empty dict, unless overridden. Results will be used by `delete_by_user_value` to redact before delete. This can be used by models containing PII, in case the data is synced downstream, where soft-deletes would otherwise keep copies of the PII.

Then update delete_by_user_value to make use of this method to redact before delete. Then you wouldn't need this redundant implementation. As written, there is no need to even include DeletableByUserValue, since it has one method, it is overwritten, and the super is never called.

This would move the major testing to DeletableByUserValue as well. We'd have to determine if we'd want a shared test script for the actual models, or a mock of redact_before_delete_fields just to ensure it is called with the correct fields? Maybe we mock for now, or use a shared test method, to remove redundancy and then consider a refactor later?

Additionally, redact-pii-before-delete can happen under two circumstances:

During user retirement, where I don't think any operator would complain, or

Outside of user retirement, like sometimes occurs with a PendingEmailChange that is no longer pending.

For case 2, we are opting to redact early, which corrupts downstream soft-deleted data for users that are not retired, and may never retire. We are choosing for simpler/safer retirement, rather than later needing to track all these cases down and hard-delete this data from our downstream sync.
@bmtcril: Is it ok to make this decision for everyone, or would we want a setting named ENABLE_PII_REDACT_BEFORE_DELETE for circumstances like this? Related, if we added the setting, what would be its default?

We need to discuss this

robrap · 2026-05-20T03:37:47Z

+        with CaptureQueriesContext(connection) as ctx:
+            self.post_and_assert_status(data)
+
+        sql_list = [q['sql'].upper() for q in ctx]
+        update_indices = [i for i, sql in enumerate(sql_list) if 'UPDATE' in sql and table.upper() in sql]
+        delete_indices = [i for i, sql in enumerate(sql_list) if 'DELETE' in sql and table.upper() in sql]
+        assert update_indices, f'Expected an UPDATE on {table}'
+        assert delete_indices, f'Expected a DELETE on {table}'
+        assert any(u < d for u in update_indices for d in delete_indices), (
+            'Expected UPDATE to precede DELETE'
+        )


Suggestion:

test_retire_user could have with CaptureQueriesContext(connection) as ctx:,

The SQL assertions could be moved to a test helper associated with DeletableByUserValue (e.g. a TestDeletableByUserValue mixin)?

test_retire_user calls the new assertion helper.

Otherwise, this adds a redundant test (which will be multiple redundant tests later), just to add some additional assertions.

robrap · 2026-05-20T03:39:05Z

            self.retire_entitlement_support_detail(user)

-            # Retire misc. models that may contain PII of this user
+            # Retire misc. PII models. PendingEmailChange redacts before deletion for downstream safety.


I'd leave out this change. Eventually will be doing the same for all of these.

robrap · 2026-05-20T03:42:25Z

        assert PendingEmailChange.objects.count() == 0

+    @skip_unless_lms
+    def test_successful_email_change_redacts_pending_email_before_delete(self):


Note: This isn't actually a different test from test_successful_email_change, but simply an additional assertion. If we cleaned up the assertion code we probably don't need a separate test. See other comments for notes about cleaning up tests.

robrap · 2026-05-20T03:43:35Z

+        with CaptureQueriesContext(connection) as ctx:
+            confirm_email_change(self.request, self.key)
+
+        table = 'student_pendingemailchange'


Can we get this from PendingEmailChange? It's just a question that might make the assertion helper calls cleaner.

I switched it to PendingEmailChange._meta.db_table, so the test no longer hard-codes the table name and the helper call is a bit cleaner. It still exercises the same UPDATE-before-DELETE assertion.

vgulati-apphelix · 2026-05-21T12:18:49Z

+        if redact_fields:
+            cls.objects.filter(id__in=record_ids).update(**redact_fields)
+
+        cls.objects.filter(id__in=record_ids).delete()


@ktyagiapphelix2u avoid hitting the database twice for the same queryset creation.

Suggested change

if redact_fields:

cls.objects.filter(id__in=record_ids).update(**redact_fields)

cls.objects.filter(id__in=record_ids).delete()

queryset = cls.objects.filter(id__in=record_ids)

if redact_fields:

queryset.update(**redact_fields)

queryset.delete()

vgulati-apphelix · 2026-05-21T12:28:51Z

        self.req_factory = RequestFactory()
        self.request = self.req_factory.get('unused_url')
        self.request.user = self.user
+        self.request.session = {}


@ktyagiapphelix2u why is this required?

I originally added self.request.session = {} as a defensive/default request setup since many Django management-view tests depend on session or message storage indirectly. After checking the actual confirm_email_change flow in management.py, it turns out this path does not use request.session, so the line was unnecessary and I removed it

fix: redact pending primary email before retirement deletion

5e6a070

ktyagiapphelix2u marked this pull request as ready for review April 23, 2026 11:29

ktyagiapphelix2u requested a review from a team as a code owner April 23, 2026 11:29

robrap reviewed Apr 24, 2026

View reviewed changes

Akanshu-2u reviewed Apr 27, 2026

View reviewed changes

Akanshu-2u mentioned this pull request Apr 28, 2026

fix: retirement PII leaks by redacting pending secondary email/name data #38427

Open

Akanshu-2u approved these changes Apr 28, 2026

View reviewed changes

robrap reviewed Apr 29, 2026

View reviewed changes

ktyagiapphelix2u force-pushed the ktyagi/primaryemail branch 2 times, most recently from 0d5d5e6 to 2593ac2 Compare May 4, 2026 08:49

fix: redact pending primary email before retirement deletion

07bf642

ktyagiapphelix2u force-pushed the ktyagi/primaryemail branch from f7e8036 to 07bf642 Compare May 4, 2026 09:31

ktyagiapphelix2u added 6 commits May 4, 2026 15:04

Merge branch 'master' into ktyagi/primaryemail

3f9a0d4

fix: redact pending primary email before retirement deletion

ee86eeb

fix: redact pending primary email before retirement deletion

a59c1ed

fix: redact pending primary email before retirement deletion

f6d7e93

fix: redact pending primary email before retirement deletion

33c6df3

Merge branch 'master' into ktyagi/primaryemail

3f4cffd

ktyagiapphelix2u commented May 12, 2026

View reviewed changes

ktyagiapphelix2u added 3 commits May 12, 2026 06:04

fix: redact pending primary email before retirement deletion

5165304

fix: redact pending primary email before retirement deletion

be85318

fix: redact pending primary email before retirement deletion

0d526a3

ttak-apphelix reviewed May 12, 2026

View reviewed changes

ktyagiapphelix2u added 3 commits May 12, 2026 07:34

fix: redact pending primary email before retirement deletion

af71158

fix: redact pending primary email before retirement deletion

027069b

fix: redact pending primary email before retirement deletion

568ed45

ttak-apphelix approved these changes May 12, 2026

View reviewed changes

robrap reviewed May 20, 2026

View reviewed changes

ktyagiapphelix2u added 2 commits May 20, 2026 06:28

fix: redact pending primary email before retirement deletion

d168dc4

fix: redact pending primary email before retirement deletion

03fce31

ktyagiapphelix2u force-pushed the ktyagi/primaryemail branch from 3905bd5 to 03fce31 Compare May 20, 2026 09:07

ktyagiapphelix2u added 2 commits May 21, 2026 07:03

fix: redact pending primary email before retirement deletion

4b60c74

fix: redact pending primary email before retirement deletion

36bfeb9

ktyagiapphelix2u requested a review from a team as a code owner May 21, 2026 07:37

fix: redact pending primary email before retirement deletion

ab7b642

ktyagiapphelix2u force-pushed the ktyagi/primaryemail branch from f340dae to ab7b642 Compare May 21, 2026 07:42

vgulati-apphelix reviewed May 21, 2026

View reviewed changes

ktyagiapphelix2u added 4 commits May 22, 2026 05:43

fix: redact pending primary email before retirement deletion

6fd1b63

fix: redact pending primary email before retirement deletion

576c9e1

fix: redact pending primary email before retirement deletion

ad865a9

fix: redact pending primary email before retirement deletion

c351f22

ktyagiapphelix2u changed the title ~~fix: redact pending primary email before retirement deletion~~ feat: redact PII fields before delete in user retirement flows May 22, 2026

		record.new_email = get_retired_email_by_email(record.new_email)
		record.save(update_fields=['new_email'])

		PendingEmailChange.redact_pending_email_by_user_value(user, field="user")
		PendingEmailChange.delete_by_user_value(user, field="user")

Conversation

ktyagiapphelix2u commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Ticket & Reference

Related PR

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vgulati-apphelix commented May 6, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

robrap left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

ktyagiapphelix2u commented Apr 23, 2026 •

edited

Loading

ktyagiapphelix2u May 22, 2026 •

edited

Loading