Skip to content

PM-33527 Database SQL scripts#7616

Open
prograhamming wants to merge 18 commits into
mainfrom
dirt/pm-33527/db-orphaned-event-logs
Open

PM-33527 Database SQL scripts#7616
prograhamming wants to merge 18 commits into
mainfrom
dirt/pm-33527/db-orphaned-event-logs

Conversation

@prograhamming
Copy link
Copy Markdown
Contributor

@prograhamming prograhamming commented May 12, 2026

🎟️ Tracking

PM-33527

📔 Objective

In order to stay within GDPR compliance when a organization wants to be deleted we need to remove the event logs from Azure Table Storage. This PR is the database changes needed in order to delete the event logs using jobs and long running processes.

📸 Screenshots

@prograhamming prograhamming changed the title PM-33527 initial commit PM-33527 Database SQL scripts May 12, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 2.00000% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.61%. Comparing base (59ded30) to head (7ee8b24).

Files with missing lines Patch % Lines
...Repositories/OrganizationEventCleanupRepository.cs 0.00% 39 Missing ⚠️
src/Core/Dirt/Entities/OrganizationEventCleanup.cs 0.00% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7616      +/-   ##
==========================================
+ Coverage   60.20%   64.61%   +4.40%     
==========================================
  Files        2137     2139       +2     
  Lines       94412    94462      +50     
  Branches     8409     8409              
==========================================
+ Hits        56840    61034    +4194     
+ Misses      35580    31345    -4235     
- Partials     1992     2083      +91     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@prograhamming prograhamming marked this pull request as ready for review May 13, 2026 04:24
@prograhamming prograhamming requested review from a team as code owners May 13, 2026 04:24
@Banrion Banrion added the ai-review Request a Claude code review label May 14, 2026
@Banrion
Copy link
Copy Markdown
Contributor

Banrion commented May 14, 2026

@claude Please add a review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 14, 2026

Claude finished @Banrion's task in 2m 56s —— View job


Code Review — PM-33527 Database SQL scripts

  • Read changed files
  • Analyze SQL scripts and migration
  • Analyze C# entity and repository
  • Post review feedback

Thanks for the GDPR cleanup scaffolding. The DDL/sproc/Dapper repo look coherent on their own, but there are gaps before this is wireable into the worker. Findings below grouped by severity.


🔴 Blocking

1. Missing EF Core implementation (dual‑ORM strategy)
Bitwarden supports MSSQL (Dapper) and PostgreSQL / MySQL / SQLite via EF Core. This PR adds only the Dapper side. To be deployable to all supported databases you also need:

  • src/Infrastructure.EntityFramework/Dirt/Models/OrganizationEventCleanup.cs
  • src/Infrastructure.EntityFramework/Dirt/Configurations/OrganizationEventCleanupEntityTypeConfiguration.cs
  • src/Infrastructure.EntityFramework/Dirt/Repositories/OrganizationEventCleanupRepository.cs
  • A DbSet<OrganizationEventCleanup> entry on DatabaseContext
  • Generated migrations under util/MySqlMigrations/Migrations, util/PostgresMigrations/Migrations, util/SqliteMigrations/Migrations

See the precedent set by OrganizationApplication/OrganizationReport.

2. DI registration missing
Repository is not registered. It must be added to both:

  • src/Infrastructure.Dapper/DapperServiceCollectionExtensions.cs (around line 81 where the other Dirt repos register)
  • src/Infrastructure.EntityFramework/EntityFrameworkServiceCollectionExtensions.cs (around line 118)

Without this, IOrganizationEventCleanupRepository cannot be resolved at runtime.

3. ReadNextPending doesn't atomically claim work
OrganizationEventCleanup_ReadNextPending.sql:1-14

SELECT TOP 1 * FROM [dbo].[OrganizationEventCleanup]
WHERE [CompletedAt] IS NULL
ORDER BY [QueuedAt] ASC

If more than one worker instance polls (which is the normal case in App Services / k8s), every worker will pick the same row. MarkStarted uses COALESCE([StartedAt], SYSUTCDATETIME()) so it's idempotent — but that also means it does nothing to gate concurrent processors. Two workers will then run the deletion loop for the same organization in parallel.

Recommended fix — atomically claim and return in one statement:

UPDATE TOP (1) c
SET    [StartedAt] = SYSUTCDATETIME(),
       [LastProgressAt] = SYSUTCDATETIME()
OUTPUT inserted.*
FROM   [dbo].[OrganizationEventCleanup] c WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE  [CompletedAt] IS NULL
  AND  [StartedAt]   IS NULL
ORDER BY [QueuedAt] ASC

If you do want a separate "peek" + "start", at minimum use WITH (UPDLOCK, READPAST, ROWLOCK) and filter StartedAt IS NULL so a started job is skipped.


🟠 Important

4. No max-attempts / terminal failure handling
OrganizationEventCleanup_RecordError.sql increments Attempts forever. Without a MaxAttempts cutoff or a terminal "Failed" state, a permanently-broken row blocks the queue head and burns retries indefinitely. Consider either a FailedAt/Status column or a MaxAttempts constant the read-side filters on.

5. No retry backoff
After an error, ReadNextPending will immediately re-surface the same row at the head of the queue (since QueuedAt doesn't change). Workers will hot-loop on a failing org. Consider a NextAttemptAt / NotBefore column populated by RecordError with exponential backoff.

6. Missing tests
.claude/CLAUDE.md explicitly says: "ALWAYS add unit tests (with mocking) for any new feature development." No tests added for the new repository. At minimum, add Dapper repository tests under test/Infrastructure.IntegrationTest/... similar to the other Dirt repositories.

7. Event.OrganizationId filtered index — deployment risk
2026-05-12_00_AddOrganizationEventCleanup.sql:30-35 creates IX_Event_OrganizationId on what is one of the largest tables in production. On a busy SQL Server this needs to be created with WITH (ONLINE = ON) (Enterprise) and ideally during a maintenance window. Please confirm with DBA before merging and consider:

CREATE NONCLUSTERED INDEX [IX_Event_OrganizationId]
    ON [dbo].[Event]([OrganizationId] ASC)
    WHERE [OrganizationId] IS NOT NULL
    WITH (ONLINE = ON, MAXDOP = 4);

Also: the existing IX_Event_DateOrganizationIdUserId already leads with Date then OrganizationId — verify that an OrganizationId-led range scan is actually needed by your cleanup query and isn't already adequately served (e.g., via residual predicate on the existing index).


🟡 Minor / Style

8. SELECT * in stored procedure
OrganizationEventCleanup_ReadNextPending.sql:7 — repo convention is to list columns explicitly so adding a column doesn't accidentally change the returned shape.

9. Filtered index would be cheaper
Both queries on OrganizationEventCleanup filter [CompletedAt] IS NULL. Once jobs complete, they bloat the index for no reason. Consider:

CREATE NONCLUSTERED INDEX [IX_OrganizationEventCleanup_Pending]
    ON [dbo].[OrganizationEventCleanup]([QueuedAt] ASC)
    WHERE [CompletedAt] IS NULL;

10. Attempts column name is ambiguous
It's only incremented in RecordError, so it actually counts failures, not attempts. Either rename to FailureCount or also increment in MarkStarted to match the name.

11. SetNewId() not invoked by CreateAsync
OrganizationEventCleanupRepository.cs:17-24 trusts the caller to set cleanup.Id. Most other Dapper repos either call SetNewId() in CreateAsync if the id is empty, or default it in the stored procedure. Worth aligning so a caller can't insert Guid.Empty.

12. QueuedAt set client-side, others server-side
QueuedAt is passed in from C#, but StartedAt/LastProgressAt/CompletedAt come from SYSUTCDATETIME(). For clock-skew safety and consistency, consider defaulting QueuedAt server-side too (DEFAULT SYSUTCDATETIME()) or sourcing it from SYSUTCDATETIME() in the Create sproc.

13. GDPR retention question
The objective is GDPR org deletion. OrganizationEventCleanup retains OrganizationId after the org is gone (effectively an "this org was deleted on date X" audit record). That's usually fine, but worth confirming with privacy that the retention policy on this audit row matches the rest of the GDPR deletion contract — e.g., do you want to purge completed rows after N days?

14. SonarCloud flagged 2 new issues on the Sonar dashboard — please review.


Summary

The shape of the queue table is reasonable, but as written this PR can't ship: there is no EF Core implementation for non-MSSQL providers, the repository isn't registered with DI, and ReadNextPending will hand the same row to every concurrent worker. After those three are addressed, the retry/backoff and test gaps are the next things to tackle.
• Branch: dirt/pm-33527/db-orphaned-event-logs

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

🤖 Bitwarden Claude Code Review

Overall Assessment: REQUEST CHANGES

This PR adds the database schema, stored procedures, entity, interface, and Dapper repository for tracking organization event-log cleanup jobs (GDPR-driven Azure Table Storage purge of orphaned events). Recent commits address several earlier review comments — stored procedures were renamed to the Update* convention, SetNewId() is now called in CreateAsync, the Dapper repository is registered with DI, CreationDate is generated in code, and the stray review-inline-comments.md artifact has been removed. The main outstanding blocker is the missing Entity Framework Core implementation required by the dual-ORM strategy.

Code Review Details
  • ⚠️ : Missing EF Core implementation (entity model, configuration, repository, and DbContext registration) for OrganizationEventCleanup. Sibling Dirt entities (OrganizationApplication, OrganizationReport) all have parallel implementations in src/Infrastructure.EntityFramework/Dirt/. Already raised in an open thread — flagging here for visibility.
    • src/Core/Dirt/Repositories/IOrganizationEventCleanupRepository.cs:12

Other open threads from prior review passes are tracked in the PR conversation (sproc naming, SYSUTCDATETIME removal, table column naming, Event-index applicability, and ClaimNextPending lease semantics). No new inline comments added in this pass to avoid duplication.

Comment thread src/Core/Dirt/Repositories/IOrganizationEventCleanupRepository.cs
Comment thread src/Sql/dbo/Dirt/Stored Procedures/OrganizationEventCleanup_ReadNextPending.sql Outdated
Comment thread util/Migrator/DbScripts/2026-05-12_00_AddOrganizationEventCleanup.sql Outdated
Copy link
Copy Markdown
Contributor

@mkincaid-bw mkincaid-bw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR creates a new table and modifies an existing table but there are no EF migrations to handle these changes for self-host/lite customers. Is that intentional?

Comment thread src/Sql/dbo/Dirt/Tables/OrganizationEventCleanup.sql Outdated
Comment thread src/Sql/dbo/Dirt/Tables/Event.sql Outdated
ON [dbo].[Event]([Date] DESC, [OrganizationId] ASC, [ActingUserId] ASC, [CipherId] ASC) INCLUDE ([ServiceAccountId], [GrantedServiceAccountId]);

GO
CREATE NONCLUSTERED INDEX [IX_Event_OrganizationId]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ Just noting that Events for our cloud database are kept in Azure Table storage, not in the SQL database, so this index would only be applicable to self-host/lite customers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -0,0 +1,15 @@
CREATE PROCEDURE [dbo].[OrganizationEventCleanup_IncrementProgress]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stored procedure name does not follow our naming conventions. This should be named something like OrganizationEventCleanup_UpdateProgress.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

[dbo].[OrganizationEventCleanup]
SET
[EventsDeletedCount] = [EventsDeletedCount] + @Delta,
[LastProgressAt] = SYSUTCDATETIME()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When updating a value in the database, the value should be generated in code and passed in as parameters to the stored procedure.. We should not be using built-in SQL functions like SYSUTCDATETIME or GETUTCDATE

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -0,0 +1,14 @@
CREATE PROCEDURE [dbo].[OrganizationEventCleanup_MarkCompleted]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stored procedure name does not follow our naming conventions. This should be named something like OrganizationEventCleanup_UpdateCompleted.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Comment on lines +10 to +11
[CompletedAt] = SYSUTCDATETIME(),
[LastProgressAt] = SYSUTCDATETIME()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment regarding SYSUTCDATETIME

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -0,0 +1,14 @@
CREATE PROCEDURE [dbo].[OrganizationEventCleanup_MarkStarted]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stored procedure name does not follow our naming conventions. This should be named something like OrganizationEventCleanup_UpdateStarted.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Comment on lines +10 to +11
[StartedAt] = COALESCE([StartedAt], SYSUTCDATETIME()),
[LastProgressAt] = SYSUTCDATETIME()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comments regarding SYSUTCDATETIME

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -0,0 +1,16 @@
CREATE PROCEDURE [dbo].[OrganizationEventCleanup_RecordError]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stored procedure name does not follow our naming conventions. This should be named something like OrganizationEventCleanup_UpdateError.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

SET
[Attempts] = [Attempts] + 1,
[LastError] = @Message,
[LastProgressAt] = SYSUTCDATETIME()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comments regarding SYSUTCDATETIME

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@prograhamming
Copy link
Copy Markdown
Contributor Author

prograhamming commented May 20, 2026

This PR creates a new table and modifies an existing table but there are no EF migrations to handle these changes for self-host/lite customers. Is that intentional?

Yes because my thinking is if they are self-hosted and they want to delete their organization they should have the ability to just delete the table or database on their server since Events are stored in a table for self-hosted/lite customers. @mkincaid-bw

Comment thread review-inline-comments.md Outdated
Comment thread src/Core/Dirt/Entities/OrganizationEventCleanup.cs Outdated
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
E Reliability Rating on New Code (required ≥ D)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@prograhamming
Copy link
Copy Markdown
Contributor Author

@claude Please re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Claude finished @prograhamming's task in 2m 49s —— View job


Code Review — PM-33527 Database SQL scripts (re-review)

  • Fetch latest branch state
  • Review changed files (table, sprocs, repository, entity, migration)
  • Verify previously raised issues are fixed
  • Identify any new issues
  • Post review feedback

Summary

Most prior feedback has been addressed. The atomic claim implementation is solid, naming conventions are now consistent with the contributing guide, and the runaway Event table index has been removed. A few smaller items remain.

✅ Resolved since last review

Prior finding Status
ReadNextPending not atomic (concurrent workers dequeue same row) ✅ Fixed via ClaimNextPending CTE with UPDLOCK, READPAST + OUTPUT inserted.*, with stale-lease re-claim
SetNewId() not called in CreateAsync ✅ Fixed (OrganizationEventCleanupRepository.cs:19)
CreationDate not defaulted on entity ✅ Fixed (OrganizationEventCleanup.cs:10)
SELECT * in claim sproc ✅ Fixed — explicit column list
DI registration missing ✅ Fixed (DapperServiceCollectionExtensions.cs:85)
SYSUTCDATETIME inside sprocs ✅ Fixed — @Now passed in from code for all update sprocs
Sproc names not following Update* convention ✅ Fixed (UpdateProgress, UpdateError, UpdateCompleted)
Datetime columns ending At instead of Date ✅ Fixed (CreationDate, RevisionDate, StartDate, CompletedDate)
IX_Event_OrganizationId deploy risk on large table ✅ Fixed — index removed entirely (Events not in SQL for cloud)
review-inline-comments.md / review-summary.md artifacts at repo root ✅ Fixed — removed
EF Core implementation missing ✅ Intentional per author response — self-host customers manage their own event tables

🟠 Important — still open

1. SonarCloud reliability gate still failing
The latest SonarCloud run (2026-05-21) reports E Reliability Rating on New Code — required ≥ D. Please address before merge.

2. No tests added
.claude/CLAUDE.md requires unit tests for new feature development. No tests cover OrganizationEventCleanupRepository or the claim/update sprocs. Consider at minimum:

  • A Dapper integration test (in test/Infrastructure.IntegrationTest/Dirt/...) covering CreateAsyncClaimNextPendingAsyncUpdateProgressAsyncUpdateCompletedAsync happy path
  • A concurrent-claim test confirming two simultaneous ClaimNextPendingAsync calls don't return the same row
  • A lease-recovery test confirming a stale RevisionDate allows re-claim

🟡 Minor

3. Hardcoded 10-minute lease in ClaimNextPendingAsync
OrganizationEventCleanupRepository.cs:33now.AddMinutes(-10) is a magic number. Consider a named constant on the repository or a GlobalSettings/options value so operators can tune lease duration without a code change.

4. Filtered index would be cheaper for queue read pattern
OrganizationEventCleanup.sql:15-16 — both reads filter CompletedDate IS NULL, so a filtered non-clustered index would stay small as completed rows accumulate:

CREATE NONCLUSTERED INDEX [IX_OrganizationEventCleanup_Pending]
    ON [dbo].[OrganizationEventCleanup]([CreationDate] ASC)
    INCLUDE ([StartDate], [RevisionDate])
    WHERE [CompletedDate] IS NULL;

5. Attempts semantics still ambiguous
OrganizationEventCleanup_UpdateError.sql:12 is the only place Attempts is incremented, so it actually counts failures. Rename to FailureCount, or also increment in ClaimNextPending when a fresh lease is taken — current name doesn't match behavior.

6. No terminal-failure / max-attempts cutoff
A permanently broken row will keep cycling through claim → fail → lease expires → claim forever. Either gate ClaimNextPending on Attempts < @MaxAttempts, or add a Status / FailedAt column for explicit terminal state. Without it, a poison row blocks the queue head indefinitely (and the operator only finds out via LastError).

7. Whitespace-only change to Event.sql
src/Sql/dbo/Dirt/Tables/Event.sql now has an extra trailing blank line and no other change. Probably worth reverting to keep the diff clean — touching Event.sql will be confusing for future archaeology since the change is invisible.

8. @LeaseExpiry parameter name
OrganizationEventCleanup_ClaimNextPending.sql:3 — the parameter is the threshold for "stale", not when the new lease expires. @StaleLeaseThreshold or @LeaseExpiredBefore would read more naturally next to the [RevisionDate] < @LeaseExpiry predicate.


The shape is much closer to mergeable. The SonarCloud reliability finding and the missing tests are the two items I would not ship without; the rest are polish.

• Branch: dirt/pm-33527/db-orphaned-event-logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-review Request a Claude code review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants