Skip to content

chore: tenant data bulk delete#77

Merged
xernobyl merged 5 commits into
mainfrom
chore/tenant-data-bulk-delete
May 22, 2026
Merged

chore: tenant data bulk delete#77
xernobyl merged 5 commits into
mainfrom
chore/tenant-data-bulk-delete

Conversation

@xernobyl
Copy link
Copy Markdown
Contributor

@xernobyl xernobyl commented May 20, 2026

What does this PR do?

Fixes ENG-1010

Implements tenant data purge for Hub:

DELETE /v1/tenants/{tenant_id}/data

The endpoint deletes all current tenant-owned Hub data for the requested tenant:

  • embeddings derived from matching feedback records
  • feedback_records
  • webhooks

It returns per-resource delete counts, is idempotent, and does not publish webhook events during purge.

This PR also refactors the existing GDPR feedback-record delete-by-user path to use clearer naming and shared identifier validation for the endpoints touched in this change.

Related follow-up: ENG-1013 for full tenant-write serialization during purge.

Fixes: N/A

Example response:

{
  "tenant_id": "org-123",
  "deleted_feedback_records": 42,
  "deleted_embeddings": 17,
  "deleted_webhooks": 3,
  "message": "Successfully deleted tenant data for org-123"
}

How should this be tested?

Commands run:

  • go test ./internal/... ./cmd/api
  • golangci-lint run ./...
  • make lint-openapi
  • git diff --check

Local smoke test:

  1. Run migrations with make init-db.
  2. Start the API locally with valid DATABASE_URL and API_KEY.
  3. Create two feedback records with the same tenant_id.
  4. Call:
curl -i -sS -X DELETE "$HUB_URL/v1/tenants/$TENANT_ID/data" \
  -H "Authorization: Bearer $API_KEY"

Observed response:

{
  "deleted_feedback_records": 2,
  "deleted_embeddings": 2,
  "deleted_webhooks": 0
}
  1. GET /v1/feedback-records/{id} for both created records returned 404 Not Found.

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Repository Guidelines
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand bits
  • Ran make build
  • Ran make tests (integration tests in tests/)
  • Ran make fmt and make lint; no new warnings
  • Removed debug prints / temporary logging
  • Merged the latest changes from main onto my branch with git pull origin main
  • If database schema changed: added migration in migrations/ with goose annotations and ran make migrate-validate

Appreciated

  • If API changed: added or updated OpenAPI spec and ran contract tests (make tests or API contract workflow)
  • If API behavior changed: added request/response examples or Swagger UI screenshots to this PR
  • Updated docs in docs/ if changes were necessary
  • Ran make tests-coverage for meaningful logic changes

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

✱ Stainless preview builds

This PR will update the hub SDKs with the following commit message.

chore: tenant data bulk delete
hub-openapi studio · code

Your SDK build had at least one "note" diagnostic.
generate ✅

hub-typescript studio · code

Your SDK build had at least one "note" diagnostic.
generate ✅build ✅lint ✅test ✅


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-05-22 09:50:35 UTC

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Review Change Stack

Walkthrough

This pull request introduces tenant-scoped data purge capability and refactors feedback records deletion to user-scoped deletion. A new HTTP endpoint DELETE /v1/tenants/{tenant_id}/data purges all feedback records, embeddings, and webhooks for a tenant via transactional database operations. Simultaneously, the feedback-records delete endpoint transitions from generic bulk deletion to user-aware deletion with required user_id and optional tenant_id parameters, both with explicit validation constraints. Shared identifier validation helpers normalize and validate tenant and user IDs across both features. Integration tests and OpenAPI specifications were updated to reflect the new endpoint semantics and response structures.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'chore: tenant data bulk delete' is vague and contradictory—it describes the changes as a 'bulk delete' when the actual implementation emphasizes clearer naming and refactors away from bulk-delete terminology toward 'delete by user' and 'delete by tenant' patterns. Consider a more precise title that reflects the main change: either 'feat: implement tenant data purge endpoint' or 'refactor: rename feedback delete to delete-by-user with tenant data purge' to better represent both the new endpoint and the naming improvements.
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed The PR description is comprehensive and well-structured, covering what the PR does (tenant data purge endpoint), testing instructions, example response, and a completed checklist with most required items marked.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/README.md (1)

40-50: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document tenant-data purge coverage in this list.

The coverage section omits the newly added tenant purge integration test, so the inventory is stale.

Proposed docs update
 The integration tests cover:
@@
 - ✅ Delete feedback record
 - ✅ Delete feedback records by user
+- ✅ Delete tenant data (feedback records, embeddings, webhooks)
 - ✅ Webhook CRUD and validation
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/README.md` around lines 40 - 50, The test inventory in tests/README.md
is missing the newly added tenant-data purge integration test; update the
bulleted list to include an entry like "✅ Tenant-data purge (tenant purge
integration test)" or "✅ Purge feedback records by tenant" so the README
reflects the new test coverage and matches the test name used in your suite
(reference the "tenant purge" / "tenant-data purge integration test" terminology
to locate the corresponding test).
internal/api/handlers/feedback_records_handler.go (1)

253-259: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove raw user_id/tenant_id from error logs in delete-by-user path.

This logs direct identifiers on server errors, which creates avoidable PII exposure in logs. Log only presence/length/hash or omit entirely.

Proposed fix
-		slog.Error("Failed to delete feedback records by user", // `#nosec` G706 -- slog key-values
+		slog.Error("Failed to delete feedback records by user", // `#nosec` G706 -- slog key-values
 			"method", r.Method,
 			"path", r.URL.Path,
-			"user_id", filters.UserID,
-			"tenant_id", tenantID,
+			"has_user_id", filters.UserID != "",
+			"has_tenant_id", tenantID != "",
 			"error", err,
 		)

As per coding guidelines: “Treat tenant_id as a security boundary…” and avoid compliance/privacy risks such as logging sensitive user identifiers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/api/handlers/feedback_records_handler.go` around lines 253 - 259,
The slog.Error call in the delete-by-user path logs raw PII (the "user_id" and
"tenant_id" keys); change that to avoid printing direct identifiers by either
omitting them or replacing them with non-PII representations (for example, log
"user_id_present": true and "user_id_len": len(filters.UserID) or a
deterministic hash like sha256(filters.UserID) and similarly for tenantID) in
the slog.Error invocation; locate the failing block around the slog.Error call
in feedback_records_handler.go (references: filters.UserID, tenantID) and update
the key/value pairs to emit only presence/length/hash or remove the keys
entirely.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/api/app_test.go`:
- Line 265: Add a unit test named
TestNewHTTPServerKeepsTenantDataRoutesProtected that uses newTestHTTPServer(t)
to create the server, sends an unauthenticated DELETE request (via
httptest.NewRequestWithContext) to "/v1/tenants/test-tenant-id/data" and invokes
server.Handler.ServeHTTP(recorder, request), then assert that recorder.Code ==
http.StatusUnauthorized; this mirrors the pattern used in
TestNewHTTPServerKeepsV1RoutesProtected and verifies the auth protection for the
DELETE /v1/tenants/{tenant_id}/data endpoint.

In `@internal/repository/tenant_data_repository.go`:
- Around line 54-77: DeleteByTenant currently runs a transaction that can race
with concurrent tenant-scoped writes (so deletes may miss new rows); before
calling deleteTenantDataInTx acquire a tenant-scoped write lock shared across
all tenant-mutating paths (e.g., implement and call a
LockTenantWrite/UnlockTenantWrite or use a tenant-specific advisory lock
mechanism) to serialize purge and mutating operations for the same tenant, hold
the lock for the duration of the DB transaction (between BeginTx and
Commit/Rollback), and ensure other write paths also use the same lock protocol
when mutating tenant data.

In `@internal/repository/webhooks_repository.go`:
- Around line 31-33: In the Create method, add robust input validation: first
check if req == nil and return huberrors.NewValidationError("tenant_id",
"request is required") (or similar), then check req.TenantID for nil or
empty/whitespace by trimming *req.TenantID and ensuring length > 0; if trimmed
tenant_id is empty return huberrors.NewValidationError("tenant_id", "tenant_id
is required"); update the existing req.TenantID == nil guard to use this
combined nil+empty/whitespace check so tenant_id is treated as a strict security
boundary.

In `@openapi.yaml`:
- Around line 1075-1079: Update the OpenAPI operation with operationId
"delete-tenant-data" so its description explicitly states tenant purge does NOT
emit webhook events: modify the multi-line description block for the
delete-tenant-data operation to add a sentence clarifying that this synchronous,
idempotent purge will not trigger any webhook notifications or callbacks for the
tenant, while keeping the existing details about feedback, embeddings, and
webhooks being deleted.

In `@tests/integration_test.go`:
- Around line 1013-1077: TestDeleteTenantData currently checks DB/API isolation
but doesn't assert that async webhook dispatch produces no deliveries after
purge; add a sink webhook endpoint (using createTenantDataWebhook or a new
helper like createSinkWebhook) that records/returns delivery counts, call DELETE
via deleteTenantData, then assert that the sink received zero deliveries for
tenantA (e.g., check recorded deliveries or poll the sink endpoint), and include
the same zero-delivery check on the repeated delete (repeatedResp) to ensure no
async fan-out occurs; update TestDeleteTenantData to create the sink before the
purge and verify zero webhook deliveries afterwards using the existing
client/server URL and tenantA identifiers.

---

Outside diff comments:
In `@internal/api/handlers/feedback_records_handler.go`:
- Around line 253-259: The slog.Error call in the delete-by-user path logs raw
PII (the "user_id" and "tenant_id" keys); change that to avoid printing direct
identifiers by either omitting them or replacing them with non-PII
representations (for example, log "user_id_present": true and "user_id_len":
len(filters.UserID) or a deterministic hash like sha256(filters.UserID) and
similarly for tenantID) in the slog.Error invocation; locate the failing block
around the slog.Error call in feedback_records_handler.go (references:
filters.UserID, tenantID) and update the key/value pairs to emit only
presence/length/hash or remove the keys entirely.

In `@tests/README.md`:
- Around line 40-50: The test inventory in tests/README.md is missing the newly
added tenant-data purge integration test; update the bulleted list to include an
entry like "✅ Tenant-data purge (tenant purge integration test)" or "✅ Purge
feedback records by tenant" so the README reflects the new test coverage and
matches the test name used in your suite (reference the "tenant purge" /
"tenant-data purge integration test" terminology to locate the corresponding
test).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bbbf14c5-8932-4500-a4af-b4b667a987a4

📥 Commits

Reviewing files that changed from the base of the PR and between 6e7b278 and 6af7d35.

📒 Files selected for processing (22)
  • cmd/api/app.go
  • cmd/api/app_test.go
  • internal/api/handlers/feedback_records_handler.go
  • internal/api/handlers/feedback_records_handler_test.go
  • internal/api/handlers/tenant_data_handler.go
  • internal/api/handlers/tenant_data_handler_test.go
  • internal/models/feedback_records.go
  • internal/models/tenant_data.go
  • internal/repository/feedback_records_repository.go
  • internal/repository/feedback_records_repository_test.go
  • internal/repository/tenant_data_repository.go
  • internal/repository/tenant_data_repository_test.go
  • internal/repository/webhooks_repository.go
  • internal/service/feedback_records_service.go
  • internal/service/feedback_records_service_test.go
  • internal/service/id_validation.go
  • internal/service/tenant_data_service.go
  • internal/service/tenant_data_service_test.go
  • internal/service/tenant_validation.go
  • openapi.yaml
  • tests/README.md
  • tests/integration_test.go
💤 Files with no reviewable changes (1)
  • internal/service/tenant_validation.go

Comment thread cmd/api/app_test.go
Comment thread internal/repository/tenant_data_repository.go
Comment thread internal/repository/webhooks_repository.go Outdated
Comment thread openapi.yaml
Comment thread tests/integration_test.go
@xernobyl xernobyl requested a review from Dhruwang May 20, 2026 13:14
Comment thread internal/repository/tenant_data_repository.go
Comment thread Makefile
@xernobyl xernobyl enabled auto-merge May 22, 2026 09:43
@BhagyaAmarasinghe BhagyaAmarasinghe self-requested a review May 22, 2026 09:47
@xernobyl xernobyl added this pull request to the merge queue May 22, 2026
Merged via the queue into main with commit 1db86a9 May 22, 2026
12 checks passed
@xernobyl xernobyl deleted the chore/tenant-data-bulk-delete branch May 22, 2026 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants