chore: tenant data bulk delete#77
Conversation
✱ Stainless preview buildsThis PR will update the ✅ hub-typescript studio · code
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push. |
WalkthroughThis pull request introduces tenant-scoped data purge capability and refactors feedback records deletion to user-scoped deletion. A new HTTP endpoint 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/README.md (1)
40-50:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument tenant-data purge coverage in this list.
The coverage section omits the newly added tenant purge integration test, so the inventory is stale.
Proposed docs update
The integration tests cover: @@ - ✅ Delete feedback record - ✅ Delete feedback records by user +- ✅ Delete tenant data (feedback records, embeddings, webhooks) - ✅ Webhook CRUD and validation🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/README.md` around lines 40 - 50, The test inventory in tests/README.md is missing the newly added tenant-data purge integration test; update the bulleted list to include an entry like "✅ Tenant-data purge (tenant purge integration test)" or "✅ Purge feedback records by tenant" so the README reflects the new test coverage and matches the test name used in your suite (reference the "tenant purge" / "tenant-data purge integration test" terminology to locate the corresponding test).internal/api/handlers/feedback_records_handler.go (1)
253-259:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove raw
user_id/tenant_idfrom error logs in delete-by-user path.This logs direct identifiers on server errors, which creates avoidable PII exposure in logs. Log only presence/length/hash or omit entirely.
Proposed fix
- slog.Error("Failed to delete feedback records by user", // `#nosec` G706 -- slog key-values + slog.Error("Failed to delete feedback records by user", // `#nosec` G706 -- slog key-values "method", r.Method, "path", r.URL.Path, - "user_id", filters.UserID, - "tenant_id", tenantID, + "has_user_id", filters.UserID != "", + "has_tenant_id", tenantID != "", "error", err, )As per coding guidelines: “Treat
tenant_idas a security boundary…” and avoid compliance/privacy risks such as logging sensitive user identifiers.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/api/handlers/feedback_records_handler.go` around lines 253 - 259, The slog.Error call in the delete-by-user path logs raw PII (the "user_id" and "tenant_id" keys); change that to avoid printing direct identifiers by either omitting them or replacing them with non-PII representations (for example, log "user_id_present": true and "user_id_len": len(filters.UserID) or a deterministic hash like sha256(filters.UserID) and similarly for tenantID) in the slog.Error invocation; locate the failing block around the slog.Error call in feedback_records_handler.go (references: filters.UserID, tenantID) and update the key/value pairs to emit only presence/length/hash or remove the keys entirely.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/api/app_test.go`:
- Line 265: Add a unit test named
TestNewHTTPServerKeepsTenantDataRoutesProtected that uses newTestHTTPServer(t)
to create the server, sends an unauthenticated DELETE request (via
httptest.NewRequestWithContext) to "/v1/tenants/test-tenant-id/data" and invokes
server.Handler.ServeHTTP(recorder, request), then assert that recorder.Code ==
http.StatusUnauthorized; this mirrors the pattern used in
TestNewHTTPServerKeepsV1RoutesProtected and verifies the auth protection for the
DELETE /v1/tenants/{tenant_id}/data endpoint.
In `@internal/repository/tenant_data_repository.go`:
- Around line 54-77: DeleteByTenant currently runs a transaction that can race
with concurrent tenant-scoped writes (so deletes may miss new rows); before
calling deleteTenantDataInTx acquire a tenant-scoped write lock shared across
all tenant-mutating paths (e.g., implement and call a
LockTenantWrite/UnlockTenantWrite or use a tenant-specific advisory lock
mechanism) to serialize purge and mutating operations for the same tenant, hold
the lock for the duration of the DB transaction (between BeginTx and
Commit/Rollback), and ensure other write paths also use the same lock protocol
when mutating tenant data.
In `@internal/repository/webhooks_repository.go`:
- Around line 31-33: In the Create method, add robust input validation: first
check if req == nil and return huberrors.NewValidationError("tenant_id",
"request is required") (or similar), then check req.TenantID for nil or
empty/whitespace by trimming *req.TenantID and ensuring length > 0; if trimmed
tenant_id is empty return huberrors.NewValidationError("tenant_id", "tenant_id
is required"); update the existing req.TenantID == nil guard to use this
combined nil+empty/whitespace check so tenant_id is treated as a strict security
boundary.
In `@openapi.yaml`:
- Around line 1075-1079: Update the OpenAPI operation with operationId
"delete-tenant-data" so its description explicitly states tenant purge does NOT
emit webhook events: modify the multi-line description block for the
delete-tenant-data operation to add a sentence clarifying that this synchronous,
idempotent purge will not trigger any webhook notifications or callbacks for the
tenant, while keeping the existing details about feedback, embeddings, and
webhooks being deleted.
In `@tests/integration_test.go`:
- Around line 1013-1077: TestDeleteTenantData currently checks DB/API isolation
but doesn't assert that async webhook dispatch produces no deliveries after
purge; add a sink webhook endpoint (using createTenantDataWebhook or a new
helper like createSinkWebhook) that records/returns delivery counts, call DELETE
via deleteTenantData, then assert that the sink received zero deliveries for
tenantA (e.g., check recorded deliveries or poll the sink endpoint), and include
the same zero-delivery check on the repeated delete (repeatedResp) to ensure no
async fan-out occurs; update TestDeleteTenantData to create the sink before the
purge and verify zero webhook deliveries afterwards using the existing
client/server URL and tenantA identifiers.
---
Outside diff comments:
In `@internal/api/handlers/feedback_records_handler.go`:
- Around line 253-259: The slog.Error call in the delete-by-user path logs raw
PII (the "user_id" and "tenant_id" keys); change that to avoid printing direct
identifiers by either omitting them or replacing them with non-PII
representations (for example, log "user_id_present": true and "user_id_len":
len(filters.UserID) or a deterministic hash like sha256(filters.UserID) and
similarly for tenantID) in the slog.Error invocation; locate the failing block
around the slog.Error call in feedback_records_handler.go (references:
filters.UserID, tenantID) and update the key/value pairs to emit only
presence/length/hash or remove the keys entirely.
In `@tests/README.md`:
- Around line 40-50: The test inventory in tests/README.md is missing the newly
added tenant-data purge integration test; update the bulleted list to include an
entry like "✅ Tenant-data purge (tenant purge integration test)" or "✅ Purge
feedback records by tenant" so the README reflects the new test coverage and
matches the test name used in your suite (reference the "tenant purge" /
"tenant-data purge integration test" terminology to locate the corresponding
test).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bbbf14c5-8932-4500-a4af-b4b667a987a4
📒 Files selected for processing (22)
cmd/api/app.gocmd/api/app_test.gointernal/api/handlers/feedback_records_handler.gointernal/api/handlers/feedback_records_handler_test.gointernal/api/handlers/tenant_data_handler.gointernal/api/handlers/tenant_data_handler_test.gointernal/models/feedback_records.gointernal/models/tenant_data.gointernal/repository/feedback_records_repository.gointernal/repository/feedback_records_repository_test.gointernal/repository/tenant_data_repository.gointernal/repository/tenant_data_repository_test.gointernal/repository/webhooks_repository.gointernal/service/feedback_records_service.gointernal/service/feedback_records_service_test.gointernal/service/id_validation.gointernal/service/tenant_data_service.gointernal/service/tenant_data_service_test.gointernal/service/tenant_validation.goopenapi.yamltests/README.mdtests/integration_test.go
💤 Files with no reviewable changes (1)
- internal/service/tenant_validation.go
What does this PR do?
Fixes ENG-1010
Implements tenant data purge for Hub:
DELETE /v1/tenants/{tenant_id}/dataThe endpoint deletes all current tenant-owned Hub data for the requested tenant:
embeddingsderived from matching feedback recordsfeedback_recordswebhooksIt returns per-resource delete counts, is idempotent, and does not publish webhook events during purge.
This PR also refactors the existing GDPR feedback-record delete-by-user path to use clearer naming and shared identifier validation for the endpoints touched in this change.
Related follow-up: ENG-1013 for full tenant-write serialization during purge.
Fixes: N/A
Example response:
{ "tenant_id": "org-123", "deleted_feedback_records": 42, "deleted_embeddings": 17, "deleted_webhooks": 3, "message": "Successfully deleted tenant data for org-123" }How should this be tested?
Commands run:
go test ./internal/... ./cmd/apigolangci-lint run ./...make lint-openapigit diff --checkLocal smoke test:
make init-db.DATABASE_URLandAPI_KEY.tenant_id.Observed response:
{ "deleted_feedback_records": 2, "deleted_embeddings": 2, "deleted_webhooks": 0 }GET /v1/feedback-records/{id}for both created records returned404 Not Found.Checklist
Required
make buildmake tests(integration tests intests/)make fmtandmake lint; no new warningsgit pull origin mainmigrations/with goose annotations and ranmake migrate-validateAppreciated
make testsor API contract workflow)docs/if changes were necessarymake tests-coveragefor meaningful logic changes