Ft/import export #567

Open
kelvinkipruto wants to merge 12 commits into main from ft/import-export

Conversation

@kelvinkipruto
Contributor

Description

Add ability to export AI extractions as CSV.

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

Enable data import/export functionality for the "promises" and "ai-extractions" collections via the Payload CMS import-export plugin. This provides administrators with CSV/JSON export capabilities and data import functionality directly from the admin interface.
Introduce a new collection `ai-extraction-export-rows` to provide flattened export data for AI extractions, enabling CSV exports via the import-export plugin. Add hooks to related collections (AIExtractions, Documents, PoliticalEntities, Tenants) to automatically synchronize export rows on changes. Include a migration to backfill existing data, a scheduled task for rebuilding, and comprehensive unit tests.

Update package dependencies and fix import ordering.
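The hook-plus-queue wiring described above can be sketched roughly as follows. This is a minimal sketch: the task slug, input shape, and `JobQueue` interface are stand-ins for Payload's job APIs, not the PR's actual code.

```typescript
// Sketch of the deferred-sync pattern: an afterChange hook queues a sync job
// instead of rebuilding export rows inline. `JobQueue` stands in for
// Payload's payload.jobs; the task slug and input shape are assumptions.
type SyncInput = { scope: "extraction"; extractionId: string };

interface JobQueue {
  queue(task: string, input: SyncInput): Promise<void>;
}

// Hypothetical hook body: defer the export-row rebuild to a background
// worker so the originating write is not blocked.
async function afterExtractionChange(
  jobs: JobQueue,
  doc: { id: string },
): Promise<void> {
  await jobs.queue("syncAIExtractionExportRows", {
    scope: "extraction",
    extractionId: doc.id,
  });
}

// Minimal in-memory queue to demonstrate the call shape.
class MemoryQueue implements JobQueue {
  entries: Array<{ task: string; input: SyncInput }> = [];
  async queue(task: string, input: SyncInput): Promise<void> {
    this.entries.push({ task, input });
  }
}
```

The same shape would apply to the Documents, PoliticalEntities, and Tenants hooks, with scope-specific inputs.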
@vercel

vercel Bot commented Apr 23, 2026

The latest updates on your projects.

Project: PromiseTracker | Status: Ready | Actions: Preview, Comment | Updated (UTC): Apr 30, 2026 8:16pm


This workflow was using a third-party action that is no longer needed.
Removing it simplifies our CI setup and reduces dependency on external services.
@kelvinkipruto
Contributor Author

@claude @codex Review

@claude

claude Bot commented Apr 23, 2026

Claude encountered an error — View job


I'll analyze this and get back to you.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dfa0ba191

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/lib/aiExtractionExportRows.ts Outdated
Comment thread src/lib/aiExtractionExportRows.ts
Add hooks to PromiseStatus collection to trigger resync of AI extraction export rows when a status is updated or deleted. This ensures data consistency between promise statuses and their associated export rows.

Implement syncAIExtractionExportRowsForStatus function that finds all export rows linked to a status, identifies unique AI extractions, and triggers resync for each. Update test to verify deduplication behavior and proper payload calls.

Also fix unique key generation to use Payload array row IDs instead of uniqueId to handle duplicate uniqueId values, and remove unnecessary eslint-disable comment.
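The deduplication step can be illustrated with a small pure helper. This is a sketch: the row shape and helper name are assumptions, not the PR's actual code.

```typescript
// Given export rows linked to a status, collect the unique AI extraction IDs
// so each extraction is resynced exactly once, even when several rows point
// at the same extraction.
interface ExportRow {
  aiExtraction: string | { id: string };
}

function uniqueExtractionIds(rows: ExportRow[]): string[] {
  const ids = new Set<string>();
  for (const row of rows) {
    // Relationships may arrive populated (object) or unpopulated (plain ID).
    const id =
      typeof row.aiExtraction === "string"
        ? row.aiExtraction
        : row.aiExtraction.id;
    ids.add(id);
  }
  return Array.from(ids);
}
```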
@kelvinkipruto
Contributor Author

@codex @claude Review

@claude

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

Replace find-then-delete loop with a single delete operation using where clause to improve performance and reduce database round trips.
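The change can be sketched like this. Hedged: `Db.deleteWhere` is a stand-in client, not Payload's actual API; only the where-clause shape mirrors Payload's query syntax.

```typescript
// Before: find matching rows, then delete each by ID (N+1 round trips).
// After: one bulk delete with a where clause. `Db` is a hypothetical client.
interface Db {
  deleteWhere(collection: string, where: Record<string, unknown>): Promise<number>;
}

async function deleteRowsForExtraction(
  db: Db,
  extractionId: string,
): Promise<number> {
  // Single round trip instead of find + N per-row deletes.
  return db.deleteWhere("ai-extraction-export-rows", {
    aiExtraction: { equals: extractionId },
  });
}
```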
@kelvinkipruto
Contributor Author

@claude @codex Review

@claude

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@kilemensi
Member

Is this still here @kelvinkipruto ?

…r export row sync

- Wrap all export row sync/delete operations in try-catch blocks with proper error logging
- Add background job queuing for tenant and political entity sync operations to improve performance
- Pass request context through all sync operations for proper access control
- Add indexes to frequently queried fields in AIExtractionExportRows collection
- Extend sync task to support tenant and political entity scoped operations
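The try-catch wrapping described in the first bullet can be factored as a small higher-order helper. This sketch assumes errors should be logged and swallowed rather than rethrown; `withErrorLogging` is an illustrative name, and `log` stands in for Payload's logger.

```typescript
// Wrap a sync/delete operation so failures are logged instead of aborting
// the surrounding collection hook (and the user's write with it).
async function withErrorLogging<T>(
  label: string,
  op: () => Promise<T>,
  log: (msg: string) => void,
): Promise<T | undefined> {
  try {
    return await op();
  } catch (err) {
    log(`${label} failed: ${err instanceof Error ? err.message : String(err)}`);
    return undefined;
  }
}
```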
@kelvinkipruto
Contributor Author

@claude @codex Review

@claude

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

Replace direct sync calls in collection hooks with job queue to defer processing
Add batching utility for large collections to prevent memory issues
Update rebuild function to delete stale rows instead of clearing all rows
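The batching utility might look something like this minimal sketch (the function name is hypothetical):

```typescript
// Split a large ID list into fixed-size batches so each batch can be
// processed, and its intermediate results garbage-collected, independently.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("batch size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```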
@kelvinkipruto
Contributor Author

@codex @claude Review

@claude

This comment was marked as resolved.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 848caca6c2

Comment thread src/lib/aiExtractionExportRows.ts
Comment thread src/collections/Tenant/index.ts Outdated
@kelvinkipruto
Contributor Author

P2 — Export rows are not tenant-scoped for reads

This is intended

…d optimize deletion

- Queue sync tasks for tenant and promise status changes/delete operations instead of direct execution
- Add support for status-scoped sync in task handler
- Optimize stale row deletion in rebuildAllAIExtractionExportRows to use bulk delete operations
- Improve error logging messages to reflect queuing behavior
@kelvinkipruto
Contributor Author

@claude @codex review

@claude

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

…t rows

Add early return when no AI extractions are found during rebuild to avoid
unnecessary count and delete operations. Also replace queued job with direct
function call for tenant deletion cleanup to ensure immediate removal of
associated export rows.
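A compressed sketch of the early-return shape, with `Store` standing in for the Payload local API and the per-extraction sync elided:

```typescript
// When the source collection is empty there is nothing to sync, so the
// rebuild skips the stale-row count and delete entirely.
interface Store {
  countDocs(collection: string): Promise<number>;
  deleteStaleRows(syncedIds: string[]): Promise<number>;
}

async function rebuildExportRows(
  store: Store,
  syncExtractions: () => Promise<string[]>, // returns synced extraction IDs
): Promise<"skipped" | "rebuilt"> {
  if ((await store.countDocs("ai-extractions")) === 0) {
    return "skipped"; // early return: no extractions, no rows to count or delete
  }
  const syncedIds = await syncExtractions();
  await store.deleteStaleRows(syncedIds);
  return "rebuilt";
}
```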
@kelvinkipruto
Contributor Author

@claude Review

@claude

This comment was marked as resolved.

Move inline job queueing logic from collection configs to dedicated hook files
Extract shared type definitions and utility functions to central module
Add new PromiseStatus collection with sync hooks
Create new Documents and AIExtractions collections with sync hooks
@kelvinkipruto
Contributor Author

@claude Review

@claude

This comment was marked as resolved.

Add input validation to ensure scoped sync operations provide the required identifier. Refuse execution when scope is specified but the corresponding ID is missing or when an unsupported scope is provided.

Simplify conditional logic in rebuildAllAIExtractionExportRows by removing unnecessary ternary operators, as the condition always evaluates to true in practice.
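The scope-to-required-ID validation described here could be sketched as follows. The scope names mirror the ones mentioned in this PR, but the mapping and helper are illustrative:

```typescript
// Each scoped sync must carry the ID its scope requires; refuse unsupported
// scopes and missing IDs up front. An unscoped input means a full rebuild.
type Scope = "extraction" | "tenant" | "politicalEntity" | "status";

const REQUIRED_ID: Record<Scope, string> = {
  extraction: "extractionId",
  tenant: "tenantId",
  politicalEntity: "politicalEntityId",
  status: "statusId",
};

// Returns an error message, or null when the input is acceptable.
function scopedInputError(input: {
  scope?: string;
  [key: string]: unknown;
}): string | null {
  if (!input.scope) return null; // unscoped = full rebuild, no ID required
  const field = REQUIRED_ID[input.scope as Scope] as string | undefined;
  if (!field) return `unsupported scope: ${input.scope}`;
  if (!input[field]) return `scope "${input.scope}" requires ${field}`;
  return null;
}
```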
@kelvinkipruto
Contributor Author

@claude @codex Review

@claude

This comment was marked as resolved.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1eb30896aa

Comment thread src/lib/aiExtractionExportRows.ts Outdated
… export rows

Improve the rebuildAllAIExtractionExportRows function to only delete export rows for AI extractions that are confirmed missing, rather than using a simple not_in check against the synced IDs. This prevents race conditions where export rows created after the rebuild scan started would be incorrectly deleted.

Add helper functions aiExtractionExists and findOrphanedAIExtractionIds to verify each candidate AI extraction still exists in the database before marking its export rows for deletion. Also add error logging wrappers to all sync functions to improve debuggability.
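The confirm-before-delete idea can be sketched with the existence check injected. `findOrphanedIds` is an illustrative stand-in for the PR's `findOrphanedAIExtractionIds`; the real version would call `payload.findByID` per candidate.

```typescript
// Only mark an extraction's export rows for deletion once its absence is
// confirmed, avoiding a race with rows created after the rebuild scan began.
async function findOrphanedIds(
  candidateIds: string[],
  exists: (id: string) => Promise<boolean>,
): Promise<string[]> {
  const orphaned: string[] = [];
  for (const id of candidateIds) {
    // Re-check each candidate individually instead of trusting a stale
    // not_in snapshot taken at the start of the rebuild.
    if (!(await exists(id))) orphaned.push(id);
  }
  return orphaned;
}
```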
@kelvinkipruto
Contributor Author

@claude Review

@claude

claude Bot commented Apr 30, 2026

Claude finished @kelvinkipruto's task in 2m 40s — View job


Code Review

  • Read CLAUDE.md and repo guidelines
  • Read core library: src/lib/aiExtractionExportRows.ts
  • Read collection: src/collections/AIExtractionExportRows.ts
  • Read task: src/tasks/syncAIExtractionExportRows.ts
  • Read hook files (AIExtractions, Documents, PoliticalEntities, PromiseStatus, Tenant)
  • Read migration and tests
  • Analyze and post review

This is a solid implementation overall. The job-queue pattern for async sync, pure builder function, and per-extraction error isolation are all good design choices. Several previous P1 issues from earlier Codex reviews have been fixed (row key now correctly uses extraction.id first, PromiseStatus hooks are wired, status-sync errors are caught per-extraction).

Below are the remaining issues I found.


P1 — Deleted claude-code-review.yml removes automatic PR reviews

The deletion of .github/workflows/claude-code-review.yml means new PRs no longer get automatic Claude reviews on open/synchronize events. Only explicit @claude comment mentions now trigger reviews, weakening the default quality gate. I cannot modify workflow files, but this should be restored or replaced intentionally.


P1 — Orphaned export rows after tenant deletion with cleanupAfterTenantDelete: false

src/collections/Tenant/hooks/index.ts:31 and src/plugins/index.ts:57

The multi-tenant plugin is configured with cleanupAfterTenantDelete: false, so political-entities, documents, and ai-extractions remain in the database after a tenant is deleted. However, deleteAIExtractionExportRowsAfterTenantDelete deletes all export rows for that tenant. This leaves the underlying ai-extractions documents without export rows permanently — they won't be re-synced unless a manual full rebuild is triggered. If tenant deletion is expected to be reversible or if orphaned extractions are ever re-assigned, the export rows will be missing until a manual rebuild.

If tenant deletion truly means "delete everything for that tenant", this hook is fine but cleanupAfterTenantDelete: false should probably be true. If orphaned data is intentional, consider queuing a rebuild instead of deleting export rows on tenant delete.

Fix this →


P2 — Single long-running job for tenant-scoped sync may timeout

src/lib/aiExtractionExportRows.ts:561

syncAIExtractionExportRowsForTenant iterates every political entity → every document → every AI extraction in a single job. For a large tenant this can take many minutes inside a single Payload job worker, risking job timeout. Consider either:

  • Breaking the tenant sync into per-political-entity jobs (fan-out), or
  • Adding a configurable batch size guard with early-exit/continuation logic

Fix this →
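The first option (fan-out) might be sketched as follows; the task slug, input shape, and `FanOutQueue` interface are assumptions standing in for Payload's job APIs:

```typescript
// Instead of one long tenant-wide job, queue one job per political entity
// so each worker invocation stays small and under any job timeout.
interface FanOutQueue {
  queue(
    task: string,
    input: { scope: "politicalEntity"; politicalEntityId: string },
  ): Promise<void>;
}

async function fanOutTenantSync(
  queue: FanOutQueue,
  politicalEntityIds: string[],
): Promise<number> {
  for (const politicalEntityId of politicalEntityIds) {
    await queue.queue("syncAIExtractionExportRows", {
      scope: "politicalEntity",
      politicalEntityId,
    });
  }
  return politicalEntityIds.length;
}
```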


P2 — depth: 3 on every sync is expensive

src/lib/aiExtractionExportRows.ts:349

const extractionDoc = await payload.findByID({
  collection: "ai-extractions",
  depth: 3,
  ...

depth: 3 populates document → politicalEntity → tenant plus all other fields at each level (media, other relationships, etc.). For collections with many relationship fields at each level, this over-fetches significantly. Using depth: 0 and fetching each relationship explicitly with targeted selects would reduce DB load at scale.
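The suggested depth-0 alternative might look roughly like this. A sketch only: `Fetcher` stands in for the Payload local API, and the collection slugs and relationship field names are assumptions about this codebase.

```typescript
// Load the extraction shallow (relationships as plain IDs), then resolve
// the document -> politicalEntity chain explicitly, avoiding the over-fetch
// that depth: 3 causes on relationship-heavy collections.
interface Fetcher {
  findByID(args: {
    collection: string;
    id: string;
    depth: number;
  }): Promise<Record<string, unknown>>;
}

async function loadExtractionShallow(db: Fetcher, id: string) {
  const extraction = await db.findByID({ collection: "ai-extractions", id, depth: 0 });
  const document = await db.findByID({
    collection: "documents",
    id: extraction.document as string,
    depth: 0,
  });
  const politicalEntity = await db.findByID({
    collection: "political-entities",
    id: document.politicalEntity as string,
    depth: 0,
  });
  // The tenant could be resolved the same way from politicalEntity.tenant.
  return { extraction, document, politicalEntity };
}
```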


P2 — Migration down() silently wipes all export rows

migrations/20260423_000000_backfill_ai_extraction_export_rows.ts:11

await payload.delete({
  collection: AI_EXTRACTION_EXPORT_ROWS_COLLECTION,
  where: {},
});

This deletes every row in the collection. While likely intentional for a rollback, where: {} is the maximum blast-radius delete with no guard. A comment explaining this is intentional would prevent future confusion. Also, depending on the Payload adapter, an empty where clause may behave differently (e.g., some adapters may reject it). Consider where: { id: { exists: true } } as a more explicit variant.
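The more explicit variant could be sketched as follows, with `deleteWhere` standing in for `payload.delete`:

```typescript
// Rollback delete with an explicit clause instead of an empty `where: {}`.
// Intentional full wipe of the derived collection on migration rollback;
// `{ id: { exists: true } }` matches every row but says so unambiguously.
async function rollbackExportRows(
  deleteWhere: (
    collection: string,
    where: Record<string, unknown>,
  ) => Promise<number>,
): Promise<number> {
  return deleteWhere("ai-extraction-export-rows", { id: { exists: true } });
}
```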


Observation — statusLabel is locale-dependent

src/lib/aiExtractionExportRows.ts:193

PromiseStatus.label has localized: true. When findByID is called from a job worker, the locale depends on the worker's request context (likely the default locale). This is actually consistent and deterministic for background jobs, and the author has confirmed this is intentional. Just noting it here so reviewers are aware that statusLabel in CSV exports will always reflect the default locale regardless of tenant locale.


Minor — isSyncExportRowsInput type guard is a no-op

src/tasks/syncAIExtractionExportRows.ts:13

const isSyncExportRowsInput = (value: unknown): value is SyncAIExtractionExportRowsInput =>
  typeof value === "object" && value !== null;

This accepts any non-null object, including { foo: "bar" }. The narrowing gives false confidence. The actual validation work is done by scopedInputIsMissingRequiredId. This is harmless but could be dropped in favor of a simple input ?? {} fallback.
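A stricter guard might look like this sketch. Illustrative only; the deeper required-ID validation would still live in `scopedInputIsMissingRequiredId`.

```typescript
// Still permissive about extra keys, but rejects unrecognized scope values
// and non-string extraction IDs up front instead of accepting any object.
const SCOPES = ["extraction", "tenant", "politicalEntity", "status"] as const;

interface SyncJobInput {
  scope?: (typeof SCOPES)[number];
  extractionId?: string;
}

function isSyncInput(value: unknown): value is SyncJobInput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.scope !== undefined && !SCOPES.includes(v.scope as (typeof SCOPES)[number])) {
    return false;
  }
  if (v.extractionId !== undefined && typeof v.extractionId !== "string") return false;
  return true;
}
```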


What's already fixed vs. earlier Codex feedback

  • getExtractionRowId using uniqueId as key: ✅ Fixed — now uses extraction.id first via getId(extraction)
  • PromiseStatus changes not syncing export rows: ✅ Fixed — PromiseStatus hooks are wired
  • Transaction ID not threaded through: ✅ N/A — sync is now async via job queue, so it correctly runs outside the originating transaction
  • Status-scoped sync aborting on missing extraction: ✅ Fixed — each extraction sync is wrapped in try/catch
  • Locale nondeterminism: Intentional per author
  • Tenant rows deleted vs. rebuilt: Still unresolved (P1 above)

