Ft/import export #567

Open
kelvinkipruto wants to merge 12 commits into main from ft/import-export

Conversation

@kelvinkipruto
Contributor

Description

Add ability to export AI extractions as CSV.

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

Enable data import/export functionality for the "promises" and "ai-extractions" collections via the Payload CMS import-export plugin. This provides administrators with CSV/JSON export capabilities and data import functionality directly from the admin interface.
Introduce a new collection `ai-extraction-export-rows` to provide flattened export data for AI extractions, enabling CSV exports via the import-export plugin. Add hooks to related collections (AIExtractions, Documents, PoliticalEntities, Tenants) to automatically synchronize export rows on changes. Include a migration to backfill existing data, a scheduled task for rebuilding, and comprehensive unit tests.

Update package dependencies and fix import ordering.
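The hook-plus-queue wiring described above can be sketched roughly as follows. This is a minimal sketch: the task slug, input shape, and `JobQueue` interface are stand-ins for Payload's job APIs, not the PR's actual code.

```typescript
// Sketch of the deferred-sync pattern: an afterChange hook queues a sync job
// instead of rebuilding export rows inline. `JobQueue` stands in for
// Payload's payload.jobs; the task slug and input shape are assumptions.
type SyncInput = { scope: "extraction"; extractionId: string };

interface JobQueue {
  queue(task: string, input: SyncInput): Promise<void>;
}

// Hypothetical hook body: defer the export-row rebuild to a background
// worker so the originating write is not blocked.
async function afterExtractionChange(
  jobs: JobQueue,
  doc: { id: string },
): Promise<void> {
  await jobs.queue("syncAIExtractionExportRows", {
    scope: "extraction",
    extractionId: doc.id,
  });
}

// Minimal in-memory queue to demonstrate the call shape.
class MemoryQueue implements JobQueue {
  entries: Array<{ task: string; input: SyncInput }> = [];
  async queue(task: string, input: SyncInput): Promise<void> {
    this.entries.push({ task, input });
  }
}
```

The same shape would apply to the Documents, PoliticalEntities, and Tenants hooks, with scope-specific inputs.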
@vercel

vercel Bot commented Apr 23, 2026

The latest updates on your projects.

Project: PromiseTracker | Status: Ready | Actions: Preview, Comment | Updated (UTC): Apr 30, 2026 8:16pm


This workflow was using a third-party action that is no longer needed.
Removing it simplifies our CI setup and reduces dependency on external services.
@kelvinkipruto
Contributor Author

@claude @codex Review

@claude

claude Bot commented Apr 23, 2026

Claude encountered an error — View job


I'll analyze this and get back to you.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dfa0ba191

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/lib/aiExtractionExportRows.ts Outdated
Comment thread src/lib/aiExtractionExportRows.ts
Add hooks to PromiseStatus collection to trigger resync of AI extraction export rows when a status is updated or deleted. This ensures data consistency between promise statuses and their associated export rows.

Implement syncAIExtractionExportRowsForStatus function that finds all export rows linked to a status, identifies unique AI extractions, and triggers resync for each. Update test to verify deduplication behavior and proper payload calls.

Also fix unique key generation to use Payload array row IDs instead of uniqueId to handle duplicate uniqueId values, and remove unnecessary eslint-disable comment.
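The deduplication step can be illustrated with a small pure helper. This is a sketch: the row shape and helper name are assumptions, not the PR's actual code.

```typescript
// Given export rows linked to a status, collect the unique AI extraction IDs
// so each extraction is resynced exactly once, even when several rows point
// at the same extraction.
interface ExportRow {
  aiExtraction: string | { id: string };
}

function uniqueExtractionIds(rows: ExportRow[]): string[] {
  const ids = new Set<string>();
  for (const row of rows) {
    // Relationships may arrive populated (object) or unpopulated (plain ID).
    const id =
      typeof row.aiExtraction === "string"
        ? row.aiExtraction
        : row.aiExtraction.id;
    ids.add(id);
  }
  return Array.from(ids);
}
```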
@kelvinkipruto
Contributor Author

@codex @claude Review

@claude

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

Replace find-then-delete loop with a single delete operation using where clause to improve performance and reduce database round trips.
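The change can be sketched like this. Hedged: `Db.deleteWhere` is a stand-in client, not Payload's actual API; only the where-clause shape mirrors Payload's query syntax.

```typescript
// Before: find matching rows, then delete each by ID (N+1 round trips).
// After: one bulk delete with a where clause. `Db` is a hypothetical client.
interface Db {
  deleteWhere(collection: string, where: Record<string, unknown>): Promise<number>;
}

async function deleteRowsForExtraction(
  db: Db,
  extractionId: string,
): Promise<number> {
  // Single round trip instead of find + N per-row deletes.
  return db.deleteWhere("ai-extraction-export-rows", {
    aiExtraction: { equals: extractionId },
  });
}
```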
@kelvinkipruto
Contributor Author

@claude @codex Review

@claude

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@kilemensi
Member

Is this still here @kelvinkipruto ?

…r export row sync

- Wrap all export row sync/delete operations in try-catch blocks with proper error logging
- Add background job queuing for tenant and political entity sync operations to improve performance
- Pass request context through all sync operations for proper access control
- Add indexes to frequently queried fields in AIExtractionExportRows collection
- Extend sync task to support tenant and political entity scoped operations
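The try-catch wrapping described in the first bullet can be factored as a small higher-order helper. This sketch assumes errors should be logged and swallowed rather than rethrown; `withErrorLogging` is an illustrative name, and `log` stands in for Payload's logger.

```typescript
// Wrap a sync/delete operation so failures are logged instead of aborting
// the surrounding collection hook (and the user's write with it).
async function withErrorLogging<T>(
  label: string,
  op: () => Promise<T>,
  log: (msg: string) => void,
): Promise<T | undefined> {
  try {
    return await op();
  } catch (err) {
    log(`${label} failed: ${err instanceof Error ? err.message : String(err)}`);
    return undefined;
  }
}
```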
@kelvinkipruto
Contributor Author

@claude @codex Review

@claude

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

Replace direct sync calls in collection hooks with job queue to defer processing
Add batching utility for large collections to prevent memory issues
Update rebuild function to delete stale rows instead of clearing all rows
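The batching utility might look something like this minimal sketch (the function name is hypothetical):

```typescript
// Split a large ID list into fixed-size batches so each batch can be
// processed, and its intermediate results garbage-collected, independently.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("batch size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```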
@kelvinkipruto
Contributor Author

@codex @claude Review

@claude

This comment was marked as resolved.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 848caca6c2

Comment thread src/lib/aiExtractionExportRows.ts
Comment thread src/collections/Tenant/index.ts Outdated
@kelvinkipruto
Contributor Author

P2 — Export rows are not tenant-scoped for reads

This is intended

…d optimize deletion

- Queue sync tasks for tenant and promise status changes/delete operations instead of direct execution
- Add support for status-scoped sync in task handler
- Optimize stale row deletion in rebuildAllAIExtractionExportRows to use bulk delete operations
- Improve error logging messages to reflect queuing behavior
@kelvinkipruto
Contributor Author

@claude @codex review

@claude

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

…t rows

Add early return when no AI extractions are found during rebuild to avoid
unnecessary count and delete operations. Also replace queued job with direct
function call for tenant deletion cleanup to ensure immediate removal of
associated export rows.
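A compressed sketch of the early-return shape, with `Store` standing in for the Payload local API and the per-extraction sync elided:

```typescript
// When the source collection is empty there is nothing to sync, so the
// rebuild skips the stale-row count and delete entirely.
interface Store {
  countDocs(collection: string): Promise<number>;
  deleteStaleRows(syncedIds: string[]): Promise<number>;
}

async function rebuildExportRows(
  store: Store,
  syncExtractions: () => Promise<string[]>, // returns synced extraction IDs
): Promise<"skipped" | "rebuilt"> {
  if ((await store.countDocs("ai-extractions")) === 0) {
    return "skipped"; // early return: no extractions, no rows to count or delete
  }
  const syncedIds = await syncExtractions();
  await store.deleteStaleRows(syncedIds);
  return "rebuilt";
}
```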
@kelvinkipruto
Contributor Author

@claude Review

@claude

This comment was marked as resolved.

Move inline job queueing logic from collection configs to dedicated hook files
Extract shared type definitions and utility functions to central module
Add new PromiseStatus collection with sync hooks
Create new Documents and AIExtractions collections with sync hooks
@kelvinkipruto
Contributor Author

@claude Review

@claude

This comment was marked as resolved.

Add input validation to ensure scoped sync operations provide the required identifier. Refuse execution when scope is specified but the corresponding ID is missing or when an unsupported scope is provided.

Simplify conditional logic in rebuildAllAIExtractionExportRows by removing unnecessary ternary operators, as the condition always evaluates to true in practice.
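The scope-to-required-ID validation described here could be sketched as follows. The scope names mirror the ones mentioned in this PR, but the mapping and helper are illustrative:

```typescript
// Each scoped sync must carry the ID its scope requires; refuse unsupported
// scopes and missing IDs up front. An unscoped input means a full rebuild.
type Scope = "extraction" | "tenant" | "politicalEntity" | "status";

const REQUIRED_ID: Record<Scope, string> = {
  extraction: "extractionId",
  tenant: "tenantId",
  politicalEntity: "politicalEntityId",
  status: "statusId",
};

// Returns an error message, or null when the input is acceptable.
function scopedInputError(input: {
  scope?: string;
  [key: string]: unknown;
}): string | null {
  if (!input.scope) return null; // unscoped = full rebuild, no ID required
  const field = REQUIRED_ID[input.scope as Scope] as string | undefined;
  if (!field) return `unsupported scope: ${input.scope}`;
  if (!input[field]) return `scope "${input.scope}" requires ${field}`;
  return null;
}
```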
@kelvinkipruto
Contributor Author

@claude @codex Review

@claude

This comment was marked as resolved.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1eb30896aa

Comment thread src/lib/aiExtractionExportRows.ts Outdated
… export rows

Improve the rebuildAllAIExtractionExportRows function to only delete export rows for AI extractions that are confirmed missing, rather than using a simple not_in check against the synced IDs. This prevents race conditions where export rows created after the rebuild scan started would be incorrectly deleted.

Add helper functions aiExtractionExists and findOrphanedAIExtractionIds to verify each candidate AI extraction still exists in the database before marking its export rows for deletion. Also add error logging wrappers to all sync functions to improve debuggability.
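The confirm-before-delete idea can be sketched with the existence check injected. `findOrphanedIds` is an illustrative stand-in for the PR's `findOrphanedAIExtractionIds`; the real version would call `payload.findByID` per candidate.

```typescript
// Only mark an extraction's export rows for deletion once its absence is
// confirmed, avoiding a race with rows created after the rebuild scan began.
async function findOrphanedIds(
  candidateIds: string[],
  exists: (id: string) => Promise<boolean>,
): Promise<string[]> {
  const orphaned: string[] = [];
  for (const id of candidateIds) {
    // Re-check each candidate individually instead of trusting a stale
    // not_in snapshot taken at the start of the rebuild.
    if (!(await exists(id))) orphaned.push(id);
  }
  return orphaned;
}
```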
@kelvinkipruto
Contributor Author

@claude Review

@claude

claude Bot commented Apr 30, 2026

Claude finished @kelvinkipruto's task in 2m 40s — View job


Code Review

  • Read CLAUDE.md and repo guidelines
  • Read core library: src/lib/aiExtractionExportRows.ts
  • Read collection: src/collections/AIExtractionExportRows.ts
  • Read task: src/tasks/syncAIExtractionExportRows.ts
  • Read hook files (AIExtractions, Documents, PoliticalEntities, PromiseStatus, Tenant)
  • Read migration and tests
  • Analyze and post review

This is a solid implementation overall. The job-queue pattern for async sync, pure builder function, and per-extraction error isolation are all good design choices. Several previous P1 issues from earlier Codex reviews have been fixed (row key now correctly uses extraction.id first, PromiseStatus hooks are wired, status-sync errors are caught per-extraction).

Below are the remaining issues I found.


P1 — Deleted claude-code-review.yml removes automatic PR reviews

The deletion of .github/workflows/claude-code-review.yml means new PRs no longer get automatic Claude reviews on open/synchronize events. Only explicit @claude comment mentions now trigger reviews, weakening the default quality gate. I cannot modify workflow files, but this should be restored or replaced intentionally.


P1 — Orphaned export rows after tenant deletion with cleanupAfterTenantDelete: false

src/collections/Tenant/hooks/index.ts:31 and src/plugins/index.ts:57

The multi-tenant plugin is configured with cleanupAfterTenantDelete: false, so political-entities, documents, and ai-extractions remain in the database after a tenant is deleted. However, deleteAIExtractionExportRowsAfterTenantDelete deletes all export rows for that tenant. This leaves the underlying ai-extractions documents without export rows permanently — they won't be re-synced unless a manual full rebuild is triggered. If tenant deletion is expected to be reversible or if orphaned extractions are ever re-assigned, the export rows will be missing until a manual rebuild.

If tenant deletion truly means "delete everything for that tenant", this hook is fine but cleanupAfterTenantDelete: false should probably be true. If orphaned data is intentional, consider queuing a rebuild instead of deleting export rows on tenant delete.

Fix this →


P2 — Single long-running job for tenant-scoped sync may timeout

src/lib/aiExtractionExportRows.ts:561

syncAIExtractionExportRowsForTenant iterates every political entity → every document → every AI extraction in a single job. For a large tenant this can take many minutes inside a single Payload job worker, risking job timeout. Consider either:

  • Breaking the tenant sync into per-political-entity jobs (fan-out), or
  • Adding a configurable batch size guard with early-exit/continuation logic

Fix this →
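The first option (fan-out) might be sketched as follows; the task slug, input shape, and `FanOutQueue` interface are assumptions standing in for Payload's job APIs:

```typescript
// Instead of one long tenant-wide job, queue one job per political entity
// so each worker invocation stays small and under any job timeout.
interface FanOutQueue {
  queue(
    task: string,
    input: { scope: "politicalEntity"; politicalEntityId: string },
  ): Promise<void>;
}

async function fanOutTenantSync(
  queue: FanOutQueue,
  politicalEntityIds: string[],
): Promise<number> {
  for (const politicalEntityId of politicalEntityIds) {
    await queue.queue("syncAIExtractionExportRows", {
      scope: "politicalEntity",
      politicalEntityId,
    });
  }
  return politicalEntityIds.length;
}
```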


P2 — depth: 3 on every sync is expensive

src/lib/aiExtractionExportRows.ts:349

const extractionDoc = await payload.findByID({
  collection: "ai-extractions",
  depth: 3,
  ...

depth: 3 populates document → politicalEntity → tenant plus all other fields at each level (media, other relationships, etc.). For collections with many relationship fields at each level, this over-fetches significantly. Using depth: 0 and fetching each relationship explicitly with targeted selects would reduce DB load at scale.
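The suggested depth-0 alternative might look roughly like this. A sketch only: `Fetcher` stands in for the Payload local API, and the collection slugs and relationship field names are assumptions about this codebase.

```typescript
// Load the extraction shallow (relationships as plain IDs), then resolve
// the document -> politicalEntity chain explicitly, avoiding the over-fetch
// that depth: 3 causes on relationship-heavy collections.
interface Fetcher {
  findByID(args: {
    collection: string;
    id: string;
    depth: number;
  }): Promise<Record<string, unknown>>;
}

async function loadExtractionShallow(db: Fetcher, id: string) {
  const extraction = await db.findByID({ collection: "ai-extractions", id, depth: 0 });
  const document = await db.findByID({
    collection: "documents",
    id: extraction.document as string,
    depth: 0,
  });
  const politicalEntity = await db.findByID({
    collection: "political-entities",
    id: document.politicalEntity as string,
    depth: 0,
  });
  // The tenant could be resolved the same way from politicalEntity.tenant.
  return { extraction, document, politicalEntity };
}
```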


P2 — Migration down() silently wipes all export rows

migrations/20260423_000000_backfill_ai_extraction_export_rows.ts:11

await payload.delete({
  collection: AI_EXTRACTION_EXPORT_ROWS_COLLECTION,
  where: {},
});

This deletes every row in the collection. While likely intentional for a rollback, where: {} is the maximum blast-radius delete with no guard. A comment explaining this is intentional would prevent future confusion. Also, depending on the Payload adapter, an empty where clause may behave differently (e.g., some adapters may reject it). Consider where: { id: { exists: true } } as a more explicit variant.
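The more explicit variant could be sketched as follows, with `deleteWhere` standing in for `payload.delete`:

```typescript
// Rollback delete with an explicit clause instead of an empty `where: {}`.
// Intentional full wipe of the derived collection on migration rollback;
// `{ id: { exists: true } }` matches every row but says so unambiguously.
async function rollbackExportRows(
  deleteWhere: (
    collection: string,
    where: Record<string, unknown>,
  ) => Promise<number>,
): Promise<number> {
  return deleteWhere("ai-extraction-export-rows", { id: { exists: true } });
}
```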


Observation — statusLabel is locale-dependent

src/lib/aiExtractionExportRows.ts:193

PromiseStatus.label has localized: true. When findByID is called from a job worker, the locale depends on the worker's request context (likely the default locale). This is actually consistent and deterministic for background jobs, and the author has confirmed this is intentional. Just noting it here so reviewers are aware that statusLabel in CSV exports will always reflect the default locale regardless of tenant locale.


Minor — isSyncExportRowsInput type guard is a no-op

src/tasks/syncAIExtractionExportRows.ts:13

const isSyncExportRowsInput = (value: unknown): value is SyncAIExtractionExportRowsInput =>
  typeof value === "object" && value !== null;

This accepts any non-null object, including { foo: "bar" }. The narrowing gives false confidence. The actual validation work is done by scopedInputIsMissingRequiredId. This is harmless but could be dropped in favor of a simple input ?? {} fallback.
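A stricter guard might look like this sketch. Illustrative only; the deeper required-ID validation would still live in `scopedInputIsMissingRequiredId`.

```typescript
// Still permissive about extra keys, but rejects unrecognized scope values
// and non-string extraction IDs up front instead of accepting any object.
const SCOPES = ["extraction", "tenant", "politicalEntity", "status"] as const;

interface SyncJobInput {
  scope?: (typeof SCOPES)[number];
  extractionId?: string;
}

function isSyncInput(value: unknown): value is SyncJobInput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.scope !== undefined && !SCOPES.includes(v.scope as (typeof SCOPES)[number])) {
    return false;
  }
  if (v.extractionId !== undefined && typeof v.extractionId !== "string") return false;
  return true;
}
```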


What's already fixed vs. earlier Codex feedback

  • getExtractionRowId using uniqueId as key: ✅ Fixed — now uses extraction.id first via getId(extraction)
  • PromiseStatus changes not syncing export rows: ✅ Fixed — PromiseStatus hooks are wired
  • Transaction ID not threaded through: ✅ N/A — sync is now async via job queue, so it correctly runs outside the originating transaction
  • Status-scoped sync aborting on missing extraction: ✅ Fixed — each extraction sync is wrapped in try/catch
  • Locale nondeterminism: Intentional per author
  • Tenant rows deleted vs. rebuilt: Still unresolved (P1 above)

