Skip to content

e2e: retry forced external DB sync until idle#1150

Closed
BilalG1 wants to merge 46 commits into
devfrom
external-db-sync-3
Closed

e2e: retry forced external DB sync until idle#1150
BilalG1 wants to merge 46 commits into
devfrom
external-db-sync-3

Conversation

@BilalG1

@BilalG1 BilalG1 commented Feb 2, 2026

Copy link
Copy Markdown
Collaborator

Possible CI flake fix: run multiple sequencer/poller passes when forcing sync in tests.\n\n- Lint: pass\n- Typecheck: pass

Summary by CodeRabbit

  • New Features

    • Added external database synchronization capability to automatically replicate user data to external PostgreSQL databases
    • Introduced configuration support for connecting external databases to projects
    • Added background job processing for continuous data synchronization
  • Infrastructure

    • Enhanced CI/CD workflows to support external database sync operations
    • Updated test infrastructure to validate sync functionality across multiple scenarios

@vercel

vercel Bot commented Feb 2, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
stack-backend Ready Ready Preview, Comment Feb 2, 2026 11:11pm
stack-dashboard Ready Ready Preview, Comment Feb 2, 2026 11:11pm
stack-demo Ready Ready Preview, Comment Feb 2, 2026 11:11pm
stack-docs Ready Ready Preview, Comment Feb 2, 2026 11:11pm

@coderabbitai

coderabbitai Bot commented Feb 2, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This pull request introduces a complete external database synchronization system enabling Stack to mirror user data from internal PostgreSQL databases to external customer-managed databases. It includes database migrations with sync tracking, new backend API routes for data sequencing and polling, environment configuration for sync control, workflow updates for continuous sync operations, and comprehensive end-to-end test coverage.

Changes

Cohort / File(s) Summary
GitHub Workflows - Cron Jobs & Sync Configuration
.github/workflows/db-migration-backwards-compatibility.yaml, .github/workflows/e2e-api-tests.yaml, .github/workflows/e2e-custom-base-port-api-tests.yaml, .github/workflows/e2e-source-of-truth-api-tests.yaml, .github/workflows/restart-dev-and-test.yaml, .github/workflows/restart-dev-and-test-with-custom-base-port.yaml, .github/workflows/setup-tests.yaml, .github/workflows/setup-tests-with-custom-base-port.yaml
Add background cron-job tasks (JarvusInnovations/background-action) with QStash polling, external DB sync environment variables (STACK_FORCE_EXTERNAL_DB_SYNC, STACK_EXTERNAL_DB_SYNC_MAX_DURATION_MS, STACK_EXTERNAL_DB_SYNC_DIRECT), priming steps for external DB sync, and test command updates from pnpm test to pnpm test run.
Database Schema & Migration
apps/backend/prisma/migrations/20251125030551_external_db_sync/migration.sql, apps/backend/prisma/schema.prisma
Add global sequence for ordering data changes, new OutgoingRequest and DeletedRow tables with indexes, sequence tracking columns (sequenceId, shouldUpdateSequenceId) to ProjectUser and ContactChannel, database triggers for tracking updates, and partial indexes for efficient querying of sync-pending rows.
External DB Sync API Routes
apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts, apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts, apps/backend/src/app/api/latest/internal/external-db-sync/sync-engine/route.tsx
Implement GET sequencer to backfill sequence IDs across tenants, GET poller to claim and process pending sync requests with QStash or direct sync support, and POST sync-engine webhook to trigger external database synchronization upon completion.
External DB Sync Core Utilities
apps/backend/src/lib/external-db-sync.ts, apps/backend/src/lib/external-db-sync-queue.ts
Introduce syncExternalDatabases function to orchestrate end-to-end syncing with schema validation and row batching, and enqueueExternalDbSync to queue requests with duplicate prevention.
Backend Configuration & Scripts
apps/backend/package.json, apps/backend/scripts/run-cron-jobs.ts, apps/backend/scripts/db-migrations.tsup.config.ts, apps/backend/.env.development, apps/backend/src/app/api/latest/internal/config/override/[level]/route.tsx
Add cron-job runner script with randomized polling intervals, new package.json scripts for running cron jobs, trigger external DB sync when configuration changes include external database settings, and adjust environment file quoting for consistency.
Shared Configuration & Schema
packages/stack-shared/src/config/db-sync-mappings.ts, packages/stack-shared/src/config/schema.ts, packages/stack-shared/src/config/schema-fuzzer.test.ts
Define DEFAULT_DB_SYNC_MAPPINGS for users table sync with SQL templates for both internal fetch and external database upsert operations, add dbSync.externalDatabases to configuration schema with per-database PostgreSQL connection strings, and augment fuzzing config.
E2E Tests - External DB Sync
apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-basics.test.ts, apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-advanced.test.ts, apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-high-volume.test.ts, apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-race.test.ts, apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts
Add comprehensive E2E test suites covering basic CRUD lifecycle, multi-tenant isolation, metadata tracking, idempotency, special characters, high-volume batching, race conditions, deletion semantics, and test utilities including TestDbManager for external database lifecycle management.
E2E Infrastructure
apps/e2e/.env.development, apps/e2e/package.json, apps/e2e/tests/global-setup.ts, apps/e2e/tests/backend/endpoints/api/v1/auth/sessions/index.test.ts
Add CRON_SECRET to test environment, include PostgreSQL dev dependencies (pg, @types/pg), extend dotenv paths for test-specific .env files, and increase test timeout thresholds from 4000ms to 6000ms.
Deployment & Docker
docker/dependencies/docker.compose.yaml, vercel.json
Increase PostgreSQL max_connections to 500 for E2E tests, add Vercel cron jobs for external-db-sync polling at one-minute intervals with 300-second max duration.

Sequence Diagram

sequenceDiagram
    participant Client as Client/Webhook
    participant API as Backend API
    participant Queue as Sync Queue<br/>(OutgoingRequest)
    participant Sequencer as Sequencer Route
    participant Poller as Poller Route
    participant QStash as QStash/Direct Sync
    participant Internal as Internal DB
    participant External as External DB
    participant SyncEngine as Sync Engine

    Client->>API: Update config with<br/>externalDatabases
    API->>Queue: enqueueExternalDbSync(tenancyId)
    Queue->>Internal: INSERT OutgoingRequest
    
    loop Every 60 seconds (Cron)
        Sequencer->>Internal: Backfill sequence IDs
        Sequencer->>Internal: Mark rows for sync<br/>(shouldUpdateSequenceId)
        
        Poller->>Internal: Claim pending<br/>OutgoingRequest rows
        alt Direct Sync Enabled
            Poller->>API: POST /sync-engine<br/>(direct HTTP)
        else QStash Enabled
            Poller->>QStash: Publish request
            QStash->>SyncEngine: Webhook callback
        end
    end
    
    SyncEngine->>Internal: Fetch tenancy config
    SyncEngine->>External: syncExternalDatabases()
    External->>Internal: Fetch rows with<br/>sequence > lastSynced
    Internal-->>External: Return denormalized rows
    External->>External: Upsert to target table
    External->>External: Update _stack_sync_metadata
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • N2D4

Poem

🐰 Hop-hop, the sync flows true,
Data hops from old to new,
Sequences track every change,
External databases exchange,
With triggers, queues, and batches clean—
The finest sync you've ever seen!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 32.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: implementing retry logic for forced external DB sync in e2e tests until idle state is reached.
Description check ✅ Passed The description is minimal but present. It states the purpose (CI flake fix by running multiple sequencer/poller passes) and provides build/lint/typecheck status, though it lacks detailed context about implementation approach.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch external-db-sync-3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@BilalG1 BilalG1 marked this pull request as draft February 2, 2026 23:05
@greptile-apps

greptile-apps Bot commented Feb 2, 2026

Copy link
Copy Markdown
Contributor

Greptile Overview

Greptile Summary

This PR introduces a complete external database synchronization feature and implements retry logic to fix CI flakes. The retry mechanism runs multiple sequencer/poller passes when forcing sync in tests, ensuring all pending operations complete before assertions.

Key Changes:

  • Implemented forceExternalDbSync() in test utils that runs sequencer and poller endpoints 3 times with stopWhenIdle=true, continuing until both report zero work remaining (lines 163-199 in external-db-sync-utils.ts)
  • Added stopWhenIdle parameter to both poller and sequencer endpoints, allowing them to exit early when no work remains instead of running for full maxDurationMs
  • Created complete external DB sync infrastructure: sequence tracking, outgoing request queue, deleted row tracking, and sync engine
  • Added comprehensive test coverage with 4 new test suites covering basics, advanced scenarios, high volume, and race conditions
  • Integrated sync priming step in CI workflows that runs forced sync 3 times before tests begin

Architecture:
The sync system uses a three-component architecture:

  1. Sequencer: Assigns sequence IDs to changed rows and enqueues sync requests
  2. Poller: Processes outgoing sync requests from queue
  3. Sync Engine: Performs actual data synchronization to external PostgreSQL databases

The retry logic ensures deterministic test behavior by repeatedly running sequencer/poller until the system reaches an idle state with no pending work.

Confidence Score: 5/5

  • This PR is safe to merge with high confidence
  • The implementation is well-structured with comprehensive test coverage, proper error handling, and careful attention to edge cases like connection timeouts and race conditions. The retry logic directly addresses the CI flake issue mentioned in the PR description by ensuring deterministic test behavior.
  • No files require special attention

Important Files Changed

Filename Overview
apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts New utility file implementing retry logic with forceExternalDbSync that runs sequencer/poller multiple times until idle
apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts New poller endpoint with stopWhenIdle parameter to process outgoing requests until queue is empty
apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts New sequencer endpoint with stopWhenIdle parameter to backfill sequence IDs until no work remains
apps/backend/src/app/api/latest/internal/external-db-sync/sync-engine/route.tsx New sync engine endpoint that handles actual data synchronization to external databases
apps/backend/src/lib/external-db-sync.ts Core external DB sync logic for pushing data changes to external PostgreSQL databases
apps/backend/src/lib/external-db-sync-queue.ts Queue management that prevents duplicate sync requests for the same tenant
apps/backend/prisma/schema.prisma Adds OutgoingRequest, DeletedRow models and sequence tracking columns to ProjectUser and ContactChannel
apps/backend/prisma/migrations/20251125030551_external_db_sync/migration.sql Migration creating global sequence, new tables, indexes, and triggers for external DB sync feature
.github/workflows/e2e-api-tests.yaml Adds environment variables for forced sync, starts cron jobs service, and primes sync before tests
apps/backend/scripts/run-cron-jobs.ts New script for running scheduled cron jobs including external DB sync sequencer and poller

Sequence Diagram

sequenceDiagram
    participant Test as E2E Test
    participant Utils as external-db-sync-utils
    participant Sequencer as Sequencer Endpoint
    participant Poller as Poller Endpoint
    participant Queue as OutgoingRequest Queue
    participant SyncEngine as Sync Engine
    participant InternalDB as Internal Postgres
    participant ExternalDB as External Postgres

    Note over Test,ExternalDB: CI Priming Phase (Before Tests)
    Test->>Utils: Prime sync before tests
    loop 3 times
        Utils->>Sequencer: GET with stopWhenIdle=true
        Sequencer->>InternalDB: Backfill sequence IDs
        Sequencer->>Queue: Enqueue sync requests
        Sequencer-->>Utils: {iterations: N}
        Utils->>Poller: GET with stopWhenIdle=true
        Poller->>Queue: Claim pending requests
        Poller->>SyncEngine: POST sync request
        SyncEngine->>InternalDB: Fetch changed rows
        SyncEngine->>ExternalDB: Upsert data
        SyncEngine-->>Poller: Success
        Poller->>Queue: Delete request
        Poller-->>Utils: {requests_processed: M}
        Note over Utils: If iterations=0 AND requests=0, break
    end

    Note over Test,ExternalDB: Test Execution Phase
    Test->>InternalDB: Create/Update user
    InternalDB->>InternalDB: Mark shouldUpdateSequenceId=true
    
    Test->>Utils: waitForSyncedData(email)
    loop Until condition met or timeout
        Utils->>Utils: maybeForceExternalDbSync()
        Note over Utils: If >2s since last sync
        Utils->>Sequencer: Retry loop (3 attempts)
        Sequencer->>InternalDB: Assign sequenceId
        Sequencer->>Queue: Enqueue if needed
        Utils->>Poller: Retry loop (3 attempts)
        Poller->>Queue: Process requests
        Poller->>SyncEngine: Trigger sync
        SyncEngine->>ExternalDB: Write changes
        Utils->>ExternalDB: Query for data
        alt Data found with correct value
            ExternalDB-->>Utils: Row exists
            Utils-->>Test: Condition satisfied
        else Data not ready
            Utils->>Utils: Wait 500ms, retry
        end
    end
Loading

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Comment on lines +35 to +36
console.warn(`[sync-engine] Tenancy ${tenancyId} in queue but not found.`);
throw new StatusError(404, `Tenancy ${tenancyId} not found.`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
console.warn(`[sync-engine] Tenancy ${tenancyId} in queue but not found.`);
throw new StatusError(404, `Tenancy ${tenancyId} not found.`);
console.warn(`[sync-engine] Tenancy ${tenancyId} in queue but not found.`);
throw new StatusError(404, `Tenancy ${tenancyId} not found.`);

Incorrect indentation in if block: console.warn and throw statements not properly indented inside if condition

Fix on Vercel


// Queues a sync request for a specific tenant if one isn't already pending.
export async function enqueueExternalDbSync(tenancyId: string): Promise<void> {
assertUuid(tenancyId, "tenancyId");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition in enqueueExternalDbSync allows duplicate pending sync requests for the same tenant due to non-atomic INSERT...WHERE NOT EXISTS pattern in READ COMMITTED isolation level

Fix on Vercel

const sequencerRes = await niceFetch(new URL('/api/latest/internal/external-db-sync/sequencer', STACK_BACKEND_BASE_URL), {
query: {
maxDurationMs: String(FORCE_SYNC_MAX_DURATION_MS),
stopWhenIdle: "true",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forceExternalDbSync function breaks retry loop on first idle pass without confirming cascading syncs are complete

Fix on Vercel

export const POSTGRES_USER = process.env.EXTERNAL_DB_TEST_USER || 'postgres';
export const POSTGRES_PASSWORD = process.env.EXTERNAL_DB_TEST_PASSWORD || 'PASSWORD-PLACEHOLDER--uqfEC1hmmv';
export const TEST_TIMEOUT = 120000;
export const HIGH_VOLUME_TIMEOUT = 600000; // 10 minutes for 1500+ users

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing STACK_FORCE_EXTERNAL_DB_SYNC environment variable in .env.development causes tests to rely on slow automatic cron job instead of proactive forced syncs

Fix on Vercel

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @.github/workflows/e2e-custom-base-port-api-tests.yaml:
- Around line 143-151: The CI step currently starts the background job using the
development cron script name "run-cron-jobs", which reads .env.development;
update the GitHub Action step that uses JarvusInnovations/background-action and
the run command to invoke the test variant (e.g., replace the current "pnpm -C
apps/backend run run-cron-jobs --log-order=stream &" with the test script name
such as "pnpm -C apps/backend run run-cron-jobs:test --log-order=stream &" so
the job uses the .env.test* environment for e2e tests; keep the rest of the step
(wait-on, tail, wait-for, log-output-if) unchanged.

In @.github/workflows/e2e-source-of-truth-api-tests.yaml:
- Around line 150-158: The CI cron job step currently invokes the script
"run-cron-jobs" which uses with-env:dev and loads .env.development; change the
invocation to "run-cron-jobs:test" so the job runs with the test environment
(.env.test). Locate the workflow step that uses
JarvusInnovations/background-action and replace the run command referencing
run-cron-jobs with run-cron-jobs:test (ensure the rest of the action inputs like
wait-on, tail, and wait-for remain unchanged).

In `@apps/e2e/tests/backend/endpoints/api/v1/auth/sessions/index.test.ts`:
- Around line 103-104: The timeout assertion currently in the finally block
(throwing when timeSinceBeginDate > 6_000) should be moved out of the finally so
it cannot mask exceptions from the try block; instead compute elapsed using
performance.now() (capture start with const begin = performance.now() and
compute timeSinceBeginDate = performance.now() - begin) and perform the timeout
check/assert after the try/catch completes (not inside finally), replacing the
manual throw with a proper test assertion or error that runs only when no other
test error occurred; update references to timeSinceBeginDate and any beginDate
variables accordingly.

In `@apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts`:
- Around line 175-199: The loop currently trusts sequencerRes and pollerRes
bodies without verifying HTTP status, so non‑2xx responses (401/500) will be
treated as idle; update the logic after the niceFetch calls to validate
sequencerRes.status and pollerRes.status (or an isOk/is2xx helper) and throw an
error if either response is not 2xx before reading
iterations/requests_processed, referencing the variables sequencerRes and
pollerRes used in this block and keeping the existing break behavior only when
both responses are successful and their counts are zero.
- Around line 47-129: Change all catch parameters from any to unknown (e.g., in
TestDbManager.cleanup and other try/catch blocks) and replace silent swallows
with guarded handling: add a type-guard function (e.g., isPostgresError(err:
unknown): err is { code?: string; message?: string }) that checks err is
object/non-null and has message/code properties, then log meaningful details
(console.warn/processLogger.warn) when the guard passes and otherwise rethrow or
log the unknown value; also add a proper type for the externalDatabases
parameter in createProjectWithExternalDb (replace externalDatabases: any with a
defined interface/type describing host, port, user, password, dbName, ssl, etc.)
and apply the same unknown + guard pattern to all other catch blocks mentioned
so no errors are silently ignored.
🧹 Nitpick comments (7)
apps/backend/.env.development (1)

35-40: Email configuration ordering is semantically logical (static analysis nitpick).

The static analysis tool suggests alphabetical ordering (PASSWORD before PORT), but the current order (HOST → PORT → SECURE → USERNAME → PASSWORD → SENDER) follows a logical configuration hierarchy. This semantic grouping is more intuitive than alphabetical sorting, so the warning can be safely ignored.

apps/e2e/.env.development (1)

14-14: Consider prefixing CRON_SECRET with STACK_.

Per project conventions, environment variables should be prefixed with STACK_ (or NEXT_PUBLIC_STACK_ if public) to ensure Turborepo picks up changes and improve readability. Consider renaming to STACK_CRON_SECRET.

Based on learnings: "Environment variables should be prefixed with STACK_ or NEXT_PUBLIC_STACK_ (if public) to ensure Turborepo picks up changes and improve readability"

apps/backend/src/app/api/latest/internal/external-db-sync/sync-engine/route.tsx (1)

34-37: Inconsistent indentation inside the if block.

Lines 35-36 are not indented relative to the surrounding if block, which reduces readability.

🔧 Proposed fix for indentation
     const tenancy = await getTenancy(tenancyId);
     if (!tenancy) {
-console.warn(`[sync-engine] Tenancy ${tenancyId} in queue but not found.`);
-throw new StatusError(404, `Tenancy ${tenancyId} not found.`);
+      console.warn(`[sync-engine] Tenancy ${tenancyId} in queue but not found.`);
+      throw new StatusError(404, `Tenancy ${tenancyId} not found.`);
     }
apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-advanced.test.ts (1)

955-955: Use .slice() instead of deprecated .substr().

String.prototype.substr() is deprecated. Use .slice(2, 11) for equivalent behavior.

🔧 Proposed fix
-      const testRunId = `${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
+      const testRunId = `${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts (1)

16-39: Code duplication with poller/route.ts.

parseMaxDurationMs, parseStopWhenIdle, UUID_REGEX, and assertUuid are duplicated from poller/route.ts. Consider extracting these to a shared utility module (e.g., @/lib/external-db-sync-utils.ts) to reduce duplication and ensure consistent behavior.

apps/backend/src/lib/external-db-sync.ts (2)

66-79: Consider adding type annotation comment for newRows: any[].

The any[] type is necessary because row shapes vary by mapping and come from raw SQL. A brief comment explaining this would help future maintainers understand why strict typing isn't feasible here.

📝 Suggested documentation
 async function pushRowsToExternalDb(
   externalClient: Client,
   tableName: string,
+  // Rows from internalDbFetchQuery have dynamic shapes per mapping; strict typing isn't feasible.
   newRows: any[],
   upsertQuery: string,
   expectedTenancyId: string,
   mappingId: string,
 ) {

192-202: BigInt to Number conversion may lose precision for extremely large sequence IDs.

The conversion on line 196 (Number(seq)) could lose precision if sequence_id exceeds Number.MAX_SAFE_INTEGER (~9 quadrillion). While unlikely in practice, consider using BigInt throughout or adding a safeguard.

🛡️ Optional safeguard
         const seq = row.sequence_id;
         if (seq != null) {
           const seqNum = typeof seq === 'bigint' ? Number(seq) : Number(seq);
+          if (!Number.isSafeInteger(seqNum)) {
+            throw new StackAssertionError(`sequence_id ${seq} exceeds safe integer range for mapping ${mappingId}.`);
+          }
           if (seqNum > maxSeqInBatch) {
             maxSeqInBatch = seqNum;
           }
         }

Comment on lines +143 to +151
- name: Start run-cron-jobs in background
uses: JarvusInnovations/background-action@v1.0.7
with:
run: pnpm -C apps/backend run run-cron-jobs --log-order=stream &
wait-on: |
http://localhost:6702
tail: true
wait-for: 30s
log-output-if: true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "package.json" -path "*/apps/backend/*" | head -5

Repository: stack-auth/stack-auth

Length of output: 92


🏁 Script executed:

fd "package.json" apps/backend

Repository: stack-auth/stack-auth

Length of output: 90


🏁 Script executed:

rg "run-cron-jobs" --type json | head -20

Repository: stack-auth/stack-auth

Length of output: 586


🏁 Script executed:

sed -n '130,160p' .github/workflows/e2e-custom-base-port-api-tests.yaml

Repository: stack-auth/stack-auth

Length of output: 1022


🏁 Script executed:

rg "e2e.*test" --type yaml .github/workflows/ -l

Repository: stack-auth/stack-auth

Length of output: 268


🏁 Script executed:

rg "run-email-queue|run-cron-jobs" apps/backend/package.json

Repository: stack-auth/stack-auth

Length of output: 586


🏁 Script executed:

grep -A 5 "run-email-queue\|run-cron-jobs" .github/workflows/e2e-api-tests.yaml | head -30

Repository: stack-auth/stack-auth

Length of output: 721


🏁 Script executed:

rg "run-email-queue" apps/backend/package.json

Repository: stack-auth/stack-auth

Length of output: 430


🏁 Script executed:

cat apps/backend/package.json | jq '.scripts | keys[] | select(contains("email-queue"))'

Repository: stack-auth/stack-auth

Length of output: 82


Use the test env for cron jobs in CI.
run-cron-jobs uses with-env:dev, which reads .env.development. Since this e2e test workflow sources .env.test.local, switch to the test script.

Suggested change
-          run: pnpm -C apps/backend run run-cron-jobs --log-order=stream &
+          run: pnpm -C apps/backend run run-cron-jobs:test --log-order=stream &
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Start run-cron-jobs in background
uses: JarvusInnovations/background-action@v1.0.7
with:
run: pnpm -C apps/backend run run-cron-jobs --log-order=stream &
wait-on: |
http://localhost:6702
tail: true
wait-for: 30s
log-output-if: true
- name: Start run-cron-jobs in background
uses: JarvusInnovations/background-action@v1.0.7
with:
run: pnpm -C apps/backend run run-cron-jobs:test --log-order=stream &
wait-on: |
http://localhost:6702
tail: true
wait-for: 30s
log-output-if: true
🤖 Prompt for AI Agents
In @.github/workflows/e2e-custom-base-port-api-tests.yaml around lines 143 -
151, The CI step currently starts the background job using the development cron
script name "run-cron-jobs", which reads .env.development; update the GitHub
Action step that uses JarvusInnovations/background-action and the run command to
invoke the test variant (e.g., replace the current "pnpm -C apps/backend run
run-cron-jobs --log-order=stream &" with the test script name such as "pnpm -C
apps/backend run run-cron-jobs:test --log-order=stream &" so the job uses the
.env.test* environment for e2e tests; keep the rest of the step (wait-on, tail,
wait-for, log-output-if) unchanged.

Comment on lines +150 to +158
- name: Start run-cron-jobs in background
uses: JarvusInnovations/background-action@v1.0.7
with:
run: pnpm -C apps/backend run run-cron-jobs --log-order=stream &
wait-on: |
http://localhost:8102
tail: true
wait-for: 30s
log-output-if: true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "package.json" -path "*/apps/backend/*" | head -5

Repository: stack-auth/stack-auth

Length of output: 92


🏁 Script executed:

cat ./apps/backend/package.json | grep -A 50 '"scripts"'

Repository: stack-auth/stack-auth

Length of output: 4278


🏁 Script executed:

cat -n ./.github/workflows/e2e-source-of-truth-api-tests.yaml | sed -n '145,165p'

Repository: stack-auth/stack-auth

Length of output: 799


Use the test environment for cron jobs in CI.
The run-cron-jobs script uses with-env:dev, which reads .env.development. In this e2e test workflow, switch to run-cron-jobs:test to apply test environment settings (.env.test).

🛠️ Suggested change
-          run: pnpm -C apps/backend run run-cron-jobs --log-order=stream &
+          run: pnpm -C apps/backend run run-cron-jobs:test --log-order=stream &
🤖 Prompt for AI Agents
In @.github/workflows/e2e-source-of-truth-api-tests.yaml around lines 150 - 158,
The CI cron job step currently invokes the script "run-cron-jobs" which uses
with-env:dev and loads .env.development; change the invocation to
"run-cron-jobs:test" so the job runs with the test environment (.env.test).
Locate the workflow step that uses JarvusInnovations/background-action and
replace the run command referencing run-cron-jobs with run-cron-jobs:test
(ensure the rest of the action inputs like wait-on, tail, and wait-for remain
unchanged).

Comment on lines +103 to +104
if (timeSinceBeginDate > 6_000) {
throw new StackAssertionError(`Timeout error: Requests were too slow (${timeSinceBeginDate}ms > 6000ms); try again or try to understand why they were slow.`);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

JavaScript finally block throw error override try catch does it mask

💡 Result:

Yes. In JavaScript, a finally block can mask/override what happened in try/catch.

  • If finally completes normally (no return/throw/break/continue), then the original completion from try/catch is preserved (return value stays, or the original exception keeps propagating). Per the spec: if F.type (the finally result) is normal, the overall result is the earlier one; otherwise the overall result becomes F. [1]
  • If finally executes a control-flow statement like throw or return, it wins and replaces (“masks”) any prior return/throw from try or catch. MDN explicitly notes that control flow in finally “mask[s] any completion value” from try/catch. [2]

Example (exception is masked by finally):

function f() {
  try {
    throw new Error("from try");
  } catch (e) {
    throw new Error("from catch");
  } finally {
    throw new Error("from finally"); // this is what the caller sees
  }
}

f(); // throws "from finally"

Rule of thumb: avoid return/throw in finally unless you intentionally want to override earlier returns/errors. [2]

Sources: ECMAScript spec (ES5.1) try-statement completion rules [1]; MDN try...catch...finally (“mask” behavior) [2].


🏁 Script executed:

# Read the actual file to understand the context better
cat -n apps/e2e/tests/backend/endpoints/api/v1/auth/sessions/index.test.ts | head -150 | tail -60

Repository: stack-auth/stack-auth

Length of output: 2692


Move timeout assertion out of finally block and use performance.now() for timing.

Throwing from finally overrides any error from the try block, masking the actual test failure and making flaky runs harder to debug. Additionally, per coding guidelines, use performance.now() instead of Date.now() for measuring elapsed time.

Suggested refactor
-  const beginDate = new Date();
+  const beginTime = performance.now();
   const res2 = await niceBackendFetch("/api/v1/auth/sessions", {
     accessType: "server",
     method: "POST",
     body: {
       user_id: res.userId,
       expires_in_millis: 5_000,
     },
   });
   expect(res2).toMatchInlineSnapshot(`
     NiceResponse {
       "status": 200,
       "body": {
         "access_token": <stripped field 'access_token'>,
         "refresh_token": <stripped field 'refresh_token'>,
       },
       "headers": Headers { <some fields may have been hidden> },
     }
   `);
   const waitPromise = wait(5_001);
-  try {
-    const refreshSessionResponse1 = await niceBackendFetch("/api/v1/auth/sessions/current/refresh", {
-      method: "POST",
-      accessType: "client",
-      headers: {
-        "x-stack-refresh-token": res2.body.refresh_token
-      },
-    });
-    expect(refreshSessionResponse1).toMatchInlineSnapshot(`
-      NiceResponse {
-        "status": 200,
-        "body": { "access_token": <stripped field 'access_token'> },
-        "headers": Headers { <some fields may have been hidden> },
-      }
-    `);
-    backendContext.set({ userAuth: { accessToken: refreshSessionResponse1.body.access_token, refreshToken: res2.body.refresh_token } });
-    await Auth.expectToBeSignedIn();
-  } finally {
-    const timeSinceBeginDate = new Date().getTime() - beginDate.getTime();
-    if (timeSinceBeginDate > 6_000) {
-      throw new StackAssertionError(`Timeout error: Requests were too slow (${timeSinceBeginDate}ms > 6000ms); try again or try to understand why they were slow.`);
-    }
-  }
+  const refreshSessionResponse1 = await niceBackendFetch("/api/v1/auth/sessions/current/refresh", {
+    method: "POST",
+    accessType: "client",
+    headers: {
+      "x-stack-refresh-token": res2.body.refresh_token
+    },
+  });
+  expect(refreshSessionResponse1).toMatchInlineSnapshot(`
+    NiceResponse {
+      "status": 200,
+      "body": { "access_token": <stripped field 'access_token'> },
+      "headers": Headers { <some fields may have been hidden> },
+    }
+  `);
+  backendContext.set({ userAuth: { accessToken: refreshSessionResponse1.body.access_token, refreshToken: res2.body.refresh_token } });
+  await Auth.expectToBeSignedIn();
+  const timeSinceBeginDate = performance.now() - beginTime;
+  if (timeSinceBeginDate > 6_000) {
+    throw new StackAssertionError(`Timeout error: Requests were too slow (${timeSinceBeginDate}ms > 6000ms); try again or try to understand why they were slow.`);
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (timeSinceBeginDate > 6_000) {
throw new StackAssertionError(`Timeout error: Requests were too slow (${timeSinceBeginDate}ms > 6000ms); try again or try to understand why they were slow.`);
const beginTime = performance.now();
const res2 = await niceBackendFetch("/api/v1/auth/sessions", {
accessType: "server",
method: "POST",
body: {
user_id: res.userId,
expires_in_millis: 5_000,
},
});
expect(res2).toMatchInlineSnapshot(`
NiceResponse {
"status": 200,
"body": {
"access_token": <stripped field 'access_token'>,
"refresh_token": <stripped field 'refresh_token'>,
},
"headers": Headers { <some fields may have been hidden> },
}
`);
const waitPromise = wait(5_001);
const refreshSessionResponse1 = await niceBackendFetch("/api/v1/auth/sessions/current/refresh", {
method: "POST",
accessType: "client",
headers: {
"x-stack-refresh-token": res2.body.refresh_token
},
});
expect(refreshSessionResponse1).toMatchInlineSnapshot(`
NiceResponse {
"status": 200,
"body": { "access_token": <stripped field 'access_token'> },
"headers": Headers { <some fields may have been hidden> },
}
`);
backendContext.set({ userAuth: { accessToken: refreshSessionResponse1.body.access_token, refreshToken: res2.body.refresh_token } });
await Auth.expectToBeSignedIn();
const timeSinceBeginDate = performance.now() - beginTime;
if (timeSinceBeginDate > 6_000) {
throw new StackAssertionError(`Timeout error: Requests were too slow (${timeSinceBeginDate}ms > 6000ms); try again or try to understand why they were slow.`);
}
🧰 Tools
🪛 Biome (2.3.13)

[error] 104-104: Unsafe usage of 'throw'.

'throw' in 'finally' overwrites the control flow statements inside 'try' and 'catch'.

(lint/correctness/noUnsafeFinally)

🤖 Prompt for AI Agents
In `@apps/e2e/tests/backend/endpoints/api/v1/auth/sessions/index.test.ts` around
lines 103 - 104, The timeout assertion currently in the finally block (throwing
when timeSinceBeginDate > 6_000) should be moved out of the finally so it cannot
mask exceptions from the try block; instead compute elapsed using
performance.now() (capture start with const begin = performance.now() and
compute timeSinceBeginDate = performance.now() - begin) and perform the timeout
check/assert after the try/catch completes (not inside finally), replacing the
manual throw with a proper test assertion or error that runs only when no other
test error occurred; update references to timeSinceBeginDate and any beginDate
variables accordingly.

Comment on lines +47 to +129
export class TestDbManager {
private setupClient: Client | null = null;
private databases: Map<string, Client> = new Map();
private databaseNames: Set<string> = new Set();

async init() {
this.setupClient = new Client({
connectionString: `postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}/postgres`,
...CLIENT_CONFIG,
});
await this.setupClient.connect();
}

async createDatabase(dbName: string): Promise<string> {
if (!this.setupClient) throw new Error('TestDbManager not initialized');

const uniqueDbName = `${dbName}_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`;
await this.setupClient.query(`CREATE DATABASE "${uniqueDbName}"`);
const connectionString = `postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}/${uniqueDbName}`;
const client = new Client({
connectionString,
...CLIENT_CONFIG,
});
await client.connect();

this.databases.set(dbName, client);
this.databaseNames.add(uniqueDbName);
return connectionString;
}

getClient(dbName: string): Client {
const client = this.databases.get(dbName);
if (!client) throw new Error(`Database ${dbName} not found`);
return client;
}

async cleanup() {
// First, clean up all project configs to stop the sync cron from trying to connect
await cleanupAllProjectConfigs();

// Close all tracked database clients
const closePromises = Array.from(this.databases.values()).map(async (client) => {
try {
await Promise.race([
client.end(),
new Promise((_, reject) => setTimeout(() => reject(new Error('Client close timeout')), 5000)),
]);
} catch (err) {
// Ignore errors when closing clients - they may already be closed or timed out
}
});
await Promise.all(closePromises);
this.databases.clear();

if (this.setupClient) {
// Terminate all connections and drop databases
for (const dbName of this.databaseNames) {
try {
// Forcefully terminate ALL connections to this database
await this.setupClient.query(`
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = $1
AND pid <> pg_backend_pid()
`, [dbName]);

// Small delay to ensure connections are terminated
await new Promise(r => setTimeout(r, 100));

await this.setupClient.query(`DROP DATABASE IF EXISTS "${dbName}"`);
} catch (err) {
console.warn(`Failed to drop database ${dbName}:`, err);
}
}
this.databaseNames.clear();

try {
await this.setupClient.end();
} catch (err) {
// Ignore errors when closing setup client
}
this.setupClient = null;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts | head -200

Repository: stack-auth/stack-auth

Length of output: 8492


🏁 Script executed:

cat -n apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts | sed -n '200,400p'

Repository: stack-auth/stack-auth

Length of output: 6980


Replace any with unknown and add type guards; type externalDatabases parameter.

Multiple catch blocks use err: any (lines 150, 223, 254, 321) without proper type guards, and createProjectWithExternalDb accepts externalDatabases: any (line 333). Several catch blocks also silently swallow errors (lines 94–96, 117–119, 125–127, 384) without logging, hiding unexpected failures.

Switch catch blocks to err: unknown with a PostgreSQL error guard, and provide a proper type for the externalDatabases config:

🧩 Suggested pattern
+type PgError = { code?: string };
+const isPgError = (err: unknown): err is PgError =>
+  typeof err === 'object' && err !== null && 'code' in err;

-export async function waitForCondition(
-  checkFn: () => Promise<boolean>,
-  options: { timeoutMs?: number, intervalMs?: number, description?: string } = {}
-): Promise<void> {
+export async function waitForCondition(
+  checkFn: () => Promise<boolean>,
+  options: { timeoutMs?: number, intervalMs?: number, description?: string } = {}
+): Promise<void> {
   ...
-  } catch (err: any) {
-    if (err?.code === '57P01' || err?.code === '08006' || err?.code === '53300') {
+  } catch (err: unknown) {
+    if (isPgError(err) && (err.code === '57P01' || err.code === '08006' || err.code === '53300')) {
       await new Promise(r => setTimeout(r, intervalMs));
       continue;
     }
     throw err;
   }
   ...
 }
 
-export async function createProjectWithExternalDb(externalDatabases: any, projectOptions?: { display_name?: string, description?: string }) {
+export type ExternalDatabasesConfig = Record<string, unknown>;
+export async function createProjectWithExternalDb(
+  externalDatabases: ExternalDatabasesConfig,
+  projectOptions?: { display_name?: string, description?: string }
+) {

Apply the same guard pattern to other catch blocks (lines 223, 254, 321) and remove silent error suppression.

🤖 Prompt for AI Agents
In `@apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts` around
lines 47 - 129, Change all catch parameters from any to unknown (e.g., in
TestDbManager.cleanup and other try/catch blocks) and replace silent swallows
with guarded handling: add a type-guard function (e.g., isPostgresError(err:
unknown): err is { code?: string; message?: string }) that checks err is
object/non-null and has message/code properties, then log meaningful details
(console.warn/processLogger.warn) when the guard passes and otherwise rethrow or
log the unknown value; also add a proper type for the externalDatabases
parameter in createProjectWithExternalDb (replace externalDatabases: any with a
defined interface/type describing host, port, user, password, dbName, ssl, etc.)
and apply the same unknown + guard pattern to all other catch blocks mentioned
so no errors are silently ignored.

Comment on lines +175 to +199
for (let attempt = 0; attempt < FORCE_SYNC_ATTEMPTS; attempt++) {
const sequencerRes = await niceFetch(new URL('/api/latest/internal/external-db-sync/sequencer', STACK_BACKEND_BASE_URL), {
query: {
maxDurationMs: String(FORCE_SYNC_MAX_DURATION_MS),
stopWhenIdle: "true",
},
headers: {
Authorization: `Bearer ${cronSecret}`,
},
});
const pollerRes = await niceFetch(new URL('/api/latest/internal/external-db-sync/poller', STACK_BACKEND_BASE_URL), {
query: {
maxDurationMs: String(FORCE_SYNC_MAX_DURATION_MS),
stopWhenIdle: "true",
},
headers: {
Authorization: `Bearer ${cronSecret}`,
},
});

const sequencerIterations = (sequencerRes.body as { iterations?: number } | undefined)?.iterations ?? 0;
const requestsProcessed = (pollerRes.body as { requests_processed?: number } | undefined)?.requests_processed ?? 0;
if (sequencerIterations === 0 && requestsProcessed === 0) {
break;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate sequencer/poller HTTP status before treating sync as idle.
If either endpoint returns 401/500, iterations/requests_processed default to 0 and the loop exits, masking failures and increasing flake risk. Please throw on non‑2xx responses.

✅ Suggested guard
     const sequencerRes = await niceFetch(new URL('/api/latest/internal/external-db-sync/sequencer', STACK_BACKEND_BASE_URL), {
       query: {
         maxDurationMs: String(FORCE_SYNC_MAX_DURATION_MS),
         stopWhenIdle: "true",
       },
       headers: {
         Authorization: `Bearer ${cronSecret}`,
       },
     });
+    if (sequencerRes.status < 200 || sequencerRes.status >= 300) {
+      throw new Error(`Sequencer sync failed: ${sequencerRes.status}`);
+    }

     const pollerRes = await niceFetch(new URL('/api/latest/internal/external-db-sync/poller', STACK_BACKEND_BASE_URL), {
       query: {
         maxDurationMs: String(FORCE_SYNC_MAX_DURATION_MS),
         stopWhenIdle: "true",
       },
       headers: {
         Authorization: `Bearer ${cronSecret}`,
       },
     });
+    if (pollerRes.status < 200 || pollerRes.status >= 300) {
+      throw new Error(`Poller sync failed: ${pollerRes.status}`);
+    }
🤖 Prompt for AI Agents
In `@apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts` around
lines 175 - 199, The loop currently trusts sequencerRes and pollerRes bodies
without verifying HTTP status, so non‑2xx responses (401/500) will be treated as
idle; update the logic after the niceFetch calls to validate sequencerRes.status
and pollerRes.status (or an isOk/is2xx helper) and throw an error if either
response is not 2xx before reading iterations/requests_processed, referencing
the variables sequencerRes and pollerRes used in this block and keeping the
existing break behavior only when both responses are successful and their counts
are zero.

@BilalG1 BilalG1 closed this Feb 2, 2026
@BilalG1 BilalG1 deleted the external-db-sync-3 branch February 5, 2026 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants