Skip to content

Fix external DB sync poller stale claim recovery#1241

Closed
mantrakp04 wants to merge 2 commits intodevfrom
codex/poller-expired-claim-fix
Closed

Fix external DB sync poller stale claim recovery#1241
mantrakp04 wants to merge 2 commits intodevfrom
codex/poller-expired-claim-fix

Conversation

@mantrakp04
Copy link
Copy Markdown
Collaborator

@mantrakp04 mantrakp04 commented Mar 11, 2026

Summary

  • reclaim expired OutgoingRequest claims in one SQL claim query instead of a separate stale scan
  • log reclaimed stale requests once per poller iteration instead of repeatedly emitting the same error
  • keep stuck requests retryable after a dev-server restart without dropping queued work

Validation

  • pnpm exec eslint src/app/api/latest/internal/external-db-sync/poller/route.ts --max-warnings=0 (in apps/backend)
  • route-level validation previously passed in the main workspace before isolation, including the stale-row repro and exhaustive 80-case state matrix
  • in the clean worktree, pnpm test run apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts is still blocked by missing generated @stackframe/stack-shared/dist/* artifacts

cc @BilalG1 @N2D4

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Enhanced tracking and recovery of incomplete requests in external database synchronization to improve overall reliability.
    • Updated monitoring metrics to better reflect request processing status and reclaimed requests.
  • Tests

    • Improved test coverage for stale request identification and handling.

Copilot AI review requested due to automatic review settings March 11, 2026 04:34
@vercel
Copy link
Copy Markdown

vercel Bot commented Mar 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
stack-auth-hosted-components Ready Ready Preview, Comment Mar 11, 2026 5:01am
stack-backend Ready Ready Preview, Comment Mar 11, 2026 5:01am
stack-dashboard Ready Ready Preview, Comment Mar 11, 2026 5:01am
stack-demo Ready Ready Preview, Comment Mar 11, 2026 5:01am
stack-docs Ready Ready Preview, Comment Mar 11, 2026 5:01am

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 11, 2026

📝 Walkthrough

Walkthrough

Refactors the external DB sync poller to introduce a ClaimedOutgoingRequest type with a wasStale flag, changes claimPendingRequests to a CTE-based/serializable-transaction claim that returns reclaimed-stale info, replaces raw STALE threshold usage with a reclaim interval, and updates telemetry and logging to surface reclaimed stale counts (no public API changes).

Changes

Cohort / File(s) Summary
External DB Sync Poller Claim Logic
apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts
Adds ClaimedOutgoingRequest and getClaimedStaleRequestIds; refactors claimPendingRequests to a CTE + serializable transaction returning wasStale; replaces STALE_REQUEST_THRESHOLD_MS usage with STALE_CLAIM_INTERVAL_MINUTES; updates polling iteration telemetry, logs reclaimed stale IDs, and emits a console warning when reclaiming stale requests; adds unit test for getClaimedStaleRequestIds.

Sequence Diagram(s)

sequenceDiagram
    participant Poller as Poller
    participant DB as Database
    participant Telemetry as Telemetry
    participant Console as Console
    participant Worker as Worker

    Poller->>DB: Begin serializable transaction\nSelect claimable candidates (unstarted or stale beyond interval)
    DB-->>Poller: Return claimed rows with `wasStale` flag
    Poller->>Telemetry: Increment claimed-count, claimed-stale-count (if any)
    Poller->>Console: Log/warn reclaimed stale IDs (if `wasStale`)
    Poller->>Worker: Enqueue/process claimed requests
    Poller->>DB: Commit transaction
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • N2D4

Poem

🐰 Hopping through SQL fields so wide,
I claim the stale with careful stride,
Flags set true, meters sing aloud,
Syncs run smooth and make me proud. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: fixing stale claim recovery in the external DB sync poller logic.
Description check ✅ Passed The description provides a clear summary of changes, validation steps performed, and relevant context, though it doesn't strictly follow the minimal template structure provided.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/poller-expired-claim-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the external DB sync poller to recover stuck OutgoingRequest work by reclaiming stale claims during the normal claim step, and reduces repeated error emission by logging reclaimed stale requests once per iteration.

Changes:

  • Replaces the raw SQL “claim pending” step with a Prisma-based claim/reclaim flow that also marks whether a claim was stale.
  • Removes the separate stale-scan error reporting and replaces it with a per-iteration warning that includes reclaimed IDs.
  • Adds a small unit test for getClaimedStaleRequestIds(...) using inline Vitest.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +112 to +146
const candidateRequests = await tx.outgoingRequest.findMany({
where: {
OR: [
{ startedFulfillingAt: null },
{ startedFulfillingAt: { lt: reclaimBefore } },
],
},
orderBy: [
{ createdAt: "asc" },
{ id: "asc" },
],
take: pollerClaimLimit,
});

const claimedRequests: ClaimedOutgoingRequest[] = [];
for (const candidateRequest of candidateRequests) {
const claimResult = await tx.outgoingRequest.updateMany({
where: {
id: candidateRequest.id,
startedFulfillingAt: candidateRequest.startedFulfillingAt,
},
data: {
startedFulfillingAt: claimTime,
},
});

if (claimResult.count !== 1) {
continue;
}

claimedRequests.push({
...candidateRequest,
startedFulfillingAt: claimTime,
wasStale: candidateRequest.startedFulfillingAt != null,
});
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

claimPendingRequests first loads candidates via findMany without FOR UPDATE SKIP LOCKED semantics. With multiple pollers, they can repeatedly select the same oldest rows, then lose races in updateMany, and this function does not attempt to fetch additional candidates to fill the remaining claim capacity—reducing throughput under contention. Consider switching back to a single SQL claim query using row locks/SKIP LOCKED (or otherwise looping to keep selecting more candidates until the claim limit is met).

Suggested change
const candidateRequests = await tx.outgoingRequest.findMany({
where: {
OR: [
{ startedFulfillingAt: null },
{ startedFulfillingAt: { lt: reclaimBefore } },
],
},
orderBy: [
{ createdAt: "asc" },
{ id: "asc" },
],
take: pollerClaimLimit,
});
const claimedRequests: ClaimedOutgoingRequest[] = [];
for (const candidateRequest of candidateRequests) {
const claimResult = await tx.outgoingRequest.updateMany({
where: {
id: candidateRequest.id,
startedFulfillingAt: candidateRequest.startedFulfillingAt,
},
data: {
startedFulfillingAt: claimTime,
},
});
if (claimResult.count !== 1) {
continue;
}
claimedRequests.push({
...candidateRequest,
startedFulfillingAt: claimTime,
wasStale: candidateRequest.startedFulfillingAt != null,
});
const claimedRequests: ClaimedOutgoingRequest[] = [];
const triedIds: string[] = [];
while (claimedRequests.length < pollerClaimLimit) {
const remainingCapacity = pollerClaimLimit - claimedRequests.length;
const candidateRequests = await tx.outgoingRequest.findMany({
where: {
...(triedIds.length > 0
? { id: { notIn: triedIds } }
: {}),
OR: [
{ startedFulfillingAt: null },
{ startedFulfillingAt: { lt: reclaimBefore } },
],
},
orderBy: [
{ createdAt: "asc" },
{ id: "asc" },
],
take: remainingCapacity,
});
if (candidateRequests.length === 0) {
break;
}
let claimedThisRound = false;
for (const candidateRequest of candidateRequests) {
triedIds.push(candidateRequest.id);
const claimResult = await tx.outgoingRequest.updateMany({
where: {
id: candidateRequest.id,
startedFulfillingAt: candidateRequest.startedFulfillingAt,
},
data: {
startedFulfillingAt: claimTime,
},
});
if (claimResult.count !== 1) {
continue;
}
claimedThisRound = true;
claimedRequests.push({
...candidateRequest,
startedFulfillingAt: claimTime,
wasStale: candidateRequest.startedFulfillingAt != null,
});
if (claimedRequests.length >= pollerClaimLimit) {
break;
}
}
if (!claimedThisRound) {
// No progress made this iteration; avoid an infinite loop.
break;
}

Copilot uses AI. Check for mistakes.
Comment on lines +126 to +147
const claimedRequests: ClaimedOutgoingRequest[] = [];
for (const candidateRequest of candidateRequests) {
const claimResult = await tx.outgoingRequest.updateMany({
where: {
id: candidateRequest.id,
startedFulfillingAt: candidateRequest.startedFulfillingAt,
},
data: {
startedFulfillingAt: claimTime,
},
});

if (claimResult.count !== 1) {
continue;
}

claimedRequests.push({
...candidateRequest,
startedFulfillingAt: claimTime,
wasStale: candidateRequest.startedFulfillingAt != null,
});
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The per-row updateMany loop runs up to pollerClaimLimit separate UPDATEs inside a serializable Prisma transaction. This turns the claim step into an N+1 pattern (1 query + up to 1000 updates) and can significantly increase transaction time/lock duration on a hot cron endpoint. A more scalable approach is to do the claim/reclaim in a single DB statement (e.g., an UPDATE…RETURNING CTE with the stale/pending predicate and ordering) and avoid the deprecated retryTransaction path (see prisma-client.tsx deprecation note).

Copilot uses AI. Check for mistakes.
Comment on lines +291 to 294
console.warn(
`[Poller] Reclaimed ${reclaimedStaleRequestIds.length} stale outgoing request(s) older than ${STALE_CLAIM_INTERVAL_MINUTES} minutes.`,
{ staleRequestIds: reclaimedStaleRequestIds },
);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

console.warn logs the full staleRequestIds array, which can be as large as pollerClaimLimit (default 1000). This can produce very large log lines and increase log volume/costs or get truncated. Consider capping the number of IDs logged (e.g., first N) while still logging the total count.

Copilot uses AI. Check for mistakes.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 11, 2026

Greptile Summary

This PR reworks how the external-DB-sync poller reclaims expired OutgoingRequest claims. Instead of a separate stale-row scan followed by a separate claim query, both are now folded into a single claimPendingRequests call that also returns a wasStale flag, eliminating the duplicate error on every iteration for the same stuck request.

Key concerns:

  • Performance regression: The original claim used a single UPDATE … FOR UPDATE SKIP LOCKED … RETURNING * statement — one round-trip, ideal for queue-worker patterns. The new path does findMany + up to pollerClaimLimit (default 1,000) individual updateMany calls inside a SERIALIZABLE transaction. At the default limit this is ~1,001 database round-trips per claim cycle, which is likely to exceed Prisma's default 5-second transaction timeout and cause repeated retries. The retryTransaction helper itself is marked @deprecated in the codebase with the note "Prisma transactions are slow and lock the database."
  • Observability downgrade: captureError (error-monitoring platform / Sentry) is replaced by console.warn. Stale-request accumulation is an operational incident signal and should continue to page on-call engineers.
  • Redundant telemetry: stack.external-db-sync.reclaimed-count and stack.external-db-sync.stale-count are both set to the same value on the same span.

Confidence Score: 2/5

  • Not safe to merge as-is — the N+1 serializable transaction can timeout at the default claim limit, causing the poller to stall processing outgoing requests.
  • The correctness goal of the PR is sound (fold stale reclaim into the normal claim path), but the chosen implementation introduces a performance hazard (1,001 queries inside a single serializable transaction with a 1,000-row limit) that directly affects the critical outgoing-request queue. The original raw-SQL approach was strictly superior for this workload. Additionally, the switch from captureError to console.warn silently drops production alerts for a class of operational incident.
  • apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts — specifically the claimPendingRequests implementation (lines 108–154) and the stale-alert replacement (lines 290–295).

Important Files Changed

Filename Overview
apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts Replaces the atomic FOR UPDATE SKIP LOCKED SQL with an N+1 optimistic-locking loop inside a deprecated, serializable retryTransaction; also downgrades stale-request alerting from captureError to console.warn.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Poller iteration starts]) --> B{Poller enabled?}
    B -- No --> Z([Return: disabled])
    B -- Yes --> C[claimPendingRequests]

    subgraph claimPendingRequests ["claimPendingRequests (serializable tx)"]
        C --> D["findMany: startedFulfillingAt IS NULL\nOR startedFulfillingAt < now − 5 min\nORDER BY createdAt, id\nLIMIT pollerClaimLimit"]
        D --> E{For each candidate}
        E --> F["updateMany WHERE id = candidate.id\nAND startedFulfillingAt = original value"]
        F --> G{count == 1?}
        G -- No --> E
        G -- Yes --> H["Push to claimedRequests\nwasStale = (original startedFulfillingAt != null)"]
        H --> E
        E -- done --> I[Return claimedRequests]
    end

    I --> J{Any wasStale=true?}
    J -- Yes --> K[console.warn reclaimed IDs]
    J -- No --> L[processRequests]
    K --> L
    L --> M{directSyncEnabled?}
    M -- Yes --> N[processRequest one-by-one via QStash]
    M -- No --> O[batchJSON to QStash, then deleteOutgoingRequests]
    N --> P([End iteration])
    O --> P
Loading

Fix All in Claude Code Fix All in Cursor Fix All in Codex

Last reviewed commit: 6c05c88

Comment on lines +128 to +149
const claimResult = await tx.outgoingRequest.updateMany({
where: {
id: candidateRequest.id,
startedFulfillingAt: candidateRequest.startedFulfillingAt,
},
data: {
startedFulfillingAt: claimTime,
},
});

if (claimResult.count !== 1) {
continue;
}

claimedRequests.push({
...candidateRequest,
startedFulfillingAt: claimTime,
wasStale: candidateRequest.startedFulfillingAt != null,
});
}

return claimedRequests;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N+1 queries inside a serializable transaction can hit the claim limit timeout

The new claim path does one findMany plus up to pollerClaimLimit (default 1,000) individual updateMany calls, all inside a single serializable transaction. Each updateMany is a round-trip to Postgres. At the default limit of 1,000 candidates this is 1,001 queries per transaction, which is likely to breach Prisma's default transaction timeout (5 s) and cause the entire transaction to be aborted and retried — repeatedly.

The original FOR UPDATE SKIP LOCKED query was a single atomic statement specifically designed for queue-worker patterns. The new design also uses retryTransaction, which the codebase itself marks as:

@deprecated Prisma transactions are slow and lock the database. Use rawQuery with CTEs instead.

Consider reverting the claim query to a CTE-based raw query, or at least reducing pollerClaimLimit in tandem with this change to avoid transaction timeouts.

Fix in Claude Code Fix in Cursor Fix in Codex

Comment on lines +288 to +289
iterationSpan.setAttribute("stack.external-db-sync.reclaimed-count", reclaimedStaleRequestIds.length);
iterationSpan.setAttribute("stack.external-db-sync.stale-count", reclaimedStaleRequestIds.length);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate telemetry attributes carry the same value

Both stack.external-db-sync.reclaimed-count and stack.external-db-sync.stale-count are set to reclaimedStaleRequestIds.length. Before this PR, stale-count referred to the number of stale rows detected (not necessarily reclaimed), while reclaimed-count is the new concept of rows successfully re-acquired. Having two distinct attribute names pointing to the same value creates confusion in dashboards and wastes a span attribute slot. Consider dropping the redundant one:

Suggested change
iterationSpan.setAttribute("stack.external-db-sync.reclaimed-count", reclaimedStaleRequestIds.length);
iterationSpan.setAttribute("stack.external-db-sync.stale-count", reclaimedStaleRequestIds.length);
iterationSpan.setAttribute("stack.external-db-sync.reclaimed-stale-count", reclaimedStaleRequestIds.length);

Fix in Claude Code Fix in Cursor Fix in Codex

Comment on lines +290 to 295
if (reclaimedStaleRequestIds.length > 0) {
console.warn(
`[Poller] Reclaimed ${reclaimedStaleRequestIds.length} stale outgoing request(s) older than ${STALE_CLAIM_INTERVAL_MINUTES} minutes.`,
{ staleRequestIds: reclaimedStaleRequestIds },
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale-request alerts moved from error-monitoring to console-only

The previous code called captureError(...) which sends the event to your error-monitoring platform (Sentry / etc.) so on-call engineers are paged when stale requests accumulate. The replacement console.warn is visible only in log streams and won't trigger alerts.

Stale outgoing requests that are reclaimed repeatedly are a signal of a stuck worker or a crashed poller mid-flight — exactly the kind of event that should page someone. Consider keeping captureError alongside (or instead of) console.warn so that production incidents aren't silenced:

Suggested change
if (reclaimedStaleRequestIds.length > 0) {
console.warn(
`[Poller] Reclaimed ${reclaimedStaleRequestIds.length} stale outgoing request(s) older than ${STALE_CLAIM_INTERVAL_MINUTES} minutes.`,
{ staleRequestIds: reclaimedStaleRequestIds },
);
}
captureError(
"poller-stale-outgoing-requests",
new StackAssertionError(
`Reclaimed ${reclaimedStaleRequestIds.length} stale outgoing request(s) older than ${STALE_CLAIM_INTERVAL_MINUTES} minutes.`,
{ staleRequestIds: reclaimedStaleRequestIds },
),
);

Fix in Claude Code Fix in Cursor Fix in Codex

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts (1)

29-30: Narrow the helper input so the test doesn't need casts.

getClaimedStaleRequestIds only reads id and wasStale, but its signature requires the full OutgoingRequest shape. That is what forces the new partial fixtures to use as ClaimedOutgoingRequest, which bypasses the compiler on this path. Accept a minimal readonly shape here and the casts disappear.

Suggested narrowing
-function getClaimedStaleRequestIds(claimedRequests: ClaimedOutgoingRequest[]): string[] {
+function getClaimedStaleRequestIds(
+  claimedRequests: ReadonlyArray<Pick<ClaimedOutgoingRequest, "id" | "wasStale">>,
+): string[] {
   return claimedRequests.filter((request) => request.wasStale).map((request) => request.id);
 }
-      {
-        id: "fresh-claim",
-        wasStale: false,
-      } as ClaimedOutgoingRequest,
+      {
+        id: "fresh-claim",
+        wasStale: false,
+      },

As per coding guidelines, "Do NOT use as/any/type casts or anything else to bypass the type system unless you specifically asked the user about it."

Also applies to: 324-338

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts`
around lines 29 - 30, The helper getClaimedStaleRequestIds currently types its
parameter as ClaimedOutgoingRequest[] even though it only reads id and wasStale;
change its signature to accept a readonly array of the minimal shape (for
example readonly { id: string; wasStale?: boolean }[] or a small named type like
ReadonlyClaimedRequest = Readonly<{ id: string; wasStale?: boolean }>) so
callers/tests can pass partial fixtures without using casts; apply the same
narrowing to the analogous helper referenced around the 324-338 area.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts`:
- Around line 288-289: The change set assigns both
"stack.external-db-sync.stale-count" and
"stack.external-db-sync.reclaimed-count" from reclaimedStaleRequestIds.length,
which changes the meaning of stale-count; revert stale-count to reflect the
total observed stale rows (the original variable that measured observed stale
rows, e.g., observedStaleRequestIds or staleRequestIds) and leave
reclaimed-count set from reclaimedStaleRequestIds.length; update the call sites
around iterationSpan.setAttribute("stack.external-db-sync.stale-count", ...) and
iterationSpan.setAttribute("stack.external-db-sync.reclaimed-count", ...) so
each attribute uses the correct source, and add a comment clarifying that
stale-count = observed backlog and reclaimed-count = successfully reclaimed by
this poller so downstream dashboards/alerts remain correct.

---

Nitpick comments:
In `@apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts`:
- Around line 29-30: The helper getClaimedStaleRequestIds currently types its
parameter as ClaimedOutgoingRequest[] even though it only reads id and wasStale;
change its signature to accept a readonly array of the minimal shape (for
example readonly { id: string; wasStale?: boolean }[] or a small named type like
ReadonlyClaimedRequest = Readonly<{ id: string; wasStale?: boolean }>) so
callers/tests can pass partial fixtures without using casts; apply the same
narrowing to the analogous helper referenced around the 324-338 area.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 467cd921-b715-4743-afcf-edb416a8f724

📥 Commits

Reviewing files that changed from the base of the PR and between 66adb4e and 6c05c88.

📒 Files selected for processing (1)
  • apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts

Comment on lines +288 to +289
iterationSpan.setAttribute("stack.external-db-sync.reclaimed-count", reclaimedStaleRequestIds.length);
iterationSpan.setAttribute("stack.external-db-sync.stale-count", reclaimedStaleRequestIds.length);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Please verify downstream consumers can tolerate stale-count changing semantics.

Both attributes now come from reclaimedStaleRequestIds.length, so stale-count no longer means “stale rows observed” — it means “stale rows this poller successfully reclaimed”. That underreports as soon as the stale backlog exceeds pollerClaimLimit or another poller wins the claim race. If any dashboards or alerts still treat stale-count as backlog, keep that signal separate and use reclaimed-count for the new metric.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts`
around lines 288 - 289, The change set assigns both
"stack.external-db-sync.stale-count" and
"stack.external-db-sync.reclaimed-count" from reclaimedStaleRequestIds.length,
which changes the meaning of stale-count; revert stale-count to reflect the
total observed stale rows (the original variable that measured observed stale
rows, e.g., observedStaleRequestIds or staleRequestIds) and leave
reclaimed-count set from reclaimedStaleRequestIds.length; update the call sites
around iterationSpan.setAttribute("stack.external-db-sync.stale-count", ...) and
iterationSpan.setAttribute("stack.external-db-sync.reclaimed-count", ...) so
each attribute uses the correct source, and add a comment clarifying that
stale-count = observed backlog and reclaimed-count = successfully reclaimed by
this poller so downstream dashboards/alerts remain correct.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts (1)

264-265: ⚠️ Potential issue | 🟡 Minor

Keep stale-count distinct from reclaimed-count.

These two attributes now report the same value, so stale-count no longer reflects stale backlog/observed stale rows. That will underreport whenever the backlog exceeds pollerClaimLimit or another poller wins part of the reclaim race.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts`
around lines 264 - 265, The two span attributes are both using
reclaimedStaleRequestIds.length; change the second one so
"stack.external-db-sync.stale-count" records the observed/stale backlog length
instead of the reclaimed count—i.e., set
iterationSpan.setAttribute("stack.external-db-sync.stale-count",
observedStaleRequestIds.length) (or the appropriate staleRequestIds variable
name used in this file) while leaving "reclaimed-count" as
reclaimedStaleRequestIds.length.
🧹 Nitpick comments (1)
apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts (1)

29-30: Narrow the helper input and drop the casts.

getClaimedStaleRequestIds only reads id and wasStale, but the current signature forces the test to bypass the type system with as ClaimedOutgoingRequest. Accept a smaller shape here and the inline test can stay fully type-checked.

Proposed refactor
 type ClaimedOutgoingRequest = OutgoingRequest & {
   wasStale: boolean,
 };
 
-function getClaimedStaleRequestIds(claimedRequests: ClaimedOutgoingRequest[]): string[] {
+type ClaimedRequestSummary = Pick<ClaimedOutgoingRequest, "id" | "wasStale">;
+
+function getClaimedStaleRequestIds(claimedRequests: readonly ClaimedRequestSummary[]): string[] {
   return claimedRequests.filter((request) => request.wasStale).map((request) => request.id);
 }
   import.meta.vitest?.test("returns only requests reclaimed from stale claims", ({ expect }) => {
-    expect(getClaimedStaleRequestIds([
+    const requests = [
       {
         id: "fresh-claim",
         wasStale: false,
-      } as ClaimedOutgoingRequest,
+      },
       {
         id: "reclaimed-stale-1",
         wasStale: true,
-      } as ClaimedOutgoingRequest,
+      },
       {
         id: "reclaimed-stale-2",
         wasStale: true,
-      } as ClaimedOutgoingRequest,
-    ])).toEqual([
+      },
+    ] satisfies ClaimedRequestSummary[];
+
+    expect(getClaimedStaleRequestIds(requests)).toEqual([
       "reclaimed-stale-1",
       "reclaimed-stale-2",
     ]);
   });

As per coding guidelines, "Do NOT use as/any/type casts or anything else to bypass the type system unless you specifically asked the user about it."

Also applies to: 301-314

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts`
around lines 29 - 30, The helper getClaimedStaleRequestIds currently types its
parameter as ClaimedOutgoingRequest[], forcing callers/tests to cast; change its
parameter to accept a narrower shape (e.g., an array of objects exposing only id
and wasStale) and update the signature accordingly so callers can pass plain
test fixtures without using "as" casts; remove the casts in tests and any other
places that call getClaimedStaleRequestIds (also apply the same narrowing to the
other helper usages referenced around the same block) so the function only
depends on the properties it actually reads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts`:
- Around line 264-265: The two span attributes are both using
reclaimedStaleRequestIds.length; change the second one so
"stack.external-db-sync.stale-count" records the observed/stale backlog length
instead of the reclaimed count—i.e., set
iterationSpan.setAttribute("stack.external-db-sync.stale-count",
observedStaleRequestIds.length) (or the appropriate staleRequestIds variable
name used in this file) while leaving "reclaimed-count" as
reclaimedStaleRequestIds.length.

---

Nitpick comments:
In `@apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts`:
- Around line 29-30: The helper getClaimedStaleRequestIds currently types its
parameter as ClaimedOutgoingRequest[], forcing callers/tests to cast; change its
parameter to accept a narrower shape (e.g., an array of objects exposing only id
and wasStale) and update the signature accordingly so callers can pass plain
test fixtures without using "as" casts; remove the casts in tests and any other
places that call getClaimedStaleRequestIds (also apply the same narrowing to the
other helper usages referenced around the same block) so the function only
depends on the properties it actually reads.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9dcc009-65c5-4da6-bc0f-046e3edfa6c6

📥 Commits

Reviewing files that changed from the base of the PR and between 6c05c88 and da8b063.

📒 Files selected for processing (1)
  • apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts

@N2D4 N2D4 closed this Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants