Skip to content

fix(taskprocessing): claim tasks atomically so parallel workers don't duplicate#61053

Open
bygadd wants to merge 3 commits into
nextcloud:masterfrom
bygadd:fix/taskprocessing-worker-atomic-claim
Open

fix(taskprocessing): claim tasks atomically so parallel workers don't duplicate#61053
bygadd wants to merge 3 commits into
nextcloud:masterfrom
bygadd:fix/taskprocessing-worker-atomic-claim

Conversation

@bygadd

@bygadd bygadd commented Jun 6, 2026

Copy link
Copy Markdown

Developed with AI assistance (Claude Code)

Summary

Replace the TaskProcessing worker's retry/ignore-list claim loop with a single atomic SELECT ... FOR UPDATE SKIP LOCKED claim, and add the supporting composite index via the table-creating migration plus the db:add-missing-indices listener (per maintainer guidance). This supersedes the earlier dup-fix retry-loop approach on this PR.

Production context

On a real deployment we observed a TaskProcessing backlog of ~6,200 scheduled tasks draining at only ~5 tasks/min with multiple workers running. Instrumentation showed the claim phase consumed ~99% of each worker's wall time — not the actual task execution. Two compounding faults:

  1. Thundering herd on the claim. Every worker SELECTed the same oldest scheduled row, then locked it separately. The losers built a per-worker id NOT IN (...) ignore list and retried, which made each subsequent SELECT progressively heavier under contention.
  2. Unindexed ordered scan. The WHERE status = ? [AND type IN (...)] ORDER BY last_updated ASC LIMIT 1 claim had no covering index. On a large tasks table this degraded to a full scan (observed ~11s on a 213k-row table), so the claim loop alone took 50–240s per task.

After the change, the same backlog drains at ~79 tasks/min — the worker spends its time executing tasks, not contending for them.

Approach

  • TaskMapper::claimOldestScheduledTask() claims the oldest scheduled task in one atomic step. On MySQL/MariaDB/PostgreSQL it runs SELECT ... ORDER BY last_updated ASC LIMIT 1 FOR UPDATE SKIP LOCKED inside a transaction, then a guarded UPDATE to RUNNING. Concurrent workers skip rows already locked by another transaction, so no two workers ever claim the same task and no retry/ignore-list is needed. The no-duplicate guarantee is preserved.

  • SQLite fallback. SKIP LOCKED is unsupported on SQLite (Doctrine throws), so we feature-detect via IDBConnection::getDatabaseProvider() and fall back to the existing bounded lock-and-retry claim (claimWithBoundedRetry). This stays correct because the UPDATE ... WHERE status = SCHEDULED is atomic and SQLite serialises writers. findOldestScheduledByType() / lockTask() are kept for the fallback and backward compatibility.

  • Manager::claimNextScheduledTask() exposes the claim on the public IManager interface.

  • WorkerCommand::processNextTask() now makes a single claim call instead of fetch-then-lock: null -> no work, otherwise process. The old retry/ignore-list loop is removed.

  • Composite index taskp_status_type_upd on (status, type, last_updated) so the ordered oldest-scheduled claim is index-served (~11s -> ms). Following @marcelklehr's guidance, the index is not a standalone migration: it is added to the table-creating migration (Version30000Date20240429122720) for fresh installs, and registered in core/Listener/AddMissingIndicesListener.php so existing installs pick it up via occ db:add-missing-indices.

  • started_at recorded at claim time. Since the worker now receives the task already RUNNING, the later Manager::setTaskStatus SCHEDULED -> RUNNING edge (which used to set started_at) no longer fires; the claim's UPDATE persists started_at directly, so the timestamp is preserved.

A task is only ever transitioned SCHEDULED -> RUNNING by claiming; it is never marked FAILED by the claim path.

Testing

  • composer cs:fix + composer cs:check: clean on the touched files.
  • composer psalm: no new errors introduced by these changes.
  • TaskProcessing PHPUnit suite (tests/lib/TaskProcessing/) green, including the SKIP LOCKED claim tests (claim returns the running task, returns null when nothing is schedulable, and never double-claims), a started_at-recorded-at-claim test, and a SQL-shape test asserting the claim emits FOR UPDATE/SKIP LOCKED on non-SQLite. WorkerCommandTest updated for the single-claim flow.
  • Validated on a production instance (NC 33, MariaDB 10.11, 8 taskprocessing:worker processes): an injected backlog drains via the SKIP LOCKED claim, no duplicates, and started_at is populated for every claimed task. occ db:add-missing-indices is a no-op there because the index already exists (idempotent hasIndex check) — i.e. no re-index is triggered on installs that already have it.

Note for reviewers

@marcelklehr suggested adding the index to "the existing migration that adds the table (or the last created column)." This PR adds it to the table-creating migration Version30000. If you'd prefer the latest column-adding migration instead, that is a one-line move of the same addIndex call to a different migration file — happy to adjust.

@bygadd bygadd requested a review from a team as a code owner June 6, 2026 16:12
@bygadd bygadd requested review from ArtificialOwl, artonge, leftybournes and salmart-dev and removed request for a team June 6, 2026 16:12
@susnux susnux requested a review from marcelklehr June 7, 2026 09:33
@susnux susnux added bug 3. to review Waiting for reviews labels Jun 7, 2026
@susnux susnux added this to the Nextcloud 35 milestone Jun 7, 2026
@susnux susnux requested a review from oleksandr-nc June 7, 2026 09:35

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes duplicate task execution when running multiple occ taskprocessing:worker processes in parallel by making task claiming atomic and preventing already-finished tasks from being re-claimed.

Changes:

  • Updated WorkerCommand::processNextTask() to loop until it can successfully lockTask() the next scheduled task (skipping tasks that can’t be claimed using an ignore list).
  • Tightened TaskMapper::lockTask() to only claim tasks in STATUS_SCHEDULED, preventing re-claiming completed tasks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
lib/private/TaskProcessing/Db/TaskMapper.php Restricts atomic claim update to scheduled tasks only.
core/Command/TaskProcessing/WorkerCommand.php Adds atomic lock-and-retry loop when selecting the next task to process.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread core/Command/TaskProcessing/WorkerCommand.php Outdated
@bygadd

bygadd commented Jun 8, 2026

Copy link
Copy Markdown
Author

Production follow-up: this fixes duplicates, but the claim-loop can't keep up under backlog — proposing an atomic SKIP LOCKED claim + an index

Thanks for the review, @marcelklehr. We've run this on a production NC 33 instance (MariaDB 10.11, 8 taskprocessing:worker processes, heavy Mail-summary load). Sharing what we found, with numbers, and proposing an evolution of this PR.

What works: the lockTask() claim + status = scheduled guard eliminate duplicate processing — zero duplicates at 8 workers, confirmed over repeated windows.

What we hit in production: a backlog of ~6,200 core:text2text tasks built up and drained at only ~5 tasks/min (≈15 h to clear, while new tasks kept arriving). We instrumented processNextTask():

  • getProviders() ~11 ms, getPreferredProvider() ~0.5 ms — provider enumeration is not the bottleneck.
  • ~99 % of each worker's wall-time is in the claim phase (findOldestScheduledByType() + the lockTask() retry-loop): a single processNextTask() measured 50–240 s, while the task's own LLM call is 2–5 s.

Two compounding causes:

  1. findOldestScheduledByType() orders by last_updated; on a table grown with old SUCCESSFUL rows (the default taskprocessing:task:cleanup retention is months — ours reached ~213k rows, ~208k successful) the planner scans them — ~11 s/query (EXPLAIN). No composite (status, type, last_updated) index exists.
  2. The while (true) { getNextScheduledTask() + lockTask() + $taskIdsToIgnore[] } loop is a thundering herd: all workers race the same oldest row in lockTask(); losers grow their ignore-list and re-scan, and InnoDB row-lock waits stretch each claim further.

Controlled A/B on that instance (inject a 120-task backlog, 90 s window, reversible; the batch shares a last_updated, i.e. worst-case contention):

  • this PR's claim-loop: ~0 tasks/min (workers wedged in the claim phase)
  • atomic SKIP LOCKED claim: ~79 tasks/min (LLM-bound — the real ceiling)

Why it generalizes: any instance with a non-trivial backlog + several workers + a table that's grown (the default retention makes that normal for active AI usage) hits the same — the claim-loop's cost scales with table size × worker contention, not with useful work.

Proposed evolution (same guarantee, better mechanism): replace the retry-loop with one atomic claim —

SELECT … WHERE status = :scheduled [AND type IN (…)] ORDER BY last_updated LIMIT 1 FOR UPDATE SKIP LOCKED

via IQueryBuilder::forUpdate(\OCP\DB\QueryBuilder\ConflictResolutionMode::SkipLocked), flipped to running in the same transaction. Same no-duplicate guarantee (atomic, still status = scheduled-guarded), no herd, no ignore-list. SQLite (no SKIP LOCKED) keeps a short bounded-retry fallback. Plus a migration adding the composite index (status, type, last_updated) (~11 s → ~6 ms on our table).

It's implemented, PHPUnit-tested, and running in production. Since it's a strict improvement of the same guarantee this PR already provides, would you prefer we update this PR to the SKIP LOCKED approach (+ the index migration), or land this as-is and open a follow-up? Happy to do either.

@susnux susnux added the community pull requests from community label Jun 9, 2026
@marcelklehr

Copy link
Copy Markdown
Member

@bygadd Thank you for the thorough investigation! Yes, please update this PR to the SKIP LOCKED approach! Please note that we have a dedicated process for adding new indexes on existing tables: The index-adding-code should be added in the existing migration that adds the table (or the last created column), and there should be code to add the index on existing instances using occ db:add-missing-indices. I hope this helps.

@bygadd bygadd force-pushed the fix/taskprocessing-worker-atomic-claim branch from df826bb to b00f32b Compare June 10, 2026 13:06
@bygadd

bygadd commented Jun 10, 2026

Copy link
Copy Markdown
Author

@marcelklehr updated to the SKIP LOCKED approach as requested. This revision:

  • Atomic claimTaskMapper::claimOldestScheduledTask: SELECT … WHERE status = scheduled [AND type IN (…)] ORDER BY last_updated LIMIT 1 FOR UPDATE SKIP LOCKED (via IQueryBuilder::forUpdate(\OCP\DB\QueryBuilder\ConflictResolutionMode::SkipLocked)) + a guarded UPDATE → RUNNING in one transaction; SQLite gets a bounded lock-and-retry fallback. Same no-duplicate guarantee, but no thundering-herd / ignore-list.
  • Manager::claimNextScheduledTask on the public IManager; WorkerCommand makes a single claim.
  • Index (status, type, last_updated) added to the table-creating migration and core/Listener/AddMissingIndicesListener.php per your db:add-missing-indices guidance — no standalone migration. The listener add is idempotent (hasIndex), so installs that already have the index aren't re-indexed.
  • started_at is now recorded at claim time (the task is RUNNING before processTask, so the old setTaskStatus SCHEDULED→RUNNING edge no longer set it), with a test.
  • php-cs-fixer + psalm clean; WorkerCommandTest + the SKIP LOCKED claim / started_at / SQL-shape tests green.
  • Validated on a production NC 33 instance (MariaDB 10.11, 8 workers): an injected backlog drains via the SKIP LOCKED claim, zero duplicates, and started_at is populated for every claimed task.

One open point from your note: I placed the index in the table-creating migration Version30000; happy to move it to the latest column-adding migration instead if you prefer. Thanks for the guidance!

@come-nc

come-nc commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Hello,

Please see https://github.com/nextcloud/server/blob/master/.github/CONTRIBUTING.md#ai-assisted-contributions regarding AI contributions.
Here it looks like you did not flag commits as assisted and also sent us the LLM generated description and comments instead of using your own words.

@come-nc come-nc added 2. developing Work in progress and removed 3. to review Waiting for reviews labels Jun 11, 2026
@bygadd bygadd force-pushed the fix/taskprocessing-worker-atomic-claim branch from b00f32b to 1cd198a Compare June 11, 2026 08:33
@bygadd

bygadd commented Jun 11, 2026

Copy link
Copy Markdown
Author

Thanks @come-nc, and sorry for the missing disclosure — I've updated the commit message and will add a note to the PR description.

To give some context on ownership: I run the production instance where this actually bit us. We had a few thousand core:text2text tasks piling up in the queue — mail summaries and the Assistant were barely draining and effectively unusable. I investigated it using Claude Code as a tool, but I drove the diagnosis and made the decisions. I validated the fix on my own instance: deployed it reversibly, injected a backlog, watched throughput go from ~0–5/min to ~79/min, and confirmed no duplicate processing and that started_at is recorded correctly.

Happy to walk through any part of it.

@marcelklehr

Copy link
Copy Markdown
Member

Thank you for updating the PR. It looks solid.

@marcelklehr

Copy link
Copy Markdown
Member
 ORA-02014: cannot select FOR UPDATE from view with DISTINCT, GROUP BY, etc.
Help: https://docs.oracle.com/error-help/db/ora-02014/

Oracle is being difficult as always.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Comment thread lib/private/TaskProcessing/Db/TaskMapper.php
Comment thread core/Command/TaskProcessing/WorkerCommand.php Outdated
Comment thread core/Command/TaskProcessing/WorkerCommand.php Outdated
Comment thread lib/private/TaskProcessing/Db/TaskMapper.php Outdated
@marcelklehr

Copy link
Copy Markdown
Member

I wonder if we should use this new mechanism in the ex app endpoint as well. It was designed when we didn't have support for kubernetes yet, but when running ex apps in kubernetes, there may be multiple instances competing for the same taks as well.

@bygadd

bygadd commented Jun 12, 2026

Copy link
Copy Markdown
Author

I wonder if we should use this new mechanism in the ex app endpoint as well. It was designed when we didn't have support for kubernetes yet, but when running ex apps in kubernetes, there may be multiple instances competing for the same taks as well.

Good point @marcelklehr — multiple ex-app instances under k8s can definitely race. Worth noting this PR already closes the correctness side there: the endpoint goes through lockTask, which now guards = SCHEDULED, so two instances can't both claim the same task — the loser just moves on. What's left is the efficiency win (dropping the fetch-then-lock ignore-list loop for the atomic SKIP LOCKED claim). That endpoint filters by preferred provider and has a batch variant, so I'd rather do it as a focused follow-up than widen this PR. Happy to open it.

@bygadd

bygadd commented Jun 12, 2026

Copy link
Copy Markdown
Author

Pushed d20c487 with the review feedback addressed.

lockTask now only picks up SCHEDULED tasks and sets started_at in the same UPDATE. This also fixes the SQLite fallback case where finished tasks could be picked up again.

For Oracle, I switched it to the same bounded-retry fallback we use for SQLite. The issue was that Oracle does not allow the row-limiting clause together with FOR UPDATE, which explains the ORA-02014. SKIP LOCKED stays in place for MySQL/MariaDB/Postgres, where it works fine.

I also cleaned up the ordering comments, added a regression test, and tested the full change on a production MariaDB instance with multiple workers. The backlog drains correctly, there are no duplicate picks, and started_at is recorded as expected.

@marcelklehr

Copy link
Copy Markdown
Member

That endpoint filters by preferred provider and has a batch variant, so I'd rather do it as a focused follow-up than widen this PR. Happy to open it.

That would be great! Thank you

bygadd and others added 3 commits June 12, 2026 09:46
…ite index

Replace the worker retry/ignore-list claim-loop with a single atomic
SELECT ... FOR UPDATE SKIP LOCKED claim (SQLite bounded-retry fallback),
preserving the no-duplicate guarantee while removing the thundering-herd
contention that throttled backlog draining. Add a (status,type,last_updated)
index via the table-creating migration + db:add-missing-indices listener.

Signed-off-by: Yoan Bozhilov <bygadd@gmail.com>
Assisted-by: Claude Code:claude-opus-4-8
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Marcel Klehr <mklehr@gmx.net>
…Oracle fallback

Address review feedback (@marcelklehr, Copilot):
- lockTask claims only SCHEDULED tasks (was status != RUNNING) and stamps
  started_at in the same atomic UPDATE, so a finished task cannot be re-claimed
  and the external-provider claim path records started_at as well.
- claimWithBoundedRetry re-reads after lockTask instead of a follow-up UPDATE.
- Oracle joins SQLite on the bounded-retry fallback: Oracle cannot combine a
  row-limiting clause with FOR UPDATE (ORA-02014), which failed the claim tests
  on Oracle CI.
- Reword the worker docblock/comments to "prefer oldest available" (parallel
  SKIP LOCKED does not guarantee a strict global order).
- Add a regression test that lockTask does not resurrect a finished task.

Signed-off-by: Yoan Bozhilov <bygadd@gmail.com>
Assisted-by: Claude Code:claude-opus-4-8
@marcelklehr marcelklehr force-pushed the fix/taskprocessing-worker-atomic-claim branch from d20c487 to f1f1659 Compare June 12, 2026 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2. developing Work in progress AI assisted bug community pull requests from community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants