Summary
The dt._LockNextTask stored procedure contains a hard-coded index hint WITH (INDEX (PK_NewTasks)) that forces SQL Server to use the clustered primary key index for the activity task dequeue UPDATE. Under high concurrency and sustained task backlogs, this hint forces a suboptimal execution plan that causes disproportionate CPU consumption and query latency on Azure SQL, even after scaling compute.
Problem Details
The forced hint
In logic.sql, the _LockNextTask procedure contains:
UPDATE TOP (1) NewTasks WITH (READPAST)
SET ...
FROM NewTasks WITH (INDEX (PK_NewTasks))
WHERE
[TaskHub] = @TaskHub AND
([LockExpiration] IS NULL OR [LockExpiration] < @now) AND
([VisibleTime] IS NULL OR [VisibleTime] < @now)
The comment says: "The PK_NewTasks hint is specified to help ensure in-order selection."
Why the hint is problematic at scale
PK_NewTasks is defined as PRIMARY KEY (TaskHub, SequenceNumber) -- a clustered index. The WHERE clause filters on LockExpiration and VisibleTime, which are not in this index. The forced hint makes SQL Server evaluate these as residual predicates by scanning through clustered index rows until it finds one matching TOP (1) with READPAST.
With a backlog of locked or invisible tasks, this means scanning past many ineligible rows on every call. The existing IX_NewTasks_InstanceID index already INCLUDEs both LockExpiration and VisibleTime, but SQL Server cannot use it because of the forced hint.
Why the hint does not achieve its goal
The hint intends to ensure FIFO (in-order) activity task selection by SequenceNumber. However, UPDATE TOP (1) with READPAST is inherently non-deterministic -- SQL Server returns the first unlocked row it encounters, which may not be the lowest SequenceNumber. The hint provides a soft tendency toward ordering but not a guarantee, while forcing a costly plan.
Polling amplification
Each worker runs LockNextTaskActivityWorkItem in a tight loop with default settings:
MaxConcurrentActivities = Environment.ProcessorCount
MinActivityPollingInterval = 50ms
- Backoff resets immediately when a task is found
With a sustained backlog, backoff never engages. A deployment with N workers x M cores creates NxM concurrent polling threads, each calling _LockNextTask at up to 20 times/sec. This can easily produce hundreds of executions per second, all hitting the suboptimal forced plan.
Comparison with _LockNextOrchestration
The companion procedure _LockNextOrchestration uses UPDATE TOP (1) Instances WITH (READPAST) with a join to NewEvents -- without any forced index hint. This proves the pattern works correctly without the hint.
Proposed Fix
Option 1: Remove the hint (minimal change)
-- Before:
FROM NewTasks WITH (INDEX (PK_NewTasks))
-- After:
FROM NewTasks
This lets SQL Server choose the optimal plan based on current table statistics, consistent with _LockNextOrchestration.
Option 2: Add a covering index (performance optimization)
In addition to removing the hint, add a new index that covers the WHERE clause predicates:
CREATE NONCLUSTERED INDEX IX_NewTasks_LockNext
ON dt.NewTasks (TaskHub, LockExpiration, VisibleTime)
INCLUDE (SequenceNumber, LockedBy, DequeueCount)
This gives the optimizer an efficient path to find unlocked, visible tasks without scanning the clustered index.
Customer Workaround
Until a fix is released, customers can safely apply the following workaround:
-
Retrieve the current procedure definition:
SELECT OBJECT_DEFINITION(OBJECT_ID('dt._LockNextTask'))
-
Re-create the procedure with the hint removed -- copy the full definition from step 1, replace FROM NewTasks WITH (INDEX (PK_NewTasks)) with FROM NewTasks, and execute the modified CREATE OR ALTER PROCEDURE.
-
Monitor Azure SQL CPU, _LockNextTask duration, and logical reads via Query Store for 24-48 hours.
-
Re-apply after NuGet upgrades -- the durabletask-mssql package uses CREATE OR ALTER PROCEDURE during schema create/upgrade, which will restore the original procedure. Re-apply the patch after any package upgrade.
Related Issues
Summary
The
dt._LockNextTaskstored procedure contains a hard-coded index hintWITH (INDEX (PK_NewTasks))that forces SQL Server to use the clustered primary key index for the activity task dequeueUPDATE. Under high concurrency and sustained task backlogs, this hint forces a suboptimal execution plan that causes disproportionate CPU consumption and query latency on Azure SQL, even after scaling compute.Problem Details
The forced hint
In
logic.sql, the_LockNextTaskprocedure contains:The comment says: "The PK_NewTasks hint is specified to help ensure in-order selection."
Why the hint is problematic at scale
PK_NewTasksis defined asPRIMARY KEY (TaskHub, SequenceNumber)-- a clustered index. TheWHEREclause filters onLockExpirationandVisibleTime, which are not in this index. The forced hint makes SQL Server evaluate these as residual predicates by scanning through clustered index rows until it finds one matchingTOP (1)withREADPAST.With a backlog of locked or invisible tasks, this means scanning past many ineligible rows on every call. The existing
IX_NewTasks_InstanceIDindex alreadyINCLUDEs bothLockExpirationandVisibleTime, but SQL Server cannot use it because of the forced hint.Why the hint does not achieve its goal
The hint intends to ensure FIFO (in-order) activity task selection by
SequenceNumber. However,UPDATE TOP (1)withREADPASTis inherently non-deterministic -- SQL Server returns the first unlocked row it encounters, which may not be the lowestSequenceNumber. The hint provides a soft tendency toward ordering but not a guarantee, while forcing a costly plan.Polling amplification
Each worker runs
LockNextTaskActivityWorkItemin a tight loop with default settings:MaxConcurrentActivities = Environment.ProcessorCountMinActivityPollingInterval = 50msWith a sustained backlog, backoff never engages. A deployment with N workers x M cores creates NxM concurrent polling threads, each calling
_LockNextTaskat up to 20 times/sec. This can easily produce hundreds of executions per second, all hitting the suboptimal forced plan.Comparison with _LockNextOrchestration
The companion procedure
_LockNextOrchestrationusesUPDATE TOP (1) Instances WITH (READPAST)with a join toNewEvents-- without any forced index hint. This proves the pattern works correctly without the hint.Proposed Fix
Option 1: Remove the hint (minimal change)
This lets SQL Server choose the optimal plan based on current table statistics, consistent with
_LockNextOrchestration.Option 2: Add a covering index (performance optimization)
In addition to removing the hint, add a new index that covers the
WHEREclause predicates:This gives the optimizer an efficient path to find unlocked, visible tasks without scanning the clustered index.
Customer Workaround
Until a fix is released, customers can safely apply the following workaround:
Retrieve the current procedure definition:
Re-create the procedure with the hint removed -- copy the full definition from step 1, replace
FROM NewTasks WITH (INDEX (PK_NewTasks))withFROM NewTasks, and execute the modifiedCREATE OR ALTER PROCEDURE.Monitor Azure SQL CPU,
_LockNextTaskduration, and logical reads via Query Store for 24-48 hours.Re-apply after NuGet upgrades -- the
durabletask-mssqlpackage usesCREATE OR ALTER PROCEDUREduring schema create/upgrade, which will restore the original procedure. Re-apply the patch after any package upgrade.Related Issues
_LockNextTaskbehavior)ReadEntityStateAsync(separate stored procedure, but related SQL concurrency pattern)