@@ -334,6 +334,177 @@ For more information about CDC on Azure SQL Database, see Microsoft's
 [Change Data Capture with Azure SQL Database](https://learn.microsoft.com/en-us/azure/azure-sql/database/change-data-capture-overview?view=azuresql)
 guide.

+#### Reducing end-to-end latency on Azure SQL Database
+
+Because the capture cadence on Azure SQL Database is fixed at ~20 seconds and
+cannot be tuned, the CDC step alone can add up to that much latency to your
+end-to-end change-propagation time. If your workload needs lower latency, you
+can supplement the automatic Azure scheduler with an external worker that
+periodically calls the `sys.sp_cdc_scan` stored procedure. The automatic
+scheduler continues to run; each manual call adds an extra CDC log scan in
+between, shortening the effective capture interval to roughly the worker's
+polling interval.
+
+Each call runs one CDC log scan, bounded by the `maxtrans` and `maxscans`
+parameters covered in
+[SQL Server capture job agent configuration parameters](#sql-server-capture-job-agent-configuration-parameters).
+On Azure SQL Database, `pollinginterval` and `continuous` do not apply, but
+`maxtrans` and `maxscans` remain tunable via `sp_cdc_change_job`. For low and
+moderate change volumes the defaults are usually fine: each call drains the
+pending transactions. For high-volume workloads, raise `maxtrans` and
+`maxscans` if a single call cannot keep up with the change rate.
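
As a rough sizing aid, the sketch below estimates whether one call can drain the changes that accumulate between calls. It assumes a single call processes at most `maxtrans` × `maxscans` transactions and that the SQL Server defaults (`maxtrans` = 500, `maxscans` = 10) are in effect on your database; verify both before relying on it. `scan_capacity` and `keeps_up` are hypothetical helper names, and the transaction rate is a figure you measure yourself.

```python
def scan_capacity(maxtrans: int = 500, maxscans: int = 10) -> int:
    """Assumed upper bound on transactions one scan call can process."""
    return maxtrans * maxscans


def keeps_up(tx_per_second: float, interval_s: float,
             maxtrans: int = 500, maxscans: int = 10) -> bool:
    """True if the transactions committed during one interval fit in one call."""
    return tx_per_second * interval_s <= scan_capacity(maxtrans, maxscans)


# Illustrative rates only, not measurements.
print(keeps_up(tx_per_second=1000, interval_s=2))  # True:  2000 <= 5000
print(keeps_up(tx_per_second=4000, interval_s=2))  # False: 8000 >  5000
```

If the check fails for your measured rate, that is the situation in which raising `maxtrans` and `maxscans` is worth trying.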
+
+This is a customer-operated workaround for an Azure platform limitation, not a
+Redis-supplied component. It does not apply to Azure SQL Managed Instance,
+SQL Server on Azure VM, or on-premises SQL Server. Those use SQL Server Agent
+and the tunable `pollinginterval` parameter described in
+[SQL Server capture job agent configuration parameters](#sql-server-capture-job-agent-configuration-parameters).
+
+{{< warning >}}Run **only one** instance of the scan worker per source database.
+`sys.sp_cdc_scan` holds an exclusive log-reader lock for the duration of each
+call; concurrent callers fail rather than running in parallel, so additional
+replicas add no throughput and only generate error noise.{{< /warning >}}
+
+##### Requirements
+
+- A database identity with permission to execute `sys.sp_cdc_scan`. This
+  requires `db_owner` and is **more privilege than the Debezium user needs**, so
+  create a separate login dedicated to the scan worker rather than reusing the
+  RDI source credentials.
+- A single-replica runtime (a Kubernetes Deployment with `replicas: 1`, a
+  systemd unit, a serverless cron with `maxConcurrency: 1`, or equivalent).
+- Network access from the worker to the Azure SQL endpoint on TCP 1433.
+
+##### Scan loop
+
+The worker repeatedly opens a connection (or holds a long-lived one), runs
+`EXEC sys.sp_cdc_scan;` with a bounded command timeout, sleeps for the
+configured interval, and handles two expected error classes:
+
+- **Scan already in progress**: `sys.sp_cdc_scan` cannot run while another
+  CDC log scan is active, either the Azure-internal scheduler's scan or a
+  previous call from this worker that has not yet returned. The procedure
+  returns a SQL error in this state. The error message has been observed to
+  contain `sp_replcmds` (the underlying log-reader procedure), but the exact
+  wording is not contractual. Match by whatever signature your client
+  surfaces, then log the occurrence and continue. Do not back off.
+- **Connection or transport errors**: close and reopen the connection with
+  exponential backoff before the next attempt.
+
+The example below shows the loop in pseudocode:
+
+```text
+loop until shutdown:
+    start = now()
+    try:
+        EXEC sys.sp_cdc_scan        # command timeout: 30s
+        log("scan_ok", now() - start)
+    catch SqlException identifying "scan already active":
+        log("scan_already_running", now() - start)
+    catch any other exception as e:
+        log("scan_error", e)
+        reconnect with exponential backoff
+    sleep(max(0, interval - (now() - start)))
+```
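
One way the pseudocode might translate to Python is sketched below. It is not a Redis-supplied worker. The database access is injected so the loop stays driver-agnostic: `run_scan`, `reconnect`, and `should_stop` are hypothetical callables you provide, and `ScanAlreadyRunning` is a hypothetical exception that your `run_scan` adapter would raise after matching the driver-specific "scan in progress" error. The sketch counts outcomes instead of logging them.

```python
import time
from typing import Callable


class ScanAlreadyRunning(Exception):
    """Hypothetical marker for the driver-specific 'scan in progress' error."""


def run_scan_loop(run_scan: Callable[[], None],
                  reconnect: Callable[[], None],
                  interval_s: float,
                  should_stop: Callable[[], bool],
                  sleep: Callable[[float], None] = time.sleep,
                  max_backoff_s: float = 30.0) -> dict:
    """Call run_scan once per interval until should_stop() returns True."""
    counts = {"scan_ok": 0, "scan_already_running": 0, "scan_error": 0}
    backoff_s = 1.0
    while not should_stop():
        start = time.monotonic()
        try:
            run_scan()  # expected to run EXEC sys.sp_cdc_scan with a timeout
            counts["scan_ok"] += 1
            backoff_s = 1.0  # a successful scan resets the backoff
        except ScanAlreadyRunning:
            # Expected overlap with another scan: record it, do not back off.
            counts["scan_already_running"] += 1
        except Exception:
            counts["scan_error"] += 1
            sleep(backoff_s)  # wait before reconnecting
            backoff_s = min(backoff_s * 2, max_backoff_s)
            try:
                reconnect()
            except Exception:
                pass  # the next failed scan triggers another, longer backoff
        elapsed = time.monotonic() - start
        sleep(max(0.0, interval_s - elapsed))
    return counts
```

Injecting `sleep` and `should_stop` keeps the loop testable without a database; in a real worker `should_stop` would typically check a flag set by a SIGTERM handler.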
+
+##### Choosing the interval
+
+The scan interval directly trades end-to-end latency against source-database
+load, because each call reads the transaction log. Pick the largest interval that
+meets your latency target:
+
+| Interval | Approximate CDC-step latency | Typical use |
+| --- | --- | --- |
+| No worker | Up to ~20s | Azure SQL Database default; the automatic scheduler runs every ~20s. |
+| 5s | Around 5s | Workload tolerates ~5s end-to-end. |
+| 2s | Around 2s under low to moderate load; can be higher under heavy write volume | Latency-sensitive workloads. Confirm the achieved latency under your own workload before relying on it. |
+
+Intervals below 1s are not recommended: each call has a fixed cost on the
+source database and the marginal latency improvement is small.
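
The selection rule can be sketched as a small helper. It assumes, as the table does, that CDC-step latency is roughly equal to the interval under low to moderate load; `pick_interval` and `scans_per_hour` are hypothetical names, and the candidate intervals are simply the ones from the table.

```python
def pick_interval(target_latency_s: float,
                  candidates_s=(2.0, 5.0),
                  floor_s: float = 1.0):
    """Largest candidate between floor_s and target_latency_s, else None."""
    usable = [c for c in candidates_s if floor_s <= c <= target_latency_s]
    return max(usable) if usable else None


def scans_per_hour(interval_s: float) -> float:
    """Manual scans the worker issues per hour at a given interval."""
    return 3600 / interval_s


print(pick_interval(6))     # 5.0
print(pick_interval(3))     # 2.0
print(pick_interval(0.5))   # None: no candidate at or above the 1s floor
print(scans_per_hour(2.0))  # 1800.0
```

The scans-per-hour figure makes the load side of the trade-off concrete: halving the interval doubles the number of log scans the source database has to serve.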
+
+{{< warning >}}CDC scans consume regular database resources. Every call reads
+the transaction log, competing with the workload for CPU, memory, and log I/O.
+An aggressive interval can degrade the source database, especially on lower
+service tiers or under high write volume. Microsoft provides no SLA on CDC
+freshness on Azure SQL Database; treat measured end-to-end latency under your
+own workload as the source of truth, not the configured interval. If scans
+start falling behind, raise the service tier, raise `maxtrans` and `maxscans`,
+or relax the interval.{{< /warning >}}
+
+##### Example Kubernetes deployment
+
+A minimal single-replica deployment skeleton. Adapt the image, namespace, and
+secret reference to your environment:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: azure-sql-cdc-scan-worker
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: azure-sql-cdc-scan-worker
+  template:
+    metadata:
+      labels:
+        app: azure-sql-cdc-scan-worker
+    spec:
+      containers:
+        - name: worker
+          image: <your-registry>/<your-scan-worker-image>:<tag>
+          env:
+            - name: SQL_HOST
+              value: <server-name>.database.windows.net
+            - name: SQL_DATABASE
+              value: <database-name>
+            - name: SCAN_INTERVAL_MS
+              value: "2000"
+          envFrom:
+            - secretRef:
+                name: <scan-worker-db-secret>
+```
+
+The secret referenced by `envFrom` must provide the credentials of the
+`db_owner` identity created for the scan worker, not the RDI source
+credentials.
+
+##### Verifying the workaround
+
+After the worker has been running for a few minutes, confirm that the
+effective scan cadence has dropped to the worker's interval by querying the
+`sys.dm_cdc_log_scan_sessions` dynamic management view. This DMV records both
+the automatic scheduler's scans and the worker's manual scans, so the gap
+between successive `start_time` values should now match the worker's interval:
+
+```sql
+-- Recent CDC log-scan sessions (manual and automatic combined)
+SELECT TOP (10)
+    session_id, start_time, end_time, duration, scan_phase,
+    latency, tran_count, last_commit_cdc_time
+FROM sys.dm_cdc_log_scan_sessions
+WHERE session_id > 0
+ORDER BY session_id DESC
+```
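
To turn the `start_time` column into a cadence figure, the gaps between successive sessions can be computed client-side. The sketch below assumes you have already fetched the `start_time` values as `datetime` objects with whatever SQL client you use; `scan_gaps_seconds` is a hypothetical helper and the timestamps are made-up sample data, not real DMV output.

```python
from datetime import datetime
from statistics import median


def scan_gaps_seconds(start_times):
    """Gaps in seconds between successive scan start times (any input order)."""
    ordered = sorted(start_times)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


# Made-up sample values standing in for the start_time column.
starts = [
    datetime(2024, 1, 1, 12, 0, 0),
    datetime(2024, 1, 1, 12, 0, 2),
    datetime(2024, 1, 1, 12, 0, 4),
    datetime(2024, 1, 1, 12, 0, 7),
]
gaps = scan_gaps_seconds(starts)
print(gaps)          # [2.0, 2.0, 3.0]
print(median(gaps))  # 2.0
```

A median close to the configured interval suggests the worker is setting the cadence; occasional longer gaps are expected when a call overlaps with the automatic scheduler's scan.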
+
+To check the commit time of the latest change captured for a specific table,
+map the highest captured LSN back to a time using `sys.fn_cdc_map_lsn_to_time`:
+
+```sql
+-- Replace <capture-instance> with the capture instance name shown by
+-- sys.sp_cdc_help_change_data_capture
+SELECT sys.fn_cdc_map_lsn_to_time(MAX(__$start_lsn)) AS latest_captured_commit_time
+FROM cdc.<capture-instance>_CT
+```
+
+The difference between that value and the current time is an upper bound on
+how stale the captured stream is for that table.
+
+You can also confirm that end-to-end change propagation through RDI now meets
+your latency target by measuring `<change committed in source> → <change visible in Redis>`
+on a representative table.
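
A minimal way to take that measurement is to commit a marker change and poll the target until it appears. The sketch below is client-agnostic: `commit_change` and `is_visible` are hypothetical callables you would implement with your own SQL and Redis clients, and the result is only as precise as the polling period `poll_s`.

```python
import time


def measure_propagation(commit_change, is_visible,
                        timeout_s=60.0, poll_s=0.05,
                        clock=time.monotonic, sleep=time.sleep):
    """Seconds from commit_change() returning until is_visible() is True,
    or None if the change does not become visible within timeout_s."""
    commit_change()  # e.g. UPDATE a marker row in the source table
    start = clock()
    while clock() - start <= timeout_s:
        if is_visible():  # e.g. read the corresponding key from Redis
            return clock() - start
        sleep(poll_s)
    return None
```

Repeating the measurement at random offsets within the scan interval gives a better picture than a single sample, because the result depends on where in the scan cycle the change is committed.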
+
 #### Azure SQL Managed Instance

 Follow the on-premises instructions for