Sqlserver connector: Data loss with direct mode on update with  fetch size > 0

## Bug report

**What Debezium connector do you use and what version?**

SQL Server connector. Reproduced on the **latest `main` (3.6.0-SNAPSHOT)**, built and run locally. The behavior also affects earlier releases (the relevant code is unchanged).

---

**What is the connector configuration?**

Key settings that trigger it:
```json
{
  "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
  "data.query.mode": "direct",
  "streaming.fetch.size": "5",
  "snapshot.mode": "no_data",
  "database.names": "bgoyal_test",
  "table.include.list": "dbo.users"
}
```
`streaming.fetch.size` is set to `5` here to reproduce minimally; in production it is the default `10000` (any non-zero value reproduces it once the transaction exceeds the fetch boundary).

---

**What is the captured database version and mode of deployment?**

Microsoft SQL Server 2019 (RTM-CU32) 15.0.4430.1, self-managed (VM/EC2). CDC enabled via `sys.sp_cdc_enable_table`.

---

**What behavior do you expect?**

Every change captured by SQL Server CDC for a committed transaction should be emitted to Kafka exactly once, independent of `data.query.mode` and `streaming.fetch.size`. An update to a unique-index column should produce all of its delete/insert events.

---

**What behavior do you see?**

Data loss of update events when using `data.query.mode=direct` with `streaming.fetch.size` > 0. The loss happens only when the update is to a column that is part of the table's **unique index** — SQL Server records this as a "deferred update", i.e. a **delete followed by an insert with the same `__$seqval`**. If the transaction is a large bulk update that does not fully return in one fetch (because of `streaming.fetch.size`), the attempt to fetch the rest of the transaction returns only a subset of the rows, so some events are silently dropped.

Example. Table:
```sql
CREATE TABLE dbo.users (
    id         INT          NOT NULL PRIMARY KEY CLUSTERED,
    email      VARCHAR(100) NOT NULL,
    name       VARCHAR(100) NOT NULL,
    updated_at DATETIME     NOT NULL DEFAULT getutcdate()
);
CREATE UNIQUE NONCLUSTERED INDEX UX_users_email ON dbo.users(email);
EXEC sys.sp_cdc_enable_table @source_schema='dbo', @source_name='users',
     @role_name=NULL, @supports_net_changes=0;
```
Update to the unique-index column:
```sql
UPDATE dbo.users SET email = name + '_v0@example.com' WHERE id BETWEEN 1 AND 5;
```
On the CDC table this produces 10 rows (5 deletes + 5 inserts). Ordered as the **direct-mode** query orders them (`ORDER BY __$start_lsn, __$command_id, __$seqval, __$operation`):
```
commit_lsn            |seqval                |__$command_id|__$operation|name |email
----------------------+----------------------+-------------+------------+-----+--------------------
0x00000a24000018a80015|0x00000a24000018a80003|            1|           1|alice|alice_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80004|            2|           1|bob  |bob_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80005|            3|           1|carol|carol_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80006|            4|           1|dave |dave_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80007|            5|           1|eve  |eve_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80003|            6|           2|alice|alice_v0@example.com
0x00000a24000018a80015|0x00000a24000018a80004|            7|           2|bob  |bob_v0@example.com
0x00000a24000018a80015|0x00000a24000018a80005|            8|           2|carol|carol_v0@example.com
0x00000a24000018a80015|0x00000a24000018a80006|            9|           2|dave |dave_v0@example.com
0x00000a24000018a80015|0x00000a24000018a80007|           10|           2|eve  |eve_v0@example.com
```
Ordered as the **function-mode** query orders them (`ORDER BY __$start_lsn, __$seqval, __$operation` — no `__$command_id`), the delete/insert of each row stay adjacent and `__$seqval` is monotonic:
```
commit_lsn            |seqval                |__$operation|name |email
----------------------+----------------------+------------+-----+--------------------
0x00000a24000018a80015|0x00000a24000018a80003|           1|alice|alice_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80003|           2|alice|alice_v0@example.com
0x00000a24000018a80015|0x00000a24000018a80004|           1|bob  |bob_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80004|           2|bob  |bob_v0@example.com
0x00000a24000018a80015|0x00000a24000018a80005|           1|carol|carol_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80005|           2|carol|carol_v0@example.com
0x00000a24000018a80015|0x00000a24000018a80006|           1|dave |dave_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80006|           2|dave |dave_v0@example.com
0x00000a24000018a80015|0x00000a24000018a80007|           1|eve  |eve_v1@example.com
0x00000a24000018a80015|0x00000a24000018a80007|           2|eve  |eve_v0@example.com
```
With `streaming.fetch.size=5`, the Kafka topic receives **5 deletes and only 1 create** (the `eve` / `command_id 10` row). The 4 inserts for `alice/bob/carol/dave` are missing:
```
op=d id=1 change_lsn=00000a24:000018a8:0003
op=d id=2 change_lsn=00000a24:000018a8:0004
op=d id=3 change_lsn=00000a24:000018a8:0005
op=d id=4 change_lsn=00000a24:000018a8:0006
op=d id=5 change_lsn=00000a24:000018a8:0007
op=c id=5 change_lsn=00000a24:000018a8:0007 event_serial_no=2     <-- only surviving insert
```
All inserts with `seqval < 00000a24:000018a8:0007` are dropped. Only `command_id 10` survives because its `seqval` (`...0007`) is `>=` the last-seen `seqval` and its operation is `> 1`. The issue reproduces whenever the number of (deferred) updates exceeds the `streaming.fetch.size` boundary.

**Likely cause:** the first fetch returns the 5 deletes (`command_id 1–5`, ordered by `command_id`), so the resume position becomes the *highest delete* `seqval` (`...0007`). The next fetch resumes with `getNextResultSet()` using a `WHERE __$seqval > ?` predicate that does **not** account for `__$command_id`, so the inserts whose `seqval` is below that high-water mark are skipped. In other words, in direct mode the query `ORDER BY` uses `__$command_id` but the pagination resume key uses only `__$seqval`, and the two disagree.

---

**Do you see the same behaviour using the latest released Debezium version?**

Yes — reproduced on the latest `main` (3.6.0-SNAPSHOT) built locally. It does **not** occur with `data.query.mode=function` (whose `ORDER BY` has no `__$command_id`, so `__$seqval` stays monotonic and delete/insert pairs remain adjacent).

---

**Do you have the connector logs, ideally from start till finish?**

```
TRACE  Getting top 5 changes for table Capture instance "dbo_orders" [...changeTableId=...cdc.dbo_orders_CT...] in range [00000a23:0001b318:0010-00000000:00000000:0000-0, 00000a23:0001b838:0015]   (io.debezium...SqlServerConnection:421)
TRACE  Processing change ChangeTableResultSet{... currentChangePosition=00000a23:0001b838:0015(00000a23:0001b838:0003,1)}   (...SqlServerStreamingChangeEventSource:314)
TRACE  Processing change ChangeTableResultSet{... currentChangePosition=00000a23:0001b838:0015(00000a23:0001b838:0008,1)}
TRACE  Processing change ChangeTableResultSet{... currentChangePosition=00000a23:0001b838:0015(00000a23:0001b838:000b,1)}
TRACE  Processing change ChangeTableResultSet{... currentChangePosition=00000a23:0001b838:0015(00000a23:0001b838:000e,1)}
TRACE  Processing change ChangeTableResultSet{... currentChangePosition=00000a23:0001b838:0015(00000a23:0001b838:0011,1)}
TRACE  Getting top 5 changes for table Capture instance "dbo_orders" [...] in range [00000a23:0001b838:0015-00000a23:0001b838:0011-1, 00000a23:0001b838:0015]   (io.debezium...SqlServerConnection:421)
INFO   11 records sent during previous ... last recorded offset ... is {event_serial_no=1, commit_lsn=00000a23:0001b838:0015, change_lsn=00000a23:0001b838:0011}   (io.debezium...BaseSourceTask:387)
TRACE  Processing change ChangeTableResultSet{... currentChangePosition=00000a23:0001b838:0015(00000a23:0001b838:0011,2)}   <-- only the op2 insert at the high-water seqval survives
```

---

**How to reproduce the issue using our tutorial deployment?**

1. Create the `dbo.users` table and unique nonclustered index, enable CDC (DDL above).
2. Insert 5 rows; start the SQL Server connector with `data.query.mode=direct`, `streaming.fetch.size=5`, `snapshot.mode=no_data`, capturing `dbo.users`.
3. In a single transaction, update the unique-index column on all 5 rows:
   `UPDATE dbo.users SET email = name + '_v0@example.com' WHERE id BETWEEN 1 AND 5;`
4. Consume the topic: expected 5 deletes + 5 creates; actual 5 deletes + 1 create (4 rows lost). Increase the row count / `fetch.size` to confirm it scales with the fetch boundary.

The potential cause seems to be the `getNextResultSet()` call not taking `__$command_id` into account in direct mode. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sqlserver connector: Data loss with direct mode on update with fetch size > 0 #2012

Bug report

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Sqlserver connector: Data loss with direct mode on update with fetch size > 0 #2012

Description

Bug report

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions