diff --git a/docs/content.zh/docs/connectors/pipeline-connectors/db2.md b/docs/content.zh/docs/connectors/pipeline-connectors/db2.md new file mode 100644 index 00000000000..d6812c18ec2 --- /dev/null +++ b/docs/content.zh/docs/connectors/pipeline-connectors/db2.md @@ -0,0 +1,199 @@ +--- +title: "DB2" +weight: 4 +type: docs +aliases: +- /connectors/pipeline-connectors/db2 +--- + + +# DB2 Connector + +DB2 CDC Pipeline 连接器支持从 DB2 数据库读取快照数据和增量变更数据。该连接器基于 DB2 CDC Source 连接器,并向 Flink CDC Pipeline 输出变更事件。 + +## 依赖 + +IBM DB2 JDBC driver 使用 IBM IPLA 协议,该协议与 Flink CDC 项目不兼容。 +Flink CDC 不会将该驱动打包进 DB2 Pipeline 连接器 jar。 +提交 YAML Pipeline 作业时,需要手动配置以下依赖,并通过 Flink CDC CLI 的 `--jar` 参数传入。 + +
+| 依赖名称 | 说明 |
+|----------|------|
+| com.ibm.db2.jcc:db2jcc:db2jcc4 | 用于连接 DB2 数据库。 |
+ +## 前置条件 + +使用 DB2 Pipeline 连接器前,请确认: + +1. DB2 已配置 CDC 采集,目标表已启用 DB2 change capture。 +2. Flink 集群可以通过 TCP 访问 DB2 服务。 +3. 配置的用户具备读取目标表和 DB2 CDC 元数据表的权限。 +4. 同一个 Pipeline Source 中的所有表属于同一个 DB2 数据库,`tables` 选项使用 `database.schema.table` 格式。 + +### 在 DB2 侧开启 CDC + +DB2 Pipeline 连接器不会自动创建 DB2 SQL Replication 相关对象。提交 Pipeline 作业前,需要由 DB2 实例用户或数据库管理员先完成 DB2 侧 CDC 配置。完整上游流程可参考 [Debezium DB2 connector 文档中的 Setting up Db2](https://debezium.io/documentation/reference/stable/connectors/db2.html#setting-up-db2)。 + +1. 为源数据库启用 DB2 SQL Replication,并开启归档日志或日志保留。容器化 DB2 环境通常需要在创建数据库前完成该配置,例如连接器测试镜像通过 `ARCHIVE_LOGS=true` 启动 DB2。修改日志模式后,需要执行一次完整数据库备份,并重启或激活数据库,确保 ASN capture 服务有可用的日志起点。 +2. 安装 Debezium DB2 管理 UDF,或创建等价的 DB2 SQL Replication 控制对象。UDF 安装流程会编译 `asncdc`,将其安装到 DB2 function 目录,执行 `db2 bind db2schema.bnd blocking all grant public sqlerror continue` 绑定 catalog 访问权限,并创建 `ASNCDC` schema、元数据表,以及 `ASNCDC.ADDTABLE`、`ASNCDC.REMOVETABLE`、`ASNCDC.ASNCDCSERVICES` 等辅助例程。 +3. 启动 ASN capture 服务并确认服务运行正常: + + ```sql + VALUES ASNCDC.ASNCDCSERVICES('start','asncdc'); + VALUES ASNCDC.ASNCDCSERVICES('status','asncdc'); + ``` + +4. 将每张源表加入 capture mode,然后重新初始化 ASN 服务。请将示例中的 schema 和表名替换为实际源表标识: + + ```sql + CALL ASNCDC.ADDTABLE('DB2INST1', 'ORDERS'); + VALUES ASNCDC.ASNCDCSERVICES('reinit','asncdc'); + ``` + +5. 确认源表已经注册,并且存在对应的 change table: + + ```sql + SELECT SOURCE_OWNER, SOURCE_TABLE, CD_OWNER, CD_TABLE, STATE + FROM ASNCDC.IBMSNAP_REGISTER + WHERE SOURCE_OWNER = 'DB2INST1' AND SOURCE_TABLE = 'ORDERS'; + ``` + +6. 
为 Flink CDC 运行用户授予源表、ASNCDC 元数据表和对应 change table 的读取权限: + + ```sql + GRANT SELECT ON TABLE DB2INST1.ORDERS TO USER FLINK_USER; + GRANT SELECT ON TABLE ASNCDC.IBMSNAP_REGISTER TO USER FLINK_USER; + GRANT SELECT ON TABLE ASNCDC.CDC_DB2INST1_ORDERS TO USER FLINK_USER; + ``` + +如果 DB2 标识符创建时未使用引号,DB2 会以大写形式保存 schema 和表名。`ASNCDC.ADDTABLE` 与 Pipeline `tables` 选项都应使用 DB2 中保存的名称,例如 `TESTDB.DB2INST1.ORDERS`。 + +DB2 SQL Replication 不支持捕获包含 `BOOLEAN` 列的表。这类表可以读取快照,但无法获得增量 CDC 记录;如需增量同步,应将 `BOOLEAN` 替换为 SQL Replication 支持的类型。生产环境启用前,请先确认 IBM 与 Debezium 对 SQL Replication 的许可要求。 + +## 示例 + +从 DB2 读取数据同步到 Fluss 的 Pipeline 可以定义如下: + +```yaml +source: + type: db2 + name: DB2 Source + hostname: 127.0.0.1 + port: 50000 + username: db2inst1 + password: pass + tables: TESTDB.DB2INST1.ORDERS + schema-change.enabled: true + metadata.list: op_ts,table_name,database_name,schema_name + +sink: + type: fluss + name: Fluss Sink + bootstrap.servers: localhost:9123 + +pipeline: + name: DB2 to Fluss Pipeline + parallelism: 4 +``` + +## 连接器配置项 + +| Option | Required | Default | Type | Description | +|--------|----------|---------|------|-------------| +| hostname | required | (none) | String | DB2 数据库服务器的 IP 地址或主机名。 | +| port | optional | 50000 | Integer | DB2 数据库服务器端口号。 | +| username | required | (none) | String | 连接 DB2 数据库服务器时使用的用户名。 | +| password | required | (none) | String | 连接 DB2 数据库服务器时使用的密码。 | +| tables | required | (none) | String | 需要监控的 DB2 表,支持正则表达式。表名格式为 `database.schema.table`。点号会作为数据库、Schema 和表名分隔符;如需在正则表达式中使用点号匹配任意字符,请使用反斜杠转义。 | +| tables.exclude | optional | (none) | String | 在 `tables` 匹配结果中排除的 DB2 表,格式与 `tables` 相同。 | +| schema-change.enabled | optional | true | Boolean | 是否发送 Schema 变更事件,以便下游 Sink 响应表结构变更。 | +| server-time-zone | optional | UTC | String | 数据库服务器会话时区。 | +| scan.startup.mode | optional | initial | String | DB2 CDC 消费者的启动模式。有效值为 `initial` 和 `latest-offset`。 | +| scan.incremental.snapshot.chunk.size | optional | 8096 | Integer | 快照读取阶段切分表数据时使用的分片行数。 | +| 
scan.snapshot.fetch.size | optional | 1024 | Integer | 读取表快照时每次拉取的最大行数。 | +| scan.incremental.snapshot.chunk.key-column | optional | (none) | String | 表快照分片使用的 chunk key 列。默认使用第一个主键列。 | +| scan.incremental.snapshot.backfill.skip | optional | true | Boolean | 是否跳过快照读取阶段的 backfill。跳过 backfill 后,快照期间发生的变更可能在增量阶段回放,下游需要正确处理 at-least-once 事件。 | +| scan.incremental.close-idle-reader.enabled | optional | false | Boolean | 是否在快照阶段结束后关闭空闲 reader。该功能依赖任务结束后的 checkpoint 支持。 | +| connect.timeout | optional | 30s | Duration | 建立 DB2 连接时的最大等待时间。 | +| connect.max-retries | optional | 3 | Integer | 建立 DB2 数据库连接的最大重试次数。 | +| connection.pool.size | optional | 20 | Integer | 连接池大小。 | +| metadata.list | optional | (none) | String | 从 SourceRecord 读取并传递给下游、可在 transform 表达式中使用的元数据字段列表,多个字段用逗号分隔。可用字段包括:`op_ts`、`table_name`、`database_name` 和 `schema_name`。 | +| jdbc.properties.* | optional | (none) | String | 传递给 DB2 JDBC 连接的自定义 JDBC URL 属性。 | +| debezium.* | optional | (none) | String | 传递给内嵌 DB2 Debezium connector 的 Debezium 参数。 | + +## 启动模式 + +`scan.startup.mode` 选项指定 DB2 CDC 从哪里开始读取: + +- `initial`(默认):先对监控表执行初始快照,然后继续读取 DB2 变更流。 +- `latest-offset`:跳过快照阶段,直接从 DB2 变更流末尾开始读取。 + +## 支持的元数据列 + +DB2 CDC Pipeline 连接器支持从源记录中读取元数据列。这些元数据列可以在 transform 操作中使用,也可以传递给下游 Sink。 + +部分元数据信息也可以通过 Transform 表达式获取,例如 `__namespace_name__`、`__schema_name__` 和 `__table_name__`。`op_ts` 只能通过 `metadata.list` 获取,表示数据库中的操作时间戳。 + +```yaml +source: + type: db2 + # ... 
其他配置 + metadata.list: op_ts,table_name,database_name,schema_name +``` + +| 元数据列 | 数据类型 | 描述 | +|----------|----------|------| +| op_ts | BIGINT NOT NULL | 变更事件在数据库中发生的时间戳,单位为从 epoch 开始的毫秒数。对于快照记录,该值为 0。 | +| table_name | STRING NOT NULL | 包含变更行的表名。 | +| database_name | STRING NOT NULL | 包含变更行的数据库名。 | +| schema_name | STRING NOT NULL | 包含变更行的 Schema 名称。 | + +## 可用的指标 + +指标可以帮助了解分片分发进度,支持的 Flink 指标如下: + +| Group | Name | Type | Description | +|-------|------|------|-------------| +| namespace.schema.table | isSnapshotting | Gauge | 表是否处于快照读取阶段 | +| namespace.schema.table | isStreamReading | Gauge | 表是否处于增量读取阶段 | +| namespace.schema.table | numTablesSnapshotted | Gauge | 已完成快照读取的表数量 | +| namespace.schema.table | numTablesRemaining | Gauge | 尚未完成快照读取的表数量 | +| namespace.schema.table | numSnapshotSplitsProcessed | Gauge | 正在处理的快照分片数量 | +| namespace.schema.table | numSnapshotSplitsRemaining | Gauge | 尚未处理的快照分片数量 | +| namespace.schema.table | numSnapshotSplitsFinished | Gauge | 已处理完成的快照分片数量 | +| namespace.schema.table | snapshotStartTime | Gauge | 快照读取阶段开始时间 | +| namespace.schema.table | snapshotEndTime | Gauge | 快照读取阶段结束时间 | + +Group 名称是 `namespace.schema.table`,其中 `namespace` 是 DB2 数据库名,`schema` 是 DB2 Schema 名称,`table` 是 DB2 表名。 + +{{< top >}} diff --git a/docs/content.zh/docs/connectors/pipeline-connectors/overview.md b/docs/content.zh/docs/connectors/pipeline-connectors/overview.md index a150843355a..4e93ad5a60f 100644 --- a/docs/content.zh/docs/connectors/pipeline-connectors/overview.md +++ b/docs/content.zh/docs/connectors/pipeline-connectors/overview.md @@ -33,6 +33,7 @@ Flink CDC 提供了可用于 YAML 作业的 Pipeline Source 和 Sink 连接器 | 连接器 | 类型 | 支持的外部系统 | 下载链接 | 
|----------------------------------------------------------------------------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | [Apache Doris]({{< ref "docs/connectors/pipeline-connectors/doris" >}}) | Sink |
  • [Apache Doris](https://doris.apache.org/): 1.2.x, 2.x.x, 3.x.x | [Apache Doris](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-doris) | +| [DB2]({{< ref "docs/connectors/pipeline-connectors/db2" >}}) | Source |
  • [IBM DB2](https://www.ibm.com/products/db2) | [DB2](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-db2) | | [Elasticsearch]({{< ref "docs/connectors/pipeline-connectors/elasticsearch" >}}) | Sink |
  • [Elasticsearch](https://www.elastic.co/elasticsearch): 6.x, 7.x, 8.x | [Elasticsearch](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-elasticsearch) | | [Fluss]({{< ref "docs/connectors/pipeline-connectors/fluss" >}}) | Sink |
  • [Fluss](https://fluss.apache.org/): 0.7, 0.8, 0.9 | [Fluss](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-fluss) | | [Hudi]({{< ref "docs/connectors/pipeline-connectors/hudi" >}}) | Sink |
  • [Apache Hudi](https://hudi.apache.org/) | [Apache Hudi](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-hudi) | @@ -53,7 +54,7 @@ Flink CDC 提供了可用于 YAML 作业的 Pipeline Source 和 Sink 连接器 | Flink CDC 版本 | Flink 版本 | Pipeline Source | Pipeline Sink | 备注 | |:------------:|:-------------------------:|:-------------------------:|:-------------------------------------------------------------------------------------------:|:-------------------------------:| -| **3.6.x** | 1.20.\*, 2.2.\* | MySQL, PostgreSQL, Oracle | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute, Iceberg, Fluss, Hudi | | +| **3.6.x** | 1.20.\*, 2.2.\* | MySQL, PostgreSQL, Oracle, DB2 | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute, Iceberg, Fluss, Hudi | | | **3.5.x** | 1.19.\*, 1.20.\* | MySQL, PostgreSQL | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute, Iceberg, Fluss | | | **3.4.x** | 1.19.\*, 1.20.\* | MySQL | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute, Iceberg | | | **3.3.x** | 1.19.\*, 1.20.\* | MySQL | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute | | diff --git a/docs/content/docs/connectors/pipeline-connectors/db2.md b/docs/content/docs/connectors/pipeline-connectors/db2.md new file mode 100644 index 00000000000..8e3910f8287 --- /dev/null +++ b/docs/content/docs/connectors/pipeline-connectors/db2.md @@ -0,0 +1,200 @@ +--- +title: "DB2" +weight: 4 +type: docs +aliases: +- /connectors/pipeline-connectors/db2 +--- + + +# DB2 Connector + +DB2 CDC Pipeline connector allows reading snapshot data and incremental change data from DB2 databases. +The connector is based on the DB2 CDC source connector and emits Flink CDC pipeline events for end-to-end synchronization. + +## Dependencies + +The IBM DB2 JDBC driver uses the IBM IPLA license, which is incompatible with the Flink CDC project. 
+Flink CDC does not package this driver in the DB2 pipeline connector jar. +You must provide the following dependency manually and pass it to the Flink CDC CLI with the `--jar` argument when submitting YAML pipeline jobs. + +
+| Dependency Item | Description |
+|-----------------|-------------|
+| com.ibm.db2.jcc:db2jcc:db2jcc4 | Used for connecting to DB2 databases. |
    + +## Prerequisites + +Before using the DB2 pipeline connector, make sure that: + +1. DB2 is configured for CDC capture, and the target tables are enabled for DB2 change capture. +2. The DB2 server is reachable from the Flink cluster over TCP. +3. The configured user has permission to read the captured tables and DB2 CDC metadata tables. +4. All tables in one pipeline source belong to the same DB2 database. The `tables` option uses the `database.schema.table` format. + +### Enable CDC in DB2 + +The DB2 pipeline connector does not create DB2 SQL Replication artifacts automatically. A DB2 instance owner or database administrator must complete the DB2-side CDC setup before submitting the pipeline. For the full upstream procedure, see [Setting up Db2 in the Debezium DB2 connector documentation](https://debezium.io/documentation/reference/stable/connectors/db2.html#setting-up-db2). + +1. Enable DB2 SQL Replication and archive logging or log retention for the source database. In containerized DB2 environments this is commonly configured before database creation. For example, the connector test image starts DB2 with `ARCHIVE_LOGS=true`. After changing logging mode, take a full database backup and restart or activate the database so the ASN capture service has a valid log start point. +2. Install the Debezium DB2 management UDFs, or create equivalent DB2 SQL Replication control objects. The UDF setup compiles `asncdc`, installs it into the DB2 function directory, binds catalog access with `db2 bind db2schema.bnd blocking all grant public sqlerror continue`, and creates the `ASNCDC` schema, metadata tables, and helper routines such as `ASNCDC.ADDTABLE`, `ASNCDC.REMOVETABLE`, and `ASNCDC.ASNCDCSERVICES`. +3. Start the ASN capture service and verify that it is running: + + ```sql + VALUES ASNCDC.ASNCDCSERVICES('start','asncdc'); + VALUES ASNCDC.ASNCDCSERVICES('status','asncdc'); + ``` + +4. Put every source table into capture mode, then reinitialize the ASN service. 
Replace the schema and table names with your source table identifiers: + + ```sql + CALL ASNCDC.ADDTABLE('DB2INST1', 'ORDERS'); + VALUES ASNCDC.ASNCDCSERVICES('reinit','asncdc'); + ``` + +5. Verify that the table is registered and has an associated change table: + + ```sql + SELECT SOURCE_OWNER, SOURCE_TABLE, CD_OWNER, CD_TABLE, STATE + FROM ASNCDC.IBMSNAP_REGISTER + WHERE SOURCE_OWNER = 'DB2INST1' AND SOURCE_TABLE = 'ORDERS'; + ``` + +6. Grant the Flink CDC runtime user read access to the source table, the ASNCDC metadata table, and each generated change table: + + ```sql + GRANT SELECT ON TABLE DB2INST1.ORDERS TO USER FLINK_USER; + GRANT SELECT ON TABLE ASNCDC.IBMSNAP_REGISTER TO USER FLINK_USER; + GRANT SELECT ON TABLE ASNCDC.CDC_DB2INST1_ORDERS TO USER FLINK_USER; + ``` + +When DB2 identifiers are created without quotes, DB2 stores schema and table names in uppercase. Use the stored uppercase names in both `ASNCDC.ADDTABLE` and the pipeline `tables` option, for example `TESTDB.DB2INST1.ORDERS`. + +DB2 SQL Replication does not support capturing tables that contain `BOOLEAN` columns. Such tables can be read during snapshot, but incremental CDC records are not available until the schema uses a supported replacement type. Review IBM and Debezium licensing requirements for SQL Replication before enabling this setup in production. 
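+
+As a minimal sketch of the naming rule above, assuming the tables were created without quoted identifiers and reusing the placeholder names `TESTDB` and `DB2INST1` from the earlier steps, a source block could select every table in the schema by its stored uppercase name. `ORDER_FLAGS` is a hypothetical table with a `BOOLEAN` column; it is excluded here so that every selected table can produce incremental records:
+
+```yaml
+source:
+  type: db2
+  # ... other configurations
+  # Stored uppercase names in database.schema.table form.
+  # `\.*` is the backslash-escaped wildcard described for the `tables` option.
+  tables: TESTDB.DB2INST1.\.*
+  # Hypothetical table with a BOOLEAN column, left out of capture.
+  tables.exclude: TESTDB.DB2INST1.ORDER_FLAGS
+```
+
+Each table matched by `tables` still has to be registered with `ASNCDC.ADDTABLE` as shown in step 4; the selector alone does not put a table into capture mode.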
+ +## Example + +An example pipeline for reading data from DB2 and writing to Fluss can be defined as follows: + +```yaml +source: + type: db2 + name: DB2 Source + hostname: 127.0.0.1 + port: 50000 + username: db2inst1 + password: pass + tables: TESTDB.DB2INST1.ORDERS + schema-change.enabled: true + metadata.list: op_ts,table_name,database_name,schema_name + +sink: + type: fluss + name: Fluss Sink + bootstrap.servers: localhost:9123 + +pipeline: + name: DB2 to Fluss Pipeline + parallelism: 4 +``` + +## Connector Options + +| Option | Required | Default | Type | Description | +|--------|----------|---------|------|-------------| +| hostname | required | (none) | String | IP address or hostname of the DB2 database server. | +| port | optional | 50000 | Integer | Integer port number of the DB2 database server. | +| username | required | (none) | String | Name of the DB2 user to use when connecting to the DB2 database server. | +| password | required | (none) | String | Password to use when connecting to the DB2 database server. | +| tables | required | (none) | String | DB2 tables to monitor. Regular expressions are supported. The expected format is `database.schema.table`. Dot characters are treated as delimiters; escape dots used as regular expression wildcards with a backslash. | +| tables.exclude | optional | (none) | String | DB2 tables to exclude after applying the `tables` selector. The format is the same as `tables`. | +| schema-change.enabled | optional | true | Boolean | Whether to send schema change events so that downstream sinks can respond to table structure changes. | +| server-time-zone | optional | UTC | String | The session time zone of the database server. | +| scan.startup.mode | optional | initial | String | Startup mode for DB2 CDC consumer. Valid values are `initial` and `latest-offset`. | +| scan.incremental.snapshot.chunk.size | optional | 8096 | Integer | The chunk size, in rows, used when splitting captured tables during snapshot reading. 
| +| scan.snapshot.fetch.size | optional | 1024 | Integer | The maximum fetch size per poll when reading a table snapshot. | +| scan.incremental.snapshot.chunk.key-column | optional | (none) | String | The chunk key column used for table snapshot splitting. By default, the first primary-key column is used. | +| scan.incremental.snapshot.backfill.skip | optional | true | Boolean | Whether to skip backfill in the snapshot reading phase. Skipping backfill can replay changes that happened during the snapshot phase, so downstream sinks must handle at-least-once events correctly. | +| scan.incremental.close-idle-reader.enabled | optional | false | Boolean | Whether to close idle readers at the end of the snapshot phase. This requires checkpoint support after tasks finish. | +| connect.timeout | optional | 30s | Duration | The maximum time to wait while establishing a DB2 connection. | +| connect.max-retries | optional | 3 | Integer | The maximum number of retries for building a DB2 database connection. | +| connection.pool.size | optional | 20 | Integer | The connection pool size. | +| metadata.list | optional | (none) | String | List of readable metadata fields from SourceRecord to pass downstream and use in transform expressions, separated by commas. Available metadata fields are: `op_ts`, `table_name`, `database_name`, and `schema_name`. | +| jdbc.properties.* | optional | (none) | String | Custom JDBC URL properties passed to the DB2 JDBC connection. | +| debezium.* | optional | (none) | String | Pass-through Debezium properties for the embedded DB2 connector. | + +## Startup Reading Position + +The `scan.startup.mode` option specifies where DB2 CDC starts reading: + +- `initial` (default): Performs an initial snapshot on the monitored tables, then continues to read the DB2 change stream. +- `latest-offset`: Skips the snapshot and starts reading from the end of the DB2 change stream. 
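+
+For instance, a source that should skip the initial snapshot could set the option explicitly. This is only a sketch that reuses the placeholder table name from the example pipeline:
+
+```yaml
+source:
+  type: db2
+  # ... other configurations
+  tables: TESTDB.DB2INST1.ORDERS
+  # Skip the snapshot and read only changes made after the job starts.
+  scan.startup.mode: latest-offset
+```
+
+Rows that already exist in the table are not emitted in this mode, so it fits cases where the sink was loaded by other means or only new changes matter.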
+ +## Supported Metadata Columns + +DB2 CDC pipeline connector supports reading metadata columns from source records. These metadata columns can be used in transform operations or passed to downstream sinks. + +Some metadata information is also available through Transform expressions, such as `__namespace_name__`, `__schema_name__`, and `__table_name__`. The `op_ts` metadata field is only available through `metadata.list` and represents the operation timestamp from the database. + +```yaml +source: + type: db2 + # ... other configurations + metadata.list: op_ts,table_name,database_name,schema_name +``` + +| Metadata Column | Data Type | Description | +|-----------------|-----------|-------------| +| op_ts | BIGINT NOT NULL | The timestamp, in milliseconds since epoch, when the change event occurred in the database. For snapshot records, this value is 0. | +| table_name | STRING NOT NULL | The name of the table that contains the changed row. | +| database_name | STRING NOT NULL | The name of the database that contains the changed row. | +| schema_name | STRING NOT NULL | The name of the schema that contains the changed row. | + +## Available Source Metrics + +Metrics can help understand the progress of assignments. 
The following Flink metrics are supported: + +| Group | Name | Type | Description | +|-------|------|------|-------------| +| namespace.schema.table | isSnapshotting | Gauge | Whether the table is snapshotting | +| namespace.schema.table | isStreamReading | Gauge | Whether the table is stream reading | +| namespace.schema.table | numTablesSnapshotted | Gauge | The number of tables that have been snapshotted | +| namespace.schema.table | numTablesRemaining | Gauge | The number of tables that have not been snapshotted | +| namespace.schema.table | numSnapshotSplitsProcessed | Gauge | The number of splits that are being processed | +| namespace.schema.table | numSnapshotSplitsRemaining | Gauge | The number of splits that have not been processed | +| namespace.schema.table | numSnapshotSplitsFinished | Gauge | The number of splits that have been processed | +| namespace.schema.table | snapshotStartTime | Gauge | The time when the snapshot started | +| namespace.schema.table | snapshotEndTime | Gauge | The time when the snapshot ended | + +The group name is `namespace.schema.table`, where `namespace` is the DB2 database name, `schema` is the DB2 schema name, and `table` is the DB2 table name. + +{{< top >}} diff --git a/docs/content/docs/connectors/pipeline-connectors/overview.md b/docs/content/docs/connectors/pipeline-connectors/overview.md index 5cf6dbd2fa4..37aaedce3ae 100644 --- a/docs/content/docs/connectors/pipeline-connectors/overview.md +++ b/docs/content/docs/connectors/pipeline-connectors/overview.md @@ -36,6 +36,7 @@ definition. 
| Connector | Supported Type | External System | Download Page | |----------------------------------------------------------------------------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | [Apache Doris]({{< ref "docs/connectors/pipeline-connectors/doris" >}}) | Sink |
  • [Apache Doris](https://doris.apache.org/): 1.2.x, 2.x.x, 3.x.x | [Apache Doris](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-doris) | +| [DB2]({{< ref "docs/connectors/pipeline-connectors/db2" >}}) | Source |
  • [IBM DB2](https://www.ibm.com/products/db2) | [DB2](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-db2) | | [Elasticsearch]({{< ref "docs/connectors/pipeline-connectors/elasticsearch" >}}) | Sink |
  • [Elasticsearch](https://www.elastic.co/elasticsearch): 6.x, 7.x, 8.x | [Elasticsearch](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-elasticsearch) | | [Fluss]({{< ref "docs/connectors/pipeline-connectors/fluss" >}}) | Sink |
  • [Fluss](https://fluss.apache.org/): 0.7, 0.8, 0.9 | [Fluss](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-fluss) | | [Hudi]({{< ref "docs/connectors/pipeline-connectors/hudi" >}}) | Sink |
  • [Apache Hudi](https://hudi.apache.org/) | [Apache Hudi](https://mvnrepository.com/artifact/org.apache.flink/flink-cdc-pipeline-connector-hudi) | @@ -57,7 +58,7 @@ The following table shows the version mapping between Flink CDC Pipeline Connect | Flink CDC Version | Flink Version | Pipeline Source | Pipeline Sink | Notes | |:-----------------:|:-------------------------:|:-------------------------:|:-------------------------------------------------------------------------------------------:|:----------------------------------------:| -| **3.6.x** | 1.20.\*, 2.2.\* | MySQL, PostgreSQL, Oracle | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute, Iceberg, Fluss, Hudi | | +| **3.6.x** | 1.20.\*, 2.2.\* | MySQL, PostgreSQL, Oracle, DB2 | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute, Iceberg, Fluss, Hudi | | | **3.5.x** | 1.19.\*, 1.20.\* | MySQL, PostgreSQL | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute, Iceberg, Fluss | | | **3.4.x** | 1.19.\*, 1.20.\* | MySQL | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute, Iceberg | | | **3.3.x** | 1.19.\*, 1.20.\* | MySQL | StarRocks, Doris, Paimon, Kafka, Elasticsearch, OceanBase, MaxCompute | | diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/pom.xml b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/pom.xml new file mode 100644 index 00000000000..11abb7f9f6c --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/pom.xml @@ -0,0 +1,256 @@ + + + + + + org.apache.flink + flink-cdc-pipeline-connectors + ${revision} + + 4.0.0 + + flink-cdc-pipeline-connector-db2 + + + + + + + + org.apache.flink + flink-connector-db2-cdc + ${project.version} + + + + com.ibm.db2.jcc + db2jcc + db2jcc4 + + + + org.apache.flink + flink-connector-db2-cdc + ${project.version} + test + test-jar + + + + org.apache.flink + 
flink-connector-test-util + ${project.version} + test + + + + io.debezium + debezium-core + ${debezium.version} + test-jar + test + + + + org.apache.flink + flink-table-planner_${scala.binary.version} + ${flink.version} + test + + + + org.apache.flink + flink-table-runtime + ${flink.version} + test + + + + org.apache.flink + flink-test-utils + ${flink.version} + test + + + + org.apache.flink + flink-connector-test-utils + ${flink.version} + test + + + + org.apache.flink + flink-core + ${flink.version} + test-jar + test + + + + org.apache.flink + flink-streaming-java + ${flink.version} + test-jar + test + + + + org.apache.flink + flink-table-common + ${flink.version} + test-jar + test + + + + org.apache.flink + flink-tests + ${flink.version} + test-jar + test + + + + + org.testcontainers + db2 + ${testcontainers.version} + test + + + + org.apache.flink + flink-table-planner_${scala.binary.version} + ${flink.version} + test-jar + test + + + + org.apache.flink + flink-cdc-composer + ${project.version} + test + + + + org.apache.flink + flink-cdc-pipeline-connector-values + ${project.version} + test + + + + org.apache.flink + flink-test-utils-junit + ${flink.version} + test + + + + + + + + org.apache.maven.plugins + maven-shade-plugin + ${maven.shade.plugin.version} + + + shade-flink + package + + shade + + + false + + + io.debezium:debezium-api + io.debezium:debezium-embedded + io.debezium:debezium-core + io.debezium:debezium-connector-db2 + org.apache.flink:flink-cdc-base + org.apache.flink:flink-connector-debezium + org.apache.flink:flink-connector-db2-cdc + com.zaxxer:HikariCP + com.google.protobuf:protobuf-java + com.google.guava:* + org.apache.kafka:* + com.fasterxml.*:* + + org.apache.flink:flink-shaded-guava + + + + + org.apache.kafka:* + + kafka/kafka-version.properties + LICENSE + + NOTICE + common/** + + + + + + org.apache.kafka + + org.apache.flink.cdc.connectors.shaded.org.apache.kafka + + + + com.google + + org.apache.flink.cdc.connectors.shaded.com.google 
+ + + + com.fasterxml + + org.apache.flink.cdc.connectors.shaded.com.fasterxml + + + + + + + + + + org.apache.maven.plugins + maven-jar-plugin + + + test-jar + + test-jar + + + + + + + diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/factory/Db2DataSourceFactory.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/factory/Db2DataSourceFactory.java new file mode 100644 index 00000000000..cfca39c136d --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/factory/Db2DataSourceFactory.java @@ -0,0 +1,407 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.cdc.connectors.db2.factory; + +import org.apache.flink.cdc.common.annotation.Internal; +import org.apache.flink.cdc.common.configuration.ConfigOption; +import org.apache.flink.cdc.common.configuration.Configuration; +import org.apache.flink.cdc.common.event.TableId; +import org.apache.flink.cdc.common.factories.DataSourceFactory; +import org.apache.flink.cdc.common.factories.Factory; +import org.apache.flink.cdc.common.factories.FactoryHelper; +import org.apache.flink.cdc.common.schema.Selectors; +import org.apache.flink.cdc.common.source.DataSource; +import org.apache.flink.cdc.common.utils.StringUtils; +import org.apache.flink.cdc.connectors.base.options.StartupOptions; +import org.apache.flink.cdc.connectors.db2.source.Db2DataSource; +import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfig; +import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfigFactory; +import org.apache.flink.cdc.connectors.db2.table.Db2ReadableMetadata; +import org.apache.flink.cdc.connectors.db2.utils.Db2SchemaUtils; +import org.apache.flink.table.api.ValidationException; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.annotation.Nullable; + +import java.time.Duration; +import java.time.ZoneId; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Properties; +import java.util.Set; +import java.util.stream.Collectors; + +import static org.apache.flink.cdc.connectors.base.utils.ObjectUtils.doubleCompare; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_LOWER_BOUND; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_UPPER_BOUND; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.CHUNK_META_GROUP_SIZE; +import static 
org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.CONNECTION_POOL_SIZE; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.CONNECT_MAX_RETRIES; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.CONNECT_TIMEOUT; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.HOSTNAME; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.METADATA_LIST; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.PASSWORD; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.PORT; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCAN_INCREMENTAL_CLOSE_IDLE_READER_ENABLED; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCAN_INCREMENTAL_SNAPSHOT_BACKFILL_SKIP; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCAN_INCREMENTAL_SNAPSHOT_CHUNK_KEY_COLUMN; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCAN_INCREMENTAL_SNAPSHOT_CHUNK_SIZE; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCAN_INCREMENTAL_SNAPSHOT_UNBOUNDED_CHUNK_FIRST_ENABLED; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCAN_SNAPSHOT_FETCH_SIZE; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCAN_STARTUP_MODE; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCHEMA_CHANGE_ENABLED; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SERVER_TIME_ZONE; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.TABLES; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.TABLES_EXCLUDE; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.USERNAME; +import static 
org.apache.flink.cdc.debezium.table.DebeziumOptions.DEBEZIUM_OPTIONS_PREFIX; +import static org.apache.flink.cdc.debezium.table.DebeziumOptions.getDebeziumProperties; +import static org.apache.flink.cdc.debezium.utils.JdbcUrlUtils.PROPERTIES_PREFIX; +import static org.apache.flink.cdc.debezium.utils.JdbcUrlUtils.getJdbcProperties; +import static org.apache.flink.util.Preconditions.checkState; + +/** A {@link Factory} to create {@link Db2DataSource}. */ +@Internal +public class Db2DataSourceFactory implements DataSourceFactory { + + private static final Logger LOG = LoggerFactory.getLogger(Db2DataSourceFactory.class); + + public static final String IDENTIFIER = "db2"; + + @Override + public DataSource createDataSource(Context context) { + FactoryHelper.createFactoryHelper(this, context) + .validateExcept(PROPERTIES_PREFIX, DEBEZIUM_OPTIONS_PREFIX); + + final Configuration config = context.getFactoryConfiguration(); + String hostname = config.get(HOSTNAME); + int port = config.get(PORT); + String username = config.get(USERNAME); + String password = config.get(PASSWORD); + String chunkKeyColumn = config.get(SCAN_INCREMENTAL_SNAPSHOT_CHUNK_KEY_COLUMN); + String tables = config.get(TABLES); + ZoneId serverTimeZone = getServerTimeZone(config); + String tablesExclude = config.get(TABLES_EXCLUDE); + StartupOptions startupOptions = getStartupOptions(config); + + boolean includeSchemaChanges = config.get(SCHEMA_CHANGE_ENABLED); + + int fetchSize = config.get(SCAN_SNAPSHOT_FETCH_SIZE); + int splitSize = config.get(SCAN_INCREMENTAL_SNAPSHOT_CHUNK_SIZE); + int splitMetaGroupSize = config.get(CHUNK_META_GROUP_SIZE); + + double distributionFactorUpper = config.get(CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_UPPER_BOUND); + double distributionFactorLower = config.get(CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_LOWER_BOUND); + + boolean closeIdleReaders = config.get(SCAN_INCREMENTAL_CLOSE_IDLE_READER_ENABLED); + boolean isAssignUnboundedChunkFirst = + 
config.get(SCAN_INCREMENTAL_SNAPSHOT_UNBOUNDED_CHUNK_FIRST_ENABLED); + Duration connectTimeout = config.get(CONNECT_TIMEOUT); + int connectMaxRetries = config.get(CONNECT_MAX_RETRIES); + int connectionPoolSize = config.get(CONNECTION_POOL_SIZE); + boolean skipSnapshotBackfill = config.get(SCAN_INCREMENTAL_SNAPSHOT_BACKFILL_SKIP); + + validateIntegerOption(SCAN_INCREMENTAL_SNAPSHOT_CHUNK_SIZE, splitSize, 1); + validateIntegerOption(CHUNK_META_GROUP_SIZE, splitMetaGroupSize, 1); + validateIntegerOption(SCAN_SNAPSHOT_FETCH_SIZE, fetchSize, 1); + validateIntegerOption(CONNECTION_POOL_SIZE, connectionPoolSize, 1); + validateIntegerOption(CONNECT_MAX_RETRIES, connectMaxRetries, 0); + validateDistributionFactorUpper(distributionFactorUpper); + validateDistributionFactorLower(distributionFactorLower); + + Map<String, String> configMap = config.toMap(); + mergeJdbcPropertiesIntoDebeziumProperties(configMap); + String databaseName = getValidateDatabaseName(tables); + + Db2SourceConfigFactory configFactory = new Db2SourceConfigFactory(); + configFactory + .hostname(hostname) + .port(port) + .databaseList(databaseName) + .tableList(".*\\..*") + .username(username) + .password(password) + .serverTimeZone(serverTimeZone.getId()) + .debeziumProperties(getDebeziumProperties(configMap)) + .splitSize(splitSize) + .splitMetaGroupSize(splitMetaGroupSize) + .distributionFactorUpper(distributionFactorUpper) + .distributionFactorLower(distributionFactorLower) + .fetchSize(fetchSize) + .connectTimeout(connectTimeout) + .connectMaxRetries(connectMaxRetries) + .connectionPoolSize(connectionPoolSize) + .includeSchemaChanges(includeSchemaChanges) + .startupOptions(startupOptions) + .chunkKeyColumn(chunkKeyColumn) + .closeIdleReaders(closeIdleReaders) + .skipSnapshotBackfill(skipSnapshotBackfill) + .assignUnboundedChunkFirst(isAssignUnboundedChunkFirst); + + Db2SourceConfig sourceConfig = configFactory.create(0); + + List<TableId> tableIds = Db2SchemaUtils.listTables(sourceConfig, null, null); + + Selectors selectors 
= new Selectors.SelectorsBuilder().includeTables(tables).build(); + List<String> capturedTables = getTableList(tableIds, selectors); + if (capturedTables.isEmpty()) { + throw new IllegalArgumentException( + "Cannot find any table by the option 'tables' = " + tables); + } + if (tablesExclude != null) { + Selectors selectExclude = + new Selectors.SelectorsBuilder().includeTables(tablesExclude).build(); + List<String> excludeTables = getTableList(tableIds, selectExclude); + if (!excludeTables.isEmpty()) { + capturedTables.removeAll(excludeTables); + } + if (capturedTables.isEmpty()) { + throw new IllegalArgumentException( + "Cannot find any table with the option 'tables.exclude' = " + + tablesExclude); + } + } + configFactory.tableList(capturedTables.toArray(new String[0])); + + String metadataList = config.get(METADATA_LIST); + List<Db2ReadableMetadata> readableMetadataList = listReadableMetadata(metadataList); + + return new Db2DataSource(configFactory, readableMetadataList); + } + + private void mergeJdbcPropertiesIntoDebeziumProperties(Map<String, String> configMap) { + Properties jdbcProperties = getJdbcProperties(configMap); + for (String propertyName : jdbcProperties.stringPropertyNames()) { + configMap.putIfAbsent( + DEBEZIUM_OPTIONS_PREFIX + "database." 
+ propertyName, + jdbcProperties.getProperty(propertyName)); + } + } + + private List<Db2ReadableMetadata> listReadableMetadata(String metadataList) { + if (StringUtils.isNullOrWhitespaceOnly(metadataList)) { + return new ArrayList<>(); + } + Set<String> readableMetadataSet = + Arrays.stream(metadataList.split(",")) + .map(String::trim) + .collect(Collectors.toSet()); + List<Db2ReadableMetadata> foundMetadata = new ArrayList<>(); + for (Db2ReadableMetadata metadata : Db2ReadableMetadata.values()) { + if (readableMetadataSet.contains(metadata.getKey())) { + foundMetadata.add(metadata); + readableMetadataSet.remove(metadata.getKey()); + } + } + if (readableMetadataSet.isEmpty()) { + return foundMetadata; + } + throw new IllegalArgumentException( + String.format( + "[%s] cannot be found in DB2 metadata.", + String.join(", ", readableMetadataSet))); + } + + @Override + public Set<ConfigOption<?>> requiredOptions() { + Set<ConfigOption<?>> options = new HashSet<>(); + options.add(HOSTNAME); + options.add(USERNAME); + options.add(PASSWORD); + options.add(TABLES); + return options; + } + + @Override + public Set<ConfigOption<?>> optionalOptions() { + Set<ConfigOption<?>> options = new HashSet<>(); + options.add(PORT); + options.add(TABLES_EXCLUDE); + options.add(SCAN_INCREMENTAL_SNAPSHOT_CHUNK_KEY_COLUMN); + options.add(SCAN_INCREMENTAL_SNAPSHOT_CHUNK_SIZE); + options.add(SCAN_SNAPSHOT_FETCH_SIZE); + options.add(SCHEMA_CHANGE_ENABLED); + options.add(SCAN_STARTUP_MODE); + options.add(SERVER_TIME_ZONE); + options.add(CONNECT_TIMEOUT); + options.add(CONNECT_MAX_RETRIES); + options.add(CONNECTION_POOL_SIZE); + options.add(SCAN_INCREMENTAL_CLOSE_IDLE_READER_ENABLED); + options.add(SCAN_INCREMENTAL_SNAPSHOT_BACKFILL_SKIP); + options.add(CHUNK_META_GROUP_SIZE); + options.add(CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_UPPER_BOUND); + options.add(CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_LOWER_BOUND); + options.add(METADATA_LIST); + options.add(SCAN_INCREMENTAL_SNAPSHOT_UNBOUNDED_CHUNK_FIRST_ENABLED); + return options; + } + + @Override + public String identifier() { + return IDENTIFIER; + } + + private 
static List<String> getTableList( + @Nullable List<TableId> tableIdList, Selectors selectors) { + List<TableId> tableIds = + tableIdList != null ? tableIdList : Collections.emptyList(); + return tableIds.stream() + .filter(selectors::isMatch) + // DB2 tableList format: schemaName.tableName (without a database prefix) + // See Db2SourceBuilder: "Each identifier is of the form + // <schemaName>.<tableName>" + .map(tableId -> tableId.getSchemaName() + "." + tableId.getTableName()) + .collect(Collectors.toList()); + } + + /** Checks whether the value of the given integer option is valid. */ + private void validateIntegerOption( + ConfigOption<Integer> option, int optionValue, int exclusiveMin) { + checkState( + optionValue > exclusiveMin, + String.format( + "The value of option '%s' must be larger than %d, but is %d", + option.key(), exclusiveMin, optionValue)); + } + + private static final String SCAN_STARTUP_MODE_VALUE_INITIAL = "initial"; + private static final String SCAN_STARTUP_MODE_VALUE_LATEST = "latest-offset"; + + private static StartupOptions getStartupOptions(Configuration config) { + String modeString = config.get(SCAN_STARTUP_MODE); + + switch (modeString.toLowerCase()) { + case SCAN_STARTUP_MODE_VALUE_INITIAL: + return StartupOptions.initial(); + case SCAN_STARTUP_MODE_VALUE_LATEST: + return StartupOptions.latest(); + + default: + throw new ValidationException( + String.format( + "Invalid value for option '%s'. Supported values are [%s, %s], but was: %s", + SCAN_STARTUP_MODE.key(), + SCAN_STARTUP_MODE_VALUE_INITIAL, + SCAN_STARTUP_MODE_VALUE_LATEST, + modeString)); + } + } + + /** Checks whether the given even distribution factor upper bound is valid. 
*/ + private void validateDistributionFactorUpper(double distributionFactorUpper) { + checkState( + doubleCompare(distributionFactorUpper, 1.0d) >= 0, + String.format( + "The value of option '%s' must be greater than or equal to %s, but is %s", + CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_UPPER_BOUND.key(), + 1.0d, + distributionFactorUpper)); + } + + /** Checks whether the given even distribution factor lower bound is valid. */ + private void validateDistributionFactorLower(double distributionFactorLower) { + checkState( + doubleCompare(distributionFactorLower, 0.0d) >= 0 + && doubleCompare(distributionFactorLower, 1.0d) <= 0, + String.format( + "The value of option '%s' must be between %s and %s inclusive, but is %s", + CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_LOWER_BOUND.key(), + 0.0d, + 1.0d, + distributionFactorLower)); + } + + /** + * Gets the database name from the tables configuration. + * + * @param tables Table name list. The format is "db.schema.table,db.schema.table,..." Each table + * name consists of three parts separated by ".", which are database name, schema name, and + * table name. + * @throws IllegalArgumentException If the input parameter is null or does not match the + * expected format, or if database names are inconsistent. + */ + private String getValidateDatabaseName(String tables) { + if (tables == null || tables.trim().isEmpty()) { + throw new IllegalArgumentException("Parameter tables cannot be null or empty"); + } + + String[] tableNames = tables.split(","); + String dbName = null; + + for (String tableName : tableNames) { + String trimmedTableName = tableName.trim(); + String[] tableNameParts = trimmedTableName.split("(? 
getJavaClass() { + return String.class; + } + + @Override + public Object read(Map metadata) { + if (metadata.containsKey(getName())) { + return metadata.get(getName()); + } + throw new IllegalArgumentException( + "database_name doesn't exist in the metadata: " + metadata); + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2DataSource.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2DataSource.java new file mode 100644 index 00000000000..ed89a5f39e2 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2DataSource.java @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.cdc.connectors.db2.source; + +import org.apache.flink.cdc.common.annotation.Internal; +import org.apache.flink.cdc.common.annotation.VisibleForTesting; +import org.apache.flink.cdc.common.source.DataSource; +import org.apache.flink.cdc.common.source.EventSourceProvider; +import org.apache.flink.cdc.common.source.FlinkSourceProvider; +import org.apache.flink.cdc.common.source.MetadataAccessor; +import org.apache.flink.cdc.common.source.SupportedMetadataColumn; +import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfig; +import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfigFactory; +import org.apache.flink.cdc.connectors.db2.source.dialect.Db2Dialect; +import org.apache.flink.cdc.connectors.db2.source.offset.LsnFactory; +import org.apache.flink.cdc.connectors.db2.table.Db2ReadableMetadata; +import org.apache.flink.cdc.debezium.table.DebeziumChangelogMode; + +import java.util.ArrayList; +import java.util.List; + +/** A {@link DataSource} for DB2 CDC connector. 
*/ +@Internal +public class Db2DataSource implements DataSource { + + private final Db2SourceConfigFactory configFactory; + private final Db2SourceConfig db2SourceConfig; + + private final List<Db2ReadableMetadata> readableMetadataList; + + public Db2DataSource(Db2SourceConfigFactory configFactory) { + this(configFactory, new ArrayList<>()); + } + + public Db2DataSource( + Db2SourceConfigFactory configFactory, List<Db2ReadableMetadata> readableMetadataList) { + this.configFactory = configFactory; + this.db2SourceConfig = configFactory.create(0); + this.readableMetadataList = readableMetadataList; + } + + @Override + public EventSourceProvider getEventSourceProvider() { + Db2EventDeserializer deserializer = + new Db2EventDeserializer( + DebeziumChangelogMode.ALL, + db2SourceConfig.isIncludeSchemaChanges(), + readableMetadataList); + + LsnFactory lsnFactory = new LsnFactory(); + Db2Dialect db2Dialect = new Db2Dialect(db2SourceConfig); + + Db2PipelineSource source = + new Db2PipelineSource(configFactory, deserializer, lsnFactory, db2Dialect); + + return FlinkSourceProvider.of(source); + } + + @Override + public MetadataAccessor getMetadataAccessor() { + return new Db2MetadataAccessor(db2SourceConfig); + } + + @Override + public SupportedMetadataColumn[] supportedMetadataColumns() { + return new SupportedMetadataColumn[] { + new OpTsMetadataColumn(), + new TableNameMetadataColumn(), + new DatabaseNameMetadataColumn(), + new SchemaNameMetadataColumn() + }; + } + + @Override + public boolean isParallelMetadataSource() { + // During the incremental stage, DB2 never emits schema change events on different + // partitions, since it has only one transaction log stream. 
+ return false; + } + + @VisibleForTesting + public Db2SourceConfig getDb2SourceConfig() { + return db2SourceConfig; + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2DataSourceOptions.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2DataSourceOptions.java new file mode 100644 index 00000000000..2bb46991007 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2DataSourceOptions.java @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.cdc.connectors.db2.source; + +import org.apache.flink.cdc.common.annotation.Experimental; +import org.apache.flink.cdc.common.annotation.PublicEvolving; +import org.apache.flink.cdc.common.configuration.ConfigOption; +import org.apache.flink.cdc.common.configuration.ConfigOptions; + +import java.time.Duration; + +/** Configurations for {@link Db2DataSource}. 
*/ +@PublicEvolving +public class Db2DataSourceOptions { + + public static final ConfigOption<String> HOSTNAME = + ConfigOptions.key("hostname") + .stringType() + .noDefaultValue() + .withDescription("IP address or hostname of the DB2 database server."); + + public static final ConfigOption<Integer> PORT = + ConfigOptions.key("port") + .intType() + .defaultValue(50000) + .withDescription("Integer port number of the DB2 database server."); + + public static final ConfigOption<String> USERNAME = + ConfigOptions.key("username") + .stringType() + .noDefaultValue() + .withDescription( + "Name of the DB2 user to use when connecting to the DB2 database server."); + + public static final ConfigOption<String> PASSWORD = + ConfigOptions.key("password") + .stringType() + .noDefaultValue() + .withDescription("Password to use when connecting to the DB2 database server."); + + public static final ConfigOption<String> TABLES = + ConfigOptions.key("tables") + .stringType() + .noDefaultValue() + .withDescription( + "Table names of the DB2 tables to monitor. Regular expressions are supported. " + + "The expected table name format is database.schema.table. " + + "Note that the dot (.) is treated as a delimiter for database, schema, and table names. " + + "To use a dot (.) in a regular expression to match any character, " + + "escape the dot with a backslash. " + + "For example: testdb.DB2INST1.\\.*, testdb.DB2INST1.user_table_[0-9]+."); + + public static final ConfigOption<String> SERVER_TIME_ZONE = + ConfigOptions.key("server-time-zone") + .stringType() + .defaultValue("UTC") + .withDescription( + "The session time zone of the database server. The default value is UTC."); + + public static final ConfigOption<String> SCAN_INCREMENTAL_SNAPSHOT_CHUNK_KEY_COLUMN = + ConfigOptions.key("scan.incremental.snapshot.chunk.key-column") + .stringType() + .noDefaultValue() + .withDescription( + "The chunk key of the table snapshot. Captured tables are split into multiple chunks by a chunk key when reading the table snapshot. 
" + + "By default, the chunk key is the first column of the primary key. " + + "This column must be a column of the primary key."); + + public static final ConfigOption<Integer> SCAN_INCREMENTAL_SNAPSHOT_CHUNK_SIZE = + ConfigOptions.key("scan.incremental.snapshot.chunk.size") + .intType() + .defaultValue(8096) + .withDescription( + "The chunk size (number of rows) of the table snapshot. Captured tables are split into multiple chunks when reading the table snapshot."); + + public static final ConfigOption<Integer> SCAN_SNAPSHOT_FETCH_SIZE = + ConfigOptions.key("scan.snapshot.fetch.size") + .intType() + .defaultValue(1024) + .withDescription( + "The maximum fetch size per poll when reading a table snapshot."); + + public static final ConfigOption<Duration> CONNECT_TIMEOUT = + ConfigOptions.key("connect.timeout") + .durationType() + .defaultValue(Duration.ofSeconds(30)) + .withDescription( + "The maximum time that the connector should wait after trying to connect to the DB2 database server before timing out."); + + public static final ConfigOption<Integer> CONNECTION_POOL_SIZE = + ConfigOptions.key("connection.pool.size") + .intType() + .defaultValue(20) + .withDescription("The connection pool size."); + + public static final ConfigOption<Integer> CONNECT_MAX_RETRIES = + ConfigOptions.key("connect.max-retries") + .intType() + .defaultValue(3) + .withDescription( + "The maximum number of retries for building a DB2 database server connection."); + + public static final ConfigOption<String> SCAN_STARTUP_MODE = + ConfigOptions.key("scan.startup.mode") + .stringType() + .defaultValue("initial") + .withDescription( + "Optional startup mode for DB2 CDC consumer, valid enumerations are " + + "\"initial\" or \"latest-offset\"."); + + public static final ConfigOption<Boolean> SCAN_INCREMENTAL_SNAPSHOT_BACKFILL_SKIP = + ConfigOptions.key("scan.incremental.snapshot.backfill.skip") + .booleanType() + .defaultValue(true) + .withDescription( + "Whether to skip backfill in the snapshot reading phase. 
If backfill is skipped, changes on captured tables during the snapshot phase will be consumed later in the change log reading phase instead of being merged into the snapshot. WARNING: Skipping backfill might lead to data inconsistency because some change log events that happened within the snapshot phase might be replayed (only at-least-once semantics are promised). For example, this can happen when updating an already updated value in the snapshot or deleting an already deleted entry in the snapshot. These replayed change log events should be handled specially."); + + // ---------------------------------------------------------------------------- + // experimental options, won't add them to documentation + // ---------------------------------------------------------------------------- + @Experimental + public static final ConfigOption<Integer> CHUNK_META_GROUP_SIZE = + ConfigOptions.key("chunk-meta.group.size") + .intType() + .defaultValue(1000) + .withDescription( + "The group size of chunk meta, if the meta size exceeds the group size, the meta will be divided into multiple groups."); + + @Experimental + public static final ConfigOption<Double> CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_UPPER_BOUND = + ConfigOptions.key("chunk-key.even-distribution.factor.upper-bound") + .doubleType() + .defaultValue(1000.0d) + .withFallbackKeys("split-key.even-distribution.factor.upper-bound") + .withDescription( + "The upper bound of the chunk key distribution factor. The distribution factor is used to determine whether the" + + " table has an even distribution." + + " The table chunks use the even distribution optimization when the data distribution is even," + + " and DB2 is queried for splitting when it is uneven." 
+ " The distribution factor can be calculated by (MAX(id) - MIN(id) + 1) / rowCount."); + + @Experimental + public static final ConfigOption<Double> CHUNK_KEY_EVEN_DISTRIBUTION_FACTOR_LOWER_BOUND = + ConfigOptions.key("chunk-key.even-distribution.factor.lower-bound") + .doubleType() + .defaultValue(0.05d) + .withFallbackKeys("split-key.even-distribution.factor.lower-bound") + .withDescription( + "The lower bound of the chunk key distribution factor. The distribution factor is used to determine whether the" + + " table has an even distribution." + + " The table chunks use the even distribution optimization when the data distribution is even," + + " and DB2 is queried for splitting when it is uneven." + + " The distribution factor can be calculated by (MAX(id) - MIN(id) + 1) / rowCount."); + + @Experimental + public static final ConfigOption<Boolean> SCAN_INCREMENTAL_CLOSE_IDLE_READER_ENABLED = + ConfigOptions.key("scan.incremental.close-idle-reader.enabled") + .booleanType() + .defaultValue(false) + .withDescription( + "Whether to close idle readers at the end of the snapshot phase. This feature depends on " + + "FLIP-147: Support Checkpoints After Tasks Finished. The Flink version must be " + + "greater than or equal to 1.14 when enabling this feature."); + + @Experimental + public static final ConfigOption<Boolean> SCHEMA_CHANGE_ENABLED = + ConfigOptions.key("schema-change.enabled") + .booleanType() + .defaultValue(true) + .withDescription( + "Whether to send schema change events. The default value is true. If set to false, schema changes will not be sent."); + + @Experimental + public static final ConfigOption<String> TABLES_EXCLUDE = + ConfigOptions.key("tables.exclude") + .stringType() + .noDefaultValue() + .withDescription( + "Table names of the DB2 tables to exclude. Regular expressions are supported. " + + "The expected table name format is database.schema.table. " + + "Note that the dot (.) is treated as a delimiter for database, schema, and table names. " + + "To use a dot (.) 
in a regular expression to match any character, " + + "escape the dot with a backslash. " + + "For example: testdb.DB2INST1.\\.*, testdb.DB2INST1.user_table_[0-9]+."); + + @Experimental + public static final ConfigOption<String> METADATA_LIST = + ConfigOptions.key("metadata.list") + .stringType() + .noDefaultValue() + .withDescription( + "List of readable metadata from SourceRecord to be passed downstream, split by `,`. " + + "Available readable metadata fields are: database_name, schema_name, table_name, op_ts."); + + @Experimental + public static final ConfigOption<Boolean> + SCAN_INCREMENTAL_SNAPSHOT_UNBOUNDED_CHUNK_FIRST_ENABLED = + ConfigOptions.key("scan.incremental.snapshot.unbounded-chunk-first.enabled") + .booleanType() + .defaultValue(false) + .withDescription( + "Whether to assign the unbounded chunks first during the snapshot reading phase. This might help reduce the risk of the TaskManager experiencing an out-of-memory (OOM) error when taking a snapshot of the largest unbounded chunk. Defaults to false."); +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2EventDeserializer.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2EventDeserializer.java new file mode 100644 index 00000000000..c9275cd32b6 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2EventDeserializer.java @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.cdc.connectors.db2.source; + +import org.apache.flink.cdc.common.annotation.Internal; +import org.apache.flink.cdc.common.data.DecimalData; +import org.apache.flink.cdc.common.event.SchemaChangeEvent; +import org.apache.flink.cdc.common.event.TableId; +import org.apache.flink.cdc.common.schema.Schema; +import org.apache.flink.cdc.common.types.DecimalType; +import org.apache.flink.cdc.common.utils.SchemaMergingUtils; +import org.apache.flink.cdc.connectors.db2.table.Db2ReadableMetadata; +import org.apache.flink.cdc.connectors.db2.utils.Db2SchemaUtils; +import org.apache.flink.cdc.debezium.event.DebeziumEventDeserializationSchema; +import org.apache.flink.cdc.debezium.history.FlinkJsonTableChangeSerializer; +import org.apache.flink.cdc.debezium.table.DebeziumChangelogMode; +import org.apache.flink.table.data.TimestampData; + +import io.debezium.data.Envelope; +import io.debezium.relational.history.TableChanges; +import io.debezium.relational.history.TableChanges.TableChange; +import org.apache.kafka.connect.data.Struct; +import org.apache.kafka.connect.source.SourceRecord; + +import java.math.BigDecimal; +import java.math.BigInteger; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static 
org.apache.flink.cdc.connectors.base.utils.SourceRecordUtils.getHistoryRecord; +import static org.apache.flink.cdc.connectors.base.utils.SourceRecordUtils.isSchemaChangeEvent; + +/** Event deserializer for {@link Db2DataSource}. */ +@Internal +public class Db2EventDeserializer extends DebeziumEventDeserializationSchema { + + private static final long serialVersionUID = 1L; + private final boolean includeSchemaChanges; + private final List<Db2ReadableMetadata> readableMetadataList; +
+ /** + * Cache to compute schema differences for ALTER events. + * + * <p>This cache is runtime-only and will be reconstructed from checkpointed split state (see + * {@link #initializeTableSchemaCacheFromSplitSchemas(Map)}). It must not be {@code final} + * because Java deserialization bypasses field initializers for {@code transient} fields. + */ + private transient Map<TableId, Schema> tableSchemaCache; + + private static final FlinkJsonTableChangeSerializer TABLE_CHANGE_SERIALIZER = + new FlinkJsonTableChangeSerializer(); + + public Db2EventDeserializer(DebeziumChangelogMode changelogMode, boolean includeSchemaChanges) { + super(new Db2SchemaDataTypeInference(), changelogMode); + this.includeSchemaChanges = includeSchemaChanges; + this.readableMetadataList = new ArrayList<>(); + this.tableSchemaCache = new HashMap<>(); + } + + public Db2EventDeserializer( + DebeziumChangelogMode changelogMode, + boolean includeSchemaChanges, + List<Db2ReadableMetadata> readableMetadataList) { + super(new Db2SchemaDataTypeInference(), changelogMode); + this.includeSchemaChanges = includeSchemaChanges; + this.readableMetadataList = readableMetadataList; + this.tableSchemaCache = new HashMap<>(); + } + + /** + * Initializes the schema cache from checkpointed split state. + * + * <p>The incremental source checkpoints Debezium {@link TableChange}s in {@code StreamSplit}'s + * {@code tableSchemas}. We use it as the source of truth to (re)build the local {@link Schema} + * cache after failover or task redistribution. + */
+ public void initializeTableSchemaCacheFromSplitSchemas( + Map<io.debezium.relational.TableId, TableChange> tableSchemas) { + if (!includeSchemaChanges || tableSchemas == null || tableSchemas.isEmpty()) { + return; + } + final Map<TableId, Schema> cache = getTableSchemaCache(); + for (Map.Entry<io.debezium.relational.TableId, TableChange> entry : + tableSchemas.entrySet()) { + final io.debezium.relational.TableId dbzTableId = entry.getKey(); + final TableChange tableChange = entry.getValue(); + if (dbzTableId == null || tableChange == null || tableChange.getTable() == null) { + continue; + } + final TableId tableId = + TableId.tableId(dbzTableId.catalog(), dbzTableId.schema(), dbzTableId.table()); + cache.putIfAbsent(tableId, Db2SchemaUtils.toSchema(tableChange.getTable())); + } + } + + private Map<TableId, Schema> getTableSchemaCache() { + if (tableSchemaCache == null) { + tableSchemaCache = new HashMap<>(); + } + return tableSchemaCache; + } + + @Override + protected List<SchemaChangeEvent> deserializeSchemaChangeRecord(SourceRecord record) { + if (!includeSchemaChanges) { + return Collections.emptyList(); + } + + try { + TableChanges changes = + TABLE_CHANGE_SERIALIZER.deserialize( + getHistoryRecord(record) + .document() + .getArray( + io.debezium.relational.history.HistoryRecord.Fields + .TABLE_CHANGES), + true); + + final Map<TableId, Schema> cache = getTableSchemaCache(); + List<SchemaChangeEvent> events = new ArrayList<>(); + for (TableChange change : changes) { + TableId tableId = + TableId.tableId( + change.getId().catalog(), + change.getId().schema(), + change.getId().table()); + switch (change.getType()) { + case CREATE: + Schema createdSchema = Db2SchemaUtils.toSchema(change.getTable()); + events.add( + new org.apache.flink.cdc.common.event.CreateTableEvent( + tableId, createdSchema)); + cache.put(tableId, createdSchema); + break; + case ALTER: + Schema newSchema = 
Db2SchemaUtils.toSchema(change.getTable()); + Schema oldSchema = cache.get(tableId); + if (oldSchema == null) { + events.add( + new org.apache.flink.cdc.common.event.CreateTableEvent( + tableId, newSchema)); + } else { + events.addAll( + SchemaMergingUtils.getSchemaDifference( + tableId, oldSchema, newSchema)); + } + cache.put(tableId, newSchema); + break; + case DROP: + events.add(new org.apache.flink.cdc.common.event.DropTableEvent(tableId)); + cache.remove(tableId); + break; + default: + // ignore others + } + } + return events; + } catch (Exception e) { + throw new IllegalStateException("Failed to deserialize DB2 schema change event", e); + } + } + + @Override + protected boolean isDataChangeRecord(SourceRecord record) { + org.apache.kafka.connect.data.Schema valueSchema = record.valueSchema(); + Struct value = (Struct) record.value(); + return value != null + && valueSchema != null + && valueSchema.field(Envelope.FieldName.OPERATION) != null + && value.getString(Envelope.FieldName.OPERATION) != null; + } + + @Override + protected boolean isSchemaChangeRecord(SourceRecord record) { + return isSchemaChangeEvent(record); + } + + @Override + protected TableId getTableId(SourceRecord record) { + // Debezium source record contains database/schema/table information in the source struct. + // Using SourceRecordUtils keeps the namespace (database) in the TableId so that schema + // change events and data change events refer to the same identifier. 
+        io.debezium.relational.TableId dbzTableId =
+                org.apache.flink.cdc.connectors.base.utils.SourceRecordUtils.getTableId(record);
+        return Db2SchemaUtils.toCdcTableId(dbzTableId);
+    }
+
+    @Override
+    protected Map<String, String> getMetadata(SourceRecord record) {
+        Map<String, String> metadataMap = new HashMap<>();
+        if (readableMetadataList == null || readableMetadataList.isEmpty()) {
+            return metadataMap;
+        }
+        readableMetadataList.forEach(
+                (db2ReadableMetadata -> {
+                    Object metadata = db2ReadableMetadata.getConverter().read(record);
+                    if (db2ReadableMetadata.equals(Db2ReadableMetadata.OP_TS)) {
+                        metadataMap.put(
+                                db2ReadableMetadata.getKey(),
+                                String.valueOf(((TimestampData) metadata).getMillisecond()));
+                    } else {
+                        metadataMap.put(db2ReadableMetadata.getKey(), String.valueOf(metadata));
+                    }
+                }));
+        return metadataMap;
+    }
+
+    @Override
+    protected Object convertToDecimal(
+            DecimalType decimalType, Object dbzObj, org.apache.kafka.connect.data.Schema schema) {
+        BigDecimal db2Decimal = decodeDb2CdcDecimal(dbzObj);
+        if (db2Decimal != null) {
+            return DecimalData.fromBigDecimal(
+                    db2Decimal, decimalType.getPrecision(), decimalType.getScale());
+        }
+        return super.convertToDecimal(decimalType, dbzObj, schema);
+    }
+
+    private static BigDecimal decodeDb2CdcDecimal(Object dbzObj) {
+        byte[] bytes;
+        if (dbzObj instanceof byte[]) {
+            bytes = (byte[]) dbzObj;
+        } else if (dbzObj instanceof ByteBuffer) {
+            ByteBuffer duplicate = ((ByteBuffer) dbzObj).duplicate();
+            bytes = new byte[duplicate.remaining()];
+            duplicate.get(bytes);
+        } else if (dbzObj instanceof String) {
+            try {
+                bytes = new BigDecimal((String) dbzObj).unscaledValue().toByteArray();
+            } catch (NumberFormatException e) {
+                return null;
+            }
+        } else if (dbzObj instanceof BigDecimal) {
+            bytes = ((BigDecimal) dbzObj).unscaledValue().toByteArray();
+        } else if (dbzObj instanceof BigInteger) {
+            bytes = ((BigInteger) dbzObj).toByteArray();
+        } else {
+            return null;
+        }
+
+        StringBuilder decimalText = new StringBuilder(bytes.length);
+        for (int i = bytes.length - 1; i >= 0; i--) {
+            int ch = bytes[i] & 0xFF;
+            if (ch == 0 || (ch & 0x80) != 0) {
+                continue;
+            }
+            if ((ch >= '0' && ch <= '9') || ch == '.' || ch == '-' || ch == '+') {
+                decimalText.append((char) ch);
+                continue;
+            }
+            return null;
+        }
+        if (decimalText.length() == 0) {
+            return null;
+        }
+        try {
+            return new BigDecimal(decimalText.toString());
+        } catch (NumberFormatException ignored) {
+            return null;
+        }
+    }
+}
diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2MetadataAccessor.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2MetadataAccessor.java
new file mode 100644
index 00000000000..56abc99d2e9
--- /dev/null
+++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2MetadataAccessor.java
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.db2.source;
+
+import org.apache.flink.cdc.common.annotation.Internal;
+import org.apache.flink.cdc.common.event.TableId;
+import org.apache.flink.cdc.common.schema.Schema;
+import org.apache.flink.cdc.common.source.MetadataAccessor;
+import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfig;
+import org.apache.flink.cdc.connectors.db2.utils.Db2SchemaUtils;
+
+import javax.annotation.Nullable;
+
+import java.util.List;
+
+/** {@link MetadataAccessor} for {@link Db2DataSource}. */
+@Internal
+public class Db2MetadataAccessor implements MetadataAccessor {
+
+    private final Db2SourceConfig sourceConfig;
+
+    public Db2MetadataAccessor(Db2SourceConfig sourceConfig) {
+        this.sourceConfig = sourceConfig;
+    }
+
+    /**
+     * List all databases from DB2.
+     *
+     * @return The list of database names
+     */
+    @Override
+    public List<String> listNamespaces() {
+        return Db2SchemaUtils.listNamespaces(sourceConfig);
+    }
+
+    /**
+     * List all schemas from a DB2 database.
+     *
+     * @param namespace The database name to list schemas from.
+     * @return The list of schema names
+     */
+    @Override
+    public List<String> listSchemas(@Nullable String namespace) {
+        return Db2SchemaUtils.listSchemas(sourceConfig, namespace);
+    }
+
+    /**
+     * List tables from DB2.
+     *
+     * @param namespace The database name. If null, uses the configured database.
+     * @param schemaName The schema name. If null, lists tables from all schemas.
+     * @return The list of {@link TableId}s.
+     */
+    @Override
+    public List<TableId> listTables(@Nullable String namespace, @Nullable String schemaName) {
+        return Db2SchemaUtils.listTables(sourceConfig, namespace, schemaName);
+    }
+
+    /**
+     * Get the {@link Schema} of the given table.
+     *
+     * @param tableId The {@link TableId} of the given table.
+     * @return The {@link Schema} of the table.
+     */
+    @Override
+    public Schema getTableSchema(TableId tableId) {
+        return Db2SchemaUtils.getTableSchema(sourceConfig, tableId);
+    }
+}
diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2PipelineSource.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2PipelineSource.java
new file mode 100644
index 00000000000..5fef7ad283e
--- /dev/null
+++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2PipelineSource.java
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.db2.source;
+
+import org.apache.flink.cdc.common.annotation.Internal;
+import org.apache.flink.cdc.common.event.Event;
+import org.apache.flink.cdc.connectors.base.config.SourceConfig;
+import org.apache.flink.cdc.connectors.base.source.meta.split.SourceRecords;
+import org.apache.flink.cdc.connectors.base.source.meta.split.SourceSplitState;
+import org.apache.flink.cdc.connectors.base.source.metrics.SourceReaderMetrics;
+import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfig;
+import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfigFactory;
+import org.apache.flink.cdc.connectors.db2.source.dialect.Db2Dialect;
+import org.apache.flink.cdc.connectors.db2.source.offset.LsnFactory;
+import org.apache.flink.cdc.connectors.db2.source.reader.Db2PipelineRecordEmitter;
+import org.apache.flink.cdc.debezium.DebeziumDeserializationSchema;
+import org.apache.flink.connector.base.source.reader.RecordEmitter;
+
+/**
+ * The DB2 CDC Source for Pipeline connector, which supports parallel snapshot reading of tables and
+ * then continues to capture data changes from the transaction log.
+ *
+ * <p>This source extends {@link Db2SourceBuilder.Db2IncrementalSource} and overrides the record
+ * emitter to use {@link Db2PipelineRecordEmitter} for proper handling of schema events in the CDC
+ * pipeline.
+ */
+@Internal
+public class Db2PipelineSource extends Db2SourceBuilder.Db2IncrementalSource<Event> {
+
+    private static final long serialVersionUID = 1L;
+
+    public Db2PipelineSource(
+            Db2SourceConfigFactory configFactory,
+            DebeziumDeserializationSchema<Event> deserializationSchema,
+            LsnFactory offsetFactory,
+            Db2Dialect dataSourceDialect) {
+        super(configFactory, deserializationSchema, offsetFactory, dataSourceDialect);
+    }
+
+    @Override
+    protected RecordEmitter<SourceRecords, Event, SourceSplitState> createRecordEmitter(
+            SourceConfig sourceConfig, SourceReaderMetrics sourceReaderMetrics) {
+        Db2SourceConfig db2SourceConfig = (Db2SourceConfig) sourceConfig;
+        Db2Dialect db2Dialect = (Db2Dialect) dataSourceDialect;
+        return new Db2PipelineRecordEmitter<>(
+                deserializationSchema,
+                sourceReaderMetrics,
+                db2SourceConfig,
+                offsetFactory,
+                db2Dialect);
+    }
+}
diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2SchemaDataTypeInference.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2SchemaDataTypeInference.java
new file mode 100644
index 00000000000..7646a3d7274
--- /dev/null
+++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/Db2SchemaDataTypeInference.java
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.db2.source;
+
+import org.apache.flink.cdc.common.annotation.Internal;
+import org.apache.flink.cdc.common.types.DataType;
+import org.apache.flink.cdc.debezium.event.DebeziumSchemaDataTypeInference;
+
+import org.apache.kafka.connect.data.Schema;
+
+/** {@link DataType} inference for DB2 Debezium {@link Schema}. */
+@Internal
+public class Db2SchemaDataTypeInference extends DebeziumSchemaDataTypeInference {
+
+    private static final long serialVersionUID = 1L;
+
+    // DB2 has database-specific types, but no special handling is currently
+    // needed here, so this class uses the default implementation from the parent class.
+    // If DB2-specific types require special handling in the future,
+    // they can be added here by overriding the inferStruct method.
+}
diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/OpTsMetadataColumn.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/OpTsMetadataColumn.java
new file mode 100644
index 00000000000..61a83a5c691
--- /dev/null
+++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/OpTsMetadataColumn.java
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.db2.source;
+
+import org.apache.flink.cdc.common.source.SupportedMetadataColumn;
+import org.apache.flink.cdc.common.types.DataType;
+import org.apache.flink.cdc.common.types.DataTypes;
+
+import java.util.Map;
+
+/** A {@link SupportedMetadataColumn} for op_ts. */
+public class OpTsMetadataColumn implements SupportedMetadataColumn {
+
+    @Override
+    public String getName() {
+        return "op_ts";
+    }
+
+    @Override
+    public DataType getType() {
+        return DataTypes.BIGINT().notNull();
+    }
+
+    @Override
+    public Class<?> getJavaClass() {
+        return Long.class;
+    }
+
+    @Override
+    public Object read(Map<String, String> metadata) {
+        if (metadata.containsKey(getName())) {
+            return Long.parseLong(metadata.get(getName()));
+        }
+        throw new IllegalArgumentException("op_ts doesn't exist in the metadata: " + metadata);
+    }
+}
diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/SchemaNameMetadataColumn.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/SchemaNameMetadataColumn.java
new file mode 100644
index 00000000000..6cbe6a9a15b
--- /dev/null
+++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/SchemaNameMetadataColumn.java
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.db2.source;
+
+import org.apache.flink.cdc.common.source.SupportedMetadataColumn;
+import org.apache.flink.cdc.common.types.DataType;
+import org.apache.flink.cdc.common.types.DataTypes;
+
+import java.util.Map;
+
+/** A {@link SupportedMetadataColumn} for schema_name. */
+public class SchemaNameMetadataColumn implements SupportedMetadataColumn {
+
+    @Override
+    public String getName() {
+        return "schema_name";
+    }
+
+    @Override
+    public DataType getType() {
+        return DataTypes.STRING().notNull();
+    }
+
+    @Override
+    public Class<?> getJavaClass() {
+        return String.class;
+    }
+
+    @Override
+    public Object read(Map<String, String> metadata) {
+        if (metadata.containsKey(getName())) {
+            return metadata.get(getName());
+        }
+        throw new IllegalArgumentException(
+                "schema_name doesn't exist in the metadata: " + metadata);
+    }
+}
diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/TableNameMetadataColumn.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/TableNameMetadataColumn.java
new file mode 100644
index 00000000000..c3cd3f03b5e
--- /dev/null
+++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/TableNameMetadataColumn.java
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.db2.source;
+
+import org.apache.flink.cdc.common.source.SupportedMetadataColumn;
+import org.apache.flink.cdc.common.types.DataType;
+import org.apache.flink.cdc.common.types.DataTypes;
+
+import java.util.Map;
+
+/** A {@link SupportedMetadataColumn} for table_name. */
+public class TableNameMetadataColumn implements SupportedMetadataColumn {
+
+    @Override
+    public String getName() {
+        return "table_name";
+    }
+
+    @Override
+    public DataType getType() {
+        return DataTypes.STRING().notNull();
+    }
+
+    @Override
+    public Class<?> getJavaClass() {
+        return String.class;
+    }
+
+    @Override
+    public Object read(Map<String, String> metadata) {
+        if (metadata.containsKey(getName())) {
+            return metadata.get(getName());
+        }
+        throw new IllegalArgumentException("table_name doesn't exist in the metadata: " + metadata);
+    }
+}
diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/reader/Db2PipelineRecordEmitter.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/reader/Db2PipelineRecordEmitter.java
new file mode 100644
index 00000000000..d0d709979b6
--- /dev/null
+++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/source/reader/Db2PipelineRecordEmitter.java
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.db2.source.reader;
+
+import org.apache.flink.api.connector.source.SourceOutput;
+import org.apache.flink.cdc.common.event.CreateTableEvent;
+import org.apache.flink.cdc.common.schema.Schema;
+import org.apache.flink.cdc.connectors.base.options.StartupOptions;
+import org.apache.flink.cdc.connectors.base.source.meta.offset.OffsetFactory;
+import org.apache.flink.cdc.connectors.base.source.meta.split.SourceSplitState;
+import org.apache.flink.cdc.connectors.base.source.metrics.SourceReaderMetrics;
+import org.apache.flink.cdc.connectors.base.source.reader.IncrementalSourceRecordEmitter;
+import org.apache.flink.cdc.connectors.db2.source.Db2EventDeserializer;
+import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfig;
+import org.apache.flink.cdc.connectors.db2.source.dialect.Db2Dialect;
+import org.apache.flink.cdc.connectors.db2.source.utils.Db2ConnectionUtils;
+import org.apache.flink.cdc.connectors.db2.utils.Db2SchemaUtils;
+import org.apache.flink.cdc.debezium.DebeziumDeserializationSchema;
+import org.apache.flink.connector.base.source.reader.RecordEmitter;
+
+import io.debezium.connector.db2.Db2Connection;
+import org.apache.kafka.connect.source.SourceRecord;
+
+import java.sql.SQLException;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import static org.apache.flink.cdc.connectors.base.source.meta.wartermark.WatermarkEvent.isLowWatermarkEvent;
+import static org.apache.flink.cdc.connectors.base.utils.SourceRecordUtils.getTableId;
+import static org.apache.flink.cdc.connectors.base.utils.SourceRecordUtils.isDataChangeRecord;
+import static org.apache.flink.cdc.connectors.base.utils.SourceRecordUtils.isSchemaChangeEvent;
+import static org.apache.flink.cdc.connectors.db2.source.utils.Db2ConnectionUtils.createDb2Connection;
+
+/** The {@link RecordEmitter} implementation for DB2 pipeline connector. */
+public class Db2PipelineRecordEmitter<T> extends IncrementalSourceRecordEmitter<T> {
+    private final Db2SourceConfig sourceConfig;
+    private final Db2Dialect db2Dialect;
+
+    // Track tables for which CreateTableEvent has already been sent.
+    private final Set<io.debezium.relational.TableId> alreadySendCreateTableTables;
+
+    // Used when the startup mode is snapshot (bounded mode).
+    private boolean shouldEmitAllCreateTableEventsInSnapshotMode = true;
+    private final boolean isBounded;
+
+    // Cache CreateTableEvent instances by table for O(1) lookup.
+    private final Map<io.debezium.relational.TableId, CreateTableEvent> createTableEventCache;
+
+    public Db2PipelineRecordEmitter(
+            DebeziumDeserializationSchema<T> debeziumDeserializationSchema,
+            SourceReaderMetrics sourceReaderMetrics,
+            Db2SourceConfig sourceConfig,
+            OffsetFactory offsetFactory,
+            Db2Dialect db2Dialect) {
+        super(
+                debeziumDeserializationSchema,
+                sourceReaderMetrics,
+                sourceConfig.isIncludeSchemaChanges(),
+                offsetFactory);
+        this.sourceConfig = sourceConfig;
+        this.db2Dialect = db2Dialect;
+        this.alreadySendCreateTableTables = new HashSet<>();
+        this.createTableEventCache = new HashMap<>();
+        this.isBounded = StartupOptions.snapshot().equals(sourceConfig.getStartupOptions());
+    }
+
+    @Override
+    protected void processElement(
+            SourceRecord element, SourceOutput<T> output, SourceSplitState splitState)
+            throws Exception {
+        // Rebuild schema cache from checkpointed split state before handling schema change
+        // records.
+        // The stream split checkpoints Debezium TableChange(s) (table schemas) and will be restored
+        // on failover; the deserializer's local cache is runtime-only and must be reinitialized.
+        if (isSchemaChangeEvent(element)
+                && splitState.isStreamSplitState()
+                && debeziumDeserializationSchema instanceof Db2EventDeserializer) {
+            ((Db2EventDeserializer) debeziumDeserializationSchema)
+                    .initializeTableSchemaCacheFromSplitSchemas(
+                            splitState.asStreamSplitState().getTableSchemas());
+        }
+
+        if (shouldEmitAllCreateTableEventsInSnapshotMode && isBounded) {
+            // In snapshot mode, emit all schemas at once.
+            ensureCreateTableEventsLoaded();
+            emitAllCreateTableEvents(output);
+            shouldEmitAllCreateTableEventsInSnapshotMode = false;
+        } else if (isLowWatermarkEvent(element) && splitState.isSnapshotSplitState()) {
+            // In the snapshot phase of INITIAL startup mode, lazily send CreateTableEvent
+            // downstream to avoid checkpoint timeouts.
+            io.debezium.relational.TableId tableId =
+                    splitState.asSnapshotSplitState().toSourceSplit().getTableId();
+            emitCreateTableEventIfNeeded(tableId, output);
+        } else if (isDataChangeRecord(element)) {
+            // Handle data change events; schema change events are handled downstream directly.
+            io.debezium.relational.TableId tableId = getTableId(element);
+            emitCreateTableEventIfNeeded(tableId, output);
+        }
+        super.processElement(element, output, splitState);
+    }
+
+    @SuppressWarnings("unchecked")
+    private void emitAllCreateTableEvents(SourceOutput<T> output) {
+        createTableEventCache.forEach(
+                (tableId, createTableEvent) -> {
+                    output.collect((T) createTableEvent);
+                    alreadySendCreateTableTables.add(tableId);
+                });
+    }
+
+    @SuppressWarnings("unchecked")
+    private void emitCreateTableEventIfNeeded(
+            io.debezium.relational.TableId tableId, SourceOutput<T> output) {
+        if (alreadySendCreateTableTables.contains(tableId)) {
+            return;
+        }
+
+        CreateTableEvent createTableEvent = createTableEventCache.get(tableId);
+        if (createTableEvent != null) {
+            output.collect((T) createTableEvent);
+        } else {
+            // The table is not in the cache, so fetch its schema from the database.
+            try (Db2Connection jdbc = createDb2Connection(sourceConfig.getDbzConnectorConfig())) {
+                createTableEvent = buildCreateTableEvent(jdbc, tableId);
+                output.collect((T) createTableEvent);
+                createTableEventCache.put(tableId, createTableEvent);
+            } catch (SQLException e) {
+                throw new RuntimeException("Failed to get table schema for " + tableId, e);
+            }
+        }
+        alreadySendCreateTableTables.add(tableId);
+    }
+
+    private CreateTableEvent buildCreateTableEvent(
+            Db2Connection jdbc, io.debezium.relational.TableId tableId) {
+        Schema schema = Db2SchemaUtils.getTableSchema(tableId, jdbc, db2Dialect);
+        return new CreateTableEvent(Db2SchemaUtils.toCdcTableId(tableId), schema);
+    }
+
+    private void ensureCreateTableEventsLoaded() {
+        if (!createTableEventCache.isEmpty()) {
+            return;
+        }
+        generateCreateTableEvents();
+    }
+
+    private void generateCreateTableEvents() {
+        try (Db2Connection jdbc = createDb2Connection(sourceConfig.getDbzConnectorConfig())) {
+            List<io.debezium.relational.TableId> capturedTableIds =
+                    Db2ConnectionUtils.listTables(jdbc, sourceConfig.getTableFilters());
+            for (io.debezium.relational.TableId tableId : capturedTableIds) {
+                CreateTableEvent createTableEvent = buildCreateTableEvent(jdbc, tableId);
+                createTableEventCache.put(tableId, createTableEvent);
+            }
+        } catch (SQLException e) {
+            throw new RuntimeException("Cannot start emitter to fetch table schema.", e);
+        }
+    }
+}
diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/table/Db2ReadableMetadata.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/table/Db2ReadableMetadata.java
new file mode 100644
index 00000000000..13c2e84d40d
--- /dev/null
+++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/table/Db2ReadableMetadata.java
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.db2.table;
+
+import org.apache.flink.cdc.debezium.table.MetadataConverter;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.data.StringData;
+import org.apache.flink.table.data.TimestampData;
+import org.apache.flink.table.types.DataType;
+
+import io.debezium.connector.AbstractSourceInfo;
+import io.debezium.data.Envelope;
+import org.apache.kafka.connect.data.Struct;
+import org.apache.kafka.connect.source.SourceRecord;
+
+/** Defines the supported metadata columns for DB2 pipeline source records. */
+public enum Db2ReadableMetadata {
+    /** Name of the table that contains the row. */
+    TABLE_NAME(
+            "table_name",
+            DataTypes.STRING().notNull(),
+            new MetadataConverter() {
+                private static final long serialVersionUID = 1L;
+
+                @Override
+                public Object read(SourceRecord record) {
+                    Struct messageStruct = (Struct) record.value();
+                    Struct sourceStruct = messageStruct.getStruct(Envelope.FieldName.SOURCE);
+                    return StringData.fromString(
+                            sourceStruct.getString(AbstractSourceInfo.TABLE_NAME_KEY));
+                }
+            }),
+
+    /** Name of the schema that contains the row. */
+    SCHEMA_NAME(
+            "schema_name",
+            DataTypes.STRING().notNull(),
+            new MetadataConverter() {
+                private static final long serialVersionUID = 1L;
+
+                @Override
+                public Object read(SourceRecord record) {
+                    Struct messageStruct = (Struct) record.value();
+                    Struct sourceStruct = messageStruct.getStruct(Envelope.FieldName.SOURCE);
+                    return StringData.fromString(
+                            sourceStruct.getString(AbstractSourceInfo.SCHEMA_NAME_KEY));
+                }
+            }),
+
+    /** Name of the database that contains the row. */
+    DATABASE_NAME(
+            "database_name",
+            DataTypes.STRING().notNull(),
+            new MetadataConverter() {
+                private static final long serialVersionUID = 1L;
+
+                @Override
+                public Object read(SourceRecord record) {
+                    Struct messageStruct = (Struct) record.value();
+                    Struct sourceStruct = messageStruct.getStruct(Envelope.FieldName.SOURCE);
+                    return StringData.fromString(
+                            sourceStruct.getString(AbstractSourceInfo.DATABASE_NAME_KEY));
+                }
+            }),
+
+    /**
+     * Indicates the time when the change was made in the database. If the record is read from a
+     * table snapshot instead of the change stream, the value is always 0.
+ */ + OP_TS( + "op_ts", + DataTypes.TIMESTAMP_LTZ(3).notNull(), + new MetadataConverter() { + private static final long serialVersionUID = 1L; + + @Override + public Object read(SourceRecord record) { + Struct messageStruct = (Struct) record.value(); + Struct sourceStruct = messageStruct.getStruct(Envelope.FieldName.SOURCE); + return TimestampData.fromEpochMillis( + (Long) sourceStruct.get(AbstractSourceInfo.TIMESTAMP_KEY)); + } + }); + + private final String key; + + private final DataType dataType; + + private final MetadataConverter converter; + + Db2ReadableMetadata(String key, DataType dataType, MetadataConverter converter) { + this.key = key; + this.dataType = dataType; + this.converter = converter; + } + + public String getKey() { + return key; + } + + public DataType getDataType() { + return dataType; + } + + public MetadataConverter getConverter() { + return converter; + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/utils/Db2SchemaUtils.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/utils/Db2SchemaUtils.java new file mode 100644 index 00000000000..19f1a1d2f22 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/utils/Db2SchemaUtils.java @@ -0,0 +1,291 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.cdc.connectors.db2.utils; + +import org.apache.flink.cdc.common.event.TableId; +import org.apache.flink.cdc.common.schema.Column; +import org.apache.flink.cdc.common.schema.Schema; +import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfig; +import org.apache.flink.cdc.connectors.db2.source.dialect.Db2Dialect; +import org.apache.flink.cdc.connectors.db2.source.utils.Db2ConnectionUtils; + +import io.debezium.connector.db2.Db2Connection; +import io.debezium.jdbc.JdbcConnection; +import io.debezium.relational.Table; +import io.debezium.relational.history.TableChanges.TableChange; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.annotation.Nullable; + +import java.sql.SQLException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; + +/** Utilities for converting from Debezium {@link Table} types to {@link Schema}. 
*/
+public class Db2SchemaUtils {
+
+    private static final Logger LOG = LoggerFactory.getLogger(Db2SchemaUtils.class);
+
+    public static List<String> listSchemas(Db2SourceConfig sourceConfig, String namespace) {
+        try (JdbcConnection jdbc = getDb2Dialect(sourceConfig).openJdbcConnection(sourceConfig)) {
+            return listSchemas(jdbc, namespace);
+        } catch (SQLException e) {
+            throw new RuntimeException(db2MetadataError("list schemas", e), e);
+        }
+    }
+
+    public static List<String> listNamespaces(Db2SourceConfig sourceConfig) {
+        if (sourceConfig.getDatabaseList() != null && !sourceConfig.getDatabaseList().isEmpty()) {
+            return new ArrayList<>(sourceConfig.getDatabaseList());
+        }
+        try (JdbcConnection jdbc = getDb2Dialect(sourceConfig).openJdbcConnection(sourceConfig)) {
+            return listNamespaces(jdbc);
+        } catch (SQLException e) {
+            throw new RuntimeException(db2MetadataError("list namespaces", e), e);
+        }
+    }
+
+    public static List<TableId> listTables(
+            Db2SourceConfig sourceConfig, @Nullable String dbName, @Nullable String schemaName) {
+        try (JdbcConnection jdbc = getDb2Dialect(sourceConfig).openJdbcConnection(sourceConfig)) {
+            List<io.debezium.relational.TableId> dbzTableIds =
+                    Db2ConnectionUtils.listTables(jdbc, sourceConfig.getTableFilters());
+
+            return dbzTableIds.stream()
+                    .filter(tableId -> dbName == null || dbName.equalsIgnoreCase(tableId.catalog()))
+                    .filter(
+                            tableId ->
+                                    schemaName == null
+                                            || schemaName.equalsIgnoreCase(tableId.schema()))
+                    .map(Db2SchemaUtils::toCdcTableId)
+                    .collect(Collectors.toList());
+        } catch (SQLException e) {
+            throw new RuntimeException(db2MetadataError("list tables", e), e);
+        }
+    }
+
+    public static Schema getTableSchema(Db2SourceConfig sourceConfig, TableId tableId) {
+        Db2Dialect dialect = getDb2Dialect(sourceConfig);
+        try (JdbcConnection jdbc = dialect.openJdbcConnection(sourceConfig)) {
+            return getTableSchema(tableId, (Db2Connection) jdbc, dialect);
+        } catch (SQLException e) {
+            throw new RuntimeException(db2MetadataError("get table schema", e), e);
+        }
+    }
+
+    public
static Db2Dialect getDb2Dialect(Db2SourceConfig sourceConfig) {
+        return new Db2Dialect(sourceConfig);
+    }
+
+    static String db2MetadataError(String action, SQLException e) {
+        return "Failed to "
+                + action
+                + ". Verify DB2 SQL Replication/ASNCDC is initialized, captured tables are registered, "
+                + "and the configured user can read ASNCDC metadata and change tables. Cause: "
+                + e.getMessage();
+    }
+
+    public static List<String> listSchemas(JdbcConnection jdbc, String namespace)
+            throws SQLException {
+        LOG.info("Read list of available schemas");
+        final List<String> schemaNames = new ArrayList<>();
+
+        jdbc.query(
+                "SELECT DISTINCT TABSCHEMA FROM SYSCAT.TABLES WHERE TYPE = 'T' ORDER BY TABSCHEMA",
+                rs -> {
+                    while (rs.next()) {
+                        String schemaName = rs.getString(1);
+                        if (schemaName != null) {
+                            schemaNames.add(schemaName.trim());
+                        }
+                    }
+                });
+        LOG.info("\t list of available schemas are: {}", schemaNames);
+        return schemaNames;
+    }
+
+    public static List<String> listNamespaces(JdbcConnection jdbc) throws SQLException {
+        LOG.info("Read list of available namespaces (databases)");
+        final List<String> namespaceNames = new ArrayList<>();
+        namespaceNames.add(((Db2Connection) jdbc).getRealDatabaseName());
+        LOG.info("\t list of available namespaces are: {}", namespaceNames);
+        return namespaceNames;
+    }
+
+    public static String quote(String dbOrTableName) {
+        return "\"" + dbOrTableName.replace("\"", "\"\"") + "\"";
+    }
+
+    public static Schema getTableSchema(TableId tableId, Db2Connection jdbc, Db2Dialect dialect) {
+        io.debezium.relational.TableId dbzTableId = toDbzTableId(tableId);
+        return getTableSchema(dbzTableId, jdbc, dialect);
+    }
+
+    public static Schema getTableSchema(
+            io.debezium.relational.TableId tableId, Db2Connection jdbc, Db2Dialect dialect) {
+        try {
+            TableChange tableChange = dialect.queryTableSchema(jdbc, tableId);
+            if (tableChange == null || tableChange.getTable() == null) {
+                throw new RuntimeException("Cannot find table schema for " + tableId);
+            }
+            return
toSchema(tableChange.getTable());
+        } catch (Exception e) {
+            throw new RuntimeException("Failed to get table schema for " + tableId, e);
+        }
+    }
+
+    public static Schema toSchema(Table table) {
+        List<Column> columns =
+                table.columns().stream().map(Db2SchemaUtils::toColumn).collect(Collectors.toList());
+
+        return Schema.newBuilder()
+                .setColumns(columns)
+                .primaryKey(table.primaryKeyColumnNames())
+                .comment(table.comment())
+                .build();
+    }
+
+    public static Column toColumn(io.debezium.relational.Column column) {
+        if (column.defaultValueExpression().isPresent()) {
+            String defaultValueExpression =
+                    normalizeDefaultValueExpression(column.defaultValueExpression().get());
+            return Column.physicalColumn(
+                    column.name(),
+                    Db2TypeUtils.fromDbzColumn(column),
+                    column.comment(),
+                    defaultValueExpression);
+        } else {
+            return Column.physicalColumn(
+                    column.name(), Db2TypeUtils.fromDbzColumn(column), column.comment());
+        }
+    }
+
+    private static String normalizeDefaultValueExpression(String defaultValueExpression) {
+        if (defaultValueExpression == null) {
+            return null;
+        }
+        String trimmed = defaultValueExpression.trim();
+        if (trimmed.isEmpty()) {
+            return trimmed;
+        }
+        String unwrapped = stripOuterParentheses(trimmed);
+        return unquoteDb2StringLiteral(unwrapped);
+    }
+
+    private static String stripOuterParentheses(String expression) {
+        String current = expression;
+        while (isWrappedByParentheses(current)) {
+            current = current.substring(1, current.length() - 1).trim();
+        }
+        return current;
+    }
+
+    private static boolean isWrappedByParentheses(String expression) {
+        if (expression.length() < 2
+                || expression.charAt(0) != '('
+                || expression.charAt(expression.length() - 1) != ')') {
+            return false;
+        }
+        int depth = 0;
+        boolean inSingleQuote = false;
+        for (int i = 0; i < expression.length(); i++) {
+            char c = expression.charAt(i);
+            if (c == '\'') {
+                if (inSingleQuote
+                        && i + 1 < expression.length()
+                        && expression.charAt(i + 1) == '\'') {
+                    i++;
+                    continue;
+                }
+ inSingleQuote = !inSingleQuote; + continue; + } + if (inSingleQuote) { + continue; + } + if (c == '(') { + depth++; + } else if (c == ')') { + depth--; + if (depth == 0 && i < expression.length() - 1) { + return false; + } + } + } + return depth == 0 && !inSingleQuote; + } + + private static String unquoteDb2StringLiteral(String expression) { + String trimmed = expression.trim(); + if (trimmed.isEmpty()) { + return trimmed; + } + int quoteIndex = -1; + if (trimmed.startsWith("N'") || trimmed.startsWith("n'")) { + quoteIndex = 1; + } else if (trimmed.charAt(0) == '\'') { + quoteIndex = 0; + } + if (quoteIndex < 0 || trimmed.charAt(quoteIndex) != '\'') { + return expression; + } + StringBuilder literal = new StringBuilder(); + for (int i = quoteIndex + 1; i < trimmed.length(); i++) { + char c = trimmed.charAt(i); + if (c == '\'') { + if (i + 1 < trimmed.length() && trimmed.charAt(i + 1) == '\'') { + literal.append('\''); + i++; + continue; + } + if (i == trimmed.length() - 1) { + return literal.toString(); + } + return expression; + } + literal.append(c); + } + return expression; + } + + public static io.debezium.relational.TableId toDbzTableId(TableId tableId) { + // DB2 TableId format: database.schema.table + // CDC TableId: namespace (database), schemaName (schema), tableName (table) + return new io.debezium.relational.TableId( + tableId.getNamespace(), tableId.getSchemaName(), tableId.getTableName()); + } + + public static TableId toCdcTableId(io.debezium.relational.TableId dbzTableId) { + // DB2 uses database.schema.table structure + // Debezium TableId: catalog (database), schema, table + // CDC TableId: namespace (database), schemaName (schema), tableName (table) + String catalog = dbzTableId.catalog(); + String schema = dbzTableId.schema(); + String table = dbzTableId.table(); + + LOG.debug( + "Converting Debezium TableId to CDC TableId - catalog: {}, schema: {}, table: {}", + catalog, + schema, + table); + + return TableId.tableId(catalog, schema, table); 
+ } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/utils/Db2TypeUtils.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/utils/Db2TypeUtils.java new file mode 100644 index 00000000000..a6ce59241e6 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/java/org/apache/flink/cdc/connectors/db2/utils/Db2TypeUtils.java @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.cdc.connectors.db2.utils; + +import org.apache.flink.cdc.common.types.DataType; +import org.apache.flink.cdc.common.types.DataTypes; +import org.apache.flink.cdc.common.types.TimestampType; +import org.apache.flink.table.types.logical.DecimalType; + +import io.debezium.relational.Column; + +import java.sql.Types; + +/** A utility class for converting DB2 types to Flink CDC types. */ +public class Db2TypeUtils { + + static final String XML = "xml"; + static final String DECFLOAT = "DECFLOAT"; + + /** Returns a corresponding Flink CDC data type from a Debezium {@link Column}. 
*/
+    public static DataType fromDbzColumn(Column column) {
+        DataType dataType = convertFromColumn(column);
+        if (column.isOptional()) {
+            return dataType;
+        } else {
+            return dataType.notNull();
+        }
+    }
+
+    /**
+     * Returns a corresponding Flink CDC data type from a Debezium {@link Column} that is always
+     * nullable.
+     */
+    private static DataType convertFromColumn(Column column) {
+        int precision = column.length();
+        int scale = column.scale().orElse(0);
+
+        switch (column.jdbcType()) {
+            case Types.BIT:
+            case Types.BOOLEAN:
+                return DataTypes.BOOLEAN();
+            case Types.TINYINT:
+                // Db2 does not define a native TINYINT column type, so this branch only guards
+                // against a driver reporting Types.TINYINT; SMALLINT holds the whole 0-255 range.
+                return DataTypes.SMALLINT();
+            case Types.SMALLINT:
+                return DataTypes.SMALLINT();
+            case Types.INTEGER:
+                return DataTypes.INT();
+            case Types.BIGINT:
+                return DataTypes.BIGINT();
+            case Types.REAL:
+                return DataTypes.FLOAT();
+            case Types.FLOAT:
+                return DataTypes.FLOAT();
+            case Types.DOUBLE:
+                return DataTypes.DOUBLE();
+            case Types.NUMERIC:
+            case Types.DECIMAL:
+                if (precision > 0 && precision <= DecimalType.MAX_PRECISION) {
+                    return DataTypes.DECIMAL(precision, scale);
+                }
+                return DataTypes.DECIMAL(DecimalType.MAX_PRECISION, scale);
+            case Types.CHAR:
+            case Types.NCHAR:
+                return precision > 0 ?
DataTypes.CHAR(precision) : DataTypes.STRING(); + case Types.VARCHAR: + case Types.NVARCHAR: + case Types.LONGVARCHAR: + case Types.LONGNVARCHAR: + if (precision > 0) { + return DataTypes.VARCHAR(precision); + } + return DataTypes.STRING(); + case Types.CLOB: + case Types.NCLOB: + case Types.SQLXML: + return DataTypes.STRING(); + case Types.BINARY: + case Types.VARBINARY: + case Types.LONGVARBINARY: + case Types.BLOB: + return DataTypes.BYTES(); + case Types.DATE: + return DataTypes.DATE(); + case Types.TIME: + case Types.TIME_WITH_TIMEZONE: + return DataTypes.TIME(Math.max(scale, 0)); + case Types.TIMESTAMP: + return DataTypes.TIMESTAMP(timestampPrecision(column)); + case Types.TIMESTAMP_WITH_TIMEZONE: + return DataTypes.TIMESTAMP_LTZ(timestampPrecision(column)); + default: + String unknownTypeName = column.typeName(); + if (XML.equalsIgnoreCase(unknownTypeName)) { + return DataTypes.STRING(); + } + if (DECFLOAT.equalsIgnoreCase(unknownTypeName)) { + return DataTypes.DOUBLE(); + } + throw new UnsupportedOperationException( + String.format( + "Doesn't support DB2 type '%s', JDBC type '%d' yet.", + column.typeName(), column.jdbcType())); + } + } + + private static int timestampPrecision(Column column) { + int precision = column.length(); + if (precision < TimestampType.MIN_PRECISION) { + return DataTypes.TIMESTAMP().getPrecision(); + } + return Math.min(precision, TimestampType.MAX_PRECISION); + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/resources/META-INF/services/org.apache.flink.cdc.common.factories.Factory b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/resources/META-INF/services/org.apache.flink.cdc.common.factories.Factory new file mode 100644 index 00000000000..ab55eac70b7 --- /dev/null +++ 
b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/main/resources/META-INF/services/org.apache.flink.cdc.common.factories.Factory @@ -0,0 +1,16 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +org.apache.flink.cdc.connectors.db2.factory.Db2DataSourceFactory diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/factory/Db2DataSourceFactoryContainerTest.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/factory/Db2DataSourceFactoryContainerTest.java new file mode 100644 index 00000000000..cef97cb60fc --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/factory/Db2DataSourceFactoryContainerTest.java @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.cdc.connectors.db2.factory; + +import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.api.common.restartstrategy.RestartStrategies; +import org.apache.flink.cdc.common.configuration.Configuration; +import org.apache.flink.cdc.common.event.CreateTableEvent; +import org.apache.flink.cdc.common.event.DataChangeEvent; +import org.apache.flink.cdc.common.event.Event; +import org.apache.flink.cdc.common.event.OperationType; +import org.apache.flink.cdc.common.event.TableId; +import org.apache.flink.cdc.common.factories.Factory; +import org.apache.flink.cdc.common.schema.Schema; +import org.apache.flink.cdc.common.source.FlinkSourceProvider; +import org.apache.flink.cdc.common.source.MetadataAccessor; +import org.apache.flink.cdc.connectors.db2.source.Db2DataSource; +import org.apache.flink.cdc.runtime.typeutils.EventTypeInfo; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.util.CloseableIterator; + +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.Timeout; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; + +import static 
org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.HOSTNAME; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.METADATA_LIST; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.PASSWORD; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.PORT; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCAN_INCREMENTAL_SNAPSHOT_CHUNK_KEY_COLUMN; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.TABLES; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.TABLES_EXCLUDE; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.USERNAME; +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; +import static org.testcontainers.containers.Db2Container.DB2_PORT; + +/** Testcontainers-backed tests for {@link Db2DataSourceFactory}. */ +@Timeout(value = 300, unit = TimeUnit.SECONDS) +class Db2DataSourceFactoryContainerTest extends PipelineDb2TestBase { + + private static final String DATABASE_NAME = "TESTDB"; + private static final String SCHEMA_NAME = "DB2INST1"; + + @BeforeEach + void before() { + initializeDb2Table("inventory", "PRODUCTS"); + initializeDb2Table("customers", "CUSTOMERS"); + } + + @Test + void testCreateDataSourceWithExactTable() { + Map options = containerOptions(DATABASE_NAME + ".DB2INST1.PRODUCTS"); + + Db2DataSource dataSource = createDataSource(options); + + assertThat(dataSource.getDb2SourceConfig().getTableList()) + .containsExactly("DB2INST1.PRODUCTS"); + } + + @Test + void testCreateDataSourceWithWildcardAndExclude() { + Map options = containerOptions(DATABASE_NAME + ".DB2INST1.\\.*"); + options.put(TABLES_EXCLUDE.key(), DATABASE_NAME + ".DB2INST1.CUSTOMERS"); + + Db2DataSource dataSource = createDataSource(options); + + assertThat(dataSource.getDb2SourceConfig().getTableList()) + 
.contains("DB2INST1.PRODUCTS") + .doesNotContain("DB2INST1.CUSTOMERS"); + } + + @Test + void testExcludeAllMatchedTables() { + Map options = containerOptions(DATABASE_NAME + ".DB2INST1.PRODUCTS"); + options.put(TABLES_EXCLUDE.key(), DATABASE_NAME + ".DB2INST1.PRODUCTS"); + + assertThatThrownBy(() -> createDataSource(options)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("Cannot find any table with the option 'tables.exclude'"); + } + + @Test + void testChunkKeyColumnOptionIsForwarded() { + Map options = containerOptions(DATABASE_NAME + ".DB2INST1.PRODUCTS"); + options.put(SCAN_INCREMENTAL_SNAPSHOT_CHUNK_KEY_COLUMN.key(), "ID"); + + Db2DataSource dataSource = createDataSource(options); + + assertThat(dataSource.getDb2SourceConfig().getChunkKeyColumn()).isEqualTo("ID"); + } + + @Test + void testMetadataAccessorListsDb2ObjectsAndSchema() { + Db2DataSource dataSource = + createDataSource(containerOptions(DATABASE_NAME + ".DB2INST1.PRODUCTS")); + MetadataAccessor metadataAccessor = dataSource.getMetadataAccessor(); + + assertThat(metadataAccessor.listNamespaces()).containsExactly(DATABASE_NAME); + assertThat(metadataAccessor.listSchemas(DATABASE_NAME)).contains(SCHEMA_NAME); + assertThat(metadataAccessor.listTables(DATABASE_NAME, SCHEMA_NAME)) + .contains(TableId.tableId(DATABASE_NAME, SCHEMA_NAME, "PRODUCTS")); + + Schema schema = + metadataAccessor.getTableSchema( + TableId.tableId(DATABASE_NAME, SCHEMA_NAME, "PRODUCTS")); + assertThat(schema.getColumnNames()).containsExactly("ID", "NAME", "DESCRIPTION", "WEIGHT"); + assertThat(schema.primaryKeys()).containsExactly("ID"); + } + + @Test + void testPipelineSourceReadsSnapshotEvents() throws Exception { + Map options = containerOptions(DATABASE_NAME + ".DB2INST1.PRODUCTS"); + options.put(METADATA_LIST.key(), "database_name,schema_name,table_name,op_ts"); + Db2DataSource dataSource = createDataSource(options); + FlinkSourceProvider sourceProvider = + (FlinkSourceProvider) 
dataSource.getEventSourceProvider();
+
+        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+        env.setParallelism(1);
+        env.enableCheckpointing(200);
+        env.setRestartStrategy(RestartStrategies.noRestart());
+
+        try (CloseableIterator<Event> events =
+                env.fromSource(
+                                sourceProvider.getSource(),
+                                WatermarkStrategy.noWatermarks(),
+                                Db2DataSourceFactory.IDENTIFIER,
+                                new EventTypeInfo())
+                        .executeAndCollect()) {
+            TableId tableId = TableId.tableId(DATABASE_NAME, SCHEMA_NAME, "PRODUCTS");
+            List<DataChangeEvent> snapshotEvents = fetchSnapshotEvents(events, 9);
+
+            assertThat(snapshotEvents)
+                    .allSatisfy(
+                            event -> {
+                                assertThat(event.tableId()).isEqualTo(tableId);
+                                assertThat(event.op()).isEqualTo(OperationType.INSERT);
+                                assertThat(event.after()).isNotNull();
+                                assertThat(event.meta())
+                                        .containsEntry("database_name", DATABASE_NAME)
+                                        .containsEntry("schema_name", SCHEMA_NAME)
+                                        .containsEntry("table_name", "PRODUCTS")
+                                        .containsKey("op_ts");
+                            });
+        }
+    }
+
+    private static List<DataChangeEvent> fetchSnapshotEvents(Iterator<Event> events, int size) {
+        List<CreateTableEvent> createTableEvents = new ArrayList<>();
+        List<DataChangeEvent> dataChangeEvents = new ArrayList<>();
+        while (events.hasNext()) {
+            Event event = events.next();
+            if (event instanceof CreateTableEvent) {
+                createTableEvents.add((CreateTableEvent) event);
+            } else if (event instanceof DataChangeEvent) {
+                dataChangeEvents.add((DataChangeEvent) event);
+                if (dataChangeEvents.size() == size) {
+                    break;
+                }
+            }
+        }
+        assertThat(createTableEvents).isNotEmpty();
+        return dataChangeEvents;
+    }
+
+    private static Db2DataSource createDataSource(Map<String, String> options) {
+        return (Db2DataSource)
+                new Db2DataSourceFactory()
+                        .createDataSource(new MockContext(Configuration.fromMap(options)));
+    }
+
+    private static Map<String, String> containerOptions(String tables) {
+        Map<String, String> options = new HashMap<>();
+        options.put(HOSTNAME.key(), DB2_CONTAINER.getHost());
+        options.put(PORT.key(), String.valueOf(DB2_CONTAINER.getMappedPort(DB2_PORT)));
+        options.put(USERNAME.key(),
DB2_CONTAINER.getUsername()); + options.put(PASSWORD.key(), DB2_CONTAINER.getPassword()); + options.put(TABLES.key(), tables); + return options; + } + + private static class MockContext implements Factory.Context { + + private final Configuration factoryConfiguration; + + private MockContext(Configuration factoryConfiguration) { + this.factoryConfiguration = factoryConfiguration; + } + + @Override + public Configuration getFactoryConfiguration() { + return factoryConfiguration; + } + + @Override + public Configuration getPipelineConfiguration() { + return Configuration.fromMap(Collections.emptyMap()); + } + + @Override + public ClassLoader getClassLoader() { + return getClass().getClassLoader(); + } + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/factory/Db2DataSourceFactoryTest.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/factory/Db2DataSourceFactoryTest.java new file mode 100644 index 00000000000..07cac44d4ca --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/factory/Db2DataSourceFactoryTest.java @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.cdc.connectors.db2.factory; + +import org.apache.flink.cdc.common.configuration.ConfigOption; +import org.apache.flink.cdc.common.configuration.Configuration; +import org.apache.flink.cdc.common.factories.Factory; +import org.apache.flink.cdc.common.source.SupportedMetadataColumn; +import org.apache.flink.cdc.connectors.db2.source.DatabaseNameMetadataColumn; +import org.apache.flink.cdc.connectors.db2.source.Db2DataSource; +import org.apache.flink.cdc.connectors.db2.source.OpTsMetadataColumn; +import org.apache.flink.cdc.connectors.db2.source.SchemaNameMetadataColumn; +import org.apache.flink.cdc.connectors.db2.source.TableNameMetadataColumn; +import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfigFactory; +import org.apache.flink.table.api.ValidationException; + +import org.junit.jupiter.api.Test; + +import java.util.Collections; +import java.util.HashMap; +import java.util.Map; +import java.util.Properties; +import java.util.stream.Collectors; + +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.HOSTNAME; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.METADATA_LIST; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.PASSWORD; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.PORT; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.SCAN_STARTUP_MODE; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.TABLES; +import static 
org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.TABLES_EXCLUDE; +import static org.apache.flink.cdc.connectors.db2.source.Db2DataSourceOptions.USERNAME; +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** Tests for {@link Db2DataSourceFactory}. */ +class Db2DataSourceFactoryTest { + + @Test + void testIdentifierAndOptions() { + Db2DataSourceFactory factory = new Db2DataSourceFactory(); + + assertThat(factory.identifier()).isEqualTo("db2"); + assertThat( + factory.requiredOptions().stream() + .map(ConfigOption::key) + .collect(Collectors.toSet())) + .containsExactlyInAnyOrder( + HOSTNAME.key(), USERNAME.key(), PASSWORD.key(), TABLES.key()); + assertThat(factory.optionalOptions()).contains(PORT, TABLES_EXCLUDE, METADATA_LIST); + assertThat(factory.optionalOptions().stream().map(ConfigOption::key)) + .contains(SCAN_STARTUP_MODE.key()) + .doesNotContain("scan.startup.timestamp-millis") + .doesNotContain("scan.newly-added-table.enabled"); + } + + @Test + void testLackRequiredOptions() { + Db2DataSourceFactory factory = new Db2DataSourceFactory(); + + for (ConfigOption requiredOption : factory.requiredOptions()) { + Map options = validOptions(); + options.remove(requiredOption.key()); + + assertThatThrownBy( + () -> + factory.createDataSource( + new MockContext(Configuration.fromMap(options)))) + .isInstanceOf(ValidationException.class) + .hasMessageContaining(requiredOption.key()); + } + } + + @Test + void testUnsupportedOption() { + Map options = validOptions(); + options.put("unsupported_key", "unsupported_value"); + + assertThatThrownBy( + () -> + new Db2DataSourceFactory() + .createDataSource( + new MockContext(Configuration.fromMap(options)))) + .isInstanceOf(ValidationException.class) + .hasMessageContaining("Unsupported options found for 'db2'") + .hasMessageContaining("unsupported_key"); + } + + @Test + void testUnsupportedStartupModesFailBeforeDatabaseAccess() { + 
assertUnsupportedStartupMode("snapshot"); + assertUnsupportedStartupMode("timestamp"); + } + + @Test + void testTableValidationWithDifferentDatabases() { + Map options = validOptions(); + options.put(TABLES.key(), "DB1.DB2INST1.T1,DB2.DB2INST1.T2"); + + assertThatThrownBy( + () -> + new Db2DataSourceFactory() + .createDataSource( + new MockContext(Configuration.fromMap(options)))) + .isInstanceOf(IllegalStateException.class) + .hasMessageContaining("not all table names have the same database name"); + } + + @Test + void testTableValidationRequiresDatabaseSchemaTableFormat() { + Map options = validOptions(); + options.put(TABLES.key(), "TESTDB.DB2INST1"); + + assertThatThrownBy( + () -> + new Db2DataSourceFactory() + .createDataSource( + new MockContext(Configuration.fromMap(options)))) + .isInstanceOf(IllegalStateException.class) + .hasMessageContaining("does not match the expected 'database.schema.table' format") + .hasMessageContaining(TABLES.key()); + } + + @Test + void testDatabaseNameLengthValidation() { + Map options = validOptions(); + options.put(TABLES.key(), repeat("D", 129) + ".DB2INST1.T1"); + + assertThatThrownBy( + () -> + new Db2DataSourceFactory() + .createDataSource( + new MockContext(Configuration.fromMap(options)))) + .isInstanceOf(IllegalStateException.class) + .hasMessageContaining("exceeds DB2's maximum identifier length"); + } + + @Test + void testSupportedMetadataColumns() { + Db2DataSource dataSource = new Db2DataSource(testConfigFactory()); + + SupportedMetadataColumn[] metadataColumns = dataSource.supportedMetadataColumns(); + + assertThat(metadataColumns).hasSize(4); + assertThat(metadataColumns[0]).isInstanceOf(OpTsMetadataColumn.class); + assertThat(metadataColumns[0].getName()).isEqualTo("op_ts"); + assertThat(metadataColumns[1]).isInstanceOf(TableNameMetadataColumn.class); + assertThat(metadataColumns[1].getName()).isEqualTo("table_name"); + assertThat(metadataColumns[2]).isInstanceOf(DatabaseNameMetadataColumn.class); + 
assertThat(metadataColumns[2].getName()).isEqualTo("database_name"); + assertThat(metadataColumns[3]).isInstanceOf(SchemaNameMetadataColumn.class); + assertThat(metadataColumns[3].getName()).isEqualTo("schema_name"); + + Map metadata = new HashMap<>(); + metadata.put("op_ts", "12345"); + metadata.put("table_name", "PRODUCTS"); + metadata.put("database_name", "TESTDB"); + metadata.put("schema_name", "DB2INST1"); + assertThat(metadataColumns[0].read(metadata)).isEqualTo(12345L); + assertThat(metadataColumns[1].read(metadata)).isEqualTo("PRODUCTS"); + assertThat(metadataColumns[2].read(metadata)).isEqualTo("TESTDB"); + assertThat(metadataColumns[3].read(metadata)).isEqualTo("DB2INST1"); + } + + private static void assertUnsupportedStartupMode(String startupMode) { + Map options = validOptions(); + options.put(SCAN_STARTUP_MODE.key(), startupMode); + + assertThatThrownBy( + () -> + new Db2DataSourceFactory() + .createDataSource( + new MockContext(Configuration.fromMap(options)))) + .isInstanceOf(ValidationException.class) + .hasMessageContaining("Supported values are [initial, latest-offset]") + .hasMessageContaining(startupMode); + } + + private static Map validOptions() { + Map options = new HashMap<>(); + options.put(HOSTNAME.key(), "localhost"); + options.put(USERNAME.key(), "db2inst1"); + options.put(PASSWORD.key(), "password"); + options.put(TABLES.key(), "TESTDB.DB2INST1.PRODUCTS"); + return options; + } + + private static Db2SourceConfigFactory testConfigFactory() { + Db2SourceConfigFactory configFactory = new Db2SourceConfigFactory(); + Properties dbzProperties = new Properties(); + configFactory + .hostname("localhost") + .port(50000) + .databaseList("TESTDB") + .tableList("DB2INST1.PRODUCTS") + .username("db2inst1") + .password("password") + .serverTimeZone("UTC") + .debeziumProperties(dbzProperties); + return configFactory; + } + + private static String repeat(String value, int times) { + StringBuilder builder = new StringBuilder(value.length() * times); + 
for (int i = 0; i < times; i++) { + builder.append(value); + } + return builder.toString(); + } + + private static class MockContext implements Factory.Context { + + private final Configuration factoryConfiguration; + + private MockContext(Configuration factoryConfiguration) { + this.factoryConfiguration = factoryConfiguration; + } + + @Override + public Configuration getFactoryConfiguration() { + return factoryConfiguration; + } + + @Override + public Configuration getPipelineConfiguration() { + return Configuration.fromMap(Collections.emptyMap()); + } + + @Override + public ClassLoader getClassLoader() { + return getClass().getClassLoader(); + } + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/factory/PipelineDb2TestBase.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/factory/PipelineDb2TestBase.java new file mode 100644 index 00000000000..835603a0b22 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/factory/PipelineDb2TestBase.java @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.cdc.connectors.db2.factory; + +import org.apache.flink.util.FlinkRuntimeException; + +import org.apache.commons.lang3.StringUtils; +import org.assertj.core.api.Assertions; +import org.awaitility.Awaitility; +import org.junit.jupiter.api.AfterAll; +import org.junit.jupiter.api.BeforeAll; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.testcontainers.containers.Db2Container; +import org.testcontainers.containers.output.Slf4jLogConsumer; +import org.testcontainers.images.builder.ImageFromDockerfile; +import org.testcontainers.lifecycle.Startables; +import org.testcontainers.utility.DockerImageName; + +import java.io.BufferedReader; +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.net.JarURLConnection; +import java.net.URISyntaxException; +import java.net.URL; +import java.nio.charset.StandardCharsets; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.sql.Connection; +import java.sql.DriverManager; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.sql.Statement; +import java.util.Arrays; +import java.util.List; +import java.util.Locale; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.jar.JarEntry; +import java.util.jar.JarFile; +import java.util.regex.Matcher; +import java.util.regex.Pattern; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +import static java.lang.String.format; +import static org.apache.flink.util.Preconditions.checkState; + +/** Testcontainers base for DB2 pipeline connector tests. 
*/ +class PipelineDb2TestBase { + + private static final Logger LOG = LoggerFactory.getLogger(PipelineDb2TestBase.class); + + private static final DockerImageName DEBEZIUM_DOCKER_IMAGE_NAME = + DockerImageName.parse( + new ImageFromDockerfile("custom/db2-cdc:1.4") + .withDockerfile( + createDb2ServerBuildContext().resolve("Dockerfile")) + .get()) + .asCompatibleSubstituteFor("ibmcom/db2"); + + private static final Pattern COMMENT_PATTERN = Pattern.compile("^(.*)--.*$"); + private static final CompletableFuture<Void> db2AsnAgentStarted = new CompletableFuture<>(); + + protected static final Db2Container DB2_CONTAINER = + new Db2Container(DEBEZIUM_DOCKER_IMAGE_NAME) + .withDatabaseName("testdb") + .withUsername("db2inst1") + .withPassword("flinkpw") + .withEnv("AUTOCONFIG", "false") + .withEnv("ARCHIVE_LOGS", "true") + .acceptLicense() + .withCreateContainerCmdModifier( + createContainerCmd -> createContainerCmd.withPlatform("linux/amd64")) + .withLogConsumer(new Slf4jLogConsumer(LOG)) + .withLogConsumer( + outputFrame -> { + if (outputFrame + .getUtf8String() + .contains("The asncdc program enable finished")) { + db2AsnAgentStarted.complete(null); + } + }); + + @BeforeAll + static void startContainers() { + LOG.info("Starting DB2 container..."); + Startables.deepStart(Stream.of(DB2_CONTAINER)).join(); + LOG.info("DB2 container is started."); + + db2AsnAgentStarted.join(); + assertCdcAgentRunning(); + LOG.info("DB2 ASN agent is available."); + } + + @AfterAll + static void stopContainers() { + LOG.info("Stopping DB2 container..."); + DB2_CONTAINER.stop(); + } + + protected Connection getJdbcConnection() throws SQLException { + return DriverManager.getConnection( + DB2_CONTAINER.getJdbcUrl(), + DB2_CONTAINER.getUsername(), + DB2_CONTAINER.getPassword()); + } + + private static void assertCdcAgentRunning() { + try (Connection connection = + DriverManager.getConnection( + DB2_CONTAINER.getJdbcUrl(), + DB2_CONTAINER.getUsername(), + DB2_CONTAINER.getPassword()); + Statement 
statement = connection.createStatement(); + ResultSet resultSet = + statement.executeQuery("VALUES ASNCDC.ASNCDCSERVICES('status','asncdc')")) { + Assertions.assertThat(resultSet.next()).isTrue(); + Assertions.assertThat(resultSet.getString(1)) + .doesNotContainIgnoringCase("asncap is not running"); + } catch (SQLException e) { + throw new FlinkRuntimeException("Failed to verify DB2 ASN agent status.", e); + } + } + + protected void initializeDb2Table(String sqlFile, String tableName) { + try (Connection connection = getJdbcConnection(); + Statement statement = connection.createStatement()) { + if (checkTableExists(connection, tableName)) { + dropTestTable(connection, tableName.toUpperCase(Locale.ROOT)); + Thread.sleep(10_000); + } + for (String stmt : readSqlStatements(sqlFile)) { + statement.execute(stmt); + Thread.sleep(500); + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + public String getTableNameRegex(String[] captureCustomerTables) { + checkState(captureCustomerTables.length > 0); + if (captureCustomerTables.length == 1) { + return captureCustomerTables[0]; + } + return format("(%s)", StringUtils.join(captureCustomerTables, ",")); + } + + private static List<String> readSqlStatements(String sqlFile) throws IOException { + String ddlFile = String.format("db2_server/%s.sql", sqlFile); + InputStream ddlStream = + PipelineDb2TestBase.class.getClassLoader().getResourceAsStream(ddlFile); + Assertions.assertThat(ddlStream).withFailMessage("Cannot locate " + ddlFile).isNotNull(); + try (BufferedReader reader = + new BufferedReader(new InputStreamReader(ddlStream, StandardCharsets.UTF_8))) { + return Arrays.stream( + reader.lines() + .map(String::trim) + .filter(x -> !x.startsWith("--") && !x.isEmpty()) + .map( + x -> { + Matcher matcher = COMMENT_PATTERN.matcher(x); + return matcher.matches() ? 
matcher.group(1) : x; + }) + .collect(Collectors.joining("\n")) + .split(";")) + .filter(stmt -> !stmt.trim().isEmpty()) + .collect(Collectors.toList()); + } + } + + private static void dropTestTable(Connection connection, String tableName) { + try { + Awaitility.await(String.format("cdc remove table %s", tableName)) + .atMost(30, TimeUnit.SECONDS) + .until( + () -> { + try { + connection + .createStatement() + .execute( + String.format( + "CALL ASNCDC.REMOVETABLE('DB2INST1', '%s')", + tableName)); + connection + .createStatement() + .execute( + "VALUES ASNCDC.ASNCDCSERVICES('reinit','asncdc')"); + return true; + } catch (SQLException e) { + LOG.warn( + "CDC remove table {} failed, will retry.", + tableName, + e); + return false; + } + }); + } catch (Exception e) { + throw new FlinkRuntimeException("Failed to remove CDC table " + tableName, e); + } + + try { + Awaitility.await(String.format("drop table %s", tableName)) + .atMost(30, TimeUnit.SECONDS) + .until( + () -> { + try { + connection + .createStatement() + .execute( + String.format( + "DROP TABLE DB2INST1.%s", tableName)); + connection.commit(); + return true; + } catch (SQLException e) { + LOG.warn("Drop table {} failed, will retry.", tableName, e); + return false; + } + }); + } catch (Exception e) { + throw new FlinkRuntimeException("Failed to drop table " + tableName, e); + } + } + + private static boolean checkTableExists(Connection connection, String tableName) { + AtomicBoolean tableExists = new AtomicBoolean(false); + try { + Awaitility.await(String.format("check table %s exists", tableName)) + .atMost(30, TimeUnit.SECONDS) + .until( + () -> { + try { + ResultSet resultSet = + connection + .createStatement() + .executeQuery( + String.format( + "SELECT COUNT(*) FROM SYSCAT.TABLES WHERE TABNAME = '%s' AND TABSCHEMA = 'DB2INST1'", + tableName)); + if (resultSet.next() && resultSet.getInt(1) == 1) { + tableExists.set(true); + } + return true; + } catch (SQLException e) { + LOG.warn( + "Check table {} 
exists failed, will retry.", + tableName, + e); + return false; + } + }); + } catch (Exception e) { + throw new FlinkRuntimeException("Failed to check table " + tableName, e); + } + return tableExists.get(); + } + + private static Path createDb2ServerBuildContext() { + Path sourceDir = getDb2ServerResourceDir(); + try { + Path targetDir = Files.createTempDirectory("flink-cdc-db2-server-build"); + try (Stream<Path> files = Files.walk(sourceDir)) { + for (Path source : files.collect(Collectors.toList())) { + Path target = targetDir.resolve(sourceDir.relativize(source).toString()); + if (Files.isDirectory(source)) { + Files.createDirectories(target); + } else { + Files.createDirectories(target.getParent()); + Files.copy(source, target); + } + } + } + Path dockerfile = targetDir.resolve("Dockerfile"); + String dockerfileContent = + new String(Files.readAllBytes(dockerfile), StandardCharsets.UTF_8); + Files.write( + dockerfile, + dockerfileContent + .replace( + "FROM ibmcom/db2:11.5.0.0a", + "FROM --platform=linux/amd64 ibmcom/db2:11.5.0.0a") + .getBytes(StandardCharsets.UTF_8)); + return targetDir; + } catch (IOException e) { + throw new RuntimeException("Failed to create DB2 Docker build context", e); + } + } + + private static Path getDb2ServerResourceDir() { + try { + URL dockerfile = + PipelineDb2TestBase.class.getClassLoader().getResource("db2_server/Dockerfile"); + Assertions.assertThat(dockerfile) + .withFailMessage("Cannot locate db2_server/Dockerfile") + .isNotNull(); + if ("file".equals(dockerfile.getProtocol())) { + return Paths.get(dockerfile.toURI()).getParent(); + } + if ("jar".equals(dockerfile.getProtocol())) { + return extractDb2ServerResources((JarURLConnection) dockerfile.openConnection()); + } + throw new IllegalStateException("Unsupported resource protocol: " + dockerfile); + } catch (IOException | URISyntaxException e) { + throw new RuntimeException("Failed to resolve db2_server resources", e); + } + } + + private static Path 
extractDb2ServerResources(JarURLConnection jarConnection) + throws IOException { + Path tempDir = Files.createTempDirectory("flink-cdc-db2-server"); + Path db2ServerDir = tempDir.resolve("db2_server"); + Files.createDirectories(db2ServerDir); + try (JarFile jarFile = jarConnection.getJarFile()) { + for (JarEntry entry : jarFile.stream().collect(Collectors.toList())) { + String name = entry.getName(); + if (!entry.isDirectory() && name.startsWith("db2_server/")) { + Path target = tempDir.resolve(name); + Files.createDirectories(target.getParent()); + try (InputStream inputStream = jarFile.getInputStream(entry)) { + Files.copy(inputStream, target); + } + } + } + } + return db2ServerDir; + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/source/Db2EventDeserializerTest.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/source/Db2EventDeserializerTest.java new file mode 100644 index 00000000000..38cd5d570b9 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/source/Db2EventDeserializerTest.java @@ -0,0 +1,332 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.cdc.connectors.db2.source; + +import org.apache.flink.cdc.common.data.DecimalData; +import org.apache.flink.cdc.common.event.AddColumnEvent; +import org.apache.flink.cdc.common.event.CreateTableEvent; +import org.apache.flink.cdc.common.event.DropTableEvent; +import org.apache.flink.cdc.common.event.Event; +import org.apache.flink.cdc.common.schema.Schema; +import org.apache.flink.cdc.common.types.DecimalType; +import org.apache.flink.cdc.debezium.table.DebeziumChangelogMode; +import org.apache.flink.util.Collector; + +import io.debezium.document.DocumentWriter; +import io.debezium.relational.Column; +import io.debezium.relational.Table; +import io.debezium.relational.TableEditor; +import io.debezium.relational.TableId; +import io.debezium.relational.history.HistoryRecord; +import io.debezium.relational.history.TableChanges; +import io.debezium.relational.history.TableChanges.TableChange; +import io.debezium.relational.history.TableChanges.TableChangeType; +import org.apache.kafka.connect.data.Decimal; +import org.apache.kafka.connect.data.SchemaBuilder; +import org.apache.kafka.connect.data.Struct; +import org.apache.kafka.connect.source.SourceRecord; +import org.junit.jupiter.api.Test; + +import java.math.BigDecimal; +import java.sql.Types; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.assertj.core.api.Assertions.assertThat; + +/** Tests for {@link Db2EventDeserializer} schema change 
handling. */ +class Db2EventDeserializerTest { + + private static final DocumentWriter DOCUMENT_WRITER = DocumentWriter.defaultWriter(); + + @Test + void testDb2CdcDecimalBytesAreDecoded() { + Db2EventDeserializer deserializer = + new Db2EventDeserializer(DebeziumChangelogMode.ALL, true); + org.apache.kafka.connect.data.Schema dbzDecimalSchema = Decimal.schema(2); + + assertThat( + ((DecimalData) + deserializer.convertToDecimal( + new DecimalType(false, 14, 2), + new byte[] { + (byte) 0x87, + 0x36, + 0x35, + 0x2e, + 0x34, + 0x33, + 0x32, + 0x31 + }, + dbzDecimalSchema)) + .toBigDecimal()) + .isEqualByComparingTo("1234.56"); + assertThat( + ((DecimalData) + deserializer.convertToDecimal( + new DecimalType(false, 14, 2), + new byte[] { + (byte) 0x86, + 0x00, + 0x30, + 0x30, + 0x2e, + 0x38, + 0x39, + 0x34 + }, + dbzDecimalSchema)) + .toBigDecimal()) + .isEqualByComparingTo("498.00"); + } + + @Test + void testDb2CdcDecimalBigDecimalPayloadIsDecoded() { + Db2EventDeserializer deserializer = + new Db2EventDeserializer(DebeziumChangelogMode.ALL, true); + org.apache.kafka.connect.data.Schema dbzDecimalSchema = Decimal.schema(2); + + assertThat( + ((DecimalData) + deserializer.convertToDecimal( + new DecimalType(false, 14, 2), + new BigDecimal("-87037107572863666.71"), + dbzDecimalSchema)) + .toBigDecimal()) + .isEqualByComparingTo("1234.56"); + assertThat( + ((DecimalData) + deserializer.convertToDecimal( + new DecimalType(false, 14, 2), + new BigDecimal("-87909734891352081.40"), + dbzDecimalSchema)) + .toBigDecimal()) + .isEqualByComparingTo("498.00"); + } + + @Test + void testDb2CdcDecimalStringPayloadIsDecoded() { + Db2EventDeserializer deserializer = + new Db2EventDeserializer(DebeziumChangelogMode.ALL, true); + org.apache.kafka.connect.data.Schema dbzDecimalSchema = Decimal.schema(2); + + assertThat( + ((DecimalData) + deserializer.convertToDecimal( + new DecimalType(false, 14, 2), + "-87037107572863666.71", + dbzDecimalSchema)) + .toBigDecimal()) + 
.isEqualByComparingTo("1234.56"); + assertThat( + ((DecimalData) + deserializer.convertToDecimal( + new DecimalType(false, 14, 2), + "1234.56", + dbzDecimalSchema)) + .toBigDecimal()) + .isEqualByComparingTo("1234.56"); + } + + @Test + void testCreateAlterDropAreEmitted() throws Exception { + Db2EventDeserializer deserializer = + new Db2EventDeserializer(DebeziumChangelogMode.ALL, true); + List<Event> events = new ArrayList<>(); + TestCollector collector = new TestCollector(events); + + SourceRecord createRecord = + buildSchemaChangeRecord( + TableChangeType.CREATE, Collections.singletonList(col("ID", false, 1))); + deserializer.deserialize(createRecord, collector); + + assertThat(events).hasSize(1); + assertThat(events.get(0)).isInstanceOf(CreateTableEvent.class); + Schema createSchema = ((CreateTableEvent) events.get(0)).getSchema(); + assertThat(createSchema.getColumns()).hasSize(1); + + SourceRecord alterRecord = + buildSchemaChangeRecord( + TableChangeType.ALTER, + Arrays.asList(col("ID", false, 1), col("AGE", true, 2))); + deserializer.deserialize(alterRecord, collector); + + assertThat(events).hasSize(2); + assertThat(events.get(1)).isInstanceOf(AddColumnEvent.class); + AddColumnEvent addColumnEvent = (AddColumnEvent) events.get(1); + assertThat(addColumnEvent.getAddedColumns()).hasSize(1); + assertThat(addColumnEvent.getAddedColumns().get(0).getAddColumn().getName()) + .isEqualTo("AGE"); + + SourceRecord dropRecord = + buildSchemaChangeRecord(TableChangeType.DROP, Collections.emptyList()); + deserializer.deserialize(dropRecord, collector); + + assertThat(events).hasSize(3); + assertThat(events.get(2)).isInstanceOf(DropTableEvent.class); + } + + @Test + void testAlterDiffUsesRestoredSchemaCache() throws Exception { + Db2EventDeserializer deserializer = + new Db2EventDeserializer(DebeziumChangelogMode.ALL, true); + TableChange restoredTableChange = + buildTableChanges( + TableChangeType.CREATE, + Collections.singletonList(col("ID", false, 1))) + .iterator() + 
.next(); + Map<TableId, TableChange> tableSchemas = new HashMap<>(); + tableSchemas.put(restoredTableChange.getId(), restoredTableChange); + deserializer.initializeTableSchemaCacheFromSplitSchemas(tableSchemas); + + List<Event> events = new ArrayList<>(); + TestCollector collector = new TestCollector(events); + SourceRecord alterRecord = + buildSchemaChangeRecord( + TableChangeType.ALTER, + Arrays.asList(col("ID", false, 1), col("AGE", true, 2))); + deserializer.deserialize(alterRecord, collector); + + assertThat(events).hasSize(1); + assertThat(events.get(0)).isInstanceOf(AddColumnEvent.class); + AddColumnEvent addColumnEvent = (AddColumnEvent) events.get(0); + assertThat(addColumnEvent.getAddedColumns()).hasSize(1); + assertThat(addColumnEvent.getAddedColumns().get(0).getAddColumn().getName()) + .isEqualTo("AGE"); + } + + private static Column col(String name, boolean optional, int position) { + return Column.editor() + .name(name) + .jdbcType(Types.INTEGER) + .type("INTEGER", "INTEGER") + .position(position) + .optional(optional) + .create(); + } + + private static SourceRecord buildSchemaChangeRecord(TableChangeType type, List<Column> columns) + throws Exception { + TableId tableId = new TableId("TESTDB", "DB2INST1", "USERS"); + TableChanges tableChanges = buildTableChanges(type, columns); + + HistoryRecord historyRecord = + new HistoryRecord( + Collections.singletonMap("file", "test"), + Collections.singletonMap("pos", "1"), + tableId.catalog(), + tableId.schema(), + "ddl", + tableChanges); + + String historyJson = DOCUMENT_WRITER.write(historyRecord.document()); + + org.apache.kafka.connect.data.Schema keySchema = + SchemaBuilder.struct() + .name("io.debezium.connector.db2.SchemaChangeKey") + .field("databaseName", org.apache.kafka.connect.data.Schema.STRING_SCHEMA) + .build(); + Struct keyStruct = new Struct(keySchema).put("databaseName", tableId.catalog()); + + org.apache.kafka.connect.data.Schema sourceSchema = + SchemaBuilder.struct() + .name("source") + .field("dummy", 
org.apache.kafka.connect.data.Schema.OPTIONAL_STRING_SCHEMA) + .optional() + .build(); + org.apache.kafka.connect.data.Schema valueSchema = + SchemaBuilder.struct() + .name("io.debezium.connector.db2.SchemaChangeValue") + .field("source", sourceSchema) + .field( + org.apache.flink.cdc.connectors.base.relational + .JdbcSourceEventDispatcher.HISTORY_RECORD_FIELD, + org.apache.kafka.connect.data.Schema.STRING_SCHEMA) + .build(); + + Struct valueStruct = + new Struct(valueSchema) + .put("source", new Struct(sourceSchema)) + .put( + org.apache.flink.cdc.connectors.base.relational + .JdbcSourceEventDispatcher.HISTORY_RECORD_FIELD, + historyJson); + + Map<String, String> partition = new HashMap<>(); + partition.put("server", "server1"); + Map<String, String> offset = new HashMap<>(); + offset.put("lsn", "1"); + + return new SourceRecord( + partition, + offset, + "server1.TESTDB.DB2INST1.USERS", + null, + keySchema, + keyStruct, + valueSchema, + valueStruct); + } + + private static TableChanges buildTableChanges(TableChangeType type, List<Column> columns) { + TableId tableId = new TableId("TESTDB", "DB2INST1", "USERS"); + TableEditor editor = Table.editor().tableId(tableId); + columns.forEach(editor::addColumn); + if (!columns.isEmpty()) { + editor.setPrimaryKeyNames("ID"); + } + Table table = editor.create(); + TableChanges tableChanges = new TableChanges(); + switch (type) { + case CREATE: + tableChanges.create(table); + break; + case ALTER: + tableChanges.alter(table); + break; + case DROP: + tableChanges.drop(table); + break; + default: + throw new IllegalArgumentException("Unsupported type " + type); + } + return tableChanges; + } + + private static class TestCollector implements Collector<Event> { + private final List<Event> results; + + private TestCollector(List<Event> results) { + this.results = results; + } + + @Override + public void collect(Event record) { + results.add(record); + } + + @Override + public void close() {} + } +} diff --git 
a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/source/reader/Db2PipelineRecordEmitterTest.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/source/reader/Db2PipelineRecordEmitterTest.java new file mode 100644 index 00000000000..0836002c3b5 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/source/reader/Db2PipelineRecordEmitterTest.java @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.cdc.connectors.db2.source.reader; + +import org.apache.flink.cdc.connectors.base.options.StartupOptions; +import org.apache.flink.cdc.connectors.db2.source.Db2EventDeserializer; +import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfig; +import org.apache.flink.cdc.connectors.db2.source.config.Db2SourceConfigFactory; +import org.apache.flink.cdc.connectors.db2.source.dialect.Db2Dialect; +import org.apache.flink.cdc.connectors.db2.source.offset.LsnFactory; +import org.apache.flink.cdc.debezium.table.DebeziumChangelogMode; + +import org.junit.jupiter.api.Test; + +import java.time.Duration; + +import static org.assertj.core.api.Assertions.assertThatCode; + +/** Tests for {@link Db2PipelineRecordEmitter}. */ +class Db2PipelineRecordEmitterTest { + + @Test + void constructorDoesNotFetchAllTableSchemas() { + Db2SourceConfigFactory configFactory = new Db2SourceConfigFactory(); + configFactory + .hostname("localhost") + .port(1) + .databaseList("TESTDB") + .tableList("DB2INST1.PRODUCTS") + .username("db2inst1") + .password("password") + .startupOptions(StartupOptions.initial()) + .connectTimeout(Duration.ofMillis(1)) + .connectMaxRetries(0); + Db2SourceConfig sourceConfig = configFactory.create(0); + + assertThatCode( + () -> + new Db2PipelineRecordEmitter<>( + new Db2EventDeserializer(DebeziumChangelogMode.ALL, true), + null, + sourceConfig, + new LsnFactory(), + new Db2Dialect(sourceConfig))) + .doesNotThrowAnyException(); + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/table/Db2ReadableMetadataTest.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/table/Db2ReadableMetadataTest.java new file mode 100644 index 00000000000..6997e4bbfc1 --- /dev/null +++ 
b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/table/Db2ReadableMetadataTest.java @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.cdc.connectors.db2.table; + +import org.apache.flink.table.data.TimestampData; + +import io.debezium.connector.AbstractSourceInfo; +import io.debezium.data.Envelope; +import org.apache.kafka.connect.data.SchemaBuilder; +import org.apache.kafka.connect.data.Struct; +import org.apache.kafka.connect.source.SourceRecord; +import org.junit.jupiter.api.Test; + +import java.util.Collections; + +import static org.assertj.core.api.Assertions.assertThat; + +class Db2ReadableMetadataTest { + + @Test + void testReadDatabaseSchemaTableAndOperationTimestamp() { + SourceRecord record = sourceRecord("TESTDB", "DB2INST1", "PRODUCTS", 1_714_000_000_123L); + + assertThat(Db2ReadableMetadata.DATABASE_NAME.getConverter().read(record).toString()) + .isEqualTo("TESTDB"); + assertThat(Db2ReadableMetadata.SCHEMA_NAME.getConverter().read(record).toString()) + .isEqualTo("DB2INST1"); + assertThat(Db2ReadableMetadata.TABLE_NAME.getConverter().read(record).toString()) + 
.isEqualTo("PRODUCTS"); + assertThat( + ((TimestampData) Db2ReadableMetadata.OP_TS.getConverter().read(record)) + .getMillisecond()) + .isEqualTo(1_714_000_000_123L); + } + + @Test + void testMetadataKeysAndTypes() { + assertThat(Db2ReadableMetadata.DATABASE_NAME.getKey()).isEqualTo("database_name"); + assertThat(Db2ReadableMetadata.SCHEMA_NAME.getKey()).isEqualTo("schema_name"); + assertThat(Db2ReadableMetadata.TABLE_NAME.getKey()).isEqualTo("table_name"); + assertThat(Db2ReadableMetadata.OP_TS.getKey()).isEqualTo("op_ts"); + assertThat(Db2ReadableMetadata.OP_TS.getDataType().toString()) + .isEqualTo("TIMESTAMP_LTZ(3) NOT NULL"); + } + + private static SourceRecord sourceRecord( + String database, String schema, String table, long timestampMillis) { + org.apache.kafka.connect.data.Schema sourceSchema = + SchemaBuilder.struct() + .name("source") + .field( + AbstractSourceInfo.DATABASE_NAME_KEY, + org.apache.kafka.connect.data.Schema.STRING_SCHEMA) + .field( + AbstractSourceInfo.SCHEMA_NAME_KEY, + org.apache.kafka.connect.data.Schema.STRING_SCHEMA) + .field( + AbstractSourceInfo.TABLE_NAME_KEY, + org.apache.kafka.connect.data.Schema.STRING_SCHEMA) + .field( + AbstractSourceInfo.TIMESTAMP_KEY, + org.apache.kafka.connect.data.Schema.INT64_SCHEMA) + .build(); + Struct sourceStruct = + new Struct(sourceSchema) + .put(AbstractSourceInfo.DATABASE_NAME_KEY, database) + .put(AbstractSourceInfo.SCHEMA_NAME_KEY, schema) + .put(AbstractSourceInfo.TABLE_NAME_KEY, table) + .put(AbstractSourceInfo.TIMESTAMP_KEY, timestampMillis); + + org.apache.kafka.connect.data.Schema valueSchema = + SchemaBuilder.struct() + .name("io.debezium.connector.db2.Envelope") + .field(Envelope.FieldName.SOURCE, sourceSchema) + .build(); + Struct valueStruct = new Struct(valueSchema).put(Envelope.FieldName.SOURCE, sourceStruct); + + return new SourceRecord( + Collections.singletonMap("server", "server1"), + Collections.singletonMap("lsn", "1"), + "server1.TESTDB.DB2INST1.PRODUCTS", + null, + null, + 
null, + valueSchema, + valueStruct); + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/utils/Db2SchemaUtilsTest.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/utils/Db2SchemaUtilsTest.java new file mode 100644 index 00000000000..f912aa23529 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/utils/Db2SchemaUtilsTest.java @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.cdc.connectors.db2.utils; + +import org.apache.flink.cdc.common.event.TableId; +import org.apache.flink.cdc.common.schema.Column; +import org.apache.flink.cdc.common.types.DataTypes; + +import org.junit.jupiter.api.Test; + +import java.sql.SQLException; +import java.sql.Types; + +import static org.assertj.core.api.Assertions.assertThat; + +class Db2SchemaUtilsTest { + + @Test + void testQuoteEscapesDoubleQuote() { + assertThat(Db2SchemaUtils.quote("SCHEMA\"NAME")).isEqualTo("\"SCHEMA\"\"NAME\""); + } + + @Test + void testTableIdConversionKeepsDatabaseSchemaAndTable() { + io.debezium.relational.TableId dbzTableId = + new io.debezium.relational.TableId("TESTDB", "DB2INST1", "PRODUCTS"); + + TableId cdcTableId = Db2SchemaUtils.toCdcTableId(dbzTableId); + + assertThat(cdcTableId.getNamespace()).isEqualTo("TESTDB"); + assertThat(cdcTableId.getSchemaName()).isEqualTo("DB2INST1"); + assertThat(cdcTableId.getTableName()).isEqualTo("PRODUCTS"); + assertThat(Db2SchemaUtils.toDbzTableId(cdcTableId)).isEqualTo(dbzTableId); + } + + @Test + void testToColumnNormalizesDb2DefaultValueExpression() { + io.debezium.relational.Column dbzColumn = + io.debezium.relational.Column.editor() + .name("STATUS") + .jdbcType(Types.VARCHAR) + .type("VARCHAR", "VARCHAR") + .length(8) + .optional(true) + .defaultValueExpression("('A''B')") + .create(); + + Column column = Db2SchemaUtils.toColumn(dbzColumn); + + assertThat(column.getName()).isEqualTo("STATUS"); + assertThat(column.getType()).isEqualTo(DataTypes.VARCHAR(8)); + assertThat(column.getDefaultValueExpression()).isEqualTo("A'B"); + } + + @Test + void testMetadataErrorMentionsDb2Prerequisites() { + assertThat( + Db2SchemaUtils.db2MetadataError( + "list tables", new SQLException("missing ASNCDC"))) + .contains("Failed to list tables") + .contains("ASNCDC") + .contains("captured tables") + .contains("configured user") + .contains("missing ASNCDC"); + } +} diff --git 
a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/utils/Db2TypeUtilsTest.java b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/utils/Db2TypeUtilsTest.java new file mode 100644 index 00000000000..0025eba5bf5 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/java/org/apache/flink/cdc/connectors/db2/utils/Db2TypeUtilsTest.java @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.cdc.connectors.db2.utils; + +import org.apache.flink.cdc.common.types.DataTypes; + +import org.junit.jupiter.api.Test; + +import java.sql.Types; + +import static org.assertj.core.api.Assertions.assertThat; + +class Db2TypeUtilsTest { + + @Test + void testCharacterAndLargeObjectTypes() { + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.CHAR, "CHAR", 5, 0, true))) + .isEqualTo(DataTypes.CHAR(5)); + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.VARCHAR, "VARCHAR", 16, 0, true))) + .isEqualTo(DataTypes.VARCHAR(16)); + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.CLOB, "CLOB", 0, 0, true))) + .isEqualTo(DataTypes.STRING()); + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.BLOB, "BLOB", 0, 0, true))) + .isEqualTo(DataTypes.BYTES()); + } + + @Test + void testNumericAndTemporalTypes() { + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.SMALLINT, "SMALLINT", 0, 0, true))) + .isEqualTo(DataTypes.SMALLINT()); + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.INTEGER, "INTEGER", 0, 0, false))) + .isEqualTo(DataTypes.INT().notNull()); + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.OTHER, "DECFLOAT", 16, 0, true))) + .isEqualTo(DataTypes.DOUBLE()); + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.DECIMAL, "DECIMAL", 12, 3, true))) + .isEqualTo(DataTypes.DECIMAL(12, 3)); + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.TIMESTAMP, "TIMESTAMP", 3, 0, true))) + .isEqualTo(DataTypes.TIMESTAMP(3)); + } + + @Test + void testTimestampPrecisionDefaultsAndClampsFromLength() { + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.TIMESTAMP, "TIMESTAMP", -1, 0, true))) + .isEqualTo(DataTypes.TIMESTAMP()); + assertThat(Db2TypeUtils.fromDbzColumn(column(Types.TIMESTAMP, "TIMESTAMP", 12, 0, true))) + .isEqualTo(DataTypes.TIMESTAMP(9)); + } + + private static io.debezium.relational.Column column( + int jdbcType, String typeName, int length, int scale, boolean optional) { + return io.debezium.relational.Column.editor() + 
.name("C") + .jdbcType(jdbcType) + .type(typeName, typeName) + .length(length) + .scale(scale) + .optional(optional) + .create(); + } +} diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/resources/log4j2-test.properties b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/resources/log4j2-test.properties new file mode 100644 index 00000000000..931478c6c76 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-db2/src/test/resources/log4j2-test.properties @@ -0,0 +1,34 @@ +################################################################################ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# http://www.apache.org/licenses/LICENSE-2.0 +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+################################################################################ + +# Set root logger level to INFO to not flood build logs +# set manually to DEBUG for debugging purposes +rootLogger.level=INFO +rootLogger.appenderRef.test.ref = TestLogger + +appender.testlogger.name = TestLogger +appender.testlogger.type = CONSOLE +appender.testlogger.target = SYSTEM_ERR +appender.testlogger.layout.type = PatternLayout +appender.testlogger.layout.pattern = %-4r [%t] %-5p %c - %m%n + +logger.debezium.name = io.debezium +logger.debezium.level = INFO + +logger.testcontainers.name = org.testcontainers +logger.testcontainers.level = WARN + +logger.docker.name = com.github.dockerjava +logger.docker.level = WARN diff --git a/flink-cdc-connect/flink-cdc-pipeline-connectors/pom.xml b/flink-cdc-connect/flink-cdc-pipeline-connectors/pom.xml index 490a0d33a6c..73b11dab002 100644 --- a/flink-cdc-connect/flink-cdc-pipeline-connectors/pom.xml +++ b/flink-cdc-connect/flink-cdc-pipeline-connectors/pom.xml @@ -43,6 +43,7 @@ limitations under the License. <module>flink-cdc-pipeline-connector-fluss</module> <module>flink-cdc-pipeline-connector-sqlserver</module> <module>flink-cdc-pipeline-connector-hudi</module> + <module>flink-cdc-pipeline-connector-db2</module> @@ -61,4 +62,4 @@ limitations under the License.
- \ No newline at end of file + diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2Connection.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2Connection.java index 63da894ec85..47ea8c4a514 100644 --- a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2Connection.java +++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2Connection.java @@ -66,6 +66,7 @@ public class Db2Connection extends JdbcConnection { + "WHEN IBMSNAP_OPERATION = 'I' AND (LAG(cdc.IBMSNAP_OPERATION,1,'X') OVER (PARTITION BY cdc.IBMSNAP_COMMITSEQ ORDER BY cdc.IBMSNAP_INTENTSEQ)) ='D' THEN 4 " + "WHEN IBMSNAP_OPERATION = 'D' THEN 1 " + "WHEN IBMSNAP_OPERATION = 'I' THEN 2 " + + "WHEN IBMSNAP_OPERATION = 'U' THEN 5 " + "END " + "OPCODE," + "cdc.* " @@ -77,14 +78,13 @@ public class Db2Connection extends JdbcConnection { + CDC_SCHEMA + ".IBMSNAP_REGISTER r left JOIN SYSCAT.TABLES t ON r.SOURCE_OWNER = t.TABSCHEMA AND r.SOURCE_TABLE = t.TABNAME WHERE r.SOURCE_OWNER <> ''"; - // No new Tabels 1=0 private static final String GET_LIST_OF_NEW_CDC_ENABLED_TABLES = "select CAST((t.TBSPACEID * 65536 + t.TABLEID )AS INTEGER ) AS OBJECTID, " + " CD_OWNER CONCAT '.' CONCAT CD_TABLE, " + " CD_NEW_SYNCHPOINT, " + " CD_OLD_SYNCHPOINT " + "from ASNCDC.IBMSNAP_REGISTER r left JOIN SYSCAT.TABLES t ON r.SOURCE_OWNER = t.TABSCHEMA AND r.SOURCE_TABLE = t.TABNAME " - + "WHERE r.SOURCE_OWNER <> '' AND 1=0 AND CD_NEW_SYNCHPOINT > ? AND CD_OLD_SYNCHPOINT < ? "; + + "WHERE r.SOURCE_OWNER <> '' AND CD_NEW_SYNCHPOINT > ? AND CD_OLD_SYNCHPOINT < ? 
"; private static final String GET_LIST_OF_KEY_COLUMNS = "SELECT " @@ -161,8 +161,7 @@ public Lsn getMaxLsn() throws SQLException { public void getChangesForTable( TableId tableId, Lsn fromLsn, Lsn toLsn, ResultSetConsumer consumer) throws SQLException { - final String query = - GET_ALL_CHANGES_FOR_TABLE.replace(STATEMENTS_PLACEHOLDER, cdcNameForTable(tableId)); + final String query = getAllChangesForTableQuery(cdcNameForTable(tableId)); prepareQuery( query, statement -> { @@ -192,9 +191,7 @@ public void getChangesForTables( int idx = 0; for (Db2ChangeTable changeTable : changeTables) { - final String query = - GET_ALL_CHANGES_FOR_TABLE.replace( - STATEMENTS_PLACEHOLDER, changeTable.getCaptureInstance()); + final String query = getAllChangesForTableQuery(changeTable.getCaptureInstance()); queries[idx] = query; // If the table was added in the middle of queried buffer we need // to adjust from to the first LSN available @@ -214,6 +211,10 @@ public void getChangesForTables( prepareQuery(queries, preparers, consumer); } + static String getAllChangesForTableQuery(String captureInstance) { + return GET_ALL_CHANGES_FOR_TABLE.replace(STATEMENTS_PLACEHOLDER, captureInstance); + } + /** * Obtain the next available position in the database log. 
* diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2StreamingChangeEventSource.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2StreamingChangeEventSource.java index 261a59fb2f2..b968b703e26 100644 --- a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2StreamingChangeEventSource.java +++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2StreamingChangeEventSource.java @@ -6,10 +6,15 @@ package io.debezium.connector.db2; +import io.debezium.data.Envelope.Operation; import io.debezium.pipeline.ErrorHandler; import io.debezium.pipeline.EventDispatcher; import io.debezium.pipeline.source.spi.ChangeTableResultSet; import io.debezium.pipeline.source.spi.StreamingChangeEventSource; +import io.debezium.pipeline.spi.OffsetContext; +import io.debezium.relational.Column; +import io.debezium.relational.RelationalChangeRecordEmitter; +import io.debezium.relational.Table; import io.debezium.relational.TableId; import io.debezium.schema.DatabaseSchema; import io.debezium.schema.SchemaChangeEvent.SchemaChangeEventType; @@ -26,9 +31,11 @@ import java.util.Arrays; import java.util.List; import java.util.Map; +import java.util.Objects; import java.util.PriorityQueue; import java.util.Queue; import java.util.Set; +import java.util.concurrent.atomic.AtomicBoolean; import java.util.concurrent.atomic.AtomicReference; import java.util.regex.Matcher; import java.util.regex.Pattern; @@ -60,11 +67,15 @@ public class Db2StreamingChangeEventSource implements StreamingChangeEventSource { + static final int OP_DIRECT_UPDATE = 5; + private static final int COL_COMMIT_LSN = 2; private static final int COL_ROW_LSN = 3; private static final int COL_OPERATION = 1; private static final int COL_DATA = 5; + private static final Lsn 
ZERO_LSN = Lsn.valueOf("00000000000000000000000000000000"); + private static final Pattern MISSING_CDC_FUNCTION_CHANGES_ERROR = Pattern.compile("Invalid object name 'cdc.fn_cdc_get_all_changes_(.*)'\\."); @@ -150,6 +161,7 @@ public void execute( if (currentMaxLsn.equals(lastProcessedPosition.getCommitLsn()) && shouldIncreaseFromLsn) { LOGGER.debug("No change in the database"); + dataConnection.rollback(); metronome.pause(); continue; } @@ -165,16 +177,24 @@ public void execute( while (!schemaChangeCheckpoints.isEmpty()) { migrateTable(partition, offsetContext, schemaChangeCheckpoints); } - if (!dataConnection.listOfNewChangeTables(fromLsn, currentMaxLsn).isEmpty()) { + final boolean cdcRegisterAdvanced = + !dataConnection.listOfNewChangeTables(fromLsn, currentMaxLsn).isEmpty(); + boolean hasPendingSchemaChangeCheckpoint = false; + if (cdcRegisterAdvanced) { final Db2ChangeTable[] tables = getCdcTablesToQuery(partition, offsetContext); tablesSlot.set(tables); for (Db2ChangeTable table : tables) { - if (table.getStartLsn().isBetween(fromLsn, currentMaxLsn)) { - LOGGER.info("Schema will be changed for {}", table); - schemaChangeCheckpoints.add(table); + if (table.getStartLsn().compareTo(fromLsn) >= 0 + && table.getStartLsn().compareTo(currentMaxLsn) <= 0) { + if (isTableSchemaChanged(table)) { + LOGGER.info("Schema will be changed for {}", table); + schemaChangeCheckpoints.add(table); + hasPendingSchemaChangeCheckpoint = true; + } } } } + final AtomicBoolean dispatchedDataChange = new AtomicBoolean(false); try { dataConnection.getChangesForTables( tablesSlot.get(), @@ -252,10 +272,10 @@ public void execute( tableWithSmallestLsn.next(); continue; } - if (tableWithSmallestLsn - .getChangeTable() - .getStopLsn() - .isAvailable() + if (isValidStopLsn( + tableWithSmallestLsn + .getChangeTable() + .getStopLsn()) && tableWithSmallestLsn .getChangeTable() .getStopLsn() @@ -332,17 +352,25 @@ public void execute( dispatcher.dispatchDataChangeEvent( partition, tableId, - new 
Db2ChangeRecordEmitter( - partition, - offsetContext, - operation, - data, - dataNext, - clock)); + operation == OP_DIRECT_UPDATE + ? new DirectUpdateRecordEmitter( + partition, offsetContext, data, clock) + : new Db2ChangeRecordEmitter( + partition, + offsetContext, + operation, + data, + dataNext, + clock)); + dispatchedDataChange.set(true); tableWithSmallestLsn.next(); } }); - lastProcessedPosition = TxLogPosition.valueOf(currentMaxLsn); + if (dispatchedDataChange.get()) { + lastProcessedPosition = offsetContext.getChangePosition(); + } else if (shouldAdvancePositionOnEmptyRead(hasPendingSchemaChangeCheckpoint)) { + lastProcessedPosition = TxLogPosition.valueOf(currentMaxLsn); + } // Terminate the transaction otherwise CDC could not be disabled for tables dataConnection.rollback(); // Determine whether to continue streaming in db2 cdc snapshot phase @@ -363,6 +391,7 @@ private void migrateTable( throws InterruptedException, SQLException { final Db2ChangeTable newTable = schemaChangeCheckpoints.poll(); LOGGER.info("Migrating schema to {}", newTable); + offsetContext.event(newTable.getSourceTableId(), Instant.now()); dispatcher.dispatchSchemaChangeEvent( partition, newTable.getSourceTableId(), @@ -374,6 +403,63 @@ private void migrateTable( SchemaChangeEventType.ALTER)); } + private boolean isTableSchemaChanged(Db2ChangeTable table) throws SQLException { + final Table currentTable = schema.tableFor(table.getSourceTableId()); + if (currentTable == null) { + return true; + } + + final Table latestTable = metadataConnection.getTableSchemaFromTable(table); + final boolean changed = isTableSchemaChanged(currentTable, latestTable); + if (!changed) { + LOGGER.debug( + "Ignoring CDC register LSN advance for {} because source table schema is unchanged", + table); + } + return changed; + } + + static boolean isTableSchemaChanged(Table currentTable, Table latestTable) { + return !columnsEqual(currentTable.columns(), latestTable.columns()) + || !currentTable + 
.primaryKeyColumnNames() + .equals(latestTable.primaryKeyColumnNames()); + } + + static boolean isValidStopLsn(Lsn stopLsn) { + return stopLsn.isAvailable() && stopLsn.compareTo(ZERO_LSN) > 0; + } + + static boolean shouldAdvancePositionOnEmptyRead(boolean hasPendingSchemaChangeCheckpoint) { + return !hasPendingSchemaChangeCheckpoint; + } + + private static boolean columnsEqual(List<Column> currentColumns, List<Column> latestColumns) { + if (currentColumns.size() != latestColumns.size()) { + return false; + } + for (int i = 0; i < currentColumns.size(); i++) { + Column current = currentColumns.get(i); + Column latest = latestColumns.get(i); + if (!Objects.equals(current.name(), latest.name()) + || current.jdbcType() != latest.jdbcType() + || !Objects.equals(current.length(), latest.length()) + || !Objects.equals(current.scale(), latest.scale()) + || current.isOptional() != latest.isOptional() + || !typeNamesEqual(current.typeName(), latest.typeName())) { + return false; + } + } + return true; + } + + private static boolean typeNamesEqual(String currentTypeName, String latestTypeName) { + if (currentTypeName == null || latestTypeName == null) { + return Objects.equals(currentTypeName, latestTypeName); + } + return currentTypeName.equalsIgnoreCase(latestTypeName); + } + private Db2ChangeTable[] processErrorFromChangeTableQuery( SQLException exception, Db2ChangeTable[] currentChangeTables) throws Exception { final Matcher m = MISSING_CDC_FUNCTION_CHANGES_ERROR.matcher(exception.getMessage()); @@ -495,6 +581,32 @@ protected TxLogPosition getNextChangePosition(ResultSet resultSet) throws SQLExc } } + static class DirectUpdateRecordEmitter extends RelationalChangeRecordEmitter { + + private final Object[] data; + + DirectUpdateRecordEmitter( + Db2Partition partition, OffsetContext offsetContext, Object[] data, Clock clock) { + super(partition, offsetContext, clock); + this.data = data; + } + + @Override + public Operation getOperation() { + return Operation.UPDATE; + } + + @Override + 
protected Object[] getOldColumnValues() { + return null; + } + + @Override + protected Object[] getNewColumnValues() { + return data; + } + } + /** expose control to the user to stop the connector. */ protected void afterHandleLsn(Db2Partition partition, Lsn toLsn) { // do nothing diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2ValueConverters.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2ValueConverters.java new file mode 100644 index 00000000000..141612dc070 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/io/debezium/connector/db2/Db2ValueConverters.java @@ -0,0 +1,82 @@ +/* + * Copyright Debezium Authors. + * + * Licensed under the Apache Software License version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0 + */ + +package io.debezium.connector.db2; + +import io.debezium.jdbc.JdbcValueConverters; +import io.debezium.jdbc.TemporalPrecisionMode; +import io.debezium.relational.Column; +import io.debezium.relational.ValueConverter; +import org.apache.kafka.connect.data.Field; +import org.apache.kafka.connect.data.SchemaBuilder; + +import java.sql.Types; +import java.time.ZoneOffset; +import java.util.Locale; + +/** + * Conversion of DB2 specific datatypes. + * + *

 <p>This class intentionally shadows Debezium 1.9.8's Db2ValueConverters because that version does + * not register a converter for DB2 DECFLOAT, which is reported by the IBM JDBC driver as {@link + * Types#OTHER}. Without this override Debezium drops DECFLOAT columns from data events while Flink + * CDC still exposes them in pipeline schema discovery. + */ +public class Db2ValueConverters extends JdbcValueConverters { + + private static final String DECFLOAT = "DECFLOAT"; + + public Db2ValueConverters() {} + + public Db2ValueConverters( + DecimalMode decimalMode, TemporalPrecisionMode temporalPrecisionMode) { + super(decimalMode, temporalPrecisionMode, ZoneOffset.UTC, null, null, null); + } + + @Override + public SchemaBuilder schemaBuilder(Column column) { + switch (column.jdbcType()) { + case Types.TINYINT: + // values are an 8-bit unsigned integer value between 0 and 255, so store as int16 + return SchemaBuilder.int16(); + default: + if (isDecfloat(column)) { + return SchemaBuilder.float64(); + } + return super.schemaBuilder(column); + } + } + + @Override + public ValueConverter converter(Column column, Field fieldDefn) { + switch (column.jdbcType()) { + case Types.TINYINT: + // values are an 8-bit unsigned integer value between 0 and 255, so store as int16 + return (data) -> convertSmallInt(column, fieldDefn, data); + default: + if (isDecfloat(column)) { + return (data) -> convertDouble(column, fieldDefn, data); + } + return super.converter(column, fieldDefn); + } + } + + /** Time precision in DB2 is defined in scale, the default one is 7. 
*/ + @Override + protected int getTimePrecision(Column column) { + return column.scale().get(); + } + + protected Object convertTimestampWithZone(Column column, Field fieldDefn, Object data) { + return super.convertTimestampWithZone(column, fieldDefn, data); + } + + private static boolean isDecfloat(Column column) { + return column.jdbcType() == Types.OTHER + && column.typeName() != null + && column.typeName().toUpperCase(Locale.ROOT).startsWith(DECFLOAT); + } +} diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/org/apache/flink/cdc/connectors/db2/source/utils/Db2TypeUtils.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/org/apache/flink/cdc/connectors/db2/source/utils/Db2TypeUtils.java index 2c0f6a104b7..971da4e2a99 100644 --- a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/org/apache/flink/cdc/connectors/db2/source/utils/Db2TypeUtils.java +++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/main/java/org/apache/flink/cdc/connectors/db2/source/utils/Db2TypeUtils.java @@ -23,10 +23,13 @@ import io.debezium.relational.Column; import java.sql.Types; +import java.util.Locale; /** Utilities for converting from Db2 types to Flink SQL types. */ public class Db2TypeUtils { + private static final String DECFLOAT = "DECFLOAT"; + /** Returns a corresponding Flink data type from a debezium {@link Column}. 
*/ public static DataType fromDbzColumn(Column column) { DataType dataType = convertFromColumn(column); @@ -64,6 +67,14 @@ private static DataType convertFromColumn(Column column) { return DataTypes.FLOAT(); case Types.DOUBLE: return DataTypes.DOUBLE(); + case Types.OTHER: + if (isDecfloat(column)) { + return DataTypes.DOUBLE(); + } + throw new UnsupportedOperationException( + String.format( + "Don't support DB2 type '%s' yet, jdbcType:'%s'.", + column.typeName(), column.jdbcType())); case Types.DECIMAL: case Types.NUMERIC: return DataTypes.DECIMAL(column.length(), column.scale().orElse(0)); @@ -80,4 +91,9 @@ private static DataType convertFromColumn(Column column) { column.typeName(), column.jdbcType())); } } + + private static boolean isDecfloat(Column column) { + return column.typeName() != null + && column.typeName().toUpperCase(Locale.ROOT).startsWith(DECFLOAT); + } } diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/io/debezium/connector/db2/Db2ConnectionQueryTest.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/io/debezium/connector/db2/Db2ConnectionQueryTest.java new file mode 100644 index 00000000000..e622b15d776 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/io/debezium/connector/db2/Db2ConnectionQueryTest.java @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.debezium.connector.db2; + +import org.junit.jupiter.api.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +class Db2ConnectionQueryTest { + + @Test + void testDirectUpdateOperationIsMapped() { + assertThat(Db2Connection.getAllChangesForTableQuery("CDC_DB2INST1_PRODUCTS")) + .contains("WHEN IBMSNAP_OPERATION = 'U' THEN 5") + .contains("FROM ASNCDC.CDC_DB2INST1_PRODUCTS cdc"); + } +} diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/io/debezium/connector/db2/Db2StreamingChangeEventSourceTest.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/io/debezium/connector/db2/Db2StreamingChangeEventSourceTest.java new file mode 100644 index 00000000000..fa23329f8b2 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/io/debezium/connector/db2/Db2StreamingChangeEventSourceTest.java @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.debezium.connector.db2; + +import io.debezium.data.Envelope.Operation; +import io.debezium.relational.Column; +import io.debezium.relational.Table; +import io.debezium.relational.TableId; +import org.junit.jupiter.api.Test; + +import java.sql.Types; + +import static org.assertj.core.api.Assertions.assertThat; + +/** Tests for DB2 streaming schema-change decisions. */ +class Db2StreamingChangeEventSourceTest { + + @Test + void testSchemaChangeDetectionIgnoresUnchangedSchema() { + Table table = + Table.editor() + .tableId(new TableId("TESTDB", "DB2INST1", "PRODUCTS")) + .addColumn(Column.editor().name("ID").jdbcType(Types.INTEGER).create()) + .addColumn(Column.editor().name("NAME").jdbcType(Types.VARCHAR).create()) + .setPrimaryKeyNames("ID") + .create(); + + assertThat(Db2StreamingChangeEventSource.isTableSchemaChanged(table, table)).isFalse(); + } + + @Test + void testSchemaChangeDetectionFindsAddedColumn() { + Table before = + Table.editor() + .tableId(new TableId("TESTDB", "DB2INST1", "PRODUCTS")) + .addColumn(Column.editor().name("ID").jdbcType(Types.INTEGER).create()) + .addColumn(Column.editor().name("NAME").jdbcType(Types.VARCHAR).create()) + .setPrimaryKeyNames("ID") + .create(); + Table after = + Table.editor() + .tableId(new TableId("TESTDB", "DB2INST1", "PRODUCTS")) + .addColumn(Column.editor().name("ID").jdbcType(Types.INTEGER).create()) + .addColumn(Column.editor().name("NAME").jdbcType(Types.VARCHAR).create()) + .addColumn(Column.editor().name("VOLUME").jdbcType(Types.FLOAT).create()) + .setPrimaryKeyNames("ID") + 
.create(); + + assertThat(Db2StreamingChangeEventSource.isTableSchemaChanged(before, after)).isTrue(); + } + + @Test + void testZeroStopLsnIsOpenEnded() { + assertThat( + Db2StreamingChangeEventSource.isValidStopLsn( + Lsn.valueOf("00000000000000000000000000000000"))) + .isFalse(); + assertThat( + Db2StreamingChangeEventSource.isValidStopLsn( + Lsn.valueOf("00000000000000000000000000000001"))) + .isTrue(); + } + + @Test + void testPendingSchemaChangeCheckpointPreventsEmptyReadLsnAdvance() { + assertThat(Db2StreamingChangeEventSource.shouldAdvancePositionOnEmptyRead(true)).isFalse(); + assertThat(Db2StreamingChangeEventSource.shouldAdvancePositionOnEmptyRead(false)).isTrue(); + } + + @Test + void testDirectUpdateDoesNotFakeBeforeImage() { + Object[] data = new Object[] {1, "after"}; + Db2StreamingChangeEventSource.DirectUpdateRecordEmitter emitter = + new Db2StreamingChangeEventSource.DirectUpdateRecordEmitter(null, null, data, null); + + assertThat(emitter.getOperation()).isEqualTo(Operation.UPDATE); + assertThat(emitter.getOldColumnValues()).isNull(); + assertThat(emitter.getNewColumnValues()).isSameAs(data); + } +} diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/io/debezium/connector/db2/Db2ValueConvertersTest.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/io/debezium/connector/db2/Db2ValueConvertersTest.java new file mode 100644 index 00000000000..a73257725c9 --- /dev/null +++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/io/debezium/connector/db2/Db2ValueConvertersTest.java @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package io.debezium.connector.db2;
+
+import io.debezium.relational.Column;
+import org.apache.kafka.connect.data.Schema;
+import org.junit.jupiter.api.Test;
+
+import java.math.BigDecimal;
+import java.sql.Types;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+class Db2ValueConvertersTest {
+
+    private final Db2ValueConverters converters = new Db2ValueConverters();
+
+    @Test
+    void shouldExposeDecfloatAsDoubleSchema() {
+        Schema schema = converters.schemaBuilder(decfloatColumn("C_DECFLOAT16")).build();
+
+        assertThat(schema.type()).isEqualTo(Schema.Type.FLOAT64);
+    }
+
+    @Test
+    void shouldConvertDecfloatValueToDouble() {
+        Object converted =
+                converters
+                        .converter(decfloatColumn("C_DECFLOAT34"), null)
+                        .convert(new BigDecimal("12345.6789"));
+
+        assertThat(converted).isEqualTo(12345.6789d);
+    }
+
+    private static Column decfloatColumn(String name) {
+        return Column.editor()
+                .name(name)
+                .position(1)
+                .jdbcType(Types.OTHER)
+                .type("DECFLOAT")
+                .optional(true)
+                .create();
+    }
+}
diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2ConnectionContainerTest.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2ConnectionContainerTest.java
new file mode 100644
index 00000000000..a7eced93061
--- /dev/null
+++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2ConnectionContainerTest.java
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.db2;
+
+import io.debezium.connector.db2.Db2ChangeTable;
+import io.debezium.connector.db2.Db2Connection;
+import io.debezium.connector.db2.Lsn;
+import io.debezium.jdbc.JdbcConfiguration;
+import org.awaitility.Awaitility;
+import org.junit.jupiter.api.Test;
+
+import java.sql.Connection;
+import java.sql.ResultSet;
+import java.sql.Statement;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.testcontainers.containers.Db2Container.DB2_PORT;
+
+/** Testcontainers-backed tests for {@link Db2Connection}. */
+class Db2ConnectionContainerTest extends Db2TestBase {
+
+    @Test
+    void testListOfNewChangeTablesDetectsRecapturedTableAfterSchemaChange() throws Exception {
+        initializeDb2Table("inventory", "PRODUCTS");
+
+        try (Db2Connection db2Connection = createDb2Connection();
+                Connection jdbcConnection = getJdbcConnection();
+                Statement statement = jdbcConnection.createStatement()) {
+            statement.execute(
+                    "INSERT INTO DB2INST1.PRODUCTS(NAME, DESCRIPTION, WEIGHT) "
+                            + "VALUES ('baseline robot', 'before schema change', 1.0)");
+            Lsn fromLsn = waitForAvailableMaxLsn(db2Connection);
+            db2Connection.rollback();
+
+            statement.execute("ALTER TABLE DB2INST1.PRODUCTS ADD COLUMN VOLUME FLOAT");
+            statement.execute("CALL ASNCDC.REMOVETABLE('DB2INST1', 'PRODUCTS')");
+            statement.execute("CALL ASNCDC.ADDTABLE('DB2INST1', 'PRODUCTS')");
+            statement.execute(
+                    "UPDATE ASNCDC.IBMSNAP_REGISTER "
+                            + "SET CD_OLD_SYNCHPOINT = X'00000000000000000000000000000000' "
+                            + "WHERE SOURCE_OWNER = 'DB2INST1' AND SOURCE_TABLE = 'PRODUCTS'");
+            statement.execute("VALUES ASNCDC.ASNCDCSERVICES('reinit','asncdc')");
+            statement.execute(
+                    "INSERT INTO DB2INST1.PRODUCTS(NAME, DESCRIPTION, WEIGHT, VOLUME) "
+                            + "VALUES ('schema robot', 'after schema change', 2.0, 13.5)");
+
+            assertChangeTableContainsColumn(jdbcConnection, "VOLUME");
+            Awaitility.await("new DB2 CDC capture table is discoverable")
+                    .atMost(60, TimeUnit.SECONDS)
+                    .untilAsserted(
+                            () -> {
+                                Lsn toLsn = db2Connection.getMaxLsn();
+                                if (toLsn.compareTo(fromLsn) <= 0) {
+                                    db2Connection.rollback();
+                                }
+                                assertThat(toLsn.compareTo(fromLsn)).isGreaterThan(0);
+                                Set<Db2ChangeTable> newChangeTables =
+                                        db2Connection.listOfNewChangeTables(fromLsn, toLsn);
+                                assertThat(newChangeTables)
+                                        .isNotEmpty()
+                                        .anySatisfy(
+                                                table -> {
+                                                    assertThat(table.getStartLsn())
+                                                            .isEqualTo(toLsn);
+                                                    assertThat(
+                                                                    table.getStartLsn()
+                                                                            .compareTo(fromLsn))
+                                                            .isGreaterThanOrEqualTo(0);
+                                                    assertThat(table.getStartLsn().compareTo(toLsn))
+                                                            .isLessThanOrEqualTo(0);
+                                                });
+                            });
+        }
+    }
+
+    private Db2Connection createDb2Connection() {
+        JdbcConfiguration jdbcConfiguration =
+                JdbcConfiguration.create()
+                        .with(JdbcConfiguration.HOSTNAME, DB2_CONTAINER.getHost())
+                        .with(JdbcConfiguration.PORT, DB2_CONTAINER.getMappedPort(DB2_PORT))
+                        .with(JdbcConfiguration.DATABASE, DB2_CONTAINER.getDatabaseName())
+                        .with(JdbcConfiguration.USER, DB2_CONTAINER.getUsername())
+                        .with(JdbcConfiguration.PASSWORD, DB2_CONTAINER.getPassword())
+                        .build();
+        return new Db2Connection(jdbcConfiguration);
+    }
+
+    private Lsn waitForAvailableMaxLsn(Db2Connection db2Connection) {
+        final Lsn[] maxLsn = new Lsn[1];
+        Awaitility.await("DB2 CDC max LSN")
+                .atMost(60, TimeUnit.SECONDS)
+                .untilAsserted(
+                        () -> {
+                            maxLsn[0] = db2Connection.getMaxLsn();
+                            assertThat(maxLsn[0].isAvailable()).isTrue();
+                        });
+        return maxLsn[0];
+    }
+
+    private void assertChangeTableContainsColumn(Connection connection, String columnName)
+            throws Exception {
+        try (ResultSet resultSet =
+                connection
+                        .createStatement()
+                        .executeQuery(
+                                "SELECT COUNT(*) FROM SYSCAT.COLUMNS "
+                                        + "WHERE TABSCHEMA = 'ASNCDC' "
+                                        + "AND TABNAME = 'CDC_DB2INST1_PRODUCTS' "
+                                        + "AND COLNAME = '"
+                                        + columnName
+                                        + "'")) {
+            assertThat(resultSet.next()).isTrue();
+            assertThat(resultSet.getInt(1)).isEqualTo(1);
+        }
+    }
+}
diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2SourceTest.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2SourceTest.java
index 860cc5e7c33..5f6e943d739 100644
--- a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2SourceTest.java
+++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2SourceTest.java
@@ -38,6 +38,8 @@
 import org.apache.flink.util.Preconditions;

 import com.jayway.jsonpath.JsonPath;
+import io.debezium.data.Envelope;
+import org.apache.kafka.connect.data.Struct;
 import org.apache.kafka.connect.source.SourceRecord;
 import org.assertj.core.api.Assertions;
 import org.junit.jupiter.api.Test;
@@ -119,6 +121,7 @@ public void go() throws Exception {
             statement.execute("UPDATE DB2INST1.PRODUCTS SET WEIGHT=1345.67 WHERE ID=2001");
             records = drain(sourceContext, 1);
             assertUpdate(records.get(0), "ID", 2001);
+            assertDirectUpdateImages(records.get(0), 1234.56d, 1345.67d);

             // ---------------------------------------------------------------------------------------------------------------
             // Change our schema with a fully-qualified name; we should still see this event
@@ -284,6 +287,7 @@ public void go() throws Exception {
             List<SourceRecord> records = drain(sourceContext3, 2);
             assertInsert(records.get(0), "ID", 1001);
             assertUpdate(records.get(1), "ID", 1001);
+            assertDirectUpdateImages(records.get(1), 1234.56d, 1345.67d);

             // make sure there is no more events
             Assertions.assertThat(waitForAvailableRecords(Duration.ofSeconds(3), sourceContext3))
@@ -384,6 +388,17 @@ private boolean waitForAvailableRecords(Duration timeout, TestSourceContext s
         return !sourceContext.getCollectedOutputs().isEmpty();
     }

+    private static void assertDirectUpdateImages(
+            SourceRecord record, double expectedBeforeWeight, double expectedAfterWeight) {
+        Struct value = (Struct) record.value();
+        Struct before = value.getStruct(Envelope.FieldName.BEFORE);
+        Assertions.assertThat((Double) before.get("WEIGHT"))
+                .isCloseTo(expectedBeforeWeight, Assertions.within(0.00001d));
+        Struct after = value.getStruct(Envelope.FieldName.AFTER);
+        Assertions.assertThat((Double) after.get("WEIGHT"))
+                .isCloseTo(expectedAfterWeight, Assertions.within(0.00001d));
+    }
+
     private static void setupSource(DebeziumSourceFunction source) throws Exception {
         setupSource(
                 source, null, null, null,
diff --git a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2TestBase.java b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2TestBase.java
index 92c595e3ae0..0bd3f5252c6 100644
--- a/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2TestBase.java
+++ b/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-db2-cdc/src/test/java/org/apache/flink/cdc/connectors/db2/Db2TestBase.java
@@ -32,8 +32,10 @@
 import org.testcontainers.lifecycle.Startables;
 import org.testcontainers.utility.DockerImageName;

+import java.io.IOException;
 import java.net.URISyntaxException;
 import java.net.URL;
+import java.nio.charset.StandardCharsets;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.Paths;
@@ -45,6 +47,7 @@
 import java.util.Arrays;
 import java.util.List;
 import java.util.Locale;
+import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.regex.Matcher;
@@ -63,10 +66,11 @@ public class Db2TestBase {
     private static final DockerImageName DEBEZIUM_DOCKER_IMAGE_NAME =
             DockerImageName.parse(
                             new ImageFromDockerfile("custom/db2-cdc:1.4")
-                                    .withDockerfile(getFilePath("db2_server/Dockerfile"))
+                                    .withDockerfile(
+                                            createDb2ServerBuildContext().resolve("Dockerfile"))
                                     .get())
                     .asCompatibleSubstituteFor("ibmcom/db2");
-    private static boolean db2AsnAgentRunning = false;
+    private static final CompletableFuture<Void> db2AsnAgentStarted = new CompletableFuture<>();
     private static final Pattern COMMENT_PATTERN = Pattern.compile("^(.*)--.*$");

     protected static final Db2Container DB2_CONTAINER =
@@ -77,13 +81,15 @@ public class Db2TestBase {
                     .withEnv("AUTOCONFIG", "false")
                     .withEnv("ARCHIVE_LOGS", "true")
                     .acceptLicense()
+                    .withCreateContainerCmdModifier(
+                            createContainerCmd -> createContainerCmd.withPlatform("linux/amd64"))
                     .withLogConsumer(new Slf4jLogConsumer(LOG))
                     .withLogConsumer(
                             outputFrame -> {
                                 if (outputFrame
                                         .getUtf8String()
                                         .contains("The asncdc program enable finished")) {
-                                    db2AsnAgentRunning = true;
+                                    db2AsnAgentStarted.complete(null);
                                 }
                             });

@@ -93,15 +99,9 @@ public static void startContainers() {
         Startables.deepStart(Stream.of(DB2_CONTAINER)).join();
         LOG.info("Containers are started.");

-        LOG.info("Waiting db2 asn agent start...");
-        while (!db2AsnAgentRunning) {
-            try {
-                Thread.sleep(5000L);
-            } catch (InterruptedException e) {
-                LOG.error("unexpected interrupted exception", e);
-            }
-        }
-        LOG.info("Db2 asn agent are started.");
+        db2AsnAgentStarted.join();
+        assertCdcAgentRunning();
+        LOG.info("Db2 asn agent is available.");
     }

     @AfterAll
@@ -120,6 +120,36 @@ protected Connection getJdbcConnection() throws SQLException {
                 DB2_CONTAINER.getPassword());
     }

+    private static void assertCdcAgentRunning() {
+        try {
+            Awaitility.await("DB2 ASN agent status")
+                    .atMost(120, TimeUnit.SECONDS)
+                    .until(
+                            () -> {
+                                try (Connection connection =
+                                                DriverManager.getConnection(
+                                                        DB2_CONTAINER.getJdbcUrl(),
+                                                        DB2_CONTAINER.getUsername(),
+                                                        DB2_CONTAINER.getPassword());
+                                        Statement statement = connection.createStatement();
+                                        ResultSet resultSet =
+                                                statement.executeQuery(
+                                                        "VALUES ASNCDC.ASNCDCSERVICES('status','asncdc')")) {
+                                    return resultSet.next()
+                                            && !resultSet
+                                                    .getString(1)
+                                                    .toLowerCase(Locale.ROOT)
+                                                    .contains("asncap is not running");
+                                } catch (SQLException e) {
+                                    LOG.warn("DB2 ASN agent status check failed, will retry.", e);
+                                    return false;
+                                }
+                            });
+        } catch (Exception e) {
+            throw new FlinkRuntimeException("Failed to verify DB2 ASN agent status.", e);
+        }
+    }
+
     private static Path getFilePath(String resourceFilePath) {
         Path path = null;
         try {
@@ -134,6 +164,37 @@ private static Path getFilePath(String resourceFilePath) {
         return path;
     }

+    private static Path createDb2ServerBuildContext() {
+        Path sourceDir = getFilePath("db2_server/Dockerfile").getParent();
+        try {
+            Path targetDir = Files.createTempDirectory("flink-cdc-db2-server-build");
+            try (Stream<Path> files = Files.walk(sourceDir)) {
+                for (Path source : files.collect(Collectors.toList())) {
+                    Path target = targetDir.resolve(sourceDir.relativize(source).toString());
+                    if (Files.isDirectory(source)) {
+                        Files.createDirectories(target);
+                    } else {
+                        Files.createDirectories(target.getParent());
+                        Files.copy(source, target);
+                    }
+                }
+            }
+            Path dockerfile = targetDir.resolve("Dockerfile");
+            String dockerfileContent =
+                    new String(Files.readAllBytes(dockerfile), StandardCharsets.UTF_8);
+            Files.write(
+                    dockerfile,
+                    dockerfileContent
+                            .replace(
+                                    "FROM ibmcom/db2:11.5.0.0a",
+                                    "FROM --platform=linux/amd64 ibmcom/db2:11.5.0.0a")
+                            .getBytes(StandardCharsets.UTF_8));
+            return targetDir;
+        } catch (IOException e) {
+            throw new RuntimeException("Failed to create DB2 Docker build context", e);
+        }
+    }
+
     private static void dropTestTable(Connection connection, String tableName) {
         try {