|
| 1 | +--- |
| 2 | +{ |
| 3 | + "title": "MySQL", |
| 4 | + "language": "en", |
| 5 | + "description": "Doris provides multiple ways to load data from MySQL, including ad-hoc loading via JDBC Catalog, continuous full + incremental synchronization via Streaming Job, and CDC sync via Flink Doris Connector." |
| 6 | +} |
| 7 | +--- |
| 8 | + |
| 9 | +Doris provides the following ways to load data from MySQL: |
| 10 | + |
| 11 | +- **Loading MySQL data via JDBC Catalog** |
| 12 | + |
| 13 | +Doris uses JDBC Catalog to map MySQL as an external catalog, allowing direct SQL queries against MySQL data. Combined with `INSERT INTO` or `CREATE TABLE AS SELECT`, this is suitable for one-time migration or periodic batch loading. |
| 14 | + |
| 15 | +- **Continuously syncing MySQL data via Streaming Job** |
| 16 | + |
| 17 | +Doris uses Streaming Job to continuously sync full and incremental data from MySQL to Doris. By integrating the reading capability of [Flink CDC](https://github.com/apache/flink-cdc), Doris keeps the job running, reads the Binlog from MySQL, and writes the changes to Doris tables. Two modes are supported: table-level sync (exactly-once semantics) and database-level sync (at-least-once semantics). Available since Doris 4.1.0. |
| 18 | + |
| 19 | +- **Loading MySQL data via Flink Doris Connector** |
| 20 | + |
| 21 | +Use Flink Doris Connector together with Flink MySQL CDC for real-time synchronization. This is suitable for scenarios that require additional Flink stream processing logic. The connector also provides a one-click full-database synchronization tool. For details, see [Flink Doris Connector](../../../ecosystem/flink-doris-connector/flink-doris-connector.md). |
| 22 | + |
| 23 | +- **Loading MySQL data via third-party tools** |
| 24 | + |
| 25 | +Data integration tools such as [DataX](../../../ecosystem/datax), [SeaTunnel](../../../ecosystem/seatunnel), and [CloudCanal](../../../ecosystem/cloudcanal) also support syncing data from MySQL to Doris. |
| 26 | + |
| 27 | +In most cases, you can use JDBC Catalog directly for one-time data migration. When continuous full + incremental synchronization is required, Streaming Job is recommended. |
| 28 | + |
| 29 | +## Loading MySQL data via JDBC Catalog |
| 30 | + |
| 31 | +Use JDBC Catalog to map MySQL as an external catalog, then use `INSERT INTO` or `CREATE TABLE AS SELECT` to load data. For detailed syntax, see [JDBC MySQL Catalog](../../../lakehouse/catalogs/jdbc-mysql-catalog.md). |
| 32 | + |
| 33 | +### Step 1: Prepare data in MySQL |
| 34 | + |
| 35 | +```sql |
| 36 | +CREATE TABLE test.students ( |
| 37 | + id INT PRIMARY KEY, |
| 38 | + name VARCHAR(64), |
| 39 | + age INT |
| 40 | +); |
| 41 | + |
| 42 | +INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30); |
| 43 | +``` |
| 44 | + |
| 45 | +### Step 2: Create a Catalog in Doris |
| 46 | + |
| 47 | +```sql |
| 48 | +CREATE CATALOG mysql_catalog PROPERTIES ( |
| 49 | + "type" = "jdbc", |
| 50 | + "user" = "root", |
| 51 | + "password" = "123456", |
| 52 | + "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/test", |
| 53 | + "driver_url" = "mysql-connector-java-8.0.25.jar", |
| 54 | + "driver_class" = "com.mysql.cj.jdbc.Driver" |
| 55 | +); |
| 56 | +``` |
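Once the catalog exists, a quick query through it can confirm that Doris reaches MySQL before any data is loaded. The sketch below assumes the `mysql_catalog` catalog and the `test.students` table from the steps above; it is an optional check, not part of the original procedure.

```sql
-- List the MySQL databases visible through the catalog.
SHOW DATABASES FROM mysql_catalog;

-- Read a few rows directly from MySQL using the catalog.database.table form.
SELECT id, name, age FROM mysql_catalog.test.students LIMIT 10;
```

If both statements return results, the connection settings and driver are working, and the same fully qualified name can be used in the `INSERT INTO ... SELECT` step.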
| 57 | + |
| 58 | +### Step 3: Create the target table in Doris |
| 59 | + |
| 60 | +```sql |
| 61 | +CREATE DATABASE IF NOT EXISTS doris_db; |
| 62 | + |
| 63 | +CREATE TABLE doris_db.students ( |
| 64 | + id INT, |
| 65 | + name VARCHAR(64), |
| 66 | + age INT |
| 67 | +) |
| 68 | +UNIQUE KEY(id) |
| 69 | +DISTRIBUTED BY HASH(id) BUCKETS 1 |
| 70 | +PROPERTIES ("replication_num" = "1"); |
| 71 | +``` |
| 72 | + |
| 73 | +### Step 4: Load data with INSERT INTO |
| 74 | + |
| 75 | +```sql |
| 76 | +INSERT INTO doris_db.students |
| 77 | +SELECT id, name, age FROM mysql_catalog.test.students; |
| 78 | +``` |
| 79 | + |
| 80 | +If the target table does not exist yet, you can also use `CREATE TABLE AS SELECT` to create the table and load data in one step: |
| 81 | + |
| 82 | +```sql |
| 83 | +CREATE TABLE doris_db.students |
| 84 | +PROPERTIES ("replication_num" = "1") |
| 85 | +AS |
| 86 | +SELECT * FROM mysql_catalog.test.students; |
| 87 | +``` |
| 88 | + |
| 89 | +### Step 5: Verify loaded data |
| 90 | + |
| 91 | +```sql |
| 92 | +SELECT * FROM doris_db.students; |
| 93 | ++----+-------+------+ |
| 94 | +| id | name | age | |
| 95 | ++----+-------+------+ |
| 96 | +| 1 | Emily | 25 | |
| 97 | +| 2 | Bob | 30 | |
| 98 | ++----+-------+------+ |
| 99 | +``` |
| 100 | + |
| 101 | +## Continuously syncing MySQL data via Streaming Job |
| 102 | + |
| 103 | +Streaming Job continuously reads MySQL Binlog via Flink CDC and writes it to Doris. Two modes are supported: |
| 104 | + |
| 105 | +- [MySQL Database-Level Sync](../streaming-job/continuous-load-mysql-database.md): sync at the database level (use `include_tables` to sync one, several, or all tables). Doris automatically creates downstream tables on first sync. Provides at-least-once semantics. |
| 106 | +- [MySQL Table-Level Sync](../streaming-job/continuous-load-mysql-table.md): sync at the table level. The target table must be pre-created in Doris. Supports flexible column mapping, data transformation, and exactly-once semantics. |
| 107 | + |
| 108 | +### Limitations |
| 109 | + |
| 110 | +1. Only primary key tables (Unique Key) are supported. |
| 111 | +2. Load privilege is required. Database-level sync also needs Create privilege when auto-creating downstream tables on first run. |
| 112 | +3. Available since Doris 4.1.0. |
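As an illustration of the privilege requirement above, the grants might look like the following. This is a sketch under assumptions, not taken from the Streaming Job documentation: the account `sync_user` is hypothetical, and it assumes the standard Doris privilege names `LOAD_PRIV` and `CREATE_PRIV` correspond to the Load and Create privileges the job checks.

```sql
-- Hypothetical Doris account that submits the Streaming Job.
GRANT LOAD_PRIV ON doris_db.* TO 'sync_user'@'%';

-- Only needed for database-level sync, which auto-creates tables on first run.
GRANT CREATE_PRIV ON doris_db.* TO 'sync_user'@'%';
```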
| 113 | + |
| 114 | +### Prerequisites |
| 115 | + |
| 116 | +Before submitting a Streaming Job, Binlog must be enabled on the MySQL side and the user must be granted the corresponding REPLICATION privileges. For environment-specific setup steps, see: |
| 117 | + |
| 118 | +- [Amazon RDS MySQL Setup Guide](../streaming-job/prerequisites/amazon-rds-mysql.md) |
| 119 | +- [Amazon Aurora MySQL Setup Guide](../streaming-job/prerequisites/amazon-aurora-mysql.md) |
| 120 | +- [Continuous Load Overview](../streaming-job/continuous-load-overview.md): notes and required permissions for each mode |
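For a self-managed MySQL instance, the checks and grants typically look like the sketch below. The account `doris_sync` and its password are placeholders, and the privilege list follows what Flink CDC's MySQL connector commonly requires. Treat it as an assumption and confirm it against the guides listed above.

```sql
-- Run on MySQL. Binlog must be ON and use ROW format for CDC.
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';

-- Hypothetical account to use in the job's "user" / "password" properties.
CREATE USER 'doris_sync'@'%' IDENTIFIED BY 'change_me';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'doris_sync'@'%';
```

Using a dedicated account instead of `root` keeps the replication privileges scoped to the sync job.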
| 121 | + |
| 122 | +### Operation Example: Database-Level Sync |
| 123 | + |
| 124 | +Database-level sync uses the `FROM MYSQL ... TO DATABASE ...` syntax. The target is a Doris database, and the downstream tables are automatically created on first sync. |
| 125 | + |
| 126 | +#### Step 1: Prepare data in MySQL |
| 127 | + |
| 128 | +```sql |
| 129 | +CREATE TABLE test.students ( |
| 130 | + id INT PRIMARY KEY, |
| 131 | + name VARCHAR(64), |
| 132 | + age INT |
| 133 | +); |
| 134 | + |
| 135 | +INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30); |
| 136 | +``` |
| 137 | + |
| 138 | +#### Step 2: Create the target database in Doris |
| 139 | + |
| 140 | +Database-level sync **does not require pre-creating tables**, but the target database that hosts them must exist: |
| 141 | + |
| 142 | +```sql |
| 143 | +CREATE DATABASE IF NOT EXISTS doris_db; |
| 144 | +``` |
| 145 | + |
| 146 | +#### Step 3: Create a Streaming Job |
| 147 | + |
| 148 | +The example below uses `include_tables` to sync only the `students` table (multiple tables can be comma-separated; leave empty to sync the whole database): |
| 149 | + |
| 150 | +```sql |
| 151 | +CREATE JOB mysql_db_sync |
| 152 | +ON STREAMING |
| 153 | +FROM MYSQL ( |
| 154 | + "jdbc_url" = "jdbc:mysql://127.0.0.1:3306", |
| 155 | + "driver_url" = "mysql-connector-java-8.0.25.jar", |
| 156 | + "driver_class" = "com.mysql.cj.jdbc.Driver", |
| 157 | + "user" = "root", |
| 158 | + "password" = "123456", |
| 159 | + "database" = "test", |
| 160 | + "include_tables" = "students", |
| 161 | + "offset" = "initial" |
| 162 | +) |
| 163 | +TO DATABASE doris_db ( |
| 164 | + "table.create.properties.replication_num" = "1" -- set to 1 in single-BE deployments |
| 165 | +); |
| 166 | +``` |
| 167 | + |
| 168 | +#### Step 4: Check job status |
| 169 | + |
| 170 | +```sql |
| 171 | +SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING"; |
| 172 | +``` |
| 173 | + |
| 174 | +#### Step 5: Inspect auto-created Doris tables and loaded data |
| 175 | + |
| 176 | +```sql |
| 177 | +SHOW TABLES FROM doris_db; |
| 178 | +SELECT * FROM doris_db.students; |
| 179 | +``` |
| 180 | + |
| 181 | +For more common operations and full parameter reference, see [MySQL Database-Level Sync](../streaming-job/continuous-load-mysql-database.md). |
| 182 | + |
| 183 | +### Operation Example: Table-Level Sync |
| 184 | + |
| 185 | +#### Step 1: Prepare data in MySQL |
| 186 | + |
| 187 | +```sql |
| 188 | +CREATE TABLE test.students ( |
| 189 | + id INT PRIMARY KEY, |
| 190 | + name VARCHAR(64), |
| 191 | + age INT |
| 192 | +); |
| 193 | + |
| 194 | +INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30); |
| 195 | +``` |
| 196 | + |
| 197 | +#### Step 2: Create the target table in Doris |
| 198 | + |
| 199 | +Table-level sync requires the target table to exist beforehand: |
| 200 | + |
| 201 | +```sql |
| 202 | +CREATE DATABASE IF NOT EXISTS doris_db; |
| 203 | + |
| 204 | +CREATE TABLE doris_db.students ( |
| 205 | + id INT, |
| 206 | + name VARCHAR(64), |
| 207 | + age INT |
| 208 | +) |
| 209 | +UNIQUE KEY(id) |
| 210 | +DISTRIBUTED BY HASH(id) BUCKETS 1 |
| 211 | +PROPERTIES ("replication_num" = "1"); |
| 212 | +``` |
| 213 | + |
| 214 | +#### Step 3: Create a Streaming Job |
| 215 | + |
| 216 | +Use [CREATE STREAMING JOB](../../../sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md) with the `INSERT INTO ... SELECT * FROM cdc_stream(...)` syntax: |
| 217 | + |
| 218 | +```sql |
| 219 | +CREATE JOB mysql_students_sync |
| 220 | +ON STREAMING |
| 221 | +DO |
| 222 | +INSERT INTO doris_db.students |
| 223 | +SELECT * FROM cdc_stream( |
| 224 | + "type" = "mysql", |
| 225 | + "jdbc_url" = "jdbc:mysql://127.0.0.1:3306", |
| 226 | + "driver_url" = "mysql-connector-java-8.0.25.jar", |
| 227 | + "driver_class" = "com.mysql.cj.jdbc.Driver", |
| 228 | + "user" = "root", |
| 229 | + "password" = "123456", |
| 230 | + "database" = "test", |
| 231 | + "table" = "students", |
| 232 | + "offset" = "initial" |
| 233 | +); |
| 234 | +``` |
| 235 | + |
| 236 | +#### Step 4: Check job status |
| 237 | + |
| 238 | +```sql |
| 239 | +SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING"; |
| 240 | +``` |
| 241 | + |
| 242 | +#### Step 5: Verify loaded data |
| 243 | + |
| 244 | +```sql |
| 245 | +SELECT * FROM doris_db.students; |
| 246 | +``` |
| 247 | + |
| 248 | +For more common operations (pause, resume, delete, check task, etc.) and full parameter reference, see [MySQL Table-Level Sync](../streaming-job/continuous-load-mysql-table.md). |
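A minimal sketch of those operations, using the job name from this example. It assumes that the Doris job statements `PAUSE JOB`, `RESUME JOB`, and `DROP JOB` and the `tasks()` table function apply to Streaming Jobs as they do to other insert jobs; the linked page remains the authoritative reference for syntax.

```sql
-- Pause the sync job, then resume it later.
PAUSE JOB WHERE jobname = 'mysql_students_sync';
RESUME JOB WHERE jobname = 'mysql_students_sync';

-- Inspect the tasks spawned by insert jobs.
SELECT * FROM tasks("type"="insert");

-- Remove the job when it is no longer needed.
DROP JOB WHERE jobname = 'mysql_students_sync';
```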