
Commit a3000c1

[doc] add MySQL and PostgreSQL data source pages (#3600)
## Summary

Add dedicated MySQL and PostgreSQL pages under **Data Loading → Data Source** to surface Doris's continuous load (Streaming Job) capability for MySQL and PostgreSQL, which were previously only discoverable inside the generic `migrate-data-from-other-oltp` page.

Each new page covers four import paths, with **JDBC Catalog** as the primary one-time-migration option and **Streaming Job (Continuous Load)** as the recommended path for full + incremental sync since 4.1.0:

- JDBC Catalog (with a 5-step example: source prep → catalog → target table → INSERT/CTAS → verify)
- Streaming Job, including a Limitations section, a Prerequisites section linking to RDS / Aurora setup guides and the Continuous Load overview, plus complete 5-step examples for both **database-level sync** (using `FROM MYSQL/POSTGRES … TO DATABASE …` with `include_tables = "students"`) and **table-level sync** (using `INSERT INTO … SELECT * FROM cdc_stream(...)`)
- Flink Doris Connector (link to existing doc)
- Third-party tools (DataX / SeaTunnel / CloudCanal)

The Streaming Job section in `migrate-data-from-other-oltp.md` is removed (it referenced a stale `streaming-job-multi-table.md` link), since the same content now lives in the dedicated MySQL/PostgreSQL pages.
1 parent 2a1c7db commit a3000c1

14 files changed

Lines changed: 2006 additions & 106 deletions

File tree

docs/data-operate/import/data-source/migrate-data-from-other-oltp.md

Lines changed: 3 additions & 27 deletions
````diff
@@ -34,7 +34,7 @@ AS
 SELECT * FROM iceberg_catalog.iceberg_db.table1;
 ```
 
-For more details, refer to [Catalog Data Load](../../../lakehouse/catalog-overview.md#data-import)
+For more details, refer to [Catalog Data Load](../../../lakehouse/catalog-overview.md#data-ingestion)
 
 ## Flink Doris Connector
 
@@ -133,7 +133,7 @@ You can leverage Flink to achieve offline and real-time synchronization for TP s
   --sink-conf sink.label-prefix=label \
   --table-conf replication_num=1
 ```
-For more details, refer to [Full Database Synchronization](../../../ecosystem/flink-doris-connector.md#full-database-synchronization)
+For more details, refer to [Full Database Synchronization](../../../ecosystem/flink-doris-connector/flink-doris-connector.md#full-database-synchronization)
 
 ## Spark Connector
 You can use the JDBC Source and Doris Sink of the Spark Connector to complete the data write.
@@ -153,31 +153,7 @@ val jdbcDF = spark.read
   .option("password", "")
   .save()
 ```
-For more details, refer to [JDBC To Other Databases](https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html)[Spark Doris Connector](../../../ecosystem/spark-doris-connector.md#batch-write)
-
-## Streaming Job
-
-Use Streaming Job to synchronize data from MySQL or Postgres.
-
-```sql
-CREATE JOB multi_table_sync
-ON STREAMING
-FROM MYSQL (
-    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
-    "driver_url" = "mysql-connector-j-8.0.31.jar",
-    "driver_class" = "com.mysql.cj.jdbc.Driver",
-    "user" = "root",
-    "password" = "123456",
-    "database" = "test",
-    "include_tables" = "user_info,order_info",
-    "offset" = "initial"
-)
-TO DATABASE target_test_db (
-    "table.create.properties.replication_num" = "1"
-)
-```
-
-For more details, refer to: [Postgres/MySQL Continuous Load](../streaming-job/streaming-job-multi-table.md)
+For more details, refer to [JDBC To Other Databases](https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html)[Spark Doris Connector](../../../ecosystem/spark-doris-connector/spark-doris-connector.md#batch-write)
 
 ## DataX / Seatunnel / CloudCanal and other third-party tools.
````

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@

---
{
    "title": "MySQL",
    "language": "en",
    "description": "Doris provides multiple ways to load data from MySQL, including ad-hoc loading via JDBC Catalog, continuous full + incremental synchronization via Streaming Job, and CDC sync via Flink Doris Connector."
}
---

Doris provides the following ways to load data from MySQL:

- **Loading MySQL data via JDBC Catalog**

  Doris uses JDBC Catalog to map MySQL as an external catalog, allowing direct SQL queries against MySQL data. Combined with `INSERT INTO` or `CREATE TABLE AS SELECT`, this is suitable for one-time migration or periodic batch loading.

- **Continuously syncing MySQL data via Streaming Job**

  Doris uses Streaming Job to continuously sync full and incremental data from MySQL to Doris. By integrating [Flink CDC](https://github.com/apache/flink-cdc) reading capability, Doris keeps the job running, reads the Binlog from MySQL, and writes it to Doris tables with exactly-once semantics. Two modes are supported: table-level sync and database-level sync. Available since Doris 4.1.0.

- **Loading MySQL data via Flink Doris Connector**

  Use Flink Doris Connector together with Flink MySQL CDC for real-time synchronization. This is suitable for scenarios that require additional Flink stream processing logic. The connector also provides a one-click full-database synchronization tool. For details, see [Flink Doris Connector](../../../ecosystem/flink-doris-connector/flink-doris-connector.md).

- **Loading MySQL data via third-party tools**

  Data integration tools such as [DataX](../../../ecosystem/datax), [SeaTunnel](../../../ecosystem/seatunnel), and [CloudCanal](../../../ecosystem/cloudcanal) also support syncing data from MySQL to Doris.

In most cases, you can use JDBC Catalog directly for one-time data migration. When continuous full + incremental synchronization is required, Streaming Job is recommended.

## Loading MySQL data via JDBC Catalog

Use JDBC Catalog to map MySQL as an external catalog, then use `INSERT INTO` or `CREATE TABLE AS SELECT` to load data. For detailed syntax, see [JDBC MySQL Catalog](../../../lakehouse/catalogs/jdbc-mysql-catalog.md).

### Step 1: Prepare data in MySQL

```sql
CREATE TABLE test.students (
    id INT PRIMARY KEY,
    name VARCHAR(64),
    age INT
);

INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30);
```

### Step 2: Create a Catalog in Doris

```sql
CREATE CATALOG mysql_catalog PROPERTIES (
    "type" = "jdbc",
    "user" = "root",
    "password" = "123456",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/test",
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
```

### Step 3: Create the target table in Doris

```sql
CREATE DATABASE IF NOT EXISTS doris_db;

CREATE TABLE doris_db.students (
    id INT,
    name VARCHAR(64),
    age INT
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```

### Step 4: Load data with INSERT INTO

```sql
INSERT INTO doris_db.students
SELECT id, name, age FROM mysql_catalog.test.students;
```

If the target table does not exist yet, you can also use `CREATE TABLE AS SELECT` to create the table and load data in one step:

```sql
CREATE TABLE doris_db.students
PROPERTIES ("replication_num" = "1")
AS
SELECT * FROM mysql_catalog.test.students;
```

### Step 5: Verify loaded data

```sql
SELECT * FROM doris_db.students;
+----+-------+------+
| id | name  | age  |
+----+-------+------+
|  1 | Emily |   25 |
|  2 | Bob   |   30 |
+----+-------+------+
```

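Step 5's verification boils down to checking that the target holds the same rows as the source. A minimal sketch of that check in Python, using two in-memory sqlite3 connections as stand-ins for the MySQL source and the Doris target (a real check would open DB-API connections to each; the table and rows mirror the `students` example above):

```python
import sqlite3

def rowcount(conn, table):
    # COUNT(*) is the cheapest cross-database consistency check;
    # follow up with per-column checksums if counts match but doubts remain.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# Stand-ins for the MySQL source and the Doris target connections.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )
    conn.executemany(
        "INSERT INTO students VALUES (?, ?, ?)",
        [(1, "Emily", 25), (2, "Bob", 30)],
    )

assert rowcount(src, "students") == rowcount(dst, "students") == 2
```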
## Continuously syncing MySQL data via Streaming Job

Streaming Job continuously reads the MySQL Binlog via Flink CDC and writes it to Doris. Two modes are supported:

- [MySQL Database-Level Sync](../streaming-job/continuous-load-mysql-database.md): syncs at the database level (use `include_tables` to sync one, several, or all tables). Doris automatically creates downstream tables on first sync. Provides at-least-once semantics.
- [MySQL Table-Level Sync](../streaming-job/continuous-load-mysql-table.md): syncs at the table level. The target table must be pre-created in Doris. Supports flexible column mapping, data transformation, and exactly-once semantics.

### Limitations

1. Only primary key tables (Unique Key) are supported.
2. The Load privilege is required. Database-level sync also needs the Create privilege when auto-creating downstream tables on the first run.
3. Available since Doris 4.1.0.

### Prerequisites

Before submitting a Streaming Job, Binlog must be enabled on the MySQL side and the user must be granted the corresponding REPLICATION privileges. For environment-specific setup steps, see:

- [Amazon RDS MySQL Setup Guide](../streaming-job/prerequisites/amazon-rds-mysql.md)
- [Amazon Aurora MySQL Setup Guide](../streaming-job/prerequisites/amazon-aurora-mysql.md)
- [Continuous Load Overview](../streaming-job/continuous-load-overview.md) for the notes and required permissions of each mode

### Operation Example: Database-Level Sync

Database-level sync uses the `FROM MYSQL ... TO DATABASE ...` syntax. The target is a Doris database, and the downstream tables are created automatically on first sync.

#### Step 1: Prepare data in MySQL

```sql
CREATE TABLE test.students (
    id INT PRIMARY KEY,
    name VARCHAR(64),
    age INT
);

INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30);
```

#### Step 2: Create the target database in Doris

Database-level sync **does not require pre-creating tables**, but the target database that hosts them must exist:

```sql
CREATE DATABASE IF NOT EXISTS doris_db;
```

#### Step 3: Create a Streaming Job

The example below uses `include_tables` to sync only the `students` table (multiple tables can be comma-separated; leave it empty to sync the whole database):

```sql
CREATE JOB mysql_db_sync
ON STREAMING
FROM MYSQL (
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "user" = "root",
    "password" = "123456",
    "database" = "test",
    "include_tables" = "students",
    "offset" = "initial"
)
TO DATABASE doris_db (
    "table.create.properties.replication_num" = "1" -- set to 1 in single-BE deployments
);
```

#### Step 4: Check job status

```sql
SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
```

#### Step 5: Inspect auto-created Doris tables and loaded data

```sql
SHOW TABLES FROM doris_db;
SELECT * FROM doris_db.students;
```

For more common operations and the full parameter reference, see [MySQL Database-Level Sync](../streaming-job/continuous-load-mysql-database.md).

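As described above, `include_tables` takes a comma-separated table list, and an empty value means the whole database is synced. That filtering rule can be sketched in Python (a hypothetical helper for illustration, not Doris internals):

```python
def match_tables(all_tables, include_tables):
    """Select which source tables a database-level sync job covers."""
    # An empty include_tables means sync every table in the database.
    if not include_tables:
        return list(all_tables)
    wanted = {t.strip() for t in include_tables.split(",")}
    return [t for t in all_tables if t in wanted]

tables = ["students", "teachers", "courses"]
print(match_tables(tables, "students"))            # just the example table
print(match_tables(tables, "students, teachers"))  # comma-separated list
print(match_tables(tables, ""))                    # whole database
```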
### Operation Example: Table-Level Sync

#### Step 1: Prepare data in MySQL

```sql
CREATE TABLE test.students (
    id INT PRIMARY KEY,
    name VARCHAR(64),
    age INT
);

INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30);
```

#### Step 2: Create the target table in Doris

Table-level sync requires the target table to exist beforehand:

```sql
CREATE DATABASE IF NOT EXISTS doris_db;

CREATE TABLE doris_db.students (
    id INT,
    name VARCHAR(64),
    age INT
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```

#### Step 3: Create a Streaming Job

Use [CREATE STREAMING JOB](../../../sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md) with the `INSERT INTO ... SELECT * FROM cdc_stream(...)` syntax:

```sql
CREATE JOB mysql_students_sync
ON STREAMING
DO
INSERT INTO doris_db.students
SELECT * FROM cdc_stream(
    "type" = "mysql",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "user" = "root",
    "password" = "123456",
    "database" = "test",
    "table" = "students",
    "offset" = "initial"
);
```

#### Step 4: Check job status

```sql
SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
```

#### Step 5: Verify loaded data

```sql
SELECT * FROM doris_db.students;
```

For more common operations (pause, resume, delete, check task, etc.) and the full parameter reference, see [MySQL Table-Level Sync](../streaming-job/continuous-load-mysql-table.md).
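The Unique Key requirement in the Limitations section is what makes replayed CDC events safe: writes to a Unique Key table are upserts keyed by the primary key, so applying the same change stream twice converges to the same state. A toy Python model of that idempotence (a dict standing in for a Unique Key table; this illustrates the semantics only, not Doris internals):

```python
def apply_event(table, event):
    """Apply one CDC event to a dict keyed like a Doris Unique Key table."""
    key, op = event["id"], event["op"]
    if op in ("insert", "update"):
        table[key] = event["row"]   # upsert: the newest row version wins
    elif op == "delete":
        table.pop(key, None)        # deleting an absent key is a no-op
    return table

events = [
    {"op": "insert", "id": 1, "row": ("Emily", 25)},
    {"op": "update", "id": 1, "row": ("Emily", 26)},
]

once, twice = {}, {}
for ev in events:
    apply_event(once, ev)
for ev in events + events:   # replaying the stream leaves the table unchanged
    apply_event(twice, ev)
assert once == twice == {1: ("Emily", 26)}
```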
