Merged
@@ -1,4 +1,4 @@
---

{
"title": "Migrating Data from Other OLTP",
"language": "en",
@@ -34,7 +34,7 @@
SELECT * FROM iceberg_catalog.iceberg_db.table1;
```

For more details, refer to [Catalog Data Load](../../../lakehouse/catalog-overview.md#data-import).
For more details, refer to [Catalog Data Load](../../../lakehouse/catalog-overview.md#data-ingestion).

## Flink Doris Connector

@@ -71,7 +71,7 @@

INSERT INTO student_sink SELECT * FROM student_source;
```
For more details, refer to [Flink JDBC](https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/jdbc/#%e5%a6%82%e4%bd%95%e5%88%9b%e5%bb%ba-jdbc-%e8%a1%a8).
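The diff collapses the table definitions that the `INSERT` statement above reads from and writes to. A minimal Flink SQL sketch of what those definitions could look like (connection options, addresses, and column names are illustrative assumptions, not the PR's actual code):

```sql
-- Hypothetical JDBC source table backed by MySQL (all WITH options are illustrative)
CREATE TABLE student_source (
    id INT,
    name STRING,
    age INT,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://127.0.0.1:3306/test',
    'table-name' = 'students',
    'username' = 'root',
    'password' = '123456'
);

-- Hypothetical Doris sink table using the Flink Doris Connector
CREATE TABLE student_sink (
    id INT,
    name STRING,
    age INT
) WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'doris_db.students',
    'username' = 'root',
    'password' = '123456',
    'sink.label-prefix' = 'label'
);
```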

- Real-time synchronization can be achieved by using Flink CDC to read both full and incremental data. For example, using Flink SQL:
```sql
@@ -133,7 +133,7 @@
--sink-conf sink.label-prefix=label \
--table-conf replication_num=1
```
For more details, refer to [Full Database Synchronization](../../../ecosystem/flink-doris-connector.md#full-database-synchronization).
For more details, refer to [Full Database Synchronization](../../../ecosystem/flink-doris-connector/flink-doris-connector.md#full-database-synchronization).

## Spark Connector
You can use the JDBC Source and the Doris Sink of the Spark Connector to write data into Doris.
@@ -153,31 +153,7 @@
.option("password", "")
.save()
```
For more details, refer to [JDBC To Other Databases](https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html), [Spark Doris Connector](../../../ecosystem/spark-doris-connector.md#batch-write).

## Streaming Job

Use Streaming Job to synchronize data from MySQL or Postgres.

```sql
CREATE JOB multi_table_sync
ON STREAMING
FROM MYSQL (
"jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
"driver_url" = "mysql-connector-j-8.0.31.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver",
"user" = "root",
"password" = "123456",
"database" = "test",
"include_tables" = "user_info,order_info",
"offset" = "initial"
)
TO DATABASE target_test_db (
"table.create.properties.replication_num" = "1"
)
```

For more details, refer to [Postgres/MySQL Continuous Load](../streaming-job/streaming-job-multi-table.md).
For more details, refer to [JDBC To Other Databases](https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html), [Spark Doris Connector](../../../ecosystem/spark-doris-connector/spark-doris-connector.md#batch-write).

## DataX / SeaTunnel / CloudCanal and other third-party tools

248 changes: 248 additions & 0 deletions docs/data-operate/import/data-source/mysql.md
@@ -0,0 +1,248 @@
---

{
"title": "MySQL",
"language": "en",
"description": "Load MySQL data into Doris via JDBC Catalog (batch loading), Streaming Job (continuous full + incremental sync), or Flink Doris Connector (CDC sync)."
}
---

Doris provides the following ways to load data from MySQL:

- **Loading MySQL data via JDBC Catalog**

Doris uses JDBC Catalog to map MySQL as an external catalog, allowing direct SQL queries against MySQL data. Combined with `INSERT INTO` or `CREATE TABLE AS SELECT`, this is suitable for one-time migration or periodic batch loading.

- **Continuously syncing MySQL data via Streaming Job**

Doris uses Streaming Job to continuously sync full and incremental data from MySQL to Doris. By integrating the [Flink CDC](https://github.com/apache/flink-cdc) reading capability, Doris keeps the job running continuously, reads the Binlog from MySQL, and writes it to Doris tables. Two modes are supported: table-level sync (exactly-once semantics) and database-level sync (at-least-once semantics). Available since Doris 4.1.0.

- **Loading MySQL data via Flink Doris Connector**

Use Flink Doris Connector together with Flink MySQL CDC for real-time synchronization. This is suitable for scenarios that require additional Flink stream processing logic. The connector also provides a one-click full-database synchronization tool. For details, see [Flink Doris Connector](../../../ecosystem/flink-doris-connector/flink-doris-connector.md).

- **Loading MySQL data via third-party tools**

Data integration tools such as [DataX](../../../ecosystem/datax), [SeaTunnel](../../../ecosystem/seatunnel), and [CloudCanal](../../../ecosystem/cloudcanal) also support syncing data from MySQL to Doris.

In most cases, you can use JDBC Catalog directly for one-time data migration. When continuous full + incremental synchronization is required, Streaming Job is recommended.

## Loading MySQL data via JDBC Catalog

Use JDBC Catalog to map MySQL as an external catalog, then use `INSERT INTO` or `CREATE TABLE AS SELECT` to load data. For detailed syntax, see [JDBC MySQL Catalog](../../../lakehouse/catalogs/jdbc-mysql-catalog.md).

### Step 1: Prepare data in MySQL

```sql
CREATE TABLE test.students (
id INT PRIMARY KEY,
name VARCHAR(64),
age INT
);

INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30);
```

### Step 2: Create a Catalog in Doris

```sql
CREATE CATALOG mysql_catalog PROPERTIES (
"type" = "jdbc",
"user" = "root",
"password" = "123456",
"jdbc_url" = "jdbc:mysql://127.0.0.1:3306/test",
"driver_url" = "mysql-connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver"
);
```

### Step 3: Create the target table in Doris

```sql
CREATE DATABASE IF NOT EXISTS doris_db;

CREATE TABLE doris_db.students (
id INT,
name VARCHAR(64),
age INT
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```

### Step 4: Load data with INSERT INTO

```sql
INSERT INTO doris_db.students
SELECT id, name, age FROM mysql_catalog.test.students;
```

If the target table does not exist yet, you can also use `CREATE TABLE AS SELECT` to create the table and load data in one step:

```sql
CREATE TABLE doris_db.students
PROPERTIES ("replication_num" = "1")
AS
SELECT * FROM mysql_catalog.test.students;
```

### Step 5: Verify loaded data

```sql
SELECT * FROM doris_db.students;
+----+-------+------+
| id | name | age |
+----+-------+------+
| 1 | Emily | 25 |
| 2 | Bob | 30 |
+----+-------+------+
```
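Beyond one-time migration, the same catalog can also serve the periodic batch loading mentioned earlier. A hedged sketch of an incremental batch run, assuming the MySQL source table carried an `update_time` column (the example table above does not; the column and the timestamp literal are illustrative):

```sql
-- Hypothetical periodic batch: reload only rows changed since the last run.
-- Assumes an update_time column on the source table, which the example above omits.
INSERT INTO doris_db.students
SELECT id, name, age
FROM mysql_catalog.test.students
WHERE update_time >= '2025-01-01 00:00:00';
```

In practice the cutoff timestamp would be recorded after each run and passed into the next one by the scheduling tool.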

## Continuously syncing MySQL data via Streaming Job

Streaming Job continuously reads MySQL Binlog via Flink CDC and writes it to Doris. Two modes are supported:

- [MySQL Database-Level Sync](../streaming-job/continuous-load-mysql-database.md): sync at the database level (use `include_tables` to sync one, several, or all tables). Doris automatically creates downstream tables on first sync. Provides at-least-once semantics.
- [MySQL Table-Level Sync](../streaming-job/continuous-load-mysql-table.md): sync at the table level. The target table must be pre-created in Doris. Supports flexible column mapping, data transformation, and exactly-once semantics.

### Limitations

1. Only primary key tables (Unique Key) are supported.
2. Load privilege is required. Database-level sync also needs Create privilege when auto-creating downstream tables on first run.
3. Available since Doris 4.1.0.

### Prerequisites

Before submitting a Streaming Job, Binlog must be enabled on the MySQL side and the user must be granted the corresponding REPLICATION privileges. For environment-specific setup steps, see:

- [Amazon RDS MySQL Setup Guide](../streaming-job/prerequisites/amazon-rds-mysql.md)
- [Amazon Aurora MySQL Setup Guide](../streaming-job/prerequisites/amazon-aurora-mysql.md)
- See [Continuous Load Overview](../streaming-job/continuous-load-overview.md) for notes and required permissions of each mode.
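As a rough sketch of the MySQL-side checks and grants these guides cover (the user name, host pattern, and password are illustrative; confirm the exact requirements in the linked setup guides):

```sql
-- Verify that binlog is enabled and uses ROW format (required for CDC reads)
SHOW VARIABLES LIKE 'log_bin';        -- expect ON
SHOW VARIABLES LIKE 'binlog_format';  -- expect ROW

-- Illustrative sync user with the replication privileges Flink CDC needs
CREATE USER 'doris_sync'@'%' IDENTIFIED BY 'your_password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'doris_sync'@'%';
FLUSH PRIVILEGES;
```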

### Operation Example: Database-Level Sync

Database-level sync uses the `FROM MYSQL ... TO DATABASE ...` syntax. The target is a Doris database, and the downstream tables are automatically created on first sync.

#### Step 1: Prepare data in MySQL

```sql
CREATE TABLE test.students (
id INT PRIMARY KEY,
name VARCHAR(64),
age INT
);

INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30);
```

#### Step 2: Create the target database in Doris

Database-level sync **does not require pre-creating tables**, but the target database that hosts them must exist:

```sql
CREATE DATABASE IF NOT EXISTS doris_db;
```

#### Step 3: Create a Streaming Job

The example below uses `include_tables` to sync only the `students` table (multiple tables can be comma-separated; leave empty to sync the whole database):

```sql
CREATE JOB mysql_db_sync
ON STREAMING
FROM MYSQL (
"jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
"driver_url" = "mysql-connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver",
"user" = "root",
"password" = "123456",
"database" = "test",
"include_tables" = "students",
"offset" = "initial"
)
TO DATABASE doris_db (
"table.create.properties.replication_num" = "1" -- set to 1 in single-BE deployments
);
```

#### Step 4: Check job status

```sql
SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
```

#### Step 5: Inspect auto-created Doris tables and loaded data

```sql
SHOW TABLES FROM doris_db;
SELECT * FROM doris_db.students;
```

For more common operations and full parameter reference, see [MySQL Database-Level Sync](../streaming-job/continuous-load-mysql-database.md).

### Operation Example: Table-Level Sync

#### Step 1: Prepare data in MySQL

```sql
CREATE TABLE test.students (
id INT PRIMARY KEY,
name VARCHAR(64),
age INT
);

INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30);
```

#### Step 2: Create the target table in Doris

Table-level sync requires the target table to exist beforehand:

```sql
CREATE DATABASE IF NOT EXISTS doris_db;

CREATE TABLE doris_db.students (
id INT,
name VARCHAR(64),
age INT
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```

#### Step 3: Create a Streaming Job

Use [CREATE STREAMING JOB](../../../sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md) with the `INSERT INTO ... SELECT * FROM cdc_stream(...)` syntax:

```sql
CREATE JOB mysql_students_sync
ON STREAMING
DO
INSERT INTO doris_db.students
SELECT * FROM cdc_stream(
"type" = "mysql",
"jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
"driver_url" = "mysql-connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver",
"user" = "root",
"password" = "123456",
"database" = "test",
"table" = "students",
"offset" = "initial"
);
```

#### Step 4: Check job status

```sql
SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
```

#### Step 5: Verify loaded data

```sql
SELECT * FROM doris_db.students;
```

For more common operations (pausing, resuming, deleting, checking tasks, etc.) and the full parameter reference, see [MySQL Table-Level Sync](../streaming-job/continuous-load-mysql-table.md).
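The lifecycle operations mentioned above follow the Doris job statement syntax; a sketch using the example job name (verify the exact statement forms against the linked reference for your Doris version):

```sql
-- Pause the streaming job (no new tasks are scheduled while paused)
PAUSE JOB WHERE jobName = 'mysql_students_sync';

-- Resume the job from where it left off
RESUME JOB WHERE jobName = 'mysql_students_sync';

-- Delete the job entirely
DROP JOB WHERE jobName = 'mysql_students_sync';
```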