
Commit dce4176

Migration docs restructure (#21245)
* Migration docs restructure
* worked out the IA for migration variables, basically
* reverting molt-fetch to main
* moved the splitting up of Fetch into a separate PR, fixed links for this PR
* moved the splitting up of Fetch into a separate PR, fixed links for this PR
* more progress on considerations: granularity, rollback, replication
* added validation strategy consideration
* removed dead links
* added data transformation strategy
* did main splitting of Fetch docs
* merged in recent changes to replicator docs
* Update pr-reviews.yml to allow deployment of draft
* Update pr-reviews.yml, returning to previous
* separated Replicator to match Fetch, fixed links
* Made Fetch docs much more compact and clean, improved linking, added draft diagram, improved readability
* added small intro paragraphs to pages
* restarting build
* WIP on splitting pages by source db type
* classic bulk load split up by source db type
* fixed two includes to pass linkcheck
* added phased bulk load per source type
* added delta migration per source type
* fixed sidebar, updated diagrams, added start of phased delta migration
* added phased delta with failback
* removed original Migration Flows pages and all references/links to them
* added new diagrams, moved type mapping, removed duplicate info in Configure Molt Fetch, updated sidebar
* removed molt-setup.md
* rebuilding deploy preview
* line edits on the migration considerations section
* fixed broken link
* removed draft fetch flow image
* moved crdb-to-crdb callout
* round one of changes based on Ryan Luu, Tuan, and Steven's feedback
* added rollback details in migration walkthrough descriptions
* updated metrics
* made changes from Ryan Luu's feedback
* improved limitation visibility
* added limitation and fixed links
* connect validation advice to recent Verify change
* connect validation advice to recent Verify change
* updated broken link
* updated gemfile.lock after build
* removed partitioned tables links from molt fetch
* merged in changes from recent molt doc updates
* added new pages to v26.2 sidebar
* incorporated Ryan's feedback
* added redirects, modified sidebar
* Revert Gemfile.lock changes to match main
* fixed some typos
1 parent 5957868 commit dce4176

92 files changed

Lines changed: 11074 additions & 2990 deletions


src/current/_data/redirects.yml

Lines changed: 11 additions & 0 deletions
~~~ diff
@@ -1162,6 +1162,17 @@
 - destination: cockroachcloud/byoc-azure-deployment.md
   sources: ['cockroachcloud/byoc-deployment.md']
 
+- destination: molt/migration-approach-classic-bulk-load.md
+  sources: ['molt/migrate-bulk-load.md']
+
+- destination: molt/migration-approach-delta.md
+  sources: ['molt/migrate-load-replicate.md']
+
+- destination: molt/molt-replicator.md#resume-after-an-interruption
+  sources: ['molt/migrate-resume-replication.md']
+
+- destination: molt/molt-replicator.md#failback-replication
+  sources: ['molt/migrate-failback.md']
 - destination: multiregion-overview.md
   sources: ['demo-low-latency-multi-region-deployment.md', 'migrate-to-multiregion-sql.md']
   versions: ['v26.1', 'v26.2']
~~~
Lines changed: 254 additions & 0 deletions
@@ -0,0 +1,254 @@
A [*Classic Bulk Load Migration*]({% link molt/migration-approach-classic-bulk-load.md %}) is the simplest way of [migrating data to CockroachDB]({% link molt/migration-overview.md %}). In this approach, you stop application traffic to the source database and migrate data to the target cluster using [MOLT Fetch]({% link molt/molt-fetch.md %}) during a **significant downtime window**. Application traffic is then cut over to the target after schema finalization and data verification.

- All source data is migrated to the target [at once]({% link molt/migration-considerations-granularity.md %}).

- This approach does not use [continuous replication]({% link molt/migration-considerations-replication.md %}).

- [Rollback]({% link molt/migration-considerations-rollback.md %}) is manual but usually simple, because the source database is preserved and write traffic moves to the target all at once. If you roll back before the target has received any writes that are not present on the source database, you only need to redirect application traffic back to the source. If you roll back after the target has received writes that are not present on the source database, you must manually apply those writes to the source.

This approach is best for small databases (<100 GB), internal tools, dev/staging environments, and production environments that can tolerate business disruption. It is simple, guarantees full data consistency, and is easy to execute with limited resources, but it can only be performed if your system can tolerate significant downtime.

This page describes an example scenario. While the commands provided can be copied and pasted, they may need to be altered or reconsidered to suit the needs of your specific environment.

<div style="text-align: center;">
<img src="{{ 'images/molt/molt_classic_bulk_load_flow.svg' | relative_url }}" alt="Classic Bulk Load Migration flow" style="max-width:100%" />
</div>

## Example scenario

You have a small (50 GB) database that provides the data store for a web application. You want to migrate the entirety of this database to a new CockroachDB cluster. You schedule a maintenance window for Saturday from 2 AM to 6 AM and announce it to your users several weeks in advance.

The application runs on a Kubernetes cluster as a deployment named `app` with three replicas.

**Estimated system downtime:** 4 hours.

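The 4-hour window has to cover more than the load itself: verification, schema finalization, and cutover also happen while the application is down. One way to sanity-check the plan is to divide the data size by the load rate you measured during a dry run and add a buffer for the remaining steps. The following is a minimal Bash sketch; the function names, rate, and buffer are hypothetical placeholders, not MOLT benchmarks.

~~~ shell
# Back-of-envelope check: does the estimated load time fit the maintenance window?
# Assumption: the throughput argument is a rate you measured in your own dry run;
# nothing here is a MOLT benchmark.

# Usage: estimate_load_hours DATA_GB THROUGHPUT_GB_PER_HOUR
estimate_load_hours() {
  local data_gb=$1 throughput=$2
  awk -v d="$data_gb" -v t="$throughput" 'BEGIN { printf "%.2f\n", d / t }'
}

# Usage: fits_window DATA_GB THROUGHPUT_GB_PER_HOUR WINDOW_HOURS BUFFER_HOURS
# Succeeds (exit 0) if load time plus the buffer for verification, schema
# finalization, and cutover fits inside the window; fails (exit 1) otherwise.
fits_window() {
  local data_gb=$1 throughput=$2 window=$3 buffer=$4
  awk -v d="$data_gb" -v t="$throughput" -v w="$window" -v b="$buffer" \
    'BEGIN { exit !((d / t) + b <= w) }'
}
~~~

For example, assuming a hypothetical dry-run rate of 25 GB per hour, `estimate_load_hours 50 25` prints `2.00`, and `fits_window 50 25 4 1` succeeds because 2 hours of load plus a 1-hour buffer fits in the 4-hour window.
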
## Before the migration

- Install the [MOLT (Migrate Off Legacy Technology)]({% link molt/molt-fetch-installation.md %}#installation) tools.
- Review the [MOLT Fetch best practices]({% link molt/molt-fetch-best-practices.md %}).
- [Develop a migration plan]({% link molt/migration-strategy.md %}#develop-a-migration-plan) and [prepare for the migration]({% link molt/migration-strategy.md %}#prepare-for-migration).
- **Recommended:** Perform a dry run of this full set of instructions in a development environment that closely resembles your production environment. This can help you get a realistic sense of the time and complexity the migration requires.
- Announce the maintenance window to your users.
- Understand the prerequisites and limitations of the MOLT tools:

<section class="filter-content" markdown="1" data-scope="oracle">
{% include molt/oracle-migration-prerequisites.md %}
</section>

{% include molt/molt-limitations.md %}

## Step 1: Prepare the source database

In this step, you will:

- [Create a dedicated migration user on your source database](#create-migration-user-on-source-database).

{% include molt/migration-prepare-database.md %}

## Step 2: Prepare the target database

In this step, you will:

- [Provision and run a new CockroachDB cluster](#provision-a-cockroachdb-cluster).
- [Define the tables on the target cluster](#define-the-target-tables) to match those on the source.
- [Create a SQL user on the target cluster](#create-the-sql-user) with the necessary write permissions.

### Provision a CockroachDB cluster

Use one of the following options to create and run a new CockroachDB cluster. This is your migration **target**.

#### Option 1: Create a secure cluster locally

If you have the CockroachDB binary installed locally, you can manually deploy a multi-node, self-hosted CockroachDB cluster on your local machine.

Learn how to [deploy a CockroachDB cluster locally]({% link {{ site.versions["stable"] }}/secure-a-cluster.md %}).

#### Option 2: Create a CockroachDB Self-Hosted cluster on AWS

You can manually deploy a multi-node, self-hosted CockroachDB cluster on Amazon EC2, using AWS's managed load-balancing service to distribute client traffic.

Learn how to [deploy a CockroachDB cluster on AWS]({% link {{ site.versions["stable"] }}/deploy-cockroachdb-on-aws.md %}).

#### Option 3: Create a CockroachDB Cloud cluster

CockroachDB Cloud is a fully managed service run by Cockroach Labs, which simplifies the deployment and management of CockroachDB.

[Sign up for a CockroachDB Cloud account](https://cockroachlabs.cloud) and [create a cluster]({% link cockroachcloud/create-your-cluster.md %}) using [trial credits]({% link cockroachcloud/free-trial.md %}).

### Define the target tables

{% include molt/migration-prepare-schema.md %}

### Create the SQL user

{% include molt/migration-create-sql-user.md %}

## Step 3: Stop application traffic

With both the source and target databases prepared for the data load, it's time to stop application traffic to the source. At the start of the maintenance window, scale the application's Kubernetes deployment down to zero replicas.

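Step 7 scales the deployment back up to its original number of replicas. If you are not certain what that number is, you might record it before you run the scale-down command. This minimal Bash sketch assumes the deployment is named `app`, as in this example; the function name and output file path are hypothetical.

~~~ shell
# Record the deployment's current replica count so the same number can be
# restored at cutover.
# Assumptions: the deployment name and output file are supplied by you;
# "app" matches this page's example.
save_replica_count() {
  local deployment=$1 out_file=$2
  local count
  count=$(kubectl get deployment "$deployment" -o jsonpath='{.spec.replicas}') || return 1
  # Refuse to write an empty or non-numeric value.
  case $count in
    ''|*[!0-9]*) echo "unexpected replica count: '$count'" >&2; return 1 ;;
  esac
  printf '%s\n' "$count" > "$out_file"
}

# Example (hypothetical file path):
#   save_replica_count app ./app-replicas.txt
~~~
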
{% include_cached copy-clipboard.html %}
~~~shell
kubectl scale deployment app --replicas=0
~~~

{{ site.data.alerts.callout_danger }}
Application downtime begins now.

It is strongly recommended that you perform a dry run of this migration in a test environment. This will allow you to practice using the MOLT tools in real time, and it will give you an accurate sense of how long application downtime might last.
{{ site.data.alerts.end }}

## Step 4: Load data into CockroachDB

In this step, you will:

- [Configure MOLT Fetch with the flags needed for your migration](#configure-molt-fetch).
- [Run MOLT Fetch](#run-molt-fetch).
- [Understand how to continue a load after an interruption](#continue-molt-fetch-after-an-interruption).

### Configure MOLT Fetch

The [MOLT Fetch documentation]({% link molt/molt-fetch.md %}) includes detailed information about how to [configure MOLT Fetch]({% link molt/molt-fetch.md %}#run-molt-fetch) and how to [monitor MOLT Fetch metrics]({% link molt/molt-fetch-monitoring.md %}).

When you run `molt fetch`, you can configure the following options for data load:

<a id="schema-and-table-filtering"></a>
<a id="source-connection-string"></a>
<a id="table-handling-mode"></a>
<a id="target-connection-string"></a>
<a id="cloud-storage-authentication"></a>
<a id="secure-connections"></a>
<a id="intermediate-file-storage"></a>
<a id="data-load-mode"></a>
<a id="connection-strings"></a>

- [Specify source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases): Specify URL-encoded source and target connection strings.
- [Select data to migrate]({% link molt/molt-fetch.md %}#select-data-to-migrate): Specify schema and table names to migrate.
- [Define intermediate file storage]({% link molt/molt-fetch.md %}#define-intermediate-storage): Export data to cloud storage or a local file server.
- [Define fetch mode]({% link molt/molt-fetch.md %}#define-fetch-mode): Specify whether to run a full data load, or to only export data to or only import data from intermediate storage.
- [Shard tables]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export): Divide larger tables into multiple shards during data export.
- [Data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from): Choose between `IMPORT INTO` and `COPY FROM`.
- [Table handling mode]({% link molt/molt-fetch.md %}#handle-target-tables): Determine how existing target tables are initialized before load.
- [Define data transformations]({% link molt/molt-fetch.md %}#define-transformations): Define any row-level transformations to apply to the data before it reaches the target.
- [Monitor fetch metrics]({% link molt/molt-fetch-monitoring.md %}): Configure metrics collection during initial data load.

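Because connection strings must be URL-encoded, a password that contains characters such as `@`, `/`, or `:` has to be percent-encoded before it goes into `$SOURCE` or `$TARGET`. The following is a minimal Bash sketch that assumes ASCII input; the function name and the user, host, and database names in the example are hypothetical. Any URL-encoding utility you already trust works just as well.

~~~ shell
# Percent-encode a string for use inside a connection URL.
# RFC 3986 unreserved characters (letters, digits, - . _ ~) are kept as-is;
# every other character becomes %XX (uppercase hex).
# Assumption: the input is ASCII; multibyte characters would need byte-wise handling.
urlencode() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [[:alnum:].~_-]) out+=$c ;;
      # A leading quote makes printf use the character's numeric code.
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Example (hypothetical credentials and host):
#   export SOURCE="postgres://migration_user:$(urlencode 'p@ss/w:rd')@source-host:5432/source_db"
~~~

With this sketch, `urlencode 'p@ss/w:rd'` prints `p%40ss%2Fw%3Ard`.
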
Read through the documentation to understand how to configure your `molt fetch` command and its flags. Follow [best practices]({% link molt/molt-fetch-best-practices.md %}), especially those related to security.

At minimum, the `molt fetch` command should include the source, target, data path, and [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) flags:

{% include_cached copy-clipboard.html %}
~~~ shell
molt fetch \
--source $SOURCE \
--target $TARGET \
--bucket-path 's3://bucket/path' \
--ignore-replication-check
~~~

However, depending on the needs of your migration, you may set many more flags and may need to prepare accompanying JSON files.

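A bulk load can run for a long time, so it can be worth failing fast if a connection string variable is missing or empty. This minimal Bash sketch assumes the variable names used on this page (`SOURCE`, `TARGET`, and `SOURCE_CDB` for Oracle Multitenant sources); the function name is hypothetical.

~~~ shell
# Preflight check: report every named environment variable that is unset or
# empty, and fail if any are missing. Uses Bash indirect expansion (${!name}).
require_env() {
  local missing=0 name
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Examples:
#   require_env SOURCE TARGET || exit 1              # PostgreSQL or MySQL source
#   require_env SOURCE SOURCE_CDB TARGET || exit 1   # Oracle Multitenant source
~~~
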
### Run MOLT Fetch

Perform the bulk load of the source data.

1. Run the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data into CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It filters for three specific tables and, for PostgreSQL and Oracle sources, limits the migration to a single schema. The [data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) defaults to `IMPORT INTO`. The `--ignore-replication-check` flag skips replication checkpoint queries, which eliminates the need to configure the source database for logical replication.

<section class="filter-content" markdown="1" data-scope="postgres">
{% include_cached copy-clipboard.html %}
~~~ shell
molt fetch \
--source $SOURCE \
--target $TARGET \
--schema-filter 'migration_schema' \
--table-filter 'employees|payments|orders' \
--bucket-path 's3://migration/data/cockroach' \
--table-handling truncate-if-exists \
--ignore-replication-check
~~~
</section>

<section class="filter-content" markdown="1" data-scope="mysql">
{% include_cached copy-clipboard.html %}
~~~ shell
molt fetch \
--source $SOURCE \
--target $TARGET \
--table-filter 'employees|payments|orders' \
--bucket-path 's3://migration/data/cockroach' \
--table-handling truncate-if-exists \
--ignore-replication-check
~~~
</section>

<section class="filter-content" markdown="1" data-scope="oracle">
The command assumes an Oracle Multitenant (CDB/PDB) source. [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection string.

{% include_cached copy-clipboard.html %}
~~~ shell
molt fetch \
--source $SOURCE \
--source-cdb $SOURCE_CDB \
--target $TARGET \
--schema-filter 'migration_schema' \
--table-filter 'employees|payments|orders' \
--bucket-path 's3://migration/data/cockroach' \
--table-handling truncate-if-exists \
--ignore-replication-check
~~~
</section>

{% include molt/fetch-data-load-output.md %}

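Keeping a copy of the MOLT Fetch output makes it easier to review the load afterward, for example when troubleshooting or deciding how to continue after an interruption, and the elapsed time lets you compare the production run with your dry run. This minimal Bash sketch wraps any command; the function name and log file name are hypothetical.

~~~ shell
# Run a command, copy its stdout and stderr to a log file while still printing
# them to the terminal, report elapsed wall-clock seconds on stderr, and return
# the command's own exit status (not tee's).
run_logged() {
  local log_file=$1 start=$SECONDS status
  shift
  "$@" 2>&1 | tee -a "$log_file"
  status=${PIPESTATUS[0]}
  echo "elapsed: $(( SECONDS - start ))s, exit status: $status" >&2
  return "$status"
}

# Example (hypothetical log file name; flags from the PostgreSQL command above):
#   run_logged fetch.log molt fetch \
#     --source "$SOURCE" \
#     --target "$TARGET" \
#     --schema-filter 'migration_schema' \
#     --table-filter 'employees|payments|orders' \
#     --bucket-path 's3://migration/data/cockroach' \
#     --table-handling truncate-if-exists \
#     --ignore-replication-check
~~~
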
### Continue MOLT Fetch after an interruption

{% include molt/fetch-continue-after-interruption.md %}

## Step 5: Verify the data

In this step, you will use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the source and target data are consistent and that the data load was successful.

### Run MOLT Verify

{% include molt/verify-output.md %}

## Step 6: Finalize the target schema

### Add constraints and indexes

{% include molt/migration-modify-target-schema.md %}

## Step 7: Cut over application traffic

With the target cluster verified and finalized, it's time to resume application traffic.

### Modify application code

In the application back end, update the database connection string so that the application connects to the CockroachDB cluster. For example, in the Kubernetes deployment manifest:

~~~yml
env:
  - name: DATABASE_URL
    value: postgres://root@localhost:26257/defaultdb?sslmode=verify-full
~~~

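If you prefer not to edit the manifest by hand, `kubectl set env` can update the variable on the existing deployment. Because the deployment is still scaled to zero at this point, the new value takes effect when the pods start in the next step. This minimal Bash sketch assumes the deployment is named `app` and reads `DATABASE_URL`, as in this example; the function name is hypothetical. If your manifests are applied declaratively (for example, from version control), update the manifest instead, or the next apply may overwrite this change.

~~~ shell
# Set DATABASE_URL on a deployment without editing its manifest by hand.
# Refuses an empty URL so the application is never pointed at nothing.
set_database_url() {
  local deployment=$1 url=$2
  if [ -z "$url" ]; then
    echo "refusing to set an empty DATABASE_URL" >&2
    return 1
  fi
  kubectl set env "deployment/$deployment" "DATABASE_URL=$url"
}

# Example (TARGET_APP_URL is a hypothetical variable holding your connection string):
#   set_database_url app "$TARGET_APP_URL"
~~~
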
### Resume application traffic

Scale up the Kubernetes deployment to the original number of replicas:

{% include_cached copy-clipboard.html %}
~~~shell
kubectl scale deployment app --replicas=3
~~~

This ends application downtime.

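`kubectl scale` returns once the new replica count is recorded, not when the pods are ready to serve traffic. To confirm that downtime has actually ended, you can wait for the rollout to finish before announcing the end of the maintenance window. This minimal Bash sketch assumes the `app` deployment and three replicas from this example; the function name and the default 5-minute timeout are hypothetical choices.

~~~ shell
# Scale a deployment up and block until its rollout completes (or times out).
# Usage: resume_traffic DEPLOYMENT REPLICAS [TIMEOUT]
resume_traffic() {
  local deployment=$1 replicas=$2 timeout=${3:-5m}
  kubectl scale deployment "$deployment" --replicas="$replicas" || return 1
  # Waits until the desired number of updated replicas are available.
  kubectl rollout status "deployment/$deployment" --timeout="$timeout"
}

# Example:
#   resume_traffic app 3
~~~
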
## Troubleshooting

{% include molt/molt-troubleshooting-fetch.md %}

## See also

- [MOLT Fetch]({% link molt/molt-fetch.md %})
- [MOLT Verify]({% link molt/molt-verify.md %})
- [Migration Overview]({% link molt/migration-overview.md %})
- [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %})
