You can use {sstable-sideloader} to migrate data to {astra-db} from {cass-reg}, {dse}, or {hcd}.
Before you use {sstable-sideloader} for a migration, learn about the {sstable-sideloader} process and prepare your environments for {sstable-sideloader}.
On each node in your origin cluster, use nodetool to create a backup (snapshot) of every keyspace and CQL table that you want to migrate.
-
Due to {sstable-sideloader} limitations related to materialized views, secondary indexes, and encrypted data, you might need to modify the data model on your origin cluster to prepare for the migration. For more information, see Origin cluster requirements.
-
Optional: Before you create snapshots, consider running nodetool cleanup to remove data that no longer belongs to your nodes. This command is particularly useful after you add nodes to a cluster, because it helps ensure that each node contains only the data that it is responsible for under the current cluster configuration and partitioning scheme.
If you run nodetool cleanup before you take a snapshot, the snapshot includes only relevant data, which can reduce its size. Smaller snapshots can lead to shorter overall migration times and lower network transfer costs.
However, take adequate precautions before you run this command, because the cleanup operations can put additional load on your origin cluster.
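One conservative way to limit that extra load is to run cleanup on one node at a time instead of on every node at once. The following is a minimal sketch under that assumption. The cleanup_nodes function, the DRY_RUN switch, and the node names are illustrative and are not part of {sstable-sideloader}.

```bash
# Hypothetical node names; replace with the host names of your origin nodes.
NODES=(dse0 dse1 dse2)

# Run nodetool cleanup on one node at a time so that only one node
# carries the extra cleanup load at any moment.
# Set DRY_RUN=1 to print the commands instead of running them.
cleanup_nodes() {
  local node
  for node in "$@"; do
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "ssh ${node} nodetool cleanup"
    else
      # Stop at the first failure instead of moving on to the next node.
      ssh "${node}" nodetool cleanup || { echo "cleanup failed on ${node}" >&2; return 1; }
    fi
  done
}

# Usage: preview first, then run for real.
# DRY_RUN=1 cleanup_nodes "${NODES[@]}"
# cleanup_nodes "${NODES[@]}"
```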
Use nodetool snapshot to create snapshots for the tables that you want to migrate.
Don’t create snapshots of system tables or tables that you don’t want to migrate. The migration can fail if you attempt to migrate snapshots that don’t have a matching schema in the target database. {sstable-sideloader} ignores system keyspaces.
The structure of the nodetool snapshot command depends on the keyspaces and tables that you want to migrate.
Create a snapshot of all tables in all keyspaces:
nodetool snapshot -t SNAPSHOT_NAME
Replace the following:
-
SNAPSHOT_NAME: A descriptive name for the snapshot. Use the same snapshot name for each node’s snapshot; this makes it easier to programmatically upload the snapshots to the migration directory.
Create a snapshot of all tables in one or more specified keyspaces:
nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME
nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME_1 KEYSPACE_NAME_2
Replace the following:
-
SNAPSHOT_NAME: A descriptive name for the snapshot. Use the same snapshot name for each node’s snapshot; this makes it easier to programmatically upload the snapshots to the migration directory.
-
KEYSPACE_NAME: The name of the keyspace that you want to migrate. To snapshot multiple keyspaces, pass a space-separated list of keyspace names. For example, customer_data product_data purchase_history specifies three keyspaces.
Create a snapshot of one or more specified tables:
nodetool snapshot -kt KEYSPACE_NAME.TABLE_NAME -t SNAPSHOT_NAME
nodetool snapshot -kt KEYSPACE_NAME_1.TABLE_NAME_A,KEYSPACE_NAME_1.TABLE_NAME_B,KEYSPACE_NAME_2.TABLE_NAME_X -t SNAPSHOT_NAME
Replace the following:
-
KEYSPACE_NAME.TABLE_NAME: The name of the table that you want to migrate and the keyspace that it belongs to, separated by a period. For example, product_data.appliances specifies the appliances table in the product_data keyspace. To snapshot multiple tables, pass a comma-separated list of keyspace-table pairs with no spaces. For example, product_data.appliances,purchase_history.nevada,purchase_history.wisconsin specifies the appliances table in the product_data keyspace and the nevada and wisconsin tables in the purchase_history keyspace.
-
SNAPSHOT_NAME: A descriptive name for the snapshot. Use the same snapshot name for each node’s snapshot; this makes it easier to programmatically upload the snapshots to the migration directory.
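The three forms above differ mainly in how their arguments are separated: keyspaces are separate positional arguments, while tables go into a single comma-separated -kt value. As an illustration, the hypothetical build_snapshot_cmd helper below prints the matching command. It assumes that you pass either only keyspace names or only keyspace.table pairs, never a mix.

```bash
# Hypothetical helper: print the nodetool snapshot command for a snapshot
# name plus optional keyspace or keyspace.table arguments.
#   build_snapshot_cmd NAME                -> all keyspaces
#   build_snapshot_cmd NAME ks1 ks2        -> specific keyspaces (space-separated)
#   build_snapshot_cmd NAME ks1.t1 ks2.t2  -> specific tables (one comma-separated -kt list)
build_snapshot_cmd() {
  local name="$1"
  shift
  if (( $# == 0 )); then
    echo "nodetool snapshot -t ${name}"
  elif [[ "$1" == *.* ]]; then
    # Join the table arguments with commas: "$*" uses the first IFS character.
    local IFS=,
    echo "nodetool snapshot -kt $* -t ${name}"
  else
    # Keyspaces stay as separate, space-separated arguments.
    echo "nodetool snapshot -t ${name} $*"
  fi
}

# Example:
# build_snapshot_cmd SNAPSHOT_NAME product_data.appliances purchase_history.nevada
```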
Use nodetool listsnapshots to verify that the snapshots were created:
nodetool listsnapshots
Important
Snapshots have a specific directory structure, such as
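To inspect the snapshot directories that nodetool created on a node, you can search the data directory. This sketch assumes the common layout DATA_DIR/KEYSPACE/TABLE-ID/snapshots/SNAPSHOT_NAME and a data directory such as /var/lib/cassandra/data. Check your node's configuration for the actual path. The list_snapshot_dirs name is illustrative.

```bash
# List every snapshot directory with the given name under a data directory.
# Assumes DATA_DIR/KEYSPACE/TABLE-ID/snapshots/SNAPSHOT_NAME, which puts the
# snapshot directory exactly four levels below DATA_DIR.
list_snapshot_dirs() {
  local data_dir="$1" snapshot_name="$2"
  find "${data_dir}" -mindepth 4 -maxdepth 4 -type d \
    -path "*/snapshots/${snapshot_name}" | sort
}

# Usage (path and name are examples):
# list_snapshot_dirs /var/lib/cassandra/data SNAPSHOT_NAME
```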
If the nodes in your origin cluster are named in a predictable way (for example, dse0, dse1, and dse2), you can use a for loop to simplify snapshot creation.
For example:
- Use a for loop to snapshot all keyspaces
To snapshot all keyspaces on each node, append the nodetool command to your for loop:
for i in 0 1 2; do ssh dse${i} nodetool snapshot -t SNAPSHOT_NAME; done
- Use a for loop to snapshot specific keyspaces
To snapshot one keyspace on each node, append the nodetool command to your for loop:
for i in 0 1 2; do ssh dse${i} nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME; done
To snapshot multiple specific keyspaces on each node, use spaces (not commas) to separate the keyspace names:
for i in 0 1 2; do ssh dse${i} nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME_1 KEYSPACE_NAME_2; done
- Use a for loop to snapshot specific tables
To snapshot one table on each node, append the nodetool command to your for loop:
for i in 0 1 2; do ssh dse${i} nodetool snapshot -kt KEYSPACE_NAME.TABLE_NAME -t SNAPSHOT_NAME; done
To snapshot multiple specific tables on each node, use commas (not spaces) to separate the keyspace-table pairs:
for i in 0 1 2; do ssh dse${i} nodetool snapshot -kt KEYSPACE_NAME_1.TABLE_NAME_A,KEYSPACE_NAME_1.TABLE_NAME_B -t SNAPSHOT_NAME; done
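The for loops above process one node after another. If you prefer to snapshot all nodes in parallel and still know which node failed, you can collect the exit status of each background job with wait. This is a sketch that assumes passwordless ssh to each node. run_on_all_nodes is a hypothetical helper.

```bash
# Hypothetical helper: run one remote command on several nodes in parallel,
# then wait for each node and report the ones whose command failed.
# Requires bash 4 or later for the associative array.
run_on_all_nodes() {
  local cmd="$1"
  shift
  local -A pids=()
  local node
  local failed=()
  for node in "$@"; do
    # Start the remote command in the background and remember its PID.
    ssh "${node}" "${cmd}" &
    pids[$node]=$!
  done
  for node in "$@"; do
    # wait PID returns the exit status of that node's ssh process.
    wait "${pids[$node]}" || failed+=("${node}")
  done
  if (( ${#failed[@]} > 0 )); then
    echo "command failed on: ${failed[*]}" >&2
    return 1
  fi
}

# Example (node names are illustrative):
# run_on_all_nodes "nodetool snapshot -t SNAPSHOT_NAME" dse0 dse1 dse2
```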
You can use the same for loop structure to verify that each snapshot was successfully created:
for i in 0 1 2; do ssh dse${i} nodetool listsnapshots; done
To prepare your target database for the migration, you must record the schema for each table in your origin cluster that you want to migrate, re-create these schemas in your target database, and then set the environment variables required to connect to your database.
Warning
For the migration to succeed, your target database must meet the schema requirements described in this section. Additionally, your snapshots must contain compatible data and directories, as described in Origin cluster requirements and Create snapshots. For example, {astra-db} doesn’t support materialized views, and {sstable-sideloader} cannot migrate encrypted data. However, indexes don’t need to match. You can define indexes in your target database independently from the origin cluster because {sstable-sideloader} ignores Storage Attached Indexes (SAI) defined on the origin cluster. During the migration, {sstable-sideloader} automatically populates any SAI defined in your target database, even if those SAI weren’t present in your origin cluster.
-
Get the following schema properties for each table that you want to migrate:
-
Exact keyspace name.
-
Exact table name.
-
Exact column names, data types, and the order in which they appear in the table creation DDL.
-
Exact primary key definition as defined in your origin cluster, including the partition key, clustering columns, and ascending/descending ordering clauses. You must define partition key columns and clustering columns in the exact order that they are defined on your origin cluster.
To retrieve schema properties, you can run the DESCRIBE KEYSPACE command on your origin cluster:
DESCRIBE KEYSPACE KEYSPACE_NAME;
Replace KEYSPACE_NAME with the name of the keyspace that contains the tables you want to migrate, such as DESCRIBE KEYSPACE smart_home;.
Then, get the schema properties from the result:
CREATE TABLE smart_home.sensor_readings (
    device_id UUID,
    room_id UUID,
    reading_type TEXT,
    reading_value DOUBLE,
    reading_timestamp TIMESTAMP,
    PRIMARY KEY (device_id, room_id, reading_timestamp)
) WITH CLUSTERING ORDER BY (room_id ASC, reading_timestamp DESC);
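To keep a copy of the schema for comparison, you can save the DESCRIBE output to a file with cqlsh -e and then list the tables that it defines. The sketch below assumes unquoted keyspace and table names, and assumes that each table definition starts with CREATE TABLE keyspace.table on its own line, as in the preceding example. list_tables is a hypothetical helper.

```bash
# Save the keyspace schema to a file (cqlsh -e runs one statement and exits):
#   cqlsh ORIGIN_HOST -e "DESCRIBE KEYSPACE smart_home;" > smart_home.cql
#
# Print the fully qualified name of each table in a saved schema file, so you
# can check that every table you plan to migrate exists in the target database.
list_tables() {
  sed -n 's/^CREATE TABLE \([A-Za-z0-9_]*\.[A-Za-z0-9_]*\).*/\1/p' "$1"
}

# Usage:
# list_tables smart_home.cql
```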
-
-
Re-create the schemas in your target database:
-
In the {astra-ui-link} navigation menu, click Databases, and then click the name of your {astra-db} database.
-
Create a keyspace with the exact same name as your origin cluster’s keyspace.
-
In your database’s {cql-console}, create tables with the exact same names and schemas as your origin cluster.
{astra-db} rejects or ignores some table properties, such as compaction strategy. See astra-db-serverless:databases:database-limits.adoc for more information.
-
-
In your terminal, set environment variables for your target database:
export dbID=DATABASE_ID
export token=APPLICATION_TOKEN
Replace the following:
-
DATABASE_ID: The database ID of your target {astra-db} database.
-
APPLICATION_TOKEN: An application token with a role that has the required permissions for {sstable-sideloader}, which are {create-db-permission} and {view-db-permission}. You can use a built-in role, such as the {database-administrator-role} role, or a custom role with the required permissions.
Tip: Later, you will add another environment variable for the migration ID.
The curl commands in this guide assume that you have set environment variables for the token, database ID, and migration ID. If you run the commands without these environment variables, you get error messages like <a href="/v2/databases/migrations/">Moved Permanently</a> and 404 page not found.
Additionally, the curl commands use jq to format the JSON responses. If you don’t have jq installed, remove | jq . from the end of each command.
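To avoid those confusing errors, you can check that the required variables are set before you call the {devops-api}. The require_vars function below is a hypothetical sketch that uses bash indirect expansion.

```bash
# Fail fast if any variable that the curl commands depend on is unset or empty.
# ${!name} is bash indirect expansion: the value of the variable named by $name.
require_vars() {
  local name
  local missing=()
  for name in "$@"; do
    [[ -n "${!name:-}" ]] || missing+=("${name}")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing environment variables: ${missing[*]}" >&2
    return 1
  fi
}

# Example: check before calling the migration endpoints.
# require_vars dbID token migrationID && curl ...
```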
Use the {devops-api} to initialize the migration and get your migration directory path and credentials.
To learn more about the initialization process, see About {sstable-sideloader}: Initialize a migration.
The initialization process can take several minutes to complete, especially if the migration bucket doesn’t already exist.
-
In your terminal, use the {devops-api} to initialize the data migration:
curl -X POST \
  -H "Authorization: Bearer ${token}" \
  https://api.astra.datastax.com/v2/databases/${dbID}/migrations/initialize \
  | jq .
-
Get the migrationID from the response:
{
  "migrationID": "272eac1d-df8e-4d1b-a7c6-71d5af232182",
  "dbID": "b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d",
  "status": "Initializing",
  "progressInfo": "",
  "uploadBucketDir": "",
  "uploadCredentials": {
    "name": "",
    "keys": null,
    "credentialExpiration": null
  },
  "expectedCleanupTime": "2025-03-04T15:14:38Z"
}
The migrationID is a unique identifier (UUID) for the migration.
The response also includes the migration status. You will refer to this status multiple times throughout the migration process.
-
Assign the migration ID to an environment variable:
export migrationID=MIGRATION_ID
Replace MIGRATION_ID with the migrationID returned by the initialize endpoint.
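Instead of copying the ID by hand, you can extract it from the response with jq, which the curl examples already use. This is a sketch: extract_migration_id is a hypothetical name, and jq -e makes the command exit with a non-zero status if the response has no migrationID.

```bash
# Read an initialize (or status) response on stdin and print its migrationID.
# -r prints the raw string; -e exits non-zero when the value is null or missing.
extract_migration_id() {
  jq -er '.migrationID'
}

# Usage sketch: initialize and capture the ID in one step.
# migrationID="$(curl -sS -X POST \
#   -H "Authorization: Bearer ${token}" \
#   "https://api.astra.datastax.com/v2/databases/${dbID}/migrations/initialize" \
#   | extract_migration_id)" && export migrationID
```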
-
Check the migration status:
sideloader:partial$check-status.adoc
-
Check the status field in the response:
-
"status": "ReceivingFiles": Initialization is complete and your upload credentials are available. Proceed to the next step.
-
"status": "Initializing": The migration is still initializing. Wait a few minutes before you check the status again.
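Rather than checking the status manually, you can poll until it reaches a target value. The sketch below assumes that a GET request to the migrations/${migrationID} endpoint (the same request that this guide uses to fetch Google Cloud credentials) returns the status field. fetch_status and wait_for_status are hypothetical helpers, and the default interval and attempt count are arbitrary. The same helper could also wait for MigrationDone after you launch the import.

```bash
# Print the current migration status. Override this function if you fetch
# the status differently.
fetch_status() {
  curl -sS -X GET \
    -H "Authorization: Bearer ${token}" \
    "https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID}" \
    | jq -r '.status'
}

# wait_for_status TARGET [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns 0 when the status equals TARGET, 1 after MAX_ATTEMPTS checks.
wait_for_status() {
  local target="$1" interval="${2:-60}" attempts="${3:-60}" status i
  for (( i = 1; i <= attempts; i++ )); do
    status="$(fetch_status)"
    echo "attempt ${i}: status=${status}" >&2
    [[ "${status}" == "${target}" ]] && return 0
    sleep "${interval}"
  done
  echo "timed out waiting for ${target}" >&2
  return 1
}

# Example: check every two minutes until the upload credentials are ready.
# wait_for_status ReceivingFiles 120
```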
-
Get your migration directory path and upload credentials from the response. You need these values to upload snapshots to the migration directory.
Securely store the uploadBucketDir, accessKeyID, secretAccessKey, and sessionToken from the response:
{
"migrationID": "272eac1d-df8e-4d1b-a7c6-71d5af232182",
"dbID": "b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d",
"status": "ReceivingFiles",
"progressInfo": "",
"uploadBucketDir": "s3://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/",
"uploadCredentials": {
"name": "sessionToken",
"keys": {
"accessKeyID": "ASXXXXXXXXXXXXXXXXXX",
"secretAccessKey": "2XXXXXXXXXXXXXXXWqcdV519ZubYbyfuNxbZg1Rw",
"sessionToken": "XXXXXXXXXX"
},
"credentialExpiration": "2024-01-18T19:45:09Z",
"hint": "\nexport AWS_ACCESS_KEY_ID=ASXXXXXXXXXXXXXXXXXX\nexport AWS_SECRET_ACCESS_KEY=2XXXXXXXXXXXXXXXWqcdV519ZubYbyfuNxbZg1Rw\nexport AWS_SESSION_TOKEN=XXXXXXXXXXXXXX\n"
},
"expectedCleanupTime": "2024-01-25T15:14:38Z"
}
uploadBucketDir is the migration directory URL.
Note the trailing slash.
uploadCredentials contains the AWS credentials that authorize uploads to the migration directory, namely accessKeyID, secretAccessKey, and sessionToken.
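If you script the upload, you can turn the uploadCredentials keys into export statements with jq instead of copying them by hand. The response's hint field contains similar export lines. The helper below is a hypothetical sketch that shell-quotes each value with jq's @sh filter. Avoid writing its output to shared logs.

```bash
# Read a status response on stdin and print export statements for the
# AWS CLI credentials. @sh single-quotes each value for safe use with eval.
aws_exports_from_response() {
  jq -r '.uploadCredentials.keys
    | "export AWS_ACCESS_KEY_ID=\(.accessKeyID | @sh)",
      "export AWS_SECRET_ACCESS_KEY=\(.secretAccessKey | @sh)",
      "export AWS_SESSION_TOKEN=\(.sessionToken | @sh)"'
}

# Usage sketch:
# eval "$(curl -sS -H "Authorization: Bearer ${token}" \
#   "https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID}" \
#   | aws_exports_from_response)"
```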
Important
If you use automation to handle {sstable-sideloader} migrations, you might need to script a pause every hour so that you can generate new credentials without unexpectedly interrupting the migration.
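To decide when to refresh credentials during an automated upload, you can compare credentialExpiration with the current time. This sketch assumes GNU date (on macOS, gdate from GNU coreutils). seconds_until_expiry is hypothetical, and its optional second argument fixes "now" for testing.

```bash
# Print the number of seconds remaining before an ISO 8601 UTC timestamp,
# such as the credentialExpiration value. Requires GNU date (-d).
# The optional second argument overrides "now" (epoch seconds).
seconds_until_expiry() {
  local expiry_epoch now_epoch
  expiry_epoch="$(date -u -d "$1" +%s)" || return 1
  now_epoch="${2:-$(date -u +%s)}"
  echo $(( expiry_epoch - now_epoch ))
}

# Example: refresh credentials when fewer than 10 minutes remain.
# if (( $(seconds_until_expiry "$EXPIRATION") < 600 )); then ...; fi
```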
-
Find the uploadBucketDir and the uploadCredentials in the response:
{
  "migrationID": "272eac1d-df8e-4d1b-a7c6-71d5af232182",
  "dbID": "b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d",
  "status": "ReceivingFiles",
  "progressInfo": "",
  "uploadBucketDir": "gs://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/",
  "uploadCredentials": {
    "name": "TYPE_GOOGLE_CREDENTIALS_FILE",
    "keys": {
      "file": "CREDENTIALS_FILE"
    },
    "credentialExpiration": "2024-08-07T18:51:39Z"
  },
  "expectedCleanupTime": "2024-08-14T15:14:38Z"
}
uploadBucketDir is the migration directory URL. Note the trailing slash.
uploadCredentials contains a base64-encoded file containing Google Cloud credentials that authorize uploads to the migration directory.
-
Pipe the Google Cloud credentials file to a creds.json file:
curl -X GET \
  -H "Authorization: Bearer ${token}" \
  https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \
  | jq -r '.uploadCredentials.keys.file' \
  | base64 -d > creds.json
-
Securely store the uploadBucketDir and creds.json.
Securely store the uploadBucketDir and urlSignature from the response:
{
"migrationID": "456ca4a9-0551-46c4-b8bb-90fcd136a0c3",
"dbID": "ccefd141-8fda-4e4d-a746-a102a96657bc",
"status": "ReceivingFiles",
"progressInfo": "",
"uploadBucketDir": "https://muztx5cqmp3jhe3j2guebksz.blob.core.windows.net/mig-upload-456ca4a9-0551-46c4-b8bb-90fcd136a0c3/sstables/",
"uploadCredentials": {
"name": "URL signature",
"keys": {
"url": "https://UPLOAD_BUCKET_DIR/?si=AZURE_SAS_TOKEN",
"urlSignature": "si=AZURE_SAS_TOKEN"
},
"credentialExpiration": "2025-04-02T15:14:31Z"
},
"expectedCleanupTime": "2025-03-04T15:14:38Z"
}
uploadBucketDir is the migration directory URL.
Note the trailing slash.
uploadCredentials contains url and urlSignature keys that represent an Azure Shared Access Signature (SAS) token.
You need the urlSignature to upload snapshots to the migration directory.
In the preceding example, these strings are truncated for readability.
Use your cloud provider’s CLI and your upload credentials to upload snapshots for each origin node into the migration directory.
Important
Be aware of the following requirements for the upload commands:
-
Set environment variables for the AWS credentials that were generated when you initialized the migration:
export AWS_ACCESS_KEY_ID=ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=SECRET_ACCESS_KEY
export AWS_SESSION_TOKEN=SESSION_TOKEN
-
Use the AWS CLI to upload one snapshot from one node into the migration directory:
du -sh CASSANDRA_DATA_DIR/KEYSPACE_NAME/*/snapshots/*SNAPSHOT_NAME*; \
aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/SNAPSHOT_NAME*' CASSANDRA_DATA_DIR/ MIGRATION_DIRNODE_NAME
Replace the following:
sideloader:partial$command-placeholders-common.adoc
Example: Upload a snapshot with AWS CLI
# Set environment variables
export AWS_ACCESS_KEY_ID=XXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXX
export AWS_SESSION_TOKEN=XXXXXXXXXX
# Upload "sensor_readings" snapshot from "dse0" node
du -sh /var/lib/cassandra/data/smart_home/*/snapshots/*sensor_readings*; \
aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/sensor_readings*' /var/lib/cassandra/data/ s3://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/dse0
-
Monitor upload progress:
-
Use the AWS CLI to get a list of cloud storage keys for the files that have been successfully uploaded to the migration directory:
aws s3 ls --human-readable --summarize --recursive MIGRATION_DIR
Replace MIGRATION_DIR with the uploadBucketDir that was generated when you initialized the migration.
-
Compare the returned list against the files in your snapshot directory. When the lists match, the upload is complete.
You can potentially increase upload speeds by adjusting the max_concurrent_requests, multipart_threshold, and multipart_chunksize parameters in your AWS CLI S3 configuration. However, upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region.
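To make the list comparison from the previous step less manual, you can compare file counts for one node. This is a rough sketch. It assumes that the aws s3 sync --include filter above uploads exactly the files under the snapshot directories, and that you list only that node's prefix, with a trailing slash so that dse1 doesn't also match dse10. Both function names are hypothetical.

```bash
# Count the regular files in one node's snapshot directories, using the same
# path pattern as the aws s3 sync --include filter.
count_local_snapshot_files() {
  local data_dir="$1" snapshot_name="$2"
  find "${data_dir}" -type f -path "*/snapshots/${snapshot_name}*/*" | wc -l
}

# Print the "Total Objects" value from aws s3 ls --summarize output on stdin.
total_objects_from_summary() {
  awk '/Total Objects:/ { print $NF }'
}

# Usage sketch for one node:
# local_count="$(count_local_snapshot_files /var/lib/cassandra/data SNAPSHOT_NAME)"
# remote_count="$(aws s3 ls --recursive --summarize "MIGRATION_DIRNODE_NAME/" | total_objects_from_summary)"
# [ "$local_count" -eq "$remote_count" ] && echo "upload complete for NODE_NAME"
```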
-
-
Repeat the upload process for each snapshot (SNAPSHOT_NAME) and node (NODE_NAME) in your origin cluster. If your credentials expire, see Get new upload credentials.
Tip: Use a for loop to simplify snapshot uploads
If the nodes in your origin cluster have predictable names (for example, dse0, dse1, and dse2), then you can use a for loop to streamline the execution of the upload commands. For example:
# Set environment variables
export AWS_ACCESS_KEY_ID=ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=SECRET_ACCESS_KEY
export AWS_SESSION_TOKEN=SESSION_TOKEN
# Loop over the sync command for all nodes.
# ssh doesn't forward local environment variables by default, so the
# credentials are passed inside the remote command (expanded locally).
for i in 0 1 2; do ssh dse${i} \
  "export AWS_ACCESS_KEY_ID='${AWS_ACCESS_KEY_ID}' AWS_SECRET_ACCESS_KEY='${AWS_SECRET_ACCESS_KEY}' AWS_SESSION_TOKEN='${AWS_SESSION_TOKEN}'; \
  du -sh CASSANDRA_DATA_DIR/KEYSPACE_NAME/*/snapshots/*SNAPSHOT_NAME*; \
  aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/SNAPSHOT_NAME*' CASSANDRA_DATA_DIR/ MIGRATION_DIRdse${i}" &
done
# Wait for all background uploads to finish
wait
sideloader:partial$staged-snapshots-need-import-ph.adoc
sideloader:partial$idle-migration-directories-note.adoc
-
Authenticate to Google Cloud with the creds.json file that you created when you initialized the migration:
gcloud auth activate-service-account --key-file=creds.json
If necessary, modify the --key-file path to match the location of your creds.json file, such as --key-file=~/.gcloud_credentials/creds.json.
You can also use gcloud auth login --cred-file creds.json.
-
Use gsutil to upload one snapshot from one node into the migration directory:
gsutil -m rsync -r -d CASSANDRA_DATA_DIR/KEYSPACE_NAME/**/snapshots/SNAPSHOT_NAME/ MIGRATION_DIRNODE_NAME/
Replace the following:
sideloader:partial$command-placeholders-common.adoc
Example: Upload a snapshot with gcloud and gsutil
# Authenticate
gcloud auth activate-service-account --key-file=creds.json
# Upload "sensor_readings" snapshot from "dse0" node
gsutil -m rsync -r -d /var/lib/cassandra/data/smart_home/**/snapshots/sensor_readings/ gs://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/dse0
-
Monitor upload progress:
-
Use gsutil to get a list of objects that have been successfully uploaded to the migration directory:
gsutil ls -r MIGRATION_DIR
Replace MIGRATION_DIR with the uploadBucketDir that was generated when you initialized the migration.
-
Compare the returned list against the files in your snapshot directory. When the lists match, the upload is complete.
The -m flag in gsutil -m rsync enables parallel synchronization, which can improve upload speed. However, upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region.
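Similarly, you can count the objects that gsutil ls -r reports for one node and compare that number with the local file count. This sketch relies on gsutil's usual output format, in which each prefix is printed as a header line that ends in a colon. count_gsutil_objects is a hypothetical helper.

```bash
# Count object lines in gsutil ls -r output read on stdin: keep gs:// URLs,
# then drop the per-prefix header lines that end with ":".
count_gsutil_objects() {
  grep -E '^gs://' | grep -vc ':$'
}

# Usage sketch for one node:
# gsutil ls -r "MIGRATION_DIRNODE_NAME/" | count_gsutil_objects
```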
-
-
Repeat the upload process for each snapshot (SNAPSHOT_NAME) and node (NODE_NAME) in your origin cluster.
Tip: Use a for loop to simplify snapshot uploads
If the nodes in your origin cluster have predictable names (for example, dse0, dse1, and dse2), then you can use a for loop to streamline the execution of the gsutil rsync commands. For example:
# Quote the remote command so that du, the glob, and gsutil run on each node
# instead of on the local machine. Each node must be authenticated to
# Google Cloud, for example with its own copy of creds.json.
for i in 0 1 2; do ssh dse${i} \
  "du -sh CASSANDRA_DATA_DIR/KEYSPACE_NAME/*/snapshots/*SNAPSHOT_NAME*; \
  gsutil -m rsync -r -d CASSANDRA_DATA_DIR/KEYSPACE_NAME/**/snapshots/SNAPSHOT_NAME/ MIGRATION_DIRdse${i}" &
done
# Wait for all background uploads to finish
wait
sideloader:partial$staged-snapshots-need-import-ph.adoc
sideloader:partial$idle-migration-directories-note.adoc
-
Set environment variables for the following values:
-
AZURE_SAS_TOKEN: The urlSignature value that was generated when you initialized the migration.
-
CASSANDRA_DATA_DIR: The absolute file system path to where {cass-short} data is stored on the node, including the trailing slash. For example, /var/lib/cassandra/data/.
-
SNAPSHOT_NAME: The name of the snapshot backup that you created with nodetool snapshot.
-
MIGRATION_DIR: The entire uploadBucketDir value that was generated when you initialized the migration, including the trailing slash.
-
NODE_NAME: The host name of the node that your snapshots are from. Use the specific node name to ensure that each node has a unique directory in the migration bucket.
export AZURE_SAS_TOKEN="URL_SIGNATURE"
export CASSANDRA_DATA_DIR="CASSANDRA_DATA_DIR"
export SNAPSHOT_NAME="SNAPSHOT_NAME"
export MIGRATION_DIR="MIGRATION_DIR"
export NODE_NAME="NODE_NAME"
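Both CASSANDRA_DATA_DIR and MIGRATION_DIR must end with a slash, because the upload loop strips CASSANDRA_DATA_DIR from each snapshot path and appends the remainder to MIGRATION_DIR and NODE_NAME. Without the slash, the relative path starts with / and the destination contains a double slash. The check below is a hypothetical sketch.

```bash
# Fail if CASSANDRA_DATA_DIR or MIGRATION_DIR is missing its trailing slash.
# ${!name:-} reads the variable whose name is stored in $name (bash).
check_trailing_slashes() {
  local name ok=0
  for name in CASSANDRA_DATA_DIR MIGRATION_DIR; do
    if [[ "${!name:-}" != */ ]]; then
      echo "${name} must end with a trailing slash: '${!name:-}'" >&2
      ok=1
    fi
  done
  return "${ok}"
}

# Usage: run after setting the environment variables.
# check_trailing_slashes || echo "fix the variables before uploading"
```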
-
-
Use AzCopy to upload one snapshot from one node into the migration directory:
for dir in $(find "$CASSANDRA_DATA_DIR" -type d -path "*/snapshots/${SNAPSHOT_NAME}*"); do
  REL_PATH="${dir#"$CASSANDRA_DATA_DIR"}" # Remove the base path
  DEST_PATH="${MIGRATION_DIR}${NODE_NAME}/${REL_PATH}/?${AZURE_SAS_TOKEN}"
  azcopy sync "$dir" "$DEST_PATH" --recursive
done
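Before you start a long upload, you might want to preview the source directory and destination path for each azcopy sync call. The sketch below reuses the environment variables set earlier, reads find output line by line instead of word-splitting it, and prints <SAS> in place of the token so that the signature doesn't appear in logs. preview_azcopy_targets is a hypothetical helper.

```bash
# Print "source -> destination" for every snapshot directory that the
# upload loop would sync, without calling azcopy.
preview_azcopy_targets() {
  local dir rel_path
  while IFS= read -r dir; do
    # Same path arithmetic as the upload loop: strip the data directory prefix.
    rel_path="${dir#"$CASSANDRA_DATA_DIR"}"
    printf '%s -> %s\n' "${dir}" "${MIGRATION_DIR}${NODE_NAME}/${rel_path}/?<SAS>"
  done < <(find "${CASSANDRA_DATA_DIR}" -type d -path "*/snapshots/${SNAPSHOT_NAME}*" | sort)
}

# Usage:
# preview_azcopy_targets
```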
-
Monitor upload progress:
-
Use AzCopy to get the current contents of the migration directory:
azcopy list "${MIGRATION_DIR}?${AZURE_SAS_TOKEN}"
-
Compare the returned list against the files in your snapshot directory. When the lists match, the upload is complete.
Upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region.
-
-
Repeat the upload process for each snapshot and node in your origin cluster. Be sure to change the SNAPSHOT_NAME and NODE_NAME environment variables as needed.
sideloader:partial$staged-snapshots-need-import-ph.adoc
sideloader:partial$idle-migration-directories-note.adoc
After you completely upload snapshots for each origin node, import the data into your target database.
Data import is a multi-step operation that requires complete success. If one step fails, then the entire import operation stops and the migration fails.
To learn more about the data import process, see About {sstable-sideloader}: Import data.
-
Use the {devops-api} to launch the data import:
curl -X POST \
  -H "Authorization: Bearer ${token}" \
  https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID}/launch \
  | jq .
Although this call returns immediately, the import process takes time.
-
Check the migration status periodically:
sideloader:partial$check-status.adoc
-
Check the status field in the response:
-
"status": "ImportInProgress": The data is still being imported. Wait a few minutes before you check the status again.
-
"status": "MigrationDone": The import is complete, and you can proceed to Validate the migrated data.
-
-
If the migration takes more than a few days, manually reschedule the cleanup to avoid automatic cleanup.
-
If the migration fails, see sideloader:troubleshoot-sideloader.adoc.