70 changes: 35 additions & 35 deletions docs/modules/ROOT/pages/backfill-cli.adoc
Original file line number Diff line number Diff line change
@@ -50,7 +50,7 @@ Run as a `pulsar-admin` extension::
The `pulsar-admin` extension is packaged with the IBM Elite Support for {pulsar} (formerly {company} Luna Streaming) distribution in the /cliextensions folder, so you don't need to build from source unless you want to make changes to the code.
+
.. Move the generated NAR archive to the /cliextensions folder of your {pulsar-short} installation (e.g. /pulsar/cliextensions).
..Modify the client.conf file of your {pulsar-short} installation to include: `customCommandFactories=cassandra-cdc`.
.. Modify the client.conf file of your {pulsar-short} installation to include: `customCommandFactories=cassandra-cdc`.
.. Run the following command (this assumes the xref:6.9@dse:installing:tarball-dse.adoc[default tarball installation of {dse-short}]):
+
[source,shell]
@@ -264,86 +264,86 @@ In both the `pulsar-admin` extension and the standalone Java application, {cass-
|Parameter
|Description

|--data-dir=PATH
|`--data-dir=PATH`
|The directory where data is exported to and imported from. The
default is a 'data' subdirectory in the current working directory.
The data directory is created if it doesn't exist. Tables are exported in subdirectories of the data directory specified here;
there is one subdirectory per keyspace inside the data
directory, then one subdirectory per table inside each keyspace
directory.

|--help, -h
|`--help`, `-h`
|Displays this help message

|--dsbulk-log-dir=PATH, -l
|`--dsbulk-log-dir=PATH`, `-l`
|The directory where {dsbulk-short} should store its logs. The default is a
'logs' subdirectory in the current working directory. This
subdirectory is created if it doesn't exist. Each {dsbulk-short}
operation creates a subdirectory inside the log directory
specified here. This command isn't available in the `pulsar-admin` extension.

|--export-bundle=PATH
|`--export-bundle=PATH`
|The path to a {scb} to connect to an {astra-db} database. Options `--export-host` and `--export-bundle` are mutually exclusive.

|--export-consistency=CONSISTENCY
|`--export-consistency=CONSISTENCY`
|The consistency level to use when exporting data. The default is
LOCAL_QUORUM.

|--export-max-concurrent-files=NUM\|AUTO
|`--export-max-concurrent-files=NUM|AUTO`
|The maximum number of concurrent files to write to. Must be a positive
number or the special value AUTO. The default is AUTO.

|--export-max-concurrent-queries=NUM\|AUTO
|`--export-max-concurrent-queries=NUM|AUTO`
|The maximum number of concurrent queries to execute. Must be a
positive number or the special value AUTO. The default is AUTO.

|--export-splits=NUM\|NC
|`--export-splits=NUM|NC`
|The maximum number of token range queries to generate. Use the NC
syntax to specify a multiple of the number of available cores, e.g.
8C = 8 times the number of available cores. The default is 8C. This
is an advanced setting; you should rarely need to modify the default
value.

|--export-dsbulk-option=OPT=VALUE
|`--export-dsbulk-option=OPT=VALUE`
|An extra {dsbulk-short} option to use when exporting. Any valid {dsbulk-short} option
can be specified here, and it is passed as-is to the {dsbulk-short}
process. {dsbulk-short} options, including driver options, must be passed as
'--long.option.name=<value>'. Short options aren't supported.

|--export-host=HOST[:PORT]
|`--export-host=HOST[:PORT]`
|The host name or IP and, optionally, the port of a node from the
{cass-short} cluster. If the port isn't specified, it defaults to
9042. This option can be specified multiple times. Options
`--export-host` and `--export-bundle` are mutually exclusive.

|--export-password
|`--export-password`
|The password to use to authenticate against the origin cluster.
Options `--export-username` and `--export-password` must be provided
together, or not at all. Omit the parameter value to be prompted for
the password interactively.

|--export-protocol-version=VERSION
|`--export-protocol-version=VERSION`
|The protocol version to use to connect to the {cass-short} cluster, e.g.
'V4'. If not specified, the driver negotiates the highest
version supported by both the client and the server.

|--export-username=STRING
|`--export-username=STRING`
|The username to use to authenticate against the origin cluster.
Options `--export-username` and `--export-password` must be provided
together, or not at all.

|--keyspace=<keyspace>, -k
|`--keyspace=<keyspace>`, `-k`
|The name of the keyspace where the table to be exported exists

|--max-rows-per-second=PATH
|`--max-rows-per-second=NUM`
|The maximum number of rows per second to read from the {cass-short}
table. Setting this option to any negative value or zero
disables it. The default is -1.

|--table=<table>, -t
|`--table=<table>`, `-t`
|The name of the table to export data from for CDC backfilling.

|--version, -v
|`--version`, `-v`
|Displays version info.
|===
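
The rules in the table above can be sketched in a few lines of Python. This is not the tool's own implementation: `resolve_splits` and `check_export_options` are hypothetical helper names, and the sketch only mirrors three statements from the table (the `NUM|NC` syntax of `--export-splits`, the mutual exclusivity of `--export-host` and `--export-bundle`, and the pairing of `--export-username` with `--export-password`).

```python
import os
import re


def resolve_splits(value, cores=None):
    """Resolve an --export-splits value written as NUM or NC.

    NC means N times the number of available cores, so "8C" on a
    4-core machine resolves to 32.
    """
    cores = cores or os.cpu_count() or 1
    match = re.fullmatch(r"(\d+)(C?)", value.strip(), flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"expected NUM or NC, got {value!r}")
    number = int(match.group(1))
    return number * cores if match.group(2) else number


def check_export_options(opts):
    """Return the rule violations found in a dict of option name -> value.

    Only the presence of a key matters here; a value of None models an
    option passed without a value (for example --export-password, which
    then prompts interactively).
    """
    errors = []
    if "--export-host" in opts and "--export-bundle" in opts:
        errors.append("--export-host and --export-bundle are mutually exclusive")
    if ("--export-username" in opts) != ("--export-password" in opts):
        errors.append("--export-username and --export-password must be provided together")
    return errors
```

A wrapper script could call `check_export_options` before launching the backfill, so that a conflicting combination is reported before any connection is attempted.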

@@ -357,52 +357,52 @@ These parameters should be passed as command line arguments in the standalone Ja
|===
|Parameter |Description

|--events-topic-prefix=<topicPrefix>
|`--events-topic-prefix=<topicPrefix>`
|The event topic name prefix. The `<keyspace_name>.<table_name>` is appended to that prefix to build the topic name.

|--pulsar-auth-params=<pulsarAuthParams>
|`--pulsar-auth-params=<pulsarAuthParams>`
|The {pulsar-short} authentication parameters.

|--pulsar-auth-plugin-class-name=<pulsarAuthPluginClassName>
|`--pulsar-auth-plugin-class-name=<pulsarAuthPluginClassName>`
|The {pulsar-short} authentication plugin class name.

|--pulsar-url=<pulsarServiceUrl>
|`--pulsar-url=<pulsarServiceUrl>`
|The {pulsar-short} broker service URL.

|--pulsar-ssl-provider=<sslProvider>
|`--pulsar-ssl-provider=<sslProvider>`
|The SSL/TLS provider to use.

|--pulsar-ssl-truststore-path=<sslTruststorePath>
|`--pulsar-ssl-truststore-path=<sslTruststorePath>`
|The path to the SSL/TLS truststore file.

|--pulsar-ssl-truststore-password=<sslTruststorePassword>
|`--pulsar-ssl-truststore-password=<sslTruststorePassword>`
|The password for the SSL/TLS truststore.

|--pulsar-ssl-truststore-type=<sslTruststoreType>
|`--pulsar-ssl-truststore-type=<sslTruststoreType>`
|The type of the SSL/TLS truststore.

|--pulsar-ssl-keystore-path=<sslKeystorePath>
|`--pulsar-ssl-keystore-path=<sslKeystorePath>`
|The path to the SSL/TLS keystore file.

|--pulsar-ssl-keystore-password=<sslKeystorePassword>
|`--pulsar-ssl-keystore-password=<sslKeystorePassword>`
|The password for the SSL/TLS keystore.

|--pulsar-ssl-cipher-suites=<sslCipherSuites>
|`--pulsar-ssl-cipher-suites=<sslCipherSuites>`
|Defines one or more cipher suites to use for negotiating the SSL/TLS connection.

|--pulsar-ssl-enabled-protocols=<sslEnabledProtocols>
|`--pulsar-ssl-enabled-protocols=<sslEnabledProtocols>`
|The enabled SSL/TLS protocols.

|--pulsar-ssl-allow-insecure-connections
|`--pulsar-ssl-allow-insecure-connections`
|Allows insecure connections to servers whose certificate hasn't been signed by an approved CA.
Always disable `sslAllowInsecureConnection` in production environments.
Always disable `--pulsar-ssl-allow-insecure-connections` in production environments.

|--pulsar-ssl-enable-hostname-verification
|`--pulsar-ssl-enable-hostname-verification`
|Enables server hostname verification.

|--pulsar-ssl-tls-trust-certs-path=<tlsTrustCertsFilePath>
|`--pulsar-ssl-tls-trust-certs-path=<tlsTrustCertsFilePath>`
|The path to the trusted TLS certificate file.

|--pulsar-ssl-use-key-store-tls
|`--pulsar-ssl-use-key-store-tls`
|If TLS is enabled, specifies whether to use the KeyStore type as the TLS configuration parameter.
|===
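
As a small illustration of how `--events-topic-prefix` is used, the following Python sketch builds a topic name by appending `<keyspace_name>.<table_name>` to the prefix, as the table describes. The function name `events_topic_name` is hypothetical, and the prefix value in the usage line is an assumed example, not a documented default.

```python
def events_topic_name(prefix, keyspace, table):
    """Append <keyspace_name>.<table_name> to the events topic prefix."""
    return f"{prefix}{keyspace}.{table}"


# Assumed example prefix; substitute the prefix used by your deployment.
print(events_topic_name("persistent://public/default/events-", "ks1", "table1"))
# prints persistent://public/default/events-ks1.table1
```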
32 changes: 17 additions & 15 deletions docs/modules/ROOT/pages/cdc-cassandra-events.adoc
@@ -7,7 +7,7 @@ The {product} agent pushes the mutation primary key for the CDC-enabled table in

To support https://pulsar.apache.org/docs/en/concepts-topic-compaction/[{pulsar-short} Topic Compaction], the message key is encoded separately from the message payload, in the message metadata.

Finally, the following CQL data types are encoded as AVRO logical types:
The following CQL data types are encoded as https://avro.apache.org/docs/current/spec.html#Logical+Types[AVRO logical types]:

* `Date`
* `Decimal`
@@ -16,15 +16,13 @@ Finally, the following CQL data types are encoded as AVRO logical types:
* `Varint`
* `Uuid`, `timeuuid`

See https://avro.apache.org/docs/current/spec.html#Logical+Types[AVRO Logical Types] for more info on AVRO.

== Change Event's Key

For a given table, the change event's key is an AVRO record that contains a field for each column in the primary key of the table at the time the event was created. Both the events and the data topics (also called the dirty and the clean topics) have the same message key, an AVRO record including the primary key columns.

== `INSERT` Event

Let's create a {cass-short} table to illustrate what happens:
Create a {cass-short} table to test this behavior:

[source,bash]
----
@@ -49,7 +47,7 @@ CREATE TABLE ks1.tbl1 (
AND speculative_retry = '99PERCENTILE';
----

Then insert a row:
Insert a row:

[source,bash]
----
@@ -89,6 +87,10 @@ You can check the connector status with the following command. The connector mus
[source,bash]
----
bin/pulsar-admin source status --name cassandra-source-ks1-table1
----

[source,json]
----
{
"numInstances" : 1,
"numRunning" : 1,
@@ -115,22 +117,24 @@ bin/pulsar-admin source status --name cassandra-source-ks1-table1

If you're having issues consuming CDC events, check the source connector logs on your {pulsar-short} function workers and the data topic schema.

=== Check the source connector logs

Check the source connector logs::
Check the source connector logs on your {pulsar-short} function workers. The log file name depends on the connector's name.

+
[source,bash]
----
cat logs/functions/public/default/cassandra-source-ks1-table1/cassandra-source-ks1-table1-0.log
----

=== Check the data topic schema

Check the data topic schema::
Check the https://pulsar.apache.org/docs/en/schema-manage/[{pulsar-short} schema] to ensure the clean topic matches your CQL table:

+
[source,bash]
----
bin/pulsar-admin schemas get "persistent://public/default/data-ks1.table1"
----
+
[source,json]
----
{
"version": 0,
"schemaInfo": {
@@ -188,8 +192,6 @@ bin/pulsar-admin schemas get "persistent://public/default/data-ks1.table1"
}
----
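
As a complement to the manual checks above, the JSON printed by `bin/pulsar-admin source status` can be tested in a script. The following Python sketch reads only the `numInstances` and `numRunning` fields that appear in the status output; treating "all instances running" as the condition to check is an assumption, and `all_instances_running` is a hypothetical helper name.

```python
import json


def all_instances_running(status_json):
    """Return True when every reported connector instance is running.

    Assumption: the connector is considered up when numInstances is
    positive and equal to numRunning.
    """
    status = json.loads(status_json)
    return status["numInstances"] > 0 and status["numInstances"] == status["numRunning"]
```

For example, a script could capture the command's output as a string and skip the log inspection when `all_instances_running` returns `True`.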

== What's next?

For more on change data capture, see xref:cdcExample.adoc[].

== See also

* xref:ROOT:cdcExample.adoc[]
2 changes: 1 addition & 1 deletion docs/modules/ROOT/pages/cdcExample.adoc
@@ -229,4 +229,4 @@ Any captured CDC events from your database table should be reflected in the comm

== See also

* xref:monitor.adoc[]
* xref:ROOT:monitor.adoc[]
28 changes: 14 additions & 14 deletions docs/modules/ROOT/pages/faqs.adoc
@@ -23,7 +23,7 @@ From there, the data can be published to external platforms like Elasticsearch,

== How do I install {product-short}?

Follow the xref:install.adoc[installation instructions].
Follow the xref:ROOT:install.adoc[installation instructions].

== What are the requirements for {product-short}?

@@ -65,7 +65,7 @@ If the {pulsar-short} cluster is down, the change agent continues trying to send
When the disk space of the `cdc_raw` directory reaches your `cdc_total_space_in_mb` {cass-short} setting (less than 4 GB by default), writes to CDC-enabled tables fail with a `CDCWriteException`.
The following warning message is included in {cass-short} logs:

[source,bash]
[source,console]
----
WARN [CoreThread-5] 2021-10-29 09:12:52,790 NoSpamLogger.java:98 - Rejecting Mutation containing CDC-enabled table. Free up space in /mnt/data/cdc_raw.
----
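
To notice the condition before writes start failing, the size of the `cdc_raw` directory can be compared against the configured `cdc_total_space_in_mb` value. The Python sketch below is one way to do that, under stated assumptions: the helper names are hypothetical, the directory path and limit must be supplied by the caller, and 1 MB is taken as 1024 * 1024 bytes, which may differ from how {cass-short} accounts for space.

```python
import os


def dir_size_mb(path):
    """Total size of the regular files under path, in MB.

    Assumption: 1 MB = 1024 * 1024 bytes.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if os.path.isfile(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                # A file can be removed while the directory is being walked.
                continue
    return total / (1024 * 1024)


def cdc_space_used_fraction(cdc_raw_dir, cdc_total_space_in_mb):
    """Fraction of the configured CDC space currently used by cdc_raw_dir."""
    return dir_size_mb(cdc_raw_dir) / cdc_total_space_in_mb
```

A monitoring job could call `cdc_space_used_fraction("/mnt/data/cdc_raw", limit_mb)` with the limit from your own configuration and raise an alert when the result approaches 1.0.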
@@ -93,7 +93,7 @@ SELECT * FROM system_distributed.cdc_local WHERE keyspace_name = 'keyspace_name'

There are three possible statuses:

Enabled::
`enabled`::
If the CDC status is `enabled`, then CDC is enabled on the table.
+
From this status, you can disable CDC on the table by running the following CQL query:
@@ -103,7 +103,7 @@ From this status, you can disable CDC on the table by running the following CQL
ALTER TABLE keyspace_name.table_name WITH cdc = {'enabled': false};
----

Disabled::
`disabled`::
If the CDC status is `disabled`, then CDC is disabled on the table.
+
From this status, you can enable CDC on the table by running the following CQL query:
@@ -113,7 +113,7 @@ From this status, you can enable CDC on the table by running the following CQL q
ALTER TABLE keyspace_name.table_name WITH cdc = {'enabled': true};
----

Null::
`null`::
If the CDC status is `null`, then CDC isn't enabled on the table.
+
From this status, you can enable CDC on the table by running the following CQL query:
@@ -134,7 +134,7 @@ SELECT * FROM system_distributed.cdc_local WHERE keyspace_name = 'cdc' AND table

There are three possible statuses:

Running::
`running`::
If the `status` column is `running`, then the agent is running.
+
From this status, you can stop the agent by running the following CQL query:
@@ -144,7 +144,7 @@ From this status, you can stop the agent by running the following CQL query:
ALTER TABLE cdc.raw_cdc WITH cdc = {'enabled': false};
----

Stopped::
`stopped`::
If the `status` column is `stopped`, then the agent isn't running.
+
From this status, you can start the agent by running the following CQL query:
@@ -154,7 +154,7 @@ From this status, you can start the agent by running the following CQL query:
ALTER TABLE cdc.raw_cdc WITH cdc = {'enabled': true};
----

Null::
`null`::
If the `status` column is `null`, then the agent isn't running.
+
From this status, you can start the agent by running the following CQL query:
@@ -179,7 +179,7 @@ The design of CDC in {cass-short} assumed that when table changes are synchroniz
There is a max log size setting that disables writes to the table when the set threshold is reached.
If a connection to the {pulsar-short} cluster is needed for the log to be drained, and it isn't responsive, then the log begins to fill, which can impact a table's write availability.

For more, see the xref:cdc-for-cassandra:ROOT:install.adoc#scaling-up-your-configuration[Scaling up your CDC configuration].
For more information, see xref:ROOT:install.adoc#scaling-up-your-configuration[Scaling up your CDC configuration].

== Does the {csc_pulsar_first} use a dead-letter topic?

@@ -227,11 +227,11 @@ The most manageable way to handle this is to use the {pulsar-short} {cass-short}

The {cass-short} sink requires the following provisions:

- Use the CDC data topic as its source of messages
- Provide a secure bundle (creds) to another {cass-short} cluster
- Map message values to a specific table in the other cluster
- Use the {pulsar-short} delivery guarantee to ensure success
- Use the {pulsar-short} connector health metrics to monitor failures
* Use the CDC data topic as its source of messages
* Provide a secure bundle (credentials) to another {cass-short} cluster
* Map message values to a specific table in the other cluster
* Use the {pulsar-short} delivery guarantee to ensure success
* Use the {pulsar-short} connector health metrics to monitor failures

== How do I migrate table data using CDC?
