diff --git a/docs/modules/ROOT/pages/backfill-cli.adoc b/docs/modules/ROOT/pages/backfill-cli.adoc index 72726337..5c3ae3eb 100644 --- a/docs/modules/ROOT/pages/backfill-cli.adoc +++ b/docs/modules/ROOT/pages/backfill-cli.adoc @@ -50,7 +50,7 @@ Run as a `pulsar-admin` extension:: The `pulsar-admin` extension is packaged with the IBM Elite Support for {pulsar} (formerly {company} Luna Streaming) distribution in the /cliextensions folder, so you don't need to build from source unless you want to make changes to the code. + .. Move the generated NAR archive to the /cliextensions folder of your {pulsar-short} installation (e.g. /pulsar/cliextensions). -..Modify the client.conf file of your {pulsar-short} installation to include: `customCommandFactories=cassandra-cdc`. +.. Modify the client.conf file of your {pulsar-short} installation to include: `customCommandFactories=cassandra-cdc`. .. Run the following command (this assumes the xref:6.9@dse:installing:tarball-dse.adoc[default tarball installation of {dse-short}]): + [source,shell] @@ -264,7 +264,7 @@ In both the `pulsar-admin` extension and the standalone Java application, {cass- |Parameter |Description -|--data-dir=PATH +|`--data-dir=PATH` |The directory where data is exported to and imported from. The default is a 'data' subdirectory in the current working directory. The data directory is created if it doesn't exist. Tables are exported in subdirectories of the data directory specified here; @@ -272,78 +272,78 @@ there is one subdirectory per keyspace inside the data directory, then one subdirectory per table inside each keyspace directory. -|--help, -h +|`--help`, `-h` |Displays this help message -|--dsbulk-log-dir=PATH, -l +|`--dsbulk-log-dir=PATH`, `-l` |The directory where {dsbulk-short} should store its logs. The default is a 'logs' subdirectory in the current working directory. This subdirectory is created if it doesn't exist. 
Each {dsbulk-short} operation creates a subdirectory inside the log directory specified here. This command isn't available in the `pulsar-admin` extension. -|--export-bundle=PATH +|`--export-bundle=PATH` |The path to a {scb} to connect to an {astra-db} database. Options --export-host and --export-bundle are mutually exclusive. -|--export-consistency=CONSISTENCY +|`--export-consistency=CONSISTENCY` |The consistency level to use when exporting data. The default is LOCAL_QUORUM. -|--export-max-concurrent-files=NUM\|AUTO +|`--export-max-concurrent-files=NUM|AUTO` |The maximum number of concurrent files to write to. Must be a positive number or the special value AUTO. The default is AUTO. -|--export-max-concurrent-queries=NUM\|AUTO +|`--export-max-concurrent-queries=NUM|AUTO` |The maximum number of concurrent queries to execute. Must be a positive number or the special value AUTO. The default is AUTO. -|--export-splits=NUM\|NC +|`--export-splits=NUM|NC` |The maximum number of token range queries to generate. Use the NC syntax to specify a multiple of the number of available cores, e.g. 8C = 8 times the number of available cores. The default is 8C. This is an advanced setting; you should rarely need to modify the default value. -|--export-dsbulk-option=OPT=VALUE +|`--export-dsbulk-option=OPT=VALUE` |An extra {dsbulk-short} option to use when exporting. Any valid {dsbulk-short} option can be specified here, and it is passed as-is to the {dsbulk-short} process. {dsbulk-short} options, including driver options, must be passed as '--long.option.name='. Short options aren't supported. -|--export-host=HOST[:PORT] +|`--export-host=HOST[:PORT]` |The host name or IP and, optionally, the port of a node from the {cass-short} cluster. If the port isn't specified, it defaults to 9042. This option can be specified multiple times. Options --export-host and --export-bundle are mutually exclusive. 
-|--export-password +|`--export-password` |The password to use to authenticate against the origin cluster. Options --export-username and --export-password must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively. -|--export-protocol-version=VERSION +|`--export-protocol-version=VERSION` |The protocol version to use to connect to the {cass-short} cluster, e.g. 'V4'. If not specified, the driver negotiates the highest version supported by both the client and the server. -|--export-username=STRING +|`--export-username=STRING` |The username to use to authenticate against the origin cluster. Options --export-username and --export-password must be provided together, or not at all. -|--keyspace=, -k +|`--keyspace=`, `-k` |The name of the keyspace where the table to be exported exists -|--max-rows-per-second=PATH +|`--max-rows-per-second=PATH` |The maximum number of rows per second to read from the {cass-short} table. Setting this option to any negative value or zero disables it. The default is -1. -|--table=, -t +|`--table=
`, `-t` |The name of the table to export data from for cdc back filling -|--version, -v +|`--version`, `-v` |Displays version info. |=== @@ -357,52 +357,52 @@ These parameters should be passed as command line arguments in the standalone Ja |=== |Parameter |Description -|--events-topic-prefix= +|`--events-topic-prefix=` |The event topic name prefix. The `.` is appended to that prefix to build the topic name. -|--pulsar-auth-params= +|`--pulsar-auth-params=` |The {pulsar-short} authentication parameters. -|--pulsar-auth-plugin-class-name= +|`--pulsar-auth-plugin-class-name=` |The {pulsar-short} authentication plugin class name. -|--pulsar-url= +|`--pulsar-url=` |The {pulsar-short} broker service URL. -|--pulsar-ssl-provider= +|`--pulsar-ssl-provider=` |The SSL/TLS provider to use. -|--pulsar-ssl-truststore-path= +|`--pulsar-ssl-truststore-path=` |The path to the SSL/TLS truststore file. -|--pulsar-ssl-truststore-password= +|`--pulsar-ssl-truststore-password=` |The password for the SSL/TLS truststore. -|--pulsar-ssl-truststore-type= +|`--pulsar-ssl-truststore-type=` |The type of the SSL/TLS truststore. -|--pulsar-ssl-keystore-path= +|`--pulsar-ssl-keystore-path=` |The path to the SSL/TLS keystore file. -|--pulsar-ssl-keystore-password= +|`--pulsar-ssl-keystore-password=` |The password for the SSL/TLS keystore. -|--pulsar-ssl-cipher-suites= +|`--pulsar-ssl-cipher-suites=` |Defines one or more cipher suites to use for negotiating the SSL/TLS connection. -|--pulsar-ssl-enabled-protocols= +|`--pulsar-ssl-enabled-protocols=` |Enabled SSL/TLS protocols -|--pulsar-ssl-allow-insecure-connections +|`--pulsar-ssl-allow-insecure-connections` |Allows insecure connections to servers whose certificate hasn't been signed by an approved CA. -Always disable `sslAllowInsecureConnection` in production environments. +Always disable `--pulsar-ssl-allow-insecure-connections` in production environments. 
-|--pulsar-ssl-enable-hostname-verification +|`--pulsar-ssl-enable-hostname-verification` |Enable the server hostname verification. -|--pulsar-ssl-tls-trust-certs-path= +|`--pulsar-ssl-tls-trust-certs-path=` |The path to the trusted TLS certificate file. -|--pulsar-ssl-use-key-store-tls +|`--pulsar-ssl-use-key-store-tls` |If TLS is enabled, specifies whether to use KeyStore type as TLS configuration parameter. |=== \ No newline at end of file diff --git a/docs/modules/ROOT/pages/cdc-cassandra-events.adoc b/docs/modules/ROOT/pages/cdc-cassandra-events.adoc index 9f7f3fc2..c224c336 100644 --- a/docs/modules/ROOT/pages/cdc-cassandra-events.adoc +++ b/docs/modules/ROOT/pages/cdc-cassandra-events.adoc @@ -7,7 +7,7 @@ The {product} agent pushes the mutation primary key for the CDC-enabled table in In order to support https://pulsar.apache.org/docs/en/concepts-topic-compaction/[{pulsar-short} Topic Compaction], the message key is encoded separately from the message payload, in the message metadata. -Finally, the following CQL data types are encoded as AVRO logical types: +The following CQL data types are encoded as https://avro.apache.org/docs/current/spec.html#Logical+Types[AVRO logical types]: * `Date` * `Decimal` @@ -16,15 +16,13 @@ Finally, the following CQL data types are encoded as AVRO logical types: * `Varint` * `Uuid`, `timeuuid` -See https://avro.apache.org/docs/current/spec.html#Logical+Types[AVRO Logical Types] for more info on AVRO. - == Change Event's Key For a given table, the change event's key is an AVRO record that contains a field for each column in the primary key of the table at the time the event was created. Both the events and the data topics (also called the dirty and the clean topics) have the same message key, an AVRO record including the primary key columns. 
== `INSERT` Event -Let's create a {cass-short} table to illustrate what happens: +Create a {cass-short} table to test this behavior: [source,bash] ---- @@ -49,7 +47,7 @@ CREATE TABLE ks1.tbl1 ( AND speculative_retry = '99PERCENTILE'; ---- -Then insert a row: +Insert a row: [source,bash] ---- @@ -89,6 +87,10 @@ You can check the connector status with the following command. The connector mus [source,bash] ---- bin/pulsar-admin source status --name cassandra-source-ks1-table1 +---- + +[source,json] +---- { "numInstances" : 1, "numRunning" : 1, @@ -115,22 +117,24 @@ bin/pulsar-admin source status --name cassandra-source-ks1-table1 If you're having issues consuming CDC events, check the source connector logs on your {pulsar-short} function workers and the data topic schema. -=== Check the source connector logs - +Check the source connector logs:: Check the source connector logs on your {pulsar-short} function workers. The name of the logs depends on the connectors' name. - ++ [source,bash] ---- cat logs/functions/public/default/cassandra-source-ks1-table1/cassandra-source-ks1-table1-0.log ---- -=== Check the data topic schema - +Check the data topic schema:: Check the https://pulsar.apache.org/docs/en/schema-manage/[{pulsar-short} schema] to ensure the clean topic matches your CQL table: - ++ [source,bash] ---- bin/pulsar-admin schemas get "persistent://public/default/data-ks1.table1" +---- ++ +[source,json] +---- { "version": 0, "schemaInfo": { @@ -188,8 +192,6 @@ bin/pulsar-admin schemas get "persistent://public/default/data-ks1.table1" } ---- -== What's next? - -For more on change data capture, see xref:cdcExample.adoc[]. 
- +== See also +* xref:ROOT:cdcExample.adoc[] \ No newline at end of file diff --git a/docs/modules/ROOT/pages/cdcExample.adoc b/docs/modules/ROOT/pages/cdcExample.adoc index 52097bb4..5ce6676d 100644 --- a/docs/modules/ROOT/pages/cdcExample.adoc +++ b/docs/modules/ROOT/pages/cdcExample.adoc @@ -229,4 +229,4 @@ Any captured CDC events from your database table should be reflected in the comm == See also -* xref:monitor.adoc[] \ No newline at end of file +* xref:ROOT:monitor.adoc[] \ No newline at end of file diff --git a/docs/modules/ROOT/pages/faqs.adoc b/docs/modules/ROOT/pages/faqs.adoc index cf9d1110..d50e4eec 100644 --- a/docs/modules/ROOT/pages/faqs.adoc +++ b/docs/modules/ROOT/pages/faqs.adoc @@ -23,7 +23,7 @@ From there, the data can be published to external platforms like Elasticsearch, == How do I install {product-short}? -Follow the xref:install.adoc[installation instructions]. +Follow the xref:ROOT:install.adoc[installation instructions]. == What are the requirements for {product-short}? @@ -65,7 +65,7 @@ If the {pulsar-short} cluster is down, the change agent continues trying to send When the disk space of the `cdc_raw` directory reaches your `cdc_total_space_in_mb` {cass-short} setting (less than 4 GB by default), writes to CDC-enabled tables fail with a `CDCWriteException`. The following warning message is included in {cass-short} logs: -[source,bash] +[source,console] ---- WARN [CoreThread-5] 2021-10-29 09:12:52,790 NoSpamLogger.java:98 - Rejecting Mutation containing CDC-enabled table. Free up space in /mnt/data/cdc_raw. ---- @@ -93,7 +93,7 @@ SELECT * FROM system_distributed.cdc_local WHERE keyspace_name = 'keyspace_name' There are three possible statuses: -Enabled:: +`enabled`:: If the CDC status is `enabled`, then CDC is enabled on the table. 
+ From this status, you can disable CDC on the table by running the following CQL query: @@ -103,7 +103,7 @@ From this status, you can disable CDC on the table by running the following CQL ALTER TABLE keyspace_name.table_name WITH cdc = {'enabled': false}; ---- -Disabled:: +`disabled`:: If the CDC status is `disabled` then CDC is disabled on the table. + From this status, you can enable CDC on the table by running the following CQL query: @@ -113,7 +113,7 @@ From this status, you can enable CDC on the table by running the following CQL q ALTER TABLE keyspace_name.table_name WITH cdc = {'enabled': true}; ---- -Null:: +`null`:: If the CDC status is `null` then CDC isn't enabled on the table. + From this status, you can enable CDC on the table by running the following CQL query: @@ -134,7 +134,7 @@ SELECT * FROM system_distributed.cdc_local WHERE keyspace_name = 'cdc' AND table There are three possible statuses: -Running:: +`running`:: If the `status` column is `running`, then the agent is running. + From this status, you can stop the agent by running the following CQL query: @@ -144,7 +144,7 @@ From this status, you can stop the agent by running the following CQL query: ALTER TABLE cdc.raw_cdc WITH cdc = {'enabled': false}; ---- -Stopped:: +`stopped`:: If the `status` column is `stopped` then the agent isn't running. + From this status, you can start the agent by running the following CQL query: @@ -154,7 +154,7 @@ From this status, you can start the agent by running the following CQL query: ALTER TABLE cdc.raw_cdc WITH cdc = {'enabled': true}; ---- -Null:: +`null`:: If the `status` column is `null`, then the agent isn't running. + From this status, you can start the agent by running the following CQL query: @@ -179,7 +179,7 @@ The design of CDC in {cass-short} assumed that when table changes are synchroniz There is a max log size setting that disables writes to the table when the set threshold is reached. 
If a connection to the {pulsar-short} cluster is needed for the log to be drained, and it isn't responsive, then the log begins to fill, which can impact a table's write availability. -For more, see the xref:cdc-for-cassandra:ROOT:install.adoc#scaling-up-your-configuration[Scaling up your CDC configuration]. +For more, see the xref:ROOT:install.adoc#scaling-up-your-configuration[Scaling up your CDC configuration]. == Does the {csc_pulsar_first} use a dead-letter topic? @@ -227,11 +227,11 @@ The most manageable way to handle this is to use the {pulsar-short} {cass-short} The {cass-short} sink requires the following provisions: -- Use the CDC data topic as its source of messages -- Provide a secure bundle (creds) to another {cass-short} cluster -- Map message values to a specific table in the other cluster -- Use the {pulsar-short} delivery guarantee to ensure success -- Use the {pulsar-short} connector health metrics to monitor failures +* Use the CDC data topic as its source of messages +* Provide a secure bundle (creds) to another {cass-short} cluster +* Map message values to a specific table in the other cluster +* Use the {pulsar-short} delivery guarantee to ensure success +* Use the {pulsar-short} connector health metrics to monitor failures == How do I migrate table data using CDC? diff --git a/docs/modules/ROOT/pages/index.adoc b/docs/modules/ROOT/pages/index.adoc index 373c72d3..8e40b25b 100644 --- a/docs/modules/ROOT/pages/index.adoc +++ b/docs/modules/ROOT/pages/index.adoc @@ -17,9 +17,8 @@ Other than the prerequisite {cass-short} and {pulsar-short} clusters, {product-s * {cdc_agent_first}, which is an event producer deployed as a JVM agent on each {cass-short} data node * {csc_pulsar_first}, which is a source connector deployed in your {pulsar-short} cluster -The following diagram describes the general architecture. 
- -image::cdc-for-cassandra-overview.png[] +.General {product-short} architecture +image::ROOT:cdc-for-cassandra-overview.png[] Since version 3.0, {cass-short} has included a change data capture (CDC) feature. The CDC feature can be enabled on the table level by setting the table property `cdc=true`, after which any commit log containing data for a CDC-enabled table is moved to the CDC directory specified in `cassandra.yaml` on discard/flush (default: `cdc_raw`). @@ -36,14 +35,19 @@ When CDC is enabled: It maintains a processing offset for each commit log. If the {cdc_agent} restarts, it picks up where it left off using the recorded offset value. -The following table describes what is published to the data topic for each update to a CDC-enabled {cass-short} table. - -[cols="1,1"] +.Event data published to the data topic for each write to a CDC-enabled {cass-short} table. +[cols=2] |=== -| Type | Event Data -| insert | Key set to primary key of the row, value set to all column values -| update | Key set to primary key of the row, value set to all column values -| delete | Key set to primary key of the row, value set to null +|Type |Event Data + +|insert +|Key set to primary key of the row, value set to all column values + +|update +|Key set to primary key of the row, value set to all column values + +|delete +|Key set to primary key of the row, value set to null |=== The {csc_pulsar} updates the schema registry to dynamically reflect the {cass-short} table schema. 
@@ -60,37 +64,41 @@ For each update to the table, an MD5 digest is calculated to de-duplicate the up === Change Agent deployment matrix -[cols="1,1"] -|=== -| {cass-short} version | Self-managed {pulsar} or IBM Elite Support for {pulsar} (formerly {company} Luna Streaming) agent -| {cass-short} 3.x | {product-repo}/tree/master/agent-c3[agent-c3] -| {cass-short} 4.x | {product-repo}/tree/master/agent-c4[agent-c4] -| {dse-short} 6.8.16 or later | {product-repo}/tree/master/agent-dse4[agent-dse4] +[cols=2] |=== +|{cass-short} version |Self-managed {pulsar} or IBM Elite Support for {pulsar} (formerly {company} Luna Streaming) agent -== Supported streaming platforms +|{cass-short} 3.x +|{product-repo}/tree/master/agent-c3[agent-c3] -* IBM Elite Support for {pulsar} (formerly {company} Luna Streaming) 2.8 and later (current version is {luna_version}) -* Self-managed {pulsar} version 2.8.1 and later +|{cass-short} 4.x +|{product-repo}/tree/master/agent-c4[agent-c4] -=== Connector deployment matrix - -[cols="1"] -|=== -| Self-managed {pulsar} or IBM Elite Support for {pulsar} (formerly {company} Luna Streaming) -| {product-repo}/tree/master/connector[connector] +|{dse-short} 6.8.16 or later +|{product-repo}/tree/master/agent-dse4[agent-dse4] |=== [#supported-databases] -== Supported databases +== Compatibility -* {cass-reg} 3.11.x and 4.x databases -* {dse} 6.8.16 or later +Supported streaming platforms:: ++ +* IBM Elite Support for {pulsar} (formerly {company} Luna Streaming) 2.8 and later (current version is {luna_version}) +* Self-managed {pulsar} version 2.8.1 and later -== Supported {cass-short} data structures +Connectors:: ++ +* {product-repo}/tree/master/connector[Connector for self-managed {pulsar} or IBM Elite Support for {pulsar}] -The following CQL data types are encoded as AVRO logical types: +Supported databases:: ++ +* Open-source {cass-reg} 3.11.x and 4.x +* {dse} 6.8.16 or later +Supported {cass-short} data structures:: +The following CQL data types are 
encoded as AVRO logical types: ++ +-- * ascii (string) * bigint (long) * blob(bytes) @@ -116,17 +124,18 @@ The following CQL data types are encoded as AVRO logical types: * User Defined Types (record) * uuid (uuid) * varint (cql_varint) - -[NOTE] -==== -If using the `key-value-json` output format, the supported {cass-short} types are the same as AVRO. The output is an exact schema with logical types, but with a JSON schema type. -==== - -{cass-short} static columns are supported: - +-- ++ +If using the `key-value-json` output format, the supported {cass-short} types are the same as AVRO. +The output is an exact schema with logical types, but with a JSON schema type. ++ +{cass-short} static columns are supported in the following ways: ++ +-- * On row-level updates, static columns are included in the message value. * On partition-level updates, the clustering keys are null in the message key, and the message value only has static columns on `insert`/`update` operations. - +-- ++ For data types that aren't supported, columns using those data types are omitted from the events sent to the data topic. If a row update contains both supported and unsupported data types, the event includes only columns with supported data types. @@ -189,7 +198,7 @@ It stores the number of milliseconds since January 1, 1970, 00:00:00 GMT as an I == Manage schema updates on topics -Schema registry updates on a {pulsar-short} topic are controlled by the `is-allow-auto-update-schema` option. +Schema registry updates on a {pulsar-short} topic are controlled by the `is-allow-auto-update-schema` option: * `true` allows the broker to register a new schema for a topic and connect the producer if the schema isn't registered. * `false` rejects the producer's connection to the broker if the schema isn't registered. 
@@ -205,4 +214,4 @@ To ensure the data sent to all datacenters are delivered to the data topic, make For example, given a {cass-short} cluster with three datacenters (DC1, DC2, and DC3), you would enable CDC and install the change agent in only DC1. To ensure all updates in DC2 and DC3 are propagated to the data topic, configure the table's keyspace to replicate data from DC2 and DC3 to DC1. For example, `replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3, 'dc3': 3}`. -The data replicated to DC1 is processed by the change agent and eventually end up in the data topic. +The data replicated to DC1 is processed by the change agent and eventually end up in the data topic. \ No newline at end of file diff --git a/docs/modules/ROOT/pages/install.adoc b/docs/modules/ROOT/pages/install.adoc index 69b3a8eb..3723b941 100644 --- a/docs/modules/ROOT/pages/install.adoc +++ b/docs/modules/ROOT/pages/install.adoc @@ -29,14 +29,18 @@ Depending on the workloads of the CDC enabled {cass-short} tables, you might nee . Download the `cassandra-source-agents` tar file from the {product-repo}/releases[{product-short} GitHub repository]. The following files are available in the tar file: + -[cols="1,1"] +[cols=2] |=== -| {cass-short} type | JAR file +|{cass-short} type |JAR file -| {cass} 3.x | `agent-c3-**VERSION**-all.jar` -| {cass} 4.x | `agent-c4-**VERSION**-all.jar` -| {dse-short} 6.8.16 or later | `agent-dse4-**VERSION**-all.jar` +|{cass} 3.x +|`agent-c3-**VERSION**-all.jar` +|{cass} 4.x +|`agent-c4-**VERSION**-all.jar` + +|{dse-short} 6.8.16 or later +|`agent-dse4-**VERSION**-all.jar` |=== . Extract the files from the tar with the following command: @@ -74,9 +78,9 @@ And to enable CDC for your {cass-short} deployment: JVM_OPTS="$JVM_OPTS -javaagent:/home/automaton/cdc104/agent-dse4-pulsar-1.0.5-all.jar" ---- -The CDC parameter mappings between JVM and {cass-short} environment variables are provided in xref:stringMappings.adoc[CDC Environment Parameter Strings]. 
+The CDC parameter mappings between JVM and {cass-short} environment variables are provided in xref:ROOT:stringMappings.adoc[CDC Environment Parameter Strings]. -For the full set of JVM configuration options, see xref:install.adoc#agentParams[]. +For the full set of JVM configuration options, see <<agentParams>>. == cassandra.yaml @@ -128,7 +132,7 @@ For each CDC-enabled table, the change agent sends events to the events topic. The topic name is determined by the `topicPrefix` setting in the agent (default is `events-`). The `.` is appended to the prefix to build the topic name. -You have to specify the following parameters: +The following parameters are required: * Connector name. You have one connector per CDC-enabled {cass-short} table, make sure to use a unique name. * Previously downloaded {csc_pulsar} `NAR` file. @@ -171,12 +175,7 @@ pulsar-admin source status --name cassandra-source-1 Once the connector is running, it processes events from the `events` topic, and then publishes the result to the `data` topic. -For mre information and options, see the following: - -* xref:install.adoc#datastax-cassandra-source-connector-for-apache-pulsar-settings[{csc_pulsar} settings] -* xref:install.adoc#cassandra-authentication-settings[{cass-short} Authentication settings] -* xref:install.adoc#cassandra-ssltls-settings[{cass-short} SSL/TLS settings] -* xref:install.adoc#pass-cdc-for-cassandra-settings-directly-to-the-datastax-java-driver[Pass {csc_pulsar} settings directly to the {company} Java driver] +For all configuration options, see the other information on this page. == Enabling and disabling CDC on a table @@ -191,11 +190,11 @@ ALTER TABLE foo WITH cdc=false; When CDC is enabled on a table, updates to that table are sent by the change agent to the {csc_pulsar} which further processes the event and then sends it to the data topic when it can be processed by other connectors (for example, Elasticsearch). 
-include::partial$cfgCassandraSource.adoc[] +include::ROOT:partial$cfgCassandraSource.adoc[] -include::partial$cfgCassandraAuth.adoc[] +include::ROOT:partial$cfgCassandraAuth.adoc[] -include::partial$cfgCassandraSSL.adoc[] +include::ROOT:partial$cfgCassandraSSL.adoc[] == Pass {csc_pulsar} settings directly to the {company} Java driver @@ -216,29 +215,30 @@ If you don't provide either in your configuration, {csc_pulsar} defaults are app For information about the Java properties, refer to the https://docs.datastax.com/en/developer/java-driver/4.3/manual/core/configuration/reference/index.html[{company} Java driver documentation]. +[cols=2] |=== -| {csc_pulsar} | Using `datastax-java-driver` prefix +|{csc_pulsar} |Using `datastax-java-driver` prefix -| `contactPoints` -| `datastax-java-driver.basic.contact-points` +|`contactPoints` +|`datastax-java-driver.basic.contact-points` -| `loadBalancing.localDc` -| `datastax-java-driver.basic.load-balancing-policy.local-datacenter` +|`loadBalancing.localDc` +|`datastax-java-driver.basic.load-balancing-policy.local-datacenter` -| `cloud.secureConnectBundle` -| `datastax-java-driver.basic.cloud.secure-connect-bundle` +|`cloud.secureConnectBundle` +|`datastax-java-driver.basic.cloud.secure-connect-bundle` -| `queryExecutionTimeout` -| `datastax-java-driver.basic.request.timeout` +|`queryExecutionTimeout` +|`datastax-java-driver.basic.request.timeout` -| `connectionPoolLocalSize` -| `datastax-java-driver.advanced.connection.pool.local.size` +|`connectionPoolLocalSize` +|`datastax-java-driver.advanced.connection.pool.local.size` -| `compression` -| `datastax-java-driver.advanced.protocol.compression` +|`compression` +|`datastax-java-driver.advanced.protocol.compression` -| `metricsHighestLatency` -| `datastax-java-driver.advanced.metrics.session.cql-requests.highest-latency` +|`metricsHighestLatency` +|`datastax-java-driver.advanced.metrics.session.cql-requests.highest-latency` |=== There is a difference between the {csc_pulsar}'s 
`contactPoints` setting and the Java driver's `datastax-java-driver.basic.contact-points`. @@ -270,4 +270,4 @@ To further improve the throughput, you can adjust the `pulsarBatchDelayInMs` in To improve performance on individual connector instances as they read data from {cass-short}, you can adjust the `batch.size` and the `query.executors`. Increasing these values from their defaults increases parallelism within the connector instances. -The de-duplication cache is configurable, including the cache size with `cache.max.capacity`, the entry retention duration `cache.expire.after.ms` and the number of MD5 digest per primary key entry with `cache.max.digest`. +The de-duplication cache is configurable, including the cache size with `cache.max.capacity`, the entry retention duration `cache.expire.after.ms` and the number of MD5 digest per primary key entry with `cache.max.digest`. \ No newline at end of file diff --git a/docs/modules/ROOT/pages/monitor.adoc b/docs/modules/ROOT/pages/monitor.adoc index 61c0acc8..f844163d 100644 --- a/docs/modules/ROOT/pages/monitor.adoc +++ b/docs/modules/ROOT/pages/monitor.adoc @@ -1,8 +1,11 @@ = Monitoring {product-short} +Statistics and metrics are available to help you monitor the performance and health of {product-short}. + == Change Agent Metrics -The change agent is a JVM agent running in {cass-reg} nodes and provides a dedicated MBean type=`CdcAgent` with the following metrics: +The change agent is a JVM agent running in {cass-reg} nodes. +It provides a dedicated MBean `type='CdcAgent'` with the following metrics: [cols="2,1,3"] |=== @@ -55,12 +58,18 @@ The change agent is a JVM agent running in {cass-reg} nodes and provides a dedic == {product-short} stats -The {product-short} framework reports stats for each connector. You can view the stats for a connector like this: +The {product-short} framework reports stats for each connector. 
+You can use `pulsar-admin` commands to get a connector's stats: [source,bash] ---- pulsar-admin source stats --name cassandra-source-1 +---- +The output is a JSON object: + +[source,json] +---- { "numInstances" : 1, "numRunning" : 0, @@ -81,13 +90,14 @@ pulsar-admin source stats --name cassandra-source-1 } ---- -The stats `numReceivedFromSource` and `numWritten` indicate how many events have been processed by the {product-short}. -If the connector has errors, the counts are shown. -A description of the last seen error is displayed in the `error` field. +In this example, the stats `numReceivedFromSource` and `numWritten` indicate how many events have been processed by the {product-short}. + +If the connector has errors, the counts are output. +A description of the last seen error is output to the `error` field. == {product-short} metrics -{product-short} also publishes per message metrics: +{product-short} publishes per-message metrics: [cols="2,3"] |=== @@ -116,11 +126,15 @@ A description of the last seen error is displayed in the `error` field. |=== -Here an example of those user-defined metrics aggregated by {pulsar-reg} when processing 2000 mutations: +Here is an example of user-defined metrics aggregated by {pulsar-reg} when processing 2000 mutations: [source,bash] ---- curl http://localhost:8080/metrics/ 2>/dev/null | grep user +---- + +[source,console] +---- # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds. # HELP pulsar_source_user_metric_ User defined metric. # TYPE pulsar_source_user_metric_ summary @@ -146,7 +160,7 @@ pulsar_source_user_metric__sum{tenant="public",namespace="public/default",name=" == Monitoring and Alerting resources -* The change agent exposes metrics with xref:planning:ROOT:metrics-alerts.adoc[JMX], a technology within Java that provides tools for managing and monitoring applications. 
-* xref:opscenter:overview:opscenter-about.adoc[{opscenter}] can collect these exposed metrics for visualization and alerts, and pass them on to xref:monitoring:ROOT:ops-use-metrics-collector.adoc[{metrics-collector}] for additional integration with https://prometheus.io/docs/introduction/overview/[Prometheus] and https://grafana.com/[Grafana]. +* The change agent xref:dse:managing:operations/monitor-contents.adoc[exposes metrics with JMX], a technology within Java that provides tools for managing and monitoring applications. +* xref:opscenter:overview:opscenter-about.adoc[{opscenter}] can collect these exposed metrics for visualization and alerts, and pass them on to xref:dse:managing:tools/about-metrics-collector.adoc[{metrics-collector}] for additional integration with https://prometheus.io/docs/introduction/overview/[Prometheus] and https://grafana.com/[Grafana]. * The https://github.com/datastax/metric-collector-for-apache-cassandra[Metrics Collector for {cass}] with Prometheus and Grafana dashboards provides the same functionality as {metrics-collector}, built on the well-supported collectd agent. * Other monitoring tools like https://github.com/prometheus/jmx_exporter[JMX Exporter] by Prometheus are available, but they might require additional tuning. 
\ No newline at end of file diff --git a/docs/modules/ROOT/pages/stringMappings.adoc b/docs/modules/ROOT/pages/stringMappings.adoc index 964a041e..796faf88 100644 --- a/docs/modules/ROOT/pages/stringMappings.adoc +++ b/docs/modules/ROOT/pages/stringMappings.adoc @@ -1,148 +1,156 @@ = CDC Change Agent Parameter Mappings -In CDC versions *before 1.0.3*, the {cdc_agent} {pulsar-short} connection parameters were provided as extra JVM options after the `.jar` file name in the form of a comma-separated list of `paramName=paramValue`, as below: - -[source,bash] ----- -export JVM_EXTRA_OPTS="-javaagent:/path/to/agent-c4-luna--all.jar=pulsarServiceUrl=pulsar://pulsar:6650" ----- - -In CDC versions *after 1.0.3*, the {cdc_agent} {pulsar-short} connection parameters are also provided as system environment parameters in `cassandra-env.sh`. The JVM option above is now appended to `cassandra-env.sh` as below: - +The format of parameter mappings depends on the version: + +* Version 1.0.3 and later: ++ +The {cdc_agent} {pulsar-short} connection parameters are provided as system environment parameters in `cassandra-env.sh`. +The JVM options are appended to `cassandra-env.sh`. +For example: ++ [source,bash,subs="+quotes"] ---- export CDC_PULSAR_SERVICE_URL="pulsar://**PULSAR_SERVER_IP**:6650" ---- -This document lists the CDC Change Agent parameter mappings between the JVM option strings and {cass-short} strings. +* Versions earlier than 1.0.3: ++ +The {cdc_agent} {pulsar-short} connection parameters are provided as extra JVM options after the `.jar` file name in the form of a comma-separated list of `paramName=paramValue`. +For example: ++ +[source,bash] +---- +export JVM_EXTRA_OPTS="-javaagent:/path/to/agent-c4-luna--all.jar=pulsarServiceUrl=pulsar://pulsar:6650" +---- -== Change Agent Parameter Mappings +The following table describes the CDC Change Agent parameters, and maps JVM option strings to {cass-short} strings. 
+.Change Agent parameter mappings [cols="2,3,1"] |=== -|JVM Option | Description | System Mapping -| *topicPrefix* -| The event topic name prefix. The `.` is appended to that prefix to build the topic name. -| TOPIC_PREFIX +|JVM Option |Description |System Mapping +|topicPrefix +|The event topic name prefix. The `.` is appended to that prefix to build the topic name. +|TOPIC_PREFIX -| *cdcWorkingDir* -| The CDC working directory where the last sent offset is saved, and where the archived and errored commitlogs files are copied. -| CDC_WORKING_DIR +|cdcWorkingDir +|The CDC working directory where the last sent offset is saved, and where the archived and errored commitlogs files are copied. +|CDC_WORKING_DIR -| *cdcPollIntervalMs* -| The poll interval in milliseconds for watching new commitlog files in the CDC raw directory. -| CDC_DIR_POLL_INTERNAL_MS +|cdcPollIntervalMs +|The poll interval in milliseconds for watching new commitlog files in the CDC raw directory. +|CDC_DIR_POLL_INTERNAL_MS -| *errorCommitLogReprocessEnabled* -| Enable the re-processing of error commitlogs files. -| ERROR_COMMITLOG_REPROCESS_ENABLED +|errorCommitLogReprocessEnabled +|Enable the re-processing of error commitlogs files. +|ERROR_COMMITLOG_REPROCESS_ENABLED -| *cdcConcurrentProcessors* -| The number of threads used to process commitlog files. The default value is the `memtable_flush_writers`. -| CDC_CONCURRENT_PROCESSORS +|cdcConcurrentProcessors +|The number of threads used to process commitlog files. The default value is the `memtable_flush_writers`. +|CDC_CONCURRENT_PROCESSORS -| *maxInflightMessagesPerTask* -| The maximum number of in-flight messages per commitlog processing task. -| MAX_INFLIGHT_MESSAGES_PER_TASK +|maxInflightMessagesPerTask +|The maximum number of in-flight messages per commitlog processing task. +|MAX_INFLIGHT_MESSAGES_PER_TASK -| *pulsarServiceUrl* -| The {pulsar-short} broker service URL. 
-| PULSAR_SERVICE_URL +|pulsarServiceUrl +|The {pulsar-short} broker service URL. +|PULSAR_SERVICE_URL -| *pulsarBatchDelayInMs* -| {pulsar-short} batching delay in milliseconds. {pulsar-short} batching is enabled when this value is greater than zero. -| PULSAR_BATCH_DELAY_IN_MS +|pulsarBatchDelayInMs +|{pulsar-short} batching delay in milliseconds. {pulsar-short} batching is enabled when this value is greater than zero. +|PULSAR_BATCH_DELAY_IN_MS -| *pulsarKeyBasedBatcher* -| When true, use the {pulsar-short} KEY_BASED BatchBuilder. -| PULSAR_KEY_BASED_BATCHER +|pulsarKeyBasedBatcher +|When true, use the {pulsar-short} KEY_BASED BatchBuilder. +|PULSAR_KEY_BASED_BATCHER -| *pulsarMaxPendingMessages* -| The {pulsar-short} maximum size of a queue holding pending messages. -| PULSAR_MAX_PENDING_MESSAGES +|pulsarMaxPendingMessages +|The {pulsar-short} maximum size of a queue holding pending messages. +|PULSAR_MAX_PENDING_MESSAGES -| *pulsarMaxPendingMessagesAcrossPartitions* -| The {pulsar-short} maximum number of pending messages across partitions. -| PULSAR_MAX_PENDING_MESSAGES_ACROSS_PARTITIONS +|pulsarMaxPendingMessagesAcrossPartitions +|The {pulsar-short} maximum number of pending messages across partitions. +|PULSAR_MAX_PENDING_MESSAGES_ACROSS_PARTITIONS -| *pulsarAuthPluginClassName* -| The {pulsar-short} authentication plugin class name. -| PULSAR_AUTH_PLUGIN_CLASS_NAME +|pulsarAuthPluginClassName +|The {pulsar-short} authentication plugin class name. +|PULSAR_AUTH_PLUGIN_CLASS_NAME -| *pulsarAuthParams* -| The {pulsar-short} authentication parameters. -| PULSAR_AUTH_PARAMS +|pulsarAuthParams +|The {pulsar-short} authentication parameters. +|PULSAR_AUTH_PARAMS -| *sslProvider* -| The SSL/TLS provider to use. -| SSL_PROVIDER +|sslProvider +|The SSL/TLS provider to use. +|SSL_PROVIDER -| *sslTruststorePath* -| The path to the SSL/TLS truststore file. -| SSL_TRUSTSTORE_PATH +|sslTruststorePath +|The path to the SSL/TLS truststore file. 
+|SSL_TRUSTSTORE_PATH -| *sslTruststorePassword* -| The password for the SSL/TLS truststore. -| SSL_TRUSTSTORE_PASSWORD +|sslTruststorePassword +|The password for the SSL/TLS truststore. +|SSL_TRUSTSTORE_PASSWORD -| *sslTruststoreType* -| The type of the SSL/TLS truststore. -| SSL_TRUSTSTORE_TYPE +|sslTruststoreType +|The type of the SSL/TLS truststore. +|SSL_TRUSTSTORE_TYPE -| *sslKeystorePath* -| The path to the SSL/TLS keystore file. -| SSL_KEYSTORE_PATH +|sslKeystorePath +|The path to the SSL/TLS keystore file. +|SSL_KEYSTORE_PATH -| *sslKeystorePassword* -| The password for the SSL/TLS keystore. -| SSL_KEYSTORE_PASSWORD +|sslKeystorePassword +|The password for the SSL/TLS keystore. +|SSL_KEYSTORE_PASSWORD -| *sslCipherSuites* -| Defines one or more cipher suites to use for negotiating the SSL/TLS connection. -| SSL_CIPHER_SUITES +|sslCipherSuites +|Defines one or more cipher suites to use for negotiating the SSL/TLS connection. +|SSL_CIPHER_SUITES -| *sslEnabledProtocols* -| Enabled SSL/TLS protocols -| SSL_ENABLED_PROTOCOLS +|sslEnabledProtocols +|Enabled SSL/TLS protocols +|SSL_ENABLED_PROTOCOLS -| *sslAllowInsecureConnection* -| Allows insecure connections to servers whose certificate hasn't been signed by an approved CA. You should always disable `sslAllowInsecureConnection` in production environments. -| SSL_ALLOW_INSECURE_CONNECTION +|sslAllowInsecureConnection +|Allows insecure connections to servers whose certificate hasn't been signed by an approved CA. You should always disable `sslAllowInsecureConnection` in production environments. +|SSL_ALLOW_INSECURE_CONNECTION -| *sslHostnameVerificationEnable* -| Enable the server hostname verification. -| SSL_HOSTNAME_VERIFICATION_ENABLE +|sslHostnameVerificationEnable +|Enable the server hostname verification. +|SSL_HOSTNAME_VERIFICATION_ENABLE -| *tlsTrustCertsFilePath* -| The path to the trusted TLS certificate file. 
-| TLS_TRUST_CERTS_FILE_PATH +|tlsTrustCertsFilePath +|The path to the trusted TLS certificate file. +|TLS_TRUST_CERTS_FILE_PATH -| *useKeyStoreTls* -| Enable or disable TLS keystore. -| USE_KEYSTORE_TLS -|=== +|useKeyStoreTls +|Enable or disable TLS keystore. +|USE_KEYSTORE_TLS +|=== \ No newline at end of file