Skip to content

Latest commit

 

History

History
299 lines (244 loc) · 9.72 KB

File metadata and controls

299 lines (244 loc) · 9.72 KB

Using {pulsar-short} SQL with {product}

{pulsar-short} SQL allows enterprises to query {pulsar} topic data with SQL. This is a powerful feature for an Enterprise, and SQL is a language they’re likely familiar with. Stream processing, real-time analytics, and highly customized dashboards are just a few of the possibilities. {pulsar-short} offers a pre-made plugin for Trino that is included in its distribution. Additionally, {pulsar-short} has built-in options to create Trino workers and automatically configure the communications between {pulsar-short}'s ledger and Trino.

Note

ROOT:partial$luna-name.adoc

In this guide, we will use the {product} Helm Chart to install a {pulsar-short} cluster with {pulsar-short} SQL. The Trino coordinator and desired number of workers will be created directly in the cluster.

Prerequisites

You will need the following prerequisites in place to complete this guide:

  • {pulsar-short} CLI

  • Presto CLI (this example version 0.278.1)

  • Helm 3 CLI (this example uses version 3.8.0)

  • Kubectl CLI (this example uses version 1.23.4)

  • Enough access to a K8s cluster to create a namespace, deployments, and pods

Important

PrestoDB has been replaced by Trino, but {pulsar} is using Presto’s version. The Trino CLI uses the "X-TRINO-USER" header for authentications but Presto expects "X-PRESTO-USER", which is why we use the Presto CLI.

Install {product} Helm chart

ROOT:partial$install-helm.adoc

Forward service port

You’ll need to interact with services in the K8s cluster. Map a few ports to those services. There’s no need to forward {pulsar-short}'s messaging service ports.

In a new terminal window, port forward the Presto SQL service:

kubectl port-forward -n datastax-pulsar service/pulsar-sql 8090:8090

In a separate terminal, port forward {pulsar-short}'s admin service:

kubectl port-forward -n datastax-pulsar service/pulsar-broker 8080:8080

Confirm Presto is available

  1. In a browser, navigate to http://localhost:8090. You will be greeted by Presto’s login.

    Presto SQL login
  2. There is no authentication enabled, so input whatever user name you prefer and select Log In. You will be greeted by a blank dashboard. This confirms Presto is up and running.

    Presto SQL dashboard

Fill a topic with the data-generator source

In this example, we will use the "data-generator" source connector to create a topic and add sample data simultaneously. The minimalist Helm chart values use the datastax/lunastreaming-all image, which includes all supported {pulsar-short} connectors. This example uses the "public" tenant and "default" namespace. These are created by default in {pulsar-short}, but you can use whatever tenant and namespace you are comfortable with.

  1. Download the minimalist {pulsar-short} client. This "client.conf" assumes the port forwarding addresses we will perform in the next step.

    wget https://raw.githubusercontent.com/datastaxdevs/luna-streaming-examples/main/client.conf
  2. Set the client environment variable. Replace the value with the absolute path to the "client.conf" you just downloaded.

    # the client conf path must be an absolute path to the downloaded conf file
    export PULSAR_CLIENT_CONF=<REPLACE_WITH /path/to/client.conf>
  3. Navigate to the {pulsar-short} home folder and run the following command. The CLI will use the environment variable’s value as configuration for interacting with the {pulsar-short} cluster.

    ./bin/pulsar-admin sources create --name generator --destination-topic-name public/default/mytopic --source-type data-generator
    Note

    The topic name will become a table name in Trino, so SQL naming rules apply.

    Names can contain only alphanumeric characters and must begin with an alphabetic character or an underscore (_).

  4. Confirm the topic was created.

./bin/pulsar-admin topics list public/default

You should get an output of:

persistent://public/default/mytopic

Interact with the topic data

  1. Open a Presto shell session using the Presto CLI. The server’s port number should match the forwarded port number from the previous step. The user can match the name you used to login earlier in this guide, but doesn’t have to.

    ./presto --user test --server localhost:8090
  2. Once inside the shell, try listing the catalogs Presto has loaded:

    presto> show catalogs;

    Notice the similarities between your {pulsar-short} tenant/namespaces and Presto’s output:

    Result
     Catalog
    ---------
     pulsar
     system
    (2 rows)
    
    Query 20230103_163242_00000_zvk84, FINISHED, 2 nodes
  3. List the tables within the pulsar."public/default" catalog/schema:

    presto> show tables in pulsar."public/default";

    Notice the topic that you created from the data-generator source connector earlier in this guide:

      Table
    ---------
     mytopic
    (1 row)
    
    Query 20230103_163355_00001_zvk84, FINISHED, 2 nodes
  4. Query the table to see the first 10 rows of data:

    presto> select * from pulsar."public/default".mytopic limit 10;

    The output should be the 10 messages that were added to the {pulsar-short} topic previously.

    If you prefer, you can query your table with the Presto client REST API. The response will include a nextUri value. Follow that link to see the results.

    POST http://localhost:8090/v1/statement  HTTP/1.1
    content-type: application/json
    X-Presto-User: test
    
    select * from pulsar."public/default".mytopic limit 10
  5. Exit the presto shell:

    presto> exit

You have successfully interacted with a {pulsar-short} cluster via SQL.

Want to put your new learnings to the test? Try using the Presto plugin in Redash or Superset to create useful dashboards.

Why are there quotes around the schema name?

You might wonder why there are quotes ("") around the schema name. This is a result of mapping Presto primitives to {pulsar-short}'s primitives.

Presto has catalogs, schemas, and tables. {pulsar-short} has tenants, namespaces, and topics. The {pulsar-short} Presto plugin assumes the catalog name which leaves schema and table, so the tenant and namespace are combined with a forward slash delimited string. Presto has to see that combination as a single string, which means it needs to be wrapped in quotes.

Connect with JDBC

This example queries messages from a Java app using the Presto JDBC client. You’ll need the Java runtime to run the example.

  1. Create a file named "PrestoExample.java" with the following contents. Remember to update the JDBC connection with the correct values for your environment.

    import java.sql.*;
    
    class PrestoExample {
      public static void main(String[] args) {
        try {
          Connection conn = DriverManager.getConnection("jdbc:presto://localhost:8090/pulsar?user=test");
          System.out.println("Connection established......");
    
          Statement stmt = conn.createStatement();
    
          try {
            ResultSet rs = stmt.executeQuery("select * from pulsar.\"public/default\".mytopic limit 10");
            while(rs.next()) {
                String str = rs.getString(1);
                System.out.println(String.format("%s", str));
            }
          }
          finally {
            stmt.close();
            conn.close();
          }
        }
        catch (Exception e) {
          e.printStackTrace();
        }
      }
    }
  2. Build the Java class.

    javac PrestoExample.java
  3. Download the version 334 Presto JDBC driver to the same folder as the Java class.

  4. Run the client and observe the output. -cp tells Java to use the downloaded presto-jdbc jar in your classpath to connect.

    java -cp .:presto-jdbc-334.jar PrestoExample
    Result
    Connection established......
    {apartmentNumber=235, city=San Francisco, postalCode=29223, street=Tabor Court, streetNumber=94}
    {apartmentNumber=260, city=Miami, postalCode=42228, street=Tabor Court, streetNumber=92}
    {apartmentNumber=279, city=Washington, postalCode=22821, street=Summer Place, streetNumber=126}
    {apartmentNumber=220, city=Miami, postalCode=71871, street=Tabor Court, streetNumber=64}
    {apartmentNumber=294, city=Miami, postalCode=69067, street=Highland Place, streetNumber=167}
    {apartmentNumber=142, city=Washington, postalCode=32774, street=Aster Court, streetNumber=189}
    {apartmentNumber=233, city=Miami, postalCode=43499, street=Atkins Avenue, streetNumber=30}
    {apartmentNumber=, city=Washington, postalCode=92651, street=Tabor Court, streetNumber=49}
    {apartmentNumber=225, city=Washington, postalCode=64877, street=Tabor Court, streetNumber=160}
    {apartmentNumber=66, city=New York, postalCode=45519, street=Stillwell Avenue, streetNumber=23}

Clean up

ROOT:partial$cleanup-terminal-and-helm.adoc