{pulsar-short} SQL allows enterprises to query {pulsar} topic data with SQL. This is a powerful feature for an Enterprise, and SQL is a language they’re likely familiar with. Stream processing, real-time analytics, and highly customized dashboards are just a few of the possibilities. {pulsar-short} offers a pre-made plugin for Trino that is included in its distribution. Additionally, {pulsar-short} has built-in options to create Trino workers and automatically configure the communications between {pulsar-short}'s ledger and Trino.
|
Note
|
ROOT:partial$luna-name.adoc |
In this guide, we will use the {product} Helm Chart to install a {pulsar-short} cluster with {pulsar-short} SQL. The Trino coordinator and desired number of workers will be created directly in the cluster.
You will need the following prerequisites in place to complete this guide:
-
{pulsar-short} CLI
-
Presto CLI (this example version 0.278.1)
-
Helm 3 CLI (this example uses version 3.8.0)
-
Kubectl CLI (this example uses version 1.23.4)
-
Enough access to a K8s cluster to create a namespace, deployments, and pods
|
Important
|
PrestoDB has been replaced by Trino, but {pulsar} is using Presto’s version. The Trino CLI uses the "X-TRINO-USER" header for authentications but Presto expects "X-PRESTO-USER", which is why we use the Presto CLI. |
You’ll need to interact with services in the K8s cluster. Map a few ports to those services. There’s no need to forward {pulsar-short}'s messaging service ports.
In a new terminal window, port forward the Presto SQL service:
kubectl port-forward -n datastax-pulsar service/pulsar-sql 8090:8090In a separate terminal, port forward {pulsar-short}'s admin service:
kubectl port-forward -n datastax-pulsar service/pulsar-broker 8080:8080In this example, we will use the "data-generator" source connector to create a topic and add sample data simultaneously. The minimalist Helm chart values use the datastax/lunastreaming-all image, which includes all supported {pulsar-short} connectors. This example uses the "public" tenant and "default" namespace. These are created by default in {pulsar-short}, but you can use whatever tenant and namespace you are comfortable with.
-
Download the minimalist {pulsar-short} client. This "client.conf" assumes the port forwarding addresses we will perform in the next step.
wget https://raw.githubusercontent.com/datastaxdevs/luna-streaming-examples/main/client.conf
-
Set the client environment variable. Replace the value with the absolute path to the "client.conf" you just downloaded.
# the client conf path must be an absolute path to the downloaded conf file export PULSAR_CLIENT_CONF=<REPLACE_WITH /path/to/client.conf>
-
Navigate to the {pulsar-short} home folder and run the following command. The CLI will use the environment variable’s value as configuration for interacting with the {pulsar-short} cluster.
./bin/pulsar-admin sources create --name generator --destination-topic-name public/default/mytopic --source-type data-generator
NoteThe topic name will become a table name in Trino, so SQL naming rules apply.
Names can contain only alphanumeric characters and must begin with an alphabetic character or an underscore (_).
-
Confirm the topic was created.
./bin/pulsar-admin topics list public/defaultYou should get an output of:
persistent://public/default/mytopic-
Open a Presto shell session using the Presto CLI. The server’s port number should match the forwarded port number from the previous step. The user can match the name you used to login earlier in this guide, but doesn’t have to.
./presto --user test --server localhost:8090 -
Once inside the shell, try listing the catalogs Presto has loaded:
presto> show catalogs;
Notice the similarities between your {pulsar-short} tenant/namespaces and Presto’s output:
ResultCatalog --------- pulsar system (2 rows) Query 20230103_163242_00000_zvk84, FINISHED, 2 nodes
-
List the tables within the
pulsar."public/default"catalog/schema:presto> show tables in pulsar."public/default";
Notice the topic that you created from the data-generator source connector earlier in this guide:
Table --------- mytopic (1 row) Query 20230103_163355_00001_zvk84, FINISHED, 2 nodes
-
Query the table to see the first 10 rows of data:
presto> select * from pulsar."public/default".mytopic limit 10;
The output should be the 10 messages that were added to the {pulsar-short} topic previously.
If you prefer, you can query your table with the Presto client REST API. The response will include a
nextUrivalue. Follow that link to see the results.POST http://localhost:8090/v1/statement HTTP/1.1 content-type: application/json X-Presto-User: test select * from pulsar."public/default".mytopic limit 10
-
Exit the presto shell:
presto> exit
You have successfully interacted with a {pulsar-short} cluster via SQL.
Want to put your new learnings to the test? Try using the Presto plugin in Redash or Superset to create useful dashboards.
You might wonder why there are quotes ("") around the schema name. This is a result of mapping Presto primitives to {pulsar-short}'s primitives.
Presto has catalogs, schemas, and tables. {pulsar-short} has tenants, namespaces, and topics. The {pulsar-short} Presto plugin assumes the catalog name which leaves schema and table, so the tenant and namespace are combined with a forward slash delimited string. Presto has to see that combination as a single string, which means it needs to be wrapped in quotes.
This example queries messages from a Java app using the Presto JDBC client. You’ll need the Java runtime to run the example.
-
Create a file named "PrestoExample.java" with the following contents. Remember to update the JDBC connection with the correct values for your environment.
import java.sql.*; class PrestoExample { public static void main(String[] args) { try { Connection conn = DriverManager.getConnection("jdbc:presto://localhost:8090/pulsar?user=test"); System.out.println("Connection established......"); Statement stmt = conn.createStatement(); try { ResultSet rs = stmt.executeQuery("select * from pulsar.\"public/default\".mytopic limit 10"); while(rs.next()) { String str = rs.getString(1); System.out.println(String.format("%s", str)); } } finally { stmt.close(); conn.close(); } } catch (Exception e) { e.printStackTrace(); } } }
-
Build the Java class.
javac PrestoExample.java
-
Download the version 334 Presto JDBC driver to the same folder as the Java class.
-
Run the client and observe the output.
-cptells Java to use the downloaded presto-jdbc jar in your classpath to connect.java -cp .:presto-jdbc-334.jar PrestoExample
ResultConnection established...... {apartmentNumber=235, city=San Francisco, postalCode=29223, street=Tabor Court, streetNumber=94} {apartmentNumber=260, city=Miami, postalCode=42228, street=Tabor Court, streetNumber=92} {apartmentNumber=279, city=Washington, postalCode=22821, street=Summer Place, streetNumber=126} {apartmentNumber=220, city=Miami, postalCode=71871, street=Tabor Court, streetNumber=64} {apartmentNumber=294, city=Miami, postalCode=69067, street=Highland Place, streetNumber=167} {apartmentNumber=142, city=Washington, postalCode=32774, street=Aster Court, streetNumber=189} {apartmentNumber=233, city=Miami, postalCode=43499, street=Atkins Avenue, streetNumber=30} {apartmentNumber=, city=Washington, postalCode=92651, street=Tabor Court, streetNumber=49} {apartmentNumber=225, city=Washington, postalCode=64877, street=Tabor Court, streetNumber=160} {apartmentNumber=66, city=New York, postalCode=45519, street=Stillwell Avenue, streetNumber=23}

