
[pull] master from cube-js:master #441

Merged
pull[bot] merged 1 commit into code:master from cube-js:master on May 1, 2026

Conversation


pull[bot] commented May 1, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

* docs: document Kafka streams mode for ksqlDB integration

Add documentation for the Kafka streams mode, where Cube reads data
directly from Kafka topics instead of going through the ksqlDB REST API
for data streaming. In this mode, Cube does not create any tables or
streams in ksqlDB.

The documentation covers:
- What Kafka streams mode is and how it differs from the default mode
- When to use it (read-only ksqlDB, higher throughput, restricted permissions)
- How to enable it via CUBEJS_DB_KAFKA_* environment variables
- How it works under the hood (metadata from ksqlDB, data from Kafka)
- Configuration via driverFactory for programmatic setup

Also fixes incorrect 'Possible Values' descriptions for CUBEJS_DB_USER
and CUBEJS_DB_PASS in the env vars table.

Updated both the Mintlify docs (docs-mintlify/) and the legacy Nextra
docs (docs/) for consistency.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
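The enablement the commit describes might look like the following `.env` fragment. This is a sketch: the commit only confirms the `CUBEJS_DB_KAFKA_*` prefix, so the specific variable names and values below are illustrative.

```dotenv
# Illustrative .env fragment — only the CUBEJS_DB_KAFKA_* prefix is
# confirmed by the commit; exact names are in the docs it adds.
# Kafka brokers Cube reads topic data from directly:
CUBEJS_DB_KAFKA_HOST=broker1.example.com:9092,broker2.example.com:9092
CUBEJS_DB_KAFKA_USER=kafka_user
CUBEJS_DB_KAFKA_PASS=********
CUBEJS_DB_KAFKA_USE_SSL=true
```

With these set, metadata still comes from the ksqlDB connection while topic data bypasses the ksqlDB REST API.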

* docs: replace driverFactory with data modeling example for Kafka streams mode

Remove the driverFactory configuration section and replace it with a
Data modeling section that shows:
- How to configure ksqlDB as a named data source using decorated
  environment variables (CUBEJS_DS_KSQL_DB_*)
- How to create a cube with data_source: ksql that references an
  existing ksqlDB stream or table
- A complete cube definition with measures, dimensions, and a streaming
  pre-aggregation in both YAML and JavaScript

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
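A minimal sketch of the data modeling section this commit adds (cube, stream, and column names are illustrative; the data source is assumed to be registered with `CUBEJS_DS_KSQL_DB_TYPE=ksql` and related decorated variables, per the `CUBEJS_DS_KSQL_DB_*` convention named above):

```yaml
cubes:
  - name: orders
    data_source: ksql
    # References an existing ksqlDB stream; Cube does not create it.
    sql: SELECT * FROM ORDERS_STREAM

    measures:
      - name: count
        type: count

    dimensions:
      - name: status
        sql: STATUS
        type: string

      - name: created_at
        sql: CREATED_AT
        type: time

    pre_aggregations:
      - name: orders_by_status
        type: rollup
        measures: [count]
        dimensions: [status]
        time_dimension: created_at
        granularity: hour
```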

* docs: add lambda pre-aggregation example with batch + streaming cubes

Replace the simple single-cube example with a full lambda pre-aggregation
pattern showing:
- A batch cube (order_events) querying a warehouse with FILTER_PARAMS,
  incremental daily partitions, and a rollup_lambda that merges batch
  and streaming rollups
- A streaming cube (order_events_stream) with data_source: ksql pointing
  at an existing ksqlDB stream, using read_only: true, stream_offset,
  unique_key_columns, and incremental refresh
- Documentation of key streaming pre-aggregation properties (read_only,
  stream_offset, unique_key_columns)

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
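Pulling the pieces the commit lists into one sketch (the cube names follow the commit; the warehouse schema, columns, and parameter values are illustrative):

```yaml
cubes:
  # Batch cube querying the warehouse.
  - name: order_events
    sql: >
      SELECT * FROM warehouse.order_events
      WHERE {{ FILTER_PARAMS.order_events.timestamp.filter('timestamp') }}

    measures:
      - name: count
        type: count

    dimensions:
      - name: order_id
        sql: order_id
        type: number
        primary_key: true

      - name: timestamp
        sql: timestamp
        type: time

    pre_aggregations:
      - name: batch
        type: rollup
        measures: [count]
        dimensions: [order_id]
        time_dimension: timestamp
        granularity: day
        partition_granularity: day
        incremental: true

      # Merges the batch rollup with the streaming rollup below.
      - name: lambda
        type: rollup_lambda
        rollups:
          - batch
          - order_events_stream.stream

  # Streaming cube pointing at an existing ksqlDB stream.
  - name: order_events_stream
    data_source: ksql
    sql: SELECT * FROM ORDER_EVENTS_STREAM

    measures:
      - name: count
        type: count

    dimensions:
      - name: order_id
        sql: ORDER_ID
        type: number
        primary_key: true

      - name: timestamp
        sql: TIMESTAMP
        type: time

    pre_aggregations:
      - name: stream
        type: rollup
        measures: [count]
        dimensions: [order_id]
        time_dimension: timestamp
        granularity: day
        partition_granularity: day
        read_only: true
        stream_offset: earliest
        unique_key_columns: [order_id]
        incremental: true
```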

* docs: remove external: true from pre-aggregation examples

It is true by default, so specifying it is redundant.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: explain unique key columns, stream format, and filtering

Add three new subsections to the Kafka streams mode documentation:

- Unique key columns and deduplication: explains how __seq column is
  appended from Kafka offset, deduplication happens at read/compaction
  time (last row per key wins), and key values can fall back to Kafka
  message key when missing from payload
- Stream format: documents the expected JSON object format for Kafka
  message values, case-sensitivity of field names, and optional message
  key parsing
- Filtering on the stream: explains that Cube Store applies SELECT
  projections and WHERE filters from the cube's sql property directly
  on each micro-batch of Kafka messages, without creating any objects
  in ksqlDB

Also expands stream_offset documentation to explain defaults and
automatic resume behavior on subsequent refreshes.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
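The deduplication semantics described above can be sketched in a few lines of JavaScript. This is an illustration of the "last row per key wins" rule, not Cube Store's actual implementation: each message carries a `__seq` value taken from its Kafka offset, and at read/compaction time only the highest-`__seq` row per unique key survives.

```javascript
// Illustrative sketch of read-time deduplication (not Cube Store's
// implementation): keep the row with the highest __seq per unique key.
function dedupe(rows, keyColumns) {
  const latest = new Map();
  for (const row of rows) {
    const key = JSON.stringify(keyColumns.map((c) => row[c]));
    const prev = latest.get(key);
    if (!prev || row.__seq > prev.__seq) latest.set(key, row);
  }
  return [...latest.values()].sort((a, b) => a.__seq - b.__seq);
}

const rows = [
  { order_id: 1, status: "created", __seq: 100 },
  { order_id: 2, status: "created", __seq: 101 },
  { order_id: 1, status: "completed", __seq: 102 }, // last write wins
];
console.log(dedupe(rows, ["order_id"]));
```

Both updates for `order_id: 1` collapse to the `completed` row, mirroring how a later Kafka offset supersedes an earlier one for the same key.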

* docs: document stream filtering limitations and supported SQL syntax

Add a 'Supported SQL syntax' subsection under 'Filtering on the stream'
that documents:
- The strict plan shape requirement (Projection > Filter > TableScan)
- Supported clauses: SELECT, WHERE with comparisons/boolean logic,
  IS NULL, IN, BETWEEN, CASE, CAST, EXTRACT, SUBSTRING, scalar
  functions, CONVERT_TZ, nested expressions
- Unsupported clauses: JOIN, subqueries, GROUP BY, HAVING, aggregates,
  ORDER BY, LIMIT/OFFSET, UNION/INTERSECT/EXCEPT, window functions,
  multiple FROM/WHERE, CTEs
- Alias requirement for non-column expressions
- Unique key column expression constraints

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
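A query that fits the documented plan shape might look like this sketch (stream and column names are illustrative). Note the aliases on the non-column expressions, per the alias requirement above, and the absence of JOINs, aggregates, GROUP BY, and ORDER BY:

```sql
-- Fits the Projection > Filter > TableScan shape described above.
SELECT
  ORDER_ID,
  STATUS,
  CAST(AMOUNT AS double) AS amount,                              -- aliased
  CASE WHEN AMOUNT > 100 THEN 'large' ELSE 'small' END AS bucket -- aliased
FROM ORDER_EVENTS_STREAM
WHERE STATUS IN ('created', 'completed')
  AND AMOUNT IS NOT NULL
```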

* docs: document time dimension truncation and ksql timestamp UDFs

Add PARSE_TIMESTAMP and FORMAT_TIMESTAMP to the supported functions
list, and add a paragraph explaining that time dimension truncation
(granularity) is fully supported via the
PARSE_TIMESTAMP(FORMAT_TIMESTAMP(CONVERT_TZ(...))) expression chain
that Cube generates automatically and Cube Store evaluates natively
as custom UDFs in its post-processing engine.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
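The generated chain has roughly this shape for a day granularity (a sketch only: the exact format strings, timezone arguments, and alias are illustrative, not the verbatim output Cube generates):

```sql
-- Illustrative shape of the auto-generated truncation expression.
PARSE_TIMESTAMP(
  FORMAT_TIMESTAMP(
    CONVERT_TZ("timestamp", 'UTC', 'UTC'),
    'yyyy-MM-dd'
  ),
  'yyyy-MM-dd'
) AS "timestamp_day"
```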

* docs: document timestamp handling for Kafka streams

Add a 'Timestamp handling' subsection under 'Stream format' that covers:
- String timestamps: ISO 8601 / RFC 3339 formats with examples
- Numeric timestamps: epoch milliseconds (not seconds/microseconds)
- PARSE_TIMESTAMP for converting non-standard timestamp formats
- Time dimension truncation via granularity (auto-generated
  PARSE_TIMESTAMP/FORMAT_TIMESTAMP/CONVERT_TZ chain)
- date_trunc availability as a standard SQL function

Move time dimension truncation docs from the Supported SQL syntax
section into the Timestamp handling section where it fits better.
Add date_trunc to the supported functions list.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
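As an example of the two accepted forms (field names are illustrative): `created_at` below is an ISO 8601 / RFC 3339 string, while `updated_at` is epoch milliseconds — epoch seconds or microseconds would be misread, per the rule above.

```json
{
  "order_id": 42,
  "status": "completed",
  "created_at": "2026-05-01T12:34:56.000Z",
  "updated_at": 1745584496000
}
```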

* docs: explain primary key requirement for ungrouped streaming queries

Add 'Primary key and ungrouped queries' subsection explaining:
- Cube Store's stream post-processing does not support GROUP BY
- Cube omits GROUP BY when at least one dimension has primary_key: true
- This makes the query a simple SELECT ... FROM ... eligible for
  read-only streaming
- Without a primary key dimension, GROUP BY is generated and the
  pre-aggregation cannot use the streaming path

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: clarify that all primary key columns must be included

Update the ungrouped query requirement to state that all primary key
columns must be present in the streaming pre-aggregation's dimensions
list, not just one.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
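A sketch of the requirement as updated (names are illustrative): with a composite primary key, every `primary_key: true` column must appear in the streaming pre-aggregation's `dimensions` list so that Cube can omit GROUP BY and take the read-only streaming path.

```yaml
cubes:
  - name: order_events_stream
    data_source: ksql
    sql: SELECT * FROM ORDER_EVENTS_STREAM

    dimensions:
      # Composite primary key: both columns below must be listed in the
      # streaming pre-aggregation's dimensions.
      - name: order_id
        sql: ORDER_ID
        type: number
        primary_key: true

      - name: line_item_id
        sql: LINE_ITEM_ID
        type: number
        primary_key: true

      - name: status
        sql: STATUS
        type: string

      - name: timestamp
        sql: TIMESTAMP
        type: time

    pre_aggregations:
      - name: stream
        type: rollup
        # All primary key columns included, so Cube omits GROUP BY and
        # the query stays eligible for read-only streaming.
        dimensions: [order_id, line_item_id, status]
        time_dimension: timestamp
        granularity: day
```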

* docs: add YAML model examples alongside JavaScript

Wrap the data modeling example in CodeGroup/CodeTabs with both YAML and
JavaScript versions of the batch cube (order_events) and streaming cube
(order_events_stream) including all pre-aggregation configuration.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
pull[bot] locked and limited conversation to collaborators May 1, 2026
pull[bot] added the ⤵️ pull label May 1, 2026
pull[bot] merged commit 98128af into code:master May 1, 2026
