
[pull] master from cube-js:master #441

Merged
pull[bot] merged 1 commit into code:master from cube-js:master on May 1, 2026

Conversation


pull[bot] commented May 1, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

* docs: document Kafka streams mode for ksqlDB integration

Add documentation for the Kafka streams mode, where Cube reads data
directly from Kafka topics instead of going through the ksqlDB REST API
for data streaming. In this mode, Cube does not create any tables or
streams in ksqlDB.

The documentation covers:
- What Kafka streams mode is and how it differs from the default mode
- When to use it (read-only ksqlDB, higher throughput, restricted permissions)
- How to enable it via CUBEJS_DB_KAFKA_* environment variables
- How it works under the hood (metadata from ksqlDB, data from Kafka)
- Configuration via driverFactory for programmatic setup

Also fixes incorrect 'Possible Values' descriptions for CUBEJS_DB_USER
and CUBEJS_DB_PASS in the env vars table.

Updated both the Mintlify docs (docs-mintlify/) and the legacy Nextra
docs (docs/) for consistency.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
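The enablement the commit describes might look like the following `.env` fragment. This is a sketch: the commit only confirms the `CUBEJS_DB_KAFKA_*` prefix, so the specific variable names and values below are illustrative.

```dotenv
# Illustrative .env fragment — only the CUBEJS_DB_KAFKA_* prefix is
# confirmed by the commit; exact names are in the docs it adds.
# Kafka brokers Cube reads topic data from directly:
CUBEJS_DB_KAFKA_HOST=broker1.example.com:9092,broker2.example.com:9092
CUBEJS_DB_KAFKA_USER=kafka_user
CUBEJS_DB_KAFKA_PASS=********
CUBEJS_DB_KAFKA_USE_SSL=true
```

With these set, metadata still comes from the ksqlDB connection while topic data bypasses the ksqlDB REST API.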

* docs: replace driverFactory with data modeling example for Kafka streams mode

Remove the driverFactory configuration section and replace it with a
Data modeling section that shows:
- How to configure ksqlDB as a named data source using decorated
  environment variables (CUBEJS_DS_KSQL_DB_*)
- How to create a cube with data_source: ksql that references an
  existing ksqlDB stream or table
- A complete cube definition with measures, dimensions, and a streaming
  pre-aggregation in both YAML and JavaScript

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
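A minimal sketch of the data modeling section this commit adds (cube, stream, and column names are illustrative; the data source is assumed to be registered with `CUBEJS_DS_KSQL_DB_TYPE=ksql` and related decorated variables, per the `CUBEJS_DS_KSQL_DB_*` convention named above):

```yaml
cubes:
  - name: orders
    data_source: ksql
    # References an existing ksqlDB stream; Cube does not create it.
    sql: SELECT * FROM ORDERS_STREAM

    measures:
      - name: count
        type: count

    dimensions:
      - name: status
        sql: STATUS
        type: string

      - name: created_at
        sql: CREATED_AT
        type: time

    pre_aggregations:
      - name: orders_by_status
        type: rollup
        measures: [count]
        dimensions: [status]
        time_dimension: created_at
        granularity: hour
```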

* docs: add lambda pre-aggregation example with batch + streaming cubes

Replace the simple single-cube example with a full lambda pre-aggregation
pattern showing:
- A batch cube (order_events) querying a warehouse with FILTER_PARAMS,
  incremental daily partitions, and a rollup_lambda that merges batch
  and streaming rollups
- A streaming cube (order_events_stream) with data_source: ksql pointing
  at an existing ksqlDB stream, using read_only: true, stream_offset,
  unique_key_columns, and incremental refresh
- Documentation of key streaming pre-aggregation properties (read_only,
  stream_offset, unique_key_columns)

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
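Pulling the pieces the commit lists into one sketch (the cube names follow the commit; the warehouse schema, columns, and parameter values are illustrative):

```yaml
cubes:
  # Batch cube querying the warehouse.
  - name: order_events
    sql: >
      SELECT * FROM warehouse.order_events
      WHERE {{ FILTER_PARAMS.order_events.timestamp.filter('timestamp') }}

    measures:
      - name: count
        type: count

    dimensions:
      - name: order_id
        sql: order_id
        type: number
        primary_key: true

      - name: timestamp
        sql: timestamp
        type: time

    pre_aggregations:
      - name: batch
        type: rollup
        measures: [count]
        dimensions: [order_id]
        time_dimension: timestamp
        granularity: day
        partition_granularity: day
        incremental: true

      # Merges the batch rollup with the streaming rollup below.
      - name: lambda
        type: rollup_lambda
        rollups:
          - batch
          - order_events_stream.stream

  # Streaming cube pointing at an existing ksqlDB stream.
  - name: order_events_stream
    data_source: ksql
    sql: SELECT * FROM ORDER_EVENTS_STREAM

    measures:
      - name: count
        type: count

    dimensions:
      - name: order_id
        sql: ORDER_ID
        type: number
        primary_key: true

      - name: timestamp
        sql: TIMESTAMP
        type: time

    pre_aggregations:
      - name: stream
        type: rollup
        measures: [count]
        dimensions: [order_id]
        time_dimension: timestamp
        granularity: day
        partition_granularity: day
        read_only: true
        stream_offset: earliest
        unique_key_columns: [order_id]
        incremental: true
```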

* docs: remove external: true from pre-aggregation examples

It is true by default, so specifying it is redundant.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: explain unique key columns, stream format, and filtering

Add three new subsections to the Kafka streams mode documentation:

- Unique key columns and deduplication: explains how __seq column is
  appended from Kafka offset, deduplication happens at read/compaction
  time (last row per key wins), and key values can fall back to Kafka
  message key when missing from payload
- Stream format: documents the expected JSON object format for Kafka
  message values, case-sensitivity of field names, and optional message
  key parsing
- Filtering on the stream: explains that Cube Store applies SELECT
  projections and WHERE filters from the cube's sql property directly
  on each micro-batch of Kafka messages, without creating any objects
  in ksqlDB

Also expands stream_offset documentation to explain defaults and
automatic resume behavior on subsequent refreshes.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
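The deduplication semantics described above can be sketched in a few lines of JavaScript. This is an illustration of the "last row per key wins" rule, not Cube Store's actual implementation: each message carries a `__seq` value taken from its Kafka offset, and at read/compaction time only the highest-`__seq` row per unique key survives.

```javascript
// Illustrative sketch of read-time deduplication (not Cube Store's
// implementation): keep the row with the highest __seq per unique key.
function dedupe(rows, keyColumns) {
  const latest = new Map();
  for (const row of rows) {
    const key = JSON.stringify(keyColumns.map((c) => row[c]));
    const prev = latest.get(key);
    if (!prev || row.__seq > prev.__seq) latest.set(key, row);
  }
  return [...latest.values()].sort((a, b) => a.__seq - b.__seq);
}

const rows = [
  { order_id: 1, status: "created", __seq: 100 },
  { order_id: 2, status: "created", __seq: 101 },
  { order_id: 1, status: "completed", __seq: 102 }, // last write wins
];
console.log(dedupe(rows, ["order_id"]));
```

Both updates for `order_id: 1` collapse to the `completed` row, mirroring how a later Kafka offset supersedes an earlier one for the same key.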

* docs: document stream filtering limitations and supported SQL syntax

Add a 'Supported SQL syntax' subsection under 'Filtering on the stream'
that documents:
- The strict plan shape requirement (Projection > Filter > TableScan)
- Supported clauses: SELECT, WHERE with comparisons/boolean logic,
  IS NULL, IN, BETWEEN, CASE, CAST, EXTRACT, SUBSTRING, scalar
  functions, CONVERT_TZ, nested expressions
- Unsupported clauses: JOIN, subqueries, GROUP BY, HAVING, aggregates,
  ORDER BY, LIMIT/OFFSET, UNION/INTERSECT/EXCEPT, window functions,
  multiple FROM/WHERE, CTEs
- Alias requirement for non-column expressions
- Unique key column expression constraints

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
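A query that fits the documented plan shape might look like this sketch (stream and column names are illustrative). Note the aliases on the non-column expressions, per the alias requirement above, and the absence of JOINs, aggregates, GROUP BY, and ORDER BY:

```sql
-- Fits the Projection > Filter > TableScan shape described above.
SELECT
  ORDER_ID,
  STATUS,
  CAST(AMOUNT AS double) AS amount,                              -- aliased
  CASE WHEN AMOUNT > 100 THEN 'large' ELSE 'small' END AS bucket -- aliased
FROM ORDER_EVENTS_STREAM
WHERE STATUS IN ('created', 'completed')
  AND AMOUNT IS NOT NULL
```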

* docs: document time dimension truncation and ksql timestamp UDFs

Add PARSE_TIMESTAMP and FORMAT_TIMESTAMP to the supported functions
list, and add a paragraph explaining that time dimension truncation
(granularity) is fully supported via the
PARSE_TIMESTAMP(FORMAT_TIMESTAMP(CONVERT_TZ(...))) expression chain
that Cube generates automatically and Cube Store evaluates natively
as custom UDFs in its post-processing engine.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
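The generated chain has roughly this shape for a day granularity (a sketch only: the exact format strings, timezone arguments, and alias are illustrative, not the verbatim output Cube generates):

```sql
-- Illustrative shape of the auto-generated truncation expression.
PARSE_TIMESTAMP(
  FORMAT_TIMESTAMP(
    CONVERT_TZ("timestamp", 'UTC', 'UTC'),
    'yyyy-MM-dd'
  ),
  'yyyy-MM-dd'
) AS "timestamp_day"
```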

* docs: document timestamp handling for Kafka streams

Add a 'Timestamp handling' subsection under 'Stream format' that covers:
- String timestamps: ISO 8601 / RFC 3339 formats with examples
- Numeric timestamps: epoch milliseconds (not seconds/microseconds)
- PARSE_TIMESTAMP for converting non-standard timestamp formats
- Time dimension truncation via granularity (auto-generated
  PARSE_TIMESTAMP/FORMAT_TIMESTAMP/CONVERT_TZ chain)
- date_trunc availability as a standard SQL function

Move time dimension truncation docs from the Supported SQL syntax
section into the Timestamp handling section where it fits better.
Add date_trunc to the supported functions list.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
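As an example of the two accepted forms (field names are illustrative): `created_at` below is an ISO 8601 / RFC 3339 string, while `updated_at` is epoch milliseconds — epoch seconds or microseconds would be misread, per the rule above.

```json
{
  "order_id": 42,
  "status": "completed",
  "created_at": "2026-05-01T12:34:56.000Z",
  "updated_at": 1745584496000
}
```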

* docs: explain primary key requirement for ungrouped streaming queries

Add 'Primary key and ungrouped queries' subsection explaining:
- Cube Store's stream post-processing does not support GROUP BY
- Cube omits GROUP BY when at least one dimension has primary_key: true
- This makes the query a simple SELECT ... FROM ... eligible for
  read-only streaming
- Without a primary key dimension, GROUP BY is generated and the
  pre-aggregation cannot use the streaming path

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: clarify that all primary key columns must be included

Update the ungrouped query requirement to state that all primary key
columns must be present in the streaming pre-aggregation's dimensions
list, not just one.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
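A sketch of the requirement as updated (names are illustrative): with a composite primary key, every `primary_key: true` column must appear in the streaming pre-aggregation's `dimensions` list so that Cube can omit GROUP BY and take the read-only streaming path.

```yaml
cubes:
  - name: order_events_stream
    data_source: ksql
    sql: SELECT * FROM ORDER_EVENTS_STREAM

    dimensions:
      # Composite primary key: both columns below must be listed in the
      # streaming pre-aggregation's dimensions.
      - name: order_id
        sql: ORDER_ID
        type: number
        primary_key: true

      - name: line_item_id
        sql: LINE_ITEM_ID
        type: number
        primary_key: true

      - name: status
        sql: STATUS
        type: string

      - name: timestamp
        sql: TIMESTAMP
        type: time

    pre_aggregations:
      - name: stream
        type: rollup
        # All primary key columns included, so Cube omits GROUP BY and
        # the query stays eligible for read-only streaming.
        dimensions: [order_id, line_item_id, status]
        time_dimension: timestamp
        granularity: day
```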

* docs: add YAML model examples alongside JavaScript

Wrap the data modeling example in CodeGroup/CodeTabs with both YAML and
JavaScript versions of the batch cube (order_events) and streaming cube
(order_events_stream) including all pre-aggregation configuration.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
pull[bot] locked and limited conversation to collaborators May 1, 2026
pull[bot] added the ⤵️ pull label May 1, 2026
pull[bot] merged commit 98128af into code:master May 1, 2026
