[pull] master from cube-js:master #441
Merged
pull[bot] merged 1 commit into code:master from cube-js:master on May 1, 2026
Conversation
* docs: document Kafka streams mode for ksqlDB integration

  Add documentation for the Kafka streams mode, where Cube reads data directly from Kafka topics instead of going through the ksqlDB REST API for data streaming. In this mode, Cube does not create any tables or streams in ksqlDB. The documentation covers:

  - What Kafka streams mode is and how it differs from the default mode
  - When to use it (read-only ksqlDB, higher throughput, restricted permissions)
  - How to enable it via CUBEJS_DB_KAFKA_* environment variables
  - How it works under the hood (metadata from ksqlDB, data from Kafka)
  - Configuration via driverFactory for programmatic setup

  Also fixes incorrect "Possible Values" descriptions for CUBEJS_DB_USER and CUBEJS_DB_PASS in the env vars table. Updated both the Mintlify docs (docs-mintlify/) and the legacy Nextra docs (docs/) for consistency.

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: replace driverFactory with data modeling example for Kafka streams mode

  Remove the driverFactory configuration section and replace it with a "Data modeling" section that shows:

  - How to configure ksqlDB as a named data source using decorated environment variables (CUBEJS_DS_KSQL_DB_*)
  - How to create a cube with data_source: ksql that references an existing ksqlDB stream or table
  - A complete cube definition with measures, dimensions, and a streaming pre-aggregation in both YAML and JavaScript

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: add lambda pre-aggregation example with batch + streaming cubes

  Replace the simple single-cube example with a full lambda pre-aggregation pattern showing:

  - A batch cube (order_events) querying a warehouse with FILTER_PARAMS, incremental daily partitions, and a rollup_lambda that merges batch and streaming rollups
  - A streaming cube (order_events_stream) with data_source: ksql pointing at an existing ksqlDB stream, using read_only: true, stream_offset, unique_key_columns, and incremental refresh
  - Documentation of key streaming pre-aggregation properties (read_only, stream_offset, unique_key_columns)

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: remove external: true from pre-aggregation examples

  It is true by default, so specifying it is redundant.

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: explain unique key columns, stream format, and filtering

  Add three new subsections to the Kafka streams mode documentation:

  - Unique key columns and deduplication: explains how the __seq column is appended from the Kafka offset, how deduplication happens at read/compaction time (last row per key wins), and how key values can fall back to the Kafka message key when missing from the payload
  - Stream format: documents the expected JSON object format for Kafka message values, case-sensitivity of field names, and optional message key parsing
  - Filtering on the stream: explains that Cube Store applies SELECT projections and WHERE filters from the cube's sql property directly on each micro-batch of Kafka messages, without creating any objects in ksqlDB

  Also expands the stream_offset documentation to explain defaults and automatic resume behavior on subsequent refreshes.
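To make the streaming side of this pattern concrete, here is a rough sketch of the order_events_stream cube in Cube's YAML modeling syntax. The cube name, data_source, and the read_only / stream_offset / unique_key_columns properties come from the commit messages above; the ksqlDB stream name, columns, and refresh interval are illustrative assumptions, not the documented example itself:

```yaml
# Sketch of a streaming cube reading from an existing ksqlDB stream.
# ORDER_EVENTS_STREAM and the column names are assumptions for illustration.
cubes:
  - name: order_events_stream
    data_source: ksql
    sql: >
      SELECT order_id, status, created_at
      FROM ORDER_EVENTS_STREAM

    dimensions:
      - name: order_id
        sql: order_id
        type: number
        primary_key: true
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time

    pre_aggregations:
      - name: orders_stream
        dimensions: [order_id, status]
        time_dimension: created_at
        granularity: day
        # Streaming pre-aggregation properties named in the commits:
        read_only: true
        stream_offset: latest
        unique_key_columns: ["order_id"]
        refresh_key:
          every: 1 hour
          incremental: true
```

In the lambda pattern described above, a batch cube (order_events) would then reference this cube's rollup from a rollup_lambda pre-aggregation so batch and streaming data are merged at query time.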
  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: document stream filtering limitations and supported SQL syntax

  Add a "Supported SQL syntax" subsection under "Filtering on the stream" that documents:

  - The strict plan shape requirement (Projection > Filter > TableScan)
  - Supported clauses: SELECT, WHERE with comparisons/boolean logic, IS NULL, IN, BETWEEN, CASE, CAST, EXTRACT, SUBSTRING, scalar functions, CONVERT_TZ, nested expressions
  - Unsupported clauses: JOIN, subqueries, GROUP BY, HAVING, aggregates, ORDER BY, LIMIT/OFFSET, UNION/INTERSECT/EXCEPT, window functions, multiple FROM/WHERE, CTEs
  - The alias requirement for non-column expressions
  - Unique key column expression constraints

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: document time dimension truncation and ksql timestamp UDFs

  Add PARSE_TIMESTAMP and FORMAT_TIMESTAMP to the supported functions list, and add a paragraph explaining that time dimension truncation (granularity) is fully supported via the PARSE_TIMESTAMP(FORMAT_TIMESTAMP(CONVERT_TZ(...))) expression chain that Cube generates automatically and that Cube Store evaluates natively as custom UDFs in its post-processing engine.

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: document timestamp handling for Kafka streams

  Add a "Timestamp handling" subsection under "Stream format" that covers:

  - String timestamps: ISO 8601 / RFC 3339 formats, with examples
  - Numeric timestamps: epoch milliseconds (not seconds or microseconds)
  - PARSE_TIMESTAMP for converting non-standard timestamp formats
  - Time dimension truncation via granularity (the auto-generated PARSE_TIMESTAMP/FORMAT_TIMESTAMP/CONVERT_TZ chain)
  - date_trunc availability as a standard SQL function

  Move the time dimension truncation docs from the "Supported SQL syntax" section into the "Timestamp handling" section, where they fit better. Add date_trunc to the supported functions list.
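As a rough illustration of the constraints above, a cube-level SQL filter that stays within the documented Projection > Filter > TableScan shape might look like the sketch below. The stream name, columns, and timestamp format string are assumptions; only the clause list (WHERE with IN and IS NULL, CAST, PARSE_TIMESTAMP, the alias requirement for non-column expressions) reflects what the commits describe:

```yaml
# Illustrative sketch: SELECT + WHERE only, no JOINs, subqueries,
# GROUP BY, or aggregates -- the plan stays Projection > Filter > TableScan.
cubes:
  - name: payment_events_stream
    data_source: ksql
    sql: >
      SELECT
        payment_id,
        CAST(amount AS double) AS amount,
        PARSE_TIMESTAMP(created_at, 'yyyy-MM-dd HH:mm:ss') AS created_at
      FROM PAYMENT_EVENTS
      WHERE status IN ('completed', 'refunded')
        AND amount IS NOT NULL
```

Note that every non-column expression carries an alias (AS amount, AS created_at), matching the alias requirement documented for stream filtering; Cube Store would evaluate this projection and filter on each micro-batch of Kafka messages without creating any objects in ksqlDB.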
  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: explain primary key requirement for ungrouped streaming queries

  Add a "Primary key and ungrouped queries" subsection explaining:

  - Cube Store's stream post-processing does not support GROUP BY
  - Cube omits GROUP BY when at least one dimension has primary_key: true
  - This makes the query a simple SELECT ... FROM ..., eligible for read-only streaming
  - Without a primary key dimension, GROUP BY is generated and the pre-aggregation cannot use the streaming path

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: clarify that all primary key columns must be included

  Update the ungrouped query requirement to state that all primary key columns must be present in the streaming pre-aggregation's dimensions list, not just one.

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: add YAML model examples alongside JavaScript

  Wrap the data modeling example in CodeGroup/CodeTabs with both YAML and JavaScript versions of the batch cube (order_events) and streaming cube (order_events_stream), including all pre-aggregation configuration.

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
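The primary key requirement described above can be sketched as follows. This is a minimal hypothetical model (cube name, stream, and columns are assumptions) showing the rule from the commits: every primary-key column appears in the streaming pre-aggregation's dimensions list, so Cube can omit GROUP BY and emit a plain SELECT ... FROM ... eligible for the read-only streaming path:

```yaml
# Minimal sketch: all primary-key columns are listed in the
# pre-aggregation's dimensions, so no GROUP BY is generated.
cubes:
  - name: clicks_stream
    data_source: ksql
    sql: SELECT click_id, url, ts FROM CLICKS

    dimensions:
      - name: click_id
        sql: click_id
        type: number
        primary_key: true
      - name: url
        sql: url
        type: string
      - name: ts
        sql: ts
        type: time

    pre_aggregations:
      - name: clicks
        dimensions: [click_id, url]  # includes ALL primary-key dimensions
        time_dimension: ts
        granularity: day
```

If click_id were left out of the dimensions list, Cube would generate a GROUP BY and, per the documentation described above, the pre-aggregation could no longer use the streaming path.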
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )