diff --git a/.cursor/rules/namings-rule.mdc b/.cursor/rules/namings-rule.mdc index d62d2a1439ca5..a62a1dde76e1a 100644 --- a/.cursor/rules/namings-rule.mdc +++ b/.cursor/rules/namings-rule.mdc @@ -88,7 +88,7 @@ Guidance: infrastructure. # Product Taxonomy -Make sure to use correct terms. +Make sure to use correct terms. On billing, pricing, and support pages, use **on-demand customers** for the on-demand payment plan (legacy billing copy: "self-serve customers") and **contract customers** for the commit payment plan (legacy: "order form customers"). Elsewhere, **self-serve** (e.g. self-serve analytics) describes end-user exploration, not the billing segment. - **Account** - **Deployment** diff --git a/docs-mintlify/admin/account-billing/ai-tokens.mdx b/docs-mintlify/admin/account-billing/ai-tokens.mdx index f66335e49778b..3dd3f0e8ed179 100644 --- a/docs-mintlify/admin/account-billing/ai-tokens.mdx +++ b/docs-mintlify/admin/account-billing/ai-tokens.mdx @@ -12,10 +12,10 @@ Cube's [AI-powered features][ref-ai-overview] consume tokens based on the resources required to complete each request. Token allocation differs by customer type: -- **Self-serve customers** receive [per-seat token grants](#per-seat-token-grants) +- **On-demand customers** receive [per-seat token grants](#per-seat-token-grants) equal to half of their seat price, with optional [on-demand consumption](#on-demand-consumption) beyond that -- **Order form customers** can purchase [pooled token packages](#token-packages) +- **Contract customers** can purchase [pooled token packages](#token-packages) ## Token consumption @@ -36,7 +36,7 @@ is subject to change as the product evolves. ## Per-seat token grants -Self-serve customers on paid plans receive **per-seat token grants** equal to +On-demand customers on paid plans receive **per-seat token grants** equal to **half of the seat price**. Each user is awarded an individual monthly token allocation based on their role. 
@@ -48,7 +48,7 @@ Per-seat grants: ### On-demand consumption -When a self-serve user exceeds their monthly per-seat grant, usage +When a user on an on-demand plan exceeds their monthly per-seat grant, usage automatically continues as **on-demand consumption**. On-demand usage is billed through the credit card on file. @@ -58,7 +58,7 @@ for each billing cycle. ## Token packages -Order form customers can purchase **pooled add-on token packages**. Token +Contract customers can purchase **pooled add-on token packages**. Token packages are added to a shared pool accessible by all users in the account. - Each package is valid for the duration of the contract or until fully diff --git a/docs-mintlify/admin/account-billing/billing-faq.mdx b/docs-mintlify/admin/account-billing/billing-faq.mdx index 8311bc46cb109..a87fed4de4ea7 100644 --- a/docs-mintlify/admin/account-billing/billing-faq.mdx +++ b/docs-mintlify/admin/account-billing/billing-faq.mdx @@ -8,9 +8,9 @@ description: Frequently asked questions about Cube billing, pricing, payments, a ### How does the billing structure work? -#### Self-serve customers +#### On-demand customers -Self-serve customers are billed monthly. The credit card on file is automatically +On-demand customers are billed monthly. The credit card on file is automatically charged through Stripe upon invoice generation. - Invoices bill active seats in advance for the upcoming period @@ -18,38 +18,38 @@ charged through Stripe upon invoice generation. arrears on a prorated basis on the next invoice, as well as billed in advance on all subsequent invoices until the customer makes further adjustments -#### Order form customers +#### Contract customers -**Order form customers** are billed annually and upfront according to the terms -in their signed order form. +**Contract customers** are billed annually and upfront according to the terms +in their contract. 
- Committed seats are billed annually - Overages for uncommitted seats and usage are billed in arrears separately on a - basis specified in the signed order form + basis specified in the contract - Standard payment terms are **Net 30**, unless otherwise specified in the agreement ### What payment methods does Cube accept? -#### Self-serve customers +#### On-demand customers -Self-serve subscriptions must be paid by credit card via Stripe. Your card is +On-demand subscriptions must be paid by credit card via Stripe. Your card is automatically charged on a recurring subscription basis at the start of each billing cycle. -Cube does **not** support ACH or wire transfer payments for self-serve plans. +Cube does **not** support ACH or wire transfer payments for on-demand payment plans. -#### Order form customers +#### Contract customers -Order form customers can pay via: +Contract customers can pay via: - **Invoice billing** (e.g., Net 30 terms) — Payment terms are defined in your - executed order form and reflected on your invoice. + contract and reflected on your invoice. - **ACH / wire transfer** — Bank details (Stripe-managed virtual accounts) are listed directly on the invoice. Please reach out to [support@cube.dev](mailto:support@cube.dev) if you require validation. -- **Credit card** (via Stripe) — Order form customers may also pay by credit card. +- **Credit card** (via Stripe) — Contract customers may also pay by credit card. ### How can I update my payment information? diff --git a/docs-mintlify/admin/account-billing/pricing.mdx b/docs-mintlify/admin/account-billing/pricing.mdx index 6febf1fb8c08f..a199e1d34f15d 100644 --- a/docs-mintlify/admin/account-billing/pricing.mdx +++ b/docs-mintlify/admin/account-billing/pricing.mdx @@ -30,7 +30,7 @@ subscription: card, and start using Cube Cloud right away. [Starter](#starter) and [Premium](#premium) product tiers are available on the on-demand payment plan. 
* _Commit payment plan_ allows you to have a contract with a CCU amount specified -in an order form. [Premium](#premium) and [Enterprise](#enterprise) product tiers are available on the +in it. [Premium](#premium) and [Enterprise](#enterprise) product tiers are available on the commit payment plan. [Contact us][link-contact-us] to learn more. ## Product tiers diff --git a/docs-mintlify/admin/account-billing/support.mdx b/docs-mintlify/admin/account-billing/support.mdx index be4f175e94376..fe1721b0692da 100644 --- a/docs-mintlify/admin/account-billing/support.mdx +++ b/docs-mintlify/admin/account-billing/support.mdx @@ -29,8 +29,8 @@ with **2 support tickets per month** through the in-product chat feature. * [Premium product tier][ref-premium-tier] includes support via **online resources** such as [documentation][ref-docs-intro], [webinars][cube-webinars], and [community Slack][cube-slack]. -* It also includes **either 4 support tickets per month** (self-serve customers) -or **unlimited support tickets** (customers with an annual commit) for our support engineers during +* It also includes either **4 support tickets per month** (on-demand customers) +or **unlimited support tickets** (contract customers) with our support engineers during [support hours](#support-hours) through our in-product chat. | Priority | Response time during support hours | diff --git a/docs-mintlify/docs.json b/docs-mintlify/docs.json index ab131e9bacec3..2aae2264dcc93 100644 --- a/docs-mintlify/docs.json +++ b/docs-mintlify/docs.json @@ -11,10 +11,6 @@ "icons": { "library": "tabler" }, - "banner": { - "content": "🟣 Agentic Analytics Summit — April 29, 2026 — Online. Registration is open! 
[Join now →](https://cube.dev/events/agentic-analytics-summit)", - "dismissible": true - }, "navigation": { "tabs": [ { @@ -137,7 +133,8 @@ "group": "Views", "root": "docs/data-modeling/views", "pages": [ - "docs/data-modeling/multi-fact-views" + "docs/data-modeling/multi-fact-views", + "docs/data-modeling/view-groups" ] }, { @@ -456,6 +453,7 @@ "pages": [ "reference/data-modeling/cube", "reference/data-modeling/view", + "reference/data-modeling/view-group", "reference/data-modeling/measures", "reference/data-modeling/dimensions", "reference/data-modeling/hierarchies", diff --git a/docs-mintlify/docs/data-modeling/concepts/syntax.mdx b/docs-mintlify/docs/data-modeling/concepts/syntax.mdx index 2765b621d0d93..615358b88dc1c 100644 --- a/docs-mintlify/docs/data-modeling/concepts/syntax.mdx +++ b/docs-mintlify/docs/data-modeling/concepts/syntax.mdx @@ -17,7 +17,8 @@ option][ref-config-repository-factory] to dynamically define the folder name and data model file contents. It's recommended to place each cube or view in a separate file, in `model/cubes` -and `model/views` folders, respectively. Example: +and `model/views` folders, respectively. [View groups][ref-view-groups] can be +defined alongside views or in their own files. Example: ```tree model @@ -26,7 +27,8 @@ model │ ├── products.yml │ └── users.yml └── views - └── revenue.yml + ├── revenue.yml + └── view_groups.yml ``` ## Model syntax @@ -998,6 +1000,7 @@ string values in time dimensions. 
[ref-dax-api-date-hierarchies]: /reference/dax-api#date-hierarchies [ref-time-dimension]: /docs/data-modeling/dimensions#time-dimensions [ref-recipe-string-time-dimensions]: /recipes/data-modeling/string-time-dimensions +[ref-view-groups]: /reference/data-modeling/view-group [ref-views]: /docs/data-modeling/views [ref-preaggs]: /reference/data-modeling/pre-aggregations [ref-join-paths]: /docs/data-modeling/joins#join-paths diff --git a/docs-mintlify/docs/data-modeling/view-groups.mdx b/docs-mintlify/docs/data-modeling/view-groups.mdx new file mode 100644 index 0000000000000..e078263a06bde --- /dev/null +++ b/docs-mintlify/docs/data-modeling/view-groups.mdx @@ -0,0 +1,165 @@ +--- +title: View groups +description: View groups organize views into named collections by domain or purpose, helping downstream consumers — including AI agents and embedded analytics — navigate large data models. +--- + +When a data model contains many [views][ref-views], view groups help organize +them into named collections by domain or purpose — for example, `sales`, +`finance`, or `people`. View groups are exposed through the +[`/v1/meta`][ref-meta-endpoint] API, making it easier for downstream tools, +AI agents, and embedded analytics to present a navigable catalog. + + + +See the [view group reference][ref-view-group-ref] for the full list of +parameters and configuration options. + + + +## Defining a view group + +A view group is a top-level entity, defined alongside views. At minimum it +needs a `name`; `title` and `description` make it easier to navigate in +downstream tools. + + + +```yaml title="YAML" +view_groups: + - name: sales + title: Sales + description: Revenue and order views for the sales team +``` + +```javascript title="JavaScript" +view_group(`sales`, { + title: `Sales`, + description: `Revenue and order views for the sales team` +}) +``` + + + +## Assigning views to a group + +There are two ways to assign a view to a group, and both can be combined. 
+Pick whichever keeps group membership co-located with the entity you'd +rather edit — the group definition for a curated catalog, or the view +definition for ad-hoc additions. + +### From the group side + +List views on the group via the [`views`][ref-view-group-views] parameter. +This keeps the full membership in one place, which is convenient when you +want to review a group at a glance. + + + +```yaml title="YAML" +view_groups: + - name: sales + title: Sales + views: + - orders_overview + - revenue +``` + +```javascript title="JavaScript" +view_group(`sales`, { + title: `Sales`, + views: [`orders_overview`, `revenue`] +}) +``` + + + +### From the view side + +Set [`view_group`][ref-view-view-group] on the view itself. The view +declares which group it belongs to, without touching the group definition. + + + +```yaml title="YAML" +views: + - name: revenue + view_group: sales + cubes: + - join_path: order_items + includes: + - total_sale_price + - created_at +``` + +```javascript title="JavaScript" +view(`revenue`, { + view_group: sales, + cubes: [ + { + join_path: order_items, + includes: [`total_sale_price`, `created_at`] + } + ] +}) +``` + + + +### Belonging to multiple groups + +A view can belong to more than one group. Use +[`view_groups`][ref-view-view-groups] (plural) on the view to list every +group it should appear in. + + + +```yaml title="YAML" +views: + - name: revenue + view_groups: + - sales + - finance + cubes: + - join_path: order_items + includes: + - total_sale_price + - created_at +``` + +```javascript title="JavaScript" +view(`revenue`, { + view_groups: [sales, finance], + cubes: [ + { + join_path: order_items, + includes: [`total_sale_price`, `created_at`] + } + ] +}) +``` + + + +## Where view groups live in the model + +By [convention][ref-syntax], view groups are typically defined alongside +views in the `model/views` folder — for example, in a dedicated +`view_groups.yml` file. 
They behave like any other top-level data model +entity and can be split across multiple files as your model grows. + +## Next steps + +- See the [view group reference][ref-view-group-ref] for the full list of + parameters +- Learn about [views][ref-views] and how they curate cubes for downstream + consumers +- Explore [AI context][ref-ai-context] to improve AI query accuracy + +[ref-views]: /docs/data-modeling/views +[ref-syntax]: /docs/data-modeling/concepts/syntax +[ref-ai-context]: /docs/data-modeling/ai-context +[ref-view-group-ref]: /reference/data-modeling/view-group +[ref-view-group-views]: /reference/data-modeling/view-group#views +[ref-view-view-group]: /reference/data-modeling/view#view_group +[ref-view-view-groups]: /reference/data-modeling/view#view_groups +[ref-meta-endpoint]: /reference/core-data-apis/rest-api/reference diff --git a/docs-mintlify/docs/data-modeling/views.mdx b/docs-mintlify/docs/data-modeling/views.mdx index 30dbc77f86aa6..a3333deb00c9a 100644 --- a/docs-mintlify/docs/data-modeling/views.mdx +++ b/docs-mintlify/docs/data-modeling/views.mdx @@ -423,10 +423,24 @@ Check [APIs & Integrations][ref-apis-support] for details on folder support. For tools that don't support nested folders, the structure is automatically flattened. +## Grouping views with view groups + +When a data model contains many views, [view groups][ref-view-groups] help +organize them into named collections by domain or purpose — for example, +`sales`, `finance`, or `people`. They're exposed through the +[`/v1/meta`][ref-meta-endpoint] API so downstream tools, AI agents, and +embedded analytics can present a navigable catalog. + +See [View groups][ref-view-groups] for the full guide and the +[view group reference][ref-view-group-ref] for the complete list of +parameters. 
+ ## Next steps - See the [view reference][ref-view-reference] for the full list of parameters +- Learn about [view groups][ref-view-groups] to organize views into + named collections - Learn about [access policies][ref-access-policies] to govern view access - Explore [AI context][ref-ai-context] to improve AI query accuracy - Use the [Semantic Model IDE][ref-ide] to develop views interactively @@ -446,4 +460,7 @@ automatically flattened. [ref-ide]: /docs/data-modeling/data-model-ide [ref-viz-tools]: /admin/connect-to-data/visualization-tools [ref-apis-support]: /reference#data-modeling +[ref-view-groups]: /docs/data-modeling/view-groups +[ref-view-group-ref]: /reference/data-modeling/view-group +[ref-meta-endpoint]: /reference/core-data-apis/rest-api/reference [wiki-dry]: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself diff --git a/docs-mintlify/reference/data-modeling/view-group.mdx b/docs-mintlify/reference/data-modeling/view-group.mdx new file mode 100644 index 0000000000000..1642defc8c2e9 --- /dev/null +++ b/docs-mintlify/reference/data-modeling/view-group.mdx @@ -0,0 +1,348 @@ +--- +title: View groups +description: View groups organize views into named collections, making it easier for data consumers to navigate and discover related views. +--- + +View groups let you organize [views][ref-views] into named collections. +When a data model contains many views, grouping them by domain or purpose +helps downstream consumers — including AI agents, embedded analytics, and +visualization tools — discover the right dataset faster. + +View groups are returned as a top-level `viewGroups` array in the +[`/v1/meta`][ref-meta-endpoint] response, alongside the `cubes` array. +Each view that belongs to at least one group also carries a `viewGroups` +string array on its own entry. + +A view group should have the following parameter: [`name`](#name). + +## Parameters + +### `name` + +The `name` parameter serves as the identifier of a view group. 
It must be +unique among all view groups within a deployment and follow the [naming +conventions][ref-naming]. + + + +```yaml title="YAML" +view_groups: + - name: sales +``` + +```javascript title="JavaScript" +view_group(`sales`, {}) +``` + + + +### `title` + +Use the `title` parameter to set a human-readable display name for the +view group. + + + +```yaml title="YAML" +view_groups: + - name: sales + title: Sales +``` + +```javascript title="JavaScript" +view_group(`sales`, { + title: `Sales` +}) +``` + + + +### `description` + +This parameter provides a human-readable description of the view group. + + + +```yaml title="YAML" +view_groups: + - name: sales + title: Sales + description: Revenue, order, and customer views for the sales team +``` + +```javascript title="JavaScript" +view_group(`sales`, { + title: `Sales`, + description: `Revenue, order, and customer views for the sales team` +}) +``` + + + +### `views` + +A list of view names that belong to this group. Views listed here are +merged with any views that reference this group via their own +[`view_group`][ref-view-view-group] or +[`view_groups`][ref-view-view-groups] parameter. + + + +```yaml title="YAML" +view_groups: + - name: sales + title: Sales + views: + - orders_overview + - revenue +``` + +```javascript title="JavaScript" +view_group(`sales`, { + title: `Sales`, + views: [`orders_overview`, `revenue`] +}) +``` + + + +## Assigning views to groups + +There are two complementary ways to associate a view with a view group: + +1. **On the view group** — list view names in the [`views`](#views) parameter. +2. **On the view** — set [`view_group`][ref-view-view-group] (singular) or + [`view_groups`][ref-view-view-groups] (plural) on the view itself. + +Both approaches can be combined. Cube merges the membership from all +sources, so a view listed under a group's `views` *and* referencing that +group via `view_group` will appear only once. + +### Example + +The following model defines two view groups. 
The `sales` group lists +`orders_overview` in its `views` parameter; `revenue` joins the same +group via its own `view_group` property. The `customers_view` belongs to +`people` through the group-level `views` list. + + + +```yaml title="YAML" +cubes: + - name: order_items + sql_table: ECOMMERCE.ORDER_ITEMS + + measures: + - name: count + type: count + + - name: total_sale_price + sql: sale_price + type: sum + + dimensions: + - name: id + sql: id + type: number + primary_key: true + + - name: status + sql: status + type: string + + - name: created_at + sql: created_at + type: time + + - name: users + sql_table: ECOMMERCE.USERS + + measures: + - name: count + type: count + + dimensions: + - name: id + sql: id + type: number + primary_key: true + + - name: city + sql: city + type: string + +views: + - name: orders_overview + cubes: + - join_path: order_items + includes: + - count + - total_sale_price + - status + - created_at + + - name: revenue + view_group: sales + cubes: + - join_path: order_items + includes: + - total_sale_price + - created_at + + - name: customers_view + cubes: + - join_path: users + includes: "*" + +view_groups: + - name: sales + title: Sales + description: Revenue and order views for the sales team + views: + - orders_overview + + - name: people + title: People + description: Customer and user views + views: + - customers_view +``` + +```javascript title="JavaScript" +cube(`order_items`, { + sql_table: `ECOMMERCE.ORDER_ITEMS`, + + measures: { + count: { + type: `count` + }, + + total_sale_price: { + sql: `sale_price`, + type: `sum` + } + }, + + dimensions: { + id: { + sql: `id`, + type: `number`, + primary_key: true + }, + + status: { + sql: `status`, + type: `string` + }, + + created_at: { + sql: `created_at`, + type: `time` + } + } +}) + +cube(`users`, { + sql_table: `ECOMMERCE.USERS`, + + measures: { + count: { + type: `count` + } + }, + + dimensions: { + id: { + sql: `id`, + type: `number`, + primary_key: true + }, + + city: { + sql: 
`city`, + type: `string` + } + } +}) + +view(`orders_overview`, { + cubes: [ + { + join_path: order_items, + includes: [ + `count`, + `total_sale_price`, + `status`, + `created_at` + ] + } + ] +}) + +view(`revenue`, { + view_group: sales, + cubes: [ + { + join_path: order_items, + includes: [ + `total_sale_price`, + `created_at` + ] + } + ] +}) + +view(`customers_view`, { + cubes: [ + { + join_path: users, + includes: `*` + } + ] +}) + +view_group(`sales`, { + title: `Sales`, + description: `Revenue and order views for the sales team`, + views: [`orders_overview`] +}) + +view_group(`people`, { + title: `People`, + description: `Customer and user views`, + views: [`customers_view`] +}) +``` + + + +With this model, the `/v1/meta` response includes a `viewGroups` array: + +```json +{ + "viewGroups": [ + { + "name": "sales", + "title": "Sales", + "description": "Revenue and order views for the sales team", + "views": ["orders_overview", "revenue"] + }, + { + "name": "people", + "title": "People", + "description": "Customer and user views", + "views": ["customers_view"] + } + ] +} +``` + +Notice that `revenue` appears in the `sales` group even though it was not +listed in the group's `views` — it was added because the view itself set +`view_group: sales`. + +[ref-views]: /docs/data-modeling/views +[ref-naming]: /docs/data-modeling/concepts/syntax#naming +[ref-meta-endpoint]: /reference/core-data-apis/rest-api/reference +[ref-view-view-group]: /reference/data-modeling/view#view_group +[ref-view-view-groups]: /reference/data-modeling/view#view_groups diff --git a/docs-mintlify/reference/data-modeling/view.mdx b/docs-mintlify/reference/data-modeling/view.mdx index bc163c6b94f58..1fe48caf2727f 100644 --- a/docs-mintlify/reference/data-modeling/view.mdx +++ b/docs-mintlify/reference/data-modeling/view.mdx @@ -230,6 +230,72 @@ view(`active_users`, { +### `view_group` + +Assigns this view to a single [view group][ref-ref-view-group]. 
The referenced +view group must be defined elsewhere in the data model (under the top-level +`view_groups` key in YAML or via `view_group()` in JavaScript). + + + +```yaml title="YAML" +views: + - name: revenue + view_group: sales + cubes: + - join_path: order_items + includes: "*" +``` + +```javascript title="JavaScript" +view(`revenue`, { + view_group: sales, + cubes: [ + { + join_path: order_items, + includes: `*` + } + ] +}) +``` + + + +### `view_groups` + +Assigns this view to multiple [view groups][ref-ref-view-group] at once. +Each referenced view group must be defined elsewhere in the data model +(under the top-level `view_groups` key in YAML or via `view_group()` in JavaScript). + + + +```yaml title="YAML" +views: + - name: revenue + view_groups: + - sales + - finance + cubes: + - join_path: order_items + includes: "*" +``` + +```javascript title="JavaScript" +view(`revenue`, { + view_groups: [`sales`, `finance`], + cubes: [ + { + join_path: order_items, + includes: `*` + } + ] +}) +``` + + + +You can use both `view_group` and `view_groups` on the same view. Cube +merges them and removes duplicates. + ### `cubes` Use `cubes` parameter in view to include exposed cubes in bulk. 
You can build @@ -665,4 +731,5 @@ The `access_policy` parameter is used to configure [access policies][ref-ref-dap [ref-dim-title]: /reference/data-modeling/dimensions#title [ref-dim-description]: /reference/data-modeling/dimensions#description [ref-dim-format]: /reference/data-modeling/dimensions#format -[ref-dim-meta]: /reference/data-modeling/dimensions#meta \ No newline at end of file +[ref-dim-meta]: /reference/data-modeling/dimensions#meta +[ref-ref-view-group]: /reference/data-modeling/view-group \ No newline at end of file diff --git a/docs/content/product/administration/deployment/support.mdx b/docs/content/product/administration/deployment/support.mdx index 9bd5c8ddc0cb9..c7888e4021157 100644 --- a/docs/content/product/administration/deployment/support.mdx +++ b/docs/content/product/administration/deployment/support.mdx @@ -25,8 +25,8 @@ with **2 support tickets per month** through the in-product chat feature. * [Premium product tier][ref-premium-tier] includes support via **online resources** such as [documentation][ref-docs-intro], [webinars][cube-webinars], and [community Slack][cube-slack]. -* It also includes **either 4 support tickets per month** (self-serve customers) -or **unlimited support tickets** (customers with an annual commit) for our support engineers during +* It also includes either **4 support tickets per month** (on-demand customers) +or **unlimited support tickets** (contract customers) with our support engineers during [support hours](#support-hours) through our in-product chat. | Priority | Response time during support hours | diff --git a/docs/content/product/administration/pricing.mdx b/docs/content/product/administration/pricing.mdx index b2156a1eb6653..9a051a24c7193 100644 --- a/docs/content/product/administration/pricing.mdx +++ b/docs/content/product/administration/pricing.mdx @@ -28,7 +28,7 @@ subscription: card, and start using Cube Cloud right away. 
[Starter](#starter) and [Premium](#premium) product tiers are available on the on-demand payment plan. * _Commit payment plan_ allows you to have a contract with a CCU amount specified -in an order form. [Premium](#premium), [Enterprise](#enterprise), and +in it. [Premium](#premium), [Enterprise](#enterprise), and [Enterprise Premier](#enterprise-premier) product tiers are available on the commit payment plan. [Contact us][link-contact-us] to learn more. diff --git a/packages/cubejs-schema-compiler/src/adapter/BaseQuery.js b/packages/cubejs-schema-compiler/src/adapter/BaseQuery.js index 4107babf4709f..24bc94423b54b 100644 --- a/packages/cubejs-schema-compiler/src/adapter/BaseQuery.js +++ b/packages/cubejs-schema-compiler/src/adapter/BaseQuery.js @@ -4304,8 +4304,14 @@ export class BaseQuery { cube, preAggregation ); + const cubeFromPath = this.cubeEvaluator.cubeFromPath(cube); return this.paramAllocator.buildSqlAndParams(originalSqlPreAggregationQuery.evaluateSymbolSqlWithContext( - () => originalSqlPreAggregationQuery.evaluateSql(cube, this.cubeEvaluator.cubeFromPath(cube).sql), + () => { + if (cubeFromPath.sqlTable) { + return `SELECT * FROM ${originalSqlPreAggregationQuery.cubeSql(cube)}`; + } + return originalSqlPreAggregationQuery.evaluateSql(cube, cubeFromPath.sql); + }, { preAggregationQuery: true, collectOriginalSqlPreAggregations } )); } diff --git a/packages/cubejs-schema-compiler/test/integration/postgres/pre-aggregations.test.ts b/packages/cubejs-schema-compiler/test/integration/postgres/pre-aggregations.test.ts index d9f0bd5e1dac8..f62b856a65931 100644 --- a/packages/cubejs-schema-compiler/test/integration/postgres/pre-aggregations.test.ts +++ b/packages/cubejs-schema-compiler/test/integration/postgres/pre-aggregations.test.ts @@ -1234,6 +1234,43 @@ describe('PreAggregations', () => { } }); + cube('sql_table_visitors', { + sqlTable: \`visitors\`, + + measures: { + count: { + type: 'count' + }, + }, + + dimensions: { + id: { + sql: 'id', + 
type: 'number', + primaryKey: true + }, + source: { + sql: 'source', + type: 'string' + }, + createdAt: { + sql: 'created_at', + type: 'time' + }, + }, + + preAggregations: { + main: { + type: 'originalSql', + }, + partitioned: { + type: 'originalSql', + partitionGranularity: 'month', + timeDimensionReference: createdAt, + }, + }, + }); + `); it('simple pre-aggregation', async () => { @@ -3642,4 +3679,62 @@ describe('PreAggregations', () => { expect(() => query.buildSqlAndParams()).toThrow('No rollups found that can be used for a rollup join'); }); } + + it('originalSql pre-aggregation with sqlTable', async () => { + await compiler.compile(); + const query = new PostgresQuery({ joinGraph, cubeEvaluator, compiler }, { + measures: [ + 'sql_table_visitors.count' + ], + dimensions: [ + 'sql_table_visitors.source' + ], + timeDimensions: [{ + dimension: 'sql_table_visitors.createdAt', + granularity: 'day', + dateRange: ['2017-01-01', '2017-01-30'] + }], + timezone: 'America/Los_Angeles', + order: [{ + id: 'sql_table_visitors.createdAt' + }], + preAggregationsSchema: '' + }); + + const queryAndParams = query.buildSqlAndParams(); + console.log(queryAndParams); + const preAggregationsDescription: any = query.preAggregations?.preAggregationsDescription(); + console.log(preAggregationsDescription); + expect(preAggregationsDescription.length).toBeGreaterThanOrEqual(1); + expect(preAggregationsDescription[0].type).toEqual('originalSql'); + expect(preAggregationsDescription[0].loadSql[0]).toMatch(/CREATE TABLE/); + expect(preAggregationsDescription[0].loadSql[0]).toMatch(/SELECT \* FROM visitors/); + + return dbRunner.evaluateQueryWithPreAggregations(query).then(res => { + expect(res).toEqual( + [ + { + sql_table_visitors__created_at_day: '2017-01-02T00:00:00.000Z', + sql_table_visitors__source: 'some', + sql_table_visitors__count: '1' + }, + { + sql_table_visitors__created_at_day: '2017-01-04T00:00:00.000Z', + sql_table_visitors__source: 'some', + sql_table_visitors__count: '1' 
+ }, + { + sql_table_visitors__created_at_day: '2017-01-05T00:00:00.000Z', + sql_table_visitors__source: 'google', + sql_table_visitors__count: '1' + }, + { + sql_table_visitors__created_at_day: '2017-01-06T00:00:00.000Z', + sql_table_visitors__source: null, + sql_table_visitors__count: '2' + } + ] + ); + }); + }); }); diff --git a/packages/cubejs-schema-compiler/test/unit/pre-aggregations.test.ts b/packages/cubejs-schema-compiler/test/unit/pre-aggregations.test.ts index abd3ff345aafe..634d3ad49c2b5 100644 --- a/packages/cubejs-schema-compiler/test/unit/pre-aggregations.test.ts +++ b/packages/cubejs-schema-compiler/test/unit/pre-aggregations.test.ts @@ -750,4 +750,89 @@ describe('pre-aggregations', () => { expect(preAggregationsDescription[0].preAggregationId).toEqual('orders.pre_agg_with_multiplied_measures'); }); }); + + describe('originalSql pre-aggregation with sqlTable', () => { + const { compiler, joinGraph, cubeEvaluator } = prepareJsCompiler(` + cube(\`orders\`, { + sqlTable: \`public.orders\`, + + measures: { + count: { + type: \`count\`, + }, + }, + + dimensions: { + id: { + sql: \`id\`, + type: \`number\`, + primaryKey: true, + }, + status: { + sql: \`status\`, + type: \`string\`, + }, + created_at: { + sql: \`created_at\`, + type: \`time\`, + }, + }, + + preAggregations: { + main: { + type: \`originalSql\`, + }, + partitioned: { + type: \`originalSql\`, + partitionGranularity: \`month\`, + timeDimension: CUBE.created_at, + }, + }, + }); + `); + + beforeAll(async () => { + await compiler.compile(); + }); + + it('generates sql and loadSql for non-partitioned originalSql pre-aggregation', async () => { + const query = new PostgresQuery({ joinGraph, cubeEvaluator, compiler }, { + measures: ['orders.count'], + dimensions: ['orders.status'], + preAggregationsSchema: '', + }); + + const preAggregationsDescription: any = query.preAggregations?.preAggregationsDescription(); + expect(preAggregationsDescription.length).toBeGreaterThanOrEqual(1); + const 
originalSqlDesc = preAggregationsDescription.find((d: any) => d.type === 'originalSql'); + expect(originalSqlDesc).toBeDefined(); + + // preAggregationSql() must produce a valid SELECT from the sqlTable + expect(originalSqlDesc.sql[0]).toMatch(/SELECT \* FROM public\.orders/); + expect(originalSqlDesc.loadSql[0]).toMatch(/CREATE TABLE/); + expect(originalSqlDesc.loadSql[0]).toMatch(/SELECT \* FROM public\.orders/); + }); + + it('generates sql and loadSql for partitioned originalSql pre-aggregation', async () => { + const query = new PostgresQuery({ joinGraph, cubeEvaluator, compiler }, { + measures: ['orders.count'], + timeDimensions: [{ + dimension: 'orders.created_at', + granularity: 'month', + dateRange: ['2017-01-01', '2017-03-31'], + }], + preAggregationsSchema: '', + }); + + const preAggregationsDescription: any = query.preAggregations?.preAggregationsDescription(); + expect(preAggregationsDescription.length).toBeGreaterThanOrEqual(1); + const originalSqlDesc = preAggregationsDescription.find((d: any) => d.type === 'originalSql'); + expect(originalSqlDesc).toBeDefined(); + + // preAggregationSql() must produce a valid SELECT from the sqlTable + expect(originalSqlDesc.sql[0]).toMatch(/SELECT \* FROM public\.orders/); + expect(originalSqlDesc.loadSql[0]).toMatch(/CREATE TABLE/); + expect(originalSqlDesc.loadSql[0]).toMatch(/SELECT \* FROM public\.orders/); + }); + }); }); diff --git a/rust/cubestore/cubestore/src/app_metrics.rs b/rust/cubestore/cubestore/src/app_metrics.rs index dff0b1dd43c2a..e2c1d7061a6a3 100644 --- a/rust/cubestore/cubestore/src/app_metrics.rs +++ b/rust/cubestore/cubestore/src/app_metrics.rs @@ -13,10 +13,16 @@ pub static WORKER_POOL_ERROR: Counter = metrics::counter("cs.worker_pool.errors" /// Incoming SQL queries that do data reads. 
pub static DATA_QUERIES: Counter = metrics::counter("cs.sql.query.data"); pub static DATA_QUERIES_CACHE_HIT: Counter = metrics::counter("cs.sql.query.data.cache.hit"); +pub static DATA_QUERIES_CACHE_STALE_HIT: Counter = + metrics::counter("cs.sql.query.data.cache.stale_hit"); // Approximate number of entries in this cache. pub static DATA_QUERIES_CACHE_SIZE: Gauge = metrics::gauge("cs.sql.query.data.cache.size"); // Approximate total weighted size of entries in this cache. pub static DATA_QUERIES_CACHE_WEIGHT: Gauge = metrics::gauge("cs.sql.query.data.cache.weight"); +pub static DATA_QUERIES_STALE_CACHE_SIZE: Gauge = + metrics::gauge("cs.sql.query.data.cache.stale.size"); +pub static DATA_QUERIES_STALE_CACHE_WEIGHT: Gauge = + metrics::gauge("cs.sql.query.data.cache.stale.weight"); pub static DATA_QUERY_TIME_MS: Histogram = metrics::histogram("cs.sql.query.data.ms"); pub static DATA_QUERY_LOGICAL_PLAN_TOTAL_CREATION_TIME_US: Histogram = metrics::histogram("cs.sql.query.data.planning.logical_plan.total_creation.us"); diff --git a/rust/cubestore/cubestore/src/config/mod.rs b/rust/cubestore/cubestore/src/config/mod.rs index c0bba2edc7109..8c3c78492af64 100644 --- a/rust/cubestore/cubestore/src/config/mod.rs +++ b/rust/cubestore/cubestore/src/config/mod.rs @@ -510,6 +510,8 @@ pub trait ConfigObj: DIService { fn query_cache_time_to_idle_secs(&self) -> Option; + fn query_cache_stale_while_revalidate_secs(&self) -> Option; + fn metadata_cache_max_capacity_bytes(&self) -> u64; fn metadata_cache_time_to_idle_secs(&self) -> u64; @@ -643,6 +645,7 @@ pub struct ConfigObjImpl { pub query_cache_max_capacity_bytes: u64, pub query_queue_cache_max_capacity: u64, pub query_cache_time_to_idle_secs: Option, + pub query_cache_stale_while_revalidate_secs: Option, pub metadata_cache_max_capacity_bytes: u64, pub metadata_cache_time_to_idle_secs: u64, pub stream_replay_check_interval_secs: u64, @@ -956,6 +959,9 @@ impl ConfigObj for ConfigObjImpl { fn query_cache_time_to_idle_secs(&self) -> 
Option { self.query_cache_time_to_idle_secs } + fn query_cache_stale_while_revalidate_secs(&self) -> Option { + self.query_cache_stale_while_revalidate_secs + } fn metadata_cache_max_capacity_bytes(&self) -> u64 { self.metadata_cache_max_capacity_bytes @@ -1233,6 +1239,8 @@ impl Config { pub fn default() -> Config { let query_timeout = env_parse("CUBESTORE_QUERY_TIMEOUT", 120); + let query_cache_stale_while_revalidate_secs: u64 = + env_parse("CUBESTORE_QUERY_CACHE_STALE_WHILE_REVALIDATE", 0); let query_cache_time_to_idle_secs = env_parse( "CUBESTORE_QUERY_CACHE_TIME_TO_IDLE", // 1 hour @@ -1528,6 +1536,13 @@ impl Config { } else { Some(query_cache_time_to_idle_secs) }, + query_cache_stale_while_revalidate_secs: if query_cache_stale_while_revalidate_secs + == 0 + { + None + } else { + Some(query_cache_stale_while_revalidate_secs) + }, metadata_cache_max_capacity_bytes: env_parse( "CUBESTORE_METADATA_CACHE_MAX_CAPACITY_BYTES", 0, @@ -1740,6 +1755,7 @@ impl Config { query_cache_max_capacity_bytes: 512 << 20, query_queue_cache_max_capacity: 10000, query_cache_time_to_idle_secs: Some(600), + query_cache_stale_while_revalidate_secs: None, metadata_cache_max_capacity_bytes: 0, metadata_cache_time_to_idle_secs: 1_000, meta_store_log_upload_interval: 30, @@ -2393,6 +2409,7 @@ impl Config { self.config_obj.query_cache_max_capacity_bytes(), self.config_obj.query_cache_time_to_idle_secs(), self.config_obj.query_queue_cache_max_capacity(), + self.config_obj.query_cache_stale_while_revalidate_secs(), )); let query_cache_to_move = query_cache.clone(); diff --git a/rust/cubestore/cubestore/src/queryplanner/mod.rs b/rust/cubestore/cubestore/src/queryplanner/mod.rs index b34ea6fb67d0d..47bf8a9c0ec65 100644 --- a/rust/cubestore/cubestore/src/queryplanner/mod.rs +++ b/rust/cubestore/cubestore/src/queryplanner/mod.rs @@ -1063,7 +1063,7 @@ pub mod tests { Arc::new(test_utils::MetaStoreMock {}), Arc::new(test_utils::CacheStoreMock {}), &vec![], - Arc::new(SqlResultCache::new(1 << 20, 
None, 10000)), + Arc::new(SqlResultCache::new(1 << 20, None, 10000, None)), Arc::new(SessionContext::new().state()), ) } diff --git a/rust/cubestore/cubestore/src/sql/cache.rs b/rust/cubestore/cubestore/src/sql/cache.rs index b1fdd3b586d58..fc5c2e55a49a6 100644 --- a/rust/cubestore/cubestore/src/sql/cache.rs +++ b/rust/cubestore/cubestore/src/sql/cache.rs @@ -10,7 +10,7 @@ use log::trace; use moka::future::{Cache, ConcurrentCacheExt, Iter}; use std::collections::{HashMap, HashSet}; use std::sync::Arc; -use std::time::Duration; +use std::time::{Duration, Instant}; use tokio::sync::{watch, Mutex}; #[derive(Clone, Hash, Eq, PartialEq, Debug, DeepSizeOf)] @@ -65,11 +65,19 @@ impl SqlQueueCacheKey { } } +#[derive(Clone)] +struct StaleEntry { + result: Arc, + created_at: Instant, +} + pub struct SqlResultCache { queue_cache: Mutex< lru::LruCache, CubeError>>>>, >, result_cache: Cache>, + stale_cache: Option>, + stale_while_revalidate_timeout: Option, create_table_cache: Mutex, CubeError>>>>>, } @@ -80,11 +88,20 @@ pub fn sql_result_cache_sizeof(key: &SqlResultCacheKey, df: &Arc) -> .unwrap_or(u32::MAX) } +fn stale_cache_sizeof(_key: &SqlQueueCacheKey, entry: &StaleEntry) -> u32 { + (std::mem::size_of::() + + std::mem::size_of::() + + entry.result.deep_size_of()) + .try_into() + .unwrap_or(u32::MAX) +} + impl SqlResultCache { pub fn new( capacity_bytes: u64, time_to_idle_secs: Option, queue_cache_max_capacity: u64, + stale_while_revalidate_secs: Option, ) -> Self { let cache_builder = if let Some(time_to_idle_secs) = time_to_idle_secs { Cache::builder().time_to_idle(Duration::from_secs(time_to_idle_secs)) @@ -92,24 +109,49 @@ impl SqlResultCache { Cache::builder() }; + let stale_while_revalidate_timeout = stale_while_revalidate_secs.map(Duration::from_secs); + + let stale_cache = stale_while_revalidate_timeout.map(|timeout| { + Cache::builder() + .time_to_idle(timeout * 2) + .max_capacity(capacity_bytes) + .weigher(stale_cache_sizeof) + .build() + }); + Self { 
queue_cache: Mutex::new(lru::LruCache::new(queue_cache_max_capacity as usize)), result_cache: cache_builder .max_capacity(capacity_bytes) .weigher(sql_result_cache_sizeof) .build(), + stale_cache, + stale_while_revalidate_timeout, create_table_cache: Mutex::new(HashMap::new()), } } + fn report_stale_cache_metrics(&self) { + if let Some(stale_cache) = &self.stale_cache { + app_metrics::DATA_QUERIES_STALE_CACHE_SIZE.report(stale_cache.entry_count() as i64); + app_metrics::DATA_QUERIES_STALE_CACHE_WEIGHT.report(stale_cache.weighted_size() as i64); + } + } + pub async fn clear(&self) { // invalidation will be done in the background self.result_cache.invalidate_all(); // it doesnt flush all, blocking, but it's ok because it's used in one command. self.result_cache.sync(); + if let Some(stale_cache) = &self.stale_cache { + stale_cache.invalidate_all(); + stale_cache.sync(); + } + app_metrics::DATA_QUERIES_CACHE_SIZE.report(self.result_cache.entry_count() as i64); app_metrics::DATA_QUERIES_CACHE_WEIGHT.report(self.result_cache.weighted_size() as i64); + self.report_stale_cache_metrics(); } pub fn entry_count(&self) -> u64 { @@ -120,93 +162,160 @@ impl SqlResultCache { self.result_cache.iter() } + fn try_get_stale(&self, queue_key: &SqlQueueCacheKey) -> Option> { + let stale_cache = self.stale_cache.as_ref()?; + let timeout = self.stale_while_revalidate_timeout?; + let entry = stale_cache.get(queue_key)?; + if entry.created_at.elapsed() <= timeout { + Some(entry.result) + } else { + None + } + } + + async fn update_stale_cache(&self, queue_key: &SqlQueueCacheKey, result: &Arc) { + if let Some(stale_cache) = &self.stale_cache { + stale_cache + .insert( + queue_key.clone(), + StaleEntry { + result: result.clone(), + created_at: Instant::now(), + }, + ) + .await; + } + } + #[tracing::instrument(level = "trace", skip(self, context, plan, exec))] pub async fn get( - &self, + self: &Arc, query: &str, context: SqlQueryContext, plan: SerializedPlan, - exec: impl 
FnOnce(SerializedPlan) -> F, + exec: impl FnOnce(SerializedPlan) -> F + Send + 'static, ) -> Result, CubeError> where F: Future> + Send + 'static, { - let result_key = SqlResultCacheKey::from_plan(query, &context.inline_tables, &plan); + Arc::clone(self) + .get_inner(query.to_string(), context, plan, exec, false) + .await + } + + fn get_inner( + self: Arc, + query: String, + context: SqlQueryContext, + plan: SerializedPlan, + exec: impl FnOnce(SerializedPlan) -> F + Send + 'static, + is_background_refresh: bool, + ) -> std::pin::Pin, CubeError>> + Send>> + where + F: Future> + Send + 'static, + { + Box::pin(async move { + let result_key = SqlResultCacheKey::from_plan(&query, &context.inline_tables, &plan); - if let Some(result) = self.result_cache.get(&result_key) { - app_metrics::DATA_QUERIES_CACHE_HIT.increment(); - trace!("Using result cache for '{}'", query); - return Ok(result); - } + if let Some(result) = self.result_cache.get(&result_key) { + app_metrics::DATA_QUERIES_CACHE_HIT.increment(); + trace!("Using result cache for '{}'", query); + return Ok(result); + } - let queue_key = SqlQueueCacheKey::from_query(query, &context.inline_tables); - let (sender, receiver) = { - let key = queue_key.clone(); - let mut cache = self.queue_cache.lock().await; + let queue_key = SqlQueueCacheKey::from_query(&query, &context.inline_tables); + + if !is_background_refresh { + if let Some(stale_result) = self.try_get_stale(&queue_key) { + app_metrics::DATA_QUERIES_CACHE_STALE_HIT.increment(); + trace!( + "Using stale-while-revalidate cache for '{}', spawning background refresh", + query + ); + let this = Arc::clone(&self); + let query_clone = query.clone(); + tokio::spawn(async move { + if let Err(e) = this.get_inner(query_clone, context, plan, exec, true).await + { + log::error!("Background stale-while-revalidate refresh failed: {}", e); + } + }); + return Ok(stale_result); + } + } - if cache.contains(&key) { - if let Some(receiver) = cache.get(&key) { - if 
receiver.has_changed().is_err() { - log::error!("Queue cache contains closed channel"); + let (sender, receiver) = { + let key = queue_key.clone(); + let mut cache = self.queue_cache.lock().await; + + if cache.contains(&key) { + if let Some(receiver) = cache.get(&key) { + if receiver.has_changed().is_err() { + log::error!("Queue cache contains closed channel"); + cache.pop(&key); + } + } else { + log::error!("Queue cache doesn't contain channel"); + cache.pop(&key); + } + } - } else { - log::error!("Queue cache doesn't contains channel"); - cache.pop(&key); - } - } - if !cache.contains(&key) { - let (tx, rx) = watch::channel(None); - cache.put(key, rx); + if !cache.contains(&key) { + let (tx, rx) = watch::channel(None); + cache.put(key, rx); - app_metrics::DATA_QUERIES_CACHE_SIZE.report(self.result_cache.entry_count() as i64); - app_metrics::DATA_QUERIES_CACHE_WEIGHT - .report(self.result_cache.weighted_size() as i64); + app_metrics::DATA_QUERIES_CACHE_SIZE + .report(self.result_cache.entry_count() as i64); + app_metrics::DATA_QUERIES_CACHE_WEIGHT + .report(self.result_cache.weighted_size() as i64); + self.report_stale_cache_metrics(); - (Some(tx), None) - } else { - (None, cache.get(&key).cloned()) - } - }; - - if let Some(sender) = sender { - trace!("Missing cache for '{}'", query); - let result = exec(plan).await.map(|d| Arc::new(d)); - if let Err(e) = sender.send(Some(result.clone())) { - trace!( - "Failed to set cached query result, possibly flushed from LRU cache: {}", - e - ); - } - match &result { - Ok(r) => { - if !self.result_cache.contains_key(&result_key) { - self.result_cache - .insert(result_key.clone(), r.clone()) - .await; - - app_metrics::DATA_QUERIES_CACHE_SIZE - .report(self.result_cache.entry_count() as i64); - app_metrics::DATA_QUERIES_CACHE_WEIGHT - .report(self.result_cache.weighted_size() as i64); - } + (Some(tx), None) + } else { + (None, cache.get(&key).cloned()) } - Err(_) => { - trace!("Removing error result from cache"); + }; + + if let 
Some(sender) = sender { + trace!("Missing cache for '{}'", query); + let result = exec(plan).await.map(|d| Arc::new(d)); + if let Err(e) = sender.send(Some(result.clone())) { + trace!( + "Failed to set cached query result, possibly flushed from LRU cache: {}", + e + ); + } + match &result { + Ok(r) => { + if !self.result_cache.contains_key(&result_key) { + self.result_cache + .insert(result_key.clone(), r.clone()) + .await; + + app_metrics::DATA_QUERIES_CACHE_SIZE + .report(self.result_cache.entry_count() as i64); + app_metrics::DATA_QUERIES_CACHE_WEIGHT + .report(self.result_cache.weighted_size() as i64); + } + self.update_stale_cache(&queue_key, r).await; + self.report_stale_cache_metrics(); + } + Err(_) => { + trace!("Removing error result from cache"); + } } - } - self.queue_cache.lock().await.pop(&queue_key); + self.queue_cache.lock().await.pop(&queue_key); - return result; - } + return result; + } - std::mem::drop(plan); - std::mem::drop(result_key); - std::mem::drop(context); + std::mem::drop(plan); + std::mem::drop(result_key); + std::mem::drop(context); - self.wait_for_queue(receiver, query).await + self.wait_for_queue(receiver, &query).await + }) } pub async fn create_table( @@ -286,6 +395,8 @@ impl Drop for SqlResultCache { fn drop(&mut self) { app_metrics::DATA_QUERIES_CACHE_SIZE.report(0); app_metrics::DATA_QUERIES_CACHE_WEIGHT.report(0); + app_metrics::DATA_QUERIES_STALE_CACHE_SIZE.report(0); + app_metrics::DATA_QUERIES_STALE_CACHE_WEIGHT.report(0); } } @@ -302,6 +413,7 @@ mod tests { use datafusion::logical_expr::{EmptyRelation, LogicalPlan}; use futures::future::join_all; use futures_timer::Delay; + use moka::future::ConcurrentCacheExt; use std::collections::HashMap; use std::sync::atomic::AtomicI64; use std::sync::atomic::Ordering; @@ -310,7 +422,7 @@ mod tests { #[tokio::test] async fn simple() -> Result<(), CubeError> { - let cache = SqlResultCache::new(1 << 20, Some(120), 1000); + let cache = Arc::new(SqlResultCache::new(1 << 20, Some(120), 
1000, None)); let schema = Arc::new(DFSchema::empty()); let plan = SerializedPlan::try_new( LogicalPlan::EmptyRelation(EmptyRelation { @@ -390,4 +502,239 @@ mod tests { ); Ok(()) } + + #[tokio::test] + async fn stale_while_revalidate() -> Result<(), CubeError> { + let cache = Arc::new(SqlResultCache::new(1 << 20, Some(120), 1000, Some(30))); + let schema = Arc::new(DFSchema::empty()); + let plan = SerializedPlan::try_new( + LogicalPlan::EmptyRelation(EmptyRelation { + produce_one_row: false, + schema, + }), + PlanningMeta { + indices: Vec::new(), + multi_part_subtree: HashMap::new(), + }, + None, + ) + .await?; + + let counter = Arc::new(AtomicI64::new(1)); + let counter_clone = counter.clone(); + let exec = async move |_p| { + Delay::new(Duration::from_millis(100)).await; + Ok(DataFrame::new( + Vec::new(), + vec![Row::new(vec![TableValue::Int( + counter_clone.fetch_add(1, Ordering::Relaxed), + )])], + )) + }; + + let result = cache + .get("SELECT 1", SqlQueryContext::default(), plan.clone(), exec) + .await?; + assert_eq!( + result.get_rows().get(0).unwrap().values().get(0).unwrap(), + &TableValue::Int(1) + ); + + // Simulate a partition change: clear the exact result cache so the next get() + // misses the exact key but still finds the stale entry (keyed by SQL only). + cache.result_cache.invalidate_all(); + cache.result_cache.sync(); + + let counter_clone2 = counter.clone(); + let exec2 = async move |_p| { + Delay::new(Duration::from_millis(500)).await; + Ok(DataFrame::new( + Vec::new(), + vec![Row::new(vec![TableValue::Int( + counter_clone2.fetch_add(1, Ordering::Relaxed), + )])], + )) + }; + + let stale_result = cache + .get("SELECT 1", SqlQueryContext::default(), plan.clone(), exec2) + .await?; + assert_eq!( + stale_result + .get_rows() + .get(0) + .unwrap() + .values() + .get(0) + .unwrap(), + &TableValue::Int(1), + "Should return stale result immediately" + ); + + // Wait for the background refresh to complete. 
+ Delay::new(Duration::from_millis(800)).await; + + // The background refresh should have populated the exact result cache. + // Verify by fetching again — this should hit the exact cache with the + // refreshed value, or at minimum the stale cache was updated. + let counter_clone3 = counter.clone(); + let exec3 = async move |_p| { + Ok(DataFrame::new( + Vec::new(), + vec![Row::new(vec![TableValue::Int( + counter_clone3.fetch_add(1, Ordering::Relaxed), + )])], + )) + }; + let fresh_result = cache + .get("SELECT 1", SqlQueryContext::default(), plan, exec3) + .await?; + + let val = fresh_result + .get_rows() + .get(0) + .unwrap() + .values() + .get(0) + .unwrap() + .clone(); + assert!( + val == TableValue::Int(2) || val == TableValue::Int(3), + "Should see the updated value from background refresh: got {:?}", + val + ); + + Ok(()) + } + + #[tokio::test] + async fn stale_while_revalidate_background_failure() -> Result<(), CubeError> { + let cache = Arc::new(SqlResultCache::new(1 << 20, Some(120), 1000, Some(30))); + let schema = Arc::new(DFSchema::empty()); + let plan = SerializedPlan::try_new( + LogicalPlan::EmptyRelation(EmptyRelation { + produce_one_row: false, + schema, + }), + PlanningMeta { + indices: Vec::new(), + multi_part_subtree: HashMap::new(), + }, + None, + ) + .await?; + + let exec = async move |_p| { + Ok(DataFrame::new( + Vec::new(), + vec![Row::new(vec![TableValue::Int(42)])], + )) + }; + cache + .get("SELECT 1", SqlQueryContext::default(), plan.clone(), exec) + .await?; + + // Simulate partition change + cache.result_cache.invalidate_all(); + cache.result_cache.sync(); + + let exec_fail = async move |_p| -> Result { + Err(CubeError::internal("background exec failed".to_string())) + }; + + let stale_result = cache + .get( + "SELECT 1", + SqlQueryContext::default(), + plan.clone(), + exec_fail, + ) + .await?; + assert_eq!( + stale_result + .get_rows() + .get(0) + .unwrap() + .values() + .get(0) + .unwrap(), + &TableValue::Int(42), + "Should still 
return stale result when background refresh will fail" + ); + + Delay::new(Duration::from_millis(200)).await; + + // Stale entry should still be intact after background failure + cache.result_cache.invalidate_all(); + cache.result_cache.sync(); + + let exec_after = async move |_p| { + Ok(DataFrame::new( + Vec::new(), + vec![Row::new(vec![TableValue::Int(99)])], + )) + }; + let result = cache + .get("SELECT 1", SqlQueryContext::default(), plan, exec_after) + .await?; + assert_eq!( + result.get_rows().get(0).unwrap().values().get(0).unwrap(), + &TableValue::Int(42), + "Stale entry should still be available after background failure" + ); + + Ok(()) + } + + #[tokio::test] + async fn stale_while_revalidate_timeout_expiry() -> Result<(), CubeError> { + let cache = Arc::new(SqlResultCache::new(1 << 20, Some(120), 1000, Some(1))); + let schema = Arc::new(DFSchema::empty()); + let plan = SerializedPlan::try_new( + LogicalPlan::EmptyRelation(EmptyRelation { + produce_one_row: false, + schema, + }), + PlanningMeta { + indices: Vec::new(), + multi_part_subtree: HashMap::new(), + }, + None, + ) + .await?; + + let exec = async move |_p| { + Ok(DataFrame::new( + Vec::new(), + vec![Row::new(vec![TableValue::Int(1)])], + )) + }; + cache + .get("SELECT 1", SqlQueryContext::default(), plan.clone(), exec) + .await?; + + // Simulate partition change + cache.result_cache.invalidate_all(); + cache.result_cache.sync(); + + // Wait for the 1-second stale timeout to expire + Delay::new(Duration::from_millis(1200)).await; + + let exec2 = async move |_p| { + Ok(DataFrame::new( + Vec::new(), + vec![Row::new(vec![TableValue::Int(99)])], + )) + }; + let result = cache + .get("SELECT 1", SqlQueryContext::default(), plan, exec2) + .await?; + assert_eq!( + result.get_rows().get(0).unwrap().values().get(0).unwrap(), + &TableValue::Int(99), + "After stale timeout expires, should execute fresh query instead of serving stale" + ); + + Ok(()) + } } diff --git a/rust/cubestore/cubestore/src/sql/mod.rs 
b/rust/cubestore/cubestore/src/sql/mod.rs index ad21a22acd8db..2ab0f11032b9b 100644 --- a/rust/cubestore/cubestore/src/sql/mod.rs +++ b/rust/cubestore/cubestore/src/sql/mod.rs @@ -2004,6 +2004,7 @@ mod tests { config.config_obj().query_cache_max_capacity_bytes(), config.config_obj().query_cache_time_to_idle_secs(), 1000, + None, )), BasicProcessRateLimiter::new(), ); @@ -2088,6 +2089,7 @@ mod tests { config.config_obj().query_cache_max_capacity_bytes(), config.config_obj().query_cache_time_to_idle_secs(), 1000, + None, )), BasicProcessRateLimiter::new(), ); @@ -2207,6 +2209,7 @@ mod tests { config.config_obj().query_cache_max_capacity_bytes(), config.config_obj().query_cache_time_to_idle_secs(), 1000, + None, )), BasicProcessRateLimiter::new(), );
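A note on the stale-while-revalidate flow added in `SqlResultCache::get_inner`: serve the exact result-cache hit when present; otherwise, if a stale entry (keyed by SQL text only, not by serialized plan) is younger than `stale_while_revalidate_timeout`, return it immediately and refresh in the background; otherwise execute synchronously. A minimal std-only sketch of that control flow, with hypothetical names (`SwrCache`, `fresh`, `stale`); the real cache uses `moka` and tokio:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical std-only sketch of stale-while-revalidate.
// `fresh` stands in for the exact `result_cache`; `stale` stands in for the
// SQL-keyed `stale_cache`, which records `created_at` per entry.
struct SwrCache {
    fresh: Mutex<HashMap<String, String>>,
    stale: Mutex<HashMap<String, (String, Instant)>>,
    stale_timeout: Duration,
}

impl SwrCache {
    fn new(stale_timeout: Duration) -> Arc<Self> {
        Arc::new(SwrCache {
            fresh: Mutex::new(HashMap::new()),
            stale: Mutex::new(HashMap::new()),
            stale_timeout,
        })
    }

    // Associated fn so the stale branch can clone the Arc for the background job.
    fn get(
        this: &Arc<SwrCache>,
        key: &str,
        exec: impl FnOnce() -> String + Send + 'static,
    ) -> String {
        // 1. Fresh hit: return it.
        if let Some(v) = this.fresh.lock().unwrap().get(key) {
            return v.clone();
        }
        // 2. Stale hit within the timeout: serve stale, refresh in the background.
        if let Some((v, created_at)) = this.stale.lock().unwrap().get(key).cloned() {
            if created_at.elapsed() <= this.stale_timeout {
                let bg = Arc::clone(this);
                let key = key.to_string();
                thread::spawn(move || bg.insert(&key, exec()));
                return v;
            }
        }
        // 3. Miss, or expired stale entry: execute synchronously.
        let result = exec();
        this.insert(key, result.clone());
        result
    }

    // Every successful execution updates both maps, restamping `created_at`.
    fn insert(&self, key: &str, value: String) {
        self.fresh.lock().unwrap().insert(key.to_string(), value.clone());
        self.stale
            .lock()
            .unwrap()
            .insert(key.to_string(), (value, Instant::now()));
    }
}
```

In the patch itself, `try_get_stale` performs the `created_at` check, the `is_background_refresh` flag keeps the spawned refresh from re-entering the stale path, and capacity/weighing/metrics are handled by `moka` plus `report_stale_cache_metrics`.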
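The `CUBESTORE_QUERY_CACHE_STALE_WHILE_REVALIDATE` wiring in `config/mod.rs` follows the file's existing convention that `0` disables a feature: the raw value is parsed as `u64` and then mapped to `Option<u64>` (`None` = disabled). A sketch of that pattern, assuming a simplified `env_parse`-style helper (`env_parse_u64` and `secs_or_disabled` are hypothetical names):

```rust
use std::env;

// Hypothetical helper mirroring the `env_parse` convention in config/mod.rs:
// read an env var as u64, falling back to the default when unset or invalid.
fn env_parse_u64(name: &str, default: u64) -> u64 {
    env::var(name)
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(default)
}

// `0` means "feature disabled", so it maps to `None` in the config object.
fn secs_or_disabled(raw: u64) -> Option<u64> {
    if raw == 0 {
        None
    } else {
        Some(raw)
    }
}
```

So `secs_or_disabled(env_parse_u64("CUBESTORE_QUERY_CACHE_STALE_WHILE_REVALIDATE", 0))` yields `None` unless the variable is set to a positive number of seconds, matching the `if ... == 0 { None } else { Some(...) }` branch in the diff.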
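For context, `get_inner` preserves the pre-existing queue deduplication: the first caller for a query key registers a channel in `queue_cache` and executes; concurrent callers with the same key wait on that channel instead of re-running the query. A rough std-only analogue using `Mutex` + `Condvar` in place of `tokio::sync::watch` (hypothetical names):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

// One shared slot per in-flight query key (the real code keeps a
// tokio::sync::watch receiver per key in `queue_cache`).
#[derive(Default)]
struct Slot {
    result: Mutex<Option<String>>,
    ready: Condvar,
}

struct QueryQueue {
    in_flight: Mutex<HashMap<String, Arc<Slot>>>,
}

impl QueryQueue {
    fn new() -> Self {
        QueryQueue {
            in_flight: Mutex::new(HashMap::new()),
        }
    }

    fn get_or_exec(&self, key: &str, exec: impl FnOnce() -> String) -> String {
        // First caller for a key becomes the leader; everyone else waits.
        let (slot, leader) = {
            let mut map = self.in_flight.lock().unwrap();
            match map.get(key) {
                Some(s) => (Arc::clone(s), false),
                None => {
                    let s = Arc::new(Slot::default());
                    map.insert(key.to_string(), Arc::clone(&s));
                    (s, true)
                }
            }
        };
        if leader {
            let value = exec();
            *slot.result.lock().unwrap() = Some(value.clone());
            slot.ready.notify_all();
            self.in_flight.lock().unwrap().remove(key);
            value
        } else {
            // Waiters loop to tolerate spurious wakeups.
            let mut guard = slot.result.lock().unwrap();
            while guard.is_none() {
                guard = slot.ready.wait(guard).unwrap();
            }
            guard.clone().unwrap()
        }
    }
}
```

This is why the stale path in the patch checks `queue_cache` keys (`SqlQueueCacheKey`, SQL-only) rather than result keys: the stale answer must correspond to the same query text that a concurrent caller would otherwise queue behind.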