Commit 1af43a6

Add Databricks dialect documentation (#291)

- New dialect page with connection config, table references, limitations, functions
- Add Databricks connection parameters to config.malloynb
- Add Databricks to database support page, sidebar, and connections page
1 parent c77d679 commit 1af43a6

5 files changed

Lines changed: 93 additions & 0 deletions

src/documentation/language/connections.malloynb

Lines changed: 4 additions & 0 deletions
@@ -25,6 +25,10 @@ In the official Malloy connection implementations, the behavior is as follows:

In BigQuery, the string passed to the `.table()` connection method can be a two- or three-segment path including the (optional) project ID, dataset ID, and table name, e.g. `bigquery.table('project-id.dataset-id.table-name')` or `bigquery.table('dataset-id.table-name')`. If the project ID is left off, the default project ID for the connection will be used, or else the system default if none is set on the connection.

### Databricks

In Databricks, the string passed to the `.table()` connection method can be a one-, two-, or three-segment path: `table`, `schema.table`, or `catalog.schema.table`. If the catalog or schema is omitted, the configured defaults (or workspace defaults) are used.

### DuckDB

In DuckDB, the `.table()` method accepts the path (relative to the Malloy file) of a CSV, JSON, or Parquet file containing the table data, e.g. `duckdb.table('data/users.csv')` or `duckdb.table('../../users.parquet')`. URLs to such files (or APIs) are also allowed: see [an example here](../patterns/apijson.malloynb).

src/documentation/language/dialect/databricks.malloynb

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
>>>markdown
# Databricks

The Databricks connection uses the [Databricks SQL Connector](https://docs.databricks.com/en/dev-tools/nodejs-sql-driver.html) to connect to Databricks SQL warehouses and clusters.

## Connection Configuration

In `malloy-config.json`:

```json
{
  "connections": {
    "databricks": {
      "is": "databricks",
      "host": "my-workspace.cloud.databricks.com",
      "path": "/sql/1.0/warehouses/abc123",
      "token": {"env": "DATABRICKS_TOKEN"}
    }
  }
}
```

| Parameter | Type | Description |
|---|---|---|
| `host` | string | Databricks workspace hostname (e.g. `my-workspace.cloud.databricks.com`) |
| `path` | string | SQL warehouse HTTP path (e.g. `/sql/1.0/warehouses/abc123`) |
| `token` | secret | Personal access token (optional if using OAuth) |
| `oauthClientId` | string | OAuth M2M client ID (optional) |
| `oauthClientSecret` | secret | OAuth M2M client secret (optional) |
| `defaultCatalog` | string | Default Unity Catalog name (optional) |
| `defaultSchema` | string | Default schema name (optional) |
| `setupSQL` | text | Connection setup SQL ([see configuration docs](../../setup/config.malloynb#setup-sql)) |

Authentication uses either a **personal access token** (`token`) or **OAuth M2M** (`oauthClientId` + `oauthClientSecret`).
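
For the OAuth M2M case, a minimal sketch of the same `malloy-config.json` entry (not part of the committed page) might look like the following. The host and path are the placeholder values from the personal access token example; the client ID value and the environment variable name `DATABRICKS_CLIENT_SECRET` are hypothetical, and using the `{"env": ...}` form for `oauthClientSecret` is an assumption based on it being a `secret` parameter like `token`.

```json
{
  "connections": {
    "databricks": {
      "is": "databricks",
      "host": "my-workspace.cloud.databricks.com",
      "path": "/sql/1.0/warehouses/abc123",
      "oauthClientId": "my-client-id",
      "oauthClientSecret": {"env": "DATABRICKS_CLIENT_SECRET"}
    }
  }
}
```

Here `token` is simply left out, since the parameter table marks it as optional when OAuth is used.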

## Table References

The `.table()` method accepts a one-, two-, or three-segment path: `table`, `schema.table`, or `catalog.schema.table`. If the catalog or schema is omitted, the configured defaults (or workspace defaults) are used.

```malloy
source: flights is databricks.table('malloytest.flights')
source: orders is databricks.table('my_catalog.analytics.orders')
```

## Limitations

- **`string_agg` ordering**: Databricks does not support `ORDER BY` inside `COLLECT_LIST`/`COLLECT_SET`, so `string_agg` and `string_agg_distinct` do not support the `order_by` modifier.
- **`TIMESTAMP_NTZ`**: Databricks' `TIMESTAMP_NTZ` (timestamp without timezone) maps to `sql native` in Malloy. Use explicit casting to `timestamp` when needed.
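
A minimal sketch of working within both limitations (not part of the committed page). The table `analytics.events` and the columns `user_name`, `event_name`, and `created_ntz` are hypothetical.

```malloy
source: events is databricks.table('analytics.events') extend {
  // created_ntz is assumed to be a TIMESTAMP_NTZ column; cast it explicitly
  // so it can be used as a Malloy timestamp.
  dimension: created_at is created_ntz::timestamp

  // No order_by modifier here, since the Databricks dialect does not support it.
  measure: event_names is string_agg(event_name, ', ')
}

run: events -> {
  group_by:
    user_name
    created_day is created_at.day
  aggregate: event_names
}
```

If a stable element order matters, it would have to be handled outside `string_agg` under this limitation.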

# Functions

## Useful Functions not in the database function library

string_agg_distinct

## Database Functions

In addition to the [Malloy Standard Functions](../functions.malloynb), Malloy code can reference
any of the functions listed here without needing to use [Raw SQL Functions](../functions.malloynb#raw-sql-functions).

string_agg
repeat
reverse

# External Resources

* [Databricks SQL Reference](https://docs.databricks.com/en/sql/language-manual/index.html)

src/documentation/setup/config.malloynb

Lines changed: 15 additions & 0 deletions
@@ -58,6 +58,21 @@ The `is` field identifies the connection type. All other fields are type-specifi
| `billingProjectId` | string | Billing project (if different) |
| `setupSQL` | text | Connection setup SQL ([see below](#setup-sql)) |

### `databricks` — Databricks

| Parameter | Type | Description |
|---|---|---|
| `host` | string | Workspace hostname (e.g. `my-workspace.cloud.databricks.com`) |
| `path` | string | SQL warehouse HTTP path (e.g. `/sql/1.0/warehouses/abc123`) |
| `token` | secret | Personal access token (optional if using OAuth) |
| `oauthClientId` | string | OAuth M2M client ID |
| `oauthClientSecret` | secret | OAuth M2M client secret |
| `defaultCatalog` | string | Default Unity Catalog name |
| `defaultSchema` | string | Default schema name |
| `setupSQL` | text | Connection setup SQL ([see below](#setup-sql)) |

Authentication: provide either `token` or the `oauthClientId` + `oauthClientSecret` pair.

### `postgres` — PostgreSQL

| Parameter | Type | Description |

src/documentation/setup/database_support.malloynb

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,7 @@ Malloy connects to a variety of databases. This page provides an overview of sup
| [DuckDB](../language/dialect/duckdb.malloynb) | `duckdb` | Built-in, no setup required. Reads Parquet, CSV, JSON files |
| [MotherDuck](../language/dialect/duckdb.malloynb) | `duckdb` | Cloud-hosted DuckDB |
| [BigQuery](../language/dialect/bigquery.malloynb) | `bigquery` | OAuth or service account authentication |
| [Databricks](../language/dialect/databricks.malloynb) | `databricks` | Personal access token or OAuth M2M |
| [Snowflake](../language/dialect/snowflake.malloynb) | `snowflake` | Password or RSA key authentication |
| [PostgreSQL](../language/dialect/postgres.malloynb) | `postgres` | Standard credentials |
| [MySQL](../language/dialect/mysql.malloynb) | `mysql` | Standard credentials |
@@ -74,6 +75,7 @@ Each database supports [Malloy Standard Functions](../language/functions.malloyn
|----------|-------------------|
| DuckDB | [DuckDB Functions](https://duckdb.org/docs/sql/functions/overview) |
| BigQuery | [BigQuery Functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators) |
| Databricks | [Databricks Functions](https://docs.databricks.com/en/sql/language-manual/sql-ref-functions-builtin.html) |
| Snowflake | [Snowflake Functions](https://docs.snowflake.com/en/sql-reference-functions) |
| PostgreSQL | [PostgreSQL Functions](https://www.postgresql.org/docs/current/functions.html) |
| MySQL | [MySQL Functions](https://dev.mysql.com/doc/refman/8.0/en/functions.html) |
@@ -90,6 +92,7 @@ Each database has unique capabilities and limitations. See the dialect documenta

- [DuckDB](../language/dialect/duckdb.malloynb) - File-based queries, approximate counts, full bigint precision
- [BigQuery](../language/dialect/bigquery.malloynb) - HyperLogLog, approximate counts, full bigint precision
- [Databricks](../language/dialect/databricks.malloynb) - Unity Catalog, SQL warehouses, full bigint precision
- [Snowflake](../language/dialect/snowflake.malloynb) - HyperLogLog, TOML credential file support, full bigint precision
- [PostgreSQL](../language/dialect/postgres.malloynb) - String aggregation extensions, limited bigint precision
- [MySQL](../language/dialect/mysql.malloynb) - Boolean type workarounds, full bigint precision

src/table_of_contents.json

Lines changed: 4 additions & 0 deletions
@@ -218,6 +218,10 @@
  "title": "BigQuery",
  "link": "/language/dialect/bigquery.malloynb"
},
{
  "title": "Databricks",
  "link": "/language/dialect/databricks.malloynb"
},
{
  "title": "Presto / Trino",
  "link": "/language/dialect/presto-trino.malloynb"
