Apache Doris is a modern analytical database product based on an MPP architecture. It provides real-time analytical capabilities, supporting both high-concurrency point queries and high-throughput complex analysis.
SQLMesh supports Doris through its MySQL-compatible protocol, while providing Doris-specific optimizations for table models, indexing, partitioning, and other features. The adapter is designed to leverage Doris's strengths for analytical workloads, with sensible defaults and support for advanced configuration.
doris:
connection:
type: doris
host: fe.doris.cluster # Frontend (FE) node address
port: 9030 # Query port (default: 9030)
user: doris_user
password: your_password
database: your_database
# Optional MySQL-compatible settings
charset: utf8mb4
connect_timeout: 60
state_connection:
# Use duckdb as state connection
type: duckdbDoris supports three table models: DUPLICATE, UNIQUE, and AGGREGATE. SQLMesh supports DUPLICATE and UNIQUE models through the physical_properties configuration.
Example Configuration:
MODEL (
name user_events,
kind FULL,
physical_properties (
duplicate_key ('user_id', 'event_time'),
distributed_by (
kind = 'HASH',
expressions = 'user_id',
buckets = 10
)
)
);Example Configuration:
MODEL (
name user_events,
kind FULL,
physical_properties (
unique_key 'user_id',
distributed_by (
kind = 'HASH',
expressions = 'user_id',
buckets = 16
)
)
);The Doris adapter supports a comprehensive set of table properties that can be configured in the physical_properties section of your model.
| Property | Type | Description | Example |
|---|---|---|---|
unique_key |
Tuple[str] or str |
Defines unique key columns for UNIQUE model | ('user_id') or 'user_id' |
duplicate_key |
Tuple[str] or str |
Defines key columns for DUPLICATE model | ('user_id', 'event_time') |
distributed_by |
Dict |
Distribution configuration | See Distribution section |
partitions |
Tuple[str] or str |
Custom partition expression | 'FROM ("2000-11-14") TO ("2099-11-14") INTERVAL 1 MONTH' |
The distributed_by property supports multiple formats:
Dictionary Format:
MODEL (
name my_table,
kind FULL,
physical_properties (
distributed_by (
kind = 'HASH',
expressions = 'user_id',
buckets = 10
)
)
);MODEL (
name my_table,
kind FULL,
physical_properties (
distributed_by (
kind = 'RANDOM'
)
)
);Supported Distribution Types:
HASH: Hash-based distribution (most common)RANDOM: Random distribution
Bucket Configuration:
- Integer value: Fixed number of buckets (e.g.,
10) 'AUTO': Automatic bucket calculation
Doris table supports range partitioning and list partitioning to improve query performance.
Custom Partition Expression:
MODEL (
name my_partitioned_model,
kind INCREMENTAL_BY_TIME_RANGE(time_column (event_date, '%Y-%m-%d')),
partitioned_by RANGE(event_date),
physical_properties (
partitions = 'FROM ("2000-11-14") TO ("2099-11-14") INTERVAL 2 YEAR',
),
);MODEL (
name my_custom_partitioned_model,
kind FULL,
partitioned_by RANGE(event_date),
physical_properties (
partitions = (
'PARTITION `p2023` VALUES [("2023-01-01"), ("2024-01-01"))',
'PARTITION `p2024` VALUES [("2024-01-01"), ("2025-01-01"))',
'PARTITION `p2025` VALUES [("2025-01-01"), ("2026-01-01"))',
'PARTITION `other` VALUES LESS THAN MAXVALUE'
),
)
);Any additional properties in physical_properties are passed through as Doris table properties:
MODEL (
name advanced_table,
kind FULL,
physical_properties (
unique_key = 'id',
distributed_by (
kind = 'HASH',
expressions = 'id',
buckets = 8
),
replication_allocation = 'tag.location.default: 3',
in_memory = 'false',
storage_format = 'V2',
disable_auto_compaction = 'false',
)
);SQLMesh supports creating materialized views in Doris with comprehensive configuration options.
MODEL (
name user_summary_mv,
kind VIEW (
materialized true
)
);
SELECT
user_id,
COUNT(*) as event_count,
MAX(event_time) as last_event
FROM user_events
GROUP BY user_id;MODEL (
name sqlmesh_test.view_materialized1,
kind VIEW (
materialized true
),
partitioned_by ds,
physical_properties (
build = 'IMMEDIATE',
refresh = 'AUTO',
refresh_trigger = 'ON SCHEDULE EVERY 12 hour',
unique_key = id,
distributed_by = (kind='HASH', expressions=id, buckets=10),
replication_allocation = 'tag.location.default: 3',
in_memory = 'false',
storage_format = 'V2',
disable_auto_compaction = 'false'
),
description "customer zip",
columns (
id int,
ds datetime,
zip int,
),
column_descriptions (
id = "order id",
zip = "zip code",
)
);| Property | Description | Values |
|---|---|---|
build |
Build strategy | 'IMMEDIATE', 'DEFERRED' |
refresh |
Refresh strategy | 'COMPLETE', 'AUTO' |
refresh_trigger |
Schedule for automatic refresh | 'MANUAL', 'ON SCHEDULE INTERVAL 1 HOUR', 'ON COMMIT' |
unique_key |
Unique key columns | 'user_id' or ['user_id', 'date'] |
duplicate_key |
Duplicate key columns | 'user_id' or ['user_id', 'date'] |
materialized_type |
Materialized type | SYNC, ASYNC |
source_table |
Source table of synchronous materialized view | schema_name.table_name |
SQLMesh supports creating indexes in Doris to accelerate queries. You can define indexes in your model's DDL.
Example:
MODEL (
name my_indexed_table,
kind FULL
);
SELECT
user_id,
username,
city
FROM
users;
@IF(
@runtime_stage = 'creating',
CREATE INDEX idx_username ON my_indexed_table (username) USING INVERTED COMMENT 'Inverted index on username'
);SQLMesh supports adding comments to tables and columns with automatic truncation to Doris limits.
- Table Comments: Use the
descriptionproperty in theMODELdefinition - Column Comments: Use the
column_descriptionsproperty in theMODELdefinition
MODEL (
name my_commented_table,
kind TABLE,
description 'This is a comprehensive table comment that describes the purpose and usage of this table in detail.',
column_descriptions (
id = "Unique identifier for each record",
user_id = "Foreign key reference to users table",
event_type = "Type of event that occurred"
)
);Limits:
- Table comments: 2048 characters (automatically truncated)
- Column comments: 255 characters (automatically truncated)
SQLMesh supports both regular and materialized views in Doris.
MODEL (
name user_summary_view,
kind VIEW
);
SELECT
user_id,
COUNT(*) as event_count,
MAX(event_time) as last_event
FROM user_events
GROUP BY user_id;To use Doris with SQLMesh, install the required MySQL driver:
pip install "sqlmesh[doris]"
# or
pip install pymysql