diff --git a/TOC-tidb-cloud-lake.md b/TOC-tidb-cloud-lake.md index f412e26144faf..2888d52c6f326 100644 --- a/TOC-tidb-cloud-lake.md +++ b/TOC-tidb-cloud-lake.md @@ -12,6 +12,7 @@ - Resources - [Dashboards](/tidb-cloud-lake/guides/dashboards.md) + - [Task Flow](/tidb-cloud-lake/guides/task-flow.md) - [Warehouses](/tidb-cloud-lake/guides/warehouse.md) - [Worksheets](/tidb-cloud-lake/guides/worksheet.md) - Administration @@ -89,11 +90,12 @@ - [Track and Transform Data via Streams](/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md) - [Automate Data Loading with Tasks](/tidb-cloud-lake/guides/automate-data-loading-with-tasks.md) - Data Unloading + - [Overview](/tidb-cloud-lake/guides/unload-data.md) - [Unload Parquet File](/tidb-cloud-lake/guides/unload-parquet-file.md) - [Unload CSV File](/tidb-cloud-lake/guides/unload-csv-file.md) - [Unload TSV File](/tidb-cloud-lake/guides/unload-tsv-file.md) - [Unload NDJSON File](/tidb-cloud-lake/guides/unload-ndjson-file.md) - - [Unload Data from Databend](/tidb-cloud-lake/guides/unload-data-from-databend.md) + - [Unload Lance Dataset](/tidb-cloud-lake/guides/unload-lance-dataset.md) - AI and ML Integration - [Overview](/tidb-cloud-lake/guides/ai-ml-integration.md) - [External AI Functions](/tidb-cloud-lake/guides/external-ai-functions.md) @@ -234,16 +236,16 @@ - [SQL Dialects & Conformance](/tidb-cloud-lake/sql/sql-dialects-conformance.md) - SQL Statements - [Overview](/tidb-cloud-lake/sql/sql-statements-reference.md) - - DDL Statements + - DDL Commands - [DDL Overview](/tidb-cloud-lake/sql/ddl.md) - Database - [Overview](/tidb-cloud-lake/sql/ddl-database-overview.md) - - [RENAME DATABASE](/tidb-cloud-lake/sql/rename-database.md) - [CREATE DATABASE](/tidb-cloud-lake/sql/create-database.md) - - [DROP DATABASE](/tidb-cloud-lake/sql/drop-database.md) - - [USE DATABASE](/tidb-cloud-lake/sql/use-database.md) - [SHOW CREATE DATABASE](/tidb-cloud-lake/sql/show-create-database.md) + - [USE 
DATABASE](/tidb-cloud-lake/sql/use-database.md) + - [ALTER DATABASE](/tidb-cloud-lake/sql/alter-database.md) - [SHOW DATABASES](/tidb-cloud-lake/sql/show-databases.md) + - [DROP DATABASE](/tidb-cloud-lake/sql/drop-database.md) - [SHOW DROP DATABASES](/tidb-cloud-lake/sql/show-drop-databases.md) - [UNDROP DATABASE](/tidb-cloud-lake/sql/undrop-database.md) - Table @@ -326,6 +328,12 @@ - [CREATE NOTIFICATION INTEGRATION](/tidb-cloud-lake/sql/create-notification-integration.md) - [ALTER NOTIFICATION INTEGRATION](/tidb-cloud-lake/sql/alter-notification-integration.md) - [DROP NOTIFICATION INTEGRATION](/tidb-cloud-lake/sql/drop-notification-integration.md) + - Tag + - [Overview](/tidb-cloud-lake/sql/tag-overview.md) + - [CREATE TAG](/tidb-cloud-lake/sql/create-tag.md) + - [DROP TAG](/tidb-cloud-lake/sql/drop-tag.md) + - [SHOW TAGS](/tidb-cloud-lake/sql/show-tags.md) + - [SET TAG](/tidb-cloud-lake/sql/set-tag.md) - Connection - [Overview](/tidb-cloud-lake/sql/connection.md) - [CREATE CONNECTION](/tidb-cloud-lake/sql/create-connection.md) @@ -425,13 +433,17 @@ - [QUERY_HISTORY](/tidb-cloud-lake/sql/query-history.md) - [SHOW WAREHOUSES](/tidb-cloud-lake/sql/show-warehouses.md) - [USE WAREHOUSE](/tidb-cloud-lake/sql/use-warehouse.md) - - workload group + - Workload Group - [ALTER WORKLOAD GROUP](/tidb-cloud-lake/sql/alter-workload-group.md) - [CREATE WORKLOAD GROUP](/tidb-cloud-lake/sql/create-workload-group.md) - [DROP WORKLOAD GROUP](/tidb-cloud-lake/sql/drop-workload-group.md) - [Workload Group](/tidb-cloud-lake/sql/workload-group.md) - [RENAME WORKLOAD GROUP](/tidb-cloud-lake/sql/rename-workload-group.md) - [SHOW WORKLOAD GROUPS](/tidb-cloud-lake/sql/show-workload-groups.md) + - Table Versioning + - [Overview](/tidb-cloud-lake/sql/table-versioning.md) + - [CREATE SNAPSHOT TAG](/tidb-cloud-lake/sql/create-snapshot-tag.md) + - [DROP SNAPSHOT TAG](/tidb-cloud-lake/sql/drop-snapshot-tag.md) - DML Commands - [DML Overview](/tidb-cloud-lake/sql/dml.md) - [`COPY INTO 
`](/tidb-cloud-lake/sql/copy-into-location.md) @@ -839,9 +851,12 @@ - [ROWS BETWEEN](/tidb-cloud-lake/sql/rows-between.md) - Geospatial Functions - [Overview](/tidb-cloud-lake/sql/geospatial-functions.md) + - [GEO_DISTANCE](/tidb-cloud-lake/sql/geo-distance.md) - [GEO_TO_H3](/tidb-cloud-lake/sql/geo-to-h3.md) - [GEOHASH_DECODE](/tidb-cloud-lake/sql/geohash-decode.md) - [GEOHASH_ENCODE](/tidb-cloud-lake/sql/geohash-encode.md) + - [GREAT_CIRCLE_ANGLE](/tidb-cloud-lake/sql/great-circle-angle.md) + - [GREAT_CIRCLE_DISTANCE](/tidb-cloud-lake/sql/great-circle-distance.md) - [H3_CELL_AREA_M2](/tidb-cloud-lake/sql/h3-cell-area-m2.md) - [H3_CELL_AREA_RADS2](/tidb-cloud-lake/sql/h3-cell-area-rads2.md) - [H3_DISTANCE](/tidb-cloud-lake/sql/h3-distance.md) @@ -878,6 +893,7 @@ - [H3_TO_STRING](/tidb-cloud-lake/sql/h3-to-string.md) - [H3_UNIDIRECTIONAL_EDGE_IS_VALID](/tidb-cloud-lake/sql/h3-unidirectional-edge-is-valid.md) - [HAVERSINE](/tidb-cloud-lake/sql/haversine.md) + - [POINT_IN_ELLIPSES](/tidb-cloud-lake/sql/point-in-ellipses.md) - [POINT_IN_POLYGON](/tidb-cloud-lake/sql/point-in-polygon.md) - [ST_AREA](/tidb-cloud-lake/sql/st-area.md) - [ST_ASBINARY](/tidb-cloud-lake/sql/st-asbinary.md) @@ -887,10 +903,17 @@ - [ST_ASTEXT](/tidb-cloud-lake/sql/st-astext.md) - [ST_ASWKB](/tidb-cloud-lake/sql/st-aswkb.md) - [ST_ASWKT](/tidb-cloud-lake/sql/st-aswkt.md) + - [ST_CENTROID](/tidb-cloud-lake/sql/st-centroid.md) - [ST_CONTAINS](/tidb-cloud-lake/sql/st-contains.md) + - [ST_CONVEXHULL](/tidb-cloud-lake/sql/st-convexhull.md) + - [ST_DIFFERENCE](/tidb-cloud-lake/sql/st-difference.md) - [ST_DIMENSION](/tidb-cloud-lake/sql/st-dimension.md) + - [ST_DISJOINT](/tidb-cloud-lake/sql/st-disjoint.md) - [ST_DISTANCE](/tidb-cloud-lake/sql/st-distance.md) + - [ST_DWITHIN](/tidb-cloud-lake/sql/st-dwithin.md) - [ST_ENDPOINT](/tidb-cloud-lake/sql/st-endpoint.md) + - [ST_ENVELOPE](/tidb-cloud-lake/sql/st-envelope.md) + - [ST_EQUALS](/tidb-cloud-lake/sql/st-equals.md) - 
[ST_GEOGETRYFROMWKB](/tidb-cloud-lake/sql/st-geogetryfromwkb.md) - [ST_GEOGFROMEWKB](/tidb-cloud-lake/sql/st-geogfromewkb.md) - [ST_GEOGFROMGEOHASH](/tidb-cloud-lake/sql/st-geogfromgeohash.md) @@ -916,6 +939,9 @@ - [ST_GEOMFROMWKB](/tidb-cloud-lake/sql/st-geomfromwkb.md) - [ST_GEOMFROMWKT](/tidb-cloud-lake/sql/st-geomfromwkt.md) - [ST_GEOMPOINTFROMGEOHASH](/tidb-cloud-lake/sql/st-geompointfromgeohash.md) + - [ST_HILBERT](/tidb-cloud-lake/sql/st-hilbert.md) + - [ST_INTERSECTION](/tidb-cloud-lake/sql/st-intersection.md) + - [ST_INTERSECTS](/tidb-cloud-lake/sql/st-intersects.md) - [ST_LENGTH](/tidb-cloud-lake/sql/st-length.md) - [ST_MAKE_LINE](/tidb-cloud-lake/sql/st-make-line.md) - [ST_MAKEGEOMPOINT](/tidb-cloud-lake/sql/st-makegeompoint.md) @@ -930,7 +956,10 @@ - [ST_SETSRID](/tidb-cloud-lake/sql/st-setsrid.md) - [ST_SRID](/tidb-cloud-lake/sql/st-srid.md) - [ST_STARTPOINT](/tidb-cloud-lake/sql/st-startpoint.md) + - [ST_SYMDIFFERENCE](/tidb-cloud-lake/sql/st-symdifference.md) - [ST_TRANSFORM](/tidb-cloud-lake/sql/st-transform.md) + - [ST_UNION](/tidb-cloud-lake/sql/st-union.md) + - [ST_WITHIN](/tidb-cloud-lake/sql/st-within.md) - [ST_X](/tidb-cloud-lake/sql/st-x.md) - [ST_XMAX](/tidb-cloud-lake/sql/st-xmax.md) - [ST_XMIN](/tidb-cloud-lake/sql/st-xmin.md) @@ -967,6 +996,7 @@ - [JSON_PATH_QUERY_ARRAY](/tidb-cloud-lake/sql/json-path-query-array.md) - [JSON_PATH_QUERY_FIRST](/tidb-cloud-lake/sql/json-path-query-first.md) - [JSON_PRETTY](/tidb-cloud-lake/sql/json-pretty.md) + - [JSON_STRIP_NULLS](/tidb-cloud-lake/sql/json-strip-nulls.md) - [JSON_TO_STRING](/tidb-cloud-lake/sql/json-to-string.md) - [JSON_TYPEOF](/tidb-cloud-lake/sql/json-typeof.md) - [PARSE_JSON](/tidb-cloud-lake/sql/parse-json.md) @@ -1111,6 +1141,7 @@ - [FUSE_SEGMENT](/tidb-cloud-lake/sql/fuse-segment.md) - [FUSE_SNAPSHOT](/tidb-cloud-lake/sql/fuse-snapshot.md) - [FUSE_STATISTIC](/tidb-cloud-lake/sql/fuse-statistic.md) + - [FUSE_TAG](/tidb-cloud-lake/sql/fuse-tag.md) - 
[FUSE_TIME_TRAVEL_SIZE](/tidb-cloud-lake/sql/fuse-time-travel-size.md) - [FUSE_VIRTUAL_COLUMN](/tidb-cloud-lake/sql/fuse-virtual-column.md) - [System Functions](/tidb-cloud-lake/sql/system-functions-sql.md) @@ -1128,9 +1159,11 @@ - [POLICY_REFERENCES](/tidb-cloud-lake/sql/policy-references.md) - [READ_FILE](/tidb-cloud-lake/sql/read-file.md) - [RESULT_SCAN](/tidb-cloud-lake/sql/result-scan.md) + - [SYSTEM$SET_CACHE_CAPACITY](/tidb-cloud-lake/sql/set-cache-capacity.md) - [SHOW_GRANTS](/tidb-cloud-lake/sql/show-grants-sql.md) - [SHOW_VARIABLES](/tidb-cloud-lake/sql/show-variables-sql.md) - [STREAM_STATUS](/tidb-cloud-lake/sql/stream-status.md) + - [TAG_REFERENCES](/tidb-cloud-lake/sql/tag-references.md) - [TASK_HISTORY](/tidb-cloud-lake/sql/task-history.md) - Sequence Functions - [Overview](/tidb-cloud-lake/sql/sequence-functions-overview.md) @@ -1144,12 +1177,12 @@ - [Test Functions](/tidb-cloud-lake/sql/test-functions.md) - [SLEEP](/tidb-cloud-lake/sql/sleep.md) - Other Functions + - [Overview](/tidb-cloud-lake/sql/other-functions.md) - [ASSUME_NOT_NULL](/tidb-cloud-lake/sql/assume-not-null.md) - [EXISTS](/tidb-cloud-lake/sql/exists.md) - [GROUPING](/tidb-cloud-lake/sql/grouping.md) - [HUMANIZE_NUMBER](/tidb-cloud-lake/sql/humanize-number.md) - [HUMANIZE_SIZE](/tidb-cloud-lake/sql/humanize-size.md) - - [Other Functions](/tidb-cloud-lake/sql/other-functions.md) - [REMOVE_NULLABLE](/tidb-cloud-lake/sql/remove-nullable.md) - [TO_NULLABLE](/tidb-cloud-lake/sql/nullable.md) - [TYPEOF](/tidb-cloud-lake/sql/typeof.md) diff --git a/tidb-cloud-lake/guides/data-lifecycle.md b/tidb-cloud-lake/guides/data-lifecycle.md index 3a2a2b86a2ff7..a1d4e227758c5 100644 --- a/tidb-cloud-lake/guides/data-lifecycle.md +++ b/tidb-cloud-lake/guides/data-lifecycle.md @@ -27,6 +27,7 @@ summary: "{{{ .lake }}} supports familiar Data Definition Language (DDL) and Dat - Grants - Warehouse - Task +- [Snapshot Tag](/tidb-cloud-lake/sql/table-versioning.md#snapshot-tags) ## Organizing Data @@ -35,7 
+36,7 @@ Arrange your data in databases and tables. Key Commands: - [`CREATE DATABASE`](/tidb-cloud-lake/sql/create-database.md): To create a new database. -- [`ALTER DATABASE`](/tidb-cloud-lake/sql/rename-database.md): To modify an existing database. +- [`ALTER DATABASE`](/tidb-cloud-lake/sql/alter-database.md): To modify an existing database. - [`CREATE TABLE`](/tidb-cloud-lake/sql/create-table.md): To create a new table. - [`ALTER TABLE`](/tidb-cloud-lake/sql/alter-table.md): To modify an existing table. diff --git a/tidb-cloud-lake/guides/mcp-server.md b/tidb-cloud-lake/guides/mcp-server.md index 9f12968e61682..ab567e795997e 100644 --- a/tidb-cloud-lake/guides/mcp-server.md +++ b/tidb-cloud-lake/guides/mcp-server.md @@ -234,5 +234,5 @@ cd agent-ui && npm run dev - **GitHub Repository**: [databendlabs/mcp-databend](https://github.com/databendlabs/mcp-databend) - **PyPI Package**: [mcp-databend](https://pypi.org/project/mcp-databend) -- **Agno Framework**: [Agno MCP](https://docs.agno.com/tools/mcp/mcp) +- **Agno Framework**: [Agno MCP](https://docs.agno.com/tools/mcp/overview) - **Agent UI**: [Agent UI](https://docs.agno.com/agent-ui/introduction) diff --git a/tidb-cloud-lake/guides/mindsdb.md b/tidb-cloud-lake/guides/mindsdb.md index fdcd977804bc5..94eb5f0260182 100644 --- a/tidb-cloud-lake/guides/mindsdb.md +++ b/tidb-cloud-lake/guides/mindsdb.md @@ -11,7 +11,7 @@ Both {{{ .lake }}} and {{{ .lake }}} can integrate with MindsDB as a data source ## Tutorial-1: Integrating {{{ .lake }}} with MindsDB -Before you start, install a local MindsDB or sign up an account for MindsDB Cloud. This tutorial uses MindsDB Cloud. For more information about how to install a local MindsDB, refer to +Before you start, install a local MindsDB or sign up an account for MindsDB Cloud. This tutorial uses MindsDB Cloud. For more information about how to install a local MindsDB, refer to . ### Step 1. 
Load Dataset into {{{ .lake }}} @@ -100,7 +100,7 @@ WHERE (NO2 = 0.005) ## Tutorial-2: Integrating {{{ .lake }}} with MindsDB -Before you start, install a local MindsDB or sign up an account for MindsDB Cloud. This tutorial uses MindsDB Cloud. For more information about how to install a local MindsDB, refer to +Before you start, install a local MindsDB or sign up an account for MindsDB Cloud. This tutorial uses MindsDB Cloud. For more information about how to install a local MindsDB, refer to . ### Step 1. Load Dataset into {{{ .lake }}} diff --git a/tidb-cloud-lake/guides/recovery-from-operational-errors.md b/tidb-cloud-lake/guides/recovery-from-operational-errors.md index efb32efe27bda..6eee0afe15c41 100644 --- a/tidb-cloud-lake/guides/recovery-from-operational-errors.md +++ b/tidb-cloud-lake/guides/recovery-from-operational-errors.md @@ -197,7 +197,7 @@ If you've made unwanted changes to a table's structure, you can revert to the pr ## Important Considerations and Limitations - **Time Constraints**: Recovery only works within the retention period (default: 24 hours). -- **Name Conflicts**: Cannot undrop if an object with the same name exists — [rename database](/tidb-cloud-lake/sql/rename-database.md) or [rename table](/tidb-cloud-lake/sql/rename-table.md) first. +- **Name Conflicts**: Cannot undrop if an object with the same name exists — [rename database](/tidb-cloud-lake/sql/alter-database.md) or [rename table](/tidb-cloud-lake/sql/rename-table.md) first. - **Ownership**: Ownership isn't automatically restored—manually grant it after recovery. - **Transient Tables**: Flashback doesn't work for transient tables (no snapshots stored). diff --git a/tidb-cloud-lake/guides/task-flow.md b/tidb-cloud-lake/guides/task-flow.md new file mode 100644 index 0000000000000..cebe92db95262 --- /dev/null +++ b/tidb-cloud-lake/guides/task-flow.md @@ -0,0 +1,214 @@ +--- +title: Task Flow +summary: "Task Flow is {{{ .lake }}}'s built-in workflow orchestration feature. 
It lets you define, schedule, and monitor SQL-based data pipelines as directed acyclic graphs (DAGs)." +--- + +# Task Flow + +Task Flow is {{{ .lake }}}'s built-in workflow orchestration feature. It lets you define, schedule, and monitor SQL-based data pipelines as directed acyclic graphs (DAGs). Each node in the graph is a **Task** — a SQL statement with its own schedule, dependencies, and execution settings. A **Flow** groups multiple tasks together and manages their execution order automatically. + +## Overview + +Task Flow replaces the legacy Task List with a more powerful model: + +| Feature | Legacy Task List | Task Flow | +| --------------------- | ---------------- | --------- | +| Single SQL task | ✅ | ✅ | +| Multi-task DAG | ❌ | ✅ | +| Visual graph editor | ❌ | ✅ | +| Version history | ❌ | ✅ | +| Stream-based triggers | ❌ | ✅ | +| Bulk operations | ❌ | ✅ | + +## Key Concepts + +### Task + +A Task is the smallest unit of work. It contains: + +- A SQL statement to execute +- A schedule (manual, interval, or cron) +- Optional dependencies on other tasks or streams +- Advanced settings (failure threshold, result cache, min execution interval) + +### Flow + +A Flow is a named collection of tasks with dependency relationships. {{{ .lake }}} automatically determines execution order based on the DAG structure. A flow has: + +- A name and an assigned warehouse +- One or more tasks with defined dependencies +- A lifecycle: Created → Started → Suspended → Resumed → Dropped + +### DAG (Directed Acyclic Graph) + +The dependency graph between tasks. If Task B depends on Task A, {{{ .lake }}} runs Task A first and only triggers Task B after Task A succeeds. Cycles are not allowed. + +## Getting Started + +### Creating a Task Flow + +1. Navigate to **Data** > **Task & Flows** in the left sidebar. +2. Click **Create** in the top-right corner. +3. In the flow modal: + - Enter a **Flow Name**. + - Select a **Warehouse** to run the tasks on. +4. 
Click **Add Task to Flow** to add your first task. + +### Configuring a Task + +In the task form, fill in the following: + +**Basic Settings** + +| Field | Description | +| --------- | ------------------------------------------------------------------------ | +| Task Name | Unique name within the flow | +| Schedule | When to run: Manual, Interval (e.g. every 5 minutes), or Cron expression | +| Timezone | Timezone for cron schedule evaluation | +| SQL | The SQL statement to execute | +| Comment | Optional description | + +**Dependencies** + +| Field | Description | +| -------------- | ------------------------------------------------------------------- | +| Require Tasks | Other tasks that must complete before this task runs | +| Require Stream | A database stream that must have new data before this task triggers | + +**Advanced Options** + +| Field | Description | +| ------------------------------- | ----------------------------------------------------------------------- | +| Suspend Task After Num Failures | Automatically suspend the task after N consecutive failures (0 = never) | +| Enable Query Result Cache | Cache query results to avoid redundant computation | +| Min Execute Seconds | Minimum interval between executions (5s / 10s / 15s / 30s) | + +5. Click **Save** to add the task to the flow. +6. Repeat to add more tasks. Use **Require Tasks** to define dependencies between them. +7. Click **Publish** to create the flow. + +> **Note:** +> +> Only `account_admin` or the flow creator can edit or delete a flow. + +## Visualizing the Flow + +After creating a flow, click its name to open the details page. The **Latest Run** tab shows the DAG visualization. 
+ +Each node displays: + +- Task name +- Latest execution status (color-coded) +- Execution time range +- Error message (if failed) + +**Status colors:** + +| Color | Status | +| ----------------- | ------------------- | +| Blue border | Scheduled | +| Green border | Succeeded | +| Red border | Failed | +| Light blue border | Executing | +| Gray border | Cancelled / Waiting | + +## Managing Flows + +### Flow Actions + +From the **Task & Flows** list, each row has an action menu with: + +| Action | Description | +| --------------------- | ------------------------------------- | +| Edit | Modify flow name, warehouse, or tasks | +| Suspend | Pause all scheduled executions | +| Resume | Re-enable scheduled executions | +| Execute Once | Trigger an immediate one-time run | +| View Runs History | See all past executions | +| View Versions History | Browse and compare previous versions | +| Delete | Permanently remove the flow | + +### Bulk Operations + +Select multiple flows using the checkboxes, then use the bulk action menu to: + +- Suspend all selected flows +- Resume all selected flows +- Drop all selected flows + +## Monitoring Executions + +### Runs History + +Click **Runs History** on the details page to see all past executions: + +| Column | Description | +| -------------- | ------------------------------------------------------ | +| Task Name | Which task ran | +| Warehouse | Warehouse used | +| State | Scheduled / Executing / Succeeded / Failed / Cancelled | +| SQL | The SQL that was executed (with Query ID link) | +| Scheduled Time | When the run was triggered | +| Completed Time | When the run finished | +| Comment | Task comment | + +Failed or cancelled runs show an error tooltip. You can click the error to view details or create a support ticket. + +### Global Task History + +Navigate to **Data** → **Task History** to see executions across all flows in your organization. 
You can filter by: + +- Task names (multi-select) +- Time range (Last 2 days, Last 3 days) + +## Version Control + +Every time you publish changes to a flow, {{{ .lake }}} saves a new version. To access version history: + +1. Open the flow details page. +2. Click the **Versions History** tab. + +### Comparing Versions + +1. Select two versions using the checkboxes. +2. Click **Compare**. +3. A side-by-side SQL diff drawer opens showing what changed between the two versions. + +### Reverting to a Previous Version + +1. Select a version from the list. +2. Click **Revert**. +3. Confirm the action in the dialog. + +The flow is restored to the selected version and a new version entry is created. + +## Scheduling Reference + +### Schedule Types + +**Manual**: The task only runs when triggered via **Execute Once**. No automatic scheduling. + +**Interval**: Run every N minutes/hours. Example: `EVERY 5 MINUTE`. + +**Cron**: Standard cron expression with timezone support. Example: `0 9 * * 1-5` (weekdays at 9am). + +### Stream-Based Triggers + +If a task has a **Require Stream** dependency, it only executes when the specified stream has unconsumed data. This is useful for building event-driven pipelines that react to table changes (CDC). + +## Best Practices + +- **Start simple**: Create a single-task flow first to validate your SQL before adding dependencies. +- **Use streams for CDC pipelines**: Combine stream triggers with `MERGE INTO` statements to build incremental data pipelines. +- **Set failure thresholds**: Use **Suspend Task After Num Failures** to prevent runaway retries from consuming warehouse credits. +- **Enable result cache**: For tasks that query the same data repeatedly, enable **Query Result Cache** to reduce compute costs. +- **Use version history**: Before making significant changes, note the current version number so you can revert if needed. 
+- **Separate warehouses by workload**: Assign heavier transformation tasks to a larger warehouse and lightweight tasks to a smaller one. + +## Permissions + +| Role | Create | Edit | Delete | View | +| ------------- | ------ | -------- | -------- | ---- | +| account_admin | ✅ | ✅ (any) | ✅ (any) | ✅ | +| Creator | ✅ | ✅ (own) | ✅ (own) | ✅ | +| Other users | ❌ | ❌ | ❌ | ✅ | diff --git a/tidb-cloud-lake/guides/unload-csv-file.md b/tidb-cloud-lake/guides/unload-csv-file.md index 88d58dd123623..c9b1ed85e25ac 100644 --- a/tidb-cloud-lake/guides/unload-csv-file.md +++ b/tidb-cloud-lake/guides/unload-csv-file.md @@ -5,9 +5,9 @@ summary: Learn about how to unload CSV files. # Unloading CSV File -## Unloading CSV File +This page describes how to unload CSV files using the `COPY INTO` command. -Syntax: +## Syntax ```sql COPY INTO { internalStage | externalStage | externalLocation } diff --git a/tidb-cloud-lake/guides/unload-data-from-databend.md b/tidb-cloud-lake/guides/unload-data.md similarity index 80% rename from tidb-cloud-lake/guides/unload-data-from-databend.md rename to tidb-cloud-lake/guides/unload-data.md index 086ac251e2a9f..b44411470a799 100644 --- a/tidb-cloud-lake/guides/unload-data-from-databend.md +++ b/tidb-cloud-lake/guides/unload-data.md @@ -1,6 +1,6 @@ --- title: Unload Data from TiDB Cloud Lake -summary: "{{{ .lake }}}'s COPY INTO command exports data to various file formats and storage locations with flexible formatting options." +summary: Learn how to unload data from TiDB Cloud Lake to various file formats and storage destinations using the `COPY INTO` command. 
 ---
 
 # Unload Data from TiDB Cloud Lake
@@ -15,6 +15,7 @@ summary: "{{{ .lake }}}'s COPY INTO command exports data to various file formats
 | [**Unload CSV File**](/tidb-cloud-lake/guides/unload-csv-file.md) | `FILE_FORMAT = (TYPE = CSV)` | Data exchange, universal compatibility |
 | [**Unload TSV File**](/tidb-cloud-lake/guides/unload-tsv-file.md) | `FILE_FORMAT = (TYPE = TSV)` | Tabular data with comma values |
 | [**Unload NDJSON File**](/tidb-cloud-lake/guides/unload-ndjson-file.md) | `FILE_FORMAT = (TYPE = NDJSON)` | Semi-structured data, flexible schemas |
+| [**Unload Lance Dataset**](/tidb-cloud-lake/guides/unload-lance-dataset.md) | `FILE_FORMAT = (TYPE = LANCE)` | ML and vector workloads, Arrow/Lance consumers |
 
 ## Storage Destinations
diff --git a/tidb-cloud-lake/guides/unload-lance-dataset.md b/tidb-cloud-lake/guides/unload-lance-dataset.md
new file mode 100644
index 0000000000000..53cc5ffa286d4
--- /dev/null
+++ b/tidb-cloud-lake/guides/unload-lance-dataset.md
@@ -0,0 +1,180 @@
+---
+title: Unloading Lance Dataset
+summary: Learn about how to unload Lance datasets.
+---
+
+# Unloading Lance Dataset
+
+Lance exports are aimed at dataset-oriented consumers such as machine learning and vector workflows. Unlike CSV, TSV, NDJSON, or Parquet unloading, {{{ .lake }}} writes a Lance **dataset directory** that contains `.lance` data files plus metadata such as `_versions/`.
+
+Syntax:
+
+```sql
+COPY INTO { internalStage | externalStage | externalLocation }
+FROM { [<database_name>.]<table_name> | ( <query> ) }
+FILE_FORMAT = (TYPE = LANCE)
+[MAX_FILE_SIZE = <num>]
+[USE_RAW_PATH = true | false]
+[OVERWRITE = true | false]
+[DETAILED_OUTPUT = true | false]
+```
+
+- Lance is supported only for `COPY INTO <location>`.
+- `SINGLE` and `PARTITION BY` are not supported with Lance.
+- When `USE_RAW_PATH = false` (default), {{{ .lake }}} appends the query ID to the target path so each export gets its own dataset root.
+- When you want a stable dataset URI for downstream readers such as Python `lance`, set `USE_RAW_PATH = true`.
+- More details about the syntax can be found in [COPY INTO location](/tidb-cloud-lake/sql/copy-into-location.md).
+- More Lance behavior notes are listed in [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md#lance-options).
+
+## Tutorial
+
+This example builds a small document-classification dataset. The raw text files are stored in a stage, `READ_FILE` turns them into `BINARY` values during query execution, and {{{ .lake }}} exports the final dataset in Lance format for Python consumers.
+
+### Prerequisites
+
+Prepare an S3-compatible bucket that is reachable from both {{{ .lake }}} and your Python environment.
+
+### Step 1. Create an External Stage
+
+```sql
+CREATE OR REPLACE STAGE ml_assets
+URL = 's3://your-bucket/lance-demo/'
+CONNECTION = (
+    ENDPOINT_URL = '<endpoint-url>',
+    ACCESS_KEY_ID = '<your-access-key-id>',
+    SECRET_ACCESS_KEY = '<your-secret-access-key>',
+    REGION = '<region>'
+);
+```
+
+### Step 2. Create Sample Source Files
+
+Create three raw text files in the stage:
+
+```sql
+COPY INTO @ml_assets/raw/ticket_001.txt
+FROM (SELECT 'customer asked for a refund after the package arrived damaged')
+FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' RECORD_DELIMITER = '\n')
+SINGLE = TRUE
+USE_RAW_PATH = TRUE
+OVERWRITE = TRUE;
+
+COPY INTO @ml_assets/raw/ticket_002.txt
+FROM (SELECT 'customer praised the fast response and confirmed the issue was resolved')
+FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' RECORD_DELIMITER = '\n')
+SINGLE = TRUE
+USE_RAW_PATH = TRUE
+OVERWRITE = TRUE;
+
+COPY INTO @ml_assets/raw/ticket_003.txt
+FROM (SELECT 'customer requested escalation because the replacement order was delayed')
+FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' RECORD_DELIMITER = '\n')
+SINGLE = TRUE
+USE_RAW_PATH = TRUE
+OVERWRITE = TRUE;
+```
+
+### Step 3. Create a Manifest Table
+
+```sql
+CREATE OR REPLACE TABLE support_ticket_manifest (
+    ticket_id INT,
+    label STRING,
+    file_path STRING
+);
+
+INSERT INTO support_ticket_manifest VALUES
+    (1, 'refund', 'raw/ticket_001.txt'),
+    (2, 'resolved', 'raw/ticket_002.txt'),
+    (3, 'escalation', 'raw/ticket_003.txt');
+```
+
+### Step 4. Export the Dataset to Lance
+
+`READ_FILE` reads the staged text files as raw bytes. `COPY INTO` then writes those rows into a Lance dataset:
+
+```sql
+COPY INTO @ml_assets/datasets/support-ticket-train
+FROM (
+    SELECT
+        ticket_id,
+        label,
+        file_path,
+        READ_FILE('@ml_assets', file_path) AS content
+    FROM support_ticket_manifest
+    ORDER BY ticket_id
+)
+FILE_FORMAT = (TYPE = LANCE)
+USE_RAW_PATH = TRUE
+OVERWRITE = TRUE
+DETAILED_OUTPUT = TRUE;
+```
+
+Result:
+
+```text
+┌───────────────────────────────┬───────────┬───────────┐
+│ file_name                     │ file_size │ row_count │
+├───────────────────────────────┼───────────┼───────────┤
+│ datasets/support-ticket-train │       ... │         3 │
+└───────────────────────────────┴───────────┴───────────┘
+```
+
+### Step 5. Inspect the Exported Dataset Layout
+
+```sql
+LIST @ml_assets/datasets/support-ticket-train;
+```
+
+You will see a dataset directory that includes paths similar to:
+
+```text
+datasets/support-ticket-train/_versions/...
+datasets/support-ticket-train/data/... .lance
+datasets/support-ticket-train/*.manifest
+```
+
+### Step 6. Verify with Python `lance`
+
+Install the Python package:
+
+```bash
+pip install pylance
+```
+
+Read the exported dataset from the same object storage location:
+
+```python
+import os
+import lance
+
+storage_options = {
+    "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
+    "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
+    "region": os.environ.get("AWS_REGION", "us-east-1"),
+}
+
+if endpoint := os.environ.get("AWS_ENDPOINT_URL"):
+    storage_options["aws_endpoint"] = endpoint
+    storage_options["aws_allow_http"] = "true" if endpoint.startswith("http://") else "false"
+
+dataset = lance.dataset(
+    "s3://your-bucket/lance-demo/datasets/support-ticket-train",
+    storage_options=storage_options,
+)
+
+table = dataset.to_table()
+print(table.num_rows)
+print(table["label"].to_pylist())
+print(table["content"].to_pylist()[0].decode("utf-8").strip())
+```
+
+Expected output:
+
+```text
+3
+['refund', 'resolved', 'escalation']
+customer asked for a refund after the package arrived damaged
+```
+
+At this point you have a complete Lance dataset that keeps the label, original path, and raw file bytes together for downstream ML processing.
diff --git a/tidb-cloud-lake/guides/worksheet.md b/tidb-cloud-lake/guides/worksheet.md
index 7ee6440f65c1f..6313a1c31f285 100644
--- a/tidb-cloud-lake/guides/worksheet.md
+++ b/tidb-cloud-lake/guides/worksheet.md
@@ -33,6 +33,18 @@ The query result shows in the output area. You can click **Export** to save the
 >
 > - If you enter multiple statements in the SQL input area, {{{ .lake }}} will only execute the statement where the cursor is located. You can move the cursor to execute other statements. Additionally, you can use keyboard shortcuts: Ctrl + Enter (Windows) or Command + Enter (Mac) to execute the current statement, and Ctrl + Shift + Enter (Windows) or Command + Shift + Enter (Mac) to execute all statements.
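+For example, with the two statements below in the input area (the table name here is just a placeholder), pressing Ctrl + Enter with the cursor on the `SELECT` line runs only the `SELECT`, while Ctrl + Shift + Enter runs both:
+
+```sql
+CREATE TABLE IF NOT EXISTS shortcut_demo (id INT);
+SELECT COUNT(*) FROM shortcut_demo;
+```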
+## Query Result Defaults + +{{{ .lake }}} applies the following default limits to query results displayed in the worksheet output area: + +| Setting | Default | Description | +|---|---|---| +| Max display rows | 10,000 | Only the first 10,000 rows are shown in the preview. | +| Max display columns | 200 | Only the first 200 columns are shown in the preview. | +| Max cell content length | 3,000 characters | Cell values longer than this are truncated in the display. | + +The row and column limits are fixed. To adjust the max cell content length, click the settings icon in the bottom-right corner of the result area and choose a value (3K–Unlimited). Note that setting a very large value or **Unlimited** may cause the browser to slow down or become unresponsive when working with large result sets. + ## Sharing a Worksheet You can share your worksheets with everyone in your organization or specific individuals. To do so, click **Share** in the worksheet you want to share, or click **Share this Folder** to share a worksheet folder. diff --git a/tidb-cloud-lake/sql/alter-database.md b/tidb-cloud-lake/sql/alter-database.md new file mode 100644 index 0000000000000..b2c6c9d740b4e --- /dev/null +++ b/tidb-cloud-lake/sql/alter-database.md @@ -0,0 +1,101 @@ +--- +title: ALTER DATABASE +summary: Changes the name of a database, or sets default storage options for a database. +--- + +# ALTER DATABASE + +> **Note:** +> +> Introduced or updated in v1.2.866. + +Changes the name of a database, or sets default storage options for a database. 
+ +## Syntax + +```sql +-- Rename a database +ALTER DATABASE [ IF EXISTS ] RENAME TO + +-- Set default storage options +ALTER DATABASE [ IF EXISTS ] SET OPTIONS ( + DEFAULT_STORAGE_CONNECTION = '' + | DEFAULT_STORAGE_PATH = '' +) +``` + +## Parameters + +| Parameter | Description | +|:-----------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------| +| `DEFAULT_STORAGE_CONNECTION` | The name of an existing connection (created via `CREATE CONNECTION`) to use as the default storage connection for tables in this database. | +| `DEFAULT_STORAGE_PATH` | The default storage path URI (e.g., `s3://bucket/path/`) for tables in this database. Must end with `/` and match the connection's storage type. | + +> **Note:** +> +> - `SET OPTIONS` only affects tables created after the statement is executed. Existing tables are not changed. +> - You can update one option at a time, as long as the other option already exists on the database. + +## Examples + +### Rename a database + +```sql +CREATE DATABASE LAKE; +``` + +```sql +SHOW DATABASES; ++--------------------+ +| Database | ++--------------------+ +| LAKE | +| information_schema | +| default | +| system | ++--------------------+ +``` + +```sql +ALTER DATABASE `LAKE` RENAME TO `NEW_LAKE`; +``` + +```sql +SHOW DATABASES; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| NEW_LAKE | +| default | +| system | ++--------------------+ +``` + +### Set default storage options + +```sql +ALTER DATABASE analytics SET OPTIONS ( + DEFAULT_STORAGE_CONNECTION = 'my_s3', + DEFAULT_STORAGE_PATH = 's3://mybucket/analytics_v2/' +); +``` + +## Tag Operations + +Assigns or removes tags on a database. Tags must be created with [CREATE TAG](/tidb-cloud-lake/sql/create-tag.md) first. For full details, see [SET TAG / UNSET TAG](/tidb-cloud-lake/sql/set-tag.md). 
+ +### Syntax + +```sql +ALTER DATABASE [ IF EXISTS ] <database_name> SET TAG <tag_name> = '<tag_value>' [, <tag_name> = '<tag_value>' ...] + +ALTER DATABASE [ IF EXISTS ] <database_name> UNSET TAG <tag_name> [, <tag_name> ...] +``` + +### Examples + +```sql +ALTER DATABASE mydb SET TAG env = 'prod', owner = 'team_a'; +ALTER DATABASE mydb UNSET TAG env, owner; +``` diff --git a/tidb-cloud-lake/sql/alter-table.md b/tidb-cloud-lake/sql/alter-table.md index 59f1efe74f211..60060a3c53f89 100644 --- a/tidb-cloud-lake/sql/alter-table.md +++ b/tidb-cloud-lake/sql/alter-table.md @@ -227,9 +227,9 @@ CREATE TABLE t(id INT); ALTER TABLE t COMMENT = 'new-comment'; ``` -## Fuse Engine Options {#fuse-engine-options} +## Fuse Engine Options -Sets or unsets [Fuse Engine options](/tidb-cloud-lake/sql/table-engines.md#available-engines) for a table. +Sets or unsets [Fuse Engine options](/tidb-cloud-lake/sql/fuse-engine-tables.md#fuse-engine-options) for a table. ### Syntax @@ -351,12 +351,27 @@ CONNECTION=(connection_name = 's3_access_key_conn'); CREATE CONNECTION s3_role_conn STORAGE_TYPE = 's3' - ROLE_ARN = 'arn:aws:iam::123456789012:role/databend-access'; + ROLE_ARN = 'arn:aws:iam::123456789012:role/lake-access'; ALTER TABLE sales_data CONNECTION=( connection_name = 's3_role_conn' ); -## Swap Tables {#swap-tables} +## Snapshot Tag Operations + + + +Creates or drops a named snapshot tag that references a specific FUSE table snapshot. Snapshot tags let you bookmark a point-in-time state of a table so you can query it later with the [AT](/tidb-cloud-lake/sql/at.md) clause. + +For full details, see: + +- [CREATE SNAPSHOT TAG](/tidb-cloud-lake/sql/create-snapshot-tag.md) +- [DROP SNAPSHOT TAG](/tidb-cloud-lake/sql/drop-snapshot-tag.md) + +> **Note:** +> +> Snapshot tags are different from [governance tags](#tag-operations). Snapshot tags bookmark a table snapshot for time travel, while governance tags attach key-value metadata to objects for classification. + +## Swap Tables Swaps all table metadata and data between two tables atomically in a single transaction.
This operation exchanges the table schemas, including all columns, constraints, and data, effectively making each table take on the identity of the other. @@ -398,3 +413,24 @@ ALTER TABLE t1 SWAP WITH t2; DESC t1; DESC t2; ``` + +## Tag Operations + +Assigns or removes governance tags on a table. Governance tags are key-value metadata for classification and data governance. Tags must be created with [CREATE TAG](/tidb-cloud-lake/sql/create-tag.md) first. For full details, see [SET TAG / UNSET TAG](/tidb-cloud-lake/sql/set-tag.md). + +### Syntax + +```sql +ALTER TABLE [ IF EXISTS ] [ <database_name>. ]<table_name> + SET TAG <tag_name> = '<tag_value>' [, <tag_name> = '<tag_value>' ...] + +ALTER TABLE [ IF EXISTS ] [ <database_name>. ]<table_name> + UNSET TAG <tag_name> [, <tag_name> ...] +``` + +### Examples + +```sql +ALTER TABLE default.users SET TAG env = 'prod', owner = 'team_a'; +ALTER TABLE default.users UNSET TAG env, owner; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/alter-view.md b/tidb-cloud-lake/sql/alter-view.md index 20efe221f2544..4673c5641cc05 100644 --- a/tidb-cloud-lake/sql/alter-view.md +++ b/tidb-cloud-lake/sql/alter-view.md @@ -38,3 +38,24 @@ SELECT * FROM tmp_view; | 2 | +------+ ``` + +## Tag Operations {#tag-operations} + +Assigns or removes tags on a view. Tags must be created with [CREATE TAG](/tidb-cloud-lake/sql/create-tag.md) first. For full details, see [SET TAG / UNSET TAG](/tidb-cloud-lake/sql/set-tag.md). + +### Syntax + +```sql +ALTER VIEW [ IF EXISTS ] [ <database_name>. ]<view_name> + SET TAG <tag_name> = '<tag_value>' [, <tag_name> = '<tag_value>' ...] + +ALTER VIEW [ IF EXISTS ] [ <database_name>. ]<view_name> + UNSET TAG <tag_name> [, <tag_name> ...] +``` + +### Examples + +```sql +ALTER VIEW default.active_users SET TAG env = 'prod', owner = 'analytics'; +ALTER VIEW default.active_users UNSET TAG env, owner; +``` diff --git a/tidb-cloud-lake/sql/at.md b/tidb-cloud-lake/sql/at.md index 957ba25178e9d..591b06a7d1a9c 100644 --- a/tidb-cloud-lake/sql/at.md +++ b/tidb-cloud-lake/sql/at.md @@ -34,6 +34,7 @@ AT ( | TIMESTAMP | Specifies a particular timestamp to retrieve data from.
| | STREAM | Indicates querying the data at the time the specified stream was created. | | OFFSET | Specifies the number of seconds to go back from the current time. It should be in the form of a negative integer, where the absolute value represents the time difference in seconds. For example, `-3600` represents traveling back in time by 1 hour (3,600 seconds). | +| TAG | Specifies a named tag created by `ALTER TABLE ... CREATE TAG` to query the snapshot associated with that tag. This is an experimental feature and requires `SET enable_experimental_table_ref = 1`. See [Snapshot Tag Operations](/tidb-cloud-lake/sql/alter-table.md#snapshot-tag-operations). | ## Obtaining Snapshot ID and Timestamp @@ -53,7 +54,7 @@ This example demonstrates the AT clause, allowing retrieval of previous data ver ```sql CREATE TABLE t(a INT); - + INSERT INTO t VALUES(1); INSERT INTO t VALUES(2); ``` @@ -62,7 +63,7 @@ This example demonstrates the AT clause, allowing retrieval of previous data ver ```sql CREATE STREAM s ON TABLE t; - + INSERT INTO t VALUES(3); ``` diff --git a/tidb-cloud-lake/sql/copy-into-location.md b/tidb-cloud-lake/sql/copy-into-location.md index a2508c7ddc5b9..6df35a9714485 100644 --- a/tidb-cloud-lake/sql/copy-into-location.md +++ b/tidb-cloud-lake/sql/copy-into-location.md @@ -24,7 +24,7 @@ FROM { [<database_name>.]<table_name> | ( <query> ) } [ PARTITION BY ( <expr> ) ] [ FILE_FORMAT = ( FORMAT_NAME = '<format_name>' - | TYPE = { CSV | TSV | NDJSON | PARQUET } [ formatTypeOptions ] + | TYPE = { CSV | TSV | NDJSON | PARQUET | LANCE } [ formatTypeOptions ] ) ] [ copyOptions ] [ VALIDATION_MODE = RETURN_ROWS ] @@ -122,6 +122,8 @@ For the connection parameters available for accessing Tencent Cloud Object Stora See [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md) for details. +`LANCE` is supported only in `COPY INTO <location>`. {{{ .lake }}} writes a Lance dataset directory under the target path, not a standalone file.
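For instance, a minimal Lance unload targets a directory path rather than a single file. This sketch assumes a stage named `@my_stage` already exists; the stage and path names are illustrative:

```sql
-- Unload a query result as a Lance dataset directory under @my_stage/lance_ds
COPY INTO @my_stage/lance_ds
FROM (SELECT number FROM numbers(5))
FILE_FORMAT = (TYPE = LANCE)
USE_RAW_PATH = TRUE;
```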
+ ### PARTITION BY Specifies an expression used to partition the unloaded data into separate folders. The expression must evaluate to a `STRING` type. Each distinct value produced by the expression creates a subfolder in the destination path, and the corresponding rows are written into files under that subfolder. @@ -157,6 +159,13 @@ copyOptions ::= | INCLUDE_QUERY_ID | true | When `true`, a unique UUID will be included in the exported file names. | | USE_RAW_PATH | false | When `true`, the exact user-provided path (including the full file name) will be used for exporting the data. If set to `false`, the user must provide a directory path. | +> **Note:** +> +> - With `TYPE = LANCE`, `SINGLE` is not supported. +> - With `TYPE = LANCE`, `PARTITION BY` is not supported. +> - With `TYPE = LANCE`, `USE_RAW_PATH = TRUE` is recommended when you want a stable dataset URI for downstream Lance readers. +> - With `TYPE = LANCE` and `USE_RAW_PATH = FALSE`, {{{ .lake}}} appends the query ID to the target path and creates a separate dataset root for each export. + ### DETAILED_OUTPUT Determines whether a detailed result of the data unloading should be returned, with the default value set to `false`. For more information, see [Output](#output). @@ -350,3 +359,31 @@ SELECT name FROM list_stage(location => '@partitioned_stage') ORDER BY name; ``` When the partition expression evaluates to `NULL`, the data is placed in a `_NULL_` folder. Each unique partition value creates its own subfolder containing the corresponding data files. 
+ +### Example 5: Unloading to a Lance Dataset + +This example unloads data as a Lance dataset directory instead of standalone files: + +```sql +CREATE STAGE ml_stage; + +COPY INTO @ml_stage/datasets/train +FROM ( + SELECT number, number + 1 AS label + FROM numbers(10) +) +FILE_FORMAT = (TYPE = LANCE) +USE_RAW_PATH = TRUE +OVERWRITE = TRUE +DETAILED_OUTPUT = TRUE; +``` + +The output path will contain a Lance dataset layout with entries similar to: + +```text +datasets/train/_versions/... +datasets/train/data/... .lance +datasets/train/*.manifest +``` + +For a complete end-to-end example, including validation with Python `lance`, see [Unload Lance Dataset](/tidb-cloud-lake/guides/unload-lance-dataset.md). diff --git a/tidb-cloud-lake/sql/create-database.md b/tidb-cloud-lake/sql/create-database.md index 17441482f4f90..41855dd5f9bea 100644 --- a/tidb-cloud-lake/sql/create-database.md +++ b/tidb-cloud-lake/sql/create-database.md @@ -7,7 +7,7 @@ summary: Create a database. > **Note:** > -> Introduced or updated in v1.2.339. +> Introduced or updated in v1.2.866. Create a database. @@ -15,8 +15,24 @@ Create a database. ```sql CREATE [ OR REPLACE ] DATABASE [ IF NOT EXISTS ] <database_name> + [ OPTIONS ( + DEFAULT_STORAGE_CONNECTION = '<connection_name>', + DEFAULT_STORAGE_PATH = '<path_uri>' + ) ] ``` +## Parameters + +| Parameter | Description | +|:-----------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------| +| `DEFAULT_STORAGE_CONNECTION` | The name of an existing connection (created via `CREATE CONNECTION`) to use as the default storage connection for tables in this database. | +| `DEFAULT_STORAGE_PATH` | The default storage path URI (e.g., `s3://bucket/path/`) for tables in this database. Must end with `/` and match the connection's storage type. | + +> **Note:** +> +> - `DEFAULT_STORAGE_CONNECTION` and `DEFAULT_STORAGE_PATH` must be specified together. Specifying only one is an error.
+> - When both options are set, {{{ .lake }}} validates that the connection exists, the path URI is well-formed, and the storage location is accessible. + ## Access control requirements | Privilege | Object Type | Description | @@ -32,3 +48,14 @@ The following example creates a database named `test`: ```sql CREATE DATABASE test; ``` + +The following example creates a database with a default storage connection and path: + +```sql +CREATE CONNECTION my_s3 STORAGE_TYPE = 's3' ACCESS_KEY_ID = '<access_key_id>' SECRET_ACCESS_KEY = '<secret_access_key>'; + +CREATE DATABASE analytics OPTIONS ( + DEFAULT_STORAGE_CONNECTION = 'my_s3', + DEFAULT_STORAGE_PATH = 's3://mybucket/analytics/' +); +``` diff --git a/tidb-cloud-lake/sql/create-file-format.md b/tidb-cloud-lake/sql/create-file-format.md index a309b2acd538d..7fb41bf375c29 100644 --- a/tidb-cloud-lake/sql/create-file-format.md +++ b/tidb-cloud-lake/sql/create-file-format.md @@ -52,3 +52,23 @@ COPY INTO analytics.orders FROM @sales_stage/2024/order.parquet FILE_FORMAT = (FORMAT_NAME = 'my_parquet'); ``` + +## LANCE format note + +You can also create a named Lance file format: + +```sql +CREATE FILE FORMAT my_lance TYPE = LANCE; +``` + +Unlike CSV, TSV, NDJSON, or PARQUET, a named `LANCE` format is only reusable with `COPY INTO <location>`. It is not supported for stage-table reads or `COPY INTO <table>`, because {{{ .lake }}} writes a Lance dataset directory rather than a standalone file. + +```sql +COPY INTO @ml_stage/datasets/train +FROM my_training_table +FILE_FORMAT = (FORMAT_NAME = 'my_lance') +USE_RAW_PATH = TRUE +OVERWRITE = TRUE; +``` + +For Lance-specific behavior and limitations, see [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md#lance-options) and [`COPY INTO <location>`](/tidb-cloud-lake/sql/copy-into-location.md).
diff --git a/tidb-cloud-lake/sql/create-snapshot-tag.md b/tidb-cloud-lake/sql/create-snapshot-tag.md new file mode 100644 index 0000000000000..44df5e89a7403 --- /dev/null +++ b/tidb-cloud-lake/sql/create-snapshot-tag.md @@ -0,0 +1,83 @@ +--- +title: CREATE SNAPSHOT TAG +summary: Creates a named snapshot tag on a FUSE table, allowing you to bookmark and query specific points in the table's history. +--- + +# CREATE SNAPSHOT TAG + +> **Note:** +> +> Introduced or updated in v1.2.891. + +Creates a named snapshot tag on a FUSE table. A snapshot tag bookmarks a specific point-in-time state of the table, allowing you to query that state later with the [AT](/tidb-cloud-lake/sql/at.md) clause. + +> **Note:** +> +> - This is an **experimental** feature. Enable it before use: `SET enable_experimental_table_ref = 1;`. +> - Only supported on FUSE engine tables. Memory engine tables and temporary tables are not supported. + +## Syntax + +```sql +ALTER TABLE [<database_name>.]<table_name> CREATE TAG <tag_name> + [ AT ( + SNAPSHOT => '<snapshot_id>' | + TIMESTAMP => <timestamp> | + STREAM => <stream_name> | + OFFSET => <time_difference> | + TAG => <existing_tag_name> + ) ] + [ RETAIN <n> { DAYS | SECONDS } ] +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| tag_name | The name of the tag. Must be unique within the table. | +| AT | Specifies which snapshot the tag references. If omitted, the tag references the current (latest) snapshot. Supports the same options as the [AT](/tidb-cloud-lake/sql/at.md) clause, plus `TAG` to copy from an existing tag. | +| RETAIN | Sets an automatic expiration period. After the specified duration, the tag is removed during the next [VACUUM](/tidb-cloud-lake/sql/vacuum-table.md) operation. Without `RETAIN`, the tag persists until explicitly dropped.
| + +## Examples + +### Tag the Current Snapshot + +```sql +SET enable_experimental_table_ref = 1; + +CREATE TABLE t1(a INT, b STRING); +INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c'); + +-- Create a tag at the current snapshot +ALTER TABLE t1 CREATE TAG v1_0; + +-- Insert more data +INSERT INTO t1 VALUES (4, 'd'), (5, 'e'); + +-- Query the tagged snapshot (returns 3 rows, not 5) +SELECT * FROM t1 AT (TAG => v1_0) ORDER BY a; +``` + +### Tag from an Existing Reference + +```sql +-- Copy from an existing tag +ALTER TABLE t1 CREATE TAG v1_0_copy AT (TAG => v1_0); + +-- Tag a specific snapshot +ALTER TABLE t1 CREATE TAG before_migration + AT (SNAPSHOT => 'aaa4857c5935401790db2c9f0f2818be'); + +-- Tag the state from 1 hour ago +ALTER TABLE t1 CREATE TAG hourly_checkpoint AT (OFFSET => -3600); +``` + +### Tag with Automatic Expiration + +```sql +-- Tag expires after 7 days +ALTER TABLE t1 CREATE TAG temp_tag RETAIN 7 DAYS; + +-- Tag expires after 3600 seconds +ALTER TABLE t1 CREATE TAG debug_snapshot RETAIN 3600 SECONDS; +``` diff --git a/tidb-cloud-lake/sql/create-tag.md b/tidb-cloud-lake/sql/create-tag.md new file mode 100644 index 0000000000000..ac0e709ba42bc --- /dev/null +++ b/tidb-cloud-lake/sql/create-tag.md @@ -0,0 +1,62 @@ +--- +title: CREATE TAG +summary: Creates a new tag with optional allowed values and comment. +--- + +# CREATE TAG + +> **Note:** +> +> Introduced or updated in v1.2.863. + +Creates a new tag. Tags are tenant-level metadata objects that can be assigned to database objects for governance and classification. + +See also: [DROP TAG](/tidb-cloud-lake/sql/drop-tag.md), [SHOW TAGS](/tidb-cloud-lake/sql/show-tags.md), [SET TAG / UNSET TAG](/tidb-cloud-lake/sql/set-tag.md). + +## Syntax + +```sql +CREATE TAG [ IF NOT EXISTS ] <tag_name> + [ ALLOWED_VALUES = ( '<value>' [, '<value>', ... ] ) ] + [ COMMENT = '<comment>' ] +``` + +| Parameter | Description | +|------------------|----------------------------------------------------------| +| `tag_name` | Name of the tag to create.
| +| `ALLOWED_VALUES` | Optional list of permitted values. When set, only these values can be used in SET TAG. Duplicate values are removed automatically. | +| `COMMENT` | Optional description for the tag. | + +## Examples + +Create a tag with allowed values and a comment: + +```sql +CREATE TAG env ALLOWED_VALUES = ('dev', 'staging', 'prod') COMMENT = 'Environment classification'; +``` + +Create a tag that accepts any value: + +```sql +CREATE TAG owner COMMENT = 'Data owner'; +``` + +Create a tag with no restrictions: + +```sql +CREATE TAG cost_center; +``` + +Verify tag definitions: + +```sql +SELECT name, allowed_values, comment FROM system.tags ORDER BY name; + +┌──────────────────────────────────────────────────────────────────────┐ +│ name │ allowed_values │ comment │ +├────────────────┼────────────────────────────┼────────────────────────┤ +│ cost_center │ NULL │ │ +│ env │ ['dev', 'staging', 'prod'] │ Environment classific… │ +│ owner │ NULL │ Data owner │ +└──────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/ddl-database-overview.md b/tidb-cloud-lake/sql/ddl-database-overview.md index 94bd90b10f182..676d0e8e6576a 100644 --- a/tidb-cloud-lake/sql/ddl-database-overview.md +++ b/tidb-cloud-lake/sql/ddl-database-overview.md @@ -12,7 +12,7 @@ This page provides a comprehensive overview of database operations in {{{ .lake | Command | Description | |---------|-------------| | [CREATE DATABASE](/tidb-cloud-lake/sql/create-database.md) | Creates a new database | -| [ALTER DATABASE](/tidb-cloud-lake/sql/rename-database.md) | Modifies a database | +| [ALTER DATABASE](/tidb-cloud-lake/sql/alter-database.md) | Modifies a database | | [DROP DATABASE](/tidb-cloud-lake/sql/drop-database.md) | Removes a database | | [USE DATABASE](/tidb-cloud-lake/sql/use-database.md) | Sets the current working database | | [UNDROP DATABASE](/tidb-cloud-lake/sql/undrop-database.md) | Recovers a dropped database | diff --git 
a/tidb-cloud-lake/sql/drop-snapshot-tag.md b/tidb-cloud-lake/sql/drop-snapshot-tag.md new file mode 100644 index 0000000000000..63937e9d4917b --- /dev/null +++ b/tidb-cloud-lake/sql/drop-snapshot-tag.md @@ -0,0 +1,46 @@ +--- +title: DROP SNAPSHOT TAG +summary: Drops a named snapshot tag from a FUSE table, allowing the referenced snapshot to be garbage collected if no other tags or retention policies protect it. +--- + +# DROP SNAPSHOT TAG + +> **Note:** +> +> Introduced or updated in v1.2.891. + +Drops a named snapshot tag from a FUSE table. Once dropped, the referenced snapshot becomes eligible for garbage collection if no other tags or retention policies protect it. + +> **Note:** +> +> - This is an **experimental** feature. Enable it before use: `SET enable_experimental_table_ref = 1;`. +> - Only supported on FUSE engine tables. + +## Syntax + +```sql +ALTER TABLE [<database_name>.]<table_name> DROP TAG <tag_name> +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| tag_name | The name of the snapshot tag to drop. An error is returned if the tag does not exist. | + +## Examples + +```sql +SET enable_experimental_table_ref = 1; + +CREATE TABLE t1(a INT, b STRING); +INSERT INTO t1 VALUES (1, 'a'), (2, 'b'); + +-- Create and then drop a tag +ALTER TABLE t1 CREATE TAG v1_0; +ALTER TABLE t1 DROP TAG v1_0; + +-- Querying a dropped tag returns an error +SELECT * FROM t1 AT (TAG => v1_0); +-- Error: tag 'v1_0' not found +``` diff --git a/tidb-cloud-lake/sql/drop-tag.md b/tidb-cloud-lake/sql/drop-tag.md new file mode 100644 index 0000000000000..97c2bb3565147 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-tag.md @@ -0,0 +1,40 @@ +--- +title: DROP TAG +summary: Removes a tag. A tag cannot be dropped if it is still referenced by any object. +--- + +# DROP TAG + +> **Note:** +> +> Introduced or updated in v1.2.863. + +Removes a tag. A tag cannot be dropped if it is still referenced by any object — you must first unset the tag from all objects or drop those objects.
+ +See also: [CREATE TAG](/tidb-cloud-lake/sql/create-tag.md), [SET TAG / UNSET TAG](/tidb-cloud-lake/sql/set-tag.md). + +## Syntax + +```sql +DROP TAG [ IF EXISTS ] <tag_name> +``` + +## Examples + +```sql +-- Fails if the tag is still in use +DROP TAG env; +-- Error: Tag 'env' still has references + +-- Remove the tag reference first +ALTER TABLE my_table UNSET TAG env; + +-- Now it succeeds +DROP TAG env; +``` + +Drop a tag only if it exists: + +```sql +DROP TAG IF EXISTS env; +``` diff --git a/tidb-cloud-lake/sql/fuse-engine-tables.md b/tidb-cloud-lake/sql/fuse-engine-tables.md index cbd8f8380df55..ff177bb63e8c2 100644 --- a/tidb-cloud-lake/sql/fuse-engine-tables.md +++ b/tidb-cloud-lake/sql/fuse-engine-tables.md @@ -7,7 +7,7 @@ summary: "{{{ .lake }}} uses the Fuse Engine as its default storage engine, prov > **Note:** > -> Introduced or updated in v1.2.736. +> Introduced or updated in v1.2.892. ## Overview @@ -44,21 +44,22 @@ For more details about the `CREATE TABLE` syntax, see [CREATE TABLE](/tidb-cloud Below are the main parameters for creating a Fuse Engine table: -### `ENGINE` +#### `ENGINE` -- **Description:** If an engine is not explicitly specified, {{{ .lake }}} will automatically default to using the Fuse Engine to create tables, which is equivalent to `ENGINE = FUSE`. +**Description:** If an engine is not explicitly specified, {{{ .lake }}} will automatically default to using the Fuse Engine to create tables, which is equivalent to `ENGINE = FUSE`. -### `CLUSTER BY` +#### `CLUSTER BY` -- **Description:** Specifies the sorting method for data that consists of multiple expressions. For more information, see [Cluster Key](/tidb-cloud-lake/guides/cluster-key-performance.md). +**Description:** Specifies the sorting method for data that consists of multiple expressions. For more information, see [Cluster Key](/tidb-cloud-lake/sql/cluster-key.md).
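A minimal sketch of defining a cluster key at table creation (the table and column names are illustrative):

```sql
-- Sort stored data by event time so range filters on ts prune more blocks
CREATE TABLE events (
    ts TIMESTAMP,
    user_id INT
) CLUSTER BY (ts);
```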
-### `<fuse_engine_options>` +#### `<fuse_engine_options>` -- **Description:** The Fuse Engine offers various options (case-insensitive) that allow you to customize the table's properties. - - See [Fuse Engine Options](#fuse-engine-options) for details. - - Separate multiple options with a space. - - Use [ALTER TABLE](/tidb-cloud-lake/sql/alter-table.md#fuse-engine-options) to modify a table's options. - - Use [SHOW CREATE TABLE](/tidb-cloud-lake/sql/show-create-table.md) to show a table's options. +**Description:** The Fuse Engine offers various options (case-insensitive) that allow you to customize the table's properties. + +- See [Fuse Engine Options](#fuse-engine-options) for details. +- Separate multiple options with a space. +- Use [ALTER TABLE](/tidb-cloud-lake/sql/alter-table.md#fuse-engine-options) to modify a table's options. +- Use [SHOW CREATE TABLE](/tidb-cloud-lake/sql/show-create-table.md) to show a table's options. ## Fuse Engine Options @@ -94,10 +95,29 @@ Below are the available Fuse Engine options, grouped by their purpose: - **Syntax:** `bloom_index_columns = '<column> [, <column> ...]'` - **Description:** Specifies the columns to be used for the bloom index. The data type of these columns can be Map, Number, String, Date, or Timestamp. If no specific columns are specified, the bloom index is created by default on all supported columns. `bloom_index_columns=''` disables the bloom indexing. +### `bloom_index_type` + +- **Syntax:** `bloom_index_type = 'xor8' | 'binary_fuse32'` +- **Description:** Specifies the filter algorithm used for the bloom index. Defaults to `xor8`. Use `binary_fuse32` for tables with heavy point-lookup workloads — it offers a lower false-positive rate at the cost of a larger index size (approximately 4x that of `xor8`). + + Note that `ALTER TABLE ... SET OPTIONS(bloom_index_type = ...)` only affects new writes and rebuilt bloom indexes. Existing `xor8` index files and new `binary_fuse32` index files can coexist in the same table.
+ + **Examples:** + ```sql + -- Set bloom_index_type at table creation + CREATE TABLE t (a INT) bloom_index_type = 'binary_fuse32'; + + -- Change bloom_index_type for an existing table (affects new writes only) + ALTER TABLE t SET OPTIONS(bloom_index_type = 'binary_fuse32'); + + -- Revert to xor8 + ALTER TABLE t SET OPTIONS(bloom_index_type = 'xor8'); + ``` + ### `change_tracking` - **Syntax:** `change_tracking = True / False` -- **Description:** Setting this option to `True` in the Fuse Engine allows for tracking changes for a table. Creating a stream for a table will automatically set `change_tracking` to `True` and introduce additional hidden columns to the table as change tracking metadata. For more information, see [How Stream Works](/tidb-cloud-lake/sql/stream.md#stream-management). +- **Description:** Setting this option to `True` in the Fuse Engine allows for tracking changes for a table. Creating a stream for a table will automatically set `change_tracking` to `True` and introduce additional hidden columns to the table as change tracking metadata. For more information, see [How Stream Works](/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md). ### `data_retention_period_in_hours` @@ -109,62 +129,62 @@ Below are the available Fuse Engine options, grouped by their purpose: - **Syntax:** `enable_auto_vacuum = 0 / 1` - **Description:** Controls whether a table automatically triggers vacuum operations during mutations. This can be set globally as a setting for all tables or configured at the table level. The table-level option has a higher priority than the session/global setting of the same name. When enabled (set to 1), vacuum operations will be automatically triggered after mutations like INSERT or ALTER TABLE, cleaning up the table data according to the configured retention policy. 
- **Examples:** - - ```sql - -- Set enable_auto_vacuum globally for all tables across all sessions - SET GLOBAL enable_auto_vacuum = 1; +**Examples:** - -- Create a table with auto vacuum disabled (overrides global setting) - CREATE OR REPLACE TABLE t1 (id INT) ENABLE_AUTO_VACUUM = 0; - INSERT INTO t1 VALUES(1); -- Won't trigger vacuum despite global setting - - -- Create another table that inherits the global setting - CREATE OR REPLACE TABLE t2 (id INT); - INSERT INTO t2 VALUES(1); -- Will trigger vacuum due to global setting - - -- Enable auto vacuum for an existing table - ALTER TABLE t1 SET OPTIONS(ENABLE_AUTO_VACUUM = 1); - INSERT INTO t1 VALUES(2); -- Now will trigger vacuum - - -- Table option takes precedence over global settings - SET GLOBAL enable_auto_vacuum = 0; -- Turn off globally - -- t1 will still vacuum because table setting overrides global - INSERT INTO t1 VALUES(3); -- Will still trigger vacuum - INSERT INTO t2 VALUES(2); -- Won't trigger vacuum anymore - ``` +```sql +-- Set enable_auto_vacuum globally for all tables across all sessions +SET GLOBAL enable_auto_vacuum = 1; + +-- Create a table with auto vacuum disabled (overrides global setting) +CREATE OR REPLACE TABLE t1 (id INT) ENABLE_AUTO_VACUUM = 0; +INSERT INTO t1 VALUES(1); -- Won't trigger vacuum despite global setting + +-- Create another table that inherits the global setting +CREATE OR REPLACE TABLE t2 (id INT); +INSERT INTO t2 VALUES(1); -- Will trigger vacuum due to global setting + +-- Enable auto vacuum for an existing table +ALTER TABLE t1 SET OPTIONS(ENABLE_AUTO_VACUUM = 1); +INSERT INTO t1 VALUES(2); -- Now will trigger vacuum + +-- Table option takes precedence over global settings +SET GLOBAL enable_auto_vacuum = 0; -- Turn off globally +-- t1 will still vacuum because table setting overrides global +INSERT INTO t1 VALUES(3); -- Will still trigger vacuum +INSERT INTO t2 VALUES(2); -- Won't trigger vacuum anymore +``` ### `data_retention_num_snapshots_to_keep` - **Syntax:** 
`data_retention_num_snapshots_to_keep = <number>` - **Description:** Specifies the number of snapshots to retain during vacuum operations. This can be set globally as a setting for all tables or configured at the table level. The table-level option has a higher priority than the session/global setting of the same name. When set, only the specified number of most recent snapshots will be kept after vacuum operations. Overrides the `data_retention_time_in_days` setting. If set to 0, this setting will be ignored. This option works in conjunction with the `enable_auto_vacuum` setting to provide granular control over snapshot retention policies. - **Examples:** +**Examples:** - ```sql - -- Set global retention to 10 snapshots for all tables across all sessions - SET GLOBAL data_retention_num_snapshots_to_keep = 10; +```sql +-- Set global retention to 10 snapshots for all tables across all sessions +SET GLOBAL data_retention_num_snapshots_to_keep = 10; - -- Create a table with custom snapshot retention (overrides global setting) - CREATE OR REPLACE TABLE t1 (id INT) - enable_auto_vacuum = 1 - data_retention_num_snapshots_to_keep = 5; +-- Create a table with custom snapshot retention (overrides global setting) +CREATE OR REPLACE TABLE t1 (id INT) + enable_auto_vacuum = 1 + data_retention_num_snapshots_to_keep = 5; - -- Create another table that inherits the global setting - CREATE OR REPLACE TABLE t2 (id INT) enable_auto_vacuum = 1; +-- Create another table that inherits the global setting +CREATE OR REPLACE TABLE t2 (id INT) enable_auto_vacuum = 1; - -- When vacuum is triggered: - -- t1 will keep 5 snapshots (table setting) - -- t2 will keep 10 snapshots (global setting) +-- When vacuum is triggered: +-- t1 will keep 5 snapshots (table setting) +-- t2 will keep 10 snapshots (global setting) - -- Change global setting - SET GLOBAL data_retention_num_snapshots_to_keep = 20; +-- Change global setting +SET GLOBAL data_retention_num_snapshots_to_keep = 20; - -- Table options still
take precedence: - -- t1 will still keep only 5 snapshots - -- t2 will now keep 20 snapshots +-- Table options still take precedence: +-- t1 will still keep only 5 snapshots +-- t2 will now keep 20 snapshots - -- Modify snapshot retention for an existing table - ALTER TABLE t1 SET OPTIONS(data_retention_num_snapshots_to_keep = 3); - -- Now t1 will keep 3 snapshots when vacuum is triggered - ``` +-- Modify snapshot retention for an existing table +ALTER TABLE t1 SET OPTIONS(data_retention_num_snapshots_to_keep = 3); +-- Now t1 will keep 3 snapshots when vacuum is triggered +``` diff --git a/tidb-cloud-lake/sql/fuse-tag.md b/tidb-cloud-lake/sql/fuse-tag.md new file mode 100644 index 0000000000000..67df68caddf38 --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-tag.md @@ -0,0 +1,52 @@ +--- +title: FUSE_TAG +summary: Returns the snapshot tags of a table. For more information about snapshot tags, see Snapshot Tags. +--- + +# FUSE_TAG + +> **Note:** +> +> Introduced or updated in v1.2.894. + +Returns the snapshot tags of a table. For more information about snapshot tags, see [Snapshot Tags](/tidb-cloud-lake/sql/table-versioning.md#snapshot-tags). 
+ +## Syntax + +```sql +FUSE_TAG('<database_name>', '<table_name>') +``` + +## Output Columns + +| Column | Type | Description | +|---------------------|--------------------|-----------------------------------------------------------------------------| +| name | STRING | Tag name | +| snapshot_location | STRING | Snapshot file the tag points to | +| expire_at | TIMESTAMP (nullable) | Expiration timestamp; set when `RETAIN` is used in CREATE SNAPSHOT TAG | + +## Examples + +```sql +SET enable_experimental_table_ref = 1; + +CREATE TABLE mytable(a INT, b INT); + +INSERT INTO mytable VALUES(1, 1),(2, 2); + +-- Create a snapshot tag +ALTER TABLE mytable CREATE TAG v1; + +INSERT INTO mytable VALUES(3, 3); + +-- Create another tag with expiration +ALTER TABLE mytable CREATE TAG temp RETAIN 2 DAYS; + +SELECT * FROM FUSE_TAG('default', 'mytable'); + +--- +| name | snapshot_location | expire_at | +|------|------------------------------------------------------------|----------------------------| +| v1 | 1/319/_ss/a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4_v4.mpk | NULL | +| temp | 1/319/_ss/f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3_v4.mpk | 2025-06-15 10:30:00.000000 | +``` diff --git a/tidb-cloud-lake/sql/geo-distance.md b/tidb-cloud-lake/sql/geo-distance.md new file mode 100644 index 0000000000000..def73c1190110 --- /dev/null +++ b/tidb-cloud-lake/sql/geo-distance.md @@ -0,0 +1,39 @@ +--- +title: GEO_DISTANCE +summary: Returns the approximate distance in meters between two points on Earth. +--- + +# GEO_DISTANCE + +Returns the approximate distance in meters between two points on Earth. The points are specified by longitude and latitude in degrees and the distance is computed using a WGS84-based approximation. + +## Syntax + +```sql +GEO_DISTANCE(<lon1>, <lat1>, <lon2>, <lat2>) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `<lon1>` | Longitude of the first point in degrees. | +| `<lat1>` | Latitude of the first point in degrees. | +| `<lon2>` | Longitude of the second point in degrees.
| +| `<lat2>` | Latitude of the second point in degrees. | + +## Return Type + +Float32. + +## Examples + +```sql +SELECT GEO_DISTANCE(55.755831, 37.617673, -55.755831, -37.617673) AS distance; + +╭────────────╮ +│ distance │ +├────────────┤ +│ 14128353.0 │ +╰────────────╯ +``` diff --git a/tidb-cloud-lake/sql/geospatial-functions.md b/tidb-cloud-lake/sql/geospatial-functions.md index 9bc6f65301875..46074e66ba0d4 100644 --- a/tidb-cloud-lake/sql/geospatial-functions.md +++ b/tidb-cloud-lake/sql/geospatial-functions.md @@ -7,62 +7,90 @@ summary: "{{{ .lake }}} ships with two complementary sets of geospatial capabili {{{ .lake }}} ships with two complementary sets of geospatial capabilities: PostGIS-style geometry functions for building and analysing shapes, and H3 utilities for global hexagonal indexing. The tables below group the functions by task so you can quickly locate the right tool, similar to the layout used in the Snowflake documentation. -## Geometry Constructors - -| Function | Description | Example | -|----------|-------------|---------| -| [ST_MAKEGEOMPOINT](/tidb-cloud-lake/sql/st-makegeompoint.md) / [ST_GEOM_POINT](/tidb-cloud-lake/sql/st-geom-point.md) | Construct a Point geometry | `ST_MAKEGEOMPOINT(-122.35, 37.55)` → `POINT(-122.35 37.55)` | -| [ST_MAKEPOINT](/tidb-cloud-lake/sql/st-makepoint.md) / [ST_POINT](/tidb-cloud-lake/sql/st-point.md) | Construct a Point geography | `ST_MAKEPOINT(-122.35, 37.55)` → `POINT(-122.35 37.55)` | -| [ST_MAKELINE](/tidb-cloud-lake/sql/st-makeline.md) / [ST_MAKE_LINE](/tidb-cloud-lake/sql/st-make-line.md) | Create a LineString from points | `ST_MAKELINE(ST_MAKEGEOMPOINT(-122.35, 37.55), ST_MAKEGEOMPOINT(-122.40, 37.60))` → `LINESTRING(-122.35 37.55, -122.40 37.60)` | -| [ST_MAKEPOLYGON](/tidb-cloud-lake/sql/st-makepolygon.md) | Create a Polygon from a closed LineString | `ST_MAKEPOLYGON(ST_MAKELINE(...))` → `POLYGON(...)` | -| [ST_POLYGON](/tidb-cloud-lake/sql/st-polygon.md) | Create a Polygon from coordinate rings |
`ST_POLYGON(...)` → `POLYGON(...)` | - -## Geometry Conversion - -| Function | Description | Example | -|----------|-------------|---------| -| [ST_GEOMETRYFROMTEXT](/tidb-cloud-lake/sql/st-geometryfromtext.md) / [ST_GEOMFROMTEXT](/tidb-cloud-lake/sql/st-geomfromtext.md) | Convert WKT to geometry | `ST_GEOMETRYFROMTEXT('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | -| [ST_GEOMETRYFROMWKB](/tidb-cloud-lake/sql/st-geometryfromwkb.md) / [ST_GEOMFROMWKB](/tidb-cloud-lake/sql/st-geomfromwkb.md) | Convert WKB to geometry | `ST_GEOMETRYFROMWKB(...)` → `POINT(...)` | -| [ST_GEOMETRYFROMEWKT](/tidb-cloud-lake/sql/st-geometryfromewkt.md) / [ST_GEOMFROMEWKT](/tidb-cloud-lake/sql/st-geomfromewkt.md) | Convert EWKT to geometry | `ST_GEOMETRYFROMEWKT('SRID=4326;POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | -| [ST_GEOMETRYFROMEWKB](/tidb-cloud-lake/sql/st-geometryfromewkb.md) / [ST_GEOMFROMEWKB](/tidb-cloud-lake/sql/st-geomfromewkb.md) | Convert EWKB to geometry | `ST_GEOMETRYFROMEWKB(...)` → `POINT(...)` | -| [ST_GEOGRAPHYFROMWKT](/tidb-cloud-lake/sql/st-geographyfromwkt.md) / [ST_GEOGFROMWKT](/tidb-cloud-lake/sql/st-geogfromwkt.md) | Convert WKT/EWKT to geography | `ST_GEOGRAPHYFROMWKT('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | -| [ST_GEOGRAPHYFROMWKB](/tidb-cloud-lake/sql/st-geographyfromwkb.md) / [ST_GEOGFROMWKB](/tidb-cloud-lake/sql/st-geogfromwkb.md) | Convert WKB/EWKB to geography | `ST_GEOGRAPHYFROMWKB(...)` → `POINT(...)` | -| [ST_GEOMFROMGEOHASH](/tidb-cloud-lake/sql/st-geomfromgeohash.md) | Convert GeoHash to geometry | `ST_GEOMFROMGEOHASH('9q8yyk8')` → `POLYGON(...)` | -| [ST_GEOMPOINTFROMGEOHASH](/tidb-cloud-lake/sql/st-geompointfromgeohash.md) | Convert GeoHash to Point geometry | `ST_GEOMPOINTFROMGEOHASH('9q8yyk8')` → `POINT(...)` | -| [ST_GEOGFROMGEOHASH](/tidb-cloud-lake/sql/st-geogfromgeohash.md) | Convert GeoHash to geography polygon | `ST_GEOGFROMGEOHASH('9q8yyk8')` → `POLYGON(...)` | -| 
[ST_GEOGPOINTFROMGEOHASH](/tidb-cloud-lake/sql/st-geogpointfromgeohash.md) | Convert GeoHash to geography point | `ST_GEOGPOINTFROMGEOHASH('9q8yyk8')` → `POINT(...)` | -| [TO_GEOMETRY](/tidb-cloud-lake/sql/geometry.md) | Parse various formats into geometry | `TO_GEOMETRY('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | -| [TO_GEOGRAPHY](/tidb-cloud-lake/sql/to-geography.md) / [TRY_TO_GEOGRAPHY](/tidb-cloud-lake/sql/to-geography.md) | Parse various formats into geography | `TO_GEOGRAPHY('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | - -## Geometry Output - -| Function | Description | Example | -|----------|-------------|---------| -| [ST_ASTEXT](/tidb-cloud-lake/sql/st-astext.md) | Convert geometry to WKT | `ST_ASTEXT(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'POINT(-122.35 37.55)'` | -| [ST_ASWKT](/tidb-cloud-lake/sql/st-aswkt.md) | Convert geometry to WKT | `ST_ASWKT(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'POINT(-122.35 37.55)'` | -| [ST_ASBINARY](/tidb-cloud-lake/sql/st-asbinary.md) / [ST_ASWKB](/tidb-cloud-lake/sql/st-aswkb.md) | Convert geometry to WKB | `ST_ASBINARY(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `WKB representation` | -| [ST_ASEWKT](/tidb-cloud-lake/sql/st-asewkt.md) | Convert geometry to EWKT | `ST_ASEWKT(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'SRID=4326;POINT(-122.35 37.55)'` | -| [ST_ASEWKB](/tidb-cloud-lake/sql/st-asewkb.md) | Convert geometry to EWKB | `ST_ASEWKB(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `EWKB representation` | -| [ST_ASGEOJSON](/tidb-cloud-lake/sql/st-asgeojson.md) | Convert geometry to GeoJSON | `ST_ASGEOJSON(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'{"type":"Point","coordinates":[-122.35,37.55]}'` | -| [ST_GEOHASH](/tidb-cloud-lake/sql/st-geohash.md) | Convert geometry to GeoHash | `ST_GEOHASH(ST_MAKEGEOMPOINT(-122.35, 37.55), 7)` → `'9q8yyk8'` | -| [TO_STRING](/tidb-cloud-lake/sql/string.md) | Convert geometry to string | `TO_STRING(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'POINT(-122.35 37.55)'` | - -## Geometry Accessors & Properties 
- -| Function | Description | Example | -|----------|-------------|---------| -| [ST_DIMENSION](/tidb-cloud-lake/sql/st-dimension.md) | Return the topological dimension | `ST_DIMENSION(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `0` | -| [ST_SRID](/tidb-cloud-lake/sql/st-srid.md) | Return the SRID of a geometry | `ST_SRID(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `4326` | -| [ST_SETSRID](/tidb-cloud-lake/sql/st-setsrid.md) | Assign an SRID to a geometry | `ST_SETSRID(ST_MAKEGEOMPOINT(-122.35, 37.55), 3857)` → `POINT(-122.35 37.55)` | -| [ST_TRANSFORM](/tidb-cloud-lake/sql/st-transform.md) | Transform geometry to a new SRID | `ST_TRANSFORM(ST_MAKEGEOMPOINT(-122.35, 37.55), 3857)` → `POINT(-13618288.8 4552395.0)` | -| [ST_NPOINTS](/tidb-cloud-lake/sql/st-npoints.md) / [ST_NUMPOINTS](/tidb-cloud-lake/sql/st-numpoints.md) | Count points in a geometry | `ST_NPOINTS(ST_MAKELINE(...))` → `2` | -| [ST_POINTN](/tidb-cloud-lake/sql/st-pointn.md) | Return a specific point from a LineString | `ST_POINTN(ST_MAKELINE(...), 1)` → `POINT(-122.35 37.55)` | -| [ST_STARTPOINT](/tidb-cloud-lake/sql/st-startpoint.md) | Return the first point in a LineString | `ST_STARTPOINT(ST_MAKELINE(...))` → `POINT(-122.35 37.55)` | -| [ST_ENDPOINT](/tidb-cloud-lake/sql/st-endpoint.md) | Return the last point in a LineString | `ST_ENDPOINT(ST_MAKELINE(...))` → `POINT(-122.40 37.60)` | -| [ST_LENGTH](/tidb-cloud-lake/sql/st-length.md) | Measure the length of a LineString | `ST_LENGTH(ST_MAKELINE(...))` → `5.57` | -| [ST_X](/tidb-cloud-lake/sql/st-x.md) / [ST_Y](/tidb-cloud-lake/sql/st-y.md) | Return the X or Y coordinate of a Point | `ST_X(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `-122.35` | -| [ST_XMIN](/tidb-cloud-lake/sql/st-xmin.md) / [ST_XMAX](/tidb-cloud-lake/sql/st-xmax.md) | Return the min/max X coordinate | `ST_XMIN(ST_MAKELINE(...))` → `-122.40` | -| [ST_YMIN](/tidb-cloud-lake/sql/st-ymin.md) / [ST_YMAX](/tidb-cloud-lake/sql/st-ymax.md) | Return the min/max Y coordinate | `ST_YMAX(ST_MAKELINE(...))` → 
`37.60` | +## Constructors + +| Function | Description | Note | Example | +|----------|-------------|------|---------| +| [ST_MAKEGEOMPOINT](/tidb-cloud-lake/sql/st-makegeompoint.md) / [ST_GEOM_POINT](/tidb-cloud-lake/sql/st-geom-point.md) | Construct a Point geometry | GEOGRAPHY only | `ST_MAKEGEOMPOINT(-122.35, 37.55)` → `POINT(-122.35 37.55)` | +| [ST_MAKEPOINT](/tidb-cloud-lake/sql/st-makepoint.md) / [ST_POINT](/tidb-cloud-lake/sql/st-point.md) | Construct a Point geography | GEOGRAPHY only | `ST_MAKEPOINT(-122.35, 37.55)` → `POINT(-122.35 37.55)` | +| [ST_MAKELINE](/tidb-cloud-lake/sql/st-makeline.md) / [ST_MAKE_LINE](/tidb-cloud-lake/sql/st-make-line.md) | Create a LineString from points | | `ST_MAKELINE(ST_MAKEGEOMPOINT(-122.35, 37.55), ST_MAKEGEOMPOINT(-122.40, 37.60))` → `LINESTRING(-122.35 37.55, -122.40 37.60)` | +| [ST_MAKEPOLYGON](/tidb-cloud-lake/sql/st-makepolygon.md) | Create a Polygon from a closed LineString | | `ST_MAKEPOLYGON(ST_MAKELINE(...))` → `POLYGON(...)` | +| [ST_POLYGON](/tidb-cloud-lake/sql/st-polygon.md) | Create a Polygon from coordinate rings | GEOGRAPHY only | `ST_POLYGON(...)` → `POLYGON(...)` | + +## Conversion + +| Function | Description | Note | Example | +|----------|-------------|------|---------| +| [ST_GEOMETRYFROMTEXT](/tidb-cloud-lake/sql/st-geometryfromtext.md) / [ST_GEOMFROMTEXT](/tidb-cloud-lake/sql/st-geomfromtext.md) | Convert WKT to geometry | GEOGRAPHY only | `ST_GEOMETRYFROMTEXT('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | +| [ST_GEOMETRYFROMWKB](/tidb-cloud-lake/sql/st-geometryfromwkb.md) / [ST_GEOMFROMWKB](/tidb-cloud-lake/sql/st-geomfromwkb.md) | Convert WKB to geometry | GEOGRAPHY only | `ST_GEOMETRYFROMWKB(...)` → `POINT(...)` | +| [ST_GEOMETRYFROMEWKT](/tidb-cloud-lake/sql/st-geometryfromewkt.md) / [ST_GEOMFROMEWKT](/tidb-cloud-lake/sql/st-geomfromewkt.md) | Convert EWKT to geometry | GEOGRAPHY only | `ST_GEOMETRYFROMEWKT('SRID=4326;POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | +| 
[ST_GEOMETRYFROMEWKB](/tidb-cloud-lake/sql/st-geometryfromewkb.md) / [ST_GEOMFROMEWKB](/tidb-cloud-lake/sql/st-geomfromewkb.md) | Convert EWKB to geometry | GEOGRAPHY only | `ST_GEOMETRYFROMEWKB(...)` → `POINT(...)` | +| [ST_GEOGRAPHYFROMWKT](/tidb-cloud-lake/sql/st-geographyfromwkt.md) / [ST_GEOGFROMWKT](/tidb-cloud-lake/sql/st-geogfromwkt.md) | Convert WKT/EWKT to geography | GEOGRAPHY only | `ST_GEOGRAPHYFROMWKT('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | +| [ST_GEOGRAPHYFROMWKB](/tidb-cloud-lake/sql/st-geographyfromwkb.md) / [ST_GEOGFROMWKB](/tidb-cloud-lake/sql/st-geogfromwkb.md) | Convert WKB/EWKB to geography | GEOGRAPHY only | `ST_GEOGRAPHYFROMWKB(...)` → `POINT(...)` | +| [ST_GEOMFROMGEOHASH](/tidb-cloud-lake/sql/st-geomfromgeohash.md) | Convert GeoHash to geometry | GEOGRAPHY only | `ST_GEOMFROMGEOHASH('9q8yyk8')` → `POLYGON(...)` | +| [ST_GEOMPOINTFROMGEOHASH](/tidb-cloud-lake/sql/st-geompointfromgeohash.md) | Convert GeoHash to Point geometry | GEOGRAPHY only | `ST_GEOMPOINTFROMGEOHASH('9q8yyk8')` → `POINT(...)` | +| [ST_GEOGFROMGEOHASH](/tidb-cloud-lake/sql/st-geogfromgeohash.md) | Convert GeoHash to geography polygon | GEOGRAPHY only | `ST_GEOGFROMGEOHASH('9q8yyk8')` → `POLYGON(...)` | +| [ST_GEOGPOINTFROMGEOHASH](/tidb-cloud-lake/sql/st-geogpointfromgeohash.md) | Convert GeoHash to geography point | GEOGRAPHY only | `ST_GEOGPOINTFROMGEOHASH('9q8yyk8')` → `POINT(...)` | +| [TO_GEOMETRY](/tidb-cloud-lake/sql/geometry.md) | Parse various formats into geometry | GEOGRAPHY only | `TO_GEOMETRY('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | +| [TO_GEOGRAPHY](/tidb-cloud-lake/sql/to-geography.md) / [TRY_TO_GEOGRAPHY](/tidb-cloud-lake/sql/to-geography.md) | Parse various formats into geography | GEOGRAPHY only | `TO_GEOGRAPHY('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | + +## Output + +| Function | Description | Note | Example | +|----------|-------------|------|---------| +| [ST_ASTEXT](/tidb-cloud-lake/sql/st-astext.md) | Convert 
geometry to WKT | | `ST_ASTEXT(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'POINT(-122.35 37.55)'` | +| [ST_ASWKT](/tidb-cloud-lake/sql/st-aswkt.md) | Convert geometry to WKT | | `ST_ASWKT(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'POINT(-122.35 37.55)'` | +| [ST_ASBINARY](/tidb-cloud-lake/sql/st-asbinary.md) / [ST_ASWKB](/tidb-cloud-lake/sql/st-aswkb.md) | Convert geometry to WKB | | `ST_ASBINARY(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `WKB representation` | +| [ST_ASEWKT](/tidb-cloud-lake/sql/st-asewkt.md) | Convert geometry to EWKT | | `ST_ASEWKT(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'SRID=4326;POINT(-122.35 37.55)'` | +| [ST_ASEWKB](/tidb-cloud-lake/sql/st-asewkb.md) | Convert geometry to EWKB | | `ST_ASEWKB(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `EWKB representation` | +| [ST_ASGEOJSON](/tidb-cloud-lake/sql/st-asgeojson.md) | Convert geometry to GeoJSON | | `ST_ASGEOJSON(ST_MAKEGEOMPOINT(-122.35, 37.55))` → '{"type":"Point","coordinates":[-122.35,37.55]}' | +| [ST_GEOHASH](/tidb-cloud-lake/sql/st-geohash.md) | Convert geometry to GeoHash | | `ST_GEOHASH(ST_MAKEGEOMPOINT(-122.35, 37.55), 7)` → `'9q8yyk8'` | +| [TO_STRING](/tidb-cloud-lake/sql/string.md) | Convert geometry to string | | `TO_STRING(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'POINT(-122.35 37.55)'` | + +## Accessors & Properties + +| Function | Description | Note | Example | +|----------|-------------|------|---------| +| [ST_DIMENSION](/tidb-cloud-lake/sql/st-dimension.md) | Return the topological dimension | | `ST_DIMENSION(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `0` | +| [ST_CENTROID](/tidb-cloud-lake/sql/st-centroid.md) | Return the centroid of a geometry | GEOMETRY only | `ST_CENTROID(TO_GEOMETRY('LINESTRING(0 0, 2 0)'))` → `POINT(1 0)` | +| [ST_ENVELOPE](/tidb-cloud-lake/sql/st-envelope.md) | Return the minimum bounding rectangle | GEOMETRY only | `ST_ENVELOPE(TO_GEOMETRY('LINESTRING(0 0, 2 3)'))` → `POLYGON((0 0,2 0,2 3,0 3,0 0))` | +| [ST_SRID](/tidb-cloud-lake/sql/st-srid.md) | Return the SRID of a geometry | | 
`ST_SRID(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `4326` | +| [ST_POINTN](/tidb-cloud-lake/sql/st-pointn.md) | Return a specific point from a LineString | | `ST_POINTN(ST_MAKELINE(...), 1)` → `POINT(-122.35 37.55)` | +| [ST_STARTPOINT](/tidb-cloud-lake/sql/st-startpoint.md) | Return the first point in a LineString | | `ST_STARTPOINT(ST_MAKELINE(...))` → `POINT(-122.35 37.55)` | +| [ST_ENDPOINT](/tidb-cloud-lake/sql/st-endpoint.md) | Return the last point in a LineString | | `ST_ENDPOINT(ST_MAKELINE(...))` → `POINT(-122.40 37.60)` | +| [ST_X](/tidb-cloud-lake/sql/st-x.md) / [ST_Y](/tidb-cloud-lake/sql/st-y.md) | Return the X or Y coordinate of a Point | | `ST_X(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `-122.35` | +| [ST_XMIN](/tidb-cloud-lake/sql/st-xmin.md) / [ST_XMAX](/tidb-cloud-lake/sql/st-xmax.md) | Return the min/max X coordinate | | `ST_XMIN(ST_MAKELINE(...))` → `-122.40` | +| [ST_YMIN](/tidb-cloud-lake/sql/st-ymin.md) / [ST_YMAX](/tidb-cloud-lake/sql/st-ymax.md) | Return the min/max Y coordinate | | `ST_YMAX(ST_MAKELINE(...))` → `37.60` | + +## Relationship and Measurement + +| Function | Description | Note | Example | +|----------|-------------|------|---------| +| [HAVERSINE](/tidb-cloud-lake/sql/haversine.md) | Compute great-circle distance between coordinates | | `HAVERSINE(37.55, -122.35, 37.60, -122.40)` → `6.12` | +| [ST_AREA](/tidb-cloud-lake/sql/st-area.md) | Measure the area of a geometry or geography object | | `ST_AREA(TO_GEOMETRY('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'))` → `1.0` | +| [ST_CONTAINS](/tidb-cloud-lake/sql/st-contains.md) | Test whether one geometry contains another | GEOMETRY only | `ST_CONTAINS(ST_MAKEPOLYGON(...), ST_MAKEGEOMPOINT(...))` → `TRUE` | +| [ST_CONVEXHULL](/tidb-cloud-lake/sql/st-convexhull.md) | Compute the convex hull of a geometry | GEOMETRY only | `ST_CONVEXHULL(TO_GEOMETRY('POLYGON((0 0,2 0,2 2,0 2,0 0))'))` → `POLYGON((0 0,2 0,2 2,0 2,0 0))` | +| [ST_NPOINTS](/tidb-cloud-lake/sql/st-npoints.md) | Count points in a geometry |
| `ST_NPOINTS(ST_MAKELINE(...))` → `2` | +| [ST_NUMPOINTS](/tidb-cloud-lake/sql/st-numpoints.md) | Count points in a geometry | GEOMETRY only | `ST_NUMPOINTS(ST_MAKELINE(...))` → `2` | +| [ST_INTERSECTS](/tidb-cloud-lake/sql/st-intersects.md) | Test whether two geometries intersect | GEOMETRY only | `ST_INTERSECTS(TO_GEOMETRY('LINESTRING(0 0, 2 2)'), TO_GEOMETRY('LINESTRING(0 2, 2 0)'))` → `TRUE` | +| [ST_DISJOINT](/tidb-cloud-lake/sql/st-disjoint.md) | Test whether two geometries are disjoint | GEOMETRY only | `ST_DISJOINT(TO_GEOMETRY('POINT(3 3)'), TO_GEOMETRY('POLYGON((0 0,2 0,2 2,0 2,0 0))'))` → `TRUE` | +| [ST_WITHIN](/tidb-cloud-lake/sql/st-within.md) | Test whether one geometry is within another | GEOMETRY only | `ST_WITHIN(TO_GEOMETRY('POINT(1 1)'), TO_GEOMETRY('POLYGON((0 0,2 0,2 2,0 2,0 0))'))` → `TRUE` | +| [ST_EQUALS](/tidb-cloud-lake/sql/st-equals.md) | Test whether two geometries are spatially equal | GEOMETRY only | `ST_EQUALS(TO_GEOMETRY('POINT(1 1)'), TO_GEOMETRY('POINT(1 1)'))` → `TRUE` | +| [ST_LENGTH](/tidb-cloud-lake/sql/st-length.md) | Measure the length of a LineString | | `ST_LENGTH(ST_MAKELINE(...))` → `5.57` | +| [ST_DISTANCE](/tidb-cloud-lake/sql/st-distance.md) | Measure the distance between geometries | | `ST_DISTANCE(ST_MAKEGEOMPOINT(-122.35, 37.55), ST_MAKEGEOMPOINT(-122.40, 37.60))` → `5.57` | +| [ST_DWITHIN](/tidb-cloud-lake/sql/st-dwithin.md) | Test whether two geometries are within a distance | GEOMETRY only | `ST_DWITHIN(TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('POINT(1 1)'), 1.5)` → `TRUE` | +| [ST_UNION](/tidb-cloud-lake/sql/st-union.md) | Return the combined geometry of two inputs | GEOMETRY only | `ST_UNION(TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('POINT(1 1)'))` → `MULTIPOINT(0 0,1 1)` | +| [ST_INTERSECTION](/tidb-cloud-lake/sql/st-intersection.md) | Return the shared part of two geometries | GEOMETRY only | `ST_INTERSECTION(TO_GEOMETRY('LINESTRING(0 0, 1 1)'), TO_GEOMETRY('LINESTRING(0 0, 1 1)'))` → `LINESTRING(0 0,1 1)` | +| 
[ST_DIFFERENCE](/tidb-cloud-lake/sql/st-difference.md) | Return the part of the first geometry not covered by the second | GEOMETRY only | `ST_DIFFERENCE(TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('POINT(1 1)'))` → `POINT(0 0)` | +| [ST_SYMDIFFERENCE](/tidb-cloud-lake/sql/st-symdifference.md) | Return the non-overlapping parts of two geometries | GEOMETRY only | `ST_SYMDIFFERENCE(TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('POINT(1 1)'))` → `MULTIPOINT(0 0,1 1)` | + +## Transformation + +| Function | Description | Note | Example | +|----------|-------------|------|---------| +| [ST_HILBERT](/tidb-cloud-lake/sql/st-hilbert.md) | Encode geometry or geography into a Hilbert curve index | | `ST_HILBERT(TO_GEOMETRY('POINT(0.5 0.5)'), [0, 0, 1, 1])` → `715827882` | +| [ST_SETSRID](/tidb-cloud-lake/sql/st-setsrid.md) | Assign an SRID to a geometry | GEOMETRY only | `ST_SETSRID(ST_MAKEGEOMPOINT(-122.35, 37.55), 3857)` → `POINT(-122.35 37.55)` | +| [ST_TRANSFORM](/tidb-cloud-lake/sql/st-transform.md) | Transform geometry to a new SRID | GEOMETRY only | `ST_TRANSFORM(ST_MAKEGEOMPOINT(-122.35, 37.55), 3857)` → `POINT(-13618288.8 4552395.0)` | ## Spatial Relationships @@ -108,6 +136,11 @@ summary: "{{{ .lake }}} ships with two complementary sets of geospatial capabili | [H3_HEX_AREA_M2](/tidb-cloud-lake/sql/h3-hex-area-m2.md) | Return the average hexagon area in m² | `H3_HEX_AREA_M2(10)` → `15200` | | [H3_TO_GEO_BOUNDARY](/tidb-cloud-lake/sql/h3-to-geo-boundary.md) | Return the boundary of a cell | `H3_TO_GEO_BOUNDARY(644325524701193974)` → `[[lon1,lat1], ...]` | | [H3_NUM_HEXAGONS](/tidb-cloud-lake/sql/h3-num-hexagons.md) | Return the number of hexagons at a resolution | `H3_NUM_HEXAGONS(2)` → `5882` | +| [GEO_DISTANCE](/tidb-cloud-lake/sql/geo-distance.md) | Return the approximate distance in meters using WGS84 | `GEO_DISTANCE(0, 0, 0, 0)` → `0` | +| [GREAT_CIRCLE_DISTANCE](/tidb-cloud-lake/sql/great-circle-distance.md) | Return the great-circle distance in meters | 
`GREAT_CIRCLE_DISTANCE(0, 0, 0, 0)` → `0` | +| [GREAT_CIRCLE_ANGLE](/tidb-cloud-lake/sql/great-circle-angle.md) | Return the great-circle central angle in degrees | `GREAT_CIRCLE_ANGLE(0, 0, 45, 0)` → `45` | +| [POINT_IN_POLYGON](/tidb-cloud-lake/sql/point-in-polygon.md) | Check if a point lies inside a polygon | `POINT_IN_POLYGON([lon, lat], [[p1_lon, p1_lat], ...])` → `TRUE` | +| [POINT_IN_ELLIPSES](/tidb-cloud-lake/sql/point-in-ellipses.md) | Check if a point lies inside any ellipse | `POINT_IN_ELLIPSES(10, 10, 10, 9.1, 1, 0.9999)` → `1` | ## H3 Neighborhoods diff --git a/tidb-cloud-lake/sql/great-circle-angle.md b/tidb-cloud-lake/sql/great-circle-angle.md new file mode 100644 index 0000000000000..ea10776d763f9 --- /dev/null +++ b/tidb-cloud-lake/sql/great-circle-angle.md @@ -0,0 +1,39 @@ +--- +title: GREAT_CIRCLE_ANGLE +summary: Returns the central angle in degrees between two points on a sphere. +--- + +# GREAT_CIRCLE_ANGLE + +Returns the central angle in degrees between two points on a sphere. The points are specified by longitude and latitude in degrees. + +## Syntax + +```sql +GREAT_CIRCLE_ANGLE(<lon1>, <lat1>, <lon2>, <lat2>) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `<lon1>` | Longitude of the first point in degrees. | +| `<lat1>` | Latitude of the first point in degrees. | +| `<lon2>` | Longitude of the second point in degrees. | +| `<lat2>` | Latitude of the second point in degrees. | + +## Return Type + +Float32. + +## Examples + +```sql +SELECT GREAT_CIRCLE_ANGLE(55.755831, 37.617673, -55.755831, -37.617673) AS angle; + +╭───────────╮ +│ angle │ ├───────────┤ │ 127.05919 │ ╰───────────╯ +``` diff --git a/tidb-cloud-lake/sql/great-circle-distance.md b/tidb-cloud-lake/sql/great-circle-distance.md new file mode 100644 index 0000000000000..4fb38dc8ef2f8 --- /dev/null +++ b/tidb-cloud-lake/sql/great-circle-distance.md @@ -0,0 +1,39 @@ +--- +title: GREAT_CIRCLE_DISTANCE +summary: Returns the great-circle distance in meters between two points on a sphere.
+--- + +# GREAT_CIRCLE_DISTANCE + +Returns the great-circle distance in meters between two points on a sphere. The points are specified by longitude and latitude in degrees. + +## Syntax + +```sql +GREAT_CIRCLE_DISTANCE(<lon1>, <lat1>, <lon2>, <lat2>) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `<lon1>` | Longitude of the first point in degrees. | +| `<lat1>` | Latitude of the first point in degrees. | +| `<lon2>` | Longitude of the second point in degrees. | +| `<lat2>` | Latitude of the second point in degrees. | + +## Return Type + +Float32. + +## Examples + +```sql +SELECT GREAT_CIRCLE_DISTANCE(55.755831, 37.617673, -55.755831, -37.617673) AS distance; + +╭────────────╮ +│ distance │ ├────────────┤ │ 14128353.0 │ ╰────────────╯ +``` diff --git a/tidb-cloud-lake/sql/input-output-file-formats.md b/tidb-cloud-lake/sql/input-output-file-formats.md index f5588ce1835e1..f66a4da891cc8 100644 --- a/tidb-cloud-lake/sql/input-output-file-formats.md +++ b/tidb-cloud-lake/sql/input-output-file-formats.md @@ -17,12 +17,18 @@ To specify a file format in a statement, use the following syntax: ```sql -- Specify a standard file format -... FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | ORC | AVRO } [ formatTypeOptions ] ) +... FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | LANCE | ORC | AVRO } [ formatTypeOptions ] ) -- Specify a custom file format ... FILE_FORMAT = ( FORMAT_NAME = '<format_name>' ) ``` +> **Note:** +> +> - Starting in {{{ .lake }}} `v1.2.891-nightly`, `TEXT` is supported as an alias for `TSV`. +> - Older servers may reject `TYPE = TEXT`, so this page continues to use `TSV` in syntax and examples > for cross-version compatibility. +> - If you only target {{{ .lake }}} `v1.2.891-nightly` or later, prefer `TYPE = TEXT` for new configurations. + {{{ .lake }}} determines the file format used by a COPY or Select statement in the following order of priority: 1. First, it checks if a FILE_FORMAT is explicitly specified within the statement.
@@ -31,7 +37,8 @@ To specify a file format in a statement: > **Note:** > -> - {{{ .lake }}} currently supports ORC and AVRO as a source ONLY. Unloading data into an ORC or AVRO file is not supported yet. +> - {{{ .lake }}} currently supports ORC and AVRO as a source ONLY. Unloading data into an ORC or AVRO file > is not supported yet. +> - {{{ .lake }}} currently supports LANCE as an unload target ONLY. `COPY INTO <location>` writes a Lance > dataset directory instead of a standalone file, so it is intended for downstream Lance tooling rather > than stage-table reads or `COPY INTO
<table>`. > - For managing custom file formats in {{{ .lake }}}, see [File Format](/tidb-cloud-lake/sql/file-format.md). ### formatTypeOptions @@ -97,7 +104,7 @@ Character used to escape the quote character within quoted values, in addition t In some variants of CSV, quotes are escaped using a special escape character like `\`, instead of escaping quotes by doubling quoting. -**Available Values**: `'\\'` or `''` (emtpy, means only use double quoting) +**Available Values**: `'\\'` or `''` (empty, means only use double quoting) **Default**: `''` @@ -199,24 +206,26 @@ The compression algorithm. ## TSV Options +{{{ .lake }}} TSV (also called `TEXT` in `v1.2.891-nightly` and later) uses the same format and options under both names. This page keeps `TSV` as the primary term for compatibility with older server versions. + {{{ .lake }}} TSV is subject to the following conditions: -- [RECORD_DELIMITER](#record_delimiter-1), [FIELD_DELIMITER](#field_delimiter-1) are escaped by `\` to resolve [delimter collision](https://en.wikipedia.org/wiki/Delimiter#Delimiter_collision) -- In addition to delimters, these characters in are also escaped: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\\`, `\'`. +- [RECORD_DELIMITER](#record_delimiter-1), [FIELD_DELIMITER](#field_delimiter-1) are escaped by `\` to resolve [delimiter collision](https://en.wikipedia.org/wiki/Delimiter#Delimiter_collision) +- In addition to delimiters, these characters are also escaped: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\\`, `\'`. - [QUOTE](#quote-load-only) is NOT part of the format. - NULL is represented as `\N`. > **Note:** > -> 1.
In {{{ .lake }}}, the main difference between TSV and CSV is NOT using a tab instead of a comma as a > field delimiter (which can be changed by options), but using escaping instead of quoting for +> [delimiter collision](https://en.wikipedia.org/wiki/Delimiter#Delimiter_collision) > 2. We recommend CSV over TSV as a storage format since it has a formal standard. > 3. TSV can be used to load files generated by -> 1. [Clickhouse TSV](https://clickhouse.com/docs/integrations/data-formats/csv-tsv#tsv-tab-separated-files) -> 2. [MySQL TabSeperated](https://dev.mysql.com/doc/refman/8.4/en/mysqldump.html) MySQL `mysqldump --tab`. If `--fields-enclosed-by` or `--fields-optinally-enclosed-by`, use CSV instead. -> 3. [Postgresql TEXT](https://www.postgresql.org/docs/current/sql-copy.html). -> 4. [Snowflake CSV](https://docs.snowflake.com/en/sql-reference/sql/create-file-format#type-csv) with default options. If `ESCAPE_UNENCLOSED_FIELD` is specified, use CSV instead. -> 5. Hive Textfile. +> 1. [Postgresql TEXT](https://www.postgresql.org/docs/current/sql-copy.html). +> 2. [Clickhouse TSV](https://clickhouse.com/docs/integrations/data-formats/csv-tsv#tsv-tab-separated-files) +> 3. [MySQL TabSeparated](https://dev.mysql.com/doc/refman/8.4/en/mysqldump.html) MySQL `mysqldump > --tab`. If `--fields-enclosed-by` or `--fields-optionally-enclosed-by`, use CSV instead. +> 4. [Snowflake CSV](https://docs.snowflake.com/en/sql-reference/sql/create-file-format#type-csv) with default options. If `ESCAPE_UNENCLOSED_FIELD` is specified, use CSV instead. +> 5. Hive Textfile. ### RECORD_DELIMITER @@ -290,8 +299,37 @@ Compression algorithm for internal blocks of parquet file. | `ZSTD` (default) | Zstandard v0.8 (and higher) is supported. | | `SNAPPY` | Snappy is a popular and fast compression algorithm often used with Parquet. | +## LANCE Options + +`LANCE` is only supported when unloading with `COPY INTO <location>`.
+ +Compared with CSV, TSV, NDJSON, and Parquet, a Lance export does **not** produce one or more standalone files that {{{ .lake }}} can read back directly. Instead, {{{ .lake }}} writes a dataset directory containing `.lance` data files together with dataset metadata such as `_versions/`. + +This makes Lance a better fit for downstream machine learning, vector, and Arrow-based workflows that consume the dataset with Lance tooling such as Python `lance` (`pip install pylance`). + +### Format-Specific Options + +Lance has no format-specific options. Use: + +```sql +FILE_FORMAT = (TYPE = LANCE) +``` + +### Behavioral Differences + +| Item | LANCE behavior | +|------|----------------| +| Supported direction | Unload only | +| Read back in {{{ .lake }}} stage query | Not supported | +| `COPY INTO
<table>` | Not supported | +| Output layout | A dataset directory with `.lance` files and metadata | +| `SINGLE` copy option | Not supported | +| `PARTITION BY` | Not supported | + + ## ORC Options + ### MISSING_FIELD_AS (Load Only) The value that missing field is converted to. diff --git a/tidb-cloud-lake/sql/json-functions-overview.md b/tidb-cloud-lake/sql/json-functions-overview.md index 0946d30f09235..38a3d8aa9ef58 100644 --- a/tidb-cloud-lake/sql/json-functions-overview.md +++ b/tidb-cloud-lake/sql/json-functions-overview.md @@ -47,6 +47,7 @@ This section provides reference information for JSON functions in {{{ .lake }}}. | [JSON_EXTRACT_PATH_TEXT](/tidb-cloud-lake/sql/json-extract-path-text.md) | Extracts text value from JSON using path | `JSON_EXTRACT_PATH_TEXT('{"name":"John"}', 'name')` → `'John'` | | [JSON_EACH](/tidb-cloud-lake/sql/json-each.md) | Expands JSON object into key-value pairs | `JSON_EACH('{"a":1,"b":2}')` → `[("a",1),("b",2)]` | | [JSON_ARRAY_ELEMENTS](/tidb-cloud-lake/sql/json-array-elements.md) | Expands JSON array into individual elements | `JSON_ARRAY_ELEMENTS('[1,2,3]')` → `1, 2, 3` | +| [JQ](/tidb-cloud-lake/sql/jq.md) | Processes JSON using jq-style queries | `JQ('{"name":"John"}', '.name')` → `"John"` | ## JSON Formatting & Processing @@ -54,7 +55,7 @@ This section provides reference information for JSON functions in {{{ .lake }}}.
|----------|-------------|---------| | [JSON_PRETTY](/tidb-cloud-lake/sql/json-pretty.md) | Formats JSON with proper indentation | `JSON_PRETTY('{"a":1}')` → Formatted JSON string | | [STRIP_NULL_VALUE](/tidb-cloud-lake/sql/strip-null-value.md) | Removes null values from JSON | `STRIP_NULL_VALUE('{"a":1,"b":null}')` → `{"a":1}` | -| [JQ](/tidb-cloud-lake/sql/jq.md) | Processes JSON using jq-style queries | `JQ('{"name":"John"}', '.name')` → `"John"` | +| [JSON_STRIP_NULLS](/tidb-cloud-lake/sql/json-strip-nulls.md) | Removes null values from a JSON object | `JSON_STRIP_NULLS(PARSE_JSON('{"a":1,"b":null}'))` → `{"a":1}` | ## JSON Containment & Existence diff --git a/tidb-cloud-lake/sql/json-strip-nulls.md b/tidb-cloud-lake/sql/json-strip-nulls.md new file mode 100644 index 0000000000000..d4f2eb6e71894 --- /dev/null +++ b/tidb-cloud-lake/sql/json-strip-nulls.md @@ -0,0 +1,38 @@ +--- +title: JSON_STRIP_NULLS +summary: Removes all properties with null values from a JSON object. +--- + +# JSON_STRIP_NULLS + +> **Note:** +> +> Introduced or updated in v1.2.762. + +Removes all properties with null values from a JSON object. + +## Syntax + +```sql +JSON_STRIP_NULLS(<expr>) +``` + +## Arguments + +An expression of type VARIANT. + +## Return Type + +VARIANT. + +## Examples + +```sql +SELECT JSON_STRIP_NULLS(PARSE_JSON('{"name": "Alice", "age": 30, "city": null}')) AS value; + +╭───────────────────────────╮ │ value │ ├───────────────────────────┤ │ {"age":30,"name":"Alice"} │ ╰───────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/point-in-ellipses.md b/tidb-cloud-lake/sql/point-in-ellipses.md new file mode 100644 index 0000000000000..28dcf90973d54 --- /dev/null +++ b/tidb-cloud-lake/sql/point-in-ellipses.md @@ -0,0 +1,39 @@ +--- +title: POINT_IN_ELLIPSES +summary: Returns 1 if the point lies inside any of the provided ellipses, otherwise returns 0. +--- + +# POINT_IN_ELLIPSES + +Returns 1 if the point lies inside any of the provided ellipses, otherwise returns 0.
Each ellipse is defined by a center point and its semi-major and semi-minor axes. + +## Syntax + +```sql +POINT_IN_ELLIPSES(x, y, x1, y1, a1, b1 [, x2, y2, a2, b2, ...]) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `x`, `y` | Coordinates of the point to test. | +| `x1`, `y1` | Center of the first ellipse. | +| `a1`, `b1` | Semi-major and semi-minor axis lengths of the first ellipse. | +| `x2`, `y2`, `a2`, `b2`, ... | Optional additional ellipses, defined the same way. | + +## Return Type + +UInt8 (1 for true, 0 for false). + +## Examples + +```sql +SELECT POINT_IN_ELLIPSES(10, 10, 10, 9.1, 1, 0.9999) AS inside; + +╭────────╮ +│ inside │ +├────────┤ +│ 1 │ +╰────────╯ +``` diff --git a/tidb-cloud-lake/sql/rename-database.md b/tidb-cloud-lake/sql/rename-database.md deleted file mode 100644 index 15d2a92ce6b0b..0000000000000 --- a/tidb-cloud-lake/sql/rename-database.md +++ /dev/null @@ -1,48 +0,0 @@ ---- -title: RENAME DATABASE -summary: Changes the name of a database. ---- - -# RENAME DATABASE - -Changes the name of a database. - -## Syntax - -```sql -ALTER DATABASE [ IF EXISTS ] RENAME TO -``` - -## Examples - -```sql -CREATE DATABASE DATABEND; -``` - -```sql -SHOW DATABASES; -+--------------------+ -| Database | -+--------------------+ -| DATABEND | -| information_schema | -| default | -| system | -+--------------------+ -``` - -```sql -ALTER DATABASE `DATABEND` RENAME TO `NEW_DATABEND`; -``` - -```sql -SHOW DATABASES; -+--------------------+ -| Database | -+--------------------+ -| information_schema | -| NEW_DATABEND | -| default | -| system | -+--------------------+ -``` diff --git a/tidb-cloud-lake/sql/set-cache-capacity.md b/tidb-cloud-lake/sql/set-cache-capacity.md new file mode 100644 index 0000000000000..a1d776b0a0d66 --- /dev/null +++ b/tidb-cloud-lake/sql/set-cache-capacity.md @@ -0,0 +1,48 @@ +--- +title: SYSTEM$SET_CACHE_CAPACITY +summary: Adjusts the capacity of a named cache at runtime. 
+--- + +# SYSTEM$SET_CACHE_CAPACITY + +Sets the maximum capacity for a named cache at runtime. Changes take effect immediately but are **not persisted** — the cache reverts to the value in the configuration file after a restart. + +See also: [system.caches](/tidb-cloud-lake/sql/system-caches.md). + +## Syntax + +```sql +CALL system$set_cache_capacity('<cache_name>', <new_capacity>) +``` + +| Parameter | Description | +|--------------|--------------------------------------------------------------------------| +| cache_name | The name of the cache (see the cache list in [system.caches](/tidb-cloud-lake/sql/system-caches.md)) | +| new_capacity | New capacity value. Unit (count or bytes) depends on the cache type. | + +## Notes + +- If the new capacity is **larger** than the current value, existing cache entries are retained. +- If the new capacity is **smaller**, entries may be evicted according to the LRU policy. +- Changes are **not persisted**. After a restart, the capacity reverts to the configuration file value. +- `disk_cache_column_data` cannot be adjusted with this command. + +## Examples + +Set the bloom index metadata cache to 5000 entries: + +```sql +CALL system$set_cache_capacity('memory_cache_bloom_index_file_meta_data', 5000); + +┌────────────────────────┬────────┐ +│ node │ result │ +├────────────────────────┼────────┤ +│ Gwo2DYOLZ9zAdYbGTWY9y6 │ Ok │ +└────────────────────────┴────────┘ +``` + +Disable partition pruning cache for testing: + +```sql +CALL system$set_cache_capacity('memory_cache_prune_partitions', 0); +``` diff --git a/tidb-cloud-lake/sql/set-tag.md b/tidb-cloud-lake/sql/set-tag.md new file mode 100644 index 0000000000000..d6ed0dbbef518 --- /dev/null +++ b/tidb-cloud-lake/sql/set-tag.md @@ -0,0 +1,98 @@ +--- +title: SET TAG and UNSET TAG +summary: Assigns or removes tags on database objects. +--- + +# SET TAG and UNSET TAG + +> **Note:** +> +> Introduced or updated in v1.2.866. + +Assigns or removes tags on database objects.
Tags must be created with [CREATE TAG](/tidb-cloud-lake/sql/create-tag.md) before they can be assigned. + +See also: [CREATE TAG](/tidb-cloud-lake/sql/create-tag.md), [TAG_REFERENCES](/tidb-cloud-lake/sql/tag-references.md). + +## Syntax + +```sql +-- Assign tags +ALTER { DATABASE | TABLE | VIEW | STAGE | CONNECTION + | USER | ROLE | STREAM | FUNCTION | PROCEDURE } + [ IF EXISTS ] <object_name> + SET TAG <tag_name> = '<tag_value>' [, <tag_name> = '<tag_value>' ...] + +-- Remove tags +ALTER { DATABASE | TABLE | VIEW | STAGE | CONNECTION + | USER | ROLE | STREAM | FUNCTION | PROCEDURE } + [ IF EXISTS ] <object_name> + UNSET TAG <tag_name> [, <tag_name> ...] +``` + +## Supported Object Types + +| Object Type | Object Name Format | Example | +|-------------|-------------------|---------| +| DATABASE | `<database_name>` | `ALTER DATABASE mydb SET TAG env = 'prod'` | +| TABLE | `[<database_name>.]<table_name>
` | `ALTER TABLE mydb.users SET TAG env = 'prod'` | +| VIEW | `[<database_name>.]<view_name>` | `ALTER VIEW mydb.active_users SET TAG env = 'prod'` | +| STAGE | `<stage_name>` | `ALTER STAGE my_stage SET TAG env = 'prod'` | +| CONNECTION | `<connection_name>` | `ALTER CONNECTION my_conn SET TAG env = 'prod'` | +| USER | `'<user_name>'` | `ALTER USER 'alice' SET TAG env = 'prod'` | +| ROLE | `<role_name>` | `ALTER ROLE analyst SET TAG env = 'prod'` | +| STREAM | `[<database_name>.]<stream_name>` | `ALTER STREAM mydb.my_stream SET TAG env = 'prod'` | +| FUNCTION | `<function_name>` | `ALTER FUNCTION my_udf SET TAG env = 'prod'` | +| PROCEDURE | `<procedure_name>(<arg_types>)` | `ALTER PROCEDURE my_proc(INT) SET TAG env = 'prod'` | + +> **Notes:** +> +> - If the tag has `ALLOWED_VALUES`, the value must be one of the allowed values. +> - `UNSET TAG` with a non-existent tag name returns an error, unless the object itself does not exist and `IF EXISTS` is specified. +> - For PROCEDURE, you must include the argument type signature in the object name. + +## Examples + +### Tag a Database and Table + +```sql +CREATE TAG env ALLOWED_VALUES = ('dev', 'staging', 'prod'); +CREATE TAG owner; + +ALTER DATABASE default SET TAG env = 'prod'; +ALTER TABLE default.my_table SET TAG env = 'staging', owner = 'team_a'; +``` + +### Tag a Stage and Connection + +```sql +ALTER STAGE data_stage SET TAG env = 'dev', owner = 'data_team'; +ALTER CONNECTION my_s3 SET TAG env = 'prod'; +``` + +### Tag a View + +```sql +ALTER VIEW default.active_users SET TAG env = 'prod', owner = 'analytics'; +``` + +### Tag a User and Role + +```sql +ALTER USER 'alice' SET TAG env = 'prod', owner = 'security'; +ALTER ROLE analyst SET TAG env = 'dev'; +``` + +### Tag a UDF and Procedure + +```sql +ALTER FUNCTION my_udf SET TAG env = 'dev'; +ALTER PROCEDURE my_proc(DECIMAL(10,2)) SET TAG env = 'prod'; +``` + +### Remove Tags + +```sql +ALTER TABLE default.my_table UNSET TAG env, owner; +ALTER STAGE data_stage UNSET TAG env; +ALTER USER 'alice' UNSET TAG env, owner; +``` diff --git a/tidb-cloud-lake/sql/show-tags.md b/tidb-cloud-lake/sql/show-tags.md new file
mode 100644 index 0000000000000..ab6381613ea96 --- /dev/null +++ b/tidb-cloud-lake/sql/show-tags.md @@ -0,0 +1,61 @@ +--- +title: SHOW TAGS +summary: Lists tag definitions in the current tenant. +--- + +# SHOW TAGS + +> **Note:** +> +> Introduced or updated in v1.2.863. + +Lists tag definitions in the current tenant. You can also query tag definitions through the `system.tags` table. + +See also: [CREATE TAG](/tidb-cloud-lake/sql/create-tag.md), [DROP TAG](/tidb-cloud-lake/sql/drop-tag.md). + +## Syntax + +```sql +SHOW TAGS [ LIKE '<pattern>' | WHERE <expr> ] [ LIMIT <limit> ] +``` + +## Output Columns + +| Column | Description | +|------------------|------------------------------------------------------| +| `name` | Tag name | +| `allowed_values` | Permitted values list, or NULL if any value is allowed | +| `comment` | Tag description | +| `created_on` | Creation timestamp | + +## Examples + +Show all tags: + +```sql +SHOW TAGS; +``` + +Filter tags by name pattern: + +```sql +SHOW TAGS LIKE 'env%'; +``` + +Filter with a WHERE condition: + +```sql +SHOW TAGS WHERE comment IS NOT NULL; +``` + +Limit results: + +```sql +SHOW TAGS LIMIT 5; +``` + +Equivalent query using the system table: + +```sql +SELECT * FROM system.tags; +``` diff --git a/tidb-cloud-lake/sql/st-centroid.md b/tidb-cloud-lake/sql/st-centroid.md new file mode 100644 index 0000000000000..894528a15f367 --- /dev/null +++ b/tidb-cloud-lake/sql/st-centroid.md @@ -0,0 +1,52 @@ +--- +title: ST_CENTROID +summary: Returns the centroid of a GEOMETRY object. +--- + +# ST_CENTROID + +> **Note:** +> +> Introduced or updated in v1.2.895. + +Returns the centroid of a GEOMETRY object. + +This function only supports GEOMETRY values. + +## Syntax + +```sql +ST_CENTROID(<geometry>) +``` + +## Arguments + +| Arguments | Description | +|--------------|--------------------------------------------------------| +| `<geometry>` | The argument must be an expression of type GEOMETRY. | + +## Return Type + +GEOMETRY.
+ +## Examples + +```sql +SELECT ST_ASWKT(ST_CENTROID(TO_GEOMETRY('POINT(1 2)'))); + +╭──────────────────────────────────────────────────╮ +│ st_aswkt(st_centroid(to_geometry('POINT(1 2)'))) │ +├──────────────────────────────────────────────────┤ +│ POINT(1 2) │ +╰──────────────────────────────────────────────────╯ +``` + +```sql +SELECT ST_ASWKT(ST_CENTROID(TO_GEOMETRY('LINESTRING(0 0, 2 0)'))); + +╭────────────────────────────────────────────────────────────╮ +│ st_aswkt(st_centroid(to_geometry('LINESTRING(0 0, 2 0)'))) │ +├────────────────────────────────────────────────────────────┤ +│ POINT(1 0) │ +╰────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-convexhull.md b/tidb-cloud-lake/sql/st-convexhull.md new file mode 100644 index 0000000000000..715878e0822f7 --- /dev/null +++ b/tidb-cloud-lake/sql/st-convexhull.md @@ -0,0 +1,44 @@ +--- +title: ST_CONVEXHULL +summary: Returns the convex hull of a GEOMETRY object. +--- + +# ST_CONVEXHULL + +> **Note:** +> +> Introduced or updated in v1.2.564. + +Returns the convex hull of a GEOMETRY object. + +## Syntax + +```sql +ST_CONVEXHULL(<geometry>) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------------------------------------------| +| `<geometry>` | The argument must be an expression of type GEOMETRY. | + +## Return Type + +GEOMETRY.
+ +## Examples + +```sql +SELECT ST_ASTEXT( + ST_CONVEXHULL( + TO_GEOMETRY('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') + ) +) AS hull; + +╭────────────────────────────────╮ +│ hull │ +├────────────────────────────────┤ +│ POLYGON((2 0,2 2,0 2,0 0,2 0)) │ +╰────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-difference.md b/tidb-cloud-lake/sql/st-difference.md new file mode 100644 index 0000000000000..81f19d514279d --- /dev/null +++ b/tidb-cloud-lake/sql/st-difference.md @@ -0,0 +1,47 @@ +--- +title: ST_DIFFERENCE +summary: Returns the part of the first GEOMETRY object that is not covered by the second GEOMETRY object. +--- + +# ST_DIFFERENCE + +> **Note:** +> +> Introduced or updated in v1.2.895. + +Returns the part of the first GEOMETRY object that is not covered by the second GEOMETRY object. + +This function only supports GEOMETRY values. + +## Syntax + +```sql +ST_DIFFERENCE(<geometry1>, <geometry2>) +``` + +## Arguments + +| Arguments | Description | +|---------------|--------------------------------------------------------| +| `<geometry1>` | The argument must be an expression of type GEOMETRY. | +| `<geometry2>` | The argument must be an expression of type GEOMETRY. | + +> **Note:** +> +> The function reports an error if the two input GEOMETRY objects have different SRIDs. + +## Return Type + +GEOMETRY.
+ +## Examples + +```sql +SELECT ST_ASWKT(ST_DIFFERENCE(TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('POINT(1 1)'))); + +╭───────────────────────────────────────────────────────────────────────────────╮ +│ st_aswkt(st_difference(to_geometry('POINT(0 0)'), to_geometry('POINT(1 1)'))) │ +├───────────────────────────────────────────────────────────────────────────────┤ +│ POINT(0 0) │ +╰───────────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-disjoint.md b/tidb-cloud-lake/sql/st-disjoint.md new file mode 100644 index 0000000000000..a37c1485f98e7 --- /dev/null +++ b/tidb-cloud-lake/sql/st-disjoint.md @@ -0,0 +1,59 @@ +--- +title: ST_DISJOINT +summary: Returns TRUE if two GEOMETRY objects do not intersect. +--- + +# ST_DISJOINT + +> **Note:** +> +> Introduced or updated in v1.2.564. + +Returns TRUE if two GEOMETRY objects do not intersect. + +## Syntax + +```sql +ST_DISJOINT(<geometry1>, <geometry2>) +``` + +## Arguments + +| Arguments | Description | +|---------------|-------------------------------------------------------| +| `<geometry1>` | The argument must be an expression of type GEOMETRY. | +| `<geometry2>` | The argument must be an expression of type GEOMETRY. | + +> **Note:** +> +> The function reports an error if the two input GEOMETRY objects have different SRIDs. + +## Return Type + +Boolean.
+ +## Examples + +```sql +SELECT ST_DISJOINT( + TO_GEOMETRY('POINT(3 3)'), + TO_GEOMETRY('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') +) AS disjoint; + +╭──────────╮ +│ disjoint │ +├──────────┤ +│ true │ +╰──────────╯ + +SELECT ST_DISJOINT( + TO_GEOMETRY('LINESTRING(0 0, 2 2)'), + TO_GEOMETRY('LINESTRING(0 2, 2 0)') +) AS disjoint; + +╭──────────╮ +│ disjoint │ +├──────────┤ +│ false │ +╰──────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-dwithin.md b/tidb-cloud-lake/sql/st-dwithin.md new file mode 100644 index 0000000000000..f56113f446dc8 --- /dev/null +++ b/tidb-cloud-lake/sql/st-dwithin.md @@ -0,0 +1,58 @@ +--- +title: ST_DWITHIN +summary: Returns TRUE if two GEOMETRY objects are within the specified Euclidean distance. +--- + +# ST_DWITHIN + +> **Note:** +> +> Introduced or updated in v1.2.895. + +Returns TRUE if two GEOMETRY objects are within the specified Euclidean distance. + +This function only supports GEOMETRY values. + +## Syntax + +```sql +ST_DWITHIN(<geometry1>, <geometry2>, <distance>) +``` + +## Arguments + +| Arguments | Description | +|---------------|--------------------------------------------------------------| +| `<geometry1>` | The argument must be an expression of type GEOMETRY. | +| `<geometry2>` | The argument must be an expression of type GEOMETRY. | +| `<distance>` | The maximum Euclidean distance as a Float64-compatible value. | + +> **Note:** +> +> The function reports an error if the two input GEOMETRY objects have different SRIDs. + +## Return Type + +Boolean.
+ +## Examples + +```sql +SELECT ST_DWITHIN(TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('POINT(1 1)'), 1.5); + +╭───────────────────────────────────────────────────────────────────────╮ +│ st_dwithin(to_geometry('POINT(0 0)'), to_geometry('POINT(1 1)'), 1.5) │ +├───────────────────────────────────────────────────────────────────────┤ +│ true │ +╰───────────────────────────────────────────────────────────────────────╯ +``` + +```sql +SELECT ST_DWITHIN(TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('LINESTRING(2 0, 2 2)'), 1.9); + +╭─────────────────────────────────────────────────────────────────────────────────╮ +│ st_dwithin(to_geometry('POINT(0 0)'), to_geometry('LINESTRING(2 0, 2 2)'), 1.9) │ +├─────────────────────────────────────────────────────────────────────────────────┤ +│ false │ +╰─────────────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-envelope.md b/tidb-cloud-lake/sql/st-envelope.md new file mode 100644 index 0000000000000..252570ea538df --- /dev/null +++ b/tidb-cloud-lake/sql/st-envelope.md @@ -0,0 +1,43 @@ +--- +title: ST_ENVELOPE +summary: Returns the minimum bounding rectangle of a GEOMETRY object as a polygon. +--- + +# ST_ENVELOPE + +> **Note:** +> +> Introduced or updated in v1.2.895. + +Returns the minimum bounding rectangle of a GEOMETRY object as a polygon. + +This function only supports GEOMETRY values. + +## Syntax + +```sql +ST_ENVELOPE(<geometry>) +``` + +## Arguments + +| Arguments | Description | +|--------------|--------------------------------------------------------| +| `<geometry>` | The argument must be an expression of type GEOMETRY. | + +## Return Type + +GEOMETRY.
+ +## Examples + +```sql +SELECT ST_ASWKT(ST_ENVELOPE(TO_GEOMETRY('LINESTRING(0 0, 2 3)'))); + +╭────────────────────────────────────────────────────────────╮ +│ st_aswkt(st_envelope(to_geometry('LINESTRING(0 0, 2 3)'))) │ +│ String │ +├────────────────────────────────────────────────────────────┤ +│ POLYGON((0 0,2 0,2 3,0 3,0 0)) │ +╰────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-equals.md b/tidb-cloud-lake/sql/st-equals.md new file mode 100644 index 0000000000000..aa8873682258b --- /dev/null +++ b/tidb-cloud-lake/sql/st-equals.md @@ -0,0 +1,55 @@ +--- +title: ST_EQUALS +summary: Returns TRUE if two GEOMETRY objects are spatially equal. +--- + +# ST_EQUALS + +Returns TRUE if two GEOMETRY objects are spatially equal. + +## Syntax + +```sql +ST_EQUALS(<geometry1>, <geometry2>) +``` + +## Arguments + +| Arguments | Description | +|---------------|-------------------------------------------------------| +| `<geometry1>` | The argument must be an expression of type GEOMETRY. | +| `<geometry2>` | The argument must be an expression of type GEOMETRY. | + +> **Note:** +> +> The function reports an error if the two input GEOMETRY objects have different SRIDs. + +## Return Type + +Boolean. + +## Examples + +```sql +SELECT ST_EQUALS( + TO_GEOMETRY('POINT(1 1)'), + TO_GEOMETRY('POINT(1 1)') +) AS equals; + +╭────────╮ +│ equals │ +├────────┤ +│ true │ +╰────────╯ + +SELECT ST_EQUALS( + TO_GEOMETRY('POINT(1 1)'), + TO_GEOMETRY('POINT(1 2)') +) AS equals; + +╭────────╮ +│ equals │ +├────────┤ +│ false │ +╰────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-hilbert.md b/tidb-cloud-lake/sql/st-hilbert.md new file mode 100644 index 0000000000000..d667c0b44d567 --- /dev/null +++ b/tidb-cloud-lake/sql/st-hilbert.md @@ -0,0 +1,77 @@ +--- +title: ST_HILBERT +summary: Encodes a GEOMETRY or GEOGRAPHY object into a Hilbert curve index. +--- + +# ST_HILBERT + +> **Note:** +> +> Introduced or updated in v1.2.885.
+ +Encodes a GEOMETRY or GEOGRAPHY object into a Hilbert curve index. The function uses the center of the geometry's bounding box as the point to encode. When bounds are provided, the point is normalized into the specified bounding box before encoding. + +## Syntax + +```sql +ST_HILBERT(<object>) +ST_HILBERT(<object>, <bounds>) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `<object>` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | +| `<bounds>` | Optional. An array `[xmin, ymin, xmax, ymax]` used to normalize the point before encoding. | + +> **Note:** +> +> - Geometry: If no bounding box is provided, GEOMETRY coordinates are not normalized to a specific bounding box. Instead, the center point values are mapped to the full `float32` domain, and then encoded into the Hilbert index. +> - Geography: If no bounding box is provided, the default bounds are `[-180, -90, 180, 90]`. + +## Return Type + +UInt64. + +## Examples + +### GEOMETRY examples + +```sql +SELECT ST_HILBERT(TO_GEOMETRY('POINT(1 2)')) AS hilbert1, ST_HILBERT(TO_GEOMETRY('POINT(5 5)')) AS hilbert2; + +╭───────────────────────────╮ +│ hilbert1 │ hilbert2 │ +├─────────────┼─────────────┤ +│ 3355443200 │ 2155872256 │ +╰───────────────────────────╯ + +SELECT ST_HILBERT(TO_GEOMETRY('POINT(1 2)'), [0, 0, 1, 1]) AS hilbert1, ST_HILBERT(TO_GEOMETRY('POINT(5 5)'), [0, 0, 5, 5]) AS hilbert2; + +╭───────────────────────────╮ +│ hilbert1 │ hilbert2 │ +├─────────────┼─────────────┤ +│ 2863311530 │ 2863311530 │ +╰───────────────────────────╯ +``` + +### GEOGRAPHY examples + +```sql +SELECT ST_HILBERT(TO_GEOGRAPHY('POINT(113.15 23.06)')) AS hilbert1, ST_HILBERT(TO_GEOGRAPHY('POINT(116.25 39.54)')) AS hilbert2; + +╭───────────────────────────╮ +│ hilbert1 │ hilbert2 │ +├─────────────┼─────────────┤ +│ 3070259060 │ 3033451300 │ +╰───────────────────────────╯ + +SELECT ST_HILBERT(TO_GEOGRAPHY('POINT(113.15 23.06)'), [73, 4, 135, 53]) AS hilbert1, ST_HILBERT(TO_GEOGRAPHY('POINT(116.25 39.54)'), [73, 4,
135, 53]) AS hilbert2; + +╭───────────────────────────╮ +│ hilbert1 │ hilbert2 │ +├─────────────┼─────────────┤ +│ 3533607194 │ 2330429279 │ +╰───────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-intersection.md b/tidb-cloud-lake/sql/st-intersection.md new file mode 100644 index 0000000000000..527fd7fd8f4ca --- /dev/null +++ b/tidb-cloud-lake/sql/st-intersection.md @@ -0,0 +1,47 @@ +--- +title: ST_INTERSECTION +summary: Returns the shared part of two GEOMETRY objects. +--- + +# ST_INTERSECTION + +> **Note:** +> +> Introduced or updated in v1.2.895. + +Returns the shared part of two GEOMETRY objects. + +This function only supports GEOMETRY values. + +## Syntax + +```sql +ST_INTERSECTION(<geometry1>, <geometry2>) +``` + +## Arguments + +| Arguments | Description | +|---------------|--------------------------------------------------------| +| `<geometry1>` | The argument must be an expression of type GEOMETRY. | +| `<geometry2>` | The argument must be an expression of type GEOMETRY. | + +> **Note:** +> +> The function reports an error if the two input GEOMETRY objects have different SRIDs. + +## Return Type + +GEOMETRY.
+ +## Examples + +```sql +SELECT ST_ASWKT(ST_INTERSECTION(TO_GEOMETRY('LINESTRING(0 0, 1 1)'), TO_GEOMETRY('LINESTRING(0 0, 1 1)'))); + +╭─────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ st_aswkt(st_intersection(to_geometry('LINESTRING(0 0, 1 1)'), to_geometry('LINESTRING(0 0, 1 1)'))) │ +├─────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ LINESTRING(0 0,1 1) │ +╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-intersects.md b/tidb-cloud-lake/sql/st-intersects.md new file mode 100644 index 0000000000000..81fd1751bdcff --- /dev/null +++ b/tidb-cloud-lake/sql/st-intersects.md @@ -0,0 +1,59 @@ +--- +title: ST_INTERSECTS +summary: Returns TRUE if two GEOMETRY objects share any portion of space. +--- + +# ST_INTERSECTS + +> **Note:** +> +> Introduced or updated in v1.2.564. + +Returns TRUE if two GEOMETRY objects share any portion of space. + +## Syntax + +```sql +ST_INTERSECTS(<geometry1>, <geometry2>) +``` + +## Arguments + +| Arguments | Description | +|---------------|-------------------------------------------------------| +| `<geometry1>` | The argument must be an expression of type GEOMETRY. | +| `<geometry2>` | The argument must be an expression of type GEOMETRY. | + +> **Note:** +> +> The function reports an error if the two input GEOMETRY objects have different SRIDs. + +## Return Type + +Boolean.
+ +## Examples + +```sql +SELECT ST_INTERSECTS( + TO_GEOMETRY('LINESTRING(0 0, 2 2)'), + TO_GEOMETRY('LINESTRING(0 2, 2 0)') +) AS intersects; + +╭────────────╮ +│ intersects │ +├────────────┤ +│ true │ +╰────────────╯ + +SELECT ST_INTERSECTS( + TO_GEOMETRY('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'), + TO_GEOMETRY('POINT(3 3)') +) AS intersects; + +╭────────────╮ +│ intersects │ +├────────────┤ +│ false │ +╰────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-srid.md b/tidb-cloud-lake/sql/st-srid.md index 2de7f7ea1f81f..81d37bd5a3e43 100644 --- a/tidb-cloud-lake/sql/st-srid.md +++ b/tidb-cloud-lake/sql/st-srid.md @@ -29,7 +29,8 @@ INT32. > **Note:** > -> If the Geometry don't have a SRID, a default value 4326 will be returned. +> - If the Geometry doesn't have an SRID, a default value of `0` is returned. +> - For Geography, the SRID is always `4326`. ## Examples @@ -60,7 +61,7 @@ SELECT ┌───────────────┐ │ pipeline_srid │ ├───────────────┤ -│ 4326 │ +│ 0 │ └───────────────┘ ``` diff --git a/tidb-cloud-lake/sql/st-symdifference.md b/tidb-cloud-lake/sql/st-symdifference.md new file mode 100644 index 0000000000000..7086fda365b0a --- /dev/null +++ b/tidb-cloud-lake/sql/st-symdifference.md @@ -0,0 +1,47 @@ +--- +title: ST_SYMDIFFERENCE +summary: Returns the non-overlapping parts of two GEOMETRY objects. +--- + +# ST_SYMDIFFERENCE + +> **Note:** +> +> Introduced or updated in v1.2.895. + +Returns the parts of two GEOMETRY objects that do not overlap. + +This function only supports GEOMETRY values. + +## Syntax + +```sql +ST_SYMDIFFERENCE(<geometry1>, <geometry2>) +``` + +## Arguments + +| Arguments | Description | +|---------------|--------------------------------------------------------| +| `<geometry1>` | The argument must be an expression of type GEOMETRY. | +| `<geometry2>` | The argument must be an expression of type GEOMETRY. | + +> **Note:** +> +> The function reports an error if the two input GEOMETRY objects have different SRIDs. + +## Return Type + +GEOMETRY.
+ +## Examples + +```sql +SELECT ST_ASWKT(ST_SYMDIFFERENCE(TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('POINT(1 1)'))); + +╭──────────────────────────────────────────────────────────────────────────────────╮ +│ st_aswkt(st_symdifference(to_geometry('POINT(0 0)'), to_geometry('POINT(1 1)'))) │ +├──────────────────────────────────────────────────────────────────────────────────┤ +│ MULTIPOINT(0 0,1 1) │ +╰──────────────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-union.md b/tidb-cloud-lake/sql/st-union.md new file mode 100644 index 0000000000000..4c419bdec7abd --- /dev/null +++ b/tidb-cloud-lake/sql/st-union.md @@ -0,0 +1,47 @@ +--- +title: ST_UNION +summary: Returns the combined GEOMETRY made from two input GEOMETRY objects. +--- + +# ST_UNION + +> **Note:** +> +> Introduced or updated in v1.2.895. + +Returns the combined GEOMETRY made from two input GEOMETRY objects. + +This function only supports GEOMETRY values. + +## Syntax + +```sql +ST_UNION(<geometry1>, <geometry2>) +``` + +## Arguments + +| Arguments | Description | +|---------------|--------------------------------------------------------| +| `<geometry1>` | The argument must be an expression of type GEOMETRY. | +| `<geometry2>` | The argument must be an expression of type GEOMETRY. | + +> **Note:** +> +> The function reports an error if the two input GEOMETRY objects have different SRIDs. + +## Return Type + +GEOMETRY.
+ +## Examples + +```sql +SELECT ST_ASWKT(ST_UNION(TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('POINT(1 1)'))); + +╭──────────────────────────────────────────────────────────────────────────╮ +│ st_aswkt(st_union(to_geometry('POINT(0 0)'), to_geometry('POINT(1 1)'))) │ +├──────────────────────────────────────────────────────────────────────────┤ +│ MULTIPOINT(0 0,1 1) │ +╰──────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-within.md b/tidb-cloud-lake/sql/st-within.md new file mode 100644 index 0000000000000..be57c18aa4490 --- /dev/null +++ b/tidb-cloud-lake/sql/st-within.md @@ -0,0 +1,48 @@ +--- +title: ST_WITHIN +summary: Returns TRUE if the first GEOMETRY object is completely within the second GEOMETRY object. +--- + +# ST_WITHIN + +> **Note:** +> +> Introduced or updated in v1.2.564. + +Returns TRUE if the first GEOMETRY object is completely within the second GEOMETRY object. + +## Syntax + +```sql +ST_WITHIN(<geometry1>, <geometry2>) +``` + +## Arguments + +| Arguments | Description | +|---------------|-------------------------------------------------------| +| `<geometry1>` | The argument must be an expression of type GEOMETRY. | +| `<geometry2>` | The argument must be an expression of type GEOMETRY. | + +> **Note:** +> +> The function reports an error if the two input GEOMETRY objects have different SRIDs. + +## Return Type + +Boolean. + +## Examples + +```sql +SELECT ST_WITHIN( + TO_GEOMETRY('POINT(1 1)'), + TO_GEOMETRY('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') +) AS within; + +╭─────────╮ +│ within │ +├─────────┤ +│ true │ +╰─────────╯ +``` diff --git a/tidb-cloud-lake/sql/strip-null-value.md b/tidb-cloud-lake/sql/strip-null-value.md index 03c25213f702d..c00fc49dfc551 100644 --- a/tidb-cloud-lake/sql/strip-null-value.md +++ b/tidb-cloud-lake/sql/strip-null-value.md @@ -1,6 +1,6 @@ --- title: STRIP_NULL_VALUE -summary: Removes all properties with null values from a JSON object. +summary: Converts a JSON null value to a SQL NULL value.
All other variant values are passed unchanged. --- # STRIP_NULL_VALUE @@ -9,24 +9,39 @@ summary: Removes all properties with null values from a JSON object. > > Introduced or updated in v1.2.762. -Removes all properties with null values from a JSON object. +Converts a JSON null value to a SQL NULL value. All other variant values are passed unchanged. ## Syntax ```sql -STRIP_NULL_VALUE() +STRIP_NULL_VALUE(<expr>) ``` +## Arguments + +An expression of type VARIANT. + ## Return Type -Returns a value of the same type as the input JSON value. +- If the expression is a JSON null value, the function returns a SQL NULL. +- If the expression is not a JSON null value, the function returns the input value. ## Examples ```sql -SELECT STRIP_NULL_VALUE(PARSE_JSON('{"name": "Alice", "age": 30, "city": null}')); +SELECT STRIP_NULL_VALUE(PARSE_JSON('null')) AS value; + +╭───────╮ +│ value │ +├───────┤ +│ NULL │ +╰───────╯ + +SELECT STRIP_NULL_VALUE(PARSE_JSON('{"name": "Alice", "age": 30, "city": null}')) AS value; -strip_null_value(parse_json('{"name": "alice", "age": 30, "city": null}'))| ---------------------------------------------------------------------------+ -{"age":30,"name":"Alice"} | +╭───────────────────────────────────────╮ +│ value │ +├───────────────────────────────────────┤ +│ {"age":30,"city":null,"name":"Alice"} │ +╰───────────────────────────────────────╯ ``` diff --git a/tidb-cloud-lake/sql/structured-semi-structured-functions.md b/tidb-cloud-lake/sql/structured-semi-structured-functions.md index 5462e943951c7..23a2928855366 100644 --- a/tidb-cloud-lake/sql/structured-semi-structured-functions.md +++ b/tidb-cloud-lake/sql/structured-semi-structured-functions.md @@ -43,7 +43,8 @@ Structured and semi-structured functions in {{{ .lake }}} enable efficient proce |----------|-------------|--------| | [JSON_TO_STRING](/tidb-cloud-lake/sql/json-to-string.md) | Converts a JSON value to a string | `JSON_TO_STRING(PARSE_JSON('{"a":1}'))` | |
[JSON_PRETTY](/tidb-cloud-lake/sql/json-pretty.md) | Formats JSON with proper indentation | `JSON_PRETTY(PARSE_JSON('{"a":1}'))` | -| [STRIP_NULL_VALUE](/tidb-cloud-lake/sql/strip-null-value.md) | Removes null values from JSON | `STRIP_NULL_VALUE(PARSE_JSON('{"a":1,"b":null}'))` | +| [STRIP_NULL_VALUE](/tidb-cloud-lake/sql/strip-null-value.md) | Converts a JSON null value to a SQL NULL value | `STRIP_NULL_VALUE(PARSE_JSON('null'))` → `NULL` | +| [JSON_STRIP_NULLS](/tidb-cloud-lake/sql/json-strip-nulls.md) | Removes null values from a JSON object | `JSON_STRIP_NULLS(PARSE_JSON('{"a":1,"b":null}'))` → `{"a":1}` | ### Array/Object Expansion diff --git a/tidb-cloud-lake/sql/system-caches.md b/tidb-cloud-lake/sql/system-caches.md index b4b4cac8b43bf..c18a6da300d15 100644 --- a/tidb-cloud-lake/sql/system-caches.md +++ b/tidb-cloud-lake/sql/system-caches.md @@ -5,21 +5,67 @@ summary: An overview of various caches being managed in {{{ .lake }}}. # system.caches -An overview of various caches being managed in {{{ .lake }}}. +An overview of various caches managed in {{{ .lake }}}, including usage and hit rate statistics.
-The table below shows the cache name, the number of items in the cache, and the size of the cache: +## Columns + +| Column | Description | +|-----------|--------------------------------------------------------------------------| +| node | The node name | +| name | Cache name (same as the first parameter in `system$set_cache_capacity`) | +| num_items | Number of cached entries | +| size | Size of cached entries (count or bytes depending on `unit`) | +| capacity | Maximum capacity (count or bytes depending on `unit`) | +| unit | Unit of `size` and `capacity`: `count` or `bytes` | +| access | Total number of cache accesses | +| hit | Number of cache hits | +| miss | Number of cache misses | + +## Cache List + +| Cache Name | Cached Object | Unit | Notes | +|----------------------------------------------|----------------------------------------------------|-------|-------| +| memory_cache_table_snapshot | Table snapshot | count | Enabled by default; default capacity is usually sufficient | +| memory_cache_table_statistics | Table statistics | count | | +| memory_cache_compact_segment_info | Compressed table segment metadata | bytes | | +| memory_cache_segment_statistics | Segment-level statistics | bytes | | +| memory_cache_column_oriented_segment_info | Column-oriented segment metadata | bytes | | +| disk_cache_column_data | On-disk column data cache | bytes | Cannot be adjusted via `system$set_cache_capacity` | +| memory_cache_bloom_index_filter | Bloom filter data | bytes | One entry per column per block. Memory usage is small. Monitor hit rate for point-lookup workloads. | +| memory_cache_bloom_index_file_meta_data | Bloom filter metadata | count | Each table can cache up to as many entries as it has blocks. Memory usage is small. Monitor hit rate for point-lookup workloads. 
| +| memory_cache_inverted_index_file_meta_data | Inverted index metadata | count | | +| memory_cache_inverted_index_file | Inverted index data | bytes | | +| memory_cache_vector_index_file_meta_data | Vector index metadata | count | | +| memory_cache_vector_index_file | Vector index data | bytes | | +| memory_cache_spatial_index_file_meta_data | Spatial index metadata | count | | +| memory_cache_spatial_index_file | Spatial index data | bytes | | +| memory_cache_virtual_column_file_meta_data | Virtual column file metadata | count | | +| memory_cache_prune_partitions | Partition pruning cache | count | Enabled by default. Caches pruning results for deterministic queries. Set capacity to 0 to bypass for pruning testing. | +| memory_cache_parquet_meta_data | Parquet file metadata | count | Used by Hive tables and other sources | +| memory_cache_iceberg_table | Iceberg table metadata | count | | + +## Example ```sql SELECT * FROM system.caches; -+--------------------------------+-----------+------+ -| name | num_items | size | -+--------------------------------+-----------+------+ -| table_snapshot_cache | 2 | 2 | -| table_snapshot_statistic_cache | 0 | 0 | -| segment_info_cache | 64 | 64 | -| bloom_index_filter_cache | 0 | 0 | -| bloom_index_meta_cache | 0 | 0 | -| prune_partitions_cache | 2 | 2 | -| file_meta_data_cache | 0 | 0 | -+--------------------------------+-----------+------+ +``` + +Check utilization and hit rate for all caches: + +```sql +SELECT + node, + name, + capacity, + if(unit = 'count', (num_items + 1) / (capacity + 1), + unit = 'bytes', (size + 1) / (capacity + 1), -1) AS utilization, + if(access = 0, 0, hit / access) AS hit_rate, + if(access = 0, 0, miss / access) AS miss_rate, + num_items, + size, + unit, + access, + hit, + miss +FROM system.caches; ``` diff --git a/tidb-cloud-lake/sql/table-functions.md b/tidb-cloud-lake/sql/table-functions.md index 69d4e91f9a3e6..09015076ae6b4 100644 --- a/tidb-cloud-lake/sql/table-functions.md +++ 
b/tidb-cloud-lake/sql/table-functions.md
@@ -43,6 +43,7 @@ This page provides reference information for the table functions in {{{ .lake }}
 | [STREAM_STATUS](/tidb-cloud-lake/sql/stream-status.md) | Shows stream status information | `SELECT * FROM STREAM_STATUS('mystream')` |
 | [TASK_HISTORY](/tidb-cloud-lake/sql/task-history.md) | Shows task execution history | `SELECT * FROM TASK_HISTORY('mytask')` |
 | [POLICY_REFERENCES](/tidb-cloud-lake/sql/policy-references.md) | Returns associations between security policies and tables/views | `SELECT * FROM POLICY_REFERENCES(POLICY_NAME => 'mypolicy')` |
+| [TAG_REFERENCES](/tidb-cloud-lake/sql/tag-references.md) | Returns tags assigned to a database object | `SELECT * FROM TAG_REFERENCES('mydb.mytable', 'TABLE')` |

 ## Storage Engine Functions
diff --git a/tidb-cloud-lake/sql/table-versioning.md b/tidb-cloud-lake/sql/table-versioning.md
new file mode 100644
index 0000000000000..9e94d4f1256d4
--- /dev/null
+++ b/tidb-cloud-lake/sql/table-versioning.md
@@ -0,0 +1,42 @@
+---
+title: Table Versioning
+summary: Table versioning lets you create named references to specific snapshots of a FUSE table. These references survive automatic retention cleanup, giving you stable, human-readable pointers to historical table states.
+---
+
+# Table Versioning
+
+Table versioning lets you create named references to specific snapshots of a FUSE table. These references survive automatic retention cleanup, giving you stable, human-readable pointers to historical table states.
+
+> **Note:**
+>
+> Table versioning is an experimental feature. Enable it before use:
+>
+> ```sql
+> SET enable_experimental_table_ref = 1;
+> ```
+
+## Snapshot Tags
+
+Snapshot tags pin a specific point-in-time state of a table by name. Once created, a tag holds a reference to a particular snapshot so you can query that exact state at any time using the [AT](/tidb-cloud-lake/sql/at.md) clause, without needing to track snapshot IDs or timestamps.
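+
+In practice the flow looks like this. This is an illustrative sketch only: the table and tag names are made up, and the exact argument forms for `CREATE SNAPSHOT TAG` and the tag variant of `AT` are defined on their respective command pages.
+
+```sql
+-- Enable the experimental feature
+SET enable_experimental_table_ref = 1;
+
+-- Pin the table's current snapshot under a readable name
+-- (hypothetical syntax; see CREATE SNAPSHOT TAG for details)
+CREATE SNAPSHOT TAG v1_baseline ON TABLE sales;
+
+-- ...later, after further writes to `sales`...
+
+-- Query the pinned state by name instead of a snapshot ID
+-- (hypothetical syntax; see the AT page for details)
+SELECT COUNT(*) FROM sales AT (TAG => 'v1_baseline');
+```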
+ +**Use cases:** + +- **Release checkpoints**: Tag the table state before and after a data pipeline run so you can compare or roll back. +- **Audit and compliance**: Preserve a named snapshot for regulatory review without worrying about retention expiry. +- **Safe experimentation**: Tag the current state, run experimental transforms, then query the tag to verify what changed. +- **Reproducible analytics**: Pin a dataset version so dashboards and reports always reference the same data. + +**How it works:** + +A snapshot tag attaches a human-readable name to a snapshot. As long as the tag exists, the referenced snapshot is protected from vacuum and garbage collection — even if the retention period has passed. + +- A tag without `RETAIN` lives until explicitly dropped. +- A tag with `RETAIN { DAYS | SECONDS }` is automatically removed after the specified duration during the next vacuum operation. + +**SQL Commands:** + +| Command | Description | +|---------|-------------| +| [CREATE SNAPSHOT TAG](/tidb-cloud-lake/sql/create-snapshot-tag.md) | Create a named tag on a table snapshot | +| [DROP SNAPSHOT TAG](/tidb-cloud-lake/sql/drop-snapshot-tag.md) | Remove a snapshot tag | +| [FUSE_TAG](/tidb-cloud-lake/sql/fuse-tag.md) | List all snapshot tags on a table | diff --git a/tidb-cloud-lake/sql/tag-overview.md b/tidb-cloud-lake/sql/tag-overview.md new file mode 100644 index 0000000000000..38a9c4ad813be --- /dev/null +++ b/tidb-cloud-lake/sql/tag-overview.md @@ -0,0 +1,23 @@ +--- +title: Tag +summary: Overview of tag management and assignment in TiDB Cloud Lake. +--- + +# Tag + +Tags let you attach key-value metadata to {{{ .lake }}} objects for data governance, classification, and compliance tracking. You can define tags with optional allowed values, assign them to objects, and query tag assignments through the `TAG_REFERENCES` table function. 
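+
+A typical lifecycle ties these commands together. The object and tag names below are illustrative:
+
+```sql
+-- Define a tag, optionally restricting it to a fixed set of values
+CREATE TAG env ALLOWED_VALUES = ('dev', 'staging', 'prod');
+
+-- Assign the tag to an object
+ALTER TABLE default.users SET TAG env = 'prod';
+
+-- Audit assignments on that object
+SELECT * FROM TAG_REFERENCES('default.users', 'TABLE');
+```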
+
+## Tag Management
+
+| Command | Description |
+|---------|-------------|
+| [CREATE TAG](/tidb-cloud-lake/sql/create-tag.md) | Creates a new tag with optional allowed values and comment |
+| [DROP TAG](/tidb-cloud-lake/sql/drop-tag.md) | Removes a tag (must have no active references) |
+| [SHOW TAGS](/tidb-cloud-lake/sql/show-tags.md) | Lists tag definitions |
+
+## Tag Assignment
+
+| Command | Description |
+|---------|-------------|
+| [SET TAG / UNSET TAG](/tidb-cloud-lake/sql/set-tag.md) | Assigns or removes tags on database objects |
+| [TAG_REFERENCES](/tidb-cloud-lake/sql/tag-references.md) | Queries tag assignments on a specific object |
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/tag-references.md b/tidb-cloud-lake/sql/tag-references.md
new file mode 100644
index 0000000000000..4b374c4c84ce2
--- /dev/null
+++ b/tidb-cloud-lake/sql/tag-references.md
@@ -0,0 +1,87 @@
+---
+title: TAG_REFERENCES
+summary: Returns all tags assigned to a specified database object.
+---
+
+# TAG_REFERENCES
+
+> **Note:**
+>
+> Introduced or updated in v1.2.866.
+
+Returns all tags assigned to a specified database object. Use this function to audit tag assignments for governance and compliance.
+
+See also: [SET TAG / UNSET TAG](/tidb-cloud-lake/sql/set-tag.md).
+
+## Syntax
+
+```sql
+SELECT * FROM TAG_REFERENCES('<object_name>', '<domain>')
+```
+
+| Parameter | Description |
+|-----------------|--------------------------------------------------------------------|
+| `object_name` | Name of the object. For tables/views/streams, use `db.name` format. For procedures, include the type signature (e.g., `my_proc(INT)`). |
+| `domain` | Object type: `DATABASE`, `TABLE`, `VIEW`, `STREAM`, `STAGE`, `CONNECTION`, `USER`, `ROLE`, `UDF`, or `PROCEDURE`. 
| + +## Output Columns + +| Column | Type | Description | +|-------------------|------------------|---------------------------------------------| +| `tag_name` | String | Name of the tag | +| `tag_value` | String | Value assigned to the tag | +| `object_database` | Nullable(String) | Database name (NULL for STAGE, CONNECTION, USER, ROLE, UDF, PROCEDURE) | +| `object_id` | Nullable(UInt64) | Object ID (non-NULL only for DATABASE, TABLE, VIEW) | +| `object_name` | String | Name of the object | +| `domain` | String | Object type | + +## Examples + +### Query Tags on a Table + +```sql +CREATE TAG env ALLOWED_VALUES = ('dev', 'staging', 'prod'); +CREATE TAG owner; + +CREATE TABLE default.users (id INT, name STRING); +ALTER TABLE default.users SET TAG env = 'prod', owner = 'team_a'; + +SELECT * EXCLUDE(object_id) FROM TAG_REFERENCES('default.users', 'TABLE'); + +┌───────────────────────────────────────────────────────────────────────┐ +│ tag_name │ tag_value │ object_database │ object_name │ domain │ +├──────────┼───────────┼─────────────────┼─────────────┼──────────────┤ +│ env │ prod │ default │ users │ TABLE │ +│ owner │ team_a │ default │ users │ TABLE │ +└───────────────────────────────────────────────────────────────────────┘ +``` + +### Query Tags on a Stage + +```sql +CREATE STAGE data_stage; +ALTER STAGE data_stage SET TAG env = 'staging', owner = 'data_team'; + +SELECT * EXCLUDE(object_id) FROM TAG_REFERENCES('data_stage', 'STAGE'); + +┌───────────────────────────────────────────────────────────────────────┐ +│ tag_name │ tag_value │ object_database │ object_name │ domain │ +├──────────┼───────────┼─────────────────┼─────────────┼──────────────┤ +│ env │ staging │ NULL │ data_stage │ STAGE │ +│ owner │ data_team │ NULL │ data_stage │ STAGE │ +└───────────────────────────────────────────────────────────────────────┘ +``` + +### Query Tags on a Database + +```sql +ALTER DATABASE default SET TAG env = 'prod'; + +SELECT * EXCLUDE(object_id) FROM 
TAG_REFERENCES('default', 'DATABASE');
+
+┌───────────────────────────────────────────────────────────────────────┐
+│ tag_name │ tag_value │ object_database │ object_name │ domain │
+├──────────┼───────────┼─────────────────┼─────────────┼──────────────┤
+│ env │ prod │ default │ default │ DATABASE │
+└───────────────────────────────────────────────────────────────────────┘
+```
diff --git a/tidb-cloud-lake/tutorials/backup-restore-with-bendsave.md b/tidb-cloud-lake/tutorials/backup-restore-with-bendsave.md
index a590148d824da..441da07aa51cb 100644
--- a/tidb-cloud-lake/tutorials/backup-restore-with-bendsave.md
+++ b/tidb-cloud-lake/tutorials/backup-restore-with-bendsave.md
@@ -111,7 +111,7 @@ aws --endpoint-url http://127.0.0.1:9000/ s3 mb s3://databend
 ```bash
 curl -I http://127.0.0.1:28002/v1/health
-
+
 curl -I http://127.0.0.1:8080/v1/health
 ```
@@ -155,7 +155,7 @@ aws --endpoint-url http://127.0.0.1:9000 s3 ls s3://databend/ --recursive
 ```bash
 export AWS_ACCESS_KEY_ID=minioadmin
 export AWS_SECRET_ACCESS_KEY=minioadmin
-
+
 ./databend-bendsave backup \
   --from ../configs/databend-query.toml \
   --to 's3://backupbucket?endpoint=http://127.0.0.1:9000/&region=us-east-1'
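+
+# Optional sanity check: confirm the backup artifacts landed in the target
+# bucket. This reuses the MinIO endpoint and credentials exported above;
+# `backupbucket` is assumed to exist already, matching the --to URL.
+aws --endpoint-url http://127.0.0.1:9000 s3 ls s3://backupbucket/ --recursive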