Commit fd75019

docs: add PARTITION BY for COPY INTO <location> (#3077)
* docs: add PARTITION BY support for COPY INTO <location> (databend#19390)
* docs: fix PARTITION BY example to correctly demonstrate _NULL_ folder
1 parent 3dc705c commit fd75019

1 file changed

Lines changed: 61 additions & 1 deletion

docs/en/sql-reference/10-sql-commands/10-dml/dml-copy-into-location.md

@@ -5,7 +5,7 @@ sidebar_label: "COPY INTO <location>"

 import FunctionDescription from '@site/src/components/FunctionDescription';

-<FunctionDescription description="Introduced or updated: v1.2.647"/>
+<FunctionDescription description="Introduced or updated: v1.2.881"/>

 COPY INTO allows you to unload data from a table or query into one or more files in one of the following locations:

@@ -19,6 +19,7 @@ See also: [`COPY INTO <table>`](dml-copy-into-table.md)
 ```sql
 COPY INTO { internalStage | externalStage | externalLocation }
 FROM { [<database_name>.]<table_name> | ( <query> ) }
+[ PARTITION BY ( <expr> ) ]
 [ FILE_FORMAT = (
     FORMAT_NAME = '<your-custom-format>'
     | TYPE = { CSV | TSV | NDJSON | PARQUET } [ formatTypeOptions ]
@@ -118,6 +119,22 @@ For the connection parameters available for accessing Tencent Cloud Object Stora

 See [Input & Output File Formats](../../00-sql-reference/50-file-format-options.md) for details.

+### PARTITION BY
+
+Specifies an expression used to partition the unloaded data into separate folders. The expression must evaluate to a `STRING` type. Each distinct value produced by the expression creates a subfolder in the destination path, and the corresponding rows are written into files under that subfolder.
+
+- If the expression evaluates to `NULL`, the rows are placed in a special `_NULL_` folder.
+- The expression can reference any columns from the source table or query.
+- Path traversal (`..`) is not allowed in partition values.
+
+The following options are incompatible with `PARTITION BY` and will cause an error if set:
+
+| Option              | Restriction                                      |
+| ------------------- | ------------------------------------------------ |
+| SINGLE              | Cannot be `TRUE` when using `PARTITION BY`.      |
+| OVERWRITE           | Cannot be `TRUE` when using `PARTITION BY`.      |
+| INCLUDE_QUERY_ID    | Cannot be `FALSE` when using `PARTITION BY`.     |
+
 ### copyOptions

 ```sql
@@ -289,3 +306,46 @@ COPY INTO 's3://databend'
 ```

 ![Alt text](/img/sql/copy-into-bucket.png)
+
+### Example 4: Unloading with PARTITION BY
+
+This example unloads data into partitioned folders based on a derived expression:
+
+```sql
+-- Create a sample table
+CREATE TABLE sales_data (
+    sale_date DATE,
+    region VARCHAR,
+    amount INT
+);
+
+INSERT INTO sales_data VALUES
+    ('2025-01-15', 'east', 100),
+    ('2025-01-20', 'west', 200),
+    ('2025-02-10', 'east', 150),
+    (NULL, 'west', 50);
+
+-- Create an internal stage
+CREATE STAGE partitioned_stage;
+
+-- Unload data partitioned by year-month derived from sale_date
+-- When sale_date is NULL, to_varchar() returns NULL, so the entire
+-- concatenation evaluates to NULL and the row lands in the _NULL_ folder.
+COPY INTO @partitioned_stage
+FROM sales_data
+PARTITION BY ('month=' || to_varchar(sale_date, 'YYYY-MM'))
+FILE_FORMAT = (TYPE = PARQUET);
+
+-- Verify the partitioned folder layout
+SELECT name FROM list_stage(location => '@partitioned_stage') ORDER BY name;
+
+┌──────────────────────────────────────────────────────────────────┐
+│ name │
+├──────────────────────────────────────────────────────────────────┤
+│ _NULL_/data_<query_id>_0000_00000000.parquet
+│ month=2025-01/data_<query_id>_0000_00000000.parquet
+│ month=2025-02/data_<query_id>_0000_00000000.parquet
+└──────────────────────────────────────────────────────────────────┘
+```
+
+When the partition expression evaluates to `NULL`, the data is placed in a `_NULL_` folder. Each unique partition value creates its own subfolder containing the corresponding data files.
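
A follow-up sketch, not part of the commit: the variant below reuses the `sales_data` table and `partitioned_stage` stage from Example 4 and assumes `coalesce` accepts string arguments here. It routes rows with a `NULL` date to a hypothetical `month=unknown` folder instead of `_NULL_`, reads from a query rather than the table directly, and spells out the three copy options in the only combination the restriction table allows.

```sql
-- Sketch only: 'month=unknown' is an illustrative folder name chosen for this
-- example, not a Databend convention.
COPY INTO @partitioned_stage
FROM (SELECT sale_date, region, amount FROM sales_data)
PARTITION BY ('month=' || coalesce(to_varchar(sale_date, 'YYYY-MM'), 'unknown'))
FILE_FORMAT = (TYPE = PARQUET)
SINGLE = FALSE            -- TRUE is documented as an error with PARTITION BY
OVERWRITE = FALSE         -- TRUE is documented as an error with PARTITION BY
INCLUDE_QUERY_ID = TRUE;  -- FALSE is documented as an error with PARTITION BY
```

Because `coalesce` replaces the `NULL` before concatenation, the partition expression should never evaluate to `NULL` in this variant, so no new files are expected under `_NULL_`. The three explicit options appear to match the defaults, in which case omitting them would be equivalent; they are written out only to make the restrictions visible.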
