@@ -10,6 +10,8 @@ Registers the `maxcompute` dialect via Python entry points so that SQLGlot can p
 pip install sqlglot-maxcompute
 ```
 
+Requires Python ≥ 3.9 and SQLGlot ≥ 29.
+
 ## Usage
 
 ```python
@@ -19,72 +21,82 @@ import sqlglot
 ast = sqlglot.parse_one("SELECT DATEADD(dt, 1, 'DAY')", read="maxcompute")
 
 # Transpile from another dialect to MaxCompute
-sqlglot.transpile(
-    "SELECT DATE_ADD(dt, 1)",
-    read="spark",
-    write="maxcompute",
-)
+sqlglot.transpile("SELECT DATE_ADD(dt, 1)", read="spark", write="maxcompute")
 # ["SELECT DATEADD(dt, 1, 'DAY')"]
 
 # Transpile from MaxCompute to another dialect
-sqlglot.transpile(
-    "SELECT DATETRUNC(dt, 'MONTH')",
-    read="maxcompute",
-    write="spark",
-)
+sqlglot.transpile("SELECT DATETRUNC(dt, 'MONTH')", read="maxcompute", write="spark")
 # ["SELECT TRUNC(dt, 'MONTH')"]
 
-# Round-trip: parse and regenerate MaxCompute SQL
+# TO_DATE return type depends on args:
+#   without format → DATE (exp.TsOrDsToDate)
+#   with format    → DATETIME (exp.StrToTime)
+sqlglot.transpile("TO_DATE('20240101', 'yyyymmdd')", read="maxcompute", write="spark")
+# ["TO_TIMESTAMP('20240101', 'yyyymmdd')"]
+
+# Round-trip MaxCompute DDL
 sqlglot.transpile(
-    "CREATE TABLE t (id INT) LIFECYCLE 30",
+    "CREATE TABLE t (id BIGINT) LIFECYCLE 30",
     read="maxcompute",
     write="maxcompute",
 )
-# ["CREATE TABLE t (id INT) LIFECYCLE 30"]
+# ["CREATE TABLE t (id BIGINT) LIFECYCLE 30"]
 ```
 
-## What's implemented
+## What's supported
 
-### Parser (MaxCompute → canonical AST)
+### Parser — MaxCompute → canonical AST
 
 | Category | Functions |
 |---|---|
-| Date arithmetic | `DATEADD`, `DATEDIFF`, `ADD_MONTHS`, `MONTHS_BETWEEN` |
-| Date extraction | `DATEPART`, `DATETRUNC`, `TRUNC_TIME`, `DAYOFMONTH`, `DAYOFWEEK`, `DAYOFYEAR`, `HOUR`, `MINUTE`, `SECOND`, `QUARTER`, `WEEKDAY`, `WEEKOFYEAR` |
-| Date conversion | `DATE_FORMAT`, `TO_CHAR`, `TO_DATE`, `FROM_UNIXTIME`, `GETDATE`, `NOW`, `CURRENT_TIMESTAMP`, `CURRENT_TIMEZONE`, `FROM_UTC_TIMESTAMP` |
+| Date arithmetic | `DATEADD`, `DATE_SUB`, `DATEDIFF`, `ADD_MONTHS`, `MONTHS_BETWEEN` |
+| Date extraction | `DATEPART`, `DATETRUNC`, `TRUNC_TIME`, `DAY`, `MONTH`, `YEAR`, `HOUR`, `MINUTE`, `SECOND`, `QUARTER`, `DAYOFMONTH`, `DAYOFWEEK`, `DAYOFYEAR`, `WEEKDAY`, `WEEKOFYEAR` |
+| Date conversion | `TO_DATE`, `DATE_FORMAT`, `TO_CHAR`, `FROM_UNIXTIME`, `FROM_UTC_TIMESTAMP`, `TO_MILLIS`, `ISDATE` |
+| Current date/time | `GETDATE`, `NOW`, `CURRENT_TIMESTAMP`, `CURRENT_TIMEZONE` |
 | Last/next day | `LAST_DAY`, `LASTDAY`, `NEXT_DAY` |
-| String | `TOLOWER`, `TOUPPER`, `REGEXP_COUNT`, `SPLIT_PART` |
-| Aggregate | `WM_CONCAT`, `COUNT_IF`, `ARG_MAX`, `ARG_MIN`, `ANY_VALUE`, `APPROX_DISTINCT`, `STDDEV_SAMP`, `COVAR_POP`, `COVAR_SAMP`, `CORR`, `MEDIAN`, `PERCENTILE_APPROX`, `BITWISE_AND_AGG`, `BITWISE_OR_AGG`, `BITWISE_XOR_AGG` |
-| Array | `ALL_MATCH`, `ANY_MATCH`, `ARRAY_SORT`, `ARRAY_DISTINCT`, `ARRAY_EXCEPT`, `ARRAY_JOIN`, `ARRAY_MAX`, `ARRAY_MIN`, `ARRAYS_OVERLAP`, `ARRAYS_ZIP`, `ARRAY_INTERSECT`, `ARRAY_POSITION`, `ARRAY_REMOVE`, `ARRAY_CONTAINS` |
+| String | `TOLOWER`, `TOUPPER`, `REGEXP_COUNT`, `SPLIT_PART`, `SUBSTR` |
+| Aggregate | `WM_CONCAT`, `COUNT_IF`, `ARG_MAX`, `ARG_MIN`, `MAX_BY`, `MIN_BY`, `ANY_VALUE`, `APPROX_DISTINCT`, `STDDEV_SAMP`, `COVAR_POP`, `COVAR_SAMP`, `CORR`, `MEDIAN`, `PERCENTILE_APPROX`, `BITWISE_AND_AGG`, `BITWISE_OR_AGG`, `BITWISE_XOR_AGG` |
+| Array | `ALL_MATCH`, `ANY_MATCH`, `ARRAY_SORT`, `ARRAY_DISTINCT`, `ARRAY_EXCEPT`, `ARRAY_JOIN`, `ARRAY_MAX`, `ARRAY_MIN`, `ARRAYS_OVERLAP`, `ARRAYS_ZIP`, `ARRAY_INTERSECT`, `ARRAY_POSITION`, `ARRAY_REMOVE`, `ARRAY_CONTAINS`, `SLICE` |
 | Map | `MAP_CONCAT`, `MAP_FROM_ENTRIES` |
-| JSON / misc | `FROM_JSON`, `GET_USER_ID`, `REGEXP_SUBSTR`, `SLICE`, `TO_MILLIS`, `ISDATE` |
-
-### Generator (canonical AST → MaxCompute SQL)
-
-- Date/time: `DATEADD`, `DATEDIFF`, `DATETRUNC`, `DATEPART`, `GETDATE()`, `NOW()`
-- String: `TOLOWER`, `TOUPPER`
-- Aggregate: `WM_CONCAT`, `ARG_MAX`, `ARG_MIN`, `APPROX_DISTINCT`
-- JSON/misc: `FROM_JSON`, `GET_USER_ID()`, `TO_MILLIS`, `TO_CHAR`
-- Type mapping: `VARCHAR`/`CHAR`/`TEXT` → `STRING`, `DATETIME` preserved
+| JSON / misc | `FROM_JSON`, `GET_JSON_OBJECT`, `JSON_TUPLE`, `GET_USER_ID`, `REGEXP_SUBSTR`, `TO_MILLIS`, `ISDATE` |
+
+Functions not listed are handled via Hive inheritance and work without explicit mapping (e.g. `SPLIT`, `REGEXP_EXTRACT`, `COLLECT_LIST`, `PERCENTILE`, all math/trig functions, window functions).
+
+### Generator — canonical AST → MaxCompute SQL
+
+Explicit transforms on top of Hive:
+
+| Expression | MaxCompute output | Note |
+|---|---|---|
+| `DATEADD` / `DATE_SUB` | `DATEADD(dt, ±n, 'UNIT')` | Correct negation for `DATE_SUB` |
+| `DATEDIFF` | `DATEDIFF(dt1, dt2[, unit])` | |
+| `DATETRUNC` | `DATETRUNC(dt, 'unit')` | Week units: `'week(monday)'` etc. |
+| `DATEPART` | `DATEPART(dt, 'UNIT')` | |
+| `TO_DATE(str, fmt)` | `TO_DATE(str, fmt)` | Maps to `exp.StrToTime` (DATETIME) |
+| `TO_DATE(str)` | `TO_DATE(str)` | Maps to `exp.TsOrDsToDate` (DATE) |
+| `CurrentTimestamp` | `GETDATE()` | Covers `GETDATE`, `NOW`, `CURRENT_TIMESTAMP` |
+| `CurrentDatetime` | `NOW()` | For BigQuery-origin `CURRENT_DATETIME` |
+| `SPACE(n)` | `SPACE(n)` | Hive emits `REPEAT(' ', n)` |
+| `VAR_POP(x)` | `VAR_POP(x)` | Hive emits `VARIANCE_POP` |
+| `VAR_SAMP(x)` | `VAR_SAMP(x)` | Hive emits `VARIANCE` |
+| `INSTR(str, sub)` | `INSTR(str, sub)` | Hive emits `LOCATE(sub, str)` |
+| `SUBSTR(str, pos, len)` | `SUBSTR(...)` | Hive emits `SUBSTRING` |
+| Type: `VARCHAR`/`CHAR`/`TEXT` | `STRING` | |
+| Type: `DATETIME` | `DATETIME` | |
 
 ### DDL
 
 - `LIFECYCLE n` — table retention in days
 - `RANGE CLUSTERED BY (cols) [SORTED BY (cols)] INTO n BUCKETS`
 - `AUTO PARTITIONED BY (TRUNC_TIME(col, 'unit') [AS alias])`
-- `TBLPROPERTIES ('key'='value')` coexists correctly with `LIFECYCLE`
+- `TBLPROPERTIES ('key'='value')` — coexists correctly with `LIFECYCLE`
+- `COMMENT` on columns and tables
 
 ## Development
 
 ```bash
-# Install dependencies
-uv sync
-
-# Run tests
-uv run pytest
-
-# Run a single test
-uv run pytest tests/test_maxcompute.py::TestMaxCompute::test_dateadd_roundtrip
+uv sync          # install dependencies
+uv run pytest    # run all tests
 ```
 
 ## License
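
The first hunk's context says the package registers the `maxcompute` dialect via Python entry points, and the added README line requires Python ≥ 3.9 and SQLGlot ≥ 29. A minimal sketch for checking both in a local environment, using only the standard library. The entry-point group name `sqlglot.dialects` is an assumption (the diff does not state it), and all helper names are illustrative:

```python
import re
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major(version):
    """Leading integer of a version string such as '29.0.1' (0 if there is none)."""
    match = re.match(r"\d+", version)
    return int(match.group()) if match else 0


def meets_requirements():
    """True when the interpreter and SQLGlot satisfy the README's stated minimums."""
    version = installed_version("sqlglot")
    return sys.version_info >= (3, 9) and version is not None and major(version) >= 29


def dialect_entry_points(group="sqlglot.dialects"):
    """Names registered under the (assumed) SQLGlot dialect entry-point group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        return sorted(ep.name for ep in eps.select(group=group))
    return sorted(ep.name for ep in eps.get(group, []))  # Python 3.9: plain dict


if __name__ == "__main__":
    print("requirements met:", meets_requirements())
    print("registered dialects:", dialect_entry_points())
```

If the group name assumption holds, `maxcompute` should appear in the printed list once `sqlglot-maxcompute` is installed.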
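
Two rules from the new generator table can be illustrated in isolation: `DATE_SUB` is emitted as `DATEADD` with a negated amount, and `TO_DATE` maps to a different AST node depending on whether a format argument is present. The sketch below models both with plain strings. It does not call SQLGlot or the dialect itself, the function names are hypothetical, and it assumes integer amounts and a default unit of `'DAY'`, as in the README's `DATE_ADD` example:

```python
def date_sub_as_dateadd(column, amount, unit="DAY"):
    """Render DATE_SUB(column, amount) as an equivalent DATEADD call.

    Subtracting n is adding -n, so the sign of the amount is flipped.
    """
    return f"DATEADD({column}, {-amount}, '{unit.upper()}')"


def to_date_node(args):
    """Name of the AST node the diff associates with a TO_DATE call.

    One argument  -> TsOrDsToDate (DATE result)
    Two arguments -> StrToTime    (DATETIME result)
    """
    if len(args) == 1:
        return "TsOrDsToDate"
    if len(args) == 2:
        return "StrToTime"
    raise ValueError("TO_DATE takes one or two arguments in this sketch")


if __name__ == "__main__":
    print(date_sub_as_dateadd("dt", 1))
    print(to_date_node(["'20240101'", "'yyyymmdd'"]))
```

The real dialect works on SQLGlot expression trees rather than strings, so this only mirrors the mapping described in the table, not its implementation.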