| 1 | +# sqlglot-maxcompute |
| 2 | + |
| 3 | +A [SQLGlot](https://github.com/tobymao/sqlglot) dialect plugin for [Alibaba Cloud MaxCompute](https://www.alibabacloud.com/product/maxcompute) (formerly ODPS). |
| 4 | + |
| 5 | +The package registers the `maxcompute` dialect through a Python entry point, so SQLGlot can parse and generate MaxCompute SQL once the package is installed. |
| 6 | + |
| 7 | +## Installation |
| 8 | + |
| 9 | +```bash |
| 10 | +pip install sqlglot-maxcompute |
| 11 | +``` |
| 12 | + |
| 13 | +## Usage |
| 14 | + |
| 15 | +```python |
| 16 | +import sqlglot |
| 17 | + |
| 18 | +# Parse MaxCompute SQL |
| 19 | +ast = sqlglot.parse_one("SELECT DATEADD(dt, 1, 'DAY')", read="maxcompute") |
| 20 | + |
| 21 | +# Transpile from another dialect to MaxCompute |
| 22 | +sqlglot.transpile( |
| 23 | + "SELECT DATE_ADD(dt, 1)", |
| 24 | + read="spark", |
| 25 | + write="maxcompute", |
| 26 | +) |
| 27 | +# ["SELECT DATEADD(dt, 1, 'DAY')"] |
| 28 | + |
| 29 | +# Transpile from MaxCompute to another dialect |
| 30 | +sqlglot.transpile( |
| 31 | + "SELECT DATETRUNC(dt, 'MONTH')", |
| 32 | + read="maxcompute", |
| 33 | + write="spark", |
| 34 | +) |
| 35 | +# ["SELECT TRUNC(dt, 'MONTH')"] |
| 36 | + |
| 37 | +# Round-trip: parse and regenerate MaxCompute SQL |
| 38 | +sqlglot.transpile( |
| 39 | + "CREATE TABLE t (id INT) LIFECYCLE 30", |
| 40 | + read="maxcompute", |
| 41 | + write="maxcompute", |
| 42 | +) |
| 43 | +# ["CREATE TABLE t (id INT) LIFECYCLE 30"] |
| 44 | +``` |
| 45 | + |
| 46 | +## What's implemented |
| 47 | + |
| 48 | +### Parser (MaxCompute → canonical AST) |
| 49 | + |
| 50 | +| Category | Functions | |
| 51 | +|---|---| |
| 52 | +| Date arithmetic | `DATEADD`, `DATEDIFF`, `ADD_MONTHS`, `MONTHS_BETWEEN` | |
| 53 | +| Date extraction | `DATEPART`, `DATETRUNC`, `TRUNC_TIME`, `DAYOFMONTH`, `DAYOFWEEK`, `DAYOFYEAR`, `HOUR`, `MINUTE`, `SECOND`, `QUARTER`, `WEEKDAY`, `WEEKOFYEAR` | |
| 54 | +| Date conversion | `DATE_FORMAT`, `TO_CHAR`, `TO_DATE`, `FROM_UNIXTIME`, `GETDATE`, `NOW`, `CURRENT_TIMESTAMP`, `CURRENT_TIMEZONE`, `FROM_UTC_TIMESTAMP` | |
| 55 | +| Last/next day | `LAST_DAY`, `LASTDAY`, `NEXT_DAY` | |
| 56 | +| String | `TOLOWER`, `TOUPPER`, `REGEXP_COUNT`, `SPLIT_PART` | |
| 57 | +| Aggregate | `WM_CONCAT`, `COUNT_IF`, `ARG_MAX`, `ARG_MIN`, `ANY_VALUE`, `APPROX_DISTINCT`, `STDDEV_SAMP`, `COVAR_POP`, `COVAR_SAMP`, `CORR`, `MEDIAN`, `PERCENTILE_APPROX`, `BITWISE_AND_AGG`, `BITWISE_OR_AGG`, `BITWISE_XOR_AGG` | |
| 58 | +| Array | `ALL_MATCH`, `ANY_MATCH`, `ARRAY_SORT`, `ARRAY_DISTINCT`, `ARRAY_EXCEPT`, `ARRAY_JOIN`, `ARRAY_MAX`, `ARRAY_MIN`, `ARRAYS_OVERLAP`, `ARRAYS_ZIP`, `ARRAY_INTERSECT`, `ARRAY_POSITION`, `ARRAY_REMOVE`, `ARRAY_CONTAINS` | |
| 59 | +| Map | `MAP_CONCAT`, `MAP_FROM_ENTRIES` | |
| 60 | +| JSON / misc | `FROM_JSON`, `GET_USER_ID`, `REGEXP_SUBSTR`, `SLICE`, `TO_MILLIS`, `ISDATE` | |
| 61 | + |
| 62 | +### Generator (canonical AST → MaxCompute SQL) |
| 63 | + |
| 64 | +- Date/time: `DATEADD`, `DATEDIFF`, `DATETRUNC`, `DATEPART`, `GETDATE()`, `NOW()` |
| 65 | +- String: `TOLOWER`, `TOUPPER` |
| 66 | +- Aggregate: `WM_CONCAT`, `ARG_MAX`, `ARG_MIN`, `APPROX_DISTINCT` |
| 67 | +- JSON/misc: `FROM_JSON`, `GET_USER_ID()`, `TO_MILLIS`, `TO_CHAR` |
| 68 | +- Type mapping: `VARCHAR`/`CHAR`/`TEXT` → `STRING`, `DATETIME` preserved |
| 69 | + |
| 70 | +### DDL |
| 71 | + |
| 72 | +- `LIFECYCLE n` — table retention in days |
| 73 | +- `RANGE CLUSTERED BY (cols) [SORTED BY (cols)] INTO n BUCKETS` |
| 74 | +- `AUTO PARTITIONED BY (TRUNC_TIME(col, 'unit') [AS alias])` |
| 75 | +- `TBLPROPERTIES ('key'='value')` can appear alongside `LIFECYCLE` in the same `CREATE TABLE` statement |
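
One way to check the DDL clauses listed above against your own statements is a small round-trip helper. This is only a sketch: `roundtrips_unchanged` is a hypothetical name, not something the package provides, and it relies solely on `sqlglot.transpile` as used in the Usage section. The import is done lazily so the helper can be defined even where SQLGlot is not installed.

```python
def roundtrips_unchanged(sql: str, dialect: str = "maxcompute") -> bool:
    """Return True if parsing `sql` and regenerating it in `dialect`
    produces exactly the same single statement."""
    # Lazy import: requires sqlglot, plus sqlglot-maxcompute when
    # dialect="maxcompute".
    import sqlglot

    return sqlglot.transpile(sql, read=dialect, write=dialect) == [sql]
```

For example, `roundtrips_unchanged("CREATE TABLE t (id INT) LIFECYCLE 30")` exercises the same statement as the round-trip example in the Usage section. A `False` result does not necessarily indicate a bug, since the generator may normalize spacing, keyword case, or quoting.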
| 76 | + |
| 77 | +## Development |
| 78 | + |
| 79 | +```bash |
| 80 | +# Install dependencies |
| 81 | +uv sync |
| 82 | + |
| 83 | +# Run tests |
| 84 | +uv run pytest |
| 85 | + |
| 86 | +# Run a single test |
| 87 | +uv run pytest tests/test_maxcompute.py::TestMaxCompute::test_dateadd_roundtrip |
| 88 | +``` |
| 89 | + |
| 90 | +## License |
| 91 | + |
| 92 | +MIT |