
Commit 96587af

docs: update README to reflect 0.3.1 — full function tables, type modeling notes, generator summary
1 parent 6629abc commit 96587af

1 file changed

Lines changed: 51 additions & 39 deletions


README.md

@@ -10,6 +10,8 @@ Registers the `maxcompute` dialect via Python entry points so that SQLGlot can p
 pip install sqlglot-maxcompute
 ```
 
+Requires Python ≥ 3.9 and SQLGlot ≥ 29.
+
 ## Usage
 
 ```python
@@ -19,72 +21,82 @@ import sqlglot
 ast = sqlglot.parse_one("SELECT DATEADD(dt, 1, 'DAY')", read="maxcompute")
 
 # Transpile from another dialect to MaxCompute
-sqlglot.transpile(
-    "SELECT DATE_ADD(dt, 1)",
-    read="spark",
-    write="maxcompute",
-)
+sqlglot.transpile("SELECT DATE_ADD(dt, 1)", read="spark", write="maxcompute")
 # ["SELECT DATEADD(dt, 1, 'DAY')"]
 
 # Transpile from MaxCompute to another dialect
-sqlglot.transpile(
-    "SELECT DATETRUNC(dt, 'MONTH')",
-    read="maxcompute",
-    write="spark",
-)
+sqlglot.transpile("SELECT DATETRUNC(dt, 'MONTH')", read="maxcompute", write="spark")
 # ["SELECT TRUNC(dt, 'MONTH')"]
 
-# Round-trip: parse and regenerate MaxCompute SQL
+# TO_DATE return type depends on args:
+# without format → DATE (exp.TsOrDsToDate)
+# with format → DATETIME (exp.StrToTime)
+sqlglot.transpile("TO_DATE('20240101', 'yyyymmdd')", read="maxcompute", write="spark")
+# ["TO_TIMESTAMP('20240101', 'yyyymmdd')"]
+
+# Round-trip MaxCompute DDL
 sqlglot.transpile(
-    "CREATE TABLE t (id INT) LIFECYCLE 30",
+    "CREATE TABLE t (id BIGINT) LIFECYCLE 30",
     read="maxcompute",
     write="maxcompute",
 )
-# ["CREATE TABLE t (id INT) LIFECYCLE 30"]
+# ["CREATE TABLE t (id BIGINT) LIFECYCLE 30"]
 ```
 
-## What's implemented
+## What's supported
 
-### Parser (MaxCompute → canonical AST)
+### Parser MaxCompute → canonical AST
 
 | Category | Functions |
 |---|---|
-| Date arithmetic | `DATEADD`, `DATEDIFF`, `ADD_MONTHS`, `MONTHS_BETWEEN` |
-| Date extraction | `DATEPART`, `DATETRUNC`, `TRUNC_TIME`, `DAYOFMONTH`, `DAYOFWEEK`, `DAYOFYEAR`, `HOUR`, `MINUTE`, `SECOND`, `QUARTER`, `WEEKDAY`, `WEEKOFYEAR` |
-| Date conversion | `DATE_FORMAT`, `TO_CHAR`, `TO_DATE`, `FROM_UNIXTIME`, `GETDATE`, `NOW`, `CURRENT_TIMESTAMP`, `CURRENT_TIMEZONE`, `FROM_UTC_TIMESTAMP` |
+| Date arithmetic | `DATEADD`, `DATE_SUB`, `DATEDIFF`, `ADD_MONTHS`, `MONTHS_BETWEEN` |
+| Date extraction | `DATEPART`, `DATETRUNC`, `TRUNC_TIME`, `DAY`, `MONTH`, `YEAR`, `HOUR`, `MINUTE`, `SECOND`, `QUARTER`, `DAYOFMONTH`, `DAYOFWEEK`, `DAYOFYEAR`, `WEEKDAY`, `WEEKOFYEAR` |
+| Date conversion | `TO_DATE`, `DATE_FORMAT`, `TO_CHAR`, `FROM_UNIXTIME`, `FROM_UTC_TIMESTAMP`, `TO_MILLIS`, `ISDATE` |
+| Current date/time | `GETDATE`, `NOW`, `CURRENT_TIMESTAMP`, `CURRENT_TIMEZONE` |
 | Last/next day | `LAST_DAY`, `LASTDAY`, `NEXT_DAY` |
-| String | `TOLOWER`, `TOUPPER`, `REGEXP_COUNT`, `SPLIT_PART` |
-| Aggregate | `WM_CONCAT`, `COUNT_IF`, `ARG_MAX`, `ARG_MIN`, `ANY_VALUE`, `APPROX_DISTINCT`, `STDDEV_SAMP`, `COVAR_POP`, `COVAR_SAMP`, `CORR`, `MEDIAN`, `PERCENTILE_APPROX`, `BITWISE_AND_AGG`, `BITWISE_OR_AGG`, `BITWISE_XOR_AGG` |
-| Array | `ALL_MATCH`, `ANY_MATCH`, `ARRAY_SORT`, `ARRAY_DISTINCT`, `ARRAY_EXCEPT`, `ARRAY_JOIN`, `ARRAY_MAX`, `ARRAY_MIN`, `ARRAYS_OVERLAP`, `ARRAYS_ZIP`, `ARRAY_INTERSECT`, `ARRAY_POSITION`, `ARRAY_REMOVE`, `ARRAY_CONTAINS` |
+| String | `TOLOWER`, `TOUPPER`, `REGEXP_COUNT`, `SPLIT_PART`, `SUBSTR` |
+| Aggregate | `WM_CONCAT`, `COUNT_IF`, `ARG_MAX`, `ARG_MIN`, `MAX_BY`, `MIN_BY`, `ANY_VALUE`, `APPROX_DISTINCT`, `STDDEV_SAMP`, `COVAR_POP`, `COVAR_SAMP`, `CORR`, `MEDIAN`, `PERCENTILE_APPROX`, `BITWISE_AND_AGG`, `BITWISE_OR_AGG`, `BITWISE_XOR_AGG` |
+| Array | `ALL_MATCH`, `ANY_MATCH`, `ARRAY_SORT`, `ARRAY_DISTINCT`, `ARRAY_EXCEPT`, `ARRAY_JOIN`, `ARRAY_MAX`, `ARRAY_MIN`, `ARRAYS_OVERLAP`, `ARRAYS_ZIP`, `ARRAY_INTERSECT`, `ARRAY_POSITION`, `ARRAY_REMOVE`, `ARRAY_CONTAINS`, `SLICE` |
 | Map | `MAP_CONCAT`, `MAP_FROM_ENTRIES` |
-| JSON / misc | `FROM_JSON`, `GET_USER_ID`, `REGEXP_SUBSTR`, `SLICE`, `TO_MILLIS`, `ISDATE` |
-
-### Generator (canonical AST → MaxCompute SQL)
-
-- Date/time: `DATEADD`, `DATEDIFF`, `DATETRUNC`, `DATEPART`, `GETDATE()`, `NOW()`
-- String: `TOLOWER`, `TOUPPER`
-- Aggregate: `WM_CONCAT`, `ARG_MAX`, `ARG_MIN`, `APPROX_DISTINCT`
-- JSON/misc: `FROM_JSON`, `GET_USER_ID()`, `TO_MILLIS`, `TO_CHAR`
-- Type mapping: `VARCHAR`/`CHAR`/`TEXT``STRING`, `DATETIME` preserved
+| JSON / misc | `FROM_JSON`, `GET_JSON_OBJECT`, `JSON_TUPLE`, `GET_USER_ID`, `REGEXP_SUBSTR`, `TO_MILLIS`, `ISDATE` |
+
+Functions not listed are handled via Hive inheritance and work without explicit mapping (e.g. `SPLIT`, `REGEXP_EXTRACT`, `COLLECT_LIST`, `PERCENTILE`, all math/trig functions, window functions).
+
+### Generator — canonical AST → MaxCompute SQL
+
+Explicit transforms on top of Hive:
+
+| Expression | MaxCompute output | Note |
+|---|---|---|
+| `DATEADD` / `DATE_SUB` | `DATEADD(dt, ±n, 'UNIT')` | Correct negation for `DATE_SUB` |
+| `DATEDIFF` | `DATEDIFF(dt1, dt2[, unit])` | |
+| `DATETRUNC` | `DATETRUNC(dt, 'unit')` | Week units: `'week(monday)'` etc. |
+| `DATEPART` | `DATEPART(dt, 'UNIT')` | |
+| `TO_DATE(str, fmt)` | `TO_DATE(str, fmt)` | Maps to `exp.StrToTime` (DATETIME) |
+| `TO_DATE(str)` | `TO_DATE(str)` | Maps to `exp.TsOrDsToDate` (DATE) |
+| `CurrentTimestamp` | `GETDATE()` | Covers `GETDATE`, `NOW`, `CURRENT_TIMESTAMP` |
+| `CurrentDatetime` | `NOW()` | For BigQuery-origin `CURRENT_DATETIME` |
+| `SPACE(n)` | `SPACE(n)` | Hive emits `REPEAT(' ', n)` |
+| `VAR_POP(x)` | `VAR_POP(x)` | Hive emits `VARIANCE_POP` |
+| `VAR_SAMP(x)` | `VAR_SAMP(x)` | Hive emits `VARIANCE` |
+| `INSTR(str, sub)` | `INSTR(str, sub)` | Hive emits `LOCATE(sub, str)` |
+| `SUBSTR(str, pos, len)` | `SUBSTR(...)` | Hive emits `SUBSTRING` |
+| Type: `VARCHAR`/`CHAR`/`TEXT` | `STRING` | |
+| Type: `DATETIME` | `DATETIME` | |
 
 ### DDL
 
 - `LIFECYCLE n` — table retention in days
 - `RANGE CLUSTERED BY (cols) [SORTED BY (cols)] INTO n BUCKETS`
 - `AUTO PARTITIONED BY (TRUNC_TIME(col, 'unit') [AS alias])`
-- `TBLPROPERTIES ('key'='value')` coexists correctly with `LIFECYCLE`
+- `TBLPROPERTIES ('key'='value')` — coexists correctly with `LIFECYCLE`
+- `COMMENT` on columns and tables
 
 ## Development
 
 ```bash
-# Install dependencies
-uv sync
-
-# Run tests
-uv run pytest
-
-# Run a single test
-uv run pytest tests/test_maxcompute.py::TestMaxCompute::test_dateadd_roundtrip
+uv sync # install dependencies
+uv run pytest # run all tests
 ```
 
 ## License

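The TO_DATE note added to the Usage section (one argument yields DATE, a format argument yields DATETIME) can be illustrated without SQLGlot. The following is a minimal, stdlib-only sketch that models only that return-type rule. The helper names `to_date` and `mc_format_to_strptime` are hypothetical and not part of sqlglot-maxcompute; the token table (`yyyy`, `mm`, `dd`, `hh`, `mi`, `ss`) and the ISO `yyyy-mm-dd` input assumed for the one-argument form are assumptions made for the example.

```python
import re
from datetime import date, datetime
from typing import Optional, Union

# Hypothetical sketch: a toy model of MaxCompute TO_DATE return types,
# not the sqlglot-maxcompute implementation.
# Assumed mapping from MaxCompute-style format tokens to strptime directives
# ("mm" is month, "mi" is minute).
_MC_TOKENS = {
    "yyyy": "%Y",
    "mm": "%m",
    "dd": "%d",
    "hh": "%H",
    "mi": "%M",
    "ss": "%S",
}
_TOKEN_RE = re.compile("|".join(_MC_TOKENS))


def mc_format_to_strptime(fmt: str) -> str:
    """Translate a format such as 'yyyymmdd' into strptime syntax ('%Y%m%d')."""
    return _TOKEN_RE.sub(lambda m: _MC_TOKENS[m.group(0)], fmt)


def to_date(value: str, fmt: Optional[str] = None) -> Union[date, datetime]:
    """One argument -> date (DATE); two arguments -> datetime (DATETIME)."""
    if fmt is None:
        # Assumes the one-argument form receives an ISO 'yyyy-mm-dd' string.
        return date.fromisoformat(value)
    return datetime.strptime(value, mc_format_to_strptime(fmt))


print(repr(to_date("2024-01-01")))            # datetime.date(2024, 1, 1)
print(repr(to_date("20240101", "yyyymmdd")))  # datetime.datetime(2024, 1, 1, 0, 0)
```

Under this toy model the two-argument form carries a time component, which is consistent with the README's Spark output using `TO_TIMESTAMP` for that case.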
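The generator table's note on `DATE_SUB` (rendered as `DATEADD(dt, ±n, 'UNIT')` with correct negation) points at a precedence pitfall: naively prefixing `-` to a compound amount such as `x + 1` produces `-x + 1`, which subtracts the wrong interval. The sketch below shows the idea on plain strings only. It is a hypothetical illustration, not the plugin's code, which per the README generates from the canonical AST rather than from text; `negate_sql` and `date_sub_to_dateadd` are made-up names, and treating only integer literals and bare identifiers as "simple" is a deliberately crude assumption.

```python
# Hypothetical helpers for illustration; not the sqlglot-maxcompute implementation.


def negate_sql(amount: str) -> str:
    """Return SQL text for the negation of `amount`, keeping operator precedence intact."""
    amount = amount.strip()
    # A negative integer literal such as "-5" simply loses its sign.
    if amount.startswith("-") and amount[1:].isdigit():
        return amount[1:]
    # Plain integer literals and bare identifiers can be negated directly.
    if amount.isdigit() or amount.isidentifier():
        return f"-{amount}"
    # Anything else (e.g. "x + 1") is parenthesized so the minus applies to the whole expression.
    return f"-({amount})"


def date_sub_to_dateadd(dt: str, amount: str, unit: str = "DAY") -> str:
    """Render DATE_SUB(dt, amount) as a MaxCompute DATEADD call with a negated amount."""
    return f"DATEADD({dt}, {negate_sql(amount)}, '{unit}')"


print(date_sub_to_dateadd("dt", "1"))      # DATEADD(dt, -1, 'DAY')
print(date_sub_to_dateadd("dt", "x + 1"))  # DATEADD(dt, -(x + 1), 'DAY')
```

Working on AST nodes avoids the string heuristics entirely, since a negation node wraps its operand and the generator decides where parentheses are needed.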