Commit fac8f26

azurechen97 and claude committed
refactor: split parser/generator from Hive nested classes, rename to dialect.py, bump to v0.4.0
- Rename maxcompute.py → dialect.py; update entry point, __init__, and docs
- MaxComputeParser now inherits HiveParser (sqlglot.parsers.hive)
- MaxComputeGenerator now inherits HiveGenerator (sqlglot.generators.hive)
- Raise sqlglot dependency floor to >=30.1.0 (first release with split modules)
- Bump version to 0.4.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 7275b6d commit fac8f26

9 files changed

Lines changed: 81 additions & 38 deletions

CHANGELOG.md

Lines changed: 9 additions & 2 deletions
@@ -1,6 +1,13 @@
 # Changelog
 
-## [0.3.2] - 2026-04-01
+## [0.4.0] - 2026-04-01
+
+### Changed (internal refactor)
+
+- `maxcompute.py` renamed to `dialect.py` — the coordinator class `MaxCompute` is now in `src/sqlglot_maxcompute/dialect.py`
+- `MaxComputeParser` now inherits from `HiveParser` (imported from `sqlglot.parsers.hive`) instead of `Hive.Parser`
+- `MaxComputeGenerator` now inherits from `HiveGenerator` (imported from `sqlglot.generators.hive`) instead of `Hive.Generator`
+- `sqlglot` dependency floor raised to `>=30.1.0` (first release with split `parsers/` and `generators/` modules)
 
 ### Fixed (parser + generator correctness)
 
@@ -32,7 +39,7 @@
 
 ### Changed (internal)
 
-- Dialect split: `maxcompute.py` now delegates to `parser.py` and `generator.py` (mirrors sqlglot's own mypyc-compile refactor)
+- Dialect split: `dialect.py` now delegates to `parser.py` and `generator.py` (mirrors sqlglot's own mypyc-compile refactor)
 
 ### Tests

CLAUDE.md

Lines changed: 6 additions & 6 deletions
@@ -28,23 +28,23 @@ uv run pytest tests/test_foo.py::test_bar
 
 The dialect is split across three files in `src/sqlglot_maxcompute/`:
 
-- **`parser.py`** `MaxComputeParser(Hive.Parser)`: `FUNCTIONS` dict mapping MaxCompute function names to canonical `sqlglot.exp` nodes; `PROPERTY_PARSERS` for `LIFECYCLE`, `RANGE`, and `AUTO`; helper builders `_build_dateadd`, `_build_datetrunc`.
-- **`generator.py`** `MaxComputeGenerator(Hive.Generator)`: `TYPE_MAPPING`, `TRANSFORMS`, and named `_sql` methods that map canonical AST nodes back to MaxCompute SQL.
-- **`maxcompute.py`** `MaxCompute(Hive)`: slim coordinator that sets `TIME_MAPPING`/`DATE_FORMAT`/`TIME_FORMAT`, adds `Tokenizer` keywords (`EXPORT`, `LIFECYCLE`, `OPTION`), and wires `Parser = MaxComputeParser` / `Generator = MaxComputeGenerator`.
+- **`parser.py`** `MaxComputeParser(HiveParser)`: `FUNCTIONS` dict mapping MaxCompute function names to canonical `sqlglot.exp` nodes; `PROPERTY_PARSERS` for `LIFECYCLE`, `RANGE`, and `AUTO`; helper builders `_build_dateadd`, `_build_datetrunc`.
+- **`generator.py`** `MaxComputeGenerator(HiveGenerator)`: `TYPE_MAPPING`, `TRANSFORMS`, and named `_sql` methods that map canonical AST nodes back to MaxCompute SQL.
+- **`dialect.py`** `MaxCompute(Hive)`: slim coordinator that sets `TIME_MAPPING`/`DATE_FORMAT`/`TIME_FORMAT`, adds `Tokenizer` keywords (`EXPORT`, `LIFECYCLE`, `OPTION`), and wires `Parser = MaxComputeParser` / `Generator = MaxComputeGenerator`.
 
 The dialect is registered as a plugin in `pyproject.toml` under `[project.entry-points."sqlglot.dialects"]`, so after installation it is automatically discoverable by sqlglot as `"maxcompute"`.
 
-This split mirrors sqlglot's own mypyc-compile refactor (parsers/generators split by file) and is required for compatibility with sqlglot ≥ 31 compiled wheels.
+This split mirrors sqlglot's own mypyc-compile refactor (parsers/generators split into `sqlglot.parsers.*` / `sqlglot.generators.*` modules) and requires sqlglot ≥ 30.1.0.
 
 `local/` contains development scratch files and references — **not part of the package**:
 - `scratch.py` — keyword comparison scratch script
-- `sqlglot/` — full clone of the sqlglot repo for reference (expressions, dialects, generator internals); `sqlglot/posts/` contains official guides (`onboarding.md` for architecture deep-dive, `ast_primer.md` for AST tutorial). Note: local clone is newer than installed (30.0.1) — dialect parsers moved to `parsers/`, expressions split into `expressions/` package
+- `sqlglot/` — full clone of the sqlglot repo for reference (expressions, dialects, generator internals); `sqlglot/posts/` contains official guides (`onboarding.md` for architecture deep-dive, `ast_primer.md` for AST tutorial). Parsers live in `parsers/`, generators in `generators/`, expressions in `expressions/` package
 - `ydb-sqlglot-plugin/` — YDB dialect plugin, used as reference for how a well-behaved plugin is structured
 - `maxcompute_doc/` — MaxCompute official function documentation (e.g., `date_func.md`, `func_comparison.md`)
 
 ## Implementation Status
 
-The dialect is complete at v0.3.2:
+The dialect is complete at v0.4.0:
 - **Parser**: ~65 functions explicitly mapped (date/time, string, aggregate, array, map); remainder inherited from Hive.
 - **Generator**: `TRANSFORMS` + named `_sql` methods for all major expression types; Hive handles the rest.
 - **Tests**: 40 test methods, 186 subtests covering parse, round-trip, and cross-dialect transpilation.
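
The CLAUDE.md text in this hunk says the dialect is discoverable through the `sqlglot.dialects` entry-point group. As a rough illustration of that mechanism, the sketch below lists whatever is registered under a group using only the standard library. The helper name `list_dialect_plugins` is hypothetical, and how sqlglot itself consumes the group is not shown in this commit.

```python
from importlib.metadata import entry_points


def list_dialect_plugins(group: str = "sqlglot.dialects") -> dict:
    """Map entry-point names to their 'module:attr' targets for one group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the returned object exposes select().
        selected = eps.select(group=group)
    else:
        # Python 3.9: entry_points() returns a plain dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    # With sqlglot-maxcompute 0.4.0 installed, the mapping is expected to
    # contain "maxcompute" pointing at "sqlglot_maxcompute.dialect:MaxCompute".
    print(list_dialect_plugins())
```

If the package is not installed, the function simply returns a mapping without a `maxcompute` key, which makes it a cheap check that the renamed entry point was picked up after reinstalling.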

README.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Registers the `maxcompute` dialect via Python entry points so that SQLGlot can p
 pip install sqlglot-maxcompute
 ```
 
-Requires Python ≥ 3.9 and SQLGlot ≥ 29.
+Requires Python ≥ 3.9 and SQLGlot ≥ 30.1.
 
 ## Usage

pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "sqlglot-maxcompute"
-version = "0.3.2"
+version = "0.4.0"
 description = "MaxCompute dialect plugin for SQLGlot"
 readme = "README.md"
 license = { text = "MIT" }
@@ -9,7 +9,7 @@ authors = [
 ]
 requires-python = ">=3.9"
 dependencies = [
-    "sqlglot>=29.0.0,<31",
+    "sqlglot>=30.1.0,<31",
 ]
 classifiers = [
     "Development Status :: 3 - Alpha",
@@ -38,4 +38,4 @@ dev = [
 testpaths = ["tests"]
 
 [project.entry-points."sqlglot.dialects"]
-maxcompute = "sqlglot_maxcompute.maxcompute:MaxCompute"
+maxcompute = "sqlglot_maxcompute.dialect:MaxCompute"
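
To make the effect of the new pin `sqlglot>=30.1.0,<31` concrete, here is a small sketch that checks plain `X.Y.Z` version strings against it. The helpers `parse_version` and `satisfies_pin` are hypothetical names written for this illustration; they ignore pre-release and post-release suffixes, which a real resolver (or the `packaging` library) would handle.

```python
def parse_version(text: str) -> tuple:
    """Turn '30.1' or '30.1.0' into a 3-tuple of ints, padding with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies_pin(version: str) -> bool:
    """Check a version against the range >=30.1.0,<31 from pyproject.toml."""
    return (30, 1, 0) <= parse_version(version) < (31, 0, 0)


if __name__ == "__main__":
    for candidate in ("29.0.0", "30.0.1", "30.1.0", "30.5.2", "31.0.0"):
        print(candidate, satisfies_pin(candidate))
```

Under this range, 30.0.1 (the installed version mentioned in the old CLAUDE.md note) no longer qualifies, while any 30.x release from 30.1.0 onward does.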

src/sqlglot_maxcompute/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-from sqlglot_maxcompute.maxcompute import MaxCompute
+from sqlglot_maxcompute.dialect import MaxCompute
 from sqlglot_maxcompute.parser import MaxComputeParser
 from sqlglot_maxcompute.generator import MaxComputeGenerator

src/sqlglot_maxcompute/generator.py

Lines changed: 40 additions & 13 deletions
@@ -3,7 +3,7 @@
 import typing as t
 
 from sqlglot import exp
-from sqlglot.dialects.hive import Hive
+from sqlglot.generators.hive import HiveGenerator
 from sqlglot.dialects.dialect import rename_func, unit_to_str
 from sqlglot.transforms import (
     move_schema_columns_to_partitioned_by,
@@ -13,7 +13,12 @@
 )
 
 
-_AUTO_PARTITION_TYPES = (exp.DateTrunc, exp.TimestampTrunc, exp.DatetimeTrunc, exp.Alias)
+_AUTO_PARTITION_TYPES = (
+    exp.DateTrunc,
+    exp.TimestampTrunc,
+    exp.DatetimeTrunc,
+    exp.Alias,
+)
 
 
 def _move_schema_columns_to_partitioned_by(expression: exp.Expr) -> exp.Expr:
@@ -25,17 +30,17 @@ def _move_schema_columns_to_partitioned_by(expression: exp.Expr) -> exp.Expr:
     return move_schema_columns_to_partitioned_by(expression)
 
 
-class MaxComputeGenerator(Hive.Generator):
+class MaxComputeGenerator(HiveGenerator):
     TYPE_MAPPING = {
-        **Hive.Generator.TYPE_MAPPING,
+        **HiveGenerator.TYPE_MAPPING,
         exp.DType.DATETIME: "DATETIME",
         exp.DType.VARCHAR: "STRING",
         exp.DType.CHAR: "STRING",
         exp.DType.TEXT: "STRING",
     }
 
     TRANSFORMS = {
-        **Hive.Generator.TRANSFORMS,
+        **HiveGenerator.TRANSFORMS,
         exp.Create: preprocess(
             [
                 remove_unique_constraints,
@@ -76,14 +81,24 @@ class MaxComputeGenerator(Hive.Generator):
         # Numeric truncation: TRUNC(n, d)
         exp.Trunc: lambda self, e: self.func("TRUNC", e.this, e.args.get("decimals")),
         # String position: MaxCompute uses INSTR(str, substr), not LOCATE(substr, str)
-        exp.StrPosition: lambda self, e: self.func("INSTR", e.this, e.args.get("substr"), e.args.get("position")),
+        exp.StrPosition: lambda self, e: self.func(
+            "INSTR", e.this, e.args.get("substr"), e.args.get("position")
+        ),
         # TO_DATE(str, fmt) returns DATETIME — modeled as StrToTime; emit TO_DATE in MaxCompute
-        exp.StrToTime: lambda self, e: self.func("TO_DATE", e.this, e.args.get("format")),
+        exp.StrToTime: lambda self, e: self.func(
+            "TO_DATE", e.this, e.args.get("format")
+        ),
     }
 
     def _dateadd_sql(
         self,
-        expression: exp.TsOrDsAdd | exp.DateAdd | exp.DateSub | exp.TimestampAdd | exp.DatetimeAdd,
+        expression: (
+            exp.TsOrDsAdd
+            | exp.DateAdd
+            | exp.DateSub
+            | exp.TimestampAdd
+            | exp.DatetimeAdd
+        ),
     ) -> str:
         unit = unit_to_str(expression) if expression.args.get("unit") else "'DAY'"
         delta = expression.expression
@@ -122,18 +137,26 @@ def tochar_sql(self, expression: exp.ToChar) -> str:
         return self.func("TO_CHAR", expression.this, expression.args.get("format"))
 
     def substring_sql(self, expression: exp.Substring) -> str:
-        return self.func("SUBSTR", expression.this, expression.args.get("start"), expression.args.get("length"))
+        return self.func(
+            "SUBSTR",
+            expression.this,
+            expression.args.get("start"),
+            expression.args.get("length"),
+        )
 
     def extract_sql(self, expression: exp.Extract) -> str:
         unit = expression.this
-        return self.func("DATEPART", expression.expression, exp.Literal.string(unit.name))
+        return self.func(
+            "DATEPART", expression.expression, exp.Literal.string(unit.name)
+        )
 
     def mod_sql(self, expression: exp.Mod) -> str:
         # Reverse the WEEKDAY parser transform: (DAYOFWEEK(x) + 5) % 7 → WEEKDAY(x)
         rhs = expression.expression
         lhs = expression.this
         if (
-            isinstance(rhs, exp.Literal) and rhs.this == "7"
+            isinstance(rhs, exp.Literal)
+            and rhs.this == "7"
             and isinstance(lhs, exp.Paren)
             and isinstance(lhs.this, exp.Add)
             and isinstance(lhs.this.this, exp.DayOfWeek)
@@ -152,7 +175,9 @@ def _partitioned_by_sql(self, expression: exp.PartitionedByProperty) -> str:
                 inner = inner.this
             unit = inner.args.get("unit")
             unit_str = unit.name.lower() if unit else ""
-            trunc_sql = self.func("TRUNC_TIME", inner.this, exp.Literal.string(unit_str))
+            trunc_sql = self.func(
+                "TRUNC_TIME", inner.this, exp.Literal.string(unit_str)
+            )
             return f"AUTO PARTITIONED BY ({trunc_sql}{alias_sql})"
         return f"PARTITIONED BY {self.sql(expression, 'this')}"
 
@@ -163,7 +188,9 @@ def clusteredbyproperty_sql(self, expression: exp.ClusteredByProperty) -> str:
     def datatype_sql(self, expression: exp.DataType) -> str:
         # VARCHAR and CHAR map to STRING in MaxCompute, with no length parameters
         if expression.this in (exp.DType.VARCHAR, exp.DType.CHAR):
-            return self.TYPE_MAPPING.get(expression.this, super().datatype_sql(expression))
+            return self.TYPE_MAPPING.get(
+                expression.this, super().datatype_sql(expression)
+            )
         return super().datatype_sql(expression)
 
     def properties_sql(self, expression: exp.Properties) -> str:
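
The `TYPE_MAPPING` and `TRANSFORMS` dictionaries in this file extend the parent's tables by unpacking them with `**` and then listing overrides. The sketch below shows that pattern in isolation with stand-in classes; `BaseGenerator` and `ChildGenerator` and their entries are hypothetical and are not the real sqlglot classes.

```python
class BaseGenerator:
    # Stand-in for the parent generator; these entries are illustrative only.
    TYPE_MAPPING = {"VARCHAR": "VARCHAR", "INT": "INT"}


class ChildGenerator(BaseGenerator):
    # Unpack the parent's mapping first so that the keys listed afterwards
    # override inherited entries and add new ones, without mutating the parent.
    TYPE_MAPPING = {
        **BaseGenerator.TYPE_MAPPING,
        "VARCHAR": "STRING",
        "DATETIME": "DATETIME",
    }


if __name__ == "__main__":
    print(ChildGenerator.TYPE_MAPPING)
    print(BaseGenerator.TYPE_MAPPING)
```

Because the parent is named explicitly at each unpacking site, changing the base class (here, from `Hive.Generator` to `HiveGenerator`) means updating every `**` reference as well as the `class` line, which is what the hunks above do.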

src/sqlglot_maxcompute/parser.py

Lines changed: 17 additions & 8 deletions
@@ -4,7 +4,7 @@
 import typing as t
 
 from sqlglot import exp
-from sqlglot.dialects.hive import Hive
+from sqlglot.parsers.hive import HiveParser
 from sqlglot.dialects.dialect import build_timetostr_or_tochar
 from sqlglot.helper import seq_get
 from sqlglot.tokens import TokenType
@@ -61,9 +61,9 @@ def _build_datetrunc(
     return exp.DateTrunc(unit=unit, this=this)
 
 
-class MaxComputeParser(Hive.Parser):
+class MaxComputeParser(HiveParser):
     FUNCTIONS = {
-        **Hive.Parser.FUNCTIONS,
+        **HiveParser.FUNCTIONS,
         # Hive overrides: MaxCompute accepts date/datetime/timestamp/string directly
         # without needing TsOrDsToDate wrapping
         "DAY": exp.Day.from_arg_list,
@@ -86,7 +86,9 @@ class MaxComputeParser(Hive.Parser):
         # Hive override: produce exp.DateSub so _dateadd_sql emits DATEADD(date, -n, unit)
         # cleanly. Hive maps DATE_SUB to TsOrDsAdd(expression=Mul(n, -1)) which generates
         # "3 * -1" in the output.
-        "DATE_SUB": lambda args: exp.DateSub(this=seq_get(args, 0), expression=seq_get(args, 1)),
+        "DATE_SUB": lambda args: exp.DateSub(
+            this=seq_get(args, 0), expression=seq_get(args, 1)
+        ),
         # Date arithmetic
         "DATEADD": _build_dateadd,
         "DATEDIFF": lambda args: exp.DateDiff(
@@ -118,7 +120,10 @@ class MaxComputeParser(Hive.Parser):
         "MINUTE": exp.Minute.from_arg_list,
         "SECOND": exp.Second.from_arg_list,
         "QUARTER": exp.Quarter.from_arg_list,
-        "WEEKDAY": lambda args: exp.paren(exp.DayOfWeek(this=seq_get(args, 0)) + 5, copy=False) % 7,
+        "WEEKDAY": lambda args: exp.paren(
+            exp.DayOfWeek(this=seq_get(args, 0)) + 5, copy=False
+        )
+        % 7,
         "WEEKOFYEAR": exp.WeekOfYear.from_arg_list,
         # Last/next day
         "LAST_DAY": exp.LastDay.from_arg_list,
@@ -141,7 +146,9 @@ class MaxComputeParser(Hive.Parser):
         ),
         "ISDATE": lambda args: exp.not_(
             exp.Is(
-                this=exp.TsOrDsToDate(this=seq_get(args, 0), format=seq_get(args, 1), safe=True),
+                this=exp.TsOrDsToDate(
+                    this=seq_get(args, 0), format=seq_get(args, 1), safe=True
+                ),
                 expression=exp.Null(),
             )
         ),
@@ -198,7 +205,7 @@ class MaxComputeParser(Hive.Parser):
     }
 
     PROPERTY_PARSERS = {
-        **HiveParser.PROPERTY_PARSERS,
+        **HiveParser.PROPERTY_PARSERS,
         # LIFECYCLE n — MaxCompute table retention in days. Stored as a generic
         # exp.Property with a Var key so no custom expression class is needed and
         # sqlglot's PROPERTIES_LOCATION contract is not broken.
@@ -209,7 +216,9 @@ class MaxComputeParser(Hive.Parser):
         "AUTO": lambda self: self._parse_auto_partition(),
     }
 
-    def _parse_auto_partition(self) -> exp.PartitionedByProperty | exp.AutoRefreshProperty | None:
+    def _parse_auto_partition(
+        self,
+    ) -> exp.PartitionedByProperty | exp.AutoRefreshProperty | None:
         if self._match(TokenType.PARTITION_BY):
             self._match(TokenType.L_PAREN)
             expr = self._parse_conjunction()
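
The `WEEKDAY` entry in this parser rewrites `WEEKDAY(x)` as `(DAYOFWEEK(x) + 5) % 7`, and `mod_sql` in `generator.py` reverses it. The arithmetic can be checked in plain Python. This sketch assumes the usual numbering conventions, which this commit does not state: `DAYOFWEEK` returning 1 for Sunday through 7 for Saturday, and `WEEKDAY` returning 0 for Monday through 6 for Sunday (the same convention as `datetime.date.weekday()`). The function names are hypothetical.

```python
from datetime import date, timedelta


def hive_dayofweek(d: date) -> int:
    """Assumed DAYOFWEEK numbering: 1 = Sunday ... 7 = Saturday."""
    return d.isoweekday() % 7 + 1


def weekday_via_dayofweek(d: date) -> int:
    """The parser's rewrite: WEEKDAY(x) -> (DAYOFWEEK(x) + 5) % 7."""
    return (hive_dayofweek(d) + 5) % 7


if __name__ == "__main__":
    start = date(2026, 4, 1)
    # Any seven consecutive days cover every day of the week once.
    for offset in range(7):
        day = start + timedelta(days=offset)
        assert weekday_via_dayofweek(day) == day.weekday()
    print("rewrite agrees with Monday=0 ... Sunday=6 numbering")
```

Under those assumptions, shifting by 5 and reducing modulo 7 moves Monday (2) to 0 and Sunday (1) to 6, so the rewritten expression keeps the Monday-based numbering.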

uv.lock

Lines changed: 4 additions & 4 deletions
Some generated files are not rendered by default.
