CLAUDE.md: 17 additions & 9 deletions (+17 -9)
@@ -26,14 +26,16 @@ uv run pytest tests/test_foo.py::test_bar
## Architecture

-The entire dialect lives in `src/sqlglot_maxcompute/maxcompute.py`. The `MaxCompute` class subclasses `sqlglot.dialects.hive.Hive` and overrides three inner classes:
+The dialect is split across three files in `src/sqlglot_maxcompute/`:

-- **`Tokenizer`** — adds MaxCompute-specific keywords (e.g., `EXPORT`, `OPTION`) on top of Hive's keywords.
-- **`Generator`** — will map canonical expression nodes back to MaxCompute SQL syntax via auto-discovered `<name>_sql()` methods or `TRANSFORMS` entries. Currently `pass`.
+- **`parser.py`** — `MaxComputeParser(Hive.Parser)`: `FUNCTIONS` dict mapping MaxCompute function names to canonical `sqlglot.exp` nodes; `PROPERTY_PARSERS` for `LIFECYCLE`, `RANGE`, and `AUTO`; helper builders `_build_dateadd`, `_build_datetrunc`.
+- **`generator.py`** — `MaxComputeGenerator(Hive.Generator)`: `TYPE_MAPPING`, `TRANSFORMS`, and named `_sql` methods that map canonical AST nodes back to MaxCompute SQL.
The dialect is registered as a plugin in `pyproject.toml` under `[project.entry-points."sqlglot.dialects"]`, so after installation it is automatically discoverable by sqlglot as `"maxcompute"`.
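
One way to check the plugin registration described above is to list what is advertised under the `sqlglot.dialects` entry-point group. The following is a minimal sketch using only the standard library's `importlib.metadata`. The helper name `list_dialect_plugins` is hypothetical, and the result is empty unless a package that registers under this group is installed in the current environment.

```python
from importlib import metadata


def list_dialect_plugins(group: str = "sqlglot.dialects") -> dict[str, str]:
    """Map entry-point names in `group` to their "module:attr" targets."""
    eps = metadata.entry_points()
    # Python 3.10+ exposes .select(); Python 3.9 returns a plain dict keyed by group.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    plugins = list_dialect_plugins()
    # Whether "maxcompute" appears depends on the package being installed.
    print(plugins.get("maxcompute", "maxcompute dialect not registered here"))
```

If the package is installed, the `maxcompute` key should point at whatever target `pyproject.toml` declares for it.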

+This split mirrors sqlglot's own mypyc-compile refactor (parsers/generators split by file) and is required for compatibility with sqlglot ≥ 31 compiled wheels.
+
`local/` contains development scratch files and references — **not part of the package**:
 - `scratch.py` — keyword comparison scratch script
 - `sqlglot/` — full clone of the sqlglot repo for reference (expressions, dialects, generator internals); `sqlglot/posts/` contains official guides (`onboarding.md` for architecture deep-dive, `ast_primer.md` for AST tutorial). Note: local clone is newer than installed (30.0.1) — dialect parsers moved to `parsers/`, expressions split into `expressions/` package
@@ -42,11 +44,10 @@ The dialect is registered as a plugin in `pyproject.toml` under `[project.entry-
## Implementation Status

-The dialect is largely complete. Current state:
-- **Parser**: ~65 functions mapped across date/time, string, aggregate, array, and map categories.
-- **Generator**: `TRANSFORMS` entries + named `_sql` methods for all major expression types; inherits Hive for the rest.
+- **Generator**: `TRANSFORMS` + named `_sql` methods for all major expression types; Hive handles the rest.
+- **Tests**: 39 test methods, 180+ subtests covering parse, round-trip, and cross-dialect transpilation.
## Key sqlglot patterns
@@ -98,3 +99,10 @@ Note: snapshots exceed token limits; grep the saved file for the button ref inst
 - **Named `_sql` methods vs TRANSFORMS** — use a named method when the base class already defines one (e.g. `extract_sql`, `groupconcat_sql`); both work but the method is cleaner and avoids surprise overrides.
 - **Don't add empty `PROPERTIES_LOCATION = {**Hive.Generator.PROPERTIES_LOCATION}`** — pure boilerplate; only add the dict when you have new entries to include.
 - **DateSub string-literal delta (BigQuery quirk)** — BigQuery's `DATE_SUB` stores the magnitude as a string literal; normalize before negating: `exp.Literal.number(delta.this)` so you emit `-3` not `-'3'`.
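
The DateSub note above can be illustrated without sqlglot installed. The sketch below uses a hypothetical `Literal` dataclass as a stand-in for a literal AST node (it is not sqlglot's class) to show why negating a string literal yields `-'3'`, and why rebuilding the delta as a numeric literal first, which is the role `exp.Literal.number(delta.this)` plays in the real code, yields `-3`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a literal AST node: raw text plus a string flag."""

    this: str
    is_string: bool

    def sql(self) -> str:
        # String literals are quoted; numeric literals are emitted as-is.
        return f"'{self.this}'" if self.is_string else self.this


def negate(delta: Literal) -> str:
    """Render the negated delta the naive way: prefix a minus sign."""
    return f"-{delta.sql()}"


def normalized_negate(delta: Literal) -> str:
    """Rebuild the delta as a numeric literal before negating."""
    return negate(Literal(this=delta.this, is_string=False))


delta = Literal(this="3", is_string=True)  # magnitude stored as a string literal
print(negate(delta))             # -'3'
print(normalized_negate(delta))  # -3
```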
+
+## DDL design decisions
+
+- **LIFECYCLE vs TBLPROPERTIES coexistence** — stored as `exp.Property(this=exp.var("LIFECYCLE"), value=...)`. The `properties_sql` override in `MaxComputeGenerator` separates Var-keyed properties (rendered bare as `LIFECYCLE 30`) from string-keyed ones (delegated to Hive's TBLPROPERTIES wrapper). This avoids overriding `PROPERTIES_LOCATION[exp.Property]`, which would break other dialects.
+- **RANGE CLUSTERED BY** — reuses `exp.ClusteredByProperty` with an undeclared `args["range"] = True` flag. Undeclared args survive `copy()`/`deepcopy()` in sqlglot's `Expression` base. The generator's `clusteredbyproperty_sql` override prepends `RANGE ` when the flag is present.
+- **AUTO PARTITIONED BY** — parsed as `PartitionedByProperty(this=DateTrunc(...))` or `PartitionedByProperty(this=Alias(this=DateTrunc(...), alias=...))`. The generator detects `DateTrunc`/`TimestampTrunc`/`DatetimeTrunc` (or `Alias` wrapping one) as the `this` child to identify auto-partition nodes and emit `AUTO PARTITIONED BY (TRUNC_TIME(...))`.
+- **TO_DATE return type** — `TO_DATE(str)` → `exp.TsOrDsToDate` (DATE); `TO_DATE(str, fmt)` → `exp.StrToTime` (DATETIME). The generator maps `exp.StrToTime` back to `TO_DATE(str, fmt)` so MaxCompute output is correct and cross-dialect consumers see the right type.
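
The LIFECYCLE vs TBLPROPERTIES split described above can be sketched in plain Python. This is a toy model, not the repository's `properties_sql` override. The `Property` dataclass and `render_properties` function are hypothetical stand-ins, and the ordering of the two clauses in the output is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """Hypothetical stand-in for a table property node (not sqlglot's exp.Property)."""

    key: str
    value: str
    key_is_var: bool  # True for bare keywords such as LIFECYCLE


def render_properties(props: list[Property]) -> str:
    """Render Var-keyed properties bare and wrap the rest in TBLPROPERTIES."""
    bare = [f"{p.key} {p.value}" for p in props if p.key_is_var]
    quoted = [f"'{p.key}'='{p.value}'" for p in props if not p.key_is_var]
    parts = []
    if quoted:
        parts.append(f"TBLPROPERTIES ({', '.join(quoted)})")
    # Assumption of this sketch: bare properties follow the TBLPROPERTIES clause.
    parts.extend(bare)
    return " ".join(parts)


props = [
    Property("LIFECYCLE", "30", key_is_var=True),
    Property("transactional", "true", key_is_var=False),
]
print(render_properties(props))  # TBLPROPERTIES ('transactional'='true') LIFECYCLE 30
```

Keying the split on the kind of key, rather than on the property class, is what lets both forms coexist without touching a shared location table.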