Commit d889985

chore: exclude docs/superpowers from repo; update CLAUDE.md with current architecture and DDL design
1 parent 96587af commit d889985

4 files changed

Lines changed: 18 additions & 678 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -14,5 +14,5 @@ local/
 
 # Claude Code
 .claude
-docs/superpowers/plans/
+docs/
 .playwright-mcp

CLAUDE.md

Lines changed: 17 additions & 9 deletions
@@ -26,14 +26,16 @@ uv run pytest tests/test_foo.py::test_bar
 
 ## Architecture
 
-The entire dialect lives in `src/sqlglot_maxcompute/maxcompute.py`. The `MaxCompute` class subclasses `sqlglot.dialects.hive.Hive` and overrides three inner classes:
+The dialect is split across three files in `src/sqlglot_maxcompute/`:
 
-- **`Tokenizer`** - adds MaxCompute-specific keywords (e.g., `EXPORT`, `OPTION`) on top of Hive's keywords.
-- **`Parser`** - maps MaxCompute built-in function names to canonical `sqlglot.exp` expression nodes (e.g., `DATEADD` → `TsOrDsAdd`, `DATEDIFF` → `DateDiff`, `WM_CONCAT` → `GroupConcat`).
-- **`Generator`** - will map canonical expression nodes back to MaxCompute SQL syntax via auto-discovered `<name>_sql()` methods or `TRANSFORMS` entries. Currently `pass`.
+- **`parser.py`** - `MaxComputeParser(Hive.Parser)`: `FUNCTIONS` dict mapping MaxCompute function names to canonical `sqlglot.exp` nodes; `PROPERTY_PARSERS` for `LIFECYCLE`, `RANGE`, and `AUTO`; helper builders `_build_dateadd`, `_build_datetrunc`.
+- **`generator.py`** - `MaxComputeGenerator(Hive.Generator)`: `TYPE_MAPPING`, `TRANSFORMS`, and named `_sql` methods that map canonical AST nodes back to MaxCompute SQL.
+- **`maxcompute.py`** - `MaxCompute(Hive)`: slim coordinator that sets `TIME_MAPPING`/`DATE_FORMAT`/`TIME_FORMAT`, adds `Tokenizer` keywords (`EXPORT`, `LIFECYCLE`, `OPTION`), and wires `Parser = MaxComputeParser` / `Generator = MaxComputeGenerator`.
 
 The dialect is registered as a plugin in `pyproject.toml` under `[project.entry-points."sqlglot.dialects"]`, so after installation it is automatically discoverable by sqlglot as `"maxcompute"`.
 
+This split mirrors sqlglot's own mypyc-compile refactor (parsers/generators split by file) and is required for compatibility with sqlglot ≥ 31 compiled wheels.
+
 `local/` contains development scratch files and references — **not part of the package**:
 - `scratch.py` — keyword comparison scratch script
 - `sqlglot/` — full clone of the sqlglot repo for reference (expressions, dialects, generator internals); `sqlglot/posts/` contains official guides (`onboarding.md` for architecture deep-dive, `ast_primer.md` for AST tutorial). Note: local clone is newer than installed (30.0.1) — dialect parsers moved to `parsers/`, expressions split into `expressions/` package
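The three-file wiring described in the new Architecture text can be pictured with a plain-Python model. This is only a sketch: it does not import sqlglot, the `Hive*` classes are hypothetical stand-ins for the real base classes, and node names are represented as strings rather than `sqlglot.exp` classes. Only the class names `MaxComputeParser`, `MaxComputeGenerator`, and `MaxCompute`, and the `Parser = ...` / `Generator = ...` assignments, come from the commit.

```python
# Plain-Python model of the parser.py / generator.py / maxcompute.py split.
# No sqlglot import: the Hive* classes below are illustrative stand-ins.


class HiveParser:
    # Stand-in for Hive.Parser.FUNCTIONS: SQL function name -> canonical node name.
    FUNCTIONS = {"CONCAT": "Concat", "DATEDIFF": "DateDiff"}


class HiveGenerator:
    # Stand-in for Hive.Generator.TRANSFORMS: canonical node name -> SQL function name.
    TRANSFORMS = {"Concat": "CONCAT"}


class Hive:
    Parser = HiveParser
    Generator = HiveGenerator


# --- parser.py (modelled) ---
class MaxComputeParser(Hive.Parser):
    # Extend, rather than replace, the inherited mapping.
    FUNCTIONS = {
        **Hive.Parser.FUNCTIONS,
        "DATEADD": "TsOrDsAdd",
        "WM_CONCAT": "GroupConcat",
    }


# --- generator.py (modelled) ---
class MaxComputeGenerator(Hive.Generator):
    TRANSFORMS = {
        **Hive.Generator.TRANSFORMS,
        "GroupConcat": "WM_CONCAT",
    }


# --- maxcompute.py (modelled) ---
class MaxCompute(Hive):
    # The coordinator only wires the separately defined classes together.
    Parser = MaxComputeParser
    Generator = MaxComputeGenerator
```

Because the dictionaries are rebuilt with `{**base, ...}`, the stand-in `Hive` mappings are left untouched while `MaxCompute.Parser.FUNCTIONS` sees both the inherited and the added entries.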
@@ -42,11 +44,10 @@ The dialect is registered as a plugin in `pyproject.toml` under `[project.entry-
 
 ## Implementation Status
 
-The dialect is largely complete. Current state:
-- **Parser**: ~65 functions mapped across date/time, string, aggregate, array, and map categories.
-- **Generator**: `TRANSFORMS` entries + named `_sql` methods for all major expression types; inherits Hive for the rest.
-- **Tests**: `tests/test_maxcompute.py` covers Parser (parse + cross-dialect) and Generator (round-trip + cross-dialect).
-- **Reference**: Full implementation checklist is in `docs/superpowers/specs/2026-03-13-maxcompute-dialect-design.md`.
+The dialect is complete at v0.3.1:
+- **Parser**: ~65 functions explicitly mapped (date/time, string, aggregate, array, map); remainder inherited from Hive.
+- **Generator**: `TRANSFORMS` + named `_sql` methods for all major expression types; Hive handles the rest.
+- **Tests**: 39 test methods, 180+ subtests covering parse, round-trip, and cross-dialect transpilation.
 
 ## Key sqlglot patterns
 
@@ -98,3 +99,10 @@ Note: snapshots exceed token limits; grep the saved file for the button ref inst
 - **Named `_sql` methods vs TRANSFORMS** — use a named method when the base class already defines one (e.g. `extract_sql`, `groupconcat_sql`); both work but the method is cleaner and avoids surprise overrides.
 - **Don't add empty `PROPERTIES_LOCATION = {**Hive.Generator.PROPERTIES_LOCATION}`** — pure boilerplate; only add the dict when you have new entries to include.
 - **DateSub string-literal delta (BigQuery quirk)** — BigQuery's `DATE_SUB` stores the magnitude as a string literal; normalize before negating: `exp.Literal.number(delta.this)` so you emit `-3` not `-'3'`.
+
+## DDL design decisions
+
+- **LIFECYCLE vs TBLPROPERTIES coexistence** — stored as `exp.Property(this=exp.var("LIFECYCLE"), value=...)`. The `properties_sql` override in `MaxComputeGenerator` separates Var-keyed properties (rendered bare as `LIFECYCLE 30`) from string-keyed ones (delegated to Hive's TBLPROPERTIES wrapper). This avoids overriding `PROPERTIES_LOCATION[exp.Property]`, which would break other dialects.
+- **RANGE CLUSTERED BY** — reuses `exp.ClusteredByProperty` with an undeclared `args["range"] = True` flag. Undeclared args survive `copy()`/`deepcopy()` in sqlglot's `Expression` base. The generator's `clusteredbyproperty_sql` override prepends `RANGE ` when the flag is present.
+- **AUTO PARTITIONED BY** — parsed as `PartitionedByProperty(this=DateTrunc(...))` or `PartitionedByProperty(this=Alias(this=DateTrunc(...), alias=...))`. The generator detects `DateTrunc`/`TimestampTrunc`/`DatetimeTrunc` (or `Alias` wrapping one) as the `this` child to identify auto-partition nodes and emit `AUTO PARTITIONED BY (TRUNC_TIME(...))`.
+- **TO_DATE return type** - `TO_DATE(str)` → `exp.TsOrDsToDate` (DATE); `TO_DATE(str, fmt)` → `exp.StrToTime` (DATETIME). The generator maps `exp.StrToTime` back to `TO_DATE(str, fmt)` so MaxCompute output is correct and cross-dialect consumers see the right type.
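The LIFECYCLE decision above amounts to partitioning a property list by key type. The following is a minimal sketch of that idea in plain Python, not the commit's `properties_sql` override: `Var`, `Property`, and the function body are hypothetical stand-ins for the sqlglot nodes, and the clause order (TBLPROPERTIES before the bare properties) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Var:
    """Stand-in for a Var-keyed property name such as LIFECYCLE."""
    name: str


@dataclass(frozen=True)
class Property:
    """Stand-in for a key/value property; the key is a Var or a plain string."""
    key: Union[Var, str]
    value: object


def properties_sql(props: List[Property]) -> str:
    # Var-keyed properties render bare, e.g. "LIFECYCLE 30".
    bare = [f"{p.key.name} {p.value}" for p in props if isinstance(p.key, Var)]
    # String-keyed properties are grouped into one TBLPROPERTIES wrapper.
    quoted = [f"'{p.key}'='{p.value}'" for p in props if not isinstance(p.key, Var)]

    clauses = []
    if quoted:
        clauses.append(f"TBLPROPERTIES ({', '.join(quoted)})")
    # Assumed ordering: bare properties follow the wrapper.
    clauses.extend(bare)
    return " ".join(clauses)
```

For example, `properties_sql([Property(Var("LIFECYCLE"), 30), Property("comment", "x")])` yields `TBLPROPERTIES ('comment'='x') LIFECYCLE 30`. Splitting on key type inside the rendering step is what lets both kinds coexist without changing where properties are located for other dialects.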

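The TO_DATE decision is an arity-based dispatch on the parse side and a symmetric mapping on the generate side. A rough plain-Python model of that round trip follows; `Node`, `parse_to_date`, `generate`, and `RETURN_TYPE` are hypothetical names for illustration, and the node kinds are strings standing in for `exp.TsOrDsToDate` and `exp.StrToTime`.

```python
from typing import NamedTuple, Optional, Sequence


class Node(NamedTuple):
    """Stand-in for a canonical AST node; kind names the expression class."""
    kind: str
    this: str
    fmt: Optional[str] = None


# Result type each node kind is documented to carry.
RETURN_TYPE = {"TsOrDsToDate": "DATE", "StrToTime": "DATETIME"}


def parse_to_date(args: Sequence[str]) -> Node:
    # One argument: plain date conversion.
    if len(args) == 1:
        return Node("TsOrDsToDate", args[0])
    # Two arguments: string plus format, which produces a DATETIME.
    if len(args) == 2:
        return Node("StrToTime", args[0], args[1])
    raise ValueError(f"TO_DATE expects 1 or 2 arguments, got {len(args)}")


def generate(node: Node) -> str:
    # Both node kinds are emitted as TO_DATE; only the arity differs.
    if node.kind == "StrToTime":
        return f"TO_DATE({node.this}, {node.fmt})"
    if node.kind == "TsOrDsToDate":
        return f"TO_DATE({node.this})"
    raise ValueError(f"unsupported node kind: {node.kind}")
```

Keeping the two arities as distinct node kinds is what lets another dialect's generator pick a DATE or DATETIME function as appropriate, while the MaxCompute side still round-trips to `TO_DATE`.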