This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
sqlglot-maxcompute is a SQLGlot dialect plugin for Alibaba Cloud MaxCompute (formerly ODPS). It registers the MaxCompute dialect via Python entry points so that sqlglot can parse and generate MaxCompute SQL.
This project uses uv for dependency management.
```bash
# Install dependencies (including dev)
uv sync

# Run all tests
uv run pytest

# Run a single test file
uv run pytest tests/test_foo.py

# Run a single test by name
uv run pytest tests/test_foo.py::test_bar
```

The dialect is split across three files in `src/sqlglot_maxcompute/`:
- `parser.py` (`MaxComputeParser(HiveParser)`): the `FUNCTIONS` dict mapping MaxCompute function names to canonical `sqlglot.exp` nodes; `PROPERTY_PARSERS` for `LIFECYCLE`, `RANGE`, and `AUTO`; helper builders `_build_dateadd` and `_build_datetrunc`.
- `generator.py` (`MaxComputeGenerator(HiveGenerator)`): `TYPE_MAPPING`, `TRANSFORMS`, and named `_sql` methods that map canonical AST nodes back to MaxCompute SQL.
- `dialect.py` (`MaxCompute(Hive)`): a slim coordinator that sets `TIME_MAPPING`/`DATE_FORMAT`/`TIME_FORMAT`, adds `Tokenizer` keywords (`EXPORT`, `LIFECYCLE`, `OPTION`), and wires `Parser = MaxComputeParser` / `Generator = MaxComputeGenerator`.
The dialect is registered as a plugin in pyproject.toml under [project.entry-points."sqlglot.dialects"], so after installation it is automatically discoverable by sqlglot as "maxcompute".
This split mirrors sqlglot's own mypyc-compile refactor (parsers/generators split into sqlglot.parsers.* / sqlglot.generators.* modules) and requires sqlglot ≥ 30.1.0.
local/ contains development scratch files and references — not part of the package:
- `scratch.py`: keyword comparison scratch script
- `sqlglot/`: full clone of the sqlglot repo for reference (expressions, dialects, generator internals). `sqlglot/posts/` contains official guides (`onboarding.md` for an architecture deep-dive, `ast_primer.md` for an AST tutorial). Parsers live in `parsers/`, generators in `generators/`, and expressions in the `expressions/` package.
- `ydb-sqlglot-plugin/`: YDB dialect plugin, used as a reference for how a well-behaved plugin is structured
- `maxcompute_doc/`: MaxCompute official function documentation (e.g. `date_func.md`, `func_comparison.md`)
The dialect is complete at v0.4.0:
- Parser: ~65 functions explicitly mapped (date/time, string, aggregate, array, map); remainder inherited from Hive.
- Generator: `TRANSFORMS` + named `_sql` methods for all major expression types; Hive handles the rest.
- Tests: 40 test methods, 186 subtests covering parse, round-trip, and cross-dialect transpilation.
When adding function mappings in Parser.FUNCTIONS, use sqlglot.helper.seq_get to safely extract positional arguments from the args list. Note that MaxCompute argument order sometimes differs from the canonical expression (e.g., DATEDIFF(unit, start, end) vs DateDiff(this=end, expression=start, unit=unit)).
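The argument-reordering point can be sketched without sqlglot installed. `seq_get` below mirrors the behavior of `sqlglot.helper.seq_get` (return the item, or `None` when the index is out of range), and the hypothetical `build_datediff_kwargs` returns a plain dict of the keyword arguments a `DateDiff` node would receive under the `DATEDIFF(unit, start, end)` order given above. A real builder in `Parser.FUNCTIONS` would receive the same `args` list and pass these values to `exp.DateDiff(...)` instead.

```python
from typing import Any, Optional, Sequence


def seq_get(seq: Sequence[Any], index: int) -> Optional[Any]:
    """Mirror of sqlglot.helper.seq_get: seq[index], or None when the index is out of range."""
    try:
        return seq[index]
    except IndexError:
        return None


def build_datediff_kwargs(args: Sequence[Any]) -> dict[str, Any]:
    """Hypothetical builder: map DATEDIFF(unit, start, end) onto DateDiff-style kwargs."""
    unit = seq_get(args, 0)
    start = seq_get(args, 1)
    end = seq_get(args, 2)
    # The canonical node takes the end as `this` and the start as `expression`.
    return {"this": end, "expression": start, "unit": unit}


if __name__ == "__main__":
    print(build_datediff_kwargs(["day", "d1", "d2"]))
```

Because missing arguments come back as `None` rather than raising, a short call degrades to an incomplete node instead of an `IndexError` inside the parser.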
When adding generator transforms in Generator.TRANSFORMS, use self.func(name, *args) to produce correctly formatted SQL function calls.
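A plain-Python model of why optional arguments need guarding: `func` imitates `self.func`, including the silent dropping of `None` arguments noted further down in this file, and `groupconcat_sql` supplies a default separator instead of passing `None`. Both functions are toy stand-ins, and the `WM_CONCAT(separator, value)` target is an assumption for illustration, not necessarily what the plugin's generator emits.

```python
from typing import Optional


def func(name: str, *args: Optional[str]) -> str:
    """Toy model of Generator.func: render NAME(arg1, arg2, ...), silently skipping None args."""
    return f"{name}({', '.join(arg for arg in args if arg is not None)})"


def groupconcat_sql(this: str, separator: Optional[str] = None) -> str:
    # Guard the optional separator: passing None would silently drop it from the call.
    sep = separator if separator is not None else "','"
    return func("WM_CONCAT", sep, this)


if __name__ == "__main__":
    print(func("CONCAT", "a", None, "b"))  # CONCAT(a, b): the None argument vanished
    print(groupconcat_sql("name"))         # WM_CONCAT(',', name)
```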
Tests use a Validator base class (inline in tests/test_maxcompute.py) mirroring sqlglot's pattern:
- `validate_all(sql, write={dialect: expected})`: cross-dialect transpilation assertions.
- `assertIsInstance(parse_one(sql, read="maxcompute"), exp.SomeClass)`: parse node assertions.
- `read=` must be a dict: `read={"spark": "LOCATE(...)"}`, not `read="spark"`. A bare string is silently ignored by `validate_all`.
- Pyright false positive: `assertIsNotNone(x)` does not narrow types in Pyright, so `x.field` after it shows "attribute of None" errors that are noise, not real bugs.
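The notes above treat the Pyright errors as noise and do not prescribe a workaround. One common option, sketched here with a hypothetical `lookup` helper, is to follow the unittest assertion with a plain `assert x is not None`, which Pyright does use for narrowing. The class runs under `uv run pytest` like any `unittest.TestCase`.

```python
import unittest
from typing import Optional


def lookup(name: str) -> Optional[str]:
    """Hypothetical helper that may return None, standing in for an optional AST lookup."""
    return name.upper() if name else None


class NarrowingExample(unittest.TestCase):
    def test_assert_narrows(self) -> None:
        value = lookup("lifecycle")
        self.assertIsNotNone(value)  # runtime check only; Pyright still sees Optional[str]
        assert value is not None     # a plain assert narrows value to str for Pyright
        self.assertEqual(value.lower(), "lifecycle")
```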
Development is test-driven (TDD). For every fix or feature:
- Write the failing test first and run it to confirm it fails
- Implement the minimal change to make it pass
- Run the full suite to confirm no regressions
- Commit
Before writing validate_all assertions, probe actual output first:
```bash
uv run python -c "from sqlglot import parse_one; e = parse_one('FUNC(...)', read='maxcompute'); print(e.sql('spark'))"
```

For multi-step debugging (AST inspection, tracing transforms, etc.), write a temporary script to `local/probe.py` and run it with `uv run python local/probe.py`. The `local/` directory is gitignored, so probe scripts won't pollute the repo. Always delete the script when done; subagents consistently forget to clean up.
When instructing subagents to debug, explicitly include: "write probe scripts to local/probe.py, run with uv run python local/probe.py, delete when done."
This is a dialect plugin, not a fork. We must stay within sqlglot's public extension points:
- No custom `exp.Property` subclasses: all `Property` subclasses must live in sqlglot's `expressions/properties.py` and be registered in the base `Generator.PROPERTIES_LOCATION`. Defining a custom subclass in this plugin breaks every other dialect's `locate_properties` (which uses a raw dict lookup with no fallback). Use a generic `exp.Property(this=exp.var("KEY"), value=...)` instead and override `TRANSFORMS[exp.Property]` and `PROPERTIES_LOCATION[exp.Property]` in `MaxCompute.Generator` to handle the formatting.
- No monkey-patching sqlglot internals: do not patch `Generator.locate_properties`, `Generator.TRANSFORMS`, or any other base class method/dict outside the `MaxCompute` class hierarchy.
- No new `exp.*` expression classes: all AST node types must be existing sqlglot classes. Check `expressions.py` before considering anything custom.
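The "raw dict lookup with no fallback" failure can be reproduced in a few lines. When a MaxCompute AST containing a custom subclass is generated for another dialect, that dialect's generator looks the node's exact class up in its own location table. The toy classes and the `"POST_SCHEMA"` value below are illustrative stand-ins, not sqlglot's objects.

```python
class Property:
    """Toy stand-in for exp.Property."""


class LifecycleProperty(Property):
    """Hypothetical plugin-defined subclass: exactly what the rule above forbids."""


# Exact class -> location mapping, like a base generator's PROPERTIES_LOCATION (value illustrative).
PROPERTIES_LOCATION = {Property: "POST_SCHEMA"}


def locate(prop: Property) -> str:
    # Exact-class lookup: no walk up the MRO, so an unregistered subclass raises KeyError.
    return PROPERTIES_LOCATION[prop.__class__]


if __name__ == "__main__":
    print(locate(Property()))  # POST_SCHEMA
    try:
        locate(LifecycleProperty())
    except KeyError as err:
        print(f"KeyError: {err}")
```

Being a subclass of a registered class does not help, which is why the plugin sticks to the generic `exp.Property`.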
Alibaba help pages have a 复制为 MD 格式 ("Copy as MD format") button that copies the page as markdown to the clipboard.
Workflow: browser_navigate → browser_snapshot (save to file, grep for button ref) → browser_click → browser_evaluate(() => navigator.clipboard.readText()) → Write to local/maxcompute_doc/.
Note: snapshots exceed token limits; grep the saved file for the button ref instead of reading it directly.
- Never use `exp.Anonymous`: check `expressions.py` for a proper class first; use formula-based expressions as a fallback.
- Inherit, don't re-implement: omit functions from `Parser.FUNCTIONS` if MaxCompute and Hive have identical semantics.
- Type-dispatch builders: `_build_dateadd`/`_build_datetrunc` dispatch to typed nodes via `is_type()`, with an untyped fallback.
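The type-dispatch pattern, sketched with a toy node so it runs without sqlglot. `Node.is_type` is loosely modeled on sqlglot's `Expression.is_type()`. The returned names are real sqlglot expression classes, but this particular mapping is an illustrative assumption; the plugin's `_build_dateadd` may choose differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy AST node: SQL text plus an optional type inferred by the parser."""
    sql: str
    dtype: Optional[str] = None

    def is_type(self, *dtypes: str) -> bool:
        # True only when the type is known and matches one of dtypes (case-insensitive).
        return self.dtype is not None and self.dtype.upper() in {d.upper() for d in dtypes}


def build_dateadd(this: Node, delta: int, unit: str) -> tuple[str, Node, int, str]:
    """Hypothetical builder: pick a typed node class name from the operand's type."""
    if this.is_type("DATE"):
        node = "DateAdd"
    elif this.is_type("DATETIME"):
        node = "DatetimeAdd"
    elif this.is_type("TIMESTAMP"):
        node = "TimestampAdd"
    else:
        node = "TsOrDsAdd"  # untyped fallback: the operand could be a date or a string
    return node, this, delta, unit


if __name__ == "__main__":
    print(build_dateadd(Node("CAST(x AS DATE)", "DATE"), 1, "DAY")[0])  # DateAdd
    print(build_dateadd(Node("x"), 1, "DAY")[0])                         # TsOrDsAdd
```

Ending the chain with an untyped fallback means a column with no inferred type still parses into a usable node.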
- `self.func` drops `None` args silently: guard optional args before passing them to avoid emitting invalid SQL (e.g. `groupconcat_sql` defaults `separator` to `','`).
- `unit_to_str` on `WeekStart` returns the raw name, not a string literal: reconstruct it as `exp.Literal.string(f"week({day})")` manually.
- Named `_sql` methods vs `TRANSFORMS`: use a named method when the base class already defines one (e.g. `extract_sql`, `groupconcat_sql`); both work, but the method is cleaner and avoids surprise overrides.
- Don't add an empty `PROPERTIES_LOCATION = {**Hive.Generator.PROPERTIES_LOCATION}`: it is pure boilerplate; only add the dict when you have new entries to include.
- DateSub string-literal delta (BigQuery quirk): BigQuery's `DATE_SUB` stores the magnitude as a string literal; normalize it before negating with `exp.Literal.number(delta.this)` so you emit `-3`, not `-'3'`.
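The DateSub normalization can be modeled with a toy literal that, like `exp.Literal`, keeps its raw text in `this` plus a string/number flag. The class and `negated_delta_sql` are hypothetical stand-ins; in the plugin the rebuild step is `exp.Literal.number(delta.this)` as stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Toy literal modeled on exp.Literal: raw text in `this`, plus a string/number flag."""
    this: str
    is_string: bool

    def sql(self) -> str:
        return f"'{self.this}'" if self.is_string else self.this


def negated_delta_sql(delta: Literal) -> str:
    # A BigQuery-style DATE_SUB may carry the magnitude as a string literal ('3').
    # Rebuild it as a number literal first so the negation renders as -3.
    if delta.is_string:
        delta = Literal(delta.this, is_string=False)
    return f"-{delta.sql()}"


if __name__ == "__main__":
    naive = f"-{Literal('3', is_string=True).sql()}"
    print(naive)                                            # -'3' (invalid SQL)
    print(negated_delta_sql(Literal("3", is_string=True)))  # -3
```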
- LIFECYCLE vs TBLPROPERTIES coexistence: stored as `exp.Property(this=exp.var("LIFECYCLE"), value=...)`. The `properties_sql` override in `MaxComputeGenerator` separates Var-keyed properties (rendered bare as `LIFECYCLE 30`) from string-keyed ones (delegated to Hive's `TBLPROPERTIES` wrapper). This avoids overriding `PROPERTIES_LOCATION[exp.Property]`, which would break other dialects.
- RANGE CLUSTERED BY: reuses `exp.ClusteredByProperty` with an undeclared `args["range"] = True` flag. Undeclared args survive `copy()`/`deepcopy()` in sqlglot's `Expression` base. The generator's `clusteredbyproperty_sql` override prepends `RANGE` when the flag is present.
- AUTO PARTITIONED BY: parsed as `PartitionedByProperty(this=DateTrunc(...))` or `PartitionedByProperty(this=Alias(this=DateTrunc(...), alias=...))`. The generator detects `DateTrunc`/`TimestampTrunc`/`DatetimeTrunc` (or an `Alias` wrapping one) as the `this` child to identify auto-partition nodes and emit `AUTO PARTITIONED BY (TRUNC_TIME(...))`.
- TO_DATE return type: `TO_DATE(str)` → `exp.TsOrDsToDate` (DATE); `TO_DATE(str, fmt)` → `exp.StrToTime` (DATETIME). The generator maps `exp.StrToTime` back to `TO_DATE(str, fmt)` so MaxCompute output is correct and cross-dialect consumers see the right type.
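The LIFECYCLE/TBLPROPERTIES split in `properties_sql` can be modeled as a partition on the key kind. This toy version is a sketch only: `Prop` and `properties_sql` here are hypothetical stand-ins, the quoting is simplified compared with Hive's real `TBLPROPERTIES` wrapper, and emitting bare properties after `TBLPROPERTIES` is an assumption about clause order.

```python
from dataclasses import dataclass


@dataclass
class Prop:
    """Toy property: key text, value text, and whether the key was a Var (exp.var) or a string."""
    key: str
    value: str
    key_is_var: bool


def properties_sql(props: list[Prop]) -> str:
    # Var-keyed properties (e.g. LIFECYCLE) render bare; string-keyed ones go into TBLPROPERTIES.
    bare = [f"{p.key} {p.value}" for p in props if p.key_is_var]
    quoted = [f"'{p.key}'='{p.value}'" for p in props if not p.key_is_var]
    parts = [f"TBLPROPERTIES ({', '.join(quoted)})"] if quoted else []
    return " ".join(parts + bare)


if __name__ == "__main__":
    print(properties_sql([
        Prop("LIFECYCLE", "30", key_is_var=True),
        Prop("transactional", "true", key_is_var=False),
    ]))  # TBLPROPERTIES ('transactional'='true') LIFECYCLE 30
```

Dispatching on the key kind inside the MaxCompute generator keeps the change local, so other dialects keep their own handling of `exp.Property`.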