Skip to content

Latest commit

 

History

History
271 lines (223 loc) · 15.4 KB

File metadata and controls

271 lines (223 loc) · 15.4 KB

MySQL Parser Syntax Coverage Scenarios

Goal: Close all syntax-bearing gaps between omni's MySQL parser and MySQL 8.0, with oracle-verified correctness Verification: go test ./mysql/parser/... — each scenario has a parse test (valid SQL parses, correct AST) or error test (invalid SQL rejected) Reference sources: MySQL 8.0 docs (https://dev.mysql.com/doc/refman/8.0/en/), real MySQL 8.0 via testcontainers

Status: [ ] pending, [x] passing, [~] partial (needs upstream change)


Phase 1: Keyword & Type Foundation

1.1 Missing Type Synonyms — Numeric

These are MySQL 8.0 numeric type aliases that the parser should accept and map to canonical types.

  • CREATE TABLE t (a REAL) — parses, maps to DOUBLE
  • CREATE TABLE t (a REAL(10,2)) — parses with precision, maps to DOUBLE(10,2)
  • CREATE TABLE t (a REAL UNSIGNED) — parses with UNSIGNED modifier
  • CREATE TABLE t (a DEC) — parses, maps to DECIMAL
  • CREATE TABLE t (a DEC(10,2)) — parses with precision, maps to DECIMAL(10,2)
  • CREATE TABLE t (a DEC UNSIGNED) — parses with UNSIGNED modifier
  • CREATE TABLE t (a FIXED) — parses, maps to DECIMAL
  • CREATE TABLE t (a FIXED(10,2)) — parses with precision, maps to DECIMAL(10,2)
  • CREATE TABLE t (a FIXED UNSIGNED ZEROFILL) — parses with all modifiers
  • CREATE TABLE t (a INT1) — parses, maps to TINYINT
  • CREATE TABLE t (a INT2) — parses, maps to SMALLINT
  • CREATE TABLE t (a INT3) — parses, maps to MEDIUMINT
  • CREATE TABLE t (a INT4) — parses, maps to INT
  • CREATE TABLE t (a INT8) — parses, maps to BIGINT
  • CREATE TABLE t (a MIDDLEINT) — parses, maps to MEDIUMINT
  • CREATE TABLE t (a FLOAT4) — parses, maps to FLOAT
  • CREATE TABLE t (a FLOAT8) — parses, maps to DOUBLE
  • CREATE TABLE t (a DOUBLE PRECISION) — existing synonym still works (no regression)

1.2 Missing Type Synonyms — String & Binary

  • CREATE TABLE t (a LONG) — parses, maps to MEDIUMTEXT
  • CREATE TABLE t (a LONG VARCHAR) — parses, maps to MEDIUMTEXT
  • CREATE TABLE t (a LONG VARBINARY) — parses, maps to MEDIUMBLOB
  • CREATE TABLE t (a LONG CHARACTER SET utf8mb4) — LONG with charset
  • CREATE TABLE t (a LONG VARCHAR COLLATE utf8mb4_general_ci) — LONG VARCHAR with collation
  • CREATE TABLE t (a TEXT(1000)) — TEXT with optional length (existing, regression check)
  • CREATE TABLE t (a NATIONAL CHAR(10)) — existing synonym (regression check)
  • CREATE TABLE t (a NCHAR(10)) — existing synonym (regression check)
  • CREATE TABLE t (a NVARCHAR(100)) — existing synonym (regression check)

1.3 SIGNED Modifier

  • CREATE TABLE t (a INT SIGNED) — parses, SIGNED accepted as modifier
  • CREATE TABLE t (a BIGINT SIGNED) — parses on BIGINT
  • CREATE TABLE t (a TINYINT SIGNED) — parses on TINYINT
  • CREATE TABLE t (a SMALLINT SIGNED) — parses on SMALLINT
  • CREATE TABLE t (a MEDIUMINT SIGNED) — parses on MEDIUMINT
  • CREATE TABLE t (a TINYINT SIGNED NOT NULL) — SIGNED followed by column constraints
  • CREATE TABLE t (a DECIMAL SIGNED) — oracle-verify: check if MySQL 8.0 accepts or rejects this
  • SELECT CAST(x AS SIGNED) — still works (no regression)
  • SELECT CAST(x AS SIGNED INTEGER) — still works (no regression)

1.4 Reserved Word Registration

Keywords that MySQL 8.0 reserves and that affect parsing when used in SQL.

  • SELECT 1 FROM DUAL — DUAL as keyword in FROM clause
  • PARTITION p1 VALUES LESS THAN (MAXVALUE) in CREATE TABLE — MAXVALUE recognized
  • SELECT GROUPING(a) FROM t GROUP BY a WITH ROLLUP — GROUPING function
  • SELECT VALUE FROM t — VALUE as reserved word (verify behavior with MySQL 8.0)

1.5 Numeric Type Shorthands in Combinations

  • CREATE TABLE t (a INT1 UNSIGNED ZEROFILL) — full modifier chain on shorthand
  • CREATE TABLE t (a INT4(11) UNSIGNED) — display width on shorthand
  • ALTER TABLE t ADD COLUMN b REAL DEFAULT 0.0 — type synonym in ALTER TABLE context
  • ALTER TABLE t MODIFY c DEC(10,2) NOT NULL — type synonym in MODIFY COLUMN
  • CREATE TABLE t (a FLOAT4, b FLOAT8, c INT1, d INT2, e INT3, f INT4, g INT8) — all shorthands in one table
  • CREATE TABLE t (a SERIAL) — SERIAL type (existing, regression check)

Phase 2: Statement Syntax Gaps

2.1 ALTER TABLE PARTITION BY

  • ALTER TABLE t PARTITION BY HASH(id) PARTITIONS 4 — basic hash repartition
  • ALTER TABLE t PARTITION BY KEY(id) PARTITIONS 4 — key repartition
  • ALTER TABLE t PARTITION BY RANGE(id) (PARTITION p0 VALUES LESS THAN (100), PARTITION p1 VALUES LESS THAN (MAXVALUE)) — range repartition
  • ALTER TABLE t PARTITION BY LIST(status) (PARTITION p0 VALUES IN (1,2), PARTITION p1 VALUES IN (3,4)) — list repartition
  • ALTER TABLE t PARTITION BY RANGE COLUMNS(created_at) (PARTITION p0 VALUES LESS THAN ('2024-01-01')) — range columns
  • ALTER TABLE t PARTITION BY LINEAR HASH(id) PARTITIONS 8 — linear hash
  • ALTER TABLE t PARTITION BY HASH(id) PARTITIONS 4 SUBPARTITION BY KEY(name) SUBPARTITIONS 2 — with subpartition
  • ALTER TABLE t PARTITION BY KEY ALGORITHM=2 (id) PARTITIONS 4 — KEY with algorithm
  • ALTER TABLE t PARTITION BY HASH(id) PARTITIONS 4, ALGORITHM=INPLACE — combined with ALTER TABLE options
  • ALTER TABLE t PARTITION BY — rejected: missing partition specification
  • ALTER TABLE t REMOVE PARTITIONING — existing operation (regression check)

2.2 Window Function Names as Reserved Words

MySQL 8.0 reserves these as keywords. Parser must accept them as function names in window expressions.

  • SELECT RANK() OVER (ORDER BY score DESC) FROM t — RANK
  • SELECT DENSE_RANK() OVER (ORDER BY score DESC) FROM t — DENSE_RANK
  • SELECT ROW_NUMBER() OVER (ORDER BY id) FROM t — ROW_NUMBER
  • SELECT NTILE(4) OVER (ORDER BY id) FROM t — NTILE with argument
  • SELECT LAG(val, 1) OVER (ORDER BY id) FROM t — LAG with offset
  • SELECT LEAD(val, 1, 0) OVER (ORDER BY id) FROM t — LEAD with offset and default
  • SELECT FIRST_VALUE(val) OVER (ORDER BY id) FROM t — FIRST_VALUE
  • SELECT LAST_VALUE(val) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM t — LAST_VALUE with frame
  • SELECT NTH_VALUE(val, 2) OVER (ORDER BY id) FROM t — NTH_VALUE
  • SELECT PERCENT_RANK() OVER (ORDER BY score) FROM t — PERCENT_RANK
  • SELECT CUME_DIST() OVER (ORDER BY score) FROM t — CUME_DIST

2.3 Interval Unit Keywords

MySQL reserves interval units for INTERVAL expr unit expressions and temporal arithmetic.

  • SELECT DATE_ADD(d, INTERVAL 1 DAY) FROM t — DAY
  • SELECT DATE_ADD(d, INTERVAL 1 HOUR) FROM t — HOUR
  • SELECT DATE_ADD(d, INTERVAL 1 MINUTE) FROM t — MINUTE
  • SELECT DATE_ADD(d, INTERVAL 1 SECOND) FROM t — SECOND
  • SELECT DATE_ADD(d, INTERVAL 1 MONTH) FROM t — MONTH
  • SELECT DATE_ADD(d, INTERVAL 1 WEEK) FROM t — WEEK
  • SELECT DATE_ADD(d, INTERVAL 1 QUARTER) FROM t — QUARTER
  • SELECT DATE_ADD(d, INTERVAL 1 YEAR) FROM t — YEAR (already a keyword)
  • SELECT DATE_ADD(d, INTERVAL 1 MICROSECOND) FROM t — MICROSECOND
  • SELECT DATE_ADD(d, INTERVAL '1:30' HOUR_MINUTE) FROM t — compound HOUR_MINUTE
  • SELECT DATE_ADD(d, INTERVAL '1 1:30:00' DAY_SECOND) FROM t — compound DAY_SECOND
  • SELECT DATE_ADD(d, INTERVAL '1-6' YEAR_MONTH) FROM t — compound YEAR_MONTH
  • SELECT DATE_ADD(d, INTERVAL '1:30:00.5' HOUR_MICROSECOND) FROM t — compound HOUR_MICROSECOND
  • SELECT DATE_ADD(d, INTERVAL '1 12' DAY_HOUR) FROM t — compound DAY_HOUR
  • SELECT DATE_ADD(d, INTERVAL '1 12:30' DAY_MINUTE) FROM t — compound DAY_MINUTE
  • SELECT DATE_ADD(d, INTERVAL '30:00' MINUTE_SECOND) FROM t — compound MINUTE_SECOND
  • SELECT DATE_ADD(d, INTERVAL '30:00.5' MINUTE_MICROSECOND) FROM t — compound MINUTE_MICROSECOND
  • SELECT DATE_ADD(d, INTERVAL '59.999999' SECOND_MICROSECOND) FROM t — compound SECOND_MICROSECOND
  • SELECT DATE_ADD(d, INTERVAL '1:30:59' HOUR_SECOND) FROM t — compound HOUR_SECOND
  • SELECT d + INTERVAL 1 DAY FROM t — interval in arithmetic expression
  • SELECT d - INTERVAL 1 MONTH FROM t — interval subtraction

2.4 UTC Temporal Functions

MySQL 8.0 reserves UTC_DATE, UTC_TIME, UTC_TIMESTAMP as no-argument function calls (no parentheses required).

  • SELECT UTC_DATE — no-parens form
  • SELECT UTC_DATE() — with-parens form
  • SELECT UTC_TIME — no-parens form
  • SELECT UTC_TIME() — with-parens form
  • SELECT UTC_TIME(3) — with fsp argument
  • SELECT UTC_TIMESTAMP — no-parens form
  • SELECT UTC_TIMESTAMP() — with-parens form
  • SELECT UTC_TIMESTAMP(6) — with fsp argument

Phase 3: Oracle Corpus Verification

Oracle-verified corpus: each scenario is a concrete SQL statement sent to both MySQL 8.0 (via testcontainers) and omni parser, asserting matching accept/reject decisions.

3.1 Oracle Corpus — Integer Types × Modifiers

Each scenario is a CREATE TABLE with one column exercising a specific type+modifier combination.

  • CREATE TABLE t (a INT) — baseline
  • CREATE TABLE t (a INT UNSIGNED) — unsigned
  • CREATE TABLE t (a INT SIGNED) — signed (redundant but valid)
  • CREATE TABLE t (a INT(11)) — display width
  • CREATE TABLE t (a INT UNSIGNED ZEROFILL) — unsigned + zerofill
  • CREATE TABLE t (a TINYINT SIGNED) — tinyint signed
  • CREATE TABLE t (a TINYINT UNSIGNED ZEROFILL) — tinyint unsigned zerofill
  • CREATE TABLE t (a SMALLINT(5) UNSIGNED) — smallint with width
  • CREATE TABLE t (a MEDIUMINT SIGNED) — mediumint signed
  • CREATE TABLE t (a BIGINT(20) UNSIGNED) — bigint with width
  • CREATE TABLE t (a INT1 UNSIGNED) — shorthand + modifier
  • CREATE TABLE t (a INT8 SIGNED) — shorthand + signed

3.2 Oracle Corpus — Decimal & Float Types

  • CREATE TABLE t (a DECIMAL) — bare decimal
  • CREATE TABLE t (a DECIMAL(10)) — single precision
  • CREATE TABLE t (a DECIMAL(10,2)) — full precision
  • CREATE TABLE t (a DECIMAL(10,2) UNSIGNED) — oracle-verify unsigned decimal
  • CREATE TABLE t (a NUMERIC(10,2)) — numeric synonym
  • CREATE TABLE t (a DEC(10,2)) — dec synonym
  • CREATE TABLE t (a FIXED(10,2)) — fixed synonym
  • CREATE TABLE t (a FLOAT) — bare float
  • CREATE TABLE t (a FLOAT(10,2)) — float with precision
  • CREATE TABLE t (a FLOAT(24)) — single float precision
  • CREATE TABLE t (a FLOAT(25)) — becomes double precision
  • CREATE TABLE t (a DOUBLE) — double
  • CREATE TABLE t (a DOUBLE PRECISION) — double precision synonym
  • CREATE TABLE t (a REAL) — real synonym
  • CREATE TABLE t (a FLOAT4) — float4 synonym
  • CREATE TABLE t (a FLOAT8) — float8 synonym

3.3 Oracle Corpus — String & Binary Types

  • CREATE TABLE t (a CHAR(10)) — fixed-length string
  • CREATE TABLE t (a CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci) — with charset + collation
  • CREATE TABLE t (a VARCHAR(255)) — variable string
  • CREATE TABLE t (a TEXT) — text
  • CREATE TABLE t (a TEXT(1000)) — text with length
  • CREATE TABLE t (a TINYTEXT) — tinytext
  • CREATE TABLE t (a MEDIUMTEXT CHARACTER SET latin1) — mediumtext with charset
  • CREATE TABLE t (a LONGTEXT) — longtext
  • CREATE TABLE t (a LONG) — long synonym
  • CREATE TABLE t (a LONG VARCHAR) — long varchar synonym
  • CREATE TABLE t (a BINARY(16)) — fixed binary
  • CREATE TABLE t (a VARBINARY(255)) — variable binary
  • CREATE TABLE t (a BLOB) — blob
  • CREATE TABLE t (a BLOB(1000)) — blob with length
  • CREATE TABLE t (a TINYBLOB) — tinyblob
  • CREATE TABLE t (a MEDIUMBLOB) — mediumblob
  • CREATE TABLE t (a LONGBLOB) — longblob
  • CREATE TABLE t (a LONG VARBINARY) — long varbinary synonym
  • CREATE TABLE t (a NATIONAL CHAR(10)) — national char
  • CREATE TABLE t (a NCHAR(10)) — nchar synonym
  • CREATE TABLE t (a NVARCHAR(100)) — nvarchar synonym

3.4 Oracle Corpus — Date/Time, JSON, Spatial & Special Types

  • CREATE TABLE t (a DATE) — date
  • CREATE TABLE t (a TIME) — time
  • CREATE TABLE t (a TIME(3)) — time with fsp
  • CREATE TABLE t (a DATETIME) — datetime
  • CREATE TABLE t (a DATETIME(6)) — datetime with fsp
  • CREATE TABLE t (a TIMESTAMP) — timestamp
  • CREATE TABLE t (a TIMESTAMP(3)) — timestamp with fsp
  • CREATE TABLE t (a YEAR) — year
  • CREATE TABLE t (a BIT(8)) — bit
  • CREATE TABLE t (a BOOL) — boolean
  • CREATE TABLE t (a JSON) — json
  • CREATE TABLE t (a SERIAL) — serial (auto_increment bigint)
  • CREATE TABLE t (a ENUM('a','b','c')) — enum
  • CREATE TABLE t (a SET('x','y','z')) — set
  • CREATE TABLE t (a GEOMETRY) — geometry
  • CREATE TABLE t (a POINT) — point
  • CREATE TABLE t (a LINESTRING) — linestring
  • CREATE TABLE t (a POLYGON) — polygon
  • CREATE TABLE t (a GEOMETRYCOLLECTION) — geometrycollection

3.5 Oracle Corpus — Window Functions

  • SELECT RANK() OVER w FROM t WINDOW w AS (ORDER BY id) — named window
  • SELECT ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) FROM t — partition by
  • SELECT LAG(val) OVER (ORDER BY id) FROM t — lag without offset (default 1)
  • SELECT LAG(val, 2, 'N/A') OVER (ORDER BY id) FROM t — lag with all args
  • SELECT NTH_VALUE(val, 3) FROM FIRST OVER (ORDER BY id) FROM t — with FROM FIRST
  • SELECT SUM(val) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM t — aggregate as window
  • SELECT DENSE_RANK() OVER (ORDER BY score), PERCENT_RANK() OVER (ORDER BY score) FROM t — multiple window functions

3.6 Oracle Corpus — Interval Expressions

  • SELECT NOW() + INTERVAL 1 DAY — interval in expression
  • SELECT NOW() - INTERVAL 2 HOUR — interval subtraction
  • SELECT DATE_ADD('2024-01-01', INTERVAL 1 MONTH) — with literal date
  • SELECT DATE_SUB(NOW(), INTERVAL 30 MINUTE) — date_sub
  • SELECT TIMESTAMPADD(MINUTE, 30, NOW()) — timestampadd form
  • [~] SELECT EXTRACT(HOUR FROM NOW()) — extract — MISMATCH: omni rejects, MySQL accepts (parser lacks EXTRACT support)
  • SELECT * FROM t WHERE created_at > NOW() - INTERVAL 7 DAY — in WHERE clause
  • SELECT * FROM t WHERE created_at BETWEEN NOW() - INTERVAL 1 MONTH AND NOW() — interval in BETWEEN

3.7 Oracle Corpus — Rejected SQL

Verify omni matches MySQL 8.0 rejection behavior.

  • CREATE TABLE t (a VARCHAR UNSIGNED) — rejected: UNSIGNED on string type
  • CREATE TABLE t (a TEXT ZEROFILL) — rejected: ZEROFILL on text type
  • CREATE TABLE t (a JSON UNSIGNED) — rejected: UNSIGNED on JSON
  • [~] CREATE TABLE select (a INT) — MISMATCH: omni accepts, MySQL rejects (parser too lenient with reserved words)
  • [~] CREATE TABLE t (select INT) — MISMATCH: omni accepts, MySQL rejects (parser too lenient with reserved words)
  • ALTER TABLE t PARTITION BY — rejected: incomplete partition spec
  • ALTER TABLE t PARTITION BY RANGE(id) — rejected: missing partition definitions for RANGE
  • [~] SELECT DATE_ADD(d, INTERVAL 1 INVALID_UNIT) FROM t — MISMATCH: omni accepts, MySQL rejects (parser doesn't validate interval units)