While going through the Public BI Benchmark, I found multiple queries with HAVING clauses but no GROUP BY columns (e.g., [1, 2, 3]). I'd like to discuss whether we want to support such statements or not.
Related to #186.
Consider the following query [1]:
SELECT SUM(CAST("TrainsUK2_2"."Number of Records" AS BIGINT)) AS "sum:Number of Records:ok"
FROM "TrainsUK2_2"
HAVING (COUNT(1) > 0)
According to DBFiddle [4], Postgres executes the query*, whereas we raise a syntax error due to seeing HAVING without GROUP BY columns. However, most of the HAVING criteria seem weird, as they consist of a a term always evaluating to true (see example above [1]) or false (see below [5]), assuming that the numbers passed th the aggregate functions do not refer to column IDs.
HAVING ((SUM(1) >= 30) AND (SUM(1) <= 100000))
If we want to adapt here, we could add a rule to the parser and change
opt_group : GROUP BY expr_list opt_having {
$$ = new GroupByDescription();
$$->columns = $3;
$$->having = $4;
}
| /* empty */ { $$ = nullptr; };
to
opt_group : GROUP BY expr_list opt_having {
$$ = new GroupByDescription();
$$->columns = $3;
$$->having = $4;
}
| HAVING expr {
$$ = new GroupByDescription();
$$->having = $2;
}
| /* empty */ { $$ = nullptr; };
Additionally, I noticed that the escaped column names in the example [1] do not work properly: "Number of Records" is not considered as one single identifier and "sum:Number of Records:ok" does not parse because sum is mistakenly treated as token for the SUM(...)aggregate function.
[1] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/TrainsUK2/queries/17.sql
[2] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/MLB/queries/103.sql
[3] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/Wins/queries/10.sql
[4] https://www.db-fiddle.com/
[5] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/Hatred/queries/12.sql
* SQL for reproduction:
CREATE TABLE "TrainsUK2_2"("Number of Records" smallint NOT NULL);
INSERT INTO "TrainsUK2_2" VALUES(-1);
SELECT SUM(CAST("TrainsUK2_2"."Number of Records" AS BIGINT)) AS "sum:Number of Records:ok" FROM "TrainsUK2_2" HAVING (COUNT(1) > 0);
Result:
| sum:Number of Records:ok |
| -1 |
While going through the Public BI Benchmark, I found multiple queries with
HAVINGclauses but noGROUP BYcolumns (e.g., [1, 2, 3]). I'd like to discuss whether we want to support such statements or not.Related to #186.
Consider the following query [1]:
According to DBFiddle [4], Postgres executes the query*, whereas we raise a syntax error due to seeing
HAVINGwithoutGROUP BYcolumns. However, most of theHAVINGcriteria seem weird, as they consist of a a term always evaluating totrue(see example above [1]) orfalse(see below [5]), assuming that the numbers passed th the aggregate functions do not refer to column IDs.If we want to adapt here, we could add a rule to the parser and change
opt_group : GROUP BY expr_list opt_having { $$ = new GroupByDescription(); $$->columns = $3; $$->having = $4; } | /* empty */ { $$ = nullptr; };to
opt_group : GROUP BY expr_list opt_having { $$ = new GroupByDescription(); $$->columns = $3; $$->having = $4; } | HAVING expr { $$ = new GroupByDescription(); $$->having = $2; } | /* empty */ { $$ = nullptr; };Additionally, I noticed that the escaped column names in the example [1] do not work properly:
"Number of Records"is not considered as one single identifier and"sum:Number of Records:ok"does not parse becausesumis mistakenly treated as token for theSUM(...)aggregate function.[1] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/TrainsUK2/queries/17.sql
[2] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/MLB/queries/103.sql
[3] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/Wins/queries/10.sql
[4] https://www.db-fiddle.com/
[5] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/Hatred/queries/12.sql
* SQL for reproduction:
Result: