Add the official queries of TPC-H #86

Closed
minhancao wants to merge 13 commits into prestodb:main from minhancao:tpch_official_queries

@minhancao
Contributor

Add the official TPC-H queries. Their column names use a different syntax from the Presto-generated ones: the official names use _ instead of . (e.g., l_returnflag vs. l.returnflag).

rzIBM and others added 12 commits March 17, 2026 14:08
)

* add new version of stream run for 10TB
* add README.md in queries_v2
* add hyperlink to TPCDS_FIXES_SUMMARY_PRESTO.md
* move README location
)

Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.36.0 to 0.45.0.
- [Commits](golang/crypto@v0.36.0...v0.45.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.45.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…db#80)

Bumps [filippo.io/edwards25519](https://github.com/FiloSottile/edwards25519) from 1.1.0 to 1.1.1.
- [Commits](FiloSottile/edwards25519@v1.1.0...v1.1.1)

---
updated-dependencies:
- dependency-name: filippo.io/edwards25519
  dependency-version: 1.1.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
- Fix data race: remove concurrent write to pseudoStage.States.RunStartTime
  inside syncedTime callback in loadjson (line 118 sets it after goroutines finish)
- Guard saveQueryJsonFile against empty QueryId to avoid calling GetQueryInfo("")
- Fix zerolog field chain in runShellScripts: assign result back to logEntry so
  stdout, stderr, exit_code, and stage fields are actually emitted
- Remove dead ValidateRequiredFlags() call in queryplan (no required flags exist)
- Log os.Remove error in genconfig stale file cleanup
- Switch FileBasedRunRecorder to encoding/csv.Writer to properly escape fields
  and log write errors instead of silently discarding them
- Replace shared package-level RunsValueOne/RunsValueZero with intPtr() helper
  to avoid aliasing risk where mutation would affect all stages
- Use errors.New instead of fmt.Errorf("%s", ...) in loadjson
- Replace custom fileNameWithoutPathAndExt with filepath.Base + filepath.Ext
- Check handleQueryError return value for SELECT COUNT(*) in table_summary
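
The aliasing risk mentioned in the intPtr() item above can be illustrated with a minimal sketch. The stage type and the sharedOne variable are hypothetical stand-ins, not the repository's actual definitions; only the intPtr name comes from the commit note, and its body here is an assumption.

```go
package main

import "fmt"

// stage is a hypothetical stand-in for a benchmark stage with an optional run count.
type stage struct {
	Name string
	Runs *int
}

// sharedOne mimics a package-level value whose address is handed to many stages.
var sharedOne = 1

// intPtr returns a pointer to a fresh copy of v, so no two callers share storage.
// Assumed shape of the helper; the real one may differ.
func intPtr(v int) *int {
	return &v
}

func main() {
	// Aliased: both stages point at the same package-level variable.
	a := stage{Name: "a", Runs: &sharedOne}
	b := stage{Name: "b", Runs: &sharedOne}
	*a.Runs = 5
	fmt.Println(*b.Runs) // prints 5: mutating a also changed b

	// Independent: each stage gets its own copy.
	c := stage{Name: "c", Runs: intPtr(1)}
	d := stage{Name: "d", Runs: intPtr(1)}
	*c.Runs = 5
	fmt.Println(*d.Runs) // prints 1: d is unaffected
}
```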
…stodb#81)

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Replace the previous run-specific delete with a generic cleanup that removes orphaned rows from five Presto query metadata tables (presto_query_creation_info, presto_query_operator_stats, presto_query_plans, presto_query_stage_stats, presto_query_statistics). Each DELETE uses a LEFT JOIN to presto_benchmarks.pbench_queries and removes rows where p.query_id IS NULL, ensuring metadata not referenced by pbench_queries is purged. This replaces the prior ad-hoc delete targeting r.run_id IN (2833).
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
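
As a rough sketch of the orphan cleanup described in this commit, the following builds one DELETE per metadata table. The table names and the p.query_id IS NULL condition come from the commit message; the MySQL-style multi-table DELETE form, the m alias, and the assumption that every metadata table has a query_id column are illustrative guesses, not the actual script.

```go
package main

import "fmt"

// metadataTables lists the five tables named in the commit message.
var metadataTables = []string{
	"presto_query_creation_info",
	"presto_query_operator_stats",
	"presto_query_plans",
	"presto_query_stage_stats",
	"presto_query_statistics",
}

// orphanDeletes builds one MySQL-style multi-table DELETE per metadata table.
// Assumption: each table has a query_id column matching pbench_queries.query_id.
func orphanDeletes() []string {
	stmts := make([]string, 0, len(metadataTables))
	for _, t := range metadataTables {
		stmts = append(stmts, fmt.Sprintf(
			"DELETE m FROM %s AS m "+
				"LEFT JOIN presto_benchmarks.pbench_queries AS p ON m.query_id = p.query_id "+
				"WHERE p.query_id IS NULL", t))
	}
	return stmts
}

func main() {
	// Print the statements for review instead of executing them.
	for _, s := range orphanDeletes() {
		fmt.Println(s + ";")
	}
}
```

Printing the statements rather than running them keeps the sketch safe to try; rows with no matching pbench_queries entry are the ones the LEFT JOIN leaves with a NULL p.query_id.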
…ntext propagation

- Quote SQL identifiers in table_summary.go to prevent injection via adversarial names
- Handle RowsAffected errors in mysql_run_recorder.go instead of discarding
- Check csv.Writer.Write errors in run_recorder.go
- Propagate parent context in GetCtxWithTimeout instead of using context.Background()
- Add dedicated HTTP client with 30s timeout for Pulumi API calls
- Handle JSON null in Float64Time.UnmarshalJSON
- Log queryOutputFile.Close() errors in stage.go
- Remove redundant continue in unmarshaller.go pointer loop
- Add tests for sqlIdent, Float64Time null, GetCtxWithTimeout, FileBasedRunRecorder
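
The identifier-quoting item above can be sketched as follows. The quoteIdent name and body are hypothetical and are not the repository's sqlIdent; the sketch assumes ANSI-style double-quoted identifiers, where an embedded double quote is escaped by doubling it.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps name in double quotes and doubles any embedded double quote,
// so the whole string is treated as a single identifier.
// Hypothetical helper for illustration only.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

func main() {
	// An adversarial table name that tries to break out of the identifier.
	evil := `t"; DROP TABLE x; --`
	fmt.Println("SELECT COUNT(*) FROM " + quoteIdent(evil))
	fmt.Println("SELECT COUNT(*) FROM " + quoteIdent("lineitem"))
}
```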
The query_json package was renamed to queryjson in v2.1.1.
Updated all import paths and package qualifiers across 6 files.
…erent syntax compared to the Presto generated ones, difference is column names have _ instead of . (ex: l_returnflag vs l.returnflag)
@minhancao minhancao force-pushed the tpch_official_queries branch from 3d1e7a1 to 8689ad2 Compare April 7, 2026 21:04
@ethanyzhang
Collaborator

@minhancao What about the existing TPC-H queries? Should we remove them?

@minhancao
Contributor Author

@ethanyzhang No, I don't think we should remove the existing TPC-H queries, as they are used and run in our internal weekly TPC-H cron job.

@ethanyzhang
Collaborator

@minhancao What's the difference?

@minhancao
Contributor Author

> @minhancao What's the difference?

@ethanyzhang The syntax of the column names is different. In the official TPC-H schema (tpchstandard catalog), the column names use _, whereas the Presto syntax uses .

Ex:
Presto's TPCH Q1:

--TPCH Q1
SELECT
    l.returnflag,
    l.linestatus,
    sum(l.quantity)                                       AS sum_qty,
    sum(l.extendedprice)                                  AS sum_base_price,
    sum(l.extendedprice * (1 - l.discount))               AS sum_disc_price,
    sum(l.extendedprice * (1 - l.discount) * (1 + l.tax)) AS sum_charge,
    avg(l.quantity)                                       AS avg_qty,
    avg(l.extendedprice)                                  AS avg_price,
    avg(l.discount)                                       AS avg_disc,
    count(*)                                              AS count_order
FROM
    lineitem AS l
WHERE
    l.shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
GROUP BY
    l.returnflag,
    l.linestatus
ORDER BY
    l.returnflag,
    l.linestatus;

TPCH official/standard Q1:

--TPCH Q1
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity)                                       AS sum_qty,
    sum(l_extendedprice)                                  AS sum_base_price,
    sum(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    avg(l_quantity)                                       AS avg_qty,
    avg(l_extendedprice)                                  AS avg_price,
    avg(l_discount)                                       AS avg_disc,
    count(*)                                              AS count_order
FROM
    lineitem AS l
WHERE
    l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
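
To make the naming difference concrete, here is a small sketch that maps a table name plus a Presto-style simplified column name to the standard TPC-H column name. The prefixes are the standard TPC-H ones; the standardColumn helper is hypothetical, and it assumes the simplified name is exactly the standard name with the table prefix removed.

```go
package main

import "fmt"

// tpchPrefix maps each TPC-H table to the prefix its standard column names carry.
var tpchPrefix = map[string]string{
	"lineitem": "l_",
	"orders":   "o_",
	"customer": "c_",
	"part":     "p_",
	"partsupp": "ps_",
	"supplier": "s_",
	"nation":   "n_",
	"region":   "r_",
}

// standardColumn returns the standard TPC-H column name for a simplified one.
// Hypothetical helper for illustration only.
func standardColumn(table, column string) (string, error) {
	prefix, ok := tpchPrefix[table]
	if !ok {
		return "", fmt.Errorf("unknown TPC-H table %q", table)
	}
	return prefix + column, nil
}

func main() {
	name, err := standardColumn("lineitem", "returnflag")
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // prints l_returnflag
}
```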

@ethanyzhang
Collaborator

Can you add those to https://github.com/prestodb/pbench-benchmarks?

@ethanyzhang
Collaborator

Please add a description of the difference to the commit when moving this there.


3 participants