Make SpatialBench comparison apples-to-apples: baseline provenance, I/O condition, and run protocol #2

Description

@james-willis

Thanks for publishing the bench harness; the PyCanopy-side timing is well done. The issues below are about comparability to the baseline. I'm an Apache Sedona/SpatialBench contributor.

  1. The baseline numbers are mis-sourced. bench/spatial_bench/utils.py attributes the whole PUBLISHED dict to "Actions run #152 (2026-06-20)." Diffing against the actual artifacts:
  • SF10 matches run #152 exactly. ✅
  • SF1 does not match #152; it matches run #153.
     Also, both cited runs are the CI path, not m7i.2xlarge. Runs #152/#153 are ubuntu-latest GitHub Actions runs: data downloaded from HuggingFace to local disk, 3-run average, 600s timeout, multiple concurrent queries. That is the path our docs explicitly flag as "not for performance comparison." The code comment labels them m7i.2xlarge … 600s, which is incorrect. It would be most appropriate to compare against the published m7i single-node results (docs/single-node-benchmarks.md: S3-direct, cold start, 1200s, 1 query at a time). TBQH, I don't remember how recently our published results were generated, so they might not reflect the latest versions of the libraries there.
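For anyone who wants to re-check the provenance, the diff in item 1 can be sketched roughly as below. This is only an illustration: the dict shapes, query names, run labels, and timings are placeholders I made up, not the real contents of PUBLISHED or of the run artifacts.

```python
from math import isclose


def matching_runs(published, artifacts, rel_tol=1e-9):
    """Return the run IDs whose per-query timings equal `published` for every query."""
    matches = []
    for run_id, timings in artifacts.items():
        same_queries = timings.keys() == published.keys()
        if same_queries and all(
            isclose(published[q], timings[q], rel_tol=rel_tol) for q in published
        ):
            matches.append(run_id)
    return matches


# Placeholder values for illustration only; these are NOT real benchmark numbers.
published_sf1 = {"q1": 1.0, "q2": 2.0}
artifacts_sf1 = {
    "run-A": {"q1": 1.5, "q2": 2.5},
    "run-B": {"q1": 1.0, "q2": 2.0},
}

print(matching_runs(published_sf1, artifacts_sf1))
```

Running this per scale factor makes it easy to record the matching run next to each block of the dict instead of one attribution for the whole thing.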

  2. Match the I/O condition. The published m7i baseline reads cold from S3 with no prewarm. PyCanopy predownloads to local EBS (aws s3 sync in bootstrap.sh) and prewarms the page cache (warm_tables() in utils.py), so the timed load always hits warm RAM. To match, point --data-dir at the s3:// URI and drop both steps — the code already supports it (_resolve_table() handles s3://, warm_tables() no-ops on it). (The warm_tables docstring claims it matches "the resident-data condition of the published baseline," but the m7i baseline is S3-direct/non-resident.)
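To make the I/O condition concrete, here is a minimal sketch of the behaviour described above: local paths get read through once to populate the page cache, while an s3:// data dir is left untouched. The function names are hypothetical stand-ins (this is not the actual _resolve_table() or warm_tables() code), and the bucket and table names are placeholders.

```python
from pathlib import Path


def is_remote(data_dir: str) -> bool:
    """True when the data dir is an S3 URI rather than a local path."""
    return data_dir.startswith("s3://")


def resolve_table_sketch(data_dir: str, table: str) -> str:
    """Hypothetical resolver: join the table name onto either an S3 URI or a local dir."""
    if is_remote(data_dir):
        return f"{data_dir.rstrip('/')}/{table}"
    return str(Path(data_dir) / table)


def warm_tables_sketch(data_dir: str, tables) -> int:
    """Read local table files once to warm the page cache; no-op for s3://.

    Returns the number of bytes read, so 0 means nothing was prewarmed.
    """
    if is_remote(data_dir):
        return 0  # cold-from-S3 condition: nothing is made resident
    total = 0
    for table in tables:
        root = Path(resolve_table_sketch(data_dir, table))
        files = [root] if root.is_file() else sorted(
            p for p in root.rglob("*") if p.is_file()
        )
        for path in files:
            with open(path, "rb") as fh:
                while chunk := fh.read(1 << 20):  # 1 MiB reads
                    total += len(chunk)
    return total


# Placeholder bucket and table name, for illustration only.
print(warm_tables_sketch("s3://example-bucket/sf1", ["trip"]))
```

Logging the returned byte count alongside each result would also make the cold vs warm condition visible in the output.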

  3. Match the run protocol + disclose. SpatialBench reports a 3-run average; measure_query() takes a single run. Either average at least 3 runs or document that the numbers are single-run. Use a 1200s timeout to match the m7i baseline, and state versions, hardware, and the cold-S3 vs warm-local condition in bench/README.md.
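A rough sketch of what a multi-run variant could look like is below. The name measure_query_avg and its parameters are hypothetical, not the existing measure_query() signature. Note that this version only checks the timeout after each run finishes; it does not interrupt a long-running query, which a real harness would presumably want to do.

```python
import time
from statistics import mean


def measure_query_avg(run_query, runs=3, timeout_s=1200.0, clock=time.perf_counter):
    """Time `run_query` `runs` times and return the mean wall-clock seconds.

    Returns None if any single run takes longer than `timeout_s`.
    The timeout is checked after the fact; the query is not cancelled.
    """
    durations = []
    for _ in range(runs):
        start = clock()
        run_query()
        elapsed = clock() - start
        if elapsed > timeout_s:
            return None
        durations.append(elapsed)
    return mean(durations)


# Usage with a trivial stand-in query.
avg = measure_query_avg(lambda: sum(range(1000)))
print(f"3-run average: {avg:.6f}s")
```

The injectable clock is only there so the averaging logic can be tested without real sleeps.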

Net: run PyCanopy on m7i.2xlarge reading cold from S3, 3-run average, 1200s, against the published m7i table — or be explicit that it's a warm-local single-run on different hardware than the cited baseline. Happy to review a follow-up run.
