Summary
Create comprehensive benchmarks comparing all cursor types, with a focus on:
- Result set fetching performance
- Memory usage with different chunk options
- Comparison with AWS Wrangler
Background
Related issues have highlighted the need for clearer performance guidance:
The existing benchmarks (benchmarks/20180915/, benchmarks/20220201/) are outdated and don't cover the full range of cursor options now available.
Benchmark Scope
Cursors to Test
- Cursor / DictCursor
- PandasCursor (with/without chunksize)
- ArrowCursor (arraysize, unload options)
- PolarsCursor (with/without chunksize)
- S3FSCursor
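The cursor list above could be expanded into a benchmark matrix along the following lines. This is only a sketch: the cursor names and option keys (`chunksize`, `arraysize`, `unload`) come from the list, but the option values, the dataset size labels, and the `build_cases` helper are hypothetical placeholders, and cursors are referenced by name so the block runs without PyAthena installed.

```python
from itertools import product

# Cursor names and option keys are taken from the list above.
# The option values are placeholders, not tuned recommendations.
CURSOR_OPTIONS = {
    "Cursor": [{}],
    "DictCursor": [{}],
    "PandasCursor": [{}, {"chunksize": 100_000}],
    "ArrowCursor": [{}, {"arraysize": 10_000}, {"unload": True}],
    "PolarsCursor": [{}, {"chunksize": 100_000}],
    "S3FSCursor": [{}],
}

# Hypothetical labels for the dataset scales to be tested.
DATASET_SIZES = ["small", "medium", "large"]


def build_cases(cursor_options=CURSOR_OPTIONS, sizes=DATASET_SIZES):
    """Return one benchmark case per (cursor, option variant, dataset size)."""
    cases = []
    for (cursor, variants), size in product(cursor_options.items(), sizes):
        for options in variants:
            cases.append({"cursor": cursor, "options": options, "size": size})
    return cases


if __name__ == "__main__":
    for case in build_cases():
        print(case)
```

Keeping the matrix as plain data makes it easy to add or drop variants without touching the code that runs each case.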
Metrics to Measure
| Category | Metrics |
| --- | --- |
| Speed | Query execution time, result set fetch time |
| Memory | Peak memory usage, memory with chunk options |
| Comparison | Side-by-side with AWS Wrangler |
| Scale | Small, medium, large dataset behavior |
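One possible way to capture the speed and memory metrics for a single fetch uses only the standard library. This is a minimal sketch: `measure` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real cursor call. Note that `tracemalloc` only sees allocations made through Python's memory allocator, so it may under-report memory held by native libraries, and it adds overhead of its own, so timing and memory are probably best recorded in separate runs.

```python
import time
import tracemalloc


def measure(fetch):
    """Run fetch() once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fetch()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def fake_fetch():
    """Hypothetical stand-in for fetching a result set from a cursor."""
    return [(i, str(i)) for i in range(10_000)]


if __name__ == "__main__":
    rows, seconds, peak_bytes = measure(fake_fetch)
    print(f"rows={len(rows)} time={seconds:.4f}s peak={peak_bytes / 1024:.1f} KiB")
```

In a real benchmark, `fake_fetch` would be replaced by a function that executes a query and fetches the full result set with the cursor under test.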
Data Source
Use the public PyPI download statistics dataset from BigQuery, which can be exported to S3 for Athena queries. This provides realistic, reproducible test data at various scales.
Reference: https://console.cloud.google.com/marketplace/product/gcp-public-data-pypi/pypi
Expected Deliverables
- Updated benchmark scripts in benchmarks/ directory
- Documentation with:
  - Performance comparison tables
  - Memory usage guidance
  - Recommendations for cursor selection based on use case
- README explaining how to reproduce benchmarks
Notes
Data preparation is required before implementation. The benchmark results should help users choose the appropriate cursor for their use case and understand trade-offs.