README.md (3 additions, 0 deletions)
@@ -64,6 +64,8 @@ By leveraging the Polars Rust engine (Lazy API), the system achieves near-optima
| 40M Snapshot (8GB / 4 vCPU) |
| :---: |
| ![engine-performance-8gb](assets/screenshots/engine-performance-8gb-4cpu.png) |
+> Benchmark data: [`40m_stats_log.csv`](assets/benchmarks/polars/)
+> Dataset: [`Dataset Information`](data/)

| Metric | Data |
|:---|:---|
@@ -72,6 +74,7 @@ By leveraging the Polars Rust engine (Lazy API), the system achieves near-optima
| Efficiency (Processing) | ~307k Rows / Second |
| Total Runtime (Wall-Clock) | 130 Seconds |
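The throughput row can be cross-checked against the runtime row. This check assumes that the "40M" snapshot holds exactly 40,000,000 rows and that throughput is defined as total rows divided by total wall-clock time. Neither assumption is stated in the table.

$$
\text{throughput} = \frac{40{,}000{,}000\ \text{rows}}{130\ \text{s}} \approx 307{,}692\ \text{rows/s} \approx 307\text{k rows/s}
$$

Under those assumptions, the reported ~307k Rows / Second is consistent with the 130-second wall-clock runtime.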

+
* **Maximized Memory Density:** The **Primitive Integer Pipeline** allows a ~5.34GB analytical model to be processed within the 8GB RAM limit by shrinking join-key overhead by ~16x.
* **Near-Linear Performance Scaling:** The engine saturates available vCPUs, yielding high throughput during streaming execution.
* **Zero-Idle Economics:** 100% serverless execution ensures zero billable time during idle periods.
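The **Primitive Integer Pipeline** bullet describes replacing wide join keys with compact integer ones. The diff does not show how that pipeline is implemented. What follows is only a minimal plain-Python sketch of the general technique, dictionary encoding: string keys are mapped to dense integer IDs, and rows are then hash-joined on those IDs. The function names `encode_keys` and `hash_join` and the sample keys are hypothetical. The sketch does not reproduce the project's Polars code or its ~16x figure.

```python
from typing import Dict, List, Tuple


def encode_keys(keys: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map each string key to a dense integer ID, extending the shared vocab as needed.

    Hypothetical helper: both sides of a join must share the same vocab so that
    equal strings receive equal integer IDs.
    """
    codes: List[int] = []
    for key in keys:
        if key not in vocab:
            vocab[key] = len(vocab)  # next unused ID: 0, 1, 2, ...
        codes.append(vocab[key])
    return codes


def hash_join(
    left: List[Tuple[int, str]], right: List[Tuple[int, float]]
) -> List[Tuple[int, str, float]]:
    """Inner-join two row lists on their integer key (first tuple element)."""
    # Build phase: index the right side by integer key.
    index: Dict[int, List[float]] = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    # Probe phase: emit one output row per matching (left, right) pair.
    joined: List[Tuple[int, str, float]] = []
    for key, value in left:
        for right_value in index.get(key, []):
            joined.append((key, value, right_value))
    return joined


if __name__ == "__main__":
    vocab: Dict[str, int] = {}
    order_keys = encode_keys(["cust-a", "cust-b", "cust-a"], vocab)
    payment_keys = encode_keys(["cust-b", "cust-a"], vocab)
    orders = list(zip(order_keys, ["o1", "o2", "o3"]))
    payments = list(zip(payment_keys, [10.0, 20.0]))
    print(hash_join(orders, payments))
```

The join itself only compares small integers. Each distinct string is stored once in `vocab` rather than repeated in every row, and that is the general source of the memory savings this technique targets.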
assets/benchmarks/polars/README.md (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
# Measurement Methodology

-This section details the methodology used to capture the memory metrics in the [`GCP Stress-Test Metrics (Scaling Efficiency)`](../../../README.md#gcp-stress-test-metrics-scaling-efficiency)
+This section details the methodology used to capture the memory metrics in the [`GCP Stress-Test Metrics (Scaling Efficiency)`](../../../README.md###gcp-stress-test-metrics-scaling-efficiency)
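The two link lines above differ only in the URL fragment after `README.md`. The sketch below shows why that difference matters. It approximates GitHub's heading-anchor slugging, which lowercases the heading, drops punctuation other than hyphens, and turns spaces into hyphens. It is an approximation, not an official specification; for example, it ignores the `-1`, `-2` suffixes GitHub adds to duplicate headings. It also assumes the target heading text is `GCP Stress-Test Metrics (Scaling Efficiency)`, as the link label suggests. The helper names `github_style_slug` and `fragment_of` are hypothetical.

```python
import re


def github_style_slug(heading: str) -> str:
    """Approximate GitHub's heading anchor: lowercase, strip punctuation, spaces -> hyphens."""
    lowered = heading.strip().lower()
    # Keep word characters (letters, digits, underscore), hyphens and spaces; drop the rest.
    kept = re.sub(r"[^\w\- ]", "", lowered)
    return kept.replace(" ", "-")


def fragment_of(link: str) -> str:
    """Return everything after the first '#' in a link target, i.e. the URL fragment."""
    _, _, fragment = link.partition("#")
    return fragment


if __name__ == "__main__":
    slug = github_style_slug("GCP Stress-Test Metrics (Scaling Efficiency)")
    print(slug)
    print(fragment_of("../../../README.md#gcp-stress-test-metrics-scaling-efficiency") == slug)
    print(fragment_of("../../../README.md###gcp-stress-test-metrics-scaling-efficiency"))
```

Under this approximation, a single `#` yields a fragment that equals the heading slug. Writing `###` instead yields the fragment `##gcp-stress-test-metrics-scaling-efficiency`, which matches no heading.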

The telemetry logger below was added to the orchestrator for a specific benchmarking run.

data/README.md (1 addition, 1 deletion)
@@ -3,7 +3,7 @@
This directory serves as the local state provider for the pipeline when executing in a non-cloud environment. It mimics the structure of the Google Cloud Storage (GCS) buckets.

## Synthetic Dataset
-To replicate the high-volume environment described in the [GCP Stress-Test Metrics (Scaling Efficiency)](/README.md#gcp-stress-test-metrics-scaling-efficiency) section, you can download the 40M-row synthetic dataset here: [**Kaggle Dataset Link**](https://www.kaggle.com/datasets/melvidabryan/e-commerce-synthetic-dataset)
+To replicate the high-volume environment described in the [GCP Stress-Test Metrics (Scaling Efficiency)](/README.md###GCP-Stress-Test-Metrics) section, you can download the 40M-row synthetic dataset here: [**Kaggle Dataset Link**](https://www.kaggle.com/datasets/melvidabryan/e-commerce-synthetic-dataset)

> *Note: This upload contains the **Contracted Version** of the dataset. The original "Raw" state, totaling ~26GB of unrefined CSVs, was omitted to prioritize transfer efficiency.*
