| 1 | +# Enhanced Query Benchmark Instructions |
| 2 | + |
| 3 | +This document covers two ways to run the Enhanced Query (join and aggregation) benchmarks: **local DynamoDB** (no AWS required) and **EC2 + real DynamoDB** (production-like numbers). |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Local DynamoDB benchmark (recommended for design doc) |
| 8 | + |
| 9 | +No AWS account or credentials are required. The script starts an in-process DynamoDB Local, creates and seeds the tables (1000 customers × 1000 orders), then runs five scenarios and prints per-scenario results (avgMs, p50Ms, p95Ms, rows). |
| 10 | + |
| 11 | +### How to run |
| 12 | + |
| 13 | +From the **repository root**: |
| 14 | + |
| 15 | +```bash |
| 16 | +./services-custom/dynamodb-enhanced/run-enhanced-query-benchmark-local.sh |
| 17 | +``` |
| 18 | + |
| 19 | +The script sets `USE_LOCAL_DYNAMODB=true` and invokes the benchmark runner. Results are printed to stdout. To save CSV results: |
| 20 | + |
| 21 | +```bash |
| 22 | +BENCHMARK_OUTPUT_FILE=benchmark_local.csv ./services-custom/dynamodb-enhanced/run-enhanced-query-benchmark-local.sh |
| 23 | +``` |
| 24 | + |
| 25 | +Optional env vars (set before running the script): `BENCHMARK_ITERATIONS` (default 5), `BENCHMARK_WARMUP` (default 2), `BENCHMARK_OUTPUT_FILE` (optional path for CSV). |
| 26 | + |
| 27 | +### Results |
| 28 | + |
| 29 | +Example output (environment and scenario lines): |
| 30 | + |
| 31 | +``` |
| 32 | +Using in-process DynamoDB Local. |
| 33 | +Creating tables and seeding data (1000 customers x 1000 orders)... |
| 34 | +... |
| 35 | +Environment: DynamoDB Local (in-process) CUSTOMERS_TABLE=customers_large ORDERS_TABLE=orders_large |
| 36 | +Warmup=2 Iterations=5 |
| 37 | +--- |
| 38 | +baseOnly_keyCondition: avgMs=... p50Ms=... p95Ms=... rows=1 |
| 39 | +joinInner_c1: avgMs=... p50Ms=... p95Ms=... rows=1000 |
| 40 | +... |
| 41 | +``` |
| 42 | + |
| 43 | +Use this output (or the CSV file) in the design document. See [COMPLEX_QUERIES_DESIGN.md](COMPLEX_QUERIES_DESIGN.md#benchmarking) for where to reference the benchmark and link to results. |
| 44 | + |
| 45 | +--- |
| 46 | + |
| 47 | +## EC2 + Real DynamoDB benchmark |
| 48 | + |
| 49 | +Use this for production-like latency numbers (e.g. external claims or SLA discussions). Requires an AWS account and an EC2 instance. |
| 50 | + |
| 51 | +### Prerequisites |
| 52 | + |
| 53 | +- AWS account with permissions to create DynamoDB tables and launch EC2 instances (or use existing EC2). |
| 54 | +- AWS CLI configured (`aws configure`) or IAM role for EC2 with DynamoDB access. |
| 55 | +- Java 8+ and Maven 3.6+ (on your machine for building; on EC2 for running). |
| 56 | + |
| 57 | +--- |
| 58 | + |
| 59 | +## Step 1: Create DynamoDB tables (AWS CLI) |
| 60 | + |
| 61 | +Create two tables in your chosen region (e.g. `us-east-1`) with the same schema as the functional tests. |
| 62 | + |
| 63 | +**Customers table** (partition key: `customerId` String): |
| 64 | + |
| 65 | +```bash |
| 66 | +export AWS_REGION=us-east-1 |
| 67 | +export CUSTOMERS_TABLE=customers_large |
| 68 | +export ORDERS_TABLE=orders_large |
| 69 | + |
| 70 | +aws dynamodb create-table \ |
| 71 | + --table-name $CUSTOMERS_TABLE \ |
| 72 | + --attribute-definitions AttributeName=customerId,AttributeType=S \ |
| 73 | + --key-schema AttributeName=customerId,KeyType=HASH \ |
| 74 | + --billing-mode PAY_PER_REQUEST \ |
| 75 | + --region $AWS_REGION |
| 76 | +``` |
| 77 | + |
| 78 | +**Orders table** (partition key: `customerId` String, sort key: `orderId` String): |
| 79 | + |
| 80 | +```bash |
| 81 | +aws dynamodb create-table \ |
| 82 | + --table-name $ORDERS_TABLE \ |
| 83 | + --attribute-definitions \ |
| 84 | + AttributeName=customerId,AttributeType=S \ |
| 85 | + AttributeName=orderId,AttributeType=S \ |
| 86 | + --key-schema \ |
| 87 | + AttributeName=customerId,KeyType=HASH \ |
| 88 | + AttributeName=orderId,KeyType=RANGE \ |
| 89 | + --billing-mode PAY_PER_REQUEST \ |
| 90 | + --region $AWS_REGION |
| 91 | +``` |
| 92 | + |
| 93 | +Wait until both tables are `ACTIVE`: |
| 94 | + |
| 95 | +```bash |
| 96 | +aws dynamodb describe-table --table-name $CUSTOMERS_TABLE --region $AWS_REGION --query 'Table.TableStatus' |
| 97 | +aws dynamodb describe-table --table-name $ORDERS_TABLE --region $AWS_REGION --query 'Table.TableStatus' |
| 98 | +``` |
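Instead of re-running `describe-table` by hand, the AWS CLI's built-in waiter can block until a table reports `ACTIVE`. A minimal sketch, assuming the variables exported above; `wait_for_tables` is a hypothetical helper name, not part of this repository:

```bash
# Hypothetical helper: block until every named table is ACTIVE.
# `aws dynamodb wait table-exists` polls DescribeTable until
# Table.TableStatus is ACTIVE, or fails after the waiter's retry limit.
wait_for_tables() {
  local region="$1"; shift
  local table
  for table in "$@"; do
    aws dynamodb wait table-exists --table-name "$table" --region "$region" || return 1
    echo "ACTIVE: $table"
  done
}
```

Call it as `wait_for_tables "$AWS_REGION" "$CUSTOMERS_TABLE" "$ORDERS_TABLE"` before moving on to seeding.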
| 99 | + |
| 100 | +--- |
| 101 | + |
| 102 | +## Step 2: Seed the tables (optional: use benchmark runner with CREATE_AND_SEED) |
| 103 | + |
| 104 | +Seed from your **local machine** (or an EC2 instance) by running the benchmark runner once with `CREATE_AND_SEED=true`. The runner creates the tables if they do not exist (creation is skipped for tables you already created in Step 1) and seeds **1000 customers × 1000 orders** (1M orders). To seed tables you created in Step 1, set `CUSTOMERS_TABLE` and `ORDERS_TABLE` to the same table names. |
| 105 | + |
| 106 | +**Option A – Seed from local (or EC2) with the runner** |
| 107 | + |
| 108 | +From the **repo root**: |
| 109 | + |
| 110 | +```bash |
| 111 | +export AWS_REGION=us-east-1 |
| 112 | +export CUSTOMERS_TABLE=customers_large |
| 113 | +export ORDERS_TABLE=orders_large |
| 114 | +export CREATE_AND_SEED=true |
| 115 | + |
| 116 | +mvn test-compile exec:java -pl services-custom/dynamodb-enhanced \ |
| 117 | + -Dexec.mainClass="software.amazon.awssdk.enhanced.dynamodb.functionaltests.EnhancedQueryBenchmarkRunner" \ |
| 118 | + -Dexec.classpathScope=test |
| 119 | +``` |
| 120 | + |
| 121 | +If the tables already exist, the initializer will skip creation and only seed data (idempotent). If you use **pay-per-request** billing, no capacity settings are needed. Seeding 1M items may take several minutes and incur write cost. |
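To get a rough feel for the size of the seeding job, the arithmetic can be sketched in the shell. This assumes the seeder writes with `BatchWriteItem`, which accepts at most 25 put or delete requests per call; whether the initializer actually batches this way is an assumption, not something this document states.

```bash
# Dataset size from this document: 1000 customers x 1000 orders each.
customers=1000
orders_per_customer=1000
batch_size=25   # BatchWriteItem limit: 25 put/delete requests per call

order_items=$(( customers * orders_per_customer ))
total_items=$(( customers + order_items ))
# Ceiling division: the minimum number of calls if every batch is full.
min_batches=$(( (total_items + batch_size - 1) / batch_size ))

echo "order_items=$order_items total_items=$total_items min_batches=$min_batches"
```

Under these assumptions the seeder issues at least 40040 batch calls, which is consistent with seeding taking several minutes.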
| 122 | + |
| 123 | +**Option B – Create tables via runner (omit Step 1)** |
| 124 | + |
| 125 | +If you omit Step 1 and set `CREATE_AND_SEED=true`, the runner will try to create the tables. The SDK’s `createTable` uses **provisioned** throughput by default (50 RCU/WCU). For pay-per-request, create tables in Step 1 and only seed via the runner (run with `CREATE_AND_SEED=true` once; the initializer skips create if tables exist). |
| 126 | + |
| 127 | +--- |
| 128 | + |
| 129 | +## Step 3: Launch EC2 and install Java + Maven |
| 130 | + |
| 131 | +1. Launch an EC2 instance (e.g. Amazon Linux 2 or Ubuntu) in the **same region** as your DynamoDB tables. |
| 132 | +2. Attach an **IAM role** to the instance with at least: |
| 133 | + - `dynamodb:GetItem`, `dynamodb:PutItem`, `dynamodb:Query`, `dynamodb:Scan`, `dynamodb:BatchWriteItem`, `dynamodb:DescribeTable`, `dynamodb:CreateTable` (if you use CREATE_AND_SEED). |
| 134 | +3. SSH into the instance and install Java and Maven: |
| 135 | + |
| 136 | +**Amazon Linux 2:** |
| 137 | + |
| 138 | +```bash |
| 139 | +sudo yum install -y java-11-amazon-corretto maven |
| 140 | +``` |
| 141 | + |
| 142 | +**Ubuntu:** |
| 143 | + |
| 144 | +```bash |
| 145 | +sudo apt-get update && sudo apt-get install -y openjdk-11-jdk maven |
| 146 | +``` |
| 147 | + |
| 148 | +4. Verify: |
| 149 | + |
| 150 | +```bash |
| 151 | +java -version |
| 152 | +mvn -version |
| 153 | +``` |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## Step 4: Build and copy the project to EC2 |
| 158 | + |
| 159 | +**On your local machine** (from repo root): |
| 160 | + |
| 161 | +```bash |
| 162 | +cd /path/to/aws-sdk-java-v2 |
| 163 | +mvn clean package -pl services-custom/dynamodb-enhanced -DskipTests -q |
| 164 | +``` |
| 165 | + |
| 166 | +Copy the repository to EC2 and compile it there. Both options below copy the source tree, because `mvn exec:java -pl services-custom/dynamodb-enhanced` needs the parent POMs to resolve the module: Option A uses `scp`, Option B uses `rsync` and skips `.git`. |
| 167 | + |
| 168 | +**Option A – Copy repo and build on EC2** |
| 169 | + |
| 170 | +```bash |
| 171 | +scp -r . ec2-user@<EC2_PUBLIC_IP>:~/aws-sdk-java-v2 |
| 172 | +ssh ec2-user@<EC2_PUBLIC_IP> "cd ~/aws-sdk-java-v2 && mvn clean test-compile -pl services-custom/dynamodb-enhanced -DskipTests -q" |
| 173 | +``` |
| 174 | + |
| 175 | +**Option B – Sync the repo with rsync (excluding `.git`) and build on EC2** |
| 176 | + |
| 177 | +Copy the entire `aws-sdk-java-v2` repo (or at least the parent POMs and `services-custom/dynamodb-enhanced`) so that `mvn exec:java -pl services-custom/dynamodb-enhanced` can resolve the parent and run the benchmark. Syncing the whole tree and building on EC2 is usually simpler: |
| 178 | + |
| 179 | +```bash |
| 180 | +rsync -avz --exclude='.git' . ec2-user@<EC2_PUBLIC_IP>:~/aws-sdk-java-v2 |
| 181 | +``` |
| 182 | + |
| 183 | +Then on EC2: |
| 184 | + |
| 185 | +```bash |
| 186 | +cd ~/aws-sdk-java-v2 |
| 187 | +mvn test-compile -pl services-custom/dynamodb-enhanced -DskipTests -q |
| 188 | +``` |
| 189 | + |
| 190 | +--- |
| 191 | + |
| 192 | +## Step 5: Run the benchmark on EC2 |
| 193 | + |
| 194 | +SSH to the EC2 instance and set environment variables, then run the benchmark. |
| 195 | + |
| 196 | +```bash |
| 197 | +cd ~/aws-sdk-java-v2 |
| 198 | + |
| 199 | +export AWS_REGION=us-east-1 |
| 200 | +export CUSTOMERS_TABLE=customers_large |
| 201 | +export ORDERS_TABLE=orders_large |
| 202 | +export BENCHMARK_ITERATIONS=5 |
| 203 | +export BENCHMARK_WARMUP=2 |
| 204 | +# Optional: append CSV results to a file |
| 205 | +export BENCHMARK_OUTPUT_FILE=benchmark_results.csv |
| 206 | + |
| 207 | +# Do NOT set CREATE_AND_SEED unless you want to create/seed from this instance (tables should already exist and be seeded). |
| 208 | + |
| 209 | +mvn exec:java -pl services-custom/dynamodb-enhanced \ |
| 210 | + -Dexec.mainClass="software.amazon.awssdk.enhanced.dynamodb.functionaltests.EnhancedQueryBenchmarkRunner" \ |
| 211 | + -Dexec.classpathScope=test -q |
| 212 | +``` |
| 213 | + |
| 214 | +Example output: |
| 215 | + |
| 216 | +``` |
| 217 | +Environment: AWS_REGION=us-east-1 CUSTOMERS_TABLE=customers_large ORDERS_TABLE=orders_large |
| 218 | +Warmup=2 Iterations=5 |
| 219 | +--- |
| 220 | +baseOnly_keyCondition: avgMs=45.20 p50Ms=42 p95Ms=58 rows=1 |
| 221 | +joinInner_c1: avgMs=320.40 p50Ms=310 p95Ms=380 rows=1000 |
| 222 | +aggregation_groupByCount_c1: avgMs=305.20 p50Ms=298 p95Ms=350 rows=1 |
| 223 | +aggregation_groupBySum_c1: avgMs=318.60 p50Ms=312 p95Ms=355 rows=1 |
| 224 | +joinLeft_c1_limit50: avgMs=89.40 p50Ms=85 p95Ms=102 rows=50 |
| 225 | +``` |
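If only stdout was captured, the scenario lines can be turned into a table-friendly form after the fact. A minimal sketch, assuming the `name: avgMs=.. p50Ms=.. p95Ms=.. rows=..` layout shown above; `scenario_lines_to_csv` is a hypothetical helper, and its column layout is not necessarily the one the runner writes to `BENCHMARK_OUTPUT_FILE`:

```bash
# Hypothetical helper: read benchmark stdout on stdin, write CSV on stdout.
# Lines that are not scenario lines (environment header, "---") are skipped.
scenario_lines_to_csv() {
  awk '
    BEGIN { print "scenario,avgMs,p50Ms,p95Ms,rows" }
    /^[A-Za-z0-9_]+: avgMs=/ {
      name = $1
      sub(/:$/, "", name)          # drop the trailing colon from the name
      row = name
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")         # kv[1] is the key, kv[2] the value
        row = row "," kv[2]
      }
      print row
    }
  '
}
```

For example, `scenario_lines_to_csv < benchmark_stdout.txt > scenarios.csv`, where both file names are placeholders.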
| 226 | + |
| 227 | +--- |
| 228 | + |
| 229 | +## Step 6: Collect results |
| 230 | + |
| 231 | +- **Stdout**: Redirect to a file, e.g. `mvn exec:java ... > benchmark_stdout.txt 2>&1`. |
| 232 | +- **CSV**: If `BENCHMARK_OUTPUT_FILE` is set, the runner appends one CSV line per scenario to the file. Copy the file from EC2: |
| 233 | + |
| 234 | + ```bash |
| 235 | + scp ec2-user@<EC2_PUBLIC_IP>:~/aws-sdk-java-v2/benchmark_results.csv . |
| 236 | + ``` |
| 237 | + |
| 238 | +Use the output (avgMs, p50Ms, p95Ms, rows) in your design doc. Record the run conditions alongside the results: **region**, **EC2 instance type**, **table names**, **dataset size** (e.g. 1000 customers × 1000 orders), and **billing mode** (pay-per-request or provisioned). |
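The run conditions can be captured in a small file stored next to the CSV. A minimal sketch: `write_run_metadata` and the output file name are hypothetical, the table names fall back to the defaults from the environment variable reference, and the instance type and billing mode are passed in by hand rather than looked up.

```bash
# Hypothetical helper: write run metadata as key=value lines.
# All values are passed in explicitly, so the function makes no AWS calls.
write_run_metadata() {
  local out_file="$1" region="$2" instance_type="$3" billing_mode="$4"
  {
    echo "region=$region"
    echo "instance_type=$instance_type"
    echo "customers_table=${CUSTOMERS_TABLE:-customers_large}"
    echo "orders_table=${ORDERS_TABLE:-orders_large}"
    echo "dataset=1000 customers x 1000 orders"
    echo "billing_mode=$billing_mode"
  } > "$out_file"
}
```

For example, `write_run_metadata benchmark_metadata.txt "$AWS_REGION" m5.large PAY_PER_REQUEST`, where `m5.large` is only a placeholder for whatever instance type you launched.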
| 239 | + |
| 240 | +--- |
| 241 | + |
| 242 | +## Step 7: Cleanup (optional) |
| 243 | + |
| 244 | +To avoid ongoing cost, delete the DynamoDB tables and terminate the EC2 instance when done: |
| 245 | + |
| 246 | +```bash |
| 247 | +aws dynamodb delete-table --table-name customers_large --region us-east-1 |
| 248 | +aws dynamodb delete-table --table-name orders_large --region us-east-1 |
| 249 | +# Terminate the EC2 instance from the AWS Console or CLI. |
| 250 | +``` |
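If you scripted the setup, the cleanup can be scripted the same way. A minimal sketch, with `delete_benchmark_tables` as a hypothetical helper name; it uses the CLI's `table-not-exists` waiter so the function returns only once each table is actually gone:

```bash
# Hypothetical helper: delete each table and wait until it no longer exists.
delete_benchmark_tables() {
  local region="$1"; shift
  local table
  for table in "$@"; do
    aws dynamodb delete-table --table-name "$table" --region "$region" > /dev/null || return 1
    aws dynamodb wait table-not-exists --table-name "$table" --region "$region" || return 1
    echo "deleted: $table"
  done
}
```

Call it as `delete_benchmark_tables us-east-1 customers_large orders_large`. Deleting a table is irreversible, so double-check the names and region first.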
| 251 | + |
| 252 | +--- |
| 253 | + |
| 254 | +## Environment variable reference |
| 255 | + |
| 256 | +| Variable | Required | Default | Description | |
| 257 | +|----------|----------|---------|-------------| |
| 258 | +| `AWS_REGION` | No | default SDK region | DynamoDB region (e.g. `us-east-1`). | |
| 259 | +| `CUSTOMERS_TABLE` | No | `customers_large` | Customers table name. | |
| 260 | +| `ORDERS_TABLE` | No | `orders_large` | Orders table name. | |
| 261 | +| `CREATE_AND_SEED` | No | (unset) | Set to `true` to create tables (if missing) and seed 1000×1000 data. Requires DynamoDB create/put permissions. | |
| 262 | +| `BENCHMARK_ITERATIONS` | No | `5` | Number of measured runs per scenario. | |
| 263 | +| `BENCHMARK_WARMUP` | No | `2` | Warm-up runs per scenario before measuring. | |
| 264 | +| `BENCHMARK_OUTPUT_FILE` | No | (none) | If set, CSV results are appended to this path. | |
| 265 | + |
| 266 | +--- |
| 267 | + |
| 268 | +## Running locally against DynamoDB Local |
| 269 | + |
| 270 | +To run the same benchmark against a standalone **DynamoDB Local** process (e.g. for CI or no-AWS runs), rather than the in-process instance used by `run-enhanced-query-benchmark-local.sh`: |
| 271 | + |
| 272 | +1. Start DynamoDB Local (e.g. `docker run -p 8000:8000 amazon/dynamodb-local` or the SDK’s embedded LocalDynamoDb). |
| 273 | +2. Set `AWS_REGION` and point the SDK to the local endpoint (e.g. `DYNAMODB_ENDPOINT_OVERRIDE=http://localhost:8000` if your test setup supports it, or run the functional tests which use in-process LocalDynamoDb). |
| 274 | + |
| 275 | +Unless `USE_LOCAL_DYNAMODB=true` is set (as `run-enhanced-query-benchmark-local.sh` does), the benchmark runner does **not** set an endpoint override; it uses the default DynamoDB endpoint for the given region. To run against a standalone Local process, you would need to configure the client with an endpoint override (e.g. in a variant of the runner or via a system property your client builder reads). The functional tests and `run-enhanced-query-tests-and-print-timing.sh` also run against Local and produce timing output for the design doc. |
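Before wiring an endpoint override into a client, it helps to confirm that the Local process is reachable. A minimal sketch using the AWS CLI's `--endpoint-url` option; `local_dynamodb_list_tables` is a hypothetical helper, and port 8000 is assumed from the `docker run` example above. DynamoDB Local does not validate credentials, but the CLI still needs some value to sign requests, so placeholder values are supplied when none are set.

```bash
# Hypothetical helper: list tables on a DynamoDB Local endpoint.
# Falls back to placeholder credentials and us-east-1 when none are set.
local_dynamodb_list_tables() {
  local endpoint="${1:-http://localhost:8000}"
  AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID:-dummy}" \
  AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY:-dummy}" \
  aws dynamodb list-tables --endpoint-url "$endpoint" --region "${AWS_REGION:-us-east-1}"
}
```

A JSON response with a `TableNames` array (possibly empty) means the endpoint is up; a connection error means the Local process is not listening on that port.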