Commit 373439e

Enhanced Queries (Joins & Aggregations) in AWS SDK Java v2
1 parent 26bf258 commit 373439e

49 files changed, +11594 -222 lines changed

Lines changed: 275 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,275 @@
1+
# Enhanced Query Benchmark Instructions
2+
3+
This document covers two ways to run the Enhanced Query (join and aggregation) benchmarks: **local DynamoDB** (no AWS required) and **EC2 + real DynamoDB** (production-like numbers).
4+
5+
---
6+
7+
## Local DynamoDB benchmark (recommended for design doc)
8+
9+
No AWS account or credentials required. The benchmark uses in-process DynamoDB Local, creates the tables, seeds 1000 customers with 1000 orders each (1M orders), then runs five scenarios and prints latency statistics for each one (avgMs, p50Ms, p95Ms, rows).
10+
11+
### How to run
12+
13+
From the **repository root**:
14+
15+
```bash
16+
./services-custom/dynamodb-enhanced/run-enhanced-query-benchmark-local.sh
17+
```
18+
19+
The script sets `USE_LOCAL_DYNAMODB=true` and invokes the benchmark runner. Results are printed to stdout. To save CSV results:
20+
21+
```bash
22+
BENCHMARK_OUTPUT_FILE=benchmark_local.csv ./services-custom/dynamodb-enhanced/run-enhanced-query-benchmark-local.sh
23+
```
24+
25+
Optional env vars (set before running the script): `BENCHMARK_ITERATIONS` (default 5), `BENCHMARK_WARMUP` (default 2), `BENCHMARK_OUTPUT_FILE` (optional path for CSV).
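To check which settings a run would pick up, the defaults can be mirrored in a small shell function. This is a minimal sketch: `show_benchmark_settings` is a hypothetical helper that is not part of the repository, and it only assumes the default values documented above.

```bash
# Hypothetical helper (not part of the repository): print the settings a run
# would use, falling back to the defaults documented above.
show_benchmark_settings() {
  local iterations="${BENCHMARK_ITERATIONS:-5}"
  local warmup="${BENCHMARK_WARMUP:-2}"
  local output="${BENCHMARK_OUTPUT_FILE:-}"

  # Reject non-numeric counts early instead of failing inside the runner.
  case "$iterations$warmup" in
    *[!0-9]*)
      echo "BENCHMARK_ITERATIONS and BENCHMARK_WARMUP must be non-negative integers" >&2
      return 1
      ;;
  esac

  echo "iterations=$iterations warmup=$warmup output=${output:-none}"
}
```

Calling `show_benchmark_settings` before the script gives a quick sanity check of the environment without starting DynamoDB Local.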
26+
27+
### Results
28+
29+
Example output (environment and scenario lines):
30+
31+
```
32+
Using in-process DynamoDB Local.
33+
Creating tables and seeding data (1000 customers x 1000 orders)...
34+
...
35+
Environment: DynamoDB Local (in-process) CUSTOMERS_TABLE=customers_large ORDERS_TABLE=orders_large
36+
Warmup=2 Iterations=5
37+
---
38+
baseOnly_keyCondition: avgMs=... p50Ms=... p95Ms=... rows=1
39+
joinInner_c1: avgMs=... p50Ms=... p95Ms=... rows=1000
40+
...
41+
```
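For readers interpreting the `avgMs`, `p50Ms` and `p95Ms` columns, the sketch below shows one common way to derive such statistics from raw per-iteration latencies. It uses the nearest-rank percentile definition, which is an assumption: the runner's own percentile method is not shown in this document and may differ. `latency_stats` is a hypothetical helper, and the sample values in the usage note are made up.

```bash
# Hypothetical helper: read one latency in milliseconds per line on stdin and
# print the average plus nearest-rank p50 and p95.
latency_stats() {
  sort -n | awk '
    # Nearest-rank index for percentile p over n sorted samples: ceil(p * n / 100).
    function rank(p, n) { return int((p * n + 99) / 100) }
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no samples" > "/dev/stderr"; exit 1 }
      printf "avgMs=%.2f p50Ms=%s p95Ms=%s\n", sum / NR, v[rank(50, NR)], v[rank(95, NR)]
    }
  '
}
```

For example, `printf '42\n58\n40\n45\n41\n' | latency_stats` prints `avgMs=45.20 p50Ms=42 p95Ms=58`. With only five measured iterations, p95 is simply the slowest run, so raising `BENCHMARK_ITERATIONS` makes the tail figure more meaningful.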
42+
43+
Use this output (or the CSV file) in the design document. See [COMPLEX_QUERIES_DESIGN.md](COMPLEX_QUERIES_DESIGN.md#benchmarking) for where to reference the benchmark and link to results.
44+
45+
---
46+
47+
## EC2 + Real DynamoDB benchmark
48+
49+
Use this for production-like latency numbers (e.g. for external claims or SLA discussions). Requires an AWS account and an EC2 instance.
50+
51+
### Prerequisites
52+
53+
- AWS account with permissions to create DynamoDB tables and launch EC2 instances (or use existing EC2).
54+
- AWS CLI configured (`aws configure`) or IAM role for EC2 with DynamoDB access.
55+
- Java 8+ and Maven 3.6+ (on your machine for building; on EC2 for running).
56+
57+
---
58+
59+
## Step 1: Create DynamoDB tables (AWS CLI)
60+
61+
Create two tables in your chosen region (e.g. `us-east-1`) with the same schema as the functional tests.
62+
63+
**Customers table** (partition key: `customerId` String):
64+
65+
```bash
66+
export AWS_REGION=us-east-1
67+
export CUSTOMERS_TABLE=customers_large
68+
export ORDERS_TABLE=orders_large
69+
70+
aws dynamodb create-table \
71+
--table-name $CUSTOMERS_TABLE \
72+
--attribute-definitions AttributeName=customerId,AttributeType=S \
73+
--key-schema AttributeName=customerId,KeyType=HASH \
74+
--billing-mode PAY_PER_REQUEST \
75+
--region $AWS_REGION
76+
```
77+
78+
**Orders table** (partition key: `customerId` String, sort key: `orderId` String):
79+
80+
```bash
81+
aws dynamodb create-table \
82+
--table-name $ORDERS_TABLE \
83+
--attribute-definitions \
84+
AttributeName=customerId,AttributeType=S \
85+
AttributeName=orderId,AttributeType=S \
86+
--key-schema \
87+
AttributeName=customerId,KeyType=HASH \
88+
AttributeName=orderId,KeyType=RANGE \
89+
--billing-mode PAY_PER_REQUEST \
90+
--region $AWS_REGION
91+
```
92+
93+
Wait until both tables are `ACTIVE`:
94+
95+
```bash
96+
aws dynamodb describe-table --table-name $CUSTOMERS_TABLE --region $AWS_REGION --query 'Table.TableStatus' --output text
97+
aws dynamodb describe-table --table-name $ORDERS_TABLE --region $AWS_REGION --query 'Table.TableStatus' --output text
98+
```
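Rather than re-running `describe-table` by hand, the check can be wrapped in a polling loop. The following is a minimal sketch: `wait_for_status` is a hypothetical helper, and the attempt count and delay are arbitrary. The AWS CLI also ships a built-in waiter, `aws dynamodb wait table-exists --table-name <name>`, which is usually the simpler choice.

```bash
# Hypothetical helper: run a command repeatedly until it prints the wanted
# status, or give up after a fixed number of attempts.
# Usage: wait_for_status <wanted> <max-attempts> <delay-seconds> <command...>
wait_for_status() {
  local wanted="$1" attempts="$2" delay="$3"
  shift 3
  local i=0 status
  while [ "$i" -lt "$attempts" ]; do
    # Treat a failing command (e.g. table not found yet) as "no status".
    status="$("$@" 2>/dev/null)" || status=""
    if [ "$status" = "$wanted" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (requires AWS credentials, so it is left commented out):
# wait_for_status ACTIVE 30 5 aws dynamodb describe-table \
#   --table-name "$CUSTOMERS_TABLE" --region "$AWS_REGION" \
#   --query 'Table.TableStatus' --output text
```

The `--output text` flag matters here: without it the CLI prints the status as a quoted JSON string, which would never equal `ACTIVE`.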
99+
100+
---
101+
102+
## Step 2: Seed the tables (optional: use benchmark runner with CREATE_AND_SEED)
103+
104+
Seed the tables from your **local machine** (or an EC2 instance) by running the benchmark runner once with `CREATE_AND_SEED=true`. The runner creates the tables if they do not exist and seeds **1000 customers × 1000 orders** (1M orders). If you already created the tables in Step 1, use the same table names; the runner then skips creation and only seeds the data.
105+
106+
**Option A – Seed from local (or EC2) with the runner**
107+
108+
From the **repo root**:
109+
110+
```bash
111+
export AWS_REGION=us-east-1
112+
export CUSTOMERS_TABLE=customers_large
113+
export ORDERS_TABLE=orders_large
114+
export CREATE_AND_SEED=true
115+
116+
mvn test-compile exec:java -pl services-custom/dynamodb-enhanced \
117+
-Dexec.mainClass="software.amazon.awssdk.enhanced.dynamodb.functionaltests.EnhancedQueryBenchmarkRunner" \
118+
-Dexec.classpathScope=test
119+
```
120+
121+
If the tables already exist, the initializer will skip creation and only seed data (idempotent). If you use **pay-per-request** billing, no capacity settings are needed. Seeding 1M items may take several minutes and incur write cost.
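To get a rough feel for the write volume before seeding, the arithmetic can be sketched in the shell. This assumes the seeder writes with `BatchWriteItem` (the runner's actual write path is not shown in this document) at DynamoDB's limit of 25 put requests per call. `seed_estimate` is a hypothetical helper.

```bash
# Hypothetical helper: estimate item count and BatchWriteItem calls for a seed.
# Usage: seed_estimate <customers> <orders-per-customer>
seed_estimate() {
  local customers="$1" orders_per_customer="$2"
  local batch_size=25   # BatchWriteItem accepts at most 25 put/delete requests per call

  # One item per order plus one item per customer.
  local items=$((customers * orders_per_customer + customers))
  # Round up to whole batches.
  local batch_calls=$(((items + batch_size - 1) / batch_size))

  echo "items=$items batch_calls=$batch_calls"
}
```

For the default dataset, `seed_estimate 1000 1000` prints `items=1001000 batch_calls=40040`. On a pay-per-request table each item of up to 1 KB consumes one write request unit, so the seed costs on the order of one million write request units if the items stay that small.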
122+
123+
**Option B – Create tables via runner (omit Step 1)**
124+
125+
If you omit Step 1 and set `CREATE_AND_SEED=true`, the runner will try to create the tables. The SDK’s `createTable` uses **provisioned** throughput by default (50 RCU/WCU). For pay-per-request, create tables in Step 1 and only seed via the runner (run with `CREATE_AND_SEED=true` once; the initializer skips create if tables exist).
126+
127+
---
128+
129+
## Step 3: Launch EC2 and install Java + Maven
130+
131+
1. Launch an EC2 instance (e.g. Amazon Linux 2 or Ubuntu) in the **same region** as your DynamoDB tables.
132+
2. Attach an **IAM role** to the instance with at least:
133+
- `dynamodb:GetItem`, `dynamodb:PutItem`, `dynamodb:Query`, `dynamodb:Scan`, `dynamodb:BatchWriteItem`, `dynamodb:DescribeTable`, `dynamodb:CreateTable` (if you use CREATE_AND_SEED).
134+
3. SSH into the instance and install Java and Maven:
135+
136+
**Amazon Linux 2:**
137+
138+
```bash
139+
sudo yum install -y java-11-amazon-corretto maven
140+
```
141+
142+
**Ubuntu:**
143+
144+
```bash
145+
sudo apt-get update && sudo apt-get install -y openjdk-11-jdk maven
146+
```
147+
148+
4. Verify:
149+
150+
```bash
151+
java -version
152+
mvn -version
153+
```
154+
155+
---
156+
157+
## Step 4: Build and copy the project to EC2
158+
159+
**On your local machine** (from repo root):
160+
161+
```bash
162+
cd /path/to/aws-sdk-java-v2
163+
mvn clean package -pl services-custom/dynamodb-enhanced -DskipTests -q
164+
```
165+
166+
Copy the project to EC2 and build it there. Maven needs the parent POMs as well as `services-custom/dynamodb-enhanced` so that `mvn exec:java -pl services-custom/dynamodb-enhanced` can resolve the parent, so both options below copy the repo rather than individual JARs. Option A uses `scp`; Option B uses `rsync` and skips `.git`.
167+
168+
**Option A – Copy repo with scp and build on EC2**
169+
170+
```bash
171+
scp -r . ec2-user@<EC2_PUBLIC_IP>:~/aws-sdk-java-v2
172+
ssh ec2-user@<EC2_PUBLIC_IP> "cd ~/aws-sdk-java-v2 && mvn clean test-compile -pl services-custom/dynamodb-enhanced -DskipTests -q"
173+
```
174+
175+
**Option B – Copy repo with rsync (excluding `.git`) and build on EC2**
176+
177+
Copy the entire `aws-sdk-java-v2` repo (or at least the parent POMs and `services-custom/dynamodb-enhanced`). Excluding `.git` keeps the transfer smaller:
178+
179+
```bash
180+
rsync -avz --exclude='.git' . ec2-user@<EC2_PUBLIC_IP>:~/aws-sdk-java-v2
181+
```
182+
183+
Then on EC2:
184+
185+
```bash
186+
cd ~/aws-sdk-java-v2
187+
mvn test-compile -pl services-custom/dynamodb-enhanced -DskipTests -q
188+
```
189+
190+
---
191+
192+
## Step 5: Run the benchmark on EC2
193+
194+
SSH to the EC2 instance and set environment variables, then run the benchmark.
195+
196+
```bash
197+
cd ~/aws-sdk-java-v2
198+
199+
export AWS_REGION=us-east-1
200+
export CUSTOMERS_TABLE=customers_large
201+
export ORDERS_TABLE=orders_large
202+
export BENCHMARK_ITERATIONS=5
203+
export BENCHMARK_WARMUP=2
204+
# Optional: append CSV results to a file
205+
export BENCHMARK_OUTPUT_FILE=benchmark_results.csv
206+
207+
# Do NOT set CREATE_AND_SEED unless you want to create/seed from this instance (tables should already exist and be seeded).
208+
209+
mvn exec:java -pl services-custom/dynamodb-enhanced \
210+
-Dexec.mainClass="software.amazon.awssdk.enhanced.dynamodb.functionaltests.EnhancedQueryBenchmarkRunner" \
211+
-Dexec.classpathScope=test -q
212+
```
213+
214+
Example output:
215+
216+
```
217+
Environment: AWS_REGION=us-east-1 CUSTOMERS_TABLE=customers_large ORDERS_TABLE=orders_large
218+
Warmup=2 Iterations=5
219+
---
220+
baseOnly_keyCondition: avgMs=45.20 p50Ms=42 p95Ms=58 rows=1
221+
joinInner_c1: avgMs=320.40 p50Ms=310 p95Ms=380 rows=1000
222+
aggregation_groupByCount_c1: avgMs=305.20 p50Ms=298 p95Ms=350 rows=1
223+
aggregation_groupBySum_c1: avgMs=318.60 p50Ms=312 p95Ms=355 rows=1
224+
joinLeft_c1_limit50: avgMs=89.40 p50Ms=85 p95Ms=102 rows=50
225+
```
226+
227+
---
228+
229+
## Step 6: Collect results
230+
231+
- **Stdout**: Redirect to a file, e.g. `mvn exec:java ... > benchmark_stdout.txt 2>&1`.
232+
- **CSV**: If `BENCHMARK_OUTPUT_FILE` is set, the runner appends one CSV line per scenario to the file. Copy the file from EC2:
233+
234+
```bash
235+
scp ec2-user@<EC2_PUBLIC_IP>:~/aws-sdk-java-v2/benchmark_results.csv .
236+
```
237+
238+
Use the output (avgMs, p50Ms, p95Ms, rows) in your design doc. Alongside the numbers, record the **region**, **EC2 instance type**, **table names**, **dataset size** (e.g. 1000 customers × 1000 orders), and **billing mode** (pay-per-request or provisioned) so the results can be reproduced.
239+
240+
---
241+
242+
## Step 7: Cleanup (optional)
243+
244+
To avoid ongoing cost, delete the DynamoDB tables and terminate the EC2 instance when done:
245+
246+
```bash
247+
aws dynamodb delete-table --table-name customers_large --region us-east-1
248+
aws dynamodb delete-table --table-name orders_large --region us-east-1
249+
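# Terminate the EC2 instance from the AWS Console, or from the CLI (replace <INSTANCE_ID> with your instance ID):
aws ec2 terminate-instances --instance-ids <INSTANCE_ID> --region us-east-1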
250+
```
251+
252+
---
253+
254+
## Environment variable reference
255+
256+
| Variable | Required | Default | Description |
257+
|----------|----------|---------|-------------|
258+
| `AWS_REGION` | No | default SDK region | DynamoDB region (e.g. `us-east-1`). |
259+
| `CUSTOMERS_TABLE` | No | `customers_large` | Customers table name. |
260+
| `ORDERS_TABLE` | No | `orders_large` | Orders table name. |
261+
| `CREATE_AND_SEED` | No | (unset) | Set to `true` to create tables (if missing) and seed 1000×1000 data. Requires DynamoDB create/put permissions. |
262+
| `BENCHMARK_ITERATIONS` | No | `5` | Number of measured runs per scenario. |
263+
| `BENCHMARK_WARMUP` | No | `2` | Warm-up runs per scenario before measuring. |
264+
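| `BENCHMARK_OUTPUT_FILE` | No | (none) | If set, CSV results are appended to this path. |
| `USE_LOCAL_DYNAMODB` | No | (unset) | Set to `true` to run against in-process DynamoDB Local. `run-enhanced-query-benchmark-local.sh` sets this for you. |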
265+
266+
---
267+
268+
## Running locally against DynamoDB Local
269+
270+
To run the same benchmark against a standalone **DynamoDB Local** process instead of the in-process mode used by the local script (e.g. for CI or no-AWS runs):
271+
272+
1. Start DynamoDB Local (e.g. `docker run -p 8000:8000 amazon/dynamodb-local` or the SDK’s embedded LocalDynamoDb).
273+
2. Set `AWS_REGION` and point the SDK to the local endpoint (e.g. `DYNAMODB_ENDPOINT_OVERRIDE=http://localhost:8000` if your test setup supports it, or run the functional tests which use in-process LocalDynamoDb).
274+
275+
The benchmark runner does **not** set an endpoint override by default; it uses the default DynamoDB endpoint for the given region. To run against Local, you would need to configure the client with an endpoint override (e.g. in a variant of the runner or via a system property your client builder reads). The functional tests and `run-enhanced-query-tests-and-print-timing.sh` already run against Local and produce timing output for the design doc.
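Before wiring an endpoint override into a client, it can help to confirm from the shell that the standalone DynamoDB Local process is reachable. The sketch below is an assumption-laden convenience, not something the runner reads: `ddb_endpoint_args` is a hypothetical helper, and `DYNAMODB_ENDPOINT_OVERRIDE` is only the variable name suggested above. `--endpoint-url` is the AWS CLI's global option for targeting a non-default endpoint.

```bash
# Hypothetical helper: emit the AWS CLI endpoint arguments when an override is
# set, and nothing otherwise, so the same command works for Local and for AWS.
ddb_endpoint_args() {
  if [ -n "${DYNAMODB_ENDPOINT_OVERRIDE:-}" ]; then
    printf '%s %s\n' --endpoint-url "$DYNAMODB_ENDPOINT_OVERRIDE"
  fi
}

# Example (requires the AWS CLI and a running DynamoDB Local, so it is left
# commented out). The substitution is intentionally unquoted so that it
# expands to two separate arguments:
# export DYNAMODB_ENDPOINT_OVERRIDE=http://localhost:8000
# aws dynamodb list-tables $(ddb_endpoint_args) --region "$AWS_REGION"
```

If `list-tables` returns the benchmark tables, the endpoint is up. The AWS CLI still expects credentials to be configured when talking to DynamoDB Local, although placeholder values are sufficient.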

services-custom/dynamodb-enhanced/pom.xml

Lines changed: 7 additions & 0 deletions
@@ -111,6 +111,13 @@
 <artifactId>http-client-spi</artifactId>
 <version>${awsjavasdk.version}</version>
 </dependency>
+<!-- DynamoDB Local ServerRunner telemetry loads UrlConnectionHttpClient (Pinpoint); tests and exec:java use test scope -->
+<dependency>
+    <groupId>software.amazon.awssdk</groupId>
+    <artifactId>url-connection-client</artifactId>
+    <version>${awsjavasdk.version}</version>
+    <scope>test</scope>
+</dependency>
 <dependency>
     <groupId>software.amazon.awssdk</groupId>
     <artifactId>sdk-core</artifactId>

services-custom/dynamodb-enhanced/src/main/java/software/amazon/awssdk/enhanced/dynamodb/DynamoDbEnhancedAsyncClient.java

Lines changed: 14 additions & 3 deletions
@@ -37,6 +37,8 @@
 import software.amazon.awssdk.enhanced.dynamodb.model.TransactWriteItemsEnhancedRequest;
 import software.amazon.awssdk.enhanced.dynamodb.model.TransactWriteItemsEnhancedResponse;
 import software.amazon.awssdk.enhanced.dynamodb.model.UpdateItemEnhancedRequest;
+import software.amazon.awssdk.enhanced.dynamodb.query.result.EnhancedQueryRow;
+import software.amazon.awssdk.enhanced.dynamodb.query.spec.QueryExpressionSpec;
 import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
 
 /**
@@ -527,13 +529,22 @@ default CompletableFuture<TransactWriteItemsEnhancedResponse> transactWriteItems
         throw new UnsupportedOperationException();
     }
 
+    /**
+     * Executes an enhanced query (joins, aggregations, filters) described by the given spec, asynchronously.
+     *
+     * @param spec the query specification
+     * @return a publisher of result rows
+     */
+    default SdkPublisher<EnhancedQueryRow> enhancedQuery(QueryExpressionSpec spec) {
+        throw new UnsupportedOperationException();
+    }
+
     /**
      * Returns the underlying low-level {@link DynamoDbAsyncClient} that this enhanced client uses to make API calls.
      * <p>
      * The returned client is the same instance that was provided during construction via
-     * {@link Builder#dynamoDbClient(DynamoDbAsyncClient)}, or the internally-created one if {@link #create()} was used.
-     * It is <b>not</b> a copy — any operations performed on it (including {@code close()}) will affect this
-     * enhanced client as well.
+     * {@link Builder#dynamoDbClient(DynamoDbAsyncClient)}, or the internally-created one if {@link #create()} was used. It is
+     * <b>not</b> a copy — any operations performed on it (including {@code close()}) will affect this enhanced client as well.
      *
      * @return the underlying {@link DynamoDbAsyncClient}
      * @throws UnsupportedOperationException if the implementation does not support this operation

services-custom/dynamodb-enhanced/src/main/java/software/amazon/awssdk/enhanced/dynamodb/DynamoDbEnhancedClient.java

Lines changed: 27 additions & 3 deletions
@@ -36,6 +36,9 @@
 import software.amazon.awssdk.enhanced.dynamodb.model.TransactWriteItemsEnhancedRequest;
 import software.amazon.awssdk.enhanced.dynamodb.model.TransactWriteItemsEnhancedResponse;
 import software.amazon.awssdk.enhanced.dynamodb.model.UpdateItemEnhancedRequest;
+import software.amazon.awssdk.enhanced.dynamodb.query.result.EnhancedQueryLatencyReport;
+import software.amazon.awssdk.enhanced.dynamodb.query.result.EnhancedQueryResult;
+import software.amazon.awssdk.enhanced.dynamodb.query.spec.QueryExpressionSpec;
 import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
 import software.amazon.awssdk.services.dynamodb.model.BatchGetItemRequest;
 
@@ -537,13 +540,34 @@ default TransactWriteItemsEnhancedResponse transactWriteItemsWithResponse(
         throw new UnsupportedOperationException();
     }
 
+    /**
+     * Executes an enhanced query (joins, aggregations, filters) described by the given spec.
+     *
+     * @param spec the query specification
+     * @return iterable of result rows
+     */
+    default EnhancedQueryResult enhancedQuery(QueryExpressionSpec spec) {
+        throw new UnsupportedOperationException();
+    }
+
+    /**
+     * Executes an enhanced query and optionally reports latency.
+     *
+     * @param spec the query specification
+     * @param reportConsumer optional consumer for the latency report; may be null
+     * @return iterable of result rows
+     */
+    default EnhancedQueryResult enhancedQuery(QueryExpressionSpec spec,
+                                              Consumer<EnhancedQueryLatencyReport> reportConsumer) {
+        throw new UnsupportedOperationException();
+    }
+
     /**
      * Returns the underlying low-level {@link DynamoDbClient} that this enhanced client uses to make API calls.
      * <p>
      * The returned client is the same instance that was provided during construction via
-     * {@link Builder#dynamoDbClient(DynamoDbClient)}, or the internally-created one if {@link #create()} was used.
-     * It is <b>not</b> a copy — any operations performed on it (including {@code close()}) will affect this
-     * enhanced client as well.
+     * {@link Builder#dynamoDbClient(DynamoDbClient)}, or the internally-created one if {@link #create()} was used. It is
+     * <b>not</b> a copy — any operations performed on it (including {@code close()}) will affect this enhanced client as well.
      *
      * @return the underlying {@link DynamoDbClient}
      * @throws UnsupportedOperationException if the implementation does not support this operation
