Commit 686dc1f

Add OpenSearch UBI sample code
1 parent 03eadb7 commit 686dc1f

29 files changed

Lines changed: 8137 additions & 0 deletions
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# CDK output
cdk.out/
dist/

# Dependencies
node_modules/

# TypeScript build
*.js
*.d.ts
!jest.config.js

# Lambda layer build
lambda/layers/dependencies/python/*
!lambda/layers/dependencies/python/.gitkeep
lambda/webapp-backend/
*.zip

# IDE
.idea/
.vscode/
*.swp
*.swo

# OS files
.DS_Store
Thumbs.db

# Logs
*.log
npm-debug.log*

# Test coverage
coverage/
.nyc_output/
Lines changed: 278 additions & 0 deletions
@@ -0,0 +1,278 @@
# UBI-LTR Pipeline - AWS CDK Infrastructure

This CDK project provisions the complete AWS infrastructure for the User Behavior Insights (UBI) to Learning to Rank (LTR) pipeline.

## Architecture Overview

```
                        +------------------+
                        |   Web Browser    |
                        +--------+---------+
                                 |
                        +--------v---------+
                        |    CloudFront    |
                        | (Static Assets)  |
                        +--------+---------+
                                 |
             +-------------------+-------------------+
             |                                       |
   +---------v---------+                   +---------v---------+
   |    API Gateway    |                   |  React Frontend   |
   |    (REST API)     |                   |    (Search UI)    |
   +---------+---------+                   +-------------------+
             |
   +---------v---------+
   |      Lambda       |
   |     (FastAPI)     |
   +---------+---------+
             |
   +---------v---------+      +-----------------------+
   |   OSI Pipeline    | <--> |       S3 Bucket       |
   |     (Unified)     |      |    (Data Archival)    |
   +--+------------+---+      +-----------------------+
      |            |
+-----v----+ +-----v-----+
|ubi_events| |ubi_queries|
+-----+----+ +-----+-----+
      |            |
  +---v------------v---+
  |     OpenSearch     |
  |       Domain       |
  +----------+---------+
             |
    +--------v-------+     +------------------+
    | Step Functions | --> |     Bedrock      |
    | (LTR Pipeline) |     |   (Claude 4.5)   |
    +----------------+     +------------------+
```

## Quick Start

### One-Command Deployment

```bash
# Deploy with defaults (dev environment, us-east-1)
./deploy.sh

# Deploy with custom settings
./deploy.sh -e prod -r us-west-2

# Deploy without confirmation
./deploy.sh -y
```

The deployment script automatically:
1. Checks prerequisites (AWS CLI, Node.js 18+, npm)
2. Installs CDK and frontend dependencies
3. Builds the React frontend application
4. Bootstraps CDK if needed
5. Deploys all 7 stacks (~30-40 minutes)
6. Displays deployment outputs (URLs, credentials)
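
The Node.js prerequisite in step 1 could be checked along the following lines. This is a minimal sketch and not the logic in `deploy.sh`: `meetsNodeRequirement` and `MIN_NODE_MAJOR` are hypothetical names, and the input is assumed to have the `v18.19.0` form that `node --version` prints.

```typescript
// Hypothetical helper for illustration; not taken from deploy.sh.
const MIN_NODE_MAJOR = 18; // from the "Node.js 18+" prerequisite above

function meetsNodeRequirement(version: string): boolean {
  // Accept "v18.19.0" or "18.19.0"; anything that does not parse is rejected.
  const match = /^v?(\d+)\.\d+\.\d+/.exec(version.trim());
  if (match === null) {
    return false;
  }
  return parseInt(match[1], 10) >= MIN_NODE_MAJOR;
}
```

Rejecting unparseable input, rather than assuming it is fine, keeps a missing or broken Node.js install from slipping past the check.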

### One-Command Destruction

```bash
# Destroy all resources (requires "DELETE" confirmation)
./destroy.sh

# Destroy without confirmation (dangerous!)
./destroy.sh -y

# Only clean up orphaned resources
./destroy.sh --cleanup-only
```

## CDK Stacks

| Stack | Description | Est. Time |
|-------|-------------|-----------|
| **StorageStack** | S3 buckets for UBI data and LTR models | ~1 min |
| **IamStack** | IAM roles for Lambda, Step Functions, OSI | ~1 min |
| **OpenSearchStack** | OpenSearch domain with UBI configuration | ~25 min |
| **OsiStack** | Unified OSI pipeline with type-based routing | ~3 min |
| **ProcessingStack** | Step Functions LTR training pipeline | ~2 min |
| **SetupStack** | OpenSearch index and mapping initialization | ~2 min |
| **WebappStack** | API Gateway + Lambda + CloudFront | ~5 min |
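
As a quick arithmetic check, the per-stack estimates can be added up and compared with the overall deployment time quoted in Quick Start. The sketch below assumes the stacks deploy strictly one after another; the README does not state the actual ordering or any parallelism.

```typescript
// Estimated minutes per stack, copied from the table above.
const stackMinutes: { [stack: string]: number } = {
  StorageStack: 1,
  IamStack: 1,
  OpenSearchStack: 25,
  OsiStack: 3,
  ProcessingStack: 2,
  SetupStack: 2,
  WebappStack: 5,
};

// Assumption: sequential deployment, so the estimates simply add up.
const totalMinutes = Object.keys(stackMinutes).reduce(
  (sum, name) => sum + stackMinutes[name],
  0
);

// Share of the total spent on the OpenSearch domain, as a whole percent.
const openSearchShare = Math.round(
  (stackMinutes["OpenSearchStack"] / totalMinutes) * 100
);

console.log(totalMinutes + " min total, " + openSearchShare + "% in OpenSearchStack");
```

Under that assumption the total is 39 minutes, which sits inside the "~30-40 minutes" range, and the OpenSearch domain accounts for roughly two thirds of the wait.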

### Stack Details

#### 1. StorageStack (`storage-stack.ts`)
- **Data Bucket**: S3 bucket for UBI exports, judgments, and LTR training data
- **DLQ Bucket**: Dead-letter bucket for failed OSI ingestion events
- Lifecycle policies for cost optimization

#### 2. IamStack (`iam-stack.ts`)
- **OSI Pipeline Role**: Permissions for OpenSearch Ingestion
- **Lambda Execution Role**: Permissions for Lambda functions (OpenSearch, Bedrock, S3)
- **Step Functions Role**: Permissions for state machine orchestration

#### 3. OpenSearchStack (`opensearch-stack.ts`)
- OpenSearch Service domain (t3.small.search for demo)
- Fine-grained access control with Secrets Manager
- Encryption at rest and in transit
- CloudWatch logging enabled

#### 4. OsiStack (`osi-stack.ts`)
- Unified OSI pipeline with routing by `type` field
- Routes `type: "event"` → `ubi_events` index
- Routes `type: "query"` → `ubi_queries` index
- Single pipeline is more cost-effective than separate pipelines
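
The type-based routing described for OsiStack amounts to a lookup from the `type` field to an index name. The sketch below illustrates that rule in plain TypeScript; it is not the OSI pipeline configuration itself. `UbiRecord`, `INDEX_BY_TYPE`, and `routeToIndex` are hypothetical names, and returning `null` for an unrecognized type is an assumption, since the README does not say how such records are handled.

```typescript
// Hypothetical record shape: only the `type` field comes from the README.
interface UbiRecord {
  type: string;
  [field: string]: unknown;
}

// Routing table from the OsiStack description above.
const INDEX_BY_TYPE: { [type: string]: string } = {
  event: "ubi_events",
  query: "ubi_queries",
};

// Returns the target index, or null when the type has no route (assumed behavior).
function routeToIndex(record: UbiRecord): string | null {
  const known = Object.prototype.hasOwnProperty.call(INDEX_BY_TYPE, record.type);
  return known ? INDEX_BY_TYPE[record.type] : null;
}

console.log(routeToIndex({ type: "event", action_name: "click" }));
console.log(routeToIndex({ type: "query", user_query: "red shoes" }));
```

Keeping both routes in one table mirrors the single-pipeline design: adding a record type means adding one entry rather than provisioning another pipeline.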

#### 5. ProcessingStack (`processing-stack.ts`)
- **Extract UBI Data Lambda**: Extracts queries and events from OpenSearch
- **Generate Judgments Lambda**: Uses Claude Sonnet 4.5 for relevance judgments
- **Prepare LTR Data Lambda**: Formats data for LTR training
- **Train LTR Model Lambda**: Creates and uploads LTR model
- **Step Functions State Machine**: Orchestrates the complete pipeline
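
One way to picture the orchestration is as a chain of Task states in Amazon States Language, one per Lambda. The sketch below builds such a definition as a plain object. It assumes the four stages run strictly in sequence, which the README implies but does not spell out, and the real state machine may add retries, error handling, or branching. The state names, the `buildSequentialDefinition` helper, and the ARNs are all hypothetical placeholders.

```typescript
// Minimal subset of Amazon States Language needed for a linear chain of tasks.
interface TaskState {
  Type: "Task";
  Resource: string;
  Next?: string;
  End?: boolean;
}

interface StateMachineDefinition {
  StartAt: string;
  States: { [name: string]: TaskState };
}

// Chains the stages in the order given; the last stage terminates the execution.
function buildSequentialDefinition(
  stages: { name: string; lambdaArn: string }[]
): StateMachineDefinition {
  if (stages.length === 0) {
    throw new Error("at least one stage is required");
  }
  const states: { [name: string]: TaskState } = {};
  stages.forEach((stage, i) => {
    const isLast = i === stages.length - 1;
    states[stage.name] = isLast
      ? { Type: "Task", Resource: stage.lambdaArn, End: true }
      : { Type: "Task", Resource: stage.lambdaArn, Next: stages[i + 1].name };
  });
  return { StartAt: stages[0].name, States: states };
}

// Placeholder ARN prefix: account ID and function names are made up.
const arnPrefix = "arn:aws:lambda:us-east-1:123456789012:function:";
const ltrDefinition = buildSequentialDefinition([
  { name: "ExtractUbiData", lambdaArn: arnPrefix + "dev-extract-ubi-data" },
  { name: "GenerateJudgments", lambdaArn: arnPrefix + "dev-generate-judgments" },
  { name: "PrepareLtrData", lambdaArn: arnPrefix + "dev-prepare-ltr-data" },
  { name: "TrainLtrModel", lambdaArn: arnPrefix + "dev-train-ltr-model" },
]);

console.log(JSON.stringify(ltrDefinition, null, 2));
```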

#### 6. SetupStack (`setup-stack.ts`)
- Custom resource for automatic index creation
- Creates `ubi_queries` and `ubi_events` indexes with proper mappings
- Creates `ecommerce_products` sample data index

#### 7. WebappStack (`webapp-stack.ts`)
- **Lambda (FastAPI)**: Search API with UBI event logging
- **API Gateway**: REST API with CORS support
- **CloudFront**: CDN for React frontend and API
- **S3 Bucket**: Static website hosting

## Configuration Options

| Context Variable | Default | Description |
|-----------------|---------|-------------|
| `envPrefix` | `dev` | Environment prefix for resource naming |
| `region` | `us-east-1` | AWS region for deployment |
| `opensearchVersion` | `3.3` | OpenSearch version |
| `instanceType` | `t3.small.search` | OpenSearch instance type |
| `instanceCount` | `1` | Number of data nodes |
| `ebsVolumeSize` | `20` | EBS volume size in GB |
| `dedicatedMaster` | `false` | Enable dedicated master nodes |
| `multiAz` | `false` | Enable multi-AZ deployment |
| `bedrockModelId` | `us.anthropic.claude-sonnet-4-5-20250929-v1:0` | Bedrock model ID |
| `enableSchedule` | `false` | Enable scheduled pipeline execution |
| `scheduleExpression` | `rate(1 day)` | Schedule expression |
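
To show how these context variables and their defaults might be combined, here is a minimal sketch. It is not the project's `bin/app.ts`: `PipelineConfig`, `DEFAULTS`, and `resolveConfig` are hypothetical names. The `lookup` parameter stands in for CDK's `app.node.tryGetContext`, and because values passed on the command line with `-c key=value` arrive as strings, numbers and booleans are coerced to the type of the default.

```typescript
interface PipelineConfig {
  envPrefix: string;
  region: string;
  opensearchVersion: string;
  instanceType: string;
  instanceCount: number;
  ebsVolumeSize: number;
  dedicatedMaster: boolean;
  multiAz: boolean;
  bedrockModelId: string;
  enableSchedule: boolean;
  scheduleExpression: string;
}

// Defaults copied from the table above.
const DEFAULTS: PipelineConfig = {
  envPrefix: "dev",
  region: "us-east-1",
  opensearchVersion: "3.3",
  instanceType: "t3.small.search",
  instanceCount: 1,
  ebsVolumeSize: 20,
  dedicatedMaster: false,
  multiAz: false,
  bedrockModelId: "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
  enableSchedule: false,
  scheduleExpression: "rate(1 day)",
};

// `lookup` is a stand-in for app.node.tryGetContext (returns undefined when unset).
function resolveConfig(lookup: (key: string) => unknown): PipelineConfig {
  const resolved: { [key: string]: string | number | boolean } = {};
  (Object.keys(DEFAULTS) as (keyof PipelineConfig)[]).forEach((key) => {
    const fallback = DEFAULTS[key];
    const raw = lookup(key);
    if (raw === undefined || raw === null) {
      resolved[key] = fallback; // not supplied: keep the documented default
    } else if (typeof fallback === "number") {
      resolved[key] = Number(raw); // "3" -> 3
    } else if (typeof fallback === "boolean") {
      resolved[key] = raw === true || raw === "true"; // "true" -> true
    } else {
      resolved[key] = String(raw);
    }
  });
  return resolved as unknown as PipelineConfig;
}

// Usage with a plain object instead of a real CDK app.
const overrides: { [key: string]: unknown } = {
  envPrefix: "prod",
  multiAz: "true",
  instanceCount: "3",
};
const config = resolveConfig((key) => overrides[key]);
console.log(config.envPrefix, config.multiAz, config.instanceCount, config.region);
```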

## Manual CDK Commands

If you prefer to run CDK commands manually instead of using the scripts:

```bash
# Install dependencies
npm install

# Build TypeScript
npm run build

# Review changes
cdk diff

# Deploy all stacks
cdk deploy --all -c envPrefix=dev -c region=us-east-1

# Deploy without approval prompts
cdk deploy --all --require-approval never

# Destroy all stacks
cdk destroy --all
```

## Cost Estimation

| Resource | Configuration | Est. Monthly Cost |
|----------|--------------|-------------------|
| OpenSearch | t3.small.search, 20GB EBS | ~$30 |
| OSI Pipeline | 1 OCU min/max | ~$15 |
| Lambda | Occasional execution | ~$1 |
| Step Functions | Occasional execution | ~$1 |
| S3 | Standard, <1GB | ~$1 |
| CloudFront | Standard distribution | ~$1 |
| API Gateway | REST API | ~$1 |
| Bedrock (Claude) | Per LTR pipeline run | ~$5-50 |
| **Total** | | **~$55-100/month** |

> **Note:** Costs vary based on usage. The Bedrock cost depends on LTR pipeline execution frequency and data volume.
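
The total row can be reproduced by summing the low and high ends of each line item. The sketch below uses only the figures from the table; treating the fixed estimates as a range with equal low and high values is a simplification made here, and the names `monthlyCosts` and `totalCost` are illustrative.

```typescript
interface CostRange {
  low: number;
  high: number;
}

// Estimated monthly cost in USD, copied from the table above.
const monthlyCosts: { [resource: string]: CostRange } = {
  "OpenSearch": { low: 30, high: 30 },
  "OSI Pipeline": { low: 15, high: 15 },
  "Lambda": { low: 1, high: 1 },
  "Step Functions": { low: 1, high: 1 },
  "S3": { low: 1, high: 1 },
  "CloudFront": { low: 1, high: 1 },
  "API Gateway": { low: 1, high: 1 },
  "Bedrock (Claude)": { low: 5, high: 50 },
};

// Sum both ends of every range.
const totalCost = Object.keys(monthlyCosts).reduce(
  (acc, name) => ({
    low: acc.low + monthlyCosts[name].low,
    high: acc.high + monthlyCosts[name].high,
  }),
  { low: 0, high: 0 }
);

console.log("~$" + totalCost.low + "-" + totalCost.high + "/month");
```

The fixed items come to $50, so the whole spread between $55 and $100 is the Bedrock line, which is why the note singles out pipeline frequency and data volume.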

## Production Recommendations

1. **OpenSearch**:
   - Use `r6g.large.search` or larger instances
   - Enable Multi-AZ deployment
   - Configure dedicated master nodes
   - Increase EBS volume size

2. **Security**:
   - Deploy in private VPC subnets
   - Use VPC endpoints for AWS services
   - Enable encryption at rest with KMS CMK
   - Restrict access policies

3. **Monitoring**:
   - Enable CloudWatch alarms
   - Set up error notifications
   - Monitor Bedrock usage and costs

4. **Scaling**:
   - Adjust OSI pipeline min/max units
   - Configure Lambda concurrency
   - Use reserved capacity for Bedrock
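
Several of the OpenSearch recommendations map onto context variables from the Configuration Options table, so they can be passed with the `-c key=value` flag used under Manual CDK Commands. The sketch below assembles such a command line. `toContextArgs` is a hypothetical helper, the 100 GB volume size is an illustrative value rather than a recommendation from this README, and the other recommendations (VPC placement, KMS, alarms) have no context variable listed and would need code changes.

```typescript
// Turns an overrides object into repeated "-c key=value" arguments.
function toContextArgs(
  overrides: { [key: string]: string | number | boolean }
): string[] {
  const args: string[] = [];
  // Sort keys so the generated command is stable across runs.
  Object.keys(overrides).sort().forEach((key) => {
    args.push("-c", key + "=" + String(overrides[key]));
  });
  return args;
}

// Keys come from the Configuration Options table; values are examples.
const prodOverrides: { [key: string]: string | number | boolean } = {
  envPrefix: "prod",
  instanceType: "r6g.large.search",
  multiAz: true,
  dedicatedMaster: true,
  ebsVolumeSize: 100, // illustrative size only
};

const deployCommand = ["cdk", "deploy", "--all"]
  .concat(toContextArgs(prodOverrides))
  .join(" ");

console.log(deployCommand);
```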

## Troubleshooting

### Deployment Issues

**Problem:** `CDK bootstrap failed`
```bash
# Solution: Bootstrap manually with explicit account/region
aws sts get-caller-identity  # Verify credentials
cdk bootstrap aws://ACCOUNT_ID/REGION
```

**Problem:** `Resource already exists` error
```bash
# Solution: Delete the orphaned resource and redeploy
./destroy.sh --cleanup-only
./deploy.sh -y
```

**Problem:** `OpenSearch domain creation timeout`
- OpenSearch domain creation takes 20-30 minutes. This is normal.
- Check CloudFormation console for progress.

### Runtime Issues

**Problem:** `403 Forbidden` when accessing OpenSearch
- Verify IAM role permissions
- Check OpenSearch access policy
- Verify Secrets Manager credentials

**Problem:** `OSI pipeline not receiving data`
```bash
# Check OSI pipeline status
aws osis get-pipeline --pipeline-name dev-ubi-pipeline --region us-east-1

# Check CloudWatch logs for OSI
aws logs tail /aws/vendedlogs/osis-dev-ubi-pipeline --follow
```

**Problem:** `Bedrock model access denied`
- Request access to Claude Sonnet 4.5 in AWS Bedrock console
- Verify the region supports the model ID

### Cleanup Issues

**Problem:** `Stack deletion failed`
```bash
# Force cleanup of orphaned resources
./destroy.sh --cleanup-only -y

# Check for remaining resources
aws cloudformation list-stacks --stack-status-filter DELETE_FAILED
```

## References

- [OpenSearch UBI Documentation](https://opensearch.org/docs/latest/search-plugins/ubi/)
- [UBI AWS Tutorial](https://docs.opensearch.org/latest/search-plugins/ubi/ubi-aws-managed-services-tutorial/)
- [OpenSearch LTR Plugin](https://opensearch.org/docs/latest/search-plugins/ltr/)
- [AWS CDK Documentation](https://docs.aws.amazon.com/cdk/)
- [Amazon Bedrock Claude](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages.html)

opensearch/opensearch_ubi/cdk.json

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
{
  "app": "npx ts-node --prefer-ts-exts bin/app.ts",
  "watch": {
    "include": [
      "**"
    ],
    "exclude": [
      "README.md",
      "cdk*.json",
      "**/*.d.ts",
      "**/*.js",
      "tsconfig.json",
      "package*.json",
      "yarn.lock",
      "node_modules",
      "test"
    ]
  },
  "context": {
    "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
    "@aws-cdk/core:stackRelativeExports": true,
    "@aws-cdk/aws-rds:lowercaseDbIdentifier": true,
    "@aws-cdk/aws-lambda:recognizeVersionProps": true,
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/aws-cloudfront:defaultSecurityPolicyTLSv1.2_2021": true,
    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
    "@aws-cdk/aws-iam:minimizePolicies": true,
    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
    "@aws-cdk/core:enablePartitionLiterals": true,
    "@aws-cdk/core:target-partitions": [
      "aws",
      "aws-cn"
    ]
  }
}
