Commit abd764d

add support for serverless and fix otel demo flags
Signed-off-by: ps48 <pshenoy36@gmail.com>
1 parent 269f13a

9 files changed

Lines changed: 127 additions & 12 deletions

.env

Lines changed: 4 additions & 0 deletions
@@ -225,5 +225,9 @@ JAEGER_HOST=jaeger
 JAEGER_UI_PORT=16686
 JAEGER_GRPC_PORT=4317
 
+# Telemetry Docs (referenced by frontend-proxy envoy template)
+TELEMETRY_DOCS_HOST=otel-collector
+TELEMETRY_DOCS_PORT=4318
+
 # Java Options (workaround for OSX JDK bug)
 _JAVA_OPTIONS=
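Why these two variables matter: per the commit's own comments, the frontend-proxy's envoy template references `${TELEMETRY_DOCS_HOST}`/`${TELEMETRY_DOCS_PORT}`, and when they are not passed into the container, envoy boots with an empty socket address. The sketch below assumes the template is rendered envsubst-style, where an unset variable silently becomes an empty string. `renderTemplate`, `renderTemplateStrict` and the one-line template fragment are hypothetical illustrations, not the upstream envoy template or tooling.

```javascript
// Hypothetical envsubst-style renderer: replaces ${NAME} with env[NAME] and,
// like envsubst, substitutes an empty string when the variable is unset.
const VAR_PATTERN = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;

function renderTemplate(template, env) {
  return template.replace(VAR_PATTERN, (_, name) => env[name] ?? '');
}

// Strict variant: refuse to render if any referenced variable is unset,
// turning a silent empty address into an explicit error.
function renderTemplateStrict(template, env) {
  const missing = [...template.matchAll(VAR_PATTERN)]
    .map((m) => m[1])
    .filter((name) => env[name] === undefined);
  if (missing.length) {
    throw new Error(`Unset template variables: ${[...new Set(missing)].join(', ')}`);
  }
  return renderTemplate(template, env);
}

// Illustrative fragment only, not the real envoy bootstrap config.
const fragment = 'address: ${TELEMETRY_DOCS_HOST}, port_value: ${TELEMETRY_DOCS_PORT}';

console.log(JSON.stringify(renderTemplate(fragment, {})));
// "address: , port_value: "
console.log(renderTemplate(fragment, { TELEMETRY_DOCS_HOST: 'otel-collector', TELEMETRY_DOCS_PORT: '4318' }));
// address: otel-collector, port_value: 4318
```

Defaulting the values in `.env` is what the commit does; a strict renderer like the second function would be an alternative that fails loudly instead of crash-looping.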

aws/cdk/lib/demo-workload.ts

Lines changed: 16 additions & 0 deletions
@@ -86,6 +86,17 @@ export class DemoWorkload extends Construct {
 'git clone --depth 1 https://github.com/opensearch-project/observability-stack.git /opt/obs-stack',
 'cd /opt/obs-stack',
 '',
+'# Patch otel-demo frontend-proxy: upstream envoy template references',
+'# ${TELEMETRY_DOCS_HOST}/${TELEMETRY_DOCS_PORT} but those aren\'t wired',
+'# through compose, so envoy bootstraps with an empty socket address and',
+'# crash-loops. Inject the vars and forward them into the service.',
+'if ! grep -q "^TELEMETRY_DOCS_HOST=" .env; then',
+' printf "\\nTELEMETRY_DOCS_HOST=otel-collector\\nTELEMETRY_DOCS_PORT=4318\\n" >> .env',
+'fi',
+'if ! grep -q "TELEMETRY_DOCS_HOST" docker-compose.otel-demo.yml; then',
+' sed -i "/^ - FLAGD_UI_PORT$/a\\ - TELEMETRY_DOCS_HOST\\n - TELEMETRY_DOCS_PORT" docker-compose.otel-demo.yml',
+'fi',
+'',
 'cat > docker-compose/otel-collector/config.yaml << \'COLLECTOREOF\'',
 'extensions:',
 ' sigv4auth:',
@@ -157,6 +168,11 @@ export class DemoWorkload extends Construct {
 ' logging: *logging',
 'MANAGEDEOF',
 '',
+'# Kafka\'s healthcheck can exceed compose\'s dependency grace window on',
+'# first boot, leaving kafka-dependent services in Created state. Retry',
+'# once — second pass finds kafka healthy and starts the stragglers.',
+'docker compose -f docker-compose.managed.yml up -d || true',
+'sleep 60',
 'docker compose -f docker-compose.managed.yml up -d',
 ].join('\n'),
 { OsiEndpoint: osiEndpoint },
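The Kafka workaround added to the user-data (here and in `ec2-demo.mjs`) is a "retry once after a fixed delay" sequence: run `docker compose ... up -d`, ignore a failure (`|| true`), sleep 60 seconds, then run it again. A minimal JavaScript sketch of that control flow follows. `upWithOneRetry` is a hypothetical name, and `run` is an injected stand-in for the compose invocation, so the logic can be exercised without Docker.

```javascript
// Hypothetical sketch of the user-data's `up -d || true; sleep 60; up -d` sequence.
// `run` stands in for `docker compose -f docker-compose.managed.yml up -d`;
// `sleep` is injected so a test can skip the real 60-second wait.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function upWithOneRetry(run, { delayMs = 60_000, sleep = defaultSleep } = {}) {
  try {
    await run(); // first pass: may fail while kafka is still unhealthy
  } catch (err) {
    // Equivalent of `|| true`: report the failure and carry on.
    console.warn(`first pass failed, retrying after ${delayMs} ms: ${err.message}`);
  }
  await sleep(delayMs); // the script sleeps unconditionally, even if the first pass succeeded
  await run(); // second pass: a failure here is not swallowed
}
```

The fixed delay mirrors the script. Polling Kafka's health status instead would avoid waiting a full minute when the first pass already succeeded, at the cost of more shell logic in the user-data.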

aws/cli-installer/.claude/mind.mv2

21.3 MB
Binary file not shown.

aws/cli-installer/.claude/mind.mv2.lock

Whitespace-only changes.

aws/cli-installer/README.md

Lines changed: 27 additions & 6 deletions
@@ -1,20 +1,26 @@
 # Observability Stack AWS CLI
 
-Deploy the [Observability Stack](https://github.com/opensearch-project/observability-stack) on AWS managed services with a single command. Creates an OpenSearch domain, OSIS ingestion pipeline, Amazon Managed Prometheus workspace, and a fully configured OpenSearch UI with dashboards — plus an EC2 instance running demo workloads that generate telemetry out of the box.
+Deploy the [Observability Stack](https://github.com/opensearch-project/observability-stack) on AWS managed services with a single command. Creates an OpenSearch domain (or serverless collection), OSIS ingestion pipeline, Amazon Managed Prometheus workspace, and a fully configured OpenSearch UI with dashboards — plus an EC2 instance running demo workloads that generate telemetry out of the box.
 
 ## Quick Start
 
+**Managed domain (default):**
 ```bash
 npx @opensearch-project/observability-stack
 ```
 
-Takes ~15 minutes. When complete, the CLI prints a dashboard URL — open it and you're in.
+**Serverless collection (AOSS):**
+```bash
+npx @opensearch-project/observability-stack --serverless
+```
+
+Takes ~15 minutes for managed, ~5 minutes for serverless. When complete, the CLI prints a dashboard URL — open it and you're in.
 
 ## What Gets Created
 
 | Resource | Description |
 |---|---|
-| OpenSearch domain | Stores logs, traces, and service map data |
+| OpenSearch domain **or** serverless collection | Stores logs, traces, and service map data |
 | OSIS pipeline | Ingests OTLP data (logs, traces, metrics) via SigV4 |
 | Amazon Managed Prometheus | Stores time-series metrics |
 | Connected Data Source (Prometheus) | Connects AMP to OpenSearch for metric queries |
@@ -26,13 +32,20 @@ All resources are tagged with `observability-stack:pipeline-name` for identifica
 
 ## Usage
 
-**Create everything from scratch:**
+**Create a managed domain deployment from scratch:**
 ```bash
 node bin/cli-installer.mjs --managed \
 --pipeline-name obs-stack-<your-alias> \
 --region us-east-1
 ```
 
+**Create a serverless (AOSS) deployment from scratch:**
+```bash
+node bin/cli-installer.mjs --serverless \
+--pipeline-name obs-stack-<your-alias> \
+--region us-east-1
+```
+
 **Reuse existing OpenSearch domain / AMP workspace:**
 ```bash
 node bin/cli-installer.mjs --managed \
@@ -50,6 +63,13 @@ node bin/cli-installer.mjs --managed \
 --skip-demo
 ```
 
+**Launch demo workload against an existing pipeline:**
+```bash
+node bin/launch-demo.mjs \
+--pipeline-name obs-stack-<your-alias> \
+--region us-east-1
+```
+
 **Interactive mode** (TUI wizard):
 ```bash
 node bin/cli-installer.mjs
@@ -73,7 +93,6 @@ Deletes: EC2 instance, OpenSearch Application, Connected Data Source, OSIS pipel
 
 ## Known Limitations
 
-- **AOS (managed domains) only** — AOSS (serverless) has blocking bugs and is not supported yet.
 - **Index pattern fields need manual refresh** — After data starts flowing, go to Management → Index Patterns → select pattern → click 🔄 to pick up new fields.
 - **Demo data takes 10-15 minutes** — The EC2 instance needs time to bootstrap Docker, pull images, and start sending telemetry.
 - **Idempotent but not updateable** — Running twice safely no-ops, but won't update existing resources with new config.
@@ -84,7 +103,9 @@ Deletes: EC2 instance, OpenSearch Application, Connected Data Source, OSIS pipel
 
 ```
 aws/cli-installer/
-├── bin/cli-installer.mjs # Entry point
+├── bin/
+│ ├── cli-installer.mjs # Entry point (full pipeline)
+│ └── launch-demo.mjs # Standalone EC2 demo launcher against an existing pipeline
 ├── src/
 │ ├── main.mjs # CLI orchestration + executePipeline flow
 │ ├── cli.mjs # Argument parsing + config

aws/cli-installer/bin/launch-demo.mjs

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+#!/usr/bin/env node
+/**
+* Launch only the EC2 demo workload against an existing OSI pipeline.
+* Usage: AWS_PROFILE=<p> node bin/launch-demo.mjs --pipeline-name <name> --region <r>
+*/
+import { Command } from 'commander';
+import { OSISClient, GetPipelineCommand } from '@aws-sdk/client-osis';
+import { STSClient, GetCallerIdentityCommand } from '@aws-sdk/client-sts';
+import { launchDemoInstance } from '../src/ec2-demo.mjs';
+import { printError, printInfo, printSuccess } from '../src/ui.mjs';
+
+const program = new Command()
+.requiredOption('--pipeline-name <name>', 'Existing OSI pipeline name')
+.requiredOption('--region <region>', 'AWS region');
+program.parse(process.argv);
+const opts = program.opts();
+
+try {
+const sts = new STSClient({ region: opts.region });
+const { Account } = await sts.send(new GetCallerIdentityCommand({}));
+printInfo(`Account: ${Account}`);
+
+const osis = new OSISClient({ region: opts.region });
+const { Pipeline } = await osis.send(new GetPipelineCommand({ PipelineName: opts.pipelineName }));
+const urls = Pipeline?.IngestEndpointUrls || [];
+if (!urls.length) {
+printError(`Pipeline ${opts.pipelineName} has no ingest endpoints`);
+process.exit(1);
+}
+printInfo(`Ingest endpoint: https://${urls[0]}`);
+
+const cfg = {
+pipelineName: opts.pipelineName,
+region: opts.region,
+accountId: Account,
+ingestEndpoints: urls,
+};
+
+const instanceId = await launchDemoInstance(cfg);
+printSuccess(`Done. Instance ${instanceId}`);
+printInfo(`Connect: aws ssm start-session --target ${instanceId} --region ${opts.region}`);
+} catch (e) {
+printError(e.message);
+process.exit(1);
+}
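Most of `launch-demo.mjs` is AWS calls. Its one piece of real logic turns the `Pipeline` returned by OSIS `GetPipeline` into the `cfg` passed to `launchDemoInstance`, refusing to continue when there are no `IngestEndpointUrls`. The sketch below factors that step into a pure function so it can be checked against a hand-written response object instead of a live account. `buildDemoConfig` is a hypothetical name, and the account ID and endpoint hostname in the usage are placeholders.

```javascript
// Hypothetical pure extraction of the cfg-building step in bin/launch-demo.mjs.
// `pipeline` has the shape of the `Pipeline` field of an OSIS GetPipeline
// response; only IngestEndpointUrls is read, exactly as in the script.
function buildDemoConfig(pipeline, { pipelineName, region }, accountId) {
  const urls = pipeline?.IngestEndpointUrls || [];
  if (!urls.length) {
    // The script prints this message and exits 1; throwing lets the caller's catch do that.
    throw new Error(`Pipeline ${pipelineName} has no ingest endpoints`);
  }
  return { pipelineName, region, accountId, ingestEndpoints: urls };
}

// Placeholder values; the endpoint hostname is made up.
const cfg = buildDemoConfig(
  { IngestEndpointUrls: ['obs-stack-demo.example.invalid'] },
  { pipelineName: 'obs-stack-demo', region: 'us-east-1' },
  '123456789012',
);
console.log(`Ingest endpoint: https://${cfg.ingestEndpoints[0]}`);
// Ingest endpoint: https://obs-stack-demo.example.invalid
```

Throwing instead of calling `process.exit(1)` inside the helper keeps the exit path in one place, because the script's existing `catch` already prints `e.message` and exits with status 1.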

aws/cli-installer/src/ec2-demo.mjs

Lines changed: 32 additions & 5 deletions
@@ -7,7 +7,8 @@ import {
 EC2Client, RunInstancesCommand, DescribeInstancesCommand, TerminateInstancesCommand,
 CreateSecurityGroupCommand, AuthorizeSecurityGroupEgressCommand, DeleteSecurityGroupCommand,
 DescribeSecurityGroupsCommand, RevokeSecurityGroupEgressCommand,
-DescribeSubnetsCommand, waitUntilInstanceRunning, waitUntilInstanceTerminated,
+DescribeSubnetsCommand, DescribeInstanceTypeOfferingsCommand,
+waitUntilInstanceRunning, waitUntilInstanceTerminated,
 } from '@aws-sdk/client-ec2';
 import {
 IAMClient, CreateRoleCommand, PutRolePolicyCommand, CreateInstanceProfileCommand,
@@ -21,7 +22,7 @@ import {
 import { printStep, printSuccess, printWarning, printInfo, createSpinner } from './ui.mjs';
 
 const TAG_KEY = 'observability-stack:pipeline-name';
-const INSTANCE_TYPE = 't3.2xlarge';
+const INSTANCE_TYPE = 't3.xlarge';
 
 function tags(pipelineName, extra = {}) {
 return [
@@ -42,12 +43,22 @@ async function getLatestAL2023Ami(ssm) {
 return Parameter.Value;
 }
 
-async function getDefaultVpcSubnet(ec2) {
+async function getDefaultVpcSubnet(ec2, instanceType) {
 const { Subnets } = await ec2.send(new DescribeSubnetsCommand({
 Filters: [{ Name: 'default-for-az', Values: ['true'] }],
 }));
 if (!Subnets?.length) throw new Error('No default VPC subnet found. Ensure a default VPC exists in this region.');
-return Subnets[0];
+
+const { InstanceTypeOfferings } = await ec2.send(new DescribeInstanceTypeOfferingsCommand({
+LocationType: 'availability-zone',
+Filters: [{ Name: 'instance-type', Values: [instanceType] }],
+}));
+const supportedAzs = new Set((InstanceTypeOfferings || []).map(o => o.Location));
+const match = Subnets.find(s => supportedAzs.has(s.AvailabilityZone));
+if (!match) {
+throw new Error(`No default VPC subnet is in an AZ that supports ${instanceType}. Supported AZs: ${[...supportedAzs].join(', ') || '(none)'}`);
+}
+return match;
 }
 
 function buildUserData(cfg) {
@@ -123,6 +134,17 @@ cat > docker-compose/otel-collector/config.yaml << 'COLLECTOREOF'
 ${collectorConfig}
 COLLECTOREOF
 
+# Patch otel-demo frontend-proxy: the upstream envoy template references
+# \${TELEMETRY_DOCS_HOST}/\${TELEMETRY_DOCS_PORT} but those aren't wired through
+# docker-compose, so envoy bootstraps with an empty socket address and crash-loops.
+# Inject the vars into .env and forward them into the frontend-proxy service.
+if ! grep -q '^TELEMETRY_DOCS_HOST=' .env; then
+printf '\nTELEMETRY_DOCS_HOST=otel-collector\nTELEMETRY_DOCS_PORT=4318\n' >> .env
+fi
+if ! grep -q 'TELEMETRY_DOCS_HOST' docker-compose.otel-demo.yml; then
+sed -i '/^ - FLAGD_UI_PORT$/a\\ - TELEMETRY_DOCS_HOST\\n - TELEMETRY_DOCS_PORT' docker-compose.otel-demo.yml
+fi
+
 # Write a standalone compose file for managed mode (no local backends)
 cat > /opt/obs-stack/docker-compose.managed.yml << 'MANAGEDEOF'
 # Managed mode: only collector + workload services, no local backends
@@ -163,6 +185,11 @@ services:
 logging: *logging
 MANAGEDEOF
 
+# Kafka's healthcheck can exceed compose's dependency grace window on first boot,
+# leaving kafka-dependent services in 'Created' state. Retry once — second pass
+# finds kafka healthy and starts the stragglers.
+docker compose -f docker-compose.managed.yml up -d || true
+sleep 60
 docker compose -f docker-compose.managed.yml up -d
 `).toString('base64');
 }
@@ -253,7 +280,7 @@ export async function launchDemoInstance(cfg) {
 const ssm = new SSMClient({ region: cfg.region });
 
 const spinner = createSpinner('Looking up AMI and subnet...');
-const [ami, subnet] = await Promise.all([getLatestAL2023Ami(ssm), getDefaultVpcSubnet(ec2)]);
+const [ami, subnet] = await Promise.all([getLatestAL2023Ami(ssm), getDefaultVpcSubnet(ec2, INSTANCE_TYPE)]);
 spinner.stop(`AMI: ${ami}`);
 
 const sgSpinner = createSpinner('Creating security group...');
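The revised `getDefaultVpcSubnet(ec2, instanceType)` no longer returns the first default subnet blindly. It asks EC2 which Availability Zones offer the instance type (`DescribeInstanceTypeOfferings` with `LocationType: 'availability-zone'`) and returns the first default subnet whose AZ is in that set. The sketch below extracts the selection step into a pure function over the two response arrays so the behaviour can be seen without AWS calls. `pickSubnetForInstanceType` is a hypothetical name, and the subnet IDs and AZ availability in the sample data are made up.

```javascript
// Hypothetical pure extraction of the selection logic in getDefaultVpcSubnet().
// `subnets` has the shape of DescribeSubnets' Subnets array and `offerings`
// the shape of DescribeInstanceTypeOfferings' InstanceTypeOfferings array.
function pickSubnetForInstanceType(subnets, offerings, instanceType) {
  if (!subnets?.length) {
    throw new Error('No default VPC subnet found. Ensure a default VPC exists in this region.');
  }
  // With LocationType 'availability-zone', each offering's Location is an AZ name.
  const supportedAzs = new Set((offerings || []).map((o) => o.Location));
  const match = subnets.find((s) => supportedAzs.has(s.AvailabilityZone));
  if (!match) {
    throw new Error(`No default VPC subnet is in an AZ that supports ${instanceType}. Supported AZs: ${[...supportedAzs].join(', ') || '(none)'}`);
  }
  return match;
}

// Made-up data: in this sample, us-east-1e has no offering for the type,
// so its subnet is skipped even though it is listed first.
const subnets = [
  { SubnetId: 'subnet-aaaa', AvailabilityZone: 'us-east-1e' },
  { SubnetId: 'subnet-bbbb', AvailabilityZone: 'us-east-1a' },
];
const offerings = [
  { InstanceType: 't3.xlarge', LocationType: 'availability-zone', Location: 'us-east-1a' },
];
console.log(pickSubnetForInstanceType(subnets, offerings, 't3.xlarge').SubnetId); // subnet-bbbb
```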

docker-compose.otel-demo.yml

Lines changed: 2 additions & 0 deletions
@@ -333,6 +333,8 @@ services:
 - FLAGD_PORT
 - FLAGD_UI_HOST
 - FLAGD_UI_PORT
+- TELEMETRY_DOCS_HOST
+- TELEMETRY_DOCS_PORT
 depends_on:
 frontend:
 condition: service_started
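With this hunk the repository's compose file lists `TELEMETRY_DOCS_HOST`/`TELEMETRY_DOCS_PORT` directly. The bootstrap scripts' `grep -q` guard therefore turns their `sed -i` patch into a no-op on a checkout that already contains them, which keeps the user-data safe to re-run. The sketch below is a JavaScript equivalent of that guarded insert. `ensureAfterLine` is hypothetical and the YAML snippet is illustrative. Unlike the shell version's anchored `sed` pattern, it matches the anchor line by trimmed text and copies that line's indentation.

```javascript
// Hypothetical JS equivalent of the guarded `grep -q ... || sed -i '/FLAGD_UI_PORT/a ...'` patch.
// Inserts `entries` as list items after the line whose trimmed text equals `anchor`,
// reusing that line's indentation; does nothing if the first entry is already present.
function ensureAfterLine(yamlText, anchor, entries) {
  if (yamlText.includes(entries[0])) return yamlText; // same guard as `grep -q TELEMETRY_DOCS_HOST`
  const lines = yamlText.split('\n');
  const idx = lines.findIndex((line) => line.trim() === anchor);
  if (idx === -1) return yamlText; // like sed, leave the file unchanged when the anchor is absent
  const indent = lines[idx].match(/^\s*/)[0];
  lines.splice(idx + 1, 0, ...entries.map((name) => `${indent}- ${name}`));
  return lines.join('\n');
}

// Illustrative excerpt of a service's environment list.
const before = [
  '    environment:',
  '      - FLAGD_UI_HOST',
  '      - FLAGD_UI_PORT',
  '    depends_on:',
].join('\n');

const vars = ['TELEMETRY_DOCS_HOST', 'TELEMETRY_DOCS_PORT'];
const once = ensureAfterLine(before, '- FLAGD_UI_PORT', vars);
const twice = ensureAfterLine(once, '- FLAGD_UI_PORT', vars);
console.log(once === twice); // true: the second run is a no-op
```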

docs/starlight-docs/src/content/docs/deploy/aws.mdx

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ The IAM principal sending data needs `osis:Ingest` permission on the pipeline AR
 
 **CLI installer:**
 ```bash
-node bin/cli-installer.mjs --destroy --pipeline-name obs-stack --region us-west-2
+node bin/cli-installer.mjs destroy --pipeline-name obs-stack --region us-west-2
 ```
 
 The destroy command automatically detects and cleans up both managed domains and serverless collections associated with the pipeline name.
