Commit 76c6853

markjbrown and Copilot committed
Refresh README to match current project reality
- Replace restaurants/Python-script narrative with real bookings/listings + monitor web app
- Document the four monitor-app tabs (Topology, Bookings, Vector Search, Load)
- Replace 'big red button' with the actual Promote to primary + Rebuild replica flow
- Drop references to non-existent scripts (query_examples.py, vector_restaurants_demo.py, generate_restaurants.py)
- Update repo structure to match what's actually in the tree
- Document start.ps1 as the one-shot launcher
- Add load-data.sh remote-seed usage
- Link to demo/01-local..06-multicloud runbooks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 3b22cf9 commit 76c6853

1 file changed

Lines changed: 110 additions & 76 deletions

README.md
@@ -6,111 +6,120 @@ Build, test, and scale across clouds with DocumentDB.
 **Event:** Techorama Belgium 2026 (May 11-13, Antwerp)
 **Speaker:** Mark Brown

-## What You'll Learn
+## What You'll See

-- Set up a complete local development environment in under 5 minutes
-- Build AI-powered features with built-in vector search (no API keys required)
-- Use Index Advisor to automatically optimize slow queries
-- Deploy to Kubernetes on any cloud provider
-- Implement cross-cloud failover for true high availability
-- Build CI/CD pipelines that test against real DocumentDB instances
+- Spin up DocumentDB locally with Docker Compose in under 60 seconds
+- Run semantic vector search on a real listings dataset (HNSW, 1536-dim, cosine)
+- Deploy DocumentDB across **AKS + EKS** with cross-cloud WAL replication over an Istio mesh
+- Drive a realistic travel-booking workload through a live web app
+- Trigger a one-click cross-cloud failover and watch writes flip in real time
+- Observe everything through Grafana dashboards on both clusters

-## Quick Start
+## Quick start — local

 ```bash
-# Clone and start everything (DocumentDB + auto-loaded sample data)
 git clone https://github.com/AzureCosmosDB/documentdb-local-to-multicloud.git
 cd documentdb-local-to-multicloud
 docker compose up -d
+```
+
+Docker Compose starts the `documentdb-local` container and auto-seeds the
+`bookingsdb.listings` collection (1,000 listing documents with 1536-dim
+`descriptionVector` fields) plus the vector + supporting indexes. Default
+credentials: `demo` / `demo` on `localhost:27017`.
+
+Connect with mongosh:

-# Connect with mongosh
+```bash
 mongosh "mongodb://demo:demo@localhost:27017/?tls=true&tlsAllowInvalidCertificates=true"
 ```

-That's it. DocumentDB is running locally with 20,000 restaurant documents and vector embeddings pre-loaded.
+Or use the **DocumentDB for VS Code** extension
+([Marketplace](https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-documentdb))
+and point it at the same connection string.

 ## Prerequisites

-- [Docker Desktop](https://www.docker.com/)
-- [Visual Studio Code](https://code.visualstudio.com/)
-- [DocumentDB for VS Code Extension](https://marketplace.visualstudio.com/items?itemName=ms-documentdb.vscode-documentdb)
-- Python 3.11+ (for demo scripts)
+- [Docker Desktop](https://www.docker.com/) — local stack
+- [Visual Studio Code](https://code.visualstudio.com/) + [DocumentDB for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-documentdb)
+- [mongosh](https://www.mongodb.com/try/download/shell) — optional shell access
+- [Node.js 20+](https://nodejs.org/) — for the monitor web app
+- For the multi-cloud demo: Azure CLI, AWS CLI, `eksctl`, `kubectl`, `helm`, and a bash-capable shell (Git Bash on Windows)
+
+See [SETUP.md](SETUP.md) for the full tooling matrix and cloud-account requirements.

-## Repository Structure
+## Repository structure

 ```
-├── scripts/ # Demo scripts (query, vector search, data gen)
-│ ├── query_examples.py # Index Advisor demo (before/after COLLSCAN→IXSCAN)
-│ ├── vector_restaurants_demo.py # Vector search with fake embeddings
-│ ├── fake_embeddings.py # Deterministic embeddings (no API key needed)
-│ ├── generate_restaurants.py # Generate synthetic restaurant data
-│ └── load_restaurants.py # Load data into DocumentDB
-├── data/ # Sample datasets
-│ ├── restaurants.json # 20K restaurant documents
-│ ├── restaurants_vectors.json # Same + 256-dim vector embeddings
-│ └── embedded_data.json # 1K Airbnb listings with OpenAI embeddings
+├── docker-compose.yml # Local DocumentDB + auto-seeded listings
+├── docker/seed/ # Seed container (loads data/listings_vectors.json)
+├── data/ # Demo datasets + load scripts
+│ ├── listings_vectors.json # 1K listings + 1536-dim embeddings
+│ ├── load-data.sh # Seed any DocumentDB target via $MONGODB_URI
+│ └── load-data.ps1 # PowerShell equivalent
+├── app/monitor-app/ # Live monitoring + failover web app (Node/Express)
 ├── infra/
-│ ├── azure/ # Bicep template + deploy script for AKS
-│ ├── aws/ # eksctl config + deploy script for EKS
-│ └── scripts/ # Start/stop for cost management
-├── docker-compose.yml # One-command local setup with auto-seeding
-├── docker/seed/ # Auto-seed containers
-├── k8s/ # Kubernetes manifests (AKS + EKS)
-├── demo/ # Per-section demo guides with commands
-├── docs/ # Presenter runbook + technical docs
-├── monitoring/ # Prometheus alerts
-├── tests/ # Integration tests
-├── .github/workflows/ # CI/CD with DocumentDB emulator
-├── .devcontainer/ # GitHub Codespaces config
-└── SETUP.md # Detailed setup instructions
-```
-
-## Demo Scripts
-
-### Index Advisor Demo (query_examples.py)
-
-Interactive demo showing before/after query optimization:
-
-```bash
-pip install -r requirements.txt
-python scripts/query_examples.py
+│ ├── multi-cloud/ # Fleet hub + AKS + EKS + Istio + operator
+│ ├── azure/ # Standalone AKS deploy
+│ └── aws/ # Standalone EKS deploy
+├── monitoring/ # kube-prometheus-stack Helm values + dashboards
+├── demo/ # Per-section runbooks (01-local … 06-multicloud)
+├── docs/ # Presenter runbook + technical notes
+├── scripts/ # Ad-hoc utilities (cert recovery, data load helpers)
+├── start.ps1 / start.bat # Launch monitor app + Grafana tabs
+├── fix-rebuild.bat # Manual recovery for operator cert-mismatch bug
+└── SETUP.md # Full setup instructions
 ```

-Shows 6 scenarios with COLLSCAN → IXSCAN transitions, timing comparisons, and compound index creation.
+## The monitor app

-### Vector Search Demo (vector_restaurants_demo.py)
+The centerpiece of the live demo is the **Multi-Cloud Monitor** web app at
+[`app/monitor-app/`](app/monitor-app/README.md). One launcher does everything:

-Semantic search using deterministic fake embeddings (no OpenAI API key required):
-
-```bash
-python scripts/vector_restaurants_demo.py --query "cozy romantic date night pasta" --mode compact --k 10
+```powershell
+# From the repo root
+.\start.ps1
 ```

-### Cross-Cloud Failover Demo App (app/failover-demo)
+This sets the env vars, launches the Node server on `http://localhost:5174`,
+and opens the monitor UI plus both Grafana dashboards. Flags:
+`-NoGrafana` skips the Grafana tabs, `-NoBrowser` skips opening tabs entirely.

-Live web app with a "big red button" that promotes the EKS replica to primary while writes flip in real time. See [`app/failover-demo/README.md`](app/failover-demo/README.md).
+The UI has four tabs:

-### Data Generation
+| Tab | What it does |
+| --- | --- |
+| **Topology** | Live status of every cluster + per-replica WAL lag from `pg_stat_replication`. **Promote to primary** button triggers a one-click cross-cloud failover via `kubectl documentdb promote`. **Rebuild replica** wipes the replica's PVCs and rebuilds via `pg_basebackup` — with a background watcher that auto-recovers from a known cert-mismatch bug in the operator. |
+| **Bookings** | Direct Mongo reads/writes against the **current primary** for `bookingsdb.bookings`. Each insert is followed by a wait-for-replication check so you can see the row appear on the replica with a measured lag. |
+| **Vector Search** | `$vectorSearch` queries over `bookingsdb.listings` against the **local Docker** instance, using the HNSW index. Try queries like *"cozy mountain cabin with hot tub"*. |
+| **Load** | A travel-booking workload generator: 80% browse / 15% detail / 4% insert / 1% confirm against the current primary. RPS slider with presets (idle / morning / peak / Black Friday) and an in-flight semaphore to keep client and server pools healthy. |
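The Load tab's 80/15/4/1 mix can be pictured as a weighted choice over a roll from 0 to 99. The sketch below is only an illustration in shell: `pick_op` is a hypothetical name that does not appear in the repo, and the monitor app (a Node server) may implement its generator quite differently.

```bash
# Hypothetical helper: map a roll in 0..99 to an operation, using the
# 80% browse / 15% detail / 4% insert / 1% confirm mix described for the Load tab.
pick_op() {
  roll=$1
  if [ "$roll" -lt 80 ]; then
    echo browse
  elif [ "$roll" -lt 95 ]; then
    echo detail
  elif [ "$roll" -lt 99 ]; then
    echo insert
  else
    echo confirm
  fi
}

# Walk every possible roll once and tally the results to check the mix.
browse=0; detail=0; insert=0; confirm=0
roll=0
while [ "$roll" -lt 100 ]; do
  case "$(pick_op "$roll")" in
    browse)  browse=$((browse + 1)) ;;
    detail)  detail=$((detail + 1)) ;;
    insert)  insert=$((insert + 1)) ;;
    confirm) confirm=$((confirm + 1)) ;;
  esac
  roll=$((roll + 1))
done
echo "browse=$browse detail=$detail insert=$insert confirm=$confirm"
# prints: browse=80 detail=15 insert=4 confirm=1
```

In a real generator the roll would come from a random source (in bash, for example, `$((RANDOM % 100))`) instead of a fixed walk over every value.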

-Generate fresh restaurant data with configurable hot clusters:
+First-time setup of the monitor:

-```bash
-python scripts/generate_restaurants.py --count 5000 --hot-count 1000 --hot-cuisine Italian
+```powershell
+cd app\monitor-app
+npm install
 ```

-## Multi-Cloud Deployment
+## Multi-cloud deployment

-> **Shell on Windows**: run these scripts from **Git Bash**. WSL has DNS issues with `login.microsoftonline.com`. PowerShell is fine for the docker compose / Python steps above, but the deploy/cleanup scripts call bash directly.
+> **Shell on Windows:** run the deploy/cleanup scripts from **Git Bash**.
+> WSL has DNS issues with `login.microsoftonline.com`. PowerShell is fine for
+> `docker compose`, `npm`, and the CLI tools above.

-The talk now uses the upstream `documentdb-playground/multi-cloud-deployment`
-setup (vendored into [`infra/multi-cloud/`](infra/multi-cloud/README.md)) which
-gives **real cross-cloud replication** instead of two unrelated clusters:
+The talk uses the upstream `documentdb-playground/multi-cloud-deployment`
+setup (vendored into [`infra/multi-cloud/`](infra/multi-cloud/README.md)),
+which gives **real cross-cloud replication** instead of two unrelated
+clusters:

-- **AKS Fleet hub** in eastus2 (KubeFleet control plane)
+- **AKS Fleet hub** in eastus2 (KubeFleet control plane for the DocumentDB CR)
 - **AKS member** in eastus2 (DocumentDB primary by default)
 - **EKS member** in us-west-2 (WAL replica)
 - **Istio multi-cluster mesh** with shared root CA + east-west gateways for
   cross-cloud service discovery and mTLS-encrypted WAL replication
+- **DocumentDB operator** on top of CloudNativePG, deployed to all members
+- **kube-prometheus-stack** on each member, exposing Grafana with a shared
+  `documentdb-failover` dashboard

 ```bash
 # Sign in
@@ -128,29 +137,54 @@ bash infra/multi-cloud/cleanup.sh -y --wait
 ```

 See [`infra/multi-cloud/README.md`](infra/multi-cloud/README.md) for env-var
-configuration (e.g. `INCLUDE_GKE=true` for a third leg, `PRIMARY_CLUSTER=...`
-to start with EKS as primary).
+configuration (`INCLUDE_GKE=true` for a third leg, `PRIMARY_CLUSTER=...` to
+start with EKS as primary, etc.).

 The standalone single-cluster deploys are still in `infra/azure/` and
 `infra/aws/` for anyone who wants the simpler "one cloud at a time" demo, but
-the talk and `demo/04-aks` / `demo/05-eks` / `demo/06-multicloud` runbooks
+the talk and the `demo/04-aks`, `demo/05-eks`, `demo/06-multicloud` runbooks
 target the multi-cloud stack.

-### Cost Management
+### Cost management

 ```bash
-bash infra/scripts/start.sh # Start clusters for rehearsal/demo
-bash infra/scripts/stop.sh # Stop AKS / delete EKS to save costs
+bash infra/scripts/start.sh # Start clusters for rehearsal
+bash infra/scripts/stop.sh # Stop AKS / delete EKS to save costs
 bash infra/multi-cloud/cleanup.sh -y --wait # Full multi-cloud teardown
 ```

 | Stack | Running | Stopped | Cleaned up |
 | --- | --- | --- | --- |
 | Multi-cloud (Fleet + AKS + EKS) | ~$13/day | n/a (tear down to save) | $0 |
 | AKS standalone | ~$17/day | ~$0.03/day (disk) | $0 |
-| EKS standalone | ~$5-8/day | n/a (must delete) | $0 |
+| EKS standalone | ~$5–8/day | n/a (must delete) | $0 |
+
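As a rough budgeting aid, the approximate daily rates in the table can be multiplied out over a rehearsal window. This is a minimal sketch: `estimate_cost` is a hypothetical helper, the rates are the rounded "Running" figures from the table, and actual cloud bills will vary.

```bash
# Hypothetical helper: whole-dollar daily rate multiplied by a number of days.
estimate_cost() {
  rate_per_day=$1
  days=$2
  echo $((rate_per_day * days))
}

# Multi-cloud stack (~$13/day in the table) left running for 3 days.
echo "multi-cloud, 3 days: ~\$$(estimate_cost 13 3)"
# AKS standalone (~$17/day in the table) for the same window.
echo "AKS standalone, 3 days: ~\$$(estimate_cost 17 3)"
```

The point of the arithmetic is simply that anything left running between rehearsals adds up, which is why the table lists a teardown path for every stack.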
+## Loading data into a remote cluster
+
+`data/load-data.sh` (or `load-data.ps1`) seeds any DocumentDB instance —
+local, AKS, or EKS — given a `MONGODB_URI`. It loads
+`data/listings_vectors.json` into `bookingsdb.listings`, creates the
+`vectorSearchIndex` (HNSW, cosine, 1536-dim) and the supporting query indexes,
+and is idempotent.
+
+```bash
+MONGODB_URI="mongodb://demo:demo@<lb-host>:10260/?tls=true&tlsAllowInvalidCertificates=true" \
+bash data/load-data.sh
+```
+
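When switching between local and remote targets, it can help to assemble the connection string from its parts. The sketch below assumes the same URI shape as the examples in this README. `build_uri` is a hypothetical helper, not something the repo provides, and it does no percent-encoding, so it only suits simple credentials such as the `demo` / `demo` defaults.

```bash
# Hypothetical helper: build a DocumentDB connection string from
# user, password, host and port, in the shape used by the README examples.
build_uri() {
  user=$1
  pass=$2
  host=$3
  port=$4
  printf 'mongodb://%s:%s@%s:%s/?tls=true&tlsAllowInvalidCertificates=true\n' \
    "$user" "$pass" "$host" "$port"
}

# Local Docker instance with the default credentials.
build_uri demo demo localhost 27017
# prints: mongodb://demo:demo@localhost:27017/?tls=true&tlsAllowInvalidCertificates=true
```

For a remote cluster the result could be passed straight to the loader, for example `MONGODB_URI="$(build_uri demo demo "$LB_HOST" 10260)" bash data/load-data.sh`, where `LB_HOST` is an assumed variable holding the load-balancer hostname.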
+## Demo runbooks
+
+Section-by-section command sheets the speaker actually uses:
+
+- [`demo/01-local`](demo/01-local) — Local Docker + VS Code extension
+- [`demo/02-vector-search`](demo/02-vector-search) — Vector search walkthrough
+- [`demo/03-cicd`](demo/03-cicd) — CI/CD against a DocumentDB emulator
+- [`demo/04-aks`](demo/04-aks) — Single-cloud on AKS
+- [`demo/05-eks`](demo/05-eks) — Single-cloud on EKS
+- [`demo/06-multicloud`](demo/06-multicloud) — Cross-cloud failover

-See [SETUP.md](SETUP.md) for detailed instructions.
+Presenter notes (timing, fallback plans, talk-track) live in
+[`docs/presenter-runbook.md`](docs/presenter-runbook.md).

 ## License
