Commit c3e78b8

second iteration
1 parent 4a1e540 commit c3e78b8

7 files changed: +236 -1002 lines changed

paradedb/sample-movie-search/.gitignore

Lines changed: 3 additions & 0 deletions
@@ -27,3 +27,6 @@ npm-debug.log*
 # Local env
 .env
 .env.local
+
+# Data
+data/

paradedb/sample-movie-search/Makefile

Lines changed: 26 additions & 10 deletions
@@ -1,19 +1,23 @@
-.PHONY: install deploy init seed destroy clean help web-ui test-search get-api-url
+.PHONY: install deploy init seed destroy clean help web-ui test-search get-api-url download-data
+
+DATASET_URL := https://docs.aws.amazon.com/opensearch-service/latest/developerguide/samples/sample-movies.zip
+DATA_DIR := data

 help:
 	@echo "ParadeDB Movie Search Sample App"
 	@echo ""
 	@echo "Usage:"
-	@echo " make install - Install all dependencies"
-	@echo " make deploy - Deploy CDK stack to LocalStack"
-	@echo " make init - Initialize database schema and BM25 index"
-	@echo " make seed - Load movie data from S3 into ParadeDB"
-	@echo " make web-ui - Run the Web UI on localhost port 3000"
-	@echo " make destroy - Tear down the stack"
-	@echo " make clean - Remove build artifacts"
+	@echo " make install - Install all dependencies"
+	@echo " make download-data - Download AWS sample movies dataset"
+	@echo " make deploy - Deploy CDK stack to LocalStack"
+	@echo " make init - Initialize database schema and BM25 index"
+	@echo " make seed - Load movie data from S3 into ParadeDB"
+	@echo " make web-ui - Run the Web UI on localhost port 3000"
+	@echo " make destroy - Tear down the stack"
+	@echo " make clean - Remove build artifacts"
 	@echo ""
 	@echo "Quick start:"
-	@echo " make install && make deploy && make init && make seed && make web-ui"
+	@echo " make install && make download-data && make deploy && make init && make seed && make web-ui"

 install:
 	@echo "Installing CDK dependencies..."
@@ -22,6 +26,18 @@ install:
 	cd lambda && npm install
 	@echo "Done!"

+download-data:
+	@echo "Downloading AWS sample movies dataset..."
+	@mkdir -p $(DATA_DIR)
+	@curl -sL $(DATASET_URL) -o $(DATA_DIR)/sample-movies.zip
+	@echo "Extracting dataset..."
+	@unzip -o $(DATA_DIR)/sample-movies.zip -d $(DATA_DIR)/
+	@echo "Pre-processing bulk file (removing index instructions)..."
+	@grep -v '^{ "index"' $(DATA_DIR)/sample-movies.bulk > $(DATA_DIR)/movies.bulk
+	@rm -rf $(DATA_DIR)/sample-movies.zip $(DATA_DIR)/sample-movies.bulk $(DATA_DIR)/__MACOSX
+	@echo "Dataset ready: $(DATA_DIR)/movies.bulk"
+	@wc -l $(DATA_DIR)/movies.bulk | awk '{print "Total movies: " $$1}'
+
 deploy:
 	@echo "Deploying MovieSearchStack to LocalStack..."
 	cdklocal bootstrap
@@ -74,7 +90,7 @@ destroy:
 	@echo "Stack destroyed!"

 clean:
-	rm -rf node_modules lambda/node_modules cdk.out dist
+	rm -rf node_modules lambda/node_modules cdk.out dist data/movies.bulk
 	@echo "Cleaned!"

 get-api-url:
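
The pre-processing step in the new `download-data` target can be read as a simple line filter: drop every bulk action line and keep the document lines, one JSON object per line. The sketch below mirrors the `grep -v '^{ "index"'` call in Python purely for illustration. The function name `strip_bulk_actions` and the two sample records are hypothetical, and the exact shape of the action lines in the real dataset is an assumption.

```python
import io
import json


def strip_bulk_actions(lines):
    """Yield document lines only, skipping lines that start with '{ "index"'.

    Mirrors the Makefile's grep -v '^{ "index"' filter (prefix match only).
    """
    for line in lines:
        if line.startswith('{ "index"'):
            continue
        yield line


# Hypothetical sample in bulk layout: an action line followed by a document line.
sample = io.StringIO(
    '{ "index": { "_index": "movies", "_id": "tt0111161" } }\n'
    '{"title": "The Shawshank Redemption", "year": 1994}\n'
    '{ "index": { "_index": "movies", "_id": "tt0000000" } }\n'
    '{"title": "Placeholder Movie", "year": 2000}\n'
)

docs = list(strip_bulk_actions(sample))
movies = [json.loads(doc) for doc in docs]  # each remaining line is one JSON document
print(f"Total movies: {len(movies)}")
```

Counting the surviving lines corresponds to the `wc -l` call at the end of the target, since each remaining line is one movie.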

paradedb/sample-movie-search/README.md

Lines changed: 66 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -11,13 +11,24 @@ This sample app deploys a serverless movie search application using:
 - **Amazon S3** - Stores movie dataset
 - **ParadeDB** - Full-text search engine (runs as LocalStack extension)

+### Dataset
+
+Uses the official [AWS OpenSearch sample movies dataset](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/samples/sample-movies.zip) containing **5,000 movies** with metadata including:
+
+- Title, year, genres, rating
+- Directors and actors
+- Plot descriptions
+- Movie poster images
+- Runtime duration
+
 ### Features Demonstrated

 | Feature | Description |
 |---------|-------------|
 | **BM25 Ranking** | Industry-standard relevance scoring |
 | **Fuzzy Matching** | Handles typos (e.g., "Godfater" finds "Godfather") |
 | **Highlighting** | Returns matched text with highlighted terms |
+| **Movie Posters** | Rich UI with movie poster images |

 ### API Endpoints

@@ -48,19 +59,15 @@ localstack extensions install localstack-extension-paradedb
 localstack start
 ```

-### 2. Install Dependencies
+### 2. Install Dependencies and Download Dataset

 ```bash
 cd paradedb/sample-movie-search
 make install
+make download-data
 ```

-Or manually:
-
-```bash
-npm install
-cd lambda && npm install
-```
+The `download-data` target downloads the AWS sample movies dataset (~5000 movies) and preprocesses it for ParadeDB ingestion.

 ### 3. Deploy the Stack

@@ -110,19 +117,19 @@ make seed

 ```bash
 # Basic search
-curl "https://<api-id>.execute-api.localhost.localstack.cloud:4566/dev/search?q=redemption"
+curl "https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/search?q=redemption"

 # With pagination
-curl "https://<api-id>.execute-api.localhost.localstack.cloud:4566/dev/search?q=dark%20knight&limit=5&offset=0"
+curl "https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/search?q=dark%20knight&limit=5&offset=0"

 # Fuzzy search (handles typos)
-curl "https://<api-id>.execute-api.localhost.localstack.cloud:4566/dev/search?q=godfater"
+curl "https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/search?q=godfater"
 ```

 ### Get Movie Details

 ```bash
-curl "https://<api-id>.execute-api.localhost.localstack.cloud:4566/dev/movies/tt0111161"
+curl "https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/movies/tt0111161"
 ```

 ### Example Response
@@ -131,60 +138,75 @@ curl "https://<api-id>.execute-api.localhost.localstack.cloud:4566/dev/movies/tt
 {
   "success": true,
   "data": {
-    "results": [
-      {
-        "id": "tt0111161",
-        "title": "The Shawshank Redemption",
-        "year": 1994,
-        "genres": ["Drama"],
-        "rating": 9.3,
-        "directors": ["Frank Darabont"],
-        "actors": ["Tim Robbins", "Morgan Freeman", "Bob Gunton"],
-        "highlight": "...finding solace and eventual <mark>redemption</mark> through acts of common decency."
-      }
+    "id": "tt0111161",
+    "title": "The Shawshank Redemption",
+    "year": 1994,
+    "genres": [
+      "Crime",
+      "Drama"
     ],
-    "total": 1,
-    "limit": 10,
-    "offset": 0
+    "rating": 9.3,
+    "directors": [
+      "Frank Darabont"
+    ],
+    "actors": [
+      "Tim Robbins",
+      "Morgan Freeman",
+      "Bob Gunton"
+    ],
+    "plot": "Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.",
+    "image_url": "https://m.media-amazon.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_SX400_.jpg",
+    "release_date": "1994-09-10T00:00:00.000Z",
+    "rank": 80,
+    "running_time_secs": 8520
   }
 }
 ```

 ## Web UI

-A minimal web UI is included in the `web/` directory. To use it:
+A web UI with movie posters is included in the `web/` directory.

-1. Open `web/index.html` in a browser
-2. Set the API URL by opening the browser console and running:
+### Quick Start

-```javascript
-setApiUrl('https://<api-id>.execute-api.localhost.localstack.cloud:4566/dev')
+```bash
+make web-ui
 ```

-3. Start searching!
+This starts a local web server at http://localhost:3000. The UI automatically connects to the API Gateway at `http://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev`.
+
+### Features
+
+- Movie poster images from Amazon
+- Runtime display (e.g., "2h 22m")
+- Genre tags
+- Director and cast information
+- Search result highlighting
+- Pagination

 ## How It Works

-1. **Deployment**: CDK creates Lambda functions, API Gateway, and S3 bucket with movie data
+1. **Dataset Preparation**: Download and preprocess the AWS OpenSearch sample movies dataset
+
+2. **Deployment**: CDK creates Lambda functions, API Gateway, and S3 bucket with movie data (bulk format)

-2. **Initialization**: The init Lambda creates the movies table and ParadeDB BM25 index:
+3. **Initialization**: The init Lambda creates the movies table and ParadeDB BM25 index:
    ```sql
-   CALL paradedb.create_bm25(
-     index_name => 'movies_search_idx',
-     table_name => 'movies',
-     key_field => 'id',
-     text_fields => paradedb.field('title') || paradedb.field('plot')
-   );
+   CREATE INDEX movies_search_idx ON movies
+   USING bm25 (id, title, plot)
+   WITH (key_field = 'id');
    ```

-3. **Data Loading**: The seed Lambda reads `movies.json` from S3 and inserts into ParadeDB
+4. **Data Loading**: The seed Lambda reads `movies.bulk` from S3 (newline-delimited JSON) and inserts 5000 movies into ParadeDB

-4. **Search**: Queries use ParadeDB's BM25 search with fuzzy matching:
+5. **Search**: Queries use ParadeDB's BM25 search with fuzzy matching:
    ```sql
-   SELECT *, paradedb.snippet(plot) as highlight
+   SELECT id, title, year, genres, rating, directors, actors, image_url, running_time_secs,
+          pdb.snippet(plot, start_tag => '<mark>', end_tag => '</mark>') as highlight,
+          pdb.score(id) as score
    FROM movies
-   WHERE id @@@ paradedb.parse('title:query~1 OR plot:query~1')
-   ORDER BY paradedb.score(id) DESC
+   WHERE title ||| $1::pdb.fuzzy(1) OR plot ||| $1::pdb.fuzzy(1)
+   ORDER BY score DESC
    ```

 ## References
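
The search step in the README passes a distance of 1 to `pdb.fuzzy`, and the feature table gives "Godfater" finding "Godfather" as the example. One way to see why that typo qualifies is to compute the edit distance between the two words. The sketch below is a plain Levenshtein implementation for illustration only. It does not show how ParadeDB matches terms internally, and the helper names `edit_distance` and `within_fuzzy` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the two-row dynamic programming table."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def within_fuzzy(term: str, candidate: str, max_distance: int = 1) -> bool:
    """True when candidate is at most max_distance edits away (case-insensitive)."""
    return edit_distance(term.lower(), candidate.lower()) <= max_distance


print(edit_distance("godfater", "godfather"))    # one missing letter
print(within_fuzzy("Godfater", "Godfather"))
print(within_fuzzy("godfater", "goodfellas"))
```

A single inserted letter keeps the typo within distance 1, while an unrelated title falls outside it.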

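The README's Web UI feature list mentions a runtime display such as "2h 22m", and the example response carries `running_time_secs: 8520`. The conversion between the two is simple integer arithmetic, sketched here in Python for illustration. The function name `format_runtime` is hypothetical, and the UI's actual formatting code is not part of this diff.

```python
def format_runtime(seconds: int) -> str:
    """Format a duration in seconds as hours and minutes, dropping leftover seconds."""
    total_minutes = seconds // 60
    hours, minutes = divmod(total_minutes, 60)
    if hours == 0:
        return f"{minutes}m"
    return f"{hours}h {minutes}m"


# 8520 seconds is 142 minutes, which is 2 hours and 22 minutes.
print(format_runtime(8520))
```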