| 1 | +# ParadeDB Movie Search Sample App |
| 2 | + |
| 3 | +A CDK application demonstrating ParadeDB's full-text search capabilities with LocalStack. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +This sample app deploys a serverless movie search application using: |
| 8 | + |
| 9 | +- **AWS Lambda** - Handles search and data operations |
| 10 | +- **Amazon API Gateway** - REST API endpoints |
| 11 | +- **Amazon S3** - Stores the movie dataset |
| 12 | +- **ParadeDB** - Full-text search engine (runs as a LocalStack extension) |
| 13 | + |
| 14 | +### Dataset |
| 15 | + |
| 16 | +The app uses the official [AWS OpenSearch sample movies dataset](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/samples/sample-movies.zip), which contains **5,000 movies** with metadata including: |
| 17 | + |
| 18 | +- Title, year, genres, rating |
| 19 | +- Directors and actors |
| 20 | +- Plot descriptions |
| 21 | +- Movie poster images |
| 22 | +- Runtime duration |
| 23 | + |
| 24 | +### Features Demonstrated |
| 25 | + |
| 26 | +| Feature | Description | |
| 27 | +|---------|-------------| |
| 28 | +| **BM25 Ranking** | Industry-standard relevance scoring | |
| 29 | +| **Fuzzy Matching** | Handles typos (e.g., "Godfater" finds "Godfather") | |
| 30 | +| **Highlighting** | Returns matched text with highlighted terms | |
| 31 | +| **Movie Posters** | Rich UI with movie poster images | |
| 32 | + |
| 33 | +### API Endpoints |
| 34 | + |
| 35 | +| Method | Endpoint | Description | |
| 36 | +|--------|----------|-------------| |
| 37 | +| GET | `/search?q=<query>` | Search movies with BM25 ranking | |
| 38 | +| GET | `/movies/{id}` | Get movie details by ID | |
| 39 | +| POST | `/admin/init` | Initialize database schema | |
| 40 | +| POST | `/admin/seed` | Load movie data from S3 | |
| 41 | + |
| 42 | +## Prerequisites |
| 43 | + |
| 44 | +- [LocalStack](https://localstack.cloud/) installed and running |
| 45 | +- [Node.js](https://nodejs.org/) 18+ installed |
| 46 | +- [AWS CDK Local](https://github.com/localstack/aws-cdk-local) (`npm install -g aws-cdk-local`) |
| 47 | +- [AWS CLI](https://aws.amazon.com/cli/) configured |
| 48 | +- ParadeDB extension installed in LocalStack |
| 49 | + |
| 50 | +## Setup |
| 51 | + |
| 52 | +### 1. Start LocalStack with ParadeDB Extension |
| 53 | + |
| 54 | +```bash |
| 55 | +# Install the ParadeDB extension |
| 56 | +localstack extensions install localstack-extension-paradedb |
| 57 | + |
| 58 | +# Start LocalStack |
| 59 | +localstack start |
| 60 | +``` |
| 61 | + |
| 62 | +### 2. Install Dependencies and Download Dataset |
| 63 | + |
| 64 | +```bash |
| 65 | +cd paradedb/sample-movie-search |
| 66 | +make install |
| 67 | +make download-data |
| 68 | +``` |
| 69 | + |
| 70 | +The `download-data` target downloads the AWS sample movies dataset (about 5,000 movies) and preprocesses it for ParadeDB ingestion. |
| 71 | + |
| 72 | +### 3. Deploy the Stack |
| 73 | + |
| 74 | +```bash |
| 75 | +make deploy |
| 76 | +``` |
| 77 | + |
| 78 | +Or manually: |
| 79 | + |
| 80 | +```bash |
| 81 | +cdklocal bootstrap |
| 82 | +cdklocal deploy |
| 83 | +``` |
| 84 | + |
| 85 | +After deployment, you'll see output similar to: |
| 86 | + |
| 87 | +``` |
| 88 | +Outputs: |
| 89 | +MovieSearchStack.ApiEndpoint = https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/ |
| 90 | +MovieSearchStack.DataBucketName = movie-search-data |
| 91 | +MovieSearchStack.InitEndpoint = https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/admin/init |
| 92 | +MovieSearchStack.MovieSearchApiEndpointB25066EC = https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/ |
| 93 | +MovieSearchStack.MoviesEndpoint = https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/movies/{id} |
| 94 | +MovieSearchStack.SearchEndpoint = https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/search |
| 95 | +MovieSearchStack.SeedEndpoint = https://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/admin/seed |
| 96 | +``` |
| 97 | + |
| 98 | +### 4. Initialize Database |
| 99 | + |
| 100 | +Create the movies table and BM25 search index: |
| 101 | + |
| 102 | +```bash |
| 103 | +make init |
| 104 | +``` |
| 105 | + |
| 106 | +### 5. Seed Data |
| 107 | + |
| 108 | +Load movie data from S3 into ParadeDB: |
| 109 | + |
| 110 | +```bash |
| 111 | +make seed |
| 112 | +``` |
| 113 | + |
| 114 | +## Usage |
| 115 | + |
| 116 | +### Search Movies |
| 117 | + |
| 118 | +```bash |
| 119 | +# Basic search |
| 120 | +curl "http://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/search?q=redemption" |
| 121 | + |
| 122 | +# With pagination |
| 123 | +curl "http://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/search?q=dark%20knight&limit=5&offset=0" |
| 124 | + |
| 125 | +# Fuzzy search (handles typos) |
| 126 | +curl "http://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/search?q=godfater" |
| 127 | +``` |
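The same requests can be built from a script. The following is a minimal Python sketch, not part of the sample app: `build_search_url` is a hypothetical helper, and the base URL and the `q`, `limit`, and `offset` parameters are taken from the curl examples above.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the curl examples; adjust it if your deployment differs.
API_BASE = "http://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev"


def build_search_url(query, limit=None, offset=None):
    """Return the /search URL for a query, with optional pagination.

    Hypothetical helper for illustration; it only builds the URL.
    """
    params = {"q": query}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    # quote_via=quote encodes spaces as %20, matching the curl examples.
    return f"{API_BASE}/search?{urlencode(params, quote_via=quote)}"


print(build_search_url("dark knight", limit=5, offset=0))
```

The resulting URL can be passed to `curl` or to any HTTP client once the stack is deployed.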
| 128 | + |
| 129 | +### Get Movie Details |
| 130 | + |
| 131 | +```bash |
| 132 | +curl "http://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev/movies/tt0111161" |
| 133 | +``` |
| 134 | + |
| 135 | +### Example Response |
| 136 | + |
| 137 | +```json |
| 138 | +{ |
| 139 | + "success": true, |
| 140 | + "data": { |
| 141 | + "id": "tt0111161", |
| 142 | + "title": "The Shawshank Redemption", |
| 143 | + "year": 1994, |
| 144 | + "genres": [ |
| 145 | + "Crime", |
| 146 | + "Drama" |
| 147 | + ], |
| 148 | + "rating": 9.3, |
| 149 | + "directors": [ |
| 150 | + "Frank Darabont" |
| 151 | + ], |
| 152 | + "actors": [ |
| 153 | + "Tim Robbins", |
| 154 | + "Morgan Freeman", |
| 155 | + "Bob Gunton" |
| 156 | + ], |
| 157 | + "plot": "Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.", |
| 158 | + "image_url": "https://m.media-amazon.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_SX400_.jpg", |
| 159 | + "release_date": "1994-09-10T00:00:00.000Z", |
| 160 | + "rank": 80, |
| 161 | + "running_time_secs": 8520 |
| 162 | + } |
| 163 | +} |
| 164 | +``` |
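A client might consume this response as sketched below. The field names come from the example above; `summarize` and `format_runtime` are hypothetical helpers, and treating a falsy `success` as an error is an assumption about how the API reports failures.

```python
import json

# Trimmed copy of the example response above.
response_body = (
    '{"success": true, "data": {"title": "The Shawshank Redemption", '
    '"year": 1994, "running_time_secs": 8520}}'
)


def format_runtime(seconds):
    """Convert a runtime in seconds to an 'Xh YYm' string."""
    hours, remainder = divmod(seconds, 3600)
    minutes = remainder // 60
    return f"{hours}h {minutes:02d}m"


def summarize(body):
    """Return a one-line summary of a movie detail response."""
    payload = json.loads(body)
    # Assumption: failed requests carry a falsy "success" field.
    if not payload.get("success"):
        raise ValueError("API reported failure")
    movie = payload["data"]
    runtime = format_runtime(movie["running_time_secs"])
    return f'{movie["title"]} ({movie["year"]}), {runtime}'


print(summarize(response_body))
```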
| 165 | + |
| 166 | +## Web UI |
| 167 | + |
| 168 | +A web UI with movie posters is included in the `web/` directory. |
| 169 | + |
| 170 | +### Quick Start |
| 171 | + |
| 172 | +```bash |
| 173 | +make web-ui |
| 174 | +``` |
| 175 | + |
| 176 | +This starts a local web server at http://localhost:3000. The UI automatically connects to the API Gateway at `http://movie-search-api.execute-api.localhost.localstack.cloud:4566/dev`. |
| 177 | + |
| 178 | +<img width="2880" height="1402" alt="Screenshot of the movie search web UI" src="https://gist.github.com/user-attachments/assets/63986bfe-709b-4bde-bac8-4df2b15bd41a" /> |
| 179 | + |
| 180 | +## How It Works |
| 181 | + |
| 182 | +1. **Dataset Preparation**: Download and preprocess the AWS OpenSearch sample movies dataset |
| 183 | + |
| 184 | +2. **Deployment**: CDK creates the Lambda functions, the API Gateway, and an S3 bucket holding the movie data (bulk format) |
| 185 | + |
| 186 | +3. **Initialization**: The init Lambda creates the movies table and ParadeDB BM25 index: |
| 187 | + ```sql |
| 188 | + CREATE INDEX movies_search_idx ON movies |
| 189 | + USING bm25 (id, title, plot) |
| 190 | + WITH (key_field = 'id'); |
| 191 | + ``` |
| 192 | + |
| 193 | +4. **Data Loading**: The seed Lambda reads `movies.bulk` from S3 (newline-delimited JSON) and inserts the 5,000 movies into ParadeDB |
| 194 | + |
| 195 | +5. **Search**: Queries use ParadeDB's BM25 search with fuzzy matching: |
| 196 | + ```sql |
| 197 | + SELECT id, title, year, genres, rating, directors, actors, image_url, running_time_secs, |
| 198 | + pdb.snippet(plot, start_tag => '<mark>', end_tag => '</mark>') as highlight, |
| 199 | + pdb.score(id) as score |
| 200 | + FROM movies |
| 201 | + WHERE title ||| $1::pdb.fuzzy(1) OR plot ||| $1::pdb.fuzzy(1) |
| 202 | + ORDER BY score DESC |
| 203 | + ``` |
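To illustrate the data loading step, here is a rough Python sketch of reading a newline-delimited bulk file. It assumes `movies.bulk` keeps the OpenSearch bulk layout, where an `{"index": ...}` action line precedes each document line; the preprocessed file and the actual seed Lambda may differ. `parse_bulk` is a hypothetical name.

```python
import json


def parse_bulk(text):
    """Parse newline-delimited bulk text into a list of movie dicts.

    Assumes each document line may be preceded by an {"index": {...}}
    action line whose "_id" identifies the document.
    """
    movies = []
    pending_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if set(record) == {"index"}:
            # Action line: remember the ID for the document that follows.
            pending_id = record["index"].get("_id")
            continue
        # Document line: keep its own "id" if present, else use the action ID.
        record.setdefault("id", pending_id)
        movies.append(record)
        pending_id = None
    return movies


sample = "\n".join([
    '{"index": {"_index": "movies", "_id": "tt0111161"}}',
    '{"title": "The Shawshank Redemption", "year": 1994}',
])
print(parse_bulk(sample))
```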
| 204 | + |
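The `pdb.fuzzy(1)` cast in the search query is what lets "godfater" match "Godfather". Assuming it tolerates an edit distance of up to one per term, the idea can be sketched with a plain Levenshtein distance in Python. This illustrates the concept only and is not ParadeDB's implementation, which matches against indexed tokens.

```python
def edit_distance(a, b):
    """Return the Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("godfater", "godfather"))
print(edit_distance("godfatr", "godfather"))
```

Under that assumption, "godfater" is one insertion away from "godfather" and would match, while "godfatr" is two edits away and would need a larger distance.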
| 205 | +## References |
| 206 | + |
| 207 | +- [ParadeDB Documentation](https://docs.paradedb.com/) |
| 208 | +- [LocalStack Extensions](https://docs.localstack.cloud/aws/tooling/extensions/) |
| 209 | +- [AWS CDK Local](https://github.com/localstack/aws-cdk-local) |