Commit a2b73ab

feat: gRPC VectorService gateway for binary-efficient vector upsert/query
1 parent af32a4f · commit a2b73ab

23 files changed: +1838 −95 lines


.github/workflows/ci.yml

Lines changed: 37 additions & 0 deletions
```yaml
name: CI

on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

jobs:
  lint-build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"

      - name: Install dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Build
        run: npm run build
        env:
          GOOGLE_GENERATIVE_AI_API_KEY: ${{ secrets.GOOGLE_GENERATIVE_AI_API_KEY || 'sk-dummy' }}
          UPSTASH_REDIS_REST_URL: ${{ secrets.UPSTASH_REDIS_REST_URL || 'https://dummy.upstash.io' }}
          UPSTASH_REDIS_REST_TOKEN: ${{ secrets.UPSTASH_REDIS_REST_TOKEN || 'dummy' }}
          UPSTASH_VECTOR_REST_URL: ${{ secrets.UPSTASH_VECTOR_REST_URL || 'https://dummy.upstash.io' }}
          UPSTASH_VECTOR_REST_TOKEN: ${{ secrets.UPSTASH_VECTOR_REST_TOKEN || 'dummy' }}

      - name: Test
        run: npm run test
```
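The final `Test` step runs the repo's Vitest suite, which per the README covers the chunking and observability utilities. As a rough illustration of the kind of pure function that suite exercises, here is a minimal sliding-window chunker sketch — the `chunkText` name and signature are assumptions, not the repo's actual API:

```typescript
// Hypothetical sliding-window chunker, roughly what src/lib/chunking.ts
// might expose (name and signature assumed, not the repo's actual code).
function chunkText(text: string, size: number, overlap: number): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
    start += size - overlap; // step forward, keeping `overlap` chars shared
  }
  return chunks;
}

// 250 chars, 100-char windows, 20-char overlap -> windows start at 0, 80, 160
const chunks = chunkText("a".repeat(250), 100, 20);
console.log(chunks.length); // 3
```

Functions like this are easy to unit-test in CI because they take strings in and return strings out, with no network or runtime dependencies.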

Dockerfile

Lines changed: 34 additions & 0 deletions
```dockerfile
# Build stage
FROM node:20-alpine AS builder

WORKDIR /app

COPY package.json package-lock.json* ./
RUN npm ci

COPY . .
RUN npm run build

# Production stage
FROM node:20-alpine AS runner

WORKDIR /app

ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs

COPY --from=builder /app/public ./public
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static

USER nextjs

EXPOSE 3000

ENV PORT=3000
ENV HOSTNAME="0.0.0.0"

CMD ["node", "server.js"]
```
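The runner stage copies `.next/standalone`, a directory Next.js emits only when standalone output is enabled. The commit doesn't show the repo's Next.js config, but for this Dockerfile to work it would need to contain something like:

```javascript
// next.config.js — assumed configuration, not shown in this diff.
// "standalone" output makes `next build` emit .next/standalone with a
// self-contained server.js, which the runner stage above copies and runs.
module.exports = {
  output: "standalone",
};
```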

README.md

Lines changed: 39 additions & 22 deletions
````diff
@@ -26,7 +26,11 @@
 - **Semantic Caching**: 30-60% cost reduction via intelligent query caching
 - **Streaming Citations**: Citations stream before the answer for instant transparency
 - **Hybrid Runtime**: Edge for chat, Node.js for ingestion (optimal performance)
+- **Observability**: OpenTelemetry (ingest/retrieval spans), Langfuse (LLM traces, token usage)
+- **CI/CD & Testing**: GitHub Actions (lint, build, test), Vitest unit tests
+- **Docker**: Standalone image + docker-compose for local/self-hosted run
 - **Modern UI**: Beautiful, responsive chat interface with real-time streaming
+- **Optional gRPC Gateway**: Binary-efficient vector upsert/query sidecar (Proto + Node server)
 
 ## Architecture
 
@@ -141,6 +145,10 @@ graph TB
 | **Cache** | Upstash Redis | Semantic caching for cost reduction |
 | **PDF Parsing** | `pdfjs-dist` | Client-side to avoid Edge limits |
 | **UI** | React + `@ai-sdk/react` | Streaming chat with citation support |
+| **Observability** | OpenTelemetry, Langfuse | Traces (ingest/retrieval), LLM generations & token usage |
+| **CI/CD** | GitHub Actions | Lint, build, test on push/PR |
+| **Testing** | Vitest | Unit tests (chunking, observability) |
+| **Containers** | Docker, docker-compose | Local/self-hosted run |
 
 ## Project Structure
 
@@ -157,32 +165,23 @@ serverless-rag/
 │   │   └── page.tsx           # Main chat UI
 │   ├── lib/
 │   │   ├── chunking.ts        # Text chunking utilities
-│   │   ├── retrieval.ts       # Vector search & cache logic
-│   │   └── upstash.ts         # Upstash client initialization
-│   └── proxy.ts               # Semantic cache proxy
+│   │   ├── observability.ts   # OpenTelemetry spans (Node)
+│   │   ├── retrieval.ts       # Vector search & cache logic
+│   │   └── upstash.ts         # Upstash client initialization
+│   ├── instrumentation.ts     # Next.js OTel registration
+│   └── proxy.ts               # Semantic cache proxy
+├── .github/workflows/ci.yml   # CI: lint, build, test
+├── services/vector-grpc/      # Optional gRPC vector gateway (proto + Node server)
+├── services/ingest-grpc/      # Optional gRPC bulk-ingest proto & design
 ├── public/
-│   └── pdf.worker.min.mjs     # PDF.js worker (static asset)
-└── env.example                # Environment variable template
+│   └── pdf.worker.min.mjs     # PDF.js worker (static asset)
+├── Dockerfile                 # Standalone image
+├── docker-compose.yml         # Local run with env
+└── env.example                # Environment variable template
 ```
 
 ## Key Design Decisions
 
-### Why Gemini 2.0 Flash by Default?
-
-**The Scale-to-Zero Constraint:**
-The project's core value proposition is "$0/month when idle." This dictates every architectural decision, including model choice.
-
-- **Cost Efficiency**: At $0.10 per 1M input tokens vs $1.25 for Pro (12.5x cheaper), Flash models align with our scale-to-zero philosophy
-- **Speed**: Lower latency is crucial for streaming chat experiences
-- **RAG Sufficiency**: In RAG pipelines, intelligence is split between retrieval (Upstash Vector) and synthesis (LLM). Flash excels at synthesis; it doesn't need to know everything, it just needs to read retrieved chunks and summarize them effectively
-- **Massive Context Window**: 1M tokens means you can feed significantly more retrieved chunks, reducing the risk of missing information
-
-### Why Client-Side PDF Parsing?
-
-- **Edge Runtime Limits**: Vercel Edge has 1MB bundle size limits and strict CPU time constraints
-- **Cost Efficiency**: Offloads parsing to user's browser (zero server cost)
-- **Reliability**: Avoids "Module not found: fs" errors common in serverless RAG apps
-
 ### Why Upstash Over Pinecone?
 
 - **True Scale-to-Zero**: No $50/month minimum (Pinecone Serverless has a floor)
@@ -208,6 +207,24 @@ The app will automatically:
 - Use Node.js runtime for `/api/ingest`
 - Serve static assets (including PDF.js worker) from CDN
 
+### Optional: Run the gRPC Vector Gateway
+
+For internal services that prefer gRPC/Protobuf over HTTP+JSON, this repo includes a small vector gateway:
+
+- Proto: `services/vector-grpc/vector.proto` (`VectorService` with `UpsertChunks` and `QueryChunks`)
+- Server: `services/vector-grpc/server.ts` (Node.js, wraps Upstash Vector over HTTP)
+
+To run it locally:
+
+```bash
+UPSTASH_VECTOR_REST_URL=... \
+UPSTASH_VECTOR_REST_TOKEN=... \
+VECTOR_GRPC_PORT=50051 \
+npm run vector-grpc:server
+```
+
+This starts a gRPC server exposing a binary-efficient, schema-safe API for document upsert and similarity search. You can point other backend services or batch jobs at this endpoint instead of calling `/api/ingest` or Upstash HTTP directly.
+
 ## Cost Analysis
 
 ### Idle State (0 Users)
@@ -252,4 +269,4 @@ If this project helped you, please consider giving it a star! ⭐
 
 ---
 
-**Built with ❤️ for the serverless community**
+**Built for the serverless community while drinking a lot of 🧃**
````
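The proto file itself is not among the diffs shown here. Going only by the README's description of a `VectorService` with `UpsertChunks` and `QueryChunks`, `services/vector-grpc/vector.proto` plausibly looks something like the following sketch — every message shape and field name below is an assumption:

```protobuf
syntax = "proto3";

package vector;

// Hypothetical sketch: only the service and RPC names come from the README;
// all message definitions are assumed, not the repo's actual schema.
service VectorService {
  rpc UpsertChunks(UpsertChunksRequest) returns (UpsertChunksResponse);
  rpc QueryChunks(QueryChunksRequest) returns (QueryChunksResponse);
}

message Chunk {
  string id = 1;
  string text = 2;
  map<string, string> metadata = 3;
}

message UpsertChunksRequest {
  repeated Chunk chunks = 1;
}

message UpsertChunksResponse {
  uint32 upserted = 1;
}

message QueryChunksRequest {
  string query = 1;
  uint32 top_k = 2;
}

message ScoredChunk {
  Chunk chunk = 1;
  float score = 2;
}

message QueryChunksResponse {
  repeated ScoredChunk results = 1;
}
```

A schema like this is what gives the gateway its "schema-safe" property: clients and the server compile against the same message definitions rather than exchanging ad-hoc JSON.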

docker-compose.yml

Lines changed: 10 additions & 0 deletions
```yaml
# Run the app locally with Docker. Set env in .env.local or below.
services:
  app:
    build: .
    ports:
      - "3000:3000"
    env_file:
      - .env.local
    environment:
      - NODE_ENV=production
```
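The `env_file: .env.local` entry expects the same variables the CI workflow stubs out with dummy values. A minimal `.env.local` for a real run would look something like this (all values are placeholders, not real credentials):

```bash
# .env.local — consumed by docker-compose via env_file (placeholder values)
GOOGLE_GENERATIVE_AI_API_KEY=your-gemini-api-key
UPSTASH_REDIS_REST_URL=https://your-db.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-redis-token
UPSTASH_VECTOR_REST_URL=https://your-index.upstash.io
UPSTASH_VECTOR_REST_TOKEN=your-vector-token
```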
