Commit a2b73ab

feat: gRPC VectorService gateway for binary-efficient vector upsert/query
1 parent af32a4f · commit a2b73ab

23 files changed: +1838 −95 lines


.github/workflows/ci.yml

Lines changed: 37 additions & 0 deletions
```yaml
name: CI

on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

jobs:
  lint-build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"

      - name: Install dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Build
        run: npm run build
        env:
          GOOGLE_GENERATIVE_AI_API_KEY: ${{ secrets.GOOGLE_GENERATIVE_AI_API_KEY || 'sk-dummy' }}
          UPSTASH_REDIS_REST_URL: ${{ secrets.UPSTASH_REDIS_REST_URL || 'https://dummy.upstash.io' }}
          UPSTASH_REDIS_REST_TOKEN: ${{ secrets.UPSTASH_REDIS_REST_TOKEN || 'dummy' }}
          UPSTASH_VECTOR_REST_URL: ${{ secrets.UPSTASH_VECTOR_REST_URL || 'https://dummy.upstash.io' }}
          UPSTASH_VECTOR_REST_TOKEN: ${{ secrets.UPSTASH_VECTOR_REST_TOKEN || 'dummy' }}

      - name: Test
        run: npm run test
```
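The final `Test` step runs the repo's Vitest suite, which per the README covers the chunking and observability utilities. As a rough illustration of the kind of pure function that suite exercises, here is a minimal sliding-window chunker sketch — the `chunkText` name and signature are assumptions, not the repo's actual API:

```typescript
// Hypothetical sliding-window chunker, roughly what src/lib/chunking.ts
// might expose (name and signature assumed, not the repo's actual code).
function chunkText(text: string, size: number, overlap: number): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
    start += size - overlap; // step forward, keeping `overlap` chars shared
  }
  return chunks;
}

// 250 chars, 100-char windows, 20-char overlap -> windows start at 0, 80, 160
const chunks = chunkText("a".repeat(250), 100, 20);
console.log(chunks.length); // 3
```

Functions like this are easy to unit-test in CI because they take strings in and return strings out, with no network or runtime dependencies.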

Dockerfile

Lines changed: 34 additions & 0 deletions
```dockerfile
# Build stage
FROM node:20-alpine AS builder

WORKDIR /app

COPY package.json package-lock.json* ./
RUN npm ci

COPY . .
RUN npm run build

# Production stage
FROM node:20-alpine AS runner

WORKDIR /app

ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs

COPY --from=builder /app/public ./public
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static

USER nextjs

EXPOSE 3000

ENV PORT=3000
ENV HOSTNAME="0.0.0.0"

CMD ["node", "server.js"]
```
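The runner stage copies `.next/standalone`, a directory Next.js emits only when standalone output is enabled. The commit doesn't show the repo's Next.js config, but for this Dockerfile to work it would need to contain something like:

```javascript
// next.config.js — assumed configuration, not shown in this diff.
// "standalone" output makes `next build` emit .next/standalone with a
// self-contained server.js, which the runner stage above copies and runs.
module.exports = {
  output: "standalone",
};
```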

README.md

Lines changed: 39 additions & 22 deletions
````diff
@@ -26,7 +26,11 @@
 - **Semantic Caching**: 30-60% cost reduction via intelligent query caching
 - **Streaming Citations**: Citations stream before the answer for instant transparency
 - **Hybrid Runtime**: Edge for chat, Node.js for ingestion (optimal performance)
+- **Observability**: OpenTelemetry (ingest/retrieval spans), Langfuse (LLM traces, token usage)
+- **CI/CD & Testing**: GitHub Actions (lint, build, test), Vitest unit tests
+- **Docker**: Standalone image + docker-compose for local/self-hosted run
 - **Modern UI**: Beautiful, responsive chat interface with real-time streaming
+- **Optional gRPC Gateway**: Binary-efficient vector upsert/query sidecar (Proto + Node server)
 
 ## Architecture
 
@@ -141,6 +145,10 @@ graph TB
 | **Cache** | Upstash Redis | Semantic caching for cost reduction |
 | **PDF Parsing** | `pdfjs-dist` | Client-side to avoid Edge limits |
 | **UI** | React + `@ai-sdk/react` | Streaming chat with citation support |
+| **Observability** | OpenTelemetry, Langfuse | Traces (ingest/retrieval), LLM generations & token usage |
+| **CI/CD** | GitHub Actions | Lint, build, test on push/PR |
+| **Testing** | Vitest | Unit tests (chunking, observability) |
+| **Containers** | Docker, docker-compose | Local/self-hosted run |
 
 ## Project Structure
 
@@ -157,32 +165,23 @@ serverless-rag/
 │   │   └── page.tsx           # Main chat UI
 │   ├── lib/
 │   │   ├── chunking.ts        # Text chunking utilities
-│   │   ├── retrieval.ts       # Vector search & cache logic
-│   │   └── upstash.ts         # Upstash client initialization
-│   └── proxy.ts               # Semantic cache proxy
+│   │   ├── observability.ts   # OpenTelemetry spans (Node)
+│   │   ├── retrieval.ts       # Vector search & cache logic
+│   │   └── upstash.ts         # Upstash client initialization
+│   ├── instrumentation.ts     # Next.js OTel registration
+│   └── proxy.ts               # Semantic cache proxy
+├── .github/workflows/ci.yml   # CI: lint, build, test
+├── services/vector-grpc/      # Optional gRPC vector gateway (proto + Node server)
+├── services/ingest-grpc/      # Optional gRPC bulk-ingest proto & design
 ├── public/
-│   └── pdf.worker.min.mjs     # PDF.js worker (static asset)
-└── env.example                # Environment variable template
+│   └── pdf.worker.min.mjs     # PDF.js worker (static asset)
+├── Dockerfile                 # Standalone image
+├── docker-compose.yml         # Local run with env
+└── env.example                # Environment variable template
 ```
 
 ## Key Design Decisions
 
-### Why Gemini 2.0 Flash by Default?
-
-**The Scale-to-Zero Constraint:**
-The project's core value proposition is "$0/month when idle." This dictates every architectural decision, including model choice.
-
-- **Cost Efficiency**: At $0.10 per 1M input tokens vs $1.25 for Pro (12.5x cheaper), Flash models align with our scale-to-zero philosophy
-- **Speed**: Lower latency is crucial for streaming chat experiences
-- **RAG Sufficiency**: In RAG pipelines, intelligence is split between retrieval (Upstash Vector) and synthesis (LLM). Flash excels at synthesis; it doesn't need to know everything, it just needs to read retrieved chunks and summarize them effectively
-- **Massive Context Window**: 1M tokens means you can feed significantly more retrieved chunks, reducing the risk of missing information
-
-### Why Client-Side PDF Parsing?
-
-- **Edge Runtime Limits**: Vercel Edge has 1MB bundle size limits and strict CPU time constraints
-- **Cost Efficiency**: Offloads parsing to user's browser (zero server cost)
-- **Reliability**: Avoids "Module not found: fs" errors common in serverless RAG apps
-
 ### Why Upstash Over Pinecone?
 
 - **True Scale-to-Zero**: No $50/month minimum (Pinecone Serverless has a floor)
@@ -208,6 +207,24 @@ The app will automatically:
 - Use Node.js runtime for `/api/ingest`
 - Serve static assets (including PDF.js worker) from CDN
 
+### Optional: Run the gRPC Vector Gateway
+
+For internal services that prefer gRPC/Protobuf over HTTP+JSON, this repo includes a small vector gateway:
+
+- Proto: `services/vector-grpc/vector.proto` (`VectorService` with `UpsertChunks` and `QueryChunks`)
+- Server: `services/vector-grpc/server.ts` (Node.js, wraps Upstash Vector over HTTP)
+
+To run it locally:
+
+```bash
+UPSTASH_VECTOR_REST_URL=... \
+UPSTASH_VECTOR_REST_TOKEN=... \
+VECTOR_GRPC_PORT=50051 \
+npm run vector-grpc:server
+```
+
+This starts a gRPC server exposing a binary-efficient, schema-safe API for document upsert and similarity search. You can point other backend services or batch jobs at this endpoint instead of calling `/api/ingest` or Upstash HTTP directly.
+
 ## Cost Analysis
 
 ### Idle State (0 Users)
@@ -252,4 +269,4 @@ If this project helped you, please consider giving it a star! ⭐
 
 ---
 
-**Built with ❤️ for the serverless community**
+**Built for the serverless community while drinking a lot of 🧃**
````
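The proto file itself is not among the diffs shown here. Going only by the README's description of a `VectorService` with `UpsertChunks` and `QueryChunks`, `services/vector-grpc/vector.proto` plausibly looks something like the following sketch — every message shape and field name below is an assumption:

```protobuf
syntax = "proto3";

package vector;

// Hypothetical sketch: only the service and RPC names come from the README;
// all message definitions are assumed, not the repo's actual schema.
service VectorService {
  rpc UpsertChunks(UpsertChunksRequest) returns (UpsertChunksResponse);
  rpc QueryChunks(QueryChunksRequest) returns (QueryChunksResponse);
}

message Chunk {
  string id = 1;
  string text = 2;
  map<string, string> metadata = 3;
}

message UpsertChunksRequest {
  repeated Chunk chunks = 1;
}

message UpsertChunksResponse {
  uint32 upserted = 1;
}

message QueryChunksRequest {
  string query = 1;
  uint32 top_k = 2;
}

message ScoredChunk {
  Chunk chunk = 1;
  float score = 2;
}

message QueryChunksResponse {
  repeated ScoredChunk results = 1;
}
```

A schema like this is what gives the gateway its "schema-safe" property: clients and the server compile against the same message definitions rather than exchanging ad-hoc JSON.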

docker-compose.yml

Lines changed: 10 additions & 0 deletions
```yaml
# Run the app locally with Docker. Set env in .env.local or below.
services:
  app:
    build: .
    ports:
      - "3000:3000"
    env_file:
      - .env.local
    environment:
      - NODE_ENV=production
```
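The `env_file: .env.local` entry expects the same variables the CI workflow stubs out with dummy values. A minimal `.env.local` for a real run would look something like this (all values are placeholders, not real credentials):

```bash
# .env.local — consumed by docker-compose via env_file (placeholder values)
GOOGLE_GENERATIVE_AI_API_KEY=your-gemini-api-key
UPSTASH_REDIS_REST_URL=https://your-db.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-redis-token
UPSTASH_VECTOR_REST_URL=https://your-index.upstash.io
UPSTASH_VECTOR_REST_TOKEN=your-vector-token
```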
