QueryPal is a highly scalable, intelligent database exploration and management platform designed for developers, analysts, and data professionals working with Azure Cosmos DB (MongoDB API). It combines the power of Google Gemini AI with a secure, user-friendly interface to transform how you interact with your NoSQL databases.
Key Capabilities:
- 🧠 Natural Language Queries: Convert plain English to MongoDB queries using AI
- 📊 AI-Powered Data Analysis: Automatic insights and visualizations from query results
- 🔍 Smart Data Explorer: Paginated browsing with advanced filtering and search
- 💾 Query Management: Save, share, and collaborate on queries with team members
- 🔒 Enterprise Security: Microsoft Entra ID authentication with On-Behalf-Of (OBO) flow
- 📝 Document Management: Full CRUD operations with audit trails and history
- 🎯 Schema Discovery: Intelligent schema inference and documentation
Azure Cosmos DB's portal interface can be limiting for real-world data exploration and analysis. QueryPal addresses these pain points by providing:
- 🎯 Intuitive Data Discovery: Browse collections, analyze schemas, and understand your data structure without complex queries
- 🧠 AI-Powered Query Generation: Ask questions in natural language and get optimized MongoDB queries instantly
- 📊 Intelligent Analytics: Automatic data analysis with AI-generated insights and Chart.js visualizations
- 👥 Team Collaboration: Share queries, insights, and findings with your team through built-in collaboration features
- 🛡️ Enterprise-Grade Security: Zero-trust architecture with Microsoft Entra ID and secure token management
- 📋 Data Management: Complete document lifecycle management with audit trails and version history
- 🔍 Advanced Search: Powerful filtering and search capabilities across collections and documents
- Smart Query Generation: Convert plain English to optimized MongoDB queries
- Context Awareness: Uses database schema and collection metadata for better results
- Query Optimization: AI suggests performance improvements and best practices
- Multi-step Queries: Handle complex queries requiring multiple steps
- Automatic Insights: AI analyzes query results and provides meaningful insights
- Dynamic Visualizations: Chart.js integration with 8+ chart types
- Theme-Aware Charts: Automatic dark/light mode adaptation
- Export Capabilities: Save query output for external analysis
- Save & Share Queries: Build a knowledge base of useful queries
- Team Collaboration: Share queries with specific team members
- Version History: Track query modifications and usage
- Quick Access: Organize and categorize saved queries
- Paginated Browsing: Handle large collections efficiently
- Smart Filtering: Filter by any field with intelligent search
- Document Linking: Automatic cross-reference detection and navigation
- Full CRUD Operations: Create, read, update, delete documents
- Audit Trails: Complete history of document changes
- Field-Level Editing: Modify specific fields without affecting the whole document
- Data Validation: Ensure data integrity with schema validation
- Interactive Tutorial: Guided onboarding for new users
- Contextual Help: In-app assistance and tooltips
- Responsive Design: Works seamlessly on desktop and mobile
- Accessibility: WCAG 2.1 compliant interface
| Component | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite, Tailwind CSS, Material-UI |
| AI & Analytics | Google Gemini Pro, Chart.js, React Chart.js 2 |
| Authentication | Microsoft Entra ID, MSAL (Browser & Python), On-Behalf-Of Flow |
| Backend API | FastAPI (Python 3.12), Uvicorn, Pydantic V2 |
| Database | Azure Cosmos DB (MongoDB API), PostgreSQL (User Data) |
| Cloud Platform | Google Cloud Run, Azure Resource Manager (ARM) |
| Infrastructure | Terraform, GCP Secret Manager, Serverless VPC Access, Cloud SQL |
| DevOps & CI/CD | GitHub Actions, Docker, Google Container Registry |
| Testing | Vitest, React Testing Library, Pytest, Coverage.py |
| Code Quality | ESLint, Black, Flake8, MyPy, TypeScript Strict Mode |
| Monitoring | Application Insights, Cloud SQL Proxy, Logging |
QueryPal follows a secure Backend-for-Frontend (BFF) pattern with enterprise-grade security:
┌─────────────────────┐ Auth ┌─────────────────────┐
│ React Frontend ├───────────────►│ Microsoft Entra │
│ (SPA + MSAL.js) │◄───────────────┤ Identity Platform │
└─────────────────────┘ Access Token └─────────────────────┘
│
▼ Bearer Token
┌─────────────────────┐
│ FastAPI Backend │
│ • Token Validation │
│ • OBO Exchange │◄──────────┐
│ • Query Processing │ │
│ • AI Integration │ │
│ • Document CRUD │ │
└─────────────────────┘ │
│ │
▼ │
┌─────────────────────┐ │
│ Google Gemini API │ │
│ • NL2Query │ │
│ • Data Analysis │ │
│ • Insights Gen │ │
└─────────────────────┘ │
│
┌─────────────────────┐ │
│ PostgreSQL DB │ │
│ • User Queries │ │
│ • Audit Logs │ │
│ • Query History │ │
└─────────────────────┘ │
│
┌─────────────────────┐ │
│ Azure Cosmos DB │◄──────────┘
│ • Document Storage │
│ • MongoDB API │
│ • ARM Management │
└─────────────────────┘
QueryPal employs a powerful LangGraph-based ReAct (Reasoning and Acting) agent to autonomously generate, sandbox-test, and evaluate MongoDB queries. This ensures unparalleled accuracy while maintaining strict safety boundaries for write operations.
graph TD
A[User Intent] -->|Natural Language| B(Generate Query)
B --> C{Write Operation?}
C -- Yes --> D[Return Query to User]
D --> E((Manual Review & Run))
E --> F[Execute Write]
F --> G[Log to Audit Database]
F --> H[Evaluate with AI]
C -- No (Read Only) --> I(Sandbox Execution)
I --> J{Evaluate Result}
J -- Success --> K[Return Final Query & Data]
J -- Failure/Error --> B
subgraph LangGraph Agent Loop
B
C
I
J
end
Security Features:
- ✅ Zero-Trust Architecture: No secrets stored in frontend
- 🔐 Token-Based Authentication: MSAL with automatic token refresh
- 🛡️ On-Behalf-Of Flow: Secure Azure resource access
- 🛡️ Input Validation: Comprehensive request/response validation
- 📝 Audit Logging: Complete audit trail for all operations
The fastest way to get QueryPal running locally:
# Clone the repository
git clone https://github.com/ChingEnLin/QueryPal
cd QueryPal
# Configure environment variables
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and Azure credentials
# Start both frontend and backend
docker-compose up --build
# Access the application
# Frontend: http://localhost:5173
# Backend API: http://localhost:8000
# API Documentation: http://localhost:8000/docsFor development with hot reload:
# Backend setup
cd backend
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
cp .env.example .env # Configure your environment variables
uvicorn main:app --reload
# Frontend setup (new terminal)
cd frontend
npm install
npm run devCreate a backend/.env file with:
# Google Gemini API
GEMINI_API_KEY=your_gemini_api_key_here
# Azure Entra ID Configuration
AZURE_TENANT_ID=your_tenant_id
AZURE_CLIENT_ID=your_backend_app_id
AZURE_CLIENT_SECRET=your_client_secret
ARM_SCOPE=https://management.azure.com/.default
# PostgreSQL Database (for user data)
DB_USER=querypal_user
DB_PASS=your_db_password
DB_NAME=querypal
DB_HOST=localhost
DB_PORT=5432
# Optional: For production
DB_UNIX_SOCKET=/cloudsql/project:region:instanceQueryPal maintains high code quality with comprehensive testing:
cd backend
# Run all tests with coverage
./run_tests.sh
# Individual commands
pytest --cov=. --cov-report=html # Tests with coverage
flake8 . --statistics # Code linting
black --check . # Code formatting
mypy . # Type checkingcd frontend
# Run all tests
npm test
# Run with coverage
npm run test:coverage
# Run tests once
npm run test:run
# Interactive UI testing
npm run test:ui- Backend: 85%+ code coverage with pytest
- Frontend: 80%+ code coverage with Vitest
- Integration Tests: E2E testing of critical user flows
- Static Analysis: Type checking, linting, and formatting
- ✅ Automated Testing: All PRs trigger comprehensive test suites
- 🚀 Deployment: Automatic deployment to Google Cloud Run on production branch
- 📊 Code Coverage: Coverage reports uploaded to Codecov
- 🔍 Code Quality: ESLint, Black, MyPy, and TypeScript strict mode
QueryPal runs on Google Cloud Run with a private backend topology. The frontend nginx container is the only public entry point — the backend service is network-isolated and unreachable from the internet.
graph TB
Browser(["👤 Browser"])
subgraph gcp["☁️ Google Cloud Platform — europe-west1"]
subgraph cloudrun["Cloud Run"]
direction TB
Frontend["<b>querypal-frontend</b><br/>──────────────<br/>ingress: public<br/>nginx · serves SPA<br/>proxies /api/* → backend"]
Backend["<b>querypal-backend</b><br/>──────────────<br/>ingress: internal only<br/>FastAPI · Uvicorn<br/>❌ not reachable from internet"]
end
subgraph vpc["VPC Network"]
Connector["Serverless VPC<br/>Access Connector<br/><i>10.8.0.0/28</i>"]
end
SM[("🔑 Secret Manager<br/>6 secrets")]
SQL[("🗄️ Cloud SQL<br/>PostgreSQL")]
GCR["📦 Container Registry"]
SA["🪪 Cloud Run SA<br/><i>least-privilege</i>"]
end
subgraph azure["☁️ Microsoft Azure"]
Entra["🔐 Entra ID<br/><i>MSAL · OBO flow</i>"]
Cosmos[("🌍 Cosmos DB<br/>MongoDB API")]
end
Gemini["🤖 Google Gemini Pro"]
Browser -- "HTTPS" --> Frontend
Frontend -. "vpc-egress: all-traffic" .-> Connector
Connector -- "internal ingress\n✅ VPC source allowed" --> Backend
Backend -- "Cloud SQL Proxy\nunix socket" --> SQL
Backend -- "HTTPS" --> Entra
Backend -- "HTTPS" --> Cosmos
Backend -- "HTTPS" --> Gemini
SM -- "mounted at startup\nvia --set-secrets" --> Backend
SA -. "identity" .-> Frontend
SA -. "identity" .-> Backend
GCR -- "image" --> Frontend
GCR -- "image" --> Backend
| Frontend | Backend | |
|---|---|---|
| Cloud Run ingress | all (public) |
internal (VPC only) |
| VPC egress | all-traffic (proxy to backend) |
private-ranges-only |
| Internet accessible | ✅ Yes | ❌ No — 403 from GFE |
| Who can call it | Anyone | Frontend nginx via VPC connector |
All API calls from the browser go to /api/* on the frontend's own origin. Nginx strips the /api prefix and proxies the request to the backend's internal Cloud Run URL through the VPC connector. The backend URL is never exposed to the browser.
All sensitive configuration is stored in GCP Secret Manager and mounted into the backend container at startup via Cloud Run's native --set-secrets integration. Secrets are never passed as plain environment variables and never appear in deployment logs or gcloud run describe output.
| Secret | Description |
|---|---|
querypal-azure-tenant-id |
Microsoft Entra ID tenant |
querypal-azure-client-id |
Backend app registration client ID |
querypal-azure-client-secret |
Backend app registration client secret |
querypal-gemini-api-key |
Google Gemini API key |
querypal-db-user |
Cloud SQL PostgreSQL username |
querypal-db-pass |
Cloud SQL PostgreSQL password |
Cloud infrastructure is managed by Terraform in the terraform/ directory. The CI pipeline owns image builds and Cloud Run deployments; Terraform owns everything underneath.
| Resource | Managed by |
|---|---|
| VPC connector | Terraform |
| Secret Manager secrets | Terraform |
| Cloud Run service account + IAM | Terraform |
| Cloud SQL instance & database | Terraform (import existing) |
| Cloud Run services | CI pipeline (GitHub Actions) |
| Docker images | CI pipeline (GitHub Actions) |
cd terraform
cp terraform.tfvars.example terraform.tfvars
terraform init
./import.sh # import existing Cloud SQL — no data migration needed
terraform applySee the PR migration guide for the full step-by-step checklist, including how to populate Secret Manager values and what to verify before the first production deploy.
Pushes to the production branch trigger the deploy workflow (.github/workflows/google-cloudrun-docker.yml).
flowchart LR
Push(["push to\nproduction"]) --> Auth
subgraph gha["GitHub Actions"]
Auth["Authenticate\nWorkload Identity\nFederation"]
Auth --> BuildBE["Build & push\nbackend image"]
Auth --> BuildFE["Build & push\nfrontend image"]
BuildBE --> DeployBE["Deploy backend\n--ingress=internal\n--set-secrets\n--vpc-connector"]
BuildFE --> DeployFE
DeployBE -- "backend URL" --> DeployFE["Deploy frontend\nBACKEND_URL=internal URL\n--vpc-connector"]
end
DeployBE --> SM
DeployFE --> Done(["✅ Live"])
subgraph gcp["GCP"]
SM[("Secret Manager\nfetch at startup")]
end
Workload Identity Federation is used for keyless authentication — no long-lived service account keys are stored in GitHub. The dedicated Cloud Run service account (querypal-cloudrun-sa) holds only the permissions it needs: secretmanager.secretAccessor, cloudsql.client, and vpcaccess.user.
- Node.js 20+ and npm
- Python 3.12+
- Docker and Docker Compose
- Google Cloud SDK (for deployment)
- Azure CLI (optional, for Azure resources)
- VS Code with extensions:
- Python
- TypeScript
- Pylance
- Prettier
- ESLint
- Docker
Frontend Application (SPA):
- Go to Azure Portal → App Registrations
- Create new registration:
- Name:
QueryPal Frontend - Platform: Single-page application (SPA)
- Redirect URI:
http://localhost:5173(development) / your production URL
- Name:
- Note the Application (client) ID and Directory (tenant) ID
Backend Application (Confidential Client):
- Create another registration:
- Name:
QueryPal Backend - Client type: Confidential client
- Name:
- Add a client secret (Certificates & secrets)
- Expose an API:
- Add scope:
api://[backend-client-id]/access_as_user - Add the frontend app as an authorized client
- Add scope:
API Permissions:
- Add permissions for both apps:
Microsoft Graph→User.ReadAzure Service Management→user_impersonation
- Grant admin consent for your organization
Grant the backend application appropriate access:
- Go to your Cosmos DB account → Access control (IAM)
- Add role assignment:
- Role:
Cosmos DB Account Reader Role - Assign access to: Service principal
- Select: Your backend application
- Role:
Update frontend/authConfig.ts:
export const msalConfig = {
auth: {
clientId: "your-frontend-client-id",
authority: "https://login.microsoftonline.com/your-tenant-id",
redirectUri: "http://localhost:5173" // or your production URL
},
};
export const loginRequest = {
scopes: ["User.Read", "api://your-backend-client-id/access_as_user"]
};This project uses Semantic Versioning with automated releases based on Conventional Commits.
- Version format:
vMAJOR.MINOR.PATCH(e.g.,v2.1.0) - Automated releases: Triggered when pushing to the
productionbranch - Release notes: Auto-generated and published to GitHub Releases and project wiki
For detailed information about our versioning process and commit message conventions, see docs/SEMANTIC_VERSIONING.md.
QueryPal provides comprehensive REST APIs. When running locally, access:
- Interactive Docs: http://localhost:8000/docs
- OpenAPI Spec: http://localhost:8000/openapi.json
This project is licensed under the MIT License - see the LICENSE file for details.
Built by Ching-En Lin
Powered by:
- 🤖 Google Gemini Pro AI
- ☁️ Microsoft Azure & Google Cloud
- ⚡ Modern web technologies
- Live Demo: QueryPal Production
- GitHub Repository: QueryPal Source
- Issues & Feedback: GitHub Issues