
ResearchAI

An AI-powered semantic search and information retrieval system built with OpenAI embeddings and Supabase vector database

License: MIT

🎯 Purpose

ResearchAI is a full-stack application that allows you to:

  • Ingest text documents and convert them into searchable embeddings
  • Query your knowledge base using natural language
  • Retrieve semantically relevant information with similarity scores
  • Visualize results through a modern, real-time dashboard

Perfect for building personal knowledge bases, research assistants, or document search systems.


🏗️ Architecture

ResearchAI follows a three-tier architecture:

┌─────────────────────────────────────────────────────────────┐
│                     Frontend (React)                        │
│  - Real-time dashboard with WebSocket connection            │
│  - Ingestion controls & query interface                     │
│  - Live log viewer & results display                        │
└─────────────────────────────────────────────────────────────┘
                              ↕
┌─────────────────────────────────────────────────────────────┐
│                  Backend (Express.js + Socket.io)           │
│  - RESTful API for ingestion & queries                      │
│  - WebSocket for real-time logging                          │
│  - OpenAI integration for embeddings                        │
└─────────────────────────────────────────────────────────────┘
                              ↕
┌─────────────────────────────────────────────────────────────┐
│              Database (Supabase PostgreSQL)                 │
│  - pgvector extension for similarity search                 │
│  - Stores document chunks + metadata + embeddings           │
└─────────────────────────────────────────────────────────────┘

🛠️ Tech Stack

Frontend

  • React 18.3 - UI library
  • Vite 6.0 - Build tool and dev server
  • Socket.io Client 4.8 - Real-time WebSocket communication
  • CSS3 - Custom styling with dark theme

Backend

  • Node.js (ES Modules) - Runtime environment
  • Express.js 4.21 - Web application framework
  • Socket.io 4.8 - WebSocket server for real-time logs
  • OpenAI API 6.15 - Text embeddings generation
  • Supabase JS 2.89 - Database client
  • Postgres 3.4 - PostgreSQL client

Database & AI

  • Supabase - Hosted PostgreSQL with pgvector
  • OpenAI text-embedding-3-small - 1536-dimensional embeddings
  • OpenAI gpt-4o-mini / gpt-4o - Context-aware answer generation

📁 Project Structure

ResearchAI/
├── backend/                    # Express.js API server
│   ├── server.js               # Main server with Socket.io
│   ├── config.js               # API clients & configuration
│   ├── logger.js               # Custom logger with WebSocket broadcast
│   ├── routes/
│   │   └── api.js              # API route definitions
│   ├── controllers/
│   │   ├── ingestController.js # Ingestion endpoints
│   │   └── queryController.js  # Query endpoints
│   ├── ingestInfo.js           # Document ingestion logic
│   ├── retrieveInfo.js         # Semantic search logic
│   └── package.json
│
├── frontend/                   # React dashboard
│   ├── src/
│   │   ├── App.jsx             # Main app component
│   │   ├── main.jsx            # React entry point
│   │   ├── App.css             # Styling
│   │   └── components/
│   │       ├── StatusBar.jsx   # Connection status header
│   │       ├── LogViewer.jsx   # Real-time logs
│   │       ├── IngestPanel.jsx # Ingestion controls
│   │       ├── QueryPanel.jsx  # Search interface
│   │       └── ResultsDisplay.jsx # Results visualization
│   ├── index.html
│   ├── vite.config.js          # Vite config with proxy
│   └── package.json
│
├── info/                       # Sample documents to ingest
│   └── github-skills-experience.txt
│
├── index.js                    # CLI script for querying
├── ingestInfo.js               # CLI script for ingestion
├── retrieveInfo.js             # Shared retrieval logic
├── config.js                   # Shared configuration
├── create-table.sql            # Database schema
├── package.json                # Root dependencies
└── .env                        # Environment variables (not committed)

🚀 Getting Started

Prerequisites

  • Node.js 18+
  • npm or yarn
  • Supabase account (free tier works)
  • OpenAI API key

1. Clone the Repository

git clone https://github.com/chipsxp/ResearchAI.git
cd ResearchAI

2. Environment Setup

Create a .env file in the root directory:

# OpenAI Configuration
OPENAI_API_KEY=sk-your-openai-api-key

# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ROLE_KEY=your-supabase-service-role-key

# Server Configuration (optional)
PORT=5000
NODE_ENV=development

3. Database Setup

  1. Create a new Supabase project
  2. Run the SQL schema from create-table.sql in the Supabase SQL Editor:
-- Creates the 'information' table with pgvector extension
-- See create-table.sql for full schema
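For orientation before running it, the schema likely resembles the sketch below. The table name `information` and the 1536-dimension embedding size come from this README; the column names are illustrative, and create-table.sql remains the authoritative source:

```sql
-- Enable pgvector (once per database)
create extension if not exists vector;

-- Document chunks with metadata and 1536-dimensional embeddings
-- (matching OpenAI's text-embedding-3-small output size)
create table if not exists information (
  id bigserial primary key,
  content text not null,
  metadata jsonb,
  embedding vector(1536)
);
```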

4. Install Dependencies

# Root dependencies (for CLI scripts)
npm install

# Backend dependencies
cd backend
npm install

# Frontend dependencies
cd ../frontend
npm install

5. Run the Application

Option A: Full Stack (Recommended)

Terminal 1 - Backend:

cd backend
npm run dev
# Server runs on http://localhost:5000

Terminal 2 - Frontend:

cd frontend
npm run dev
# Dashboard runs on http://localhost:5173

Option B: CLI Only

# Ingest documents
npm run ingest

# Query from command line
node index.js

📊 How It Works

1. Data Ingestion Pipeline

Text Files → Chunking → Metadata Extraction → Embeddings → Database
  1. Read Files: Scans the /info directory for .txt files
  2. Chunking: Splits large documents into manageable chunks (~500 characters)
  3. Metadata Extraction: Uses an OpenAI chat model (gpt-4o-mini / gpt-4o) to extract structured metadata (tags, categories, key entities)
  4. Embedding Generation: Converts text chunks into 1536-dimensional vectors using OpenAI
  5. Database Storage: Saves chunks + embeddings + metadata to Supabase
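The chunking step above can be sketched as a sentence-aware character splitter. This is an illustrative sketch, not the exact logic in ingestInfo.js; only the ~500-character target comes from this README:

```javascript
// Split text into chunks of roughly maxLen characters, breaking on
// sentence boundaries so each chunk stays semantically coherent.
// A single sentence longer than maxLen is kept whole.
function chunkText(text, maxLen = 500) {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && current.length + sentence.length + 1 > maxLen) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? `${current} ${sentence}` : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Each chunk is then embedded (text-embedding-3-small) and stored in
// the `information` table together with its extracted metadata.
```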

2. Semantic Search Process

User Query → Embedding → Vector Search → Ranked Results → LLM Answer
  1. Query Embedding: Convert user's natural language query to vector
  2. Similarity Search: Use pgvector's cosine similarity to find matching chunks
  3. Ranking: Sort results by similarity score (0-100%)
  4. Context Building: Combine top results as context
  5. Answer Generation: Feed the context to GPT-4o for a natural-language answer
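In production the similarity search runs inside Postgres via pgvector, but the ranking step can be illustrated in plain JavaScript. This is a sketch assuming each stored chunk carries an `embedding` array; the real query lives in retrieveInfo.js:

```javascript
// Cosine similarity between two equal-length vectors, in [-1, 1].
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks against the query embedding and keep the top
// matchCount, expressing similarity as a 0-100% score like the dashboard.
function rankChunks(queryEmbedding, chunks, matchCount = 5) {
  return chunks
    .map((chunk) => ({
      ...chunk,
      score: Math.round(100 * cosineSimilarity(queryEmbedding, chunk.embedding)),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, matchCount);
}
```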

🔌 API Endpoints

Health Check

GET /api/health

Ingestion

# Start ingestion
POST /api/ingest
Body: { "clearFirst": true }

# Clear database
POST /api/ingest/clear

# List available files
GET /api/ingest/files

Search & Query

# Semantic search
POST /api/query
Body: { "query": "What is Jimmy's background?", "matchCount": 5 }

# Get AI-generated answer
POST /api/query/answer
Body: { "query": "What programming languages does Jimmy know?" }

# Enhanced answer with sources
POST /api/query/enhanced
Body: { "query": "Tell me about Jimmy's projects" }

Logs

# Get log history
GET /api/logs?count=100

# Clear logs
DELETE /api/logs
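From JavaScript, the query endpoint can be called with a small client like the one below. This is a minimal sketch; the default baseUrl assumes the local setup from "Run the Application":

```javascript
// Build the JSON body the /api/query endpoint expects.
function buildQueryBody(query, matchCount = 5) {
  return JSON.stringify({ query, matchCount });
}

// POST a natural-language query and return the ranked results.
// Requires Node 18+ (global fetch) or a browser environment.
async function semanticSearch(query, matchCount = 5, baseUrl = "http://localhost:5000") {
  const res = await fetch(`${baseUrl}/api/query`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildQueryBody(query, matchCount),
  });
  if (!res.ok) throw new Error(`Query failed with status ${res.status}`);
  return res.json();
}
```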

🎨 Features

Real-time Dashboard - Live updates via WebSocket
Semantic Search - Natural language queries
AI-Powered Answers - Context-aware responses using GPT-4o
Metadata Extraction - Automatic tagging and categorization
Similarity Scoring - Percentage match for each result
File Management - List, ingest, and clear documents
Comprehensive Logging - Real-time operation tracking
RESTful API - Easy integration with other tools


🧪 Testing

Backend API Testing

cd backend
node test-api.js

Manual cURL Testing

# Health check
curl http://localhost:5000/api/health

# Search
curl -X POST http://localhost:5000/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Jimmy skilled at?", "matchCount": 3}'

🚄 Deployment

The backend is designed for Railway.com deployment:

  1. Push code to GitHub
  2. Connect Railway to your repository
  3. Add environment variables in Railway dashboard
  4. Deploy automatically on push

Frontend can be deployed to:

  • Vercel (recommended for Vite/React)
  • Netlify
  • GitHub Pages

📚 Additional Documentation

For deeper technical details, see the backend README.


🛡️ Environment Variables Reference

Variable            Description                                  Required
OPENAI_API_KEY      Your OpenAI API key                          Yes
SUPABASE_URL        Supabase project URL                         Yes
SUPABASE_ROLE_KEY   Supabase service role key                    Yes
PORT                Backend server port (default: 5000)          No
NODE_ENV            Environment mode (development/production)    No
CORS_ORIGINS        Comma-separated allowed origins              No
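A defensive way to consume these variables is to validate them once at startup, failing fast instead of erroring later on an API call. This is a sketch; the project's actual config.js may organize it differently:

```javascript
// Validate required environment variables and apply documented defaults.
function loadConfig(env = process.env) {
  const required = ["OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_ROLE_KEY"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    supabaseUrl: env.SUPABASE_URL,
    supabaseRoleKey: env.SUPABASE_ROLE_KEY,
    port: Number(env.PORT) || 5000,
    nodeEnv: env.NODE_ENV || "development",
    corsOrigins: (env.CORS_ORIGINS || "")
      .split(",")
      .map((origin) => origin.trim())
      .filter(Boolean),
  };
}
```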

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License.


👤 Author

Jimmy Burns (pluckCode / chipsxp)


🙏 Acknowledgments

  • OpenAI - GPT and embedding models
  • Supabase - Hosted PostgreSQL with pgvector
  • Socket.io - Real-time communication
  • Vite - Lightning-fast frontend tooling

📞 Support

If you encounter issues or have questions:

  1. Check the Backend README for troubleshooting
  2. Open an Issue
  3. Contact via email: chips_xp@yahoo.com

Built with ❤️ for AI-powered knowledge management