2 changes: 1 addition & 1 deletion app/models/message.rb
@@ -1,7 +1,7 @@
 class Message < ApplicationRecord
   acts_as_message tool_calls_foreign_key: :message_id
   has_many_attached :attachments
-  broadcasts_to ->(message) { "chat_#{message.chat_id}" }, inserts_by: :append, target: "messages"
+  broadcasts_to ->(message) { "chat_#{message.chat_id}" }, inserts_by: :append, target: "messages", partial: "messages/message"

   after_update_commit :broadcast_message_replacement, if: :assistant?
   before_save :calculate_cost, if: :should_calculate_cost?
182 changes: 182 additions & 0 deletions docs/ideas/rag-chatbot.md
@@ -0,0 +1,182 @@
# Idea: RAG Chatbot ("Ask about Ruby")

A small chatbot on the homepage where visitors can learn about Ruby, augmented with our own content (Posts, Testimonials, Projects) via vector search.

## Goals

- Add a simple "Ask about Ruby" experience to the homepage
- Gain experience with vector search using SQLite (no separate DB)
- Reuse existing chat infrastructure (Chat/Message models, views, streaming)
- Keep costs bounded and prevent the bot from being used as a free general-purpose AI

## UX

- **Homepage**: a single input field ("Ask about Ruby...") with a send button — no full chat widget
- **Homepage nav**: new "Chat" link alongside existing category buttons and "Community"
- **`/chat` page**: reuses the existing beautiful chat view (from the remote template), requires login
- **Single chat per user** — no chat list, no "new chat" button, no multi-tenant URLs exposed to users
- **Auth gate**: hitting send on the homepage, or navigating to `/chat`, triggers GitHub OAuth if not signed in, then lands on `/chat` with the question submitted

## Cost control — single rotating chat

A single ever-growing chat would get expensive fast because RubyLLM's `acts_as_chat` sends full conversation history on every request.

**Strategy:** one active chat per user, silently rotated:

- Always show "the user's current chat" at `/chat`
- If the current chat exceeds ~30 messages or is older than 24h, start a new one behind the scenes
- Old chats stay in the DB for cost tracking but the user never sees them
- From the user's perspective it feels like one continuous conversation
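
The rotation rule above could be expressed as a single predicate. This is a minimal sketch, assuming both thresholds apply (either one triggers rotation); `ChatSnapshot`, `rotate_chat?`, and the two constants are hypothetical names standing in for the real `Chat` model and its settings.

```ruby
# Hypothetical thresholds, taken from the "~30 messages or older than 24h" rule.
MAX_MESSAGES = 30
MAX_AGE_SECONDS = 24 * 60 * 60

# Minimal stand-in for a Chat record; the real model would be ActiveRecord.
ChatSnapshot = Struct.new(:message_count, :created_at)

# Returns true when the chat should be silently replaced by a fresh one.
def rotate_chat?(chat, now: Time.now)
  too_long = chat.message_count > MAX_MESSAGES
  too_old = (now - chat.created_at) > MAX_AGE_SECONDS
  too_long || too_old
end
```

A controller would call something like this before showing `/chat` and create a new chat when it returns true; the old chat is left in place for cost tracking.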

## Topic restriction

Restrict conversations to Ruby and our site content via the system prompt. There is no code-level topic detection; the LLM enforces the restriction. If a question isn't about Ruby, the Ruby ecosystem, or content on our site, the model is instructed to politely decline and steer the user back to Ruby topics.

## Architecture: RAG with SQLite vector search

### Stack

| Layer | Tool | Notes |
|-------|------|-------|
| Vector storage | `sqlite-vec` gem | SQLite extension, no separate DB |
| Rails integration | `neighbor` gem | Andrew Kane — `has_neighbors`, `nearest_neighbors` |
| Embeddings | `RubyLLM.embed` | Already have `ruby_llm` |
| Chat + streaming | Existing `Chat`/`Message` + `ChatResponseJob` + Turbo Streams | Full reuse |

Two new gems: `sqlite-vec`, `neighbor`. Everything else is already in the project.

### Flow

```
User question
① RubyLLM.embed(question) → query vector (1536 floats)
② ContentChunk.nearest_neighbors(:embedding, vector, distance: "cosine").first(8)
③ Build system prompt with retrieved chunks + Ruby-only instructions
④ chat.ask(question) → streamed response via existing Turbo Streams
```
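
In the flow above, step ② hands the ranking to sqlite-vec. As a rough illustration of what that query computes, here is a plain-Ruby sketch of cosine-distance top-k search over an in-memory hash. `cosine_distance` and `nearest_chunks` are hypothetical helpers, not part of `neighbor` or `sqlite-vec`, and the sketch assumes non-zero vectors of equal length.

```ruby
# Cosine distance between two equal-length, non-zero vectors: 1 - cos(angle).
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Returns the ids of the k vectors in `chunks` (id => vector) closest to `query`,
# ordered from nearest to farthest.
def nearest_chunks(query, chunks, k: 8)
  chunks.min_by(k) { |_id, vector| cosine_distance(query, vector) }.map(&:first)
end
```

This brute-force scan is linear in the number of chunks, which is also roughly how a small vec0 table behaves; it is meant to show the ranking, not to replace the extension.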

### Data model

**Two new tables:**

```ruby
# vec0 virtual table: stores only vectors + IDs
create_virtual_table :content_embeddings, :vec0, [
  "id text PRIMARY KEY NOT NULL",
  "embedding float[1536] distance_metric=cosine"
]

# Metadata table: stores text chunks + polymorphic source reference
create_table :content_chunks, id: { type: :string, default: -> { "uuid7()" } } do |t|
  t.string :source_type, null: false # "Post", "Testimonial", "Project"
  t.string :source_id, null: false
  t.text :content, null: false
  t.string :title
  t.timestamps
end
```

### Content indexed

| Model | Fields embedded |
|-------|----------------|
| **Post** (published) | `title + summary + content_or_url` |
| **Testimonial** (published) | `heading + quote + body_text` |
| **Project** (visible) | `name + description + topics` |

### New concerns

**`Embeddable`** — shared by Post, Testimonial, Project. Each model defines `embeddable_text`. Callbacks trigger `GenerateEmbeddingJob` on save.
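
As a rough sketch of the `embeddable_text` contract, the module below joins a model's declared fields into one string. It is written in plain Ruby so it runs on its own; a Rails version would presumably use `ActiveSupport::Concern` and a commit callback to enqueue `GenerateEmbeddingJob`. `embeddable_fields` and `ProjectStub` are hypothetical names introduced for illustration.

```ruby
# Plain-Ruby sketch of the shared concern (no Rails dependencies).
module Embeddable
  # Joins the model's embeddable fields into one string, skipping blank values.
  # Array values (e.g. a list of topics) are flattened into a comma-separated line.
  def embeddable_text
    embeddable_fields
      .map { |field| Array(public_send(field)).join(", ").strip }
      .reject(&:empty?)
      .join("\n\n")
  end
end

# Hypothetical stand-in for the Project model (name + description + topics).
ProjectStub = Struct.new(:name, :description, :topics) do
  include Embeddable

  def embeddable_fields
    %i[name description topics]
  end
end
```

Keeping the field list on each model means the concern itself never needs to know whether it is embedding a Post, a Testimonial, or a Project.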

**`Chat::RagAugmentable`** — added to Chat. Provides `ask_with_rag(question, &block)`:
1. Generates query embedding via `RubyLLM.embed`
2. Finds nearest 8 `ContentChunk` records
3. Builds system prompt with retrieved context + Ruby-only instructions
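4. Sets the system prompt via `with_instructions(system_prompt)`, then calls `ask(question, &block)`; in RubyLLM, instructions are set with the separate `with_instructions` method rather than passed as an option to `ask`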
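
Step 3 is the only part of this concern that is pure string assembly, so it can be sketched without Rails. The wording of the instructions, the `RetrievedChunk` struct, and `build_rag_prompt` are all assumptions made for illustration; the real prompt text would be tuned during implementation.

```ruby
# Hypothetical chunk shape; the real ContentChunk would be an ActiveRecord model.
RetrievedChunk = Struct.new(:title, :content)

# Illustrative wording only.
RUBY_ONLY_INSTRUCTIONS = <<~TEXT.strip
  You answer questions about Ruby, the Ruby ecosystem, and the content of this site.
  If a question is about anything else, politely decline and steer back to Ruby.
TEXT

# Builds the system prompt: the instructions followed by numbered context blocks.
# With no retrieved chunks, the instructions are returned on their own.
def build_rag_prompt(chunks)
  return RUBY_ONLY_INSTRUCTIONS if chunks.empty?

  context = chunks.each_with_index.map do |chunk, index|
    "[#{index + 1}] #{chunk.title}\n#{chunk.content}"
  end

  "#{RUBY_ONLY_INSTRUCTIONS}\n\nContext from the site:\n\n#{context.join("\n\n")}"
end
```

Numbering the context blocks makes it possible to ask the model to cite which chunk an answer came from, should that turn out to be useful.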

### Chat purpose

Add `"rag"` to existing purposes (`conversation`, `summary`, `testimonial_generation`, `testimonial_validation`). `ChatResponseJob` detects `purpose == "rag"` and calls `ask_with_rag`.

### Settings (no hardcoded models)

Add `embedding_model` to `Setting::ALLOWED_KEYS`, default `"text-embedding-3-small"` (1536 dims). Configurable via the existing Madmin settings page, same pattern as `summary_model`, `testimonial_model`, etc.

## Routes

```ruby
# Top-level, no team slug exposed
resource :chat, only: [:show, :create], controller: "chats"
```

- `GET /chat` — shows user's current chat (creates one if none), requires auth
- `POST /chat` — sends a message, requires auth
- Homepage input form posts to `/chat`

## Auth flow

The existing `authenticate_user!` in `ApplicationController` already handles the redirect:

```ruby
def authenticate_user!
  unless user_signed_in?
    session[:return_to] = request.original_url if request.get?
    redirect_to github_auth_with_return_path, alert: "Please sign in with GitHub to continue."
  end
end
```

For a POST from the homepage input, `authenticate_user!` stores no return URL (it only does so for GET requests). Store the pending question in the session before redirecting to GitHub OAuth, then replay it after sign-in.
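
One way to handle the stash-and-replay step is a pair of small helpers. In this sketch the session is modeled as a plain Hash so the code runs standalone; in the app it would be the controller's `session`. The key name and both method names are hypothetical.

```ruby
# Hypothetical session key for the question typed before sign-in.
PENDING_QUESTION_KEY = "pending_chat_question"

# Called before redirecting an anonymous POST to GitHub OAuth.
# Blank input is ignored so nothing is replayed later.
def stash_pending_question(session, question)
  text = question.to_s.strip
  session[PENDING_QUESTION_KEY] = text unless text.empty?
end

# Called after sign-in: returns the stored question once and clears it,
# so a page refresh cannot submit it a second time.
def take_pending_question(session)
  session.delete(PENDING_QUESTION_KEY)
end
```

Since the default Rails session lives in a size-limited cookie, it may be worth truncating very long questions before stashing them.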

## Files to create/modify

| Area | Change |
|--------|------|
| **Gemfile** | +`sqlite-vec`, +`neighbor` |
| **database.yml** | Load vec0 extension alongside existing uuid extension |
| **Migration** | `create_content_chunks` + `create_content_embeddings` (vec0) |
| **Model** | `app/models/content_chunk.rb` (with `has_neighbors`) |
| **Concern** | `app/models/concerns/embeddable.rb` |
| **Concern** | `app/models/concerns/chat/rag_augmentable.rb` |
| **Model mods** | Include `Embeddable` in Post, Testimonial, Project |
| **Model mod** | Include `Chat::RagAugmentable` in Chat |
| **Setting** | Add `embedding_model` to `ALLOWED_KEYS` |
| **Controller** | New top-level `/chat` controller (or reuse existing with new route) |
| **Job** | `generate_embedding_job.rb` |
| **Job** | `backfill_embeddings_job.rb` (one-time for existing content) |
| **Job mod** | `chat_response_job.rb` — detect `purpose == "rag"`, call `ask_with_rag` |
| **View mod** | `home/index.html.erb` — add input + "Chat" nav link |
| **Routes** | `config/routes.rb` — add top-level `/chat` |
| **i18n** | Locale files for new strings |

## Design decisions

- **No hardcoded AI models** — everything via `Setting.get(...)`, same pattern as existing AI features
- **No new chat views** — reuse the existing beautiful templates from the remote template
- **No team slug in URLs** — multi-tenancy is internal, users see clean `/chat`
- **No chat list, no multi-chat UI** — single rotating chat keeps UX simple and costs bounded
- **No vector DB** — sqlite-vec keeps infrastructure footprint zero
- **Auth required** — connects chats to users for cost tracking via existing `Costable`

## Migration path if corpus grows

If content grows to tens of thousands of documents or semantic quality becomes insufficient:
- Switch embedding model via the setting (e.g., to `text-embedding-3-large`, 3072 dims); this also means recreating the vec0 table with the new dimension and re-embedding all content
- Add chunking (split long posts into overlapping windows) in `GenerateEmbeddingJob`
- Everything else (controller, concern, UI) stays the same
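
The chunking step mentioned above could look roughly like this. It is a word-based sketch with illustrative window sizes; `chunk_text` is a hypothetical helper, and a real implementation might split on tokens or paragraphs instead.

```ruby
# Splits text into windows of `size` words, each sharing `overlap` words
# with the previous window. The defaults are illustrative, not tuned values.
def chunk_text(text, size: 200, overlap: 40)
  raise ArgumentError, "overlap must be smaller than size" unless overlap < size

  words = text.split
  return [] if words.empty?

  step = size - overlap
  chunks = []
  start = 0
  loop do
    chunks << words[start, size].join(" ")
    break if start + size >= words.length # last window reached the end
    start += step
  end
  chunks
end
```

Each returned string would become its own `ContentChunk` row with its own embedding, which is why the metadata table already stores `content` per chunk rather than per source record.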

## Open questions for implementation

- Auto-rotation threshold: 30 messages? 24h? Both?
- Should old rotated chats be visible to admins in Madmin? (Probably yes for debugging)
- Rate limit per user: e.g., 20 messages/hour to prevent abuse even among logged-in users
- Do we index draft/unpublished content? (No — only published)
- Should failed embeddings retry or silently skip? (Retry with backoff via SolidQueue defaults)
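
For the per-user rate limit question, a sliding-window counter is one option. The sketch below keeps timestamps in memory purely for illustration; `MessageRateLimiter` is a hypothetical class, and a real deployment would need shared storage (for example the database or cache) so counts survive restarts and work across processes.

```ruby
# In-memory sliding-window limiter, for illustration only.
class MessageRateLimiter
  def initialize(limit: 20, window_seconds: 3600)
    @limit = limit
    @window_seconds = window_seconds
    @timestamps = Hash.new { |hash, user_id| hash[user_id] = [] }
  end

  # Returns true and records the message if the user is under the limit;
  # returns false without recording anything otherwise.
  def allow?(user_id, now: Time.now)
    recent = @timestamps[user_id]
    recent.reject! { |time| now - time >= @window_seconds }
    return false if recent.length >= @limit

    recent << now
    true
  end
end
```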
12 changes: 12 additions & 0 deletions test/models/message_test.rb
@@ -1,6 +1,8 @@
 require "test_helper"

 class MessageTest < ActiveSupport::TestCase
+  include ActiveJob::TestHelper
+
   test "calculates cost based on token usage" do
     message = Message.new(
       chat: chats(:one),
@@ -15,4 +17,14 @@ class MessageTest < ActiveSupport::TestCase

     assert message.cost > 0
   end
+
+  test "broadcasts create using messages/message partial regardless of role" do
+    %w[user assistant system tool].each do |role|
+      message = Message.new(chat: chats(:one), role: role, content: "hi")
+
+      assert_nothing_raised do
+        perform_enqueued_jobs { message.save! }
+      end
+    end
+  end
 end