| layout | default |
|---|---|
| title | Chapter 5: AI Copilot |
| nav_order | 5 |
| parent | AFFiNE Tutorial |
Welcome to Chapter 5: AI Copilot. In this part of the AFFiNE Tutorial, you will learn how AFFiNE integrates AI capabilities directly into the workspace: writing assistance, content summarization, translation, image generation, and conversational interactions with your documents.
AFFiNE's AI copilot is not a separate product bolted onto the editor. It is deeply integrated into the block system (see Chapter 3: Block System) and operates on the same content model, meaning AI actions can read, create, and modify blocks directly.
Modern knowledge workers need AI assistance embedded in their workflow — not in a separate chat window. AFFiNE's copilot solves this by providing AI actions that operate directly on selected blocks, entire pages, or the workspace context. The architecture supports multiple AI providers and allows self-hosted users to bring their own API keys.
- understand the copilot architecture and how it connects to the editor
- learn the available AI actions and how they transform content
- understand the provider abstraction for OpenAI, local models, and custom backends
- learn how AI chat sessions maintain context with document content
- trace an AI action from user trigger to content modification
flowchart TD
subgraph "Frontend"
A[User selects text or block]
B[AI Action Menu]
C[Copilot Panel / Chat]
D[Inline AI Toolbar]
end
subgraph "AI Module"
E[Action Router]
F[Prompt Builder]
G[Context Collector]
H[Response Handler]
end
subgraph "Backend"
I[Copilot API Endpoint]
J[Provider Abstraction]
K[OpenAI / GPT-4]
L[Custom Model Endpoint]
M[Image Generation]
end
A --> B
A --> D
B --> E
C --> E
D --> E
E --> F
F --> G
G --> I
I --> J
J --> K
J --> L
J --> M
K --> H
L --> H
M --> H
H --> A
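The Frontend-to-AI-Module path in the diagram can be sketched as a small pipeline in which every entry point funnels into one router. This is a minimal illustration rather than AFFiNE's actual code: `ActionRequest`, `PromptMessage`, `buildPrompt`, `collectContext`, and `routeAction` are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch of the Frontend -> AI Module path from the diagram.
// None of these names are taken from the AFFiNE codebase.

type Trigger = 'action-menu' | 'inline-toolbar' | 'chat-panel';

interface ActionRequest {
  trigger: Trigger;      // which UI surface started the action
  actionId: string;      // e.g. 'summarize'
  selectedText: string;  // text extracted from the selected blocks
}

interface PromptMessage {
  role: 'system' | 'user';
  content: string;
}

// Prompt Builder: turn the action and the selection into messages.
function buildPrompt(req: ActionRequest): PromptMessage[] {
  return [
    { role: 'system', content: `Perform the "${req.actionId}" action on the user's text.` },
    { role: 'user', content: req.selectedText },
  ];
}

// Context Collector: enrich the prompt with page-level context.
function collectContext(messages: PromptMessage[], pageTitle: string): PromptMessage[] {
  const context: PromptMessage = {
    role: 'system',
    content: `The text comes from a page titled "${pageTitle}".`,
  };
  return [context, ...messages];
}

// Action Router: the single entry point shared by all three UI surfaces.
function routeAction(req: ActionRequest, pageTitle: string): PromptMessage[] {
  return collectContext(buildPrompt(req), pageTitle);
}

// The same pipeline runs regardless of where the user triggered the action.
const fromMenu = routeAction(
  { trigger: 'action-menu', actionId: 'summarize', selectedText: 'Q3 revenue grew.' },
  'Quarterly Report'
);
const fromToolbar = routeAction(
  { trigger: 'inline-toolbar', actionId: 'summarize', selectedText: 'Q3 revenue grew.' },
  'Quarterly Report'
);
```

Routing every surface through one function keeps prompt construction in a single place, so a change to a prompt applies to the menu, the toolbar, and the chat panel alike.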
AFFiNE provides several categories of AI actions that operate on document content:
// packages/frontend/core/src/modules/ai/actions/
// Writing assistance actions
const writingActions = {
'improve-writing': {
description: 'Improve the writing quality of selected text',
input: 'selected text blocks',
output: 'improved text replacement',
},
'fix-spelling-grammar': {
description: 'Fix spelling and grammar errors',
input: 'selected text blocks',
output: 'corrected text replacement',
},
'make-shorter': {
description: 'Condense the selected text',
input: 'selected text blocks',
output: 'shortened text replacement',
},
'make-longer': {
description: 'Expand the selected text with more detail',
input: 'selected text blocks',
output: 'expanded text replacement',
},
'change-tone': {
description: 'Rewrite in a specified tone (professional, casual, etc.)',
input: 'selected text blocks + tone parameter',
output: 'rewritten text replacement',
},
};
// Content generation actions
const generationActions = {
'summarize': {
description: 'Summarize selected content or entire page',
input: 'selected blocks or full page',
output: 'summary paragraph',
},
'translate': {
description: 'Translate content to a target language',
input: 'selected text + target language',
output: 'translated text',
},
'explain': {
description: 'Explain selected content in simpler terms',
input: 'selected blocks',
output: 'explanation paragraph',
},
'continue-writing': {
description: 'Generate continuation of the current content',
input: 'preceding content context',
output: 'new paragraphs appended',
},
'generate-outline': {
description: 'Create a document outline from a topic',
input: 'topic string',
output: 'heading and list blocks',
},
};
// Image actions
const imageActions = {
'create-image': {
description: 'Generate an image from a text prompt',
input: 'text prompt',
output: 'image block',
},
'explain-image': {
description: 'Describe the contents of an image',
input: 'image block',
output: 'description paragraph',
},
};
When a user triggers an AI action, the system follows a structured pipeline:
// Simplified AI action execution flow
interface AIActionContext {
// The selected blocks or text that the action operates on
selectedBlocks: BlockModel[];
selectedText?: string;
// The full page content for context
pageContent: string;
// Action-specific parameters
params: Record<string, unknown>;
}
class AIActionExecutor {
async execute(
actionId: string,
context: AIActionContext
): Promise<AIActionResult> {
// 1. Build the prompt from the action template and context
const prompt = this.promptBuilder.build(actionId, context);
// 2. Collect additional context (page title, surrounding blocks)
const enrichedPrompt = this.contextCollector.enrich(prompt, context);
// 3. Send to the AI provider and stream the response
// (chatStream matches the CopilotProvider interface shown later in this chapter)
const stream = this.provider.chatStream({
messages: enrichedPrompt.messages,
model: enrichedPrompt.model,
});
// 4. Handle the response — either replace or insert blocks
return this.responseHandler.handle(actionId, stream, context);
}
}
// The prompt builder constructs messages for the AI provider
// Each action has a system prompt template
const actionPrompts: Record<string, string> = {
'improve-writing': `You are a writing assistant integrated into a document editor.
The user has selected the following text and wants you to improve its quality.
Maintain the original meaning and tone while improving clarity, flow, and grammar.
Return ONLY the improved text without explanations.`,
'summarize': `You are a summarization assistant integrated into a document editor.
The user wants a concise summary of the following content.
Provide a clear, well-structured summary that captures the key points.`,
'translate': `You are a translation assistant integrated into a document editor.
Translate the following text to {{targetLanguage}}.
Maintain the original formatting and structure.
Return ONLY the translated text.`,
'continue-writing': `You are a writing assistant integrated into a document editor.
Based on the existing content, continue writing in the same style and tone.
Generate 2-3 paragraphs that naturally follow from the context.`,
};
Beyond inline actions, AFFiNE provides a chat panel for conversational AI interactions:
// packages/frontend/core/src/modules/ai/chat/
interface CopilotChatSession {
id: string;
messages: ChatMessage[];
// The chat has access to the current page context
pageContext: {
docId: string;
title: string;
// Blocks can be attached as context
attachedBlocks: BlockModel[];
};
}
interface ChatMessage {
role: 'user' | 'assistant' | 'system';
content: string;
// AI responses can include actionable blocks
attachments?: {
type: 'text' | 'image' | 'code';
content: string;
}[];
// Users can insert AI responses directly into the document
actions?: {
insertToPage: () => void;
replaceSelection: () => void;
copyToClipboard: () => void;
};
}
The chat panel supports:
- Document-aware conversations — the AI has access to the current page content
- Block references — users can attach specific blocks as context for questions
- Actionable responses — AI responses can be inserted directly into the document as blocks
- Session history — conversations are persisted per workspace
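One way a document-aware chat turn could be assembled is sketched below. It is an illustration under stated assumptions: `AttachedBlock` is a simplified stand-in for `BlockModel`, `SimpleSession` and `SimpleMessage` are trimmed versions of the session types above, and `buildChatRequest` is a hypothetical helper rather than a function from the AFFiNE codebase.

```typescript
// Simplified stand-ins for the session types above (assumptions, not AFFiNE's API).
interface AttachedBlock {
  id: string;
  text: string; // plain-text rendering of the block
}

interface SimpleMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

interface SimpleSession {
  messages: SimpleMessage[]; // prior turns in this conversation
  pageContext: { docId: string; title: string; attachedBlocks: AttachedBlock[] };
}

// Build the message list for the next turn:
// document context first, then the history, then the new question.
function buildChatRequest(session: SimpleSession, question: string): SimpleMessage[] {
  const { title, attachedBlocks } = session.pageContext;
  const blockText = attachedBlocks
    .map(block => `[${block.id}] ${block.text}`)
    .join('\n');
  const system: SimpleMessage = {
    role: 'system',
    content:
      `You are answering questions about the document "${title}".\n` +
      (blockText ? `Attached blocks:\n${blockText}` : 'No blocks are attached.'),
  };
  return [system, ...session.messages, { role: 'user', content: question }];
}

const session: SimpleSession = {
  messages: [
    { role: 'user', content: 'What is this page about?' },
    { role: 'assistant', content: 'It describes the release plan.' },
  ],
  pageContext: {
    docId: 'doc-1',
    title: 'Release Plan',
    attachedBlocks: [{ id: 'b1', text: 'Ship the beta on Friday.' }],
  },
};

const request = buildChatRequest(session, 'When does the beta ship?');
```

Prefixing each attached block with its ID is a design choice made for this sketch; it would let a response refer back to a specific block, which is what makes block references and actionable responses possible.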
AFFiNE's copilot supports multiple AI providers through a backend abstraction:
// packages/backend/server/src/modules/copilot/providers/
interface CopilotProvider {
// Text generation (streaming)
chatStream(params: ChatParams): AsyncIterable<string>;
// Text generation (non-streaming)
chat(params: ChatParams): Promise<string>;
// Image generation
generateImage(params: ImageParams): Promise<Uint8Array>;
// Text embedding (for search and retrieval)
embed(text: string): Promise<number[]>;
}
interface ChatParams {
messages: Array<{
role: 'system' | 'user' | 'assistant';
content: string;
}>;
model: string;
temperature?: number;
maxTokens?: number;
}
// Provider implementations:
// - OpenAI (GPT-4, GPT-4o, DALL-E)
// - Custom endpoint (any OpenAI-compatible API)
// - Future: local models, Anthropic, etc.
// For self-hosted instances, configure providers via environment:
// .env configuration
// COPILOT_OPENAI_API_KEY=sk-...
// COPILOT_OPENAI_MODEL=gpt-4o
// COPILOT_OPENAI_BASE_URL=https://api.openai.com/v1
// Or use a custom OpenAI-compatible endpoint (e.g., Ollama, vLLM):
// COPILOT_OPENAI_BASE_URL=http://localhost:11434/v1
// COPILOT_OPENAI_API_KEY=ollama
// COPILOT_OPENAI_MODEL=llama3
When an AI action generates content, the response must be converted back into blocks:
sequenceDiagram
participant User
participant Editor as BlockSuite Editor
participant AI as AI Module
participant Backend as Copilot API
participant LLM as AI Provider
User->>Editor: Select text, click "Improve Writing"
Editor->>AI: Extract selected blocks as text
AI->>Backend: POST /api/copilot/chat (streaming)
Backend->>LLM: Forward with system prompt
LLM-->>Backend: Stream tokens
Backend-->>AI: Stream response chunks
AI->>AI: Accumulate response in preview
Note over AI: User sees streaming preview
User->>AI: Click "Replace"
AI->>Editor: Replace selected blocks with AI content
Editor->>Editor: Create new block transaction
Note over Editor: Yjs update propagates to all sync providers
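The preview-then-replace step in the sequence above can be sketched as follows. This is a minimal illustration under assumptions: `StreamingPreview` and `replaceSelection` are hypothetical names, and the document is modelled as a plain array of strings rather than BlockSuite blocks.

```typescript
// Hypothetical sketch: accumulate streamed chunks in a preview and touch the
// document only when the user accepts. Not AFFiNE's actual implementation.

class StreamingPreview {
  private chunks: string[] = [];

  // Called for each chunk that arrives from the backend stream.
  append(chunk: string): void {
    this.chunks.push(chunk);
  }

  // The text shown to the user while the stream is still running.
  get text(): string {
    return this.chunks.join('');
  }
}

// Return a new block list in which the selected range is replaced by the
// preview text. The input array is left untouched, so nothing changes in the
// document until the caller adopts the result.
function replaceSelection(
  blocks: readonly string[],
  start: number,
  count: number,
  replacement: string
): string[] {
  return [...blocks.slice(0, start), replacement, ...blocks.slice(start + count)];
}

const documentBlocks = ['Intro', 'this sentense are bad', 'Outro'];
const preview = new StreamingPreview();

// Simulated stream chunks; a real client would read these from the response stream.
for (const chunk of ['This sentence ', 'is ', 'better.']) {
  preview.append(chunk);
}

// The document is still unchanged here; the user now clicks "Replace".
const updatedBlocks = replaceSelection(documentBlocks, 1, 1, preview.text);
```

Keeping the streamed text in a separate preview means a partially generated response never reaches the shared document, so collaborators only see the final, accepted replacement.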
The response handling includes a critical step: converting AI-generated text (often markdown) back into AFFiNE blocks:
// AI response to block conversion
class AIResponseHandler {
async insertAsBlocks(
response: string,
targetDoc: Doc,
parentBlockId: string,
position: number
) {
// Parse the AI response (typically markdown)
const parsedBlocks = this.markdownParser.parse(response);
// Insert each parsed element as a new block
for (const parsed of parsedBlocks) {
switch (parsed.type) {
case 'paragraph':
targetDoc.addBlock('affine:paragraph', {
type: 'text',
text: new Y.Text(parsed.content),
}, parentBlockId, position++);
break;
case 'heading':
targetDoc.addBlock('affine:paragraph', {
type: `h${parsed.level}`,
text: new Y.Text(parsed.content),
}, parentBlockId, position++);
break;
case 'code':
targetDoc.addBlock('affine:code', {
language: parsed.language,
text: new Y.Text(parsed.content),
}, parentBlockId, position++);
break;
case 'list':
targetDoc.addBlock('affine:list', {
type: parsed.ordered ? 'numbered' : 'bulleted',
text: new Y.Text(parsed.content),
}, parentBlockId, position++);
break;
}
}
}
}
In edgeless (whiteboard) mode, the copilot has additional capabilities:
- Mind map generation — generate a mind map from a topic, creating connected shape blocks on the canvas
- Presentation generation — create slide-like frames from a document outline
- Image generation — create images from prompts and place them on the canvas
- Content expansion — select a note on the canvas and ask AI to expand it
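To make the mind map case concrete, the sketch below turns an outline tree, such as one parsed from an AI response, into positioned shapes and connectors. It is purely illustrative: `OutlineNode`, `ShapeNode`, `Connector`, and `layoutMindMap` are hypothetical names, and the simple column-and-row layout is an assumption, not the algorithm AFFiNE uses.

```typescript
// Hypothetical sketch of laying out a generated outline as a mind map.
// Types and layout rules are assumptions made for illustration only.

interface OutlineNode {
  text: string;
  children: OutlineNode[];
}

interface ShapeNode {
  id: string;
  text: string;
  x: number; // column derived from depth in the tree
  y: number; // row derived from leaf order
}

interface Connector {
  from: string; // parent shape id
  to: string;   // child shape id
}

function layoutMindMap(
  root: OutlineNode,
  colWidth = 240,
  rowHeight = 80
): { shapes: ShapeNode[]; connectors: Connector[] } {
  const shapes: ShapeNode[] = [];
  const connectors: Connector[] = [];
  let nextRow = 0; // next free row for a leaf node

  // Place a node and its subtree; returns the node's final y coordinate.
  function place(node: OutlineNode, depth: number, parentId?: string): number {
    const shape: ShapeNode = {
      id: `shape-${shapes.length}`,
      text: node.text,
      x: depth * colWidth,
      y: 0,
    };
    shapes.push(shape);
    if (parentId !== undefined) {
      connectors.push({ from: parentId, to: shape.id });
    }
    if (node.children.length === 0) {
      // Leaves take the next free row.
      shape.y = nextRow * rowHeight;
      nextRow += 1;
    } else {
      // Parents are centred vertically between their first and last child.
      const ys = node.children.map(child => place(child, depth + 1, shape.id));
      shape.y = (Math.min(...ys) + Math.max(...ys)) / 2;
    }
    return shape.y;
  }

  place(root, 0);
  return { shapes, connectors };
}

const mindMap = layoutMindMap({
  text: 'AI Copilot',
  children: [
    {
      text: 'Writing',
      children: [
        { text: 'Improve', children: [] },
        { text: 'Translate', children: [] },
      ],
    },
    { text: 'Chat', children: [] },
  ],
});
```

In this sketch each shape and connector would map to one canvas element, so the generated mind map stays editable like any hand-drawn one.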
AFFiNE's AI copilot is deeply integrated into the block-based content model, providing inline writing actions, conversational chat with document context, and provider-agnostic backend support. AI responses are converted back into native blocks, making the output fully editable and collaborative through the same yjs CRDT system.
Next: Chapter 6: Database and Views — where we explore how AFFiNE implements inline databases with table, kanban, and filtered views.
Generated by AI Codebase Knowledge Builder