diff --git a/docs/en-US/deployment/_category.yaml b/docs/en-US/deployment/_category.yaml
deleted file mode 100644
index b01ba155f..000000000
--- a/docs/en-US/deployment/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: Deployment
-position: 3
diff --git a/docs/en-US/deployment/build-docker-image.md b/docs/en-US/deployment/build-docker-image.md
deleted file mode 100644
index f3ffbbeaa..000000000
--- a/docs/en-US/deployment/build-docker-image.md
+++ /dev/null
@@ -1,45 +0,0 @@
-# Build Guide
-
-This section covers how to build ApeRAG container images. It's primarily for users who need to create their own builds or deploy to environments other than the ones covered in "Getting Started".
-
-## Building Container Images
-
-The project uses Docker and `make` commands to build container images.
-
-* **Local Platform Builds**:
- These commands build images for your current machine's architecture.
- ```bash
- # Build all necessary images for local platform
- make build-local
-
- # Build only the backend image for local platform
- make build-aperag-local
-
- # Build only the frontend image for local platform
- make build-aperag-frontend-local
- ```
-
-* **Multi-platform Builds**:
- These commands build images for multiple architectures (e.g., amd64, arm64). This requires Docker Buildx to be set up and configured.
- ```bash
- # Build all necessary images for multiple platforms
- make build
-
- # Build only the backend image for multiple platforms
- make build-aperag
-
- # Build only the frontend image for multiple platforms
- make build-aperag-frontend
- ```
- You can specify the target platforms using the `PLATFORMS` variable, for example:
- ```bash
- make build PLATFORMS=linux/amd64,linux/arm64
- ```
-
-## Deployment
-
-Refer to the "Getting Started" section in the main README for common deployment methods:
-* [Getting Started with Kubernetes](../README.md#getting-started-with-kubernetes)
-* [Getting Started with Docker Compose](../README.md#getting-started-with-docker-compose)
-
-For custom deployments, you will need to adapt these methods or use the built container images with your chosen orchestration platform. Ensure all required services (databases, backend, frontend, Celery workers) are correctly configured and can communicate with each other.
\ No newline at end of file
diff --git a/docs/en-US/design/_category.yaml b/docs/en-US/design/_category.yaml
deleted file mode 100644
index 049843c35..000000000
--- a/docs/en-US/design/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: Design
-position: 1
diff --git a/docs/en-US/design/agent_runtime_v3.md b/docs/en-US/design/agent_runtime_v3.md
deleted file mode 100644
index 996806cbf..000000000
--- a/docs/en-US/design/agent_runtime_v3.md
+++ /dev/null
@@ -1,546 +0,0 @@
----
-title: Agent Runtime V3 Design
-description: Detailed design for ApeRAG Agent Runtime V3, including the Turn, TimelineEvent, Artifact, SSE contract, and migration boundary
-keywords: Agent Runtime, SSE, Turn, TimelineEvent, Artifact, PydanticAI, MCP
-position: 4
----
-
-# ApeRAG Agent Runtime V3 Design
-
-## 1. Background and Goal
-
-The current ApeRAG agent chat path is built on top of `mcp-agent`. It has proven that the system can work, but it is not a good long-term fit for ApeRAG's product and delivery goals.
-
-The unstable part is no longer the business API surface. The weakest layer is the runtime glue:
-
-- event dispatch
-- streaming output
-- frontend/backend message grammar
-- session cache shape
-- tool-result propagation
-
-These are coupled too tightly to third-party runtime internals, which directly turns into support cost, debugging cost, and deployment risk in private environments.
-
-This design therefore does **not** aim to find a more powerful agent framework. Its purpose is to rebuild ApeRAG's `agent product layer` around private deployment requirements:
-
-1. private-deployment friendly
-2. simple and reliable
-3. low maintenance cost
-4. minimal post-delivery support burden
-
-## 2. Final Decision
-
-Agent Runtime V3 makes the following decisions official:
-
-1. `mcp-agent` will not remain the long-term runtime core
-2. `FastAPI + FastMCP + existing business API/provider integrations` remain the stable business surface
-3. `SSE` becomes the only primary transport
-4. the product contract is rebuilt around `Turn + TimelineEvent + Artifact`
-5. `PydanticAI adapter` is used as the Phase 1 runtime implementation
-6. the system keeps a long-term path to collapse into a self-owned thin orchestration layer
-7. the main runtime will not be rebuilt around `Vercel AI SDK`, `OpenAI Agents SDK`, or `LangGraph`
-
-External libraries may help with implementation, but they do not define the product contract.
-
-## 3. Design Principles
-
-### 3.1 Private deployment first
-
-All decisions optimize for:
-
-- deterministic defaults
-- diagnosable failures
-- clear rollback boundaries
-- minimal hidden assumptions
-- minimal compatibility baggage
-
-### 3.2 ApeRAG owns the contract
-
-The following are owned by ApeRAG itself:
-
-- API entrypoints
-- SSE event protocol
-- user-visible status vocabulary
-- TimelineEvent schema
-- Artifact schema
-- history commit policy
-
-Third-party runtimes must adapt to this contract, not the other way around.
-
-### 3.3 Final answer and process events must be separated
-
-The final answer, process timeline, references, and tool results are no longer packed into one assistant message.
-
-The layering is:
-
-- `answer` is an answer artifact
-- `timeline` is a process-event stream
-- `references` are separate artifacts
-- `tool result` is surfaced via summary events and artifact references
-
-### 3.4 Phase 1 stays intentionally narrow
-
-Phase 1 supports only:
-
-- single agent
-- serial tool loop
-- single MCP server view
-- multiple internal loops inside a single turn
-
-Phase 1 explicitly does not support:
-
-- multi-agent
-- parallel tool fan-out
-- workflow/graph orchestration
-- long-running orchestration
-
-## 4. Core Object Model
-
-## 4.1 Turn
-
-A `Turn` is one complete agent execution for one user query.
-
-It is important to clarify that a turn is **not** a single-step answer. A turn may contain many rounds of:
-
-- thinking
-- web search
-- tool calls
-- result reading
-- internal reasoning
-
-The turn is the outer execution boundary.
-
-### 4.1.1 Suggested fields
-
-```text
-schema_version
-turn_id
-chat_id
-user_id
-request_id
-client_idempotency_key
-status
-input_text
-model_profile
-started_at
-finished_at
-error_code
-error_message
-answer_artifact_id
-reference_bundle_artifact_id
-timeline_cursor
-```
-
-### 4.1.2 State machine
-
-```text
-queued -> running -> completed
-queued -> running -> failed
-queued -> running -> cancelled
-```
-
-### 4.1.3 Hard rules
-
-1. one turn must have exactly one final answer artifact
-2. one turn must never execute twice
-3. one `chat_id + client_idempotency_key` must not create multiple valid turns
-
-## 4.2 TimelineEvent
-
-A `TimelineEvent` is the standardized event stream for one turn.
-
-It serves three roles:
-
-- frontend timeline rendering model
-- SSE transport model
-- replay and diagnosis model
-
-It is **not** a raw debug-log dump.
-
-### 4.2.1 Required fields
-
-```text
-schema_version
-event_id
-turn_id
-sequence
-timestamp
-type
-label
-status
-actor
-data
-```
-
-### 4.2.2 Hard rules
-
-1. `sequence` must be strictly monotonic inside one turn
-2. the frontend must not infer ordering from timestamps
-3. `actor` is limited to `agent | tool | system`
-4. `data` carries only the minimum payload
-5. the timeline must be replayable
-
-### 4.2.3 Event types
-
-Phase 1 event types are:
-
-- `turn.started`
-- `agent.state.changed`
-- `tool.started`
-- `tool.progress`
-- `tool.finished`
-- `external_action.started`
-- `external_action.finished`
-- `text.delta`
-- `artifact.created`
-- `turn.completed`
-- `turn.failed`
-- `turn.cancelled`
-- `heartbeat`
-
-### 4.2.4 Layering rules
-
-- `tool.*` is for the standard tool loop
-- `external_action.*` is only for user-visible external actions such as `web_search`
-- not every internal runtime step should appear in the timeline
-
-## 4.3 Artifact
-
-An `Artifact` is a persisted, re-readable, reusable object.
-
-### 4.3.1 Suggested artifact types
-
-- `answer`
-- `reference_bundle`
-- `tool_result_summary`
-- `search_result_summary`
-- `error_summary`
-
-### 4.3.2 Suggested fields
-
-```text
-schema_version
-artifact_id
-turn_id
-artifact_type
-created_at
-summary
-storage_ref | payload
-```
-
-### 4.3.3 Hard rules
-
-1. the stream must not carry large bodies
-2. the stream only carries summaries, artifact ids, and minimal metadata
-3. references must be materialized as a separate artifact
-
-## 5. User-visible status vocabulary
-
-The frontend must not expose raw runtime event names.
-
-Phase 1 user-facing status vocabulary is fixed to:
-
-- `Thinking`
-- `Searching`
-- `Calling Tool`
-- `Reading Result`
-- `Streaming Answer`
-- `Completed`
-- `Failed`
-
-This keeps the UI stable even if backend internals evolve later.
-
-## 6. API and Transport Design
-
-## 6.1 Primary transport
-
-The primary transport is `SSE` only.
-
-The system should not keep a long-lived `WebSocket + SSE` dual stack.
-
-## 6.2 API surface
-
-### 6.2.1 Create a turn
-
-```text
-POST /api/v2/agent/chats/{chat_id}/turns
-```
-
-Suggested request fields:
-
-- `query`
-- `context`
-- `model_profile`
-- `client_idempotency_key`
-
-Suggested response fields:
-
-- `turn_id`
-- `status`
-- `stream_url`
-
-### 6.2.2 Subscribe to turn events
-
-```text
-GET /api/v2/agent/chats/{chat_id}/turns/{turn_id}/events
-```
-
-Response type:
-
-```text
-Content-Type: text/event-stream
-```
-
-### 6.2.3 Get turn snapshot
-
-```text
-GET /api/v2/agent/chats/{chat_id}/turns/{turn_id}
-```
-
-Used for:
-
-- page refresh recovery
-- fallback after failed SSE reconnect
-- diagnosis
-
-### 6.2.4 Cancel a turn
-
-```text
-POST /api/v2/agent/chats/{chat_id}/turns/{turn_id}/cancel
-```
-
-### 6.2.5 Get an artifact
-
-```text
-GET /api/v2/agent/artifacts/{artifact_id}
-```
-
-### 6.2.6 OpenAI-compatible adapter
-
-```text
-POST /v1/chat/completions
-```
-
-This endpoint is a compatibility adapter for OpenAI-shaped clients. It is not
-the primary UI contract. The implementation must translate each request into an
-Agent Runtime V3 turn and then format the result as either:
-
-- `chat.completion` JSON when `stream=false`
-- `text/event-stream` `chat.completion.chunk` frames when `stream=true`
-
-The adapter contract is:
-
-- `bot_id` is required as a query parameter
-- `chat_id` is optional; if omitted, the backend creates and later deletes an
- ephemeral chat
-- `language` is optional and defaults to `en-US`
-- `Idempotency-Key` / `X-Idempotency-Key` maps to
- `client_idempotency_key`
-
-## 6.3 Idempotency and reconnect
-
-### 6.3.1 Idempotency
-
-- `POST turn` must support a client idempotency key
-- repeated requests with the same `chat_id + client_idempotency_key` must not create multiple turns
-- one turn must never execute twice
-
-### 6.3.2 SSE reconnect
-
-- reconnect should use `Last-Event-ID` or an explicit offset
-- if the event buffer has expired:
- 1. the client fetches the turn snapshot
- 2. the client resumes from the newest available cursor
-
-## 6.4 Heartbeat, backpressure, and timeout
-
-The SSE layer must define:
-
-- heartbeat behavior
-- event buffer limits
-- delta merge policy
-- overload summarization/truncation behavior
-
-It must also distinguish:
-
-- single tool timeout
-- total runtime timeout for one turn
-- stream idle timeout
-
-## 7. Permission Boundary
-
-The new runtime entrypoint must re-check permissions explicitly. It must not inherit assumptions from the old WebSocket path.
-
-Every turn creation must validate:
-
-- chat ownership
-- collection/file context visibility
-- tool visibility scope
-
-Artifact retrieval endpoints must also enforce permissions.
-
-## 8. Storage Design
-
-## 8.1 Redis responsibility
-
-Redis handles only short-lived runtime and stream recovery state:
-
-- `turn runtime state`
-- `stream cursor`
-- `transient event buffer`
-- `in-flight text buffer`
-
-Redis no longer owns:
-
-- the old message grammar
-- the product-level message contract
-- the long-term history representation
-
-## 8.2 DB / persistent responsibility
-
-Persistent storage holds:
-
-- `conversation_turn`
-- `timeline_event` (at least key events)
-- `artifact`
-- `reference_bundle`
-- `error_summary`
-
-The timeline must be replayable after the stream ends.
-
-## 8.3 History commit policy
-
-The final history is not written token-by-token.
-
-Instead:
-
-1. only transient runtime state is updated during streaming
-2. the standardized turn record is committed only after `done` or explicit `error`
-
-This prevents half-written history after cancellation, reconnect, or rollback.
-
-## 9. Frontend Experience Model
-
-The frontend should no longer treat one assistant bubble as the carrier of everything.
-
-The recommended layout is:
-
-1. `Turn Header`
-2. `Timeline`
-3. `Final Answer Panel`
-4. `References Panel`
-5. `Diagnostics Drawer`
-
-Where:
-
-- Timeline shows process only
-- Final Answer Panel shows the final answer only
-- References Panel shows sources only
-- Diagnostics Drawer is expandable and opt-in
-
-## 10. Runtime Path
-
-## 10.1 Phase 1: PydanticAI adapter
-
-Phase 1 uses `PydanticAI` as the runtime implementation because it lowers implementation cost for:
-
-- single-turn internal loops
-- tool calling
-- provider calling
-- state mapping
-
-But it does not define:
-
-- public API
-- timeline schema
-- artifact schema
-- history commit policy
-
-## 10.2 Long-term path
-
-If the `PydanticAI adapter` stays stable and low-maintenance, it may remain.
-
-If its runtime behavior still constrains ApeRAG too much, the internals can later be replaced by a self-built thin orchestration layer.
-
-Because the contract is already separated, that later replacement changes the implementation only, not the product boundary.
-
-## 11. Replacement Boundary
-
-This rewrite:
-
-- keeps `FastAPI`, `FastMCP`, business APIs, provider integrations, and business entities
-- replaces `mcp-agent runtime glue`, the old WebSocket grammar, the old Redis message shape, the old frontend rendering model, and the old tool-result/event pushback mechanism
-
-That means business value is preserved while the most fragile product/runtime coupling is rebuilt cleanly.
-
-## 12. Migration and Rollback Principles
-
-### 12.1 Migration
-
-- a short-lived feature flag is allowed for migration safety
-- a long-lived dual stack is not allowed
-- once the new path is stable, the old WebSocket grammar and old runtime glue should be removed
-
-### 12.2 Rollback conditions
-
-Rollback is allowed if:
-
-- SSE is unstable behind enterprise proxies
-- timeline replay/reconnect is unreliable
-- turn/history/artifact compatibility is broken
-- provider/tool failures are not diagnosable
-
-### 12.3 Rollback requirements
-
-After rollback:
-
-- history must remain readable
-- turn records must not become orphaned
-- artifacts must remain traceable
-
-## 13. Phase 1 Task Outline
-
-Phase 1 should focus on the minimum viable end-to-end path rather than the final long-term shape.
-
-Suggested tasks:
-
-1. create `aperag/agent_runtime/`
-2. define `Turn / TimelineEvent / Artifact` schemas
-3. implement `TurnService`
-4. implement `EventService`
-5. implement `ArtifactService`
-6. implement `HistoryWriter`
-7. define `AgentRuntime`
-8. implement `PydanticAIRuntime`
-9. implement the MCP client adapter
-10. implement `SSE StreamEmitter`
-11. add v2 agent APIs
-12. build the new timeline frontend
-13. build answer/references/diagnostics panels
-14. implement snapshot recovery and cancel
-15. add contract-level E2E coverage
-
-## 14. Acceptance Criteria
-
-Phase 1 should satisfy:
-
-1. the new API can create turns, stream events, fetch snapshots, fetch artifacts, and cancel turns
-2. a single turn can complete multiple search/tool/thinking loops
-3. the timeline is reconnectable and replayable
-4. the final answer and process events are fully separated
-5. history commit policy leaves no half-written final records
-6. `mcp-agent` is no longer on the primary chat execution path
-
-## 15. Final Decision
-
-Agent Runtime V3 makes the following official:
-
-- stop patching `mcp-agent`
-- stop patching the old WebSocket grammar
-- rebuild the runtime contract around `Turn + TimelineEvent + Artifact + SSE`
-- use `PydanticAI adapter` for Phase 1
-- let implementation follow this document, while architecture remains responsible for the contract boundary and long-term direction
-
-In one sentence:
-
-This is not just a runtime swap. It is a product-layer rebuild of ApeRAG's agent runtime for private, low-support delivery.
diff --git a/docs/en-US/design/architecture.md b/docs/en-US/design/architecture.md
deleted file mode 100644
index b310af552..000000000
--- a/docs/en-US/design/architecture.md
+++ /dev/null
@@ -1,850 +0,0 @@
----
-title: System Architecture
-description: ApeRAG Architecture Design and Core Components
-keywords: ApeRAG, Architecture, RAG, Knowledge Graph, LightRAG
-position: 1
----
-
-# ApeRAG System Architecture
-
-## 1. What is ApeRAG
-
-ApeRAG is an **open, Agentic Graph RAG platform**. It's not just a simple vector retrieval system, but a production-ready solution that deeply integrates knowledge graphs, multimodal retrieval, and intelligent agents.
-
-Traditional RAG systems primarily rely on vector similarity search. While they can find semantically related content, they often lack understanding of relationships between knowledge points. ApeRAG's core innovations are:
-
-- **Graph RAG**: Automatically extracts entities (people, places, concepts) and relationships from documents to build knowledge graphs, understanding connections between knowledge points
-- **Agentic**: Built-in intelligent agents that can autonomously plan, invoke tools, and conduct multi-turn conversations for smarter Q&A experiences
-- **Open Integration**: Exposes capabilities through **RESTful API** and **MCP Protocol**, easily integrating with external systems like Dify, Claude, and Cursor
-
-### Core Advantages
-
-Compared to traditional RAG solutions, ApeRAG provides:
-
-- **Powerful Document Processing**: Supports PDF, Word, Excel and more, handling complex tables, formulas, and images
-- **Multiple Retrieval Methods**: Vector, full-text, and graph retrieval complement each other
-- **Knowledge Relationship Understanding**: Understands concept relationships through knowledge graphs, not just text similarity
-- **Open Integration Capabilities**: RESTful API + MCP protocol, can serve as knowledge backend for Dify, Claude Desktop, Cursor
-- **Production-Grade Architecture**: Async processing, multi-storage, high concurrency, ready for production
-
-### Architecture Overview
-
-```mermaid
-graph TB
- User[Users] --> Frontend[Web Frontend]
- User --> External[External Systems
Dify/Claude/Cursor]
-
- Frontend --> API[RESTful API]
- External --> MCP[MCP Protocol]
-
- API --> DocProcess[Document Processing]
- API --> Search[Search Service]
- API --> Agent[Agent Dialogue]
- MCP --> Search
- MCP --> Agent
-
- DocProcess --> Tasks[Async Task Layer]
- Tasks --> Storage[Storage Layer]
-
- Search --> Storage
- Agent --> Search
-
- Storage --> PG[(PostgreSQL)]
- Storage --> Qdrant[(Qdrant
Vector DB)]
- Storage --> ES[(Elasticsearch
Full-text Search)]
- Storage --> Neo4j[(Neo4j
Graph DB)]
- Storage --> MinIO[(MinIO
Object Storage)]
-
- style User fill:#e1f5ff
- style Frontend fill:#bbdefb
- style External fill:#bbdefb
- style API fill:#90caf9
- style MCP fill:#90caf9
- style DocProcess fill:#fff59d
- style Search fill:#fff59d
- style Agent fill:#fff59d
- style Tasks fill:#c5e1a5
- style Storage fill:#ffccbc
-```
-
-## 2. Layered Architecture
-
-ApeRAG adopts a clear layered design, with each layer serving its specific purpose:
-
-```mermaid
-graph TB
- subgraph Layer1[Client Layer]
- Web[Web Frontend
Next.js]
- Dify[Dify]
- Cursor[Cursor]
- Claude[Claude Desktop]
- end
-
- subgraph Layer2[Interface Layer]
- API[RESTful API
FastAPI]
- MCP[MCP Server
Model Context Protocol]
- end
-
- subgraph Layer3[Service Layer]
- CollSvc[Collection Service]
- DocSvc[Document Service]
- SearchSvc[Search Service]
- GraphSvc[Graph Service]
- AgentSvc[Agent Service]
- end
-
- subgraph Layer4[Task Layer]
- Celery[Celery Worker
Async Tasks]
- MinerU[MinerU
Document Parser]
- end
-
- subgraph Layer5[Storage Layer]
- PG[(PostgreSQL)]
- Qdrant[(Qdrant)]
- ES[(Elasticsearch)]
- Neo4j[(Neo4j)]
- Redis[(Redis)]
- MinIO[(MinIO)]
- end
-
- Web --> API
- Dify --> MCP
- Cursor --> MCP
- Claude --> MCP
-
- API --> CollSvc
- API --> DocSvc
- API --> SearchSvc
- API --> GraphSvc
- API --> AgentSvc
-
- MCP --> SearchSvc
- MCP --> AgentSvc
-
- CollSvc --> Celery
- DocSvc --> Celery
- GraphSvc --> Celery
-
- Celery --> MinerU
- Celery --> PG
- Celery --> Qdrant
- Celery --> ES
- Celery --> Neo4j
- Celery --> MinIO
-
- SearchSvc --> PG
- SearchSvc --> Qdrant
- SearchSvc --> ES
- SearchSvc --> Neo4j
-
- style Layer1 fill:#e3f2fd
- style Layer2 fill:#f3e5f5
- style Layer3 fill:#fff3e0
- style Layer4 fill:#e8f5e9
- style Layer5 fill:#fce4ec
-```
-
-**Layer Responsibilities**:
-
-- **Client Layer**: Multiple access methods - Web UI for management, MCP clients (Dify, Cursor, Claude, etc.) for integration
-- **Interface Layer**: RESTful API (traditional HTTP interface) and MCP Server (AI tool protocol) provide services in parallel
-- **Service Layer**: Core business logic, coordinating resources to complete specific functions
-- **Task Layer**: Handles time-consuming operations (document parsing, index building) to ensure fast API responses
-- **Storage Layer**: Multiple storage systems, selecting optimal solutions for different data types
-
-## 3. Document Processing Flow
-
-This is one of ApeRAG's core capabilities. From uploading a PDF file to making it searchable involves a series of carefully designed processing steps.
-
-### 3.1 Document Upload and Parsing
-
-When you upload a document, ApeRAG automatically identifies the format and selects the appropriate parser:
-
-```mermaid
-flowchart TD
- Upload[User Upload Document] --> Detect[Format Detection]
-
- Detect --> |PDF| MinerU[MinerU Parser]
- Detect --> |Word/Excel| MarkItDown[MarkItDown Parser]
- Detect --> |Markdown| DirectParse[Direct Parse]
- Detect --> |Image| OCR[OCR Recognition]
-
- MinerU --> Extract[Content Extraction]
- MarkItDown --> Extract
- DirectParse --> Extract
- OCR --> Extract
-
- Extract --> Parts[Document Parts
Parts Objects]
-
- style Upload fill:#e1f5ff
- style Extract fill:#c5e1a5
- style Parts fill:#fff59d
-```
-
-**MinerU's Power**:
-
-- Accurately recognizes complex PDF table structures, preserving table content integrity
-- Extracts LaTeX mathematical formulas, maintaining formula readability
-- Performs OCR on scanned PDFs, supporting mixed Chinese-English text
-- Identifies image regions in documents, supporting image content understanding
-
-### 3.2 Intelligent Chunking Strategy
-
-After document parsing, content needs to be split into appropriately sized chunks. This step is critical - chunks too large affect retrieval precision, too small lose context.
-
-```mermaid
-flowchart TD
- Parts[Document Parts] --> Rechunk[Smart Re-chunking]
-
- Rechunk --> Analysis[Analyze Document Structure]
- Analysis --> Hierarchy[Identify Title Hierarchy]
- Hierarchy --> Group[Group by Titles]
-
- Group --> Check{Check Chunk Size}
- Check --> |Too Large| Split[Semantic Splitting]
- Check --> |Appropriate| Chunks[Final Chunks]
- Split --> Chunks
-
- Chunks --> AddContext[Add Context]
- AddContext --> FinalChunks[Chunks with Context]
-
- style Rechunk fill:#bbdefb
- style Split fill:#ffccbc
- style FinalChunks fill:#c5e1a5
-```
-
-**Chunking Strategy Features**:
-
-- **Maintain Semantic Integrity**: Avoid breaking sentences in the middle
-- **Preserve Title Context**: Each chunk knows which section it belongs to
-- **Hierarchical Splitting**: Split by paragraphs first, then sentences, finally characters
-- **Smart Merging**: Adjacent small title chunks are merged to avoid information fragmentation
-
-Chunking Parameters:
-- Default chunk size: 1200 tokens (approximately 800-1000 Chinese characters)
-- Overlap size: 100 tokens (ensures context continuity)
-
-### 3.3 Parallel Multi-Index Building
-
-After chunking, multiple indexes are created simultaneously. Each index serves different purposes and complements the others:
-
-| Index Type | Use Case | Storage | Retrieval Method |
-|-----------|----------|---------|------------------|
-| **Vector Index** | Semantic similarity questions, e.g., "how to optimize performance" | Qdrant | Cosine Similarity |
-| **Full-text Index** | Exact keyword search, e.g., "PostgreSQL configuration" | Elasticsearch | BM25 Algorithm |
-| **Graph Index** | Relationship questions, e.g., "what's the connection between A and B" | PostgreSQL/Neo4j | Graph Traversal |
-| **Summary Index** | Quick document overview | PostgreSQL | Vector Matching |
-| **Vision Index** | Image content search | Qdrant | Multimodal Vector |
-
-```mermaid
-flowchart LR
- Chunks[Document Chunks] --> IndexMgr[Index Manager]
-
- IndexMgr --> VectorIdx[Vector Index Creation]
- IndexMgr --> FulltextIdx[Full-text Index Creation]
- IndexMgr --> GraphIdx[Graph Index Creation]
- IndexMgr --> VisionIdx[Vision Index Creation]
-
- VectorIdx --> Qdrant1[(Qdrant)]
- FulltextIdx --> ES[(Elasticsearch)]
- GraphIdx --> Graph[(Neo4j/PG)]
- VisionIdx --> Qdrant2[(Qdrant)]
-
- style IndexMgr fill:#fff59d
- style VectorIdx fill:#bbdefb
- style FulltextIdx fill:#c5e1a5
- style GraphIdx fill:#ffccbc
- style VisionIdx fill:#e1bee7
-```
-
-**Advantages of Parallel Building**:
-- Different indexes can be built simultaneously, improving speed
-- Failure of one index doesn't affect others
-- Can enable specific index types on demand
-
-### 3.4 Knowledge Graph Construction
-
-Graph indexing is ApeRAG's signature feature, extracting structured knowledge from documents.
-
-```mermaid
-flowchart TD
- Chunks[Document Chunks] --> EntityExtract[Entity Extraction]
-
- EntityExtract --> LLM1[Call LLM
Identify Entities]
- LLM1 --> Entities[Entity List
People, Places, Concepts]
-
- Entities --> RelationExtract[Relation Extraction]
- RelationExtract --> LLM2[Call LLM
Identify Relations]
- LLM2 --> Relations[Relation List
Who relates to whom and how]
-
- Entities --> Merge[Entity Merging]
- Relations --> Merge
-
- Merge --> Components[Connected Components Analysis]
- Components --> Parallel[Parallel Processing of Components]
- Parallel --> Graph[(Knowledge Graph)]
-
- style EntityExtract fill:#bbdefb
- style RelationExtract fill:#c5e1a5
- style Merge fill:#ffccbc
- style Components fill:#fff59d
-```
-
-**Key Steps in Graph Construction**:
-
-1. **Entity Extraction**: LLM identifies meaningful entities from document chunks
- - Example: From "Zhang San studies AI at Tsinghua University in Beijing"
- - Entities: Zhang San (person), Beijing (location), Tsinghua University (organization), AI (concept)
-
-2. **Relation Extraction**: Identifies relationships between entities
- - Example: Zhang San --studies--> AI, Zhang San --attends--> Tsinghua University
-
-3. **Entity Merging**: Same entity may have different expressions, needs normalization
- - Example: "LightRAG", "light rag", "Light-RAG" → merged into unified entity
-
-4. **Connected Components Optimization**: Divides graph into independent subgraphs for parallel processing
- - Performance improvement: 2-3x throughput
-
-**Why Connected Components Optimization?**
-
-Suppose you have 100 documents discussing different topics. Entities about "databases" and entities about "machine learning" have no connections and can be processed independently. The connected components algorithm identifies these independent "knowledge islands" and processes them in parallel, greatly improving speed.
-
-### 3.5 Async Task System
-
-Document processing is time-consuming, so ApeRAG uses a "dual-chain architecture" to ensure good user experience:
-
-```mermaid
-graph TB
- subgraph Frontend["🚀 Frontend Chain - Fast Response"]
- direction TB
- A1["📤 User Upload Document"] --> A2["🔌 API Receives Request"]
- A2 --> A3["📋 Index Manager"]
- A3 --> A4["💾 Write to Database
status = PENDING
version = 1"]
- A4 --> A5["✅ Return Success Immediately
< 100ms"]
- end
-
- subgraph Backend["⚙️ Backend Chain - Async Processing"]
- direction TB
- B1["⏰ Celery Beat
Check every 30s"] --> B2["🔍 Reconciler Detects
version ≠ observed_version"]
- B2 --> B3{"🎯 Found Pending Tasks?"}
- B3 -->|Yes| B4["🚀 Schedule Worker"]
- B3 -->|No| B1
- B4 --> B5["📄 Parse Document"]
- B5 --> B6["🔀 Parallel Index Creation
Vector + Fulltext
+ Graph + Vision"]
- B6 --> B7["✨ Update Status
status = ACTIVE
observed_version = 1"]
- B7 --> B1
- end
-
- A4 -.-|"Database State Change"| B2
-
- style Frontend fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
- style Backend fill:#fff3e0,stroke:#f57c00,stroke-width:3px
- style A5 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
- style B7 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
- style B3 fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
-```
-
-**Benefits of Dual-Chain**:
-
-- **Fast Frontend Response**: API returns within 100ms after user uploads, no need to wait for processing
-- **Async Backend Processing**: Real processing work happens in background without blocking user operations
-- **Auto Retry**: System automatically retries if processing fails, ensuring eventual success
-- **Status Tracking**: Users can check document processing progress anytime
-
-**Index State Machine**:
-
-```mermaid
-stateDiagram-v2
- [*] --> PENDING: 📤 Document Upload
-
- PENDING --> CREATING: 🚀 Reconciler Detected
Start Processing
-
- CREATING --> ACTIVE: ✅ All Indexes Created Successfully
- CREATING --> FAILED: ❌ Processing Failed
-
- FAILED --> CREATING: 🔄 Auto Retry
(max 3 times)
- FAILED --> [*]: 💔 Exceeded Retry Limit
Mark as Failed
-
- ACTIVE --> CREATING: 🔄 Document Updated
Rebuild Index
- ACTIVE --> [*]: 🗑️ Delete Document
-
- note right of PENDING
- version = 1
- observed_version = 0
- end note
-
- note right of CREATING
- Processing in progress
- May take several minutes
- end note
-
- note right of ACTIVE
- version = 1
- observed_version = 1
- Ready for search
- end note
-```
-
-## 4. Retrieval and Q&A Flow
-
-Once indexed, users can ask questions. ApeRAG's retrieval system intelligently selects appropriate retrieval strategies.
-
-### 4.1 Hybrid Retrieval System
-
-Different types of questions suit different retrieval methods. ApeRAG uses multiple retrieval strategies simultaneously and fuses results:
-
-```mermaid
-flowchart TB
- Query[User Query] --> Router[Retrieval Router]
-
- Router --> |Parallel| Vector[Vector Retrieval]
- Router --> |Parallel| Fulltext[Full-text Retrieval]
- Router --> |Parallel| Graph[Graph Retrieval]
-
- Vector --> Embed[Generate Query Vector]
- Embed --> QdrantSearch[Qdrant Similarity Search]
- QdrantSearch --> R1[Results 1]
-
- Fulltext --> ESSearch[Elasticsearch BM25]
- ESSearch --> R2[Results 2]
-
- Graph --> GraphQuery[Graph Query
local/global/hybrid]
- GraphQuery --> R3[Results 3]
-
- R1 --> Merge[Result Fusion]
- R2 --> Merge
- R3 --> Merge
-
- Merge --> Rerank[Rerank Re-scoring]
- Rerank --> Final[Final Results]
-
- style Query fill:#e1f5ff
- style Vector fill:#bbdefb
- style Fulltext fill:#c5e1a5
- style Graph fill:#ffccbc
- style Rerank fill:#fff59d
- style Final fill:#c5e1a5
-```
-
-**Retrieval Strategy Explanation**:
-
-- **Vector Retrieval**: For semantically similar questions
- - Q: "How to improve system performance?"
- - Finds: "Optimize database queries", "Use caching", etc.
-
-- **Full-text Retrieval**: For exact keyword matching
- - Q: "Where is PostgreSQL configuration file?"
- - Finds paragraphs containing exactly "PostgreSQL" and "configuration file"
-
-- **Graph Retrieval**: For relationship questions
- - Q: "What's the relationship between LightRAG and Neo4j?"
- - Queries connection paths between these two entities in the graph
-
-**Result Fusion Strategy**:
-
-Results from different retrieval methods need merging. ApeRAG uses a Rerank model to re-score all candidate results:
-
-1. Collect all retrieval results (may have duplicates)
-2. Deduplicate, keep most relevant segments
-3. Use Rerank model to evaluate relevance of each segment to the question
-4. Re-sort by new scores
-5. Return Top-K results
-
-### 4.2 Knowledge Graph Query
-
-Graph retrieval has three modes for different types of questions:
-
-| Mode | Use Case | Query Method | Example Question |
-|------|----------|--------------|------------------|
-| **local** | Query local info about an entity | Vector match similar entities → Get neighbor nodes | "Zhang San's personal info" |
-| **global** | Query overall relationships and patterns | Vector match similar relations → Get connection paths | "What's the company's organizational structure" |
-| **hybrid** | Comprehensive questions | local + global combined | "Zhang San's role and responsibilities in the company" |
-
-```mermaid
-flowchart TD
- Question[User Question] --> Analyze[Question Analysis]
-
- Analyze --> Local[Local Mode
Entity-centric]
- Analyze --> Global[Global Mode
Relation-centric]
- Analyze --> Hybrid[Hybrid Mode
Comprehensive Query]
-
- Local --> FindEntity[Find Related Entities]
- FindEntity --> GetNeighbors[Get Neighbors and Relations]
-
- Global --> FindRelations[Find Related Relations]
- FindRelations --> GetContext[Get Relation Context]
-
- Hybrid --> Local
- Hybrid --> Global
-
- GetNeighbors --> Context[Generate Context]
- GetContext --> Context
-
- Context --> Return[Return to LLM]
-
- style Local fill:#bbdefb
- style Global fill:#c5e1a5
- style Hybrid fill:#fff59d
-```
-
-**Real Example**:
-
-Suppose the knowledge graph contains:
-- Entities: Zhang San (person), Database Team (organization), PostgreSQL (technology)
-- Relations: Zhang San --belongs to--> Database Team, Zhang San --excels at--> PostgreSQL
-
-Question: "What is Zhang San responsible for?"
-
-1. **Local Mode**:
- - Finds "Zhang San" entity
- - Gets all directly connected nodes
- - Returns: "Zhang San belongs to Database Team, excels at PostgreSQL"
-
-2. **Global Mode**:
- - Finds related relation patterns: "responsible for", "belongs to"
- - Returns entire team structure and responsibility division
-
-3. **Hybrid Mode**:
- - Uses both methods above
- - Provides more comprehensive answer
-
-### 4.3 Agent Dialogue System
-
-Agent is ApeRAG's intelligent assistant that can invoke various tools to answer questions.
-
-```mermaid
-sequenceDiagram
- participant User as User
- participant API as API Server
- participant Agent as Agent Service
- participant LLM as LLM Service
- participant MCP as MCP Tools
- participant Search as Search Service
-
- User->>API: Send Question
- API->>Agent: Forward Question
-
- Agent->>LLM: Call LLM
with Tool List
- LLM-->>Agent: Decide to call search_collection tool
-
- Agent->>MCP: Execute Tool Call
- MCP->>Search: Hybrid Retrieval
- Search-->>MCP: Return Relevant Document Segments
- MCP-->>Agent: Tool Execution Result
-
- Agent->>LLM: Call LLM Again
with Retrieved Context
- LLM-->>Agent: Generate Final Answer
-
- Agent-->>API: Stream Response
- API-->>User: SSE Push Answer
-```
-
-**Agent Workflow**:
-
-1. **Receive Question**: User sends a question
-
-2. **Tool Decision**: LLM analyzes question and decides which tools to call
- - Possible tools: search_collection (search knowledge base), web_search (search internet), web_read (read web page), etc.
-
-3. **Execute Tools**: Agent calls corresponding tools
- - Example: search_collection triggers hybrid retrieval, returns relevant documents
-
-4. **Generate Answer**: LLM generates answer based on retrieved context
-
-5. **Stream Response**: Answer pushed to user in real-time via SSE (Server-Sent Events), no need to wait for complete generation
-
-**Role of MCP Protocol**:
-
-MCP (Model Context Protocol) is a standardized tool protocol that allows AI assistants (like Claude Desktop, Cursor) to easily invoke ApeRAG's capabilities. Through MCP, external AI tools can:
-- List your knowledge bases
-- Search knowledge base content
-- Read web page content
-- Search the internet
-
-**Dialogue Example**:
-
-```
-User: How does ApeRAG's graph indexing work?
-
-Agent thinks: Need to search knowledge base
-↓
-Call tool: search_collection(query="graph indexing principles", collection_id="aperag-docs")
-↓
-Retrieval results: Returns document segments about graph construction, entity extraction, relation extraction
-↓
-Agent answers: ApeRAG's graph indexing works through the following steps... (generated based on retrieved content)
-```
-
-## 5. Storage Architecture
-
-ApeRAG adopts a multi-storage architecture, selecting the most appropriate storage solution for different data types.
-
-### 5.1 Storage Selection Decision
-
-```mermaid
-flowchart TD
- Data["🎯 Data Type Classification"] --> Choice{"📊 What Data?"}
-
- Choice --> |"📋 Structured Data
Users, Configs, etc."| PG["PostgreSQL"]
- Choice --> |"🔢 Vector Data
embeddings"| Qdrant["Qdrant"]
- Choice --> |"📝 Text Data
Full-text Search"| ES["Elasticsearch"]
- Choice --> |"📁 File Data
Raw Documents"| MinIO["MinIO/S3"]
- Choice --> |"🕸️ Graph Data
Knowledge Graph"| GraphChoice{"Graph Scale?"}
- Choice --> |"⚡ Cache Data
Temporary Data"| Redis["Redis"]
-
- GraphChoice -->|"Small Scale
< 100K entities
💰 Recommended"| PG2["PostgreSQL
Built-in Graph Storage"]
- GraphChoice -->|"Large Scale
> 1M entities"| Neo4j["Neo4j
Professional Graph DB"]
-
- PG --> PGUse["✅ Transaction Support
✅ Relational Queries
✅ Small-scale Graph Storage
✅ Mature & Stable"]
- PG2 --> PG2Use["✅ No Extra Components
✅ Lower Ops Cost
✅ Sufficient for Most Cases"]
- Qdrant --> QdrantUse["✅ Vector Similarity Search
✅ High-dimensional Data Retrieval
✅ Filter Support"]
- ES --> ESUse["✅ Full-text Search BM25
✅ Keyword Search
✅ Chinese Tokenization IK"]
- MinIO --> MinIOUse["✅ Large File Storage
✅ S3 Protocol Compatible
✅ Low Cost"]
- Neo4j --> Neo4jUse["✅ Large-scale Graph Query
✅ Complex Relation Traversal
✅ Graph Algorithm Support"]
- Redis --> RedisUse["✅ Celery Task Queue
✅ LLM Call Cache
✅ Millisecond Access"]
-
- style Data fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
- style Choice fill:#fff59d,stroke:#fbc02d,stroke-width:3px
- style GraphChoice fill:#fff59d,stroke:#fbc02d,stroke-width:2px
- style PG fill:#bbdefb,stroke:#1976d2,stroke-width:2px
- style PG2 fill:#c8e6c9,stroke:#388e3c,stroke-width:3px
- style Qdrant fill:#c5e1a5,stroke:#689f38,stroke-width:2px
- style ES fill:#ffccbc,stroke:#e64a19,stroke-width:2px
- style MinIO fill:#e1bee7,stroke:#8e24aa,stroke-width:2px
- style Neo4j fill:#f8bbd0,stroke:#c2185b,stroke-width:2px
- style Redis fill:#ffecb3,stroke:#ffa000,stroke-width:2px
-```
-
-### 5.2 Data Flow
-
-Different data flows to different storage systems:
-
-```mermaid
-flowchart LR
- Doc[Upload Document] --> Parser[Parser]
- Parser --> |Raw Files| MinIO[(MinIO)]
- Parser --> |Document Metadata| PG1[(PostgreSQL)]
- Parser --> |Document Chunks| Chunks[Chunking]
-
- Chunks --> |Generate Vectors| Embed[Embedding]
- Embed --> Qdrant[(Qdrant)]
-
- Chunks --> |Text Content| ES[(Elasticsearch)]
-
- Chunks --> |Extract Entity Relations| Graph[Graph Construction]
- Graph --> |Small Scale| PG2[(PostgreSQL)]
- Graph --> |Large Scale| Neo4j[(Neo4j)]
-
- PG1 -.Metadata.-> Cache
- Cache -.Cache.-> Redis[(Redis)]
-
- style Doc fill:#e1f5ff
- style MinIO fill:#e1bee7
- style PG1 fill:#bbdefb
- style PG2 fill:#bbdefb
- style Qdrant fill:#c5e1a5
- style ES fill:#ffccbc
- style Neo4j fill:#f8bbd0
- style Redis fill:#ffecb3
-```
-
-### 5.3 Core Storage Systems
-
-**PostgreSQL** (Primary Database)
-
-Storage Content:
-- User info, permissions, configurations
-- Collection (knowledge base) metadata
-- Document metadata and index status
-- Conversation history
-- Small-scale knowledge graphs (< 100K entities)
-
-Why Choose:
-- Strong transaction support, ensures data consistency
-- Mature and stable, low operational cost
-- pgvector extension, supports vector storage
-- Can handle small-scale graph data without extra graph database
-
-**Qdrant** (Vector Database)
-
-Storage Content:
-- Document chunk embedding vectors
-- Entity and relation vector representations
-- Multimodal vectors for images
-
-Why Choose:
-- Optimized specifically for vector retrieval, fast
-- Supports filter conditions, can combine with metadata filtering
-- Supports cluster deployment, horizontally scalable
-
-**Elasticsearch** (Full-text Search)
-
-Storage Content:
-- Document chunk text content
-- Supports Chinese tokenization (IK Analyzer)
-
-Why Choose:
-- BM25 algorithm works well for keyword search
-- Supports complex queries and aggregations
-- Built-in highlighting
-
-**MinIO** (Object Storage)
-
-Storage Content:
-- Raw document files (PDF, Word, etc.)
-- Intermediate results after parsing
-- Temporary uploaded files
-
-Why Choose:
-- S3 protocol compatible, can replace with cloud storage
-- Low storage cost
-- Supports large files
-
-**Graph Database Choice: PostgreSQL vs Neo4j**
-
-ApeRAG supports two graph database solutions:
-
-**PostgreSQL** (Default, Recommended for Small Scale)
-
-Storage Content:
-- Knowledge graphs (< 100K entities)
-- Graph node and edge relationship data
-
-Recommendation Reasons:
-- No extra deployment, lower operational cost
-- Performance sufficient for most scenarios
-- Complete transaction support, data consistency guaranteed
-- Can share database with other business data
-
-**Neo4j** (Optional, for Large Scale)
-
-Storage Content:
-- Large-scale knowledge graphs (> 1M entities)
-
-When Needed:
-- Entity count exceeds 100K, PostgreSQL query performance degrades
-- Need complex graph traversal queries (multi-hop relations)
-- Need to use graph algorithms (PageRank, community detection, etc.)
-
-**Summary**: For most enterprise applications, PostgreSQL is completely sufficient. Only consider Neo4j when knowledge graph scale is very large.
-
-**Redis** (Cache and Queue)
-
-Storage Content:
-- Celery task queue
-- LLM call cache
-- User session cache
-
-Why Choose:
-- Extremely fast, suitable for high-frequency access
-- Supports multiple data structures
-- Can serve as task queue Broker
-
-## 6. Technical Highlights
-
-### 6.1 Stateless LightRAG Refactoring
-
-**Background Problem**:
-
-Original LightRAG uses global state, all tasks share one instance. This causes data confusion and concurrency conflicts in multi-user, multi-Collection scenarios.
-
-**ApeRAG's Solution**:
-
-- Each task creates independent LightRAG instance
-- Isolates different Collection data through `workspace` parameter
-- Entity naming convention: `entity:{name}:{workspace}`
-- Relation naming convention: `relationship:{src}:{tgt}:{workspace}`
-
-This way, different users' graph data won't interfere with each other, truly achieving multi-tenant isolation.
-
-### 6.2 Dual-Chain Async Architecture
-
-**Traditional Approach Problem**:
-
-After user uploads document, API needs to wait for parsing and index building to complete before returning, possibly taking several minutes or longer.
-
-**Dual-Chain Architecture Advantages**:
-
-- **Frontend Chain**: API only writes state to database, returns within 100ms
-- **Backend Chain**: Reconciler periodically detects state changes, schedules async tasks
-- **Version Control**: Implements idempotency through version and observed_version
-- **Auto Retry**: Automatically retries failed tasks, ensures eventual consistency
-
-This design is inspired by Kubernetes' Reconciler pattern, very suitable for handling long-running tasks.
-
-### 6.3 Connected Components Concurrency Optimization
-
-**Problem**:
-
-During knowledge graph construction, similar entities need merging. Serial processing is slow. Full parallelization has lock contention issues.
-
-**Solution**:
-
-Use connected components algorithm to divide graph into multiple independent subgraphs:
-
-1. Build entity-relation adjacency list
-2. BFS traversal to find all connected components
-3. Different components have no connections, can be fully processed in parallel
-4. Same component processed serially internally (avoid conflicts)
-
-**Results**:
-
-- 2-3x performance improvement
-- Zero lock contention
-- Best results for diverse document collections
-
-### 6.4 Provider Abstraction Pattern
-
-ApeRAG supports 100+ LLM providers (OpenAI, Claude, Gemini, domestic LLMs, etc.). How to manage uniformly?
-
-**Design Approach**:
-
-- Define unified Provider interface
-- Each provider implements its own Provider
-- Adapt through LiteLLM library
-
-This way, switching models only requires config change, no code change. Same pattern also applies to:
-- Embedding Service (supports multiple vector models)
-- Rerank Service (supports multiple reranking models)
-- Web Search Service (DuckDuckGo, JINA, etc.)
-
-### 6.5 Multimodal Index Support
-
-Besides text, ApeRAG can also handle images:
-
-**Vision Index's Two Paths**:
-
-1. **Pure Visual Vectors**: Use multimodal models (like CLIP) to directly generate image vectors
-2. **Vision to Text**: Use VLM to generate image descriptions + OCR to recognize text → text vectorization
-
-**Fusion Strategy**:
-
-- Text and visual retrieval results sorted separately
-- Unified scoring through Rerank model
-- Final merged display
-
-## 7. Summary
-
-ApeRAG achieves production-grade RAG capabilities through the following design:
-
-**Core Advantages**:
-- **Powerful Document Processing**: Supports multiple formats, complex layouts, tables and formulas
-- **Knowledge Graph Fusion**: Not just vector matching, but understanding knowledge relationships
-- **Multiple Retrieval Methods**: Vector, full-text, and graph working together
-- **Async Architecture**: Fast response, background processing, good user experience
-- **Production-Grade Design**: Multi-storage, high concurrency, easy to scale
-
-**Technical Innovations**:
-- Stateless LightRAG, true multi-tenant support
-- Dual-chain async architecture, API response < 100ms
-- Connected components concurrency optimization, 2-3x faster graph construction
-- Provider abstraction, supports 100+ LLMs
-
-**Use Cases**:
-- Enterprise knowledge base search
-- Technical documentation Q&A
-- Customer service bots
-- Research paper analysis
-- Any scenario requiring document understanding and intelligent Q&A
-
-The system's design philosophy is: **Make complex things simple, make simple things automatic**. Users just need to upload documents, everything else is handled automatically by ApeRAG.
diff --git a/docs/en-US/design/authentication.md b/docs/en-US/design/authentication.md
deleted file mode 100644
index 1eda4c3e5..000000000
--- a/docs/en-US/design/authentication.md
+++ /dev/null
@@ -1,502 +0,0 @@
-# ApeRAG Authentication System Architecture Documentation
-
-## Overview
-
-ApeRAG adopts a Cookie-based authentication system that supports local username/password authentication and OAuth2 social login (GitHub, Google). The system is built on the FastAPI-Users library, providing complete user management and authentication functionality.
-
-## Core Architecture
-
-### Technology Stack
-- **Backend**: FastAPI + FastAPI-Users + SQLAlchemy + PostgreSQL
-- **Frontend**: React + TypeScript + Ant Design + UmiJS
-- **Authentication**: JWT + HttpOnly Cookie + OAuth 2.0
-- **Security**: bcrypt password encryption + CSRF protection
-
-### Authentication Methods
-1. **Local Authentication**: Username/password login
-2. **OAuth Social Login**: GitHub and Google third-party login
-3. **API Key Authentication**: For programmatic access
-
-## Data Models
-
-### User Table (User)
-```python
-class User(Base):
- id: str # User ID
- username: str # Username (unique)
- email: str # Email (unique)
- hashed_password: str # bcrypt encrypted password
- role: Role # User role (ADMIN/RW/RO)
- is_active: bool # Is active
- is_verified: bool # Is verified
- date_joined: datetime # Registration time
-```
-
-### OAuth Account Table (OAuthAccount)
-```python
-class OAuthAccount(Base):
- id: str # OAuth account ID
- user_id: str # Associated user ID
- oauth_name: str # OAuth provider name
- account_id: str # Third-party account ID
- account_email: str # Third-party account email
- access_token: str # Access token
-```
-
-### API Key Table (ApiKey)
-```python
-class ApiKey(Base):
- id: str # API key ID
- key: str # API key value
- user: str # Associated user ID
- description: str # Description
- status: ApiKeyStatus # Status (ACTIVE/DELETED)
- is_system: bool # Is system generated
- last_used_at: datetime # Last used time
-```
-
-## Authentication Flow
-
-### 1. Local Authentication Flow
-
-```mermaid
-sequenceDiagram
- participant U as User
- participant F as Frontend
- participant B as Backend
- participant D as Database
-
- U->>F: Enter username/password
- F->>B: POST /api/v1/login
- B->>D: Verify user credentials
- D-->>B: Return user information
- B->>B: Generate JWT token
- B->>F: Set HttpOnly Cookie
- F->>U: Redirect to main page
-```
-
-### 2. OAuth Authentication Flow
-
-```mermaid
-sequenceDiagram
- participant U as User
- participant F as Frontend
- participant B as Backend
- participant G as GitHub/Google
- participant D as Database
-
- U->>F: Click OAuth login
- F->>B: GET /api/v1/auth/{provider}/authorize
- B->>G: Redirect to OAuth authorization page
- G->>U: Show authorization page
- U->>G: Confirm authorization
- G->>F: Redirect to callback URL (with code)
- F->>B: GET /api/v1/auth/{provider}/callback
- B->>G: Exchange code for access_token
- G-->>B: Return user information
- B->>D: Find or create user account
- B->>F: Set authentication Cookie (204 No Content)
- F->>U: Redirect to main page
-```
-
-Reference: https://github.com/fastapi-users/fastapi-users/issues/434
-
-#### OAuth API Description
-
-OAuth authentication involves two key APIs that are automatically generated by FastAPI-Users:
-
-1. **Authorization Endpoint** (`/api/v1/auth/{provider}/authorize`)
- - Generates OAuth authorization URL
- - Includes state parameter to prevent CSRF attacks
- - Redirects user to third-party OAuth provider
-
-2. **Callback Endpoint** (`/api/v1/auth/{provider}/callback`)
- - Handles OAuth provider callback requests
- - Exchanges authorization code for access token
- - Retrieves user information and creates/logs in user
- - Sets authentication Cookie and returns 204 No Content
-
-## Core Components
-
-### 1. FastAPI-Users Configuration
-
-#### JWT Strategy
-```python
-COOKIE_MAX_AGE = 86400 # 24 hours
-
-def get_jwt_strategy() -> JWTStrategy:
- return JWTStrategy(secret=settings.jwt_secret, lifetime_seconds=COOKIE_MAX_AGE)
-```
-
-#### Cookie Transport
-```python
-cookie_transport = CookieTransport(
- cookie_name="session",
- cookie_max_age=COOKIE_MAX_AGE,
- cookie_secure=False, # Set to False in development
- cookie_httponly=True, # Prevent XSS attacks
- cookie_samesite="lax" # Prevent CSRF attacks
-)
-```
-
-#### Authentication Backend
-```python
-auth_backend = AuthenticationBackend(
- name="cookie",
- transport=cookie_transport,
- get_strategy=get_jwt_strategy,
-)
-```
-
-### 2. User Manager
-```python
-class UserManager(BaseUserManager[User, str]):
- async def on_after_register(self, user: User, request: Optional[Request] = None):
- # Set first registered user as admin
- user_count = await async_db_ops.query_user_count()
- if user_count == 1 and user.role != Role.ADMIN:
- user.role = Role.ADMIN
-```
-
-### 3. OAuth Client Configuration
-```python
-# GitHub OAuth
-if is_github_oauth_enabled():
- github_oauth_client = GitHubOAuth2(
- settings.github_oauth_client_id,
- settings.github_oauth_client_secret
- )
- github_oauth_router = get_oauth_router(
- github_oauth_client,
- auth_backend,
- get_user_manager,
- settings.jwt_secret,
- redirect_url=settings.oauth_redirect_url, # Callback URL configuration
- associate_by_email=True, # Associate accounts by email
- is_verified_by_default=True, # Verify users by default
- )
-```
-
-#### OAuth Route Generation
-FastAPI-Users' `get_oauth_router` function automatically generates the following routes:
-- `GET /auth/{provider}/authorize` - Get authorization URL
-- `GET /auth/{provider}/callback` - Handle OAuth callback
-
-## API Interfaces
-
-### Authentication Related Interfaces
-
-#### 1. Get Configuration Information
-```http
-GET /api/v1/config
-```
-**Response**: Configuration information containing available login methods
-
-#### 2. Local Login
-```http
-POST /api/v1/login
-Content-Type: application/json
-
-{
- "username": "user@example.com",
- "password": "password123"
-}
-```
-**Response**: User information + set session cookie
-
-#### 3. User Registration
-```http
-POST /api/v1/register
-Content-Type: application/json
-
-{
- "username": "newuser",
- "email": "user@example.com",
- "password": "password123",
- "token": "invitation_token" // Required for invitation mode
-}
-```
-
-#### 4. Logout
-```http
-POST /api/v1/logout
-```
-**Response**: Clear session cookie
-
-#### 5. Get Current User
-```http
-GET /api/v1/user
-Cookie: session=jwt_token
-```
-
-#### 6. Change Password
-```http
-POST /api/v1/change-password
-Content-Type: application/json
-
-{
- "username": "user@example.com",
- "old_password": "old_password",
- "new_password": "new_password"
-}
-```
-
-### OAuth Interfaces
-
-#### 1. OAuth Authorization
-```http
-GET /api/v1/auth/{provider}/authorize
-```
-**Response**:
-```json
-{
- "authorization_url": "https://github.com/login/oauth/authorize?..."
-}
-```
-
-#### 2. OAuth Callback
-```http
-GET /api/v1/auth/{provider}/callback?code=xxx&state=yyy
-```
-**Response**: 204 No Content + set authentication Cookie
-
-**Note**: These two OAuth APIs are automatically generated by FastAPI-Users and do not need manual implementation.
-
-### User Management Interfaces
-
-#### 1. List Users
-```http
-GET /api/v1/users
-```
-
-#### 2. Delete User
-```http
-DELETE /api/v1/users/{user_id}
-```
-
-## Frontend Implementation
-
-### 1. Login Page (`signin.tsx`)
-
-#### Core Features
-- Dynamically get available login methods
-- Local login form handling
-- OAuth login button handling
-
-#### OAuth Login Implementation
-```typescript
-// GitHub login
-onClick={async () => {
- try {
- localStorage.setItem('oauth_provider', 'github');
- const response = await fetch('/api/v1/auth/github/authorize');
- const data = await response.json();
- if (data.authorization_url) {
- window.location.href = data.authorization_url;
- }
- } catch (error) {
- console.error('GitHub OAuth error:', error);
- }
-}}
-```
-
-### 2. OAuth Callback Page (`oauth-callback.tsx`)
-
-#### Core Features
-- Parse URL parameters (code, state, etc.)
-- Determine OAuth provider
-- Call backend callback interface
-- Handle authentication results
-
-#### Implementation Logic
-```typescript
-const handleOAuth = async () => {
- // Get OAuth parameters
- const code = searchParams.get('code');
- const state = searchParams.get('state');
-
- // Determine provider
- let provider = localStorage.getItem('oauth_provider') || 'github';
-
- // Call callback interface
- const callbackUrl = `/api/v1/auth/${provider}/callback?code=${code}&state=${state}`;
- const response = await fetch(callbackUrl, {
- method: 'GET',
- credentials: 'include',
- });
-
- // Handle response
- if (response.status === 204) {
- navigate('/'); // Authentication successful
- }
-};
-```
-
-## Authentication Middleware
-
-### 1. Current User Retrieval
-```python
-async def current_user(
- request: Request,
- session: AsyncSessionDep,
- user: User = Depends(fastapi_users.current_user(optional=True))
-) -> Optional[User]:
- # Prioritize JWT/Cookie authentication
- if user:
- return user
-
- # Fallback to API Key authentication
- api_user = await authenticate_api_key(request, session)
- if api_user:
- return api_user
-
- return None
-```
-
-### 2. API Key Authentication
-```python
-async def authenticate_api_key(request: Request, session: AsyncSessionDep) -> Optional[User]:
- authorization = request.headers.get("Authorization")
- if not authorization or not authorization.startswith("Bearer "):
- return None
-
- api_key = authorization.split(" ")[1]
- # Find and validate API Key
- # Update last used time
- # Return associated user
-```
-
-## Configuration
-
-### Environment Variables
-```bash
-# JWT secret
-JWT_SECRET=your-super-secret-key
-
-# OAuth callback URL
-OAUTH_REDIRECT_URL=http://127.0.0.1:3000/web/oauth-callback
-
-# GitHub OAuth
-GITHUB_OAUTH_CLIENT_ID=your-github-client-id
-GITHUB_OAUTH_CLIENT_SECRET=your-github-client-secret
-
-# Google OAuth
-GOOGLE_OAUTH_CLIENT_ID=your-google-client-id
-GOOGLE_OAUTH_CLIENT_SECRET=your-google-client-secret
-
-# Registration mode
-REGISTER_MODE=invitation # unlimited/invitation
-```
-
-### OAuth Application Configuration
-
-OAuth providers need to configure the following key information:
-
-#### Required Configuration Items
-- **Client ID**: OAuth application client identifier
-- **Client Secret**: OAuth application client secret
-- **Callback URL**: Callback address after OAuth authorization completion
-- **Scopes**: Requested permission scope (usually includes user basic information and email)
-
-#### GitHub OAuth Application Configuration
-1. Visit [GitHub Developer Settings](https://github.com/settings/developers)
-2. Click "New OAuth App" to create a new application
-3. Fill in application information:
- - **Application name**: ApeRAG
- - **Homepage URL**: `http://127.0.0.1:3000`
- - **Authorization callback URL**: `http://127.0.0.1:3000/web/oauth-callback`
-4. Get Client ID and Client Secret after creation
-5. Default permission scope: `user:email` (get user basic information and email)
-
-#### Google OAuth Application Configuration
-1. Visit [Google Cloud Console](https://console.cloud.google.com/)
-2. Create a project or select an existing project
-3. Enable Google+ API or Google People API
-4. Create OAuth 2.0 client ID:
- - Application type: Web application
- - Authorized redirect URI: `http://127.0.0.1:3000/web/oauth-callback`
-5. Get client ID and client secret
-6. Default permission scope: `openid email profile` (get user basic information)
-
-#### Callback URL Description
-- Callback URL must match exactly with OAuth application configuration
-- Development environment: `http://127.0.0.1:3000/web/oauth-callback`
-- Production environment: `https://yourdomain.com/web/oauth-callback`
-- ApeRAG frontend BASE_PATH is `/web`, so callback URL includes this prefix
-
-## Security Features
-
-### 1. JWT Token Security
-- Strong key signing (HMAC-SHA256)
-- 24-hour validity period
-- HttpOnly Cookie transmission, prevents XSS
-- SameSite=Lax, prevents CSRF
-
-### 2. Password Security
-- bcrypt encryption storage
-- Random salt values
-- Password strength validation
-
-### 3. OAuth Security
-- State parameter prevents CSRF
-- Standard authorization code flow
-- Secure token storage
-
-### 4. API Key Security
-- Random generation (sk- prefix)
-- Usage tracking
-- Status management
-
-## Permission Control
-
-### User Roles
-- **ADMIN**: Administrator, has all permissions
-- **RW**: Read-write user, can create and modify resources
-- **RO**: Read-only user, can only view resources
-
-### Permission Check
-```python
-async def get_current_admin(user: User = Depends(get_current_active_user)) -> User:
- if user.role != Role.ADMIN:
- raise HTTPException(status_code=403, detail="Only admin members can perform this action")
- return user
-```
-
-## Registration Modes
-
-### 1. Open Registration (unlimited)
-- Anyone can register directly
-- First registered user automatically becomes administrator
-
-### 2. Invitation Registration (invitation)
-- Requires administrator to send invitation
-- Validated through invitation token
-
-## Troubleshooting
-
-### Common Issues
-
-#### 1. OAuth Callback Failure
-- Check if callback URL configuration matches
-- Verify OAuth application configuration
-- Check browser console logs
-
-#### 2. Cookie Authentication Failure
-- Check JWT_SECRET configuration
-- Verify Cookie domain settings
-- Confirm browser Cookie policy
-
-#### 3. API Key Authentication Failure
-- Verify Authorization header format: `Bearer sk-xxx`
-- Check API Key status
-- Confirm user association relationship
-
-### Debugging Methods
-1. Check backend logs: `tail -f logs/aperag.log`
-2. Check browser developer tools
-3. Verify database user and OAuth account data
-4. Test API interface responses
-
-## Summary
-
-The ApeRAG authentication system is built on FastAPI-Users, providing secure and flexible multiple authentication methods. The system supports local authentication and OAuth social login, adopts JWT+Cookie stateless authentication mechanism, and has good security and scalability. Through reasonable permission control and registration mode configuration, it can meet the usage requirements of different scenarios.
diff --git a/docs/en-US/design/chat_history_design.md b/docs/en-US/design/chat_history_design.md
deleted file mode 100644
index b1ef264cb..000000000
--- a/docs/en-US/design/chat_history_design.md
+++ /dev/null
@@ -1,584 +0,0 @@
-# ApeRAG Chat History Message Data Flow
-
-## Overview
-
-This document details the complete data flow of chat history messages in the ApeRAG project, covering the full-stack implementation from frontend API calls to backend storage.
-
-**Core API**: `GET /api/v1/bots/{bot_id}/chats/{chat_id}`
-
-## Data Flow Diagram
-
-```
-┌─────────────────┐
-│ Frontend │
-│ (Next.js) │
-└────────┬────────┘
- │ GET /api/v1/bots/{bot_id}/chats/{chat_id}
- ▼
-┌─────────────────────────────────────────────┐
-│ View Layer │
-│ aperag/views/chat.py │
-│ - get_chat_view() │
-│ - JWT Authentication │
-│ - Parameter Validation │
-└────────┬────────────────────────────────────┘
- │ chat_service_global.get_chat()
- ▼
-┌─────────────────────────────────────────────┐
-│ Service Layer │
-│ aperag/service/chat_service.py │
-│ - get_chat() │
-│ - Business Logic Orchestration │
-└────────┬────────────────────────────────────┘
- │
- ├──────────────┬─────────────┐
- │ │ │
- ▼ ▼ ▼
-┌────────────┐ ┌───────────┐ ┌──────────────┐
-│ PostgreSQL │ │ Redis │ │ PostgreSQL │
-│ chat table │ │ Message │ │feedback table│
-│ (Metadata) │ │ History │ │(User Feedback)│
-└────────────┘ └───────────┘ └──────────────┘
- │ │ │
- └──────────────┴──────────────────┘
- │
- ▼
- ┌──────────────┐
- │ ChatDetails │
- │ (Assemble) │
- └──────────────┘
-```
-
-## Detailed Flow
-
-### 1. View Layer - HTTP Request Handling
-
-**File**: `aperag/views/chat.py`
-
-```python
-@router.get("/bots/{bot_id}/chats/{chat_id}")
-async def get_chat_view(
- request: Request,
- bot_id: str,
- chat_id: str,
- user: User = Depends(required_user)
-) -> view_models.ChatDetails:
- return await chat_service_global.get_chat(str(user.id), bot_id, chat_id)
-```
-
-**Responsibilities**:
-- Receive HTTP GET requests
-- JWT Token authentication
-- Extract path parameters (bot_id, chat_id)
-- Call Service layer
-- Return `ChatDetails` response
-
-### 2. Service Layer - Business Logic Orchestration
-
-**File**: `aperag/service/chat_service.py`
-
-```python
-async def get_chat(self, user: str, bot_id: str, chat_id: str) -> view_models.ChatDetails:
- from aperag.utils.history import query_chat_messages
-
- # Step 1: Query Chat metadata from PostgreSQL
- chat = await self.db_ops.query_chat(user, bot_id, chat_id)
- if chat is None:
- raise ChatNotFoundException(chat_id)
-
- # Step 2: Query message history from Redis
- messages = await query_chat_messages(user, chat_id)
-
- # Step 3: Build response object (messages already include feedback info)
- chat_obj = self.build_chat_response(chat)
- return ChatDetails(**chat_obj.model_dump(), history=messages)
-```
-
-**Core Logic**:
-
-1. **Query Chat Metadata** (PostgreSQL)
-2. **Query Message History** (Redis + PostgreSQL feedback)
-3. **Assemble Complete Response**
-
-### 3. Data Storage Layer
-
-#### 3.1 PostgreSQL - Chat Metadata
-
-**Table**: `chat`
-
-**File**: `aperag/db/models.py`
-
-```python
-class Chat(Base):
- __tablename__ = "chat"
-
- id = Column(String(24), primary_key=True) # chat_xxxx
- user = Column(String(256), nullable=False) # User ID
- bot_id = Column(String(24), nullable=False) # Bot ID
- title = Column(String(256)) # Chat title
- peer_type = Column(EnumColumn(ChatPeerType)) # Peer type
- peer_id = Column(String(256)) # Peer ID
- status = Column(EnumColumn(ChatStatus)) # Status
- gmt_created = Column(DateTime(timezone=True)) # Created time
- gmt_updated = Column(DateTime(timezone=True)) # Updated time
- gmt_deleted = Column(DateTime(timezone=True)) # Deleted time (soft delete)
-```
-
-**Purpose**: Store Chat session metadata without actual message content
-
-#### 3.2 Redis - Message History
-
-**File**: `aperag/utils/history.py`
-
-**Key Format**: `message_store:{chat_id}`
-
-**Data Structure**: Redis List (using LPUSH, newest messages first)
-
-**Core Class**:
-
-```python
-class RedisChatMessageHistory:
- def __init__(self, session_id: str, key_prefix: str = "message_store:"):
- self.session_id = session_id
- self.key_prefix = key_prefix
-
- @property
- def key(self) -> str:
- return self.key_prefix + self.session_id # message_store:chat_abc123
-
- @property
- async def messages(self) -> List[StoredChatMessage]:
- # Read all messages from Redis
- _items = await self.redis_client.lrange(self.key, 0, -1)
- # Reverse to chronological order (LPUSH puts newest first)
- items = [json.loads(m.decode("utf-8")) for m in _items[::-1]]
- return [storage_dict_to_message(item) for item in items]
-```
-
-**Message Query Function**:
-
-```python
-async def query_chat_messages(user: str, chat_id: str):
- """Query chat messages and convert to frontend format"""
-
- # 1. Get message history from Redis
- chat_history = RedisChatMessageHistory(chat_id, redis_client=get_async_redis_client())
- stored_messages = await chat_history.messages
-
- if not stored_messages:
- return []
-
- # 2. Get feedback info from PostgreSQL
- feedbacks = await async_db_ops.query_chat_feedbacks(user, chat_id)
- feedback_map = {feedback.message_id: feedback for feedback in feedbacks}
-
- # 3. Convert to frontend format and attach feedback info
- result = []
- for stored_message in stored_messages:
- # Convert to frontend format
- chat_message_list = stored_message.to_frontend_format()
-
- # Add feedback data for AI messages
- for chat_msg in chat_message_list:
- feedback = feedback_map.get(chat_msg.id)
- if feedback and chat_msg.role == "ai":
- chat_msg.feedback = Feedback(
- type=feedback.type,
- tag=feedback.tag,
- message=feedback.message
- )
-
- result.append(chat_message_list)
-
- return result # [[message1_parts], [message2_parts], [message3_parts], ...]
-```
-
-#### 3.3 PostgreSQL - User Feedback
-
-**Table**: `message_feedback`
-
-```python
-class MessageFeedback(Base):
- __tablename__ = "message_feedback"
-
- user = Column(String(256), nullable=False) # User ID
- chat_id = Column(String(24), primary_key=True) # Chat ID
- message_id = Column(String(256), primary_key=True) # Message ID
- type = Column(EnumColumn(MessageFeedbackType)) # like/dislike
- tag = Column(EnumColumn(MessageFeedbackTag)) # Feedback tag
- message = Column(Text) # Feedback content
- question = Column(Text) # Original question
- original_answer = Column(Text) # Original answer
- status = Column(EnumColumn(MessageFeedbackStatus)) # Status
- gmt_created = Column(DateTime(timezone=True))
- gmt_updated = Column(DateTime(timezone=True))
-```
-
-**Purpose**: Store user feedback on AI responses (like/dislike) for quality monitoring and model optimization
-
-## Data Format Specification
-
-### Storage Format (Redis)
-
-Messages in Redis are stored in JSON format using **Part-Based Design**:
-
-#### StoredChatMessage - A Complete Message
-
-```python
-class StoredChatMessage(BaseModel):
- """A complete message (either a user message or an AI message)"""
- parts: List[StoredChatMessagePart] # Multiple parts of the message
- files: List[Dict[str, Any]] # Associated uploaded files
-```
-
-#### StoredChatMessagePart - A Message Part
-
-```python
-class StoredChatMessagePart(BaseModel):
- """A single part of a message (atomic unit)"""
-
- # Identification
- chat_id: str # Chat session ID
- message_id: str # Message ID (shared by multiple parts of the same message)
- part_id: str # Unique part ID
- timestamp: float # Generation timestamp
-
- # Content Classification
- type: Literal["message", "tool_call_result", "thinking", "references"]
- role: Literal["human", "ai", "system"]
- content: str
-
- # Extended Fields
- references: List[Dict] # Document references
- urls: List[str] # URL references
- metadata: Optional[Dict] # Additional metadata
-```
-
-#### Part Type Descriptions
-
-| Type | Description | Included in LLM Context |
-|------|-------------|------------------------|
-| `message` | Main conversation content | ✅ Yes |
-| `tool_call_result` | Tool calling process | ❌ No (display only) |
-| `thinking` | AI thinking process | ❌ No (display only) |
-| `references` | Document references and links | ❌ No (display only) |
-
-**Design Rationale**: A single AI response contains multiple stages (tool calling, thinking, answering, references), and these contents are generated sequentially and interleaved. A single field cannot express this. User messages typically have only 1 part (type="message"), but also support multiple parts to maintain structural consistency.
-
-#### Redis Storage Example
-
-**User Message**:
-```json
-{
- "parts": [
- {
- "chat_id": "chat_abc123",
- "message_id": "uuid-1",
- "part_id": "uuid-part-1",
- "timestamp": 1699999999.0,
- "type": "message",
- "role": "human",
- "content": "What is LightRAG?",
- "references": [],
- "urls": [],
- "metadata": null
- }
- ],
- "files": []
-}
-```
-
-**AI Response (with multiple parts)**:
-```json
-{
- "parts": [
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-2",
- "type": "tool_call_result",
- "role": "ai",
- "content": "Searching knowledge base...",
- "timestamp": 1699999999.1
- },
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-3",
- "type": "message",
- "role": "ai",
- "content": "LightRAG is a lightweight RAG framework, deeply modified by the ApeCloud team...",
- "timestamp": 1699999999.5
- },
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-4",
- "type": "references",
- "role": "ai",
- "content": "",
- "references": [
- {
- "score": 0.95,
- "text": "LightRAG architecture description...",
- "metadata": {"source": "lightrag_doc.pdf", "page": 3}
- }
- ],
- "urls": ["https://github.com/HKUDS/LightRAG"],
- "timestamp": 1699999999.6
- }
- ],
- "files": []
-}
-```
-
-### API Response Format
-
-**ChatDetails Schema** (FastAPI/Pydantic code-first schema):
-
-```yaml
-chatDetails:
- type: object
- properties:
- id: string # chat_abc123
- title: string # Chat title
- bot_id: string # bot_xyz
- peer_id: string
- peer_type: string # system/feishu/weixin/web
- status: string # active/archived
- created: string # ISO 8601
- updated: string # ISO 8601
- history: # 2D array
- type: array
- description: Conversation history, each element is a message
- items:
- type: array
- description: A message contains multiple parts (tool calls, thinking, answer, references, etc.)
- items:
- $ref: '#/chatMessage'
-```
-
-**ChatMessage Schema**:
-
-```yaml
-chatMessage:
- type: object
- properties:
- id: string # message_id (same for one turn)
- part_id: string # part_id (unique for each part)
- type: string # message/tool_call_result/thinking/references
- timestamp: number # Unix timestamp
- role: string # human/ai
- data: string # Message content
- references: # Document references (optional)
- type: array
- items:
- type: object
- properties:
- score: number
- text: string
- metadata: object
- urls: # URL references (optional)
- type: array
- items:
- type: string
- feedback: # User feedback (optional)
- type: object
- properties:
- type: string # like/dislike
- tag: string
- message: string
- files: # Associated files (optional)
- type: array
-```
-
-### Frontend Response Example
-
-```json
-{
- "id": "chat_abc123",
- "title": "Discussion about LightRAG",
- "bot_id": "bot_xyz",
- "status": "active",
- "created": "2025-01-01T00:00:00Z",
- "updated": "2025-01-01T01:00:00Z",
- "history": [
- [
- {
- "id": "uuid-1",
- "part_id": "uuid-part-1",
- "type": "message",
- "timestamp": 1699999999.0,
- "role": "human",
- "data": "What is LightRAG?",
- "files": []
- }
- ],
- [
- {
- "id": "uuid-2",
- "part_id": "uuid-part-2",
- "type": "tool_call_result",
- "timestamp": 1699999999.1,
- "role": "ai",
- "data": "Searching knowledge base...",
- "files": []
- },
- {
- "id": "uuid-2",
- "part_id": "uuid-part-3",
- "type": "message",
- "timestamp": 1699999999.5,
- "role": "ai",
- "data": "LightRAG is a lightweight RAG framework...",
- "files": []
- },
- {
- "id": "uuid-2",
- "part_id": "uuid-part-4",
- "type": "references",
- "timestamp": 1699999999.6,
- "role": "ai",
- "data": "",
- "references": [
- {
- "score": 0.95,
- "text": "LightRAG architecture description...",
- "metadata": {"source": "lightrag_doc.pdf"}
- }
- ],
- "urls": ["https://github.com/HKUDS/LightRAG"],
- "files": []
- }
- ]
- ]
-}
-```
-
-**Note**: `history` is a 2D array. The first dimension is the message sequence (in chronological order), and the second dimension is the multiple parts of that message. For example:
-- `history[0]` = Parts of user's 1st message (usually only 1 part)
-- `history[1]` = Parts of AI's 1st response (may have multiple parts: tool calls, thinking, answer, references)
-- `history[2]` = Parts of user's 2nd message
-- `history[3]` = Parts of AI's 2nd response
-- ...
-
-## Message Write Flow
-
-### Agent Runtime Write Path
-
-The legacy WebSocket chat endpoint `WS /api/v1/bots/{bot_id}/chats/{chat_id}/connect` has been retired.
-Current agent chat writes now go through the v2 turn/timeline APIs and SSE event stream. Keep the history
-schema above as background, but do not use this document to implement new WebSocket chat clients.
-
-## Design Features
-
-### 1. Hybrid Storage Architecture
-
-| Storage | Content | Reason |
-|---------|---------|--------|
-| PostgreSQL | Chat metadata | Persistence, complex queries |
-| Redis | Message history | High-performance read/write, TTL support |
-| PostgreSQL | User feedback | Persistence, for analysis |
-
-**Advantages**:
-- Performance optimization: Message history uses Redis for fast read/write
-- Data persistence: Important metadata stored in PostgreSQL
-- Flexibility: Independent TTL and backup strategy configuration
-
-### 2. Part-Based Message Design
-
-**Core Value**:
-- ✅ Support complex AI response flow (tool calling → thinking → answer → references)
-- ✅ Frontend can render different types of content differently
-- ✅ Complete temporal relationship recording (via timestamp)
-- ✅ Flexible extension (adding new types doesn't require schema changes)
-
-**Why does a single message need multiple parts?**
-
-A single AI response is generated sequentially and interleaved, for example:
-1. 🔍 Part1 (tool_call_result): "Querying database..."
-2. 💭 Part2 (thinking): "Found 327 records..."
-3. 🔍 Part3 (tool_call_result): "Calculating growth rate..."
-4. 💭 Part4 (thinking): "15% QoQ growth..."
-5. 💬 Part5 (message): "Based on data analysis, Q4 performance is excellent..."
-6. 📚 Part6 (references): [Document 1, Document 2]
-
-These 6 parts belong to **one AI message** (sharing the same message_id). A single field cannot express such complex temporal relationships.
-
-### 3. Format Conversion Decoupling
-
-Three format conversions are provided:
-
-```python
-class StoredChatMessage:
- def to_frontend_format(self) -> List[ChatMessage]:
- """Convert to frontend display format"""
- # Include all types of parts
-
- def to_openai_format(self) -> List[Dict]:
- """Convert to LLM call format"""
- # Only include type="message" parts
-
- def get_main_content(self) -> str:
- """Get main answer content"""
- # Content of the first type="message" part
-```
-
-**Advantages**:
-- Internal storage format decoupled from external interfaces
-- Support different consumption scenarios
-- LLM context only includes actual conversation content, not tool calls and thinking processes
-
-### 4. Three-Level ID Design
-
-```python
-chat_id = "chat_abc123" # Session level
-message_id = "uuid-msg-1" # Message level (shared by parts of the same message)
-part_id = "uuid-part-1" # Part level (each part is independent)
-```
-
-**Purpose**:
-- `chat_id`: Identifies a chat session
-- `message_id`: Groups parts of the same message (for frontend display and feedback association)
-- `part_id`: Uniquely identifies each part (for individual operations like copy, reference)
-
-## Performance Considerations
-
-### Redis Optimization
-- **List Data Structure**: LPUSH O(1), LRANGE O(N)
-- **Optional TTL**: Automatic expiration of historical messages
-- **Connection Pool Reuse**: Global Redis client
-
-### PostgreSQL Optimization
-- **Indexes**: user, bot_id, chat_id, status fields
-- **Soft Delete**: Using gmt_deleted
-- **Paginated Queries**: list_chats supports pagination
-
-### Transmission Optimization
-- **WebSocket Streaming**: Generate and send simultaneously
-- **Incremental Updates**: Only transmit new parts
-- **Lazy Loading**: Load historical messages on demand
-
-## Related Files
-
-### Core Implementation
-- `aperag/views/chat.py` - View layer interface
-- `aperag/service/chat_service.py` - Service layer business logic
-- `aperag/utils/history.py` - Redis message history management
-- `aperag/chat/history/message.py` - Message data structures
-- `aperag/db/models.py` - Database models
-- `aperag/db/repositories/chat.py` - Chat database operations
-- `aperag/schema/view_models.py` - Pydantic OpenAPI schema models
-
-### Frontend Implementation
-- `web/src/app/workspace/bots/[botId]/chats/[chatId]/page.tsx` - Chat detail page
-- `web/src/components/chat/chat-messages.tsx` - Message display component
-
-## Summary
-
-ApeRAG's chat history message system adopts **Hybrid Storage + Part-Based Message Design**:
-
-1. **PostgreSQL** stores Chat metadata and feedback (persistence, queryable)
-2. **Redis** stores message history (high performance, expiration support)
-3. **Part-Based Design** supports complex AI response flows (tool calling, thinking, answering, references)
-4. **Three-Level ID Design** supports message grouping and independent operations
-5. **Clear Layered Architecture** (View → Service → Repository → Storage)
-
-This design ensures both performance and support for complex AI interaction scenarios, while maintaining good scalability.
diff --git a/docs/en-US/design/connected_components_optimization.md b/docs/en-US/design/connected_components_optimization.md
deleted file mode 100644
index a6f14a75b..000000000
--- a/docs/en-US/design/connected_components_optimization.md
+++ /dev/null
@@ -1,153 +0,0 @@
-# Connected Components Optimization for LightRAG
-
-## Overview
-
-This document describes the connected components optimization implemented for LightRAG's graph indexing process. This optimization significantly improves concurrency by identifying and processing independent entity groups separately.
-
-## Problem Statement
-
-### Before Optimization
-
-In the original implementation, when processing extracted entities and relationships:
-
-1. **All entities** from a batch were collected into a single set
-2. **One massive MultiLock** was created to lock all entities at once
-3. All processing had to wait for this global lock
-
-This approach had several drawbacks:
-
-- **Poor Concurrency**: If Task A is processing entities about "Technology" and Task B wants to process entities about "History", Task B must wait even though these topics are completely unrelated
-- **Lock Contention**: The more entities in a batch, the higher the chance of lock conflicts
-- **Scalability Issues**: System throughput decreases as the number of concurrent tasks increases
-
-### Example Scenario
-
-```
-Document 1: "AI and Machine Learning are transforming technology..."
-Document 2: "Julius Caesar ruled the Roman Empire..."
-
-Without optimization:
-- Task 1: Lock(AI, ML, Technology, Caesar, Rome, Empire) → Process → Release
-- Task 2: Wait... → Lock(all) → Process → Release
-```
-
-## Solution: Connected Components
-
-### Core Concept
-
-We treat the extracted entities and relationships as a graph and find **connected components** - groups of entities that are connected through relationships. Entities in different components have no relationships between them and can be processed independently.
-
-### Implementation
-
-#### 1. Find Connected Components (`_find_connected_components`)
-
-```python
-def _find_connected_components(self, chunk_results) -> List[List[str]]:
- # Build adjacency list from entities and relationships
- # Use BFS to find all connected components
- # Return list of entity groups
-```
-
-This method:
-- Builds an adjacency list from all extracted entities and relationships
-- Uses Breadth-First Search (BFS) to identify connected components
-- Returns a list where each element is a group of connected entity names
-
-#### 2. Process Entity Groups (`_process_entity_groups`)
-
-```python
-async def _process_entity_groups(self, chunk_results, components, collection_id):
- for component in components:
- # Filter chunk_results for this component
- # Create locks only for entities in this component
- # Process this group independently
-```
-
-This method:
-- Processes each connected component separately
-- Creates locks only for entities within each component
-- Allows parallel processing of unrelated components
-
-#### 3. Updated Graph Indexing (`aprocess_graph_indexing`)
-
-The main processing flow now:
-1. Extracts entities and relationships (unchanged)
-2. Finds connected components
-3. Processes each component with its own lock scope
-
-## Benefits
-
-### 1. Improved Concurrency
-
-```
-With optimization:
-- Task 1: Lock(AI, ML, Technology) → Process → Release
-- Task 2: Lock(Caesar, Rome, Empire) → Process → Release
- ↑ Can run in parallel!
-```
-
-### 2. Reduced Lock Contention
-
-- Smaller lock scopes mean less chance of conflicts
-- Independent topics can be processed simultaneously
-- Better CPU utilization in multi-core systems
-
-### 3. Better Scalability
-
-- System maintains high throughput even with many concurrent tasks
-- Processing time scales with the size of the largest component, not total entities
-- Particularly effective for diverse document collections
-
-## Performance Impact
-
-### Typical Improvements
-
-- **2-3x throughput increase** for diverse document collections
-- **Near-linear scaling** with number of CPU cores for unrelated content
-- **Minimal overhead** from component detection (< 1% of total processing time)
-
-### Best Case Scenarios
-
-- Processing documents from different domains (tech, history, science, etc.)
-- Large collections with many small, unrelated topics
-- Systems with high concurrent document ingestion
-
-### Worst Case Scenarios
-
-- All entities are connected (degrades to original behavior)
-- Very small batches (overhead becomes more noticeable)
-- Documents about a single, highly interconnected topic
-
-## Example Usage
-
-```python
-# The optimization is automatic and transparent
-lightrag = LightRAG(...)
-
-# Process documents - connected components are handled internally
-result = await lightrag.aprocess_graph_indexing(chunks)
-
-# Result includes component information
-print(f"Processed {result['groups_processed']} independent groups")
-```
-
-## Testing
-
-Comprehensive unit tests are provided in `test_lightrag_connected_components.py`:
-
-- Single component (all connected)
-- Multiple components
-- Isolated entities
-- Complex graph structures
-- Edge filtering between components
-
-## Future Enhancements
-
-1. **Dynamic Batching**: Adjust batch sizes based on component characteristics
-2. **Priority Processing**: Process larger/more important components first
-3. **Component Caching**: Cache component structures for similar document patterns
-4. **Metrics and Monitoring**: Track component statistics for optimization insights
-
-## Conclusion
-
-The connected components optimization is a significant improvement to LightRAG's graph indexing process. By identifying and processing independent entity groups separately, we achieve better concurrency, reduced lock contention, and improved scalability - all while maintaining the correctness and consistency of the knowledge graph.
\ No newline at end of file
diff --git a/docs/en-US/design/document_upload_design.md b/docs/en-US/design/document_upload_design.md
deleted file mode 100644
index f90379165..000000000
--- a/docs/en-US/design/document_upload_design.md
+++ /dev/null
@@ -1,1077 +0,0 @@
----
-title: Document Upload Design
-position: 3
----
-
-# ApeRAG Document Upload Architecture Design
-
-## Overview
-
-This document details the complete architecture design of the document upload module in the ApeRAG project, covering the full pipeline from file upload, temporary storage, document parsing, format conversion to final index construction.
-
-**Core Design Philosophy**: Adopts a **two-phase commit** pattern, separating file upload (temporary storage) from document confirmation (formal addition), providing better user experience and resource management capabilities.
-
-## System Architecture
-
-### Overall Architecture
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Frontend │
-│ (Next.js) │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ Step 1: Upload │ Step 2: Confirm
- │ POST /documents/upload │ POST /documents/confirm
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ View Layer: aperag/views/collections.py │
-│ - HTTP request handling │
-│ - JWT authentication │
-│ - Parameter validation │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ document_service.upload_document() │ document_service.confirm_documents()
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Service Layer: aperag/service/document_service.py │
-│ - Business logic orchestration │
-│ - File validation (type, size) │
-│ - SHA-256 hash deduplication │
-│ - Quota checking │
-│ - Transaction management │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ Step 1 │ Step 2
- ▼ ▼
-┌────────────────────────┐ ┌────────────────────────────┐
-│ 1. Create Document │ │ 1. Update Document status │
-│ status=UPLOADED │ │ UPLOADED → PENDING │
-│ 2. Save to ObjectStore│ │ 2. Create DocumentIndex │
-│ 3. Calculate hash │ │ 3. Trigger indexing tasks │
-└────────┬───────────────┘ └────────┬───────────────────┘
- │ │
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Storage Layer │
-│ │
-│ ┌───────────────┐ ┌──────────────────┐ ┌─────────────┐ │
-│ │ PostgreSQL │ │ Object Store │ │ Vector DB │ │
-│ │ │ │ │ │ │ │
-│ │ - document │ │ - Local/S3 │ │ - Qdrant │ │
-│ │ - document_ │ │ - Original files │ │ - Vectors │ │
-│ │ index │ │ - Converted files│ │ │ │
-│ └───────────────┘ └──────────────────┘ └─────────────┘ │
-│ │
-│ ┌───────────────┐ ┌──────────────────┐ │
-│ │ Elasticsearch │ │ Neo4j/PG │ │
-│ │ │ │ │ │
-│ │ - Full-text │ │ - Knowledge Graph│ │
-│ └───────────────┘ └──────────────────┘ │
-└─────────────────────────────────────────────────────────────┘
- │
- ▼
- ┌───────────────────┐
- │ Celery Workers │
- │ │
- │ - Doc parsing │
- │ - Format convert │
- │ - Content extract│
- │ - Doc chunking │
- │ - Index building │
- └───────────────────┘
-```
-
-### Layered Architecture
-
-```
-┌─────────────────────────────────────────────┐
-│ View Layer (views/collections.py) │ HTTP handling, auth, validation
-└─────────────────┬───────────────────────────┘
- │ calls
-┌─────────────────▼───────────────────────────┐
-│ Service Layer (service/document_service.py)│ Business logic, transaction, permission
-└─────────────────┬───────────────────────────┘
- │ calls
-┌─────────────────▼───────────────────────────┐
-│ Repository Layer (db/ops.py, objectstore/) │ Data access abstraction
-└─────────────────┬───────────────────────────┘
- │ accesses
-┌─────────────────▼───────────────────────────┐
-│ Storage Layer (PG, S3, Qdrant, ES, Neo4j) │ Data persistence
-└─────────────────────────────────────────────┘
-```
-
-## Core Process Details
-
-### Phase 0: API Interface Definition
-
-The system provides three main interfaces:
-
-1. **Upload File** (Two-phase mode - Step 1)
- - Endpoint: `POST /api/v1/collections/{collection_id}/documents/upload`
- - Function: Upload file to temporary storage, status `UPLOADED`
- - Returns: `document_id`, `filename`, `size`, `status`
-
-2. **Confirm Documents** (Two-phase mode - Step 2)
- - Endpoint: `POST /api/v1/collections/{collection_id}/documents/confirm`
- - Function: Confirm uploaded documents, trigger index building
- - Parameters: `document_ids` array
- - Returns: `confirmed_count`, `failed_count`, `failed_documents`
-
-3. **One-step Upload** (Legacy mode, backward compatible)
- - Endpoint: `POST /api/v1/collections/{collection_id}/documents`
- - Function: Upload and directly add to knowledge base, status directly to `PENDING`
- - Supports batch upload
-
-### Phase 1: File Upload and Temporary Storage
-
-#### 1.1 Upload Flow
-
-```
-User selects files
- │
- ▼
-Frontend calls upload API
- │
- ▼
-View layer validates identity and params
- │
- ▼
-Service layer processes business logic:
- │
- ├─► Verify collection exists and active
- │
- ├─► Validate file type and size
- │
- ├─► Read file content
- │
- ├─► Calculate SHA-256 hash
- │
- └─► Transaction processing:
- │
- ├─► Duplicate detection (by filename + hash)
- │ ├─ Exact match: Return existing doc (idempotent)
- │ ├─ Same name, different content: Throw conflict error
- │ └─ New document: Continue creation
- │
- ├─► Create Document record (status=UPLOADED)
- │
- ├─► Upload to object store
- │ └─ Path: user-{user_id}/{collection_id}/{document_id}/original{suffix}
- │
- └─► Update document metadata (object_path)
-```
-
-#### 1.2 File Validation
-
-**Supported File Types**:
-- Documents: `.pdf`, `.doc`, `.docx`, `.ppt`, `.pptx`, `.xls`, `.xlsx`
-- Text: `.txt`, `.md`, `.html`, `.json`, `.xml`, `.yaml`, `.yml`, `.csv`
-- Images: `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.tiff`, `.tif`
-- Audio: `.mp3`, `.wav`, `.m4a`
-- Archives: `.zip`, `.tar`, `.gz`, `.tgz`
-
-**Size Limits**:
-- Default: 100 MB (configurable via `MAX_DOCUMENT_SIZE` environment variable)
-- Extracted total size: 5 GB (`MAX_EXTRACTED_SIZE`)
-
-#### 1.3 Duplicate Detection Mechanism
-
-Uses **filename + SHA-256 hash** dual detection:
-
-| Scenario | Filename | Hash | System Behavior |
-|----------|----------|------|-----------------|
-| Exact match | Same | Same | Return existing document (idempotent) |
-| Name conflict | Same | Different | Throw `DocumentNameConflictException` |
-| New document | Different | - | Create new document record |
-
-**Advantages**:
-- ✅ Supports idempotent upload: Network retries won't create duplicates
-- ✅ Prevents content conflicts: Same name with different content prompts user
-- ✅ Saves storage space: Same content stored only once
-
-### Phase 2: Temporary Storage Configuration
-
-#### 2.1 Object Storage Types
-
-System supports two object storage backends, switchable via environment variables:
-
-**1. Local Storage (Local filesystem)**
-
-Use cases:
-- Development and testing environments
-- Small-scale deployments
-- Single-machine deployments
-
-Configuration:
-```bash
-# Development environment
-OBJECT_STORE_TYPE=local
-OBJECT_STORE_LOCAL_ROOT_DIR=.objects
-
-# Docker environment
-OBJECT_STORE_TYPE=local
-OBJECT_STORE_LOCAL_ROOT_DIR=/shared/objects
-```
-
-Storage path example:
-```
-.objects/
-└── user-google-oauth2-123456/
- └── col_abc123/
- └── doc_xyz789/
- ├── original.pdf # Original file
- ├── converted.pdf # Converted PDF
- ├── processed_content.md # Parsed Markdown
- ├── chunks/ # Chunked data
- │ ├── chunk_0.json
- │ └── chunk_1.json
- └── images/ # Extracted images
- ├── page_0.png
- └── page_1.png
-```
-
-**2. S3 Storage (Compatible with AWS S3/MinIO/OSS, etc.)**
-
-Use cases:
-- Production environments
-- Large-scale deployments
-- Distributed deployments
-- High availability and disaster recovery needs
-
-Configuration:
-```bash
-OBJECT_STORE_TYPE=s3
-OBJECT_STORE_S3_ENDPOINT=http://127.0.0.1:9000 # MinIO/S3 address
-OBJECT_STORE_S3_REGION=us-east-1 # AWS Region
-OBJECT_STORE_S3_ACCESS_KEY=minioadmin # Access Key
-OBJECT_STORE_S3_SECRET_KEY=minioadmin # Secret Key
-OBJECT_STORE_S3_BUCKET=aperag # Bucket name
-OBJECT_STORE_S3_PREFIX_PATH=dev/ # Optional path prefix
-OBJECT_STORE_S3_USE_PATH_STYLE=true # Set to true for MinIO
-```
-
-#### 2.2 Object Storage Path Rules
-
-**Path Format**:
-```
-{prefix}/user-{user_id}/{collection_id}/{document_id}/{filename}
-```
-
-**Components**:
-- `prefix`: Optional global prefix (S3 only)
-- `user_id`: User ID (`|` replaced with `-`)
-- `collection_id`: Collection ID
-- `document_id`: Document ID
-- `filename`: Filename (e.g., `original.pdf`, `page_0.png`)
-
-**Multi-tenancy Isolation**:
-- Each user has an independent namespace
-- Each collection has an independent storage directory
-- Each document has an independent folder
-
-### Phase 3: Document Confirmation and Index Building
-
-#### 3.1 Confirmation Flow
-
-```
-User clicks "Save to Collection"
- │
- ▼
-Frontend calls confirm API
- │
- ▼
-Service layer processes:
- │
- ├─► Validate collection configuration
- │
- ├─► Check Quota (deduct quota at confirmation stage)
- │
- └─► For each document_id:
- │
- ├─► Verify document status is UPLOADED
- │
- ├─► Update document status: UPLOADED → PENDING
- │
- ├─► Create index records based on collection config:
- │ ├─ VECTOR (Vector index, required)
- │ ├─ FULLTEXT (Full-text index, required)
- │ ├─ GRAPH (Knowledge graph, optional)
- │ ├─ SUMMARY (Document summary, optional)
- │ └─ VISION (Vision index, optional)
- │
- └─► Return confirmation result
- │
- ▼
-Trigger Celery task: reconcile_document_indexes
- │
- ▼
-Background async index building
-```
-
-#### 3.2 Quota Management
-
-**Check Timing**:
-- ❌ Not checked during upload phase (temporary storage doesn't consume quota)
-- ✅ Checked during confirmation phase (formal addition consumes quota)
-
-**Quota Types**:
-
-1. **User Global Quota**
- - `max_document_count`: Total document count limit per user
- - Default: 1000 (configurable via `MAX_DOCUMENT_COUNT`)
-
-2. **Per-Collection Quota**
- - `max_document_count_per_collection`: Document count limit per collection
- - Excludes `UPLOADED` and `DELETED` status documents
-
-**Quota Exceeded Handling**:
-- Throws `QuotaExceededException`
-- Returns HTTP 400 error
-- Includes current usage and quota limit information
-
-### Phase 4: Document Parsing and Format Conversion
-
-#### 4.1 Parser Architecture
-
-System uses a **multi-parser chain invocation** architecture, where each parser handles specific file types:
-
-```
-DocParser (Main Controller)
- │
- ├─► MinerUParser
- │ └─ Function: High-precision PDF parsing (commercial API)
- │ └─ Supports: .pdf
- │
- ├─► ImageParser
- │ └─ Function: Image content recognition (OCR + vision understanding)
- │ └─ Supports: .jpg, .png, .gif, .bmp, .tiff
- │
- ├─► AudioParser
- │ └─ Function: Audio transcription (Speech-to-Text)
- │ └─ Supports: .mp3, .wav, .m4a
- │
- └─► MarkItDownParser (Fallback)
- └─ Function: Universal document to Markdown conversion
- └─ Supports: Almost all common formats
-```
-
-#### 4.2 Parser Configuration
-
-**Configuration Method**: Dynamically controlled via Collection Config
-
-```json
-{
- "parser_config": {
- "use_mineru": false, // Enable MinerU (requires API Token)
- "use_markitdown": true, // Enable MarkItDown (default)
- "mineru_api_token": "xxx" // MinerU API Token (optional)
- }
-}
-```
-
-**Environment Variable Configuration**:
-```bash
-USE_MINERU_API=false # Globally enable MinerU
-MINERU_API_TOKEN=your_token # MinerU API Token
-```
-
-#### 4.3 Parsing Flow
-
-```
-Celery Worker receives indexing task
- │
- ▼
-1. Download original file from object store
- │
- ▼
-2. Select Parser based on file extension
- │
- ├─► Try first matching Parser
- │ ├─ Success: Return parsing result
- │ └─ Failure: FallbackError → Try next Parser
- │
- └─► Final fallback: MarkItDownParser
- │
- ▼
-3. Parsing result (Parts):
- │
- ├─► MarkdownPart: Text content
- │ └─ Contains: headings, paragraphs, lists, tables, etc.
- │
- ├─► PdfPart: PDF file
- │ └─ For: linearization, page rendering
- │
- └─► AssetBinPart: Binary resources
- └─ Contains: images, embedded files, etc.
- │
- ▼
-4. Post-processing:
- │
- ├─► PDF pages to images (required for Vision index)
- │ └─ Each page rendered as PNG image
- │ └─ Saved to {document_path}/images/page_N.png
- │
- ├─► PDF linearization (speed up browser loading)
- │ └─ Use pikepdf to optimize PDF structure
- │ └─ Saved to {document_path}/converted.pdf
- │
- └─► Extract text content (plain text)
- └─ Merge all MarkdownPart content
- └─ Saved to {document_path}/processed_content.md
- │
- ▼
-5. Save to object store
-```
-
-#### 4.4 Format Conversion Examples
-
-**Example 1: PDF Document**
-```
-Input: user_manual.pdf (5 MB)
- │
- ▼
-Parser selection: MinerUParser / MarkItDownParser
- │
- ▼
-Output Parts:
- ├─ MarkdownPart: "# User Manual\n\n## Chapter 1\n..."
- └─ PdfPart:
- │
- ▼
-Post-processing:
- ├─ Render 50 pages to images → images/page_0.png ~ page_49.png
- ├─ Linearize PDF → converted.pdf
- └─ Extract text → processed_content.md
-```
-
-**Example 2: Image File**
-```
-Input: screenshot.png (2 MB)
- │
- ▼
-Parser selection: ImageParser
- │
- ▼
-Output Parts:
- ├─ MarkdownPart: "[OCR extracted text]"
- └─ AssetBinPart: (vision_index=true)
- │
- ▼
-Post-processing:
- └─ Save original image copy → images/file.png
-```
-
-**Example 3: Audio File**
-```
-Input: meeting_record.mp3 (50 MB)
- │
- ▼
-Parser selection: AudioParser
- │
- ▼
-Output Parts:
- └─ MarkdownPart: "[Transcribed meeting content]"
- │
- ▼
-Post-processing:
- └─ Save transcription text → processed_content.md
-```
-
-### Phase 5: Index Building
-
-#### 5.1 Index Types and Functions
-
-| Index Type | Required | Function Description | Storage Location |
-|-----------|----------|---------------------|------------------|
-| **VECTOR** | ✅ Required | Vector retrieval, semantic search | Qdrant / Elasticsearch |
-| **FULLTEXT** | ✅ Required | Full-text search, keyword search | Elasticsearch |
-| **GRAPH** | ❌ Optional | Knowledge graph, entity & relation extraction | Neo4j / PostgreSQL |
-| **SUMMARY** | ❌ Optional | Document summary, LLM generated | PostgreSQL (index_data) |
-| **VISION** | ❌ Optional | Vision understanding, image content analysis | Qdrant (vectors) + PG (metadata) |
-
-#### 5.2 Index Building Flow
-
-```
-Celery Worker: reconcile_document_indexes task
- │
- ▼
-1. Scan DocumentIndex table, find indexes needing processing
- │
- ├─► PENDING status + observed_version < version
- │ └─ Need to create or update index
- │
- └─► DELETING status
- └─ Need to delete index
- │
- ▼
-2. Group by document, process one by one
- │
- ▼
-3. For each document:
- │
- ├─► parse_document (parse document)
- │ ├─ Download original file from object store
- │ ├─ Call DocParser to parse
- │ └─ Return ParsedDocumentData
- │
- └─► For each index type:
- │
- ├─► create_index (create/update index)
- │ │
- │ ├─ VECTOR index:
- │ │ ├─ Document chunking
- │ │ ├─ Generate vectors using Embedding model
- │ │ └─ Write to Qdrant
- │ │
- │ ├─ FULLTEXT index:
- │ │ ├─ Extract plain text content
- │ │ ├─ Chunk by paragraph/section
- │ │ └─ Write to Elasticsearch
- │ │
- │ ├─ GRAPH index:
- │ │ ├─ Extract entities using LightRAG
- │ │ ├─ Extract entity relationships
- │ │ └─ Write to Neo4j/PostgreSQL
- │ │
- │ ├─ SUMMARY index:
- │ │ ├─ Generate summary using LLM
- │ │ └─ Save to DocumentIndex.index_data
- │ │
- │ └─ VISION index:
- │ ├─ Extract image Assets
- │ ├─ Understand image content using Vision LLM
- │ ├─ Generate image description vectors
- │ └─ Write to Qdrant
- │
- └─► Update index status
- ├─ Success: CREATING → ACTIVE
- └─ Failure: CREATING → FAILED
- │
- ▼
-4. Update document overall status
- │
- ├─ All indexes ACTIVE → Document.status = COMPLETE
- ├─ Any index FAILED → Document.status = FAILED
- └─ Some indexes still processing → Document.status = RUNNING
-```
-
-#### 5.3 Document Chunking
-
-**Chunking Strategy**:
-- Recursive character splitting (RecursiveCharacterTextSplitter)
-- Prioritize splitting by natural paragraphs and sections
-- Maintain context overlap
-
-**Chunking Parameters**:
-```json
-{
- "chunk_size": 1000, // Max characters per chunk
- "chunk_overlap": 200, // Overlap characters
- "separators": ["\n\n", "\n", " ", ""] // Separator priority
-}
-```
-
-**Chunking Result Storage**:
-```
-{document_path}/chunks/
- ├─ chunk_0.json: {"text": "...", "metadata": {...}}
- ├─ chunk_1.json: {"text": "...", "metadata": {...}}
- └─ ...
-```
-
-## Database Design
-
-### Table 1: document (Document Metadata)
-
-**Table Structure**:
-
-| Field | Type | Description | Index |
-|-------|------|-------------|-------|
-| `id` | String(24) | Document ID, primary key, format: `doc{random_id}` | PK |
-| `name` | String(1024) | Filename | - |
-| `user` | String(256) | User ID (supports multiple IDPs) | ✅ Index |
-| `collection_id` | String(24) | Collection ID | ✅ Index |
-| `status` | Enum | Document status (see table below) | ✅ Index |
-| `size` | BigInteger | File size (bytes) | - |
-| `content_hash` | String(64) | SHA-256 hash (for deduplication) | ✅ Index |
-| `object_path` | Text | Object store path (deprecated, use doc_metadata) | - |
-| `doc_metadata` | Text | Document metadata (JSON string) | - |
-| `gmt_created` | DateTime(tz) | Creation time (UTC) | - |
-| `gmt_updated` | DateTime(tz) | Update time (UTC) | - |
-| `gmt_deleted` | DateTime(tz) | Deletion time (soft delete) | ✅ Index |
-
-**Unique Constraint**:
-```sql
-UNIQUE INDEX uq_document_collection_name_active
- ON document (collection_id, name)
- WHERE gmt_deleted IS NULL;
-```
-- Within the same collection, active document names cannot be duplicated
-- Deleted documents are excluded from uniqueness check
-
-**Document Status Enum** (`DocumentStatus`):
-
-| Status | Description | When Set | Visibility |
-|--------|-------------|----------|------------|
-| `UPLOADED` | Uploaded to temporary storage | `upload_document` API | Frontend file selection UI |
-| `PENDING` | Waiting for index building | `confirm_documents` API | Document list (processing) |
-| `RUNNING` | Index building in progress | Celery task starts processing | Document list (processing) |
-| `COMPLETE` | All indexes completed | All indexes become ACTIVE | Document list (available) |
-| `FAILED` | Index building failed | Any index fails | Document list (failed) |
-| `DELETED` | Deleted | `delete_document` API | Not visible (soft delete) |
-| `EXPIRED` | Temporary document expired | Scheduled cleanup task | Not visible |
-
-**Document Metadata Example** (`doc_metadata` JSON field):
-```json
-{
- "object_path": "user-xxx/col_xxx/doc_xxx/original.pdf",
- "converted_path": "user-xxx/col_xxx/doc_xxx/converted.pdf",
- "processed_content_path": "user-xxx/col_xxx/doc_xxx/processed_content.md",
- "images": [
- "user-xxx/col_xxx/doc_xxx/images/page_0.png",
- "user-xxx/col_xxx/doc_xxx/images/page_1.png"
- ],
- "parser_used": "MinerUParser",
- "parse_duration_ms": 5420,
- "page_count": 50,
- "custom_field": "value"
-}
-```
-
-### Table 2: document_index (Index Status Management)
-
-**Table Structure**:
-
-| Field | Type | Description | Index |
-|-------|------|-------------|-------|
-| `id` | Integer | Auto-increment ID, primary key | PK |
-| `document_id` | String(24) | Related document ID | ✅ Index |
-| `index_type` | Enum | Index type (see table below) | ✅ Index |
-| `status` | Enum | Index status (see table below) | ✅ Index |
-| `version` | Integer | Index version number | - |
-| `observed_version` | Integer | Processed version number | - |
-| `index_data` | Text | Index data (JSON), e.g., summary content | - |
-| `error_message` | Text | Error message (on failure) | - |
-| `gmt_created` | DateTime(tz) | Creation time | - |
-| `gmt_updated` | DateTime(tz) | Update time | - |
-| `gmt_last_reconciled` | DateTime(tz) | Last reconciliation time | - |
-
-**Unique Constraint**:
-```sql
-UNIQUE CONSTRAINT uq_document_index
- ON document_index (document_id, index_type);
-```
-- Each document has only one record per index type
-
-**Index Type Enum** (`DocumentIndexType`):
-
-| Type | Value | Description | External Storage |
-|------|-------|-------------|------------------|
-| `VECTOR` | "VECTOR" | Vector index | Qdrant / Elasticsearch |
-| `FULLTEXT` | "FULLTEXT" | Full-text index | Elasticsearch |
-| `GRAPH` | "GRAPH" | Knowledge graph | Neo4j / PostgreSQL |
-| `SUMMARY` | "SUMMARY" | Document summary | PostgreSQL (index_data) |
-| `VISION` | "VISION" | Vision index | Qdrant + PostgreSQL |
-
-**Index Status Enum** (`DocumentIndexStatus`):
-
-| Status | Description | When Set |
-|--------|-------------|----------|
-| `PENDING` | Waiting for processing | `confirm_documents` creates index record |
-| `CREATING` | Creating | Celery Worker starts processing |
-| `ACTIVE` | Ready for use | Index building successful |
-| `DELETING` | Marked for deletion | `delete_document` API |
-| `DELETION_IN_PROGRESS` | Deleting | Celery Worker is deleting |
-| `FAILED` | Failed | Index building failed |
-
-**Version Control Mechanism**:
-- `version`: Expected index version (incremented on document update)
-- `observed_version`: Processed version number
-- When `version > observed_version`, triggers index update
-
-**Reconciler**:
-```python
-# Query indexes needing processing
-SELECT * FROM document_index
-WHERE status = 'PENDING'
- AND observed_version < version;
-
-# Update after processing
-UPDATE document_index
-SET status = 'ACTIVE',
- observed_version = version,
- gmt_last_reconciled = NOW()
-WHERE id = ?;
-```
-
-### Table Relationship Diagram
-
-```
-┌─────────────────────────────────┐
-│ collection │
-│ ───────────────────────────── │
-│ id (PK) │
-│ name │
-│ config (JSON) │
-│ status │
-│ ... │
-└────────────┬────────────────────┘
- │ 1:N
- ▼
-┌─────────────────────────────────┐
-│ document │
-│ ───────────────────────────── │
-│ id (PK) │
-│ collection_id (FK) │◄──── Unique constraint: (collection_id, name)
-│ name │
-│ user │
-│ status (Enum) │
-│ size │
-│ content_hash (SHA-256) │
-│ doc_metadata (JSON) │
-│ gmt_created │
-│ gmt_deleted │
-│ ... │
-└────────────┬────────────────────┘
- │ 1:N
- ▼
-┌─────────────────────────────────┐
-│ document_index │
-│ ───────────────────────────── │
-│ id (PK) │
-│ document_id (FK) │◄──── Unique constraint: (document_id, index_type)
-│ index_type (Enum) │
-│ status (Enum) │
-│ version │
-│ observed_version │
-│ index_data (JSON) │
-│ error_message │
-│ gmt_last_reconciled │
-│ ... │
-└─────────────────────────────────┘
-```
-
-## State Machine and Lifecycle
-
-### Document State Transitions
-
-```
- ┌─────────────────────────────────────────────┐
- │ │
- │ ▼
- [Upload] ──► UPLOADED ──► [Confirm] ──► PENDING ──► RUNNING ──► COMPLETE
- │ │
- │ ▼
- │ FAILED
- │ │
- │ ▼
- └──────► [Delete] ──────────────► DELETED
- │
- ┌───────────────────────────────────┘
- │
- ▼
- EXPIRED (Scheduled cleanup of unconfirmed docs)
-```
-
-**Key Transitions**:
-1. **UPLOADED → PENDING**: User clicks "Save to Collection"
-2. **PENDING → RUNNING**: Celery Worker starts processing
-3. **RUNNING → COMPLETE**: All indexes successful
-4. **RUNNING → FAILED**: Any index fails
-5. **Any status → DELETED**: User deletes document
-
-### Index State Transitions
-
-```
- [Create index record] ──► PENDING ──► CREATING ──► ACTIVE
- │
- ▼
- FAILED
- │
- ▼
- ┌──────────► PENDING (retry)
- │
- [Delete request] ────────┼──────────► DELETING ──► DELETION_IN_PROGRESS ──► (record deleted)
- │
- └──────────► (directly delete record, if PENDING/FAILED)
-```
-
-## Async Task Scheduling (Celery)
-
-### Task Definitions
-
-**Main Task**: `reconcile_document_indexes`
-- Trigger timing:
- - After `confirm_documents` API call
- - Scheduled task (every 30 seconds)
- - Manual trigger (admin interface)
-- Function: Scan `document_index` table, process indexes needing reconciliation
-
-**Sub-tasks**:
-- `parse_document_task`: Parse document content
-- `create_vector_index_task`: Create vector index
-- `create_fulltext_index_task`: Create full-text index
-- `create_graph_index_task`: Create knowledge graph index
-- `create_summary_index_task`: Create summary index
-- `create_vision_index_task`: Create vision index
-
-### Task Scheduling Strategy
-
-**Concurrency Control**:
-- Each Worker processes at most N documents simultaneously (default 4)
-- Multiple indexes of each document can be built in parallel
-- Use Celery's `task_acks_late=True` to ensure tasks aren't lost
-
-**Failure Retry**:
-- Maximum 3 retries
-- Exponential backoff (1 min → 5 min → 15 min)
-- Marked as `FAILED` after 3 failures
-
-**Idempotency**:
-- All tasks support repeated execution
-- Use `observed_version` mechanism to avoid duplicate processing
-- Same input produces same output
-
-## Design Features and Advantages
-
-### 1. Two-Phase Commit Design
-
-**Advantages**:
-- ✅ **Better User Experience**: Fast upload response, doesn't block user operations
-- ✅ **Selective Addition**: Can selectively confirm partial files after batch upload
-- ✅ **Reasonable Resource Control**: Unconfirmed documents don't build indexes, don't consume quota
-- ✅ **Failure Recovery Friendly**: Temporary documents can be periodically cleaned up without affecting business
-
-**Status Isolation**:
-```
-Temporary status (UPLOADED):
- - Not counted in quota
- - Doesn't trigger indexing
- - Can be automatically cleaned up
-
-Formal status (PENDING/RUNNING/COMPLETE):
- - Counted in quota
- - Triggers index building
- - Won't be automatically cleaned up
-```
-
-### 2. Idempotency Design
-
-**File-Level Idempotency**:
-- SHA-256 hash deduplication
-- Same file uploaded multiple times returns same `document_id`
-- Avoids storage space waste
-
-**API-Level Idempotency**:
-- `upload_document`: Repeated upload returns existing document
-- `confirm_documents`: Repeated confirmation doesn't create duplicate indexes
-- `delete_document`: Repeated deletion returns success (soft delete)
-
-### 3. Multi-Tenancy Isolation
-
-**Storage Isolation**:
-```
-user-{user_A}/... # User A's files
-user-{user_B}/... # User B's files
-```
-
-**Database Isolation**:
-- All queries filter by `user` field
-- Collection-level permission control (`collection.user`)
-- Soft delete support (`gmt_deleted`)
-
-### 4. Flexible Storage Backend
-
-**Unified Interface**:
-```python
-AsyncObjectStore:
- - put(path, data)
- - get(path)
- - delete_objects_by_prefix(prefix)
-```
-
-**Runtime Switching**:
-- Switch between Local/S3 via environment variables
-- No need to modify business code
-- Supports custom storage backends (just implement the interface)
-
-### 5. Transaction Consistency
-
-**Two-Phase Commit for Database + Object Store**:
-```python
-async with transaction:
- # 1. Create database record
- document = create_document_record()
-
- # 2. Upload to object store
- await object_store.put(path, data)
-
- # 3. Update metadata
- document.doc_metadata = json.dumps(metadata)
-
- # All operations succeed to commit, any failure rolls back
-```
-
-**Failure Handling**:
-- Database record creation fails: Don't upload file
-- File upload fails: Rollback database record
-- Metadata update fails: Rollback previous operations
-
-### 6. Observability
-
-**Audit Logging**:
-- `@audit` decorator records all document operations
-- Includes: user, time, operation type, resource ID
-
-**Task Tracking**:
-- `gmt_last_reconciled`: Last processing time
-- `error_message`: Failure reason
-- Celery task ID: Link log tracing
-
-**Monitoring Metrics**:
-- Document upload rate
-- Index building duration
-- Failure rate statistics
-
-## Performance Optimization
-
-### 1. Async Processing
-
-**Upload Doesn't Block**:
-- Returns immediately after file upload to object store
-- Index building executes asynchronously in Celery
-- Frontend gets progress via polling or WebSocket
-
-### 2. Batch Operations
-
-**Batch Confirmation**:
-```python
-confirm_documents(document_ids=[id1, id2, ..., idN])
-```
-- Process multiple documents in one transaction
-- Batch create index records
-- Reduce database round-trips
-
-### 3. Caching Strategy
-
-**Parsing Result Cache**:
-- Parsed content saved to `processed_content.md`
-- Subsequent index rebuilds can read directly without re-parsing
-
-**Chunking Result Cache**:
-- Chunking results saved to `chunks/` directory
-- Vector index rebuilds can reuse chunking results
-
-### 4. Parallel Index Building
-
-**Multiple Indexes in Parallel**:
-```python
-# VECTOR, FULLTEXT, GRAPH can be built in parallel
-await asyncio.gather(
- create_vector_index(),
- create_fulltext_index(),
- create_graph_index()
-)
-```
-
-## Error Handling
-
-### Common Exceptions
-
-| Exception Type | HTTP Status | Trigger Scenario | Handling Suggestion |
-|---------------|-------------|------------------|---------------------|
-| `ResourceNotFoundException` | 404 | Collection/document doesn't exist | Check if ID is correct |
-| `CollectionInactiveException` | 400 | Collection not active | Wait for collection initialization |
-| `DocumentNameConflictException` | 409 | Same name, different content | Rename file or delete old document |
-| `QuotaExceededException` | 429 | Quota exceeded | Upgrade plan or delete old documents |
-| `InvalidFileTypeException` | 400 | Unsupported file type | Check supported file type list |
-| `FileSizeTooLargeException` | 413 | File too large | Split file or compress |
-
-### Exception Propagation
-
-```
-Service Layer throws exception
- │
- ▼
-View Layer catches and converts
- │
- ▼
-Exception Handler unified handling
- │
- ▼
-Return standard JSON response:
-{
- "error_code": "QUOTA_EXCEEDED",
- "message": "Document count limit exceeded",
- "details": {
- "limit": 1000,
- "current": 1000
- }
-}
-```
-
-## Related Files Index
-
-### Core Implementation
-
-- **View Layer**: `aperag/views/collections.py` - HTTP interface definition
-- **Service Layer**: `aperag/service/document_service.py` - Business logic
-- **Database Models**: `aperag/db/models.py` - Document, DocumentIndex table definitions
-- **Database Operations**: `aperag/db/ops.py` - CRUD operation encapsulation
-
-### Object Storage
-
-- **Interface Definition**: `aperag/objectstore/base.py` - AsyncObjectStore abstract class
-- **Local Implementation**: `aperag/objectstore/local.py` - Local filesystem storage
-- **S3 Implementation**: `aperag/objectstore/s3.py` - S3-compatible storage
-
-### Document Parsing
-
-- **Main Controller**: `aperag/docparser/doc_parser.py` - DocParser
-- **Parser Implementations**:
- - `aperag/docparser/mineru_parser.py` - MinerU PDF parsing
- - `aperag/docparser/mineru_parser.py` - MinerU document parsing
- - `aperag/docparser/markitdown_parser.py` - MarkItDown universal parsing
- - `aperag/docparser/image_parser.py` - Image OCR
- - `aperag/docparser/audio_parser.py` - Audio transcription
-- **Document Processing**: `aperag/index/document_parser.py` - Parsing flow orchestration
-
-### Index Building
-
-- **Index Management**: `aperag/index/manager.py` - DocumentIndexManager
-- **Vector Index**: `aperag/index/vector_index.py` - VectorIndexer
-- **Full-text Index**: `aperag/index/fulltext_index.py` - FulltextIndexer
-- **Knowledge Graph**: `aperag/index/graph_index.py` - GraphIndexer
-- **Document Summary**: `aperag/index/summary_index.py` - SummaryIndexer
-- **Vision Index**: `aperag/index/vision_index.py` - VisionIndexer
-
-### Task Scheduling
-
-- **Task Definitions**: `config/celery_tasks.py` - Celery task registration
-- **Reconciler**: `aperag/tasks/reconciler.py` - DocumentIndexReconciler
-- **Document Tasks**: `aperag/tasks/document.py` - DocumentIndexTask
-
-### Frontend Implementation
-
-- **Document List**: `web/src/app/workspace/collections/[collectionId]/documents/page.tsx`
-- **Document Upload**: `web/src/app/workspace/collections/[collectionId]/documents/upload/document-upload.tsx`
-
-## Summary
-
-ApeRAG's document upload module adopts a **two-phase commit + multi-parser chain invocation + parallel multi-index building** architecture design:
-
-**Core Features**:
-1. ✅ **Two-Phase Commit**: Upload (temporary storage) → Confirm (formal addition), providing better user experience
-2. ✅ **SHA-256 Deduplication**: Prevents duplicate documents, supports idempotent upload
-3. ✅ **Flexible Storage Backend**: Local/S3 configurable switching, unified interface abstraction
-4. ✅ **Multi-Parser Architecture**: Supports MinerU, MarkItDown and other parsers
-5. ✅ **Automatic Format Conversion**: PDF→images, audio→text, images→OCR text
-6. ✅ **Multi-Index Coordination**: Five index types: vector, full-text, graph, summary, vision
-7. ✅ **Quota Management**: Quota deducted at confirmation stage, reasonable resource control
-8. ✅ **Async Processing**: Celery task queue, doesn't block user operations
-9. ✅ **Transaction Consistency**: Two-phase commit for database + object store
-10. ✅ **Observability**: Audit logs, task tracking, complete error information recording
-
-This design ensures both high performance and scalability, supports complex document processing scenarios (multi-format, multi-language, multi-modal), while maintaining good fault tolerance and user experience.
diff --git a/docs/en-US/design/graph_index_creation.md b/docs/en-US/design/graph_index_creation.md
deleted file mode 100644
index 49ce7f553..000000000
--- a/docs/en-US/design/graph_index_creation.md
+++ /dev/null
@@ -1,1070 +0,0 @@
----
-title: Graph Index Creation Process
-description: Complete process and core technologies for ApeRAG knowledge graph index construction
-keywords: Knowledge Graph, Graph Index, Entity Extraction, Relationship Extraction, Concurrency Optimization
-position: 2
----
-
-# Graph Index Creation Process
-
-## 1. What is Graph Index
-
-Graph Index is a core feature of ApeRAG that automatically extracts structured knowledge graphs from unstructured text.
-
-### 1.1 A Simple Example
-
-Imagine you have a document about company organization:
-
-> "John is the head of the database team and specializes in PostgreSQL and MySQL. Mike works in the frontend team and often collaborates with John's team to develop backend management systems."
-
-**Transformation from Document to Knowledge Graph**:
-
-```mermaid
-flowchart LR
- subgraph Input[📄 Input Document]
- Doc["John is the head of the database team,
specializes in PostgreSQL and MySQL.
Mike works in the frontend team..."]
- end
-
- subgraph Process[🔄 Graph Index Processing]
- Extract[Extract entities and relationships]
- end
-
- subgraph Output[🕸️ Knowledge Graph]
- direction TB
- A[John
Person] -->|heads| B[Database Team
Organization]
- A -->|specializes in| C[PostgreSQL
Technology]
- A -->|specializes in| D[MySQL
Technology]
- E[Mike
Person] -->|belongs to| F[Frontend Team
Organization]
- E -->|collaborates| A
- end
-
- Input --> Process
- Process --> Output
-
- style Input fill:#e3f2fd
- style Process fill:#fff59d
- style Output fill:#c8e6c9
-```
-
-Traditional vector search can only find "semantically similar" paragraphs but cannot answer these questions:
-- What does John lead?
-- What is the relationship between John and Mike?
-- What technologies does the database team use?
-
-**Graph Index can do**: Accurately answer these relationship-focused questions by making implicit knowledge relationships explicit.
-
-### 1.2 Core Value
-
-Compared to traditional retrieval methods, Graph Index provides unique capabilities:
-
-| Capability | Vector Search | Full-text Search | Graph Index |
-|------------|---------------|------------------|-------------|
-| Semantic Similarity | ✅ Strong | ❌ Weak | ✅ Strong |
-| Exact Keyword Match | ❌ Weak | ✅ Strong | ✅ Medium |
-| Relationship Query | ❌ Not Supported | ❌ Not Supported | ✅ Strong |
-| Multi-hop Reasoning | ❌ Not Supported | ❌ Not Supported | ✅ Supported |
-| Suitable Questions | "How to optimize performance" | "PostgreSQL config" | "John and Mike's relationship" |
-
-**Core Advantage**: Graph Index allows AI to "understand" the connections between knowledge, not just text similarity.
-
-## 2. What Problems Can Graph Index Solve
-
-Graph Index excels at handling scenarios that require "understanding relationships". Let's look at practical applications.
-
-### 2.1 Enterprise Knowledge Management
-
-**Scenario**: Companies have extensive documentation including organizational structure, project materials, and technical docs.
-
-**Graph Index Value**:
-
-- 📊 **Organizational Relationships**: "Who is on John's team?" → Quickly find team members
-- 🔗 **Collaboration Networks**: "Who has worked with John?" → Discover work networks
-- 🛠️ **Skill Mapping**: "Who is skilled in PostgreSQL?" → Locate technical experts
-- 📁 **Project History**: "Which projects has John participated in?" → Track project experience
-
-**Real Effect**:
-
-```
-Question: "Who leads the database team?"
-Traditional Search: Returns dozens of paragraphs containing "database team" and "lead"
-Graph Index: Directly returns "John" + relevant background information
-```
-
-### 2.2 Research and Learning
-
-**Scenario**: Analyzing academic papers and technical documentation to understand knowledge lineage.
-
-**Graph Index Value**:
-
-- 👥 **Author Networks**: "Who has this author collaborated with?" → Discover research teams
-- 📖 **Citation Relationships**: "What papers does this cite?" → Trace research lineage
-- 🔬 **Technology Evolution**: "How has this technology evolved?" → Understand tech history
-- 💡 **Concept Connections**: "What's the relationship between tech A and B?" → Connect knowledge points
-
-### 2.3 Products and Services
-
-**Scenario**: Product documentation, user manuals, API documentation.
-
-**Graph Index Value**:
-
-- ⚙️ **Feature Dependencies**: "What needs to be configured before enabling feature A?" → Understand dependencies
-- 🔧 **Configuration Relationships**: "Which features does this config affect?" → Avoid misconfigurations
-- 🐛 **Problem Diagnosis**: "What might cause error X?" → Quick troubleshooting
-- 📚 **API Relationships**: "Which APIs are typically used together?" → Learn best practices
-
-### 2.4 Comparison: When to Use Graph Index
-
-Different questions suit different retrieval methods:
-
-| Question Type | Example | Best Solution |
-|--------------|---------|---------------|
-| **Concept Understanding** | "What is RAG?" | Vector Search |
-| **Exact Lookup** | "PostgreSQL config file path" | Full-text Search |
-| **Relationship Query** | "What's John and Mike's relationship?" | Graph Index ✨ |
-| **Multi-hop Reasoning** | "What tech stack does John's team use?" | Graph Index ✨ |
-| **Knowledge Tracing** | "What modules does this feature depend on?" | Graph Index ✨ |
-
-**Best Practice**: ApeRAG supports vector search, full-text search, and graph index simultaneously, intelligently selecting or combining based on question type.
-
-## 3. Construction Process Overview
-
-When you upload a document and enable graph indexing, ApeRAG automatically completes the following steps. Here's a simple overview; details are in later chapters.
-
-### 3.1 Five Key Steps
-
-```mermaid
-flowchart TB
- subgraph Step1["1️⃣ Document Chunking"]
- A1[Original Document] --> A2[Smart Chunking]
- A2 --> A3[Generate Chunks]
- end
-
- subgraph Step2["2️⃣ Entity Relationship Extraction"]
- B1[Chunks] --> B2[Call LLM]
- B2 --> B3[Identify Entities]
- B2 --> B4[Identify Relationships]
- end
-
- subgraph Step3["3️⃣ Connected Component Analysis"]
- C1[Entity Relationship Network] --> C2[BFS Algorithm]
- C2 --> C3[Grouping]
- end
-
- subgraph Step4["4️⃣ Concurrent Merging"]
- D1[Group 1] --> D2[Entity Deduplication]
- D3[Group 2] --> D4[Entity Deduplication]
- D5[Group N] --> D6[Entity Deduplication]
- D2 --> D7[Relationship Aggregation]
- D4 --> D7
- D6 --> D7
- end
-
- subgraph Step5["5️⃣ Multi-storage Writing"]
- E1[Graph Database]
- E2[Vector Database]
- E3[Text Storage]
- end
-
- A3 --> B1
- B3 --> C1
- B4 --> C1
- C3 --> D1
- C3 --> D3
- C3 --> D5
- D7 --> E1
- D7 --> E2
- A3 --> E3
-
- style Step1 fill:#e3f2fd
- style Step2 fill:#fff3e0
- style Step3 fill:#f3e5f5
- style Step4 fill:#e8f5e9
- style Step5 fill:#fce4ec
-```
-
-**Simply put**: Chunk document → Extract entities/relationships → Smart grouping → Concurrent merging → Write to storage.
-
-The entire process is fully automated - you just upload documents, and the system handles everything.
-
-### 3.2 Processing Time Reference
-
-Processing time varies by document size:
-
-| Document Size | Entity Count | Processing Time | Example |
-|--------------|--------------|-----------------|---------|
-| Small (< 5 pages) | ~50 | 10-30 seconds | Company notices, meeting notes |
-| Medium (10-50 pages) | ~200 | 1-3 minutes | Technical docs, product manuals |
-| Large (100+ pages) | ~1000 | 5-15 minutes | Research reports, books |
-
-**Factors**:
-- LLM response speed (main bottleneck)
-- Document complexity (tables, images slow processing)
-- Concurrency settings (configurable for speed)
-
-> 💡 **Tip**: Processing is asynchronous - upload multiple documents and the system processes them in parallel.
-
-### 3.3 Real-time Progress Tracking
-
-You can check document processing progress anytime:
-
-```
-Document Status: Processing
-- ✅ Document Parsing: Complete
-- ✅ Document Chunking: Complete (25 chunks generated)
-- 🔄 Entity Extraction: In Progress (15/25)
-- ⏳ Relationship Extraction: Waiting
-- ⏳ Graph Construction: Waiting
-```
-
-Once processing completes, document status changes to "Active" and graph queries become available.
-
-## 4. Detailed Construction Process
-
-The previous sections covered what graph index does and the overall process. This chapter details the technical implementation of each step.
-
-> 💡 **Reading Tip**: If you only want to understand basic concepts and usage, skip to Chapter 9 for practical applications.
-
-### 4.1 Document Chunking
-
-First step: Split long documents into appropriately sized chunks.
-
-**Why Chunk?**
-- LLMs have input length limits (typically thousands to tens of thousands of tokens)
-- Too large: Extraction quality decreases, LLM may "miss" information
-- Too small: Loses context, can't understand complete semantics
-
-**Smart Chunking Strategy**:
-
-```mermaid
-flowchart LR
- Doc[Long Document] --> Check{Check Size}
- Check -->|< 1200 tokens| Keep[Keep Intact]
- Check -->|> 1200 tokens| Split[Smart Split]
-
- Split --> By1[By Paragraph]
- By1 --> Check2{Still Too Big?}
- Check2 -->|Yes| By2[By Sentence]
- Check2 -->|No| Done[Complete]
- By2 --> Check3{Still Too Big?}
- Check3 -->|Yes| By3[By Character]
- Check3 -->|No| Done
- By3 --> Done
-
- style Doc fill:#e1f5ff
- style Split fill:#ffccbc
- style Done fill:#c5e1a5
-```
-
-**Chunking Parameters**:
-- Default size: 1200 tokens (approximately 800-1000 English words)
-- Overlap size: 100 tokens (ensures context continuity)
-- Priority: Paragraph > Sentence > Character
-
-### 4.2 Entity Relationship Extraction
-
-Use LLM to identify entities and relationships from each chunk.
-
-**Extraction Process**:
-
-```mermaid
-sequenceDiagram
- participant C as Chunk
- participant L as LLM
- participant R as Results
-
- C->>L: "John heads the database team..."
- L->>R: Entities: [John(Person), Database Team(Org)]
- L->>R: Relationships: [John-heads->Database Team]
-
- C->>L: "John specializes in PostgreSQL..."
- L->>R: Entities: [John(Person), PostgreSQL(Tech)]
- L->>R: Relationships: [John-specializes in->PostgreSQL]
-```
-
-**Concurrency Optimization**: Multiple chunks can call LLM simultaneously, default 20 concurrent requests.
-
-### 4.3 Connected Component Analysis
-
-Divide entity relationship network into independent subgraphs for parallel processing.
-
-**Why This Step?**
-
-Tech team entities and finance department entities aren't connected - they can be processed completely in parallel!
-
-```mermaid
-graph LR
- subgraph Component1[Connected Component 1 - Tech Team]
- A1[John] -->|heads| A2[Database Team]
- A1 -->|specializes in| A3[PostgreSQL]
- A4[Mike] -->|collaborates| A1
- end
-
- subgraph Component2[Connected Component 2 - Finance]
- B1[Alice] -->|belongs to| B2[Finance Dept]
- B3[Bob] -->|collaborates| B1
- end
-
- style Component1 fill:#bbdefb
- style Component2 fill:#c5e1a5
-```
-
-**Performance Boost**: 3 independent components = 3x speedup!
-
-### 4.4 Concurrent Merging
-
-Same-name entities need deduplication, same relationships need aggregation.
-
-```mermaid
-flowchart TD
- subgraph Before["Before Merging"]
- A1["John
Database head"]
- A2["John
Specializes in PostgreSQL"]
- A3["John
Leads team"]
- end
-
- Merge[Smart Merge]
-
- subgraph After["After Merging"]
- B1["John
Database team head,
specializes in PostgreSQL,
leads multiple projects"]
- end
-
- A1 --> Merge
- A2 --> Merge
- A3 --> Merge
- Merge --> B1
-
- style Before fill:#ffccbc
- style After fill:#c5e1a5
-```
-
-**Fine-grained Locks**: Only lock entities being merged, others can process concurrently.
-
-### 4.5 Multi-storage Writing
-
-Knowledge graph written to three storage systems:
-
-```mermaid
-flowchart LR
- KG[Knowledge Graph] --> G[Graph Database
Graph Queries]
- KG --> V[Vector Database
Semantic Search]
- KG --> T[Text Storage
Full-text Search]
-
- style KG fill:#e1f5ff
- style G fill:#bbdefb
- style V fill:#c5e1a5
- style T fill:#ffccbc
-```
-
-Different storages support different query types, complementing each other.
-
-## 5. Core Technical Design
-
-This chapter introduces core technical designs including data isolation and concurrency control.
-
-> 💡 **Reading Tip**: These are system architecture and implementation details, mainly for developers and technical decision-makers.
-
-### 5.1 Workspace Data Isolation
-
-Each Collection has an independent namespace for complete data isolation.
-
-**Naming Convention**:
-
-```python
-# Entity naming
-entity:{entity_name}:{workspace}
-# Example
-entity:John:collection_abc123
-
-# Relationship naming
-relationship:{source}:{target}:{workspace}
-# Example
-relationship:John:Database Team:collection_abc123
-```
-
-**Isolation Effect**:
-
-```mermaid
-graph TB
- subgraph Collection_A[Collection A - Company Docs]
- A1[entity:John:A] --> A2[entity:Database Team:A]
- end
-
- subgraph Collection_B[Collection B - School Docs]
- B1[entity:John:B] --> B2[entity:CS Department:B]
- end
-
- style Collection_A fill:#bbdefb
- style Collection_B fill:#c5e1a5
-```
-
-"John" in two Collections is completely independent, no interference!
-
-### 5.2 Stateless Instance Management
-
-Each processing task creates an independent graph index instance, destroyed after completion.
-
-**Lifecycle Management**:
-
-```mermaid
-sequenceDiagram
- participant C as Celery Task
- participant M as Manager
- participant R as Graph Index Instance
- participant S as Storage
-
- C->>M: process_document()
- M->>R: create_instance()
- R->>S: Initialize storage connections
- R->>R: Process document
- R->>S: Write data
- R-->>M: Return results
- M-->>C: Task complete
- Note over R: Instance destroyed, resources released
-```
-
-**Advantages**:
-- ✅ Zero state pollution: Each task independent, no interference
-- ✅ Easy scaling: Can run multiple workers simultaneously
-- ✅ Resource management: Automatic cleanup, no memory leaks
-
-### 5.3 Connected Component Concurrency Optimization
-
-Intelligent concurrent processing through graph topology analysis.
-
-**Algorithm Principle**:
-
-```mermaid
-graph TB
- subgraph Input[Input: Entity Relationship Network]
- I1[Entity 1] --> I2[Entity 2]
- I2 --> I3[Entity 3]
-
- I4[Entity 4] --> I5[Entity 5]
-
- I6[Entity 6]
- end
-
- Algorithm[BFS Algorithm]
-
- subgraph Output[Output: 3 Connected Components]
- O1[Component 1
3 entities]
- O2[Component 2
2 entities]
- O3[Component 3
1 entity]
- end
-
- Input --> Algorithm
- Algorithm --> Output
-
- style Input fill:#ffccbc
- style Algorithm fill:#fff59d
- style Output fill:#c5e1a5
-```
-
-**Performance Boost**: 3 components concurrent processing = 3x speedup!
-
-### 5.4 Fine-grained Concurrency Control
-
-Precise entity-level locking:
-
-**Lock Hierarchy**:
-
-```mermaid
-graph TD
- A[Global Lock - Traditional] -->|Too Coarse| B[All Entities Serial]
-
- C[Entity Lock - ApeRAG] -->|Just Right| D[Lock Only Merging Entities]
-
- style A fill:#ffccbc
- style B fill:#ffccbc
- style C fill:#c5e1a5
- style D fill:#c5e1a5
-```
-
-**Lock Strategy**:
-1. Extraction phase: No locks, fully parallel
-2. Merging phase: Lock only needed entities
-3. Sorted lock acquisition: Prevents deadlock
-
-### 5.5 Smart Summarization
-
-Automatically compress overly long descriptions:
-
-```python
-if len(description) > 2000 tokens:
- summary = await llm_summarize(description)
-else:
- summary = description
-```
-
-**Effect**: Compress 2500 tokens to 200 tokens, retaining core information.
-
-### 5.6 Multi-storage Backend Support
-
-ApeRAG supports two graph databases: Neo4j and PostgreSQL.
-
-**How to Choose?**
-
-| Scenario | Recommended | Reason |
-|----------|-------------|--------|
-| **Small Scale** (< 100K entities) | PostgreSQL | Simple ops, low cost |
-| **Medium Scale** (100K-1M) | PostgreSQL or Neo4j | Based on query complexity |
-| **Large Scale** (> 1M) | Neo4j | Better graph query performance |
-| **Limited Budget** | PostgreSQL | No extra deployment |
-| **Complex Graph Algorithms** | Neo4j | Built-in graph algorithms |
-
-**Switching**:
-
-```bash
-# Use PostgreSQL (default)
-export GRAPH_INDEX_GRAPH_STORAGE=PGOpsSyncGraphStorage
-
-# Use Neo4j
-export GRAPH_INDEX_GRAPH_STORAGE=Neo4JSyncStorage
-```
-
-## 6. Complete Data Flow
-
-The entire graph index construction is a data transformation pipeline, from unstructured text to structured knowledge graph:
-
-```mermaid
-flowchart TD
- A[Original Document] --> B[Clean & Preprocess]
- B --> C[Smart Chunking]
- C --> D[Chunks]
-
- D --> E[LLM Concurrent Extraction]
- E --> F[Original Entity List]
- E --> G[Original Relationship List]
-
- F --> H[Build Adjacency Graph]
- G --> H
- H --> I[BFS Find Connected Components]
- I --> J[Grouped Concurrent Processing]
-
- J --> K[Entity Deduplication]
- J --> L[Relationship Aggregation]
-
- K --> M{Description Length Check}
- M -->|Too Long| N[LLM Summary]
- M -->|Appropriate| O[Keep Original]
- N --> P[Final Entities]
- O --> P
-
- L --> Q{Description Length Check}
- Q -->|Too Long| R[LLM Summary]
- Q -->|Appropriate| S[Keep Original]
- R --> T[Final Relationships]
- S --> T
-
- P --> U[Graph Database]
- P --> V[Vector Database]
- T --> U
- T --> V
- D --> W[Text Storage]
-
- U --> X[Knowledge Graph Complete]
- V --> X
- W --> X
-
- style A fill:#e1f5ff
- style E fill:#fff59d
- style I fill:#f3e5f5
- style J fill:#c5e1a5
- style X fill:#c8e6c9
-```
-
-### Data Transformation Example
-
-A concrete example showing step-by-step data transformation:
-
-**Input Document**:
-
-```text
-John heads the database team and specializes in PostgreSQL and MySQL.
-Mike works in the frontend team and often collaborates with John's team to develop backend systems.
-Alice is an accountant in the finance department, responsible for financial reports.
-```
-
-**Step 1: Chunking**
-
-```json
-[
- {
- "chunk_id": "chunk-001",
- "content": "John heads the database team and specializes in PostgreSQL and MySQL.",
- "tokens": 15
- },
- {
- "chunk_id": "chunk-002",
- "content": "Mike works in the frontend team and often collaborates with John's team...",
- "tokens": 18
- },
- {
- "chunk_id": "chunk-003",
- "content": "Alice is an accountant in the finance department, responsible for financial reports.",
- "tokens": 14
- }
-]
-```
-
-**Step 2: Entity Relationship Extraction**
-
-```json
-{
- "entities": [
- {"name": "John", "type": "Person", "source": "chunk-001"},
- {"name": "Database Team", "type": "Organization", "source": "chunk-001"},
- {"name": "PostgreSQL", "type": "Technology", "source": "chunk-001"},
- {"name": "MySQL", "type": "Technology", "source": "chunk-001"},
- {"name": "Mike", "type": "Person", "source": "chunk-002"},
- {"name": "Frontend Team", "type": "Organization", "source": "chunk-002"},
- {"name": "Alice", "type": "Person", "source": "chunk-003"},
- {"name": "Finance Department", "type": "Organization", "source": "chunk-003"}
- ],
- "relationships": [
- {"source": "John", "target": "Database Team", "relation": "heads"},
- {"source": "John", "target": "PostgreSQL", "relation": "specializes in"},
- {"source": "John", "target": "MySQL", "relation": "specializes in"},
- {"source": "Mike", "target": "Frontend Team", "relation": "belongs to"},
- {"source": "Mike", "target": "John", "relation": "collaborates"},
- {"source": "Alice", "target": "Finance Department", "relation": "belongs to"}
- ]
-}
-```
-
-**Step 3: Connected Component Analysis**
-
-```
-Connected Component 1 (Technical Department):
-- Entities: John, Mike, Database Team, Frontend Team, PostgreSQL, MySQL
-- Relationships: 6
-
-Connected Component 2 (Finance Department):
-- Entities: Alice, Finance Department
-- Relationships: 1
-```
-
-**Step 4: Concurrent Merging**
-
-Two components can process in parallel!
-
-**Step 5: Final Knowledge Graph**
-
-```mermaid
-graph LR
- subgraph Technical
- John -->|heads| DatabaseTeam[Database Team]
- John -->|specializes in| PostgreSQL
- John -->|specializes in| MySQL
- Mike -->|belongs to| FrontendTeam[Frontend Team]
- Mike -->|collaborates| John
- end
-
- subgraph Finance
- Alice -->|belongs to| FinanceDept[Finance Department]
- end
-
- style Technical fill:#bbdefb
- style Finance fill:#c5e1a5
-```
-
-### Performance Optimization Features
-
-1. **Fine-grained Concurrency Control**
- - Entity-level locks: `entity:John:collection_abc`
- - Lock only during merging, fully parallel during extraction
-
-2. **Connected Component Concurrency**
- - Technical and Finance departments can process in parallel
- - Zero lock contention, full multi-core CPU utilization
-
-3. **Smart Summarization**
- - Description < 2000 tokens: Keep original
- - Description > 2000 tokens: LLM summary compression
-
-## 7. Performance Optimization Strategies
-
-### 7.1 Concurrency Control
-
-Graph index construction involves extensive LLM calls and database operations requiring proper concurrency control.
-
-**Concurrency Hierarchy**:
-
-```mermaid
-graph TB
- A[Document-level Concurrency] --> B[Chunk-level Concurrency]
- B --> C[Component-level Concurrency]
- C --> D[Entity-level Concurrency]
-
- A1[Celery Workers
Multiple docs simultaneously] --> A
- B1[LLM Concurrent Calls
Multiple chunks simultaneously] --> B
- C1[Parallel Component Merging
Multiple components simultaneously] --> C
- D1[Concurrent Entity Merging
Different entities simultaneously] --> D
-
- style A fill:#e3f2fd
- style B fill:#fff3e0
- style C fill:#f3e5f5
- style D fill:#e8f5e9
-```
-
-**Concurrency Parameters**:
-
-| Parameter | Default | Description |
-|-----------|---------|-------------|
-| `llm_model_max_async` | 20 | LLM concurrent calls |
-| `embedding_func_max_async` | 16 | Embedding concurrent calls |
-| `max_batch_size` | 32 | Batch processing size |
-
-**Tuning Recommendations**:
-
-```python
-# Scenario 1: Strict LLM API rate limits
-llm_model_max_async = 5 # Reduce concurrency to avoid rate limiting
-
-# Scenario 2: Sufficient performance, want speedup
-llm_model_max_async = 50 # Increase concurrency to speed up processing
-
-# Scenario 3: Limited memory
-max_batch_size = 16 # Reduce batch size to lower memory usage
-```
-
-### 7.2 LLM Call Optimization
-
-LLM calls are the most time-consuming part, main optimization strategies:
-
-1. **Concurrent Calls**: Multiple chunks extract simultaneously (default 20 concurrent)
-2. **Batch Processing**: Reduce LLM call count
-3. **Cache Reuse**: Reuse summary results for similar descriptions
-
-**Performance Boost**: Concurrent calling is 4x faster than serial.
-
-### 7.3 Storage Optimization
-
-Batch writing significantly improves performance:
-
-| Method | 100 Entity Write Time |
-|--------|---------------------|
-| Individual Write | ~10 seconds |
-| Batch Write (32/batch) | ~1 second |
-
-**Optimization Effect**: 10x speedup!
-
-### 7.4 Memory Optimization
-
-Memory management strategies for large documents:
-
-- Stream chunking: Don't load entire document at once
-- Immediate release: Free memory immediately after processing
-- Batch processing: Control memory peaks
-
-### 7.5 Performance Monitoring
-
-System outputs detailed performance statistics:
-
-```
-Graph Index Construction Complete:
-✓ Document Chunking: 10 chunks, 0.5 seconds
-✓ Entity Extraction: 120 entities, 25 seconds
-✓ Relationship Extraction: 85 relationships, 25 seconds
-✓ Concurrent Merging: 15 seconds
-✓ Storage Writing: 2 seconds
-━━━━━━━━━━━━━━━━━━━━━━━━━
-Total: 42.7 seconds
-```
-
-**Bottleneck Analysis**: Entity/relationship extraction takes 60% of time, can optimize by increasing LLM concurrency.
-
-## 8. Configuration Parameters
-
-### 8.1 Core Configuration
-
-Graph index construction can be tuned with the following parameters:
-
-**Chunking Parameters**:
-
-```python
-# Chunk size (tokens)
-CHUNK_TOKEN_SIZE = 1200
-
-# Overlap size (tokens)
-CHUNK_OVERLAP_TOKEN_SIZE = 100
-```
-
-**Tuning Recommendations**:
-- Small docs (< 5000 tokens): `CHUNK_TOKEN_SIZE = 800`
-- Large docs (> 50000 tokens): `CHUNK_TOKEN_SIZE = 1500`
-- Need more context: Increase `CHUNK_OVERLAP_TOKEN_SIZE`
-
-**Concurrency Parameters**:
-
-```python
-# LLM concurrent calls
-LLM_MODEL_MAX_ASYNC = 20
-
-# Embedding concurrent calls
-EMBEDDING_FUNC_MAX_ASYNC = 16
-
-# Batch processing size
-MAX_BATCH_SIZE = 32
-```
-
-**Tuning Recommendations**:
-- Strict LLM API limits: Lower `LLM_MODEL_MAX_ASYNC` to 5-10
-- Sufficient performance for speedup: Increase to 50-100
-- Limited memory: Lower `MAX_BATCH_SIZE` to 16
-
-**Entity Extraction Parameters**:
-
-```python
-# Entity extraction retry count (0 = extract once only)
-ENTITY_EXTRACT_MAX_GLEANING = 0
-
-# Summary max tokens
-SUMMARY_TO_MAX_TOKENS = 2000
-
-# Force summary description fragment count
-FORCE_LLM_SUMMARY_ON_MERGE = 10
-```
-
-**Tuning Recommendations**:
-- Extraction quality important: `ENTITY_EXTRACT_MAX_GLEANING = 1` (extract twice)
-- Speed priority: `ENTITY_EXTRACT_MAX_GLEANING = 0`
-- Descriptions often long: Lower `SUMMARY_TO_MAX_TOKENS` to 1000
-
-### 8.2 Knowledge Graph Configuration
-
-Configure in Collection settings:
-
-```json
-{
- "knowledge_graph_config": {
- "language": "English",
- "entity_types": [
- "organization",
- "person",
- "geo",
- "event",
- "product",
- "technology",
- "date",
- "category"
- ]
- }
-}
-```
-
-**Parameter Description**:
-
-- **language**: Extraction language, affects LLM prompts
- - `English`: English
- - `simplified chinese`: Simplified Chinese
- - `traditional chinese`: Traditional Chinese
-
-- **entity_types**: Entity types to extract
- - Default: 8 types (organization, person, location, event, product, technology, date, category)
- - Customizable: e.g., extract only people and organizations
-
-### 8.3 Storage Configuration
-
-Configure storage backends via environment variables:
-
-```bash
-# KV storage (key-value)
-export GRAPH_INDEX_KV_STORAGE=PGOpsSyncKVStorage
-
-# Vector storage
-export GRAPH_INDEX_VECTOR_STORAGE=PGOpsSyncVectorStorage
-
-# Graph storage
-export GRAPH_INDEX_GRAPH_STORAGE=Neo4JSyncStorage
-# Or use PostgreSQL
-export GRAPH_INDEX_GRAPH_STORAGE=PGOpsSyncGraphStorage
-```
-
-**Storage Selection Recommendations**:
-
-| Scenario | KV Storage | Vector Storage | Graph Storage |
-|----------|-----------|----------------|---------------|
-| **Default** | PostgreSQL | PostgreSQL | PostgreSQL |
-| **High-performance Vector Search** | PostgreSQL | Qdrant | Neo4j |
-| **Large-scale Graph** | PostgreSQL | Qdrant | Neo4j |
-| **Simple Deployment** | PostgreSQL | PostgreSQL | PostgreSQL |
-
-### 8.4 Complete Configuration Example
-
-```bash
-# Chunking configuration
-export CHUNK_TOKEN_SIZE=1200
-export CHUNK_OVERLAP_TOKEN_SIZE=100
-
-# Concurrency configuration
-export LLM_MODEL_MAX_ASYNC=20
-export MAX_BATCH_SIZE=32
-
-# Extraction configuration
-export ENTITY_EXTRACT_MAX_GLEANING=0
-export SUMMARY_TO_MAX_TOKENS=2000
-
-# Storage configuration
-export GRAPH_INDEX_KV_STORAGE=PGOpsSyncKVStorage
-export GRAPH_INDEX_VECTOR_STORAGE=PGOpsSyncVectorStorage
-export GRAPH_INDEX_GRAPH_STORAGE=PGOpsSyncGraphStorage
-
-# Database connection (PostgreSQL)
-export POSTGRES_HOST=127.0.0.1
-export POSTGRES_PORT=5432
-export POSTGRES_DB=aperag
-export POSTGRES_USER=postgres
-export POSTGRES_PASSWORD=your_password
-
-# Database connection (Neo4j, optional)
-export NEO4J_HOST=127.0.0.1
-export NEO4J_PORT=7687
-export NEO4J_USERNAME=neo4j
-export NEO4J_PASSWORD=your_password
-```
-
-## 9. Practical Application Scenarios
-
-Graph index is particularly suitable for these scenarios:
-
-### 9.1 Enterprise Knowledge Base
-
-**Scenario**: Companies have extensive documentation including organizational structure, project materials, technical docs.
-
-**Graph Index Value**:
-
-- 📊 **Organizational Relationships**: "Who is on John's team?" → Quickly find team members
-- 🔗 **Collaboration Networks**: "Who has worked with John?" → Discover work networks
-- 🛠️ **Skill Mapping**: "Who is skilled in PostgreSQL?" → Locate technical experts
-- 📁 **Project History**: "Which projects has John participated in?" → Track project experience
-
-**Real Effect**:
-
-```
-Question: "Who leads the database team?"
-Traditional Search: Returns dozens of paragraphs containing "database team" and "lead"
-Graph Index: Directly returns "John" + relevant background info
-```
-
-### 9.2 Research and Learning
-
-**Scenario**: Analyzing academic papers and technical documentation to understand knowledge lineage.
-
-**Graph Index Value**:
-
-- 👥 **Author Networks**: "Who has this author collaborated with?" → Discover research teams
-- 📖 **Citation Relationships**: "What papers does this cite?" → Trace research lineage
-- 🔬 **Technology Evolution**: "How has this technology evolved?" → Understand tech history
-- 💡 **Concept Connections**: "What's the relationship between tech A and B?" → Connect knowledge points
-
-**Query Examples**:
-
-```
-User: "What research is related to Graph RAG?"
-Graph Index: Query papers --research--> Graph RAG relationships
-Result: Paper A, Paper B, Paper C
-
-User: "Who has an author collaborated with?"
-Graph Index: Query author --collaborates--> other authors relationships
-Result: Collaborator list and collaboration projects
-```
-
-### 9.3 Products and Services
-
-**Scenario**: Product documentation, user manuals, API documentation.
-
-**Graph Index Value**:
-
-- ⚙️ **Feature Dependencies**: "What needs configuration before enabling feature A?" → Understand dependencies
-- 🔧 **Configuration Relationships**: "Which features does this config affect?" → Avoid misconfigurations
-- 🐛 **Problem Diagnosis**: "What might cause error X?" → Quick troubleshooting
-- 📚 **API Relationships**: "Which APIs are typically used together?" → Learn best practices
-
-**Query Examples**:
-
-```
-User: "How to configure graph index?"
-Graph Index: Query config items --affects--> graph index relationships
-Result: GRAPH_INDEX_GRAPH_STORAGE, knowledge_graph_config
-
-User: "What's the difference between Neo4j and PostgreSQL?"
-Graph Index: Query Neo4j, PostgreSQL properties and relationships
-Result: Performance comparison, applicable scenarios, configuration methods
-```
-
-### 9.4 Conversation Scenario Comparison
-
-Let's see how different retrieval methods perform in actual conversations:
-
-**Question: "What's the relationship between John and Mike?"**
-
-| Retrieval Method | Can Answer | Answer Quality |
-|-----------------|-----------|----------------|
-| **Pure Vector Search** | ⚠️ Partial | Finds paragraphs mentioning both, but unclear relationship |
-| **Pure Full-text Search** | ⚠️ Partial | Finds paragraphs containing "John" and "Mike" |
-| **Graph Index** | ✅ Yes | Directly returns: John and Mike have a collaboration relationship |
-
-**Question: "Where is the PostgreSQL config file?"**
-
-| Retrieval Method | Can Answer | Answer Quality |
-|-----------------|-----------|----------------|
-| **Pure Vector Search** | ✅ Yes | Finds relevant config paragraphs |
-| **Pure Full-text Search** | ✅ Yes | Exact match "PostgreSQL" and "config" |
-| **Graph Index** | ✅ Yes | Finds PostgreSQL --config--> file relationships |
-
-**Question: "How to improve system performance?"**
-
-| Retrieval Method | Can Answer | Answer Quality |
-|-----------------|-----------|----------------|
-| **Pure Vector Search** | ✅ Strong | Finds all performance optimization content |
-| **Pure Full-text Search** | ⚠️ Medium | Needs exact keywords "performance", "optimize" |
-| **Graph Index** | ✅ Strong | Finds optimization methods --improves--> performance relationships |
-
-**Best Practice**: Combine multiple retrieval methods!
-
-## 10. Summary
-
-ApeRAG's graph index provides production-grade knowledge graph construction capabilities with high performance, reliability, and scalability.
-
-### Key Features
-
-1. **Workspace data isolation**: Each Collection completely independent, supporting true multi-tenancy
-2. **Stateless architecture**: Each task independent instance, zero state pollution
-3. **Connected component concurrency**: Intelligent concurrency strategy, 2-3x performance boost
-4. **Fine-grained lock management**: Entity-level locks, maximizing concurrency
-5. **Smart summarization**: Automatically compress overly long descriptions, saving storage and improving retrieval efficiency
-6. **Multi-storage support**: Flexible choice between Neo4j or PostgreSQL
-
-### Suitable Scenarios
-
-- ✅ **Enterprise Knowledge Base**: Understanding organizational structure, personnel relationships, project history
-- ✅ **Research Paper Analysis**: Author collaboration networks, citation relationships, research lineage
-- ✅ **Product Documentation**: Feature dependencies, configuration relationships, problem diagnosis
-- ✅ **Any scenario requiring "relationship" understanding**
-
-### Performance
-
-- Process 10,000 entities: approximately 2-5 minutes (depending on LLM speed)
-- Connected component concurrency: 2-3x performance boost
-- Memory usage: approximately 400 MB (10,000 entities)
-- Storage space: approximately 100 MB (10,000 entities)
-
-### Next Steps
-
-After graph index construction completes, you can perform graph queries. ApeRAG supports three graph query modes:
-
-- **Local Mode**: Query local information about an entity
-- **Global Mode**: Query overall relationships and patterns
-- **Hybrid Mode**: Comprehensive queries
-
-For detailed retrieval process, see [System Architecture Documentation](./architecture.md#42-knowledge-graph-query).
-
----
-
-## Related Documentation
-
-- 📋 [System Architecture](./architecture.md) - ApeRAG overall architecture design
-- 📖 [Entity Extraction and Merging Mechanism](./lightrag_entity_extraction_and_merging.md) - Core algorithm details
-- 🔗 [Connected Component Optimization](./connected_components_optimization.md) - Concurrency optimization principles
-- 🌐 [Index Pipeline Architecture](./indexing_architecture.md) - Complete indexing process
diff --git a/docs/en-US/design/indexing_architecture.md b/docs/en-US/design/indexing_architecture.md
deleted file mode 100644
index 64e6987bf..000000000
--- a/docs/en-US/design/indexing_architecture.md
+++ /dev/null
@@ -1,556 +0,0 @@
-# ApeRAG Indexing Pipeline Architecture Design
-
-## Overview
-
-ApeRAG's indexing pipeline architecture adopts a dual-chain design pattern, separating index management into Frontend Chain and Backend Chain, implementing asynchronous document indexing through state-driven reconciliation. The frontend chain handles fast user operation responses and sets desired index states, while the backend chain detects state differences through a periodic reconciler and schedules asynchronous tasks to execute actual indexing operations.
-
-> 🚀 **Deep Dive**: To understand the detailed Graph Index creation process, continue reading [Graph Index Creation Process Technical Documentation](./graph_index_creation.md)
-
-## Architecture Overview
-
-```mermaid
-graph TB
- subgraph "Frontend Chain (Synchronous Fast Response)"
- A[API Request] --> B[IndexManager]
- B --> C[Write to DocumentIndex Table]
- C --> D[Set status=PENDING, version++]
- end
-
- subgraph "Backend Chain (Asynchronous Task Processing)"
- E[Periodic Task reconcile_indexes_task] --> F[IndexReconciler.reconcile_all]
- F --> G[Detect version mismatch or status change needed]
- G --> H[TaskScheduler schedules async tasks]
- end
-
- subgraph "Celery Task Execution Layer"
- H --> I[create_document_indexes_workflow]
- I --> J[parse_document_task]
- J --> K[trigger_create_indexes_workflow]
- K --> L[group parallel execution]
- L --> M[create_index_task.VECTOR]
- L --> N[create_index_task.FULLTEXT]
- L --> O[create_index_task.GRAPH]
- M --> P[chord callback]
- N --> P
- O --> P
- P --> Q[notify_workflow_complete]
- end
-
- subgraph "State Feedback"
- Q --> R[IndexTaskCallbacks]
- R --> S[Update status=ACTIVE, observed_version]
- S --> T[Next reconciliation check]
- end
-
- D -.-> E
- T -.-> E
-```
-
-## Core Design Principles
-
-### 1. Dual Chain Separation
-
-**Frontend Chain**:
-- **Goal**: Fast response to user operations without blocking API requests
-- **Implementation**: Only operates on database tables, sets desired state, returns immediately
-- **Code**: `IndexManager` in `aperag/index/manager.py`
-
-**Backend Chain**:
-- **Goal**: Asynchronously execute time-consuming indexing operations with retry and error recovery support
-- **Implementation**: Continuously scans state differences through periodic tasks and schedules async tasks
-- **Code**: `IndexReconciler` in `aperag/index/reconciler.py`
-
-### 2. Single Status State-Driven Reconciliation
-
-Records index state and version for each document index through the `DocumentIndex` database table:
-
-```python
-class DocumentIndex(BaseModel):
- document_id: str
- index_type: DocumentIndexType # VECTOR/FULLTEXT/GRAPH
- status: DocumentIndexStatus # PENDING/CREATING/ACTIVE/DELETING/DELETION_IN_PROGRESS/FAILED
- version: int # Version number, increment to trigger rebuild
- observed_version: int # Last processed version
-```
-
-Key Status Meanings:
-- **PENDING**: Awaiting processing (create/update needed)
-- **CREATING**: Task claimed, creation/update in progress
-- **ACTIVE**: Index is up-to-date and ready for use
-- **DELETING**: Deletion has been requested
-- **DELETION_IN_PROGRESS**: Task claimed, deletion in progress
-- **FAILED**: The last operation failed
-
-The reconciler periodically scans all records and triggers corresponding operations based on:
-- Version mismatch: `observed_version < version` indicates need for update
-- Version = 1 with observed_version = 0: indicates need for initial creation
-- Status = DELETING: indicates need for deletion
-
-### 3. TaskScheduler Abstraction Layer Design
-
-**Design Advantages**:
-- **Business Logic and Task System Decoupling**: Reconciler only cares about "what operations to execute", not "what system to execute with"
-- **Multi-scheduler Support**: Can switch between Celery, Prefect/Airflow and other workflow engines
-- **Test-friendly**: Can use different schedulers for testing environments
-
-```python
-# Abstract interface
-class TaskScheduler(ABC):
- def schedule_create_index(self, document_id: str, index_types: List[str], context: dict = None) -> str
- def schedule_update_index(self, document_id: str, index_types: List[str], context: dict = None) -> str
- def schedule_delete_index(self, document_id: str, index_types: List[str]) -> str
-
-# Reconciler uses abstract interface
-class IndexReconciler:
- def __init__(self, scheduler_type: str = "celery"):
- self.task_scheduler = create_task_scheduler(scheduler_type)
-
- def _reconcile_document_operations(self, document_id: str, claimed_indexes: List[dict]):
- # Only calls abstract interface, doesn't care about specific implementation
- if create_types:
- self.task_scheduler.schedule_create_index(document_id, create_types, context)
- if update_types:
- self.task_scheduler.schedule_update_index(document_id, update_types, context)
-```
-
-**Celery Task Entry Point and Business Code Separation**:
-- Celery task functions (`config/celery_tasks.py`): Handle task scheduling, parameter serialization, error retry
-- Business logic (`aperag/tasks/document.py`): Handle specific index creation logic
-- This separation enables independent testing of business logic and facilitates migration between different task systems
-
-### 4. Create vs Update Operation Distinction
-
-The system clearly distinguishes between create and update operations:
-
-**Create Operations** (version = 1, observed_version = 0):
-- For new documents or new index types
-- Uses `schedule_create_index` and `create_index_task`
-- Initial index creation from scratch
-
-**Update Operations** (version > 1, observed_version < version):
-- For existing indexes that need rebuilding
-- Uses `schedule_update_index` and `update_index_task`
-- Updates existing index with new content
-
-This distinction allows for different processing strategies and optimizations for each operation type.
-
-## Asynchronous Task System
-
-### Current Asynchronous Task List
-
-ApeRAG currently defines the following asynchronous tasks, each with clear responsibility division:
-
-| Task Name | Function | Retry Count | Location |
-|-----------|----------|-------------|----------|
-| `parse_document_task` | Parse document content, extract text and metadata | 3 times | config/celery_tasks.py |
-| `create_index_task` | Create single type index (VECTOR/FULLTEXT/GRAPH) | 3 times | config/celery_tasks.py |
-| `update_index_task` | Update single type index | 3 times | config/celery_tasks.py |
-| `delete_index_task` | Delete single type index | 3 times | config/celery_tasks.py |
-| `trigger_create_indexes_workflow` | Dynamic fan-out for index creation tasks | No retry | config/celery_tasks.py |
-| `trigger_update_indexes_workflow` | Dynamic fan-out for index update tasks | No retry | config/celery_tasks.py |
-| `trigger_delete_indexes_workflow` | Dynamic fan-out for index deletion tasks | No retry | config/celery_tasks.py |
-| `notify_workflow_complete` | Aggregate workflow results and notify completion | No retry | config/celery_tasks.py |
-| `reconcile_indexes_task` | Periodic reconciler task | No retry | config/celery_tasks.py |
-
-### Task Design Principles
-
-1. **Fine-grained Tasks**: Each index type (VECTOR/FULLTEXT/GRAPH) is an independent task, supporting individual retries
-2. **Dynamic Orchestration**: Use trigger tasks to decide which index tasks to execute at runtime
-3. **Layered Retry**: Business tasks support retry, orchestration tasks don't retry
-4. **State Callbacks**: Each task calls back to update database state upon completion
-5. **Version Validation**: Tasks validate version numbers to prevent stale operations
-
-### Concurrent Execution Design
-
-#### Celery Group + Chord Pattern
-
-Use Celery's `group` for parallel execution and `chord` for result aggregation:
-
-```python
-# Group: Execute multiple index tasks in parallel with context
-parallel_index_tasks = group([
- create_index_task.s(document_id, index_type, parsed_data_dict, context)
- for index_type in index_types
-])
-
-# Chord: Execute callback after all parallel tasks complete
-workflow_chord = chord(
- parallel_index_tasks,
- notify_workflow_complete.s(document_id, "create", index_types)
-)
-```
-
-#### Task Chaining Mechanism
-
-Use Celery's `chain` for task chaining and `signature` for parameter passing:
-
-```python
-# Chained execution: parse -> dynamic fan-out with context
-workflow_chain = chain(
- parse_document_task.s(document_id),
- trigger_create_indexes_workflow.s(document_id, index_types, context)
-)
-```
-
-#### Parameter Passing and Context Flow
-
-```python
-# Context includes version information for each index type
-context = {
- "VECTOR_version": 2,
- "FULLTEXT_version": 1,
- "GRAPH_version": 3
-}
-
-# Each index task extracts its specific version from context
-def create_index_task(document_id, index_type, parsed_data_dict, context):
- target_version = context.get(f'{index_type}_version')
- # Validate version before processing
-```
-
-## Specific Execution Flow Examples
-
-### Index Creation Execution Flow
-
-Taking user document upload triggering index creation as example:
-
-```python
-# 1. Frontend Chain (Synchronous, millisecond-level)
-API Call -> IndexManager.create_indexes()
- ↓
-Write DocumentIndex table records:
-{
- document_id: "doc123",
- index_type: "VECTOR",
- status: "PENDING",
- version: 1,
- observed_version: 0
-}
- ↓
-API returns 200 immediately
-
-# 2. Backend Chain (Asynchronous, minute-level)
-Periodic task reconcile_indexes_task (executes every 30 seconds)
- ↓
-IndexReconciler.reconcile_all()
- ↓
-Detects version=1, observed_version=0 (create operation needed)
- ↓
-CeleryTaskScheduler.schedule_create_index(doc123, ["VECTOR", "FULLTEXT", "GRAPH"], context)
- ↓
-create_document_indexes_workflow.delay()
-
-# 3. Celery Task Execution (Asynchronous, minutes to hours)
-parse_document_task("doc123")
-├── Download document file to local temp directory
-├── Call docparser to parse document content
-├── Return ParsedDocumentData.to_dict()
-└── Update status="CREATING"
- ↓
-trigger_create_indexes_workflow(parsed_data, "doc123", ["VECTOR", "FULLTEXT", "GRAPH"], context)
-├── Create group parallel tasks with version context
-└── Start chord waiting
- ↓
-Parallel execution:
-├── create_index_task("doc123", "VECTOR", parsed_data, context)
-│ ├── Extract VECTOR_version from context
-│ ├── Validate version still matches database
-│ ├── Call vector_indexer.create_index()
-│ ├── Generate embeddings and store in vector database
-│ └── Callback IndexTaskCallbacks.on_index_created(target_version)
-├── create_index_task("doc123", "FULLTEXT", parsed_data, context)
-│ ├── Extract FULLTEXT_version from context
-│ ├── Validate version still matches database
-│ ├── Call fulltext_indexer.create_index()
-│ ├── Build full-text search index
-│ └── Callback IndexTaskCallbacks.on_index_created(target_version)
-└── create_index_task("doc123", "GRAPH", parsed_data, context)
- ├── Extract GRAPH_version from context
- ├── Validate version still matches database
- ├── Call graph_indexer.create_index()
- ├── Build knowledge graph
- └── Callback IndexTaskCallbacks.on_index_created(target_version)
- ↓
-notify_workflow_complete([result1, result2, result3], "doc123", "create", ["VECTOR", "FULLTEXT", "GRAPH"])
-├── Aggregate all index task results
-├── Log workflow completion
-└── Return WorkflowResult
-```
-
-### Index Update Execution Flow
-
-User modifies document content triggering index update:
-
-```python
-# 1. Frontend Chain
-API Call -> IndexManager.rebuild_indexes()
- ↓
-All existing index records version field +1:
-version: 1 -> 2 (triggers rebuild)
- ↓
-API returns immediately
-
-# 2. Backend Chain
-reconcile_indexes_task detects version mismatch
- ↓
-version=2, observed_version=1, version > 1 (update operation needed)
- ↓
-schedule_update_index() -> update_document_indexes_workflow()
-
-# 3. Task Execution (similar to creation but with update tasks)
-parse_document_task -> trigger_update_indexes_workflow -> parallel update_index_task
-```
-
-### Index Deletion Execution Flow
-
-User deletes document triggering index deletion:
-
-```python
-# 1. Frontend Chain
-API Call -> IndexManager.delete_indexes()
- ↓
-Set status="DELETING"
- ↓
-API returns immediately
-
-# 2. Backend Chain
-Detects status=DELETING
- ↓
-schedule_delete_index() -> delete_document_indexes_workflow()
-
-# 3. Task Execution (no parsing needed)
-trigger_delete_indexes_workflow -> parallel delete_index_task
-├── Delete embeddings from vector database
-├── Delete documents from full-text search engine
-└── Delete nodes and relationships from knowledge graph
-```
-
-## Exception Handling Mechanism
-
-### Task-level Exception Handling
-
-Each Celery task is configured with automatic retry:
-
-```python
-@current_app.task(bind=True, autoretry_for=(Exception,), retry_kwargs={'max_retries': 3, 'countdown': 60})
-def create_index_task(self, document_id: str, index_type: str, parsed_data_dict: dict, context: dict = None):
- try:
- # Extract and validate version from context
- target_version = context.get(f'{index_type}_version') if context else None
-
- # Double-check version still matches database before processing
- # ... version validation logic ...
-
- # Business logic
- result = document_index_task.create_index(document_id, index_type, parsed_data)
- if result.success:
- self._handle_index_success(document_id, index_type, target_version, result.data)
- else:
- # Business logic failure but don't throw exception to avoid meaningless retry
- if self.request.retries >= self.max_retries:
- self._handle_index_failure(document_id, index_type, result.error)
- return result.to_dict()
- except Exception as e:
- # Only mark as failed after retry attempts are exhausted
- if self.request.retries >= self.max_retries:
- self._handle_index_failure(document_id, index_type, str(e))
- raise # Continue throwing exception to trigger retry
-```
-
-### Workflow-level Exception Handling
-
-Aggregate errors through `notify_workflow_complete`:
-
-```python
-def notify_workflow_complete(self, index_results: List[dict], document_id: str, operation: str, index_types: List[str]):
- successful_tasks = []
- failed_tasks = []
-
- for result_dict in index_results:
- result = IndexTaskResult.from_dict(result_dict)
- if result.success:
- successful_tasks.append(result.index_type)
- else:
- failed_tasks.append(f"{result.index_type}: {result.error}")
-
- # Determine overall status
- if not failed_tasks:
- status = TaskStatus.SUCCESS # All successful
- elif successful_tasks:
- status = TaskStatus.PARTIAL_SUCCESS # Partial success
- else:
- status = TaskStatus.FAILED # All failed
-```
-
-### State Management and Error Recovery
-
-Track errors through database state with version validation:
-
-```python
-class IndexTaskCallbacks:
- @staticmethod
- def on_index_created(document_id: str, index_type: str, target_version: int, index_data: str = None):
- """Task success callback with version validation"""
- # Use atomic update with version validation
- update_stmt = (
- update(DocumentIndex)
- .where(
- and_(
- DocumentIndex.document_id == document_id,
- DocumentIndex.index_type == DocumentIndexType(index_type),
- DocumentIndex.status == DocumentIndexStatus.CREATING,
- DocumentIndex.version == target_version, # Critical: validate version
- )
- )
- .values(
- status=DocumentIndexStatus.ACTIVE,
- observed_version=target_version, # Mark this version as processed
- index_data=index_data,
- error_message=None,
- )
- )
-```
-
-### Error Recovery Strategies
-
-1. **Automatic Retry**: Task-level 3 automatic retries to handle temporary network or resource issues
-2. **Version Validation**: Prevents stale operations through version checking at task execution time
-3. **State Reset**: Users can manually reset failed state to trigger re-execution
-4. **Partial Retry**: Only retry failed index types without affecting successful indexes
-5. **Degraded Handling**: Some index failures don't affect document searchability (e.g., graph index failure but vector index success)
-
-## Code Organization Structure
-
-### Directory Structure
-
-```
-aperag/
-├── index/ # Index management core module
-│ ├── manager.py # Index manager (frontend operations)
-│ ├── reconciler.py # Backend reconciler
-│ ├── base.py # Indexer base class definition
-│ ├── vector_index.py # Vector index implementation
-│ ├── fulltext_index.py # Full-text index implementation
-│ └── graph_index.py # Graph index implementation
-├── tasks/ # Task-related modules
-│ ├── models.py # Task data structure definitions
-│ ├── scheduler.py # Task scheduler abstraction layer
-│ ├── document.py # Document processing business logic
-│ └── utils.py # Task utility functions
-└── db/
- └── models.py # Database model definitions
-
-config/
-└── celery_tasks.py # Celery task definitions
-```
-
-### Core Interface Design
-
-#### Index Management Interface
-```python
-# aperag/index/manager.py
-class IndexManager:
- def create_indexes(self, document_id, index_types, created_by, session)
- def rebuild_indexes(self, document_id, index_types, created_by, session)
- def delete_indexes(self, document_id, index_types, session)
-```
-
-#### Reconciler Interface
-```python
-# aperag/index/reconciler.py
-class IndexReconciler:
- def reconcile_all(self) # Main reconciliation loop
- def _get_indexes_needing_reconciliation(self, session) # Get indexes needing reconciliation
- def _reconcile_single_document(self, document_id, operations) # Process single document
-```
-
-#### Indexer Interface
-```python
-# aperag/index/base.py
-class BaseIndexer(ABC):
- def create_index(self, document_id, content, doc_parts, collection, **kwargs)
- def update_index(self, document_id, content, doc_parts, collection, **kwargs)
- def delete_index(self, document_id, collection, **kwargs)
- def is_enabled(self, collection)
-```
-
-### Data Flow Design
-
-#### Task Data Structures
-```python
-# aperag/tasks/models.py
-@dataclass
-class ParsedDocumentData:
- """Document parsing result, carries data needed by all index tasks"""
- document_id: str
- collection_id: str
- content: str # Parsed text content
- doc_parts: List[Any] # Chunked document segments
- file_path: str # Local file path
- local_doc_info: LocalDocumentInfo # Temporary file information
-
-@dataclass
-class IndexTaskResult:
- """Single index task execution result"""
- status: TaskStatus
- index_type: str
- document_id: str
- success: bool
- data: Optional[Dict[str, Any]] # Index metadata (e.g., vector count)
- error: Optional[str]
-
-@dataclass
-class WorkflowResult:
- """Entire workflow execution result"""
- workflow_id: str
- document_id: str
- operation: str # create/update/delete
- status: TaskStatus
- successful_indexes: List[str]
- failed_indexes: List[str]
- index_results: List[IndexTaskResult]
-```
-
-## Current Implementation Status
-
-### Simplified Architecture Features
-
-1. **Removed Distributed Locking**: Current implementation focuses on correctness through version validation rather than distributed locks for external resource concurrency
-2. **Single Status Model**: Simplified from dual-state (desired/actual) to single status with version tracking
-3. **Clear Operation Separation**: Explicit distinction between create (v=1) and update (v>1) operations
-4. **Version-based Validation**: Prevents stale operations through version checking at task execution time
-
-### Future Considerations
-
-1. **Concurrency Control**: While distributed locking has been removed for simplicity, future implementations may need to address concurrent operations on external systems (vector databases, search engines, etc.)
-2. **Performance Optimization**: The current architecture prioritizes correctness and simplicity over maximum performance
-3. **Monitoring Enhancements**: Additional monitoring and alerting capabilities may be added as the system scales
-
-## Summary
-
-ApeRAG's indexing pipeline architecture achieves efficient document indexing through the following technical design:
-
-### Core Advantages
-
-1. **Fast Response**: Frontend chain only operates on database, API response time controlled at millisecond level
-2. **Strong Processing Capability**: Backend asynchronous processing supports large-scale document indexing, improving throughput through parallel tasks
-3. **Good Error Recovery**: Multi-level retry mechanisms and version-based state management, supporting graceful handling of partial failure scenarios
-4. **Strong System Decoupling**: TaskScheduler abstraction layer decouples business logic from specific task systems
-5. **Version Consistency**: Version validation prevents stale operations and ensures data consistency
-
-### Technical Features
-
-1. **State-driven**: Achieves eventual consistency through detection of version mismatches and status changes
-2. **Dynamic Orchestration**: Dynamically creates index tasks at runtime based on document parsing results, avoiding static workflow limitations
-3. **Batch Optimization**: Multiple index tasks for the same document share parsing results, reducing redundant computation
-4. **Layered Design**: Task scheduling, business logic, and index implementation are decoupled in layers for easy testing and maintenance
-5. **Operation Distinction**: Clear separation of create vs update operations allows for optimized processing strategies
-
-This architecture provides good performance and scalability support for high-concurrency document indexing scenarios while ensuring system reliability and maintainability.
-
----
-
-## Related Documents
-
-- 🚀 [Graph Index Creation Process Technical Documentation](./graph_index_creation.md) - Deep dive into the detailed graph index construction process
-- 📋 [索引链路架构设计](./indexing_architecture_zh.md) - Chinese Version
\ No newline at end of file
diff --git a/docs/en-US/design/lightrag_entity_extraction_and_merging.md b/docs/en-US/design/lightrag_entity_extraction_and_merging.md
deleted file mode 100644
index 48debe0ed..000000000
--- a/docs/en-US/design/lightrag_entity_extraction_and_merging.md
+++ /dev/null
@@ -1,696 +0,0 @@
-# LightRAG Entity Extraction and Merging Mechanism
-
-> 📖 **Supplementary Reading**: This document is a technical supplement to [Indexing Architecture Design](./indexing_architecture.md), focusing on the entity extraction and merging mechanisms in the Graph Index creation process.
-
-## Overview
-
-This document details the working principles of two core functions in the LightRAG system:
-- `extract_entities`: Extract entities and relationships from text chunks
-- `merge_nodes_and_edges`: Merge extraction results and update knowledge graph
-
-These two functions constitute the core components of the [Graph Index Creation Process](./graph_index_creation.md), responsible for converting unstructured text into structured knowledge graphs.
-
-## Entity Extraction Mechanism (extract_entities)
-
-### Core Workflow
-
-#### 1. Concurrent Processing Strategy
-```python
-# Use semaphore to control concurrency, avoiding LLM service overload
-semaphore = asyncio.Semaphore(llm_model_max_async)
-
-# Create async tasks for each text chunk
-tasks = [
- asyncio.create_task(_process_single_content(chunk, context))
- for chunk in ordered_chunks
-]
-
-# Wait for all tasks to complete with exception handling
-done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
-```
-
-**Concurrent Task Input Format**:
-```python
-# Input for a single processing task
-chunk_input = {
- "content": "Text content to process",
- "chunk_key": "chunk_unique_identifier",
- "file_path": "source_file_path",
- "context": {
- "entity_extraction_prompt": "Entity extraction prompt",
- "continue_extraction_prompt": "Continue extraction prompt",
- "extraction_config": {...}
- }
-}
-```
-
-**Single Task Output Format**:
-```python
-# Return value of _process_single_content function
-task_result = (
- maybe_nodes, # Dict[str, List[Dict]] - candidate entities
- maybe_edges # Dict[Tuple[str, str], List[Dict]] - candidate relationships
-)
-
-# Example output structure
-maybe_nodes = {
- "John": [{
- "entity_name": "John",
- "entity_type": "Person",
- "description": "Chief Technology Officer",
- "source_id": "chunk_001",
- "file_path": "/docs/company.txt"
- }],
- "ABC Corp": [{
- "entity_name": "ABC Corp",
- "entity_type": "Organization",
- "description": "Technology company",
- "source_id": "chunk_001",
- "file_path": "/docs/company.txt"
- }]
-}
-
-maybe_edges = {
- ("John", "ABC Corp"): [{
- "src_id": "John",
- "tgt_id": "ABC Corp",
- "weight": 1.0,
- "description": "John is the CTO of ABC Corp",
- "keywords": "work, position, leadership",
- "source_id": "chunk_001",
- "file_path": "/docs/company.txt"
- }]
-}
-```
-
-#### 2. Multi-round Extraction Mechanism (Gleaning)
-LightRAG employs a multi-round extraction strategy to improve entity recognition completeness:
-
-1. **Initial Extraction**: Use entity extraction prompts for first-time extraction
-2. **Supplementary Extraction**: Discover missed entities through "continue extraction" prompts
-3. **Stop Decision**: LLM autonomously determines whether to continue extraction
-
-```python
-for glean_index in range(entity_extract_max_gleaning):
- # Supplementary extraction: only accept new entity names
- glean_result = await use_llm_func(continue_prompt, history_messages=history)
-
- # Merge results (deduplication)
- for entity_name, entities in glean_nodes.items():
- if entity_name not in maybe_nodes: # Only accept new entities
- maybe_nodes[entity_name].extend(entities)
-
- # Determine whether to continue
- if_continue = await use_llm_func(if_loop_prompt, history_messages=history)
- if if_continue.strip().lower() != "yes":
- break
-```
-
-**Initial Extraction Stage Output**:
-```python
-# Raw results from the first extraction round
-initial_extraction = {
- "entities": [
- {
- "entity_name": "John",
- "entity_type": "Person",
- "description": "Chief Technology Officer"
- },
- {
- "entity_name": "ABC Corp",
- "entity_type": "Organization",
- "description": "Technology company"
- }
- ],
- "relationships": [
- {
- "src_id": "John",
- "tgt_id": "ABC Corp",
- "description": "Employment relationship",
- "keywords": "employee, company"
- }
- ]
-}
-```
-
-**Supplementary Extraction Stage Output**:
-```python
-# Incremental results from each supplementary extraction round
-glean_extraction = {
- "round": 2, # Extraction round number
- "new_entities": [
- {
- "entity_name": "Product Department", # Must be a completely new entity name
- "entity_type": "Department",
- "description": "Product development department at ABC Corp"
- }
- ],
- "new_relationships": [
- {
- "src_id": "John",
- "tgt_id": "Product Department", # Must be a completely new relationship pair
- "description": "Management relationship",
- "keywords": "responsible, manage"
- }
- ],
- "continue_extraction": "no" # LLM decision on whether to continue
-}
-
-# Key Constraints: Gleaning stage only accepts newly discovered entities and relationships
-# - Existing entity names are ignored: if entity_name not in maybe_nodes
-# - Existing relationship pairs are ignored: if edge_key not in maybe_edges
-# - Does NOT supplement or merge descriptions for existing entities
-```
-
-**Multi-round Extraction Chunk Result**:
-```python
-# Complete results after multi-round extraction for a single chunk (gleaning only adds new entities, no merging)
-final_chunk_result = {
- "chunk_id": "chunk_001",
- "total_rounds": 2,
- "maybe_nodes": {
- "John": [{ # Entity from initial extraction
- "entity_name": "John",
- "entity_type": "Person",
- "description": "Chief Technology Officer",
- "source_id": "chunk_001",
- "file_path": "/docs/company.txt"
- }],
- "Product Department": [{ # New entity discovered in gleaning stage
- "entity_name": "Product Department",
- "entity_type": "Department",
- "description": "Product development department at ABC Corp",
- "source_id": "chunk_001",
- "file_path": "/docs/company.txt"
- }]
- # Note: gleaning does NOT merge John's descriptions, only adds new entities
- },
- "maybe_edges": {
- ("John", "ABC Corp"): [{ # Relationship from initial extraction
- "src_id": "John",
- "tgt_id": "ABC Corp",
- "weight": 1.0,
- "description": "John is the CTO of ABC Corp",
- "source_id": "chunk_001"
- }],
- ("John", "Product Department"): [{ # New relationship discovered in gleaning stage
- "src_id": "John",
- "tgt_id": "Product Department",
- "weight": 1.0,
- "description": "John manages the Product Department",
- "source_id": "chunk_001"
- }]
- }
-}
-
-# Important Note: Gleaning Stage Merge Rules
-# - Only accepts new entity names: if entity_name not in maybe_nodes
-# - Does NOT merge multiple description fragments of existing entities
-# - True description merging and weight accumulation happens in merge stage
-```
-
-#### 3. Extraction Result Format
-
-**Entity Format**:
-```python
-{
- "entity_name": "Standardized entity name",
- "entity_type": "Entity type",
- "description": "Entity description",
- "source_id": "chunk_key",
- "file_path": "File path"
-}
-```
-
-**Relationship Format**:
-```python
-{
- "src_id": "Source entity",
- "tgt_id": "Target entity",
- "weight": 1.0, # Relationship weight, see detailed explanation below
- "description": "Relationship description",
- "keywords": "Keywords",
- "source_id": "chunk_key",
- "file_path": "File path"
-}
-```
-
-#### Relationship Weight Mechanism Explained
-
-**Weight Purpose**:
-- 🎯 **Relationship Strength Indicator**: Higher values indicate stronger or more frequent relationships between entities
-- 📊 **Graph Query Optimization**: Prioritize high-weight relationships during retrieval to improve result quality
-- 🔍 **Path Computation**: Used as edge importance weights in graph traversal algorithms
-- 📈 **Knowledge Evolution**: Track degree of relationship repetition across different documents
-
-**Initial Weight Calculation**:
-```python
-# Each newly extracted relationship defaults to weight 1.0
-initial_weight = 1.0
-
-# Special case: LLM may output relationships with weights
-if "weight" in extracted_relation:
- initial_weight = float(extracted_relation["weight"])
-else:
- initial_weight = 1.0 # Default base weight
-```
-
-**Weight Accumulation Rules**:
-- ✅ **Within-Document Repetition**: Same relationship appearing in different chunks of the same document accumulates weight
-- 🔄 **Cross-Document Reinforcement**: Same relationship appearing in different documents continues to accumulate weight
-- 📊 **Frequency Reflection**: Final weight = total occurrences of the relationship across all documents
-
-**Weight Calculation Example**:
-```python
-# Suppose relationship "John" -> "works_at" -> "ABC Corp" appears in:
-# Document1, chunk1: weight = 1.0
-# Document1, chunk3: weight = 1.0
-# Document2, chunk1: weight = 1.0
-# Final weight: 1.0 + 1.0 + 1.0 = 3.0
-
-final_weight = sum([edge["weight"] for edge in same_relation_edges])
-```
-
-### Key Design Features
-
-#### Independent Chunk Processing
-Each text chunk is independently extracted, with results returned as:
-```python
-chunk_results = [
- (chunk1_nodes, chunk1_edges), # First chunk extraction results
- (chunk2_nodes, chunk2_edges), # Second chunk extraction results
- # ... more chunk results
-]
-```
-
-**Design Advantages**:
-- 🚀 **Concurrent Efficiency**: Text chunks can be processed completely in parallel
-- 💾 **Memory Friendly**: Avoid building huge intermediate merged results
-- 🛡️ **Error Isolation**: Single chunk failure doesn't affect other chunks
-- 🔧 **Processing Flexibility**: Different strategies can be applied to different chunks
-
-**Data Characteristics**:
-- ⚠️ **Contains Duplicates**: Same entity may be repeatedly extracted across multiple chunks
-- 📊 **Scattered Data**: Complete entity information is scattered across different chunks
-
-#### Gleaning vs Merge Stage Differences
-
-**Gleaning Stage (within chunk)**:
-- 🎯 **Goal**: Discover more entities and relationships within a single chunk
-- 🔍 **Strategy**: Only add newly discovered entity names and relationship pairs
-- ❌ **No Merging**: Does not merge descriptions of existing entities or accumulate relationship weights
-- 📝 **Code Logic**: `if entity_name not in maybe_nodes`
-
-**Merge Stage (cross-chunk)**:
-- 🎯 **Goal**: Merge all chunk results into final knowledge graph
-- 🔍 **Strategy**: Merge all description fragments of same-named entities, accumulate relationship weights
-- ✅ **Complete Merging**: Description concatenation, weight accumulation, intelligent summarization
-- 📝 **Code Logic**: `all_nodes[entity_name].extend(entities)`
-
-## Entity Merging Mechanism (merge_nodes_and_edges)
-
-### Core Merging Strategy
-
-#### 1. Cross-Chunk Data Collection
-```python
-# Collect all same-named entities and relationships
-all_nodes = defaultdict(list) # {entity_name: [entity1, entity2, ...]}
-all_edges = defaultdict(list) # {(src, tgt): [edge1, edge2, ...]}
-
-for maybe_nodes, maybe_edges in chunk_results:
- # Merge same-named entities
- for entity_name, entities in maybe_nodes.items():
- all_nodes[entity_name].extend(entities)
-
- # Merge same-direction relationships
- for edge_key, edges in maybe_edges.items():
- sorted_key = tuple(sorted(edge_key)) # Unify direction
- all_edges[sorted_key].extend(edges)
-```
-
-**Data Collection Stage Input Format**:
-```python
-# Collection of extraction results from multiple chunks
-chunk_results = [
- # Results from Chunk 1
- (chunk1_maybe_nodes, chunk1_maybe_edges),
- # Results from Chunk 2
- (chunk2_maybe_nodes, chunk2_maybe_edges),
- # ... more chunk results
-]
-
-# Example single chunk result
-chunk1_maybe_nodes = {
- "John": [{
- "entity_name": "John",
- "entity_type": "Person",
- "description": "Chief Technology Officer",
- "source_id": "chunk_001"
- }]
-}
-
-chunk2_maybe_nodes = {
- "John": [{ # Same entity repeated in different chunks
- "entity_name": "John",
- "entity_type": "Person",
- "description": "Product Manager",
- "source_id": "chunk_002"
- }]
-}
-```
-
-**Data Collection Stage Output Format**:
-```python
-# Aggregated data after cross-chunk collection
-all_nodes = {
- "John": [
- {
- "entity_name": "John",
- "entity_type": "Person",
- "description": "Chief Technology Officer",
- "source_id": "chunk_001",
- "file_path": "/docs/company.txt"
- },
- {
- "entity_name": "John",
- "entity_type": "Person",
- "description": "Product Manager",
- "source_id": "chunk_002",
- "file_path": "/docs/company.txt"
- }
- # Multiple description fragments of the same entity awaiting merge
- ],
- "ABC Corp": [
- {
- "entity_name": "ABC Corp",
- "entity_type": "Organization",
- "description": "Technology company",
- "source_id": "chunk_001"
- }
- ]
-}
-
-all_edges = {
- ("ABC Corp", "John"): [ # Key sorted to unify direction
- {
- "src_id": "John",
- "tgt_id": "ABC Corp",
- "weight": 1.0,
- "description": "Employment relationship",
- "source_id": "chunk_001"
- },
- {
- "src_id": "John",
- "tgt_id": "ABC Corp",
- "weight": 1.0,
- "description": "Management relationship",
- "source_id": "chunk_002"
- }
- # Multiple occurrences of same relationship awaiting weight accumulation
- ]
-}
-```
-
-#### 2. Entity Merging Rules
-
-**Type Selection**: Choose the most frequently occurring entity type
-```python
-entity_type = Counter([
- entity["entity_type"] for entity in entities
-]).most_common(1)[0][0]
-```
-
-**Description Merging**: Use separators to join, deduplicate and sort
-```python
-descriptions = [entity["description"] for entity in entities]
-if existing_entity:
- descriptions.extend(existing_entity["description"].split(GRAPH_FIELD_SEP))
-
-merged_description = GRAPH_FIELD_SEP.join(sorted(set(descriptions)))
-```
-
-**Intelligent Summarization**: Automatically generate summaries when description fragments are too many
-```python
-fragment_count = merged_description.count(GRAPH_FIELD_SEP) + 1
-
-if fragment_count >= force_llm_summary_threshold:
- # Use LLM to generate summary, compress long descriptions
- merged_description = await llm_summarize(
- entity_name, merged_description, max_tokens
- )
-```
-
-#### 3. Relationship Merging Rules
-
-**Weight Accumulation**: Reflect enhancement of relationship strength
-```python
-total_weight = sum([edge["weight"] for edge in edges])
-if existing_edge:
- total_weight += existing_edge["weight"]
-```
-
-**Description Aggregation**: Similar to entity description merging strategy
-```python
-# Relationship description merging example
-edge_descriptions = [edge["description"] for edge in edges]
-if existing_edge:
- edge_descriptions.extend(existing_edge["description"].split(GRAPH_FIELD_SEP))
-
-merged_description = GRAPH_FIELD_SEP.join(sorted(set(edge_descriptions)))
-```
-
-**Keyword Deduplication**: Extract and merge all keywords
-```python
-# Keyword merging example
-all_keywords = []
-for edge in edges:
- if edge.get("keywords"):
- all_keywords.extend(edge["keywords"].split(", "))
-
-merged_keywords = ", ".join(sorted(set(all_keywords)))
-```
-
-**Final Merging Rules Output Format**:
-
-**Entity Merging Output**:
-```python
-# Final entity format after merging rules processing
-merged_entity = {
- "entity_name": "John",
- "entity_type": "Person", # Type selected based on frequency
- "description": "Chief Technology Officer§Product Manager§Project Manager", # Descriptions joined with § separator
- "source_chunks": ["chunk_001", "chunk_002", "chunk_003"], # Source chunk list
- "file_paths": ["/docs/company.txt", "/docs/team.txt"], # Source file list
- "mention_count": 3, # Number of chunks mentioning this entity
- "created_at": "2024-01-01T00:00:00Z",
- "updated_at": "2024-01-01T12:00:00Z"
-}
-```
-
-**Relationship Merging Output**:
-```python
-# Final relationship format after merging rules processing
-merged_relationship = {
- "src_id": "John",
- "tgt_id": "ABC Corp",
- "weight": 3.0, # Accumulated weight (1.0 + 1.0 + 1.0)
- "description": "Employment relationship§Management relationship§Leadership relationship", # Descriptions joined with § separator
- "keywords": "employee, company, management, responsible, leadership", # Deduplicated merged keywords
- "source_chunks": ["chunk_001", "chunk_002"], # Chunks where relationship appears
- "file_paths": ["/docs/company.txt"], # Files where relationship appears
- "mention_count": 2, # Number of times relationship was mentioned
- "created_at": "2024-01-01T00:00:00Z",
- "updated_at": "2024-01-01T12:00:00Z"
-}
-```
-
-#### 4. Database Update Process
-
-```mermaid
-graph LR
- A[Merge Entity Data] --> B[Update Graph Database]
- B --> C[Generate Vector Representation]
- C --> D[Update Vector Database]
-
- E[Merge Relationship Data] --> F[Ensure Endpoints Exist]
- F --> G[Update Graph Database]
- G --> H[Generate Vector Representation]
- H --> I[Update Vector Database]
-
- style B fill:#e8f5e8
- style D fill:#e1f5fe
- style G fill:#e8f5e8
- style I fill:#e1f5fe
-```
-
-**Final Database Storage Format**:
-
-**Graph Database Entity Storage Format**:
-```python
-# Entity node stored in graph database
-graph_entity_node = {
- "id": "John", # Entity name as node ID
- "entity_type": "Person",
- "description": "Chief Technology Officer§Product Manager§Project Manager",
- "source_chunks": ["chunk_001", "chunk_002", "chunk_003"],
- "file_paths": ["/docs/company.txt", "/docs/team.txt"],
- "mention_count": 3,
- "workspace": "collection_12345", # Workspace isolation
- "created_at": "2024-01-01T00:00:00Z",
- "updated_at": "2024-01-01T12:00:00Z"
-}
-```
-
-**Graph Database Relationship Storage Format**:
-```python
-# Relationship edge stored in graph database
-graph_relationship_edge = {
- "source": "John", # Source node ID
- "target": "ABC Corp", # Target node ID
- "weight": 3.0,
- "description": "Employment relationship§Management relationship§Leadership relationship",
- "keywords": "employee, company, management, responsible, leadership",
- "source_chunks": ["chunk_001", "chunk_002"],
- "file_paths": ["/docs/company.txt"],
- "mention_count": 2,
- "workspace": "collection_12345",
- "created_at": "2024-01-01T00:00:00Z",
- "updated_at": "2024-01-01T12:00:00Z"
-}
-```
-
-**Vector Database Storage Format**:
-```python
-# Entity vector stored in vector database
-vector_entity_record = {
- "id": "entity_John_collection_12345", # Unique vector record ID
- "entity_name": "John",
- "content": "John is a Person who serves as Chief Technology Officer, Product Manager, and Project Manager", # Text for vectorization
- "content_vector": [0.1, 0.2, ..., 0.9], # 1024-dimensional vector representation
- "workspace": "collection_12345",
- "storage_type": "entity", # Distinguish entity/relationship vectors
- "metadata": {
- "entity_type": "Person",
- "mention_count": 3,
- "file_paths": ["/docs/company.txt", "/docs/team.txt"]
- }
-}
-
-# Relationship vector stored in vector database
-vector_relationship_record = {
- "id": "relation_John_ABC_Corp_collection_12345",
- "relationship": "John -> ABC Corp",
- "content": "John has an employment relationship, management relationship, and leadership relationship with ABC Corp",
- "content_vector": [0.3, 0.4, ..., 0.8],
- "workspace": "collection_12345",
- "storage_type": "relationship",
- "metadata": {
- "weight": 3.0,
- "keywords": "employee, company, management, responsible, leadership",
- "mention_count": 2
- }
-}
-```
-
-### Concurrency Control and Consistency
-
-#### Workspace Isolation
-```python
-# Use workspace for multi-tenant isolation
-lock_manager = get_lock_manager()
-entity_lock = f"entity:{entity_name}:{workspace}"
-relation_lock = f"relation:{src_id}:{tgt_id}:{workspace}"
-
-async with lock_manager.lock(entity_lock):
- # Atomic read-merge-write operations
- existing = await graph_db.get_node(entity_name)
- merged_entity = merge_entity_data(existing, new_entities)
- await graph_db.upsert_node(entity_name, merged_entity)
-```
-
-#### Lock Granularity Optimization
-- **Entity-level Locking**: Each entity locked independently, avoiding global competition
-- **Relationship-level Locking**: Each relationship pair processed independently
-- **Sorted Lock Acquisition**: Prevent deadlocks, ensure consistent lock acquisition order
-
-## Performance Optimization Features
-
-### 1. Connected Component Concurrency
-Intelligent grouping based on graph topology analysis:
-- 🧠 **Topology Analysis**: Use BFS algorithm to discover independent entity groups
-- ⚡ **Parallel Processing**: Different connected components merge completely in parallel
-- 🔒 **Zero Lock Competition**: No shared entities between components, avoiding lock conflicts
-
-### 2. Memory and I/O Optimization
-- 📦 **Batch Processing**: Process by connected components in batches, control memory peaks
-- 🔄 **Connection Reuse**: Database connection pools reduce connection overhead
-- 📊 **Batch Operations**: Use batch database operations whenever possible
-
-### 3. Intelligent Summarization Strategy
-- 🎯 **Threshold Control**: Only call LLM for summary generation when necessary
-- ⚖️ **Performance Balance**: Avoid frequent LLM calls affecting performance
-- 💡 **Information Preservation**: Retain key information during summarization
-
-## Data Flow Overview
-
-```mermaid
-graph TD
- A[Document Chunking] --> B[Concurrent Extraction]
- B --> C[Entity/Relationship Recognition]
- C --> D[Multi-round Supplementary Extraction]
- D --> E[Chunk Extraction Results]
-
- E --> F[Cross-chunk Data Collection]
- F --> G[Same-named Entity Merging]
- G --> H{Description Length Check}
- H -->|Exceeds threshold| I[LLM Intelligent Summary]
- H -->|Normal length| J[Direct Merging]
-
- I --> K[Update Graph Database]
- J --> K
- K --> L[Generate Vectors]
- L --> M[Update Vector Database]
-
- style B fill:#e3f2fd
- style G fill:#fff3e0
- style I fill:#f3e5f5
- style K fill:#e8f5e8
- style M fill:#e1f5fe
-```
-
-## Key Technical Features
-
-### 1. Incremental Update Design
-- ✅ **Non-destructive Merging**: New information enhances rather than replaces existing data
-- 📈 **Weight Accumulation**: Relationship strength increases with repeated occurrences
-- 🔍 **Information Aggregation**: Multi-source descriptions provide more comprehensive entity profiles
-
-### 2. Fault Tolerance and Recovery
-- 🛡️ **Exception Isolation**: Individual task failures don't affect overall process
-- 🔄 **Auto-completion**: Automatically create missing relationship endpoint entities
-- ✔️ **Data Validation**: Strict format and content validation mechanisms
-
-### 3. Scalability Support
-- 🏗️ **Modular Design**: Extraction and merging logic completely decoupled
-- 🔌 **Interface Standards**: Support different graph databases and vector storage
-- 📊 **Monitoring Friendly**: Complete logging and performance metrics
-
-## Summary
-
-LightRAG's entity extraction and merging mechanism achieves efficient knowledge graph construction through the following innovations:
-
-1. **🚀 High-concurrency Extraction**: Chunk parallel processing + multi-round supplementary extraction, ensuring accuracy and efficiency
-2. **🧠 Intelligent Merging**: Connected component-based concurrency optimization, maximizing parallel processing capability
-3. **📊 Incremental Updates**: Non-destructive data merging, supporting continuous evolution of knowledge graphs
-4. **🔒 Concurrent Safety**: Fine-grained lock mechanism + workspace isolation, ensuring multi-tenant data security
-5. **⚡ Performance Optimization**: Intelligent summarization + batch operations, balancing accuracy and processing speed
-
-These technical features enable LightRAG to achieve efficient knowledge graph construction for large-scale documents while ensuring data quality.
-
----
-
-## Related Documents
-
-- 📋 [Indexing Architecture Design](./indexing_architecture.md) - Overall architecture design
-- 🏗️ [Graph Index Creation Process](./graph_index_creation.md) - Detailed graph index construction process
-- 📖 [LightRAG 实体提取与合并机制详解](./lightrag_entity_extraction_and_merging_zh.md) - Chinese Version
\ No newline at end of file
diff --git a/docs/en-US/design/quota-system-design.md b/docs/en-US/design/quota-system-design.md
deleted file mode 100644
index 22778a408..000000000
--- a/docs/en-US/design/quota-system-design.md
+++ /dev/null
@@ -1,482 +0,0 @@
-# ApeRAG Quota System Design Document
-
-## Table of Contents
-- [Overview](#overview)
-- [Architecture](#architecture)
-- [Data Model](#data-model)
-- [API Design](#api-design)
-- [Service Layer](#service-layer)
-- [Frontend Implementation](#frontend-implementation)
-- [Security & Authorization](#security--authorization)
-- [Error Handling](#error-handling)
-- [Usage Patterns](#usage-patterns)
-- [Future Enhancements](#future-enhancements)
-
-## Overview
-
-The ApeRAG quota system is a comprehensive resource management solution designed to control and monitor user resource consumption across the platform. It provides fine-grained control over various resource types including collections, documents, and bots, ensuring fair usage and preventing system abuse.
-
-### Key Features
-
-- **Multi-resource Quota Management**: Support for different quota types (collections, documents, bots)
-- **Real-time Usage Tracking**: Automatic tracking of current resource usage
-- **System Default Configuration**: Centralized default quota settings for new users
-- **Administrative Controls**: Admin-only quota management and user search capabilities
-- **Atomic Operations**: Thread-safe quota consumption and release operations
-- **Flexible API Design**: RESTful APIs supporting both individual and batch operations
-
-### Supported Quota Types
-
-1. **max_collection_count**: Maximum number of collections a user can create
-2. **max_document_count**: Total maximum number of documents across all collections
-3. **max_document_count_per_collection**: Maximum documents per individual collection
-4. **max_bot_count**: Maximum number of bots a user can create (excluding system default bot)
-
-## Architecture
-
-The quota system follows a layered architecture pattern:
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Frontend Layer │
-│ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐ │
-│ │ User Quotas │ │ Admin Panel │ │ System Config│ │
-│ │ View │ │ View │ │ View │ │
-│ └─────────────────┘ └─────────────────┘ └──────────────┘ │
-└─────────────────────────────────────────────────────────────┘
- │
-┌─────────────────────────────────────────────────────────────┐
-│ API Layer │
-│ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐ │
-│ │ Quota APIs │ │ Admin APIs │ │ System APIs │ │
-│ │ (quotas.yaml) │ │ │ │ │ │
-│ └─────────────────┘ └─────────────────┘ └──────────────┘ │
-└─────────────────────────────────────────────────────────────┘
- │
-┌─────────────────────────────────────────────────────────────┐
-│ Service Layer │
-│ ┌─────────────────────────────────────────────────────────┐ │
-│ │ QuotaService │ │
-│ │ • get_user_quotas() │ │
-│ │ • check_and_consume_quota() │ │
-│ │ • release_quota() │ │
-│ │ • recalculate_user_usage() │ │
-│ │ • update_user_quota() │ │
-│ │ • get/update_system_default_quotas() │ │
-│ └─────────────────────────────────────────────────────────┘ │
-└─────────────────────────────────────────────────────────────┘
- │
-┌─────────────────────────────────────────────────────────────┐
-│ Data Layer │
-│ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐ │
-│ │ UserQuota │ │ ConfigModel │ │ Related │ │
-│ │ Table │ │ Table │ │ Tables │ │
-│ └─────────────────┘ └─────────────────┘ └──────────────┘ │
-└─────────────────────────────────────────────────────────────┘
-```
-
-## Data Model
-
-### UserQuota Table
-
-The core table for storing user quota information:
-
-```sql
-CREATE TABLE user_quota (
- user VARCHAR(256) NOT NULL, -- User identifier
- key VARCHAR(256) NOT NULL, -- Quota type key
- quota_limit INTEGER NOT NULL DEFAULT 0, -- Maximum allowed usage
- current_usage INTEGER NOT NULL DEFAULT 0, -- Current usage count
- gmt_created TIMESTAMP WITH TIME ZONE NOT NULL,
- gmt_updated TIMESTAMP WITH TIME ZONE NOT NULL,
- gmt_deleted TIMESTAMP WITH TIME ZONE,
- PRIMARY KEY (user, key)
-);
-```
-
-**Key Fields:**
-- `user`: User identifier (foreign key to user table)
-- `key`: Quota type (e.g., 'max_collection_count', 'max_document_count')
-- `quota_limit`: Maximum allowed usage for this quota type
-- `current_usage`: Real-time tracking of current usage
-- Composite primary key ensures one quota record per user per quota type
-
-### ConfigModel Table
-
-Stores system-wide configuration including default quotas:
-
-```sql
-CREATE TABLE config (
- key VARCHAR(256) PRIMARY KEY, -- Configuration key
- value TEXT NOT NULL, -- JSON configuration value
- gmt_created TIMESTAMP WITH TIME ZONE NOT NULL,
- gmt_updated TIMESTAMP WITH TIME ZONE NOT NULL,
- gmt_deleted TIMESTAMP WITH TIME ZONE
-);
-```
-
-**System Default Quotas Configuration:**
-```json
-{
- "max_collection_count": 10,
- "max_document_count": 1000,
- "max_document_count_per_collection": 100,
- "max_bot_count": 5
-}
-```
-
-### Related Tables
-
-The quota system integrates with several other tables for usage calculation:
-
-- **Collection**: For counting user collections (status != 'DELETED')
-- **Document**: For counting documents across collections
-- **Bot**: For counting user bots (excluding system default bot)
-
-## API Design
-
-### RESTful Endpoints
-
-#### 1. Get User Quotas
-```http
-GET /quotas?user_id={user_id}&search={search_term}
-```
-
-**Features:**
-- Current user quotas (no parameters)
-- Admin-only user search by ID, username, or email
-- Support for multiple search results
-
-**Response Types:**
-- `UserQuotaInfo`: Single user quota information
-- `UserQuotaList`: Multiple users (search results)
-
-#### 2. Update User Quota
-```http
-PUT /quotas/{user_id}
-```
-
-**Features:**
-- Admin-only operation
-- Supports both single and batch quota updates
-- Atomic transaction handling
-
-**Request Body:**
-```json
-{
- "max_collection_count": 20,
- "max_document_count": 2000,
- "max_bot_count": 10
-}
-```
-
-#### 3. Recalculate Usage
-```http
-POST /quotas/{user_id}/recalculate
-```
-
-**Features:**
-- Admin-only operation
-- Recalculates actual usage from database
-- Updates current_usage fields atomically
-
-#### 4. System Default Quotas
-```http
-GET /system/default-quotas
-PUT /system/default-quotas
-```
-
-**Features:**
-- Admin-only operations
-- Centralized default quota management
-- Applied to new user initialization
-
-### OpenAPI Specification
-
-The API follows OpenAPI 3.0 specification with:
-- Comprehensive schema definitions
-- Detailed error response mappings
-- Parameter validation rules
-- Security requirements (admin-only operations)
-
-## Service Layer
-
-### QuotaService Class
-
-The `QuotaService` class provides the core business logic for quota management:
-
-#### Key Methods
-
-**1. Quota Consumption (Thread-Safe)**
-```python
-async def check_and_consume_quota(
- self,
- user_id: str,
- quota_type: str,
- amount: int = 1,
- session=None
-) -> None:
- """
- Atomically check and consume quota with SELECT FOR UPDATE
- Raises QuotaExceededException if quota would be exceeded
- """
-```
-
-**2. Quota Release**
-```python
-async def release_quota(
- self,
- user_id: str,
- quota_type: str,
- amount: int = 1,
- session=None
-) -> None:
- """
- Release quota (decrease usage) with transaction safety
- """
-```
-
-**3. Usage Recalculation**
-```python
-async def recalculate_user_usage(self, user_id: str) -> Dict[str, int]:
- """
- Recalculate actual usage from related tables:
- - Collections: COUNT(*) WHERE user=? AND status!='DELETED'
- - Documents: COUNT(*) via JOIN with collections
- - Bots: COUNT(*) WHERE user=? AND title!='Default Agent Bot'
- """
-```
-
-**4. User Initialization**
-```python
-async def initialize_user_quotas(self, user_id: str) -> None:
- """
- Initialize default quotas for new users from system config
- """
-```
-
-#### Transaction Management
-
-- **Database Operations**: All quota operations use async database transactions
-- **Atomic Updates**: SELECT FOR UPDATE prevents race conditions
-- **Session Handling**: Supports both standalone and nested transactions
-
-## Frontend Implementation
-
-### React Component Architecture
-
-The frontend quota management is implemented as a comprehensive React component with:
-
-#### Key Features
-
-**1. Multi-Tab Interface (Admin Only)**
-- User Quotas Management
-- System Default Configuration
-
-**2. User Search & Selection**
-- Real-time search by username, email, or user ID
-- Multiple result handling with selection interface
-- Clear search functionality
-
-**3. Inline Table Editing**
-- Edit mode toggle for quota limits
-- Real-time validation
-- Batch save operations
-- Cancel/revert functionality
-
-**4. Progress Visualization**
-- Usage rate progress bars
-- Color-coded status (normal/warning/critical)
-- Percentage calculations
-
-**5. Administrative Actions**
-- Quota recalculation
-- System default quota management
-- User-specific quota updates
-
-#### Component Structure
-
-```typescript
-interface QuotaInfo {
- quota_type: string;
- quota_limit: number;
- current_usage: number;
- remaining: number;
-}
-
-interface UserQuotaInfo {
- user_id: string;
- username: string;
- email?: string;
- role: string;
- quotas: QuotaInfo[];
-}
-```
-
-#### State Management
-
-- **Local State**: Component-level state for UI interactions
-- **API Integration**: Direct API calls with loading states
-- **Error Handling**: User-friendly error messages with internationalization
-
-## Security & Authorization
-
-### Role-Based Access Control
-
-**Regular Users:**
-- View own quotas only
-- Read-only access
-- No administrative functions
-
-**Admin Users:**
-- Full quota management capabilities
-- User search and selection
-- System configuration access
-- Quota recalculation permissions
-
-### API Security
-
-- **Authentication**: Required for all quota endpoints
-- **Authorization**: Admin role verification for management operations
-- **Input Validation**: Comprehensive parameter validation
-- **Rate Limiting**: Implicit through quota system itself
-
-## Error Handling
-
-### Exception Hierarchy
-
-```python
-class QuotaExceededException(BusinessException):
- """
- Raised when quota consumption would exceed limits
- Maps to appropriate HTTP status codes:
- - Collection quota: 403 Forbidden
- - Document quota: 403 Forbidden
- - Bot quota: 403 Forbidden
- """
-```
-
-### Error Response Format
-
-```json
-{
- "error_code": "COLLECTION_QUOTA_EXCEEDED",
- "code": 1103,
- "message": "已达到max_collection_count的配额限制。当前使用量: 10/10",
- "details": {
- "quota_type": "max_collection_count",
- "quota_limit": 10,
- "current_usage": 10,
- "quota_exceeded": true
- }
-}
-```
-
-### Frontend Error Handling
-
-- **Internationalized Messages**: Multi-language error display
-- **User-Friendly Feedback**: Clear action guidance
-- **Graceful Degradation**: Fallback UI states
-
-## Usage Patterns
-
-### Resource Creation Flow
-
-```python
-# Example: Creating a new collection
-async def create_collection(user_id: str, collection_data: dict):
- async with database_transaction() as session:
- # 1. Check and consume quota atomically
- await quota_service.check_and_consume_quota(
- user_id,
- 'max_collection_count',
- session=session
- )
-
- # 2. Create the resource
- collection = Collection(**collection_data)
- session.add(collection)
-
- # 3. Commit transaction (quota and resource together)
- await session.commit()
-```
-
-### Resource Deletion Flow
-
-```python
-# Example: Deleting a collection
-async def delete_collection(user_id: str, collection_id: str):
- async with database_transaction() as session:
- # 1. Mark resource as deleted
- collection.status = 'DELETED'
- collection.gmt_deleted = utc_now()
-
- # 2. Release quota
- await quota_service.release_quota(
- user_id,
- 'max_collection_count',
- session=session
- )
-
- # 3. Commit transaction
- await session.commit()
-```
-
-### Admin Operations
-
-```python
-# Example: Batch quota update
-await quota_service.update_user_quota(
- user_id="user123",
- quota_updates={
- "max_collection_count": 20,
- "max_document_count": 2000,
- "max_bot_count": 10
- }
-)
-```
-
-## Future Enhancements
-
-### Planned Features
-
-1. **Quota History Tracking**
- - Historical quota changes
- - Usage analytics
- - Trend analysis
-
-2. **Dynamic Quota Adjustment**
- - Usage-based automatic adjustments
- - Temporary quota increases
- - Time-based quotas
-
-3. **Advanced Monitoring**
- - Real-time quota alerts
- - Usage prediction
- - Capacity planning
-
-4. **Integration Enhancements**
- - External quota providers
- - Multi-tenant support
- - API rate limiting integration
-
-### Technical Improvements
-
-1. **Performance Optimization**
- - Quota caching strategies
- - Batch operations
- - Database indexing improvements
-
-2. **Scalability**
- - Distributed quota management
- - Microservice architecture
- - Event-driven updates
-
-3. **Observability**
- - Detailed metrics collection
- - Quota operation tracing
- - Performance monitoring
-
----
-
-*This document provides a comprehensive overview of the ApeRAG quota system design and implementation. For specific implementation details, refer to the source code in the respective modules.*
diff --git a/docs/en-US/development/_category.yaml b/docs/en-US/development/_category.yaml
deleted file mode 100644
index 2cea69694..000000000
--- a/docs/en-US/development/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: Development
-position: 4
diff --git a/docs/en-US/development/development-guide.md b/docs/en-US/development/development-guide.md
deleted file mode 100644
index 38c101ace..000000000
--- a/docs/en-US/development/development-guide.md
+++ /dev/null
@@ -1,373 +0,0 @@
-# 🛠️ Development Guide
-
-This guide focuses on setting up a development environment and the development workflow for ApeRAG. This is designed for developers looking to contribute to ApeRAG or run it locally for development purposes.
-
-## 🚀 Development Environment Setup
-
-Follow these steps to set up ApeRAG from source code for development:
-
-### 1. 📂 Clone the Repository and Setup Environment
-
-First, get the source code and configure environment variables:
-
-```bash
-git clone https://github.com/apecloud/ApeRAG.git
-cd ApeRAG
-cp envs/env.template .env
-```
-
-Edit the `.env` file to configure your AI service settings if needed. The default settings work with the local database services started in the next step.
-
-### 2. 📋 System Prerequisites
-
-Before you begin, ensure your system has:
-
-* **Node.js**: Version 20 or higher is recommended for frontend development. [Download Node.js](https://nodejs.org/)
-* **Docker & Docker Compose**: Required for running database services locally. [Download Docker](https://docs.docker.com/get-docker/)
-
-**Note**: Python 3.11 is required but will be automatically managed by `uv` in the next steps.
-
-### 3. 🗄️ Start Database Services
-
-Use Docker Compose to start the essential database services:
-
-```bash
-# Start core databases: PostgreSQL, Redis, Qdrant, Elasticsearch
-make infra-up
-```
-
-This will start all required database services in the background. The default connection settings in your `.env` file are pre-configured to work with these services.
-
-
-Advanced Database Options
-
-```bash
-# Use Neo4j instead of PostgreSQL for graph storage
-make infra-up WITH_NEO4J=1
-```
-
-
-
-### 4. ⚙️ Setup Development Environment
-
-Create Python virtual environment and setup development tools:
-
-```bash
-make env-dev
-```
-
-This command will:
-* Install `uv` if not already available
-* Create a Python 3.11 virtual environment (located in `.venv/`)
-* Install backend dependencies and repository git hooks
-* Install pre-commit hooks for code quality
-* Install addlicense tool for license management
-
-**Activate the virtual environment:**
-```bash
-source .venv/bin/activate
-```
-
-You'll know it's active when you see `(.venv)` in your terminal prompt.
-
-### 5. 📦 Install Dependencies
-
-Install all backend and frontend dependencies:
-
-```bash
-make env-install
-```
-
-This command will:
-* Install all Python backend dependencies from `pyproject.toml` into the virtual environment
-* Install frontend Node.js dependencies using `yarn`
-
-### 6. 🔄 Apply Database Migrations
-
-Setup the database schema:
-
-```bash
-make db-migrate
-```
-
-### 7. ▶️ Start Development Services
-
-Now you can start the development services. Open separate terminal windows/tabs for each service:
-
-**Terminal 1 - Backend API Server:**
-```bash
-make serve-api
-```
-This starts the FastAPI development server at `http://localhost:8000` with auto-reload on code changes.
-
-**Terminal 2 - Celery Worker:**
-```bash
-make serve-worker
-```
-This starts the Celery worker for processing asynchronous background tasks.
-
-**Terminal 3 - Frontend (Optional):**
-```bash
-make serve-web
-```
-This starts the frontend development server at `http://localhost:3000` with hot reload.
-
-### 8. 🌐 Access ApeRAG
-
-With the services running, you can access:
-* **Frontend UI**: http://localhost:3000 (if started)
-* **Backend API**: http://localhost:8000
-* **API Documentation**: http://localhost:8000/docs
-
-### 9. ⏹️ Stopping Services
-
-To stop the development environment:
-
-**Stop Database Services:**
-```bash
-# Stop database services (data preserved)
-make stack-down
-
-# Stop services and remove all data volumes
-make stack-down REMOVE_VOLUMES=1
-```
-
-**Stop Development Services:**
-- Backend API Server: Press `Ctrl+C` in the terminal running `make serve-api`
-- Celery Worker: Press `Ctrl+C` in the terminal running `make serve-worker`
-- Frontend Server: Press `Ctrl+C` in the terminal running `make serve-web`
-
-**Data Management:**
-- `make stack-down` - Stops services but preserves all data (PostgreSQL, Redis, Qdrant, etc.)
-- `make stack-down REMOVE_VOLUMES=1` - Stops services and **⚠️ permanently deletes all data**
-- You can run `make stack-down REMOVE_VOLUMES=1` even after already running `make stack-down`
-
-**Verify Data Removal:**
-```bash
-# Check if volumes still exist
-docker volume ls | grep aperag
-
-# Should return no results after REMOVE_VOLUMES=1
-```
-
-Now you have ApeRAG running locally from source code, ready for development! 🎉
-
-## ❓ Common Development Tasks
-
-### Q: 🔧 How do I add or modify a REST API endpoint?
-
-**Complete workflow:**
-1. Define request/response models in Python using Pydantic models.
-2. Implement backend view: `aperag/views/[module].py`
-3. Export the code-first OpenAPI specs:
- ```bash
- make openapi-generate # Writes openapi.full.json and openapi.public.json
- ```
-4. Verify the exported specs:
- ```bash
- make openapi-check
- ```
-5. Coordinate frontend typed client updates through the FE v2 adapter layer.
-6. Test the API:
- ```bash
- make test-all
- # ✅ Check live docs: http://localhost:8000/docs
- ```
-
-### Q: 🗃️ How do I modify database models/schema?
-
-**Database migration workflow:**
-1. Edit SQLModel classes in `aperag/db/models.py`
-2. Generate migration file:
- ```bash
- make db-revision # Creates new migration in migration/versions/
- ```
-3. Apply migration to database:
- ```bash
- make db-migrate # Updates database schema
- ```
-4. Update related code (repositories in `aperag/db/repositories/`, services in `aperag/service/`)
-5. Verify changes:
- ```bash
- make test-all # ✅ Ensure everything works
- ```
-
-### Q: ⚡ How do I add a new feature with background processing?
-
-**Feature implementation workflow:**
-1. Implement feature components:
- - Backend logic: `aperag/[module]/`
- - Async tasks: `aperag/tasks/`
- - Database models: `aperag/db/models.py`
-2. Update API and verify the code-first contract:
- ```bash
- make db-revision # Generate migration files
- make db-migrate # Apply database changes
- make openapi-check # Verify FastAPI/Pydantic OpenAPI export
- ```
-3. Quality assurance:
- ```bash
- make format && make lint && make test-all
- ```
-
-### Q: 🧪 How do I run unit tests and e2e tests?
-
-**Unit Tests (Fast, No External Dependencies):**
-```bash
-# Run all unit tests
-make test-unit
-
-# Run specific test file
-uv run pytest tests/unit_test/test_model_service.py -v
-
-# Run specific test class or function
-uv run pytest tests/unit_test/test_model_service.py::TestModelService::test_get_models -v
-
-# Run tests with coverage
-make test-unit
-```
-
-**E2E Tests (Require Running Services):**
-```bash
-# Setup: Start required services first
-make infra-up # 🗄️ Start databases
-make serve-api # 🚀 Start API server (separate terminal)
-
-# Run the remaining pytest-based product e2e tests
-make test-e2e
-
-# Run HTTP black-box smoke against a running service
-make test-http-smoke
-
-# Run backend integration tests
-make test-integration
-
-# Run specific pytest e2e modules
-uv run pytest tests/e2e_pytest/test_chat.py -v
-
-# Run specific integration modules
-uv run pytest tests/integration/graphstorage/ -v
-
-# Run with detailed output and no capture
-uv run pytest tests/e2e_pytest/test_chat.py -v -s
-
-# Performance benchmarks (with timing)
-make test-e2e-perf
-```
-
-**Complete Test Suite:**
-```bash
-# Run everything (unit + e2e)
-make test-all
-
-# Test with different configurations
-make infra-up WITH_NEO4J=1 # Test with Neo4j instead of PostgreSQL
-make test-all
-```
-
-### Q: 🐛 How do I debug failing tests?
-
-**Debugging workflow:**
-1. Run failing test in isolation:
- ```bash
- # Single test with full output
- uv run pytest tests/unit_test/test_failing.py::test_specific_function -v -s
-
- # Stop on first failure
- uv run pytest tests/unit_test/ -x --tb=short
- ```
-2. For e2e test failures, ensure services are running:
- ```bash
- make infra-up # Database services
- make serve-api # API server
- make serve-worker # Background workers (if testing async tasks)
- ```
-3. Use debugging tools:
- ```bash
- # Run with pdb debugger
- uv run pytest tests/unit_test/test_failing.py --pdb
-
- # Capture logs during test
- uv run pytest tests/e2e_pytest/test_chat.py --log-cli-level=DEBUG
- ```
-4. Fix and retest:
- ```bash
- make format # Auto-fix style issues
- make lint # Check remaining issues
- uv run pytest tests/path/to/fixed_test.py -v # Verify fix
- ```
-
-### Q: 📦 How do I update dependencies safely?
-
-**Python dependencies:**
-1. Edit `pyproject.toml` (add/update packages)
-2. Update virtual environment:
- ```bash
- make env-install # Syncs all groups and extras with uv
- make test-all # Verify compatibility
- ```
-
-**Frontend dependencies:**
-1. Edit `frontend/package.json`
-2. Update and test:
- ```bash
- cd frontend && yarn install
- make serve-web # Test frontend compilation
- ```
-
-### Q: 🚀 How do I prepare code for production deployment?
-
-**Pre-deployment checklist:**
-1. Code quality validation:
- ```bash
- make format # Auto-fix all style issues
- make lint # Verify no style violations
- make static-check # MyPy type checking
- ```
-2. Comprehensive testing:
- ```bash
- make test-all # All unit + e2e tests
- make test-e2e-perf # Performance benchmarks
- ```
-3. API consistency:
- ```bash
- make openapi-check # Ensure code-first OpenAPI export works
- ```
-4. Database migrations:
- ```bash
- make db-revision # Generate any pending migrations
- ```
-5. Full-stack integration test:
- ```bash
- make stack-up WITH_NEO4J=1 WITH_DOCRAY=1 # Production-like setup
- # Manual testing at http://localhost:3000/web/
- make stack-down
- ```
-
-### Q: 🔄 How do I completely reset my development environment?
-
-**Nuclear reset (destroys all data):**
-```bash
-make stack-down REMOVE_VOLUMES=1 # ⚠️ Stop services + delete ALL data
-make env-clean # 🧹 Clean temporary files
-
-# Restart fresh
-make infra-up # 🗄️ Fresh databases
-make db-migrate # 🔄 Apply all migrations
-make serve-api # 🚀 Start API server
-make serve-worker # ⚡ Start background workers
-```
-
-**Soft reset (preserve data):**
-```bash
-make stack-down # ⏹️ Stop services, keep data
-make infra-up # 🗄️ Restart databases
-make db-migrate # 🔄 Apply any new migrations
-```
-
-**Reset just Python environment:**
-```bash
-rm -rf .venv/ # 🗑️ Remove virtual environment
-make env-dev # ⚙️ Recreate everything
-source .venv/bin/activate # ✅ Reactivate
-```
diff --git a/docs/en-US/development/es-p0-contract-runtime-hardening.md b/docs/en-US/development/es-p0-contract-runtime-hardening.md
deleted file mode 100644
index 603c1991b..000000000
--- a/docs/en-US/development/es-p0-contract-runtime-hardening.md
+++ /dev/null
@@ -1,97 +0,0 @@
-# ES P0 Contract And Runtime Hardening
-
-This document captures the first implementation slice of the Elasticsearch redesign:
-
-- P0-A: contract hard-fix
-- P0-B: runtime hardening
-
-It intentionally does **not** implement the P1 shared logical index redesign or the
-P1 migration / reindex rollout. Those steps require separate design and rollout PRs.
-
-## Goals
-
-The P0 slice fixes correctness and runtime issues without introducing physical
-index migration risk:
-
-1. Make `enable_fulltext` effective in runtime behavior.
-2. Stop routing fulltext search through vector naming helpers.
-3. Add explicit filterable fulltext fields for `collection_id`, `document_id`,
- `chunk_id`, and `chat_id`.
-4. Stop silently degrading fulltext search failures into unexplained empty recall.
-5. Turn IK from an implicit startup assumption into an explicit runtime dependency.
-
-## Scope
-
-### Included in P0
-
-- Fulltext index creation is skipped when `enable_fulltext=false`.
-- Fulltext document indexing tasks skip cleanly when the collection disables fulltext.
-- Fulltext search uses `generate_fulltext_index_name(...)`, not the vector helper.
-- Fulltext chunks store explicit top-level filter fields.
-- Chat-scoped fulltext search writes explicit top-level `chat_id`, but keeps a
- temporary dual-read filter on both `chat_id` and legacy `metadata.chat_id`
- until the later reindex / rollout line removes the historical path.
-- Fulltext keyword extraction falls back to the raw query token when all extractors
- return nothing.
-- Fulltext backend failures are logged as explicit degrade events before returning
- empty recall.
-- IK installation behavior is gated by explicit environment flags.
-
-### Explicitly excluded from P0
-
-- Shared logical index.
-- Alias / versioned rebuild / cutover.
-- Physical fulltext index renaming.
-- Per-collection to shared migration.
-- Reindex / backfill / rollback orchestration across existing ES data.
-
-## Compatibility Boundary
-
-P0 keeps the current physical per-collection fulltext index model in place.
-This keeps the first PR small and avoids mixing correctness fixes with data-plane
-migration.
-
-That means:
-
-- No existing ES indices are renamed in this slice.
-- No automatic reindex runs in this slice.
-- Rollback remains a code rollback, not an ES data rollback.
-
-The physical model changes only in the later P1 implementation.
-
-P0 also does **not** make collection-config flips self-healing at the data plane:
-
-- Turning `enable_fulltext` off stops new runtime reads and writes.
-- Turning it back on does not automatically purge or rebuild existing ES
- projections.
-- Fulltext projection convergence after config flips still requires an explicit
- rebuild / rollout action.
-
-## Source Of Truth
-
-P0 does not change the source-of-truth model:
-
-- Object store remains the source of the original document.
-- Parser / chunking remains a derived layer.
-- Elasticsearch remains a projection for fulltext retrieval.
-
-This PR must not turn ES into an authoritative data source.
-
-## Runtime Contract For IK
-
-IK remains the Chinese analyzer dependency in this slice, but it is now treated
-as an explicit runtime dependency instead of an implicit startup side effect.
-
-The startup contract is:
-
-- `ES_REQUIRE_IK_PLUGIN=true|false`
-- `ES_AUTO_INSTALL_IK=true|false`
-- `ES_IK_PLUGIN_URL=`
-
-This allows environments to:
-
-- fail fast when IK is required but unavailable, or
-- explicitly opt into controlled bootstrap behavior.
-
-Longer-term image baking / artifact pinning still belongs to the broader runtime
-hardening line, but P0 removes the hidden dependency behavior.
diff --git a/docs/en-US/development/es-shared-logical-index-rollout.md b/docs/en-US/development/es-shared-logical-index-rollout.md
deleted file mode 100644
index 7c7abd286..000000000
--- a/docs/en-US/development/es-shared-logical-index-rollout.md
+++ /dev/null
@@ -1,194 +0,0 @@
-# ES Shared Logical Index Rollout
-
-This document captures the P1 implementation slice of the Elasticsearch redesign:
-
-- P1-A: shared logical fulltext index
-- P1-B: migration / reindex / rollout
-
-It is the follow-up to `es-p0-contract-runtime-hardening.md`. P0 fixed runtime
-correctness and explicit contracts; P1 changes the physical fulltext layout.
-
-## Goals
-
-1. Replace the per-collection physical fulltext index model with a shared
- logical index.
-2. Make alias / versioned rebuild / cutover / rollback first-class rollout
- primitives instead of one-off manual steps.
-3. Preserve per-collection correctness by keeping `collection_id` and `chat_id`
- as explicit fulltext filter fields.
-4. Provide a real migration path for existing data instead of leaving rollout as
- a docs-only follow-up.
-
-## Target Runtime Model
-
-### Logical view
-
-- Runtime fulltext reads and writes use the shared alias: `aperag-fulltext`
-- The alias points to a concrete versioned physical index:
- - `aperag-fulltext-v1`
- - `aperag-fulltext-v2`
- - ...
-
-### Document contract inside the shared index
-
-Each chunk document stores explicit top-level fields:
-
-- `collection_id`
-- `document_id`
-- `chunk_id`
-- `chat_id`
-- `name`
-- `content`
-- `title`
-
-`metadata` remains a stored payload blob, not the authoritative filter path.
-
-### Query model
-
-- Fulltext recall is now always collection-scoped in ES itself.
-- Collection filters run on top-level `collection_id`.
-- Chat-scoped recall keeps the P0 dual-read guard:
- - `chat_id`
- - `metadata.chat_id`
-
-This means P1 does not regress existing-data compatibility while the rollout is
-still in flight.
-
-## Shared Index Settings
-
-The shared physical index no longer relies on cluster-default topology.
-
-The creation contract is now explicit:
-
-- `ES_FULLTEXT_NUMBER_OF_SHARDS`
-- `ES_FULLTEXT_NUMBER_OF_REPLICAS`
-
-Default values:
-
-- `number_of_shards = 1`
-- `number_of_replicas = 0`
-
-These defaults are intentionally single-node friendly. Production deployments
-can override them explicitly instead of inheriting incompatible cluster
-defaults across many tiny indices.
-
-## Routing Strategy
-
-The shared index uses `collection_id` as the routing key for:
-
-- chunk writes
-- collection-scoped deletes
-- collection-scoped search
-- collection-scoped count verification
-- legacy reindex migration
-
-This keeps shared-index writes and reads collection-local inside ES without
-bringing back per-collection physical index fragmentation.
-
-## Source Of Truth And Rebuild Authority
-
-The source-of-truth model remains unchanged:
-
-- Object store is the source of original documents.
-- Parser / chunking output is derived data.
-- PostgreSQL remains the authority for ApeRAG collection/document identity.
-- Elasticsearch remains a projection for BM25/fulltext retrieval.
-
-Therefore:
-
-- ES loss must be recoverable from the authoritative source path.
-- ES migration does not redefine ownership of document data.
-- Rollback means alias rollback and, if necessary, rebuild from the true source
- path, not treating ES as authoritative state.
-
-## Rollout Script
-
-The rollout entrypoint is:
-
-```bash
-python scripts/migrate_es_fulltext_shared_index.py
-```
-
-### Legacy -> shared migration
-
-Dry-run the plan first:
-
-```bash
-python scripts/migrate_es_fulltext_shared_index.py --dry-run
-```
-
-Copy legacy per-collection indices into the shared physical target:
-
-```bash
-python scripts/migrate_es_fulltext_shared_index.py --target-version v1
-```
-
-Cut the shared alias after verification:
-
-```bash
-python scripts/migrate_es_fulltext_shared_index.py --target-version v1 --cutover
-```
-
-Delete old legacy indices after the new path is verified:
-
-```bash
-python scripts/migrate_es_fulltext_shared_index.py --only-delete --delete-old
-```
-
-### Versioned rebuild
-
-Rebuild the current shared target into a new physical version:
-
-```bash
-python scripts/migrate_es_fulltext_shared_index.py --mode shared --target-version v2
-```
-
-Cut the alias to the rebuilt target:
-
-```bash
-python scripts/migrate_es_fulltext_shared_index.py --mode shared --target-version v2 --cutover
-```
-
-Rollback the alias if the cutover needs to be reverted:
-
-```bash
-python scripts/migrate_es_fulltext_shared_index.py --rollback-to aperag-fulltext-v1
-```
-
-## Verification Contract
-
-The rollout script verifies:
-
-- legacy source document counts
-- migrated document counts inside the shared target, scoped by `collection_id`
-- shared rebuild total counts for versioned alias rebuilds
-
-Legacy migration is intentionally rerunnable:
-
-- before reindexing a collection, the script deletes that collection's docs from
- the target physical index
-- then reindexes the source collection again
-
-This assumes a controlled rollout window where writers are paused.
-
-## Explicit Boundaries
-
-### Included in P1
-
-- shared logical index alias
-- versioned physical fulltext indices
-- explicit shard / replica settings
-- collection-based routing
-- legacy per-collection -> shared migration
-- alias cutover / rollback
-- legacy index cleanup
-
-### Explicitly excluded from P1
-
-- replacing Elasticsearch with another engine
-- bucketed physical index as the default layout
-- automatic config-flip-driven rebuild orchestration for `enable_fulltext`
-
-If future scale or isolation requirements prove shared is no longer economical,
-bucketed physical indices remain a later conditional follow-up, not the default
-P1 outcome.
diff --git a/docs/en-US/images/HarryPotterKG2.png b/docs/en-US/images/HarryPotterKG2.png
deleted file mode 100644
index 101f98fb8..000000000
Binary files a/docs/en-US/images/HarryPotterKG2.png and /dev/null differ
diff --git a/docs/en-US/images/backend.jpeg b/docs/en-US/images/backend.jpeg
deleted file mode 100644
index e5bdae22e..000000000
Binary files a/docs/en-US/images/backend.jpeg and /dev/null differ
diff --git a/docs/en-US/images/celery.jpeg b/docs/en-US/images/celery.jpeg
deleted file mode 100644
index e593052b3..000000000
Binary files a/docs/en-US/images/celery.jpeg and /dev/null differ
diff --git a/docs/en-US/images/chat2.png b/docs/en-US/images/chat2.png
deleted file mode 100644
index dda576da4..000000000
Binary files a/docs/en-US/images/chat2.png and /dev/null differ
diff --git a/docs/en-US/images/collection-page.png b/docs/en-US/images/collection-page.png
deleted file mode 100644
index 9d9622a47..000000000
Binary files a/docs/en-US/images/collection-page.png and /dev/null differ
diff --git a/docs/en-US/images/configure-ollama-1.png b/docs/en-US/images/configure-ollama-1.png
deleted file mode 100644
index 82806f490..000000000
Binary files a/docs/en-US/images/configure-ollama-1.png and /dev/null differ
diff --git a/docs/en-US/images/configure-ollama-2.png b/docs/en-US/images/configure-ollama-2.png
deleted file mode 100644
index 15cb93026..000000000
Binary files a/docs/en-US/images/configure-ollama-2.png and /dev/null differ
diff --git a/docs/en-US/images/configure-ollama-3.png b/docs/en-US/images/configure-ollama-3.png
deleted file mode 100644
index fe1f0c729..000000000
Binary files a/docs/en-US/images/configure-ollama-3.png and /dev/null differ
diff --git a/docs/en-US/images/configure-ollama-4.png b/docs/en-US/images/configure-ollama-4.png
deleted file mode 100644
index 27239233b..000000000
Binary files a/docs/en-US/images/configure-ollama-4.png and /dev/null differ
diff --git a/docs/en-US/images/configure-ollama-5.png b/docs/en-US/images/configure-ollama-5.png
deleted file mode 100644
index ca1da292c..000000000
Binary files a/docs/en-US/images/configure-ollama-5.png and /dev/null differ
diff --git a/docs/en-US/images/configure-ollama-6.png b/docs/en-US/images/configure-ollama-6.png
deleted file mode 100644
index 93aa45c20..000000000
Binary files a/docs/en-US/images/configure-ollama-6.png and /dev/null differ
diff --git a/docs/en-US/images/configure-ollama-7.png b/docs/en-US/images/configure-ollama-7.png
deleted file mode 100644
index c374a37cb..000000000
Binary files a/docs/en-US/images/configure-ollama-7.png and /dev/null differ
diff --git a/docs/en-US/images/dify/aperag-banner.png b/docs/en-US/images/dify/aperag-banner.png
deleted file mode 100644
index 338290248..000000000
Binary files a/docs/en-US/images/dify/aperag-banner.png and /dev/null differ
diff --git a/docs/en-US/images/dify/step1-subscribe-collection.png b/docs/en-US/images/dify/step1-subscribe-collection.png
deleted file mode 100644
index 3a9ede8ad..000000000
Binary files a/docs/en-US/images/dify/step1-subscribe-collection.png and /dev/null differ
diff --git a/docs/en-US/images/dify/step2-add-mcp.png b/docs/en-US/images/dify/step2-add-mcp.png
deleted file mode 100644
index 92eb2154d..000000000
Binary files a/docs/en-US/images/dify/step2-add-mcp.png and /dev/null differ
diff --git a/docs/en-US/images/dify/step2-api-key.png b/docs/en-US/images/dify/step2-api-key.png
deleted file mode 100644
index 555b0b7de..000000000
Binary files a/docs/en-US/images/dify/step2-api-key.png and /dev/null differ
diff --git a/docs/en-US/images/dify/step2-configure-mcp.png b/docs/en-US/images/dify/step2-configure-mcp.png
deleted file mode 100644
index 149787ef9..000000000
Binary files a/docs/en-US/images/dify/step2-configure-mcp.png and /dev/null differ
diff --git a/docs/en-US/images/dify/step2-mcp-success.png b/docs/en-US/images/dify/step2-mcp-success.png
deleted file mode 100644
index 7f91bc07e..000000000
Binary files a/docs/en-US/images/dify/step2-mcp-success.png and /dev/null differ
diff --git a/docs/en-US/images/dify/step3-create-app.png b/docs/en-US/images/dify/step3-create-app.png
deleted file mode 100644
index a41dc7cdf..000000000
Binary files a/docs/en-US/images/dify/step3-create-app.png and /dev/null differ
diff --git a/docs/en-US/images/dify/step3-select-agent.png b/docs/en-US/images/dify/step3-select-agent.png
deleted file mode 100644
index ed3b0c00a..000000000
Binary files a/docs/en-US/images/dify/step3-select-agent.png and /dev/null differ
diff --git a/docs/en-US/images/dify/step4-configure-agent.png b/docs/en-US/images/dify/step4-configure-agent.png
deleted file mode 100644
index ac1ea12af..000000000
Binary files a/docs/en-US/images/dify/step4-configure-agent.png and /dev/null differ
diff --git a/docs/en-US/images/dify/step4-test-agent.png b/docs/en-US/images/dify/step4-test-agent.png
deleted file mode 100644
index 92f90d101..000000000
Binary files a/docs/en-US/images/dify/step4-test-agent.png and /dev/null differ
diff --git a/docs/en-US/images/feishu-qr-code.png b/docs/en-US/images/feishu-qr-code.png
deleted file mode 100644
index 94bc1f297..000000000
Binary files a/docs/en-US/images/feishu-qr-code.png and /dev/null differ
diff --git a/docs/en-US/images/star-history-2025922.png b/docs/en-US/images/star-history-2025922.png
deleted file mode 100644
index 406baf80d..000000000
Binary files a/docs/en-US/images/star-history-2025922.png and /dev/null differ
diff --git a/docs/en-US/integration/_category.yaml b/docs/en-US/integration/_category.yaml
deleted file mode 100644
index a3e10338d..000000000
--- a/docs/en-US/integration/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: Integration
-position: 2
diff --git a/docs/en-US/integration/dify.md b/docs/en-US/integration/dify.md
deleted file mode 100644
index 4d45f656e..000000000
--- a/docs/en-US/integration/dify.md
+++ /dev/null
@@ -1,168 +0,0 @@
----
-title: Integrating ApeRAG with Dify
-description: Quick integration of ApeRAG's Graph RAG capabilities via MCP protocol
-keywords: Dify, ApeRAG, MCP, Graph RAG
----
-
-# Integrating ApeRAG with Dify
-
-ApeRAG is a production-grade RAG platform with multimodal indexing, AI agents, MCP support, and scalable K8s deployment capabilities. It helps users build complex AI applications with **hybrid retrieval**, **multimodal document processing**, and **enterprise-grade management**.
-
-**Core Features**:
-- Unlike "standard" RAG, ApeRAG implements **Graph-RAG**, building knowledge graphs to understand deep relationships between data elements
-- Integrates **MinerU**, designed for complex documents, scientific papers, and financial reports, accurately extracting tables, formulas, and engineering diagrams
-- Full Kubernetes support with built-in **high availability**, **scalability**, and **enterprise-grade management**
-
-## Video Demo
-
-
-
-
-
-## Step 1: Prepare Knowledge Base
-
-Open your ApeRAG web UI (see [Quick Start](../../../README.md#quick-start); with Docker Compose this is typically http://localhost:3000/web/). Sign in and select or import a knowledge base. This walkthrough uses the Romance of the Three Kingdoms example—click **Subscribe**.
-
-
-

-
-
-## Step 2: Configure MCP Server
-
-### 2.1 Add MCP Server
-
-Go to Dify - Tools - MCP, click Add MCP Server.
-
-
-

-
-
-### 2.2 Fill Configuration
-
-Fill in Server URL: `http://localhost:8000/mcp/` (use `https:///mcp/` if ApeRAG is not local), paste your API Key from ApeRAG, then click Confirm.
-
-
-

-
-
-
-

-
-
-### 2.3 Success
-
-MCP Server added successfully.
-
-
-

-
-
-## Step 3: Create Agent Application
-
-### 3.1 Create App
-
-Go to Dify - Studio, click Create Application.
-
-
-

-
-
-### 3.2 Select Type
-
-Click More Basic Application Types, select **Agent** type, name it, and click Create.
-
-
-

-
-
-## Step 4: Configure Agent
-
-Click Agent, input Prompt, add the ApeRAG MCP tool, select the LLM in the top-right corner, click Publish to use.
-
-
-

-
-
-
-

-
-
-### Prompt Reference
-
-```markdown
-# ApeRAG Smart Assistant
-
-You are an advanced AI research assistant powered by ApeRAG's hybrid search capabilities. Your mission is to help users accurately and autonomously find, understand, and synthesize information from knowledge bases and the web.
-
-## Core Behaviors
-
-**Autonomous Research**: Work independently until user queries are fully resolved. Search multiple sources, analyze findings, and provide comprehensive answers without waiting for permission.
-
-**Language Intelligence**: Always respond in the language the user asks in. When users ask in Chinese, respond in Chinese regardless of source language.
-
-**Visual Thinking**: **[Critical]** You are an assistant that prefers visual explanations. For any information involving entity relationships, processes, or structures, you must prioritize visualization.
-
-**Complete Solutions**: Explore from multiple angles, cross-validate sources, and ensure comprehensive coverage before responding.
-
-## Search Strategy
-
-### Priority System
-1. **User-specified knowledge base** (mentioned via "@"): Strictly limit search to specified base
-2. **Unspecified knowledge base**: Autonomously discover and search relevant bases
-3. **Web search** (if enabled): Supplement information
-4. **Clear attribution**: Always cite sources
-
-### Search Execution
-- **Knowledge base search**: Use vector + graph search by default
-- **Result processing logic**:
- 1. Execute search
- 2. **Detect graph data**: Check if search results contain `entities` and `relationships`
- 3. **Mandatory visualization**: If search results contain non-empty entity or relation data, **you must** call the `create_diagram` tool
- 4. **Content filtering**: Ignore irrelevant results
-
-## Available Tools
-
-### Knowledge Management
-- `list_collections()`: Discover available knowledge sources
-- `search_collection(collection_id, query, ...)`: **[Primary tool]** Hybrid search in persistent knowledge bases
-- `search_chat_files(chat_id, query, ...)`: **[Chat only]** Search only files temporarily uploaded in current chat session
-- `create_diagram(content)`: **[Mandatory tool]** When search results contain structured info (entities/relations), must call this tool to generate Mermaid diagrams
-
-### Web Intelligence
-- `web_search(query, ...)`: Multi-engine web search
-- `web_read(url_list, ...)`: Extract and analyze web content
-
-## Response Format
-
-### Direct Answer
-[Clear, actionable answer in user's language]
-
-### Comprehensive Analysis
-[Detailed explanation with context and insights]
-
-### Knowledge Graph Visualization
-[Tool-generated diagram displayed here]
-*(Only show after successfully calling create_diagram. Displays entity relationships from search results.)*
-
-### Supporting Evidence
-- [Knowledge Base Name]: [Key Findings]
-
-**Web Sources** (if enabled):
-- [Title] ([Domain]) - [Key Points]
-```
-
----
-
-Integrating ApeRAG with Dify is very simple. Once integrated, you can not only experience Dify's platform features but also enjoy **ApeRAG's powerful Graph-RAG capabilities**!
-
-**GitHub**: https://github.com/apecloud/ApeRAG
diff --git a/docs/en-US/integration/mcp-api.md b/docs/en-US/integration/mcp-api.md
deleted file mode 100644
index 798b54374..000000000
--- a/docs/en-US/integration/mcp-api.md
+++ /dev/null
@@ -1,333 +0,0 @@
----
-title: MCP API
-description: Model Context Protocol API Documentation
----
-
-# MCP API
-
-ApeRAG provides standardized tool interfaces through [Model Context Protocol (MCP)](https://modelcontextprotocol.io/), allowing AI assistants (Claude Desktop, Cursor, Dify, etc.) to directly access your knowledge bases.
-
-## Quick Start
-
-### Configuration Example
-
-For Claude Desktop, add to configuration file:
-
-```json
-{
- "mcpServers": {
- "aperag": {
- "url": "http://localhost:8000/mcp/",
- "headers": {
- "Authorization": "Bearer your-api-key-here"
- }
- }
- }
-}
-```
-
-### Authentication
-
-Two authentication methods supported (by priority):
-
-1. **HTTP Authorization Header** (Recommended): `Authorization: Bearer your-api-key`
-2. **Environment Variable** (Fallback): `APERAG_API_KEY=your-api-key`
-
-> **Get API Key**: Login to ApeRAG, create or copy your API Key from settings
-
-## Available Tools
-
-### 1. list_collections
-
-List all accessible knowledge bases.
-
-**Parameters**: None
-
-**Returns**:
-```json
-{
- "items": [
- {
- "id": "collection-id",
- "title": "Collection Title",
- "description": "Collection Description"
- }
- ]
-}
-```
-
-### 2. search_collection
-
-Search in knowledge bases with multiple retrieval methods.
-
-**Core Parameters**:
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `collection_id` | string | Required | Knowledge base ID |
-| `query` | string | Required | Search query |
-| `use_vector_index` | bool | true | Vector retrieval (semantic search) |
-| `use_fulltext_index` | bool | true | Full-text retrieval (keyword matching) |
-| `use_graph_index` | bool | true | Graph retrieval (relation query) |
-| `use_summary_index` | bool | true | Summary retrieval |
-| `use_vision_index` | bool | true | Vision retrieval (image search) |
-| `rerank` | bool | true | AI reranking |
-| `topk` | int | 5 | Results per method |
-
-**Return Format**:
-```json
-{
- "query": "your question",
- "items": [
- {
- "rank": 1,
- "score": 0.95,
- "content": "relevant content",
- "source": "document name",
- "recall_type": "vector_search|graph_search|fulltext_search|summary_search",
- "metadata": {
- "page_idx": 0,
- "document_id": "doc-id",
- "collection_id": "col-id",
- "indexer": "text|vision"
- }
- }
- ]
-}
-```
-
-**Image Handling**:
-
-If `metadata.indexer == "vision"`, it's an image:
-- Empty `content`: Retrieved via multimodal vector
-- Non-empty `content`: Contains image description
-
-Image URL format:
-```python
-m = item.metadata
-asset_url = f"asset://{m['asset_id']}?document_id={m['document_id']}&collection_id={m['collection_id']}&mime_type={m['mimetype']}"
-```
-
-**Usage Examples**:
-
-```python
-# Default search (recommended) - all methods enabled
-results = search_collection(
- collection_id="abc123",
- query="How to deploy applications?"
-)
-
-# Vector + Graph only
-results = search_collection(
- collection_id="abc123",
- query="deployment strategies",
- use_vector_index=True,
- use_fulltext_index=False,
- use_graph_index=True,
- use_summary_index=False,
- topk=10
-)
-```
-
-### 3. search_chat_files
-
-Search in temporary files from chat session.
-
-**When to Use**:
-- ✅ User uploaded files in current conversation
-- ✅ Analyzing temporary documents in chat
-- ❌ Don't use for persistent knowledge bases (use `search_collection`)
-
-**Parameters**:
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `chat_id` | string | Required | Chat ID |
-| `query` | string | Required | Search query |
-| `use_vector_index` | bool | true | Vector retrieval |
-| `use_fulltext_index` | bool | true | Full-text retrieval |
-| `rerank` | bool | true | Reranking |
-| `topk` | int | 5 | Results count |
-
-**Return Format**: Same as `search_collection`
-
-### 4. web_search
-
-Search the internet.
-
-**Parameters**:
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `query` | string | "" | Search keywords |
-| `max_results` | int | 5 | Results count |
-| `source` | string | "" | Specific domain (e.g., `vercel.com`) |
-| `timeout` | int | 30 | Timeout (seconds) |
-| `locale` | string | "en-US" | Language locale |
-
-**Usage Patterns**:
-
-```python
-# Regular search
-web_search(query="ApeRAG 2025")
-
-# Site-specific search
-web_search(query="deployment docs", source="vercel.com")
-
-```
-
-### 5. web_read
-
-Read webpage content.
-
-**Parameters**:
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `url_list` | list[str] | Required | URL list |
-| `timeout` | int | 30 | Timeout (seconds) |
-| `max_concurrent` | int | 5 | Max concurrent requests |
-
-**Returns**:
-```json
-{
- "results": [
- {
- "status": "success",
- "url": "https://example.com",
- "title": "Page Title",
- "content": "Extracted text",
- "word_count": 1234
- }
- ]
-}
-```
-
-**Example**:
-```python
-# Read single page
-web_read(url_list=["https://example.com/article"])
-
-# Batch read
-web_read(
- url_list=["https://example.com/page1", "https://example.com/page2"],
- max_concurrent=2
-)
-```
-
-## Practical Examples
-
-### Example 1: Knowledge Base Q&A
-
-```python
-# 1. List all knowledge bases
-collections = list_collections()
-
-# 2. Select a knowledge base
-collection_id = collections.items[0].id
-
-# 3. Search (all methods enabled by default)
-results = search_collection(
- collection_id=collection_id,
- query="How to optimize performance?"
-)
-
-# 4. Process results
-for item in results.items:
- print(f"[{item.recall_type}] {item.content}")
- print(f"Source: {item.source}, Score: {item.score}\n")
-```
-
-### Example 2: Graph Visualization
-
-```python
-# Search with graph retrieval
-results = search_collection(
- collection_id="abc123",
- query="relationship between Liu Bei and Zhuge Liang",
- use_graph_index=True
-)
-
-# Check for graph data
-if results.graph_search and results.graph_search.entities:
- print("Entities:", results.graph_search.entities)
- print("Relationships:", results.graph_search.relationships)
- # Use this data to generate knowledge graph visualization
-```
-
-### Example 3: Hybrid Search (KB + Web)
-
-```python
-# 1. Search web
-web_results = web_search(query="latest AI developments", max_results=3)
-urls = [r.url for r in web_results.results]
-
-# 2. Read web content
-web_content = web_read(url_list=urls)
-
-# 3. Search internal KB
-kb_results = search_collection(
- collection_id="ai-knowledge",
- query="AI development trends"
-)
-
-# 4. Synthesize
-print("=== Web Results ===")
-for r in web_results.results:
- print(f"{r.title}: {r.url}")
-
-print("\n=== Internal Knowledge ===")
-for item in kb_results.items:
- print(f"{item.content[:100]}...")
-```
-
-## Best Practices
-
-### Performance Tips
-
-1. **Reasonable topk**:
- - Too large increases LLM context consumption
- - Too small may miss important information
- - Recommended: 5-10
-
-2. **Selective Retrieval**:
- - Not all queries need full-text search
- - Full-text may return large amounts of text
- - Choose methods based on query type
-
-3. **Timeout Settings**:
- - Graph retrieval may be slow (default 120s)
- - Web search: 30-60s recommended
- - Batch URL read: 60s+ recommended
-
-### Common Issues
-
-**Q: No search results?**
-- Check if collection ID is correct
-- Confirm knowledge base indexing is complete
-- Try different retrieval method combinations
-
-**Q: Graph data empty?**
-- Confirm knowledge base has Graph index enabled
-- Simple documents may not contain obvious entity relationships
-
-**Q: Images not showing?**
-- Check `metadata.indexer == "vision"`
-- Use `asset://` protocol for URL
-- Ensure all required parameters included (asset_id, document_id, collection_id)
-
-## Tool Comparison
-
-| Tool | Purpose | Use Case |
-|------|---------|----------|
-| `list_collections` | List knowledge bases | See available resources |
-| `search_collection` | Search knowledge base | Primary search tool for persistent knowledge |
-| `search_chat_files` | Search chat files | Analyze temporary files uploaded in conversation |
-| `web_search` | Search internet | Get real-time or external information |
-| `web_read` | Read webpage | Extract full webpage content |
-
-## Related Links
-
-- **MCP Protocol**: https://modelcontextprotocol.io/
-- **ApeRAG GitHub**: https://github.com/apecloud/ApeRAG
-- **API Docs**: http://localhost:8000/docs (local deployment)
diff --git a/docs/en-US/jaeger-tracing.md b/docs/en-US/jaeger-tracing.md
deleted file mode 100644
index 9a6cfa8b2..000000000
--- a/docs/en-US/jaeger-tracing.md
+++ /dev/null
@@ -1,212 +0,0 @@
-# Jaeger Distributed Tracing for ApeRAG
-
-ApeRAG includes OpenTelemetry integration with Jaeger support for distributed tracing, allowing you to monitor and analyze request flows across your services.
-
-## Quick Start
-
-### 1. Enable Jaeger in Docker Compose
-
-To start ApeRAG with Jaeger enabled:
-
-```bash
-# Start infrastructure with Jaeger
-make infra-up WITH_JAEGER=1
-
-# Or start the full application stack with Jaeger
-make stack-up WITH_JAEGER=1
-```
-
-Alternatively, you can use docker-compose directly:
-
-```bash
-# Start with Jaeger profile
-docker-compose --profile jaeger up -d
-
-# Or start specific services
-docker-compose up -d jaeger postgres redis qdrant es
-```
-
-### 2. Enable Jaeger Tracing in Environment
-
-Set the following environment variables:
-
-```bash
-JAEGER_ENABLED=True
-JAEGER_ENDPOINT=http://aperag-jaeger:14268/api/traces # Docker environment
-# or
-JAEGER_ENDPOINT=http://localhost:14268/api/traces # Local development
-```
-
-### 3. Access Jaeger UI
-
-Once Jaeger is running, access the web interface at:
-
-- **Jaeger UI**: http://localhost:16686
-
-## Configuration
-
-### Environment Variables
-
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `OTEL_ENABLED` | `True` | Enable/disable OpenTelemetry tracing |
-| `OTEL_SERVICE_NAME` | `aperag` | Service name in traces |
-| `OTEL_SERVICE_VERSION` | `1.0.0` | Service version in traces |
-| `JAEGER_ENABLED` | `False` | Enable/disable Jaeger exporter |
-| `JAEGER_ENDPOINT` | - | Jaeger collector endpoint |
-| `OTEL_CONSOLE_ENABLED` | `False` | Enable console span output |
-| `OTEL_FASTAPI_ENABLED` | `True` | Enable FastAPI instrumentation |
-| `OTEL_SQLALCHEMY_ENABLED` | `True` | Enable SQLAlchemy instrumentation |
-| `OTEL_MCP_ENABLED` | `True` | Enable MCP agent trace injection |
-
-### Jaeger Ports
-
-| Port | Service | Description |
-|------|---------|-------------|
-| 16686 | Web UI | Jaeger query and visualization interface |
-| 14268 | HTTP | HTTP collector for receiving spans |
-| 14250 | gRPC | gRPC collector for receiving spans |
-| 6831 | UDP | UDP agent port (legacy) |
-| 6832 | UDP | UDP agent port (legacy) |
-
-## What Gets Traced
-
-ApeRAG automatically instruments:
-
-1. **HTTP Requests** (FastAPI)
- - All API endpoints
- - Request/response details
- - Error tracking
-
-2. **Database Operations** (SQLAlchemy)
- - SQL queries
- - Database connection info
- - Query performance
-
-3. **MCP Agent Events**
- - Agent interactions
- - Tool usage
- - Session tracking
-
-4. **Custom Application Spans**
- - LLM API calls
- - Document processing
- - Graph operations
-
-## Using Traces
-
-### 1. Find Traces by Service
-
-In Jaeger UI:
-- Select "aperag" from the Service dropdown
-- Choose an operation (e.g., "GET /api/v1/collections")
-- Click "Find Traces"
-
-### 2. Analyze Request Flow
-
-Each trace shows:
-- **Timeline**: Visual representation of span duration
-- **Service Map**: How services interact
-- **Error Detection**: Failed operations highlighted in red
-- **Performance**: Identify slow operations
-
-### 3. Debugging
-
-For debugging specific issues:
-- Search traces by operation name
-- Filter by tags (user_id, collection_id, etc.)
-- Look for error spans (marked in red)
-- Examine span logs for detailed error information
-
-## Development Usage
-
-### Local Development
-
-For local development without Docker:
-
-```bash
-# Start Jaeger locally
-docker run -d --name jaeger \
- -p 16686:16686 \
- -p 14268:14268 \
- jaegertracing/all-in-one:1.60
-
-# Set environment variables
-export JAEGER_ENABLED=True
-export JAEGER_ENDPOINT=http://localhost:14268/api/traces
-
-# Start your application
-make serve-api
-```
-
-### Custom Tracing
-
-Add custom spans to your code:
-
-```python
-from aperag.trace import get_tracer, create_span
-
-tracer = get_tracer(__name__)
-
-# Synchronous function
-with create_span(tracer, "my_operation", custom_attr="value"):
- # Your code here
- pass
-
-# Or use decorators
-from aperag.trace import trace_async_function
-
-@trace_async_function("custom_operation")
-async def my_async_function():
- # This function will be automatically traced
- pass
-```
-
-## Production Considerations
-
-### Performance Impact
-
-- Tracing adds minimal overhead (~1-5ms per request)
-- Sampling can be configured to reduce load
-- Consider disabling console output in production
-
-### Data Retention
-
-- Jaeger stores traces in memory by default
-- For production, consider using persistent storage:
- - Elasticsearch backend
- - Cassandra backend
- - Kafka for high-throughput scenarios
-
-### Security
-
-- Jaeger UI has no built-in authentication
-- Consider putting it behind a reverse proxy with auth
-- Traces may contain sensitive data - review what gets traced
-
-## Troubleshooting
-
-### Jaeger Not Receiving Traces
-
-1. Check if Jaeger is running: `docker ps | grep jaeger`
-2. Verify endpoint configuration: `echo $JAEGER_ENDPOINT`
-3. Check application logs for tracing errors
-4. Ensure `OTEL_ENABLED=True` and `JAEGER_ENABLED=True`
-
-### Missing Spans
-
-1. Verify instrumentation is enabled for the component
-2. Check if the operation is being captured
-3. Look for exceptions in span creation
-
-### Performance Issues
-
-1. Disable console output: `OTEL_CONSOLE_ENABLED=False`
-2. Configure sampling rates if needed
-3. Monitor Jaeger resource usage
-
-## Resources
-
-- [Jaeger Documentation](https://www.jaegertracing.io/docs/)
-- [OpenTelemetry Python](https://opentelemetry.io/docs/instrumentation/python/)
-- [FastAPI OpenTelemetry](https://opentelemetry-python-contrib.readthedocs.io/en/latest/instrumentation/fastapi/fastapi.html)
\ No newline at end of file
diff --git a/docs/en-US/reference/HOW-TO-DEBUG.md b/docs/en-US/reference/HOW-TO-DEBUG.md
deleted file mode 100644
index 8e831744b..000000000
--- a/docs/en-US/reference/HOW-TO-DEBUG.md
+++ /dev/null
@@ -1,51 +0,0 @@
-# ApeRAG Project PyCharm Debugging Guide
-
-This document details how to debug two core components of the ApeRAG project in PyCharm: the Celery asynchronous task service and the web backend service.
-
----
-
-## 1. Debugging Celery Tasks
-
-### 1.1 Create Debug Configuration
-1. Go to PyCharm's top menu: **Run > Edit Configurations**
-2. Click **+** and select **Python**
-3. Configure with the following parameters:
-
-**Core Settings:**
-```python
-Name: celery
-Python interpreter: [uv virtual environment path]
- # Get via: `readlink .venv/bin/python`
-Script path: [Celery executable path]
- # Get via: `which celery` in terminal
-Parameters: -A config.celery worker -l INFO --pool=solo
- # Critical parameter: --pool=solo (enables single-process mode for debugging)
-Environment variables:
- PYTHONUNBUFFERED=1;PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
-```
-
-
-
----
-
-## 2. Debugging Web Backend
-
-### 2.1 Create Debug Configuration
-1. Go to **Run > Edit Configurations**
-2. Click **+** and select **Python**
-3. Configure with the following parameters:
-
-**Core Settings:**
-```python
-Name: backend
-Python interpreter: [Same uv virtual environment as Celery]
-Script path: [uvicorn executable path]
- # Get via: `which uvicorn`
-Parameters: aperag.app:app --host 0.0.0.0 --log-config scripts/uvicorn-log-config.yaml
-Environment variables:
- PYTHONUNBUFFERED=1;
- DJANGO_SETTINGS_MODULE=config.settings;
-```
-
-
-
diff --git a/docs/en-US/reference/how-to-configure-ollama.md b/docs/en-US/reference/how-to-configure-ollama.md
deleted file mode 100644
index 491b48084..000000000
--- a/docs/en-US/reference/how-to-configure-ollama.md
+++ /dev/null
@@ -1,70 +0,0 @@
-# How to Configure Local Ollama with ApeRAG
-
-This guide shows how to configure local Ollama models in your ApeRAG deployment.
-
-## Prerequisites
-
-- ApeRAG running locally
-- [Ollama](https://ollama.ai/) installed and running locally
-- Ollama models downloaded
-
-## Step 1: Add Ollama Provider
-
-Navigate to **Settings > Models** in your ApeRAG interface and click **"Add Provider"**.
-
-Enter a provider name (e.g., "local-ollama") and set the **Base URL** to: `http://localhost:11434/v1`
-
-Click **Save**.
-
-
-
-## Step 2: Add Ollama Models
-
-Click the **three dots** on the right side of your newly created provider and select **"Models"** to enter the model management page.
-
-Click **"Add Model"** and configure:
-- **Model Name**: Enter your model name (e.g., `gpt-oss:20b`)
-- **Model Type**: Select `Completion`
-- **LLM Provider**: Select `openai` (Ollama is OpenAI-compatible)
-
-Click **Save**.
-
-
-
-
-
-## Step 3: Enable Model Usage
-
-You'll notice each model has two toggle switches: **Agent** and **Collection**. You can enable both:
-
-- **Agent**: Allows the model to be used for answering questions
-- **Collection**: Allows the model to be used when building Collection indexes
-
-
-
-
-## Step 4: Enable Ollama Provider
-
-Return to the Providers page and click the **three dots** on the right side of your Ollama provider, then select **"Enable"**.
-
-When prompted for an API key, enter any random string since Ollama is self-hosted and doesn't require actual authentication.
-
-
-
-
-Your Ollama models should now appear in the models list, ready for use.
-
-
-
-## Usage
-
-Once configured, your local Ollama models will be available:
-
-- **For Collections**: Select Ollama models in LLM settings when creating or configuring collections
-- **For Chat**: Choose Ollama models in the chat interface for conversations
-
-
-
-
-
-Your local Ollama models are now ready to use with ApeRAG!
diff --git a/docs/zh-CN/design/agent-backend.md b/docs/zh-CN/design/agent-backend.md
deleted file mode 100644
index 15162d3d6..000000000
--- a/docs/zh-CN/design/agent-backend.md
+++ /dev/null
@@ -1,529 +0,0 @@
-# ApeRAG Agent 后端接口设计方案
-
-## 1. 设计概述
-
-基于现有ApeRAG项目架构,为Agent功能设计一套完整的后端接口系统。Agent作为一个独立的智能对话助手,需要支持Web搜索、模型切换等功能,并提供流畅的对话体验。集成现有的MCP接口进行collection搜索和管理。
-
-## 2. 接口架构设计
-
-### 2.1 接口路径规划
-
-根据现有API设计模式,分为两大模块:
-
-```
-/api/v1/agent/
-└── chats/ # Agent对话管理
-
-/api/v1/web/
-├── search # Web搜索
-└── read # Web内容读取
-```
-
-### 2.2 数据流架构
-
-```
-Frontend → Agent API → Agent Service → [
- MCP Service (collection搜索,由Agent后端调用)
- Web Service (网络搜索和内容读取)
- LLM Service (模型调用)
- Chat Service (对话历史)
-]
-```
-
-## 3. 接口详细设计
-
-### 3.1 Agent对话管理接口
-
-#### 3.1.1 创建Agent对话
-```
-POST /api/v1/agent/chats
-```
-
-**请求体**:
-```json
-{
- "title": "新对话" // 可选,默认自动生成
-}
-```
-
-**响应**:
-```json
-{
- "id": "chat_12345",
- "title": "新对话",
- "created": "2025-01-07T10:00:00Z",
- "updated": "2025-01-07T10:00:00Z"
-}
-```
-
-#### 3.1.2 获取对话列表
-```
-GET /api/v1/agent/chats
-```
-
-**响应**:
-```json
-{
- "items": [
- {
- "id": "chat_12345",
- "title": "技术问题讨论",
- "created": "2025-01-07T10:00:00Z",
- "updated": "2025-01-07T10:00:00Z"
- }
- ]
-}
-```
-
-#### 3.1.3 获取对话详情
-```
-GET /api/v1/agent/chats/{chat_id}
-```
-
-**响应**:
-```json
-{
- "id": "chat_12345",
- "title": "技术问题讨论",
- "created": "2025-01-07T10:00:00Z",
- "updated": "2025-01-07T10:00:00Z",
- "messages": [
- {
- "id": "msg_67890",
- "type": "user",
- "content": "请介绍一下ApeRAG的架构",
- "web_search_enabled": false,
- "model_used": "claude-3-5-sonnet",
- "timestamp": "2025-01-07T10:01:00Z"
- },
- {
- "id": "msg_67891",
- "type": "assistant",
- "content": "ApeRAG是一个...",
- "sources": [
- {
- "collection_id": "col_1",
- "collection_name": "技术文档",
- "score": 0.95,
- "text": "相关文档内容...",
- "metadata": {"source": "doc1.pdf"}
- }
- ],
- "web_search_results": null,
- "model_used": "claude-3-5-sonnet",
- "timestamp": "2025-01-07T10:01:05Z"
- }
- ]
-}
-```
-
-#### 3.1.4 发送消息
-```
-POST /api/v1/agent/chats/{chat_id}/messages
-```
-
-**请求体**:
-```json
-{
- "content": "请介绍一下ApeRAG的架构",
- "model_id": "claude-3-5-sonnet", // 可选,使用默认模型
- "web_search_enabled": false, // 可选,默认false
- "stream": true // 可选,是否流式响应
-}
-```
-
-**流式响应**(Server-Sent Events):
-```
-data: {"type": "start", "message_id": "msg_67890"}
-
-data: {"type": "content", "content": "ApeRAG是一个"}
-
-data: {"type": "content", "content": "强大的"}
-
-data: {"type": "sources", "sources": [...]}
-
-data: {"type": "end", "message_id": "msg_67890"}
-```
-
-**非流式响应**:
-```json
-{
- "id": "msg_67890",
- "content": "ApeRAG是一个强大的RAG系统...",
- "sources": [
- {
- "collection_id": "col_1",
- "collection_name": "技术文档",
- "score": 0.95,
- "text": "相关文档内容...",
- "metadata": {"source": "doc1.pdf"}
- }
- ],
- "web_search_results": null,
- "model_used": "claude-3-5-sonnet",
- "created": "2025-01-07T10:01:05Z"
-}
-```
-
-### 3.2 Web服务接口
-
-#### 3.2.1 Web搜索接口
-
-**HTTP接口**:
-```
-POST /api/v1/web/search
-```
-
-**MCP工具**:
-```
-web_search(query, max_results, search_engine, ...)
-```
-
-参考[JINA Reader API](https://jina.ai/reader)的`s.jina.ai`设计。
-
-**请求体**:
-```json
-{
- "query": "ApeRAG 2025年最新发展",
- "max_results": 5, // 可选,默认5
- "search_engine": "google", // 可选,默认google
- "timeout": 30, // 可选,超时时间(秒)
- "locale": "zh-CN" // 可选,浏览器语言
-}
-```
-
-**响应**:
-```json
-{
- "query": "ApeRAG 2025年最新发展",
- "results": [
- {
- "rank": 1,
- "title": "ApeRAG 2025年技术路线图",
- "url": "https://example.com/aperag-2025-roadmap",
- "snippet": "ApeRAG在2025年将重点发展...",
- "domain": "example.com",
- "timestamp": "2025-01-01T00:00:00Z"
- }
- ],
- "search_engine": "google",
- "total_results": 1250,
- "search_time": 1.2
-}
-```
-
-#### 3.2.2 Web内容读取接口
-
-**HTTP接口**:
-```
-POST /api/v1/web/read
-```
-
-**MCP工具**:
-```
-web_read(urls, timeout, ...)
-```
-
-参考[JINA Reader API](https://jina.ai/reader)的`r.jina.ai`设计。
-
-**请求体**:
-```json
-{
- "urls": [ // 支持单个URL字符串或URL数组
- "https://example.com/aperag-2025-roadmap",
- "https://example.com/another-page"
- ],
- "timeout": 30, // 可选,超时时间
- "css_selector": null, // 可选,CSS选择器,提取特定内容
- "wait_for_selector": null, // 可选,等待选择器,适用于SPA页面
- "exclude_selector": null, // 可选,排除选择器,去掉广告等无用内容
- "bypass_cache": false, // 可选,绕过缓存,获取最新内容
- "locale": "zh-CN", // 可选,浏览器语言
- "max_concurrent": 3 // 可选,最大并发数(仅对多个URL有效)
-}
-```
-
-**响应**:
-```json
-{
- "results": [
- {
- "url": "https://example.com/aperag-2025-roadmap",
- "status": "success",
- "title": "ApeRAG 2025年技术路线图",
- "content": "# ApeRAG 2025年技术路线图\n\nApeRAG在2025年将...",
- "extracted_at": "2025-01-07T10:01:00Z",
- "word_count": 1250,
- "token_count": 3200
- },
- {
- "url": "https://example.com/another-page",
- "status": "error",
- "error": "页面无法访问",
- "error_code": "TIMEOUT"
- }
- ],
- "total_urls": 2,
- "successful": 1,
- "failed": 1,
- "processing_time": 5.2
-}
-```
-
-## 4. 前后端交互流程
-
-### 4.1 对话创建流程
-
-```mermaid
-sequenceDiagram
- participant F as Frontend
- participant A as Agent API
- participant S as Agent Service
- participant D as Database
-
- F->>A: POST /api/v1/agent/chats
- A->>S: agent_service.create_chat()
- S->>D: 创建对话记录
- D-->>S: 返回chat_id
- S-->>A: 返回chat对象
- A-->>F: 返回JSON响应
-```
-
-### 4.2 消息发送流程
-
-```mermaid
-sequenceDiagram
- participant F as Frontend
- participant A as Agent API
- participant S as Agent Service
- participant MCP as MCP Service
- participant WS as Web Service
- participant LLM as LLM Service
-
- F->>A: POST /api/v1/agent/chats/{chat_id}/messages
- A->>S: agent_service.send_message()
-
- Note over S: Agent后端智能选择collections
- S->>MCP: 调用MCP搜索接口
- MCP-->>S: 返回搜索结果
-
- alt 如果启用web搜索
- S->>WS: 调用Web搜索/读取接口
- WS-->>S: 返回web结果
- end
-
- S->>LLM: 调用LLM生成回答
- LLM-->>S: 返回流式响应
- S-->>A: 返回流式响应
- A-->>F: SSE流式响应
-```
-
-### 4.3 Web服务调用流程
-
-```mermaid
-sequenceDiagram
- participant Client as 客户端
- participant HTTP as HTTP API
- participant MCP as MCP Tools
- participant WS as Web Service
- participant JINA as JINA API (初期)
-
- alt HTTP调用
- Client->>HTTP: POST /api/v1/web/search
- HTTP->>WS: 执行搜索
- WS->>JINA: 调用JINA API
- JINA-->>WS: 返回结果
- WS-->>HTTP: 格式化响应
- HTTP-->>Client: 返回标准格式
- else MCP调用
- Client->>MCP: web_search(query, ...)
- MCP->>WS: 执行搜索
- WS->>JINA: 调用JINA API
- JINA-->>WS: 返回结果
- WS-->>MCP: 格式化响应
- MCP-->>Client: 返回MCP格式
- end
-```
-
-## 5. 数据模型设计
-
-### 5.1 Agent对话模型
-
-```python
-class AgentChat:
- id: str
- title: str
- created: datetime
- updated: datetime
-
-class AgentMessage:
- id: str
- chat_id: str
- type: Literal["user", "assistant"]
- content: str
- model_used: str
- web_search_enabled: bool
- sources: List[SearchSource]
- web_search_results: List[WebSearchResult]
- created: datetime
-```
-
-### 5.2 Web服务模型
-
-```python
-class WebSearchRequest:
- query: str
- max_results: Optional[int] = 5
- search_engine: Optional[str] = "google"
- include_content: Optional[bool] = False
- timeout: Optional[int] = 30
- format: Optional[str] = "markdown"
- # ... 其他JINA参数
-
-class WebReadRequest:
- urls: Union[str, List[str]]
- format: Optional[str] = "markdown"
- timeout: Optional[int] = 30
- max_concurrent: Optional[int] = 3
- # ... 其他JINA参数
-
-class WebResult:
- url: str
- status: Literal["success", "error"]
- title: Optional[str]
- content: Optional[str]
- format: str
- word_count: Optional[int]
- token_count: Optional[int]
- images: Optional[List[WebImage]]
- links: Optional[List[WebLink]]
- error: Optional[str]
- error_code: Optional[str]
-```
-
-## 6. 实现策略
-
-### 6.1 分阶段实现
-
-**第一阶段:JINA集成**
-- 直接集成[JINA Reader API](https://jina.ai/reader)作为后端
-- 提供标准化的HTTP和MCP接口
-- 支持JINA的主要参数和功能
-
-**第二阶段:自研实现**
-- 实现自己的web搜索引擎集成
-- 实现自己的网页内容提取
-- 逐步替换JINA依赖
-
-**第三阶段:优化增强**
-- 添加缓存机制
-- 实现智能内容摘要
-- 支持更多搜索引擎
-
-### 6.2 接口兼容性
-
-无论使用JINA还是自研实现,都保持相同的接口格式,确保:
-- HTTP API接口不变
-- MCP工具接口不变
-- 响应格式保持一致
-- 参数含义保持兼容
-
-## 7. 错误处理设计
-
-### 7.1 标准错误响应格式
-
-```json
-{
- "error": "WEB_SEARCH_FAILED",
- "message": "网络搜索服务暂时不可用",
- "details": {
- "search_engine": "google",
- "retry_after": 30
- }
-}
-```
-
-### 7.2 常见错误码
-
-| 错误码 | HTTP状态码 | 描述 |
-|--------|-----------|------|
-| CHAT_NOT_FOUND | 404 | 对话不存在 |
-| MODEL_NOT_AVAILABLE | 400 | 模型不可用 |
-| WEB_SEARCH_FAILED | 500 | Web搜索失败 |
-| WEB_READ_FAILED | 500 | 网页读取失败 |
-| URL_NOT_ACCESSIBLE | 400 | URL无法访问 |
-| QUOTA_EXCEEDED | 429 | 超过配额限制 |
-| INVALID_URL_FORMAT | 400 | URL格式错误 |
-| TIMEOUT_ERROR | 408 | 请求超时 |
-
-## 8. 性能优化设计
-
-### 8.1 缓存策略
-
-- **Web搜索结果缓存**:相同查询的搜索结果缓存1小时
-- **Web页面内容缓存**:页面内容缓存6小时,支持ETags
-- **模型配置缓存**:缓存用户可用的模型列表
-
-### 8.2 异步处理
-
-- **流式响应**:使用异步生成器实现流式输出
-- **并发Web读取**:支持批量并发读取多个页面
-- **超时控制**:所有外部请求都有超时限制
-
-### 8.3 限流和配额
-
-- **用户级限流**:每个用户每分钟最多10次web搜索
-- **IP级限流**:防止滥用
-- **内容大小限制**:单个页面内容最大5MB
-
-## 9. 安全性设计
-
-### 9.1 认证授权
-
-- **API认证**:支持Bearer Token和Cookie认证
-- **权限控制**:用户只能访问自己的对话
-- **API限流**:防止接口滥用
-
-### 9.2 数据安全
-
-- **URL验证**:验证URL格式和域名白名单
-- **内容过滤**:过滤恶意内容和敏感信息
-- **XSS防护**:对Web内容进行sanitization
-
-## 10. 实现优先级
-
-### 10.1 第一阶段(核心功能)
-
-1. Agent对话管理接口
-2. 基础消息发送(非流式)
-3. MCP接口集成(collection搜索)
-4. Web服务接口(HTTP + MCP)
-5. JINA Reader API集成
-
-### 10.2 第二阶段(高级功能)
-
-1. 流式响应支持
-2. 缓存优化
-3. 错误处理完善
-4. 自研Web服务开发
-
-### 10.3 第三阶段(优化功能)
-
-1. 高级错误处理
-2. 性能监控
-3. 审计日志集成
-4. 智能内容摘要
-
-## 11. 总结
-
-这个设计方案基于现有ApeRAG架构,通过新增Agent专用接口和独立的Web服务来支持智能对话功能。主要特点:
-
-1. **架构兼容**:复用现有的service层和数据模型
-2. **MCP集成**:通过MCP接口进行collection搜索,Agent后端智能选择
-3. **Web服务独立**:参考[JINA Reader API](https://jina.ai/reader)设计,提供HTTP和MCP双接口
-4. **参数兼容**:完整支持JINA的参数体系,便于初期集成
-5. **渐进替换**:先用JINA实现,后续自研替换,接口保持兼容
-6. **扩展性强**:接口设计支持未来功能扩展
-7. **性能优化**:考虑缓存、异步处理等性能优化
-
-前端可以通过这些接口实现类似Cursor的对话体验,用户可以轻松切换模型,并获得智能的搜索和问答服务。Agent后端会智能调用MCP接口进行collection搜索,同时Web服务提供强大的网络信息获取能力。
\ No newline at end of file
diff --git a/docs/zh-CN/design/agent_runtime_v3.md b/docs/zh-CN/design/agent_runtime_v3.md
deleted file mode 100644
index c841f42e5..000000000
--- a/docs/zh-CN/design/agent_runtime_v3.md
+++ /dev/null
@@ -1,568 +0,0 @@
----
-title: Agent Runtime V3 设计方案
-description: ApeRAG Agent Runtime V3 的详细设计文档,定义新的 Turn、TimelineEvent、Artifact、SSE 协议与迁移边界
-keywords: Agent Runtime, SSE, Turn, TimelineEvent, Artifact, PydanticAI, MCP
-position: 4
----
-
-# ApeRAG Agent Runtime V3 设计方案
-
-## 1. 背景与目标
-
-当前 ApeRAG 的 agent chat 主链路基于 `mcp-agent` 运行。它已经证明“能工作”,但对 ApeRAG 的长期产品目标并不合适:
-
-- 当前最脆弱的部分不是业务 API,而是 runtime 胶水层
-- 事件分发、流式输出、前后端消息格式、会话缓存、工具结果回推都与第三方 runtime 的内部语义耦合过深
-- 这类耦合会直接转化成私有化交付后的维护成本、排障成本和答疑成本
-
-因此,这次设计的目标不是“换一个更强的 agent 框架”,而是:
-
-1. 重做 ApeRAG 的 `agent 产品层`
-2. 保持 ApeRAG 的 `业务能力面` 稳定
-3. 建立一个更适合 `私有化部署`、`简单可靠`、`低维护成本`、`低答疑成本` 的单 agent runtime
-
-本设计的约束前提如下:
-
-- 优先面向 `私有化交付`
-- 假设客户普遍缺乏强运维能力
-- 功能可以保守,但默认行为必须稳健
-- agent 层允许整体切换,不为旧 WebSocket grammar、旧 Redis message shape、旧 message part 语义做长期兼容
-
-## 2. 设计结论
-
-Agent Runtime V3 的正式结论如下:
-
-1. `mcp-agent` 不再作为 ApeRAG 的长期核心 runtime
-2. `FastAPI + FastMCP + 现有业务 API/provider` 继续作为稳定业务面保留
-3. 主传输协议统一为 `SSE`
-4. 核心产品契约统一为 `Turn + TimelineEvent + Artifact`
-5. 第一阶段 runtime 实现使用 `PydanticAI adapter`
-6. 长期保留进一步收敛到 `自研薄编排层` 的空间
-7. 不把核心 runtime 迁移到 `Vercel AI SDK`、`OpenAI Agents SDK` 或 `LangGraph`
-
-这意味着:外部库只负责帮助实现,不再定义 ApeRAG 的产品语义。
-
-## 3. 设计原则
-
-### 3.1 私有化优先
-
-所有设计优先满足以下诉求:
-
-- 默认配置可运行
-- 失败行为可诊断
-- 升级和回退边界清晰
-- 少依赖隐式前提
-- 少引入双栈和兼容包袱
-
-### 3.2 产品契约归 ApeRAG 自己拥有
-
-新的对外契约由 ApeRAG 自己定义,包括:
-
-- API 入口
-- SSE 事件流
-- 前端可见状态词表
-- TimelineEvent schema
-- Artifact schema
-- History commit policy
-
-第三方 runtime 只能适配这套契约,不能反向决定它。
-
-### 3.3 最终回答与过程事件彻底分离
-
-最终 answer、运行过程、references、tool result 不再混塞进一条 assistant message。
-
-分层原则如下:
-
-- `answer` 是 answer artifact
-- `timeline` 是过程事件流
-- `references` 是独立 artifact
-- `tool result` 通过摘要事件 + artifact 引用暴露
-
-### 3.4 明确缩小 Phase 1 能力边界
-
-Phase 1 只支持:
-
-- 单 agent
-- 串行 tool loop
-- 单 MCP server 视图
-- 单个 turn 内多轮 internal loop
-
-Phase 1 不支持:
-
-- 多 agent
-- 并发 tool fan-out
-- workflow/graph orchestration
-- 长任务编排
-
-## 4. 核心对象模型
-
-## 4.1 Turn
-
-`Turn` 表示一次用户 query 对应的一次完整 agent execution。
-
-需要特别澄清的是:
-
-- 一个 turn 不是“一步回答”
-- 一个 turn 内允许多次 thinking、多次 web search、多次 tool call、多次读取结果和多轮内部推理
-- turn 是外层执行边界,不是内部 loop 次数的限制器
-
-### 4.1.1 设计目标
-
-Turn 统一承担以下职责:
-
-- 幂等边界
-- 取消边界
-- 超时边界
-- 恢复边界
-- 最终历史提交边界
-- 回放与评测边界
-
-### 4.1.2 建议字段
-
-```text
-schema_version
-turn_id
-chat_id
-user_id
-request_id
-client_idempotency_key
-status
-input_text
-model_profile
-started_at
-finished_at
-error_code
-error_message
-answer_artifact_id
-reference_bundle_artifact_id
-timeline_cursor
-```
-
-### 4.1.3 状态机
-
-```text
-queued -> running -> completed
-queued -> running -> failed
-queued -> running -> cancelled
-```
-
-### 4.1.4 硬约束
-
-1. 一个 turn 只允许一个最终 `answer_artifact_id`
-2. 同一个 turn 不允许被执行两次
-3. 同一个 `chat_id + client_idempotency_key` 只能创建一个有效 turn
-
-## 4.2 TimelineEvent
-
-`TimelineEvent` 表示 turn 执行过程中的标准化事件流。
-
-它既是:
-
-- 前端时间线展示模型
-- SSE 传输模型
-- 诊断与回放模型
-
-但它不是:
-
-- runtime 原始内部日志 dump
-- debug event 任意透出层
-
-### 4.2.1 必备字段
-
-```text
-schema_version
-event_id
-turn_id
-sequence
-timestamp
-type
-label
-status
-actor
-data
-```
-
-### 4.2.2 硬约束
-
-1. `sequence` 在 turn 内必须严格单调递增
-2. 不允许前端按时间戳猜顺序
-3. `actor` 只允许:`agent | tool | system`
-4. `data` 只携带最小必要 payload
-5. timeline 必须支持重放
-
-### 4.2.3 事件类型
-
-Phase 1 标准事件类型定义如下:
-
-- `turn.started`
-- `agent.state.changed`
-- `tool.started`
-- `tool.progress`
-- `tool.finished`
-- `external_action.started`
-- `external_action.finished`
-- `text.delta`
-- `artifact.created`
-- `turn.completed`
-- `turn.failed`
-- `turn.cancelled`
-- `heartbeat`
-
-### 4.2.4 事件分层约束
-
-- `tool.*` 用于标准 tool loop
-- `external_action.*` 只用于少数用户可感知外部动作,例如 `web_search`
-- 不允许把所有内部小步骤都提升到 timeline 层
-
-## 4.3 Artifact
-
-`Artifact` 表示需要持久化、可重读、可复用、可排障的大对象。
-
-### 4.3.1 建议类型
-
-- `answer`
-- `reference_bundle`
-- `tool_result_summary`
-- `search_result_summary`
-- `error_summary`
-
-### 4.3.2 建议字段
-
-```text
-schema_version
-artifact_id
-turn_id
-artifact_type
-created_at
-summary
-storage_ref | payload
-```
-
-### 4.3.3 硬约束
-
-1. stream 中不直接推送大正文
-2. stream 只推送摘要、artifact id 和必要元数据
-3. references 必须 materialize 成独立 artifact
-
-## 5. 用户可见状态词表
-
-前端不直接显示 runtime 原始事件名,而是统一映射成稳定的用户可见状态词表。
-
-Phase 1 固定为:
-
-- `Thinking`
-- `Searching`
-- `Calling Tool`
-- `Reading Result`
-- `Streaming Answer`
-- `Completed`
-- `Failed`
-
-这样做的目的有两个:
-
-1. 避免后端内部状态演进反复影响前端展示
-2. 降低用户理解成本与答疑成本
-
-## 6. 前后端协议设计
-
-## 6.1 主传输协议
-
-主链路只保留 `SSE`。
-
-不长期保留 `WebSocket + SSE` 双栈共存。
-
-## 6.2 API 设计
-
-### 6.2.1 创建 turn
-
-```text
-POST /api/v2/agent/chats/{chat_id}/turns
-```
-
-请求体建议包含:
-
-- `query`
-- `context`
-- `model_profile`
-- `client_idempotency_key`
-
-响应建议包含:
-
-- `turn_id`
-- `status`
-- `stream_url`
-
-### 6.2.2 订阅 turn 事件流
-
-```text
-GET /api/v2/agent/chats/{chat_id}/turns/{turn_id}/events
-```
-
-返回类型:
-
-```text
-Content-Type: text/event-stream
-```
-
-### 6.2.3 获取 turn snapshot
-
-```text
-GET /api/v2/agent/chats/{chat_id}/turns/{turn_id}
-```
-
-用于:
-
-- 刷新页面恢复
-- SSE 重连失败时兜底
-- 调试和诊断
-
-### 6.2.4 取消 turn
-
-```text
-POST /api/v2/agent/chats/{chat_id}/turns/{turn_id}/cancel
-```
-
-### 6.2.5 获取 artifact
-
-```text
-GET /api/v2/agent/artifacts/{artifact_id}
-```
-
-### 6.2.6 OpenAI-compatible adapter
-
-```text
-POST /v1/chat/completions
-```
-
-这个接口是给 OpenAI 形状客户端使用的兼容 adapter,不是前端主 UI
-contract。实现必须把每个请求转换成 Agent Runtime V3 turn,再按
-OpenAI 形状格式化输出:
-
-- `stream=false` 时返回 `chat.completion` JSON
-- `stream=true` 时返回 `text/event-stream` 的 `chat.completion.chunk`
- 帧
-
-adapter contract 固定为:
-
-- `bot_id` 是必填 query 参数
-- `chat_id` 可选;不传时后端创建并在请求结束后删除 ephemeral chat
-- `language` 可选,默认 `en-US`
-- `Idempotency-Key` / `X-Idempotency-Key` 映射为
- `client_idempotency_key`
-
-## 6.3 幂等与重连
-
-### 6.3.1 幂等策略
-
-- `POST turn` 必须支持客户端幂等键
-- 同一 `chat_id + client_idempotency_key` 下重复请求不得创建多个 turn
-- 同一个 turn 一旦创建成功,就不允许重复执行
-
-### 6.3.2 SSE 重连策略
-
-- 默认支持 `Last-Event-ID` 或 offset 续传
-- 如果服务端 event buffer 已过期,则:
- 1. 客户端先拉 `turn snapshot`
- 2. 再从当前最新游标继续订阅
-
-## 6.4 心跳、背压与超时
-
-SSE 层必须定义以下行为:
-
-- heartbeat 事件
-- event buffer 上限
-- delta 合并策略
-- 过载时摘要/截断策略
-
-同时区分以下超时:
-
-- 单 tool timeout
-- 单轮 total runtime timeout
-- stream idle timeout
-
-## 7. 权限与安全边界
-
-新 runtime 入口必须重新做完整鉴权,不能依赖旧 WebSocket 路径中的隐式前提。
-
-每个 turn 创建时必须重新校验:
-
-- `chat_id` ownership
-- collection/file context visibility
-- tool 可见范围
-
-artifact 读取接口也必须重新做权限校验,防止通过 artifact id 越权读取。
-
-## 8. 存储设计
-
-## 8.1 Redis 职责
-
-Redis 只负责短期运行态与流式恢复:
-
-- `turn runtime state`
-- `stream cursor`
-- `transient event buffer`
-- `in-flight text buffer`
-
-Redis 不再承担:
-
-- 旧 message grammar 兼容
-- 最终产品层消息协议
-- 长期历史表达
-
-## 8.2 DB / Persistent Store 职责
-
-持久化层负责保存:
-
-- `conversation_turn`
-- `timeline_event`(至少关键事件)
-- `artifact`
-- `reference_bundle`
-- `error_summary`
-
-Timeline 必须可重放,不能只存在于流式阶段。
-
-## 8.3 History Commit Policy
-
-最终 history 不按 token 流实时写入。
-
-策略如下:
-
-1. stream 期间只写运行态/缓存态
-2. 只有在 `done` 或明确 `error` 后,才一次性提交标准化 turn 记录
-
-这样可以避免:
-
-- 半截输出污染 history
-- 取消后残留脏记录
-- 重连或回退留下不可解释状态
-
-## 9. 前端体验模型
-
-新的前端展示不再以“一个 assistant bubble 包含一切”为核心。
-
-建议拆成五个视图层:
-
-1. `Turn Header`
-2. `Timeline`
-3. `Final Answer Panel`
-4. `References Panel`
-5. `Diagnostics Drawer`
-
-其中:
-
-- Timeline 只展示过程
-- Final Answer Panel 只展示最终 answer
-- References Panel 只展示引用和来源
-- Diagnostics Drawer 只在需要时展开
-
-## 10. Runtime 路线
-
-## 10.1 Phase 1:PydanticAI Adapter
-
-第一阶段 runtime 实现采用 `PydanticAI`,原因不是它定义契约,而是它能降低第一阶段实现成本。
-
-它适合用来实现:
-
-- 单 turn 内部 loop
-- tool 调用
-- provider 调用
-- 状态映射
-
-但不负责定义:
-
-- 对外 API
-- TimelineEvent schema
-- Artifact schema
-- History commit policy
-
-## 10.2 长期路线
-
-如果 `PydanticAI adapter` 运行稳定、维护成本可接受,可以继续保留。
-
-如果后续仍发现第三方 runtime 对行为边界约束太多,则继续把底层收成完全自研薄编排层。
-
-由于契约层已经独立,届时只替换 runtime 实现,不需要再次重写前后端协议。
-
-## 11. 替换边界
-
-本次重写:
-
-- 保留:`FastAPI`、`FastMCP`、业务 API、provider 接入、业务数据实体
-- 替换:`mcp-agent runtime glue`、旧 WebSocket grammar、旧 Redis message shape、旧前端消息渲染模型、旧事件回推机制
-
-这意味着:
-
-- 业务价值复用
-- 产品层和运行时耦合重做
-
-## 12. 迁移与回退原则
-
-### 12.1 迁移原则
-
-- 可以保留短期 feature flag 灰度
-- 不允许长期双栈共存
-- 一旦新链路稳定,旧 WebSocket grammar 和旧 runtime glue 直接下线
-
-### 12.2 回退条件
-
-以下情况允许回退:
-
-- SSE 在企业代理环境中明显不稳定
-- timeline 重放和恢复不可靠
-- turn/history/artifact 兼容性不成立
-- tool/provider 错误不可诊断
-
-### 12.3 回退要求
-
-回退后必须保证:
-
-- 历史记录可读
-- turn 记录不变成孤儿
-- artifact 不变成不可追踪残留
-
-## 13. Phase 1 实施清单
-
-Phase 1 的实施目标是跑通新的最小主链路,而不是一次性追求最终形态。
-
-建议的 Phase 1 任务:
-
-1. 新建 `aperag/agent_runtime/` 模块
-2. 定义 `Turn / TimelineEvent / Artifact` schema
-3. 实现 `TurnService`
-4. 实现 `EventService`
-5. 实现 `ArtifactService`
-6. 实现 `HistoryWriter`
-7. 定义 `AgentRuntime` 抽象
-8. 实现 `PydanticAIRuntime`
-9. 实现 MCP client adapter
-10. 实现 `SSE StreamEmitter`
-11. 新增 v2 agent API
-12. 新增前端 timeline 组件
-13. 新增 answer/references/diagnostics 分层面板
-14. 实现 snapshot 恢复与 cancel
-15. 补充契约级 E2E 覆盖
-
-## 14. 验收标准
-
-Phase 1 完成时,至少应满足:
-
-1. 新 API 可以创建 turn、订阅 SSE、拉 snapshot、读取 artifact、取消 turn
-2. 单 turn 内部可以完成多轮 search/tool/thinking loop
-3. Timeline 可重连、可重放
-4. 最终 answer 与过程事件完全分离
-5. 历史提交策略不产生半截脏记录
-6. `mcp-agent` 已退出主 chat 运行路径
-
-## 15. 正式拍板
-
-Agent Runtime V3 的正式拍板如下:
-
-- 不再继续修补 `mcp-agent`
-- 不再继续修补旧 WebSocket grammar
-- 新 runtime 契约统一为 `Turn + TimelineEvent + Artifact + SSE`
-- 第一阶段采用 `PydanticAI adapter`
-- 后续由实现同学按本设计文档推进,架构侧负责监督契约边界与长期方向
-
-一句话总结:
-
-这次不是“把某个 agent 库换掉”,而是为 ApeRAG 重建一个更适合私有化交付的 agent runtime 产品层。
diff --git a/docs/zh-CN/design/architecture.md b/docs/zh-CN/design/architecture.md
deleted file mode 100644
index 572e2ed72..000000000
--- a/docs/zh-CN/design/architecture.md
+++ /dev/null
@@ -1,850 +0,0 @@
----
-title: 系统架构
-description: ApeRAG 架构设计与核心组件详解
-keywords: ApeRAG, 架构, RAG, 知识图谱, LightRAG
-position: 1
----
-
-# ApeRAG 系统架构
-
-## 1. 什么是 ApeRAG
-
-ApeRAG 是一个**开放的、Agentic 的 Graph RAG 平台**。它不仅仅是一个简单的向量检索系统,而是将知识图谱、多模态检索和智能 Agent 深度融合的生产级解决方案。
-
-传统的 RAG 系统主要依赖向量相似度检索,虽然能找到语义相关的内容,但往往缺乏对知识之间关系的理解。ApeRAG 的核心创新在于:
-
-- **Graph RAG**:从文档中自动提取实体(人物、地点、概念)和关系,构建知识图谱,理解知识之间的关联
-- **Agentic**:内置智能 Agent,能够自主规划、调用工具、多轮对话,提供更智能的问答体验
-- **开放集成**:通过 **RESTful API** 和 **MCP 协议**对外暴露能力,可以轻松集成到 Dify、Claude、Cursor 等外部系统
-
-### 核心优势
-
-与传统 RAG 方案相比,ApeRAG 提供了:
-
-- **更强的文档处理能力**:支持 PDF、Word、Excel 等多种格式,能处理复杂的表格、公式、图片
-- **多种检索方式**:向量检索、全文检索、图谱检索,三者互补,各取所长
-- **知识关联理解**:通过知识图谱理解概念之间的关系,而不仅仅是文本相似度
-- **开放的集成能力**:RESTful API + MCP 协议,可以作为 Dify、Claude Desktop、Cursor 的知识后端
-- **生产级架构**:异步处理、多存储、高并发,可以直接用于生产环境
-
-### 整体架构一览
-
-```mermaid
-graph TB
- User[用户] --> Frontend[Web 前端]
- User --> External[外部系统
Dify/Claude/Cursor]
-
- Frontend --> API[RESTful API]
- External --> MCP[MCP 协议]
-
- API --> DocProcess[文档处理]
- API --> Search[检索服务]
- API --> Agent[Agent 对话]
- MCP --> Search
- MCP --> Agent
-
- DocProcess --> Tasks[异步任务层]
- Tasks --> Storage[存储层]
-
- Search --> Storage
- Agent --> Search
-
- Storage --> PG[(PostgreSQL)]
- Storage --> Qdrant[(Qdrant
向量库)]
- Storage --> ES[(Elasticsearch
全文搜索)]
- Storage --> Neo4j[(Neo4j
图数据库)]
- Storage --> MinIO[(MinIO
文件存储)]
-
- style User fill:#e1f5ff
- style Frontend fill:#bbdefb
- style External fill:#bbdefb
- style API fill:#90caf9
- style MCP fill:#90caf9
- style DocProcess fill:#fff59d
- style Search fill:#fff59d
- style Agent fill:#fff59d
- style Tasks fill:#c5e1a5
- style Storage fill:#ffccbc
-```
-
-## 2. 系统分层架构
-
-ApeRAG 采用清晰的分层设计,每一层各司其职:
-
-```mermaid
-graph TB
- subgraph Layer1[客户端层]
- Web[Web 前端
Next.js]
- Dify[Dify]
- Cursor[Cursor]
- Claude[Claude Desktop]
- end
-
- subgraph Layer2[接口层]
- API[RESTful API
FastAPI]
- MCP[MCP Server
Model Context Protocol]
- end
-
- subgraph Layer3[服务层]
- CollSvc[Collection 服务]
- DocSvc[文档服务]
- SearchSvc[检索服务]
- GraphSvc[图谱服务]
- AgentSvc[Agent 服务]
- end
-
- subgraph Layer4[任务层]
- Celery[Celery Worker
异步任务]
- MinerU[MinerU
文档解析]
- end
-
- subgraph Layer5[存储层]
- PG[(PostgreSQL)]
- Qdrant[(Qdrant)]
- ES[(Elasticsearch)]
- Neo4j[(Neo4j)]
- Redis[(Redis)]
- MinIO[(MinIO)]
- end
-
- Web --> API
- Dify --> MCP
- Cursor --> MCP
- Claude --> MCP
-
- API --> CollSvc
- API --> DocSvc
- API --> SearchSvc
- API --> GraphSvc
- API --> AgentSvc
-
- MCP --> SearchSvc
- MCP --> AgentSvc
-
- CollSvc --> Celery
- DocSvc --> Celery
- GraphSvc --> Celery
-
- Celery --> MinerU
- Celery --> PG
- Celery --> Qdrant
- Celery --> ES
- Celery --> Neo4j
- Celery --> MinIO
-
- SearchSvc --> PG
- SearchSvc --> Qdrant
- SearchSvc --> ES
- SearchSvc --> Neo4j
-
- style Layer1 fill:#e3f2fd
- style Layer2 fill:#f3e5f5
- style Layer3 fill:#fff3e0
- style Layer4 fill:#e8f5e9
- style Layer5 fill:#fce4ec
-```
-
-**各层职责说明**:
-
-- **客户端层**:多种接入方式,Web UI 用于管理,MCP 客户端(Dify、Cursor、Claude 等)用于集成
-- **接口层**:RESTful API(传统 HTTP 接口)和 MCP Server(AI 工具协议)并行提供服务
-- **服务层**:核心业务逻辑,协调各种资源完成具体功能
-- **任务层**:处理耗时操作(文档解析、索引构建),保证 API 快速响应
-- **存储层**:多种存储系统,针对不同数据类型选择最优方案
-
-## 3. 文档处理全流程
-
-这是 ApeRAG 的核心能力之一。从一个 PDF 文件上传,到最终可以被检索,经历了一系列精心设计的处理步骤。
-
-### 3.1 文档上传与解析
-
-当你上传一个文档时,ApeRAG 会自动识别格式并选择合适的解析器:
-
-```mermaid
-flowchart TD
- Upload[用户上传文档] --> Detect[格式检测]
-
- Detect --> |PDF| MinerU[MinerU 解析器]
- Detect --> |Word/Excel| MarkItDown[MarkItDown 解析器]
- Detect --> |Markdown| DirectParse[直接解析]
- Detect --> |图片| OCR[OCR 识别]
-
- MinerU --> Extract[内容提取]
- MarkItDown --> Extract
- DirectParse --> Extract
- OCR --> Extract
-
- Extract --> Parts[文档片段
Parts 对象]
-
- style Upload fill:#e1f5ff
- style Extract fill:#c5e1a5
- style Parts fill:#fff59d
-```
-
-**MinerU 的强大之处**:
-
-- 能准确识别复杂 PDF 的表格结构,保留表格内容的完整性
-- 提取 LaTeX 数学公式,保持公式的可读性
-- 对扫描版 PDF 进行 OCR,支持中英文混排
-- 识别文档中的图片区域,支持图片内容理解
-
-### 3.2 智能分块策略
-
-文档解析后,需要切分成合适大小的块(chunk)。这个步骤很关键,分块太大会影响检索精度,太小会丢失上下文。
-
-```mermaid
-flowchart TD
- Parts[文档片段] --> Rechunk[智能重分块]
-
- Rechunk --> Analysis[分析文档结构]
- Analysis --> Hierarchy[识别标题层级]
- Hierarchy --> Group[按标题分组]
-
- Group --> Check{块大小检查}
- Check --> |过大| Split[语义分割]
- Check --> |合适| Chunks[最终块]
- Split --> Chunks
-
- Chunks --> AddContext[添加上下文]
- AddContext --> FinalChunks[带上下文的文档块]
-
- style Rechunk fill:#bbdefb
- style Split fill:#ffccbc
- style FinalChunks fill:#c5e1a5
-```
-
-**分块策略的特点**:
-
-- **保持语义完整性**:尽量不在句子中间切断
-- **保留标题上下文**:每个块都知道自己属于哪个章节
-- **层级化分割**:先按段落分,不行再按句子分,最后才按字符分
-- **智能合并**:相邻的小标题块会被合并,避免信息碎片化
-
-分块参数配置:
-- 默认块大小:1200 tokens(约 800-1000 个中文字符)
-- 重叠大小:100 tokens(保证上下文连续性)
-
-### 3.3 多索引并行构建
-
-文档分块后,会同时创建多种索引。每种索引有不同的用途,互相补充:
-
-| 索引类型 | 适用场景 | 存储位置 | 检索方式 |
-|---------|---------|---------|---------|
-| **向量索引** | 语义相似问题,比如"如何优化性能" | Qdrant | 余弦相似度 |
-| **全文索引** | 精确关键词搜索,比如"PostgreSQL 配置" | Elasticsearch | BM25 算法 |
-| **图谱索引** | 关系型问题,比如"A 和 B 有什么联系" | PostgreSQL/Neo4j | 图遍历 |
-| **摘要索引** | 快速了解文档概要 | PostgreSQL | 向量匹配 |
-| **视觉索引** | 图片内容搜索 | Qdrant | 多模态向量 |
-
-```mermaid
-flowchart LR
- Chunks[文档块] --> IndexMgr[索引管理器]
-
- IndexMgr --> VectorIdx[向量索引创建]
- IndexMgr --> FulltextIdx[全文索引创建]
- IndexMgr --> GraphIdx[图谱索引创建]
- IndexMgr --> VisionIdx[视觉索引创建]
-
- VectorIdx --> Qdrant1[(Qdrant)]
- FulltextIdx --> ES[(Elasticsearch)]
- GraphIdx --> Graph[(Neo4j/PG)]
- VisionIdx --> Qdrant2[(Qdrant)]
-
- style IndexMgr fill:#fff59d
- style VectorIdx fill:#bbdefb
- style FulltextIdx fill:#c5e1a5
- style GraphIdx fill:#ffccbc
- style VisionIdx fill:#e1bee7
-```
-
-**并行构建的优势**:
-- 不同索引可以同时构建,提高速度
-- 某个索引失败不影响其他索引
-- 可以按需启用特定类型的索引
-
-### 3.4 知识图谱构建
-
-图谱索引是 ApeRAG 的核心特色,它能从文档中提取结构化的知识。
-
-```mermaid
-flowchart TD
- Chunks[文档块] --> EntityExtract[实体提取]
-
- EntityExtract --> LLM1[调用 LLM
识别实体]
- LLM1 --> Entities[实体列表
人物、地点、概念]
-
- Entities --> RelationExtract[关系提取]
- RelationExtract --> LLM2[调用 LLM
识别关系]
- LLM2 --> Relations[关系列表
谁与谁有什么关系]
-
- Entities --> Merge[实体合并]
- Relations --> Merge
-
- Merge --> Components[连通分量分析]
- Components --> Parallel[并行处理各分量]
- Parallel --> Graph[(知识图谱)]
-
- style EntityExtract fill:#bbdefb
- style RelationExtract fill:#c5e1a5
- style Merge fill:#ffccbc
- style Components fill:#fff59d
-```
-
-**图谱构建的关键步骤**:
-
-1. **实体提取**:LLM 从文档块中识别出有意义的实体
- - 示例:从"张三在北京的清华大学学习人工智能"中提取
- - 实体:张三(人物)、北京(地点)、清华大学(组织)、人工智能(概念)
-
-2. **关系提取**:识别实体之间的关系
- - 示例:张三 --学习--> 人工智能,张三 --就读于--> 清华大学
-
-3. **实体合并**:同一实体可能有不同的表述,需要归一化
- - 示例:"LightRAG"、"light rag"、"Light-RAG" → 合并为统一实体
-
-4. **连通分量优化**:把图谱分成独立的子图,并行处理
- - 性能提升:2-3 倍吞吐量
-
-**为什么需要连通分量优化?**
-
-假设你有 100 篇文档,它们讨论不同的主题。关于"数据库"的实体和关于"机器学习"的实体之间没有连接,可以独立处理。连通分量算法会找出这些独立的"知识岛",然后并行处理,大大提高速度。
-
-### 3.5 异步任务系统
-
-文档处理是一个耗时的操作,ApeRAG 采用"双链路架构"来保证用户体验:
-
-```mermaid
-graph TB
- subgraph Frontend["🚀 前端链路 - 快速响应"]
- direction TB
- A1["📤 用户上传文档"] --> A2["🔌 API 接收请求"]
- A2 --> A3["📋 Index Manager"]
- A3 --> A4["💾 写入数据库
status = PENDING
version = 1"]
- A4 --> A5["✅ 立即返回成功
< 100ms"]
- end
-
- subgraph Backend["⚙️ 后端链路 - 异步处理"]
- direction TB
- B1["⏰ Celery Beat
每 30 秒检查"] --> B2["🔍 Reconciler 检测
version ≠ observed_version"]
- B2 --> B3{"🎯 发现待处理任务?"}
- B3 -->|是| B4["🚀 调度 Worker"]
- B3 -->|否| B1
- B4 --> B5["📄 解析文档"]
- B5 --> B6["🔀 并行创建索引
Vector + Fulltext
+ Graph + Vision"]
- B6 --> B7["✨ 更新状态
status = ACTIVE
observed_version = 1"]
- B7 --> B1
- end
-
- A4 -.-|"数据库状态变化"| B2
-
- style Frontend fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
- style Backend fill:#fff3e0,stroke:#f57c00,stroke-width:3px
- style A5 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
- style B7 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
- style B3 fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
-```
-
-**双链路的好处**:
-
-- **前端快速响应**:用户上传文档后,API 在 100ms 内返回,不需要等待处理完成
-- **后端异步处理**:真正的处理工作在后台慢慢做,不阻塞用户操作
-- **自动重试**:如果处理失败,系统会自动重试,保证最终成功
-- **状态可查**:用户可以随时查看文档处理进度
-
-**索引状态机**:
-
-```mermaid
-stateDiagram-v2
- [*] --> PENDING: 📤 文档上传
-
- PENDING --> CREATING: 🚀 Reconciler 检测到
开始处理
-
- CREATING --> ACTIVE: ✅ 所有索引创建成功
- CREATING --> FAILED: ❌ 处理失败
-
- FAILED --> CREATING: 🔄 自动重试
(最多 3 次)
- FAILED --> [*]: 💔 超过重试次数
标记为失败
-
- ACTIVE --> CREATING: 🔄 文档更新
重建索引
- ACTIVE --> [*]: 🗑️ 删除文档
-
- note right of PENDING
- version = 1
- observed_version = 0
- end note
-
- note right of CREATING
- 正在处理中
- 可能需要几分钟
- end note
-
- note right of ACTIVE
- version = 1
- observed_version = 1
- 可以被检索
- end note
-```
-
-## 4. 检索问答全流程
-
-有了索引之后,用户就可以提问了。ApeRAG 的检索系统会智能地选择合适的检索策略。
-
-### 4.1 混合检索系统
-
-不同类型的问题适合用不同的检索方式。ApeRAG 会同时使用多种检索策略,然后融合结果:
-
-```mermaid
-flowchart TB
- Query[用户问题] --> Router[检索路由]
-
- Router --> |并行检索| Vector[向量检索]
- Router --> |并行检索| Fulltext[全文检索]
- Router --> |并行检索| Graph[图谱检索]
-
- Vector --> Embed[生成问题向量]
- Embed --> QdrantSearch[Qdrant 相似度搜索]
- QdrantSearch --> R1[结果1]
-
- Fulltext --> ESSearch[Elasticsearch BM25]
- ESSearch --> R2[结果2]
-
- Graph --> GraphQuery[图谱查询
local/global/hybrid]
- GraphQuery --> R3[结果3]
-
- R1 --> Merge[结果融合]
- R2 --> Merge
- R3 --> Merge
-
- Merge --> Rerank[Rerank 重排序]
- Rerank --> Final[最终结果]
-
- style Query fill:#e1f5ff
- style Vector fill:#bbdefb
- style Fulltext fill:#c5e1a5
- style Graph fill:#ffccbc
- style Rerank fill:#fff59d
- style Final fill:#c5e1a5
-```
-
-**检索策略说明**:
-
-- **向量检索**:用于语义相似的问题
- - 问:"如何提升系统性能?"
- - 能找到:"优化数据库查询"、"使用缓存"等相关内容
-
-- **全文检索**:用于精确关键词匹配
- - 问:"PostgreSQL 的配置文件在哪?"
- - 能找到包含"PostgreSQL"和"配置文件"的精确段落
-
-- **图谱检索**:用于关系型问题
- - 问:"LightRAG 和 Neo4j 有什么关系?"
- - 会查询图谱中这两个实体的连接路径
-
-**结果融合策略**:
-
-不同检索方式的结果需要合并。ApeRAG 使用 Rerank 模型对所有候选结果重新打分:
-
-1. 收集所有检索结果(可能有重复)
-2. 去重,保留最相关的片段
-3. 使用 Rerank 模型评估每个片段与问题的相关性
-4. 按新的分数重新排序
-5. 返回 Top-K 结果
-
-### 4.2 知识图谱查询
-
-图谱检索有三种模式,适用于不同类型的问题:
-
-| 模式 | 适用场景 | 查询方式 | 示例问题 |
-|------|---------|---------|---------|
-| **local** | 查询某个实体的局部信息 | 向量匹配相似实体 → 获取邻居节点 | "张三的个人信息" |
-| **global** | 查询整体关系和模式 | 向量匹配相似关系 → 获取关联路径 | "公司的组织架构是怎样的" |
-| **hybrid** | 综合性问题 | local + global 结合 | "张三在公司的角色和职责" |
-
-```mermaid
-flowchart TD
- Question[用户问题] --> Analyze[问题分析]
-
- Analyze --> Local[Local 模式
实体中心]
- Analyze --> Global[Global 模式
关系中心]
- Analyze --> Hybrid[Hybrid 模式
综合查询]
-
- Local --> FindEntity[找到相关实体]
- FindEntity --> GetNeighbors[获取邻居和关系]
-
- Global --> FindRelations[找到相关关系]
- FindRelations --> GetContext[获取关系上下文]
-
- Hybrid --> Local
- Hybrid --> Global
-
- GetNeighbors --> Context[生成上下文]
- GetContext --> Context
-
- Context --> Return[返回给 LLM]
-
- style Local fill:#bbdefb
- style Global fill:#c5e1a5
- style Hybrid fill:#fff59d
-```
-
-**实际例子**:
-
-假设知识图谱中有:
-- 实体:张三(人物)、数据库团队(组织)、PostgreSQL(技术)
-- 关系:张三 --属于--> 数据库团队,张三 --擅长--> PostgreSQL
-
-问题:"张三负责什么?"
-
-1. **Local 模式**:
- - 找到"张三"实体
- - 获取所有直接相连的节点
- - 返回:"张三属于数据库团队,擅长 PostgreSQL"
-
-2. **Global 模式**:
- - 找到相关的关系模式:"负责"、"属于"
- - 返回整个团队的结构和职责分工
-
-3. **Hybrid 模式**:
- - 同时使用上述两种方式
- - 给出更全面的答案
-
-### 4.3 Agent 对话系统
-
-Agent 是 ApeRAG 的智能助手,它能调用各种工具来回答问题。
-
-```mermaid
-sequenceDiagram
- participant User as 用户
- participant API as API Server
- participant Agent as Agent 服务
- participant LLM as LLM 服务
- participant MCP as MCP 工具
- participant Search as 检索服务
-
- User->>API: 发送问题
- API->>Agent: 转发问题
-
- Agent->>LLM: 调用 LLM
携带工具列表
- LLM-->>Agent: 决定调用 search_collection 工具
-
- Agent->>MCP: 执行工具调用
- MCP->>Search: 混合检索
- Search-->>MCP: 返回相关文档片段
- MCP-->>Agent: 工具执行结果
-
- Agent->>LLM: 再次调用 LLM
携带检索到的上下文
- LLM-->>Agent: 生成最终答案
-
- Agent-->>API: 流式返回
- API-->>User: SSE 推送答案
-```
-
-**Agent 的工作流程**:
-
-1. **接收问题**:用户发送一个问题
-
-2. **工具决策**:LLM 分析问题,决定需要调用哪些工具
- - 可能的工具:search_collection(检索知识库)、web_search(搜索网络)、web_read(读取网页)等
-
-3. **执行工具**:Agent 调用对应的工具
- - 示例:search_collection 会触发混合检索,返回相关文档
-
-4. **生成答案**:LLM 基于检索到的上下文生成答案
-
-5. **流式返回**:答案通过 SSE(Server-Sent Events)实时推送给用户,不用等待全部生成完毕
-
-**MCP 协议的作用**:
-
-MCP(Model Context Protocol)是一个标准化的工具协议,让 AI 助手(如 Claude Desktop、Cursor)能够方便地调用 ApeRAG 的能力。通过 MCP,外部 AI 工具可以:
-- 列出你的知识库
-- 搜索知识库内容
-- 读取网页内容
-- 搜索互联网
-
-**对话示例**:
-
-```
-用户:ApeRAG 的图谱索引是怎么工作的?
-
-Agent 思考:需要检索知识库
-↓
-调用工具:search_collection(query="图谱索引工作原理", collection_id="aperag-docs")
-↓
-检索结果:返回关于图谱构建、实体提取、关系抽取的文档片段
-↓
-Agent 回答:ApeRAG 的图谱索引通过以下步骤工作...(基于检索到的内容生成)
-```
-
-## 5. 存储架构
-
-ApeRAG 采用多存储架构,为不同类型的数据选择最合适的存储方案。
-
-### 5.1 存储选型决策
-
-```mermaid
-flowchart TD
- Data["🎯 数据类型分类"] --> Choice{"📊 什么数据?"}
-
- Choice --> |"📋 结构化数据
用户、配置等"| PG["PostgreSQL"]
- Choice --> |"🔢 向量数据
embeddings"| Qdrant["Qdrant"]
- Choice --> |"📝 文本数据
全文搜索"| ES["Elasticsearch"]
- Choice --> |"📁 文件数据
原始文档"| MinIO["MinIO/S3"]
- Choice --> |"🕸️ 图数据
知识图谱"| GraphChoice{"图规模?"}
- Choice --> |"⚡ 缓存数据
临时数据"| Redis["Redis"]
-
- GraphChoice -->|"小规模
< 10万实体
💰 推荐"| PG2["PostgreSQL
内置图存储"]
- GraphChoice -->|"大规模
> 100万实体"| Neo4j["Neo4j
专业图数据库"]
-
- PG --> PGUse["✅ 事务支持
✅ 关系查询
✅ 小规模图存储
✅ 成熟稳定"]
- PG2 --> PG2Use["✅ 无需额外组件
✅ 降低运维成本
✅ 足够应对大多数场景"]
- Qdrant --> QdrantUse["✅ 向量相似度搜索
✅ 高维数据检索
✅ 支持过滤条件"]
- ES --> ESUse["✅ 全文检索 BM25
✅ 关键词搜索
✅ 中文分词 IK"]
- MinIO --> MinIOUse["✅ 大文件存储
✅ S3 协议兼容
✅ 成本低"]
- Neo4j --> Neo4jUse["✅ 大规模图查询
✅ 复杂关系遍历
✅ 图算法支持"]
- Redis --> RedisUse["✅ Celery 任务队列
✅ LLM 调用缓存
✅ 毫秒级访问"]
-
- style Data fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
- style Choice fill:#fff59d,stroke:#fbc02d,stroke-width:3px
- style GraphChoice fill:#fff59d,stroke:#fbc02d,stroke-width:2px
- style PG fill:#bbdefb,stroke:#1976d2,stroke-width:2px
- style PG2 fill:#c8e6c9,stroke:#388e3c,stroke-width:3px
- style Qdrant fill:#c5e1a5,stroke:#689f38,stroke-width:2px
- style ES fill:#ffccbc,stroke:#e64a19,stroke-width:2px
- style MinIO fill:#e1bee7,stroke:#8e24aa,stroke-width:2px
- style Neo4j fill:#f8bbd0,stroke:#c2185b,stroke-width:2px
- style Redis fill:#ffecb3,stroke:#ffa000,stroke-width:2px
-```
-
-### 5.2 数据流向
-
-不同的数据在系统中流转到不同的存储:
-
-```mermaid
-flowchart LR
- Doc[上传文档] --> Parser[解析器]
- Parser --> |原始文件| MinIO[(MinIO)]
- Parser --> |文档元数据| PG1[(PostgreSQL)]
- Parser --> |文档块| Chunks[分块]
-
- Chunks --> |生成向量| Embed[Embedding]
- Embed --> Qdrant[(Qdrant)]
-
- Chunks --> |文本内容| ES[(Elasticsearch)]
-
- Chunks --> |实体关系提取| Graph[图谱构建]
- Graph --> |小规模| PG2[(PostgreSQL)]
- Graph --> |大规模| Neo4j[(Neo4j)]
-
- PG1 -.元数据.-> Cache
- Cache -.缓存.-> Redis[(Redis)]
-
- style Doc fill:#e1f5ff
- style MinIO fill:#e1bee7
- style PG1 fill:#bbdefb
- style PG2 fill:#bbdefb
- style Qdrant fill:#c5e1a5
- style ES fill:#ffccbc
- style Neo4j fill:#f8bbd0
- style Redis fill:#ffecb3
-```
-
-### 5.3 核心存储系统
-
-**PostgreSQL**(主数据库)
-
-存储内容:
-- 用户信息、权限、配置
-- Collection(知识库)元数据
-- 文档元数据和索引状态
-- 对话历史
-- 小规模知识图谱(< 10 万实体)
-
-为什么选择:
-- 强大的事务支持,保证数据一致性
-- 成熟稳定,运维成本低
-- pgvector 扩展,支持向量存储
-- 可以承载小规模图数据,不需要额外的图数据库
-
-**Qdrant**(向量数据库)
-
-存储内容:
-- 文档块的 embedding 向量
-- 实体和关系的向量表示
-- 图片的多模态向量
-
-为什么选择:
-- 专门为向量检索优化,速度快
-- 支持过滤条件,可以结合元数据筛选
-- 支持集群部署,可以水平扩展
-
-**Elasticsearch**(全文搜索)
-
-存储内容:
-- 文档块的文本内容
-- 支持中文分词(IK Analyzer)
-
-为什么选择:
-- BM25 算法对关键词搜索效果好
-- 支持复杂的查询和聚合
-- 自带高亮显示
-
-**MinIO**(对象存储)
-
-存储内容:
-- 原始文档文件(PDF、Word 等)
-- 解析后的中间结果
-- 上传的临时文件
-
-为什么选择:
-- S3 协议兼容,可以替换为云存储
-- 存储成本低
-- 支持大文件
-
-**图数据库选择:PostgreSQL vs Neo4j**
-
-ApeRAG 支持两种图数据库方案:
-
-**PostgreSQL**(默认,推荐用于小规模)
-
-存储内容:
-- 知识图谱(< 10 万实体)
-- 图节点和边的关系数据
-
-推荐理由:
-- 无需额外部署,降低运维成本
-- 性能足够应对大多数场景
-- 事务支持完善,数据一致性有保障
-- 可以和其他业务数据共用一个数据库
-
-**Neo4j**(可选,用于大规模)
-
-存储内容:
-- 大规模知识图谱(> 100 万实体)
-
-什么时候需要:
-- 实体数量超过 10 万,PostgreSQL 查询性能下降
-- 需要复杂的图遍历查询(多跳关系)
-- 需要使用图算法(PageRank、社区发现等)
-
-**总结**:对于大多数企业应用,PostgreSQL 完全够用。只有在知识图谱规模非常大时,才需要考虑 Neo4j。
-
-**Redis**(缓存和队列)
-
-存储内容:
-- Celery 任务队列
-- LLM 调用缓存
-- 用户会话缓存
-
-为什么选择:
-- 速度极快,适合高频访问
-- 支持多种数据结构
-- 可以做任务队列的 Broker
-
-## 6. 技术亮点
-
-### 6.1 无状态 LightRAG 重构
-
-**背景问题**:
-
-原版 LightRAG 使用全局状态,所有任务共享一个实例。这在多用户、多 Collection 的场景下会导致数据混乱和并发冲突。
-
-**ApeRAG 的解决方案**:
-
-- 每个任务创建独立的 LightRAG 实例
-- 通过 `workspace` 参数隔离不同 Collection 的数据
-- 实体命名规范:`entity:{name}:{workspace}`
-- 关系命名规范:`relationship:{src}:{tgt}:{workspace}`
-
-这样,不同用户的图谱数据不会互相干扰,真正实现了多租户隔离。
-
-### 6.2 双链异步架构
-
-**传统做法的问题**:
-
-用户上传文档后,API 需要等待解析、索引构建全部完成才能返回,可能要等几分钟甚至更久。
-
-**双链架构的优势**:
-
-- **前端链路**:API 只负责写状态到数据库,100ms 内返回
-- **后端链路**:Reconciler 定时检测状态变化,调度异步任务
-- **版本控制**:通过 version 和 observed_version 实现幂等性
-- **自动重试**:任务失败后自动重试,保证最终一致性
-
-这个设计灵感来自 Kubernetes 的 Reconciler 模式,非常适合处理长时间运行的任务。
-
-### 6.3 连通分量并发优化
-
-**问题**:
-
-知识图谱构建时,需要合并相似的实体。如果串行处理,速度很慢。如果全部并行,又会有锁竞争问题。
-
-**解决方案**:
-
-使用连通分量算法,把图谱分成多个独立的子图:
-
-1. 构建实体关系邻接表
-2. BFS 遍历找出所有连通分量
-3. 不同分量之间没有连接,可以完全并行处理
-4. 同一分量内部串行处理(避免冲突)
-
-**效果**:
-
-- 性能提升 2-3 倍
-- 零锁竞争
-- 对于多样化的文档集合效果最好
-
-### 6.4 Provider 抽象模式
-
-ApeRAG 支持 100+ 种 LLM 提供商(OpenAI、Claude、Gemini、国产大模型等)。如何统一管理?
-
-**设计思路**:
-
-- 定义统一的 Provider 接口
-- 每个提供商实现自己的 Provider
-- 通过 LiteLLM 库做适配
-
-这样,切换模型只需要改配置,不需要改代码。同样的模式也应用在:
-- Embedding Service(支持多种向量模型)
-- Rerank Service(支持多种重排序模型)
-- Web Search Service(DuckDuckGo、JINA 等)
-
-### 6.5 多模态索引支持
-
-除了文本,ApeRAG 也能处理图片:
-
-**Vision Index 的两条路径**:
-
-1. **纯视觉向量**:使用多模态模型(如 CLIP)直接生成图片向量
-2. **视觉转文本**:使用 VLM 生成图片描述 + OCR 识别文字 → 文本向量化
-
-**融合策略**:
-
-- 文本检索结果和视觉检索结果分开排序
-- 通过 Rerank 模型统一打分
-- 最终合并展示
-
-## 7. 总结
-
-ApeRAG 通过以下设计实现了生产级的 RAG 能力:
-
-**核心优势**:
-- **强大的文档处理**:支持多格式、复杂布局、表格公式
-- **知识图谱融合**:不仅是向量匹配,还能理解知识关联
-- **多种检索方式**:向量、全文、图谱三管齐下
-- **异步架构**:快速响应,后台处理,用户体验好
-- **生产级设计**:多存储、高并发、易扩展
-
-**技术创新**:
-- 无状态 LightRAG,真正的多租户支持
-- 双链异步架构,API 响应 < 100ms
-- 连通分量并发优化,图谱构建快 2-3 倍
-- Provider 抽象,支持 100+ LLM
-
-**适用场景**:
-- 企业知识库搜索
-- 技术文档问答
-- 客服机器人
-- 研究论文分析
-- 任何需要理解文档并提供智能问答的场景
-
-整个系统的设计理念是:**让复杂的事情变简单,让简单的事情变自动**。用户只需要上传文档,剩下的一切都由 ApeRAG 自动完成。
diff --git a/docs/zh-CN/design/chat_history_design.md b/docs/zh-CN/design/chat_history_design.md
deleted file mode 100644
index 87f95d6b5..000000000
--- a/docs/zh-CN/design/chat_history_design.md
+++ /dev/null
@@ -1,584 +0,0 @@
-# ApeRAG 聊天历史消息数据流程
-
-## 概述
-
-本文档详细说明ApeRAG项目中聊天历史消息的完整数据流程,从前端API调用到后端存储的全链路实现。
-
-**核心接口**: `GET /api/v1/bots/{bot_id}/chats/{chat_id}`
-
-## 数据流图
-
-```
-┌─────────────────┐
-│ Frontend │
-│ (Next.js) │
-└────────┬────────┘
- │ GET /api/v1/bots/{bot_id}/chats/{chat_id}
- ▼
-┌─────────────────────────────────────────────┐
-│ View Layer │
-│ aperag/views/chat.py │
-│ - get_chat_view() │
-│ - JWT身份验证 │
-│ - 参数验证 │
-└────────┬────────────────────────────────────┘
- │ chat_service_global.get_chat()
- ▼
-┌─────────────────────────────────────────────┐
-│ Service Layer │
-│ aperag/service/chat_service.py │
-│ - get_chat() │
-│ - 业务逻辑编排 │
-└────────┬────────────────────────────────────┘
- │
- ├──────────────┬─────────────┐
- │ │ │
- ▼ ▼ ▼
-┌────────────┐ ┌───────────┐ ┌──────────────┐
-│ PostgreSQL │ │ Redis │ │ PostgreSQL │
-│ chat表 │ │ 消息历史 │ │ feedback表 │
-│(基本信息) │ │(会话内容) │ │(用户反馈) │
-└────────────┘ └───────────┘ └──────────────┘
- │ │ │
- └──────────────┴──────────────────┘
- │
- ▼
- ┌──────────────┐
- │ ChatDetails │
- │ (组装响应) │
- └──────────────┘
-```
-
-## 完整流程详解
-
-### 1. View层 - HTTP请求处理
-
-**文件**: `aperag/views/chat.py`
-
-```python
-@router.get("/bots/{bot_id}/chats/{chat_id}")
-async def get_chat_view(
- request: Request,
- bot_id: str,
- chat_id: str,
- user: User = Depends(required_user)
-) -> view_models.ChatDetails:
- return await chat_service_global.get_chat(str(user.id), bot_id, chat_id)
-```
-
-**职责**:
-- 接收HTTP GET请求
-- JWT Token身份验证
-- 提取路径参数 (bot_id, chat_id)
-- 调用Service层
-- 返回`ChatDetails`响应
-
-### 2. Service层 - 业务逻辑编排
-
-**文件**: `aperag/service/chat_service.py`
-
-```python
-async def get_chat(self, user: str, bot_id: str, chat_id: str) -> view_models.ChatDetails:
- from aperag.utils.history import query_chat_messages
-
- # Step 1: 从PostgreSQL查询Chat基本信息
- chat = await self.db_ops.query_chat(user, bot_id, chat_id)
- if chat is None:
- raise ChatNotFoundException(chat_id)
-
- # Step 2: 从Redis查询聊天消息历史
- messages = await query_chat_messages(user, chat_id)
-
- # Step 3: 构建响应对象(消息中已包含feedback信息)
- chat_obj = self.build_chat_response(chat)
- return ChatDetails(**chat_obj.model_dump(), history=messages)
-```
-
-**核心逻辑**:
-
-1. **查询Chat元数据** (PostgreSQL)
-2. **查询消息历史** (Redis + PostgreSQL反馈信息)
-3. **组装完整响应**
-
-### 3. 数据存储层
-
-#### 3.1 PostgreSQL - Chat基本信息
-
-**表**: `chat`
-
-**文件**: `aperag/db/models.py`
-
-```python
-class Chat(Base):
- __tablename__ = "chat"
-
- id = Column(String(24), primary_key=True) # chat_xxxx
- user = Column(String(256), nullable=False) # 用户ID
- bot_id = Column(String(24), nullable=False) # Bot ID
- title = Column(String(256)) # 会话标题
- peer_type = Column(EnumColumn(ChatPeerType)) # 对话类型
- peer_id = Column(String(256)) # 对话ID
- status = Column(EnumColumn(ChatStatus)) # 状态
- gmt_created = Column(DateTime(timezone=True)) # 创建时间
- gmt_updated = Column(DateTime(timezone=True)) # 更新时间
- gmt_deleted = Column(DateTime(timezone=True)) # 删除时间(软删除)
-```
-
-**用途**: 存储Chat会话的元数据,不包含具体消息内容
-
-#### 3.2 Redis - 聊天消息历史
-
-**文件**: `aperag/utils/history.py`
-
-**Key格式**: `message_store:{chat_id}`
-
-**数据结构**: Redis List (使用LPUSH,最新消息在前)
-
-**核心类**:
-
-```python
-class RedisChatMessageHistory:
- def __init__(self, session_id: str, key_prefix: str = "message_store:"):
- self.session_id = session_id
- self.key_prefix = key_prefix
-
- @property
- def key(self) -> str:
- return self.key_prefix + self.session_id # message_store:chat_abc123
-
- @property
- async def messages(self) -> List[StoredChatMessage]:
- # 从Redis读取所有消息
- _items = await self.redis_client.lrange(self.key, 0, -1)
- # 反转为时间顺序(因为LPUSH导致最新在前)
- items = [json.loads(m.decode("utf-8")) for m in _items[::-1]]
- return [storage_dict_to_message(item) for item in items]
-```
-
-**消息查询函数**:
-
-```python
-async def query_chat_messages(user: str, chat_id: str):
- """查询聊天消息并转换为前端格式"""
-
- # 1. 从Redis获取消息历史
- chat_history = RedisChatMessageHistory(chat_id, redis_client=get_async_redis_client())
- stored_messages = await chat_history.messages
-
- if not stored_messages:
- return []
-
- # 2. 从PostgreSQL获取反馈信息
- feedbacks = await async_db_ops.query_chat_feedbacks(user, chat_id)
- feedback_map = {feedback.message_id: feedback for feedback in feedbacks}
-
- # 3. 转换为前端格式并附加反馈信息
- result = []
- for stored_message in stored_messages:
- # 转换为前端格式
- chat_message_list = stored_message.to_frontend_format()
-
- # 为AI消息添加反馈数据
- for chat_msg in chat_message_list:
- feedback = feedback_map.get(chat_msg.id)
- if feedback and chat_msg.role == "ai":
- chat_msg.feedback = Feedback(
- type=feedback.type,
- tag=feedback.tag,
- message=feedback.message
- )
-
- result.append(chat_message_list)
-
- return result # [[message1_parts], [message2_parts], [message3_parts], ...]
-```
-
-#### 3.3 PostgreSQL - 用户反馈信息
-
-**表**: `message_feedback`
-
-```python
-class MessageFeedback(Base):
- __tablename__ = "message_feedback"
-
- user = Column(String(256), nullable=False) # 用户ID
- chat_id = Column(String(24), primary_key=True) # 会话ID
- message_id = Column(String(256), primary_key=True) # 消息ID
- type = Column(EnumColumn(MessageFeedbackType)) # like/dislike
- tag = Column(EnumColumn(MessageFeedbackTag)) # 反馈标签
- message = Column(Text) # 反馈内容
- question = Column(Text) # 原始问题
- original_answer = Column(Text) # 原始回答
- status = Column(EnumColumn(MessageFeedbackStatus)) # 状态
- gmt_created = Column(DateTime(timezone=True))
- gmt_updated = Column(DateTime(timezone=True))
-```
-
-**用途**: 存储用户对AI回复的反馈(点赞/点踩),用于质量监控和模型优化
-
-## 数据格式详解
-
-### 存储格式 (Redis)
-
-消息在Redis中以JSON格式存储,采用**Part-Based设计**:
-
-#### StoredChatMessage - 一条完整消息
-
-```python
-class StoredChatMessage(BaseModel):
- """一条完整消息(用户的一条消息 或 AI的一条消息)"""
- parts: List[StoredChatMessagePart] # 消息的多个部分
- files: List[Dict[str, Any]] # 关联的上传文件
-```
-
-#### StoredChatMessagePart - 消息的一个部分
-
-```python
-class StoredChatMessagePart(BaseModel):
- """消息的单个部分(原子单元)"""
-
- # 标识信息
- chat_id: str # 所属会话
- message_id: str # 所属消息(同一条消息的多个part共享)
- part_id: str # 部分的唯一ID
- timestamp: float # 生成时间戳
-
- # 内容分类
- type: Literal["message", "tool_call_result", "thinking", "references"]
- role: Literal["human", "ai", "system"]
- content: str
-
- # 扩展字段
- references: List[Dict] # 文档引用
- urls: List[str] # URL引用
- metadata: Optional[Dict] # 额外元数据
-```
-
-#### Part类型说明
-
-| Type | 说明 | 包含在LLM上下文 |
-|------|------|---------------|
-| `message` | 主要对话内容 | ✅ 是 |
-| `tool_call_result` | 工具调用过程 | ❌ 否(仅展示) |
-| `thinking` | AI思考过程 | ❌ 否(仅展示) |
-| `references` | 文档引用和链接 | ❌ 否(仅展示) |
-
-**设计原因**: AI的一条回复包含多个阶段(工具调用、思考、回答、引用),这些内容按时序产生且互相穿插,单一字段无法表达。用户的消息通常只有1个part(type="message"),但也支持多个part以保持结构一致性。
-
-#### Redis存储示例
-
-**用户消息**:
-```json
-{
- "parts": [
- {
- "chat_id": "chat_abc123",
- "message_id": "uuid-1",
- "part_id": "uuid-part-1",
- "timestamp": 1699999999.0,
- "type": "message",
- "role": "human",
- "content": "什么是LightRAG?",
- "references": [],
- "urls": [],
- "metadata": null
- }
- ],
- "files": []
-}
-```
-
-**AI回复(包含多个part)**:
-```json
-{
- "parts": [
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-2",
- "type": "tool_call_result",
- "role": "ai",
- "content": "正在检索知识库...",
- "timestamp": 1699999999.1
- },
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-3",
- "type": "message",
- "role": "ai",
- "content": "LightRAG是一个轻量级的RAG框架,由ApeCloud团队深度改造...",
- "timestamp": 1699999999.5
- },
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-4",
- "type": "references",
- "role": "ai",
- "content": "",
- "references": [
- {
- "score": 0.95,
- "text": "LightRAG架构说明...",
- "metadata": {"source": "lightrag_doc.pdf", "page": 3}
- }
- ],
- "urls": ["https://github.com/HKUDS/LightRAG"],
- "timestamp": 1699999999.6
- }
- ],
- "files": []
-}
-```
-
-### API响应格式
-
-**ChatDetails Schema**(FastAPI/Pydantic code-first schema):
-
-```yaml
-chatDetails:
- type: object
- properties:
- id: string # chat_abc123
- title: string # 会话标题
- bot_id: string # bot_xyz
- peer_id: string
- peer_type: string # system/feishu/weixin/web
- status: string # active/archived
- created: string # ISO 8601
- updated: string # ISO 8601
- history: # 二维数组
- type: array
- description: 对话历史,每个元素是一条消息
- items:
- type: array
- description: 一条消息包含多个parts(工具调用、思考、回答、引用等)
- items:
- $ref: '#/chatMessage'
-```
-
-**ChatMessage Schema**:
-
-```yaml
-chatMessage:
- type: object
- properties:
- id: string # message_id(同一轮次相同)
- part_id: string # part_id(每个part唯一)
- type: string # message/tool_call_result/thinking/references
- timestamp: number # Unix时间戳
- role: string # human/ai
- data: string # 消息内容
- references: # 文档引用(可选)
- type: array
- items:
- type: object
- properties:
- score: number
- text: string
- metadata: object
- urls: # URL引用(可选)
- type: array
- items:
- type: string
- feedback: # 用户反馈(可选)
- type: object
- properties:
- type: string # like/dislike
- tag: string
- message: string
- files: # 关联文件(可选)
- type: array
-```
-
-### 前端接收示例
-
-```json
-{
- "id": "chat_abc123",
- "title": "关于LightRAG的讨论",
- "bot_id": "bot_xyz",
- "status": "active",
- "created": "2025-01-01T00:00:00Z",
- "updated": "2025-01-01T01:00:00Z",
- "history": [
- [
- {
- "id": "uuid-1",
- "part_id": "uuid-part-1",
- "type": "message",
- "timestamp": 1699999999.0,
- "role": "human",
- "data": "什么是LightRAG?",
- "files": []
- }
- ],
- [
- {
- "id": "uuid-2",
- "part_id": "uuid-part-2",
- "type": "tool_call_result",
- "timestamp": 1699999999.1,
- "role": "ai",
- "data": "正在检索知识库...",
- "files": []
- },
- {
- "id": "uuid-2",
- "part_id": "uuid-part-3",
- "type": "message",
- "timestamp": 1699999999.5,
- "role": "ai",
- "data": "LightRAG是一个轻量级的RAG框架...",
- "files": []
- },
- {
- "id": "uuid-2",
- "part_id": "uuid-part-4",
- "type": "references",
- "timestamp": 1699999999.6,
- "role": "ai",
- "data": "",
- "references": [
- {
- "score": 0.95,
- "text": "LightRAG架构说明...",
- "metadata": {"source": "lightrag_doc.pdf"}
- }
- ],
- "urls": ["https://github.com/HKUDS/LightRAG"],
- "files": []
- }
- ]
- ]
-}
-```
-
-**注意**: `history`是二维数组,第一维是消息序列(按时间顺序),第二维是该条消息的多个part。例如:
-- `history[0]` = 用户的第1条消息的parts(通常只有1个part)
-- `history[1]` = AI的第1条回复的parts(可能有多个part:工具调用、思考、回答、引用)
-- `history[2]` = 用户的第2条消息的parts
-- `history[3]` = AI的第2条回复的parts
-- ...
-
-## 消息写入流程
-
-### Agent Runtime 写入路径
-
-旧的 WebSocket 聊天接口 `WS /api/v1/bots/{bot_id}/chats/{chat_id}/connect` 已经退休。
-当前 Agent 聊天写入走的是 v2 turn/timeline API 加 SSE 事件流。上面的 history schema 仍可作为背景说明,
-但不应再依据这份文档实现新的 WebSocket chat client。
-
-## 设计特点
-
-### 1. 混合存储架构
-
-| 存储 | 内容 | 原因 |
-|------|------|------|
-| PostgreSQL | Chat元数据 | 持久化、支持复杂查询 |
-| Redis | 消息历史 | 高性能读写、支持TTL |
-| PostgreSQL | 用户反馈 | 持久化、用于分析 |
-
-**优势**:
-- 性能优化:消息历史使用Redis快速读写
-- 数据持久化:重要元数据存储在PostgreSQL
-- 灵活性:可独立配置TTL、备份策略
-
-### 2. Part-Based消息设计
-
-**核心价值**:
-- ✅ 支持复杂的AI回复流程(工具调用→思考→回答→引用)
-- ✅ 前端可差异化渲染不同类型的内容
-- ✅ 完整记录时序关系(通过timestamp)
-- ✅ 灵活扩展(新增type无需改表结构)
-
-**为什么一条消息需要多个part**:
-
-AI的一条回复过程是时序产生、互相穿插的,例如:
-1. 🔍 Part1 (tool_call_result): "正在查询数据库..."
-2. 💭 Part2 (thinking): "找到了327条记录..."
-3. 🔍 Part3 (tool_call_result): "正在计算增长率..."
-4. 💭 Part4 (thinking): "环比增长15%..."
-5. 💬 Part5 (message): "根据数据分析,Q4表现优秀..."
-6. 📚 Part6 (references): [文档1, 文档2]
-
-这6个part属于AI的**一条消息**(共享同一个message_id),单一字段无法表达这种复杂的时序关系。
-
-### 3. 格式转换解耦
-
-提供三种格式转换:
-
-```python
-class StoredChatMessage:
- def to_frontend_format(self) -> List[ChatMessage]:
- """转换为前端展示格式"""
- # 包含所有types的parts
-
- def to_openai_format(self) -> List[Dict]:
- """转换为LLM调用格式"""
- # 只包含type="message"的parts
-
- def get_main_content(self) -> str:
- """获取主要回答内容"""
- # 第一个type="message"的content
-```
-
-**优势**:
-- 内部存储格式与外部接口解耦
-- 支持不同的消费场景
-- LLM上下文只包含实际对话内容,不包含工具调用和思考过程
-
-### 4. 三级ID设计
-
-```python
-chat_id = "chat_abc123" # 会话级别
-message_id = "uuid-msg-1" # 消息级别(同一条消息的多个part共享)
-part_id = "uuid-part-1" # 部分级别(每个part独立)
-```
-
-**作用**:
-- `chat_id`: 标识一个聊天会话
-- `message_id`: 将同一条消息的多个part分组(用于前端展示和反馈关联)
-- `part_id`: 每个part独立标识(用于单独操作,如复制、引用)
-
-## 性能考虑
-
-### Redis优化
-- **List数据结构**: LPUSH O(1), LRANGE O(N)
-- **可选TTL**: 自动过期历史消息
-- **连接池复用**: 全局Redis客户端
-
-### PostgreSQL优化
-- **索引**: user, bot_id, chat_id, status字段
-- **软删除**: 使用gmt_deleted
-- **分页查询**: list_chats支持分页
-
-### 传输优化
-- **WebSocket流式**: 边生成边发送
-- **增量更新**: 只传输新的part
-- **按需加载**: 懒加载历史消息
-
-## 相关文件
-
-### 核心实现
-- `aperag/views/chat.py` - View层接口
-- `aperag/service/chat_service.py` - Service层业务逻辑
-- `aperag/utils/history.py` - Redis消息历史管理
-- `aperag/chat/history/message.py` - 消息数据结构
-- `aperag/db/models.py` - 数据库模型
-- `aperag/db/repositories/chat.py` - Chat数据库操作
-- `aperag/schema/view_models.py` - Pydantic OpenAPI schema models
-
-### 前端实现
-- `web/src/app/workspace/bots/[botId]/chats/[chatId]/page.tsx` - 聊天详情页面
-- `web/src/components/chat/chat-messages.tsx` - 消息展示组件
-
-## 总结
-
-ApeRAG的聊天历史消息系统采用**混合存储 + Part-Based消息设计**:
-
-1. **PostgreSQL**存储Chat元数据和反馈(持久化、可查询)
-2. **Redis**存储消息历史(高性能、支持过期)
-3. **Part-Based设计**支持复杂的AI回复流程(工具调用、思考、回答、引用)
-4. **三级ID设计**支持消息分组和独立操作
-5. **清晰的分层架构**(View → Service → Repository → Storage)
-
-这种设计既保证了性能,又支持复杂的AI交互场景,同时具有良好的可扩展性。
diff --git a/docs/zh-CN/design/collection_knowledge_export_design.md b/docs/zh-CN/design/collection_knowledge_export_design.md
deleted file mode 100644
index 24b728da2..000000000
--- a/docs/zh-CN/design/collection_knowledge_export_design.md
+++ /dev/null
@@ -1,322 +0,0 @@
-# 知识库导出功能设计
-
-**状态**: 已实现(MVP)
-
----
-
-## 1. 背景与目标
-
-ApeRAG 文档处理管线会将用户上传的原始文件(PDF、Word 等)解析为结构化知识内容,并将所有产物存储在对象存储中。这些内容有独立的使用价值:
-
-- 将解析结果迁移到其他 RAG 框架(如 LlamaIndex、Dify)
-- 审查文档解析质量,发现截断/格式错误
-- 离线分析分块策略效果
-- 数据备份与合规存档
-
-本功能在知识库操作菜单中新增「**导出知识库**」按钮,允许知识库 Owner 将对象存储中该知识库目录的全部内容打包为 ZIP 文件下载。
-
-**MVP 范围**:
-- ✅ 触发导出(异步后台打包)
-- ✅ 实时进度展示(前端轮询)
-- ✅ 完成后一键下载 ZIP
-- ❌ 后台运行(对话框保持打开直到完成或失败)
-- ❌ 导出历史页面
-
----
-
-## 2. 导出内容
-
-对象存储中每个知识库的目录结构如下:
-
-```
-.objects/
-└── user-{user_id}/
- └── {collection_id}/ ← 导出此前缀下的全部内容
- ├── {document_id_1}/
- │ ├── original.pdf
- │ ├── converted.pdf
- │ ├── processed_content.md
- │ ├── chunks/
- │ │ ├── chunk_0.json
- │ │ └── chunk_1.json
- │ └── images/
- │ ├── page_0.png
- │ └── page_1.png
- └── {document_id_2}/
- └── ...
-```
-
-**导出策略**:对 `user-{user_id}/{collection_id}/` 前缀下的所有对象做全量导出,无任何过滤。
-
-生成的 ZIP 包结构:
-
-```
-{collection_title}_export_{YYYY-MM-DD}.zip
-├── manifest.json ← 元数据(id → 标题映射)
-├── {document_id_1}/
-│ └── ...(与对象存储目录结构一致)
-└── {document_id_2}/
- └── ...
-```
-
-`manifest.json` 格式:
-
-```json
-{
- "schema_version": "1.0",
- "collection": {
- "id": "colff4f33902752abee",
- "title": "医学文献库",
- "exported_at": "2026-03-04T10:00:00Z"
- },
- "documents": [
- { "id": "doc_xyz789", "title": "高血压诊疗指南", "status": "COMPLETE" }
- ]
-}
-```
-
-> `manifest.json` 仅作信息记录,不影响哪些文件被导出。导出的 ZIP 保存在对象存储的 `exports/user-{user_id}/export_{task_id}.zip`,7 天后由定时任务清理。
-
----
-
-## 3. 权限
-
-| 角色 | 能否导出 |
-|------|---------|
-| Collection Owner | ✅ |
-| 订阅用户(Marketplace) | ❌ |
-| 未登录用户 | ❌ |
-
-- 「导出知识库」按钮**仅对 Owner 渲染**(在 `collection-header.tsx` 中用 `collection.user === currentUser.id` 判断)
-- 后端通过查询 `Collection.user == user_id` 做权限校验,非 Owner 返回 `403`
-
----
-
-## 4. 系统架构
-
-### 整体流程
-
-```
-用户点击「导出知识库」
- │
- ▼
-前端弹出确认对话框
- │ 点击「开始导出」
- ▼
-POST /api/v1/collections/{id}/export
- │
- ├─► 校验 Owner 权限(非 Owner → 403)
- ├─► 检查并发任务数(同一用户 > 3 → 429)
- ├─► 创建 ExportTask(status=PENDING)
- └─► 触发 Celery 任务 export_collection_task.delay(task_id)
- │
- ▼ 202 Accepted: { export_task_id, status: "PENDING" }
- │
-前端切换为进度对话框,每 2 秒轮询
-GET /api/v1/export-tasks/{task_id}
- │
- ▼ status=COMPLETED
-前端显示「下载 ZIP」按钮
- │
- ▼
-GET /api/v1/export-tasks/{task_id}/download
-(后端从对象存储读取 ZIP,StreamingResponse 流式返回)
-```
-
-### Celery Worker 处理流程(`config/export_tasks.py`)
-
-```
-1. 更新 ExportTask.status → PROCESSING
-2. 列举对象存储 user-{user_id}/{collection_id}/ 下所有对象(list_objects_by_prefix)
-3. 创建本地临时目录 /tmp/export_{task_id}/
-4. 并发下载所有文件(最多 5 个并发,ThreadPoolExecutor)
- progress = downloaded / total * 85
-5. 从数据库查文档列表,生成 manifest.json → /tmp/export_{task_id}/
- progress → 90%
-6. 打包 ZIP(ZIP_DEFLATED)→ /tmp/export_{task_id}.zip
- progress → 95%
-7. 上传 ZIP 到对象存储 exports/user-{user_id}/export_{task_id}.zip
- progress → 98%
-8. 清理本地临时文件
-9. 更新 ExportTask:
- status=COMPLETED, progress=100, file_size, gmt_completed, gmt_expires=now()+7d
- (失败时:status=FAILED, error_message=)
-```
-
----
-
-## 5. 数据库
-
-### `export_task` 表
-
-| 字段 | 类型 | 说明 |
-|------|------|------|
-| `id` | VARCHAR(24) | PK,格式 `export` + random_id() |
-| `user` | VARCHAR(256) | 用户 ID |
-| `collection_id` | VARCHAR(24) | 所属知识库 |
-| `status` | VARCHAR(32) | 见下表 |
-| `progress` | INTEGER | 0–100 |
-| `message` | TEXT | 展示给用户的进度文字 |
-| `error_message` | TEXT | 失败时的错误详情 |
-| `object_store_path` | TEXT | ZIP 在对象存储的路径 |
-| `file_size` | BIGINT | ZIP 大小(字节) |
-| `gmt_created` | TIMESTAMP | |
-| `gmt_updated` | TIMESTAMP | |
-| `gmt_completed` | TIMESTAMP | |
-| `gmt_expires` | TIMESTAMP | 创建后 7 天 |
-
-### 状态枚举
-
-| 状态 | 说明 | 可下载 |
-|------|------|--------|
-| `PENDING` | 等待 Worker 处理 | ❌ |
-| `PROCESSING` | 正在打包中 | ❌ |
-| `COMPLETED` | 打包完成 | ✅ |
-| `FAILED` | 打包失败 | ❌ |
-| `EXPIRED` | ZIP 已被清理 | ❌ |
-
-```
-PENDING → PROCESSING → COMPLETED
- └→ FAILED
-COMPLETED / FAILED → EXPIRED(7 天后定时任务)
-```
-
----
-
-## 6. API
-
-API 定义源自 FastAPI route 和 Pydantic view model。
-
-修改 API 后需运行:
-```bash
-make openapi-check # 验证 code-first OpenAPI 可以导出
-```
-
-### 接口列表
-
-| 方法 | 路径 | 说明 | 权限 |
-|------|------|------|------|
-| `POST` | `/api/v1/collections/{collection_id}/export` | 创建导出任务 | Owner Only |
-| `GET` | `/api/v1/export-tasks/{task_id}` | 查询任务状态 | 任务创建者 |
-| `GET` | `/api/v1/export-tasks/{task_id}/download` | 下载 ZIP | 任务创建者 |
-
-### POST /collections/{collection_id}/export
-
-**Response 202**
-```json
-{ "export_task_id": "export6f918baaa28e0180", "status": "PENDING", "progress": 0 }
-```
-
-**错误码**
-
-| 场景 | HTTP |
-|------|------|
-| 非 Owner | 403 |
-| 集合不存在 | 404 |
-| 并发任务超限(>3) | 429 |
-
-### GET /export-tasks/{task_id}
-
-**处理中**
-```json
-{ "export_task_id": "...", "status": "PROCESSING", "progress": 58, "message": "Downloading files: 87 / 150" }
-```
-
-**完成**
-```json
-{
- "export_task_id": "...", "status": "COMPLETED", "progress": 100,
- "download_url": "/api/v1/export-tasks/.../download",
- "file_size": 8388608,
- "gmt_expires": "2026-03-11T10:01:30Z"
-}
-```
-
-### GET /export-tasks/{task_id}/download
-
-流式返回 ZIP 文件,`Content-Disposition: attachment; filename="{title}_export_{date}.zip"`。
-
----
-
-## 7. 前端
-
-### 入口
-
-`web/src/app/workspace/collections/[collectionId]/collection-header.tsx`
-
-Owner 的操作下拉菜单中:
-
-```
-┌──────────────────────┐
-│ 发布至市场 │
-│ 导出知识库 ← 新增 │
-│ ─────────────────── │
-│ 删除知识库 │
-└──────────────────────┘
-```
-
-### 组件:`CollectionExport`
-
-路径:`web/src/components/collections/export-dialog.tsx`
-
-组件内部实现四个状态的对话框状态机:
-
-```
-confirm → processing → completed
- └→ failed → confirm(点击「重试」)
-```
-
-- 使用自动生成的 SDK 方法调用接口:
- - `apiClient.defaultApi.createExportTask({ collectionId })`
- - `apiClient.defaultApi.getExportTask({ taskId })`(每 2 秒轮询一次)
-- 下载时直接使用 `download_url` 字段(指向 download 接口)
-- 进度对话框期间禁止关闭(屏蔽 ESC 和点击外部)
-
-### i18n
-
-翻译键均以 `export_knowledge_base_` 为前缀,定义在:
-- `web/src/i18n/en-US/page_collections.json`
-- `web/src/i18n/zh-CN/page_collections.json`
-
----
-
-## 8. 性能与限制
-
-| 限制项 | 值 |
-|--------|---|
-| 每用户最大并发导出任务数 | 3 |
-| Worker 内并发文件下载数 | 5(ThreadPoolExecutor) |
-| 导出文件保留时间 | 7 天 |
-
----
-
-## 9. 相关文件索引
-
-### 新增文件
-
-| 文件 | 说明 |
-|------|------|
-| `aperag/schema/view_models.py` | Pydantic API schema 定义 |
-| `aperag/views/export.py` | FastAPI 路径定义 |
-| `aperag/db/models.py` | 新增 `ExportTask` 模型和 `ExportTaskStatus` 枚举 |
-| `aperag/migration/versions/20260304120000-a1b2c3d4e5f6.py` | 数据库迁移 |
-| `aperag/service/export_service.py` | 导出业务逻辑 |
-| `aperag/views/export.py` | FastAPI 路由(3 个接口) |
-| `config/export_tasks.py` | Celery 异步打包任务 |
-| `web/src/components/collections/export-dialog.tsx` | 导出对话框组件 |
-| `docs/zh-CN/design/collection_knowledge_export_design.md` | 本文档 |
-
-### 修改的文件
-
-| 文件 | 修改内容 |
-|------|---------|
-| `aperag/objectstore/base.py` | 新增 `list_objects_by_prefix` 抽象方法 |
-| `aperag/objectstore/local.py` | 实现 `list_objects_by_prefix` |
-| `aperag/objectstore/s3.py` | 实现 `list_objects_by_prefix` |
-| `aperag/schema/view_models.py` | Pydantic view model,含 `ExportTaskResponse` |
-| `aperag/app.py` | 注册 `export_router` |
-| `config/celery.py` | 注册 `config.export_tasks` 模块 |
-| FE v2 feature adapter | 接入导出的 code-first OpenAPI typed client |
-| `web/src/app/workspace/collections/[collectionId]/collection-header.tsx` | 接入导出按钮 |
-| `web/src/i18n/en-US/page_collections.json` | 新增翻译键 |
-| `web/src/i18n/zh-CN/page_collections.json` | 新增翻译键 |
diff --git a/docs/zh-CN/design/connected_components_optimization.md b/docs/zh-CN/design/connected_components_optimization.md
deleted file mode 100644
index eaa565760..000000000
--- a/docs/zh-CN/design/connected_components_optimization.md
+++ /dev/null
@@ -1,281 +0,0 @@
-# LightRAG 连通分量优化技术文档
-
-## 概述
-
-本文档描述了为 LightRAG 图索引处理流程实现的连通分量优化。此优化通过识别和分别处理独立的实体组,显著提高了并发性能。
-
-## 问题陈述
-
-### 优化前的问题
-
-在原始实现中,处理提取的实体和关系时:
-
-1. **所有实体**从一个批次中被收集到一个单一集合中
-2. **创建一个大型的多重锁**来一次性锁定所有实体
-3. 所有处理都必须等待这个全局锁
-
-这种方法存在几个缺陷:
-
-- **并发性差**:如果任务A正在处理关于"技术"的实体,而任务B想要处理关于"历史"的实体,任务B必须等待,尽管这些主题完全不相关
-- **锁争用**:批次中的实体越多,锁冲突的可能性就越高
-- **可扩展性问题**:随着并发任务数量的增加,系统吞吐量会下降
-
-### 示例场景
-
-```
-文档1:"AI和机器学习正在改变技术..."
-文档2:"朱利叶斯·凯撒统治罗马帝国..."
-
-未优化时:
-- 任务1:锁定(AI, ML, 技术, 凯撒, 罗马, 帝国) → 处理 → 释放
-- 任务2:等待... → 锁定(全部) → 处理 → 释放
-```
-
-## 解决方案:连通分量
-
-### 核心概念
-
-我们将提取的实体和关系视为一个图,并找到**连通分量**——通过关系连接的实体组。不同分量中的实体之间没有关系,可以独立处理。
-
-### 实现方式
-
-#### 1. 发现连通分量 (`_find_connected_components`)
-
-```python
-def _find_connected_components(self, chunk_results) -> List[List[str]]:
- # 从实体和关系构建邻接表
- # 使用BFS查找所有连通分量
- # 返回实体组列表
-```
-
-此方法:
-- 从所有提取的实体和关系构建邻接表
-- 使用广度优先搜索(BFS)识别连通分量
-- 返回一个列表,其中每个元素是一组连接的实体名称
-
-#### 2. 处理实体组 (`_process_entity_groups`)
-
-```python
-async def _process_entity_groups(self, chunk_results, components, collection_id):
- for component in components:
- # 为此分量过滤chunk_results
- # 仅为此分量中的实体创建锁
- # 独立处理此组
-```
-
-此方法:
-- 分别处理每个连通分量
-- 仅为每个分量内的实体创建锁
-- 允许无关分量的并行处理
-
-#### 3. 更新后的图索引处理 (`aprocess_graph_indexing`)
-
-主要处理流程现在:
-1. 提取实体和关系(不变)
-2. 查找连通分量
-3. 使用各自的锁范围处理每个分量
-
-## 优势
-
-### 1. 提高并发性
-
-```
-优化后:
-- 任务1:锁定(AI, ML, 技术) → 处理 → 释放
-- 任务2:锁定(凯撒, 罗马, 帝国) → 处理 → 释放
- ↑ 可以并行运行!
-```
-
-### 2. 减少锁争用
-
-- 较小的锁范围意味着更少的冲突机会
-- 独立主题可以同时处理
-- 在多核系统中更好地利用CPU
-
-### 3. 更好的可扩展性
-
-- 即使有许多并发任务,系统也能保持高吞吐量
-- 处理时间与最大分量的大小成比例,而不是总实体数
-- 对于多样化的文档集合特别有效
-
-## 性能影响
-
-### 典型改进
-
-- 对于多样化文档集合,**吞吐量提高2-3倍**
-- 对于无关内容,与CPU核心数**接近线性扩展**
-- 分量检测的**最小开销**(<总处理时间的1%)
-
-### 最佳情况场景
-
-- 处理来自不同领域的文档(技术、历史、科学等)
-- 包含许多小的、无关主题的大型集合
-- 具有高并发文档摄取的系统
-
-### 最差情况场景
-
-- 所有实体都连接(退化为原始行为)
-- 非常小的批次(开销变得更加明显)
-- 关于单一、高度互连主题的文档
-
-## 使用示例
-
-```python
-# 优化是自动且透明的
-lightrag = LightRAG(...)
-
-# 处理文档 - 连通分量在内部处理
-result = await lightrag.aprocess_graph_indexing(chunks)
-
-# 结果包含分量信息
-print(f"处理了 {result['groups_processed']} 个独立组")
-```
-
-## 测试
-
-在 `test_lightrag_connected_components.py` 中提供了全面的单元测试:
-
-- 单一分量(全部连接)
-- 多个分量
-- 孤立实体
-- 复杂图结构
-- 分量之间的边过滤
-
-## 未来增强
-
-1. **动态批处理**:根据分量特性调整批次大小
-2. **优先级处理**:优先处理较大/更重要的分量
-3. **分量缓存**:为相似文档模式缓存分量结构
-4. **指标和监控**:跟踪分量统计信息以获得优化洞察
-
-## 结论
-
-连通分量优化是对 LightRAG 图索引处理流程的重大改进。通过识别和分别处理独立的实体组,我们实现了更好的并发性、减少的锁争用和改进的可扩展性——同时保持知识图谱的正确性和一致性。
-
-## 技术细节
-
-### 连通分量算法实现
-
-我们使用广度优先搜索(BFS)算法来发现图中的连通分量:
-
-```python
-def _find_connected_components(self, chunk_results):
- """
- 使用BFS算法查找连通分量
-
- Args:
- chunk_results: 包含实体和关系的提取结果
-
- Returns:
- List[List[str]]: 连通分量列表,每个分量包含相关实体名称
- """
- # 1. 构建邻接表
- adjacency = defaultdict(set)
- all_entities = set()
-
- # 从关系中构建图
- for result in chunk_results:
- for relationship in result.get('relationships', []):
- src = relationship['src_id']
- tgt = relationship['tgt_id']
- adjacency[src].add(tgt)
- adjacency[tgt].add(src)
- all_entities.add(src)
- all_entities.add(tgt)
-
- # 2. 使用BFS查找连通分量
- visited = set()
- components = []
-
- for entity in all_entities:
- if entity not in visited:
- component = []
- queue = [entity]
- visited.add(entity)
-
- while queue:
- current = queue.pop(0)
- component.append(current)
-
- for neighbor in adjacency[current]:
- if neighbor not in visited:
- visited.add(neighbor)
- queue.append(neighbor)
-
- components.append(component)
-
- return components
-```
-
-### 并发处理策略
-
-每个连通分量使用独立的锁范围进行处理:
-
-```python
-async def _process_entity_groups(self, chunk_results, components, collection_id):
- """
- 并发处理多个连通分量
-
- Args:
- chunk_results: 提取的结果数据
- components: 连通分量列表
- collection_id: 集合ID,用于工作空间隔离
- """
- tasks = []
-
- for i, component in enumerate(components):
- # 为每个分量创建独立的处理任务
- task = self._process_single_component(
- chunk_results, component, collection_id, i
- )
- tasks.append(task)
-
- # 并发执行所有分量处理任务
- results = await asyncio.gather(*tasks, return_exceptions=True)
-
- return self._merge_component_results(results)
-
-async def _process_single_component(self, chunk_results, component, collection_id, component_id):
- """
- 处理单个连通分量
- """
- # 1. 过滤属于此分量的数据
- filtered_results = self._filter_results_for_component(chunk_results, component)
-
- # 2. 创建分量级别的锁
- component_locks = [f"entity:{entity}:{collection_id}" for entity in component]
-
- # 3. 使用细粒度锁处理
- async with self.concurrent_manager.multi_lock(component_locks, timeout=30):
- # 处理实体合并
- merged_entities = await self._merge_entities_in_component(filtered_results)
-
- # 处理关系合并
- merged_relationships = await self._merge_relationships_in_component(filtered_results)
-
- return {
- 'component_id': component_id,
- 'entities': merged_entities,
- 'relationships': merged_relationships,
- 'entity_count': len(component)
- }
-```
-
-### 性能监控
-
-系统提供了详细的性能统计信息:
-
-```python
-# 连通分量统计
-component_stats = {
- 'total_components': len(components),
- 'max_component_size': max(len(comp) for comp in components) if components else 0,
- 'avg_component_size': sum(len(comp) for comp in components) / len(components) if components else 0,
- 'single_entity_components': sum(1 for comp in components if len(comp) == 1),
- 'large_components': sum(1 for comp in components if len(comp) > 10)
-}
-
-logger.info(f"连通分量分析完成: {component_stats}")
-```
-
-这种优化策略在处理多样化文档集合时效果显著,特别是当文档涵盖不同主题领域时,能够实现真正的并行处理,大幅提升系统整体性能。
\ No newline at end of file
diff --git a/docs/zh-CN/design/document_upload_design.md b/docs/zh-CN/design/document_upload_design.md
deleted file mode 100644
index 0587be9c9..000000000
--- a/docs/zh-CN/design/document_upload_design.md
+++ /dev/null
@@ -1,1077 +0,0 @@
----
-title: 文档上传设计
-position: 3
----
-
-# ApeRAG 文档上传架构设计
-
-## 概述
-
-本文档详细说明 ApeRAG 项目中文档上传模块的完整架构设计,涵盖从文件上传、临时存储、文档解析、格式转换到最终索引构建的全链路流程。
-
-**核心设计理念**:采用**两阶段提交**模式,将文件上传(临时存储)和文档确认(正式添加)分离,提供更好的用户体验和资源管理能力。
-
-## 系统架构
-
-### 整体架构图
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Frontend │
-│ (Next.js) │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ Step 1: Upload │ Step 2: Confirm
- │ POST /documents/upload │ POST /documents/confirm
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ View Layer: aperag/views/collections.py │
-│ - HTTP请求处理 │
-│ - JWT身份验证 │
-│ - 参数验证 │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ document_service.upload_document() │ document_service.confirm_documents()
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Service Layer: aperag/service/document_service.py │
-│ - 业务逻辑编排 │
-│ - 文件验证(类型、大小) │
-│ - SHA-256 哈希去重 │
-│ - Quota 检查 │
-│ - 事务管理 │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ Step 1 │ Step 2
- ▼ ▼
-┌────────────────────────┐ ┌────────────────────────────┐
-│ 1. 创建 Document 记录 │ │ 1. 更新 Document 状态 │
-│ status=UPLOADED │ │ UPLOADED → PENDING │
-│ 2. 保存到 ObjectStore │ │ 2. 创建 DocumentIndex 记录│
-│ 3. 计算 content_hash │ │ 3. 触发索引构建任务 │
-└────────┬───────────────┘ └────────┬───────────────────┘
- │ │
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Storage Layer │
-│ │
-│ ┌───────────────┐ ┌──────────────────┐ ┌─────────────┐ │
-│ │ PostgreSQL │ │ Object Store │ │ Vector DB │ │
-│ │ │ │ │ │ │ │
-│ │ - document │ │ - Local/S3 │ │ - Qdrant │ │
-│ │ - document_ │ │ - 原始文件 │ │ - 向量索引 │ │
-│ │ index │ │ - 转换后的文件 │ │ │ │
-│ └───────────────┘ └──────────────────┘ └─────────────┘ │
-│ │
-│ ┌───────────────┐ ┌──────────────────┐ │
-│ │ Elasticsearch │ │ Neo4j/PG │ │
-│ │ │ │ │ │
-│ │ - 全文索引 │ │ - 知识图谱 │ │
-│ └───────────────┘ └──────────────────┘ │
-└─────────────────────────────────────────────────────────────┘
- │
- ▼
- ┌───────────────────┐
- │ Celery Workers │
- │ │
- │ - 文档解析 │
- │ - 格式转换 │
- │ - 内容提取 │
- │ - 文档分块 │
- │ - 索引构建 │
- └───────────────────┘
-```
-
-### 分层架构
-
-```
-┌─────────────────────────────────────────────┐
-│ View Layer (views/collections.py) │ HTTP 处理、认证、参数验证
-└─────────────────┬───────────────────────────┘
- │ 调用
-┌─────────────────▼───────────────────────────┐
-│ Service Layer (service/document_service.py)│ 业务逻辑、事务编排、权限控制
-└─────────────────┬───────────────────────────┘
- │ 调用
-┌─────────────────▼───────────────────────────┐
-│ Repository Layer (db/ops.py, objectstore/) │ 数据访问抽象、对象存储接口
-└─────────────────┬───────────────────────────┘
- │ 访问
-┌─────────────────▼───────────────────────────┐
-│ Storage Layer (PG, S3, Qdrant, ES, Neo4j) │ 数据持久化
-└─────────────────────────────────────────────┘
-```
-
-## 核心流程详解
-
-### 阶段 0: API 接口定义
-
-系统提供三个主要接口:
-
-1. **上传文件**(两阶段模式 - 第一步)
- - 接口:`POST /api/v1/collections/{collection_id}/documents/upload`
- - 功能:上传文件到临时存储,状态为 `UPLOADED`
- - 返回:`document_id`、`filename`、`size`、`status`
-
-2. **确认文档**(两阶段模式 - 第二步)
- - 接口:`POST /api/v1/collections/{collection_id}/documents/confirm`
- - 功能:确认已上传的文档,触发索引构建
- - 参数:`document_ids` 数组
- - 返回:`confirmed_count`、`failed_count`、`failed_documents`
-
-3. **一步上传**(传统模式,兼容旧版)
- - 接口:`POST /api/v1/collections/{collection_id}/documents`
- - 功能:上传并直接添加到知识库,状态直接为 `PENDING`
- - 支持批量上传
-
-### 阶段 1: 文件上传与临时存储
-
-#### 1.1 上传流程
-
-```
-用户选择文件
- │
- ▼
-前端调用 upload API
- │
- ▼
-View 层验证身份和参数
- │
- ▼
-Service 层处理业务逻辑:
- │
- ├─► 验证集合存在且激活
- │
- ├─► 验证文件类型和大小
- │
- ├─► 读取文件内容
- │
- ├─► 计算 SHA-256 哈希
- │
- └─► 事务处理:
- │
- ├─► 重复检测(按文件名+哈希)
- │ ├─ 完全相同:返回已存在文档(幂等)
- │ ├─ 同名不同内容:抛出冲突异常
- │ └─ 新文档:继续创建
- │
- ├─► 创建 Document 记录(status=UPLOADED)
- │
- ├─► 上传到对象存储
- │ └─ 路径:user-{user_id}/{collection_id}/{document_id}/original{suffix}
- │
- └─► 更新文档元数据(object_path)
-```
-
-#### 1.2 文件验证
-
-**支持的文件类型**:
-- 文档:`.pdf`, `.doc`, `.docx`, `.ppt`, `.pptx`, `.xls`, `.xlsx`
-- 文本:`.txt`, `.md`, `.html`, `.json`, `.xml`, `.yaml`, `.yml`, `.csv`
-- 图片:`.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.tiff`, `.tif`
-- 音频:`.mp3`, `.wav`, `.m4a`
-- 压缩包:`.zip`, `.tar`, `.gz`, `.tgz`
-
-**大小限制**:
-- 默认:100 MB(可通过 `MAX_DOCUMENT_SIZE` 环境变量配置)
-- 解压后总大小:5 GB(`MAX_EXTRACTED_SIZE`)
-
-#### 1.3 重复检测机制
-
-采用**文件名 + SHA-256 哈希**双重检测:
-
-| 场景 | 文件名 | 哈希值 | 系统行为 |
-|------|--------|--------|----------|
-| 完全相同 | 相同 | 相同 | 返回已存在文档(幂等操作) |
-| 文件名冲突 | 相同 | 不同 | 抛出 `DocumentNameConflictException` |
-| 新文档 | 不同 | - | 创建新文档记录 |
-
-**优势**:
-- ✅ 支持幂等上传:网络重传不会创建重复文档
-- ✅ 避免内容冲突:同名不同内容会提示用户
-- ✅ 节省存储空间:相同内容只存储一次
-
-### 阶段 2: 临时存储配置
-
-#### 2.1 对象存储类型
-
-系统支持两种对象存储后端,可通过环境变量切换:
-
-**1. Local 存储(本地文件系统)**
-
-适用场景:
-- 开发测试环境
-- 小规模部署
-- 单机部署
-
-配置方式:
-```bash
-# 开发环境
-OBJECT_STORE_TYPE=local
-OBJECT_STORE_LOCAL_ROOT_DIR=.objects
-
-# Docker 环境
-OBJECT_STORE_TYPE=local
-OBJECT_STORE_LOCAL_ROOT_DIR=/shared/objects
-```
-
-存储路径示例:
-```
-.objects/
-└── user-google-oauth2-123456/
- └── col_abc123/
- └── doc_xyz789/
- ├── original.pdf # 原始文件
- ├── converted.pdf # 转换后的 PDF
- ├── processed_content.md # 解析后的 Markdown
- ├── chunks/ # 分块数据
- │ ├── chunk_0.json
- │ └── chunk_1.json
- └── images/ # 提取的图片
- ├── page_0.png
- └── page_1.png
-```
-
-**2. S3 存储(兼容 AWS S3/MinIO/OSS 等)**
-
-适用场景:
-- 生产环境
-- 大规模部署
-- 分布式部署
-- 需要高可用和容灾
-
-配置方式:
-```bash
-OBJECT_STORE_TYPE=s3
-OBJECT_STORE_S3_ENDPOINT=http://127.0.0.1:9000 # MinIO/S3 地址
-OBJECT_STORE_S3_REGION=us-east-1 # AWS Region
-OBJECT_STORE_S3_ACCESS_KEY=minioadmin # Access Key
-OBJECT_STORE_S3_SECRET_KEY=minioadmin # Secret Key
-OBJECT_STORE_S3_BUCKET=aperag # Bucket 名称
-OBJECT_STORE_S3_PREFIX_PATH=dev/ # 可选的路径前缀
-OBJECT_STORE_S3_USE_PATH_STYLE=true # MinIO 需要设置为 true
-```
-
-#### 2.2 对象存储路径规则
-
-**路径格式**:
-```
-{prefix}/user-{user_id}/{collection_id}/{document_id}/{filename}
-```
-
-**组成部分**:
-- `prefix`:可选的全局前缀(仅 S3)
-- `user_id`:用户 ID(`|` 替换为 `-`)
-- `collection_id`:集合 ID
-- `document_id`:文档 ID
-- `filename`:文件名(如 `original.pdf`、`page_0.png`)
-
-**多租户隔离**:
-- 每个用户有独立的命名空间
-- 每个集合有独立的存储目录
-- 每个文档有独立的文件夹
-
-### 阶段 3: 文档确认与索引构建
-
-#### 3.1 确认流程
-
-```
-用户点击"保存到集合"
- │
- ▼
-前端调用 confirm API
- │
- ▼
-Service 层处理:
- │
- ├─► 验证集合配置
- │
- ├─► 检查 Quota(确认阶段才扣除配额)
- │
- └─► 对每个 document_id:
- │
- ├─► 验证文档状态为 UPLOADED
- │
- ├─► 更新文档状态:UPLOADED → PENDING
- │
- ├─► 根据集合配置创建索引记录:
- │ ├─ VECTOR(向量索引,必选)
- │ ├─ FULLTEXT(全文索引,必选)
- │ ├─ GRAPH(知识图谱,可选)
- │ ├─ SUMMARY(文档摘要,可选)
- │ └─ VISION(视觉索引,可选)
- │
- └─► 返回确认结果
- │
- ▼
-触发 Celery 任务:reconcile_document_indexes
- │
- ▼
-后台异步处理索引构建
-```
-
-#### 3.2 Quota(配额)管理
-
-**检查时机**:
-- ❌ 不在上传阶段检查(临时存储不占用配额)
-- ✅ 在确认阶段检查(正式添加才消耗配额)
-
-**配额类型**:
-
-1. **用户全局配额**
- - `max_document_count`:用户总文档数量限制
- - 默认:1000(可通过 `MAX_DOCUMENT_COUNT` 配置)
-
-2. **单集合配额**
- - `max_document_count_per_collection`:单个集合文档数量限制
- - 不计入 `UPLOADED` 和 `DELETED` 状态的文档
-
-**配额超限处理**:
-- 抛出 `QuotaExceededException`
-- 返回 HTTP 400 错误
-- 包含当前用量和配额上限信息
-
-### 阶段 4: 文档解析与格式转换
-
-#### 4.1 Parser 架构
-
-系统采用**多 Parser 链式调用**架构,每个 Parser 负责特定类型的文件解析:
-
-```
-DocParser(主控制器)
- │
- ├─► MinerUParser
- │ └─ 功能:高精度 PDF 解析(商业 API)
- │ └─ 支持:.pdf
- │
- ├─► ImageParser
- │ └─ 功能:图片内容识别(OCR + 视觉理解)
- │ └─ 支持:.jpg, .png, .gif, .bmp, .tiff
- │
- ├─► AudioParser
- │ └─ 功能:音频转录(Speech-to-Text)
- │ └─ 支持:.mp3, .wav, .m4a
- │
- └─► MarkItDownParser(兜底)
- └─ 功能:通用文档转 Markdown
- └─ 支持:几乎所有常见格式
-```
-
-#### 4.2 Parser 配置
-
-**配置方式**:通过集合配置(Collection Config)动态控制
-
-```json
-{
- "parser_config": {
- "use_mineru": false, // 是否启用 MinerU(需要 API Token)
- "use_markitdown": true, // 是否启用 MarkItDown(默认)
- "mineru_api_token": "xxx" // MinerU API Token(可选)
- }
-}
-```
-
-**环境变量配置**:
-```bash
-USE_MINERU_API=false # 全局启用 MinerU
-MINERU_API_TOKEN=your_token # MinerU API Token
-```
-
-#### 4.3 解析流程
-
-```
-Celery Worker 收到索引任务
- │
- ▼
-1. 从对象存储下载原始文件
- │
- ▼
-2. 根据文件扩展名选择 Parser
- │
- ├─► 尝试第一个匹配的 Parser
- │ ├─ 成功:返回解析结果
- │ └─ 失败:FallbackError → 尝试下一个 Parser
- │
- └─► 最终兜底:MarkItDownParser
- │
- ▼
-3. 解析结果(Parts):
- │
- ├─► MarkdownPart:文本内容
- │ └─ 包含:标题、段落、列表、表格等
- │
- ├─► PdfPart:PDF 文件
- │ └─ 用于:线性化、页面渲染
- │
- └─► AssetBinPart:二进制资源
- └─ 包含:图片、嵌入的文件等
- │
- ▼
-4. 后处理(Post-processing):
- │
- ├─► PDF 页面转图片(Vision 索引需要)
- │ └─ 每页渲染为 PNG 图片
- │ └─ 保存到 {document_path}/images/page_N.png
- │
- ├─► PDF 线性化(加速浏览器加载)
- │ └─ 使用 pikepdf 优化 PDF 结构
- │ └─ 保存到 {document_path}/converted.pdf
- │
- └─► 提取文本内容(纯文本)
- └─ 合并所有 MarkdownPart 内容
- └─ 保存到 {document_path}/processed_content.md
- │
- ▼
-5. 保存到对象存储
-```
-
-#### 4.4 格式转换示例
-
-**示例 1:PDF 文档**
-```
-输入:user_manual.pdf (5 MB)
- │
- ▼
-解析器选择:MinerUParser / MarkItDownParser
- │
- ▼
-输出 Parts:
- ├─ MarkdownPart: "# User Manual\n\n## Chapter 1\n..."
- └─ PdfPart: <原始 PDF 数据>
- │
- ▼
-后处理:
- ├─ 渲染 50 页为图片 → images/page_0.png ~ page_49.png
- ├─ 线性化 PDF → converted.pdf
- └─ 提取文本 → processed_content.md
-```
-
-**示例 2:图片文件**
-```
-输入:screenshot.png (2 MB)
- │
- ▼
-解析器选择:ImageParser
- │
- ▼
-输出 Parts:
- ├─ MarkdownPart: "[OCR 提取的文字内容]"
- └─ AssetBinPart: <原始图片数据> (vision_index=true)
- │
- ▼
-后处理:
- └─ 保存原图副本 → images/file.png
-```
-
-**示例 3:音频文件**
-```
-输入:meeting_record.mp3 (50 MB)
- │
- ▼
-解析器选择:AudioParser
- │
- ▼
-输出 Parts:
- └─ MarkdownPart: "[转录的会议内容文本]"
- │
- ▼
-后处理:
- └─ 保存转录文本 → processed_content.md
-```
-
-### 阶段 5: 索引构建
-
-#### 5.1 索引类型与功能
-
-| 索引类型 | 是否必选 | 功能描述 | 存储位置 |
-|---------|---------|----------|----------|
-| **VECTOR** | ✅ 必选 | 向量化检索,支持语义搜索 | Qdrant / Elasticsearch |
-| **FULLTEXT** | ✅ 必选 | 全文检索,支持关键词搜索 | Elasticsearch |
-| **GRAPH** | ❌ 可选 | 知识图谱,提取实体和关系 | Neo4j / PostgreSQL |
-| **SUMMARY** | ❌ 可选 | 文档摘要,LLM 生成 | PostgreSQL (index_data) |
-| **VISION** | ❌ 可选 | 视觉理解,图片内容分析 | Qdrant (向量) + PG (metadata) |
-
-#### 5.2 索引构建流程
-
-```
-Celery Worker: reconcile_document_indexes 任务
- │
- ▼
-1. 扫描 DocumentIndex 表,找到需要处理的索引
- │
- ├─► PENDING 状态 + observed_version < version
- │ └─ 需要创建或更新索引
- │
- └─► DELETING 状态
- └─ 需要删除索引
- │
- ▼
-2. 按文档分组,逐个处理
- │
- ▼
-3. 对每个文档:
- │
- ├─► parse_document(解析文档)
- │ ├─ 从对象存储下载原始文件
- │ ├─ 调用 DocParser 解析
- │ └─ 返回 ParsedDocumentData
- │
- └─► 对每个索引类型:
- │
- ├─► create_index (创建/更新索引)
- │ │
- │ ├─ VECTOR 索引:
- │ │ ├─ 文档分块(Chunking)
- │ │ ├─ Embedding 模型生成向量
- │ │ └─ 写入 Qdrant
- │ │
- │ ├─ FULLTEXT 索引:
- │ │ ├─ 提取纯文本内容
- │ │ ├─ 按段落/章节分块
- │ │ └─ 写入 Elasticsearch
- │ │
- │ ├─ GRAPH 索引:
- │ │ ├─ 使用 LightRAG 提取实体
- │ │ ├─ 提取实体间关系
- │ │ └─ 写入 Neo4j/PostgreSQL
- │ │
- │ ├─ SUMMARY 索引:
- │ │ ├─ 调用 LLM 生成摘要
- │ │ └─ 保存到 DocumentIndex.index_data
- │ │
- │ └─ VISION 索引:
- │ ├─ 提取图片 Assets
- │ ├─ Vision LLM 理解图片内容
- │ ├─ 生成图片描述向量
- │ └─ 写入 Qdrant
- │
- └─► 更新索引状态
- ├─ 成功:CREATING → ACTIVE
- └─ 失败:CREATING → FAILED
- │
- ▼
-4. 更新文档总体状态
- │
- ├─ 所有索引都 ACTIVE → Document.status = COMPLETE
- ├─ 任一索引 FAILED → Document.status = FAILED
- └─ 部分索引仍在处理 → Document.status = RUNNING
-```
-
-#### 5.3 文档分块(Chunking)
-
-**分块策略**:
-- 递归字符分割(RecursiveCharacterTextSplitter)
-- 按自然段落、章节优先切分
-- 保留上下文重叠(Overlap)
-
-**分块参数**:
-```json
-{
- "chunk_size": 1000, // 每块最大字符数
- "chunk_overlap": 200, // 重叠字符数
- "separators": ["\n\n", "\n", " ", ""] // 分隔符优先级
-}
-```
-
-**分块结果存储**:
-```
-{document_path}/chunks/
- ├─ chunk_0.json: {"text": "...", "metadata": {...}}
- ├─ chunk_1.json: {"text": "...", "metadata": {...}}
- └─ ...
-```
-
-## 数据库设计
-
-### 表 1: document(文档元数据)
-
-**表结构**:
-
-| 字段名 | 类型 | 说明 | 索引 |
-|--------|------|------|------|
-| `id` | String(24) | 文档 ID,主键,格式:`doc{random_id}` | PK |
-| `name` | String(1024) | 文件名 | - |
-| `user` | String(256) | 用户 ID(支持多种 IDP) | ✅ Index |
-| `collection_id` | String(24) | 所属集合 ID | ✅ Index |
-| `status` | Enum | 文档状态(见下表) | ✅ Index |
-| `size` | BigInteger | 文件大小(字节) | - |
-| `content_hash` | String(64) | SHA-256 哈希(用于去重) | ✅ Index |
-| `object_path` | Text | 对象存储路径(已废弃,用 doc_metadata) | - |
-| `doc_metadata` | Text | 文档元数据(JSON 字符串) | - |
-| `gmt_created` | DateTime(tz) | 创建时间(UTC) | - |
-| `gmt_updated` | DateTime(tz) | 更新时间(UTC) | - |
-| `gmt_deleted` | DateTime(tz) | 删除时间(软删除) | ✅ Index |
-
-**唯一约束**:
-```sql
-UNIQUE INDEX uq_document_collection_name_active
- ON document (collection_id, name)
- WHERE gmt_deleted IS NULL;
-```
-- 同一集合内,活跃文档的名称不能重复
-- 已删除的文档不参与唯一性检查
-
-**文档状态枚举**(`DocumentStatus`):
-
-| 状态 | 说明 | 何时设置 | 可见性 |
-|------|------|----------|--------|
-| `UPLOADED` | 已上传到临时存储 | `upload_document` 接口 | 前端文件选择界面 |
-| `PENDING` | 等待索引构建 | `confirm_documents` 接口 | 文档列表(处理中) |
-| `RUNNING` | 索引构建中 | Celery 任务开始处理 | 文档列表(处理中) |
-| `COMPLETE` | 所有索引完成 | 所有索引变为 ACTIVE | 文档列表(可用) |
-| `FAILED` | 索引构建失败 | 任一索引失败 | 文档列表(失败) |
-| `DELETED` | 已删除 | `delete_document` 接口 | 不可见(软删除) |
-| `EXPIRED` | 临时文档过期 | 定时清理任务 | 不可见 |
-
-**文档元数据示例**(`doc_metadata` JSON 字段):
-```json
-{
- "object_path": "user-xxx/col_xxx/doc_xxx/original.pdf",
- "converted_path": "user-xxx/col_xxx/doc_xxx/converted.pdf",
- "processed_content_path": "user-xxx/col_xxx/doc_xxx/processed_content.md",
- "images": [
- "user-xxx/col_xxx/doc_xxx/images/page_0.png",
- "user-xxx/col_xxx/doc_xxx/images/page_1.png"
- ],
- "parser_used": "MinerUParser",
- "parse_duration_ms": 5420,
- "page_count": 50,
- "custom_field": "value"
-}
-```
-
-### 表 2: document_index(索引状态管理)
-
-**表结构**:
-
-| 字段名 | 类型 | 说明 | 索引 |
-|--------|------|------|------|
-| `id` | Integer | 自增 ID,主键 | PK |
-| `document_id` | String(24) | 关联的文档 ID | ✅ Index |
-| `index_type` | Enum | 索引类型(见下表) | ✅ Index |
-| `status` | Enum | 索引状态(见下表) | ✅ Index |
-| `version` | Integer | 索引版本号 | - |
-| `observed_version` | Integer | 已处理的版本号 | - |
-| `index_data` | Text | 索引数据(JSON),如摘要内容 | - |
-| `error_message` | Text | 错误信息(失败时) | - |
-| `gmt_created` | DateTime(tz) | 创建时间 | - |
-| `gmt_updated` | DateTime(tz) | 更新时间 | - |
-| `gmt_last_reconciled` | DateTime(tz) | 最后协调时间 | - |
-
-**唯一约束**:
-```sql
-UNIQUE CONSTRAINT uq_document_index
- ON document_index (document_id, index_type);
-```
-- 每个文档的每种索引类型只有一条记录
-
-**索引类型枚举**(`DocumentIndexType`):
-
-| 类型 | 值 | 说明 | 外部存储 |
-|------|-----|------|----------|
-| `VECTOR` | "VECTOR" | 向量索引 | Qdrant / Elasticsearch |
-| `FULLTEXT` | "FULLTEXT" | 全文索引 | Elasticsearch |
-| `GRAPH` | "GRAPH" | 知识图谱 | Neo4j / PostgreSQL |
-| `SUMMARY` | "SUMMARY" | 文档摘要 | PostgreSQL (index_data) |
-| `VISION` | "VISION" | 视觉索引 | Qdrant + PostgreSQL |
-
-**索引状态枚举**(`DocumentIndexStatus`):
-
-| 状态 | 说明 | 何时设置 |
-|------|------|----------|
-| `PENDING` | 等待处理 | `confirm_documents` 创建索引记录 |
-| `CREATING` | 创建中 | Celery Worker 开始处理 |
-| `ACTIVE` | 就绪可用 | 索引构建成功 |
-| `DELETING` | 标记删除 | `delete_document` 接口 |
-| `DELETION_IN_PROGRESS` | 删除中 | Celery Worker 正在删除 |
-| `FAILED` | 失败 | 索引构建失败 |
-
-**版本控制机制**:
-- `version`:期望的索引版本(每次文档更新时 +1)
-- `observed_version`:已处理的版本号
-- `version > observed_version` 时,触发索引更新
-
-**协调器(Reconciler)**:
-```python
-# 查询需要处理的索引
-SELECT * FROM document_index
-WHERE status = 'PENDING'
- AND observed_version < version;
-
-# 处理后更新
-UPDATE document_index
-SET status = 'ACTIVE',
- observed_version = version,
- gmt_last_reconciled = NOW()
-WHERE id = ?;
-```
-
-### 表关系图
-
-```
-┌─────────────────────────────────┐
-│ collection │
-│ ───────────────────────────── │
-│ id (PK) │
-│ name │
-│ config (JSON) │
-│ status │
-│ ... │
-└────────────┬────────────────────┘
- │ 1:N
- ▼
-┌─────────────────────────────────┐
-│ document │
-│ ───────────────────────────── │
-│ id (PK) │
-│ collection_id (FK) │◄──── 唯一约束: (collection_id, name)
-│ name │
-│ user │
-│ status (Enum) │
-│ size │
-│ content_hash (SHA-256) │
-│ doc_metadata (JSON) │
-│ gmt_created │
-│ gmt_deleted │
-│ ... │
-└────────────┬────────────────────┘
- │ 1:N
- ▼
-┌─────────────────────────────────┐
-│ document_index │
-│ ───────────────────────────── │
-│ id (PK) │
-│ document_id (FK) │◄──── 唯一约束: (document_id, index_type)
-│ index_type (Enum) │
-│ status (Enum) │
-│ version │
-│ observed_version │
-│ index_data (JSON) │
-│ error_message │
-│ gmt_last_reconciled │
-│ ... │
-└─────────────────────────────────┘
-```
-
-## 状态机与生命周期
-
-### 文档状态转换
-
-```
- ┌─────────────────────────────────────────────┐
- │ │
- │ ▼
- [上传文件] ──► UPLOADED ──► [确认] ──► PENDING ──► RUNNING ──► COMPLETE
- │ │
- │ ▼
- │ FAILED
- │ │
- │ ▼
- └──────► [删除] ──────────────► DELETED
- │
- ┌───────────────────────────────────┘
- │
- ▼
- EXPIRED (定时清理未确认的文档)
-```
-
-**关键转换**:
-1. **UPLOADED → PENDING**:用户点击"保存到集合"
-2. **PENDING → RUNNING**:Celery Worker 开始处理
-3. **RUNNING → COMPLETE**:所有索引都成功
-4. **RUNNING → FAILED**:任一索引失败
-5. **任何状态 → DELETED**:用户删除文档
-
-### 索引状态转换
-
-```
- [创建索引记录] ──► PENDING ──► CREATING ──► ACTIVE
- │
- ▼
- FAILED
- │
- ▼
- ┌──────────► PENDING (重试)
- │
- [删除请求] ──────┼──────────► DELETING ──► DELETION_IN_PROGRESS ──► (记录删除)
- │
- └──────────► (直接删除记录,如果 PENDING/FAILED)
-```
-
-## 异步任务调度(Celery)
-
-### 任务定义
-
-**主任务**:`reconcile_document_indexes`
-- 触发时机:
- - `confirm_documents` 接口调用后
- - 定时任务(每 30 秒)
- - 手动触发(管理界面)
-- 功能:扫描 `document_index` 表,处理需要协调的索引
-
-**子任务**:
-- `parse_document_task`:解析文档内容
-- `create_vector_index_task`:创建向量索引
-- `create_fulltext_index_task`:创建全文索引
-- `create_graph_index_task`:创建知识图谱索引
-- `create_summary_index_task`:创建摘要索引
-- `create_vision_index_task`:创建视觉索引
-
-### 任务调度策略
-
-**并发控制**:
-- 每个 Worker 最多同时处理 N 个文档(默认 4)
-- 每个文档的多个索引可以并行构建
-- 使用 Celery 的 `task_acks_late=True` 确保任务不丢失
-
-**失败重试**:
-- 最多重试 3 次
-- 指数退避(1分钟 → 5分钟 → 15分钟)
-- 3 次失败后标记为 `FAILED`
-
-**幂等性**:
-- 所有任务支持重复执行
-- 使用 `observed_version` 机制避免重复处理
-- 相同输入产生相同输出
-
-## 设计特点与优势
-
-### 1. 两阶段提交设计
-
-**优势**:
-- ✅ **用户体验更好**:快速上传响应,不阻塞用户操作
-- ✅ **选择性添加**:批量上传后可选择性确认部分文件
-- ✅ **资源控制合理**:未确认的文档不构建索引,不消耗配额
-- ✅ **故障恢复友好**:临时文档可以定期清理,不影响业务
-
-**状态隔离**:
-```
-临时状态(UPLOADED):
- - 不计入配额
- - 不触发索引
- - 可以被自动清理
-
-正式状态(PENDING/RUNNING/COMPLETE):
- - 计入配额
- - 触发索引构建
- - 不会被自动清理
-```
-
-### 2. 幂等性设计
-
-**文件级别幂等**:
-- SHA-256 哈希去重
-- 相同文件多次上传返回同一 `document_id`
-- 避免存储空间浪费
-
-**接口级别幂等**:
-- `upload_document`:重复上传返回已存在文档
-- `confirm_documents`:重复确认不会创建重复索引
-- `delete_document`:重复删除返回成功(软删除)
-
-### 3. 多租户隔离
-
-**存储隔离**:
-```
-user-{user_A}/... # 用户 A 的文件
-user-{user_B}/... # 用户 B 的文件
-```
-
-**数据库隔离**:
-- 所有查询都带 `user` 字段过滤
-- 集合级别的权限控制(`collection.user`)
-- 软删除支持(`gmt_deleted`)
-
-### 4. 灵活的存储后端
-
-**统一接口**:
-```python
-AsyncObjectStore:
- - put(path, data)
- - get(path)
- - delete_objects_by_prefix(prefix)
-```
-
-**运行时切换**:
-- 通过环境变量切换 Local/S3
-- 无需修改业务代码
-- 支持自定义存储后端(实现接口即可)
-
-### 5. 事务一致性
-
-**数据库 + 对象存储的两阶段提交**:
-```python
-async with transaction:
- # 1. 创建数据库记录
- document = create_document_record()
-
- # 2. 上传到对象存储
- await object_store.put(path, data)
-
- # 3. 更新元数据
- document.doc_metadata = json.dumps(metadata)
-
- # 所有操作成功才提交,任一失败则回滚
-```
-
-**失败处理**:
-- 数据库记录创建失败:不上传文件
-- 文件上传失败:回滚数据库记录
-- 元数据更新失败:回滚前面的操作
-
-### 6. 可观测性
-
-**审计日志**:
-- `@audit` 装饰器记录所有文档操作
-- 包含:用户、时间、操作类型、资源 ID
-
-**任务追踪**:
-- `gmt_last_reconciled`:最后处理时间
-- `error_message`:失败原因
-- Celery 任务 ID:关联日志追踪
-
-**监控指标**:
-- 文档上传速率
-- 索引构建耗时
-- 失败率统计
-
-## 性能优化
-
-### 1. 异步处理
-
-**上传不阻塞**:
-- 文件上传到对象存储后立即返回
-- 索引构建在 Celery 中异步执行
-- 前端通过轮询或 WebSocket 获取进度
-
-### 2. 批量操作
-
-**批量确认**:
-```python
-confirm_documents(document_ids=[id1, id2, ..., idN])
-```
-- 一次事务处理多个文档
-- 批量创建索引记录
-- 减少数据库往返
-
-### 3. 缓存策略
-
-**解析结果缓存**:
-- 解析后的内容保存到 `processed_content.md`
-- 后续索引重建可直接读取,无需重新解析
-
-**分块结果缓存**:
-- 分块结果保存到 `chunks/` 目录
-- 向量索引重建可复用分块结果
-
-### 4. 并行索引构建
-
-**多索引并行**:
-```python
-# VECTOR、FULLTEXT、GRAPH 可以并行构建
-await asyncio.gather(
- create_vector_index(),
- create_fulltext_index(),
- create_graph_index()
-)
-```
-
-## 错误处理
-
-### 常见异常
-
-| 异常类型 | HTTP 状态码 | 触发场景 | 处理建议 |
-|---------|------------|----------|----------|
-| `ResourceNotFoundException` | 404 | 集合/文档不存在 | 检查 ID 是否正确 |
-| `CollectionInactiveException` | 400 | 集合未激活 | 等待集合初始化完成 |
-| `DocumentNameConflictException` | 409 | 同名不同内容 | 重命名文件或删除旧文档 |
-| `QuotaExceededException` | 429 | 配额超限 | 升级套餐或删除旧文档 |
-| `InvalidFileTypeException` | 400 | 不支持的文件类型 | 查看支持的文件类型列表 |
-| `FileSizeTooLargeException` | 413 | 文件过大 | 分割文件或压缩 |
-
-### 异常传播
-
-```
-Service Layer 抛出异常
- │
- ▼
-View Layer 捕获并转换
- │
- ▼
-Exception Handler 统一处理
- │
- ▼
-返回标准 JSON 响应:
-{
- "error_code": "QUOTA_EXCEEDED",
- "message": "Document count limit exceeded",
- "details": {
- "limit": 1000,
- "current": 1000
- }
-}
-```
-
-## 相关文件索引
-
-### 核心实现
-
-- **View 层**:`aperag/views/collections.py` - HTTP 接口定义
-- **Service 层**:`aperag/service/document_service.py` - 业务逻辑
-- **数据库模型**:`aperag/db/models.py` - Document, DocumentIndex 表定义
-- **数据库操作**:`aperag/db/ops.py` - CRUD 操作封装
-
-### 对象存储
-
-- **接口定义**:`aperag/objectstore/base.py` - AsyncObjectStore 抽象类
-- **Local 实现**:`aperag/objectstore/local.py` - 本地文件系统存储
-- **S3 实现**:`aperag/objectstore/s3.py` - S3 兼容存储
-
-### 文档解析
-
-- **主控制器**:`aperag/docparser/doc_parser.py` - DocParser
-- **Parser 实现**:
- - `aperag/docparser/mineru_parser.py` - MinerU PDF 解析
- - `aperag/docparser/mineru_parser.py` - MinerU 文档解析
- - `aperag/docparser/markitdown_parser.py` - MarkItDown 通用解析
- - `aperag/docparser/image_parser.py` - 图片 OCR
- - `aperag/docparser/audio_parser.py` - 音频转录
-- **文档处理**:`aperag/index/document_parser.py` - 解析流程编排
-
-### 索引构建
-
-- **索引管理**:`aperag/index/manager.py` - DocumentIndexManager
-- **向量索引**:`aperag/index/vector_index.py` - VectorIndexer
-- **全文索引**:`aperag/index/fulltext_index.py` - FulltextIndexer
-- **知识图谱**:`aperag/index/graph_index.py` - GraphIndexer
-- **文档摘要**:`aperag/index/summary_index.py` - SummaryIndexer
-- **视觉索引**:`aperag/index/vision_index.py` - VisionIndexer
-
-### 任务调度
-
-- **任务定义**:`config/celery_tasks.py` - Celery 任务注册
-- **协调器**:`aperag/tasks/reconciler.py` - DocumentIndexReconciler
-- **文档任务**:`aperag/tasks/document.py` - DocumentIndexTask
-
-### 前端实现
-
-- **文档列表**:`web/src/app/workspace/collections/[collectionId]/documents/page.tsx`
-- **文档上传**:`web/src/app/workspace/collections/[collectionId]/documents/upload/document-upload.tsx`
-
-## 总结
-
-ApeRAG 的文档上传模块采用**两阶段提交 + 多 Parser 链式调用 + 多索引并行构建**的架构设计:
-
-**核心特性**:
-1. ✅ **两阶段提交**:上传(临时存储)→ 确认(正式添加),提供更好的用户体验
-2. ✅ **SHA-256 去重**:避免重复文档,支持幂等上传
-3. ✅ **灵活存储后端**:Local/S3 可配置切换,统一接口抽象
-4. ✅ **多 Parser 架构**:支持 MinerU、MarkItDown 等多种解析器
-5. ✅ **格式自动转换**:PDF→图片、音频→文本、图片→OCR 文本
-6. ✅ **多索引协调**:向量、全文、图谱、摘要、视觉五种索引类型
-7. ✅ **配额管理**:确认阶段才扣除配额,合理控制资源
-8. ✅ **异步处理**:Celery 任务队列,不阻塞用户操作
-9. ✅ **事务一致性**:数据库 + 对象存储的两阶段提交
-10. ✅ **可观测性**:审计日志、任务追踪、错误信息完整记录
-
-这种设计既保证了高性能和可扩展性,又支持复杂的文档处理场景(多格式、多语言、多模态),同时具有良好的容错能力和用户体验。
diff --git a/docs/zh-CN/design/evaluation-design.md b/docs/zh-CN/design/evaluation-design.md
deleted file mode 100644
index 0e28f4124..000000000
--- a/docs/zh-CN/design/evaluation-design.md
+++ /dev/null
@@ -1,156 +0,0 @@
-# ApeRAG 自动化评估功能设计文档
-
-> 本文是 `#20` **Evaluation v3 simplification** 的技术设计纲要,对齐当前主线。
->
-> 如果你要了解产品视角的使用说明,请优先阅读:
-> [Evaluation 当前产品状态与使用说明](../reference/evaluation-current-guide.md)。
->
-> 早期的 Benchmark / Dataset Version / Question Set 设计稿已经**作废**,保留的只是
-> 为了理解历史迁移;所有实现细节以本文件 + `#20` merge 后的 main 为准。
-
-## 1. 目标
-
-让 Evaluation 变成一条"创建数据集 → 录入问题 → 点一下发起评测"的三步流程,删除
-用户视角中的 `Benchmark / Dataset Version / Publish / Question Set / 选 Bot` 等
-全部非必要概念。后端保留最小的 Dataset + Run + RunItem + Attempt 四个对象。
-
-## 2. 非目标
-
-- 不做 judge / scoring 最终实现(PR-C polish)。
-- 不做自动 QA 生成(留 generated `source_type` 占位)。
-- 不做跨 Collection 的 dataset 引用(`collection_id` 仅用作 scope/filter)。
-
-## 3. 数据模型(与 `#1588` / `#1590` migration 一致)
-
-### 3.1 `evaluation_datasets`
-
-| 字段 | 说明 |
-| ---- | ---- |
-| `id` | PK |
-| `user_id` | 所有者(硬过滤权限边界) |
-| `collection_id?` | 关联 Collection 仅做 scope,不继承 sharing |
-| `name` / `description` | 文本字段 |
-| `source_type` | `manual` / `import` / `generated` |
-| `schema_hint?` | JSON 提示 |
-| `item_count` | 冗余计数,便于前端显示和 Start Run 的 gating |
-| 审计字段 | `created_at / updated_at / deleted_at` |
-
-### 3.2 `evaluation_dataset_items`
-
-| 字段 | 说明 |
-| ---- | ---- |
-| `id` | PK |
-| `dataset_id` | FK -> `evaluation_datasets.id`, `ON DELETE RESTRICT` |
-| `case_key` | 稳定键,留空由后端生成 |
-| `input_message` | 必填 |
-| `expected_answer?` / `reference_context?` | 可选 |
-| `tags?` / `case_metadata?` / `sort_key` | 辅助字段 |
-
-### 3.3 `evaluation_runs`
-
-| 字段 | 说明 |
-| ---- | ---- |
-| `id` / `user_id` | 基本 |
-| `bot_id` | resolved bot id;显式传入或按 default-bot 解析 |
-| `dataset_id` | 快照时的 dataset,不跟随 dataset rename/delete |
-| `collection_id?` | 冗余过滤用 |
-| `dataset_name?` | snapshot,dataset 被删仍可读 |
-| `name?` | 运行名称 |
-| `status` | `queued / running / completed / failed / cancelled` |
-| `summary?` | JSON `{total,pending,running,completed,failed,cancelled,avg_score?}` |
-| `judge_config?` / `bot_config_snapshot?` / `model_config_snapshot?` | 快照 |
-| `error?` / `created_at / updated_at / started_at? / finished_at?` | 基本 |
-
-### 3.4 `evaluation_run_items`
-
-Run item 是 dataset item 在**创建 run 时的 value-copy 快照**。运行期间不回读
-mutable `evaluation_dataset_items`。字段(节选):
-
-- `source_dataset_item_id?`:字符串指针,**不是** FK;用来做可追溯。
-- `case_key / sort_key / input_message / expected_answer? / reference_context? / tags? / case_metadata?`:全部快照。
-- `status`:`pending / running / completed / failed / cancelled`。
-- `best_score? / latest_attempt_id? / latest_attempt? / attempt_count / error?`:执行态。
-
-### 3.5 `evaluation_run_item_attempts`
-
-单次调用记录:`attempt_no`、`agent_chat_id? / agent_turn_id?` 字符串指针(不升级为 FK)、
-`answer_text?`、`judge_result?`、`score?`、`latency_ms?`、`token_usage?`、`error?`、
-`retry_reason?`、时间戳。
-
-## 4. 公开 API(`/api/v2/*`,`openapi.public.json` 唯一真源)
-
-```
-GET /api/v2/evaluation-datasets ?collection_id&page&page_size
-POST /api/v2/evaluation-datasets
-GET /api/v2/evaluation-datasets/{dataset_id}
-PUT /api/v2/evaluation-datasets/{dataset_id}
-DELETE /api/v2/evaluation-datasets/{dataset_id}
-GET /api/v2/evaluation-datasets/{dataset_id}/items ?page&page_size
-POST /api/v2/evaluation-datasets/{dataset_id}/items
-PUT /api/v2/evaluation-datasets/{dataset_id}/items/{item_id}
-DELETE /api/v2/evaluation-datasets/{dataset_id}/items/{item_id}
-
-GET /api/v2/evaluation-runs ?collection_id&bot_id&dataset_id&page&page_size
-POST /api/v2/evaluation-runs
-GET /api/v2/evaluation-runs/{run_id}
-GET /api/v2/evaluation-runs/{run_id}/items ?page&page_size
-POST /api/v2/evaluation-runs/{run_id}/cancel
-POST /api/v2/evaluation-runs/{run_id}/items/{item_id}/retry
-GET /api/v2/evaluation-runs/{run_id}/items/{item_id}/attempts
-```
-
-`/api/v2/benchmark-datasets*` + `/versions*` 以及 `dataset_version_id` 字段已在
-`#1590` destructive migration 中一次性拆掉,不再提供。
-
-## 5. Default Bot 解析
-
-`EvaluationRunCreate.bot_id` 可选,缺省时:
-
-1. 选择当前用户下 `active=true` 且标题为 `Default Agent Bot` 的 bot。
-2. 若上一步无结果,退到 `gmt_created ASC` 的最早 active bot。
-3. 若仍无,返回可被 FE 识别的错误文案,FE 替换为"当前没有可用于评测的 Bot,请先创建 Bot 或联系管理员"(见 msg=38d7e74d UX 补丁 G)。
-
-`DEFAULT_AGENT_BOT_TITLE` 放在服务层常量,用单测固定顺序和 active/soft-delete 条件。
-
-## 6. Runtime(PR-1b 边界,落在 `#20 PR-1b`)
-
-- `launch_run()` 触发 Celery task,不同步 no-op 不直接 mark completed。
-- `run_evaluation_run(run_id)` 从 `evaluation_run_items` snapshot 读取,**不回读** `evaluation_dataset_items`。
-- 每 item 通过 `agent_runtime.runtime.agent_runtime_manager` 派发 turn,不走 HTTP bot route。
-- 写 `evaluation_run_item_attempts`,状态机 `PENDING → RUNNING → COMPLETED / FAILED`。
-- 增量更新 `evaluation_runs.summary`。
-- 不做 judge scoring / best_score / complex retry(PR-C polish)。
-- focused test:成功 + 失败 + snapshot-only read 断言 + 不引 `dataset_version_id / benchmark_*`;mock seam 放在 `agent_runtime_manager.dispatch_turn`,不 re-assert `#13` chat persistence 层。
-
-## 7. 前端(PR-2 本 PR 落地)
-
-- 单入口:`/workspace/collections/{collectionId}/evaluations`,Datasets section + Runs section。
-- 子入口:`/workspace/collections/{collectionId}/evaluations/datasets/{datasetId}` 管理 dataset items。
-- Run 详情:
- - `/workspace/collections/{collectionId}/evaluations/{runId}`(默认入口)
- - `/workspace/bots/{botId}/evaluation/runs/{runId}`(deep link,由 trace 链接跳入)
-- Bot 页 `/workspace/bots/{botId}/evaluation` 退化为**只读历史列表**,不再提供 `Create run` 入口,不再有 `dataset_version_id` / Bot 选择输入。
-- FE 只消费 `/api/v2/evaluation-*` 和已经完成迁移的 collection/document typed adapter,不再触 `@/api` 老 SDK。
-- typed feature adapter:`web/src/features/evaluation/{types,client-api,server-api}.ts`。
-- i18n:`page_collection_evaluations` (new namespace) + 清理后的 `page_bot_evaluation`;`page_benchmarks` 整 namespace 删除,`global.ts` typed `Messages` 同步。
-- `Start Evaluation` 按钮在 dataset item 数 = 0 时置灰(msg=38d7e74d 补丁 F)。
-
-## 8. 测试
-
-- `tests/unit_test/test_web_typed_api_contract.py::test_evaluation_feature_uses_v2_typed_api_boundary`:正向钉新路径 + 负向钉 0 条 benchmark/dataset_version_id/老 SDK。
-- `tests/unit_test/test_evaluation_v2_openapi_contract.py`(PR-1 落地):OpenAPI spec 层负向钉 benchmark 路径/字段。
-- `tests/e2e_http/hurl/full/16_evaluation_v2.hurl`:端到端覆盖 dataset CRUD → items append → run create(含 bot_id 显式 + default bot 两条)→ run detail → run items → cancel。
-- `#20 PR-1b` 补 runtime focused pytest(PR-1b 范围)。
-
-## 9. 迁移策略
-
-- `#1588`(PR-0)additive foundation:新建 `evaluation_datasets / items`,不动旧 benchmark 表。
-- `#1590`(PR-1)destructive switch:drop 旧 `benchmark_*` + 拆 `evaluation_runs.dataset_version_id`,切公开 API 到新路径。
-- `#20 PR-1b` runtime minimal。
-- `#20 PR-2` FE + docs + hurl(本 PR)。
-
-## 10. 非 scope / 历史约束
-
-- 不在本设计中重新规划 Question Set / Benchmark。这两个概念作为用户可见对象已在 `#1590` 移除。
-- 不改 `agent_turn / turn_feedback`(是 `#13` 的 schema 域)。
-- `collection_id` 只是 scope metadata,不耦合 document upload/indexing 状态机;也不回引 `/api/v1/collections*` 或旧 generated SDK。
diff --git a/docs/zh-CN/design/graph_db_abstraction.md b/docs/zh-CN/design/graph_db_abstraction.md
deleted file mode 100644
index 4019215e7..000000000
--- a/docs/zh-CN/design/graph_db_abstraction.md
+++ /dev/null
@@ -1,619 +0,0 @@
-# 图数据库抽象层设计与计划(ApeRAG)
-
-> Status: **设计与计划文档(仅文档,无代码改动)**。落地分 M1 / M2 / M3
-> 多个小 PR,详见 §8 路线图。
->
-> **先读这个**:[`lightrag_refactor.md`](./lightrag_refactor.md)。本文
-> 的 **Layer B** == 那份文档的 **Phase 1 facade**。实际推荐的落地顺序
-> 是"先做 LightRAG facade(覆盖了本文的 M1 + M2),再按需清理内部存储
-> 抽象(本文的 M3)"。两份文档**刻意拆成两份**是为了分别从两个角度回答
-> 同一组问题,读顺序见 `lightrag_refactor.md` §7。
->
-> 关联文档:[`vector_db_abstraction.md`](./vector_db_abstraction.md)。
-> 向量抽象层的设计思路与本文保持一致,优先"最小可行、不留尾巴、不过
-> 度设计"。
-
----
-
-## 1. 背景与目标
-
-### 1.1 为什么现在做这件事
-
-ApeRAG 的知识图谱(KG)路径目前**完全由内嵌的 LightRAG 实现**承担:
-三种图后端(`PGOpsSyncGraphStorage` / `Neo4JSyncStorage` /
-`NebulaSyncStorage`)并存,通过 `GRAPH_INDEX_GRAPH_STORAGE` env 切换。
-"支持多后端" 这件事**技术上已经可用**,问题出在:
-
-- 业务代码(`graph_service.py` / `search_pipeline_service._graph_search`
- / `tasks/collection.py::_delete_lightrag`)直接调用 `create_lightrag_instance(...)`
- + `rag.X(...)`——LightRAG 的 API 细节泄漏到业务层。
-- `LightRAG` 实例是 **per-request 构造 + `try/finally finalize_storages()`**
- 的,每次调用 API 走一遍完整的 storage 初始化/拆除流程。路径稠密后,
- 一个"查询图"API 就是 5 个子 storage 的冷启动。
-- 已经踩到过第一个真实 bug:`search_pipeline_service._graph_search`
- **创建了 LightRAG 实例但从来没 `finalize_storages()`**——见 §3.R1。
-- 未来计划把 lightrag **从 "内嵌 Python 对象" 改成 "web service"**;
- 到时候 `create_lightrag_instance` 要变成 "连接一个服务并发 RPC"。如果
- 现在业务层直接依赖 LightRAG,迁移时整个 ApeRAG 侧代码都要跟着改。
-
-### 1.2 本文档的目标
-
-**指导下一阶段的图数据库抽象改造**。具体来说:
-
-1. 以**事实**清单的形式把现状固定下来,后面做任何改动都有对照。
-2. **勘定抽象层的边界**:哪些抽象该做在 ApeRAG 侧,哪些该做在(未来的)
- lightrag 服务内部——两个边界不能混。
-3. 列出**顺手能修的 code review 问题**,避免抽象落地时被它们污染。
-4. 给出一个**分阶段的路线图**(M1/M2/M3),每个阶段都能独立上线、独立
- 回滚,不留代码债。
-5. 明确**反过度设计**:哪些"看起来该抽象"的地方,实际上**不应该**动。
-
-### 1.3 非目标
-
-- 本文**不提议**重写 LightRAG 的 `BaseGraphStorage` 或其 25 个跨后端
- 一致性测试。现有实现是可用的,重写成本远高于收益。
-- 本文**不提议**把 entity / relation 向量合并进
- [`aperag/vectorstore`](./vector_db_abstraction.md) 的 shard。那件事属
- 于 lightrag 内部重构,且 lightrag 要搬家,详见 §7.4。
-- 本文**不提议**替换 Neo4j / Nebula / pg-emulated 三种后端中任意一种。
- 这三个已经各有各的使用场景(见 §6 能力矩阵)。
-
----
-
-## 2. 现状事实清单
-
-所有引用基于 2026-04-22 的代码快照。
-
-### 2.1 入口与生命周期
-
-- **唯一工厂**:`aperag/graph/lightrag_manager.py::create_lightrag_instance(collection)`
- (约 59–124 行)。`LightRAG` 实例**不缓存**,每次调用重新构造 +
- `await rag.initialize_storages()`。
-- **workspace = collection.id**:LightRAG 的 "workspace" 绑定 ApeRAG
- collection id,实现按 collection 的逻辑隔离。
-- **`finalize_storages()` 的触发**:
- - `_process_document_async` / `_delete_document_async` 的 `finally`
- (约 215、227 行);
- - `graph_service.py` 中 5 处 handler 的 `try/finally`(约 43-48、85-118、
- 287-296、413-425、449-454 行);
- - `tasks/collection.py::_delete_lightrag`(约 193-194 行)。
-- **漏写 `finalize_storages` 的地方**:`search_pipeline_service._graph_search`
- (约 265-273 行)——见 §3.R1。
-
-### 2.2 三种图后端
-
-`aperag/graph/lightrag/kg/__init__.py::STORAGES`(42–48 行)注册 5 个类:
-
-| 类型 | 类名 | 后端 | 文件 |
-|---|---|---|---|
-| GRAPH | `PGOpsSyncGraphStorage` | PostgreSQL(模拟图,表:`lightrag_graph_nodes` / `lightrag_graph_edges`) | `kg/pg_ops_sync_graph_storage.py` |
-| GRAPH | `Neo4JSyncStorage` | Neo4j 原生图,Cypher | `kg/neo4j_sync_impl.py` |
-| GRAPH | `NebulaSyncStorage` | NebulaGraph 原生图,nGQL | `kg/nebula_sync_impl.py` |
-| KV | `PGOpsSyncKVStorage` | PostgreSQL(分 namespace:text_chunks / llm_cache / doc_status / ...) | `kg/pg_ops_sync_kv_storage.py` |
-| VECTOR | `PGOpsSyncVectorStorage` | PostgreSQL + pgvector(分 namespace:entities / relationships / chunks) | `kg/pg_ops_sync_vector_storage.py` |
-
-可通过 env 任意组合,但实际部署常见的是 `(PGOpsSyncKVStorage,
-PGOpsSyncVectorStorage, X)`,其中 `X` 根据规模选 PG / Neo4j / Nebula。
-
-### 2.3 已经存在的抽象基类
-
-- `aperag/graph/lightrag/base.py::StorageNameSpace`(约 128-172 行):
- `initialize / finalize / drop` 三方法抽象。
-- `BaseVectorStorage`(约 175-251 行):8 个抽象方法。
-- `BaseKVStorage`(约 254-292 行):5 个抽象方法。
-- `BaseGraphStorage`(约 295-606 行):**13 个必须实现 + 11 个带默认实现
- 的 batch/扩展方法**,共 24 个接口点。
-
-### 2.4 跨后端一致性测试
-
-`tests/integration/graphstorage/test_graph_storage.py::GraphStorageTestSuite`
-定义 **25 个测试方法**,覆盖 `has_node / get_node / node_degree /
-upsert_node / delete_node / has_edge / get_edge / get_nodes_batch /
-edge_degrees_batch / data_integrity / large_batch_operations / ...`。
-
-- 每个后端(`test_postgres_graph_storage.py` / `test_neo4j_storage.py`
- / `test_nebula_storage.py`)把 `Oracle` 实例化后喂给同一个 suite。
-- 这是"**事实上的契约**"——任何后端都要过同一套 25 个测试才算实现正确。
-- 本文档不提议替换这个模式;它就是跨后端等价性的最佳形式。
-
-### 2.5 连接管理
-
-- **Neo4j**:`aperag/db/neo4j_sync_manager.py::Neo4jSyncConnectionManager`
- (约 28-44 行)——class-level lazy singleton + `threading.Lock`,driver
- 进程级复用。
-- **Nebula**:`aperag/db/nebula_sync_manager.py` 同结构。
-- **PG**:不需要独立管理器,直接走 `aperag/config.py` 的 `sync_engine`。
-
-**结论**:连接池本身**不是**当前的瓶颈。瓶颈在 LightRAG 级别的 "per-request
-构造 storage 对象 + initialize + finalize" 开销。
-
-### 2.6 业务层对 LightRAG 的调用面
-
-按功能归类:
-
-| 业务动作 | LightRAG 方法 | 调用点 |
-|---|---|---|
-| 索引文档 | `rag.ainsert_and_chunk_document` + `rag.aprocess_graph_indexing` | `lightrag_manager._process_document_async` |
-| 删除文档图数据 | `rag.adelete_by_doc_id` | 同上 + `tasks/collection.py` |
-| 图检索(for RAG) | `rag.aquery_context` | `search_pipeline_service._graph_search` |
-| 查标签列表(for UI) | `rag.get_graph_labels` | `graph_service.get_graph_labels` |
-| 查子图(for UI) | `rag.get_knowledge_graph` | `graph_service.get_knowledge_graph` |
-| 生成合并建议 | `rag.agenerate_merge_suggestions` | `graph_service.generate_merge_suggestions` |
-| 合并节点 | `rag.amerge_nodes` | `graph_service._execute_merge_operation` |
-| 导出 KG 评测数据 | `rag.export_for_kg_eval` | `graph_service.export_for_kg_eval` |
-
-**8 个稳定的业务动作**。这就是"图索引服务"的天然外表面(见 §5.2)。
-
----
-
-## 3. Code review:顺手发现的问题
-
-按优先级排序。R1/R2/R3 建议纳入 M1 小 PR;其余分到 M2/M3 做。
-
-### R1. `_graph_search` 缺 `finalize_storages()` — 资源泄漏 🔴
-
-`aperag/service/search_pipeline_service.py` 约 265-273 行:
-
-```python
-rag = await lightrag_manager.create_lightrag_instance(collection)
-param = QueryParam(mode="hybrid", only_need_context=True, top_k=top_k)
-context = await rag.aquery_context(query=query, param=param)
-if not context:
- return []
-return [DocumentWithScore(text=context, metadata={"recall_type": "graph_search"})]
-```
-
-对照 `graph_service.py` 同类 handler 的 `try/finally` 模式,这里漏了
-`finalize_storages()`。每次图检索查询都会:
-
-- 构造 5 个子 storage 对象(`text_chunks_kv` / `llm_cache_kv` /
- `entities_vdb` / `relationships_vdb` / `chunks_vdb` / `chunk_entity_relation_graph`),
-- 调用它们的 `initialize()`(对 PG 实现是打日志;对 Neo4j / Nebula 会
- 触发 `prepare_database` / `prepare_space`),
-- **不调用** `finalize()` 直接 GC。
-
-短期影响有限(storage 对象 drop 时会被 GC),但这是**唯一的不对称调用
-点**,纳入 M1 修正。
-
-### R2. `LightRAG` dataclass 默认 graph_storage 与 env 不一致 🟡
-
-`aperag/graph/lightrag/lightrag.py` 约 112 行:
-
-```python
-graph_storage: str = field(default="Neo4JSyncStorage")
-```
-
-而 `envs/env.template` 的默认是 `PGOpsSyncGraphStorage`。`create_lightrag_instance`
-显式从 env 读并传入,所以生产路径没问题,但**任何绕开 manager 直接
-`LightRAG(...)` 的代码**(少数测试 / 工具)会默认连 Neo4j,出错信息
-不直观。
-
-建议:M1 把 dataclass 默认值改成 `PGOpsSyncGraphStorage`,并在 docstring
-注明"推荐总是通过 `create_lightrag_instance` 构造"。
-
-### R3. `_configure_storage_backends` 引用已废弃的类名 🟡
-
-`aperag/graph/lightrag_manager.py` 约 329-335 行:
-
-```python
-using_pg = any([
- kv_storage in ["PGKVStorage", "PGSyncKVStorage", "PGOpsSyncKVStorage"],
- vector_storage in ["PGVectorStorage", "PGSyncVectorStorage", "PGOpsSyncVectorStorage"],
- graph_storage == "PGGraphStorage", # <- 此类已不在 STORAGES 注册表里
-])
-```
-
-`STORAGES` 里只有 `PGOpsSyncGraphStorage` / `PGOpsSyncKVStorage` /
-`PGOpsSyncVectorStorage` 三个 PG 实现。`PGKVStorage` / `PGSyncKVStorage`
-/ `PGGraphStorage` 等是**历史遗留**,现在的任何配置都不会用到它们,但
-这段代码会迷惑未来读代码的人("是不是还有别的类我没看到?")。
-
-建议:M1 把分支收窄到 `"PGOpsSyncKVStorage"` / `"PGOpsSyncVectorStorage"`,
-graph 分支整个删掉——因为 graph 即便是 PG 实现也不需要额外 env 检查
-(KV/Vector 已经触发了)。
-
-### R4. 接口面过大:`BaseGraphStorage` 24 个方法 🟡
-
-24 个方法可以自然分成三层:
-
-| 层 | 方法数 | 举例 | 必要性 |
-|---|---|---|---|
-| **核心** | 13 | `has_node` / `get_node` / `upsert_node` / `upsert_edge` / `delete_node` / `get_knowledge_graph` / ... | 每个后端**必须**实现 |
-| **批量** | 8 | `get_nodes_batch` / `node_degrees_batch` / `edge_degrees_batch` / ... | 有**默认实现**(N 次串行调用),后端为了性能**应该**覆盖 |
-| **UI 扩展** | 5 | `get_top_degree_nodes` / `get_node_ids` / `search_node_ids_by_label` / `get_nodes_by_source_ids` / `get_edges_by_source_ids` | 默认返回 `None`,调用方需要自己兜底;PG 实现了、Neo4j/Nebula 没有 |
-
-第三层是隐患:调用方(一般是 export / UI)在某些后端会拿到 `None`
-而在另一些会拿到数据,**体验不一致且难以发现**。
-
-建议:M3 做层次划分,把第三层改为**显式**的 `NotImplementedError` 且在
-`GraphIndexService` 里做"能力探测"(`supports_top_degree_nodes()`)。
-
-### R5. LightRAG 的 `BaseVectorStorage` 与 ApeRAG 的 `aperag/vectorstore` 未打通 🔵
-
-LightRAG 内部 entity / relation / chunk 向量走它自己的
-`BaseVectorStorage`(目前只有 `PGOpsSyncVectorStorage` 实现)。ApeRAG
-主向量存储(存文档 chunk)走 [`aperag/vectorstore`](./vector_db_abstraction.md)
-抽象层(Qdrant / pgvector)。
-
-**两套向量系统并存**,各自有自己的 tenant 约束、分片策略、升级节奏。
-功能上互不影响,但运维视角不太好——"我这个部署里的向量到底在哪几个地方?"
-需要翻文档回答。
-
-**不建议本次修**:lightrag 要搬家,改造成本会被浪费。纳入 §7.4 的未来规划。
-
-### R6. PGOps* 实现普遍用 `asyncio.to_thread` 包同步 SQLAlchemy 🔵
-
-每次 `upsert_node` / `get_node` / `has_edge` 等单次调用都通过
-`asyncio.to_thread(...)` 切到线程池。N 次这样的调用等于 N 次 thread pool
-调度开销。
-
-已有 `node_degrees_batch` 等 batch 方法走的是**应用层组装**,底下还是
-串行单次 `to_thread`。
-
-改法:把 `GraphRepositoryMixin` 的核心方法改成 `async` + `AsyncSession`,
-去掉 `asyncio.to_thread` 包装。好处明显但工作量不小,且 lightrag 搬家
-时这段代码会搬走——所以**不建议本次修**。
-
-### R7. `search_pipeline_service._graph_search` 与 `graph_service` 有重复初始化 🔵
-
-一次"用户发起查询"包含多条子路径(vector / graph / fulltext /
-summary),每条路径如果启用 graph 都可能独立 `create_lightrag_instance`。
-如果同一请求内多次触发 graph 相关动作,就是 N 次完整的 LightRAG 初始化。
-
-改法:**request-scoped cache**——在 FastAPI 的 per-request context 里
-缓存 `rag` 实例,请求结束时统一 finalize。这是干净的并发模式,可以
-减少大量 cold-start 开销。
-
-纳入 M2(引入 `GraphIndexService` 时自然一并做掉)。
-
----
-
-## 4. 未来约束:lightrag 改 web service 形态
-
-用户明确说过未来会做:"把 lightrag 改成更贴近 web service 的形态,而不
-是现在的内置一个 lightrag 对象"。本节把这件事的约束写清楚,让本次抽象
-层不被提前废掉。
-
-### 4.1 未来的部署形态(假设)
-
-```
-┌──────────┐ HTTP/gRPC ┌──────────────────┐
-│ ApeRAG │ ────────────► │ LightRAG svc │
-│ │ │ │
-└──────────┘ │ ┌────────────┐ │
- │ │ BaseGraphS │──┼──► PG / Neo4j / Nebula
- │ └────────────┘ │
- │ ┌────────────┐ │
- │ │ BaseKVStor │──┼──► PostgreSQL
- │ └────────────┘ │
- │ ┌────────────┐ │
- │ │ BaseVectSt │──┼──► pgvector
- │ └────────────┘ │
- └──────────────────┘
-```
-
-### 4.2 这意味着什么
-
-- **图存储抽象(`BaseGraphStorage`)的归宿是 lightrag 服务内部**。搬家
- 那天,`aperag/graph/lightrag/kg/*` 整个目录会搬走。ApeRAG 侧不再
- 直接接触 Neo4j / Nebula / pg-emulated 的 SDK。
-- **ApeRAG 侧需要的是"对 lightrag 服务的抽象"**:一个可以切换 embedded
- 与 remote 实现的接口。这是 §5.2 说的 Layer B。
-- **生命周期简化**:`initialize_storages` / `finalize_storages` 变成
- lightrag 服务内部事——客户端只需要管理 HTTP 连接(这是标准事务)。
-- **延迟增加**:原本 in-process 的 `rag.get_knowledge_graph(...)` 变成
- RPC。对频繁调用路径要有客户端缓存(batch / dedup / memoize)。
-
-### 4.3 对本次抽象的启示
-
-**Layer A(图存储后端层)本次不动;Layer B(图索引服务层)本次做。**
-
-- Layer A:`BaseGraphStorage` + 三种实现 + 25 测试套件——让它原样跟 lightrag
- 一起走。本次只做 §3 的 R1-R3 清洁工作。
-- Layer B:`GraphIndexService` 是 ApeRAG 对 "图索引能力" 的**抽象**。
- 今天的实现是 `EmbeddedGraphIndexService`(调 `create_lightrag_instance`
- + `rag.X`);搬家那天改成 `RemoteGraphIndexService`(HTTP 客户端),
- 业务代码一行不用改。
-
----
-
-## 5. 建议的抽象(两层)
-
-### 5.1 分层思想
-
-| 层 | 接口代号 | 所在位置 | 归宿 |
-|---|---|---|---|
-| Layer A | `BaseGraphStorage` / `BaseKVStorage` / `BaseVectorStorage` | `aperag/graph/lightrag/base.py`(现有) | **lightrag 搬家时一起走** |
-| Layer B | `GraphIndexService`(新) | `aperag/graph/service.py`(新增) | **ApeRAG 自己的资产** |
-
-### 5.2 Layer B 接口草案(核心增量)
-
-```python
-# aperag/graph/service.py
-
-from typing import Protocol, Sequence
-from aperag.db.models import Collection
-from aperag.graph.dto import (
- KnowledgeGraph, GraphLabels, MergeSuggestion, MergedNode,
- IndexDocumentResult, DeleteDocumentResult, GraphContext,
- KGEvalExport,
-)
-
-
-class GraphIndexService(Protocol):
- """ApeRAG's business-facing contract for knowledge-graph operations.
-
- Any 'graph engine' (today: embedded LightRAG; tomorrow: remote
- LightRAG service; day-after: something else entirely) implements this.
- Business code (`graph_service.py`, search pipeline, collection tasks)
- depends ONLY on this Protocol.
- """
-
- # ---- write ----
- async def index_document(
- self, collection: Collection, doc_id: str,
- content: str, file_path: str,
- ) -> IndexDocumentResult: ...
-
- async def delete_document(
- self, collection: Collection, doc_id: str,
- ) -> DeleteDocumentResult: ...
-
- # ---- read ----
- async def query_context(
- self, collection: Collection, query: str, top_k: int,
- ) -> GraphContext: ...
-
- async def get_labels(
- self, collection: Collection,
- ) -> GraphLabels: ...
-
- async def get_knowledge_graph(
- self, collection: Collection,
- label: str | None, max_depth: int, max_nodes: int,
- ) -> KnowledgeGraph: ...
-
- # ---- curation ----
- async def generate_merge_suggestions(
- self, collection: Collection, top_k: int,
- ) -> Sequence[MergeSuggestion]: ...
-
- async def merge_nodes(
- self, collection: Collection, source_ids: Sequence[str], target_id: str,
- ) -> MergedNode: ...
-
- # ---- export ----
- async def export_for_kg_eval(
- self, collection: Collection,
- ) -> KGEvalExport: ...
-```
-
-**9 个方法,全 DTO 化**。对应 §2.6 的 8 个稳定业务动作(`index_document`
-拆了 insert+process_graph 两步为一步)。
-
-### 5.3 两个实现
-
-**`EmbeddedGraphIndexService`**(今天的默认实现):
-
-- 内部还是 `await create_lightrag_instance(collection)` + `try/finally finalize_storages()`;
-- 但业务代码**不再 import `lightrag_manager`**,只依赖 `GraphIndexService`
- Protocol;
-- request-scoped `rag` cache(见 §3.R7)可以封在这里。
-
-**`RemoteGraphIndexService`**(lightrag 搬家时启用):
-
-- HTTP / gRPC 客户端,单例进程级持有连接池;
-- 对接"lightrag 服务"的 OpenAPI(那是 lightrag 搬家那次 PR 的产出,和
- 本文档无关);
-- 本地实现不需要 `create_lightrag_instance`,lightrag 服务自己管存储。
-
-**切换方式**:`GRAPH_INDEX_SERVICE=embedded|remote` env。默认 embedded。
-
-### 5.4 Layer A 的保留与清理
-
-**保留**(不动):
-
-- `BaseGraphStorage` / `BaseKVStorage` / `BaseVectorStorage` 的接口面;
-- `STORAGES` 注册表;
-- 25 个 `GraphStorageTestSuite` 跨后端等价性测试;
-- 三个具体实现(PG / Neo4j / Nebula)及其 connection manager。
-
-**清理**(纳入 M1):
-
-- R1:`_graph_search` 补 `finalize_storages`。
-- R2:`LightRAG` dataclass 默认 `graph_storage` 改成 `PGOpsSyncGraphStorage`。
-- R3:`_configure_storage_backends` 删除废弃类名分支。
-
-**延迟**(纳入 M3,按需):
-
-- R4:`BaseGraphStorage` 接口面分层(核心 / 批量 / UI 扩展)。
-- R5:LightRAG vector store 与 `aperag/vectorstore` 的合并。
-- R6:`PGOps*` 的 `asyncio.to_thread` 包装去掉。
-
----
-
-## 6. 后端能力矩阵
-
-| 能力 | pg-emulated (PGOpsSync) | Neo4j (sync driver) | Nebula (nebula3) |
-|---|---|---|---|
-| **原生图引擎** | ❌ 用 `(src, dst)` 表模拟 | ✅ Cypher | ✅ nGQL |
-| **多跳 BFS / 路径查询** | SQL 递归 CTE;深度 3 以上代价陡增 | 原生高效 | 原生高效 |
-| **批量 upsert** | ✅(SQL `INSERT ... ON CONFLICT`) | ✅(UNWIND) | ✅(UNWIND + `INSERT VERTEX ... VALUES ...`) |
-| **事务** | ✅(PG 标准) | ✅(显式 tx) | 有限(Nebula 仅支持 session-level,无跨语句 tx) |
-| **多标签节点** | JSONB array 存 | ✅ | 有限(Nebula tag 机制,多 tag 需多次写) |
-| **分区 / 水平扩展** | PG 分区或分库 | 企业版集群 / Neo4j Fabric | 原生分布式(meta+graph+storage 分层) |
-| **索引管理** | B-tree / GIN | Label + property index | Tag / edge index |
-| **运维复杂度** | ★☆☆☆☆(与主 DB 共享) | ★★★☆☆(单独组件) | ★★★★☆(三组件分布式) |
-| **小规模成本** | ~0(复用主 PG) | 中(Neo4j Community 免费但需独立机器) | 中-高 |
-| **百万节点性能** | ⚠️ 深度遍历会慢 | ✅ | ✅ |
-| **亿节点性能** | ❌(不建议) | 集群模式 ✅ | ✅(原生分布式) |
-| **当前使用场景** | ApeRAG-Lite / 私有化默认 | 中等规模生产 | 大规模生产 |
-| **备份/恢复** | pg_dump | 原生备份 | 原生备份但流程复杂 |
-
-**部署选型的默认建议**:
-
-- 文档 < 10 万 / collection < 千:pg-emulated。零运维,一个 PG 搞定一切。
-- 文档 10 万 ~ 百万:Neo4j Community。单机能扛,Cypher 生态好。
-- 文档 百万+:Nebula 或 Neo4j 企业版。上分布式。
-
-以上数字是**数量级级别的 rule of thumb**,不是 benchmark 结论。正式
-切换前应跑对应数据量的 pilot。
-
----
-
-## 7. 路线图
-
-四个里程碑。每个都是独立 PR,独立上线,独立回滚。
-
-### 7.1 M1:清洁工作(小 PR)
-
-- [ ] R1:`_graph_search` 补 `try/finally finalize_storages()`。
-- [ ] R2:`LightRAG` dataclass 默认 `graph_storage` 对齐 env(`PGOpsSyncGraphStorage`)。
-- [ ] R3:`_configure_storage_backends` 删废弃类名分支。
-- [ ] 本文档新增的测试:至少一个**回归测试**覆盖 R1 的 finalize 调用
- (mock `rag.finalize_storages`,断言 `_graph_search` 走 finally 分支)。
-
-**预估**:半天。
-
-### 7.2 M2:引入 `GraphIndexService`(中等 PR)
-
-- [ ] 新增 `aperag/graph/service.py`:`GraphIndexService` Protocol +
- `EmbeddedGraphIndexService` 实现。
-- [ ] DTO 一套:`aperag/graph/dto.py`,9 个业务动作对应的请求/响应类型。
- 原则对齐 `aperag/vectorstore/dto.py`——frozen dataclass,零后端依赖。
-- [ ] 迁移业务层:
- - `graph_service.py` 全部 handler 改为依赖 `GraphIndexService`;
- - `search_pipeline_service._graph_search` 同样;
- - `tasks/collection.py::_delete_lightrag` 同样;
- - `lightrag_manager.process_document_for_celery` / `delete_document_for_celery`
- 内部也改用 `GraphIndexService`。
-- [ ] Request-scoped `rag` cache(见 §3.R7)—— FastAPI dependency,
- 同一请求内多次调 graph 方法时复用 LightRAG 实例。
-- [ ] 单元测试:`GraphIndexService` 的契约测试(mock LightRAG),独立
- 于真实后端。
-- [ ] 集成测试:`EmbeddedGraphIndexService` 配 pg-emulated backend 的
- 端到端测试(复用 `tests/integration/graphstorage/` 的测试数据)。
-
-**预估**:3~5 天。
-
-### 7.3 M3:Layer A 清理(小-中 PR,可选)
-
-按需做,只在有明确痛点时启动:
-
-- [ ] R4:`BaseGraphStorage` 接口分层。第三层(UI 扩展,默认返回
- `None`)改为显式 `NotImplementedError` + `GraphIndexService.capabilities()`
- 能力探测。
-- [ ] R6:`PGOpsSync*` 的 `asyncio.to_thread` 包装拆掉,直接走 `AsyncSession`。
- 需配套 benchmark 证明收益。
-- [ ] 备选:为 `GraphStorageTestSuite` 增加 "NetworkX baseline oracle"
- 的对照模式(已有 `networkx_baseline_storage.py`),让跨后端等价性
- 测试的**语义正确性**有参考实现验证。
-
-**预估**:1~2 周(不全做)。
-
-### 7.4 M4:lightrag 改 web service(大 PR,独立项目)
-
-不在本文档范围内的独立工程。本文档给它提供两件礼物:
-
-1. **Layer B 存在**:ApeRAG 侧不用改业务代码,只实现新的
- `RemoteGraphIndexService` + 切 env。
-2. **Layer A 规整**:接口 + 25 测试 + 三个实现搬去 lightrag 服务时,
- 可以直接抬走不用返工。
-
-lightrag 服务本身的 OpenAPI / 部署方案 / 数据迁移 / 并发模型留给那个
-PR 定。
-
----
-
-## 8. Open questions
-
-需要更多信息或实测才能拍板的事:
-
-### Q1. Entity / relation 向量要不要合并到 `aperag/vectorstore`?
-
-当前:LightRAG 用 `PGOpsSyncVectorStorage` 存 entity / relation / chunk
-向量,与 `aperag/vectorstore` 的 Qdrant / pgvector 分片**物理隔离**。
-
-- **合并**:减少一套向量系统。但打破 lightrag 的独立性,搬家时要重新
- 设计。
-- **不合并**:两套向量系统共存,运维多一个维度。lightrag 搬家更干净。
-
-倾向:**不合并**。lightrag 服务搬出去之后,两套向量分别属于不同服务,
-是清晰的。
-
-### Q2. `RemoteGraphIndexService` 的缓存策略?
-
-lightrag 变 service 后,`rag.get_knowledge_graph(label="*", max_nodes=1000)`
-这种"UI 展示用"的查询如果每次都走 RPC,体验会卡。
-
-候选方案:
-
-- 客户端侧 LRU(by `(collection_id, label, max_depth, max_nodes)` 键);
-- lightrag 服务端 etag + 304;
-- 业务层节流(UI 隔 N 秒才允许重新请求)。
-
-三者互斥的程度不高,但实现位置差别大。M4 定。
-
-### Q3. 合并建议(merge suggestions)的所有权归属?
-
-当前:
-- 建议**生成**逻辑在 LightRAG(`rag.agenerate_merge_suggestions`);
-- 建议**存储**在 ApeRAG 主 DB(`graph_merge_suggestion` 表,由
- `async_db_ops` 管理);
-- 建议**审核 UI** 在 ApeRAG 前端。
-
-lightrag 搬家后,"生成"会变成 RPC,"存储"和"审核"仍在 ApeRAG。可能需要
-把建议存储也搬到 lightrag 服务,让生成和存储同侧;或保留现状,"生成 →
-RPC 返回 → ApeRAG 落库" 的扇出。M4 定。
-
-### Q4. pg-emulated 在多大规模下开始不够用?
-
-目前没有正式的 benchmark。M3 的一个子任务是:用
-`GraphStorageTestSuite` 里的 `test_large_batch_operations`(697 行起)
-改造成可配置规模的压测脚本,在 10 万 / 100 万 / 1000 万 节点规模下对
-三种后端做 p50 / p99 查询延迟对比,作为未来运维手册的选型依据。
-
----
-
-## 9. 反过度设计:什么时候**不**做这个抽象
-
-与 `vector_db_abstraction.md` §10 一脉相承的安全阀:
-
-- **只用一种图后端且没打算切换** → M1 清洁做完就行,M2/M3 都是过度设计。
-- **lightrag 一直内嵌、不打算拆服务** → M2(`GraphIndexService`)只节省
- 了业务层的 import 深度;值还是有,但不是紧迫需求。
-- **团队<3 人,且图功能不是产品核心** → 维护抽象层的成本超过直接调
- LightRAG 的成本。
-
-触发**做 M2** 的信号(任一即可):
-
-1. lightrag 拆服务提上日程(即便没开始动手)。
-2. 新增一个本文档没讨论的图后端(如 ArangoDB、TigerGraph)的需求出现。
-3. 单次查询里 graph 路径的性能成为瓶颈,request-scoped cache 成为必要。
-
-触发**做 M3** 的信号:
-
-1. Layer A 接口面的 "默认返回 None" 陷阱真的在生产环境坑过人。
-2. `asyncio.to_thread` 的线程池调度被 profiler 点名成为热点。
-
-在触发信号出现前,**本文档的价值在于把思路写清楚**——不是承诺要做。
-
----
-
-## 10. 附:与向量抽象层的对照
-
-本文档在结构、命名、设计原则上有意与 [`vector_db_abstraction.md`](./vector_db_abstraction.md)
-对齐,便于后来人类比阅读:
-
-| 维度 | 向量抽象(已落地) | 图抽象(本文档) |
-|---|---|---|
-| 业务动作数 | 5 个主要方法 | 9 个主要方法 |
-| DSL 层 | `VectorFilter`(Eq/In/IsEmpty/And/Or/Not) | 暂时不需要(图查询语义更业务化) |
-| DTO 层 | `VectorPoint`、`QueryRequest`、`SearchHit`、`TenantRef`、`VectorShape` | `KnowledgeGraph`、`GraphContext`、`MergeSuggestion`、... |
-| 后端数 | 2(Qdrant、pgvector) | 3(pg-emulated、Neo4j、Nebula) |
-| 抽象层位置 | ApeRAG 本身持有 | **Layer B** 在 ApeRAG;**Layer A** 会随 lightrag 搬家 |
-| 一次性做完 | ✅ | ❌(M1 先做清洁,M2 做 Layer B,M3 按需) |
-
-向量抽象能一次到位,是因为"向量后端"是纯基础设施;图抽象分阶段,是
-因为 LightRAG 这一大块**未来要搬家**,现在深度重构会白做。这是两个
-文档最核心的区别。
diff --git a/docs/zh-CN/design/graph_normalization_merge_full_analysis.md b/docs/zh-CN/design/graph_normalization_merge_full_analysis.md
deleted file mode 100644
index c444abf69..000000000
--- a/docs/zh-CN/design/graph_normalization_merge_full_analysis.md
+++ /dev/null
@@ -1,1137 +0,0 @@
----
-title: Graph 归一化与合并全链路分析
-description: ApeRAG 当前 Graph/LightRAG 归一化、合并、删除、查询与 merge suggestion 全链路实现审计
-keywords: graph, lightrag, normalization, merge, performance, database
-position: 30
----
-
-# Graph 归一化与合并全链路分析
-
-> 本文档基于当前仓库 `main` 附近状态梳理,分析基线为 `c389f5cf`。
-> 同时把最近几轮已经合入的修复也当成当前事实的一部分:
-> - `#1519`:PG `chunk_ids` overlap 查询修复
-> - `#1523`:LightRAG query contract 收紧
-> - `#1525`:chunk identity + delete contract 收紧
-> - `#1528`:`varchar[] && text[]` follow-up cast 修复
-> - `#1531`:orchestration contract tests
-
-## 1. 先给结论
-
-当前 Graph/LightRAG 这条线已经从“明显 correctness 漏洞很多”的状态,推进到了“核心 contract 初步被测住”的状态,但整体架构仍然有 4 个根问题没有解决:
-
-1. **写路径不是事务性链路,而是多存储、多阶段的 best-effort orchestration**
- - `chunk`、`vector`、`graph` 三层写入和删除没有统一事务边界。
- - 一旦在中途失败,当前实现只能靠后续重跑或人工修复收敛。
-
-2. **归一化与 merge 语义散落在多个层次,核心 contract 依然过度依赖隐式约定**
- - `entity_name` 归一化、`source_id/chunk_ids` 追踪、图节点/边合并、手工 merge、删除引用扣减,都在不同文件里各自维护。
- - 数据形态本身也不统一:有的地方用 `chunk_ids: list[str]`,有的地方用 `source_id: str` 加分隔符拼接。
-
-3. **文件/类职责边界不清,存在明显的 god file 与平行实现**
- - `aperag/graph/lightrag/lightrag.py` 既是 facade,又承担 chunking、indexing、query、delete、merge suggestion、manual merge。
- - `aperag/graph/lightrag/operate.py` 同时承担 extraction、merge/upsert、query context、merge suggestion。
- - `aperag/graph/lightrag/utils_graph.py` 中还保留了一套看起来已经不再被主线调用的 edit/merge 实现,存在漂移风险。
-
-4. **数据库/图存储层已经有一批明确的性能风险点**
- - `source_id` 的分隔符拼接 + `LIKE/CONTAINS` 检索,是当前删除路径和部分查询路径最脆弱的地方。
- - PG / Neo4j / Nebula 三套 backend 的能力并不对齐,导致上层逻辑在不同后端下性能和行为并不对等。
-
-一句话概括当前状态:
-
-- **当前代码已经可用,但仍然是“contract 刚开始被补齐、架构还没收平、性能热点还没系统治理”的阶段。**
-
-## 2. 文档范围
-
-本文档只分析当前实现,不混整改 patch。本次范围覆盖 5 条线:
-
-1. **Graph 写路径**
- - 文档状态变更
- - reconciler / task 调度
- - LightRAG 文档处理、chunking、entity/relation extraction
- - 归一化、connected components 分组、merge、graph/vector/text 写入
-
-2. **Graph 删除/更新路径**
- - `adelete_by_doc_id`
- - update 时的 delete + rebuild 语义
-
-3. **Graph 读路径**
- - 图谱读取
- - query context 构建
- - merge suggestion 读取与生成
-
-4. **人工 merge / suggestion action 路径**
- - `GraphService.merge_nodes`
- - `handle_suggestion_action`
- - `LightRAG.amerge_nodes`
-
-5. **DB / storage / test 审计**
- - schema、index、query 形态
- - 后端差异
- - 现有单测覆盖与缺口
-
-## 3. 代码地图
-
-### 3.1 入口与调度层
-
-| 角色 | 主要文件 | 说明 |
-| --- | --- | --- |
-| 索引入口抽象 | `aperag/index/graph_index.py` | GraphIndexer 抽象入口,但不是真正执行者 |
-| 文档任务执行 | `aperag/tasks/document.py` | 真正调用 `process_document_for_celery` / `delete_document_for_celery` |
-| reconciler | `aperag/tasks/reconciler.py` | 通过 `DocumentIndex` 的 `version / observed_version / lease` 推进实际任务 |
-| 文档索引模型 | `aperag/db/models.py` 中 `DocumentIndex` | Graph 写路径的上位状态机 |
-
-### 3.2 Graph 主逻辑层
-
-| 角色 | 主要文件 | 说明 |
-| --- | --- | --- |
-| LightRAG 实例构造 | `aperag/graph/lightrag_manager.py` | 每次处理新建 `LightRAG` 实例,并注入 backend / embed / llm |
-| LightRAG facade | `aperag/graph/lightrag/lightrag.py` | chunking、indexing、query、delete、merge suggestions、manual merge |
-| extraction/query/suggestion 细节 | `aperag/graph/lightrag/operate.py` | 实体提取、关系提取、merge/upsert、query context、merge suggestion 分析 |
-| 旧的图编辑/合并工具 | `aperag/graph/lightrag/utils_graph.py` | 包含 `aedit_entity`、`amerge_entities`,当前主线未直接调用 |
-
-### 3.3 服务/API 层
-
-| 角色 | 主要文件 | 说明 |
-| --- | --- | --- |
-| 图服务 | `aperag/service/graph_service.py` | graph read path、merge suggestion cache、manual merge、export |
-| graph API | `aperag/views/graph.py` | merge nodes、merge suggestions、suggestion action、KG export |
-| collections 图读取 API | `aperag/views/collections.py` | 获取 labels、图谱数据 |
-
-### 3.4 存储层
-
-| 层 | 主要文件 | 说明 |
-| --- | --- | --- |
-| PG graph repo | `aperag/db/repositories/graph.py` | PG graph 节点/边读写、batch query、degree 计算 |
-| PG vector repo | `aperag/db/repositories/lightrag.py` | doc chunks / entity VDB / relation VDB |
-| PG graph storage | `aperag/graph/lightrag/kg/pg_ops_sync_graph_storage.py` | Graph storage facade |
-| PG vector storage | `aperag/graph/lightrag/kg/pg_ops_sync_vector_storage.py` | Vector storage facade |
-| Neo4j backend | `aperag/graph/lightrag/kg/neo4j_sync_impl.py` | Neo4j graph backend |
-| Nebula backend | `aperag/graph/lightrag/kg/nebula_sync_impl.py` | Nebula graph backend |
-
-## 4. 全链路主流程
-
-## 4.1 写路径总览
-
-```mermaid
-flowchart TD
- A[文档/collection 配置变更] --> B[DocumentIndex.version 变化]
- B --> C[reconciler claim index]
- C --> D[aperag/tasks/document.py]
- D --> E[lightrag_manager.process_document_for_celery]
- E --> F[create_lightrag_instance]
- F --> G[LightRAG.adelete_by_doc_id]
- G --> H[LightRAG.ainsert_and_chunk_document]
- H --> I[LightRAG.aprocess_graph_indexing]
- I --> J[operate.extract_entities]
- J --> K[_find_connected_components]
- K --> L[_grouping_process_chunk_results]
- L --> M[operate.merge_nodes_and_edges]
- M --> N[graph storage + entity VDB + relation VDB]
- N --> O[IndexTaskCallbacks.on_index_created]
-```
-
-这里最重要的一点是:
-
-- **当前 graph update 不是增量更新,而是 delete + rebuild。**
-
-也就是 `lightrag_manager._process_document_async()` 在写新图之前,先执行:
-
-- `await rag.adelete_by_doc_id(doc_id)`
-
-然后才会:
-
-- `ainsert_and_chunk_document()`
-- `aprocess_graph_indexing()`
-
-这意味着:
-
-1. 这条链路的真实语义是“按文档重建 graph 状态”,不是“对旧 graph 做精细增量 patch”。
-2. 一旦 delete 之后、rebuild 之前失败,当前文档的 graph 数据会出现空窗或不一致窗口。
-
-## 4.2 调度层真实语义
-
-### `GraphIndexer` 不是实际执行者
-
-`aperag/index/graph_index.py` 的命名容易让人误解,以为 graph index 由它真正执行。但当前实际执行链路里:
-
-- `GraphIndexer.create_index_async()` 只返回 “task scheduled”
-- 真正做事的是 `aperag/tasks/document.py`
-- 再往下才是 `process_document_for_celery()`
-
-这形成了一个明显的接口设计问题:
-
-- **抽象入口的语义与真实执行路径不一致。**
-
-从维护角度看,这会带来两个问题:
-
-1. 新人读 `GraphIndexer` 会误判真实调用链。
-2. 后续如果要重构 reconciliation / task 调度,很容易出现“改了入口类,但没改真实 worker”的错觉。
-
-## 4.3 LightRAG 实例构造
-
-`aperag/graph/lightrag_manager.py:create_lightrag_instance()` 每次都会:
-
-1. 根据 collection 配置解析 graph enablement / language / entity_types
-2. 动态生成 embedding function
-3. 动态生成 llm function
-4. 根据环境变量注入 `kv_storage / vector_storage / graph_storage`
-5. 新建一个全新的 `LightRAG` 实例
-6. 初始化 storages
-
-这条设计的优点是:
-
-- 避免全局状态污染
-- 对 Celery / 进程隔离友好
-
-但代价也很明显:
-
-- 每次处理都要完整走一遍实例构造和 storage 初始化
-- configuration assembly 与 runtime orchestration 强耦合在同一个 manager 里
-
-## 4.4 文档分块与 chunk identity
-
-文档 chunking 发生在 `LightRAG.ainsert_and_chunk_document()`:
-
-1. 调用 `chunking_func`
-2. 为每个 chunk 生成 `chunk_id`
-3. 同时写入 `chunks_vdb` 与 `text_chunks`
-
-当前关键语义:
-
-- chunk id 现在通过 `_compute_chunk_instance_id(doc_id, chunk_data, fallback_index, workspace)` 生成
-- 其 identity 是:
- - `doc_id + chunk_order_index + content`
-
-这意味着最近已经收掉了一个很关键的 correctness 问题:
-
-- **不同文档里相同 chunk 文本,不再共享同一个 chunk id。**
-
-这是当前图删除/更新路径能成立的前提之一。
-
-chunk 数据当前主形态:
-
-```python
-{
- chunk_id: {
- "tokens": ...,
- "content": ...,
- "chunk_order_index": ...,
- "full_doc_id": doc_id,
- "file_path": file_path,
- }
-}
-```
-
-## 4.5 归一化发生在哪里
-
-当前归一化的核心落点不在 merge 阶段,而在 extraction 阶段:
-
-- `operate._handle_single_entity_extraction()`
-- `operate._handle_single_relationship_extraction()`
-- 实际归一化函数:`aperag/graph/lightrag/utils.py:normalize_extracted_info`
-
-### 当前实体归一化语义
-
-对实体名,当前已经明确做了这些处理:
-
-1. 中英文括号、破折号等符号归一化
-2. 中英文之间多余空格移除
-3. 外层英文引号、中文引号处理
-4. 对纯英文 entity 做 title case 规范化
-5. 关系抽取时对 `src/tgt` 也会走 entity normalization
-6. 关系中 `src == tgt` 会直接丢弃自环
-
-现有单测主要覆盖:
-
-- `tests/unit_test/graphindex/test_normalize_extracted_info.py`
-- `tests/unit_test/graphindex/test_normalize_simple.py`
-- `tests/unit_test/graphindex/test_case_normalization.py`
-
-### 这一层的重要事实
-
-- **归一化不是一个独立 pipeline,而是 extraction 中的隐式步骤。**
-- 后面的 graph merge、manual merge、delete、query 都默认“输入已经被归一化过”。
-
-这也是一个设计隐患:
-
-- 如果后续从别的入口绕开 extraction 直接写 graph,这些归一化 contract 很容易失效。
-
-## 4.6 entity / relation 提取
-
-实体和关系抽取在 `operate.extract_entities()` 中完成。
-
-当前语义是:
-
-1. 对每个 chunk 并发调用 LLM
-2. 支持初次提取 + gleaning 追补
-3. 对单 chunk 结果返回:
- - `maybe_nodes: dict[entity_name, list[entity_payload]]`
- - `maybe_edges: dict[(src, tgt), list[edge_payload]]`
-4. 整体结果返回:
- - `list[(maybe_nodes, maybe_edges)]`
-
-这批 contract 最近已经被 `tests/unit_test/graphindex/test_lightrag_orchestration_contract.py` 补了一轮 orchestration 覆盖,至少把这些语义钉住了:
-
-1. `extract_entities()` 失败时不吞异常
-2. `FIRST_EXCEPTION` 后会取消 pending task
-3. `aprocess_graph_indexing()` 输入校验与异常传播稳定
-
-## 4.7 connected components 分组
-
-`LightRAG._find_connected_components()` 会先根据 extraction 出来的 nodes/edges 构造邻接表,再按 BFS 找连通分量。
-
-当前设计意图是:
-
-- 把互相独立的实体群拆开
-- 后续每个 component 分开 merge/upsert
-
-这条线最近也通过 `#1531` 的 tests 被测住了,包括:
-
-1. zero-count 结果
-2. component filtering 不串组
-3. first-exception cancel pending
-4. 当前串行语义
-
-但这里有一个很重要的事实:
-
-- `_grouping_process_chunk_results()` 虽然把 component 拆成多个 task,
-- **最终却用了 `asyncio.Semaphore(1)`,因此当前是串行处理 component。**
-
-所以当前状态是:
-
-- **设计上看起来像“可并行分组处理”,真实语义仍然是“单并发串行推进”。**
-
-这个差异必须显式写进文档,否则后续很容易有人以为这里已经并行了。
-
-## 4.8 merge 与 upsert
-
-真正的 entity / relation 聚合发生在 `operate.merge_nodes_and_edges()`。
-
-### 节点合并
-
-`_merge_nodes_then_upsert()` 的现有语义:
-
-1. 先取已存在节点
-2. 把已有描述、source_id、file_path 和新数据合并
-3. `entity_type` 通过多数派决定
-4. `description` 去重后用 `GRAPH_FIELD_SEP` 拼接
-5. 超过阈值时可触发 LLM summary
-6. 最终 upsert 回 graph
-
-### 边合并
-
-`_merge_edges_then_upsert()` 的现有语义:
-
-1. `(src, tgt)` 相同关系会聚合
-2. `weight` 做求和
-3. `keywords` 做去重
-4. `description/source_id/file_path` 做聚合
-5. 必要时补 UNKNOWN node
-6. 最终 upsert 回 graph,再写 relation VDB
-
-### 写入形态
-
-这一层会同时维护三类存储:
-
-1. graph storage
- - 节点:`lightrag_graph_nodes` 或图数据库节点
- - 边:`lightrag_graph_edges` 或图数据库边
-
-2. entity vector storage
- - `entity_name + description` 为主的 embedding
- - 额外保留 `chunk_ids` 或等价来源信息
-
-3. relation vector storage
- - `src/tgt/keywords/description` 的 embedding
- - 同样保留 `chunk_ids` 或等价来源信息
-
-## 4.9 删除与 update 语义
-
-`LightRAG.adelete_by_doc_id()` 是当前最关键的一条 contract-heavy 代码。
-
-它当前做的是两层引用扣减:
-
-1. **Vector 层**
- - 依据 `chunk_ids: list[str]` 做差集
- - shared refs -> update
- - exclusive refs -> delete
-
-2. **Graph 层**
- - 依据 `source_id: str` 拆分为 chunk token 集合
- - shared refs -> 更新 `source_id`
- - exclusive refs -> delete node/edge
-
-然后最后再删:
-
-- `chunks_vdb`
-- `text_chunks`
-
-最近合入的 `#1525` 和相关测试,已经把这条 delete contract 测得比较明确:
-
-1. shared refs 走 update / 扣减
-2. exclusive refs 才 delete
-3. 不同文档相同 chunk 内容不会再互相覆盖
-
-但它仍然有一个本质问题:
-
-- **这是跨多存储、多阶段、非原子性的补偿式删除,不是事务性删除。**
-
-## 4.10 读路径
-
-当前 graph read path 大致分三类。
-
-### A. 图谱浏览
-
-主链路:
-
-- `views/collections.py` / `views/graph.py`
-- `GraphService.get_knowledge_graph()`
-- `LightRAG.get_knowledge_graph()`
-- backend storage 的 `get_knowledge_graph()`
-
-需要明确一个重要现状:
-
-- **PG backend 的 `get_knowledge_graph()` 明确写着是 simplified implementation。**
-- `max_depth` 在 PG 下并不代表真正的多跳遍历能力。
-
-也就是说:
-
-- API 暴露了 `max_depth`
-- 但至少在 PG backend 上,这个参数语义并不完整
-
-这属于比较典型的“接口看起来比实现强”的问题。
-
-### B. query context 构建
-
-主链路:
-
-- `LightRAG.aquery_context()`
-- `operate.build_query_context()`
-- `_get_node_data()` / `_get_edge_data()` / `_get_vector_context()`
-
-最近 `#1523` 已经把 query contract 收紧了一轮,当前已明确:
-
-1. helper 层统一返回稳定三元组 `(entities, relations, text_units)`
-2. 空关键词返回空三元组
-3. `mix` 模式允许 vector-only text hits
-4. `aquery_context()` 不再复用默认 `QueryParam()`
-
-但这里仍然有一个设计味道不好的点:
-
-- `build_query_context()` 会直接修改 `query_param.mode`
-- 也就是 fallback 逻辑会带状态副作用
-
-这虽然当前已被测试固定住,但并不是一个很干净的接口设计。
-
-### C. merge suggestions
-
-主链路:
-
-- `GraphService.get_or_generate_merge_suggestions()`
-- `GraphService.generate_merge_suggestions()`
-- `LightRAG.agenerate_merge_suggestions()`
-- `operate.get_high_degree_nodes()`
-- `filter_and_group_entities()`
-- `analyze_entities_with_llm()`
-- `filter_and_deduplicate_suggestions()`
-
-这里的现有设计特点:
-
-1. 只分析高 degree 节点,做 bounded candidate selection
-2. service 层还负责 active/history suggestion cache
-3. action 层在 accept/reject 时还会把 active suggestion 移动到 history
-
-## 4.11 人工 merge 路径
-
-主链路:
-
-- `GraphService.merge_nodes()`
-- `GraphService._execute_merge_operation()`
-- `LightRAG.amerge_nodes()`
-
-这条线的当前语义:
-
-1. 支持 auto-select target entity(按最高 degree)
-2. 合并节点属性
-3. 重写所有相关边
-4. 重建 entity / relation VDB
-5. 删除 source entities
-
-但这里有个很重要的维护性问题:
-
-- `aperag/graph/lightrag/utils_graph.py` 里还保留着另一套 `aedit_entity()` / `amerge_entities()` 实现
-- 当前主线 API 走的是 `LightRAG.amerge_nodes()`
-- `utils_graph.py` 这套逻辑看起来已经不在主执行线上
-
-这意味着:
-
-- **当前仓库里存在两套 graph edit/merge 逻辑。**
-
-即使其中一套暂时不用,它仍然是明显的漂移风险与阅读噪音来源。
-
-## 5. 当前数据模型与关键 contract
-
-## 5.1 主要表
-
-### chunks
-
-- `lightrag_doc_chunks`
-- 主字段:
- - `workspace`
- - `id`
- - `full_doc_id`
- - `chunk_order_index`
- - `content`
- - `content_vector`
- - `file_path`
-
-### entity VDB
-
-- `lightrag_vdb_entity`
-- 主字段:
- - `workspace`
- - `id`
- - `entity_name`
- - `content`
- - `content_vector`
- - `chunk_ids ARRAY(String)`
- - `file_path`
-
-### relation VDB
-
-- `lightrag_vdb_relation`
-- 主字段:
- - `workspace`
- - `id`
- - `source_id`
- - `target_id`
- - `content`
- - `content_vector`
- - `chunk_ids ARRAY(String)`
- - `file_path`
-
-### graph nodes / edges
-
-- `lightrag_graph_nodes`
-- `lightrag_graph_edges`
-
-graph 层当前依然把来源信息存成:
-
-- `source_id: Text`
-
-也就是:
-
-- 不是数组
-- 不是关联表
-- 而是 `GRAPH_FIELD_SEP` 拼接字符串
-
-这是当前最值得明确记住的技术债之一。
-
-## 5.2 当前最重要的 contract
-
-### Contract A: chunk identity
-
-- 同一文档、同一 chunk 实例,chunk id 稳定
-- 不同文档,即使 chunk 文本相同,也不共享 chunk id
-
-### Contract B: delete semantics
-
-- shared refs -> update
-- exclusive refs -> delete
-
-### Contract C: query helper return shape
-
-- helper 一律返回稳定三元组
-- fail-response 不在 helper 内混字符串
-
-### Contract D: orchestration semantics
-
-- `_grouping_process_chunk_results()` 当前串行,不是假定并行
-- `FIRST_EXCEPTION` 时 pending 会取消
-
-## 6. 问题审计
-
-## 6.1 P0 correctness / contract 风险
-
-### P0-1. update 实际是 delete + rebuild,失败时存在数据空窗
-
-**位置**
-
-- `aperag/graph/lightrag_manager.py:_process_document_async`
-
-**现状**
-
-- 每次写 graph 前先 `adelete_by_doc_id(doc_id)`
-- 然后重新 chunk + extract + merge + upsert
-
-**风险**
-
-- delete 成功、后续 rebuild 失败时,当前文档 graph 状态会被清空或部分清空
-- 因为没有统一事务,worker 失败时只能靠重试/重建收敛
-
-**判断**
-
-- 这是当前最实质的 correctness 风险之一
-- 不是单测能完全解决的问题,属于架构级语义问题
-
-### P0-2. 跨多存储写入/删除没有统一事务边界
-
-**位置**
-
-- `LightRAG.ainsert_and_chunk_document`
-- `LightRAG.aprocess_graph_indexing`
-- `LightRAG.adelete_by_doc_id`
-- `LightRAG.amerge_nodes`
-
-**现状**
-
-- chunk text store
-- chunk vector store
-- entity vector store
-- relation vector store
-- graph store
-
-这些操作分多步执行,任何一步异常都可能留下部分提交状态。
-
-**风险**
-
-- graph 有数据、vector 没更新
-- vector 已删除、graph 还残留
-- source refs 已改写但 chunk 未删干净
-
-### P0-3. `source_id` 的字符串 contract 太脆弱
-
-**位置**
-
-- `operate._merge_nodes_then_upsert`
-- `operate._merge_edges_then_upsert`
-- `LightRAG.adelete_by_doc_id`
-- `GraphRepositoryMixin._build_source_id_overlap_clause`
-- `Neo4JSyncStorage.get_nodes_by_source_ids/get_edges_by_source_ids`
-- `NebulaSyncStorage.get_nodes_by_source_ids/get_edges_by_source_ids`
-
-**现状**
-
-- graph 层把多个 chunk 来源编码成 `GRAPH_FIELD_SEP` 拼接字符串
-- 删除/查询时再 split / join / like / contains
-
-**风险**
-
-- contract 极度依赖分隔符和字符串处理一致性
-- 很难做数据库级约束与索引优化
-- 很容易在不同 backend 上漂出不同语义或边界 bug
-
-### P0-4. `get_knowledge_graph(max_depth)` 暴露的接口能力强于 PG 实现
-
-**位置**
-
-- `aperag/graph/lightrag/kg/pg_ops_sync_graph_storage.py:get_knowledge_graph`
-
-**现状**
-
-- 函数注释明确写了 simplified implementation
-- 只支持非常有限的 immediate connections 近邻拼装
-
-**风险**
-
-- API 暴露 `max_depth`
-- 调用方会自然认为这是可依赖的多跳遍历能力
-- 实际上在 PG backend 下并不是
-
-**判断**
-
-- 这是接口设计问题,不是单纯性能问题
-
-### P0-5. `get_graph_labels()` 名称与返回值语义不一致
-
-**位置**
-
-- `LightRAG.get_graph_labels()`
-- `PGOpsSyncGraphStorage.get_all_labels()`
-
-**现状**
-
-- `get_graph_labels()` 实际返回的是 entity id / entity name 风格列表
-- 不是严格意义上的 label / type 集合
-
-**风险**
-
-- UI 或调用方若把它理解成“实体类型列表”,会产生错误心智
-
-### P0-6. 仍然存在两套 edit/merge 代码
-
-**位置**
-
-- 主线:`LightRAG.amerge_nodes()`
-- 平行实现:`aperag/graph/lightrag/utils_graph.py`
-
-**现状**
-
-- `utils_graph.py` 中保留 `aedit_entity()`、`amerge_entities()`
-- 但主 API 走的不是这条线
-
-**风险**
-
-- 后续修一套漏一套
-- 新人误读
-- review 时很难一眼判断哪条是 canonical implementation
-
-## 6.2 P1 数据库 / 性能风险
-
-### P1-1. `lightrag_doc_chunks` 缺少 `(workspace, full_doc_id)` 索引
-
-**位置**
-
-- `aperag/db/models.py:LightRAGDocChunksModel`
-
-**现状**
-
-- 表主键是 `(id, workspace)`
-- 但删除路径和按文档重建路径很依赖:
- - `get_by_doc_id(full_doc_id)`
-
-**风险**
-
-- 文档数量上涨后,按 doc 查 chunk 成本会上升
-- update/delete 会越来越慢
-
-### P1-2. `chunk_ids` 数组查询没有显式 GIN 索引
-
-**位置**
-
-- `LightRAGVDBEntityModel.chunk_ids`
-- `LightRAGVDBRelationModel.chunk_ids`
-
-**现状**
-
-- 最近已经修成 `&& CAST(ARRAY[...] AS VARCHAR[])`
-- 但 schema 上没看到针对 `chunk_ids` 的显式 GIN 索引
-
-**风险**
-
-- 文档删除、graph 重建、shared ref 扣减场景下,overlap query 会越来越贵
-
-### P1-3. graph 层删除依赖 `source_id` 文本扫描
-
-**位置**
-
-- `GraphRepositoryMixin.get_graph_nodes_by_source_ids`
-- `GraphRepositoryMixin.get_graph_edges_by_source_ids`
-
-**现状**
-
-- PG 里通过 `LIKE` / `OR` 检查 `source_id` 是否包含某个 chunk token
-- Neo4j 里通过 `split(... ) any(...)`
-- Nebula 里先 `CONTAINS`,再回到 Python 做 `_source_ids_overlap`
-
-**风险**
-
-- 这条路径既难索引,又容易随 chunk 数量膨胀
-- backend 间性能差异会越来越大
-
-### P1-4. `get_graph_edges_batch` / `delete_graph_edges_batch` 用大 OR 拼条件
-
-**位置**
-
-- `aperag/db/repositories/graph.py`
-
-**现状**
-
-- 对每个 `(src, tgt)` 都拼一个 `OR`
-
-**风险**
-
-- pair 数量大时 SQL 体积和 planner 成本都会迅速上升
-- 更适合改成 `VALUES JOIN` 或 row-wise comparison
-
-### P1-5. 单节点 degree 查询是两条 count SQL
-
-**位置**
-
-- `GraphRepositoryMixin.get_graph_node_degree`
-
-**现状**
-
-- outgoing 一次 count
-- incoming 一次 count
-
-**风险**
-
-- 单条调用问题不大
-- 一旦被上层热路径频繁调用,会有明显浪费
-
-### P1-6. Nebula backend 在高阶节点分析上缺少与 PG/Neo4j 对齐的优化入口
-
-**位置**
-
-- `operate.get_high_degree_nodes`
-- `NebulaSyncStorage`
-
-**现状**
-
-- PG 和 Neo4j 都有 `get_top_degree_nodes`
-- Nebula 没有
-- 所以上层会退化成:
- - `get_all_labels`
- - `node_degrees_batch`
- - 全量或大批量筛 top degree
-
-**风险**
-
-- merge suggestion 在 Nebula 下成本可能远高于 PG/Neo4j
-
-### P1-7. `_find_most_related_text_unit_from_entities()` 拉 chunk 的方式过于保守
-
-**位置**
-
-- `operate._find_most_related_text_unit_from_entities`
-
-**现状**
-
-- 手工切成 `batch_size = 5`
-- 每批 `gather`
-- 整体串行推进
-
-**风险**
-
-- 在 chunk 较多时会过度拉长 query context 构建时间
-
-### P1-8. `_find_related_text_unit_from_relationships()` 是无并发上限的 fan-out
-
-**位置**
-
-- `operate._find_related_text_unit_from_relationships`
-
-**现状**
-
-- 直接为每个 chunk id 创建 task 并 `asyncio.gather`
-
-**风险**
-
-- chunk fan-out 大时容易对 KV/backend 产生瞬时压力
-
-### P1-9. `_find_connected_components()` 使用 `queue.pop(0)`
-
-**位置**
-
-- `LightRAG._find_connected_components`
-
-**现状**
-
-- BFS 队列是 Python list
-- 用 `pop(0)` 做队头弹出
-
-**风险**
-
-- component 很大时会退化到不必要的 O(n^2) 行为
-
-### P1-10. component 层是串行,merge 层又是细粒度并发,整体并发模型不直观
-
-**位置**
-
-- `LightRAG._grouping_process_chunk_results`
-- `operate._merge_nodes_and_edges_impl`
-
-**现状**
-
-- component 层 semaphore=1
-- component 内 entity / relation merge 又开并发
-
-**风险**
-
-- 对性能分析不友好
-- 对调优也不友好
-- 容易让人误判真正瓶颈
-
-## 6.3 P2 可维护性 / 接口设计问题
-
-### P2-1. `lightrag.py` 是典型 god file
-
-它当前同时承担:
-
-1. storage wiring
-2. chunking
-3. indexing orchestration
-4. query context
-5. delete by doc
-6. merge suggestions
-7. manual merge
-8. export
-
-建议后续最少拆成:
-
-1. `document_ingestion`
-2. `query_context`
-3. `graph_maintenance`
-4. `merge_suggestions`
-5. `facade`
-
-### P2-2. `operate.py` 也是 god file
-
-它当前同时承担:
-
-1. extraction
-2. merge/upsert
-3. query helper
-4. merge suggestion analysis
-
-这会导致:
-
-- 文件阅读成本高
-- contract 之间耦合过深
-- 小改动也容易扫到无关逻辑
-
-### P2-3. `GraphService` 混了过多职责
-
-当前同一个 service 里混着:
-
-1. graph read
-2. merge suggestion cache
-3. merge suggestion history
-4. manual merge action
-5. KG export
-
-更合理的拆法应该至少分成:
-
-1. `GraphReadService`
-2. `GraphMergeSuggestionService`
-3. `GraphMutationService`
-
-### P2-4. `GraphIndexer` 的抽象位置不清
-
-当前真实执行链路已经主要依赖:
-
-- reconciler
-- task worker
-- `process_document_for_celery`
-
-而 `GraphIndexer` 本身并不是真正执行者。
-
-建议后续要么:
-
-1. 把它收成真正的 scheduling abstraction
-2. 要么承认它已经不是核心执行面,减少误导性抽象
-
-### P2-5. `source_id` 与 `chunk_ids` 双表示导致心智复杂
-
-当前同一件事有两种表达:
-
-1. graph 层:`source_id: "chunk-achunk-b"`
-2. vdb 层:`chunk_ids: ["chunk-a", "chunk-b"]`
-
-这让很多逻辑都需要写两遍:
-
-1. merge
-2. delete
-3. overlap query
-4. update shared refs
-
-这是当前整条线复杂度长期居高不下的根因之一。
-
-## 7. 现有测试覆盖与缺口
-
-## 7.1 已经补得比较值钱的覆盖
-
-### 归一化
-
-- `test_normalize_extracted_info.py`
-- `test_normalize_simple.py`
-- `test_case_normalization.py`
-
-### query contract
-
-- `test_lightrag_query_contract.py`
-
-覆盖了:
-
-1. 默认参数不串状态
-2. helper 空结果稳定三元组
-3. mix vector-only 语义
-4. mode fallback
-
-### delete / chunk identity
-
-- `test_lightrag_chunk_identity_and_delete_contract.py`
-
-覆盖了:
-
-1. document-local chunk instance 语义
-2. shared-update / exclusive-delete
-
-### orchestration
-
-- `test_lightrag_orchestration_contract.py`
-
-覆盖了:
-
-1. zero-count
-2. serial semantics
-3. first-exception cancellation
-4. component filtering
-5. `aprocess_graph_indexing` 输入校验与异常传播
-
-### PG overlap 查询
-
-- `test_lightrag_pg_chunk_overlap.py`
-
-覆盖了:
-
-1. `&& CAST(... AS VARCHAR[])`
-2. entity / relation 两条路径
-
-## 7.2 仍然缺的覆盖
-
-### 缺口 1:写路径跨多存储的失败恢复
-
-当前没有系统测试覆盖:
-
-1. `chunks_vdb` 成功,`text_chunks` 失败
-2. graph upsert 成功,relation VDB 失败
-3. delete 途中失败后的残留状态
-
-### 缺口 2:manual merge 主链路
-
-当前没有看到足够系统的测试覆盖:
-
-1. `LightRAG.amerge_nodes`
-2. relation redirect
-3. target auto-select by degree
-4. graph / entity VDB / relation VDB 一致性
-
-### 缺口 3:不同 backend 的一致性 contract
-
-PG / Neo4j / Nebula 当前主要还是接口级兼容,不是语义级对齐测试。
-
-### 缺口 4:读路径的大规模 fan-out 场景
-
-当前没有针对这些热点的压力型/大输入 contract test:
-
-1. 大量 `source_id`
-2. 大量 `chunk_ids`
-3. 大量 edge pairs
-4. 大量 disconnected components
-
-## 8. 建议的整改顺序
-
-## 8.1 P0 correctness
-
-### 建议 1:把“update = delete + rebuild”写成显式系统 contract
-
-不是先改,而是先写清楚:
-
-1. 当前就不是增量 patch
-2. 失败时的恢复策略是什么
-3. 哪些状态可以接受短时空窗,哪些不可以
-
-### 建议 2:统一来源引用模型
-
-优先级很高。目标不是马上重构全图,而是先收敛 contract:
-
-1. graph 层也使用结构化来源集合
-2. 不再长期依赖 `source_id` 分隔符字符串
-
-### 建议 3:manual merge / delete / rebuild 至少补失败面测试
-
-优先补:
-
-1. delete 中途失败
-2. merge 中途失败
-3. rebuild 中 graph 已删、chunk 未重建
-
-## 8.2 P1 性能/资源风险
-
-### 建议 4:先补最值钱的 schema/index
-
-优先考虑:
-
-1. `lightrag_doc_chunks(workspace, full_doc_id)`
-2. `lightrag_vdb_entity.chunk_ids` GIN
-3. `lightrag_vdb_relation.chunk_ids` GIN
-
-### 建议 5:替换掉大 OR / LIKE 的热点查询
-
-优先顺序:
-
-1. `get_graph_edges_batch`
-2. `delete_graph_edges_batch`
-3. `get_graph_nodes_by_source_ids`
-4. `get_graph_edges_by_source_ids`
-
-### 建议 6:把 text unit fan-out 逻辑收成一致的 bounded concurrency
-
-不要再一边 batch_size=5 串行,一边无上限 gather。
-
-## 8.3 P2 可维护性
-
-### 建议 7:按已经冻结过的顺序做机械拆分
-
-最稳的顺序仍然是:
-
-1. 先拆 `query_context`
-2. 再拆 `document_ingestion / graph_maintenance`
-3. 最后再谈性能优化
-
-### 建议 8:明确哪条 merge/edit 逻辑是 canonical
-
-建议:
-
-1. `LightRAG.amerge_nodes` 作为唯一主线
-2. `utils_graph.py` 中平行实现要么删掉,要么明确退成 legacy/internal helper
-
-### 建议 9:收平 API 命名
-
-至少明确:
-
-1. `get_graph_labels()` 到底返回 entity ids 还是 entity types
-2. `get_knowledge_graph(max_depth)` 在不同 backend 下的真实能力
-3. `GraphIndexer` 的抽象边界
-
-## 9. 我对当前代码的总体判断
-
-当前这套实现不是“完全乱”,它已经有了一条相对清楚的主执行线:
-
-1. reconciler 驱动
-2. per-document rebuild
-3. extraction -> component grouping -> merge/upsert
-4. query / merge suggestion / manual merge 三条读写面
-
-但它也还远没到“设计收平”的阶段。更准确的判断是:
-
-- **当前已经补出了第一层 contract correctness**
-- **但还没把数据模型、事务边界、backend 一致性和文件职责真正收平**
-
-如果只看最近几轮修复,方向是对的:
-
-1. 先收 contract
-2. 再补 focused tests
-3. 再谈结构拆分
-
-后续最怕的事情不是“继续慢一点”,而是:
-
-- 在 `source_id/chunk_ids` 这类基础 contract 还没统一之前,直接做大规模模块重构或性能改造
-
-那样大概率只是把同样的问题搬到新文件里。
-
-## 10. 推荐的下一批 follow-up
-
-如果只开一批最值钱的 follow-up,我建议按下面的顺序:
-
-1. **数据模型/contract 设计文档 follow-up**
- - 先单独设计 `source_id/chunk_ids` 统一模型
-
-2. **manual merge correctness tests**
- - 补 `amerge_nodes` 主链路
-
-3. **schema/index 小批次优化**
- - `full_doc_id`
- - `chunk_ids` GIN
-
-4. **query / delete 热点 SQL 改造**
- - 去掉大 OR / LIKE 热点
-
-5. **机械拆分**
- - `query_context`
- - `document_ingestion`
- - `graph_maintenance`
-
-在这之前,不建议直接开“Graph 大重构”。
diff --git a/docs/zh-CN/design/graphindex_rewrite.md b/docs/zh-CN/design/graphindex_rewrite.md
deleted file mode 100644
index a9219e56a..000000000
--- a/docs/zh-CN/design/graphindex_rewrite.md
+++ /dev/null
@@ -1,509 +0,0 @@
-# Graph Index 模块重写(v2)
-
-> Status: **v2 已落地 + 归一化/合并能力已回填 + merge suggestion 已迁入独立
-> `graph_curation` 模块**。
->
-> 历史记录:
->
-> 1. 第一次落地:完成 extraction、storage、query 核心路径,删除 LightRAG + 三个 curation 功能。
-> 2. 第二次修订:把"简单截断 + 不走 LLM 合并"的激进裁剪**撤回**。
-> 用户反馈(原话):"我不同意'upsert_entities 改成累积描述 …
-> 纯确定性,不走 LLM'。我认为得走 LLM 做总结,你这样是丢失信息!
-> 合并 API 同理,原本的功能至少是能运行的 …"
-> 本次修订在 §4 新增 "归一化 + 合并" 章节,基于 LLM 摘要重新实现
-> 这两条路径;merge 端点从 410 改回 200。
->
-> 相关文档:`[vector_db_abstraction.md](./vector_db_abstraction.md)`。
-
----
-
-## 1. 本次重写的边界
-
-用户口径:
-
-> 我认为只要能做到根据文档生成图结构,然后按照一定的 schema 存储到图数据库
-> 就好了,没必要特别强调复用 lightrag 的一些数据结构。你可以认为这是一次
-> 垂直重写+切换。代码需要高内聚低耦合,写的好一点简单一些,我希望未来是
-> 0 维护免答疑的。
->
-> 我希望能彻底删掉 lightrag 的代码。你可以自己判断哪些值得做、哪些不值得做,
-> 也没有必要把 LightRAG 的功能全部搬过去。因为我是想要做一个 Graph Index 层,
-> 而不是重写一遍 LightRAG。
-
-落地后的功能集:
-
-
-| 功能 | 状态 | 说明 |
-| ---------------------------------------- | ---------------- | -------------------------------------- |
-| `index_document`(文档→图) | ✅ **原生实现** | 核心写路径 |
-| `delete_document`(删文档对应的实体/关系/chunks) | ✅ **原生实现** | 清理路径 |
-| `query_context`(RAG 查询→图上下文) | ✅ **原生实现** | 被 search pipeline 依赖 |
-| `get_labels`(列实体类型) | ✅ **原生实现** | UI 图浏览器依赖 |
-| `get_knowledge_graph`(拉一张子图) | ✅ **原生实现** | UI 图浏览器依赖 |
-| **description 归一化(LLM 摘要)** | ✅ **原生实现(v2.2)** | 累积多片段 → LLM 总结,详见 §4 |
-| **`merge_entities`(多个 entity 合并成 1 个)** | ✅ **原生实现(v2.2)** | 走 SQL 结构合并 + LLM 总结 description,详见 §4 |
-| `generate_merge_suggestions`(LLM 挖合并候选) | ✅ **迁出 Graph Index** | 现在归属独立 `graph_curation` workflow,详见 `graph_curation.md` |
-| `export_for_kg_eval`(导出评测数据) | ❌ **删除** | 管理工具;需要时可以基于 graphindex 表直接 dump |
-
-
-**被保留下来的两件事**(归一化 + 合并)是被第二轮评审明确要回的:
-没有 LLM 摘要,单纯拼接描述会让高频实体的 description 越长越乱;
-单纯在字符上限处截断又会丢信息。v2.2 的实现用 LLM 总结在写后和合并
-后各做一次压缩,保证语义不丢。详细设计见 §4。
-
-**边界保持不变:Graph Index 仍然不负责 merge suggestion discovery。**
-变化只在于:这条能力已经按第一性原理迁到了独立的
-`graph_curation` 模块,而不是继续停留在 `410 Gone`。`export_for_kg_eval`
-仍然不回;它是单独的管理工具,不属于 Graph Index 主链。
-
----
-
-## 2. 存储选型决策:只实现 PostgreSQL
-
-LightRAG v1 原本支持三个图后端:
-
-- `PGOpsSyncGraphStorage`(PG 模拟)
-- `Neo4JSyncStorage`
-- `NebulaSyncStorage`
-
-**v2 只实现 PostgreSQL**。Neo4j/Nebula 整个代码路径随 LightRAG 一起
-被删除,包括 `aperag/db/neo4j_sync_manager.py`、`aperag/db/nebula_sync_manager.py`、
-对应的驱动依赖(`neo4j`、`nebula3-python`、`nano-vectordb`)。理由:
-
-1. **用户要"简单 + 0 维护 + 免答疑"**。一个后端活得最干净。
-2. PG 已经在 ApeRAG 的主链路里——新增不增加部署组件。
-3. 真的有客户需要 Neo4j 时再加;代码路径可以按 `GraphStore`
- Protocol 从零实现一遍,不需要背上历史包袱。
-
----
-
-## 3. LLM 提取:JSON 而不是 tuple-delimited
-
-LightRAG v1 用自定义的 tuple-delimited 格式:
-
-```
-("entity"<|>Alex<|>person<|>description)##
-("relationship"<|>Alex<|>Bob<|>reason<|>keywords<|>5)##
-<|COMPLETE|>
-```
-
-好处是**轻量**;坏处是**LLM 经常输出格式错位**(少个分隔符、缺一段等),
-解析代码要容错大量 case。
-
-**v2 用 JSON 输出**:
-
-```json
-{
- "entities": [
- {"name": "Alex", "type": "person", "description": "..."}
- ],
- "relations": [
- {"source": "Alex", "target": "Bob", "description": "...", "weight": 5}
- ]
-}
-```
-
-理由:
-
-1. **现代 LLM API 原生支持 `response_format={"type":"json_object"}`**,结构
- 保证由 provider 端做(OpenAI、DeepSeek、通义、文心等主流都支持);
-2. JSON 解析一行代码 `json.loads(...)`,不需要自定义解析器;
-3. 格式错误的恢复代码从 "50 行正则 + 状态机" 降到 "try/except 抛掉坏 chunk";
-4. 实体/关系的字段**明确有类型**(weight 是 int,不是字符串),不需要 post-parse 强转。
-
-代价是输出 token 数略多(每个实体多几个 `"name":` 这类 JSON key),但
-在写路径不是热点,LLM 调用成本也没显著变化。
-
----
-
-## 4. 归一化(description 摘要)+ 合并(merge_entities)
-
-### 4.1 问题陈述
-
-用户反馈:
-
-> upsert_entities 不能简单 concat 再截断 —— 那是在丢信息。merge API
-> 同理,不走 LLM 的"合并"会丢语义。我想要的是**改善**代码,不是
-> **破坏功能**。
-
-触发这段反馈的上一版方案是纯确定性:
-- `upsert_entities`:`description = current || "\n\n" || new_fragment`,超过 4000 字符就截断到词边界 + `"…"`。
-- `merge_entities`:SQL 合并,结束;**不调 LLM**。
-
-问题:
-- 在高频实体(例如 "张三" 被 30 个 chunk 提到)上,accumulate 会很快撞上 4000 字符的硬上限,随后 **70% 的描述被截断**。
-- 合并后的 description 是 N 个碎片堆在一起,既难读又让下游的 retrieval prompt 膨胀。LightRAG 原版在 `force_llm_summary_on_merge` 分支会走 LLM 汇总;我们直接扔掉了这条路径。
-
-本次修订把 LLM 摘要加回来,但**关键分层没动**:存储层依旧不碰 LLM,
-压缩决策完全由 service 层负责。
-
-### 4.2 分层职责
-
-```
-┌───────────────────────────────────────────────────────┐
-│ aperag/graphindex/service.py GraphIndexService │
-│ • index_document 末尾 sweep oversized → LLM 摘要 │
-│ • merge_entities: store.merge_entities 之后按需 LLM 摘要 │
-│ • _should_summarize / _summarize / _fallback_truncate │
-└───────────────────────────────────────────────────────┘
- │ rewrite_entity_description / rewrite_relation_description
- ▼
-┌───────────────────────────────────────────────────────┐
-│ aperag/graphindex/storage/base.py GraphStore Protocol│
-│ • upsert_entities / upsert_relations — 纯 concat │
-│ • merge_entities — 纯 SQL 结构合并 │
-│ • find_oversized_entities / find_oversized_relations │
-│ • rewrite_entity_description / rewrite_relation_description │
-└───────────────────────────────────────────────────────┘
- │
- ▼
-┌───────────────────────────────────────────────────────┐
-│ aperag/graphindex/storage/postgres.py │
-│ • ON CONFLICT DO UPDATE: concat + substring-dedup │
-│ • 没有字符 cap、没有 LLM │
-└───────────────────────────────────────────────────────┘
-```
-
-**好处**:存储层可以在没有 LLM stub 的情况下做集成测试(`test_postgres_store.py`)。
-Service 层可以用纯 Python stub 测 LLM 决策分支(`test_service.py`)。
-
-### 4.3 累积规则(upsert 路径)
-
-**SQL 语句**:
-
-```sql
-description = CASE
- WHEN existing IS NULL OR existing = '' THEN incoming
- WHEN incoming IS NULL OR incoming = '' THEN existing
- WHEN position(incoming IN existing) > 0 THEN existing
- ELSE existing || :sep || incoming
-END
-```
-
-- **`:sep` = `"\n\n"`**,定义在 `aperag.domains.knowledge_graph.graphindex.dto.DESCRIPTION_SEPARATOR`。
-- **substring dedup**:同一份 boilerplate 出现在多个 chunk 时不会被写两次。这是第一版最常见的抱怨("我的 description 在重复自己")。
-- **没有 cap**:上层决定是否摘要、何时摘要。SQL 层的 contract 是"写多少存多少,只做 dedup"。
-
-### 4.4 摘要触发条件(`_should_summarize`)
-
-```python
-fragments = description.split("\n\n")
-if len(fragments) >= summarize_at_fragments: # 默认 6
- return True
-if len(description) >= max_description_chars: # 默认 4000
- return True
-return False
-```
-
-**两个阈值的职责不同**:
-- `summarize_at_fragments=6`(默认):正常触发点。LightRAG 默认 10;
- 我们取 6,因为**当 description 已经是 6 段拼凑出来的时候,人眼
- 读起来就已经像"N 条流水账"而不是一段连贯描述了**,早一点摘要能
- 让 RAG prompt 更紧凑。
-- `max_description_chars=4000`(默认):安全兜底。正常路径下不会
- 命中(6 片段 × 平均 400 字符 ≈ 2400)。只有当 `llm=None`(开发模式
- 禁用了 LLM)或者一个 chunk 里塞满了同一个实体时会兜到。
-
-### 4.5 摘要实现(`_summarize_description`)
-
-```python
-prompt = render_summarization_prompt(
- subject_kind="entity" | "relation",
- subject_label=entity.name | f"{src}→{tgt}",
- fragments=description.split("\n\n"),
- language=config.extraction_language,
- target_chars=config.summary_target_chars, # 默认 800
-)
-raw = await self._llm(prompt)
-```
-
-prompt(`aperag/graphindex/prompts.py::DESCRIPTION_SUMMARIZATION`)
-的核心约束:
-
-1. "每个 fragment 的事实都必须保留" —— 明确反对选择性丢弃;
-2. 矛盾点**两份都留**,下游 pipeline 再做冲突标注(不让 LLM 自作主张挑一边);
-3. 不添加原文外的信息;
-4. 只输出纯文本,不要 JSON 外壳 —— 避免单独写一个解析分支。
-
-**失败处理**:LLM 抛异常 / 返回空 → `_fallback_truncate`(词边界截断 +
-`" … [truncated]"` 标记)。`[truncated]` marker 是可 grep 的,运维需要
-的时候可以批量审计哪些行是"降级"写入的。
-
-### 4.6 merge_entities(合并 API)
-
-**两步走**,故意不放在一个 SQL transaction 里:
-
-**第 1 步** — `PostgresGraphStore.merge_entities`(纯 SQL):
-
-1. `SELECT ... FOR UPDATE` 锁 target + 每个 source;
-2. 在 Python 里 dedup 拼接 target.description + 每个 source.description;
-3. `SELECT` 出所有涉及 source 的 edge;`DELETE` 掉它们;
-4. 在 Python 里做 endpoint rewrite(source_id/target_id → target_id);自环 drop;**同 key 的 redirected edge 在 Python 里合并(union chunks / max weight / concat description)**,避免 `INSERT ... ON CONFLICT` 在一条语句里撞到两行同 key 触发 PG 的 `CardinalityViolationError`;
-5. `UPDATE` target 行(new description + union chunk_ids);
-6. `DELETE` source 行;
-7. 返回 `MergeEntitiesResult`(包括 **pre-summary** 的 description)。
-
-**第 2 步** — `GraphIndexService.merge_entities`(LLM 决策):
-
-```python
-result = await store.merge_entities(...)
-if self._should_summarize(result.description):
- summary = await self._summarize_description(...)
- await store.rewrite_entity_description(..., summary)
- result = dataclasses.replace(result, description=summary)
-return result
-```
-
-为什么**分两步、不把 LLM 放进同一个 transaction**:
-- LLM 调用延迟高(P95 几秒),不能占着 PG 的行锁;
-- 摘要失败时结构合并不应回滚 —— 结构合并的价值独立于描述美化;
-- 单元测试可以用 `_StubStore` 直接测 service 决策,不需要真 PG。
-
-### 4.7 配置
-
-`GraphIndexConfig`:
-
-| 字段 | 默认 | 含义 |
-| --------------------------------- | --- | ---------------------------------------- |
-| `summarize_at_fragments` | 6 | 达到多少个 `\n\n` 片段就触发 LLM 摘要 |
-| `max_description_chars` | 4000| 硬上限 / 兜底阈值 |
-| `summary_target_chars` | 800 | 给摘要 prompt 的目标字数(实际会浮动) |
-
-约束(`__post_init__`):
-- `summarize_at_fragments >= 2`;
-- `summary_target_chars < max_description_chars`(否则刚摘要完又会触发截断)。
-
-### 4.8 测试覆盖
-
-- `test_dto.py`:`MergeEntitiesResult` 字段锁定、`DESCRIPTION_SEPARATOR == "\n\n"`。
-- `test_postgres_store.py`(gated on `APERAG_TEST_GRAPHINDEX_PG_URL`):
- - `test_upsert_entity_accumulates_descriptions`:多次 upsert 拼接 N 段;
- - `test_upsert_entity_dedupes_identical_fragments`:相同片段不重复写入;
- - `test_find_oversized_entities_returns_rows_past_threshold`:按 char / fragment 阈值查询;
- - `test_rewrite_entity_description_replaces_in_place`:整段替换;
- - `test_merge_entities_redirects_edges_and_unions_chunks`:结构合并 + edge redirect + 自环 drop + 重复 edge collapse;
- - `test_merge_entities_missing_target_raises`。
-- `test_service.py`:
- - `test_index_document_summarizes_oversized_entities_via_llm`:**保证走 LLM,不是截断**;
- - `test_summarization_falls_back_to_truncation_only_when_llm_fails`:LLM 故障降级;
- - `test_index_document_skips_summary_when_no_oversized_rows`:happy path 不付出 LLM 成本;
- - `test_merge_entities_summarizes_merged_description`:合并后必定调 LLM + 持久化;
- - `test_merge_entities_skips_summary_on_short_description`:小合并不浪费 LLM call。
-
----
-
-## 5. 模块结构
-
-```
-aperag/graphindex/
-├── __init__.py # 仅 re-export 公共符号
-├── service.py # GraphIndexService:5 个 async 方法 + Celery 同步包装
-├── dto.py # 所有 DTO 集中定义(Entity/Relation/Chunk/...)
-├── config.py # GraphIndexConfig:注入式,不读 env
-├── prompts.py # 独立 LLM prompt;JSON 输出;一套即可
-├── engine/
-│ ├── __init__.py
-│ ├── chunking.py # 文档切分:简单按 token 窗口+重叠
-│ ├── extraction.py # 单次 LLM 调用:chunk → {entities, relations}
-│ └── indexer.py # 协调:chunks → extractions → persist
-├── storage/
-│ ├── __init__.py
-│ ├── base.py # GraphStore Protocol(~10 方法,全 DTO 化)
-│ └── postgres.py # PostgresGraphStore:唯一可用实现
-└── models.py # SQLAlchemy 模型:graphindex_nodes / _edges / _chunks
-```
-
-**文件数:~11**。每个文件职责单一;`service.py` 是唯一外部 import 入口。
-
-### 4.1 `service.py` 的契约
-
-```python
-from aperag.domains.knowledge_graph.graphindex import GraphIndexService
-
-svc = GraphIndexService.from_config(config) # 单例也行、每次 new 也行
-
-# 写
-await svc.index_document(collection_id, doc_id, content, file_path)
-# 删
-await svc.delete_document(collection_id, doc_id)
-# 查(RAG 用)
-ctx = await svc.query_context(collection_id, query, top_k=10)
-# UI
-labels = await svc.get_labels(collection_id)
-kg = await svc.get_knowledge_graph(collection_id, label="*", max_depth=2, max_nodes=500)
-```
-
-**没有第 6 个方法**。策展功能暂留 v1。
-
-### 4.2 `GraphStore` Protocol(`storage/base.py`)
-
-10 个方法,覆盖 v2 全部需要:
-
-```python
-class GraphStore(Protocol):
- # collection lifecycle
- async def ensure_schema(self) -> None: ... # DDL 幂等
- async def drop_collection(self, collection_id: str) -> None: ...
-
- # write
- async def upsert_chunks(self, collection_id: str, chunks: Sequence[Chunk]) -> None: ...
- async def upsert_entities(self, collection_id: str, entities: Sequence[Entity]) -> None: ...
- async def upsert_relations(self, collection_id: str, relations: Sequence[Relation]) -> None: ...
-
- # delete
- async def delete_document_rows(self, collection_id: str, doc_id: str) -> DeleteDocumentResult: ...
-
- # read
- async def find_entities_by_names(self, collection_id: str, names: Sequence[str]) -> list[Entity]: ...
- async def find_entities_near(
- self, collection_id: str, anchor_ids: Sequence[str], max_hop: int, limit: int
- ) -> tuple[list[Entity], list[Relation]]: ...
- async def list_labels(self, collection_id: str) -> list[str]: ...
- async def list_subgraph(
- self, collection_id: str, label: str | None, max_depth: int, max_nodes: int
- ) -> KnowledgeGraph: ...
-```
-
-**比 LightRAG 的 `BaseGraphStorage`(24 方法)小一半**,因为 v2 不需要:
-
-- `has_node` / `has_edge`:`upsert_`* 是幂等的,调用方不需要先 exist-check;
-- `node_degree` / `edge_degree` + 所有 `*_batch` 变体:读路径不用;
-- `get_nodes_by_source_ids` / `get_top_degree_nodes`:UI 查 label 用
-`list_subgraph` 即可;
-- `remove_nodes` / `remove_edges`:用 `delete_document_rows` 按 doc 删,
-不按 node 删。
-
-**设计原则**:每个抽象方法都有实际调用方;没有调用方 == 不该存在。
-
----
-
-## 6. 数据模型
-
-### 5.1 新表(不碰 `lightrag_graph_`*)
-
-```sql
-CREATE TABLE graphindex_nodes (
- id BIGSERIAL PRIMARY KEY,
- collection_id TEXT NOT NULL,
- entity_id TEXT NOT NULL, -- hash(collection_id + normalized_name)
- name TEXT NOT NULL,
- type TEXT NOT NULL,
- description TEXT NOT NULL DEFAULT '',
- source_chunks TEXT[] NOT NULL DEFAULT '{}', -- chunk_id list
- created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
- updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
- UNIQUE (collection_id, entity_id)
-);
-CREATE INDEX graphindex_nodes_cid_type ON graphindex_nodes (collection_id, type);
-CREATE INDEX graphindex_nodes_cid_name ON graphindex_nodes (collection_id, name);
--- GIN on source_chunks for "find nodes touching this chunk" queries:
-CREATE INDEX graphindex_nodes_source_chunks ON graphindex_nodes USING GIN (source_chunks);
-
-CREATE TABLE graphindex_edges (
- id BIGSERIAL PRIMARY KEY,
- collection_id TEXT NOT NULL,
- source_id TEXT NOT NULL, -- entity_id
- target_id TEXT NOT NULL,
- description TEXT NOT NULL DEFAULT '',
- weight NUMERIC(6,3) NOT NULL DEFAULT 0,
- source_chunks TEXT[] NOT NULL DEFAULT '{}',
- created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
- updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
- UNIQUE (collection_id, source_id, target_id)
-);
-CREATE INDEX graphindex_edges_cid_src ON graphindex_edges (collection_id, source_id);
-CREATE INDEX graphindex_edges_cid_tgt ON graphindex_edges (collection_id, target_id);
-
-CREATE TABLE graphindex_chunks (
- id BIGSERIAL PRIMARY KEY,
- collection_id TEXT NOT NULL,
- chunk_id TEXT NOT NULL, -- stable UUID
- doc_id TEXT NOT NULL,
- order_in_doc INTEGER NOT NULL,
- text TEXT NOT NULL,
- file_path TEXT NOT NULL DEFAULT '',
- created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
- UNIQUE (collection_id, chunk_id)
-);
-CREATE INDEX graphindex_chunks_cid_doc ON graphindex_chunks (collection_id, doc_id);
-```
-
-**三张表**(vs v1 的 `lightrag_graph_nodes` / `lightrag_graph_edges` +
-按 namespace 拆分的 KV / vector 约 8 张表)。旧表在这个 PR 里被一次性
-drop(见 §6)。
-
-### 5.2 实体向量:复用 `aperag/vectorstore`
-
-v2 **不在 graphindex 内部管理实体向量**。实体的向量落在
-`aperag/vectorstore/pgvector` 的 `aperag_vectors__cosine` 里,通过
-`VectorPoint.payload = {"kind": "entity", "collection_id": ..., "entity_id": ...}`
-打标。检索时用 `VectorStoreConnector.search(QueryRequest(..., flt=...))` 加
-DSL 过滤。
-
-这样**向量抽象只有一套**,运维看到的就是统一的 `aperag_vectors_*` 集合。
-
----
-
-## 7. 数据切换策略
-
-- **旧表彻底删除**。新 Alembic 迁移 `f1e2d3c4b5a6` 一次性 drop:
- - `lightrag_graph_nodes` / `lightrag_graph_edges`
- - `lightrag_doc_chunks` / `lightrag_vdb_entity` / `lightrag_vdb_relation`
- - `graph_index_merge_suggestions` / `graph_index_merge_suggestions_history`
-- **不做数据迁移**。用户口径是"切换"——每个 collection 按需 re-index 到
- `graphindex_*`,旧 LightRAG 表里的内容直接丢弃。
-- **没有 fallback、没有 cutover 标记表**。上一版文档描述的 "双后端回退 +
- `graphindex_collection_state`" 方案被删除:无 LightRAG 可回退的情况下,
- 这套机制是纯维护负担。一个新建 collection 在首次 `index_document`
- 之前 `get_labels` 就会返回空 list,这是正确行为,不是 bug。
-
----
-
-## 8. 业务层切换点
-
-所有 5 处 graph truth 调用都改指 `aperag/graphindex`;merge suggestion
-改由独立的 `graph_curation` 模块接管;只剩 `kg-eval` 导出路由保留 `410`。
-
-
-| 位置 | 本 PR 改动 |
-| ---------------------------------------------------------- | ------------------------------------------------------------------------ |
-| `aperag/service/graph_service.py::get_graph_labels` | → `graphindex.get_labels` |
-| `aperag/service/graph_service.py::get_knowledge_graph` | → `graphindex.get_knowledge_graph` |
-| `aperag/service/search_pipeline_service.py::_graph_search` | → `graphindex.query_context` |
-| `aperag/tasks/collection.py::_delete_knowledge_graph_data` | → `run_drop_collection_sync` |
-| `aperag/tasks/document.py`(3 处 celery 调用) | → `run_index_document_sync` / `run_delete_document_sync` |
-| `aperag/service/prompt_template_service.py::hardcoded[graph]` | → `graphindex.prompts.ENTITY_RELATION_EXTRACTION` |
-| `aperag/views/graph.py::merge_nodes_view` | **200**,委托 `graph_service.merge_entities` → `GraphIndexService.merge_entities` |
-| `aperag/views/graph.py::merge_suggestions*` | → `graph_curation_service.start_run / get_latest / handle_action` |
-| `aperag/views/graph.py::export_kg_eval_view` | **410 Gone**(管理工具,本次范围外) |
-
-
----
-
-## 9. 代码质量约束
-
-全文档贴一条(避免未来再问为什么这样写):
-
-- **禁止 from aperag.domains.knowledge_graph.graphindex.engine import ...**:业务层只准 `from
- aperag.domains.knowledge_graph.graphindex import GraphIndexService`(和 DTO)。engine 是内部。
-- **禁止 models.py 被外部导入**:SQLAlchemy 模型只给 `storage/postgres.py`
- 和 alembic 用。
-- **每个 public 方法有 docstring**,解释:做什么、幂等性、线程安全性、
- 生命周期。
-- **每个 module 顶部有 module docstring**,解释本文件的边界和不负责什么。
-- **测试覆盖**:每个 public 方法 **至少 1 个** unit test +(gated by env
- 的)1 个 integration test。
-
----
-
-## 10. 不做什么
-
-- 不做 `BaseGraphStorage` 那种 24 方法的胖接口;
-- 不做 "gleaning"(LightRAG 的多轮提取)—— 一次 LLM 调用足够,多轮复杂度
- 不值这么点召回率;
-- 不做 incremental merge(LLM 判断两个 entity 该不该合并)—— 不属于
- Graph Index 层的责任;
-- 不做 workspace 嵌套—— 直接用 `collection_id` 作为分区键;
-- 不做 KV storage / doc status storage —— 这些是 LightRAG 内部实现细节,
- v2 的流程不需要它们;
-- 不做 v1 ↔ v2 数据迁移工具 —— 切换是硬切换,用户按需 re-index。
diff --git a/docs/zh-CN/design/lightrag_refactor.md b/docs/zh-CN/design/lightrag_refactor.md
deleted file mode 100644
index eae1f4865..000000000
--- a/docs/zh-CN/design/lightrag_refactor.md
+++ /dev/null
@@ -1,581 +0,0 @@
-# LightRAG 模块重构方案
-
-> Status: **设计与计划文档(仅文档,无代码改动)**。本文与同目录
-> [`graph_db_abstraction.md`](./graph_db_abstraction.md) 是姊妹文档,
-> 两份一起读才能判断先做哪个、还是一起做。
-
----
-
-## 0. 本文回答三个问题
-
-1. **LightRAG 模块现在"跟 ApeRAG 架构不搭"具体是什么?**(§3 事实清单)
-2. **应该怎么重构它?**(§5 目标形态 + §6 分阶段方案)
-3. **LightRAG 重构 vs 图数据库抽象层,先做哪个?还是一起做?**(§7)
-
----
-
-## 1. 背景
-
-用户原话:
-
-> 我的这个 ApeRAG 项目一开始的 GraphRAG 模块是使用的是 LightRAG 的代码,
-> 当时为了快速启动,直接把他们的代码库拿了进来并进行了深度修改。
->
-> 但这带来了一个历史负担:
-> 1. 他们的代码写得不是特别好
-> 2. 作为一个相对独立的模块存在于我的系统中,跟我的架构并不是特别搭
->
-> 理论上来说,LightRAG 应该成为一个类似于 Web 架构中的 service 模块,
-> 这样其他模块调用和使用它都会变得更顺畅一些。
-
-解构一下,其实是三个独立的问题叠在一起:
-
-| 抱怨 | 实际症结 | 解决手段 |
-|---|---|---|
-| 他们代码写得不好 | 内部实现质量(方法过多、默认实现坑、命名混乱) | 内部重构(**可延后**) |
-| 跟我的架构不搭 | 没有清晰的对外接口 / 类型跨模块泄漏 | 定义 facade(**应先做**) |
-| 应该是 service 模块 | 生命周期笨重 / 调用方直接 new 实例 | facade + 生命周期接管(**应先做**) |
-
-**关键判断**:三件事里,"定义对外接口"是**根因**;把它做掉,另外两个
-问题一半以上自动消解;而即便只做这一件也能立刻看到架构清晰度的提升。
-这是本文建议先做的部分。
-
----
-
-## 2. 术语澄清:"service 模块"是什么
-
-> "类似于 Web 架构中的 service 模块"
-
-在 Web 后端架构里,"service 模块"通常指**进程内的服务层**:夹在
-controller/view 与 repository 之间,负责组织业务流程、对外暴露一组
-清晰方法、对内协调多个数据源。**不**等于独立部署的微服务(microservice)。
-
-**本文按这个口径定义 LightRAG 重构目标**:
-
-- 目标形态 = ApeRAG 进程内的一个 **service 模块**(in-process);
-- 对外暴露的接口 = 一组方法 + 一组 DTO;
-- 对内管理 LightRAG 的全部实现细节,业务代码不再直接 import 它的内部
- 类型;
-- **物理拆成独立 service** 是**后续可选**步骤(§6.3),不是本次重构的
- 硬目标。
-
-这个区分很重要,因为——
-
-- 如果目标是 microservice,现在就要设计 OpenAPI、HTTP 客户端、数据序列
- 化、部署流水线。工作量 ≥ 两周。
-- 如果目标是 in-process service 模块,**工作量是两天**:新增一个
- facade 文件 + 搬运几处 import。
-
-**本文推荐先做 in-process 形态**,并在 §6.3 说明什么时候再升级成
-microservice。
-
----
-
-## 3. 现状事实清单:跨模块耦合
-
-这一节基于对代码库的实测扫描。LightRAG 的内部类型泄漏到 ApeRAG
-其他模块的位置(截至 2026-04-22):
-
-### 3.1 业务层直接 import LightRAG 内部类型
-
-```text
-aperag/tasks/collection.py:27 from aperag.graph import lightrag_manager
-aperag/tasks/document.py:112,214,302 from aperag.graph.lightrag_manager import process_document_for_celery, delete_document_for_celery
-aperag/service/graph_service.py:22 from aperag.graph import lightrag_manager
-aperag/service/graph_service.py:23 from aperag.graph.lightrag.types import KnowledgeGraph
-aperag/service/search_pipeline_service.py:265 from aperag.graph import lightrag_manager
-aperag/service/search_pipeline_service.py:266 from aperag.graph.lightrag import QueryParam
-aperag/service/prompt_template_service.py:156 from aperag.graph.lightrag.prompt import PROMPTS
-aperag/db/repositories/graph.py:23 from aperag.graph.lightrag.prompt import GRAPH_FIELD_SEP
-```
-
-8 处泄漏,三种不同形态:
-
-1. **manager / 工厂** 被直接 import(4 处)—— 业务代码自己 `create_lightrag_instance`
- 并管理 `try/finally`;这就是 §3.3 bug 的温床。
-2. **内部 DTO** 被直接 import:`KnowledgeGraph`、`QueryParam`(2 处)——
- 换实现就得同步改 DTO 定义。
-3. **内部常量** 被直接 import:`PROMPTS`、`GRAPH_FIELD_SEP`(2 处)——
- `aperag/db/repositories/graph.py` 作为最底层都要知道 LightRAG 的分
- 隔符,这是**明显的层次反转**。
-
-### 3.2 生命周期管理散落
-
-LightRAG 实例的 `try/finally finalize_storages()` 模式在 6 个业务 handler
-里重复:
-
-```text
-aperag/service/graph_service.py 5 处 (get_graph_labels / get_knowledge_graph / generate_merge_suggestions / _execute_merge_operation / export_for_kg_eval)
-aperag/tasks/collection.py 1 处 (_delete_lightrag)
-aperag/graph/lightrag_manager.py 2 处 (_process_document_async / _delete_document_async)
-```
-
-重复 = 容易漏。实测里就有 1 处漏写:
-
-### 3.3 已被坐实的 bug:`_graph_search` 没有 finalize
-
-`aperag/service/search_pipeline_service.py:265-273`:
-
-```python
-rag = await lightrag_manager.create_lightrag_instance(collection)
-param = QueryParam(mode="hybrid", only_need_context=True, top_k=top_k)
-context = await rag.aquery_context(query=query, param=param)
-if not context:
- return []
-return [DocumentWithScore(text=context, metadata={"recall_type": "graph_search"})]
-```
-
-**没有 `try/finally`,没有 `finalize_storages()`**。这条代码路径每次
-图检索都泄漏一批 storage 对象的引用,等 GC。
-
-**这就是"没有 service 模块接管生命周期"的直接代价**——生命周期责任散在
-8 个业务 handler 里,漏 1 个就是 bug,没办法靠 code review 长期盯住。
-
-### 3.4 配置通过环境变量传递
-
-`lightrag_manager.create_lightrag_instance` 每次调用都:
-
-```python
-kv_storage = os.environ.get("GRAPH_INDEX_KV_STORAGE")
-vector_storage = os.environ.get("GRAPH_INDEX_VECTOR_STORAGE")
-graph_storage = os.environ.get("GRAPH_INDEX_GRAPH_STORAGE")
-```
-
-—— 函数内部读全局环境变量。后果:
-
-- 单元测试想换后端就要改环境变量,不能用 DI;
-- 同一进程不能用不同后端(测试想同时跑 pg 和 neo4j 场景时难办);
-- 配置来源隐式,看函数签名完全不知道它会读 env。
-
-这是 12-factor 的典型 anti-pattern:**配置应该在构造时注入,不是在每次
-调用时读**。
-
-### 3.5 `lightrag_manager` 与 ApeRAG globals 的耦合
-
-`aperag/graph/lightrag_manager.py` 导入:
-
-```python
-from aperag.db.models import Collection # ApeRAG schema
-from aperag.db.ops import db_ops # ApeRAG repository
-from aperag.llm.embed.base_embedding import get_collection_embedding_service_sync # ApeRAG LLM
-from aperag.schema.utils import parseCollectionConfig # ApeRAG schema
-```
-
-换句话说:"LightRAG 模块"知道 ApeRAG 的 collection schema、db_ops、embedding
-service 是什么。这就是"**跟我的架构不是特别搭**"的具体症状:**不是
-ApeRAG 依赖 LightRAG,而是 LightRAG 依赖 ApeRAG**——模块边界被反向
-穿透了。
-
-一个清晰的 service 模块应该倒过来:ApeRAG 业务代码 → 调 service 模块 →
-service 模块内部用独立的小函数去做事,这些小函数接受注入的
-embedding/LLM/storage 工厂,而不是 `from aperag.db.ops import db_ops`。
-
-### 3.6 内部代码体量
-
-```text
-aperag/graph/lightrag/ ~8700 行
- ├─ lightrag.py 1800 (LightRAG 主类)
- ├─ operate.py 2368 (抽取 / 合并 / 查询 operations)
- ├─ utils.py 731
- ├─ utils_graph.py 680
- ├─ base.py 659 (BaseGraphStorage / BaseKVStorage / BaseVectorStorage)
- ├─ prompt.py 493
- ├─ kg/ (3 后端 + kv + vector) ~2900
- └─ ...
-aperag/graph/lightrag_manager.py 347
-```
-
-**~10k 行代码**。这是"作为一个相对独立的模块存在于我的系统中"的规模。
-任何深度重构都要考虑这个量级——不是一个下午能搞定的。
-
----
-
-## 4. 诊断:三类问题,三类解决手段
-
-把 §3 的事实归类:
-
-| 类别 | 症状 | 根因 | 建议 |
-|---|---|---|---|
-| **接口泄漏**(§3.1、§3.2、§3.3) | 8 处跨模块导入 + 生命周期散落 + 因此产生的 bug | 没有对外 facade | **Phase 1,必做** |
-| **配置劫持**(§3.4) | 函数内读 env | 没有依赖注入 | **Phase 1 附带** |
-| **反向依赖**(§3.5) | LightRAG 知道 ApeRAG schema | 历史为了快速搬代码的捷径 | **Phase 2,可延后** |
-| **内部代码质量**(§3.6) | 巨文件、命名、默认实现坑 | 上游代码被深度修改后没整理 | **Phase 2/3,可延后** |
-
-**Phase 1 必做** 是因为不做就持续踩 §3.3 那类 bug;**Phase 2 可延后**
-是因为"他们代码不好"不是紧迫问题——**在一个对外接口干净的模块内部**,
-代码再糟都可以慢慢修,**不影响 ApeRAG 其他模块**。
-
----
-
-## 5. 重构后的目标形态
-
-### 5.1 对外(ApeRAG 业务代码看到的)
-
-**唯一的 public API 位于一个文件**:
-
-```python
-# aperag/graph/service.py (NEW — or reuse existing aperag/graph/__init__.py)
-
-from aperag.graph.dto import (
- KnowledgeGraph, GraphLabels, GraphContext,
- MergeSuggestion, MergedNode, KGEvalExport,
- IndexDocumentResult, DeleteDocumentResult,
-)
-
-
-class GraphIndexService:
- """Business-facing service module for knowledge-graph operations.
-
- Every ApeRAG module that needs to read / write / query the knowledge
- graph imports ONLY from this class and the DTOs next to it. Anything
- else in ``aperag/graph/`` is implementation detail.
- """
-
- def __init__(self, *, config: GraphIndexConfig) -> None: ...
-
- async def index_document(self, collection, doc_id, content, file_path) -> IndexDocumentResult: ...
- async def delete_document(self, collection, doc_id) -> DeleteDocumentResult: ...
- async def query_context(self, collection, query, top_k) -> GraphContext: ...
- async def get_labels(self, collection) -> GraphLabels: ...
- async def get_knowledge_graph(self, collection, label, max_depth, max_nodes) -> KnowledgeGraph: ...
- async def generate_merge_suggestions(self, collection, top_k) -> list[MergeSuggestion]: ...
- async def merge_nodes(self, collection, source_ids, target_id) -> MergedNode: ...
- async def export_for_kg_eval(self, collection) -> KGEvalExport: ...
-```
-
-**对外就是这 9 个方法 + 约 9 个 DTO**。`PROMPTS` / `GRAPH_FIELD_SEP` 如果
-业务层真的需要(§3.1 的 2 处),re-export 到 `aperag/graph/dto.py`;更
-好的选择是把它们用到的地方提取成 service 方法(不让常量跨模块)。
-
-### 5.2 对内(不变的 + 微调的)
-
-```
-aperag/graph/
-├── service.py <- NEW: GraphIndexService 入口
-├── dto.py <- NEW: 9 个 DTO 集中定义
-├── config.py <- NEW: GraphIndexConfig(注入式,不再读 env)
-├── lifecycle.py <- NEW: request-scoped rag 缓存 / pool
-├── lightrag_manager.py <- 保留:包成 service 内部的工厂,不再被业务 import
-└── lightrag/ <- 保留:全部 ~8700 行代码原样不动
- ├── lightrag.py
- ├── operate.py
- ├── base.py
- ├── kg/
- └── ...
-```
-
-**Phase 1 不动 `aperag/graph/lightrag/` 内部的任何一行代码**。这是本方案
-的核心克制:重构是外部接口,不是内部实现。
-
-### 5.3 整体图
-
-```text
-┌──────────────────────────────────────────────────────┐
-│ ApeRAG 业务代码 │
-│ (graph_service / search_pipeline / tasks / ...) │
-└──────────────────┬───────────────────────────────────┘
- │ 依赖 import
- ▼
-┌──────────────────────────────────────────────────────┐
-│ aperag/graph/service.py (GraphIndexService) │ <- facade
-│ aperag/graph/dto.py (DTOs) │ <- types
-│ aperag/graph/config.py (GraphIndexConfig) │ <- config
-└──────────────────┬───────────────────────────────────┘
- │ 内部实现
- ▼
-┌──────────────────────────────────────────────────────┐
-│ aperag/graph/lightrag_manager.py (工厂) │
-│ aperag/graph/lifecycle.py (生命周期) │
-└──────────────────┬───────────────────────────────────┘
- │ 包装
- ▼
-┌──────────────────────────────────────────────────────┐
-│ aperag/graph/lightrag/ (~8700 lines, 原样不动) │
-│ ├─ LightRAG + operate + base + prompt │
-│ └─ kg/ (PG / Neo4j / Nebula) │
-└──────────────────────────────────────────────────────┘
-```
-
-**两条边界线**:
-
-1. **facade 边界**(`service.py`):业务代码不能越过这条线往下看。
-2. **engine 边界**(`lightrag/` 目录):service 模块内部不需要关心这个
- 目录里怎么实现,只通过 `lightrag_manager` + `lifecycle` 使用它。
-
----
-
-## 6. 分阶段执行方案
-
-### Phase 1:建立 facade(推荐**先做**,~3 天)
-
-落地清单:
-
-- [ ] 新增 `aperag/graph/service.py`:`GraphIndexService` 类,9 个方法
- 的身体是"调 `create_lightrag_instance` + `try/finally
- finalize_storages()`"的直接搬运。
-- [ ] 新增 `aperag/graph/dto.py`:9 个 DTO。
- - `GraphContext`、`KnowledgeGraph`、`GraphLabels`、`MergeSuggestion`、
- `MergedNode`、`KGEvalExport`、`IndexDocumentResult`、
- `DeleteDocumentResult`;
- - `aperag/graph/lightrag/types.py::KnowledgeGraph` 不删,在 DTO 层
- 薄包装或直接 re-export。
-- [ ] 新增 `aperag/graph/config.py`:`GraphIndexConfig` dataclass。接受
- `kv_storage / vector_storage / graph_storage` 等字段;**从 env 读一次**
- 存到单例里,整个 service 生命周期只读这一次。
-- [ ] 新增 `aperag/graph/lifecycle.py`:两件事
- - request-scoped `rag` cache(FastAPI dependency)——同一请求内多次
- 调 `GraphIndexService` 复用同一个 LightRAG 实例;
- - 进程级 `GraphIndexService` 单例入口。
-- [ ] 迁移 8 处业务层 import(§3.1):
- - `graph_service.py` / `search_pipeline_service.py` / `tasks/collection.py`
- / `tasks/document.py` 改为 `from aperag.graph.service import
- graph_index_service`;
- - `prompt_template_service.py` / `db/repositories/graph.py` 的常量依赖
- 单独处理(re-export 或消除)。
-- [ ] `search_pipeline_service._graph_search` 的 finalize 漏洞(§3.3)
- **自然消失**——生命周期不再是业务代码的责任。
-- [ ] 单元测试:`GraphIndexService` 的契约测试,mock LightRAG 内部。
-
-**工作量估计**:
-- 纯添加 facade:半天;
-- 迁移 8 处导入 + 回归测试:1 天;
-- 生命周期 + 配置注入:1 天;
-- 代码 review + 小修小补:0.5 天。
-
-**Phase 1 产出**:业务代码 `grep -r "lightrag" aperag/service aperag/tasks
-aperag/db` 为 0 命中。这就是"跟我的架构搭了"的客观指标。
-
-### Phase 2:内部整理(按需,~1 周)
-
-在 Phase 1 完成之后,LightRAG 内部代码怎么烂都**不影响 ApeRAG 其他模块**。
-所以 Phase 2 的东西**完全可以不做**,做也只在有明确收益时做:
-
-- [ ] 改名:`aperag/graph/lightrag/` → `aperag/graph/engine/`(或别的
- 不带 "LightRAG" 字样的名字)。一次性改所有 import,~30 个文件的字符
- 串替换,纯机械工作。
-- [ ] 消除 §3.5 的反向依赖:
- - `lightrag_manager.py` 不再 import `aperag.db.models.Collection`;
- 改成接受 "workspace_id + collection_config + embedding_func +
- llm_func" 四个原始参数;
- - `kg/pg_ops_sync_*.py` 内部不再 `from aperag.db.ops import db_ops`;
- 通过构造注入一个 `DbOps` 接口;
- - 这一步让"engine"模块**理论上**可以独立发布成 package;**实际上**
- 本次不做这一步,标在 §6.3 未来拆服务时做。
-- [ ] 拆分 `lightrag.py`(1800 行)和 `operate.py`(2368 行):两个文件
- 各自拆成 3~5 个小文件。**看需要做**。
-- [ ] 清理 R1/R2/R3 / R4/R5/R6 的内部问题(详见
- [`graph_db_abstraction.md`](./graph_db_abstraction.md) §3)。
-
-**Phase 2 最大的收益**不在代码质量本身,而在于**让"这块代码将来真的能
-搬出去"变得可能**。Phase 1 只是画了门,Phase 2 是确保门后的房子可以被
-整体搬走。
-
-### Phase 3:物理拆成独立 service(远期,独立项目)
-
-当且仅当以下**至少一个**信号出现再启动:
-
-- 图索引的 LLM 调用集中消耗了应用进程的 GIL / 异步调度预算;
-- 不同租户的图索引需要**独立的资源配额**(CPU / 内存隔离);
-- 要让图索引集群独立水平扩展(比如接海量文档摄入);
-- 有独立的 lightrag team 要专门维护这个 service。
-
-如果以上都没有,**拆成独立 service 是负收益**:新增运维组件、RPC 延迟、
-版本兼容面。
-
-Phase 3 本身的设计见 [`graph_db_abstraction.md`](./graph_db_abstraction.md) §4.1
-的拓扑图;本文档不再展开。
-
----
-
-## 7. 先做 LightRAG 重构还是先做图 DB 抽象?
-
-**推荐:先做 LightRAG 重构的 Phase 1**。理由如下。
-
-### 7.1 两份方案的关系重读
-
-| | 图 DB 抽象方案(姊妹文档) | LightRAG 重构方案(本文) |
-|---|---|---|
-| **Layer A**(`BaseGraphStorage` + 3 后端) | M3(按需清理) | Phase 2 内部整理(按需) |
-| **Layer B**(`GraphIndexService`) | M2 | **Phase 1 核心交付物** |
-| **小清洁**(R1/R2/R3) | M1 | Phase 1 自然消化 |
-| **物理拆 service** | M4 | Phase 3 |
-
-**Layer B 就是 LightRAG 重构的 Phase 1 对外面**——不是两件事,是一件事
-的两个视角。
-
-### 7.2 先做 LightRAG 重构的理由
-
-1. **"存储抽象"这件事的价值**在 LightRAG 没有 facade 的时候**是负的**。
- 你辛苦把存储层抽象干净了,业务代码依然直接调 `rag.get_knowledge_graph`
- / `rag.aquery_context`,表面上看不出抽象的好处。
-2. **facade 一旦建立,storage 抽象的问题大部分消失**:Layer A 是 LightRAG
- 内部的事,对 ApeRAG 业务层不可见,怎么设计都无所谓。
-3. **R1/R2/R3 的 bug 会被 Phase 1 自然修掉**。分开做反而增加冲突。
-4. **Phase 1 的 diff 很小、风险很低**:约 3 天,纯添加 + 改 import;
- 不改内部实现。不会破现有行为。
-5. **Phase 2 以后的任何"内部清理"**都有一个干净的外部边界做保护——可以
- 放心大胆修内部。没有 Phase 1 这个边界,任何内部清理都有外泄风险。
-
-### 7.3 先做图 DB 抽象的"反论"和回应
-
-**反论 A**:"图 DB 抽象的图后端场景已经实际存在(PG/Neo4j/Nebula 都在
-用),更紧迫。"
-
-回应:三个后端**已经在 work**(通过 `GRAPH_INDEX_GRAPH_STORAGE` env
-切换)。紧迫的不是抽象层,是**业务代码不受内部重构影响**——这恰恰是
-facade 解决的问题。
-
-**反论 B**:"Layer A 的设计已经基本稳定(`BaseGraphStorage` + 25 测试),
-立刻落地没风险。"
-
-回应:**现状已经稳定,没有必要现在动它**。M3 的接口分层/清理是"锦上添花",
-不做不会挂。
-
-**反论 C**:"先做小的(storage)练手,再做大的(facade)?"
-
-回应:facade 的 diff **不比** storage 清理大。而且 facade 做完之后,
-storage 清理的收益变成"纯内部的代码整洁度",不是架构级收益。
-
-### 7.4 顺序建议(明确)
-
-**1.** Phase 1(LightRAG 重构 facade)——~3 天,必做
-**2.** 暂停,观察 1~2 个月
-**3.** 按需做 Phase 2(内部整理)或图 DB 抽象 M3(存储层清理)——两者
- 都成为"内部装修",随意排序,取决于哪块先出现痛点
-**4.** 远期如有触发信号,启动 Phase 3(物理拆 service)
-
-**不推荐同时做 Phase 1 + 图 DB 抽象 M2/M3**。原因:
-- Phase 1 本身的 diff 已经能波及 8 个文件;
-- 同时改内部会让 review 难度和 rollback 风险陡增;
-- 本来就是同一个方向,没必要并发。
-
----
-
-## 8. 反过度设计:什么时候**不**做
-
-一如既往地列一下不做的条件:
-
-- **ApeRAG 只有 1 人维护,且图功能不是卖点** → Phase 1 的 3 天也省了,
- 现状能 work 就行。
-- **短期内(半年)没有新增图后端 / 新增调用方 / 大规模扩容的计划** →
- Phase 2 永远不做都行。
-- **确信 LightRAG 生命周期内都是内嵌的 ApeRAG 一部分,没打算拆出去** →
- Phase 3 不存在。
-
-**只有当至少一个下列信号出现时,启动 Phase 1**(实际上几乎一定会有):
-
-- 又出现一次类似 §3.3 的生命周期漏写 bug;
-- 新增一个业务模块要调图能力,又得重新 import `lightrag_manager`;
-- LightRAG 内部代码要做任何超过 100 行的改动,并且担心外溢。
-
----
-
-## 9. Open questions
-
-不影响本次判断、但落地 Phase 1 时要答:
-
-### Q1. `GraphIndexService` 是**单例**还是**per-collection 实例**?
-
-两种做法的取舍:
-
-- **单例**:进程级一个 service 对象;`create_lightrag_instance(collection)`
- 被封在方法内,每次调用按 collection 构造 rag 实例(+ request-scoped
- cache)。优点:调用简单 `graph_index_service.query_context(collection, ...)`。
-- **Per-collection 实例**:service 工厂 `make_graph_index_service(collection)`,
- 返回的 service 对象绑定 collection。优点:方法签名更简洁 `svc.query_context(...)`,
- 不用每次传 collection。
-
-倾向 **单例** + 方法接受 `collection` 参数——配合 FastAPI 的 DI,单例注入
-很自然;per-collection 实例在 Celery 任务里生命周期管理复杂。
-
-### Q2. Phase 1 的 DTO 层和 LightRAG 的 types 如何协调?
-
-`aperag/graph/lightrag/types.py::KnowledgeGraph` 已经存在并被业务层
-import。Phase 1 的选项:
-
-1. 直接 re-export:`from .lightrag.types import KnowledgeGraph` in `dto.py`。
- 零成本、零破坏。
-2. 复制定义:`dto.py` 定义自己的 `KnowledgeGraph`,内部做 adapter。
- 解耦彻底但前期一次性成本。
-
-倾向 **方案 1**:Phase 1 追求"零 diff 引入";真要解耦等 Phase 2 做内部
-重命名时顺手做。
-
-### Q3. Celery 路径怎么融入?
-
-`aperag/graph/lightrag_manager.py::process_document_for_celery` /
-`delete_document_for_celery` 是同步入口(为 Celery task 设计)。这两个
-函数**就是 Celery 任务入口**,迁移时不能简单换成 `graph_index_service.index_document`
-的 async 版本——Celery worker 不跑 asyncio 主循环。
-
-方案:`GraphIndexService` 提供**同步包装方法** `index_document_sync()` /
-`delete_document_sync()`,内部用 `_run_in_new_loop`(现在
-`lightrag_manager` 里已经有了)。Celery 任务调同步方法,FastAPI 调异
-步方法,同一个 service 对象。
-
-### Q4. Phase 1 的回归测试怎么做?
-
-- 单元层:mock 掉 LightRAG,只测 service 的 9 个方法签名 + 委托行为。
-- 集成层:复用 `tests/integration/graphstorage/` 已有的 3 个后端测试集,
- 再加一层:通过 `GraphIndexService` 跑一遍端到端(create collection →
- index doc → query context → delete doc)。
-
----
-
-## 10. 与向量抽象层重构的方法论对比
-
-| | 向量抽象(已落地,PR #1556) | LightRAG 重构 Phase 1(本文) |
-|---|---|---|
-| 起点 | 只有 Qdrant 一个后端 + 即将加 pgvector | 三个后端已在用 + 不会新增后端 |
-| 重构驱动力 | "要加 pgvector 就不得不抽象" | "不加 facade 就没法修内部 bug" |
-| 工作量 | ~1 周(Qdrant + pgvector 一起做完) | ~3 天(仅 facade) |
-| 对外 API 稳定性 | 历史较乱,本次一次性重设 | 仅 re-export 已有类型 |
-| 内部改造范围 | 深度(引入 DTO + 抽象 upsert/search 全套) | **刻意最小化**(不动 ~8700 行 lightrag/ 内部) |
-
-方法论上的共同原则:**一次做完,不留尾巴,干净的外部边界保护内部折腾
-自由**。差别在于向量层可以一次干掉、LightRAG 这块体量太大只能分阶段。
-
----
-
-## 11. 结论
-
-### 先做什么
-
-**Phase 1:LightRAG facade**。~3 天工作量,定义 `GraphIndexService` +
-9 个 DTO + 生命周期封装,迁移 8 处跨模块 import。
-
-### 再做什么
-
-观察 1~2 个月,看 Phase 1 之后真实使用中出现的痛点,决定:
-
-- Phase 2 内部整理 vs 图 DB 抽象 M3 清理 —— 两者都是 "内部装修",按痛点
- 优先级排。
-- 如果出现 §6.3 / §8 末尾列出的信号 —— 启动 Phase 3(独立 service)。
-
-### 不要做什么
-
-- **不要** 现在就全面重写 LightRAG 内部;
-- **不要** 在做 Phase 1 之前做图 DB 抽象层的 M2/M3;
-- **不要** 在没有触发信号时启动 Phase 3。
-
----
-
-## 12. 落地清单(可直接转为 issue)
-
-Phase 1 的任务拆解,按依赖顺序:
-
-1. `aperag/graph/dto.py` —— 9 个 DTO,re-export 现有类型为主。
-2. `aperag/graph/config.py` —— `GraphIndexConfig`,从 env 读一次。
-3. `aperag/graph/lifecycle.py` —— request-scoped rag 缓存、单例入口。
-4. `aperag/graph/service.py` —— `GraphIndexService` 类,9 个方法。
-5. 迁移 `aperag/service/graph_service.py`(5 处调用)。
-6. 迁移 `aperag/service/search_pipeline_service.py`(1 处,修掉 §3.3 bug)。
-7. 迁移 `aperag/tasks/collection.py`、`aperag/tasks/document.py`。
-8. 处理 `aperag/service/prompt_template_service.py`、
- `aperag/db/repositories/graph.py` 的常量依赖。
-9. 回归测试:现有 `tests/integration/graphstorage/` + 新增
- `tests/unit_test/graph/test_graph_index_service.py`。
-10. 更新文档:`graph_db_abstraction.md` 标注 Layer B = `GraphIndexService`
- 已落地;本文档标注 Phase 1 已完成。
-
-**预估总工作量:3 天(1 个 PR,diff 规模 ~500 行净增 + ~200 行迁移改动)**。
diff --git a/docs/zh-CN/design/prompt_customization_api_test.md b/docs/zh-CN/design/prompt_customization_api_test.md
deleted file mode 100644
index 1e69870ad..000000000
--- a/docs/zh-CN/design/prompt_customization_api_test.md
+++ /dev/null
@@ -1,164 +0,0 @@
-# Prompt API 测试手册
-
-**Base URL**: `http://localhost:8000/api/v1`
-**Auth**: `Bearer sk-85fc1342e0df44378ad73184ca8005b5`
-(请替换SK为真实SK)
-
----
-
-## 1. GET /prompts/user — 获取用户 Prompt 配置
-
-返回所有 5 种 prompt 的当前生效内容,以及来源(`source: user/system/hardcoded`)和是否自定义(`customized: true/false`)。
-
-```bash
-curl -X GET 'http://localhost:8000/api/v1/prompts/user' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5'
-```
-
-**期待结果**:返回 `agent_system`、`agent_query`、`index_graph`、`index_summary`、`index_vision` 五个字段,初始状态下 `source` 均为 `system` 或 `hardcoded`,`customized` 均为 `false`。
-
----
-
-## 2. PUT /prompts/user — 更新用户 Prompt 配置
-
-只更新提供的字段,未提供的字段保持不变。
-
-```bash
-curl -X PUT 'http://localhost:8000/api/v1/prompts/user' \
- -H 'Content-Type: application/json' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5' \
- -d '{
- "prompts": {
- "agent_system": "You are a helpful assistant specialized in technical support.",
- "index_graph": "Extract medical entities and relationships from the text."
- }
- }'
-```
-
-**期待结果**:`updated: ["agent_system", "index_graph"]`。再次调用接口 1,可见这两个字段的 `source` 变为 `user`,`customized` 变为 `true`。
-
----
-
-## 3. GET /prompts/system — 查看系统默认 Prompt
-
-只读接口,供用户参考系统默认内容。
-
-```bash
-# 查看所有系统默认
-curl -X GET 'http://localhost:8000/api/v1/prompts/system' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5'
-```
-
-```bash
-# 查看指定类型
-curl -X GET 'http://localhost:8000/api/v1/prompts/system?type=agent_system' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5'
-```
-
-**期待结果**:返回系统内置的 prompt 内容,不受用户自定义影响。
-
----
-
-## 4. DELETE /prompts/user/{type} — 重置单个 Prompt
-
-删除用户对某个 prompt 的自定义,回退到系统默认。
-
-```bash
-# 正常重置
-curl -X DELETE 'http://localhost:8000/api/v1/prompts/user/agent_system' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5'
-```
-
-**期待结果**:返回重置后生效的内容,`source` 为 `system` 或 `hardcoded`。
-
-```bash
-# 重置一个未自定义的 prompt
-curl -X DELETE 'http://localhost:8000/api/v1/prompts/user/agent_query' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5'
-```
-
-**期待结果**:`404`,`detail: "User has not customized agent_query prompt"`。
-
-```bash
-# 传入非法类型
-curl -X DELETE 'http://localhost:8000/api/v1/prompts/user/invalid_type' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5'
-```
-
-**期待结果**:`400`,提示合法的 type 列表。
-
----
-
-## 5. POST /prompts/user/reset — 批量重置 Prompt
-
-```bash
-# 重置指定类型
-curl -X POST 'http://localhost:8000/api/v1/prompts/user/reset' \
- -H 'Content-Type: application/json' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5' \
- -d '{"types": ["agent_system", "index_graph"]}'
-```
-
-```bash
-# 重置所有(不传 types)
-curl -X POST 'http://localhost:8000/api/v1/prompts/user/reset' \
- -H 'Content-Type: application/json' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5' \
- -d '{}'
-```
-
-**期待结果**:`reset` 数组列出实际被重置的类型(未自定义的不会出现在列表中)。
-
----
-
-## 6. POST /prompts/preview — 预览 Prompt 渲染效果
-
-用于前端展示"变量填入后的效果"。
-
-```bash
-curl -X POST 'http://localhost:8000/api/v1/prompts/preview' \
- -H 'Content-Type: application/json' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5' \
- -d '{
- "template": "Hello {{ name }}, you have {{ count }} messages.",
- "variables": {"name": "Alice", "count": 5}
- }'
-```
-
-**期待结果**:`rendered: "Hello Alice, you have 5 messages."`
-
----
-
-## 7. POST /prompts/validate — 校验 Prompt 语法
-
-```bash
-# 合法模板(但缺少建议变量,会有 warnings)
-curl -X POST 'http://localhost:8000/api/v1/prompts/validate' \
- -H 'Content-Type: application/json' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5' \
- -d '{"type": "agent_query", "template": "{{ query }} {{ collections }}"}'
-```
-
-**期待结果**:`valid: true`,`warnings` 中提示缺少 `language`、`chat_id` 等建议变量。
-
-```bash
-# 非法 Jinja2 语法
-curl -X POST 'http://localhost:8000/api/v1/prompts/validate' \
- -H 'Content-Type: application/json' \
- -H 'Authorization: Bearer sk-85fc1342e0df44378ad73184ca8005b5' \
- -d '{"type": "agent_query", "template": "{% for x in %}broken{% endfor %}"}'
-```
-
-**期待结果**:`valid: false`,`errors` 中包含 Jinja2 语法错误信息。
-
----
-
-## Prompt 类型说明
-
-| 类型 | 用途 | 配置位置 |
-|---|---|---|
-| `agent_system` | Agent 人格/行为定义 | Bot 配置 > 用户默认 > 系统默认 |
-| `agent_query` | 每次对话的查询 prompt 模板 | Bot 配置 > 用户默认 > 系统默认 |
-| `index_graph` | 知识图谱实体关系抽取 | Collection 配置 > 用户默认 > 系统默认 |
-| `index_summary` | 文档摘要生成 | Collection 配置 > 用户默认 > 系统默认 |
-| `index_vision` | 图片内容提取 | Collection 配置 > 用户默认 > 系统默认 |
\ No newline at end of file
diff --git a/docs/zh-CN/design/prompt_customization_design.md b/docs/zh-CN/design/prompt_customization_design.md
deleted file mode 100644
index 385e3f95f..000000000
--- a/docs/zh-CN/design/prompt_customization_design.md
+++ /dev/null
@@ -1,238 +0,0 @@
-# Prompt自定义功能设计文档
-
-## 架构概述
-
-Prompt自定义功能采用**配置优先 + 简化表存储**的方案,提供三层优先级的配置继承机制。
-
----
-
-## 数据模型
-
-### prompt_template表
-
-用于存储用户默认和系统默认prompt。
-
-```sql
-CREATE TABLE prompt_template (
- id VARCHAR(24) PRIMARY KEY,
- prompt_type VARCHAR(50) NOT NULL, -- agent_system, agent_query, index_graph, etc.
- scope VARCHAR(20) NOT NULL, -- 'user' or 'system'
- user_id VARCHAR(256), -- NULL for system, user_id for user
- language VARCHAR(10) NOT NULL, -- zh-CN, en-US
- content TEXT NOT NULL,
- description TEXT,
- gmt_created TIMESTAMP,
- gmt_updated TIMESTAMP,
- gmt_deleted TIMESTAMP
-);
-```
-
-### Bot配置(已存在)
-
-```json
-{
- "agent": {
- "system_prompt_template": "Bot专属system prompt",
- "query_prompt_template": "Bot专属query prompt"
- }
-}
-```
-
-### Collection配置(新增index_prompts)
-
-```json
-{
- "index_prompts": {
- "graph": "Collection专属graph prompt",
- "summary": "Collection专属summary prompt",
- "vision": "Collection专属vision prompt"
- }
-}
-```
-
----
-
-## 三层优先级系统
-
-### Agent Prompt解析
-
-```
-优先级1: Bot.config.agent.system_prompt_template
- ↓
-优先级2: prompt_template (scope='user', prompt_type='agent_system')
- ↓
-优先级3: prompt_template (scope='system', prompt_type='agent_system')
- ↓
-优先级4: 代码硬编码 (APERAG_AGENT_INSTRUCTION_ZH/EN)
-```
-
-### 索引Prompt解析
-
-```
-优先级1: Collection.config.index_prompts.graph
- ↓
-优先级2: prompt_template (scope='user', prompt_type='index_graph')
- ↓
-优先级3: prompt_template (scope='system', prompt_type='index_graph')
- ↓
-优先级4: 代码硬编码 (LightRAG PROMPTS["entity_extraction"])
-```
-
----
-
-## 架构分层
-
-```
-┌─────────────────────────────────────┐
-│ View层 (prompts.py) │
-│ - HTTP请求处理 │
-│ - 参数验证 │
-│ - 错误处理 │
-└──────────────┬──────────────────────┘
- │
-┌──────────────▼──────────────────────┐
-│ Service层 (PromptTemplateService) │
-│ ┌─────────────────────────────────┐ │
-│ │ 用户配置管理(给View用) │ │
-│ │ - get_user_prompts() │ │
-│ │ - update_user_prompts() │ │
-│ │ - delete_user_prompt() │ │
-│ │ - reset_user_prompts() │ │
-│ └─────────────────────────────────┘ │
-│ ┌─────────────────────────────────┐ │
-│ │ Prompt解析(给Agent/LightRAG用) │ │
-│ │ - resolve_agent_system_prompt() │ │
-│ │ - resolve_agent_query_prompt() │ │
-│ │ - resolve_index_prompt() │ │
-│ └─────────────────────────────────┘ │
-└──────────────┬──────────────────────┘
- │
-┌──────────────▼──────────────────────┐
-│ Repository层 │
-│ (AsyncPromptTemplateRepositoryMixin)│
-└──────────────┬──────────────────────┘
- │
-┌──────────────▼──────────────────────┐
-│ Database (prompt_template表) │
-└─────────────────────────────────────┘
-```
-
----
-
-## 支持的Prompt类型
-
-| Prompt类型 | 用途 | 存储位置 |
-|-----------|------|---------|
-| agent_system | Agent人格定义 | Bot.config.agent.system_prompt_template |
-| agent_query | 查询模板 | Bot.config.agent.query_prompt_template |
-| index_graph | 实体关系抽取 | Collection.config.index_prompts.graph |
-| index_summary | 文档摘要 | Collection.config.index_prompts.summary |
-| index_vision | 图像提取 | Collection.config.index_prompts.vision |
-
----
-
-## API设计
-
-### RESTful资源模型
-
-```
-/prompts/user # 用户的prompt配置(资源)
-/prompts/system # 系统的prompt配置(资源)
-/prompts/preview # 工具
-/prompts/validate # 工具
-```
-
-### 核心API
-
-- `GET /prompts/user` - 获取用户配置(含优先级解析)
-- `PUT /prompts/user` - 批量更新用户配置
-- `DELETE /prompts/user/{type}` - 重置单个配置
-- `POST /prompts/user/reset` - 批量重置
-- `GET /prompts/system` - 获取系统默认
-
----
-
-## 核心实现
-
-### PromptTemplateService
-
-**位置**:`aperag/service/prompt_template_service.py`
-
-**核心方法**:
-
-```python
-class PromptTemplateService:
- # 用户配置管理(给View层用)
- async def get_user_prompts(user_id, language) -> Dict
- async def update_user_prompts(user_id, language, prompts) -> List[str]
- async def delete_user_prompt(user_id, prompt_type, language) -> Dict
- async def reset_user_prompts(user_id, language, types) -> List[str]
-
- # Prompt解析(给Agent/LightRAG用)
- async def resolve_agent_system_prompt(bot, user_id, language) -> str
- async def resolve_agent_query_prompt(bot, user_id, language) -> str
- async def resolve_index_prompt(collection, prompt_type, user_id) -> str
-```
-
----
-
-## 技术特点
-
-1. **配置内聚**:对象级配置跟随对象存储(Bot/Collection的config字段)
-2. **简化表结构**:prompt_template表只存储默认配置
-3. **三层优先级**:对象 > 用户默认 > 系统默认 > 硬编码
-4. **RESTful API**:资源导向,语义清晰
-5. **分层架构**:View → Service → Repository
-
----
-
-## 数据流示例
-
-### 用户获取配置
-
-```
-1. 用户请求:GET /prompts/user?language=zh-CN
- ↓
-2. View层:prompts.py 接收请求
- ↓
-3. Service层:prompt_template_service.get_user_prompts()
- - 遍历所有prompt_type
- - 对每个type执行优先级查找:
- a. 查询 prompt_template (scope='user')
- b. 如无,查询 prompt_template (scope='system')
- c. 如无,使用代码硬编码
- - 组装响应:content + source + customized + language
- ↓
-4. Repository层:query_prompt_template()
- ↓
-5. Database:prompt_template表
- ↓
-6. 返回响应给前端
-```
-
-### Agent对话使用prompt
-
-```
-1. 用户发起对话
- ↓
-2. agent_chat_service.py
- ↓
-3. 调用:prompt_template_service.resolve_agent_system_prompt(bot, user_id, language)
- - 优先级1:检查 bot.config.agent.system_prompt_template
- - 优先级2:查询 prompt_template (scope='user')
- - 优先级3:查询 prompt_template (scope='system')
- - 优先级4:使用 APERAG_AGENT_INSTRUCTION_ZH
- ↓
-4. 返回解析后的prompt给Agent使用
-```
-
----
-
-## 后续集成
-
-详见:[prompt_customization_integration_todo.md](./prompt_customization_integration_todo.md)
-
-核心任务:
-- Agent对话集成(agent_chat_service.py)
-- LightRAG集成(lightrag_manager.py)
-- Summary/Vision索引集成
diff --git a/docs/zh-CN/design/prompt_customization_integration_todo.md b/docs/zh-CN/design/prompt_customization_integration_todo.md
deleted file mode 100644
index 10c9efb4c..000000000
--- a/docs/zh-CN/design/prompt_customization_integration_todo.md
+++ /dev/null
@@ -1,617 +0,0 @@
-# Prompt自定义功能后续集成指南
-
-本文档记录Prompt自定义功能的后续集成工作。
-
-## 已完成的基础设施
-
-- ✅ `prompt_template` 数据库表
-- ✅ Repository层CRUD方法
-- ✅ `PromptTemplateService`类(完整业务逻辑)
-- ✅ RESTful API (`/prompts/user/*`)
-- ✅ Collection Schema扩展(`index_prompts`字段)
-- ✅ 辅助API(preview、validate、system)
-
-## 后续集成任务
-
-需要将prompt解析服务集成到现有的Agent对话和索引构建流程中。
-
----
-
-## 一、Prompt解析服务实现
-
-### 1.1 文件位置
-`aperag/service/prompt_template_service.py`
-
-### 1.2 需要添加的方法
-
-#### resolve_agent_system_prompt
-```python
-async def resolve_agent_system_prompt(bot, user_id: str, language: str) -> str:
- """
- 解析Agent系统prompt
-
- 优先级:
- 1. Bot.config.agent.system_prompt_template
- 2. prompt_template表(scope='user', prompt_type='agent_system')
- 3. prompt_template表(scope='system', prompt_type='agent_system')
- 4. 代码硬编码(APERAG_AGENT_INSTRUCTION_EN/ZH)
-
- Args:
- bot: Bot对象
- user_id: 用户ID
- language: 语言代码 (en-US, zh-CN)
-
- Returns:
- 解析后的system prompt内容
- """
- from aperag.db.ops import async_db_ops
-
- # 层级1:Bot配置
- if bot.config:
- try:
- import json
- config_dict = json.loads(bot.config) if isinstance(bot.config, str) else bot.config
- if config_dict.get("agent", {}).get("system_prompt_template"):
- return config_dict["agent"]["system_prompt_template"]
- except:
- pass
-
- # 层级2:用户默认
- user_default = await async_db_ops.query_prompt_template(
- prompt_type="agent_system",
- scope="user",
- user_id=user_id,
- language=language
- )
- if user_default:
- return user_default.content
-
- # 层级3:系统默认
- system_default = await async_db_ops.query_prompt_template(
- prompt_type="agent_system",
- scope="system",
- user_id=None,
- language=language
- )
- if system_default:
- return system_default.content
-
- # 层级4:代码硬编码
- if language == "zh-CN":
- return APERAG_AGENT_INSTRUCTION_ZH
- else:
- return APERAG_AGENT_INSTRUCTION_EN
-```
-
-#### resolve_agent_query_prompt
-```python
-async def resolve_agent_query_prompt(bot, user_id: str, language: str) -> str:
- """
- 解析Agent查询prompt模板
-
- 优先级:
- 1. Bot.config.agent.query_prompt_template
- 2. prompt_template表(scope='user', prompt_type='agent_query')
- 3. prompt_template表(scope='system', prompt_type='agent_query')
- 4. 代码硬编码(DEFAULT_AGENT_QUERY_PROMPT_EN/ZH)
-
- Args:
- bot: Bot对象
- user_id: 用户ID
- language: 语言代码
-
- Returns:
- 解析后的query prompt模板内容
- """
- # 实现逻辑与 resolve_agent_system_prompt 类似
- # ...
-```
-
-#### resolve_index_prompt
-```python
-async def resolve_index_prompt(
- collection,
- prompt_type: str, # "graph", "summary", "vision"
- user_id: str
-) -> str:
- """
- 解析索引prompt
-
- 优先级:
- 1. Collection.config.index_prompts.{type}
- 2. prompt_template表(scope='user', prompt_type='index_{type}')
- 3. prompt_template表(scope='system', prompt_type='index_{type}')
- 4. 代码硬编码
-
- Args:
- collection: Collection对象
- prompt_type: Prompt类型 (graph, summary, vision)
- user_id: 用户ID
-
- Returns:
- 解析后的index prompt内容
- """
- from aperag.db.ops import async_db_ops
-
- # 层级1:Collection配置
- if collection.config:
- try:
- import json
- config_dict = json.loads(collection.config) if isinstance(collection.config, str) else collection.config
- index_prompts = config_dict.get("index_prompts", {})
- if index_prompts.get(prompt_type):
- return index_prompts[prompt_type]
- except:
- pass
-
- # 层级2:用户默认
- db_prompt_type = f"index_{prompt_type}"
- collection_language = "zh-CN" # 从collection.config.language获取
- try:
- config_dict = json.loads(collection.config) if isinstance(collection.config, str) else collection.config
- collection_language = config_dict.get("language", "zh-CN")
- except:
- pass
-
- user_default = await async_db_ops.query_prompt_template(
- prompt_type=db_prompt_type,
- scope="user",
- user_id=user_id,
- language=collection_language
- )
- if user_default:
- return user_default.content
-
- # 层级3:系统默认
- system_default = await async_db_ops.query_prompt_template(
- prompt_type=db_prompt_type,
- scope="system",
- user_id=None,
- language=collection_language
- )
- if system_default:
- return system_default.content
-
- # 层级4:代码硬编码
- return get_hardcoded_index_prompt(prompt_type)
-
-
-def get_hardcoded_index_prompt(prompt_type: str) -> str:
- """获取代码硬编码的索引prompt(最终fallback)"""
- if prompt_type == "graph":
- from aperag.graph.lightrag.prompt import PROMPTS
- return PROMPTS["entity_extraction"]
- elif prompt_type == "summary":
- return """Provide a comprehensive summary of the following document..."""
- elif prompt_type == "vision":
- return """Analyze the provided image and extract its content with high fidelity..."""
- else:
- return None
-```
-
----
-
-## 二、Agent对话集成
-
-### 2.1 文件位置
-`aperag/service/agent_chat_service.py`
-
-### 2.2 改造点1:_get_agent_session方法
-
-**位置**:约第402-461行
-
-**当前代码**:
-```python
-# 约437-439行
-system_prompt = (
- custom_system_prompt if custom_system_prompt
- else get_agent_system_prompt(language=agent_message.language)
-)
-```
-
-**改造后**:
-```python
-from aperag.service.prompt_template_service import resolve_agent_system_prompt
-
-system_prompt = await resolve_agent_system_prompt(
- bot=bot,
- user_id=user,
- language=agent_message.language
-)
-```
-
-**影响**:
-- 需要将bot对象传递到这个方法中
-- 当前方法签名可能需要调整
-
-### 2.3 改造点2:process_agent_message方法
-
-**位置**:约第463-551行
-
-**当前代码**:
-```python
-# 约518-521行
-comprehensive_prompt = build_agent_query_prompt(
- chat_id, agent_message=merged_agent_message, user=user, custom_template=custom_query_prompt
-)
-```
-
-**改造后**:
-```python
-from aperag.service.prompt_template_service import resolve_agent_query_prompt
-
-query_prompt_template = await resolve_agent_query_prompt(
- bot=bot,
- user_id=user,
- language=merged_agent_message.language
-)
-
-comprehensive_prompt = build_agent_query_prompt(
- chat_id,
- agent_message=merged_agent_message,
- user=user,
- custom_template=query_prompt_template
-)
-```
-
-**注意事项**:
-- 需要确保bot对象在此方法中可用
-- 当前代码使用custom_query_prompt参数,需要替换为解析服务
-
----
-
-## 三、Graph索引集成(LightRAG)
-
-### 3.1 文件位置
-`aperag/graph/lightrag_manager.py`
-
-### 3.2 改造点:create_lightrag_instance函数
-
-**位置**:约第59-123行
-
-**当前代码**:
-```python
-async def create_lightrag_instance(collection: Collection) -> LightRAG:
- # ... 获取配置 ...
-
- rag = LightRAG(
- working_dir=working_dir,
- entity_types=entity_types,
- # ... 其他配置 ...
- )
-
- return rag
-```
-
-**改造后**:
-```python
-from aperag.service.prompt_template_service import resolve_index_prompt
-
-async def create_lightrag_instance(collection: Collection) -> LightRAG:
- # ... 获取配置 ...
-
- # 解析自定义graph prompt
- custom_graph_prompt = await resolve_index_prompt(
- collection=collection,
- prompt_type="graph",
- user_id=collection.user
- )
-
- # 创建LightRAG实例
- rag = LightRAG(
- working_dir=working_dir,
- entity_types=entity_types,
- # ... 其他配置 ...
- )
-
- # 如果有自定义prompt,需要覆盖LightRAG的默认prompt
- # 方案1:扩展LightRAG支持custom_prompts参数(需要修改lightrag.py)
- # 方案2:运行时替换(临时方案,但要注意线程安全)
-
- return rag
-```
-
-### 3.3 LightRAG扩展(可选)
-
-**文件位置**:`aperag/graph/lightrag/lightrag.py`
-
-**建议扩展**:
-```python
-@dataclass
-class LightRAG:
- # ... 现有字段 ...
-
- # 新增:自定义prompts字典
- custom_prompts: Optional[Dict[str, str]] = None
-
- def _get_prompt(self, key: str) -> str:
- """获取prompt,支持自定义覆盖"""
- if self.custom_prompts and key in self.custom_prompts:
- return self.custom_prompts[key]
- return PROMPTS[key]
-```
-
-**使用**:
-```python
-rag = LightRAG(
- working_dir=working_dir,
- entity_types=entity_types,
- custom_prompts={
- "entity_extraction": custom_graph_prompt
- } if custom_graph_prompt else None,
- # ... 其他配置 ...
-)
-```
-
----
-
-## 四、Summary索引集成
-
-### 4.1 文件位置
-`aperag/index/summary_index.py`
-
-### 4.2 改造点:create_index方法
-
-**位置**:约第46-87行
-
-**当前代码**:
-```python
-def create_index(self, document_id: str, content: str, doc_parts: List[Any], collection, **kwargs):
- # ... 现有逻辑 ...
-
- # 使用默认的map-reduce prompt生成摘要
- summary = self._generate_document_summary(content, doc_parts, collection)
-```
-
-**改造后**:
-```python
-from aperag.service.prompt_template_service import resolve_index_prompt
-
-def create_index(self, document_id: str, content: str, doc_parts: List[Any], collection, **kwargs):
- # ... 现有逻辑 ...
-
- # 解析自定义summary prompt
- custom_summary_prompt = await resolve_index_prompt(
- collection=collection,
- prompt_type="summary",
- user_id=collection.user
- )
-
- if custom_summary_prompt:
- # 使用自定义prompt生成摘要
- summary = self._generate_summary_with_custom_prompt(
- content, doc_parts, collection, custom_summary_prompt
- )
- else:
- # 使用默认的map-reduce逻辑
- summary = self._generate_document_summary(content, doc_parts, collection)
-```
-
-**注意事项**:
-- `create_index` 可能是同步方法,需要确认是否可以改为异步
-- 可能需要新增 `_generate_summary_with_custom_prompt` 方法
-
----
-
-## 五、Vision索引集成
-
-### 5.1 文件位置
-`aperag/index/vision_index.py`
-
-### 5.2 改造点:create_index方法
-
-**位置**:约第146行附近
-
-**当前代码**:
-```python
-prompt = """Analyze the provided image and extract its content with high fidelity. Follow these instructions precisely..."""
-```
-
-**改造后**:
-```python
-from aperag.service.prompt_template_service import resolve_index_prompt
-
-# 解析自定义vision prompt
-custom_vision_prompt = await resolve_index_prompt(
- collection=collection,
- prompt_type="vision",
- user_id=collection.user
-)
-
-prompt = custom_vision_prompt if custom_vision_prompt else """
-Analyze the provided image and extract its content with high fidelity...
-"""
-```
-
----
-
-## 六、Collection API扩展
-
-### 6.1 说明
-Collection的API已经通过Schema扩展支持`index_prompts`字段,无需额外改造。
-
-### 6.2 使用示例
-
-**更新Collection配置**:
-```bash
-PUT /api/v1/collections/{collection_id}
-Content-Type: application/json
-
-{
- "title": "医疗知识库",
- "config": {
- "enable_knowledge_graph": true,
- "enable_summary": true,
- "knowledge_graph_config": {
- "entity_types": ["疾病", "药物", "症状", "治疗方案"]
- },
- "index_prompts": {
- "graph": "从医疗文本中提取实体和关系。实体类型:{entity_types}。要求:1. 识别中文医疗术语... 2. 提取疾病-药物、症状-疾病等关系...",
- "summary": "生成医疗文档的结构化摘要,包括:1. 主要诊断 2. 治疗方案 3. 用药建议 4. 注意事项"
- }
- }
-}
-```
-
-### 6.3 可选增强:变更提示
-
-如果希望在用户修改`index_prompts`后给出"需重建索引"的提示,可以在Collection更新API中添加逻辑:
-
-**文件**:`aperag/service/collection_service.py`
-
-**位置**:`update_collection`方法
-
-**增强逻辑**:
-```python
-async def update_collection(self, user: str, collection_id: str, update_data: dict):
- # ... 现有更新逻辑 ...
-
- # 检测index_prompts是否变更
- warnings = []
- if "config" in update_data and "index_prompts" in update_data["config"]:
- warnings.append("索引Prompt配置已变更,建议重建相关索引以使新配置生效")
-
- # 在返回结果中包含warnings
- return {
- "collection": updated_collection,
- "warnings": warnings
- }
-```
-
----
-
-## 七、实施优先级建议
-
-### 高优先级(核心功能)
-1. ✅ **Prompt解析服务**:实现三个resolve方法
-2. **Agent集成**:改造agent_chat_service.py
-3. **Graph索引集成**:改造lightrag_manager.py和LightRAG
-
-### 中优先级(常用功能)
-4. **Summary索引集成**:改造summary_index.py
-5. **Collection API增强**:添加变更提示
-
-### 低优先级(可选功能)
-6. **Vision索引集成**:改造vision_index.py
-7. **性能优化**:添加缓存机制
-8. **使用统计**:记录prompt使用情况
-
----
-
-## 八、验证和测试
-
-### 8.1 API测试
-
-**用户默认Prompt**:
-```bash
-# 1. 设置用户默认的Agent System Prompt
-curl -X PUT http://localhost:8000/api/v1/prompts/defaults/agent \
- -H "Content-Type: application/json" \
- -d '{
- "language": "zh-CN",
- "system": "你是一个专业的技术支持助手,擅长解决软件问题",
- "query": "{% set collection_list = [] %}..."
- }'
-
-# 2. 获取用户默认配置
-curl http://localhost:8000/api/v1/prompts/defaults?language=zh-CN
-
-# 3. 获取系统默认配置(参考)
-curl http://localhost:8000/api/v1/prompts/system-defaults?type=agent_system&language=zh-CN
-
-# 4. 预览prompt渲染
-curl -X POST http://localhost:8000/api/v1/prompts/preview \
- -H "Content-Type: application/json" \
- -d '{
- "type": "agent_query",
- "template": "用户查询:{{ query }}",
- "variables": {"query": "测试查询"}
- }'
-
-# 5. 验证prompt语法
-curl -X POST http://localhost:8000/api/v1/prompts/validate \
- -H "Content-Type: application/json" \
- -d '{
- "type": "agent_query",
- "template": "{% set x = 1 %}{{ query }}"
- }'
-```
-
-**Collection索引Prompt**:
-```bash
-# 更新Collection配置
-curl -X PUT http://localhost:8000/api/v1/collections/{collection_id} \
- -H "Content-Type: application/json" \
- -d '{
- "config": {
- "index_prompts": {
- "graph": "自定义的图索引prompt...",
- "summary": "自定义的摘要prompt..."
- }
- }
- }'
-```
-
-### 8.2 集成测试流程
-
-**Agent对话测试**:
-1. 创建Bot(不配置prompt) → 应使用用户默认
-2. 用户设置默认Agent prompt
-3. 发起对话 → 验证使用了用户默认prompt
-4. 更新Bot配置(设置prompt) → 应优先使用Bot配置
-
-**索引构建测试**:
-1. 创建Collection(不配置index_prompts) → 应使用系统默认
-2. 用户设置默认索引prompt
-3. 上传文档构建索引 → 验证使用了用户默认prompt
-4. 更新Collection配置(设置index_prompts)
-5. 重建索引 → 验证使用了Collection配置
-
----
-
-## 九、常见问题
-
-### Q1: Bot.config是字符串还是对象?
-**A**: 数据库中是Text类型(JSON字符串),读取后需要json.loads()解析。
-
-### Q2: 异步方法在同步context中如何调用?
-**A**: 如果indexer的create_index是同步方法,可能需要:
-- 改为异步方法(推荐)
-- 或使用asyncio.run()包装(不推荐,可能有事件循环冲突)
-
-### Q3: LightRAG的PROMPTS是全局变量,如何实现实例级覆盖?
-**A**:
-- 方案1:扩展LightRAG支持custom_prompts参数(推荐)
-- 方案2:创建实例时深拷贝PROMPTS字典
-- 方案3:运行时临时替换(需注意线程安全)
-
-### Q4: 用户修改索引prompt后,旧索引怎么办?
-**A**:
-- 旧索引仍然有效,但是用旧prompt生成的
-- 建议在API响应中提示用户"需重建索引"
-- 可以添加Collection.index_config_changed字段标记(可选)
-
----
-
-## 十、下一步行动清单
-
-- [ ] 实现 `resolve_agent_system_prompt()`
-- [ ] 实现 `resolve_agent_query_prompt()`
-- [ ] 实现 `resolve_index_prompt()`
-- [ ] 改造 `agent_chat_service.py`(两处)
-- [ ] 改造 `lightrag_manager.py`
-- [ ] 扩展 `LightRAG` 支持 `custom_prompts`
-- [ ] 改造 `summary_index.py`
-- [ ] 改造 `vision_index.py`
-- [ ] 添加Collection更新提示
-- [ ] 编写集成测试用例
-- [ ] 更新用户文档
-
----
-
-## 十一、参考文档
-
-- [设计方案](../../.cursor/plans/自定义prompt模板系统_8a863299.plan.md)
-- [API实现](../../aperag/views/prompts.py)
-- [Repository实现](../../aperag/db/repositories/prompt_template.py)
-- [API实现](../../aperag/views/prompts.py)
diff --git a/docs/zh-CN/design/qdrant_memory_optimization.md b/docs/zh-CN/design/qdrant_memory_optimization.md
deleted file mode 100644
index c5cdfb091..000000000
--- a/docs/zh-CN/design/qdrant_memory_optimization.md
+++ /dev/null
@@ -1,633 +0,0 @@
-# Qdrant 内存占用治理与多租户化改造
-
-> 写作缘由:香港 ACK 集群的 Qdrant 容器 RSS 已经涨到 12.5 GiB(limit 16 GiB),但线上只有 ~~1600 个用户、~~1.3 万文档、~60 万向量 chunk。经过排查确认**绝大部分内存不是花在业务数据上,而是花在"每个 ApeRAG Collection 建一个 Qdrant Collection"造成的元数据/段级结构性浪费**。如果不治理,随着用户增长 Qdrant 会成为整个系统的扩展瓶颈。
->
-> 相关仓库:
->
-> - 应用代码:[apecloud/ApeRAG](https://github.com/apecloud/ApeRAG)
-> - 生产部署 values:[apecloud/aperag-values](https://github.com/apecloud/aperag-values)
-
----
-
-## 1. 现场快照(2026-04-20,hk 集群)
-
-```text
-pod : qdrant-cluster-qdrant-0 (KubeBlocks qdrant 0.9.1, qdrant 1.10.0)
-limit : 4 cpu / 16 GiB
-RSS : 12.47 GiB (77% of limit)
-RssAnon : 12.52 GiB ← 基本都是匿名堆/mmap
-RssFile : 1.05 GiB
-VmSize : 153 GiB ← 线程栈预留 + 大量 mmap
-Threads : 1872 ← 关键异常指标
-Storage dir : 8.0 GiB (/qdrant/storage)
-```
-
-Qdrant 自身 telemetry 聚合:
-
-
-| 指标 | 数值 | 备注 |
-| ---------------------- | ------------------ | ------------------------------------ |
-| collections 总数 | **1847** | pg 中 ACTIVE 2003 + DELETED 179 |
-| 空 collection(0 points) | **1260 (68%)** | 结构性浪费的主要来源 |
-| 非空 collection | 587 | 真正产生业务价值 |
-| 总 segments | 7387 | 每 collection 默认 4 段 |
-| 总 points | 575 772 | |
-| 总 vectors | 600 795 | 1024 维、Cosine |
-| 已建 HNSW 索引的 vectors | **90 005 (仅 15%)** | `indexing_threshold=20 MB` ≈ 5000 向量 |
-| 真正跨过索引阈值的 collection | **3** | 其他 584 个都走暴力扫描 |
-
-
-业务侧(postgres `aperag` 库):
-
-
-| 指标 | 数值 |
-| --------------------------- | ------------- |
-| `user` 行数 | 1611 |
-| `collection` ACTIVE | 2003 |
-| `collection` DELETED | 179 |
-| `document` 总行数 | 13 044 |
-| `document` COMPLETE | 5 404 |
-| `document` EXPIRED / FAILED | 5 114 / 1 440 |
-
-
----
-
-## 2. 12 GiB 内存都去哪了
-
-
-| 组成 | 估算值 | 解释 |
-| ----------------------------------- | ----------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
-| 向量原始数据常驻 RAM | **~2.4 GiB** | 默认 `storage_type: Memory`,600 795 × 1024 × 4 B |
-| HNSW 图(RAM) | ~10–50 MiB | `hnsw_index.on_disk=false`,但只有 9 万 vec 被索引 |
-| **7387 × RocksDB 实例静态开销** | **~3–5 GiB** | 每个 segment 都是一个独立 RocksDB 库(MANIFEST/LOG/OPTIONS/LOCK 一套),memtable arena + block cache + table reader cache + bloom filter 按最小配置都要几 MiB |
-| **1847 × collection 级 Qdrant 内部结构** | **~2–3 GiB** | id_mapper、deleted bitset、payload index、分片元数据、tokio task |
-| **1872 threads** 的栈 + async 状态 | ~0.3–0.8 GiB | 每线程 VM 预留 8 MiB 但 RSS 只算 touched 页 |
-| WAL / 小文件回写缓冲 | ~0.5 GiB | `wal_capacity_mb=32` × 活动分片 |
-| **合计** | **≈ 9–11 GiB anon + 1 GiB file ≈ 12 GiB** | ✅ 与实测吻合 |
-
-
-**结论一句话**:12 GiB 里**只有大约 2.4 GiB 是真正存业务向量的**,剩下的 ~10 GiB 全是"1847 × 4 段 = 7387 个 RocksDB + 每 collection 的元数据"这种**与数据量无关、只与 collection 数量线性相关的成本**。
-
----
-
-## 3. 根因:一个 ApeRAG Collection = 一个 Qdrant Collection
-
-当前实现位于 `aperag/vectorstore/qdrant_connector.py`:
-
-```102:112:aperag/vectorstore/qdrant_connector.py
- def create_collection(self, **kwargs: Any):
- vector_size = kwargs.get("vector_size")
- from qdrant_client.http import models as rest
-
- self.client.recreate_collection(
- collection_name=self.collection_name,
- vectors_config=rest.VectorParams(
- size=vector_size,
- distance=rest.Distance.COSINE,
- ),
- )
-```
-
-配合 `aperag/utils/utils.py::generate_vector_db_collection_name` 的映射:
-
-```44:46:aperag/utils/utils.py
-def generate_vector_db_collection_name(collection_id) -> str:
- return str(collection_id)
-```
-
-也就是:用户每建一个 ApeRAG Collection,ApeRAG 就调用 `recreate_collection` 创建一个同名 Qdrant Collection。
-
-这是 Qdrant **官方明确反对**的用法。Qdrant 的文档 [https://qdrant.tech/documentation/guides/multiple-partitions/](https://qdrant.tech/documentation/guides/multiple-partitions/) 开头第一段就在强调:
-
-> In many cases, it is more efficient to use a **single collection** with payload-based partitioning. This approach is called **multitenancy**.
-
-照当前趋势继续线性膨胀:
-
-
-| 阶段 | 用户 | ApeRAG Collection | Qdrant RSS 预期 |
-| ------ | ---- | ----------------- | -------------- |
-| 现状 | 1.6k | 2k | 12 GiB |
-| 2× 增长 | 3k | 4k | ~25 GiB |
-| 10× 增长 | 16k | 20k | 120+ GiB,单机不可行 |
-
-
-业务向量数据其实只涨了线性的一小段,**真正爆炸的是 collection 元数据**。
-
----
-
-## 4. 解决方案
-
-按 **收益/改动量** 排序分成三档。ABC 三档可以独立落地、互不阻塞。
-
-### A 档 · 立即可做(今天完成,预计省 3–5 GiB,零代码改动)
-
-#### A.1 清理"孤儿 + 已删除"的 Qdrant collection
-
-判定规则(**只删 pg 里已经 DELETED 或根本不存在的**,不动任何 ACTIVE 记录,即便它在 Qdrant 里是空的也保留,因为业务上用户可能还没来得及上传文档):
-
-```text
-待删 = { qdrant 里存在的 collection } ∩ ( { pg.collection.status='DELETED' } ∪ { pg 里不存在的孤儿 } )
-```
-
-该清理已经由本次治理配套的 subagent 执行,详见本文档末尾"附录 · 清理执行记录"。
-
-#### A.2 调整 Qdrant Server 侧默认值
-
-改动 Qdrant 的 `config.yaml`(通过 KubeBlocks 的 config 模板或启动环境变量注入均可):
-
-```yaml
-storage:
- optimizers:
- # 4 段 → 2 段,RocksDB 实例数直接减半
- default_segment_number: 2
-
- # 超过 20 MiB 的 segment 用 mmap 存储向量,冷数据交给 kernel page cache 管
- memmap_threshold_kb: 20480
-
- hnsw_index:
- # HNSW 图也落盘,查询路径会多一次 page fault 但内存占用大幅降低
- on_disk: true
-
- wal:
- # 绝大多数 collection 很小,32 MiB 的 WAL 段太奢侈
- wal_capacity_mb: 8
-```
-
-**落地位置**:
-
-- 仓库内自部署脚本:`deploy/databases/qdrant/values.yaml`(目前只有 CPU/mem/storage/version,需要新增一节覆盖 Qdrant config)。
-- 生产 helm values:`apecloud/aperag-values` 仓库内的 qdrant values 文件(该仓库独立维护,需要在 PR 里一起更新)。
-
-> 注意:`default_segment_number` 调整**不会自动应用到已存在的 collection**,需要对现有 collection 发起 `PATCH /collections/{name}` 或通过 optimizer 重建才能生效。建议在 A.1 清理之后,通过 `UpdateCollection` API 批量刷一遍。
-
-### B 档 · 中期(1–2 天,一次性砍掉 ~70% 内存,彻底解决扩展性)
-
-#### B.1 重新设计:多租户 = 单 Qdrant collection + payload 索引过滤
-
-##### B.1.1 Qdrant 多租户的官方模型
-
-Qdrant 为此专门提供了三项能力,缺一不可:
-
-1. **全局单 collection**:所有租户共用一个 Qdrant collection(例如 `aperag_vectors`)。原来每个 ApeRAG Collection 的 vector size 如果不一样,就按"向量维度 + 距离"分成少数几个全局 collection(例如 `aperag_vectors_1024_cosine`、`aperag_vectors_1536_cosine`)。
-2. **给 point 加上 tenant 维度的 payload 字段**:在每次 upsert 时,payload 里必须带 `collection_id`(ApeRAG Collection ID)。`point.id` 继续用现有的 chunk id 方案。
-3. **给 tenant 字段建 keyword 索引,并启用 tenant 优化**:
- ```http
- PUT /collections/aperag_vectors_1024_cosine/index
- {
- "field_name": "collection_id",
- "field_schema": {
- "type": "keyword",
- "is_tenant": true # 关键:告诉 Qdrant 这是多租户分区字段
- }
- }
- ```
- `is_tenant: true` 是 Qdrant **1.11+** 引入的特殊标记([官方发布说明](https://qdrant.tech/blog/qdrant-1.11.x/))。打开后,Qdrant 的 optimizer 会**按照 tenant 字段对点进行物理分组存储**(tenant 相同的点会被尽量放在同一个 segment 内),查询时相当于只扫描对应 tenant 的子集,性能和独立 collection 几乎等价。
- **⚠️ 版本兼容**:生产现在的 Qdrant 是 **1.10.0**,不支持 `is_tenant`。连接器在 `aperag/vectorstore/qdrant_connector.py::_ensure_tenant_payload_index` 里采用了兜底策略:优先尝试 `is_tenant=True`,失败时降级为普通 keyword 索引。在 1.10 上:
- - ✅ payload filter 本身完全工作,租户隔离语义严格成立;
- - ✅ "1847 个 collection → 1 个" 的合并收益完全拿到(~3–5 GiB 的 RocksDB 实例开销立刻消失);
- - ❌ segment 级 defragmentation 不会生效,tenant 点会混存在同一批 segment 中,查询时 HNSW 需要跨更多"别人的"节点。预期 p95 查询延迟有几十百分比的退化(退化量随全局 collection 的总点数线性增长)。
- **强烈建议**:把 Qdrant server 从 1.10.0 升到 1.11.x(或 1.12.x)作为本次停服窗口的一部分。升级只需改镜像 tag,不涉及数据迁移。升上去后,新建 collection 自动带 is_tenant 优化;**已有的全局 collection 上的索引不会自动升级**——需要在升级后重建索引:
-
-##### B.1.2 查询时的过滤模板
-
-所有 `search / query_points / scroll` 都必须带 tenant 过滤:
-
-```python
-hits = client.query_points(
- collection_name="aperag_vectors_1024_cosine",
- query=query_vector,
- query_filter=Filter(
- must=[
- FieldCondition(key="collection_id", match=MatchValue(value=ctx.collection))
- ]
- ),
- limit=top_k,
-)
-```
-
-##### B.1.3 代码落点
-
-需要修改的主要位置:
-
-
-| 文件 | 改动 |
-| ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `aperag/vectorstore/qdrant_connector.py` | ① 去掉每次 `create_collection` 调 `recreate_collection`;改成惰性启动时 `ensure_global_collection(vector_size, distance)`(存在则跳过,不存在则创建并建 `is_tenant` 索引)。② `search/upsert/delete` 全部带上 `collection_id` 过滤。③ 删点时用 `delete_points_by_filter({collection_id, ids})` 限定在当前 tenant 内。 |
-| `aperag/utils/utils.py::generate_vector_db_collection_name` | 废弃。替换为 `get_global_qdrant_collection_name(vector_size, distance)`。保留旧函数一段时间以便在"迁移中"回读兼容。 |
-| `aperag/tasks/collection.py::CollectionTask.create / delete` | 创建时不再 `create_collection`,只需确保全局 collection 存在;删除时改成"按 filter 批量删点"而不是 `delete_collection`。 |
-| `aperag/service/search_pipeline_service.py` 及 `aperag/index/*_index.py` | 每一处 `get_vector_db_connector(...)` 调用点都要把 ApeRAG collection id 作为 tenant filter 传入。 |
-| `aperag/config.py::get_vector_db_connector` | 建议新增签名 `get_vector_db_connector(collection_id: str)`,内部查询向量维度后路由到对应的全局 collection;在 ctx 里挂上 `tenant_key`、`tenant_value`。 |
-
-
-##### B.1.4 停服迁移策略(本次采用)
-
-考虑到停服窗口明确、时长可控,本次采用**停服一次性迁移**,避免双写的复杂度:
-
-1. **停服**:停掉 apiserver / celery worker / beat,Qdrant 仍然运行。
-2. **迁移**:跑 `scripts/migrate_qdrant_multitenancy.py`:
- - 自动枚举所有 `col` 源 collection,跳过孤儿(DB 里查不到的);
- - 为每个 `(vector_size, distance)` 组合创建 `aperag_vectors_{size}_{distance}` 全局 collection,写入完整配置(INT8 量化 / HNSW on_disk / 2 segments / mmap 阈值 / tenant index);
- - scroll 源 collection 所有点,把 `payload.collection_id = <源 collection 名>` 注入后 upsert 进全局 collection;
- - **默认不删源 collection**,需要显式 `--delete-old` 或第二阶段 `--only-delete`;
-3. **发布新版代码**,默认 `QDRANT_MULTITENANT=True`。
-4. **观察 24 h**:读路径(document chunks / search / retrieve)、写路径(新建 collection、新上传文档)都正常后;
-5. **清理**:再跑一次 `--only-delete` 把老的 `col` collection 删除。
-
-> **⚠️ 单向门警告**:一旦新版代码接收任何新写入(新建 collection、新上传文档、新 chat 产生的 chunk),`QDRANT_MULTITENANT` **不能简单切回 `False` 就算回滚**——那期间写入的新点只在 `aperag_vectors`_* 里,legacy 模式下看不到;同期新建的 ApeRAG Collection 也不会有对应的 `col` 物理 collection。如果必须回滚,需要反向迁移(从全局 collection 按 `collection_id` scroll 回每个 `col`)——这段逻辑目前**未实现**。所以:回滚窗口 = 新版代码上线到接收第一笔新写入之间的那几分钟。如果发现问题必须在窗口内决策。
-
-> **另一个单向影响**:是否在停服窗口里把 Qdrant server 顺带从 1.10.0 升到 1.11.x 也是一次决策——见 B.1.1。升级一次就回不去了(KubeBlocks qdrant 0.9.1 支持任意 tag 切换,但数据文件在 1.11 上会被 optimizer 重排)。
-
-##### B.1.5 预期收益
-
-
-| 项 | 现状 | 多租户化后 |
-| ------------------- | ------------ | ------------------------ |
-| Qdrant collection 数 | 1847 | ~1–3(按向量维度分) |
-| Segment 数 | 7387 | ~数十 |
-| RocksDB 实例数 | 7387 | ~数十 |
-| RSS(同等数据量) | 12 GiB | **~2–3 GiB** |
-| 再增长 10× 用户后的 RSS | 120 GiB(不可行) | **~10–15 GiB**(线性随真实数据量) |
-
-
-### C 档 · 锦上添花
-
-#### C.1 **默认开启标量量化(INT8)**
-
-对绝大多数 1024–1536 维的 embedding,INT8 的精度损失小于 1%,但可以把向量存储从 float32 压到 int8,**立刻省 4×**(2.4 GiB → ~0.6 GiB),并且 Qdrant 的量化实现会让向量主体走 mmap,压力进一步下降。
-
-**代码侧改造**(与 B.1.3 同批完成最省力):
-
-```python
-# aperag/vectorstore/qdrant_connector.py
-
-from qdrant_client.http import models as rest
-
-def _default_quantization_config() -> rest.QuantizationConfig:
- return rest.ScalarQuantization(
- scalar=rest.ScalarQuantizationConfig(
- type=rest.ScalarType.INT8,
- quantile=0.99, # 鲁棒分位裁剪,离群点不影响量化区间
- always_ram=True, # 量化后的向量留在 RAM,原 float 全量走 mmap
- )
- )
-
-def _default_hnsw_config() -> rest.HnswConfigDiff:
- return rest.HnswConfigDiff(
- m=16,
- ef_construct=100,
- on_disk=True, # HNSW 图落盘
- )
-
-def _default_optimizer_config() -> rest.OptimizersConfigDiff:
- return rest.OptimizersConfigDiff(
- default_segment_number=2,
- memmap_threshold=20000, # 单位 KB
- )
-```
-
-在 `ensure_global_collection` / 未来所有 `create_collection` 路径里默认带上这三项。
-
-**配置侧改造**(推荐按环境变量可覆盖):在 `aperag/config.py::Settings` 里新增:
-
-```python
-qdrant_enable_quantization: bool = Field(True, alias="QDRANT_ENABLE_QUANTIZATION")
-qdrant_quantization_type: str = Field("int8", alias="QDRANT_QUANTIZATION_TYPE")
-qdrant_hnsw_on_disk: bool = Field(True, alias="QDRANT_HNSW_ON_DISK")
-qdrant_default_segment_number: int = Field(2, alias="QDRANT_DEFAULT_SEGMENT_NUMBER")
-```
-
-然后在两处 values 里把默认值同步过去:
-
-1. `**deploy/aperag/values.yaml**`(当前仓库):在 `vars` 段追加:
- ```yaml
- QDRANT_ENABLE_QUANTIZATION: "true"
- QDRANT_QUANTIZATION_TYPE: "int8"
- QDRANT_HNSW_ON_DISK: "true"
- QDRANT_DEFAULT_SEGMENT_NUMBER: "2"
- ```
-2. `**apecloud/aperag-values**`(独立仓库,生产部署用):同步上面四个环境变量;Qdrant 子 chart 的 server-side 配置(`deploy/databases/qdrant/values.yaml` 等价项)也要加:
- ```yaml
- extra:
- config:
- storage:
- optimizers:
- default_segment_number: 2
- memmap_threshold_kb: 20480
- hnsw_index:
- on_disk: true
- wal:
- wal_capacity_mb: 8
- ```
- > aperag-values 仓库的 PR 需要和本仓库的 Settings/connector 代码改动**同步发版**,避免应用开了量化但旧 Qdrant 不支持的问题。
- >
- > **Qdrant 版本要求**:INT8 量化 / HNSW on_disk / 2 segments / mmap threshold 这些选项 **1.10.0 全部支持**,无需升级。但多租户 `**is_tenant=True`** 需要 **1.11+**;生产 1.10.0 会被连接器自动降级为普通 keyword 索引——功能正确但失去 segment 级 defragmentation。**建议在本次停服窗口里把 Qdrant 也升到 1.11+**(只改镜像 tag,无数据迁移),或在下一次停服窗口里升,并接受 1.11 之前的查询延迟退化。详见 B.1.1 的版本兼容说明。
-
-#### C.2 其他零散优化
-
-- `wal_capacity_mb: 8`(见 A.2):对于 ApeRAG 这种写入量并不高的场景,32 MiB 是过度预留。
-- `on_disk_payload: true`:现状已经是 true,保持即可。
-- 搜索侧:如果开启 INT8 量化,在 `search_params` 里加 `quantization: { rescore: true, oversampling: 2.0 }`,用重排恢复精度。
-
----
-
-## 5. 落地排期建议
-
-
-| 周 | 改动 | 负责方向 |
-| ---- | ----------------------------------------------------- | ----------- |
-| 本周 | A.1 清理(已由 subagent 执行)、A.2 Server 侧配置上线到 staging | DevOps + 后端 |
-| 本周末 | A.2 推到 prod,观察内存变化 48h | DevOps |
-| 次周 | B.1.3 代码改造 + C.1 量化默认开启,提 PR 到 ApeRAG & aperag-values | 后端 |
-| 次周末 | B.1.4 双写上线 staging | 后端 |
-| +2 周 | B.1.4 回填 + 切读 + 清理老 collection | 后端 + DevOps |
-
-
----
-
-## 6. 验证与监控
-
-做完后要有"三条线"的可观测能力:
-
-1. **Qdrant pod RSS**:`container_memory_working_set_bytes{pod="qdrant-cluster-qdrant-0"}`。目标:从 12 GiB 降至 ≤ 4 GiB。
-2. **Qdrant 自身 metrics**(暴露在 `/metrics`):`collections_total`、`pending_optimizations`、`segments_total`。
-3. **RAG 查询延迟/召回**:以现有测试集 replay,要求 recall@10 降幅 < 2%,p95 延迟增幅 < 20%(量化 + on_disk 的权衡)。
-
-把这三条线加到 Grafana 里,作为此次治理的验收门槛。
-
----
-
-## 附录 · 清理执行记录
-
-本次治理启动时,由 subagent 在 `ack-hong-kong` 集群执行了 A.1 的清理脚本。执行结果追加在本节:
-
-### 执行时间
-
-2026-04-20 20:44 CST(UTC+8),集群 `ack-hong-kong`,操作人:cleanup subagent(串行 REST DELETE,单条确认,无并发)。
-
-### 基线(清理前)
-
-
-| 项 | 值 |
-| ---------------------- | --------------------------------------------------------- |
-| Qdrant collection 总数 | **1847** |
-| PG `status='ACTIVE'` | 2003(其中 1833 已在 Qdrant、170 尚未上传文档) |
-| PG `status='DELETED'` | 179(其中 165 早就不在 Qdrant,仅 14 条残留) |
-| Qdrant `qdrant` 容器 RSS | **12 517 MiB**(约 12.2 GiB,`kubectl top pod --containers`) |
-
-
-### 待删清单分类
-
-严格按 `IN_QDRANT \ ACTIVE_IN_PG` 计算,共 **14** 条,全部来源于 PG `status='DELETED'`:
-
-
-| 分类 | 数量 |
-| -------------------------------------- | ------ |
-| DELETED 来源(PG 标记 DELETED 但 Qdrant 仍残留) | **14** |
-| 孤儿来源(Qdrant 有、PG 完全查不到) | **0** |
-
-
-完整列表:
-
-```
-col234abe498124212b col2de94bc540c00373 col364fc3b305b38d9d col428bbf8564ba165a
-col430ad0d0358e4787 col56460b3f4eb88690 col69178ab2362c6391 col78e7bbbe447907eb
-col8682c84fbda6333b col896e0ebd4d699b89 colafa077fd51475227 colc18fb93c44d9f2be
-colc49a291371cc17fc colf67799b2b73f391e
-```
-
-### 删除执行结果
-
-通过 `kubectl port-forward qdrant-cluster-qdrant-0 16333:6333` + `curl -X DELETE http://localhost:16333/collections/{name}` 逐条删除,串行、无并发,完成后关闭 port-forward。
-
-
-| 项 | 值 |
-| ----------------------------------------------------------- | --------------------------- |
-| 发起 DELETE 请求数 | 14 |
-| HTTP 200 且 `result=true` | 13 |
-| HTTP 200 但 `result=false`(Qdrant 端 "collection 已不存在" 的幂等响应) | 1(首条 `col234abe498124212b`) |
-| 真实失败(HTTP 非 200) | **0** |
-| 删除后抽检 14 个名字的 `GET /collections/{name}` | **全部 404**,确认已从 Qdrant 完全消失 |
-
-
-### 清理后状态
-
-
-| 项 | 值(立即) |
-| ---------------------- | ---------------------------------------- |
-| Qdrant collection 总数 | **1833**(-14) |
-| Qdrant `qdrant` 容器 RSS | **12 492 MiB**(约 12.2 GiB,立即值 ≈ -25 MiB) |
-
-
-> 说明:Qdrant 1.10 的 `DELETE /collections/{name}` 只把 collection 从 meta 中摘除,底层段文件回收 / mmap unmap 依赖 optimizer 下一轮调度,进程 RSS 的下降通常滞后。**本条记录为删除完成瞬时的 `kubectl top` 值,真实回收预计在 24 h 内体现。**
-
-### 本次清理的关键结论(重要)
-
-1. **历史上已清过一轮**:PG 有 179 条 `DELETED`,其中 **165 条在之前的治理中已经从 Qdrant 移除**,本次仅残留 14 条;**完全没有孤儿**(Qdrant 里每一个 collection 都能在 PG 里找到对应行)。
-2. **"空 collection 很多"并不等于"可删的很多"**:PG `ACTIVE` 2003 条中,有 170 条仍未在 Qdrant 写入(或写入了还未上传文档),按产品规则属于"用户已建未用"的合法状态,**绝不能删**。真正可清的"空 collection"只是 DELETED 残留(本次 14 条)。这解释了为什么 A.1 步骤实际可清理量远小于"空 collection 总数"。
-3. **A.1 对 12 GiB RSS 的直接收益非常有限**:本次仅摘除 14 个 collection 的 meta,即便 optimizer 回收完毕,预期可释放的 RSS 也远不到 GiB 级。**真正降低 Qdrant 内存占用要靠正文提到的 B 档(多租户化,单 collection + payload index)与 C 档(INT8 量化 + `always_ram=false`)**。A.1 的价值在于"把账对齐、防止 DELETED 残留继续累积"。
-4. **正文第 2 章「12 GiB 分账」依然成立**:清理后活跃 collection 从 1847 → 1833,仅下降 0.76%,原分账中"**1847 × collection 内部结构(HNSW 图 / payload index / 段 header / 通道缓冲)**"一行的量级与结论不变,无需改写正文。
-
-### 失败条目列表
-
-无真实失败。唯一的非典型响应(首条 `col234abe498124212b` 返回 `result:false`)为 Qdrant 端的幂等提示,对象事实上已不存在;抽检 `GET /collections/col234abe498124212b` 返回 404,最终状态正确。
-
----
-
-## 7. 附加改造:Embedding 模型锁定(本 PR 一并上线)
-
-### 7.1 为什么要锁
-
-多租户化之后,**物理 collection 由 `(vector_size, distance)` 唯一决定**
-(见 `global_collection_name()`)。如果允许用户在 Collection 创建后修改
-embedding model,会发生两类数据完整性问题:
-
-1. **维度切换**(e.g. `bge-m3@1024` → `text-embedding-3-large@3072`):
- 写入会被路由到新的 `aperag_vectors_3072_cosine`;但旧的
- `aperag_vectors_1024_cosine` 仍残留着该租户的全部历史向量,**永远不会被读到**,
- 也不会被 `delete_collection` 清理(因为 delete 路径用**当前**的 vector_size 选
- shard)。
-2. **同维度异模型**(e.g. `bge-large-zh@1024` → `bge-m3@1024`):物理上落同一
- shard,但两组向量在同一 HNSW 图里语义空间不兼容,召回质量会莫名退化,且
- `is_tenant` 优化也救不了(`is_tenant` 只按 tenant 分 segment,不区分模型)。
-
-这两种失败模式在单机 Qdrant 时代就已经存在,只是单机 Qdrant 写入时会因维度
-mismatch 直接报错("硬失败");多租户化后变成**软失败**——写入成功,但
-retrieval 完全错乱。所以必须从接口层直接禁止。
-
-### 7.2 改动点
-
-- **后端**:`aperag/service/collection_service.py::CollectionService._reject_embedding_change`
- - 在 `update_collection` 的开头调用,校验:
- - `embedding.model` 不变;
- - `embedding.model_service_provider` / `custom_llm_provider` 不变;
- - 已有 embedding 配置不可被 "清空"。
- - 任一不满足抛 `ValidationException`(映射到 HTTP 400)。
- - 首次绑定(老数据或初次创建后补填)仍然允许。
-- **前端**:`web/src/app/workspace/collections/collection-form.tsx`
- - `action === 'edit'` 时 `Select` 置 `disabled`;
- - 显示 Badge "创建后不可修改 / Locked after creation";
- - `FormDescription` 改为解释性文案;
- - **并跳过** `embeddingModelName` 的 `useEffect` watcher——否则若用户切到 "edit"
- 模式时模型清单里刚好没有原模型(比如对应 provider 已下架),watcher 会自动把
- 表单里的 model 改成列表第 0 项,提交时被后端校验拒绝,用户看起来像 "我啥也
- 没动就不让保存"。
-- **i18n**:新增两个 key `embedding_model_locked_badge`、`embedding_model_locked_description`
- (`page_collections.json` 中英文各一份)。
-
-### 7.3 为什么不在 OpenAPI schema 上强制
-
-考虑过在 Pydantic `CollectionUpdate`
-单独拷一份不含 `embedding` 的子 schema,但:
-
-- 现有 `CollectionUpdate = CollectionCreate` 的全量复用会被破坏;
-- 前端已经在 edit 模式下不提交 embedding 字段的 UX 由锁定逻辑保证;
-- 服务端显式报错反而比 "schema 层静默 drop 字段" 更友好(用户能看到具体原因)。
-
-因此采用 "schema 保持灵活 + service 层强校验" 的组合。
-
----
-
-## 8. 向量数据库全链路 Review 要点
-
-本节记录在实现多租户 + embedding 锁定过程中做的一次**全链路代码 review**,
-分成 "已修 / 已知有意为之 / 待跟进" 三档,避免以后踩同一个坑。
-
-### 8.1 已修(本 PR)
-
-**R1. Delete 路径在 embedding provider 下线时会把数据孤在 Qdrant 里**
-(`aperag/tasks/collection.py::_delete_vector_databases`)
-
-- 背景:多租户化后 `delete_collection` 依赖当前 `vector_size` 去选 shard。
- 若 provider 被下架 → `get_collection_embedding_service_sync` 抛异常 →
- 代码原本默默 fallback 到某个 "默认" shard,真实 shard 里的该租户向量永远留在那里。
-- 修复:新增 `QdrantVectorStoreConnector._purge_tenant_from_all_global_collections`,
- `delete_collection(purge_all_shards=True)` 时枚举所有 `aperag_vectors_*` 做
- `FilterSelector` 级别的点删除;`_delete_vector_databases` 在无法解析 vector_size
- 时走这个兜底路径。
-- 影响:删 Collection 再也不会留孤儿点。
-
-**R2. Node metadata 里 `collection_id` 可能缺失**
-(`aperag/llm/embed/embedding_utils.py::create_embeddings_and_store`)
-
-- 背景:多租户过滤依赖 payload 顶层的 `collection_id`(LlamaIndex 会把
- `node.metadata` 扁平化进 payload);但 `vector_index / summary_index` 历史上
- 只在 `extra_info` 里塞,没有在 `node.metadata` 里设,视觉 index 的两个路径
- 甚至完全绕过了 `create_embeddings_and_store`。
-- 修复:
- - 在三个 indexer 里显式 `part.metadata['collection_id'] = ...`;
- - 在 `create_embeddings_and_store` 里再补一次防御性注入(如缺就用 connector
- 的 `tenant_id`);
- - vision 两处直连 `store.add` 也补了相同字段。
-- 影响:即使上游忘设,多租户过滤依然命中正确 shard。
-
-**R3. `is_tenant=True` 在 Qdrant < 1.11 的兼容**
-
-- Qdrant 1.10(线上版本)不认 `is_tenant` 字段,直接 `400`。
-- 修复:`_ensure_tenant_payload_index` 三级 fallback:先带 `is_tenant` 试 →
- 捕获 → 不带 `is_tenant` 的 keyword 索引 → 再捕获 → 记 warning。
- 升级到 1.11+ 之后 tenant 级 defragmentation 自动生效,无需再改代码。
-
-**R4. `create_collection` 的初始化顺序**
-
-- 以前是先构造 `QdrantVectorStore(client, collection_name)`(LlamaIndex 里这步
- 会触发 `GET /collections/xxx`,collection 不存在时抛 warning/error),再
- `_ensure_collection`。
-- 改成 "先 ensure,后 wrap",去掉了一条噪声日志,也避免了冷启动窗口里
- 竞态读到 404。
-
-**R5. 迁移脚本的安全断言**
-
-- `scripts/migrate_qdrant_multitenancy.py` 在开头加了
- `assert generate_vector_db_collection_name(x) == str(x)`。
- 若未来改命名规则,脚本会立即停机而不是静默把数据写错 shard。
-
-### 8.2 已知但有意为之(这次不动)
-
-**K1. 每次 search 会重建 `QdrantClient`**
-(`aperag/service/search_pipeline_service.py` 的三个 `_*_search` 都会
-`VectorStoreConnectorAdaptor(ctx)` 一次)
-
-- 现状:`_ENSURED_COLLECTIONS` 进程级 set 避免了重复 `_ensure_collection`,
- 但 `QdrantClient` 本身每次都是新的。
-- 为什么先不动:(1) HTTP client 本身带 keep-alive,grpc 也有连接池,单查询
- overhead 可接受;(2) 做成单例需要把线程安全、刷新策略、tenant 切换等一起设计,
- 范围超出本 PR。
-- Follow-up:放到抽象层 M2(见 `vector_db_abstraction.md` §4.1)。
-
-**K2. `ContextManager._create_combined_filter` 硬编码 Qdrant filter 类型**
-(`aperag/context/context.py` 直接 import `qdrant_client.models`)
-
-- 典型抽象破口:任何后端切换都要在这里加分支。
-- 为什么先不动:单后端状态下,重构此处没有功能收益,反而引入回归风险。
-- Follow-up:抽象层 M2 中新增 `VectorFilter` DSL 后统一收敛。
-
-**K3. `retrieve()` 未进入基类**
-(`aperag/vectorstore/base.py` 里没有该抽象方法,但 `document_service.py` 直接
-调 `connector.retrieve(...)`)
-
-- 若未来切 pgvector/Milvus 会直接 `AttributeError`。
-- Follow-up:抽象层 M2 中补齐,顺便把 `with_vectors`、`with_payload` 这些参数
- 做成统一语义。
-
-**K4. Vision 索引绕过 `create_embeddings_and_store`,直连 `store.add(nodes)`**
-(`aperag/index/vision_index.py` 两处)
-
-- 目前我们在两处都手工补了 `metadata['collection_id']`,语义是对的,但
- "两份写路径" 带来的维护成本需要记住——将来 `create_embeddings_and_store`
- 的 metadata 约定变更时,vision 路径要同步修改,容易漏。
-- Follow-up:抽象层 M2 中把 "写点" 统一到 `connector.upsert(tenant, points)`,
- 彻底去掉 `store.add` 直连。
-
-**K5. 业务层 collection id 被直接当做 tenant id**
-
-- `QdrantVectorStoreConnector.__init__` 用 `ctx['collection']` 当
- `tenant_id`;`generate_vector_db_collection_name(collection_id) == str(collection_id)`
- 是幂等映射。
-- 这是当前正确、简单的做法;迁移脚本也断言了这点。记录在案,防止以后
- 有人把 "Qdrant collection name" 和 "ApeRAG collection id" 拆成两个不同字符串
- 时忘了同步断言。
-
-### 8.3 待跟进(不在本 PR 范围)
-
-**F1. `ContextManager` 过滤条件里 `doc_id` / `document_id` / `ref_doc_id` 三名共存**
-
-- 历史上 LlamaIndex 往 payload 里同时写 `doc_id` 和 `document_id`(不同版本命名),
- 部分过滤用 `doc_id`、部分用 `document_id`。
-- 当前 review 认为语义 OK(查询端兼容两套),但等 M2 抽象层时应该收敛到单一 key。
-
-**F2. `retrieve` 的 `with_payload=True` 语义差异**
-
-- Qdrant 的 "payload" ≈ pgvector 的 "payload JSONB";但 LlamaIndex 期望 payload 里
- 包含 `_node_content` 序列化字符串。未来实现 pgvector 后端时要注意把这种
- LlamaIndex 约定从"协议"降级成"Qdrant 后端实现细节"(见抽象层 §5.1)。
-
-**F3. 并发场景下 `_ENSURED_COLLECTIONS` 的 lock 粒度**
-
-- 当前 `threading.Lock()` 是进程级、全局单把锁。多个线程同时首次访问不同
- collection 时会串行化,但由于 `_ensure_collection` 本身幂等、只在冷启动触发
- 一次,实际不是瓶颈。观测到 QPS 上升再评估。
-
-**F4. Qdrant 1.10 → 1.11 升级窗口**
-
-- 升级后 `_ensure_tenant_payload_index` 的 "带 is_tenant" 路径会开始生效,此时
- 已经存在的 keyword 索引不会自动升级为 tenant-aware。建议升级操作脚本里
- 额外一步 `DELETE index → 重建 with is_tenant=True`(无数据影响,毫秒级)。
-
----
-
-## 9. 关联设计:向量数据库抽象层
-
-本次 Qdrant 优化已经暴露了三条抽象破口(见 §8.2 K1/K2/K3)。这些问题在
-单后端状态下可以接受,但一旦要支持 pgvector/Milvus 就会成为硬阻塞。
-
-详见同目录 [`vector_db_abstraction.md`](./vector_db_abstraction.md),该文档以
-当前代码事实为起点,给出了:
-
-- 三层分层(Transport / 能力抽象 / 调优);
-- 最小可行过滤 DSL;
-- Qdrant / pgvector / Milvus 三个后端的实现草图;
-- 路线图 M1 → M4;
-- 与本次 embedding 锁定的依赖关系(锁定是抽象层的前置条件)。
-
-**何时启动抽象层**:触发条件见 `vector_db_abstraction.md` §10。在触发之前,
-本文档和抽象层设计文档共同构成决策依据,不做任何代码层面的先行重构。
diff --git a/docs/zh-CN/design/tag_based_permission_design.md b/docs/zh-CN/design/tag_based_permission_design.md
deleted file mode 100644
index 7b111fad0..000000000
--- a/docs/zh-CN/design/tag_based_permission_design.md
+++ /dev/null
@@ -1,477 +0,0 @@
-# ApeRAG 标签系统与批量授权设计
-
-## 概述
-
-标签系统是一个轻量级的**分组工具**,用于批量管理用户和知识库的访问关系。
-
-**核心理念**:
-- 标签只用于分组,不参与权限判断
-- 批量授权 = 批量创建订阅记录
-- 完全复用现有的订阅机制
-- 只新增 2 张表,不修改现有逻辑
-- 通用的标签关系表,易于扩展
-
-## 业务场景
-
-### 场景 1:批量授权
-```
-需求:让研发团队(20人)访问技术文档库
-操作:
- 1. 给 20 人打上"研发团队"标签
- 2. 点击"授权给研发团队标签"
-结果:
- 系统批量创建 20 条订阅记录
-```
-
-### 场景 2:批量订阅
-```
-需求:新员工需要访问 10 个入门知识库
-操作:
- 1. 给新员工打上"新员工"标签
- 2. 给 10 个知识库打上"新员工必读"标签
- 3. 点击"批量订阅"
-结果:
- 创建订阅记录,新员工可访问这些知识库
-```
-
-### 场景 3:临时协作
-```
-需求:双十一项目组需要共享资料
-操作:
- 1. 创建"双十一项目"标签
- 2. 给项目成员打标签
- 3. 给项目知识库打标签
- 4. 批量授权
-结果:
- 项目组成员可访问项目知识库
-```
-
-## 系统架构
-
-### 整体流程
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Frontend │
-│ 用户管理:给用户打标签 │
-│ 知识库设置:给知识库打标签、批量授权 │
-└────────┬────────────────────────────────────────────────────┘
- │
- ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Service Layer │
-│ tag_service.py - 标签 CRUD │
-│ tag_relation_service.py - 打标签(通用) │
-│ batch_permission_service.py - 批量授权(新增) │
-│ ↓ 调用 │
-│ marketplace_service.py - 创建订阅记录(复用) │
-└────────┬────────────────────────────────────────────────────┘
- │
- ▼
-┌─────────────────────────────────────────────────────────────┐
-│ PostgreSQL │
-│ tag (新增) - 标签表 │
-│ tag_relation (新增) - 标签关系表(通用) │
-│ user_collection_subscription - 订阅表(复用,权限的实际存储) │
-└─────────────────────────────────────────────────────────────┘
-```
-
-### 批量授权流程
-
-```
-用户点击"授权给研发团队标签"
- ↓
-查询标签下的所有用户
-SELECT target_id FROM tag_relation
-WHERE tag_id='研发团队' AND target_type='user'
- ↓
-过滤已有订阅的用户
- ↓
-批量创建订阅记录
-INSERT INTO user_collection_subscription ...
- ↓
-清除权限缓存
- ↓
-返回结果:成功 15人,已有权限 3人
-```
-
-## 数据模型
-
-### 新增表(2张)
-
-#### 1. tag - 标签表
-```sql
-CREATE TABLE tag (
- id VARCHAR(24) PRIMARY KEY,
- name VARCHAR(50) NOT NULL,
- description VARCHAR(200),
- user VARCHAR(256) NOT NULL, -- 创建者
- gmt_created TIMESTAMP NOT NULL,
- gmt_updated TIMESTAMP NOT NULL,
- gmt_deleted TIMESTAMP
-);
-
--- 同一创建者下标签名称唯一
-CREATE UNIQUE INDEX uq_tag_name_user ON tag(name, user, gmt_deleted);
-CREATE INDEX idx_tag_user ON tag(user);
-```
-
-**字段说明**:
-- `name`: 标签名称,如"研发团队"、"新员工必读"
-- `description`: 标签描述,可选
-- `user`: 创建者,系统管理员创建的标签
-
-#### 2. tag_relation - 标签关系表(通用)
-```sql
-CREATE TABLE tag_relation (
- id VARCHAR(24) PRIMARY KEY,
- tag_id VARCHAR(24) NOT NULL,
- target_type VARCHAR(20) NOT NULL, -- 'user', 'collection', 'document', 'bot' ...
- target_id VARCHAR(24) NOT NULL,
- gmt_created TIMESTAMP NOT NULL,
- FOREIGN KEY (tag_id) REFERENCES tag(id) ON DELETE CASCADE
-);
-
--- 防止重复打标签
-CREATE UNIQUE INDEX uq_tag_relation ON tag_relation(tag_id, target_type, target_id);
-
--- 查询某个资源的所有标签
-CREATE INDEX idx_tag_relation_target ON tag_relation(target_type, target_id);
-
--- 查询某个标签关联的所有资源(批量操作关键索引)
-CREATE INDEX idx_tag_relation_tag_type ON tag_relation(tag_id, target_type);
-```
-
-**字段说明**:
-- `tag_id`: 标签ID
-- `target_type`: 目标资源类型,当前支持:
- - `user`: 用户
- - `collection`: 知识库
- - 未来可扩展:`document`, `bot`, `chat` 等
-- `target_id`: 目标资源ID
-
-**扩展性**:
-```sql
--- 给用户打标签
-INSERT INTO tag_relation (tag_id, target_type, target_id)
-VALUES ('tag_123', 'user', 'user_456');
-
--- 给知识库打标签
-INSERT INTO tag_relation (tag_id, target_type, target_id)
-VALUES ('tag_123', 'collection', 'col_789');
-
--- 未来给文档打标签
-INSERT INTO tag_relation (tag_id, target_type, target_id)
-VALUES ('tag_123', 'document', 'doc_abc');
-```
-
-### 复用表(不修改)
-
-- `user_collection_subscription` - 订阅表(权限的实际存储)
-- `collection` - 知识库表
-- `user` - 用户表
-
-### 表关系
-
-```
-tag (1) ───────< (N) tag_relation
- │
- ├─── target_type='user' ──> user (N)
- ├─── target_type='collection' ──> collection (N)
- └─── target_type='document' ──> document (N) // 未来
-```
-
-## API 设计
-
-### 1. 标签管理
-
-```http
-# 创建标签
-POST /api/v1/tags
-{
- "name": "研发团队",
- "description": "研发相关人员"
-}
-
-# 获取标签列表
-GET /api/v1/tags?search=研发&page=1&page_size=20
-
-# 更新标签
-PUT /api/v1/tags/{tag_id}
-{
- "name": "研发团队(新)",
- "description": "更新后的描述"
-}
-
-# 删除标签(会级联删除所有 tag_relation 记录)
-DELETE /api/v1/tags/{tag_id}
-```
-
-### 2. 用户标签
-
-```http
-# 给用户添加标签
-POST /api/v1/users/{user_id}/tags
-{
- "tag_ids": ["tag_abc123", "tag_def456"]
-}
-
-# 移除用户标签
-DELETE /api/v1/users/{user_id}/tags/{tag_id}
-
-# 获取用户的所有标签
-GET /api/v1/users/{user_id}/tags
-
-# 获取标签下的所有用户
-GET /api/v1/tags/{tag_id}/users
-```
-
-### 3. 知识库标签
-
-```http
-# 给知识库添加标签
-POST /api/v1/collections/{collection_id}/tags
-{
- "tag_ids": ["tag_abc123"]
-}
-
-# 移除知识库标签
-DELETE /api/v1/collections/{collection_id}/tags/{tag_id}
-
-# 获取知识库的所有标签
-GET /api/v1/collections/{collection_id}/tags
-```
-
-### 4. 批量授权(核心)
-
-```http
-# 批量授权知识库给用户标签
-POST /api/v1/collections/{collection_id}/grant-to-tag
-{
- "tag_id": "tag_abc123"
-}
-
-Response:
-{
- "collection_id": "col_123",
- "tag_id": "tag_abc123",
- "tag_name": "研发团队",
- "total_users": 18,
- "new_granted": 15,
- "already_granted": 3,
- "failed": 0
-}
-
-# 批量订阅知识库标签
-POST /api/v1/users/me/subscribe-tag
-{
- "tag_id": "tag_def456" // 知识库标签
-}
-
-Response:
-{
- "tag_id": "tag_def456",
- "tag_name": "新员工必读",
- "total_collections": 10,
- "new_subscribed": 8,
- "already_subscribed": 2
-}
-
-# 批量撤销授权
-POST /api/v1/collections/{collection_id}/revoke-from-tag
-{
- "tag_id": "tag_abc123"
-}
-```
-
-## 实现要点
-
-### 1. 批量授权实现
-
-```python
-def grant_collection_to_user_tag(collection_id: str, tag_id: str, operator_id: str):
- # 1. 权限检查:只有 owner 可以授权
- collection = get_collection(collection_id)
- if collection.user != operator_id:
- raise PermissionDenied()
-
- # 2. 查询标签下的所有用户
- user_ids = db.query(TagRelation.target_id).filter(
- TagRelation.tag_id == tag_id,
- TagRelation.target_type == 'user'
- ).all()
-
- # 3. 过滤已有订阅的用户
- existing = db.query(UserCollectionSubscription.user_id).filter(
- UserCollectionSubscription.user_id.in_(user_ids),
- # ... 其他条件
- ).all()
-
- # 4. 批量创建订阅记录
- new_user_ids = [u for u in user_ids if u not in existing]
- for user_id in new_user_ids:
- create_subscription(user_id, collection_id)
-
- # 5. 清除权限缓存
- clear_permission_cache(new_user_ids, collection_id)
-
- return {
- 'total': len(user_ids),
- 'new_granted': len(new_user_ids),
- 'already_granted': len(existing)
- }
-```
-
-### 2. 权限检查(不变)
-
-```python
-def check_permission(user_id: str, collection_id: str) -> bool:
- """
- 权限检查逻辑保持不变,不查询标签表
- """
- # 1. 检查 owner
- if collection.user == user_id:
- return True
-
- # 2. 检查订阅(包括通过标签批量授权创建的订阅)
- subscription = db.query(UserCollectionSubscription).filter(
- UserCollectionSubscription.user_id == user_id,
- # ... 其他条件
- ).first()
-
- if subscription:
- return True
-
- # 3. 检查公开知识库
- if collection.is_published:
- return True
-
- return False
-```
-
-### 3. 性能优化
-
-**批量插入优化**:
-```python
-# 使用 bulk_insert_mappings 批量插入
-subscriptions = [
- {
- 'id': f'sub_{random_id()}',
- 'user_id': user_id,
- 'collection_marketplace_id': marketplace_id,
- 'gmt_subscribed': utc_now()
- }
- for user_id in new_user_ids
-]
-db.bulk_insert_mappings(UserCollectionSubscription, subscriptions)
-db.commit()
-```
-
-**缓存策略**:
-```python
-# 缓存标签成员列表
-redis.setex(f'tag_users:{tag_id}', 1800, json.dumps(user_ids))
-
-# 缓存权限检查结果
-redis.setex(f'permission:{user_id}:{collection_id}', 300, 'true')
-```
-
-## 权限控制
-
-### 操作权限
-
-| 操作 | 权限要求 |
-|------|---------|
-| 创建标签 | is_superuser=true |
-| 给用户打标签 | is_superuser=true |
-| 给知识库打标签 | 知识库 owner |
-| 批量授权 | 知识库 owner |
-| 批量订阅 | 任何用户(订阅自己) |
-
-### 安全限制
-
-```python
-# 批量操作数量限制
-MAX_BATCH_SIZE = 1000
-
-if user_count > MAX_BATCH_SIZE:
- raise BadRequest("Too many users, max 1000")
-
-# 审计日志
-audit_log.record(
- user_id=operator_id,
- action='batch_grant_to_tag',
- details={
- 'tag_id': tag_id,
- 'affected_users': 15
- }
-)
-```
-
-## 迁移方案
-
-### 数据库迁移
-
-```sql
--- 1. 创建标签表
-CREATE TABLE tag (
- id VARCHAR(24) PRIMARY KEY,
- name VARCHAR(50) NOT NULL,
- description VARCHAR(200),
- user VARCHAR(256) NOT NULL,
- gmt_created TIMESTAMP NOT NULL,
- gmt_updated TIMESTAMP NOT NULL,
- gmt_deleted TIMESTAMP
-);
-
--- 2. 创建标签关系表(通用)
-CREATE TABLE tag_relation (
- id VARCHAR(24) PRIMARY KEY,
- tag_id VARCHAR(24) NOT NULL,
- target_type VARCHAR(20) NOT NULL,
- target_id VARCHAR(24) NOT NULL,
- gmt_created TIMESTAMP NOT NULL,
- FOREIGN KEY (tag_id) REFERENCES tag(id) ON DELETE CASCADE
-);
-
--- 3. 创建索引
-CREATE UNIQUE INDEX uq_tag_name_user ON tag(name, user, gmt_deleted);
-CREATE INDEX idx_tag_user ON tag(user);
-CREATE UNIQUE INDEX uq_tag_relation ON tag_relation(tag_id, target_type, target_id);
-CREATE INDEX idx_tag_relation_target ON tag_relation(target_type, target_id);
-CREATE INDEX idx_tag_relation_tag_type ON tag_relation(tag_id, target_type);
-
--- 完成,无需数据迁移
-```
-
-### 部署步骤
-
-1. **Phase 1**:部署后端 API
-2. **Phase 2**:前端添加标签管理 UI
-3. **Phase 3**:灰度测试
-4. **Phase 4**:全量上线
-
-完全兼容现有系统,无需修改现有代码。
-
-## 未来扩展(可选)
-
-1. **智能推荐**:根据用户行为推荐标签
-2. **标签模板**:预定义常用标签组合,一键应用
-3. **统计分析**:标签使用情况和趋势分析
-4. **外部同步**:与钉钉/LDAP 同步组织架构
-5. **审批流程**:批量授权触发审批流程
-
-## 总结
-
-**核心特点**:
-- 标签只是分组工具,不参与权限判断
-- 批量授权 = 批量创建订阅记录
-- 只新增 2 张表,完全兼容现有系统
-- 通用的 tag_relation 表,易于扩展到其他资源
-- 实现简单,易于维护
-
-**实施路径**:
-1. 实现标签 CRUD
-2. 实现批量授权核心功能
-3. 优化性能和体验
-4. 根据反馈迭代
diff --git a/docs/zh-CN/design/url_and_text_import_design.md b/docs/zh-CN/design/url_and_text_import_design.md
deleted file mode 100644
index 47aaa983a..000000000
--- a/docs/zh-CN/design/url_and_text_import_design.md
+++ /dev/null
@@ -1,590 +0,0 @@
----
-title: URL 与文本导入设计
-position: 4
----
-
-# Collection 文档导入扩展:URL 抓取与文本粘贴
-
-## 概述
-
-本文档描述在 ApeRAG Collection 中新增两种文档来源方式的设计:
-
-1. **URL 导入**:用户输入网址,系统自动调用 `web/read` 接口抓取页面内容,生成 Markdown 文件,走现有两阶段上传流程入库。
-2. **文本导入**:用户在前端粘贴文本,前端直接将其封装为 `.txt` 文件,调用现有上传接口,**完全无需新增后端代码**。
-
-两种方式都只是"给现有上传流程提供文件内容的方式",confirm 及后续索引构建完全复用现有逻辑。
-
-> **范围说明**:本期不包含"根据文字搜索网络并导入"功能,但架构设计保留此扩展空间。
-
----
-
-## 设计原则
-
-> URL 抓取和文本粘贴只是"选择文件"的替代方式。
-
-一旦内容到手(Markdown 字符串 / 文本字符串),它就被包装成一个虚拟文件,走与普通文件上传完全相同的路径:
-
-```
-[来源] [统一入口] [后续流程(不变)]
-文件选择 ──────────► upload_document() ──► UPLOADED ──► confirm ──► 索引构建
-URL 抓取 ──────────►(虚拟 UploadFile)
-文本粘贴 ──────────►(前端 File 对象)
-```
-
----
-
-## 现状与可复用组件
-
-### 现有两阶段上传流程
-
-```
-Step 1: POST /collections/{id}/documents/upload → status = UPLOADED(临时)
-Step 2: POST /collections/{id}/documents/confirm → status = PENDING → 触发索引构建
-```
-
-URL 导入和文本导入都将产出 `UPLOADED` 状态的文档,与文件上传无缝衔接。
-
-### 关键可复用组件
-
-| 组件 | 位置 | 如何复用 |
-|------|------|---------|
-| 文档上传服务 | `aperag/service/document_service.py` → `upload_document()` | URL 导入后端调用此方法存储抓取内容 |
-| 文档确认服务 | `document_service.confirm_documents()` | 完全不变 |
-| Web Read 接口 | `POST /api/v1/web/read`(`aperag/views/web.py`) | URL 导入后端通过 HTTP 调用此接口 |
-| ReaderService | `aperag/websearch/reader/reader_service.py` | web/read 的底层实现(JINA + Trafilatura fallback) |
-| 文档列表页暂存区 | `document-upload.tsx` | URL/文本产出的 `UPLOADED` 文档自动出现在此列表 |
-
----
-
-## 架构设计
-
-### 总体流程
-
-```
-┌─────────────────────────────────────────────────────────────────┐
-│ Frontend (Next.js) │
-│ │
-│ ┌──────────────┐ ┌─────────────────┐ ┌──────────────────┐ │
-│ │ 📁 选择文件 │ │ 🔗 输入网址 │ │ 📋 粘贴文字 │ │
-│ │ (现有) │ │ (新增) │ │ (新增) │ │
-│ └──────┬───────┘ └────────┬────────┘ └────────┬─────────┘ │
-│ │ │ │ │
-│ │ POST /fetch-url │ │
-│ │ │ new File([text], "x.txt") │
-│ │ │ │ │
-│ └───────────────────┴────────────────────┘ │
-│ │ │
-│ POST /documents/upload │
-│ (现有接口,所有来源统一入口) │
-└─────────────────────────────┬───────────────────────────────────┘
- │
- ▼
- ┌───────────────────────────────┐
- │ document_service │
- │ .upload_document() │
- │ status = UPLOADED │
- └───────────────┬───────────────┘
- │
- 用户在暂存区点击"保存到知识库"
- │
- POST /documents/confirm
- │
- ▼
- ┌───────────────────────────────┐
- │ document_service │
- │ .confirm_documents() │
- │ status: UPLOADED → PENDING │
- │ 创建 DocumentIndex 记录 │
- │ 触发 reconcile 任务 │
- └───────────────────────────────┘
-```
-
----
-
-## URL 导入详细设计
-
-### 新增后端接口
-
-**接口**:`POST /api/v1/collections/{collection_id}/documents/fetch-url`
-
-此接口是本次需求唯一新增的后端接口。
-
-**职责**:
-1. 接收用户提交的 URL 列表
-2. 通过 HTTP 调用内部 `POST /api/v1/web/read` 接口抓取页面内容
-3. 将每个 URL 的抓取结果包装为虚拟 `UploadFile` 对象
-4. 调用现有 `document_service.upload_document()` 存储到对象存储(status=UPLOADED)
-5. 返回创建的 `document_id` 列表,前端将其合并到暂存区
-
-**Request Body**:
-
-```json
-{
- "urls": [
- "https://example.com/article1",
- "https://example.com/article2"
- ]
-}
-```
-
-| 字段 | 类型 | 必填 | 约束 |
-|------|------|------|------|
-| `urls` | `string[]` | ✅ | 1~10 个,必须是合法 http/https URL |
-
-**Response**(200 OK):
-
-```json
-{
- "documents": [
- {
- "id": "doc_abc123",
- "name": "示例文章标题.md",
- "status": "UPLOADED",
- "size": 8192,
- "url": "https://example.com/article1",
- "fetch_status": "success"
- },
- {
- "id": null,
- "name": null,
- "status": null,
- "url": "https://example.com/article2",
- "fetch_status": "error",
- "error": "页面无法访问(403)"
- }
- ],
- "total": 2,
- "succeeded": 1,
- "failed": 1
-}
-```
-
-**说明**:
-- 接口同步执行(URL 数量限制在 10 个以内,抓取过程在接口请求内完成)
-- 部分 URL 失败不影响其他 URL,前端针对失败项目展示错误信息
-- 成功的文档处于 `UPLOADED` 状态,出现在前端暂存区供用户确认
-
-### 后端实现逻辑
-
-在 `aperag/views/collections.py` 中新增路由函数(约 60 行):
-
-```python
-@router.post("/collections/{collection_id}/documents/fetch-url", tags=["documents"])
-@audit(resource_type="document", api_name="FetchUrlDocument")
-async def fetch_url_document_view(
- request: Request,
- collection_id: str,
- body: view_models.FetchUrlRequest,
- user: User = Depends(required_user),
-) -> view_models.FetchUrlResponse:
- """
- Fetch web page content from URLs and create UPLOADED documents.
-
- Internally calls POST /api/v1/web/read to retrieve page content,
- then wraps each result as a virtual UploadFile and calls
- document_service.upload_document() to persist as UPLOADED documents.
- """
- results = []
-
- # Step 1: Call web/read service layer (via HTTP to /api/v1/web/read)
- web_read_request = WebReadRequest(url_list=body.urls, timeout=30)
- web_read_response = await _call_web_read(web_read_request, user)
-
- # Step 2: For each result, wrap as UploadFile and call upload_document()
- for item in web_read_response.results:
- if item.status != "success" or not item.content:
- results.append(FetchUrlResultItem(
- url=item.url,
- fetch_status="error",
- error=item.error or "Failed to fetch content",
- ))
- continue
-
- # Determine filename from page title or URL
- filename = _url_to_filename(item.title, item.url)
-
- # Wrap Markdown content as a virtual UploadFile
- virtual_file = _make_upload_file(filename, item.content.encode("utf-8"))
-
- try:
- doc = await document_service.upload_document(
- user=str(user.id),
- collection_id=collection_id,
- file=virtual_file,
- extra_metadata={"source_url": item.url, "source_type": "url"},
- )
- results.append(FetchUrlResultItem(
- url=item.url,
- fetch_status="success",
- document=doc,
- ))
- except Exception as e:
- results.append(FetchUrlResultItem(
- url=item.url,
- fetch_status="error",
- error=str(e),
- ))
-
- return FetchUrlResponse(
- documents=results,
- total=len(results),
- succeeded=sum(1 for r in results if r.fetch_status == "success"),
- failed=sum(1 for r in results if r.fetch_status == "error"),
- )
-```
-
-### 调用 web/read 的方式
-
-采用**调用 Service 层**而非发起内部 HTTP 请求,直接复用 `web.py` 中的 `_read_with_jina_fallback` / `_read_with_trafilatura_only` 私有函数(或将其提取为 `reader_service` 的共享方法):
-
-```python
-async def _call_web_read(request: WebReadRequest, user: User) -> WebReadResponse:
- """Call web read service layer, with JINA + Trafilatura fallback."""
- jina_api_key = await _get_user_jina_api_key(user)
- if jina_api_key:
- return await _read_with_jina_fallback(request, jina_api_key)
- else:
- return await _read_with_trafilatura_only(request)
-```
-
-> 这些私有函数已在 `aperag/views/web.py` 中实现,将其提取到 `aperag/websearch/reader/reader_service.py` 的公共方法即可被两处复用,保持模块化边界。
-
----
-
-## 文本导入详细设计
-
-### 零后端改动
-
-文本导入**不需要新增任何后端接口**。前端在客户端将用户粘贴的文本封装为标准的 `File` 对象,调用现有上传接口:
-
-```typescript
-// web/src/app/.../import/text-import.tsx
-
-const handleImport = async () => {
- const filename = title.trim() ? `${title.trim()}.txt` : `note-${Date.now()}.txt`;
- const file = new File([textContent], filename, { type: "text/plain" });
-
- // Reuse existing upload API — no new backend endpoint needed
- const response = await apiClient.defaultApi.collectionsCollectionIdDocumentsUploadPost({
- collectionId: collection.id,
- file,
- });
-
- // Add to staging area (same as file upload)
- onDocumentUploaded(response.data);
-};
-```
-
-这样文本文档与文件上传产出完全相同的结果(`UPLOADED` 状态的 Document),在暂存区中一视同仁。
-
----
-
-## 前端设计
-
-### 入口 Dialog
-
-在文档列表页的"添加文档"按钮点击后,显示来源选择 Dialog:
-
-```
-┌──────────────────────────────────────────────────┐
-│ 向知识库中添加文档 [×] │
-├──────────────────────────────────────────────────┤
-│ │
-│ ┌────────────────┐ ┌──────────┐ ┌──────────┐ │
-│ │ 📁 │ │ 🔗 │ │ 📋 │ │
-│ │ 上传文件 │ │ 网址 │ │ 粘贴文字 │ │
-│ └────────────────┘ └──────────┘ └──────────┘ │
-│ │
-└──────────────────────────────────────────────────┘
-```
-
-### 网址导入表单(`url-import.tsx`)
-
-```
-┌──────────────────────────────────────────────────┐
-│ ← 网址导入 [×] │
-├──────────────────────────────────────────────────┤
-│ 粘贴网址,系统将自动抓取页面内容导入知识库。 │
-│ │
-│ ┌──────────────────────────────────────────┐ │
-│ │ https://example.com/article1 │ │
-│ │ https://example.com/article2 │ │
-│ │ │ │
-│ └──────────────────────────────────────────┘ │
-│ │
-│ • 每行输入一个网址,最多 10 个 │
-│ • 仅支持公开可访问的网页 │
-│ • 需要登录的页面无法抓取 │
-│ │
-│ [取消] [抓取并添加] │
-└──────────────────────────────────────────────────┘
-```
-
-点击"抓取并添加"后:
-1. 调用 `POST /collections/{id}/documents/fetch-url`
-2. 成功的 URL 对应文档出现在上传暂存区(同文件上传)
-3. 失败的 URL 在 Dialog 内以红色错误信息展示
-4. Dialog 关闭,用户在暂存区一起 confirm
-
-### 粘贴文字表单(`text-import.tsx`)
-
-```
-┌──────────────────────────────────────────────────┐
-│ ← 粘贴文字 [×] │
-├──────────────────────────────────────────────────┤
-│ 粘贴文字内容,即可将其导入知识库。 │
-│ │
-│ 标题(可选) │
-│ ┌──────────────────────────────────────────┐ │
-│ │ 我的笔记 │ │
-│ └──────────────────────────────────────────┘ │
-│ │
-│ 内容 │
-│ ┌──────────────────────────────────────────┐ │
-│ │ 在此处粘贴文字… │ │
-│ │ │ │
-│ └──────────────────────────────────────────┘ │
-│ │
-│ [取消] [添加] │
-└──────────────────────────────────────────────────┘
-```
-
-点击"添加"后:
-1. 前端创建 `new File([content], "${title}.txt")` 对象
-2. 调用现有 `POST /documents/upload` 接口(无感知)
-3. 文档出现在上传暂存区
-4. Dialog 关闭,用户在暂存区 confirm
-
-### 前端暂存区(扩展现有 `document-upload.tsx`)
-
-现有暂存区已支持展示所有 `UPLOADED` 状态文档。URL/文本产出的文档与文件上传文档合并展示,confirm 操作完全不变:
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ 暂存区(待确认) │
-├───────────────────────────────────────┬──────┬─────────────┤
-│ 文档名 │ 大小 │ 状态 │
-├───────────────────────────────────────┼──────┼─────────────┤
-│ 📄 user_manual.pdf │ 5 MB │ ✅ 已上传 │
-│ 🌐 示例文章标题.md(来自 URL) │ 8 KB │ ✅ 已上传 │
-│ 📝 我的笔记.txt │ 2 KB │ ✅ 已上传 │
-│ 🌐 example.com/article2(来自 URL) │ — │ ❌ 抓取失败 │
-└───────────────────────────────────────┴──────┴─────────────┘
- [保存到知识库(3 个文档)]
-```
-
-### 前端组件结构
-
-```
-web/src/app/workspace/collections/[collectionId]/documents/
-├── page.tsx # 文档列表页(现有,加"添加"按钮入口)
-└── upload/
- ├── page.tsx # 上传页(现有)
- ├── document-upload.tsx # 暂存区组件(现有,几乎不改)
- └── import/
- ├── import-dialog.tsx # 来源选择 Dialog(新增)
- ├── url-import.tsx # URL 输入表单(新增)
- └── text-import.tsx # 文本粘贴表单(新增)
-```
-
----
-
-## 新增代码量统计
-
-| 位置 | 改动类型 | 估计行数 |
-|------|---------|---------|
-| Pydantic document schema | 新增 `fetchUrlRequest/Response` schema | ~40 行 |
-| FastAPI collections router | 新增 `/fetch-url` 路径 | ~30 行 |
-| `aperag/views/collections.py` | 新增一个路由函数 | ~60 行 |
-| `aperag/websearch/reader/reader_service.py` | 将私有函数提取为公共方法 | ~20 行重构 |
-| `web/src/...import/import-dialog.tsx` | 新建组件 | ~60 行 |
-| `web/src/...import/url-import.tsx` | 新建组件 | ~80 行 |
-| `web/src/...import/text-import.tsx` | 新建组件 | ~70 行 |
-| `web/src/.../document-upload.tsx` | 增加"添加来源"入口触发 | ~10 行 |
-| i18n 文件 | 新增翻译 key | ~20 行 |
-| **合计** | | **~390 行** |
-
-**不需要改动的部分(完全复用)**:
-- `document_service.upload_document()` — 无改动
-- `document_service.confirm_documents()` — 无改动
-- Celery 任务 / 索引构建流程 — 无改动
-- 文档列表页、暂存区主逻辑 — 几乎无改动
-
----
-
-## API Schema 定义
-
-在 Pydantic document schema 中新增:
-
-```yaml
-fetchUrlRequest:
- type: object
- properties:
- urls:
- type: array
- items:
- type: string
- format: uri
- minItems: 1
- maxItems: 10
- description: List of URLs to fetch content from
- example:
- - "https://example.com/article1"
- - "https://example.com/article2"
- required:
- - urls
-
-fetchUrlResultItem:
- type: object
- properties:
- url:
- type: string
- description: The source URL
- fetch_status:
- type: string
- enum: ["success", "error"]
- document:
- $ref: '#/Document'
- description: Created document (only present on success)
- error:
- type: string
- description: Error message (only present on failure)
- required:
- - url
- - fetch_status
-
-fetchUrlResponse:
- type: object
- properties:
- documents:
- type: array
- items:
- $ref: '#/fetchUrlResultItem'
- total:
- type: integer
- succeeded:
- type: integer
- failed:
- type: integer
- required:
- - documents
- - total
- - succeeded
- - failed
-```
-
----
-
-## 错误处理
-
-| 场景 | 处理位置 | 处理方式 |
-|------|---------|---------|
-| URL 格式非法(非 http/https) | 后端校验 | 400,跳过该 URL |
-| URL 数量 > 10 | 后端校验 | 400,整体拒绝 |
-| URL 页面无法访问(4xx/5xx) | web/read 返回 error | 在响应中标记该 URL 失败 |
-| 抓取超时(>30s) | web/read 超时 | 同上 |
-| 文档名冲突 | `upload_document()` 抛出异常 | 自动追加序号或返回已存在文档(幂等) |
-| 配额超限 | `upload_document()` 抛出异常 | 400,停止处理剩余 URL |
-| 文本内容为空 | 前端校验 | 禁用"添加"按钮 |
-
----
-
-## 数据完整性
-
-URL 导入的文档在 `doc_metadata` 中记录来源信息,便于追溯和未来的定时刷新功能:
-
-```json
-{
- "source_type": "url",
- "source_url": "https://example.com/article",
- "page_title": "示例文章标题",
- "fetched_at": "2026-03-05T10:00:00Z",
- "object_path": "user-xxx/col_xxx/doc_xxx/original.md"
-}
-```
-
-文本导入的文档:
-
-```json
-{
- "source_type": "text",
- "object_path": "user-xxx/col_xxx/doc_xxx/original.txt"
-}
-```
-
----
-
-## 与现有功能对比
-
-| 维度 | 文件上传 | URL 导入 | 文本导入 |
-|------|----------|---------|---------|
-| 内容获取方式 | 用户本地文件 | 后端调用 web/read 服务抓取 | 前端直接创建 File 对象 |
-| 新增后端接口 | — | 1 个(`/fetch-url`) | **0 个** |
-| 初始文档状态 | `UPLOADED` | `UPLOADED` | `UPLOADED` |
-| 确认流程 | `POST /documents/confirm` | 同左(完全复用) | 同左(完全复用) |
-| 索引构建 | 现有 Celery 任务 | 同左(完全复用) | 同左(完全复用) |
-| 文件格式 | 各种格式 | `.md`(Markdown) | `.txt` |
-
----
-
-## 未来扩展
-
-1. **网络搜索导入**:用户输入关键词 → 调用 `web/search` → 获取 URL 列表 → 复用 `/fetch-url` 接口批量抓取。搜索步骤是新增逻辑,抓取和入库完全复用。
-
-2. **URL 定时刷新**:对 `source_type=url` 的文档,基于 `source_url` 定期重新抓取内容并更新索引。
-
-3. **JavaScript 渲染支持**:web/read 服务升级支持 Playwright 后,`/fetch-url` 接口自动受益,无需改动。
-
----
-
-## 实施路径
-
-### Phase 1:后端(约 2 天)
-
-1. 使用 Python / Pydantic 定义 `fetchUrlRequest/Response`
-2. 在 FastAPI collections router 中注册 `/fetch-url` 路由
-3. 运行 `make openapi-check`
-4. 将 `web.py` 中的 `_read_with_jina_fallback` / `_read_with_trafilatura_only` 提取为 `ReaderService` 的公共方法
-5. 在 `views/collections.py` 中实现 `fetch_url_document_view` 路由函数
-6. 编写单元测试
-
-### Phase 2:前端(约 2 天)
-
-1. 通过 FE v2 typed client / feature adapter 接入新接口
-2. 实现 `import-dialog.tsx`(来源选择入口)
-3. 实现 `url-import.tsx`(URL 输入 + 调用 `/fetch-url`)
-4. 实现 `text-import.tsx`(文本粘贴 + 创建 File 对象 + 调用现有上传接口)
-5. 在文档列表页或上传页集成"添加来源"按钮触发 Dialog
-6. 新增 i18n key(`zh-CN` 和 `en-US`)
-
-### Phase 3:验证(约 0.5 天)
-
-1. E2E 测试:URL 导入 → 暂存区展示 → confirm → 索引完成
-2. E2E 测试:文本粘贴 → 暂存区展示 → confirm → 索引完成
-3. 错误场景:无效 URL、超时、配额超限
-
----
-
-## 相关文件索引
-
-### 参考文件
-
-- `aperag/service/document_service.py` — `upload_document()` 实现(核心复用点)
-- `aperag/views/web.py` — `_read_with_jina_fallback()` 等私有函数(待提取为公共方法)
-- `aperag/websearch/reader/reader_service.py` — ReaderService(JINA/Trafilatura 实现)
-- `web/src/app/.../upload/document-upload.tsx` — 前端暂存区(参考现有实现)
-
-### 修改文件
-
-- Pydantic document schema — 新增 Schema
-- FastAPI collections router — 新增路由
-- `aperag/views/collections.py` — 新增 `fetch_url_document_view`
-- `aperag/websearch/reader/reader_service.py` — 提取公共方法
-
-### 新建文件
-
-- `web/src/app/.../documents/upload/import/import-dialog.tsx`
-- `web/src/app/.../documents/upload/import/url-import.tsx`
-- `web/src/app/.../documents/upload/import/text-import.tsx`
-- `web/src/i18n/zh-CN/page_documents_import.json`
-- `web/src/i18n/en-US/page_documents_import.json`
diff --git a/docs/zh-CN/design/vector_db_abstraction_m2_pr.md b/docs/zh-CN/design/vector_db_abstraction_m2_pr.md
deleted file mode 100644
index c31d55208..000000000
--- a/docs/zh-CN/design/vector_db_abstraction_m2_pr.md
+++ /dev/null
@@ -1,145 +0,0 @@
-# 向量数据库抽象层 M2 实现(PR 说明)
-
-本 PR 把 [`vector_db_abstraction.md`](./vector_db_abstraction.md) 中 M2
-的所有条目落地到代码。这是纯内部重构,**不改变任何用户可见行为**;但它
-堵上了向量数据库抽象的四个历史破口中的三个,为将来可能的 pgvector/Milvus
-切换铺好了路。
-
-## 结论先行
-
-- 上线后**不需要任何运维操作**:没新增 env、没新增 secret、没新增 CR、
- 没新增存储 migration。
-- 所有现有单元/集成测试通过;本 PR 新增 4 个测试文件共约 30 个用例。
-- `docs/zh-CN/design/vector_db_abstraction.md` 已更新,M2 段落标注
- "✅ 已落地"。
-
-## 做了什么
-
-### 1. `VectorFilter` DSL(新文件 `aperag/vectorstore/filters.py`)
-
-- 六个节点:`Eq / In / IsEmpty / And / Or / Not`,全部 frozen dataclass
- (天然可 hash / 可比较 / 不可变)。
-- 两个人类友好 helper:`all_of(*parts)` / `any_of(*parts)`,自动跳过
- `None`、单参数退化、零参数返回 `None`,让 `ContextManager` 不用写
- `if len(parts) == 0 else ...` 之类的胶水。
-- 文件顶部写清楚了 "加节点前读这里" 的设计约束:**最小化、仅标量值、
- 不许 import 任何后端库**。未来新加节点要同步改所有 backend translator,
- 成本心里有数。
-
-### 2. Qdrant translator(`qdrant_connector.py` 内新增)
-
-- `_translate_filter(flt)` 把 DSL 树转成 `qdrant_client.models.Filter`。
-- `_normalize_filter_input(x)` 同时接受 DSL 节点、`rest.Filter` 直传、
- 或 `None`;把 `rest.Filter` 直传留在这里**只是给迁移脚本用**——`ContextManager`
- 已经完全走 DSL。
-- 空 `In` 直接抛 `ValueError`——在单后端时代,这种静默"匹配空集"最容易
- 生成"线上没数据"的疑云,现在崩溃比疑云更便宜。
-
-### 3. `VectorStoreConnector` 抽象升级(`aperag/vectorstore/base.py`)
-
-- 新增抽象方法 `retrieve(ids, *, with_payload, with_vectors)`。
- 之前只有 Qdrant 实现,`document_service.py` 直接调用,切后端会
- `AttributeError`。
-- 新增最小 DTO `VectorPoint(id: str, payload: dict, vector: list | None)`。
- `retrieve()` 返回它,彻底切断 `qdrant_client.http.models.Record` 对业务
- 层的泄露。
-- `search(query, *, filter, score_threshold, **kwargs)`:`filter` 从
- `**kwargs` 升级为显式关键字,类型注释 `Optional[VectorFilter]`。
-
-### 4. `ContextManager` 去 Qdrant 化(`aperag/context/context.py`)
-
-- 整个文件不再 import `qdrant_client` 任何东西。
-- `_create_index_types_filter` / `_create_combined_filter` 产出 DSL。
-- 历史上 `Filter(should=[FieldCondition(indexer IN [...]),
- IsEmptyCondition(indexer)])` 的 **"兼容老数据没有 indexer 字段"**
- 语义被原样保留(在 `_create_index_types_filter` 的 `any_of(In, IsEmpty)`
- 里)。这是我在 double-think 时反复确认的点——悄悄丢掉这个 branch 会让
- 迁移前的数据在检索中消失一半。
-
-### 5. `QdrantClient` 进程级复用(`qdrant_connector.py` 顶部)
-
-- `_get_or_create_client(url, port, grpc_port, prefer_grpc, https, api_key, timeout)`
- 按 endpoint 特征做双检锁缓存。
-- `:memory:` URL **显式绕过缓存**,否则集成测试里各测试会共享一份内存
- store,隐性破坏隔离——这种"在 prod 用没事但在 test 里炸"的陷阱是
- 长期最耗答疑时间的那种 bug,一次把它封死。
-- `_reset_client_cache()` 只给测试用。
-
-### 6. `QdrantVectorStoreConnector.retrieve` 返回类型切换
-
-- 从"Qdrant `Record` 原样返回"变为"转成 `VectorPoint` 返回"。
-- `id` 统一 `str(x)`——上游 `Chunk.id: Optional[str]`,Pydantic 本来就会
- 强转,实际语义无变化,只是把隐式强转变成显式。
-- 多向量字段的 collection 在 Qdrant 里是 `dict[str, list[float]]`,我们
- 从来不用多向量,这里加了一个防御性的 `dict -> first.values()` 兜底,
- 以后即便有人误配置了也不会 AttributeError。
-
-### 7. `envs/env.template` 追加 `QDRANT_*` 多租户 / 优化开关
-
-- 之前这些 env 只在 `deploy/aperag/values.yaml` 里(线上路径),本地
- `.env` 不知道它们的存在。
-- 现在 `env.template` 里列齐了全部 10 个 `QDRANT_*` 开关,默认值与生产
- 一致;开发者 copy 一份 `.env` 就直接能跑。
-- `docker.env.overrides` 无需改动(它只 override host/port 类字段)。
-
-## 没做什么(以及为什么)
-
-| 没做 | 为什么 |
-|---|---|
-| 解耦 LlamaIndex `QdrantVectorStore.add(nodes)` | 两处 vision 写路径 + embedding_utils 深度依赖 LlamaIndex 的 node schema;拆解工作量 ≥ 本 PR。等 M3 做 pgvector 时必须拆,那时一起做 |
-| 引入 `TenantRef / VectorShape / QueryResult / SearchOptions` 一堆 DTO | 只有 Qdrant 时这些 DTO 都只是 "存在感",用户读到会问 "为啥 5 个类做的事情能用 3 个做"。等真加 pgvector 时,那时的工程师对 pgvector 的约束了解精确,一次定型好过现在拍脑袋 |
-| 把 graph DB 一起抽象 | 你提到了 graph 抽象也要做;但 graph 的模式(Cypher/GQL)和向量完全不同,共用层只会是最低公共分母。**本 PR 只把向量 DSL 放在 `aperag/vectorstore/filters.py` 而不是 `aperag/filters.py`**,给未来 graph DSL 留了独立命名空间 |
-| Milvus 实现 | 你说可以不做,暂不做 |
-| 客户端连接池的更复杂策略(健康探测 / 驱逐 / 自动重连) | Qdrant Python 客户端本身自带 keep-alive 和重连;更复杂的池管理会在第一次遇到具体问题时再加。**"没问题时的复杂度 = 负价值"** |
-
-## 接口兼容性
-
-- `VectorStoreConnector.search(filter=X)` 的 `X`:
- - **推荐**:DSL 节点(`Eq`、`In`、`all_of(...)` 等)。
- - **仍兼容**:`qdrant_client.models.Filter`(迁移脚本在用)。
- - 其他类型:记 WARNING 并丢弃(与之前行为一致)。
-- `VectorStoreConnector.retrieve(ids, *, with_payload, with_vectors)` 返回
- `List[VectorPoint]`。之前的 Qdrant `Record` 都被访问的是 `.id` / `.payload`,
- `VectorPoint` 完全兼容这两个属性;pydantic 下游 `Chunk(id=point.id)` 正常。
-- 所有 `connector.delete(ids=...)` / `connector.create_collection(...)` /
- `connector.delete_collection(...)` 签名不变。
-- `connector.store.add(nodes)`(LlamaIndex 写路径)不变。
-
-## 测试矩阵
-
-| 文件 | 覆盖点 |
-|---|---|
-| `test_filters.py` | DSL 构造规则、frozen 语义、`all_of/any_of` 短路、嵌套 |
-| `test_qdrant_filter_translation.py` | 每个 DSL 节点 → Qdrant Filter 的结构快照;rest.Filter 直传短路;未知类型拒绝;空 In 报错 |
-| `test_qdrant_client_cache.py` | 同 endpoint 复用、不同 endpoint 各开一个、`:memory:` 绝不缓存、并发首连接不雪崩 |
-| `test_context_manager_filter.py` | 无过滤 → None、只 index_types → `Or(In, IsEmpty)`(保留 backward-compat)、只 chat_id → Eq、全开 → And(Or, Eq)、与 `vectordb_type` 字符串解耦 |
-
-共 30 个新用例,全部通过。原有 39 个测试(包括 embedding lock、多租户
-delete、purge_all_shards)全部继续通过。
-
-## 上线核对清单
-
-- [ ] 镜像构建通过。
-- [ ] CI 所有测试绿(pytest + ruff)。
-- [ ] 生产 Qdrant 是 1.10+。本 PR 对 Qdrant 版本要求与上一 PR 相同。
-- [ ] `deploy/aperag/values.yaml` 的 `QDRANT_*` 字段与上一 PR 保持一致,
- 无需运维动作。
-- [ ] 合并后观测:
- - `qdrant-cluster-qdrant-0` RSS 曲线不应有突变(本 PR 不动内存策略)。
- - 首次搜索延迟可能小幅下降(client 复用省了 TCP/TLS 握手)。
- - 无新增 ERROR 级别日志。
-
-## 后续
-
-本 PR 合并后,[`vector_db_abstraction.md`](./vector_db_abstraction.md)
-的 M3(pgvector 实现)就是"填空题" —— 只需要:
-
-1. 新增 `aperag/vectorstore/pgvector_connector.py`:实现 `search /
- delete / retrieve / create_collection / delete_collection`;
-2. 在 `VectorStoreConnectorAdaptor.match vector_store_type` 里加一条
- `case "pgvector"` 分支;
-3. 在 `envs/env.template` 里加 `VECTOR_DB_TYPE=pgvector` 的例子;
-4. 给 pgvector 单独一份 schema migration 脚本。
-
-整个 M3 预计不需要再动 `ContextManager`、`base.py`、`filters.py` 中的
-任何一行。这就是本 PR 要达到的 "M3 只是换后端、不是重写系统" 的目标。
diff --git a/docs/zh-CN/design/vector_db_abstraction_m3_pr.md b/docs/zh-CN/design/vector_db_abstraction_m3_pr.md
deleted file mode 100644
index b11a58a23..000000000
--- a/docs/zh-CN/design/vector_db_abstraction_m3_pr.md
+++ /dev/null
@@ -1,279 +0,0 @@
-# 向量数据库抽象层 M3(pgvector + 抽象层补完)— PR 说明
-
-本 PR 在 M2 的基础上一次性做完了两件事:
-
-1. **M3:pgvector 后端实现**。ApeRAG 现在支持 `VECTOR_DB_TYPE=qdrant |
- pgvector` 两种后端,默认 Qdrant。pgvector 版本与 Qdrant 功能对等,
- 共用同一套 `VectorFilter` DSL、租户隔离语义、embedding 锁定。
-2. **M2 的补完**:原本推到 M3 的 "DTO 全套" 和 "去 LlamaIndex 写依赖" 两
- 项尾巴,这次一起做掉了。不留代码债、不留下次返工。
-
-**合并即可用。** 没有新增 env、也没有任何必要的运维动作(除非你想启用
-pgvector —— 那就翻一个 env flag 就行)。
-
-## 为什么不留尾巴
-
-上一轮我故意保守——"只有 Qdrant 一个后端时多 DTO 是答疑负担"。这次不再
-成立:
-
-- pgvector 真的要上了,两个后端的存在让 DTO 全套**立刻**有收益(避免
- per-callsite 分支);
-- LlamaIndex 的 `QdrantVectorStore.add(nodes)` 在引入 pgvector 时是
- 硬阻塞(pgvector 没有对应 adapter);
-- 等下次 PR 再拆这两件事,就意味着本次 pgvector 落地时要么**复制**
- LlamaIndex 的 `_node_content` 序列化约定到 pgvector,要么引入一个
- LlamaIndex `PGVectorStore` 作为新依赖——都是"现在不痛、将来痛"的决策。
- 趁还没分叉先拆掉,代价最低。
-
-所以本 PR 的核心口号是:"一次到位"。
-
-## 抽象层现状(合并后)
-
-```text
-aperag/vectorstore/
-├── base.py # VectorStoreConnector 抽象;签名全部 DTO 化
-├── dto.py # TenantRef, VectorShape, VectorPoint,
-│ # QueryRequest, SearchHit, flatten_node_payload
-├── filters.py # VectorFilter DSL: Eq/In/IsEmpty/And/Or/Not
-├── connector.py # 适配器:match vector_store_type
-├── qdrant_connector.py # Qdrant 实现 + DSL translator
-├── pgvector_connector.py # pgvector 实现 + SQL translator (新)
-└── llama_index_adapter.py # BaseNode -> VectorPoint (新)
-```
-
-契约(`base.py`)现在是:
-
-```python
-class VectorStoreConnector(ABC):
- @property
- @abstractmethod
- def tenant(self) -> TenantRef: ...
- @property
- @abstractmethod
- def shape(self) -> VectorShape: ...
-
- @abstractmethod
- def ensure_collection(self) -> None: ...
- @abstractmethod
- def drop_tenant(self, *, purge_all_shards: bool = False) -> None: ...
-
- @abstractmethod
- def upsert(self, points: Sequence[VectorPoint]) -> list[str]: ...
- @abstractmethod
- def delete(self, ids: Sequence[str]) -> None: ...
- @abstractmethod
- def delete_by_filter(self, flt: VectorFilter) -> None: ...
-
- @abstractmethod
- def search(self, request: QueryRequest) -> list[SearchHit]: ...
- @abstractmethod
- def retrieve(self, ids: Sequence[str], *, with_vectors: bool = False) -> list[VectorPoint]: ...
-```
-
-九个方法,全部 DTO 化、全部不携带任何后端特征。
-
-## pgvector 关键设计决策
-
-### 部署形态:默认复用主 Postgres
-
-- 默认 `PGVECTOR_DATABASE_URL=` 留空 → 使用 ApeRAG 主 DB (`DATABASE_URL`)。
-- "私有化交付 / ApeRAG-Lite" 场景少一个组件,**零部署负担**就是抽象层
- 的主要卖点。
-- 规模上去了?设 `PGVECTOR_DATABASE_URL=postgresql://...` 指到独立 PG,
- 一个 env 搞定,应用代码不动。
-
-### Schema:动态建表、对齐 Qdrant 命名
-
-每个 `(vector_size, distance)` 对一张 `aperag_vectors__`
-表,与 Qdrant 的物理分片命名完全一致。这不是偶然——`purge_all_shards`
-这类运维逻辑现在**两后端走同一条代码路径**:按 `aperag_vectors_*` 前缀
-扫分片,按 `tenant_id` 删。
-
-表结构:
-
-```sql
-CREATE TABLE aperag_vectors_1024_cosine (
- id UUID PRIMARY KEY,
- tenant_id TEXT NOT NULL,
- embedding vector(1024) NOT NULL,
- payload JSONB NOT NULL DEFAULT '{}'::jsonb,
- created_at TIMESTAMPTZ NOT NULL DEFAULT now()
-);
-CREATE INDEX ... ON ... (tenant_id);
-CREATE INDEX ... ON ... USING hnsw (embedding vector_cosine_ops)
- WITH (m=16, ef_construction=64);
-CREATE INDEX ... ON ... USING GIN (payload);
-```
-
-三个索引各司其职:tenant_id 用 B-tree 撑住"按租户 DELETE / SELECT";
-HNSW 用于近邻;GIN(payload) 让 DSL filter 能被 JSONB 索引加速。
-
-**动态建表 vs migration 文件**:我选前者。原因:
-- `(size, distance)` 组合是用户选 embedding model 时决定的,**不是**
- 部署时间常量。主 alembic 迁移如果要包含所有可能组合,意味着要么
- 硬编码 (1024, 1536, 3072) × (cosine, dot, euclid),要么 migration
- 失控。
-- 连接器的 `ensure_collection` 幂等 + 进程级缓存,首次写时 ~20ms 延迟
- 一次,之后零开销;这和 Qdrant 的 `collection_exists ? no-op :
- create_collection` 完全对称。
-- 运维视角统一:"哪里来的表?" → `aperag_vectors_*`;"按什么删?" →
- `tenant_id + id / filter`;"怎么清空某个 Collection?" → `DELETE
- FROM ... WHERE tenant_id = :t`。这些都是**查看即懂**的操作。
-
-### SQL filter translator
-
-`VectorFilter` DSL → `(where_sql, bind_params)`:
-
-| DSL | SQL |
-|---|---|
-| `Eq(k, v)` | `payload->>'k' = :f0` |
-| `In(k, [a, b])` | `payload->>'k' IN (:f0, :f1)` |
-| `IsEmpty(k)` | `NOT (payload ? 'k') OR payload->'k' = 'null'::jsonb` |
-| `And(...)` | `(...parts join ' AND '...)` |
-| `Or(...)` | `(...parts join ' OR '...)` |
-| `Not(inner)` | `NOT (...)` |
-
-**所有值走 bind 参数**;JSON key 走白名单正则校验(见
-`_escape_json_key`)。SQL 注入面彻底关死。
-
-### 距离语义:统一"分数越高越好"
-
-| 距离 | HNSW opclass | PG 操作 | 评分表达式 |
-|---|---|---|---|
-| cosine | `vector_cosine_ops` | `<=>` | `1 - (embedding <=> :q)` |
-| euclid | `vector_l2_ops` | `<->` | `-(embedding <-> :q)` |
-| dot | `vector_ip_ops` | `<#>` | `-(embedding <#> :q)` |
-
-这让 `QueryRequest.score_threshold` 对三种距离都是"higher-is-better"
-的一致语义,调用方无需分支。
-
-### `CAST(:q AS vector)` 小坑
-
-SQLAlchemy `text()` 的 `:name::vector` 写法在 psycopg2 下会被误解析
-(部分参数被替换、部分原样保留,SQL 报错)。正确写法是 `CAST(:q AS
-vector)`——在 `_DISTANCE_SPEC` 和 upsert 的 VALUES 里都已改用 CAST。
-留了详细注释,后来人不会再踩。
-
-## Qdrant 连接器的同步改造
-
-为保持契约对称,Qdrant 连接器也一起改了:
-
-- **原生 `upsert`**:`client.upsert(...)` 直接写,不再走 LlamaIndex
- `QdrantVectorStore.add(nodes)`。
-- **`delete_by_filter`**:新方法,自动 AND tenant 守卫;空过滤器被显式
- rejected(否则在多租户下会 silently 等价于 `drop_tenant`)。
-- **`retrieve` 返回 `List[VectorPoint]`**:`id` 归一化为 `str`、vector
- 归一化为 `list[float]`。
-- **`drop_tenant` 替代 `delete_collection`**:命名更准确。
-- **去掉 `self.store`**:LlamaIndex `QdrantVectorStore` 不再在连接器里
- 存在。`vector_store_adaptor.connector.store.add(...)` 这种写法彻底
- 失效。
-
-## 业务层改造
-
-三个写入点 + 两个读取点:
-
-| 位置 | 改造 |
-|---|---|
-| `embedding_utils.py::create_embeddings_and_store` | `store.add(nodes)` → `nodes_to_vector_points(nodes, tenant_id=...)` → `connector.upsert(points)` |
-| `vision_index.py`(纯视觉 + 视觉转文本两处) | 同上 |
-| `tasks/collection.py::_initialize_vector_databases` | `create_collection(vector_size=...)` → `ensure_collection()` |
-| `tasks/collection.py::_delete_vector_databases` | `delete_collection(...)` → `drop_tenant(...)` |
-| `document_service.py::get_document_chunks / _vision_chunks` | 手写 `_node_content` 解析 → `flatten_node_payload(point.payload)` |
-
-`ContextManager.query()` 内部从 `connector.search(...).results`(老的
-`QueryResult` 包装)切换到 `list[SearchHit]` + 自己适配成
-`DocumentWithScore`,对外行为完全不变。
-
-## 向后兼容:旧数据仍可读
-
-已经写入的 Qdrant 数据(由 LlamaIndex `QdrantVectorStore.add` 产生,
-payload 带 `_node_content` 字符串)**不需要任何迁移**。读路径统一走
-`flatten_node_payload()`:
-
-1. 有 `text` / `metadata` 顶层字段(新写入)→ 直接用。
-2. 只有 `_node_content`(老数据)→ 解析 JSON 取字段。
-3. 两者都有(过渡态)→ 优先用 flat 版本(最新写入意图)。
-4. `metadata.source` 缺失但 `_node_content.relationships['1'].metadata.source`
- 存在 → 派生 basename 作为 source(老文档预览页依赖这条路径)。
-
-这个 helper 有专门单元测试(`test_dto.py`)盯住上面四条语义。
-
-## 配置项
-
-### 新增 env(都非必填)
-
-```bash
-# 向量后端选择,默认 qdrant。切 pgvector 只需改这一行。
-VECTOR_DB_TYPE=qdrant
-
-# pgvector 独立 DB URL(可选)。留空 → 复用主 PG。
-PGVECTOR_DATABASE_URL=
-
-# HNSW 调优
-PGVECTOR_HNSW_M=16
-PGVECTOR_HNSW_EF_CONSTRUCTION=64
-PGVECTOR_HNSW_EF_SEARCH=40
-```
-
-### deploy
-
-- `deploy/aperag/values.yaml` 补齐 `PGVECTOR_*`,默认与 env.template 一致。
-- docker-compose 的 `aperag-postgres` 已经是 `apecloud/pgvector:pg16`,
- `vector` 扩展已安装。**本地直接开箱可用**:
-
-```bash
-# 切到 pgvector 后端(默认复用主 DB)
-echo "VECTOR_DB_TYPE=pgvector" >> envs/.env
-make run # 应用自动建表、建索引、建 extension
-```
-
-## 测试
-
-| 文件 | 范围 |
-|---|---|
-| `test_dto.py`(新) | `VectorShape` 归一化、`TenantRef` 非空、`VectorPoint` 类型检查、`QueryRequest` 字段默认、`flatten_node_payload` 四条语义路径 |
-| `test_llama_index_adapter.py`(新) | `BaseNode` → `VectorPoint` 扁平转换、tenant 自动注入、顺序保持 |
-| `test_pgvector_translator.py`(新) | 表名构造边界、DSL 每节点 → SQL 片段结构快照、SQL 注入面(参数 bind 而非插值) |
-| `test_pgvector_end_to_end.py`(新,gated by `APERAG_TEST_PGVECTOR_URL`) | 10 个用例覆盖 upsert / search / retrieve / delete / delete_by_filter / drop_tenant / tenant 隔离 / 组合过滤 |
-| `test_qdrant_multitenancy_integration.py`(更新) | 全量迁移到新 DTO API;新增 `delete_by_filter` 和 `upsert` 覆盖 |
-| `test_qdrant_filter_translation.py`、`test_qdrant_client_cache.py`、`test_context_manager_filter.py`、`test_filters.py`(更新/保留) | M2 的测试集全部兼容新 API |
-
-**结果**:121 tests in `tests/unit_test/vectorstore + service`(含 10 个
-pgvector 端到端、gated),全部通过。更广的 455 测试(排除 pre-existing
-`pydantic_ai` / MCP docstring 失败,那两组与本 PR 无关)也通过。
-
-## 上线指南(两种场景)
-
-### 场景 A:继续用 Qdrant(默认)
-
-- 什么都不用改。合并即可。
-- 新的 `connector.upsert` 写入格式不再包含 `_node_content`;已有老
- 数据的读路径通过 `flatten_node_payload` 完全兼容,不需要重写老数据。
-
-### 场景 B:切换到 pgvector
-
-1. 确认 PostgreSQL 安装了 `pgvector` 扩展(`CREATE EXTENSION vector` 能
- 跑通)。`apecloud/pgvector:pg16` 镜像已内置。
-2. 设 `VECTOR_DB_TYPE=pgvector`。默认复用 `DATABASE_URL` 指向的主
- Postgres;如要独立 PG,设 `PGVECTOR_DATABASE_URL=...`。
-3. 重启应用。首次写入时连接器自动 `CREATE TABLE IF NOT EXISTS
- aperag_vectors__` + 三个索引。
-4. **Qdrant 和 pgvector 的数据目前不互通**——切换相当于新启用一个空的
- 后端。已有 Qdrant 数据需要重新 ingest(或保留 Qdrant 继续读、同时
- pgvector 只收新数据:这种双写窗口设计不在本 PR 范围)。
-
-## 下一步(非本 PR)
-
-这些都不是 blocker,记录在这里供后续评估:
-
-- **数据迁移脚本**(Qdrant → pgvector):现在没做,需要时按
- `scripts/migrate_qdrant_multitenancy.py` 的思路扩一个 "scroll Qdrant →
- upsert pgvector" 的脚本,大概 300 行。
-- **pgvector `halfvec` / `bit` 量化**:维度过大时开这个能省 50~90% 磁
- 盘。目前 `pgvector_hnsw_*` 下留了 hook,实际启用需要给 `VectorShape`
- 加一个 `storage_type` 字段 + connector 建表时用 `halfvec(dim)`。
- **按需再做**,不提前占位。
-- **Milvus 后端**:架构已经齐了,新加一个 `milvus_connector.py` 实现
- 9 个抽象方法 + 在 `connector.py::match` 加一条 case 就能接进来。预计
- 1~2 周工作量。
diff --git a/docs/zh-CN/reference/evaluation-current-guide.md b/docs/zh-CN/reference/evaluation-current-guide.md
deleted file mode 100644
index 96036cf9a..000000000
--- a/docs/zh-CN/reference/evaluation-current-guide.md
+++ /dev/null
@@ -1,155 +0,0 @@
-# Evaluation 当前产品状态与使用说明(v3 简化版)
-
-本文档面向产品、测试、交付和一线支持同学,说明 ApeRAG 当前主线里的 Evaluation 是什么、现在有哪些能力、推荐怎么使用。
-
-注意:
-
-- 本文以当前主线实现为准(`#20` evaluation v3 simplification 合并后)。
-- 如果你同时看到 [evaluation-design](../design/evaluation-design.md) 的旧草稿,请把它理解为一份较早期的设计稿,**而不是当前产品说明**。
-
-## 1. 现状与当前能力
-
-当前主线里的 Evaluation 采用**两个核心对象 + 单入口**的工作流:
-
-1. **Evaluation Dataset**:一组用于评测的问答,挂在某个 Collection 下。
-2. **Evaluation Run**:对某个 Dataset 发起的一次运行;运行时按用户级默认 Bot 执行,或显式覆盖 `bot_id`。
-
-**操作步数只有 3 步**:创建 Dataset → 录入问题 → 发起评测。
-
-已经**消失的旧概念**(`#20` 简化移除):
-
-- ~~Benchmark~~
-- ~~Dataset Version / Publish Version~~
-- ~~Question Set(独立于 Dataset 的问题集)~~
-- ~~UI 上要求用户手工输入 / 复制粘贴 `dataset_version_id`~~
-- ~~发起运行时在 FE 页面选择 Bot~~(改成"默认 Bot 解析 + 可覆盖",一般情况下不需要选)
-
-### 1.1 入口在哪里
-
-**唯一入口:Collection → Evaluations 页面**。
-
-路径:
-
-- `/workspace/collections/{collectionId}/evaluations`
-
-作用:
-
-- 整个 Evaluation 流程的起点与终点。
-- 在这里创建/删除 Evaluation Dataset。
-- 在这里手动录入问题(每条是一个 dataset item)。
-- 在这里点"发起评测"启动一次 Evaluation Run。
-- 在这里查看该 Collection 的历史 runs 和进度。
-
-**Dataset 问题管理页面**(进入某个 dataset 后):
-
-- `/workspace/collections/{collectionId}/evaluations/datasets/{datasetId}`
-
-作用:
-
-- 查看/新增/删除这个 dataset 下的问题。
-- 注意:dataset item 可以继续修改,但**历史 run 通过 snapshot 保留当时的问答内容**,不会被后续编辑影响。
-
-**Run 详情页**:
-
-- Collection 视角:`/workspace/collections/{collectionId}/evaluations/{runId}`
-- Bot 视角(只读):`/workspace/bots/{botId}/evaluation/runs/{runId}`
-
-作用:
-
-- 查看单次 run 的总体进度、summary 汇总。
-- 查看每个 run item 的状态、分数、最近一次 attempt、trace/chat 入口。
-- 对失败项执行重试;run 处于 queued/running 时可以取消整条 run。
-
-**Bot → Evaluation 页面**(只读历史列表):
-
-路径:
-
-- `/workspace/bots/{botId}/evaluation`
-
-作用:
-
-- 只读展示这个 bot 作为评测对象的历史 runs。
-- 不再承担"发起运行"的入口,也没有 `dataset_version_id` / Bot 选择输入。
-- 要发起新 run,点页面上的"打开知识库评测"跳回 Collection Evaluations。
-
-### 1.2 当前核心对象
-
-#### Evaluation Dataset
-
-挂在 Collection 下的一组问答。字段:
-
-- `name` / `description`
-- `collection_id`:scope 过滤用(不继承 Collection 的 sharing 权限,按 `user_id` 硬过滤)
-- `source_type`:`manual` / `import` / `generated`(MVP 主走 `manual`)
-- `item_count`:当前问题数
-
-#### Evaluation Dataset Item
-
-一条"问答"。字段:
-
-- `case_key`:稳定标识;留空时后端自动生成
-- `input_message`(必填):用户提示
-- `expected_answer`:期望答案(可选)
-- `reference_context`:参考上下文(可选)
-- `tags` / `case_metadata` / `sort_key`:辅助字段
-
-#### Evaluation Run
-
-对某个 Dataset 的一次评测。字段:
-
-- `dataset_id`(必填)
-- `bot_id`:省略时后端按"默认 Bot"解析 → 标题为 `Default Agent Bot` 的 active bot 优先,否则选最早创建的 active bot
-- `name`:可选运行名称
-- `judge`:判分配置(可选,MVP 下判分留 TODO)
-- `bot_config_snapshot` / `model_config_snapshot`:调用时的配置快照
-- `status`:`queued` → `running` → `completed` / `failed` / `cancelled`
-- `summary`:`total / pending / running / completed / failed / cancelled / avg_score?`
-- `dataset_name`:dataset 删除或改名后,run 详情仍能显示当时的名称
-
-#### Evaluation Run Item
-
-一条 run item = 一个 dataset item 的快照 + 执行态。**不回读 mutable dataset items**:`input_message / expected_answer / reference_context` 在创建 run 时 value-copy 到 run item。
-
-#### Evaluation Run Item Attempt
-
-一次实际调用机器人的 attempt。挂在 run item 下,通过 `attempt_no` 编号,失败后重试追加。
-
-## 2. 使用流程
-
-1. 进入 `Collections → {你的 collection} → Evaluations`。
-2. 点"创建数据集"(Create dataset),填名称。
-3. 在数据集卡片上点"管理问题"(Manage questions),把要测的问题逐条加进来。
-4. 回到 Evaluations 页面,点"发起评测"(Start evaluation):
- - 选择数据集(必须有至少 1 个问题)。
- - 可选:覆盖 `bot_id`。多数情况下留空,让后端选默认 Bot。
- - 可选:命名本次 run。
-5. 进入 run 详情页,看进度、查看每条 item 的结果,必要时重试失败项或取消 run。
-
-## 3. 错误和边界
-
-- **Dataset 没有问题就点"发起评测"**:按钮会置灰,hover 上去提示"请先给数据集添加至少一个问题"。
-- **当前用户没有任何可用的 Bot** 就尝试发起评测(没有 `Default Agent Bot` 也没有其它 active bot):FE 会把后端返回的 `ValidationException` 替换成用户可读的提示:"当前没有可用于评测的 Bot,请先创建 Bot 或联系管理员。"
-- **Dataset 删除**:历史 run 通过 snapshot 保留;dataset 删除后仍可查看 run 的每条 item(显示 `dataset_name` 而不是挂 FK)。
-- **非终态 run** (`queued / running`):run detail 页每 5 秒自动刷新,直到 run 进入终态。
-
-## 4. API 对照
-
-| 动作 | Method | Path |
-| ---- | ------ | ---- |
-| 列举 dataset | GET | `/api/v2/evaluation-datasets?collection_id=` |
-| 创建 dataset | POST | `/api/v2/evaluation-datasets` |
-| 更新 dataset | PUT | `/api/v2/evaluation-datasets/{dataset_id}` |
-| 删除 dataset | DELETE | `/api/v2/evaluation-datasets/{dataset_id}` |
-| 列举 dataset items | GET | `/api/v2/evaluation-datasets/{dataset_id}/items` |
-| 追加 items | POST | `/api/v2/evaluation-datasets/{dataset_id}/items` |
-| 更新单条 item | PUT | `/api/v2/evaluation-datasets/{dataset_id}/items/{item_id}` |
-| 删除单条 item | DELETE | `/api/v2/evaluation-datasets/{dataset_id}/items/{item_id}` |
-| 发起 run | POST | `/api/v2/evaluation-runs` |
-| 列举 run | GET | `/api/v2/evaluation-runs?collection_id&bot_id&dataset_id` |
-| run 详情 | GET | `/api/v2/evaluation-runs/{run_id}` |
-| 列举 run items | GET | `/api/v2/evaluation-runs/{run_id}/items` |
-| 取消 run | POST | `/api/v2/evaluation-runs/{run_id}/cancel` |
-| 重试单条 item | POST | `/api/v2/evaluation-runs/{run_id}/items/{item_id}/retry` |
-| 查询 attempts | GET | `/api/v2/evaluation-runs/{run_id}/items/{item_id}/attempts` |
-
-旧路径 `/api/v2/benchmark-datasets*`、`/api/v2/benchmark-datasets/{id}/versions*` 以及请求/响应字段 `dataset_version_id` 都已移除。
diff --git a/docs/zh-CN/reference/how-to-configure-ollama.md b/docs/zh-CN/reference/how-to-configure-ollama.md
deleted file mode 100644
index c643ed8e4..000000000
--- a/docs/zh-CN/reference/how-to-configure-ollama.md
+++ /dev/null
@@ -1,66 +0,0 @@
-# 如何在 ApeRAG 中配置本地 Ollama
-
-本指南介绍如何在 ApeRAG 部署中配置本地 Ollama 模型。
-
-## 前提条件
-
-- ApeRAG 在本地运行
-- [Ollama](https://ollama.ai/) 已安装并运行
-- Ollama 模型已下载
-
-## 步骤 1: 添加 Ollama 提供商
-
-在 ApeRAG 界面中导航到 **设置 > 模型**,点击 **"添加提供商"**。
-
-输入提供商名称(例如 "local-ollama")并设置 **Base URL** 为:`http://localhost:11434/v1`
-
-点击 **保存**。
-
-
-
-## 步骤 2: 添加 Ollama 模型
-
-点击新创建提供商右侧的 **三个点**,选择 **"Models"** 进入模型管理页面。
-
-点击 **"添加模型"** 并配置:
-- **模型名称**: 输入您的模型名称(例如 `gpt-oss:20b`)
-- **模型类型**: 选择 `Completion`
-- **LLM 提供商**: 选择 `openai`(因为 Ollama 兼容 OpenAI)
-
-点击 **保存**。
-
-
-
-## 步骤 3: 启用模型
-
-您会注意到每个模型都有两个切换开关:**Agent** 和 **Collection**。您可以同时启用它们:
-
-- **Agent**: 允许模型用于回答问题
-- **Collection**: 允许模型在构建 Collection 索引时使用
-
-
-
-## 步骤 4: 启用 Ollama 提供商
-
-返回提供商页面,点击您刚才添加的 Ollama 提供商右侧的 **三个点**,选择 **"启用"**。
-
-当提示输入 API 密钥时,输入任意随机字符串即可,因为 Ollama 是自托管的,不需要真实的身份验证。
-
-
-
-您的 Ollama 模型现在应该出现在模型列表中,可以使用了。
-
-
-
-## 使用方法
-
-配置完成后,您的本地 Ollama 模型将可用于:
-
-- **Collection**: 在创建或配置集合时在 LLM 设置中选择 Ollama 模型
-- **聊天**: 在聊天界面中选择 Ollama 模型进行对话
-
-
-
-
-
-您的本地 Ollama 模型现在已准备好与 ApeRAG 一起使用!
diff --git a/web/docs/en-US/deployment/_category.yaml b/web/docs/en-US/deployment/_category.yaml
deleted file mode 100644
index b01ba155f..000000000
--- a/web/docs/en-US/deployment/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: Deployment
-position: 3
diff --git a/web/docs/en-US/deployment/build-docker-image.md b/web/docs/en-US/deployment/build-docker-image.md
deleted file mode 100644
index f3ffbbeaa..000000000
--- a/web/docs/en-US/deployment/build-docker-image.md
+++ /dev/null
@@ -1,45 +0,0 @@
-# Build Guide
-
-This section covers how to build ApeRAG container images. It's primarily for users who need to create their own builds or deploy to environments other than the ones covered in "Getting Started".
-
-## Building Container Images
-
-The project uses Docker and `make` commands to build container images.
-
-* **Local Platform Builds**:
- These commands build images for your current machine's architecture.
- ```bash
- # Build all necessary images for local platform
- make build-local
-
- # Build only the backend image for local platform
- make build-aperag-local
-
- # Build only the frontend image for local platform
- make build-aperag-frontend-local
- ```
-
-* **Multi-platform Builds**:
- These commands build images for multiple architectures (e.g., amd64, arm64). This requires Docker Buildx to be set up and configured.
- ```bash
- # Build all necessary images for multiple platforms
- make build
-
- # Build only the backend image for multiple platforms
- make build-aperag
-
- # Build only the frontend image for multiple platforms
- make build-aperag-frontend
- ```
- You can specify the target platforms using the `PLATFORMS` variable, for example:
- ```bash
- make build PLATFORMS=linux/amd64,linux/arm64
- ```
-
-## Deployment
-
-Refer to the "Getting Started" section in the main README for common deployment methods:
-* [Getting Started with Kubernetes](../README.md#getting-started-with-kubernetes)
-* [Getting Started with Docker Compose](../README.md#getting-started-with-docker-compose)
-
-For custom deployments, you will need to adapt these methods or use the built container images with your chosen orchestration platform. Ensure all required services (databases, backend, frontend, Celery workers) are correctly configured and can communicate with each other.
\ No newline at end of file
diff --git a/web/docs/en-US/design/_category.yaml b/web/docs/en-US/design/_category.yaml
deleted file mode 100644
index 049843c35..000000000
--- a/web/docs/en-US/design/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: Design
-position: 1
diff --git a/web/docs/en-US/design/architecture.md b/web/docs/en-US/design/architecture.md
deleted file mode 100644
index b310af552..000000000
--- a/web/docs/en-US/design/architecture.md
+++ /dev/null
@@ -1,850 +0,0 @@
----
-title: System Architecture
-description: ApeRAG Architecture Design and Core Components
-keywords: ApeRAG, Architecture, RAG, Knowledge Graph, LightRAG
-position: 1
----
-
-# ApeRAG System Architecture
-
-## 1. What is ApeRAG
-
-ApeRAG is an **open, Agentic Graph RAG platform**. It's not just a simple vector retrieval system, but a production-ready solution that deeply integrates knowledge graphs, multimodal retrieval, and intelligent agents.
-
-Traditional RAG systems primarily rely on vector similarity search. While they can find semantically related content, they often lack understanding of relationships between knowledge points. ApeRAG's core innovations are:
-
-- **Graph RAG**: Automatically extracts entities (people, places, concepts) and relationships from documents to build knowledge graphs, understanding connections between knowledge points
-- **Agentic**: Built-in intelligent agents that can autonomously plan, invoke tools, and conduct multi-turn conversations for smarter Q&A experiences
-- **Open Integration**: Exposes capabilities through **RESTful API** and **MCP Protocol**, easily integrating with external systems like Dify, Claude, and Cursor
-
-### Core Advantages
-
-Compared to traditional RAG solutions, ApeRAG provides:
-
-- **Powerful Document Processing**: Supports PDF, Word, Excel and more, handling complex tables, formulas, and images
-- **Multiple Retrieval Methods**: Vector, full-text, and graph retrieval complement each other
-- **Knowledge Relationship Understanding**: Understands concept relationships through knowledge graphs, not just text similarity
-- **Open Integration Capabilities**: RESTful API + MCP protocol, can serve as knowledge backend for Dify, Claude Desktop, Cursor
-- **Production-Grade Architecture**: Async processing, multi-storage, high concurrency, ready for production
-
-### Architecture Overview
-
-```mermaid
-graph TB
- User[Users] --> Frontend[Web Frontend]
- User --> External[External Systems
Dify/Claude/Cursor]
-
- Frontend --> API[RESTful API]
- External --> MCP[MCP Protocol]
-
- API --> DocProcess[Document Processing]
- API --> Search[Search Service]
- API --> Agent[Agent Dialogue]
- MCP --> Search
- MCP --> Agent
-
- DocProcess --> Tasks[Async Task Layer]
- Tasks --> Storage[Storage Layer]
-
- Search --> Storage
- Agent --> Search
-
- Storage --> PG[(PostgreSQL)]
- Storage --> Qdrant[(Qdrant
Vector DB)]
- Storage --> ES[(Elasticsearch
Full-text Search)]
- Storage --> Neo4j[(Neo4j
Graph DB)]
- Storage --> MinIO[(MinIO
Object Storage)]
-
- style User fill:#e1f5ff
- style Frontend fill:#bbdefb
- style External fill:#bbdefb
- style API fill:#90caf9
- style MCP fill:#90caf9
- style DocProcess fill:#fff59d
- style Search fill:#fff59d
- style Agent fill:#fff59d
- style Tasks fill:#c5e1a5
- style Storage fill:#ffccbc
-```
-
-## 2. Layered Architecture
-
-ApeRAG adopts a clear layered design, with each layer serving its specific purpose:
-
-```mermaid
-graph TB
- subgraph Layer1[Client Layer]
- Web[Web Frontend
Next.js]
- Dify[Dify]
- Cursor[Cursor]
- Claude[Claude Desktop]
- end
-
- subgraph Layer2[Interface Layer]
- API[RESTful API
FastAPI]
- MCP[MCP Server
Model Context Protocol]
- end
-
- subgraph Layer3[Service Layer]
- CollSvc[Collection Service]
- DocSvc[Document Service]
- SearchSvc[Search Service]
- GraphSvc[Graph Service]
- AgentSvc[Agent Service]
- end
-
- subgraph Layer4[Task Layer]
- Celery[Celery Worker
Async Tasks]
- MinerU[MinerU
Document Parser]
- end
-
- subgraph Layer5[Storage Layer]
- PG[(PostgreSQL)]
- Qdrant[(Qdrant)]
- ES[(Elasticsearch)]
- Neo4j[(Neo4j)]
- Redis[(Redis)]
- MinIO[(MinIO)]
- end
-
- Web --> API
- Dify --> MCP
- Cursor --> MCP
- Claude --> MCP
-
- API --> CollSvc
- API --> DocSvc
- API --> SearchSvc
- API --> GraphSvc
- API --> AgentSvc
-
- MCP --> SearchSvc
- MCP --> AgentSvc
-
- CollSvc --> Celery
- DocSvc --> Celery
- GraphSvc --> Celery
-
- Celery --> MinerU
- Celery --> PG
- Celery --> Qdrant
- Celery --> ES
- Celery --> Neo4j
- Celery --> MinIO
-
- SearchSvc --> PG
- SearchSvc --> Qdrant
- SearchSvc --> ES
- SearchSvc --> Neo4j
-
- style Layer1 fill:#e3f2fd
- style Layer2 fill:#f3e5f5
- style Layer3 fill:#fff3e0
- style Layer4 fill:#e8f5e9
- style Layer5 fill:#fce4ec
-```
-
-**Layer Responsibilities**:
-
-- **Client Layer**: Multiple access methods - Web UI for management, MCP clients (Dify, Cursor, Claude, etc.) for integration
-- **Interface Layer**: RESTful API (traditional HTTP interface) and MCP Server (AI tool protocol) provide services in parallel
-- **Service Layer**: Core business logic, coordinating resources to complete specific functions
-- **Task Layer**: Handles time-consuming operations (document parsing, index building) to ensure fast API responses
-- **Storage Layer**: Multiple storage systems, selecting optimal solutions for different data types
-
-## 3. Document Processing Flow
-
-This is one of ApeRAG's core capabilities. From uploading a PDF file to making it searchable involves a series of carefully designed processing steps.
-
-### 3.1 Document Upload and Parsing
-
-When you upload a document, ApeRAG automatically identifies the format and selects the appropriate parser:
-
-```mermaid
-flowchart TD
- Upload[User Upload Document] --> Detect[Format Detection]
-
- Detect --> |PDF| MinerU[MinerU Parser]
- Detect --> |Word/Excel| MarkItDown[MarkItDown Parser]
- Detect --> |Markdown| DirectParse[Direct Parse]
- Detect --> |Image| OCR[OCR Recognition]
-
- MinerU --> Extract[Content Extraction]
- MarkItDown --> Extract
- DirectParse --> Extract
- OCR --> Extract
-
- Extract --> Parts[Document Parts
Parts Objects]
-
- style Upload fill:#e1f5ff
- style Extract fill:#c5e1a5
- style Parts fill:#fff59d
-```
-
-**MinerU's Power**:
-
-- Accurately recognizes complex PDF table structures, preserving table content integrity
-- Extracts LaTeX mathematical formulas, maintaining formula readability
-- Performs OCR on scanned PDFs, supporting mixed Chinese-English text
-- Identifies image regions in documents, supporting image content understanding
-
-### 3.2 Intelligent Chunking Strategy
-
-After document parsing, content needs to be split into appropriately sized chunks. This step is critical - chunks too large affect retrieval precision, too small lose context.
-
-```mermaid
-flowchart TD
- Parts[Document Parts] --> Rechunk[Smart Re-chunking]
-
- Rechunk --> Analysis[Analyze Document Structure]
- Analysis --> Hierarchy[Identify Title Hierarchy]
- Hierarchy --> Group[Group by Titles]
-
- Group --> Check{Check Chunk Size}
- Check --> |Too Large| Split[Semantic Splitting]
- Check --> |Appropriate| Chunks[Final Chunks]
- Split --> Chunks
-
- Chunks --> AddContext[Add Context]
- AddContext --> FinalChunks[Chunks with Context]
-
- style Rechunk fill:#bbdefb
- style Split fill:#ffccbc
- style FinalChunks fill:#c5e1a5
-```
-
-**Chunking Strategy Features**:
-
-- **Maintain Semantic Integrity**: Avoid breaking sentences in the middle
-- **Preserve Title Context**: Each chunk knows which section it belongs to
-- **Hierarchical Splitting**: Split by paragraphs first, then sentences, finally characters
-- **Smart Merging**: Adjacent small title chunks are merged to avoid information fragmentation
-
-Chunking Parameters:
-- Default chunk size: 1200 tokens (approximately 800-1000 Chinese characters)
-- Overlap size: 100 tokens (ensures context continuity)
-
-### 3.3 Parallel Multi-Index Building
-
-After chunking, multiple indexes are created simultaneously. Each index serves different purposes and complements the others:
-
-| Index Type | Use Case | Storage | Retrieval Method |
-|-----------|----------|---------|------------------|
-| **Vector Index** | Semantic similarity questions, e.g., "how to optimize performance" | Qdrant | Cosine Similarity |
-| **Full-text Index** | Exact keyword search, e.g., "PostgreSQL configuration" | Elasticsearch | BM25 Algorithm |
-| **Graph Index** | Relationship questions, e.g., "what's the connection between A and B" | PostgreSQL/Neo4j | Graph Traversal |
-| **Summary Index** | Quick document overview | PostgreSQL | Vector Matching |
-| **Vision Index** | Image content search | Qdrant | Multimodal Vector |
-
-```mermaid
-flowchart LR
- Chunks[Document Chunks] --> IndexMgr[Index Manager]
-
- IndexMgr --> VectorIdx[Vector Index Creation]
- IndexMgr --> FulltextIdx[Full-text Index Creation]
- IndexMgr --> GraphIdx[Graph Index Creation]
- IndexMgr --> VisionIdx[Vision Index Creation]
-
- VectorIdx --> Qdrant1[(Qdrant)]
- FulltextIdx --> ES[(Elasticsearch)]
- GraphIdx --> Graph[(Neo4j/PG)]
- VisionIdx --> Qdrant2[(Qdrant)]
-
- style IndexMgr fill:#fff59d
- style VectorIdx fill:#bbdefb
- style FulltextIdx fill:#c5e1a5
- style GraphIdx fill:#ffccbc
- style VisionIdx fill:#e1bee7
-```
-
-**Advantages of Parallel Building**:
-- Different indexes can be built simultaneously, improving speed
-- Failure of one index doesn't affect others
-- Can enable specific index types on demand
-
-### 3.4 Knowledge Graph Construction
-
-Graph indexing is ApeRAG's signature feature, extracting structured knowledge from documents.
-
-```mermaid
-flowchart TD
- Chunks[Document Chunks] --> EntityExtract[Entity Extraction]
-
- EntityExtract --> LLM1[Call LLM
Identify Entities]
- LLM1 --> Entities[Entity List
People, Places, Concepts]
-
- Entities --> RelationExtract[Relation Extraction]
- RelationExtract --> LLM2[Call LLM
Identify Relations]
- LLM2 --> Relations[Relation List
Who relates to whom and how]
-
- Entities --> Merge[Entity Merging]
- Relations --> Merge
-
- Merge --> Components[Connected Components Analysis]
- Components --> Parallel[Parallel Processing of Components]
- Parallel --> Graph[(Knowledge Graph)]
-
- style EntityExtract fill:#bbdefb
- style RelationExtract fill:#c5e1a5
- style Merge fill:#ffccbc
- style Components fill:#fff59d
-```
-
-**Key Steps in Graph Construction**:
-
-1. **Entity Extraction**: LLM identifies meaningful entities from document chunks
- - Example: From "Zhang San studies AI at Tsinghua University in Beijing"
- - Entities: Zhang San (person), Beijing (location), Tsinghua University (organization), AI (concept)
-
-2. **Relation Extraction**: Identifies relationships between entities
- - Example: Zhang San --studies--> AI, Zhang San --attends--> Tsinghua University
-
-3. **Entity Merging**: Same entity may have different expressions, needs normalization
- - Example: "LightRAG", "light rag", "Light-RAG" → merged into unified entity
-
-4. **Connected Components Optimization**: Divides graph into independent subgraphs for parallel processing
- - Performance improvement: 2-3x throughput
-
-**Why Connected Components Optimization?**
-
-Suppose you have 100 documents discussing different topics. Entities about "databases" and entities about "machine learning" have no connections and can be processed independently. The connected components algorithm identifies these independent "knowledge islands" and processes them in parallel, greatly improving speed.
-
-### 3.5 Async Task System
-
-Document processing is time-consuming, so ApeRAG uses a "dual-chain architecture" to ensure good user experience:
-
-```mermaid
-graph TB
- subgraph Frontend["🚀 Frontend Chain - Fast Response"]
- direction TB
- A1["📤 User Upload Document"] --> A2["🔌 API Receives Request"]
- A2 --> A3["📋 Index Manager"]
- A3 --> A4["💾 Write to Database
status = PENDING
version = 1"]
- A4 --> A5["✅ Return Success Immediately
< 100ms"]
- end
-
- subgraph Backend["⚙️ Backend Chain - Async Processing"]
- direction TB
- B1["⏰ Celery Beat
Check every 30s"] --> B2["🔍 Reconciler Detects
version ≠ observed_version"]
- B2 --> B3{"🎯 Found Pending Tasks?"}
- B3 -->|Yes| B4["🚀 Schedule Worker"]
- B3 -->|No| B1
- B4 --> B5["📄 Parse Document"]
- B5 --> B6["🔀 Parallel Index Creation
Vector + Fulltext
+ Graph + Vision"]
- B6 --> B7["✨ Update Status
status = ACTIVE
observed_version = 1"]
- B7 --> B1
- end
-
- A4 -.-|"Database State Change"| B2
-
- style Frontend fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
- style Backend fill:#fff3e0,stroke:#f57c00,stroke-width:3px
- style A5 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
- style B7 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
- style B3 fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
-```
-
-**Benefits of Dual-Chain**:
-
-- **Fast Frontend Response**: API returns within 100ms after user uploads, no need to wait for processing
-- **Async Backend Processing**: Real processing work happens in background without blocking user operations
-- **Auto Retry**: System automatically retries if processing fails, ensuring eventual success
-- **Status Tracking**: Users can check document processing progress anytime
-
-**Index State Machine**:
-
-```mermaid
-stateDiagram-v2
- [*] --> PENDING: 📤 Document Upload
-
- PENDING --> CREATING: 🚀 Reconciler Detected
Start Processing
-
- CREATING --> ACTIVE: ✅ All Indexes Created Successfully
- CREATING --> FAILED: ❌ Processing Failed
-
- FAILED --> CREATING: 🔄 Auto Retry
(max 3 times)
- FAILED --> [*]: 💔 Exceeded Retry Limit
Mark as Failed
-
- ACTIVE --> CREATING: 🔄 Document Updated
Rebuild Index
- ACTIVE --> [*]: 🗑️ Delete Document
-
- note right of PENDING
- version = 1
- observed_version = 0
- end note
-
- note right of CREATING
- Processing in progress
- May take several minutes
- end note
-
- note right of ACTIVE
- version = 1
- observed_version = 1
- Ready for search
- end note
-```
-
-## 4. Retrieval and Q&A Flow
-
-Once indexed, users can ask questions. ApeRAG's retrieval system intelligently selects appropriate retrieval strategies.
-
-### 4.1 Hybrid Retrieval System
-
-Different types of questions suit different retrieval methods. ApeRAG uses multiple retrieval strategies simultaneously and fuses results:
-
-```mermaid
-flowchart TB
- Query[User Query] --> Router[Retrieval Router]
-
- Router --> |Parallel| Vector[Vector Retrieval]
- Router --> |Parallel| Fulltext[Full-text Retrieval]
- Router --> |Parallel| Graph[Graph Retrieval]
-
- Vector --> Embed[Generate Query Vector]
- Embed --> QdrantSearch[Qdrant Similarity Search]
- QdrantSearch --> R1[Results 1]
-
- Fulltext --> ESSearch[Elasticsearch BM25]
- ESSearch --> R2[Results 2]
-
- Graph --> GraphQuery[Graph Query
local/global/hybrid]
- GraphQuery --> R3[Results 3]
-
- R1 --> Merge[Result Fusion]
- R2 --> Merge
- R3 --> Merge
-
- Merge --> Rerank[Rerank Re-scoring]
- Rerank --> Final[Final Results]
-
- style Query fill:#e1f5ff
- style Vector fill:#bbdefb
- style Fulltext fill:#c5e1a5
- style Graph fill:#ffccbc
- style Rerank fill:#fff59d
- style Final fill:#c5e1a5
-```
-
-**Retrieval Strategy Explanation**:
-
-- **Vector Retrieval**: For semantically similar questions
- - Q: "How to improve system performance?"
- - Finds: "Optimize database queries", "Use caching", etc.
-
-- **Full-text Retrieval**: For exact keyword matching
- - Q: "Where is PostgreSQL configuration file?"
- - Finds paragraphs containing exactly "PostgreSQL" and "configuration file"
-
-- **Graph Retrieval**: For relationship questions
- - Q: "What's the relationship between LightRAG and Neo4j?"
- - Queries connection paths between these two entities in the graph
-
-**Result Fusion Strategy**:
-
-Results from different retrieval methods need merging. ApeRAG uses a Rerank model to re-score all candidate results:
-
-1. Collect all retrieval results (may have duplicates)
-2. Deduplicate, keep most relevant segments
-3. Use Rerank model to evaluate relevance of each segment to the question
-4. Re-sort by new scores
-5. Return Top-K results
-
-### 4.2 Knowledge Graph Query
-
-Graph retrieval has three modes for different types of questions:
-
-| Mode | Use Case | Query Method | Example Question |
-|------|----------|--------------|------------------|
-| **local** | Query local info about an entity | Vector match similar entities → Get neighbor nodes | "Zhang San's personal info" |
-| **global** | Query overall relationships and patterns | Vector match similar relations → Get connection paths | "What's the company's organizational structure" |
-| **hybrid** | Comprehensive questions | local + global combined | "Zhang San's role and responsibilities in the company" |
-
-```mermaid
-flowchart TD
- Question[User Question] --> Analyze[Question Analysis]
-
- Analyze --> Local[Local Mode
Entity-centric]
- Analyze --> Global[Global Mode
Relation-centric]
- Analyze --> Hybrid[Hybrid Mode
Comprehensive Query]
-
- Local --> FindEntity[Find Related Entities]
- FindEntity --> GetNeighbors[Get Neighbors and Relations]
-
- Global --> FindRelations[Find Related Relations]
- FindRelations --> GetContext[Get Relation Context]
-
- Hybrid --> Local
- Hybrid --> Global
-
- GetNeighbors --> Context[Generate Context]
- GetContext --> Context
-
- Context --> Return[Return to LLM]
-
- style Local fill:#bbdefb
- style Global fill:#c5e1a5
- style Hybrid fill:#fff59d
-```
-
-**Real Example**:
-
-Suppose the knowledge graph contains:
-- Entities: Zhang San (person), Database Team (organization), PostgreSQL (technology)
-- Relations: Zhang San --belongs to--> Database Team, Zhang San --excels at--> PostgreSQL
-
-Question: "What is Zhang San responsible for?"
-
-1. **Local Mode**:
- - Finds "Zhang San" entity
- - Gets all directly connected nodes
- - Returns: "Zhang San belongs to Database Team, excels at PostgreSQL"
-
-2. **Global Mode**:
- - Finds related relation patterns: "responsible for", "belongs to"
- - Returns entire team structure and responsibility division
-
-3. **Hybrid Mode**:
- - Uses both methods above
- - Provides more comprehensive answer
-
-### 4.3 Agent Dialogue System
-
-Agent is ApeRAG's intelligent assistant that can invoke various tools to answer questions.
-
-```mermaid
-sequenceDiagram
- participant User as User
- participant API as API Server
- participant Agent as Agent Service
- participant LLM as LLM Service
- participant MCP as MCP Tools
- participant Search as Search Service
-
- User->>API: Send Question
- API->>Agent: Forward Question
-
- Agent->>LLM: Call LLM
with Tool List
- LLM-->>Agent: Decide to call search_collection tool
-
- Agent->>MCP: Execute Tool Call
- MCP->>Search: Hybrid Retrieval
- Search-->>MCP: Return Relevant Document Segments
- MCP-->>Agent: Tool Execution Result
-
- Agent->>LLM: Call LLM Again
with Retrieved Context
- LLM-->>Agent: Generate Final Answer
-
- Agent-->>API: Stream Response
- API-->>User: SSE Push Answer
-```
-
-**Agent Workflow**:
-
-1. **Receive Question**: User sends a question
-
-2. **Tool Decision**: LLM analyzes question and decides which tools to call
- - Possible tools: search_collection (search knowledge base), web_search (search internet), web_read (read web page), etc.
-
-3. **Execute Tools**: Agent calls corresponding tools
- - Example: search_collection triggers hybrid retrieval, returns relevant documents
-
-4. **Generate Answer**: LLM generates answer based on retrieved context
-
-5. **Stream Response**: Answer pushed to user in real-time via SSE (Server-Sent Events), no need to wait for complete generation
-
-**Role of MCP Protocol**:
-
-MCP (Model Context Protocol) is a standardized tool protocol that allows AI assistants (like Claude Desktop, Cursor) to easily invoke ApeRAG's capabilities. Through MCP, external AI tools can:
-- List your knowledge bases
-- Search knowledge base content
-- Read web page content
-- Search the internet
-
-**Dialogue Example**:
-
-```
-User: How does ApeRAG's graph indexing work?
-
-Agent thinks: Need to search knowledge base
-↓
-Call tool: search_collection(query="graph indexing principles", collection_id="aperag-docs")
-↓
-Retrieval results: Returns document segments about graph construction, entity extraction, relation extraction
-↓
-Agent answers: ApeRAG's graph indexing works through the following steps... (generated based on retrieved content)
-```
-
-## 5. Storage Architecture
-
-ApeRAG adopts a multi-storage architecture, selecting the most appropriate storage solution for different data types.
-
-### 5.1 Storage Selection Decision
-
-```mermaid
-flowchart TD
- Data["🎯 Data Type Classification"] --> Choice{"📊 What Data?"}
-
- Choice --> |"📋 Structured Data
Users, Configs, etc."| PG["PostgreSQL"]
- Choice --> |"🔢 Vector Data
embeddings"| Qdrant["Qdrant"]
- Choice --> |"📝 Text Data
Full-text Search"| ES["Elasticsearch"]
- Choice --> |"📁 File Data
Raw Documents"| MinIO["MinIO/S3"]
- Choice --> |"🕸️ Graph Data
Knowledge Graph"| GraphChoice{"Graph Scale?"}
- Choice --> |"⚡ Cache Data
Temporary Data"| Redis["Redis"]
-
- GraphChoice -->|"Small Scale
< 100K entities
💰 Recommended"| PG2["PostgreSQL
Built-in Graph Storage"]
- GraphChoice -->|"Large Scale
> 1M entities"| Neo4j["Neo4j
Professional Graph DB"]
-
- PG --> PGUse["✅ Transaction Support
✅ Relational Queries
✅ Small-scale Graph Storage
✅ Mature & Stable"]
- PG2 --> PG2Use["✅ No Extra Components
✅ Lower Ops Cost
✅ Sufficient for Most Cases"]
- Qdrant --> QdrantUse["✅ Vector Similarity Search
✅ High-dimensional Data Retrieval
✅ Filter Support"]
- ES --> ESUse["✅ Full-text Search BM25
✅ Keyword Search
✅ Chinese Tokenization IK"]
- MinIO --> MinIOUse["✅ Large File Storage
✅ S3 Protocol Compatible
✅ Low Cost"]
- Neo4j --> Neo4jUse["✅ Large-scale Graph Query
✅ Complex Relation Traversal
✅ Graph Algorithm Support"]
- Redis --> RedisUse["✅ Celery Task Queue
✅ LLM Call Cache
✅ Millisecond Access"]
-
- style Data fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
- style Choice fill:#fff59d,stroke:#fbc02d,stroke-width:3px
- style GraphChoice fill:#fff59d,stroke:#fbc02d,stroke-width:2px
- style PG fill:#bbdefb,stroke:#1976d2,stroke-width:2px
- style PG2 fill:#c8e6c9,stroke:#388e3c,stroke-width:3px
- style Qdrant fill:#c5e1a5,stroke:#689f38,stroke-width:2px
- style ES fill:#ffccbc,stroke:#e64a19,stroke-width:2px
- style MinIO fill:#e1bee7,stroke:#8e24aa,stroke-width:2px
- style Neo4j fill:#f8bbd0,stroke:#c2185b,stroke-width:2px
- style Redis fill:#ffecb3,stroke:#ffa000,stroke-width:2px
-```
-
-### 5.2 Data Flow
-
-Different data flows to different storage systems:
-
-```mermaid
-flowchart LR
- Doc[Upload Document] --> Parser[Parser]
- Parser --> |Raw Files| MinIO[(MinIO)]
- Parser --> |Document Metadata| PG1[(PostgreSQL)]
- Parser --> |Document Chunks| Chunks[Chunking]
-
- Chunks --> |Generate Vectors| Embed[Embedding]
- Embed --> Qdrant[(Qdrant)]
-
- Chunks --> |Text Content| ES[(Elasticsearch)]
-
- Chunks --> |Extract Entity Relations| Graph[Graph Construction]
- Graph --> |Small Scale| PG2[(PostgreSQL)]
- Graph --> |Large Scale| Neo4j[(Neo4j)]
-
- PG1 -.Metadata.-> Cache
- Cache -.Cache.-> Redis[(Redis)]
-
- style Doc fill:#e1f5ff
- style MinIO fill:#e1bee7
- style PG1 fill:#bbdefb
- style PG2 fill:#bbdefb
- style Qdrant fill:#c5e1a5
- style ES fill:#ffccbc
- style Neo4j fill:#f8bbd0
- style Redis fill:#ffecb3
-```
-
-### 5.3 Core Storage Systems
-
-**PostgreSQL** (Primary Database)
-
-Storage Content:
-- User info, permissions, configurations
-- Collection (knowledge base) metadata
-- Document metadata and index status
-- Conversation history
-- Small-scale knowledge graphs (< 100K entities)
-
-Why Choose:
-- Strong transaction support, ensures data consistency
-- Mature and stable, low operational cost
-- pgvector extension, supports vector storage
-- Can handle small-scale graph data without extra graph database
-
-**Qdrant** (Vector Database)
-
-Storage Content:
-- Document chunk embedding vectors
-- Entity and relation vector representations
-- Multimodal vectors for images
-
-Why Choose:
-- Optimized specifically for vector retrieval, fast
-- Supports filter conditions, can combine with metadata filtering
-- Supports cluster deployment, horizontally scalable
-
-**Elasticsearch** (Full-text Search)
-
-Storage Content:
-- Document chunk text content
-- Supports Chinese tokenization (IK Analyzer)
-
-Why Choose:
-- BM25 algorithm works well for keyword search
-- Supports complex queries and aggregations
-- Built-in highlighting
-
-**MinIO** (Object Storage)
-
-Storage Content:
-- Raw document files (PDF, Word, etc.)
-- Intermediate results after parsing
-- Temporary uploaded files
-
-Why Choose:
-- S3 protocol compatible, can replace with cloud storage
-- Low storage cost
-- Supports large files
-
-**Graph Database Choice: PostgreSQL vs Neo4j**
-
-ApeRAG supports two graph database solutions:
-
-**PostgreSQL** (Default, Recommended for Small Scale)
-
-Storage Content:
-- Knowledge graphs (< 100K entities)
-- Graph node and edge relationship data
-
-Recommendation Reasons:
-- No extra deployment, lower operational cost
-- Performance sufficient for most scenarios
-- Complete transaction support, data consistency guaranteed
-- Can share database with other business data
-
-**Neo4j** (Optional, for Large Scale)
-
-Storage Content:
-- Large-scale knowledge graphs (> 1M entities)
-
-When Needed:
-- Entity count exceeds 100K, PostgreSQL query performance degrades
-- Need complex graph traversal queries (multi-hop relations)
-- Need to use graph algorithms (PageRank, community detection, etc.)
-
-**Summary**: For most enterprise applications, PostgreSQL is completely sufficient. Only consider Neo4j when knowledge graph scale is very large.
-
-**Redis** (Cache and Queue)
-
-Storage Content:
-- Celery task queue
-- LLM call cache
-- User session cache
-
-Why Choose:
-- Extremely fast, suitable for high-frequency access
-- Supports multiple data structures
-- Can serve as task queue Broker
-
-## 6. Technical Highlights
-
-### 6.1 Stateless LightRAG Refactoring
-
-**Background Problem**:
-
-Original LightRAG uses global state, all tasks share one instance. This causes data confusion and concurrency conflicts in multi-user, multi-Collection scenarios.
-
-**ApeRAG's Solution**:
-
-- Each task creates independent LightRAG instance
-- Isolates different Collection data through `workspace` parameter
-- Entity naming convention: `entity:{name}:{workspace}`
-- Relation naming convention: `relationship:{src}:{tgt}:{workspace}`
-
-This way, different users' graph data won't interfere with each other, truly achieving multi-tenant isolation.
-
-### 6.2 Dual-Chain Async Architecture
-
-**Traditional Approach Problem**:
-
-After user uploads document, API needs to wait for parsing and index building to complete before returning, possibly taking several minutes or longer.
-
-**Dual-Chain Architecture Advantages**:
-
-- **Frontend Chain**: API only writes state to database, returns within 100ms
-- **Backend Chain**: Reconciler periodically detects state changes, schedules async tasks
-- **Version Control**: Implements idempotency through version and observed_version
-- **Auto Retry**: Automatically retries failed tasks, ensures eventual consistency
-
-This design is inspired by Kubernetes' Reconciler pattern, very suitable for handling long-running tasks.
-
-### 6.3 Connected Components Concurrency Optimization
-
-**Problem**:
-
-During knowledge graph construction, similar entities need merging. Serial processing is slow. Full parallelization has lock contention issues.
-
-**Solution**:
-
-Use connected components algorithm to divide graph into multiple independent subgraphs:
-
-1. Build entity-relation adjacency list
-2. BFS traversal to find all connected components
-3. Different components have no connections, can be fully processed in parallel
-4. Same component processed serially internally (avoid conflicts)
-
-**Results**:
-
-- 2-3x performance improvement
-- Zero lock contention
-- Best results for diverse document collections
-
-### 6.4 Provider Abstraction Pattern
-
-ApeRAG supports 100+ LLM providers (OpenAI, Claude, Gemini, domestic LLMs, etc.). How to manage uniformly?
-
-**Design Approach**:
-
-- Define unified Provider interface
-- Each provider implements its own Provider
-- Adapt through LiteLLM library
-
-This way, switching models only requires config change, no code change. Same pattern also applies to:
-- Embedding Service (supports multiple vector models)
-- Rerank Service (supports multiple reranking models)
-- Web Search Service (DuckDuckGo, JINA, etc.)
-
-### 6.5 Multimodal Index Support
-
-Besides text, ApeRAG can also handle images:
-
-**Vision Index's Two Paths**:
-
-1. **Pure Visual Vectors**: Use multimodal models (like CLIP) to directly generate image vectors
-2. **Vision to Text**: Use VLM to generate image descriptions + OCR to recognize text → text vectorization
-
-**Fusion Strategy**:
-
-- Text and visual retrieval results sorted separately
-- Unified scoring through Rerank model
-- Final merged display
-
-## 7. Summary
-
-ApeRAG achieves production-grade RAG capabilities through the following design:
-
-**Core Advantages**:
-- **Powerful Document Processing**: Supports multiple formats, complex layouts, tables and formulas
-- **Knowledge Graph Fusion**: Not just vector matching, but understanding knowledge relationships
-- **Multiple Retrieval Methods**: Vector, full-text, and graph working together
-- **Async Architecture**: Fast response, background processing, good user experience
-- **Production-Grade Design**: Multi-storage, high concurrency, easy to scale
-
-**Technical Innovations**:
-- Stateless LightRAG, true multi-tenant support
-- Dual-chain async architecture, API response < 100ms
-- Connected components concurrency optimization, 2-3x faster graph construction
-- Provider abstraction, supports 100+ LLMs
-
-**Use Cases**:
-- Enterprise knowledge base search
-- Technical documentation Q&A
-- Customer service bots
-- Research paper analysis
-- Any scenario requiring document understanding and intelligent Q&A
-
-The system's design philosophy is: **Make complex things simple, make simple things automatic**. Users just need to upload documents, everything else is handled automatically by ApeRAG.
diff --git a/web/docs/en-US/design/chat_history_design.md b/web/docs/en-US/design/chat_history_design.md
deleted file mode 100644
index f2d729f1d..000000000
--- a/web/docs/en-US/design/chat_history_design.md
+++ /dev/null
@@ -1,590 +0,0 @@
----
-title: Chat History Message Data Flow
-description: Complete data flow of chat history messages in ApeRAG, from frontend API calls to backend storage
-keywords: [chat, history, message, redis, postgresql, websocket, part-based design]
----
-
-# ApeRAG Chat History Message Data Flow
-
-## Overview
-
-This document details the complete data flow of chat history messages in the ApeRAG project, covering the full-stack implementation from frontend API calls to backend storage.
-
-**Core API**: `GET /api/v1/bots/{bot_id}/chats/{chat_id}`
-
-## Data Flow Diagram
-
-```
-┌─────────────────┐
-│ Frontend │
-│ (Next.js) │
-└────────┬────────┘
- │ GET /api/v1/bots/{bot_id}/chats/{chat_id}
- ▼
-┌─────────────────────────────────────────────┐
-│ View Layer │
-│ aperag/views/chat.py │
-│ - get_chat_view() │
-│ - JWT Authentication │
-│ - Parameter Validation │
-└────────┬────────────────────────────────────┘
- │ chat_service_global.get_chat()
- ▼
-┌─────────────────────────────────────────────┐
-│ Service Layer │
-│ aperag/service/chat_service.py │
-│ - get_chat() │
-│ - Business Logic Orchestration │
-└────────┬────────────────────────────────────┘
- │
- ├──────────────┬─────────────┐
- │ │ │
- ▼ ▼ ▼
-┌────────────┐ ┌───────────┐ ┌──────────────┐
-│ PostgreSQL │ │ Redis │ │ PostgreSQL │
-│ chat table │ │ Message │ │feedback table│
-│ (Metadata) │ │ History │ │(User Feedback)│
-└────────────┘ └───────────┘ └──────────────┘
- │ │ │
- └──────────────┴──────────────────┘
- │
- ▼
- ┌──────────────┐
- │ ChatDetails │
- │ (Assemble) │
- └──────────────┘
-```
-
-## Detailed Flow
-
-### 1. View Layer - HTTP Request Handling
-
-**File**: `aperag/views/chat.py`
-
-```python
-@router.get("/bots/{bot_id}/chats/{chat_id}")
-async def get_chat_view(
- request: Request,
- bot_id: str,
- chat_id: str,
- user: User = Depends(required_user)
-) -> view_models.ChatDetails:
- return await chat_service_global.get_chat(str(user.id), bot_id, chat_id)
-```
-
-**Responsibilities**:
-- Receive HTTP GET requests
-- JWT Token authentication
-- Extract path parameters (bot_id, chat_id)
-- Call Service layer
-- Return `ChatDetails` response
-
-### 2. Service Layer - Business Logic Orchestration
-
-**File**: `aperag/service/chat_service.py`
-
-```python
-async def get_chat(self, user: str, bot_id: str, chat_id: str) -> view_models.ChatDetails:
- from aperag.utils.history import query_chat_messages
-
- # Step 1: Query Chat metadata from PostgreSQL
- chat = await self.db_ops.query_chat(user, bot_id, chat_id)
- if chat is None:
- raise ChatNotFoundException(chat_id)
-
- # Step 2: Query message history from Redis
- messages = await query_chat_messages(user, chat_id)
-
- # Step 3: Build response object (messages already include feedback info)
- chat_obj = self.build_chat_response(chat)
- return ChatDetails(**chat_obj.model_dump(), history=messages)
-```
-
-**Core Logic**:
-
-1. **Query Chat Metadata** (PostgreSQL)
-2. **Query Message History** (Redis + PostgreSQL feedback)
-3. **Assemble Complete Response**
-
-### 3. Data Storage Layer
-
-#### 3.1 PostgreSQL - Chat Metadata
-
-**Table**: `chat`
-
-**File**: `aperag/db/models.py`
-
-```python
-class Chat(Base):
- __tablename__ = "chat"
-
- id = Column(String(24), primary_key=True) # chat_xxxx
- user = Column(String(256), nullable=False) # User ID
- bot_id = Column(String(24), nullable=False) # Bot ID
- title = Column(String(256)) # Chat title
- peer_type = Column(EnumColumn(ChatPeerType)) # Peer type
- peer_id = Column(String(256)) # Peer ID
- status = Column(EnumColumn(ChatStatus)) # Status
- gmt_created = Column(DateTime(timezone=True)) # Created time
- gmt_updated = Column(DateTime(timezone=True)) # Updated time
- gmt_deleted = Column(DateTime(timezone=True)) # Deleted time (soft delete)
-```
-
-**Purpose**: Store Chat session metadata without actual message content
-
-#### 3.2 Redis - Message History
-
-**File**: `aperag/utils/history.py`
-
-**Key Format**: `message_store:{chat_id}`
-
-**Data Structure**: Redis List (using LPUSH, newest messages first)
-
-**Core Class**:
-
-```python
-class RedisChatMessageHistory:
- def __init__(self, session_id: str, key_prefix: str = "message_store:"):
- self.session_id = session_id
- self.key_prefix = key_prefix
-
- @property
- def key(self) -> str:
- return self.key_prefix + self.session_id # message_store:chat_abc123
-
- @property
- async def messages(self) -> List[StoredChatMessage]:
- # Read all messages from Redis
- _items = await self.redis_client.lrange(self.key, 0, -1)
- # Reverse to chronological order (LPUSH puts newest first)
- items = [json.loads(m.decode("utf-8")) for m in _items[::-1]]
- return [storage_dict_to_message(item) for item in items]
-```
-
-**Message Query Function**:
-
-```python
-async def query_chat_messages(user: str, chat_id: str):
- """Query chat messages and convert to frontend format"""
-
- # 1. Get message history from Redis
- chat_history = RedisChatMessageHistory(chat_id, redis_client=get_async_redis_client())
- stored_messages = await chat_history.messages
-
- if not stored_messages:
- return []
-
- # 2. Get feedback info from PostgreSQL
- feedbacks = await async_db_ops.query_chat_feedbacks(user, chat_id)
- feedback_map = {feedback.message_id: feedback for feedback in feedbacks}
-
- # 3. Convert to frontend format and attach feedback info
- result = []
- for stored_message in stored_messages:
- # Convert to frontend format
- chat_message_list = stored_message.to_frontend_format()
-
- # Add feedback data for AI messages
- for chat_msg in chat_message_list:
- feedback = feedback_map.get(chat_msg.id)
- if feedback and chat_msg.role == "ai":
- chat_msg.feedback = Feedback(
- type=feedback.type,
- tag=feedback.tag,
- message=feedback.message
- )
-
- result.append(chat_message_list)
-
- return result # [[message1_parts], [message2_parts], [message3_parts], ...]
-```
-
-#### 3.3 PostgreSQL - User Feedback
-
-**Table**: `message_feedback`
-
-```python
-class MessageFeedback(Base):
- __tablename__ = "message_feedback"
-
- user = Column(String(256), nullable=False) # User ID
- chat_id = Column(String(24), primary_key=True) # Chat ID
- message_id = Column(String(256), primary_key=True) # Message ID
- type = Column(EnumColumn(MessageFeedbackType)) # like/dislike
- tag = Column(EnumColumn(MessageFeedbackTag)) # Feedback tag
- message = Column(Text) # Feedback content
- question = Column(Text) # Original question
- original_answer = Column(Text) # Original answer
- status = Column(EnumColumn(MessageFeedbackStatus)) # Status
- gmt_created = Column(DateTime(timezone=True))
- gmt_updated = Column(DateTime(timezone=True))
-```
-
-**Purpose**: Store user feedback on AI responses (like/dislike) for quality monitoring and model optimization
-
-## Data Format Specification
-
-### Storage Format (Redis)
-
-Messages in Redis are stored in JSON format using **Part-Based Design**:
-
-#### StoredChatMessage - A Complete Message
-
-```python
-class StoredChatMessage(BaseModel):
- """A complete message (either a user message or an AI message)"""
- parts: List[StoredChatMessagePart] # Multiple parts of the message
- files: List[Dict[str, Any]] # Associated uploaded files
-```
-
-#### StoredChatMessagePart - A Message Part
-
-```python
-class StoredChatMessagePart(BaseModel):
- """A single part of a message (atomic unit)"""
-
- # Identification
- chat_id: str # Chat session ID
- message_id: str # Message ID (shared by multiple parts of the same message)
- part_id: str # Unique part ID
- timestamp: float # Generation timestamp
-
- # Content Classification
- type: Literal["message", "tool_call_result", "thinking", "references"]
- role: Literal["human", "ai", "system"]
- content: str
-
- # Extended Fields
- references: List[Dict] # Document references
- urls: List[str] # URL references
- metadata: Optional[Dict] # Additional metadata
-```
-
-#### Part Type Descriptions
-
-| Type | Description | Included in LLM Context |
-|------|-------------|------------------------|
-| `message` | Main conversation content | ✅ Yes |
-| `tool_call_result` | Tool calling process | ❌ No (display only) |
-| `thinking` | AI thinking process | ❌ No (display only) |
-| `references` | Document references and links | ❌ No (display only) |
-
-**Design Rationale**: A single AI response contains multiple stages (tool calling, thinking, answering, references), and these contents are generated sequentially and interleaved. A single field cannot express this. User messages typically have only 1 part (type="message"), but also support multiple parts to maintain structural consistency.
-
-#### Redis Storage Example
-
-**User Message**:
-```json
-{
- "parts": [
- {
- "chat_id": "chat_abc123",
- "message_id": "uuid-1",
- "part_id": "uuid-part-1",
- "timestamp": 1699999999.0,
- "type": "message",
- "role": "human",
- "content": "What is LightRAG?",
- "references": [],
- "urls": [],
- "metadata": null
- }
- ],
- "files": []
-}
-```
-
-**AI Response (with multiple parts)**:
-```json
-{
- "parts": [
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-2",
- "type": "tool_call_result",
- "role": "ai",
- "content": "Searching knowledge base...",
- "timestamp": 1699999999.1
- },
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-3",
- "type": "message",
- "role": "ai",
- "content": "LightRAG is a lightweight RAG framework, deeply modified by the ApeCloud team...",
- "timestamp": 1699999999.5
- },
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-4",
- "type": "references",
- "role": "ai",
- "content": "",
- "references": [
- {
- "score": 0.95,
- "text": "LightRAG architecture description...",
- "metadata": {"source": "lightrag_doc.pdf", "page": 3}
- }
- ],
- "urls": ["https://github.com/HKUDS/LightRAG"],
- "timestamp": 1699999999.6
- }
- ],
- "files": []
-}
-```
-
-### API Response Format
-
-**ChatDetails Schema** (`aperag/api/components/schemas/chat.yaml`):
-
-```yaml
-chatDetails:
- type: object
- properties:
- id: string # chat_abc123
- title: string # Chat title
- bot_id: string # bot_xyz
- peer_id: string
- peer_type: string # system/feishu/weixin/web
- status: string # active/archived
- created: string # ISO 8601
- updated: string # ISO 8601
- history: # 2D array
- type: array
- description: Conversation history, each element is a message
- items:
- type: array
- description: A message contains multiple parts (tool calls, thinking, answer, references, etc.)
- items:
- $ref: '#/chatMessage'
-```
-
-**ChatMessage Schema**:
-
-```yaml
-chatMessage:
- type: object
- properties:
- id: string # message_id (same for one turn)
- part_id: string # part_id (unique for each part)
- type: string # message/tool_call_result/thinking/references
- timestamp: number # Unix timestamp
- role: string # human/ai
- data: string # Message content
- references: # Document references (optional)
- type: array
- items:
- type: object
- properties:
- score: number
- text: string
- metadata: object
- urls: # URL references (optional)
- type: array
- items:
- type: string
- feedback: # User feedback (optional)
- type: object
- properties:
- type: string # like/dislike
- tag: string
- message: string
- files: # Associated files (optional)
- type: array
-```
-
-### Frontend Response Example
-
-```json
-{
- "id": "chat_abc123",
- "title": "Discussion about LightRAG",
- "bot_id": "bot_xyz",
- "status": "active",
- "created": "2025-01-01T00:00:00Z",
- "updated": "2025-01-01T01:00:00Z",
- "history": [
- [
- {
- "id": "uuid-1",
- "part_id": "uuid-part-1",
- "type": "message",
- "timestamp": 1699999999.0,
- "role": "human",
- "data": "What is LightRAG?",
- "files": []
- }
- ],
- [
- {
- "id": "uuid-2",
- "part_id": "uuid-part-2",
- "type": "tool_call_result",
- "timestamp": 1699999999.1,
- "role": "ai",
- "data": "Searching knowledge base...",
- "files": []
- },
- {
- "id": "uuid-2",
- "part_id": "uuid-part-3",
- "type": "message",
- "timestamp": 1699999999.5,
- "role": "ai",
- "data": "LightRAG is a lightweight RAG framework...",
- "files": []
- },
- {
- "id": "uuid-2",
- "part_id": "uuid-part-4",
- "type": "references",
- "timestamp": 1699999999.6,
- "role": "ai",
- "data": "",
- "references": [
- {
- "score": 0.95,
- "text": "LightRAG architecture description...",
- "metadata": {"source": "lightrag_doc.pdf"}
- }
- ],
- "urls": ["https://github.com/HKUDS/LightRAG"],
- "files": []
- }
- ]
- ]
-}
-```
-
-**Note**: `history` is a 2D array. The first dimension is the message sequence (in chronological order), and the second dimension is the multiple parts of that message. For example:
-- `history[0]` = Parts of user's 1st message (usually only 1 part)
-- `history[1]` = Parts of AI's 1st response (may have multiple parts: tool calls, thinking, answer, references)
-- `history[2]` = Parts of user's 2nd message
-- `history[3]` = Parts of AI's 2nd response
-- ...
-
-## Message Write Flow
-
-### Agent Runtime Write Path
-
-The legacy WebSocket chat endpoint `WS /api/v1/bots/{bot_id}/chats/{chat_id}/connect` has been retired.
-Current agent chat writes now go through the v2 turn/timeline APIs and SSE event stream. Keep the history
-schema above as background, but do not use this document to implement new WebSocket chat clients.
-
-## Design Features
-
-### 1. Hybrid Storage Architecture
-
-| Storage | Content | Reason |
-|---------|---------|--------|
-| PostgreSQL | Chat metadata | Persistence, complex queries |
-| Redis | Message history | High-performance read/write, TTL support |
-| PostgreSQL | User feedback | Persistence, for analysis |
-
-**Advantages**:
-- Performance optimization: Message history uses Redis for fast read/write
-- Data persistence: Important metadata stored in PostgreSQL
-- Flexibility: Independent TTL and backup strategy configuration
-
-### 2. Part-Based Message Design
-
-**Core Value**:
-- ✅ Support complex AI response flow (tool calling → thinking → answer → references)
-- ✅ Frontend can render different types of content differently
-- ✅ Complete temporal relationship recording (via timestamp)
-- ✅ Flexible extension (adding new types doesn't require schema changes)
-
-**Why does a single message need multiple parts?**
-
-A single AI response is generated sequentially and interleaved, for example:
-1. 🔍 Part1 (tool_call_result): "Querying database..."
-2. 💭 Part2 (thinking): "Found 327 records..."
-3. 🔍 Part3 (tool_call_result): "Calculating growth rate..."
-4. 💭 Part4 (thinking): "15% QoQ growth..."
-5. 💬 Part5 (message): "Based on data analysis, Q4 performance is excellent..."
-6. 📚 Part6 (references): [Document 1, Document 2]
-
-These 6 parts belong to **one AI message** (sharing the same message_id). A single field cannot express such complex temporal relationships.
-
-### 3. Format Conversion Decoupling
-
-Three format conversions are provided:
-
-```python
-class StoredChatMessage:
- def to_frontend_format(self) -> List[ChatMessage]:
- """Convert to frontend display format"""
- # Include all types of parts
-
- def to_openai_format(self) -> List[Dict]:
- """Convert to LLM call format"""
- # Only include type="message" parts
-
- def get_main_content(self) -> str:
- """Get main answer content"""
- # Content of the first type="message" part
-```
-
-**Advantages**:
-- Internal storage format decoupled from external interfaces
-- Support different consumption scenarios
-- LLM context only includes actual conversation content, not tool calls and thinking processes
-
-### 4. Three-Level ID Design
-
-```python
-chat_id = "chat_abc123" # Session level
-message_id = "uuid-msg-1" # Message level (shared by parts of the same message)
-part_id = "uuid-part-1" # Part level (each part is independent)
-```
-
-**Purpose**:
-- `chat_id`: Identifies a chat session
-- `message_id`: Groups parts of the same message (for frontend display and feedback association)
-- `part_id`: Uniquely identifies each part (for individual operations like copy, reference)
-
-## Performance Considerations
-
-### Redis Optimization
-- **List Data Structure**: LPUSH O(1), LRANGE O(N)
-- **Optional TTL**: Automatic expiration of historical messages
-- **Connection Pool Reuse**: Global Redis client
-
-### PostgreSQL Optimization
-- **Indexes**: user, bot_id, chat_id, status fields
-- **Soft Delete**: Using gmt_deleted
-- **Paginated Queries**: list_chats supports pagination
-
-### Transmission Optimization
-- **WebSocket Streaming**: Generate and send simultaneously
-- **Incremental Updates**: Only transmit new parts
-- **Lazy Loading**: Load historical messages on demand
-
-## Related Files
-
-### Core Implementation
-- `aperag/views/chat.py` - View layer interface
-- `aperag/service/chat_service.py` - Service layer business logic
-- `aperag/utils/history.py` - Redis message history management
-- `aperag/chat/history/message.py` - Message data structures
-- `aperag/db/models.py` - Database models
-- `aperag/db/repositories/chat.py` - Chat database operations
-- `aperag/api/components/schemas/chat.yaml` - OpenAPI Schema
-
-### Frontend Implementation
-- `web/src/app/workspace/bots/[botId]/chats/[chatId]/page.tsx` - Chat detail page
-- `web/src/components/chat/chat-messages.tsx` - Message display component
-
-## Summary
-
-ApeRAG's chat history message system adopts **Hybrid Storage + Part-Based Message Design**:
-
-1. **PostgreSQL** stores Chat metadata and feedback (persistence, queryable)
-2. **Redis** stores message history (high performance, expiration support)
-3. **Part-Based Design** supports complex AI response flows (tool calling, thinking, answering, references)
-4. **Three-Level ID Design** supports message grouping and independent operations
-5. **Clear Layered Architecture** (View → Service → Repository → Storage)
-
-This design ensures both performance and support for complex AI interaction scenarios, while maintaining good scalability.
diff --git a/web/docs/en-US/design/document_upload_design.md b/web/docs/en-US/design/document_upload_design.md
deleted file mode 100644
index b31d76ba7..000000000
--- a/web/docs/en-US/design/document_upload_design.md
+++ /dev/null
@@ -1,1076 +0,0 @@
----
-title: Document Upload Design
----
-
-# ApeRAG Document Upload Architecture Design
-
-## Overview
-
-This document details the complete architecture design of the document upload module in the ApeRAG project, covering the full pipeline from file upload, temporary storage, document parsing, format conversion to final index construction.
-
-**Core Design Philosophy**: Adopts a **two-phase commit** pattern, separating file upload (temporary storage) from document confirmation (formal addition), providing better user experience and resource management capabilities.
-
-## System Architecture
-
-### Overall Architecture
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Frontend │
-│ (Next.js) │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ Step 1: Upload │ Step 2: Confirm
- │ POST /documents/upload │ POST /documents/confirm
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ View Layer: aperag/views/collections.py │
-│ - HTTP request handling │
-│ - JWT authentication │
-│ - Parameter validation │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ document_service.upload_document() │ document_service.confirm_documents()
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Service Layer: aperag/service/document_service.py │
-│ - Business logic orchestration │
-│ - File validation (type, size) │
-│ - SHA-256 hash deduplication │
-│ - Quota checking │
-│ - Transaction management │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ Step 1 │ Step 2
- ▼ ▼
-┌────────────────────────┐ ┌────────────────────────────┐
-│ 1. Create Document │ │ 1. Update Document status │
-│ status=UPLOADED │ │ UPLOADED → PENDING │
-│ 2. Save to ObjectStore│ │ 2. Create DocumentIndex │
-│ 3. Calculate hash │ │ 3. Trigger indexing tasks │
-└────────┬───────────────┘ └────────┬───────────────────┘
- │ │
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Storage Layer │
-│ │
-│ ┌───────────────┐ ┌──────────────────┐ ┌─────────────┐ │
-│ │ PostgreSQL │ │ Object Store │ │ Vector DB │ │
-│ │ │ │ │ │ │ │
-│ │ - document │ │ - Local/S3 │ │ - Qdrant │ │
-│ │ - document_ │ │ - Original files │ │ - Vectors │ │
-│ │ index │ │ - Converted files│ │ │ │
-│ └───────────────┘ └──────────────────┘ └─────────────┘ │
-│ │
-│ ┌───────────────┐ ┌──────────────────┐ │
-│ │ Elasticsearch │ │ Neo4j/PG │ │
-│ │ │ │ │ │
-│ │ - Full-text │ │ - Knowledge Graph│ │
-│ └───────────────┘ └──────────────────┘ │
-└─────────────────────────────────────────────────────────────┘
- │
- ▼
- ┌───────────────────┐
- │ Celery Workers │
- │ │
- │ - Doc parsing │
- │ - Format convert │
- │ - Content extract│
- │ - Doc chunking │
- │ - Index building │
- └───────────────────┘
-```
-
-### Layered Architecture
-
-```
-┌─────────────────────────────────────────────┐
-│ View Layer (views/collections.py) │ HTTP handling, auth, validation
-└─────────────────┬───────────────────────────┘
- │ calls
-┌─────────────────▼───────────────────────────┐
-│ Service Layer (service/document_service.py)│ Business logic, transaction, permission
-└─────────────────┬───────────────────────────┘
- │ calls
-┌─────────────────▼───────────────────────────┐
-│ Repository Layer (db/ops.py, objectstore/) │ Data access abstraction
-└─────────────────┬───────────────────────────┘
- │ accesses
-┌─────────────────▼───────────────────────────┐
-│ Storage Layer (PG, S3, Qdrant, ES, Neo4j) │ Data persistence
-└─────────────────────────────────────────────┘
-```
-
-## Core Process Details
-
-### Phase 0: API Interface Definition
-
-The system provides three main interfaces:
-
-1. **Upload File** (Two-phase mode - Step 1)
- - Endpoint: `POST /api/v1/collections/{collection_id}/documents/upload`
- - Function: Upload file to temporary storage, status `UPLOADED`
- - Returns: `document_id`, `filename`, `size`, `status`
-
-2. **Confirm Documents** (Two-phase mode - Step 2)
- - Endpoint: `POST /api/v1/collections/{collection_id}/documents/confirm`
- - Function: Confirm uploaded documents, trigger index building
- - Parameters: `document_ids` array
- - Returns: `confirmed_count`, `failed_count`, `failed_documents`
-
-3. **One-step Upload** (Legacy mode, backward compatible)
- - Endpoint: `POST /api/v1/collections/{collection_id}/documents`
- - Function: Upload and directly add to knowledge base, status directly to `PENDING`
- - Supports batch upload
-
-### Phase 1: File Upload and Temporary Storage
-
-#### 1.1 Upload Flow
-
-```
-User selects files
- │
- ▼
-Frontend calls upload API
- │
- ▼
-View layer validates identity and params
- │
- ▼
-Service layer processes business logic:
- │
- ├─► Verify collection exists and active
- │
- ├─► Validate file type and size
- │
- ├─► Read file content
- │
- ├─► Calculate SHA-256 hash
- │
- └─► Transaction processing:
- │
- ├─► Duplicate detection (by filename + hash)
- │ ├─ Exact match: Return existing doc (idempotent)
- │ ├─ Same name, different content: Throw conflict error
- │ └─ New document: Continue creation
- │
- ├─► Create Document record (status=UPLOADED)
- │
- ├─► Upload to object store
- │ └─ Path: user-{user_id}/{collection_id}/{document_id}/original{suffix}
- │
- └─► Update document metadata (object_path)
-```
-
-#### 1.2 File Validation
-
-**Supported File Types**:
-- Documents: `.pdf`, `.doc`, `.docx`, `.ppt`, `.pptx`, `.xls`, `.xlsx`
-- Text: `.txt`, `.md`, `.html`, `.json`, `.xml`, `.yaml`, `.yml`, `.csv`
-- Images: `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.tiff`, `.tif`
-- Audio: `.mp3`, `.wav`, `.m4a`
-- Archives: `.zip`, `.tar`, `.gz`, `.tgz`
-
-**Size Limits**:
-- Default: 100 MB (configurable via `MAX_DOCUMENT_SIZE` environment variable)
-- Extracted total size: 5 GB (`MAX_EXTRACTED_SIZE`)
-
-#### 1.3 Duplicate Detection Mechanism
-
-Uses **filename + SHA-256 hash** dual detection:
-
-| Scenario | Filename | Hash | System Behavior |
-|----------|----------|------|-----------------|
-| Exact match | Same | Same | Return existing document (idempotent) |
-| Name conflict | Same | Different | Throw `DocumentNameConflictException` |
-| New document | Different | - | Create new document record |
-
-**Advantages**:
-- ✅ Supports idempotent upload: Network retries won't create duplicates
-- ✅ Prevents content conflicts: Same name with different content prompts user
-- ✅ Saves storage space: Same content stored only once
-
-### Phase 2: Temporary Storage Configuration
-
-#### 2.1 Object Storage Types
-
-System supports two object storage backends, switchable via environment variables:
-
-**1. Local Storage (Local filesystem)**
-
-Use cases:
-- Development and testing environments
-- Small-scale deployments
-- Single-machine deployments
-
-Configuration:
-```bash
-# Development environment
-OBJECT_STORE_TYPE=local
-OBJECT_STORE_LOCAL_ROOT_DIR=.objects
-
-# Docker environment
-OBJECT_STORE_TYPE=local
-OBJECT_STORE_LOCAL_ROOT_DIR=/shared/objects
-```
-
-Storage path example:
-```
-.objects/
-└── user-google-oauth2-123456/
- └── col_abc123/
- └── doc_xyz789/
- ├── original.pdf # Original file
- ├── converted.pdf # Converted PDF
- ├── processed_content.md # Parsed Markdown
- ├── chunks/ # Chunked data
- │ ├── chunk_0.json
- │ └── chunk_1.json
- └── images/ # Extracted images
- ├── page_0.png
- └── page_1.png
-```
-
-**2. S3 Storage (Compatible with AWS S3/MinIO/OSS, etc.)**
-
-Use cases:
-- Production environments
-- Large-scale deployments
-- Distributed deployments
-- High availability and disaster recovery needs
-
-Configuration:
-```bash
-OBJECT_STORE_TYPE=s3
-OBJECT_STORE_S3_ENDPOINT=http://127.0.0.1:9000 # MinIO/S3 address
-OBJECT_STORE_S3_REGION=us-east-1 # AWS Region
-OBJECT_STORE_S3_ACCESS_KEY=minioadmin # Access Key
-OBJECT_STORE_S3_SECRET_KEY=minioadmin # Secret Key
-OBJECT_STORE_S3_BUCKET=aperag # Bucket name
-OBJECT_STORE_S3_PREFIX_PATH=dev/ # Optional path prefix
-OBJECT_STORE_S3_USE_PATH_STYLE=true # Set to true for MinIO
-```
-
-#### 2.2 Object Storage Path Rules
-
-**Path Format**:
-```
-{prefix}/user-{user_id}/{collection_id}/{document_id}/{filename}
-```
-
-**Components**:
-- `prefix`: Optional global prefix (S3 only)
-- `user_id`: User ID (`|` replaced with `-`)
-- `collection_id`: Collection ID
-- `document_id`: Document ID
-- `filename`: Filename (e.g., `original.pdf`, `page_0.png`)
-
-**Multi-tenancy Isolation**:
-- Each user has an independent namespace
-- Each collection has an independent storage directory
-- Each document has an independent folder
-
-### Phase 3: Document Confirmation and Index Building
-
-#### 3.1 Confirmation Flow
-
-```
-User clicks "Save to Collection"
- │
- ▼
-Frontend calls confirm API
- │
- ▼
-Service layer processes:
- │
- ├─► Validate collection configuration
- │
- ├─► Check Quota (deduct quota at confirmation stage)
- │
- └─► For each document_id:
- │
- ├─► Verify document status is UPLOADED
- │
- ├─► Update document status: UPLOADED → PENDING
- │
- ├─► Create index records based on collection config:
- │ ├─ VECTOR (Vector index, required)
- │ ├─ FULLTEXT (Full-text index, required)
- │ ├─ GRAPH (Knowledge graph, optional)
- │ ├─ SUMMARY (Document summary, optional)
- │ └─ VISION (Vision index, optional)
- │
- └─► Return confirmation result
- │
- ▼
-Trigger Celery task: reconcile_document_indexes
- │
- ▼
-Background async index building
-```
-
-#### 3.2 Quota Management
-
-**Check Timing**:
-- ❌ Not checked during upload phase (temporary storage doesn't consume quota)
-- ✅ Checked during confirmation phase (formal addition consumes quota)
-
-**Quota Types**:
-
-1. **User Global Quota**
- - `max_document_count`: Total document count limit per user
- - Default: 1000 (configurable via `MAX_DOCUMENT_COUNT`)
-
-2. **Per-Collection Quota**
- - `max_document_count_per_collection`: Document count limit per collection
- - Excludes `UPLOADED` and `DELETED` status documents
-
-**Quota Exceeded Handling**:
-- Throws `QuotaExceededException`
-- Returns HTTP 400 error
-- Includes current usage and quota limit information
-
-### Phase 4: Document Parsing and Format Conversion
-
-#### 4.1 Parser Architecture
-
-System uses a **multi-parser chain invocation** architecture, where each parser handles specific file types:
-
-```
-DocParser (Main Controller)
- │
- ├─► MinerUParser
- │ └─ Function: High-precision PDF parsing (commercial API)
- │ └─ Supports: .pdf
- │
- ├─► ImageParser
- │ └─ Function: Image content recognition (OCR + vision understanding)
- │ └─ Supports: .jpg, .png, .gif, .bmp, .tiff
- │
- ├─► AudioParser
- │ └─ Function: Audio transcription (Speech-to-Text)
- │ └─ Supports: .mp3, .wav, .m4a
- │
- └─► MarkItDownParser (Fallback)
- └─ Function: Universal document to Markdown conversion
- └─ Supports: Almost all common formats
-```
-
-#### 4.2 Parser Configuration
-
-**Configuration Method**: Dynamically controlled via Collection Config
-
-```json
-{
- "parser_config": {
- "use_mineru": false, // Enable MinerU (requires API Token)
- "use_markitdown": true, // Enable MarkItDown (default)
- "mineru_api_token": "xxx" // MinerU API Token (optional)
- }
-}
-```
-
-**Environment Variable Configuration**:
-```bash
-USE_MINERU_API=false # Globally enable MinerU
-MINERU_API_TOKEN=your_token # MinerU API Token
-```
-
-#### 4.3 Parsing Flow
-
-```
-Celery Worker receives indexing task
- │
- ▼
-1. Download original file from object store
- │
- ▼
-2. Select Parser based on file extension
- │
- ├─► Try first matching Parser
- │ ├─ Success: Return parsing result
- │ └─ Failure: FallbackError → Try next Parser
- │
- └─► Final fallback: MarkItDownParser
- │
- ▼
-3. Parsing result (Parts):
- │
- ├─► MarkdownPart: Text content
- │ └─ Contains: headings, paragraphs, lists, tables, etc.
- │
- ├─► PdfPart: PDF file
- │ └─ For: linearization, page rendering
- │
- └─► AssetBinPart: Binary resources
- └─ Contains: images, embedded files, etc.
- │
- ▼
-4. Post-processing:
- │
- ├─► PDF pages to images (required for Vision index)
- │ └─ Each page rendered as PNG image
- │ └─ Saved to {document_path}/images/page_N.png
- │
- ├─► PDF linearization (speed up browser loading)
- │ └─ Use pikepdf to optimize PDF structure
- │ └─ Saved to {document_path}/converted.pdf
- │
- └─► Extract text content (plain text)
- └─ Merge all MarkdownPart content
- └─ Saved to {document_path}/processed_content.md
- │
- ▼
-5. Save to object store
-```
-
-#### 4.4 Format Conversion Examples
-
-**Example 1: PDF Document**
-```
-Input: user_manual.pdf (5 MB)
- │
- ▼
-Parser selection: MinerUParser / MarkItDownParser
- │
- ▼
-Output Parts:
- ├─ MarkdownPart: "# User Manual\n\n## Chapter 1\n..."
- └─ PdfPart:
- │
- ▼
-Post-processing:
- ├─ Render 50 pages to images → images/page_0.png ~ page_49.png
- ├─ Linearize PDF → converted.pdf
- └─ Extract text → processed_content.md
-```
-
-**Example 2: Image File**
-```
-Input: screenshot.png (2 MB)
- │
- ▼
-Parser selection: ImageParser
- │
- ▼
-Output Parts:
- ├─ MarkdownPart: "[OCR extracted text]"
- └─ AssetBinPart: (vision_index=true)
- │
- ▼
-Post-processing:
- └─ Save original image copy → images/file.png
-```
-
-**Example 3: Audio File**
-```
-Input: meeting_record.mp3 (50 MB)
- │
- ▼
-Parser selection: AudioParser
- │
- ▼
-Output Parts:
- └─ MarkdownPart: "[Transcribed meeting content]"
- │
- ▼
-Post-processing:
- └─ Save transcription text → processed_content.md
-```
-
-### Phase 5: Index Building
-
-#### 5.1 Index Types and Functions
-
-| Index Type | Required | Function Description | Storage Location |
-|-----------|----------|---------------------|------------------|
-| **VECTOR** | ✅ Required | Vector retrieval, semantic search | Qdrant / Elasticsearch |
-| **FULLTEXT** | ✅ Required | Full-text search, keyword search | Elasticsearch |
-| **GRAPH** | ❌ Optional | Knowledge graph, entity & relation extraction | Neo4j / PostgreSQL |
-| **SUMMARY** | ❌ Optional | Document summary, LLM generated | PostgreSQL (index_data) |
-| **VISION** | ❌ Optional | Vision understanding, image content analysis | Qdrant (vectors) + PG (metadata) |
-
-#### 5.2 Index Building Flow
-
-```
-Celery Worker: reconcile_document_indexes task
- │
- ▼
-1. Scan DocumentIndex table, find indexes needing processing
- │
- ├─► PENDING status + observed_version < version
- │ └─ Need to create or update index
- │
- └─► DELETING status
- └─ Need to delete index
- │
- ▼
-2. Group by document, process one by one
- │
- ▼
-3. For each document:
- │
- ├─► parse_document (parse document)
- │ ├─ Download original file from object store
- │ ├─ Call DocParser to parse
- │ └─ Return ParsedDocumentData
- │
- └─► For each index type:
- │
- ├─► create_index (create/update index)
- │ │
- │ ├─ VECTOR index:
- │ │ ├─ Document chunking
- │ │ ├─ Generate vectors using Embedding model
- │ │ └─ Write to Qdrant
- │ │
- │ ├─ FULLTEXT index:
- │ │ ├─ Extract plain text content
- │ │ ├─ Chunk by paragraph/section
- │ │ └─ Write to Elasticsearch
- │ │
- │ ├─ GRAPH index:
- │ │ ├─ Extract entities using LightRAG
- │ │ ├─ Extract entity relationships
- │ │ └─ Write to Neo4j/PostgreSQL
- │ │
- │ ├─ SUMMARY index:
- │ │ ├─ Generate summary using LLM
- │ │ └─ Save to DocumentIndex.index_data
- │ │
- │ └─ VISION index:
- │ ├─ Extract image Assets
- │ ├─ Understand image content using Vision LLM
- │ ├─ Generate image description vectors
- │ └─ Write to Qdrant
- │
- └─► Update index status
- ├─ Success: CREATING → ACTIVE
- └─ Failure: CREATING → FAILED
- │
- ▼
-4. Update document overall status
- │
- ├─ All indexes ACTIVE → Document.status = COMPLETE
- ├─ Any index FAILED → Document.status = FAILED
- └─ Some indexes still processing → Document.status = RUNNING
-```
-
-#### 5.3 Document Chunking
-
-**Chunking Strategy**:
-- Recursive character splitting (RecursiveCharacterTextSplitter)
-- Prioritize splitting by natural paragraphs and sections
-- Maintain context overlap
-
-**Chunking Parameters**:
-```json
-{
- "chunk_size": 1000, // Max characters per chunk
- "chunk_overlap": 200, // Overlap characters
- "separators": ["\n\n", "\n", " ", ""] // Separator priority
-}
-```
-
-**Chunking Result Storage**:
-```
-{document_path}/chunks/
- ├─ chunk_0.json: {"text": "...", "metadata": {...}}
- ├─ chunk_1.json: {"text": "...", "metadata": {...}}
- └─ ...
-```
-
-## Database Design
-
-### Table 1: document (Document Metadata)
-
-**Table Structure**:
-
-| Field | Type | Description | Index |
-|-------|------|-------------|-------|
-| `id` | String(24) | Document ID, primary key, format: `doc{random_id}` | PK |
-| `name` | String(1024) | Filename | - |
-| `user` | String(256) | User ID (supports multiple IDPs) | ✅ Index |
-| `collection_id` | String(24) | Collection ID | ✅ Index |
-| `status` | Enum | Document status (see table below) | ✅ Index |
-| `size` | BigInteger | File size (bytes) | - |
-| `content_hash` | String(64) | SHA-256 hash (for deduplication) | ✅ Index |
-| `object_path` | Text | Object store path (deprecated, use doc_metadata) | - |
-| `doc_metadata` | Text | Document metadata (JSON string) | - |
-| `gmt_created` | DateTime(tz) | Creation time (UTC) | - |
-| `gmt_updated` | DateTime(tz) | Update time (UTC) | - |
-| `gmt_deleted` | DateTime(tz) | Deletion time (soft delete) | ✅ Index |
-
-**Unique Constraint**:
-```sql
-UNIQUE INDEX uq_document_collection_name_active
- ON document (collection_id, name)
- WHERE gmt_deleted IS NULL;
-```
-- Within the same collection, active document names cannot be duplicated
-- Deleted documents are excluded from uniqueness check
-
-**Document Status Enum** (`DocumentStatus`):
-
-| Status | Description | When Set | Visibility |
-|--------|-------------|----------|------------|
-| `UPLOADED` | Uploaded to temporary storage | `upload_document` API | Frontend file selection UI |
-| `PENDING` | Waiting for index building | `confirm_documents` API | Document list (processing) |
-| `RUNNING` | Index building in progress | Celery task starts processing | Document list (processing) |
-| `COMPLETE` | All indexes completed | All indexes become ACTIVE | Document list (available) |
-| `FAILED` | Index building failed | Any index fails | Document list (failed) |
-| `DELETED` | Deleted | `delete_document` API | Not visible (soft delete) |
-| `EXPIRED` | Temporary document expired | Scheduled cleanup task | Not visible |
-
-**Document Metadata Example** (`doc_metadata` JSON field):
-```json
-{
- "object_path": "user-xxx/col_xxx/doc_xxx/original.pdf",
- "converted_path": "user-xxx/col_xxx/doc_xxx/converted.pdf",
- "processed_content_path": "user-xxx/col_xxx/doc_xxx/processed_content.md",
- "images": [
- "user-xxx/col_xxx/doc_xxx/images/page_0.png",
- "user-xxx/col_xxx/doc_xxx/images/page_1.png"
- ],
- "parser_used": "MinerUParser",
- "parse_duration_ms": 5420,
- "page_count": 50,
- "custom_field": "value"
-}
-```
-
-### Table 2: document_index (Index Status Management)
-
-**Table Structure**:
-
-| Field | Type | Description | Index |
-|-------|------|-------------|-------|
-| `id` | Integer | Auto-increment ID, primary key | PK |
-| `document_id` | String(24) | Related document ID | ✅ Index |
-| `index_type` | Enum | Index type (see table below) | ✅ Index |
-| `status` | Enum | Index status (see table below) | ✅ Index |
-| `version` | Integer | Index version number | - |
-| `observed_version` | Integer | Processed version number | - |
-| `index_data` | Text | Index data (JSON), e.g., summary content | - |
-| `error_message` | Text | Error message (on failure) | - |
-| `gmt_created` | DateTime(tz) | Creation time | - |
-| `gmt_updated` | DateTime(tz) | Update time | - |
-| `gmt_last_reconciled` | DateTime(tz) | Last reconciliation time | - |
-
-**Unique Constraint**:
-```sql
-UNIQUE CONSTRAINT uq_document_index
- ON document_index (document_id, index_type);
-```
-- Each document has only one record per index type
-
-**Index Type Enum** (`DocumentIndexType`):
-
-| Type | Value | Description | External Storage |
-|------|-------|-------------|------------------|
-| `VECTOR` | "VECTOR" | Vector index | Qdrant / Elasticsearch |
-| `FULLTEXT` | "FULLTEXT" | Full-text index | Elasticsearch |
-| `GRAPH` | "GRAPH" | Knowledge graph | Neo4j / PostgreSQL |
-| `SUMMARY` | "SUMMARY" | Document summary | PostgreSQL (index_data) |
-| `VISION` | "VISION" | Vision index | Qdrant + PostgreSQL |
-
-**Index Status Enum** (`DocumentIndexStatus`):
-
-| Status | Description | When Set |
-|--------|-------------|----------|
-| `PENDING` | Waiting for processing | `confirm_documents` creates index record |
-| `CREATING` | Creating | Celery Worker starts processing |
-| `ACTIVE` | Ready for use | Index building successful |
-| `DELETING` | Marked for deletion | `delete_document` API |
-| `DELETION_IN_PROGRESS` | Deleting | Celery Worker is deleting |
-| `FAILED` | Failed | Index building failed |
-
-**Version Control Mechanism**:
-- `version`: Expected index version (incremented on document update)
-- `observed_version`: Processed version number
-- When `version > observed_version`, triggers index update
-
-**Reconciler**:
-```python
-# Query indexes needing processing
-SELECT * FROM document_index
-WHERE status = 'PENDING'
- AND observed_version < version;
-
-# Update after processing
-UPDATE document_index
-SET status = 'ACTIVE',
- observed_version = version,
- gmt_last_reconciled = NOW()
-WHERE id = ?;
-```
-
-### Table Relationship Diagram
-
-```
-┌─────────────────────────────────┐
-│ collection │
-│ ───────────────────────────── │
-│ id (PK) │
-│ name │
-│ config (JSON) │
-│ status │
-│ ... │
-└────────────┬────────────────────┘
- │ 1:N
- ▼
-┌─────────────────────────────────┐
-│ document │
-│ ───────────────────────────── │
-│ id (PK) │
-│ collection_id (FK) │◄──── Unique constraint: (collection_id, name)
-│ name │
-│ user │
-│ status (Enum) │
-│ size │
-│ content_hash (SHA-256) │
-│ doc_metadata (JSON) │
-│ gmt_created │
-│ gmt_deleted │
-│ ... │
-└────────────┬────────────────────┘
- │ 1:N
- ▼
-┌─────────────────────────────────┐
-│ document_index │
-│ ───────────────────────────── │
-│ id (PK) │
-│ document_id (FK) │◄──── Unique constraint: (document_id, index_type)
-│ index_type (Enum) │
-│ status (Enum) │
-│ version │
-│ observed_version │
-│ index_data (JSON) │
-│ error_message │
-│ gmt_last_reconciled │
-│ ... │
-└─────────────────────────────────┘
-```
-
-## State Machine and Lifecycle
-
-### Document State Transitions
-
-```
- ┌─────────────────────────────────────────────┐
- │ │
- │ ▼
- [Upload] ──► UPLOADED ──► [Confirm] ──► PENDING ──► RUNNING ──► COMPLETE
- │ │
- │ ▼
- │ FAILED
- │ │
- │ ▼
- └──────► [Delete] ──────────────► DELETED
- │
- ┌───────────────────────────────────┘
- │
- ▼
- EXPIRED (Scheduled cleanup of unconfirmed docs)
-```
-
-**Key Transitions**:
-1. **UPLOADED → PENDING**: User clicks "Save to Collection"
-2. **PENDING → RUNNING**: Celery Worker starts processing
-3. **RUNNING → COMPLETE**: All indexes successful
-4. **RUNNING → FAILED**: Any index fails
-5. **Any status → DELETED**: User deletes document
-
-### Index State Transitions
-
-```
- [Create index record] ──► PENDING ──► CREATING ──► ACTIVE
- │
- ▼
- FAILED
- │
- ▼
- ┌──────────► PENDING (retry)
- │
- [Delete request] ────────┼──────────► DELETING ──► DELETION_IN_PROGRESS ──► (record deleted)
- │
- └──────────► (directly delete record, if PENDING/FAILED)
-```
-
-## Async Task Scheduling (Celery)
-
-### Task Definitions
-
-**Main Task**: `reconcile_document_indexes`
-- Trigger timing:
- - After `confirm_documents` API call
- - Scheduled task (every 30 seconds)
- - Manual trigger (admin interface)
-- Function: Scan `document_index` table, process indexes needing reconciliation
-
-**Sub-tasks**:
-- `parse_document_task`: Parse document content
-- `create_vector_index_task`: Create vector index
-- `create_fulltext_index_task`: Create full-text index
-- `create_graph_index_task`: Create knowledge graph index
-- `create_summary_index_task`: Create summary index
-- `create_vision_index_task`: Create vision index
-
-### Task Scheduling Strategy
-
-**Concurrency Control**:
-- Each Worker processes at most N documents simultaneously (default 4)
-- Multiple indexes of each document can be built in parallel
-- Use Celery's `task_acks_late=True` to ensure tasks aren't lost
-
-**Failure Retry**:
-- Maximum 3 retries
-- Exponential backoff (1 min → 5 min → 15 min)
-- Marked as `FAILED` after 3 failures
-
-**Idempotency**:
-- All tasks support repeated execution
-- Use `observed_version` mechanism to avoid duplicate processing
-- Same input produces same output
-
-## Design Features and Advantages
-
-### 1. Two-Phase Commit Design
-
-**Advantages**:
-- ✅ **Better User Experience**: Fast upload response, doesn't block user operations
-- ✅ **Selective Addition**: Can selectively confirm partial files after batch upload
-- ✅ **Reasonable Resource Control**: Unconfirmed documents don't build indexes, don't consume quota
-- ✅ **Failure Recovery Friendly**: Temporary documents can be periodically cleaned up without affecting business
-
-**Status Isolation**:
-```
-Temporary status (UPLOADED):
- - Not counted in quota
- - Doesn't trigger indexing
- - Can be automatically cleaned up
-
-Formal status (PENDING/RUNNING/COMPLETE):
- - Counted in quota
- - Triggers index building
- - Won't be automatically cleaned up
-```
-
-### 2. Idempotency Design
-
-**File-Level Idempotency**:
-- SHA-256 hash deduplication
-- Same file uploaded multiple times returns same `document_id`
-- Avoids storage space waste
-
-**API-Level Idempotency**:
-- `upload_document`: Repeated upload returns existing document
-- `confirm_documents`: Repeated confirmation doesn't create duplicate indexes
-- `delete_document`: Repeated deletion returns success (soft delete)
-
-### 3. Multi-Tenancy Isolation
-
-**Storage Isolation**:
-```
-user-{user_A}/... # User A's files
-user-{user_B}/... # User B's files
-```
-
-**Database Isolation**:
-- All queries filter by `user` field
-- Collection-level permission control (`collection.user`)
-- Soft delete support (`gmt_deleted`)
-
-### 4. Flexible Storage Backend
-
-**Unified Interface**:
-```python
-AsyncObjectStore:
- - put(path, data)
- - get(path)
- - delete_objects_by_prefix(prefix)
-```
-
-**Runtime Switching**:
-- Switch between Local/S3 via environment variables
-- No need to modify business code
-- Supports custom storage backends (just implement the interface)
-
-### 5. Transaction Consistency
-
-**Two-Phase Commit for Database + Object Store**:
-```python
-async with transaction:
- # 1. Create database record
- document = create_document_record()
-
- # 2. Upload to object store
- await object_store.put(path, data)
-
- # 3. Update metadata
- document.doc_metadata = json.dumps(metadata)
-
- # All operations succeed to commit, any failure rolls back
-```
-
-**Failure Handling**:
-- Database record creation fails: Don't upload file
-- File upload fails: Rollback database record
-- Metadata update fails: Rollback previous operations
-
-### 6. Observability
-
-**Audit Logging**:
-- `@audit` decorator records all document operations
-- Includes: user, time, operation type, resource ID
-
-**Task Tracking**:
-- `gmt_last_reconciled`: Last processing time
-- `error_message`: Failure reason
-- Celery task ID: Link log tracing
-
-**Monitoring Metrics**:
-- Document upload rate
-- Index building duration
-- Failure rate statistics
-
-## Performance Optimization
-
-### 1. Async Processing
-
-**Upload Doesn't Block**:
-- Returns immediately after file upload to object store
-- Index building executes asynchronously in Celery
-- Frontend gets progress via polling or WebSocket
-
-### 2. Batch Operations
-
-**Batch Confirmation**:
-```python
-confirm_documents(document_ids=[id1, id2, ..., idN])
-```
-- Process multiple documents in one transaction
-- Batch create index records
-- Reduce database round-trips
-
-### 3. Caching Strategy
-
-**Parsing Result Cache**:
-- Parsed content saved to `processed_content.md`
-- Subsequent index rebuilds can read directly without re-parsing
-
-**Chunking Result Cache**:
-- Chunking results saved to `chunks/` directory
-- Vector index rebuilds can reuse chunking results
-
-### 4. Parallel Index Building
-
-**Multiple Indexes in Parallel**:
-```python
-# VECTOR, FULLTEXT, GRAPH can be built in parallel
-await asyncio.gather(
- create_vector_index(),
- create_fulltext_index(),
- create_graph_index()
-)
-```
-
-## Error Handling
-
-### Common Exceptions
-
-| Exception Type | HTTP Status | Trigger Scenario | Handling Suggestion |
-|---------------|-------------|------------------|---------------------|
-| `ResourceNotFoundException` | 404 | Collection/document doesn't exist | Check if ID is correct |
-| `CollectionInactiveException` | 400 | Collection not active | Wait for collection initialization |
-| `DocumentNameConflictException` | 409 | Same name, different content | Rename file or delete old document |
-| `QuotaExceededException` | 429 | Quota exceeded | Upgrade plan or delete old documents |
-| `InvalidFileTypeException` | 400 | Unsupported file type | Check supported file type list |
-| `FileSizeTooLargeException` | 413 | File too large | Split file or compress |
-
-### Exception Propagation
-
-```
-Service Layer throws exception
- │
- ▼
-View Layer catches and converts
- │
- ▼
-Exception Handler unified handling
- │
- ▼
-Return standard JSON response:
-{
- "error_code": "QUOTA_EXCEEDED",
- "message": "Document count limit exceeded",
- "details": {
- "limit": 1000,
- "current": 1000
- }
-}
-```
-
-## Related Files Index
-
-### Core Implementation
-
-- **View Layer**: `aperag/views/collections.py` - HTTP interface definition
-- **Service Layer**: `aperag/service/document_service.py` - Business logic
-- **Database Models**: `aperag/db/models.py` - Document, DocumentIndex table definitions
-- **Database Operations**: `aperag/db/ops.py` - CRUD operation encapsulation
-
-### Object Storage
-
-- **Interface Definition**: `aperag/objectstore/base.py` - AsyncObjectStore abstract class
-- **Local Implementation**: `aperag/objectstore/local.py` - Local filesystem storage
-- **S3 Implementation**: `aperag/objectstore/s3.py` - S3-compatible storage
-
-### Document Parsing
-
-- **Main Controller**: `aperag/docparser/doc_parser.py` - DocParser
-- **Parser Implementations**:
- - `aperag/docparser/mineru_parser.py` - MinerU PDF parsing
- - `aperag/docparser/mineru_parser.py` - MinerU document parsing
- - `aperag/docparser/markitdown_parser.py` - MarkItDown universal parsing
- - `aperag/docparser/image_parser.py` - Image OCR
- - `aperag/docparser/audio_parser.py` - Audio transcription
-- **Document Processing**: `aperag/index/document_parser.py` - Parsing flow orchestration
-
-### Index Building
-
-- **Index Management**: `aperag/index/manager.py` - DocumentIndexManager
-- **Vector Index**: `aperag/index/vector_index.py` - VectorIndexer
-- **Full-text Index**: `aperag/index/fulltext_index.py` - FulltextIndexer
-- **Knowledge Graph**: `aperag/index/graph_index.py` - GraphIndexer
-- **Document Summary**: `aperag/index/summary_index.py` - SummaryIndexer
-- **Vision Index**: `aperag/index/vision_index.py` - VisionIndexer
-
-### Task Scheduling
-
-- **Task Definitions**: `config/celery_tasks.py` - Celery task registration
-- **Reconciler**: `aperag/tasks/reconciler.py` - DocumentIndexReconciler
-- **Document Tasks**: `aperag/tasks/document.py` - DocumentIndexTask
-
-### Frontend Implementation
-
-- **Document List**: `web/src/app/workspace/collections/[collectionId]/documents/page.tsx`
-- **Document Upload**: `web/src/app/workspace/collections/[collectionId]/documents/upload/document-upload.tsx`
-
-## Summary
-
-ApeRAG's document upload module adopts a **two-phase commit + multi-parser chain invocation + parallel multi-index building** architecture design:
-
-**Core Features**:
-1. ✅ **Two-Phase Commit**: Upload (temporary storage) → Confirm (formal addition), providing better user experience
-2. ✅ **SHA-256 Deduplication**: Prevents duplicate documents, supports idempotent upload
-3. ✅ **Flexible Storage Backend**: Local/S3 configurable switching, unified interface abstraction
-4. ✅ **Multi-Parser Architecture**: Supports MinerU, MarkItDown and other parsers
-5. ✅ **Automatic Format Conversion**: PDF→images, audio→text, images→OCR text
-6. ✅ **Multi-Index Coordination**: Five index types: vector, full-text, graph, summary, vision
-7. ✅ **Quota Management**: Quota deducted at confirmation stage, reasonable resource control
-8. ✅ **Async Processing**: Celery task queue, doesn't block user operations
-9. ✅ **Transaction Consistency**: Two-phase commit for database + object store
-10. ✅ **Observability**: Audit logs, task tracking, complete error information recording
-
-This design ensures both high performance and scalability, supports complex document processing scenarios (multi-format, multi-language, multi-modal), while maintaining good fault tolerance and user experience.
diff --git a/web/docs/en-US/design/graph_index_creation.md b/web/docs/en-US/design/graph_index_creation.md
deleted file mode 100644
index 49ce7f553..000000000
--- a/web/docs/en-US/design/graph_index_creation.md
+++ /dev/null
@@ -1,1070 +0,0 @@
----
-title: Graph Index Creation Process
-description: Complete process and core technologies for ApeRAG knowledge graph index construction
-keywords: Knowledge Graph, Graph Index, Entity Extraction, Relationship Extraction, Concurrency Optimization
-position: 2
----
-
-# Graph Index Creation Process
-
-## 1. What is Graph Index
-
-Graph Index is a core feature of ApeRAG that automatically extracts structured knowledge graphs from unstructured text.
-
-### 1.1 A Simple Example
-
-Imagine you have a document about company organization:
-
-> "John is the head of the database team and specializes in PostgreSQL and MySQL. Mike works in the frontend team and often collaborates with John's team to develop backend management systems."
-
-**Transformation from Document to Knowledge Graph**:
-
-```mermaid
-flowchart LR
- subgraph Input[📄 Input Document]
- Doc["John is the head of the database team,
specializes in PostgreSQL and MySQL.
Mike works in the frontend team..."]
- end
-
- subgraph Process[🔄 Graph Index Processing]
- Extract[Extract entities and relationships]
- end
-
- subgraph Output[🕸️ Knowledge Graph]
- direction TB
- A[John
Person] -->|heads| B[Database Team
Organization]
- A -->|specializes in| C[PostgreSQL
Technology]
- A -->|specializes in| D[MySQL
Technology]
- E[Mike
Person] -->|belongs to| F[Frontend Team
Organization]
- E -->|collaborates| A
- end
-
- Input --> Process
- Process --> Output
-
- style Input fill:#e3f2fd
- style Process fill:#fff59d
- style Output fill:#c8e6c9
-```
-
-Traditional vector search can only find "semantically similar" paragraphs but cannot answer these questions:
-- What does John lead?
-- What is the relationship between John and Mike?
-- What technologies does the database team use?
-
-**Graph Index can do**: Accurately answer these relationship-focused questions by making implicit knowledge relationships explicit.
-
-### 1.2 Core Value
-
-Compared to traditional retrieval methods, Graph Index provides unique capabilities:
-
-| Capability | Vector Search | Full-text Search | Graph Index |
-|------------|---------------|------------------|-------------|
-| Semantic Similarity | ✅ Strong | ❌ Weak | ✅ Strong |
-| Exact Keyword Match | ❌ Weak | ✅ Strong | ✅ Medium |
-| Relationship Query | ❌ Not Supported | ❌ Not Supported | ✅ Strong |
-| Multi-hop Reasoning | ❌ Not Supported | ❌ Not Supported | ✅ Supported |
-| Suitable Questions | "How to optimize performance" | "PostgreSQL config" | "John and Mike's relationship" |
-
-**Core Advantage**: Graph Index allows AI to "understand" the connections between knowledge, not just text similarity.
-
-## 2. What Problems Can Graph Index Solve
-
-Graph Index excels at handling scenarios that require "understanding relationships". Let's look at practical applications.
-
-### 2.1 Enterprise Knowledge Management
-
-**Scenario**: Companies have extensive documentation including organizational structure, project materials, and technical docs.
-
-**Graph Index Value**:
-
-- 📊 **Organizational Relationships**: "Who is on John's team?" → Quickly find team members
-- 🔗 **Collaboration Networks**: "Who has worked with John?" → Discover work networks
-- 🛠️ **Skill Mapping**: "Who is skilled in PostgreSQL?" → Locate technical experts
-- 📁 **Project History**: "Which projects has John participated in?" → Track project experience
-
-**Real Effect**:
-
-```
-Question: "Who leads the database team?"
-Traditional Search: Returns dozens of paragraphs containing "database team" and "lead"
-Graph Index: Directly returns "John" + relevant background information
-```
-
-### 2.2 Research and Learning
-
-**Scenario**: Analyzing academic papers and technical documentation to understand knowledge lineage.
-
-**Graph Index Value**:
-
-- 👥 **Author Networks**: "Who has this author collaborated with?" → Discover research teams
-- 📖 **Citation Relationships**: "What papers does this cite?" → Trace research lineage
-- 🔬 **Technology Evolution**: "How has this technology evolved?" → Understand tech history
-- 💡 **Concept Connections**: "What's the relationship between tech A and B?" → Connect knowledge points
-
-### 2.3 Products and Services
-
-**Scenario**: Product documentation, user manuals, API documentation.
-
-**Graph Index Value**:
-
-- ⚙️ **Feature Dependencies**: "What needs to be configured before enabling feature A?" → Understand dependencies
-- 🔧 **Configuration Relationships**: "Which features does this config affect?" → Avoid misconfigurations
-- 🐛 **Problem Diagnosis**: "What might cause error X?" → Quick troubleshooting
-- 📚 **API Relationships**: "Which APIs are typically used together?" → Learn best practices
-
-### 2.4 Comparison: When to Use Graph Index
-
-Different questions suit different retrieval methods:
-
-| Question Type | Example | Best Solution |
-|--------------|---------|---------------|
-| **Concept Understanding** | "What is RAG?" | Vector Search |
-| **Exact Lookup** | "PostgreSQL config file path" | Full-text Search |
-| **Relationship Query** | "What's John and Mike's relationship?" | Graph Index ✨ |
-| **Multi-hop Reasoning** | "What tech stack does John's team use?" | Graph Index ✨ |
-| **Knowledge Tracing** | "What modules does this feature depend on?" | Graph Index ✨ |
-
-**Best Practice**: ApeRAG supports vector search, full-text search, and graph index simultaneously, intelligently selecting or combining based on question type.
-
-## 3. Construction Process Overview
-
-When you upload a document and enable graph indexing, ApeRAG automatically completes the following steps. Here's a simple overview; details are in later chapters.
-
-### 3.1 Five Key Steps
-
-```mermaid
-flowchart TB
- subgraph Step1["1️⃣ Document Chunking"]
- A1[Original Document] --> A2[Smart Chunking]
- A2 --> A3[Generate Chunks]
- end
-
- subgraph Step2["2️⃣ Entity Relationship Extraction"]
- B1[Chunks] --> B2[Call LLM]
- B2 --> B3[Identify Entities]
- B2 --> B4[Identify Relationships]
- end
-
- subgraph Step3["3️⃣ Connected Component Analysis"]
- C1[Entity Relationship Network] --> C2[BFS Algorithm]
- C2 --> C3[Grouping]
- end
-
- subgraph Step4["4️⃣ Concurrent Merging"]
- D1[Group 1] --> D2[Entity Deduplication]
- D3[Group 2] --> D4[Entity Deduplication]
- D5[Group N] --> D6[Entity Deduplication]
- D2 --> D7[Relationship Aggregation]
- D4 --> D7
- D6 --> D7
- end
-
- subgraph Step5["5️⃣ Multi-storage Writing"]
- E1[Graph Database]
- E2[Vector Database]
- E3[Text Storage]
- end
-
- A3 --> B1
- B3 --> C1
- B4 --> C1
- C3 --> D1
- C3 --> D3
- C3 --> D5
- D7 --> E1
- D7 --> E2
- A3 --> E3
-
- style Step1 fill:#e3f2fd
- style Step2 fill:#fff3e0
- style Step3 fill:#f3e5f5
- style Step4 fill:#e8f5e9
- style Step5 fill:#fce4ec
-```
-
-**Simply put**: Chunk document → Extract entities/relationships → Smart grouping → Concurrent merging → Write to storage.
-
-The entire process is fully automated - you just upload documents, and the system handles everything.
-
-### 3.2 Processing Time Reference
-
-Processing time varies by document size:
-
-| Document Size | Entity Count | Processing Time | Example |
-|--------------|--------------|-----------------|---------|
-| Small (< 5 pages) | ~50 | 10-30 seconds | Company notices, meeting notes |
-| Medium (10-50 pages) | ~200 | 1-3 minutes | Technical docs, product manuals |
-| Large (100+ pages) | ~1000 | 5-15 minutes | Research reports, books |
-
-**Factors**:
-- LLM response speed (main bottleneck)
-- Document complexity (tables, images slow processing)
-- Concurrency settings (configurable for speed)
-
-> 💡 **Tip**: Processing is asynchronous - upload multiple documents and the system processes them in parallel.
-
-### 3.3 Real-time Progress Tracking
-
-You can check document processing progress anytime:
-
-```
-Document Status: Processing
-- ✅ Document Parsing: Complete
-- ✅ Document Chunking: Complete (25 chunks generated)
-- 🔄 Entity Extraction: In Progress (15/25)
-- ⏳ Relationship Extraction: Waiting
-- ⏳ Graph Construction: Waiting
-```
-
-Once processing completes, document status changes to "Active" and graph queries become available.
-
-## 4. Detailed Construction Process
-
-The previous sections covered what graph index does and the overall process. This chapter details the technical implementation of each step.
-
-> 💡 **Reading Tip**: If you only want to understand basic concepts and usage, skip to Chapter 9 for practical applications.
-
-### 4.1 Document Chunking
-
-First step: Split long documents into appropriately sized chunks.
-
-**Why Chunk?**
-- LLMs have input length limits (typically thousands to tens of thousands of tokens)
-- Too large: Extraction quality decreases, LLM may "miss" information
-- Too small: Loses context, can't understand complete semantics
-
-**Smart Chunking Strategy**:
-
-```mermaid
-flowchart LR
- Doc[Long Document] --> Check{Check Size}
- Check -->|< 1200 tokens| Keep[Keep Intact]
- Check -->|> 1200 tokens| Split[Smart Split]
-
- Split --> By1[By Paragraph]
- By1 --> Check2{Still Too Big?}
- Check2 -->|Yes| By2[By Sentence]
- Check2 -->|No| Done[Complete]
- By2 --> Check3{Still Too Big?}
- Check3 -->|Yes| By3[By Character]
- Check3 -->|No| Done
- By3 --> Done
-
- style Doc fill:#e1f5ff
- style Split fill:#ffccbc
- style Done fill:#c5e1a5
-```
-
-**Chunking Parameters**:
-- Default size: 1200 tokens (approximately 800-1000 English words)
-- Overlap size: 100 tokens (ensures context continuity)
-- Priority: Paragraph > Sentence > Character
-
-### 4.2 Entity Relationship Extraction
-
-Use LLM to identify entities and relationships from each chunk.
-
-**Extraction Process**:
-
-```mermaid
-sequenceDiagram
- participant C as Chunk
- participant L as LLM
- participant R as Results
-
- C->>L: "John heads the database team..."
- L->>R: Entities: [John(Person), Database Team(Org)]
- L->>R: Relationships: [John-heads->Database Team]
-
- C->>L: "John specializes in PostgreSQL..."
- L->>R: Entities: [John(Person), PostgreSQL(Tech)]
- L->>R: Relationships: [John-specializes in->PostgreSQL]
-```
-
-**Concurrency Optimization**: Multiple chunks can call LLM simultaneously, default 20 concurrent requests.
-
-### 4.3 Connected Component Analysis
-
-Divide entity relationship network into independent subgraphs for parallel processing.
-
-**Why This Step?**
-
-Tech team entities and finance department entities aren't connected - they can be processed completely in parallel!
-
-```mermaid
-graph LR
- subgraph Component1[Connected Component 1 - Tech Team]
- A1[John] -->|heads| A2[Database Team]
- A1 -->|specializes in| A3[PostgreSQL]
- A4[Mike] -->|collaborates| A1
- end
-
- subgraph Component2[Connected Component 2 - Finance]
- B1[Alice] -->|belongs to| B2[Finance Dept]
- B3[Bob] -->|collaborates| B1
- end
-
- style Component1 fill:#bbdefb
- style Component2 fill:#c5e1a5
-```
-
-**Performance Boost**: 3 independent components = 3x speedup!
-
-### 4.4 Concurrent Merging
-
-Same-name entities need deduplication, same relationships need aggregation.
-
-```mermaid
-flowchart TD
- subgraph Before["Before Merging"]
- A1["John
Database head"]
- A2["John
Specializes in PostgreSQL"]
- A3["John
Leads team"]
- end
-
- Merge[Smart Merge]
-
- subgraph After["After Merging"]
- B1["John
Database team head,
specializes in PostgreSQL,
leads multiple projects"]
- end
-
- A1 --> Merge
- A2 --> Merge
- A3 --> Merge
- Merge --> B1
-
- style Before fill:#ffccbc
- style After fill:#c5e1a5
-```
-
-**Fine-grained Locks**: Only lock entities being merged, others can process concurrently.
-
-### 4.5 Multi-storage Writing
-
-Knowledge graph written to three storage systems:
-
-```mermaid
-flowchart LR
- KG[Knowledge Graph] --> G[Graph Database
Graph Queries]
- KG --> V[Vector Database
Semantic Search]
- KG --> T[Text Storage
Full-text Search]
-
- style KG fill:#e1f5ff
- style G fill:#bbdefb
- style V fill:#c5e1a5
- style T fill:#ffccbc
-```
-
-Different storages support different query types, complementing each other.
-
-## 5. Core Technical Design
-
-This chapter introduces core technical designs including data isolation and concurrency control.
-
-> 💡 **Reading Tip**: These are system architecture and implementation details, mainly for developers and technical decision-makers.
-
-### 5.1 Workspace Data Isolation
-
-Each Collection has an independent namespace for complete data isolation.
-
-**Naming Convention**:
-
-```python
-# Entity naming
-entity:{entity_name}:{workspace}
-# Example
-entity:John:collection_abc123
-
-# Relationship naming
-relationship:{source}:{target}:{workspace}
-# Example
-relationship:John:Database Team:collection_abc123
-```
-
-**Isolation Effect**:
-
-```mermaid
-graph TB
- subgraph Collection_A[Collection A - Company Docs]
- A1[entity:John:A] --> A2[entity:Database Team:A]
- end
-
- subgraph Collection_B[Collection B - School Docs]
- B1[entity:John:B] --> B2[entity:CS Department:B]
- end
-
- style Collection_A fill:#bbdefb
- style Collection_B fill:#c5e1a5
-```
-
-"John" in two Collections is completely independent, no interference!
-
-### 5.2 Stateless Instance Management
-
-Each processing task creates an independent graph index instance, destroyed after completion.
-
-**Lifecycle Management**:
-
-```mermaid
-sequenceDiagram
- participant C as Celery Task
- participant M as Manager
- participant R as Graph Index Instance
- participant S as Storage
-
- C->>M: process_document()
- M->>R: create_instance()
- R->>S: Initialize storage connections
- R->>R: Process document
- R->>S: Write data
- R-->>M: Return results
- M-->>C: Task complete
- Note over R: Instance destroyed, resources released
-```
-
-**Advantages**:
-- ✅ Zero state pollution: Each task independent, no interference
-- ✅ Easy scaling: Can run multiple workers simultaneously
-- ✅ Resource management: Automatic cleanup, no memory leaks
-
-### 5.3 Connected Component Concurrency Optimization
-
-Intelligent concurrent processing through graph topology analysis.
-
-**Algorithm Principle**:
-
-```mermaid
-graph TB
- subgraph Input[Input: Entity Relationship Network]
- I1[Entity 1] --> I2[Entity 2]
- I2 --> I3[Entity 3]
-
- I4[Entity 4] --> I5[Entity 5]
-
- I6[Entity 6]
- end
-
- Algorithm[BFS Algorithm]
-
- subgraph Output[Output: 3 Connected Components]
- O1[Component 1
3 entities]
- O2[Component 2
2 entities]
- O3[Component 3
1 entity]
- end
-
- Input --> Algorithm
- Algorithm --> Output
-
- style Input fill:#ffccbc
- style Algorithm fill:#fff59d
- style Output fill:#c5e1a5
-```
-
-**Performance Boost**: 3 components concurrent processing = 3x speedup!
-
-### 5.4 Fine-grained Concurrency Control
-
-Precise entity-level locking:
-
-**Lock Hierarchy**:
-
-```mermaid
-graph TD
- A[Global Lock - Traditional] -->|Too Coarse| B[All Entities Serial]
-
- C[Entity Lock - ApeRAG] -->|Just Right| D[Lock Only Merging Entities]
-
- style A fill:#ffccbc
- style B fill:#ffccbc
- style C fill:#c5e1a5
- style D fill:#c5e1a5
-```
-
-**Lock Strategy**:
-1. Extraction phase: No locks, fully parallel
-2. Merging phase: Lock only needed entities
-3. Sorted lock acquisition: Prevents deadlock
-
-### 5.5 Smart Summarization
-
-Automatically compress overly long descriptions:
-
-```python
-if len(description) > 2000 tokens:
- summary = await llm_summarize(description)
-else:
- summary = description
-```
-
-**Effect**: Compress 2500 tokens to 200 tokens, retaining core information.
-
-### 5.6 Multi-storage Backend Support
-
-ApeRAG supports two graph databases: Neo4j and PostgreSQL.
-
-**How to Choose?**
-
-| Scenario | Recommended | Reason |
-|----------|-------------|--------|
-| **Small Scale** (< 100K entities) | PostgreSQL | Simple ops, low cost |
-| **Medium Scale** (100K-1M) | PostgreSQL or Neo4j | Based on query complexity |
-| **Large Scale** (> 1M) | Neo4j | Better graph query performance |
-| **Limited Budget** | PostgreSQL | No extra deployment |
-| **Complex Graph Algorithms** | Neo4j | Built-in graph algorithms |
-
-**Switching**:
-
-```bash
-# Use PostgreSQL (default)
-export GRAPH_INDEX_GRAPH_STORAGE=PGOpsSyncGraphStorage
-
-# Use Neo4j
-export GRAPH_INDEX_GRAPH_STORAGE=Neo4JSyncStorage
-```
-
-## 6. Complete Data Flow
-
-The entire graph index construction is a data transformation pipeline, from unstructured text to structured knowledge graph:
-
-```mermaid
-flowchart TD
- A[Original Document] --> B[Clean & Preprocess]
- B --> C[Smart Chunking]
- C --> D[Chunks]
-
- D --> E[LLM Concurrent Extraction]
- E --> F[Original Entity List]
- E --> G[Original Relationship List]
-
- F --> H[Build Adjacency Graph]
- G --> H
- H --> I[BFS Find Connected Components]
- I --> J[Grouped Concurrent Processing]
-
- J --> K[Entity Deduplication]
- J --> L[Relationship Aggregation]
-
- K --> M{Description Length Check}
- M -->|Too Long| N[LLM Summary]
- M -->|Appropriate| O[Keep Original]
- N --> P[Final Entities]
- O --> P
-
- L --> Q{Description Length Check}
- Q -->|Too Long| R[LLM Summary]
- Q -->|Appropriate| S[Keep Original]
- R --> T[Final Relationships]
- S --> T
-
- P --> U[Graph Database]
- P --> V[Vector Database]
- T --> U
- T --> V
- D --> W[Text Storage]
-
- U --> X[Knowledge Graph Complete]
- V --> X
- W --> X
-
- style A fill:#e1f5ff
- style E fill:#fff59d
- style I fill:#f3e5f5
- style J fill:#c5e1a5
- style X fill:#c8e6c9
-```
-
-### Data Transformation Example
-
-A concrete example showing step-by-step data transformation:
-
-**Input Document**:
-
-```text
-John heads the database team and specializes in PostgreSQL and MySQL.
-Mike works in the frontend team and often collaborates with John's team to develop backend systems.
-Alice is an accountant in the finance department, responsible for financial reports.
-```
-
-**Step 1: Chunking**
-
-```json
-[
- {
- "chunk_id": "chunk-001",
- "content": "John heads the database team and specializes in PostgreSQL and MySQL.",
- "tokens": 15
- },
- {
- "chunk_id": "chunk-002",
- "content": "Mike works in the frontend team and often collaborates with John's team...",
- "tokens": 18
- },
- {
- "chunk_id": "chunk-003",
- "content": "Alice is an accountant in the finance department, responsible for financial reports.",
- "tokens": 14
- }
-]
-```
-
-**Step 2: Entity Relationship Extraction**
-
-```json
-{
- "entities": [
- {"name": "John", "type": "Person", "source": "chunk-001"},
- {"name": "Database Team", "type": "Organization", "source": "chunk-001"},
- {"name": "PostgreSQL", "type": "Technology", "source": "chunk-001"},
- {"name": "MySQL", "type": "Technology", "source": "chunk-001"},
- {"name": "Mike", "type": "Person", "source": "chunk-002"},
- {"name": "Frontend Team", "type": "Organization", "source": "chunk-002"},
- {"name": "Alice", "type": "Person", "source": "chunk-003"},
- {"name": "Finance Department", "type": "Organization", "source": "chunk-003"}
- ],
- "relationships": [
- {"source": "John", "target": "Database Team", "relation": "heads"},
- {"source": "John", "target": "PostgreSQL", "relation": "specializes in"},
- {"source": "John", "target": "MySQL", "relation": "specializes in"},
- {"source": "Mike", "target": "Frontend Team", "relation": "belongs to"},
- {"source": "Mike", "target": "John", "relation": "collaborates"},
- {"source": "Alice", "target": "Finance Department", "relation": "belongs to"}
- ]
-}
-```
-
-**Step 3: Connected Component Analysis**
-
-```
-Connected Component 1 (Technical Department):
-- Entities: John, Mike, Database Team, Frontend Team, PostgreSQL, MySQL
-- Relationships: 6
-
-Connected Component 2 (Finance Department):
-- Entities: Alice, Finance Department
-- Relationships: 1
-```
-
-**Step 4: Concurrent Merging**
-
-Two components can process in parallel!
-
-**Step 5: Final Knowledge Graph**
-
-```mermaid
-graph LR
- subgraph Technical
- John -->|heads| DatabaseTeam[Database Team]
- John -->|specializes in| PostgreSQL
- John -->|specializes in| MySQL
- Mike -->|belongs to| FrontendTeam[Frontend Team]
- Mike -->|collaborates| John
- end
-
- subgraph Finance
- Alice -->|belongs to| FinanceDept[Finance Department]
- end
-
- style Technical fill:#bbdefb
- style Finance fill:#c5e1a5
-```
-
-### Performance Optimization Features
-
-1. **Fine-grained Concurrency Control**
- - Entity-level locks: `entity:John:collection_abc`
- - Lock only during merging, fully parallel during extraction
-
-2. **Connected Component Concurrency**
- - Technical and Finance departments can process in parallel
- - Zero lock contention, full multi-core CPU utilization
-
-3. **Smart Summarization**
- - Description < 2000 tokens: Keep original
- - Description > 2000 tokens: LLM summary compression
-
-## 7. Performance Optimization Strategies
-
-### 7.1 Concurrency Control
-
-Graph index construction involves extensive LLM calls and database operations requiring proper concurrency control.
-
-**Concurrency Hierarchy**:
-
-```mermaid
-graph TB
- A[Document-level Concurrency] --> B[Chunk-level Concurrency]
- B --> C[Component-level Concurrency]
- C --> D[Entity-level Concurrency]
-
- A1[Celery Workers
Multiple docs simultaneously] --> A
- B1[LLM Concurrent Calls
Multiple chunks simultaneously] --> B
- C1[Parallel Component Merging
Multiple components simultaneously] --> C
- D1[Concurrent Entity Merging
Different entities simultaneously] --> D
-
- style A fill:#e3f2fd
- style B fill:#fff3e0
- style C fill:#f3e5f5
- style D fill:#e8f5e9
-```
-
-**Concurrency Parameters**:
-
-| Parameter | Default | Description |
-|-----------|---------|-------------|
-| `llm_model_max_async` | 20 | LLM concurrent calls |
-| `embedding_func_max_async` | 16 | Embedding concurrent calls |
-| `max_batch_size` | 32 | Batch processing size |
-
-**Tuning Recommendations**:
-
-```python
-# Scenario 1: Strict LLM API rate limits
-llm_model_max_async = 5 # Reduce concurrency to avoid rate limiting
-
-# Scenario 2: Sufficient performance, want speedup
-llm_model_max_async = 50 # Increase concurrency to speed up processing
-
-# Scenario 3: Limited memory
-max_batch_size = 16 # Reduce batch size to lower memory usage
-```
-
-### 7.2 LLM Call Optimization
-
-LLM calls are the most time-consuming part, main optimization strategies:
-
-1. **Concurrent Calls**: Multiple chunks extract simultaneously (default 20 concurrent)
-2. **Batch Processing**: Reduce LLM call count
-3. **Cache Reuse**: Reuse summary results for similar descriptions
-
-**Performance Boost**: Concurrent calling is 4x faster than serial.
-
-### 7.3 Storage Optimization
-
-Batch writing significantly improves performance:
-
-| Method | 100 Entity Write Time |
-|--------|---------------------|
-| Individual Write | ~10 seconds |
-| Batch Write (32/batch) | ~1 second |
-
-**Optimization Effect**: 10x speedup!
-
-### 7.4 Memory Optimization
-
-Memory management strategies for large documents:
-
-- Stream chunking: Don't load entire document at once
-- Immediate release: Free memory immediately after processing
-- Batch processing: Control memory peaks
-
-### 7.5 Performance Monitoring
-
-System outputs detailed performance statistics:
-
-```
-Graph Index Construction Complete:
-✓ Document Chunking: 10 chunks, 0.5 seconds
-✓ Entity Extraction: 120 entities, 25 seconds
-✓ Relationship Extraction: 85 relationships, 25 seconds
-✓ Concurrent Merging: 15 seconds
-✓ Storage Writing: 2 seconds
-━━━━━━━━━━━━━━━━━━━━━━━━━
-Total: 42.7 seconds
-```
-
-**Bottleneck Analysis**: Entity/relationship extraction takes 60% of time, can optimize by increasing LLM concurrency.
-
-## 8. Configuration Parameters
-
-### 8.1 Core Configuration
-
-Graph index construction can be tuned with the following parameters:
-
-**Chunking Parameters**:
-
-```python
-# Chunk size (tokens)
-CHUNK_TOKEN_SIZE = 1200
-
-# Overlap size (tokens)
-CHUNK_OVERLAP_TOKEN_SIZE = 100
-```
-
-**Tuning Recommendations**:
-- Small docs (< 5000 tokens): `CHUNK_TOKEN_SIZE = 800`
-- Large docs (> 50000 tokens): `CHUNK_TOKEN_SIZE = 1500`
-- Need more context: Increase `CHUNK_OVERLAP_TOKEN_SIZE`
-
-**Concurrency Parameters**:
-
-```python
-# LLM concurrent calls
-LLM_MODEL_MAX_ASYNC = 20
-
-# Embedding concurrent calls
-EMBEDDING_FUNC_MAX_ASYNC = 16
-
-# Batch processing size
-MAX_BATCH_SIZE = 32
-```
-
-**Tuning Recommendations**:
-- Strict LLM API limits: Lower `LLM_MODEL_MAX_ASYNC` to 5-10
-- Sufficient performance for speedup: Increase to 50-100
-- Limited memory: Lower `MAX_BATCH_SIZE` to 16
-
-**Entity Extraction Parameters**:
-
-```python
-# Entity extraction retry count (0 = extract once only)
-ENTITY_EXTRACT_MAX_GLEANING = 0
-
-# Summary max tokens
-SUMMARY_TO_MAX_TOKENS = 2000
-
-# Force summary description fragment count
-FORCE_LLM_SUMMARY_ON_MERGE = 10
-```
-
-**Tuning Recommendations**:
-- Extraction quality important: `ENTITY_EXTRACT_MAX_GLEANING = 1` (extract twice)
-- Speed priority: `ENTITY_EXTRACT_MAX_GLEANING = 0`
-- Descriptions often long: Lower `SUMMARY_TO_MAX_TOKENS` to 1000
-
-### 8.2 Knowledge Graph Configuration
-
-Configure in Collection settings:
-
-```json
-{
- "knowledge_graph_config": {
- "language": "English",
- "entity_types": [
- "organization",
- "person",
- "geo",
- "event",
- "product",
- "technology",
- "date",
- "category"
- ]
- }
-}
-```
-
-**Parameter Description**:
-
-- **language**: Extraction language, affects LLM prompts
- - `English`: English
- - `simplified chinese`: Simplified Chinese
- - `traditional chinese`: Traditional Chinese
-
-- **entity_types**: Entity types to extract
- - Default: 8 types (organization, person, location, event, product, technology, date, category)
- - Customizable: e.g., extract only people and organizations
-
-### 8.3 Storage Configuration
-
-Configure storage backends via environment variables:
-
-```bash
-# KV storage (key-value)
-export GRAPH_INDEX_KV_STORAGE=PGOpsSyncKVStorage
-
-# Vector storage
-export GRAPH_INDEX_VECTOR_STORAGE=PGOpsSyncVectorStorage
-
-# Graph storage
-export GRAPH_INDEX_GRAPH_STORAGE=Neo4JSyncStorage
-# Or use PostgreSQL
-export GRAPH_INDEX_GRAPH_STORAGE=PGOpsSyncGraphStorage
-```
-
-**Storage Selection Recommendations**:
-
-| Scenario | KV Storage | Vector Storage | Graph Storage |
-|----------|-----------|----------------|---------------|
-| **Default** | PostgreSQL | PostgreSQL | PostgreSQL |
-| **High-performance Vector Search** | PostgreSQL | Qdrant | Neo4j |
-| **Large-scale Graph** | PostgreSQL | Qdrant | Neo4j |
-| **Simple Deployment** | PostgreSQL | PostgreSQL | PostgreSQL |
-
-### 8.4 Complete Configuration Example
-
-```bash
-# Chunking configuration
-export CHUNK_TOKEN_SIZE=1200
-export CHUNK_OVERLAP_TOKEN_SIZE=100
-
-# Concurrency configuration
-export LLM_MODEL_MAX_ASYNC=20
-export MAX_BATCH_SIZE=32
-
-# Extraction configuration
-export ENTITY_EXTRACT_MAX_GLEANING=0
-export SUMMARY_TO_MAX_TOKENS=2000
-
-# Storage configuration
-export GRAPH_INDEX_KV_STORAGE=PGOpsSyncKVStorage
-export GRAPH_INDEX_VECTOR_STORAGE=PGOpsSyncVectorStorage
-export GRAPH_INDEX_GRAPH_STORAGE=PGOpsSyncGraphStorage
-
-# Database connection (PostgreSQL)
-export POSTGRES_HOST=127.0.0.1
-export POSTGRES_PORT=5432
-export POSTGRES_DB=aperag
-export POSTGRES_USER=postgres
-export POSTGRES_PASSWORD=your_password
-
-# Database connection (Neo4j, optional)
-export NEO4J_HOST=127.0.0.1
-export NEO4J_PORT=7687
-export NEO4J_USERNAME=neo4j
-export NEO4J_PASSWORD=your_password
-```
-
-## 9. Practical Application Scenarios
-
-Graph index is particularly suitable for these scenarios:
-
-### 9.1 Enterprise Knowledge Base
-
-**Scenario**: Companies have extensive documentation including organizational structure, project materials, technical docs.
-
-**Graph Index Value**:
-
-- 📊 **Organizational Relationships**: "Who is on John's team?" → Quickly find team members
-- 🔗 **Collaboration Networks**: "Who has worked with John?" → Discover work networks
-- 🛠️ **Skill Mapping**: "Who is skilled in PostgreSQL?" → Locate technical experts
-- 📁 **Project History**: "Which projects has John participated in?" → Track project experience
-
-**Real Effect**:
-
-```
-Question: "Who leads the database team?"
-Traditional Search: Returns dozens of paragraphs containing "database team" and "lead"
-Graph Index: Directly returns "John" + relevant background info
-```
-
-### 9.2 Research and Learning
-
-**Scenario**: Analyzing academic papers and technical documentation to understand knowledge lineage.
-
-**Graph Index Value**:
-
-- 👥 **Author Networks**: "Who has this author collaborated with?" → Discover research teams
-- 📖 **Citation Relationships**: "What papers does this cite?" → Trace research lineage
-- 🔬 **Technology Evolution**: "How has this technology evolved?" → Understand tech history
-- 💡 **Concept Connections**: "What's the relationship between tech A and B?" → Connect knowledge points
-
-**Query Examples**:
-
-```
-User: "What research is related to Graph RAG?"
-Graph Index: Query papers --research--> Graph RAG relationships
-Result: Paper A, Paper B, Paper C
-
-User: "Who has an author collaborated with?"
-Graph Index: Query author --collaborates--> other authors relationships
-Result: Collaborator list and collaboration projects
-```
-
-### 9.3 Products and Services
-
-**Scenario**: Product documentation, user manuals, API documentation.
-
-**Graph Index Value**:
-
-- ⚙️ **Feature Dependencies**: "What needs configuration before enabling feature A?" → Understand dependencies
-- 🔧 **Configuration Relationships**: "Which features does this config affect?" → Avoid misconfigurations
-- 🐛 **Problem Diagnosis**: "What might cause error X?" → Quick troubleshooting
-- 📚 **API Relationships**: "Which APIs are typically used together?" → Learn best practices
-
-**Query Examples**:
-
-```
-User: "How to configure graph index?"
-Graph Index: Query config items --affects--> graph index relationships
-Result: GRAPH_INDEX_GRAPH_STORAGE, knowledge_graph_config
-
-User: "What's the difference between Neo4j and PostgreSQL?"
-Graph Index: Query Neo4j, PostgreSQL properties and relationships
-Result: Performance comparison, applicable scenarios, configuration methods
-```
-
-### 9.4 Conversation Scenario Comparison
-
-Let's see how different retrieval methods perform in actual conversations:
-
-**Question: "What's the relationship between John and Mike?"**
-
-| Retrieval Method | Can Answer | Answer Quality |
-|-----------------|-----------|----------------|
-| **Pure Vector Search** | ⚠️ Partial | Finds paragraphs mentioning both, but unclear relationship |
-| **Pure Full-text Search** | ⚠️ Partial | Finds paragraphs containing "John" and "Mike" |
-| **Graph Index** | ✅ Yes | Directly returns: John and Mike have a collaboration relationship |
-
-**Question: "Where is the PostgreSQL config file?"**
-
-| Retrieval Method | Can Answer | Answer Quality |
-|-----------------|-----------|----------------|
-| **Pure Vector Search** | ✅ Yes | Finds relevant config paragraphs |
-| **Pure Full-text Search** | ✅ Yes | Exact match "PostgreSQL" and "config" |
-| **Graph Index** | ✅ Yes | Finds PostgreSQL --config--> file relationships |
-
-**Question: "How to improve system performance?"**
-
-| Retrieval Method | Can Answer | Answer Quality |
-|-----------------|-----------|----------------|
-| **Pure Vector Search** | ✅ Strong | Finds all performance optimization content |
-| **Pure Full-text Search** | ⚠️ Medium | Needs exact keywords "performance", "optimize" |
-| **Graph Index** | ✅ Strong | Finds optimization methods --improves--> performance relationships |
-
-**Best Practice**: Combine multiple retrieval methods!
-
-## 10. Summary
-
-ApeRAG's graph index provides production-grade knowledge graph construction capabilities with high performance, reliability, and scalability.
-
-### Key Features
-
-1. **Workspace data isolation**: Each Collection completely independent, supporting true multi-tenancy
-2. **Stateless architecture**: Each task independent instance, zero state pollution
-3. **Connected component concurrency**: Intelligent concurrency strategy, 2-3x performance boost
-4. **Fine-grained lock management**: Entity-level locks, maximizing concurrency
-5. **Smart summarization**: Automatically compress overly long descriptions, saving storage and improving retrieval efficiency
-6. **Multi-storage support**: Flexible choice between Neo4j or PostgreSQL
-
-### Suitable Scenarios
-
-- ✅ **Enterprise Knowledge Base**: Understanding organizational structure, personnel relationships, project history
-- ✅ **Research Paper Analysis**: Author collaboration networks, citation relationships, research lineage
-- ✅ **Product Documentation**: Feature dependencies, configuration relationships, problem diagnosis
-- ✅ **Any scenario requiring "relationship" understanding**
-
-### Performance
-
-- Process 10,000 entities: approximately 2-5 minutes (depending on LLM speed)
-- Connected component concurrency: 2-3x performance boost
-- Memory usage: approximately 400 MB (10,000 entities)
-- Storage space: approximately 100 MB (10,000 entities)
-
-### Next Steps
-
-After graph index construction completes, you can perform graph queries. ApeRAG supports three graph query modes:
-
-- **Local Mode**: Query local information about an entity
-- **Global Mode**: Query overall relationships and patterns
-- **Hybrid Mode**: Comprehensive queries
-
-For detailed retrieval process, see [System Architecture Documentation](./architecture.md#42-knowledge-graph-query).
-
----
-
-## Related Documentation
-
-- 📋 [System Architecture](./architecture.md) - ApeRAG overall architecture design
-- 📖 [Entity Extraction and Merging Mechanism](./lightrag_entity_extraction_and_merging.md) - Core algorithm details
-- 🔗 [Connected Component Optimization](./connected_components_optimization.md) - Concurrency optimization principles
-- 🌐 [Index Pipeline Architecture](./indexing_architecture.md) - Complete indexing process
diff --git a/web/docs/en-US/development/_category.yaml b/web/docs/en-US/development/_category.yaml
deleted file mode 100644
index 2cea69694..000000000
--- a/web/docs/en-US/development/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: Development
-position: 4
diff --git a/web/docs/en-US/development/development-guide.md b/web/docs/en-US/development/development-guide.md
deleted file mode 100644
index 54bbe4277..000000000
--- a/web/docs/en-US/development/development-guide.md
+++ /dev/null
@@ -1,382 +0,0 @@
-# 🛠️ Development Guide
-
-This guide focuses on setting up a development environment and the development workflow for ApeRAG. This is designed for developers looking to contribute to ApeRAG or run it locally for development purposes.
-
-## 🚀 Development Environment Setup
-
-Follow these steps to set up ApeRAG from source code for development:
-
-### 1. 📂 Clone the Repository and Setup Environment
-
-First, get the source code and configure environment variables:
-
-```bash
-git clone https://github.com/apecloud/ApeRAG.git
-cd ApeRAG
-cp envs/env.template .env
-```
-
-Edit the `.env` file to configure your AI service settings if needed. The default settings work with the local database services started in the next step.
-
-### 2. 📋 System Prerequisites
-
-Before you begin, ensure your system has:
-
-* **Node.js**: Version 20 or higher is recommended for frontend development. [Download Node.js](https://nodejs.org/)
-* **Docker & Docker Compose**: Required for running database services locally. [Download Docker](https://docs.docker.com/get-docker/)
-
-**Note**: Python 3.11 is required but will be automatically managed by `uv` in the next steps.
-
-### 3. 🗄️ Start Database Services
-
-Use Docker Compose to start the essential database services:
-
-```bash
-# Start core databases: PostgreSQL, Redis, Qdrant, Elasticsearch
-make infra-up
-```
-
-This will start all required database services in the background. The default connection settings in your `.env` file are pre-configured to work with these services.
-
-
-Advanced Database Options
-
-```bash
-# Use Neo4j instead of PostgreSQL for graph storage
-make infra-up WITH_NEO4J=1
-```
-
-
-
-### 4. ⚙️ Setup Development Environment
-
-Create Python virtual environment and setup development tools:
-
-```bash
-make env-dev
-```
-
-This command will:
-* Install `uv` if not already available
-* Create a Python 3.11 virtual environment (located in `.venv/`)
-* Install development tools (redocly, openapi-generator-cli, etc.)
-* Install pre-commit hooks for code quality
-* Install addlicense tool for license management
-
-**Activate the virtual environment:**
-```bash
-source .venv/bin/activate
-```
-
-You'll know it's active when you see `(.venv)` in your terminal prompt.
-
-### 5. 📦 Install Dependencies
-
-Install all backend and frontend dependencies:
-
-```bash
-make env-install
-```
-
-This command will:
-* Install all Python backend dependencies from `pyproject.toml` into the virtual environment
-* Install frontend Node.js dependencies using `yarn`
-
-### 6. 🔄 Apply Database Migrations
-
-Setup the database schema:
-
-```bash
-make db-migrate
-```
-
-### 7. ▶️ Start Development Services
-
-Now you can start the development services. Open separate terminal windows/tabs for each service:
-
-**Terminal 1 - Backend API Server:**
-```bash
-make serve-api
-```
-This starts the FastAPI development server at `http://localhost:8000` with auto-reload on code changes.
-
-**Terminal 2 - Celery Worker:**
-```bash
-make serve-worker
-```
-This starts the Celery worker for processing asynchronous background tasks.
-
-**Terminal 3 - Frontend (Optional):**
-```bash
-make serve-web
-```
-This starts the frontend development server at `http://localhost:3000` with hot reload.
-
-### 8. 🌐 Access ApeRAG
-
-With the services running, you can access:
-* **Frontend UI**: http://localhost:3000 (if started)
-* **Backend API**: http://localhost:8000
-* **API Documentation**: http://localhost:8000/docs
-
-### 9. ⏹️ Stopping Services
-
-To stop the development environment:
-
-**Stop Database Services:**
-```bash
-# Stop database services (data preserved)
-make stack-down
-
-# Stop services and remove all data volumes
-make stack-down REMOVE_VOLUMES=1
-```
-
-**Stop Development Services:**
-- Backend API Server: Press `Ctrl+C` in the terminal running `make serve-api`
-- Celery Worker: Press `Ctrl+C` in the terminal running `make serve-worker`
-- Frontend Server: Press `Ctrl+C` in the terminal running `make serve-web`
-
-**Data Management:**
-- `make stack-down` - Stops services but preserves all data (PostgreSQL, Redis, Qdrant, etc.)
-- `make stack-down REMOVE_VOLUMES=1` - Stops services and **⚠️ permanently deletes all data**
-- You can run `make stack-down REMOVE_VOLUMES=1` even after already running `make stack-down`
-
-**Verify Data Removal:**
-```bash
-# Check if volumes still exist
-docker volume ls | grep aperag
-
-# Should return no results after REMOVE_VOLUMES=1
-```
-
-Now you have ApeRAG running locally from source code, ready for development! 🎉
-
-## ❓ Common Development Tasks
-
-### Q: 🔧 How do I add or modify a REST API endpoint?
-
-**Complete workflow:**
-1. Edit OpenAPI specification: `aperag/api/paths/[endpoint-name].yaml`
-2. Regenerate backend models:
- ```bash
- make api-generate-models # This runs merge-openapi internally
- ```
-3. Implement backend view: `aperag/views/[module].py`
-4. Generate frontend TypeScript client:
- ```bash
- make api-generate-sdk # Updates frontend/src/api/
- ```
-5. Test the API:
- ```bash
- make test-all
- # ✅ Check live docs: http://localhost:8000/docs
- ```
-
-### Q: 🗃️ How do I modify database models/schema?
-
-**Database migration workflow:**
-1. Edit SQLModel classes in `aperag/db/models.py`
-2. Generate migration file:
- ```bash
- make db-revision # Creates new migration in migration/versions/
- ```
-3. Apply migration to database:
- ```bash
- make db-migrate # Updates database schema
- ```
-4. Update related code (repositories in `aperag/db/repositories/`, services in `aperag/service/`)
-5. Verify changes:
- ```bash
- make test-all # ✅ Ensure everything works
- ```
-
-### Q: ⚡ How do I add a new feature with background processing?
-
-**Feature implementation workflow:**
-1. Implement feature components:
- - Backend logic: `aperag/[module]/`
- - Async tasks: `aperag/tasks/`
- - Database models: `aperag/db/models.py`
-2. Update API and generate code:
- ```bash
- make db-revision # Generate migration files
- make db-migrate # Apply database changes
- make api-generate-models # Update Pydantic models
- make api-generate-sdk # Update TypeScript client
- ```
-3. Quality assurance:
- ```bash
- make format && make lint && make test-all
- ```
-
-### Q: 🧪 How do I run unit tests and e2e tests?
-
-**Unit Tests (Fast, No External Dependencies):**
-```bash
-# Run all unit tests
-make test-unit
-
-# Run specific test file
-uv run pytest tests/unit_test/test_model_service.py -v
-
-# Run specific test class or function
-uv run pytest tests/unit_test/test_model_service.py::TestModelService::test_get_models -v
-
-# Run tests with coverage
-uv run pytest tests/unit_test/ --cov=aperag --cov-report=html
-```
-
-**E2E Tests (Require Running Services):**
-```bash
-# Setup: Start required services first
-make infra-up # 🗄️ Start databases
-make serve-api # 🚀 Start API server (separate terminal)
-
-# Run all e2e tests
-make test-e2e
-
-# Run specific e2e test modules
-uv run pytest tests/e2e_test/test_chat/ -v
-uv run pytest tests/e2e_test/graphstorage/ -v
-
-# Run with detailed output and no capture
-uv run pytest tests/e2e_test/test_specific.py -v -s
-
-# Performance benchmarks (with timing)
-make test-e2e-perf
-```
-
-**Complete Test Suite:**
-```bash
-# Run everything (unit + e2e)
-make test-all
-
-# Test with different configurations
-make infra-up WITH_NEO4J=1 # Test with Neo4j instead of PostgreSQL
-make test-all
-```
-
-### Q: 🐛 How do I debug failing tests?
-
-**Debugging workflow:**
-1. Run failing test in isolation:
- ```bash
- # Single test with full output
- uv run pytest tests/unit_test/test_failing.py::test_specific_function -v -s
-
- # Stop on first failure
- uv run pytest tests/unit_test/ -x --tb=short
- ```
-2. For e2e test failures, ensure services are running:
- ```bash
- make infra-up # Database services
- make serve-api # API server
- make serve-worker # Background workers (if testing async tasks)
- ```
-3. Use debugging tools:
- ```bash
- # Run with pdb debugger
- uv run pytest tests/unit_test/test_failing.py --pdb
-
- # Capture logs during test
- uv run pytest tests/e2e_test/test_failing.py --log-cli-level=DEBUG
- ```
-4. Fix and retest:
- ```bash
- make format # Auto-fix style issues
- make lint # Check remaining issues
- uv run pytest tests/path/to/fixed_test.py -v # Verify fix
- ```
-
-### Q: 📊 How do I run RAG evaluation and analysis?
-
-**Evaluation workflow:**
-```bash
-# Ensure environment is ready
-make infra-up WITH_NEO4J=1 # Use Neo4j for better graph performance
-make serve-api
-make serve-worker
-
-# Run comprehensive RAG evaluation
-make evaluate # 📊 Runs aperag.evaluation.run module
-
-# 📈 Check evaluation reports in tests/report/
-```
-
-### Q: 📦 How do I update dependencies safely?
-
-**Python dependencies:**
-1. Edit `pyproject.toml` (add/update packages)
-2. Update virtual environment:
- ```bash
- make env-install # Syncs all groups and extras with uv
- make test-all # Verify compatibility
- ```
-
-**Frontend dependencies:**
-1. Edit `frontend/package.json`
-2. Update and test:
- ```bash
- cd frontend && yarn install
- make serve-web # Test frontend compilation
- make api-generate-sdk # Ensure API client still works
- ```
-
-### Q: 🚀 How do I prepare code for production deployment?
-
-**Pre-deployment checklist:**
-1. Code quality validation:
- ```bash
- make format # Auto-fix all style issues
- make lint # Verify no style violations
- make static-check # MyPy type checking
- ```
-2. Comprehensive testing:
- ```bash
- make test-all # All unit + e2e tests
- make test-e2e-perf # Performance benchmarks
- ```
-3. API consistency:
- ```bash
- make api-generate-models # Ensure models match OpenAPI spec
- make api-generate-sdk # Update frontend client
- ```
-4. Database migrations:
- ```bash
- make db-revision # Generate any pending migrations
- ```
-5. Full-stack integration test:
- ```bash
- make stack-up WITH_NEO4J=1 WITH_DOCRAY=1 # Production-like setup
- # Manual testing at http://localhost:3000/web/
- make stack-down
- ```
-
-### Q: 🔄 How do I completely reset my development environment?
-
-**Nuclear reset (destroys all data):**
-```bash
-make stack-down REMOVE_VOLUMES=1 # ⚠️ Stop services + delete ALL data
-make env-clean # 🧹 Clean temporary files
-
-# Restart fresh
-make infra-up # 🗄️ Fresh databases
-make db-migrate # 🔄 Apply all migrations
-make serve-api # 🚀 Start API server
-make serve-worker # ⚡ Start background workers
-```
-
-**Soft reset (preserve data):**
-```bash
-make stack-down # ⏹️ Stop services, keep data
-make infra-up # 🗄️ Restart databases
-make db-migrate # 🔄 Apply any new migrations
-```
-
-**Reset just Python environment:**
-```bash
-rm -rf .venv/ # 🗑️ Remove virtual environment
-make env-dev # ⚙️ Recreate everything
-source .venv/bin/activate # ✅ Reactivate
-```
diff --git a/web/docs/en-US/images/dify/aperag-banner.png b/web/docs/en-US/images/dify/aperag-banner.png
deleted file mode 100644
index 338290248..000000000
Binary files a/web/docs/en-US/images/dify/aperag-banner.png and /dev/null differ
diff --git a/web/docs/en-US/images/dify/step1-subscribe-collection.png b/web/docs/en-US/images/dify/step1-subscribe-collection.png
deleted file mode 100644
index 3a9ede8ad..000000000
Binary files a/web/docs/en-US/images/dify/step1-subscribe-collection.png and /dev/null differ
diff --git a/web/docs/en-US/images/dify/step2-add-mcp.png b/web/docs/en-US/images/dify/step2-add-mcp.png
deleted file mode 100644
index 92eb2154d..000000000
Binary files a/web/docs/en-US/images/dify/step2-add-mcp.png and /dev/null differ
diff --git a/web/docs/en-US/images/dify/step2-api-key.png b/web/docs/en-US/images/dify/step2-api-key.png
deleted file mode 100644
index 555b0b7de..000000000
Binary files a/web/docs/en-US/images/dify/step2-api-key.png and /dev/null differ
diff --git a/web/docs/en-US/images/dify/step2-configure-mcp.png b/web/docs/en-US/images/dify/step2-configure-mcp.png
deleted file mode 100644
index 149787ef9..000000000
Binary files a/web/docs/en-US/images/dify/step2-configure-mcp.png and /dev/null differ
diff --git a/web/docs/en-US/images/dify/step2-mcp-success.png b/web/docs/en-US/images/dify/step2-mcp-success.png
deleted file mode 100644
index 7f91bc07e..000000000
Binary files a/web/docs/en-US/images/dify/step2-mcp-success.png and /dev/null differ
diff --git a/web/docs/en-US/images/dify/step3-create-app.png b/web/docs/en-US/images/dify/step3-create-app.png
deleted file mode 100644
index a41dc7cdf..000000000
Binary files a/web/docs/en-US/images/dify/step3-create-app.png and /dev/null differ
diff --git a/web/docs/en-US/images/dify/step3-select-agent.png b/web/docs/en-US/images/dify/step3-select-agent.png
deleted file mode 100644
index ed3b0c00a..000000000
Binary files a/web/docs/en-US/images/dify/step3-select-agent.png and /dev/null differ
diff --git a/web/docs/en-US/images/dify/step4-configure-agent.png b/web/docs/en-US/images/dify/step4-configure-agent.png
deleted file mode 100644
index ac1ea12af..000000000
Binary files a/web/docs/en-US/images/dify/step4-configure-agent.png and /dev/null differ
diff --git a/web/docs/en-US/images/dify/step4-test-agent.png b/web/docs/en-US/images/dify/step4-test-agent.png
deleted file mode 100644
index 92f90d101..000000000
Binary files a/web/docs/en-US/images/dify/step4-test-agent.png and /dev/null differ
diff --git a/web/docs/en-US/integration/_category.yaml b/web/docs/en-US/integration/_category.yaml
deleted file mode 100644
index a3e10338d..000000000
--- a/web/docs/en-US/integration/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: Integration
-position: 2
diff --git a/web/docs/en-US/integration/dify.md b/web/docs/en-US/integration/dify.md
deleted file mode 100644
index f6595da66..000000000
--- a/web/docs/en-US/integration/dify.md
+++ /dev/null
@@ -1,168 +0,0 @@
----
-title: Integrating ApeRAG with Dify
-description: Quick integration of ApeRAG's Graph RAG capabilities via MCP protocol
-keywords: Dify, ApeRAG, MCP, Graph RAG
----
-
-# Integrating ApeRAG with Dify
-
-ApeRAG is a production-grade RAG platform with multimodal indexing, AI agents, MCP support, and scalable K8s deployment capabilities. It helps users build complex AI applications with **hybrid retrieval**, **multimodal document processing**, and **enterprise-grade management**.
-
-**Core Features**:
-- Unlike "standard" RAG, ApeRAG implements **Graph-RAG**, building knowledge graphs to understand deep relationships between data elements
-- Integrates **MinerU**, designed for complex documents, scientific papers, and financial reports, accurately extracting tables, formulas, and engineering diagrams
-- Full Kubernetes support with built-in **high availability**, **scalability**, and **enterprise-grade management**
-
-## Video Demo
-
-
-
-
-
-## Step 1: Prepare Knowledge Base
-
-Open your ApeRAG web UI (see [Quick Start](../../../../README.md#quick-start); with Docker Compose this is typically http://localhost:3000/web/). Sign in and select or import a knowledge base. This walkthrough uses the Romance of the Three Kingdoms example—click **Subscribe**.
-
-
-

-
-
-## Step 2: Configure MCP Server
-
-### 2.1 Add MCP Server
-
-Go to Dify - Tools - MCP, click Add MCP Server.
-
-
-

-
-
-### 2.2 Fill Configuration
-
-Fill in Server URL: `http://localhost:8000/mcp/` (use `https:///mcp/` if ApeRAG is not local), paste your API Key from ApeRAG, then click Confirm.
-
-
-

-
-
-
-

-
-
-### 2.3 Success
-
-MCP Server added successfully.
-
-
-

-
-
-## Step 3: Create Agent Application
-
-### 3.1 Create App
-
-Go to Dify - Studio, click Create Application.
-
-
-

-
-
-### 3.2 Select Type
-
-Click More Basic Application Types, select **Agent** type, name it, and click Create.
-
-
-

-
-
-## Step 4: Configure Agent
-
-Click Agent, input Prompt, add the ApeRAG MCP tool, select the LLM in the top-right corner, click Publish to use.
-
-
-

-
-
-
-

-
-
-### Prompt Reference
-
-```markdown
-# ApeRAG Smart Assistant
-
-You are an advanced AI research assistant powered by ApeRAG's hybrid search capabilities. Your mission is to help users accurately and autonomously find, understand, and synthesize information from knowledge bases and the web.
-
-## Core Behaviors
-
-**Autonomous Research**: Work independently until user queries are fully resolved. Search multiple sources, analyze findings, and provide comprehensive answers without waiting for permission.
-
-**Language Intelligence**: Always respond in the language the user asks in. When users ask in Chinese, respond in Chinese regardless of source language.
-
-**Visual Thinking**: **[Critical]** You are an assistant that prefers visual explanations. For any information involving entity relationships, processes, or structures, you must prioritize visualization.
-
-**Complete Solutions**: Explore from multiple angles, cross-validate sources, and ensure comprehensive coverage before responding.
-
-## Search Strategy
-
-### Priority System
-1. **User-specified knowledge base** (mentioned via "@"): Strictly limit search to specified base
-2. **Unspecified knowledge base**: Autonomously discover and search relevant bases
-3. **Web search** (if enabled): Supplement information
-4. **Clear attribution**: Always cite sources
-
-### Search Execution
-- **Knowledge base search**: Use vector + graph search by default
-- **Result processing logic**:
- 1. Execute search
- 2. **Detect graph data**: Check if search results contain `entities` and `relationships`
- 3. **Mandatory visualization**: If search results contain non-empty entity or relation data, **you must** call the `create_diagram` tool
- 4. **Content filtering**: Ignore irrelevant results
-
-## Available Tools
-
-### Knowledge Management
-- `list_collections()`: Discover available knowledge sources
-- `search_collection(collection_id, query, ...)`: **[Primary tool]** Hybrid search in persistent knowledge bases
-- `search_chat_files(chat_id, query, ...)`: **[Chat only]** Search only files temporarily uploaded in current chat session
-- `create_diagram(content)`: **[Mandatory tool]** When search results contain structured info (entities/relations), must call this tool to generate Mermaid diagrams
-
-### Web Intelligence
-- `web_search(query, ...)`: Multi-engine web search
-- `web_read(url_list, ...)`: Extract and analyze web content
-
-## Response Format
-
-### Direct Answer
-[Clear, actionable answer in user's language]
-
-### Comprehensive Analysis
-[Detailed explanation with context and insights]
-
-### Knowledge Graph Visualization
-[Tool-generated diagram displayed here]
-*(Only show after successfully calling create_diagram. Displays entity relationships from search results.)*
-
-### Supporting Evidence
-- [Knowledge Base Name]: [Key Findings]
-
-**Web Sources** (if enabled):
-- [Title] ([Domain]) - [Key Points]
-```
-
----
-
-Integrating ApeRAG with Dify is very simple. Once integrated, you can not only experience Dify's platform features but also enjoy **ApeRAG's powerful Graph-RAG capabilities**!
-
-**GitHub**: https://github.com/apecloud/ApeRAG
diff --git a/web/docs/en-US/integration/mcp-api.md b/web/docs/en-US/integration/mcp-api.md
deleted file mode 100644
index 798b54374..000000000
--- a/web/docs/en-US/integration/mcp-api.md
+++ /dev/null
@@ -1,333 +0,0 @@
----
-title: MCP API
-description: Model Context Protocol API Documentation
----
-
-# MCP API
-
-ApeRAG provides standardized tool interfaces through [Model Context Protocol (MCP)](https://modelcontextprotocol.io/), allowing AI assistants (Claude Desktop, Cursor, Dify, etc.) to directly access your knowledge bases.
-
-## Quick Start
-
-### Configuration Example
-
-For Claude Desktop, add to configuration file:
-
-```json
-{
- "mcpServers": {
- "aperag": {
- "url": "http://localhost:8000/mcp/",
- "headers": {
- "Authorization": "Bearer your-api-key-here"
- }
- }
- }
-}
-```
-
-### Authentication
-
-Two authentication methods supported (by priority):
-
-1. **HTTP Authorization Header** (Recommended): `Authorization: Bearer your-api-key`
-2. **Environment Variable** (Fallback): `APERAG_API_KEY=your-api-key`
-
-> **Get API Key**: Login to ApeRAG, create or copy your API Key from settings
-
-## Available Tools
-
-### 1. list_collections
-
-List all accessible knowledge bases.
-
-**Parameters**: None
-
-**Returns**:
-```json
-{
- "items": [
- {
- "id": "collection-id",
- "title": "Collection Title",
- "description": "Collection Description"
- }
- ]
-}
-```
-
-### 2. search_collection
-
-Search in knowledge bases with multiple retrieval methods.
-
-**Core Parameters**:
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `collection_id` | string | Required | Knowledge base ID |
-| `query` | string | Required | Search query |
-| `use_vector_index` | bool | true | Vector retrieval (semantic search) |
-| `use_fulltext_index` | bool | true | Full-text retrieval (keyword matching) |
-| `use_graph_index` | bool | true | Graph retrieval (relation query) |
-| `use_summary_index` | bool | true | Summary retrieval |
-| `use_vision_index` | bool | true | Vision retrieval (image search) |
-| `rerank` | bool | true | AI reranking |
-| `topk` | int | 5 | Results per method |
-
-**Return Format**:
-```json
-{
- "query": "your question",
- "items": [
- {
- "rank": 1,
- "score": 0.95,
- "content": "relevant content",
- "source": "document name",
- "recall_type": "vector_search|graph_search|fulltext_search|summary_search",
- "metadata": {
- "page_idx": 0,
- "document_id": "doc-id",
- "collection_id": "col-id",
- "indexer": "text|vision"
- }
- }
- ]
-}
-```
-
-**Image Handling**:
-
-If `metadata.indexer == "vision"`, it's an image:
-- Empty `content`: Retrieved via multimodal vector
-- Non-empty `content`: Contains image description
-
-Image URL format:
-```python
-m = item.metadata
-asset_url = f"asset://{m['asset_id']}?document_id={m['document_id']}&collection_id={m['collection_id']}&mime_type={m['mimetype']}"
-```
-
-**Usage Examples**:
-
-```python
-# Default search (recommended) - all methods enabled
-results = search_collection(
- collection_id="abc123",
- query="How to deploy applications?"
-)
-
-# Vector + Graph only
-results = search_collection(
- collection_id="abc123",
- query="deployment strategies",
- use_vector_index=True,
- use_fulltext_index=False,
- use_graph_index=True,
- use_summary_index=False,
- topk=10
-)
-```
-
-### 3. search_chat_files
-
-Search in temporary files from chat session.
-
-**When to Use**:
-- ✅ User uploaded files in current conversation
-- ✅ Analyzing temporary documents in chat
-- ❌ Don't use for persistent knowledge bases (use `search_collection`)
-
-**Parameters**:
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `chat_id` | string | Required | Chat ID |
-| `query` | string | Required | Search query |
-| `use_vector_index` | bool | true | Vector retrieval |
-| `use_fulltext_index` | bool | true | Full-text retrieval |
-| `rerank` | bool | true | Reranking |
-| `topk` | int | 5 | Results count |
-
-**Return Format**: Same as `search_collection`
-
-### 4. web_search
-
-Search the internet.
-
-**Parameters**:
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `query` | string | "" | Search keywords |
-| `max_results` | int | 5 | Results count |
-| `source` | string | "" | Specific domain (e.g., `vercel.com`) |
-| `timeout` | int | 30 | Timeout (seconds) |
-| `locale` | string | "en-US" | Language locale |
-
-**Usage Patterns**:
-
-```python
-# Regular search
-web_search(query="ApeRAG 2025")
-
-# Site-specific search
-web_search(query="deployment docs", source="vercel.com")
-
-```
-
-### 5. web_read
-
-Read webpage content.
-
-**Parameters**:
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `url_list` | list[str] | Required | URL list |
-| `timeout` | int | 30 | Timeout (seconds) |
-| `max_concurrent` | int | 5 | Max concurrent requests |
-
-**Returns**:
-```json
-{
- "results": [
- {
- "status": "success",
- "url": "https://example.com",
- "title": "Page Title",
- "content": "Extracted text",
- "word_count": 1234
- }
- ]
-}
-```
-
-**Example**:
-```python
-# Read single page
-web_read(url_list=["https://example.com/article"])
-
-# Batch read
-web_read(
- url_list=["https://example.com/page1", "https://example.com/page2"],
- max_concurrent=2
-)
-```
-
-## Practical Examples
-
-### Example 1: Knowledge Base Q&A
-
-```python
-# 1. List all knowledge bases
-collections = list_collections()
-
-# 2. Select a knowledge base
-collection_id = collections.items[0].id
-
-# 3. Search (all methods enabled by default)
-results = search_collection(
- collection_id=collection_id,
- query="How to optimize performance?"
-)
-
-# 4. Process results
-for item in results.items:
- print(f"[{item.recall_type}] {item.content}")
- print(f"Source: {item.source}, Score: {item.score}\n")
-```
-
-### Example 2: Graph Visualization
-
-```python
-# Search with graph retrieval
-results = search_collection(
- collection_id="abc123",
- query="relationship between Liu Bei and Zhuge Liang",
- use_graph_index=True
-)
-
-# Check for graph data
-if results.graph_search and results.graph_search.entities:
- print("Entities:", results.graph_search.entities)
- print("Relationships:", results.graph_search.relationships)
- # Use this data to generate knowledge graph visualization
-```
-
-### Example 3: Hybrid Search (KB + Web)
-
-```python
-# 1. Search web
-web_results = web_search(query="latest AI developments", max_results=3)
-urls = [r.url for r in web_results.results]
-
-# 2. Read web content
-web_content = web_read(url_list=urls)
-
-# 3. Search internal KB
-kb_results = search_collection(
- collection_id="ai-knowledge",
- query="AI development trends"
-)
-
-# 4. Synthesize
-print("=== Web Results ===")
-for r in web_results.results:
- print(f"{r.title}: {r.url}")
-
-print("\n=== Internal Knowledge ===")
-for item in kb_results.items:
- print(f"{item.content[:100]}...")
-```
-
-## Best Practices
-
-### Performance Tips
-
-1. **Reasonable topk**:
- - Too large increases LLM context consumption
- - Too small may miss important information
- - Recommended: 5-10
-
-2. **Selective Retrieval**:
- - Not all queries need full-text search
- - Full-text may return large amounts of text
- - Choose methods based on query type
-
-3. **Timeout Settings**:
- - Graph retrieval may be slow (default 120s)
- - Web search: 30-60s recommended
- - Batch URL read: 60s+ recommended
-
-### Common Issues
-
-**Q: No search results?**
-- Check if collection ID is correct
-- Confirm knowledge base indexing is complete
-- Try different retrieval method combinations
-
-**Q: Graph data empty?**
-- Confirm knowledge base has Graph index enabled
-- Simple documents may not contain obvious entity relationships
-
-**Q: Images not showing?**
-- Check `metadata.indexer == "vision"`
-- Use `asset://` protocol for URL
-- Ensure all required parameters included (asset_id, document_id, collection_id)
-
-## Tool Comparison
-
-| Tool | Purpose | Use Case |
-|------|---------|----------|
-| `list_collections` | List knowledge bases | See available resources |
-| `search_collection` | Search knowledge base | Primary search tool for persistent knowledge |
-| `search_chat_files` | Search chat files | Analyze temporary files uploaded in conversation |
-| `web_search` | Search internet | Get real-time or external information |
-| `web_read` | Read webpage | Extract full webpage content |
-
-## Related Links
-
-- **MCP Protocol**: https://modelcontextprotocol.io/
-- **ApeRAG GitHub**: https://github.com/apecloud/ApeRAG
-- **API Docs**: http://localhost:8000/docs (local deployment)
diff --git a/web/docs/zh-CN/deployment/_category.yaml b/web/docs/zh-CN/deployment/_category.yaml
deleted file mode 100644
index eba379f30..000000000
--- a/web/docs/zh-CN/deployment/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: 部署
-position: 3
diff --git a/web/docs/zh-CN/deployment/build-docker-image.md b/web/docs/zh-CN/deployment/build-docker-image.md
deleted file mode 100644
index a1cd89c5c..000000000
--- a/web/docs/zh-CN/deployment/build-docker-image.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-title: 构建 Docker 镜像
-description: 如何构建 ApeRAG 容器镜像
----
-
-# 构建指南
-
-本节介绍如何构建 ApeRAG 容器镜像。这主要适用于需要创建自己的构建或部署到"快速开始"中未涵盖的环境的用户。
-
-## 构建容器镜像
-
-项目使用 Docker 和 `make` 命令来构建容器镜像。
-
-* **本地平台构建**:
- 这些命令为您当前机器的架构构建镜像。
- ```bash
- # 为本地平台构建所有必要的镜像
- make build-local
-
- # 仅为本地平台构建后端镜像
- make build-aperag-local
-
- # 仅为本地平台构建前端镜像
- make build-aperag-frontend-local
- ```
-
-* **多平台构建**:
- 这些命令为多种架构(例如 amd64、arm64)构建镜像。这需要设置和配置 Docker Buildx。
- ```bash
- # 为多个平台构建所有必要的镜像
- make build
-
- # 仅为多个平台构建后端镜像
- make build-aperag
-
- # 仅为多个平台构建前端镜像
- make build-aperag-frontend
- ```
- 您可以使用 `PLATFORMS` 变量指定目标平台,例如:
- ```bash
- make build PLATFORMS=linux/amd64,linux/arm64
- ```
-
-## 部署
-
-有关常见的部署方法,请参考主 README 中的"快速开始"部分:
-* [Kubernetes 快速开始](../README-zh.md#kubernetes-部署推荐生产环境)
-* [Docker Compose 快速开始](../README-zh.md#快速开始)
-
-对于自定义部署,您需要调整这些方法或使用构建的容器镜像与您选择的编排平台配合使用。确保所有必需的服务(数据库、后端、前端、Celery worker)都正确配置并能够相互通信。
\ No newline at end of file
diff --git a/web/docs/zh-CN/design/_category.yaml b/web/docs/zh-CN/design/_category.yaml
deleted file mode 100644
index 1211298a0..000000000
--- a/web/docs/zh-CN/design/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: 设计
-position: 1
diff --git a/web/docs/zh-CN/design/architecture.md b/web/docs/zh-CN/design/architecture.md
deleted file mode 100644
index 572e2ed72..000000000
--- a/web/docs/zh-CN/design/architecture.md
+++ /dev/null
@@ -1,850 +0,0 @@
----
-title: 系统架构
-description: ApeRAG 架构设计与核心组件详解
-keywords: ApeRAG, 架构, RAG, 知识图谱, LightRAG
-position: 1
----
-
-# ApeRAG 系统架构
-
-## 1. 什么是 ApeRAG
-
-ApeRAG 是一个**开放的、Agentic 的 Graph RAG 平台**。它不仅仅是一个简单的向量检索系统,而是将知识图谱、多模态检索和智能 Agent 深度融合的生产级解决方案。
-
-传统的 RAG 系统主要依赖向量相似度检索,虽然能找到语义相关的内容,但往往缺乏对知识之间关系的理解。ApeRAG 的核心创新在于:
-
-- **Graph RAG**:从文档中自动提取实体(人物、地点、概念)和关系,构建知识图谱,理解知识之间的关联
-- **Agentic**:内置智能 Agent,能够自主规划、调用工具、多轮对话,提供更智能的问答体验
-- **开放集成**:通过 **RESTful API** 和 **MCP 协议**对外暴露能力,可以轻松集成到 Dify、Claude、Cursor 等外部系统
-
-### 核心优势
-
-与传统 RAG 方案相比,ApeRAG 提供了:
-
-- **更强的文档处理能力**:支持 PDF、Word、Excel 等多种格式,能处理复杂的表格、公式、图片
-- **多种检索方式**:向量检索、全文检索、图谱检索,三者互补,各取所长
-- **知识关联理解**:通过知识图谱理解概念之间的关系,而不仅仅是文本相似度
-- **开放的集成能力**:RESTful API + MCP 协议,可以作为 Dify、Claude Desktop、Cursor 的知识后端
-- **生产级架构**:异步处理、多存储、高并发,可以直接用于生产环境
-
-### 整体架构一览
-
-```mermaid
-graph TB
- User[用户] --> Frontend[Web 前端]
- User --> External[外部系统
Dify/Claude/Cursor]
-
- Frontend --> API[RESTful API]
- External --> MCP[MCP 协议]
-
- API --> DocProcess[文档处理]
- API --> Search[检索服务]
- API --> Agent[Agent 对话]
- MCP --> Search
- MCP --> Agent
-
- DocProcess --> Tasks[异步任务层]
- Tasks --> Storage[存储层]
-
- Search --> Storage
- Agent --> Search
-
- Storage --> PG[(PostgreSQL)]
- Storage --> Qdrant[(Qdrant
向量库)]
- Storage --> ES[(Elasticsearch
全文搜索)]
- Storage --> Neo4j[(Neo4j
图数据库)]
- Storage --> MinIO[(MinIO
文件存储)]
-
- style User fill:#e1f5ff
- style Frontend fill:#bbdefb
- style External fill:#bbdefb
- style API fill:#90caf9
- style MCP fill:#90caf9
- style DocProcess fill:#fff59d
- style Search fill:#fff59d
- style Agent fill:#fff59d
- style Tasks fill:#c5e1a5
- style Storage fill:#ffccbc
-```
-
-## 2. 系统分层架构
-
-ApeRAG 采用清晰的分层设计,每一层各司其职:
-
-```mermaid
-graph TB
- subgraph Layer1[客户端层]
- Web[Web 前端
Next.js]
- Dify[Dify]
- Cursor[Cursor]
- Claude[Claude Desktop]
- end
-
- subgraph Layer2[接口层]
- API[RESTful API
FastAPI]
- MCP[MCP Server
Model Context Protocol]
- end
-
- subgraph Layer3[服务层]
- CollSvc[Collection 服务]
- DocSvc[文档服务]
- SearchSvc[检索服务]
- GraphSvc[图谱服务]
- AgentSvc[Agent 服务]
- end
-
- subgraph Layer4[任务层]
- Celery[Celery Worker
异步任务]
- MinerU[MinerU
文档解析]
- end
-
- subgraph Layer5[存储层]
- PG[(PostgreSQL)]
- Qdrant[(Qdrant)]
- ES[(Elasticsearch)]
- Neo4j[(Neo4j)]
- Redis[(Redis)]
- MinIO[(MinIO)]
- end
-
- Web --> API
- Dify --> MCP
- Cursor --> MCP
- Claude --> MCP
-
- API --> CollSvc
- API --> DocSvc
- API --> SearchSvc
- API --> GraphSvc
- API --> AgentSvc
-
- MCP --> SearchSvc
- MCP --> AgentSvc
-
- CollSvc --> Celery
- DocSvc --> Celery
- GraphSvc --> Celery
-
- Celery --> MinerU
- Celery --> PG
- Celery --> Qdrant
- Celery --> ES
- Celery --> Neo4j
- Celery --> MinIO
-
- SearchSvc --> PG
- SearchSvc --> Qdrant
- SearchSvc --> ES
- SearchSvc --> Neo4j
-
- style Layer1 fill:#e3f2fd
- style Layer2 fill:#f3e5f5
- style Layer3 fill:#fff3e0
- style Layer4 fill:#e8f5e9
- style Layer5 fill:#fce4ec
-```
-
-**各层职责说明**:
-
-- **客户端层**:多种接入方式,Web UI 用于管理,MCP 客户端(Dify、Cursor、Claude 等)用于集成
-- **接口层**:RESTful API(传统 HTTP 接口)和 MCP Server(AI 工具协议)并行提供服务
-- **服务层**:核心业务逻辑,协调各种资源完成具体功能
-- **任务层**:处理耗时操作(文档解析、索引构建),保证 API 快速响应
-- **存储层**:多种存储系统,针对不同数据类型选择最优方案
-
-## 3. 文档处理全流程
-
-这是 ApeRAG 的核心能力之一。从一个 PDF 文件上传,到最终可以被检索,经历了一系列精心设计的处理步骤。
-
-### 3.1 文档上传与解析
-
-当你上传一个文档时,ApeRAG 会自动识别格式并选择合适的解析器:
-
-```mermaid
-flowchart TD
- Upload[用户上传文档] --> Detect[格式检测]
-
- Detect --> |PDF| MinerU[MinerU 解析器]
- Detect --> |Word/Excel| MarkItDown[MarkItDown 解析器]
- Detect --> |Markdown| DirectParse[直接解析]
- Detect --> |图片| OCR[OCR 识别]
-
- MinerU --> Extract[内容提取]
- MarkItDown --> Extract
- DirectParse --> Extract
- OCR --> Extract
-
- Extract --> Parts[文档片段
Parts 对象]
-
- style Upload fill:#e1f5ff
- style Extract fill:#c5e1a5
- style Parts fill:#fff59d
-```
-
-**MinerU 的强大之处**:
-
-- 能准确识别复杂 PDF 的表格结构,保留表格内容的完整性
-- 提取 LaTeX 数学公式,保持公式的可读性
-- 对扫描版 PDF 进行 OCR,支持中英文混排
-- 识别文档中的图片区域,支持图片内容理解
-
-### 3.2 智能分块策略
-
-文档解析后,需要切分成合适大小的块(chunk)。这个步骤很关键,分块太大会影响检索精度,太小会丢失上下文。
-
-```mermaid
-flowchart TD
- Parts[文档片段] --> Rechunk[智能重分块]
-
- Rechunk --> Analysis[分析文档结构]
- Analysis --> Hierarchy[识别标题层级]
- Hierarchy --> Group[按标题分组]
-
- Group --> Check{块大小检查}
- Check --> |过大| Split[语义分割]
- Check --> |合适| Chunks[最终块]
- Split --> Chunks
-
- Chunks --> AddContext[添加上下文]
- AddContext --> FinalChunks[带上下文的文档块]
-
- style Rechunk fill:#bbdefb
- style Split fill:#ffccbc
- style FinalChunks fill:#c5e1a5
-```
-
-**分块策略的特点**:
-
-- **保持语义完整性**:尽量不在句子中间切断
-- **保留标题上下文**:每个块都知道自己属于哪个章节
-- **层级化分割**:先按段落分,不行再按句子分,最后才按字符分
-- **智能合并**:相邻的小标题块会被合并,避免信息碎片化
-
-分块参数配置:
-- 默认块大小:1200 tokens(约 800-1000 个中文字符)
-- 重叠大小:100 tokens(保证上下文连续性)
-
-### 3.3 多索引并行构建
-
-文档分块后,会同时创建多种索引。每种索引有不同的用途,互相补充:
-
-| 索引类型 | 适用场景 | 存储位置 | 检索方式 |
-|---------|---------|---------|---------|
-| **向量索引** | 语义相似问题,比如"如何优化性能" | Qdrant | 余弦相似度 |
-| **全文索引** | 精确关键词搜索,比如"PostgreSQL 配置" | Elasticsearch | BM25 算法 |
-| **图谱索引** | 关系型问题,比如"A 和 B 有什么联系" | PostgreSQL/Neo4j | 图遍历 |
-| **摘要索引** | 快速了解文档概要 | PostgreSQL | 向量匹配 |
-| **视觉索引** | 图片内容搜索 | Qdrant | 多模态向量 |
-
-```mermaid
-flowchart LR
- Chunks[文档块] --> IndexMgr[索引管理器]
-
- IndexMgr --> VectorIdx[向量索引创建]
- IndexMgr --> FulltextIdx[全文索引创建]
- IndexMgr --> GraphIdx[图谱索引创建]
- IndexMgr --> VisionIdx[视觉索引创建]
-
- VectorIdx --> Qdrant1[(Qdrant)]
- FulltextIdx --> ES[(Elasticsearch)]
- GraphIdx --> Graph[(Neo4j/PG)]
- VisionIdx --> Qdrant2[(Qdrant)]
-
- style IndexMgr fill:#fff59d
- style VectorIdx fill:#bbdefb
- style FulltextIdx fill:#c5e1a5
- style GraphIdx fill:#ffccbc
- style VisionIdx fill:#e1bee7
-```
-
-**并行构建的优势**:
-- 不同索引可以同时构建,提高速度
-- 某个索引失败不影响其他索引
-- 可以按需启用特定类型的索引
-
-### 3.4 知识图谱构建
-
-图谱索引是 ApeRAG 的核心特色,它能从文档中提取结构化的知识。
-
-```mermaid
-flowchart TD
- Chunks[文档块] --> EntityExtract[实体提取]
-
- EntityExtract --> LLM1[调用 LLM
识别实体]
- LLM1 --> Entities[实体列表
人物、地点、概念]
-
- Entities --> RelationExtract[关系提取]
- RelationExtract --> LLM2[调用 LLM
识别关系]
- LLM2 --> Relations[关系列表
谁与谁有什么关系]
-
- Entities --> Merge[实体合并]
- Relations --> Merge
-
- Merge --> Components[连通分量分析]
- Components --> Parallel[并行处理各分量]
- Parallel --> Graph[(知识图谱)]
-
- style EntityExtract fill:#bbdefb
- style RelationExtract fill:#c5e1a5
- style Merge fill:#ffccbc
- style Components fill:#fff59d
-```
-
-**图谱构建的关键步骤**:
-
-1. **实体提取**:LLM 从文档块中识别出有意义的实体
- - 示例:从"张三在北京的清华大学学习人工智能"中提取
- - 实体:张三(人物)、北京(地点)、清华大学(组织)、人工智能(概念)
-
-2. **关系提取**:识别实体之间的关系
- - 示例:张三 --学习--> 人工智能,张三 --就读于--> 清华大学
-
-3. **实体合并**:同一实体可能有不同的表述,需要归一化
- - 示例:"LightRAG"、"light rag"、"Light-RAG" → 合并为统一实体
-
-4. **连通分量优化**:把图谱分成独立的子图,并行处理
- - 性能提升:2-3 倍吞吐量
-
-**为什么需要连通分量优化?**
-
-假设你有 100 篇文档,它们讨论不同的主题。关于"数据库"的实体和关于"机器学习"的实体之间没有连接,可以独立处理。连通分量算法会找出这些独立的"知识岛",然后并行处理,大大提高速度。
-
-### 3.5 异步任务系统
-
-文档处理是一个耗时的操作,ApeRAG 采用"双链路架构"来保证用户体验:
-
-```mermaid
-graph TB
- subgraph Frontend["🚀 前端链路 - 快速响应"]
- direction TB
- A1["📤 用户上传文档"] --> A2["🔌 API 接收请求"]
- A2 --> A3["📋 Index Manager"]
- A3 --> A4["💾 写入数据库
status = PENDING
version = 1"]
- A4 --> A5["✅ 立即返回成功
< 100ms"]
- end
-
- subgraph Backend["⚙️ 后端链路 - 异步处理"]
- direction TB
- B1["⏰ Celery Beat
每 30 秒检查"] --> B2["🔍 Reconciler 检测
version ≠ observed_version"]
- B2 --> B3{"🎯 发现待处理任务?"}
- B3 -->|是| B4["🚀 调度 Worker"]
- B3 -->|否| B1
- B4 --> B5["📄 解析文档"]
- B5 --> B6["🔀 并行创建索引
Vector + Fulltext
+ Graph + Vision"]
- B6 --> B7["✨ 更新状态
status = ACTIVE
observed_version = 1"]
- B7 --> B1
- end
-
- A4 -.-|"数据库状态变化"| B2
-
- style Frontend fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
- style Backend fill:#fff3e0,stroke:#f57c00,stroke-width:3px
- style A5 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
- style B7 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
- style B3 fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
-```
-
-**双链路的好处**:
-
-- **前端快速响应**:用户上传文档后,API 在 100ms 内返回,不需要等待处理完成
-- **后端异步处理**:真正的处理工作在后台慢慢做,不阻塞用户操作
-- **自动重试**:如果处理失败,系统会自动重试,保证最终成功
-- **状态可查**:用户可以随时查看文档处理进度
-
-**索引状态机**:
-
-```mermaid
-stateDiagram-v2
- [*] --> PENDING: 📤 文档上传
-
- PENDING --> CREATING: 🚀 Reconciler 检测到
开始处理
-
- CREATING --> ACTIVE: ✅ 所有索引创建成功
- CREATING --> FAILED: ❌ 处理失败
-
- FAILED --> CREATING: 🔄 自动重试
(最多 3 次)
- FAILED --> [*]: 💔 超过重试次数
标记为失败
-
- ACTIVE --> CREATING: 🔄 文档更新
重建索引
- ACTIVE --> [*]: 🗑️ 删除文档
-
- note right of PENDING
- version = 1
- observed_version = 0
- end note
-
- note right of CREATING
- 正在处理中
- 可能需要几分钟
- end note
-
- note right of ACTIVE
- version = 1
- observed_version = 1
- 可以被检索
- end note
-```
-
-## 4. 检索问答全流程
-
-有了索引之后,用户就可以提问了。ApeRAG 的检索系统会智能地选择合适的检索策略。
-
-### 4.1 混合检索系统
-
-不同类型的问题适合用不同的检索方式。ApeRAG 会同时使用多种检索策略,然后融合结果:
-
-```mermaid
-flowchart TB
- Query[用户问题] --> Router[检索路由]
-
- Router --> |并行检索| Vector[向量检索]
- Router --> |并行检索| Fulltext[全文检索]
- Router --> |并行检索| Graph[图谱检索]
-
- Vector --> Embed[生成问题向量]
- Embed --> QdrantSearch[Qdrant 相似度搜索]
- QdrantSearch --> R1[结果1]
-
- Fulltext --> ESSearch[Elasticsearch BM25]
- ESSearch --> R2[结果2]
-
- Graph --> GraphQuery[图谱查询
local/global/hybrid]
- GraphQuery --> R3[结果3]
-
- R1 --> Merge[结果融合]
- R2 --> Merge
- R3 --> Merge
-
- Merge --> Rerank[Rerank 重排序]
- Rerank --> Final[最终结果]
-
- style Query fill:#e1f5ff
- style Vector fill:#bbdefb
- style Fulltext fill:#c5e1a5
- style Graph fill:#ffccbc
- style Rerank fill:#fff59d
- style Final fill:#c5e1a5
-```
-
-**检索策略说明**:
-
-- **向量检索**:用于语义相似的问题
- - 问:"如何提升系统性能?"
- - 能找到:"优化数据库查询"、"使用缓存"等相关内容
-
-- **全文检索**:用于精确关键词匹配
- - 问:"PostgreSQL 的配置文件在哪?"
- - 能找到包含"PostgreSQL"和"配置文件"的精确段落
-
-- **图谱检索**:用于关系型问题
- - 问:"LightRAG 和 Neo4j 有什么关系?"
- - 会查询图谱中这两个实体的连接路径
-
-**结果融合策略**:
-
-不同检索方式的结果需要合并。ApeRAG 使用 Rerank 模型对所有候选结果重新打分:
-
-1. 收集所有检索结果(可能有重复)
-2. 去重,保留最相关的片段
-3. 使用 Rerank 模型评估每个片段与问题的相关性
-4. 按新的分数重新排序
-5. 返回 Top-K 结果
-
-### 4.2 知识图谱查询
-
-图谱检索有三种模式,适用于不同类型的问题:
-
-| 模式 | 适用场景 | 查询方式 | 示例问题 |
-|------|---------|---------|---------|
-| **local** | 查询某个实体的局部信息 | 向量匹配相似实体 → 获取邻居节点 | "张三的个人信息" |
-| **global** | 查询整体关系和模式 | 向量匹配相似关系 → 获取关联路径 | "公司的组织架构是怎样的" |
-| **hybrid** | 综合性问题 | local + global 结合 | "张三在公司的角色和职责" |
-
-```mermaid
-flowchart TD
- Question[用户问题] --> Analyze[问题分析]
-
- Analyze --> Local[Local 模式
实体中心]
- Analyze --> Global[Global 模式
关系中心]
- Analyze --> Hybrid[Hybrid 模式
综合查询]
-
- Local --> FindEntity[找到相关实体]
- FindEntity --> GetNeighbors[获取邻居和关系]
-
- Global --> FindRelations[找到相关关系]
- FindRelations --> GetContext[获取关系上下文]
-
- Hybrid --> Local
- Hybrid --> Global
-
- GetNeighbors --> Context[生成上下文]
- GetContext --> Context
-
- Context --> Return[返回给 LLM]
-
- style Local fill:#bbdefb
- style Global fill:#c5e1a5
- style Hybrid fill:#fff59d
-```
-
-**实际例子**:
-
-假设知识图谱中有:
-- 实体:张三(人物)、数据库团队(组织)、PostgreSQL(技术)
-- 关系:张三 --属于--> 数据库团队,张三 --擅长--> PostgreSQL
-
-问题:"张三负责什么?"
-
-1. **Local 模式**:
- - 找到"张三"实体
- - 获取所有直接相连的节点
- - 返回:"张三属于数据库团队,擅长 PostgreSQL"
-
-2. **Global 模式**:
- - 找到相关的关系模式:"负责"、"属于"
- - 返回整个团队的结构和职责分工
-
-3. **Hybrid 模式**:
- - 同时使用上述两种方式
- - 给出更全面的答案
-
-### 4.3 Agent 对话系统
-
-Agent 是 ApeRAG 的智能助手,它能调用各种工具来回答问题。
-
-```mermaid
-sequenceDiagram
- participant User as 用户
- participant API as API Server
- participant Agent as Agent 服务
- participant LLM as LLM 服务
- participant MCP as MCP 工具
- participant Search as 检索服务
-
- User->>API: 发送问题
- API->>Agent: 转发问题
-
- Agent->>LLM: 调用 LLM
携带工具列表
- LLM-->>Agent: 决定调用 search_collection 工具
-
- Agent->>MCP: 执行工具调用
- MCP->>Search: 混合检索
- Search-->>MCP: 返回相关文档片段
- MCP-->>Agent: 工具执行结果
-
- Agent->>LLM: 再次调用 LLM
携带检索到的上下文
- LLM-->>Agent: 生成最终答案
-
- Agent-->>API: 流式返回
- API-->>User: SSE 推送答案
-```
-
-**Agent 的工作流程**:
-
-1. **接收问题**:用户发送一个问题
-
-2. **工具决策**:LLM 分析问题,决定需要调用哪些工具
- - 可能的工具:search_collection(检索知识库)、web_search(搜索网络)、web_read(读取网页)等
-
-3. **执行工具**:Agent 调用对应的工具
- - 示例:search_collection 会触发混合检索,返回相关文档
-
-4. **生成答案**:LLM 基于检索到的上下文生成答案
-
-5. **流式返回**:答案通过 SSE(Server-Sent Events)实时推送给用户,不用等待全部生成完毕
-
-**MCP 协议的作用**:
-
-MCP(Model Context Protocol)是一个标准化的工具协议,让 AI 助手(如 Claude Desktop、Cursor)能够方便地调用 ApeRAG 的能力。通过 MCP,外部 AI 工具可以:
-- 列出你的知识库
-- 搜索知识库内容
-- 读取网页内容
-- 搜索互联网
-
-**对话示例**:
-
-```
-用户:ApeRAG 的图谱索引是怎么工作的?
-
-Agent 思考:需要检索知识库
-↓
-调用工具:search_collection(query="图谱索引工作原理", collection_id="aperag-docs")
-↓
-检索结果:返回关于图谱构建、实体提取、关系抽取的文档片段
-↓
-Agent 回答:ApeRAG 的图谱索引通过以下步骤工作...(基于检索到的内容生成)
-```
-
-## 5. 存储架构
-
-ApeRAG 采用多存储架构,为不同类型的数据选择最合适的存储方案。
-
-### 5.1 存储选型决策
-
-```mermaid
-flowchart TD
- Data["🎯 数据类型分类"] --> Choice{"📊 什么数据?"}
-
- Choice --> |"📋 结构化数据
用户、配置等"| PG["PostgreSQL"]
- Choice --> |"🔢 向量数据
embeddings"| Qdrant["Qdrant"]
- Choice --> |"📝 文本数据
全文搜索"| ES["Elasticsearch"]
- Choice --> |"📁 文件数据
原始文档"| MinIO["MinIO/S3"]
- Choice --> |"🕸️ 图数据
知识图谱"| GraphChoice{"图规模?"}
- Choice --> |"⚡ 缓存数据
临时数据"| Redis["Redis"]
-
- GraphChoice -->|"小规模
< 10万实体
💰 推荐"| PG2["PostgreSQL
内置图存储"]
- GraphChoice -->|"大规模
> 100万实体"| Neo4j["Neo4j
专业图数据库"]
-
- PG --> PGUse["✅ 事务支持
✅ 关系查询
✅ 小规模图存储
✅ 成熟稳定"]
- PG2 --> PG2Use["✅ 无需额外组件
✅ 降低运维成本
✅ 足够应对大多数场景"]
- Qdrant --> QdrantUse["✅ 向量相似度搜索
✅ 高维数据检索
✅ 支持过滤条件"]
- ES --> ESUse["✅ 全文检索 BM25
✅ 关键词搜索
✅ 中文分词 IK"]
- MinIO --> MinIOUse["✅ 大文件存储
✅ S3 协议兼容
✅ 成本低"]
- Neo4j --> Neo4jUse["✅ 大规模图查询
✅ 复杂关系遍历
✅ 图算法支持"]
- Redis --> RedisUse["✅ Celery 任务队列
✅ LLM 调用缓存
✅ 毫秒级访问"]
-
- style Data fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
- style Choice fill:#fff59d,stroke:#fbc02d,stroke-width:3px
- style GraphChoice fill:#fff59d,stroke:#fbc02d,stroke-width:2px
- style PG fill:#bbdefb,stroke:#1976d2,stroke-width:2px
- style PG2 fill:#c8e6c9,stroke:#388e3c,stroke-width:3px
- style Qdrant fill:#c5e1a5,stroke:#689f38,stroke-width:2px
- style ES fill:#ffccbc,stroke:#e64a19,stroke-width:2px
- style MinIO fill:#e1bee7,stroke:#8e24aa,stroke-width:2px
- style Neo4j fill:#f8bbd0,stroke:#c2185b,stroke-width:2px
- style Redis fill:#ffecb3,stroke:#ffa000,stroke-width:2px
-```
-
-### 5.2 数据流向
-
-不同的数据在系统中流转到不同的存储:
-
-```mermaid
-flowchart LR
- Doc[上传文档] --> Parser[解析器]
- Parser --> |原始文件| MinIO[(MinIO)]
- Parser --> |文档元数据| PG1[(PostgreSQL)]
- Parser --> |文档块| Chunks[分块]
-
- Chunks --> |生成向量| Embed[Embedding]
- Embed --> Qdrant[(Qdrant)]
-
- Chunks --> |文本内容| ES[(Elasticsearch)]
-
- Chunks --> |实体关系提取| Graph[图谱构建]
- Graph --> |小规模| PG2[(PostgreSQL)]
- Graph --> |大规模| Neo4j[(Neo4j)]
-
- PG1 -.元数据.-> Cache
- Cache -.缓存.-> Redis[(Redis)]
-
- style Doc fill:#e1f5ff
- style MinIO fill:#e1bee7
- style PG1 fill:#bbdefb
- style PG2 fill:#bbdefb
- style Qdrant fill:#c5e1a5
- style ES fill:#ffccbc
- style Neo4j fill:#f8bbd0
- style Redis fill:#ffecb3
-```
-
-### 5.3 核心存储系统
-
-**PostgreSQL**(主数据库)
-
-存储内容:
-- 用户信息、权限、配置
-- Collection(知识库)元数据
-- 文档元数据和索引状态
-- 对话历史
-- 小规模知识图谱(< 10 万实体)
-
-为什么选择:
-- 强大的事务支持,保证数据一致性
-- 成熟稳定,运维成本低
-- pgvector 扩展,支持向量存储
-- 可以承载小规模图数据,不需要额外的图数据库
-
-**Qdrant**(向量数据库)
-
-存储内容:
-- 文档块的 embedding 向量
-- 实体和关系的向量表示
-- 图片的多模态向量
-
-为什么选择:
-- 专门为向量检索优化,速度快
-- 支持过滤条件,可以结合元数据筛选
-- 支持集群部署,可以水平扩展
-
-**Elasticsearch**(全文搜索)
-
-存储内容:
-- 文档块的文本内容
-- 支持中文分词(IK Analyzer)
-
-为什么选择:
-- BM25 算法对关键词搜索效果好
-- 支持复杂的查询和聚合
-- 自带高亮显示
-
-**MinIO**(对象存储)
-
-存储内容:
-- 原始文档文件(PDF、Word 等)
-- 解析后的中间结果
-- 上传的临时文件
-
-为什么选择:
-- S3 协议兼容,可以替换为云存储
-- 存储成本低
-- 支持大文件
-
-**图数据库选择:PostgreSQL vs Neo4j**
-
-ApeRAG 支持两种图数据库方案:
-
-**PostgreSQL**(默认,推荐用于小规模)
-
-存储内容:
-- 知识图谱(< 10 万实体)
-- 图节点和边的关系数据
-
-推荐理由:
-- 无需额外部署,降低运维成本
-- 性能足够应对大多数场景
-- 事务支持完善,数据一致性有保障
-- 可以和其他业务数据共用一个数据库
-
-**Neo4j**(可选,用于大规模)
-
-存储内容:
-- 大规模知识图谱(> 100 万实体)
-
-什么时候需要:
-- 实体数量超过 10 万,PostgreSQL 查询性能下降
-- 需要复杂的图遍历查询(多跳关系)
-- 需要使用图算法(PageRank、社区发现等)
-
-**总结**:对于大多数企业应用,PostgreSQL 完全够用。只有在知识图谱规模非常大时,才需要考虑 Neo4j。
-
-**Redis**(缓存和队列)
-
-存储内容:
-- Celery 任务队列
-- LLM 调用缓存
-- 用户会话缓存
-
-为什么选择:
-- 速度极快,适合高频访问
-- 支持多种数据结构
-- 可以做任务队列的 Broker
-
-## 6. 技术亮点
-
-### 6.1 无状态 LightRAG 重构
-
-**背景问题**:
-
-原版 LightRAG 使用全局状态,所有任务共享一个实例。这在多用户、多 Collection 的场景下会导致数据混乱和并发冲突。
-
-**ApeRAG 的解决方案**:
-
-- 每个任务创建独立的 LightRAG 实例
-- 通过 `workspace` 参数隔离不同 Collection 的数据
-- 实体命名规范:`entity:{name}:{workspace}`
-- 关系命名规范:`relationship:{src}:{tgt}:{workspace}`
-
-这样,不同用户的图谱数据不会互相干扰,真正实现了多租户隔离。
-
-### 6.2 双链异步架构
-
-**传统做法的问题**:
-
-用户上传文档后,API 需要等待解析、索引构建全部完成才能返回,可能要等几分钟甚至更久。
-
-**双链架构的优势**:
-
-- **前端链路**:API 只负责写状态到数据库,100ms 内返回
-- **后端链路**:Reconciler 定时检测状态变化,调度异步任务
-- **版本控制**:通过 version 和 observed_version 实现幂等性
-- **自动重试**:任务失败后自动重试,保证最终一致性
-
-这个设计灵感来自 Kubernetes 的 Reconciler 模式,非常适合处理长时间运行的任务。
-
-### 6.3 连通分量并发优化
-
-**问题**:
-
-知识图谱构建时,需要合并相似的实体。如果串行处理,速度很慢。如果全部并行,又会有锁竞争问题。
-
-**解决方案**:
-
-使用连通分量算法,把图谱分成多个独立的子图:
-
-1. 构建实体关系邻接表
-2. BFS 遍历找出所有连通分量
-3. 不同分量之间没有连接,可以完全并行处理
-4. 同一分量内部串行处理(避免冲突)
-
-**效果**:
-
-- 性能提升 2-3 倍
-- 零锁竞争
-- 对于多样化的文档集合效果最好
-
-### 6.4 Provider 抽象模式
-
-ApeRAG 支持 100+ 种 LLM 提供商(OpenAI、Claude、Gemini、国产大模型等)。如何统一管理?
-
-**设计思路**:
-
-- 定义统一的 Provider 接口
-- 每个提供商实现自己的 Provider
-- 通过 LiteLLM 库做适配
-
-这样,切换模型只需要改配置,不需要改代码。同样的模式也应用在:
-- Embedding Service(支持多种向量模型)
-- Rerank Service(支持多种重排序模型)
-- Web Search Service(DuckDuckGo、JINA 等)
-
-### 6.5 多模态索引支持
-
-除了文本,ApeRAG 也能处理图片:
-
-**Vision Index 的两条路径**:
-
-1. **纯视觉向量**:使用多模态模型(如 CLIP)直接生成图片向量
-2. **视觉转文本**:使用 VLM 生成图片描述 + OCR 识别文字 → 文本向量化
-
-**融合策略**:
-
-- 文本检索结果和视觉检索结果分开排序
-- 通过 Rerank 模型统一打分
-- 最终合并展示
-
-## 7. 总结
-
-ApeRAG 通过以下设计实现了生产级的 RAG 能力:
-
-**核心优势**:
-- **强大的文档处理**:支持多格式、复杂布局、表格公式
-- **知识图谱融合**:不仅是向量匹配,还能理解知识关联
-- **多种检索方式**:向量、全文、图谱三管齐下
-- **异步架构**:快速响应,后台处理,用户体验好
-- **生产级设计**:多存储、高并发、易扩展
-
-**技术创新**:
-- 无状态 LightRAG,真正的多租户支持
-- 双链异步架构,API 响应 < 100ms
-- 连通分量并发优化,图谱构建快 2-3 倍
-- Provider 抽象,支持 100+ LLM
-
-**适用场景**:
-- 企业知识库搜索
-- 技术文档问答
-- 客服机器人
-- 研究论文分析
-- 任何需要理解文档并提供智能问答的场景
-
-整个系统的设计理念是:**让复杂的事情变简单,让简单的事情变自动**。用户只需要上传文档,剩下的一切都由 ApeRAG 自动完成。
diff --git a/web/docs/zh-CN/design/chat_history_design.md b/web/docs/zh-CN/design/chat_history_design.md
deleted file mode 100644
index b92611e8e..000000000
--- a/web/docs/zh-CN/design/chat_history_design.md
+++ /dev/null
@@ -1,590 +0,0 @@
----
-title: 聊天历史消息数据流程
-description: 详细说明ApeRAG项目中聊天历史消息的完整数据流程,从前端API调用到后端存储的全链路实现
-keywords: [chat, history, message, redis, postgresql, websocket, part-based design]
----
-
-# 聊天历史消息数据流程
-
-## 概述
-
-本文档详细说明ApeRAG项目中聊天历史消息的完整数据流程,从前端API调用到后端存储的全链路实现。
-
-**核心接口**: `GET /api/v1/bots/{bot_id}/chats/{chat_id}`
-
-## 数据流图
-
-```
-┌─────────────────┐
-│ Frontend │
-│ (Next.js) │
-└────────┬────────┘
- │ GET /api/v1/bots/{bot_id}/chats/{chat_id}
- ▼
-┌─────────────────────────────────────────────┐
-│ View Layer │
-│ aperag/views/chat.py │
-│ - get_chat_view() │
-│ - JWT身份验证 │
-│ - 参数验证 │
-└────────┬────────────────────────────────────┘
- │ chat_service_global.get_chat()
- ▼
-┌─────────────────────────────────────────────┐
-│ Service Layer │
-│ aperag/service/chat_service.py │
-│ - get_chat() │
-│ - 业务逻辑编排 │
-└────────┬────────────────────────────────────┘
- │
- ├──────────────┬─────────────┐
- │ │ │
- ▼ ▼ ▼
-┌────────────┐ ┌───────────┐ ┌──────────────┐
-│ PostgreSQL │ │ Redis │ │ PostgreSQL │
-│ chat表 │ │ 消息历史 │ │ feedback表 │
-│(基本信息) │ │(会话内容) │ │(用户反馈) │
-└────────────┘ └───────────┘ └──────────────┘
- │ │ │
- └──────────────┴──────────────────┘
- │
- ▼
- ┌──────────────┐
- │ ChatDetails │
- │ (组装响应) │
- └──────────────┘
-```
-
-## 完整流程详解
-
-### 1. View层 - HTTP请求处理
-
-**文件**: `aperag/views/chat.py`
-
-```python
-@router.get("/bots/{bot_id}/chats/{chat_id}")
-async def get_chat_view(
- request: Request,
- bot_id: str,
- chat_id: str,
- user: User = Depends(required_user)
-) -> view_models.ChatDetails:
- return await chat_service_global.get_chat(str(user.id), bot_id, chat_id)
-```
-
-**职责**:
-- 接收HTTP GET请求
-- JWT Token身份验证
-- 提取路径参数 (bot_id, chat_id)
-- 调用Service层
-- 返回`ChatDetails`响应
-
-### 2. Service层 - 业务逻辑编排
-
-**文件**: `aperag/service/chat_service.py`
-
-```python
-async def get_chat(self, user: str, bot_id: str, chat_id: str) -> view_models.ChatDetails:
- from aperag.utils.history import query_chat_messages
-
- # Step 1: 从PostgreSQL查询Chat基本信息
- chat = await self.db_ops.query_chat(user, bot_id, chat_id)
- if chat is None:
- raise ChatNotFoundException(chat_id)
-
- # Step 2: 从Redis查询聊天消息历史
- messages = await query_chat_messages(user, chat_id)
-
- # Step 3: 构建响应对象(消息中已包含feedback信息)
- chat_obj = self.build_chat_response(chat)
- return ChatDetails(**chat_obj.model_dump(), history=messages)
-```
-
-**核心逻辑**:
-
-1. **查询Chat元数据** (PostgreSQL)
-2. **查询消息历史** (Redis + PostgreSQL反馈信息)
-3. **组装完整响应**
-
-### 3. 数据存储层
-
-#### 3.1 PostgreSQL - Chat基本信息
-
-**表**: `chat`
-
-**文件**: `aperag/db/models.py`
-
-```python
-class Chat(Base):
- __tablename__ = "chat"
-
- id = Column(String(24), primary_key=True) # chat_xxxx
- user = Column(String(256), nullable=False) # 用户ID
- bot_id = Column(String(24), nullable=False) # Bot ID
- title = Column(String(256)) # 会话标题
- peer_type = Column(EnumColumn(ChatPeerType)) # 对话类型
- peer_id = Column(String(256)) # 对话ID
- status = Column(EnumColumn(ChatStatus)) # 状态
- gmt_created = Column(DateTime(timezone=True)) # 创建时间
- gmt_updated = Column(DateTime(timezone=True)) # 更新时间
- gmt_deleted = Column(DateTime(timezone=True)) # 删除时间(软删除)
-```
-
-**用途**: 存储Chat会话的元数据,不包含具体消息内容
-
-#### 3.2 Redis - 聊天消息历史
-
-**文件**: `aperag/utils/history.py`
-
-**Key格式**: `message_store:{chat_id}`
-
-**数据结构**: Redis List (使用LPUSH,最新消息在前)
-
-**核心类**:
-
-```python
-class RedisChatMessageHistory:
- def __init__(self, session_id: str, key_prefix: str = "message_store:"):
- self.session_id = session_id
- self.key_prefix = key_prefix
-
- @property
- def key(self) -> str:
- return self.key_prefix + self.session_id # message_store:chat_abc123
-
- @property
- async def messages(self) -> List[StoredChatMessage]:
- # 从Redis读取所有消息
- _items = await self.redis_client.lrange(self.key, 0, -1)
- # 反转为时间顺序(因为LPUSH导致最新在前)
- items = [json.loads(m.decode("utf-8")) for m in _items[::-1]]
- return [storage_dict_to_message(item) for item in items]
-```
-
-**消息查询函数**:
-
-```python
-async def query_chat_messages(user: str, chat_id: str):
- """查询聊天消息并转换为前端格式"""
-
- # 1. 从Redis获取消息历史
- chat_history = RedisChatMessageHistory(chat_id, redis_client=get_async_redis_client())
- stored_messages = await chat_history.messages
-
- if not stored_messages:
- return []
-
- # 2. 从PostgreSQL获取反馈信息
- feedbacks = await async_db_ops.query_chat_feedbacks(user, chat_id)
- feedback_map = {feedback.message_id: feedback for feedback in feedbacks}
-
- # 3. 转换为前端格式并附加反馈信息
- result = []
- for stored_message in stored_messages:
- # 转换为前端格式
- chat_message_list = stored_message.to_frontend_format()
-
- # 为AI消息添加反馈数据
- for chat_msg in chat_message_list:
- feedback = feedback_map.get(chat_msg.id)
- if feedback and chat_msg.role == "ai":
- chat_msg.feedback = Feedback(
- type=feedback.type,
- tag=feedback.tag,
- message=feedback.message
- )
-
- result.append(chat_message_list)
-
- return result # [[message1_parts], [message2_parts], [message3_parts], ...]
-```
-
-#### 3.3 PostgreSQL - 用户反馈信息
-
-**表**: `message_feedback`
-
-```python
-class MessageFeedback(Base):
- __tablename__ = "message_feedback"
-
- user = Column(String(256), nullable=False) # 用户ID
- chat_id = Column(String(24), primary_key=True) # 会话ID
- message_id = Column(String(256), primary_key=True) # 消息ID
- type = Column(EnumColumn(MessageFeedbackType)) # like/dislike
- tag = Column(EnumColumn(MessageFeedbackTag)) # 反馈标签
- message = Column(Text) # 反馈内容
- question = Column(Text) # 原始问题
- original_answer = Column(Text) # 原始回答
- status = Column(EnumColumn(MessageFeedbackStatus)) # 状态
- gmt_created = Column(DateTime(timezone=True))
- gmt_updated = Column(DateTime(timezone=True))
-```
-
-**用途**: 存储用户对AI回复的反馈(点赞/点踩),用于质量监控和模型优化
-
-## 数据格式详解
-
-### 存储格式 (Redis)
-
-消息在Redis中以JSON格式存储,采用**Part-Based设计**:
-
-#### StoredChatMessage - 一条完整消息
-
-```python
-class StoredChatMessage(BaseModel):
- """一条完整消息(用户的一条消息 或 AI的一条消息)"""
- parts: List[StoredChatMessagePart] # 消息的多个部分
- files: List[Dict[str, Any]] # 关联的上传文件
-```
-
-#### StoredChatMessagePart - 消息的一个部分
-
-```python
-class StoredChatMessagePart(BaseModel):
- """消息的单个部分(原子单元)"""
-
- # 标识信息
- chat_id: str # 所属会话
- message_id: str # 所属消息(同一条消息的多个part共享)
- part_id: str # 部分的唯一ID
- timestamp: float # 生成时间戳
-
- # 内容分类
- type: Literal["message", "tool_call_result", "thinking", "references"]
- role: Literal["human", "ai", "system"]
- content: str
-
- # 扩展字段
- references: List[Dict] # 文档引用
- urls: List[str] # URL引用
- metadata: Optional[Dict] # 额外元数据
-```
-
-#### Part类型说明
-
-| Type | 说明 | 包含在LLM上下文 |
-|------|------|---------------|
-| `message` | 主要对话内容 | ✅ 是 |
-| `tool_call_result` | 工具调用过程 | ❌ 否(仅展示) |
-| `thinking` | AI思考过程 | ❌ 否(仅展示) |
-| `references` | 文档引用和链接 | ❌ 否(仅展示) |
-
-**设计原因**: AI的一条回复包含多个阶段(工具调用、思考、回答、引用),这些内容按时序产生且互相穿插,单一字段无法表达。用户的消息通常只有1个part(type="message"),但也支持多个part以保持结构一致性。
-
-#### Redis存储示例
-
-**用户消息**:
-```json
-{
- "parts": [
- {
- "chat_id": "chat_abc123",
- "message_id": "uuid-1",
- "part_id": "uuid-part-1",
- "timestamp": 1699999999.0,
- "type": "message",
- "role": "human",
- "content": "什么是LightRAG?",
- "references": [],
- "urls": [],
- "metadata": null
- }
- ],
- "files": []
-}
-```
-
-**AI回复(包含多个part)**:
-```json
-{
- "parts": [
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-2",
- "type": "tool_call_result",
- "role": "ai",
- "content": "正在检索知识库...",
- "timestamp": 1699999999.1
- },
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-3",
- "type": "message",
- "role": "ai",
- "content": "LightRAG是一个轻量级的RAG框架,由ApeCloud团队深度改造...",
- "timestamp": 1699999999.5
- },
- {
- "message_id": "uuid-2",
- "part_id": "uuid-part-4",
- "type": "references",
- "role": "ai",
- "content": "",
- "references": [
- {
- "score": 0.95,
- "text": "LightRAG架构说明...",
- "metadata": {"source": "lightrag_doc.pdf", "page": 3}
- }
- ],
- "urls": ["https://github.com/HKUDS/LightRAG"],
- "timestamp": 1699999999.6
- }
- ],
- "files": []
-}
-```
-
-### API响应格式
-
-**ChatDetails Schema** (`aperag/api/components/schemas/chat.yaml`):
-
-```yaml
-chatDetails:
- type: object
- properties:
- id: string # chat_abc123
- title: string # 会话标题
- bot_id: string # bot_xyz
- peer_id: string
- peer_type: string # system/feishu/weixin/web
- status: string # active/archived
- created: string # ISO 8601
- updated: string # ISO 8601
- history: # 二维数组
- type: array
- description: 对话历史,每个元素是一条消息
- items:
- type: array
- description: 一条消息包含多个parts(工具调用、思考、回答、引用等)
- items:
- $ref: '#/chatMessage'
-```
-
-**ChatMessage Schema**:
-
-```yaml
-chatMessage:
- type: object
- properties:
- id: string # message_id(同一轮次相同)
- part_id: string # part_id(每个part唯一)
- type: string # message/tool_call_result/thinking/references
- timestamp: number # Unix时间戳
- role: string # human/ai
- data: string # 消息内容
- references: # 文档引用(可选)
- type: array
- items:
- type: object
- properties:
- score: number
- text: string
- metadata: object
- urls: # URL引用(可选)
- type: array
- items:
- type: string
- feedback: # 用户反馈(可选)
- type: object
- properties:
- type: string # like/dislike
- tag: string
- message: string
- files: # 关联文件(可选)
- type: array
-```
-
-### 前端接收示例
-
-```json
-{
- "id": "chat_abc123",
- "title": "关于LightRAG的讨论",
- "bot_id": "bot_xyz",
- "status": "active",
- "created": "2025-01-01T00:00:00Z",
- "updated": "2025-01-01T01:00:00Z",
- "history": [
- [
- {
- "id": "uuid-1",
- "part_id": "uuid-part-1",
- "type": "message",
- "timestamp": 1699999999.0,
- "role": "human",
- "data": "什么是LightRAG?",
- "files": []
- }
- ],
- [
- {
- "id": "uuid-2",
- "part_id": "uuid-part-2",
- "type": "tool_call_result",
- "timestamp": 1699999999.1,
- "role": "ai",
- "data": "正在检索知识库...",
- "files": []
- },
- {
- "id": "uuid-2",
- "part_id": "uuid-part-3",
- "type": "message",
- "timestamp": 1699999999.5,
- "role": "ai",
- "data": "LightRAG是一个轻量级的RAG框架...",
- "files": []
- },
- {
- "id": "uuid-2",
- "part_id": "uuid-part-4",
- "type": "references",
- "timestamp": 1699999999.6,
- "role": "ai",
- "data": "",
- "references": [
- {
- "score": 0.95,
- "text": "LightRAG架构说明...",
- "metadata": {"source": "lightrag_doc.pdf"}
- }
- ],
- "urls": ["https://github.com/HKUDS/LightRAG"],
- "files": []
- }
- ]
- ]
-}
-```
-
-**注意**: `history`是二维数组,第一维是消息序列(按时间顺序),第二维是该条消息的多个part。例如:
-- `history[0]` = 用户的第1条消息的parts(通常只有1个part)
-- `history[1]` = AI的第1条回复的parts(可能有多个part:工具调用、思考、回答、引用)
-- `history[2]` = 用户的第2条消息的parts
-- `history[3]` = AI的第2条回复的parts
-- ...
-
-## 消息写入流程
-
-### Agent Runtime 写入路径
-
-旧的 WebSocket 聊天接口 `WS /api/v1/bots/{bot_id}/chats/{chat_id}/connect` 已经退休。
-当前 Agent 聊天写入走的是 v2 turn/timeline API 加 SSE 事件流。上面的 history schema 仍可作为背景说明,
-但不应再依据这份文档实现新的 WebSocket chat client。
-
-## 设计特点
-
-### 1. 混合存储架构
-
-| 存储 | 内容 | 原因 |
-|------|------|------|
-| PostgreSQL | Chat元数据 | 持久化、支持复杂查询 |
-| Redis | 消息历史 | 高性能读写、支持TTL |
-| PostgreSQL | 用户反馈 | 持久化、用于分析 |
-
-**优势**:
-- 性能优化:消息历史使用Redis快速读写
-- 数据持久化:重要元数据存储在PostgreSQL
-- 灵活性:可独立配置TTL、备份策略
-
-### 2. Part-Based消息设计
-
-**核心价值**:
-- ✅ 支持复杂的AI回复流程(工具调用→思考→回答→引用)
-- ✅ 前端可差异化渲染不同类型的内容
-- ✅ 完整记录时序关系(通过timestamp)
-- ✅ 灵活扩展(新增type无需改表结构)
-
-**为什么一条消息需要多个part**:
-
-AI的一条回复过程是时序产生、互相穿插的,例如:
-1. 🔍 Part1 (tool_call_result): "正在查询数据库..."
-2. 💭 Part2 (thinking): "找到了327条记录..."
-3. 🔍 Part3 (tool_call_result): "正在计算增长率..."
-4. 💭 Part4 (thinking): "环比增长15%..."
-5. 💬 Part5 (message): "根据数据分析,Q4表现优秀..."
-6. 📚 Part6 (references): [文档1, 文档2]
-
-这6个part属于AI的**一条消息**(共享同一个message_id),单一字段无法表达这种复杂的时序关系。
-
-### 3. 格式转换解耦
-
-提供三种格式转换:
-
-```python
-class StoredChatMessage:
- def to_frontend_format(self) -> List[ChatMessage]:
- """转换为前端展示格式"""
- # 包含所有types的parts
-
- def to_openai_format(self) -> List[Dict]:
- """转换为LLM调用格式"""
- # 只包含type="message"的parts
-
- def get_main_content(self) -> str:
- """获取主要回答内容"""
- # 第一个type="message"的content
-```
-
-**优势**:
-- 内部存储格式与外部接口解耦
-- 支持不同的消费场景
-- LLM上下文只包含实际对话内容,不包含工具调用和思考过程
-
-### 4. 三级ID设计
-
-```python
-chat_id = "chat_abc123" # 会话级别
-message_id = "uuid-msg-1" # 消息级别(同一条消息的多个part共享)
-part_id = "uuid-part-1" # 部分级别(每个part独立)
-```
-
-**作用**:
-- `chat_id`: 标识一个聊天会话
-- `message_id`: 将同一条消息的多个part分组(用于前端展示和反馈关联)
-- `part_id`: 每个part独立标识(用于单独操作,如复制、引用)
-
-## 性能考虑
-
-### Redis优化
-- **List数据结构**: LPUSH O(1), LRANGE O(N)
-- **可选TTL**: 自动过期历史消息
-- **连接池复用**: 全局Redis客户端
-
-### PostgreSQL优化
-- **索引**: user, bot_id, chat_id, status字段
-- **软删除**: 使用gmt_deleted
-- **分页查询**: list_chats支持分页
-
-### 传输优化
-- **WebSocket流式**: 边生成边发送
-- **增量更新**: 只传输新的part
-- **按需加载**: 懒加载历史消息
-
-## 相关文件
-
-### 核心实现
-- `aperag/views/chat.py` - View层接口
-- `aperag/service/chat_service.py` - Service层业务逻辑
-- `aperag/utils/history.py` - Redis消息历史管理
-- `aperag/chat/history/message.py` - 消息数据结构
-- `aperag/db/models.py` - 数据库模型
-- `aperag/db/repositories/chat.py` - Chat数据库操作
-- `aperag/api/components/schemas/chat.yaml` - OpenAPI Schema
-
-### 前端实现
-- `web/src/app/workspace/bots/[botId]/chats/[chatId]/page.tsx` - 聊天详情页面
-- `web/src/components/chat/chat-messages.tsx` - 消息展示组件
-
-## 总结
-
-ApeRAG的聊天历史消息系统采用**混合存储 + Part-Based消息设计**:
-
-1. **PostgreSQL**存储Chat元数据和反馈(持久化、可查询)
-2. **Redis**存储消息历史(高性能、支持过期)
-3. **Part-Based设计**支持复杂的AI回复流程(工具调用、思考、回答、引用)
-4. **三级ID设计**支持消息分组和独立操作
-5. **清晰的分层架构**(View → Service → Repository → Storage)
-
-这种设计既保证了性能,又支持复杂的AI交互场景,同时具有良好的可扩展性。
diff --git a/web/docs/zh-CN/design/document_upload_design.md b/web/docs/zh-CN/design/document_upload_design.md
deleted file mode 100644
index 0587be9c9..000000000
--- a/web/docs/zh-CN/design/document_upload_design.md
+++ /dev/null
@@ -1,1077 +0,0 @@
----
-title: 文档上传设计
-position: 3
----
-
-# ApeRAG 文档上传架构设计
-
-## 概述
-
-本文档详细说明 ApeRAG 项目中文档上传模块的完整架构设计,涵盖从文件上传、临时存储、文档解析、格式转换到最终索引构建的全链路流程。
-
-**核心设计理念**:采用**两阶段提交**模式,将文件上传(临时存储)和文档确认(正式添加)分离,提供更好的用户体验和资源管理能力。
-
-## 系统架构
-
-### 整体架构图
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Frontend │
-│ (Next.js) │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ Step 1: Upload │ Step 2: Confirm
- │ POST /documents/upload │ POST /documents/confirm
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ View Layer: aperag/views/collections.py │
-│ - HTTP请求处理 │
-│ - JWT身份验证 │
-│ - 参数验证 │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ document_service.upload_document() │ document_service.confirm_documents()
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Service Layer: aperag/service/document_service.py │
-│ - 业务逻辑编排 │
-│ - 文件验证(类型、大小) │
-│ - SHA-256 哈希去重 │
-│ - Quota 检查 │
-│ - 事务管理 │
-└────────┬───────────────────────────────────┬────────────────┘
- │ │
- │ Step 1 │ Step 2
- ▼ ▼
-┌────────────────────────┐ ┌────────────────────────────┐
-│ 1. 创建 Document 记录 │ │ 1. 更新 Document 状态 │
-│ status=UPLOADED │ │ UPLOADED → PENDING │
-│ 2. 保存到 ObjectStore │ │ 2. 创建 DocumentIndex 记录│
-│ 3. 计算 content_hash │ │ 3. 触发索引构建任务 │
-└────────┬───────────────┘ └────────┬───────────────────┘
- │ │
- ▼ ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Storage Layer │
-│ │
-│ ┌───────────────┐ ┌──────────────────┐ ┌─────────────┐ │
-│ │ PostgreSQL │ │ Object Store │ │ Vector DB │ │
-│ │ │ │ │ │ │ │
-│ │ - document │ │ - Local/S3 │ │ - Qdrant │ │
-│ │ - document_ │ │ - 原始文件 │ │ - 向量索引 │ │
-│ │ index │ │ - 转换后的文件 │ │ │ │
-│ └───────────────┘ └──────────────────┘ └─────────────┘ │
-│ │
-│ ┌───────────────┐ ┌──────────────────┐ │
-│ │ Elasticsearch │ │ Neo4j/PG │ │
-│ │ │ │ │ │
-│ │ - 全文索引 │ │ - 知识图谱 │ │
-│ └───────────────┘ └──────────────────┘ │
-└─────────────────────────────────────────────────────────────┘
- │
- ▼
- ┌───────────────────┐
- │ Celery Workers │
- │ │
- │ - 文档解析 │
- │ - 格式转换 │
- │ - 内容提取 │
- │ - 文档分块 │
- │ - 索引构建 │
- └───────────────────┘
-```
-
-### 分层架构
-
-```
-┌─────────────────────────────────────────────┐
-│ View Layer (views/collections.py) │ HTTP 处理、认证、参数验证
-└─────────────────┬───────────────────────────┘
- │ 调用
-┌─────────────────▼───────────────────────────┐
-│ Service Layer (service/document_service.py)│ 业务逻辑、事务编排、权限控制
-└─────────────────┬───────────────────────────┘
- │ 调用
-┌─────────────────▼───────────────────────────┐
-│ Repository Layer (db/ops.py, objectstore/) │ 数据访问抽象、对象存储接口
-└─────────────────┬───────────────────────────┘
- │ 访问
-┌─────────────────▼───────────────────────────┐
-│ Storage Layer (PG, S3, Qdrant, ES, Neo4j) │ 数据持久化
-└─────────────────────────────────────────────┘
-```
-
-## 核心流程详解
-
-### 阶段 0: API 接口定义
-
-系统提供三个主要接口:
-
-1. **上传文件**(两阶段模式 - 第一步)
- - 接口:`POST /api/v1/collections/{collection_id}/documents/upload`
- - 功能:上传文件到临时存储,状态为 `UPLOADED`
- - 返回:`document_id`、`filename`、`size`、`status`
-
-2. **确认文档**(两阶段模式 - 第二步)
- - 接口:`POST /api/v1/collections/{collection_id}/documents/confirm`
- - 功能:确认已上传的文档,触发索引构建
- - 参数:`document_ids` 数组
- - 返回:`confirmed_count`、`failed_count`、`failed_documents`
-
-3. **一步上传**(传统模式,兼容旧版)
- - 接口:`POST /api/v1/collections/{collection_id}/documents`
- - 功能:上传并直接添加到知识库,状态直接为 `PENDING`
- - 支持批量上传
-
-### 阶段 1: 文件上传与临时存储
-
-#### 1.1 上传流程
-
-```
-用户选择文件
- │
- ▼
-前端调用 upload API
- │
- ▼
-View 层验证身份和参数
- │
- ▼
-Service 层处理业务逻辑:
- │
- ├─► 验证集合存在且激活
- │
- ├─► 验证文件类型和大小
- │
- ├─► 读取文件内容
- │
- ├─► 计算 SHA-256 哈希
- │
- └─► 事务处理:
- │
- ├─► 重复检测(按文件名+哈希)
- │ ├─ 完全相同:返回已存在文档(幂等)
- │ ├─ 同名不同内容:抛出冲突异常
- │ └─ 新文档:继续创建
- │
- ├─► 创建 Document 记录(status=UPLOADED)
- │
- ├─► 上传到对象存储
- │ └─ 路径:user-{user_id}/{collection_id}/{document_id}/original{suffix}
- │
- └─► 更新文档元数据(object_path)
-```
-
-#### 1.2 文件验证
-
-**支持的文件类型**:
-- 文档:`.pdf`, `.doc`, `.docx`, `.ppt`, `.pptx`, `.xls`, `.xlsx`
-- 文本:`.txt`, `.md`, `.html`, `.json`, `.xml`, `.yaml`, `.yml`, `.csv`
-- 图片:`.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.tiff`, `.tif`
-- 音频:`.mp3`, `.wav`, `.m4a`
-- 压缩包:`.zip`, `.tar`, `.gz`, `.tgz`
-
-**大小限制**:
-- 默认:100 MB(可通过 `MAX_DOCUMENT_SIZE` 环境变量配置)
-- 解压后总大小:5 GB(`MAX_EXTRACTED_SIZE`)
-
-#### 1.3 重复检测机制
-
-采用**文件名 + SHA-256 哈希**双重检测:
-
-| 场景 | 文件名 | 哈希值 | 系统行为 |
-|------|--------|--------|----------|
-| 完全相同 | 相同 | 相同 | 返回已存在文档(幂等操作) |
-| 文件名冲突 | 相同 | 不同 | 抛出 `DocumentNameConflictException` |
-| 新文档 | 不同 | - | 创建新文档记录 |
-
-**优势**:
-- ✅ 支持幂等上传:网络重传不会创建重复文档
-- ✅ 避免内容冲突:同名不同内容会提示用户
-- ✅ 节省存储空间:相同内容只存储一次
-
-### 阶段 2: 临时存储配置
-
-#### 2.1 对象存储类型
-
-系统支持两种对象存储后端,可通过环境变量切换:
-
-**1. Local 存储(本地文件系统)**
-
-适用场景:
-- 开发测试环境
-- 小规模部署
-- 单机部署
-
-配置方式:
-```bash
-# 开发环境
-OBJECT_STORE_TYPE=local
-OBJECT_STORE_LOCAL_ROOT_DIR=.objects
-
-# Docker 环境
-OBJECT_STORE_TYPE=local
-OBJECT_STORE_LOCAL_ROOT_DIR=/shared/objects
-```
-
-存储路径示例:
-```
-.objects/
-└── user-google-oauth2-123456/
- └── col_abc123/
- └── doc_xyz789/
- ├── original.pdf # 原始文件
- ├── converted.pdf # 转换后的 PDF
- ├── processed_content.md # 解析后的 Markdown
- ├── chunks/ # 分块数据
- │ ├── chunk_0.json
- │ └── chunk_1.json
- └── images/ # 提取的图片
- ├── page_0.png
- └── page_1.png
-```
-
-**2. S3 存储(兼容 AWS S3/MinIO/OSS 等)**
-
-适用场景:
-- 生产环境
-- 大规模部署
-- 分布式部署
-- 需要高可用和容灾
-
-配置方式:
-```bash
-OBJECT_STORE_TYPE=s3
-OBJECT_STORE_S3_ENDPOINT=http://127.0.0.1:9000 # MinIO/S3 地址
-OBJECT_STORE_S3_REGION=us-east-1 # AWS Region
-OBJECT_STORE_S3_ACCESS_KEY=minioadmin # Access Key
-OBJECT_STORE_S3_SECRET_KEY=minioadmin # Secret Key
-OBJECT_STORE_S3_BUCKET=aperag # Bucket 名称
-OBJECT_STORE_S3_PREFIX_PATH=dev/ # 可选的路径前缀
-OBJECT_STORE_S3_USE_PATH_STYLE=true # MinIO 需要设置为 true
-```
-
-#### 2.2 对象存储路径规则
-
-**路径格式**:
-```
-{prefix}/user-{user_id}/{collection_id}/{document_id}/{filename}
-```
-
-**组成部分**:
-- `prefix`:可选的全局前缀(仅 S3)
-- `user_id`:用户 ID(`|` 替换为 `-`)
-- `collection_id`:集合 ID
-- `document_id`:文档 ID
-- `filename`:文件名(如 `original.pdf`、`page_0.png`)
-
-**多租户隔离**:
-- 每个用户有独立的命名空间
-- 每个集合有独立的存储目录
-- 每个文档有独立的文件夹
-
-### 阶段 3: 文档确认与索引构建
-
-#### 3.1 确认流程
-
-```
-用户点击"保存到集合"
- │
- ▼
-前端调用 confirm API
- │
- ▼
-Service 层处理:
- │
- ├─► 验证集合配置
- │
- ├─► 检查 Quota(确认阶段才扣除配额)
- │
- └─► 对每个 document_id:
- │
- ├─► 验证文档状态为 UPLOADED
- │
- ├─► 更新文档状态:UPLOADED → PENDING
- │
- ├─► 根据集合配置创建索引记录:
- │ ├─ VECTOR(向量索引,必选)
- │ ├─ FULLTEXT(全文索引,必选)
- │ ├─ GRAPH(知识图谱,可选)
- │ ├─ SUMMARY(文档摘要,可选)
- │ └─ VISION(视觉索引,可选)
- │
- └─► 返回确认结果
- │
- ▼
-触发 Celery 任务:reconcile_document_indexes
- │
- ▼
-后台异步处理索引构建
-```
-
-#### 3.2 Quota(配额)管理
-
-**检查时机**:
-- ❌ 不在上传阶段检查(临时存储不占用配额)
-- ✅ 在确认阶段检查(正式添加才消耗配额)
-
-**配额类型**:
-
-1. **用户全局配额**
- - `max_document_count`:用户总文档数量限制
- - 默认:1000(可通过 `MAX_DOCUMENT_COUNT` 配置)
-
-2. **单集合配额**
- - `max_document_count_per_collection`:单个集合文档数量限制
- - 不计入 `UPLOADED` 和 `DELETED` 状态的文档
-
-**配额超限处理**:
-- 抛出 `QuotaExceededException`
-- 返回 HTTP 400 错误
-- 包含当前用量和配额上限信息
-
-### 阶段 4: 文档解析与格式转换
-
-#### 4.1 Parser 架构
-
-系统采用**多 Parser 链式调用**架构,每个 Parser 负责特定类型的文件解析:
-
-```
-DocParser(主控制器)
- │
- ├─► MinerUParser
- │ └─ 功能:高精度 PDF 解析(商业 API)
- │ └─ 支持:.pdf
- │
- ├─► ImageParser
- │ └─ 功能:图片内容识别(OCR + 视觉理解)
- │ └─ 支持:.jpg, .png, .gif, .bmp, .tiff
- │
- ├─► AudioParser
- │ └─ 功能:音频转录(Speech-to-Text)
- │ └─ 支持:.mp3, .wav, .m4a
- │
- └─► MarkItDownParser(兜底)
- └─ 功能:通用文档转 Markdown
- └─ 支持:几乎所有常见格式
-```
-
-#### 4.2 Parser 配置
-
-**配置方式**:通过集合配置(Collection Config)动态控制
-
-```json
-{
- "parser_config": {
- "use_mineru": false, // 是否启用 MinerU(需要 API Token)
- "use_markitdown": true, // 是否启用 MarkItDown(默认)
- "mineru_api_token": "xxx" // MinerU API Token(可选)
- }
-}
-```
-
-**环境变量配置**:
-```bash
-USE_MINERU_API=false # 全局启用 MinerU
-MINERU_API_TOKEN=your_token # MinerU API Token
-```
-
-#### 4.3 解析流程
-
-```
-Celery Worker 收到索引任务
- │
- ▼
-1. 从对象存储下载原始文件
- │
- ▼
-2. 根据文件扩展名选择 Parser
- │
- ├─► 尝试第一个匹配的 Parser
- │ ├─ 成功:返回解析结果
- │ └─ 失败:FallbackError → 尝试下一个 Parser
- │
- └─► 最终兜底:MarkItDownParser
- │
- ▼
-3. 解析结果(Parts):
- │
- ├─► MarkdownPart:文本内容
- │ └─ 包含:标题、段落、列表、表格等
- │
- ├─► PdfPart:PDF 文件
- │ └─ 用于:线性化、页面渲染
- │
- └─► AssetBinPart:二进制资源
- └─ 包含:图片、嵌入的文件等
- │
- ▼
-4. 后处理(Post-processing):
- │
- ├─► PDF 页面转图片(Vision 索引需要)
- │ └─ 每页渲染为 PNG 图片
- │ └─ 保存到 {document_path}/images/page_N.png
- │
- ├─► PDF 线性化(加速浏览器加载)
- │ └─ 使用 pikepdf 优化 PDF 结构
- │ └─ 保存到 {document_path}/converted.pdf
- │
- └─► 提取文本内容(纯文本)
- └─ 合并所有 MarkdownPart 内容
- └─ 保存到 {document_path}/processed_content.md
- │
- ▼
-5. 保存到对象存储
-```
-
-#### 4.4 格式转换示例
-
-**示例 1:PDF 文档**
-```
-输入:user_manual.pdf (5 MB)
- │
- ▼
-解析器选择:MinerUParser / MarkItDownParser
- │
- ▼
-输出 Parts:
- ├─ MarkdownPart: "# User Manual\n\n## Chapter 1\n..."
- └─ PdfPart: <原始 PDF 数据>
- │
- ▼
-后处理:
- ├─ 渲染 50 页为图片 → images/page_0.png ~ page_49.png
- ├─ 线性化 PDF → converted.pdf
- └─ 提取文本 → processed_content.md
-```
-
-**示例 2:图片文件**
-```
-输入:screenshot.png (2 MB)
- │
- ▼
-解析器选择:ImageParser
- │
- ▼
-输出 Parts:
- ├─ MarkdownPart: "[OCR 提取的文字内容]"
- └─ AssetBinPart: <原始图片数据> (vision_index=true)
- │
- ▼
-后处理:
- └─ 保存原图副本 → images/file.png
-```
-
-**示例 3:音频文件**
-```
-输入:meeting_record.mp3 (50 MB)
- │
- ▼
-解析器选择:AudioParser
- │
- ▼
-输出 Parts:
- └─ MarkdownPart: "[转录的会议内容文本]"
- │
- ▼
-后处理:
- └─ 保存转录文本 → processed_content.md
-```
-
-### 阶段 5: 索引构建
-
-#### 5.1 索引类型与功能
-
-| 索引类型 | 是否必选 | 功能描述 | 存储位置 |
-|---------|---------|----------|----------|
-| **VECTOR** | ✅ 必选 | 向量化检索,支持语义搜索 | Qdrant / Elasticsearch |
-| **FULLTEXT** | ✅ 必选 | 全文检索,支持关键词搜索 | Elasticsearch |
-| **GRAPH** | ❌ 可选 | 知识图谱,提取实体和关系 | Neo4j / PostgreSQL |
-| **SUMMARY** | ❌ 可选 | 文档摘要,LLM 生成 | PostgreSQL (index_data) |
-| **VISION** | ❌ 可选 | 视觉理解,图片内容分析 | Qdrant (向量) + PG (metadata) |
-
-#### 5.2 索引构建流程
-
-```
-Celery Worker: reconcile_document_indexes 任务
- │
- ▼
-1. 扫描 DocumentIndex 表,找到需要处理的索引
- │
- ├─► PENDING 状态 + observed_version < version
- │ └─ 需要创建或更新索引
- │
- └─► DELETING 状态
- └─ 需要删除索引
- │
- ▼
-2. 按文档分组,逐个处理
- │
- ▼
-3. 对每个文档:
- │
- ├─► parse_document(解析文档)
- │ ├─ 从对象存储下载原始文件
- │ ├─ 调用 DocParser 解析
- │ └─ 返回 ParsedDocumentData
- │
- └─► 对每个索引类型:
- │
- ├─► create_index (创建/更新索引)
- │ │
- │ ├─ VECTOR 索引:
- │ │ ├─ 文档分块(Chunking)
- │ │ ├─ Embedding 模型生成向量
- │ │ └─ 写入 Qdrant
- │ │
- │ ├─ FULLTEXT 索引:
- │ │ ├─ 提取纯文本内容
- │ │ ├─ 按段落/章节分块
- │ │ └─ 写入 Elasticsearch
- │ │
- │ ├─ GRAPH 索引:
- │ │ ├─ 使用 LightRAG 提取实体
- │ │ ├─ 提取实体间关系
- │ │ └─ 写入 Neo4j/PostgreSQL
- │ │
- │ ├─ SUMMARY 索引:
- │ │ ├─ 调用 LLM 生成摘要
- │ │ └─ 保存到 DocumentIndex.index_data
- │ │
- │ └─ VISION 索引:
- │ ├─ 提取图片 Assets
- │ ├─ Vision LLM 理解图片内容
- │ ├─ 生成图片描述向量
- │ └─ 写入 Qdrant
- │
- └─► 更新索引状态
- ├─ 成功:CREATING → ACTIVE
- └─ 失败:CREATING → FAILED
- │
- ▼
-4. 更新文档总体状态
- │
- ├─ 所有索引都 ACTIVE → Document.status = COMPLETE
- ├─ 任一索引 FAILED → Document.status = FAILED
- └─ 部分索引仍在处理 → Document.status = RUNNING
-```
-
-#### 5.3 文档分块(Chunking)
-
-**分块策略**:
-- 递归字符分割(RecursiveCharacterTextSplitter)
-- 按自然段落、章节优先切分
-- 保留上下文重叠(Overlap)
-
-**分块参数**:
-```json
-{
- "chunk_size": 1000, // 每块最大字符数
- "chunk_overlap": 200, // 重叠字符数
- "separators": ["\n\n", "\n", " ", ""] // 分隔符优先级
-}
-```
-
-**分块结果存储**:
-```
-{document_path}/chunks/
- ├─ chunk_0.json: {"text": "...", "metadata": {...}}
- ├─ chunk_1.json: {"text": "...", "metadata": {...}}
- └─ ...
-```
-
-## 数据库设计
-
-### 表 1: document(文档元数据)
-
-**表结构**:
-
-| 字段名 | 类型 | 说明 | 索引 |
-|--------|------|------|------|
-| `id` | String(24) | 文档 ID,主键,格式:`doc{random_id}` | PK |
-| `name` | String(1024) | 文件名 | - |
-| `user` | String(256) | 用户 ID(支持多种 IDP) | ✅ Index |
-| `collection_id` | String(24) | 所属集合 ID | ✅ Index |
-| `status` | Enum | 文档状态(见下表) | ✅ Index |
-| `size` | BigInteger | 文件大小(字节) | - |
-| `content_hash` | String(64) | SHA-256 哈希(用于去重) | ✅ Index |
-| `object_path` | Text | 对象存储路径(已废弃,用 doc_metadata) | - |
-| `doc_metadata` | Text | 文档元数据(JSON 字符串) | - |
-| `gmt_created` | DateTime(tz) | 创建时间(UTC) | - |
-| `gmt_updated` | DateTime(tz) | 更新时间(UTC) | - |
-| `gmt_deleted` | DateTime(tz) | 删除时间(软删除) | ✅ Index |
-
-**唯一约束**:
-```sql
-UNIQUE INDEX uq_document_collection_name_active
- ON document (collection_id, name)
- WHERE gmt_deleted IS NULL;
-```
-- 同一集合内,活跃文档的名称不能重复
-- 已删除的文档不参与唯一性检查
-
-**文档状态枚举**(`DocumentStatus`):
-
-| 状态 | 说明 | 何时设置 | 可见性 |
-|------|------|----------|--------|
-| `UPLOADED` | 已上传到临时存储 | `upload_document` 接口 | 前端文件选择界面 |
-| `PENDING` | 等待索引构建 | `confirm_documents` 接口 | 文档列表(处理中) |
-| `RUNNING` | 索引构建中 | Celery 任务开始处理 | 文档列表(处理中) |
-| `COMPLETE` | 所有索引完成 | 所有索引变为 ACTIVE | 文档列表(可用) |
-| `FAILED` | 索引构建失败 | 任一索引失败 | 文档列表(失败) |
-| `DELETED` | 已删除 | `delete_document` 接口 | 不可见(软删除) |
-| `EXPIRED` | 临时文档过期 | 定时清理任务 | 不可见 |
-
-**文档元数据示例**(`doc_metadata` JSON 字段):
-```json
-{
- "object_path": "user-xxx/col_xxx/doc_xxx/original.pdf",
- "converted_path": "user-xxx/col_xxx/doc_xxx/converted.pdf",
- "processed_content_path": "user-xxx/col_xxx/doc_xxx/processed_content.md",
- "images": [
- "user-xxx/col_xxx/doc_xxx/images/page_0.png",
- "user-xxx/col_xxx/doc_xxx/images/page_1.png"
- ],
- "parser_used": "MinerUParser",
- "parse_duration_ms": 5420,
- "page_count": 50,
- "custom_field": "value"
-}
-```
-
-### 表 2: document_index(索引状态管理)
-
-**表结构**:
-
-| 字段名 | 类型 | 说明 | 索引 |
-|--------|------|------|------|
-| `id` | Integer | 自增 ID,主键 | PK |
-| `document_id` | String(24) | 关联的文档 ID | ✅ Index |
-| `index_type` | Enum | 索引类型(见下表) | ✅ Index |
-| `status` | Enum | 索引状态(见下表) | ✅ Index |
-| `version` | Integer | 索引版本号 | - |
-| `observed_version` | Integer | 已处理的版本号 | - |
-| `index_data` | Text | 索引数据(JSON),如摘要内容 | - |
-| `error_message` | Text | 错误信息(失败时) | - |
-| `gmt_created` | DateTime(tz) | 创建时间 | - |
-| `gmt_updated` | DateTime(tz) | 更新时间 | - |
-| `gmt_last_reconciled` | DateTime(tz) | 最后协调时间 | - |
-
-**唯一约束**:
-```sql
-UNIQUE CONSTRAINT uq_document_index
- ON document_index (document_id, index_type);
-```
-- 每个文档的每种索引类型只有一条记录
-
-**索引类型枚举**(`DocumentIndexType`):
-
-| 类型 | 值 | 说明 | 外部存储 |
-|------|-----|------|----------|
-| `VECTOR` | "VECTOR" | 向量索引 | Qdrant / Elasticsearch |
-| `FULLTEXT` | "FULLTEXT" | 全文索引 | Elasticsearch |
-| `GRAPH` | "GRAPH" | 知识图谱 | Neo4j / PostgreSQL |
-| `SUMMARY` | "SUMMARY" | 文档摘要 | PostgreSQL (index_data) |
-| `VISION` | "VISION" | 视觉索引 | Qdrant + PostgreSQL |
-
-**索引状态枚举**(`DocumentIndexStatus`):
-
-| 状态 | 说明 | 何时设置 |
-|------|------|----------|
-| `PENDING` | 等待处理 | `confirm_documents` 创建索引记录 |
-| `CREATING` | 创建中 | Celery Worker 开始处理 |
-| `ACTIVE` | 就绪可用 | 索引构建成功 |
-| `DELETING` | 标记删除 | `delete_document` 接口 |
-| `DELETION_IN_PROGRESS` | 删除中 | Celery Worker 正在删除 |
-| `FAILED` | 失败 | 索引构建失败 |
-
-**版本控制机制**:
-- `version`:期望的索引版本(每次文档更新时 +1)
-- `observed_version`:已处理的版本号
-- `version > observed_version` 时,触发索引更新
-
-**协调器(Reconciler)**:
-```python
-# 查询需要处理的索引
-SELECT * FROM document_index
-WHERE status = 'PENDING'
- AND observed_version < version;
-
-# 处理后更新
-UPDATE document_index
-SET status = 'ACTIVE',
- observed_version = version,
- gmt_last_reconciled = NOW()
-WHERE id = ?;
-```
-
-### 表关系图
-
-```
-┌─────────────────────────────────┐
-│ collection │
-│ ───────────────────────────── │
-│ id (PK) │
-│ name │
-│ config (JSON) │
-│ status │
-│ ... │
-└────────────┬────────────────────┘
- │ 1:N
- ▼
-┌─────────────────────────────────┐
-│ document │
-│ ───────────────────────────── │
-│ id (PK) │
-│ collection_id (FK) │◄──── 唯一约束: (collection_id, name)
-│ name │
-│ user │
-│ status (Enum) │
-│ size │
-│ content_hash (SHA-256) │
-│ doc_metadata (JSON) │
-│ gmt_created │
-│ gmt_deleted │
-│ ... │
-└────────────┬────────────────────┘
- │ 1:N
- ▼
-┌─────────────────────────────────┐
-│ document_index │
-│ ───────────────────────────── │
-│ id (PK) │
-│ document_id (FK) │◄──── 唯一约束: (document_id, index_type)
-│ index_type (Enum) │
-│ status (Enum) │
-│ version │
-│ observed_version │
-│ index_data (JSON) │
-│ error_message │
-│ gmt_last_reconciled │
-│ ... │
-└─────────────────────────────────┘
-```
-
-## 状态机与生命周期
-
-### 文档状态转换
-
-```
- ┌─────────────────────────────────────────────┐
- │ │
- │ ▼
- [上传文件] ──► UPLOADED ──► [确认] ──► PENDING ──► RUNNING ──► COMPLETE
- │ │
- │ ▼
- │ FAILED
- │ │
- │ ▼
- └──────► [删除] ──────────────► DELETED
- │
- ┌───────────────────────────────────┘
- │
- ▼
- EXPIRED (定时清理未确认的文档)
-```
-
-**关键转换**:
-1. **UPLOADED → PENDING**:用户点击"保存到集合"
-2. **PENDING → RUNNING**:Celery Worker 开始处理
-3. **RUNNING → COMPLETE**:所有索引都成功
-4. **RUNNING → FAILED**:任一索引失败
-5. **任何状态 → DELETED**:用户删除文档
-
-### 索引状态转换
-
-```
- [创建索引记录] ──► PENDING ──► CREATING ──► ACTIVE
- │
- ▼
- FAILED
- │
- ▼
- ┌──────────► PENDING (重试)
- │
- [删除请求] ──────┼──────────► DELETING ──► DELETION_IN_PROGRESS ──► (记录删除)
- │
- └──────────► (直接删除记录,如果 PENDING/FAILED)
-```
-
-## 异步任务调度(Celery)
-
-### 任务定义
-
-**主任务**:`reconcile_document_indexes`
-- 触发时机:
- - `confirm_documents` 接口调用后
- - 定时任务(每 30 秒)
- - 手动触发(管理界面)
-- 功能:扫描 `document_index` 表,处理需要协调的索引
-
-**子任务**:
-- `parse_document_task`:解析文档内容
-- `create_vector_index_task`:创建向量索引
-- `create_fulltext_index_task`:创建全文索引
-- `create_graph_index_task`:创建知识图谱索引
-- `create_summary_index_task`:创建摘要索引
-- `create_vision_index_task`:创建视觉索引
-
-### 任务调度策略
-
-**并发控制**:
-- 每个 Worker 最多同时处理 N 个文档(默认 4)
-- 每个文档的多个索引可以并行构建
-- 使用 Celery 的 `task_acks_late=True` 确保任务不丢失
-
-**失败重试**:
-- 最多重试 3 次
-- 指数退避(1分钟 → 5分钟 → 15分钟)
-- 3 次失败后标记为 `FAILED`
-
-**幂等性**:
-- 所有任务支持重复执行
-- 使用 `observed_version` 机制避免重复处理
-- 相同输入产生相同输出
-
-## 设计特点与优势
-
-### 1. 两阶段提交设计
-
-**优势**:
-- ✅ **用户体验更好**:快速上传响应,不阻塞用户操作
-- ✅ **选择性添加**:批量上传后可选择性确认部分文件
-- ✅ **资源控制合理**:未确认的文档不构建索引,不消耗配额
-- ✅ **故障恢复友好**:临时文档可以定期清理,不影响业务
-
-**状态隔离**:
-```
-临时状态(UPLOADED):
- - 不计入配额
- - 不触发索引
- - 可以被自动清理
-
-正式状态(PENDING/RUNNING/COMPLETE):
- - 计入配额
- - 触发索引构建
- - 不会被自动清理
-```
-
-### 2. 幂等性设计
-
-**文件级别幂等**:
-- SHA-256 哈希去重
-- 相同文件多次上传返回同一 `document_id`
-- 避免存储空间浪费
-
-**接口级别幂等**:
-- `upload_document`:重复上传返回已存在文档
-- `confirm_documents`:重复确认不会创建重复索引
-- `delete_document`:重复删除返回成功(软删除)
-
-### 3. 多租户隔离
-
-**存储隔离**:
-```
-user-{user_A}/... # 用户 A 的文件
-user-{user_B}/... # 用户 B 的文件
-```
-
-**数据库隔离**:
-- 所有查询都带 `user` 字段过滤
-- 集合级别的权限控制(`collection.user`)
-- 软删除支持(`gmt_deleted`)
-
-### 4. 灵活的存储后端
-
-**统一接口**:
-```python
-AsyncObjectStore:
- - put(path, data)
- - get(path)
- - delete_objects_by_prefix(prefix)
-```
-
-**运行时切换**:
-- 通过环境变量切换 Local/S3
-- 无需修改业务代码
-- 支持自定义存储后端(实现接口即可)
-
-### 5. 事务一致性
-
-**数据库 + 对象存储的两阶段提交**:
-```python
-async with transaction:
- # 1. 创建数据库记录
- document = create_document_record()
-
- # 2. 上传到对象存储
- await object_store.put(path, data)
-
- # 3. 更新元数据
- document.doc_metadata = json.dumps(metadata)
-
- # 所有操作成功才提交,任一失败则回滚
-```
-
-**失败处理**:
-- 数据库记录创建失败:不上传文件
-- 文件上传失败:回滚数据库记录
-- 元数据更新失败:回滚前面的操作
-
-### 6. 可观测性
-
-**审计日志**:
-- `@audit` 装饰器记录所有文档操作
-- 包含:用户、时间、操作类型、资源 ID
-
-**任务追踪**:
-- `gmt_last_reconciled`:最后处理时间
-- `error_message`:失败原因
-- Celery 任务 ID:关联日志追踪
-
-**监控指标**:
-- 文档上传速率
-- 索引构建耗时
-- 失败率统计
-
-## 性能优化
-
-### 1. 异步处理
-
-**上传不阻塞**:
-- 文件上传到对象存储后立即返回
-- 索引构建在 Celery 中异步执行
-- 前端通过轮询或 WebSocket 获取进度
-
-### 2. 批量操作
-
-**批量确认**:
-```python
-confirm_documents(document_ids=[id1, id2, ..., idN])
-```
-- 一次事务处理多个文档
-- 批量创建索引记录
-- 减少数据库往返
-
-### 3. 缓存策略
-
-**解析结果缓存**:
-- 解析后的内容保存到 `processed_content.md`
-- 后续索引重建可直接读取,无需重新解析
-
-**分块结果缓存**:
-- 分块结果保存到 `chunks/` 目录
-- 向量索引重建可复用分块结果
-
-### 4. 并行索引构建
-
-**多索引并行**:
-```python
-# VECTOR、FULLTEXT、GRAPH 可以并行构建
-await asyncio.gather(
- create_vector_index(),
- create_fulltext_index(),
- create_graph_index()
-)
-```
-
-## 错误处理
-
-### 常见异常
-
-| 异常类型 | HTTP 状态码 | 触发场景 | 处理建议 |
-|---------|------------|----------|----------|
-| `ResourceNotFoundException` | 404 | 集合/文档不存在 | 检查 ID 是否正确 |
-| `CollectionInactiveException` | 400 | 集合未激活 | 等待集合初始化完成 |
-| `DocumentNameConflictException` | 409 | 同名不同内容 | 重命名文件或删除旧文档 |
-| `QuotaExceededException` | 429 | 配额超限 | 升级套餐或删除旧文档 |
-| `InvalidFileTypeException` | 400 | 不支持的文件类型 | 查看支持的文件类型列表 |
-| `FileSizeTooLargeException` | 413 | 文件过大 | 分割文件或压缩 |
-
-### 异常传播
-
-```
-Service Layer 抛出异常
- │
- ▼
-View Layer 捕获并转换
- │
- ▼
-Exception Handler 统一处理
- │
- ▼
-返回标准 JSON 响应:
-{
- "error_code": "QUOTA_EXCEEDED",
- "message": "Document count limit exceeded",
- "details": {
- "limit": 1000,
- "current": 1000
- }
-}
-```
-
-## 相关文件索引
-
-### 核心实现
-
-- **View 层**:`aperag/views/collections.py` - HTTP 接口定义
-- **Service 层**:`aperag/service/document_service.py` - 业务逻辑
-- **数据库模型**:`aperag/db/models.py` - Document, DocumentIndex 表定义
-- **数据库操作**:`aperag/db/ops.py` - CRUD 操作封装
-
-### 对象存储
-
-- **接口定义**:`aperag/objectstore/base.py` - AsyncObjectStore 抽象类
-- **Local 实现**:`aperag/objectstore/local.py` - 本地文件系统存储
-- **S3 实现**:`aperag/objectstore/s3.py` - S3 兼容存储
-
-### 文档解析
-
-- **主控制器**:`aperag/docparser/doc_parser.py` - DocParser
-- **Parser 实现**:
- - `aperag/docparser/mineru_parser.py` - MinerU PDF 解析
- - `aperag/docparser/mineru_parser.py` - MinerU 文档解析
- - `aperag/docparser/markitdown_parser.py` - MarkItDown 通用解析
- - `aperag/docparser/image_parser.py` - 图片 OCR
- - `aperag/docparser/audio_parser.py` - 音频转录
-- **文档处理**:`aperag/index/document_parser.py` - 解析流程编排
-
-### 索引构建
-
-- **索引管理**:`aperag/index/manager.py` - DocumentIndexManager
-- **向量索引**:`aperag/index/vector_index.py` - VectorIndexer
-- **全文索引**:`aperag/index/fulltext_index.py` - FulltextIndexer
-- **知识图谱**:`aperag/index/graph_index.py` - GraphIndexer
-- **文档摘要**:`aperag/index/summary_index.py` - SummaryIndexer
-- **视觉索引**:`aperag/index/vision_index.py` - VisionIndexer
-
-### 任务调度
-
-- **任务定义**:`config/celery_tasks.py` - Celery 任务注册
-- **协调器**:`aperag/tasks/reconciler.py` - DocumentIndexReconciler
-- **文档任务**:`aperag/tasks/document.py` - DocumentIndexTask
-
-### 前端实现
-
-- **文档列表**:`web/src/app/workspace/collections/[collectionId]/documents/page.tsx`
-- **文档上传**:`web/src/app/workspace/collections/[collectionId]/documents/upload/document-upload.tsx`
-
-## 总结
-
-ApeRAG 的文档上传模块采用**两阶段提交 + 多 Parser 链式调用 + 多索引并行构建**的架构设计:
-
-**核心特性**:
-1. ✅ **两阶段提交**:上传(临时存储)→ 确认(正式添加),提供更好的用户体验
-2. ✅ **SHA-256 去重**:避免重复文档,支持幂等上传
-3. ✅ **灵活存储后端**:Local/S3 可配置切换,统一接口抽象
-4. ✅ **多 Parser 架构**:支持 MinerU、MarkItDown 等多种解析器
-5. ✅ **格式自动转换**:PDF→图片、音频→文本、图片→OCR 文本
-6. ✅ **多索引协调**:向量、全文、图谱、摘要、视觉五种索引类型
-7. ✅ **配额管理**:确认阶段才扣除配额,合理控制资源
-8. ✅ **异步处理**:Celery 任务队列,不阻塞用户操作
-9. ✅ **事务一致性**:数据库 + 对象存储的两阶段提交
-10. ✅ **可观测性**:审计日志、任务追踪、错误信息完整记录
-
-这种设计既保证了高性能和可扩展性,又支持复杂的文档处理场景(多格式、多语言、多模态),同时具有良好的容错能力和用户体验。
diff --git a/web/docs/zh-CN/design/graph_index_creation.md b/web/docs/zh-CN/design/graph_index_creation.md
deleted file mode 100644
index 255f6433a..000000000
--- a/web/docs/zh-CN/design/graph_index_creation.md
+++ /dev/null
@@ -1,1084 +0,0 @@
----
-title: 图索引构建流程
-description: ApeRAG 知识图谱索引构建的完整流程与核心技术
-keywords: 知识图谱, Graph Index, 实体提取, 关系抽取, 并发优化
-position: 2
----
-
-# 图索引构建流程
-
-## 1. 什么是图索引
-
-图索引(Graph Index)是 ApeRAG 的核心特色功能,它能从非结构化文本中自动提取出结构化的知识图谱。
-
-### 1.1 一个简单的例子
-
-想象一下,你有一份关于公司组织架构的文档,里面提到:
-
-> "张三是数据库团队的负责人,他擅长 PostgreSQL 和 MySQL。李四在前端团队工作,经常和张三的团队协作开发后台管理系统。"
-
-**从文档到知识图谱的转换**:
-
-```mermaid
-flowchart LR
- subgraph Input[📄 输入文档]
- Doc["张三是数据库团队的负责人,
他擅长 PostgreSQL 和 MySQL。
李四在前端团队工作..."]
- end
-
- subgraph Process[🔄 图索引处理]
- Extract[提取实体和关系]
- end
-
- subgraph Output[🕸️ 知识图谱]
- direction TB
- A[张三
人物] -->|负责| B[数据库团队
组织]
- A -->|擅长| C[PostgreSQL
技术]
- A -->|擅长| D[MySQL
技术]
- E[李四
人物] -->|属于| F[前端团队
组织]
- E -->|协作| A
- end
-
- Input --> Process
- Process --> Output
-
- style Input fill:#e3f2fd
- style Process fill:#fff59d
- style Output fill:#c8e6c9
-```
-
-传统的向量检索只能找到"语义相似"的段落,但无法回答这些问题:
-- 张三负责什么?
-- 张三和李四是什么关系?
-- 数据库团队都有哪些技术栈?
-
-**图索引能做到**:精确回答这些需要理解"关系"的问题,因为它把隐藏在文本中的知识关系显性化了。
-
-### 1.2 核心价值
-
-与传统检索方式相比,图索引提供了独特的能力:
-
-| 能力 | 向量检索 | 全文检索 | 图索引 |
-|------|---------|---------|--------|
-| 语义相似搜索 | ✅ 强 | ❌ 弱 | ✅ 强 |
-| 精确关键词匹配 | ❌ 弱 | ✅ 强 | ✅ 中 |
-| 关系查询 | ❌ 不支持 | ❌ 不支持 | ✅ 强 |
-| 多跳推理 | ❌ 不支持 | ❌ 不支持 | ✅ 支持 |
-| 适用问题 | "如何优化性能" | "PostgreSQL 配置" | "张三和李四的关系" |
-
-**核心优势**:图索引让 AI 能够"理解"知识之间的关联,而不仅仅是文本的相似度。
-
-## 2. 图索引能解决什么问题
-
-图索引特别擅长处理那些需要"理解关系"的场景。让我们看看它在实际工作中的应用。
-
-### 2.1 企业知识管理
-
-**场景**:公司有大量文档,包括组织架构、项目资料、技术文档等。
-
-**图索引的价值**:
-
-- 📊 **组织关系**:"张三的团队有哪些人?" → 快速找到团队成员
-- 🔗 **协作关系**:"谁和张三合作过?" → 发现工作网络
-- 🛠️ **技能图谱**:"谁擅长 PostgreSQL?" → 定位技术专家
-- 📁 **项目历史**:"张三参与过哪些项目?" → 追溯项目经验
-
-**实际效果**:
-
-```
-问:"数据库团队负责人是谁?"
-传统检索:返回包含"数据库团队"和"负责人"的所有段落(可能几十条)
-图索引:直接返回"张三" + 相关背景信息
-```
-
-### 2.2 研究与学习
-
-**场景**:分析学术论文、技术文档,理解知识脉络。
-
-**图索引的价值**:
-
-- 👥 **作者网络**:"这个作者和谁合作过?" → 发现研究团队
-- 📖 **引用关系**:"这篇论文引用了哪些文献?" → 追溯研究脉络
-- 🔬 **技术演进**:"这个技术是如何发展的?" → 理解技术历史
-- 💡 **概念关联**:"A 技术和 B 技术有什么关系?" → 连接知识点
-
-### 2.3 产品与服务
-
-**场景**:产品文档、用户手册、API 文档等。
-
-**图索引的价值**:
-
-- ⚙️ **功能依赖**:"启用 A 功能需要先配置什么?" → 理解依赖关系
-- 🔧 **配置关联**:"这个配置项会影响哪些功能?" → 避免误操作
-- 🐛 **问题诊断**:"出现 X 错误可能是什么原因?" → 快速定位
-- 📚 **API 关系**:"这个 API 通常和哪些 API 一起使用?" → 学习最佳实践
-
-### 2.4 对比:什么时候用图索引
-
-不同的问题适合不同的检索方式:
-
-| 问题类型 | 示例 | 最佳方案 |
-|---------|------|---------|
-| **概念理解** | "什么是 RAG?" | 向量检索 |
-| **精确查找** | "PostgreSQL 配置文件路径" | 全文检索 |
-| **关系查询** | "张三和李四什么关系?" | 图索引 ✨ |
-| **多跳推理** | "张三团队用的技术栈" | 图索引 ✨ |
-| **知识追溯** | "这个功能依赖哪些模块?" | 图索引 ✨ |
-
-**最佳实践**:ApeRAG 同时支持向量检索、全文检索和图索引,可以根据问题类型智能选择或组合使用。
-
-## 3. 构建流程概览
-
-当你上传一个文档并启用图索引后,ApeRAG 会自动完成以下步骤。这里先给出一个简单的概览,具体细节在后面章节详细介绍。
-
-### 3.1 五个关键步骤
-
-```mermaid
-flowchart TB
- subgraph Step1["1️⃣ 文档分块"]
- A1[原始文档] --> A2[智能分块]
- A2 --> A3[生成 Chunks]
- end
-
- subgraph Step2["2️⃣ 实体关系提取"]
- B1[Chunks] --> B2[调用 LLM]
- B2 --> B3[识别实体]
- B2 --> B4[识别关系]
- end
-
- subgraph Step3["3️⃣ 连通分量分析"]
- C1[实体关系网络] --> C2[BFS 算法]
- C2 --> C3[分组]
- end
-
- subgraph Step4["4️⃣ 并发合并"]
- D1[分组 1] --> D2[实体去重]
- D3[分组 2] --> D4[实体去重]
- D5[分组 N] --> D6[实体去重]
- D2 --> D7[关系聚合]
- D4 --> D7
- D6 --> D7
- end
-
- subgraph Step5["5️⃣ 多存储写入"]
- E1[图数据库]
- E2[向量数据库]
- E3[文本存储]
- end
-
- A3 --> B1
- B3 --> C1
- B4 --> C1
- C3 --> D1
- C3 --> D3
- C3 --> D5
- D7 --> E1
- D7 --> E2
- A3 --> E3
-
- style Step1 fill:#e3f2fd
- style Step2 fill:#fff3e0
- style Step3 fill:#f3e5f5
- style Step4 fill:#e8f5e9
- style Step5 fill:#fce4ec
-```
-
-**简单来说**,就是:文档分块 → 提取实体关系 → 智能分组 → 并发合并 → 写入存储。
-
-整个过程完全自动化,你只需要上传文档,系统会自动完成所有工作。
-
-### 3.2 处理时间参考
-
-不同规模的文档,处理时间大致如下:
-
-| 文档大小 | 实体数量 | 处理时间 | 说明 |
-|---------|---------|---------|------|
-| 小型(< 5 页) | ~50 个 | 10-30 秒 | 公司通知、会议纪要 |
-| 中型(10-50 页) | ~200 个 | 1-3 分钟 | 技术文档、产品手册 |
-| 大型(100+ 页) | ~1000 个 | 5-15 分钟 | 研究报告、书籍 |
-
-**影响因素**:
-- LLM 响应速度(主要瓶颈)
-- 文档复杂度(表格、图片多会慢一些)
-- 并发设置(可以通过配置提速)
-
-> 💡 **提示**:处理是异步的,你可以上传多个文档,系统会并行处理。
-
-### 3.3 实时进度查看
-
-你可以随时查看文档的处理进度:
-
-```
-文档状态:处理中
-- ✅ 文档解析:完成
-- ✅ 文档分块:完成(生成 25 个 chunks)
-- 🔄 实体提取:进行中(15/25)
-- ⏳ 关系提取:等待中
-- ⏳ 图谱构建:等待中
-```
-
-处理完成后,文档状态会变为"活跃",此时就可以进行图谱查询了。
-
-## 4. 详细构建流程
-
-前面介绍了图索引能做什么以及整体流程概览。如果你想了解更多技术细节,这一章会详细介绍每个步骤的具体实现。
-
-> 💡 **阅读建议**:如果你只是想了解图索引的基本概念和用法,可以跳过这一章,直接看第 8 章的实际应用场景。
-
-### 4.1 文档分块
-
-第一步是把长文档切成合适大小的块(chunks)。
-
-**为什么要分块?**
-- LLM 有输入长度限制(通常几千到几万 tokens)
-- 块太大:提取质量下降,LLM 容易"遗漏"信息
-- 块太小:丢失上下文,无法理解完整语义
-
-**智能分块策略**:
-
-```mermaid
-flowchart LR
- Doc[长文档] --> Check{检查大小}
- Check -->|小于 1200 tokens| Keep[保持完整]
- Check -->|大于 1200 tokens| Split[智能分割]
-
- Split --> By1[按段落分]
- By1 --> Check2{还是太大?}
- Check2 -->|是| By2[按句子分]
- Check2 -->|否| Done[完成]
- By2 --> Check3{还是太大?}
- Check3 -->|是| By3[按字符分]
- Check3 -->|否| Done
- By3 --> Done
-
- style Doc fill:#e1f5ff
- style Split fill:#ffccbc
- style Done fill:#c5e1a5
-```
-
-**分块参数**:
-- 默认大小:1200 tokens(约 800-1000 个中文字)
-- 重叠大小:100 tokens(保证上下文连续)
-- 优先级:段落 > 句子 > 字符
-
-### 4.2 实体关系提取
-
-使用 LLM 从每个 chunk 中识别实体和关系。
-
-**提取过程**:
-
-```mermaid
-sequenceDiagram
- participant C as Chunk
- participant L as LLM
- participant R as 结果
-
- C->>L: "张三是数据库团队负责人..."
- L->>R: 实体: [张三(人物), 数据库团队(组织)]
- L->>R: 关系: [张三-负责->数据库团队]
-
- C->>L: "张三擅长 PostgreSQL..."
- L->>R: 实体: [张三(人物), PostgreSQL(技术)]
- L->>R: 关系: [张三-擅长->PostgreSQL]
-```
-
-**并发优化**:多个 chunks 可以同时调用 LLM,默认并发 20 个请求。
-
-### 4.3 连通分量分析
-
-把实体关系网络分成独立的子图,实现并行处理。
-
-**为什么需要这一步?**
-
-技术团队的实体和财务部门的实体之间没有连接,可以完全并行处理!
-
-```mermaid
-graph LR
- subgraph 分量1[连通分量 1 - 技术团队]
- A1[张三] -->|负责| A2[数据库团队]
- A1 -->|擅长| A3[PostgreSQL]
- A4[李四] -->|协作| A1
- end
-
- subgraph 分量2[连通分量 2 - 财务部门]
- B1[王五] -->|属于| B2[财务部]
- B3[赵六] -->|协作| B1
- end
-
- style 分量1 fill:#bbdefb
- style 分量2 fill:#c5e1a5
-```
-
-**性能提升**:3 个独立分量 = 3 倍加速!
-
-### 4.4 并发合并
-
-同名实体需要去重,相同关系需要聚合。
-
-```mermaid
-flowchart TD
- subgraph Before["合并前"]
- A1["张三
数据库负责人"]
- A2["张三
擅长 PostgreSQL"]
- A3["张三
带领团队"]
- end
-
- Merge[智能合并]
-
- subgraph After["合并后"]
- B1["张三
数据库团队负责人,
擅长 PostgreSQL,
带领团队完成多个项目"]
- end
-
- A1 --> Merge
- A2 --> Merge
- A3 --> Merge
- Merge --> B1
-
- style Before fill:#ffccbc
- style After fill:#c5e1a5
-```
-
-**细粒度锁**:只锁定正在合并的实体,其他实体可以并发处理。
-
-### 4.5 多存储写入
-
-知识图谱写入三个存储系统:
-
-```mermaid
-flowchart LR
- KG[知识图谱] --> G[图数据库
图查询]
- KG --> V[向量数据库
语义搜索]
- KG --> T[文本存储
全文检索]
-
- style KG fill:#e1f5ff
- style G fill:#bbdefb
- style V fill:#c5e1a5
- style T fill:#ffccbc
-```
-
-不同存储支持不同类型的查询,互相补充。
-
-## 5. 核心技术设计
-
-这一章介绍 ApeRAG 图索引的核心技术设计,包括数据隔离、并发控制等。
-
-> 💡 **阅读建议**:这些是系统架构和实现细节,主要面向开发者和技术决策者。
-
-### 5.1 workspace 数据隔离
-
-每个 Collection 拥有独立的命名空间,实现完全的数据隔离。
-
-**命名规范**:
-
-```python
-# 实体命名
-entity:{entity_name}:{workspace}
-# 示例
-entity:张三:collection_abc123
-
-# 关系命名
-relationship:{source}:{target}:{workspace}
-# 示例
-relationship:张三:数据库团队:collection_abc123
-```
-
-**隔离效果**:
-
-```mermaid
-graph TB
- subgraph Collection_A[Collection A - 公司文档]
- A1[entity:张三:A] --> A2[entity:数据库团队:A]
- end
-
- subgraph Collection_B[Collection B - 学校文档]
- B1[entity:张三:B] --> B2[entity:计算机系:B]
- end
-
- style Collection_A fill:#bbdefb
- style Collection_B fill:#c5e1a5
-```
-
-两个 Collection 中的"张三"完全独立,互不干扰!
-
-### 5.2 无状态实例管理
-
-每个处理任务创建独立的图索引实例,处理完成后销毁。
-
-**生命周期管理**:
-
-```mermaid
-sequenceDiagram
- participant C as Celery Task
- participant M as Manager
- participant R as Graph Index Instance
- participant S as Storage
-
- C->>M: process_document()
- M->>R: create_instance()
- R->>S: 初始化存储连接
- R->>R: 处理文档
- R->>S: 写入数据
- R-->>M: 返回结果
- M-->>C: 任务完成
- Note over R: 实例被销毁,资源释放
-```
-
-**优势**:
-
-- ✅ 零状态污染:每个任务独立,不会互相干扰
-- ✅ 自动资源管理:实例销毁时自动释放资源
-- ✅ 易于扩展:可以同时运行多个 Worker
-
-### 5.3 连通分量并发优化
-
-通过图拓扑分析,实现智能并发处理。
-
-**算法原理**:
-
-```mermaid
-graph TB
- subgraph Input[输入:实体关系网络]
- I1[实体 1] --> I2[实体 2]
- I2 --> I3[实体 3]
-
- I4[实体 4] --> I5[实体 5]
-
- I6[实体 6]
- end
-
- Algorithm[BFS 算法]
-
- subgraph Output[输出:3 个连通分量]
- O1[分量 1
3 个实体]
- O2[分量 2
2 个实体]
- O3[分量 3
1 个实体]
- end
-
- Input --> Algorithm
- Algorithm --> Output
-
- style Input fill:#ffccbc
- style Algorithm fill:#fff59d
- style Output fill:#c5e1a5
-```
-
-**性能提升**:3 个分量并发处理 = 3 倍加速!
-
-### 5.4 细粒度并发控制
-
-实现实体级别的精确锁定:
-
-**锁的层次**:
-
-```mermaid
-graph TD
- A[全局锁 - 传统方案] -->|太粗| B[所有实体串行处理]
-
- C[实体锁 - ApeRAG] -->|刚好| D[只锁定需要合并的实体]
-
- style A fill:#ffccbc
- style B fill:#ffccbc
- style C fill:#c5e1a5
- style D fill:#c5e1a5
-```
-
-**锁策略**:
-1. 提取阶段无锁:完全并行
-2. 合并阶段加锁:只锁需要的实体
-3. 排序获取锁:避免死锁
-
-### 5.5 智能摘要生成
-
-自动压缩过长的描述内容:
-
-```python
-if len(description) > 2000 tokens:
- summary = await llm_summarize(description)
-else:
- summary = description
-```
-
-**效果**:2500 tokens 压缩到 200 tokens,保留核心信息。
-
-### 5.6 多存储后端支持
-
-ApeRAG 支持两种图数据库:Neo4j 和 PostgreSQL。
-
-**如何选择?**
-
-| 场景 | 推荐方案 | 原因 |
-|------|---------|------|
-| **小规模**(< 10万实体) | PostgreSQL | 运维简单,成本低 |
-| **中等规模**(10-100万) | PostgreSQL 或 Neo4j | 根据查询复杂度选择 |
-| **大规模**(> 100万) | Neo4j | 图查询性能更好 |
-| **预算有限** | PostgreSQL | 无需额外部署 |
-| **复杂图算法** | Neo4j | 内置图算法支持 |
-
-**切换方式**:
-
-```bash
-# 使用 PostgreSQL(默认)
-export GRAPH_INDEX_GRAPH_STORAGE=PGOpsSyncGraphStorage
-
-# 使用 Neo4j
-export GRAPH_INDEX_GRAPH_STORAGE=Neo4JSyncStorage
-```
-
-**性能对比**:
-
-| 操作 | PostgreSQL | Neo4j |
-|------|-----------|-------|
-| **简单查询**(1-2跳) | 快 | 快 |
-| **复杂查询**(3+跳) | 中 | 快 |
-| **批量写入** | 快 | 中 |
-| **图算法** | 需要自己实现 | 内置支持 |
-
-## 6. 完整数据流
-
-整个图索引构建过程是一个数据转换流水线,从非结构化文本到结构化知识图谱:
-
-```mermaid
-flowchart TD
- A[原始文档] --> B[清理预处理]
- B --> C[智能分块]
- C --> D[Chunks]
-
- D --> E[LLM 并发提取]
- E --> F[原始实体列表]
- E --> G[原始关系列表]
-
- F --> H[构建邻接图]
- G --> H
- H --> I[BFS 发现连通分量]
- I --> J[分组并发处理]
-
- J --> K[实体去重合并]
- J --> L[关系聚合]
-
- K --> M{描述长度检查}
- M -->|过长| N[LLM 摘要]
- M -->|适中| O[保留原文]
- N --> P[最终实体]
- O --> P
-
- L --> Q{描述长度检查}
- Q -->|过长| R[LLM 摘要]
- Q -->|适中| S[保留原文]
- R --> T[最终关系]
- S --> T
-
- P --> U[图数据库]
- P --> V[向量数据库]
- T --> U
- T --> V
- D --> W[文本存储]
-
- U --> X[知识图谱完成]
- V --> X
- W --> X
-
- style A fill:#e1f5ff
- style E fill:#fff59d
- style I fill:#f3e5f5
- style J fill:#c5e1a5
- style X fill:#c8e6c9
-```
-
-### 数据转换示例
-
-让我们用一个具体例子,看看数据是如何一步步转换的:
-
-**输入文档**:
-
-```text
-张三是数据库团队的负责人,他擅长 PostgreSQL 和 MySQL。
-李四在前端团队工作,经常和张三的团队协作开发后台管理系统。
-王五是财务部的会计,负责公司的财务报表。
-```
-
-**Step 1: 分块**
-
-```json
-[
- {
- "chunk_id": "chunk-001",
- "content": "张三是数据库团队的负责人,他擅长 PostgreSQL 和 MySQL。",
- "tokens": 25
- },
- {
- "chunk_id": "chunk-002",
- "content": "李四在前端团队工作,经常和张三的团队协作开发后台管理系统。",
- "tokens": 28
- },
- {
- "chunk_id": "chunk-003",
- "content": "王五是财务部的会计,负责公司的财务报表。",
- "tokens": 20
- }
-]
-```
-
-**Step 2: 实体关系提取**
-
-```json
-{
- "entities": [
- {"name": "张三", "type": "人物", "source": "chunk-001"},
- {"name": "数据库团队", "type": "组织", "source": "chunk-001"},
- {"name": "PostgreSQL", "type": "技术", "source": "chunk-001"},
- {"name": "MySQL", "type": "技术", "source": "chunk-001"},
- {"name": "李四", "type": "人物", "source": "chunk-002"},
- {"name": "前端团队", "type": "组织", "source": "chunk-002"},
- {"name": "王五", "type": "人物", "source": "chunk-003"},
- {"name": "财务部", "type": "组织", "source": "chunk-003"}
- ],
- "relationships": [
- {"source": "张三", "target": "数据库团队", "relation": "负责"},
- {"source": "张三", "target": "PostgreSQL", "relation": "擅长"},
- {"source": "张三", "target": "MySQL", "relation": "擅长"},
- {"source": "李四", "target": "前端团队", "relation": "属于"},
- {"source": "李四", "target": "张三", "relation": "协作"},
- {"source": "王五", "target": "财务部", "relation": "属于"}
- ]
-}
-```
-
-**Step 3: 连通分量分析**
-
-```
-连通分量 1(技术部门):
-- 实体:张三、李四、数据库团队、前端团队、PostgreSQL、MySQL
-- 关系:6 条
-
-连通分量 2(财务部门):
-- 实体:王五、财务部
-- 关系:1 条
-```
-
-**Step 4: 并发合并**
-
-两个分量可以并行处理!
-
-**Step 5: 最终知识图谱**
-
-```mermaid
-graph LR
- subgraph 技术部门
- 张三 -->|负责| 数据库团队
- 张三 -->|擅长| PostgreSQL
- 张三 -->|擅长| MySQL
- 李四 -->|属于| 前端团队
- 李四 -->|协作| 张三
- end
-
- subgraph 财务部门
- 王五 -->|属于| 财务部
- end
-
- style 技术部门 fill:#bbdefb
- style 财务部门 fill:#c5e1a5
-```
-
-### 性能优化特性
-
-1. **细粒度并发控制**
- - 实体级别的锁:`entity:张三:collection_abc`
- - 只在合并时加锁,提取时完全并行
-
-2. **连通分量并发**
- - 技术部门和财务部门可以并行处理
- - 零锁竞争,充分利用多核 CPU
-
-3. **智能摘要**
- - 描述 < 2000 tokens:保留原文
- - 描述 > 2000 tokens:LLM 摘要压缩
-
-## 7. 性能优化策略
-
-### 7.1 并发度控制
-
-图索引构建涉及大量的 LLM 调用和数据库操作,需要合理控制并发度。
-
-**并发层次**:
-
-```mermaid
-graph TB
- A[文档级并发] --> B[Chunk 级并发]
- B --> C[连通分量级并发]
- C --> D[实体级并发]
-
- A1[Celery Workers
多个文档同时处理] --> A
- B1[LLM 并发调用
多个 chunks 同时提取] --> B
- C1[分量并行合并
多个分量同时处理] --> C
- D1[实体并发合并
不同实体同时合并] --> D
-
- style A fill:#e3f2fd
- style B fill:#fff3e0
- style C fill:#f3e5f5
- style D fill:#e8f5e9
-```
-
-**并发参数配置**:
-
-| 参数 | 默认值 | 说明 |
-|------|--------|------|
-| `llm_model_max_async` | 20 | LLM 并发调用数 |
-| `embedding_func_max_async` | 16 | Embedding 并发调用数 |
-| `max_batch_size` | 32 | 批量处理大小 |
-
-**调优建议**:
-
-```python
-# 场景 1:LLM API 限流严格
-llm_model_max_async = 5 # 降低并发,避免触发限流
-
-# 场景 2:性能充足,想提速
-llm_model_max_async = 50 # 提高并发,加快处理速度
-
-# 场景 3:内存有限
-max_batch_size = 16 # 减小批量大小,降低内存占用
-```
-
-### 7.2 LLM 调用优化
-
-LLM 调用是最耗时的环节,主要优化策略:
-
-1. **并发调用**:多个 chunks 同时提取(默认并发 20 个)
-2. **批量处理**:减少 LLM 调用次数
-3. **缓存复用**:相似描述复用摘要结果
-
-**性能提升**:并发调用比串行快 4 倍。
-
-### 7.3 存储优化
-
-批量写入可以显著提升性能:
-
-| 方式 | 100 个实体写入时间 |
-|------|------------------|
-| 逐个写入 | ~10 秒 |
-| 批量写入(32 个/批) | ~1 秒 |
-
-**优化效果**:10 倍速度提升!
-
-### 7.4 内存优化
-
-大文档处理的内存管理策略:
-
-- 流式分块:不一次性加载整个文档
-- 及时释放:处理完立即释放内存
-- 分批处理:控制内存峰值
-
-### 7.5 性能监控
-
-系统会输出详细的性能统计:
-
-```
-图索引构建完成:
-✓ 文档分块:10 个 chunks,耗时 0.5 秒
-✓ 实体提取:120 个实体,耗时 25 秒
-✓ 关系提取:85 个关系,耗时 25 秒
-✓ 并发合并:耗时 15 秒
-✓ 存储写入:耗时 2 秒
-━━━━━━━━━━━━━━━━━━━━━━━━━
-总耗时:42.7 秒
-```
-
-**瓶颈分析**:实体/关系提取占 60% 时间,可通过提高 LLM 并发度优化。
-
-## 8. 配置参数
-
-### 8.1 核心配置
-
-图索引构建可以通过以下参数进行调优:
-
-**分块参数**:
-
-```python
-# 分块大小(tokens)
-CHUNK_TOKEN_SIZE = 1200
-
-# 重叠大小(tokens)
-CHUNK_OVERLAP_TOKEN_SIZE = 100
-```
-
-**调优建议**:
-- 小文档(< 5000 tokens):`CHUNK_TOKEN_SIZE = 800`
-- 大文档(> 50000 tokens):`CHUNK_TOKEN_SIZE = 1500`
-- 需要更多上下文:增加 `CHUNK_OVERLAP_TOKEN_SIZE`
-
-**并发参数**:
-
-```python
-# LLM 并发调用数
-LLM_MODEL_MAX_ASYNC = 20
-
-# Embedding 并发调用数
-EMBEDDING_FUNC_MAX_ASYNC = 16
-
-# 批量处理大小
-MAX_BATCH_SIZE = 32
-```
-
-**调优建议**:
-- LLM API 限流严格:降低 `LLM_MODEL_MAX_ASYNC` 到 5-10
-- 性能充足想提速:提高到 50-100
-- 内存有限:降低 `MAX_BATCH_SIZE` 到 16
-
-**实体提取参数**:
-
-```python
-# 实体提取重试次数(0 = 只提取 1 次)
-ENTITY_EXTRACT_MAX_GLEANING = 0
-
-# 摘要最大 token 数
-SUMMARY_TO_MAX_TOKENS = 2000
-
-# 强制摘要的描述片段数
-FORCE_LLM_SUMMARY_ON_MERGE = 10
-```
-
-**调优建议**:
-- 提取质量重要:`ENTITY_EXTRACT_MAX_GLEANING = 1`(多提取一次)
-- 追求速度:`ENTITY_EXTRACT_MAX_GLEANING = 0`
-- 描述经常很长:降低 `SUMMARY_TO_MAX_TOKENS` 到 1000
-
-### 8.2 知识图谱配置
-
-在 Collection 配置中可以设置:
-
-```json
-{
- "knowledge_graph_config": {
- "language": "simplified chinese",
- "entity_types": [
- "organization",
- "person",
- "geo",
- "event",
- "product",
- "technology",
- "date",
- "category"
- ]
- }
-}
-```
-
-**参数说明**:
-
-- **language**:提取语言,影响 LLM 提示词
- - `simplified chinese`:简体中文
- - `English`:英文
- - `traditional chinese`:繁体中文
-
-- **entity_types**:要提取的实体类型
- - 默认:8 种类型(组织、人物、地点、事件、产品、技术、日期、类别)
- - 可自定义:比如只提取人物和组织
-
-### 8.3 存储配置
-
-通过环境变量配置存储后端:
-
-```bash
-# KV 存储(键值对)
-export GRAPH_INDEX_KV_STORAGE=PGOpsSyncKVStorage
-
-# 向量存储
-export GRAPH_INDEX_VECTOR_STORAGE=PGOpsSyncVectorStorage
-
-# 图存储
-export GRAPH_INDEX_GRAPH_STORAGE=Neo4JSyncStorage
-# 或者使用 PostgreSQL
-export GRAPH_INDEX_GRAPH_STORAGE=PGOpsSyncGraphStorage
-```
-
-**存储选择建议**:
-
-| 场景 | KV 存储 | 向量存储 | 图存储 |
-|------|---------|---------|--------|
-| **默认** | PostgreSQL | PostgreSQL | PostgreSQL |
-| **高性能向量搜索** | PostgreSQL | Qdrant | Neo4j |
-| **大规模图谱** | PostgreSQL | Qdrant | Neo4j |
-| **简单部署** | PostgreSQL | PostgreSQL | PostgreSQL |
-
-### 8.4 完整配置示例
-
-```bash
-# 分块配置
-export CHUNK_TOKEN_SIZE=1200
-export CHUNK_OVERLAP_TOKEN_SIZE=100
-
-# 并发配置
-export LLM_MODEL_MAX_ASYNC=20
-export MAX_BATCH_SIZE=32
-
-# 提取配置
-export ENTITY_EXTRACT_MAX_GLEANING=0
-export SUMMARY_TO_MAX_TOKENS=2000
-
-# 存储配置
-export GRAPH_INDEX_KV_STORAGE=PGOpsSyncKVStorage
-export GRAPH_INDEX_VECTOR_STORAGE=PGOpsSyncVectorStorage
-export GRAPH_INDEX_GRAPH_STORAGE=PGOpsSyncGraphStorage
-
-# 数据库连接(PostgreSQL)
-export POSTGRES_HOST=127.0.0.1
-export POSTGRES_PORT=5432
-export POSTGRES_DB=aperag
-export POSTGRES_USER=postgres
-export POSTGRES_PASSWORD=your_password
-
-# 数据库连接(Neo4j,可选)
-export NEO4J_HOST=127.0.0.1
-export NEO4J_PORT=7687
-export NEO4J_USERNAME=neo4j
-export NEO4J_PASSWORD=your_password
-```
-
-## 9. 实际应用场景
-
-图索引特别适合以下场景:
-
-### 9.1 企业知识库
-
-**场景描述**:公司有大量的技术文档、组织架构、项目资料。
-
-**图索引的价值**:
-
-- ✅ 理解人员关系:谁和谁在一起工作过
-- ✅ 追溯项目历史:哪些人参与了哪些项目
-- ✅ 技术栈分析:哪个团队用什么技术
-- ✅ 知识传承:某个领域的专家是谁
-
-**查询示例**:
-
-```
-用户:"张三参与过哪些项目?"
-图索引:查询 张三 --参与--> 项目 的关系
-结果:项目 A、项目 B、项目 C
-
-用户:"数据库团队都有哪些人?"
-图索引:查询 人物 --属于--> 数据库团队 的关系
-结果:张三、李四、王五
-```
-
-### 8.2 研究论文分析
-
-**场景描述**:分析大量学术论文,理解研究脉络。
-
-**图索引的价值**:
-
-- ✅ 作者合作网络:谁和谁合作过
-- ✅ 引用关系:哪些论文互相引用
-- ✅ 研究主题:某个领域的核心概念
-- ✅ 技术演进:技术如何发展的
-
-**查询示例**:
-
-```
-用户:"Graph RAG 相关的研究有哪些?"
-图索引:查询 论文 --研究--> Graph RAG 的关系
-结果:论文 A、论文 B、论文 C
-
-用户:"某作者和谁合作过?"
-图索引:查询 作者 --合作--> 其他作者 的关系
-结果:合作者列表及合作项目
-```
-
-### 8.3 产品文档
-
-**场景描述**:软件产品的用户手册、API 文档。
-
-**图索引的价值**:
-
-- ✅ 功能依赖:某个功能依赖哪些其他功能
-- ✅ API 关联:哪些 API 经常一起使用
-- ✅ 配置关系:某个配置项影响哪些功能
-- ✅ 问题诊断:出现某个错误可能是什么原因
-
-**查询示例**:
-
-```
-用户:"如何配置图索引?"
-图索引:查询 配置项 --影响--> 图索引 的关系
-结果:GRAPH_INDEX_GRAPH_STORAGE、knowledge_graph_config
-
-用户:"Neo4j 和 PostgreSQL 有什么区别?"
-图索引:查询 Neo4j、PostgreSQL 的属性和关系
-结果:性能对比、适用场景、配置方式
-```
-
-### 8.4 对话场景对比
-
-让我们看看不同检索方式在实际对话中的表现:
-
-**问题:"张三和李四是什么关系?"**
-
-| 检索方式 | 能否回答 | 回答质量 |
-|---------|---------|---------|
-| **纯向量检索** | ⚠️ 部分 | 找到提到两人的段落,但不清楚关系 |
-| **纯全文检索** | ⚠️ 部分 | 找到包含"张三"和"李四"的段落 |
-| **图索引** | ✅ 可以 | 直接返回:张三和李四是协作关系 |
-
-**问题:"PostgreSQL 配置文件在哪?"**
-
-| 检索方式 | 能否回答 | 回答质量 |
-|---------|---------|---------|
-| **纯向量检索** | ✅ 可以 | 找到相关配置段落 |
-| **纯全文检索** | ✅ 可以 | 精确匹配"PostgreSQL"和"配置" |
-| **图索引** | ✅ 可以 | 找到 PostgreSQL --配置--> 文件 的关系 |
-
-**问题:"如何提升系统性能?"**
-
-| 检索方式 | 能否回答 | 回答质量 |
-|---------|---------|---------|
-| **纯向量检索** | ✅ 强 | 找到所有性能优化相关内容 |
-| **纯全文检索** | ⚠️ 中 | 需要精确关键词"性能"、"优化" |
-| **图索引** | ✅ 强 | 找到 优化方法 --提升--> 性能 的关系 |
-
-**最佳实践**:结合使用多种检索方式!
-
-## 10. 总结
-
-ApeRAG 的图索引提供了生产级的知识图谱构建能力,具有高性能、高可靠性和易扩展的特点。
-
-### 关键特性
-
-1. **workspace 数据隔离**:每个 Collection 完全独立,支持真正的多租户
-2. **无状态架构**:每个任务独立实例,零状态污染
-3. **连通分量并发**:智能并发策略,性能提升 2-3 倍
-4. **细粒度锁管理**:实体级别的锁,最大化并发度
-5. **智能摘要**:自动压缩过长描述,节省存储和提升检索效率
-6. **多存储支持**:灵活选择 Neo4j 或 PostgreSQL
-
-### 适用场景
-
-- ✅ **企业知识库**:理解组织结构、人员关系、项目历史
-- ✅ **研究论文分析**:作者合作网络、引用关系、研究脉络
-- ✅ **产品文档**:功能依赖、配置关系、问题诊断
-- ✅ **任何需要理解"关系"的场景**
-
-### 性能表现
-
-- 处理 10,000 个实体:约 2-5 分钟(取决于 LLM 速度)
-- 连通分量并发:性能提升 2-3 倍
-- 内存占用:约 400 MB(10,000 个实体)
-- 存储空间:约 100 MB(10,000 个实体)
-
-### 下一步
-
-图索引构建完成后,就可以进行图谱检索了。ApeRAG 支持三种图谱查询模式:
-
-- **Local 模式**:查询某个实体的局部信息
-- **Global 模式**:查询整体关系和模式
-- **Hybrid 模式**:综合性查询
-
-详细的检索流程请参考 [系统架构文档](./architecture.md#42-知识图谱查询)。
-
----
-
-## 相关文档
-
-- 📋 [系统架构](./architecture.md) - ApeRAG 整体架构设计
-- 📖 [实体提取与合并机制](./lightrag_entity_extraction_and_merging.md) - 核心算法详解
-- 🔗 [连通分量优化](./connected_components_optimization.md) - 并发优化原理
-- 🌐 [索引链路架构](./indexing_architecture.md) - 完整索引流程
diff --git a/web/docs/zh-CN/development/_category.yaml b/web/docs/zh-CN/development/_category.yaml
deleted file mode 100644
index a264578f0..000000000
--- a/web/docs/zh-CN/development/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: 开发
-position: 4
diff --git a/web/docs/zh-CN/development/development-guide.md b/web/docs/zh-CN/development/development-guide.md
deleted file mode 100644
index fc1caaade..000000000
--- a/web/docs/zh-CN/development/development-guide.md
+++ /dev/null
@@ -1,387 +0,0 @@
----
-title: 开发指南
-description: ApeRAG 开发环境设置和工作流程
----
-
-# 🛠️ 开发指南
-
-本指南重点介绍如何为 ApeRAG 设置开发环境和开发工作流程。这是为希望为 ApeRAG 做贡献或在本地运行它进行开发的开发人员设计的。
-
-## 🚀 开发环境设置
-
-按照以下步骤从源代码设置 ApeRAG 进行开发:
-
-### 1. 📂 克隆仓库并设置环境
-
-首先,获取源代码并配置环境变量:
-
-```bash
-git clone https://github.com/apecloud/ApeRAG.git
-cd ApeRAG
-cp envs/env.template .env
-```
-
-如果需要,编辑 `.env` 文件以配置您的 AI 服务设置。默认设置适用于下一步启动的本地数据库服务。
-
-### 2. 📋 系统前提条件
-
-在开始之前,请确保您的系统具备:
-
-* **Node.js**:推荐版本 20 或更高版本用于前端开发。[下载 Node.js](https://nodejs.org/)
-* **Docker & Docker Compose**:本地运行数据库服务所需。[下载 Docker](https://docs.docker.com/get-docker/)
-
-**注意**:需要 Python 3.11,但将在下一步中通过 `uv` 自动管理。
-
-### 3. 🗄️ 启动数据库服务
-
-使用 Docker Compose 启动必要的数据库服务:
-
-```bash
-# 启动核心数据库:PostgreSQL、Redis、Qdrant、Elasticsearch
-make infra-up
-```
-
-这将在后台启动所有必需的数据库服务。您的 `.env` 文件中的默认连接设置已预配置为与这些服务一起工作。
-
-
-高级数据库选项
-
-```bash
-# 使用 Neo4j 而不是 PostgreSQL 进行图存储
-make infra-up WITH_NEO4J=1
-```
-
-
-
-### 4. ⚙️ 设置开发环境
-
-创建 Python 虚拟环境并设置开发工具:
-
-```bash
-make env-dev
-```
-
-此命令将:
-* 如果尚未可用,则安装 `uv`
-* 创建 Python 3.11 虚拟环境(位于 `.venv/` 中)
-* 安装开发工具(redocly、openapi-generator-cli 等)
-* 为代码质量安装 pre-commit hooks
-* 安装 addlicense 工具进行许可证管理
-
-**激活虚拟环境:**
-```bash
-source .venv/bin/activate
-```
-
-当您在终端提示符中看到 `(.venv)` 时,您就知道它是活动的。
-
-### 5. 📦 安装依赖项
-
-安装所有后端和前端依赖项:
-
-```bash
-make env-install
-```
-
-此命令将:
-* 将 `pyproject.toml` 中的所有 Python 后端依赖项安装到虚拟环境中
-* 使用 `yarn` 安装前端 Node.js 依赖项
-
-### 6. 🔄 应用数据库迁移
-
-设置数据库架构:
-
-```bash
-make db-migrate
-```
-
-### 7. ▶️ 启动开发服务
-
-现在您可以启动开发服务。为每个服务打开单独的终端窗口/选项卡:
-
-**终端 1 - 后端 API 服务器:**
-```bash
-make serve-api
-```
-这将在 `http://localhost:8000` 启动 FastAPI 开发服务器,代码更改时自动重新加载。
-
-**终端 2 - Celery Worker:**
-```bash
-make serve-worker
-```
-这将启动 Celery worker 以处理异步后台任务。
-
-**终端 3 - 前端(可选):**
-```bash
-make serve-web
-```
-这将在 `http://localhost:3000` 启动前端开发服务器,支持热重载。
-
-### 8. 🌐 访问 ApeRAG
-
-服务运行后,您可以访问:
-* **前端 UI**:http://localhost:3000 (如果已启动)
-* **后端 API**:http://localhost:8000
-* **API 文档**:http://localhost:8000/docs
-
-### 9. ⏹️ 停止服务
-
-要停止开发环境:
-
-**停止数据库服务:**
-```bash
-# 停止数据库服务(保留数据)
-make stack-down
-
-# 停止服务并移除所有数据卷
-make stack-down REMOVE_VOLUMES=1
-```
-
-**停止开发服务:**
-- 后端 API 服务器:在运行 `make serve-api` 的终端中按 `Ctrl+C`
-- Celery Worker:在运行 `make serve-worker` 的终端中按 `Ctrl+C`
-- 前端服务器:在运行 `make serve-web` 的终端中按 `Ctrl+C`
-
-**数据管理:**
-- `make stack-down` - 停止服务但保留所有数据(PostgreSQL、Redis、Qdrant 等)
-- `make stack-down REMOVE_VOLUMES=1` - 停止服务并**⚠️ 永久删除所有数据**
-- 即使已经运行过 `make stack-down`,您也可以运行 `make stack-down REMOVE_VOLUMES=1`
-
-**验证数据移除:**
-```bash
-# 检查卷是否仍然存在
-docker volume ls | grep aperag
-
-# REMOVE_VOLUMES=1 后应该不返回结果
-```
-
-现在您已经从源代码本地运行 ApeRAG,准备好进行开发!🎉
-
-## ❓ 常见开发任务
-
-### Q: 🔧 如何添加或修改 REST API 端点?
-
-**完整工作流程:**
-1. 编辑 OpenAPI 规范:`aperag/api/paths/[endpoint-name].yaml`
-2. 重新生成后端模型:
- ```bash
- make api-generate-models # 这会在内部运行 merge-openapi
- ```
-3. 实现后端视图:`aperag/views/[module].py`
-4. 生成前端 TypeScript 客户端:
- ```bash
- make api-generate-sdk # 更新 frontend/src/api/
- ```
-5. 测试 API:
- ```bash
- make test-all
- # ✅ 检查实时文档:http://localhost:8000/docs
- ```
-
-### Q: 🗃️ 如何修改数据库模型/架构?
-
-**数据库迁移工作流程:**
-1. 编辑 `aperag/db/models.py` 中的 SQLModel 类
-2. 生成迁移文件:
- ```bash
- make db-revision # 在 migration/versions/ 中创建新迁移
- ```
-3. 将迁移应用到数据库:
- ```bash
- make db-migrate # 更新数据库架构
- ```
-4. 更新相关代码(`aperag/db/repositories/` 中的仓库,`aperag/service/` 中的服务)
-5. 验证更改:
- ```bash
- make test-all # ✅ 确保一切正常工作
- ```
-
-### Q: ⚡ 如何添加具有后台处理的新功能?
-
-**功能实现工作流程:**
-1. 实现功能组件:
- - 后端逻辑:`aperag/[module]/`
- - 异步任务:`aperag/tasks/`
- - 数据库模型:`aperag/db/models.py`
-2. 更新 API 并生成代码:
- ```bash
- make db-revision # 生成迁移文件
- make db-migrate # 应用数据库更改
- make api-generate-models # 更新 Pydantic 模型
- make api-generate-sdk # 更新 TypeScript 客户端
- ```
-3. 质量保证:
- ```bash
- make format && make lint && make test-all
- ```
-
-### Q: 🧪 如何运行单元测试和 e2e 测试?
-
-**单元测试(快速,无外部依赖):**
-```bash
-# 运行所有单元测试
-make test-unit
-
-# 运行特定测试文件
-uv run pytest tests/unit_test/test_model_service.py -v
-
-# 运行特定测试类或函数
-uv run pytest tests/unit_test/test_model_service.py::TestModelService::test_get_models -v
-
-# 运行带覆盖率的测试
-uv run pytest tests/unit_test/ --cov=aperag --cov-report=html
-```
-
-**E2E 测试(需要运行服务):**
-```bash
-# 设置:首先启动所需服务
-make infra-up # 🗄️ 启动数据库
-make serve-api # 🚀 启动 API 服务器(单独终端)
-
-# 运行所有 e2e 测试
-make test-e2e
-
-# 运行特定 e2e 测试模块
-uv run pytest tests/e2e_test/test_chat/ -v
-uv run pytest tests/e2e_test/graphstorage/ -v
-
-# 运行带详细输出且不捕获的测试
-uv run pytest tests/e2e_test/test_specific.py -v -s
-
-# 性能基准测试(带计时)
-make test-e2e-perf
-```
-
-**完整测试套件:**
-```bash
-# 运行所有内容(单元 + e2e)
-make test-all
-
-# 使用不同配置进行测试
-make infra-up WITH_NEO4J=1 # 使用 Neo4j 而不是 PostgreSQL 进行测试
-make test-all
-```
-
-### Q: 🐛 如何调试失败的测试?
-
-**调试工作流程:**
-1. 单独运行失败的测试:
- ```bash
- # 带完整输出的单个测试
- uv run pytest tests/unit_test/test_failing.py::test_specific_function -v -s
-
- # 在第一次失败时停止
- uv run pytest tests/unit_test/ -x --tb=short
- ```
-2. 对于 e2e 测试失败,确保服务正在运行:
- ```bash
- make infra-up # 数据库服务
- make serve-api # API 服务器
- make serve-worker # 后台 workers(如果测试异步任务)
- ```
-3. 使用调试工具:
- ```bash
- # 使用 pdb 调试器运行
- uv run pytest tests/unit_test/test_failing.py --pdb
-
- # 在测试期间捕获日志
- uv run pytest tests/e2e_test/test_failing.py --log-cli-level=DEBUG
- ```
-4. 修复并重新测试:
- ```bash
- make format # 自动修复样式问题
- make lint # 检查剩余问题
- uv run pytest tests/path/to/fixed_test.py -v # 验证修复
- ```
-
-### Q: 📊 如何运行 RAG 评估和分析?
-
-**评估工作流程:**
-```bash
-# 确保环境准备就绪
-make infra-up WITH_NEO4J=1 # 使用 Neo4j 获得更好的图性能
-make serve-api
-make serve-worker
-
-# 运行全面的 RAG 评估
-make evaluate # 📊 运行 aperag.evaluation.run 模块
-
-# 📈 检查 tests/report/ 中的评估报告
-```
-
-### Q: 📦 如何安全地更新依赖项?
-
-**Python 依赖项:**
-1. 编辑 `pyproject.toml`(添加/更新包)
-2. 更新虚拟环境:
- ```bash
- make env-install # 使用 uv 同步所有组和额外内容
- make test-all # 验证兼容性
- ```
-
-**前端依赖项:**
-1. 编辑 `frontend/package.json`
-2. 更新并测试:
- ```bash
- cd frontend && yarn install
- make serve-web # 测试前端编译
- make api-generate-sdk # 确保 API 客户端仍然工作
- ```
-
-### Q: 🚀 如何准备代码进行生产部署?
-
-**部署前检查清单:**
-1. 代码质量验证:
- ```bash
- make format # 自动修复所有样式问题
- make lint # 验证无样式违规
- make static-check # MyPy 类型检查
- ```
-2. 全面测试:
- ```bash
- make test-all # 所有单元 + e2e 测试
- make test-e2e-perf # 性能基准测试
- ```
-3. API 一致性:
- ```bash
- make api-generate-models # 确保模型与 OpenAPI 规范匹配
- make api-generate-sdk # 更新前端客户端
- ```
-4. 数据库迁移:
- ```bash
- make db-revision # 生成任何待处理的迁移
- ```
-5. 全栈集成测试:
- ```bash
- make stack-up WITH_NEO4J=1 WITH_DOCRAY=1 # 类似生产的设置
- # 在 http://localhost:3000/web/ 手动测试
- make stack-down
- ```
-
-### Q: 🔄 如何完全重置我的开发环境?
-
-**核选项重置(销毁所有数据):**
-```bash
-make stack-down REMOVE_VOLUMES=1 # ⚠️ 停止服务 + 删除所有数据
-make env-clean # 🧹 清理临时文件
-
-# 重新开始
-make infra-up # 🗄️ 新的数据库
-make db-migrate # 🔄 应用所有迁移
-make serve-api # 🚀 启动 API 服务器
-make serve-worker # ⚡ 启动后台 workers
-```
-
-**软重置(保留数据):**
-```bash
-make stack-down # ⏹️ 停止服务,保留数据
-make infra-up # 🗄️ 重启数据库
-make db-migrate # 🔄 应用任何新迁移
-```
-
-**仅重置 Python 环境:**
-```bash
-rm -rf .venv/ # 🗑️ 移除虚拟环境
-make env-dev # ⚙️ 重新创建所有内容
-source .venv/bin/activate # ✅ 重新激活
-```
diff --git a/web/docs/zh-CN/images/dify/aperag-banner.png b/web/docs/zh-CN/images/dify/aperag-banner.png
deleted file mode 100644
index 338290248..000000000
Binary files a/web/docs/zh-CN/images/dify/aperag-banner.png and /dev/null differ
diff --git a/web/docs/zh-CN/images/dify/step1-subscribe-collection.png b/web/docs/zh-CN/images/dify/step1-subscribe-collection.png
deleted file mode 100644
index 3a9ede8ad..000000000
Binary files a/web/docs/zh-CN/images/dify/step1-subscribe-collection.png and /dev/null differ
diff --git a/web/docs/zh-CN/images/dify/step2-add-mcp.png b/web/docs/zh-CN/images/dify/step2-add-mcp.png
deleted file mode 100644
index 92eb2154d..000000000
Binary files a/web/docs/zh-CN/images/dify/step2-add-mcp.png and /dev/null differ
diff --git a/web/docs/zh-CN/images/dify/step2-api-key.png b/web/docs/zh-CN/images/dify/step2-api-key.png
deleted file mode 100644
index 555b0b7de..000000000
Binary files a/web/docs/zh-CN/images/dify/step2-api-key.png and /dev/null differ
diff --git a/web/docs/zh-CN/images/dify/step2-configure-mcp.png b/web/docs/zh-CN/images/dify/step2-configure-mcp.png
deleted file mode 100644
index 149787ef9..000000000
Binary files a/web/docs/zh-CN/images/dify/step2-configure-mcp.png and /dev/null differ
diff --git a/web/docs/zh-CN/images/dify/step2-mcp-success.png b/web/docs/zh-CN/images/dify/step2-mcp-success.png
deleted file mode 100644
index 7f91bc07e..000000000
Binary files a/web/docs/zh-CN/images/dify/step2-mcp-success.png and /dev/null differ
diff --git a/web/docs/zh-CN/images/dify/step3-create-app.png b/web/docs/zh-CN/images/dify/step3-create-app.png
deleted file mode 100644
index a41dc7cdf..000000000
Binary files a/web/docs/zh-CN/images/dify/step3-create-app.png and /dev/null differ
diff --git a/web/docs/zh-CN/images/dify/step3-select-agent.png b/web/docs/zh-CN/images/dify/step3-select-agent.png
deleted file mode 100644
index ed3b0c00a..000000000
Binary files a/web/docs/zh-CN/images/dify/step3-select-agent.png and /dev/null differ
diff --git a/web/docs/zh-CN/images/dify/step4-configure-agent.png b/web/docs/zh-CN/images/dify/step4-configure-agent.png
deleted file mode 100644
index ac1ea12af..000000000
Binary files a/web/docs/zh-CN/images/dify/step4-configure-agent.png and /dev/null differ
diff --git a/web/docs/zh-CN/images/dify/step4-test-agent.png b/web/docs/zh-CN/images/dify/step4-test-agent.png
deleted file mode 100644
index 92f90d101..000000000
Binary files a/web/docs/zh-CN/images/dify/step4-test-agent.png and /dev/null differ
diff --git a/web/docs/zh-CN/integration/_category.yaml b/web/docs/zh-CN/integration/_category.yaml
deleted file mode 100644
index 9f4f46c2e..000000000
--- a/web/docs/zh-CN/integration/_category.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-title: 集成
-position: 2
diff --git a/web/docs/zh-CN/integration/dify.md b/web/docs/zh-CN/integration/dify.md
deleted file mode 100644
index ca02e13cc..000000000
--- a/web/docs/zh-CN/integration/dify.md
+++ /dev/null
@@ -1,168 +0,0 @@
----
-title: Dify 集成 ApeRAG
-description: 通过 MCP 协议快速集成 ApeRAG 的 Graph RAG 能力
-keywords: Dify, ApeRAG, MCP, Graph RAG
----
-
-# Dify 集成 ApeRAG
-
-ApeRAG 是一款具备多模态索引、AI 智能体、MCP 支持及可扩展 K8s 部署能力的生产级 RAG 平台,能够帮助用户构建具备**混合检索**、**多模态文档处理**及**企业级管理能力**的复杂 AI 应用。
-
-**核心特点**:
-- 不同于"标准" RAG,ApeRAG 实现了 **Graph-RAG**,通过构建知识图谱理解数据要素之间的深层关系
-- 集成了 **MinerU**,专为复杂文档、科学论文和财务报告设计,可以准确提取表格、公式和工程图表
-- 全面支持 Kubernetes,提供内置的**高可用性**、**可扩展性**和**企业级管理能力**
-
-## 视频演示
-
-
-
-
-
-## Step 1: 准备知识库
-
-打开 ApeRAG Web 界面(见[快速开始](../../../../README-zh.md#快速开始);Docker Compose 启动时一般为 http://localhost:3000/web/)。登录后选择或导入知识库。下文以「三国演义」知识库为例,点击订阅。
-
-
-

-
-
-## Step 2: 配置 MCP Server
-
-### 2.1 添加 MCP Server
-
-来到 Dify - 工具 - MCP,点击添加 MCP Server。
-
-
-

-
-
-### 2.2 填写配置信息
-
-填写 Server URL:`http://localhost:8000/mcp/`(若非本机部署,请改为实际 API 地址,例如 `https://<你的域名>/mcp/`),并粘贴从 ApeRAG 复制的 API Key,点击确定。
-
-
-

-
-
-
-

-
-
-### 2.3 配置成功
-
-MCP Server 添加成功。
-
-
-

-
-
-## Step 3: 创建 Agent 应用
-
-### 3.1 创建应用
-
-来到 Dify - Studio,点击创建应用。
-
-
-

-
-
-### 3.2 选择类型
-
-点击更多基础应用类型,选择 **Agent** 类型,命名后点击创建。
-
-
-

-
-
-## Step 4: 配置 Agent
-
-点击 Agent,输入 Prompt,在工具里添加创建好的 ApeRAG MCP,右上角选择驱动 Agent 的大语言模型,点击发布运行即可使用。
-
-
-

-
-
-
-

-
-
-### Prompt 参考
-
-```markdown
-# ApeRAG 智能助手
-
-您是由 ApeRAG 混合搜索能力驱动的高级 AI 研究助手。您的使命是帮助用户从知识库和网络中准确、自主地查找、理解和综合信息。
-
-## 核心行为
-
-**自主研究**:独立工作直到用户查询完全解决。搜索多个来源,分析发现,无需等待许可即提供全面答案。
-
-**语言智能**:始终用用户提问的语言回应。用户用中文提问时,无论源语言如何都用中文回应。
-
-**可视化思维**:**[关键]** 您是一个偏好视觉解释的助手。凡是涉及实体关系、流程或结构的信息,您必须优先考虑将其可视化。
-
-**完整解决**:从多角度探索,交叉验证来源,确保全面覆盖后再回应。
-
-## 搜索策略
-
-### 优先级系统
-1. **用户指定知识库**(通过"@"提及):严格限制仅搜索指定库
-2. **未指定知识库**:自主发现并搜索相关库
-3. **网络搜索**(如启用):补充信息
-4. **清晰归属**:始终标注来源
-
-### 搜索执行
-- **知识库搜索**:默认使用向量+图搜索
-- **结果处理逻辑**:
- 1. 执行搜索
- 2. **检测图数据**:检查搜索结果是否包含 `entities` (实体) 和 `relationships` (关系)
- 3. **强制可视化**:如果搜索结果包含非空的实体或关系数据,**您必须**调用 `create_diagram` 工具
- 4. **内容甄别**:忽略不相关结果
-
-## 可用工具
-
-### 知识管理
-- `list_collections()`:发现可用知识源
-- `search_collection(collection_id, query, ...)`: **[主要工具]** 在持久化知识库中进行混合搜索
-- `search_chat_files(chat_id, query, ...)`: **[仅限聊天]** 仅搜索用户在本次聊天会话中临时上传的文件
-- `create_diagram(content)`:**[强制工具]** 当搜索结果包含结构化信息(实体/关系)时,必须调用此工具生成 Mermaid 图表
-
-### 网络智能
-- `web_search(query, ...)`:多引擎网络搜索
-- `web_read(url_list, ...)`:提取和分析网络内容
-
-## 回应格式
-
-### 直接答案
-[用户语言的清晰、可操作答案]
-
-### 全面分析
-[包含上下文和见解的详细解释]
-
-### 知识图谱可视化
-[此处展示工具生成的图表]
-*(仅在成功调用 create_diagram 后显示。该图表展示了基于搜索结果的实体关系。)*
-
-### 支持证据
-- [知识库名称]:[关键发现]
-
-**网络来源**(如启用):
-- [标题]([域名])- [要点]
-```
-
----
-
-ApeRAG + Dify 的集成非常简单,集成后不仅可以体验 Dify 的平台功能,还可以享受到 **ApeRAG 强大的 Graph-RAG 能力**,感兴趣的小伙伴快去试试吧!
-
-**GitHub**: https://github.com/apecloud/ApeRAG
diff --git a/web/docs/zh-CN/integration/mcp-api.md b/web/docs/zh-CN/integration/mcp-api.md
deleted file mode 100644
index debca5a9a..000000000
--- a/web/docs/zh-CN/integration/mcp-api.md
+++ /dev/null
@@ -1,333 +0,0 @@
----
-title: MCP API
-description: Model Context Protocol API 文档
----
-
-# MCP API
-
-ApeRAG 通过 [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) 对外提供标准化的工具接口,让 AI 助手(Claude Desktop、Cursor、Dify 等)能够直接访问你的知识库。
-
-## 快速开始
-
-### 配置示例
-
-以 Claude Desktop 为例,在配置文件中添加:
-
-```json
-{
- "mcpServers": {
- "aperag": {
- "url": "http://localhost:8000/mcp/",
- "headers": {
- "Authorization": "Bearer your-api-key-here"
- }
- }
- }
-}
-```
-
-### 认证方式
-
-支持两种认证方式(按优先级):
-
-1. **HTTP Authorization 头**(推荐):`Authorization: Bearer your-api-key`
-2. **环境变量**(备用):`APERAG_API_KEY=your-api-key`
-
-> **获取 API Key**:登录 ApeRAG 后,在设置页面创建或复制你的 API Key
-
-## 可用工具
-
-### 1. list_collections
-
-列出所有可访问的知识库。
-
-**参数**:无
-
-**返回**:
-```json
-{
- "items": [
- {
- "id": "collection-id",
- "title": "知识库标题",
- "description": "知识库描述"
- }
- ]
-}
-```
-
-### 2. search_collection
-
-在知识库中搜索,支持多种检索方式。
-
-**核心参数**:
-
-| 参数 | 类型 | 默认值 | 说明 |
-|------|------|--------|------|
-| `collection_id` | string | 必需 | 知识库 ID |
-| `query` | string | 必需 | 搜索问题 |
-| `use_vector_index` | bool | true | 向量检索(语义搜索) |
-| `use_fulltext_index` | bool | true | 全文检索(关键词匹配) |
-| `use_graph_index` | bool | true | 图谱检索(关系查询) |
-| `use_summary_index` | bool | true | 摘要检索 |
-| `use_vision_index` | bool | true | 视觉检索(图片搜索) |
-| `rerank` | bool | true | AI 重排序 |
-| `topk` | int | 5 | 每种方式返回的结果数 |
-
-**返回格式**:
-```json
-{
- "query": "你的问题",
- "items": [
- {
- "rank": 1,
- "score": 0.95,
- "content": "相关内容片段",
- "source": "文档名称",
- "recall_type": "vector_search|graph_search|fulltext_search|summary_search",
- "metadata": {
- "page_idx": 0,
- "document_id": "doc-id",
- "collection_id": "col-id",
- "indexer": "text|vision"
- }
- }
- ]
-}
-```
-
-**图片处理**:
-
-如果 `metadata.indexer == "vision"`,表示这是一张图片:
-- `content` 为空:通过多模态向量检索
-- `content` 不为空:包含图片的文字描述
-
-显示图片的 URL 格式:
-```python
-m = item.metadata
-asset_url = f"asset://{m['asset_id']}?document_id={m['document_id']}&collection_id={m['collection_id']}&mime_type={m['mimetype']}"
-```
-
-**使用示例**:
-
-```python
-# 默认搜索(推荐)- 启用所有检索方式
-results = search_collection(
- collection_id="abc123",
- query="如何部署应用?"
-)
-
-# 仅向量+图谱检索
-results = search_collection(
- collection_id="abc123",
- query="部署策略",
- use_vector_index=True,
- use_fulltext_index=False,
- use_graph_index=True,
- use_summary_index=False,
- topk=10
-)
-```
-
-### 3. search_chat_files
-
-在聊天会话的临时文件中搜索。
-
-**何时使用**:
-- ✅ 用户在当前对话中上传了文件
-- ✅ 需要分析聊天中的临时文档
-- ❌ 不要用于搜索持久化的知识库(应该用 `search_collection`)
-
-**参数**:
-
-| 参数 | 类型 | 默认值 | 说明 |
-|------|------|--------|------|
-| `chat_id` | string | 必需 | 聊天 ID |
-| `query` | string | 必需 | 搜索问题 |
-| `use_vector_index` | bool | true | 向量检索 |
-| `use_fulltext_index` | bool | true | 全文检索 |
-| `rerank` | bool | true | 重排序 |
-| `topk` | int | 5 | 返回结果数 |
-
-**返回格式**:与 `search_collection` 相同
-
-### 4. web_search
-
-搜索互联网内容。
-
-**参数**:
-
-| 参数 | 类型 | 默认值 | 说明 |
-|------|------|--------|------|
-| `query` | string | "" | 搜索关键词 |
-| `max_results` | int | 5 | 返回结果数 |
-| `source` | string | "" | 指定域名(如 `vercel.com`) |
-| `timeout` | int | 30 | 超时时间(秒) |
-| `locale` | string | "en-US" | 语言地区 |
-
-**使用模式**:
-
-```python
-# 常规搜索
-web_search(query="ApeRAG 2025")
-
-# 指定网站搜索
-web_search(query="部署文档", source="vercel.com")
-
-```
-
-### 5. web_read
-
-读取网页内容。
-
-**参数**:
-
-| 参数 | 类型 | 默认值 | 说明 |
-|------|------|--------|------|
-| `url_list` | list[str] | 必需 | URL 列表 |
-| `timeout` | int | 30 | 超时时间(秒) |
-| `max_concurrent` | int | 5 | 最大并发数 |
-
-**返回**:
-```json
-{
- "results": [
- {
- "status": "success",
- "url": "https://example.com",
- "title": "页面标题",
- "content": "提取的文本内容",
- "word_count": 1234
- }
- ]
-}
-```
-
-**示例**:
-```python
-# 读取单个页面
-web_read(url_list=["https://example.com/article"])
-
-# 批量读取
-web_read(
- url_list=["https://example.com/page1", "https://example.com/page2"],
- max_concurrent=2
-)
-```
-
-## 实战示例
-
-### 示例 1:知识库问答
-
-```python
-# 1. 列出所有知识库
-collections = list_collections()
-
-# 2. 选择一个知识库
-collection_id = collections.items[0].id
-
-# 3. 搜索(默认启用所有检索方式)
-results = search_collection(
- collection_id=collection_id,
- query="如何优化性能?"
-)
-
-# 4. 处理结果
-for item in results.items:
- print(f"[{item.recall_type}] {item.content}")
- print(f"来源: {item.source}, 得分: {item.score}\n")
-```
-
-### 示例 2:图谱可视化
-
-```python
-# 搜索并获取实体关系数据
-results = search_collection(
- collection_id="abc123",
- query="刘备和诸葛亮的关系",
- use_graph_index=True # 确保启用图谱检索
-)
-
-# 检查是否包含图谱数据
-if results.graph_search and results.graph_search.entities:
- print("实体:", results.graph_search.entities)
- print("关系:", results.graph_search.relationships)
- # 可以用这些数据生成知识图谱可视化
-```
-
-### 示例 3:混合搜索(知识库 + 互联网)
-
-```python
-# 1. 搜索互联网
-web_results = web_search(query="最新 AI 发展", max_results=3)
-urls = [r.url for r in web_results.results]
-
-# 2. 读取网页内容
-web_content = web_read(url_list=urls)
-
-# 3. 搜索内部知识库
-kb_results = search_collection(
- collection_id="ai-knowledge",
- query="AI 发展趋势"
-)
-
-# 4. 综合分析
-print("=== 互联网资讯 ===")
-for r in web_results.results:
- print(f"{r.title}: {r.url}")
-
-print("\n=== 内部知识 ===")
-for item in kb_results.items:
- print(f"{item.content[:100]}...")
-```
-
-## 注意事项
-
-### 性能优化
-
-1. **合理设置 topk**:
- - 太大会增加 LLM 上下文消耗
- - 太小可能遗漏重要信息
- - 推荐:5-10
-
-2. **选择性启用检索**:
- - 不是所有查询都需要全文检索
- - 全文检索可能返回大量文本
- - 根据问题类型选择合适的检索方式
-
-3. **超时设置**:
- - 图谱检索可能较慢(默认 120 秒)
- - 网络搜索建议 30-60 秒
- - 批量 URL 读取建议 60 秒以上
-
-### 常见问题
-
-**Q: 搜索没有结果?**
-- 检查知识库 ID 是否正确
-- 确认知识库已完成索引构建
-- 尝试不同的检索方式组合
-
-**Q: 图谱数据为空?**
-- 确认知识库启用了 Graph 索引
-- 某些简单文档可能不包含明显的实体关系
-
-**Q: 图片显示不了?**
-- 检查 `metadata.indexer == "vision"`
-- 使用 `asset://` 协议构建 URL
-- 确保包含所有必需参数(asset_id、document_id、collection_id)
-
-## 工具对比
-
-| 工具 | 用途 | 适用场景 |
-|------|------|---------|
-| `list_collections` | 列出知识库 | 查看有哪些可用资源 |
-| `search_collection` | 搜索知识库 | 主要搜索工具,用于持久化知识 |
-| `search_chat_files` | 搜索聊天文件 | 分析用户在对话中上传的临时文件 |
-| `web_search` | 搜索互联网 | 获取实时信息或外部资料 |
-| `web_read` | 读取网页 | 提取网页完整内容 |
-
-## 相关链接
-
-- **MCP 协议官网**: https://modelcontextprotocol.io/
-- **ApeRAG GitHub**: https://github.com/apecloud/ApeRAG
-- **API 文档**: http://localhost:8000/docs (本地部署)