5 changes: 5 additions & 0 deletions .env.example
@@ -1,3 +1,8 @@
DATABASE_URL=postgresql://contexthub:contexthub@localhost:5432/contexthub
DB_BACKEND=postgres # "postgres" or "opengauss"
API_KEY=changeme
EMBEDDING_MODEL=text-embedding-3-small

# Example openGauss config:
# DATABASE_URL=postgresql://contexthub:ContextHub%40123@localhost:15432/contexthub
# DB_BACKEND=opengauss
36 changes: 36 additions & 0 deletions README.md
@@ -1,4 +1,40 @@

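## Adding an openGauss backend to ContextHub

### 1. openGauss server configuration

* Follow `docs/setup/opengauss-setup-guide-zh.md` to set up the openGauss server backend.
* When running `CREATE DATABASE`, set `DBCOMPATIBILITY = 'PG'` so that empty strings are not interpreted as NULL.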

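### 2. Replacing extensions

* The original ContextHub project depends on two extensions, pgvector and pgcrypto; openGauss supports neither.
* Solution: openGauss 7.0.0 ships DataVec natively, which replaces pgvector. pgcrypto is not supported, and the alternative, uuid-ossp, is missing from the official image and cannot be installed. The migration therefore defines a custom UUID function (`uuid_generate_opengauss()` in `alembic/versions/001_initial_schema.py`) that stands in for `gen_random_uuid()` and removes the need for the pgcrypto extension.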
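
To illustrate what the custom UUID function produces, here is a minimal Python sketch of the same idea as the SQL expression `md5(random()::text || clock_timestamp()::text)::uuid` used in the migration. The helper name `md5_based_uuid` is hypothetical and not part of the project. A value built this way is a well-formed 128-bit UUID, but it does not set the RFC 4122 version and variant bits, and MD5 over a pseudo-random seed is not cryptographically strong, so it should be treated as a unique-enough identifier rather than a true version 4 UUID.

```python
import hashlib
import random
import time
import uuid


def md5_based_uuid() -> uuid.UUID:
    """Mimic md5(random()::text || clock_timestamp()::text)::uuid in Python."""
    # Assumption: a random float plus a nanosecond timestamp stands in for
    # the SQL random() and clock_timestamp() calls.
    seed = f"{random.random()}{time.time_ns()}"
    digest = hashlib.md5(seed.encode("utf-8")).hexdigest()  # 32 hex characters
    return uuid.UUID(hex=digest)  # 128 bits rendered as 8-4-4-4-12


if __name__ == "__main__":
    print(md5_based_uuid())
```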

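### 3. Python driver compatibility

* ContextHub originally talks to the database through asyncpg, but asyncpg does not support openGauss's vector data type.
* The error is `message: unhandled standard data type 'vector' (OID 8305)`; the reproduction script is `opengauss/vector_asyncpg.py`.
* GaussDB maintains its own `async_gaussdb` library, but it does not support the vector type either; the verification script is `opengauss/vector_async_gaussdb.py`.
* Solution: add a database compatibility layer. The PostgreSQL backend keeps using asyncpg, while the openGauss backend connects through psycopg3.
* psycopg3 and asyncpg use different positional-parameter syntax (`%s` versus `$n`), so placeholders are converted with a regular expression.
* The compatibility layer handles this syntax conversion and exposes a unified fetch/fetchrow/fetchall/execute interface.
* The implementation is in `src/contexthub/db/repository.py`.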
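
The placeholder conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `src/contexthub/db/repository.py`; the function name `to_psycopg_query` is hypothetical. It shows two details that a plain text substitution misses: asyncpg lets a `$n` placeholder repeat or appear out of order, so the argument list has to be rebuilt in placeholder order, and literal `%` characters have to be doubled once `%s` markers are in play.

```python
import re
from typing import Any, List, Sequence, Tuple

_DOLLAR_PARAM = re.compile(r"\$(\d+)")


def to_psycopg_query(sql: str, args: Sequence[Any]) -> Tuple[str, List[Any]]:
    """Rewrite asyncpg-style $n placeholders as psycopg-style %s markers."""
    params: List[Any] = []

    def _replace(match: "re.Match[str]") -> str:
        index = int(match.group(1)) - 1  # $1 refers to args[0]
        params.append(args[index])       # repeated $n repeats the value
        return "%s"

    # Double literal percent signs first so they are not read as markers.
    escaped = sql.replace("%", "%%")
    return _DOLLAR_PARAM.sub(_replace, escaped), params


if __name__ == "__main__":
    query, values = to_psycopg_query(
        "SELECT * FROM t WHERE a = $1 AND b = $2 OR c = $1", ("x", 2)
    )
    print(query)
    print(values)
```

The sketch does not skip `$n` sequences that appear inside string literals or comments, so a production converter would need a tokenizer-aware pass for those cases.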

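### 4. SQL dialect rewriting

* ContextHub's SQL is written in the PostgreSQL dialect, and several of its language features are incompatible with openGauss.
* For example, openGauss does not support PostgreSQL's `INSERT ... ON CONFLICT` (it has to be rewritten as `ON DUPLICATE KEY UPDATE`), and the rewritten form cannot be combined with a `RETURNING` clause or with row-level security policies (ROW POLICY).
* More than 20 statements across the project need rewriting; see the report `opengauss-compatibility-report.md` for details.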
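
As a rough illustration of the upsert rewrite, the sketch below builds the statement for either backend from the same inputs. Everything in it is hypothetical: the function `build_upsert`, the table `demo_items`, and its columns do not come from the project. It also assumes that openGauss accepts `EXCLUDED.<column>` inside `ON DUPLICATE KEY UPDATE`, which should be checked against the openGauss `INSERT` reference for the version in use.

```python
from typing import Sequence


def build_upsert(backend: str, table: str, columns: Sequence[str],
                 key_columns: Sequence[str]) -> str:
    """Return an upsert statement with $n placeholders for the given backend."""
    # Key columns identify the row, so only the remaining columns are updated.
    update_cols = [c for c in columns if c not in key_columns]
    if not update_cols:
        raise ValueError("at least one non-key column is required")

    col_list = ", ".join(columns)
    placeholders = ", ".join(f"${i}" for i in range(1, len(columns) + 1))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in update_cols)
    insert = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"

    if backend == "opengauss":
        # Assumption: EXCLUDED.<column> is accepted here; no RETURNING clause.
        return f"{insert} ON DUPLICATE KEY UPDATE {updates}"
    conflict = ", ".join(key_columns)
    return f"{insert} ON CONFLICT ({conflict}) DO UPDATE SET {updates}"


if __name__ == "__main__":
    print(build_upsert("postgres", "demo_items", ["id", "name"], ["id"]))
    print(build_upsert("opengauss", "demo_items", ["id", "name"], ["id"]))
```

Because the openGauss form cannot carry `RETURNING`, a caller that needs the stored row would have to issue a follow-up `SELECT` by key.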

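### Overall progress

* Steps 1, 2, and 3 are 100% complete; the demo `opengauss/demo_e2e_opengauss.py` currently runs its first three steps successfully.
* Step 4 is 50% complete. The SQL rewrite for the demo's fourth step is in progress; see the FIXME in `ContextHub/src/contexthub/services/skill_service.py`.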


---

<div align="center">

<img src="figures/logo2.jpeg" width="200">
6 changes: 6 additions & 0 deletions alembic/env.py
@@ -7,6 +7,12 @@
from sqlalchemy import pool
from sqlalchemy.ext.asyncio import async_engine_from_config

import os

# Default to "postgres" so a missing DB_BACKEND does not raise KeyError.
if os.environ.get("DB_BACKEND", "postgres").lower() == "opengauss":
    # Force SQLAlchemy to treat openGauss as PostgreSQL 9.2 to bypass the version string error
    from sqlalchemy.dialects.postgresql.base import PGDialect

    PGDialect._get_server_version_info = lambda *args, **kwargs: (9, 2, 0)

PROJECT_ROOT = Path(__file__).resolve().parents[1]
SRC_DIR = PROJECT_ROOT / "src"
if str(SRC_DIR) not in sys.path:
38 changes: 29 additions & 9 deletions alembic/versions/001_initial_schema.py
@@ -14,15 +14,35 @@
 depends_on: Union[str, Sequence[str], None] = None


+def _is_opengauss() -> bool:
+    import os
+    return os.environ.get("DB_BACKEND", "postgres").lower() == "opengauss"
+
+
 def upgrade() -> None:
+    opengauss = _is_opengauss()
+    uuid_default = "uuid_generate_opengauss()" if opengauss else "gen_random_uuid()"
+
     # Extensions
-    op.execute("CREATE EXTENSION IF NOT EXISTS vector")
-    op.execute("CREATE EXTENSION IF NOT EXISTS pgcrypto")
+    if opengauss:
+        # openGauss 7.0+ has DataVec built-in; avoid pgcrypto and uuid-ossp with
+        # customized random uuid generation function
+        op.execute("""
+            CREATE OR REPLACE FUNCTION uuid_generate_opengauss()
+            RETURNS uuid
+            LANGUAGE sql
+            AS $$
+                SELECT md5(random()::text || clock_timestamp()::text)::uuid;
+            $$;
+        """)
+    else:
+        op.execute("CREATE EXTENSION IF NOT EXISTS vector")
+        op.execute("CREATE EXTENSION IF NOT EXISTS pgcrypto")

     # --- contexts ---
-    op.execute("""
+    op.execute(f"""
         CREATE TABLE contexts (
-            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+            id UUID PRIMARY KEY DEFAULT {uuid_default},
             uri TEXT NOT NULL,
             context_type TEXT NOT NULL CHECK (context_type IN ('table_schema', 'skill', 'memory', 'resource')),
             scope TEXT NOT NULL CHECK (scope IN ('datalake', 'team', 'agent', 'user')),
@@ -79,9 +99,9 @@ def upgrade() -> None:
     op.execute("CREATE INDEX idx_deps_dependent ON dependencies (dependent_id)")

     # --- change_events (no RLS) ---
-    op.execute("""
+    op.execute(f"""
         CREATE TABLE change_events (
-            event_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+            event_id UUID PRIMARY KEY DEFAULT {uuid_default},
             timestamp TIMESTAMPTZ DEFAULT NOW(),
             context_id UUID NOT NULL REFERENCES contexts(id),
             account_id TEXT NOT NULL,
@@ -126,9 +146,9 @@ def upgrade() -> None:
     """)

     # --- teams ---
-    op.execute("""
+    op.execute(f"""
         CREATE TABLE teams (
-            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+            id UUID PRIMARY KEY DEFAULT {uuid_default},
             path TEXT NOT NULL,
             parent_id UUID REFERENCES teams(id),
             display_name TEXT,
@@ -247,7 +267,7 @@ def upgrade() -> None:
     op.execute("CREATE INDEX idx_qt_context ON query_templates (context_id)")

     # --- Seed data ---
-    op.execute("""
+    op.execute(f"""
         INSERT INTO teams (id, path, parent_id, display_name, account_id) VALUES
         ('00000000-0000-0000-0000-000000000001', '', NULL, '全组织', 'acme'),
         ('00000000-0000-0000-0000-000000000002', 'engineering', '00000000-0000-0000-0000-000000000001', '工程部', 'acme'),
13 changes: 13 additions & 0 deletions docker-compose.opengauss.yml
@@ -0,0 +1,13 @@
services:
  opengauss:
    image: opengauss/opengauss-server:latest
    privileged: true
    ports:
      - "15432:5432"
    environment:
      GS_PASSWORD: "Huawei@123"
    volumes:
      - ogdata:/var/lib/opengauss/data

volumes:
  ogdata:
97 changes: 97 additions & 0 deletions docs/setup/opengauss-setup-guide-zh.md
@@ -0,0 +1,97 @@
# openGauss Deployment Guide

This document describes how to use openGauss 7.0+ as the storage backend for ContextHub.

## Prerequisites

- Docker is installed
- openGauss 7.0+ (with built-in DataVec vector support)

## 1. Start the openGauss container

```bash
# Use the compose file provided by the project
docker compose -f docker-compose.opengauss.yml up -d

# Or start the container manually
docker pull opengauss/opengauss-server:latest
docker run --name opengauss --privileged=true -d \
  -e GS_PASSWORD=Huawei@123 \
  -p 15432:5432 \
  opengauss/opengauss-server:latest
```

## 2. Initialize the database

```bash
docker exec -it opengauss bash
su omm
gsql -d postgres -p 5432
```

Run the following in gsql:

```sql
CREATE USER contexthub WITH PASSWORD 'ContextHub@123' SYSADMIN;
CREATE DATABASE contexthub OWNER contexthub DBCOMPATIBILITY = 'PG';
```

> **Note:**
> - openGauss 7.0+ has DataVec built in, so there is no need to create the vector extension
> - The migration (`alembic/versions/001_initial_schema.py`) defines a custom `uuid_generate_opengauss()` function that replaces PostgreSQL's `pgcrypto` + `gen_random_uuid()`
> - openGauss enforces password strength: a password must contain uppercase and lowercase letters, digits, and special characters
> - Use the `SYSADMIN` keyword rather than `SUPERUSER`
> - Use `DBCOMPATIBILITY = 'PG'` so that empty-string/NULL handling and similar behavior match PostgreSQL

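## 3. Configure ContextHub

Edit the `.env` file:

```env
DATABASE_URL=postgresql://contexthub:ContextHub%40123@<host>:15432/contexthub
DB_BACKEND=opengauss
```

Replace `<host>` with the actual IP address of the openGauss server. The `@` in the password is percent-encoded as `%40` so that it is not mistaken for the separator in front of the host.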
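
A small sketch of how the connection URL can be assembled so that special characters in the password are percent-encoded. The helper name `build_database_url` is hypothetical; `urllib.parse.quote` is from the Python standard library.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str,
                       port: int, database: str) -> str:
    """Build a postgresql:// URL with percent-encoded credentials."""
    # safe="" makes quote() escape every reserved character,
    # including "@", ":" and "/".
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # prints postgresql://contexthub:ContextHub%40123@localhost:15432/contexthub
    print(build_database_url("contexthub", "ContextHub@123",
                             "localhost", 15432, "contexthub"))
```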

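## 4. Run the database migration

```bash
DB_BACKEND=opengauss alembic upgrade head
```

> The migration script selects its behavior automatically from the `DB_BACKEND` environment variable:
> - `opengauss`: defines the custom `uuid_generate_opengauss()` function and uses it as the UUID default
> - `postgres` (default): creates the `vector` + `pgcrypto` extensions and uses `gen_random_uuid()`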

## 5. Start the service

```bash
DB_BACKEND=opengauss uvicorn contexthub.main:app --host 0.0.0.0 --port 8000
```

## 6. Notes on installing Python dependencies

With the openGauss backend, the `pgvector` Python package does not need to be installed:

```bash
# openGauss backend
pip install .

# PostgreSQL + pgvector backend
pip install ".[postgres]"
```

## Differences from the PostgreSQL backend

| Feature | PostgreSQL 16 | openGauss 7.0+ |
|------|--------------|----------------|
| Vector extension | pgvector (must be installed) | DataVec (built in) |
| UUID function | `gen_random_uuid()` (pgcrypto) | `uuid_generate_opengauss()` (custom) |
| Vector type `vector(N)` | Supported | Supported |
| Vector distance `<=>` | Supported | Supported |
| HNSW index | Supported | Supported |
| RLS | Supported | Supported |
| `pg_notify`/`LISTEN` | Supported | Not supported |
| asyncpg driver | Supported | Partially supported |
| Connection URL format | `postgresql://` | `postgresql://` |