[WIP] [Feature]: Add openGauss DB Backend#29

Open: initzhang wants to merge 6 commits into The-AI-Framework-and-Data-Tech-Lab-HK:main from initzhang:main
@initzhang

Adding openGauss Backend for ContextHub

Part 1. Configure the openGauss Server

  • Follow docs/setup/opengauss-setup-guide-zh.md to set up the new server backend.
  • When executing CREATE DATABASE, set DBCOMPATIBILITY = 'PG'. Without this setting, empty strings are interpreted as NULL.

Part 2. Extension Replacement

  • The original ContextHub project relies on the pgvector and pgcrypto extensions, neither of which is supported by openGauss.
  • Solution: openGauss 7.0.0 natively supports DataVec, which replaces pgvector. openGauss does not support pgcrypto, and the alternative, uuid-ossp, is missing from the official Docker image, so a custom gen_random_uuid function was written to remove the dependency on pgcrypto.

Part 3. Python Driver Compatibility

  • ContextHub originally used the asyncpg library to talk to the database; however, asyncpg does not support openGauss's vector data type.
    • Error message: unhandled standard data type 'vector' (OID 8305). This can be reproduced with the script opengauss/vector_asyncpg.py.
    • openGauss maintains its own async_gaussdb library, but it also lacks support for the vector type (verified with opengauss/vector_async_gaussdb.py).
  • Solution: a database compatibility layer. The Postgres backend continues to use asyncpg, while the openGauss backend uses the psycopg3 driver.
    • Since psycopg3 uses %s for positional parameters instead of asyncpg's $n syntax, regular expressions are used for automatic conversion.
    • The compatibility layer handles syntax translation and exposes unified fetch, fetchrow, fetchall, and execute interfaces.
    • Implementation details can be found in src/contexthub/db/repository.py.
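
The placeholder conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/contexthub/db/repository.py: the function name `to_psycopg` is hypothetical, and the sketch assumes that `$n` never appears inside string literals or dollar-quoted bodies. Because `%s` placeholders are consumed strictly in order, the arguments are re-listed in the order the `$n` markers occur, and literal `%` characters are doubled so that psycopg does not read them as placeholders.

```python
import re
from typing import Any, List, Sequence, Tuple

# Matches asyncpg-style numbered placeholders such as $1, $2, $10.
_PLACEHOLDER = re.compile(r"\$(\d+)")


def to_psycopg(query: str, args: Sequence[Any]) -> Tuple[str, List[Any]]:
    """Rewrite $n placeholders as %s and reorder args to match (hypothetical helper)."""
    ordered: List[Any] = []

    def _swap(match: re.Match) -> str:
        # $n is 1-based; append the matching argument in order of appearance,
        # so a repeated or out-of-order $n still binds to the right value.
        ordered.append(args[int(match.group(1)) - 1])
        return "%s"

    # Double literal percent signs first, then swap the placeholders.
    converted = _PLACEHOLDER.sub(_swap, query.replace("%", "%%"))
    return converted, ordered


sql, params = to_psycopg(
    "SELECT id FROM contexts WHERE owner = $2 AND kind = $1 AND editor = $2",
    ["skill", "alice"],
)
```

Here `sql` uses three `%s` markers and `params` is `["alice", "skill", "alice"]`. The table and column names are made up for the example.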

Part 4. SQL Dialect Rewriting

  • ContextHub originally used PostgreSQL-dialect SQL, which contains features incompatible with openGauss.
  • For example, openGauss does not support PostgreSQL's INSERT ... ON CONFLICT, which must be rewritten as ON DUPLICATE KEY UPDATE. The rewritten upsert cannot be combined with a RETURNING clause or with ROW POLICY.
  • More than 20 SQL statements across the project require rewriting. Details are in opengauss-compatibility-report.md.
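
To make the ON CONFLICT rewrite concrete, here is a rough sketch of a mechanical translation for the simplest cases. The helper name `rewrite_upsert` is hypothetical and not part of this PR. It assumes that openGauss accepts `ON DUPLICATE KEY UPDATE col = ...` and `ON DUPLICATE KEY UPDATE NOTHING` with no explicit conflict target. Statements that also use RETURNING are rejected rather than rewritten, since they need a manual rewrite.

```python
import re

# ON CONFLICT with an optional "(col, ...)" target, then DO UPDATE SET or DO NOTHING.
_ON_CONFLICT = re.compile(
    r"ON\s+CONFLICT\s*(?:\([^)]*\))?\s*DO\s+(UPDATE\s+SET|NOTHING)",
    re.IGNORECASE,
)
_RETURNING = re.compile(r"\bRETURNING\b", re.IGNORECASE)


def rewrite_upsert(sql: str) -> str:
    """Translate a simple PostgreSQL upsert to the openGauss form (hypothetical helper)."""
    if _ON_CONFLICT.search(sql) is None:
        return sql  # not an upsert; leave untouched
    if _RETURNING.search(sql):
        # The upsert cannot be combined with RETURNING, so flag it for a manual rewrite.
        raise ValueError("upsert with RETURNING needs a manual rewrite")

    def _swap(match: re.Match) -> str:
        if match.group(1).upper() == "NOTHING":
            return "ON DUPLICATE KEY UPDATE NOTHING"  # assumed openGauss syntax
        return "ON DUPLICATE KEY UPDATE"

    return _ON_CONFLICT.sub(_swap, sql)


rewritten = rewrite_upsert(
    "INSERT INTO skills (id, name) VALUES (%s, %s) "
    "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name"
)
```

A regex like this only covers statements with a plain column-list conflict target. Anything using ON CONFLICT ON CONSTRAINT or a WHERE clause on the target would still need to be rewritten by hand.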

Overall Completion Status

  • Parts 1, 2, and 3: 100% complete. The demo script opengauss/demo_e2e_opengauss.py currently runs its first three steps successfully.
  • Part 4: 50% complete. The SQL for the fourth step of the demo is being rewritten; see the FIXME tags in ContextHub/src/contexthub/services/skill_service.py for details.
