
#847 chore(deps): Add OceanBase integration example and documentation #1080

Open
flying-dragon-ai wants to merge 6 commits into
areal-project:mainfrom
flying-dragon-ai:main

Conversation

@flying-dragon-ai

Description

Related Issue

Fixes #(issue)

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

flying-dragon-ai and others added 6 commits January 31, 2026 14:57
- Added pymysql to the dependency list in pyproject.toml
- Enables upcoming database-related functionality
- Keeps dependencies consistent and complete
- Add OceanBaseMetricsLogger class for metrics persistence
  - Database connection with environment variable support
  - Table creation with proper indexes
  - Metric insertion with error handling
  - Query examples for verification
- Add comprehensive quickstart guide
  - OceanBase introduction and Docker deployment
  - Connection configuration and troubleshooting
  - Two integration approaches (direct + custom)
  - Common SQL queries and performance optimization
- Add pymysql dependency to pyproject.toml
- Update README with tutorial link

Closes: OceanBase integration feature request
Replace `Optional[X]` with `X | None` syntax (Python 3.10+) in
oceanbase_example.py to comply with the ruff UP045 rule.

Changes:
- Remove unused `typing.Optional` import
- Update connection type annotation
- Update insert_metric parameter annotations
- Import typing.Optional to replace union-type annotations
- Change pymysql.Connection | None to Optional[pymysql.Connection]
- Change float | None parameters to Optional[float]
- Improve type consistency and readability
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances AReaL by providing an integration example and documentation for OceanBase, a distributed database suitable for storing large-scale training metrics. It also introduces comprehensive documentation for several core modules, offering a clearer understanding of the codebase structure and functionality.

Highlights

  • OceanBase Integration: This PR adds an example and documentation for integrating OceanBase, an open-source distributed database, with AReaL for persistent storage of training metrics.
  • New Documentation: Introduces a quickstart guide for OceanBase integration, covering installation, configuration, usage examples, common queries, troubleshooting, and advanced configurations.
  • Example Script: Includes a Python script demonstrating how to connect to OceanBase, create a table for training metrics, insert data, and query the database.
  • Codebase Overview: Adds CLAUDE.md files to provide documentation and overview of the areal.api, areal.core, areal.engine, areal.workflow, and examples directories.
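The connection step summarized above (the commit message mentions "environment variable support") can be sketched as follows. This is a minimal sketch: the `OCEANBASE_*` variable names and their defaults are hypothetical, since the PR's actual names are not shown here, and port 2881 is assumed to be the OceanBase server's MySQL-protocol port in a local Docker deployment. The keyword names match `pymysql.connect()`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical variable names and defaults, for illustration only.
DEFAULTS = {
    "OCEANBASE_HOST": "127.0.0.1",
    "OCEANBASE_PORT": "2881",  # assumed OceanBase MySQL-protocol port
    "OCEANBASE_USER": "root",
    "OCEANBASE_PASSWORD": "",
    "OCEANBASE_DATABASE": "test",
}


def connection_kwargs(env: Mapping[str, str] | None = None) -> dict[str, object]:
    """Build keyword arguments for pymysql.connect() from environment variables."""
    source = os.environ if env is None else env

    def read(key: str) -> str:
        # Fall back to the hypothetical default when the variable is unset
        return source.get(key, DEFAULTS[key])

    return {
        "host": read("OCEANBASE_HOST"),
        "port": int(read("OCEANBASE_PORT")),  # pymysql expects an int port
        "user": read("OCEANBASE_USER"),
        "password": read("OCEANBASE_PASSWORD"),
        "database": read("OCEANBASE_DATABASE"),
        "charset": "utf8mb4",
    }


# Usage (requires a running OceanBase instance and `pip install pymysql`):
#   import pymysql
#   conn = pymysql.connect(**connection_kwargs())
```

Accepting an explicit mapping keeps the function testable without touching the real process environment.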

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist Bot left a comment


Code Review

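This PR adds an OceanBase integration example and related documentation, and the overall implementation is good. The example code is clear and the documentation is comprehensive.
I have a few suggestions for improvement:

  1. In oceanbase_example.py, use batch inserts to improve performance; this is an important best practice for logging workloads.
  2. In oceanbase_quickstart.md, fix the incorrect DDL in the partitioned-table example so that it executes successfully.

These changes will make the example code more efficient and ensure that the examples in the documentation are accurate.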

loss FLOAT,
reward FLOAT,
timestamp DATETIME NOT NULL,
PRIMARY KEY (id, timestamp)
Contributor


high

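In OceanBase (and most MySQL-compatible databases), every unique key of a partitioned table, including the primary key, must contain all partitioning-key columns. In the current example the table is partitioned by timestamp, but the primary key (id, timestamp) is defined incorrectly because the partitioning function does not include the id column. This will cause the CREATE TABLE statement to fail.

To fix this, make the partitioning key timestamp the first part of the primary key.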

Suggested change
PRIMARY KEY (id, timestamp)
PRIMARY KEY (timestamp, id)
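The rule stated in this comment (every unique key, including the primary key, must contain all partitioning-key columns) can be expressed as a simple membership check. A minimal sketch, assuming the partition columns and unique keys are already known as plain lists of column names; the function name is hypothetical. It checks only which columns each key contains, which is what the rule as stated concerns, and does not model any other engine-specific DDL constraints.

```python
from collections.abc import Iterable, Sequence


def unique_keys_cover_partition_columns(
    partition_columns: Iterable[str],
    unique_keys: Iterable[Sequence[str]],
) -> bool:
    """Return True if every unique key (primary key included) contains
    every partitioning column. Names are compared case-insensitively,
    as MySQL column names are."""
    required = {name.lower() for name in partition_columns}
    return all(
        required <= {name.lower() for name in key}  # subset test per key
        for key in unique_keys
    )


# The review's example: the table is partitioned by `timestamp`.
partition_by = ["timestamp"]
print(unique_keys_cover_partition_columns(partition_by, [("id", "timestamp")]))  # True
print(unique_keys_cover_partition_columns(partition_by, [("timestamp", "id")]))  # True
print(unique_keys_cover_partition_columns(partition_by, [("id",)]))              # False
```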

Comment on lines +191 to +198
logger.info("Inserting sample training metrics...")
for step in range(1, 6):
    metrics_logger.insert_metric(
        experiment_name="gsm8k_grpo_demo",
        step=step * 100,
        loss=1.5 - step * 0.2,
        reward=0.5 + step * 0.1,
    )
Contributor


medium

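Inserting metrics one row at a time in a loop is inefficient and causes many database round trips. Use batch inserts (executemany) instead to improve performance; this matters especially for logging workloads. Your own documentation, oceanbase_quickstart.md, also recommends batch inserts as a performance optimization.

The suggestion below implements batch insertion directly in the main function. For better encapsulation, consider moving this batch-insert logic into a new method of the OceanBaseMetricsLogger class (for example, insert_metrics_batch).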

        logger.info("Generating and batch-inserting sample training metrics...")
        metrics_to_insert = [
            (
                "gsm8k_grpo_demo",
                step * 100,
                1.5 - step * 0.2,
                0.5 + step * 0.1,
                datetime.now(),  # requires `from datetime import datetime` at module top
            )
            for step in range(1, 6)
        ]
        if metrics_to_insert and metrics_logger.connection:
            with metrics_logger.connection.cursor() as cursor:
                insert_sql = """
                INSERT INTO training_metrics
                (experiment_name, step, loss, reward, timestamp)
                VALUES (%s, %s, %s, %s, %s)
                """
                cursor.executemany(insert_sql, metrics_to_insert)
            # PyMySQL does not autocommit by default; commit so the rows persist
            metrics_logger.connection.commit()
            logger.info(f"Batch-inserted {len(metrics_to_insert)} metrics")
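The reviewer also proposes moving this logic into an `insert_metrics_batch` method on `OceanBaseMetricsLogger`. A minimal sketch of such a method, assuming the logger holds a DB-API connection in a `connection` attribute (as the suggestion above does) and that the `training_metrics` columns match the INSERT statement above. The class body is a hypothetical stand-in, not the PR's implementation.

```python
from __future__ import annotations

from collections.abc import Sequence
from datetime import datetime
from typing import Any

# Column list taken from the reviewer's suggested INSERT statement.
INSERT_SQL = (
    "INSERT INTO training_metrics "
    "(experiment_name, step, loss, reward, timestamp) "
    "VALUES (%s, %s, %s, %s, %s)"
)


class OceanBaseMetricsLogger:
    """Hypothetical minimal stand-in; only the batch method is sketched."""

    def __init__(self, connection: Any | None) -> None:
        # Any DB-API 2.0 connection using the %s paramstyle (e.g. PyMySQL)
        self.connection = connection

    def insert_metrics_batch(
        self,
        experiment_name: str,
        metrics: Sequence[tuple[int, float | None, float | None]],
    ) -> int:
        """Insert (step, loss, reward) rows with one executemany call.

        Returns the number of rows sent, or 0 if there is nothing to do.
        """
        if self.connection is None or not metrics:
            return 0
        timestamp = datetime.now()  # one timestamp shared by the whole batch
        rows = [
            (experiment_name, step, loss, reward, timestamp)
            for step, loss, reward in metrics
        ]
        try:
            with self.connection.cursor() as cursor:
                cursor.executemany(INSERT_SQL, rows)
            self.connection.commit()  # PyMySQL does not autocommit by default
        except Exception:
            # Discard the partial batch, then surface the error to the caller
            self.connection.rollback()
            raise
        return len(rows)
```

With this method, the per-step loop in `main` could become a single call such as `metrics_logger.insert_metrics_batch("gsm8k_grpo_demo", [(step * 100, 1.5 - step * 0.2, 0.5 + step * 0.1) for step in range(1, 6)])`. Sharing one timestamp per batch is a choice made in this sketch; per-row `datetime.now()` values, as in the suggestion above, work equally well.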

@github-actions

github-actions Bot commented Apr 7, 2026

This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days.

Please add a comment or push new commits to keep it active.

Thank you for your contribution!

@github-actions github-actions Bot added the stale label Apr 7, 2026
@garrett4wade
Collaborator

Hi @flying-dragon-ai, the OceanBase logger would be a great feature! Could you please:
(1) integrate OceanBase directly into @areal/utils/stats_logger.py as an additional logging backend?
(2) clean up the new CLAUDE.md files and the Chinese comments, which are not directly related to this PR?

@github-actions

github-actions Bot commented May 8, 2026

This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days.

Please add a comment or push new commits to keep it active.

Thank you for your contribution!

@github-actions github-actions Bot added the stale label May 8, 2026
