Skip to content

feat(console): 增加 Usage/Audit Dashboard BFF#2016

Merged
ZaynJarvis merged 3 commits into
mainfrom
feat/console-usage-audit-bff
May 14, 2026
Merged

feat(console): 增加 Usage/Audit Dashboard BFF#2016
ZaynJarvis merged 3 commits into
mainfrom
feat/console-usage-audit-bff

Conversation

@qin-ctx
Copy link
Copy Markdown
Collaborator

@qin-ctx qin-ctx commented May 13, 2026

Description

WIP:本 PR 为 Console 增加内置在 OV Server 内的 Usage/Audit BFF 能力,用于支撑 Console 首页和请求日志类产品化数据展示。

核心思路是复用现有 metrics/observability 打点链路,把同一份请求、模型、检索信号分发给 metrics 和 Usage/Audit 两类消费者:metrics 继续面向 Prometheus/运维观测,Usage/Audit 负责产品侧可查询、可聚合、可持久化的数据。

当前版本先落地本地版 SQLite store,并保留 UsageAuditStore 抽象,后续生产环境可以扩展到 Postgres、ClickHouse 或其他服务端存储。

Related Issue

暂无关联 Issue。

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • 新增 Console BFF API:
    • GET /api/v1/console/dashboard/summary
    • GET /api/v1/console/tokens
    • GET /api/v1/console/context-commits
    • GET /api/v1/console/audit
  • 新增 openviking.observability.events 进程内事件总线,metrics datasource 通过共享事件总线发布事件,metrics collector 和 Usage/Audit subscriber 可以消费同一份信号,避免重复打点。
  • 新增 openviking.observability.usage_audit 模块:
    • SQLite schema 和 store 实现
    • 后台 worker 异步批量写入
    • HTTP request / model usage event projection
    • Dashboard query service
    • Context inventory provider
    • runtime bootstrap / shutdown
  • Dashboard 首屏数据口径:
    • Token:聚合 VLM / embedding / rerank 事件
    • 今日检索:聚合 /api/v1/search/find/api/v1/search/search 请求事件
    • Agent 概览:基于请求事件更新 agent 活跃信息
    • 上下文数量:通过 VikingFS.stat().count 读取业务目录当前状态,不再直接 count vector store,也不通过历史写入事件累计
  • Console proxy 增加明确 allowlist 的 /ov/console/... 转发,避免 wildcard proxy 带来的路径穿越风险。
  • Server config 增加 observability.usage_audit 配置,默认启用 SQLite 本地存储。
  • 新增设计文档和使用文档:
    • docs/design/console-usage-audit-design.md
    • openviking/observability/usage_audit/README.md
  • 新增测试覆盖:
    • Console router 鉴权、错误码和参数拆分
    • Console proxy 转发和路径穿越拒绝
    • 事件总线 fan-out 和 metrics-disabled 场景
    • Usage/Audit runtime、inventory、SQLite 聚合、retention、worker shutdown flush
    • HTTP metrics 现有行为回归

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

本地验证命令:

.venv/bin/python -m ruff check openviking/observability/events.py openviking/observability/usage_audit openviking/server/routers/console.py openviking/console/app.py openviking/metrics/core/runtime.py openviking/metrics/datasources/base.py openviking/metrics/datasources/http.py openviking/metrics/datasources/model_usage.py openviking/metrics/global_api.py openviking/observability/http_observability_middleware.py openviking/server/app.py openviking/server/config.py openviking/server/routers/__init__.py tests/observability tests/misc/test_console_proxy.py tests/metrics/integration/test_http_metrics.py
.venv/bin/python -m pytest tests/observability tests/misc/test_console_proxy.py tests/metrics/integration/test_http_metrics.py

结果:

  • ruff:通过
  • pytest:28 passed

另外,本地启动 OV Server 后做过一轮接口级验证,包括创建 session、写入 messages、调用 search/find、查询 /api/v1/console/* BFF 接口。验证记录写在本地忽略文件 test_scripts/usage-audit-live-api-run.md,未纳入 PR,避免提交一次性运行产物。

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

不适用。本 PR 主要是 OV Server BFF、Usage/Audit 存储和后端聚合逻辑。

Additional Notes

  • PR 先标记为 WIP/Draft,方便继续评审数据口径、生产 store 选型和 Console 页面联调。
  • 当前 SQLite store 主要面向本地版和单进程部署。分布式生产环境不建议多个服务直接共享同一个 SQLite 文件,后续应扩展服务端存储实现。
  • ContextInventoryProvider 使用 VikingFS.stat().count 读取当前状态;业务根目录不存在时按 0 处理,非预期异常记录 warning 后降级为 0。
  • 本地 git commit 的 pre-commit hook 因环境缺少 pre_commit 模块未能执行,已手动完成 ruff 和 pytest 验证后使用 --no-verify 提交。

@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Refactor: Introduce shared observability event bus

Relevant files:

  • openviking/observability/events.py
  • openviking/metrics/global_api.py
  • openviking/metrics/datasources/base.py
  • openviking/metrics/datasources/http.py
  • openviking/metrics/datasources/model_usage.py
  • tests/observability/test_events.py
  • tests/metrics/integration/test_http_metrics.py

Sub-PR theme: Feature: Add Usage/Audit Dashboard BFF with SQLite store

Relevant files:

  • openviking/observability/usage_audit/*
  • openviking/server/routers/console.py
  • openviking/server/config.py
  • openviking/server/app.py
  • tests/observability/*

⚡ Recommended focus areas for review

Perf: Unbounded cache with no expired entry cleanup

The _cache dictionary in ContextInventoryProvider stores entries indefinitely, even after they expire. While entries are checked for TTL when accessed, expired entries are never removed, leading to unbounded memory growth over time with many unique (account_id, user_id, agent_id) tuples.

def __init__(self, service: Any, *, ttl_seconds: float = 10.0) -> None:
    self._service = service
    self._ttl_seconds = max(float(ttl_seconds), 0.0)
    self._cache: dict[tuple[str, str, str], tuple[float, dict[str, int]]] = {}
    self._lock = asyncio.Lock()

async def get_counts(self, ctx: RequestContext) -> dict[str, int]:
    """Return current context counts for the caller's tenant scope."""
    key = (ctx.account_id, ctx.user.user_id, ctx.user.agent_id)
    now = time.monotonic()
    cached = self._cache.get(key)
    if cached and now - cached[0] < self._ttl_seconds:
        return dict(cached[1])

    async with self._lock:
        cached = self._cache.get(key)
        if cached and now - cached[0] < self._ttl_seconds:
            return dict(cached[1])
        counts = await self._read_counts(ctx)
        self._cache[key] = (time.monotonic(), counts)
        return dict(counts)
Suggestion: Missing type annotation for `tz` parameter

The tz parameter in project_events lacks a type annotation, which could lead to type checking errors and reduced code clarity.

def project_events(
    events: Sequence[ObservabilityEvent],
    *,
    tz,
) -> UsageAuditProjection:

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

@qin-ctx qin-ctx marked this pull request as ready for review May 14, 2026 08:33
@qin-ctx qin-ctx force-pushed the feat/console-usage-audit-bff branch from 730c9ad to efb5452 Compare May 14, 2026 08:33
@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

@qin-ctx qin-ctx changed the title WIP: feat(console): 增加 Usage/Audit Dashboard BFF feat(console): 增加 Usage/Audit Dashboard BFF May 14, 2026
Copy link
Copy Markdown
Collaborator

@ZaynJarvis ZaynJarvis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ZaynJarvis ZaynJarvis merged commit 9d36b2f into main May 14, 2026
5 checks passed
@ZaynJarvis ZaynJarvis deleted the feat/console-usage-audit-bff branch May 14, 2026 10:19
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants