[eval] CI: compare eval run against stored baseline #388

@thorrester

Part of #380
Depends on: #383, #384

What

Add logic for CI workflows in which a team runs evaluations on the current version and compares the results against a previously stored baseline run.

  • Python API: compare(current_result, baseline) where baseline can be specified as:
    • Service version (e.g. service_version="1.2.0")
    • Space/name/version triple
    • Specific eval run uid
  • Server route to fetch a stored eval result by the above identifiers
  • Comparison output: per-metric delta, pass/fail regression verdict
  • CLI: opsml eval compare --baseline-version 1.2.0 for use in CI scripts
  • The CLI should return a non-zero exit code on regression, so it can act as a CI gate
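
To make the requirements above concrete, here is a minimal sketch of the comparison logic. It is an illustration under stated assumptions, not the opsml API: `BaselineRef`, `MetricDelta`, `Comparison`, `compare_metrics`, and the `tolerance` parameter are all hypothetical names, eval results are assumed to be flat metric-name-to-float mappings, and every metric is assumed to be "higher is better". Fetching the stored baseline from the server is out of scope here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BaselineRef:
    """Hypothetical baseline selector covering the three forms in the issue."""
    service_version: Optional[str] = None
    space: Optional[str] = None
    name: Optional[str] = None
    version: Optional[str] = None
    uid: Optional[str] = None

    def kind(self) -> str:
        """Return which form was supplied; exactly one must be given."""
        triple = (self.space, self.name, self.version)
        given = [part is not None for part in triple]
        if any(given) and not all(given):
            raise ValueError("space, name and version must be given together")
        forms = {
            "service_version": self.service_version is not None,
            "triple": all(given),
            "uid": self.uid is not None,
        }
        chosen = [form for form, present in forms.items() if present]
        if len(chosen) != 1:
            raise ValueError("specify exactly one baseline form")
        return chosen[0]


@dataclass(frozen=True)
class MetricDelta:
    name: str
    baseline: float
    current: float
    delta: float       # current - baseline
    regressed: bool


@dataclass(frozen=True)
class Comparison:
    deltas: List[MetricDelta]

    @property
    def regressed(self) -> bool:
        # The overall verdict fails if any single metric regressed.
        return any(d.regressed for d in self.deltas)

    @property
    def exit_code(self) -> int:
        # Non-zero on regression, so a CI step can gate on it.
        return 1 if self.regressed else 0


def compare_metrics(
    current: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.0,
) -> Comparison:
    """Compute per-metric deltas for metrics present in both runs.

    Assumption: higher is better, so a metric regresses when it drops
    by more than `tolerance` relative to the baseline.
    """
    deltas = []
    for name in sorted(current.keys() & baseline.keys()):
        delta = current[name] - baseline[name]
        deltas.append(
            MetricDelta(name, baseline[name], current[name], delta, delta < -tolerance)
        )
    return Comparison(deltas)
```

In a CI script, the result could be turned into a gate with `sys.exit(compare_metrics(current, baseline).exit_code)`. How metrics that exist in only one run should be treated, and whether "lower is better" metrics need a per-metric direction, are open design questions this sketch does not settle.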

Labels: Backend (requires backend Rust and Python work), enhancement (new feature or request)
