Part of #380
Depends on: #383, #384
What
Logic for CI workflows where a team runs evaluations on the current version and compares against a previously stored run.
- Python API:
compare(current_result, baseline) where baseline can be specified as:
- Service version (e.g.
service_version="1.2.0")
- Space/name/version triple
- Specific eval run uid
- Server route to fetch a stored eval result by the above identifiers
- Comparison output: per-metric delta, pass/fail regression verdict
- CLI:
opsml eval compare --baseline-version 1.2.0 for use in CI scripts
- Should return a non-zero exit code on regression (for CI gate use)
Part of #380
Depends on: #383, #384
What
Logic for CI workflows where a team runs evaluations on the current version and compares against a previously stored run.
compare(current_result, baseline)where baseline can be specified as:service_version="1.2.0")opsml eval compare --baseline-version 1.2.0for use in CI scripts