feat: extract evaluation framework into uipath-eval package by Chibionos · Pull Request #1710 · UiPath/uipath-python

Chibionos · 2026-06-11T05:10:38Z

Summary

Extract the evaluation framework (uipath.eval) into a new standalone distribution uipath-eval (packages/uipath-eval), so consumers — primarily the python eval worker in the agents backend — can depend on the evaluators, mocking system, and eval runtime without pulling in the CLI and the rest of the SDK. Today that worker pins the entire uipath SDK just to run evaluators.

Supersedes the goals of #1040 (closed as stale); the strategy-pattern reporting refactor follows as a separate PR on top of this extraction.

What moved

packages/uipath/src/uipath/eval/ → packages/uipath-eval/src/uipath/eval/ (namespace package, same pattern as uipath-platform/uipath-core: no __init__.py at src/uipath/, py.typed marker included). Import paths are unchanged — from uipath.eval... works exactly as before for every existing consumer.
Pure-eval tests (731) move with the code. CLI-coupled eval tests (discovery, telemetry, progress reporter, live tracking span processor) stay in packages/uipath, as do the CLI progress reporters in _cli/_evals/.
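The namespace-package mechanism described above can be illustrated with a minimal, self-contained sketch. It does not touch the real `uipath` packages: `demo_ns`, `dist-main`, and `dist-eval` are hypothetical names, and the two temporary source roots stand in for two installed distributions. The only assumption carried over from the PR is that the shared top-level directory has no `__init__.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, subpackage: str, marker: str) -> Path:
    """Create <root>/src/demo_ns/<subpackage>/__init__.py and return <root>/src.

    demo_ns/ itself gets no __init__.py, which is what makes it a namespace
    package that several distributions can contribute to.
    """
    src = root / "src"
    package_dir = src / "demo_ns" / subpackage
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(f"NAME = {marker!r}\n")
    return src


def load_merged_namespace() -> tuple[str, str]:
    """Import two subpackages of one namespace from two separate source roots."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [
            str(make_distribution(Path(tmp) / "dist-main", "cli", "cli")),
            str(make_distribution(Path(tmp) / "dist-eval", "eval", "eval")),
        ]
        sys.path[:0] = roots
        try:
            importlib.invalidate_caches()
            cli = importlib.import_module("demo_ns.cli")
            evaluation = importlib.import_module("demo_ns.eval")
            return cli.NAME, evaluation.NAME
        finally:
            # Undo the path and module-cache changes so the demo leaves no trace.
            for root in roots:
                sys.path.remove(root)
            for name in [m for m in sys.modules if m.split(".")[0] == "demo_ns"]:
                del sys.modules[name]


if __name__ == "__main__":
    print(load_merged_namespace())
```

Both imports resolve even though the subpackages live under different roots. This is the property that lets `from uipath.eval...` keep working after the move.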

Changes required by the split

The three legacy evaluators imported COMMUNITY_agents_SUFFIX from uipath._utils.constants via a relative import — the only entanglement with non-extracted SDK internals. They now use the constant that already existed in uipath.eval._helpers.evaluators_helpers.
mockito, coverage move from uipath's dependencies to uipath-eval's (only eval code uses them). pydantic-function-models stays in both (cli_server.py preloads it).

Versions & dependency chain

uipath 2.10.82 → uipath-eval 0.1.0 → uipath-platform / uipath-runtime / uipath-core
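As a rough illustration of the chain above, the sketch below derives a build order in which every package comes after its dependencies. The graph is an assumption reconstructed from this description: the `uipath-platform` → `uipath-core` edge is inferred from the publish tiers, and `uipath-runtime` is treated as a leaf because its own dependencies are not stated here.

```python
from graphlib import TopologicalSorter

# Package -> packages it depends on (illustrative, not read from pyproject.toml).
DEPENDS_ON: dict[str, set[str]] = {
    "uipath": {"uipath-eval"},
    "uipath-eval": {"uipath-core", "uipath-platform", "uipath-runtime"},
    "uipath-platform": {"uipath-core"},  # assumption inferred from the publish tiers
    "uipath-core": set(),
    "uipath-runtime": set(),  # assumption: treated as a leaf
}


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return the packages so that each one appears after all of its dependencies."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(build_order(DEPENDS_ON)))
```

The relative order of independent packages (here `uipath-core` and `uipath-runtime`) is not fixed; only the dependency constraints are guaranteed.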

CI / release wiring

detect_changed_packages.py: uipath-eval added to the dependents graph (core/platform changes test eval; eval changes test uipath)
test-packages.yml / lint-packages.yml: dedicated uipath-eval jobs (same matrix as platform)
cd.yml: new publish tier between platform and uipath (core → platform → eval → uipath), with wait-for-uipath-eval gating the uipath build
labeler.yml: uipath-eval source globs added to the langchain/integration test triggers
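The dependents-graph idea can be sketched as follows. This is not the contents of `detect_changed_packages.py`: the `DEPENDENTS` mapping and the `packages_to_test` function are hypothetical, and the sketch assumes changes propagate transitively, which the description above does not confirm.

```python
from collections import deque

# Package -> packages that depend on it directly (hypothetical mapping).
DEPENDENTS: dict[str, list[str]] = {
    "uipath-core": ["uipath-platform", "uipath-eval"],
    "uipath-platform": ["uipath-eval"],
    "uipath-eval": ["uipath"],
    "uipath": [],
}


def packages_to_test(changed: list[str]) -> list[str]:
    """Return the changed packages plus everything downstream of them."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        package = queue.popleft()
        for dependent in DEPENDENTS.get(package, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    print(packages_to_test(["uipath-eval"]))
    print(packages_to_test(["uipath-platform"]))
```

Under these assumptions, a change to `uipath-eval` selects `uipath` and `uipath-eval`, while a change to `uipath-platform` also pulls in `uipath-eval` and, transitively, `uipath`.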

Validation

uipath-eval: 731 tests pass; ruff, ruff format, mypy clean; wheel + sdist build
uipath: 1200 tests pass; ruff, custom httpx linter, mypy (src + tests) clean; wheel builds; uipath --help and uipath eval --help smoke-tested against the new layout
Namespace merge verified: import uipath; import uipath.eval.evaluators; from uipath.eval.runtime import evaluate resolves across the two distributions
All four edited workflow/labeler YAMLs parse

🤖 Generated with Claude Code

Move src/uipath/eval to a new namespace package distribution (packages/uipath-eval, import path unchanged: uipath.eval) so the evaluation framework (evaluators, mocking, eval runtime) can be consumed standalone, e.g. by the python eval worker in the agents backend, without pulling in the CLI and the rest of the SDK.

- uipath-eval 0.1.0: depends only on uipath-core, uipath-platform, uipath-runtime (+ mockito, pydantic-function-models, coverage, which move out of the main package's dependencies)
- uipath 2.10.82 depends on uipath-eval>=0.1.0,<0.2.0; editable link via [tool.uv.sources]
- pure-eval tests move with the code (731 tests); CLI-coupled eval tests (discovery, telemetry, progress reporter, live tracking) stay in packages/uipath
- the three legacy evaluators' relative import of uipath._utils.constants now uses the eval-local constant
- CI: detect_changed_packages dependency graph, test/lint jobs, cd.yml publish tier (core -> platform -> eval -> uipath), labeler

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

sonarqubecloud · 2026-06-11T05:13:37Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot

Pull request overview

Extracts the uipath.eval evaluation framework into a standalone uipath-eval distribution (while preserving from uipath.eval... import paths) and wires the monorepo/CI/release pipeline so uipath depends on uipath-eval instead of bundling eval internals directly.

Changes:

Adds a new packages/uipath-eval package containing evaluators, models, mocks/simulation, and the eval runtime (+ its test suite).
Updates uipath to depend on uipath-eval (and shifts eval-only deps like mockito/coverage accordingly).
Updates CI and CD workflows/scripts to test, lint, and publish uipath-eval in the correct dependency tier.

Reviewed changes

Copilot reviewed 34 out of 130 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
packages/uipath/pyproject.toml	Add `uipath-eval` dependency; bump `uipath` version.
packages/uipath/uv.lock	Lockfile updates: add editable `uipath-eval`; move deps; bump `uipath` version.
packages/uipath-eval/README.md	New package README documenting modules and usage.
packages/uipath-eval/pyproject.toml	New package metadata, deps, tooling config, pytest/mypy/ruff settings.
packages/uipath-eval/.python-version	Pin package dev Python version.
packages/uipath-eval/CLAUDE.md	Package-specific dev notes and constraints.
CLAUDE.md	Repo doc updated for new 4th package and dependency chain.
.github/workflows/test-packages.yml	Add `uipath-eval` test matrix job.
.github/workflows/lint-packages.yml	Add `uipath-eval` lint/typecheck job.
.github/workflows/cd.yml	Add build/publish/wait tier for `uipath-eval` between platform and `uipath`.
.github/scripts/detect_changed_packages.py	Add `uipath-eval` to dependents graph.
.github/labeler.yml	Include `uipath-eval` globs for integration/langchain triggers.
packages/uipath-eval/src/uipath/eval/py.typed	Mark package as typed.
packages/uipath-eval/src/uipath/eval/constants.py	Eval package constants (folder names, custom prefix).
packages/uipath-eval/src/uipath/eval/_execution_context.py	Shared contextvars + span collector for runtime/mocks.
packages/uipath-eval/src/uipath/eval/_helpers/__init__.py	Helpers package init.
packages/uipath-eval/src/uipath/eval/_helpers/helpers.py	Helper utilities (e.g., emptiness checks, metrics wrapper).
packages/uipath-eval/src/uipath/eval/_helpers/output_path.py	Utility for resolving nested output paths (`a.b[0]`).
packages/uipath-eval/src/uipath/eval/_helpers/evaluators_helpers.py	Evaluator helper functions/constants used across evaluators.
packages/uipath-eval/src/uipath/eval/models/__init__.py	Public exports for eval models.
packages/uipath-eval/src/uipath/eval/models/models.py	Core eval result/trace models (minor typing tweak in diff).
packages/uipath-eval/src/uipath/eval/models/_conversational_utils.py	Conversational eval input/output helpers.
packages/uipath-eval/src/uipath/eval/models/evaluation_set.py	Eval set + item models (incl GUID id normalization).
packages/uipath-eval/src/uipath/eval/models/llm_judge_types.py	LLM judge prompt/output schema models.
packages/uipath-eval/src/uipath/eval/mocks/__init__.py	Public exports for mocks/simulation API.
packages/uipath-eval/src/uipath/eval/mocks/mockable.py	`@mockable` decorator for mocking/simulation.
packages/uipath-eval/src/uipath/eval/mocks/_types.py	Pydantic schemas for mocking/simulation config.
packages/uipath-eval/src/uipath/eval/mocks/_mocker.py	Mocker interface + mock-related exceptions.
packages/uipath-eval/src/uipath/eval/mocks/_mocker_factory.py	Factory to select LLM vs mockito mocker.
packages/uipath-eval/src/uipath/eval/mocks/_mockito_mocker.py	Mockito-backed mocker implementation.
packages/uipath-eval/src/uipath/eval/mocks/_llm_mocker.py	LLM tool-response mocking implementation.
packages/uipath-eval/src/uipath/eval/mocks/_input_mocker.py	LLM input-generation mocking implementation.
packages/uipath-eval/src/uipath/eval/mocks/_cache_manager.py	Cache manager for mocker responses (memory + disk).
packages/uipath-eval/src/uipath/eval/mocks/_mock_context.py	Contextvars + helpers for mock resolution/simulation checks.
packages/uipath-eval/src/uipath/eval/mocks/_mock_runtime.py	Runtime delegate wrapping execution with mock context.
packages/uipath-eval/src/uipath/eval/mocks/_structured_output.py	Structured-output helper used by mocking.
packages/uipath-eval/src/uipath/eval/helpers.py	Eval set loading/migration + evaluator loading helpers.
packages/uipath-eval/src/uipath/eval/runtime/__init__.py	Runtime public API re-exports (`evaluate`, context, types).
packages/uipath-eval/src/uipath/eval/runtime/context.py	`UiPathEvalContext` container for runtime execution.
packages/uipath-eval/src/uipath/eval/runtime/events.py	Event types + payload models for eval progress reporting.
packages/uipath-eval/src/uipath/eval/runtime/_evaluate.py	`evaluate()` entrypoint wrapper around `UiPathEvalRuntime`.
packages/uipath-eval/src/uipath/eval/runtime/runtime.py	Main eval runtime implementation.
packages/uipath-eval/src/uipath/eval/runtime/_parallelization.py	Async worker-queue parallel execution helper.
packages/uipath-eval/src/uipath/eval/runtime/_utils.py	Input override merging utilities.
packages/uipath-eval/src/uipath/eval/runtime/_types.py	Runtime result DTOs/types.
packages/uipath-eval/src/uipath/eval/runtime/_spans.py	Span persistence/extraction utilities.
packages/uipath-eval/src/uipath/eval/runtime/_exporters.py	Trace/log exporters integration.
packages/uipath-eval/src/uipath/eval/evaluators/__init__.py	Evaluator exports + `EVALUATORS` registry.
packages/uipath-eval/src/uipath/eval/evaluators/evaluator.py	Discriminated unions for coded vs legacy evaluators.
packages/uipath-eval/src/uipath/eval/evaluators/evaluator_factory.py	Factory for loading built-in and custom evaluators.
packages/uipath-eval/src/uipath/eval/evaluators/registration.py	CLI support for registering custom evaluators/types.
packages/uipath-eval/src/uipath/eval/evaluators/base_legacy_evaluator.py	Legacy evaluator base + line-by-line support.
packages/uipath-eval/src/uipath/eval/evaluators/legacy_deterministic_evaluator_base.py	Shared deterministic evaluator utilities (canonical JSON).
packages/uipath-eval/src/uipath/eval/evaluators/legacy_exact_match_evaluator.py	Legacy deterministic exact match evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/legacy_json_similarity_evaluator.py	Legacy deterministic JSON similarity evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/legacy_llm_helpers.py	Legacy LLM function-calling helper utilities.
packages/uipath-eval/src/uipath/eval/evaluators/legacy_llm_as_judge_evaluator.py	Legacy LLM-as-judge evaluator (split helpers/const use).
packages/uipath-eval/src/uipath/eval/evaluators/legacy_trajectory_evaluator.py	Legacy trajectory evaluator (split helpers/const use).
packages/uipath-eval/src/uipath/eval/evaluators/legacy_evaluator_utils.py	Legacy evaluator utilities (const import change).
packages/uipath-eval/src/uipath/eval/evaluators/legacy_context_precision_evaluator.py	Legacy context precision evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/legacy_faithfulness_evaluator.py	Legacy faithfulness evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/legacy_csv_exact_match_evaluator.py	Legacy CSV exact match evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/attachment_utils.py	Job-attachment URI download helpers.
packages/uipath-eval/src/uipath/eval/evaluators/line_by_line_utils.py	Line-by-line evaluation utilities used by legacy evaluators.
packages/uipath-eval/src/uipath/eval/evaluators/exact_match_evaluator.py	Coded exact-match evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/contains_evaluator.py	Coded contains evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/json_similarity_evaluator.py	Coded JSON similarity evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/llm_as_judge_evaluator.py	Coded LLM-as-judge core logic.
packages/uipath-eval/src/uipath/eval/evaluators/llm_judge_output_evaluator.py	Coded LLM judge output evaluators.
packages/uipath-eval/src/uipath/eval/evaluators/llm_judge_trajectory_evaluator.py	Coded LLM judge trajectory evaluators.
packages/uipath-eval/src/uipath/eval/evaluators/binary_classification_evaluator.py	Binary classification evaluator + aggregation.
packages/uipath-eval/src/uipath/eval/evaluators/multiclass_classification_evaluator.py	Multiclass classification evaluator + aggregation.
packages/uipath-eval/src/uipath/eval/evaluators/tool_call_order_evaluator.py	Tool call order evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/tool_call_args_evaluator.py	Tool call args evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/tool_call_count_evaluator.py	Tool call count evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/tool_call_output_evaluator.py	Tool call output evaluator.
packages/uipath-eval/src/uipath/eval/evaluators/base_evaluator.py	Core coded evaluator base infrastructure.
packages/uipath-eval/src/uipath/eval/evaluators/output_evaluator.py	Output extraction + aggregation helpers for coded evaluators.
packages/uipath-eval/src/uipath/eval/evaluators_types/generate_types.py	Script to generate JSON type specs.
packages/uipath-eval/src/uipath/eval/evaluators_types/*.json	Generated evaluator config/criteria/justification schemas.
packages/uipath-eval/tests/evaluators/__init__.py	Tests package init.
packages/uipath-eval/tests/evaluators/test_output_path.py	Tests for nested output-path resolution.
packages/uipath-eval/tests/evaluators/test_helpers.py	Tests for helper utilities (e.g., `is_empty_value`).
packages/uipath-eval/tests/evaluators/test_legacy_trajectory_evaluator.py	Regression test for legacy trajectory prompt compaction.
packages/uipath-eval/tests/evaluators/test_evaluator_factory.py	EvaluatorFactory tests (incl config prep and loading).
packages/uipath-eval/tests/evaluators/test_attachment_utils.py	Tests for attachment URI parsing/downloading helpers.
packages/uipath-eval/tests/evaluators/test_documentation_examples.py	Documentation example coverage tests.
packages/uipath-eval/tests/evaluators/test_eval_level_expected_output.py	Tests around expected output placement.
packages/uipath-eval/tests/evaluators/test_evaluator_aggregation.py	Aggregation behavior tests for evaluators.
packages/uipath-eval/tests/evaluators/test_evaluator_helpers.py	Tests for evaluator helper functions.
packages/uipath-eval/tests/evaluators/test_evaluator_methods.py	Broad evaluator behavior tests.
packages/uipath-eval/tests/evaluators/test_evaluator_schemas.py	Schema generation/validation tests.
packages/uipath-eval/tests/evaluators/test_legacy_target_output_key_paths.py	Legacy targetOutputKey path tests.
packages/uipath-eval/tests/evaluators/test_line_by_line_utils.py	Tests for line-by-line evaluation utilities.
packages/uipath-eval/tests/evaluators/test_llm_judge_placeholder_validation.py	Tests for LLM judge prompt placeholder validation.
packages/uipath-eval/tests/eval/test_evaluate.py	End-to-end eval runtime tests invoking `evaluate()`.
packages/uipath-eval/tests/eval/test_eval_tracing_integration.py	Tracing integration tests for runtime/evals.
packages/uipath-eval/tests/eval/test_eval_runtime_suspend_resume.py	Suspend/resume flow tests.
packages/uipath-eval/tests/eval/test_eval_runtime_metadata.py	Runtime metadata access tests.
packages/uipath-eval/tests/eval/test_eval_resume_flow.py	Resume-mode selection/validation tests.
packages/uipath-eval/tests/eval/test_eval_id_casing.py	Regression tests for case-insensitive GUID ids.
packages/uipath-eval/tests/eval/test_conversational_utils.py	Conversational eval conversion tests.
packages/uipath-eval/tests/eval/test_input_overrides_e2e.py	E2E tests for per-eval input overrides utilities.
packages/uipath-eval/tests/eval/test_apply_file_overrides.py	Tests for applying file/attachment overrides in inputs.
packages/uipath-eval/tests/eval/test_eval_runtime_spans.py	Span handling/persistence tests.
packages/uipath-eval/tests/eval/test_eval_set.py	Eval set parsing/migration tests.
packages/uipath-eval/tests/eval/test_eval_span_utils.py	Span utility tests.
packages/uipath-eval/tests/eval/test_eval_util.py	Misc eval util tests.
packages/uipath-eval/tests/eval/test_span_persistence.py	Span persistence behavior tests.
packages/uipath-eval/tests/eval/mocks/test_mockable_arg_collision.py	Regression test for `@mockable` arg-name collisions.
packages/uipath-eval/tests/eval/mocks/test_input_mocker.py	Tests for LLM input mock generation.
packages/uipath-eval/tests/eval/mocks/test_input_mocker_span.py	Tests for tracing spans during input mocking.
packages/uipath-eval/tests/eval/mocks/test_cache_manager.py	Tests for cache manager read/write/invalidations.
packages/uipath-eval/tests/eval/mocks/test_mocks.py	Broader mock/simulation behavior tests.
packages/uipath-eval/tests/eval/mocks/test_mockable_mocked_annotation.py	Tests for `@mockable` annotation handling.
packages/uipath-eval/tests/eval/mocks/test_structured_output.py	Tests for provider-agnostic structured output handling.
packages/uipath-eval/tests/eval/evals/evaluators/exact-match.json	Test evaluator spec fixture.
packages/uipath-eval/tests/eval/evals/eval-sets/default.json	Test eval-set fixture.
packages/uipath-eval/tests/eval/evals/eval-sets/multiple-evals.json	Test multi-eval-set fixture.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5f44181ae


chatgpt-codex-connector · 2026-06-11T05:18:15Z

          UIPATH_FOLDER_KEY: ${{ secrets.UIPATH_MEMORY_FOLDER }}
        run: uv run pytest tests/services/test_memory_service_e2e.py -m e2e -v --no-cov

+  test-uipath-eval:


Include eval tests in the required test gate

This new test-uipath-eval matrix job is not included in the test-gate job's needs list or failure check at the bottom of this workflow, so a PR can still get a passing required Test status even when all uipath-eval tests fail. Since this commit moves the eval framework and its tests into this package, the gate should depend on and check test-uipath-eval as well.
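The kind of gap described here can be caught mechanically. The sketch below is a hypothetical check, not part of this repository: it takes an already-parsed workflow as a plain dict, assumes test jobs share a `test-` prefix and that the gate job is named `test-gate`, and covers only the `needs` list, not the gate's failure check. The job names other than `test-uipath-eval` and `test-gate` are placeholders.

```python
def jobs_missing_from_gate(
    workflow: dict,
    gate_job: str = "test-gate",
    prefix: str = "test-",
) -> list[str]:
    """List jobs matching the prefix that the gate job does not depend on."""
    jobs = workflow.get("jobs", {})
    needs = jobs.get(gate_job, {}).get("needs", [])
    if isinstance(needs, str):
        # GitHub Actions allows `needs` to be a single job id or a list of ids.
        needs = [needs]
    return sorted(
        name
        for name in jobs
        if name.startswith(prefix) and name != gate_job and name not in needs
    )


# Hypothetical parsed workflow mirroring the situation in this comment.
EXAMPLE_WORKFLOW = {
    "jobs": {
        "test-uipath": {},
        "test-uipath-platform": {},
        "test-uipath-eval": {},
        "test-gate": {"needs": ["test-uipath", "test-uipath-platform"]},
    }
}

if __name__ == "__main__":
    print(jobs_missing_from_gate(EXAMPLE_WORKFLOW))
```

For the example dict, the check reports `test-uipath-eval` as the only job the gate does not wait on.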


chatgpt-codex-connector · 2026-06-11T05:18:15Z

+  build-uipath-eval:
    needs: [detect-publishable-packages, wait-for-uipath-platform]


Wait for core before building eval releases

When a release publishes a new uipath-core and uipath-eval version without also publishing uipath-platform or uipath, wait-for-uipath-core is skipped because its condition only mentions platform/uipath, and this new eval build only waits on wait-for-uipath-platform (which just skips if platform is not being published). In that scenario needs-relock: true runs uv lock --no-sources for eval before the new core version is visible on PyPI, causing intermittent release failures or locking against the previous core if the lower bound was not updated.


Copilot AI review requested due to automatic review settings June 11, 2026 05:10

Chibionos enabled auto-merge (squash) June 11, 2026 05:10

github-actions Bot added the test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-integrations labels Jun 11, 2026

Copilot started reviewing on behalf of Chibionos June 11, 2026 05:10

Copilot AI reviewed Jun 11, 2026


chatgpt-codex-connector Bot reviewed Jun 11, 2026


Chibionos requested a review from mjnovice June 11, 2026 06:13

Chibionos mentioned this pull request Jun 11, 2026 in: refactor(eval): split progress reporter into strategy-based reporting package #1711 (Open)

Chibionos wants to merge 1 commit into main from feat/extract-uipath-eval-package

2 participants