feat: extract evaluation framework into uipath-eval package #1710

Open
Chibionos wants to merge 1 commit into main from feat/extract-uipath-eval-package

Summary

Extract the evaluation framework (uipath.eval) into a new standalone distribution uipath-eval (packages/uipath-eval), so consumers — primarily the python eval worker in the agents backend — can depend on the evaluators, mocking system, and eval runtime without pulling in the CLI and the rest of the SDK. Today that worker pins the entire uipath SDK just to run evaluators.

Supersedes the goals of #1040 (closed as stale); the strategy-pattern reporting refactor follows as a separate PR on top of this extraction.

What moved

  • packages/uipath/src/uipath/eval/ → packages/uipath-eval/src/uipath/eval/ (namespace package, same pattern as uipath-platform/uipath-core: no __init__.py at src/uipath/, py.typed marker included). Import paths are unchanged: from uipath.eval... works exactly as before for every existing consumer.
  • Pure-eval tests (731) move with the code. CLI-coupled eval tests (discovery, telemetry, progress reporter, live tracking span processor) stay in packages/uipath, as do the CLI progress reporters in _cli/_evals/.
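The namespace-package layout described above can be illustrated with a small, self-contained sketch. It does not touch the real uipath packages: the namespace name demo_ns and the dist_a/dist_b directories are hypothetical stand-ins for two distributions that each ship a subpackage under a shared top-level directory with no __init__.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_split_namespace(root: Path) -> list[str]:
    """Create two fake distributions that both contribute to `demo_ns`."""
    layouts = {
        "dist_a/src/demo_ns/core/__init__.py": "NAME = 'core'\n",
        "dist_b/src/demo_ns/eval/__init__.py": "NAME = 'eval'\n",
    }
    for relative, body in layouts.items():
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
    # Neither src/demo_ns/ directory gets an __init__.py, so Python treats
    # demo_ns as a namespace package and merges both directories.
    return [str(root / "dist_a" / "src"), str(root / "dist_b" / "src")]


def import_from_both(root: Path) -> tuple[str, str, int]:
    """Import one subpackage per distribution and count namespace portions."""
    sys.path[:0] = build_split_namespace(root)
    importlib.invalidate_caches()
    core = importlib.import_module("demo_ns.core")
    eval_pkg = importlib.import_module("demo_ns.eval")
    namespace = importlib.import_module("demo_ns")
    return core.NAME, eval_pkg.NAME, len(list(namespace.__path__))


with tempfile.TemporaryDirectory() as tmp:
    result = import_from_both(Path(tmp))
    print(result)
```

Adding an __init__.py to either demo_ns directory would turn it into a regular package and hide the other distribution's subpackage, which is presumably why the layout omits it at src/uipath/.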

Changes required by the split

  • The three legacy evaluators imported COMMUNITY_agents_SUFFIX from uipath._utils.constants via a relative import — the only entanglement with non-extracted SDK internals. They now use the constant that already existed in uipath.eval._helpers.evaluators_helpers.
  • mockito and coverage move from uipath's dependencies to uipath-eval's (only eval code uses them). pydantic-function-models stays in both (cli_server.py preloads it).

Versions & dependency chain

uipath 2.10.82 → uipath-eval 0.1.0 → uipath-platform / uipath-runtime / uipath-core
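As a rough illustration of how this chain translates into a publish order, the sketch below feeds it to the standard library's graphlib. The edge from uipath-platform to uipath-core is an assumption inferred from the publish tiers mentioned in this PR, and the position of uipath-runtime in the result is not meaningful.

```python
from graphlib import TopologicalSorter

# package -> packages it depends on (its predecessors in publish order)
DEPENDS_ON = {
    "uipath": {"uipath-eval"},
    "uipath-eval": {"uipath-core", "uipath-platform", "uipath-runtime"},
    # Assumption: platform depends on core, as the publish tiers suggest.
    "uipath-platform": {"uipath-core"},
}


def publish_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every package follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = publish_order(DEPENDS_ON)
print(order)
```

Any valid order places uipath-core before uipath-platform, uipath-platform before uipath-eval, and uipath-eval before uipath.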

CI / release wiring

  • detect_changed_packages.py: uipath-eval added to the dependents graph (core/platform changes test eval; eval changes test uipath)
  • test-packages.yml / lint-packages.yml: dedicated uipath-eval jobs (same matrix as platform)
  • cd.yml: new publish tier between platform and uipath (core → platform → eval → uipath), with wait-for-uipath-eval gating the uipath build
  • labeler.yml: uipath-eval source globs added to the langchain/integration test triggers
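The dependents-graph idea can be sketched as follows. This is not the contents of detect_changed_packages.py: the DEPENDENTS mapping and the packages_to_test function are hypothetical, the edges are taken from the bullet above, and following them transitively is an assumption about how the real script behaves.

```python
from collections import deque

# package -> packages that must be re-tested when it changes
# (hypothetical mapping; edges follow "core/platform changes test eval;
# eval changes test uipath")
DEPENDENTS = {
    "uipath-core": ["uipath-platform", "uipath-eval"],
    "uipath-platform": ["uipath-eval"],
    "uipath-eval": ["uipath"],
}


def packages_to_test(changed: list[str]) -> set[str]:
    """Return the changed packages plus everything downstream of them."""
    to_test = set(changed)
    queue = deque(changed)
    while queue:
        package = queue.popleft()
        for dependent in DEPENDENTS.get(package, []):
            if dependent not in to_test:
                to_test.add(dependent)
                queue.append(dependent)
    return to_test


print(sorted(packages_to_test(["uipath-eval"])))
print(sorted(packages_to_test(["uipath-core"])))
```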

Validation

  • uipath-eval: 731 tests pass; ruff, ruff format, mypy clean; wheel + sdist build
  • uipath: 1200 tests pass; ruff, custom httpx linter, mypy (src + tests) clean; wheel builds; uipath --help and uipath eval --help smoke-tested against the new layout
  • Namespace merge verified: import uipath; import uipath.eval.evaluators; from uipath.eval.runtime import evaluate resolves across the two distributions
  • All four edited workflow/labeler YAMLs parse

🤖 Generated with Claude Code

Move src/uipath/eval to a new namespace package distribution
(packages/uipath-eval, import path unchanged: uipath.eval) so the
evaluation framework — evaluators, mocking, eval runtime — can be
consumed standalone, e.g. by the python eval worker in the agents
backend, without pulling in the CLI and the rest of the SDK.

- uipath-eval 0.1.0: depends only on uipath-core, uipath-platform,
  uipath-runtime (+ mockito, pydantic-function-models, coverage, which
  move out of the main package's dependencies)
- uipath 2.10.82 depends on uipath-eval>=0.1.0,<0.2.0; editable link
  via [tool.uv.sources]
- pure-eval tests move with the code (731 tests); CLI-coupled eval
  tests (discovery, telemetry, progress reporter, live tracking) stay
  in packages/uipath
- the three legacy evaluators' relative import of
  uipath._utils.constants now uses the eval-local constant
- CI: detect_changed_packages dependency graph, test/lint jobs,
  cd.yml publish tier (core -> platform -> eval -> uipath), labeler

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 11, 2026 05:10
@Chibionos Chibionos enabled auto-merge (squash) June 11, 2026 05:10
@github-actions github-actions Bot added the test:uipath-langchain (Triggers tests in the uipath-langchain-python repository) and test:uipath-integrations labels Jun 11, 2026
Copilot AI left a comment

Pull request overview

Extracts the uipath.eval evaluation framework into a standalone uipath-eval distribution (while preserving from uipath.eval... import paths) and wires the monorepo/CI/release pipeline so uipath depends on uipath-eval instead of bundling eval internals directly.

Changes:

  • Adds a new packages/uipath-eval package containing evaluators, models, mocks/simulation, and the eval runtime (+ its test suite).
  • Updates uipath to depend on uipath-eval (and shifts eval-only deps like mockito/coverage accordingly).
  • Updates CI and CD workflows/scripts to test, lint, and publish uipath-eval in the correct dependency tier.

Reviewed changes

Copilot reviewed 34 out of 130 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/uipath/pyproject.toml | Add uipath-eval dependency; bump uipath version. |
| packages/uipath/uv.lock | Lockfile updates: add editable uipath-eval; move deps; bump uipath version. |
| packages/uipath-eval/README.md | New package README documenting modules and usage. |
| packages/uipath-eval/pyproject.toml | New package metadata, deps, tooling config, pytest/mypy/ruff settings. |
| packages/uipath-eval/.python-version | Pin package dev Python version. |
| packages/uipath-eval/CLAUDE.md | Package-specific dev notes and constraints. |
| CLAUDE.md | Repo doc updated for new 4th package and dependency chain. |
| .github/workflows/test-packages.yml | Add uipath-eval test matrix job. |
| .github/workflows/lint-packages.yml | Add uipath-eval lint/typecheck job. |
| .github/workflows/cd.yml | Add build/publish/wait tier for uipath-eval between platform and uipath. |
| .github/scripts/detect_changed_packages.py | Add uipath-eval to dependents graph. |
| .github/labeler.yml | Include uipath-eval globs for integration/langchain triggers. |
| packages/uipath-eval/src/uipath/eval/py.typed | Mark package as typed. |
| packages/uipath-eval/src/uipath/eval/constants.py | Eval package constants (folder names, custom prefix). |
| packages/uipath-eval/src/uipath/eval/_execution_context.py | Shared contextvars + span collector for runtime/mocks. |
| packages/uipath-eval/src/uipath/eval/_helpers/__init__.py | Helpers package init. |
| packages/uipath-eval/src/uipath/eval/_helpers/helpers.py | Helper utilities (e.g., emptiness checks, metrics wrapper). |
| packages/uipath-eval/src/uipath/eval/_helpers/output_path.py | Utility for resolving nested output paths (a.b[0]). |
| packages/uipath-eval/src/uipath/eval/_helpers/evaluators_helpers.py | Evaluator helper functions/constants used across evaluators. |
| packages/uipath-eval/src/uipath/eval/models/__init__.py | Public exports for eval models. |
| packages/uipath-eval/src/uipath/eval/models/models.py | Core eval result/trace models (minor typing tweak in diff). |
| packages/uipath-eval/src/uipath/eval/models/_conversational_utils.py | Conversational eval input/output helpers. |
| packages/uipath-eval/src/uipath/eval/models/evaluation_set.py | Eval set + item models (incl GUID id normalization). |
| packages/uipath-eval/src/uipath/eval/models/llm_judge_types.py | LLM judge prompt/output schema models. |
| packages/uipath-eval/src/uipath/eval/mocks/__init__.py | Public exports for mocks/simulation API. |
| packages/uipath-eval/src/uipath/eval/mocks/mockable.py | @mockable decorator for mocking/simulation. |
| packages/uipath-eval/src/uipath/eval/mocks/_types.py | Pydantic schemas for mocking/simulation config. |
| packages/uipath-eval/src/uipath/eval/mocks/_mocker.py | Mocker interface + mock-related exceptions. |
| packages/uipath-eval/src/uipath/eval/mocks/_mocker_factory.py | Factory to select LLM vs mockito mocker. |
| packages/uipath-eval/src/uipath/eval/mocks/_mockito_mocker.py | Mockito-backed mocker implementation. |
| packages/uipath-eval/src/uipath/eval/mocks/_llm_mocker.py | LLM tool-response mocking implementation. |
| packages/uipath-eval/src/uipath/eval/mocks/_input_mocker.py | LLM input-generation mocking implementation. |
| packages/uipath-eval/src/uipath/eval/mocks/_cache_manager.py | Cache manager for mocker responses (memory + disk). |
| packages/uipath-eval/src/uipath/eval/mocks/_mock_context.py | Contextvars + helpers for mock resolution/simulation checks. |
| packages/uipath-eval/src/uipath/eval/mocks/_mock_runtime.py | Runtime delegate wrapping execution with mock context. |
| packages/uipath-eval/src/uipath/eval/mocks/_structured_output.py | Structured-output helper used by mocking. |
| packages/uipath-eval/src/uipath/eval/helpers.py | Eval set loading/migration + evaluator loading helpers. |
| packages/uipath-eval/src/uipath/eval/runtime/__init__.py | Runtime public API re-exports (evaluate, context, types). |
| packages/uipath-eval/src/uipath/eval/runtime/context.py | UiPathEvalContext container for runtime execution. |
| packages/uipath-eval/src/uipath/eval/runtime/events.py | Event types + payload models for eval progress reporting. |
| packages/uipath-eval/src/uipath/eval/runtime/_evaluate.py | evaluate() entrypoint wrapper around UiPathEvalRuntime. |
| packages/uipath-eval/src/uipath/eval/runtime/runtime.py | Main eval runtime implementation. |
| packages/uipath-eval/src/uipath/eval/runtime/_parallelization.py | Async worker-queue parallel execution helper. |
| packages/uipath-eval/src/uipath/eval/runtime/_utils.py | Input override merging utilities. |
| packages/uipath-eval/src/uipath/eval/runtime/_types.py | Runtime result DTOs/types. |
| packages/uipath-eval/src/uipath/eval/runtime/_spans.py | Span persistence/extraction utilities. |
| packages/uipath-eval/src/uipath/eval/runtime/_exporters.py | Trace/log exporters integration. |
| packages/uipath-eval/src/uipath/eval/evaluators/__init__.py | Evaluator exports + EVALUATORS registry. |
| packages/uipath-eval/src/uipath/eval/evaluators/evaluator.py | Discriminated unions for coded vs legacy evaluators. |
| packages/uipath-eval/src/uipath/eval/evaluators/evaluator_factory.py | Factory for loading built-in and custom evaluators. |
| packages/uipath-eval/src/uipath/eval/evaluators/registration.py | CLI support for registering custom evaluators/types. |
| packages/uipath-eval/src/uipath/eval/evaluators/base_legacy_evaluator.py | Legacy evaluator base + line-by-line support. |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_deterministic_evaluator_base.py | Shared deterministic evaluator utilities (canonical JSON). |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_exact_match_evaluator.py | Legacy deterministic exact match evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_json_similarity_evaluator.py | Legacy deterministic JSON similarity evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_llm_helpers.py | Legacy LLM function-calling helper utilities. |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_llm_as_judge_evaluator.py | Legacy LLM-as-judge evaluator (split helpers/const use). |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_trajectory_evaluator.py | Legacy trajectory evaluator (split helpers/const use). |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_evaluator_utils.py | Legacy evaluator utilities (const import change). |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_context_precision_evaluator.py | Legacy context precision evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_faithfulness_evaluator.py | Legacy faithfulness evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/legacy_csv_exact_match_evaluator.py | Legacy CSV exact match evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/attachment_utils.py | Job-attachment URI download helpers. |
| packages/uipath-eval/src/uipath/eval/evaluators/line_by_line_utils.py | Line-by-line evaluation utilities used by legacy evaluators. |
| packages/uipath-eval/src/uipath/eval/evaluators/exact_match_evaluator.py | Coded exact-match evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/contains_evaluator.py | Coded contains evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/json_similarity_evaluator.py | Coded JSON similarity evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/llm_as_judge_evaluator.py | Coded LLM-as-judge core logic. |
| packages/uipath-eval/src/uipath/eval/evaluators/llm_judge_output_evaluator.py | Coded LLM judge output evaluators. |
| packages/uipath-eval/src/uipath/eval/evaluators/llm_judge_trajectory_evaluator.py | Coded LLM judge trajectory evaluators. |
| packages/uipath-eval/src/uipath/eval/evaluators/binary_classification_evaluator.py | Binary classification evaluator + aggregation. |
| packages/uipath-eval/src/uipath/eval/evaluators/multiclass_classification_evaluator.py | Multiclass classification evaluator + aggregation. |
| packages/uipath-eval/src/uipath/eval/evaluators/tool_call_order_evaluator.py | Tool call order evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/tool_call_args_evaluator.py | Tool call args evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/tool_call_count_evaluator.py | Tool call count evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/tool_call_output_evaluator.py | Tool call output evaluator. |
| packages/uipath-eval/src/uipath/eval/evaluators/base_evaluator.py | Core coded evaluator base infrastructure. |
| packages/uipath-eval/src/uipath/eval/evaluators/output_evaluator.py | Output extraction + aggregation helpers for coded evaluators. |
| packages/uipath-eval/src/uipath/eval/evaluators_types/generate_types.py | Script to generate JSON type specs. |
| packages/uipath-eval/src/uipath/eval/evaluators_types/*.json | Generated evaluator config/criteria/justification schemas. |
| packages/uipath-eval/tests/evaluators/__init__.py | Tests package init. |
| packages/uipath-eval/tests/evaluators/test_output_path.py | Tests for nested output-path resolution. |
| packages/uipath-eval/tests/evaluators/test_helpers.py | Tests for helper utilities (e.g., is_empty_value). |
| packages/uipath-eval/tests/evaluators/test_legacy_trajectory_evaluator.py | Regression test for legacy trajectory prompt compaction. |
| packages/uipath-eval/tests/evaluators/test_evaluator_factory.py | EvaluatorFactory tests (incl config prep and loading). |
| packages/uipath-eval/tests/evaluators/test_attachment_utils.py | Tests for attachment URI parsing/downloading helpers. |
| packages/uipath-eval/tests/evaluators/test_documentation_examples.py | Documentation example coverage tests. |
| packages/uipath-eval/tests/evaluators/test_eval_level_expected_output.py | Tests around expected output placement. |
| packages/uipath-eval/tests/evaluators/test_evaluator_aggregation.py | Aggregation behavior tests for evaluators. |
| packages/uipath-eval/tests/evaluators/test_evaluator_helpers.py | Tests for evaluator helper functions. |
| packages/uipath-eval/tests/evaluators/test_evaluator_methods.py | Broad evaluator behavior tests. |
| packages/uipath-eval/tests/evaluators/test_evaluator_schemas.py | Schema generation/validation tests. |
| packages/uipath-eval/tests/evaluators/test_legacy_target_output_key_paths.py | Legacy targetOutputKey path tests. |
| packages/uipath-eval/tests/evaluators/test_line_by_line_utils.py | Tests for line-by-line evaluation utilities. |
| packages/uipath-eval/tests/evaluators/test_llm_judge_placeholder_validation.py | Tests for LLM judge prompt placeholder validation. |
| packages/uipath-eval/tests/eval/test_evaluate.py | End-to-end eval runtime tests invoking evaluate(). |
| packages/uipath-eval/tests/eval/test_eval_tracing_integration.py | Tracing integration tests for runtime/evals. |
| packages/uipath-eval/tests/eval/test_eval_runtime_suspend_resume.py | Suspend/resume flow tests. |
| packages/uipath-eval/tests/eval/test_eval_runtime_metadata.py | Runtime metadata access tests. |
| packages/uipath-eval/tests/eval/test_eval_resume_flow.py | Resume-mode selection/validation tests. |
| packages/uipath-eval/tests/eval/test_eval_id_casing.py | Regression tests for case-insensitive GUID ids. |
| packages/uipath-eval/tests/eval/test_conversational_utils.py | Conversational eval conversion tests. |
| packages/uipath-eval/tests/eval/test_input_overrides_e2e.py | E2E tests for per-eval input overrides utilities. |
| packages/uipath-eval/tests/eval/test_apply_file_overrides.py | Tests for applying file/attachment overrides in inputs. |
| packages/uipath-eval/tests/eval/test_eval_runtime_spans.py | Span handling/persistence tests. |
| packages/uipath-eval/tests/eval/test_eval_set.py | Eval set parsing/migration tests. |
| packages/uipath-eval/tests/eval/test_eval_span_utils.py | Span utility tests. |
| packages/uipath-eval/tests/eval/test_eval_util.py | Misc eval util tests. |
| packages/uipath-eval/tests/eval/test_span_persistence.py | Span persistence behavior tests. |
| packages/uipath-eval/tests/eval/mocks/test_mockable_arg_collision.py | Regression test for @mockable arg-name collisions. |
| packages/uipath-eval/tests/eval/mocks/test_input_mocker.py | Tests for LLM input mock generation. |
| packages/uipath-eval/tests/eval/mocks/test_input_mocker_span.py | Tests for tracing spans during input mocking. |
| packages/uipath-eval/tests/eval/mocks/test_cache_manager.py | Tests for cache manager read/write/invalidations. |
| packages/uipath-eval/tests/eval/mocks/test_mocks.py | Broader mock/simulation behavior tests. |
| packages/uipath-eval/tests/eval/mocks/test_mockable_mocked_annotation.py | Tests for @mockable annotation handling. |
| packages/uipath-eval/tests/eval/mocks/test_structured_output.py | Tests for provider-agnostic structured output handling. |
| packages/uipath-eval/tests/eval/evals/evaluators/exact-match.json | Test evaluator spec fixture. |
| packages/uipath-eval/tests/eval/evals/eval-sets/default.json | Test eval-set fixture. |
| packages/uipath-eval/tests/eval/evals/eval-sets/multiple-evals.json | Test multi-eval-set fixture. |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5f44181ae

UIPATH_FOLDER_KEY: ${{ secrets.UIPATH_MEMORY_FOLDER }}
run: uv run pytest tests/services/test_memory_service_e2e.py -m e2e -v --no-cov

test-uipath-eval:

[P2] Include eval tests in the required test gate

This new test-uipath-eval matrix job is not included in the test-gate job's needs list or failure check at the bottom of this workflow, so a PR can still get a passing required Test status even when all uipath-eval tests fail. Since this commit moves the eval framework and its tests into this package, the gate should depend on and check test-uipath-eval as well.
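The gap described here can be modelled with a small sketch. The gate_passes function is a hypothetical stand-in for the gate's failure check, and the job name test-uipath is illustrative only; the point is that a job missing from the needs list can fail without affecting the gate.

```python
def gate_passes(needs: list[str], results: dict[str, str]) -> bool:
    """Mimic a gate job: only jobs listed in `needs` are inspected."""
    return all(results.get(job) in ("success", "skipped") for job in needs)


# Hypothetical run in which every uipath-eval test job fails.
results = {"test-uipath": "success", "test-uipath-eval": "failure"}

gate_without_eval = gate_passes(["test-uipath"], results)
gate_with_eval = gate_passes(["test-uipath", "test-uipath-eval"], results)

print(gate_without_eval)  # the gate ignores the failing eval job
print(gate_with_eval)     # the gate now reports the failure
```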

Comment thread .github/workflows/cd.yml
Comment on lines +159 to 160
build-uipath-eval:
needs: [detect-publishable-packages, wait-for-uipath-platform]

[P2] Wait for core before building eval releases

When a release publishes a new uipath-core and uipath-eval version without also publishing uipath-platform or uipath, wait-for-uipath-core is skipped because its condition only mentions platform/uipath, and this new eval build only waits on wait-for-uipath-platform (which just skips if platform is not being published). In that scenario needs-relock: true runs uv lock --no-sources for eval before the new core version is visible on PyPI, causing intermittent release failures or locking against the previous core if the lower bound was not updated.
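The skipped-wait scenario can be expressed as a tiny model. The core_wait_runs function and both consumer sets are hypothetical simplifications of the workflow's job condition, based only on the description above; the actual expression in cd.yml is not shown in this review.

```python
def core_wait_runs(publishing: set[str], consumers: set[str]) -> bool:
    """Model the wait job's condition: run only if a listed consumer is published."""
    return bool(publishing & consumers)


# Condition as described in the review: only platform and uipath are listed.
CURRENT_CONSUMERS = {"uipath-platform", "uipath"}
# One possible fix (an assumption): also list uipath-eval.
PROPOSED_CONSUMERS = CURRENT_CONSUMERS | {"uipath-eval"}

# Release that publishes core and eval, but neither platform nor uipath.
release = {"uipath-core", "uipath-eval"}

waits_today = core_wait_runs(release, CURRENT_CONSUMERS)
waits_after_fix = core_wait_runs(release, PROPOSED_CONSUMERS)

print(waits_today)      # the wait is skipped, so eval may relock too early
print(waits_after_fix)  # the wait runs before the eval build
```

Making build-uipath-eval depend on wait-for-uipath-core directly would be another way to close the gap; which option fits the workflow better is left to the author.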
