test(uipath-agents): coded anti-pattern + bindings sweep tests (PR 5/5) by radugheo · Pull Request #476 · UiPath/skills

radugheo · 2026-04-29T18:24:05Z

Summary

Adds seven coded-agent tests — four anti-pattern smokes that verify the skill catches the most expensive coded mistakes, plus three bindings-sweep tests that close the binding-resource and entrypoint-resolution gaps.

Anti-pattern tests (smoke)

Each task seeds a violating project layout in the prompt; the skill must drive the agent to detect and fix the violation in place.

Task	What it covers
`skill-agent-coded-antipattern-build-system`	Pre-seeded `pyproject.toml` with `[build-system]` (hatchling). Check verifies the section is removed and `[project]` survives.
`skill-agent-coded-antipattern-module-level-llm`	Pre-seeded LangGraph `main.py` with `llm = UiPathChat(...)` at module level. Check verifies no module-level UiPath* construction remains and the top-level `graph` variable is preserved.
`skill-agent-coded-antipattern-wrong-sdk-import`	Pre-seeded `from uipath import UiPath` (raises `ImportError`). Check verifies the import was switched to `from uipath.platform import UiPath`.
`skill-agent-coded-antipattern-login-no-tenant`	Goal-only auth prompt with environment / org / tenant. Asserts `uip login --output json` and `uip login tenant set "MyTenant"` were both run.

Bindings sweep tests (e2e)

Task	What it covers
`skill-agents-bindings-queue-app-index`	Five binding types in one project — `queue`, `app`, `index`, `connection`, `mcpServer`. Asserts one entry per resource with the correct key (bare for connections; bare for the no-folder queue), value block, and metadata (`ActivityName`, `DisplayLabel`, `UseConnectionService`).
`skill-agents-bindings-asset-subtypes`	Three asset retrieves annotated `str` / `int` / `bool`. Asserts three asset bindings; per-binding `metadata.SubType` either matches the annotation (`stringAsset` / `integerAsset` / `booleanAsset`) OR is omitted. A wrong SubType fails — that would mis-place the resource at `uipath push` time.
`skill-agents-bindings-multi-entrypoint`	`entry-points.json` with two entrypoints, two SDK calls across two source files. Asserts each binding's `EntryPointUniqueId` references one of the real entrypoints (or has no entrypoint link at all); fabricated UUIDs fail.

Stacked on #475.

Test plan

All six sidecar check_*.py scripts dry-run green against synthetic well-formed projects (the seventh task, antipattern-login-no-tenant, has no sidecar — pure command_executed checks).
Negative cases verified: leaving [build-system] in fails the build-system check; from uipath import UiPath fails the import check; a fabricated entrypoint UUID fails the multi-entrypoint check; SubType="credentialAsset" on a str-annotated retrieve fails the asset-subtypes check.
All nineteen task YAMLs in coded/ parse and tag lists are consistent.

github-actions · 2026-04-29T18:24:20Z

Claude finished @radugheo's task in 2m 39s

Summary

Adds seven coder_eval test tasks (four anti-pattern smoke tests, three bindings-sweep e2e tests) and six Python check scripts for the uipath-agents coded-agent skill. Well-structured, consistent with existing tasks, and the check scripts use the shared _shared/ helpers correctly.

Change-by-Change Review

1. Anti-pattern task YAMLs (4 files)

Severity: OK — antipattern_build_system.yaml, antipattern_login_no_tenant.yaml, antipattern_module_level_llm.yaml, antipattern_wrong_sdk_import.yaml

Tags are correct (uipath-agents first, smoke test type, valid facets). task_id follows skill-agent-coded-antipattern-*, consistent with skill-agent-coded-* convention. Each has max_iterations: 1, reasonable success_criteria weights, and minimal initial_prompts that describe the goal and seed the violating project without prescribing steps.

2. Bindings sweep task YAMLs (3 files)

Severity: Medium — bindings_asset_subtypes.yaml, bindings_multi_entrypoint.yaml, bindings_queue_app_index.yaml

Tags are correct (uipath-agents first, e2e test type, valid facets). However, the task_id values deviate from the established convention:

New bindings task_id	Convention would be
`skill-agents-bindings-asset-subtypes`	`skill-agent-coded-bindings-asset-subtypes`
`skill-agents-bindings-multi-entrypoint`	`skill-agent-coded-bindings-multi-entrypoint`
`skill-agents-bindings-queue-app-index`	`skill-agent-coded-bindings-queue-app-index`

All 16 existing tasks in coded/ use skill-agent-coded-*. The new bindings tasks use skill-agents-bindings-* — two deviations: agents instead of agent, and drops coded. If this is intentional (a new naming tier for bindings-specific tests), it should be documented. Otherwise, align with the existing pattern.
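
The deviation is easy to detect mechanically. A minimal sketch, assuming the task IDs are available as plain strings; `find_nonconforming` is a hypothetical helper, not something in the repo:

```python
PREFIX = "skill-agent-coded-"  # convention cited in this review


def find_nonconforming(task_ids):
    """Return the task IDs that do not start with the coded-task prefix."""
    return [t for t in task_ids if not t.startswith(PREFIX)]


ids = [
    "skill-agent-coded-antipattern-build-system",
    "skill-agents-bindings-asset-subtypes",
    "skill-agents-bindings-multi-entrypoint",
    "skill-agents-bindings-queue-app-index",
]
for task_id in find_nonconforming(ids):
    print("non-conforming:", task_id)
```

Run against the IDs in this PR, it flags only the three bindings tasks.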

3. `check_antipattern_build_system.py`

Severity: OK

Clean regex-based check: line-anchored [build-system] detection, sanity check that [project] survived. Correct use of re.M flag. No external dependencies.
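
For illustration, a line-anchored check of this kind might look like the sketch below. It is not the contents of `check_antipattern_build_system.py`; the function name and failure messages are assumptions.

```python
import re

# re.M makes ^ and $ match at every line, so a header is only detected
# when it stands alone on its own line (not inside a comment or string).
BUILD_SYSTEM = re.compile(r"^[ \t]*\[build-system\][ \t]*$", re.M)
PROJECT = re.compile(r"^[ \t]*\[project\][ \t]*$", re.M)


def check_pyproject(text: str) -> list:
    """Return failure messages; an empty list means the check passes."""
    failures = []
    if BUILD_SYSTEM.search(text):
        failures.append("[build-system] section is still present")
    if not PROJECT.search(text):
        failures.append("[project] section is missing")
    return failures


seeded = '[project]\nname = "agent"\n\n[build-system]\nrequires = ["hatchling"]\n'
fixed = '[project]\nname = "agent"\n'
print(check_pyproject(seeded))
print(check_pyproject(fixed))
```

Anchoring to the line keeps a commented-out `# [build-system]` from counting as a violation.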

4. `check_antipattern_module_level_llm.py`

Severity: OK

Correctly imports from _shared.ast_lazy_init_check for AST-level module-level client detection. Also verifies the graph = export survived the refactor. Good separation of concerns.
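
The idea behind an AST-level check can be sketched with the standard `ast` module. This is not the `_shared.ast_lazy_init_check` helper, whose interface is not shown in this PR; the function names are hypothetical, and the sketch only inspects direct top-level assignments, so it would miss a client constructed inside a nested call.

```python
import ast


def module_level_uipath_calls(source: str) -> list:
    """Names of UiPath* constructors called in top-level assignments."""
    found = []
    for node in ast.parse(source).body:  # top-level statements only
        if not isinstance(node, (ast.Assign, ast.AnnAssign)):
            continue
        value = node.value
        if isinstance(value, ast.Call):
            func = value.func
            # Handle both `UiPathChat(...)` and `module.UiPathChat(...)`.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if name.startswith("UiPath"):
                found.append(name)
    return found


def exports_graph(source: str) -> bool:
    """True if the module assigns a top-level variable named `graph`."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            if any(isinstance(t, ast.Name) and t.id == "graph" for t in node.targets):
                return True
    return False


BAD = 'llm = UiPathChat(model="m")\ngraph = build(llm)\n'
GOOD = 'def get_llm():\n    return UiPathChat(model="m")\n\ngraph = build()\n'
print(module_level_uipath_calls(BAD), exports_graph(BAD))
print(module_level_uipath_calls(GOOD), exports_graph(GOOD))
```

Parsing the source rather than importing it means the check never executes the client constructor it is looking for.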

5. `check_antipattern_wrong_sdk_import.py`

Severity: Low — minor regex edge case

The WRONG regex (check_antipattern_wrong_sdk_import.py:21):

WRONG = re.compile(r"^\s*from\s+uipath\s+import\s+(?:[^,\n]*,\s*)*UiPath(?:\s*,|$)", re.M)

The trailing (?:\s*,|$) requires UiPath to be followed by a comma or end-of-line. An import with a trailing comment like from uipath import UiPath # wrong would not match. This is unlikely in practice (the agent would typically strip comments when recreating), and the RIGHT regex catches the missing correct import regardless, so it's a defense-in-depth gap rather than a functional bug. A more robust alternative:

WRONG = re.compile(r"^\s*from\s+uipath\s+import\s+(?:[^,\n]*,\s*)*UiPath\b", re.M)

Using \b (word boundary) covers all trailing contexts — matches the same approach used in the RIGHT regex on line 22.
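
The difference between the two patterns can be seen by running both against a few import lines. The two regexes are copied from above; the sample lines are illustrative.

```python
import re

ORIGINAL = re.compile(
    r"^\s*from\s+uipath\s+import\s+(?:[^,\n]*,\s*)*UiPath(?:\s*,|$)", re.M
)
PROPOSED = re.compile(
    r"^\s*from\s+uipath\s+import\s+(?:[^,\n]*,\s*)*UiPath\b", re.M
)

samples = [
    "from uipath import UiPath",            # the plain wrong import
    "from uipath import UiPath  # wrong",   # trailing comment
    "from uipath import UiPath as Client",  # aliased import
    "from uipath import UiPathChat",        # different name, must not match
    "from uipath.platform import UiPath",   # the correct import
]

for line in samples:
    original = bool(ORIGINAL.search(line))
    proposed = bool(PROPOSED.search(line))
    print(f"{line!r}: original={original} proposed={proposed}")
```

Only the trailing-comment and aliased lines differ: the original pattern misses them, the `\b` variant catches them, and neither matches `UiPathChat` or the correct `uipath.platform` import.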

6. `check_bindings_asset_subtypes.py`

Severity: OK

Good acceptance criteria: requires exactly 3 asset bindings, accepts SubType either matching the annotation or being omitted, and explicitly fails on wrong SubType. Uses _shared.bindings_assertions correctly.
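
The match-or-omit rule reduces to a small predicate. A sketch, assuming each binding is a dict with an optional `metadata.SubType` field; the dict shape and the `subtype_ok` name are assumptions, not the `_shared.bindings_assertions` API.

```python
# Annotation -> expected SubType, as listed in the PR description.
EXPECTED_SUBTYPE = {
    "str": "stringAsset",
    "int": "integerAsset",
    "bool": "booleanAsset",
}


def subtype_ok(annotation: str, binding: dict) -> bool:
    """Accept a matching SubType or an omitted one; reject anything else."""
    subtype = binding.get("metadata", {}).get("SubType")
    return subtype is None or subtype == EXPECTED_SUBTYPE[annotation]


print(subtype_ok("str", {"metadata": {"SubType": "stringAsset"}}))
print(subtype_ok("int", {"metadata": {}}))
print(subtype_ok("str", {"metadata": {"SubType": "credentialAsset"}}))
```

The last call mirrors the negative case from the test plan: `credentialAsset` on a `str`-annotated retrieve is rejected.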

7. `check_bindings_multi_entrypoint.py`

Severity: OK

Reads real entrypoint UUIDs from entry-points.json, then validates any EntryPointUniqueId in the bindings references a real one. Correctly accepts absence of the field (agent might leave it off). The fabricated-UUID detection is the load-bearing assertion.
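
The core assertion can be sketched as below, assuming the real IDs have already been read from `entry-points.json` into a set and that each binding is a dict carrying an optional `EntryPointUniqueId` key. Both the dict shape and the placeholder IDs are assumptions.

```python
def fabricated_entrypoint_ids(bindings: list, real_ids: set) -> list:
    """Return EntryPointUniqueId values that match no real entrypoint."""
    bad = []
    for binding in bindings:
        ep_id = binding.get("EntryPointUniqueId")
        # A missing link is acceptable; only an unknown ID is a failure.
        if ep_id is not None and ep_id not in real_ids:
            bad.append(ep_id)
    return bad


real = {"ep-1", "ep-2"}  # placeholder IDs, not real UUIDs
bindings = [
    {"EntryPointUniqueId": "ep-1"},
    {},  # no entrypoint link at all
    {"EntryPointUniqueId": "made-up"},
]
print(fabricated_entrypoint_ids(bindings, real))
```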

8. `check_bindings_queue_app_index.py`

Severity: OK

Covers all five resource types with correct key construction rules (bare key for connections and for the queue with no folder, <name>.<folder> otherwise). Validates metadata fields (ActivityName, DisplayLabel, UseConnectionService). Deduplication check ensures exactly one entry per kind.
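
The key rule and the one-entry-per-kind check can be sketched as follows. The PR description adds that a queue with no folder also uses a bare key, which the sketch includes. The `resource` field name and the sample names are assumptions for illustration.

```python
from collections import Counter
from typing import Optional


def expected_key(kind: str, name: str, folder: Optional[str] = None) -> str:
    """Bare name for connections and folder-less resources, else name.folder."""
    if kind == "connection" or not folder:
        return name
    return f"{name}.{folder}"


def duplicated_kinds(bindings: list) -> list:
    """Kinds that appear more than once (the check expects one entry per kind)."""
    counts = Counter(b["resource"] for b in bindings)
    return sorted(kind for kind, n in counts.items() if n > 1)


print(expected_key("connection", "Salesforce", "Shared"))
print(expected_key("queue", "Invoices"))
print(expected_key("index", "Docs", "Shared"))
print(duplicated_kinds([{"resource": "queue"}, {"resource": "queue"}, {"resource": "app"}]))
```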

What's Missing

--output json not enforced on tenant set: antipattern_login_no_tenant.yaml:46 — the prompt tells the agent "Use --output json on every uip command you run" but the command_pattern for tenant set doesn't require it. The uip login pattern on line 38 does enforce --output json. This is likely intentional leniency (tenant set is the critical behavior being tested), but noting the asymmetry.
Nothing else identified.
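
To make the asymmetry concrete, here is a sketch of a lenient and a strict pattern for the tenant-set command. These regexes are hypothetical and are not the `command_pattern` values from the YAML, which are not shown here. Appending `--output json` to the tenant-set command is assumed to be valid only because the prompt asks for it on every `uip` command.

```python
import re

# Hypothetical patterns, written only to illustrate the asymmetry.
TENANT_SET_LENIENT = re.compile(r"uip\s+login\s+tenant\s+set\b")
TENANT_SET_STRICT = re.compile(r"uip\s+login\s+tenant\s+set\b.*--output\s+json")

plain = 'uip login tenant set "MyTenant"'
with_json = 'uip login tenant set "MyTenant" --output json'

for cmd in (plain, with_json):
    lenient = bool(TENANT_SET_LENIENT.search(cmd))
    strict = bool(TENANT_SET_STRICT.search(cmd))
    print(f"{cmd!r}: lenient={lenient} strict={strict}")
```

The lenient form accepts both commands, so it passes whether or not the agent followed the `--output json` instruction. The strict form accepts only the second.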

Area Ratings

Area	Status	Notes
Frontmatter	N/A	No skill changes — tests only
E2E Tests	Issue	`task_id` naming inconsistency on bindings tasks
Skill Body	N/A	No skill changes
References & Assets	N/A	No skill changes
Repo Hygiene	OK	No secrets, no personal paths, no cross-skill refs, changes scoped to test tasks

Issues for Manual Review

The index binding (check_bindings_queue_app_index.py:49) asserts ActivityName = "retrieve_async" even though the call site is search_async. The check comment explains this matches the bindings reference ("ALL methods bind via retrieve_async"). Verify this is still the current contract — if the reference changes, this assertion will silently pass incorrect bindings.
The antipattern_login_no_tenant task seeds only a goal prompt (no project files). Verify the skill activates correctly from a bare auth request without a pre-existing project context.

Conclusion

Clean PR. The tests are well-designed with good positive/negative coverage and correct use of shared helpers. One Medium issue: the three bindings task_id values should align with the established skill-agent-coded-* convention. One Low cosmetic issue: the WRONG regex in check_antipattern_wrong_sdk_import.py could use \b instead of (?:\s*,|$) for consistency with the RIGHT regex in the same file. Neither blocks merge.

Anti-pattern smokes: each task seeds a violating project layout and the skill must drive the agent to detect and fix the violation:

- skill-agent-coded-antipattern-build-system: pre-seeded pyproject.toml carries [build-system] hatchling. Check verifies the section is removed and [project] survives.
- skill-agent-coded-antipattern-module-level-llm: pre-seeded LangGraph main.py constructs UiPathChat() at module level. Check verifies no module-level UiPath* construction remains and the top-level `graph` variable is preserved.
- skill-agent-coded-antipattern-wrong-sdk-import: pre-seeded main.py uses `from uipath import UiPath` (raises ImportError). Check verifies the import was switched to `from uipath.platform import UiPath`.

Bindings sweep tests (e2e):

- skill-agents-bindings-queue-app-index: five binding types in one project (queue, app, index, connection, mcpServer). Asserts one entry per resource with correct keys, value blocks, and metadata. Connection key is bare; queue with no folder_path is also bare (per the bindings reference No-folder edge case).
- skill-agents-bindings-asset-subtypes: three asset retrieves annotated str / int / bool. Asserts three asset bindings; per-binding SubType either matches the annotation (stringAsset / integerAsset / booleanAsset) OR is omitted (the bindings reference allows the omit-fallback). Wrong SubType fails, because that would mis-place the resource at uipath push time.
- skill-agents-bindings-multi-entrypoint: entry-points.json with two entrypoints. Asserts each binding's EntryPointUniqueId references one of the real entrypoints (or no link at all); fabricated UUIDs fail.

radugheo requested a review from uipreliga as a code owner April 30, 2026 09:27

radugheo force-pushed the test/coded-tests-capabilities branch from 15da95f to 4aa145b Compare April 30, 2026 09:53

radugheo force-pushed the test/coded-tests-remaining-gaps branch from 0e6d5a8 to 14f863f Compare April 30, 2026 09:53

radugheo force-pushed the test/coded-tests-capabilities branch from 4aa145b to 23b712b Compare April 30, 2026 09:57

radugheo force-pushed the test/coded-tests-remaining-gaps branch from 14f863f to 684055a Compare April 30, 2026 09:57

radugheo force-pushed the test/coded-tests-capabilities branch from 23b712b to 2a98b40 Compare April 30, 2026 11:42

radugheo force-pushed the test/coded-tests-remaining-gaps branch from 684055a to 7c742c4 Compare April 30, 2026 11:43

radugheo force-pushed the test/coded-tests-capabilities branch from 2a98b40 to d949d53 Compare April 30, 2026 12:04

radugheo force-pushed the test/coded-tests-remaining-gaps branch from 7c742c4 to a290b64 Compare April 30, 2026 12:04

radugheo force-pushed the test/coded-tests-capabilities branch from d949d53 to 9ed1291 Compare May 4, 2026 09:37

radugheo force-pushed the test/coded-tests-remaining-gaps branch from a290b64 to aa14992 Compare May 4, 2026 09:37

radugheo force-pushed the test/coded-tests-capabilities branch from 9ed1291 to fa8bc6b Compare May 4, 2026 10:10

radugheo force-pushed the test/coded-tests-remaining-gaps branch from aa14992 to 7e708c0 Compare May 4, 2026 10:10

cosmyo approved these changes May 4, 2026

radugheo force-pushed the test/coded-tests-capabilities branch from fa8bc6b to a2b7c01 Compare May 4, 2026 13:27

radugheo force-pushed the test/coded-tests-remaining-gaps branch from 7e708c0 to bf81f67 Compare May 4, 2026 13:27

radugheo force-pushed the test/coded-tests-capabilities branch from a2b7c01 to d168789 Compare May 4, 2026 13:32

radugheo force-pushed the test/coded-tests-remaining-gaps branch from bf81f67 to 97e8686 Compare May 4, 2026 13:32

radugheo force-pushed the test/coded-tests-capabilities branch from d168789 to 59ee4c3 Compare May 4, 2026 15:42

radugheo force-pushed the test/coded-tests-remaining-gaps branch 2 times, most recently from 6221f86 to b9f704e Compare May 4, 2026 15:59

radugheo force-pushed the test/coded-tests-capabilities branch 2 times, most recently from 72a8cc6 to 23543b3 Compare May 4, 2026 16:35

radugheo force-pushed the test/coded-tests-remaining-gaps branch from b9f704e to 310ccec Compare May 4, 2026 16:36

radugheo force-pushed the test/coded-tests-capabilities branch from 23543b3 to e3cfe5b Compare May 4, 2026 17:24

radugheo force-pushed the test/coded-tests-remaining-gaps branch from 310ccec to 36ddc2e Compare May 4, 2026 17:24

radugheo force-pushed the test/coded-tests-capabilities branch from e3cfe5b to f48ca21 Compare May 4, 2026 19:07

radugheo force-pushed the test/coded-tests-remaining-gaps branch from 36ddc2e to c3506a6 Compare May 4, 2026 19:07

radugheo force-pushed the test/coded-tests-capabilities branch from f48ca21 to 88fa6a4 Compare May 4, 2026 21:23

radugheo force-pushed the test/coded-tests-remaining-gaps branch from c3506a6 to cfc79d0 Compare May 4, 2026 21:23

radugheo force-pushed the test/coded-tests-capabilities branch from 88fa6a4 to b759597 Compare May 4, 2026 21:25

radugheo force-pushed the test/coded-tests-remaining-gaps branch from cfc79d0 to 2f9ec34 Compare May 4, 2026 21:25

radugheo force-pushed the test/coded-tests-capabilities branch from b759597 to c77a1bc Compare May 4, 2026 21:26

radugheo force-pushed the test/coded-tests-remaining-gaps branch from 2f9ec34 to 71a8126 Compare May 4, 2026 21:26

radugheo force-pushed the test/coded-tests-capabilities branch from c77a1bc to 6924bbd Compare May 5, 2026 08:34

Base automatically changed from test/coded-tests-capabilities to main May 5, 2026 08:35

radugheo force-pushed the test/coded-tests-remaining-gaps branch from 71a8126 to d4a4df7 Compare May 5, 2026 08:35

radugheo merged commit 0c28dfe into main May 5, 2026
3 checks passed

radugheo deleted the test/coded-tests-remaining-gaps branch May 5, 2026 08:36

radugheo merged 1 commit into main from test/coded-tests-remaining-gaps
