
test(uipath-agents): coded anti-pattern + bindings sweep tests (PR 5/5) #476

Merged
radugheo merged 1 commit into main from test/coded-tests-remaining-gaps
May 5, 2026

Conversation

@radugheo
Contributor

Summary

Adds seven coded-agent tests — four anti-pattern smokes that verify the skill catches the most expensive coded mistakes, plus three bindings-sweep tests that close the binding-resource and entrypoint-resolution gaps.

Anti-pattern tests (smoke)

Each task seeds a violating project layout in the prompt; the skill must drive the agent to detect and fix the violation in place.

| Task | What it covers |
|---|---|
| skill-agent-coded-antipattern-build-system | Pre-seeded `pyproject.toml` with `[build-system]` (hatchling). Check verifies the section is removed and `[project]` survives. |
| skill-agent-coded-antipattern-module-level-llm | Pre-seeded LangGraph `main.py` with `llm = UiPathChat(...)` at module level. Check verifies no module-level `UiPath*` construction remains and the top-level `graph` variable is preserved. |
| skill-agent-coded-antipattern-wrong-sdk-import | Pre-seeded `from uipath import UiPath` (raises ImportError). Check verifies the import was switched to `from uipath.platform import UiPath`. |
| skill-agent-coded-antipattern-login-no-tenant | Goal-only auth prompt with environment / org / tenant. Asserts `uip login --output json` and `uip login tenant set "MyTenant"` were both run. |

Bindings sweep tests (e2e)

| Task | What it covers |
|---|---|
| skill-agents-bindings-queue-app-index | Five binding types in one project: queue, app, index, connection, mcpServer. Asserts one entry per resource with the correct key (bare for connections; bare for the no-folder queue), value block, and metadata (ActivityName, DisplayLabel, UseConnectionService). |
| skill-agents-bindings-asset-subtypes | Three asset retrieves annotated str / int / bool. Asserts three asset bindings; per-binding `metadata.SubType` either matches the annotation (stringAsset / integerAsset / booleanAsset) OR is omitted. A wrong SubType fails, since it would misplace the resource at `uipath push` time. |
| skill-agents-bindings-multi-entrypoint | `entry-points.json` with two entrypoints, two SDK calls across two source files. Asserts each binding's EntryPointUniqueId references one of the real entrypoints (or has no entrypoint link at all); fabricated UUIDs fail. |

Stacked on #475.

Test plan

  • All six sidecar check_*.py scripts dry-run green against synthetic well-formed projects (the seventh task, antipattern-login-no-tenant, has no sidecar — pure command_executed checks).
  • Negative cases verified: leaving [build-system] in fails the build-system check; from uipath import UiPath fails the import check; a fabricated entrypoint UUID fails the multi-entrypoint check; SubType="credentialAsset" on a str-annotated retrieve fails the asset-subtypes check.
  • All nineteen task YAMLs in coded/ parse and tag lists are consistent.

@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

Claude finished @radugheo's task in 2m 39s


Summary

Adds seven coder_eval test tasks (four anti-pattern smoke tests, three bindings-sweep e2e tests) and six Python check scripts for the uipath-agents coded-agent skill. Well-structured, consistent with existing tasks, and the check scripts use the shared _shared/ helpers correctly.

Change-by-Change Review

1. Anti-pattern task YAMLs (4 files)

Severity: OK

antipattern_build_system.yaml, antipattern_login_no_tenant.yaml, antipattern_module_level_llm.yaml, antipattern_wrong_sdk_import.yaml

Tags are correct (uipath-agents first, smoke test type, valid facets). task_id follows skill-agent-coded-antipattern-*, consistent with skill-agent-coded-* convention. Each has max_iterations: 1, reasonable success_criteria weights, and minimal initial_prompts that describe the goal and seed the violating project without prescribing steps.

2. Bindings sweep task YAMLs (3 files)

Severity: Medium

bindings_asset_subtypes.yaml, bindings_multi_entrypoint.yaml, bindings_queue_app_index.yaml

Tags are correct (uipath-agents first, e2e test type, valid facets). However, the task_id values deviate from the established convention:

| New bindings task_id | Convention would be |
|---|---|
| skill-agents-bindings-asset-subtypes | skill-agent-coded-bindings-asset-subtypes |
| skill-agents-bindings-multi-entrypoint | skill-agent-coded-bindings-multi-entrypoint |
| skill-agents-bindings-queue-app-index | skill-agent-coded-bindings-queue-app-index |

All 16 existing tasks in coded/ use skill-agent-coded-*. The new bindings tasks use skill-agents-bindings-* — two deviations: agents instead of agent, and drops coded. If this is intentional (a new naming tier for bindings-specific tests), it should be documented. Otherwise, align with the existing pattern.
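A naming drift like this one is easy to catch mechanically. The following is a minimal sketch, not part of the PR: the `CONVENTION` pattern is inferred from the `skill-agent-coded-*` prefix named above, and `nonconforming` is a hypothetical helper.

```python
import re

# Assumed convention, inferred from the review text: skill-agent-coded-<slug>.
CONVENTION = re.compile(r"^skill-agent-coded-[a-z0-9]+(?:-[a-z0-9]+)*$")


def nonconforming(task_ids):
    """Return the task_ids that do not follow the assumed convention."""
    return [t for t in task_ids if not CONVENTION.match(t)]


ids = [
    "skill-agent-coded-antipattern-build-system",
    "skill-agents-bindings-asset-subtypes",       # as submitted in this PR
    "skill-agent-coded-bindings-asset-subtypes",  # as the convention suggests
]
print(nonconforming(ids))
```

Only the `skill-agents-bindings-*` id is reported, because it both pluralises `agent` and drops `coded`.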


3. check_antipattern_build_system.py

Severity: OK

Clean regex-based check: line-anchored [build-system] detection, sanity check that [project] survived. Correct use of re.M flag. No external dependencies.
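For readers without the script open, the shape of such a check might look like the sketch below. It is an illustration under assumptions, not the contents of check_antipattern_build_system.py: the function name `check_pyproject` and the error strings are hypothetical.

```python
import re

# Line-anchored patterns: with re.M, ^ and $ match at every line boundary,
# so a mention of "[build-system]" inside a comment or value is not flagged.
BUILD_SYSTEM = re.compile(r"^\[build-system\]\s*$", re.M)
PROJECT = re.compile(r"^\[project\]\s*$", re.M)


def check_pyproject(text: str) -> list[str]:
    """Return a list of problems; an empty list means the check passes."""
    errors = []
    if BUILD_SYSTEM.search(text):
        errors.append("[build-system] section is still present")
    if not PROJECT.search(text):
        errors.append("[project] section is missing")
    return errors


seeded = (
    '[build-system]\nrequires = ["hatchling"]\n'
    'build-backend = "hatchling.build"\n\n[project]\nname = "demo"\n'
)
fixed = '[project]\nname = "demo"\n'
print(check_pyproject(seeded))
print(check_pyproject(fixed))
```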

4. check_antipattern_module_level_llm.py

Severity: OK

Correctly imports from _shared.ast_lazy_init_check for AST-level module-level client detection. Also verifies the graph = export survived the refactor. Good separation of concerns.
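The API of `_shared.ast_lazy_init_check` is not shown in this PR, so the following is only a rough sketch of what an AST-level check of this kind can look like, using the standard `ast` module. Both function names are hypothetical, and `build` and `make_llm` in the samples are placeholder names.

```python
import ast


def module_level_uipath_calls(source: str) -> list[str]:
    """Names of UiPath* callables invoked by top-level statements."""
    found = []
    for stmt in ast.parse(source).body:
        # Function bodies only run when called, so they count as lazy init.
        # Simplification: class bodies and lambdas are not treated specially.
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call):
                func = node.func
                name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
                if name.startswith("UiPath"):
                    found.append(name)
    return found


def exports_graph(source: str) -> bool:
    """True if the module assigns a top-level variable named `graph`."""
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            if any(isinstance(t, ast.Name) and t.id == "graph" for t in stmt.targets):
                return True
    return False


bad = "llm = UiPathChat(model='m')\ngraph = build(llm)\n"
good = "def make_llm():\n    return UiPathChat(model='m')\n\ngraph = build()\n"
print(module_level_uipath_calls(bad), module_level_uipath_calls(good), exports_graph(good))
```

The sources are only parsed, never executed, so the sample strings do not need the real SDK installed.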

5. check_antipattern_wrong_sdk_import.py

Severity: Low — minor regex edge case

The WRONG regex (check_antipattern_wrong_sdk_import.py:21):

WRONG = re.compile(r"^\s*from\s+uipath\s+import\s+(?:[^,\n]*,\s*)*UiPath(?:\s*,|$)", re.M)

The trailing (?:\s*,|$) requires UiPath to be followed by a comma or end-of-line. An import with a trailing comment like from uipath import UiPath # wrong would not match. This is unlikely in practice (the agent would typically strip comments when recreating), and the RIGHT regex catches the missing correct import regardless, so it's a defense-in-depth gap rather than a functional bug. A more robust alternative:

WRONG = re.compile(r"^\s*from\s+uipath\s+import\s+(?:[^,\n]*,\s*)*UiPath\b", re.M)

Using \b (word boundary) covers all trailing contexts and matches the approach used in the RIGHT regex on line 22.
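The difference between the two patterns can be checked directly. The sketch below copies both regexes from this review and runs them over a few import lines; the sample lines, including the alias and the `UiPathChat` line, are illustrative inputs rather than anything taken from the PR.

```python
import re

# Pattern as quoted from check_antipattern_wrong_sdk_import.py:21.
OLD = re.compile(r"^\s*from\s+uipath\s+import\s+(?:[^,\n]*,\s*)*UiPath(?:\s*,|$)", re.M)
# Suggested alternative using a word boundary.
NEW = re.compile(r"^\s*from\s+uipath\s+import\s+(?:[^,\n]*,\s*)*UiPath\b", re.M)

samples = [
    "from uipath import UiPath",
    "from uipath import UiPath  # wrong",
    "from uipath import UiPath as sdk",
    "from uipath.platform import UiPath",  # the correct import, must not match
    "from uipath import UiPathChat",       # illustrative: \b must not over-match
]

results = {s: (bool(OLD.search(s)), bool(NEW.search(s))) for s in samples}
for line, (old_hit, new_hit) in results.items():
    print(f"{line!r}: old={old_hit} new={new_hit}")
```

The two patterns differ only on the trailing-comment and alias lines, which the original pattern misses.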

6. check_bindings_asset_subtypes.py

Severity: OK

Good acceptance criteria: requires exactly 3 asset bindings, accepts SubType either matching the annotation or being omitted, and explicitly fails on wrong SubType. Uses _shared.bindings_assertions correctly.
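The "match or omit, never wrong" rule can be sketched as follows. This is an illustration only: the binding dict layout (`key`, `metadata`) and the helper name `subtype_errors` are assumptions, and the real script delegates to `_shared.bindings_assertions`, whose API is not shown here.

```python
# Annotation -> SubType mapping as listed in the PR description.
EXPECTED = {"str": "stringAsset", "int": "integerAsset", "bool": "booleanAsset"}


def subtype_errors(bindings: list[dict], annotations: dict[str, str]) -> list[str]:
    """bindings: asset bindings (hypothetical shape); annotations: key -> type name."""
    errors = []
    if len(bindings) != len(annotations):
        errors.append(f"expected {len(annotations)} asset bindings, found {len(bindings)}")
    for binding in bindings:
        key = binding.get("key")
        want = EXPECTED.get(annotations.get(key, ""))
        got = binding.get("metadata", {}).get("SubType")
        # An omitted SubType is accepted; a present one must match the annotation.
        if got is not None and got != want:
            errors.append(f"{key}: SubType {got!r}, expected {want!r} or omitted")
    return errors


annotations = {"api_url": "str", "retries": "int", "enabled": "bool"}
bindings = [
    {"key": "api_url", "metadata": {"SubType": "credentialAsset"}},  # wrong
    {"key": "retries", "metadata": {"SubType": "integerAsset"}},     # matches
    {"key": "enabled", "metadata": {}},                              # omitted
]
print(subtype_errors(bindings, annotations))
```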

7. check_bindings_multi_entrypoint.py

Severity: OK

Reads real entrypoint UUIDs from entry-points.json, then validates any EntryPointUniqueId in the bindings references a real one. Correctly accepts absence of the field (agent might leave it off). The fabricated-UUID detection is the load-bearing assertion.
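A minimal sketch of that assertion is shown below. The field names `entryPoints` and `uniqueId`, the placement of `EntryPointUniqueId` under `metadata`, and the helper name are all assumptions made for illustration; the actual file layout is not reproduced in this PR.

```python
def fabricated_entrypoint_ids(entry_points: dict, bindings: list[dict]) -> list[str]:
    """Return EntryPointUniqueId values that match no real entrypoint."""
    # Assumed layout: {"entryPoints": [{"uniqueId": "..."}, ...]}
    real = {ep["uniqueId"] for ep in entry_points.get("entryPoints", [])}
    bad = []
    for binding in bindings:
        ref = binding.get("metadata", {}).get("EntryPointUniqueId")
        # A binding without the field is accepted; a present value must be real.
        if ref is not None and ref not in real:
            bad.append(ref)
    return bad


entry_points = {"entryPoints": [{"uniqueId": "ep-1"}, {"uniqueId": "ep-2"}]}
bindings = [
    {"metadata": {"EntryPointUniqueId": "ep-1"}},     # real
    {"metadata": {}},                                 # no link at all
    {"metadata": {"EntryPointUniqueId": "made-up"}},  # fabricated
]
print(fabricated_entrypoint_ids(entry_points, bindings))
```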

8. check_bindings_queue_app_index.py

Severity: OK

Covers all five resource types with correct key construction rules (bare key for connections and for the no-folder queue, <name>.<folder> otherwise). Validates metadata fields (ActivityName, DisplayLabel, UseConnectionService). Deduplication check ensures exactly one entry per kind.
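The key rule described in this PR (bare for connections, bare when no folder is given, `<name>.<folder>` otherwise) can be written down in a few lines. This is a sketch of the rule as stated here, with a hypothetical helper name and made-up resource names; it is not the SDK's own key builder.

```python
from typing import Optional


def expected_binding_key(kind: str, name: str, folder: Optional[str] = None) -> str:
    """Key a binding is expected to carry, per the rule described in this PR."""
    if kind == "connection" or not folder:
        return name  # bare key
    return f"{name}.{folder}"


print(expected_binding_key("connection", "my-conn", "Shared"))
print(expected_binding_key("queue", "invoices"))
print(expected_binding_key("index", "docs", "Shared"))
```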

What's Missing

  • --output json not enforced on tenant set: antipattern_login_no_tenant.yaml:46 — the prompt tells the agent "Use --output json on every uip command you run" but the command_pattern for tenant set doesn't require it. The uip login pattern on line 38 does enforce --output json. This is likely intentional leniency (tenant set is the critical behavior being tested), but noting the asymmetry.
  • Nothing else identified.
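If the asymmetry were ever closed, the stricter match could look like the sketch below. It assumes `command_pattern` is a regular expression applied to the executed command line; the two patterns and the helper are illustrative and are not copied from the YAML.

```python
import re

TENANT_SET = re.compile(r"\buip\s+login\s+tenant\s+set\b")
OUTPUT_JSON = re.compile(r"--output\s+json\b")


def tenant_set_matches(command: str, require_json: bool) -> bool:
    """Lenient mode needs only `tenant set`; strict mode also needs --output json."""
    if not TENANT_SET.search(command):
        return False
    return not require_json or bool(OUTPUT_JSON.search(command))


plain = 'uip login tenant set "MyTenant"'
with_flag = 'uip login tenant set "MyTenant" --output json'
print(tenant_set_matches(plain, require_json=False))
print(tenant_set_matches(plain, require_json=True))
print(tenant_set_matches(with_flag, require_json=True))
```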

Area Ratings

| Area | Status | Notes |
|---|---|---|
| Frontmatter | N/A | No skill changes (tests only) |
| E2E Tests | Issue | task_id naming inconsistency on bindings tasks |
| Skill Body | N/A | No skill changes |
| References & Assets | N/A | No skill changes |
| Repo Hygiene | OK | No secrets, no personal paths, no cross-skill refs, changes scoped to test tasks |

Issues for Manual Review

  • The index binding (check_bindings_queue_app_index.py:49) asserts ActivityName = "retrieve_async" even though the call site is search_async. The check comment explains this matches the bindings reference ("ALL methods bind via retrieve_async"). Verify this is still the current contract — if the reference changes, this assertion will silently pass incorrect bindings.
  • The antipattern_login_no_tenant task seeds only a goal prompt (no project files). Verify the skill activates correctly from a bare auth request without a pre-existing project context.

Conclusion

Clean PR. The tests are well-designed with good positive/negative coverage and correct use of shared helpers. One Medium issue: the three bindings task_id values should align with the established skill-agent-coded-* convention. One Low, cosmetic issue: the WRONG regex in check_antipattern_wrong_sdk_import.py could use \b instead of (?:\s*,|$) for consistency with the RIGHT regex in the same file. Neither blocks merge.


@radugheo radugheo requested a review from uipreliga as a code owner April 30, 2026 09:27
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from 15da95f to 4aa145b Compare April 30, 2026 09:53
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from 0e6d5a8 to 14f863f Compare April 30, 2026 09:53
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from 4aa145b to 23b712b Compare April 30, 2026 09:57
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from 14f863f to 684055a Compare April 30, 2026 09:57
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from 23b712b to 2a98b40 Compare April 30, 2026 11:42
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from 684055a to 7c742c4 Compare April 30, 2026 11:43
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from 2a98b40 to d949d53 Compare April 30, 2026 12:04
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from 7c742c4 to a290b64 Compare April 30, 2026 12:04
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from d949d53 to 9ed1291 Compare May 4, 2026 09:37
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from a290b64 to aa14992 Compare May 4, 2026 09:37
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from 9ed1291 to fa8bc6b Compare May 4, 2026 10:10
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from aa14992 to 7e708c0 Compare May 4, 2026 10:10
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from fa8bc6b to a2b7c01 Compare May 4, 2026 13:27
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from 7e708c0 to bf81f67 Compare May 4, 2026 13:27
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from a2b7c01 to d168789 Compare May 4, 2026 13:32
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from bf81f67 to 97e8686 Compare May 4, 2026 13:32
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from d168789 to 59ee4c3 Compare May 4, 2026 15:42
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch 2 times, most recently from 6221f86 to b9f704e Compare May 4, 2026 15:59
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch 2 times, most recently from 72a8cc6 to 23543b3 Compare May 4, 2026 16:35
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from b9f704e to 310ccec Compare May 4, 2026 16:36
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from 23543b3 to e3cfe5b Compare May 4, 2026 17:24
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from 310ccec to 36ddc2e Compare May 4, 2026 17:24
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from e3cfe5b to f48ca21 Compare May 4, 2026 19:07
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from 36ddc2e to c3506a6 Compare May 4, 2026 19:07
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from f48ca21 to 88fa6a4 Compare May 4, 2026 21:23
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from c3506a6 to cfc79d0 Compare May 4, 2026 21:23
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from 88fa6a4 to b759597 Compare May 4, 2026 21:25
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from cfc79d0 to 2f9ec34 Compare May 4, 2026 21:25
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from b759597 to c77a1bc Compare May 4, 2026 21:26
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from 2f9ec34 to 71a8126 Compare May 4, 2026 21:26
@radugheo radugheo force-pushed the test/coded-tests-capabilities branch from c77a1bc to 6924bbd Compare May 5, 2026 08:34
Base automatically changed from test/coded-tests-capabilities to main May 5, 2026 08:35
Anti-pattern smokes — each task seeds a violating project layout
and the skill must drive the agent to detect and fix the violation:

- skill-agent-coded-antipattern-build-system — pre-seeded
  pyproject.toml carries [build-system] hatchling. Check verifies
  the section is removed and [project] survives.
- skill-agent-coded-antipattern-module-level-llm — pre-seeded
  LangGraph main.py constructs UiPathChat() at module level.
  Check verifies no module-level UiPath* construction remains and
  the top-level `graph` variable is preserved.
- skill-agent-coded-antipattern-wrong-sdk-import — pre-seeded
  main.py uses `from uipath import UiPath` (raises ImportError).
  Check verifies the import was switched to `from uipath.platform
  import UiPath`.

Bindings sweep tests (e2e):

- skill-agents-bindings-queue-app-index — five binding types in
  one project (queue, app, index, connection, mcpServer). Asserts
  one entry per resource with correct keys, value blocks, and
  metadata. Connection key is bare; queue with no folder_path is
  also bare (per the bindings reference No-folder edge case).
- skill-agents-bindings-asset-subtypes — three asset retrieves
  annotated str / int / bool. Asserts three asset bindings;
  per-binding SubType either matches the annotation
  (stringAsset / integerAsset / booleanAsset) OR is omitted (the
  bindings reference allows the omit-fallback). Wrong SubType
  fails — that would mis-place the resource at uipath push time.
- skill-agents-bindings-multi-entrypoint — entry-points.json with
  two entrypoints. Asserts each binding's EntryPointUniqueId
  references one of the real entrypoints (or no link at all);
  fabricated UUIDs fail.
@radugheo radugheo force-pushed the test/coded-tests-remaining-gaps branch from 71a8126 to d4a4df7 Compare May 5, 2026 08:35
@radugheo radugheo merged commit 0c28dfe into main May 5, 2026
3 checks passed
@radugheo radugheo deleted the test/coded-tests-remaining-gaps branch May 5, 2026 08:36
2 participants