Claude finished @t-hsia's task in 4m 37s

Summary
This PR adds a new

Change-by-Change Review
| Area | Status | Notes |
|---|---|---|
| Frontmatter | ✅ OK | Valid YAML, 305-char description, correct → redirects |
| E2E Tests | ✅ OK | 5 smoke + 3 e2e tasks with good success criteria |
| Skill Body | | Cross-skill references in onboarding-workflows.md |
| References & Assets | | Significant duplication between identity-commands.md and domain files; placeholder steps in onboarding |
| Repo Hygiene | | Missing CODEOWNERS entry for test path |
Issues for Manual Review
- CLI command accuracy: Unable to verify that `uip admin identity` commands, flags, and output codes are correct against the actual CLI. The commands look well-structured but should be validated against `uip admin identity --help` output.
- `users invite` and `--organization` flag: Is `invite` intentionally organization-agnostic? This differs from every other identity command and could confuse the agent.
- Plugin loading in smoke tests: Verify that task-level `agent` blocks (which omit `plugins`) correctly inherit plugin config from the experiment defaults in `coder_eval`.
- Scope completeness: The skill covers Identity Server CRUD well. If there are other `uip admin` subcommands beyond `identity` (e.g., `uip admin settings`, `uip admin audit`), the skill name `uipath-admin` might be too broad for just identity management, or the skill should note its current scope explicitly.
Conclusion
Request changes. The skill is well-built with strong structure, good test coverage, and clear documentation. Three items need fixing before merge:
- High: Remove cross-skill references to `uipath-platform` in `onboarding-workflows.md` and resolve placeholder Steps 4-5
- Medium: Add CODEOWNERS entry for `/tests/tasks/uipath-admin/`
- Medium: Reduce duplication between `identity-commands.md` and domain-specific reference files (domain files should link to the command reference for flag details)
> **Preview** — Under active development. Command coverage will expand.

Identity Server management via `uip admin identity`. Users, groups, robot accounts, external OAuth2 apps.
Regarding the "uipath-admin" folder name, I suggest checking with @tomasz Religa whether it is OK, because we will have other admin commands.
I ran the AI linter on the new tasks; results below. Scope: 8 task YAMLs under
Summary
Per-Task Verdicts
Per-Task Findings
| File | Prompt lines | Criteria lines |
|---|---|---|
| identity_create_external_app_smoke.yaml | 20–27 | 62–87 |
| identity_create_robot_account_smoke.yaml | 19–26 | 69–86 |
| identity_list_groups_smoke.yaml | 18–23 | 58–70 |
| identity_list_robot_accounts_smoke.yaml | 17–22 | 49–61 |
| identity_list_users_smoke.yaml | 19–25 | 60–85 |
| robot_account_onboarding_e2e.yaml | 32–37 | 92–109 |
Each prompt instructs the agent to write a `report.json` recording the very things the criteria then grade: `command_used`, `commands_attempted`, `app_name`, `scopes_used`, `robot_name`, `listed_first`, `steps_completed`. The `file_exists`, `file_contains`, and `json_check` criteria all read this self-written file. Combined with the explicit "no live tenant — commands will fail" caveat, a lazy agent satisfies the criteria by running each CLI command once (failing) and writing the expected JSON. The skill is not exercised against ground truth.
Fix: Remove the `report.json` requirement entirely. Grade on the CLI invocation itself:
- Replace `file_exists` / `file_contains` / `json_check` on `report.json` with `run_command` blocks that re-execute the canonical CLI and assert via `expected_stdout` / `stdout_match` against the real (failing) error string when no tenant is connected. This proves the agent constructed the right command.
- Or, if smoke cannot reach a tenant, drop the file checks and tighten `command_pattern` regexes to pin exact flags + values (e.g. `--display-name "Invoice Processing Bot"`, `--scope "?OR\.(Folders|Jobs)"`).
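The second option can be illustrated with a small Python sketch. It only shows the difference between a loose and a pinned pattern; how `coder_eval` actually applies `command_pattern` is not shown in this thread, and the `<subcommand>` placeholder and the `OR.Queues` value below are made up for illustration.

```python
import re

# Loose pattern: passes for any identity command, whatever flags were used.
LOOSE = re.compile(r"uip\s+admin\s+identity\b")

# Pinned pattern: also requires the flag and its exact value.
PINNED_DISPLAY_NAME = re.compile(
    r'uip\s+admin\s+identity\b.*--display-name\s+"Invoice Processing Bot"'
)

# Scope pattern adapted from the review comment; the opening quote is optional.
PINNED_SCOPE = re.compile(r'--scope\s+"?OR\.(Folders|Jobs)')


def matches(pattern, command):
    """Return True if the pattern occurs anywhere in the command string."""
    return pattern.search(command) is not None


# Hypothetical command strings; <subcommand> stands in for the real subcommand.
good = 'uip admin identity <subcommand> --display-name "Invoice Processing Bot" --output json'
bad = 'uip admin identity <subcommand> --display-name "Some Bot" --output json'

print(matches(LOOSE, bad))                 # the loose pattern accepts the wrong value
print(matches(PINNED_DISPLAY_NAME, bad))   # the pinned pattern rejects it
print(matches(PINNED_DISPLAY_NAME, good))
```

A pinned pattern fails when the agent guesses a flag value, which is the behavior the loose pattern cannot detect.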
Theme 2 — [High] Prompt over-specification: recipe-style numbered prompts (3 tasks)
| File | Prompt lines | Recipe length |
|---|---|---|
| group_membership_management_e2e.yaml | 22–30 | 5 steps |
| human_user_onboarding_e2e.yaml | 22–29 | 3 steps |
| robot_account_onboarding_e2e.yaml | 25–37 | 2 steps |
Initial prompts enumerate the procedure step-by-step (1. Create… 2. List… 3. Add…). The skill's onboarding-workflows reference is supposed to teach this ordering; if the prompt enumerates it, an agent can pass without invoking the skill at all.
Fix: State the goal, not the procedure. Examples:
- `group_membership_management_e2e`: "Set up a new 'Invoice Processing Team' group containing the first two users in the org, verify membership, then remove the second user."
- `human_user_onboarding_e2e`: "Onboard john.doe@example.com (John Doe) and add them to the 'Automation Developer' group once provisioned."
- `robot_account_onboarding_e2e`: "Onboard an unattended invoice-processing robot ('invoice-processor' / 'Invoice Processing Robot') with appropriate group access."
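One rough way to spot recipe-style prompts mechanically is to count lines that begin with a step number. The sketch below is an assumption about how such a check could look, not the linter's actual rule; the threshold of 2 and the sample recipe text are illustrative.

```python
import re

# A line starting with "1." or "2)" (after optional indentation) counts as a step.
STEP_LINE = re.compile(r"^[ \t]*\d+[.)][ \t]+\S", re.MULTILINE)


def count_numbered_steps(prompt: str) -> int:
    """Count lines in the prompt that look like numbered procedure steps."""
    return len(STEP_LINE.findall(prompt))


def is_recipe_style(prompt: str, threshold: int = 2) -> bool:
    """Flag prompts with at least `threshold` numbered steps."""
    return count_numbered_steps(prompt) >= threshold


# Illustrative recipe prompt (not copied from the task YAMLs).
recipe_prompt = "1. Create the group\n2. List the users\n3. Add the first two users"

# Goal-style prompt quoted from the review.
goal_prompt = (
    "Onboard john.doe@example.com (John Doe) and add them to the "
    "'Automation Developer' group once provisioned."
)

print(is_recipe_style(recipe_prompt), is_recipe_style(goal_prompt))
```

The heuristic misses recipes written as prose ("first create, then list"), so it can only complement a human or model review.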
Theme 3 — [Low] Prompt over-specification: --output json prescription (all 8 tasks)
All prompts contain "Use --output json on (all|every uip admin identity) commands." Procedural — the skill already teaches --output json. Cross-cutting command_executed criteria already verify usage, so the prompt-level prescription is redundant and slightly leaks procedure.
Fix: Drop the --output json instruction from each initial_prompt. Keep the cross-cutting command_executed criterion that pattern-matches --output\s+json so the test still rewards correct flag use.
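As a sketch of what the retained cross-cutting criterion checks, the Python below applies the `--output\s+json` pattern to a list of executed commands. The function name and the way commands are collected are assumptions; only the pattern comes from the review.

```python
import re

OUTPUT_JSON = re.compile(r"--output\s+json\b")
IDENTITY_CMD = re.compile(r"^\s*uip\s+admin\s+identity\b")


def commands_missing_output_json(commands):
    """Return the identity commands that were run without --output json."""
    return [
        cmd for cmd in commands
        if IDENTITY_CMD.search(cmd) and not OUTPUT_JSON.search(cmd)
    ]


# Example transcript of executed commands (illustrative).
executed = [
    "uip admin identity users list --output json",
    "uip admin identity users list",
    "echo done",
]

print(commands_missing_output_json(executed))
```

Because the check runs on what the agent executed, the prompt does not need to mention the flag for the test to reward it.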
Top 3 Priority Fixes (whole PR)
- Remove the `report.json` self-report pattern from all 6 smoke + e2e tasks (Theme 1). Most expensive issue: these tests grade the agent's own narrative, not platform behavior.
- Convert numbered-step recipes in the 3 onboarding/management e2e prompts to goal statements (Theme 2). Lets the skill carry workflow knowledge.
- Add real platform side-effect checks (`run_command` + `json_check` of `get-members` / `users list` filtered output) to `group_membership_management_e2e` and `human_user_onboarding_e2e`. They're the only two tasks that can hit a live tenant; right now they only verify the agent typed the right commands.
Align skill docs and tests with CLI refactor (075b3a8d):
- get-members → members list
- add-members → members add
- remove-members → members revoke
- GroupMembersRemoved → GroupMembersRevoked

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
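A rename like this is easy to leave half-applied across docs and task YAMLs. The following sketch scans text for the old names listed in the commit message; the scanning function and the sample text (including the `groups` subcommand) are hypothetical.

```python
import re

# Old name -> new name, as listed in the commit message.
RENAMES = {
    "get-members": "members list",
    "add-members": "members add",
    "remove-members": "members revoke",
    "GroupMembersRemoved": "GroupMembersRevoked",
}


def find_stale_names(text):
    """Return (line_number, old_name, new_name) for each leftover old name."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for old, new in RENAMES.items():
            # Require non-word, non-hyphen neighbours so partial tokens are skipped.
            if re.search(rf"(?<![\w-]){re.escape(old)}(?![\w-])", line):
                hits.append((lineno, old, new))
    return hits


# Hypothetical doc snippet; the "groups" subcommand is a placeholder.
sample = (
    "uip admin identity groups get-members --output json\n"
    "uip admin identity groups members list --output json\n"
    "Expect code GroupMembersRemoved\n"
)

for lineno, old, new in find_stale_names(sample):
    print(f"line {lineno}: replace '{old}' with '{new}'")
```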
Claude finished @t-hsia's task in 3m 7s

Coder-eval task lint (advisory)
8 task YAMLs changed; verdicts: 0 Critical, 6 High, 2 Medium, 0 Low, 0 OK. Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run
❌ High: PR body does not claim the changed tasks have been run and passed. Please edit the PR description to add a line like:

Per-task lint
| File | Prompt lines | Criteria lines |
|---|---|---|
| identity_create_external_app_smoke.yaml | 20–27 | 62–87 |
| identity_create_robot_account_smoke.yaml | 19–26 | 69–86 |
| identity_list_groups_smoke.yaml | 18–23 | 58–70 |
| identity_list_robot_accounts_smoke.yaml | 17–22 | 49–61 |
| identity_list_users_smoke.yaml | 19–25 | 60–85 |
| robot_account_onboarding_e2e.yaml | 32–37 | 92–109 |
Each prompt instructs the agent to write a `report.json` recording the very things the criteria then grade: `command_used`, `commands_attempted`, `app_name`, `scopes_used`, `robot_name`, `listed_first`, `steps_completed`. The `file_exists`, `file_contains`, and `json_check` criteria all read this self-written file. Combined with the explicit "no live tenant — commands will fail" caveat, a lazy agent satisfies the criteria by running each CLI command once (failing) and writing the expected JSON. The skill is not exercised against ground truth.
Fix: Remove the `report.json` requirement entirely. Grade on the CLI invocation itself:
- Replace `file_exists` / `file_contains` / `json_check` on `report.json` with tighter `command_executed` regexes that pin exact flags and values (e.g. `--display-name "Invoice Processing Bot"`, `--scope "?OR\.(Folders|Jobs)"`).
- Or, if the sandbox can reach a tenant, use `run_command` blocks that re-execute the canonical CLI and assert via `expected_stdout` / `stdout_match`.
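The `run_command` idea, re-running a command and grading its real output instead of a self-written file, can be sketched in Python as below. This is not the `coder_eval` implementation: the helper name is invented, and a Python one-liner stands in for the `uip` CLI so the example runs anywhere.

```python
import re
import subprocess
import sys


def run_and_match(argv, stdout_pattern, timeout=30):
    """Run argv and report whether its stdout matches the given regex."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return re.search(stdout_pattern, result.stdout) is not None


# Stand-in for the real CLI call: prints a small JSON document.
# The {"status": "ok"} shape is invented for this example.
stand_in = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"status": "ok"}))',
]

print(run_and_match(stand_in, r'"status":\s*"ok"'))
```

The grader, not the agent, produces the evidence here, which is what closes the self-report loophole.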
Theme 2 — [High] Prompt over-specification: recipe-style numbered steps (3 tasks)
| File | Prompt lines | Step count |
|---|---|---|
| group_membership_management_e2e.yaml | 22–30 | 5 steps |
| human_user_onboarding_e2e.yaml | 22–27 | 3 steps |
| robot_account_onboarding_e2e.yaml | 25–31 | 2 steps |
Prompts enumerate the procedure step-by-step. The skill's onboarding-workflows and group-management references are supposed to teach ordering; if the prompt enumerates it, an agent passes without invoking the skill.
Fix: State the goal, not the procedure:
- `group_membership_management_e2e`: "Set up a new 'Invoice Processing Team' group containing the first two users in the org, verify membership, then remove the second user."
- `human_user_onboarding_e2e`: "Onboard john.doe@example.com (John Doe) and add them to the 'Automation Developer' group once provisioned."
- `robot_account_onboarding_e2e`: "Onboard an unattended invoice-processing robot ('invoice-processor' / 'Invoice Processing Robot') with appropriate group access."
Theme 3 — [Low] Prompt over-specification: --output json prescription (all 8 tasks)
All prompts contain "Use `--output json` on (all|every) `uip admin` commands." The skill already teaches `--output json` usage. Cross-cutting `command_executed` criteria already verify the flag, so the prompt-level prescription is redundant and slightly leaks procedure.
Fix: Drop the --output json instruction from each initial_prompt. Keep the cross-cutting command_executed criterion that pattern-matches --output\s+json.
Conclusion
⚠ 8 task(s) have issues, max severity Critical (at theme level; High at per-task level after theme downgrade). Advisory only — not blocking merge.
Top 3 priority fixes:
- Remove the `report.json` self-report pattern from all 6 affected tasks (Theme 1). This is the most impactful issue: these tests grade the agent's own narrative, not platform behavior.
- Convert numbered-step recipes to goal statements in the 3 e2e/onboarding prompts (Theme 2). This lets the skill carry workflow knowledge.
- Add platform side-effect checks (`run_command` + `json_check`) to `group_membership_management_e2e` and `human_user_onboarding_e2e`. These are the only two tasks expected to hit a live tenant; currently they only verify that command strings were typed.
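A side-effect check of the kind described in the last item could look roughly like this in Python: parse the JSON that a membership listing returns and assert who is and is not in the group. The output shape (`members` array with `email` fields) and the email addresses are assumptions, since the real CLI output is not shown in this thread.

```python
import json


def member_emails(members_list_stdout):
    """Extract member emails from JSON output (assumed shape, see lead-in)."""
    payload = json.loads(members_list_stdout)
    return {member["email"] for member in payload["members"]}


def check_membership(stdout, expected_present, expected_absent):
    """True if every expected member is listed and no removed member remains."""
    emails = member_emails(stdout)
    return set(expected_present) <= emails and emails.isdisjoint(expected_absent)


# Hypothetical output after the second user has been removed from the group.
sample_stdout = json.dumps({"members": [{"email": "first.user@example.com"}]})

print(check_membership(
    sample_stdout,
    expected_present=["first.user@example.com"],
    expected_absent=["second.user@example.com"],
))
```

Checking both presence and absence matters for the group-management task, where the final step removes a user that an earlier step added.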
No description provided.