Commit eeebd46

Added tests
1 parent ea2cec6 commit eeebd46

5 files changed

Lines changed: 519 additions & 0 deletions

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
+task_id: skill-audit-events-basic-smoke
+description: >
+  Skill-guided evaluation: agent uses the uipath-audit skill to query
+  audit events with a small bounded time window and a small --limit
+  that does NOT trigger client-side pagination. Validates the simpler
+  events form most users will write — `--from-date`/`--to-date` ISO 8601
+  bounds, `--limit <= 200`, `--output json`.
+
+  Platform note: runs without an authenticated tenant — commands will
+  fail with auth errors. That is acceptable; what matters is correct
+  command invocation with correct flags.
+tags: [uipath-audit, smoke, events]
+
+sandbox:
+  driver: tempdir
+  python: {}
+
+initial_prompt: |
+  A reviewer wants the most recent 25 tenant audit events from the
+  last 24 hours. Use the uipath-audit skill.
+
+  Save a summary to report.json in the current working directory with
+  at minimum:
+  {
+    "events_command": "<exact uip admin audit tenant events command you ran>",
+    "from": "<the --from-date value>",
+    "to": "<the --to-date value>",
+    "limit": <the --limit value, as a number>,
+    "commands_used": ["<list of uip commands you attempted>"]
+  }
+
+  Important:
+  - The `uip` CLI is available but the `admin` subcommand may not be
+    installed and/or the CLI is not connected to a live tenant.
+    Commands WILL fail — that is expected and acceptable.
+    Run each command exactly once regardless of errors.
+    Do NOT retry, do NOT attempt to login, do NOT try to install tools,
+    do NOT troubleshoot errors.
+  - Use --output json on the events call.
+  - Do NOT prompt the user for confirmation — this is an automated test.
+
+success_criteria:
+  - type: command_executed
+    description: "Agent invoked `uip admin audit tenant events` with both --from-date and --to-date ISO 8601 bounds"
+    tool_name: "Bash"
+    command_pattern: 'uip\s+admin\s+audit\s+tenant\s+events\b.*--from-date\s+\d{4}-\d{2}-\d{2}.*--to-date\s+\d{4}-\d{2}-\d{2}'
+    min_count: 1
+    weight: 2.0
+    pass_threshold: 1.0
+
+  - type: command_executed
+    description: "Agent passed --limit with a value <= 200 (single-call form, no pagination needed)"
+    tool_name: "Bash"
+    command_pattern: 'uip\s+admin\s+audit\s+tenant\s+events\b.*--limit\s+(?:[1-9]|[1-9]\d|1\d\d|200)\b'
+    min_count: 1
+    weight: 1.5
+    pass_threshold: 1.0
+
+  - type: command_executed
+    description: "Agent used --output json"
+    tool_name: "Bash"
+    command_pattern: 'uip\s+admin\s+audit\s+tenant\s+events\b.*--output\s+json'
+    min_count: 1
+    weight: 1.0
+    pass_threshold: 1.0
+
+  - type: file_exists
+    description: "report.json was created"
+    path: "report.json"
+    weight: 1.0
+    pass_threshold: 1.0
+
+  - type: file_contains
+    description: "report.json must NOT use cursor flags (CLI handles pagination internally)"
+    path: "report.json"
+    excludes:
+      - "--before "
+      - "--after "
+      - "--before-id"
+      - "--after-id"
+    weight: 1.0
+    pass_threshold: 1.0
+
+  - type: json_check
+    description: "report.json captures bounds and a small limit"
+    path: "report.json"
+    assertions:
+      - expression: "limit"
+        operator: lte
+        expected: 200
+      - expression: "limit"
+        operator: gte
+        expected: 1
+      - expression: "length(from)"
+        operator: gte
+        expected: 8
+      - expression: "length(to)"
+        operator: gte
+        expected: 8
+    weight: 1.0
+    pass_threshold: 0.75
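Reviewer sketch (not part of the commit): the three `command_executed` patterns in the events smoke task can be sanity-checked locally against a command built for a 24-hour window. This assumes the evaluator applies each `command_pattern` with Python `re.search` semantics, and that the CLI accepts second-precision UTC timestamps with a `Z` suffix; neither is confirmed by the diff. The helper names `build_events_command` and `matches_all` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Patterns copied verbatim from the success_criteria of the events smoke task.
BOUNDS = r'uip\s+admin\s+audit\s+tenant\s+events\b.*--from-date\s+\d{4}-\d{2}-\d{2}.*--to-date\s+\d{4}-\d{2}-\d{2}'
LIMIT = r'uip\s+admin\s+audit\s+tenant\s+events\b.*--limit\s+(?:[1-9]|[1-9]\d|1\d\d|200)\b'
OUTPUT = r'uip\s+admin\s+audit\s+tenant\s+events\b.*--output\s+json'


def build_events_command(now: datetime, limit: int = 25) -> str:
    """Build the events call for the 24 hours ending at `now` (hypothetical helper)."""
    to_date = now.astimezone(timezone.utc).replace(microsecond=0)
    from_date = to_date - timedelta(hours=24)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp shape; the task only says "ISO 8601"
    return (
        "uip admin audit tenant events "
        f"--from-date {from_date.strftime(fmt)} "
        f"--to-date {to_date.strftime(fmt)} "
        f"--limit {limit} --output json"
    )


def matches_all(command: str) -> bool:
    """True when the command satisfies all three command_executed patterns."""
    return all(re.search(p, command) for p in (BOUNDS, LIMIT, OUTPUT))


if __name__ == "__main__":
    cmd = build_events_command(datetime.now(timezone.utc))
    print(cmd, matches_all(cmd))
```

Under the same `re.search` assumption, the bounds pattern is order-sensitive: a command that passes `--to-date` before `--from-date` would not match, even though both flags are present.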
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+task_id: skill-audit-export-basic-smoke
+description: >
+  Skill-guided evaluation: agent uses the uipath-audit skill to run
+  a minimal `export` against a small window. Validates the three
+  required flags (`--from-date`, `--to-date`, `--output-file`) and
+  that the agent picks a reasonable output path inside the working
+  directory.
+
+  Platform note: runs without an authenticated tenant — commands will
+  fail with auth errors. That is acceptable; what matters is correct
+  command invocation with correct flags.
+tags: [uipath-audit, smoke, export]
+
+sandbox:
+  driver: tempdir
+  python: {}
+
+initial_prompt: |
+  An admin needs a ZIP of all tenant audit events from yesterday for
+  archival. Use the uipath-audit skill to export the window. Write the
+  ZIP to ./audit-yesterday.zip in the current working directory.
+
+  Save a summary to report.json with at minimum:
+  {
+    "export_command": "<exact uip admin audit tenant export command you ran>",
+    "export_output_file": "<the --output-file value you passed>",
+    "export_from": "<the --from-date value>",
+    "export_to": "<the --to-date value>",
+    "commands_used": ["<list of uip commands you attempted>"]
+  }
+
+  Important:
+  - The `uip` CLI is available but the `admin` subcommand may not be
+    installed and/or the CLI is not connected to a live tenant.
+    Commands WILL fail — that is expected and acceptable.
+    Run each command exactly once regardless of errors.
+    Do NOT retry, do NOT attempt to login, do NOT try to install tools,
+    do NOT troubleshoot errors.
+  - Do NOT prompt the user for confirmation — this is an automated test.
+
+success_criteria:
+  - type: command_executed
+    description: "Agent invoked `uip admin audit tenant export` with --from-date, --to-date, AND --output-file"
+    tool_name: "Bash"
+    command_pattern: 'uip\s+admin\s+audit\s+tenant\s+export\b.*--from-date\s+\d{4}-\d{2}-\d{2}.*--to-date\s+\d{4}-\d{2}-\d{2}.*--output-file\s+\S+'
+    min_count: 1
+    weight: 2.5
+    pass_threshold: 1.0
+
+  - type: command_executed
+    description: "Agent's export targeted the requested ./audit-yesterday.zip path"
+    tool_name: "Bash"
+    command_pattern: 'uip\s+admin\s+audit\s+tenant\s+export\b.*--output-file\s+\.?/?audit-yesterday\.zip'
+    min_count: 1
+    weight: 1.5
+    pass_threshold: 1.0
+
+  - type: file_exists
+    description: "report.json was created"
+    path: "report.json"
+    weight: 1.0
+    pass_threshold: 1.0
+
+  - type: file_contains
+    description: "report.json references the export command and forbids legacy spellings"
+    path: "report.json"
+    includes:
+      - "uip admin audit tenant export"
+    excludes:
+      - "uip audit "
+      - "uip admin aops-policy"
+    weight: 1.5
+    pass_threshold: 1.0
+
+  - type: json_check
+    description: "report.json captures the export bounds and output path"
+    path: "report.json"
+    assertions:
+      - expression: "export_output_file"
+        operator: contains
+        expected: "audit-yesterday.zip"
+      - expression: "length(export_from)"
+        operator: gte
+        expected: 8
+      - expression: "length(export_to)"
+        operator: gte
+        expected: 8
+    weight: 1.0
+    pass_threshold: 0.75
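Reviewer sketch (not part of the commit): the output-path pattern in the export smoke task accepts only a narrow set of spellings for `--output-file`. The check below assumes Python `re.search` semantics for `command_pattern`; the sample commands and the helper name `targets_requested_path` are hypothetical.

```python
import re

# Pattern copied verbatim from the second command_executed criterion.
OUTPUT_PATH = r'uip\s+admin\s+audit\s+tenant\s+export\b.*--output-file\s+\.?/?audit-yesterday\.zip'

# Hypothetical command prefix; the dates are placeholders, not values from the commit.
PREFIX = "uip admin audit tenant export --from-date 2026-04-01 --to-date 2026-04-02 "


def targets_requested_path(command: str) -> bool:
    """True when the command's --output-file value matches the criterion."""
    return re.search(OUTPUT_PATH, command) is not None


if __name__ == "__main__":
    for value in ("./audit-yesterday.zip", "audit-yesterday.zip",
                  '"./audit-yesterday.zip"', "/tmp/work/audit-yesterday.zip"):
        print(value, targets_requested_path(PREFIX + "--output-file " + value))
```

Under that assumption, a quoted path or an absolute path to the same file would fail this criterion even though the export targets the requested location, which may or may not be the intended strictness.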
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
+task_id: skill-audit-login-history-e2e
+description: >
+  End-to-end test: agent runs Investigation 2 from
+  audit-workflow-guide.md ("login history for user X, failed attempts
+  only"). Sequence is: list `sources` to find the User Login type GUID
+  (Identity → Authentication → User Login), then query `events` with
+  `--user-id`, `--type`, and `--status Failure` inside a bounded
+  window. Validates the agent uses server-side filters instead of
+  pulling everything and post-filtering.
+
+  Command-construction only — no live tenant available for
+  side-effect verification.
+tags: [uipath-audit, e2e, investigation, login-history]
+max_iterations: 2
+
+agent:
+  type: claude-code
+  permission_mode: acceptEdits
+  max_turns: 40
+  turn_timeout: 600
+  allowed_tools: ["Skill", "Bash", "Read", "Write", "Edit", "Glob", "Grep"]
+
+sandbox:
+  driver: tempdir
+  python: {}
+
+initial_prompt: |
+  Show me failed login attempts for jane.doe@example.com on the
+  active tenant during April 2026. Use the uipath-audit skill
+  end-to-end.
+
+  Save a summary to report.json with at minimum:
+  {
+    "sources_command": "<uip admin audit tenant sources ... command>",
+    "events_command": "<uip admin audit tenant events ... command>",
+    "user_filter": "<the --user-id value or the --search term you used to resolve them>",
+    "type_filter": "<the --type value you used to scope to User Login>",
+    "status": "<the --status value, expected: Failure>",
+    "from": "<the --from-date value>",
+    "to": "<the --to-date value>",
+    "commands_used": ["<every uip command you attempted>"]
+  }
+
+  Important:
+  - The `uip` CLI is available but the `admin` subcommand may not be
+    installed and/or the CLI is not connected to a live tenant.
+    Commands WILL fail — that is expected and acceptable.
+    Run each command exactly once regardless of errors.
+    Do NOT retry, do NOT attempt to login, do NOT try to install tools,
+    do NOT troubleshoot errors.
+  - Use --output json on every uip admin audit command.
+  - The skill teaches that login filtering is done server-side via
+    --user-id / --type / --status — do NOT pull everything and
+    post-filter on the client.
+  - Do NOT prompt the user for confirmation — this is an automated test.
+
+success_criteria:
+  - type: command_executed
+    description: "Agent invoked `uip admin audit tenant sources` to discover the User Login type GUID"
+    tool_name: "Bash"
+    command_pattern: 'uip\s+admin\s+audit\s+tenant\s+sources\b.*--output\s+json'
+    min_count: 1
+    weight: 2.0
+    pass_threshold: 1.0
+
+  - type: command_executed
+    description: "Agent invoked `uip admin audit tenant events` with --status Failure"
+    tool_name: "Bash"
+    command_pattern: 'uip\s+admin\s+audit\s+tenant\s+events\b.*--status\s+(?i:Failure|1)'
+    min_count: 1
+    weight: 2.5
+    pass_threshold: 1.0
+
+  - type: command_executed
+    description: "Agent passed --user-id (or --search to resolve the user) on the events call"
+    tool_name: "Bash"
+    command_pattern: 'uip\s+admin\s+audit\s+tenant\s+events\b.*(?:--user-id\s+\S+|--search\s+\S+)'
+    min_count: 1
+    weight: 2.0
+    pass_threshold: 1.0
+
+  - type: command_executed
+    description: "Agent kept a bounded time window via --from-date/--to-date covering April 2026"
+    tool_name: "Bash"
+    command_pattern: 'uip\s+admin\s+audit\s+tenant\s+events\b.*--from-date\s+2026-04.*--to-date\s+2026-0[45]'
+    min_count: 1
+    weight: 1.5
+    pass_threshold: 1.0
+
+  - type: file_exists
+    description: "report.json was created"
+    path: "report.json"
+    weight: 1.0
+    pass_threshold: 1.0
+
+  - type: file_contains
+    description: "report.json references sources discovery and the events filter chain"
+    path: "report.json"
+    includes:
+      - "uip admin audit tenant sources"
+      - "uip admin audit tenant events"
+      - "--status"
+    excludes:
+      - "--before "
+      - "--after "
+      - "--before-id"
+      - "--after-id"
+    weight: 1.5
+    pass_threshold: 1.0
+
+  - type: json_check
+    description: "report.json captures the user filter, type filter, and Failure status"
+    path: "report.json"
+    assertions:
+      - expression: "length(user_filter)"
+        operator: gte
+        expected: 1
+      - expression: "status"
+        operator: contains
+        expected: "ailure"
+      - expression: "length(commands_used)"
+        operator: gte
+        expected: 2
+    weight: 1.5
+    pass_threshold: 0.75
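Reviewer sketch (not part of the commit): the `--status` criterion in the login-history task uses a scoped inline flag, `(?i:...)`, so only the status value is matched case-insensitively. The check below assumes the evaluator compiles `command_pattern` with Python's `re` module (scoped inline flags need Python 3.6 or later); the sample commands, the placeholder user ID, and the helper name `has_failure_status` are hypothetical.

```python
import re

# Pattern copied verbatim from the second command_executed criterion.
STATUS = r'uip\s+admin\s+audit\s+tenant\s+events\b.*--status\s+(?i:Failure|1)'

# Hypothetical command prefix with a placeholder user ID.
PREFIX = ("uip admin audit tenant events "
          "--user-id 00000000-0000-0000-0000-000000000000 "
          "--from-date 2026-04-01 --to-date 2026-05-01 ")


def has_failure_status(command: str) -> bool:
    """True when the command passes the --status criterion."""
    return re.search(STATUS, command) is not None


if __name__ == "__main__":
    for value in ("Failure", "failure", "1", "Success", "10"):
        print(value, has_failure_status(PREFIX + "--status " + value))
```

Because the alternation is not followed by a word boundary, any value that merely starts with `1` (for example `10`) would also satisfy the pattern under this assumption; appending `\b` would tighten it if that matters for the status codes the CLI accepts.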
