feat(uipath-platform): add traces_fetch smoke test + traces_e2e full round-trip test #480
Merged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
uipreliga
reviewed
Apr 30, 2026
- Remove JSON schema from initial_prompt; agent writes freely, criteria validate
- Add uip login status step so test runs against active live tenant (e2e pattern)
- Add json_check criteria for command + outcome fields in report.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds skill-platform-traces-e2e, the first test in the repo proving that uip traces spans get returns real spans from a real job (span_count >= 1). Uses a pre-published traces-smoke-agent on alpha codereval/DefaultTenant. The process key is supplied via the TRACES_SMOKE_PROCESS_KEY secret (not hardcoded).

The test starts the job, waits for completion, fetches spans, and asserts span_count >= 1 via check_traces_e2e.py, the gate the smoke test lacks.

Also injects TRACES_SMOKE_PROCESS_KEY into GITHUB_ENV in smoke-skills.yml so the secret is available to the agent sandbox at runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
9963a9f to 2742505
bai-uipath
approved these changes
May 1, 2026
Agent no longer writes its own assessment to report.json. traces_fetch relies purely on command_executed criteria. traces_e2e has the agent pipe raw CLI output to spans.json; check_traces_e2e.py reads that instead, preserving the span_count >= 1 gate without self-reporting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
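The gate described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `check_traces_e2e.py` from the PR, which is not shown on this page. The shape of `spans.json` is an assumption (either a bare JSON array of spans, or an object wrapping the array under a `spans` or `data` key), and the helper names `count_spans`, `check`, and `main` are hypothetical.

```python
import json
from pathlib import Path


def count_spans(payload):
    """Return the number of spans in a parsed spans.json payload.

    Assumed shapes (not confirmed by the PR): a bare list of spans, or a
    dict that wraps the list under a "spans" or "data" key.
    """
    if isinstance(payload, list):
        return len(payload)
    if isinstance(payload, dict):
        for key in ("spans", "data"):
            value = payload.get(key)
            if isinstance(value, list):
                return len(value)
    # Anything else is treated as "no spans found".
    return 0


def check(path):
    """Return (passed, message) for the span_count >= 1 gate."""
    file = Path(path)
    if not file.is_file():
        return False, f"{path} not found"
    try:
        payload = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return False, f"{path} is not valid JSON: {exc}"
    span_count = count_spans(payload)
    return span_count >= 1, f"span_count={span_count}"


def main(argv):
    """Hypothetical entry point: exit code 0 on pass, 1 on fail."""
    path = argv[1] if len(argv) > 1 else "spans.json"
    passed, message = check(path)
    print(message)
    return 0 if passed else 1
```

A script built this way would end with something like `sys.exit(main(sys.argv))`. Because the count is derived from the raw CLI output on disk rather than from anything the agent claims, a run that produced no spans fails the gate regardless of what the agent reports.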
uipreliga
approved these changes
May 5, 2026
I approved, but please can you fix the issue below? The prompt is way too detailed.
… in traces tests

Replace verbatim CLI command sequences with goal-statement style prompts (per PR review: prompts were telling the agent what to do rather than testing the skill). Remove the uip-login-status success criterion, since CI always authenticates via UIPATH_CLI_ENABLE_ENV_AUTH before tests run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_prompt

Replace start→wait→fetch procedure with goal-only prompt per lint feedback. Skill now teaches the full workflow; test verified locally, score=1.000 (5/5).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude finished @saksharthakkar's task in 2m 39s

Coder-eval task lint (advisory)

2 task YAMLs changed; verdict counts: 0 Critical, 0 High, 0 Medium, 1 Low, 1 OK. Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run
Per-task lint

Motivation
Adds two tests for `uip traces spans get` — the CLI command for fetching LLM trace spans from agent jobs.
Summary
`traces_fetch.yaml` — smoke (`skill-platform-traces-fetch`)
`traces_e2e.yaml` + `check_traces_e2e.py` — full round-trip (`skill-platform-traces-e2e`)
Workflow change: `smoke-skills.yml` now injects `TRACES_SMOKE_PROCESS_KEY` into `$GITHUB_ENV` so the agent sandbox picks it up at runtime.
Prompt style: Both prompts follow the goal-statement style used elsewhere in the repo: no CLI commands, no procedural steps. The skill teaches the workflow.
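For readers unfamiliar with the `$GITHUB_ENV` mechanism mentioned in the workflow change above: GitHub Actions exposes a file path in the `GITHUB_ENV` variable, and any `NAME=value` line appended to that file becomes an environment variable for later steps in the same job. The sketch below shows the idea in Python. It is an illustration under assumptions, not the step from `smoke-skills.yml` (which is not shown here and is likely a shell one-liner); the function name `export_to_github_env` and the `env_file` parameter are hypothetical.

```python
import os
from typing import Optional


def export_to_github_env(name: str, value: str, env_file: Optional[str] = None) -> None:
    """Append NAME=value to the GitHub Actions environment file.

    `env_file` defaults to the path in the GITHUB_ENV variable, which the
    Actions runner sets for each step. Only single-line values are handled;
    multi-line values need the runner's delimiter syntax, which this sketch
    deliberately does not implement.
    """
    if "\n" in value or "\r" in value:
        raise ValueError("multi-line values are not supported by this sketch")
    path = env_file or os.environ["GITHUB_ENV"]
    # Append, never overwrite: earlier steps may already have exported values.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{name}={value}\n")
```

In a workflow step the secret would first be mapped into the step's environment, after which a call such as `export_to_github_env("TRACES_SMOKE_PROCESS_KEY", os.environ["TRACES_SMOKE_PROCESS_KEY"])` would make it visible to subsequent steps, including the one that launches the agent sandbox.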
Test Results
traces_e2e full round-trip — score 1.000 ✅
Verified 2026-05-05 locally against `codereval/DefaultTenant`:
Agent discovered the process via `uip or processes list`, started the job, fetched spans — all from the skill, no procedure in the prompt.
traces_fetch smoke — score 1.000 ✅
Verified locally — 2/2 criteria passed, ~20s.
Test plan
🤖 Generated with Claude Code