Ignite UI for Angular — Skill Evals

Automated evaluation suite for the Ignite UI for Angular agent skills. Uses the skill-eval framework to measure skill quality, detect regressions, and gate merges.

Overview

The suite tests three skills:

| Skill | Task ID | What it tests |
| --- | --- | --- |
| igniteui-angular-grids | grid-basic-setup | Flat grid with sorting and pagination over employee data |
| igniteui-angular-components | component-combo-reactive-form | Multi-select combo bound to a reactive form control |
| igniteui-angular-theming | theming-palette-generation | Custom branded palette with palette() and theme() |

Each task includes:

  • instruction.md — the prompt given to the agent
  • tests/test.sh — deterministic grader (file checks, compilation, lint)
  • prompts/quality.md — LLM rubric grader (intent routing, API usage)
  • solution/solve.sh — reference solution for baseline validation
  • environment/Dockerfile — isolated environment for agent execution
  • skills/ — symlinked or copied skill files under test

Prerequisites

  • Node.js 20+
  • Docker (for isolated agent execution)
  • An API key for the agent provider (Gemini or Anthropic)

Running Evals Locally

Install dependencies

cd evals
npm install

Run a single task

# Gemini (default)
GEMINI_API_KEY=your-key npm run eval -- grid-basic-setup

# Claude
ANTHROPIC_API_KEY=your-key npm run eval -- grid-basic-setup --agent=claude

Run all tasks

GEMINI_API_KEY=your-key npm run eval:all

Options

# Adjust trials (default: 5)
npm run eval -- grid-basic-setup --trials=3

# Run locally without Docker
npm run eval -- grid-basic-setup --provider=local

# Validate graders against the reference solution
npm run eval -- grid-basic-setup --validate --provider=local

# Run multiple trials in parallel
npm run eval -- grid-basic-setup --parallel=3

Preview results

# CLI report
npm run preview

# Web UI at http://localhost:3847
npm run preview:browser

Adding a New Task

  1. Create a directory under evals/tasks/<task-id>/ with the standard structure:

    tasks/<task-id>/
    ├── task.toml               # Config: graders, timeouts, resource limits
    ├── instruction.md          # Agent prompt
    ├── environment/Dockerfile  # Container setup
    ├── tests/test.sh           # Deterministic grader
    ├── prompts/quality.md      # LLM rubric grader
    ├── solution/solve.sh       # Reference solution
    └── skills/                 # Skill files under test
        └── <skill-name>/SKILL.md
    
  2. Write a clear, unambiguous instruction.md that tells the agent exactly what to build.

  3. Write tests/test.sh to check outcomes (files exist, project compiles, correct selectors are present) rather than specific steps.

  4. Write prompts/quality.md with rubric dimensions that sum to 1.0 (for example: intent routing 0.4, idiomatic API usage 0.4, absence of hallucinated APIs 0.2).

  5. Write solution/solve.sh — a shell script that proves the task is solvable and validates that the graders work correctly. A minimal sketch follows this list.

  6. Validate graders before submitting:

    npm run eval -- <task-id> --validate --provider=local
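A minimal sketch of a solve.sh, assuming the reference source files ship next to the script and the project lives at /app (both are assumptions; the real layout is task-specific):

```bash
#!/usr/bin/env bash
# Hypothetical reference solution: copy known-good source files into the
# project and prove they compile. All paths here are assumptions.
set -euo pipefail

here="$(cd "$(dirname "$0")" && pwd)"

# Overwrite the scaffolded app with the reference implementation.
cp -r "$here/files/." /app/src/app/

# The graders should now pass; build to confirm the solution compiles.
cd /app
npm run build
```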

Pass / Fail Thresholds

Following Anthropic's recommendations:

| Metric | Threshold | Effect |
| --- | --- | --- |
| pass@5 | ≥ 80% | Merge gate; at least 1 success in 5 trials required |
| pass^5 | ≥ 60% | Tracked; flags flaky skills for investigation |
| pass@5 | < 60% | Blocks merge on PRs touching the relevant skill |
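For a single task, both metrics reduce to simple checks over its five trial outcomes, and the percentage thresholds above apply to those per-task results in aggregate. A rough sketch with made-up outcomes:

```bash
# Sketch: derive pass@5 and pass^5 for one task from five trial
# outcomes (1 = pass, 0 = fail). The framework computes these itself.
trials=(1 0 1 1 0)
passes=0
for t in "${trials[@]}"; do passes=$((passes + t)); done
(( passes >= 1 )) && echo "pass@5: yes" || echo "pass@5: no"
(( passes == 5 )) && echo "pass^5: yes" || echo "pass^5: no"
```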

CI Integration

The GitHub Actions workflow at .github/workflows/skill-eval.yml runs automatically on PRs that modify skills/** or evals/**. It:

  1. Checks out the repo
  2. Installs eval dependencies
  3. Runs all tasks with 5 trials
  4. Uploads results as an artifact
  5. Posts a summary comment on the PR

Grading Strategy

Deterministic grader (60% weight) — checks (see the sketch after this list):

  • Project builds without errors
  • Correct Ignite UI selector is present in the generated template
  • Required imports exist
  • No use of forbidden alternatives
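A minimal sketch of such a grader for the grid task, assuming the project lives at /app; the selector, import names, and the forbidden alternative are illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical deterministic grader; paths and patterns are illustrative.
set -euo pipefail
cd /app

# 1. Project builds without errors.
npm run build

# 2. The expected Ignite UI selector appears in a template.
grep -rq "<igx-grid" src/ || { echo "FAIL: igx-grid not found"; exit 1; }

# 3. Required imports exist.
grep -rqE "IgxGridModule|IGX_GRID_DIRECTIVES" src/ \
  || { echo "FAIL: grid imports missing"; exit 1; }

# 4. No forbidden alternative grid library was used.
if grep -rq "ag-grid" src/; then
  echo "FAIL: forbidden grid library"; exit 1
fi

echo "PASS"
```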

LLM rubric grader (40% weight) — evaluates:

  • Correct intent routing
  • Idiomatic API usage
  • Absence of hallucinated APIs
  • Following the skill's guidance
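Assuming both graders emit scores normalized to [0, 1] (an assumption; the framework may combine them differently), the final score is the weighted sum:

```bash
# Sketch of the 60/40 weighting; variable names and scale are assumptions.
det_score=1.00   # deterministic grader: all checks passed
llm_score=0.75   # LLM rubric: most dimensions satisfied
final=$(echo "0.6 * $det_score + 0.4 * $llm_score" | bc -l)
echo "final=$final"   # prints .90
```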

Results

Baseline results are stored in evals/results/baseline.json and used for regression comparison on PRs. The CI workflow uploads per-run results as GitHub Actions artifacts.
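A sketch of the kind of regression comparison this enables, assuming both files are flat objects mapping task IDs to pass@5 rates and a hypothetical latest.json for the current run (the real schema and paths may differ):

```bash
# Hypothetical: list tasks whose pass@5 dropped below the stored baseline.
jq -n --slurpfile base evals/results/baseline.json \
      --slurpfile run evals/results/latest.json '
  $run[0] | to_entries[]
  | select(.value < ($base[0][.key] // 0))
  | "\(.key): \($base[0][.key]) -> \(.value)"'
```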