Productionize Agent Skills simulator journey and release gate #252

## Problem

Agent Skills are now complete enough to build and merge. The remaining production gap is validation quality: we need a repeatable simulator journey that proves the on-device LLM, the model download/setup path, skill routing, connector execution, and user-facing output all stay aligned with the intended UX before each release.

Recent merged context:

- PR #226 added iOS calendar read parity and connector coverage.
- PR #227 fixed reduced pubspec validation issues.
- PR #228 restored green Android CI under AGP 9 with compatibility flags and CI plugin patching.
- The repo now includes `scripts/run_agent_skills_journey.sh`, `app/patrol_test/agent_skills_journey_test.dart`, and `app/integration_test/agent_skills_journey_test.dart`.
- Reminder flows can now ask whether to add the reminder to Calendar for cross-device sync, then execute `create_calendar_event` on confirmation.

## Goal

Make Agent Skills production-ready by converting the current simulator journey into a reliable release gate with artifacts, clear pass/fail criteria, and documented remediation steps.

## Scope

### In scope

- Run the Agent Skills simulator journey on Android and iOS from a clean developer machine or CI runner.
- Cover the model setup path: launch app, select/download recommended on-device model where required, and reach a usable chat state.
- Validate representative prompts:
  - "Check my schedule for today"
  - "Remind me tomorrow at 9 AM to submit the report"
  - User confirms: "yes, add it to calendar"
- Verify visible action traces for `get_current_date_time`, `read_calendar_events`, `schedule_notification`, and `create_calendar_event` where applicable.
- Persist journey artifacts: logs, screenshots/video if supported, summary JSON, selected model/runtime, device/platform, app build variant, and final assistant output.
- Decide whether this journey should run on every PR, nightly, or only release branches, then wire it into GitHub Actions accordingly.
- Document local and CI execution in the repo.
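
As a starting point for the summary JSON artifact, here is a minimal sketch in POSIX shell. The field names, the `summary.json` file name, and the argument order are assumptions for illustration, not the schema the existing script uses. It escapes only backslashes and double quotes, so values with newlines or other control characters would need extra handling (or a tool such as `jq`).

```shell
# Sketch only: field names and file layout are assumptions, not the repo's schema.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# write_summary <out_dir> <status> <duration_s> <platform> <device> <prompt> <model> <log_path>
# Writes <out_dir>/summary.json. <duration_s> is emitted as a bare number,
# so the caller is expected to pass an integer.
write_summary() {
  mkdir -p "$1" || return 1
  printf '{"status":"%s","duration_seconds":%s,"platform":"%s","device":"%s","prompt":"%s","model":"%s","log_path":"%s"}\n' \
    "$(json_escape "$2")" "$3" "$(json_escape "$4")" "$(json_escape "$5")" \
    "$(json_escape "$6")" "$(json_escape "$7")" "$(json_escape "$8")" > "$1/summary.json"
}
```

A journey script could call `write_summary` once at exit, passing the artifact directory for that run together with the values it collected along the way.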

### Out of scope

- Building a full community skills marketplace.
- Replacing the current Agent Skills UI.
- Removing AGP 9 compatibility workarounds, unless their removal is required to make this journey stable.
- Adding new third-party connectors beyond Calendar/Notifications.

## Acceptance criteria

- [ ] `make agent-skills-journey-android` succeeds on a clean Android simulator and writes artifacts under `artifacts/agent-skills-journey/...`.
- [ ] `make agent-skills-journey-ios` succeeds on a clean iOS simulator or has a documented, tracked blocker if CI cannot support it yet.
- [ ] The journey verifies model setup/download state before sending prompts.
- [ ] The journey asserts the calendar-read prompt invokes the expected skill/tool path and produces a user-facing answer.
- [ ] The journey asserts reminder scheduling creates an app notification and asks whether to add the event to Calendar.
- [ ] The journey asserts positive confirmation executes `create_calendar_event` and displays success/failure clearly.
- [ ] Artifacts include a machine-readable summary with status, duration, platform, device, prompt, selected model/runtime, and log path.
- [ ] CI runs the journey at the agreed cadence and fails on regressions, or a documented manual QA runbook exists until CI support is stable.
- [ ] Documentation explains prerequisites, commands, expected outputs, common failures, and how to update baselines.
- [ ] All impacted checks are green before merge: PR Checks, Plugin Size Gate, CI analyze/lint/tests, and Android full/streaming/TV builds if touched.
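
One way a CI step could enforce the machine-readable summary criterion is sketched below. The key names (`duration_seconds`, `model`, `log_path`, and so on) and the `"passed"` status value are assumptions; they should be swapped for whatever schema the journey actually emits. The check uses plain `grep`, so it only confirms that keys are present and does not validate JSON structure.

```shell
# Sketch only: key names and the "passed" value are assumed, not taken from the repo.

# check_summary <summary.json>
# Returns non-zero if a required key is missing or the status is not "passed".
check_summary() {
  for key in status duration_seconds platform device prompt model log_path; do
    grep -q "\"$key\"[[:space:]]*:" "$1" || {
      echo "missing key: $key" >&2
      return 1
    }
  done
  grep -q '"status"[[:space:]]*:[[:space:]]*"passed"' "$1" || {
    echo "journey did not pass" >&2
    return 1
  }
}
```

Because the function returns non-zero on any problem, calling it as the last command of a CI step is enough to fail the job.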

## Suggested implementation plan

1. Audit the existing journey files and confirm which steps they actually assert on and which they only navigate through.
2. Add stable test IDs/semantics to the Agent Skills UI if the Patrol tests rely on fragile text matching.
3. Extend the Patrol journey to capture model/runtime state and screenshots at each major step.
4. Add explicit assertions for action traces and final assistant messages.
5. Add GitHub Actions wiring with a conservative cadence first: manual dispatch or nightly if the simulator run is too slow for every PR.
6. Promote to PR/release gate once flakiness is under control.
7. Update docs and PR template/testing notes so developers know when to run it locally.
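
To judge when flakiness is "under control" before promoting the journey to a gate, the pass rate can be measured by repeating the run. The helper below is a hypothetical sketch, not part of the repo. It runs any command a fixed number of times and prints how many runs succeeded.

```shell
# Sketch only: pass_count is a hypothetical helper, not an existing script.

# pass_count <runs> <command...>
# Runs the command <runs> times and prints the number of successful runs.
pass_count() {
  runs=$1
  shift
  passed=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    if "$@" >/dev/null 2>&1; then
      passed=$((passed + 1))
    fi
    i=$((i + 1))
  done
  echo "$passed"
}
```

For example, `pass_count 10 make agent-skills-journey-android` would report how many of ten consecutive runs passed. What pass rate justifies promotion is a team decision that this ticket leaves open.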

## Files to start from

- `scripts/run_agent_skills_journey.sh`
- `Makefile` targets around `agent-skills-journey`
- `app/patrol_test/agent_skills_journey_test.dart`
- `app/integration_test/agent_skills_journey_test.dart`
- `app/lib/features/agent_chat/domain/services/agent_skill_orchestrator.dart`
- `app/lib/features/agent_chat/presentation/screens/chat_screen.dart`
- `.github/workflows/ci.yml`

## Risks

- Simulator tests may be slow or flaky if model download is network-dependent.
- Calendar/notification permissions differ between Android and iOS simulators.
- The current AGP 9 workaround patches pub-cache plugin Gradle files in CI; plugin upgrades may be needed before this is stable in the long term.
- On-device model output can vary; assertions should focus on required actions and UX contract, not exact wording unless deterministic.
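
Asserting on required actions rather than on wording could look like the sketch below. It checks that tool names appear in a captured log in the expected order and ignores everything else. The log format is an assumption: the sketch expects one trace entry per line with the tool name as a substring, which may not match what the journey currently captures.

```shell
# Sketch only: assumes one action trace entry per log line.

# trace_has_in_order <log_file> <action>...
# Succeeds if every action name appears in the log in the given order.
# Unrelated lines between matches are ignored.
trace_has_in_order() {
  log=$1
  shift
  line=0
  for action in "$@"; do
    # First line after the previous match that mentions this action.
    next=$(awk -v start="$line" -v pat="$action" \
      'NR > start && index($0, pat) { print NR; exit }' "$log")
    [ -n "$next" ] || {
      echo "missing or out of order: $action" >&2
      return 1
    }
    line=$next
  done
}
```

For the reminder flow, a call such as `trace_has_in_order "$log" schedule_notification create_calendar_event` passes however the assistant phrases its replies, as long as both tools ran in that order.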

## Owner / estimate

- Main agent: `agent/ai-llm`
- Supporting areas: QA, CI/CD, mobile UI
- Estimate: 2-4 days for stable manual/nightly coverage; longer if PR-gating requires simulator infrastructure hardening.

## Definition of done

A developer can pick up this ticket, run one documented command, reproduce the Agent Skills user journey on a simulator, inspect the artifacts, and trust the result as a release-readiness signal for the on-device LLM skill workflow.

