Problem
Agent Skills are now implemented enough to build and merge, but the remaining production gap is validation quality: we need a repeatable simulator journey that proves the on-device LLM, model download/setup path, skill routing, connector execution, and user-facing output stay aligned with the intended UX before release.
Recent merged context:
The repo now includes scripts/run_agent_skills_journey.sh, app/patrol_test/agent_skills_journey_test.dart, and app/integration_test/agent_skills_journey_test.dart.
Reminder flows can now ask whether to add the reminder to Calendar for cross-device sync, then execute create_calendar_event on confirmation.
Goal
Make Agent Skills production-ready by converting the current simulator journey into a reliable release gate with artifacts, clear pass/fail criteria, and documented remediation steps.
Scope
In scope
Run the Agent Skills simulator journey on Android and iOS from a clean developer machine or CI runner.
Cover the model setup path: launch app, select/download recommended on-device model where required, and reach a usable chat state.
Validate representative prompts:
"Check my schedule for today"
"Remind me tomorrow at 9 AM to submit the report"
User confirms: "yes, add it to calendar"
Verify visible action traces for get_current_date_time, read_calendar_events, schedule_notification, and create_calendar_event where applicable.
Persist journey artifacts: logs, screenshots/video if supported, summary JSON, selected model/runtime, device/platform, app build variant, and final assistant output.
Decide whether this journey should run on every PR, nightly, or only release branches, then wire it into GitHub Actions accordingly.
Document local and CI execution in the repo.
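The action-trace check in the scope list could start as a small shell helper that scans a journey log for the required tool names. This is only a sketch: the log file, its plain-text format, and the function name assert_traces are assumptions for illustration, not what scripts/run_agent_skills_journey.sh does today.

```shell
#!/bin/sh
# Sketch only: the log location and format are assumptions, not the repo's layout.

# assert_traces LOGFILE TOOL...
# Returns 0 when every tool name occurs somewhere in the log, 1 otherwise.
# Each missing tool is reported on stderr so a failing run is easy to triage.
assert_traces() {
  log_file=$1
  shift
  missing=0
  for tool in "$@"; do
    # -F treats the tool name as a fixed string, -q suppresses output.
    if ! grep -Fq -- "$tool" "$log_file"; then
      echo "missing action trace: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A calendar-read step might then call `assert_traces "$LOG" get_current_date_time read_calendar_events`, where `$LOG` is a hypothetical variable holding the log path. Which tools belong to which prompt is for the audit step to confirm.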
Out of scope
Building a full community skills marketplace.
Replacing the current Agent Skills UI.
Removing AGP 9 compatibility workarounds, unless removal is required to make this journey stable.
Adding new third-party connectors beyond Calendar/Notifications.
Acceptance criteria
make agent-skills-journey-android succeeds on a clean Android simulator and writes artifacts under artifacts/agent-skills-journey/....
make agent-skills-journey-ios succeeds on a clean iOS simulator or has a documented, tracked blocker if CI cannot support it yet.
The journey verifies model setup/download state before sending prompts.
The journey asserts the calendar-read prompt invokes the expected skill/tool path and produces a user-facing answer.
The journey asserts reminder scheduling creates an app notification and asks whether to add the event to Calendar.
The journey asserts positive confirmation executes create_calendar_event and displays success/failure clearly.
Artifacts include a machine-readable summary with status, duration, platform, device, prompt, selected model/runtime, and log path.
CI runs the journey at the agreed cadence and fails on regressions, or a documented manual QA runbook exists until CI support is stable.
Documentation explains prerequisites, commands, expected outputs, common failures, and how to update baselines.
All impacted checks are green before merge: PR Checks, Plugin Size Gate, CI analyze/lint/tests, and Android full/streaming/TV builds if touched.
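The machine-readable summary from the criteria above could be written by a helper like the one below. It is a minimal sketch: the key names (status, duration_seconds, platform, device, prompt, model, log_path) and the function names are assumptions chosen to mirror the listed fields, and the real schema should be settled in the implementation.

```shell
#!/bin/sh
# Sketch only: key names and argument order are assumptions, not an agreed schema.

# json_escape STRING
# Escapes backslashes and double quotes for use inside a JSON string.
# Control characters such as newlines are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# write_summary OUT STATUS DURATION_SECONDS PLATFORM DEVICE PROMPT MODEL LOG_PATH
# Writes a one-line JSON summary to OUT, creating parent directories as needed.
# DURATION_SECONDS must be a plain number because it is emitted unquoted.
write_summary() {
  out=$1
  mkdir -p "$(dirname "$out")"
  printf '{"status":"%s","duration_seconds":%s,"platform":"%s","device":"%s","prompt":"%s","model":"%s","log_path":"%s"}\n' \
    "$(json_escape "$2")" "$3" "$(json_escape "$4")" "$(json_escape "$5")" \
    "$(json_escape "$6")" "$(json_escape "$7")" "$(json_escape "$8")" > "$out"
}
```

A run could end with `write_summary "$ARTIFACT_DIR/summary.json" pass 42 android "$DEVICE" "$PROMPT" "$MODEL" "$LOG"`, where every variable is a placeholder the journey script would need to supply. If jq is available on the runners, building the JSON with it would be more robust than hand-escaping.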
Suggested implementation plan
Audit the existing journey files and confirm what they currently assert versus only navigate.
Add stable test IDs/semantics to the Agent Skills UI if the Patrol tests rely on fragile text matching.
Extend the Patrol journey to capture model/runtime state and screenshots at each major step.
Add explicit assertions for action traces and final assistant messages.
Add GitHub Actions wiring with a conservative cadence first, preferably manual dispatch or nightly if simulator runtime is too slow for every PR.
Promote to PR/release gate once flakiness is under control.
Update docs and PR template/testing notes so developers know when to run it locally.
Risks
Simulator tests may be slow or flaky if model download is network-dependent.
Calendar/notification permissions differ between Android and iOS simulators.
Current AGP 9 workaround patches pub-cache plugin Gradle files in CI; plugin upgrades may be needed before this is stable long term.
On-device model output can vary; assertions should focus on required actions and UX contract, not exact wording unless deterministic.
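One possible mitigation for the network-dependent download risk, which this ticket does not prescribe, is a bounded retry around the download step only. The helper below is a generic sketch; the name retry and its argument order are illustrative. Retries should stay off the assertion steps so that real regressions still fail the run.

```shell
#!/bin/sh
# Sketch only: a generic bounded retry, not part of the existing journey script.

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 after the final failed attempt.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry 3 10 download_model` would try a hypothetical download_model command up to three times with a 10 second pause between attempts.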
Owner / estimate
Main agent: agent/ai-llm
Supporting areas: QA, CI/CD, mobile UI
Estimate: 2-4 days for stable manual/nightly coverage; longer if PR-gating requires simulator infrastructure hardening.
Definition of done
A developer can pick up this ticket, run one documented command, reproduce the Agent Skills user journey on simulator, inspect artifacts, and trust the result as a release-readiness signal for the on-device LLM skill workflow.
Files to start from
scripts/run_agent_skills_journey.sh
Makefile targets around agent-skills-journey
app/patrol_test/agent_skills_journey_test.dart
app/integration_test/agent_skills_journey_test.dart
app/lib/features/agent_chat/domain/services/agent_skill_orchestrator.dart
app/lib/features/agent_chat/presentation/screens/chat_screen.dart
.github/workflows/ci.yml