Define workflow for tests with ai and create initial tests for Dart SDK. by polina-c · Pull Request #1251 · google/A2UI

polina-c · 2026-04-21T20:34:40Z

Contributes to:

The eval verifies that these entities work well together:

Python agent, plus whatever libraries it uses
Dart SDK
AI model

This PR:

- Adds an eval workflow that:
  - reads the API key from a secret
  - runs eval/run.sh
- Creates the project eval/dart_and_flutter with tests that verify the app can:
  - ask a Gemini model for a joke directly
  - start the restaurant finder agent
  - ask the agent to query the model
- Adds configurations to start the processes from the VS Code UI
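For illustration, the key check that such a runner performs before starting the tests might look like the sketch below. This is not the PR's eval/run.sh: `require_api_key` and `run_ai_tests` are hypothetical names, the sketch assumes the secret reaches the script as the `GEMINI_API_KEY` environment variable (the name the script's own comments use), and it assumes the tests are started with `flutter test`. Starting the restaurant finder agent is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a key-checking test runner; not the PR's script.

# Returns non-zero, with a hint on stderr, when GEMINI_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set. Example: export GEMINI_API_KEY=your_api_key" >&2
    return 1
  fi
}

# Runs the Flutter tests in the given directory, but only if the key is present.
# The subshell keeps the caller's working directory unchanged.
run_ai_tests() {
  local test_dir="$1"
  require_api_key || return 1
  (cd "$test_dir" && flutter test)
}
```

Failing early with an explicit message makes a missing secret easy to tell apart from a genuine test failure.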


polina-c · 2026-04-23T15:47:52Z

Thank you. It seems we are almost on the same page here.

Some comments:

> this is an end-to-end test for Gen UI SDK + Restaurant finder, right?

Almost. It is for three things: Gen UI SDK + Restaurant finder agent + AI Model

The AI model is an important component here and is not stubbed. We want to see the Gen UI SDK work reliably with real responses produced by the Restaurant finder agent and the AI model.
If we want to test the SDK against specific responses, we will create tests with a stubbed finder agent.

> I think this is closer to an end-to-end test than an eval system

I observe that the engineering community does not define "eval" consistently. So far I have seen these definitions:

1. A test that tests a prompt
2. A test that tests anything that involves AI
3. A test that is allowed to fail X% of the time

It seems you use definition #1. I updated README.md in the 'eval' folder to contain this definition.
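Definition #3 can be made concrete with a small wrapper that repeats a command and passes only if the failure rate stays within a budget. This is a hypothetical sketch for illustration: `run_with_tolerance` is not part of the PR, and the PR's tests do not necessarily work this way.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of definition #3: a check passes overall when at most
# max_pct percent of its repeated runs fail.

# Usage: run_with_tolerance <runs> <max_pct> <command> [args...]
run_with_tolerance() {
  local runs="$1" max_pct="$2" failures=0 i=0
  shift 2
  while [ "$i" -lt "$runs" ]; do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "failures: $failures/$runs"
  # failures/runs <= max_pct/100, rearranged to stay in integer arithmetic.
  [ $((failures * 100)) -le $((max_pct * runs)) ]
}
```

For example, `run_with_tolerance 10 20 flutter test` would pass if at most 2 of the 10 runs fail. Comparing `failures * 100` with `max_pct * runs` avoids division, since shell arithmetic is integer-only.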

> How about putting it at samples/agent/adk/restaurant_finder/end_to_end_tests/dart?

Yes, it makes sense to have this test closer to the code it tests, which is the code of the finder's Dart client.
Initially I wanted to put it in the GenUI SDK repo, but now I see it will be easier to have it in the A2UI repo, so it can be tested together with the agent.

I am thinking of placing it in:

samples/client/flutter/restaurant_finder/app

So it will make sense to put the test in:

samples/client/flutter/restaurant_finder/e2e_test

This is the PR that removes the client from GenUI SDK: flutter/genui#885

How does all this sound to you?

Changes I made:

- Renamed .github/workflows/eval.yaml to .github/workflows/test_with_ai.yaml
- Moved eval/run.sh to scripts/test_with_ai.sh
- Updated .github/PULL_REQUEST_TEMPLATE.md to include instructions to run the tests manually on forked branches
- Moved the test to samples/client/flutter/restaurant_finder/e2e_test

Does this structure look right to you?

jacobsimionato

Thank you for iterating! This looks great to me. I think it will be nice for developers to have a Flutter example in this repo, alongside the other client samples.

jacobsimionato · 2026-04-24T00:06:44Z

```diff
+# To run script locally, you need to set API key as an environment variable.
+# Example: export GEMINI_API_KEY=your_api_key
+
+cd "$(dirname "$0")/../samples/client/flutter/restaurant_finder/e2e_test"
```


This script is specific to the Flutter e2e tests for the restaurant finder agent. How about moving this script into samples/client/flutter/restaurant_finder/e2e_test and referencing it from there?

BTW, part of my motivation here is that there is a plan to split the A2UI repository into multiple repositories - https://docs.google.com/document/d/1914FKcF5LOXrj8y7-xf-45bJAwaPtcix_nBhOCuMECc/edit?resourcekey=0-g9pTHlodm5jGpkbknk-TJw&tab=t.0#heading=h.sd9tbgmewn0l

When we do this, it will be easier if the different parts (e.g. samples, renderers etc) are clearly separated to begin with.
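If the script lived in samples/client/flutter/restaurant_finder/e2e_test as suggested, its `cd` would only need to resolve the script's own directory instead of a path relative to scripts/. A minimal sketch under that assumption; `script_dir` and `enter_script_dir` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical header for a copy of the script that sits inside
# samples/client/flutter/restaurant_finder/e2e_test/ itself.

# Prints the absolute path of the directory that contains the running script.
script_dir() {
  (cd "$(dirname "$0")" && pwd)
}

# Changes into that directory so relative paths resolve the same way
# regardless of where the script was invoked from.
enter_script_dir() {
  cd "$(script_dir)" || return 1
}
```

Because `enter_script_dir` runs `cd` in a shell function rather than a subshell, it changes the working directory of the calling script, while `script_dir` only reports the path.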

jacobsimionato · 2026-04-24T00:10:30Z

```diff
 - [ ] I have added updates to the [CHANGELOG].
 - [ ] I updated/added relevant documentation.
 - [ ] My code changes (if any) have tests.
+- [ ] I have verified that [scripts/test_with_ai.sh](../scripts/test_with_ai.sh) passes.
```


This is covered by CI, so I don't think we need it specifically here - we'll see if this is necessary / has worked based on the results of the CI.

If we did add an item to this checklist, I think it should be a way to run all the CI workflows locally, rather than just this one. In most cases, other tests will be more relevant, because this only covers the Python SDK, restaurant finder demo, and Flutter SDK. Maybe we can experiment with something like https://github.com/nektos/act ?

Nope, CI runs only for branches in this repository, not for forked branches. Forked branches do not have access to the key.

Updated the requirement.

People will need to configure the key to run this test, so I do not see how act would help.
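Because GitHub Actions does not pass repository secrets to workflows triggered from forks, a script running on a fork's pull request would see an empty key. One hypothetical way to tell that case apart from a local run with a missing key is sketched below. The `key_status` helper and its skip/fail policy are assumptions for illustration, not what the PR does (the PR relies on a checklist item in the pull request template instead); `GITHUB_ACTIONS` is the variable GitHub Actions sets to `true` on its runners.

```shell
#!/usr/bin/env bash
# Hypothetical policy for a missing key; not what the PR implements.
# Prints "run" when GEMINI_API_KEY is set, "skip" on a GitHub Actions runner
# without the key (e.g. a workflow triggered from a fork), and "fail" otherwise.
key_status() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "run"
  elif [ "${GITHUB_ACTIONS:-}" = "true" ]; then
    echo "skip"
  else
    echo "fail"
  fi
}
```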

jacobsimionato · 2026-04-24T00:23:16Z

```diff
+# Use of this source code is governed by a BSD-style license that can be
+# found in the LICENSE file.
+
+name: test_with_ai_workflow
```


How about calling this something like "Restaurant Finder E2E Test" and naming the file restaurant_finder_e2e_test, to be consistent with the other workflows (e.g. https://github.com/google/A2UI/blob/main/.github/workflows/lit_build_and_test.yml)? I think it should be "x test" rather than "test x", and the name here should be in regular English rather than snake case.

I intended this workflow, as well as scripts/test_with_ai.sh, to run all tests that require the key.

But yes, it may become the workflow for all e2e tests. I want to keep them all together, so that they are easy to run locally with one command.

Made the renamings you requested.
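Running every key-dependent e2e suite with one command, as described above, could be sketched as a loop over test directories that keeps going after a failure and prints a summary. The `run_all` helper is hypothetical; the only e2e directory named in this PR is samples/client/flutter/restaurant_finder/e2e_test.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run one test command in several e2e directories.

# Usage: run_all "<command>" <dir>...
# Runs <command> inside each directory (in a subshell, so the caller's
# working directory is unchanged), keeps going after failures, prints a
# summary, and returns non-zero if any directory failed.
run_all() {
  local cmd="$1" failed=0 dir
  shift
  for dir in "$@"; do
    # $cmd is intentionally unquoted so "flutter test" splits into words.
    if ! (cd "$dir" && $cmd); then
      echo "FAILED: $dir" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed suite(s) failed"
  [ "$failed" -eq 0 ]
}
```

Usage would look like `run_all "flutter test" samples/client/flutter/restaurant_finder/e2e_test`, with further directories appended as more suites need the key.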

polina-c added 10 commits April 21, 2026 09:57

- 739b05f: -
- 1e489dc: -
- bbe432b: -
- 04272e9: -
- 3728764: -
- 83cd8d2: Update restaurant_finder_test.dart
- 769bf99: Update restaurant_finder_test.dart
- a273ddc: -
- 7251dc0: -
- 76058e6: Update restaurant_finder_test.dart

github-project-automation Bot added this to A2UI Apr 21, 2026

github-project-automation Bot moved this to Todo in A2UI Apr 21, 2026


- 6503f11: -

polina-c changed the title ~~Eval issue~~ Eval for rest finder. Apr 21, 2026

polina-c added 2 commits April 21, 2026 14:16

- 148ce0b: -
- 45b24cb: Update eval.yaml

polina-c had a problem deploying to eval April 21, 2026 21:18 — with GitHub Actions Failure

polina-c changed the title ~~Eval for rest finder.~~ Eval for restaurant finder. Apr 21, 2026

- bfb4dab: Update CONTRIBUTING.md

polina-c had a problem deploying to eval April 21, 2026 21:27 — with GitHub Actions Failure

- eecf762: -

polina-c had a problem deploying to eval April 21, 2026 21:34 — with GitHub Actions Failure

- a17866c: Update eval.yaml

polina-c had a problem deploying to eval April 21, 2026 21:39 — with GitHub Actions Failure

- 39ce962: -

polina-c had a problem deploying to eval April 21, 2026 21:44 — with GitHub Actions Failure

- d320e28: -

polina-c had a problem deploying to eval April 21, 2026 21:52 — with GitHub Actions Failure

- 914415e: Update run.sh

polina-c added 3 commits April 23, 2026 08:20

- 5a97b4d: -
- 41c20f0: Merge branch 'eval-issue' of https://github.com/google/A2UI into eval-issue
- ffd18ff: -

polina-c had a problem deploying to eval April 23, 2026 15:27 — with GitHub Actions Failure

- 0fd798d: Update pubspec.yaml

polina-c had a problem deploying to eval April 23, 2026 15:50 — with GitHub Actions Failure

- 29b05c4: Update pubspec.yaml

polina-c had a problem deploying to eval April 23, 2026 16:00 — with GitHub Actions Failure

- cc561a6: -

polina-c had a problem deploying to eval April 23, 2026 16:03 — with GitHub Actions Failure

- cf64025: Update test_with_ai.yaml

polina-c changed the title ~~Enable Dart eval and create initial tests for Dart SDK.~~ Define workflow for tests with ai and create initial tests for Dart SDK. Apr 23, 2026

polina-c added 7 commits April 23, 2026 09:17

- 01f5754: -
- b2f0b17: Update restaurant_finder.dart
- 0edc036: Update PULL_REQUEST_TEMPLATE.md
- f23bb9e: Update test_with_ai.sh
- 5469ce8: Update test_with_ai.sh
- 54fd154: -
- c353d45: Update restaurant_finder_test.dart

jacobsimionato approved these changes Apr 24, 2026


polina-c added 6 commits April 23, 2026 17:52

- bb1f093: Update PULL_REQUEST_TEMPLATE.md
- ac00146: -
- 9249afa: Update PULL_REQUEST_TEMPLATE.md
- 31e6f26: Update e2e_test.yaml
- 300bc8c: Update e2e_test.yaml
- 3d1540d: Merge branch 'main' of github.com:google/A2UI into eval-issue

polina-c merged commit e37809a into main Apr 24, 2026
15 checks passed

polina-c deleted the eval-issue branch April 24, 2026 01:09

github-project-automation Bot moved this from Todo to Done in A2UI Apr 24, 2026

polina-c merged 67 commits into main from eval-issue


2 participants
