Define workflow for tests with ai and create initial tests for Dart SDK.#1251
Define workflow for tests with ai and create initial tests for Dart SDK.#1251
Conversation
|
Thank you. It seems we are almost on the same page here. Some comments:
Almost. It is for three things: Gen UI SDK + Restaurant finder agent + AI Model AI model is important component here, not stubbed. We want to see Gen UI SDK is working reliably with real responses, produced by Restaurant finder + AI Model, where AI Model is not stubbed.
I am observing that engineering community is not consistent defining what "eval" is. So far I saw these definitions:
It seems you use definition #1. I updated README.md in the folder 'eval' to contain this definition.
Yes, it make sense to have this test closer to the code it tests. The code it tests is the code of the finder's Dart client. I am thinking to place it to: samples/client/flutter/restaurant_finder/app So, it will make sense to put the test to: samples/client/flutter/restaurant_finder/e2e_test This is PR that removes the client from GenUI SDK: flutter/genui#885 How does all this sound for you? Changes I did:
Does this structure look right for you? |
jacobsimionato
left a comment
There was a problem hiding this comment.
Thank you for iterating! This looks great to me. I think it will be nice for developers to have a Flutter example in this repo, alongside the other client samples.
| # To run script locally, you need to set API key as an environment variable. | ||
| # Example: export GEMINI_API_KEY=your_api_key | ||
|
|
||
| cd "$(dirname "$0")/../samples/client/flutter/restaurant_finder/e2e_test" |
There was a problem hiding this comment.
This script is specific to the Flutter e2e tests for the restaurant finder agent. How about moving this script into samples/client/flutter/restaurant_finder/e2e_test and reference it there?
BTW part of my motivation here is that there is a plan to break out the A2UI repository into many repositories - https://docs.google.com/document/d/1914FKcF5LOXrj8y7-xf-45bJAwaPtcix_nBhOCuMECc/edit?resourcekey=0-g9pTHlodm5jGpkbknk-TJw&tab=t.0#heading=h.sd9tbgmewn0l
When we do this, it will be easier if the different parts (e.g. samples, renderers etc) are clearly separated to begin with.
| - [ ] I have added updates to the [CHANGELOG]. | ||
| - [ ] I updated/added relevant documentation. | ||
| - [ ] My code changes (if any) have tests. | ||
| - [ ] I have verified that [scripts/test_with_ai.sh](../scripts/test_with_ai.sh) passes. |
There was a problem hiding this comment.
This is covered by CI, so I don't think we need it specifically here - we'll see if this is necessary / has worked based on the results of the CI.
If we did add an item to this checklist, I think it should be a way to run all the CI workflows locally, rather than just this one. In most cases, other tests will be more relevant, because this only covers the Python SDK, restaurant finder demo, and Flutter SDK. Maybe we can experiment with something like https://github.com/nektos/act ?
There was a problem hiding this comment.
Nope, CI will run only for local branches, not for forked. Forked branches do not have access to key.
Updated requirement.
People will need to configure key to run this test. I do not see how act will help.
| # Use of this source code is governed by a BSD-style license that can be | ||
| # found in the LICENSE file. | ||
|
|
||
| name: test_with_ai_workflow |
There was a problem hiding this comment.
How about calling this something like Restaurant Finder E2E Test and the file restaurant_finder_e2e_test, to be consistent with the other workflows, e.g. https://github.com/google/A2UI/blob/main/.github/workflows/lit_build_and_test.yml etc. E.g. I think that it should be "x test" rather than "test x", and the name here should be in regular English rather than snake case etc.
There was a problem hiding this comment.
I was supposing this workflow, as well as scripts/test_with_ai.sh to run all tests that require key.
But, yes, it may become test for all e2e. I want to put all them together, to make it easier to run them with one command locally.
Made renamings you requested.
Contributes to:
The eval verifies that these entities work well together:
This PR: