feat(runner): add support for running and repairing tests by atscott · Pull Request #62 · angular/web-codegen-scorer

atscott · 2025-09-29T20:00:56Z

This commit introduces the ability to run tests against the generated code as part of the evaluation process.

A new optional `testCommand` can be set in the environment configuration. If provided, this command is executed after a successful build.
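As a rough illustration of the option described above, an environment config with a test command might look like the sketch below. Only `testCommand` and its example value (the one shown later in this thread) come from the PR; the interface name, the other fields, and the `shouldRunTests` helper are assumptions made for illustration and are not the tool's actual config schema.

```typescript
// Hypothetical config shape. Only `testCommand` is taken from this PR;
// `displayName` and `buildCommand` are illustrative assumptions.
interface EnvironmentConfigSketch {
  displayName: string;
  buildCommand: string;
  /** Optional. When omitted, no tests are run. */
  testCommand?: string;
}

const environment: EnvironmentConfigSketch = {
  displayName: 'Example Angular environment',
  buildCommand: 'ng build',
  testCommand: 'ng test --browsers ChromeHeadless --watch=false',
};

// Hypothetical helper: tests run only when a command is configured
// and the preceding build succeeded.
function shouldRunTests(config: EnvironmentConfigSketch, buildSucceeded: boolean): boolean {
  return buildSucceeded && config.testCommand !== undefined;
}
```

With this shape, leaving out `testCommand` keeps existing environments unchanged, since the test step is simply skipped.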

If the tests fail, the tool will attempt to repair the code using the LLM, similar to how build failures are handled. The number of repair attempts is configurable.

The report has been updated to display the test results for each run, including whether the tests passed, failed, or passed after repair. The summary view also includes aggregated statistics about the test results.
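The run-then-repair flow and the three reported outcomes can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from this PR: all type and function names are hypothetical, the test runner and the LLM repair step are injected as plain callbacks, and everything is synchronous for brevity where a real runner would be asynchronous.

```typescript
// Hypothetical names throughout; none are taken from the actual change.
type TestStatus = 'passed' | 'failed' | 'passed-after-repair';

interface TestRun {
  passed: boolean;
  output: string;
}

interface TestResult {
  status: TestStatus;
  repairAttempts: number;
  output: string;
}

function runTestsWithRepair(
  runTests: () => TestRun,
  repair: (failureOutput: string) => void,
  maxRepairAttempts: number,
): TestResult {
  let run = runTests();
  let attempts = 0;

  // Ask for a repair and rerun the tests until they pass or the budget runs out.
  while (!run.passed && attempts < maxRepairAttempts) {
    repair(run.output);
    attempts++;
    run = runTests();
  }

  let status: TestStatus;
  if (!run.passed) {
    status = 'failed';
  } else if (attempts === 0) {
    status = 'passed';
  } else {
    status = 'passed-after-repair';
  }
  return {status, repairAttempts: attempts, output: run.output};
}
```

Setting `maxRepairAttempts` to 0 in this sketch turns the repair step off entirely, so the result is only ever `passed` or `failed`.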

atscott · 2025-09-30T20:57:16Z

Screenshots of an environment config that defines a `testCommand`: `testCommand: 'ng test --browsers ChromeHeadless --watch=false'`

devversion

Overall this looks great, but I have a couple of comments/discussion points.

devversion · 2025-10-01T10:04:33Z

+ * Number of times we'll try to ask LLM to repair a test failure,
+ * providing the test output and the code that causes the problem.
+ */
+export const DEFAULT_MAX_TEST_REPAIR_ATTEMPTS = 1;


This is interesting. Do we actually want to repair test failures? Or would it be better to repair if the test code can't be built?

Personally, I do think it's useful. It's pretty similar to build, where there is something verifiably wrong. I would argue that allowing a repair on a test failure is just as relevant, if not more, than an Axe failure (which is also a test and we do allow repairs for Axe failures).

It might also be relatively difficult to discern build vs test failure since I think both would return non-zero error codes.
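One way to picture the exit-code problem: if the build and the test command run as separate steps, a non-zero exit code can at least be attributed to the step that produced it. The sketch below assumes a hypothetical `exec` callback that runs a command and returns its exit code; the names are illustrative only. It does not solve the case raised here, where the test command itself compiles the test code, because a compile error inside that command still surfaces as a test-phase failure.

```typescript
// Hypothetical types; `exec` stands in for whatever actually runs a command.
type Phase = 'build' | 'test';

type Outcome =
  | {ok: true}
  | {ok: false; failedPhase: Phase; exitCode: number};

function runPhases(
  exec: (command: string) => number,
  buildCommand: string,
  testCommand?: string,
): Outcome {
  const buildExit = exec(buildCommand);
  if (buildExit !== 0) {
    return {ok: false, failedPhase: 'build', exitCode: buildExit};
  }

  // No test command configured: a successful build is the final result.
  if (testCommand === undefined) {
    return {ok: true};
  }

  const testExit = exec(testCommand);
  if (testExit !== 0) {
    return {ok: false, failedPhase: 'test', exitCode: testExit};
  }
  return {ok: true};
}
```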

We recently stopped repairs by default for Axe. Re being useful. Isn't there a risk it would rewrite test assertions to just pass? Asking a bit of questions to make sure we think about it/align.

Overall, agreed. Sounds good to me. Especially if the tests themselves aren't LLM-generated, presumably (though the LLM could be prompted to generate them, I think).

cc. @crisbeto do you have any thoughts here?

I don't mind having it, but IMO they should be opt-in.

Isn't there a risk it would rewrite test assertions to just pass? Asking a bit of questions to make sure we think about it/align.

Yes, indeed, that is something I am somewhat concerned about as well. Sometimes, though, it would be appropriate to edit the tests themselves, for example when the original prompt was to "add tests for X component" or similar (which we don't have coverage for, but I think we should look into it at some point).

I don't mind having it, but IMO they should be opt-in.

SGTM. I have bundled this into the same rerun as Axe testing. Since both fall into a test bucket, I figured it should be okay to have "test reruns" cover both a11y and the custom `testCommand`. Since you can omit either of these individually (Axe can be skipped with `--skip-axe-testing`), I think this should be fine. WDYT?
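To make the shared "test reruns" idea concrete, here is a small sketch of how one opt-in budget could cover both checks. Only the `--skip-axe-testing` flag and the `testCommand` option come from this thread; the option names below, and the assumption that "opt-in" means a default of zero repair attempts, are hypothetical.

```typescript
// Hypothetical option names; not the tool's actual CLI or config surface.
interface TestRepairOptions {
  /** Assumed opt-in: when omitted, no test repairs are attempted. */
  maxTestRepairAttempts?: number;
  /** Mirrors the `--skip-axe-testing` flag mentioned above. */
  skipAxeTesting: boolean;
  testCommand?: string;
}

type RepairableCheck = 'axe' | 'testCommand';

// Returns the checks whose failures may trigger a repair under the shared budget.
function repairableChecks(options: TestRepairOptions): RepairableCheck[] {
  const attempts = options.maxTestRepairAttempts ?? 0;
  if (attempts <= 0) {
    return [];
  }

  const checks: RepairableCheck[] = [];
  if (!options.skipAxeTesting) {
    checks.push('axe');
  }
  if (options.testCommand !== undefined) {
    checks.push('testCommand');
  }
  return checks;
}
```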

devversion

LGTM, some final comments, but then we can merge this IMO

devversion · 2025-10-06T11:53:18Z


-                @if (hasBuildFailureDuringA11yRepair(result)) {
+                @if (hasBuildFailureDuringTestRepair(result)) {
                  <span class="status-badge error">Build failed after a11y repair</span>


Does this need to be updated? (conceptual suggestion)

Suggested change

- <span class="status-badge error">Build failed after a11y repair</span>
+ <span class="status-badge error">Build failed after a11y/test repair</span>

devversion · 2025-10-06T11:55:00Z

      enableAutoCsp,
      userJourneyAgentTaskInput,
    );
+    const testResult = await runTest(


Question: Should tests run when serving failed?

atscott · 2025-10-10T23:46:19Z

I think I might abandon this change... I'm having too much trouble keeping up with the churn underneath and rebasing.

crisbeto · 2025-10-11T01:26:54Z

Oh I was under the impression we had merged this already 😅 @devversion should we try to get this in before we move more things around?

devversion · 2025-10-11T05:56:13Z

Oops, sorry! I meant to rebase this for you (after some more important g3 refactorings), but didn't get to it.

We'll handle it from here. Thanks for the change Andrew!

devversion · 2025-10-11T08:18:09Z

@crisbeto I rebased. Can you make sure to review again? (to make sure I didn't break things)

atscott force-pushed the main branch 10 times, most recently from af95494 to 877d195 September 30, 2025 20:20

atscott force-pushed the main branch from 877d195 to a19a09c September 30, 2025 20:58

atscott marked this pull request as ready for review September 30, 2025 20:58

atscott requested review from AndrewKushnir, crisbeto and devversion as code owners September 30, 2025 20:58

devversion reviewed Oct 1, 2025

crisbeto reviewed Oct 1, 2025

Comment thread runner/orchestration/gateways/local_gateway.ts Outdated

Comment thread runner/orchestration/test-repair.ts

Comment thread runner/workers/test/worker.ts Outdated

atscott force-pushed the main branch 9 times, most recently from 5419d02 to 4332b39 October 1, 2025 20:38

atscott changed the title ~~feat(runner): add support for running and repairing tests~~ feat(runner): add support for running tests Oct 1, 2025

atscott force-pushed the main branch from 4332b39 to af64562 October 2, 2025 00:08

atscott marked this pull request as draft October 2, 2025 00:09

devversion reviewed Oct 2, 2025

atscott changed the title ~~feat(runner): add support for running tests~~ feat(runner): add support for running and repairing tests Oct 2, 2025

atscott force-pushed the main branch 3 times, most recently from ae8a24d to 3bcaca6 October 2, 2025 23:11

atscott marked this pull request as ready for review October 2, 2025 23:11

atscott force-pushed the main branch from 3bcaca6 to 616f19c October 2, 2025 23:12

crisbeto approved these changes Oct 3, 2025

Comment thread runner/ratings/built-in-ratings/successful-tests-rating.ts Outdated

atscott force-pushed the main branch from 616f19c to d1c3eaa October 3, 2025 16:04

devversion approved these changes Oct 6, 2025

devversion assigned atscott Oct 6, 2025

devversion force-pushed the main branch from d1c3eaa to b9a6a17 October 11, 2025 08:17

devversion unassigned atscott Oct 11, 2025

devversion requested a review from crisbeto October 11, 2025 08:18

crisbeto approved these changes Oct 11, 2025

devversion merged commit 2fbf78b into angular:main Oct 11, 2025
3 checks passed
