RFC 239: Policy on LLM assistance in contributions by jugglinmike · Pull Request #239 · web-platform-tests/rfcs

jugglinmike · 2026-05-20T19:54:59Z

This initial draft takes a maximalist approach (and a permissive stance) to promote a robust and grounded discussion.

Rendered

gsnedders · 2026-05-21T21:00:09Z

Fixes #202

gsnedders

Thanks for trying to tackle this!

I think we should also write something about use of LLMs for review, if nothing else.

gsnedders · 2026-05-21T23:06:43Z

+> contributions; the external projects are trusted to determine their own
+> mechanisms for quality assurance.
+
+## Risks


I think it's worthwhile including at least a few more technical risks:

Contributions of tests generated by an LLM closely looking at a specific implementation's code, matching that implementation, rather than the spec. (This is, of course, already an issue — but could inevitably become more of a problem if we get more, larger contributions.)

Contributions not matching the spec at all. I've seen this mostly with trying to generate tests to assert ordering of things which end of using HTML's parallelism and HTML's event loops; that case is especially annoying because it can lead to flaky tests.

To me, those look like specific failure modes of LLMs. I think they'd be more helpful to elucidate what we mean by "low-value contributions" rather than as additional risks that are distinct from low-value contributions. I've just pushed a commit incorporating them in that way.

While I think the (2) might be reasonable to class as "low value", I think (1) is really a different failure mode, and one that becomes a higher-risk with LLMs (by virtue of sheer volume).

martinthomson · 2026-05-27T21:35:00Z

In terms of a survey, this is a very useful resource. It's fairly comprehensive and it provides a useful rubric for classification across quality, copyright, and ethics.

jgraham

My view is that the overall policy should be something like:

The use of LLMs as tools to help author commits to this repository is allowed
All contributions must be attributed to a person. That person is expected to fully understand and take responsibility for their contribution, irrespective of any tooling used.
If LLMs are used as a significant input to a commit, authors are encouraged to include details about how they were used as part of the commit message, in order to help review and future understanding of the code (similar to the way that a commit created using a script might include details of the script).
People who habitually submit low quality PRs, PRs they don't understand, or otherwise make contributions that are deemed to waste the time of other project contributors or downstream users of the project may be banned.

jgraham · 2026-05-28T12:48:08Z

https://github.com/BurntSushi/ripgrep/blob/master/AI_POLICY.md is also interesting, in particular the "no AI in issues or issue comments" part, with a specific carve out for use cases like translation.

DanielRyanSmith · 2026-06-01T21:30:18Z

I think I'm mostly in agreement with the discussions here so far. I do have one scenario where I think the policy could be expanded on.

All contributions must be attributed to a person. That person is expected to fully understand and take responsibility for their contribution, irrespective of any tooling used.

I think it would be reasonable to allow a test to be proposed or generated, and although the original author does not have a strong fundamental understanding of the change, it would be feasible to have that change reviewed thoroughly by a domain expert before submission.

For example, using the experimental tool WPT-Gen can uncover potential gaps in test coverage reading the spec, and then generate tests to cover those gaps. Hypothetically, we could occasionally run this automatically for certain features. If we have domain expert reviewers who are willing to thoroughly review generated tests, it seems like it would be useful to have an option in the LLM policy to allow for this. Naturally, this does put a lot of burden on the already thankless job of being a reviewer, but it still seems like a viable contribution avenue.

I imagine this will not be terribly common, but it is something that doesn't sound unrealistic.

tabatkins · 2026-06-01T22:15:46Z

As a reviewer, I am not interested in reviewing tests that the author does not understand. There are a lot more test authors than test reviewers. Someone using wpt-gen to create tests should then review those tests and make sure they're doing reasonable things before tossing them at a reviewer. (Generally speaking, I'm not happy with things that further unbalance the effort expectations in the submitter/reviewer dynamic. I'm happy to do thorough review to teach a person how to handle an area; I am not happy to do the same to approve/reject a large batch of unreviewed tests when the effort will not meaningfully improve future interactions.))

jgraham · 2026-06-02T07:47:36Z

To expand a bit on my proposal for issue/PR comments:

The project may choose to use tooling that produces LLM-generated comments. Such comments must always be explicitly associated with a bot account. The core team gets final discretion about which bots / tooling are permitted.
Human authored comments should not include LLM-generated content in the main text; any such content must be clearly labelled and placed inside a <details> element.

DanielRyanSmith · 2026-06-02T15:26:21Z

As a reviewer, I am not interested in reviewing tests that the author does not understand. There are a lot more test authors than test reviewers. Someone using wpt-gen to create tests should then review those tests and make sure they're doing reasonable things before tossing them at a reviewer. (Generally speaking, I'm not happy with things that further unbalance the effort expectations in the submitter/reviewer dynamic. I'm happy to do thorough review to teach a person how to handle an area; I am not happy to do the same to approve/reject a large batch of unreviewed tests when the effort will not meaningfully improve future interactions.))

I don't disagree with any of this, and I should clarify that I'm not advocating for reviewers now being responsible for reviewing AI-generated PRs that authors do not understand - that would absolutely not be beneficial to the project.

I do think that there's a scenario where someone is working to identify and fill gaps in test coverage using LLMs for specific features, and if that person has already interacted with a specific reviewer who is in agreement and willing to review these tests, then that should not be disbarred from contribution if these tests are being reviewed by a competent reviewer.

This is mostly a corner case, but it is something that I wanted to bring up as well.

jugglinmike · 2026-06-02T20:08:51Z

Thanks, folks! During today's meeting, I offered to incorporate all the feedback that hasn't received explicit pushback on this discussion thread. While carrying that out, I've found one small conflict between the suggestions above.

@gsnedders gestured toward a statement in the Chromium project's policy:

That said, Chromium's policy here is currently:

To aid reviewers, authors should flag areas that they are not confident about that had AI assistance.

This is maybe a weaker form, and hopefully something more in line with existing contributions.

However, @jgraham later made a suggestion which undermines that text:

All contributions must be attributed to a person. That person is expected to fully understand and take responsibility for their contribution, irrespective of any tooling used.

If LLMs are used as a significant input to a commit, authors are encouraged to include details about how they were used as part of the commit message, in order to help review and future understanding of the code (similar to the way that a commit created using a script might include details of the script).

If a person must "fully understand and take responsibility for their contribution," then there doesn't seem to be a need to guide folks who "are not confident" about their work. I've incorporated the part from @jgraham's exclusively because @gsnedders has not pushed back on it.

gsnedders · 2026-06-03T01:08:23Z

I do think that there's a scenario where someone is working to identify and fill gaps in test coverage using LLMs for specific features, and if that person has already interacted with a specific reviewer who is in agreement and willing to review these tests, then that should not be disbarred from contribution if these tests are being reviewed by a competent reviewer.

I think there's two meaningful cases here:

The specific reviewer doesn't have access to the same models as the other contributor, and
The specific reviewer is delegating the task of actually generating the tests to the other contributor.

In either case, I think I'd rather push the other contributor to provide the tests to the specific reviewer in some other way — e.g., by pushing the branch to their fork without opening a PR so the specific reviewer can take over and do any initial fixup work before opening a PR.

I, like others here, don't want to deal with PRs that the author hasn't themselves reviewed.

If a person must "fully understand and take responsibility for their contribution," then there doesn't seem to be a need to guide folks who "are not confident" about their work. I've incorporated the part from @jgraham's exclusively because @gsnedders has not pushed back on it.

While I recognise @jgraham was clearly deriving text from the Firefox policy, I do wonder if "fully understand" is a somewhat unreasonable bar! I think most of us working on the infrastructure have touched things without fully understanding the code?

That said, I don't have a particularly strong opinion here!

annevk · 2026-06-04T07:33:27Z

+> Human-authored code discourse (e.g. issue descriptions, pull request
+> descriptions, and responses to discussion threads) should not include
+> LLM-generated content in the main text; any such content must be clearly
+> labelled and placed inside a `<details>` element.


This feels too onerous. What if people use it for translation? What if they copy-and-pasted a tiny snippet?

The ripgrep policy linked above has specific text about translation, which after noting its usefulness says the same policy applies:

If using AI for translation, we recommend writing in your native language and including the AI translation in a quote block.

That makes a lot of sense to me. If you only auto-translated the content then providing the source language as canonical allows others to redo the translation potentially using a better tool.

The tiny snippet thing feels very edge-casy, but we can try to reword. Are there specific scenarios you have in mind? Personally I'm not sure why someone would take just a few words directly and intermix it with otherwise handwritten text. If it's not substantive (e.g. you're just copying phrasing) then I'd prefer that you didn't do that, if it is substantive (e.g. you're relying on an LLM for a point of fact) I'd prefer that you are explicit about where you got it from (or rewrite it in your own words, which at least suggests that you put a modest amount of thought into it).

I do think there's a case for allowing <blockquote> for including shorter sections of LLM output (up to one paragraph).

E.g., the LLM whipped up a comparison table to illustrate some design trade-offs. It just seems excessive to have a "must" on this.

That feels like a case where it's very useful to know that it was LLM output that was copied and pasted.

I personally don’t really mind where it comes from as long as a human pasted it and knows it’s correct. If they are unsure, then yes.

Being overtly strict doesn’t seem like the right trade-off. Especially since I don’t think we even have that many bad contributions? They first have to figure out how to get reviewed. 😅

annevk · 2026-06-04T07:34:48Z

+>
+> #### Understanding
+>
+> Every pull request must be initiated by one human. That person must author


Should this be at least one? Don't we sometimes get team contributions?

All PRs are created by one user account which almost always means one person. Do you have an example of a PR that's attributed to a team rather than an individual?

That part makes sense, but I don't think the human uploading always knows about every line, if they are open sourcing a set of tests for instance that were developed internally over some period of time. E.g., I think some Opera contributions in the past were of this nature.

That seems like a case where we're trusting a third party repository as a source of commits (as we do for Gecko, Chromium, WebKit and Servo repositories today). It is true the policy there isn't quite made explicit anywhere, but I think we should deal with that separately compared to the policy for direct contributions to this repository.

Do you have an example of a PR that's attributed to a team rather than an individual?

Maybe irrelevant detail, but you can create PRs whose head points at an organization's fork, though it is still a user account actually opening the PR.

annevk · 2026-06-04T07:35:27Z

+> #### Discussion
+>
+> Discussion


Is this meant to be part of the policy?

RFC 239: Policy on LLM assistance in contributions

ba9ec3e

gsnedders requested changes May 21, 2026

View reviewed changes

jgraham requested changes May 28, 2026

View reviewed changes

Comment thread rfcs/llm_policy.md Outdated

Incorporate review feedback

3f4f8e0

jugglinmike requested review from gsnedders and jgraham June 2, 2026 20:10

jyasskin mentioned this pull request Jun 3, 2026

[Minutes] SG Minutes 2026-06-02 whatwg/sg#274

Closed

annevk reviewed Jun 4, 2026

View reviewed changes

gsnedders mentioned this pull request Jun 4, 2026

Define "Trusted External Repositories" #240

Open

Conversation

jugglinmike commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gsnedders commented May 21, 2026

Uh oh!

gsnedders left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

martinthomson commented May 27, 2026

Uh oh!

jgraham left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jgraham commented May 28, 2026

Uh oh!

DanielRyanSmith commented Jun 1, 2026

Uh oh!

tabatkins commented Jun 1, 2026

Uh oh!

jgraham commented Jun 2, 2026

Uh oh!

DanielRyanSmith commented Jun 2, 2026

Uh oh!

jugglinmike commented Jun 2, 2026

Uh oh!

gsnedders commented Jun 3, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

jugglinmike commented May 20, 2026 •

edited

Loading