Skip to content

RFC 239: Policy on LLM assistance in contributions#239

Open
jugglinmike wants to merge 2 commits into
web-platform-tests:mainfrom
bocoup:llm-policy
Open

RFC 239: Policy on LLM assistance in contributions#239
jugglinmike wants to merge 2 commits into
web-platform-tests:mainfrom
bocoup:llm-policy

Conversation

@jugglinmike
Copy link
Copy Markdown
Contributor

@jugglinmike jugglinmike commented May 20, 2026

This initial draft takes a maximalist approach (and a permissive stance) to promote a robust and grounded discussion.

Rendered

@gsnedders
Copy link
Copy Markdown
Member

Fixes #202

Copy link
Copy Markdown
Member

@gsnedders gsnedders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying to tackle this!

I think we should also write something about use of LLMs for review, if nothing else.

Comment thread rfcs/llm_policy.md
Comment thread rfcs/llm_policy.md Outdated
Comment thread rfcs/llm_policy.md Outdated
Comment thread rfcs/llm_policy.md Outdated
Comment thread rfcs/llm_policy.md Outdated
Comment thread rfcs/llm_policy.md
> contributions; the external projects are trusted to determine their own
> mechanisms for quality assurance.

## Risks
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worthwhile including at least a few more technical risks:

  1. Contributions of tests generated by an LLM closely looking at a specific implementation's code, matching that implementation, rather than the spec. (This is, of course, already an issue — but could inevitably become more of a problem if we get more, larger contributions.)
  2. Contributions not matching the spec at all. I've seen this mostly with trying to generate tests to assert ordering of things which end of using HTML's parallelism and HTML's event loops; that case is especially annoying because it can lead to flaky tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, those look like specific failure modes of LLMs. I think they'd be more helpful to elucidate what we mean by "low-value contributions" rather than as additional risks that are distinct from low-value contributions. I've just pushed a commit incorporating them in that way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think the (2) might be reasonable to class as "low value", I think (1) is really a different failure mode, and one that becomes a higher-risk with LLMs (by virtue of sheer volume).

@martinthomson
Copy link
Copy Markdown

In terms of a survey, this is a very useful resource. It's fairly comprehensive and it provides a useful rubric for classification across quality, copyright, and ethics.

Copy link
Copy Markdown
Contributor

@jgraham jgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My view is that the overall policy should be something like:

  • The use of LLMs as tools to help author commits to this repository is allowed
  • All contributions must be attributed to a person. That person is expected to fully understand and take responsibility for their contribution, irrespective of any tooling used.
  • If LLMs are used as a significant input to a commit, authors are encouraged to include details about how they were used as part of the commit message, in order to help review and future understanding of the code (similar to the way that a commit created using a script might include details of the script).
  • People who habitually submit low quality PRs, PRs they don't understand, or otherwise make contributions that are deemed to waste the time of other project contributors or downstream users of the project may be banned.

Comment thread rfcs/llm_policy.md Outdated
@jgraham
Copy link
Copy Markdown
Contributor

jgraham commented May 28, 2026

https://github.com/BurntSushi/ripgrep/blob/master/AI_POLICY.md is also interesting, in particular the "no AI in issues or issue comments" part, with a specific carve out for use cases like translation.

@DanielRyanSmith
Copy link
Copy Markdown
Contributor

I think I'm mostly in agreement with the discussions here so far. I do have one scenario where I think the policy could be expanded on.

All contributions must be attributed to a person. That person is expected to fully understand and take responsibility for their contribution, irrespective of any tooling used.

I think it would be reasonable to allow a test to be proposed or generated, and although the original author does not have a strong fundamental understanding of the change, it would be feasible to have that change reviewed thoroughly by a domain expert before submission.

For example, using the experimental tool WPT-Gen can uncover potential gaps in test coverage reading the spec, and then generate tests to cover those gaps. Hypothetically, we could occasionally run this automatically for certain features. If we have domain expert reviewers who are willing to thoroughly review generated tests, it seems like it would be useful to have an option in the LLM policy to allow for this. Naturally, this does put a lot of burden on the already thankless job of being a reviewer, but it still seems like a viable contribution avenue.

I imagine this will not be terribly common, but it is something that doesn't sound unrealistic.

@tabatkins
Copy link
Copy Markdown

As a reviewer, I am not interested in reviewing tests that the author does not understand. There are a lot more test authors than test reviewers. Someone using wpt-gen to create tests should then review those tests and make sure they're doing reasonable things before tossing them at a reviewer. (Generally speaking, I'm not happy with things that further unbalance the effort expectations in the submitter/reviewer dynamic. I'm happy to do thorough review to teach a person how to handle an area; I am not happy to do the same to approve/reject a large batch of unreviewed tests when the effort will not meaningfully improve future interactions.))

@jgraham
Copy link
Copy Markdown
Contributor

jgraham commented Jun 2, 2026

To expand a bit on my proposal for issue/PR comments:

  • The project may choose to use tooling that produces LLM-generated comments. Such comments must always be explicitly associated with a bot account. The core team gets final discretion about which bots / tooling are permitted.
  • Human authored comments should not include LLM-generated content in the main text; any such content must be clearly labelled and placed inside a <details> element.

@DanielRyanSmith
Copy link
Copy Markdown
Contributor

As a reviewer, I am not interested in reviewing tests that the author does not understand. There are a lot more test authors than test reviewers. Someone using wpt-gen to create tests should then review those tests and make sure they're doing reasonable things before tossing them at a reviewer. (Generally speaking, I'm not happy with things that further unbalance the effort expectations in the submitter/reviewer dynamic. I'm happy to do thorough review to teach a person how to handle an area; I am not happy to do the same to approve/reject a large batch of unreviewed tests when the effort will not meaningfully improve future interactions.))

I don't disagree with any of this, and I should clarify that I'm not advocating for reviewers now being responsible for reviewing AI-generated PRs that authors do not understand - that would absolutely not be beneficial to the project.

I do think that there's a scenario where someone is working to identify and fill gaps in test coverage using LLMs for specific features, and if that person has already interacted with a specific reviewer who is in agreement and willing to review these tests, then that should not be disbarred from contribution if these tests are being reviewed by a competent reviewer.

This is mostly a corner case, but it is something that I wanted to bring up as well.

@jugglinmike
Copy link
Copy Markdown
Contributor Author

Thanks, folks! During today's meeting, I offered to incorporate all the feedback that hasn't received explicit pushback on this discussion thread. While carrying that out, I've found one small conflict between the suggestions above.

@gsnedders gestured toward a statement in the Chromium project's policy:

That said, Chromium's policy here is currently:

To aid reviewers, authors should flag areas that they are not confident about that had AI assistance.

This is maybe a weaker form, and hopefully something more in line with existing contributions.

However, @jgraham later made a suggestion which undermines that text:

  • All contributions must be attributed to a person. That person is expected to fully understand and take responsibility for their contribution, irrespective of any tooling used.
  • If LLMs are used as a significant input to a commit, authors are encouraged to include details about how they were used as part of the commit message, in order to help review and future understanding of the code (similar to the way that a commit created using a script might include details of the script).

If a person must "fully understand and take responsibility for their contribution," then there doesn't seem to be a need to guide folks who "are not confident" about their work. I've incorporated the part from @jgraham's exclusively because @gsnedders has not pushed back on it.

@jugglinmike jugglinmike requested review from gsnedders and jgraham June 2, 2026 20:10
@gsnedders
Copy link
Copy Markdown
Member

I do think that there's a scenario where someone is working to identify and fill gaps in test coverage using LLMs for specific features, and if that person has already interacted with a specific reviewer who is in agreement and willing to review these tests, then that should not be disbarred from contribution if these tests are being reviewed by a competent reviewer.

I think there's two meaningful cases here:

  1. The specific reviewer doesn't have access to the same models as the other contributor, and
  2. The specific reviewer is delegating the task of actually generating the tests to the other contributor.

In either case, I think I'd rather push the other contributor to provide the tests to the specific reviewer in some other way — e.g., by pushing the branch to their fork without opening a PR so the specific reviewer can take over and do any initial fixup work before opening a PR.

I, like others here, don't want to deal with PRs that the author hasn't themselves reviewed.

If a person must "fully understand and take responsibility for their contribution," then there doesn't seem to be a need to guide folks who "are not confident" about their work. I've incorporated the part from @jgraham's exclusively because @gsnedders has not pushed back on it.

While I recognise @jgraham was clearly deriving text from the Firefox policy, I do wonder if "fully understand" is a somewhat unreasonable bar! I think most of us working on the infrastructure have touched things without fully understanding the code?

That said, I don't have a particularly strong opinion here!

Comment thread rfcs/llm_policy.md
> Human-authored code discourse (e.g. issue descriptions, pull request
> descriptions, and responses to discussion threads) should not include
> LLM-generated content in the main text; any such content must be clearly
> labelled and placed inside a `<details>` element.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels too onerous. What if people use it for translation? What if they copy-and-pasted a tiny snippet?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ripgrep policy linked above has specific text about translation, which after noting its usefulness says the same policy applies:

If using AI for translation, we recommend writing in your native language and including the AI translation in a quote block.

That makes a lot of sense to me. If you only auto-translated the content then providing the source language as canonical allows others to redo the translation potentially using a better tool.

The tiny snippet thing feels very edge-casy, but we can try to reword. Are there specific scenarios you have in mind? Personally I'm not sure why someone would take just a few words directly and intermix it with otherwise handwritten text. If it's not substantive (e.g. you're just copying phrasing) then I'd prefer that you didn't do that, if it is substantive (e.g. you're relying on an LLM for a point of fact) I'd prefer that you are explicit about where you got it from (or rewrite it in your own words, which at least suggests that you put a modest amount of thought into it).

I do think there's a case for allowing <blockquote> for including shorter sections of LLM output (up to one paragraph).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g., the LLM whipped up a comparison table to illustrate some design trade-offs. It just seems excessive to have a "must" on this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That feels like a case where it's very useful to know that it was LLM output that was copied and pasted.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don’t really mind where it comes from as long as a human pasted it and knows it’s correct. If they are unsure, then yes.

Being overtly strict doesn’t seem like the right trade-off. Especially since I don’t think we even have that many bad contributions? They first have to figure out how to get reviewed. 😅

Comment thread rfcs/llm_policy.md
>
> #### Understanding
>
> Every pull request must be initiated by one human. That person must author
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be at least one? Don't we sometimes get team contributions?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All PRs are created by one user account which almost always means one person. Do you have an example of a PR that's attributed to a team rather than an individual?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That part makes sense, but I don't think the human uploading always knows about every line, if they are open sourcing a set of tests for instance that were developed internally over some period of time. E.g., I think some Opera contributions in the past were of this nature.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a case where we're trusting a third party repository as a source of commits (as we do for Gecko, Chromium, WebKit and Servo repositories today). It is true the policy there isn't quite made explicit anywhere, but I think we should deal with that separately compared to the policy for direct contributions to this repository.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example of a PR that's attributed to a team rather than an individual?

Maybe irrelevant detail, but you can create PRs whose head points at an organization's fork, though it is still a user account actually opening the PR.

Comment thread rfcs/llm_policy.md
Comment on lines +69 to +71
> #### Discussion
>
> Discussion
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be part of the policy?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants