99 changes: 99 additions & 0 deletions rfcs/llm_policy.md
@@ -0,0 +1,99 @@
# RFC 239: Policy on LLM assistance in contributions

## Summary

Introduce guidelines for acceptable use of large-language models when
contributing to web-platform-tests.

## Background

[#202 Set policy for LLM-generated
tests](https://github.com/web-platform-tests/rfcs/issues/202) includes evidence
for public interest in a formal policy for LLM usage in authoring contributions
to WPT.

The Chrome team is exploring applications of LLMs for detecting coverage gaps
and for filling those gaps with generated code. ([Project
repository](https://github.com/GoogleChromeLabs/wpt-gen), [April 2026
presentation](https://www.youtube.com/watch?v=9r0PBbJFLoM))

A few examples of policies on LLM use in FOSS contributions:

- permissive
  - [ghostty/AI_POLICY.md at main · ghostty-org/ghostty](https://github.com/ghostty-org/ghostty/blob/main/AI_POLICY.md)
  - [Policy about LLM generated code from PRs · Issue #28335 · opencv/opencv](https://github.com/opencv/opencv/issues/28335)
  - [CONTRIBUTING.md: Guidelines relevant to AI-assisted contributions by gasche · Pull Request #14052 · ocaml/ocaml](https://github.com/ocaml/ocaml/pull/14052)
  - [LLVM AI Tool Use Policy — LLVM 23.0.0git documentation](https://llvm.org/docs/AIToolPolicy.html)
  - [Chromium AI Coding Policy](https://chromium.googlesource.com/chromium/src/+/4f44016dfbd4fcd890694c00d7f9ec6dcefe4955/agents/ai_policy.md)
  - [Firefox AI Coding Policy](https://github.com/mozilla-firefox/firefox/blob/1f7030c8de8f2b349c7d91d7b5a3253c109a1cc1/docs/contributing/ai-coding.md)
- prohibitive
  - [Code of Conduct ⚡ Zig Programming Language](https://ziglang.org/code-of-conduct/#strict-no-llm-no-ai-policy)
  - [Getting Started - The Servo Book](https://book.servo.org/contributing/getting-started.html#ai-contributions)

## Details

The following text describes the policy in full and will be maintained in a
dedicated document within WPT's `docs/writing-tests/` directory (which will be
referenced both from the project's `README.md` file and the
`docs/writing-tests/index.md` file):

> ### Guidelines for acceptable LLM use
>
> The use of large language models (LLMs) as tools to help author commits to
> this repository is allowed with the stipulations described below.
> Contributors who repeatedly fail to adhere to these guidelines may be banned
> from contributing to this project.
>
> #### Disclosure
>
> If LLMs are used as a significant input to a commit, authors are encouraged
> to include details about how they were used as part of the commit message in
> order to help review and future understanding of the code.
>
> Human-authored code discourse (e.g. issue descriptions, pull request
> descriptions, and responses to discussion threads) should not include
> LLM-generated content in the main text; any such content must be clearly
> labelled and placed inside a `<details>` element.
Member

This feels too onerous. What if people use it for translation? What if they copy-and-pasted a tiny snippet?

Contributor

The ripgrep policy linked above has specific text about translation, which after noting its usefulness says the same policy applies:

> If using AI for translation, we recommend writing in your native language and including the AI translation in a quote block.

That makes a lot of sense to me. If you only auto-translated the content then providing the source language as canonical allows others to redo the translation potentially using a better tool.

The tiny snippet thing feels very edge-casy, but we can try to reword. Are there specific scenarios you have in mind? Personally I'm not sure why someone would take just a few words directly and intermix it with otherwise handwritten text. If it's not substantive (e.g. you're just copying phrasing) then I'd prefer that you didn't do that, if it is substantive (e.g. you're relying on an LLM for a point of fact) I'd prefer that you are explicit about where you got it from (or rewrite it in your own words, which at least suggests that you put a modest amount of thought into it).

I do think there's a case for allowing `<blockquote>` for including shorter sections of LLM output (up to one paragraph).

Member

E.g., the LLM whipped up a comparison table to illustrate some design trade-offs. It just seems excessive to have a "must" on this.

Contributor

That feels like a case where it's very useful to know that it was LLM output that was copied and pasted.

Member

I personally don’t really mind where it comes from as long as a human pasted it and knows it’s correct. If they are unsure, then yes.

Being overly strict doesn’t seem like the right trade-off. Especially since I don’t think we even have that many bad contributions? They first have to figure out how to get reviewed. 😅

>
> #### Attribution
>
> All commits must be attributed to the human who is taking responsibility for
> them, regardless of LLM use.
>
> #### Understanding
>
> Every pull request must be initiated by one human. That person must author
Member

Should this be at least one? Don't we sometimes get team contributions?

Contributor

All PRs are created by one user account which almost always means one person. Do you have an example of a PR that's attributed to a team rather than an individual?

Member

That part makes sense, but I don't think the human uploading always knows about every line, if they are open sourcing a set of tests for instance that were developed internally over some period of time. E.g., I think some Opera contributions in the past were of this nature.

Contributor

That seems like a case where we're trusting a third party repository as a source of commits (as we do for Gecko, Chromium, WebKit and Servo repositories today). It is true the policy there isn't quite made explicit anywhere, but I think we should deal with that separately compared to the policy for direct contributions to this repository.

Member

> Do you have an example of a PR that's attributed to a team rather than an individual?

Maybe irrelevant detail, but you can create PRs whose head points at an organization's fork, though it is still a user account actually opening the PR.

> the pull request description, understand every change proposed, and be
> prepared to engage in technical discussion regarding those changes.
>
> #### Discussion
>
> Discussion
Comment on lines +69 to +71
Member

Is this meant to be part of the policy?


## Risks
Member

I think it's worthwhile including at least a few more technical risks:

  1. Contributions of tests generated by an LLM closely looking at a specific implementation's code, matching that implementation, rather than the spec. (This is, of course, already an issue — but could inevitably become more of a problem if we get more, larger contributions.)
  2. Contributions not matching the spec at all. I've seen this mostly with trying to generate tests to assert ordering of things which end up using HTML's parallelism and HTML's event loops; that case is especially annoying because it can lead to flaky tests.

Contributor Author

To me, those look like specific failure modes of LLMs. I think they'd be more helpful to elucidate what we mean by "low-value contributions" rather than as additional risks that are distinct from low-value contributions. I've just pushed a commit incorporating them in that way.

Member

While I think (2) might be reasonable to class as "low value", I think (1) is really a different failure mode, and one that becomes a higher risk with LLMs (by virtue of sheer volume).


### Discouraging volunteers

All but the most permissive policy is effectively another hurdle to
contributing to the project. Friction in the contribution process could deter
people who might otherwise volunteer their time to help improve the project.

In some sense, adding friction is the goal of this policy. New technology has
removed barriers which previously restricted unqualified individuals from
participation. Rather than introducing more restrictions on good-faith actors,
an ideal policy will buttress eroded structural barriers with more intentional
social ones.

### Encouraging low-value contributions

All but the most restrictive policy could be interpreted as an invitation to
take shortcuts which undermine the quality of contributions. Any permissive
policy might be taken as encouragement to rely on fallible tools (LLMs are
particularly susceptible to certain kinds of test-writing errors, such as
over-fitting and fabrication).

However, it will not be possible to strictly enforce any policy. It inevitably
falls to contributors to follow the rules and to administrators to police
transgressions. Respect in public works projects is never guaranteed; policies
exist only to make expectations clear (this is the same dynamic that guides the
design and enforcement of codes of conduct).