RFC 239: Policy on LLM assistance in contributions#239
Conversation
|
Fixes #202 |
gsnedders
left a comment
There was a problem hiding this comment.
Thanks for trying to tackle this!
I think we should also write something about use of LLMs for review, if nothing else.
| > contributions; the external projects are trusted to determine their own | ||
| > mechanisms for quality assurance. | ||
|
|
||
| ## Risks |
There was a problem hiding this comment.
I think it's worthwhile including at least a few more technical risks:
- Contributions of tests generated by an LLM closely looking at a specific implementation's code, matching that implementation, rather than the spec. (This is, of course, already an issue — but could inevitably become more of a problem if we get more, larger contributions.)
- Contributions not matching the spec at all. I've seen this mostly with trying to generate tests to assert ordering of things which end of using HTML's parallelism and HTML's event loops; that case is especially annoying because it can lead to flaky tests.
There was a problem hiding this comment.
To me, those look like specific failure modes of LLMs. I think they'd be more helpful to elucidate what we mean by "low-value contributions" rather than as additional risks that are distinct from low-value contributions. I've just pushed a commit incorporating them in that way.
There was a problem hiding this comment.
While I think the (2) might be reasonable to class as "low value", I think (1) is really a different failure mode, and one that becomes a higher-risk with LLMs (by virtue of sheer volume).
|
In terms of a survey, this is a very useful resource. It's fairly comprehensive and it provides a useful rubric for classification across quality, copyright, and ethics. |
jgraham
left a comment
There was a problem hiding this comment.
My view is that the overall policy should be something like:
- The use of LLMs as tools to help author commits to this repository is allowed
- All contributions must be attributed to a person. That person is expected to fully understand and take responsibility for their contribution, irrespective of any tooling used.
- If LLMs are used as a significant input to a commit, authors are encouraged to include details about how they were used as part of the commit message, in order to help review and future understanding of the code (similar to the way that a commit created using a script might include details of the script).
- People who habitually submit low quality PRs, PRs they don't understand, or otherwise make contributions that are deemed to waste the time of other project contributors or downstream users of the project may be banned.
|
https://github.com/BurntSushi/ripgrep/blob/master/AI_POLICY.md is also interesting, in particular the "no AI in issues or issue comments" part, with a specific carve out for use cases like translation. |
|
I think I'm mostly in agreement with the discussions here so far. I do have one scenario where I think the policy could be expanded on.
I think it would be reasonable to allow a test to be proposed or generated, and although the original author does not have a strong fundamental understanding of the change, it would be feasible to have that change reviewed thoroughly by a domain expert before submission. For example, using the experimental tool WPT-Gen can uncover potential gaps in test coverage reading the spec, and then generate tests to cover those gaps. Hypothetically, we could occasionally run this automatically for certain features. If we have domain expert reviewers who are willing to thoroughly review generated tests, it seems like it would be useful to have an option in the LLM policy to allow for this. Naturally, this does put a lot of burden on the already thankless job of being a reviewer, but it still seems like a viable contribution avenue. I imagine this will not be terribly common, but it is something that doesn't sound unrealistic. |
|
As a reviewer, I am not interested in reviewing tests that the author does not understand. There are a lot more test authors than test reviewers. Someone using wpt-gen to create tests should then review those tests and make sure they're doing reasonable things before tossing them at a reviewer. (Generally speaking, I'm not happy with things that further unbalance the effort expectations in the submitter/reviewer dynamic. I'm happy to do thorough review to teach a person how to handle an area; I am not happy to do the same to approve/reject a large batch of unreviewed tests when the effort will not meaningfully improve future interactions.)) |
|
To expand a bit on my proposal for issue/PR comments:
|
I don't disagree with any of this, and I should clarify that I'm not advocating for reviewers now being responsible for reviewing AI-generated PRs that authors do not understand - that would absolutely not be beneficial to the project. I do think that there's a scenario where someone is working to identify and fill gaps in test coverage using LLMs for specific features, and if that person has already interacted with a specific reviewer who is in agreement and willing to review these tests, then that should not be disbarred from contribution if these tests are being reviewed by a competent reviewer. This is mostly a corner case, but it is something that I wanted to bring up as well. |
|
Thanks, folks! During today's meeting, I offered to incorporate all the feedback that hasn't received explicit pushback on this discussion thread. While carrying that out, I've found one small conflict between the suggestions above. @gsnedders gestured toward a statement in the Chromium project's policy:
However, @jgraham later made a suggestion which undermines that text:
If a person must "fully understand and take responsibility for their contribution," then there doesn't seem to be a need to guide folks who "are not confident" about their work. I've incorporated the part from @jgraham's exclusively because @gsnedders has not pushed back on it. |
I think there's two meaningful cases here:
In either case, I think I'd rather push the other contributor to provide the tests to the specific reviewer in some other way — e.g., by pushing the branch to their fork without opening a PR so the specific reviewer can take over and do any initial fixup work before opening a PR. I, like others here, don't want to deal with PRs that the author hasn't themselves reviewed.
While I recognise @jgraham was clearly deriving text from the Firefox policy, I do wonder if "fully understand" is a somewhat unreasonable bar! I think most of us working on the infrastructure have touched things without fully understanding the code? That said, I don't have a particularly strong opinion here! |
| > Human-authored code discourse (e.g. issue descriptions, pull request | ||
| > descriptions, and responses to discussion threads) should not include | ||
| > LLM-generated content in the main text; any such content must be clearly | ||
| > labelled and placed inside a `<details>` element. |
There was a problem hiding this comment.
This feels too onerous. What if people use it for translation? What if they copy-and-pasted a tiny snippet?
There was a problem hiding this comment.
The ripgrep policy linked above has specific text about translation, which after noting its usefulness says the same policy applies:
If using AI for translation, we recommend writing in your native language and including the AI translation in a quote block.
That makes a lot of sense to me. If you only auto-translated the content then providing the source language as canonical allows others to redo the translation potentially using a better tool.
The tiny snippet thing feels very edge-casy, but we can try to reword. Are there specific scenarios you have in mind? Personally I'm not sure why someone would take just a few words directly and intermix it with otherwise handwritten text. If it's not substantive (e.g. you're just copying phrasing) then I'd prefer that you didn't do that, if it is substantive (e.g. you're relying on an LLM for a point of fact) I'd prefer that you are explicit about where you got it from (or rewrite it in your own words, which at least suggests that you put a modest amount of thought into it).
I do think there's a case for allowing <blockquote> for including shorter sections of LLM output (up to one paragraph).
There was a problem hiding this comment.
E.g., the LLM whipped up a comparison table to illustrate some design trade-offs. It just seems excessive to have a "must" on this.
There was a problem hiding this comment.
That feels like a case where it's very useful to know that it was LLM output that was copied and pasted.
There was a problem hiding this comment.
I personally don’t really mind where it comes from as long as a human pasted it and knows it’s correct. If they are unsure, then yes.
Being overtly strict doesn’t seem like the right trade-off. Especially since I don’t think we even have that many bad contributions? They first have to figure out how to get reviewed. 😅
| > | ||
| > #### Understanding | ||
| > | ||
| > Every pull request must be initiated by one human. That person must author |
There was a problem hiding this comment.
Should this be at least one? Don't we sometimes get team contributions?
There was a problem hiding this comment.
All PRs are created by one user account which almost always means one person. Do you have an example of a PR that's attributed to a team rather than an individual?
There was a problem hiding this comment.
That part makes sense, but I don't think the human uploading always knows about every line, if they are open sourcing a set of tests for instance that were developed internally over some period of time. E.g., I think some Opera contributions in the past were of this nature.
There was a problem hiding this comment.
That seems like a case where we're trusting a third party repository as a source of commits (as we do for Gecko, Chromium, WebKit and Servo repositories today). It is true the policy there isn't quite made explicit anywhere, but I think we should deal with that separately compared to the policy for direct contributions to this repository.
There was a problem hiding this comment.
Do you have an example of a PR that's attributed to a team rather than an individual?
Maybe irrelevant detail, but you can create PRs whose head points at an organization's fork, though it is still a user account actually opening the PR.
| > #### Discussion | ||
| > | ||
| > Discussion |
There was a problem hiding this comment.
Is this meant to be part of the policy?
This initial draft takes a maximalist approach (and a permissive stance) to promote a robust and grounded discussion.
Rendered