Conversation
I'm definitely interested in a process that helps us leverage the intent of such pull requests without introducing the liabilities of prompt engineering, e.g. technical debt, copyright dilution, infringement, and digital sovereignty concerns, among others. But the process cannot simply be to revise our policy to permissively allow pull requests that introduce the above liabilities. None of my questions and concerns from the prior pull request have been addressed in discussion. For clarity of discussion, I have copied my questions and concerns from pull request 3 below. Practical questions:
Valid points.
Yes I think specific topics should be highlighted. Is this about risk of license laundering? Lack of attribution?
I think that, since LLMs are trained on public data, a Google or GitHub search for relevant samples of a piece of work could quickly provide indications of possible infringement (a rough sketch of such a spot check follows this post). It's also worth noting that not all code is equal, e.g. boilerplate code for a user interface is different from an algorithm performing a very domain-specific task. I would scrutinize the latter more than the former.
By labeling and disclosing the parts that are machine generated (which is addressed in the proposed language change). Git and GitHub can be used to track changes (see the trailer sketch after this post). If machine-generated code is not copyrightable, then code like WebODM/WebODM#1820 can be used by others, but the sum of machine-generated code + AGPLv3 code remains bound by the AGPLv3, just as when we vendor MIT or BSD code into an AGPLv3 codebase.
We don't accept bot generated pull requests without a human in the loop.
The author/developer, obviously.
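For illustration, a minimal sketch of the search-based spot check mentioned above, assuming the GitHub REST code-search endpoint, a token in `GITHUB_TOKEN`, and a hypothetical snippet standing in for a distinctive line from a submitted PR; hits are only an indication, and a human still has to compare the code and its license:

```python
# Rough, hypothetical spot check: does a distinctive line from a submitted PR
# already appear in public GitHub code? The snippet below is invented for the
# example; a match is only a lead to investigate, not proof of infringement.
import os
import requests

SNIPPET = "def refine_point_cloud(las_path, ground_only=True):"  # hypothetical

resp = requests.get(
    "https://api.github.com/search/code",
    params={"q": f'"{SNIPPET}" in:file language:python'},
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    timeout=30,
)
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item["repository"]["full_name"], item["path"], item["html_url"])
```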
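Similarly, the "label and track with Git" idea could be as simple as a commit trailer that reviewers can query later; a minimal sketch, where the `Assisted-by:` trailer name is a hypothetical convention rather than anything the project has adopted:

```python
# Hypothetical sketch: list commits that disclose machine assistance via an
# "Assisted-by:" commit trailer (the trailer name is an assumed convention).
import subprocess

log = subprocess.run(
    ["git", "log",
     "--format=%h%x09%(trailers:key=Assisted-by,valueonly,separator=%x2C)"],
    capture_output=True, text=True, check=True,
).stdout

for line in log.splitlines():
    sha, _, tools = line.partition("\t")
    if tools.strip():
        # e.g. a commit message ending with "Assisted-by: <model or tool name>"
        print(f"{sha} disclosed as machine-assisted ({tools.strip()})")
```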
I will link this post from Karpathy, which summarizes well the shift that is happening and which is perhaps not evident to non-coders: https://x.com/karpathy/status/2015883857489522876
Gentle reminder @smathermather @dakotabenjamin |
Needed some time to review and update all my sources. Summary: the state of the art has subtly changed, but the underlying blocking issues are still there. Going back to the framing, since it is critical to understand and frame my concerns around ethics, technical debt, copyright dilution, infringement, and digital sovereignty: the only one I see adequately addressed is copyright dilution (FWIW, I am not even addressing the ethics issues in any depth at this time).
Accountability goes beyond code review and includes conceptualization, execution, debugging, and refactoring. Some reading/framing: in short, centaurs are probably fine (as long as we ignore the ethics of the energy usage, water usage, and global labor underpinning LLMs). Reverse centaurs are a long-term code maintenance and technical debt problem. We already allow centaurs in our policy as a compromise. Reverse centaurs should not be allowed. Via your Karpathy link:
How the human is in the loop matters. See reverse centaurs above. Why would we willingly introduce subtle "conceptual" errors into our code base? When we do so with people, we hope and expect them to grow. We invest in those devs and develop a community of developers. When we instead divest from people and digital sovereignty by spending that energy on coding tools that other folks own, we divest from our developer community and isolate ourselves, all to improve tools we don't own.
Are LLMs only trained on public data, and do we know the full corpus of the training datasets for the LLMs that we are allowing? Is this corpus published and third-party verified for all LLMs we would be allowing? Do we know which LLMs we are allowing? Unfortunately, a permissive policy does not address any of this. Moreover, it feels like the onus of copyright review then falls back on the maintainer, which is a shift in responsibility. When someone writes their own code and submits it, the copyright review is fairly clear. When "authors" are acting as reverse centaurs, we have permitted trading the problem of writing the code for a basket of other problems, including possible infringement.
I'm unsure who the audience of non-coders is in this context, but I'll quote a bit more from the thread:
AGI isn't a thing, and this is a fairly worrisome inclusion twice in this thread. But, more to the point, "self-experimentation is gossip, not evidence" and "self-experimenting with psychological hazards is always a bad idea" via https://www.baldurbjarnason.com/2025/trusting-your-own-judgement-on-ai/. In short, even the folks like Karpathy who are hyping it are finding some variation on the same challenges as David Chisnall: plausible looking code that is almost functional and increases productivity most of the time isn't enough to make up for the subtler problems that are much harder to debug.
Interesting. And maybe it makes us dumber. I doubt it. But we should be mindful of our metacognition. My impression from our short conversation on the topic was that @DodgySpaniard is not against generative coding in general, but doesn't think we are ready to accept such submissions, nor wants to review them prior to establishing a framework for doing so. So I would conclude that, while we differ on position (as I am still against generative coding for the reasons above and more), this change to policy is at best premature. Here's what I propose we do instead with WebODM/WebODM#1820: similar to API or IBM PC cloning, we summarize the changes and task someone with implementing that summary. Arun M. has already volunteered to find someone to do the re-implementation.
I think we disagree on most fronts here. My only suggestion would be to actually try to use these tools and evaluate them for what they can (and cannot) do. Ironically, I think contributors like you and @Saijin-Naib would benefit the most from these tools, as the technical nuances of coding are being abstracted away: just as we no longer need to write assembly by hand but use compilers and high-level languages like Python to generate machine code, LLMs are able to generate high-level code from natural language. Which means you could actually help fix problems in the codebases. E.g. somebody just did that for a few issues you opened a few days ago: OpenDroneMap/ODM#1994. Are you going to reject that PR too (since it's most likely AI assisted/generated)? Appreciate Arun's help here; I look forward to an (AI-free) PR.
The validation and feeling of being able to contribute beyond my means and capabilities is alluring, and I considered it before looking into the myriad issues with GenAI. Now I am more certain than ever that my doing so would be a net negative, as I lack the programming skill to critically evaluate the generated code for correctness, safety, legality, provenance, and so on. What you seem to overlook as a prolific and skilled developer is that you can, and do, catch those issues. Me GenAI-ing a hobby horse of mine just outsources all my lack of skill and ability to a knowledgeable maintainer (you, probably). It is a zero-sum game and a poison pill, IMO. The labor must be extracted somewhere, and if not from me, then from whom? Also, as to compilers and interpreted languages being similar, I think the metaphor is a bit stretched: compilers and VMs and such are meant to be deterministic; GenAI outputs are not. It does not seem like the same thing to me. I alluded to it, but I'll elucidate further: I trust you and your ability to craft and review the code. Your receipts are everywhere. Undeniable. Should the project trust me and an LLM the same? Absolutely not. As well-intentioned as I am, I cannot validate the outputs properly. I will mess up, and it will cause more maintainer burden as actual experts have to fix the slop I contributed. You can see this pattern playing out in other projects, and you can read many folks remarking on this phenomenon, yourself included. To me it seems GenAI shifts the labor/expertise burden from Lines of Code generated to Lines of Code audited.
Currently, one of my main burdens is not reviewing slop, which AI does have a tendency to create, but knowing whether someone used AI to create such slop. We need a policy where we can ask politely "is this AI?", because if it is and it is slop, then you can quickly reject it on the basis that it's no good; but if it isn't, I probably need to give the contributor some slack, because we're all human and I shouldn't reject a contribution just because I don't think it's good enough.
You're a fantastic tester and you probably know more about the software than most people in this world. If you can test, you can validate. We all mess up from time to time, myself included. Anyway, I get the feeling that the project is not ready to acknowledge this shift that is happening, so I rest my case and will simply demand that people disclose AI usage as a matter of my own personal policy, rather than an organization-wide policy.
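For illustration, that kind of disclosure ask can be made routine rather than personal with a checkbox in the PR template that gets checked mechanically; a minimal sketch against the GitHub API, where the PR number and the checkbox wording are hypothetical:

```python
# Hypothetical sketch: read a pull request body and look for an AI-disclosure
# checkbox from the PR template. The PR number and wording are assumptions.
import os
import requests

OWNER, REPO, PR_NUMBER = "OpenDroneMap", "WebODM", 1234  # hypothetical PR number
DISCLOSURE = "[x] parts of this pr were generated or assisted by ai tools"

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    timeout=30,
)
resp.raise_for_status()
body = (resp.json().get("body") or "").lower()

if DISCLOSURE in body:
    print("AI usage disclosed; review with provenance in mind.")
else:
    print("No AI disclosure checked; ask the contributor politely.")
```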
The entire thread is an acknowledgement of a broader shift, but I don't find inevitability narratives compelling nor meaningful. Nor do inevitability narratives negate all the questions and concerns I have around a permissive policy. The driving questions for me aren't whether the hype curve around generative AI for coding is currently on the upswing. The driving questions are: what are the core requirements of the project, how do we grow the community of developers in support of the project, and how do we ensure the long-term viability of the project while doing so, inclusive of license, copyright, and maintainability, while also keeping an eye on questions of labor, ethics, provenance, and digital sovereignty?
Disclosure is a requirement of the project, so I'm very happy this is also a personal policy. I understand the desire to have a permissive policy, perhaps to make it easier to discern origin. But it misrepresents what we are willing to accept in a pull request in order to defend against the assumption that our contributors are dishonest. That makes it a policy that violates trust in order to defend against possible trust violations. Regardless of all my concerns above, that makes it a problematic policy.
Sounds like we need to ask. I'll follow up.
Same. I'm looking forward to this experiment. It's an interesting model I haven't seen tried, it acknowledges the current climate and pressures on projects, and it could be beneficial to the broader ecosystem of FOSS tools handling similar questions.
The current policy is not sensible. E.g. WebODM/WebODM#1820
These tools are changing the field and we're turning away good contributions.
I propose to change the stance on AI usage.