
contrib: require explicit agreement for including external code #23201

Open
ngxson wants to merge 1 commit into ggml-org:master from ngxson:xsn/contri_no_copy_code

ngxson commented May 17, 2026

Collaborator

Overview

In simple terms: ask the author before copying their code.

Requirements

ngxson requested a review from a team May 17, 2026 11:29
ngxson requested a review from ggerganov as a code owner May 17, 2026 11:29

inforithmics commented May 17, 2026

Does this include code files that are used in the vendor directory?

@taronaeo

Member

I would suggest that they credit the original author for their work too, even if they had their blessing to use it, so that we know where it came from.

JohannesGaessler left a comment

Contributor

My opinion is that (instead of requiring explicit permission in all cases) we should instead require disclosure and then decide on a case-by-case basis how to go from there.

> I would suggest that they credit the original author for their work too, even if they had their blessing to use it, so that we know where it came from.

I agree that external code sources should be documented.

ngxson commented May 17, 2026

Collaborator Author

@inforithmics we include the full licenses from vendors in the distributed package for this purpose.

@JohannesGaessler can you point me to a real example where always requiring explicit acknowledgement would be a problem?

Many recent slop PRs are taken straight from llama.cpp forks, like turboquant and dsv4, where the original authors have no intent to push the code upstream. This kind of contribution farming is not something we should tolerate IMO; it's straight-up stealing code without consent.

some examples:

ngxson commented May 17, 2026

Collaborator Author

@taronaeo to be clear, we do try our best to give attribution to original authors. Asking for explicit approval can also prevent unfortunate situations where code is merged upstream, but the author then requires a specific form of attribution that we cannot offer.

@JohannesGaessler

Contributor

The problem we have is people creating spam PRs. I'm thinking there could be legitimate use for taking code from external sources where the original author is not available/responding for whatever reason. Although we could also handle that on a case-by-case basis.

Maybe we were also not thinking of the same thing. The text in the PR says "obtain explicit acknowledgement from the original author" but what do we actually mean by that? I read it as "explicit permission".

ngxson commented May 17, 2026

Collaborator Author

> Maybe we were also not thinking of the same thing. The text in the PR says "obtain explicit acknowledgement from the original author" but what do we actually mean by that? I read it as "explicit permission".

Yes, I do mean "explicit permission" or, in other words, "written proof".

> I'm thinking there could be legitimate use for taking code from external sources where the original author is not available/responding for whatever reason. Although we could also handle that on a case-by-case basis.

IMO it is a bit risky that way, for the reason I mentioned in my previous comment. I do think an explicit consensus is much safer in the context of llama.cpp as a well-known open-source project.

I would assume the case where we legitimately copy code from another repo is pretty rare in llama.cpp, for 2 reasons:

  1. Most of the code here is tailored to the llama.cpp and ggml infrastructure. For example, model definitions are often originally written in Python, and reimplementing them in C++ doesn't count as "including code from an external source"
  2. Agents are pretty capable nowadays; in many cases it's more efficient to vibe code a version specifically for llama.cpp

Also note that the cases below do not count as "including code from an external source":

  • Adding a 3rd party library --> we USE it as the author intended
  • Copying example code --> again, it is used for its intended purpose
  • Copying trivial code --> if any other human or agent can produce the same code without prior knowledge, no IP can be claimed

IMbackK commented May 17, 2026

Collaborator

Any code licensed under a license compatible with inclusion in llama.cpp should be allowed to be included in a PR if it serves the project. Requiring explicit permission from the author of the code in question is neither necessary nor useful as a stipulation if that author has released the code under a license that allows inclusion.

Allowing the original author to upstream his own code published elsewhere is a common courtesy that should be observed, but there is absolutely nothing wrong with anyone adding code that the original author is not interested in upstreaming, or code that is being borrowed from an unrelated project.

ngxson commented May 17, 2026

Collaborator Author

@IMbackK In theory, MIT or any compatible license allows doing exactly that. But in reality, not everyone is happy with their code being copied without their acknowledgement.

Example: Imagine that I'm working on a big feature and I want to optimize it further, then create the PR upstream later on. HOWEVER, if someone takes my code and pushes it upstream in this bad state (without my acknowledgement), that would be pretty much unwanted, even though everything is permitted by the licenses.

And as a reminder, there were some messy consequences of not having explicit agreement from the original author about how to attribute them (I won't go into detail here, as most maintainers already know). So I think my point still stands: having explicit acknowledgement from the original author is still much better and much safer.

If you have an example of how this can hurt the development of the project, I'm happy to discuss further.

ngxson changed the title from "contrib: make it clear that we do not accept IP violation" to "contrib: require explicit agreement for including external code" May 18, 2026

0cc4m commented May 18, 2026

Contributor

> In theory, MIT or any compatible license allows doing exactly that. But in reality, not everyone is happy with their code being copied without their acknowledgement.

MIT does require acknowledgement of the license and author through the copyright notice for copies of "substantial portions" of code. This is sensible anyway: if you use someone's code (as allowed by the license), at the very least credit them.

I don't think we need a specific rule beyond that; the current rules already allow closing these kinds of spam PRs without any problem. It can be handled case by case.

ngxson commented May 18, 2026

Collaborator Author

@0cc4m can you cite the exact text in the license that explicitly or implicitly implies the terms you mentioned?

0cc4m commented May 18, 2026

Contributor

I'm not a lawyer, but the MIT license states:

> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

The "above copyright notice" is, in our case, e.g. "Copyright (c) 2023-2026 The ggml authors"; "this permission notice" is just the license text, I think. The relevant legal question would be what "substantial portions" means; I don't know that.

Maybe, when including MIT-licensed code in another MIT-licensed project, the second condition is already satisfied by the main license, but the copyright notice would still be required and can maybe be in a comment above the code that was imported.

I might be wrong, but it sounds to me like that covers our cases well enough. I don't think we would really consider importing code from other projects most of the time anyways.

ngxson commented May 18, 2026

Collaborator Author

@0cc4m I think what you are referring to has nothing to do with the point that "the author must acknowledge how their code is being used".

Grammatically speaking, the clause you mentioned is in the passive voice (note the phrase "shall be") and can be rewritten in the active voice:

(Subject) must include the above copyright notice and this permission notice in all copies or substantial portions of the Software.

It means that whoever uses or redistributes the software must acknowledge and include the license, not the other way around.

But still, the license is the "in theory" part. After all, I believe my arguments are pretty solid, as they are backed by real examples that can be verified.

I still strongly believe that a change in the guideline is needed. However, on second thought, I agree that my proposed wording is a bit strict. @JohannesGaessler's proposal sounds a bit better, so I'll rethink and adapt to his version instead.

0cc4m commented May 18, 2026

Contributor

> I think what you are referring to has nothing to do with the point of "the author must acknowledge about how their code is being used"

Yes, I am talking about what anyone submitting code to us that they have not authored must do. Whether we accept it or not is a different question. If someone does not follow this, it can be declined immediately. If someone does follow it, it can be considered/discussed and that may include asking the original author if necessary.

I just don't think we should codify requiring direct confirmation from the original author if it is licensed in a permissive way, but I also don't mean we should just accept code from other projects. I agree with deciding case-by-case.

JohannesGaessler commented May 18, 2026

Contributor

How about something like this:

"If at all possible, coordinate the upstreaming of code from forks of this repository with the original author(s). If you are not doing this, explain why. Always disclose when code is being upstreamed."

I think our issue largely has to do with forks of llama.cpp specifically, not with random MIT-licensed projects.

7 participants