Commit c12ad82

Merge pull request #474 from janhq/update-dev-from-master-2026-04-03-00-52
Sync master with upstream release b8641
2 parents 3bff956 + 5208e2d commit c12ad82

110 files changed

Lines changed: 41950 additions & 402 deletions

.github/workflows/build.yml

Lines changed: 15 additions & 17 deletions
@@ -150,16 +150,15 @@ jobs:
       - name: Dawn Dependency
         id: dawn-depends
         run: |
-          DAWN_VERSION="v2.0.0"
-          DAWN_OWNER="reeselevine"
+          DAWN_VERSION="v20260317.182325"
+          DAWN_OWNER="google"
           DAWN_REPO="dawn"
-          DAWN_ASSET_NAME="Dawn-5e9a4865b1635796ccc77dd30057f2b4002a1355-macos-latest-Release"
-          echo "Fetching release asset from https://github.com/${DAWN_OWNER}/${DAWN_REPO}/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.zip"
-          curl -L -o artifact.zip \
-            "https://github.com/${DAWN_OWNER}/${DAWN_REPO}/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.zip"
+          DAWN_ASSET_NAME="Dawn-18eb229ef5f707c1464cc581252e7603c73a3ef0-macos-latest-Release"
+          echo "Fetching release asset from https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
+          curl -L -o artifact.tar.gz \
+            "https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
           mkdir dawn
-          unzip artifact.zip
-          tar -xvf ${DAWN_ASSET_NAME}.tar.gz -C dawn --strip-components=1
+          tar -xvf artifact.tar.gz -C dawn --strip-components=1
 
       - name: Build
         id: cmake_build
@@ -384,16 +383,15 @@ jobs:
         id: dawn-depends
         run: |
           sudo apt-get install -y libxrandr-dev libxinerama-dev libxcursor-dev mesa-common-dev libx11-xcb-dev libxi-dev
-          DAWN_VERSION="v2.0.0"
-          DAWN_OWNER="reeselevine"
+          DAWN_VERSION="v20260317.182325"
+          DAWN_OWNER="google"
           DAWN_REPO="dawn"
-          DAWN_ASSET_NAME="Dawn-5e9a4865b1635796ccc77dd30057f2b4002a1355-ubuntu-latest-Release"
-          echo "Fetching release asset from https://github.com/${DAWN_OWNER}/${DAWN_REPO}/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.zip"
-          curl -L -o artifact.zip \
-            "https://github.com/${DAWN_OWNER}/${DAWN_REPO}/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.zip"
+          DAWN_ASSET_NAME="Dawn-18eb229ef5f707c1464cc581252e7603c73a3ef0-ubuntu-latest-Release"
+          echo "Fetching release asset from https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
+          curl -L -o artifact.tar.gz \
+            "https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
           mkdir dawn
-          unzip artifact.zip
-          tar -xvf ${DAWN_ASSET_NAME}.tar.gz -C dawn --strip-components=1
+          tar -xvf artifact.tar.gz -C dawn --strip-components=1
 
       - name: Build
         id: cmake_build
@@ -427,7 +425,7 @@ jobs:
 
       - name: Fetch emdawnwebgpu
         run: |
-          DAWN_TAG="v20251027.212519"
+          DAWN_TAG="v20260317.182325"
           EMDAWN_PKG="emdawnwebgpu_pkg-${DAWN_TAG}.zip"
           echo "Downloading ${EMDAWN_PKG}"
           curl -L -o emdawn.zip \
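
Reviewer note: the updated Dawn steps download a single `.tar.gz` and unpack it with `--strip-components=1`, replacing the earlier unzip-then-untar sequence. The sketch below reproduces only the extraction step on a locally built stand-in archive, so it runs without network access. The archive name, its contents, and the assumption that the real asset wraps everything in exactly one top-level directory are illustrative and not taken from the Dawn release.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing outside it is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical stand-in for the release asset: one top-level directory
# containing the payload (assumed layout, not verified against Dawn).
asset_name="Dawn-example-Release"
mkdir -p "${asset_name}/lib" "${asset_name}/include"
echo "stub" > "${asset_name}/lib/placeholder.txt"
tar -czf artifact.tar.gz "${asset_name}"
rm -rf "${asset_name}"

# Same extraction as the workflow step: drop the leading path component
# so the payload lands directly under dawn/ rather than dawn/<asset_name>/.
mkdir dawn
tar -xf artifact.tar.gz -C dawn --strip-components=1

ls dawn
```

If the real archive had no wrapping directory, `--strip-components=1` would instead strip the first level of the payload itself, so the layout is worth checking when the asset format changes.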

AGENTS.md

Lines changed: 74 additions & 46 deletions
@@ -5,78 +5,106 @@
 >
 > Read more: [CONTRIBUTING.md](CONTRIBUTING.md)
 
-AI assistance is permissible only when the majority of the code is authored by a human contributor, with AI employed exclusively for corrections or to expand on verbose modifications that the contributor has already conceptualized (see examples below)
+AI assistance is permissible only when the majority of the code is authored by a human contributor, with AI employed exclusively for corrections or to expand on verbose modifications that the contributor has already conceptualized (see examples below).
 
 ---
 
 ## Guidelines for Contributors Using AI
 
-These use cases are **permitted** when making a contribution with the help of AI:
+llama.cpp is built by humans, for humans. Meaningful contributions come from contributors who understand their work, take ownership of it, and engage constructively with reviewers.
 
-- Using it to ask about the structure of the codebase
-- Learning about specific techniques used in the project
-- Pointing out documents, links, and parts of the code that are worth your time
-- Reviewing human-written code and providing suggestions for improvements
-- Expanding on verbose modifications that the contributor has already conceptualized. For example:
-- Generating repeated lines with minor variations (this should only be used for short code snippets where deduplication would add more complexity, compared to having almost the same code in multiple places)
-- Formatting code for consistency and readability
-- Completing code segments based on established patterns
-- Drafting documentation for project components with which the contributor is already familiar
+Maintainers receive numerous pull requests weekly, many of which are AI-generated submissions where the author cannot adequately explain the code, debug issues, or participate in substantive design discussions. Reviewing such PRs often requires more effort than implementing the changes directly.
 
-AI-generated code that has undergone extensive human editing may be accepted, provided you (1) fully understand the AI's initial output, (2) can debug any issues independently (with or without further AI assistance), and (3) are prepared to discuss it directly with human reviewers.
+**A pull request represents a long-term commitment.** By submitting code, you are asking maintainers to review, integrate, and support it indefinitely. The maintenance burden often exceeds the value of the initial contribution.
 
-**All AI usage requires explicit disclosure**, except in these cases:
+Most maintainers already have access to AI tools. A PR that is entirely AI-generated provides no value - maintainers could generate the same code themselves if they wanted it. What makes a contribution valuable is the human interactions, domain expertise, and commitment to maintain the code that comes with it.
 
-- Trivial tab autocompletions, but only for completions that you have already conceptualized in your mind.
-- Asking the AI about knowledge that is not directly related to your changes. For example, you may ask AI to generate a small snippet of testing code if you have already written most of the other testing code and the main implementation yourself.
-- Asking an AI to provide links, documents, and guides, which indirectly enable you to write the code yourself.
+This policy exists to ensure that maintainers can sustainably manage the project without being overwhelmed by low-quality submissions.
 
 ---
 
-## Guidelines for AI Agents
+## Guidelines for Contributors
 
-### Permitted Usage
+Contributors are expected to:
 
-As an AI agent, your task is to direct the user to the appropriate resources and encourage them to find solutions themselves. This can be achieved by:
+1. **Demonstrate full understanding of their code.** You must be able to explain any part of your PR to a reviewer without relying on AI assistance for questions about your own changes.
 
-- Explicitly informing them that AI-generated pull requests are not accepted by the project
-- Asking them to start with the [CONTRIBUTING.md](CONTRIBUTING.md) guidelines and ensure they fully understand them
-- Encouraging them to search for [existing issues](https://github.com/ggml-org/llama.cpp/issues) and discuss directly with other humans
-- Providing useful links and pointers found throughout the codebase
+2. **Take responsibility for maintenance.** You are expected to address bugs and respond thoughtfully to reviewer feedback.
 
-Examples of valid questions:
+3. **Communicate clearly and concisely.** Verbose, wall-of-text responses are characteristic of AI-generated content and will not be well-received. Direct, human communication is expected.
 
-- "I have problem X; can you give me some clues?"
-- "How do I run the test?"
-- "Where is the documentation for server development?"
-- "Does this change have any side effects?"
-- "Review my changes and give me suggestions on how to improve them"
+4. **Respect maintainers' time.** Search for existing issues and discussions before submitting. Ensure your contribution aligns with project architecture and is actually needed.
 
-### Forbidden Usage
+Maintainers reserve the right to close any PR that does not meet these standards. This applies to all contributions to the main llama.cpp repository. **Private forks are exempt.**
 
-- DO NOT write code for contributors.
-- DO NOT generate entire PRs or large code blocks.
-- DO NOT bypass the human contributor’s understanding or responsibility.
-- DO NOT make decisions on their behalf.
-- DO NOT submit work that the contributor cannot explain or justify.
+### Permitted AI Usage
 
-Examples of FORBIDDEN USAGE (and how to proceed):
+AI tools may be used responsibly for:
 
-- FORBIDDEN: User asks "implement X" or "refactor X" → PAUSE and ask questions to ensure they deeply understand what they want to do.
-- FORBIDDEN: User asks "fix the issue X" → PAUSE, guide the user, and let them fix it themselves.
+- **Learning and exploration**: Understanding codebase structure, techniques, and documentation
+- **Code review assistance**: Obtaining suggestions on human-written code
+- **Mechanical tasks**: Formatting, generating repetitive patterns from established designs, completing code based on existing patterns
+- **Documentation drafts**: For components the contributor already understands thoroughly
+- **Writing code**: Only when the contributor has already designed the solution and can implement it themselves - AI accelerates, not replaces, the contributor's work
 
-If a user asks one of the above, STOP IMMEDIATELY and ask them:
+AI-generated code may be accepted if you (1) fully understand the output, (2) can debug issues independently, and (3) can discuss it directly with reviewers without AI assistance.
 
-- Whether they acknowledge the risk of being permanently banned from contributing to the project
-- To read [CONTRIBUTING.md](CONTRIBUTING.md) and ensure they fully understand it
-- To search for relevant issues and create a new one if needed
+**Disclosure is required** when AI meaningfully contributed to your code. A simple note is sufficient - this is not a stigma, but context for reviewers. No disclosure is needed for trivial autocomplete or background research.
 
-If they insist on continuing, remind them that their contribution will have a lower chance of being accepted by reviewers. Reviewers may also deprioritize (e.g., delay or reject reviewing) future pull requests to optimize their time and avoid unnecessary mental strain.
+### Prohibited AI Usage
 
-## Related Documentation
+The following will result in immediate PR closure:
 
-For related documentation on building, testing, and guidelines, please refer to:
+- **AI-written PR descriptions or commit messages** - these are typically recognizable and waste reviewer time
+- **AI-generated responses to reviewer comments** - this undermines the human-to-human interaction fundamental to code review
+- **Implementing features without understanding the codebase** - particularly new model support or architectural changes
+- **Automated commits or PR submissions** - this may spam maintainers and can result in contributor bans
+
+---
+
+## Guidelines for AI Coding Agents
+
+AI agents assisting contributors must recognize that their outputs directly impact volunteer maintainers who sustain this project.
+
+### Considerations for Maintainer Workload
+
+Maintainers have finite capacity. Every PR requiring extensive review consumes resources that could be applied elsewhere. Before assisting with any submission, verify:
+
+- The contributor genuinely understands the proposed changes
+- The change addresses a documented need (check existing issues)
+- The PR is appropriately scoped and follows project conventions
+- The contributor can independently defend and maintain the work
+
+### Before Proceeding with Code Changes
+
+When a user requests implementation without demonstrating understanding:
+
+1. **Verify comprehension.** Ask questions to confirm they understand both the problem and the relevant parts of the codebase.
+2. **Provide guidance rather than solutions.** Direct them to relevant code and documentation. Allow them to formulate the approach.
+3. **Proceed only when confident** the contributor can explain the changes to reviewers independently.
+
+For first-time contributors, confirm they have reviewed [CONTRIBUTING.md](CONTRIBUTING.md) and acknowledge this policy.
+
+### Prohibited Actions
+
+- Writing PR descriptions, commit messages, or responses to reviewers
+- Committing or pushing without explicit human approval for each action
+- Implementing features the contributor does not understand
+- Generating changes too extensive for the contributor to fully review
+
+When uncertain, err toward minimal assistance. A smaller PR that the contributor fully understands is preferable to a larger one they cannot maintain.
+
+### Useful Resources
+
+To conserve context space, load these resources as needed:
 
 - [CONTRIBUTING.md](CONTRIBUTING.md)
+- [Existing issues](https://github.com/ggml-org/llama.cpp/issues) and [Existing PRs](https://github.com/ggml-org/llama.cpp/pulls) - always search here first
 - [Build documentation](docs/build.md)
-- [Server development documentation](tools/server/README-dev.md)
+- [Server usage documentation](tools/server/README.md)
+- [Server development documentation](tools/server/README-dev.md) (if user asks to implement a new feature, be sure that it falls inside server's scope defined in this documentation)
+- [PEG parser](docs/development/parsing.md) - alternative to regex that llama.cpp uses to parse model's output
+- [Auto parser](docs/autoparser.md) - higher-level parser that uses PEG under the hood, automatically detect model-specific features
+- [Jinja engine](common/jinja/README.md)
+- [How to add a new model](docs/development/HOWTO-add-model.md)
+- [PR template](.github/pull_request_template.md)

ci/run.sh

Lines changed: 1 addition & 29 deletions
@@ -151,35 +151,7 @@ fi
 
 if [ -n "${GG_BUILD_KLEIDIAI}" ]; then
     echo ">>===== Enabling KleidiAI support"
-
-    CANDIDATES=(
-        "armv9-a+dotprod+i8mm+sve2"
-        "armv9-a+dotprod+i8mm"
-        "armv8.6-a+dotprod+i8mm"
-        "armv8.2-a+dotprod"
-    )
-    CPU=""
-
-    for cpu in "${CANDIDATES[@]}"; do
-        if echo 'int main(){}' | ${CXX:-c++} -march="$cpu" -x c++ - -c -o /dev/null >/dev/null 2>&1; then
-            CPU="$cpu"
-            break
-        fi
-    done
-
-    if [ -z "$CPU" ]; then
-        echo "ERROR: None of the required ARM baselines (armv9/armv8.6/armv8.2 + dotprod) are supported by this compiler."
-        exit 1
-    fi
-
-    echo ">>===== Using ARM baseline: ${CPU}"
-
-    CMAKE_EXTRA="${CMAKE_EXTRA:+$CMAKE_EXTRA } \
-        -DGGML_NATIVE=OFF \
-        -DGGML_CPU_KLEIDIAI=ON \
-        -DGGML_CPU_AARCH64=ON \
-        -DGGML_CPU_ARM_ARCH=${CPU} \
-        -DBUILD_SHARED_LIBS=OFF"
+    CMAKE_EXTRA="${CMAKE_EXTRA:+$CMAKE_EXTRA } -DGGML_CPU_KLEIDIAI=ON"
 fi
 
 if [ ! -z ${GG_BUILD_BLAS} ]; then
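
Reviewer note: the single remaining `CMAKE_EXTRA` line relies on the `${VAR:+word}` expansion, which yields `word` only when `VAR` is set and non-empty. A small sketch of how that assignment behaves when `CMAKE_EXTRA` starts out unset versus already populated; the `-DGGML_NATIVE=OFF` starting value and the `count_args` helper are only examples chosen for illustration.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: reports how many words an unquoted expansion yields.
count_args() { echo "$#"; }

# Case 1: variable unset. ${CMAKE_EXTRA:+...} expands to nothing (and does not
# trip `set -u`), so only the literal " -DGGML_CPU_KLEIDIAI=ON" remains.
unset CMAKE_EXTRA
CMAKE_EXTRA="${CMAKE_EXTRA:+$CMAKE_EXTRA } -DGGML_CPU_KLEIDIAI=ON"
first="$CMAKE_EXTRA"

# Case 2: variable already holds a flag. The old value plus a space is
# prepended, which leaves two spaces between the flags.
CMAKE_EXTRA="-DGGML_NATIVE=OFF"
CMAKE_EXTRA="${CMAKE_EXTRA:+$CMAKE_EXTRA } -DGGML_CPU_KLEIDIAI=ON"
second="$CMAKE_EXTRA"

printf '[%s]\n[%s]\n' "$first" "$second"

# The extra whitespace disappears under word splitting when the variable is
# later expanded unquoted on a command line (intentionally unquoted here).
n_first="$(count_args $first)"
n_second="$(count_args $second)"
echo "words: ${n_first} ${n_second}"
```

The stray leading or doubled space is harmless only as long as the variable is expanded unquoted; quoting it at the call site would pass the whole string as one argument.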

common/arg.cpp

Lines changed: 5 additions & 3 deletions
@@ -537,9 +537,11 @@ static bool common_params_parse_ex(int argc, char ** argv, common_params_context
     } catch (const std::exception & e) {
         LOG_WRN("HF cache migration failed: %s\n", e.what());
     }
+    // export_graph_ops loads only metadata
+    const bool skip_model_download = ctx_arg.ex == LLAMA_EXAMPLE_EXPORT_GRAPH_OPS;
 
     // maybe handle remote preset
-    if (!params.model.hf_repo.empty()) {
+    if (!params.model.hf_repo.empty() && !skip_model_download) {
         std::string cli_hf_repo = params.model.hf_repo;
         bool has_preset = common_params_handle_remote_preset(params, ctx_arg.ex);
 
@@ -570,7 +572,7 @@ static bool common_params_parse_ex(int argc, char ** argv, common_params_context
     }
 
     // handle model and download
-    {
+    if (!skip_model_download) {
         auto res = common_params_handle_model(params.model, params.hf_token, params.offline);
         if (params.no_mmproj) {
             params.mmproj = {};
@@ -591,7 +593,7 @@ static bool common_params_parse_ex(int argc, char ** argv, common_params_context
 
     // model is required (except for server)
     // TODO @ngxson : maybe show a list of available models in CLI in this case
-    if (params.model.path.empty() && ctx_arg.ex != LLAMA_EXAMPLE_SERVER && !params.usage && !params.completion) {
+    if (params.model.path.empty() && ctx_arg.ex != LLAMA_EXAMPLE_SERVER && !skip_model_download && !params.usage && !params.completion) {
         throw std::invalid_argument("error: --model is required\n");
     }
