|
| 1 | +# Contributors |
| 2 | + |
| 3 | +The project differentiates between 3 levels of contributors: |
| 4 | + |
| 5 | +- Contributors: people who have contributed before (no special privileges) |
| 6 | +- Collaborators (Triage): people with significant contributions, who may be responsible for some parts of the code, and are expected to maintain and review contributions for the code they own |
| 7 | +- Maintainers: responsible for reviewing and merging PRs, after approval from the code owners |
| 8 | + |
| 9 | +# AI Usage Policy |
| 10 | + |
| 11 | +> [!IMPORTANT] |
| 12 | +> This project does **not** accept pull requests that are fully or predominantly AI-generated. AI tools may be utilized solely in an assistive capacity. |
| 13 | +> |
| 14 | +> Repeated violations of this policy may result in your account being permanently banned from contributing to the project. |
| 15 | +> |
| 16 | +> Detailed information regarding permissible and restricted uses of AI can be found in the [AGENTS.md](AGENTS.md) file. |
| 17 | +
|
| 18 | +Code that is initially generated by AI and subsequently edited will still be considered AI-generated. AI assistance is permissible only when the majority of the code is authored by a human contributor, with AI employed exclusively for corrections or to expand on verbose modifications that the contributor has already conceptualized (e.g., generating repeated lines with minor variations). |
| 19 | + |
| 20 | +If AI is used to generate any portion of the code, contributors must adhere to the following requirements: |
| 21 | + |
| 22 | +1. Explicitly disclose the manner in which AI was employed. |
| 23 | +2. Perform a comprehensive manual review prior to submitting the pull request. |
| 24 | +3. Be prepared to explain every line of code they submitted when asked about it by a maintainer. |
| 25 | +4. Do not use AI to write your posts for you (bug reports, feature requests, pull request descriptions, GitHub discussions, responses to humans, ...). This is strictly prohibited. |
| 26 | + |
| 27 | +For more info, please refer to the [AGENTS.md](AGENTS.md) file. |
| 28 | + |
| 29 | +# Pull requests (for contributors & collaborators) |
| 30 | + |
| 31 | +Before submitting your PR: |
| 32 | +- Search for existing PRs to prevent duplicating efforts |
| 33 | +- whisper.cpp uses the ggml tensor library for model evaluation. If you are unfamiliar with ggml, consider taking a look at the [examples in the ggml repository](https://github.com/ggml-org/ggml/tree/master/examples/). [simple](https://github.com/ggml-org/ggml/tree/master/examples/simple) shows the bare minimum for using ggml. [gpt-2](https://github.com/ggml-org/ggml/tree/master/examples/gpt-2) has minimal implementations for language model inference using GPT-2. [mnist](https://github.com/ggml-org/ggml/tree/master/examples/mnist) demonstrates how to train and evaluate a simple image classifier |
| 34 | +- Test your changes: |
| 35 | + - Execute [the full CI locally on your machine](ci/README.md) before publishing |
| 36 | +- Create separate PRs for each feature or fix: |
| 37 | + - Avoid combining unrelated changes in a single PR |
| 38 | + - For intricate features, consider opening a feature request first to discuss and align expectations |
| 39 | +- If you are a new contributor: |
| 40 | + - Limit your open PRs to 1 |
| 41 | + - Do not submit trivial fixes (e.g. typos, formatting changes) |
| 42 | + |
| 43 | +After submitting your PR: |
| 44 | +- Expect requests for modifications to ensure the code meets whisper.cpp's standards for quality and long-term maintainability |
| 45 | +- Maintainers will rely on your insights and approval when making a final decision to approve and merge a PR |
| 46 | +- If your PR becomes stale, rebase it on top of the latest `master` to get the maintainers' attention |
| 47 | + |
| 48 | +# Pull requests (for maintainers) |
| 49 | + |
| 50 | +- Squash-merge PRs |
| 51 | +- Use the following format for the squashed commit title: `<module> : <commit title> (#<issue_number>)`. For example: `utils : fix typo in utils.py (#1234)` |
| 52 | +- Optionally pick a `<module>` from here: https://github.com/ggml-org/llama.cpp/wiki/Modules |
| 53 | +- Let other maintainers merge their own PRs |
| 54 | +- When merging a PR, make sure you have a good understanding of the changes |
| 55 | +- Be mindful of maintenance: most of the work going into a feature happens after the PR is merged. If the PR author is not committed to contributing long-term, someone else needs to take responsibility (you) |
| 56 | + |
| 57 | +Maintainers reserve the right to decline review or close pull requests for any reason, without any questions, particularly under any of the following conditions: |
| 58 | +- The proposed change is already mentioned in the roadmap or an existing issue, and it has been assigned to someone. |
| 59 | +- The pull request duplicates an existing one. |
| 60 | +- The contributor fails to adhere to this contributing guide or the AI policy. |
| 61 | + |
| 62 | +# Coding guidelines |
| 63 | + |
| 64 | +- Avoid adding third-party dependencies, extra files, extra headers, etc. |
| 65 | +- Always consider cross-compatibility with other operating systems and architectures |
| 66 | +- Avoid fancy-looking modern STL constructs; use basic `for` loops, avoid templates, and keep it simple |
| 67 | +- Vertical alignment makes things more readable and easier to batch edit |
| 68 | +- Clean up any trailing whitespace, use 4 spaces for indentation, put brackets on the same line, and write pointers and references as `void * ptr`, `int & a` |
| 69 | +- Use sized integer types such as `int32_t` in the public API; `size_t` may also be appropriate for allocation sizes or byte offsets |
| 70 | +- Declare structs with `struct foo {}` instead of `typedef struct foo {} foo` |
| 71 | + - In C++ code omit the optional `struct` and `enum` keywords whenever they are not necessary |
| 72 | + ```cpp |
| 73 | + // OK |
| 74 | + llama_context * ctx; |
| 75 | + const llama_rope_type rope_type; |
| 76 | + |
| 77 | + // not OK |
| 78 | + struct llama_context * ctx; |
| 79 | + const enum llama_rope_type rope_type; |
| 80 | + ``` |
| 81 | +
|
| 82 | + _(NOTE: this guideline is yet to be applied to the `whisper.cpp` codebase. New code should follow this guideline.)_ |
| 83 | +
|
| 84 | +- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` (from clang-tools v15+) to format the added code |
| 85 | +- For anything not covered in the current guidelines, refer to the [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines) |
| 86 | +- Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices |
| 87 | +- Matrix multiplication is unconventional: [`C = ggml_mul_mat(ctx, A, B)`](https://github.com/ggml-org/llama.cpp/blob/880e352277fc017df4d5794f0c21c44e1eae2b84/ggml.h#L1058-L1064) means $C^T = A B^T \Leftrightarrow C = B A^T.$ |
| 88 | +
|
| 89 | + |
| 90 | +
|
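To make the `ggml_mul_mat` convention above concrete, here is a minimal sketch in plain C++ that does not use ggml at all. The helper `mul_mat_ref` and its parameter names are hypothetical and exist only for illustration. It assumes both inputs are stored in row-major order and share the same number of columns, and it produces a result with one row per row of `B` and one column per row of `A`, i.e. $C = B A^T$.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference helper (NOT part of ggml): mimics the shape convention
// of C = ggml_mul_mat(ctx, A, B) on plain row-major buffers.
//
//   a: n_rows_a rows, n_cols columns
//   b: n_rows_b rows, n_cols columns
//   c: n_rows_b rows, n_rows_a columns  (C = B * A^T)
static std::vector<float> mul_mat_ref(
        const std::vector<float> & a, int n_rows_a,
        const std::vector<float> & b, int n_rows_b,
        int n_cols) {
    assert((int) a.size() == n_rows_a*n_cols);
    assert((int) b.size() == n_rows_b*n_cols);

    std::vector<float> c(n_rows_b*n_rows_a, 0.0f);

    for (int j = 0; j < n_rows_b; j++) {
        for (int i = 0; i < n_rows_a; i++) {
            // dot product of row i of A with row j of B
            float sum = 0.0f;
            for (int l = 0; l < n_cols; l++) {
                sum += a[i*n_cols + l]*b[j*n_cols + l];
            }
            // row j, column i of C
            c[j*n_rows_a + i] = sum;
        }
    }

    return c;
}
```

For example, with `A` holding the rows `{1, 2, 3}` and `{4, 5, 6}` and `B` holding the single row `{1, 0, 1}`, the call `mul_mat_ref(a, 2, b, 1, 3)` returns one row with two columns: `{4, 10}`.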
| 91 | +# Naming guidelines |
| 92 | +
|
| 93 | +- Use `snake_case` for function, variable and type names |
| 94 | +- Naming usually optimizes for longest common prefix (see https://github.com/ggml-org/ggml/pull/302#discussion_r1243240963) |
| 95 | +
|
| 96 | + ```cpp |
| 97 | + // not OK |
| 98 | + int small_number; |
| 99 | + int big_number; |
| 100 | +
|
| 101 | + // OK |
| 102 | + int number_small; |
| 103 | + int number_big; |
| 104 | + ``` |
| 105 | +
|
| 106 | +- Enum values are always in upper case and prefixed with the enum name |
| 107 | +
|
| 108 | + ```cpp |
| 109 | + enum llama_vocab_type { |
| 110 | + LLAMA_VOCAB_TYPE_NONE = 0, |
| 111 | + LLAMA_VOCAB_TYPE_SPM = 1, |
| 112 | + LLAMA_VOCAB_TYPE_BPE = 2, |
| 113 | + LLAMA_VOCAB_TYPE_WPM = 3, |
| 114 | + LLAMA_VOCAB_TYPE_UGM = 4, |
| 115 | + LLAMA_VOCAB_TYPE_RWKV = 5, |
| 116 | + }; |
| 117 | + ``` |
| 118 | +
|
| 119 | +- The general naming pattern is `<class>_<method>`, with `<method>` being `<action>_<noun>` |
| 120 | +
|
| 121 | + ```cpp |
| 122 | + llama_model_init(); // class: "llama_model", method: "init" |
| 123 | + llama_sampler_chain_remove(); // class: "llama_sampler_chain", method: "remove" |
| 124 | + llama_sampler_get_seed(); // class: "llama_sampler", method: "get_seed" |
| 125 | + llama_set_embeddings(); // class: "llama_context", method: "set_embeddings" |
| 126 | + llama_n_threads(); // class: "llama_context", method: "n_threads" |
| 127 | + llama_adapter_lora_free(); // class: "llama_adapter_lora", method: "free" |
| 128 | + ``` |
| 129 | +
|
| 130 | + - The `get` `<action>` can be omitted |
| 131 | + - The `<noun>` can be omitted if not necessary |
| 132 | + - The `_context` suffix of the `<class>` is optional. Use it to disambiguate symbols when needed |
| 133 | + - Use `init`/`free` for constructor/destructor `<action>` |
| 134 | +
|
| 135 | +- Use the `_t` suffix when a type is supposed to be opaque to the user - it's not relevant to them if it is a struct or anything else |
| 136 | +
|
| 137 | + ```cpp |
| 138 | + typedef struct llama_context * llama_context_t; |
| 139 | +
|
| 140 | + enum llama_pooling_type llama_pooling_type(const llama_context_t ctx); |
| 141 | + ``` |
| 142 | +
|
| 143 | + _(NOTE: this guideline is yet to be applied to the `whisper.cpp` codebase. New code should follow this guideline)_ |
| 144 | +
|
| 145 | +- C/C++ filenames are all lowercase with dashes. Headers use the `.h` extension. Source files use the `.c` or `.cpp` extension |
| 146 | +- Python filenames are all lowercase with underscores |
| 147 | +
|
| 148 | +- _(TODO: abbreviations usage)_ |
| 149 | +
|
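As an illustration only, the naming rules above could be combined as in the following sketch. The `foo_counter` type and all of its functions are hypothetical and do not exist in whisper.cpp or ggml. The sketch assumes C++ code, so the optional `struct` and `enum` keywords are omitted.

```cpp
#include <cstdint>

// Hypothetical class "foo_counter" (illustration only)
struct foo_counter {
    int32_t n_total; // field names share the longest common prefix "n_"
    int32_t n_max;
};

// Enum values are upper case and prefixed with the enum name
enum foo_counter_status {
    FOO_COUNTER_STATUS_OK   = 0,
    FOO_COUNTER_STATUS_FULL = 1,
};

// class: "foo_counter", method: "init" (constructor)
foo_counter * foo_counter_init(int32_t n_max) {
    foo_counter * ctr = new foo_counter;
    ctr->n_total = 0;
    ctr->n_max   = n_max;
    return ctr;
}

// class: "foo_counter", method: "free" (destructor)
void foo_counter_free(foo_counter * ctr) {
    delete ctr;
}

// class: "foo_counter", method: "add_samples" (<action>_<noun>)
foo_counter_status foo_counter_add_samples(foo_counter * ctr, int32_t n_samples) {
    if (ctr->n_total + n_samples > ctr->n_max) {
        return FOO_COUNTER_STATUS_FULL;
    }
    ctr->n_total += n_samples;
    return FOO_COUNTER_STATUS_OK;
}

// class: "foo_counter", method: "n_total" (the `get` action is omitted)
int32_t foo_counter_n_total(const foo_counter * ctr) {
    return ctr->n_total;
}
```

A caller would pair `foo_counter_init(10)` with `foo_counter_free(ctr)`, and read the current value through `foo_counter_n_total(ctr)` rather than a `get_`-prefixed accessor.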
| 150 | +# Preprocessor directives |
| 151 | +
|
| 152 | +- _(TODO: add guidelines with examples and apply them to the codebase)_ |
| 153 | +
|
| 154 | + ```cpp |
| 155 | + #ifdef FOO |
| 156 | + #endif // FOO |
| 157 | + ``` |
| 158 | +
|
| 159 | +# Code maintenance |
| 160 | +
|
| 161 | +- New code should follow the guidelines (coding, naming, etc.) outlined in this document. Exceptions are allowed in isolated, backend-specific parts of the code that do not interface directly with the `ggml` interfaces. |
| 162 | + _(NOTE: for legacy reasons, existing code is not required to follow this guideline)_ |
| 163 | +
|
| 164 | +- For changes to the server, please refer to the [server development documentation](./tools/server/README-dev.md) |
| 165 | +
|
| 166 | +# Documentation |
| 167 | +
|
| 168 | +- Documentation is a community effort |
| 169 | +- When you need to look into the source code to figure out how to use an API, consider adding a short summary to the header file for future reference |
| 170 | +- When you notice incorrect or outdated documentation, please update it |
| 171 | +
|
| 172 | +# Resources |
| 173 | +
|
| 174 | +The GitHub issues, PRs and discussions contain a lot of information that can be useful for getting familiar with the codebase. For convenience, some of the more important information is referenced from GitHub projects: |
| 175 | +
|
| 176 | +https://github.com/ggml-org/whisper.cpp/projects |