Skip to content

Commit 38c1cee

Browse files
authored
docs: update llama.cpp repo links from ggerganov to ggml-org (#10964)
1 parent 7015c16 commit 38c1cee

20 files changed

Lines changed: 20 additions & 20 deletions

File tree

docs-website/docs/pipeline-components/generators/llamacppchatgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o
2424

2525
## Overview
2626

27-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
27+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2828

2929
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
3030

docs-website/docs/pipeline-components/generators/llamacppgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a
2424

2525
## Overview
2626

27-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
27+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2828

2929
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
3030

docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppchatgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o
2020

2121
## Overview
2222

23-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
23+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2424

2525
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
2626

docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a
2020

2121
## Overview
2222

23-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
23+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2424

2525
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
2626

docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppchatgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o
2424

2525
## Overview
2626

27-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
27+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2828

2929
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
3030

docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a
2424

2525
## Overview
2626

27-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
27+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2828

2929
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
3030

docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppchatgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o
2424

2525
## Overview
2626

27-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
27+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2828

2929
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
3030

docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a
2424

2525
## Overview
2626

27-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
27+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2828

2929
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
3030

docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppchatgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o
2424

2525
## Overview
2626

27-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
27+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2828

2929
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
3030

docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a
2424

2525
## Overview
2626

27-
[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
27+
[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs).
2828

2929
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
3030

0 commit comments

Comments
 (0)