README.md: 12 additions & 14 deletions
@@ -5,7 +5,7 @@ Inference Vision Transformer (ViT) in plain C/C++ using ggml without any extra d
 ## Description
 
-This project presents a standalone implementation of the well known Vision Transformer (ViT) model family, used in a broad spectrum of applications and SOTA models like Large Multimodal Modls(LMM). The primary goal is to develop a C/C++ inference engine tailored for ViT models, utilizing [ggml](https://github.com/ggerganov/ggml) to enhance performance, particularly on edge devices. Designed to be both lightweight and self-contained, this implementation can be run across diverse platforms.
+This project presents a standalone implementation of the well-known Vision Transformer (ViT) model family, used in a broad spectrum of applications and in SOTA models such as Large Multimodal Models (LMMs). The primary goal is to develop a C/C++ inference engine tailored for ViT models, utilizing [ggml](https://github.com/ggerganov/ggml) to enhance performance, particularly on edge devices. Designed to be both lightweight and self-contained, this implementation can be run across diverse platforms.
 
 <details>
 <summary>Table of Contents</summary>
@@ -35,8 +35,7 @@ This project presents a standalone implementation of the well known Vision Trans
 - 4-bit, 5-bit and 8-bit quantization support.
 - Support for timm ViTs with different variants out of the box.
 
-`vit.cpp` also has a short startup time compared to large ML frameworks, which makes it suitable for serverless deployments where the cold start is an issue.
-
+An important aspect of `vit.cpp` is its short startup time compared to common DL frameworks, which makes it well suited for serverless deployments where cold start is an issue.
 
 ## Vision Transformer architecture
@@ -106,7 +105,7 @@ The implemented architecture is based on the original Vision Transformer from:
 # install torch and timm
 pip install torch timm
 
-# list available models if needed, note that not all models are supported
+# list available models if needed; note that not all models are supported
 python convert-pth-to-ggml.py --list
 
 # convert the weights to gguf : vit tiny with patch size of 16 and an image
@@ -186,12 +185,12 @@ You can efficiently run ViT inference on the CPU.
 Memory requirements and inference speed on an AMD Ryzen 7 3700U (4 cores, 8 threads) for both native PyTorch and `vit.cpp`.
 Using 4 threads gives better results on my machine. The reported inference speeds are averages over 10 runs for both PyTorch and `vit.cpp`.
 
-| Model | Mem(PyTorch) | Mem | Speed(PyTorch) | Speed |
+| Model | Max Mem (PyTorch) | Max Mem | Speed (PyTorch) | Speed |