Commit 8a8cacf

[New] Build an AI Inferencing Solution With TensorRt and PyTorch
1 parent 09321fe commit 8a8cacf

1 file changed

Lines changed: 248 additions & 0 deletions

File tree

  • docs/guides/applications/big-data/ai-inferencing-with-tensorrt-and-pytorch
@@ -0,0 +1,248 @@
1+
---
2+
slug: ai-inferencing-with-tensorrt-and-pytorch
3+
title: "Build an AI Inferencing Solution With TensorRT and PyTorch"
4+
description: "Enhance deep learning capabilities with TensorRT and PyTorch on Akamai Cloud. Optimize inferencing for various AI models using NVIDIA RTX 4000 Ada GPU instances."
5+
authors: ["Akamai"]
6+
contributors: ["Akamai"]
7+
published: 2025-06-27
8+
keywords: ['ai','inference','inferencing','llm','model','pytorch','tensorrt','gpu','nvidia']
9+
license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)'
10+
external_resources:
11+
- '[Link Title 1](http://www.example.com)'
12+
- '[Link Title 2](http://www.example.net)'
13+
---
14+
15+
AI inference workloads are increasingly demanding, requiring low latency, high throughput, and cost-efficiency at scale. Whether working with computer vision or natural language AI models, processing power and efficiency are key; inference workloads must be able to handle real-time predictions while maintaining optimal resource utilization. Choosing the right infrastructure and optimization tools can dramatically impact both performance and operational costs.
16+
17+
This guide shows how to build and benchmark a complete AI inferencing solution using TensorRT and PyTorch on Akamai Cloud's NVIDIA RTX 4000 Ada GPU instances. NVIDIA RTX 4000 Ada GPU instances are available across global core compute regions, delivering the specialized hardware required for heavy AI workloads. Using the steps in this guide, you can:
18+
19+
- Deploy an RTX 4000 Ada GPU instance using Akamai Cloud infrastructure
20+
- Run an AI inference workload using PyTorch
21+
- Optimize your model with TensorRT for performance gains
22+
- Measure latency and throughput
23+
24+
The primary AI model used in this guide is ResNet50, a computer vision (CV) model. However, the same techniques can be applied to other model architectures, such as object detection models like [YOLO](https://en.wikipedia.org/wiki/You_Only_Look_Once) (You Only Look Once), speech recognition systems like OpenAI's [Whisper](https://openai.com/index/whisper/), and large language models (LLMs) like [ChatGPT](https://openai.com/index/chatgpt/), [Llama](https://www.llama.com/), or [Claude](https://www.anthropic.com/claude).
25+
26+
## What are TensorRT and PyTorch?
27+
28+
### TensorRT
29+
30+
31+
32+
### PyTorch
33+
34+
35+
36+
## Before You Begin
37+
38+
The following prerequisites are recommended before starting the implementation steps in this tutorial:
39+
40+
- An Akamai Cloud account with the ability to deploy GPU instances
41+
- The [Linode CLI](https://techdocs.akamai.com/cloud-computing/docs/getting-started-with-the-linode-cli) configured with proper permissions
42+
- An understanding of Python virtual environments and package management
43+
- General familiarity with deep learning concepts and models
44+
45+
{{< note >}}
46+
This guide is written for a non-root user. Commands that require elevated privileges are prefixed with `sudo`. If you’re not familiar with the `sudo` command, see our [Users and Groups](https://www.linode.com/docs/guides/linux-users-and-groups/) doc.
47+
{{< /note >}}
48+
49+
## Deploy an NVIDIA RTX 4000 Ada Instance
50+
51+
Akamai's NVIDIA RTX 4000 Ada GPU instances can be deployed using Cloud Manager or the Linode CLI.
52+
53+
### Deploy Using Cloud Manager
54+
55+
56+
### Deploy Using the Linode CLI
57+
58+
59+
60+
## Set Up Your Development Environment
61+
62+
Once your GPU instance is fully deployed, connect to it to update system packages and install system dependencies.
63+
64+
### Update Packages
65+
66+
1. Log in to your instance via SSH:
67+
68+
```command
69+
ssh user@{{< placeholder "IP_ADDRESS" >}}
70+
```
71+
72+
1. Update your system and install build tools and system dependencies:
73+
74+
```command
75+
sudo apt update && sudo apt install -y \
76+
build-essential \
77+
gcc \
78+
wget \
79+
gnupg \
80+
software-properties-common \
81+
python3-pip \
82+
python3-venv
83+
```
84+
85+
1. Download and install the NVIDIA CUDA keyring so that `apt` can retrieve the latest stable drivers and toolkits from NVIDIA's repository. The URL below targets Ubuntu 22.04 on x86_64:
86+
87+
```command
88+
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
89+
sudo dpkg -i cuda-keyring_1.1-1_all.deb
90+
```
91+
92+
1. Update system packages after the keyring is installed:
93+
94+
```command
95+
sudo apt update
96+
```
97+
98+
### Install NVIDIA Drivers and CUDA Toolkit
99+
100+
1. Install the `cuda` metapackage, which pulls in the CUDA toolkit along with the latest drivers compatible with the RTX 4000 Ada card:
101+
102+
```command
103+
sudo apt install -y cuda
104+
```
105+
106+
1. Reboot your instance to complete installation of the driver:
107+
108+
```command
109+
sudo reboot
110+
```
111+
112+
1. After the reboot is complete, log back in to your instance:
113+
114+
```command
115+
ssh user@{{< placeholder "IP_ADDRESS" >}}
116+
```
117+
118+
1. Use the following command to verify successful driver installation:
119+
120+
```command
121+
nvidia-smi
122+
```
123+
124+
You should see basic information about your RTX 4000 Ada instance and its driver version:
125+
126+
```output
127+
128+
```
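If you prefer to check the driver from a script, the sketch below wraps `nvidia-smi` using its `--query-gpu` and `--format=csv,noheader` options and returns one dictionary per GPU. The function names and the choice of fields are illustrative assumptions, not part of the original guide.

```python
import subprocess

# Fields requested from nvidia-smi; the selection is an assumption for illustration.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_line(line):
    """Split one CSV line of nvidia-smi output into a field -> value dict."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def query_gpus():
    """Run nvidia-smi and return one dict per detected GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling `query_gpus()` on the instance should return a non-empty list once the driver is loaded. If the driver is not installed, `subprocess.run` raises an exception because the `nvidia-smi` binary is missing or exits with an error.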
129+
130+
## Configure Your Python Environment
131+
132+
Use a Python virtual environment (venv) to isolate Python packages and prevent conflicts with system-wide packages and other projects.
133+
134+
### Create the Virtual Environment
135+
136+
1. Using the `python3-venv` package installed during setup, create and activate the Python virtual environment:
137+
138+
```command
139+
python3 -m venv ~/venv
140+
source ~/venv/bin/activate
141+
```
142+
143+
1. Upgrade pip to the latest version to complete the setup:
144+
145+
```command
146+
pip install --upgrade pip
147+
```
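To confirm that the shell is actually using the virtual environment before installing large packages, a small check like the following can help. It relies on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside a venv. The function name is a hypothetical helper, not something the guide defines.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the system Python.
    return sys.prefix != sys.base_prefix
```

Running `python -c "import sys; print(sys.prefix)"` after activation should print a path ending in the environment directory, which is `~/venv` if you used the command above.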
148+
149+
### Install PyTorch and TensorRT
150+
151+
1. While using your virtual environment, install PyTorch, TensorRT, and dependencies. These are the primary AI libraries needed to run your inference workloads.
152+
153+
```command
154+
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
155+
pip install requests
156+
pip install nvidia-pyindex
157+
pip install nvidia-tensorrt
158+
pip install torch-tensorrt -U
159+
```
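After the installs finish, you may want to confirm that each library can be found by the interpreter before running any workload. The following is a minimal sketch that uses only the standard library. The module list assumes the usual import names for these packages (`tensorrt` for TensorRT, `torch_tensorrt` for Torch-TensorRT, and `PIL` for the Pillow image library that torchvision depends on). Adjust the list if your installation differs.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "torchaudio",
    "requests",
    "PIL",
    "tensorrt",
    "torch_tensorrt",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the names from modules that cannot be located for import."""
    # find_spec looks a top-level module up without importing it.
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

An empty list from `find_missing()` means every module was located. Any names returned point to a package that needs to be reinstalled inside the active virtual environment.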
160+
161+
## Create a Benchmark Using the ResNet50 Inference Model
162+
163+
Create and run a Python script that uses a pre-trained ResNet50 computer vision model. Running this script verifies that the environment is configured correctly and provides a way to evaluate GPU performance using a real-world example. The script is a foundation that can be adapted for other inference model architectures.
164+
165+
1. Using a text editor such as nano, create the Python script file. Replace {{< placeholder "inference_test.py" >}} with a script file name of your choosing:
166+
167+
```command
168+
nano {{< placeholder "inference_test.py" >}}
169+
```
170+
171+
1. Copy the following code into the script. The comments describe what each section does:
172+
173+
```file {title="inference_test.py"}
174+
# import PyTorch, pre-trained models from torchvision and image utilities
175+
176+
import torch
177+
import torchvision.models as models
178+
import torchvision.transforms as transforms
179+
from PIL import Image
180+
import requests
181+
from io import BytesIO
182+
import time
183+
184+
# Download a sample image of a dog
185+
# You could replace this with a local file or different URL
186+
187+
img_url = "https://github.com/pytorch/hub/raw/master/images/dog.jpg"
188+
image = Image.open(BytesIO(requests.get(img_url).content))
189+
190+
# Preprocess
191+
# Resize and crop to match ResNet50’s input size
192+
# ResNet50 is trained on ImageNet where inputs are 224sx224 RGB
193+
# Convert to a tensor array so PyTorch can understand it
194+
# Use unsqueeze(0) to add a batch dimension, tricks model to think we are sending a batch of # images
195+
# Use cuda() to move the data to the GPU
196+
197+
transform = transforms.Compose([
198+
transforms.Resize(256),
199+
transforms.CenterCrop(224),
200+
transforms.ToTensor(),
201+
])
202+
input_tensor = transform(image).unsqueeze(0).cuda()
203+
204+
# Load a model (ResNet50) pretrained on the ImageNet dataset containing millions of images
205+
206+
model = models.resnet50(pretrained=True).cuda().eval()
207+
208+
# Warm-up the GPU
209+
# Allows the GPU to optimize the necessary kernels prior to running the benchmark
210+
211+
for _ in range(5):
212+
_ = model(input_tensor)
213+
214+
# Benchmark Inference Time using an average time across 20 inference runs
215+
216+
start = time.time()
217+
with torch.no_grad():
218+
for _ in range(20):
219+
_ = model(input_tensor)
220+
end = time.time()
221+
222+
print(f"Average inference time: {(end - start) / 20:.4f} seconds")
223+
```
224+
225+
When complete, press <kbd>Ctrl</kbd> + <kbd>X</kbd> to exit nano, <kbd>Y</kbd> to save, and <kbd>Enter</kbd> to confirm.
226+
227+
1. Run the Python script:
228+
229+
```command
230+
python inference_test.py
231+
```
232+
233+
If everything works correctly, you should see output similar to the following. Your timing results may vary:
234+
235+
```output
236+
Average inference time: 0.0025 seconds
237+
```
238+
239+
The script times 20 consecutive inference runs and divides the total by 20 to get the average time per inference. This gives you an idea of how quickly your GPU can process input using this model.
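An average alone can hide slow outliers, and it does not directly express throughput. The following framework-agnostic sketch records each run separately and reduces the results to mean, median, worst-case latency, and items processed per second. The helper names and the `sync` parameter are assumptions introduced for illustration. For GPU work, `sync` would typically be `torch.cuda.synchronize`, so that each measurement covers the completed computation rather than only the kernel launch.

```python
import statistics
import time


def time_runs(fn, runs=20, warmup=5, sync=None):
    """Call fn repeatedly and return a list of per-run latencies in seconds."""
    for _ in range(warmup):
        fn()
    if sync:
        sync()  # make sure warm-up work has finished before timing starts
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync:
            sync()  # wait for queued work so the full run is measured
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, batch_size=1):
    """Reduce per-run latencies to mean, median, worst case, and throughput."""
    mean = statistics.mean(latencies)
    return {
        "mean_s": mean,
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
        "throughput_per_s": batch_size / mean,
    }
```

With the benchmark script's variables in scope, a call along the lines of `summarize(time_runs(lambda: model(input_tensor), sync=torch.cuda.synchronize))` inside a `torch.no_grad()` block would report latency and throughput for a batch size of one.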
240+
241+
## Next Steps
242+
243+
Try switching out ResNet50 for different model architectures available in torchvision.models, such as:
244+
245+
- `efficientnet_b0`: Lightweight and accurate
246+
- `vit_b_16`: Vision Transformer model for experimenting with newer architectures
247+
248+
This can help you see how model complexity affects speed and accuracy.
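Another experiment, in line with this guide's goal of optimizing a model with TensorRT, is to compile a model with Torch-TensorRT and compare its latency against the unoptimized (eager) PyTorch version. The following is a minimal sketch under stated assumptions, not a tuned implementation. It uses `torch_tensorrt.compile` with default settings and randomly initialized weights (`weights=None`), so it measures speed only, not accuracy. The helper names are hypothetical, and not every architecture is guaranteed to compile cleanly.

```python
import time


def average_latency(model, input_tensor, runs=20, warmup=5):
    """Return average seconds per forward pass, synchronizing around the timed loop."""
    import torch  # imported lazily so the helpers can be defined without a GPU

    with torch.no_grad():
        for _ in range(warmup):
            model(input_tensor)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(input_tensor)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs


def compare_eager_and_tensorrt(model_name="resnet50", shape=(1, 3, 224, 224)):
    """Time a torchvision model before and after Torch-TensorRT compilation."""
    import torch
    import torch_tensorrt
    import torchvision.models as models

    # weights=None skips the download; random weights are enough for timing.
    model = getattr(models, model_name)(weights=None).cuda().eval()
    example = torch.randn(shape, device="cuda")

    # Assumption: default compile settings are acceptable for a first comparison.
    trt_model = torch_tensorrt.compile(model, inputs=[example])

    return {
        "eager_s": average_latency(model, example),
        "tensorrt_s": average_latency(trt_model, example),
    }


def speedup(results):
    """Ratio of eager latency to TensorRT latency (above 1.0 means faster)."""
    return results["eager_s"] / results["tensorrt_s"]
```

Calling `compare_eager_and_tensorrt("efficientnet_b0")` on the GPU instance and passing the result to `speedup` gives a rough first indication of the gain for that architecture. Actual results depend on the model, precision settings, and library versions.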
