content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/_index.md
Lines changed: 1 addition & 5 deletions
Original file line number
Diff line number
Diff line change
@@ -1,9 +1,5 @@
1
1
---
2
-
title: Build RAG applications with LlamaIndex on Google Cloud C4A Axion VM
3
-
4
-
draft: true
5
-
cascade:
6
-
draft: true
2
+
title: Build RAG applications with LlamaIndex on a Google Cloud C4A Axion virtual machine
7
3
8
4
description: Set up LlamaIndex on Google Cloud C4A Axion Arm VMs running SUSE Linux to build browser-based Retrieval-Augmented Generation (RAG) applications using local LLMs, vector databases, and FastAPI.
content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/background.md
Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ The C4A series provides a cost-effective alternative to x86 virtual machines whi
13
13
14
14
## LlamaIndex for RAG and context-aware AI applications on Arm
15
15
16
-
LlamaIndex is an open-source framework designed to build context-aware AI applications using Large Language Models (LLMs). It's widely used for Retrieval-Augmented Generation (RAG), document indexing, vector search, semantic retrieval, and integrating custom data sources with LLMs.
16
+
LlamaIndex is an open-source framework designed to build context-aware AI applications using Large Language Models (LLMs). It's widely used for RAG, document indexing, vector search, semantic retrieval, and integrating custom data sources with LLMs.
17
17
18
18
LlamaIndex provides a unified framework with components such as:
19
19
@@ -31,4 +31,4 @@ Common use cases include browser-based AI assistants, document search applicatio
31
31
32
32
You've now learned about Google Axion C4A Arm-based virtual machines and their performance advantages for AI and RAG workloads. You were also introduced to core LlamaIndex components including document ingestion, indexing pipelines, query engines, vector stores, and LLM integrations.
33
33
34
-
Next, you'll create a firewall rule in Google Cloud Console to enable remote access to the browser-based LlamaIndex RAG application used in this Learning Path.
34
+
Next, you'll create a firewall rule in Google Cloud Console to enable remote access to the browser-based LlamaIndex RAG application that you'll create in this Learning Path.
content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/build-browser-rag-app.md
Lines changed: 20 additions & 13 deletions
@@ -1,14 +1,14 @@
1
1
---
2
-
title: Build a Browser-Based RAG Application with LlamaIndex
2
+
title: Build and test a browser-based RAG application with LlamaIndex
3
3
weight: 6
4
4
5
5
### FIXED, DO NOT MODIFY
6
6
layout: learningpathall
7
7
---
8
8
9
-
## Build a Browser-Based RAG Application with LlamaIndex
9
+
## Build a browser-based RAG application
10
10
11
-
In this section, you'll build and test a browser-based Retrieval-Augmented Generation (RAG) application using LlamaIndex on Google Cloud Axion Arm64.
11
+
In this section, you'll build and test a browser-based Retrieval-Augmented Generation (RAG) application using LlamaIndex.
12
12
13
13
You'll:
14
14
@@ -19,9 +19,9 @@ You'll:
19
19
- Create a FastAPI backend
20
20
- Query documents directly from a web browser
21
21
22
-
## Architecture
22
+
### Application architecture
23
23
24
-
The following diagram shows how the components interact. A request from the browser reaches FastAPI, which calls LlamaIndex to retrieve relevant chunks from ChromaDB and passes them to the Ollama local LLM for answer generation:
24
+
The following flow shows how the application components interact. A request from the browser reaches FastAPI, which calls LlamaIndex to retrieve relevant chunks from ChromaDB and passes them to the Ollama local LLM for answer generation:
25
25
26
26
```text
27
27
Browser UI
@@ -37,7 +37,7 @@ Ollama Local LLM
37
37
Documents
38
38
```
39
39
40
-
## Activate the Python environment
40
+
### Activate the Python environment
41
41
42
42
Activate the Python virtual environment:
43
43
@@ -46,7 +46,7 @@ cd ~/llamaindex-rag
46
46
source rag-env/bin/activate
47
47
```
48
48
49
-
## Create sample documents
49
+
### Create sample documents
50
50
51
51
Create the first document:
52
52
@@ -72,7 +72,7 @@ LlamaIndex is a framework for building context-aware LLM applications using inde
72
72
EOF
73
73
```
74
74
75
-
## Create the RAG engine
75
+
### Create the RAG engine
76
76
77
77
Create the main LlamaIndex application:
78
78
@@ -151,7 +151,7 @@ def build_query_engine():
151
151
EOF
152
152
```
153
153
154
-
## Create browser UI
154
+
### Create browser UI
155
155
156
156
Create a browser-based interface for asking questions:
157
157
@@ -268,7 +268,7 @@ async function askQuestion() {
After starting the application, open the application UI and test the application to make sure it works.
347
+
348
+
### Open browser application UI
346
349
347
350
Open a browser and navigate to:
348
351
@@ -354,7 +357,7 @@ This opens the browser-based RAG application UI.
354
357
355
358

356
359
357
-
## Test browser-based Q&A
360
+
### Test browser-based Q&A
358
361
359
362
Ask the following questions in the browser UI:
360
363
@@ -380,7 +383,9 @@ The answers will appear directly in the browser interface.
380
383
381
384
## Add your own documents
382
385
383
-
Copy your own files into the data directory:
386
+
After confirming that the application works, you can try adding your own documents.
387
+
388
+
Copy your own files into the data directory. For example:
384
389
385
390
```bash
386
391
cp yourfile.txt ~/llamaindex-rag/data/
@@ -397,3 +402,5 @@ The `build_query_engine()` function runs on startup and reads all documents from
397
402
## What you've accomplished
398
403
399
404
You've successfully built a browser-based RAG application using LlamaIndex on a Google Cloud Axion Arm64 VM. You created sample documents, generated embeddings using HuggingFace models, stored vectors in ChromaDB, exposed the backend using FastAPI, and queried custom documents directly from a browser using Ollama.
405
+
406
+
You can extend this workflow for your own LlamaIndex RAG applications on Arm-based cloud infrastructure.
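The request flow in this section (retrieve relevant chunks, then pass them to the LLM) can be sketched without any of the real components. The following is a minimal, standard-library-only illustration and not the LlamaIndex API: `tokenize`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helper names, a bag-of-words similarity stands in for the HuggingFace embeddings and the ChromaDB vector search, and the two sample chunks are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks most similar to the question (toy retriever)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Illustrative sample text only; the real application reads files from the data directory.
chunks = [
    "Google Axion C4A virtual machines are Arm-based instances on Google Cloud.",
    "LlamaIndex is a framework for building context-aware LLM applications.",
]
question = "What is LlamaIndex?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)
```

In the real application, LlamaIndex handles the retrieval step and Ollama generates the answer from a prompt assembled in a similar way. The toy word-overlap retriever here only shows the shape of that flow.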
content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/firewall.md
Lines changed: 5 additions & 9 deletions
@@ -10,24 +10,21 @@ layout: learningpathall
10
10
11
11
Create a firewall rule in Google Cloud Console to expose the required port for the browser-based LlamaIndex RAG application.
12
12
13
-
## Configure the firewall rule in Google Cloud Console
13
+
### Configure the firewall rule in Google Cloud Console
14
14
15
-
To configure a firewall rule for the LlamaIndex browser-based RAG application:
15
+
To configure a firewall rule:
16
16
17
-
1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/), go to **VPC Network > Firewall**, and select **Create firewall rule**.
17
+
1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/).
18
+
2. Go to **VPC Network > Firewall**, and select **Create firewall rule**.
18
19
19
20

20
21
21
-
2. Create a firewall rule that exposes the port required for the LlamaIndex browser application.
22
-
23
22
3. Set **Name** to `allow-llamaindex-port`, then select the network you want to bind to your virtual machine.
24
-
25
23
4. Set **Direction of traffic** to **Ingress**, set **Action on match** to **Allow**, set **Targets** to **All instances in the network**, and set **Source IPv4 ranges** to **0.0.0.0/0**.
26
24
27
25

28
26
29
27
5. Under **Protocols and ports**, select **Specified protocols and ports**.
30
-
31
28
6. Select the **TCP** checkbox. Port **8000** is used by the FastAPI server that backs the browser-based LlamaIndex RAG application. Enter:
32
29
33
30
```text
@@ -37,11 +34,10 @@ To configure a firewall rule for the LlamaIndex browser-based RAG application:
37
34

38
35
39
36
7. In the same **TCP** field, also add port `22` to allow SSH access to the VM.
40
-
41
37
8. Select **Create**.
42
38
43
39
## What you've accomplished and what's next
44
40
45
-
You've created a firewall rule that exposes port 8000 for the browser-based LlamaIndex RAG application and port 22 for SSH. The firewall rule uses the network tag `allow-llamaindex-port`, which you'll attach to your virtual machine in the next step.
41
+
You've now created a firewall rule that exposes port 8000 for the browser-based LlamaIndex RAG application and port 22 for SSH. The firewall rule uses the network tag `allow-llamaindex-port`, which you'll attach to your virtual machine in the next section.
46
42
47
43
Next, you'll create a Google Cloud Axion C4A virtual machine and connect to it using SSH.
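Once the VM exists and the application is running, one way to confirm that the firewall rule behaves as expected is to attempt a TCP connection to each port from your local machine. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper name and is not part of any Google Cloud tooling.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (replace the placeholder with your VM's external IP address):
#   for port in (22, 8000):
#       print(port, port_is_open("<EXTERNAL_IP>", port))
```

A `False` result for port 8000 doesn't necessarily mean that the firewall rule is wrong. The port also appears closed if the FastAPI server isn't running yet.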
content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/instance.md
Lines changed: 3 additions & 3 deletions
@@ -8,9 +8,9 @@ layout: learningpathall
8
8
9
9
## Set up the virtual machine
10
10
11
-
In this section, you'll create a Google Axion C4A Arm-based virtual machine (VM) on Google Cloud Platform (GCP). You'll use the `c4a-standard-4` machine type, which provides 4 vCPUs and 16 GB of memory. This VM will host your browser-based LlamaIndex RAG application.
11
+
In this section, you'll create a Google Axion C4A Arm-based virtual machine (VM). You'll use the `c4a-standard-4` machine type, which provides 4 vCPUs and 16 GB of memory. This VM will host your browser-based LlamaIndex RAG application.
12
12
13
-
## Configure the C4A virtual machine in Google Cloud Console
13
+
### Configure the C4A virtual machine in Google Cloud Console
14
14
15
15
To create a virtual machine based on the C4A instance type in the console:
16
16
@@ -25,7 +25,7 @@ To create a virtual machine based on the C4A instance type in the console:
25
25
6. For the license type, choose **Pay as you go**.
26
26
7. Increase **Size (GB)** from **10** to **100** to allocate sufficient disk space, and then click **Select**.
27
27
8. Select **Networking** from the column on the left.
28
-
9. Under **Network tags**, enter `allow-llamaindex-port` to link the VM to the firewall rule from the previous step and allow inbound access to port 8000 for the browser-based LlamaIndex RAG application and port 22 for ssh access.
28
+
9. Under **Network tags**, enter `allow-llamaindex-port` to link the VM to the firewall rule from the previous section and allow inbound access to port `8000` for the browser-based LlamaIndex RAG application and port `22` for ssh access.
29
29
10. Select **Create** to launch the virtual machine.
30
30
31
31
After the instance starts, select **SSH** next to the VM in the instance list to open a browser-based terminal session.
This ensures your system is up to date before installing anything.
65
30
66
-
## Install required packages
31
+
### Install required packages
67
32
68
33
Install Python 3.11 and the build tools needed to compile Python packages with native extensions:
69
34
@@ -99,9 +64,9 @@ Python 3.11.10
99
64
pip 22.3.1 from /usr/lib/python3.11/site-packages/pip (python 3.11)
100
65
```
101
66
102
-
## Install Docker
67
+
### (Optional) Install Docker
103
68
104
-
Docker is installed here so that you can run containerized workloads alongside the RAG pipeline if needed. For this Learning Path, ChromaDB and Ollama run natively, but Docker is available for extended use.
69
+
For this Learning Path, ChromaDB and Ollama run natively. For extended use, you can install Docker so that you can run containerized workloads alongside the RAG pipeline if needed:
105
70
106
71
```bash
107
72
sudo zypper install -y docker
@@ -117,7 +82,7 @@ sudo usermod -aG docker $USER
117
82
newgrp docker
118
83
```
119
84
120
-
**Test Docker:**
85
+
Test Docker:
121
86
122
87
```bash
123
88
docker run hello-world
@@ -130,7 +95,7 @@ Hello from Docker!
130
95
This message shows that your installation appears to be working correctly.
131
96
```
132
97
133
-
## Create project directory
98
+
### Create project directory
134
99
135
100
Create a project directory and a Python virtual environment. The virtual environment isolates the Python packages for this project from your system packages:
136
101
@@ -152,7 +117,9 @@ Upgrade pip to the latest version:
152
117
pip install --upgrade pip setuptools wheel
153
118
```
154
119
155
-
## Install Ollama
120
+
### Install Ollama
121
+
122
+
Run the following command:
156
123
157
124
```bash
158
125
curl -fsSL https://ollama.com/install.sh | sh
@@ -170,7 +137,7 @@ The output is similar to:
170
137
ollama version is 0.24.0
171
138
```
172
139
173
-
## Check Ollama is running
140
+
### Check Ollama is running
174
141
175
142
When installed using the official script, Ollama registers itself as a systemd service and starts automatically. Verify it is running:
176
143
@@ -184,7 +151,7 @@ If the service is not running, start it:
184
151
sudo systemctl start ollama
185
152
```
186
153
187
-
## Pull an LLM model
154
+
### Pull an LLM model
188
155
189
156
With Ollama running, pull the `llama3.2:1b` model. This is a lightweight 1-billion parameter model suitable for local inference on a 16 GB VM:
190
157
@@ -204,9 +171,9 @@ The output is similar to:
204
171
Retrieval-Augmented Generation (RAG) is a technique that combines a retrieval step, which fetches relevant documents from a knowledge base, with a generation step, where a large language model uses those documents to produce a grounded, context-aware response.
205
172
```
206
173
207
-
## Install LlamaIndex packages
174
+
### Install LlamaIndex packages
208
175
209
-
Install the LlamaIndex core library along with the integrations needed for Ollama, HuggingFace embeddings, and ChromaDB. FastAPI and Uvicorn are also installed here because the browser-based application you'll build in the next section uses them as the web server:
176
+
Install the LlamaIndex core library along with the integrations needed for Ollama, HuggingFace embeddings, and ChromaDB. You'll also install FastAPI and Uvicorn here because the browser-based application you'll build in the next section uses them as the web server:
210
177
211
178
```bash
212
179
pip install llama-index
@@ -221,6 +188,6 @@ pip install uvicorn
221
188
222
189
## What you've accomplished and what's next
223
190
224
-
You've successfully installed and configured LlamaIndex on a Google Cloud Axion Arm64 VM running SUSE Linux with Python 3.11. You installed Docker, configured Ollama for local LLM inference, and prepared the environment for building browser-based RAG applications using LlamaIndex and ChromaDB.
191
+
You've now successfully installed and configured LlamaIndex on a Google Cloud Axion Arm64 VM running SUSE Linux with Python 3.11. You optionally installed Docker, configured Ollama for local LLM inference, and prepared the environment for building browser-based RAG applications using LlamaIndex and ChromaDB.
225
192
226
193
Next, you'll build the RAG engine, create the browser UI, and query custom documents using a local large language model.
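Before moving on, you might want to confirm from Python that the Ollama service is responding and that the pulled model is available. The following is a minimal sketch using only the standard library. It assumes that Ollama is listening on its default local address, `http://localhost:11434`, and queries the `/api/tags` endpoint, which lists locally available models. `list_ollama_models` is a hypothetical helper name.

```python
import json
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=5.0):
    """Return the model names reported by the Ollama /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    # Each entry in "models" is expected to carry a "name" such as "llama3.2:1b".
    return [model["name"] for model in payload.get("models", [])]


# Example usage on the VM:
#   print(list_ollama_models())
```

If the call raises a connection error, check the service status with `systemctl` as described in this section.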