content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/build-browser-rag-app.md
+12 -15 (12 additions, 15 deletions)
@@ -19,17 +19,10 @@ You'll:
19
19
- Create a FastAPI backend
20
20
- Query documents directly from a web browser
21
21
22
-
## Terminal usage
23
-
24
-
You'll use:
25
-
26
-
- **Terminal A** → FastAPI, file creation, and testing
27
-
- **Terminal B** → Ollama server
28
-
29
-
Leave Terminal B running throughout the rest of this Learning Path.
30
-
31
22
## Architecture
32
23
24
+
The following diagram shows how the components interact. A request from the browser reaches FastAPI, which calls LlamaIndex to retrieve relevant chunks from ChromaDB and passes them to the Ollama local LLM for answer generation:
25
+
33
26
```text
34
27
Browser UI
35
28
↓
@@ -42,11 +35,11 @@ ChromaDB Vector Store
42
35
Ollama Local LLM
43
36
↓
44
37
Documents
45
-
````
38
+
```
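The retrieve-then-generate flow in the diagram can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `DOCUMENTS`, `tokenize()`, `retrieve()`, `generate()`, and `answer()` are hypothetical names, word overlap stands in for the vector search that ChromaDB performs, and building a prompt string stands in for the call to the Ollama model. It is not the application code from this Learning Path.

```python
# Toy stand-ins only: the real application uses LlamaIndex, ChromaDB, and Ollama.
from typing import List, Set

# Hypothetical in-memory "document store" for illustration.
DOCUMENTS = [
    "Axion is Google Cloud's Arm-based processor.",
    "ChromaDB stores embeddings for similarity search.",
    "FastAPI serves the browser UI and the query endpoint.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into words, stripping basic punctuation."""
    return {word.strip(".,?!'\"").lower() for word in text.split()}


def retrieve(question: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Rank chunks by words shared with the question (stand-in for vector search)."""
    question_words = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def generate(question: str, context: List[str]) -> str:
    """Build the prompt a model would receive (stand-in for the LLM call)."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


def answer(question: str) -> str:
    """Run the two stages in order: retrieve context, then generate."""
    return generate(question, retrieve(question, DOCUMENTS))


if __name__ == "__main__":
    print(answer("What does ChromaDB store?"))
```

The point of the sketch is the ordering: the question first selects context, and only then is the selected context passed on for generation.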
46
39
47
40
## Activate the Python environment
48
41
49
-
Open Terminal A and activate the Python virtual environment:
42
+
Activate the Python virtual environment:
50
43
51
44
```bash
52
45
cd ~/llamaindex-rag
@@ -320,9 +313,13 @@ EOF
320
313
321
314
## Start the browser-based RAG application
322
315
323
-
Make sure Ollama is still running in Terminal B.
316
+
Verify that Ollama is still running before starting the application:
317
+
318
+
```bash
319
+
sudo systemctl status ollama
320
+
```
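If you prefer a check that does not depend on systemd, you can also test whether something is listening on Ollama's port. The following is a minimal sketch, assuming Ollama is listening on its default port `11434` on the local machine; `is_port_open()` is a hypothetical helper name, and an open port only shows that a process accepted the connection, not that the model is loaded.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: 11434 is Ollama's default port; adjust if yours differs.
    print("Ollama reachable:", is_port_open("127.0.0.1", 11434))
```

The same helper can be pointed at port `8000` once Uvicorn is running to confirm that the FastAPI server is accepting connections.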
324
321
325
-
In Terminal A run:
322
+
Navigate to the project directory and activate the virtual environment:
326
323
327
324
```bash
328
325
cd ~/llamaindex-rag
@@ -389,13 +386,13 @@ Copy your own files into the data directory:
389
386
cp yourfile.txt ~/llamaindex-rag/data/
390
387
```
391
388
392
-
First stop the server and then restart FastAPI:
389
+
Stop the running FastAPI server by pressing `Ctrl+C` in the terminal where Uvicorn is running. Then restart it:
393
390
394
391
```bash
395
392
uvicorn api:app --host 0.0.0.0 --port 8000
396
393
```
397
394
398
-
The application automatically indexes the new documents and makes them searchable through the browser UI.
395
+
The `build_query_engine()` function runs on startup and reads all documents from the `data/` directory each time the server starts. Restarting the server causes LlamaIndex to ingest the new file, generate its embeddings, and store them in ChromaDB, making the new document searchable through the browser UI.
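Before restarting, it can help to confirm which files are actually present in the data directory. The following is a minimal sketch, assuming the `~/llamaindex-rag/data` path used in this Learning Path; `list_data_files()` is a hypothetical helper, not part of the application. It lists only top-level regular files, and whether the application also reads subdirectories or filters by file type is not covered here.

```python
from pathlib import Path
from typing import List


def list_data_files(data_dir: Path) -> List[str]:
    """Return the names of regular files directly inside data_dir, sorted by name."""
    if not data_dir.is_dir():
        # A missing directory simply yields an empty list.
        return []
    return sorted(p.name for p in data_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # Assumption: the project lives in ~/llamaindex-rag as in the earlier steps.
    data_dir = Path.home() / "llamaindex-rag" / "data"
    for name in list_data_files(data_dir):
        print(name)
```

If the file you copied does not appear in the list, check the `cp` destination path before restarting the server.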
content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/firewall.md
+5 -7 (5 additions, 7 deletions)
@@ -28,22 +28,20 @@ To configure a firewall rule for the LlamaIndex browser-based RAG application:
28
28
29
29
5. Under **Protocols and ports**, select **Specified protocols and ports**.
30
30
31
-
6. Select the **TCP** checkbox and enter:
31
+
6. Select the **TCP** checkbox. Port **8000** is used by the FastAPI server that backs the browser-based LlamaIndex RAG application. Enter:
32
32
33
33
```text
34
34
8000
35
-
````
36
-
37
-
Use port mapping **8000** for the browser-based LlamaIndex RAG application running with FastAPI.
35
+
```
38
36
39
37

40
38
41
-
7. Also add port 22 in **TCP** checkbox for ssh access.
39
+
7. In the same **TCP** field, also add port `22` to allow SSH access to the VM.
42
40
43
41
8. Select **Create**.
44
42
45
43
## What you've accomplished and what's next
46
44
47
-
You've created a firewall rule to expose the browser-based LlamaIndex RAG application. You also enabled external access to query documents and interact with the application directly from a web browser.
45
+
You've created a firewall rule that exposes port 8000 for the browser-based LlamaIndex RAG application and port 22 for SSH. The firewall rule uses the network tag `allow-llamaindex-port`, which you'll attach to your virtual machine in the next step.
48
46
49
-
Next, you'll access the browser-based RAG application using the external IP address of your Google Cloud Axion virtual machine.
47
+
Next, you'll create a Google Cloud Axion C4A virtual machine and connect to it using SSH.
content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/instance.md
+1 -1 (1 addition, 1 deletion)
@@ -24,7 +24,7 @@ To create a virtual machine based on the C4A instance type in the console:
24
24
5. Under **OS and storage**, select **Change** and then choose an Arm64-based operating system image. For this Learning Path, select **SUSE Linux Enterprise Server**.
25
25
6. For the license type, choose **Pay as you go**.
26
26
7. Increase **Size (GB)** from **10** to **100** to allocate sufficient disk space, and then click **Select**.
27
-
8. Select **Networking** from column on the left
27
+
8. Select **Networking** from the column on the left.
28
28
9. Under **Network tags**, enter `allow-llamaindex-port` to link the VM to the firewall rule from the previous step. This allows inbound access to port 8000 for the browser-based LlamaIndex RAG application and to port 22 for SSH access.
29
29
10. Select **Create** to launch the virtual machine.
content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/setup-llamaindex-rag.md
+26 -30 (26 additions, 30 deletions)
@@ -6,18 +6,16 @@ weight: 5
6
6
layout: learningpathall
7
7
---
8
8
9
-
## Install and Configure LlamaIndex on Google Cloud Axion
9
+
## Prepare the environment
10
10
11
11
In this section, you will prepare a Google Cloud Axion Arm64 VM for running a browser-based RAG application using LlamaIndex.
12
12
13
13
You will:
14
14
15
15
- Verify the VM architecture
16
16
- Install required system packages
17
-
- Install Docker
18
17
- Install Python 3.11
19
-
- Install Ollama
20
-
- Pull a lightweight LLM model
18
+
- Install Ollama and pull a lightweight LLM model
21
19
- Install LlamaIndex and required Python packages
22
20
23
21
@@ -26,16 +24,11 @@ You will:
26
24
```text
27
25
Cloud: Google Cloud Platform
28
26
VM Type: C4A Axion ARM64
29
-
OS: SUSE Linux Enterprise Server 15 SP6
27
+
OS: SUSE Linux Enterprise Server 15 SP5
30
28
Architecture: aarch64
31
29
RAM: 16 GB or higher recommended
32
30
```
33
31
34
-
## Terminal usage You'll use
35
-
36
-
- **Terminal A** → setup, package installation, FastAPI, and testing
37
-
-**Terminal B** → Ollama server Open both terminals connected to the VM before starting.
38
-
39
32
## Verify VM architecture
40
33
41
34
```bash
@@ -70,8 +63,9 @@ sudo zypper update -y
70
63
71
64
This ensures your system is up to date before installing anything.
72
65
73
-
## Install required packages:
74
-
Now install Python 3.11 and other tools:
66
+
## Install required packages
67
+
68
+
Install Python 3.11 and the build tools needed to compile Python packages with native extensions:
75
69
76
70
```bash
77
71
sudo zypper install -y \
@@ -92,7 +86,7 @@ python311-setuptools \
92
86
python311-wheel
93
87
```
94
88
95
-
**Verify Python:**
89
+
Verify Python is installed correctly:
96
90
97
91
```bash
98
92
python3.11 --version
@@ -105,15 +99,17 @@ Python 3.11.10
105
99
pip 22.3.1 from /usr/lib/python3.11/site-packages/pip (python 3.11)
106
100
```
107
101
108
-
## Install Docker and Add current user to Docker group
102
+
## Install Docker
103
+
104
+
Docker is installed here so that you can run containerized workloads alongside the RAG pipeline if needed. For this Learning Path, ChromaDB and Ollama run natively, but Docker is available for extended use.
109
105
110
106
```bash
111
107
sudo zypper install -y docker
112
108
sudo systemctl enable docker
113
109
sudo systemctl start docker
114
110
```
115
111
116
-
**Check Docker Add current user to Docker group:**
112
+
Verify Docker is running and add your user to the `docker` group so you don't need `sudo` for Docker commands:
117
113
118
114
```bash
119
115
sudo systemctl status docker
@@ -136,19 +132,21 @@ This message shows that your installation appears to be working correctly.
136
132
137
133
## Create project directory
138
134
135
+
Create a project directory and a Python virtual environment. The virtual environment isolates the Python packages for this project from your system packages:
136
+
139
137
```bash
140
138
mkdir -p ~/llamaindex-rag/data
141
139
cd ~/llamaindex-rag
142
140
```
143
141
144
-
**Create and Activate Python virtual environment:**
142
+
Create and activate the Python virtual environment:
When Ollama is installed via the official script, it sets up a systemd background service and automatically starts the service. Use the following command to check the status of ollama service.
175
+
When installed using the official script, Ollama registers itself as a systemd service and starts automatically. Verify it is running:
178
176
179
177
```bash
180
178
sudo systemctl status ollama
181
179
```
182
180
183
-
Leave Terminal B open and don't run any other commands in it. Ollama must stay running throughout the rest of this Learning Path.
184
-
185
-
## Open a new terminal
186
-
187
-
Open a second SSH terminal and run:
181
+
If the service is not running, start it:
188
182
189
183
```bash
190
-
cd ~/llamaindex-rag
191
-
source rag-env/bin/activate
184
+
sudo systemctl start ollama
192
185
```
193
186
194
187
## Pull an LLM model
195
188
189
+
With Ollama running, pull the `llama3.2:1b` model. This is a lightweight 1-billion-parameter model suitable for local inference on a 16 GB VM:
190
+
196
191
```bash
197
192
ollama pull llama3.2:1b
198
193
```
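To see why a model of this size fits comfortably on a 16 GB VM, you can estimate the memory needed for the weights alone. The following is a rough sketch, assuming exactly 1 billion parameters (the figure quoted above) and counting only weight storage; `weight_memory_gb()` is a hypothetical helper, and real usage is higher because of the context cache and runtime overhead. The precision of the weights that Ollama actually downloads is not stated here, so several bit widths are shown.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 1e9  # Assumption: the "1-billion parameter" figure, treated as exact.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(params, bits):.1f} GB")
```

Even at 16 bits per parameter, the estimate is about 2 GB, which leaves most of the VM's memory for the operating system, the embedding model, and ChromaDB.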
199
194
200
-
**Test the model:**
195
+
Test that the model responds correctly:
201
196
202
197
```bash
203
198
ollama run llama3.2:1b "Explain RAG in one sentence."
@@ -206,12 +201,13 @@ ollama run llama3.2:1b "Explain RAG in one sentence."
206
201
The output is similar to:
207
202
208
203
```output
209
-
RAG (Resource Allocation Group) is a method of allocating resources, such as people or equipment, to tasks based on their criticality and urgency,
210
-
prioritizing high-priority tasks that have significant consequences if not completed on time.
204
+
Retrieval-Augmented Generation (RAG) is a technique that combines a retrieval step, which fetches relevant documents from a knowledge base, with a generation step, where a large language model uses those documents to produce a grounded, context-aware response.
211
205
```
212
206
213
207
## Install LlamaIndex packages
214
208
209
+
Install the LlamaIndex core library along with the integrations needed for Ollama, HuggingFace embeddings, and ChromaDB. FastAPI and Uvicorn are also installed here because the browser-based application you'll build in the next section uses them as the web server: