content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/_index.md
+2 -6 (2 additions & 6 deletions)
@@ -1,11 +1,7 @@
1
1
---
2
-
title: Build RAG applications with LlamaIndex on Google Cloud C4A Axion VM
2
+
title: Build RAG applications with LlamaIndex on a Google Cloud C4A virtual machine
3
3
4
-
draft: true
5
-
cascade:
6
-
draft: true
7
-
8
-
description: Set up LlamaIndex on Google Cloud C4A Axion Arm VMs running SUSE Linux to build browser-based Retrieval-Augmented Generation (RAG) applications using local LLMs, vector databases, and FastAPI.
4
+
description: Set up LlamaIndex on Google Axion-based C4A Arm64 VMs running SUSE Linux to build browser-based Retrieval-Augmented Generation (RAG) applications using local LLMs, vector databases, and FastAPI.
title: Learn about LlamaIndex and Google Axion C4A for RAG applications
2
+
title: Learn about LlamaIndex and Google Cloud C4A for RAG applications
3
+
description: Learn how LlamaIndex supports browser-based RAG applications on Google Axion-based C4A instances.
3
4
weight: 2
4
5
5
6
layout: "learningpathall"
6
7
---
7
8
8
-
## Google Axion C4A Arm instances for AI and RAG workloads
9
+
## Google Cloud C4A instances for AI and RAG workloads
9
10
10
-
Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for modern cloud workloads such as AI applications, vector databases, Retrieval-Augmented Generation (RAG) pipelines, and scalable inference services.
11
+
Google Cloud C4A is a family of Arm-based virtual machines (VMs) built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these VMs offer strong performance for modern cloud workloads.
11
12
12
13
The C4A series provides a cost-effective alternative to x86 virtual machines while using the scalability and performance benefits of the Arm architecture in Google Cloud.
13
14
14
15
## LlamaIndex for RAG and context-aware AI applications on Arm
15
16
16
-
LlamaIndex is an open-source framework designed to build context-aware AI applications using Large Language Models (LLMs). It's widely used for Retrieval-Augmented Generation (RAG), document indexing, vector search, semantic retrieval, and integrating custom data sources with LLMs.
17
+
LlamaIndex is an open-source framework designed to build context-aware AI applications using large language models (LLMs). It's widely used for Retrieval-Augmented Generation (RAG), document indexing, vector search, semantic retrieval, and integrating custom data sources with LLMs.
17
18
18
19
LlamaIndex provides a unified framework with components such as:
19
20
20
-
* Document loaders for ingesting custom data
21
-
* Indexing pipelines for structured retrieval workflows
22
-
* Query engines for context-aware question answering
23
-
* Vector store integrations for scalable embedding search
24
-
* LLM integrations for generating grounded responses
21
+
- Document loaders for ingesting custom data
22
+
- Indexing pipelines for structured retrieval workflows
23
+
- Query engines for context-aware question answering
24
+
- Vector store integrations for scalable embedding search
25
+
- LLM integrations for generating grounded responses
25
26
26
-
Running LlamaIndex on Google Axion C4A Arm-based infrastructure enables efficient execution of AI and RAG workloads by using multi-core Arm CPUs and optimized memory performance. This results in improved performance per watt, reduced infrastructure costs, and better scalability for browser-based AI applications and local inference pipelines.
27
+
Running LlamaIndex on Google Cloud C4A Arm-based infrastructure enables efficient execution of AI and RAG workloads by using multi-core Arm CPUs and optimized memory performance. This results in improved performance per watt, reduced infrastructure costs, and better scalability for browser-based AI applications and local inference pipelines.
27
28
28
-
Common use cases include browser-based AI assistants, document search applications, semantic retrieval systems, vector database integrations, enterprise knowledge bases, and context-aware chatbot applications.
29
+
In this Learning Path, you'll use these components to build a browser-based RAG application that answers questions from custom documents.
29
30
30
31
## What you've learned and what's next
31
32
32
-
You've now learned about Google Axion C4A Arm-based virtual machines and their performance advantages for AI and RAG workloads. You were also introduced to core LlamaIndex components including document ingestion, indexing pipelines, query engines, vector stores, and LLM integrations.
33
+
You've now learned about Arm-based Google Cloud C4A VMs and their performance advantages for AI and RAG workloads. You were also introduced to core LlamaIndex components including document ingestion, indexing pipelines, query engines, vector stores, and LLM integrations.
33
34
34
-
Next, you'll create a firewall rule in Google Cloud Console to enable remote access to the browser-based LlamaIndex RAG application used in this Learning Path.
35
+
Next, you'll create a firewall rule in Google Cloud Console to enable remote access to the browser-based LlamaIndex RAG application that you'll create in this Learning Path.
content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/build-browser-rag-app.md
+27 -18 (27 additions & 18 deletions)
@@ -1,14 +1,15 @@
1
1
---
2
-
title: Build a Browser-Based RAG Application with LlamaIndex
2
+
title: Build and test a browser-based RAG application with LlamaIndex
3
+
description: Learn how to build a browser-based RAG application with LlamaIndex, ChromaDB, Ollama, and FastAPI on an Arm-based Google Cloud C4A VM.
3
4
weight: 6
4
5
5
6
### FIXED, DO NOT MODIFY
6
7
layout: learningpathall
7
8
---
8
9
9
-
## Build a Browser-Based RAG Application with LlamaIndex
10
+
## Build a browser-based RAG application
10
11
11
-
In this section, you'll build and test a browser-based Retrieval-Augmented Generation (RAG) application using LlamaIndex on Google Cloud Axion Arm64.
12
+
In this section, you'll build and test a browser-based Retrieval-Augmented Generation (RAG) application using LlamaIndex.
12
13
13
14
You'll:
14
15
@@ -19,9 +20,9 @@ You'll:
19
20
- Create a FastAPI backend
20
21
- Query documents directly from a web browser
21
22
22
-
## Architecture
23
+
### Application architecture
23
24
24
-
The following diagram shows how the components interact. A request from the browser reaches FastAPI, which calls LlamaIndex to retrieve relevant chunks from ChromaDB and passes them to the Ollama local LLM for answer generation:
25
+
The following flow shows how the application components interact. A request from the browser reaches FastAPI, which calls LlamaIndex to retrieve relevant chunks from ChromaDB and passes them to the Ollama local LLM for answer generation:
25
26
26
27
```text
27
28
Browser UI
@@ -30,14 +31,14 @@ FastAPI
30
31
↓
31
32
LlamaIndex
32
33
↓
33
-
ChromaDB Vector Store
34
+
ChromaDB vector store
34
35
↓
35
-
Ollama Local LLM
36
+
Ollama local LLM
36
37
↓
37
38
Documents
38
39
```
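
To make the flow above more concrete, the following is a minimal, standard-library-only sketch of the retrieve-then-generate pattern. It is not the application code from this Learning Path and does not call LlamaIndex, ChromaDB, or Ollama. The `DOCUMENTS` dictionary, the bag-of-words `embed()` function, and the `answer()` helper are hypothetical stand-ins chosen for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory stand-in for the documents and the vector store.
DOCUMENTS = {
    "axion.txt": "Google Axion is a custom Arm-based CPU built on Arm Neoverse V2 cores.",
    "llamaindex.txt": "LlamaIndex is a framework for building context-aware LLM applications.",
}


def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector (not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, top_k=1):
    """Rank documents by similarity to the question and keep the best top_k."""
    query_vector = embed(question)
    ranked = sorted(
        DOCUMENTS.items(),
        key=lambda item: cosine(query_vector, embed(item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, chunks):
    """Combine retrieved text and the question into a single prompt string."""
    context = "\n".join(text for _, text in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(question):
    """Return the prompt that a real application would send to the local LLM."""
    return build_prompt(question, retrieve(question))
```

In the real application, an embedding model replaces `embed()`, the vector store performs the similarity search, and the prompt is passed to the local LLM instead of being returned.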
39
40
40
-
## Activate the Python environment
41
+
### Activate the Python environment
41
42
42
43
Activate the Python virtual environment:
43
44
@@ -46,7 +47,7 @@ cd ~/llamaindex-rag
46
47
source rag-env/bin/activate
47
48
```
48
49
49
-
## Create sample documents
50
+
### Create sample documents
50
51
51
52
Create the first document:
52
53
@@ -72,7 +73,7 @@ LlamaIndex is a framework for building context-aware LLM applications using inde
72
73
EOF
73
74
```
74
75
75
-
## Create the RAG engine
76
+
### Create the RAG engine
76
77
77
78
Create the main LlamaIndex application:
78
79
@@ -151,7 +152,7 @@ def build_query_engine():
151
152
EOF
152
153
```
153
154
154
-
## Create browser UI
155
+
### Create browser UI
155
156
156
157
Create a browser-based interface for asking questions:
157
158
@@ -268,7 +269,7 @@ async function askQuestion() {
268
269
EOF
269
270
```
270
271
271
-
## Create FastAPI backend
272
+
### Create FastAPI backend
272
273
273
274
Create the FastAPI backend application:
274
275
@@ -340,9 +341,13 @@ INFO: Waiting for application startup.
340
341
INFO: Application startup complete.
341
342
INFO: Uvicorn running on http://0.0.0.0:8000
342
343
```
344
+
Keep the terminal open for testing the application.
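
As an optional check, you can confirm that the server port accepts TCP connections before you open the browser. The following is a minimal sketch that uses only the Python standard library. The `port_is_open()` helper is a hypothetical name introduced here, and the check assumes the server listens on port `8000` as shown in the Uvicorn output.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Because the server occupies the first terminal, run this from a second terminal. Calling `port_is_open("127.0.0.1", 8000)` on the VM checks the server locally. Calling it from your own machine with the VM's external IP address also exercises the firewall rule.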
343
345
346
+
## Test the browser-based RAG application
344
347
345
-
## Open browser application
348
+
After starting the application, test it by opening the UI and asking a few questions.
349
+
350
+
### Open browser application UI
346
351
347
352
Open a browser and navigate to:
348
353
@@ -354,7 +359,7 @@ This opens the browser-based RAG application UI.
354
359
355
360

356
361
357
-
## Test browser-based Q&A
362
+
### Test browser-based Q&A
358
363
359
364
Ask the following questions in the browser UI:
360
365
@@ -378,15 +383,17 @@ What is Google Cloud Axion?
378
383
379
384
The answers will appear directly in the browser interface.
380
385
381
-
## Add your own documents
386
+
## (Optional) Add your own documents
387
+
388
+
After confirming that the application works, you can try adding your own documents.
382
389
383
-
Copy your own files into the data directory:
390
+
Copy your own files into the data directory. For example:
384
391
385
392
```bash
386
393
cp yourfile.txt ~/llamaindex-rag/data/
387
394
```
388
395
389
-
Stop the running FastAPI server by pressing `Ctrl+C` in the terminal where Uvicorn is running. Then restart it:
396
+
Stop the running FastAPI server by pressing `Ctrl + C` in the terminal where Uvicorn is running. Then restart it:
390
397
391
398
```bash
392
399
uvicorn api:app --host 0.0.0.0 --port 8000
@@ -396,4 +403,6 @@ The `build_query_engine()` function runs on startup and reads all documents from
396
403
397
404
## What you've accomplished
398
405
399
-
You've successfully built a browser-based RAG application using LlamaIndex on a Google Cloud Axion Arm64 VM. You created sample documents, generated embeddings using HuggingFace models, stored vectors in ChromaDB, exposed the backend using FastAPI, and queried custom documents directly from a browser using Ollama.
406
+
You've now built a browser-based RAG application using LlamaIndex on an Arm-based Google Cloud C4A VM. You created sample documents, generated embeddings using Hugging Face models, stored vectors in ChromaDB, exposed the backend using FastAPI, and queried custom documents directly from a browser using Ollama.
407
+
408
+
You can extend this workflow for your own LlamaIndex RAG applications on Arm-based cloud infrastructure.
title: Configure Google Cloud firewall rules for LlamaIndex
3
+
description: Learn how to create a Google Cloud firewall rule that allows browser access to a FastAPI-based LlamaIndex RAG application.
3
4
weight: 3
4
5
5
6
### FIXED, DO NOT MODIFY
@@ -8,40 +9,32 @@ layout: learningpathall
8
9
9
10
## Allow inbound access to the LlamaIndex browser application
10
11
11
-
Create a firewall rule in Google Cloud Console to expose the required port for the browser-based LlamaIndex RAG application.
12
+
Create a firewall rule in Google Cloud console to expose port 8000 for the browser-based LlamaIndex RAG application.
12
13
13
-
## Configure the firewall rule in Google Cloud Console
14
+
### Configure the firewall rule in the Google Cloud console
14
15
15
-
To configure a firewall rule for the LlamaIndex browser-based RAG application:
16
+
To configure a firewall rule:
16
17
17
-
1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/), go to **VPC Network > Firewall**, and select **Create firewall rule**.
18
+
1. Navigate to the [Google Cloud console](https://console.cloud.google.com/).
19
+
2. Go to **VPC Network > Firewall**, and select **Create firewall rule**.
18
20
19
-

20
-
21
-
2. Create a firewall rule that exposes the port required for the LlamaIndex browser application.
21
+

22
22
23
23
3. Set **Name** to `allow-llamaindex-port`, then select the network you want to bind to your virtual machine.
24
-
25
24
4. Set **Direction of traffic** to **Ingress**, set **Action on match** to **Allow**, set **Targets** to **All instances in the network**, and set **Source IPv4 ranges** to **0.0.0.0/0**.
26
25
27
-

26
+

28
27
29
28
5. Under **Protocols and ports**, select **Specified protocols and ports**.
29
+
6. Select the **TCP** checkbox. For **Ports**, enter `8000`. Port `8000` is used by the FastAPI server that backs the browser-based LlamaIndex RAG application.
30
30
31
-
6. Select the **TCP** checkbox. Port **8000** is used by the FastAPI server that backs the browser-based LlamaIndex RAG application. Enter:
32
-
33
-
```text
34
-
8000
35
-
```
36
-
37
-

31
+

38
32
39
33
7. In the same **TCP** field, also add port `22` to allow SSH access to the VM.
40
-
41
34
8. Select **Create**.
42
35
43
36
## What you've accomplished and what's next
44
37
45
-
You've created a firewall rule that exposes port 8000 for the browser-based LlamaIndex RAG application and port 22 for SSH. The firewall rule uses the network tag `allow-llamaindex-port`, which you'll attach to your virtual machine in the next step.
38
+
You've now created a firewall rule that exposes port 8000 for the browser-based LlamaIndex RAG application and port 22 for SSH. You'll attach this firewall rule to your virtual machine in the next section.
46
39
47
-
Next, you'll create a Google Cloud Axion C4A virtual machine and connect to it using SSH.
40
+
Next, you'll create a Google Cloud C4A virtual machine and connect to it using SSH.
title: Create a Google Axion C4A virtual machine for LlamaIndex
2
+
title: Create a Google Cloud C4A virtual machine for LlamaIndex
3
+
description: Learn how to create an Arm-based Google Cloud C4A virtual machine powered by Google Axion and connect to it with browser-based SSH.
3
4
weight: 4
4
5
5
6
### FIXED, DO NOT MODIFY
@@ -8,36 +9,36 @@ layout: learningpathall
8
9
9
10
## Set up the virtual machine
10
11
11
-
In this section, you'll create a Google Axion C4A Arm-based virtual machine (VM) on Google Cloud Platform (GCP). You'll use the `c4a-standard-4` machine type, which provides 4 vCPUs and 16 GB of memory. This VM will host your browser-based LlamaIndex RAG application.
12
+
In this section, you'll create a Google Cloud C4A Arm-based virtual machine (VM). You'll use the `c4a-standard-4` machine type, which provides four vCPUs and 16 GB of memory. This VM will host your browser-based LlamaIndex RAG application.
12
13
13
-
## Configure the C4A virtual machine in Google Cloud Console
14
+
### Configure the C4A virtual machine in the Google Cloud console
14
15
15
16
To create a virtual machine based on the C4A instance type in the console:
16
17
17
-
1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/).
18
-
2. Go to **Compute Engine** > **VM Instances** and select **Create Instance**.
18
+
1. Navigate to the [Google Cloud console](https://console.cloud.google.com/).
19
+
2. Go to **Compute Engine** > **VM instances** and select **Create instance**.
19
20
3. Under **Machine configuration**, populate fields such as **Instance name**, **Region**, and **Zone**.
20
21
4. Set **Series** to `C4A`, then select `c4a-standard-4` for **Machine type**.
21
22
22
-

23
+

23
24
24
25
5. Under **OS and storage**, select **Change** and then choose an Arm64-based operating system image. For this Learning Path, select **SUSE Linux Enterprise Server**.
25
26
6. For the license type, choose **Pay as you go**.
26
-
7. Increase **Size (GB)** from **10** to **100** to allocate sufficient disk space, and then click **Select**.
27
+
7. Increase **Size (GB)** from **10** to **100** to allocate sufficient disk space, and then select **Select**.
27
28
8. Select **Networking** from the column on the left.
28
-
9. Under **Network tags**, enter `allow-llamaindex-port` to link the VM to the firewall rule from the previous step and allow inbound access to port 8000 for the browser-based LlamaIndex RAG application and port 22 for ssh access.
29
+
9. Under **Network tags**, enter `allow-llamaindex-port` to link the VM to the firewall rule from the previous section and allow inbound access to port `8000` for the browser-based LlamaIndex RAG application and port `22` for SSH access.
29
30
10. Select **Create** to launch the virtual machine.
30
31
31
32
After the instance starts, select **SSH** next to the VM in the instance list to open a browser-based terminal session.
32
33
33
-

34
+

34
35
35
36
A new browser window opens with a terminal connected to your VM.
36
37
37
38

38
39
39
40
## What you've accomplished and what's next
40
41
41
-
You've now provisioned a Google Axion C4A Arm VM and connected to it using SSH.
42
+
You've now provisioned a Google Cloud C4A VM and connected to it using SSH.
42
43
43
44
Next, you'll install LlamaIndex, Ollama, ChromaDB, and the required dependencies on your VM.