
Commit a065dff

Merge pull request #3378 from anupras-mohapatra-arm/servers-and-cloud-computing
LlamaIndex RAG app on Google Cloud C4A LP editorial review
2 parents b797aa1 + 99206c2 commit a065dff

6 files changed

Lines changed: 83 additions & 151 deletions

content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/_index.md

Lines changed: 2 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -1,11 +1,7 @@
11
---
2-
title: Build RAG applications with LlamaIndex on Google Cloud C4A Axion VM
2+
title: Build RAG applications with LlamaIndex on a Google Cloud C4A virtual machine
33

4-
draft: true
5-
cascade:
6-
draft: true
7-
8-
description: Set up LlamaIndex on Google Cloud C4A Axion Arm VMs running SUSE Linux to build browser-based Retrieval-Augmented Generation (RAG) applications using local LLMs, vector databases, and FastAPI.
4+
description: Set up LlamaIndex on Google Axion-based C4A Arm64 VMs running SUSE Linux to build browser-based Retrieval-Augmented Generation (RAG) applications using local LLMs, vector databases, and FastAPI.
95

106
minutes_to_complete: 30
117

Lines changed: 14 additions & 13 deletions
@@ -1,34 +1,35 @@
11
---
2-
title: Learn about LlamaIndex and Google Axion C4A for RAG applications
2+
title: Learn about LlamaIndex and Google Cloud C4A for RAG applications
3+
description: Learn how LlamaIndex supports browser-based RAG applications on Google Axion-based C4A instances.
34
weight: 2
45

56
layout: "learningpathall"
67
---
78

8-
## Google Axion C4A Arm instances for AI and RAG workloads
9+
## Google Cloud C4A instances for AI and RAG workloads
910

10-
Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for modern cloud workloads such as AI applications, vector databases, Retrieval-Augmented Generation (RAG) pipelines, and scalable inference services.
11+
Google Cloud C4A is a family of Arm-based virtual machines (VMs) built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these VMs offer strong performance for modern cloud workloads.
1112

1213
The C4A series provides a cost-effective alternative to x86 virtual machines while using the scalability and performance benefits of the Arm architecture in Google Cloud.
1314

1415
## LlamaIndex for RAG and context-aware AI applications on Arm
1516

16-
LlamaIndex is an open-source framework designed to build context-aware AI applications using Large Language Models (LLMs). It's widely used for Retrieval-Augmented Generation (RAG), document indexing, vector search, semantic retrieval, and integrating custom data sources with LLMs.
17+
LlamaIndex is an open-source framework designed to build context-aware AI applications using large language models (LLMs). It's widely used for Retrieval-Augmented Generation (RAG), document indexing, vector search, semantic retrieval, and integrating custom data sources with LLMs.
1718

1819
LlamaIndex provides a unified framework with components such as:
1920

20-
* Document loaders for ingesting custom data
21-
* Indexing pipelines for structured retrieval workflows
22-
* Query engines for context-aware question answering
23-
* Vector store integrations for scalable embedding search
24-
* LLM integrations for generating grounded responses
21+
- Document loaders for ingesting custom data
22+
- Indexing pipelines for structured retrieval workflows
23+
- Query engines for context-aware question answering
24+
- Vector store integrations for scalable embedding search
25+
- LLM integrations for generating grounded responses
2526

26-
Running LlamaIndex on Google Axion C4A Arm-based infrastructure enables efficient execution of AI and RAG workloads by using multi-core Arm CPUs and optimized memory performance. This results in improved performance per watt, reduced infrastructure costs, and better scalability for browser-based AI applications and local inference pipelines.
27+
Running LlamaIndex on Google Cloud C4A Arm-based infrastructure enables efficient execution of AI and RAG workloads by using multi-core Arm CPUs and optimized memory performance. This results in improved performance per watt, reduced infrastructure costs, and better scalability for browser-based AI applications and local inference pipelines.
2728

28-
Common use cases include browser-based AI assistants, document search applications, semantic retrieval systems, vector database integrations, enterprise knowledge bases, and context-aware chatbot applications.
29+
In this Learning Path, you'll use these components to build a browser-based RAG application that answers questions from custom documents.
2930

3031
## What you've learned and what's next
3132

32-
You've now learned about Google Axion C4A Arm-based virtual machines and their performance advantages for AI and RAG workloads. You were also introduced to core LlamaIndex components including document ingestion, indexing pipelines, query engines, vector stores, and LLM integrations.
33+
You've now learned about Arm-based Google Cloud C4A VMs and their performance advantages for AI and RAG workloads. You were also introduced to core LlamaIndex components including document ingestion, indexing pipelines, query engines, vector stores, and LLM integrations.
3334

34-
Next, you'll create a firewall rule in Google Cloud Console to enable remote access to the browser-based LlamaIndex RAG application used in this Learning Path.
35+
Next, you'll create a firewall rule in Google Cloud Console to enable remote access to the browser-based LlamaIndex RAG application that you'll create in this Learning Path.
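
Note on the file above: the component list (document loaders, indexing pipelines, query engines, LLM integrations) describes a load, index, retrieve, then generate flow. The following dependency-free Python sketch mimics those stages with a toy word-overlap retriever. It does not use the LlamaIndex API. The function names `build_index`, `retrieve`, and `build_prompt` and the two sample documents are assumptions made for illustration only, and word overlap stands in for the embedding search a real vector store would perform.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each document name to a bag of words (a stand-in for embeddings)."""
    return {name: Counter(tokenize(body)) for name, body in documents.items()}


def retrieve(index, question, top_k=1):
    """Rank documents by shared tokens with the question; return the best names."""
    query = Counter(tokenize(question))
    scores = {name: sum((bag & query).values()) for name, bag in index.items()}
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:top_k]


def build_prompt(documents, names, question):
    """Combine the retrieved text and the question into a prompt for an LLM."""
    context = "\n".join(documents[name] for name in names)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical sample documents, not the files created in the Learning Path.
documents = {
    "axion.txt": "Google Axion is an Arm-based CPU built on Neoverse V2 cores.",
    "llamaindex.txt": "LlamaIndex is a framework for building context-aware LLM applications.",
}

index = build_index(documents)
question = "What is LlamaIndex?"
names = retrieve(index, question)
prompt = build_prompt(documents, names, question)
print(names)
print(prompt)
```

In the application this commit documents, the retrieval step is handled by LlamaIndex with ChromaDB and the final prompt goes to a local LLM served by Ollama; the sketch only shows the order in which the pieces hand data to each other.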

content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/build-browser-rag-app.md

Lines changed: 27 additions & 18 deletions
@@ -1,14 +1,15 @@
11
---
2-
title: Build a Browser-Based RAG Application with LlamaIndex
2+
title: Build and test a browser-based RAG application with LlamaIndex
3+
description: Learn how to build a browser-based RAG application with LlamaIndex, ChromaDB, Ollama, and FastAPI on an Arm-based Google Cloud C4A VM.
34
weight: 6
45

56
### FIXED, DO NOT MODIFY
67
layout: learningpathall
78
---
89

9-
## Build a Browser-Based RAG Application with LlamaIndex
10+
## Build a browser-based RAG application
1011

11-
In this section, you'll build and test a browser-based Retrieval-Augmented Generation (RAG) application using LlamaIndex on Google Cloud Axion Arm64.
12+
In this section, you'll build and test a browser-based Retrieval-Augmented Generation (RAG) application using LlamaIndex.
1213

1314
You'll:
1415

@@ -19,9 +20,9 @@ You'll:
1920
- Create a FastAPI backend
2021
- Query documents directly from a web browser
2122

22-
## Architecture
23+
### Application architecture
2324

24-
The following diagram shows how the components interact. A request from the browser reaches FastAPI, which calls LlamaIndex to retrieve relevant chunks from ChromaDB and passes them to the Ollama local LLM for answer generation:
25+
The following flow shows how the application components interact. A request from the browser reaches FastAPI, which calls LlamaIndex to retrieve relevant chunks from ChromaDB and passes them to the Ollama local LLM for answer generation:
2526

2627
```text
2728
Browser UI
@@ -30,14 +31,14 @@ FastAPI
3031
3132
LlamaIndex
3233
33-
ChromaDB Vector Store
34+
ChromaDB vector store
3435
35-
Ollama Local LLM
36+
Ollama local LLM
3637
3738
Documents
3839
```
3940

40-
## Activate the Python environment
41+
### Activate the Python environment
4142

4243
Activate the Python virtual environment:
4344

@@ -46,7 +47,7 @@ cd ~/llamaindex-rag
4647
source rag-env/bin/activate
4748
```
4849

49-
## Create sample documents
50+
### Create sample documents
5051

5152
Create the first document:
5253

@@ -72,7 +73,7 @@ LlamaIndex is a framework for building context-aware LLM applications using inde
7273
EOF
7374
```
7475

75-
## Create the RAG engine
76+
### Create the RAG engine
7677

7778
Create the main LlamaIndex application:
7879

@@ -151,7 +152,7 @@ def build_query_engine():
151152
EOF
152153
```
153154

154-
## Create browser UI
155+
### Create browser UI
155156

156157
Create a browser-based interface for asking questions:
157158

@@ -268,7 +269,7 @@ async function askQuestion() {
268269
EOF
269270
```
270271

271-
## Create FastAPI backend
272+
### Create FastAPI backend
272273

273274
Create the FastAPI backend application:
274275

@@ -340,9 +341,13 @@ INFO: Waiting for application startup.
340341
INFO: Application startup complete.
341342
INFO: Uvicorn running on http://0.0.0.0:8000
342343
```
344+
Keep the terminal open for testing the application.
343345

346+
## Test the browser-based RAG application
344347

345-
## Open browser application
348+
After starting the application, test it by opening the UI and asking a few questions.
349+
350+
### Open browser application UI
346351

347352
Open a browser and navigate to:
348353

@@ -354,7 +359,7 @@ This opens the browser-based RAG application UI.
354359

355360
![Browser-based RAG application showing a question input box and generated response using LlamaIndex and Ollama#center](images/rag-browser.png "Browser-based LlamaIndex RAG application")
356361

357-
## Test browser-based Q&A
362+
### Test browser-based Q&A
358363

359364
Ask the following questions in the browser UI:
360365

@@ -378,15 +383,17 @@ What is Google Cloud Axion?
378383

379384
The answers will appear directly in the browser interface.
380385

381-
## Add your own documents
386+
## (Optional) Add your own documents
387+
388+
After confirming that the application works, you can try adding your own documents.
382389

383-
Copy your own files into the data directory:
390+
Copy your own files into the data directory. For example:
384391

385392
```bash
386393
cp yourfile.txt ~/llamaindex-rag/data/
387394
```
388395

389-
Stop the running FastAPI server by pressing `Ctrl+C` in the terminal where Uvicorn is running. Then restart it:
396+
Stop the running FastAPI server by pressing `Ctrl + C` in the terminal where Uvicorn is running. Then restart it:
390397

391398
```bash
392399
uvicorn api:app --host 0.0.0.0 --port 8000
@@ -396,4 +403,6 @@ The `build_query_engine()` function runs on startup and reads all documents from
396403

397404
## What you've accomplished
398405

399-
You've successfully built a browser-based RAG application using LlamaIndex on a Google Cloud Axion Arm64 VM. You created sample documents, generated embeddings using HuggingFace models, stored vectors in ChromaDB, exposed the backend using FastAPI, and queried custom documents directly from a browser using Ollama.
406+
You've now built a browser-based RAG application using LlamaIndex on an Arm-based Google Cloud C4A VM. You created sample documents, generated embeddings using Hugging Face models, stored vectors in ChromaDB, exposed the backend using FastAPI, and queried custom documents directly from a browser using Ollama.
407+
408+
You can extend this workflow for your own LlamaIndex RAG applications on Arm-based cloud infrastructure.
Lines changed: 12 additions & 19 deletions
@@ -1,5 +1,6 @@
11
---
22
title: Configure Google Cloud firewall rules for LlamaIndex
3+
description: Learn how to create a Google Cloud firewall rule that allows browser access to a FastAPI-based LlamaIndex RAG application.
34
weight: 3
45

56
### FIXED, DO NOT MODIFY
@@ -8,40 +9,32 @@ layout: learningpathall
89

910
## Allow inbound access to the LlamaIndex browser application
1011

11-
Create a firewall rule in Google Cloud Console to expose the required port for the browser-based LlamaIndex RAG application.
12+
Create a firewall rule in Google Cloud console to expose port 8000 for the browser-based LlamaIndex RAG application.
1213

13-
## Configure the firewall rule in Google Cloud Console
14+
### Configure the firewall rule in the Google Cloud console
1415

15-
To configure a firewall rule for the LlamaIndex browser-based RAG application:
16+
To configure a firewall rule:
1617

17-
1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/), go to **VPC Network > Firewall**, and select **Create firewall rule**.
18+
1. Navigate to the [Google Cloud console](https://console.cloud.google.com/).
19+
2. Go to **VPC Network > Firewall**, and select **Create firewall rule**.
1820

19-
![Google Cloud Console VPC Network Firewall page showing the Create firewall rule button in the top menu bar#center](images/firewall-rule.png "Create a firewall rule in Google Cloud Console")
20-
21-
2. Create a firewall rule that exposes the port required for the LlamaIndex browser application.
21+
![Google Cloud console VPC Network Firewall page showing the Create firewall rule button in the top menu bar#center](images/firewall-rule.png "Create a firewall rule in Google Cloud console")
2222

2323
3. Set **Name** to `allow-llamaindex-port`, then select the network you want to bind to your virtual machine.
24-
2524
4. Set **Direction of traffic** to **Ingress**, set **Action on match** to **Allow**, set **Targets** to **All instances in the network**, and set **Source IPv4 ranges** to **0.0.0.0/0**.
2625

27-
![Google Cloud Console Create firewall rule form with Name set to allow-llamaindex-port and Direction of traffic set to Ingress#center](images/network-rule.png "Configuring the allow-llamaindex-port firewall rule")
26+
![Google Cloud console Create firewall rule form with Name set to allow-llamaindex-port and Direction of traffic set to Ingress#center](images/network-rule.png "Configuring the allow-llamaindex-port firewall rule")
2827

2928
5. Under **Protocols and ports**, select **Specified protocols and ports**.
29+
6. Select the **TCP** checkbox. For **Ports**, enter `8000`. Port `8000` is used by the FastAPI server that backs the browser-based LlamaIndex RAG application.
3030

31-
6. Select the **TCP** checkbox. Port **8000** is used by the FastAPI server that backs the browser-based LlamaIndex RAG application. Enter:
32-
33-
```text
34-
8000
35-
```
36-
37-
![Google Cloud Console Protocols and ports section with TCP selected and port 8000 entered#center](images/network-port.png "Setting the LlamaIndex browser application port in the firewall rule")
31+
![Google Cloud console Protocols and ports section with TCP selected and port 8000 entered#center](images/network-port.png "Setting the LlamaIndex browser application port in the firewall rule")
3832

3933
7. In the same **TCP** field, also add port `22` to allow SSH access to the VM.
40-
4134
8. Select **Create**.
4235

4336
## What you've accomplished and what's next
4437

45-
You've created a firewall rule that exposes port 8000 for the browser-based LlamaIndex RAG application and port 22 for SSH. The firewall rule uses the network tag `allow-llamaindex-port`, which you'll attach to your virtual machine in the next step.
38+
You've now created a firewall rule that exposes port 8000 for the browser-based LlamaIndex RAG application and port 22 for SSH. You'll attach this firewall rule to your virtual machine in the next section.
4639

47-
Next, you'll create a Google Cloud Axion C4A virtual machine and connect to it using SSH.
40+
Next, you'll create a Google Cloud C4A virtual machine and connect to it using SSH.
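
Note on the file above: once the VM exists and the application is running, one way to confirm that the firewall rule lets traffic through is to attempt a TCP connection to each port from your local machine. The following is a minimal sketch using only the Python standard library. The helper names `port_is_open` and `check_ports` are illustrative, and `203.0.113.10` in the usage comment is a documentation placeholder that you would replace with your VM's external IP address.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=(22, 8000), timeout=3.0):
    """Return a dict that maps each port to True (reachable) or False."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Example usage (placeholder address; replace with your VM's external IP):
#   for port, ok in check_ports("203.0.113.10").items():
#       print(f"Port {port}: {'reachable' if ok else 'not reachable'}")
```

A port reports as reachable only when the firewall allows the traffic and a process is listening on it, so port 8000 is expected to fail until the Uvicorn server has been started on the VM.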
Lines changed: 11 additions & 10 deletions
@@ -1,5 +1,6 @@
11
---
2-
title: Create a Google Axion C4A virtual machine for LlamaIndex
2+
title: Create a Google Cloud C4A virtual machine for LlamaIndex
3+
description: Learn how to create an Arm-based Google Cloud C4A virtual machine powered by Google Axion and connect to it with browser-based SSH.
34
weight: 4
45

56
### FIXED, DO NOT MODIFY
@@ -8,36 +9,36 @@ layout: learningpathall
89

910
## Set up the virtual machine
1011

11-
In this section, you'll create a Google Axion C4A Arm-based virtual machine (VM) on Google Cloud Platform (GCP). You'll use the `c4a-standard-4` machine type, which provides 4 vCPUs and 16 GB of memory. This VM will host your browser-based LlamaIndex RAG application.
12+
In this section, you'll create a Google Cloud C4A Arm-based virtual machine (VM). You'll use the `c4a-standard-4` machine type, which provides four vCPUs and 16 GB of memory. This VM will host your browser-based LlamaIndex RAG application.
1213

13-
## Configure the C4A virtual machine in Google Cloud Console
14+
### Configure the C4A virtual machine in the Google Cloud console
1415

1516
To create a virtual machine based on the C4A instance type in the console:
1617

17-
1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/).
18-
2. Go to **Compute Engine** > **VM Instances** and select **Create Instance**.
18+
1. Navigate to the [Google Cloud console](https://console.cloud.google.com/).
19+
2. Go to **Compute Engine** > **VM instances** and select **Create instance**.
1920
3. Under **Machine configuration**, populate fields such as **Instance name**, **Region**, and **Zone**.
2021
4. Set **Series** to `C4A`, then select `c4a-standard-4` for **Machine type**.
2122

22-
![Screenshot of the Google Cloud Console showing the Machine configuration section. The Series dropdown is set to C4A and the machine type c4a-standard-4 is selected#center](images/gcp-vm.png "Configuring machine type to C4A in Google Cloud Console")
23+
![Screenshot of the Google Cloud console showing the Machine configuration section. The Series dropdown is set to C4A and the machine type c4a-standard-4 is selected.#center](images/gcp-vm.png "Configuring machine type to C4A in Google Cloud Console")
2324

2425
5. Under **OS and storage**, select **Change** and then choose an Arm64-based operating system image. For this Learning Path, select **SUSE Linux Enterprise Server**.
2526
6. For the license type, choose **Pay as you go**.
26-
7. Increase **Size (GB)** from **10** to **100** to allocate sufficient disk space, and then click **Select**.
27+
7. Increase **Size (GB)** from **10** to **100** to allocate sufficient disk space, and then select **Select**.
2728
8. Select **Networking** from the column on the left.
28-
9. Under **Network tags**, enter `allow-llamaindex-port` to link the VM to the firewall rule from the previous step and allow inbound access to port 8000 for the browser-based LlamaIndex RAG application and port 22 for ssh access.
29+
9. Under **Network tags**, enter `allow-llamaindex-port` to link the VM to the firewall rule from the previous section and allow inbound access to port `8000` for the browser-based LlamaIndex RAG application and port `22` for SSH access.
2930
10. Select **Create** to launch the virtual machine.
3031

3132
After the instance starts, select **SSH** next to the VM in the instance list to open a browser-based terminal session.
3233

33-
![Google Cloud Console VM instances page displaying running instance with green checkmark and SSH button in the Connect column#center](images/gcp-pubip-ssh.png "Connecting to a running C4A VM using SSH")
34+
![Google Cloud console VM instances page displaying running instance with green checkmark and SSH button in the Connect column#center](images/gcp-pubip-ssh.png "Connecting to a running C4A VM using SSH")
3435

3536
A new browser window opens with a terminal connected to your VM.
3637

3738
![Browser-based SSH terminal connected to the Google Axion C4A VM. The shell prompt confirms that the instance is running and ready for the next step, where you'll install LlamaIndex and its dependencies.#center](images/gcp-shell.png "Terminal session connected to the VM")
3839

3940
## What you've accomplished and what's next
4041

41-
You've now provisioned a Google Axion C4A Arm VM and connected to it using SSH.
42+
You've now provisioned a Google Cloud C4A VM and connected to it using SSH.
4243

4344
Next, you'll install LlamaIndex, Ollama, ChromaDB, and the required dependencies on your VM.
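
Note on the file above: after connecting over SSH, a quick way to confirm that the VM is running on Arm64 is to inspect the machine architecture string. The following is a minimal sketch, assuming Python 3 is available on the image; the helper name `is_arm64` is illustrative. Linux on 64-bit Arm typically reports `aarch64`.

```python
import platform


def is_arm64(machine):
    """Return True when the machine string names a 64-bit Arm architecture."""
    return machine.lower() in ("aarch64", "arm64")


# platform.machine() returns the hardware name reported by the operating system.
machine = platform.machine()
print(f"Architecture: {machine} (Arm64: {is_arm64(machine)})")
```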
