Merge pull request #3378 from anupras-mohapatra-arm/servers-and-cloud-computing

pareenaverma · web-flow · commit a065dff75b70 · 2026-06-10T08:23:56.000-04:00
LlamaIndex RAG app on Google Cloud C4A LP editorial review
diff --git a/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/_index.md b/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/_index.md
@@ -1,11 +1,7 @@
 ---
-title: Build RAG applications with LlamaIndex on Google Cloud C4A Axion VM
+title: Build RAG applications with LlamaIndex on a Google Cloud C4A virtual machine
 
-draft: true
-cascade:
-    draft: true
-
-description: Set up LlamaIndex on Google Cloud C4A Axion Arm VMs running SUSE Linux to build browser-based Retrieval-Augmented Generation (RAG) applications using local LLMs, vector databases, and FastAPI.
+description: Set up LlamaIndex on Google Axion-based C4A Arm64 VMs running SUSE Linux to build browser-based Retrieval-Augmented Generation (RAG) applications using local LLMs, vector databases, and FastAPI.
 
 minutes_to_complete: 30
 
diff --git a/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/background.md b/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/background.md
@@ -1,34 +1,35 @@
 ---
-title: Learn about LlamaIndex and Google Axion C4A for RAG applications
+title: Learn about LlamaIndex and Google Cloud C4A for RAG applications
+description: Learn how LlamaIndex supports browser-based RAG applications on Google Axion-based C4A instances.
 weight: 2
 
 layout: "learningpathall"
 ---
 
-## Google Axion C4A Arm instances for AI and RAG workloads
+## Google Cloud C4A instances for AI and RAG workloads
 
-Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for modern cloud workloads such as AI applications, vector databases, Retrieval-Augmented Generation (RAG) pipelines, and scalable inference services.
+Google Cloud C4A is a family of Arm-based virtual machines (VMs) built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these VMs offer strong performance for modern cloud workloads.
 
 The C4A series provides a cost-effective alternative to x86 virtual machines while using the scalability and performance benefits of the Arm architecture in Google Cloud.
 
 ## LlamaIndex for RAG and context-aware AI applications on Arm
 
-LlamaIndex is an open-source framework designed to build context-aware AI applications using Large Language Models (LLMs). It's widely used for Retrieval-Augmented Generation (RAG), document indexing, vector search, semantic retrieval, and integrating custom data sources with LLMs.
+LlamaIndex is an open-source framework designed to build context-aware AI applications using large language models (LLMs). It's widely used for Retrieval-Augmented Generation (RAG), document indexing, vector search, semantic retrieval, and integrating custom data sources with LLMs.
 
 LlamaIndex provides a unified framework with components such as:
 
-* Document loaders for ingesting custom data  
-* Indexing pipelines for structured retrieval workflows  
-* Query engines for context-aware question answering  
-* Vector store integrations for scalable embedding search  
-* LLM integrations for generating grounded responses  
+- Document loaders for ingesting custom data  
+- Indexing pipelines for structured retrieval workflows  
+- Query engines for context-aware question answering  
+- Vector store integrations for scalable embedding search  
+- LLM integrations for generating grounded responses  
 
-Running LlamaIndex on Google Axion C4A Arm-based infrastructure enables efficient execution of AI and RAG workloads by using multi-core Arm CPUs and optimized memory performance. This results in improved performance per watt, reduced infrastructure costs, and better scalability for browser-based AI applications and local inference pipelines.
+Running LlamaIndex on Google Cloud C4A Arm-based infrastructure enables efficient execution of AI and RAG workloads by using multi-core Arm CPUs and optimized memory performance. This results in improved performance per watt, reduced infrastructure costs, and better scalability for browser-based AI applications and local inference pipelines.
 
-Common use cases include browser-based AI assistants, document search applications, semantic retrieval systems, vector database integrations, enterprise knowledge bases, and context-aware chatbot applications.
+In this Learning Path, you'll use these components to build a browser-based RAG application that answers questions from custom documents.
 
 ## What you've learned and what's next
 
-You've now learned about Google Axion C4A Arm-based virtual machines and their performance advantages for AI and RAG workloads. You were also introduced to core LlamaIndex components including document ingestion, indexing pipelines, query engines, vector stores, and LLM integrations.
+You've now learned about Arm-based Google Cloud C4A VMs and their performance advantages for AI and RAG workloads. You were also introduced to core LlamaIndex components including document ingestion, indexing pipelines, query engines, vector stores, and LLM integrations.
 
-Next, you'll create a firewall rule in Google Cloud Console to enable remote access to the browser-based LlamaIndex RAG application used in this Learning Path.
+Next, you'll create a firewall rule in the Google Cloud console to enable remote access to the browser-based LlamaIndex RAG application that you'll create in this Learning Path.
diff --git a/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/build-browser-rag-app.md b/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/build-browser-rag-app.md
@@ -1,14 +1,15 @@
 ---
-title: Build a Browser-Based RAG Application with LlamaIndex
+title: Build and test a browser-based RAG application with LlamaIndex
+description: Learn how to build a browser-based RAG application with LlamaIndex, ChromaDB, Ollama, and FastAPI on an Arm-based Google Cloud C4A VM.
 weight: 6
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Build a Browser-Based RAG Application with LlamaIndex
+## Build a browser-based RAG application
 
-In this section, you'll build and test a browser-based Retrieval-Augmented Generation (RAG) application using LlamaIndex on Google Cloud Axion Arm64.
+In this section, you'll build and test a browser-based Retrieval-Augmented Generation (RAG) application using LlamaIndex.
 
 You'll:
 
@@ -19,9 +20,9 @@ You'll:
 - Create a FastAPI backend
 - Query documents directly from a web browser
 
-## Architecture
+### Application architecture
 
-The following diagram shows how the components interact. A request from the browser reaches FastAPI, which calls LlamaIndex to retrieve relevant chunks from ChromaDB and passes them to the Ollama local LLM for answer generation:
+The following flow shows how the application components interact. A request from the browser reaches FastAPI, which calls LlamaIndex to retrieve relevant chunks from ChromaDB and passes them to the Ollama local LLM for answer generation:
 
 ```text
 Browser UI
@@ -30,14 +31,14 @@ FastAPI
     ↓
 LlamaIndex
     ↓
-ChromaDB Vector Store
+ChromaDB vector store
     ↓
-Ollama Local LLM
+Ollama local LLM
     ↓
 Documents
 ```
 
-## Activate the Python environment
+### Activate the Python environment
 
 Activate the Python virtual environment:
 
@@ -46,7 +47,7 @@ cd ~/llamaindex-rag
 source rag-env/bin/activate
 ```
 
-## Create sample documents
+### Create sample documents
 
 Create the first document:
 
@@ -72,7 +73,7 @@ LlamaIndex is a framework for building context-aware LLM applications using inde
 EOF
 ```
 
-## Create the RAG engine
+### Create the RAG engine
 
 Create the main LlamaIndex application:
 
@@ -151,7 +152,7 @@ def build_query_engine():
 EOF
 ```
 
-## Create browser UI
+### Create browser UI
 
 Create a browser-based interface for asking questions:
 
@@ -268,7 +269,7 @@ async function askQuestion() {
 EOF
 ```
 
-## Create FastAPI backend
+### Create FastAPI backend
 
 Create the FastAPI backend application:
 
@@ -340,9 +341,13 @@ INFO:     Waiting for application startup.
 INFO:     Application startup complete.
 INFO:     Uvicorn running on http://0.0.0.0:8000
 ```
+Keep this terminal open while you test the application.
 
+## Test the browser-based RAG application
 
-## Open browser application
+After starting the application, test it by opening the UI and asking a few questions. 
+
+### Open browser application UI
 
 Open a browser and navigate to:
 
@@ -354,7 +359,7 @@ This opens the browser-based RAG application UI.
 
 ![Browser-based RAG application showing a question input box and generated response using LlamaIndex and Ollama#center](images/rag-browser.png "Browser-based LlamaIndex RAG application")
 
-## Test browser-based Q&A
+### Test browser-based Q&A
 
 Ask the following questions in the browser UI:
 
@@ -378,15 +383,17 @@ What is Google Cloud Axion?
 
 The answers will appear directly in the browser interface.
 
-## Add your own documents
+## (Optional) Add your own documents
+
+After confirming that the application works, you can try adding your own documents.
 
-Copy your own files into the data directory:
+Copy your own files into the data directory. For example:
 
 ```bash
 cp yourfile.txt ~/llamaindex-rag/data/
 ```
 
-Stop the running FastAPI server by pressing `Ctrl+C` in the terminal where Uvicorn is running. Then restart it:
+Stop the running FastAPI server by pressing `Ctrl + C` in the terminal where Uvicorn is running. Then restart it:
 
 ```bash
 uvicorn api:app --host 0.0.0.0 --port 8000
@@ -396,4 +403,6 @@ The `build_query_engine()` function runs on startup and reads all documents from
 
 ## What you've accomplished
 
-You've successfully built a browser-based RAG application using LlamaIndex on a Google Cloud Axion Arm64 VM. You created sample documents, generated embeddings using HuggingFace models, stored vectors in ChromaDB, exposed the backend using FastAPI, and queried custom documents directly from a browser using Ollama.
+You've now built a browser-based RAG application using LlamaIndex on an Arm-based Google Cloud C4A VM. You created sample documents, generated embeddings using Hugging Face models, stored vectors in ChromaDB, exposed the backend using FastAPI, and queried custom documents directly from a browser using Ollama.
+
+You can extend this workflow for your own LlamaIndex RAG applications on Arm-based cloud infrastructure.
diff --git a/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/firewall.md b/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/firewall.md
@@ -1,5 +1,6 @@
 ---
 title: Configure Google Cloud firewall rules for LlamaIndex
+description: Learn how to create a Google Cloud firewall rule that allows browser access to a FastAPI-based LlamaIndex RAG application.
 weight: 3
 
 ### FIXED, DO NOT MODIFY
@@ -8,40 +9,32 @@ layout: learningpathall
 
 ## Allow inbound access to the LlamaIndex browser application
 
-Create a firewall rule in Google Cloud Console to expose the required port for the browser-based LlamaIndex RAG application.
+Create a firewall rule in the Google Cloud console to expose port 8000 for the browser-based LlamaIndex RAG application.
 
-## Configure the firewall rule in Google Cloud Console
+### Configure the firewall rule in the Google Cloud console
 
-To configure a firewall rule for the LlamaIndex browser-based RAG application:
+To configure a firewall rule:
 
-1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/), go to **VPC Network > Firewall**, and select **Create firewall rule**.
+1. Navigate to the [Google Cloud console](https://console.cloud.google.com/).
+2. Go to **VPC Network > Firewall**, and select **Create firewall rule**.
 
-![Google Cloud Console VPC Network Firewall page showing the Create firewall rule button in the top menu bar#center](images/firewall-rule.png "Create a firewall rule in Google Cloud Console")
-
-2. Create a firewall rule that exposes the port required for the LlamaIndex browser application.
+![Google Cloud console VPC Network Firewall page showing the Create firewall rule button in the top menu bar#center](images/firewall-rule.png "Create a firewall rule in Google Cloud console")
 
 3. Set **Name** to `allow-llamaindex-port`, then select the network you want to bind to your virtual machine.
-
 4. Set **Direction of traffic** to **Ingress**, set **Action on match** to **Allow**, set **Targets** to **All instances in the network**, and set **Source IPv4 ranges** to **0.0.0.0/0**.
 
-![Google Cloud Console Create firewall rule form with Name set to allow-llamaindex-port and Direction of traffic set to Ingress#center](images/network-rule.png "Configuring the allow-llamaindex-port firewall rule")
+![Google Cloud console Create firewall rule form with Name set to allow-llamaindex-port and Direction of traffic set to Ingress#center](images/network-rule.png "Configuring the allow-llamaindex-port firewall rule")
 
 5. Under **Protocols and ports**, select **Specified protocols and ports**.
+6. Select the **TCP** checkbox. For **Ports**, enter `8000`. Port `8000` is used by the FastAPI server that backs the browser-based LlamaIndex RAG application.
 
-6. Select the **TCP** checkbox. Port **8000** is used by the FastAPI server that backs the browser-based LlamaIndex RAG application. Enter:
-
-```text
-8000
-```
-
-![Google Cloud Console Protocols and ports section with TCP selected and port 8000 entered#center](images/network-port.png "Setting the LlamaIndex browser application port in the firewall rule")
+![Google Cloud console Protocols and ports section with TCP selected and port 8000 entered#center](images/network-port.png "Setting the LlamaIndex browser application port in the firewall rule")
 
 7. In the same **TCP** field, also add port `22` to allow SSH access to the VM.
-
 8. Select **Create**.
 
 ## What you've accomplished and what's next
 
-You've created a firewall rule that exposes port 8000 for the browser-based LlamaIndex RAG application and port 22 for SSH. The firewall rule uses the network tag `allow-llamaindex-port`, which you'll attach to your virtual machine in the next step.
+You've now created a firewall rule that exposes port 8000 for the browser-based LlamaIndex RAG application and port 22 for SSH. You'll attach this firewall rule to your virtual machine in the next section.
 
-Next, you'll create a Google Cloud Axion C4A virtual machine and connect to it using SSH.
+Next, you'll create a Google Cloud C4A virtual machine and connect to it using SSH.
diff --git a/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/instance.md b/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/instance.md
@@ -1,5 +1,6 @@
 ---
-title: Create a Google Axion C4A virtual machine for LlamaIndex
+title: Create a Google Cloud C4A virtual machine for LlamaIndex
+description: Learn how to create an Arm-based Google Cloud C4A virtual machine powered by Google Axion and connect to it with browser-based SSH.
 weight: 4
 
 ### FIXED, DO NOT MODIFY
@@ -8,36 +9,36 @@ layout: learningpathall
 
 ## Set up the virtual machine
 
-In this section, you'll create a Google Axion C4A Arm-based virtual machine (VM) on Google Cloud Platform (GCP). You'll use the `c4a-standard-4` machine type, which provides 4 vCPUs and 16 GB of memory. This VM will host your browser-based LlamaIndex RAG application.
+In this section, you'll create a Google Cloud C4A Arm-based virtual machine (VM). You'll use the `c4a-standard-4` machine type, which provides four vCPUs and 16 GB of memory. This VM will host your browser-based LlamaIndex RAG application.
 
-## Configure the C4A virtual machine in Google Cloud Console
+### Configure the C4A virtual machine in the Google Cloud console
 
 To create a virtual machine based on the C4A instance type in the console:
 
-1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/).
-2. Go to **Compute Engine** > **VM Instances** and select **Create Instance**.
+1. Navigate to the [Google Cloud console](https://console.cloud.google.com/).
+2. Go to **Compute Engine** > **VM instances** and select **Create instance**.
 3. Under **Machine configuration**, populate fields such as **Instance name**, **Region**, and **Zone**.
 4. Set **Series** to `C4A`, then select `c4a-standard-4` for **Machine type**.
 
-![Screenshot of the Google Cloud Console showing the Machine configuration section. The Series dropdown is set to C4A and the machine type c4a-standard-4 is selected#center](images/gcp-vm.png "Configuring machine type to C4A in Google Cloud Console")
+![Screenshot of the Google Cloud console showing the Machine configuration section. The Series dropdown is set to C4A and the machine type c4a-standard-4 is selected.#center](images/gcp-vm.png "Configuring machine type to C4A in the Google Cloud console")
 
 5. Under **OS and storage**, select **Change** and then choose an Arm64-based operating system image. For this Learning Path, select **SUSE Linux Enterprise Server**.
 6. For the license type, choose **Pay as you go**.
-7. Increase **Size (GB)** from **10** to **100** to allocate sufficient disk space, and then click **Select**.
+7. Increase **Size (GB)** from **10** to **100** to allocate sufficient disk space, and then select **Select**.
 8. Select **Networking** from the column on the left.
-9. Under **Network tags**, enter `allow-llamaindex-port` to link the VM to the firewall rule from the previous step and allow inbound access to port 8000 for the browser-based LlamaIndex RAG application and port 22 for ssh access.
+9. Under **Network tags**, enter `allow-llamaindex-port` to link the VM to the firewall rule from the previous section and allow inbound access to port `8000` for the browser-based LlamaIndex RAG application and port `22` for SSH access.
 10. Select **Create** to launch the virtual machine.
 
 After the instance starts, select **SSH** next to the VM in the instance list to open a browser-based terminal session.
 
-![Google Cloud Console VM instances page displaying running instance with green checkmark and SSH button in the Connect column#center](images/gcp-pubip-ssh.png "Connecting to a running C4A VM using SSH")
+![Google Cloud console VM instances page displaying running instance with green checkmark and SSH button in the Connect column#center](images/gcp-pubip-ssh.png "Connecting to a running C4A VM using SSH")
 
 A new browser window opens with a terminal connected to your VM.
 
 ![Browser-based SSH terminal connected to the Google Axion C4A VM. The shell prompt confirms that the instance is running and ready for the next step, where you'll install LlamaIndex and its dependencies.#center](images/gcp-shell.png "Terminal session connected to the VM")
 
 ## What you've accomplished and what's next
 
-You've now provisioned a Google Axion C4A Arm VM and connected to it using SSH.
+You've now provisioned a Google Cloud C4A VM and connected to it using SSH.
 
 Next, you'll install LlamaIndex, Ollama, ChromaDB, and the required dependencies on your VM.
diff --git a/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/setup-llamaindex-rag.md b/content/learning-paths/servers-and-cloud-computing/llamaindex-rag-axion/setup-llamaindex-rag.md