Arm-Examples
diff --git a/README.md b/README.md
Lines changed: 4 additions & 4 deletions
diff --git a/compose.yaml b/compose.yaml
Lines changed: 3 additions & 15 deletions
diff --git a/simple-chat/Dockerfile b/simple-chat/Dockerfile
Lines changed: 0 additions & 25 deletions
diff --git a/simple-chat/app.py b/simple-chat/app.py
Lines changed: 0 additions & 139 deletions
diff --git a/simple-chat/requirements.txt b/simple-chat/requirements.txt
Lines changed: 0 additions & 2 deletions
@@ -11,9 +11,9 @@ Features: SVE, NEON
 This project demonstrates running large language models on CPU using llama.cpp compiled with Arm baseline optimizations and accelerated using NEON SIMD and SVE (when supported and enabled).
 
 The stack includes:
-- llama.cpp server with Arm NEON optimizations (SVE optional)
-- Quantized Qwen3.5-0.8B model bundled in the image
-- Simple web-based chat interface
+- Prebuilt llama.cpp server runtime
+- Quantized SmolLM2 135M model bundled in the image
+- Built-in web chat interface
 - No GPU required - pure CPU inference
 
 ## Prerequisites
@@ -68,4 +68,4 @@ topo deploy --target <ip-address-of-target> \
 
 ### Access the Chat Interface
 
-Open your browser to `URL:3000` to start chatting!
+Open your browser to `URL:8080` to start chatting!
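Reviewer note: with the separate `chat-ui` service removed, the chat interface is served by the llama.cpp server on port 8080. A minimal sketch for checking that the server is ready before opening the browser is shown below. It assumes the server exposes a `/health` endpoint that returns HTTP 200 once the model is loaded, which is worth verifying against the llama.cpp version in the image. The helper names `health_url`, `fetch`, and `is_ready` are hypothetical and are not part of this repository.

```python
from typing import Callable, Tuple
from urllib.request import urlopen


def health_url(host: str, port: int = 8080) -> str:
    """Build the health-check URL (assumes a /health endpoint on the server)."""
    return f"http://{host}:{port}/health"


def fetch(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Perform a GET request and return (status code, response body)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


def is_ready(
    host: str,
    port: int = 8080,
    get: Callable[[str], Tuple[int, str]] = fetch,
) -> bool:
    """Return True if the health endpoint answers with HTTP 200.

    Connection failures and HTTP error statuses raised by urllib
    (both subclasses of OSError) are treated as "not ready".
    """
    try:
        status, _body = get(health_url(host, port))
    except OSError:
        return False
    return status == 200
```

For example, `is_ready("<ip-address-of-target>")` could be polled after `topo deploy` finishes. The `get` parameter exists only so the check can be exercised without a network connection.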
@@ -16,18 +16,6 @@ services:
       retries: 3
       start_period: 60s
 
-  chat-ui:
-    platform: linux/arm64
-    build:
-      context: ./simple-chat
-      args:
-        ENABLE_SVE: OFF
-    depends_on:
-      llama-server:
-        condition: service_healthy
-    ports:
-      - "3000:3000"
-
 x-topo:
   name: "Topo CPU AI Chat"
   description: |
@@ -38,9 +26,9 @@ x-topo:
     accelerated using NEON SIMD and SVE (when supported and enabled).
 
     The stack includes:
-    - llama.cpp server with Arm NEON optimizations (SVE optional)
-    - Quantized Qwen3.5-0.8B model bundled in the image
-    - Simple web-based chat interface
+    - Prebuilt llama.cpp server runtime
+    - Quantized SmolLM2 135M model bundled in the image
+    - Built-in web chat interface
     - No GPU required - pure CPU inference
 
     Perfect for demos and testing! The bundled Qwen3.5-0.8B model allows the