Skip to content

Commit efccb28

Browse files
committed
Use llama.cpp built-in UI
1 parent c06f343 commit efccb28

8 files changed

Lines changed: 7 additions & 731 deletions

File tree

README.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -11,9 +11,9 @@ Features: SVE, NEON
1111
This project demonstrates running large language models on CPU using llama.cpp compiled with Arm baseline optimizations and accelerated using NEON SIMD and SVE (when supported and enabled).
1212

1313
The stack includes:
14-
- llama.cpp server with Arm NEON optimizations (SVE optional)
15-
- Quantized Qwen3.5-0.8B model bundled in the image
16-
- Simple web-based chat interface
14+
- Prebuilt llama.cpp server runtime
15+
- Quantized SmolLM2 135M model bundled in the image
16+
- Built-in web chat interface
1717
- No GPU required - pure CPU inference
1818

1919
## Prerequisites
@@ -68,4 +68,4 @@ topo deploy --target <ip-address-of-target> \
6868

6969
### Access the Chat Interface
7070

71-
Open your browser to `URL:3000` to start chatting!
71+
Open your browser to `URL:8080` to start chatting!

compose.yaml

Lines changed: 3 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -16,18 +16,6 @@ services:
1616
retries: 3
1717
start_period: 60s
1818

19-
chat-ui:
20-
platform: linux/arm64
21-
build:
22-
context: ./simple-chat
23-
args:
24-
ENABLE_SVE: OFF
25-
depends_on:
26-
llama-server:
27-
condition: service_healthy
28-
ports:
29-
- "3000:3000"
30-
3119
x-topo:
3220
name: "Topo CPU AI Chat"
3321
description: |
@@ -38,9 +26,9 @@ x-topo:
3826
accelerated using NEON SIMD and SVE (when supported and enabled).
3927
4028
The stack includes:
41-
- llama.cpp server with Arm NEON optimizations (SVE optional)
42-
- Quantized Qwen3.5-0.8B model bundled in the image
43-
- Simple web-based chat interface
29+
- Prebuilt llama.cpp server runtime
30+
- Quantized SmolLM2 135M model bundled in the image
31+
- Built-in web chat interface
4432
- No GPU required - pure CPU inference
4533
4634
Perfect for demos and testing! The bundled Qwen3.5-0.8B model allows the

simple-chat/Dockerfile

Lines changed: 0 additions & 25 deletions
This file was deleted.

simple-chat/app.py

Lines changed: 0 additions & 139 deletions
This file was deleted.

simple-chat/requirements.txt

Lines changed: 0 additions & 2 deletions
This file was deleted.

0 commit comments

Comments
 (0)