Commit ff7a96d

Support server (#44)
* add server support
* Add logo to serve
* fix tests
* fix lint
* fix tests
* fix tests
* fix test
* fix lint
* remove libs
* fix tests
* change log lib
* change the request log level

Signed-off-by: kerthcet <kerthcet@gmail.com>
1 parent 1fa6418 commit ff7a96d

25 files changed

Lines changed: 2503 additions & 515 deletions

Cargo.lock

Lines changed: 870 additions & 504 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 12 additions & 2 deletions
@@ -13,8 +13,8 @@ reqwest = { version = "0.12", features = ["json"] }
 tokio = { version = "1", features = ["full"] }
 serde = { version = "1.0", features = ["derive"] }
 serde_derive = "1.0"
-env_logger = "0.11.6"
-log = "0.4.26"
+tracing = "0.1"
+tracing-subscriber = { version = "0.3", features = ["env-filter"] }
 indicatif = "0.18"
 dirs = "6.0.0"
 hf-hub = { version = "0.5.0", features = ["tokio"] }
@@ -26,5 +26,15 @@ rusqlite = { version = "0.32", features = ["bundled"] }
 rusqlite_migration = "1.3"
 regex = "1.11"
 
+# Web server
+axum = "0.7"
+tower = "0.4"
+tower-http = { version = "0.5", features = ["cors", "trace"] }
+uuid = { version = "1.0", features = ["v4", "serde"] }
+futures = "0.3"
+tokio-stream = "0.1"
+
 [dev-dependencies]
 tempfile = "3.12"
+tower = { version = "0.4", features = ["util"] }
+serde_json = "1.0"

README.md

Lines changed: 121 additions & 2 deletions
@@ -21,6 +21,8 @@
 
 💻 **System Detection** - Automatic GPU detection and resource reporting
 
+🚀 **OpenAI-Compatible API** - RESTful API with streaming support
+
 ## Installation
 
 ### Install with Cargo
@@ -45,6 +47,8 @@ make build
 
 ## Quick Start
 
+### CLI Usage
+
 ```bash
 # Download a model
 puma pull inftyai/tiny-random-gpt2
@@ -62,6 +66,39 @@ puma info
 puma rm inftyai/tiny-random-gpt2
 ```
 
+### API Server
+
+```bash
+# Start the inference server
+puma serve
+
+# Server will start on http://0.0.0.0:8000
+# API endpoints:
+#   POST /v1/chat/completions
+#   POST /v1/completions
+#   GET  /v1/models
+#   GET  /v1/models/:model
+#   GET  /health
+```
+
+**Test the API:**
+
+```bash
+# Health check
+curl http://localhost:8000/health
+
+# Chat completion
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "inftyai/tiny-random-gpt2",
+    "messages": [{"role": "user", "content": "Hello!"}]
+  }'
+
+# Or use the test script
+./hack/scripts/test_api.sh
+```
+
 ## Commands
 
 | Command | Status | Description |
@@ -72,6 +109,7 @@ puma rm inftyai/tiny-random-gpt2
 | `rm <model>` | ✅ | Remove model and cache |
 | `info` | ✅ | Display system information |
 | `version` | ✅ | Show PUMA version |
+| `serve` | ✅ | Start OpenAI-compatible API server |
 | `ps` | 🚧 | List running models |
 | `run` | 🚧 | Start model inference |
 | `stop` | 🚧 | Stop running model |
@@ -106,6 +144,81 @@ puma ls llama -l author=meta
 
 **Available filters:** `author`, `task`, `license`, `provider`, `model_series`
 
+## API Server
+
+PUMA provides an OpenAI-compatible API server for model inference.
+
+### Starting the Server
+
+```bash
+# Default: 0.0.0.0:8000
+puma serve
+
+# Custom host and port
+puma serve --host 127.0.0.1 --port 3000
+```
+
+### API Endpoints
+
+#### Chat Completions (Recommended)
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "inftyai/tiny-random-gpt2",
+    "messages": [
+      {"role": "system", "content": "You are a helpful assistant."},
+      {"role": "user", "content": "Hello!"}
+    ],
+    "max_tokens": 100,
+    "temperature": 0.7
+  }'
+```
+
+#### Streaming (Server-Sent Events)
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "inftyai/tiny-random-gpt2",
+    "messages": [{"role": "user", "content": "Tell me a story"}],
+    "stream": true
+  }'
+```
+
+#### List Models
+```bash
+curl http://localhost:8000/v1/models
+```
+
+#### Health Check
+```bash
+curl http://localhost:8000/health
+# Returns: {"status":"ok","version":"0.0.2"}
+```
+
+### OpenAI Python Client
+
+PUMA is compatible with the OpenAI Python SDK:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="dummy"  # Not required
+)
+
+response = client.chat.completions.create(
+    model="inftyai/tiny-random-gpt2",
+    messages=[
+        {"role": "user", "content": "Hello!"}
+    ]
+)
+
+print(response.choices[0].message.content)
+```
+
 ### Inspect Output
 
 ```bash
@@ -146,22 +259,28 @@ Models are stored with lowercase names for case-insensitive matching.
 # Build
 make build
 
-# Run tests (67 unit + 22 integration)
+# Run all tests
 make test
+
+# Test API manually
+./hack/scripts/test_api.sh
 ```
 
 ### Project Structure
 
 ```
 puma/
 ├── src/
-│   ├── cli/          # Command implementations (ls, rm, inspect)
+│   ├── api/          # OpenAI-compatible API
+│   ├── backend/      # Inference backends (Mock, MLX)
+│   ├── cli/          # Command implementations
 │   ├── downloader/   # HuggingFace download logic
 │   ├── registry/     # Model registry & metadata
 │   ├── storage/      # SQLite storage backend
 │   ├── system/       # System info detection
 │   └── utils/        # Formatting & helpers
 ├── tests/            # Integration tests
+├── hack/             # Development scripts
 ├── Cargo.toml        # Rust dependencies
 └── Makefile          # Build commands
 ```
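The streaming endpoint shown in the README returns Server-Sent Events, where each event is a `data: {...}` line carrying a JSON chunk and the stream ends with a `data: [DONE]` sentinel. As a sketch, this is how a client might decode such a transcript; the chunk fields follow the OpenAI chunk format, and the sample payload below is illustrative, not captured from PUMA:

```python
import json


def parse_sse_chunks(raw: str) -> list:
    """Extract JSON payloads from an OpenAI-style SSE transcript.

    Each event is a line of the form 'data: {...}'; the stream
    terminates at the 'data: [DONE]' sentinel.
    """
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunks.append(json.loads(payload))
    return chunks


# Illustrative transcript in the OpenAI chunk format (not real PUMA output).
sample = (
    'data: {"choices": [{"delta": {"content": "Once"}}]}\n'
    '\n'
    'data: {"choices": [{"delta": {"content": " upon"}}]}\n'
    '\n'
    'data: [DONE]\n'
)

# Concatenate the incremental deltas into the full completion text.
text = "".join(
    c["choices"][0]["delta"].get("content", "")
    for c in parse_sse_chunks(sample)
)
print(text)  # prints: Once upon
```

In a real client the same loop would run over the chunked HTTP response body line by line instead of a prebuilt string.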

hack/README.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+# Hack Directory
+
+Development and testing utilities for PUMA.
+
+## Structure
+
+```
+hack/
+└── scripts/        # Test and utility scripts
+    └── test_api.sh
+```
+
+## Scripts
+
+### `scripts/test_api.sh`
+
+Tests all PUMA API endpoints manually.
+
+**Usage:**
+```bash
+# Start PUMA server first
+./puma serve
+
+# In another terminal
+./hack/scripts/test_api.sh
+```
+
+**Tests:**
+- Health check
+- List models
+- Chat completion (non-streaming)
+- Chat completion (streaming)
+- Text completion
+
+**Requirements:**
+- Running PUMA server
+- `curl` and `jq` installed
+
+---
+
+## Adding New Scripts
+
+Place development and testing scripts in `hack/scripts/`:
+
+```bash
+# Create new script
+cat > hack/scripts/my_script.sh << 'EOF'
+#!/bin/bash
+# Your script here
+EOF
+
+# Make executable
+chmod +x hack/scripts/my_script.sh
+```
+
+---
+
+## Why "hack"?
+
+The `hack/` directory is a convention from Kubernetes and other projects for:
+- Development utilities
+- Test scripts
+- Build helpers
+- CI/CD scripts
+- One-off tools
+
+It keeps the root directory clean while providing a place for development tools.

hack/scripts/test_api.sh

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+#!/bin/bash
+
+echo "Testing PUMA OpenAI-Compatible API"
+echo "===================================="
+echo
+
+# Base URL
+BASE_URL="http://localhost:8000"
+
+echo "1. Health Check"
+curl -s "$BASE_URL/health"
+echo -e "\n"
+
+echo "2. List Models"
+curl -s "$BASE_URL/v1/models" | jq '.'
+echo
+
+echo "3. Chat Completion (Non-streaming)"
+curl -s "$BASE_URL/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "test-model",
+    "messages": [
+      {"role": "user", "content": "Hello!"}
+    ],
+    "max_tokens": 50
+  }' | jq '.'
+echo
+
+echo "4. Chat Completion (Streaming)"
+curl -s -N "$BASE_URL/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "test-model",
+    "messages": [
+      {"role": "user", "content": "Tell me a story"}
+    ],
+    "stream": true,
+    "max_tokens": 50
+  }'
+echo -e "\n"
+
+echo "5. Legacy Text Completion"
+curl -s "$BASE_URL/v1/completions" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "test-model",
+    "prompt": "Once upon a time",
+    "max_tokens": 50
+  }' | jq '.'
+echo
+
+echo "Done!"
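The shell script above eyeballs each response through `jq`; the same checks can be made mechanical. As a sketch, a small Python helper could validate that a chat-completion response has the shape the OpenAI spec requires. The required field list follows the OpenAI chat completion format, and the sample response below is illustrative, not captured from PUMA:

```python
def validate_chat_response(resp: dict) -> list:
    """Return a list of problems with an OpenAI-style chat completion
    response; an empty list means the shape looks correct."""
    problems = []
    # Top-level fields required by the OpenAI chat completion format.
    for field in ("id", "object", "created", "model", "choices"):
        if field not in resp:
            problems.append(f"missing field: {field}")
    # Each choice must carry an assistant message with content.
    for i, choice in enumerate(resp.get("choices", [])):
        msg = choice.get("message", {})
        if msg.get("role") != "assistant":
            problems.append(f"choices[{i}]: role should be 'assistant'")
        if "content" not in msg:
            problems.append(f"choices[{i}]: missing message.content")
    return problems


# Illustrative well-formed response (not real PUMA output).
ok = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 0,
    "model": "test-model",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
}

print(validate_chat_response(ok))  # prints: []
```

Wired into a test, the helper turns the manual `curl | jq` inspection into a pass/fail assertion on the decoded JSON body.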
