Commit 7e1bef3

feat: implicit flash endpoint resolution + CLI overhaul (#324)
* feat: implicit flash endpoint resolution via sentinel headers
* refactor: drop flash.toml support in favor of env vars
* feat: rename flash run to flash dev, require explicit context for remote calls
* feat: rename flash run to flash dev, require explicit context for remote calls
* refactor: clean up CLI output formatting
* refactor: establish consistent color palette across CLI
* refactor: flatten deploy output, remove nesting
* refactor: use tree chars for deploy endpoint listing
* fix: handle hyphenated directory names in flash dev codegen
* refactor: route worker logs through print() instead of logging
* fix: improve worker log filtering and add color to runtime output
* fix: indent user stdout under request, print before completion line
* fix: worker log filters now handle timezone offsets and JSON-wrapped messages
* fix: drop Rich Status spinner for pull progress
* fix: duplicate logs on subsequent requests to warm workers
* feat: redesign dev console lifecycle output
* fix: strip 'live-' prefix from endpoint names in dev console output
* feat: redesign flash dev startup and shutdown output
* fix: detect duplicate endpoint names across files in manifest builder
* fix: clean up flash dev startup route table
* feat: redesign flash deploy output
* fix: standardize spinner styles and add completion lines
* feat: add upload progress bar to flash deploy
* feat: redesign flash app and env command output
* feat: add column headers to app and env list/get output
* feat: simplify app list and env list output
* feat: redesign undeploy command output
* feat: G1a log format for flash dev runtime
* fix: align name columns in dev console output
* fix: use resource_name not name on WorkerInfo
* fix: set_name_width in generated server.py not parent process
* fix: catch remote execution errors in dev server route handlers
* Update pyproject.toml
* style: run ruff format
* fix: lint errors (F541 f-string, F401 unused import)
* fix: unused variable lint errors
* fix: update tests to match new CLI output format
* fix: set FLASH_IS_LIVE_PROVISIONING in integration tests
* fix: set .name on mock resources in LB and live serverless tests
* fix: set FLASH_IS_LIVE_PROVISIONING in concurrency integration tests
* fix: pad empty sentinel input to prevent runpod dropping input field
* style: format
* fix: remove unused os import
* fix: update handler generator tests for empty input acceptance
* fix: skip sentinel for client-mode endpoints, update empty input tests
* fix: keep sentinel for client endpoints, set live provisioning in image-mode test
* fix: live provisioning only in flash dev, guard fallback path
* fix: use Live resource classes for all non-deploy contexts
* fix: catch sentinel timeout with clear error message, 30s default
* fix: sentinel timeout 90s
* fix: update _is_live_provisioning tests for new default behavior
* fix: address PR review feedback (docstrings, response validation, timeout config, LB error handling)
* docs: rename flash run to flash dev, update execution model, add sentinel env vars
* fix: class sentinel uses plain kwargs instead of cloudpickle round-trip, skip self in arg mapping
* fix: class sentinel maps positional args via method_ref, keep cloudpickle
* fix: always pop method key from class handler input, update test assertion
1 parent 7d298e2 commit 7e1bef3

67 files changed

Lines changed: 3465 additions & 2000 deletions

Some content is hidden


CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ Get your API key from: https://docs.runpod.io/get-started/api-keys
 - Integration tests that interact with Runpod API

 **When is the API key NOT needed?**
-- Local development with `flash run` (local server only)
+- Local development with `flash dev` (local server only)
 - `flash init` command (project scaffolding)
 - Unit tests (mocked API calls)
 - Code formatting, linting, type checking

README.md

Lines changed: 53 additions & 44 deletions
@@ -1,55 +1,46 @@
 # Flash

-Flash is a Python SDK for developing cloud-native AI apps where you define everythinghardware, remote functions, and dependenciesusing local code.
+Flash is a Python SDK for developing cloud-native AI apps where you define everything -- hardware, remote functions, and dependencies -- using local code.

 ```python
 import asyncio
 from runpod_flash import Endpoint, GpuType

-# Mark the function below for remote execution
-@Endpoint(name="hello-gpu", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, dependencies=["torch"])
-async def hello(): # This function runs on Runpod
+@Endpoint(name="hello-gpu", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, dependencies=["torch"])
+async def hello():
     import torch
     gpu_name = torch.cuda.get_device_name(0)
     print(f"Hello from your GPU! ({gpu_name})")
     return {"gpu": gpu_name}

 asyncio.run(hello())
-print("Done!") # This runs locally
+print("Done!")
 ```

-Write `@Endpoint` decorated Python functions on your local machine. Run them, and Flash automatically handles GPU/CPU provisioning and worker scaling on [Runpod Serverless](https://docs.runpod.io/serverless/overview).
+Write `@Endpoint` decorated Python functions on your local machine. Deploy them with `flash deploy`, then call them by running the same script. Flash handles GPU/CPU provisioning and worker scaling on [RunPod Serverless](https://docs.runpod.io/serverless/overview).

 ## Setup

 ### Install Flash

-Install Flash using `pip` or `uv`:
-
 ```bash
-# Install with pip
 pip install runpod-flash
-
-# Or uv
+# or
 uv add runpod-flash
 ```

-Flash requires [Python 3.10+](https://www.python.org/downloads/), and is currently available for macOS and Linux. Windows support is in development.
+Flash requires [Python 3.10+](https://www.python.org/downloads/) on macOS or Linux. Windows support is in development.

 ### Authentication

-Before you can use Flash, you need to authenticate with your Runpod account:
-
 ```bash
 flash login
 ```

-This saves your API key securely and allows you to use the Flash CLI and run `@Endpoint` functions.
+This saves your API key and allows you to use the Flash CLI and call `@Endpoint` functions.

 ### Coding agent integration (optional)

-Install the Flash skill package for AI coding agents like Claude Code, Cline, and Cursor:
-
 ```bash
 npx skills add runpod/skills
 ```
@@ -71,18 +62,12 @@ from runpod_flash import Endpoint, GpuType
     dependencies=["numpy", "torch"]
 )
 def gpu_matrix_multiply(size):
-    # IMPORTANT: Import packages INSIDE the function
     import numpy as np
     import torch

-    # Get GPU name
     device_name = torch.cuda.get_device_name(0)
-
-    # Create random matrices
     A = np.random.rand(size, size)
     B = np.random.rand(size, size)
-
-    # Multiply matrices
     C = np.dot(A, B)

     return {
@@ -91,33 +76,61 @@ def gpu_matrix_multiply(size):
         "gpu": device_name
     }

-# Call the function
 async def main():
-    print("Running matrix multiplication on Runpod GPU...")
+    print("Running matrix multiplication on RunPod GPU...")
     result = await gpu_matrix_multiply(1000)
-
-    print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
-    print(f"✓ Result mean: {result['result_mean']:.4f}")
-    print(f"✓ GPU used: {result['gpu']}")
+    print(f"Matrix size: {result['matrix_size']}x{result['matrix_size']}")
+    print(f"Result mean: {result['result_mean']:.4f}")
+    print(f"GPU used: {result['gpu']}")

 if __name__ == "__main__":
     asyncio.run(main())
 ```

-Run it:
+Deploy, then run:

 ```bash
+flash deploy
 python gpu_demo.py
 ```

-First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.
+## How it works
+
+Flash has two modes: **deploy** and **dev**.
+
+### Deploy and run (`flash deploy` + `python script.py`)
+
+Deploy packages your code and provisions endpoints on RunPod. After deploying, run your script directly and Flash routes calls to your deployed endpoints via implicit resolution:
+
+```bash
+flash deploy # build, upload, provision endpoints
+python gpu_demo.py # calls deployed endpoints automatically
+```
+
+Flash resolves endpoints by matching the app name (defaults to the current directory name) and environment (defaults to `production`). Configure with env vars or `.env`:
+
+```bash
+FLASH_APP=my-project # defaults to current directory name
+FLASH_ENV=staging # defaults to "production"
+```
+
+### Dev mode (`flash dev`)
+
+For local development and testing, `flash dev` starts a hybrid dev server that runs your FastAPI app locally while provisioning live ephemeral workers on RunPod:
+
+```bash
+flash dev # starts local server + provisions workers
+flash dev --port 3000 # custom port
+flash dev --auto-provision # provision all endpoints at startup
+```

 ## What Flash does

-- **Remote execution**: `@Endpoint` functions run on Runpod Serverless GPUs/CPUs
-- **Auto-scaling**: Workers scale from 0 to N based on demand
-- **Dependency management**: Packages install automatically on remote workers
-- **Two patterns**: Queue-based (`@Endpoint`) for batch work, load-balanced (`Endpoint()` + routes) for REST APIs
+- **Remote execution**: `@Endpoint` functions run on RunPod Serverless GPUs/CPUs
+- **Implicit endpoint resolution**: `python script.py` routes to deployed endpoints automatically
+- **Auto-scaling**: workers scale from 0 to N based on demand
+- **Dependency management**: packages install automatically on remote workers
+- **Two patterns**: queue-based (`@Endpoint`) for batch work, load-balanced (`Endpoint()` + routes) for REST APIs
 - **Concurrency control**: `max_concurrency` lets each worker process multiple jobs simultaneously

 ## Documentation
@@ -126,47 +139,43 @@ Full documentation: **[docs.runpod.io/flash](https://docs.runpod.io/flash)**

 - [Quickstart](https://docs.runpod.io/flash/quickstart) - First GPU workload in 5 minutes
 - [Create endpoints](https://docs.runpod.io/flash/endpoint-functions) - Queue-based, load-balancing, and custom Docker endpoints
-- [CLI reference](https://docs.runpod.io/flash/cli/overview) - `flash run`, `flash deploy`, `flash build`
+- [CLI reference](https://docs.runpod.io/flash/cli/overview) - `flash dev`, `flash deploy`, `flash build`
 - [Configuration](https://docs.runpod.io/flash/configuration/parameters) - All endpoint parameters

 ## Flash apps

-When you're ready to move beyond scripts and build a production-ready API, you can create a [Flash app](https://docs.runpod.io/flash/apps/overview) (a collection of interconnected endpoints with diverse hardware configurations) and deploy it to Runpod.
+When you're ready to move beyond scripts and build a production-ready API, you can create a [Flash app](https://docs.runpod.io/flash/apps/overview) (a collection of interconnected endpoints with diverse hardware configurations) and deploy it to RunPod.

 [Follow this tutorial to build your first Flash app](https://docs.runpod.io/flash/apps/build-app).

 ## Flash CLI

-The Flash CLI provides a set of commands for managing your Flash apps and endpoints.
-
 ```bash
 flash --help
 ```

 [Learn more about the Flash CLI](https://docs.runpod.io/flash/cli/overview).

-
 ## Examples

 Browse working examples: **[github.com/runpod/flash-examples](https://github.com/runpod/flash-examples)**

 ## Requirements

-- Python 3.12
+- Python 3.10-3.12
 - macOS or Linux (Windows support in development)
-- A [Runpod account](https://runpod.io/console) (email must be verified) with an API key
+- A [RunPod account](https://runpod.io/console) (email must be verified) with an API key

 ## Contributing

 We welcome contributions! See [RELEASE_SYSTEM.md](RELEASE_SYSTEM.md) for development workflow.

 ```bash
-# Clone and install
 git clone https://github.com/runpod/flash.git
 cd flash
 pip install -e ".[dev]"

-# Use conventional commits
+# use conventional commits
 git commit -m "feat: add new feature"
 git commit -m "fix: resolve issue"
 ```
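
Review note: the new README text says endpoints are resolved from the app name (default: current directory name) and the environment (default: `production`), configurable through `FLASH_APP` and `FLASH_ENV`. A minimal sketch of that defaulting rule follows. The helper name `resolve_flash_context` and its signature are hypothetical and are not taken from `runpod_flash`; only the two variable names and their defaults come from the README, and `.env` loading is not modelled.

```python
import os
from pathlib import Path


def resolve_flash_context(environ=None, cwd=None):
    """Return (app_name, env_name) using the defaults the README describes.

    Hypothetical helper for illustration only; not part of runpod_flash.
    """
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)
    # FLASH_APP falls back to the name of the current directory.
    app_name = environ.get("FLASH_APP") or cwd.name
    # FLASH_ENV falls back to "production".
    env_name = environ.get("FLASH_ENV") or "production"
    return app_name, env_name


if __name__ == "__main__":
    print(resolve_flash_context({}, "/tmp/my-project"))
    print(resolve_flash_context({"FLASH_APP": "demo", "FLASH_ENV": "staging"}, "/tmp/x"))
```

Passing `environ` and `cwd` explicitly keeps the sketch testable without touching the real process environment.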

docs/Flash_Deploy_Guide.md

Lines changed: 2 additions & 2 deletions
@@ -21,7 +21,7 @@ cd my-project
 flash login

 # test locally
-flash run
+flash dev

 # deploy
 flash deploy --env production
@@ -115,7 +115,7 @@ async def process(data: dict) -> dict:
 ### 2. Test Locally

 ```bash
-flash run
+flash dev
 ```

 This starts a local dev server at `http://localhost:8888` with auto-reload:

docs/LoadBalancer_Runtime_Architecture.md

Lines changed: 2 additions & 2 deletions
@@ -108,9 +108,9 @@ https://api.runpod.ai/v2/{endpoint-id}/runsync

 ### /execute Endpoint

-The `/execute` endpoint accepts and runs arbitrary Python code. It exists **only during local development** (`flash run`).
+The `/execute` endpoint accepts and runs arbitrary Python code. It exists **only during local development** (`flash dev`).

-**In local development (`flash run`):**
+**In local development (`flash dev`):**
 - `/execute` is available for Flash's remote code execution protocol
 - Code originates from your own `Endpoint`-decorated functions
 - Safe because only you can run code locally

docs/Using_Remote_With_LoadBalancer.md

Lines changed: 3 additions & 3 deletions
@@ -124,10 +124,10 @@ async def health():

 ## Local Development

-Run locally with `flash run`:
+Run locally with `flash dev`:

 ```bash
-flash run
+flash dev
 # starts a local dev server at http://localhost:8888
 # all routes are auto-discovered and registered
 ```
@@ -230,7 +230,7 @@ health = await ep.get("/health")

 1. **Group related routes** on the same `Endpoint` instance
 2. **Use descriptive paths** like `/api/users/{user_id}` not `/api/u`
-3. **Test locally with `flash run`** before deploying
+3. **Test locally with `flash dev`** before deploying
 4. **Handle errors gracefully** with meaningful error messages
 5. **Use CPU endpoints for I/O-bound work** to save costs
 6. **Set appropriate `workers` scaling** based on expected traffic

src/runpod_flash/cli/commands/_run_server_helpers.py

Lines changed: 23 additions & 7 deletions
@@ -78,7 +78,12 @@ async def call_with_body(func, body):
     model_fields_set) to match RunPod platform behavior. Plain dict
     bodies bypass this check since they originate from LB local routes
     where zero-param functions legitimately receive empty input.
+
+    Remote execution errors (timeouts, worker failures) are caught and
+    returned as JSON responses instead of raising through FastAPI.
     """
+    from fastapi.responses import JSONResponse
+
     if hasattr(body, "model_fields_set") and not body.model_fields_set:
         raise HTTPException(
             status_code=422,
@@ -88,11 +93,22 @@ async def call_with_body(func, body):
                 'optional parameters, e.g. {"input": {"param_name": null}}.'
             ),
         )
-    if hasattr(body, "model_dump"):
-        return await func(**body.model_dump())
-    raw = body.get("input", body) if isinstance(body, dict) else body
-    kwargs = _map_body_to_params(func, raw)
-    return await func(**kwargs)
+    try:
+        if hasattr(body, "model_dump"):
+            return await func(**body.model_dump())
+        raw = body.get("input", body) if isinstance(body, dict) else body
+        kwargs = _map_body_to_params(func, raw)
+        return await func(**kwargs)
+    except Exception as exc:
+        msg = str(exc)
+        # strip the "Remote execution failed: " wrapper if present
+        prefix = "Remote execution failed: "
+        if msg.startswith(prefix):
+            msg = msg[len(prefix) :]
+        return JSONResponse(
+            status_code=500,
+            content={"error": msg},
+        )


 def to_dict(body) -> dict:
@@ -138,13 +154,13 @@ async def lb_execute(resource_config, func, body: dict):
         if routing and routing.get("method")
         else func.__name__
     )
-    log.info(f"[REMOTE] {resource_config} | {route_label}")
+    log.debug(f"{resource_config} | {route_label}")

     try:
         result = await stub(
             func, dependencies, system_dependencies, accelerate_downloads, **kwargs
         )
-        log.info(f"[REMOTE] {resource_config} | Execution complete")
+        log.debug(f"{resource_config} | execution complete")
         return result
     except TimeoutError as e:
         raise HTTPException(status_code=504, detail=str(e))
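
Review note: the error path added to `call_with_body` can be exercised in isolation. The sketch below mirrors only the message normalization shown in the diff (strip the `"Remote execution failed: "` wrapper, then report the rest under an `error` key with status 500). The function name `error_payload` is hypothetical, and it returns a plain dict rather than a FastAPI `JSONResponse` so that it runs without FastAPI installed.

```python
PREFIX = "Remote execution failed: "


def error_payload(exc: Exception) -> dict:
    """Build the status and body that the diff's except-branch would send.

    Hypothetical stand-in for the JSONResponse construction in the diff.
    """
    msg = str(exc)
    # Strip the wrapper prefix if present, as the diff does.
    if msg.startswith(PREFIX):
        msg = msg[len(PREFIX):]
    return {"status_code": 500, "content": {"error": msg}}


if __name__ == "__main__":
    print(error_payload(RuntimeError("Remote execution failed: worker timed out")))
    print(error_payload(ValueError("bad input")))
```

Messages without the prefix pass through unchanged, so the same helper covers both wrapped remote errors and ordinary local exceptions.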
