accounts-billing/cost-centers.mdx (9 additions, 0 deletions)
@@ -6,6 +6,15 @@ tag: "NEW"
Cost centers let you attach billing labels to your Runpod resources to track and manage spending across your organization. By grouping your compute resources into cost centers, you can attribute charges to specific teams, projects, or departments.

<iframe
  className="w-full aspect-video rounded-xl"
  src="https://www.youtube.com/embed/0MEYF00Kno0"
  title="3 Minute Runpod: Allocate GPU spend to Cost Centers for reporting and invoicing"
/>
Open a new terminal tab or window and test your endpoints using cURL:
@@ -100,21 +100,21 @@ curl -X POST http://localhost:8888/lb_worker/process \
  -d '{"input_data": {"message": "Hello from Flash"}}'
```

-If you switch back to the terminal tab where you used `flash run`, you'll see the details of the job's progress.
+If you switch back to the terminal tab where you used `flash dev`, you'll see the details of the job's progress.

### Faster testing with auto-provisioning
For development with multiple endpoints, use `--auto-provision` to deploy all resources before testing:
```bash
-uv run flash run --auto-provision
+uv run flash dev --auto-provision
```
This eliminates cold-start delays by provisioning all serverless endpoints upfront. Endpoints are cached and reused across server restarts, making subsequent runs faster. Resources are identified by name, so the same endpoint won't be re-deployed if the configuration hasn't changed.
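The name-based reuse described above can be sketched as follows. This is an illustrative sketch of the general idea, not Flash internals; all names in it are hypothetical:

```python
# Illustrative sketch: endpoints are keyed by name, and a matching
# configuration means the cached deployment is reused instead of re-deployed.
deployed: dict[str, dict] = {}  # name -> last provisioned configuration

def provision(name: str, config: dict) -> str:
    if deployed.get(name) == config:
        return f"{name}: reused cached endpoint"
    deployed[name] = config  # new or changed config triggers a (re)deploy
    return f"{name}: deployed"

print(provision("gpu-worker", {"gpu": "AMPERE_16"}))  # first run: deployed
print(provision("gpu-worker", {"gpu": "AMPERE_16"}))  # unchanged config: reused
print(provision("gpu-worker", {"gpu": "AMPERE_24"}))  # changed config: re-deployed
```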
## Step 5: Open the API explorer
-Besides starting the API server, `flash run` also starts an interactive API explorer. Point your web browser at [http://localhost:8888/docs](http://localhost:8888/docs) to explore the API.
+Besides starting the API server, `flash dev` also starts an interactive API explorer. Point your web browser at [http://localhost:8888/docs](http://localhost:8888/docs) to explore the API.
flash/apps/deploy-apps.mdx (142 additions, 0 deletions)
@@ -275,6 +275,7 @@ The `flash_manifest.json` file is the brain of your deployment. It tells each en
- Which functions to execute.
- What Docker image to use.
- How to configure resources (GPUs, workers, scaling).
- Environment variables for workers.
- How to route HTTP requests (for load balancer endpoints).

```json
@@ -293,6 +294,10 @@ The `flash_manifest.json` file is the brain of your deployment. It tells each en
  "imageName": "runpod/flash:latest",
  "gpuIds": "AMPERE_16",
  "workersMax": 3,
  "env": {
    "HF_TOKEN": "your_token",
    "MODEL_ID": "gpt2"
  },
  "functions": [
    {"name": "gpu_hello", "module": "gpu_worker"}
  ]
@@ -329,6 +334,143 @@ When one endpoint needs to call a function on another endpoint:
Each endpoint maintains its own connection to the state manager, querying for peer endpoint URLs as needed and caching results for 300 seconds to minimize API calls.
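The lookup-and-cache behavior can be sketched with a simple time-to-live cache. This is an illustrative sketch of the general technique, not Flash's actual implementation; the class and URL below are hypothetical:

```python
import time

class TTLCache:
    """Minimal time-based cache illustrating the 300-second URL caching
    described above. Illustrative only, not Flash's implementation."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it to force a fresh lookup
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

# Cache a peer endpoint URL so repeated calls skip the state-manager query.
cache = TTLCache(ttl_seconds=300)
if cache.get("gpu-inference") is None:
    cache.set("gpu-inference", "https://example-endpoint-url")  # placeholder URL
url = cache.get("gpu-inference")
```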
#### Calling another endpoint from your code

To call one endpoint from another, import the target endpoint function **inside** your function body. Flash automatically detects these imports and generates the necessary dispatch stubs.

For example, if you have a GPU worker for inference:

```python gpu_worker.py
from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="gpu-inference",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    dependencies=["torch"]
)
async def gpu_inference(payload: dict) -> dict:
    import torch
    # GPU inference logic
    return {"result": "processed"}
```

You can call it from a CPU-based pipeline endpoint:
```python cpu_worker.py
from runpod_flash import Endpoint

@Endpoint(name="pipeline", cpu="cpu5c-4-8")
async def classify(text: str) -> dict:
    # Import the GPU endpoint inside the function body
    from gpu_worker import gpu_inference

    # Flash routes this call to the gpu-inference endpoint
    result = await gpu_inference({"text": text})
    return {"classification": result}
```

## Call deployed endpoints from scripts
After deploying your Flash app, you can call your `@Endpoint` functions directly from Python scripts. Flash automatically resolves the app context from your project structure, so in most cases you can run scripts without any additional configuration.
### How it works
When you run a script that calls an `@Endpoint` function, Flash:

1. Detects the app context from the project directory structure.
2. Looks up the deployed endpoint by name within the resolved app and environment.
3. Routes the request to that endpoint using Flash's sentinel service.
4. Returns the result to your script.

This lets you reuse the same `@Endpoint` function definitions to interact with deployed endpoints without modifying your code.
### Example: calling within the same script
The simplest approach is to call the endpoint directly in the same file where it's defined:

```python
# gpu_worker.py
import asyncio
from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="inference",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    dependencies=["torch"]
)
async def run_inference(data: dict) -> dict:
    import torch
    # Inference logic
    return {"result": "processed"}

async def main():
    result = await run_inference({"input": "data"})
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
413
+
414
+
Run the script:

```bash
python gpu_worker.py
```
### Example: importing from another script
You can also import and call endpoints from a separate script:

```python
# call_inference.py
import asyncio
from gpu_worker import run_inference

async def main():
    # Flash resolves the app context automatically
    result = await run_inference({"input": "data"})
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
Run the script:

```bash
python call_inference.py
```
### Override the resolved context
Flash resolves the app name from your project's directory structure. Use `FLASH_APP` and `FLASH_ENV` environment variables to override this automatic resolution when needed.
A common use case is when you move a script to a different directory. Since the resolved app name depends on the directory location, moving the script changes the resolved context. To continue targeting the original app, set `FLASH_APP` explicitly:
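For example, a sketch of such an invocation, where `my-app` and `production` are placeholder values for your own app and environment names:

```shell
# Placeholder app/environment names; substitute your deployed values.
FLASH_APP=my-app FLASH_ENV=production python call_inference.py
```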
If Flash cannot resolve the app context and you haven't set the environment variables, it raises an error:

```text
RuntimeError: no flash context for endpoint 'inference'. either:
- use 'flash dev' for local development
- set FLASH_APP and FLASH_ENV to target a deployed environment
```
### Automatic context in deployed workers
When Flash deploys your app, it automatically sets `FLASH_APP` and `FLASH_ENV` environment variables on each worker. This enables cross-endpoint communication within your deployed application without additional configuration.
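As a quick illustration (hypothetical worker code, not part of the Flash API), a worker can read these variables directly from its environment:

```python
import os

# Inside a deployed worker, Flash has already set these variables;
# the defaults here only apply when running outside a deployment.
app = os.environ.get("FLASH_APP", "unknown-app")
env = os.environ.get("FLASH_ENV", "unknown-env")
print(f"FLASH_APP={app} FLASH_ENV={env}")
```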
The `flash init` command creates a new Flash project with a complete project structure, including example <LoadBalancingEndpointsTooltip /> and <QueueBasedEndpointsTooltip />, and configuration files. This gives you a working starting point for building Flash applications.
-Use `flash init` whenever you want to start a new Flash project, fully configured for you to run `flash run` and `flash deploy`.
+Use `flash init` whenever you want to start a new Flash project, fully configured for you to run `flash dev` and `flash deploy`.

## Create a new project
@@ -105,13 +105,13 @@ Once your project is set up:
```bash
# Start the development server
-flash run
+flash dev

# Open the API explorer
# http://localhost:8888/docs
# If using uv:
-uv run flash run
+uv run flash dev
```
Make changes to your worker files, and the server reloads automatically. When you're ready, deploy with:
@@ -126,6 +126,6 @@ uv run flash deploy
## Next steps
- [Customize your app](/flash/apps/customize-app) to add endpoints and modify configurations.
-- [Test locally](/flash/apps/local-testing) with `flash run`.
+- [Test locally](/flash/apps/local-testing) with `flash dev`.
- [Deploy to production](/flash/apps/deploy-apps) with `flash deploy`.
- [View the flash init reference](/flash/cli/init) for all options.