accounts-billing/cost-centers.mdx (9 additions, 0 deletions)
@@ -6,6 +6,15 @@ tag: "NEW"
Cost centers let you attach billing labels to your Runpod resources to track and manage spending across your organization. By grouping your compute resources into cost centers, you can attribute charges to specific teams, projects, or departments.

<iframe
  className="w-full aspect-video rounded-xl"
  src="https://www.youtube.com/embed/0MEYF00Kno0"
  title="3 Minute Runpod: Allocate GPU spend to Cost Centers for reporting and invoicing"
/>
Open a new terminal tab or window and test your endpoints using cURL:
@@ -100,21 +100,21 @@ curl -X POST http://localhost:8888/lb_worker/process \
  -d '{"input_data": {"message": "Hello from Flash"}}'
```

-If you switch back to the terminal tab where you used `flash run`, you'll see the details of the job's progress.
+If you switch back to the terminal tab where you used `flash dev`, you'll see the details of the job's progress.

### Faster testing with auto-provisioning
For development with multiple endpoints, use `--auto-provision` to deploy all resources before testing:
```bash
-uv run flash run --auto-provision
+uv run flash dev --auto-provision
```
This eliminates cold-start delays by provisioning all serverless endpoints upfront. Endpoints are cached and reused across server restarts, making subsequent runs faster. Resources are identified by name, so the same endpoint won't be re-deployed if the configuration hasn't changed.
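The name-based reuse described above can be sketched as follows. This is an illustrative sketch of the general idea, not Flash internals; all names in it are hypothetical:

```python
# Illustrative sketch: endpoints are keyed by name, and a matching
# configuration means the cached deployment is reused instead of re-deployed.
deployed: dict[str, dict] = {}  # name -> last provisioned configuration

def provision(name: str, config: dict) -> str:
    if deployed.get(name) == config:
        return f"{name}: reused cached endpoint"
    deployed[name] = config  # new or changed config triggers a (re)deploy
    return f"{name}: deployed"

print(provision("gpu-worker", {"gpu": "AMPERE_16"}))  # first run: deployed
print(provision("gpu-worker", {"gpu": "AMPERE_16"}))  # unchanged config: reused
print(provision("gpu-worker", {"gpu": "AMPERE_24"}))  # changed config: re-deployed
```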
## Step 5: Open the API explorer
-Besides starting the API server, `flash run` also starts an interactive API explorer. Point your web browser at [http://localhost:8888/docs](http://localhost:8888/docs) to explore the API.
+Besides starting the API server, `flash dev` also starts an interactive API explorer. Point your web browser at [http://localhost:8888/docs](http://localhost:8888/docs) to explore the API.
flash/apps/deploy-apps.mdx (142 additions, 0 deletions)
@@ -275,6 +275,7 @@ The `flash_manifest.json` file is the brain of your deployment. It tells each en
- Which functions to execute.
- What Docker image to use.
- How to configure resources (GPUs, workers, scaling).
- Environment variables for workers.
- How to route HTTP requests (for load balancer endpoints).

```json
@@ -293,6 +294,10 @@ The `flash_manifest.json` file is the brain of your deployment. It tells each en
  "imageName": "runpod/flash:latest",
  "gpuIds": "AMPERE_16",
  "workersMax": 3,
  "env": {
    "HF_TOKEN": "your_token",
    "MODEL_ID": "gpt2"
  },
  "functions": [
    {"name": "gpu_hello", "module": "gpu_worker"}
  ]
@@ -329,6 +334,143 @@ When one endpoint needs to call a function on another endpoint:
Each endpoint maintains its own connection to the state manager, querying for peer endpoint URLs as needed and caching results for 300 seconds to minimize API calls.
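The lookup-and-cache behavior can be sketched with a simple time-to-live cache. This is an illustrative sketch of the general technique, not Flash's actual implementation; the class and URL below are hypothetical:

```python
import time

class TTLCache:
    """Minimal time-based cache illustrating the 300-second URL caching
    described above. Illustrative only, not Flash's implementation."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it to force a fresh lookup
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

# Cache a peer endpoint URL so repeated calls skip the state-manager query.
cache = TTLCache(ttl_seconds=300)
if cache.get("gpu-inference") is None:
    cache.set("gpu-inference", "https://example-endpoint-url")  # placeholder URL
url = cache.get("gpu-inference")
```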
#### Calling another endpoint from your code

To call one endpoint from another, import the target endpoint function **inside** your function body. Flash automatically detects these imports and generates the necessary dispatch stubs.

For example, if you have a GPU worker for inference:

```python gpu_worker.py
from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="gpu-inference",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    dependencies=["torch"]
)
async def gpu_inference(payload: dict) -> dict:
    import torch
    # GPU inference logic
    return {"result": "processed"}
```

You can call it from a CPU-based pipeline endpoint:
```python cpu_worker.py
from runpod_flash import Endpoint

@Endpoint(name="pipeline", cpu="cpu5c-4-8")
async def classify(text: str) -> dict:
    # Import the GPU endpoint inside the function body
    from gpu_worker import gpu_inference

    # Flash routes this call to the gpu-inference endpoint
    result = await gpu_inference({"text": text})
    return {"classification": result}
```

## Call deployed endpoints from scripts
After deploying your Flash app, you can call your `@Endpoint` functions directly from Python scripts. Flash automatically resolves the app context from your project structure, so in most cases you can run scripts without any additional configuration.
### How it works
When you run a script that calls an `@Endpoint` function, Flash:

1. Detects the app context from the project directory structure.
2. Looks up the deployed endpoint by name within the resolved app and environment.
3. Routes the request to that endpoint using Flash's sentinel service.
4. Returns the result to your script.

This lets you reuse the same `@Endpoint` function definitions to interact with deployed endpoints without modifying your code.
### Example: calling within the same script
The simplest approach is to call the endpoint directly in the same file where it's defined:

```python
# gpu_worker.py
import asyncio
from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="inference",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    dependencies=["torch"]
)
async def run_inference(data: dict) -> dict:
    import torch
    # Inference logic
    return {"result": "processed"}

async def main():
    result = await run_inference({"input": "data"})
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
413
+
414
+
Run the script:

```bash
python gpu_worker.py
```
### Example: importing from another script
You can also import and call endpoints from a separate script:

```python
# call_inference.py
import asyncio
from gpu_worker import run_inference

async def main():
    # Flash resolves the app context automatically
    result = await run_inference({"input": "data"})
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
Run the script:

```bash
python call_inference.py
```
### Override the resolved context
Flash resolves the app name from your project's directory structure. Use `FLASH_APP` and `FLASH_ENV` environment variables to override this automatic resolution when needed.
A common use case is when you move a script to a different directory. Since the resolved app name depends on the directory location, moving the script changes the resolved context. To continue targeting the original app, set `FLASH_APP` explicitly:
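For example, a sketch of such an invocation, where `my-app` and `production` are placeholder values for your own app and environment names:

```shell
# Placeholder app/environment names; substitute your deployed values.
FLASH_APP=my-app FLASH_ENV=production python call_inference.py
```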
If Flash cannot resolve the app context and you haven't set the environment variables, it raises an error:

```text
RuntimeError: no flash context for endpoint 'inference'. either:
- use 'flash dev' for local development
- set FLASH_APP and FLASH_ENV to target a deployed environment
```
### Automatic context in deployed workers
When Flash deploys your app, it automatically sets `FLASH_APP` and `FLASH_ENV` environment variables on each worker. This enables cross-endpoint communication within your deployed application without additional configuration.
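As a quick illustration (hypothetical worker code, not part of the Flash API), a worker can read these variables directly from its environment:

```python
import os

# Inside a deployed worker, Flash has already set these variables;
# the defaults here only apply when running outside a deployment.
app = os.environ.get("FLASH_APP", "unknown-app")
env = os.environ.get("FLASH_ENV", "unknown-env")
print(f"FLASH_APP={app} FLASH_ENV={env}")
```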
The `flash init` command creates a new Flash project with a complete project structure, including example <LoadBalancingEndpointsTooltip /> and <QueueBasedEndpointsTooltip />, and configuration files. This gives you a working starting point for building Flash applications.
-Use `flash init` whenever you want to start a new Flash project, fully configured for you to run `flash run` and `flash deploy`.
+Use `flash init` whenever you want to start a new Flash project, fully configured for you to run `flash dev` and `flash deploy`.

## Create a new project
@@ -105,13 +105,13 @@ Once your project is set up:
```bash
# Start the development server
-flash run
+flash dev

# Open the API explorer
# http://localhost:8888/docs
# If using uv:
-uv run flash run
+uv run flash dev
```
Make changes to your worker files, and the server reloads automatically. When you're ready, deploy with:
@@ -126,6 +126,6 @@ uv run flash deploy
## Next steps
- [Customize your app](/flash/apps/customize-app) to add endpoints and modify configurations.
-- [Test locally](/flash/apps/local-testing) with `flash run`.
+- [Test locally](/flash/apps/local-testing) with `flash dev`.
- [Deploy to production](/flash/apps/deploy-apps) with `flash deploy`.
- [View the flash init reference](/flash/cli/init) for all options.