Commit 987b443

Add documentation for is_model_ready
1 parent aea4149 commit 987b443

1 file changed

Lines changed: 61 additions & 1 deletion

README.md

@@ -367,9 +367,25 @@ class TritonPythonModel:
         """
         print('Cleaning up...')
 
+    def is_model_ready(self):
+        """`is_model_ready` is called whenever the model readiness is checked
+        via the health endpoint (v2/models/<model>/ready). Implementing
+        `is_model_ready` is optional. If not implemented, the model is
+        considered ready as long as the stub process is healthy. This
+        function must return a boolean value. Both sync and async
+        implementations are supported.
+
+        Returns
+        -------
+        bool
+        True if the model is ready to serve inference requests,
+        False otherwise.
+        """
+        return True
+
 ```
 
-Every Python backend can implement four main functions:
+Every Python backend can implement the following main functions:
 
 ### `auto_complete_config`
 
@@ -748,6 +764,50 @@ class TritonPythonModel:
 Implementing `finalize` is optional. This function allows you to do any clean
 ups necessary before the model is unloaded from Triton server.
 
+### `is_model_ready`
+
+Implementing `is_model_ready` is optional. When defined, this function is invoked whenever the model’s readiness is verified through the
+`v2/models/<model>/ready` health endpoint. It must return a **boolean** value
+(`True` or `False`). Both synchronous and asynchronous (`async def`)
+implementations are supported.
+
+If `is_model_ready` is not implemented, the model is considered ready as long as the stub process remains healthy (the default behavior). In this case, no IPC overhead is incurred.
+
+When `is_model_ready` is implemented, a readiness check timeout of five seconds is enforced. If the function fails to return within this period, the model is reported as not ready for that check. Only one internal readiness IPC call is executed per model instance at a given time. Concurrent readiness requests wait for the ongoing call to complete and reuse its result.
+
+**Note:** The `is_model_ready` function should be kept as lightweight and efficient as possible. It shares an internal message queue with BLS decoupled response delivery. Although a slow readiness check does not affect standard (non‑decoupled) inference directly, it can delay the delivery of BLS decoupled streaming responses while both requests are processed. Avoid blocking operations such as long-running network calls or heavy computations inside this function.
+
+```python
+import triton_python_backend_utils as pb_utils
+
+
+class TritonPythonModel:
+    def initialize(self, args):
+        # Load model resources, establish connections, etc.
+        self.resource = connect_to_resource()
+
+    def is_model_ready(self):
+        # Perform custom readiness checks such as verifying
+        # that dependent resources are available.
+        return self.resource.is_available()
+
+    def execute(self, requests):
+        ...
+
+    def finalize(self):
+        self.resource.close()
+```
+
+An asynchronous implementation is also supported:
+
+```python
+class TritonPythonModel:
+    async def is_model_ready(self):
+        status = await self.check_dependency_health()
+        return status.ok
+    ...
+```
+
 You can look at the [add_sub example](examples/add_sub/model.py) which contains
 a complete example of implementing all these functions for a Python model
 that adds and subtracts the inputs given to it. After implementing all the
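
The note added in this commit asks for `is_model_ready` to stay lightweight, and the five-second timeout means a slow check is reported as not ready. One way to follow that advice, offered here as a minimal sketch and not as part of the commit, is to run the slow check on a background thread and have `is_model_ready` return only the cached result. The names `probe_dependency` and `POLL_INTERVAL_S` are hypothetical and do not belong to the Triton API; only `initialize`, `is_model_ready`, `execute`, and `finalize` come from the README.

```python
import threading


class TritonPythonModel:
    """Sketch: keep is_model_ready cheap by returning a cached health flag."""

    # Hypothetical polling interval in seconds; not a Triton setting.
    POLL_INTERVAL_S = 1.0

    def initialize(self, args):
        self._ready = threading.Event()  # cached result of the last probe
        self._stop = threading.Event()   # tells the poller thread to exit
        self._poller = threading.Thread(target=self._poll_loop, daemon=True)
        self._poller.start()

    def probe_dependency(self):
        # Hypothetical placeholder for the slow check (network call, etc.).
        return True

    def _poll_loop(self):
        while not self._stop.is_set():
            try:
                healthy = bool(self.probe_dependency())
            except Exception:
                # A failing probe is treated as "not ready".
                healthy = False
            if healthy:
                self._ready.set()
            else:
                self._ready.clear()
            # Returns early when finalize() sets the stop flag.
            self._stop.wait(self.POLL_INTERVAL_S)

    def is_model_ready(self):
        # Reads the cached flag only; never waits on the dependency itself.
        return self._ready.is_set()

    def execute(self, requests):
        ...

    def finalize(self):
        self._stop.set()
        self._poller.join(timeout=5)
```

With this layout the model reports not ready until the first probe succeeds, and a probe that hangs only leaves the cached flag stale without delaying the readiness call. The cost is that the reported state can lag the real one by up to one polling interval.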
