README.md: 61 additions & 1 deletion
@@ -367,9 +367,25 @@ class TritonPythonModel:
         """
         print('Cleaning up...')
 
+    def is_model_ready(self):
+        """`is_model_ready` is called whenever the model readiness is checked
+        via the health endpoint (v2/models/<model>/ready). Implementing
+        `is_model_ready` is optional. If not implemented, the model is
+        considered ready as long as the stub process is healthy. This
+        function must return a boolean value. Both sync and async
+        implementations are supported.
+
+        Returns
+        -------
+        bool
+          True if the model is ready to serve inference requests,
+          False otherwise.
+        """
+        return True
+
 ```
 
-Every Python backend can implement four main functions:
+Every Python backend can implement the following main functions:
 
 ### `auto_complete_config`
 
@@ -748,6 +764,50 @@ class TritonPythonModel:
 Implementing `finalize` is optional. This function allows you to do any clean
 ups necessary before the model is unloaded from Triton server.
 
+### `is_model_ready`
+
+Implementing `is_model_ready` is optional. When defined, this function is invoked whenever the model’s readiness is verified through the
+`v2/models/<model>/ready` health endpoint. It must return a **boolean** value
+(`True` or `False`). Both synchronous and asynchronous (`async def`)
+implementations are supported.
+
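The endpoint named in the added text can be exercised from the client side. The following is a minimal sketch using only the Python standard library. It assumes the server's HTTP endpoint is reachable at a base URL such as `http://localhost:8000` and that an HTTP 200 response means "ready" while any other outcome means "not ready". The helper names `model_ready_url` and `probe_model_ready` are hypothetical and not part of Triton or this change.

```python
import urllib.error
import urllib.parse
import urllib.request


def model_ready_url(base_url: str, model_name: str) -> str:
    """Build the readiness URL for a model (v2/models/<model>/ready)."""
    quoted = urllib.parse.quote(model_name, safe="")
    return f"{base_url.rstrip('/')}/v2/models/{quoted}/ready"


def probe_model_ready(base_url: str, model_name: str, timeout: float = 10.0) -> bool:
    """Return True if the readiness request is answered with HTTP 200."""
    url = model_ready_url(base_url, model_name)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # HTTP error statuses (HTTPError subclasses URLError), refused
        # connections, and timeouts are all treated as "not ready" here.
        return False
```

For example, `probe_model_ready("http://localhost:8000", "add_sub")` would issue a GET request to `http://localhost:8000/v2/models/add_sub/ready`.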
+If `is_model_ready` is not implemented, the model is considered ready as long as the stub process remains healthy (the default behavior). In this case, no IPC overhead is incurred.
+
+When `is_model_ready` is implemented, a readiness check timeout of five seconds is enforced. If the function fails to return within this period, the model is reported as not ready for that check. Only one internal readiness IPC call is executed per model instance at a given time. Concurrent readiness requests wait for the ongoing call to complete and reuse its result.
+
+**Note:** The `is_model_ready` function should be kept as lightweight and efficient as possible. It shares an internal message queue with BLS decoupled response delivery. Although a slow readiness check does not affect standard (non-decoupled) inference directly, it can delay the delivery of BLS decoupled streaming responses while both requests are processed. Avoid blocking operations such as long-running network calls or heavy computations inside this function.
+
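One way to follow the advice in the note is to move the slow check off the readiness path. This is a minimal sketch, not the approach taken in this change. It assumes a background thread is acceptable inside the model. `CachedReadiness` and `check_resource` are hypothetical names introduced for illustration only. The probe runs periodically in the background, and `is_model_ready` only reads the cached flag.

```python
import threading
from typing import Callable


def check_resource() -> bool:
    """Hypothetical stand-in for a slow dependency check."""
    return True


class CachedReadiness:
    """Runs a possibly slow probe in the background and caches its result."""

    def __init__(self, probe: Callable[[], bool], interval_s: float = 1.0):
        self._probe = probe
        self._interval_s = interval_s
        self._ready = False  # pessimistic until the first probe succeeds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self._ready = bool(self._probe())
            except Exception:
                # A failing probe means "not ready", not a crashed thread.
                self._ready = False
            # Event.wait doubles as an interruptible sleep.
            self._stop.wait(self._interval_s)

    def is_ready(self) -> bool:
        # Plain attribute read; no I/O happens on the readiness path.
        return self._ready

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


class TritonPythonModel:
    def initialize(self, args):
        self._readiness = CachedReadiness(check_resource, interval_s=0.05)
        self._readiness.start()

    def is_model_ready(self):
        return self._readiness.is_ready()

    def finalize(self):
        self._readiness.stop()
```

The trade-off is staleness: the reported state can lag the real state by up to one probe interval, in exchange for a readiness call that returns immediately.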
+```python
+import triton_python_backend_utils as pb_utils
+
+
+class TritonPythonModel:
+    def initialize(self, args):
+        # Load model resources, establish connections, etc.
+        self.resource = connect_to_resource()
+
+    def is_model_ready(self):
+        # Perform custom readiness checks such as verifying
+        # that dependent resources are available.
+        return self.resource.is_available()
+
+    def execute(self, requests):
+        ...
+
+    def finalize(self):
+        self.resource.close()
+```
+
+An asynchronous implementation is also supported:
+
+```python
+class TritonPythonModel:
+    async def is_model_ready(self):
+        status = await self.check_dependency_health()
+        return status.ok
+    ...
+```
+
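Because the added text states that a readiness check is reported as not ready once the five-second timeout passes, an asynchronous implementation may want to bound its own dependency check with a shorter budget. The sketch below uses `asyncio.wait_for` for that. The `READINESS_BUDGET_S` value and the body of `check_dependency_health` are assumptions made for illustration, not part of this change.

```python
import asyncio


class TritonPythonModel:
    # Assumed budget, chosen to stay well under the five-second timeout.
    READINESS_BUDGET_S = 1.0

    async def check_dependency_health(self) -> bool:
        """Hypothetical dependency check; replace with a real async call."""
        await asyncio.sleep(0.01)
        return True

    async def is_model_ready(self):
        try:
            healthy = await asyncio.wait_for(
                self.check_dependency_health(),
                timeout=self.READINESS_BUDGET_S,
            )
            return bool(healthy)
        except asyncio.TimeoutError:
            # Answer "not ready" ourselves instead of hitting the outer timeout.
            return False
```

With this shape, a dependency that hangs makes the model answer "not ready" after about one second instead of occupying the readiness call for the full five seconds.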
 You can look at the [add_sub example](examples/add_sub/model.py) which contains
 a complete example of implementing all these functions for a Python model
 that adds and subtracts the inputs given to it. After implementing all the