open-edge-platform · tybulewicz · Mar 5, 2026 · Mar 2, 2026 · Mar 2, 2026 · Mar 2, 2026
@@ -98,20 +98,56 @@ jobs:
       - name: Run python unit tests
         run: uv run pytest tests/unit --cov
 
-      - name: Prepare test data
+      - &prepare-test-data
+        name: Prepare test data
         run: |
           uv run python tests/accuracy/download_models.py -d data -j tests/precommit/public_scope.json -l
 
       - name: Run test
         run: |
           uv run pytest --data=./data tests/functional
 
+  serving_api-tests:
+    strategy:
+      fail-fast: false
+      matrix:
+        os:
+          - "ubuntu-latest"
+        python-version:
+          - "3.11"
+          - "3.12"
+          - "3.13"
+          - "3.14"
+    runs-on: ${{ matrix.os }}
+    steps:
+      - name: Set up docker for macOS
+        if: startsWith(matrix.os, 'macos-1')
+        run: |
+          brew install colima docker
+          colima start
+
+      - *checkout
+
+      - *matrix-setup-uv
+
+      - name: Install dependencies
+        run: uv sync --locked --extra tests --extra ovms --extra-index-url https://download.pytorch.org/whl/cpu
+
+      - *prepare-test-data
+
+      - name: serving_api
+        run: |
+          uv run python -c "from model_api.models import DetectionModel; DetectionModel.create_model('./data/otx_models/detection_model_with_xai_head.xml').save('ovms_models/ssd_mobilenet_v1_fpn_coco/1/ssd_mobilenet_v1_fpn_coco.xml')"
+          docker run -d --rm -v $GITHUB_WORKSPACE/ovms_models/:/models -p 8000:8000 openvino/model_server:latest --model_path /models/ssd_mobilenet_v1_fpn_coco/ --model_name ssd_mobilenet_v1_fpn_coco --rest_port 8000 --log_level DEBUG --target_device CPU
+          uv run python examples/serving_api/run.py data/coco128/images/train2017/000000000009.jpg  # detects 4 objects
+
   pre-commit-result:
     runs-on: ubuntu-latest
     needs:
       - accuracy-tests
       - code_quality_checks
       - unit-functional-tests
+      - serving_api-tests
     if: always()
     steps:
       - name: All tests ok

@@ -145,3 +145,4 @@ docs/source/_build/
 .vscode/
 
 data/
+ovms_models/
@@ -11,7 +11,7 @@
 
 ## Introduction
 
-Model API is a set of wrapper classes for particular tasks and model architectures, simplifying data preprocess and postprocess as well as routine procedures (model loading, asynchronous execution, etc.). It is aimed at simplifying end-to-end model inference. The Model API is based on the OpenVINO inference API.
+Model API is a set of wrapper classes for particular tasks and model architectures, simplifying data preprocessing and postprocessing as well as routine procedures (model loading, asynchronous execution, etc.). It is aimed at simplifying end-to-end model inference for different deployment scenarios, including local execution and serving. The Model API is based on the OpenVINO inference API.
 
 ## How it works
 
@@ -29,6 +29,7 @@ Training Extensions embed all the metadata required for inference into model fil
 
 - Python API
 - Synchronous and asynchronous inference
+- Local inference and serving through the REST API
 - Model preprocessing embedding for faster inference
 
 ## Installation
@@ -41,6 +42,7 @@ Training Extensions embed all the metadata required for inference into model fil
 from model_api.models import Model
 
 # Create a model wrapper from a compatible model generated by OpenVINO Training Extensions
+# To work with an OVMS-served model, pass its endpoint instead of a file path, e.g. "localhost:8000/v2/models/ssdlite_mobilenet_v2"
 model = Model.create_model("model.xml")
 
 # Run synchronous inference locally
@@ -52,7 +54,7 @@ print(f"Inference result: {result}")
 
 ## Prepare a model for `InferenceAdapter`
 
-There are usecases when it is not possible to modify an internal `ov::Model` and it is hidden behind `InferenceAdapter`. `create_model()` can construct a model from a given `InferenceAdapter`. That approach assumes that the model in `InferenceAdapter` was already configured by `create_model()` called with a string (a path or a model name). It is possible to prepare such model:
+There are use cases where an internal `ov::Model` cannot be modified because it is hidden behind an `InferenceAdapter`. For example, the model can be served using [OVMS](https://github.com/openvinotoolkit/model_server). `create_model()` can construct a model from a given `InferenceAdapter`. This approach assumes that the model in the `InferenceAdapter` was already configured by `create_model()` called with a string (a path or a model name). Such a model can be prepared as follows:
 
 ```python
 model = DetectionModel.create_model("~/.cache/omz/public/ssdlite_mobilenet_v2/FP16/ssdlite_mobilenet_v2.xml")

@@ -11,6 +11,10 @@
 [todo]
 :::
 
+:::{grid-item-card} Ovms Adapter
+:link: ./ovms_adapter
+:link-type: doc
+
 [todo]
 :::
 :::{grid-item-card} Onnx Adapter
@@ -41,5 +45,6 @@
 ./inference_adapter
 ./onnx_adapter
 ./openvino_adapter
+./ovms_adapter
 ./utils
 ```
@@ -0,0 +1,8 @@
+# OVMS Adapter
+
+```{eval-rst}
+.. automodule:: model_api.adapters.ovms_adapter
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
@@ -0,0 +1,40 @@
+# Serving API example
+
+This example demonstrates how to use the Python API of OpenVINO Model API for remote inference of models hosted with [OpenVINO Model Server](https://docs.openvino.ai/latest/ovms_what_is_openvino_model_server.html). This tutorial assumes that you are familiar with Docker and covers the following steps:
+
+- Run a Docker image with OpenVINO Model Server
+- Instantiate a model
+- Run inference
+- Process results
+
+## Prerequisites
+
+- Install Model API from source. Please refer to the main [README](../../../README.md) for details.
+- Install Docker. Please refer to the [official documentation](https://docs.docker.com/get-docker/) for details.
+- Install Triton HTTP client (used by the OVMS adapter) into the Python environment:
+
+  ```bash
+  pip install 'tritonclient[http]'
+  ```
+
+- Download a model by running Python code with Model API (see the Python [example](../../synchronous_api/README.md)) and save the configured model in an OVMS-friendly folder layout:
+
+  ```python
+  from model_api.models import DetectionModel
+
+  DetectionModel.create_model("ssd_mobilenet_v1_fpn_coco").save("/home/user/models/ssd_mobilenet_v1_fpn_coco/1/ssd_mobilenet_v1_fpn_coco.xml")
+  ```
+
+- Run a Docker container with OVMS:
+
+  ```bash
+  docker run -d -v /home/user/models:/models -p 8000:8000 openvino/model_server:latest --model_path /models/ssd_mobilenet_v1_fpn_coco --model_name ssd_mobilenet_v1_fpn_coco --rest_port 8000 --nireq 4 --target_device CPU
+  ```
+
+## Run example
+
+To run the example, please execute the following command:
+
+```bash
+python run.py <path_to_image>
+```
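The folder layout used in the prerequisites above (`<repository>/<model_name>/<version>/<model_name>.xml`) can be sketched with a small helper. This is a minimal illustration, not part of Model API: `ovms_model_path` is a hypothetical name, and the positive-integer check reflects the assumption that OVMS version directories are numbered starting from 1.

```python
from pathlib import Path


def ovms_model_path(repository: Path, model_name: str, version: int = 1) -> Path:
    """Build <repository>/<model_name>/<version>/<model_name>.xml.

    Hypothetical helper: it mirrors the layout from the README example and
    assumes that version directories are positive integers.
    """
    if version < 1:
        raise ValueError("model version must be a positive integer")
    return Path(repository) / model_name / str(version) / f"{model_name}.xml"


# Usage (path only, nothing is written to disk):
target = ovms_model_path(Path("/home/user/models"), "ssd_mobilenet_v1_fpn_coco")
# target.parent.mkdir(parents=True, exist_ok=True) could be called before saving.
```

The resulting path can then be passed to `save()` as in the README snippet, with `/home/user/models` mounted as `/models` in the container.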
@@ -0,0 +1,34 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2020-2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+#
+
+import sys
+
+import cv2
+
+from model_api.models import DetectionModel
+
+
+def main():
+    if len(sys.argv) != 2:
+        usage_message = f"Usage: {sys.argv[0]} <path_to_image>"
+        raise RuntimeError(usage_message)
+
+    image = cv2.imread(sys.argv[1])
+    if image is None:
+        error_message = f"Failed to read the image: {sys.argv[1]}"
+        raise RuntimeError(error_message)
+    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+    # Create Object Detection model specifying the OVMS server URL
+    model = DetectionModel.create_model(
+        "localhost:8000/v2/models/ssd_mobilenet_v1_fpn_coco",
+        model_type="ssd",
+    )
+    detections = model(image)
+    print(f"Detection results: {detections}")
+
+
+if __name__ == "__main__":
+    main()
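`run.py` assumes that the model server already accepts requests, while the container is started in detached mode right before it. A minimal sketch of a readiness wait follows. It assumes the server exposes the KServe-style `GET /v2/models/<model_name>/ready` REST endpoint on the REST port; `http_ready` and `wait_until_ready` are hypothetical helpers, not part of Model API.

```python
import time
import urllib.request
from typing import Callable


def http_ready(url: str) -> bool:
    """Return True when a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: the server is
        # not reachable yet or reports that it is not ready.
        return False


def wait_until_ready(
    url: str,
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    probe: Callable[[str], bool] = http_ready,
) -> bool:
    """Poll `probe(url)` until it succeeds or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage (assumed endpoint, shown as a comment so nothing runs on import):
# wait_until_ready("http://localhost:8000/v2/models/ssd_mobilenet_v1_fpn_coco/ready")
```

The `probe` parameter is injectable so the polling logic can be exercised without a running server.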
@@ -35,6 +35,9 @@ dependencies = [
 ]
 
 [project.optional-dependencies]
+ovms = [
+  "tritonclient[http]<2.59",
+]
 tests = [
     "httpx",
     "pytest",

@@ -79,13 +79,25 @@ The following tasks can be solved with wrappers usage:
 
 Model API wrappers are executor-agnostic, meaning it does not implement the specific model inference or model loading, instead it can be used with different executors having the implementation of common interface methods in adapter class respectively.
 
-Currently, `OpenvinoAdapter` and `ONNXRuntimeAdapter` are supported.
+Currently, `OpenvinoAdapter`, `OVMSAdapter`, and `ONNXRuntimeAdapter` are supported.
 
 ### OpenVINO Adapter
 
 `OpenvinoAdapter` hides the OpenVINO™ toolkit API, which allows Model API wrappers launching with models represented in Intermediate Representation (IR) format.
 It accepts a path to either `xml` model file or `onnx` model file.
 
+### OpenVINO Model Server Adapter
+
+`OVMSAdapter` hides the OpenVINO Model Server Python client API, which allows Model API wrappers to run with models served by OVMS.
+
+Refer to **[`OVMSAdapter`](adapters/ovms_adapter.md)** to learn about running demos with OVMS.
+
+To use the OpenVINO Model Server Adapter, install the package with the `ovms` extra:
+
+```sh
+pip install <omz_dir>/demos/common/python[ovms]
+```
+
 ### ONNXRuntime Adapter
 
 `ONNXRuntimeAdapter` hides the ONNXRuntime, which Model API wrappers launching with models represented in ONNX format.

@@ -5,13 +5,15 @@
 
 from .onnx_adapter import ONNXRuntimeAdapter
 from .openvino_adapter import OpenvinoAdapter, create_core, get_user_config
+from .ovms_adapter import OVMSAdapter
 from .utils import INTERPOLATION_TYPES, RESIZE_TYPES, InputTransform, Layout
 
 __all__ = [
     "create_core",
     "get_user_config",
     "Layout",
     "OpenvinoAdapter",
+    "OVMSAdapter",
     "ONNXRuntimeAdapter",
     "RESIZE_TYPES",
     "InputTransform",

@@ -0,0 +1,59 @@
+# OpenVINO Model Server Adapter
+
+The `OVMSAdapter` implements the `InferenceAdapter` interface. It makes it possible to use Model API with models hosted in OpenVINO Model Server.
+
+## Prerequisites
+
+`OVMSAdapter` enables inference via calls to OpenVINO Model Server, so in order to use it you need two things:
+
+- OpenVINO Model Server that serves your model
+- [`tritonclient[http]`](https://pypi.org/project/tritonclient/) package installed to enable communication with the model server: `python3 -m pip install 'tritonclient[http]'`
+
+### Deploy OpenVINO Model Server
+
+Model Server is distributed as a Docker image available on Docker Hub, so you can start it with the `docker run` command. See the [model server documentation](https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md) to learn how to deploy OpenVINO-optimized models with OpenVINO Model Server.
+
+## Model configuration
+
+When using OpenVINO Model Server, the model cannot be accessed directly from the client application. Therefore, any configuration must be done on the model server side or before starting the server: see [Prepare a model for `InferenceAdapter`](../../../../../README.md#prepare-a-model-for-inferenceadapter).
+
+### Input reshaping
+
+For some use cases, you may want the model to be reshaped to match an input of a certain size. In that case, pass the `--shape auto` parameter to the model server startup command. With that option, the model server reshapes the model input on demand to match the input data.
+
+### Inference options
+
+Inference-related options for the model can be configured in OpenVINO Model Server with the following parameters:
+
+- `--target_device` - name of the device to load the model to
+- `--nireq` - number of InferRequests
+- `--plugin_config` - configuration of the device plugin
+
+See [model server configuration parameters](https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md#serving-a-single-model) for more details.
+
+### Example OVMS startup command
+
+```bash
+docker run -d --rm -v /home/user/models:/models -p 8000:8000 openvino/model_server:latest --model_path /models/model1 --model_name model1 --rest_port 8000 --shape auto --nireq 32 --target_device CPU --plugin_config "{\"CPU_THROUGHPUT_STREAMS\": \"CPU_THROUGHPUT_AUTO\"}"
+```
+
+> **Note**: In demos run with `--adapter ovms`, inference options such as `-nireq`, `-nstreams`, and `-nthreads`, as well as the device specification with `-d`, are ignored.
+
+## Running demos with OVMSAdapter
+
+To run a demo with a model served by OpenVINO Model Server, provide the `--adapter ovms` option and set the `-m` parameter to the model inference service instead of the model files. The model parameter for `OVMSAdapter` follows this schema:
+
+`<service_address>/v2/models/<model_name>[/versions/<model_version>[/]]`
+
+- `<service_address>` - OVMS service address in the form `<address>:<port>`
+- `<model_name>` - name of the target model (the one specified by the `--model_name` parameter in the model server startup command)
+- `<model_version>` _(optional)_ - version of the target model specified in the `/versions/<model_version>` path segment (default: latest)
+
+Assuming that the model server runs on the same machine as the demo, exposes its service on port 8000, and serves a model called `model1`, the value of the `-m` parameter would be:
+
+- `localhost:8000/v2/models/model1` - requesting latest model version
+- `localhost:8000/v2/models/model1/versions/2` - requesting model version number 2 (an optional trailing slash, e.g. `/versions/2/`, is also accepted)
+
+## See Also
+
+- [OpenVINO Model Server](https://github.com/openvinotoolkit/model_server)
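The model parameter schema described above, `<service_address>/v2/models/<model_name>[/versions/<model_version>[/]]`, can be illustrated with a small parser. This is a sketch under stated assumptions, not the adapter's actual implementation: `parse_model_target` and `ModelTarget` are hypothetical names, and the version is assumed to be a decimal number.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern that mirrors the documented schema:
# <address>:<port>/v2/models/<model_name>[/versions/<model_version>[/]]
_MODEL_TARGET = re.compile(
    r"^(?P<address>[^/\s:]+):(?P<port>\d+)"
    r"/v2/models/(?P<name>[^/\s]+)"
    r"(?:/versions/(?P<version>\d+)/?)?$"
)


class ModelTarget(NamedTuple):
    service_address: str
    model_name: str
    model_version: Optional[int]  # None stands for "latest"


def parse_model_target(target: str) -> ModelTarget:
    """Split a model parameter into service address, model name and version."""
    match = _MODEL_TARGET.match(target)
    if match is None:
        raise ValueError(f"not a valid OVMS model target: {target!r}")
    version = match.group("version")
    return ModelTarget(
        service_address=f"{match.group('address')}:{match.group('port')}",
        model_name=match.group("name"),
        model_version=int(version) if version is not None else None,
    )
```

With this sketch, a plain file path such as `model.xml` is rejected with `ValueError`, which is one way a caller could tell a served model apart from a local one.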