Commit aee593d

docs(FR-2571): address 26.4 docs feedback — remove legacy model serving sections and apply KO wording fixes
Resolves Sujin's feedback thread for PRs #6760 and #6767:

- Remove legacy (pre-26.4.0) sections: Auto Scaling Rules, Model Store (browsing/details/clone/run), Admin Model Store Management (create/edit/delete/scan). Promote the V2 subsections to main section content.
- Drop `:::note` version-gate blocks, `#### (version 26.4.0 and later)` headings, and associated HTML anchors now that only the 26.4.0 UI is documented.
- Remove the `Scan Project Model Cards` subsection entirely (no longer in UI).
- Rewrite Auto Scaling `Step Size` direction with explicit threshold formulas (`[minThreshold] < [metric]`, `[metric] < [maxThreshold]`, `[minThreshold] < [metric] < [maxThreshold]`) instead of prose about `+`/`-`/`±`.
- Korean translations:
  - Translate hardcoded 'No compatible presets available. This model cannot be deployed.' to Korean.
  - Apply "English → Korean (English)" formatting consistency in Model Store field lists.
  - `Edit` → `수정` on Service Info card, routing-info alert, revision edit behavior note.
  - `Edit`/`Confirm` → `수정`/`업데이트` in Modifying a Service section.
- Rewrite `Access Level Internal` description across all languages: clarify that Internal is visible only to administrators of the owning domain and project.
- Simplify ripple references to the deploy button (`Run this model` (`Deploy` on 26.4.0) → `Deploy`) in EN/KO/JA/TH.
- Add TODO(Recapture) markers for images that still reference legacy or placeholder states (`auto_scaling_rules_v2.png`, `service_launcher_runtime_variant.png`).
1 parent 56dbd79 commit aee593d

4 files changed

Lines changed: 323 additions & 799 deletions

packages/backend.ai-webui-docs/src/en/model_serving/model_serving.md

Lines changed: 50 additions & 165 deletions
@@ -72,7 +72,7 @@ To use the Model Service, you need to follow the steps below:
7272
:::tip
7373
As an alternative workflow, you can browse pre-configured models in the
7474
[Model Store](#model-store) and deploy them with a single click using the
75-
`Run this model` button (renamed `Deploy` on version 26.4.0 and later).
75+
`Deploy` button.
7676
:::
7777

7878
<a id="model-definition-guide"></a>
@@ -293,10 +293,10 @@ please refer to the [Explore Folder](#explore-folder) section.
293293
The service definition file (`service-definition.toml`) allows administrators to pre-configure the resources, environment, and runtime settings required for a model service. When this file is present in a model folder, the system uses these settings as default values when creating a service.
294294
295295
Both `model-definition.yaml` and `service-definition.toml` must be present in the
296-
model folder to enable the `Run this model` button (`Deploy` on version 26.4.0 and
297-
later) on the Model Store page. These two files work together: the model definition
298-
specifies the model and inference server configuration, while the service definition
299-
specifies the runtime environment, resource allocation, and environment variables.
296+
model folder to enable the `Deploy` button on the Model Store page. These two files
297+
work together: the model definition specifies the model and inference server
298+
configuration, while the service definition specifies the runtime environment,
299+
resource allocation, and environment variables.
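
The two-file requirement above can be checked locally before uploading a model folder. The following is a minimal sketch, not the WebUI's actual logic: the helper names `missing_definition_files` and `can_deploy` are hypothetical, and it only assumes that both files sit at the top level of the folder.

```python
from pathlib import Path

# File names taken from the paragraph above; both must exist for `Deploy`.
REQUIRED_FILES = ("model-definition.yaml", "service-definition.toml")


def missing_definition_files(model_folder: Path) -> list[str]:
    """Return the required definition files that are absent from the folder."""
    return [name for name in REQUIRED_FILES if not (model_folder / name).is_file()]


def can_deploy(model_folder: Path) -> bool:
    """Hypothetical check: True only when both definition files are present."""
    return not missing_definition_files(model_folder)
```

For example, `missing_definition_files(Path("./my-model"))` lists whichever of the two files still needs to be added.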
300300
301301
The service definition file follows the TOML format with sections organized by runtime variant. Each section configures a specific aspect of the service:
302302
@@ -340,10 +340,10 @@ selected runtime variant when creating the service.
340340
:::
341341

342342
:::note
343-
When a service is created from the Model Store using the `Run this model` button
344-
(`Deploy` on version 26.4.0 and later), the settings from `service-definition.toml`
345-
are applied automatically. If you later need to adjust the resource allocation, you
346-
can modify the service through the Model Serving page.
343+
When a service is created from the Model Store using the `Deploy` button, the
344+
settings from `service-definition.toml` are applied automatically. If you later
345+
need to adjust the resource allocation, you can modify the service through the
346+
Model Serving page.
347347
:::
348348

349349
## Serving Page Overview
@@ -380,6 +380,7 @@ First, provide a service name. The following fields are available:
380380
For runtime variants such as `vLLM`, `SGLang`, `NVIDIA NIM`, or `Modular MAX`, there is no need to configure a `model-definition` file in your model folder. Instead, the system handles the model configuration automatically based on the selected variant.
381381

382382
![](../images/service_launcher_runtime_variant.png)
383+
<!-- TODO: Recapture — current image shows the runtime-variant dropdown with the vLLM entry still visible; replace with the post-search state where the vLLM result is no longer shown (reference: doc figure 15.7) -->
383384

384385
#### Model Definition Mode (Custom Runtime Only)
385386

@@ -628,59 +629,10 @@ Model definition and runtime parameters are distinct concepts stored separately
628629

629630
### Auto Scaling Rules
630631

631-
You can configure auto scaling rules for the model service.
632-
Based on the defined rules, the number of replicas is automatically reduced during low usage to conserve resources,
633-
and increased during high usage to prevent request delays or failures.
634-
635-
![](../images/auto_scaling_rules.png)
636-
637-
Click the `Add Rules` button to add a new rule. When you click the button, a modal appears
638-
where you can add a rule. Each field in the modal is described below:
639-
640-
- **Type**: Define the rule. Select either `Scale Out` or `Scale In` based on the scope of the rule.
641-
642-
- **Metric Source**: Inference Framework or kernel.
643-
644-
- Inference Framework: Average value taken from every replica. Supported only if AppProxy reports the inference metrics.
645-
- Kernel: Average value taken from every kernel backing the endpoint.
646-
647-
- **Condition**: Set the condition under which the auto scaling rule will be applied.
648-
649-
- **Metric Name**: The name of the metric to be compared. You can freely input any metric supported by the runtime environment.
650-
- **Comparator**: Method to compare live metrics with threshold value.
651-
652-
- LESS_THAN: Rule triggered when current metric value goes below the threshold defined
653-
- LESS_THAN_OR_EQUAL: Rule triggered when current metric value goes below or equals the threshold defined
654-
- GREATER_THAN: Rule triggered when current metric value goes above the threshold defined
655-
- GREATER_THAN_OR_EQUAL: Rule triggered when current metric value goes above or equals the threshold defined
656-
657-
- **Threshold**: A reference value to determine whether the scaling condition is met.
658-
659-
- **Step Size**: Size of step of the replica count to be changed when rule is triggered.
660-
Can be represented as both positive and negative value.
661-
When defined as negative, the rule will decrease number of replicas.
662-
663-
- **Max/Min Replicas**: Sets a maximum/minimum value for the replica count of the endpoint.
664-
Rule will not be triggered if the potential replica count gets above/below this value.
665-
666-
- **CoolDown Seconds**: Duration in seconds to skip reapplying the rule right after rule is first triggered.
667-
668-
![](../images/auto_scaling_rules_modal.png)
669-
670-
:::note
671-
From Backend.AI version **26.4.0** onwards, Auto Scaling Rules have been redesigned with
672-
Prometheus preset support and a new condition model. If you are on an older version, the
673-
description above still applies. Otherwise, refer to
674-
[Auto Scaling Rules (version 26.4.0 and later)](#auto-scaling-rules-version-26-4-0-and-later) below.
675-
:::
676-
677-
<a id="auto-scaling-rules-version-26-4-0-and-later"></a>
678-
679-
#### Auto Scaling Rules (version 26.4.0 and later)
680-
681-
On Backend.AI version 26.4.0 and later, Auto Scaling Rules are redesigned with a Prometheus metric source, a segmented condition control, and a richer rule list.
632+
Auto Scaling Rules automatically increase or decrease the number of replicas for a model service based on live metrics. This conserves resources during low usage and prevents request delays or failures during high usage.
682633

683634
![](../images/auto_scaling_rules_v2.png)
635+
<!-- TODO: Recapture with a realistic example rule (current screenshot contains placeholder test values such as `123123`) -->
684636

685637
The rule list provides:
686638

@@ -699,7 +651,12 @@ Click the `Add Rules` button to open the **Add Auto Scaling Rule** editor. To mo
699651
- **Single**: Defines a single comparison `Metric <op> Threshold`, where `<op>` is either `>` or `<`.
700652
- **Range**: Defines a range `Min Threshold < Metric < Max Threshold`. Both thresholds are required; the minimum must be less than the maximum.
701653

702-
- **Step Size**: A positive integer specifying how many replicas to add or remove per scaling event. The direction (add or remove) is derived automatically from which threshold is configured, so you only specify the magnitude.
654+
- **Step Size**: A positive integer specifying how many replicas to add or remove per scaling event. The direction (add or remove) is derived automatically from which threshold is configured:
655+
656+
- Only a minimum threshold is set → `[minThreshold] < [metric]`. Replicas are scaled **in** when the metric falls below the threshold.
657+
- Only a maximum threshold is set → `[metric] < [maxThreshold]`. Replicas are scaled **out** when the metric rises above the threshold.
658+
- Both thresholds are set → `[minThreshold] < [metric] < [maxThreshold]`. Replicas are scaled in or out depending on which boundary the metric crosses.
659+
703660
- **Time Window**: The time window, in seconds, over which the metric is aggregated and evaluated for scaling. This replaces the legacy `CoolDown Seconds` field and has a different meaning.
704661
- **Min Replicas** and **Max Replicas**: The lower and upper bounds that auto-scaling enforces on the replica count. Auto-scaling will not reduce the number of replicas below **Min Replicas** or increase it above **Max Replicas**.
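
The interaction between thresholds, **Step Size**, and the replica bounds can be sketched as follows. This is an illustration based only on the prose descriptions above (scale in below the minimum threshold, scale out above the maximum threshold, never leave the **Min Replicas**/**Max Replicas** range). The `ScalingRule` class and `next_replica_count` function are hypothetical names, not part of Backend.AI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingRule:
    """Hypothetical container mirroring the editor fields described above."""
    step_size: int
    min_replicas: int
    max_replicas: int
    min_threshold: Optional[float] = None
    max_threshold: Optional[float] = None


def next_replica_count(rule: ScalingRule, metric: float, current: int) -> int:
    """Return the replica count after evaluating one aggregated metric value."""
    if rule.step_size <= 0:
        raise ValueError("Step Size must be a positive integer")
    if rule.min_threshold is None and rule.max_threshold is None:
        raise ValueError("at least one threshold is required")
    if (rule.min_threshold is not None and rule.max_threshold is not None
            and rule.min_threshold >= rule.max_threshold):
        raise ValueError("the minimum threshold must be less than the maximum")

    delta = 0
    if rule.min_threshold is not None and metric < rule.min_threshold:
        delta = -rule.step_size  # metric fell below the minimum: scale in
    elif rule.max_threshold is not None and metric > rule.max_threshold:
        delta = rule.step_size   # metric rose above the maximum: scale out
    if delta == 0:
        return current           # no boundary crossed: leave replicas unchanged

    # Clamp so the result stays within [Min Replicas, Max Replicas].
    return max(rule.min_replicas, min(rule.max_replicas, current + delta))
```

Under these assumptions, the sign of the change comes from which boundary is crossed, which is why the editor only asks for a positive magnitude.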
705662

@@ -756,7 +713,7 @@ If a route has encountered an error, clicking the error indicator on the route r
756713

757714
### Modifying a Service
758715

759-
Click the `Edit` button on the endpoint detail page to modify a model service. The service launcher opens with previously entered fields already filled in. You can optionally modify only the fields you wish to change. After modifying the fields, click `Confirm` to apply the changes.
716+
Click the `Edit` button on the endpoint detail page to modify a model service. The service launcher opens with previously entered fields already filled in. You can optionally modify only the fields you wish to change. After modifying the fields, click the `Update` button to apply the changes.
760717

761718
![](../images/edit_model_service.png)
762719

@@ -833,63 +790,10 @@ To use the model, you will need the following information:
833790

834791
The Model Store provides a card-based gallery of pre-configured models that you can browse, search, and deploy. You can access the Model Store from the sidebar menu.
835792

836-
![](../images/model_store_page.png)
793+
![](../images/model_store_page_v2.png)
837794

838795
### Browsing and Searching Models
839796

840-
You can search for models by name, description, task, category, or label using the search bar at the top of the page. Additionally, you can use the filter dropdowns to narrow results:
841-
842-
- **Category**: Filter by model category (e.g., LLM).
843-
- **Task**: Filter by task type (e.g., text-generation).
844-
- **Label**: Filter by model labels.
845-
846-
### Model Card Details
847-
848-
Click on a model card to view its details in a modal. The model card modal displays:
849-
850-
- **Title**, **Author**, and **Version**
851-
- **Description** and **README**
852-
- **Task**, **Category**, and **Architecture**
853-
- **Framework** and **Labels**
854-
- **License**
855-
- **Minimum Resources** required to run the model
856-
- A link to the model storage folder
857-
858-
![](../images/model_card_detail_modal.png)
859-
860-
### Cloning a Model
861-
862-
Click the `Clone to a folder` button on the model card to clone the model folder to your own storage. A confirmation dialog will appear where you can specify the destination folder name.
863-
864-
![](../images/model_clone_dialog.png)
865-
866-
### Running a Model from Model Store
867-
868-
Click the `Run this model` button on the model card to deploy the model as a service. This requires both `model-definition.yaml` and `service-definition.toml` to be present in the model folder.
869-
870-
- If only one runtime variant is configured in the service definition, the service is launched automatically with the pre-configured settings.
871-
- If multiple runtime variants are available, you are redirected to the service launcher page to select one.
872-
873-
:::note
874-
When a service is created from the Model Store, the settings from
875-
`service-definition.toml` are applied automatically. You can modify the
876-
service later through the Serving page.
877-
:::
878-
879-
:::note
880-
From Backend.AI version **26.4.0** onwards, the Model Store has been redesigned. If you are on
881-
an older version, the description above still applies. Otherwise, refer to
882-
[Model Store (version 26.4.0 and later)](#model-store-version-26-4-0-and-later) below.
883-
:::
884-
885-
<a id="model-store-version-26-4-0-and-later"></a>
886-
887-
### Model Store (version 26.4.0 and later)
888-
889-
On Backend.AI version 26.4.0 and later, the Model Store is redesigned with a simplified browsing experience, a card-detail drawer, and a streamlined deploy flow that replaces the legacy browse/detail/run workflow.
890-
891-
![](../images/model_store_page_v2.png)
892-
893797
The page uses a search and sort layout at the top:
894798

895799
- **Search Models**: Use the **Filter By Name** property filter to search model cards by name.
@@ -902,6 +806,8 @@ If the `MODEL_STORE` project is not set up on the server, the page shows a *Mode
902806

903807
The list is paginated at the bottom. You can change the page size between `10`, `20`, and `50` entries.
904808

809+
### Model Card Details
810+
905811
Click a card to open the model card drawer on the right side of the page. The drawer shows the model title and description at the top, followed by the task, category, labels, and license tags, and then a details list with the following items:
906812

907813
- **Author**
@@ -916,7 +822,11 @@ If the model card includes a README, it is rendered as a `README.md` card at the
916822

917823
![](../images/model_card_detail_drawer.png)
918824

919-
To clone a model folder in version 26.4.0 and later, use the [Data](../vfolder/vfolder.md) page directly, since the Model Store drawer no longer provides a dedicated Clone button.
825+
### Cloning a Model
826+
827+
To clone a model folder, use the [Data](../vfolder/vfolder.md) page directly. The Model Store drawer does not provide a dedicated Clone button.
828+
829+
### Deploying a Model
920830

921831
Click the **Deploy** button in the drawer header to deploy the model as a service. The deploy flow behaves in one of two ways:
922832

@@ -950,9 +860,24 @@ The Admin Serving page has two tabs:
950860

951861
### Admin Model Store Management
952862

953-
Superadmins can manage model cards through the **Model Store Management** tab on the Admin Serving page. This tab provides a table view of all model cards with the following columns: **Name**, **Title**, **Task**, **Category**, **Labels**, **Created At**, and **Controls**.
863+
Superadmins can manage model cards through the **Model Store Management** tab on the Admin Serving page.
954864

955-
![](../images/admin_model_card_list.png)
865+
![](../images/admin_model_card_list_v2.png)
866+
867+
The list provides the following columns:
868+
869+
- **Name**: The unique identifier of the model card.
870+
- **Title**: The human-readable display name.
871+
- **Category**: The model category (e.g., LLM).
872+
- **Task**: The inference task type (e.g., text-generation).
873+
- **Access Level**: Shows a green `Public` tag when the model card is publicly accessible, or a default `Private` tag otherwise.
874+
- **Domain**: The domain that owns the model card.
875+
- **Project**: The project that owns the model card.
876+
- **Created At**: The timestamp when the model card was created.
877+
878+
You can filter the list by **Name** using the property filter bar at the top. Edit and delete action icons are shown directly in the **Name** cell of each row.
879+
880+
To delete multiple model cards at once, select the rows you want to remove using the checkboxes and click the red trash-bin button next to the selection count. A confirmation dialog appears before the cards are deleted.
956881

957882
#### Creating a Model Card
958883

@@ -973,56 +898,16 @@ Click the `Create Model Card` button to open the creation modal. Fill in the fol
973898
- **Domain**: The domain to associate the model card with.
974899
- **Project ID** (required): The project that owns the model card.
975900
- **VFolder** (required): The storage folder containing the model files.
976-
- **Access Level**: Set to `Internal` (visible within the domain) or `Public` (visible to all).
901+
- **Access Level**: Controls who can see the model card in the user-facing Model Store.
902+
903+
* `Internal`: Visible only to administrators of the owning domain and project. Regular users cannot see internal cards in their Model Store.
904+
* `Public`: Visible to all users who have access to the owning project.
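
The two access levels can be read as a simple visibility predicate. The sketch below only restates the descriptions above. `ModelCard`, `Viewer`, and `is_visible` are hypothetical names, and how the server actually represents administrator roles is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    name: str
    domain: str
    project: str
    access_level: str  # "Internal" or "Public"


@dataclass(frozen=True)
class Viewer:
    projects: frozenset   # projects the user can access (assumed representation)
    admin_of: frozenset   # (domain, project) pairs the user administers (assumed)


def is_visible(card: ModelCard, viewer: Viewer) -> bool:
    """Return True if the card would appear in the viewer's Model Store."""
    if card.access_level == "Public":
        # Public: anyone with access to the owning project.
        return card.project in viewer.projects
    if card.access_level == "Internal":
        # Internal: only administrators of the owning domain and project.
        return (card.domain, card.project) in viewer.admin_of
    raise ValueError(f"unknown access level: {card.access_level!r}")
```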
977905

978906
#### Editing a Model Card
979907

980-
Click the edit icon in the **Controls** column to modify an existing model card. The edit modal opens with previously entered fields already filled in.
908+
Click the edit icon next to the model card name to modify an existing model card. The edit modal opens with previously entered fields already filled in.
981909

982910
#### Deleting Model Cards
983911

984-
You can delete individual model cards by clicking the delete icon in the **Controls** column, or perform bulk deletion by selecting multiple model cards and clicking `Delete Selected`.
985-
986-
#### Scanning Project Model Cards
987-
988-
Click the `Scan Project Model Cards` button to automatically scan a project's model folders and create model cards for any folders that contain valid model definitions. The scan results show the number of model cards created and updated.
989-
990-
:::note
991-
The **Scan Project Model Cards** button is not available on Backend.AI version 26.4.0 and later.
992-
:::
993-
994-
:::note
995-
From Backend.AI version **26.4.0** onwards, the Admin Model Store Management tab has been
996-
redesigned. If you are on an older version, the description above still applies. Otherwise,
997-
refer to [Admin Model Store Management (version 26.4.0 and later)](#admin-model-store-management-version-26-4-0-and-later) below.
998-
:::
999-
1000-
<a id="admin-model-store-management-version-26-4-0-and-later"></a>
1001-
1002-
#### Admin Model Store Management (version 26.4.0 and later)
1003-
1004-
On Backend.AI version 26.4.0 and later, the **Model Store Management** tab presents a redesigned model card list.
1005-
1006-
![](../images/admin_model_card_list_v2.png)
1007-
1008-
The list provides the following columns:
1009-
1010-
- **Name**: The unique identifier of the model card.
1011-
- **Title**: The human-readable display name.
1012-
- **Category**: The model category (e.g., LLM).
1013-
- **Task**: The inference task type (e.g., text-generation).
1014-
- **Access Level**: Shows a green `Public` tag when the model card is publicly accessible, or a default `Private` tag otherwise.
1015-
- **Domain**: The domain that owns the model card.
1016-
- **Project**: The project that owns the model card.
1017-
- **Created At**: The timestamp when the model card was created.
1018-
1019-
You can filter the list by **Name** using the property filter bar at the top. Edit and delete action icons are shown directly in the **Name** cell of each row.
1020-
1021-
To delete multiple model cards at once, select the rows you want to remove using the checkboxes and click the red trash-bin button next to the selection count. A confirmation dialog appears before the cards are deleted.
1022-
1023-
:::note
1024-
The Create, Edit, and Delete dialogs for individual model cards are the same as in the legacy
1025-
version. See [Creating a Model Card](#creating-a-model-card),
1026-
[Editing a Model Card](#editing-a-model-card), and [Deleting Model Cards](#deleting-model-cards).
1027-
:::
912+
You can delete an individual model card by clicking the delete icon next to its name, or perform bulk deletion by selecting multiple model cards with the row checkboxes and clicking the red trash-bin button next to the selection count.
1028913
