docs(FR-2571): address 26.4 docs feedback — remove legacy model serving sections and apply KO wording fixes
Resolves Sujin's feedback thread for PRs #6760 and #6767:
- Remove legacy (pre-26.4.0) sections: Auto Scaling Rules, Model Store (browsing/details/clone/run), Admin Model Store Management (create/edit/delete/scan). Promote the V2 subsections to main section content.
- Drop `:::note` version-gate blocks, `#### (version 26.4.0 and later)` headings, and associated HTML anchors now that only the 26.4.0 UI is documented.
- Remove the `Scan Project Model Cards` subsection entirely (no longer in UI).
- Rewrite Auto Scaling `Step Size` direction with explicit threshold formulas (`[minThreshold] < [metric]`, `[metric] < [maxThreshold]`, `[minThreshold] < [metric] < [maxThreshold]`) instead of prose about `+`/`-`/`±`.
- Korean translations:
  - Translate hardcoded 'No compatible presets available. This model cannot be deployed.' to Korean.
  - Apply the "English → Korean (English)" formatting consistently in Model Store field lists.
  - `Edit` → `수정` on Service Info card, routing-info alert, revision edit behavior note.
  - `Edit`/`Confirm` → `수정`/`업데이트` in Modifying a Service section.
- Rewrite `Access Level Internal` description across all languages: clarify that Internal is visible only to administrators of the owning domain and project.
- Simplify ripple references to the deploy button (`Run this model` (`Deploy` on 26.4.0) → `Deploy`) in EN/KO/JA/TH.
- Add TODO(Recapture) markers for images that still reference legacy or placeholder states (`auto_scaling_rules_v2.png`, `service_launcher_runtime_variant.png`).
@@ -72,7 +72,7 @@ To use the Model Service, you need to follow the steps below:
72
72
:::tip
73
73
As an alternative workflow, you can browse pre-configured models in the
74
74
[Model Store](#model-store) and deploy them with a single click using the
75
-
`Run this model` button (renamed `Deploy` on version 26.4.0 and later).
75
+
`Deploy` button.
76
76
:::
77
77
78
78
<a id="model-definition-guide"></a>
@@ -293,10 +293,10 @@ please refer to the [Explore Folder](#explore-folder) section.
293
293
The service definition file (`service-definition.toml`) allows administrators to pre-configure the resources, environment, and runtime settings required for a model service. When this file is present in a model folder, the system uses these settings as default values when creating a service.
294
294
295
295
Both `model-definition.yaml` and `service-definition.toml` must be present in the
296
-
model folder to enable the `Run this model` button (`Deploy` on version 26.4.0 and
297
-
later) on the Model Store page. These two files work together: the model definition
298
-
specifies the model and inference server configuration, while the service definition
299
-
specifies the runtime environment, resource allocation, and environment variables.
296
+
model folder to enable the `Deploy` button on the Model Store page. These two files
297
+
work together: the model definition specifies the model and inference server
298
+
configuration, while the service definition specifies the runtime environment,
299
+
resource allocation, and environment variables.
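
The two-file requirement above can be checked before publishing a model folder. The following is a minimal sketch, assuming the model folder is available as a local directory; the helper names `missing_deploy_files` and `can_deploy` are hypothetical and are not part of Backend.AI.

```python
from pathlib import Path

# File names taken from the documentation above.
REQUIRED_FILES = ("model-definition.yaml", "service-definition.toml")


def missing_deploy_files(model_folder: Path) -> list[str]:
    """Return the required file names that are absent from model_folder."""
    return [name for name in REQUIRED_FILES if not (model_folder / name).is_file()]


def can_deploy(model_folder: Path) -> bool:
    """True only when both definition files are present (hypothetical check)."""
    return not missing_deploy_files(model_folder)
```

For example, `missing_deploy_files(Path("my-model"))` lists whichever of the two files still needs to be added before the `Deploy` button would be expected to be enabled.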
300
300
301
301
The service definition file follows the TOML format with sections organized by runtime variant. Each section configures a specific aspect of the service:
302
302
@@ -340,10 +340,10 @@ selected runtime variant when creating the service.
340
340
:::
341
341
342
342
:::note
343
-
When a service is created from the Model Store using the `Run this model` button
344
-
(`Deploy` on version 26.4.0 and later), the settings from `service-definition.toml`
345
-
are applied automatically. If you later need to adjust the resource allocation, you
346
-
can modify the service through the Model Serving page.
343
+
When a service is created from the Model Store using the `Deploy` button, the
344
+
settings from `service-definition.toml` are applied automatically. If you later
345
+
need to adjust the resource allocation, you can modify the service through the
346
+
Model Serving page.
347
347
:::
348
348
349
349
## Serving Page Overview
@@ -380,6 +380,7 @@ First, provide a service name. The following fields are available:
380
380
For runtime variants such as `vLLM`, `SGLang`, `NVIDIA NIM`, or `Modular MAX`, there is no need to configure a `model-definition` file in your model folder. Instead, the system handles the model configuration automatically based on the selected variant.
<!-- TODO: Recapture — current image shows the runtime-variant dropdown with the vLLM entry still visible; replace with the post-search state where the vLLM result is no longer shown (reference: doc figure 15.7) -->
383
384
384
385
#### Model Definition Mode (Custom Runtime Only)
385
386
@@ -628,59 +629,10 @@ Model definition and runtime parameters are distinct concepts stored separately
628
629
629
630
### Auto Scaling Rules
630
631
631
-
You can configure auto scaling rules for the model service.
632
-
Based on the defined rules, the number of replicas is automatically reduced during low usage to conserve resources,
633
-
and increased during high usage to prevent request delays or failures.
634
-
635
-

636
-
637
-
Click the `Add Rules` button to add a new rule. When you click the button, a modal appears
638
-
where you can add a rule. Each field in the modal is described below:
639
-
640
-
- **Type**: Define the rule. Select either `Scale Out` or `Scale In` based on the scope of the rule.
641
-
642
-
- **Metric Source**: Inference Framework or Kernel.
643
-
644
-
- Inference Framework: Average value taken from every replica. Supported only if AppProxy reports the inference metrics.
645
-
- Kernel: Average value taken from every kernel backing the endpoint.
646
-
647
-
- **Condition**: Set the condition under which the auto scaling rule will be applied.
648
-
649
-
- **Metric Name**: The name of the metric to be compared. You can freely input any metric supported by the runtime environment.
650
-
- **Comparator**: Method to compare live metrics with threshold value.
651
-
652
-
- LESS_THAN: Rule triggered when current metric value goes below the threshold defined
653
-
- LESS_THAN_OR_EQUAL: Rule triggered when current metric value goes below or equals the threshold defined
654
-
- GREATER_THAN: Rule triggered when current metric value goes above the threshold defined
655
-
- GREATER_THAN_OR_EQUAL: Rule triggered when current metric value goes above or equals the threshold defined
656
-
657
-
- **Threshold**: A reference value to determine whether the scaling condition is met.
658
-
659
-
- **Step Size**: Size of step of the replica count to be changed when rule is triggered.
660
-
Can be represented as both positive and negative value.
661
-
When defined as negative, the rule will decrease number of replicas.
662
-
663
-
- **Max/Min Replicas**: Sets a maximum/minimum value for the replica count of the endpoint.
664
-
Rule will not be triggered if the potential replica count gets above/below this value.
665
-
666
-
- **CoolDown Seconds**: Duration in seconds to skip reapplying the rule right after rule is first triggered.
667
-
668
-

669
-
670
-
:::note
671
-
From Backend.AI version **26.4.0** onwards, Auto Scaling Rules have been redesigned with
672
-
Prometheus preset support and a new condition model. If you are on an older version, the
673
-
description above still applies. Otherwise, refer to
674
-
[Auto Scaling Rules (version 26.4.0 and later)](#auto-scaling-rules-version-26-4-0-and-later) below.
#### Auto Scaling Rules (version 26.4.0 and later)
680
-
681
-
On Backend.AI version 26.4.0 and later, Auto Scaling Rules are redesigned with a Prometheus metric source, a segmented condition control, and a richer rule list.
632
+
Auto Scaling Rules automatically increase or decrease the number of replicas for a model service based on live metrics. This conserves resources during low usage and prevents request delays or failures during high usage.
682
633
683
634

635
+
<!-- TODO: Recapture with a realistic example rule (current screenshot contains placeholder test values such as `123123`) -->
684
636
685
637
The rule list provides:
686
638
@@ -699,7 +651,12 @@ Click the `Add Rules` button to open the **Add Auto Scaling Rule** editor. To mo
699
651
- **Single**: Defines a single comparison `Metric <op> Threshold`, where `<op>` is either `>` or `<`.
700
652
- **Range**: Defines a range `Min Threshold < Metric < Max Threshold`. Both thresholds are required; the minimum must be less than the maximum.
701
653
702
-
- **Step Size**: A positive integer specifying how many replicas to add or remove per scaling event. The direction (add or remove) is derived automatically from which threshold is configured, so you only specify the magnitude.
654
+
- **Step Size**: A positive integer specifying how many replicas to add or remove per scaling event. The direction (add or remove) is derived automatically from which threshold is configured:
655
+
656
+
  - Only a minimum threshold is set → `[minThreshold] < [metric]`. Replicas are scaled **in** when the metric falls below the threshold.
657
+
  - Only a maximum threshold is set → `[metric] < [maxThreshold]`. Replicas are scaled **out** when the metric rises above the threshold.
658
+
  - Both thresholds are set → `[minThreshold] < [metric] < [maxThreshold]`. Replicas are scaled in or out depending on which boundary the metric crosses.
659
+
703
660
- **Time Window**: The time window, in seconds, over which the metric is aggregated and evaluated for scaling. This replaces the legacy `CoolDown Seconds` field and has a different meaning.
704
661
- **Min Replicas** and **Max Replicas**: The lower and upper bounds that auto-scaling enforces on the replica count. Auto-scaling will not reduce the number of replicas below **Min Replicas** or increase it above **Max Replicas**.
705
662
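
The Step Size direction and the replica bounds described above can be sketched as a small function. This is an illustration only, not the Backend.AI implementation: the function name `next_replica_count`, its parameters, and the choice to clamp the result to the replica bounds are assumptions. It follows the prose of the rewritten bullets: scale in when the metric falls below the minimum threshold, scale out when it rises above the maximum threshold.

```python
from typing import Optional


def next_replica_count(
    current: int,
    metric: float,
    step_size: int,
    min_replicas: int,
    max_replicas: int,
    *,
    min_threshold: Optional[float] = None,
    max_threshold: Optional[float] = None,
) -> int:
    """Hypothetical sketch: derive the scaling direction from the thresholds."""
    if step_size <= 0:
        raise ValueError("step_size must be a positive integer")
    if min_threshold is None and max_threshold is None:
        raise ValueError("at least one threshold must be set")
    if (
        min_threshold is not None
        and max_threshold is not None
        and not min_threshold < max_threshold
    ):
        raise ValueError("min_threshold must be less than max_threshold")

    desired = current
    if max_threshold is not None and metric > max_threshold:
        desired = current + step_size  # metric rose above the maximum: scale out
    elif min_threshold is not None and metric < min_threshold:
        desired = current - step_size  # metric fell below the minimum: scale in

    # Assumption: the bounds are enforced by clamping the resulting count.
    return max(min_replicas, min(max_replicas, desired))
```

Under these assumptions, the caller supplies only the magnitude (`step_size`); the sign of the change comes from which boundary the aggregated metric crossed.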
@@ -756,7 +713,7 @@ If a route has encountered an error, clicking the error indicator on the route r
756
713
757
714
### Modifying a Service
758
715
759
-
Click the `Edit` button on the endpoint detail page to modify a model service. The service launcher opens with previously entered fields already filled in. You can optionally modify only the fields you wish to change. After modifying the fields, click `Confirm` to apply the changes.
716
+
Click the `Edit` button on the endpoint detail page to modify a model service. The service launcher opens with previously entered fields already filled in. You can optionally modify only the fields you wish to change. After modifying the fields, click the `Update` button to apply the changes.
760
717
761
718

762
719
@@ -833,63 +790,10 @@ To use the model, you will need the following information:
833
790
834
791
The Model Store provides a card-based gallery of pre-configured models that you can browse, search, and deploy. You can access the Model Store from the sidebar menu.
835
792
836
-

793
+

837
794
838
795
### Browsing and Searching Models
839
796
840
-
You can search for models by name, description, task, category, or label using the search bar at the top of the page. Additionally, you can use the filter dropdowns to narrow results:
841
-
842
-
- **Category**: Filter by model category (e.g., LLM).
843
-
- **Task**: Filter by task type (e.g., text-generation).
844
-
- **Label**: Filter by model labels.
845
-
846
-
### Model Card Details
847
-
848
-
Click on a model card to view its details in a modal. The model card modal displays:
849
-
850
-
- **Title**, **Author**, and **Version**
851
-
- **Description** and **README**
852
-
- **Task**, **Category**, and **Architecture**
853
-
- **Framework** and **Labels**
854
-
- **License**
855
-
- **Minimum Resources** required to run the model
856
-
- A link to the model storage folder
857
-
858
-

859
-
860
-
### Cloning a Model
861
-
862
-
Click the `Clone to a folder` button on the model card to clone the model folder to your own storage. A confirmation dialog will appear where you can specify the destination folder name.
863
-
864
-

865
-
866
-
### Running a Model from Model Store
867
-
868
-
Click the `Run this model` button on the model card to deploy the model as a service. This requires both `model-definition.yaml` and `service-definition.toml` to be present in the model folder.
869
-
870
-
- If only one runtime variant is configured in the service definition, the service is launched automatically with the pre-configured settings.
871
-
- If multiple runtime variants are available, you are redirected to the service launcher page to select one.
872
-
873
-
:::note
874
-
When a service is created from the Model Store, the settings from
875
-
`service-definition.toml` are applied automatically. You can modify the
876
-
service later through the Serving page.
877
-
:::
878
-
879
-
:::note
880
-
From Backend.AI version **26.4.0** onwards, the Model Store has been redesigned. If you are on
881
-
an older version, the description above still applies. Otherwise, refer to
882
-
[Model Store (version 26.4.0 and later)](#model-store-version-26-4-0-and-later) below.
883
-
:::
884
-
885
-
<a id="model-store-version-26-4-0-and-later"></a>
886
-
887
-
### Model Store (version 26.4.0 and later)
888
-
889
-
On Backend.AI version 26.4.0 and later, the Model Store is redesigned with a simplified browsing experience, a card-detail drawer, and a streamlined deploy flow that replaces the legacy browse/detail/run workflow.
890
-
891
-

892
-
893
797
The page uses a search and sort layout at the top:
894
798
895
799
- **Search Models**: Use the **Filter By Name** property filter to search model cards by name.
@@ -902,6 +806,8 @@ If the `MODEL_STORE` project is not set up on the server, the page shows a *Mode
902
806
903
807
The list is paginated at the bottom. You can change the page size between `10`, `20`, and `50` entries.
904
808
809
+
### Model Card Details
810
+
905
811
Click a card to open the model card drawer on the right side of the page. The drawer shows the model title and description at the top, followed by the task, category, labels, and license tags, and then a details list with the following items:
906
812
907
813
- **Author**
@@ -916,7 +822,11 @@ If the model card includes a README, it is rendered as a `README.md` card at the
916
822
917
823

918
824
919
-
To clone a model folder in version 26.4.0 and later, use the [Data](../vfolder/vfolder.md) page directly, since the Model Store drawer no longer provides a dedicated Clone button.
825
+
### Cloning a Model
826
+
827
+
To clone a model folder, use the [Data](../vfolder/vfolder.md) page directly. The Model Store drawer does not provide a dedicated Clone button.
828
+
829
+
### Deploying a Model
920
830
921
831
Click the **Deploy** button in the drawer header to deploy the model as a service. The deploy flow behaves in one of two ways:
922
832
@@ -950,9 +860,24 @@ The Admin Serving page has two tabs:
950
860
951
861
### Admin Model Store Management
952
862
953
-
Superadmins can manage model cards through the **Model Store Management** tab on the Admin Serving page. This tab provides a table view of all model cards with the following columns: **Name**, **Title**, **Task**, **Category**, **Labels**, **Created At**, and **Controls**.
863
+
Superadmins can manage model cards through the **Model Store Management** tab on the Admin Serving page.
954
864
955
-

865
+

866
+
867
+
The list provides the following columns:
868
+
869
+
- **Name**: The unique identifier of the model card.
870
+
- **Title**: The human-readable display name.
871
+
- **Category**: The model category (e.g., LLM).
872
+
- **Task**: The inference task type (e.g., text-generation).
873
+
- **Access Level**: Shows a green `Public` tag when the model card is publicly accessible, or a default `Private` tag otherwise.
874
+
- **Domain**: The domain that owns the model card.
875
+
- **Project**: The project that owns the model card.
876
+
- **Created At**: The timestamp when the model card was created.
877
+
878
+
You can filter the list by **Name** using the property filter bar at the top. Edit and delete action icons are shown directly in the **Name** cell of each row.
879
+
880
+
To delete multiple model cards at once, select the rows you want to remove using the checkboxes and click the red trash-bin button next to the selection count. A confirmation dialog appears before the cards are deleted.
956
881
957
882
#### Creating a Model Card
958
883
@@ -973,56 +898,16 @@ Click the `Create Model Card` button to open the creation modal. Fill in the fol
973
898
- **Domain**: The domain to associate the model card with.
974
899
- **Project ID** (required): The project that owns the model card.
975
900
- **VFolder** (required): The storage folder containing the model files.
976
-
- **Access Level**: Set to `Internal` (visible within the domain) or `Public` (visible to all).
901
+
- **Access Level**: Controls who can see the model card in the user-facing Model Store.
902
+
903
+
  - `Internal`: Visible only to administrators of the owning domain and project. Regular users cannot see internal cards in their Model Store.
904
+
  - `Public`: Visible to all users who have access to the owning project.
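
The two access levels above amount to a visibility rule, sketched here for illustration. The `ModelCard` and `Viewer` types, their fields, and the `is_visible` function are hypothetical; in particular, modeling "administrators of the owning domain and project" as an admin flag plus matching domain and project membership is an assumption, not the Backend.AI permission model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    name: str
    domain: str        # owning domain
    project: str       # owning project
    access_level: str  # "Internal" or "Public"


@dataclass(frozen=True)
class Viewer:
    domain: str
    projects: frozenset[str]  # projects the viewer can access
    is_admin: bool = False    # assumption: a single admin flag


def is_visible(card: ModelCard, viewer: Viewer) -> bool:
    """Hypothetical check of whether a viewer sees a card in the Model Store."""
    in_project = card.project in viewer.projects
    if card.access_level == "Public":
        # Visible to all users who have access to the owning project.
        return in_project
    if card.access_level == "Internal":
        # Visible only to administrators of the owning domain and project.
        return viewer.is_admin and viewer.domain == card.domain and in_project
    raise ValueError(f"unknown access level: {card.access_level!r}")
```

With this sketch, a regular project member sees a `Public` card but not an `Internal` one, which matches the rewritten description.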
977
905
978
906
#### Editing a Model Card
979
907
980
-
Click the edit icon in the **Controls** column to modify an existing model card. The edit modal opens with previously entered fields already filled in.
908
+
Click the edit icon next to the model card name to modify an existing model card. The edit modal opens with previously entered fields already filled in.
981
909
982
910
#### Deleting Model Cards
983
911
984
-
You can delete individual model cards by clicking the delete icon in the **Controls** column, or perform bulk deletion by selecting multiple model cards and clicking `Delete Selected`.
985
-
986
-
#### Scanning Project Model Cards
987
-
988
-
Click the `Scan Project Model Cards` button to automatically scan a project's model folders and create model cards for any folders that contain valid model definitions. The scan results show the number of model cards created and updated.
989
-
990
-
:::note
991
-
The **Scan Project Model Cards** button is not available on Backend.AI version 26.4.0 and later.
992
-
:::
993
-
994
-
:::note
995
-
From Backend.AI version **26.4.0** onwards, the Admin Model Store Management tab has been
996
-
redesigned. If you are on an older version, the description above still applies. Otherwise,
997
-
refer to [Admin Model Store Management (version 26.4.0 and later)](#admin-model-store-management-version-26-4-0-and-later) below.
#### Admin Model Store Management (version 26.4.0 and later)
1003
-
1004
-
On Backend.AI version 26.4.0 and later, the **Model Store Management** tab presents a redesigned model card list.
1005
-
1006
-

1007
-
1008
-
The list provides the following columns:
1009
-
1010
-
- **Name**: The unique identifier of the model card.
1011
-
- **Title**: The human-readable display name.
1012
-
- **Category**: The model category (e.g., LLM).
1013
-
- **Task**: The inference task type (e.g., text-generation).
1014
-
- **Access Level**: Shows a green `Public` tag when the model card is publicly accessible, or a default `Private` tag otherwise.
1015
-
- **Domain**: The domain that owns the model card.
1016
-
- **Project**: The project that owns the model card.
1017
-
- **Created At**: The timestamp when the model card was created.
1018
-
1019
-
You can filter the list by **Name** using the property filter bar at the top. Edit and delete action icons are shown directly in the **Name** cell of each row.
1020
-
1021
-
To delete multiple model cards at once, select the rows you want to remove using the checkboxes and click the red trash-bin button next to the selection count. A confirmation dialog appears before the cards are deleted.
1022
-
1023
-
:::note
1024
-
The Create, Edit, and Delete dialogs for individual model cards are the same as in the legacy
1025
-
version. See [Creating a Model Card](#creating-a-model-card),
1026
-
[Editing a Model Card](#editing-a-model-card), and [Deleting Model Cards](#deleting-model-cards).
1027
-
:::
912
+
You can delete an individual model card by clicking the delete icon next to its name, or perform bulk deletion by selecting multiple model cards with the row checkboxes and clicking the red trash-bin button next to the selection count.