Add Update Service action to EndpointDetailPage to apply model-definition changes via new revision #6606

@agatha197

Description

Problem

When a user creates a model service via service/start and later edits the model-definition.yaml file inside the model vfolder, the change is not reflected in the running routing sessions. Restarting or scaling sessions does not help either; the only current workaround is to destroy the service and create a new one.

This makes iterating on runtime parameters, mounts, the start command, and similar settings painful.

Root Cause

Backend.AI serving has already been refactored into a Kubernetes-style Deployment / ReplicaSet / Revision model:

  • POST /services is a legacy wrapper (api/rest/service/handler.py:303 CreateLegacyDeploymentAction) that internally creates an endpoint and an initial revision (services/deployment/service.py:493-502), setting auto_activate=True.
  • Each revision captures a JSONB snapshot of model_definition in deployment_revisions.model_definition (models/deployment_revision/row.py:73-116). Endpoints themselves do not hold the definition content.
  • Routing rows point at a specific revision_id. Session provisioning (sokovan/deployment/route/executor.py:452) resolves route.revision_id to the revision snapshot, not to the vfolder file.
  • Therefore, modifying the vfolder file does nothing for the existing revision. A new revision must be created for the change to take effect.
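
To make the root cause concrete, here is a minimal sketch of the snapshot behaviour described above. The types and the resolveModelDefinition helper are hypothetical simplifications, not the server's actual code; only revision_id and model_definition come from the issue text.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Revision {
  id: string;
  revisionNumber: number;
  // Stands in for the JSONB snapshot stored on the revision row.
  modelDefinition: Record<string, unknown>;
}

interface Route {
  id: string;
  revisionId: string;
}

// Provisioning reads the definition through the route's revision id.
// The vfolder file is not consulted here.
function resolveModelDefinition(
  route: Route,
  revisions: Record<string, Revision>,
): Record<string, unknown> {
  const revision: Revision | undefined = revisions[route.revisionId];
  if (!revision) {
    throw new Error(`Unknown revision ${route.revisionId}`);
  }
  return revision.modelDefinition;
}

// Simulate: a snapshot is taken, then the file in the vfolder is edited.
const vfolderFile = { command: "serve --port 8000" };
const revisions: Record<string, Revision> = {
  "rev-1": { id: "rev-1", revisionNumber: 1, modelDefinition: { ...vfolderFile } },
};
vfolderFile.command = "serve --port 9000"; // edit made after the snapshot

const route: Route = { id: "route-a", revisionId: "rev-1" };
const seenByProvisioner = resolveModelDefinition(route, revisions);
```

In this sketch the provisioner still sees the old command, because the route points at rev-1 and rev-1 holds a copy made before the edit. Only a new revision would carry the edited content.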

Proposed Solution (small scope, temporary until full deployment UI)

Add an Update Service action on EndpointDetailPage:

  1. Button opens a form identical to ServiceLauncherPageContent, pre-filled with values from the endpoint's current_revision.
  2. On submit, WebUI calls the existing REST API POST /deployments/{id}/revisions with auto_activate: true.
  3. The server creates a new DeploymentRevisionRow with revision_number = current + 1, snapshots the current model-definition.yaml content, swaps endpoints.current_revision, and triggers the CHECK_REPLICA lifecycle so routing sessions are rolled over to the new revision.
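
Step 2 could be sketched on the client side roughly as follows. This is a sketch under assumptions: buildAddRevisionRequest and the RequestSpec type are hypothetical, the UUID is made up, and the body fields other than auto_activate are placeholders for whatever the server's revision schema actually requires.

```typescript
// Only the URL pattern and auto_activate come from the issue text;
// every other field name is a placeholder.
interface AddRevisionBody {
  auto_activate: boolean;
  [field: string]: unknown;
}

// A plain description of the request, so the sketch needs no HTTP client.
interface RequestSpec {
  method: "POST";
  path: string;
  body: string;
}

function buildAddRevisionRequest(
  deploymentId: string,
  revisionFields: Record<string, unknown>,
): RequestSpec {
  // auto_activate is set last so that form values cannot override it.
  const body: AddRevisionBody = { ...revisionFields, auto_activate: true };
  return {
    method: "POST",
    path: `/deployments/${encodeURIComponent(deploymentId)}/revisions`,
    body: JSON.stringify(body),
  };
}

// The endpoint id the WebUI already holds is used as the deployment id.
const spec = buildAddRevisionRequest("1b4e28ba-2fa1-11d2-883f-0016d3cca427", {
  model_definition_path: "model-definition.yaml",
});
```

Returning a request description instead of calling the network keeps the payload assembly testable on its own; the actual helper would hand this to whatever transport the WebUI client already uses.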

From the user's perspective this is simply "I updated the service" — the revision concept is not exposed.

Why no revision UI yet

This is an interim fix. Full revision history / rollback UI is planned as part of the dedicated deployment UI effort and will supersede this.

Implementation Notes

  • Endpoint / deployment id mapping: endpoint_id from the legacy API, deployment_id in the new REST API, and DeploymentInfo.id are the same UUID (api/rest/service/handler.py:161-162). WebUI can reuse the id it already has — no extra lookup.
  • REST client helper: add a new helper for POST /deployments/{id}/revisions; the existing baiClient.service.* helpers only target legacy /services.
  • Payload shape: mirror what _to_model_revision() in service/handler.py builds from the legacy body, so the client-side payload assembly can follow the existing form.
  • Form initial values: hydrate ServiceLauncherFormValue from the endpoint's current_revision (model vfolder, model_definition_path, model_mount_destination, environ, resource_opts, runtime_variant, image, resources, etc.).
  • Dirty check: disable submit when no field changed, to avoid creating wasteful revisions.
  • Field scoping: deployment-level fields (name, domain, project) cannot be changed via a new revision; disable them in the form. replicas has a separate scale API, so decide whether the Update flow also handles it or only shows a notice.
  • Rollout indicator: show an "Updating…" badge while deploying_revision_id is set on the endpoint.
  • Revision accumulation: not a concern. The server sets revision_history_limit=10 for legacy-created endpoints (api/rest/service/handler.py:318), so old revisions are auto-pruned (same as K8s revisionHistoryLimit).
  • Error handling: if revision creation fails, the existing current_revision must remain untouched. Verify the server handles this atomically before rollout.
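
The dirty check, field scoping, and rollout indicator notes above could be sketched as small pure helpers. Everything here is an assumption for illustration: FormValues is a hypothetical reduced stand-in for ServiceLauncherFormValue, and the nullability of deploying_revision_id is guessed, not confirmed by the issue.

```typescript
// Hypothetical subset of the form values; the real type has more fields.
interface FormValues {
  name: string;
  image: string;
  modelDefinitionPath: string;
  environ: Record<string, string>;
}

// Deployment-level fields that a new revision cannot change.
// Only `name` exists in this reduced form; domain and project would be listed too.
const LOCKED_FIELDS: ReadonlyArray<keyof FormValues> = ["name"];

// Key-order-insensitive serialization for plain data, so that reordering
// object keys does not count as a change.
function stableStringify(value: unknown): string {
  if (value === undefined) {
    return "null";
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Submit stays disabled while this returns false.
function isDirty(initial: FormValues, current: FormValues): boolean {
  return stableStringify(initial) !== stableStringify(current);
}

function isFieldDisabled(field: keyof FormValues): boolean {
  return LOCKED_FIELDS.indexOf(field) !== -1;
}

// Show the "Updating…" badge while a revision is being rolled out.
function isUpdating(endpoint: { deploying_revision_id: string | null }): boolean {
  return endpoint.deploying_revision_id !== null;
}
```

Comparing serialized snapshots is a simple choice for plain form data; a form library's own dirty tracking may be preferable if the launcher form already provides one.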

Out of Scope

  • Revision history list / compare / rollback UI (to come with the dedicated deployment UI)
  • Exposing revision_history_limit in UI
  • Changing deployment-level fields (name, domain, project)
  • Auto-scaling rules editing

References

  • Server create flow: services/deployment/service.py:397-502 (create_deployment + initial revision via add_model_revision with auto_activate=True)
  • Server add-revision API: api/rest/deployment/handler.py:242 (POST /deployments/{id}/revisions)
  • Revision row schema: models/deployment_revision/row.py:73-116 (model_definition JSONB column)
  • Session provisioning from revision: sokovan/deployment/route/executor.py:452 (_provision_route)
  • Legacy REST wrapper: api/rest/service/handler.py:303 (create)
  • WebUI current create flow: react/src/components/ServiceLauncherPageContent.tsx:511 (mutationToCreateService)

JIRA Issue: FR-2528
