Merged
22 changes: 20 additions & 2 deletions docs/docs/concepts/services.md
@@ -518,12 +518,30 @@ via the [`spot_policy`](../reference/dstack.yml/service.md#spot_policy) property

### Retry policy

By default, if `dstack` can't find capacity, the task exits with an error, or the instance is interrupted,
the run will fail.
By default, if `dstack` can't find capacity, or the service exits with an error, or the instance is interrupted, the run will fail.

If you'd like `dstack` to automatically retry, configure the
[retry](../reference/dstack.yml/service.md#retry) property accordingly:

<div editor-title="service.dstack.yml">

```yaml
type: service
image: my-app:latest
port: 80

retry:
  # Retry on specific events
  on_events: [no-capacity, error, interruption]
  # Retry for up to 1 hour
  duration: 1h
```

</div>

If one replica of a multi-replica service fails with retry enabled,
`dstack` will resubmit only the failed replica while keeping active replicas running.
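
As a rough illustration of the multi-replica case, the sketch below assumes a fixed `replicas: 2`, which is an addition for this example; the image, port, and retry values are carried over from the configuration above. With such a configuration, if one replica exits with an error, only that replica would be expected to be resubmitted while the other keeps serving.

<div editor-title="service.dstack.yml">

```yaml
type: service
image: my-app:latest
port: 80

# Assumption for illustration: run two replicas of the service
replicas: 2

retry:
  # Retry a failed replica on these events
  on_events: [no-capacity, error, interruption]
  # Keep retrying for up to 1 hour
  duration: 1h
```

</div>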

--8<-- "docs/concepts/snippets/manage-fleets.ext"

--8<-- "docs/concepts/snippets/manage-runs.ext"
5 changes: 4 additions & 1 deletion docs/docs/concepts/tasks.md
@@ -387,7 +387,7 @@ via the [`spot_policy`](../reference/dstack.yml/task.md#spot_policy) property. I

### Retry policy

By default, if `dstack` can't find capacity, the task exits with an error, or the instance is interrupted,
By default, if `dstack` can't find capacity, or the task exits with an error, or the instance is interrupted,
the run will fail.

If you'd like `dstack` to automatically retry, configure the
@@ -416,6 +416,9 @@ retry:

</div>

If one job of a multi-node task fails with retry enabled,
`dstack` will stop all the jobs and resubmit the run.
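
As a rough illustration of the multi-node case, the sketch below assumes a two-node task; the `nodes` value, the task name, and the command are hypothetical and chosen only for this example. With such a configuration, a failure of either job would be expected to stop both jobs and resubmit the whole run.

<div editor-title="task.dstack.yml">

```yaml
type: task
# Hypothetical name, for illustration only
name: train-distrib

# Assumption for illustration: run the task on two nodes (one job per node)
nodes: 2

commands:
  # Hypothetical command
  - python train.py

retry:
  # Retry the run on these events
  on_events: [no-capacity, error, interruption]
  # Keep retrying for up to 1 hour
  duration: 1h
```

</div>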

--8<-- "docs/concepts/snippets/manage-fleets.ext"

--8<-- "docs/concepts/snippets/manage-runs.ext"