K8S API error resilience

## New feature

Could Nextflow be a little more resilient to transient K8S API errors? For example:

```
Caused by:
  Request POST /apis/batch/v1/namespaces/ns-<NAMESPACE>/jobs returned an error code=500
  
    {
        "kind": "Status",
        "apiVersion": "v1",
        "metadata": {
            
        },
        "status": "Failure",
        "message": "Internal error occurred: resource quota evaluation timed out",
        "reason": "InternalError",
        "details": {
            "causes": [
                {
                    "message": "resource quota evaluation timed out"
                }
            ]
        },
        "code": 500
    }

 -- Check '.nextflow.log' file for details
```

Perhaps a configurable number of retries for 4xx/5xx error codes?

The K8S API can be a little flaky under high load, especially when a lot of jobs are queued up and spamming the API with requests to create pods!
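
To make the request concrete, here is a rough sketch of the kind of retry behaviour I have in mind. It is written in Python purely for illustration and is not Nextflow code: `ApiError`, `call_with_retry`, `RETRYABLE_CODES` and the default values are all hypothetical names and numbers. It also assumes that only 429 and 5xx responses are worth retrying, since most other 4xx codes indicate a bad request rather than a transient failure.

```python
import random
import time

# Hypothetical set of status codes treated as transient. Which codes are
# really safe to retry is an assumption, not existing Nextflow behaviour.
RETRYABLE_CODES = {429, 500, 502, 503, 504}


class ApiError(Exception):
    """Hypothetical error carrying the HTTP status code of a failed request."""

    def __init__(self, code, message=""):
        super().__init__(f"code={code} {message}")
        self.code = code


def call_with_retry(request, max_retries=4, base_delay=0.5, max_delay=30.0,
                    sleep=time.sleep):
    """Call request(); retry transient failures with exponential backoff and jitter."""
    attempt = 0
    while True:
        try:
            return request()
        except ApiError as err:
            # Give up on non-transient codes or once the retry budget is spent.
            if err.code not in RETRYABLE_CODES or attempt >= max_retries:
                raise
            # Backoff doubles per attempt, is capped at max_delay, and is
            # randomised so that many waiting jobs do not retry in lockstep.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
            attempt += 1
```

With something like this, a job submission that hits `code=500` ("resource quota evaluation timed out") would be retried up to `max_retries` times before the error is surfaced, while a 404 or 403 would still fail immediately. Since the failing call is a `POST` that creates a job, a real implementation would probably also need to check whether the job was created despite the error before resubmitting.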

K8S API error resilience #7122
