New feature
Could Nextflow be more resilient to transient Kubernetes API errors? For example:
Caused by:
Request POST /apis/batch/v1/namespaces/ns-<NAMESPACE>/jobs returned an error code=500
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "Internal error occurred: resource quota evaluation timed out",
"reason": "InternalError",
"details": {
"causes": [
{
"message": "resource quota evaluation timed out"
}
]
},
"code": 500
}
-- Check '.nextflow.log' file for details
Perhaps a configurable number of retries for 4xx/5xx error codes?
The k8s API can be a little flaky under high load, especially when a lot of jobs are sitting around flooding the API with requests to create pods!
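To make the request concrete, here is a minimal sketch of what a configurable retry could look like, written in Python purely for illustration. It is not how Nextflow implements anything: `ApiError`, `call_with_retry`, `RETRYABLE_CODES`, and the parameters `max_retries`, `base_delay`, and `max_delay` are all hypothetical names. The sketch assumes only 429 and a handful of 5xx codes are worth retrying, since most other 4xx responses indicate a bad request that will fail again.

```python
import random
import time

# Hypothetical set of status codes treated as transient (an assumption,
# not Nextflow's or Kubernetes' official policy).
RETRYABLE_CODES = {429, 500, 502, 503, 504}


class ApiError(Exception):
    """Hypothetical error carrying the HTTP status code of a failed request."""

    def __init__(self, code, message=""):
        super().__init__(f"API request returned an error code={code}: {message}")
        self.code = code


def call_with_retry(request, max_retries=4, base_delay=0.5, max_delay=30.0,
                    sleep=time.sleep):
    """Call request(), retrying transient failures with exponential backoff.

    request     -- zero-argument callable that raises ApiError on failure
    max_retries -- number of retries after the first attempt
    base_delay  -- delay in seconds before the first retry
    max_delay   -- upper bound on any single delay
    sleep       -- injectable sleep function, so tests do not have to wait
    """
    attempt = 0
    while True:
        try:
            return request()
        except ApiError as err:
            # Give up on non-transient codes or once the retry budget is spent.
            if err.code not in RETRYABLE_CODES or attempt >= max_retries:
                raise
            # Exponential backoff capped at max_delay, with jitter so that
            # many queued jobs do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
            attempt += 1
```

With something like this wrapped around the job-creation POST, a one-off "resource quota evaluation timed out" response would be retried a few times before the task is reported as failed, and the jitter should reduce the chance of all pending jobs hitting the API again simultaneously.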