What happened
A SolrBackup can remain stuck in InProgress forever if:
- the backup request is accepted (operator backup submit, stable backup async ID),
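- the operator suddenly becomes unavailable,
- the Solr async tracking entry for that backup is deleted (DELETESTATUS API, tracker deleteSingleAsyncId),
- and the operator becomes available again.
After that:
- REQUESTSTATUS for the backup request ID returns notfound (REQUESTSTATUS API, tracker getAsyncTaskRequestStatus),
- the SolrBackup CR still stays in inProgress=true.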
Environment
- macOS
- local kind cluster
- Kubernetes / kind node version: v1.32.1
- solr-operator version: v0.10.0-prerelease
- solr-operator built from master on March 22, 2026 (ca9d3c5c37a59f29570a6b49a8da5dc614aba75e)
- Solr version: 9.10.0
Steps to reproduce
- Deploy solr-operator on a local kind cluster.
- Create a 1-node SolrCloud with a local backup repository, then create a collection and start a SolrBackup for it.
- As soon as the backup first shows inProgress=true, scale the solr-operator deployment down to 0.
- While the operator is down, wait for the Solr async request to finish, then delete only that async status entry with DELETESTATUS.
- Confirm the backup data still exists, but REQUESTSTATUS for that same request ID now returns notfound.
- Scale the operator back up to 1 and observe that the SolrBackup CR never reaches a terminal state.
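The DELETESTATUS and REQUESTSTATUS calls in the steps above go through the Solr Collections API. The following is a minimal Python sketch of how the request URLs could be built and how the async state could be read from a REQUESTSTATUS response. The base URL and the request ID are placeholders, not values from this report, and the helper names `collections_api_url` and `async_state` are hypothetical. The sketch assumes the JSON response carries the state under `status.state`.

```python
from urllib.parse import urlencode

COLLECTIONS_API = "/solr/admin/collections"


def collections_api_url(base_url: str, action: str, request_id: str) -> str:
    """Build a Collections API URL for an action that takes a requestid."""
    query = urlencode({"action": action, "requestid": request_id, "wt": "json"})
    return f"{base_url.rstrip('/')}{COLLECTIONS_API}?{query}"


def async_state(response: dict) -> str:
    """Return the async state from a parsed REQUESTSTATUS response body.

    Falls back to "unknown" if the expected keys are missing.
    """
    return response.get("status", {}).get("state", "unknown")


if __name__ == "__main__":
    base = "http://localhost:8983"   # assumption: Solr reachable via port-forward
    rid = "example-request-id"       # placeholder, not the real backup async ID
    print(collections_api_url(base, "DELETESTATUS", rid))
    print(collections_api_url(base, "REQUESTSTATUS", rid))
```

After the DELETESTATUS call, a REQUESTSTATUS response parsed with `async_state` would be expected to yield `notfound`, which is the condition the last two steps check for.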
The stuck status looks like:
status:
  collectionBackupStatuses:
  - asyncBackupStatus: notfound
    inProgress: true
and it never sets finished: true or successful: true.
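For anyone trying to confirm the same condition on their own cluster, the stuck state can be checked mechanically. This is a rough Python sketch that operates on the `status` section of a SolrBackup CR after it has been parsed into a dict (for example from `kubectl get ... -o json`). The function name `stuck_collections` is hypothetical, and the only field names used are the ones shown in this report.

```python
def stuck_collections(status: dict) -> list[int]:
    """Return the indexes of collectionBackupStatuses entries that look stuck.

    An entry is treated as stuck when Solr no longer knows the async request
    (asyncBackupStatus == "notfound") while the CR still reports the backup
    as in progress and not finished.
    """
    stuck = []
    for i, entry in enumerate(status.get("collectionBackupStatuses", [])):
        if (
            entry.get("asyncBackupStatus") == "notfound"
            and entry.get("inProgress", False)
            and not entry.get("finished", False)
        ):
            stuck.append(i)
    return stuck


if __name__ == "__main__":
    # The status fragment from this report, as a parsed dict.
    reported = {
        "collectionBackupStatuses": [
            {"asyncBackupStatus": "notfound", "inProgress": True},
        ]
    }
    print(stuck_collections(reported))
```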
Expected behavior
Once the async request for an accepted backup later reports notfound, the operator should not leave the CR in InProgress forever.
It should eventually either:
- recover, or
- mark the backup failed with a clear reason.
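To make the second option concrete, here is one possible shape for the decision, sketched in Python for brevity (the operator itself is written in Go, and this is not its actual code). The grace period, the function name `next_backup_state`, and the `reason` strings are all assumptions made for illustration. The recovery option, for example by verifying the backup data in the repository, is not covered here.

```python
from datetime import timedelta

# Hypothetical grace period: how long "notfound" is tolerated before giving up.
NOTFOUND_GRACE = timedelta(minutes=5)


def next_backup_state(state: str, notfound_for: timedelta) -> dict:
    """Map a Solr async state to the status fields the CR should carry.

    `state` is the value reported by REQUESTSTATUS; `notfound_for` is how long
    the request has been continuously reported as notfound.
    """
    if state == "completed":
        return {"inProgress": False, "finished": True, "successful": True}
    if state == "failed":
        return {
            "inProgress": False,
            "finished": True,
            "successful": False,
            "reason": "Solr reported the async backup request as failed",
        }
    if state == "notfound" and notfound_for >= NOTFOUND_GRACE:
        # Terminal failure instead of waiting forever on a lost async ID.
        return {
            "inProgress": False,
            "finished": True,
            "successful": False,
            "reason": "async backup request no longer tracked by Solr",
        }
    # Any other state, or notfound within the grace period: keep waiting.
    return {"inProgress": True, "finished": False}
```

The grace period keeps a briefly missing status from being treated as a failure, while still guaranteeing that the CR reaches a terminal state.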