Commit 9138a1d

Add user documentation for drift detection and external deletion
Add user-facing documentation covering periodic resync configuration, drift detection behavior, and external deletion handling. Update the enhancement proposal to reflect the final two-tier resync period resolution design.
1 parent 94bb33d commit 9138a1d

4 files changed

Lines changed: 238 additions & 4 deletions

enhancements/drift-detection.md

Lines changed: 37 additions & 3 deletions
@@ -2,10 +2,10 @@
22

33
| Field | Value |
44
|-------|-------|
5-
| **Status** | implementable |
5+
| **Status** | implemented |
66
| **Author(s)** | @eshulman |
77
| **Created** | 2026-02-03 |
8-
| **Last Updated** | 2026-02-03 |
8+
| **Last Updated** | 2026-02-10 |
99
| **Tracking Issue** | TBD |
1010

1111
## Summary
@@ -227,7 +227,7 @@ Drift detection covers all **mutable fields** that ORC actuators implement updat
227227

228228
**Mitigation**:
229229
- Disabled by default; when enabled, recommend conservative intervals (e.g., 10 hours)
230-
- Add random jitter to resync times to avoid thundering herd: since reconciliation already uses "requeue after X duration", jitter simply adds a random offset (e.g., ±10%) to the resync period, spreading resyncs over time rather than having them fire simultaneously
230+
- Add random jitter to resync times to avoid thundering herd: since reconciliation already uses "requeue after X duration", jitter simply adds a random offset (e.g., [0%, +20%]) to the resync period, spreading resyncs over time rather than having them fire simultaneously
231231
- Allow operators to disable or lengthen resync for stable resources
232232

233233
### Controller Resource Consumption
@@ -275,3 +275,37 @@ Implement a watcher that periodically lists all resources from OpenStack and com
275275
## Implementation History
276276

277277
- 2026-02-03: Enhancement proposed
278+
- 2026-02-03: Implemented — all tasks completed
279+
280+
### Implemented Components
281+
282+
The following have been implemented:
283+
284+
**API Changes**
285+
- Added `spec.resyncPeriod` field (`*metav1.Duration`) to all ORC resource types
286+
- Added `status.lastSyncTime` field (`*metav1.Time`) to all ORC resource types
287+
288+
**Periodic Resync**
289+
- `shouldReconcile` updated to check `lastSyncTime` against `resyncPeriod` for time-based resync
290+
- Jitter ([0%, +20%]) applied to resync scheduling via `resync.CalculateJitteredDuration`
291+
- `status.lastSyncTime` written on every successful reconciliation cycle
292+
- Resources in terminal error state are not rescheduled
293+
294+
**External Deletion Handling**
295+
- `IsImported()` method added to `APIObjectAdapter` interface (all resource adapters)
296+
- `GetOrCreateOSResource` branches on management policy and import status when 404 is received:
297+
- Managed, non-imported resources → `(nil, nil)` to trigger recreation
298+
- Unmanaged or imported resources → terminal error
299+
- `status.ClearStatusID` clears `status.id` before recreation (using JSON merge patch with explicit `null`)
300+
- `reconcileNormal` handles the `(nil, nil)` recreation signal from `GetOrCreateOSResource`
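
The explicit `null` matters because a JSON merge patch (RFC 7396) leaves omitted fields unchanged and removes only the fields that are set to `null`. The Go sketch below shows one way such a patch body could be built. `clearStatusIDPatch` is a hypothetical helper for illustration only; it is not ORC's `status.ClearStatusID`, whose signature is not shown in this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clearStatusIDPatch builds a JSON merge patch body that removes status.id.
// Hypothetical helper; ORC's real implementation may be structured differently.
func clearStatusIDPatch() ([]byte, error) {
	patch := map[string]any{
		"status": map[string]any{
			// An explicit null asks the server to delete the field.
			// Omitting "id" would leave the stored value untouched.
			"id": nil,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := clearStatusIDPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"status":{"id":null}}
}
```

A typed struct with an `omitempty` tag would drop the key entirely rather than emit `null`, so the sketch uses a map.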
301+
302+
**E2E Tests**
303+
- `network-resync-period`: verifies `lastSyncTime` is updated after configured period
304+
- `network-resync-disabled`: verifies `lastSyncTime` is not updated when `resyncPeriod: 0`
305+
- `network-resync-terminal-error`: verifies terminal errors are not rescheduled
306+
- `network-resync-jitter`: verifies independent jitter-based scheduling for multiple resources
307+
- `network-external-deletion`: verifies managed ORC-created network is recreated with new ID after external deletion
308+
- `network-external-deletion-import`: verifies imported network enters terminal error state after external deletion
309+
310+
**Documentation**
311+
- `website/docs/user-guide/drift-detection.md`: user-facing documentation covering external deletion behavior, resync configuration, verification steps, and implications for dependent resources

website/docs/user-guide/drift-detection.md

Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
1+
# Drift Detection and External Deletion Handling
2+
3+
ORC can periodically reconcile resources to detect and correct configuration drift — changes made to OpenStack resources outside of ORC's control. This feature also detects when managed resources have been deleted directly from OpenStack and recreates them automatically.
4+
5+
## Enabling Drift Detection
6+
7+
Drift detection is disabled by default. Enable it per-resource by setting `spec.resyncPeriod`:
8+
9+
```yaml
10+
apiVersion: openstack.k-orc.cloud/v1alpha1
11+
kind: Network
12+
metadata:
13+
name: critical-network
14+
spec:
15+
cloudCredentialsRef:
16+
secretName: openstack-clouds
17+
cloudName: openstack
18+
managementPolicy: managed
19+
resyncPeriod: 1h # Re-check OpenStack every hour
20+
resource:
21+
description: Critical application network
22+
```
23+
24+
The `resyncPeriod` field accepts any Go duration string: `10m`, `1h`, `24h`, etc.
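
To check a value before applying it, the sketch below uses Go's standard `time.ParseDuration`. It assumes the field follows plain Go duration syntax (the proposal types it as `*metav1.Duration`); `validResyncPeriod` is a hypothetical helper, not part of ORC. Go has no day or week unit, so `1d` is rejected and `24h` should be used instead.

```go
package main

import (
	"fmt"
	"time"
)

// validResyncPeriod reports whether s parses as a Go duration string.
// Hypothetical helper for illustration; any additional CRD validation
// that ORC applies is not modelled here.
func validResyncPeriod(s string) bool {
	_, err := time.ParseDuration(s)
	return err == nil
}

func main() {
	for _, s := range []string{"10m", "1h", "24h", "1h30m", "1d"} {
		fmt.Printf("%-6s valid=%v\n", s, validResyncPeriod(s))
	}
}
```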
25+
26+
**Default:** `0` (disabled). When disabled, ORC only reconciles resources in response to spec changes or controller restarts.
27+
28+
!!! note
29+
30+
Conservative resync periods (e.g., `1h` or `10h`) are recommended in production to avoid excessive OpenStack API calls.
31+
32+
## How It Works
33+
34+
After a resource reaches a stable state (`Progressing=False`), ORC schedules a reconciliation after the configured `resyncPeriod`. On each resync:
35+
36+
1. ORC fetches the current state of the OpenStack resource.
37+
2. For **managed** resources: if drift is detected, ORC updates the resource to match the Kubernetes spec.
38+
3. For **unmanaged** resources: ORC refreshes `status.resource` to reflect the current OpenStack state, but makes no changes.
39+
4. The next resync is scheduled.
40+
41+
A small random jitter ([0%, +20%]) is applied to `resyncPeriod` to spread reconciliations and avoid thundering-herd effects.
42+
43+
!!! note
44+
45+
Resources in a terminal error state (`Progressing=False` with reason `InvalidConfiguration` or `UnrecoverableError`) are **not** periodically resynced. Terminal errors require manual intervention to resolve.
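
The eligibility rule described above can be sketched as a small predicate over the `Progressing` condition. The `Condition` struct and `eligibleForResync` function are simplified, hypothetical stand-ins rather than ORC types, and the `Success` reason in the example is a placeholder for any non-terminal reason.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// eligibleForResync reports whether a periodic resync would be scheduled:
// the resource must be stable (Progressing=False) and not in a terminal error.
func eligibleForResync(conds []Condition) bool {
	for _, c := range conds {
		if c.Type != "Progressing" {
			continue
		}
		if c.Status != "False" {
			return false // still progressing; not yet stable
		}
		switch c.Reason {
		case "InvalidConfiguration", "UnrecoverableError":
			return false // terminal error: needs manual intervention
		}
		return true
	}
	return false // no Progressing condition reported yet
}

func main() {
	stable := []Condition{{Type: "Progressing", Status: "False", Reason: "Success"}}
	terminal := []Condition{{Type: "Progressing", Status: "False", Reason: "UnrecoverableError"}}
	fmt.Println(eligibleForResync(stable), eligibleForResync(terminal)) // true false
}
```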
46+
47+
## Tracking Sync Status
48+
49+
Every ORC resource has a `status.lastSyncTime` field that records when ORC last successfully reconciled with OpenStack:
50+
51+
```bash
52+
kubectl get network critical-network -o jsonpath='{.status.lastSyncTime}'
53+
# 2026-02-03T10:30:00Z
54+
```
55+
56+
ORC persists this timestamp in the Kubernetes status. After a controller restart, it uses `lastSyncTime` to determine when the next resync should occur, preventing a thundering herd of reconciliations on startup.
57+
58+
## External Deletion Handling
59+
60+
When a resource is deleted directly from OpenStack (bypassing ORC), the behavior depends on how ORC originally obtained the resource.
61+
62+
### ORC-Created Resources (Managed, Not Imported)
63+
64+
If you created the resource through ORC's `spec.resource` field, ORC **recreates** it automatically:
65+
66+
1. ORC detects that the resource is missing from OpenStack (the ID stored in `status.id` no longer exists).
67+
2. ORC clears `status.id`.
68+
3. On the next reconcile, ORC creates a new OpenStack resource.
69+
4. The new resource ID is stored in `status.id`.
70+
71+
The ORC object continues to exist and becomes `Available=True` again once the resource is recreated.
72+
73+
```yaml
74+
# This type of resource will be recreated if deleted from OpenStack
75+
spec:
76+
managementPolicy: managed
77+
resyncPeriod: 10m # Enable resync to detect deletion quickly
78+
resource: # Resource was created by ORC
79+
description: My application network
80+
```
81+
82+
!!! warning
83+
84+
Recreation produces a new OpenStack resource with a **new ID**. Any OpenStack resources (outside ORC) that referenced the old ID will need to be updated manually.
85+
86+
### Imported Resources (Terminal Error)
87+
88+
If you imported an existing resource using `spec.import`, ORC reports a **terminal error** when the resource is deleted from OpenStack:
89+
90+
- `Available=False`
91+
- `Progressing=False`
92+
- Condition reason: `UnrecoverableError`
93+
- Message: `resource has been deleted from OpenStack`
94+
95+
ORC does **not** recreate imported resources because it did not create them originally, and creating a new, empty resource would not restore what was lost.
96+
97+
```yaml
98+
# This type of resource enters terminal error if deleted from OpenStack
99+
spec:
100+
managementPolicy: managed
101+
import:
102+
id: "12345678-1234-1234-1234-123456789abc" # Was imported by ID
103+
```
104+
105+
```yaml
106+
# This type also enters terminal error if deleted from OpenStack
107+
spec:
108+
managementPolicy: unmanaged
109+
import:
110+
filter:
111+
name: public # Was imported by filter
112+
```
113+
114+
To recover: manually recreate the OpenStack resource and update the ORC object's `spec.import.id` to the new resource ID, or delete and recreate the ORC object.
115+
116+
### Summary Table
117+
118+
| Resource Type | How Obtained | External Deletion Behavior |
119+
|--------------|--------------|---------------------------|
120+
| Managed, ORC-created | `spec.resource` | **Recreated** automatically |
121+
| Managed, imported by ID | `spec.import.id` | **Terminal error** |
122+
| Managed, imported by filter | `spec.import.filter` | **Terminal error** |
123+
| Unmanaged | `spec.import.*` | **Terminal error** |
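
The table reduces to a single rule: only managed resources that were not imported are recreated. The sketch below encodes that rule in Go. The `Action` type and `onExternalDeletion` function are hypothetical names for this example; in ORC the equivalent branching is described as living in `GetOrCreateOSResource`.

```go
package main

import "fmt"

// Action describes what happens after an external deletion is detected.
type Action string

const (
	Recreate      Action = "recreate"
	TerminalError Action = "terminal error"
)

// onExternalDeletion mirrors the summary table: recreate only when the
// resource is managed and was created by ORC (not imported).
func onExternalDeletion(managed, imported bool) Action {
	if managed && !imported {
		return Recreate
	}
	return TerminalError
}

func main() {
	fmt.Println("managed, ORC-created:", onExternalDeletion(true, false))
	fmt.Println("managed, imported:   ", onExternalDeletion(true, true))
	fmt.Println("unmanaged:           ", onExternalDeletion(false, true))
}
```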
124+
125+
## Verifying Recreation Occurred
126+
127+
When an ORC-created resource is recreated after external deletion, `status.id` changes to reflect the new OpenStack resource ID. Monitor this to detect recreation events:
128+
129+
```bash
130+
# Record the current ID
131+
ORIGINAL_ID=$(kubectl get network my-network -o jsonpath='{.status.id}')
132+
echo "Original ID: $ORIGINAL_ID"
133+
134+
# ... some time later, check if it changed ...
135+
CURRENT_ID=$(kubectl get network my-network -o jsonpath='{.status.id}')
136+
if [ "$ORIGINAL_ID" != "$CURRENT_ID" ]; then
137+
echo "Resource was recreated! New ID: $CURRENT_ID"
138+
fi
139+
```
140+
141+
You can also watch the resource for status changes:
142+
143+
```bash
144+
kubectl get network my-network -w
145+
```
146+
147+
During recreation, you will observe:
148+
149+
1. `Available=False`, `Progressing=True` — ORC is recreating the resource
150+
2. `Available=True`, `Progressing=False` — Recreation complete, `status.id` has new value
151+
152+
## Implications for Dependent Resources
153+
154+
OpenStack enforces referential integrity for most resource relationships (e.g., a Network cannot be deleted while Subnets exist). If an external deletion manages to bypass these constraints (e.g., direct database manipulation), the behavior of dependent ORC resources follows these rules:
155+
156+
### If a Parent Resource Is Recreated
157+
158+
When a parent resource (e.g., Network) is recreated by ORC, dependent resources that reference it (e.g., Subnets) detect the parent as available again but may encounter errors when OpenStack rejects operations referencing the old parent ID. **Manual intervention may be required** to recreate dependent resources against the new parent.
159+
160+
### If a Parent Resource Enters Terminal Error
161+
162+
When a parent resource enters terminal error:
163+
164+
- **Dependent resources waiting on it** (e.g., a Subnet waiting for its Network): ORC will not proceed — it waits until the parent becomes available again. The dependent is not itself in an error state; it is just waiting.
165+
- **Dependent resources already created**: ORC continues managing them normally. If ORC later tries to update a dependent resource whose parent no longer exists in OpenStack, the outcome depends on the error OpenStack returns for that operation.
166+
167+
!!! warning
168+
169+
If a parent resource is externally deleted in a way that bypasses OpenStack's referential integrity checks, the resulting state may require manual cleanup of both the parent and dependent resources. This is an unusual operational scenario and not specific to drift detection.
170+
171+
## Interaction with `managementPolicy: unmanaged`
172+
173+
Unmanaged resources are never modified by ORC. With `resyncPeriod` set, ORC will periodically refresh `status.resource` to reflect the current OpenStack state. However, if the OpenStack resource is deleted, ORC will report a terminal error — it does not recreate unmanaged resources under any circumstances.
174+
175+
```yaml
176+
spec:
177+
managementPolicy: unmanaged
178+
resyncPeriod: 1h # Refresh status every hour, but never modify OpenStack
179+
import:
180+
id: "12345678-1234-1234-1234-123456789abc"
181+
```
182+
183+
## Drift Detection Without Resync
184+
185+
Even with `resyncPeriod: 0` (the default, disabled), ORC will still detect external deletion when another event triggers reconciliation — for example, when you make a spec change or the controller restarts. The recreation or terminal error behavior is the same; the difference is only in how quickly ORC detects the deletion.
186+
187+
!!! tip
188+
189+
If you want rapid detection of external deletions for critical resources, set a short `resyncPeriod` (e.g., `10m`).

website/docs/user-guide/index.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,15 @@ spec:
122122
ipVersion: 4
123123
```
124124

125+
### Drift Detection and External Deletion
126+
127+
ORC can periodically reconcile resources to detect configuration drift and recreate managed resources that are deleted directly from OpenStack. See [Drift Detection](drift-detection.md) for details on:
128+
129+
- How to enable periodic resync with `spec.resyncPeriod`
130+
- How ORC handles externally deleted resources (recreation vs. terminal error)
131+
- How to verify that recreation occurred by checking `status.id`
132+
- Implications for dependent resources
133+
125134
### Understanding Status and Conditions
126135

127136
Every ORC resource reports its status through two conditions: `Available` (whether the resource is ready for use) and `Progressing` (whether ORC is still working on it). For detailed information about conditions and their meanings, see [Troubleshooting: Status Conditions Explained](../troubleshooting.md#status-conditions-explained).

website/mkdocs.yml

Lines changed: 3 additions & 1 deletion
@@ -7,7 +7,9 @@ nav:
77
- Getting Started:
88
- Installation: installation.md
99
- Quick Start: getting-started.md
10-
- User Guide: user-guide/index.md
10+
- User Guide:
11+
- Overview: user-guide/index.md
12+
- Drift Detection: user-guide/drift-detection.md
1113
- CRD Reference: crd-reference.md
1214
- Troubleshooting: troubleshooting.md
1315
- Contributing:
