
Commit 62159d9

OLS-3348 Reconcile agentic alerts adapter as lightspeed-operator operand
1 parent fb5c715 commit 62159d9

28 files changed

Lines changed: 2248 additions & 67 deletions

.ai/spec/how/project-structure.md

Lines changed: 13 additions & 7 deletions
````diff
@@ -21,6 +21,9 @@
 | `internal/controller/console/reconciler.go` | `ReconcileConsoleUIResources()`, `ReconcileConsoleUIDeploymentAndPlugin()`, `RemoveConsoleUI()` | Console UI Phase 1 + Phase 2 + cleanup |
 | `internal/controller/console/deployment.go` | `GenerateConsoleUIDeployment()` | Console UI deployment generation |
 | `internal/controller/console/assets.go` | ConsolePlugin CR generator, nginx config, service, network policy | Console UI resource generation |
+| `internal/controller/alertsadapter/reconciler.go` | `ReconcileAlertsAdapterResources()`, `ReconcileAlertsAdapterDeployment()`, `RemoveAlertsAdapter()` | Alerts adapter Phase 1 + Phase 2 + finalizer RBAC cleanup |
+| `internal/controller/alertsadapter/deployment.go` | `GenerateDeployment()` | Alerts adapter deployment generation |
+| `internal/controller/alertsadapter/assets.go` | SA, ClusterRole, ClusterRoleBinding, monitoring RoleBinding, NetworkPolicy generators | Alerts adapter resource generation |
 | `internal/controller/reconciler/interface.go` | `Reconciler` interface | Dependency injection interface for component packages |
 | `internal/controller/utils/constants.go` | ~200 constants | Resource names, ports, paths, annotation keys, defaults |
 | `internal/controller/utils/errors.go` | ~80 error message constants | Structured error messages for all operations |
````
````diff
@@ -69,10 +72,12 @@ OLSConfigReconciler.Reconcile()
    +-- console.ReconcileConsoleUIResources()
    +-- postgres.ReconcilePostgresResources()
    +-- appserver.ReconcileAppServerResources()
+   +-- alertsadapter.ReconcileAlertsAdapterResources()
 6. reconcileDeploymentsAndStatus() -- Phase 2: Deployments, Services, TLS certs, status
    +-- console.ReconcileConsoleUIDeploymentAndPlugin()
    +-- postgres.ReconcilePostgresDeployment()
    +-- appserver.ReconcileAppServerDeployment()
+   +-- alertsadapter.ReconcileAlertsAdapterDeployment()
    +-- checkDeploymentStatus() per deployment -> build newStatus
    +-- UpdateStatusCondition()
 ```
````
````diff
@@ -97,7 +102,7 @@ External secret/configmap changes
 ## Key Abstractions
 
 ### Image Management
-Default images are stored in a `defaultImages` map in `cmd/main.go` keyed by logical name (e.g., `"lightspeed-service"`, `"postgres-image"`, `"console-plugin"`). Default values come from `internal/relatedimages/` which reads `related_images.json` at build time. Command-line flags override individual images. The map is passed to the reconciler via `OLSConfigReconcilerOptions` as individual named fields (e.g., `LightspeedServiceImage`, `ConsoleUIImage`).
+Default images are stored in a `defaultImages` map in `cmd/main.go` keyed by logical name (e.g., `"lightspeed-service"`, `"postgres-image"`, `"console-plugin"`, `"alerts-adapter"`). Default values come from `internal/relatedimages/` which reads `related_images.json` at build time. Command-line flags override individual images. The map is passed to the reconciler via `OLSConfigReconcilerOptions` as individual named fields (e.g., `LightspeedServiceImage`, `ConsoleUIImage`, `AlertsAdapterImage`).
 
 ### WatcherConfig
 Declarative configuration for external resource watching. Contains:
````
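The Image Management paragraph says each command-line flag overrides one entry of the `defaultImages` map. A minimal Go sketch of that pattern using only the standard `flag` package is below. The registry paths, the `reconcilerOptions` struct, and every flag name except `--alerts-adapter-image` are hypothetical; the operator's `cmd/main.go` may wire this differently.

```go
package main

import (
	"flag"
	"fmt"
)

// Placeholder defaults. In the operator these come from internal/relatedimages
// (related_images.json); the registry paths here are made up.
var defaultImages = map[string]string{
	"lightspeed-service": "example.com/lightspeed-service:main",
	"console-plugin":     "example.com/console-plugin:main",
	"alerts-adapter":     "example.com/alerts-adapter:main",
}

// reconcilerOptions is a hypothetical subset of OLSConfigReconcilerOptions:
// one named field per image.
type reconcilerOptions struct {
	LightspeedServiceImage string
	ConsoleUIImage         string
	AlertsAdapterImage     string
}

// parseImageFlags registers one flag per image, defaulting to the map entry,
// so each flag overrides only the image it names. Only --alerts-adapter-image
// comes from the commit; the other two flag names are hypothetical.
func parseImageFlags(args []string) (reconcilerOptions, error) {
	fs := flag.NewFlagSet("lightspeed-operator", flag.ContinueOnError)
	service := fs.String("service-image", defaultImages["lightspeed-service"], "lightspeed-service image")
	console := fs.String("console-image", defaultImages["console-plugin"], "console plugin image")
	alerts := fs.String("alerts-adapter-image", defaultImages["alerts-adapter"], "alerts adapter image")
	if err := fs.Parse(args); err != nil {
		return reconcilerOptions{}, err
	}
	return reconcilerOptions{
		LightspeedServiceImage: *service,
		ConsoleUIImage:         *console,
		AlertsAdapterImage:     *alerts,
	}, nil
}

func main() {
	opts, err := parseImageFlags([]string{"--alerts-adapter-image=example.com/alerts-adapter:dev"})
	if err != nil {
		panic(err)
	}
	fmt.Println("alerts adapter:", opts.AlertsAdapterImage) // overridden by the flag
	fmt.Println("console plugin:", opts.ConsoleUIImage)     // default kept
}
```

Keeping the defaults in one map and deriving flags from it means adding an operand (here the alerts adapter) touches only the map, one flag, and one options field.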
````diff
@@ -108,7 +113,7 @@ Declarative configuration for external resource watching. Contains:
 The special deployment name `"ACTIVE_BACKEND"` resolves to the AppServer deployment name (`lightspeed-app-server`).
 
 ### Component Package Pattern
-Each component (appserver, postgres, console) follows the same package structure:
+Each component (appserver, postgres, console, alertsadapter) follows the same package structure:
 - `reconciler.go`: Phase 1 (resources) and Phase 2 (deployment) entry points
 - `deployment.go`: Deployment spec generation and update detection
 - `assets.go` and/or `config.go`: Resource and config generation
````
````diff
@@ -117,17 +122,18 @@ The packages receive `reconciler.Reconciler` interface, never import the control
 ### Reconciler Interface (`internal/controller/reconciler/interface.go`)
 Embeds `client.Client` and adds getter methods for:
 - `GetScheme()`, `GetLogger()`, `GetNamespace()`
-- Image getters: `GetAppServerImage()`, `GetPostgresImage()`, `GetConsoleUIImage()`, `GetOpenShiftMCPServerImage()`, `GetDataverseExporterImage()`
+- Image getters: `GetAppServerImage()`, `GetPostgresImage()`, `GetConsoleUIImage()`, `GetAlertsAdapterImage()`, `GetOpenShiftMCPServerImage()`, `GetDataverseExporterImage()`
 - Version getters: `GetOpenShiftMajor()`, `GetOpenshiftMinor()`
 - Config getters: `IsPrometheusAvailable()`, `GetWatcherConfig()`
 
 ### Finalizer Pattern
 The OLSConfig CR uses finalizer `ols.openshift.io/finalizer` (defined in `utils.OLSConfigFinalizer`). On deletion:
 1. Remove Console UI (deactivate plugin, delete ConsolePlugin CR)
-2. List all owned resources via owner references
-3. Explicitly delete owned resources
-4. Wait up to 3 minutes for deletion (poll every 5 seconds)
-5. Remove finalizer (proceeds even if cleanup times out)
+2. Remove alerts adapter cross-namespace RBAC (`alertsadapter.RemoveAlertsAdapter()`)
+3. List all owned resources via owner references
+4. Explicitly delete owned resources
+5. Wait up to 3 minutes for deletion (poll every 5 seconds)
+6. Remove finalizer (proceeds even if cleanup times out)
 
 ## Integration Points
 
````
.ai/spec/how/reconciliation.md

Lines changed: 6 additions & 4 deletions
````diff
@@ -18,22 +18,24 @@ Reconcile(ctx, req)
   -> handleFinalizer() # Add/remove finalizer, run cleanup
   -> reconcileOperatorResources() # ServiceMonitor, NetworkPolicy (operator-level)
   -> annotateExternalResources() # Validate secrets, annotate for watching
-  -> reconcileIndependentResources() # Phase 1: console, postgres, backend resources
+  -> reconcileIndependentResources() # Phase 1: console, postgres, backend, alerts adapter resources
   |    |-- console.ReconcileConsoleUIResources()
   |    |-- postgres.ReconcilePostgresResources()
-  |    +-- appserver.ReconcileAppServerResources()
+  |    |-- appserver.ReconcileAppServerResources()
+  |    +-- alertsadapter.ReconcileAlertsAdapterResources()
   -> reconcileDeploymentsAndStatus() # Phase 2: deployments + status update
        |-- console.ReconcileConsoleUIDeploymentAndPlugin()
        |-- postgres.ReconcilePostgresDeployment()
        |-- appserver.ReconcileAppServerDeployment()
+       |-- alertsadapter.ReconcileAlertsAdapterDeployment()
        |-- checkDeploymentStatus() for each # Collect diagnostics
        +-- UpdateStatusCondition() # Single status update
 ```
 
 ## Key Abstractions
 
 ### Reconciler Interface
-The `reconciler.Reconciler` interface breaks the circular dependency between the main controller and component packages. Component packages (appserver, postgres, console) receive this interface instead of importing the controller package directly. It embeds `client.Client` and adds getter methods for images, namespace, and OpenShift version.
+The `reconciler.Reconciler` interface breaks the circular dependency between the main controller and component packages. Component packages (appserver, postgres, console, alertsadapter) receive this interface instead of importing the controller package directly. It embeds `client.Client` and adds getter methods for images, namespace, and OpenShift version.
 
 ### ReconcileSteps Pattern
 Both phases use a slice of `ReconcileSteps` structs, each containing a Name, reconcile function, and (for Phase 2) a ConditionType and Deployment name. Phase 1 iterates with continue-on-error; Phase 2 iterates but tracks all conditions and diagnostics.
````
````diff
@@ -44,7 +46,7 @@ Two ownership models:
 2. **External resources**: Watches() with custom predicates. Annotation-based filtering. Secret/ConfigMap handlers compare data and trigger deployment restarts.
 
 ### Finalizer Cleanup
-The `finalizeOLSConfig()` method uses `listOwnedResources()` which queries every resource type by owner reference UID (not labels). This is more reliable than label-based cleanup. The wait loop polls with a fixed interval and timeout, using `wait.PollUntilContextTimeout`.
+The `finalizeOLSConfig()` method removes Console UI, deletes alerts adapter cross-namespace RBAC via `alertsadapter.RemoveAlertsAdapter()`, then uses `listOwnedResources()` which queries every resource type by owner reference UID (not labels). This is more reliable than label-based cleanup. The wait loop polls with a fixed interval and timeout, using `wait.PollUntilContextTimeout`.
 
 ### Status Update Mechanics
 `UpdateStatusCondition()` uses `retry.RetryOnConflict` with `client.MergeFrom` patch. It preserves `LastTransitionTime` for conditions whose status hasn't changed. It re-fetches the CR before each update attempt to get the latest ResourceVersion.
````

.ai/spec/what/bundle-composition.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -36,8 +36,8 @@ The lightspeed-operator OLM bundle installs both the lightspeed-operator control
 
 ### Agentic Operand Deployment
 
-16. [PLANNED: OLS-3236] The lightspeed-operator deploys the agentic alerts adapter and the agentic console plugin as fully reconciled operands, with Phase 1/2 reconciliation, status conditions, health monitoring, and finalizer cleanup. The agentic-operator does not deploy these operands.
-17. [PLANNED: OLS-3236] Agentic operand images default to `:main` tags until Konflux onboarding provides SHA-pinned productized images. CLI flags (`--alerts-adapter-image`, `--agentic-console-image`) on the lightspeed-operator deployment override the defaults.
+16. The lightspeed-operator reconciles the agentic alerts adapter as a fully managed operand (OLS-3348): Phase 1/2 reconciliation, `AlertsAdapterReady` status condition, health monitoring, and finalizer cleanup for cross-namespace RBAC. The agentic console plugin remains [PLANNED: OLS-3236].
+17. Agentic operand images default to `:main` tags until Konflux onboarding provides SHA-pinned productized images. The `--alerts-adapter-image` flag is implemented on the lightspeed-operator binary; wiring it into the CSV deployment spec is [PLANNED: OLS-3236]. The `--agentic-console-image` flag is [PLANNED: OLS-3236].
 
 ## Configuration Surface
 
````
````diff
@@ -48,7 +48,7 @@ The lightspeed-operator OLM bundle installs both the lightspeed-operator control
 | Agentic controller startup flags | CSV deployment spec args | Operand image overrides for the agentic controller |
 | Agentic controller `--sandbox-mode` | CSV deployment spec args | `bare-pod` (default) or `sandbox-claim` — selects sandbox provisioning strategy |
 | Agentic controller `--agentic-sandbox-image` | CSV deployment spec args | [PLANNED: OLS-3236] Sandbox container image (default: `:main` tag, overridable) |
-| Lightspeed controller `--alerts-adapter-image` | CSV deployment spec args | [PLANNED: OLS-3236] Alerts adapter container image (default: `:main` tag) |
+| Lightspeed controller `--alerts-adapter-image` | `cmd/main.go` flag (implemented); CSV deployment spec args [PLANNED: OLS-3236] | Alerts adapter container image (default: Konflux `:main` tag) |
 | Lightspeed controller `--agentic-console-image` | CSV deployment spec args | [PLANNED: OLS-3236] Agentic console plugin container image (default: `:main` tag) |
 
 ## Constraints
````
````diff
@@ -61,4 +61,4 @@ The lightspeed-operator OLM bundle installs both the lightspeed-operator control
 
 | Ticket | Summary |
 |---|---|
-| OLS-3236 | Migrate agentic console deployment from agentic-operator to lightspeed-operator. Add alerts-adapter as new operand. Add `--alerts-adapter-image` and `--agentic-console-image` flags to lightspeed-operator CSV deployment. Remove `--agentic-console-image` from agentic-operator CSV deployment. |
+| OLS-3236 | Migrate agentic console deployment from agentic-operator to lightspeed-operator. Wire `--alerts-adapter-image` and `--agentic-console-image` into lightspeed-operator CSV deployment. Remove `--agentic-console-image` from agentic-operator CSV deployment. |
````

.ai/spec/what/crd-api.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -108,7 +108,7 @@ Field path (relative to `spec.ols.deployment`) | JSON key | Go type | Notes
 `mcpServer` | `mcpServer` | `ContainerConfig` | MCP server container. Resources only
 `console` | `console` | `Config` | Console container. Has replicas field but operator forces 1
 `database` | `database` | `Config` | Database container. Has replicas field but operator forces 1
-`alertsAdapter` | `alertsAdapter` | `Config` | [PLANNED: OLS-3236] Agentic alerts adapter container. Replicas forced to 1
+`alertsAdapter` | `alertsAdapter` | `Config` | Agentic alerts adapter container. Replicas forced to 1
 `agenticConsole` | `agenticConsole` | `Config` | [PLANNED: OLS-3236] Agentic console plugin container. Replicas forced to 1
 
 20. Replicas are only user-configurable for the API container (`spec.ols.deployment.api.replicas`). For console, database, alerts adapter, and agentic console, the operator always overrides replicas to 1 regardless of spec value.
````
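Rule 20 reduces to a small pure function. The sketch below assumes string component keys and a fallback of 1 when the API replicas field is unset; neither detail is taken from the operator source.

```go
package main

import "fmt"

// effectiveReplicas sketches rule 20: only the API container honours the
// replicas value from spec; console, database, alerts adapter and agentic
// console always run with 1. The component keys and the API fallback of 1
// when spec is nil are assumptions.
func effectiveReplicas(component string, specReplicas *int32) int32 {
	if component == "api" && specReplicas != nil {
		return *specReplicas
	}
	return 1
}

func main() {
	three := int32(3)
	fmt.Println(effectiveReplicas("api", &three))           // 3
	fmt.Println(effectiveReplicas("alertsAdapter", &three)) // 1
}
```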
````diff
@@ -280,7 +280,7 @@ Condition types used by the operator:
 - `ApiReady` -- API server deployment health
 - `CacheReady` -- PostgreSQL cache deployment health
 - `ConsolePluginReady` -- Console UI plugin deployment health
-- `AlertsAdapterReady` -- [PLANNED: OLS-3236] Agentic alerts adapter deployment health
+- `AlertsAdapterReady` -- Agentic alerts adapter deployment health
 - `AgenticConsolePluginReady` -- [PLANNED: OLS-3236] Agentic console plugin deployment health
 - `ResourceReconciliation` -- Overall resource reconciliation status (set directly, not deployment-based)
 
````
````diff
@@ -372,7 +372,7 @@ Path | Type | Default | Required | Validation | Description
 `spec.ols.deployment.database.nodeSelector` | `map[string]string` | -- | No | -- | Database node selector
 `spec.ols.deployment.database.affinity` | `*Affinity` | -- | No | -- | Database affinity
 `spec.ols.deployment.database.topologySpreadConstraints` | `[]TopologySpreadConstraint` | -- | No | -- | Database topology spread
-`spec.ols.deployment.alertsAdapter` | `Config` | -- | No | -- | [PLANNED: OLS-3236] Alerts adapter deployment
+`spec.ols.deployment.alertsAdapter` | `Config` | -- | No | -- | Alerts adapter deployment
 `spec.ols.deployment.alertsAdapter.replicas` | `*int32` | `1` | No | Min=0 | Alerts adapter replicas (operator forces 1)
 `spec.ols.deployment.alertsAdapter.resources` | `*ResourceRequirements` | -- | No | -- | Alerts adapter resources
 `spec.ols.deployment.alertsAdapter.tolerations` | `[]Toleration` | -- | No | -- | Alerts adapter tolerations
````

.ai/spec/what/reconciliation.md

Lines changed: 9 additions & 9 deletions
````diff
@@ -17,16 +17,16 @@ The operator reconciles the OLSConfig CR into Kubernetes resources through a two
 8. Step 6 (Phase 2): Reconcile deployments and dependent resources -- Deployments, Services, TLS certificates, ServiceMonitors, PrometheusRules. After reconciliation, check deployment health and update CR status.
 
 ### Phase 1: Independent Resources
-9. Five component groups are reconciled in Phase 1: Console UI, PostgreSQL, the application server, the agentic alerts adapter, and the agentic console plugin.
+9. Four component groups are reconciled in Phase 1: Console UI, PostgreSQL, the application server, and the agentic alerts adapter. The agentic console plugin is [PLANNED: OLS-3236].
 10. All Phase 1 resource groups are independent and can be reconciled in any order.
 11. If any Phase 1 resource fails, the operator continues reconciling the remaining resources, then reports all failures in the CR status with ResourceReconciliation conditions.
-11a. Alerts adapter Phase 1 resources: ServiceAccount, ClusterRole (`agentic.openshift.io/proposals`: create, list, get), ClusterRoleBinding, RoleBinding in `openshift-monitoring` (binds SA to `monitoring-alertmanager-view`), NetworkPolicy.
-11b. Agentic console Phase 1 resources: ServiceAccount, ConfigMap (nginx.conf), NetworkPolicy.
+11a. Alerts adapter Phase 1 resources (OLS-3348): ServiceAccount, ClusterRole (`agentic.openshift.io/proposals`: create, list, get), ClusterRoleBinding, RoleBinding in `openshift-monitoring` (binds SA to `monitoring-alertmanager-view`), NetworkPolicy.
+11b. Agentic console Phase 1 resources [PLANNED: OLS-3236]: ServiceAccount, ConfigMap (nginx.conf), NetworkPolicy.
 
 ### Phase 2: Deployments and Status
-12. Five deployments are reconciled in Phase 2: Console UI (condition: ConsolePluginReady), PostgreSQL (condition: CacheReady), the active backend (condition: ApiReady), the agentic alerts adapter (condition: AlertsAdapterReady), and the agentic console plugin (condition: AgenticConsolePluginReady).
-12a. Alerts adapter Phase 2: Deployment (1 replica, `ALERTMANAGER_URL` env hardcoded to `https://alertmanager-main.openshift-monitoring.svc:9094`).
-12b. Agentic console Phase 2: Deployment (1 replica, nginx with TLS via service-ca cert), Service (port 9443, serving-cert annotation), ConsolePlugin CR, Console CR activation.
+12. Four deployments are reconciled in Phase 2: Console UI (condition: `ConsolePluginReady`), PostgreSQL (condition: `CacheReady`), the active backend (condition: `ApiReady`), and the agentic alerts adapter (condition: `AlertsAdapterReady`). The agentic console plugin (condition: `AgenticConsolePluginReady`) is [PLANNED: OLS-3236].
+12a. Alerts adapter Phase 2 (OLS-3348): Deployment (1 replica, `ALERTMANAGER_URL` env hardcoded to `https://alertmanager-main.openshift-monitoring.svc:9094`, `POD_NAMESPACE` via downward API).
+12b. Agentic console Phase 2 [PLANNED: OLS-3236]: Deployment (1 replica, nginx with TLS via service-ca cert), Service (port 9443, serving-cert annotation), ConsolePlugin CR, Console CR activation.
 13. After each deployment reconciliation, the operator checks the deployment's health status.
 14. Deployment health has three states: Ready (Available condition true), Progressing (not yet available, no terminal failures), Failed (terminal pod failures detected).
 15. Terminal pod failures include: CrashLoopBackOff, ImagePullBackOff, ErrImagePull, OOMKilled, and containers terminated with non-zero exit codes after CrashLoopBackOff.
````
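The three health states in item 14 and the terminal reasons in item 15 can be sketched as a classifier. `containerStatus` is a hypothetical flattening of Kubernetes container status, and checking the Available condition before looking for terminal failures is an assumed precedence, not something the spec states.

```go
package main

import "fmt"

type deploymentHealth string

const (
	Ready       deploymentHealth = "Ready"
	Progressing deploymentHealth = "Progressing"
	Failed      deploymentHealth = "Failed"
)

// terminalReasons are the container reasons item 15 lists as terminal.
var terminalReasons = map[string]bool{
	"CrashLoopBackOff": true,
	"ImagePullBackOff": true,
	"ErrImagePull":     true,
	"OOMKilled":        true,
}

// containerStatus is a hypothetical flattening of a pod's container state.
type containerStatus struct {
	Reason         string // waiting or terminated reason
	ExitCode       int32  // last termination exit code
	AfterCrashLoop bool   // terminated after having been in CrashLoopBackOff
}

// classify maps a deployment to one of the three health states.
func classify(available bool, containers []containerStatus) deploymentHealth {
	if available {
		return Ready
	}
	for _, c := range containers {
		if terminalReasons[c.Reason] || (c.ExitCode != 0 && c.AfterCrashLoop) {
			return Failed
		}
	}
	return Progressing
}

func main() {
	fmt.Println(classify(false, []containerStatus{{Reason: "ContainerCreating"}}))
	fmt.Println(classify(false, []containerStatus{{Reason: "ImagePullBackOff"}}))
	fmt.Println(classify(true, nil))
}
```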
````diff
@@ -37,12 +37,12 @@ The operator reconciles the OLSConfig CR into Kubernetes resources through a two
 ### Finalizer Lifecycle
 19. On CR creation: add finalizer, return immediately (controller-runtime auto-requeues).
 20. On CR deletion: run finalizer cleanup before removing finalizer.
-21. Finalizer cleanup sequence: remove Console UI from Console CR, delete ConsolePlugin CR, remove agentic console plugin from Console CR, delete agentic ConsolePlugin CR, delete alerts-adapter RoleBinding in `openshift-monitoring`, delete alerts-adapter ClusterRoleBinding, delete alerts-adapter ClusterRole, list all owned resources by owner reference, explicitly delete them, wait for deletion (polling with timeout).
+21. Finalizer cleanup sequence: remove Console UI from Console CR, delete ConsolePlugin CR, remove agentic console plugin from Console CR [PLANNED: OLS-3236], delete agentic ConsolePlugin CR [PLANNED: OLS-3236], delete alerts-adapter RoleBinding in `openshift-monitoring`, delete alerts-adapter ClusterRoleBinding, delete alerts-adapter ClusterRole, list all owned resources by owner reference, explicitly delete them, wait for deletion (polling with timeout).
 22. If cleanup times out, the finalizer is removed anyway to prevent the CR from being stuck in Terminating state.
 23. Console UI and agentic component removal errors during finalization are logged but do not block finalization.
 
 ### Status Conditions
-24. The operator sets these condition types: ApiReady, CacheReady, ConsolePluginReady, AlertsAdapterReady, AgenticConsolePluginReady, ResourceReconciliation.
+24. The operator sets these condition types: `ApiReady`, `CacheReady`, `ConsolePluginReady`, `AlertsAdapterReady`, `AgenticConsolePluginReady` [PLANNED: OLS-3236], `ResourceReconciliation`.
 25. OverallStatus is Ready only when all deployment conditions are True.
 26. OverallStatus is NotReady if any condition is False.
 27. When deployments are not ready, diagnosticInfo is populated with per-pod failure details including container name, reason, message, exit code, and diagnostic type.
````
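Items 25 and 26 can be read as the aggregation below. The sketch treats condition values as plain strings and assumes that a missing or `Unknown` deployment condition also yields `NotReady`; the condition list holds the four deployment conditions this commit implements, leaving out the planned `AgenticConsolePluginReady`.

```go
package main

import "fmt"

// deploymentConditions are the deployment-health conditions implemented in
// this commit; AgenticConsolePluginReady is still planned.
var deploymentConditions = []string{"ApiReady", "CacheReady", "ConsolePluginReady", "AlertsAdapterReady"}

// overallStatus returns "NotReady" if any condition (including
// ResourceReconciliation) is "False", and "Ready" only when every deployment
// condition is "True". Treating a missing or Unknown deployment condition as
// NotReady is an assumption.
func overallStatus(conds map[string]string) string {
	for _, s := range conds {
		if s == "False" {
			return "NotReady"
		}
	}
	for _, t := range deploymentConditions {
		if conds[t] != "True" {
			return "NotReady"
		}
	}
	return "Ready"
}

func main() {
	conds := map[string]string{
		"ApiReady": "True", "CacheReady": "True", "ConsolePluginReady": "True",
		"AlertsAdapterReady": "False", "ResourceReconciliation": "True",
	}
	fmt.Println(overallStatus(conds)) // NotReady
	conds["AlertsAdapterReady"] = "True"
	fmt.Println(overallStatus(conds)) // Ready
}
```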
````diff
@@ -67,4 +67,4 @@ Reconciliation behavior is not directly user-configurable. It is driven by the O
 
 | Ticket | Summary |
 |---|---|
-| OLS-3236 | [PLANNED] Add alerts-adapter and agentic-console as reconciled operands with Phase 1/2 steps, status conditions (AlertsAdapterReady, AgenticConsolePluginReady), and finalizer cleanup for cross-namespace resources |
+| OLS-3236 | [PLANNED] Add agentic-console as a reconciled operand with Phase 1/2 steps, `AgenticConsolePluginReady` status condition, and finalizer cleanup for Console CR / ConsolePlugin CR |
````
