Commit 562005d

Enhance README clarity and remove unused OpenTelemetry config (#182)
* docs: update README to enhance clarity and structure of CDC documentation
* chore: remove unused resource detection and attributes processors from OpenTelemetry config
* docs: enhance README with detailed key features and security design
1 parent c5450ae commit 562005d

2 files changed

Lines changed: 15 additions & 314 deletions

File tree

README.md

Lines changed: 13 additions & 280 deletions
@@ -1,287 +1,20 @@
1-
# Composition Dynamic Controller
2-
The composition-dynamic-controller is an operator that is instantiated by the [core-provider](https://github.com/krateoplatformops/core-provider) to manage the Custom Resources whose Custom Resource Definition is generated by the core-provider.
1+
# Composition Dynamic Controller (CDC)
32

4-
## Summary
3+
The **Composition Dynamic Controller (CDC)** is the execution engine of Krateo. It is a specialized operator that manages the full lifecycle of Helm-based services by watching and reconciling `Composition` resources.
54

6-
- [Composition Dynamic Controller](#composition-dynamic-controller)
7-
- [Summary](#summary)
8-
- [Architecture](#architecture)
9-
- [Workflow](#workflow)
10-
- [Composition Dynamic Controller (CDC) \& Chart Inspector: Secure Helm Lifecycle Management](#composition-dynamic-controller-cdc--chart-inspector-secure-helm-lifecycle-management)
11-
- [Core CDC Workflow (with Chart Inspector Integration)](#core-cdc-workflow-with-chart-inspector-integration)
12-
- [Key Capabilities Enabled by This Collaboration](#key-capabilities-enabled-by-this-collaboration)
13-
- [Why This Architecture Matters](#why-this-architecture-matters)
14-
- [Real-World Example: Handling a Breaking Chart Change](#real-world-example-handling-a-breaking-chart-change)
15-
- [Helm Release Name Logic](#helm-release-name-logic)
16-
- [Prior Versions (\<= 0.19.9)](#prior-versions--0199)
17-
- [Subsequent Versions (\>= 0.20.0)](#subsequent-versions--0200)
18-
- [Composition Dynamic Controller Values Injection](#composition-dynamic-controller-values-injection)
19-
- [About the `gracefullyPaused` value](#about-the-gracefullypaused-value)
20-
- [Configuration](#configuration)
21-
- [Operator Env Vars](#operator-env-vars)
5+
## Key Features
226

23-
7+
- **Lifecycle Orchestration**: Manages the end-to-end deployment, updates, and deletion of services based on Helm charts.
8+
- **Dynamic Reconciliation**: Automatically reconciles resource states, ensuring the live cluster matches the desired state defined in the `Composition`.
9+
- **Chart Inspector Integration**: Leverages the Krateo Chart Inspector for secure dry-runs, ensuring chart validity and resource safety before application.
2410

25-
## Architecture
11+
## Security & Operational Design
2612

27-
![composition-dynamic-controller architecture](_diagrams/architecture.png "composition-dynamic-controller Architecture")
13+
- **RBAC Enforcement**: Provisions specific, fine-grained RBAC policies for each managed composition, enforcing least-privilege principles at the instance level.
14+
- **Graceful Lifecycle Management**: Supports advanced management features like service pausing/resuming and controlled Helm release versioning.
15+
- **Observability**: Built-in support for OpenTelemetry to monitor reconciliation health and performance metrics.
2816

29-
## Workflow
17+
## Documentation
3018

31-
![composition-dynamic-controller State Diagram](_diagrams/composition-dynamic-controller-flow.png "composition-dynamic-controller State Diagram")
32-
33-
### Composition Dynamic Controller (CDC) & Chart Inspector: Secure Helm Lifecycle Management
34-
35-
The **Composition Dynamic Controller (CDC)** is a specialized Kubernetes operator that orchestrates the end-to-end lifecycle of Krateo compositions. Acting as the reconciliation engine for Composition custom resources, it bridges declarative application definitions with Helm’s packaging system through intelligent automation. The **Chart Inspector** serves as its "safety advisor," enabling proactive decision-making via dry-run analysis.
36-
37-
#### Core CDC Workflow (with Chart Inspector Integration)
38-
1. **Reconciliation Trigger**
39-
- Watches for changes to `Composition` CRs or Helm chart versions.
40-
- Invokes the **Chart Inspector** to simulate installations/upgrades *before* execution.
41-
42-
2. **Dry-Run Analysis Phase** (*Chart Inspector*)
43-
```bash
44-
helm install --dry-run=server <chart> --version <ver> # Returns:
45-
```
46-
- **Resource Manifest List**: All Kubernetes objects (Deployments, CRDs, etc.) the chart would create along with filename to .
47-
- **Dependency Graph**: Order of operations (e.g., CRDs before custom resources).
48-
49-
3. **RBAC Auto-Provisioning** (*CDC*)
50-
- Dynamically generates **least-privilege** Roles/ClusterRoles based on the Inspector’s output.
51-
- Ensures the CDC’s service account has *exactly* the permissions needed—no more, no less.
52-
53-
4. **Atomic Execution** (*CDC*)
54-
- Proceeds with `helm install/upgrade` *only* after successful dry-run and RBAC setup.
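
Steps 2 and 3 of the workflow above can be sketched roughly as follows. This is a minimal Python illustration, not the controller's actual code: the shape of the Chart Inspector output (a list of `group`/`resource`/`namespace` entries), the function name `rbac_rules`, and the verb set are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical verb set; the real controller may grant a different one.
DEFAULT_VERBS = ["get", "list", "create", "update", "delete"]


def rbac_rules(resources, verbs=DEFAULT_VERBS):
    """Group dry-run resources into one rule per (namespace, API group).

    `resources` uses an assumed shape: dicts with `group`, `resource`
    (plural name) and `namespace` ("" for cluster-scoped objects).
    """
    grouped = defaultdict(set)
    for r in resources:
        grouped[(r["namespace"], r["group"])].add(r["resource"])

    rules = []
    for (namespace, group), names in sorted(grouped.items()):
        rules.append({
            "namespace": namespace,  # "" would map to a ClusterRole
            "apiGroups": [group],
            "resources": sorted(names),
            "verbs": list(verbs),
        })
    return rules


# Made-up dry-run result for a small chart.
dry_run = [
    {"group": "apps", "resource": "deployments", "namespace": "demo"},
    {"group": "", "resource": "services", "namespace": "demo"},
    {"group": "", "resource": "configmaps", "namespace": "demo"},
    {"group": "apiextensions.k8s.io",
     "resource": "customresourcedefinitions", "namespace": ""},
]

for rule in rbac_rules(dry_run):
    print(rule)
```

Only the resources the dry-run reports end up in a rule, which is the least-privilege idea described in step 3.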
55-
56-
---
57-
58-
### Key Capabilities Enabled by This Collaboration
59-
60-
| **Feature** | **CDC’s Role** | **Chart Inspector’s Contribution** |
61-
|----------------------------|-----------------------------------------|----------------------------------------------------|
62-
| **Version-Sensitive Reconciliation** | Detects chart version drift; rolls forward/back. | Identifies version-specific resource changes during dry-run. |
63-
| **Atomic Upgrades** | Ensures all-or-nothing upgrades. | Pre-flights resource compatibility (e.g., CRD schema changes). |
64-
| **Self-Healing** | Corrects configuration drift. | Provides baseline "desired state" for comparison. |
65-
| **Declarative Enforcement** | Continuously reconciles actual vs. desired state. | Supplies the desired state *before* cluster changes. |
66-
| **Secure RBAC** | Generates minimal required permissions. | Audits chart manifests for required API operations. |
67-
68-
---
69-
70-
### Why This Architecture Matters
71-
1. **Safety Net**
72-
- The Chart Inspector’s dry-run prevents "helm surprises" (e.g., undeclared CRD creations or namespace pollution).
73-
- Example: Blocks a chart upgrade if the new version requires a `ClusterRole` the CDC isn’t authorized to manage.
74-
75-
2. **GitOps Compliance**
76-
- The CDC enforces *declarative intent* by reconciling against the dry-run’s output, not just Helm’s last-applied state.
77-
- Self-healing kicks in if manual changes violate the composition’s definition.
78-
79-
3. **Multi-Tenancy Ready**
80-
- RBAC is scoped per-composition, isolating teams/projects.
81-
- The Inspector’s resource listing ensures no cross-tenant leakage (e.g., a composition can’t create resources in forbidden namespaces).
82-
83-
---
84-
85-
### Real-World Example: Handling a Breaking Chart Change
86-
1. **Scenario**: A Helm chart v1.2.0 introduces a new `CustomResourceDefinition` (CRD).
87-
2. **CDC+Inspector Flow**:
88-
- **Dry-run** detects the new CRD and its required API group permissions.
89-
- **CDC** creates a `ClusterRole` granting `create/get/list` for the CRD.
90-
- **Upgrade** proceeds *only after* the CRD and RBAC are confirmed active.
91-
3. **Result**: Zero downtime; no "helm upgrade failed: CRD missing" errors.
92-
93-
94-
---
95-
96-
## Helm Release Name Logic
97-
98-
The logic used by the **`composition-dynamic-controller`** to determine the Helm release name has evolved to better handle multi-tenancy and character limits.
99-
100-
---
101-
102-
### Prior Versions (<= 0.19.9)
103-
104-
In these versions, the release name was determined by:
105-
106-
1. The value of the **label** `krateo.io/release-name` (if set).
107-
2. The **Composition resource name** (as a fallback).
108-
109-
---
110-
111-
### Versions 0.20.0 to 0.20.2
112-
113-
Starting with v0.20.0, the logic shifted to ensure uniqueness across different namespaces:
114-
115-
1. If the **annotation** `krateo.io/release-name` is set, its value is used.
116-
2. Otherwise, the name is generated as: `{composition.metadata.name}-{composition.metadata.uid[:8]}`.
117-
118-
> [!IMPORTANT]
119-
> Because the UID suffix adds 9 characters (hyphen + 8-char UID) and Helm limits release names to **53 characters**, the `metadata.name` of a Composition cannot exceed **44 characters**.
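
The naming rule and the 44-character limit can be checked with a short sketch. The function below is illustrative Python, not the controller's implementation; in particular, raising an error for over-long names is an assumption, since the README only states that such names are not allowed.

```python
HELM_MAX_RELEASE_NAME_LEN = 53  # Helm limit quoted in the note above
UID_PREFIX_LEN = 8
RELEASE_NAME_ANNOTATION = "krateo.io/release-name"


def release_name(name, uid, annotations=None):
    """Derive the release name as described for v0.20.0 to v0.20.2."""
    override = (annotations or {}).get(RELEASE_NAME_ANNOTATION)
    if override:
        return override
    candidate = f"{name}-{uid[:UID_PREFIX_LEN]}"
    if len(candidate) > HELM_MAX_RELEASE_NAME_LEN:
        # Assumption: reject the name; the real controller may react differently.
        raise ValueError(
            f"release name {candidate!r} exceeds {HELM_MAX_RELEASE_NAME_LEN} characters"
        )
    return candidate


uid = "3f9c2a71-5b1e-4c0d-9a7e-2d6f8b1c4e90"  # made-up UID
print(release_name("fireworks-app", uid))   # fireworks-app-3f9c2a71
print(len(release_name("a" * 44, uid)))     # 53
```

A 44-character name plus the hyphen and the 8-character UID prefix gives exactly 53 characters, so a 45-character name would be the first to exceed the limit.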
120-
121-
---
122-
123-
### Versions 0.20.3+ (Configurable Logic)
124-
125-
Starting from **v0.20.3**, the environment variable `COMPOSITION_CONTROLLER_SAFE_RELEASE_NAME` allows you to toggle between modern and legacy naming conventions.
126-
127-
#### Default Behavior (`true`)
128-
129-
The controller appends the **UID suffix** to ensure uniqueness across namespaces. This is the recommended setting to prevent Helm naming collisions when identical Composition names exist in different namespaces.
130-
131-
#### Legacy Behavior (`false`)
132-
133-
The controller reverts to the logic used prior to v0.20.0. The release name is determined by:
134-
135-
1. The value of the `krateo.io/release-name` **annotation** (if set).
136-
2. The **Composition resource name**.
137-
138-
> [!CAUTION]
139-
> **Disabling this option is highly discouraged.** While it provides backward compatibility for charts with strict character length limits, it removes the uniqueness guarantee. Creating Compositions with the same name in different namespaces will cause release name collisions and failed Helm operations. It is up to the administrator to enforce a composition-name uniqueness policy among users to mitigate this risk.
140-
141-
---
142-
143-
### Comparison Summary (v0.20.3+)
144-
145-
| `SAFE_RELEASE_NAME` | Source | UID Suffix | Collision Risk | Max Name Length |
146-
| --- | --- | --- | --- | --- |
147-
| **`true` (Default)** | Name + UID | Included | Negligible | 44 Characters |
148-
| **`false`** | Annotation/Name | Excluded | **High** | 53 Characters |
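
The collision risk in the table can be illustrated with two Compositions that share a name in different namespaces. This is a simplified Python sketch: the `safe` parameter stands in for `COMPOSITION_CONTROLLER_SAFE_RELEASE_NAME`, the annotation override is left out for brevity, and the namespaces, names, and UIDs are made up.

```python
def release_name(name: str, uid: str, safe: bool = True) -> str:
    """safe=True appends the 8-character UID prefix; safe=False uses the bare name."""
    return f"{name}-{uid[:8]}" if safe else name


compositions = [
    {"namespace": "team-a", "name": "webshop",
     "uid": "11aa22bb-0000-4000-8000-000000000001"},
    {"namespace": "team-b", "name": "webshop",
     "uid": "33cc44dd-0000-4000-8000-000000000002"},
]

for safe in (True, False):
    names = [release_name(c["name"], c["uid"], safe) for c in compositions]
    status = "unique" if len(set(names)) == len(names) else "COLLISION"
    print(f"SAFE_RELEASE_NAME={str(safe).lower()}: {names} -> {status}")
```

With the default setting the two release names differ by their UID prefix; with the legacy setting both resolve to `webshop`.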
149-
150-
---
151-
152-
153-
## Composition Dynamic Controller Values Injection
154-
155-
The composition-dynamic-controller injects labels into the installed resources and values into the Helm chart release values. These values contain information about the composition resource associated with the Helm release.
156-
The values are injected in the following way:
157-
158-
| Helm Chart Release Values | Installed Resources Labels | Description |
159-
|:--------------------------|:---------------------------|:------------|
160-
| `global.compositionId` | `krateo.io/composition-id` | The composition resource uid |
161-
| `global.compositionName` | `krateo.io/composition-name` | The composition resource name |
162-
| `global.compositionNamespace` | `krateo.io/composition-namespace` | The composition resource namespace |
163-
| `global.compositionInstalledVersion` | `krateo.io/composition-installed-version` | The version of the composition resource installed. This value changes if the chart version is upgraded |
164-
| `global.compositionApiVersion` | not injected | The API version of the composition resource. This value is deprecated but is maintained for backward compatibility. |
165-
| `global.compositionGroup` | `krateo.io/composition-group` | The group of the composition resource. |
166-
| `global.compositionResource` | `krateo.io/composition-resource` | The plural name of the composition resource. |
167-
| `global.compositionKind` | `krateo.io/composition-kind` | The kind of the composition resource. |
168-
| `global.krateoNamespace` | `krateo.io/krateo-namespace` | The namespace where Krateo is installed. This value is used to identify the Krateo resources in the cluster. |
169-
| `global.gracefullyPaused`| not injected | This value is set to `true` if the annotation `krateo.io/gracefully-paused` is set on the composition resource. This value is used to pause the reconciliation of the composition resource only after the value is injected in the helm release values with a successful helm upgrade. Read the [paragraph below](#about-the-gracefullypaused-value) for more details. |
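
The mapping in the table can be sketched as a function that builds both the Helm `global` values and the resource labels from a composition object. This is illustrative Python, not the controller's code. The assumptions are: the annotation counts as set when its value is `"true"`, `compositionApiVersion` holds the full `apiVersion` string, and the example kind, plural name, and version are hypothetical.

```python
PAUSE_ANNOTATION = "krateo.io/gracefully-paused"


def injected_values(composition, resource_plural, installed_version,
                    krateo_namespace="krateo-system"):
    """Return (helm_values, labels) following the table above."""
    meta = composition["metadata"]
    api_version = composition["apiVersion"]
    group = api_version.rsplit("/", 1)[0] if "/" in api_version else ""
    # Assumption: the annotation is treated as set only when it equals "true".
    paused = meta.get("annotations", {}).get(PAUSE_ANNOTATION) == "true"

    helm_values = {"global": {
        "compositionId": meta["uid"],
        "compositionName": meta["name"],
        "compositionNamespace": meta["namespace"],
        "compositionInstalledVersion": installed_version,
        "compositionApiVersion": api_version,  # deprecated; values only
        "compositionGroup": group,
        "compositionResource": resource_plural,
        "compositionKind": composition["kind"],
        "krateoNamespace": krateo_namespace,
        "gracefullyPaused": paused,            # values only, no label
    }}
    labels = {
        "krateo.io/composition-id": meta["uid"],
        "krateo.io/composition-name": meta["name"],
        "krateo.io/composition-namespace": meta["namespace"],
        "krateo.io/composition-installed-version": installed_version,
        "krateo.io/composition-group": group,
        "krateo.io/composition-resource": resource_plural,
        "krateo.io/composition-kind": composition["kind"],
        "krateo.io/krateo-namespace": krateo_namespace,
    }
    return helm_values, labels


# Hypothetical composition object used only for the example.
composition = {
    "apiVersion": "composition.krateo.io/v1alpha1",
    "kind": "FireworksApp",
    "metadata": {
        "name": "demo", "namespace": "team-a",
        "uid": "3f9c2a71-5b1e-4c0d-9a7e-2d6f8b1c4e90",
        "annotations": {PAUSE_ANNOTATION: "true"},
    },
}
values, labels = injected_values(composition, "fireworksapps", "1.0.0")
print(values["global"]["gracefullyPaused"], labels["krateo.io/composition-kind"])
```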
170-
171-
### About the `gracefullyPaused` value
172-
173-
The `global.gracefullyPaused` value provides a way to gracefully pause both the composition resource and all Krateo resources within its Helm chart.
174-
175-
#### How it works:
176-
1. **Trigger**: Set the annotation `krateo.io/gracefully-paused` on the composition resource
177-
2. **Activation**: The pause takes effect only after the next successful Helm upgrade injects this value into the chart
178-
3. **Scope**: Pauses both the composition reconciliation AND any Krateo resources in the chart that respect the `krateo.io/paused` annotation
179-
180-
#### Use cases:
181-
- **Graceful pause**: Temporarily halt all composition-related activity without resource deletion
182-
- **Coordinated pause**: Ensure both the composition and its managed resources pause simultaneously
183-
- **Safe maintenance**: Pause operations during maintenance windows
184-
185-
#### Comparison with `krateo.io/paused`:
186-
187-
| Annotation | Scope | When it takes effect |
188-
|------------|-------|---------------------|
189-
| `krateo.io/gracefully-paused` | Composition + chart resources | After next Helm upgrade |
190-
| `krateo.io/paused` | Composition only | Immediately |
191-
192-
**Example**: Use `krateo.io/gracefully-paused` when you need to pause an entire application stack, or `krateo.io/paused` for immediate composition-only pausing.
193-
194-
#### How to include the pause in a resource included in the chart
195-
196-
##### For Krateo resources that support pausing via the `krateo.io/paused` annotation:
197-
198-
To include the pause in a resource included in the chart, you can use the `krateo.io/paused` annotation on the resource. This will ensure that the resource is paused when the composition is paused.
199-
200-
```yaml
201-
apiVersion: git.krateo.io/v1alpha1
202-
kind: Repo
203-
metadata:
204-
  name: {{ include "fireworks-app.fullname" . }}-repo
205-
  labels:
206-
    {{- include "fireworks-app.labels" . | nindent 4 }}
207-
  annotations:
208-
    krateo.io/paused: "{{ default false (and .Values.global .Values.global.gracefullyPaused) }}"
209-
spec:
210-
  ...
211-
```
212-
213-
##### For non-Krateo resources:
214-
> **NOTE:** Operators implemented without the Krateo runtime may handle "pause" semantics differently (different annotation keys, immediate vs. delayed behavior, or custom fields). Before templating a pause for a non‑Krateo controller, review that operator's API and controller behavior and adapt the Helm template to map `global.gracefullyPaused` to the operator's expected pause mechanism.
215-
216-
217-
This is an example of how to include the pause in a non-Krateo resource included in the chart. In this case, we use an ArgoCD Application as an example.
218-
219-
```yaml
220-
apiVersion: argoproj.io/v1alpha1
221-
kind: Application
222-
metadata:
223-
  name: {{ .Release.Name }}
224-
  namespace: {{ .Values.argocd.namespace }}
225-
  labels:
226-
    {{- include "github-scaffolding.labels" . | nindent 4 }}
227-
spec:
228-
  project: {{ .Values.argocd.application.project | default "default" }}
229-
  source:
230-
    repoURL: {{ include "github-scaffolding.toRepoUrl" . }}
231-
    targetRevision: {{ .Values.git.toRepo.branch }}
232-
    path: {{ .Values.argocd.application.source.path }}
233-
  destination:
234-
    server: {{ .Values.argocd.application.destination.server }}
235-
    namespace: {{ .Values.argocd.application.destination.namespace }}
236-
237-
  {{- /* Normalize flags */ -}}
238-
  {{- $hasPaused := and .Values.global (hasKey .Values.global "gracefullyPaused") -}}
239-
  {{- $paused := and $hasPaused (eq (toString .Values.global.gracefullyPaused) "true") -}}
240-
  {{- $syncEnabled := default false .Values.argocd.application.syncEnabled -}}
241-
242-
  {{- if $paused }}
243-
  syncPolicy: {}
244-
  {{- else if $syncEnabled }}
245-
  syncPolicy:
246-
    automated:
247-
      prune: {{ default false .Values.argocd.application.syncPolicy.automated.prune }}
248-
      selfHeal: {{ default false .Values.argocd.application.syncPolicy.automated.selfHeal }}
249-
    syncOptions:
250-
      - CreateNamespace=true
251-
  {{- else }}
252-
  syncPolicy: {}
253-
  {{- end }}
254-
```
255-
The composition-dynamic-controller injects a `global.gracefullyPaused` boolean into Helm release values after a successful upgrade. When `true`, chart templates can use this flag to disable automated behavior for non‑Krateo resources (for example emit `syncPolicy: {}` in an Argo CD Application) and to set `krateo.io/paused` on Krateo resources, ensuring a coordinated pause across the composition and all included resources.
256-
257-
258-
259-
## Configuration
260-
261-
### Operator Env Vars
262-
263-
These environment variables can be changed in the Deployment of the composition-dynamic-controller that needs tuning.
264-
265-
| Name | Description | Default Value | Notes |
266-
|:---------------------------------------|:---------------------------|:--------------|:--------------|
267-
| COMPOSITION_CONTROLLER_DEBUG | dump verbose output | false | |
268-
| COMPOSITION_CONTROLLER_WORKERS | number of workers | 1 | |
269-
| COMPOSITION_CONTROLLER_RESYNC_INTERVAL | resync interval | 3m | |
270-
| COMPOSITION_CONTROLLER_GROUP | resource api group | | populated by `core-provider` |
271-
| COMPOSITION_CONTROLLER_VERSION | resource api version | | populated by `core-provider` |
272-
| COMPOSITION_CONTROLLER_RESOURCE | resource plural name | | populated by `core-provider` |
273-
| COMPOSITION_CONTROLLER_SA_NAME | cdc deployment ServiceAccount name | | populated by `core-provider` |
274-
| COMPOSITION_CONTROLLER_SA_NAMESPACE | cdc deployment ServiceAccount namespace | | populated by `core-provider` |
275-
| URL_PLURALS | NOT USED from version 0.17.1 - URL to krateo pluraliser service | `http://snowplow.krateo-system.svc.cluster.local:8081/api-info/names` | Ignored from version 0.17.1 |
276-
| URL_CHART_INSPECTOR | url to chart inspector | `http://chart-inspector.krateo-system.svc.cluster.local:8081/` |
277-
| KRATEO_NAMESPACE | namespace where krateo is installed | krateo-system |
278-
| HELM_REGISTRY_CONFIG_PATH | NOT USED from version '1.0.0' - default helm config path | /tmp |
279-
| HELM_MAX_HISTORY | Max Helm History | 3 |
280-
| COMPOSITION_CONTROLLER_MAX_ERROR_RETRY_INTERVAL | The maximum interval between retries when an error occurs. This should be less than half of the poll interval. | 60s |
281-
| COMPOSITION_CONTROLLER_MIN_ERROR_RETRY_INTERVAL | The minimum interval between retries when an error occurs. This should be less than max-error-retry-interval. | 1s |
282-
| COMPOSITION_CONTROLLER_MAX_ERROR_RETRIES | The maximum number of retries when an error occurs. Set to 0 to disable retries. | 5 |
283-
| COMPOSITION_CONTROLLER_METRICS_SERVER_PORT | The port where the metrics server will be listening. If not set, the metrics server is disabled. | |
284-
| COMPOSITION_CONTROLLER_SAFE_RELEASE_NAME | If disabled, the UID suffix is not appended to the Helm release name. This can help avoid problems with complex Helm charts. Disabling this option is highly discouraged, as it can lead to release name collisions. | true |
285-
| OTEL_ENABLED | Enable OpenTelemetry metrics export | false |
286-
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP HTTP endpoint where metrics are sent | `http://localhost:4318` |
287-
| OTEL_EXPORT_INTERVAL | Interval at which metrics are exported | 30s |
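
The two constraints stated for the retry intervals can be checked before deploying. This is a hedged Python sketch, not part of the controller. It assumes the "poll interval" in the table refers to `COMPOSITION_CONTROLLER_RESYNC_INTERVAL`, and its `parse_duration` helper only understands simple values such as `1s`, `3m`, or `2h`, whereas the controller may accept richer duration strings.

```python
import os
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Parse simple durations such as '1s', '3m', '2h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def check_retry_settings(env=os.environ):
    """Return a list of violated constraints (empty when the settings are consistent)."""
    resync = parse_duration(env.get("COMPOSITION_CONTROLLER_RESYNC_INTERVAL", "3m"))
    max_retry = parse_duration(
        env.get("COMPOSITION_CONTROLLER_MAX_ERROR_RETRY_INTERVAL", "60s"))
    min_retry = parse_duration(
        env.get("COMPOSITION_CONTROLLER_MIN_ERROR_RETRY_INTERVAL", "1s"))

    problems = []
    if not min_retry < max_retry:
        problems.append("min error retry interval must be less than the max")
    if not max_retry < resync / 2:
        problems.append("max error retry interval must be less than half the resync interval")
    return problems


print(check_retry_settings({}))  # defaults from the table
print(check_retry_settings({"COMPOSITION_CONTROLLER_MAX_ERROR_RETRY_INTERVAL": "2m"}))
```

With the defaults from the table (3m, 60s, 1s) both constraints hold; raising the maximum retry interval to 2m breaks the second one, because 120 seconds is not below half of 180 seconds.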
19+
For detailed guides, architecture diagrams, and full reference, visit the official documentation:
20+
👉 **[https://docs.krateo.io](https://docs.krateo.io/key-concepts/kco/cdc/overview)**

telemetry/otelcol-values.yaml

Lines changed: 2 additions & 34 deletions
@@ -11,37 +11,14 @@ config:
1111
endpoint: 0.0.0.0:4318
1212
processors:
1313
batch: {}
14-
resourcedetection:
15-
detectors: [k8s, env]
16-
timeout: 2s
17-
override: true
18-
attributes:
19-
actions:
20-
- key: k8s_pod_name
21-
from_attribute: k8s.pod.name
22-
action: upsert
23-
- key: k8s_namespace_name
24-
from_attribute: k8s.namespace.name
25-
action: upsert
26-
- key: k8s_deployment_name
27-
from_attribute: k8s.deployment.name
28-
action: upsert
29-
- key: k8s_statefulset_name
30-
from_attribute: k8s.statefulset.name
31-
action: upsert
32-
resource:
33-
attributes:
34-
- key: deployment.environment
35-
value: production
36-
action: insert
3714
exporters:
3815
prometheus:
3916
endpoint: 0.0.0.0:9464
4017
service:
4118
pipelines:
4219
metrics:
4320
receivers: [otlp]
44-
processors: [resourcedetection, attributes, batch, resource]
21+
processors: [batch]
4522
exporters: [prometheus]
4623

4724
ports:
@@ -54,13 +31,4 @@ ports:
5431
enabled: true
5532
containerPort: 9464
5633
servicePort: 9464
57-
protocol: TCP
58-
59-
serviceMonitor:
60-
enabled: true
61-
metricsEndpoints:
62-
- port: prom-metrics
63-
interval: 30s
64-
scrapeTimeout: 10s
65-
extraLabels:
66-
release: monitoring-stack
34+
protocol: TCP
