
Add Pipeline CRD for Redpanda Connect pipeline management #1337

Draft

david-yu wants to merge 53 commits into main from feat/connect-crd

Conversation

Contributor

@david-yu commented Mar 23, 2026

Summary

Introduces the Pipeline custom resource (shortName: rpcn) for managing Redpanda Connect pipelines via the Redpanda Operator. This enables declarative pipeline lifecycle management through Kubernetes CRDs, gated behind an enterprise license for RPCN.

What's included

CRD (Pipeline):

  • spec.configYaml — Redpanda Connect pipeline configuration in YAML
  • spec.replicas — number of pipeline replicas (default: 1)
  • spec.image — container image override (default: redpandadata/connect:4.87.0)
  • spec.paused — scales replicas to 0 when true
  • spec.resources — compute resource requirements
  • spec.env — additional environment variables
  • spec.secretRef — Kubernetes Secrets to inject as environment variables
  • spec.cluster — optional ClusterSource reference to a Redpanda cluster
  • spec.zones — availability zones for pod spreading
  • spec.annotations — pod-level annotations (e.g., for Datadog autodiscovery), merged with commonAnnotations
  • spec.tolerations / spec.nodeSelector / spec.topologySpreadConstraints — scheduling controls
  • spec.displayName / spec.description / spec.tags / spec.configFiles — Cloud migration-compatible metadata
  • Status tracks phase (Running/Provisioning/Stopped), ready replicas, and conditions
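Taken together, a minimal Pipeline manifest touching several of these fields might look like the following (a sketch; the exact shapes of `tags` and `resources` are assumptions, and the values are illustrative):

```yaml
apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: example
spec:
  displayName: "Example pipeline"   # Cloud migration-compatible metadata
  tags:                             # assumed key-value map shape
    team: data-eng
  replicas: 2
  paused: false                     # set true to scale replicas to 0
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
  configYaml: |
    input:
      generate:
        mapping: 'root = {"ok": true}'
    output:
      stdout: {}
```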

ClusterRef — Connect Pipelines to Redpanda Clusters:

When a Pipeline references a Redpanda cluster via spec.cluster.clusterRef, the operator automatically:

  1. Resolves the cluster's internal broker addresses, TLS configuration, and SASL credentials
  2. Injects environment variables into the Connect pod
  3. Mounts TLS CA certificates as a projected volume at /etc/tls/certs/ca/
  4. Watches referenced Redpanda CRs — when a cluster changes, all referencing Pipelines re-reconcile

This enables seamless connectivity to operator-managed Redpanda clusters using the redpanda input, redpanda output, redpanda_migrator input, and redpanda_migrator output.

| Environment Variable | When set | Description |
| --- | --- | --- |
| RPK_BROKERS | Always (with clusterRef) | Comma-separated internal broker addresses |
| RPK_TLS_ENABLED | Always (with clusterRef) | `true` or `false` |
| RPK_TLS_ROOT_CAS_FILE | TLS enabled | Path to mounted CA certificate |
| RPK_SASL_MECHANISM | Cluster has SASL enabled | Bootstrap user SASL mechanism |
| RPK_SASL_USER | Cluster has SASL enabled | Bootstrap user username |
| RPK_SASL_PASSWORD | Cluster has SASL enabled | Bootstrap user password (from Secret) |

Controller:

  • Reconciles Pipeline CRs using kube.Ctl and server-side apply (SSA) semantics
  • Uses kube.Syncer for child resource lifecycle management (ConfigMap, Deployment)
  • Resource rendering externalized to a dedicated render struct implementing kube.Renderer
  • Status conditions use SSA-compatible utils.StatusConditionConfigs helper — no swallowed errors
  • Validates enterprise license on every reconciliation using common-go/license v1
  • License must include the CONNECT product, allow enterprise features, and be unexpired
  • Owned resources (Deployment, ConfigMap) are garbage-collected on CR deletion via Syncer
  • Gated behind --enable-connect flag (default: false)
  • Watches referenced Redpanda clusters via field index — re-reconciles Pipelines when cluster changes
  • ClusterRef and ConfigValid conditions track resolution and lint status

Typed Status Conditions:

  • PipelinePhase is a typed enum: Pending, Provisioning, Running, Stopped, Unknown
  • Condition types: Ready, ConfigValid, ClusterRef
  • Condition reasons include: Running, Provisioning, Paused, LicenseInvalid, Failed, ConfigValid, ConfigInvalid, ClusterRefResolved, ClusterRefInvalid

Prometheus Monitoring (PodMonitor):

  • Controller creates a PodMonitor per Pipeline CR when connectController.monitoring.enabled is true
  • PodMonitor scrapes Redpanda Connect's /metrics endpoint on port 4195
  • Configurable scrape interval and custom labels via operator Helm values
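The rendered PodMonitor would look roughly like this (a sketch: the label selector, port name, and metadata naming are assumptions; the port number, path, and interval come from the description above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: demo-pipeline            # assumed: named after the Pipeline CR
  namespace: redpanda
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: demo-pipeline   # assumed pod label
  podMetricsEndpoints:
    - port: http                 # assumed port name for Connect's port 4195
      path: /metrics
      interval: 30s              # from connectController.monitoring.scrapeInterval
```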

Configuration Lint Validation:

  • Deployment includes a lint init container that runs /redpanda-connect lint before the main container starts
  • Controller checks init container status (including LastTerminationState) and surfaces lint errors via a ConfigValid condition
  • Uses shorter requeue interval (15s) during provisioning to detect failures quickly

Helm Chart Integration:

  • Pipeline RBAC policy includes permissions for pipelines, redpandas (for clusterRef), deployments, configmaps, pods, secrets, and podmonitors
  • RBAC gated by connectController.enabled in the chart values
  • Pipeline CRD added to the CRD installation subcommand

Reference implementation

Based on the pipeline controller in cloudv2/apps/redpanda-connect-api, adapted to operator patterns (SSA, kube.Ctl, kube.Syncer, render package, typed conditions, RBAC in helm chart).

CLAUDE.md

Added a "Creating a New CRD" section documenting the conventions to follow for future CRD additions.


Try it out

A pre-built operator image is available at yongshin/redpanda-operator:pipeline-crd (linux/arm64).

Step 1: Check out the branch and install CRDs

git clone https://github.com/redpanda-data/redpanda-operator.git
cd redpanda-operator
git checkout feat/connect-crd

# Install the CRDs
kubectl apply -f operator/config/crd/bases/

Step 2: Create a license Secret

kubectl create secret generic redpanda-license \
  --from-file=license=./redpanda.license \
  -n redpanda

Step 3: Deploy the operator with the pre-built image

helm install redpanda-operator ./operator/chart \
  --set image.repository=yongshin/redpanda-operator \
  --set image.tag=pipeline-crd \
  --set rbac.createRPKBundleCRs=false \
  --set connectController.enabled=true \
  --set enterprise.licenseSecretRef.name=redpanda-license \
  --set enterprise.licenseSecretRef.key=license \
  --create-namespace \
  -n redpanda

Step 4: Deploy a Connect pipeline

# pipeline.yaml
apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: demo-pipeline
  namespace: redpanda
spec:
  configYaml: |
    input:
      generate:
        mapping: 'root.message = "hello world"'
        interval: "5s"
    output:
      stdout: {}
  replicas: 1

kubectl apply -f pipeline.yaml
kubectl get rpcn -n redpanda
kubectl describe pipeline demo-pipeline -n redpanda

Step 5: Verify the pipeline is running

# Check status
kubectl get rpcn -n redpanda
# NAME            READY   PHASE     REPLICAS   AVAILABLE   AGE
# demo-pipeline   True    Running   1          1           30s

# Check pipeline logs
kubectl logs -n redpanda -l app.kubernetes.io/instance=demo-pipeline

Clean up

kubectl delete pipeline demo-pipeline -n redpanda
helm uninstall redpanda-operator -n redpanda
kubectl delete -f operator/config/crd/bases/

Usage guide

Prerequisites

  1. The operator must be started with --enable-connect (disabled by default).
  2. A valid Redpanda enterprise license that includes the CONNECT product, configured via enterprise.licenseSecretRef in the operator Helm chart values.

Configure the license and enable the Connect controller

helm install redpanda-operator redpanda/operator \
  --set connectController.enabled=true \
  --set enterprise.licenseSecretRef.name=redpanda-license \
  --set enterprise.licenseSecretRef.key=license

Create a Connect pipeline

apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: my-pipeline
  namespace: redpanda
spec:
  configYaml: |
    input:
      kafka:
        addresses: ["redpanda:9092"]
        topics: ["events"]
        consumer_group: "connect-pipeline"
    output:
      kafka:
        addresses: ["redpanda:9092"]
        topic: "processed-events"
  replicas: 1

Monitor the pipeline

kubectl get rpcn -n redpanda
kubectl describe pipeline my-pipeline -n redpanda
kubectl get pods -n redpanda -l app.kubernetes.io/instance=my-pipeline

Status phases:

| Phase | Meaning |
| --- | --- |
| Pending | Pipeline has been accepted but Deployment not yet created |
| Provisioning | Deployment is being created or pods are starting up |
| Running | All replicas are ready and processing |
| Stopped | Pipeline is paused (spec.paused: true) |

Pause / resume a pipeline

# Pause
kubectl patch pipeline my-pipeline -n redpanda \
  --type merge -p '{"spec":{"paused":true}}'

# Resume
kubectl patch pipeline my-pipeline -n redpanda \
  --type merge -p '{"spec":{"paused":false}}'

Spread pods across availability zones

spec:
  replicas: 3
  zones:
    - us-east-1a
    - us-east-1b
    - us-east-1c
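Per the zone handling described in this PR, the controller turns `spec.zones` into a node affinity (restricting pods to the listed zones) and a topology spread constraint (maxSkew=1, ScheduleAnyway) on `topology.kubernetes.io/zone`. A sketch of the resulting pod scheduling config (the label selector is an assumption):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: [us-east-1a, us-east-1b, us-east-1c]
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # prefer spreading, never block scheduling
    labelSelector:
      matchLabels:
        app.kubernetes.io/instance: my-pipeline   # assumed pod label
```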

Connect to a Redpanda cluster via clusterRef

Reference an operator-managed Redpanda cluster. The operator resolves broker addresses, TLS, and bootstrap SASL credentials automatically:

apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: my-pipeline
spec:
  cluster:
    clusterRef:
      name: my-redpanda-cluster
  configYaml: |
    input:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        topics: ["events"]
        consumer_group: "my-pipeline"
    output:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        topic: "processed-events"

Using dedicated user credentials (non-admin) with clusterRef

By default, clusterRef injects the cluster's bootstrap (admin) SASL credentials via RPK_SASL_* env vars. For least-privilege access, create a dedicated User CRD and store both its username and password in a Secret. Then reference that Secret via spec.secretRef or spec.env and configure the SASL mechanism directly in your pipeline config:

# Step 1: Create a Secret with the dedicated user's credentials
apiVersion: v1
kind: Secret
metadata:
  name: pipeline-user-credentials
stringData:
  username: pipeline-user
  password: my-secure-password
  sasl_mechanism: SCRAM-SHA-256
---
# Step 2: Create a dedicated user with specific ACLs
apiVersion: cluster.redpanda.com/v1alpha2
kind: User
metadata:
  name: pipeline-user
spec:
  cluster:
    clusterRef:
      name: my-cluster
  authentication:
    type: scram-sha-256
    password:
      valueFrom:
        secretKeyRef:
          name: pipeline-user-credentials
          key: password
  authorization:
    acls:
      - resource: {type: topic, name: "events"}
        operations: [read, describe]
      - resource: {type: topic, name: "processed-events"}
        operations: [write, describe, create]
      - resource: {type: group, name: "my-pipeline"}
        operations: [read]
---
# Step 3: Reference user credentials via spec.secretRef and configure SASL in configYaml
apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: my-pipeline
spec:
  cluster:
    clusterRef:
      name: my-cluster
  secretRef:
    - name: pipeline-user-credentials
  configYaml: |
    input:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        sasl:
          - mechanism: "${sasl_mechanism}"
            username: "${username}"
            password: "${password}"
        topics: ["events"]
        consumer_group: "my-pipeline"
    output:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        sasl:
          - mechanism: "${sasl_mechanism}"
            username: "${username}"
            password: "${password}"
        topic: "processed-events"

Note: When using spec.secretRef, all keys from the referenced Secret are injected as env vars. The Secret key names (username, password, sasl_mechanism) become env var names. The clusterRef still provides broker addresses and TLS — only the SASL credentials are overridden in the config.

Alternatively, use spec.env to map specific Secret keys to custom env var names:

spec:
  cluster:
    clusterRef:
      name: my-cluster
  env:
    - name: MY_SASL_USER
      valueFrom:
        secretKeyRef:
          name: pipeline-user-credentials
          key: username
    - name: MY_SASL_PASSWORD
      valueFrom:
        secretKeyRef:
          name: pipeline-user-credentials
          key: password
  configYaml: |
    input:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        sasl:
          - mechanism: SCRAM-SHA-256
            username: "${MY_SASL_USER}"
            password: "${MY_SASL_PASSWORD}"
        topics: ["events"]
    output:
      stdout: {}

Passing secrets to a Pipeline

Pipelines often need credentials (e.g., Kafka passwords, API keys). There are two approaches:

Option A: Reference an entire Secret (spec.secretRef)

All key-value pairs in each referenced Secret are injected as environment variables. The pipeline config can reference them using ${VAR_NAME} interpolation.

spec:
  secretRef:
    - name: my-pipeline-creds
  configYaml: |
    input:
      kafka:
        addresses: ["redpanda:9092"]
        topics: ["events"]
        password: "${KAFKA_PASSWORD}"
    output:
      aws_s3:
        bucket: my-bucket
        credentials:
          id: "${S3_ACCESS_KEY}"
          secret: "${S3_SECRET_KEY}"

Option B: Reference individual Secret keys (spec.env)

spec:
  env:
    - name: KAFKA_PASSWORD
      valueFrom:
        secretKeyRef:
          name: kafka-creds
          key: password
  configYaml: |
    input:
      kafka:
        addresses: ["redpanda:9092"]
        topics: ["events"]
        password: "${KAFKA_PASSWORD}"
    output:
      stdout: {}

| Approach | Best for |
| --- | --- |
| spec.secretRef | Injecting all keys from a Secret at once |
| spec.env with secretKeyRef | Cherry-picking individual keys or renaming them |

See also: Redpanda Connect secrets documentation

Common annotations for Gatekeeper compliance

commonAnnotations:
  owner: "platform-team@example.com"
  environment: "production"

Monitoring Pipeline metrics with Prometheus

helm install redpanda-operator redpanda/operator \
  --set connectController.enabled=true \
  --set connectController.monitoring.enabled=true \
  --set connectController.monitoring.scrapeInterval=30s \
  --set enterprise.licenseSecretRef.name=redpanda-license \
  --set enterprise.licenseSecretRef.key=license

Add Prometheus metrics to your pipeline config:

spec:
  configYaml: |
    input: ...
    output: ...
    metrics:
      prometheus:
        add_process_metrics: true
        add_go_metrics: true

Monitoring Pipeline metrics with Datadog

spec:
  annotations:
    ad.datadoghq.com/connect.checks: |
      {
        "openmetrics": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:4195/metrics",
              "namespace": "redpanda_connect",
              "metrics": [".*"]
            }
          ]
        }
      }

Configuration lint validation

# Check lint status
kubectl get pipeline my-pipeline -o jsonpath='{.status.conditions[?(@.type=="ConfigValid")]}' | jq .

# View raw lint output
kubectl logs deploy/my-pipeline -c lint

License validation and troubleshooting

| Condition reason | Message | Resolution |
| --- | --- | --- |
| LicenseInvalid | no license configured: set enterprise.licenseSecretRef... | Configure enterprise.licenseSecretRef in operator Helm values |
| LicenseInvalid | failed to read license | Ensure the license Secret exists and the key is correct |
| LicenseInvalid | license expired | Contact Redpanda for a renewal |
| LicenseInvalid | license does not allow enterprise features | An Enterprise or Trial license is required |
| LicenseInvalid | license does not include Redpanda Connect | The license lacks the CONNECT product |

Failure modes and recovery

The Pipeline controller creates a standard Kubernetes Deployment, so most failure recovery is handled by Kubernetes built-in controllers rather than the operator itself. The operator does not implement custom node-failure detection or pod rescheduling — it delegates that entirely to the Deployment abstraction and observes the resulting state on each reconciliation cycle.

Node failure (detailed walkthrough)

When a node running pipeline pods fails:

  1. Detection — the kubelet stops heartbeating. After node-monitor-grace-period (default 40s), the node controller marks the node as NotReady.
  2. Eviction — after pod-eviction-timeout (default 5 minutes), the node controller taints the node with node.kubernetes.io/unreachable:NoExecute. Pods without a matching toleration are evicted.
  3. Rescheduling — the ReplicaSet controller detects the pod count is below the desired replica count and schedules replacement pods on healthy nodes.
  4. Operator reconcile — on the next reconciliation (triggered by the Deployment status change), the operator updates the Pipeline status conditions and phase (e.g., Provisioning while new pods start, then Running once ready).

If spec.zones is configured, replacement pods respect the zone node affinity and topology spread constraint (ScheduleAnyway), so they will prefer spreading across zones but will not block scheduling if a zone is entirely unavailable. If spec.budget is configured, the PDB limits how many pods can be simultaneously evicted during voluntary disruptions (node drains), but does not affect involuntary evictions from node failures.

| Failure | Who recovers | What happens | Notes |
| --- | --- | --- | --- |
| Node failure | Kubernetes (ReplicaSet controller) | Node NotReady → eviction timeout → pods rescheduled on healthy nodes | Zone affinity and topology spread respected. PDB protects against voluntary disruptions only. |
| Pod crash / OOMKill | Kubernetes (kubelet) | Container restarts with exponential backoff (CrashLoopBackOff) | Readiness probe (/ready on port 4195) gates traffic. Operator updates status conditions on next reconcile. |
| Invalid pipeline config | Operator (lint init container) | lint init container exits non-zero → pod stays in Init:Error → operator sets ConfigValid=False condition with lint output | Controller uses a 15s requeue during provisioning to detect failures quickly. Fix the config and the next reconcile will succeed. |
| Invalid license | Operator | Reconciler short-circuits before creating any child resources → Ready=False with LicenseInvalid reason | No Deployment or ConfigMap is created until a valid license is present. |
| ClusterRef target unavailable | Operator | Sets ClusterRef=False condition with error details, skips Deployment reconciliation | Pipeline will not start until the referenced Redpanda cluster is resolvable. Watches on the Redpanda CR trigger re-reconciliation when the cluster becomes available. |
| Deployment rollout stall | Kubernetes (Deployment controller) | Recreate strategy kills all old pods before creating new ones; if new pods fail readiness, the rollout stalls | Operator reflects the Deployment status in Pipeline conditions. Manual intervention may be needed (fix config, check resources). |
| Pipeline paused (spec.paused: true) | Operator | Scales replicas to 0, sets Stopped phase | Resume by setting spec.paused: false. |

Key design decisions:

  • spec.budget (PodDisruptionBudget) — configurable via the CRD with maxUnavailable or minAvailable. Protects against voluntary disruptions (node drain, cluster autoscaler) but does not affect involuntary evictions. The PDB is rendered by the Syncer alongside the Deployment and ConfigMap, so it is automatically garbage-collected on CR deletion. CEL validation enforces exactly one of the two fields.
  • Recreate strategy — chosen over RollingUpdate to avoid running two pipeline instances concurrently that might double-process messages. This means updates cause brief downtime. Note that a PDB with maxUnavailable: 1 will slow down voluntary drains but does not conflict with the Recreate strategy (which is the operator's own update path, not a voluntary disruption).
  • No automatic retry of failed lint — the init container fails fast; the operator surfaces the error and waits for a config fix rather than retrying.
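A budget as described above might be declared like this (a sketch; per the CEL validation noted above, only one of the two fields may be set):

```yaml
spec:
  replicas: 3
  budget:
    maxUnavailable: 1   # alternatively minAvailable, but not both
```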

Test plan

  • go build ./... passes in operator/ and acceptance/
  • Unit tests for license validation (8 tests covering all license scenarios)
  • Render tests for ConfigMap, Deployment defaults, paused, secrets, zones, common annotations
  • PodMonitor render tests: disabled, enabled, common annotations, no scrape interval
  • Reconciler tests for no-license, invalid-license, and deletion paths (require envtest)
  • Helm chart rendering test cases for connectController.enabled, monitoring, and common annotations
  • Lint passes (task lint)
  • Acceptance tests: create/run, delete, update, stop, resume, lint validation, clusterRef produce, clusterRef consume

🤖 Generated with Claude Code

@david-yu david-yu changed the title Add Connect CRD for Redpanda Connect pipeline management Add Pipeline CRD for Redpanda Connect pipeline management Mar 24, 2026
@github-actions

This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Mar 30, 2026
@david-yu david-yu removed the stale label Mar 30, 2026
@david-yu david-yu marked this pull request as ready for review March 30, 2026 22:51
Contributor

@andrewstucki left a comment

Not sure if this is just a movement of connect pipeline reconcilers over from another repo, but would definitely want to change a chunk of the design around how this reconciliation works to be more in line with the patterns that this repo has before merging anything like this. Could we just add this in as part of a roadmap rather than trying to generate it? It shouldn't take more than a day or two to implement properly once we actually pull it in. But as is, there are a number of issues I see immediately with this PR that need changing:

  1. we try to use SSA semantics whenever possible, so the CreateOrPatch and Update calls are out-of-place.
  2. not a huge fan of swallowing the status Update errors on the reconcile calls, and it appears inconsistent -- sometimes it looks like we're returning the update error, sometimes swallowing it
  3. we generally try and externalize our sub-resource definitions to some sort of "render" package to avoid having to inline everything
  4. this should likely use the kube.Ctl synchronization primitives
  5. I'm assuming we'd probably want to run some of the secret stuff through cloud-secret materialization?
  6. would we want any of the configuration around Redpanda sources to somehow be pluggable with our clusterRef-style specification?
  7. this appears to not have created the RBAC policies in the proper place, as they need to be copied over to the helm chart itself
  8. the tests should actually test the reconciler, here they just do license validation
  9. I'd prefer to use some sort of enum/typed status information for the pipeline conditions, because what they are/do are basically undocumented right now
  10. at least one rendering test in the helm chart should test the enabling flag
  11. the CRD itself also needs to be added to the CRD installation process subcommand in order for this to ever work.
  12. for a new CRD type we should have at least one acceptance test that exercises the feature.

Contributor Author

@david-yu commented Mar 31, 2026

Moving back to draft mode. Thanks for taking a look.

@david-yu david-yu marked this pull request as draft March 31, 2026 03:37
@github-actions

github-actions bot commented Apr 6, 2026

This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Apr 6, 2026
@david-yu david-yu removed the stale label Apr 6, 2026
david-yu and others added 14 commits April 8, 2026 11:56
Introduces the Connect custom resource (shortName: rpcn) for managing
Redpanda Connect pipelines via the Redpanda Operator. Each Connect CR
declaratively specifies a pipeline configuration in YAML, and the
controller reconciles the desired state by managing a Deployment and
ConfigMap.

Enterprise license gating: the controller validates a Redpanda enterprise
license (v1 format from common-go/license) on every reconciliation. The
license must include the CONNECT product and be unexpired. The license is
read from a Kubernetes Secret referenced by spec.licenseSecretRef.

Key components:
- CRD types: Connect, ConnectSpec, ConnectStatus in v1alpha2
- Controller: creates/patches ConfigMap + Deployment, updates status
- RBAC: ClusterRole permissions for connects, deployments, configmaps, secrets
- CRD manifest: cluster.redpanda.com_connects.yaml
- Gated behind --enable-connect flag (default: false)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update generated files to match what CI's controller-gen v0.20.1
and code generators produce:

- Move Connect deepcopy functions to correct alphabetical position
  (after Configurator, before ConnectorMonitoring)
- Regenerate CRD YAML with full OpenAPI schema from controller-gen
- Update crd-docs.adoc with Connect type documentation
- Add Connect deprecation test case
- Update RBAC role.yaml to match controller-gen output
- Add missing common-go/license go.sum entries in acceptance/ and gen/
- Fix whitespace in run.go

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix TestCRDS by adding connects.cluster.redpanda.com to the expected
CRD list and adding a Connect() helper function.

Add Cloud-compatible fields to ConnectSpec for smooth migration to
Redpanda Cloud managed Connect:
- displayName: human-readable pipeline name
- description: pipeline description
- tags: key-value pairs for filtering/organization
- configFiles: additional config files mounted at /config

The controller now includes configFiles entries in the ConfigMap
alongside connect.yaml, with a guard against key collision.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add displayName, description, tags, and configFiles documentation
to the ConnectSpec section of the generated CRD docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add scheduling fields to ConnectSpec for spreading pipeline pods
across availability zones:

- zones: list of AZs to constrain and spread pods across. When set,
  the controller auto-generates a node affinity (restrict to listed
  zones) and a topology spread constraint (even distribution with
  maxSkew=1, ScheduleAnyway) using topology.kubernetes.io/zone.
- tolerations: standard k8s tolerations for tainted nodes
- nodeSelector: label-based node selection
- topologySpreadConstraints: explicit spread constraints that
  override the auto-generated zone constraint when provided

Example usage:
  spec:
    zones: ["us-east-1a", "us-east-1b", "us-east-1c"]
    replicas: 3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update connects CRD YAML with full TopologySpreadConstraint schema
instead of x-kubernetes-preserve-unknown-fields, expand toleration
descriptions, fix field ordering (nodeSelector before paused), and
update crd-docs.adoc descriptions to match Go struct comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Connect controller is now enabled by default (--enable-connect=true).
Users can disable it via the operator helm chart value:

  helm install redpanda-operator ... --set connectController.enabled=false

Individual Connect pipeline CRs still require an enterprise license
with the CONNECT product — enabling the controller alone does not
grant enterprise functionality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update README, template, schema, partial types, and golden files
to include the new connectController chart value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make spec.licenseSecretRef optional on Connect CRs. When not set, the
controller falls back to the operator-level enterprise license configured
via enterprise.licenseSecretRef in the operator Helm chart values.

This avoids requiring users to specify the license on every Connect
pipeline CR. The operator-level license is passed via --license-file-path
and mounted from the chart's enterprise.licenseSecretRef secret.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove spec.licenseSecretRef from Connect CRD entirely. License is
  now only configured at the operator level via enterprise.licenseSecretRef
  in the operator Helm chart values.
- Set connectController.enabled to false by default (opt-in).
- Simplify controller license validation to only read from the
  operator-level license file path.
- Add unit tests for license validation covering: no license configured,
  invalid file, expired license, open source license, V0 enterprise
  license with all products, V1 enterprise with/without CONNECT product,
  V1 trial license, and V1 expired enterprise license.
- Fix values.schema.json alphabetical ordering (connectController before crds).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
david-yu and others added 10 commits April 9, 2026 16:42
The connect container image has the binary at /redpanda-connect (root),
not in $PATH. Use the absolute path in the pod command to match the
image layout. Also bump the default image tag to 4.87.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	acceptance/go.sum
#	gen/go.sum
The merge from main introduced new RBAC rules (endpoints,
endpointslices, serviceexports, serviceimports) that were not
reflected in the golden test file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds spec.annotations to the Pipeline CRD, applied only to the pod
template (not ConfigMaps or Deployments). Per-pipeline annotations are
merged with commonAnnotations, with per-pipeline values taking
precedence. This enables Datadog autodiscovery and similar pod-level
integrations without polluting other resources.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a lint init container to Pipeline Deployments that runs
`redpanda-connect lint` before the main container starts. If the
pipeline config is invalid, the init container fails and the pod
never runs.

The controller now detects init container failures by listing pods
and checking their init container statuses, surfacing the result as
a new ConfigValid condition on the Pipeline status. This gives users
immediate feedback when their pipeline config has syntax errors.

New acceptance test scenarios:
- Delete a Pipeline (create → running → delete → verify gone)
- Update a Pipeline config (change configYaml → verify still running)
- Stop a Pipeline (set paused:true → verify stopped)
- Resume a stopped Pipeline (pause → unpause → verify running)
- Invalid config detection (bad configYaml → verify ConfigValid=False)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add detailed instructions for adding RBAC permissions: manually
updating itemized RBAC files, ensuring they're in the k8s.yml file
list, and regenerating golden files after RBAC changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Align the Pipeline controller with the conventions established by the
Console controller (PR #1113):

1. Finalizer key: Use shared `operator.redpanda.com/finalizer` instead
   of unique `pipeline.redpanda.com/finalizer`
2. Namespace filtering: Store namespace param and filter in Reconcile
   to respect operator namespace scoping
3. Owns() all types: Iterate Types() and register Owns() for each
   managed resource type (Deployment, ConfigMap, PodMonitor)
4. Unexported finalizerKey: Match Console's private constant style
5. Golden file tests: Add txtar-based golden file snapshot tests and
   deletion GC verification following Console's test pattern
6. PodMonitor CRD check: Skip PodMonitor watch if the CRD is not
   installed, matching Console's ServiceMonitor pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run task generate (which includes gci lint-fix) to ensure generated
files have the correct import ordering that CI expects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document the correct order for regeneration and linting: use
`task generate` (not `task k8s:generate`) as the final step before
committing, since it includes `lint-fix`/`gci` import ordering.
Explain the common mistake of running k8s:generate, which reverts
gci fixes on generated files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@david-yu
Contributor Author

Changes since last comment

Bug fixes

  • Acceptance test failures fixed (08f2eb06):
    • Reverted the upgrade-regressions starting version from v25.3.1 back to v25.1.3 (v25.3.1 has a FieldManager bug that writes *kube.Ctl instead of cluster.redpanda.com/operator)
    • Added the enterprise license secret to the shared operator install so the Pipeline controller passes license validation
    • Added the connect image to the k3d pre-loaded images to avoid ImagePullBackOff
  • Moved commonAnnotations from Pipeline CRD to operator values (823b89bc): Annotations are now sourced from the operator Helm chart commonAnnotations value rather than per-Pipeline spec, matching the Console CRD convention.
  • Removed unused writeLicenseFile function (344de623): Lint cleanup.
  • Regenerated files (44c057fb): Fixed lint and unit test failures from regeneration drift.
  • Aligned Pipeline controller with Console CRD conventions (5c14cdfc): Consistent patterns across controllers.
  • Fixed import ordering (5a47fcca, 574e722a): Matched gci linter expectations.
  • Regenerated chart template golden files after main merge (ed6a4e6c).
  • Bumped RPCN image to connect:4.87.0 and fixed binary path (28a27175).

New features

  • PodMonitor support for Pipeline monitoring (0623b781): Adds optional Prometheus PodMonitor creation for Pipeline pods, controlled by operator-level monitoring.enabled config.
  • spec.annotations for per-pipeline pod annotations (bad2c706): Allows users to set custom annotations on Pipeline pod templates.
  • Config lint validation (4a8c0add): Pipeline controller now validates configYaml using rpk connect lint at reconcile time. Added pipeline lifecycle acceptance tests (create, pause, resume, delete).

Documentation

  • Updated CLAUDE.md (cdf84208, 6ada3dcd, d60947f2): Added golden file workflows, gotohelm learnings, RBAC regeneration workflow, and local lint instructions.

Maintenance

  • Bumped default RPCN image from connect:4.86.0 to connect:4.87.0.
  • Merged main (6173abfb): Resolved conflicts with upstream.

david-yu and others added 10 commits April 9, 2026 23:06
… detection

- Change finalizer addition from Apply/SSA to Update to avoid taking
  ownership of spec fields, which caused SSA conflicts when users
  updated .spec.configYaml via kubectl apply --server-side
- Use shorter requeue interval (15s) during provisioning/pending phases
  so init-container lint failures are detected within seconds instead of
  waiting the full 5-minute requeue cycle
- Check LastTerminationState in addition to current State when inspecting
  the lint init container, catching failures between container restarts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
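The third fix above can be sketched as follows. This is a minimal illustration of the detection logic, not the controller's actual code: the types are simplified stand-ins for the `k8s.io/api/core/v1` container-status types, and the init container name `lint` plus the function name `lintFailure` are assumptions.

```go
package main

import "fmt"

// Simplified stand-ins for the corev1.ContainerStatus fields the
// controller inspects; the real code uses k8s.io/api/core/v1 types.
type terminated struct {
	ExitCode int32
	Message  string
}

type containerState struct {
	Terminated *terminated
}

type containerStatus struct {
	Name                 string
	State                containerState
	LastTerminationState containerState
}

// lintFailure reports whether the lint init container has failed.
// Checking LastTerminationState in addition to the current State
// catches failures between restarts, when State may already show
// the next attempt rather than the previous crash.
func lintFailure(statuses []containerStatus) (string, bool) {
	for _, s := range statuses {
		if s.Name != "lint" {
			continue
		}
		for _, st := range []containerState{s.State, s.LastTerminationState} {
			if st.Terminated != nil && st.Terminated.ExitCode != 0 {
				return st.Terminated.Message, true
			}
		}
	}
	return "", false
}

func main() {
	statuses := []containerStatus{{
		Name: "lint",
		LastTerminationState: containerState{
			Terminated: &terminated{ExitCode: 1, Message: "invalid config"},
		},
	}}
	msg, failed := lintFailure(statuses)
	fmt.Println(failed, msg) // true invalid config
}
```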
…nection

When a Pipeline references a Redpanda cluster via spec.cluster.clusterRef,
the operator resolves the cluster's connection details (brokers, TLS, SASL)
and injects them as environment variables and volume mounts into the Connect
pod. This allows pipeline configs to use ${RPK_BROKERS}, ${RPK_TLS_ENABLED},
${RPK_TLS_ROOT_CAS_FILE}, ${RPK_SASL_MECHANISM}, ${RPK_SASL_USER}, and
${RPK_SASL_PASSWORD} to connect to operator-managed Redpanda clusters.

Changes:
- Add cluster.go with resolution logic using ConvertV2ToRenderState
- Inject cluster env vars and TLS CA cert volume in render.go
- Watch referenced Redpanda clusters via field index for re-reconciliation
- Add ClusterRef condition to Pipeline status
- Add RBAC for reading Redpanda CRs
- Add acceptance tests for produce/consume via clusterRef

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
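A pipeline consuming from an operator-managed cluster via clusterRef might then look like this sketch (the apiVersion/group and the exact `redpanda` input field names are assumptions, not taken from this PR):

```yaml
apiVersion: cluster.redpanda.com/v1alpha2   # group/version assumed for illustration
kind: Pipeline
metadata:
  name: consumer
spec:
  cluster:
    clusterRef:
      name: my-redpanda
  configYaml: |
    input:
      redpanda:
        # ${RPK_*} placeholders are injected by the operator and
        # interpolated by Redpanda Connect at runtime.
        seed_brokers: ["${RPK_BROKERS}"]
        topics: ["events"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: ${RPK_TLS_ROOT_CAS_FILE}
        sasl:
          - mechanism: ${RPK_SASL_MECHANISM}
            username: ${RPK_SASL_USER}
            password: ${RPK_SASL_PASSWORD}
    output:
      stdout: {}
```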
Add spec.credentials to Pipeline CRD allowing users to specify custom
SASL credentials (mechanism, username, passwordSecretRef) instead of
defaulting to the cluster's bootstrap admin user. When credentials is
set alongside a clusterRef, the explicit credentials take precedence.
This enables pairing a Pipeline with a dedicated User CRD for
least-privilege access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change credentials.password from corev1.SecretKeySelector to ValueSource,
matching the pattern used by all other CRDs in this operator. This adds
support for:
- Kubernetes Secrets (secretKeyRef)
- ConfigMaps (configMapKeyRef)
- Inline values (inline)
- External secret providers via the externalSecretRef field (AWS Secrets
  Manager, GCP Secret Manager, Azure Key Vault)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ials

When credentials come from spec.credentials (dedicated user), use
RPK_CREDENTIALS_SASL_MECHANISM, RPK_CREDENTIALS_SASL_USER, and
RPK_CREDENTIALS_SASL_PASSWORD. When credentials come from the cluster's
bootstrap admin user, use RPK_SASL_MECHANISM, RPK_SASL_USER, and
RPK_SASL_PASSWORD. This makes the credential source explicit in the
pipeline configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document how ClusterRef resolution works (ConvertV2ToRenderState +
AsStaticConfigSource pattern), how controllers watch referenced clusters
(multicluster vs single-cluster patterns), and the ValueSource type for
secrets including external secret provider support (AWS, GCP, Azure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…alignment

Add DeepCopyInto/DeepCopy for PipelineSASLCredentials and add Credentials
field handling to PipelineSpec.DeepCopyInto. Fix struct field alignment
in render.go lint init container.

Note: CRD YAML, CRD docs, and RBAC formatting still require
`nix develop -c task generate` to fully regenerate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run `task generate` to regenerate:
- CRD schema with credentials field (ValueSource-based password)
- CRD reference docs with PipelineSASLCredentials type
- RBAC ClusterRole with controller-gen formatting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document that CRD YAML, deepcopy, CRD docs, RBAC, and Helm templates
must always be regenerated via `nix develop -c task generate`, never
hand-edited or reconstructed from CI diffs. Also note the fallback nix
binary path at /nix/var/nix/profiles/default/bin/nix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the PipelineSASLCredentials type and spec.credentials field.
Users who need non-admin SASL credentials should use spec.secretRef
or spec.env to inject custom username/password env vars, and configure
the SASL mechanism directly in their pipeline configYaml.

This simplifies the CRD surface — clusterRef provides broker addresses,
TLS, and bootstrap SASL by default. Custom credentials are handled
through the existing secret injection mechanisms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
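Under this approach, a Pipeline paired with a dedicated user could inject its credentials via `spec.env` and reference them in configYaml, along the lines of the following sketch (Secret name, key names, and env var names are illustrative):

```yaml
spec:
  cluster:
    clusterRef:
      name: my-redpanda
  env:
    - name: PIPELINE_SASL_USER
      valueFrom:
        secretKeyRef:
          name: pipeline-user   # Secret paired with a dedicated User CRD
          key: username
    - name: PIPELINE_SASL_PASSWORD
      valueFrom:
        secretKeyRef:
          name: pipeline-user
          key: password
  configYaml: |
    input:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        topics: ["events"]
        sasl:
          - mechanism: SCRAM-SHA-256
            username: ${PIPELINE_SASL_USER}
            password: ${PIPELINE_SASL_PASSWORD}
```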
@david-yu
Contributor Author

Changes since last comment

Bug fixes

  • Fixed Pipeline acceptance test failures (69503ffc):
    • Changed finalizer addition from Apply (SSA) to Update to avoid taking ownership of spec.configYaml, which caused SSA conflicts when users updated the field via kubectl apply --server-side
    • Shortened requeue interval from 5 minutes to 15 seconds during provisioning/pending phases so init-container lint failures are detected within seconds
    • Added LastTerminationState check for lint init container to catch failures between container restarts
  • Lint fixes (125bc112, 3488cbbe): Added generated DeepCopy for new types, fixed struct field alignment, regenerated CRD YAML/docs/RBAC via task generate

New features

  • ClusterRef support (3fea052c): When a Pipeline references a Redpanda cluster via spec.cluster.clusterRef, the operator:
    • Resolves broker addresses, TLS, and SASL credentials from the Redpanda CR and injects them as environment variables (RPK_BROKERS, RPK_TLS_ENABLED, RPK_TLS_ROOT_CAS_FILE, RPK_SASL_MECHANISM, RPK_SASL_USER, RPK_SASL_PASSWORD) and TLS CA cert volume mounts into the Connect pod
    • Uses the same ConvertV2ToRenderState + AsStaticConfigSource pattern as the Console controller
    • Watches referenced Redpanda CRs via field index so Pipelines re-reconcile when their cluster changes
    • Adds a ClusterRef condition to Pipeline status
  • Acceptance tests for clusterRef (3fea052c): Two new scenarios — Pipeline produces to Redpanda via clusterRef (generate → redpanda output, verifies messages arrive), and Pipeline reads from Redpanda via clusterRef (produces via kafka client, Pipeline consumes with redpanda input).

Refactoring

  • Removed spec.credentials field (6d801607): Removed the PipelineSASLCredentials type that was briefly added for dedicated user credentials. Instead, users who need non-admin SASL credentials should use spec.secretRef or spec.env to inject username/password from a Secret, and configure the SASL mechanism directly in their configYaml. This keeps the CRD surface simple — clusterRef provides broker addresses, TLS, and bootstrap SASL by default; custom credentials are handled through existing secret injection mechanisms. PR description updated with full examples showing both approaches.

Documentation

  • CLAUDE.md updates (015231d1, 8694efb8): Added ClusterRef resolution docs (how ConvertV2ToRenderState + AsStaticConfigSource works, multicluster vs single-cluster watch patterns), ValueSource/external secret provider docs, and a rule to never hand-edit generated files (always use nix develop -c task generate).

…ifests

The PatchManifest function in acceptance tests expands ${KEY} patterns
as test template variables. Pipeline configYaml contains ${RPK_BROKERS},
${RPK_TLS_ENABLED}, etc. which are Redpanda Connect runtime env var
interpolations resolved inside the container, not test framework vars.

Pass through any ${RPK_*} pattern without expansion so these reach
Kubernetes as literal text for Connect to interpolate at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
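The passthrough idea can be sketched as a small substitution function. This is an illustration of the rule, not the real PatchManifest: the names `expand` and `varPattern` are hypothetical, and unlike the actual test helper, unknown keys are left in place here rather than handled by the framework.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// varPattern matches ${KEY} placeholders in a test manifest.
var varPattern = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// expand substitutes ${KEY} placeholders from vars, but passes any
// ${RPK_*} placeholder through untouched so it reaches Kubernetes as
// literal text for Redpanda Connect to interpolate at runtime.
func expand(manifest string, vars map[string]string) string {
	return varPattern.ReplaceAllStringFunc(manifest, func(m string) string {
		key := varPattern.FindStringSubmatch(m)[1]
		if strings.HasPrefix(key, "RPK_") {
			return m // runtime env var, not a test template variable
		}
		if v, ok := vars[key]; ok {
			return v
		}
		return m // unknown keys are left alone in this sketch
	})
}

func main() {
	in := `seed_brokers: ["${RPK_BROKERS}"] # topic: ${TOPIC}`
	fmt.Println(expand(in, map[string]string{"TOPIC": "pipeline-produce-test"}))
	// seed_brokers: ["${RPK_BROKERS}"] # topic: pipeline-produce-test
}
```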
The lint init container runs /redpanda-connect lint which needs env vars
like RPK_BROKERS, RPK_TLS_ENABLED, RPK_TLS_ROOT_CAS_FILE to resolve
${...} interpolations in the pipeline config. Without the env vars, the
linter sees literal strings where it expects typed values (e.g.,
"${RPK_TLS_ENABLED}" instead of a boolean), causing lint to fail and
the pod to CrashLoopBackOff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
david-yu and others added 2 commits April 10, 2026 19:09
The Pipeline_produces_to_Redpanda_via_clusterRef acceptance test failed
consistently because Redpanda defaults auto_create_topics_enabled to
false. The producer pipeline could not auto-create the target topic, so
no messages were ever delivered.

- Pre-create pipeline-produce-test topic before running the producer
  pipeline, matching the pattern used by the consumer scenario
- Remove misleading "Found topic" logs from ExpectTopic/ExpectNoTopic
  that printed unconditionally even when the topic was not found
- Increase checkTopic timeout from 10s to 30s for CI stability
- Handle NotFound/Conflict errors during finalizer removal to avoid
  noisy UID precondition errors when pipelines are deleted concurrently

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add spec.budget field to Pipeline with maxUnavailable/minAvailable
options, following the convention used by Strimzi and Prometheus
Operator. The PDB is rendered by the Syncer alongside the Deployment
and ConfigMap, so it is automatically garbage-collected on CR deletion.

CRD validation enforces exactly one of maxUnavailable or minAvailable
via CEL rule. RBAC updated for policy/poddisruptionbudgets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
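As a sketch, a Pipeline spec fragment using the new field might look like this (field shape inferred from the description above):

```yaml
spec:
  replicas: 3
  budget:
    # Exactly one of maxUnavailable / minAvailable may be set (CEL-enforced).
    maxUnavailable: 1
```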
@david-yu
Contributor Author

Changes since last comment

New feature

  • PodDisruptionBudget support (5ca73507):
    • Added spec.budget field to the Pipeline CRD with maxUnavailable or minAvailable options, following the convention used by Strimzi and Prometheus Operator
    • The PDB is rendered by the Syncer alongside the Deployment and ConfigMap, so it is automatically garbage-collected on CR deletion
    • CEL validation enforces that exactly one of the two fields must be set
    • RBAC markers added and pipeline controller added to the controller-gen RBAC generation loop in taskfiles/k8s.yml
    • All generated files (CRD YAML, DeepCopy, RBAC, chart golden files, CRD docs) regenerated via nix develop -c task k8s:generate
    • Three new render tests: PDB not configured, maxUnavailable (int), minAvailable (percentage)

Bug fixes

  • Lint init container missing env vars (026d0735): The lint init container was not receiving the env vars (including RPK_* cluster connection vars from clusterRef), causing lint to fail when the pipeline config referenced environment variable interpolations like ${RPK_BROKERS}.
  • Acceptance test env var passthrough (adee691b): RPK_* environment variable interpolations in acceptance test manifests were being treated as Go template expressions and replaced with empty strings. Added passthrough for RPK_-prefixed vars.
  • Pre-create topic in produce test (d46f37b9): The produce acceptance test was racing against topic auto-creation. Now explicitly creates the topic before starting the pipeline. Also added a nil guard for status.ReadyReplicas in the controller to avoid panics on freshly created Deployments.

PR description updates

  • Added "Failure modes and recovery" section with detailed node failure walkthrough (detection → eviction → rescheduling → operator reconcile), failure mode table, and key design decisions
  • Updated PDB entry from "not supported" to documenting the new spec.budget field and its interaction with the Recreate deployment strategy
