
Add Pipeline CRD for Redpanda Connect pipeline management #1337

Draft

david-yu wants to merge 53 commits into main from feat/connect-crd

Conversation

Contributor

@david-yu commented Mar 23, 2026

Summary

Introduces the Pipeline custom resource (shortName: rpcn) for managing Redpanda Connect pipelines via the Redpanda Operator. This enables declarative pipeline lifecycle management through Kubernetes CRDs, gated behind an enterprise license for RPCN.

What's included

CRD (Pipeline):

  • spec.configYaml — Redpanda Connect pipeline configuration in YAML
  • spec.replicas — number of pipeline replicas (default: 1)
  • spec.image — container image override (default: redpandadata/connect:4.87.0)
  • spec.paused — scales replicas to 0 when true
  • spec.resources — compute resource requirements
  • spec.env — additional environment variables
  • spec.secretRef — Kubernetes Secrets to inject as environment variables
  • spec.cluster — optional ClusterSource reference to a Redpanda cluster
  • spec.zones — availability zones for pod spreading
  • spec.annotations — pod-level annotations (e.g., for Datadog autodiscovery), merged with commonAnnotations
  • spec.tolerations / spec.nodeSelector / spec.topologySpreadConstraints — scheduling controls
  • spec.displayName / spec.description / spec.tags / spec.configFiles — Cloud migration-compatible metadata
  • Status tracks phase (Running/Provisioning/Stopped), ready replicas, and conditions
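Taken together, a minimal Pipeline manifest touching several of these fields might look like the following (a sketch; the exact shapes of `tags` and `resources` are assumptions, and the values are illustrative):

```yaml
apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: example
spec:
  displayName: "Example pipeline"   # Cloud migration-compatible metadata
  tags:                             # assumed key-value map shape
    team: data-eng
  replicas: 2
  paused: false                     # set true to scale replicas to 0
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
  configYaml: |
    input:
      generate:
        mapping: 'root = {"ok": true}'
    output:
      stdout: {}
```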

ClusterRef — Connect Pipelines to Redpanda Clusters:

When a Pipeline references a Redpanda cluster via spec.cluster.clusterRef, the operator automatically:

  1. Resolves the cluster's internal broker addresses, TLS configuration, and SASL credentials
  2. Injects environment variables into the Connect pod
  3. Mounts TLS CA certificates as a projected volume at /etc/tls/certs/ca/
  4. Watches referenced Redpanda CRs — when a cluster changes, all referencing Pipelines re-reconcile

This enables seamless connectivity to operator-managed Redpanda clusters using the redpanda input, redpanda output, redpanda_migrator input, and redpanda_migrator output.

| Environment Variable | When set | Description |
| --- | --- | --- |
| RPK_BROKERS | Always (with clusterRef) | Comma-separated internal broker addresses |
| RPK_TLS_ENABLED | Always (with clusterRef) | `true` or `false` |
| RPK_TLS_ROOT_CAS_FILE | TLS enabled | Path to mounted CA certificate |
| RPK_SASL_MECHANISM | Cluster has SASL enabled | Bootstrap user SASL mechanism |
| RPK_SASL_USER | Cluster has SASL enabled | Bootstrap user username |
| RPK_SASL_PASSWORD | Cluster has SASL enabled | Bootstrap user password (from Secret) |

Controller:

  • Reconciles Pipeline CRs using kube.Ctl and server-side apply (SSA) semantics
  • Uses kube.Syncer for child resource lifecycle management (ConfigMap, Deployment)
  • Resource rendering externalized to a dedicated render struct implementing kube.Renderer
  • Status conditions use SSA-compatible utils.StatusConditionConfigs helper — no swallowed errors
  • Validates enterprise license on every reconciliation using common-go/license v1
  • License must include the CONNECT product, allow enterprise features, and be unexpired
  • Owned resources (Deployment, ConfigMap) are garbage-collected on CR deletion via Syncer
  • Gated behind --enable-connect flag (default: false)
  • Watches referenced Redpanda clusters via field index — re-reconciles Pipelines when cluster changes
  • ClusterRef and ConfigValid conditions track resolution and lint status

Typed Status Conditions:

  • PipelinePhase is a typed enum: Pending, Provisioning, Running, Stopped, Unknown
  • Condition types: Ready, ConfigValid, ClusterRef
  • Condition reasons include: Running, Provisioning, Paused, LicenseInvalid, Failed, ConfigValid, ConfigInvalid, ClusterRefResolved, ClusterRefInvalid

Prometheus Monitoring (PodMonitor):

  • Controller creates a PodMonitor per Pipeline CR when connectController.monitoring.enabled is true
  • PodMonitor scrapes Redpanda Connect's /metrics endpoint on port 4195
  • Configurable scrape interval and custom labels via operator Helm values
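The rendered PodMonitor would look roughly like this (a sketch: the label selector, port name, and metadata naming are assumptions; the port number, path, and interval come from the description above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: demo-pipeline            # assumed: named after the Pipeline CR
  namespace: redpanda
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: demo-pipeline   # assumed pod label
  podMetricsEndpoints:
    - port: http                 # assumed port name for Connect's port 4195
      path: /metrics
      interval: 30s              # from connectController.monitoring.scrapeInterval
```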

Configuration Lint Validation:

  • Deployment includes a lint init container that runs /redpanda-connect lint before the main container starts
  • Controller checks init container status (including LastTerminationState) and surfaces lint errors via a ConfigValid condition
  • Uses shorter requeue interval (15s) during provisioning to detect failures quickly

Helm Chart Integration:

  • Pipeline RBAC policy includes permissions for pipelines, redpandas (for clusterRef), deployments, configmaps, pods, secrets, and podmonitors
  • RBAC gated by connectController.enabled in the chart values
  • Pipeline CRD added to the CRD installation subcommand

Reference implementation

Based on the pipeline controller in cloudv2/apps/redpanda-connect-api, adapted to operator patterns (SSA, kube.Ctl, kube.Syncer, render package, typed conditions, RBAC in helm chart).

CLAUDE.md

Added a "Creating a New CRD" section documenting the conventions to follow for future CRD additions.


Try it out

A pre-built operator image is available at yongshin/redpanda-operator:pipeline-crd (linux/arm64).

Step 1: Check out the branch and install CRDs

git clone https://github.com/redpanda-data/redpanda-operator.git
cd redpanda-operator
git checkout feat/connect-crd

# Install the CRDs
kubectl apply -f operator/config/crd/bases/

Step 2: Create a license Secret

kubectl create secret generic redpanda-license \
  --from-file=license=./redpanda.license \
  -n redpanda

Step 3: Deploy the operator with the pre-built image

helm install redpanda-operator ./operator/chart \
  --set image.repository=yongshin/redpanda-operator \
  --set image.tag=pipeline-crd \
  --set rbac.createRPKBundleCRs=false \
  --set connectController.enabled=true \
  --set enterprise.licenseSecretRef.name=redpanda-license \
  --set enterprise.licenseSecretRef.key=license \
  --create-namespace \
  -n redpanda

Step 4: Deploy a Connect pipeline

# pipeline.yaml
apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: demo-pipeline
  namespace: redpanda
spec:
  configYaml: |
    input:
      generate:
        mapping: 'root.message = "hello world"'
        interval: "5s"
    output:
      stdout: {}
  replicas: 1

kubectl apply -f pipeline.yaml
kubectl get rpcn -n redpanda
kubectl describe pipeline demo-pipeline -n redpanda

Step 5: Verify the pipeline is running

# Check status
kubectl get rpcn -n redpanda
# NAME            READY   PHASE     REPLICAS   AVAILABLE   AGE
# demo-pipeline   True    Running   1          1           30s

# Check pipeline logs
kubectl logs -n redpanda -l app.kubernetes.io/instance=demo-pipeline

Clean up

kubectl delete pipeline demo-pipeline -n redpanda
helm uninstall redpanda-operator -n redpanda
kubectl delete -f operator/config/crd/bases/

Usage guide

Prerequisites

  1. The operator must be started with --enable-connect (disabled by default).
  2. A valid Redpanda enterprise license that includes the CONNECT product, configured via enterprise.licenseSecretRef in the operator Helm chart values.

Configure the license and enable the Connect controller

helm install redpanda-operator redpanda/operator \
  --set connectController.enabled=true \
  --set enterprise.licenseSecretRef.name=redpanda-license \
  --set enterprise.licenseSecretRef.key=license

Create a Connect pipeline

apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: my-pipeline
  namespace: redpanda
spec:
  configYaml: |
    input:
      kafka:
        addresses: ["redpanda:9092"]
        topics: ["events"]
        consumer_group: "connect-pipeline"
    output:
      kafka:
        addresses: ["redpanda:9092"]
        topic: "processed-events"
  replicas: 1

Monitor the pipeline

kubectl get rpcn -n redpanda
kubectl describe pipeline my-pipeline -n redpanda
kubectl get pods -n redpanda -l app.kubernetes.io/instance=my-pipeline

Status phases:

| Phase | Meaning |
| --- | --- |
| Pending | Pipeline has been accepted but Deployment not yet created |
| Provisioning | Deployment is being created or pods are starting up |
| Running | All replicas are ready and processing |
| Stopped | Pipeline is paused (spec.paused: true) |

Pause / resume a pipeline

# Pause
kubectl patch pipeline my-pipeline -n redpanda \
  --type merge -p '{"spec":{"paused":true}}'

# Resume
kubectl patch pipeline my-pipeline -n redpanda \
  --type merge -p '{"spec":{"paused":false}}'

Spread pods across availability zones

spec:
  replicas: 3
  zones:
    - us-east-1a
    - us-east-1b
    - us-east-1c
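Per the zone handling described in this PR, the controller turns `spec.zones` into a node affinity (restricting pods to the listed zones) and a topology spread constraint (maxSkew=1, ScheduleAnyway) on `topology.kubernetes.io/zone`. A sketch of the resulting pod scheduling config (the label selector is an assumption):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: [us-east-1a, us-east-1b, us-east-1c]
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # prefer spreading, never block scheduling
    labelSelector:
      matchLabels:
        app.kubernetes.io/instance: my-pipeline   # assumed pod label
```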

Connect to a Redpanda cluster via clusterRef

Reference an operator-managed Redpanda cluster. The operator resolves broker addresses, TLS, and bootstrap SASL credentials automatically:

apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: my-pipeline
spec:
  cluster:
    clusterRef:
      name: my-redpanda-cluster
  configYaml: |
    input:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        topics: ["events"]
        consumer_group: "my-pipeline"
    output:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        topic: "processed-events"

Using dedicated user credentials (non-admin) with clusterRef

By default, clusterRef injects the cluster's bootstrap (admin) SASL credentials via RPK_SASL_* env vars. For least-privilege access, create a dedicated User CRD and store both its username and password in a Secret. Then reference that Secret via spec.secretRef or spec.env and configure the SASL mechanism directly in your pipeline config:

# Step 1: Create a Secret with the dedicated user's credentials
apiVersion: v1
kind: Secret
metadata:
  name: pipeline-user-credentials
stringData:
  username: pipeline-user
  password: my-secure-password
  sasl_mechanism: SCRAM-SHA-256
---
# Step 2: Create a dedicated user with specific ACLs
apiVersion: cluster.redpanda.com/v1alpha2
kind: User
metadata:
  name: pipeline-user
spec:
  cluster:
    clusterRef:
      name: my-cluster
  authentication:
    type: scram-sha-256
    password:
      valueFrom:
        secretKeyRef:
          name: pipeline-user-credentials
          key: password
  authorization:
    acls:
      - resource: {type: topic, name: "events"}
        operations: [read, describe]
      - resource: {type: topic, name: "processed-events"}
        operations: [write, describe, create]
      - resource: {type: group, name: "my-pipeline"}
        operations: [read]
---
# Step 3: Reference user credentials via spec.secretRef and configure SASL in configYaml
apiVersion: cluster.redpanda.com/v1alpha2
kind: Pipeline
metadata:
  name: my-pipeline
spec:
  cluster:
    clusterRef:
      name: my-cluster
  secretRef:
    - name: pipeline-user-credentials
  configYaml: |
    input:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        sasl:
          - mechanism: "${sasl_mechanism}"
            username: "${username}"
            password: "${password}"
        topics: ["events"]
        consumer_group: "my-pipeline"
    output:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        sasl:
          - mechanism: "${sasl_mechanism}"
            username: "${username}"
            password: "${password}"
        topic: "processed-events"

Note: When using spec.secretRef, all keys from the referenced Secret are injected as env vars. The Secret key names (username, password, sasl_mechanism) become env var names. The clusterRef still provides broker addresses and TLS — only the SASL credentials are overridden in the config.

Alternatively, use spec.env to map specific Secret keys to custom env var names:

spec:
  cluster:
    clusterRef:
      name: my-cluster
  env:
    - name: MY_SASL_USER
      valueFrom:
        secretKeyRef:
          name: pipeline-user-credentials
          key: username
    - name: MY_SASL_PASSWORD
      valueFrom:
        secretKeyRef:
          name: pipeline-user-credentials
          key: password
  configYaml: |
    input:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: "${RPK_TLS_ROOT_CAS_FILE}"
        sasl:
          - mechanism: SCRAM-SHA-256
            username: "${MY_SASL_USER}"
            password: "${MY_SASL_PASSWORD}"
        topics: ["events"]
    output:
      stdout: {}

Passing secrets to a Pipeline

Pipelines often need credentials (e.g., Kafka passwords, API keys). There are two approaches:

Option A: Reference an entire Secret (spec.secretRef)

All key-value pairs in each referenced Secret are injected as environment variables. The pipeline config can reference them using ${VAR_NAME} interpolation.

spec:
  secretRef:
    - name: my-pipeline-creds
  configYaml: |
    input:
      kafka:
        addresses: ["redpanda:9092"]
        topics: ["events"]
        password: "${KAFKA_PASSWORD}"
    output:
      aws_s3:
        bucket: my-bucket
        credentials:
          id: "${S3_ACCESS_KEY}"
          secret: "${S3_SECRET_KEY}"

Option B: Reference individual Secret keys (spec.env)

spec:
  env:
    - name: KAFKA_PASSWORD
      valueFrom:
        secretKeyRef:
          name: kafka-creds
          key: password
  configYaml: |
    input:
      kafka:
        addresses: ["redpanda:9092"]
        topics: ["events"]
        password: "${KAFKA_PASSWORD}"
    output:
      stdout: {}

| Approach | Best for |
| --- | --- |
| spec.secretRef | Injecting all keys from a Secret at once |
| spec.env with secretKeyRef | Cherry-picking individual keys or renaming them |

See also: Redpanda Connect secrets documentation

Common annotations for Gatekeeper compliance

commonAnnotations:
  owner: "platform-team@example.com"
  environment: "production"

Monitoring Pipeline metrics with Prometheus

helm install redpanda-operator redpanda/operator \
  --set connectController.enabled=true \
  --set connectController.monitoring.enabled=true \
  --set connectController.monitoring.scrapeInterval=30s \
  --set enterprise.licenseSecretRef.name=redpanda-license \
  --set enterprise.licenseSecretRef.key=license

Add Prometheus metrics to your pipeline config:

spec:
  configYaml: |
    input: ...
    output: ...
    metrics:
      prometheus:
        add_process_metrics: true
        add_go_metrics: true

Monitoring Pipeline metrics with Datadog

spec:
  annotations:
    ad.datadoghq.com/connect.checks: |
      {
        "openmetrics": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:4195/metrics",
              "namespace": "redpanda_connect",
              "metrics": [".*"]
            }
          ]
        }
      }

Configuration lint validation

# Check lint status
kubectl get pipeline my-pipeline -o jsonpath='{.status.conditions[?(@.type=="ConfigValid")]}' | jq .

# View raw lint output
kubectl logs deploy/my-pipeline -c lint

License validation and troubleshooting

| Condition reason | Message | Resolution |
| --- | --- | --- |
| LicenseInvalid | no license configured: set enterprise.licenseSecretRef... | Configure enterprise.licenseSecretRef in operator Helm values |
| LicenseInvalid | failed to read license | Ensure the license Secret exists and the key is correct |
| LicenseInvalid | license expired | Contact Redpanda for a renewal |
| LicenseInvalid | license does not allow enterprise features | An Enterprise or Trial license is required |
| LicenseInvalid | license does not include Redpanda Connect | The license lacks the CONNECT product |

Failure modes and recovery

The Pipeline controller creates a standard Kubernetes Deployment, so most failure recovery is handled by Kubernetes built-in controllers rather than the operator itself. The operator does not implement custom node-failure detection or pod rescheduling — it delegates that entirely to the Deployment abstraction and observes the resulting state on each reconciliation cycle.

Node failure (detailed walkthrough)

When a node running pipeline pods fails:

  1. Detection — the kubelet stops heartbeating. After node-monitor-grace-period (default 40s), the node controller marks the node as NotReady.
  2. Eviction — after pod-eviction-timeout (default 5 minutes), the node controller taints the node with node.kubernetes.io/unreachable:NoExecute. Pods without a matching toleration are evicted.
  3. Rescheduling — the ReplicaSet controller detects the pod count is below the desired replica count and schedules replacement pods on healthy nodes.
  4. Operator reconcile — on the next reconciliation (triggered by the Deployment status change), the operator updates the Pipeline status conditions and phase (e.g., Provisioning while new pods start, then Running once ready).

If spec.zones is configured, replacement pods respect the zone node affinity and topology spread constraint (ScheduleAnyway), so they will prefer spreading across zones but will not block scheduling if a zone is entirely unavailable. If spec.budget is configured, the PDB limits how many pods can be simultaneously evicted during voluntary disruptions (node drains), but does not affect involuntary evictions from node failures.

| Failure | Who recovers | What happens | Notes |
| --- | --- | --- | --- |
| Node failure | Kubernetes (ReplicaSet controller) | Node NotReady → eviction timeout → pods rescheduled on healthy nodes | Zone affinity and topology spread respected. PDB protects against voluntary disruptions only. |
| Pod crash / OOMKill | Kubernetes (kubelet) | Container restarts with exponential backoff (CrashLoopBackOff) | Readiness probe (/ready on port 4195) gates traffic. Operator updates status conditions on next reconcile. |
| Invalid pipeline config | Operator (lint init container) | lint init container exits non-zero → pod stays in Init:Error → operator sets ConfigValid=False condition with lint output | Controller uses a 15s requeue during provisioning to detect failures quickly. Fix the config and the next reconcile will succeed. |
| Invalid license | Operator | Reconciler short-circuits before creating any child resources → Ready=False with LicenseInvalid reason | No Deployment or ConfigMap is created until a valid license is present. |
| ClusterRef target unavailable | Operator | Sets ClusterRef=False condition with error details, skips Deployment reconciliation | Pipeline will not start until the referenced Redpanda cluster is resolvable. Watches on the Redpanda CR trigger re-reconciliation when the cluster becomes available. |
| Deployment rollout stall | Kubernetes (Deployment controller) | Recreate strategy kills all old pods before creating new ones; if new pods fail readiness, the rollout stalls | Operator reflects the Deployment status in Pipeline conditions. Manual intervention may be needed (fix config, check resources). |
| Pipeline paused (spec.paused: true) | Operator | Scales replicas to 0, sets Stopped phase | Resume by setting spec.paused: false. |

Key design decisions:

  • spec.budget (PodDisruptionBudget) — configurable via the CRD with maxUnavailable or minAvailable. Protects against voluntary disruptions (node drain, cluster autoscaler) but does not affect involuntary evictions. The PDB is rendered by the Syncer alongside the Deployment and ConfigMap, so it is automatically garbage-collected on CR deletion. CEL validation enforces exactly one of the two fields.
  • Recreate strategy — chosen over RollingUpdate to avoid running two pipeline instances concurrently that might double-process messages. This means updates cause brief downtime. Note that a PDB with maxUnavailable: 1 will slow down voluntary drains but does not conflict with the Recreate strategy (which is the operator's own update path, not a voluntary disruption).
  • No automatic retry of failed lint — the init container fails fast; the operator surfaces the error and waits for a config fix rather than retrying.
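A budget as described above might be declared like this (a sketch; per the CEL validation noted above, only one of the two fields may be set):

```yaml
spec:
  replicas: 3
  budget:
    maxUnavailable: 1   # alternatively minAvailable, but not both
```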

Test plan

  • go build ./... passes in operator/ and acceptance/
  • Unit tests for license validation (8 tests covering all license scenarios)
  • Render tests for ConfigMap, Deployment defaults, paused, secrets, zones, common annotations
  • PodMonitor render tests: disabled, enabled, common annotations, no scrape interval
  • Reconciler tests for no-license, invalid-license, and deletion paths (require envtest)
  • Helm chart rendering test cases for connectController.enabled, monitoring, and common annotations
  • Lint passes (task lint)
  • Acceptance tests: create/run, delete, update, stop, resume, lint validation, clusterRef produce, clusterRef consume

🤖 Generated with Claude Code

@david-yu david-yu changed the title Add Connect CRD for Redpanda Connect pipeline management Add Pipeline CRD for Redpanda Connect pipeline management Mar 24, 2026
@github-actions

This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Mar 30, 2026
@david-yu david-yu removed the stale label Mar 30, 2026
@david-yu david-yu marked this pull request as ready for review March 30, 2026 22:51
Contributor

@andrewstucki left a comment

Not sure if this is just a movement of connect pipeline reconcilers over from another repo, but would definitely want to change a chunk of the design around how this reconciliation works to be more in line with the patterns that this repo has before merging anything like this. Could we just add this in as part of a roadmap rather than trying to generate it? It shouldn't take more than a day or two to implement properly once we actually pull it in. But as is, there are a number of issues I see immediately with this PR that need changing:

  1. we try to use SSA semantics whenever possible, so the CreateOrPatch and Update calls are out-of-place.
  2. not a huge fan of swallowing the status Update errors on the reconcile calls, and it appears inconsistent -- sometimes it looks like we're returning the update error, sometimes swallowing it
  3. we generally try and externalize our sub-resource definitions to some sort of "render" package to avoid having to inline everything
  4. this should likely use the kube.Ctl synchronization primitives
  5. I'm assuming we'd probably want to run some of the secret stuff through cloud-secret materialization?
  6. would we want any of the configuration around Redpanda sources to somehow be pluggable with our clusterRef-style specification?
  7. this appears to not have created the RBAC policies in the proper place, as they need to be copied over to the helm chart itself
  8. the tests should actually test the reconciler, here they just do license validation
  9. I'd prefer to use some sort of enum/typed status information for the pipeline conditions, because what they are/do are basically undocumented right now
  10. at least one rendering test in the helm chart should test the enabling flag
  11. the CRD itself also needs to be added to the CRD installation process subcommand in order for this to ever work.
  12. for a new CRD type we should have at least one acceptance test that exercises the feature.

Contributor Author

@david-yu commented Mar 31, 2026

Moving back to draft mode. Thanks for taking a look.

@david-yu david-yu marked this pull request as draft March 31, 2026 03:37
@github-actions

github-actions bot commented Apr 6, 2026

This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Apr 6, 2026
@david-yu david-yu removed the stale label Apr 6, 2026
david-yu and others added 14 commits April 8, 2026 11:56
Introduces the Connect custom resource (shortName: rpcn) for managing
Redpanda Connect pipelines via the Redpanda Operator. Each Connect CR
declaratively specifies a pipeline configuration in YAML, and the
controller reconciles the desired state by managing a Deployment and
ConfigMap.

Enterprise license gating: the controller validates a Redpanda enterprise
license (v1 format from common-go/license) on every reconciliation. The
license must include the CONNECT product and be unexpired. The license is
read from a Kubernetes Secret referenced by spec.licenseSecretRef.

Key components:
- CRD types: Connect, ConnectSpec, ConnectStatus in v1alpha2
- Controller: creates/patches ConfigMap + Deployment, updates status
- RBAC: ClusterRole permissions for connects, deployments, configmaps, secrets
- CRD manifest: cluster.redpanda.com_connects.yaml
- Gated behind --enable-connect flag (default: false)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update generated files to match what CI's controller-gen v0.20.1
and code generators produce:

- Move Connect deepcopy functions to correct alphabetical position
  (after Configurator, before ConnectorMonitoring)
- Regenerate CRD YAML with full OpenAPI schema from controller-gen
- Update crd-docs.adoc with Connect type documentation
- Add Connect deprecation test case
- Update RBAC role.yaml to match controller-gen output
- Add missing common-go/license go.sum entries in acceptance/ and gen/
- Fix whitespace in run.go

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix TestCRDS by adding connects.cluster.redpanda.com to the expected
CRD list and adding a Connect() helper function.

Add Cloud-compatible fields to ConnectSpec for smooth migration to
Redpanda Cloud managed Connect:
- displayName: human-readable pipeline name
- description: pipeline description
- tags: key-value pairs for filtering/organization
- configFiles: additional config files mounted at /config

The controller now includes configFiles entries in the ConfigMap
alongside connect.yaml, with a guard against key collision.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add displayName, description, tags, and configFiles documentation
to the ConnectSpec section of the generated CRD docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add scheduling fields to ConnectSpec for spreading pipeline pods
across availability zones:

- zones: list of AZs to constrain and spread pods across. When set,
  the controller auto-generates a node affinity (restrict to listed
  zones) and a topology spread constraint (even distribution with
  maxSkew=1, ScheduleAnyway) using topology.kubernetes.io/zone.
- tolerations: standard k8s tolerations for tainted nodes
- nodeSelector: label-based node selection
- topologySpreadConstraints: explicit spread constraints that
  override the auto-generated zone constraint when provided

Example usage:
  spec:
    zones: ["us-east-1a", "us-east-1b", "us-east-1c"]
    replicas: 3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update connects CRD YAML with full TopologySpreadConstraint schema
instead of x-kubernetes-preserve-unknown-fields, expand toleration
descriptions, fix field ordering (nodeSelector before paused), and
update crd-docs.adoc descriptions to match Go struct comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Connect controller is now enabled by default (--enable-connect=true).
Users can disable it via the operator helm chart value:

  helm install redpanda-operator ... --set connectController.enabled=false

Individual Connect pipeline CRs still require an enterprise license
with the CONNECT product — enabling the controller alone does not
grant enterprise functionality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update README, template, schema, partial types, and golden files
to include the new connectController chart value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make spec.licenseSecretRef optional on Connect CRs. When not set, the
controller falls back to the operator-level enterprise license configured
via enterprise.licenseSecretRef in the operator Helm chart values.

This avoids requiring users to specify the license on every Connect
pipeline CR. The operator-level license is passed via --license-file-path
and mounted from the chart's enterprise.licenseSecretRef secret.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove spec.licenseSecretRef from Connect CRD entirely. License is
  now only configured at the operator level via enterprise.licenseSecretRef
  in the operator Helm chart values.
- Set connectController.enabled to false by default (opt-in).
- Simplify controller license validation to only read from the
  operator-level license file path.
- Add unit tests for license validation covering: no license configured,
  invalid file, expired license, open source license, V0 enterprise
  license with all products, V1 enterprise with/without CONNECT product,
  V1 trial license, and V1 expired enterprise license.
- Fix values.schema.json alphabetical ordering (connectController before crds).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
david-yu and others added 10 commits April 9, 2026 16:42
The connect container image has the binary at /redpanda-connect (root),
not in $PATH. Use the absolute path in the pod command to match the
image layout. Also bump the default image tag to 4.87.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	acceptance/go.sum
#	gen/go.sum
The merge from main introduced new RBAC rules (endpoints,
endpointslices, serviceexports, serviceimports) that were not
reflected in the golden test file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds spec.annotations to the Pipeline CRD, applied only to the pod
template (not ConfigMaps or Deployments). Per-pipeline annotations are
merged with commonAnnotations, with per-pipeline values taking
precedence. This enables Datadog autodiscovery and similar pod-level
integrations without polluting other resources.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a lint init container to Pipeline Deployments that runs
`redpanda-connect lint` before the main container starts. If the
pipeline config is invalid, the init container fails and the pod
never runs.

The controller now detects init container failures by listing pods
and checking their init container statuses, surfacing the result as
a new ConfigValid condition on the Pipeline status. This gives users
immediate feedback when their pipeline config has syntax errors.

New acceptance test scenarios:
- Delete a Pipeline (create → running → delete → verify gone)
- Update a Pipeline config (change configYaml → verify still running)
- Stop a Pipeline (set paused:true → verify stopped)
- Resume a stopped Pipeline (pause → unpause → verify running)
- Invalid config detection (bad configYaml → verify ConfigValid=False)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add detailed instructions for adding RBAC permissions: manually
updating itemized RBAC files, ensuring they're in the k8s.yml file
list, and regenerating golden files after RBAC changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Align the Pipeline controller with the conventions established by the
Console controller (PR #1113):

1. Finalizer key: Use shared `operator.redpanda.com/finalizer` instead
   of unique `pipeline.redpanda.com/finalizer`
2. Namespace filtering: Store namespace param and filter in Reconcile
   to respect operator namespace scoping
3. Owns() all types: Iterate Types() and register Owns() for each
   managed resource type (Deployment, ConfigMap, PodMonitor)
4. Unexported finalizerKey: Match Console's private constant style
5. Golden file tests: Add txtar-based golden file snapshot tests and
   deletion GC verification following Console's test pattern
6. PodMonitor CRD check: Skip PodMonitor watch if the CRD is not
   installed, matching Console's ServiceMonitor pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run task generate (which includes gci lint-fix) to ensure generated
files have the correct import ordering that CI expects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document the correct order for regeneration and linting: use
`task generate` (not `task k8s:generate`) as the final step before
committing, since it includes `lint-fix`/`gci` import ordering.
Explain the common mistake of running k8s:generate, which reverts
gci fixes on generated files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@david-yu
Contributor Author

Changes since last comment

Bug fixes

  • Acceptance test failures fixed (08f2eb06):
    • Reverted the upgrade-regressions starting version from v25.3.1 back to v25.1.3 (v25.3.1 has a FieldManager bug that writes *kube.Ctl instead of cluster.redpanda.com/operator)
    • Added the enterprise license secret to the shared operator install so the Pipeline controller passes license validation
    • Added the connect image to the k3d pre-loaded images to avoid ImagePullBackOff
  • Moved commonAnnotations from Pipeline CRD to operator values (823b89bc): Annotations are now sourced from the operator Helm chart commonAnnotations value rather than per-Pipeline spec, matching the Console CRD convention.
  • Removed unused writeLicenseFile function (344de623): Lint cleanup.
  • Regenerated files (44c057fb): Fixed lint and unit test failures from regeneration drift.
  • Aligned Pipeline controller with Console CRD conventions (5c14cdfc): Consistent patterns across controllers.
  • Fixed import ordering (5a47fcca, 574e722a): Matched gci linter expectations.
  • Regenerated chart template golden files after main merge (ed6a4e6c).
  • Bumped RPCN image to connect:4.87.0 and fixed binary path (28a27175).

New features

  • PodMonitor support for Pipeline monitoring (0623b781): Adds optional Prometheus PodMonitor creation for Pipeline pods, controlled by operator-level monitoring.enabled config.
  • spec.annotations for per-pipeline pod annotations (bad2c706): Allows users to set custom annotations on Pipeline pod templates.
  • Config lint validation (4a8c0add): Pipeline controller now validates configYaml using rpk connect lint at reconcile time. Added pipeline lifecycle acceptance tests (create, pause, resume, delete).

Documentation

  • Updated CLAUDE.md (cdf84208, 6ada3dcd, d60947f2): Added golden file workflows, gotohelm learnings, RBAC regeneration workflow, and local lint instructions.

Maintenance

  • Bumped default RPCN image from connect:4.86.0 to connect:4.87.0.
  • Merged main (6173abfb): Resolved conflicts with upstream.

david-yu and others added 10 commits April 9, 2026 23:06
… detection

- Change finalizer addition from Apply/SSA to Update to avoid taking
  ownership of spec fields, which caused SSA conflicts when users
  updated .spec.configYaml via kubectl apply --server-side
- Use shorter requeue interval (15s) during provisioning/pending phases
  so init-container lint failures are detected within seconds instead of
  waiting the full 5-minute requeue cycle
- Check LastTerminationState in addition to current State when inspecting
  the lint init container, catching failures between container restarts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
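The third fix above can be sketched as follows. This is a minimal illustration of the detection logic, not the controller's actual code: the types are simplified stand-ins for the `k8s.io/api/core/v1` container-status types, and the init container name `lint` plus the function name `lintFailure` are assumptions.

```go
package main

import "fmt"

// Simplified stand-ins for the corev1.ContainerStatus fields the
// controller inspects; the real code uses k8s.io/api/core/v1 types.
type terminated struct {
	ExitCode int32
	Message  string
}

type containerState struct {
	Terminated *terminated
}

type containerStatus struct {
	Name                 string
	State                containerState
	LastTerminationState containerState
}

// lintFailure reports whether the lint init container has failed.
// Checking LastTerminationState in addition to the current State
// catches failures between restarts, when State may already show
// the next attempt rather than the previous crash.
func lintFailure(statuses []containerStatus) (string, bool) {
	for _, s := range statuses {
		if s.Name != "lint" {
			continue
		}
		for _, st := range []containerState{s.State, s.LastTerminationState} {
			if st.Terminated != nil && st.Terminated.ExitCode != 0 {
				return st.Terminated.Message, true
			}
		}
	}
	return "", false
}

func main() {
	statuses := []containerStatus{{
		Name: "lint",
		LastTerminationState: containerState{
			Terminated: &terminated{ExitCode: 1, Message: "invalid config"},
		},
	}}
	msg, failed := lintFailure(statuses)
	fmt.Println(failed, msg) // true invalid config
}
```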
…nection

When a Pipeline references a Redpanda cluster via spec.cluster.clusterRef,
the operator resolves the cluster's connection details (brokers, TLS, SASL)
and injects them as environment variables and volume mounts into the Connect
pod. This allows pipeline configs to use ${RPK_BROKERS}, ${RPK_TLS_ENABLED},
${RPK_TLS_ROOT_CAS_FILE}, ${RPK_SASL_MECHANISM}, ${RPK_SASL_USER}, and
${RPK_SASL_PASSWORD} to connect to operator-managed Redpanda clusters.

Changes:
- Add cluster.go with resolution logic using ConvertV2ToRenderState
- Inject cluster env vars and TLS CA cert volume in render.go
- Watch referenced Redpanda clusters via field index for re-reconciliation
- Add ClusterRef condition to Pipeline status
- Add RBAC for reading Redpanda CRs
- Add acceptance tests for produce/consume via clusterRef

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
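A pipeline consuming from an operator-managed cluster via clusterRef might then look like this sketch (the apiVersion/group and the exact `redpanda` input field names are assumptions, not taken from this PR):

```yaml
apiVersion: cluster.redpanda.com/v1alpha2   # group/version assumed for illustration
kind: Pipeline
metadata:
  name: consumer
spec:
  cluster:
    clusterRef:
      name: my-redpanda
  configYaml: |
    input:
      redpanda:
        # ${RPK_*} placeholders are injected by the operator and
        # interpolated by Redpanda Connect at runtime.
        seed_brokers: ["${RPK_BROKERS}"]
        topics: ["events"]
        tls:
          enabled: ${RPK_TLS_ENABLED}
          root_cas_file: ${RPK_TLS_ROOT_CAS_FILE}
        sasl:
          - mechanism: ${RPK_SASL_MECHANISM}
            username: ${RPK_SASL_USER}
            password: ${RPK_SASL_PASSWORD}
    output:
      stdout: {}
```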
Add spec.credentials to Pipeline CRD allowing users to specify custom
SASL credentials (mechanism, username, passwordSecretRef) instead of
defaulting to the cluster's bootstrap admin user. When credentials is
set alongside a clusterRef, the explicit credentials take precedence.
This enables pairing a Pipeline with a dedicated User CRD for
least-privilege access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change credentials.password from corev1.SecretKeySelector to ValueSource,
matching the pattern used by all other CRDs in this operator. This adds
support for:
- Kubernetes Secrets (secretKeyRef)
- ConfigMaps (configMapKeyRef)
- Inline values (inline)
- External secret providers via the externalSecretRef field (AWS Secrets
  Manager, GCP Secret Manager, Azure Key Vault)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ials

When credentials come from spec.credentials (dedicated user), use
RPK_CREDENTIALS_SASL_MECHANISM, RPK_CREDENTIALS_SASL_USER, and
RPK_CREDENTIALS_SASL_PASSWORD. When credentials come from the cluster's
bootstrap admin user, use RPK_SASL_MECHANISM, RPK_SASL_USER, and
RPK_SASL_PASSWORD. This makes the credential source explicit in the
pipeline configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document how ClusterRef resolution works (ConvertV2ToRenderState +
AsStaticConfigSource pattern), how controllers watch referenced clusters
(multicluster vs single-cluster patterns), and the ValueSource type for
secrets including external secret provider support (AWS, GCP, Azure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…alignment

Add DeepCopyInto/DeepCopy for PipelineSASLCredentials and add Credentials
field handling to PipelineSpec.DeepCopyInto. Fix struct field alignment
in render.go lint init container.

Note: CRD YAML, CRD docs, and RBAC formatting still require
`nix develop -c task generate` to fully regenerate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run `task generate` to regenerate:
- CRD schema with credentials field (ValueSource-based password)
- CRD reference docs with PipelineSASLCredentials type
- RBAC ClusterRole with controller-gen formatting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document that CRD YAML, deepcopy, CRD docs, RBAC, and Helm templates
must always be regenerated via `nix develop -c task generate`, never
hand-edited or reconstructed from CI diffs. Also note the fallback nix
binary path at /nix/var/nix/profiles/default/bin/nix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the PipelineSASLCredentials type and spec.credentials field.
Users who need non-admin SASL credentials should use spec.secretRef
or spec.env to inject custom username/password env vars, and configure
the SASL mechanism directly in their pipeline configYaml.

This simplifies the CRD surface — clusterRef provides broker addresses,
TLS, and bootstrap SASL by default. Custom credentials are handled
through the existing secret injection mechanisms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
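Under this approach, a Pipeline paired with a dedicated user could inject its credentials via `spec.env` and reference them in configYaml, along the lines of the following sketch (Secret name, key names, and env var names are illustrative):

```yaml
spec:
  cluster:
    clusterRef:
      name: my-redpanda
  env:
    - name: PIPELINE_SASL_USER
      valueFrom:
        secretKeyRef:
          name: pipeline-user   # Secret paired with a dedicated User CRD
          key: username
    - name: PIPELINE_SASL_PASSWORD
      valueFrom:
        secretKeyRef:
          name: pipeline-user
          key: password
  configYaml: |
    input:
      redpanda:
        seed_brokers: ["${RPK_BROKERS}"]
        topics: ["events"]
        sasl:
          - mechanism: SCRAM-SHA-256
            username: ${PIPELINE_SASL_USER}
            password: ${PIPELINE_SASL_PASSWORD}
```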
@david-yu
Contributor Author

Changes since last comment

Bug fixes

  • Fixed Pipeline acceptance test failures (69503ffc):
    • Changed finalizer addition from Apply (SSA) to Update to avoid taking ownership of spec.configYaml, which caused SSA conflicts when users updated the field via kubectl apply --server-side
    • Shortened requeue interval from 5 minutes to 15 seconds during provisioning/pending phases so init-container lint failures are detected within seconds
    • Added LastTerminationState check for lint init container to catch failures between container restarts
  • Lint fixes (125bc112, 3488cbbe): Added generated DeepCopy for new types, fixed struct field alignment, regenerated CRD YAML/docs/RBAC via task generate

New features

  • ClusterRef support (3fea052c): When a Pipeline references a Redpanda cluster via spec.cluster.clusterRef, the operator:
    • Resolves broker addresses, TLS, and SASL credentials from the Redpanda CR and injects them as environment variables (RPK_BROKERS, RPK_TLS_ENABLED, RPK_TLS_ROOT_CAS_FILE, RPK_SASL_MECHANISM, RPK_SASL_USER, RPK_SASL_PASSWORD) and TLS CA cert volume mounts into the Connect pod
    • Uses the same ConvertV2ToRenderState + AsStaticConfigSource pattern as the Console controller
    • Watches referenced Redpanda CRs via field index so Pipelines re-reconcile when their cluster changes
    • Adds a ClusterRef condition to Pipeline status
  • Acceptance tests for clusterRef (3fea052c): Two new scenarios — Pipeline produces to Redpanda via clusterRef (generate → redpanda output, verifies messages arrive), and Pipeline reads from Redpanda via clusterRef (produces via kafka client, Pipeline consumes with redpanda input).

Refactoring

  • Removed spec.credentials field (6d801607): Removed the PipelineSASLCredentials type that was briefly added for dedicated user credentials. Instead, users who need non-admin SASL credentials should use spec.secretRef or spec.env to inject username/password from a Secret, and configure the SASL mechanism directly in their configYaml. This keeps the CRD surface simple — clusterRef provides broker addresses, TLS, and bootstrap SASL by default; custom credentials are handled through existing secret injection mechanisms. PR description updated with full examples showing both approaches.

Documentation

  • CLAUDE.md updates (015231d1, 8694efb8): Added ClusterRef resolution docs (how ConvertV2ToRenderState + AsStaticConfigSource works, multicluster vs single-cluster watch patterns), ValueSource/external secret provider docs, and a rule to never hand-edit generated files (always use nix develop -c task generate).

…ifests

The PatchManifest function in acceptance tests expands ${KEY} patterns
as test template variables. Pipeline configYaml contains ${RPK_BROKERS},
${RPK_TLS_ENABLED}, etc. which are Redpanda Connect runtime env var
interpolations resolved inside the container, not test framework vars.

Pass through any ${RPK_*} pattern without expansion so these reach
Kubernetes as literal text for Connect to interpolate at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
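The passthrough idea can be sketched as a small substitution function. This is an illustration of the rule, not the real PatchManifest: the names `expand` and `varPattern` are hypothetical, and unlike the actual test helper, unknown keys are left in place here rather than handled by the framework.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// varPattern matches ${KEY} placeholders in a test manifest.
var varPattern = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// expand substitutes ${KEY} placeholders from vars, but passes any
// ${RPK_*} placeholder through untouched so it reaches Kubernetes as
// literal text for Redpanda Connect to interpolate at runtime.
func expand(manifest string, vars map[string]string) string {
	return varPattern.ReplaceAllStringFunc(manifest, func(m string) string {
		key := varPattern.FindStringSubmatch(m)[1]
		if strings.HasPrefix(key, "RPK_") {
			return m // runtime env var, not a test template variable
		}
		if v, ok := vars[key]; ok {
			return v
		}
		return m // unknown keys are left alone in this sketch
	})
}

func main() {
	in := `seed_brokers: ["${RPK_BROKERS}"] # topic: ${TOPIC}`
	fmt.Println(expand(in, map[string]string{"TOPIC": "pipeline-produce-test"}))
	// seed_brokers: ["${RPK_BROKERS}"] # topic: pipeline-produce-test
}
```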
The lint init container runs /redpanda-connect lint which needs env vars
like RPK_BROKERS, RPK_TLS_ENABLED, RPK_TLS_ROOT_CAS_FILE to resolve
${...} interpolations in the pipeline config. Without the env vars, the
linter sees literal strings where it expects typed values (e.g.,
"${RPK_TLS_ENABLED}" instead of a boolean), causing lint to fail and
the pod to CrashLoopBackOff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
david-yu and others added 2 commits April 10, 2026 19:09
The Pipeline_produces_to_Redpanda_via_clusterRef acceptance test failed
consistently because Redpanda defaults auto_create_topics_enabled to
false. The producer pipeline could not auto-create the target topic, so
no messages were ever delivered.

- Pre-create pipeline-produce-test topic before running the producer
  pipeline, matching the pattern used by the consumer scenario
- Remove misleading "Found topic" logs from ExpectTopic/ExpectNoTopic
  that printed unconditionally even when the topic was not found
- Increase checkTopic timeout from 10s to 30s for CI stability
- Handle NotFound/Conflict errors during finalizer removal to avoid
  noisy UID precondition errors when pipelines are deleted concurrently

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add spec.budget field to Pipeline with maxUnavailable/minAvailable
options, following the convention used by Strimzi and Prometheus
Operator. The PDB is rendered by the Syncer alongside the Deployment
and ConfigMap, so it is automatically garbage-collected on CR deletion.

CRD validation enforces exactly one of maxUnavailable or minAvailable
via CEL rule. RBAC updated for policy/poddisruptionbudgets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
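As a sketch, a Pipeline spec fragment using the new field might look like this (field shape inferred from the description above):

```yaml
spec:
  replicas: 3
  budget:
    # Exactly one of maxUnavailable / minAvailable may be set (CEL-enforced).
    maxUnavailable: 1
```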
@david-yu
Contributor Author

Changes since last comment

New feature

  • PodDisruptionBudget support (5ca73507):
    • Added spec.budget field to the Pipeline CRD with maxUnavailable or minAvailable options, following the convention used by Strimzi and Prometheus Operator
    • The PDB is rendered by the Syncer alongside the Deployment and ConfigMap, so it is automatically garbage-collected on CR deletion
    • CEL validation enforces that exactly one of the two fields must be set
    • RBAC markers added and pipeline controller added to the controller-gen RBAC generation loop in taskfiles/k8s.yml
    • All generated files (CRD YAML, DeepCopy, RBAC, chart golden files, CRD docs) regenerated via nix develop -c task k8s:generate
    • Three new render tests: PDB not configured, maxUnavailable (int), minAvailable (percentage)

Bug fixes

  • Lint init container missing env vars (026d0735): The lint init container was not receiving the env vars (including RPK_* cluster connection vars from clusterRef), causing lint to fail when the pipeline config referenced environment variable interpolations like ${RPK_BROKERS}.
  • Acceptance test env var passthrough (adee691b): RPK_* environment variable interpolations in acceptance test manifests were being treated as Go template expressions and replaced with empty strings. Added passthrough for RPK_-prefixed vars.
  • Pre-create topic in produce test (d46f37b9): The produce acceptance test was racing against topic auto-creation. Now explicitly creates the topic before starting the pipeline. Also added a nil guard for status.ReadyReplicas in the controller to avoid panics on freshly created Deployments.

PR description updates

  • Added "Failure modes and recovery" section with detailed node failure walkthrough (detection → eviction → rescheduling → operator reconcile), failure mode table, and key design decisions
  • Updated PDB entry from "not supported" to documenting the new spec.budget field and its interaction with the Recreate deployment strategy
