Migrate mini-runtime to Strimzi-managed Kafka #215

Open
gauravakto wants to merge 19 commits into master from feat/kafka-cluster

gauravakto commented May 6, 2026

Summary

  • Replaces embedded Confluent CP Kafka sidecar with Strimzi-managed Kafka (KRaft mode, no Zookeeper)
  • Two modes: Strimzi (default, kafkaCluster.enabled: true) and External (mini_runtime.useExternalKafka: true)
  • Strimzi operator is installed as a Helm subchart dependency (v0.51.0), so no manual pre-install steps are needed
  • Full feature parity: SASL/SCRAM-SHA-256/512, TLS, port overrides, service annotations, topic configuration
  • A wait-for-kafka init container on the runtime and threat-client deployments prevents a connection race at startup
  • Cluster name: akto-kafka; bootstrap DNS auto-configured from cluster name and namespace
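
As a rough illustration of the two modes listed above, the values overrides might look like the sketch below. The keys `kafkaCluster.enabled` and `mini_runtime.useExternalKafka` come from this PR. Disabling `kafkaCluster.enabled` in external mode is an assumption, inferred from the conditional order described in the commits. The bootstrap address in the comment follows Strimzi's `<cluster>-kafka-bootstrap` Service naming, with an assumed listener port of 9092.

```yaml
# Mode 1: Strimzi-managed Kafka (the chart default in this PR).
# With cluster name akto-kafka, Strimzi exposes a bootstrap Service named
# akto-kafka-kafka-bootstrap, so the in-cluster address would be roughly
# akto-kafka-kafka-bootstrap.<namespace>.svc:9092 (port is an assumption).
kafkaCluster:
  enabled: true
---
# Mode 2: external Kafka.
# Assumption: kafkaCluster.enabled must be switched off so that the
# external-Kafka branch of the template conditional is taken.
kafkaCluster:
  enabled: false
mini_runtime:
  useExternalKafka: true
```

Each YAML document would go in its own values file and be passed to Helm with `-f`; the chart's actual values.yaml remains the authority on key names and nesting.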

What changed

  • charts/mini-runtime/templates/deployment.yaml: removed kafka1 sidecar + JAAS init container; added Kafka wait init container; updated broker URL and SASL/TLS env vars
  • charts/mini-runtime/templates/kafka-cluster.yaml: new file with Strimzi custom resources (Kafka, KafkaNodePool, KafkaTopic, KafkaUser)
  • charts/mini-runtime/templates/mini-runtime.yaml: deleted (Kafka Service now created by Strimzi)
  • charts/mini-runtime/templates/sasl-secret.yaml: deleted (replaced by the KafkaUser resource)
  • charts/mini-runtime/Chart.yaml: added strimzi-kafka-operator subchart dependency
  • charts/mini-runtime/values.yaml: replaced kafka1 block with kafkaCluster block
  • charts/mini-runtime/CLAUDE.md: full documentation
  • .github/workflows/test.yaml: CI workflow (lint + 44 unit tests, triggers only on mini-runtime changes)
  • charts/mini-runtime/tests/: helm-unittest test suite covering SASL, TLS, port override, service annotations, external Kafka, deployment/pod annotations
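
For orientation, the subchart dependency added to Chart.yaml could look roughly like this. The chart name, version, and the `kafkaCluster.enabled` condition are taken from this PR; the repository URL is an assumption (the public Strimzi Helm repository), not something the PR states.

```yaml
# charts/mini-runtime/Chart.yaml (excerpt, illustrative sketch)
dependencies:
  - name: strimzi-kafka-operator
    version: 0.51.0
    # Assumption: the public Strimzi chart repository; the PR may use another source.
    repository: https://strimzi.io/charts/
    # The operator subchart is only installed when Strimzi mode is on.
    condition: kafkaCluster.enabled
```

Tying the condition to `kafkaCluster.enabled` means the operator and its CRDs are skipped entirely in external Kafka mode.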

Test plan

  • helm lint passes for all value combinations (default, SASL, TLS, SASL+TLS, external Kafka, port override)
  • 44 helm-unittest tests pass (helm unittest charts/mini-runtime)
  • Live install verified: Strimzi operator deploys, Kafka cluster comes up, runtime and threat-client connect successfully

🤖 Generated with Claude Code

gauravakto and others added 19 commits February 24, 2026 10:47
Replace the embedded Confluent CP Kafka sidecar with Strimzi-managed
Kafka CRDs. The chart now deploys KafkaNodePool, Kafka, KafkaTopic,
and KafkaUser resources directly when kafkaCluster.enabled=true, and
installs the Strimzi operator via a pre-install hook Job.

Key changes:
- Remove kafka1 sidecar container, JAAS init container, and Kafka Service
- Add kafkaCluster.* values block (replaces kafka1.* and ports.*)
- Add templates/kafka-cluster.yaml for Strimzi CRDs
- Add strimzi-install-job.yaml + strimzi-install-rbac.yaml pre-install hook
- Update broker URL and SASL/TLS env vars in runtime and threat client
  to use three-way conditional: kafkaCluster → externalKafka → manual
- Forward Kafka Service annotations via spec.kafka.template in Kafka CR
- Update kafka-persistent.yaml: replication factor 2→3, deleteClaim
  false→true, add akto.daemonset.producer.heartbeats topic
- Add CLAUDE.md documenting all Kafka modes and new values

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…stall

- Add strimzi-kafka-operator 0.51.0 as a chart dependency (condition:
  kafkaCluster.strimziInstall.enabled) so CRDs are installed by Helm
  before Strimzi CRs are created — fixes install ordering issue
- Remove strimzi-install-job.yaml and strimzi-install-rbac.yaml
  (pre-install hook approach no longer needed)
- Update Strimzi CR apiVersion from kafka.strimzi.io/v1beta2 to
  kafka.strimzi.io/v1 to fix deprecation warnings
- Remove unused busybox image value

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Match the old embedded kafka1 sidecar resource footprint (1.5 CPU / 4Gi)
by defaulting to 1 broker replica with tight limits. Users can scale to
HA by setting broker.replicas: 3 and bumping replication factors to 3.

- broker: 1 replica, 1.5 CPU / 4Gi (matches old sidecar)
- controller: 1 replica, 500m CPU / 1Gi
- topic partitions/replicas default to 1 (increase with broker count)
- replication factors default to 1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
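
The single-broker default described in the commit above might render to a KafkaNodePool along these lines. This is a sketch under assumptions: the pool name `broker` and the storage size are hypothetical, and the commit does not say whether 1.5 CPU / 4Gi is set as requests, limits, or both.

```yaml
apiVersion: kafka.strimzi.io/v1
kind: KafkaNodePool
metadata:
  name: broker                      # hypothetical pool name
  labels:
    strimzi.io/cluster: akto-kafka  # binds the pool to the Kafka cluster
spec:
  replicas: 1                       # raise to 3 for HA, along with replication factors
  roles:
    - broker
  storage:
    type: persistent-claim
    size: 10Gi                      # hypothetical size
    deleteClaim: true
  resources:
    limits:                         # assumption: footprint expressed as limits
      cpu: 1500m
      memory: 4Gi
```

When scaling to three brokers, topic replicas and the broker-level replication factors need to be raised together, since a replication factor above the broker count cannot be satisfied.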
Adds a busybox init container that polls the Kafka bootstrap Service
with nc until the port is accepting connections. Prevents the runtime
from crashing on startup due to DNS/connection race condition when
Strimzi Kafka is still initializing.

Only injected when kafkaCluster.enabled: true.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
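
A minimal sketch of the kind of init container this commit describes is shown below. The container name matches the PR summary, but the image tag, the port 9092, and the 5-second sleep are assumptions; the Service name follows Strimzi's `<cluster>-kafka-bootstrap` convention for the `akto-kafka` cluster.

```yaml
# Pod spec excerpt (illustrative), rendered only when kafkaCluster.enabled is true
initContainers:
  - name: wait-for-kafka
    image: busybox:1.36              # assumption: any busybox image with nc
    command:
      - sh
      - -c
      - |
        # Poll the bootstrap Service until the port accepts TCP connections.
        until nc -z akto-kafka-kafka-bootstrap 9092; do
          echo "waiting for Kafka bootstrap"
          sleep 5
        done
```

A port check only confirms that a listener is accepting connections, not that topics or users have been reconciled, so the application should still tolerate brief startup errors.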
Same busybox nc-based init container as the runtime deployment,
prevents threat client from crashing on startup when Strimzi Kafka
is still initializing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- kafkaCluster.enabled: true by default (Strimzi mode)
- mini_runtime.useExternalKafka: true for external Kafka mode
- Remove strimziInstall flag — Strimzi operator install is now tied
  directly to kafkaCluster.enabled via subchart condition
- Update CLAUDE.md to reflect two-mode model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 44 unit tests covering SASL/SCRAM, TLS, port override, service
  annotations, external Kafka mode, and deployment/pod annotations
- Uses documentSelector with skipEmptyTemplates to avoid index fragility
- Adds missing --- separator between keel and threat-client deployments
- GitHub Actions workflow triggers only on charts/mini-runtime/** changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
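
To show the documentSelector approach mentioned above, a helm-unittest suite might be structured like this. The suite name, test descriptions, and expected results are hypothetical and are not copied from the PR's 44 tests; the second test assumes kafka-cluster.yaml renders nothing when Strimzi mode is off.

```yaml
# charts/mini-runtime/tests/kafka_cluster_test.yaml (hypothetical file name)
suite: kafka cluster resources
templates:
  - templates/kafka-cluster.yaml
tests:
  - it: names the Kafka resource akto-kafka by default
    # Select the document by kind instead of by index, so that adding or
    # reordering resources in the template does not break the test.
    documentSelector:
      path: kind
      value: Kafka
      skipEmptyTemplates: true
    asserts:
      - equal:
          path: metadata.name
          value: akto-kafka

  - it: renders no Strimzi resources in external Kafka mode
    set:
      kafkaCluster.enabled: false
      mini_runtime.useExternalKafka: true
    asserts:
      - hasDocuments:
          count: 0
```

Selecting by `kind` rather than `documentIndex` is what avoids the index fragility the commit refers to.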
The authorization.type: simple ACLs on KafkaUser require the Kafka broker
to have spec.kafka.authorization.type: simple enabled — otherwise Strimzi
rejects the KafkaUser with ReconciliationException. Since this chart only
needs SASL authentication (not fine-grained ACL enforcement), the
authorization block is removed. KafkaUser now only configures SCRAM
credentials.

Also adds charts/mini-runtime/tests/integration_test.sh — a self-contained
shell script that runs 6 integration scenarios against a live cluster:
default Strimzi install, external Kafka, TLS, port override, annotations,
and SASL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
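
After the change described in the commit above, the KafkaUser would carry only SCRAM credentials and no `authorization` block, roughly as sketched here. The user name is hypothetical; the cluster label uses the `akto-kafka` name from this PR.

```yaml
apiVersion: kafka.strimzi.io/v1
kind: KafkaUser
metadata:
  name: akto-user                    # hypothetical user name
  labels:
    strimzi.io/cluster: akto-kafka   # must match the Kafka resource name
spec:
  authentication:
    type: scram-sha-512
  # No spec.authorization block: simple ACLs on a KafkaUser require
  # spec.kafka.authorization.type: simple on the Kafka resource as well.
```

Strimzi's User Operator then generates the password into a Secret named after the user, which is what replaces the chart's deleted sasl-secret.yaml.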