CloudNativePG Architecture

Tip

Source Code Navigation: The Cluster resource is the central Custom Resource Definition (CRD) provided by CloudNativePG. For a mapping of the architectural components to their implementation, see the Source Code Reference at the bottom of this page.

This document describes the architecture of CloudNativePG, giving contributors and auditors the technical foundation to understand how the operator ensures high availability and security for PostgreSQL workloads. It focuses on core lifecycle and reconciliation; WAL archiving, backup/restore pipelines, the plugin system, and webhook admission controllers are outside its current scope.

Design Philosophy

CloudNativePG follows the "Cloud Native" paradigm. Unlike PostgreSQL solutions that delegate failover and replication management to additional components (like Patroni, repmgr, or Stolon), CloudNativePG relies on the Kubernetes API as the single source of truth.

Key Principles

No Sidecars: Management logic runs inside the PostgreSQL container itself as the Instance Manager (PID 1), not in a separate sidecar container.
Direct Pod Management: Does not use StatefulSets, allowing for more granular control over individual instances.
Native Integration: Leverages Kubernetes primitives (Pods, PVCs, Services, ConfigMaps, Secrets, StorageClasses, VolumeSnapshots) for all database cluster operations.
Security by Design: Minimal attack surface by reducing the number of moving parts and using non-root, immutable application containers.

Architectural Actors

The following diagram illustrates how the Cluster resource orchestrates a PostgreSQL cluster in the Kubernetes ecosystem:

graph TD
    subgraph Control_Plane [Kubernetes Control Plane]
        Cluster[Cluster CRD]
        Operator[CNPG Operator]
        K8sAPI[API Server]
    end

    subgraph Node ["Worker Node"]
        Kubelet[Kubelet]

        subgraph Pod ["PostgreSQL Pod (Operand)"]
            IM[Instance Manager <br/> PID 1]
            PG[(Postgres Process)]

            IM ---|Controls / Monitors| PG
        end

        PVC[(Persistent Volume Claim)]
    end

    SvcRW[Service: -rw]
    SvcRO[Service: -ro]
    SvcR[Service: -r]

    %% Probes
    Kubelet -->|HTTP startup probe| IM
    Kubelet -->|HTTP liveness probe| IM
    Kubelet -->|HTTP readiness probe| IM

    %% Logic & Persistence
    Cluster -->|Watched by| Operator
    Operator -->|Reconciles| K8sAPI
    IM -.->|Reports Health & LSN| K8sAPI

    PG ---|Mounts Data/WAL| PVC

    %% Traffic Routing
    K8sAPI ---|Selector: role=primary| SvcRW
    K8sAPI ---|Selector: role=replica| SvcRO
    K8sAPI ---|Selector: all ready| SvcR

Key Differentiator: Beyond StatefulSets

A core architectural distinction of CloudNativePG is that it does not use StatefulSets. By managing Pod and PVC resources directly, the Operator gains surgical control over the cluster state.

1. Granular Failover and Promotion

StatefulSets are bound by ordinal logic (0, 1, 2). CloudNativePG can promote any replica based on the Log Sequence Number (LSN) reported by the Instance Manager, ensuring the most up-to-date instance is always chosen as the new primary, regardless of its name or index.
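
The LSN comparison described above can be sketched in Go. This is a minimal illustration, not the operator's code: `parseLSN` and `mostAdvanced` are hypothetical helpers, the instance names and LSN values are made up, and the tie-break on name is an assumption added only to keep the result deterministic.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLSN converts a PostgreSQL LSN in its textual "hi/lo" hexadecimal
// form (for example "0/3000060") into a single 64-bit WAL position.
func parseLSN(s string) (uint64, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	return h<<32 | l, nil
}

// mostAdvanced returns the replica that reported the highest LSN.
// Entries that cannot be parsed are skipped (hypothetical helper).
func mostAdvanced(reported map[string]string) (string, bool) {
	best, bestLSN, found := "", uint64(0), false
	for name, text := range reported {
		lsn, err := parseLSN(text)
		if err != nil {
			continue
		}
		// Assumption: ties are broken by name to keep the result stable.
		if !found || lsn > bestLSN || (lsn == bestLSN && name < best) {
			best, bestLSN, found = name, lsn, true
		}
	}
	return best, found
}

func main() {
	name, ok := mostAdvanced(map[string]string{
		"cluster-example-2": "0/3000060",
		"cluster-example-3": "0/3000148",
	})
	fmt.Println(name, ok) // prints "cluster-example-3 true"
}
```

Note that the candidate is chosen purely from the reported position, so the ordinal in the Pod name plays no part in the decision.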

2. Managing "Sensitive" Parameters

In PostgreSQL, certain parameters (e.g., max_connections, max_prepared_transactions, max_wal_senders) must be set on a standby to a value equal to or greater than the value on the primary. A standby configured with a lower value will refuse to start or to follow the primary.

Because CloudNativePG manages Pods directly, it can coordinate the update of these parameters across the cluster in the correct order, ensuring that standbys are always compatible with the primary during rolling updates or configuration changes.
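
The ordering constraint can be illustrated with a short Go sketch. It applies the rule from the PostgreSQL hot standby documentation (raise values on standbys first, lower them on the primary first); `rolloutOrder` is a hypothetical function, and this document does not state that the operator implements the rule in this exact form.

```go
package main

import "fmt"

// rolloutOrder returns instance names in the order a changed value should be
// applied so that no standby ever runs with a lower value than the primary.
// Increases reach the standbys first; decreases reach the primary first.
func rolloutOrder(primary string, standbys []string, oldValue, newValue int) []string {
	order := make([]string, 0, len(standbys)+1)
	if newValue > oldValue {
		order = append(order, standbys...)
		return append(order, primary)
	}
	order = append(order, primary)
	return append(order, standbys...)
}

func main() {
	standbys := []string{"cluster-example-2", "cluster-example-3"}

	// Raising max_connections from 100 to 200: the primary goes last.
	fmt.Println(rolloutOrder("cluster-example-1", standbys, 100, 200))

	// Lowering it from 200 to 100: the primary goes first.
	fmt.Println(rolloutOrder("cluster-example-1", standbys, 200, 100))
}
```

A StatefulSet rolls Pods in ordinal order regardless of role, which is why this kind of role-aware ordering needs direct Pod management.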

3. Synchronous Replication Control

The Operator dynamically manages synchronous_standby_names by watching Pod status and updating the PostgreSQL configuration accordingly. Users configure this through the declarative synchronous section, which supports priority-based (first) and quorum-based (any) methods. The same section lets users switch between required (strict durability) and preferred (availability-first) modes without manual intervention.
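
As a rough illustration of what "managing synchronous_standby_names" means, the Go sketch below renders a value in PostgreSQL's `METHOD num_sync (name, ...)` syntax from a list of ready standbys. `syncStandbyNames` is a hypothetical helper; returning an empty value when no standby is ready is an assumption of this sketch that resembles availability-first behaviour, not a description of the operator's logic.

```go
package main

import (
	"fmt"
	"strings"
)

// syncStandbyNames renders a synchronous_standby_names value. The method is
// "any" (quorum-based) or "first" (priority-based). Names are double-quoted
// because instance names such as cluster-example-2 contain hyphens.
func syncStandbyNames(method string, number int, readyStandbys []string) (string, error) {
	m := strings.ToUpper(method)
	if m != "ANY" && m != "FIRST" {
		return "", fmt.Errorf("unsupported method %q", method)
	}
	if number < 1 {
		return "", fmt.Errorf("number of synchronous standbys must be at least 1, got %d", number)
	}
	if len(readyStandbys) == 0 {
		// Assumption of this sketch: with no ready standby, emit an empty
		// value, which disables synchronous replication in PostgreSQL.
		return "", nil
	}
	quoted := make([]string, len(readyStandbys))
	for i, name := range readyStandbys {
		quoted[i] = `"` + name + `"`
	}
	return fmt.Sprintf("%s %d (%s)", m, number, strings.Join(quoted, ",")), nil
}

func main() {
	value, err := syncStandbyNames("any", 1, []string{"cluster-example-2", "cluster-example-3"})
	if err != nil {
		panic(err)
	}
	fmt.Println(value) // prints: ANY 1 ("cluster-example-2","cluster-example-3")
}
```

Recomputing this value whenever the set of ready Pods changes is what removes the need for manual edits to the PostgreSQL configuration.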

The Role of the Instance Manager (PID 1)

The Instance Manager is a Go binary that acts as the entry point for the container. Its responsibilities include:

Postgres Lifecycle: Initializing the cluster (initdb), starting the database, and managing clean shutdowns.
Self-Healing: Participating in failover by executing pg_ctl promote when the Operator identifies it as the new primary.
Kubernetes Awareness: Communicating directly with the K8s API to report status, replication lag, and LSN.

Intelligent Probes

The Instance Manager provides a database-aware HTTP server for Kubelet probes:

Startup Probe: Prevents restarts during initdb, recovery, or WAL replay.
Liveness Probe: On primary instances, performs an isolation check: if both the API server and peer instances are unreachable (as determined by the configurable IsolationCheck settings), the probe fails, causing Kubelet to restart the Pod. Replicas always pass the liveness check.
Readiness Probe: Ensures pg_isready succeeds on the primary and validates replication lag/hot-standby status on replicas before allowing traffic.
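
The liveness rule above can be expressed as a small Go sketch. It is a simplified model under stated assumptions: `instanceState`, `livenessStatus`, and `livenessHandler` are hypothetical names, the `/healthz` path is illustrative, and the real isolation check is governed by the configurable settings mentioned above rather than by two booleans.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// instanceState is a hypothetical snapshot of what the liveness check needs.
type instanceState struct {
	IsPrimary      bool
	APIReachable   bool
	PeersReachable bool
}

// livenessStatus returns the HTTP status for the liveness probe. Replicas
// always pass. A primary fails only when it can reach neither the API server
// nor any peer, which suggests that it is isolated.
func livenessStatus(s instanceState) int {
	if s.IsPrimary && !s.APIReachable && !s.PeersReachable {
		return http.StatusInternalServerError
	}
	return http.StatusOK
}

// livenessHandler exposes livenessStatus as an HTTP endpoint for the Kubelet.
func livenessHandler(current func() instanceState) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(livenessStatus(current()))
	}
}

func main() {
	isolatedPrimary := func() instanceState { return instanceState{IsPrimary: true} }

	// Exercise the handler in-process instead of opening a real port.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	livenessHandler(isolatedPrimary).ServeHTTP(rec, req)

	fmt.Println(rec.Code) // prints 500
}
```

The Kubelet treats any status from 200 to 399 as success, so the 500 returned for an isolated primary leads to a container restart once the probe's failure threshold is reached.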

Cluster Lifecycle: Orchestrated Resources

When a Cluster is created, the Operator generates several native objects to manage the identity, security, and connectivity of the instances.

1. Bootstrap Phase: Initialization and Join Jobs

Before long-running Pods start, CloudNativePG uses temporary Jobs to prepare the storage:

initdb: Created for the first instance to initialize the PostgreSQL data directory on the PVC.
recovery: Created when bootstrapping from a backup or volume snapshot; it restores the data directory, optionally up to a chosen point in time (point-in-time recovery).
pg_basebackup: Created when cloning from an external cluster via streaming replication.
join: Created for subsequent replicas to clone data from the primary using pg_basebackup.
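
The choice between the four Jobs listed above can be summarised in a few lines of Go. This is only a decision table in code form: `bootstrapSource` and `jobFor` are hypothetical names, and the returned strings are the labels used in this document.

```go
package main

import "fmt"

// bootstrapSource is a hypothetical enumeration of where the first
// instance of a cluster gets its data from.
type bootstrapSource int

const (
	sourceEmpty           bootstrapSource = iota // brand-new, empty cluster
	sourceBackup                                 // backup or volume snapshot
	sourceExternalCluster                        // streaming from an external cluster
)

// jobFor returns which temporary Job prepares the PVC of an instance.
// Only the first instance depends on the bootstrap source; every later
// instance clones from the current primary.
func jobFor(isFirstInstance bool, src bootstrapSource) string {
	if !isFirstInstance {
		return "join"
	}
	switch src {
	case sourceBackup:
		return "recovery"
	case sourceExternalCluster:
		return "pg_basebackup"
	default:
		return "initdb"
	}
}

func main() {
	fmt.Println(jobFor(true, sourceEmpty))  // prints "initdb"
	fmt.Println(jobFor(true, sourceBackup)) // prints "recovery"
	fmt.Println(jobFor(false, sourceEmpty)) // prints "join"
}
```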

2. Operational Phase: Resource Hierarchy

For a basic 3-instance cluster (see cluster-example.yaml), the following hierarchy of objects is maintained:

graph LR
    Cluster[Cluster: cluster-example]

    %% Identity & RBAC
    Cluster --> SA[ServiceAccount: cluster-example]
    Cluster --> Role[Role: cluster-example]
    Cluster --> RB[RoleBinding: cluster-example]

    %% Compute & Storage
    Cluster --> P1[Pod: cluster-example-1]
    Cluster --> P2[Pod: cluster-example-2]
    Cluster --> P3[Pod: cluster-example-3]

    P1 --- PVC1[(PVC: cluster-example-1)]
    P2 --- PVC2[(PVC: cluster-example-2)]
    P3 --- PVC3[(PVC: cluster-example-3)]

    %% Networking
    Cluster --> SvcRW[Service: cluster-example-rw]
    Cluster --> SvcRO[Service: cluster-example-ro]
    Cluster --> SvcR[Service: cluster-example-r]

    %% Security
    Cluster --> SecCA(Secret: cluster-example-ca)
    Cluster --> SecSrv(Secret: cluster-example-server)
    Cluster --> SecRepl(Secret: cluster-example-replication)
    Cluster --> SecApp(Secret: cluster-example-app)

    %% Availability
    Cluster --> PDB1[PodDisruptionBudget: cluster-example-primary]
    Cluster --> PDB2[PodDisruptionBudget: cluster-example]

    %% Monitoring (namespace-level, not owned by the Cluster)
    Cluster -.-> CM(ConfigMap: cnpg-default-monitoring)

    style Cluster fill:#f96,stroke:#333,stroke-width:2px

Label-Based Networking

Networking is purely label-driven. The Operator manages the cnpg.io/instanceRole label:

Primary: Labeled as primary. The -rw Service selects this Pod.
Replica: Labeled as replica. The -ro Service selects these Pods.
Failover: When a new primary is promoted via pg_ctl promote, the Operator updates the labels. The Kubernetes endpoint controllers then automatically update the Endpoints for the respective Services, so clients of the -rw Service reach the new primary without any change on their side.
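
The effect of relabeling can be modelled with plain Go maps. The sketch below imitates equality-based label selection as Kubernetes Services perform it; `pod`, `selectPods`, and `failover` are hypothetical names, and relabeling the old primary as a replica is a simplification, since in a real failover that Pod may be gone.

```go
package main

import "fmt"

// roleLabel is the label the Operator manages on every instance Pod.
const roleLabel = "cnpg.io/instanceRole"

// pod is a minimal stand-in for a Kubernetes Pod (hypothetical type).
type pod struct {
	Name   string
	Labels map[string]string
}

// selectPods returns the Pods whose labels contain every key/value pair of
// the selector, mirroring equality-based Service selectors.
func selectPods(pods []pod, selector map[string]string) []string {
	var names []string
	for _, p := range pods {
		match := true
		for key, value := range selector {
			if p.Labels[key] != value {
				match = false
				break
			}
		}
		if match {
			names = append(names, p.Name)
		}
	}
	return names
}

// failover marks newPrimary as primary and every other Pod as replica.
func failover(pods []pod, newPrimary string) {
	for i := range pods {
		role := "replica"
		if pods[i].Name == newPrimary {
			role = "primary"
		}
		pods[i].Labels[roleLabel] = role
	}
}

func main() {
	pods := []pod{
		{Name: "cluster-example-1", Labels: map[string]string{roleLabel: "primary"}},
		{Name: "cluster-example-2", Labels: map[string]string{roleLabel: "replica"}},
	}
	rwSelector := map[string]string{roleLabel: "primary"}

	fmt.Println(selectPods(pods, rwSelector)) // prints [cluster-example-1]
	failover(pods, "cluster-example-2")
	fmt.Println(selectPods(pods, rwSelector)) // prints [cluster-example-2]
}
```

The Service definition never changes during a failover; only the labels move, and the set of selected Pods follows.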

Source Code Reference

The following table maps architectural components to their implementation in the repository. The Core components form the backbone of the operator; the remaining entries are peripheral CRDs for specific feature areas.

| Category | Component | Types Definition | Logic / Functions |
| --- | --- | --- | --- |
| Core | `Cluster` | `cluster_types.go` | `cluster_funcs.go` |
| Core | Instance Manager | N/A | `pkg/management/postgres/` |
| Core | Operator Controller | N/A | `cluster_controller.go` |
| Image management | `ClusterImageCatalog` | `clusterimagecatalog_types.go` | `clusterimagecatalog_funcs.go` |
| Image management | `ImageCatalog` | `imagecatalog_types.go` | `imagecatalog_funcs.go` |
| Backup scheduling | `Backup` | `backup_types.go` | `backup_funcs.go` |
| Backup scheduling | `ScheduledBackup` | `scheduledbackup_types.go` | `scheduledbackup_funcs.go` |
| Pooling | `Pooler` | `pooler_types.go` | `pooler_funcs.go` |
| Database management | `Database` | `database_types.go` | `database_funcs.go` |
| Database management | `Publication` | `publication_types.go` | `publication_funcs.go` |
| Database management | `Subscription` | `subscription_types.go` | `subscription_funcs.go` |
