
Phase 0 Research: OrbStack + ArgoCD 3.1.x Bootstrap

Date: 2026-02-10 | Spec: spec.md | Plan: plan.md


Topic 1: ArgoCD 3.1.x Breaking Changes from 2.x

Summary of Changes (v2.14 → v3.0 → v3.1)

ArgoCD 3.0 is explicitly described as a "low-risk upgrade containing only minor breaking changes" — there is no v1alpha1 → v1beta1 API migration. The Application CRD remains apiVersion: argoproj.io/v1alpha1. The breaking changes are behavioral, not schema-level.

Breaking Changes Relevant to This Project

| Change | Impact | Action Required |
|---|---|---|
| Annotation-based tracking by default | Resources tracked via annotation instead of label; new installs unaffected | None (fresh install uses annotation tracking) |
| Fine-grained RBAC for update/delete | update/delete policies no longer cascade to sub-resources | None (local dev, using admin) |
| Logs RBAC enforced by default | logs, get must be explicitly granted | None (local dev, admin has full access) |
| Default resource.exclusions | High-churn resources (Endpoints, EndpointSlice, Lease, etc.) excluded by default | Beneficial: reduces load on single-node cluster |
| Legacy repo config in argocd-cm removed | Repos must be configured as Secrets, not in the ConfigMap | None (Helm chart handles this via configs.repositories) |
| Helm upgraded to 3.17.1 | null values in subchart values.yaml now override instead of warning | Audit all values files for null values |
| Health status no longer persisted in Application CR | Health stored in appTree, not .status.resources[].health | None (no tooling parses CR status directly) |
| Status field ignored in diffs by default | .status diffs ignored for all resources (was CRD-only) | Beneficial for local dev |
| ApplicationSet applyNestedSelectors ignored | Nested selectors always applied | None (not using nested selectors) |

API Version Status

  • Decision: Application CRD stays at apiVersion: argoproj.io/v1alpha1 in ArgoCD 3.x. No migration needed.
  • Rationale: ArgoCD 3.0 release notes contain zero mention of API version changes. The CRD schema is unchanged.
  • Key evidence: The Helm chart 8.0.0 changelog (which deploys ArgoCD v3.0.0) and official upgrade guide both reference argoproj.io/v1alpha1 in all examples.
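
A quick sanity check against a live cluster (plain kubectl; the CRD name is standard):

# List the API versions served by the Application CRD
kubectl get crd applications.argoproj.io -o jsonpath='{.spec.versions[*].name}'
# Expected output: v1alpha1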

Helm Chart argo-cd 8.x → 9.x Breaking Changes

| Chart Version | ArgoCD Version | Key Change |
|---|---|---|
| 8.0.0 | v3.0.0 | Deploys ArgoCD v3.0; read the v2.14 → 3.0 upgrade guide |
| 9.0.0 | v3.1.x | Removed all default .Values.configs.params parameters (except create and annotations); default applicationsetcontroller.policy changed from 'sync' to "" (empty string) |
| 9.1.0 | v3.1.x | Redis-HA selector label change requires replacing the argocd-redis-ha-haproxy deployment (only if using redis-ha) |

Decision: Use Helm chart argo-cd version 9.0.1 with ArgoCD v3.1.9.

Rationale: Chart 9.0.x aligns with ArgoCD v3.1.x stable. The configs.params change is not impactful for fresh installs — we explicitly set all needed params in our values file.

Action for 9.0.0: If using ApplicationSet auto-sync, explicitly set applicationsetcontroller.policy: 'sync' in values, since the new default is empty string.
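
A minimal sketch of that override as chart values, assuming this project's values-file layout:

# argocd/helm-values/argocd/values.yaml (assumed path)
configs:
  params:
    applicationsetcontroller.policy: sync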

App-of-Apps Pattern in 3.1.x

Decision: Use raw Application YAML manifests in a directory, pointed to by a root Application with source.directory.recurse: false.

Rationale: The App-of-Apps pattern is unchanged in 3.1.x. The root Application simply points to a Git directory containing child Application manifests. No new pattern required.

# Root Application (argocd/applications/root-app-of-apps.yaml)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app-of-apps
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: infrastructure
  source:
    repoURL: https://github.com/<org>/<repo>.git
    targetRevision: HEAD
    path: argocd/applications
    directory:
      recurse: false
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
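
The root Application is the only manifest applied by hand; everything else flows from Git. A minimal bootstrap sequence, assuming the chart version chosen above and this repo's assumed values-file layout:

# One-time bootstrap: install ArgoCD, then hand control to the root app
helm repo add argo https://argoproj.github.io/argo-helm
helm install argocd argo/argo-cd --version 9.0.1 \
  --namespace argocd --create-namespace \
  --values argocd/helm-values/argocd/values.yaml   # assumed path
kubectl apply -n argocd -f argocd/applications/root-app-of-apps.yaml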

Alternatives Considered:

  • ApplicationSet with Git generator: More dynamic but adds complexity for a fixed set of ~5 components. Overkill for local dev.
  • Helm umbrella chart for child Applications: Adds a Helm layer that provides no value over raw manifests.

Topic 2: OrbStack Kubernetes Specifics

DNS and Ingress

Decision: Use *.k8s.orb.local wildcard domain for all ingress hostnames.

Rationale: OrbStack provides automatic wildcard DNS resolution for *.k8s.orb.local that routes to LoadBalancer services. No /etc/hosts editing needed. No external DNS required.

How it works:

  • OrbStack resolves *.k8s.orb.local to the LoadBalancer IP automatically
  • LoadBalancer services work out of the box — OrbStack provisions external IPs for them
  • ClusterIP addresses are directly accessible from macOS (no port-forward needed)
  • service.namespace.svc.cluster.local domains are accessible from macOS
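
A quick check from the macOS host, assuming ingress-nginx is installed under its default service name:

# OrbStack should assign an external IP to the LoadBalancer service
kubectl get svc -n ingress-nginx ingress-nginx-controller
# Wildcard DNS should resolve without any /etc/hosts entries
ping -c 1 argocd.k8s.orb.local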

Ingress hostnames:

argocd.k8s.orb.local        → ArgoCD UI
grafana.k8s.orb.local       → Grafana dashboard
alertmanager.k8s.orb.local  → Alertmanager (optional)
sample-app.k8s.orb.local    → Validation sample app

Note: OrbStack docs reference *.k8s.orb.local (not *.orb.local). The k8s subdomain is specific to Kubernetes services.

CNI and NetworkPolicies

Decision: OrbStack uses Flannel as the default CNI. Flannel does not support NetworkPolicies natively.

Rationale: OrbStack's docs state "The default CNI is Flannel." Flannel is a simple L3 overlay network that does not implement the Kubernetes NetworkPolicy API. NetworkPolicy resources can be created but will have no effect unless a policy-enforcing CNI (Calico, Cilium) is installed.

Impact on project:

  • NetworkPolicy YAMLs in namespace templates will be created for correctness and portability
  • They will not be enforced on OrbStack's default Flannel CNI
  • This is acceptable for local development — security enforcement is a production concern
  • If enforcement is needed, Calico can be installed alongside Flannel (but adds resource overhead)
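
For illustration, a representative default-deny policy of the kind the namespace templates might carry (namespace name is a placeholder); on Flannel it is accepted by the API server but not enforced:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: example-namespace   # placeholder
spec:
  podSelector: {}                # selects every pod in the namespace
  policyTypes:
    - Ingress                    # no ingress rules listed = deny all ingress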

Alternatives Considered:

  • Install Calico on OrbStack: Possible but adds ~200-300MB RAM overhead. Not worth it for local dev.
  • Install Cilium: OrbStack docs mention Istio CNI paths but Cilium requires replacing Flannel entirely. Not officially supported.
  • Accept Flannel: Best option for local dev — minimal resource usage, NetworkPolicies exist as documentation.

kubectl Context

Decision: The kubectl context for OrbStack Kubernetes is orbstack.

Rationale: OrbStack automatically configures kubeconfig with context name orbstack. Verified from OrbStack documentation and community usage.

# Verify context
kubectl config get-contexts | grep orbstack
# Switch to OrbStack context
kubectl config use-context orbstack

OrbStack CLI Commands

# Enable Kubernetes (if not already via GUI)
orb start k8s

# Stop Kubernetes
orb stop k8s

# Restart Kubernetes
orb restart k8s

# Delete cluster (full reset)
orb delete k8s

# Check if kubectl is available (bundled with OrbStack)
which kubectl  # → /opt/orbstack-guest/bin/kubectl or ~/.orbstack/bin/kubectl

Known Limitations

| Limitation | Impact | Mitigation |
|---|---|---|
| Single-node only | No multi-node testing, no real HA | Acceptable for local dev; use kind/k3d inside OrbStack for multi-node if needed |
| Flannel CNI | No NetworkPolicy enforcement | Create policies for portability; accept no enforcement locally |
| No GPU passthrough | Cannot test GPU workloads | Not relevant for infrastructure bootstrap |
| Resource contention | All pods share host resources | Use resource limits and monitor via kube-prometheus-stack |
| NodePort/LB localhost only | Services not accessible from LAN by default | Enable in OrbStack Settings → Kubernetes if needed |
| Container images shared | K8s uses the same engine as Docker; :latest always pulls | Set imagePullPolicy: IfNotPresent and avoid :latest tags (see the fragment below) |
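
The image-pull mitigation from the last row is a one-line change in any container spec; a sketch with a placeholder image and tag:

# Container spec fragment: pin a tag and avoid implicit re-pulls
containers:
  - name: sample-app             # placeholder
    image: nginx:1.27            # pinned tag (placeholder) instead of :latest
    imagePullPolicy: IfNotPresent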

Topic 3: ArgoCD App-of-Apps with Helm + Kustomize Mix

Best Practices for Mixed Sources

Decision: Use raw Application manifest YAMLs in the argocd/applications/ directory. Each child Application specifies its own source type (Helm or Kustomize) independently.

Rationale: ArgoCD Applications are just Kubernetes resources. The root App-of-Apps simply deploys Application manifests from a directory. Each child Application independently declares whether it uses Helm (source.chart or source.helm) or Kustomize (source.kustomize / directory with kustomization.yaml). The root app doesn't need to know or care about the child's source type.

Structure

argocd/applications/
├── root-app-of-apps.yaml           # Root: points to this directory (self-excluded)
├── ingress-nginx.yaml              # Child: Helm (chart: ingress-nginx)
├── cert-manager.yaml               # Child: Helm (chart: cert-manager)
├── sealed-secrets.yaml             # Child: Helm (chart: sealed-secrets)
├── kube-prometheus-stack.yaml      # Child: Helm (chart: kube-prometheus-stack)
└── namespace-templates.yaml        # Child: Kustomize (path: kubernetes/namespace-template)

Child Application: Helm Example

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ingress-nginx
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: infrastructure
  sources:
    # Source 1: the upstream Helm chart
    - repoURL: https://kubernetes.github.io/ingress-nginx
      chart: ingress-nginx
      targetRevision: 4.14.3
      helm:
        releaseName: ingress-nginx
        valueFiles:
          # $values resolves to the source that declares ref: values
          - $values/argocd/helm-values/ingress-nginx/values.yaml
    # Source 2: this repo, referenced only for its values files
    - repoURL: https://github.com/<org>/<repo>.git
      targetRevision: HEAD
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress-nginx
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Child Application: Kustomize Example

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: namespace-templates
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: infrastructure
  source:
    repoURL: https://github.com/<org>/<repo>.git
    targetRevision: HEAD
    path: kubernetes/namespace-template/overlays/local
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Root App Source Type

Decision: The root Application uses plain directory source type (not Kustomize, not Helm).

Rationale: The argocd/applications/ directory contains plain YAML files. ArgoCD's directory source type deploys all YAML files found in the path. No kustomization.yaml needed. This is the simplest approach and avoids adding unnecessary indirection.

Alternatives Considered:

  • Root app uses Kustomize: Would require a kustomization.yaml listing every child Application file. Adds maintenance burden with no benefit.
  • Root app uses Helm: Would require wrapping Application YAMLs in a Helm chart. Unnecessary complexity.
  • ApplicationSet instead of App-of-Apps: Better for dynamic generation (100+ apps), overkill for 5-6 fixed infrastructure components.

Topic 4: cert-manager Self-Signed ClusterIssuer

Recommended Approach: Bootstrap a CA Issuer from SelfSigned

Decision: Use a two-tier approach: SelfSigned ClusterIssuer → CA root Certificate → CA ClusterIssuer. All ingress TLS uses the CA ClusterIssuer.

Rationale: A pure SelfSigned issuer means every certificate's private key signs itself (no shared CA). This makes trust distribution impossible — each service would have a different CA. Instead, bootstrap a self-signed root CA cert, then use that CA to issue all subsequent certificates. This way there's a single CA to trust.

Configuration

# 1. SelfSigned ClusterIssuer (bootstrap only)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
# 2. Self-signed CA Certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: local-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: local-dev-ca
  subject:
    organizations:
      - Local Development
  secretName: local-ca-secret
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
# 3. CA ClusterIssuer (used by all ingresses)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: local-ca-issuer
spec:
  ca:
    secretName: local-ca-secret
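
Once synced, readiness can be verified and the root CA exported so the host can trust it. For a self-signed root, tls.crt in the CA secret is the root certificate itself:

# Both issuers should report Ready
kubectl get clusterissuer selfsigned-issuer local-ca-issuer
# Export the root CA for the host trust store
kubectl get secret local-ca-secret -n cert-manager \
  -o jsonpath='{.data.tls\.crt}' | base64 -d > local-dev-ca.crt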

Ingress Annotation for Automatic TLS

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server
  namespace: argocd
  annotations:
    cert-manager.io/cluster-issuer: "local-ca-issuer"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - argocd.k8s.orb.local
      secretName: argocd-server-tls
  rules:
    - host: argocd.k8s.orb.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: argocd-server
                port:
                  number: 80

Key points:

  • Annotation cert-manager.io/cluster-issuer: "local-ca-issuer" triggers automatic Certificate creation
  • tls.secretName must be set — cert-manager stores the cert there
  • tls.hosts determines the certificate's SAN (Subject Alternative Names)
  • For ArgoCD specifically, set server.insecure: true under the chart's configs.params and let ingress-nginx handle TLS termination (see the snippet below)
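
A minimal sketch of that last point as chart values (assumed file path):

# argocd/helm-values/argocd/values.yaml (assumed path)
configs:
  params:
    server.insecure: true   # argocd-server speaks plain HTTP; nginx terminates TLS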

Alternatives Considered:

  • Pure SelfSigned (no CA): Simpler but each cert has a different signer. Cannot trust a single CA.
  • mkcert: External tool, not Kubernetes-native, doesn't integrate with ingress annotations.
  • Let's Encrypt with DNS challenge: Requires real domain and DNS provider. Overkill for local dev.

Topic 5: kube-prometheus-stack Local Development Values

Decision: Minimal Resource Footprint

Rationale: Local dev cluster has 4-8 CPU and 8-16 GiB RAM shared across all workloads. kube-prometheus-stack's defaults target production clusters. We need to aggressively reduce resource requests and disable unnecessary components.

Recommended Values Overrides

# argocd/helm-values/kube-prometheus-stack/values.yaml

# -- Global settings
fullnameOverride: prometheus

# -- Prometheus
prometheus:
  prometheusSpec:
    # Reduce resource requests for local dev
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    # Reduce retention for local dev
    retention: 6h
    retentionSize: 1GB
    # Disable persistent storage (ephemeral is fine locally)
    storageSpec: {}
    # Scrape every 30s instead of 15s to reduce load
    scrapeInterval: 30s
    evaluationInterval: 30s
    # Allow discovery of ServiceMonitors in all namespaces
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    # Disable Thanos sidecar (not needed locally)
    thanos: {}
    # Reduce replicas
    replicas: 1

# -- Alertmanager
alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        cpu: 10m
        memory: 32Mi
      limits:
        cpu: 100m
        memory: 64Mi
    # No persistent storage locally
    storage: {}

# -- Grafana
grafana:
  # Reduce resources
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi
  # Ingress for *.k8s.orb.local
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      cert-manager.io/cluster-issuer: "local-ca-issuer"
    hosts:
      - grafana.k8s.orb.local
    tls:
      - hosts:
          - grafana.k8s.orb.local
        secretName: grafana-tls
  # Default admin credentials (local dev only)
  adminUser: admin
  adminPassword: admin
  # Persistence not needed locally
  persistence:
    enabled: false
  # Sidecar for dashboard loading
  sidecar:
    dashboards:
      enabled: true
    datasources:
      enabled: true

# -- Disable components not needed locally
kubeEtcd:
  enabled: false     # OrbStack manages etcd, metrics not exposed

kubeControllerManager:
  enabled: false     # OrbStack manages, metrics not accessible

kubeScheduler:
  enabled: false     # OrbStack manages, metrics not accessible

kubeProxy:
  enabled: false     # kube-proxy metrics bind to 127.0.0.1 and are not scrapable

# -- Node exporter (reduced)
nodeExporter:
  enabled: true
  resources:
    requests:
      cpu: 10m
      memory: 16Mi
    limits:
      cpu: 50m
      memory: 32Mi

# -- Prometheus Operator (reduced)
prometheusOperator:
  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 128Mi
  # Disable admission webhooks for a simpler local setup: no webhook TLS
  # bootstrap, at the cost of skipping PrometheusRule validation
  admissionWebhooks:
    enabled: false
    patch:
      enabled: false

# -- kube-state-metrics (reduced)
kubeStateMetrics:
  enabled: true

kube-state-metrics:
  resources:
    requests:
      cpu: 10m
      memory: 32Mi
    limits:
      cpu: 50m
      memory: 64Mi
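
After the stack syncs, actual consumption can be compared against these requests (kubectl top needs metrics-server or an equivalent; the namespace name is an assumption):

# Compare live usage against the requests configured above
kubectl top pods -n monitoring
kubectl get pods -n monitoring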

Components Disabled and Why

| Component | Disabled | Reason |
|---|---|---|
| kubeEtcd | Yes | OrbStack manages etcd internally; metrics endpoint not exposed |
| kubeControllerManager | Yes | OrbStack manages it; metrics endpoint not accessible at the standard path |
| kubeScheduler | Yes | OrbStack manages it; metrics endpoint not accessible at the standard path |
| kubeProxy | Yes | Metrics bind address defaults to 127.0.0.1 in k3s-based clusters |
| Thanos sidecar | Yes (empty thanos: {}) | No HA/long-term storage needed locally |
| Persistent storage | Yes (none configured) | Ephemeral data is acceptable for local dev; metrics history is lost on restart, scraping resumes automatically |

Grafana Ingress

Uses the same pattern as all other ingresses:

  • Host: grafana.k8s.orb.local
  • IngressClass: nginx
  • TLS via cert-manager annotation: cert-manager.io/cluster-issuer: local-ca-issuer

Estimated Resource Footprint

| Component | CPU Request | Memory Request |
|---|---|---|
| Prometheus | 100m | 256Mi |
| Alertmanager | 10m | 32Mi |
| Grafana | 50m | 128Mi |
| Prometheus Operator | 50m | 64Mi |
| Node Exporter | 10m | 16Mi |
| kube-state-metrics | 10m | 32Mi |
| Total | 230m | 528Mi |

This is roughly a 75% reduction from the chart's production-oriented defaults.

Alternatives Considered:

  • Disable kube-prometheus-stack entirely: Loses observability. Not recommended even for local dev.
  • Use standalone Prometheus + Grafana: More manual setup, fewer dashboards out of the box.
  • Victoria Metrics: Lighter weight but less community tooling and dashboards. Not worth the ecosystem trade-off.

Cross-Cutting Decisions Summary

| Decision | Choice | Key Reason |
|---|---|---|
| ArgoCD API version | argoproj.io/v1alpha1 | No change in 3.x |
| Ingress domain | *.k8s.orb.local | OrbStack built-in wildcard DNS |
| CNI | Flannel (default) | Minimal resources; NetworkPolicies as documentation only |
| kubectl context | orbstack | OrbStack auto-configured |
| App-of-Apps root source | Plain directory | Simplest; no kustomization.yaml needed |
| TLS issuer | CA ClusterIssuer (bootstrapped from SelfSigned) | Single CA to trust; automatic via ingress annotations |
| Monitoring resource strategy | Aggressive reduction (~75% below defaults) | Local dev has limited resources |
| Helm chart version | argo-cd 9.0.1 | Matches ArgoCD v3.1.9 stable |
| ApplicationSet policy | Explicitly set sync if using auto-sync | Chart 9.0.0 changed the default from sync to "" |