nebari-dev/mlflow-pack

Nebari MLflow Pack

Deploys MLflow on Nebari with Keycloak authentication, PostgreSQL backend storage, and automatic TLS.

Quick Start

  1. Create the PostgreSQL credentials secret:

    kubectl create namespace mlflow
    
    kubectl create secret generic mlflow-pack-postgresql \
      --namespace mlflow \
      --from-literal=password="$(openssl rand -base64 32)" \
      --from-literal=postgres-password="$(openssl rand -base64 32)"

    The secret must be named <release-name>-postgresql (mlflow-pack-postgresql in the command above) and contain the keys password (the mlflow database user) and postgres-password (the superuser).

  2. Copy the example ArgoCD Application and edit it for your cluster:

    cp examples/nebari-values.yaml /path/to/your/gitops-repo/apps/mlflow-pack.yaml

    Update nebariapp.hostname, nebariapp.keycloakHostname, and mlflow.postgresql.primary.persistence.storageClass for your environment.

  3. Add mlflow.<your-domain> to your gateway certificate and DNS.

See examples/nebari-values.yaml for the full ArgoCD Application manifest.
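
The secret requirements from step 1 can be checked before deploying. The following is a minimal sketch, not part of the chart: the helper name missing_secret_keys is hypothetical, and it assumes the secret has been dumped with kubectl get secret mlflow-pack-postgresql -n mlflow -o json, where values under data are base64-encoded.

```python
import base64

# Keys the chart expects in the <release-name>-postgresql secret (from step 1).
REQUIRED_KEYS = ("password", "postgres-password")


def missing_secret_keys(secret, required=REQUIRED_KEYS):
    """Return required keys that are absent or decode to an empty value.

    `secret` is the parsed JSON of a Kubernetes Secret object.
    """
    data = secret.get("data") or {}
    missing = []
    for key in required:
        value = data.get(key)
        # Secret values are base64-encoded; an empty decode means no password.
        if not value or not base64.b64decode(value):
            missing.append(key)
    return missing
```

To use it, load the kubectl output with json.loads and pass the result in; an empty list means both keys are present.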

Connecting JupyterHub

To allow nebari-data-science-pack notebooks to log experiments to MLflow, add the following to your data-science-pack ArgoCD Application values:

jupyterhub:
  singleuser:
    extraEnv:
      MLFLOW_TRACKING_URI: "http://mlflow-pack.mlflow.svc.cluster.local:80"
    networkPolicy:
      egress:
        - ports:
            - port: 5000
              protocol: TCP
          to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: mlflow

The egress rule uses port 5000 (the pod port) rather than port 80 from the tracking URI because NetworkPolicy operates at the pod IP level, not the ClusterIP service level; the service maps port 80 to pod port 5000.

After applying, existing JupyterLab sessions must be restarted (stop/start from the hub control panel) to pick up the new environment variable and NetworkPolicy.

Verify from a notebook

import mlflow
mlflow.set_experiment("test")
with mlflow.start_run():
    mlflow.log_param("framework", "pytorch")
    mlflow.log_metric("accuracy", 0.95)
print("Run ID:", mlflow.last_active_run().info.run_id)
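
If the run above hangs or fails, it can help to separate a missing environment variable or blocked egress from an MLflow problem. The following is a rough sketch using only the Python standard library; the helper names tracking_endpoint and can_connect are hypothetical and not part of MLflow or this chart.

```python
import os
import socket
from urllib.parse import urlparse


def tracking_endpoint(uri):
    """Split a tracking URI into (host, port), defaulting the port by scheme."""
    parsed = urlparse(uri)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    uri = os.environ.get("MLFLOW_TRACKING_URI")
    if uri is None:
        print("MLFLOW_TRACKING_URI is not set; restart the JupyterLab session")
    else:
        host, port = tracking_endpoint(uri)
        print(f"{host}:{port} reachable:", can_connect(host, port))
```

A timeout here with the variable set usually points at the NetworkPolicy egress rule rather than at MLflow itself, though this check only tests TCP reachability and does not confirm that the server is healthy.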

Configuration

PostgreSQL

By default, this chart bundles a Bitnami PostgreSQL instance. For dev/testing you can pass the password inline instead of creating a secret:

helm install mlflow-pack . \
  --set mlflow.postgresql.auth.password=my-dev-password

Do not use inline passwords in production or commit them to a gitops repository.

To disable the bundled PostgreSQL and use in-memory SQLite instead (all data is lost when the pod restarts):

mlflow:
  postgresql:
    enabled: false

Allowed Hosts

The chart automatically adds the NebariApp hostname and the cluster-internal service name to the allowed hosts via MLFLOW_SERVER_ALLOWED_HOSTS. To allow additional hosts:

security:
  additionalAllowedHosts:
    - custom-alias.internal

Troubleshooting

NebariApp not ready

kubectl get nebariapp -n mlflow
kubectl describe nebariapp -n mlflow

Check the status conditions in the output: RoutingReady, TLSReady, and AuthReady should all be True.
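
The condition check can also be scripted. This is a sketch under assumptions: it presumes the NebariApp resource reports its state in the usual Kubernetes status.conditions list, with type and status fields on each entry, which should be confirmed against the actual kubectl output. The helper name unready_conditions is hypothetical.

```python
import json

# Condition types named in the troubleshooting note above.
EXPECTED = ("RoutingReady", "TLSReady", "AuthReady")


def unready_conditions(nebariapp, expected=EXPECTED):
    """Return the expected condition types whose status is not "True".

    `nebariapp` is one parsed NebariApp object. A condition that is
    missing from status.conditions is reported as unready.
    """
    conditions = nebariapp.get("status", {}).get("conditions", [])
    status_by_type = {c.get("type"): c.get("status") for c in conditions}
    return [t for t in expected if status_by_type.get(t) != "True"]


# Example usage, assuming the output of
#   kubectl get nebariapp -n mlflow -o json
# was saved to nebariapps.json:
#
#   with open("nebariapps.json") as f:
#       for item in json.load(f).get("items", []):
#           print(item["metadata"]["name"], unready_conditions(item))
```

An empty list means all three conditions report True; any names returned indicate which part of the setup (routing, TLS, or authentication) to inspect with kubectl describe.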

License

Apache 2.0 - see LICENSE.
