**FIXME** Link here to doc about our current Kubernetes cluster and hosting setup.

- Monitoring
  - [ ] Add customized dashboards using [Grafonnet](https://github.com/grafana/grafonnet-lib) to [kube-prometheus](https://github.com/codeformuenster/kubernetes-deployment/tree/master/sources/kube-prometheus).
    - [ ] cert-manager
    - [ ] openebs
  - [ ] Configure alerts and send notifications to a Matrix channel.
    - Maybe: [matrix-alertmanager](https://github.com/jaywink/matrix-alertmanager)
  - [ ] Add website analytics with Fathom.
  - [ ] Create a public status page with an overview of current apps.
  - [ ] Regularly check observatory.mozilla.org for all public sites.
- Authentication
  - [ ] OpenID Connect via Keycloak for kube-apiserver and apps.
  - [ ] Add [gangway](https://github.com/heptiolabs/gangway).
- Security
  - [ ] Create a restricted [Pod Security Policy](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) that only allows non-root containers.
  - [ ] [Default deny all ingress traffic](https://kubernetes.io/docs/concepts/services-networking/network-policies/#default-deny-all-ingress-traffic)
  - [ ] RBAC
- Shared services
  - [ ] Kinto
  - [ ] Postgres
  - [ ] Minio
  - [ ] Elasticsearch
- Backup
  - [ ] Regularly push database snapshots and filestores to some `s3` storage.
- Stability
  - [ ] Automatically replace the oldest node with a fresh one every twelve hours, maybe with the help of [kured](https://github.com/weaveworks/kured).
  - [ ] Make sure resource limits are set on every pod.
  - [ ] Back every service with at least two replicas. Label apps that can't handle this.
  - [ ] Set a [PodDisruptionBudget](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) for all apps.
  - [ ] Set the [recommended labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/) on all resources.

**Random Ideas**

- [ ] Try [varnish](https://medium.com/@andreic9203/varnish-websocket-ba517d22a805) with `traffics` and `crashes`.
- [ ] Add blackbox exporter for our public services.
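
The Stability items on resource limits, replica counts and recommended labels could be checked automatically. What follows is a minimal Python sketch, assuming Deployment manifests have already been parsed into dicts (for example from `kubectl get deployment NAME -o json`). The required label subset, the choice to demand both `cpu` and `memory` limits, and the `single-replica` opt-out label are hypothetical choices, not an existing convention of this repo.

```python
"""Hypothetical lint sketch for the Stability checklist items."""

# Assumed subset of the Kubernetes recommended labels that every Deployment must carry.
REQUIRED_LABELS = ("app.kubernetes.io/name", "app.kubernetes.io/instance")

# Hypothetical opt-out label for apps that cannot run with two replicas.
SINGLE_REPLICA_LABEL = "single-replica"


def lint_deployment(manifest):
    """Return a list of human-readable problems found in a Deployment dict."""
    problems = []
    metadata = manifest.get("metadata", {})
    labels = metadata.get("labels", {})
    name = metadata.get("name", "<unnamed>")

    # Recommended labels: report each required key that is missing.
    for label in REQUIRED_LABELS:
        if label not in labels:
            problems.append(f"{name}: missing label {label}")

    spec = manifest.get("spec", {})
    # Kubernetes defaults spec.replicas to 1 when it is omitted.
    replicas = spec.get("replicas", 1)
    if replicas < 2 and labels.get(SINGLE_REPLICA_LABEL) != "true":
        problems.append(f"{name}: only {replicas} replica(s)")

    # Resource limits: every container needs both a cpu and a memory limit.
    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    for container in containers:
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(
                    f"{name}/{container.get('name', '?')}: no {resource} limit"
                )
    return problems
```

A small wrapper that calls `json.load(sys.stdin)` on the output of `kubectl get deployment NAME -o json` and prints each returned problem would be enough to run this from CI or a cron job.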
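
For the default-deny ingress and PodDisruptionBudget items, here is a minimal sketch that generates both manifests from Python. It serializes them as one JSON `List` document, which `kubectl apply -f -` accepts on stdin. The namespace, app name and `minAvailable` default are hypothetical. Selecting pods by `app.kubernetes.io/name` assumes the recommended labels are already in place. `policy/v1` assumes Kubernetes 1.21 or newer; older clusters use `policy/v1beta1`.

```python
import json


def default_deny_ingress(namespace):
    """NetworkPolicy that selects every pod in `namespace` and allows no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress"],  # no ingress rules listed, so all ingress is denied
        },
    }


def pod_disruption_budget(app_name, namespace, min_available=1):
    """PodDisruptionBudget keeping at least `min_available` pods of an app up during drains."""
    return {
        "apiVersion": "policy/v1",  # assumption: Kubernetes >= 1.21
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app_name}-pdb", "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            # Assumes pods carry the recommended app.kubernetes.io/name label.
            "selector": {"matchLabels": {"app.kubernetes.io/name": app_name}},
        },
    }


def render(*manifests):
    """Serialize manifests as one JSON v1 List for `kubectl apply -f -`."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": list(manifests)}, indent=2)
```

For example, print `render(default_deny_ingress("demo"), pod_disruption_budget("web", "demo"))` and pipe the output into `kubectl apply -f -`. A `minAvailable` of 1 only lets a node drain proceed when the app runs at least two replicas, which matches the two-replica item above.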