Commit 40064cc

docs: Update v0.5.1 technical plan
1 parent 17eedc8 commit 40064cc

1 file changed

Lines changed: 330 additions & 12 deletions

docs/Release_v0.5.1_plan.md

@@ -1,14 +1,16 @@
11
# ExaMon v0.5.1 -- Technical Plan
22

33
This document captures the planned improvements for v0.5.1, building on the
4-
v0.5.0 Kubernetes migration. It covers two main areas:
4+
v0.5.0 Kubernetes migration. It covers three main areas:
55

66
1. **Data persistence, backup, and migration strategy** -- filling the gaps
77
identified in the v0.5.0 deployment.
88
2. **KairosDB HOCON ConfigMap overlay** -- replacing the current `sed`-patching
99
entrypoint with a cleaner, Kubernetes-native configuration model.
10+
3. **Documentation gap analysis (README parity)** -- ensuring every operation
11+
documented in the Docker Compose `README.md` has a complete K8s equivalent.
1012

11-
Both items are documented here as a technical plan; implementation follows in a
13+
All items are documented here as a technical plan; implementation follows in a
1214
subsequent phase.
1315

1416
---
@@ -657,17 +659,10 @@ ingress → grafana (3000), examon-server (5000)
657659

658660
---
659661

660-
## 4. Implementation Phases
662+
## 4. Implementation Phases (Original)
661663

662-
| Phase | Items | Depends On | Effort |
663-
|-------|-------|-----------|--------|
664-
| **Phase 1** | Cassandra Medusa backups, PV reclaim policy docs, Mosquitto persistence fix | -- | Medium |
665-
| **Phase 2** | KairosDB HOCON ConfigMap overlay, updated Dockerfile | -- | Medium |
666-
| **Phase 3** | Grafana dashboard-as-code + API backup script | -- | Medium |
667-
| **Phase 4** | Security contexts on all workloads | Phase 2 (KairosDB readOnlyRootFilesystem) | Medium |
668-
| **Phase 5** | PDBs, image versioning, expanded migration docs | -- | Low-Medium |
669-
| **Phase 6** | Observability (ServiceMonitors, alert rules, dashboards) | Phase 3 (Grafana provisioning) | Medium |
670-
| **Phase 7** | HPAs, NetworkPolicies | Phase 6 (metrics for HPA decisions) | Medium |
664+
See [section 7](#7-implementation-phases-updated) for the updated phase plan
665+
that includes documentation gap items from section 6.
671666

672667
---
673668

@@ -684,3 +679,326 @@ ingress → grafana (3000), examon-server (5000)
684679
backup/restore operator for Cassandra
685680
- [Grafana sidecar provisioning](https://github.com/grafana/helm-charts/tree/main/charts/grafana#sidecar-for-dashboards) --
686681
dashboard-as-code via ConfigMap sidecar
682+
683+
---
684+
685+
## 6. Documentation Gap Analysis (README vs. K8s Docs)
686+
687+
A systematic comparison of the Docker Compose `README.md` against the full
688+
Kubernetes documentation identified six areas where the K8s docs do not cover
689+
functionality that users expect based on the README. These must be addressed
690+
so that every operation documented for Docker Compose has a clear, complete
691+
K8s equivalent.
692+
693+
### 6.1 Gap Overview
694+
695+
| # | README Feature | K8s Coverage | Severity | Environments |
696+
|---|----------------|-------------|----------|--------------|
697+
| D1 | Grafana dashboard import/provisioning | Not covered | **HIGH** | All |
698+
| D2 | Plugin management (start/stop/restart/status/logs) | Partially covered | **MEDIUM** | All |
699+
| D3 | Grafana first login & datasource setup walkthrough | Auto-provisioned but undocumented | **MEDIUM** | Mostly local |
700+
| D4 | Plugin `.conf` files replaced by Helm values | Not explained | **LOW** | All |
701+
| D5 | Custom volume paths / data persistence model | Partially covered | **LOW** | Production |
702+
| D6 | Log rotation and retention configuration | Not covered | **LOW** | Production |
703+
704+
### 6.2 D1 — Grafana Dashboard Import and Provisioning
705+
706+
**README reference:**
707+
708+
> "To import the dashboards stored in the `dashboards/` folder:
709+
> [Import dashboard](https://grafana.com/docs/grafana/latest/dashboards/export-import/#import-dashboard)
710+
> To test the installation, you can import the `Examon Test - Random Sensor.json` dashboard."
711+
712+
**Current K8s state:**
713+
714+
- The `values.yaml` auto-provisions the KairosDB **datasource** via
715+
`grafana.datasources.datasources.yaml`, which is correctly handled.
716+
- The `dashboards/Examon Test - Random Sensor.json` file exists in the repo
717+
but is never referenced by any K8s template or documentation.
718+
- The upgrading guide (`upgrading.md` Step 4) only says "Re-import Grafana
719+
dashboards through the Grafana UI or API" with no further instructions.
720+
- This plan (section 1.2 GAP 2) already describes the Grafana sidecar
721+
dashboard-as-code strategy, but only as planned work, not as user-facing documentation.
722+
723+
**TODO — Documentation additions:**
724+
725+
- [ ] **D1a:** Add a "Grafana Dashboards" section to `kubernetes.md` covering:
726+
- Manual dashboard import via Grafana UI (same procedure as README, with
727+
K8s-specific URL — `http://localhost:3000` for local, or the Ingress URL
728+
for production).
729+
- API-based import (useful for scripting):
730+
```bash
731+
# Port-forward to Grafana (if not using NodePort/Ingress)
732+
kubectl port-forward svc/examon-grafana 3000:80 -n examon &
733+
734+
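# Import dashboard JSON; the API expects a {"dashboard": ...} wrapper (uses jq)
735+
curl -X POST -H "Content-Type: application/json" \
736+
-u "admin:${GRAFANA_PASSWORD:?set GRAFANA_PASSWORD first}" \
737+
-d @<(jq '{dashboard: (. + {id: null}), overwrite: true}' "dashboards/Examon Test - Random Sensor.json") \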
738+
http://localhost:3000/api/dashboards/db
739+
```
740+
- A note that the bundled `dashboards/Examon Test - Random Sensor.json`
741+
can be used to verify the full data pipeline after installation.
742+
743+
- [ ] **D1b:** (Implementation item, already tracked in section 1.2 GAP 2)
744+
Enable Grafana sidecar dashboard provisioning so that bundled dashboards
745+
are automatically loaded from ConfigMaps on deploy. This eliminates the
746+
manual import step for core dashboards. Required changes:
747+
- Set `grafana.sidecar.dashboards.enabled: true` and
748+
`grafana.sidecar.dashboards.label: grafana_dashboard` in `values.yaml`.
749+
- Create a ConfigMap template in the umbrella chart's `templates/` that
750+
embeds the `dashboards/*.json` files and carries the `grafana_dashboard` label.
751+
- Document: "Core dashboards are auto-provisioned on deploy. User-created
752+
dashboards must be imported manually or backed up via the API."
753+
754+
- [ ] **D1c:** Add environment-specific guidance:
755+
- **Local/Staging:** Manual UI import is acceptable for testing. The
756+
auto-provisioned test dashboard (after D1b) provides an out-of-the-box
757+
verification experience.
758+
- **Production:** Dashboard-as-code is recommended. User-created dashboards
759+
should be exported via the Grafana API and stored in version control.
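As a rough illustration of the ConfigMap that D1b asks for, the sketch below renders a labelled manifest from a directory of dashboard JSON files. The function name, the ConfigMap name `examon-dashboards`, and the label value `"1"` are assumptions made for this example, not names taken from the chart. File names are sanitized because ConfigMap keys may only contain alphanumerics, `-`, `_`, and `.`, which the bundled `Examon Test - Random Sensor.json` does not satisfy as-is.

```shell
#!/usr/bin/env bash
# Sketch only: render a ConfigMap manifest for the Grafana dashboard sidecar.
# Hypothetical names: render_dashboard_configmap, examon-dashboards, label value "1".
render_dashboard_configmap() {
  local dir="$1" name="${2:-examon-dashboards}" namespace="${3:-examon}"
  local file key

  printf 'apiVersion: v1\nkind: ConfigMap\nmetadata:\n'
  printf '  name: %s\n  namespace: %s\n' "$name" "$namespace"
  printf '  labels:\n    grafana_dashboard: "1"\ndata:\n'

  for file in "$dir"/*.json; do
    [ -e "$file" ] || continue   # directory without any JSON files
    # ConfigMap keys allow only [A-Za-z0-9._-]; replace everything else.
    key="$(basename "$file" | tr -c 'A-Za-z0-9._\n-' '_')"
    printf '  %s: |\n' "$key"
    # Indent the JSON so it becomes the body of a YAML block scalar.
    awk '{ print "    " $0 }' "$file"
  done
}
```

A possible use, assuming `kubectl` already points at the target cluster, is `render_dashboard_configmap dashboards | kubectl apply -f -`. In the chart itself the same result would come from a Helm template rather than a script.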
760+
761+
### 6.3 D2 — Plugin Management (Start/Stop/Restart/Status/Logs)
762+
763+
**README reference:**
764+
765+
The README has a large section on `supervisorctl` commands:
766+
- `docker exec -it <container> supervisorctl start/stop/restart/status/tail <plugin>`
767+
- Opening the supervisor shell for interactive management.
768+
- Editing `supervisor.conf` to set `autostart=True` (or `False`) so that enabling/disabling persists.
769+
770+
**Current K8s state:**
771+
772+
- `kubernetes.md` has a brief "Managing Plugins" section showing
773+
`--set random-pub.enabled=false` and `kubectl scale`.
774+
- The `kubernetes.md` "Logs" section shows `kubectl logs` commands.
775+
- No mapping table exists to help Docker Compose users translate their
776+
existing workflows.
777+
778+
**TODO — Documentation additions:**
779+
780+
- [ ] **D2a:** Expand the "Managing Plugins" section in `kubernetes.md` with a
781+
complete mapping table between Docker Compose `supervisorctl` operations and
782+
their Kubernetes equivalents:
783+
784+
| Operation | Docker Compose | Kubernetes |
785+
|-----------|---------------|------------|
786+
| **Start** a plugin | `docker exec -it examon supervisorctl start plugins:random_pub` | `kubectl scale deployment examon-random-pub --replicas=1 -n examon` |
787+
| **Stop** a plugin | `docker exec -it examon supervisorctl stop plugins:random_pub` | `kubectl scale deployment examon-random-pub --replicas=0 -n examon` |
788+
| **Restart** a plugin | `docker exec -it examon supervisorctl restart plugins:random_pub` | `kubectl rollout restart deployment/examon-random-pub -n examon` |
789+
| **Status** of all plugins | `docker exec -it examon supervisorctl status` | `kubectl get pods -n examon` |
790+
| **Tail logs** of a plugin | `docker exec -it examon supervisorctl tail -f plugins:random_pub` | `kubectl logs -f deployment/examon-random-pub -n examon` |
791+
| **Permanent enable** | Edit `supervisor.conf`, set `autostart=True` | Set `random-pub.enabled: true` in `values-<env>.yaml` + `helm upgrade` |
792+
| **Permanent disable** | Edit `supervisor.conf`, set `autostart=False` | Set `random-pub.enabled: false` in `values-<env>.yaml` + `helm upgrade` |
793+
| **Scale** a plugin | Not supported (single container) | `kubectl scale deployment examon-mqtt2kairosdb --replicas=3 -n examon` |
794+
795+
- [ ] **D2b:** Add a note explaining the architectural difference: in Docker
796+
Compose, all plugins run inside a single container managed by `supervisord`;
797+
in Kubernetes, each plugin is an independent Deployment with its own pod(s),
798+
enabling independent scaling, restarts, and resource limits.
799+
800+
- [ ] **D2c:** Document how to get a shell inside a plugin pod for debugging:
801+
```bash
802+
kubectl exec -it deployment/examon-examon-server -n examon -- /bin/bash
803+
```
804+
This is the K8s equivalent of `docker exec -it <container> bash`.
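To make the D2a mapping easier to try out, the following sketch wraps it in a small helper that prints the `kubectl` command for a given `supervisorctl`-style action. The function name `plugin_cmd` is hypothetical, and the `examon-<plugin>` Deployment naming is inferred from the examples in the table, so it may not hold for every release name.

```shell
#!/usr/bin/env bash
# Sketch only: translate a supervisorctl-style action into the kubectl
# command from the mapping table. It prints the command instead of running it.
# Assumption: Deployments are named examon-<plugin>, as in the table examples.
plugin_cmd() {
  local action="$1" plugin="${2:-}" ns="${3:-examon}"
  local deploy="deployment/examon-${plugin}"

  case "$action" in
    start)   echo "kubectl scale ${deploy} --replicas=1 -n ${ns}" ;;
    stop)    echo "kubectl scale ${deploy} --replicas=0 -n ${ns}" ;;
    restart) echo "kubectl rollout restart ${deploy} -n ${ns}" ;;
    status)  echo "kubectl get pods -n ${ns}" ;;
    tail)    echo "kubectl logs -f ${deploy} -n ${ns}" ;;
    *)       echo "unknown action: ${action}" >&2; return 1 ;;
  esac
}
```

For example, `plugin_cmd restart random-pub` prints the rollout command for the random publisher. Note that `start` and `stop` only change the live replica count; the next `helm upgrade` restores whatever the values files specify.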
805+
806+
### 6.4 D3 — Grafana First Login and Datasource Setup Walkthrough
807+
808+
**README reference:**
809+
810+
> "Log in to the Grafana server using your browser and the default credentials...
811+
> http://localhost:3000 ... add a new data source and select `KairosDB`.
812+
> Fill out the form with: Name: kairosdb, Url: http://kairosdb:8083, Access: Server"
813+
814+
**Current K8s state:**
815+
816+
- The KairosDB datasource is **auto-provisioned** via
817+
`grafana.datasources.datasources.yaml` in `values.yaml`. Users do NOT need
818+
to add it manually — but this is never stated.
819+
- Grafana plugins (KairosDB, plotly, piechart, etc.) are auto-installed via
820+
`grafana.plugins` in `values.yaml` — also not documented as a feature.
821+
- The `GF_PANELS_DISABLE_SANITIZE_HTML` setting is auto-configured via
822+
`grafana.env` — not documented.
823+
- No "first login" walkthrough exists for K8s users.
824+
- The production guide briefly mentions auto-provisioning and has a manual
825+
fallback, but local/staging docs skip this entirely.
826+
827+
**TODO — Documentation additions:**
828+
829+
- [ ] **D3a:** Add a "Grafana First Login" subsection to `kubernetes-local.md`
830+
(after the data pipeline verification step), covering:
831+
1. Open `http://localhost:3000` (for K3d) or the Ingress URL (production).
832+
2. Log in as `admin` with the password set via `--set grafana.adminPassword`
833+
(default: `Password` if not overridden).
834+
3. Note: the KairosDB datasource is **already configured** — no manual setup
835+
needed (unlike Docker Compose).
836+
4. Note: the KairosDB Grafana plugin and other required plugins are
837+
**auto-installed** during pod startup via the `grafana.plugins` list.
838+
5. Link to the dashboard import section (D1a) for importing test dashboards.
839+
840+
- [ ] **D3b:** Add a "What is auto-configured" callout in `kubernetes.md` listing
841+
all the things that are automatically set up (datasources, plugins, HTML
842+
sanitization) so users understand they don't need to replicate the README's
843+
manual configuration steps.
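A first-login walkthrough could also include a quick check that the datasource really is provisioned. The sketch below queries Grafana's `GET /api/datasources` endpoint and looks for a datasource by name. The environment variable names and the datasource name `kairosdb` are assumptions for illustration; the name actually provisioned by `values.yaml` may differ.

```shell
#!/usr/bin/env bash
# Sketch only: succeed if Grafana already has a datasource with the given name.
# Assumed (not from the plan): GRAFANA_URL, GRAFANA_USER, GRAFANA_PASSWORD.
grafana_has_datasource() {
  local name="$1"
  local url="${GRAFANA_URL:-http://localhost:3000}"
  local user="${GRAFANA_USER:-admin}"

  # GET /api/datasources returns a JSON array of the configured datasources.
  curl -fsS -u "${user}:${GRAFANA_PASSWORD:-}" "${url}/api/datasources" \
    | grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${name}\""
}
```

With a port-forward or Ingress in place, `grafana_has_datasource kairosdb && echo "datasource present"` gives a yes/no answer without opening the UI.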
844+
845+
### 6.5 D4 — Plugin Configuration Files Replaced by Helm Values
846+
847+
**README reference:**
848+
849+
> "It is necessary to define all the properties of the `.conf` configuration file
850+
> of the plugins with the appropriate values related to the server hosting the
851+
> framework. In particular, it is necessary to define the IP addresses and ports
852+
> of the server where the KairosDB and/or MQTT broker services run..."
853+
854+
Points users to `publishers/random_pub/random_pub.conf` and per-plugin READMEs.
855+
856+
**Current K8s state:**
857+
858+
- The `change-propagation.md` explains that `.conf` files are generated from
859+
Helm values via ConfigMap templates, and `configuration.md` lists all
860+
parameters. However, nowhere does the documentation explicitly say: "the
861+
old `.conf` files in `publishers/` and `web/examon-server/` are **not used**
862+
in K8s. All configuration is driven by Helm `values.yaml`."
863+
- A user coming from Docker Compose would naturally look for `.conf` files
864+
to edit and would be confused.
865+
866+
**TODO — Documentation additions:**
867+
868+
- [ ] **D4a:** Add a paragraph to `kubernetes.md` (in the "Configuration" or
869+
a new "Configuration Model" section) explicitly stating:
870+
871+
> In the Kubernetes deployment, plugin configuration files (`.conf`) are
872+
> **generated automatically** from Helm values and mounted as ConfigMaps.
873+
> Do not edit the `.conf` files in `publishers/` or `web/examon-server/`
874+
> directly — those files are only used by the Docker Compose deployment.
875+
> All configuration is managed via `values.yaml` and environment-specific
876+
> override files.
877+
878+
- [ ] **D4b:** Add a mapping reference of key old `.conf` fields to new Helm
879+
values, either in `configuration.md` or `upgrading.md`:
880+
881+
| Old field (`random_pub.conf`) | New Helm value |
882+
|-------------------------------|---------------|
883+
| `MQTT_BROKER` | `random-pub.config.mqttBroker` |
884+
| `MQTT_PORT` | `random-pub.config.mqttPort` |
885+
| `MQTT_TOPIC` | `random-pub.config.mqttTopic` |
886+
| `MQTT_USER` | `random-pub.config.mqttUser` |
887+
| `MQTT_PASSWORD` | `random-pub.config.mqttPassword` |
888+
| `NUM_SENSORS` | `random-pub.config.numSensors` |
889+
| `TS` (sample interval) | `random-pub.config.sampleInterval` |
890+
891+
Similar tables for `mqtt2kairosdb.conf` and `server.conf`.
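The mapping in D4b could be turned into a small migration aid. The sketch below converts an existing `random_pub.conf` into a Helm values override using the field names from the table. It assumes the file consists of simple `KEY = value` lines, which is a guess about the format, and the function name `conf_to_values` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: convert KEY = value lines into a Helm values override, using
# the D4b field mapping. Assumes a flat KEY = value format (unverified).
conf_to_values() {
  local conf="$1" key value helm_key

  printf 'random-pub:\n  config:\n'
  while IFS='=' read -r key value || [ -n "$key" ]; do
    key="${key//[[:space:]]/}"
    # Trim leading and trailing whitespace from the value.
    value="${value#"${value%%[![:space:]]*}"}"
    value="${value%"${value##*[![:space:]]}"}"

    case "$key" in
      MQTT_BROKER)   helm_key=mqttBroker ;;
      MQTT_PORT)     helm_key=mqttPort ;;
      MQTT_TOPIC)    helm_key=mqttTopic ;;
      MQTT_USER)     helm_key=mqttUser ;;
      MQTT_PASSWORD) helm_key=mqttPassword ;;
      NUM_SENSORS)   helm_key=numSensors ;;
      TS)            helm_key=sampleInterval ;;
      *)             continue ;;   # sections, comments, unmapped fields
    esac

    # Emit integers bare and everything else as a quoted string.
    if [[ "$value" =~ ^[0-9]+$ ]]; then
      printf '    %s: %s\n' "$helm_key" "$value"
    else
      printf '    %s: "%s"\n' "$helm_key" "$value"
    fi
  done < "$conf"
}
```

The output can be redirected to a file and passed to `helm upgrade` with `-f`. Values that contain double quotes are not escaped here, so the result should be reviewed before use.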
892+
893+
### 6.6 D5 — Data Persistence Model and Custom Volume Paths
894+
895+
**README reference:**
896+
897+
> "Two Docker volumes are created... `examon_cassandra_volume`, `examon_grafana_volume`.
898+
> To set a custom volume path, use `driver_opts` with `type: none`, `device: /path/...`"
899+
900+
**Current K8s state:**
901+
902+
- `configuration.md` documents `cassandra.datacenters.dc1.storageClass`,
903+
`grafana.persistence.enabled/size`, and `mosquitto.persistence.enabled/size`.
904+
- The production guide has a "Storage" section for Cassandra StorageClass.
905+
- Section 1 of this plan document covers backup/snapshot gaps.
906+
- However, there is no unified "Data Persistence" section explaining how K8s
907+
PVCs replace Docker volumes, what happens to data on pod restart vs.
908+
`helm uninstall` vs. cluster deletion, or how to specify custom storage
909+
paths.
910+
911+
**TODO — Documentation additions:**
912+
913+
- [ ] **D5a:** Add a "Data Persistence" section to `kubernetes.md` covering:
914+
- How PVCs replace Docker volumes (the K8s equivalent).
915+
- What data survives what operations:
916+
917+
| Event | Cassandra data | Grafana data |
918+
|-------|:---:|:---:|
919+
| Pod restart | Survives | Survives |
920+
| `helm upgrade` | Survives | Survives |
921+
| `helm uninstall` | Depends on reclaim policy | Depends on reclaim policy |
922+
| K3d cluster delete | **Lost** (local only) | **Lost** (local only) |
923+
| Node failure (production) | Survives (replicas) | Survives (if PVC on shared storage) |
924+
925+
- Environment-specific behavior:
926+
- **Local (K3d):** Uses `local-path` provisioner. Data persists across
927+
pod restarts but is lost when the K3d cluster is deleted.
928+
- **Staging (K3d):** Same as local; data is ephemeral to the cluster.
929+
- **Production:** Set `storageClass` to match infrastructure
930+
(e.g., `cinder-ssd`, `gp3`, `longhorn`). Use `reclaimPolicy: Retain`
931+
for critical data (see section 1.2 GAP 4).
932+
933+
- [ ] **D5b:** Add the K8s equivalent of "custom volume path": explain how to
934+
use a `hostPath` or `local` PersistentVolume (with a matching StorageClass) for bare-metal
935+
deployments where data must reside on a specific disk/partition.
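For D5b, a `local` PersistentVolume pinned to one node is the closest analogue to the Docker `driver_opts` device path. The sketch below renders such a manifest. The function name, the default size, and the `local-storage` class name are placeholders chosen for this example, and the reclaim policy is set to `Retain` in line with the production guidance above.

```shell
#!/usr/bin/env bash
# Sketch only: render a node-pinned "local" PersistentVolume manifest.
# Hypothetical defaults: 50Gi capacity and a StorageClass named local-storage.
render_local_pv() {
  local name="$1" path="$2" node="$3"
  local size="${4:-50Gi}" storage_class="${5:-local-storage}"

  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}
spec:
  capacity:
    storage: ${size}
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ${storage_class}
  local:
    path: ${path}
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - ${node}
EOF
}
```

A call such as `render_local_pv cassandra-data /mnt/disks/ssd1 node-1 | kubectl apply -f -` would register the volume; the path and node name here are illustrative. The workload's `storageClass` value then needs to reference the same class so that its PVC binds to this volume.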
936+
937+
### 6.7 D6 — Log Rotation and Retention
938+
939+
**README reference (via docker-compose.yml):**
940+
941+
Each service in `docker-compose.yml` specifies logging configuration:
942+
```yaml
943+
logging:
944+
driver: json-file
945+
options:
946+
max-size: "10m"
947+
max-file: "1"
948+
```
949+
950+
**Current K8s state:**
951+
952+
- The `kubernetes.md` "Logs" section shows `kubectl logs` commands but says
953+
nothing about log rotation, retention, or aggregation.
954+
955+
**TODO — Documentation additions:**
956+
957+
- [ ] **D6a:** Add a note to the "Logs" section of `kubernetes.md` explaining:
958+
- In Kubernetes, container log rotation is handled per node by the
959+
**kubelet** (for CRI runtimes such as containerd/CRI-O), not by individual
960+
services. The closest equivalents of the Docker Compose `json-file` options
961+
are the kubelet settings `containerLogMaxSize` and `containerLogMaxFiles`.
962+
- K3d default: each container log is rotated at ~10MB (kubelet default).
963+
- Production recommendation: the default kubelet rotation is sufficient
964+
for cluster-level operations, but for persistent, searchable logs,
965+
deploy a **log aggregation stack**:
966+
- **Loki + Promtail + Grafana** (lightweight, integrates with existing
967+
Grafana — recommended for ExaMon)
968+
- **EFK** (Elasticsearch + Fluentd + Kibana) — heavier but more mature
969+
- **Cloud-native** (CloudWatch, GCP Logging, Azure Monitor) — if on
970+
managed K8s
971+
972+
- [ ] **D6b:** For production, add a brief note in `kubernetes-production.md`
973+
recommending a log retention strategy and linking to the Logs section.
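Where operators want node-level rotation to resemble the old Docker Compose limits, the relevant knobs are the kubelet's `containerLogMaxSize` and `containerLogMaxFiles`. The sketch below translates Compose `max-size` / `max-file` values into a `KubeletConfiguration` fragment. The function name is hypothetical, and the clamp to a minimum of two files reflects the assumption that the kubelet rejects lower values, so `max-file: "1"` has no exact counterpart.

```shell
#!/usr/bin/env bash
# Sketch only: map Compose json-file options to a KubeletConfiguration fragment.
compose_logging_to_kubelet() {
  local max_size="$1" max_file="$2" unit number

  # Compose sizes use k/m/g suffixes; Kubernetes quantities use Ki/Mi/Gi.
  case "$max_size" in
    *[kK]) unit=Ki ;;
    *[mM]) unit=Mi ;;
    *[gG]) unit=Gi ;;
    *)     unit="" ;;   # plain byte count
  esac
  number="${max_size%[kKmMgG]}"

  # Assumption: containerLogMaxFiles must be at least 2.
  if [ "$max_file" -lt 2 ]; then
    max_file=2
  fi

  cat <<EOF
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: ${number}${unit}
containerLogMaxFiles: ${max_file}
EOF
}
```

Calling `compose_logging_to_kubelet 10m 1` yields a fragment for the settings shown in the `docker-compose.yml` excerpt. How that fragment reaches the kubelet depends on the distribution (config file or kubelet arguments) and is outside the scope of this sketch.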
974+
975+
### 6.8 Implementation Priority
976+
977+
| Priority | TODO | Effort | Depends On |
978+
|----------|------|--------|-----------|
979+
| **P0** | D1a (dashboard import docs) | Low | — |
980+
| **P0** | D2a (plugin management mapping table) | Low | — |
981+
| **P0** | D3a (Grafana first login) | Low | — |
982+
| **P1** | D1b (dashboard-as-code provisioning) | Medium | Grafana sidecar config |
983+
| **P1** | D4a, D4b (conf file replacement docs) | Low | — |
984+
| **P1** | D5a (persistence model docs) | Low | — |
985+
| **P2** | D1c (per-environment dashboard guidance) | Low | D1a |
986+
| **P2** | D2b, D2c (architecture note, exec shell) | Low | — |
987+
| **P2** | D3b (auto-configured callout) | Low | — |
988+
| **P2** | D5b (custom volume paths for bare metal) | Low | — |
989+
| **P2** | D6a, D6b (logging docs) | Low | — |
990+
991+
---
992+
993+
## 7. Implementation Phases (Updated)
994+
995+
| Phase | Items | Depends On | Effort |
996+
|-------|-------|-----------|--------|
997+
| **Phase 1** | Cassandra Medusa backups, PV reclaim policy docs, Mosquitto persistence fix | — | Medium |
998+
| **Phase 2** | KairosDB HOCON ConfigMap overlay, updated Dockerfile | — | Medium |
999+
| **Phase 3** | Grafana dashboard-as-code + API backup script | — | Medium |
1000+
| **Phase 4** | Documentation gaps D1–D6 (README parity) | D1b depends on Phase 3 | Low-Medium |
1001+
| **Phase 5** | Security contexts on all workloads | Phase 2 (readOnlyRootFilesystem) | Medium |
1002+
| **Phase 6** | PDBs, image versioning, expanded migration docs | — | Low-Medium |
1003+
| **Phase 7** | Observability (ServiceMonitors, alert rules, dashboards) | Phase 3 (Grafana provisioning) | Medium |
1004+
| **Phase 8** | HPAs, NetworkPolicies | Phase 7 (metrics for HPA decisions) | Medium |
