Commit 70e042e

feat: TFO-Agent inventory for K8S Cluster provider
1 parent 8b630ef commit 70e042e

4 files changed

Lines changed: 192 additions & 20 deletions

CHANGELOG.md

Lines changed: 23 additions & 13 deletions
@@ -24,7 +24,7 @@ All notable changes to TelemetryFlow Agent will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.1/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [1.1.7] - 2026-03-05
+## [1.1.7] - 2026-03-08

 ### Added

@@ -33,6 +33,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   - **Fingerprint components**: `NODE_NAME` (Kubernetes Downward API, highest priority) + OS `HostID` (`/etc/machine-id` on Linux, hardware UUID on macOS) + hostname — joined and hashed with a fixed TelemetryFlow namespace UUID
   - Same agent UUID is produced on every restart as long as the underlying host/node identity is unchanged
   - Logs `Derived stable agent ID from host fingerprint` with component labels for observability
+- **Kubernetes Provider Detection (`internal/collector/system/host.go`)**: Host collector now detects the Kubernetes distribution/provider and exposes it in `SystemInfo`
+  - New `detectK8sProvider()` function covers all 15 provider types matching `K8sProviderEnum` on the platform backend
+    - **Managed cloud**: `eks` (AWS), `gke` (GCP), `aks` (Azure), `ack` (Alibaba Cloud), `cce` (Huawei Cloud) — detected from cloud-injected environment variables
+    - **OpenShift variants** (priority order): `microshift` → `openshift` → `okd` — detected via env vars and host filesystem paths
+    - **Lightweight/local distributions**: `k3s`, `rancher` (RKE/RKE2), `minikube`, `kind` — detected via `CATTLE_*` env vars and `/var/lib/rancher/*` host paths
+    - **Platform distributions**: `kubesphere` — detected via `KUBESPHERE_NAMESPACE` env var
+    - **Generic fallback**: `self-managed` when `KUBERNETES_SERVICE_HOST` is set but no specific distribution is identified
+  - Host filesystem paths checked both directly and under `TELEMETRYFLOW_HOST_ROOT` prefix — detection works correctly inside DaemonSet containers
+  - Returns `(false, "")` when not running in a Kubernetes environment at all
+  - New `IsKubernetes bool` and `K8sProvider string` fields added to `collector.SystemInfo` struct

 ### Changed
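
The fingerprint bullets above describe a name-based (version 5) UUID: the components are joined and hashed together with a fixed namespace UUID. A minimal sketch of that idea using only the standard library follows. The `|` separator, the helper names, and the use of the RFC 4122 DNS namespace as a stand-in are assumptions for illustration; the agent's actual namespace UUID and join format are not shown in this diff.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"strings"
)

// exampleNamespace is a stand-in (the RFC 4122 DNS namespace UUID).
// The real TelemetryFlow namespace UUID is not shown in the diff.
var exampleNamespace = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuidV5 derives a version 5 UUID: SHA-1 over namespace bytes followed by
// the name, truncated to 16 bytes, with version and variant bits set.
func uuidV5(namespace [16]byte, name string) string {
	h := sha1.New()
	h.Write(namespace[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

// fingerprint joins the non-empty components in priority order.
// The "|" separator is an assumption made for this sketch.
func fingerprint(nodeName, hostID, hostname string) string {
	var parts []string
	for _, p := range []string{nodeName, hostID, hostname} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "|")
}

func main() {
	name := fingerprint("node-a", "machine-id-123", "host-a")
	// The same inputs always yield the same ID, which is what makes the
	// agent identity stable across restarts.
	fmt.Println(uuidV5(exampleNamespace, name))
	fmt.Println(uuidV5(exampleNamespace, name))
}
```

Because the ID is a pure function of the inputs, any change to a component (for example a regenerated `/etc/machine-id`) produces a different agent ID.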
3848

@@ -390,18 +400,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
390400

391401
## Version History
392402

393-
| Version | Date | OTEL SDK | Description |
394-
| ------- | ---------- | -------- | --------------------------------------------------------------------------------------------------------- |
395-
| 1.1.7 | 2026-03-05 | v1.40.0 | Stable agent identity via UUIDv5 host fingerprint; fix SyncKubernetesState; K8s gopsutil host paths |
396-
| 1.1.6 | 2026-02-21 | v1.40.0 | Go 1.25.7, OTEL SDK v1.40.0, build-tag lint fixes, errcheck/staticcheck cleanup |
397-
| 1.1.5 | 2026-02-19 | v1.39.0 | Docker container collector, cAdvisor scraper, CPU fix macOS, tags/labels propagation |
398-
| 1.1.4 | 2026-02-11 | v1.39.0 | eBPF collector (28 metrics), Cilium Hubble integration, 6 BPF programs, kernel-level observability |
399-
| 1.1.3 | 2026-02-04 | v1.39.0 | Network retransmit metrics, container name/image detection, page faults, IOPS, system calls |
400-
| 1.1.2 | 2026-01-03 | v1.39.0 | OSS observability (SigNoz, Coroot, HyperDX, OpenObserve, Netdata), APM (Dynatrace, Instana, ManageEngine) |
401-
| 1.1.1 | 2024-12-29 | v1.39.0 | Enterprise integrations (GCP, Azure, Alibaba, Proxmox, VMware, Nutanix, Cisco, SNMP, MQTT, eBPF) |
402-
| 1.1.0 | 2024-12-27 | v1.39.0 | OTEL SDK standardization, aligned with TFO-Go-SDK & TFO-Collector |
403-
| 1.0.1 | 2024-12-17 | - | Docker workflow, SBOM, multi-platform support |
404-
| 1.0.0 | 2024-12-17 | - | Initial release |
403+
| Version | Date | OTEL SDK | Description |
404+
| ------- | ---------- | -------- | ----------------------------------------------------------------------------------------------------------------- |
405+
| 1.1.7 | 2026-03-08 | v1.40.0 | Stable agent identity via UUIDv5 host fingerprint; K8s provider detection (15 providers); fix SyncKubernetesState |
406+
| 1.1.6 | 2026-02-21 | v1.40.0 | Go 1.25.7, OTEL SDK v1.40.0, build-tag lint fixes, errcheck/staticcheck cleanup |
407+
| 1.1.5 | 2026-02-19 | v1.39.0 | Docker container collector, cAdvisor scraper, CPU fix macOS, tags/labels propagation |
408+
| 1.1.4 | 2026-02-11 | v1.39.0 | eBPF collector (28 metrics), Cilium Hubble integration, 6 BPF programs, kernel-level observability |
409+
| 1.1.3 | 2026-02-04 | v1.39.0 | Network retransmit metrics, container name/image detection, page faults, IOPS, system calls |
410+
| 1.1.2 | 2026-01-03 | v1.39.0 | OSS observability (SigNoz, Coroot, HyperDX, OpenObserve, Netdata), APM (Dynatrace, Instana, ManageEngine) |
411+
| 1.1.1 | 2024-12-29 | v1.39.0 | Enterprise integrations (GCP, Azure, Alibaba, Proxmox, VMware, Nutanix, Cisco, SNMP, MQTT, eBPF) |
412+
| 1.1.0 | 2024-12-27 | v1.39.0 | OTEL SDK standardization, aligned with TFO-Go-SDK & TFO-Collector |
413+
| 1.0.1 | 2024-12-17 | - | Docker workflow, SBOM, multi-platform support |
414+
| 1.0.0 | 2024-12-17 | - | Initial release |
405415

406416
## Upgrade Guide
407417

deploy/kubernetes/daemonset.yaml

Lines changed: 6 additions & 0 deletions
@@ -148,6 +148,12 @@ spec:
 - name: TELEMETRYFLOW_PROMETHEUS_PORT
   value: "8888"

+# Host filesystem root — used by cloud/cluster provider auto-detection heuristics
+# (checks /var/lib/rancher/rke2, /var/lib/rancher/k3s, /sys/class/dmi/id/*, etc.)
+# Must match the mountPath of the 'root' hostPath volume below.
+- name: TELEMETRYFLOW_HOST_ROOT
+  value: /host/root
+
 # Cluster and environment tags for OTEL resource attributes
 - name: CLUSTER_NAME
   value: "" # override with your cluster name, or auto-detected from hostname

internal/collector/collector.go

Lines changed: 2 additions & 0 deletions
@@ -239,6 +239,8 @@ type SystemInfo struct {
 	ContainerImage     string `json:"containerImage,omitempty"`
 	IsVirtualized      bool   `json:"isVirtualized,omitempty"`
 	VirtualizationType string `json:"virtualizationType,omitempty"` // kvm, vmware, xen, etc.
+	IsKubernetes       bool   `json:"isKubernetes,omitempty"`
+	K8sProvider        string `json:"k8sProvider,omitempty"` // eks, gke, aks, ack, cce, k3s, kind, minikube, rancher, openshift, okd, microshift, kubesphere, self-managed
 	CloudProvider      string `json:"cloudProvider,omitempty"` // aws, gcp, azure, etc.
 	CloudInstanceID    string `json:"cloudInstanceId,omitempty"`
 	CloudInstanceType  string `json:"cloudInstanceType,omitempty"`
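
Both new fields carry `omitempty`, so they disappear from the JSON payload when the agent is not running in Kubernetes. The sketch below illustrates that behaviour with a cut-down stand-in type; `k8sInfo` and `encode` are hypothetical names, and the real `SystemInfo` struct has many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// k8sInfo is a hypothetical cut-down type that mirrors only the two
// fields added to SystemInfo in this commit.
type k8sInfo struct {
	IsKubernetes bool   `json:"isKubernetes,omitempty"`
	K8sProvider  string `json:"k8sProvider,omitempty"`
}

// encode marshals the value and returns the JSON text.
func encode(v k8sInfo) string {
	b, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Outside Kubernetes both fields are zero values and are omitted.
	fmt.Println(encode(k8sInfo{})) // {}

	// Inside a detected cluster both fields are emitted.
	fmt.Println(encode(k8sInfo{IsKubernetes: true, K8sProvider: "eks"}))
	// {"isKubernetes":true,"k8sProvider":"eks"}
}
```

A consumer therefore cannot distinguish "not Kubernetes" from "field not reported by an older agent"; both appear as a missing key.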

internal/collector/system/host.go

Lines changed: 161 additions & 7 deletions
@@ -852,6 +852,9 @@ func (c *HostCollector) GetSystemInfo() (*collector.SystemInfo, error) {

 	info.IsVirtualized, info.VirtualizationType = detectVirtualization()

+	// Kubernetes provider detection
+	info.IsKubernetes, info.K8sProvider = detectK8sProvider()
+
 	// Cloud metadata
 	info.CloudProvider, info.CloudInstanceID, info.CloudInstanceType,
 		info.CloudRegion, info.CloudZone = detectCloudMetadata()
@@ -887,6 +890,109 @@ func parseUint64(s string) uint64 {
 	return v
 }

+// detectK8sProvider detects whether the agent is running in a Kubernetes
+// environment and returns the specific distribution/provider. The returned
+// provider matches K8sProviderEnum values on the platform backend:
+// eks, gke, aks, ack, cce, k3s, kind, minikube, rancher, openshift, okd,
+// microshift, kubesphere, self-managed. Returns (false, "") when no
+// Kubernetes environment is detected.
+//
+// Detection priority:
+// 1. Managed cloud providers (env vars injected by the cloud control plane)
+// 2. OpenShift variants (MicroShift → OpenShift → OKD)
+// 3. Lightweight/local distributions (k3s → Rancher → minikube → KIND)
+// 4. Platform distributions (KubeSphere)
+// 5. Generic in-cluster fallback via KUBERNETES_SERVICE_HOST
+func detectK8sProvider() (isK8s bool, provider string) {
+	// hostRoot is the host filesystem mount point used when running as a
+	// DaemonSet (e.g. /hostfs). Falls back to empty string so that paths
+	// are checked directly when running outside a container.
+	hostRoot := os.Getenv("TELEMETRYFLOW_HOST_ROOT")
+
+	hostStat := func(path string) bool {
+		if _, err := os.Stat(path); err == nil {
+			return true
+		}
+		if hostRoot != "" {
+			if _, err := os.Stat(hostRoot + path); err == nil {
+				return true
+			}
+		}
+		return false
+	}
+
+	// === Managed Cloud Providers ===
+
+	// EKS (Amazon Elastic Kubernetes Service)
+	if os.Getenv("AWS_REGION") != "" || os.Getenv("EKS_CLUSTER_NAME") != "" {
+		return true, "eks"
+	}
+	// GKE (Google Kubernetes Engine)
+	if os.Getenv("GOOGLE_CLOUD_PROJECT") != "" || os.Getenv("GKE_CLUSTER_NAME") != "" {
+		return true, "gke"
+	}
+	// AKS (Azure Kubernetes Service)
+	if os.Getenv("AKS_CLUSTER_NAME") != "" || os.Getenv("AZURE_SUBSCRIPTION_ID") != "" {
+		return true, "aks"
+	}
+	// ACK (Alibaba Cloud Container Service for Kubernetes)
+	if os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_ID") != "" || os.Getenv("ACK_CLUSTER_ID") != "" {
+		return true, "ack"
+	}
+	// CCE (Huawei Cloud Container Engine)
+	if os.Getenv("HUAWEICLOUD_SDK_TYPE") != "" || os.Getenv("CCE_CLUSTER_ID") != "" {
+		return true, "cce"
+	}
+
+	// === OpenShift Variants (MicroShift first — it's a subset of OpenShift) ===
+
+	if hostStat("/var/lib/microshift") {
+		return true, "microshift"
+	}
+	if os.Getenv("OPENSHIFT_BUILD_NAMESPACE") != "" || os.Getenv("OPENSHIFT_DEPLOYMENT_NAME") != "" ||
+		hostStat("/etc/openshift") {
+		return true, "openshift"
+	}
+	// OKD (community OpenShift)
+	if os.Getenv("OKD_CLUSTER") != "" || hostStat("/etc/okd") {
+		return true, "okd"
+	}
+
+	// === Lightweight / Local Distributions ===
+
+	// k3s (must be checked before generic Rancher — k3s lives under /var/lib/rancher/k3s)
+	if hostStat("/var/lib/rancher/k3s") {
+		return true, "k3s"
+	}
+	// Rancher (RKE / RKE2) — CATTLE_* vars are injected by Rancher into pods
+	if os.Getenv("CATTLE_CLUSTER_AGENT_PORT") != "" || os.Getenv("CATTLE_SERVER") != "" ||
+		hostStat("/var/lib/rancher/rke2") || hostStat("/var/lib/rancher") {
+		return true, "rancher"
+	}
+	// minikube
+	if os.Getenv("MINIKUBE_ACTIVE_DOCKERD") != "" || os.Getenv("MINIKUBE_HOME") != "" {
+		return true, "minikube"
+	}
+	// KIND (Kubernetes IN Docker)
+	if os.Getenv("KIND_CLUSTER_NAME") != "" {
+		return true, "kind"
+	}
+
+	// === Platform Distributions ===
+
+	if os.Getenv("KUBESPHERE_NAMESPACE") != "" {
+		return true, "kubesphere"
+	}
+
+	// === Generic in-cluster fallback ===
+	// KUBERNETES_SERVICE_HOST is injected by the kubelet into every pod.
+	if os.Getenv("KUBERNETES_SERVICE_HOST") != "" {
+		return true, "self-managed"
+	}
+
+	return false, ""
+}
+
 // detectContainer checks if running inside a container
 func detectContainer() bool {
 	// Check for Docker
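
Since `detectK8sProvider()` reads the process environment and the filesystem directly, its priority order is awkward to exercise in a unit test. One possible approach, sketched below, is to inject the two lookups and express the checks as an ordered rule table. The `rule` type, `detect`, `fromMap`, and `inSet` are hypothetical names, and the table contains only a subset of the checks from the diff.

```go
package main

import "fmt"

// rule is a hypothetical description of one provider check: the provider
// matches when any listed env var is non-empty or any listed path exists.
type rule struct {
	provider string
	envVars  []string
	paths    []string
}

// rules holds an ordered subset of the checks shown in the diff.
// Earlier entries win, mirroring the early returns in detectK8sProvider.
var rules = []rule{
	{provider: "eks", envVars: []string{"AWS_REGION", "EKS_CLUSTER_NAME"}},
	{provider: "microshift", paths: []string{"/var/lib/microshift"}},
	{provider: "k3s", paths: []string{"/var/lib/rancher/k3s"}},
	{
		provider: "rancher",
		envVars:  []string{"CATTLE_CLUSTER_AGENT_PORT", "CATTLE_SERVER"},
		paths:    []string{"/var/lib/rancher/rke2", "/var/lib/rancher"},
	},
	{provider: "self-managed", envVars: []string{"KUBERNETES_SERVICE_HOST"}},
}

// detect walks the rules in order using the injected lookups.
func detect(getenv func(string) string, exists func(string) bool) (bool, string) {
	for _, r := range rules {
		for _, e := range r.envVars {
			if getenv(e) != "" {
				return true, r.provider
			}
		}
		for _, p := range r.paths {
			if exists(p) {
				return true, r.provider
			}
		}
	}
	return false, ""
}

// fromMap and inSet build fake lookups for tests.
func fromMap(m map[string]string) func(string) string {
	return func(k string) string { return m[k] }
}

func inSet(s map[string]bool) func(string) bool {
	return func(p string) bool { return s[p] }
}

func main() {
	k3sNode := inSet(map[string]bool{"/var/lib/rancher": true, "/var/lib/rancher/k3s": true})

	// k3s is listed before rancher, so the more specific path wins.
	fmt.Println(detect(fromMap(nil), k3sNode))

	// Env-based cloud checks come first: the same node with AWS_REGION set
	// is classified as eks under this ordering.
	fmt.Println(detect(fromMap(map[string]string{"AWS_REGION": "eu-west-1"}), k3sNode))
}
```

The second call shows a consequence of the ordering worth keeping in mind: general-purpose variables such as `AWS_REGION` are checked before any distribution-specific path, so a workload that sets them for SDK use would be labelled `eks` regardless of the underlying distribution.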
@@ -1019,8 +1125,27 @@ func detectVirtualization() (bool, string) {

 // detectCloudMetadata attempts to detect cloud provider and instance metadata
 func detectCloudMetadata() (provider, instanceID, instanceType, region, zone string) {
+	// hostRoot is the mount point for the host filesystem inside the container.
+	// DaemonSet mounts host / → /host/root and sets TELEMETRYFLOW_HOST_ROOT=/host/root.
+	// k8s-collector Deployment mounts host / → /hostfs and sets TELEMETRYFLOW_HOST_ROOT=/hostfs.
+	// Falls back to empty string (direct paths) when running outside a container.
+	hostRoot := os.Getenv("TELEMETRYFLOW_HOST_ROOT")
+
+	// hostStat checks the path directly and, if that fails, prefixed by hostRoot.
+	hostStat := func(path string) bool {
+		if _, err := os.Stat(path); err == nil {
+			return true
+		}
+		if hostRoot != "" {
+			if _, err := os.Stat(hostRoot + path); err == nil {
+				return true
+			}
+		}
+		return false
+	}
+
 	// AWS detection
-	if _, err := os.Stat("/sys/hypervisor/uuid"); err == nil {
+	if hostStat("/sys/hypervisor/uuid") {
 		if data, err := os.ReadFile("/sys/hypervisor/uuid"); err == nil {
 			if strings.HasPrefix(strings.ToLower(string(data)), "ec2") {
 				provider = "aws"
@@ -1033,22 +1158,51 @@ func detectCloudMetadata() (provider, instanceID, instanceType, region, zone str
 	}

 	// GCP detection
-	if data, err := os.ReadFile("/sys/class/dmi/id/product_name"); err == nil {
-		if strings.Contains(strings.ToLower(string(data)), "google") {
-			provider = "gcp"
+	for _, p := range []string{"/sys/class/dmi/id/product_name", hostRoot + "/sys/class/dmi/id/product_name"} {
+		if data, err := os.ReadFile(p); err == nil {
+			if strings.Contains(strings.ToLower(string(data)), "google") {
+				provider = "gcp"
+			}
+			break
 		}
 	}
 	if os.Getenv("GOOGLE_CLOUD_PROJECT") != "" {
 		provider = "gcp"
 	}

 	// Azure detection
-	if data, err := os.ReadFile("/sys/class/dmi/id/sys_vendor"); err == nil {
-		if strings.Contains(strings.ToLower(string(data)), "microsoft") {
-			provider = "azure"
+	for _, p := range []string{"/sys/class/dmi/id/sys_vendor", hostRoot + "/sys/class/dmi/id/sys_vendor"} {
+		if data, err := os.ReadFile(p); err == nil {
+			if strings.Contains(strings.ToLower(string(data)), "microsoft") {
+				provider = "azure"
+			}
+			break
 		}
 	}

+	// Rancher / RKE / RKE2 detection
+	// CATTLE_* vars are injected by Rancher into pods in the cattle-system namespace.
+	// For other namespaces we rely on host filesystem heuristics via hostRoot.
+	if os.Getenv("CATTLE_CLUSTER_AGENT_PORT") != "" || os.Getenv("CATTLE_SERVER") != "" {
+		provider = "rancher"
+		return provider, instanceID, instanceType, region, zone
+	}
+	// k3s lives under /var/lib/rancher/k3s (check before generic rancher path)
+	if hostStat("/var/lib/rancher/k3s") {
+		provider = "k3s"
+		return provider, instanceID, instanceType, region, zone
+	}
+	// RKE2
+	if hostStat("/var/lib/rancher/rke2") {
+		provider = "rancher"
+		return provider, instanceID, instanceType, region, zone
+	}
+	// RKE1 — /var/lib/rancher exists but neither k3s nor rke2 sub-dirs
+	if hostStat("/var/lib/rancher") {
+		provider = "rancher"
+		return provider, instanceID, instanceType, region, zone
+	}
+
 	return provider, instanceID, instanceType, region, zone
 }