chore: replace control-plane promtail with alloy#595

Open
ma-hartma wants to merge 12 commits into master from promtail-alloy-migration-control-plane

Conversation

Contributor

@ma-hartma ma-hartma commented May 4, 2026

Description

Replaces promtail with alloy in the control-plane.
The partition replacement is treated separately in #592 and should go first.

Question Marks:

After successful replacement

  • update monitoring docs for the website
  • migration guide
    • `alloy convert` does not work with the helm chart
  • deprecate promtail roles

Manually tested in the mini-lab:

  • metrics are pushed to thanos, accessible via Grafana
  • labels work as documented and as before
  • cutover migration from promtail to alloy in the mini-lab including cleanup
  • deprecation warnings show up
  • Gardener Logging (do not know how to test this, maybe we can do it together @simcod?)

References:

Closes #552

Used AI-Tools ✨

  • Claude Sonnet 4.6 for docs and migration guide generation and jinja templating

Release Notes

Required Actions

The `logging` and `gardener-logging` roles now support Grafana Alloy as the log collector. Promtail continues to run by default after upgrading — no breakage on upgrade — but **a deprecation warning will fire on every Ansible run** until you migrate. Promtail support will be removed in a future release, so plan your migration soon.

Follow the migration instructions:
- https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/logging/README.md#migration-from-promtail
- https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/gardener-logging/README.md#migration-from-promtail

Several migration scenarios are supported — from a hard cutover to a parallel run for verification before switching over. Note that running both simultaneously will produce duplicate log entries in Loki during the transition window.

Noteworthy

The `logging` and `gardener-logging` roles now use Grafana Alloy as the log collector instead of Promtail. Kubernetes events are collected natively without a separate event-exporter pod. Log positions are persisted across pod restarts to avoid duplicate log shipments. Alloy self-metrics are push-based via `prometheus.remote_write` — no ServiceMonitor required. New `*_promtail_migrate_cleanup` flags automate Helm release removal after cutover. Labels are identical to Promtail's defaults.

@metal-robot metal-robot Bot added this to Development May 4, 2026
@ma-hartma ma-hartma requested a review from simcod May 4, 2026 20:38
@ma-hartma ma-hartma changed the title from "feat: introduce alloy role to replace promtail in control-plane" to "feat: replace control-plain promtail with alloy" May 5, 2026
@ma-hartma ma-hartma changed the title from "feat: replace control-plain promtail with alloy" to "feat: replace control-plane promtail with alloy" May 5, 2026
@ma-hartma ma-hartma force-pushed the promtail-alloy-migration branch 2 times, most recently from 78c5ad6 to 04731eb Compare May 6, 2026 13:43
@ma-hartma ma-hartma changed the title from "feat: replace control-plane promtail with alloy" to "chore: replace control-plane promtail with alloy" May 6, 2026
@ma-hartma ma-hartma changed the base branch from promtail-alloy-migration to master May 8, 2026 06:23
@ma-hartma ma-hartma force-pushed the promtail-alloy-migration-control-plane branch from 9182a67 to 413e4cc Compare May 8, 2026 06:36
@ma-hartma ma-hartma force-pushed the promtail-alloy-migration-control-plane branch from 413e4cc to 8ef1443 Compare May 8, 2026 13:07
@ma-hartma ma-hartma marked this pull request as ready for review May 8, 2026 13:09
@ma-hartma ma-hartma requested review from a team as code owners May 8, 2026 13:09
@ma-hartma ma-hartma force-pushed the promtail-alloy-migration-control-plane branch from 4d6686b to 8597f82 Compare May 8, 2026 13:43
Comment thread control-plane/roles/logging/templates/alloy-config.alloy.j2
Comment thread control-plane/roles/logging/defaults/main.yaml Outdated
@ma-hartma ma-hartma marked this pull request as draft May 19, 2026 07:47
@ma-hartma ma-hartma force-pushed the promtail-alloy-migration-control-plane branch from 1040276 to d92490b Compare May 19, 2026 11:16
@ma-hartma ma-hartma marked this pull request as ready for review May 19, 2026 12:12

rule {
  target_label = "job"
  replacement  = "monitoring/event-exporter"
Contributor Author

The event-exporter used with promtail labelled its events with job=monitoring/event-exporter.
I therefore relabelled the logs from job=kubernetes-events so that they also match the legacy job name.
I haven't found any usage in metal-roles, but other users may rely on it.
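
For readers less familiar with Alloy relabeling, the rule discussed in this thread could sit in a pipeline roughly like the one below. This is only a sketch: the component labels (`events`, `legacy_job`, `default`) and the Loki URL are made up for illustration and are not taken from the PR's template; only the `rule` body and the two job names come from the thread.

```alloy
// Collect Kubernetes events natively; job_name mirrors the label mentioned above.
loki.source.kubernetes_events "events" {
  job_name   = "kubernetes-events"
  forward_to = [loki.relabel.legacy_job.receiver]
}

// Overwrite the job label with the value the former event-exporter setup produced.
// With no source_labels set, the default "replace" action writes the static value.
loki.relabel "legacy_job" {
  forward_to = [loki.write.default.receiver]

  rule {
    target_label = "job"
    replacement  = "monitoring/event-exporter"
  }
}

// Hypothetical Loki endpoint, for illustration only.
loki.write "default" {
  endpoint {
    url = "http://loki.example.svc.cluster.local:3100/loki/api/v1/push"
  }
}
```

Existing queries that filter on `job="monitoring/event-exporter"` would then presumably keep matching after the cutover.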

  logging_ingress_dns: "loki.{{ metal_control_plane_ingress_dns }}"
  logging_ingress_loki_tls: yes
- logging_ingress_loki_basic_auth_user: promtail
+ logging_ingress_loki_basic_auth_user: promtail # TODO rename to alloy or generic
Contributor Author

I'll leave the old username for now.

# Each entry: {url, remote_timeout?: duration, basic_auth?: {username, password}}
# Thanos Receive endpoint is included automatically when monitoring_thanos_receive_enabled: true.
logging_alloy_prometheus_write_endpoints: >-
  {{ [{'url': 'http://thanos-receive.' ~ logging_namespace ~ '.svc.cluster.local:19291/api/v1/receive'}]
Contributor Author

In the shooted seeds, we do not have an in-cluster prometheus. Here we could actually use ServiceMonitors but I decided against it, because we would introduce a dependency on prometheus already existing before deploying the helm chart.
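
As a rough illustration of the push-based alternative to ServiceMonitors described here, Alloy can scrape its own metrics and forward them via `prometheus.remote_write`. The sketch below is an assumption about how such a pipeline might look, not the PR's actual template: the component labels (`alloy`, `thanos`) are invented, and the `monitoring` namespace in the URL is a placeholder for whatever `logging_namespace` resolves to.

```alloy
// Expose Alloy's own metrics as scrape targets.
prometheus.exporter.self "alloy" { }

// Scrape those targets in-process and hand the samples to remote_write.
prometheus.scrape "alloy" {
  targets    = prometheus.exporter.self.alloy.targets
  forward_to = [prometheus.remote_write.thanos.receiver]
}

// Push to Thanos Receive; the namespace in this URL is a placeholder.
prometheus.remote_write "thanos" {
  endpoint {
    url = "http://thanos-receive.monitoring.svc.cluster.local:19291/api/v1/receive"
  }
}
```

Because the metrics are pushed, nothing in this pipeline requires Prometheus to exist before the chart is deployed.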

Development

Successfully merging this pull request may close these issues.

Introduce Alloy as promtail Replacement

1 participant