Consolidate alert notification system into alert_notifier role #9

@jackaltx

Current State

Three independent alert scripts are deployed manually to monitor11:

  • ispconfig-login-alert (email + Matrix) - alerts on ANY ISPConfig login
  • matrix-synapse-alert (Matrix only) - alerts on Matrix Synapse errors
  • fail2ban-alert (email only) - alerts on high ban rate

Deployment: Manual scp + systemd setup on monitor11

Location: Scripts in /usr/local/bin/, configs in /etc/{service}/, systemd units manually created

Desired State

Unified alert_notifier role in solti-ensemble that:

  • Deploys all alert scripts via Ansible templates
  • Shares common utilities (matrix-send.py, email helpers)
  • Is configurable per alert via role variables
  • Deploys idempotently, with proper systemd timer management
  • Is version controlled and repeatable
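
To make the shared-utility idea concrete, here is a minimal sketch of what the core of a shared Matrix sender could look like. It is not the existing matrix-send.py, whose internals are not shown in this issue. It builds the request for the Matrix client-server "send message event" endpoint using only the standard library (instead of requests) so the sketch stays self-contained. The function name and parameters are hypothetical.

```python
import json
import urllib.parse
import urllib.request
import uuid


def build_matrix_message_request(homeserver, token, room_id, text, txn_id=None):
    """Build (but do not send) a PUT request that posts a text message to a room.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # The transaction ID makes retries idempotent on the homeserver side.
    txn_id = txn_id or uuid.uuid4().hex
    url = "{}/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        homeserver.rstrip("/"),
        urllib.parse.quote(room_id, safe=""),
        urllib.parse.quote(txn_id, safe=""),
    )
    payload = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )
```

Sending would then be a call to `urllib.request.urlopen(request, timeout=10)`. Note that this endpoint expects a room ID (starting with `!`); a room alias such as `#solti-verify:jackaltx.com` would first have to be resolved to an ID, for example through the homeserver's room directory endpoint.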

Benefits

  • Repeatable deployments - Deploy to new monitoring servers easily
  • Easier to add new alerts - Template pattern for new alert types
  • Consistent configuration - All alerts use same variable structure
  • Version controlled - Track changes to alert logic over time
  • Testing - Molecule scenarios for alert deployment

Design Questions to Answer

  1. Shared utility strategy: Should matrix-send.py be a shared utility or per-alert?
  2. Templates vs. scripts: Should alerts be rendered from Jinja2 templates or shipped as separate Python scripts?
  3. Dependencies: How to handle alert-specific Python dependencies (requests, yaml)?
  4. State file management: Where to store state files, cleanup strategy?
  5. Alert toggles: Should alerts be independently enabled/disabled via variables?
  6. Credentials: How to pass Matrix tokens, SMTP passwords securely?
  7. Multiple targets: Support deploying different alerts to different hosts?
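
For question 4, one possible answer is a small JSON state file per alert that records which events were already reported and prunes old entries on every run. The sketch below is an assumption, not the current behavior of the scripts on monitor11: the function names, the 24-hour retention, and the idea of keying events by a string are all hypothetical.

```python
import json
import os
import tempfile
import time


def load_state(path):
    """Return the saved state, or an empty dict if the file is missing or corrupt."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_state(path, state):
    """Write the state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def filter_new_events(state, event_keys, now=None, retention_seconds=86400):
    """Return (unseen keys, updated state), dropping entries older than retention."""
    now = time.time() if now is None else now
    # Cleanup strategy: forget events older than the retention window.
    kept = {key: ts for key, ts in state.items() if now - ts < retention_seconds}
    fresh = []
    for key in event_keys:
        if key not in kept:
            kept[key] = now
            fresh.append(key)
    return fresh, kept
```

With this shape the role would only need to create one state directory per host (its location is still an open question) and no separate cleanup job, since pruning happens as part of each alert run.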

Proposed Architecture

solti-ensemble/roles/alert_notifier/
├── defaults/main.yml          # Default variables for all alerts
├── templates/
│   ├── matrix-send.py.j2      # Shared Matrix notification utility
│   ├── ispconfig-login-alert.py.j2
│   ├── matrix-synapse-alert.py.j2
│   ├── fail2ban-alert.py.j2
│   └── alert-wrapper.sh.j2    # Generic wrapper for env vars
├── tasks/
│   ├── main.yml               # Orchestrates alert deployment
│   ├── matrix-send.yml        # Deploy matrix-send.py utility
│   ├── ispconfig-alert.yml    # Deploy ISPConfig alert
│   ├── matrix-synapse-alert.yml
│   └── fail2ban-alert.yml
└── molecule/
    └── default/
        └── converge.yml       # Test alert deployment
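
Since alert-wrapper.sh.j2 is described as a generic wrapper for env vars, each Python alert could read its settings from the environment rather than having secrets rendered into the script itself. A minimal sketch of that side of the contract follows. The environment variable names and the `load_alert_config` helper are hypothetical; only the default Loki URL is taken from the usage example in this issue.

```python
import os


class ConfigError(Exception):
    """Raised when a required setting is missing from the environment."""


def load_alert_config(env=None):
    """Read alert settings from environment variables set by the wrapper.

    All variable names here are illustrative assumptions.
    """
    env = os.environ if env is None else env
    required = ("ALERT_MATRIX_HOMESERVER", "ALERT_MATRIX_TOKEN", "ALERT_MATRIX_ROOM")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ConfigError("missing environment variables: " + ", ".join(missing))
    return {
        "homeserver": env["ALERT_MATRIX_HOMESERVER"].rstrip("/"),
        "token": env["ALERT_MATRIX_TOKEN"],
        "room": env["ALERT_MATRIX_ROOM"],
        # Optional setting with a default matching the example playbook.
        "loki_url": env.get("ALERT_LOKI_URL", "http://localhost:3100"),
    }
```

Failing fast on missing variables means a misconfigured deployment shows up as a failed systemd unit instead of an alert that silently never fires.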

Example Usage

- name: Deploy alerts to monitor11
  hosts: monitor11
  roles:
    - role: jackaltx.solti_ensemble.alert_notifier
      vars:
        # Matrix notification config
        alert_matrix_enabled: true
        alert_matrix_homeserver: "https://matrix-web.jackaltx.com"
        alert_matrix_token: "{{ vault_matrix_token }}"
        alert_matrix_room: "#solti-verify:jackaltx.com"
        
        # ISPConfig alert
        ispconfig_alert_enabled: true
        ispconfig_alert_loki_url: "http://localhost:3100"
        ispconfig_alert_email_enabled: true
        ispconfig_alert_smtp_host: "mail.lavnet.net"
        
        # Matrix Synapse alert
        matrix_synapse_alert_enabled: true
        matrix_synapse_alert_check_interval: 10
        
        # Fail2ban alert
        fail2ban_alert_enabled: true
        fail2ban_alert_threshold: 50
        fail2ban_alert_matrix_enabled: false  # Email only
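
The variables above mix global channel toggles (`alert_matrix_enabled`) with per-alert overrides (`fail2ban_alert_matrix_enabled`). One way the precedence could work is sketched below. The rule itself is an assumption: a per-alert value wins if set, otherwise the global value applies, and an unset channel is off. `resolve_channels` is a hypothetical helper, and `fail2ban_alert_email_enabled` does not appear in the example above; it is added here by analogy with `ispconfig_alert_email_enabled`.

```python
def resolve_channels(role_vars, alert, channels=("matrix", "email")):
    """Return the notification channels that are active for one alert.

    Assumed precedence: <alert>_alert_<channel>_enabled overrides
    alert_<channel>_enabled; anything unset counts as disabled.
    """
    if not role_vars.get(alert + "_alert_enabled", False):
        return []
    active = []
    for channel in channels:
        global_default = role_vars.get("alert_" + channel + "_enabled", False)
        per_alert = role_vars.get(
            alert + "_alert_" + channel + "_enabled", global_default
        )
        if per_alert:
            active.append(channel)
    return active


# Values copied from the example playbook, plus one hypothetical variable.
example_vars = {
    "alert_matrix_enabled": True,
    "ispconfig_alert_enabled": True,
    "ispconfig_alert_email_enabled": True,
    "matrix_synapse_alert_enabled": True,
    "fail2ban_alert_enabled": True,
    "fail2ban_alert_matrix_enabled": False,
    "fail2ban_alert_email_enabled": True,  # hypothetical, not in the example
}

for name in ("ispconfig", "matrix_synapse", "fail2ban"):
    print(name, resolve_channels(example_vars, name))
```

Under this assumed rule, the example playbook as written would leave fail2ban with no active channel unless an email toggle is set or defaults to true in defaults/main.yml, so the defaults for email would need to be decided as part of question 5.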

Testing Strategy

  1. Molecule scenario: Deploy all alerts to test container
  2. Verify systemd timers: Check timers are created and enabled
  3. Mock Loki responses: Test alert logic with fake data
  4. Matrix notification test: Verify matrix-send.py works
  5. Idempotency: Run role twice, ensure no changes
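
Step 3 can be done without a running Loki if the alert logic is separated from the HTTP call. The sketch below builds a fake response in the shape Loki returns for log queries (`status`, `data.resultType` of `streams`, and `[timestamp, line]` pairs under `values`) and runs a threshold check against it. The helper names are hypothetical, and the rule that a ban is a line containing " Ban " compared with `>=` against the threshold is an assumption about how fail2ban-alert counts.

```python
def extract_log_lines(loki_response):
    """Flatten a Loki log-query response into a list of log lines."""
    if loki_response.get("status") != "success":
        raise ValueError("Loki query did not succeed")
    data = loki_response.get("data", {})
    if data.get("resultType") != "streams":
        raise ValueError("expected a log (streams) result")
    lines = []
    for stream in data.get("result", []):
        for _timestamp_ns, line in stream.get("values", []):
            lines.append(line)
    return lines


def ban_rate_exceeded(loki_response, threshold):
    """Assumed rule: alert when the number of ban lines reaches the threshold."""
    # " Ban " with a leading space does not match "Unban" lines.
    bans = [line for line in extract_log_lines(loki_response) if " Ban " in line]
    return len(bans) >= threshold


def fake_loki_response(lines):
    """Build a mock response with one stream for use in tests."""
    return {
        "status": "success",
        "data": {
            "resultType": "streams",
            "result": [
                {
                    "stream": {"job": "fail2ban"},
                    "values": [
                        [str(1700000000000000000 + i), line]
                        for i, line in enumerate(lines)
                    ],
                }
            ],
        },
    }
```

A test can then feed `fake_loki_response([...])` with a known number of ban lines into `ban_rate_exceeded` and assert on both sides of the threshold.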

Related

  • Existing role: solti-ensemble/roles/alert_notifier (currently has fail2ban-alert only)
  • Documentation: mylab/docs/matrix-notifications.md
  • Manual deployment: Scripts currently on monitor11 in /usr/local/bin/
  • Matrix collection: solti-matrix-mgr for Matrix API integration

Acceptance Criteria

  • All three alert scripts deployed via Ansible role
  • Matrix notification utility shared across alerts
  • Role variables documented in README
  • Molecule test passes
  • Systemd timers properly enabled
  • Idempotent deployment (no changes on second run)
  • Migration guide from manual deployment
  • Role can be deployed to clean host successfully

Labels

enhancement (New feature or request)
