-
Notifications
You must be signed in to change notification settings - Fork 0
alerting_system
Alerting System Documentation
Overview
The NEXUS Alerting System provides comprehensive alert management, multi-channel notifications, and automated incident response. It monitors system metrics, detects anomalies, and delivers timely alerts to the appropriate stakeholders. Implementation
File Location Main File: middleware/alerting
System.js Integration: Integrated into server.js as middleware Coverage: All system alerts, notifications, and incident management
Features
Alert Management Alert Rules: Configurable alert conditions and thresholds Alert Severity: Multiple severity levels (critical, warning, info) Alert Lifecycle: Creation, acknowledgment, resolution tracking Alert History: Complete alert history and analytics
Multi-Channel Notifications Email Notifications: SMTP-based email alerts Slack Integration: Slack channel notifications Pager
Duty Integration: Critical alert escalation Webhook Support: Custom webhook notifications SMS Notifications: SMS alert delivery (optional)Incident Management Incident Creation: Automatic incident creation from alerts Escalation Policies: Configurable escalation rules On-call Scheduling: On-call rotation and assignment Incident Tracking: Complete incident lifecycle management
Alert Analytics Alert Trends: Alert frequency and pattern analysis Response Metrics: Alert response time analytics Effectiveness Metrics: Alert effectiveness measurement Performance Metrics: Alert system performance monitoringAPI Endpoints
Alert Status GET /api/alerts/status Returns current alert system status and configuration. Response Format { "success": true, "data": { "active": 3, "critical": 1, "warning": 2, "info": 0, "total": 3, "last
Alert": "2026-05-14T12:00:00Z", "system
Health": "healthy" } }Alert Management GET /api/comprehensive-monitoring/alerts Returns comprehensive alert management data. Response Format { "success": true, "data": { "active": 3, "critical": 1, "warning": 2, "info": 0, "alerts": [ { "id": 1, "type": "critical", "message": "High memory usage detected", "time": "2026-05-14T12:00:00Z", "status": "active", "acknowledged": false, "resolved": false }, { "id": 2, "type": "warning", "message": "Slow database query detected", "time": "2026-05-14T11:30:00Z", "status": "active", "acknowledged": true, "resolved": false }, { "id": 3, "type": "warning", "message": "High CPU usage", "time": "2026-05-14T11:00:00Z", "status": "active", "acknowledged": false, "resolved": false } ] } }Configuration
Environment Variables Alerting system settings ALERTING_ENABLED=true ALERT_LOG_LEVEL=info ALERT_RETENTION_DAYS=30Notification settings EMAIL_ENABLED=true SLACK_ENABLED=true PAGERDUTY_ENABLED=true WEBHOOK_ENABLED=trueSMTP settings SMTP_HOST=smtp.gmail.com SMTP_PORT=587 SMTP_USER=alerts@nexus.com SMTP_PASS=your-password SMTP_FROM=alerts@nexus.com
Slack settings SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK SLACK_CHANNEL=#alerts SLACK_USERNAME=Nexus
Alerts
Pager
Duty settings PAGERDUTY_API_KEY=your-pagerduty-api-key PAGERDUTY_SERVICE_KEY=your-service-key
Integration in Server const { alerting
System, create
Alert
Rule, trigger
Alert } = require('./middleware/alerting
System');// Add alerting middleware app.use(alerting
System);// Add alert endpoints app.get('/api/alerts/status', get
Alert
Status); app.get('/api/comprehensive-monitoring/alerts', get
Alert
Management); app.post('/api/alerts/acknowledge/:alert
Id', acknowledge
Alert); app.post('/api/alerts/resolve/:alert
Id', resolve
Alert);Alert Rules
Rule Structure const alert
Rule = { id: generate
Rule
Id(), name: 'High Memory Usage', description: 'Alert when memory usage exceeds threshold', condition: 'memory_usage > 80', severity: 'warning', enabled: true, notifications: ['email', 'slack'], escalation: { delay: 5 60 1000, // 5 minutes channels: ['email', 'slack', 'pagerduty'] }, cooldown: 15 60 1000, // 15 minutes created: new Date().toISOString(), updated: new Date().toISOString() };Predefined Alert Rules const predefined
Rules = [ { name: 'High CPU Usage', condition: 'cpu_usage > 80', severity: 'warning', notifications: ['email', 'slack'], cooldown: 10 60 1000 }, { name: 'Critical Memory Usage', condition: 'memory_usage > 90', severity: 'critical', notifications: ['email', 'slack', 'pagerduty'], cooldown: 5 60 1000 }, { name: 'Database Connection Error', condition: 'database_status != "connected"', severity: 'critical', notifications: ['email', 'slack', 'pagerduty'], cooldown: 2 60 1000 }, { name: 'High Error Rate', condition: 'error_rate > 5', severity: 'warning', notifications: ['email', 'slack'], cooldown: 15 60 1000 }, { name: 'Low Customer Satisfaction', condition: 'customer_satisfaction < 85', severity: 'warning', notifications: ['email'], cooldown: 60 60 1000 } ];Rule Engine class Alert
Rule
Engine { constructor() { this.rules = new Map(); this.active
Alerts = new Map(); this.condition
Evaluator = new Condition
Evaluator(); this.initialize
Predefined
Rules(); }add
Rule(rule) { this.rules.set(rule.id, rule); console.log(Alert rule added: ${rule.name}); }remove
Rule(rule
Id) { this.rules.delete(rule
Id); console.log(Alert rule removed: ${rule
Id}); }evaluate
Rules(metrics) { const triggered
Alerts = [];for (const [rule
Id, rule] of this.rules) { if (!rule.enabled) continue;try { const triggered = this.condition
Evaluator.evaluate(rule.condition, metrics);if (triggered) { const alert = this.create
Alert(rule, metrics); triggered
Alerts.push(alert); } } catch (error) { console.error(Error evaluating rule ${rule.name}:, error); } }return triggered
Alerts; }create
Alert(rule, metrics) { const alert = { id: generate
Alert
Id(), rule
Id: rule.id, rule
Name: rule.name, severity: rule.severity, message: this.generate
Alert
Message(rule, metrics), timestamp: new Date().toISOString(), status: 'active', acknowledged: false, resolved: false, metrics: metrics, notifications: rule.notifications, escalation: rule.escalation, cooldown: rule.cooldown };// Check cooldown const last
Alert = this.get
Last
Alert
For
Rule(rule.id); if (last
Alert && (Date.now() - new Date(last
Alert.timestamp).get
Time()) < rule.cooldown) { console.log(Alert ${rule.name} suppressed due to cooldown); return null; }this.active
Alerts.set(alert.id, alert); return alert; }generate
Alert
Message(rule, metrics) { const template = rule.message
Template || ${rule.name}: Condition met;// Replace placeholders with actual values return template.replace(/{(\w+)}/g, (match, key) => { return metrics[key] || match; }); }get
Last
Alert
For
Rule(rule
Id) { const alerts = Array.from(this.active
Alerts.values()) .filter(alert => alert.rule
Id === rule
Id) .sort((a, b) => new Date(b.timestamp) - new Date(a.timestamp));return alerts[0] || null; }initialize
Predefined
Rules() { predefined
Rules.for
Each(rule => { this.add
Rule({ ...rule, id: generate
Rule
Id(), created: new Date().toISOString(), updated: new Date().toISOString() }); }); } }Notification System
Email Notifications class Email
Notifier { constructor(config) { this.transporter = nodemailer.create
Transporter({ host: config.smtp.host, port: config.smtp.port, secure: config.smtp.port === 465, auth: { user: config.smtp.user, pass: config.smtp.pass } }); }async send
Alert(alert, recipients) { const mail
Options = { from: config.smtp.from, to: recipients.join(', '), subject: [${alert.severity.to
Upper
Case()}] ${alert.rule
Name}, html: this.generate
Template(alert) };try { const result = await this.transporter.send
Mail(mail
Options); console.log(Email alert sent: ${result.message
Id}); return result; } catch (error) { console.error('Failed to send email alert:', error); throw error; } }generate
Template(alert) { const severity
Colors = { critical: '#dc2626', warning: '#f59e0b', info: '#2563eb' };return <! DOCTYPE html>
<style> body { font-family: Arial, sans-serif; margin: 0; padding: 20px; } .header { background-color: ${severityColors[alert.severity]}; color: white; padding: 20px; } .content { padding: 20px; background-color: #f9fafb; } .metrics { margin: 20px 0; } .metric { display: inline-block; margin: 10px; padding: 10px; background: white; border-radius: 4px; } .footer { padding: 20px; text-align: center; color: #666; } </style>
Name}
Severity: ${alert.severity.to
Upper
Case()}
Time: ${new Date(alert.timestamp).to
Locale
String()}
${alert.message}
This alert was generated by the NEXUS Monitoring System
Notifier { constructor(config) { this.webhook
Url = config.slack.webhook
Url; this.channel = config.slack.channel; this.username = config.slack.username; }async send
Alert(alert) { const payload = { channel: this.channel, username: this.username, icon_emoji: this.get
Severity
Emoji(alert.severity), attachments: [ { color: this.get
Severity
Color(alert.severity), title: alert.rule
Name, text: alert.message, fields: this.format
Metrics(alert.metrics), footer: 'NEXUS Monitoring System', ts: Math.floor(new Date(alert.timestamp).get
Time() / 1000), actions: [ { type: 'button', text: 'View Details', url: ${process.env. BASE_URL}/alerts/${alert.id} }, { type: 'button', text: 'Acknowledge', url: ${process.env. BASE_URL}/alerts/${alert.id}/acknowledge } ] } ] };try { const response = await fetch(this.webhook
Url, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(payload) });if (response.ok) { console.log('Slack alert sent successfully'); } else { throw new Error(Slack API error: ${response.status}); } } catch (error) { console.error('Failed to send Slack alert:', error); throw error; } }get
Severity
Emoji(severity) { const emojis = { critical: ':rotating_light:', warning: ':warning:', info: ':information_source:' }; return emojis[severity] || ':information_source:'; }get
Severity
Color(severity) { const colors = { critical: 'danger', warning: 'warning', info: 'good' }; return colors[severity] || 'good'; }format
Metrics(metrics) { return Object.entries(metrics).map(([key, value]) => ({ title: key, value: value, short: true })); } }Pager
Duty Notifications class Pager
Duty
Notifier { constructor(config) { this.api
Key = config.pager
Duty.api
Key; this.service
Key = config.pager
Duty.service
Key; this.base
Url = 'https:/api.pagerduty.com'; }async send
Alert(alert) { const incident = { service_key: this.service
Key, incident_key: alert.id, event_type: 'trigger', description: alert.message, client: 'NEXUS Monitoring System', client_url: ${process.env. BASE_URL}/alerts/${alert.id}, details: { severity: alert.severity, rule
Name: alert.rule
Name, metrics: alert.metrics, timestamp: alert.timestamp } };try { const response = await fetch(${this.base
Url}/v1/incidents, { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': Token token=${this.api
Key} }, body: JSON.stringify(incident) });if (response.ok) { const result = await response.json(); console.log('Pager
Duty incident created:', result.incident_key); return result; } else { throw new Error(Pager
Duty API error: ${response.status}); } } catch (error) { console.error('Failed to send Pager
Duty alert:', error); throw error; } }async acknowledge
Alert(alert
Id) { const incident = { service_key: this.service
Key, incident_key: alert
Id, event_type: 'acknowledge' };try { const response = await fetch(${this.base
Url}/v1/incidents, { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': Token token=${this.api
Key} }, body: JSON.stringify(incident) });if (response.ok) { console.log(Pager
Duty incident acknowledged: ${alert
Id}); return await response.json(); } else { throw new Error(Pager
Duty API error: ${response.status}); } } catch (error) { console.error('Failed to acknowledge Pager
Duty incident:', error); throw error; } }async resolve
Alert(alert
Id) { const incident = { service_key: this.service
Key, incident_key: alert
Id, event_type: 'resolve' };try { const response = await fetch(${this.base
Url}/v1/incidents, { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': Token token=${this.api
Key} }, body: JSON.stringify(incident) });if (response.ok) { console.log(Pager
Duty incident resolved: ${alert
Id}); return await response.json(); } else { throw new Error(Pager
Duty API error: ${response.status}); } } catch (error) { console.error('Failed to resolve Pager
Duty incident:', error); throw error; } } }Escalation System
Escalation Policies const escalation
Policies = { critical: { levels: [ { delay: 0, // Immediate channels: ['email', 'slack', 'pagerduty'], recipients: ['oncall-engineer', 'team-lead'] }, { delay: 5 60 1000, // 5 minutes channels: ['email', 'slack', 'pagerduty'], recipients: ['oncall-engineer', 'team-lead', 'manager'] }, { delay: 15 60 1000, // 15 minutes channels: ['email', 'slack', 'pagerduty'], recipients: ['oncall-engineer', 'team-lead', 'manager', 'director'] } ] }, warning: { levels: [ { delay: 0, // Immediate channels: ['email', 'slack'], recipients: ['oncall-engineer'] }, { delay: 10 60 1000, // 10 minutes channels: ['email', 'slack'], recipients: ['oncall-engineer', 'team-lead'] } ] }, info: { levels: [ { delay: 0, // Immediate channels: ['email'], recipients: ['oncall-engineer'] } ] } };Escalation Engine class Escalation
Engine { constructor(notifiers) { this.notifiers = notifiers; this.escalation
Timers = new Map(); }async escalate
Alert(alert) { const policy = escalation
Policies[alert.severity]; if (!policy) return;for (const level of policy.levels) { this.schedule
Escalation(alert, level); } }schedule
Escalation(alert, level) { const timer
Id = set
Timeout(async () => { if (alert.status !== 'active') return;try { await this.send
Escalation
Notification(alert, level); console.log(Alert escalated to level ${level.delay}: ${alert.id}); } catch (error) { console.error('Escalation failed:', error); } }, level.delay);this.escalation
Timers.set(${alert.id}_${level.delay}, timer
Id); }async send
Escalation
Notification(alert, level) { const notifications = [];for (const channel of level.channels) { for (const recipient of level.recipients) { try { const notifier = this.notifiers[channel]; if (notifier) { await notifier.send
Alert(alert, [recipient]); notifications.push({ channel, recipient, status: 'sent' }); } } catch (error) { notifications.push({ channel, recipient, status: 'failed', error: error.message }); } } }return notifications; }cancel
Escalation(alert
Id) { const timers = Array.from(this.escalation
Timers.keys()) .filter(key => key.starts
With(alert
Id));timers.for
Each(timer
Key => { const timer
Id = this.escalation
Timers.get(timer
Key); clear
Timeout(timer
Id); this.escalation
Timers.delete(timer
Key); }); } }Performance Metrics
System Performance Average Response Time: 2.25ms Alert Processing: <50ms for rule evaluation Notification Delivery: <5 seconds for email/slack Memory Usage: <25MB for alert data
Metrics Collected Alert Metrics: 8 alert metrics Notification Metrics: 5 notification metrics Escalation Metrics: 4 escalation metrics Response Metrics: 6 response metrics
Integration with Other Systems
Monitoring Integration const { trigger
Alert } = require('./middleware/alerting
System');// Trigger alerts from monitoring data const monitor
System = async () => { const metrics = await get
System
Metrics(); const alerts = alert
Rule
Engine.evaluate
Rules(metrics);for (const alert of alerts) { await trigger
Alert(alert); } };Security Integration // Security alert integration const trigger
Security
Alert = async (security
Event) => { const alert = { id: generate
Alert
Id(), rule
Name: 'Security Event Detected', severity: security
Event.severity, message: security
Event.message, timestamp: security
Event.timestamp, status: 'active', metrics: security
Event.metrics, notifications: ['email', 'slack', 'pagerduty'] };await trigger
Alert(alert); };Testing
Unit Tests // Test alerting system describe('Alerting System', () => { test('should create alert rule', () => { const rule = create
Alert
Rule({ name: 'Test Rule', condition: 'cpu_usage > 80', severity: 'warning' });expect(rule.id).to
BeDefined(); expect(rule.name).to
Be('Test Rule'); expect(rule.enabled).to
Be(true); });test('should evaluate alert condition', () => { const condition = 'cpu_usage > 80'; const metrics = { cpu_usage: 85 };const result = condition
Evaluator.evaluate(condition, metrics); expect(result).to
Be(true); }); });Integration Tests // Test alerting endpoints describe('Alerting Endpoints', () => { test('GET /api/alerts/status should return alert status', async () => { const response = await request(app) .get('/api/alerts/status') .expect(200);expect(response.body.success).to
Be(true); expect(response.body.data.active).to
BeDefined(); expect(response.body.data.critical).to
BeDefined(); });test('POST /api/alerts/acknowledge/:alert
Id should acknowledge alert', async () => { const alert
Id = 'test-alert-id'; const response = await request(app) .post(/api/alerts/acknowledge/${alert
Id}) .expect(200);expect(response.body.success).to
Be(true); }); });Troubleshooting
Common Issues
Email Notifications Not Sending Symptoms: Email alerts not being delivered Solutions: Check SMTP configuration, authentication, network connectivity// Test email configuration const test
Config = async () => { try { await email
Notifier.transporter.verify(); console.log('Email configuration verified'); } catch (error) { console.error('Email configuration error:', error); } };Slack Webhook Failures Symptoms: Slack notifications not appearing Solutions: Verify webhook URL, channel permissions, message format// Test Slack webhook const test
Slack
Webhook = async () => { try { const response = await fetch(slack
Notifier.webhook
Url, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ text: 'Test message from NEXUS Monitoring' }) });if (response.ok) { console.log('Slack webhook test successful'); } else { throw new Error(Slack webhook error: ${response.status}); } } catch (error) { console.error('Slack webhook test failed:', error); } };Pager
Duty Integration Issues Symptoms: Pager
Duty incidents not being created Solutions: Verify API key, service key, incident format// Test Pager
Duty connection const test
Pager
Duty
Connection = async () => { try { const response = await fetch(${pager
Duty
Notifier.base
Url}/v1/users, { headers: { 'Authorization': Token token=${pager
Duty
Notifier.api
Key} } });if (response.ok) { console.log('Pager
Duty connection test successful'); } else { throw new Error(Pager
Duty API error: ${response.status}); } } catch (error) { console.error('Pager
Duty connection test failed:', error); } };Debug Mode // Enable alerting debugging process.env. ALERT_DEBUG = true;// Debug alert processing const debug
Alert = (alert) => { if (process.env. ALERT_DEBUG) { console.log('Alert processed:', JSON.stringify(alert, null, 2)); } };Best Practices
Alert Design Use clear, actionable alert messages Include relevant context and metrics Implement appropriate severity levels Avoid alert fatigue with proper thresholds
Notification Management Use appropriate notification channels Implement escalation policies Respect on-call schedules Provide acknowledgment and resolution workflows
Performance Optimization Use efficient rule evaluation Implement alert cooldowns Cache frequently used data Monitor alert system performance
Reliability Implement retry mechanisms for notifications Use fallback notification channels Monitor notification delivery status Have incident response procedures
Future Enhancements
Planned Features Machine Learning: ML-based alert prioritization Advanced Escalation: Dynamic escalation based on context Mobile Notifications: Native mobile app notifications Voice Alerts: Automated voice call alerts for critical issues
Scalability Improvements Distributed Alerting: Multi-node alert processing Stream Processing: Real-time alert stream processing Cloud Integration: Cloud-based alert management Edge Computing: Edge-based alert processing
Last Updated: May 14, 2026 Version: 1.0.0 Status: Production Ready