AevovOP Analytics & Monitoring System Documentation

Analytics System - Complete Implementation

Status: ✅ Fully Implemented
Version: 2.0.0
Date: November 20, 2025


📊 Overview

The AevovOP Analytics & Monitoring System provides metrics collection, time-series analysis, and real-time monitoring for the entire platform. It tracks system performance, node metrics, consensus statistics, storage utilization, rewards, network health, AI inference, and pattern extraction.

Key Features

  • Multi-Category Metrics: 8 metric categories covering all system aspects
  • Time-Series Analysis: Query historical data with customizable intervals
  • Real-Time Dashboards: Comprehensive dashboard with live updates
  • Aggregation Functions: 8 aggregation functions (avg, sum, min, max, count, p50, p95, p99)
  • Custom Queries: Flexible query engine for custom analytics
  • Performance Comparison: Compare metrics across multiple entities
  • Top Performers: Identify highest-performing nodes
  • Automated Collection: Background metrics collection

🏗️ Architecture

┌────────────────────────────────────────────────────────────┐
│               Analytics & Monitoring System                │
│                                                            │
│  ┌──────────────────────────────────────────────────┐      │
│  │           Metrics Collector                      │      │
│  │  • Real-time metric recording                    │      │
│  │  • Multi-category tracking                       │      │
│  │  • Automatic aggregation                         │      │
│  │  • Batch processing                              │      │
│  └──────────────────────────────────────────────────┘      │
│                         ↕                                  │
│  ┌──────────────────────────────────────────────────┐      │
│  │           Query Engine                           │      │
│  │  • Time-series queries                           │      │
│  │  • Aggregation functions                         │      │
│  │  • Custom filters                                │      │
│  │  • Performance optimization                      │      │
│  └──────────────────────────────────────────────────┘      │
│                         ↕                                  │
│  ┌──────────────┬──────────────┬────────────────────┐      │
│  │              │              │                    │      │
│  │   Metrics    │  Time Series │    Dashboards      │      │
│  │     DB       │   Analysis   │                    │      │
│  └──────────────┴──────────────┴────────────────────┘      │
└────────────────────────────────────────────────────────────┘

📈 Metric Categories

1. System Metrics

Track overall system performance and health.

Metrics:

  • Total requests
  • Successful/failed requests
  • Average latency
  • Requests per second
  • System uptime
  • Active nodes count
  • Total/used storage
  • Consensus rounds

2. Node Metrics

Track individual node performance and contributions.

Metrics:

  • Request count per node
  • Success rate
  • Average latency
  • Reputation score
  • Total rewards earned
  • Validations completed
  • Storage provided
  • Inference tasks executed
  • Node uptime

3. Consensus Metrics

Monitor consensus protocol performance.

Metrics:

  • Total consensus rounds
  • Completed/failed rounds
  • Average round time
  • Average votes per round
  • Consensus success rate
  • Byzantine detections

4. Storage Metrics

Track storage system utilization.

Metrics:

  • Total chunks stored
  • Total storage size
  • Average chunk size
  • Replication factor
  • Failed uploads/downloads
  • Provider statistics

5. Reward Metrics

Monitor reward distribution and economics.

Metrics:

  • Total rewards issued
  • Total rewards distributed
  • Pending rewards
  • Average reward amount
  • Rewards by type
  • Total staked
  • Total slashed
  • Transaction count

6. Network Metrics

Track network health and connectivity.

Metrics:

  • Total/online/offline nodes
  • Suspended nodes count
  • Average reputation
  • Network health percentage
  • Nodes by type
  • Messages sent/received
  • Average network latency

7. Inference Metrics

Monitor AI inference performance.

Metrics:

  • Total inference sessions
  • Completed/failed sessions
  • Average inference latency
  • Average confidence scores
  • Total tokens processed
  • Models used count

8. Pattern Metrics

Track pattern extraction and validation.

Metrics:

  • Total patterns
  • Validated/rejected patterns
  • Patterns by category
  • Average quality score
  • Compression ratio

📡 API Endpoints

Get Metrics

Retrieve current metrics for a category.

GET /api/aevovop/analytics/metrics?category=system

curl "http://localhost:8090/api/aevovop/analytics/metrics?category=system"

Categories (for per-node metrics, see Get Node Metrics):

  • system - Overall system metrics (default)
  • consensus - Consensus metrics
  • storage - Storage metrics
  • reward - Reward metrics
  • network - Network metrics
  • inference - Inference metrics
  • pattern - Pattern metrics

Response:

{
  "total_requests": 125000,
  "successful_requests": 123500,
  "failed_requests": 1500,
  "average_latency": 45.2,
  "requests_per_second": 125.5,
  "uptime": 99.5,
  "active_nodes": 42,
  "total_storage": 5368709120,
  "used_storage": 3221225472,
  "total_rewards_issued": 15000.0,
  "active_stakes": 28,
  "consensus_rounds_total": 3450,
  "last_updated": "2025-11-20T08:00:00Z"
}

Get Dashboard

Retrieve comprehensive dashboard data with all metrics.

GET /api/aevovop/analytics/dashboard

curl http://localhost:8090/api/aevovop/analytics/dashboard

Response:

{
  "system": {
    "total_requests": 125000,
    "successful_requests": 123500,
    "average_latency": 45.2,
    "uptime": 99.5
  },
  "consensus": {
    "total_rounds": 3450,
    "completed_rounds": 3420,
    "average_round_time": 12.5,
    "consensus_rate": 99.13
  },
  "storage": {
    "total_chunks": 15000,
    "total_size": 3221225472,
    "average_chunk_size": 214748
  },
  "rewards": {
    "total_rewards_issued": 15000.0,
    "total_rewards_distributed": 14250.0,
    "pending_rewards": 750.0,
    "total_staked": 25000.0
  },
  "network": {
    "total_nodes": 42,
    "online_nodes": 40,
    "offline_nodes": 1,
    "suspended_nodes": 1,
    "network_health": 95.24
  },
  "inference": {
    "total_sessions": 8500,
    "completed_sessions": 8420,
    "average_latency": 350.5,
    "average_confidence": 0.85
  },
  "patterns": {
    "total_patterns": 2500,
    "validated_patterns": 2450,
    "average_quality": 0.88
  },
  "top_nodes": [
    {
      "node_id": "node_abc123",
      "reputation_score": 95.5,
      "success_rate": 99.2,
      "total_rewards_earned": 550.0
    }
  ],
  "updated_at": "2025-11-20T08:00:00Z"
}
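The sample figures are internally consistent: 40 of 42 nodes online gives a `network_health` of 95.24, 3,420 of 3,450 completed rounds gives a `consensus_rate` of 99.13, and 3,221,225,472 bytes over 15,000 chunks gives an `average_chunk_size` of about 214,748. The Go sketch below reproduces that arithmetic. Treating these fields as simple ratios is an inference from the sample values, not a documented formula, and `percent` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// percent returns part/total as a percentage rounded to two decimals.
// It returns 0 when total is 0 to avoid dividing by zero.
func percent(part, total float64) float64 {
	if total == 0 {
		return 0
	}
	return math.Round(part/total*10000) / 100
}

func main() {
	// Values taken from the sample dashboard response.
	fmt.Println("network_health:", percent(40, 42))     // online_nodes / total_nodes
	fmt.Println("consensus_rate:", percent(3420, 3450)) // completed_rounds / total_rounds
	fmt.Println("average_chunk_size:", math.Floor(3221225472.0/15000))
}
```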

Query Metrics

Execute a custom analytics query with time-series support.

POST /api/aevovop/analytics/query

curl -X POST http://localhost:8090/api/aevovop/analytics/query \
  -H "Content-Type: application/json" \
  -d '{
    "metric": "request_latency",
    "category": "system",
    "start_time": "2025-11-20T00:00:00Z",
    "end_time": "2025-11-20T23:59:59Z",
    "interval": "1h",
    "aggregation": "avg"
  }'

Query Parameters:

  • metric (required) - Metric name
  • category - Metric category
  • start_time - Start time (RFC3339 format)
  • end_time - End time (RFC3339 format)
  • interval - Time interval (1m, 5m, 1h, 1d)
  • aggregation - Aggregation function (avg, sum, min, max, count, p50, p95, p99)
  • filters - Label filters as key/value pairs, e.g. {"node_id": "node_abc123"}
  • limit - Result limit

Response:

{
  "metric": "request_latency",
  "category": "system",
  "time_series": {
    "metric": "request_latency",
    "category": "system",
    "interval": "1h",
    "start_time": "2025-11-20T00:00:00Z",
    "end_time": "2025-11-20T23:59:59Z",
    "data_points": [
      {
        "timestamp": "2025-11-20T00:00:00Z",
        "value": 42.5
      },
      {
        "timestamp": "2025-11-20T01:00:00Z",
        "value": 45.2
      }
    ]
  },
  "count": 24,
  "executed_at": "2025-11-20T08:00:00Z"
}

Get Node Metrics

Retrieve metrics for a specific node.

GET /api/aevovop/analytics/nodes/:nodeID/metrics

curl http://localhost:8090/api/aevovop/analytics/nodes/node_abc123/metrics

Response:

{
  "node_id": "node_abc123",
  "request_count": 15000,
  "success_rate": 99.2,
  "average_latency": 38.5,
  "uptime": 98.5,
  "reputation_score": 95.5,
  "total_rewards_earned": 550.0,
  "validations_completed": 450,
  "storage_provided": 10737418240,
  "inference_executed": 250,
  "last_seen": "2025-11-20T07:55:00Z"
}

Get Node Performance

Retrieve performance trends for a node over time.

GET /api/aevovop/analytics/nodes/:nodeID/performance?period=7d&interval=1d

curl "http://localhost:8090/api/aevovop/analytics/nodes/node_abc123/performance?period=7d&interval=1d"

Query Parameters:

  • period - Time period (1h, 6h, 12h, 24h, 7d, 30d)
  • interval - Data point interval (1m, 5m, 1h, 6h, 1d)
  • start_time - Custom start time (RFC3339)
  • end_time - Custom end time (RFC3339)

Response:

{
  "node_requests": {
    "metric": "node_requests",
    "category": "node",
    "interval": "1d",
    "data_points": [
      {"timestamp": "2025-11-14T00:00:00Z", "value": 2000},
      {"timestamp": "2025-11-15T00:00:00Z", "value": 2100}
    ]
  },
  "node_latency": {
    "metric": "node_latency",
    "category": "node",
    "interval": "1d",
    "data_points": [
      {"timestamp": "2025-11-14T00:00:00Z", "value": 35.5},
      {"timestamp": "2025-11-15T00:00:00Z", "value": 38.2}
    ]
  },
  "reputation_score": {
    "metric": "reputation_score",
    "category": "node",
    "interval": "1d",
    "data_points": [
      {"timestamp": "2025-11-14T00:00:00Z", "value": 94.0},
      {"timestamp": "2025-11-15T00:00:00Z", "value": 95.5}
    ]
  }
}

Get System Trends

Retrieve system-wide trends over time.

GET /api/aevovop/analytics/trends?period=24h&interval=1h

curl "http://localhost:8090/api/aevovop/analytics/trends?period=24h&interval=1h"

Response:

{
  "requests_total": {
    "metric": "requests_total",
    "category": "system",
    "interval": "1h",
    "data_points": [
      {"timestamp": "2025-11-20T00:00:00Z", "value": 5000},
      {"timestamp": "2025-11-20T01:00:00Z", "value": 5200}
    ]
  },
  "active_nodes": {
    "metric": "active_nodes",
    "category": "network",
    "interval": "1h",
    "data_points": [
      {"timestamp": "2025-11-20T00:00:00Z", "value": 40},
      {"timestamp": "2025-11-20T01:00:00Z", "value": 42}
    ]
  }
}

Get Top Performers

Retrieve top performing nodes by a specific metric.

GET /api/aevovop/analytics/top-performers?metric=reputation_score&limit=10

curl "http://localhost:8090/api/aevovop/analytics/top-performers?metric=reputation_score&limit=10"

Query Parameters:

  • metric - Metric to rank by (default: reputation_score)
  • limit - Number of top performers (default: 10)
  • period - Time period to consider

Response:

{
  "metric": "reputation_score",
  "limit": 10,
  "top_performers": [
    {
      "node_id": "node_abc123",
      "reputation_score": 95.5,
      "success_rate": 99.2,
      "total_rewards": 550.0,
      "uptime": 98.5
    },
    {
      "node_id": "node_def456",
      "reputation_score": 93.8,
      "success_rate": 98.5,
      "total_rewards": 480.0,
      "uptime": 97.2
    }
  ]
}
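The ranking behind this endpoint is not documented. If you already hold per-node stats, an equivalent client-side ranking could look like this sketch. `NodeStats` and `topPerformers` are hypothetical; the defaults (`reputation_score`, limit 10) come from the parameter list above, while the other accepted metric names are assumptions based on the response fields.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeStats holds the per-node fields shown in the top-performers response.
type NodeStats struct {
	NodeID          string
	ReputationScore float64
	SuccessRate     float64
	TotalRewards    float64
	Uptime          float64
}

// topPerformers returns up to limit nodes sorted by the chosen metric,
// highest first. Ties are broken by node ID so output is deterministic.
func topPerformers(nodes []NodeStats, metric string, limit int) ([]NodeStats, error) {
	var key func(NodeStats) float64
	switch metric {
	case "", "reputation_score": // documented default
		key = func(n NodeStats) float64 { return n.ReputationScore }
	case "success_rate":
		key = func(n NodeStats) float64 { return n.SuccessRate }
	case "total_rewards":
		key = func(n NodeStats) float64 { return n.TotalRewards }
	case "uptime":
		key = func(n NodeStats) float64 { return n.Uptime }
	default:
		return nil, fmt.Errorf("unsupported metric %q", metric)
	}
	if limit <= 0 {
		limit = 10 // documented default
	}
	ranked := append([]NodeStats(nil), nodes...) // copy; leave input untouched
	sort.Slice(ranked, func(i, j int) bool {
		ki, kj := key(ranked[i]), key(ranked[j])
		if ki != kj {
			return ki > kj
		}
		return ranked[i].NodeID < ranked[j].NodeID
	})
	if len(ranked) > limit {
		ranked = ranked[:limit]
	}
	return ranked, nil
}

func main() {
	nodes := []NodeStats{
		{"node_def456", 93.8, 98.5, 480.0, 97.2},
		{"node_abc123", 95.5, 99.2, 550.0, 98.5},
	}
	top, _ := topPerformers(nodes, "reputation_score", 10)
	for _, n := range top {
		fmt.Println(n.NodeID, n.ReputationScore)
	}
}
```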

Compare Metrics

Compare a metric across multiple entities.

POST /api/aevovop/analytics/compare

curl -X POST http://localhost:8090/api/aevovop/analytics/compare \
  -H "Content-Type: application/json" \
  -d '{
    "metric": "average_latency",
    "category": "node",
    "entities": ["node_abc123", "node_def456", "node_ghi789"]
  }'

Response:

{
  "metric": "average_latency",
  "category": "node",
  "comparison": {
    "node_abc123": 38.5,
    "node_def456": 42.1,
    "node_ghi789": 35.2
  }
}
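The `comparison` object has no inherent order, and for `average_latency` lower is better. A small Go sketch (the `rankAscending` helper is hypothetical) that orders the entities from the sample response, fastest first:

```go
package main

import (
	"fmt"
	"sort"
)

// rankAscending orders entity IDs from a comparison map by value, lowest
// first, which suits "lower is better" metrics such as average_latency.
func rankAscending(comparison map[string]float64) []string {
	ids := make([]string, 0, len(comparison))
	for id := range comparison {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if comparison[ids[i]] != comparison[ids[j]] {
			return comparison[ids[i]] < comparison[ids[j]]
		}
		return ids[i] < ids[j] // deterministic tie-break; map order is random
	})
	return ids
}

func main() {
	comparison := map[string]float64{
		"node_abc123": 38.5,
		"node_def456": 42.1,
		"node_ghi789": 35.2,
	}
	fmt.Println(rankAscending(comparison)) // [node_ghi789 node_abc123 node_def456]
}
```

For higher-is-better metrics such as `reputation_score`, flip the `<` to `>`.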

Get Time Series

Retrieve time-series data for a metric.

GET /api/aevovop/analytics/time-series?metric=request_latency&interval=5m&period=1h

curl "http://localhost:8090/api/aevovop/analytics/time-series?metric=request_latency&interval=5m&period=1h"

Response:

{
  "metric": "request_latency",
  "category": "system",
  "time_series": {
    "interval": "5m",
    "data_points": [
      {"timestamp": "2025-11-20T07:00:00Z", "value": 42.5},
      {"timestamp": "2025-11-20T07:05:00Z", "value": 43.2}
    ]
  }
}

Record Metric

Manually record a metric (for testing/debugging).

POST /api/aevovop/analytics/record

curl -X POST http://localhost:8090/api/aevovop/analytics/record \
  -H "Content-Type: application/json" \
  -d '{
    "name": "custom_metric",
    "type": "gauge",
    "category": "system",
    "value": 123.45,
    "unit": "ms",
    "labels": {
      "service": "api",
      "endpoint": "/health"
    }
  }'

Get Analytics Health

Check analytics system health.

GET /api/aevovop/analytics/health

curl http://localhost:8090/api/aevovop/analytics/health

Response:

{
  "status": "healthy",
  "service": "analytics",
  "timestamp": "2025-11-20T08:00:00Z"
}

🔍 Query Examples

Example 1: Average Request Latency (Last 24 Hours)

{
  "metric": "request_latency",
  "category": "system",
  "start_time": "2025-11-19T08:00:00Z",
  "end_time": "2025-11-20T08:00:00Z",
  "interval": "1h",
  "aggregation": "avg"
}

Example 2: Node Reputation Trend (Last 7 Days)

{
  "metric": "reputation_score",
  "category": "node",
  "filters": {
    "node_id": "node_abc123"
  },
  "start_time": "2025-11-13T00:00:00Z",
  "end_time": "2025-11-20T00:00:00Z",
  "interval": "1d",
  "aggregation": "avg"
}

Example 3: Total Rewards Distributed (Last Month)

{
  "metric": "rewards_distributed",
  "category": "reward",
  "start_time": "2025-10-20T00:00:00Z",
  "end_time": "2025-11-20T00:00:00Z",
  "aggregation": "sum"
}

Example 4: 95th Percentile Consensus Round Time

{
  "metric": "consensus_round_time",
  "category": "consensus",
  "aggregation": "p95",
  "start_time": "2025-11-20T00:00:00Z",
  "end_time": "2025-11-20T23:59:59Z"
}

📊 Aggregation Functions

| Function | Description | Use Case |
|---|---|---|
| avg | Average value | Overall trends, typical performance |
| sum | Sum of all values | Total counts, cumulative metrics |
| min | Minimum value | Best performance, lower bounds |
| max | Maximum value | Worst performance, upper bounds |
| count | Number of data points | Activity levels, event counts |
| p50 | 50th percentile (median) | Typical performance excluding outliers |
| p95 | 95th percentile | Performance targets, SLA monitoring |
| p99 | 99th percentile | Tail latency, worst-case scenarios |
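For reference, the eight functions can be computed as in the Go sketch below. `aggregate` is a hypothetical helper, and its percentiles use the nearest-rank method; the query engine may interpolate instead, so its p50/p95/p99 values can differ slightly on small samples.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// aggregate applies one of the documented aggregation functions to values.
// Percentiles use the nearest-rank method (an assumption, see above).
func aggregate(fn string, values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errors.New("no data points")
	}
	switch fn {
	case "count":
		return float64(len(values)), nil
	case "sum", "avg":
		sum := 0.0
		for _, v := range values {
			sum += v
		}
		if fn == "avg" {
			return sum / float64(len(values)), nil
		}
		return sum, nil
	case "min", "max":
		best := values[0]
		for _, v := range values[1:] {
			if (fn == "min" && v < best) || (fn == "max" && v > best) {
				best = v
			}
		}
		return best, nil
	case "p50", "p95", "p99":
		p := map[string]float64{"p50": 50, "p95": 95, "p99": 99}[fn]
		sorted := append([]float64(nil), values...) // copy before sorting
		sort.Float64s(sorted)
		// Nearest rank: smallest value with at least p% of points at or below it.
		rank := int(math.Ceil(p * float64(len(sorted)) / 100))
		return sorted[rank-1], nil
	default:
		return 0, fmt.Errorf("unknown aggregation %q", fn)
	}
}

func main() {
	latencies := []float64{40, 42, 45, 41, 300}
	for _, fn := range []string{"avg", "p50", "p95", "max"} {
		v, _ := aggregate(fn, latencies)
		fmt.Printf("%s = %.1f\n", fn, v)
	}
}
```

With one 300 ms outlier in five points, `avg` is 93.6 while `p50` stays at 42, which is why the table pairs `p50` with typical performance and `p95`/`p99` with tail latency.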

🎯 Configuration

Collector Configuration

config := &analytics.MetricsCollectorConfig{
    CollectionInterval: 1 * time.Minute,      // How often to collect
    RetentionPeriod:    30 * 24 * time.Hour,  // 30 days
    EnableAggregation:  true,
    EnableAlerts:       true,
    MaxMetricsPerBatch: 1000,
}

🔧 Developer Guide

Recording a Metric

collector := analytics.NewCollector(app, config)

// Record a counter
collector.RecordCounter(ctx, "requests_total", analytics.CategorySystem, 1, nil)

// Record a gauge
collector.RecordGauge(ctx, "active_connections", analytics.CategorySystem, 42, "connections", nil)

// Record with labels
collector.RecordGauge(ctx, "node_latency", analytics.CategoryNode, 35.5, "ms", map[string]string{
    "node_id": "node_abc123",
})

Querying Metrics

queryEngine := analytics.NewQueryEngine(app, collector)

req := &analytics.QueryRequest{
    Metric:      "request_latency",
    Category:    analytics.CategorySystem,
    StartTime:   &startTime,
    EndTime:     &endTime,
    Interval:    "1h",
    Aggregation: "avg",
}

resp, err := queryEngine.Execute(ctx, req)

Getting Dashboard Data

dashboard, err := collector.GetDashboardData(ctx)

🛡️ Best Practices

1. Metric Naming

  • Use snake_case: request_latency, node_reputation
  • Be descriptive: prefer consensus_round_time over time
  • Include units in name when ambiguous: size_bytes, duration_ms
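A lint-style check for the snake_case rule can be a single regular expression. This is a hedged sketch: the pattern is our own reading of the convention, and this document does not say the collector enforces it. A regex cannot judge descriptiveness, so `time` still passes.

```go
package main

import (
	"fmt"
	"regexp"
)

// snakeCase matches lowercase snake_case names: a leading letter, then
// letters/digits, with single underscores between words.
var snakeCase = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)*$`)

func main() {
	for _, name := range []string{"request_latency", "duration_ms", "RequestLatency", "node__reputation", "time"} {
		fmt.Printf("%-18s %v\n", name, snakeCase.MatchString(name))
	}
}
```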

2. Label Usage

  • Keep label cardinality low: avoid unbounded values such as request or session IDs as labels (a bounded set like node_id is fine)
  • Use consistent label names across metrics
  • Use labels for dimensions you want to filter by

3. Aggregation Selection

  • Use avg for latency and rates
  • Use sum for counters and totals
  • Use p95/p99 for SLA monitoring
  • Use max for capacity planning

4. Time Ranges

  • Match the interval to the query's time range:
    • 1 hour → 1m interval
    • 24 hours → 1h interval
    • 7 days → 6h or 1d interval
    • 30 days → 1d interval

📈 Use Cases

1. System Health Monitoring

Monitor overall system health with real-time dashboards showing:

  • Request throughput and latency
  • Error rates and success rates
  • Active nodes and network health
  • Resource utilization

2. Node Performance Analysis

Track individual node performance:

  • Identify underperforming nodes
  • Monitor reputation trends
  • Track reward earnings
  • Analyze uptime patterns

3. Capacity Planning

Use historical data for capacity planning:

  • Storage growth trends
  • Request volume patterns
  • Node scaling requirements
  • Resource allocation optimization

4. SLA Monitoring

Track service level agreements:

  • P95/P99 latency monitoring
  • Uptime tracking
  • Consensus success rates
  • Error budget management

5. Anomaly Detection

Identify unusual patterns:

  • Sudden latency spikes
  • Reputation drops
  • Failed consensus rounds
  • Storage failures
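Sudden latency spikes can be flagged with a rolling z-score. This is one common technique, chosen here for illustration; the document does not say how AevovOP detects anomalies, and `spikes`, the window size, and the threshold `k` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// spikes returns indexes where a value exceeds the mean of the preceding
// window by more than k standard deviations (a rolling z-score test).
func spikes(series []float64, window int, k float64) []int {
	var out []int
	for i := window; i < len(series); i++ {
		prev := series[i-window : i]
		mean := 0.0
		for _, v := range prev {
			mean += v
		}
		mean /= float64(window)
		variance := 0.0
		for _, v := range prev {
			variance += (v - mean) * (v - mean)
		}
		std := math.Sqrt(variance / float64(window))
		// A perfectly flat window has std 0; any increase then counts as a spike.
		if (std == 0 && series[i] > mean) || (std > 0 && (series[i]-mean)/std > k) {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	latency := []float64{42, 43, 41, 44, 42, 43, 41, 120, 43, 42}
	fmt.Println(spikes(latency, 5, 3)) // [7]
}
```

Because the spike stays in the trailing window for the next few points, it inflates the baseline and can briefly mask a second spike; a median-based baseline is a common alternative when that matters.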

Built with ❤️ for AevovOP Analytics & Monitoring System