Change Failure Rate Calculator

Overview

Change Failure Rate (CFR) is one of the four DORA metrics. It measures the percentage of production deployments that cause a failure requiring immediate remediation (hotfix, rollback, fix forward, or patch).

What is Change Failure Rate?

Change Failure Rate represents the stability of your deployment process. It answers the question: "What percentage of our deployments result in degraded service?"

Formula

Change Failure Rate = (Failed Deployments / Total Deployments) × 100
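
As a worked example of the formula, the sketch below computes CFR from two counts. The function name `change_failure_rate` and the choice to return `0.0` when there are no deployments are assumptions for illustration, not necessarily how `change_failure_rate.py` behaves.

```python
def change_failure_rate(failed: int, total: int) -> float:
    """Return CFR as a percentage: (failed / total) * 100."""
    if total < 0 or failed < 0 or failed > total:
        raise ValueError("expected 0 <= failed <= total")
    if total == 0:
        # Assumption: with no deployments in the window, report 0.0
        # instead of dividing by zero.
        return 0.0
    # Multiplying first is equivalent and avoids a small rounding error.
    return 100 * failed / total


# 3 failed deployments out of 40 total
print(f"{change_failure_rate(3, 40):.1f}%")  # prints 7.5%
```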

Script Components

1. change_failure_rate.py

Main Python script that calculates the change failure rate from a CSV file, a DynamoDB table, or a CI/CD system (Jenkins, GitHub, or GitLab).

2. deployment_status_table.yaml

CloudFormation template for creating a DynamoDB table to track deployment statuses.

Installation

Prerequisites

Python 3.7+
AWS CLI configured (if using DynamoDB)
Required Python packages:
```
pip install -r requirements.txt
```

AWS Infrastructure Setup

Deploy the DynamoDB table for tracking deployment statuses:

aws cloudformation create-stack \
  --stack-name DeploymentStatusTableStack \
  --template-body file://deployment_status_table.yaml \
  --region YOUR_AWS_REGION \
  --capabilities CAPABILITY_NAMED_IAM

Usage

Basic Usage

python change_failure_rate.py --start-date 2024-01-01 --end-date 2024-01-31

With Different Data Sources

From CSV File

python change_failure_rate.py --source csv --file deployments.csv

From DynamoDB

python change_failure_rate.py --source dynamodb --table deployment-status

From CI/CD API

python change_failure_rate.py --source jenkins --url https://jenkins.example.com

Output Formats

# JSON output
python change_failure_rate.py --output json > cfr_report.json

# CSV output
python change_failure_rate.py --output csv > cfr_report.csv

# Dashboard-ready format
python change_failure_rate.py --output grafana

Data Format

Expected CSV Format

deployment_id,timestamp,status,environment,service,rollback_required
deploy-001,2024-01-15T10:30:00Z,success,production,api-service,false
deploy-002,2024-01-15T14:20:00Z,failed,production,web-app,true
deploy-003,2024-01-15T16:45:00Z,success,production,database,false
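
One way to load this format, sketched with only the standard library. The `Deployment` dataclass and `load_deployments` helper are hypothetical names for illustration; the actual script's parsing may differ. The trailing `Z` is rewritten to `+00:00` because `datetime.fromisoformat` only accepts `Z` from Python 3.11 onward.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Deployment:
    deployment_id: str
    timestamp: datetime
    status: str
    environment: str
    service: str
    rollback_required: bool


def load_deployments(lines):
    """Parse rows in the CSV layout above from any iterable of text lines."""
    deployments = []
    for row in csv.DictReader(lines):
        deployments.append(
            Deployment(
                deployment_id=row["deployment_id"],
                # Rewrite the trailing "Z" so this also works before Python 3.11.
                timestamp=datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00")),
                status=row["status"].strip().lower(),
                environment=row["environment"],
                service=row["service"],
                rollback_required=row["rollback_required"].strip().lower() == "true",
            )
        )
    return deployments


SAMPLE = """\
deployment_id,timestamp,status,environment,service,rollback_required
deploy-001,2024-01-15T10:30:00Z,success,production,api-service,false
deploy-002,2024-01-15T14:20:00Z,failed,production,web-app,true
deploy-003,2024-01-15T16:45:00Z,success,production,database,false
"""

deployments = load_deployments(io.StringIO(SAMPLE))
print(len(deployments), deployments[1].status, deployments[1].rollback_required)
```

For a file on disk, pass an open handle instead, for example `load_deployments(open("deployments.csv", newline=""))`.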

Status Values

success: Deployment completed without issues
failed: Deployment caused service degradation
partial: Deployment partially failed (counts as failure)
rolled_back: Deployment was rolled back (counts as failure)
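
The status rules above can be sketched as a set lookup. `FAILURE_STATUSES` and `cfr_from_statuses` are hypothetical names; rejecting unknown statuses rather than silently ignoring them is an assumption of this sketch.

```python
FAILURE_STATUSES = {"failed", "partial", "rolled_back"}
KNOWN_STATUSES = FAILURE_STATUSES | {"success"}


def cfr_from_statuses(statuses):
    """Return the change failure rate (percent) for an iterable of status strings."""
    total = 0
    failed = 0
    for status in statuses:
        normalized = status.strip().lower()
        if normalized not in KNOWN_STATUSES:
            # Assumption: surface unexpected values rather than miscounting them.
            raise ValueError(f"unknown status: {status!r}")
        total += 1
        if normalized in FAILURE_STATUSES:
            failed += 1
    return 100 * failed / total if total else 0.0


# 3 of these 5 deployments count as failures
print(cfr_from_statuses(["success", "failed", "partial", "success", "rolled_back"]))  # prints 60.0
```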

Integration Examples

GitHub Actions

- name: Calculate Change Failure Rate
  run: |
    python scripts/ChangeFailureRate/change_failure_rate.py \
      --source github \
      --repo ${{ github.repository }} \
      --token ${{ secrets.GITHUB_TOKEN }}

Jenkins Pipeline

stage('Calculate CFR') {
    steps {
        sh '''
            python scripts/ChangeFailureRate/change_failure_rate.py \
              --source jenkins \
              --job ${JOB_NAME} \
              --build ${BUILD_NUMBER}
        '''
    }
}

GitLab CI

calculate-cfr:
  script:
    - python scripts/ChangeFailureRate/change_failure_rate.py
        --source gitlab
        --project $CI_PROJECT_ID
        --token $GITLAB_TOKEN

Performance Benchmarks

Industry Standards (2023 State of DevOps Report)

Elite: 0-5%
High: 6-15%
Medium: 16-30%
Low: > 30%
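
A small sketch of mapping a CFR value onto the tiers listed above. The function name `performance_tier` is hypothetical, and treating each band's upper limit as inclusive (so that a value such as 5.5% falls into High) is an assumption, since the list leaves gaps between bands.

```python
def performance_tier(cfr_percent: float) -> str:
    """Map a CFR percentage to the tier names listed above."""
    if not 0 <= cfr_percent <= 100:
        raise ValueError("CFR must be between 0 and 100")
    # Assumption: upper bounds are inclusive; anything above 30% is Low.
    for upper_bound, tier in ((5, "Elite"), (15, "High"), (30, "Medium")):
        if cfr_percent <= upper_bound:
            return tier
    return "Low"


for value in (2.0, 5.5, 15.0, 42.0):
    print(value, performance_tier(value))
```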

Improvement Strategies

Increase test coverage - Catch issues before production
Implement feature flags - Reduce impact of failures
Use canary deployments - Detect issues with minimal impact
Improve rollback procedures - Faster recovery from failures
Enhance monitoring - Detect issues quickly

Troubleshooting

Common Issues

No Data Returned

Verify date range includes deployments
Check data source connectivity
Ensure proper authentication

Incorrect Calculations

Validate status field values
Check for duplicate deployment records
Verify timezone handling
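
Two of these checks, duplicate records and timezone handling, can be sketched as a preprocessing step. This assumes records are dicts with `deployment_id` and ISO 8601 `timestamp` keys as in the CSV format. `to_utc` and `normalize_records` are hypothetical helpers, and both treating offset-less timestamps as UTC and keeping the latest record per ID are assumptions of this sketch.

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def normalize_records(records):
    """Convert timestamps to UTC and keep the latest record per deployment_id."""
    latest = {}
    for record in records:
        normalized = dict(record, timestamp=to_utc(record["timestamp"]))
        key = normalized["deployment_id"]
        if key not in latest or normalized["timestamp"] >= latest[key]["timestamp"]:
            latest[key] = normalized
    return sorted(latest.values(), key=lambda r: r["timestamp"])


records = [
    {"deployment_id": "deploy-001", "timestamp": "2024-01-15T10:30:00Z", "status": "success"},
    {"deployment_id": "deploy-001", "timestamp": "2024-01-15T12:45:00+02:00", "status": "rolled_back"},
    {"deployment_id": "deploy-002", "timestamp": "2024-01-15T14:20:00", "status": "failed"},
]
for record in normalize_records(records):
    print(record["deployment_id"], record["timestamp"].isoformat(), record["status"])
```

Here the second `deploy-001` record (10:45 UTC once converted) supersedes the first, so the deployment is counted once, as a failure.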

Performance Issues

Use date filters to limit data range
Implement pagination for large datasets
Consider data aggregation for historical data

Debug Mode

python change_failure_rate.py --debug --verbose

Advanced Features

Filtering by Service

python change_failure_rate.py --service api-gateway --service user-service

Excluding Environments

python change_failure_rate.py --exclude-env staging --exclude-env development
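
The service and environment filters can be thought of as two predicates applied before counting. A sketch, assuming records are dicts with `service` and `environment` keys as in the CSV format; `filter_deployments` is a hypothetical helper, not the script's internal API.

```python
def filter_deployments(records, services=None, exclude_envs=None):
    """Keep records in `services` (if given) whose environment is not excluded."""
    wanted = set(services or [])
    excluded = set(exclude_envs or [])
    return [
        record
        for record in records
        if (not wanted or record["service"] in wanted)
        and record["environment"] not in excluded
    ]


records = [
    {"service": "api-gateway", "environment": "production"},
    {"service": "api-gateway", "environment": "staging"},
    {"service": "billing", "environment": "production"},
]
kept = filter_deployments(
    records,
    services=["api-gateway", "user-service"],
    exclude_envs=["staging", "development"],
)
print(kept)
```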

Trend Analysis

python change_failure_rate.py --trend weekly --weeks 12
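
Weekly trend analysis amounts to bucketing deployments by week and computing CFR per bucket. A sketch, assuming ISO weeks (Monday start) and `(timestamp, status)` pairs as input; `weekly_cfr` is a hypothetical name, and the script may define weeks differently.

```python
from collections import defaultdict
from datetime import datetime

FAILURE_STATUSES = {"failed", "partial", "rolled_back"}


def weekly_cfr(deployments):
    """Return {(iso_year, iso_week): cfr_percent} for (timestamp, status) pairs."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for timestamp, status in deployments:
        iso = timestamp.isocalendar()
        week = (iso[0], iso[1])  # index access works on Python 3.7 and 3.9+ alike
        totals[week] += 1
        if status in FAILURE_STATUSES:
            failures[week] += 1
    return {week: 100 * failures[week] / totals[week] for week in sorted(totals)}


deployments = [
    (datetime(2024, 1, 8, 9, 0), "success"),
    (datetime(2024, 1, 10, 15, 0), "failed"),
    (datetime(2024, 1, 16, 11, 0), "success"),
    (datetime(2024, 1, 18, 11, 0), "success"),
]
for week, cfr in weekly_cfr(deployments).items():
    print(week, f"{cfr:.1f}%")
```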

Alerting Integration

python change_failure_rate.py --alert-threshold 15 --alert-webhook $SLACK_WEBHOOK
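
A sketch of the threshold check and webhook call. It assumes the webhook is a Slack incoming webhook, which accepts a JSON body with a `text` field. `build_alert` and `send_alert` are hypothetical helpers, and alerting only when CFR is strictly above the threshold is an assumption; the script may alert at the threshold as well.

```python
import json
import urllib.request


def build_alert(cfr_percent: float, threshold: float):
    """Return a Slack-style payload if CFR exceeds the threshold, else None."""
    if cfr_percent <= threshold:
        return None
    return {"text": f"Change failure rate is {cfr_percent:.1f}% (threshold {threshold:.1f}%)"}


def send_alert(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_alert(cfr_percent=18.2, threshold=15)
print(payload)
```

The example only builds the payload. Delivering it would mean calling `send_alert` with the real webhook URL, which is best read from an environment variable rather than hard-coded.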

Best Practices

Track all production deployments - Include hotfixes and rollbacks
Define "failure" clearly - Document what constitutes a failed deployment
Automate data collection - Reduce manual tracking errors
Review trends regularly - Weekly or sprint retrospectives
Correlate with other metrics - Balance speed with stability

Related Scripts

Contributing

See the main Contributing Guide for details on submitting improvements.
