Change Failure Rate (CFR) is one of the four DORA metrics. It measures the percentage of production deployments that cause a failure requiring immediate remediation (hotfix, rollback, fix forward, or patch).
Change Failure Rate represents the stability of your deployment process. It answers the question: "What percentage of our deployments result in degraded service?"
Change Failure Rate = (Failed Deployments / Total Deployments) × 100
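As a worked example of the formula above, here is a minimal Python sketch. The function name `change_failure_rate` and the choice to return 0% for an empty window are assumptions made for illustration; they are not taken from the script in this directory.

```python
def change_failure_rate(failed_deployments: int, total_deployments: int) -> float:
    """Return CFR as a percentage of total deployments."""
    if failed_deployments < 0 or total_deployments < 0:
        raise ValueError("counts must be non-negative")
    if failed_deployments > total_deployments:
        raise ValueError("failed deployments cannot exceed total deployments")
    if total_deployments == 0:
        # Assumption: report 0% instead of raising when the window is empty.
        return 0.0
    return failed_deployments / total_deployments * 100


# Example: 3 failed deployments out of 40 in the reporting window.
print(f"{change_failure_rate(3, 40):.1f}%")  # 7.5%
```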
- change_failure_rate.py - Main Python script that calculates the change failure rate from various data sources.
- deployment_status_table.yaml - CloudFormation template for creating a DynamoDB table to track deployment statuses.
- Python 3.7+
- AWS CLI configured (if using DynamoDB)
- Required Python packages:
pip install -r requirements.txt
Deploy the DynamoDB table for tracking deployment statuses:
aws cloudformation create-stack \
--stack-name DeploymentStatusTableStack \
--template-body file://deployment_status_table.yaml \
--region YOUR_AWS_REGION \
--capabilities CAPABILITY_NAMED_IAM

python change_failure_rate.py --start-date 2024-01-01 --end-date 2024-01-31

python change_failure_rate.py --source csv --file deployments.csv

python change_failure_rate.py --source dynamodb --table deployment-status

python change_failure_rate.py --source jenkins --url https://jenkins.example.com

# JSON output
python change_failure_rate.py --output json > cfr_report.json
# CSV output
python change_failure_rate.py --output csv > cfr_report.csv
# Dashboard-ready format
python change_failure_rate.py --output grafana

deployment_id,timestamp,status,environment,service,rollback_required
deploy-001,2024-01-15T10:30:00Z,success,production,api-service,false
deploy-002,2024-01-15T14:20:00Z,failed,production,web-app,true
deploy-003,2024-01-15T16:45:00Z,success,production,database,false

- success: Deployment completed without issues
- failed: Deployment caused service degradation
- partial: Deployment partially failed (counts as failure)
- rolled_back: Deployment was rolled back (counts as failure)
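The status rules above can be applied to the CSV format as in the following sketch. It is an illustration only, not the actual logic of `change_failure_rate.py`; the names `FAILURE_STATUSES`, `KNOWN_STATUSES`, and `cfr_from_csv` are hypothetical, and rejecting unknown status values is an assumed policy.

```python
import csv
import io

# Statuses that the list above counts as failures.
FAILURE_STATUSES = {"failed", "partial", "rolled_back"}
KNOWN_STATUSES = FAILURE_STATUSES | {"success"}


def cfr_from_csv(csv_text: str) -> float:
    """Compute CFR (percent) from CSV text with the columns shown above."""
    total = failed = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        status = row["status"].strip().lower()
        if status not in KNOWN_STATUSES:
            # Assumed policy: fail loudly on values outside the documented set.
            raise ValueError(f"unknown status {status!r} for {row['deployment_id']}")
        total += 1
        if status in FAILURE_STATUSES:
            failed += 1
    return failed / total * 100 if total else 0.0


sample = """deployment_id,timestamp,status,environment,service,rollback_required
deploy-001,2024-01-15T10:30:00Z,success,production,api-service,false
deploy-002,2024-01-15T14:20:00Z,failed,production,web-app,true
deploy-003,2024-01-15T16:45:00Z,success,production,database,false
"""
print(round(cfr_from_csv(sample), 1))  # 33.3
```

With the three sample rows, one deployment out of three failed, so the result is roughly 33.3%.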
- name: Calculate Change Failure Rate
  run: |
    python scripts/ChangeFailureRate/change_failure_rate.py \
      --source github \
      --repo ${{ github.repository }} \
      --token ${{ secrets.GITHUB_TOKEN }}

stage('Calculate CFR') {
    steps {
        sh '''
            python scripts/ChangeFailureRate/change_failure_rate.py \
                --source jenkins \
                --job ${JOB_NAME} \
                --build ${BUILD_NUMBER}
        '''
    }
}

calculate-cfr:
  script:
    - python scripts/ChangeFailureRate/change_failure_rate.py
      --source gitlab
      --project $CI_PROJECT_ID
      --token $GITLAB_TOKEN

- Elite: 0-5%
- High: 6-15%
- Medium: 16-30%
- Low: > 30%
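The bands above can be turned into a small lookup, sketched below. The function name `performance_tier` is hypothetical. Because the listed bands use whole numbers, the sketch assumes each upper bound is inclusive, so a fractional value such as 5.5% falls into the next band.

```python
def performance_tier(cfr_percent: float) -> str:
    """Map a CFR percentage to the benchmark bands listed above."""
    if not 0 <= cfr_percent <= 100:
        raise ValueError("CFR must be between 0 and 100")
    # Assumption: upper bounds are inclusive (5.0 is Elite, 5.5 is High).
    if cfr_percent <= 5:
        return "Elite"
    if cfr_percent <= 15:
        return "High"
    if cfr_percent <= 30:
        return "Medium"
    return "Low"


print(performance_tier(7.5))  # High
```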
- Increase test coverage - Catch issues before production
- Implement feature flags - Reduce impact of failures
- Use canary deployments - Detect issues with minimal impact
- Improve rollback procedures - Faster recovery from failures
- Enhance monitoring - Detect issues quickly
- Verify date range includes deployments
- Check data source connectivity
- Ensure proper authentication
- Validate status field values
- Check for duplicate deployment records
- Verify timezone handling
- Use date filters to limit data range
- Implement pagination for large datasets
- Consider data aggregation for historical data
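Two of the checks above, duplicate records and timezone handling, can be sketched as follows. The helper names `parse_utc` and `dedupe_and_filter` are hypothetical, and two policies are assumptions: timestamps without an offset are treated as UTC, and the last record seen for a `deployment_id` wins.

```python
from datetime import datetime, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC."""
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so it is mapped to an explicit offset first.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def dedupe_and_filter(records, start, end):
    """Keep one record per deployment_id (last wins) with start <= time < end."""
    latest = {}
    for record in records:
        latest[record["deployment_id"]] = record
    return [r for r in latest.values() if start <= parse_utc(r["timestamp"]) < end]


records = [
    {"deployment_id": "deploy-001", "timestamp": "2024-01-31T23:30:00Z", "status": "success"},
    {"deployment_id": "deploy-001", "timestamp": "2024-01-31T23:30:00Z", "status": "rolled_back"},
    # 00:30 at +02:00 is 22:30 UTC on 31 January, so it belongs to January.
    {"deployment_id": "deploy-002", "timestamp": "2024-02-01T00:30:00+02:00", "status": "failed"},
]
january = dedupe_and_filter(
    records,
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 2, 1, tzinfo=timezone.utc),
)
print([r["status"] for r in january])  # ['rolled_back', 'failed']
```

Comparing in UTC avoids counting a deployment in the wrong reporting window when data sources report local times.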
python change_failure_rate.py --debug --verbose

python change_failure_rate.py --service api-gateway --service user-service

python change_failure_rate.py --exclude-env staging --exclude-env development

python change_failure_rate.py --trend weekly --weeks 12

python change_failure_rate.py --alert-threshold 15 --alert-webhook $SLACK_WEBHOOK

- Track all production deployments - Include hotfixes and rollbacks
- Define "failure" clearly - Document what constitutes a failed deployment
- Automate data collection - Reduce manual tracking errors
- Review trends regularly - Weekly or sprint retrospectives
- Correlate with other metrics - Balance speed with stability
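For regular trend reviews, CFR can be grouped by ISO week, as in the sketch below. This illustrates the idea only and is not how the script's trend option is implemented; the name `weekly_cfr` and the `(timestamp, status)` input shape are assumptions.

```python
from collections import defaultdict
from datetime import datetime

FAILURE_STATUSES = {"failed", "partial", "rolled_back"}


def weekly_cfr(deployments):
    """Map ISO (year, week) to CFR percent for (timestamp, status) pairs."""
    counts = defaultdict(lambda: [0, 0])  # week -> [failed, total]
    for timestamp, status in deployments:
        parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        iso = parsed.isocalendar()
        week = (iso[0], iso[1])
        counts[week][1] += 1
        if status in FAILURE_STATUSES:
            counts[week][0] += 1
    return {
        week: failed / total * 100
        for week, (failed, total) in sorted(counts.items())
    }


history = [
    ("2024-01-08T10:00:00Z", "success"),
    ("2024-01-10T10:00:00Z", "failed"),
    ("2024-01-16T09:00:00Z", "success"),
    ("2024-01-18T09:00:00Z", "success"),
]
print(weekly_cfr(history))  # {(2024, 2): 50.0, (2024, 3): 0.0}
```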
See the main Contributing Guide for details on submitting improvements.