DORA Metrics: The Complete Guide
Everything you need to know about measuring and improving software delivery performance with DORA metrics.
What Are DORA Metrics?
The industry standard for measuring software delivery performance
DORA metrics are four key metrics that measure software delivery performance and operational efficiency. Developed by the DevOps Research and Assessment (DORA) team through years of research analyzing thousands of teams, these metrics have become the industry standard for measuring engineering excellence.
The DORA research, published annually in the State of DevOps Report, found that high-performing teams consistently excel across these four metrics. More importantly, teams that improve their DORA metrics see measurable improvements in business outcomes: faster time to market, higher customer satisfaction, and better employee morale.
Why DORA Metrics Matter
• Objective Performance Measurement: Unlike subjective metrics like story points, DORA metrics measure actual outcomes.
• Industry Benchmarks: Compare your team against thousands of other teams to understand where you stand.
• Proven Business Impact: Teams that improve DORA metrics deliver more value faster with higher quality.
• Comprehensive View: Covers both velocity (how fast you ship) and quality (how reliable your releases are).
The 4 DORA Metrics Explained
Deep dive into each metric
1. Deployment Frequency
How often your team deploys code to production
What it measures: The number of times your team successfully releases code to production. This could be daily, weekly, monthly, or on-demand depending on your team's maturity.
Why it matters: Frequent deployments mean you can deliver value to customers faster, get feedback sooner, and reduce the risk of each deployment. Small, frequent releases are easier to debug and roll back than large, infrequent ones.
How to calculate: Count the number of successful production deployments in a given time period (typically measured per day, week, or month).
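As a minimal illustration, here is a sketch of that calculation in Python, assuming you already have a list of production deployment timestamps (the sample data below is hypothetical):

```python
from datetime import datetime, timedelta

# Timestamps of successful production deployments (hypothetical sample data).
deployments = [
    datetime(2024, 6, 3, 10, 15),
    datetime(2024, 6, 3, 16, 40),
    datetime(2024, 6, 5, 9, 5),
    datetime(2024, 6, 10, 11, 30),
]

def deployment_frequency(deploys, start, end):
    """Average number of successful deployments per week within [start, end)."""
    in_window = [d for d in deploys if start <= d < end]
    weeks = (end - start) / timedelta(weeks=1)
    return len(in_window) / weeks

freq = deployment_frequency(deployments, datetime(2024, 6, 1), datetime(2024, 6, 15))
print(f"{freq:.1f} deploys/week")
```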
Elite Performance Benchmark:
Multiple deployments per day (on-demand deployment)
2. Lead Time for Changes
How long it takes for code to go from commit to production
What it measures: The time it takes for a code change to go from version control (commit) to successfully running in production. This includes coding time, review time, testing time, and deployment time.
Why it matters: Short lead times enable faster feedback loops. You can respond to customer needs quickly, fix bugs faster, and experiment more frequently. Long lead times mean slow time-to-market and delayed value delivery.
How to calculate: Measure the time from first commit to production deployment. Track both median and 95th percentile to understand typical performance and worst-case scenarios.
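A minimal sketch of that calculation, assuming you can pair each change's first commit time with its production deployment time (sample data is hypothetical):

```python
from datetime import datetime, timedelta
from statistics import median, quantiles

# (first_commit_time, production_deploy_time) pairs — hypothetical sample data.
changes = [
    (datetime(2024, 6, 3, 9, 0),  datetime(2024, 6, 3, 15, 30)),
    (datetime(2024, 6, 4, 10, 0), datetime(2024, 6, 5, 11, 0)),
    (datetime(2024, 6, 6, 8, 0),  datetime(2024, 6, 6, 20, 45)),
]

lead_times_hours = [(deploy - commit) / timedelta(hours=1) for commit, deploy in changes]

print(f"Median lead time: {median(lead_times_hours):.1f}h")
# quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
print(f"p95 lead time:    {quantiles(lead_times_hours, n=20)[-1]:.1f}h")
```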
Elite Performance Benchmark:
Less than one day (often measured in hours)
3. Change Failure Rate
What percentage of deployments cause failures in production
What it measures: The percentage of changes (deployments) that result in degraded service or require remediation (hotfix, rollback, patch) in production. Essentially, how often your deployments break something.
Why it matters: High change failure rates indicate quality issues in your development and testing processes. They erode customer trust and force engineers to spend time on firefighting instead of building new features.
How to calculate: (Number of failed deployments / Total number of deployments) × 100. Define what constitutes a "failure" clearly—typically deployments that require rollback, hotfix, or result in service degradation.
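In code, the formula itself is a one-liner; the real work is deciding which deployments count as failures. A minimal sketch:

```python
def change_failure_rate(total_deployments: int, failed_deployments: int) -> float:
    """Percentage of deployments that needed a rollback or hotfix, or caused degradation."""
    if total_deployments == 0:
        return 0.0
    return failed_deployments / total_deployments * 100

# Hypothetical month: 42 deployments, 3 of which needed remediation.
print(f"{change_failure_rate(42, 3):.1f}%")  # -> 7.1%
```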
Elite Performance Benchmark:
0-15% of deployments result in failures
4. Mean Time to Recovery (MTTR)
How long it takes to restore service after a failure
What it measures: The average time it takes to restore service when a production incident occurs. Measured from when the incident starts (service degrades) to when service is fully restored.
Why it matters: Even elite teams have incidents. What separates them is how quickly they recover. Fast MTTR minimizes customer impact, reduces revenue loss, and indicates strong incident response processes.
How to calculate: Sum of all recovery times / Number of incidents. Track both mean and median. Long recovery times often indicate poor monitoring, unclear runbooks, or lack of rollback capabilities.
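A minimal sketch of the calculation, assuming you have the start and resolution time of each incident (sample data is hypothetical):

```python
from datetime import datetime, timedelta
from statistics import mean, median

# (incident_start, service_restored) pairs — hypothetical sample data.
incidents = [
    (datetime(2024, 6, 2, 14, 0), datetime(2024, 6, 2, 14, 35)),
    (datetime(2024, 6, 9, 3, 20), datetime(2024, 6, 9, 5, 0)),
    (datetime(2024, 6, 20, 9, 0), datetime(2024, 6, 20, 9, 18)),
]

recovery_minutes = [(restored - started) / timedelta(minutes=1) for started, restored in incidents]

print(f"MTTR (mean):   {mean(recovery_minutes):.0f} min")
print(f"MTTR (median): {median(recovery_minutes):.0f} min")
```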
Elite Performance Benchmark:
Less than one hour (often measured in minutes)
DORA Performance Levels
Where does your team stand?
The DORA research categorizes teams into four performance levels: Elite, High, Medium, and Low. These benchmarks help you understand where your team stands and set realistic improvement goals.
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | On-demand (multiple per day) | Between once per week and once per month | Between once per month and once every 6 months | Fewer than once per 6 months |
| Lead Time for Changes | Less than one day | Between one day and one week | Between one week and one month | Between one month and six months |
| Change Failure Rate | 0-15% | 16-30% | 31-45% | 46-60% |
| Mean Time to Recovery | Less than one hour | Less than one day | Between one day and one week | More than one week |
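If you want to classify your own numbers programmatically, here is a minimal sketch that encodes two rows of the table above (the thresholds come straight from the table):

```python
def classify_lead_time(days: float) -> str:
    """Map median lead time for changes (in days) onto the levels in the table above."""
    if days < 1:
        return "Elite"
    if days <= 7:
        return "High"
    if days <= 30:
        return "Medium"
    return "Low"

def classify_change_failure_rate(percent: float) -> str:
    """Map change failure rate (as a percentage) onto the levels in the table above."""
    if percent <= 15:
        return "Elite"
    if percent <= 30:
        return "High"
    if percent <= 45:
        return "Medium"
    return "Low"

print(classify_lead_time(3.5))            # -> High
print(classify_change_failure_rate(7.1))  # -> Elite
```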
What the Data Shows
The DORA research found that elite performers are not just incrementally better—they're orders of magnitude better:
• Elite teams deploy 208x more frequently than low performers
• Elite teams have 106x faster lead time for changes
• Elite teams have 7x lower change failure rates
• Elite teams recover from incidents 2,604x faster
How to Measure DORA Metrics
Tools and approaches for tracking
Manual Tracking
Start here if you're just getting started with DORA metrics:
• Spreadsheet with deployment dates and outcomes
• Jira tickets for tracking incidents
• Git commits with timestamps
• Manual calculation weekly or monthly
Pros: Free, simple to start
Cons: Time-consuming, error-prone, hard to scale
Automated Tools
Recommended for consistent, accurate tracking:
• Connect to GitHub/GitLab/Bitbucket (see the example after this list)
• Automatic deployment tracking via CI/CD
• Incident detection from monitoring tools
• Real-time dashboards and trends
Pros: Accurate, real-time, scalable
Cons: Requires initial setup and may come with a cost
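As one example of what "connect to GitHub" can look like under the hood, here is a minimal sketch that counts production deployments via the GitHub REST API. It assumes your CI/CD pipeline creates GitHub Deployment records and that a token is available in GITHUB_TOKEN; the owner and repo names are placeholders:

```python
import os
import requests

OWNER, REPO = "your-org", "your-repo"  # placeholders

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/deployments",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    params={"environment": "production", "per_page": 100},
)
resp.raise_for_status()

# Each deployment record carries a created_at timestamp you can feed into the
# deployment frequency calculation shown earlier.
timestamps = [d["created_at"] for d in resp.json()]
print(f"{len(timestamps)} production deployments on the latest page")
```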
Data Sources for DORA Metrics
Deployment Frequency:
CI/CD tools (GitHub Actions, CircleCI, Jenkins), deployment logs, release tags in Git (see the sketch after this list)
Lead Time for Changes:
Git commit timestamps, PR merge times, deployment timestamps
Change Failure Rate:
Incident management tools (PagerDuty, Opsgenie), rollback logs, hotfix deployments
Mean Time to Recovery:
Incident tickets (Jira, Linear), monitoring alerts (Datadog, New Relic), incident timestamps
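Release tags are one of the easiest sources to start with. Here is a minimal sketch that reads deployment timestamps from Git tags, assuming every production release is tagged with a v-prefixed version:

```python
import subprocess

# List release tags with their creation timestamps; assumes production
# releases are tagged like v1.2.3.
out = subprocess.run(
    ["git", "tag", "--list", "v*",
     "--format=%(refname:short) %(creatordate:iso8601)"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.strip().splitlines():
    tag, _, timestamp = line.partition(" ")
    print(tag, timestamp)
```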
How to Improve Each DORA Metric
Actionable playbooks for improvement
Improving Deployment Frequency
Automate Your Pipeline
Implement CI/CD to remove manual deployment steps. Elite teams deploy on-demand because every merge to main triggers automatic deployment.
Reduce Batch Size
Deploy smaller changes more frequently instead of large releases. Smaller PRs are faster to review and safer to deploy.
Remove Manual Approvals
Manual approval gates slow deployment frequency. Replace them with automated quality checks and feature flags for gradual rollouts.
Use Trunk-Based Development
Long-lived feature branches slow deployments. Merge to main frequently and use feature flags to hide incomplete features.
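Several of these tips lean on feature flags. A minimal sketch of the idea, using a hypothetical in-memory flag store (real teams usually back this with a flag service or config store so flags can change without a deploy):

```python
FLAGS = {"new-checkout-flow": False}  # hypothetical flag store

def is_enabled(flag_name: str) -> bool:
    return FLAGS.get(flag_name, False)

def legacy_checkout(cart: list) -> str:
    return f"legacy checkout for {len(cart)} items"

def new_checkout(cart: list) -> str:
    # Incomplete work can merge to main and stay dark behind the flag.
    return f"new checkout for {len(cart)} items"

def checkout(cart: list) -> str:
    return new_checkout(cart) if is_enabled("new-checkout-flow") else legacy_checkout(cart)

print(checkout(["book", "mug"]))  # -> legacy checkout for 2 items
```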
Improving Lead Time for Changes
Speed Up Code Reviews
PRs waiting for review are one of the biggest contributors to long lead times. Set 24-hour SLAs for reviews and use smart nudges to prevent PRs from getting stuck.
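As an illustration of what a smart nudge can look like, here is a minimal sketch that flags open PRs older than 24 hours using the GitHub REST API (the repo name is a placeholder, and a real nudge would go to Slack rather than stdout):

```python
import os
from datetime import datetime, timedelta, timezone
import requests

resp = requests.get(
    "https://api.github.com/repos/your-org/your-repo/pulls",  # placeholder repo
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    params={"state": "open", "per_page": 100},
)
resp.raise_for_status()

cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
for pr in resp.json():
    opened = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
    if opened < cutoff:
        reviewers = [r["login"] for r in pr["requested_reviewers"]] or ["the team"]
        print(f"PR #{pr['number']} has waited over 24h — nudge {', '.join(reviewers)}")
```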
Automate Testing
Manual QA cycles add days to lead time. Invest in automated tests that run in CI/CD. Elite teams have comprehensive test suites that run in minutes.
Reduce Work in Progress
Too many open PRs increase lead time. Limit work in progress and finish what you've started before picking up new tasks.
Break Down Large Changes
Large PRs take longer to review and test. Break features into smaller, independently deployable pieces that can ship incrementally.
Improving Change Failure Rate
Improve Test Coverage
Bugs that reach production indicate gaps in testing. Focus on integration and end-to-end tests for critical paths. Aim for 80%+ coverage on core features.
Implement Gradual Rollouts
Use canary deployments or feature flags to test changes with small user groups before full rollout. Catch issues early with minimal impact.
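A minimal sketch of a percentage-based rollout: each user is bucketed deterministically, so the same user always sees the same variant while you widen the percentage (the feature name and user IDs are hypothetical):

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically bucket a user into 0-99 and include them if below the threshold."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

# Start at 5% of users, watch error rates and metrics, then widen the rollout.
print(in_rollout("user-42", "new-search", percent=5))
```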
Better Monitoring and Alerts
Many failures go undetected initially. Implement comprehensive monitoring for errors, performance, and user experience. Alert on anomalies immediately.
Learn from Failures
Conduct blameless post-mortems for every production incident. Document root causes and implement preventive measures. Build institutional knowledge.
Improving Mean Time to Recovery
One-Click Rollbacks
The fastest way to recover is often to roll back. Ensure your deployment pipeline supports instant rollbacks. Elite teams can roll back in seconds.
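What "one-click" means depends on your platform; as one example, here is a minimal sketch of a scripted rollback for a service running on Kubernetes (the deployment and namespace names are placeholders):

```python
import subprocess

def rollback(deployment: str, namespace: str = "production") -> None:
    """Revert the deployment to its previous revision via kubectl."""
    subprocess.run(
        ["kubectl", "rollout", "undo", f"deployment/{deployment}",
         "--namespace", namespace],
        check=True,
    )

rollback("checkout-service")  # placeholder service name
```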
Clear Runbooks
Document incident response procedures for common issues. When incidents happen, engineers shouldn't waste time figuring out what to do. Runbooks save hours.
Improve Observability
The faster you can identify the root cause, the faster you recover. Invest in logging, tracing, and metrics. Make it easy to diagnose issues quickly.
Practice Incident Response
Run incident drills and game days. Practice recovering from failures in non-critical scenarios so your team is prepared when real incidents occur.
Common DORA Metrics Mistakes
Avoid these pitfalls
Mistake #1: Measuring Without Context
Tracking DORA metrics without understanding why they matter leads to "metrics theater." Teams game the numbers instead of improving outcomes. Always connect metrics to business goals and team health.
Mistake #2: Using Metrics for Individual Performance
DORA metrics measure team and system performance, not individual performance. Using them for performance reviews or to compare developers creates perverse incentives and destroys psychological safety.
Mistake #3: Optimizing One Metric at the Expense of Others
Deploying more frequently while change failure rate skyrockets isn't progress. All four metrics must improve together. Balance velocity (deployment frequency, lead time) with stability (change failure rate, MTTR).
Mistake #4: Setting Unrealistic Goals
If you're currently a low performer, aiming for elite performance immediately will demoralize your team. Set incremental goals. Move from low to medium, then medium to high, then high to elite. Sustainable improvement takes time.
Mistake #5: Not Defining "Deployment" and "Failure" Clearly
Teams measure DORA metrics inconsistently without clear definitions. What counts as a deployment? What constitutes a failure? Document your definitions and measure consistently across teams.
Related Guides
Code Review Automation
Improve lead time with automated reviews
Code Review Best Practices
Reduce change failure rate with better reviews
Productivity Tools
Tools to improve all DORA metrics
Engineering Metrics Dashboard
Track your DORA metrics automatically
DORA Metrics Tools Comparison
Tools to help you track and improve
| Tool | Approach | Best For | Pricing |
|---|---|---|---|
| TeamOnTrack | Actionable insights + smart nudges | Teams who want to improve, not just measure | $29/dev/month |
| LinearB | Dashboards and reporting | Executive reporting and visibility | Contact sales |
| Jellyfish | Engineering management platform | Large enterprises (200+ engineers) | Contact sales |
| Swarmia | Simple metrics + Slack integration | Small teams (10-50 engineers) | $240/dev/year |
| Sleuth | Deployment tracking focused | Teams focused on deployment metrics | $150/month+ |
Why TeamOnTrack for DORA Metrics?
Most tools show you DORA metrics. TeamOnTrack tells you how to improve them:
• Root Cause Analysis: "Your deployment frequency dropped because manual approvals were added. Teams who automated these checks deploy 3x more."
• Smart Nudges: When PRs sit for 24+ hours (hurting lead time), automatically nudge the right reviewer.
• Pattern Detection: AI identifies what "normal" looks like and flags anomalies before they impact your metrics.
• Comparison Intelligence: See how your metrics compare to similar teams and get specific recommendations.
