I was having a discussion today with a customer about how they measure and reduce risk, and it came down to a simple understanding of MTBF and MTTR. It felt a little like going back to basics, but the simple fact is that the basics are the basics.
MTBF (Mean Time Between Failures) is a measure of how often something will fail.
MTTR (Mean Time To Repair) is a measure of how long it will take to fix a problem.
Together MTBF and MTTR help quantify risk.
If something fails on average once a day, the aim is to make it fail less often, ideally with a specific target in mind (for example, once a week or once a month).
And when something does fail, the focus is on solving every problem quicker. Instead of taking about a day to fix each problem, wouldn’t it be nice if it took just a few minutes?
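One common way to put the two numbers together is steady-state availability, MTBF / (MTBF + MTTR). Here is a minimal sketch of that formula, assuming MTBF is counted as the average running time between failures. The function name and the sample figures (a failure a day, a day to repair, a five-minute fix, a weekly failure) are my own illustrations based on the examples above, not measurements from any customer.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up.

    Assumes mtbf_hours is the mean uptime between failures and
    mttr_hours is the mean time needed to restore service.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative scenarios only (hypothetical numbers).
scenarios = {
    "fails daily, a day to fix": (24.0, 24.0),
    "fails daily, 5 minutes to fix": (24.0, 5.0 / 60.0),
    "fails weekly, 5 minutes to fix": (168.0, 5.0 / 60.0),
}

for label, (mtbf, mttr) in scenarios.items():
    print(f"{label}: {availability(mtbf, mttr):.4%} available")
```

With these made-up figures, cutting the repair time from a day to five minutes takes availability from 50% to roughly 99.65%, and stretching the time between failures to a week pushes it to about 99.95%.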
Small changes to MTBF and MTTR can have a massive impact because of the compound nature of reliability. If one component is likely to fail once in ten years, that’s good. But if you have 500 of them, you should expect about 50 failures a year across the whole set, or roughly one per week. Not so good.
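The arithmetic behind that is simple enough to sketch. This is a rough illustration, assuming the components fail independently and at a constant rate, so that their failure rates simply add up. The function name and the 365-day year are my own choices for the example.

```python
HOURS_PER_YEAR = 365 * 24  # simplification: ignores leap years


def fleet_mtbf_hours(component_mtbf_hours: float, count: int) -> float:
    """Mean time between failures across `count` identical components.

    Assumes independent failures at a constant rate, so the combined
    failure rate is count / component_mtbf_hours.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return component_mtbf_hours / count


component_mtbf = 10 * HOURS_PER_YEAR          # one failure in ten years
fleet_mtbf = fleet_mtbf_hours(component_mtbf, 500)

print(f"One component: a failure every {component_mtbf / HOURS_PER_YEAR:.0f} years")
print(f"500 components: a failure every {fleet_mtbf / 24:.1f} days")
```

Under those assumptions, 500 components give a failure somewhere in the set about every 7.3 days, which is where the "one per week" figure comes from.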
The trick is to understand how failures happen, and to recognize that many failures happen slowly, over time. Being able to spot the indicators of failure before the ultimate breakdown means you can solve a problem before it actually happens.
I’m not sure if solving a problem before it happens counts as a negative MTTR score, but either way it’s impressive to make happen.
I digress. My point here is that, using Nastel’s solutions, this customer has been able to deliver quantitative improvements to the MTTR and MTBF of his impressively complex business applications. It’s always a pleasure to speak with users of our solutions.
If you would like to improve the MTTR and MTBF of your business applications, please visit our website or call us today!