While sitting in immovable traffic after going to the theater, I was reminded of something I once heard someone say: “If we’d have left 15 minutes later, we would have been home by now.” At the time, this made absolutely no sense. How could we leave 15 minutes later and already be home, when we would be behind ourselves? Unless, of course, we had found a wormhole and were about to travel in time… Nope. There is a much simpler explanation. It’s all about Visibility.
Let me explain. Once you are sitting in traffic, you have limited choices. You look left, look right, and try to decide whether there is a better way to get home. Without a GPS, you can’t see anything ahead, so you guess that taking the ramp to the right will be faster. Wrong! Construction… and now you are worse off than before. That is what happens with no visibility: you make a guess, and it gets you nowhere. With a GPS, you have the visibility to see where the problem lies, and you can make a much wiser decision about the best route to take.
The advice that had formerly made no sense actually did. It went something like this: instead of reacting once you are already stuck in the problem, wait to see where the problem areas are, and then choose the best route home and avoid the traffic altogether. This treats your traffic problem as a situation in which you have the visibility to make a decision based on a composite set of information or events.
Monitoring an application infrastructure is similar. Many times, you deploy an application and problems occur, yet their root cause is completely unclear. The symptoms are often missed until the users have issues, and at that point it is just too hard for the support desk to piece it all together. Eventually, the problem goes away, but no one fixed anything. That can only mean one thing… it will come back again. What’s missing?
Just like in our travel example, you need to be able to put multiple pieces of information together in order to make the best decision. Instead of the fragmented view you get when you react to each issue as the complaints come in, use real-time analytics to put the symptoms together and recognize the patterns that describe a real problem. Old-school monitoring of individual resource availability alone will produce many false positives. Who wants to wake up in the middle of the night to chase those ghosts? A better approach is to monitor the rate of change of a resource such as heap, since an increase in that rate is most likely a real problem, and then to correlate that rate of change with an application indicator such as transactions completed.
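To make the heap idea concrete, here is a minimal sketch in Python of what that kind of check might look like. It is only an illustration, not the method of any particular monitoring product: the function names, the sampling interval, the window size, and the thresholds are all assumptions made up for this example. In particular, "correlate" is read here as: only raise the alarm when heap growth speeds up while transactions completed do not rise to explain it.

```python
from statistics import mean


def rate_of_change(samples, interval_s):
    """Per-second change between consecutive samples."""
    return [(b - a) / interval_s for a, b in zip(samples, samples[1:])]


def heap_growth_alert(heap_mb, tx_completed, interval_s=60,
                      window=3, growth_factor=2.0, tx_tolerance=1.1):
    """Return True when heap growth accelerates without more work being done.

    heap_mb      -- heap usage samples, one per interval (hypothetical feed)
    tx_completed -- transactions completed in each interval (same length)
    All thresholds are illustrative assumptions, not recommended values.
    """
    heap_rate = rate_of_change(heap_mb, interval_s)
    if len(heap_rate) <= window or len(tx_completed) <= window:
        return False  # not enough history to compare recent vs. baseline

    # Compare the most recent window against everything before it.
    baseline_rate = mean(heap_rate[:-window])
    recent_rate = mean(heap_rate[-window:])
    heap_accelerating = (recent_rate > 0 and
                         recent_rate > growth_factor * max(baseline_rate, 0))

    # If throughput rose too, the extra heap is probably just extra load.
    tx_baseline = mean(tx_completed[:-window])
    tx_recent = mean(tx_completed[-window:])
    load_explains_it = tx_recent > tx_tolerance * tx_baseline

    return heap_accelerating and not load_explains_it


if __name__ == "__main__":
    heap = [500, 510, 520, 530, 540, 580, 620, 660]
    falling_tx = [100, 102, 98, 101, 99, 60, 55, 50]
    print(heap_growth_alert(heap, falling_tx))
```

The point of the design is the same as the GPS: a single heap reading over some fixed limit says very little, but heap growing faster than usual while completed transactions stay flat or drop is a pattern worth waking someone up for.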
Ideally you end up with a monitoring system that lets you monitor, analyze and act. You have the visibility and analysis to address issues before they cause damage.
For more information, check out the video on detecting application performance trends.