When to Monitor and When to Trace
The complexity of IT can be hard to understand. If you are a business leader running an information technology team, the “IT word salad” (terms such as transactions, monitoring and tracing) can make it hard to define the most appropriate strategy for the business fundamentals you are charged with maintaining and improving.
Today every business is an e-business: customers, suppliers, marketing, sales and finance are all handled electronically. This means that a system failure can leave a business unable to create products, collect money or pay bills. And if systems are hard to use or slow to respond, that translates directly into additional costs, unhappy customers and lost business.
But e-business requires many systems to be interconnected, and each of these systems is run by a different team of people with different goals. So when your e-business fails to meet expectations, the issue can be very hard to solve, often taking a huge amount of time and effort just to understand exactly what is not working, let alone to fix it.
The best practice has been to monitor everything. This means having a series of graphs on a report that show how each system is performing, generally measuring the number of interactions each system is handling, how much system memory is being used, how much disk space is being used and how quickly information is being sent from system to system. This information is supplemented with what is considered the high watermark, i.e. the point at which things are considered to be getting full or overloaded.
By combining the system metrics with thresholds, alerts can be generated to indicate when systems are becoming overloaded, and other alerts can be generated if things are taking longer than was planned.
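To make the idea concrete, here is a minimal Python sketch of threshold-based alerting. It is an illustration only, not how any particular monitoring product works: the metric names, the high watermark values and the `check_thresholds` helper are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str            # hypothetical metric name
    high_watermark: float  # value at which the system counts as overloaded


def check_thresholds(readings, thresholds):
    """Return one alert message per reading at or above its high watermark."""
    alerts = []
    for t in thresholds:
        value = readings.get(t.metric)
        if value is not None and value >= t.high_watermark:
            alerts.append(f"ALERT {t.metric}: {value} >= {t.high_watermark}")
    return alerts


# Illustrative thresholds and one snapshot of readings (all values invented).
thresholds = [
    Threshold("memory_used_pct", 90.0),
    Threshold("disk_used_pct", 85.0),
    Threshold("response_time_ms", 500.0),
]
readings = {
    "memory_used_pct": 72.5,
    "disk_used_pct": 91.0,
    "response_time_ms": 640.0,
}

alerts = check_thresholds(readings, thresholds)
for alert in alerts:
    print(alert)
```

Even in this toy version, every metric needs a hand-picked threshold. Multiply that by hundreds of systems and the maintenance cost of the approach becomes easy to imagine.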
As IT environments have become more complex, the configuration work needed to set up and maintain the monitoring of all these system metrics and thresholds has become a considerable burden on IT teams. In fact, for some companies it can become a limiting factor on their ability to innovate, because testing that their monitoring systems can handle a change can take months of work.
Now add to this another huge concern: with hundreds of different systems interacting, a small variance in several systems can lead to a problem that impacts users but is very hard to pin down to a specific cause. This is precisely the situation that many companies find themselves in. When a problem is identified, a war-room meeting is called, often as a conference call, where all the experts responsible for monitoring are brought together to review what is going on. All too often these calls become a multi-hour debate, with experts talking over each other to passionately prove that their specific systems are not at fault. The leader of the call is left nursing a headache while listening to very experienced and expensive people shouting at each other. Many times these issues are left unresolved, because the data available doesn’t provide a simple path to resolution.
This challenge can be solved!
The root cause of the issue is that machine data must be given a business context to allow the impact of technology on business to be understood, but the classic model has been to start with the result and try to infer the cause.
What is important is the impact technology has on the business and not the impact that the business has on technology.
And this is where tracing comes in.
If you can follow a user’s request through the technology, creating a map that shows exactly how that request is handled, you can quickly see the impact of technology on the business.
The classic model of monitoring does not lend itself to this way of thinking, because the classic monitoring paradigm requires the specifics of the relationships between all the systems to be manually described in the context of business rules. This is inherently very time consuming, expensive and difficult to scale.
But by using the idea of tracing, the technology is actually used to dynamically build and map the pathway that a user’s transaction takes.
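As a rough illustration of that idea, the Python sketch below assumes each system records a small "span" tagged with a trace ID shared by every hop of one user request. Grouping the spans by trace ID and ordering them by start time rebuilds the pathway without anyone describing the system relationships by hand. The `Span` fields, system names and timings are hypothetical, and this is a simplified sketch of tracing in general, not a description of any vendor's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str     # shared by every hop of one user request
    system: str       # hypothetical system name
    start_ms: int     # when this system began handling the request
    duration_ms: int  # how long this system spent on it


def build_paths(spans):
    """Group spans by trace ID and order each group by start time."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)
    return {
        trace_id: sorted(group, key=lambda s: s.start_ms)
        for trace_id, group in by_trace.items()
    }


def slowest_hop(path):
    """Return the span that took the longest within one request's path."""
    return max(path, key=lambda s: s.duration_ms)


# Spans arrive in no particular order, mixed across requests (values invented).
spans = [
    Span("order-42", "payment-gateway", 120, 900),
    Span("order-42", "web-frontend", 0, 40),
    Span("order-42", "order-service", 40, 80),
    Span("order-7", "web-frontend", 5, 35),
]

paths = build_paths(spans)
route = [s.system for s in paths["order-42"]]
print(" -> ".join(route))
print("slowest hop:", slowest_hop(paths["order-42"]).system)
```

The map is derived from what the request actually did, so it stays current as systems are added or rewired, and the slowest hop for a specific user's transaction can be read straight off the path.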
When you combine tracing and monitoring, you have the information to fully understand how a user is experiencing your e-business, and you have the information on exactly which components of your technology are contributing to any specific issue.
One company has been delivering tracing for decades, and that is Nastel Technologies. Our tracing technology provides a dynamic business perspective that allows IT teams, operations teams, business teams and development teams to deeply understand exactly how their IT environments are being consumed by the business and to see precisely how each system impacts any perceived performance issue.
Nastel has spent many tens of thousands of hours of development work over decades to build machine-learning-based systems that help the world’s largest companies to solve these specific issues. Today we are delivering solutions in production to Fortune 100 companies (and many others) that reduce their costs, improve their user experiences, and continually lower both the time it takes to solve problems and the number of problems they see.
Monitoring that combines tracing and doesn’t require armies of experts managing scripts is solving critical issues for IT today. If you want to see what Nastel can do for you, contact us and we’ll be happy to show you.