Published October 30, 2020

Observability is to Monitoring what a Tesla is to a Horse-Drawn Carriage

Information Technology (IT) has always relied on monitoring to understand how systems perform over time. The basic approach has always been to collect a series of metrics, build an algorithm that shows how these metrics are related, and then show when a system is either reaching the limits of its capacity or is likely to break.

But as IT has become more complex, this classic method of monitoring has become increasingly difficult to apply. The sheer volume of systems in play, and therefore the number of potential interactions, makes it harder first to identify issues before they become critical, and second to discover the root cause and remediate the issue.

When you monitor, you look at parameters such as the number of transactions in a database, the volume of storage used, network performance, and the amount of RAM in use. Looked at over time, these metrics show how your systems are being used and, when combined, can let you see events forming. But there are limits to how effective this model can be, and these limits depend on how complex the environment becomes. You reach a point where the number of potential permutations is greater than the time you have to consider them. We refer to this as the complexity fatigue point. Beyond this point the classic monitoring paradigm starts to collapse: your ability to monitor performance degrades, and the time you need to discover the root cause of issues and remediate them increases.
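To give a rough sense of how quickly the permutations outrun the time available, here is a minimal Python sketch. It assumes, purely for illustration, that any group of two or more systems could interact; the function names are hypothetical and the counts are simple combinatorics, not measurements from a real environment.

```python
from math import comb


def pairwise_interactions(n_systems: int) -> int:
    """Number of distinct pairs of systems that could interact."""
    return comb(n_systems, 2)


def multi_system_combinations(n_systems: int) -> int:
    """Number of distinct groups of two or more systems.

    All subsets (2**n) minus the empty set (1) and the single-system sets (n).
    """
    return 2 ** n_systems - n_systems - 1


# Illustrative environment sizes only.
for n in (5, 20, 50):
    print(n, pairwise_interactions(n), multi_system_combinations(n))
```

The pairwise count grows quadratically, but the number of possible groupings grows exponentially, which is why adding a handful of systems can push an operations team past the point where every combination can be considered by hand.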

Complexity can be countered by abstraction

Abstraction is the concept of dealing with representational qualities (of ideas) rather than events.

The flow of a user’s interaction through the entire application stack of an e-business is a useful abstraction to consider. If you can understand analytically how a user’s request is serviced across all your systems, and overlay performance data on this visualization, then you can compare each user request against others, either individually or in groups. Then you can spot what poor performance looks like and identify other requests that have similar attributes. Each user’s interaction may be made up of thousands or millions of discrete pieces of data, but what you get to consider is the user’s experience. This is the power of abstraction.
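As a small illustration of this kind of abstraction, the Python sketch below collapses individual events into one summary per user request. The `Event` fields, the `request_id` key, and the sample values are all assumptions made up for the example; a real environment would supply its own identifiers and far more data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    request_id: str     # hypothetical id shared by every event of one user request
    system: str         # system that handled this step
    duration_ms: float  # time spent in that system


def summarize_requests(events):
    """Collapse raw events into one summary per user request."""
    totals = defaultdict(float)
    paths = defaultdict(list)
    for event in events:
        totals[event.request_id] += event.duration_ms
        paths[event.request_id].append(event.system)
    return {rid: {"total_ms": totals[rid], "path": paths[rid]} for rid in totals}


# Made-up sample data: two requests crossing the same three systems.
events = [
    Event("r1", "web", 20.0), Event("r1", "orders", 35.0), Event("r1", "db", 15.0),
    Event("r2", "web", 25.0), Event("r2", "orders", 400.0), Event("r2", "db", 18.0),
]

summary = summarize_requests(events)
slowest = max(summary, key=lambda rid: summary[rid]["total_ms"])
print(slowest, summary[slowest])
```

Six events become two comparable records, and the slow request stands out without anyone having to inspect the individual data points.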

How to Abstract Business Understanding

One of the most innovative methods in use today to abstract business understanding from within complex enterprise environments is to make use of the configuration information and message content already in place within messaging middleware environments. Complex business applications use messaging architectures to send and receive messages between application components that all run in different locations and at different speeds. These messages ensure a secure and reliable communication process, but they are also the perfect description of how the business flows. If you can read and use this information, then you can automatically (dynamically) follow each and every user’s interaction. A huge amount of effort is involved in ensuring these message systems work, but this effort results in a readable description of how a business is enacted. By visualizing the flow of messages between and inside systems, and overlaying performance data for each system and message, you have a complete view of how your business truly performs. By comparing the profile of any transaction (or group of transactions) to the historical record of previous transactions (or groups), you can spot the subtle indicators of future issues, delivering the much-postulated concept of predictive analysis.
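The idea of following a transaction through its messages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular middleware product: the record layout, the correlation ids, and the timings are hypothetical, and real messaging systems expose this information in their own formats.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical message records:
# (correlation_id, sender, receiver, sent_ms, received_ms)
messages = [
    ("tx-1", "web", "orders", 0, 12),
    ("tx-1", "orders", "billing", 15, 40),
    ("tx-2", "web", "orders", 100, 111),
    ("tx-2", "orders", "billing", 113, 260),
]


def build_flows(msgs):
    """Group messages by correlation id, ordered by send time, as a list of hops."""
    flows = defaultdict(list)
    for corr_id, sender, receiver, sent, received in sorted(msgs, key=lambda m: m[3]):
        flows[corr_id].append((sender, receiver, received - sent))
    return dict(flows)


def hop_baseline(flows):
    """Average latency per (sender, receiver) hop across all observed flows."""
    per_hop = defaultdict(list)
    for hops in flows.values():
        for sender, receiver, latency in hops:
            per_hop[(sender, receiver)].append(latency)
    return {hop: mean(latencies) for hop, latencies in per_hop.items()}


flows = build_flows(messages)
baseline = hop_baseline(flows)
print(flows["tx-2"])
print(baseline)
```

Each flow is a readable path through the business, and the per-hop baseline is the simplest possible form of the historical record against which a new transaction could be compared.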

Observability

By adding an abstracted view (an additional dimension) of the business through this exploitation of messaging middleware, we have in fact moved from classic monitoring to observability. We are now able to consider each and every user experience holistically and compare it to other users’ interactions. Machine Learning then lends itself to this analysis, allowing automated signal detection techniques to alert operations teams, business teams and developers to any deviation that is likely to result in a missed service level agreement. And the reasons for any event can be described in business and user terms, allowing vastly improved root-cause analysis.
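As a hedged illustration of automated signal detection, the sketch below uses a plain z-score test as a stand-in for a learned model. It is deliberately simpler than Machine Learning, and the function name, the threshold of 3.0, the SLA value, and the sample latencies are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_deviations(history_ms, recent_ms, sla_ms, z_threshold=3.0):
    """Flag recent latencies that breach the SLA or deviate strongly from history."""
    baseline = mean(history_ms)
    spread = stdev(history_ms)
    alerts = []
    for index, value in enumerate(recent_ms):
        # z-score: how many standard deviations above the historical mean
        z = (value - baseline) / spread if spread > 0 else 0.0
        if value > sla_ms:
            alerts.append((index, value, "sla_breach"))
        elif z > z_threshold:
            alerts.append((index, value, "deviation"))
    return alerts


# Made-up end-to-end latencies in milliseconds.
history = [100, 102, 98, 101, 99, 100, 103, 97]
recent = [101, 130, 260]

alerts = find_deviations(history, recent, sla_ms=250)
print(alerts)
```

The second reading is still within the assumed SLA but sits far outside the historical pattern, which is the kind of early signal that lets a team act before an agreement is actually missed.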

If your business uses messaging middleware and you are looking to move from monitoring to observability, we have a way to make this happen simply and quickly.