Monitoring application performance is simple…as long as the application and the deployment architecture are also simple. The mystery appears as the application architecture grows from an isolated Java or .NET silo into a full-blown composite application spanning web, middleware, and mainframe tiers. Unraveling this mystery can be daunting, because the builders of the application often view their role as confined to a single silo, e.g. the web, application server, or database tier. As transactions traverse tiers, however, the consequences of what has been coded are not always clear to the developer. They may have done a perfect job in Java, yet their objects might have created something in the database that unknowingly persists after the Java part of the application has exited and its objects have been destroyed. And that is where problems start to creep in…unknowable consequences whose frequency grows with complexity.
Complexity itself is a word that can be used to describe composite applications. According to Wikipedia: “A complex system is a system composed of interconnected parts that as a whole exhibit one or more properties (behavior among the possible properties) not obvious from the properties of the individual parts.” One may conclude (sometimes erroneously) from this definition that misbehavior in one or more components of a complex system does not necessarily compromise the system as a whole. The challenge is knowing when our “complex system” is misbehaving or acting outside of “normal,” and what the impact of that misbehavior is.
Today’s monitoring tools are pretty good at identifying known faults such as server availability, network errors, and resource utilization – the piece parts. A typical data center receives many thousands of alerts (sometimes millions) daily about all kinds of faults. Many of these alerts have nothing to do with system-wide outages; they are in fact part of the normal operation of complex systems. But from this never-ending big-data stream of events, how do we separate the signal from the noise and detect abnormal behavior?
For example, a “server down” alert is clearly an indication of a failed system. If the server is clustered, however, and failover succeeded without service interruption, the “server down” alert is part of the normal operation of a complex system and should not raise a red flag. The same applies to the performance attributes of complex systems, such as response time, volume, and latency. Knowing the normal ranges and deviations (dispersion) of these attributes is key to understanding how complex systems behave. The other crucial dimension of composite applications is transaction flow – the context and movement of information from one part of the application to another, i.e. their behavior.
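The “server down but failover succeeded” case can be illustrated with a minimal correlation sketch. This is a hypothetical example, not any particular monitoring product’s logic: the event names, the tuple shape, and the 30-second correlation window are all assumptions made for illustration.

```python
# Hypothetical sketch: suppress a "server down" alert when a matching
# "failover succeeded" event for the same host arrives within a
# correlation window. Event names and window length are assumptions.

CORRELATION_WINDOW = 30.0  # seconds to wait for a failover event

def correlate(alerts):
    """Given a time-ordered list of (timestamp, event_type, host) tuples,
    return only the server-down alerts that were NOT resolved by a
    failover within the correlation window."""
    actionable = []
    for i, (ts, event, host) in enumerate(alerts):
        if event != "server_down":
            continue
        resolved = any(
            e == "failover_succeeded" and h == host
            and 0 <= t - ts <= CORRELATION_WINDOW
            for t, e, h in alerts[i + 1:]
        )
        if not resolved:
            actionable.append((ts, event, host))
    return actionable

events = [
    (0.0, "server_down", "web01"),         # failover follows -> noise
    (5.0, "failover_succeeded", "web01"),
    (60.0, "server_down", "db01"),         # no failover -> real outage
]
print(correlate(events))  # [(60.0, 'server_down', 'db01')]
```

Only the db01 alert survives the filter: it is the signal, while the web01 alert is absorbed into the cluster’s normal behavior.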
Real-time performance monitoring of composite applications requires low-latency analysis using something like a Complex Event Processing (CEP) engine, so that the performance behavior of the application can be understood as it unfolds. The engine must be able to rapidly determine whether current behavior is normal, abnormal, or trending toward abnormal, and then take action. For more on this topic see the white paper here.
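The “normal vs. abnormal” judgment described above can be sketched as a rolling statistical baseline: keep a window of recent measurements (e.g. response times) and flag any new event that falls outside the window’s dispersion. This is a toy sketch of the idea, not a CEP engine; the class name, window size, and three-sigma threshold are assumptions chosen for illustration.

```python
from collections import deque
import statistics

class SlidingWindowDetector:
    """Toy sketch of streaming anomaly detection: hold a rolling window of
    recent measurements and classify each new value against the window's
    mean and standard deviation (dispersion)."""

    def __init__(self, window_size=100, threshold=3.0):
        self.window = deque(maxlen=window_size)  # recent measurements
        self.threshold = threshold               # sigmas before "abnormal"

    def observe(self, value):
        """Classify one measurement, then add it to the rolling window."""
        status = "normal"
        if len(self.window) >= 2:
            mean = statistics.mean(self.window)
            stdev = statistics.stdev(self.window)
            if stdev > 0 and abs(value - mean) > self.threshold * stdev:
                status = "abnormal"
        self.window.append(value)
        return status

# Hypothetical response times in ms: steady traffic, then a spike.
detector = SlidingWindowDetector(window_size=5)
for ms in [100, 101, 99, 100, 102]:
    detector.observe(ms)       # all within the normal range
print(detector.observe(500))   # abnormal
```

A real engine would add trend detection (“trending towards abnormal”) and keep outliers out of the baseline; here the spike is appended to the window, which is the simplest design but lets anomalies skew the dispersion estimate.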