(Second of a six-part blog series that describes how a team of IT pros and managers at one of the world’s largest global banks accommodated a bank acquisition and mastered a complex messaging environment.)
Before MegaBank’s merger with TargetBank, its IT department had been using a well-known event monitoring product to monitor WebSphere MQ. TargetBank used TIBCO Hawk to monitor its TIBCO EMS system. As the bank merger progressed, business requirements soon dictated seamless integration across the combined environment. But at this point, the merged bank had two separate management systems. MegaBank needed a tool that could handle its home-grown message broker plus multi-vendor middleware and messaging environments.
After due diligence, running a Proof of Concept (POC)—and evaluating deployment, operations and ability to satisfy key use cases—the IT team chose my firm’s Nastel AutoPilot® product to monitor and manage WebSphere MQ on their distributed and mainframe z/OS platforms.
Why? One compelling reason was that AutoPilot’s array of fully integrated middleware/message capabilities could monitor the bank’s homegrown message broker along with its internal business applications. (Now before you the reader think I’m just making a naked sales pitch for my company, I want to say there are many fine products on the market! But in this particular instance Nastel turned out to be a particularly good fit for MegaBank’s business challenges, as described later in this post.)
As MegaBank moved ahead with integrating its messaging systems, the decision was made to do likewise with their middleware monitoring systems. As a result, the IT team charged with integrating front-end and back-end applications also absorbed all middleware-related responsibilities.
Soon the team encountered additional issues as the two IT infrastructures began to merge. Mission-critical bank applications that relied on IBM and TIBCO products, including a particularly critical high-volume trading application, began to falter under a full production workload consisting of millions of messages per second over hundreds of queue managers. With separate monitoring systems assigned to the IBM and TIBCO products, there was no clear consensus on the root cause of a problem as it occurred, or even on whether there was an actual problem! MegaBank’s IT specialists were trained on either the IBM product or the TIBCO product, but few were trained on both, which made collaboration challenging. Complicating things further, the incumbent monitoring solutions were producing far too many false alarms. The support team felt like they were chasing ghosts.
Another concern was the lack of integrated monitoring for DataPower and Solace appliances. Juggling the two existing monitoring solutions, with a third or fourth possibly needed for the appliances, would prevent the IT team from meeting performance standards. So they determined that a single monitoring solution was necessary, one that would:
- Find stuck messages
- Deliver early notification of faults, bottlenecks and slowdowns
- Monitor messaging performance and compare it against SLA performance requirements
- Supply administrative control over WebSphere MQ
- Furnish complete message tracking through JMS, WebSphere MQ, and DataPower
- Easily integrate with the bank’s enterprise management systems (EMSs)
- Provide complete real-time visibility across DataPower, Solace, TIBCO, WebSphere MQ and MegaBank’s home-grown middleware solution
Relief was on the way for the IT team. Nastel support specialists went on-site at MegaBank, learned the bank’s use cases and devised an effective architecture for them. Installation was accomplished rapidly, with careful attention to devising policies (rules implemented using Complex Event Processing, or CEP) specific to the bank, so that its use cases could be handled automatically as “situations”. Defining a situation is a way of enriching events with other events, metrics, payload data and transactions. Monitoring a situation is more meaningful than monitoring individual events: it reduces complexity, relates more easily to business impact and, in contrast to setting a threshold on each event, helps avoid false alarms.
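To make the difference between a per-event threshold and a situation a little more concrete, here is a minimal Python sketch. It is only an illustration of the general idea, not AutoPilot’s policy syntax or MegaBank’s actual rules: the `QueueSample` fields, the function names and the limits (1,000 messages, 300 seconds) are all hypothetical. The threshold check looks at one metric in one sample, while the “stuck messages” situation correlates several metrics across a short window of samples.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueueSample:
    """One monitoring sample for a queue (hypothetical fields, for illustration)."""
    queue: str
    depth: int               # messages currently on the queue
    consumers: int           # applications currently reading from the queue
    oldest_msg_age_s: float  # age of the oldest message, in seconds


def threshold_alert(sample: QueueSample, max_depth: int = 1000) -> bool:
    """Per-event thresholding: fire whenever a single sample exceeds the limit."""
    return sample.depth > max_depth


def stuck_messages_situation(
    samples: Sequence[QueueSample],
    max_depth: int = 1000,
    max_age_s: float = 300.0,
) -> bool:
    """Situation: fire only when several conditions hold together over a window.

    The queue must stay deep in every sample, must not be draining, and must
    either have no consumers or hold a message older than max_age_s.
    """
    if not samples:
        return False
    first, latest = samples[0], samples[-1]
    sustained = all(s.depth > max_depth for s in samples)
    not_draining = latest.depth >= first.depth
    starved = latest.consumers == 0 or latest.oldest_msg_age_s > max_age_s
    return sustained and not_draining and starved


# A short burst that drains by itself: the threshold fires, the situation does not.
burst = [
    QueueSample("TRADES.IN", 1500, 4, 2.0),
    QueueSample("TRADES.IN", 900, 4, 1.0),
    QueueSample("TRADES.IN", 200, 4, 0.5),
]

# A queue that keeps growing with nobody reading it: the situation fires.
stuck = [
    QueueSample("TRADES.IN", 1500, 0, 120.0),
    QueueSample("TRADES.IN", 1800, 0, 240.0),
    QueueSample("TRADES.IN", 2100, 0, 360.0),
]

burst_threshold = threshold_alert(burst[0])
burst_situation = stuck_messages_situation(burst)
stuck_situation = stuck_messages_situation(stuck)
```

In this toy example the burst would have paged someone under simple thresholding even though the queue recovered unaided, which is the kind of false alarm the support team had been chasing. A real CEP policy would draw on far richer inputs (other events, payload data, transactions), but the principle is the same.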
Tune in next week for the next installment of Making a Happy Marriage of WebSphere & TIBCO Infrastructures – Part 3.