Looking Beyond MQ
Debenhams: Improve levels of service and avoid costly upgrades.
Focused on Messaging Middleware
It’s likely you’ve been working with WebSphere MQ (WMQ) for years: developing, deploying, monitoring or all of the above. It’s also likely that by now you have assembled a tool bag of items to support your implementation and ongoing operations. And you have no doubt become accustomed to dealing with problems within the MQ domain. But what if you could see more of the transactional journey on either side of MQ? Organizations doing this are finding new ways to improve levels of service and avoid costly mistakes and upgrades.
Initially, just seeing the touch points (you know, the “puts” and the “gets”) along with MQ administration probably seemed entirely sufficient to manage WMQ. A common assumption was that as long as we saw messages coming and going, things were okay. Analogous to Archimedes’ principle of water displacement, the health of MQ or any other middleware system could be assumed as long as both sides remained within a reasonable state of balance.
A process to solve complex issues.
But it didn’t take very long at all to see that more was needed. The difficulty was that by the time an imbalance was noticed, too many problems had already been caused; it was obvious that an earlier warning was needed to avert problems or at least correct them sooner. The need to watch things like queue depth, channel status, or any of the other 40 standard events that MQ raises became clear and pretty much commonplace.
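To make the idea of an earlier warning concrete, here is a minimal sketch of a queue-depth check that raises a warning well before a queue fills. The class name, function name, and percentage thresholds are illustrative assumptions, not part of MQ or of any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    queue: str      # queue name (hypothetical)
    depth: int      # current number of messages on the queue
    max_depth: int  # configured maximum depth for the queue


def depth_alert(sample: QueueSample,
                warn_pct: float = 60.0,
                crit_pct: float = 80.0) -> str:
    """Classify a queue-depth reading as 'ok', 'warning', or 'critical'.

    The thresholds are assumed values; tune them to your own environment.
    """
    if sample.max_depth <= 0:
        raise ValueError("max_depth must be positive")
    pct_full = 100.0 * sample.depth / sample.max_depth
    if pct_full >= crit_pct:
        return "critical"
    if pct_full >= warn_pct:
        return "warning"
    return "ok"
```

The point of the two thresholds is that the "warning" state gives you time to act while the queue is still accepting messages.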
If you still need more precise information to detect problems earlier and to see their impact on the applications and business processes, you may have configured your own custom events based on conditions within your unique MQ implementation. If you’re really on top of the MQ management game, you’ve implemented automatic corrective actions that launch the moment warning signs appear, preventing problems before they occur.
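A custom event with an automatic corrective action can be thought of as a condition paired with a response. The sketch below assumes a simple dictionary of metrics; the event name, metric names, threshold, and the action itself are all hypothetical placeholders rather than anything a particular tool provides.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]


class CustomEvent:
    """A condition over current metrics paired with a corrective action."""

    def __init__(self, name: str,
                 condition: Callable[[Metrics], bool],
                 action: Callable[[Metrics], str]) -> None:
        self.name = name
        self.condition = condition
        self.action = action


def evaluate(events: List[CustomEvent], metrics: Metrics) -> List[str]:
    """Run the action of every event whose condition holds; report what was done."""
    performed = []
    for event in events:
        if event.condition(metrics):
            performed.append(f"{event.name}: {event.action(metrics)}")
    return performed


# Hypothetical event: messages are piling up and nothing has the queue open for input.
stalled_consumer = CustomEvent(
    name="stalled_consumer",
    condition=lambda m: m["queue_depth"] > 1000 and m["open_input_count"] == 0,
    # Placeholder action; a real one would call an operator script or job scheduler.
    action=lambda m: "restart consumer job",
)
```

Calling `evaluate([stalled_consumer], metrics)` on each monitoring cycle returns a list of the actions taken, which is empty when no condition holds.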
Expanding the Myopic MQ Perspective
At Debenhams we didn’t need to look far for a solution. From our experience with Nastel AutoPilot, our existing MQ monitoring solution, and our dialogue with its vendor, we saw how easily this type of problem could be handled. With all the facts we could publish with the agents we had already deployed across our AS/400s, we were easily able to get access to all these other points of information.
When you’ve got the right monitoring platform, the biggest task can often be defining which data metrics you want to monitor. But if the tool you are using is built on a service-orientated architecture that supports open standard interfaces, treats each metric as a portable object (each with its own metadata and methods), and makes it easy to define your own new events, then you should be able to quickly and easily monitor any data metric with rule and correlation engines, and invoke automated or manual corrective actions.
For example, in Debenhams’ case, there are three business-critical applications “glued” together by WMQ. Correlating facts from the API layers of those applications with the facts already coming from MQ lets the team see the end-to-end transactional journey.
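One way to picture a metric as a portable object with its own metadata and methods is sketched below. This is an assumption about what such an object might look like, not the data model of any specific monitoring product; the class and field names are invented for illustration.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict


@dataclass
class Fact:
    """One monitored metric carried together with its own metadata."""
    name: str
    value: float
    source: str                    # where the fact was published from
    unit: str = ""
    tags: Dict[str, str] = field(default_factory=dict)
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def exceeds(self, threshold: float) -> bool:
        """A simple rule a rule engine could apply to any fact."""
        return self.value > threshold

    def to_json(self) -> str:
        """Serialize to JSON so the fact can travel between systems."""
        return json.dumps({
            "name": self.name,
            "value": self.value,
            "source": self.source,
            "unit": self.unit,
            "tags": self.tags,
            "observed_at": self.observed_at.isoformat(),
        })
```

Because every fact has the same shape, a rule written once (such as `exceeds`) applies equally to an MQ metric and to one published from an application.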
As you can see in figure 1, both the applications and their API layers are being monitored, all the way into the database that supports the application, and facts are being published along the way. The API layer can initiate a piece of work that can take a good deal of time. For example, if you are processing price changes for a substantial department with tens of thousands of SKUs, it can take up to half an hour to complete. We need to know this beforehand, so that we can tell such jobs apart from those that are perceived to be long-running but turn out to involve only a hundred records or so.
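Knowing the record count up front is what lets you separate a job that is legitimately large from one that is merely slow. The sketch below assumes a fixed processing rate and a tolerance factor; both numbers are invented for illustration and would need to be measured in your own environment.

```python
def expected_minutes(record_count: int, records_per_minute: float) -> float:
    """Estimate run time from the record count and an assumed processing rate."""
    if records_per_minute <= 0:
        raise ValueError("records_per_minute must be positive")
    return record_count / records_per_minute


def is_running_long(record_count: int,
                    elapsed_minutes: float,
                    records_per_minute: float = 1000.0,  # assumed rate
                    tolerance: float = 1.5) -> bool:     # assumed slack factor
    """True when a job has run well past what its record count would justify."""
    allowed = tolerance * expected_minutes(record_count, records_per_minute)
    return elapsed_minutes > allowed
```

Under these assumed numbers, a 30,000-record job that has run for 28 minutes is behaving normally, while a 100-record job that has run for 10 minutes deserves a closer look.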
So, what was it like implementing all of this monitoring? It took a bit of work, but don’t get the idea it was difficult or impractical. It was relatively straightforward with the tools we used at Debenhams. We’ve got one guy who has written some generic code on the AS/400, which hooks into the way we publish facts to our monitoring system. We have several different sorts of things we’re collecting. For example, you can easily find out how many records have been received, so that’s an easy sort of fact to publish. Some of the other facts we are getting from the API are far more complex. The code actually works out things like which processes are active and, if they are active, interrogates the files they are reading to see how the queue depths are going.
The expanded visibility that comes from seeing beyond the bounds of your middleware (when you begin to see more of what is happening within your applications) will always yield better ways to monitor and tune the system, and to make smarter use of that system, too.
At Debenhams, we estimate that our monitoring saves us at least two to three hours in staff time every day. The most sizable saving is in the time it takes to find problems. For example, our application’s vendor used to ask, “How did you know we were doing that?” But they work very closely with us and have been on-site when we were working out the problems inside of MQ. Therefore, they saw the tool we have and learned that we also have quite a lot of MQ knowledge. So now when we think we’ve got a problem, they don’t question it; they just go fix it.
During Debenhams’ busy time of the year, they run more shifts and the volumes start going up. Chris’s group wanted to be able to isolate which particular jobs were causing an issue in the API layers. Once they were identified, the group could then see if they could either throw more resources at them or get them rewritten. As a result of this visibility, we saved thousands of dollars that we would have otherwise wasted just treating the symptoms.
One of the critical processes within Debenhams’ operation involved a relatively simple transformation on the source application, where the data gets pushed through MQ quite quickly when users gathering the data come off their shifts. Within the target application, however, there is a rather tortuous route: the data goes through a number of files and a number of rigorous data transformations, eventually gets into the target database, and then pops out the other side. It was at this point in the process that we were finding some problems. Not only were we able to easily see what needed to be fixed, but we also extended visibility of the process to our users. Now they see the path of transactions pictorially, and can watch the whole process, easily spotting problems without having to understand any of the complexities.
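The kind of fact collection described here might look something like the sketch below: for each job, publish whether it is active and, if so, how much work is still waiting in the file it reads. This is not the code running on the AS/400; the record layout and fact names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class JobStatus(NamedTuple):
    job: str               # job name (hypothetical)
    active: bool           # is the process currently running?
    records_in_file: int   # records written to the file the job reads
    records_read: int      # records the job has consumed so far


def collect_facts(jobs: List[JobStatus]) -> Dict[str, int]:
    """Turn raw job status into named facts ready to publish."""
    facts: Dict[str, int] = {}
    for status in jobs:
        facts[f"{status.job}.active"] = int(status.active)
        if status.active:
            # Backlog only makes sense while the job is reading its file.
            backlog = status.records_in_file - status.records_read
            facts[f"{status.job}.backlog"] = max(backlog, 0)
    return facts
```

Keeping the collection generic, with one routine producing facts for any job, is what lets a single piece of code serve several applications.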
So, where do you start with this practical approach to monitoring your application infrastructure in a proactive manner? Here are the ways you can make it happen, and get the visibility to go beyond mere MQ monitoring:
• Use a monitoring platform that is built on an SOA, is designed to operate in real time, and is modular and extensible with support for any open standard.
• Identify specific data metrics (facts) that will give you better insight into the health of your system. This has a compound effect in that the more you can watch these metrics, the more you’ll discover those points of data that are the real determinants. Remember, the facts about your environment are all out there; you just need to locate them and pull them together in order to monitor the health of your system.
• Build simple modular views of these metrics, so that you can correlate them with each other into a hierarchy that represents the complex events within your system. This way you’ll see the warning signs of problems before your application goes to production (or before your customers pick up the phone, if it’s already there). Most of all, you’ll spend much less time chasing false alarms.
• Share the visibility with other stakeholders of the system, particularly application groups and business units. You’ll empower them to solve many of their own problems without bothering you. It also reduces time wasted on needless mundane inquiries, and makes the questions you do get of a higher quality.
• Implement corrective actions for those recurring conditions where the resolution is consistently the same set of actions.
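The idea of correlating simple views into a hierarchy can be sketched as a tree in which each node reports the worst status found anywhere beneath it. The node names and the three status levels below are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed severity ordering; higher numbers are worse.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class HealthNode:
    name: str
    status: str = "ok"
    children: List["HealthNode"] = field(default_factory=list)

    def rollup(self) -> str:
        """Return the worst status of this node and everything below it."""
        worst = self.status
        for child in self.children:
            child_status = child.rollup()
            if SEVERITY[child_status] > SEVERITY[worst]:
                worst = child_status
        return worst


# Hypothetical hierarchy: a business process built from MQ and application views.
price_changes = HealthNode("price_changes", children=[
    HealthNode("mq", children=[
        HealthNode("queue_depth", "ok"),
        HealthNode("channel_status", "ok"),
    ]),
    HealthNode("target_application", children=[
        HealthNode("file_backlog", "warning"),
    ]),
])
```

Here `price_changes.rollup()` reports a warning even though every MQ view is healthy, which is the kind of cross-layer signal that MQ monitoring alone would miss.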
Christopher W.J. Holland is the technical integration manager at Debenhams Retail PLC, where he has worked in various positions, most recently in data warehousing and technical architectural capacities. He has designed and implemented Debenhams’ MQ Series infrastructure and is involved in numerous technical initiatives across supply chain, buying and merchandising, B2B, and B2C areas.