Founded more than 150 years ago, a large Fortune 50 financial services firm provides clients in more than 50 countries with financial advisory services, products and solutions. As an integrated bank employing more than 100,000 people, the company consists of multiple global divisions including private banking, investment banking and asset management.
In financial services, conducting business in real-time is a crucial part of staying competitive. For Nastel’s banking client, the massive amount of operational and transactional data that traveled across its business applications was difficult to monitor without an effective, enterprise-wide tool that could automate that process.
As the bank researched the effectiveness of its existing monitoring solutions, it found there were almost as many different tools as there were banking divisions. It also uncovered an embarrassing fact: More than 60 percent of the time, users found and reported problems before IT personnel knew about them.
The bank’s help desks were backlogged with problems that often required expensive tier 2 or tier 3 support to resolve. This was costly not only in lost productivity but also in increased reputational risk. And as resolution times grew longer, a belief became entrenched among the bank’s users that IT was not responsive to their needs.
When problems occurred (and they often did, given the large volume of multifunctional applications running in the same environment), sometimes the only strategy available to the IT teams was holding long, drawn-out problem resolution sessions in a “war room” setting.
These sessions often devolved into blame games. The bank’s IT director explained: “With management of the production environment in silos, it was pretty common for one group to pinpoint another as the cause of the previous problem and point the finger at them as a result. However, answers are never that simple, and 80 percent of the sessions would be spent identifying and analyzing the problem, leaving 20 percent of the time for actually dealing with and solving it.”
An example of the high-priority business problems being encountered: The IT group confirmed that a financial trade was completed, but the customer complained that the order had disappeared. Management brought the problem into a single room with the network, web server, app server, and database management and development groups. No group claimed responsibility for the problem, so management ordered everyone to remain in the room until it was resolved, sometimes consuming a full day.
The extended resolution process was very expensive, with a high number of tickets being opened at the service desk, and customers forced to deal with substandard service levels while the problem remained unresolved.
Each of the bank’s business transactions consisted of multiple IT infrastructure events, some asynchronous and some executing in parallel. The IT group needed a method of understanding and relating physical IT activities to a business process and transaction context. For example, the group needed to see how IT transactions map to the exact business steps (and sequence of execution) involved in a specific financial trade.
After analyzing their situation, the bank determined the following was necessary to address various issues:
- Real-time transaction monitoring across Java, middleware and CICS
- Trending analysis to anticipate problems
- Capacity planning
- Support for monitoring legacy applications written in C and C++
- Ability for support to easily field “where is my trade?” questions from customers, along with answers to questions such as: “Is it stuck in a queue, an integration broker, an application or a database?”
- A similar ability for support to answer the questions: “Where is my message?” “Why are my business transactions taking a half hour to complete? What segment is causing the delay? Has the delay been increasing over time?”
- Proactive problem detection
- Improved performance quality of releases moving from development to production
The bank’s IT environment was quite diverse; however, the environment that needed first consideration consisted of WebLogic Application Servers, IBM MQ and Message Broker middleware, TIBCO EMS middleware, CICS Transaction Server on the mainframe, and MS SQL Database.
The only option for the firm to compete in real-time and function efficiently (while saving hundreds of thousands of dollars in wasted personnel time) was to proactively monitor the production environment. Automating the monitoring processes was mandated because the sheer volume and rate of change of incoming transactional and operational data made the traditional manual method of “eyes on screen” monitoring impractical.
The enterprise-wide need for monitoring elicited many questions from the IT director: “Do we buy different tools for each department? What about tools already in existence? Does it still make sense to look at these issues from a silo-based perspective?”
The bank’s “capability wish list” included a robust monitoring system with real-time alerting that could manage transaction processing encompassing high volume and volatility—across a global environment with many platforms, systems, and applications.
Any candidate solution would have to work proactively to alert staff and help resolve problems immediately as they arose. It would also have to be scalable and dynamic enough to handle such a large infrastructure—and at a reasonable price point.
The right solution, incorporating a high degree of automation and ability to work with a wide variety of industry-standard IT components, would benefit a wide range of business users:
- The IT group could get comprehensive detection of application problems and faster support
- Enterprise architects would benefit from a solution that worked well across multiple departments and enabled users to integrate their existing tool investments
- The head of trading would benefit from high-quality service in terms of SLAs being met and app performance that facilitated compliance with various regulatory requirements
- Development teams could use the solution in UAT (user acceptance testing) to ensure new releases were performance-compliant before they were provisioned to production
Effective automation in the form of proactive monitoring was identified time and again as a critical requirement. “In reality, nobody has time to constantly look for delays or breaks occurring in business transactions throughout the course of the business day,” said the bank’s IT director. “On the other hand, no one wants to be caught flat-footed with a customer complaint that something significant has gone awry or is missing, and they’re suddenly under the gun to investigate and solve the issue immediately.
“Advanced automation would allow us to define time-based SLA thresholds for our business process transaction flows at different levels, for example, each flow segment, whole transaction, etc. Before those SLA thresholds are breached, alerts can be automatically sent to a variety of destinations, including an operations dashboard, email server, and instant messaging, to facilitate proactive actions by the relevant IT group or individual.”
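The threshold idea the IT director describes can be sketched in a few lines of Python. This is only an illustration, not AutoPilot's configuration or API: the `SlaThreshold` class, the `warn_ratio` early-warning fraction, the scope names, and the destination names are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SlaThreshold:
    """Hypothetical time-based SLA for one flow segment or a whole transaction."""
    scope: str              # e.g. a segment name or "whole_transaction" (assumed names)
    limit_seconds: float    # the SLA limit itself
    warn_ratio: float = 0.8  # assumed: warn once 80% of the limit has elapsed


def evaluate(threshold: SlaThreshold, elapsed_seconds: float) -> str:
    """Classify elapsed time as 'ok', 'warning' (before breach) or 'breached'."""
    if elapsed_seconds >= threshold.limit_seconds:
        return "breached"
    if elapsed_seconds >= threshold.limit_seconds * threshold.warn_ratio:
        return "warning"
    return "ok"


def route_alerts(
    thresholds: List[SlaThreshold],
    elapsed_by_scope: Dict[str, float],
    destinations: List[str],
) -> List[Tuple[str, str, str]]:
    """Return (destination, scope, status) for every scope that is not 'ok'."""
    alerts = []
    for threshold in thresholds:
        elapsed = elapsed_by_scope.get(threshold.scope)
        if elapsed is None:
            continue  # no measurement yet for this scope
        status = evaluate(threshold, elapsed)
        if status != "ok":
            for destination in destinations:
                alerts.append((destination, threshold.scope, status))
    return alerts


thresholds = [
    SlaThreshold("validation", 10.0),
    SlaThreshold("whole_transaction", 60.0),
]
elapsed = {"validation": 8.5, "whole_transaction": 30.0}
alerts = route_alerts(thresholds, elapsed, ["dashboard", "email"])
```

Here the validation segment has used 8.5 of its 10 seconds, so a warning goes to both destinations before the SLA is actually breached, while the whole transaction is still comfortably within its limit.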
As part of its due diligence process, the bank began evaluating Nastel AutoPilot. Based on the firm’s criteria, Nastel AutoPilot proved a near-perfect fit, supplying all of the bank’s prerequisites for a monitoring solution: transactional and operational monitoring, scalability, powerful automation tools and facilities, and strong middleware domain expertise and mainframe support.
To proactively monitor its complex production environment, the bank would leverage one of AutoPilot’s most powerful features, a Complex Event Processing (CEP) engine that employs automated algorithms to examine IT events and performance metrics. The engine harnesses user-defined business rules to: 1) ascertain if an IT situation is normal or abnormal, and 2) provide governance capabilities by deciding if various IT states and conditions have potential business impact and, if so, act to limit or eliminate it.
The bank’s IT pros saw they could quickly create CEP rules based on their knowledge of their unique production environments. AutoPilot’s CEP engine could then automatically fix problems or alert the staff anytime a “business abnormal” state was reached. This capability enabled users to be proactive—often remediating problems and issues before end-users or business processes experienced any impact.
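As a rough illustration of the two-step rule evaluation described above (normal versus abnormal, then potential business impact), the following Python sketch models a rule as a pair of predicates plus an action. It is not AutoPilot's rule syntax; the `Rule` class, the metric names, and the numeric limits are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical user-defined rule with two checks and a response."""
    name: str
    is_abnormal: Callable[[Metrics], bool]          # step 1: normal or abnormal?
    has_business_impact: Callable[[Metrics], bool]  # step 2: could it hurt the business?
    action: str                                     # what to do when both checks are true


def apply_rules(rules: List[Rule], metrics: Metrics) -> List[str]:
    """Return the actions for rules that are abnormal AND business-impacting."""
    actions = []
    for rule in rules:
        if rule.is_abnormal(metrics) and rule.has_business_impact(metrics):
            actions.append(f"{rule.name}: {rule.action}")
    return actions


# Assumed example: a deep queue is abnormal, but it only matters to the
# business once messages have been waiting longer than 30 seconds.
queue_backlog = Rule(
    name="queue_backlog",
    is_abnormal=lambda m: m.get("queue_depth", 0) > 1000,
    has_business_impact=lambda m: m.get("oldest_message_age_s", 0) > 30,
    action="alert middleware team",
)

stalled = apply_rules([queue_backlog], {"queue_depth": 1500, "oldest_message_age_s": 45})
draining = apply_rules([queue_backlog], {"queue_depth": 1500, "oldest_message_age_s": 5})
```

Separating the two predicates keeps a technically abnormal state that is still draining quickly from generating noise, which is the distinction between "abnormal" and "business abnormal" drawn above.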
Prime Brokerage & Equities Cash Securities Trade Monitoring
An early implementation of AutoPilot involved monitoring the bank’s prime brokerage and equities cash securities trade booking flow. It encompassed multi-segment business transaction flows with both sequential and parallel event sequencing. Transactions spanned multiple platform environments, like app servers, Solaris and Linux operating systems, and an IBM mainframe.
Reliable Trade Tracking Through a SWIFT Gateway
Another initial implementation involved orders and post-trade processing through the SWIFT gateway (MINT). This required a solution robust enough to monitor multi-segment business transaction flows traversing multiple geographic regions and gateway hops. The security restrictions for SWIFT gateway entry/exit hobbled previous monitoring capabilities—but AutoPilot’s functionality was powerful enough to keep everything up and running.
This was a distinctive challenge: to determine what constituted a business transaction, AutoPilot had to inspect middleware message payload content and use that information to automatically “stitch” together diverse IT events into an actual business transaction, which it then stored in a database.
This capability finally gave the IT support group the analytics tool they needed to answer the “where is my trade?” question from customers. Support could instantly look up a trade in the historical performance database using the same trade ID the customer had.
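The stitching and lookup described above can be sketched as follows. This is a simplified illustration under stated assumptions, not how AutoPilot is implemented: the `TRADE_ID=...` payload format, the event fields (`component`, `timestamp`, `payload`), and the in-memory dictionary standing in for the historical database are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed payload convention: each message carries "TRADE_ID=<id>" somewhere.
TRADE_ID_PATTERN = re.compile(r"TRADE_ID=(\w+)")

Event = Dict[str, object]


def stitch(events: List[Event]) -> Dict[str, List[Event]]:
    """Group IT events into business transactions keyed by trade ID."""
    transactions: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        match = TRADE_ID_PATTERN.search(str(event["payload"]))
        if match:
            transactions[match.group(1)].append(event)
    # Order each transaction's events by time so the last one is the latest hop.
    for steps in transactions.values():
        steps.sort(key=lambda e: e["timestamp"])
    return dict(transactions)


def where_is_my_trade(transactions: Dict[str, List[Event]], trade_id: str) -> Optional[str]:
    """Return the component that most recently handled the trade, if known."""
    steps = transactions.get(trade_id)
    if not steps:
        return None
    return str(steps[-1]["component"])


# Events arrive out of order and from different components.
events = [
    {"component": "integration_broker", "timestamp": 2, "payload": "TRADE_ID=T1;QTY=100"},
    {"component": "mq_queue", "timestamp": 1, "payload": "TRADE_ID=T1;QTY=100"},
    {"component": "mq_queue", "timestamp": 1, "payload": "TRADE_ID=T2;QTY=50"},
    {"component": "app_server", "timestamp": 3, "payload": "heartbeat"},
]
transactions = stitch(events)
location = where_is_my_trade(transactions, "T1")
```

Keying on the same trade ID the customer holds is what lets support answer the question directly: the last event in the stitched sequence shows where the trade currently sits.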
Monitoring the Messaging Environment
Lastly, the bank put AutoPilot to work monitoring all aspects of the messaging environment for TIBCO EMS and MQ, including trade booking, validation, clearing, settlement and payments.
Benefits & Results
AutoPilot’s out-of-the-box functionality fostered quick implementations and proved more than scalable enough to meet the bank’s current and projected needs. While the bank hasn’t yet put a dollar amount on the money saved by gaining end-to-end visibility and transaction profiling and by eliminating internal “blame games,” the financial benefit was deemed significant. To date, the bank has substantially lowered the number of incident tickets at its help desk and the number of expensive support professionals required for problem resolution.
With a solution that tracks and monitors the environment and automates problem resolution, the long “blame game” scenarios mentioned earlier disappeared, raising service levels and reducing the number of tickets at the service desk.
“Instead of spending our time identifying the cause of problems, AutoPilot recognizes them right away,” said the IT director, “thereby releasing resources from detection and diagnosis tasks, and enabling much faster actual problem resolution.”
He added: “Insight into real-time transaction activity isn’t easy to come by, but it’s key to the delivery of high-quality service that helps establish market leadership. AutoPilot provides intuitive insight in a simple, central dashboard.”
“With AutoPilot,” he continued, “we more accurately process SWIFT and syndicated loan payments, routing out to the DTCC and SIAC, Fedwire payment settlement, and more.”
AutoPilot also helps with trending analysis to anticipate problems and enable capacity planning. The solution handles legacy applications written in C and C++, programming languages for which the bank previously lacked adequate monitoring tools.
“AutoPilot helps ensure our SLAs are fulfilled and thresholds aren’t being breached,” the IT director concluded. “It’s been a huge help in providing general customer views into the system.”
About Nastel AutoPilot
Nastel AutoPilot ensures the availability and performance of critical business applications via auto discovery, business transaction profiling, real-time monitoring, role-based dashboards and automated problem resolution. Customers in the line of business, development, and IT utilize AutoPilot to guarantee high application performance, compliance, reduced user impact, fewer incidents, lower costs, and greater productivity.
Nastel Technologies helps the world’s largest enterprises solve application compliance, surveillance and performance issues while safeguarding customer relationships and lowering operational costs. Nastel’s Enterprise-Grade APM solution encompasses deep real-time monitoring, transaction tracking, and analytics spanning applications, middleware, transactions, end-user experience, logs, and mobile services in a single, powerful, easy-to-use solution.
Nastel is privately held and headquartered in New York, with offices in the U.S., the U.K., France, Germany and Mexico, and a network of partners throughout Europe, the Middle East, Latin America and Asia. For more information, visit nastel.com.