The Client

Founded more than 150 years ago, a large Fortune 50 financial services firm provides clients in more than 50 countries with financial advisory services, products and solutions. As an integrated bank employing more than 100,000 people, the company consists of multiple global divisions including private banking, investment banking and asset management.

Challenges

In financial services, conducting business in real-time is a crucial part of staying competitive. For Nastel’s banking client, the massive amount of operational and transactional data that traveled across its business applications was difficult to monitor without an effective, enterprise-wide tool that could automate that process.

As the bank researched the effectiveness of its existing monitoring solutions, it found there were almost as many different tools as there were banking divisions. It also uncovered an embarrassing fact: More than 60 percent of the time, users found and reported problems before IT personnel knew about them.

The bank’s help desks were backlogged with problems that often required expensive tier 2 or tier 3 support to resolve. This was costly not only in lost productivity but also in increased reputational risk. And as resolution times were long and getting longer, a belief became entrenched among the bank’s users that IT was not responsive to their needs.

When problems occurred (and they occurred often, given the large volume of multifunctional applications running in the same environment), sometimes the only strategy available to the IT teams was to hold long, drawn-out problem resolution sessions in a “war room” setting.

These sessions often devolved into blame games. The bank’s IT director explained: “With management of the production environment in silos, it was pretty common for one group to pinpoint another as the cause of the previous problem and point the finger at them as a result. However, answers are never that simple, and 80 percent of the sessions would be spent identifying and analyzing the problem, leaving 20 percent of the time for actually dealing with and solving it.”

An example of the high-priority business problems the bank encountered: the IT group confirmed that a financial trade had completed, but the customer complained that the order had disappeared. Management brought the network, web server, application server, database management, and development groups into a single room. When no group claimed responsibility for the problem, management ordered everyone to remain in the room until it was resolved, which sometimes consumed a full day.

The extended resolution process was very expensive, with a high number of tickets being opened at the service desk, and customers forced to deal with substandard service levels while the problem remained unresolved.

Defining Requirements

Each of the bank’s business transactions consisted of multiple IT infrastructure events, some asynchronous and some executing in parallel. The IT group needed a way to understand and relate physical IT activities to a business process and transaction context: for example, how the IT transactions involved in a specific financial trade map to the exact business steps, and the sequence in which those steps execute.

After analyzing its situation, the bank determined that it needed the following capabilities:

  • Real-time transaction monitoring across Java, middleware and CICS
  • Trending analysis to anticipate problems
  • Capacity planning
  • Support for monitoring legacy applications written in C and C++
  • Ability for support to easily field “where is my trade?” questions from customers, along with answers to questions such as: “Is it stuck in a queue, an integration broker, an application or a database?”
  • A similar ability for support to answer the questions: “Where is my message?” “Why are my business transactions taking a half hour to complete? What segment is causing the delay? Has the delay been increasing over time?”
  • Proactive problem detection
  • Improved performance quality of releases moving from development to production
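Two of the requirements above, finding the segment that is causing a delay and telling whether that delay has been increasing over time, reduce to simple arithmetic over per-segment latency samples. The Python below is a hypothetical illustration, not AutoPilot functionality: the segment names, the sample data, and the helper names `slowest_segment` and `trend_slope` are assumptions, and a production tool would work from much larger, time-stamped data sets.

```python
from statistics import mean


def slowest_segment(latencies: dict[str, list[float]]) -> str:
    """Return the flow segment with the highest mean latency (hypothetical helper)."""
    return max(latencies, key=lambda segment: mean(latencies[segment]))


def trend_slope(samples: list[float]) -> float:
    """Ordinary least-squares slope of the samples against their index.

    A positive slope means the delay has been growing from one sample to the next.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(samples))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


# Hypothetical daily average latencies (seconds) per segment of one trade flow.
latencies = {
    "queue": [1.0, 1.1, 1.0],
    "integration broker": [2.0, 2.5, 3.0],
    "database": [0.5, 0.5, 0.6],
}
culprit = slowest_segment(latencies)
print(culprit, trend_slope(latencies[culprit]))
```

With this sample data the integration broker is both the slowest segment and trending upward by about half a second per day, which is the kind of answer the support team wanted to give.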

The bank’s IT environment was quite diverse; however, the part that needed attention first consisted of WebLogic Application Servers, IBM MQ and Message Broker middleware, TIBCO EMS middleware, CICS Transaction Server on the mainframe, and Microsoft SQL Server databases.

The only way for the firm to compete in real time and operate efficiently (while saving hundreds of thousands of dollars in wasted personnel time) was to monitor the production environment proactively. Automating the monitoring processes was mandatory: the traditional manual method of “eyes on screen” monitoring was impractical given the volume and rate of change of the incoming transactional and operational data.

The enterprise-wide need for monitoring elicited many questions from the IT director: “Do we buy different tools for each department? What about tools already in existence? Does it still make sense to look at these issues from a silo-based perspective?”

The bank’s “capability wish list” included a robust monitoring system with real-time alerting that could handle high-volume, volatile transaction processing across a global environment with many platforms, systems, and applications.

Any candidate solution would have to work proactively to alert staff and help resolve problems as soon as they arose. It would also have to be scalable and dynamic enough to handle such a large infrastructure, at a reasonable price point.

The right solution, incorporating a high degree of automation and ability to work with a wide variety of industry-standard IT components, would benefit a wide range of business users:

  • The IT group could get comprehensive detection of application problems and faster support
  • Enterprise architects would benefit from a solution that worked well across multiple departments and enabled users to integrate their existing tool investments
  • The head of trading would benefit from high-quality service in terms of SLAs being met and app performance that facilitated compliance with various regulatory requirements
  • Development teams could use the solution in UAT (user acceptance testing) to ensure new releases were performance-compliant before they were provisioned to production

Effective automation in the form of proactive monitoring was identified time and again as a critical requirement. “In reality, nobody has time to constantly look for delays or breaks occurring in business transactions throughout the course of the business day,” said the bank’s IT director. “On the other hand, no one wants to be caught flat-footed with a customer complaint that something significant has gone awry or is missing, and they’re suddenly under the gun to investigate and solve the issue immediately.

“Advanced automation would allow us to define time-based SLA thresholds for our business process transaction flows at different levels, for example, each flow segment, whole transaction, etc. Before those SLA thresholds are breached, alerts can be automatically sent to a variety of destinations, including an operations dashboard, email server, and instant messaging, to facilitate proactive actions by the relevant IT group or individual.”
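Tiered, time-based SLA thresholds with early alerting, as the IT director describes them, can be sketched as follows. This is a minimal, hypothetical Python illustration rather than AutoPilot’s configuration model: the `SlaThreshold` type, the 80 percent warning ratio, and the notifier callables (stand-ins for an operations dashboard, email server, or instant-messaging destination) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SlaThreshold:
    name: str                # a flow segment or the whole transaction
    limit_s: float           # SLA limit in seconds
    warn_ratio: float = 0.8  # assumption: warn once 80% of the limit is used


def evaluate(threshold: SlaThreshold, elapsed_s: float) -> str:
    """Classify elapsed time against a threshold as OK, WARNING, or BREACHED."""
    if elapsed_s >= threshold.limit_s:
        return "BREACHED"
    if elapsed_s >= threshold.warn_ratio * threshold.limit_s:
        return "WARNING"
    return "OK"


def check_and_notify(
    thresholds: list[SlaThreshold],
    elapsed_by_name: dict[str, float],
    notifiers: list[Callable[[str], None]],
) -> list[tuple[str, str]]:
    """Evaluate every threshold that has a measurement and notify on anything not OK."""
    raised = []
    for t in thresholds:
        elapsed = elapsed_by_name.get(t.name)
        if elapsed is None:
            continue  # nothing measured yet at this level
        status = evaluate(t, elapsed)
        if status != "OK":
            raised.append((t.name, status))
            for notify in notifiers:
                notify(f"{t.name}: {status} ({elapsed:.1f}s of {t.limit_s:.1f}s)")
    return raised


# Hypothetical thresholds at segment level and whole-transaction level.
thresholds = [
    SlaThreshold("booking", 2.0),
    SlaThreshold("clearing", 5.0),
    SlaThreshold("whole trade", 10.0),
]
messages: list[str] = []
alerts = check_and_notify(
    thresholds,
    {"booking": 1.0, "clearing": 4.5, "whole trade": 11.0},
    [messages.append, print],  # stand-ins for dashboard, email, and IM destinations
)
```

The warning tier is what makes the alerting proactive: the clearing segment is flagged at 4.5 seconds, before its 5-second SLA is actually breached.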

Solution Path

As part of its due diligence process, the bank began evaluating Nastel AutoPilot. Measured against the bank’s criteria for a monitoring solution, AutoPilot proved a near-perfect fit, meeting all of its prerequisites: transactional and operational monitoring, scalability, powerful automation tools and facilities, and strong middleware domain expertise and mainframe support.

To proactively monitor its complex production environment, the bank would leverage one of AutoPilot’s most powerful features, a Complex Event Processing (CEP) engine that employs automated algorithms to examine IT events and performance metrics. The engine applies user-defined business rules to 1) determine whether an IT situation is normal or abnormal, and 2) provide governance by deciding whether various IT states and conditions have potential business impact and, if so, acting to limit or eliminate it.

The bank’s IT pros saw they could quickly create CEP rules based on their knowledge of their unique production environments. AutoPilot’s CEP engine could then automatically fix problems or alert the staff anytime a “business abnormal” state was reached. This capability enabled users to be proactive, often remediating problems and issues before end-users or business processes experienced any impact.
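The two-step rule evaluation described above (first decide whether a situation is abnormal, then whether it has business impact, and only then act) can be illustrated with a small rule engine running over a sliding window of events. This is a hypothetical Python sketch, not AutoPilot’s CEP engine or rule language: the `Rule` and `RuleEngine` names, the event fields, the window size, the `TRADE.` queue prefix, and the example rule are all assumptions, and a real CEP engine correlates events across sources and time far more richly than this.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable

Event = dict[str, Any]


@dataclass
class Rule:
    name: str
    is_abnormal: Callable[[list[Event]], bool]          # step 1: normal or abnormal?
    has_business_impact: Callable[[list[Event]], bool]  # step 2: does it matter?
    action: str                                         # what to do when both hold


class RuleEngine:
    """Evaluates every rule against a sliding window of the most recent events."""

    def __init__(self, rules: list[Rule], window_size: int = 5):
        self.rules = rules
        self.window: deque[Event] = deque(maxlen=window_size)

    def on_event(self, event: Event) -> list[str]:
        self.window.append(event)
        recent = list(self.window)
        return [
            f"{rule.name}: {rule.action}"
            for rule in self.rules
            if rule.is_abnormal(recent) and rule.has_business_impact(recent)
        ]


def depth_rising(events: list[Event]) -> bool:
    """Abnormal if the last three depths keep climbing past 1,000 messages.

    Assumes the window holds events for a single queue.
    """
    depths = [e["depth"] for e in events[-3:]]
    return len(depths) == 3 and depths[0] < depths[1] < depths[2] and depths[2] > 1000


def is_trade_queue(events: list[Event]) -> bool:
    """Business impact only for queues on a hypothetical TRADE.* naming scheme."""
    return events[-1]["queue"].startswith("TRADE.")


engine = RuleEngine([Rule("trade-queue-backlog", depth_rising, is_trade_queue,
                          "alert operations dashboard")])
for depth in (400, 900, 1500):
    fired = engine.on_event({"queue": "TRADE.BOOKING", "depth": depth})
print(fired)
```

Keeping the abnormality test and the business-impact test as separate predicates mirrors the two steps in the text: the same growing backlog on a non-trade queue would be abnormal but would not trigger the action.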

Prime Brokerage & Equities Cash Securities Trade Monitoring

An early implementation of AutoPilot involved monitoring the bank’s prime brokerage and equities cash securities trade booking flow. It encompassed multi-segment business transaction flows with both sequential and parallel event sequencing. Transactions spanned multiple platform environments, including application servers, Solaris and Linux operating systems, and an IBM mainframe.

Reliable Trade Tracking Through a SWIFT Gateway

Another initial implementation involved orders and post-trade processing through the SWIFT gateway (MINT). This required a solution robust enough to monitor multi-segment business transaction flows traversing multiple geographic regions and gateway hops. The security restrictions on SWIFT gateway entry and exit had hobbled previous monitoring capabilities, but AutoPilot’s functionality was powerful enough to keep everything up and running.

This was a unique challenge: to determine what constituted a business transaction, AutoPilot had to inspect middleware message payload content, use that information to automatically “stitch” diverse IT events together into an actual business transaction, and then store the result in a database.
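Payload-based stitching of this kind can be sketched as grouping IT events by a business key extracted from each message and ordering each group by time. The Python below is a hypothetical illustration, not AutoPilot’s mechanism: it assumes every relevant payload carries a `TradeID=<id>` field, and the `ItEvent` type, the `stitch` function, and the sample events are invented for the example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumption: the business key appears in the payload as "TradeID=<id>".
TRADE_ID = re.compile(r"TradeID=(\w+)")


@dataclass
class ItEvent:
    timestamp: float  # seconds since epoch
    source: str       # queue, broker, application, or database that emitted it
    payload: str      # middleware message body


def stitch(events: list[ItEvent]) -> dict[str, list[ItEvent]]:
    """Group events into business transactions keyed by trade ID, in time order."""
    transactions: defaultdict[str, list[ItEvent]] = defaultdict(list)
    for event in events:
        match = TRADE_ID.search(event.payload)
        if match:  # events without a recognizable trade ID are not stitched
            transactions[match.group(1)].append(event)
    for hops in transactions.values():
        hops.sort(key=lambda e: e.timestamp)  # events may arrive out of order
    return dict(transactions)


events = [
    ItEvent(12.5, "MessageBroker", "TradeID=T42 routed to clearing"),
    ItEvent(10.0, "MQ:TRADE.IN", "<order>TradeID=T42 qty=100</order>"),
    ItEvent(11.0, "MQ:TRADE.IN", "<order>TradeID=T43 qty=50</order>"),
    ItEvent(9.0, "WebLogic", "heartbeat"),
]
for trade_id, hops in stitch(events).items():
    elapsed = hops[-1].timestamp - hops[0].timestamp
    print(trade_id, [h.source for h in hops], f"{elapsed:.1f}s")
```

Once events are grouped this way, each group can be written to a database and its end-to-end duration compared against SLA thresholds.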

This capability finally gave the IT support group the analytics tool it needed to answer the “where is my trade?” question from customers. Support could instantly look up a trade in the historical performance database using the same trade ID the customer had.
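A “where is my trade?” lookup against a historical store can be sketched with Python’s built-in `sqlite3` module. The table layout, the `record_hop` and `where_is_my_trade` helpers, and the component names are hypothetical; this case study does not describe the schema of the bank’s performance database or of AutoPilot.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory table of stitched trade hops (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trade_hops ("
        " trade_id TEXT NOT NULL,"
        " ts REAL NOT NULL,"
        " component TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_trade_ts ON trade_hops (trade_id, ts)")
    return conn


def record_hop(conn: sqlite3.Connection, trade_id: str, ts: float, component: str) -> None:
    """Store one stitched event (hop) for a trade."""
    conn.execute("INSERT INTO trade_hops VALUES (?, ?, ?)", (trade_id, ts, component))


def where_is_my_trade(conn: sqlite3.Connection, trade_id: str) -> Optional[tuple[str, float]]:
    """Return (component, timestamp) of the trade's most recent hop, or None if unknown."""
    return conn.execute(
        "SELECT component, ts FROM trade_hops WHERE trade_id = ?"
        " ORDER BY ts DESC LIMIT 1",
        (trade_id,),
    ).fetchone()


conn = open_store()
record_hop(conn, "T42", 10.0, "MQ queue TRADE.IN")
record_hop(conn, "T42", 12.5, "Message Broker")
print(where_is_my_trade(conn, "T42"))
print(where_is_my_trade(conn, "T99"))
```

If the most recent hop is an intermediate component such as the broker rather than the final booking system, that component is the natural place for support to start looking.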

Monitoring the Messaging Environment

Lastly, the bank put AutoPilot to work monitoring all aspects of the messaging environment for TIBCO EMS and MQ, including trade booking, validation, clearing, settlement and payments.

Benefits & Results

AutoPilot’s out-of-the-box functionality enabled quick implementations, and the product proved scalable enough to meet the bank’s current and projected needs. Although the bank has not yet put a dollar figure on the savings from end-to-end visibility, transaction profiling and the elimination of internal “blame games,” it considers the financial benefit significant. To date, the bank has substantially lowered the number of incident tickets at its help desk and the number of expensive support professionals required for problem resolution.

With a solution that tracks and monitors the environment and automates problem resolution, the long “blame game” scenarios mentioned earlier disappeared, raising service levels and reducing the number of tickets at the service desk.

“Instead of spending our time identifying the cause of problems, AutoPilot recognizes them right away,” said the IT director, “thereby releasing resources from detection and diagnosis tasks, and enabling much faster actual problem resolution.”

He added: “Insight into real-time transaction activity isn’t easy to come by, but it’s key to the delivery of high-quality service that helps establish market leadership. AutoPilot provides intuitive insight in a simple, central dashboard.”

“With AutoPilot,” he continued, “we more accurately process SWIFT and syndicated loan payments, routing out to the DTCC and SIAC, Fedwire payment settlement, and more.”

AutoPilot also helps with trending analysis to anticipate problems and enable capacity planning. The solution handles legacy applications written in C and C++, programming languages for which the bank previously lacked adequate monitoring tools.

“AutoPilot helps ensure our SLAs are fulfilled and thresholds aren’t being breached,” the IT director concluded. “It’s been a huge help in providing general customer views into the system.”

About Nastel AutoPilot

Nastel AutoPilot ensures the availability and performance of critical business applications via auto discovery, business transaction profiling, real-time monitoring, role-based dashboards and automated problem resolution. Customers in the line of business, development, and IT utilize AutoPilot to guarantee high application performance, compliance, reduced user impact, fewer incidents, lower costs, and greater productivity.

About Nastel

Nastel Technologies helps the world’s largest enterprises solve application compliance, surveillance and performance issues while safeguarding customer relationships and lowering operational costs. Nastel’s Enterprise-Grade APM solution encompasses deep real-time monitoring, transaction tracking, and analytics spanning applications, middleware, transactions, end-user experience, logs, and mobile services in a single, powerful, easy-to-use solution.

Nastel is privately held and headquartered in New York, with offices in the U.S., the U.K., France, Germany and Mexico, with an additional network of partners throughout Europe, the Middle East, Latin America and Asia. For more information, visit nastel.com.