Interviewee Information

  • Senior Operations Manager
  • Responsible for all aspects of IBM® WebSphere® MQ middleware management supporting business applications
  • Team of 8 people
  • Team responsible for approximately 325 applications that leverage MQ, of which 275+ are unique
  • Application and transaction types include: Service Oriented Architecture (SOA), Web Services, integrations, Point of Sale, manufacturing, services and support, others

Company Information

  • High Technology Infrastructure Manufacturing/Distribution vertical
  • Global company with 400 locations/offices worldwide, 24/7 operations
  • Number of employees: 70,000 to 80,000

This Senior Operations Manager leads an Operations team supporting 300+ applications that leverage IBM MQ. For those unfamiliar with MQ, it is a messaging backbone connecting distributed platforms, designed to support high volume transactions. For this company, MQ supports the manufacturing floor and the integrations between the sales and manufacturing systems. It also drives the equipment servicing side of the business as well as a variety of other applications. The team monitors 65,000 MQ queues, with “probably billions” of transactions going through the systems on a daily basis. “Some of the queues do a million transactions; others do 100, depending on the application itself.”

His team is the “second level support” team and is highly skilled, with some members being IT veterans of 15 years or more. They have a mix of skills but are primarily Operations-focused. If the Level 1 support team cannot solve an application-related problem, they escalate it to this more application-savvy team.

Replacement Story

Nastel Autopilot™1 replaced a “homegrown solution to monitor queues on MQ. It was ten years old and outdated. The people who had written it were long gone.” The team was maintaining it themselves, and it simply became too expensive.

In addition to the need to retire the old solution, they had other pain points. They are a “big Microsoft shop. Every time new patches came out, they broke a portion of the transactions running on MQ.” The old tool often did not notify the team when transactions broke, so they didn’t know there was a problem until they received a call from the business. Troubleshooting required a long process of sifting through logs and metrics to determine the source of the problem. “It was a very large pain.”

“For us, Severity 1 is not being able to sell products or get them out of the door. Severity 1 problems cost us between $750,000 and $1.5 million per incident. Severity 2 is system degradation. There would be cases where the MQ queue was backing up, but we couldn’t tell because our old tool couldn’t see it.”

Although the initial Nastel Autopilot installation was quick, it took between six and nine months to integrate it with BMC Remedy (for trouble-ticket notification) and to configure Autopilot for the 300+ applications they support. Of these, there are 275 unique applications, each of which required analysis and tuning on the Autopilot side. Tuning was done in conjunction with business input, based on the specific thresholds and metrics that were important to business stakeholders.

Nastel provided education and training, as well as CTO-level support when the team ran into scaling issues. There were some “basic issues with the database because of the large numbers of transactions we were throwing at it.”

ROI

Hard ROI (per year): $15,850,000
Details in Table 1 below

Summary

Hard ROI:

  • Reduction in Severity 1 problems: Number of Severity 1 problems reduced from 15 per year to 1 in the past 1 ½ years, for savings of $15,500,000 annually
  • Reduction in new headcount requirements: Workload dictated an increase in staff of approximately 31% (from 8 people to 10.5 FTE). After the Nastel implementation, they were able to redeploy one specialist because of the reduced workload. This reduced their total support team to 7 versus a projected 10.5, for an annual savings of $350,000.

Areas of Soft ROI:

  • Customer satisfaction: Internal and external customer satisfaction have improved.
  • Service improvements: 6,000 trouble tickets per year reduced by 70%
  • Business flexibility improvements: With Nastel Autopilot support, the less-skilled Level 1 support team has been able to shoulder many calls which would have previously been escalated to Level 2. This has freed up the more skilled Level 2 team to focus on business optimization projects, such as Change and Patch Management and standards development.

Lessons Learned

  • Business first: The key to utilizing any management product is to become proactive by understanding applications from the business perspective. “First, we identify where, from the business perspective, a pain point is. Then, we need to set the threshold to notify the team before we hit the pain point.”
  • Choose products that are flexible in meeting the needs of the business: The team used their extensive business knowledge to tune Nastel Autopilot to the correct notification thresholds. In effect, they used Nastel Autopilot’s complex event processing (CEP) feature to create “artificial intelligence” capabilities to predict problems before customers experienced them. They were able to do this because of both their business knowledge and the inherent flexibility of the product.
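The “business first” lesson amounts to placing a warning threshold ahead of each business-defined pain point, so the team is notified before the business feels the impact. The Python sketch below illustrates that idea for queue depth only, under stated assumptions: it is not Nastel Autopilot’s configuration or API, and the `QueueRule` and `evaluate` names, the queue name, the pain-point depth, and the 80% warning level are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"        # notify the team: the pain point is approaching
    PAIN_POINT = "pain_point"  # the business is already affected


@dataclass(frozen=True)
class QueueRule:
    # Hypothetical rule: the pain point is supplied by business stakeholders.
    queue: str
    pain_point_depth: int      # queue depth at which the business feels the backlog
    warning_pct: int = 80      # hypothetical: warn at 80% of the pain point

    @property
    def warning_depth(self) -> int:
        # Integer arithmetic avoids floating-point rounding at the boundary.
        return self.pain_point_depth * self.warning_pct // 100


def evaluate(rule: QueueRule, current_depth: int) -> Status:
    """Classify a queue-depth reading against the business-derived thresholds."""
    if current_depth >= rule.pain_point_depth:
        return Status.PAIN_POINT
    if current_depth >= rule.warning_depth:
        return Status.WARNING
    return Status.OK


if __name__ == "__main__":
    rule = QueueRule(queue="ORDERS.TO.MFG", pain_point_depth=10_000)  # hypothetical queue
    for depth in (1_200, 8_500, 12_000):
        print(rule.queue, depth, evaluate(rule, depth).value)
```

In practice each of the 275 unique applications would likely need its own pain point and warning level, tuned with business input as the team describes; a single fixed percentage is only the simplest possible choice.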

Quote

“Nastel Autopilot was a tool that we were able to utilize along with our own business expertise to deliver much more reliable services to the business. We wanted to become more proactive and to be notified of a problem before we hit the pain points. This has relieved the stress on my team, and removed a ton of stress on the business side. It has allowed us to manage into the 21st century versus 1970.”

Hard ROI

  • Performance/downtime improvements
      Before: 15 Severity 1s per year at a cost of $750,000 to $1.5 million per incident (8 @ $750,000 = $6,000,000; 7 @ $1.5 million = $10,500,000), for a total of $16,500,000
      After: 1 Severity 1 in 1 ½ years = $1,500,000, annualized as 2/3 of $1,500,000 = $1,000,000
      Annualized savings: $15,500,000
  • Headcount reallocated/new headcount unnecessary
      Before: Workload required an increase from 8 to 10.5 FTE, at a total annual cost of $1,050,000 (10.5 × $100,000)
      After: One staff member reallocated, a decrease from 8 to 7 FTE, at a total annual cost of $700,000 (7 × $100,000)
      Annualized savings: $350,000

Total Hard ROI: $15,850,000

Soft ROI

  • Customer satisfaction
      Before: Frequent outages
      After: Infrequent outages
      Outcome: Better reliability, better relationship with the business
  • Service improvements
      Before: 6,000 trouble tickets per year
      After: Reduced by 70%
      Outcome: Approximately 1,800 trouble tickets per year versus 6,000
  • Business flexibility improvements
      Before: Level 2 spent significant time supporting production apps
      After: Level 1 team able to assume more responsibility; Level 2 team now able to focus on improving Change Management, Patch Management, and standards creation
      Outcome: More expensive personnel redeployed to strategic versus tactical projects

Table 1: ROI
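The hard-ROI figures in Table 1 follow from simple arithmetic on the numbers reported in the interview. The Python sketch below recomputes them as a cross-check; the function names are hypothetical, and the flat $100,000 per FTE is the figure used in the table rather than a reported salary.

```python
from fractions import Fraction

# Figures taken from Table 1.
SEV1_LOW_COST = 750_000     # low end of the reported cost per Severity 1 incident
SEV1_HIGH_COST = 1_500_000  # high end of the reported cost per Severity 1 incident
COST_PER_FTE = 100_000      # flat annual cost per FTE used in the table


def sev1_annual_savings() -> int:
    # Before: 15 incidents per year, 8 costed at the low end and 7 at the high end.
    before = 8 * SEV1_LOW_COST + 7 * SEV1_HIGH_COST  # 16,500,000
    # After: 1 high-end incident in 1.5 years, annualized as 2/3 of its cost.
    after = SEV1_HIGH_COST * 2 // 3                  # 1,000,000
    return before - after


def headcount_annual_savings() -> int:
    projected = Fraction("10.5") * COST_PER_FTE  # staff the workload would have required
    actual = 7 * COST_PER_FTE                     # staff after redeploying one specialist
    return int(projected - actual)


def remaining_tickets(tickets_per_year: int, reduction_pct: int) -> int:
    # Soft ROI: trouble tickets left after a percentage reduction.
    return tickets_per_year * (100 - reduction_pct) // 100


if __name__ == "__main__":
    sev1 = sev1_annual_savings()
    staff = headcount_annual_savings()
    print(f"Severity 1 savings: ${sev1:,}")
    print(f"Headcount savings:  ${staff:,}")
    print(f"Total hard ROI:     ${sev1 + staff:,}")
    print(f"Tickets per year:   ~{remaining_tickets(6_000, 70):,} (was 6,000)")
```

Running it prints the Severity 1 savings ($15,500,000), the headcount savings ($350,000), the total hard ROI ($15,850,000), and the roughly 1,800 trouble tickets remaining per year. `Fraction` keeps the 10.5 FTE calculation exact rather than relying on floating point.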

About EMA

Founded in 1996, Enterprise Management Associates (EMA) is a leading industry analyst firm that specializes in going “beyond the surface” to provide deep insight across the full spectrum of IT management technologies. EMA analysts leverage a unique combination of practical experience, insight into industry best practices, and in-depth knowledge of current and planned vendor solutions to help its clients achieve their goals. Learn more about EMA research, analysis, and consulting services for enterprise IT professionals and IT vendors at www.enterprisemanagement.com or follow EMA on Twitter (http://twitter.com/ema_research).

©2009 Enterprise Management Associates