AIOps: Hollywood Myth Versus Practical Reality
When you say the words “artificial intelligence,” what comes to mind for many is some sort of superintelligent machine life-form we’ve all seen in sci-fi movies, a being that knows more than we do and can do more than we can. Complete awareness and interpretation of its environment, rapid learning and adaptation, and incomprehensibly fast cycling through the observe-orient-decide-act (OODA) loop are implied AI characteristics in most films.
Even in the real world, people have built up AI in sales cycles so much that expectations are unrealistic. Marketing efforts tend to anthropomorphize AI solutions, creating inflated expectations about the possibilities, even within organizations that don’t yet have AI experience or capabilities. They forget that it likely took large organizations years to develop basic AI applications like Google’s spam filters or Amazon’s automated product suggestions.
IT operations would seem to be a perfect fit for AI: It’s a machine in charge of machines. But realizing the potential of AI in IT ops is a tougher challenge, organizationally, than most expect. It requires the right match of human capability and AI technology to execute well because every company’s IT environment is unique. It likely took decades to train an AI to play chess better than humans — the board, the pieces, the rules, the strategy — and a company’s IT ecosystem is both more complex and more difficult to describe than chess. And the desired outcome isn’t to defeat an opponent; it’s to eliminate noise, identify where and why problems occur, and minimize the human work required.
AIOps is not magic; it’s science.
Doing AIOps correctly means feeding a sophisticated algorithmic program a “normalized” set of data. In this case, “normalized” means putting all the data in the same format so the AI understands it. For instance, humans have many different terms for a “host,” such as “node” or “server.” People can almost instantly recognize those synonyms from context, but AI can’t — at least not until you describe those relationships, which is a form of normalization.
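To make this concrete, here is a minimal sketch of that kind of normalization: a synonym table maps the field names different tools emit onto one canonical vocabulary. The table itself and the field names are illustrative assumptions, not a standard.

```python
# Hypothetical synonym table: raw events from different tools use
# different names for the same concept ("host", "node", "server").
SYNONYMS = {
    "node": "host",
    "server": "host",
    "hostname": "host",
    "app": "application",
}

def normalize(event: dict) -> dict:
    """Rewrite each field name to its canonical form so downstream
    correlation sees one vocabulary instead of many."""
    return {SYNONYMS.get(key, key): value for key, value in event.items()}

raw = {"node": "web-01", "app": "checkout", "severity": "critical"}
print(normalize(raw))
# {'host': 'web-01', 'application': 'checkout', 'severity': 'critical'}
```

In practice the synonym table is exactly the “described relationships” the paragraph above refers to: it has to be built and maintained by people who know the environment.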
You then have to give the AI a framework for making decisions and task it with certain actions. This data preparation is probably where most organizations stumble. AI doesn’t magically come in and sense its environment. You need to develop a common taxonomy, and having multiple data sources makes this complex because there are no shared standards across all the manufacturers in the IT landscape.
Once the data is normalized, you convert it to a map that the AI can use in its tasking. Some organizations call this an enrichment map, and it’s based on key-value pairs: This host is related to this application; this application is related to this business service; this source system is related to this location from a network perspective. You establish those relationships and deliver them to the AI as maps of your environment, one by one. All the logs, metrics, events and alerts that describe what is happening in that environment can then be processed through those enrichment maps and can pick up additional, meaningful attributes.
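A sketch of such an enrichment map, under the assumptions above: plain key-value lookups chain a host to its application and the application to its business service. The host, application, and service names here are hypothetical.

```python
# Hypothetical enrichment maps expressed as key-value pairs:
# host -> application, application -> business service.
HOST_TO_APP = {"web-01": "checkout", "db-02": "checkout", "mq-01": "payments"}
APP_TO_SERVICE = {"checkout": "online-store", "payments": "online-store"}

def enrich(alert: dict) -> dict:
    """Attach application and business-service attributes to a raw alert
    by chaining lookups through the enrichment maps."""
    enriched = dict(alert)
    app = HOST_TO_APP.get(alert.get("host"))
    if app:
        enriched["application"] = app
        service = APP_TO_SERVICE.get(app)
        if service:
            enriched["business_service"] = service
    return enriched

print(enrich({"host": "web-01", "message": "CPU saturation"}))
# {'host': 'web-01', 'message': 'CPU saturation',
#  'application': 'checkout', 'business_service': 'online-store'}
```

An alert that arrives with only a host name leaves the map carrying the application and business-service context the AI needs for correlation.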
The AI will then use those attributes in correlating related events. If you imagine an alert with two attributes and you enrich it with 15 additional ones, all those fields are now available for correlation patterns to leverage in grouping alerts on the basis of shared attributes. Once you do that, the advantage of AI kicks in. Degrees of relatedness between events can be determined using normalized data, and machine learning can start identifying patterns and clustering events.
The AI isn’t searching for patterns like a human would. It’s instantaneously looking at every event and its specific attributes, clustering them in multiple different ways, and grouping events together based on the highest-scoring relationships within the specified time windows. When done correctly, AI is very systematic, consistent and reliable, but it is not “intelligent” in the way that we see in the movies. AI will never make a cognitive leap to say, “Maybe that network alert is related to that application alert,” if it doesn’t know anything about those two systems in the first place.
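The grouping mechanic described above can be sketched very simply: alerts that land in the same time window and share an enriched attribute fall into the same cluster. Real products score many attributes at once; the single grouping key and the five-minute window here are illustrative assumptions.

```python
from collections import defaultdict

# Illustrative parameters: real systems tune these per environment.
WINDOW_SECONDS = 300

def group_alerts(alerts, key="business_service"):
    """Cluster alerts by (time bucket, shared attribute value)."""
    clusters = defaultdict(list)
    for alert in alerts:
        bucket = alert["timestamp"] // WINDOW_SECONDS
        clusters[(bucket, alert.get(key))].append(alert)
    return list(clusters.values())

alerts = [
    {"timestamp": 100, "business_service": "online-store", "msg": "network flap"},
    {"timestamp": 140, "business_service": "online-store", "msg": "app timeout"},
    {"timestamp": 900, "business_service": "online-store", "msg": "disk full"},
]
groups = group_alerts(alerts)
# The first two alerts share a window and a service, so they form one
# cluster; the third lands in a later window and stands alone.
```

Note that the network alert and the application alert only end up together because enrichment gave them a shared attribute: exactly the point that the AI cannot make that leap on its own.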
AI and ML are only as good as the people who set them up.
Good data inputs and intelligent enrichment will return good results. Applying your existing tribal knowledge when you’re configuring the AI will be a huge benefit. And given the magnitude of human decision-making you’re trying to replace with AI, putting a human in the loop to verify or reject the outputs is critical.
Supervised learning helps convert that tribal knowledge to an AI-compatible format and ensures the output is usable. Even with very sophisticated (and expensive) AI solutions, if teams don’t understand why the solution is grouping specific alerts together or why it’s taking two seemingly unrelated events and saying they’re part of the same incident, then they may reject it. That’s one of the interesting things about AI: By looking at hundreds of different dimensions simultaneously, it might identify a relationship that humans didn’t. That may be useful knowledge, but if humans don’t understand why and how the AI did that, they won’t trust it, and they won’t use it.
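One minimal way to put a human in the loop, as a sketch: queue each AI-proposed grouping for review, record the accept/reject verdicts, and track how often operators agree. The function names and fields here are hypothetical; real feedback would also flow back into the correlation patterns.

```python
# Hypothetical human-in-the-loop feedback log for proposed alert groupings.
feedback_log = []

def review(cluster_id: str, alerts: list, accepted: bool, reviewer: str):
    """Record an operator's verdict on one AI-proposed grouping."""
    feedback_log.append({
        "cluster": cluster_id,
        "size": len(alerts),
        "accepted": accepted,
        "reviewer": reviewer,
    })

def acceptance_rate() -> float:
    """Share of proposed groupings operators agreed with."""
    if not feedback_log:
        return 0.0
    return sum(f["accepted"] for f in feedback_log) / len(feedback_log)

review("c-101", ["net-alert", "app-alert"], accepted=True, reviewer="ops-1")
review("c-102", ["disk-alert"], accepted=False, reviewer="ops-2")
print(acceptance_rate())  # 0.5
```

A falling acceptance rate is an early signal that the enrichment maps or correlation patterns no longer match how operators actually see the environment.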
AIOps can change the way a company operates its services. It has the potential to increase service availability, reduce manual toil, and improve the consistency and reliability of teams beyond human capability. But an AI solution won’t come online like a machine angel and start finding things to fix. It requires an appetite for change: a willingness to build straightforward capabilities in data normalization and to adapt existing change processes to accommodate new inputs and outputs.
That’s where IT leadership is critical: Many people see the potential in AIOps, but leaders articulate why it is needed, how the team will benefit as a result and what the desired end state looks like. That gives teams the motivation to undertake those changes.
This article originally appeared on forbes.com. To read the full article and see the images, click here.
Nastel Technologies helps companies achieve flawless delivery of digital services powered by middleware. Nastel delivers Middleware Management, Monitoring, Tracking and Analytics to detect anomalies, accelerate decisions, and enable customers to constantly innovate. To answer business-centric questions and provide actionable guidance for decision-makers, Nastel’s Navigator X fuses:
- Advanced predictive anomaly detection, Bayesian Classification and other machine learning algorithms
- Raw information handling and analytics speed
- End-to-end business transaction tracking that spans technologies, tiers, and organizations
- Intuitive, easy-to-use data visualizations and dashboards
Nastel Technologies is the global leader in Integration Infrastructure Management (i2M). It is particularly focused on IBM MQ, Apache Kafka, Solace, TIBCO EMS, and ACE/IIB, and also supports RabbitMQ, ActiveMQ, Blockchain, IoT, DataPower, MFT, and many more.
The Nastel i2M Platform provides:
- Secure self-service configuration management with auditing for governance & compliance
- Message management for Application Development, Test, & Support
- Real-time performance monitoring, alerting, and remediation
- Business transaction tracking and IT message tracing
- AIOps and APM
- Automation for CI/CD DevOps
- Analytics for root cause analysis & Management Information (MI)
- Integration with ITSM/SIEM solutions including ServiceNow, Splunk, & AppDynamics