Why building your own AIOps platform is a bad idea
In the wake of the economic downturn brought on by the Covid-19 pandemic, investments in digital business transformation have accelerated. The applications that drive that transformation are not only highly distributed, they also operate at a scale that no IT team can manage using legacy approaches. With those approaches, it can take weeks to discover the root cause of an issue.
Enter AIOps. Machine learning algorithms not only reduce the time it takes to resolve an issue, they also enable IT teams to continuously optimise IT environments at any scale. Many aspects of AIOps, however, are still largely unexplored. Rather than opting for proven platforms, some IT teams are building custom solutions in-house.
I have seen firsthand several large enterprises embark on this risky journey, including one Fortune 500 company that asked a partner to help develop a solution and ultimately deployed a commercial product instead. Enterprise IT teams typically go down this road to solve a specific tactical problem, such as alert noise reduction. In many cases, all the time and effort put into a do-it-yourself (DIY) project simply winds up being wasted.
Why DIY AIOps usually fails
Given the prevalence of open-source AI tools and frameworks such as TensorFlow, Theano or the Microsoft Cognitive Toolkit (CNTK), it can be tempting to build your own custom AIOps platform. It takes considerable expertise, however, to not only build an AIOps platform but also integrate it into an enterprise and maintain it. Here are the leading reasons why in-house developed AI projects are risky:
- You’ll need a properly constructed data lake. AIOps platforms require real-time access to data residing in multiple technology silos. IT teams that build their own AIOps platforms need to make sure they are gathering all the right log data, metrics and traces alongside data collected from IT service and incident management platforms. These comprehensive data sets are needed to train models on whatever machine learning framework is in place, and that framework is often selected at random. Invariably, that means building or buying a costly big data platform to create a data lake to store all that data. A poorly constructed AIOps platform will be worse than the proverbial disease it is meant to cure because its insights won’t accurately reflect what’s actually occurring in the IT environment. Do you have the funds for this, and experienced data scientists on board to get it right?
- Designing AI-enhanced workflows is unlike designing other workflows. Getting the data is just the beginning. Determining how the system behaves and affects existing workflows is the next step. IT teams must decide to what degree they want the AIOps platform merely to surface recommendations based on what it observes versus automatically resolving issues based on defined parameters.
- Deployment is complex. After developing a few AIOps algorithms that produce meaningful results, the next step is to determine how to deploy them in a resilient and performant architecture. What other systems do they need to integrate with, and how will results be monitored and viewed?
- Monitoring user impact is critical. How will end users interact with the algorithm, and what is the ideal UI/UX and workflow? How will end users provide feedback to drive improvement and adoption?
- AIOps support and maintenance is not a project, but a team. Ultimately, an internal IT team would need to build the equivalent of a product, which needs ongoing maintenance and support. The total cost of the custom platform starts to rise as the bulk of the IT team could wind up spending most of their time managing the AIOps platform instead of making continual improvements. Even if the IT team has the expertise required to build an AIOps platform, there’s no guarantee those individuals will always be available to maintain and update it. Very few IT professionals spend their entire career at one organisation.
- Keeping pace with marketplace innovation is hard. Finally, AIOps as a field is still relatively nascent, and the startup community has hundreds of millions of dollars in VC backing to support R&D. Advances are being made at a rate most internal IT teams can’t keep up with, let alone evaluate and vet on their own.
How to safely encourage AIOps exploration
There’s no substitute for knowing where an organisation needs to go and how to get there. A commercial AIOps platform incorporates all the best practices that have been defined by legions of IT experts, along with these benefits:
- Faster time to value: You can embark on the AIOps journey much sooner. A commercial AIOps platform will begin surfacing insights in a matter of weeks. It will take an internal IT team months to build an equivalent platform, with no guarantee of success. Time is better spent on user adoption and on adding and refining use cases for business benefit.
- Seasoned experts: A commercial platform provides immediate access not only to a proven framework but also to AIOps experts who can troubleshoot issues quickly and optimise the platform. There’s almost no AIOps challenge they haven’t seen before.
How to measure ROI from AIOps
Savvy organisations that invest in AIOps are primarily betting on a better way to manage IT that will enable them to accomplish more as a business. The real value proposition of any AIOps platform is that it enables an existing IT team to do more, not just by eliminating rote tasks but also by making it possible to deploy more applications reliably without adding IT staff. It’s worth remembering that the cost of labour continues to be the single biggest IT expense.
The return on investment from an AIOps platform can be easily calculated by measuring:
- The number of incidents resolved in a given period of time
- The size of the IT operations/incident management staff before and after an AIOps platform is deployed
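As a rough illustration, the two measurements above can be combined into a per-person productivity figure and a simple ROI ratio. The sketch below is one possible way to do the arithmetic, not a prescribed method: the names `OpsSnapshot`, `productivity_gain` and `simple_roi`, the cost-per-person input, the platform cost and all the numbers are hypothetical, and it assumes labour is the only cost being offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpsSnapshot:
    """Measurements for one period (for example, a quarter). Field names are illustrative."""
    incidents_resolved: int
    staff_count: int

    def incidents_per_person(self) -> float:
        if self.staff_count <= 0:
            raise ValueError("staff_count must be positive")
        return self.incidents_resolved / self.staff_count


def productivity_gain(before: OpsSnapshot, after: OpsSnapshot) -> float:
    """Relative change in incidents resolved per staff member (0.5 means +50%)."""
    baseline = before.incidents_per_person()
    if baseline == 0:
        raise ValueError("baseline period resolved no incidents; gain is undefined")
    return after.incidents_per_person() / baseline - 1.0


def simple_roi(before: OpsSnapshot, after: OpsSnapshot,
               cost_per_person: float, platform_cost: float) -> float:
    """(labour cost avoided - platform cost) / platform cost.

    Assumption: labour cost avoided is the staff that would have been needed
    to handle the new workload at the old per-person rate, minus actual staff.
    """
    if platform_cost <= 0:
        raise ValueError("platform_cost must be positive")
    baseline = before.incidents_per_person()
    if baseline == 0:
        raise ValueError("baseline period resolved no incidents; ROI is undefined")
    staff_needed_at_old_rate = after.incidents_resolved / baseline
    labour_avoided = (staff_needed_at_old_rate - after.staff_count) * cost_per_person
    return (labour_avoided - platform_cost) / platform_cost


# Hypothetical figures, for illustration only.
before = OpsSnapshot(incidents_resolved=1200, staff_count=20)
after = OpsSnapshot(incidents_resolved=1800, staff_count=20)

gain = productivity_gain(before, after)
roi = simple_roi(before, after, cost_per_person=100_000, platform_cost=400_000)
print(f"Productivity gain: {gain:.0%}, ROI: {roi:.2f}")
```

Comparing like-for-like periods matters here: a change in incident volume caused by something other than the platform (a major release, a seasonal peak) would distort both figures.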
This article originally appeared on forbes.com.