Fast Data – Organizations are overwhelmed by boundless streams of data from suppliers, products, assets, apps, IT infrastructure, employees and customers. If they could quickly make sense of it all, they could respond faster, cut costs, improve service and find new sources of revenue.
But they struggle for many reasons. Big-data analytics (on-prem or in the cloud) demands hard-to-find developer, data science and IT skill sets. Centralized “store then analyze” architectures are poorly suited to use cases that seek to predict and respond quickly to rare events, like the failure of a turbine, or that demand insights that are always current. A city, for example, generates terabytes of traffic data per day but must always deliver current, accurate predictions (to within tens of milliseconds) to vehicle routing applications. My advice for any developer tasked with analyzing boundless data streams, therefore, is to avoid big data: it’s expensive and slow. Instead, they should adopt an architecture that continuously derives insights from the data stream, at the edge, without depending on a database on the “hot path” of the app. Depending on the application, it may even be unnecessary to store the raw data; in many cases it can simply be analyzed and discarded.
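To make the "analyze and discard" idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method of any particular product: the names `RunningStats` and `detect_anomalies`, the z-score rule, and the `threshold` and `warmup` parameters are all assumptions chosen for the example. It keeps a small running summary (Welford's online mean and variance) in memory, flags readings that deviate sharply from it, and never stores the raw readings.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean/variance (Welford's algorithm); holds no raw data."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; 0.0 until two readings have arrived.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))


def detect_anomalies(readings, threshold=3.0, warmup=5):
    """Yield readings more than `threshold` standard deviations from the
    running mean. `threshold` and `warmup` are illustrative assumptions."""
    stats = RunningStats()
    for x in readings:
        sd = stats.stddev
        if stats.count >= warmup and sd > 0 and abs(x - stats.mean) > threshold * sd:
            yield x
        # Fold the reading into the summary, then let the raw value go.
        stats.update(x)


if __name__ == "__main__":
    stream = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 50.0, 10.1]
    print(list(detect_anomalies(stream)))
```

The state is three numbers per stream regardless of how much data flows through, which is what keeps a database off the hot path in a design like this. A real deployment would likely want a windowed or decaying summary so that old behavior does not dominate indefinitely.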
Streaming data analysis itself is not new. Major open source data science tools such as Spark, Beam and Flink offer developer abstractions for stream processing, but they are complex, the application architecture is still centralized, and developers with the skills to integrate complex stacks are hard to find. Organizations also need to find data scientists to make sense of the data and IT pros who can build and manage a complex dataflow pipeline. Time is often a headache: adapting real-world events to mini-batch frames for analysis is hard, and some data sources are noisy while others contribute only occasionally. Of those listed, only Flink is stateful, so with the others interim results have to be stored in a database again. Finally, applications are necessarily single tenant. What if data is controlled by different organizations that can’t share? Overall, there is a dearth of reusable experience, and solutions can be very temperamental.
An alternative architecture for streaming data analysis is inspired by the principles that have made “reactive” architectures and languages so popular in recent years: specifically, it adopts stateful processing at the edge. The approach helps to solve a huge problem in many analytics and stream learning applications, namely the complexity of building a model of the real-world system that generates data.
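One way to picture stateful processing at the edge is a small in-memory object per real-world entity, each updating its own state as events arrive. The sketch below is a hedged illustration in Python, not the architecture of any specific framework: `IntersectionAgent`, `EdgeModel`, the wait-time metric, and the smoothing factor `alpha` are hypothetical names and choices made for the example, loosely following the city traffic scenario above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IntersectionAgent:
    """Hypothetical stateful agent for one intersection.

    Keeps an exponentially weighted moving average (EWMA) of wait time,
    so the current estimate is always available from memory."""
    alpha: float = 0.5  # smoothing factor, an illustrative assumption
    avg_wait_s: Optional[float] = None
    events_seen: int = 0

    def on_event(self, wait_s: float) -> None:
        self.events_seen += 1
        if self.avg_wait_s is None:
            self.avg_wait_s = wait_s
        else:
            self.avg_wait_s = self.alpha * wait_s + (1 - self.alpha) * self.avg_wait_s


@dataclass
class EdgeModel:
    """In-memory model of the system: one agent per entity seen so far."""
    agents: Dict[str, IntersectionAgent] = field(default_factory=dict)

    def ingest(self, entity_id: str, wait_s: float) -> None:
        # Agents are created on first sight of an entity, so the model
        # grows from the data rather than from an up-front schema.
        agent = self.agents.setdefault(entity_id, IntersectionAgent())
        agent.on_event(wait_s)

    def current(self, entity_id: str) -> Optional[float]:
        agent = self.agents.get(entity_id)
        return agent.avg_wait_s if agent else None


if __name__ == "__main__":
    model = EdgeModel()
    for entity_id, wait_s in [("A", 10.0), ("A", 20.0), ("B", 4.0)]:
        model.ingest(entity_id, wait_s)
    print(model.current("A"), model.current("B"), model.current("C"))
```

Because agents appear as entities show up in the stream, the model of the real-world system is assembled from the data itself, and queries are answered from each agent's state without a database round trip.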
This article originally appeared on Forbes.com.
Nastel Technologies uses machine learning to detect anomalies, behavior and sentiment, accelerate decisions, satisfy customers and innovate continuously. To answer business-centric questions and provide actionable guidance for decision-makers, Nastel’s AutoPilot® for Analytics fuses:
- Advanced predictive anomaly detection, Bayesian Classification and other machine learning algorithms
- Raw information handling and analytics speed
- End-to-end business transaction tracking that spans technologies, tiers, and organizations
- Intuitive, easy-to-use data visualizations and dashboards