Questioning The Long-Term Importance Of Big Data In AI
No asset is more prized in today’s digital economy than data. It has become commonplace to the point of cliche to refer to data as “the new oil.” As one recent Economist headline put it, data is “the world’s most valuable resource.”
Data is so highly valued today because of the essential role it plays in powering machine learning and artificial intelligence solutions. Training an AI system to function effectively—from Netflix’s recommendation engine to Google’s self-driving cars—requires massive troves of data.
The result has been an obsession with bigger and bigger data. He with the most data can build the best AI, according to the prevailing wisdom. Incumbents from IBM to General Electric are racing to re-brand themselves as “data companies.” SoftBank’s Vision Fund—the largest and most influential technology investor in the world—makes no secret of the fact that its focus when looking for startups to back is data assets. “Those who rule data will rule the world,” in the words of SoftBank leader Masayoshi Son.
As the business and technology worlds increasingly orient themselves around data as the ultimate kingmaker, too little attention is being paid to an important reality: the future of AI is likely to be far less data-intensive.
At the frontiers of artificial intelligence, various efforts are underway to develop improved forms of AI that do not require massive labeled datasets. These technologies will reshape our understanding of AI and disrupt the business landscape in profound ways. Industry leaders would do well to pay attention.
Today, in order to train deep learning models, practitioners must collect thousands, millions or even billions of data points. They must then attach labels to each data point, an expensive and generally manual process. What if researchers didn’t need to laboriously collect and label data from the real world, but instead could create the exact dataset they needed from scratch?
Leading technology companies—from established competitors like Nvidia to startups like Applied Intuition—are developing methods to fabricate high-fidelity data, completely digitally, at next to no cost. These artificially created datasets can be tailored to researchers’ precise needs and can include billions of alternative scenarios.
“It’s very expensive to go out and vary the lighting in the real world, and you can’t vary the lighting in an outdoor scene,” said Mike Skolones, director of simulation technology at Nvidia. But you can with synthetic data.
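As a rough illustration of the idea, the sketch below generates a tiny labeled dataset entirely in software. It is a toy, not how Nvidia or Applied Intuition build their simulators: the "scene" is an 8x8 grayscale grid with one square object, and the `brightness` factor is a hypothetical stand-in for lighting. The function names `render_scene` and `make_dataset` are invented for this example. The point it shows is that the label (the object's position) is known by construction, so no manual labeling is needed, and lighting can be varied freely.

```python
import random


def render_scene(size, box, brightness):
    """Render a size x size grayscale image containing one square object.

    box is (row, col, side). brightness scales every pixel and stands in
    for a lighting change. Pixel values are clamped to at most 255.
    """
    row0, col0, side = box
    image = []
    for r in range(size):
        line = []
        for c in range(size):
            inside = row0 <= r < row0 + side and col0 <= c < col0 + side
            base = 200 if inside else 40  # object vs. background intensity
            line.append(min(255, int(base * brightness)))
        image.append(line)
    return image


def make_dataset(n, size=8, seed=0):
    """Generate n synthetic samples with random object placement and lighting."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    samples = []
    for _ in range(n):
        side = rng.randint(2, 4)
        row0 = rng.randint(0, size - side)
        col0 = rng.randint(0, size - side)
        brightness = rng.uniform(0.3, 1.2)  # dim to slightly overexposed
        box = (row0, col0, side)
        samples.append({
            "image": render_scene(size, box, brightness),
            "label": box,  # ground truth comes for free from the generator
            "brightness": brightness,
        })
    return samples


if __name__ == "__main__":
    for sample in make_dataset(3):
        print(sample["label"], round(sample["brightness"], 2))
```

Asking for a million samples instead of three changes only the argument to `make_dataset`, which is the economic point the article makes about synthetic data.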
As synthetic data approaches real-world data in accuracy, it will democratize AI, undercutting the competitive advantage of proprietary data assets. If a company can quickly generate billions of miles of realistic driving data via simulation, how valuable are the few million miles of real-world driving data that Waymo has invested a decade to collect? In a world in which data can be inexpensively generated on demand, the competitive dynamics across industries will be upended.
As AI gets smarter in the years to come, it is likely to require less data, not more.
Unlike today’s AI, humans do not need to see thousands of examples in order to learn a new concept. As an influential Google research paper put it, “A child can generalize the concept of ‘giraffe’ from a single picture in a book, yet our best deep learning systems need hundreds or thousands of examples.”
In order for machine intelligence to truly approach human intelligence in its capabilities, it should be able to learn and reason from a handful of examples the way that humans do. This is the goal of an important field within AI known as “few-shot learning.”
Exciting recent progress has been made on few-shot learning, particularly in the field of computer vision. (The technique is called one-shot learning or zero-shot learning, respectively, when only one or zero data points are used.) Researchers have developed AI models that, under the right circumstances, can achieve state-of-the-art performance on tasks like facial recognition based on one or a few data points.
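One simple way to picture few-shot classification is a nearest-prototype classifier: average the handful of labeled examples for each class into a "prototype," then assign a new example to the closest prototype. The sketch below is a minimal version of that idea and is not necessarily the method used in the research described above. It assumes each example has already been turned into a feature vector by some pretrained model. The two-dimensional vectors and the class names here are made up for illustration, and `build_prototypes` and `classify` are hypothetical helper names.

```python
import math


def mean_vector(vectors):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def build_prototypes(support_set):
    """Map each label to the mean of its few labeled example vectors."""
    return {label: mean_vector(examples) for label, examples in support_set.items()}


def classify(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


if __name__ == "__main__":
    # Made-up feature vectors: two examples of one class, one of the other.
    support_set = {
        "giraffe": [[0.9, 0.1], [0.8, 0.2]],
        "zebra": [[0.1, 0.9]],
    }
    prototypes = build_prototypes(support_set)
    print(classify([0.7, 0.3], prototypes))  # prints "giraffe"
```

With a single example per class, the prototype is just that example, which corresponds to the one-shot case. The heavy lifting in practice is done by the model that produces the feature vectors, which this sketch takes as given.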
This article originally appeared on forbes.com.