Big Data – What is it about “big data” that resists definition? Today we have myriad competing definitions that each attempt to circumscribe just what it is we mean when we talk about the idea of using “big data” to understand the world around us. The notion that the size, speed or modality of data warrants such a label falls apart when we recognize that every Google search involves analyzing a 100-petabyte archive using hundreds of query terms. Instead of referring to the size of our datasets, could “big data” refer to the way in which we utilize our data, regardless of its size?
The question of just what constitutes “big data” has become a perennial point of debate in the digital world. Most definitions relate to the characteristics of the data being analyzed, but such definitions become increasingly strained when we recognize that the most mundane of internet tasks, from conducting a Google search to querying Twitter, involve processing enormous volumes of rapidly growing, multimodal material.
Using the example of a Google search, it seems absurd to label every Web search a “big data analysis” merely because it examines 100 petabytes using hundreds of parameters.
Yet what differentiates a keyword Google search from an SQL query of a data warehouse, of the sort that is routinely described as precisely such a big data analysis? Does a keyword search written as an SQL query count as big data, where a keyword typed into a Web page does not? Does an SQL-computed histogram count, or does it take at least a linear regression?
Does using an SQL query to count how many records there are in a ten petabyte database count as a big data analysis? What about a summation or field extraction?
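The queries in question can be made concrete. The sketch below, using a hypothetical in-memory `events` table in SQLite as a stand-in for a multi-petabyte warehouse, shows how trivially a record count and an SQL-computed histogram are expressed; whatever separates these from “big data analysis,” it clearly is not the syntax.

```python
import sqlite3

# Hypothetical in-memory table standing in for a (much larger) data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "search"), (1, "click"), (2, "search"), (3, "search"), (3, "click")],
)

# A simple record count -- one of the queries the article asks about.
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# An SQL-computed histogram: counts per category via GROUP BY.
histogram = dict(
    conn.execute("SELECT category, COUNT(*) FROM events GROUP BY category")
)

print(total)      # 5
print(histogram)  # counts per category, e.g. {'click': 2, 'search': 3}
```

The same two statements run unchanged against a ten-row table or a ten-petabyte one; only the engine underneath differs.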
Where do we draw the line between ordinary data and “big data?” Does that boundary depend on the industry in which we work? To the Googles of the world, petabytes are passé. In the arts, humanities, and social sciences, analyses of datasets just hundreds of megabytes in size are still often described in the literature as “big data,” and in some fields those datasets genuinely are far larger than those ordinarily used.
This article originally appeared on forbes.com. To read the full article, click here.
Nastel Technologies uses machine learning to detect anomalies, behavior, and sentiment; accelerate decisions; satisfy customers; and innovate continuously. To answer business-centric questions and provide actionable guidance for decision-makers, Nastel’s AutoPilot® for Analytics fuses:
- Advanced predictive anomaly detection, Bayesian classification, and other machine learning algorithms
- Raw information handling and analytics speed
- End-to-end business transaction tracking that spans technologies, tiers, and organizations
- Intuitive, easy-to-use data visualizations and dashboards
If you would like to learn more, click here.