For in-depth information on various Big Data technologies, check out my free e-book “Introduction to Big Data”.
Gartner, and now much of the industry, use the “3Vs” model for describing big data:
- high volume
- high velocity
- high variety
Do the 3Vs really capture the core characteristics of big data? I don’t think so. Let’s first look at “high volume”. Yes, we are processing data at the scale of petabytes (PB) or even exabytes (EB) today. But big is always relative, right? Although no one regards 1 TB of data as big data today, it was big and very challenging to process 20 years ago. Recall that the fastest supercomputer in 1994, the Fujitsu Numerical Wind Tunnel, had a peak speed of 170.40 GFLOPS, while a single K20X GPU in a PC delivers 1310 GFLOPS today. Besides hardware, software innovations (e.g. GFS and MapReduce) also helped a lot to process bigger and bigger data. With the advance of technology, today’s big data will quickly look small by tomorrow’s standards. The same holds for “high velocity”. So high volume and high velocity are not the core of the big data movement, even though they are the driving force of technology advancement. How about “high variety”? Many people read it as unstructured data, which cannot be handled well by an RDBMS. But unstructured data have always been there, no matter where they are stored. We do handle text, voice, images, and videos better today thanks to advances in natural language processing, information retrieval, computer vision, and pattern recognition. Yet that is still about technology advancement rather than the intrinsic value of big data.
After talking about the “big”, let’s look at the other word in “big data”. We all know that data are not oil but soil: without analysis, they are pretty much useless, yet extremely valuable knowledge and insights can be discovered from them. No matter what you call this analytic process (data science, business intelligence, machine learning, or data mining), the business goal is the same: higher competitiveness gained from the discovered knowledge and insights. But wait a second. Hasn’t the idea of data analytics existed for a long time? So what are the real differences between today’s “big data” analytics and traditional data analytics? Some people argue that big data and business intelligence use different statistical approaches (inductive vs. descriptive). That doesn’t convince me, as such mathematical differences hardly drive a big business buzz.
So what really differentiates big data analytics? For me, big data means proactively learning about and understanding our customers, their needs, behaviors, experience, and trends, in near real time and 24×7. Traditional data analytics, in contrast, is passive, treats customers as a whole or as segments rather than as individuals, and involves a significant time lag. Look at the applications of big data; a lot of them are about
- User Experience and Behavior Analysis
which you rarely find in business intelligence applications. Interestingly, people were even against the idea of personalization back in 2001 (check out the Harvard Business Review article Personalization? No Thanks.). One may argue that my definition of big data is too narrow. I agree that there are many new kinds of big data applications (e.g. the Internet of Things), but the ones above are still the majority and the driving force for corporations to embrace big data.
How did this shift happen? The data have been changing. Traditionally, our databases were just systems of record, manually populated by people. In contrast, a big part of big data is log data, generated by applications and recording every interaction between users and systems. Some people call this machine-generated data to emphasize the speed at which it is produced and its sheer size, but the truth is that these records are triggered by human actions (“event” is probably a better name for them). Analyzing these events leads to a better understanding of every single user, and thus improved user experience and bigger revenue, a lovely win-win for both customers and the business.
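To make the contrast concrete, here is a minimal sketch in Python. The field names (`customer_id`, `action`, `item`, and so on) are purely illustrative, not from any particular system: one dictionary stands in for a manually entered system-of-record row, while a small helper generates the kind of timestamped event that applications emit on every user interaction.

```python
import json
from datetime import datetime, timezone

# A traditional "system of record" row: manually entered, low volume,
# and describing the customer as a static entity.
customer_record = {
    "customer_id": 42,
    "name": "Alice",
    "segment": "premium",
}

def make_event(customer_id, action, item):
    """Build one interaction event (illustrative schema):
    generated automatically, timestamped, one per user action."""
    return {
        "customer_id": customer_id,
        "action": action,
        "item": item,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Even a short browsing session produces several events for the same
# customer, a far finer-grained behavioral signal than the single record.
events = [
    make_event(42, "view", "sku-123"),
    make_event(42, "add_to_cart", "sku-123"),
    make_event(42, "purchase", "sku-123"),
]

for e in events:
    print(json.dumps(e))
```

The point of the sketch is the asymmetry: the record changes rarely and by hand, while events accumulate continuously and automatically, which is exactly what makes near-real-time analysis of individual behavior possible.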