The Data Science, Big Data, Data Analytics, Artificial Intelligence and Machine Learning Hype

Not only in Gartner’s Hype Cycle for Emerging Technologies but in nearly every blog and newsletter, the topics Data Science, Data Analytics, Big Data, Artificial Intelligence (AI) and Advanced Machine Learning (ML) have been number one for some months. The hype about these technologies is at its peak. Smart Factory (Industry 4.0) also contributes to this, because one of its four pillars is Data Analytics and Big Data.

But how do all these relate to each other?

The basis for all the listed topics is data, which is first created and saved from various sources (sensors on machines, user behavior on websites, applications, computers and many more), then archived and finally analyzed to answer specific questions, to find patterns or to reveal notable constellations.

Data will be a company’s golden asset in the future, so it is very important to save and archive it now. It is worthless to tell everybody that we could have all the data, e.g. for transactions, customer behavior, machine processes and application logs, if we do not activate or install the necessary sensors and do not store and archive this data. Only when as much data as possible is saved from the start to the end of a process, including the data of the final result, can a person called a Data Scientist use it to try to answer questions that cannot be answered otherwise. This leads EMC to the prediction that “the amount of stored data is growing faster than ever before and experts states that by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet” [1].
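To get a feeling for that figure, the per-second rate can be converted into a daily volume per person. The following is only a minimal Python sketch of that arithmetic; the function name is made up for illustration, and the use of decimal units (1 GB = 1000 MB) is my assumption, not something stated in the cited study.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def data_per_person_per_day_gb(rate_mb_per_second: float = 1.7) -> float:
    """Convert a per-second data rate in megabytes into gigabytes per day.

    The default of 1.7 MB/s is the figure quoted in the text.
    Assumption: decimal units, i.e. 1 GB = 1000 MB.
    """
    megabytes_per_day = rate_mb_per_second * SECONDS_PER_DAY
    return megabytes_per_day / 1000


if __name__ == "__main__":
    print(f"{data_per_person_per_day_gb():.2f} GB per person per day")
```

Under these assumptions, the quoted rate adds up to roughly 147 GB of new data per person per day.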

But what is the difference between Data Science, Big Data, Data Analytics, Artificial Intelligence and Machine Learning?

The recent boom in these topics has also brought a lot of confusion about the terms. First of all: there is no clear definition. Many companies and universities define these terms differently, but most describe Data Science as the umbrella over Data Analytics, Big Data, Artificial Intelligence and Machine Learning. Most also use the terms Data Analytics and Data Analysis synonymously.

Big Data refers to large and complex data sets (volume & variety) that are much larger than traditional data sets and are processed at a higher speed (velocity). Volume, variety and velocity (called the 3Vs) are the three defining dimensions of Big Data. For more information about traditional data sets, you might also have a look at “Do we still need a Enterprise Data Warehouse?”

When we think about the traditional 3Vs, explained above and widely accepted in the industry as a definition, we recognize that enterprises have been handling exactly that for more than a decade now, without problems. So there must be another definition for Big Data.

I will stay with three ‘Vs’, but focus on the value we generate for the business out of the analysis of the data. That is the difference to simply dealing with volume, variety and velocity. So I think we are better served with ‘business value’ as the first ‘V’. Besides that, it is important to successfully combine your analytic capabilities, your source data and your business needs. Our second ‘V’ should therefore be the vision of what is required to achieve that. Finally, the complexity of every very large enterprise today calls for our new third ‘V’, virtualization, to simplify and accelerate the efforts behind the first two ‘Vs’.

I will explain the remaining three terms in separate posts, because otherwise this post would get too voluminous. So stay tuned for the next post.

Bibliography

  1. EMC: IDC Digital Universe Study: Big Data, Bigger Digital Shadows and Biggest Growth in the Far East 2011. Retrieved: 14.06.2017.