Big Data Definitions

gabonesedestruction · Software development

17 Feb. 2014

Big data
refers to data sets larger than about 30 terabytes
(1 terabyte is roughly a trillion bytes, or a thousand gigabytes) that are
collected from traditional and digital sources both
inside and outside a company

Gartner’s Definitions

The 3 V’s


Variety: the kinds of data coming in

Structured: identifiable data in a traditional database, usually with columns and rows,
that can be easily read by a computer or by a human

Unstructured: has no identifiable structure, like text documents, email, pictures,
video, audio, tweets, stock ticker data and financial transactions

Semi-structured: a mixture of both


Volume: how much data is coming in

Examples include transaction-based data stored through the years or unstructured
data streaming in from social media.


Velocity: how fast the data is coming in

Examples include RFID tags, sensors and smart metering, all of which produce huge amounts of data
in real time. Reacting quickly enough to deal with data velocity is a challenge for most
organizations.

Not part of Gartner’s definitions, but also widely accepted


Variability: data flows can be highly inconsistent, with periodic peaks

Examples include something trending on social media, or daily,
seasonal or event-triggered peaks


Complexity: data comes from multiple sources, and linking, matching, cleansing and
transforming data across systems is difficult. It is necessary to connect and correlate
relationships, hierarchies and multiple data linkages so the data doesn’t get out of control

Companies involved

Data storage, networking and hardware:

ARM, Brocade, Cisco, Dell, EMC, HP, Intel, Lenovo,
NetApp, Seagate

Enterprise software companies:

Adobe, Citrix Systems, IBM, Fujitsu, Informatica,
Oracle, Red Hat, SAP

Every day, 2.5 quintillion (2.5 billion billion) bytes of data are created

More data was produced from 2010 to 2012 than in all of history before that


295 exabytes (1 exabyte = 10^18 bytes), or 295,000,000,000,000,000,000 bytes, of digital data
exist (as of Dec 2012)

Every hour, enough information is consumed by internet traffic to fill 7 million DVDs

Every 18 months the sum of digital human knowledge doubles

As of 2013, 90% of the world’s data was created within the past two years

Only 20% is structured, meaning that it can be readily analyzed via the same tools
that have been used for over four decades

The remaining 80% of this newly created data is “unstructured” content stemming
from sources such as photos, YouTube videos and social media posts


How much data

Trends in Big Data

According to one research collective, in 2013 big data
was an $18 billion market on its way to $50 billion in 2018

Mobiles will provide a lot of the future’s data, including
information from apps, GPS location, and other services
running in the background

Price discrimination,

accused of charging more to
Mac users, Netflix ran an experiment using big data on the
users of
, Wikipedia and to
see what price the market would bear

On-the-fly and continuous champion/challenger testing of
offers and content on websites
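
Champion/challenger testing compares the current offer (the champion) with a new one (the challenger) on live traffic. One common way to judge the result, shown here only as a minimal sketch, is a two-proportion z-test on conversion rates. The function name and all visitor counts below are hypothetical and are not taken from the slides.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and visitors for the champion
    conv_b / n_b: conversions and visitors for the challenger
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both offers convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 10,000 visitors saw each version of the offer
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the challenger's conversion rate really differs from the champion's. The sketch assumes at least one conversion and one non-conversion overall, otherwise the standard error is zero.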


and Mayer

wrote a best-selling book

Trends in Big Data

Social network analysis (SNA)

The mapping and measurement of relationships and flows between people, groups,
organizations or other actors

Made up of nodes (points or hubs) and ties (lines connecting the points), to analyze data

An example of SNA is Stanley Milgram’s six degrees of separation in the 1960s

Example: uncovering relationships between entities or customers in a large network with the
goal of identifying influencing nodes of customers
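
As a rough illustration of nodes, ties and "influencing nodes", the sketch below builds a tiny network and ranks customers by degree centrality (the share of other nodes each one is tied to). Degree centrality is only one of several possible influence measures, and the customer names and ties are invented for the example.

```python
from collections import defaultdict

# Hypothetical ties (undirected) between customers in a small network
ties = [
    ("ana", "ben"),
    ("ana", "cy"),
    ("ana", "dee"),
    ("ben", "cy"),
    ("dee", "eli"),
]

def degree_centrality(ties):
    """Map each node to the fraction of other nodes it is directly tied to."""
    neighbours = defaultdict(set)
    for a, b in ties:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

scores = degree_centrality(ties)
most_connected = max(scores, key=scores.get)
print(most_connected, scores[most_connected])
```

In a real customer network the ties would come from transaction, call or social data, and a dedicated graph library would usually replace this hand-rolled version.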

Next best offer (NBO)

Customer-centric marketing paradigm that considers the different actions that can be taken
for a specific customer and decides on the ‘best’ one

This is an offer, proposition, service, etc. that is determined by the customer’s interests and
needs on the one hand, and the marketing organization’s business objectives on the other

Analytics estimates the probability that customers will be interested in a targeted offer
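
One simple way to combine the customer's likely interest with the business objective is to score each offer by its expected value: the estimated acceptance probability times the value of the offer to the business. This is a sketch under that assumption, not the method of any particular NBO product. The offer names, probabilities and values are hypothetical.

```python
def next_best_offer(acceptance_prob, offer_value):
    """Pick the offer with the highest expected value for one customer.

    acceptance_prob: offer -> estimated probability the customer accepts
    offer_value:     offer -> value to the business if accepted
    """
    expected = {
        offer: acceptance_prob[offer] * offer_value[offer]
        for offer in acceptance_prob
    }
    best = max(expected, key=expected.get)
    return best, expected

# Hypothetical model outputs for a single customer
probs = {"premium_upgrade": 0.05, "accessory_bundle": 0.30, "loyalty_discount": 0.60}
values = {"premium_upgrade": 120.0, "accessory_bundle": 25.0, "loyalty_discount": 8.0}

best, expected = next_best_offer(probs, values)
print(best, expected)
```

Note that the most likely offer to be accepted (the loyalty discount) is not the one chosen here, because its value to the business is low.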

Uplift modeling

Modeling to predict the influence on a customer's buying behavior that results from
marketing contact

If you are launching a marketing campaign, there's no sense in sending an offer to prospective
customers who would have bought anyway, to people who will react negatively when
contacted, or to those who are "lost causes." The key is to focus in on only those people who
are "persuadable."

A predictive modelling technique that directly models the incremental impact of a treatment
(such as a direct marketing action) on an individual's behaviour

Can be used for up-sell, cross-sell and retention modelling
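
The incremental impact described above can be illustrated with the simplest possible estimate: the purchase rate of contacted (treated) customers minus the purchase rate of a non-contacted control group, computed per segment. Real uplift models predict this difference per individual. The segment labels and counts below are hypothetical and chosen only to mirror the "persuadable" idea.

```python
def uplift_by_segment(records):
    """Estimate uplift per segment from (segment, treated, bought) records.

    Uplift = purchase rate when contacted - purchase rate when not contacted.
    Assumes every segment has at least one treated and one control record.
    """
    counts = {}
    for segment, treated, bought in records:
        groups = counts.setdefault(segment, {True: [0, 0], False: [0, 0]})
        groups[treated][0] += int(bought)  # buyers
        groups[treated][1] += 1            # customers
    uplift = {}
    for segment, groups in counts.items():
        treated_rate = groups[True][0] / groups[True][1]
        control_rate = groups[False][0] / groups[False][1]
        uplift[segment] = treated_rate - control_rate
    return uplift

def make(segment, treated, buyers, total):
    """Build hypothetical records: the first `buyers` of `total` customers bought."""
    return [(segment, treated, i < buyers) for i in range(total)]

records = (
    make("persuadable", True, 30, 100) + make("persuadable", False, 10, 100)
    + make("sure_thing", True, 80, 100) + make("sure_thing", False, 80, 100)
    + make("do_not_disturb", True, 5, 100) + make("do_not_disturb", False, 15, 100)
)

uplift = uplift_by_segment(records)
print(uplift)
```

Only the segment with clearly positive uplift is worth contacting. Zero uplift means the contact changed nothing, and negative uplift means it put customers off.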

Derived revenue from big data

Amazon, Microsoft, Deloitte and Google are
deriving 1% of their revenue from big data (2013)


McKinsey calls it "the biggest game-changing opportunity for marketing and sales
since the Internet went mainstream 20 years ago."

By 2020, one-third of data will be stored or will have passed through the cloud

By 2020 IT departments will look after 10x more servers, 50x more data and 75x
more files

Cognitive computing (what Google is doing) is next for Big Data, being able to
analyze data in the context of other consumer behaviours

Mobile, cloud, social and big data to drive 90% of all growth in the IT market from
2020 (Chartered Institute for IT)

The world’s digital information is expected to grow by 57%. Within that, internet
traffic is growing by 35%, and mobile data traffic at 110% (Cisco, 2013)


Bandwidth issues





Future of Big Data

Intelligent personalization

Uses all the data that a marketer has at their disposal to optimize the content and
the experience, including things like mobile device and regional optimization,
social signals, transactional data coming from an e-commerce system
and much more



The topic of “predictive analytics” is very hot today. But as any marketer knows, it’s nearly impossible to
actually predict how any customer interaction is going to go. Instead, the smart integration of data is
going to result in “situational analytics.” This means being able to look at data, and plug in different,
hypothetical situations and see which one has the better chance of actually succeeding.

Level playing field

One of the biggest evolutions of integrating smarter data into content experiences is that it levels the
playing field with larger competitors who may have more resources to burn on advertising media


Future of Big Data 2

Techniques for Analyzing

A/B testing

Association rule learning:
Discovering relationships, or “association rules,” such as which
goods a consumer might buy together in one basket

Cluster analysis:
Splitting large groups into smaller groups

Crowdsourcing: Collecting data submitted by
a large group of people or a community
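
To make association rule learning more concrete, the sketch below counts how often pairs of goods appear in the same basket and derives two standard measures: support (the share of baskets containing both items) and confidence (the share of baskets with the first item that also contain the second). It handles only item pairs and is not a full algorithm such as Apriori. The baskets and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets, min_support=0.4):
    """Return rules as (antecedent, consequent, support, confidence) tuples."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        rules.append((a, b, support, count / item_counts[a]))  # a -> b
        rules.append((b, a, support, count / item_counts[b]))  # b -> a
    return rules

for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "milk -> butter" with high confidence says that baskets containing milk usually also contain butter, which is the kind of relationship a retailer might use for product placement or cross-selling.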


Nearly a quarter billion health and fitness apps are predicted to be
downloaded by 2017, up from 156 million today

Sales of sports and fitness monitors, like heart-rate monitors
and pedometers, will reach 56.2 million units in 2017. Many
of these will be on mobile phones, and they will increasingly
connect to the Internet


and New York’s Memorial Sloan-Kettering Cancer Center
are creating Watson (IBM) apps that will answer
cancer diagnosis and treatment questions from doctors,
researchers, and insurance companies


Big Data and Healthcare