Big Data Definitions

gabonesedestructionSoftware and s/w Development

Feb 17, 2014




Big data refers to data sets of more than 30 terabytes
(a terabyte is roughly a trillion bytes, or a thousand gigabytes),
collected from traditional and digital sources both
inside and outside a company
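
To make the size threshold above concrete, here is a minimal sketch of the unit arithmetic. It assumes decimal (SI) units, where a terabyte is 10^12 bytes; the function names are illustrative and not part of the original deck.

```python
# Decimal (SI) storage units: an assumption, binary units would differ slightly.
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12


def terabytes_to_bytes(tb: int) -> int:
    """Convert terabytes to bytes."""
    return tb * BYTES_PER_TB


def terabytes_to_gigabytes(tb: int) -> int:
    """Convert terabytes to gigabytes."""
    return tb * (BYTES_PER_TB // BYTES_PER_GB)


threshold_tb = 30  # the threshold quoted in the definition
print(terabytes_to_bytes(threshold_tb))      # 30000000000000
print(terabytes_to_gigabytes(threshold_tb))  # 30000
```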

Gartner’s Definitions


The 3 V’s


Variety


Structured: identifiable data in a traditional database, usually with columns and rows,
that can be easily read by a computer or by a human


Unstructured: has no identifiable structure, like text documents, email, pictures,
video, audio, tweets, stock ticker data and financial transactions


Multi-structured: a mixture of both



Volume


How much data is coming in


Examples include transaction-based data stored through the years or unstructured
data streaming in from social media.



Velocity


How fast the data is coming in


RFID tags, sensors and smart metering, for example, all produce huge amounts of data
in real time. Reacting quickly enough to deal with data velocity is a challenge for most
organizations.

Not part of Gartner’s definitions but also widely accepted


Variability


Data flows can be highly inconsistent, with periodic peaks


Examples are when something is trending on social media or when there are daily,
seasonal or event-triggered peaks



Complexity


Data comes from multiple sources, and linking, matching, cleansing and
transforming it across systems is complex. It is necessary to connect and correlate
relationships, hierarchies and multiple data linkages so the data doesn’t get out of
control.

Companies involved


Data storage, networking and hardware companies:


ARM, Brocade, Cisco, Dell, EMC, HP, Intel, Lenovo,
NetApp, Seagate



Enterprise software companies:


Adobe, Citrix Systems, IBM, Fujitsu, Informatica,
Oracle, Red Hat, SAP, Salesforce.com




Every day 2.5 quintillion (billion billion) bytes of data are created


More data was produced from 2010-2012 than in all of history


295 exabytes (295 x 10^18 = 295,000,000,000,000,000,000 bytes)
of digital data existed as of Dec 2012


Every hour enough information is consumed by internet traffic to fill 7
million DVDs


Every 18 months the sum of digital human knowledge doubles


As of 2013, 90% of the world’s data was created within the past two years


Only 20% is structured, meaning that it can be readily analyzed with the same tools
that have been used for over four decades


The remaining 80% of this newly created data is “unstructured” content stemming
from sources such as Instagram photos, YouTube videos and social media posts
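
As a rough illustration of the "doubles every 18 months" figure quoted above, the sketch below converts a doubling period into a growth factor over any time span. It assumes smooth exponential growth, which the deck does not state; the function name is illustrative.

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Factor by which a quantity grows over `months`,
    assuming it doubles every `doubling_months` (smooth exponential growth)."""
    return 2.0 ** (months / doubling_months)


annual = growth_factor(12)
print(f"Implied growth per year: {(annual - 1) * 100:.1f}%")
print(f"Growth over three years: {growth_factor(36):.1f}x")
```

Under that assumption, an 18-month doubling period corresponds to a little under 60% growth per year.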


How much data

Trends in Big Data


According to research collective Wikibon, big data was an $18 billion
market in 2013, on its way to $50 billion in 2018


Mobiles will provide a lot of the future’s data, including
information from apps, GPS location, and other services
running in the background


Price discrimination: Orbitz was accused of charging more to
Mac users, and Netflix ran an experiment using big data on the
users of Rottentomatoes, Wikipedia and Blockbuster.com to
see what price the market would bear


On-the-fly and continuous champion/challenger testing of
offers and content on websites


Cukier and Mayer-Schoenberger wrote a best-selling book on the subject,
"Big Data: A Revolution That Will Transform How We Live, Work, and Think" (2013)
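
The Wikibon market figures on this slide ($18 billion in 2013, $50 billion in 2018) imply a compound annual growth rate. The sketch below works it out; the function name is illustrative and the calculation assumes steady year-on-year growth, which the source does not claim.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


rate = cagr(18.0, 50.0, 2018 - 2013)
print(f"Implied annual growth: {rate * 100:.1f}%")
```

That works out to a little under 23% per year over the five-year span.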





Trends in Big Data


Social network analysis (SNA)


The mapping and measurement of relationships and flows between people, groups,
organizations or other actors


The network being analyzed is made up of nodes (points or hubs) and ties (lines connecting the points)


An example of SNA is Stanley Milgram’s six degrees of separation experiment in the 1960s


Example: uncovering relationships between entities or customers in a large network with the
goal of identifying influencing nodes of customers
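
One simple way to look for influencing nodes in a network of nodes and ties is degree centrality: the share of other nodes each node is directly tied to. The sketch below is one possible measure under that assumption, not the method any particular vendor uses; the customer names and ties are made up for illustration.

```python
from collections import defaultdict


def degree_centrality(ties):
    """Return {node: fraction of the other nodes it is directly tied to}."""
    neighbours = defaultdict(set)
    for a, b in ties:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        # A single node has nobody else to connect to.
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical customer network: each pair is one tie.
ties = [
    ("Ann", "Bob"),
    ("Ann", "Cara"),
    ("Ann", "Dev"),
    ("Bob", "Cara"),
    ("Dev", "Eli"),
]
scores = degree_centrality(ties)
most_influential = max(scores, key=scores.get)
print(most_influential, scores[most_influential])
```

Here Ann is tied to three of the four other customers, so she scores highest.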


Next-best offer (NBO)


Customer-centric marketing paradigm that considers the different actions that can be taken
for a specific customer and decides on the ‘best’ one


This is an offer, proposition, service, etc. that is determined by the customer’s interests and
needs on the one hand, and the marketing organization’s business objectives on the other


Analytics estimates the probability that customers will be interested in a targeted offer
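
A minimal sketch of how a next-best offer could be chosen: weigh the estimated probability that the customer accepts (the customer side) against the value of the offer to the business (the marketing side), and pick the highest expected value. Scoring by probability times value is an assumption made here for illustration, and the offers and numbers are hypothetical.

```python
def next_best_offer(offers):
    """Pick the offer with the highest expected value.

    `offers` is a list of (name, acceptance_probability, value_to_business).
    """
    return max(offers, key=lambda offer: offer[1] * offer[2])


# Hypothetical offers for one customer.
offers = [
    ("premium upgrade", 0.05, 200.0),   # expected value 10
    ("accessory bundle", 0.20, 40.0),   # expected value 8
    ("loyalty discount", 0.40, 15.0),   # expected value 6
]
best = next_best_offer(offers)
print(best[0])
```

Note that the most likely offer to be accepted is not necessarily the one chosen, because business value is weighed in as well.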


True-lift modeling or uplift modeling


Modeling to predict the influence on a customer's buying behavior that results from
marketing contact


If you are launching a marketing campaign, there's no sense in sending an offer to prospective
customers who would have bought anyway, to people who will react negatively when
contacted, or to those who are "lost causes." The key is to focus in on only those people who
are "persuadable."


A predictive modelling technique that directly models the incremental impact of a treatment
(such as a direct marketing action) on an individual's behaviour


Can be used for up-sell, cross-sell and retention modelling
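
The idea of incremental impact can be sketched with a simple two-group comparison: for each customer segment, subtract the response rate of a control group (not contacted) from that of a treated group (contacted). This is a simplified, segment-level stand-in for a full uplift model, which would predict the effect per individual. The segment names other than "persuadable" and "lost cause", and all counts, are hypothetical.

```python
def uplift(treated_responses, treated_total, control_responses, control_total):
    """Difference in response rate between contacted and non-contacted customers."""
    return treated_responses / treated_total - control_responses / control_total


# Hypothetical results: (treated responses, treated total, control responses, control total)
segments = {
    "persuadable": (300, 1000, 100, 1000),     # contact raises purchases
    "sure thing": (800, 1000, 800, 1000),      # would have bought anyway
    "lost cause": (20, 1000, 20, 1000),        # contact makes no difference
    "do not disturb": (100, 1000, 250, 1000),  # contact backfires
}
lifts = {name: uplift(*counts) for name, counts in segments.items()}

# Only segments with positive uplift are worth contacting.
targets = [name for name, lift in lifts.items() if lift > 0]
print(targets)
```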





Derived revenue from big data


Amazon, Microsoft, Deloitte and Google are
deriving 1% of their revenue from Big Data (2013)



McKinsey calls it "the biggest game-changing opportunity for marketing and sales
since the Internet went mainstream 20 years ago."


By 2020 one-third of data will be stored or will have passed through the cloud


By 2020 IT departments will look after 10x more servers, 50x more data and 75x
more files


Cognitive computing (what Google is doing) is next for Big Data, being able to
analyze data in the context of other consumer behaviours


Mobile, cloud, social and big data to drive 90% of all growth in the IT market from
2013-2020 (Chartered Institute for IT)


The world’s digital information is expected to grow by 57%. Within that, internet
traffic is growing by 35%, and mobile data traffic by 110% (Cisco, 2013)


Challenges:


Bandwidth issues


Privacy


Security


Future of Big Data


Intelligent personalization


Uses all the data that a marketer has at their disposal to optimize the content and
the experience, including things like mobile device and regional optimization,
real-time behavior, social signals, transactional data coming from an e-Commerce system
and much more


Situational analytics


The topic of “predictive analytics” is very hot today. But as any marketer knows, it’s near impossible to
actually predict how any customer interaction is going to go. Instead, the smart integration of data is
going to result in “situational analytics.” This means being able to look at data, plug in different
hypothetical situations and see which one has the better chance of actually succeeding.


Level playing field


One of the biggest evolutions of integrating smarter data into content experiences is that it levels the
playing field with larger competitors who may have more resources to burn on advertising media




Future of Big Data 2

Techniques for Analyzing


A/B testing


Association rule learning: Discovering relationships,
or “association rules,” in the data, such as which
basket of goods a consumer might buy


Cluster analysis: Splitting a large group into
smaller groups of similar items


Crowdsourcing: Collecting data submitted by
a large group of people or a community
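
For the A/B testing item in the list above, a minimal sketch of how two variants might be compared: a two-proportion z-test on conversion counts. The choice of test, the function name and the traffic numbers are assumptions for illustration; the deck names the technique but does not say how results are evaluated.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B.

    Returns (z, p_value). Positive z means B converted better than A.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical traffic: variant A (champion) vs. variant B (challenger).
z, p_value = two_proportion_z(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The same comparison applies to champion/challenger testing of offers, where the current best variant plays the role of A.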




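
For association rule learning, two common measures of a rule such as "bread implies butter" are support (how often the items appear together) and confidence (how often the rule holds when its left-hand side is present). The sketch below computes both on a handful of made-up shopping baskets; the function names are illustrative.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`.

    Assumes the antecedent appears in at least one basket.
    """
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]
rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
print(rule_support, round(rule_confidence, 2))
```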



Nearly a quarter billion health and fitness apps will be
downloaded by 2017, up from 156 million today, predicts iSuppli



Sales of sports and fitness monitors, like heart-rate monitors
and pedometers, will reach 56.2 million units in 2017. Many
of these will be on mobile phones, and they will increasingly
connect to the Internet


WellPoint and New York’s Memorial Sloan-Kettering Cancer
Center are creating Watson (IBM) apps that will answer
cancer diagnosis and treatment questions from doctors,
researchers, and insurance companies



Big Data and Healthcare