Big Data Analysis

addictedswimmingAI and Robotics

Oct 24, 2013




Big Data

Overview (1/2)

Big data refers to datasets whose sizes are beyond
the ability of typical database software tools to
capture, store, manage and analyze.

A primary goal for looking at big data is to discover
repeatable business patterns.

It has many additional uses, including real-time fraud
detection, web display advertising and competitive
analysis, call center optimization, social media and
sentiment analysis, intelligent traffic management,
and smart power grids.

Big data analytics is often associated with cloud
computing because the analysis of large data sets
requires a framework such as MapReduce to
distribute the work among tens, hundreds or even
thousands of computers.
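
A framework of this kind typically splits a job into independent pieces that can run on separate machines and then combines the partial results. The following is a minimal single-process sketch of that map, shuffle and reduce pattern, using a word count as the job. The function names and the sample text are illustrative assumptions and are not tied to any particular framework's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk; each chunk could be handled by a different worker."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values collected for each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: three chunks standing in for data held on three machines.
chunks = ["big data big value", "data at scale", "big scale"]
counts = word_count(chunks)
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the map step can be spread over as many workers as there are chunks; only the shuffle step needs to bring related records together.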

As technology advances over time, the size of
datasets that qualify as big data will also increase and
big data is expected to play a significant economic
role to benefit not only private commerce but also
national economies and their citizens.

Big data involves more than simply the ability to
handle large volumes of data. Instead, it represents a
wide range of new analytical technologies and
business possibilities.

“Big data” is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates.
It’s the data that would take too much time and cost too much money to load into a relational database for analysis.

McKinsey Big Data Report, BI Research Using Big Data for Smarter Decision Making

Big Data Can Generate Significant Financial Value Across


Big Data

Overview (2/2)

Three V’s of Big Data

The three Vs of big data (volume, variety and
velocity) constitute a comprehensive definition.
Each of the three Vs has its own ramifications for analytics.

Data volume is the primary attribute of big data

Big data can also be quantified by counting records, transactions,
tables or files. Some organizations find it more useful to quantify big
data in terms of time. For example, due to the seven-year statute of
limitations in the U.S., many firms prefer to keep seven years of data
available for risk, compliance and legal analysis.

The scope of big data affects its quantification, too. For example, in
many organizations, the data collected for general data warehousing
differs from data collected specifically for analytics.

Data variety comes from a greater variety of sources

Big data comes from a variety of sources, including logs, click streams,
social media, radio-frequency identification (RFID) data from supply
chain applications, text data from call center applications,
semi-structured data from various business-to-business processes, and
geospatial data in logistics.

The recent tapping of these sources for analytics means that so-called
structured data is now joined by unstructured data (text and human
language) and semi-structured data (XML, RSS feeds).

Data feed velocity as a defining attribute of big data

The collection of big data in real time isn’t new; many firms have been
collecting click stream data from the web for years, using streaming
data to make purchase recommendations to web visitors.

Even more challenging, the analytics that go with streaming data have
to make sense of the data and possibly take action, all in real time.
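
One common way to make sense of a stream as it arrives is to keep running counts over a sliding time window. The sketch below assumes hypothetical click events of the form (timestamp in seconds, product name) and a 60-second window; the class and field names are illustrative only.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count items seen during the last `window_seconds` of a stream."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()      # (timestamp, item) pairs in arrival order
        self.counts = Counter()    # current count per item inside the window

    def add(self, timestamp, item):
        self.events.append((timestamp, item))
        self.counts[item] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_item = self.events.popleft()
            self.counts[old_item] -= 1
            if self.counts[old_item] == 0:
                del self.counts[old_item]

    def top(self, n=1):
        return self.counts.most_common(n)


# Hypothetical click stream: (seconds since start, product viewed).
clicks = [(0, "shoes"), (5, "hats"), (12, "shoes"),
          (40, "hats"), (65, "hats"), (70, "bags")]

counter = SlidingWindowCounter(window_seconds=60)
for timestamp, product in clicks:
    counter.add(timestamp, product)

trending = counter.top(1)
print(trending)
```

Each event is processed once when it arrives and once when it expires, so the cost per event stays constant regardless of how long the stream runs.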


TDWI Research report on Big Data Analytics


Big Data


International Data Corporation (IDC) released a worldwide big data technology and services forecast report based on a survey in
March 2012. According to the survey:

The big data market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015. This represents a compound annual
growth rate (CAGR) of 40%, or about seven times that of the overall information and communications technology (ICT) market.

The big data market is expanding rapidly and, for technology buyers, opportunities exist to use big data technology to improve
operational efficiency and to drive innovation.

There are also big data opportunities for both large IT vendors and start-ups. Major IT vendors are offering both database
solutions and configurations supporting big data by evolving their own products as well as by acquisition. At the same time,
more than half a billion dollars in venture capital has been invested in new big data technology.

While the five-year CAGR for the worldwide market is expected to be nearly 40%, the growth of individual segments varies from
27.3% for servers and 34.2% for software to 61.4% for storage.

The growth in appliances, cloud, and outsourcing deals for big data technology will likely mean that over time, end users will pay
increasingly less attention to technology capabilities and will focus instead on the business value arguments. System
performance, availability, security and manageability will all matter greatly; however, how they are achieved will be less of
a point for differentiation among vendors.

There is a shortage of trained big data technology experts, in addition to a shortage of analytics experts. This labor supply
constraint will act as an inhibitor of adoption and use of big data technologies, and it will also encourage vendors to deliver
big data technologies as cloud-based solutions.

While software and services make up the bulk of the market opportunity, through 2015, infrastructure technology for big data
deployments is expected to grow slightly faster at 44% CAGR. Storage, in particular, shows the strongest growth opportunity,
growing at 61.4% CAGR through 2015.
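
The growth rates quoted above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below applies it to the $3.2 billion (2010) and $16.9 billion (2015) market sizes from the forecast; the function names are illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


# Market size in $ billions, taken from the forecast figures above.
market_cagr = cagr(3.2, 16.9, 5)
print(f"Implied CAGR 2010-2015: {market_cagr:.1%}")

# Compounding the 2010 figure at that rate should reproduce the 2015 figure.
print(f"Projected 2015 market: ${project(3.2, market_cagr, 5):.1f} billion")
```

The implied rate comes out just under 40%, which is consistent with the rounded figure in the forecast.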


IDC defines big data technologies as a new generation of technologies and architectures designed to extract value economically
from very large volumes of a wide variety of data by enabling high-velocity capture, discovery and/or analysis.


Big Data


Big data is complex because of the variety of data it encompasses, from structured data, such as transactions one makes or
measurements one calculates and stores, to unstructured data such as text conversations, multimedia presentations and video.

Big data presents a number of challenges relating to its complexity:

One challenge is how one can understand and use big data when it comes in an unstructured format, such as text or video.

Another challenge is how one can capture the most important data as it happens and deliver that to the right people in real time.

A third challenge is how one can store the data and analyze and understand it given its size and the computational capacity.

Big data also poses security and privacy risks for a large amount of data stored in data warehouses, centralized in a single

Big data and extreme workloads require optimized hardware and software. The main challenges of big data and extreme
workloads are data variety and volume, and analytical workload complexity and agility.

Many organizations are struggling to deal with increasing data volumes, and big data simply makes the problem worse. To solve
this problem, organizations need to reduce the amount of data being stored and exploit new storage technologies that improve
performance and storage utilization.

Big data’s increasing economic importance also raises a number of legal issues, especially when coupled with the fact that data is
fundamentally different from many other assets. For example, one piece of data can be copied perfectly and easily combined
with other data. The same piece of data can be used simultaneously by more than one person.

Sectors with a relative lack of competitive intensity and performance transparency, along with industries where profit pools
are highly concentrated, are likely to be slow to fully leverage the benefits of big data.


BI Research Using Big Data for Smarter Decision Making,


Big Data


Creating transparency

Making big data more easily accessible to relevant stakeholders in a timely manner can create tremendous
value. In the public sector, making relevant data more readily accessible across otherwise separated
departments can sharply reduce search and processing time.

Enabling experimentation to discover needs

As more transactional data is created and stored in digital form, organizations can collect more accurate and
detailed performance data on everything from product inventories to personnel sick days. This data can be used to
analyze variability in performance, including variability generated by controlled experiments.

Segmenting populations to customize actions

Big data allows organizations to create highly specific segmentations and to tailor products and services
precisely to meet those needs. This approach is well known in marketing and risk management but can be
revolutionary elsewhere.

Replacing human decision making with automated algorithms


Sophisticated analytics can substantially improve decision making, minimize risks and unearth valuable
insights that would otherwise remain hidden. Such analytics have applications for organizations such as tax
agencies, which can use automated risk engines to flag candidates for further examination.

Innovating new business models, products and services

Big data enables companies to create new products and services, enhance existing ones, and invent entirely
new business models. Manufacturers are using data obtained from the use of actual products to improve the
development of the next generation of products and to create innovative after-sales service offerings.


McKinsey Big Data Report


Big Data


2012 Big Data Pure-Play Vendors, Yearly Big Data Revenue (in $US Million)

In the current market, big data pure-play vendors account for $300 million in big data-related revenue. Despite their relatively
small percentage of current overall revenue (approximately 5%), big data pure-play vendors (such as Vertica and Splunk)
are responsible for the vast majority of innovations and modern approaches to data management and analytics
that have emerged over the last several years and made big data the hottest sector in IT.



Big Data


The McKinsey Global Institute estimated that enterprises globally stored
more than seven exabytes of new data on disk drives in 2010, while
consumers stored more than six exabytes of new data on devices such as
PCs and notebooks.

Big data has now reached every sector in the global economy. In total,
European organizations have about 70% of the storage capacity of the
entire United States at almost 11 exabytes.

The possibilities of big data continue to evolve rapidly, driven by innovation
in the underlying technologies, platforms and analytic capabilities for
handling data, as well as the evolution of behavior among its users as more
and more individuals live digital lives.

The use of big data is becoming a key way for leading companies to
outperform their peers. McKinsey estimated that a retailer embracing big
data has the potential to increase its operating margin by more than 60%.

The increasing use of multimedia in sectors including health care and
consumer-facing industries has contributed significantly to the growth of
big data and will continue to do so.

The surge in the use of social media is producing its own stream of new
data. While social networks dominate the communications portfolios of
younger users, older users are adopting them at an even more rapid pace.


McKinsey Big Data Report


Big Data


Big data includes web logs, RFID, sensor networks, social networks, social data, Internet text and documents, Internet search
indexing, call detail records, complex and/or interdisciplinary scientific research, military surveillance, medical records,
archives, video archives, and large-scale e-commerce.

Examples of Companies Using Big Data:

IBM has formed a partnership with the Netherlands Institute for Radio Astronomy (ASTRON) for the DOME Project, which
provided support in developing the tools needed to crunch the data for the ambitious international Square Kilometer Array (SKA)
radio telescope.

San Francisco-based SeeChange Company offered a better way of designing health insurance plans with what it calls “value-based benefits.”
The company used a substantial amount of data gleaned from personal health records, claims databases, lab
feeds and pharmacy data to identify patients with chronic illnesses who would benefit from a customized compliance program.

Another company combined its data analytics with a real-time clinical surveillance and decision support system.
The company also sells its detailed clinical spending data to life sciences companies, with the idea that customers will use it to
quantify patient populations, market share and market opportunities.

Castlight Health aimed to push transparency in healthcare pricing by offering consumers a search engine to find prices of
healthcare services. Castlight’s technology allowed consumers to run side-by-side comparisons of out-of-pocket medical
expenses. Armed with prices, consumers can then shop for bargains, limiting the growth of healthcare costs.


Another company has started a Google-like service that helps clinicians analyze real-time information culled from troves
of electronic medical records (EMRs), financial records and other data. The idea is that medical researchers can mine the vast
amounts of data to learn how variations in treatment can affect outcomes, uncovering best practices to enhance patient care and
lower costs.

The technology brings together data from structured sources like EMRs with unstructured data, such as a physician’s patient
encounter notes. The company’s software uses natural language processing technology to interpret clinicians’ free-text searches
and return the most relevant results.



Role of Internal Audit in Managing Big Data

Case Study

Check the extent of data assets and examine in depth what is available. Data that is redundant or unimportant can then be
identified and reduced.

To manage data holdings effectively, an organization must first be aware of the location, condition and value of its research assets.

Conducting a data audit provides this information, raising awareness of collection strengths and identifying weaknesses in data
policies and management procedures.
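
One simple way an audit can surface redundant holdings is to fingerprint each item by its content and group items whose fingerprints match. The following is a minimal sketch of that idea, assuming the holdings are available as name-to-bytes pairs; the file names and contents are made up for illustration.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Fingerprint an item by its content rather than by its name or location."""
    return hashlib.sha256(data).hexdigest()


def find_redundant(holdings):
    """Return groups of item names whose content is byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in holdings.items():
        by_digest[content_digest(data)].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical data holdings keyed by file name.
holdings = {
    "sales_2011.csv": b"region,total\nnorth,100\nsouth,80\n",
    "sales_2011_copy.csv": b"region,total\nnorth,100\nsouth,80\n",
    "sales_2012.csv": b"region,total\nnorth,120\nsouth,95\n",
}

duplicates = find_redundant(holdings)
print(duplicates)
```

This only catches exact copies. Near-duplicates, such as the same table exported with a different column order, would need a comparison on normalized content instead.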

The benefits of conducting an audit for managing big data effectively are:

Monitor holdings and avoid big data leaks. Data hacking, social engineering and data leaks are all threats that plague
a company; an audit can help a company identify areas where there is a possibility of leakage.

Manage risks associated with big data loss and irretrievability. Data which is not structured and is lying untouched
may never be retrieved; an audit can help identify such cases.

Develop a big data strategy and implement robust big data policies. Big data requires robust management and proper

Improve workflows and benefit from efficiency savings. Check where there are complex and time-consuming
workflows and where there is scope for improving efficiency.

Realize the value of big data through improved access and reuse. Check whether there are data holdings that have not been
used in a while.



Complex Big Data

Big Data Security


Managing Big Data Through Internal Audit

Most companies collect large volumes of data but they don’t have comprehensive approaches for
centralizing the information. Internal audit can help companies manage big data by streamlining and
collating data effectively.

Following are issues of big data that internal audit can help mitigate:

Maintaining effective data security is increasingly recognized as a critical risk area for organizations. Loss of
control over data security can have severe ramifications for an organization, including regulatory penalties,
loss of reputation, and damage to business operations and profitability. Auditing can help organizations
secure and control data collected.

Giving access to big data to the right person at the right time is another challenge organizations face.
Segregation of duties (SoD) is an important aspect that internal audit (IA) can check.

The more data one accumulates, the harder it is to keep everything consistent and correct. Internal audit
can check the quality of big data.

Understanding and interpretation of big data remains one of the primary concerns for many organizations.
Auditors can help simplify an organization’s data.
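
The segregation of duties check mentioned above can be automated in a simple form: list the pairs of duties that no single person should hold, then flag every user who holds both halves of a pair. The sketch below assumes access rights are available as a mapping from user to duties; the duty names, user names and conflict pairs are hypothetical examples, not a recommended control set.

```python
# Hypothetical pairs of duties that should not be held by the same person.
CONFLICTING_DUTIES = [
    ("load_data", "approve_data_release"),
    ("grant_access", "review_access_logs"),
]


def sod_violations(user_duties, conflicts):
    """Return, per user, the conflicting duty pairs that the user holds in full."""
    violations = {}
    for user, duties in user_duties.items():
        held = [pair for pair in conflicts
                if pair[0] in duties and pair[1] in duties]
        if held:
            violations[user] = held
    return violations


# Hypothetical access listing extracted during an audit.
user_duties = {
    "alice": {"load_data", "approve_data_release"},
    "bob": {"load_data", "review_access_logs"},
    "carol": {"grant_access", "review_access_logs", "load_data"},
}

violations = sod_violations(user_duties, CONFLICTING_DUTIES)
print(violations)
```

In this example alice and carol are flagged, while bob holds one duty from each pair and so raises no conflict.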