Big-Data Computing: Creating revolutionary breakthroughs in commerce, science, and society





Motasim Albdarneh - 20123173012

Submitted in partial fulfillment of the requirements of the CS728 - Advanced Database Systems course.

February 26, 2013


Our Data-Driven World

Advances in digital sensors, communications, computation, and storage have created huge collections of data, capturing information of value to business, science, government, and society.

For example, search engine companies such as Google, Yahoo!, and Microsoft collect trillions of bytes of data every day and continually add new services such as satellite images, driving directions, and image retrieval. The societal benefits of these services are immeasurable: they have transformed how people find and make use of information on a daily basis.


Other forms of big data



Wal-Mart contracted with HP to construct a data warehouse storing 4 petabytes of data, representing every purchase recorded by its point-of-sale terminals worldwide. By applying machine learning to this data, the company can detect patterns indicating the effectiveness of pricing strategies, advertising campaigns, and so on.



The Large Synoptic Survey Telescope (LSST) will scan the sky, recording 30 trillion bytes of image data every day. Astronomers will apply massive computing power to this data to probe the origins of our universe.



Modern medicine collects huge amounts of information about patients through CAT scans, MRI, and other forms of diagnostic equipment. By applying data mining to data sets for large numbers of patients, medical researchers are gaining fundamental insights into the genetic and environmental causes of diseases, and creating more effective means of diagnosis.
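To put the quoted volumes on a common scale, a back-of-the-envelope calculation can help. The sketch below uses only the two figures given above (30 trillion bytes per day and 4 petabytes). It assumes decimal units (1 petabyte = 10^15 bytes) and a 365-day year, and the function names are illustrative.

```python
TB = 10**12  # one trillion bytes (assumed decimal convention)
PB = 10**15  # one petabyte (assumed decimal convention)

def yearly_volume_pb(bytes_per_day, days_per_year=365):
    """Return the data volume accumulated in one year, in petabytes."""
    return bytes_per_day * days_per_year / PB

def days_to_fill(capacity_bytes, bytes_per_day):
    """Return how many days a steady stream needs to reach a given capacity."""
    return capacity_bytes / bytes_per_day

lsst_daily = 30 * TB   # figure quoted for the telescope
warehouse = 4 * PB     # figure quoted for the data warehouse

print(yearly_volume_pb(lsst_daily))          # petabytes per year at the quoted rate
print(days_to_fill(warehouse, lsst_daily))   # days for that rate to reach 4 PB
```

Under these assumptions, the telescope's stream amounts to roughly 11 petabytes per year and would match the 4-petabyte warehouse in about 133 days.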

These are a small sample of the ways that commerce, science, and society are being transformed by the availability of large amounts of data and the means to extract new forms of understanding from this data.

Big-Data Technology: Sense, Collect, Store, and Analyze

The rising importance of big-data computing stems from advances in many different technologies:



Sensors: Digital imagers (telescopes, video cameras, MRI machines), chemical and biological sensors (microarrays, environmental monitors), and even web pages.



Computer networks: Data are collected via localized sensor networks, as well as the Internet.



Data storage: Advances in magnetic disk technology have dramatically decreased the cost of storing data.



Cluster computer systems: These provide both the storage capacity for large data sets and the computing power to organize the data, to analyze it, and to respond to queries about the data from remote users.



Cloud computing facilities: Businesses and individuals can rent storage and computing capacity, for example from Amazon Web Services (AWS), rather than building their own.



Data analysis algorithms: The enormous volumes of data require automated or semi-automated analysis techniques to detect patterns, identify anomalies, and extract knowledge.
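As a small illustration of what automated anomaly identification can look like, the sketch below flags values that lie far from the mean of a data set (a z-score rule). This is a generic textbook technique chosen for illustration, not one named in the text above; the function name, the threshold of 2.0, and the sample readings are all made up.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2, 10.1]
print(find_anomalies(readings))  # [(5, 42.0)]
```

A rule this simple is only a starting point: at the data volumes discussed here, the mean and deviation would have to be computed across many machines, and a single extreme value can distort both statistics.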



Technology and Application Challenges



High-speed networking: Bandwidth limitations remain a bottleneck. We need a “Moore’s Law” technology for networking, where declining costs for networking infrastructure combine with increasing bandwidth.



Cluster computer programming: Programs must cope with hardware and software errors. Major innovations have been made in methods to organize and program such systems, including the MapReduce programming framework introduced by Google.



Extending the reach of cloud computing: Bandwidth limitations and cost make clouds a poor fit for tasks that require extensive computation over large amounts of data. In addition, getting data in and out of a cloud is itself limited by bandwidth.



Machine learning and other data analysis techniques: More work is needed to develop algorithms that apply in real-world situations and on data sets of trillions of elements.



Widespread deployment: We expect "big-data science" (often referred to as eScience) to be pervasive.



Security and privacy: Unauthorized access and use are a risk. Along with developing technology to enable useful capabilities, we must create safeguards to prevent abuse.
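The MapReduce programming model mentioned under cluster computer programming can be sketched in a few lines. The example below is a single-process word count that only mimics the three conceptual steps (map, group by key, reduce). It is not Google's implementation: it does no distribution across machines and no recovery from errors, and all function names are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

docs = ["big data", "big clusters process big data"]
print(word_count(docs))  # {'big': 3, 'data': 2, 'clusters': 1, 'process': 1}
```

The appeal of the model is that each map call depends only on its own document and each reduce call only on its own key, so a framework can in principle run them on different machines and rerun any piece that fails.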


Conclusion



Big-data computing is perhaps the biggest innovation in computing in the last decade. We have only begun to see its potential to collect, organize, and process data in all walks of life.



Hello Governments!!




Recommendations



Investments in big-data computing will have extraordinary near-term and long-term benefits.



The technology has already been proven in some industry sectors.



The challenge is to extend the technology and to apply it more widely.

Immediate Actions


Specific funding over the next two years could greatly stimulate the development, deployment, and application of big-data computing.

Longer Term Actions



Allocate sufficient budget.



Construct special-purpose data centers for the major eScience programs.



Cloud computing must be considered a strategic resource.



Look beyond traditional high-performance computing. Many needs could be addressed better and more cost-effectively by cluster computing systems, possibly making use of cloud facilities.



Encourage the deployment and application of big-data computing in all facets.



Make fundamental investments in our networking infrastructure to provide
ubiquitous, broadband access to end users and to cloud facilities.