Big-Data Computing: Creating revolutionary
breakthroughs in commerce, science, and society
Submitted in partial fulfillment of the requirement of the
advanced database systems
Advances in digital sensors, communications, computation, and storage have created
huge collections of data of value to business, science, government, and society.
For example, search engine companies such as Google, Yahoo!, and Microsoft
collect trillions of bytes of data every day and continually add new services
such as satellite images, driving directions, and image retrieval.
The societal benefits of
these services are immeasurable,
having transformed how people find and make use of
information on a daily basis.
Some forms of big-data computing:
Wal-Mart contracted to construct a repository of data representing every purchase
recorded by their point-of-sale terminals. By applying analysis to this data, they
can detect patterns indicating the effectiveness of pricing strategies, advertising
campaigns, etc.
The Large Synoptic Survey Telescope (LSST) will scan the sky, recording 30 trillion
bytes of image data. Astronomers will apply massive computing power to this data to
probe the origins of the universe.
Modern medicine collects huge amounts of information about patients through
CAT scans, MRI, and other forms of diagnostic equipment. By applying analysis
to data sets for large numbers of patients, medical researchers are gaining
fundamental insights into the genetic and environmental causes of diseases, and
creating more effective means of diagnosis.
These are a small sample of the ways that commerce, science, and society are being
transformed by the availability of large amounts of data and
the means to extract new
forms of understanding from this data.
Sense, Collect, Store, and Analyze
The rising importance of big-data computing stems from advances in many different
technologies. Data are generated by digital imagers (telescopes, video cameras, MRI
machines), chemical and biological sensors (microarrays, environmental monitors),
and even the web. These data can be collected via localized sensor networks, as
well as the Internet.
Advances in magnetic disk technology have dramatically
decreased the cost of storing data.
Cluster computer systems:
These provide both the storage capacity for large data sets and the computing
power to organize the data, to analyze it, and to respond to queries about the
data from remote users.
Businesses and individuals can rent computing capacity, rather than building it
themselves. Amazon Web Services is one example.
Data analysis algorithms:
The enormous volumes of data require automated techniques to detect patterns
and identify anomalies.
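As a small illustration of what automated anomaly detection can look like, the sketch below flags outliers with a modified z-score based on the median absolute deviation. It is only one of many possible techniques and is not taken from the source. The function name, the threshold of 3.5, and the sample sales figures are assumptions chosen for the example.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not distort the baseline.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # More than half the values are identical; no scale to measure against.
        return []
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical daily transaction counts with one unusual day.
daily_sales = [102, 98, 101, 97, 100, 250, 99, 103]
print(find_anomalies(daily_sales))
```

On real data sets of the size discussed here, the same idea would have to run in a distributed fashion; this version only shows the logic on a single machine.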
Technology and Application Challenges
Networking is constrained by bandwidth limitations. We need a “Moore’s Law”
technology for networking, where declining costs for networking infrastructure
combine with increasing bandwidth.
Cluster computer programming:
Programs for cluster systems must cope with hardware and software errors. Major
innovations have been made in methods to organize and program such
systems, including the MapReduce programming framework introduced by Google.
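To make the map-and-reduce style of programming concrete, here is a minimal single-process sketch of a word count. It only illustrates the programming model (map, group by key, reduce). It is not the actual MapReduce framework, and it handles none of the distribution or error recovery a real cluster needs. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big clusters process big data"])
print(counts)
```

In a cluster setting, each call to `map_phase` and `reduce_phase` could run on a different node, which is what makes the model attractive for very large inputs.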
The reach of cloud computing:
Bandwidth limitations and cost are obstacles
for tasks that require extensive computation over large amounts of data. In
addition, there are the bandwidth limitations of getting data in and out of a cloud.
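A rough calculation shows why moving data in and out of a remote facility matters. The sketch below estimates transfer time as bytes times eight, divided by link speed in bits per second. The 30 trillion byte figure is the LSST volume quoted earlier. The link speeds and the function name are assumptions for illustration, and real transfers would be slower because of protocol overhead and contention.

```python
def transfer_time_seconds(num_bytes, link_bits_per_second, efficiency=1.0):
    """Estimate the time to move `num_bytes` over a link.

    `efficiency` is the fraction of the nominal link rate actually achieved
    (1.0 means the ideal case with no overhead).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    return num_bytes * 8 / (link_bits_per_second * efficiency)


lsst_bytes = 30 * 10**12  # 30 trillion bytes of image data, as quoted above

# Hypothetical link speeds, chosen only for illustration.
for label, rate in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    hours = transfer_time_seconds(lsst_bytes, rate) / 3600
    print(f"{label}: {hours:.1f} hours")
```

Even in the ideal case, the slower assumed links need days to move this volume, which illustrates the gap between what is practical to store and what is practical to communicate.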
Machine learning and other data analysis techniques:
More work is needed to develop algorithms that apply in real-world situations
and on data sets of trillions of elements.
We expect "big-data" computing to be pervasive.
Security and privacy:
Large data sets raise the risk of unauthorized access and use. Along with developing
technology to enable useful capabilities, we must create safeguards to
prevent abuse.
Big-data computing is perhaps the biggest innovation in computing in the last
decade. We have only begun to see its potential to collect, organize, and
process data in all walks of life.
Investments in big-data computing will have extraordinary near-term benefits.
The technology has already been proven in some areas.
The challenge is to extend the technology and to apply it more widely.
Specific funding over the next two years could greatly stimulate the development,
deployment, and application of big-data computing.
Longer Term Actions
purpose data centers for the major eScience programs
Cloud computing must be considered a strategic resource.
Look beyond traditional high-performance computing. Many needs could
be addressed better and more cost-effectively by cluster computing systems,
possibly making use of cloud facilities.
Encourage the deployment and application of big-data computing in all fields.
Make fundamental investments in our networking infrastructure to provide
ubiquitous, broadband access to end users and to cloud facilities.