Big Data Takeaway Thoughts from USGIF Tech Days


Oct 4, 2013


Take Away Thoughts on Big Data

June 2012

Bob Gourley

CTOlabs.com

Some Context You Can Use

A Summary and Discussion of Big Data Presentations at the USGIF Technology Days

How Much Data Is There?


All of this afternoon’s presentations reminded me of a quote
that tried to spell out how big the universe is:

"Space," it says, "is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is."

From The Hitchhiker's Guide to the Galaxy


CTOlabs projected in 2008 that the
hype over Cloud Computing was
headed to overtake the hype over
SOA, and it did, in 2010.


The hype over Big Data is showing
signs of faster growth, but it is not
yet clear when it will overtake the
hype over cloud computing.


Big reason to track this: It
underscores that we need to decide
what we mean by this term and not let
others decide for us.


Cross-Over in Hype

Big Data


This is not business as usual.


And this is not just lots of data.


It is not just exponential growth of data.


It is new ways of making sense of data that require changes to existing
architectures.


Big Data, the term, in its current use, implies many other things, like:


Apache Hadoop Framework


Commodity hardware leveraging Moore’s law


Infinite scalability


No data temples


Caution: Big Data, the term, may soon lose all meaning.


I will define Big Data now

What Is Big Data?


Big Data is a term applied to data sets whose size is
beyond the ability of legacy approaches to capture,
manage and process the data within a tolerable elapsed
time. Big data sizes are a constantly moving target.



For the enterprise CTO, speaking of “big data” implies
the need for a strategy for dealing with sensemaking
over large quantities of data.

More Context Follows

Big Data


Think of challenges facing humanity: They all need solutions.


But first we have to address a key problem:


Old ways of doing IT must change. When you hear the
phrase “Big Data” think of this need for change.

[Figure labels: Ability for humans to analyze data; Amount of data to analyze; Area of need (and opportunity)]

The ability to collect, parse and analyze machine data in real time, whether on
premises or in the cloud, will continue to grow.

Ignite Style Presentations


Introductory Keynote


“Big Data: Friend or Foe” by Barry Barlow, Director, Online GEOINT Services, National Geospatial-Intelligence Agency


Management of Multi-Terabyte Video Files in Broadcast Television


Steve Atkinson, Director Federal Sales,
Front Porch Digital Inc.


3,000,000 km2 of Satellite Imagery Every Day: Finding the Relevant Pixels


Luke Barrington, CTO, Tomnod; and John Lucier, Senior Manager Analysis Offerings, DigitalGlobe


The Value of Stream Computing in Big Data


Gabe Chang, Senior Consulting Client IT Architect, IBM


Big Data: Security and Scale in the Era of On-Demand IT


MG (Ret.) John M. Custer, Director, Federal Missions and Programs, EMC2


Intelligence Products in the Age of Big Data


George Demmy, CTO, TerraGo


Data Intensity and Datacenter Consolidation via Federated Cloud “Big Data”


Eng Lim Goh, Ph.D., CTO and Senior VP, Silicon Graphics International Corp.


A Big Ocean: Handling High Volume Bathymetry


Michael Henheffer, Senior Software Developer, Marine Division, CARIS


Big Features: A WISE Approach to Scalable Geospatial ISR Fusion


Joshua Lieberman, Senior Manager, Deloitte FAS LLP


OMAR: Open Source Software for Big Data


Mark R. Lucas, Principal Scientist, RadiantBlue Technologies


Interacting with Big Data using Mobile Devices


Kumar Navular, Director, Next Generation Products, DigitalGlobe


Predictive Analytics in the Cloud: The Art of the Possible


Anthony Quartararo, President and CEO, Spatial Networks Inc.


Big Data: The Diversity Factor


Dan Quinn, Vice President of Sales and Marketing, Progressive Technology Federal Systems Inc.


Data Harmonization Through the Use of Complex Event Processors


Dennis Groseclose, President, TransVoyant


Standardizing Web Interfaces to Distribute Wide Area Motion Imagery


Rahul Thakar, VP of Technology, PIXIA Corp.


Discussant


Bob Gourley, Chief Technology Officer, Crucial Point LLC

Recap


Big Data Friend or Foe: Barry Barlow says there is lots of data. Can't analyze it all with people.
Mentioned MapStory, Recorded Future and Hadoop.


Big Data in Television: DBX is the current format. TB size files. Lessons regarding metadata
and workflow relevant to our world. Suggested use of “proxy workflows.”



Finding Relevant Pixels: New crowdsourcing methods of review.


Stream and Big Data: Interconnected world with fast moving data, analyze in stream.


Game-Changing Tech for Data Explosion: Value comes from analytics. Storage is nothing. Not a
tsunami, it is a rising tide. Use Hadoop. Put compute into storage.


Intel reporting in the age of Big Data: Data must be operationalized.


Big Ocean: Bathymetry data smartly handled.


Big Features: Context for collaboration while working big data is critical.


OMAR - Open Source for Big Data: These guys are working really BIG data, in a way that is
smooth and easy for users.


Big Data and Mobile: Well-engineered solutions can make it incredibly easy on users.


Data Intensity and Data Center Consolidation: Think BIG. Keep data where it is. Have a vision.


Predictive Analytics in the Cloud: Know your challenge. User experience is everything.


Activity-Based Intelligence: Must engineer for ABI. Sometimes bring data together.


Global Standardization of Web Interfaces for WAMI: Web services for MI


Diverse Requirements: Says Big Data Not New. Good ways to present to users.

Some discussion items


Presenters at times used varying definitions of the term “big data”


We can’t command that the term always be used the way the
tech community uses it. But we can push back whenever
someone decides to invent their own definition.

What To Do Now:


Sign up for the Government Big Data Newsletter


Send me an e-mail at bob @ crucialpointllc.com


Or sign up at CTOvision.com



Leverage the Apache Hadoop framework. Download the open
source CDH4 distribution from Cloudera



Share your lessons learned, and seek lessons from others on
your Big Data use.



Be precise on how you use the term, or expect its meaning to
disappear

Backup Slides

Big Data Use Cases in Government


Security: rapid real time analysis of all relevant data


Rapid return of geospatial data


Location based push of data: ads now, but watch for more


Real time return of relevant search: Google, Cloudera and
USA.gov


Real time suggestion of topics: Google, Cloudera and USA.gov


Bioinformatics: Human Genome, Hadoop


Bioinformatics: Patient location, treatment, outcomes


Big Data and the Special Case of Cyber


Cyber security has long generated large
quantities of data.


Enterprises need access to all the data to
look for evidence of
coordinated/sophisticated adversary
action. Old approaches do not enable
that.


New CDH4-enabled “Big Data”
approaches enable “Enterprise Security
Intelligence” solutions.


Includes non-cyber data. Rapid computer
emergency response needs all-source
data along with the cyber data.

New approaches to cyber security
analysis, including incident
detection, incident response,
forensics and remediation, require
“Big Data” thinking and designs
built to bring all the data together.

The Intent of the Government Big Data Solutions Award


Established to help facilitate exchange of best practices, lessons
learned and creative ideas for solutions to hard data challenges



Special focus on solutions built around Apache Hadoop framework



Nominees and award winners to be written up in CTOlabs.com
technology reviews



Award meant to help generate exchange of lessons learned



We established a team of judges, asked them to consider mission impact as the primary
criterion, and solicited award nominations via sites frequented by government IT
professionals and solution providers.

The Government Needs More Agility*


The government can rapidly benefit from the lessons of high tech by
being a faster follower, especially when it comes to Big Data
constructs



Thesis: If the Big Data community understands more about federal
missions, challenges and successes, we can improve the speed
and effectiveness of federal solutions.




“High tech runs three-times faster than normal businesses. And the
government runs three-times slower than normal businesses.
So we have a nine-times gap.”

Andy Grove

*Among other needs

Most active fed solution areas:


Federal integrators: Spending internal research and development
funds to create prototypes and full solutions relevant to fed missions



DoD and IC agencies: Using Big Data approaches to solve “needle
in the haystack” and “connect the dots” problems



National Labs: Bioinformatics solutions have been put in place by
federal researchers



OMB and GSA: Ensuring sharing of lessons and solutions. Key
exemplars around web search methods. Solutions inside
government agencies and on citizen-facing properties



Big Data solutions are already making a difference in government service to citizens.
Highlighting some of this virtuous work is a goal of our Government Big Data Solutions
Award.

Top Nominees for 2011


USA Search: Best in class hosted search services over more than 400 gov sites.
Great use of CDH3.



GCE Federal: Cloud-based financial management solutions. Apache Hadoop,
HBase, Lucene for Dept of Labor.



PNNL Bioinformatics: Leading researcher Dr. Taylor of PNNL is advancing
understanding of health, biology, genetics and computing using Apache
Hadoop/MapReduce/HBase.



SherpaSurfing: Use of CDH as a cybersecurity solution. Ingest packet capture in
any format, analyze trends, find malware, alert.



US Department of State: Bureau of Consular Affairs. Large data with important
applications for citizen service and national security.



Each of these is making a difference for government missions right now.

USA Search


Program of General Services Administration’s (GSA) Office of Citizen
Services and Information Technologies.



Hosted search services for USA.gov and over 500 other government
websites.




Solves big data challenges with open source capabilities.



CDH3 since fall 2010. HDFS, Hadoop and Hive used in cost effective,
resilient, scalable solution.



Search Results. Search Suggestions. Trend analysis. Analytic dashboards.






Bottom Line: USA Search brings the best of the open source community to multiple
government missions, including direct citizen support
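
For readers new to the Hadoop framework mentioned above, the MapReduce programming model it implements can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample queries are hypothetical, nothing here uses Hadoop or CDH itself, and it is not a description of how USA Search is actually built. It counts terms across search queries, the kind of tally that could feed search suggestions or trend analysis.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (term, 1) pair for every term in one query string."""
    for term in record.lower().split():
        yield term, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is a single in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce over an iterable of query strings."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample queries, invented for this sketch.
    sample_queries = ["passport renewal", "tax forms", "Passport status"]
    counts = run_job(sample_queries)
    # Show the most frequent terms first, ties broken alphabetically.
    for term, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(term, count)
```

The point of the model is that the map and reduce steps are independent per record and per key, so a framework can spread them across many commodity machines; the single-process version here only shows the shape of the computation.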



Some Requests


Sign up for the Government Big Data Newsletter at:


http://j.mp/ctonews



Watch for the 2012 Government Big Data Solutions Award



Stay in touch!



Thank You!

Please give feedback and find more info at:

CTOvision.com