Optimizing Big Data with Search

Nov 15, 2013

©2010 IBM Corporation
Mark Myers, Sr. Director of Product Marketing
Vivisimo, An IBM Company
Topics
• Big Data opportunity
• Evolution of search
• Search, navigation and discovery for Big Data
• Deployment scenarios
• Case studies and examples
What is Big Data?
Definition varies… but you probably know it when you see it.
Data sets that are:
• Too large (volume)
• Too fast-moving (velocity)
• Too diverse (variety)
…for “conventional” data management tools
Can be structured, unstructured, or semi-structured
“Data is the New Oil”
“Data is the new Oil.
Data is just like crude. It’s valuable,
but if unrefined it cannot really be
used.”
– Clive Humby, DunnHumby
“We have for the first time an economy based on a key resource
[Information] that is not only renewable, but self-generating. Running out
of it is not a problem, but drowning in it is.”
– John Naisbitt
Big Data – Big Potential
Government is near the top in volume of data
Source: US Bureau of Labor Statistics; McKinsey Global Institute analysis
Stored Data Per Agency (2009): 1,313 TB
Many organizations are losing ground
SEARCH, DISCOVERY AND NAVIGATION FOR BIG DATA
Search today is ubiquitous and rich with features to connect people and machines with information:
• Keyword
• “Natural language” / semantics
• Clustering and tag clouds
• Faceted navigation
• Virtual documents
• Recommendation
• Filtering / alerting
• Federation
• Autocomplete
• Speech
Delivered across many devices and applications: Mobile, Enterprise, Apps, Web, Media, Commerce, Portals, …
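The keyword search and faceted navigation features listed above are commonly built on an inverted index. The Python below is a minimal sketch of that idea and not Vivisimo's implementation. The toy corpus, the whitespace tokenizer, the AND semantics for multi-term queries and the helper names (`build_index`, `search`, `facet_counts`, `refine`) are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: free text plus metadata fields usable as facets.
DOCS = {
    1: {"text": "quarterly sales report for the east region", "source": "CRM"},
    2: {"text": "sales email about east region pricing", "source": "Email"},
    3: {"text": "archived records retention policy", "source": "File Systems"},
    4: {"text": "east region customer complaint", "source": "CRM"},
}


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_index(docs):
    """Inverted index: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["text"]):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Keyword search: ids of documents that contain every query term (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def facet_counts(docs, doc_ids, field):
    """Tally how many matching documents fall under each value of a metadata field."""
    return Counter(docs[i][field] for i in doc_ids)


def refine(docs, doc_ids, field, value):
    """Faceted navigation: narrow the current result set to one facet value."""
    return {i for i in doc_ids if docs[i][field] == value}


if __name__ == "__main__":
    index = build_index(DOCS)
    hits = search(index, "east region")
    print(sorted(hits))                                  # [1, 2, 4]
    print(dict(facet_counts(DOCS, hits, "source")))      # {'CRM': 2, 'Email': 1}
    print(sorted(refine(DOCS, hits, "source", "CRM")))   # [1, 4]
```

A production engine would add ranking, stemming and compressed postings lists. The sketch only shows that facet counts come from tallying a metadata field over the current result set. Refining by a facet is therefore a filter on that set and does not require a new query.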
SEARCH, DISCOVERY AND NAVIGATION WITH BIG DATA
Cost per byte increases with more refined applications
[Diagram: Search; Hadoop / Big Data Framework; Data Warehouse & Analytics; axis: Cost Per Byte / Value]
Big data adds another layer of challenge
• Disk capacity has increased massively, but read/write speed has not kept pace; the same is true of seek time
– A 1TB drive at 100MB/s takes more than 2.5 hours (about 2.8 hours) to read all data from disk
• More hardware means a greater chance that a single piece will fail
• Analytics need to be able to combine the data in some way, which often requires preprocessing and caching
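The drive-read figure comes from dividing capacity by transfer rate: 10^12 bytes ÷ 10^8 bytes/s = 10,000 s, or roughly 2.8 hours. The Python below reproduces that arithmetic. It also shows why spreading data over many drives helps, under the idealized assumption that the drives are read in parallel with no seek, network or coordination overhead. The function name and the 100-drive scenario are illustrative and do not come from the slides.

```python
def full_scan_seconds(total_bytes, bytes_per_sec, drives=1):
    """Seconds to read total_bytes split evenly across `drives` disks read in
    parallel, each at bytes_per_sec.

    Ignores seek time, network transfer and coordination overhead, so the
    result is a lower bound rather than a prediction.
    """
    if drives < 1:
        raise ValueError("drives must be >= 1")
    return total_bytes / (bytes_per_sec * drives)


TB = 10**12  # decimal terabyte, as drive vendors count it
MB = 10**6

if __name__ == "__main__":
    one_disk = full_scan_seconds(1 * TB, 100 * MB)             # 10,000 s
    hundred_disks = full_scan_seconds(1 * TB, 100 * MB, 100)   # 100 s
    print(f"1 drive:    {one_disk / 3600:.1f} h")   # 1 drive:    2.8 h
    print(f"100 drives: {hundred_disks:.0f} s")     # 100 drives: 100 s
```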
Tenets of Big Data Processing
• Distributed Processing
– Ability to distribute processing across a network of nodes and reassemble the results. Analysis takes place where the data is stored.
• Fault Tolerance
– Failure of a particular node should not bring down the whole system; if a node fails and can be restored, it should be able to rejoin the group activity without introducing inconsistencies.
• Linear Scalability
– Adding computing resources should increase speed and performance in a linear fashion.
• Graceful Load Response
– Increased load should not cause failure, but rather a graceful decline in performance.
• Elasticity of Resources
– Readily expand or contract to match the workload at a given time.
A search platform must meet these demands to function in a Big Data environment.
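The first two tenets can be illustrated with a scatter-gather sketch in Python. This is a hypothetical single-process simulation. The partitions, the replica map, the node names and the simulated failure of `node-b` are invented for the example, and threads stand in for networked nodes. Each partition is processed by a node that holds it, which models analysis where the data is stored. The partial results are then summed to reassemble the answer. When a node has failed, the job moves on to a replica, so it still completes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster layout: partition -> records stored in it.
PARTITIONS = {
    "p1": ["big data", "search", "big index"],
    "p2": ["big cluster", "data"],
    "p3": ["search", "big"],
}
# Partition -> nodes holding a replica, in preference order.
REPLICAS = {
    "p1": ["node-a", "node-c"],
    "p2": ["node-b", "node-c"],
    "p3": ["node-c", "node-a"],
}
# Simulated outage: work sent to these nodes fails.
FAILED_NODES = {"node-b"}


class NodeDown(Exception):
    """Raised when work is sent to a node that is unavailable."""


def run_on_node(node: str, partition_id: str, term: str) -> int:
    """Local step on the node holding the partition: count records containing `term`."""
    if node in FAILED_NODES:
        raise NodeDown(node)
    return sum(term in record for record in PARTITIONS[partition_id])


def run_with_failover(partition_id: str, term: str) -> int:
    """Try each replica in turn so one failed node does not fail the whole job."""
    for node in REPLICAS[partition_id]:
        try:
            return run_on_node(node, partition_id, term)
        except NodeDown:
            continue
    raise RuntimeError(f"no live replica for {partition_id}")


def scatter_gather(term: str) -> int:
    """Scatter the work to all partitions in parallel, then reassemble by summing."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda pid: run_with_failover(pid, term), PARTITIONS)
        return sum(partials)


if __name__ == "__main__":
    # p2's first replica (node-b) is down, so node-c serves it instead.
    print(scatter_gather("big"))  # 4
```

Linear scalability, graceful load response and elasticity are not modeled here. They depend on how work is scheduled across real machines.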
Search Platform Architecture
[Diagram: Connector Framework over sources CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems]
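A connector framework turns many source systems into a single document stream for the index. The Python below is a minimal sketch of that idea. The `Document` shape, the `Connector` protocol and both connector classes are hypothetical illustrations and not the Vivisimo connector API. An in-memory SQLite table stands in for a real RDBMS.

```python
import sqlite3
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Document:
    """Common shape every connector emits, whatever the source system."""
    doc_id: str
    source: str
    text: str


class Connector(Protocol):
    """Anything with a fetch() that yields Documents can act as a connector."""
    def fetch(self) -> Iterator[Document]: ...


class RdbmsConnector:
    """Hypothetical connector: each (id, body) row returned by `query` becomes one Document."""
    def __init__(self, conn: sqlite3.Connection, query: str):
        self.conn = conn
        self.query = query

    def fetch(self) -> Iterator[Document]:
        for row_id, body in self.conn.execute(self.query):
            yield Document(f"rdbms:{row_id}", "RDBMS", body)


class InMemoryFileConnector:
    """Hypothetical stand-in for a file-system crawler, fed a path -> contents map."""
    def __init__(self, files: dict[str, str]):
        self.files = files

    def fetch(self) -> Iterator[Document]:
        for path, contents in self.files.items():
            yield Document(f"file:{path}", "File Systems", contents)


def crawl(connectors: Iterable[Connector]) -> list[Document]:
    """Pull documents from every connector into one list, ready for indexing."""
    return [doc for connector in connectors for doc in connector.fetch()]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tickets (id INTEGER, body TEXT)")
    db.execute("INSERT INTO tickets VALUES (1, 'customer complaint about billing')")
    docs = crawl([
        RdbmsConnector(db, "SELECT id, body FROM tickets"),
        InMemoryFileConnector({"/share/policy.txt": "records retention policy"}),
    ])
    for doc in docs:
        print(doc.doc_id, doc.source)
    # rdbms:1 RDBMS
    # file:/share/policy.txt File Systems
```

Because every connector emits the same `Document` shape, the indexing side stays the same when a new source type is added.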
Deployment/Integration Scenarios for Exploiting Big Data
1. Rapid search, discovery and navigation
2. Load data from enterprise applications into Big Data framework
3. Index and search of Big Data analytics
4. Leveraging Big Data Platform for bulk processing and analytics
[Diagram: Velocity Platform]
Rapid search, discovery and navigation
[Diagram: Vivisimo Big Data Connectors and Connector Framework over sources CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems]
Index and search of Big Data analytics
EXAMPLES AND CASE STUDIES
Federation across secure domains at massive scale
Knowledge fusion and collaboration across more than 400,000 users
Powerful social search to drive collaboration and knowledge-sharing
Metadata Catalog
Fusion of enterprise data and analytics – commercial
Continued
360-degree view of the customer
360-degree view of an asset
360-degree view of the citizen (conceptual prototype)
Search across multiple silos
National Archives and Records Administration – Electronic Records Administration
• Challenge: create a single access point and rich discovery environment for the permanent records of the United States
• Online Public Access prototype
– Streamlined searching
– Better results
– Better presentation
National Archives and Records Administration
Projected Data Growth for Electronic Records Administration
QUESTIONS & DISCUSSION