UGF9806_Vilex - ACTIVEevents.com


31 Oct 2013

Managing the Data Lifecycle of Big Data Environments

Brian Vile

Program Director, InfoSphere Product Marketing


© 2013 IBM Corporation

Agenda


Trends


Information Governance for Big Data


Recent II&G for Big Data announcements



The Era of Big Data Demands Confidence

Volume: Data at Scale
Terabytes to petabytes of data

Variety: Data in Many Forms
Structured, unstructured, text, multimedia

Velocity: Data in Motion
Analysis of streaming data to enable decisions within fractions of a second

Veracity: Data Uncertainty
Managing the reliability and predictability of inherently imprecise data types


Big Data Maturity


IIG maturity is a key characteristic of big data initiatives


IIG required for big data to go live

Source: “IBM Data Governance”, a commissioned study conducted by Forrester Consulting on behalf of IBM, July, 2013

How important was having information integration and governance…

Base: Variable Director or VP level professionals with decision making authority for Big Data technologies

Production

Pilot

Sandbox




Security comes first…access and quality top of mind

Security and monitoring: 41%
Information integration: 38%
Protect and mask sensitive data: 38%
Data quality: 36%
Master data management: 34%

Source: “IBM Data Governance”, a commissioned study conducted by Forrester Consulting on behalf of IBM, August, 2013

Base: 512 Director or VP level professionals with decision making authority for Big Data technologies

What best describes how you govern 'big data' today? (top 5)


Agenda


Trends


Information Governance for Big Data


Recent II&G for Big Data announcements



InfoSphere IIG Covers both Analytical and Operational Use Cases

Data Warehouse Augmentation
Operations Analysis
Security/Intelligence Extension
Big Data Exploration
Enhanced 360° View of the Customer
Application Consolidation & Retirement
Security & Compliance
Application Efficiency
Application Development & Testing


The 5 Key Analytical Use Cases

Big Data Exploration
Find, visualize, understand all big data to improve decision making

Enhanced 360° View of the Customer
Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources

Operations Analysis
Analyze a variety of machine data for improved business results

Data Warehouse Augmentation
Integrate big data and data warehouse capabilities to increase operational efficiency

Security/Intelligence Extension
Lower risk, detect fraud and monitor cyber security in real time


Integrate big data and data warehouse capabilities to increase operational efficiency

Data Warehouse Augmentation

Challenges:
Leveraging structured, unstructured, and streaming data sources for deep analysis
Low latency requirements
Query access to data
Optimizing warehouse for big data volumes

IIG capabilities:
High performance and high quality data loads
Archive to ensure performance, compliance and lower costs
Standardized approach to discovering your data assets
Metadata management
Database activity monitoring


IIG Is Essential - Ingest, Understand, & Govern Data

Big Data Platform Capabilities:
Information Ingest
Real-time Analytics
Warehouse & Data Marts
Analytic Appliances

All Data Sources:
Streaming Data
Text Data
Applications Data
Time Series
Geo Spatial
Relational
Social Network
Video & Image

Advanced Analytics / New Insights:
Cognitive: Learn Dynamically?
Prescriptive: Best Outcomes?
Predictive: What Could Happen?
Descriptive: What Has Happened?
Exploration and Discovery: What Do You Have?

New / Enhanced Applications:
Automated Process
Case Management
Analytic Applications
Watson
Cloud Services
ISV Solutions
Alerts


IBM Big Data & Analytics Reference Architecture
Open Architecture / Multiple Product Entry Points

Components shown in the diagram:
Information Ingestion and Integration
Data Exploration
Archive
Real-time Analytics
Enterprise Warehouse
Data Marts
Information Governance, Security and Business Continuity


Enhanced 360° View of the Customer

Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources

Challenges:
Need a deeper understanding of customer sentiment from internal and external sources
Desire to increase customer loyalty and satisfaction
Difficulty getting the right information to the right people for cross-sell & up-sell

IIG capabilities:
Leverage pre-built domains & extend custom data domains
Use business services library
Analyze, validate and monitor data quality; cleanse and enrich data
Search probabilistically
Integrate data of any complexity from diverse sources
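
As an illustration of what "search probabilistically" can mean when extending a customer view, here is a minimal Python sketch. It is not how any IBM product works: the field weights, the 0.8 threshold, the phone numbers and the helper names (`similarity`, `match_score`, `best_match`) are all assumptions made for this example. It scores candidate records by weighted string similarity instead of requiring exact equality.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; a real matching engine would tune or learn these.
WEIGHTS = {"name": 0.5, "city": 0.2, "phone": 0.3}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of per-field similarities (weights add up to 1)."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def best_match(query: dict, candidates: list, threshold: float = 0.8):
    """Return (score, record) for the best candidate at or above threshold, else None."""
    if not candidates:
        return None
    scored = [(match_score(query, c), c) for c in candidates]
    score, record = max(scored, key=lambda pair: pair[0])
    return (score, record) if score >= threshold else None


# Made-up sample data: an existing customer view and a slightly misspelled lookup.
CRM = [
    {"name": "Erica Schafer", "city": "Austin", "phone": "512-555-0101"},
    {"name": "Amanda Winters", "city": "Elgin", "phone": "847-555-0199"},
]
QUERY = {"name": "Erika Schaefer", "city": "Austin", "phone": "512 555 0101"}
```

Calling `best_match(QUERY, CRM)` returns the Erica Schafer record even though no field except the city is an exact match, which is the point of probabilistic rather than exact lookup.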


Total respondents n = 1061

Big data objectives

Top functional objectives identified by organizations with active big data pilots or implementations.
Responses have been weighted and aggregated.

Customer-centric outcomes
Operational optimization
Risk / financial management
New business model
Employee collaboration

Improving the customer experience by better understanding behaviors drives
almost half of all active big data efforts

Source: 2011 IBM Global Chief Marketing Officer Study and 2012 IBM Global Chief Executive Officer Study


Big data is governed in zones


The 5 Key Operational Use Cases

Efficient Application Development & Testing
Create and maintain right-sized dev, test & training environments

Enhanced 360° View of the Customer
Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources

Security and Compliance
Protect data, improve data integrity, mitigate breach risks and lower compliance costs

Application Consolidation and Retirement
Archive old application data and streamline new application deployment

Improve Application Efficiency
Manage data growth, improve performance, and lower the cost for mission-critical applications



Protect data, improve data integrity, mitigate breach risks and lower compliance costs.

Security and Compliance

Challenges:
Inability to identify sensitive data
Lack of common definition of sensitive data elements
Increasing number of regulations
Shrinking time to comply
LOB variances for privacy rules
Difficult to monitor privileged user access

IIG capabilities:
Discover and understand sensitive data in all systems
Database and file system level activity monitoring
Mask and redact sensitive data
Compliance reporting



Mask data in databases and applications

Field        Original           Masked
Patient No   123456             112233
SSN          333-22-4444        123-45-6789
Name         Erica Schafer      Amanda Winters
Address      12 Murray Court    40 Bayberry Drive
City         Austin             Elgin
State        TX                 IL
Zip          78704              60123

Sensitive data to mask:
Names
Geography
Credit card numbers
Telephone numbers
Email addresses
Social Security numbers
Account numbers
Certificate/license numbers
Vehicle identifier numbers
Web URLs
IP addresses
Business data
Corporate intelligence
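
The masking idea on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not the algorithm of any masking product: the key, the replacement-name list and the helper names (`mask_digits`, `mask_name`, `mask_record`) are hypothetical. It replaces digits and names with keyed, repeatable substitutes, so the same input always masks to the same output and formats such as `NNN-NN-NNNN` are preserved. It is not a vetted format-preserving encryption scheme.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-change-me"  # hypothetical key, for illustration only

# Made-up replacement names; a real tool would draw from a large lookup table.
FAKE_NAMES = ["Amanda Winters", "Jerry Jones", "Dana Reyes", "Sam Patel"]


def _digest(value: str, key: bytes) -> bytes:
    """Keyed SHA-256 digest of the value, so masking is repeatable per key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace every digit with a derived digit; keep punctuation and length."""
    digest = _digest(value, key)
    out, i = [], 0
    for ch in value:
        if ch.isdigit():
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_name(name: str, key: bytes = SECRET_KEY) -> str:
    """Pick a substitute name deterministically from the lookup list."""
    return FAKE_NAMES[_digest(name, key)[0] % len(FAKE_NAMES)]


def mask_record(record: dict, rules: dict) -> dict:
    """Apply the masking function registered for each field; pass others through."""
    return {f: rules[f](v) if f in rules else v for f, v in record.items()}


RULES = {"Patient No": mask_digits, "SSN": mask_digits, "Name": mask_name}
```

For example, `mask_record({"SSN": "333-22-4444", "Name": "Erica Schafer", "State": "TX"}, RULES)` returns a record whose SSN still has the three-two-four digit layout and whose State is untouched, because no rule is registered for it.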


Example: Semantic Masking

SOURCE: Europe
ID: (no value shown)
NAME: John Smith
HOME ADDRESS: 5 Rue de la Paix, Paris, France
SYMPTOM CODE: 157.0, 157.1, 157.2, 157.3, 185
HOUSEHOLD INCOME: 75,000
HOME PHONE: 01 58 71 12 34
SEX: M
ETHNICITY: Caucasian
AGE: 43

Symptom Code 157:

ICD Code   Code Description
157        Malignant Pancreatic Cancer
157.0      Malignant Pancreatic Cancer - Head
157.1      Malignant Pancreatic Cancer - Body
157.2      Malignant Pancreatic Cancer - Tail
157.3      Malignant Pancreatic Cancer - Duct
157.4      Malignant Pancreatic Cancer - Islets
157.8      Malignant Pancreatic Cancer - Other
157.9      Malignant Pancreatic Cancer - Unspecified


Example: Semantic Masking

Rules:
Age and income must be analyzed in a range
Ethnicity and Symptom codes must be non-identifiable
Name, Address and Phone need to be masked

Field              Original                          Masked
SOURCE             Europe                            Europe
NAME               John Smith                        Jerry Jones
HOME ADDRESS       5 Rue de la Paix, Paris, France   24 Boulevard Malesherbes, Paris, France
SYMPTOM CODE       157.0, 157.1, 157.2, 157.3, 185   157, 185
HOUSEHOLD INCOME   75,000                            79,500
HOME PHONE         01 58 71 12 34                    01 55 27 12 34
SEX                M                                 M
ETHNICITY          Caucasian                         Caucasian or Latino
AGE                43                                41
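
The rules in this example can be approximated with a short Python sketch. The 10% perturbation band, the ethnicity grouping table and the function names are assumptions chosen for illustration, not the masking tool's actual behavior: subcodes are collapsed to their parent ICD category, numeric values are shifted within a range so they remain usable for analysis, and ethnicity is generalized to a broader group.

```python
import random

# Hypothetical generalization table, modeled on the slide's example output.
ETHNICITY_GROUPS = {"Caucasian": "Caucasian or Latino", "Latino": "Caucasian or Latino"}


def generalize_icd(codes):
    """Collapse ICD subcodes to their parent category ('157.1' -> '157'),
    dropping duplicates while keeping first-seen order."""
    parents = []
    for code in codes:
        parent = code.split(".")[0]
        if parent not in parents:
            parents.append(parent)
    return parents


def perturb(value, pct, rng):
    """Shift a number by a random amount within +/- pct percent of itself."""
    delta = value * pct / 100.0
    return round(value + rng.uniform(-delta, delta))


def semantic_mask(record, rng, pct=10):
    """Return a copy of the record with the semantic rules applied."""
    masked = dict(record)
    masked["symptom_codes"] = generalize_icd(record["symptom_codes"])
    masked["age"] = perturb(record["age"], pct, rng)
    masked["household_income"] = perturb(record["household_income"], pct, rng)
    masked["ethnicity"] = ETHNICITY_GROUPS.get(record["ethnicity"], record["ethnicity"])
    return masked


RECORD = {
    "symptom_codes": ["157.0", "157.1", "157.2", "157.3", "185"],
    "household_income": 75000,
    "ethnicity": "Caucasian",
    "age": 43,
}
```

Name, address and phone would be handled separately by substitution masking; this sketch covers only the range and generalization rules.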


Protect sensitive data in databases, data warehouses, Big Data Environments and file shares

Big Data Environments

NEW: Hadoop Activity Monitoring

Environments listed:
InfoSphere BigInsights
HDFS
MapReduce
Hive
HBase
CouchDB
Cassandra
MongoDB
Greenplum
Hortonworks

Questions answered:
What data are they accessing?
Who is running specific big data requests?
What map-reduce jobs are they running?
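
To make the three monitoring questions concrete, the following Python sketch answers them over a handful of audit events. The event format, user names, object names and function names are all hypothetical and far simpler than what a real activity-monitoring product records; the sketch only shows the kind of aggregation involved.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified audit events.
EVENTS = [
    {"user": "alice", "service": "Hive", "action": "SELECT", "object": "patients"},
    {"user": "bob", "service": "HDFS", "action": "open", "object": "/data/claims/2013.csv"},
    {"user": "alice", "service": "MapReduce", "action": "submit_job", "object": "wordcount"},
    {"user": "alice", "service": "Hive", "action": "SELECT", "object": "patients"},
]

# Objects tagged as sensitive by a (hypothetical) discovery step.
SENSITIVE = {"patients", "/data/claims/2013.csv"}


def requests_by_user(events):
    """Who is running big data requests? Count events per user."""
    return Counter(e["user"] for e in events)


def objects_by_user(events):
    """What data are they accessing? Map each user to the objects touched."""
    seen = defaultdict(set)
    for e in events:
        seen[e["user"]].add(e["object"])
    return dict(seen)


def jobs_by_user(events):
    """What MapReduce jobs are they running? Map each user to submitted jobs."""
    jobs = defaultdict(list)
    for e in events:
        if e["service"] == "MapReduce" and e["action"] == "submit_job":
            jobs[e["user"]].append(e["object"])
    return dict(jobs)


def sensitive_access(events, sensitive):
    """Return only the events that touch an object tagged as sensitive."""
    return [e for e in events if e["object"] in sensitive]
```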


Archive old application data and streamline new application deployment with test data management, integration, and data quality.

Application Consolidation and Retirement

Challenges:
Big data leads to more systems and a greater need to consolidate
Manual data integration, quality, and archiving are slow and costly
Difficult to ensure legal compliance for data retention
10-40% of projects for profiling, mapping and retiring data manually

IIG capabilities:
Discover and understand data in all systems
Retain and dispose according to retention policies
Efficient test data management
Cleanse and consolidate data
Rapidly load data to new system
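
"Retain and dispose according to retention policies" can be pictured with a small Python sketch. The record types, retention periods, legal-hold mechanism and function names below are assumptions made for illustration: each record is classified as on hold, due for disposal, or still within its retention period.

```python
from datetime import date

# Hypothetical retention periods, in years, per record type.
RETENTION_YEARS = {"invoice": 7, "web_log": 1}


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition(record: dict, today: date, legal_holds=frozenset()) -> str:
    """Classify a record as 'hold', 'dispose' or 'retain'.

    A legal hold always wins; otherwise the record is disposable once its
    retention period, counted from the creation date, has elapsed.
    """
    if record["id"] in legal_holds:
        return "hold"
    expires = add_years(record["created"], RETENTION_YEARS[record["type"]])
    return "dispose" if today >= expires else "retain"


RECORDS = [
    {"id": 1, "type": "invoice", "created": date(2005, 3, 1)},
    {"id": 2, "type": "web_log", "created": date(2013, 6, 1)},
    {"id": 3, "type": "invoice", "created": date(2004, 1, 15)},
]
```

With `today = date(2013, 10, 31)` and a legal hold on record 3, the three records classify as dispose, retain and hold respectively.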




Forrester's four fates for applications

Monitor & maintain:
Keep the lights on

Modernize it:
UI, DBMS, enhancements, migrations

Replace it:
BPO, SaaS, package, rewrite, or hybrid

Retire it:
Remove from production environment (retire)
Decommission (leave in inquiry mode)

Classic data management use cases
Consolidation and migration of legacy apps use cases
ERP consolidation retirement and / or migrate legacy apps
ERP consolidation, migration, data archival use cases

Source: Forrester Research November 2012


Manage data growth, improve performance, and lower the cost for mission-critical applications

Improve Application Efficiency

Challenges:
Big data growth saddles applications with too much data
Slower response times
Increased storage and hardware costs
Longer downtime periods for batch updates
Longer downtime for application upgrades

IIG capabilities:
Discover and understand data that may be archived
Define lifecycle policies
Archive business objects based on retention policies
Search and retrieve archived data
Supply archived data to warehouses or Hadoop for analysis


The pros and cons of a “Keep Everything” strategy

Data Lake vs. Data Swamp


The pros and cons of a “Keep Everything” strategy

Source: IBM 2012 CGOC Summit Survey


Database Archiving

Data archiving is an intelligent process for moving inactive or infrequently accessed data that still has value, while providing the ability to search and retrieve the data.

Diagram labels:
Production (Current, Historical)
Archive
Retrieve
Data Archives (Historical Data, Reference Data)
Can selectively restore archived data records

Universal Access to Application Data:
ODBC / JDBC
InfoSphere Data Explorer
Report Writer
Application
XML
InfoSphere BigInsights


Agenda


Trends


Information Governance for Big Data


Recent II&G for Big Data announcements



IIG Evolves for the Era of Big Data

1. Automated Integration
How do I get access to new big data sources?
Business users need rapid data provisioning among the zones

2. Visual Context
How do I digest all of this new information?
Categorize, index, and find big data to optimize its usage

3. Agile Governance
How do I manage all of this new data?
Ensure appropriate actions based on the value of the data


Innovations in Information Integration and Governance

Visual Context: Information Governance Dashboard
Immediate, visual context for critical decisions and actions
Understand big data to leverage it better

Agile Governance: InfoSphere Privacy & Security
Find and protect sensitive big data
Single point of security for traditional, NoSQL & big data

Automated Integration: InfoSphere Data Click
Self-service access to a growing variety of big data in traditional, NoSQL and Hadoop sources

Headline figures:
2-click data integration
170x faster metadata ingestion
80% faster activity monitoring


Information Governance Dashboard
Visualize and Control Governance (Visual Context)

Innovation:
Measurements for policies and KPIs
Rapid creation of tailored dashboards

Value:
Immediate insight into governance policy status
Interception of issues when they start, right at the source

Usage:
Raises data confidence with visual governance status

1000s of data points and policies visualized
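
A "measurement for policies and KPIs" can be as simple as a data quality ratio compared against a policy threshold. The Python sketch below is an assumption-laden illustration, not the dashboard's implementation: the policy definition, the 95% threshold, the sample rows and the function names are all made up. It shows the kind of status value such a dashboard might display.

```python
def completeness(rows, field):
    """Share of rows (0..1) in which `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


# Hypothetical governance policy: at least 95% of customers must have an email.
POLICIES = [
    {"name": "Customer email completeness", "field": "email", "threshold": 0.95},
]


def evaluate(rows, policies):
    """Measure each policy's KPI and flag it OK or ALERT against its threshold."""
    results = []
    for p in policies:
        value = completeness(rows, p["field"])
        status = "OK" if value >= p["threshold"] else "ALERT"
        results.append({"policy": p["name"], "value": value, "status": status})
    return results


# Made-up sample rows: one of four customers has no email address.
CUSTOMERS = [
    {"name": "Erica Schafer", "email": "erica@example.com"},
    {"name": "Amanda Winters", "email": "amanda@example.com"},
    {"name": "John Smith", "email": ""},
    {"name": "Jerry Jones", "email": "jerry@example.com"},
]
```

Here `evaluate(CUSTOMERS, POLICIES)` reports a value of 0.75 and an ALERT status, the sort of issue a dashboard would surface at the source.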


Confidence Is Essential for Actionable Insight


Make decisions with greater certainty


Analyze rapidly while providing necessary controls


Increase the value of data

Visual Context

Agile Governance

Automated Integration


THANK YOU