The Core Principles of Information Governance

Oct 31, 2013

© 2010 IBM Corporation

Brian Kordelski
WW Sales Executive, IBM InfoSphere
12/07/2010

Governance is no longer an option

“By 2013, 25% of the companies in highly regulated industries will create and staff positions in accounting, human resources, compliance and audit and law that deal explicitly with the management of information via technology.”
Gartner, Inc., “Organizing for Information Governance,” Debra Logan, November 2009

“[A]n [information management] strategy should incorporate life-cycle information governance practices [to ensure] consistent execution of ... business optimization, agility, and transformation [initiatives].”
Forrester Research, Inc., “Refresh Your Information Management Strategy to Deliver Business Results,” Rob Karel & James G. Kobielus, August 2009

“If you are going to protect your company's most valuable asset - your data - you will begin to view data security as a component of a more comprehensive information governance strategy.”
Hurwitz & Associates, “Why you need an information governance strategy for 2010,” Marcia Kaufman, December 2009

Information Governance Council Maturity Model

[Figure: maturity model diagram; relationship labels: Enhances, Requires, Supports]

If we don’t proactively manage quality

Increased costs and missed revenue opportunities, impacting both financials and customer relationships due to lack of data quality.

Incomplete and inaccurate master data created problems in receiving and/or shipping products, marketing literature and regulatory mailings, and 360-degree customer visibility.

A small error in the quality of the rating data leads to negative impact for the company and unhappy customers

For a large telecom provider with a massive volume of telephone calls and telephone customers, even a small error in the rating data can mean significant revenue loss or customer turnover.

Data quality issues plague BI initiatives creating a lack of trust in
the data

Several attempts at implementation of a data warehouse and analytics application at
a major retailer had stalled due to data quality issues which created frustration for the
project team and a lack of trust of the data on the part of business users.

Requirements to manage the quality of data


Understand & Define: Discover your data across systems; Define common vocabulary; Design your data structures
Develop & Test: Develop database structures; Create & refresh test data; Validate test results
Cleanse & Manage Continuously: Define Rules & Cleanse Data; Actively Monitor & Manage Data; Remediate Inconsistencies

Understand your information


Data can be distributed over multiple
applications, databases and platforms


Where are those databases located?


Complex, poorly documented data
relationships


Which data is sensitive, and which can be
shared?



Whole and partial sensitive data elements
can be found in hundreds of tables and
fields


Data relationships not understood
because:


Corporate memory is poor


Documentation is poor or nonexistent


Logical relationships (enforced through
application logic or business rules) are
hidden



[Figure: Distributed Data Landscape]

[Figure: users shown: IT Architect, Marketing Manager, Support Rep, CRM Project Manager, Business Intelligence Manager, ERP Project Manager, Business Analyst, Financial Officer, Compliance Officer, Sales Lead]

How does each user define “Active Subscriber”?

Mobile user who has used “any” service in the mobile network
User who paid for the service at least 1 time in the past 90 days
Mobile user who has a phone plan, but not SMS
Only post-paid customers, not pre-paid customers
User who makes at least 1 call over the period of 90 days



Gain consistent terminology
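
The competing definitions of “Active Subscriber” listed above can be written as predicates over one subscriber record, which makes the disagreement visible. This is only a minimal sketch: the `Subscriber` fields and the definition labels are hypothetical names invented for illustration and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    # Hypothetical fields, chosen only to express the definitions listed above.
    used_any_service: bool
    payments_last_90_days: int
    calls_last_90_days: int
    has_phone_plan: bool
    has_sms: bool
    plan_type: str  # "post-paid" or "pre-paid"


# One predicate per definition of "Active Subscriber".
DEFINITIONS = {
    "used any service": lambda s: s.used_any_service,
    "paid in last 90 days": lambda s: s.payments_last_90_days >= 1,
    "phone plan without SMS": lambda s: s.has_phone_plan and not s.has_sms,
    "post-paid only": lambda s: s.plan_type == "post-paid",
    "called in last 90 days": lambda s: s.calls_last_90_days >= 1,
}


def classify(subscriber):
    """Return, for each definition, whether the subscriber counts as active."""
    return {name: rule(subscriber) for name, rule in DEFINITIONS.items()}


example = Subscriber(
    used_any_service=True,
    payments_last_90_days=0,
    calls_last_90_days=2,
    has_phone_plan=True,
    has_sms=True,
    plan_type="pre-paid",
)
result = classify(example)
```

The same person is “active” under some definitions and “inactive” under others, which is the case for agreeing on a single business term before reporting on it.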

Cleanse and continuously manage your data

1. Create reusable quality rules & cleanse your data
Leverage the knowledge gained during the understand & define steps
Define what quality means to you
Design your data quality rules and matching logic

2. Actively monitor & manage your data
Standardize data formats
Leverage precisely calibrated matching rules and remove duplicates
Develop rules & quality metrics for monitoring
Manage duplicate data, when required

3. Remediate inconsistencies in your data
Monitor for problems or trends
Investigate data lineage to find source of problem
Repair data and source of problem
Maintain monitoring to capture future problems

Make sure there is an owner of data quality AND management sponsorship

Monitor quality with integrated data rules


Create “Checks & Balances” to proactively identify quality concerns throughout the lifecycle


Build & test rules for common or complex conditions


Extend profiling through targeted analysis of specific data conditions or conformance to
expected rules


Establish benchmarks and baselines to help track data quality: is it deteriorating or remaining constant?


Flag bad data for audit



Examples of Rules:

The Gender field must be populated and must be in the list of accepted values
The Social Security Number must be numeric and in the format 999-99-9999
If Date of Birth Exists AND Date of Birth > 1900-01-01 and < TODAY Then Customer Type Equals ‘P’
The Bank Account Branch ID is valid in the Branch Reference master list
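
The four example rules above can be expressed as simple checks on a record. The following is a minimal sketch in Python, not the rule syntax of any IBM product. The field names, the accepted gender values and the branch list are assumptions made for illustration.

```python
import re
from datetime import date

# Assumptions for illustration: accepted values and a stand-in reference list.
ACCEPTED_GENDERS = {"M", "F"}
BRANCH_MASTER = {"B001", "B002"}
SSN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")


def check_record(record, today=None):
    """Return the names of the rules that the record violates."""
    today = today or date.today()
    failures = []

    # Rule 1: gender is populated and in the list of accepted values.
    if record.get("gender") not in ACCEPTED_GENDERS:
        failures.append("gender")

    # Rule 2: SSN is numeric and in the format 999-99-9999.
    if not SSN_PATTERN.fullmatch(record.get("ssn") or ""):
        failures.append("ssn")

    # Rule 3: if date of birth exists and lies between 1900-01-01 and today,
    # the customer type must be 'P'.
    dob = record.get("date_of_birth")
    if dob is not None and date(1900, 1, 1) < dob < today:
        if record.get("customer_type") != "P":
            failures.append("customer_type")

    # Rule 4: branch ID must exist in the branch reference list.
    if record.get("branch_id") not in BRANCH_MASTER:
        failures.append("branch_id")

    return failures


good = {"gender": "F", "ssn": "123-45-6789", "date_of_birth": date(1980, 5, 17),
        "customer_type": "P", "branch_id": "B001"}
bad = {"gender": "", "ssn": "123456789", "date_of_birth": date(1980, 5, 17),
       "customer_type": "C", "branch_id": "B999"}

good_failures = check_record(good, today=date(2010, 12, 7))
bad_failures = check_record(bad, today=date(2010, 12, 7))
```

Returning the names of failed rules, rather than a single pass/fail flag, lets bad records be flagged for audit with the reason attached.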

IBM provides the solutions required to create high quality information

Understand & Define: Discover your data across systems; Define common vocabulary; Design your data structures
Develop & Test: Develop database structures; Create & refresh test data; Validate test results
Cleanse & Manage Continuously: Define Rules & Cleanse Data; Actively Monitor & Manage Data; Remediate Inconsistencies

Organizational challenges from lack of data lifecycle management


New application functionality to meet business needs is not deployed on schedule


No understanding of relationships between data objects repeatedly delays projects


Greater data volumes take longer to clone, test, validate and deploy which equates to
longer test cycles



Increased operational and infrastructure costs impact IT budget


Cloning databases requires more storage hardware


Larger databases impact staff productivity and could mean additional license costs



Application defects are discovered after deployment


Costs to resolve defects in production can be 10 to 100 times greater than those caught in the development environment



Unintentional disclosure of confidential data kept in test/development environments



Forrester estimates that 85% of data stored in databases is inactive

Source: Noel Yuhanna, Forrester Research, Database Archiving Remains An Important Part Of Enterprise DBMS Strategy, 8/13/07

The data multiplier effect

Actual Data Burden = Size of production database + all replicated clones

Production: 1 TB
Development: 1 TB
Test: 1 TB
User Acceptance: 1 TB
Backup: 1 TB
Disaster Recovery: 1 TB
Total: 6 TB
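
The arithmetic behind the 6 TB total can be written out directly. This is a small sketch that assumes, as the figure implies, that every clone is a full-size copy of the production database. The function name is hypothetical.

```python
def actual_data_burden(production_tb, clone_count):
    """Size of the production database plus all full-size replicated clones, in TB."""
    return production_tb * (1 + clone_count)


# The five copies of production named in the figure.
clones = ["Development", "Test", "User Acceptance", "Backup", "Disaster Recovery"]
total_tb = actual_data_burden(1, len(clones))
```

Under the same assumption, every additional terabyte in production adds six terabytes to the total burden.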

Requirements to manage data across its lifecycle

Discover & Define: Discover where data resides; Classify & define data and relationships; Define policies
Develop & Test: Develop database structures & code; Create & refresh test data; Validate test results
Optimize, Archive & Access: Enhance performance; Manage data growth; Report & retrieve archived data
Consolidate & Retire: Rationalize application portfolio; Move only the needed information; Enable compliance with retention & e-discovery

Implement test data management with masking

[Figure: Production or Production Clone, with Development, QA, Test and Training environments]

Create targeted, right-sized test environments instead of cloning entire production environments


Mask data to protect privacy


Compare data pre/post test to
identify quality issues


Archive to manage data growth

Archiving is an intelligent process for moving inactive or infrequently accessed data that still has value, while providing the ability to search and retrieve the data

[Figure: Universal Access to Application Data. Labels: Production (Current, Historical), Archive, Archives (Reporting Data, Historical Data, Reference Data), Retrieve, Retrieved, Application, XML, ODBC / JDBC, Mashup]

Diagnose and solve performance problems


Identify problems before they impact business
Diagnose performance problems quickly & easily
Implement a permanent solution, not a temporary workaround
Plan for the future while avoiding past mistakes

When you retire or consolidate applications don’t move all of the data


Application portfolio has redundant systems acquired via mergers and acquisitions


Line of business divested; application is no longer needed


Legacy technologies not compatible with current IT direction


Old database and/or application versions no longer supported by manufacturer


Required technical skills or application knowledge no longer available


Budget pressures: do more with less

In almost ALL cases, access to legacy
data MUST be retained while the
application and database are eliminated

IBM provides the solutions required to manage information
throughout its lifecycle from requirement to retirement

Discover & Define: Discover where data resides; Classify & define data and relationships; Define policies
Develop & Test: Develop database structures & code; Create & refresh test data; Validate test results
Optimize, Archive & Access: Enhance performance; Manage data growth; Report & retrieve archived data
Consolidate & Retire: Rationalize application portfolio; Move only the needed information; Enable compliance with retention & e-discovery

The data privacy and protection risk continues

Confidential data that should be redacted can be hidden or embedded
April 2010: A PDF of a subpoena in the case of “United States vs. Rob Blagojevich” was posted to a public website. However, the “redacted” text simply had a black box placed on top to hide the content; the actual text was still available.

Unprotected test data is sent to and used by test/development teams as well as third-party consultants
February 2009: An FAA server used for application development & testing was breached, exposing the personally identifiable information of 45,000+ employees.

Confidential data is inadvertently exposed or otherwise available to unauthorized viewers
February 2010: About 600,000 customers of a major NYC bank received their annual tax documents with their Social Security numbers (combined with other numbers & letters) printed on the outside of the envelope.

SQL injection is fast becoming one of the biggest & most high-profile web security threats
July 2010: Hackers obtained access to the user database and administration panel of a popular website by exploiting several SQL injection vulnerabilities. The exposed data included user names, passwords, e-mail addresses and IP addresses.

Larry Ponemon, founder of the group that bears his name, said that survey shows a shift in the way C-level executives think about security software. Investing in data protection, he said, is now seen as less expensive than recovering from a data breach.
-- InformationWeek

Can today’s organizations successfully protect their information?


Where does your sensitive data reside across the enterprise?


How can your data be protected from both authorized and unauthorized access?


Can your confidential data in documents be safeguarded while still enabling the necessary
business data to be shared?


How can access to your enterprise databases be protected, monitored and audited?


Can data in your non-production environments be protected, yet still be usable for training, application development and testing?

Requirements to manage the security and protection of data

Discover & Define: Define policies & metrics; Classify & define data types; Discover where sensitive data resides
Secure & Protect: De-identify confidential data in non-production environments; Safeguard sensitive data in documents; Protect enterprise data from both authorized & unauthorized access
Monitor & Audit: Assess database vulnerabilities; Monitor and enforce database access; Audit and report for compliance

Discover where sensitive data may be hidden


Relationships and sensitive data can’t
always be found just by a simple data
scan


Sensitive data can be embedded
within a field


Sensitive data could be revealed
through relationships across fields
& systems



When dealing with hundreds of tables and millions of rows, this search is complex: you need the right solution

Sensitive Relationship Discovery

System Z Table 25

Code  Name
53    Streptococcus pyogenes
72    Pregnancy
32    Alzheimer Disease
47    H1N1
34    Dermatomycoses

Patient ID # embedded within another field

Compound sensitive data: Test results could potentially be revealed.
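
The case of a patient ID embedded within another field can be illustrated with a pattern scan over every string column, not just the columns whose names look sensitive. This is a minimal sketch under stated assumptions: the ID format (`P` followed by six digits), the column names and the sample rows are all hypothetical. A plain pattern scan like this also does not cover the relationship-based discovery described above.

```python
import re

# Assumption: patient IDs look like "P" followed by six digits (hypothetical format).
PATIENT_ID = re.compile(r"P[0-9]{6}")


def find_embedded_ids(rows, pattern=PATIENT_ID):
    """Return (row_index, column, match) for every pattern hit in any string field."""
    hits = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if isinstance(value, str):
                for match in pattern.findall(value):
                    hits.append((index, column, match))
    return hits


# Hypothetical rows: the first order reference carries a patient ID inside it.
rows = [
    {"order_ref": "LAB-P004217-53", "code": 53},
    {"order_ref": "LAB-000981-72", "code": 72},
]
hits = find_embedded_ids(rows)
```

Here the hit in `order_ref`, joined with the diagnosis code on the same row, is the kind of compound sensitive data the slide warns about.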

Protecting data is both an external and internal issue


Prevent “power users” from abusing their access to
sensitive data (separation of duties)


DBA and power users



Prevent authorized users from misusing sensitive data


For example, third-party or off-shore developers



Prevent intrusion and theft of data


For example, someone walking off with a back-up tape


Hacker


Database vulnerabilities (user id with no password or
default password)


Protection of data requires a 360-degree strategy

Secure sensitive data values
Across both structured and unstructured data

De-identify data
Restricted data sharing with 3rd parties
Generation of fictionalized test data for non-production
Support off-shore deployment model



Stop unauthorized data access


Render data useless via encryption


Lock down SQL to prevent SQL injection


Block suspicious network traffic
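
The presentation does not say how SQL is to be locked down. One common application-level measure, shown here as an assumption rather than as the presenter's method, is to pass user input as query parameters instead of concatenating it into the SQL text. The sketch uses Python's standard `sqlite3` module with a hypothetical `users` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)


def find_user_unsafe(name):
    # Vulnerable: user input is concatenated into the SQL text.
    query = "SELECT email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized: the value is passed separately and never parsed as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


payload = "nobody' OR '1'='1"
leaked = find_user_unsafe(payload)   # the injected OR clause matches every row
blocked = find_user_safe(payload)    # the payload is treated as a literal name
```

Parameterization protects only the code paths that use it, so it complements the database-side monitoring and blocking listed above.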


Security makes it possible for us to take risk, and innovate confidently.

Protect sensitive data values within documents


Redact (or remove) sensitive unstructured data found in documents and forms, protecting
confidential information while supporting the need to share critical business information


Support compliance with industry-specific and global data privacy requirements or mandates



Leverage an automated redaction process for speed, accuracy and efficiency


Ensure hidden source data (or metadata) within documents is redacted as well



Prevent unintentional disclosure by using role-based masking to confidently share data

Ensure multiple file formats are supported, including PDF, text, TIFF and Microsoft Word documents

[Figure: Redact Full Name & Street Address]

De-identify data without impacting test & development

Mask or de-identify sensitive data elements that could be used to identify an individual



Ensure masked data is contextually appropriate to the data it replaced, so as not to impede
testing


Data is realistic but fictional


Masked data is within permissible range of values



Support referential integrity of the masked data elements to prevent errors in testing

Personally identifiable information is masked with realistic but fictional data for testing & development purposes.

[Figure: example names JASON MICHAELS and ROBERT SMITH]
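
The masking requirements above (realistic but fictional values, and referential integrity across tables) can be illustrated with a deterministic substitution. This is a simplified sketch, not how any IBM product implements masking. The name pools, the salt and the table layout are assumptions, and with pools this small different people will often map to the same fictional name.

```python
import hashlib

# Assumption: small illustrative pools of fictional replacement values.
FIRST_NAMES = ["ROBERT", "MARIA", "DAVID", "LINDA"]
LAST_NAMES = ["SMITH", "JONES", "BROWN", "DAVIS"]


def _pick(pool, value, salt):
    """Choose a pool entry from a hash of the salted value, so the choice is repeatable."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    return pool[int.from_bytes(digest[:4], "big") % len(pool)]


def mask_name(full_name, salt="demo-salt"):
    """Map a real name to a fictional one; the same input always gives the same output."""
    first = _pick(FIRST_NAMES, full_name, salt + ":first")
    last = _pick(LAST_NAMES, full_name, salt + ":last")
    return first + " " + last


# Hypothetical tables that refer to the same person by name.
customers = [{"id": 1, "name": "JASON MICHAELS"}]
orders = [{"order": 10, "customer_name": "JASON MICHAELS"}]

masked_customers = [dict(c, name=mask_name(c["name"])) for c in customers]
masked_orders = [dict(o, customer_name=mask_name(o["customer_name"])) for o in orders]
```

Because the mapping depends only on the input and the salt, the masked customer and order rows still match each other, which keeps joins in test environments working.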

Most organizations do not have mechanisms in place to prevent
database administrators and other privileged database users from
reading or tampering with sensitive information [in business
applications]…Fewer than two out of five respondents said they could
prevent such tampering by super users.



-- Independent User Group

What happens with security complacency


Not being able to report compliance can lead to regulatory fines


No audit report mechanism


No fine-grained audit trail of database activities



Don’t know if there is a data breach until it’s too late


Lack of awareness of suspicious access patterns


Ongoing vs. single-event: problems identifying patterns of unauthorized use



Not able to monitor super user activity to ensure data security standards


Unable to detect intentional and unintentional events

Streamline and simplify compliance processes


Alerts of suspicious activity


Audit reporting and sign-offs


User activity


Object creation


Database configuration


Entitlements


Separation of duties: creation of policies vs. reporting on application of policies

Trace users between applications, databases

Fine-grained policies

Sign-off and escalation procedures


Integration with enterprise security systems (SIEM)

IBM provides the solutions required to secure and protect data privacy

Discover & Define: Define policies & metrics; Classify & define data types; Discover where sensitive data resides
Secure & Protect: De-identify confidential data in non-production environments; Safeguard sensitive data in documents; Protect enterprise data from both authorized & unauthorized access
Monitor & Audit: Assess database vulnerabilities; Monitor and enforce database access; Audit and report for compliance

The IBM security strategy: Make security, by design, an enabler of innovative change

IBM as a trusted partner, delivering secure products and services
IBM as a trusted security vendor, providing key solutions across all security domains

15,000 researchers, developers and SMEs on security initiatives
Data Security Steering Committee
Security Architecture Board
Secure Engineering Framework
3,000+ security & risk management patents
200+ security customer references and 50+ published case studies
40+ years of proven success securing the zSeries environment
Managing more than 7 Billion security events per day for clients

Delivering trusted information for smarter business decisions across
your entire information supply chain

[Figure: information supply chain. Labels: Transactional & Collaborative Applications, Business Analytics Applications, External Information Sources, Manage, Integrate, Analyze, Master Data, Content, Data, Streaming Information, Data Warehouses, Cubes, Streams, Big Data. Govern: Quality, Security & Privacy, Lifecycle]

Enabling success

IBM Information Governance Unified Process

Define Business Problem
Obtain Executive Sponsorship
Conduct Maturity Assessment
Build Roadmap
Establish Organization Blueprint
Build Data Dictionary
Understand Data
Create Metadata Repository
Define Metrics
Appoint Data Stewards
Manage Data Quality
Implement Master Data Management
Create Specialized Centers of Excellence (COE)
Manage Security & Privacy
Manage Life-cycle
Measure Results

[Legend: Enable through Process; Enable through Technology]

What can you do next …


Start small with a project, don’t try to do it all at once


Free workshops and assessments


Best of breed solutions to help you succeed



Join a movement: www.infogovcommunity.com


Benchmark your organization online


Work with others on the Maturity Model


Compare best practices in online peer reviews


Be recognized for what you contribute on the leaderboard



Read the book: The IBM Data Governance Unified Process: Driving Business Value with IBM Software and Best Practices

Visit our web page: ibm.com/informationgovernance


Thank you