Analyze Your Content with Trusted Content Analytics

Nov 25, 2013

© 2009 IBM Corporation

The Next Wave of ECM Innovation …

Analyze Your Content with Trusted Content Analytics

Craig Rhinehart

Director of ECM Product Strategy



Craig Rhinehart Contact Info


On my blog this week … What happens when we fail to govern enterprise content properly?

Email me at craigrhinehart@us.ibm.com

My blog can be found at http://craigrhinehart.wordpress.com/

Follow me on Twitter at http://twitter.com/craigrhinehart


Agenda


Introduction to Content Analytics


How Content Analytics Works


New Cognos Content Analytics Offering


Cognos Content Analytics Demo


New InfoSphere Content Assessment Offering


Trusted Content Analytics Overview

Know: InfoSphere Content Assessment
Empower organizations to identify necessary information and decommission the unnecessary

Trust: InfoSphere Master Content
Deliver trusted content to empower better decision making about individual customers

Leverage & Exploit: Cognos Content Analytics
Deliver insight by visualizing trends, correlations and anomalies about your overall business from your content


The world is changing and becoming more…

Instrumented
Interconnected
Intelligent

The resulting explosion of information creates a need for a new kind of intelligence… to help build a Smarter Planet


Creating New Business Optimization Opportunities...

New Intelligence: Pervasive, Real-Time, Predictive

What if you could find crime patterns and apprehend criminals in real-time?

What if you could detect fraudulent claims before they’re paid?

What if you could understand what your customers want before they ask?

What if you could make cities smarter by integrating all information about a citizen?


Business Optimization Enabled by Content Analytics

Smarter Telecommunications: NTT DoCoMo
Analytics over Voice of Customer data provides insight to drive customer-oriented decision making, boosting loyalty and creating new opportunity

Smarter CPG: Kraft Australia
Analytics over online customer postings helps Kraft target and deliver new branding campaigns, increasing sales and customer loyalty.

Smarter Healthcare Plans: Blue Cross Blue Shield of TN
Analytics over an integrated single view of plans, patients and providers enables better negotiations and improves provider satisfaction to over 90%

Smarter Insurance: Large Claims Third-Party Administrator
Analytics over insurance claim files helps detect fraud faster, reducing costs for their clients by $millions and optimizing the claims-handling process



Analytics is Driving the Evolution of ECM

ECM Becomes a Key Enabler for Information-Led Transformation

Image Management
Office Document Management
Archiving / Records Management
Compliance Lifecycle Mgmt
Advanced Workflow
Activity Monitoring
Business Rules

Automation
Optimization

Content
BPM
Advanced Case Management
Trusted Content Analytics
Smarter Business Outcomes

Content Analytics
Content Assessment
Master Content


Every single organization:

1. Keeps too much information and spends too much storing content because there’s too much to sift through

2. Can’t pinpoint the right content when they need it because it’s unfindable or hidden away in a departmental silo

3. Can’t trust the content they do find about their customers because the lifecycle is uncontrolled

4. Needs to deliver better customer service, for less, because those with the best service are rising above the rest in highly competitive markets

5. Wants to optimize their business by
   anticipating their customers’ purchasing needs
   reducing fraud
   delivering a more complete view of their customers
   gaining early warning on product quality and customer satisfaction issues
   because the answers exist inside their organization, they’re just buried underneath too much information

Key Enabling Innovation: Content Analytics


From each document you can derive:

New business understanding
New visibility from content

Create structure and understanding from a group of words

Powered by IBM’s unique Dynamic Analysis capability

Analyzed Documents with identified concepts

John sprained his ankle on the step ...
Noun, Verb, Noun Phrase, Prep Phrase
Person, Injury, Body Part, Location
Extracted Concept: Claimant: Soft Tissue Injury

Based on UIMA, the open, industry-standard architecture for text analysis pioneered by IBM and now an OASIS standard and Apache open-source project

Content Analytics
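
The slide's example, where tagged words in "John sprained his ankle on the step" roll up into the concept "Claimant: Soft Tissue Injury", can be illustrated with a minimal sketch. This is not the UIMA API or IBM's Dynamic Analysis; the `LEXICON`, `SOFT_TISSUE` set, and both functions are hypothetical stand-ins that only show the idea of layering annotations and then deriving a concept from them.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    label: str  # e.g. "Person", "Injury"
    text: str   # the covered text
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Hypothetical toy dictionaries; a real system would use far richer resources.
LEXICON = {
    "Person": ["john"],
    "Injury": ["sprained"],
    "Body Part": ["ankle"],
    "Location": ["step"],
}

# Assumption for illustration: these verbs indicate a soft tissue injury.
SOFT_TISSUE = {"sprained", "strained"}


def annotate(sentence):
    """Return one Annotation per dictionary hit, ordered by position in the text."""
    found = []
    for label, words in LEXICON.items():
        for word in words:
            pattern = r"\b" + re.escape(word) + r"\b"
            for m in re.finditer(pattern, sentence, re.IGNORECASE):
                found.append(Annotation(label, m.group(0), m.start(), m.end()))
    return sorted(found, key=lambda a: a.start)


def extract_concept(annotations):
    """Combine lower-level annotations into a higher-level concept, if any."""
    has_person = any(a.label == "Person" for a in annotations)
    has_soft_tissue = any(
        a.label == "Injury" and a.text.lower() in SOFT_TISSUE for a in annotations
    )
    if has_person and has_soft_tissue:
        return "Claimant: Soft Tissue Injury"
    return None


sentence = "John sprained his ankle on the step"
annotations = annotate(sentence)
concept = extract_concept(annotations)
```

The design point is the two layers: simple annotators mark spans of text, and a separate rule reads those marks to produce the business-level concept.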


Content Analytics enables analysis that was previously impractical

Aggregates conclusions & scales out understanding to large data sets

Source Info (ECM, File, Web, DBMS, ...)

Analyzed Documents with identified concepts

John sprained his ankle on the step ...
Noun, Verb, Noun Phrase, Prep Phrase
Person, Injury, Body Part, Location
Extracted Concept: Claimant: Soft Tissue Injury

Automatic Visualization: concepts and tagged source information are visualized in UI

Content analytics scales out document by document content investigation

Aggregate the conclusions

Assess volumes of information not otherwise humanly possible (or cost effective)
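
As a rough illustration of "aggregate the conclusions", the sketch below counts how many documents mention each extracted concept. The `analyzed` mapping and its claim identifiers are invented sample data, and the `aggregate` helper is hypothetical; the source does not describe how the product performs this step.

```python
from collections import Counter

# Hypothetical output of a per-document analysis step:
# document id -> concepts extracted from that document.
analyzed = {
    "claim-001": ["Soft Tissue Injury", "Unwitnessed Event"],
    "claim-002": ["Soft Tissue Injury"],
    "claim-003": ["Fracture", "Unwitnessed Event"],
    "claim-004": ["Soft Tissue Injury", "Prior Injury"],
}


def aggregate(docs):
    """Count the number of documents in which each concept appears."""
    counts = Counter()
    for concepts in docs.values():
        counts.update(set(concepts))  # count a concept once per document
    return counts


totals = aggregate(analyzed)
top_concept, top_count = totals.most_common(1)[0]
```

With counts per concept in hand, a user interface can chart them or let a reader drill into the documents behind any one value.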


Dynamic Analysis: Basis for Trusted Content Analytics Solutions

Impractical and overwhelming analyses are now a reality

Aggregate … form collections from multiple content sources and types unmatched in industry

Correlate … deep analysis of content that surfaces trends, relationships, patterns, concepts and anomalous associations

Visualize … easy to use, feature-rich views to quickly dissect large corpora of content and zero in on answers

Explore … freely investigate content with faceted navigation and drill down to surface new insight and understanding.

IBM’s unique Dynamic Analysis capability … to enable informed business decisions


Result: A Platform for Uncovering New Insights

Find early warnings on product quality concerns

Separate the valuable content from the unnecessary

Identify potentially fraudulent insurance claims

Determine what customers will buy

Tells you something you may not know


Based on UIMA

Unstructured Information Management Architecture

It is an open, industrial-strength, scalable and extensible platform for creating, integrating and deploying unstructured information management solutions from combinations of semantic analysis and search components.

Although UIMA originated at IBM, it is now an OASIS industry standard and an Open Source project which is currently incubating at the Apache Software Foundation.

http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.index.html

John sprained his ankle on the step ...
Noun, Verb, Noun Phrase, Prep Phrase
Person, Injury, Body Part, Location
Extracted Concept: Claimant: Soft Tissue Injury

Automated Concept Extraction and Logical Organization

UIMA Annotators
Identify Language
Word Analytics
Named Entity Extraction
Automatic Classifier
Plug-in Custom Analytics
Multi-word Analytics
Tokenization

Enhanced Metadata
Analytics Index
Visualization UI
Crawlers
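
The annotator boxes on this slide suggest a chain of analysis steps that each enrich a shared document record, with room for a plug-in step. The sketch below shows that chaining pattern only. It does not use the real UIMA framework (which is Java-based); the `Document` class, every annotator function, and the toy heuristics inside them are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)  # enriched by each annotator


def identify_language(doc):
    # Toy heuristic (assumption): a few common English words imply English.
    words = set(doc.text.lower().split())
    doc.metadata["language"] = "en" if words & {"the", "his", "on"} else "unknown"


def tokenize(doc):
    doc.metadata["tokens"] = doc.text.lower().split()


def named_entities(doc):
    # Toy rule (assumption): a capitalized word is a candidate entity.
    doc.metadata["entities"] = [w for w in doc.text.split() if w[:1].isupper()]


def custom_injury_flag(doc):
    # Stand-in for a plug-in custom analytic; relies on tokenize() having run.
    injury_terms = {"sprained", "fractured"}
    doc.metadata["injury"] = any(t in injury_terms for t in doc.metadata["tokens"])


def run_pipeline(doc, annotators):
    """Apply each annotator in order; later steps may read earlier results."""
    for annotator in annotators:
        annotator(doc)
    return doc


PIPELINE = [identify_language, tokenize, named_entities, custom_injury_flag]
doc = run_pipeline(Document("John sprained his ankle on the step"), PIPELINE)
```

Because the pipeline is just an ordered list, adding a custom analytic means appending one more function, which is the extensibility the slide points to.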

Using Dynamic Analysis, Cognos Content Analytics powers solutions that can:

Drive new business understanding and visibility leveraging the content & context of unstructured information

Enable better business decisions by explaining why events are occurring

Expose patterns and trends to highlight optimization opportunities and create differentiation

Create cost savings by uncovering process inefficiencies and optimization opportunity

All without prior knowledge or pre-defined queries or reports

The impact:

Improved customer satisfaction

Reduced fraud

Better understanding of market demand and perception

Early warning on product quality issues

IBM Cognos Content Analytics

Deliver insight about your overall business from your content

Leverage & Exploit

I need to reduce fraud

I need to better anticipate my customers’ needs

I need better visibility into the marketplace

I need to get ahead of product quality problems

I need to fight crime faster

I need to make my legal team more efficient

I need to improve my customer sat metrics

I need to optimize my claims process

I need to anticipate compliance violations

I need to assess my content & take action to better manage it


IBM Cognos Content Analytics features…



Analyze and explore structured and unstructured information

Automatic extraction of meaningful concepts and entities from text

Open, standard UIMA-based text analysis pipeline

Integration with Cognos for reporting against unstructured concepts

Multiple graphical views of the facets (dimensions) of unstructured content

Automatic highlighting of interesting anomalies and correlations in the data

Support for analysis of over 30 content sources and over 150 content formats

Integration with ICM for analysis of document categories, classes, and clusters

Highly scalable and extensible


Cognos Content Analytics adds value to…

Telco Customer Care
Analyzing: Call center logs and emails
For: Churn prediction and FAQ generation
Benefits: Improved customer retention & customer satisfaction

Automotive Quality Insight
Analyzing: Tech notes, call logs, online media
For: Brand Reputation Management
Benefits: Reduce warranty costs, improve customer satisfaction, marketing campaigns

Retail Customer Care
Analyzing: Call logs, online media
For: Brand Reputation Management
Benefits: Improve customer sat, marketing campaigns

Crime Analytics
Analyzing: Police records, 911 calls…
For: Rapid crime solving & crime trend analysis
Benefits: Safer communities & optimized force deployment

Retail Banking Customer Care
Analyzing: Call logs, online media
For: Buyer Behavior
Benefits: Improve customer satisfaction, marketing campaigns, find new revenue opportunities

Healthcare Analytics
Analyzing: Care records
For: Clinical analysis; treatment protocol optimization
Benefits: Better management of chronic diseases; optimized drug formularies; improved patient outcomes

Insurance Fraud
Analyzing: Insurance claims
For: Detecting fraudulent activity & patterns
Benefits: Reduced losses, faster detection, more efficient claims processes

...and more!


Insurance Case Study for Fraud Detection and Prediction

1. Automatically aggregate structured and unstructured data accumulated over time from the claims process

2. Correlate text analytics to apply meaning and understand patterns and trends … visualize and explore to uncover new insights into claims process

3. Instrument by applying indicators to “in process” claims to identify suspicious claims and type of risk

4. Score suspicious claims to predict probability and impact of fraud and risks

5. Route high-likelihood and/or high-impact claims for investigation based on scoring outcomes

6. Continuously improve outcomes through closed loop optimization
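
Steps 3 to 5 above (apply indicators, score, route) can be sketched as follows. The weights, thresholds, and claim records are invented for illustration; the deck does not say how the actual solution scores claims, so treat this as one simple additive scheme under stated assumptions.

```python
# Hypothetical point weights for the fraud indicators named on the slide.
INDICATOR_POINTS = {
    "Soft Tissue Injury": 20,
    "Unwitnessed Event": 30,
    "Prior Injury": 20,
    "Multiple Claims": 30,
}

ROUTE_THRESHOLD = 50         # assumption: score at which a claim is high-likelihood
HIGH_IMPACT_AMOUNT = 50_000  # assumption: claim amount considered high-impact


def score(indicators):
    """Sum the points of the distinct indicators found on a claim (0 to 100)."""
    return sum(INDICATOR_POINTS.get(name, 0) for name in set(indicators))


def route(claim):
    """Send high-likelihood and/or high-impact suspicious claims to investigation."""
    points = score(claim["indicators"])
    high_likelihood = points >= ROUTE_THRESHOLD
    high_impact = points > 0 and claim["amount"] >= HIGH_IMPACT_AMOUNT
    return "investigation" if high_likelihood or high_impact else "standard processing"


claims = [
    {"id": "A", "indicators": ["Soft Tissue Injury", "Unwitnessed Event"], "amount": 4_000},
    {"id": "B", "indicators": ["Prior Injury"], "amount": 80_000},
    {"id": "C", "indicators": [], "amount": 90_000},
    {"id": "D", "indicators": ["Soft Tissue Injury"], "amount": 2_000},
]
decisions = {c["id"]: route(c) for c in claims}
```

Step 6, the closed loop, would correspond to adjusting the weights and thresholds as investigation outcomes come back; that part is not shown here.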

Claims Process

Historical Cross-Claim Content Analytics

Content Analytics Based Predictive Fraud Indicators:

Soft Tissue Injury
Unwitnessed Event
Prior Injury
Multiple Claims …

Automatic Routing to Investigations


Partner Solution for Healthcare Fraud Analytics


Accelerating Regulatory Review




The Customer Problem:

EPA tracks chemicals being produced

Chemical producers submit robust reports of effects on environment

EPA has 3,000 of these reports and no way to analyze the data

The Results:


Convert documents to XML

Extract complex chemical structures from the documents

Provided toxicological capability to understand how different chemicals map to “end effects” (e.g. increase in liver weight)

Provide ability to analyze chemical structures in reports and, using patent data, understand how these chemicals are being used in the environment

The Solution:

Environmental Protection Agency


Better Business Outcome: NYPD is Solving More Crime Faster with New Insight from Content Analytics

Govern The Information Lifecycle
Identify and Designate Trusted Repositories of Record
Create, Control, Maintain and Supply Trusted Content
Consume, Leverage and Exploit Trusted Information
Archive, Record and Preserve Information and Evidence of Transactions, Processes and Events

Challenge

Search and analyze complaints, police reports, 911 records, arrest records, and data marts … all stuck in silos of information

All of these forms of text suffer from the common problems of call center text, e.g. abbreviations, misspellings, synonyms (police-specific, e.g. perp, ML, FM, MO, pistol, gun, etc.)

Find events that keyword search can never find because they are all described differently: what keyword to use?

Solution

IBM OmniFind Enterprise Edition with Content Analytics enables insight and understanding across all silos

Customized with NYPD-specific case management analytics

The Results

Text Analytics can describe events, categorize them and allow for concept searches across often unstructured and at times inaccurate descriptions

Enables aggregated view of information beyond silos

In the first week of deployment two old murder cases were solved which were directly attributed to being able to analyze trusted data and content
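
The keyword problem this slide describes (one event written up as "pistol" in one report and "gun" in another) is what a concept search addresses. The sketch below maps variant terms onto a shared concept before searching. The `CONCEPTS` table and the sample reports are invented, and the approach is a plain synonym lookup that does not handle misspellings; it is not a description of how OmniFind works.

```python
import re

# Hypothetical synonym lists; real deployments would need domain-built vocabularies.
CONCEPTS = {
    "firearm": {"gun", "pistol", "handgun", "revolver"},
    "suspect": {"perp", "perpetrator", "suspect"},
}

# Invert the table so each surface term points at its concept.
TERM_TO_CONCEPT = {term: concept for concept, terms in CONCEPTS.items() for term in terms}


def concepts_in(text):
    """Return the set of concepts whose terms appear in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT}


def concept_search(reports, concept):
    """Return ids of reports mentioning any term that maps to the concept."""
    return [rid for rid, text in reports.items() if concept in concepts_in(text)]


reports = {
    "r1": "Perp fled on foot carrying a pistol",
    "r2": "Witness saw a man with a gun near the store",
    "r3": "Vehicle reported stolen overnight",
}
firearm_hits = concept_search(reports, "firearm")
suspect_hits = concept_search(reports, "suspect")
```

A keyword search for "gun" would miss report r1; searching on the concept finds both.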


Accelerating Crime Analysis (Law Enforcement)


Europol

Customer observed “that a too significant part (estimation of 76%) of the analyst’s time is spent in non real analysis tasks with no real added value for their analysis business”

“Enable the analysts to cope with the increasingly large volumes of intelligence information that they are receiving”

“Automatically extract and find relevant information (facts, entities, link, etc.) useful for the analysis without having to spend hours to examine and manually parse data collection.”

Solution based on Content Analytics with search front-end built with IBM OmniFind Enterprise Edition on top of an ECM system


Europol Example

Concepts such as cars, people, and crime events are extracted from the underlying text by text analysis technology

Dynamic refinement of user query, based on detected concepts

FDA MedWatch incident reports are one source of data for medical device manufacturers to understand problems being reported by consumers about their products. They contain both structured and unstructured information.

A manufacturer could also analyze internal content, such as warranty claims or support incidents.



This view shows Deviations (or anomalies) over time for all values of the selected facet, in this case Generic Device Name.



Here we see an unexpectedly high occurrence of incidents around Infusion Pumps in April, 2008, so we drill in.
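
One simple way to surface an "unexpectedly high occurrence" like the one in this view is to compare each period's count against the average of the other periods. The sketch below does that. The monthly counts are invented, the `deviations` function and its threshold are hypothetical, and the deck does not state which statistic the product actually uses.

```python
from statistics import mean

# Invented monthly incident counts for a single facet value (one device type).
monthly_incidents = {
    "2008-01": 40,
    "2008-02": 38,
    "2008-03": 42,
    "2008-04": 120,
    "2008-05": 41,
}


def deviations(series, threshold=2.0):
    """Flag periods whose count is at least `threshold` times the mean of the others."""
    flagged = {}
    for period, observed in series.items():
        baseline = mean(v for p, v in series.items() if p != period)
        ratio = observed / baseline
        if ratio >= threshold:
            flagged[period] = ratio
    return flagged


flagged = deviations(monthly_incidents)
```

Here only April 2008 is flagged, at roughly three times the level of the surrounding months, which is the kind of signal that would prompt the drill-down shown on the slide.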



Switching to the Facets view of key phrases, we see frequent mentions of battery issues in Infusion Pump incidents reported in April, 2008. We drill down into these battery issues.



In the documents view, we can see the original source documents about these 154 battery-related infusion pump incidents.

Relevant matching text from the original documents is highlighted.



Switching to a Brand Name facet view, we can immediately see a summary, by frequency and correlation, of the devices that are mentioned in these battery-related incidents.
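
A summary "by frequency and correlation" can be approximated as follows: frequency is how many documents in the current selection carry a facet value, and correlation is taken here as how over-represented that value is in the selection compared with the whole collection. That ratio is an assumption chosen for illustration; the deck does not define the product's correlation measure. The records and brand names are invented.

```python
def facet_summary(docs, facet, condition):
    """Summarize facet values within the documents matching `condition`.

    Returns (value, frequency, correlation) rows sorted by frequency, where
    correlation = share of the value in the selection / share in all documents.
    """
    selection = [d for d in docs if condition(d)]
    rows = []
    for value in sorted({d[facet] for d in selection}):
        in_selection = sum(1 for d in selection if d[facet] == value)
        in_all = sum(1 for d in docs if d[facet] == value)
        correlation = (in_selection * len(docs)) / (len(selection) * in_all)
        rows.append((value, in_selection, round(correlation, 2)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented incident records: 10 in total, 4 of them battery-related.
docs = (
    [{"brand": "Brand A", "battery": True}] * 3
    + [{"brand": "Brand A", "battery": False}] * 2
    + [{"brand": "Brand B", "battery": True}] * 1
    + [{"brand": "Brand B", "battery": False}] * 2
    + [{"brand": "Brand C", "battery": False}] * 2
)
summary = facet_summary(docs, "brand", lambda d: d["battery"])
```

A correlation above 1 means the brand shows up in battery-related incidents more often than its overall share would suggest; below 1 means less often.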


Through Cognos Content Analytics OLAP/Star Schema export ability, Cognos BI reports and dashboards can be created to monitor and track these issues over time.
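
To make the star schema idea concrete, the sketch below builds a tiny fact table with two dimension tables in an in-memory SQLite database and runs the kind of grouped query a report might use. The table names, columns, and rows are all invented; the actual layout of the Cognos Content Analytics export is not given in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table referencing two dimension tables.
conn.executescript(
    """
    CREATE TABLE dim_device (device_id INTEGER PRIMARY KEY, generic_name TEXT);
    CREATE TABLE dim_month  (month_id  INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_incident (
        device_id      INTEGER REFERENCES dim_device(device_id),
        month_id       INTEGER REFERENCES dim_month(month_id),
        key_phrase     TEXT,
        incident_count INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO dim_device VALUES (?, ?)",
    [(1, "Infusion Pump"), (2, "Glucose Meter")],
)
conn.executemany(
    "INSERT INTO dim_month VALUES (?, ?)",
    [(1, "2008-03"), (2, "2008-04")],
)
conn.executemany(
    "INSERT INTO fact_incident VALUES (?, ?, ?, ?)",
    [(1, 1, "battery", 12), (1, 2, "battery", 95), (1, 2, "alarm", 20), (2, 2, "battery", 3)],
)

# Report query: total incidents per device per month.
report = conn.execute(
    """
    SELECT d.generic_name, m.month, SUM(f.incident_count)
    FROM fact_incident AS f
    JOIN dim_device AS d ON d.device_id = f.device_id
    JOIN dim_month  AS m ON m.month_id  = f.month_id
    GROUP BY d.generic_name, m.month
    ORDER BY d.generic_name, m.month
    """
).fetchall()
```

Once extracted concepts sit in tables like these, ordinary BI tooling can track them over time alongside structured data.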



When a potential regulatory, legal, or compliance issue is identified, the same Content Analytics interface can be used to identify internal documents that might be relevant, gather them, and export them for archiving into a centralized IBM ECM repository.



The IBM Content Collector provides a graphical interface for coordinating the archiving of these, and other relevant items (such as related emails).

Emails and documents can be classified, declared as records and even have metadata cleansed prior to becoming a managed or archived item.



Once gathered into a repository, IBM eDiscovery tools can be used to place legal holds on items, and prepare evidence for legal cases, audits, or other compliance events.

Retention and legal holds can be enforced within the storage infrastructure if using IBM Information Archive.



Specific subsets of evidence can be marked for further review to identify the degree of risk or legal exposure.

Unnecessary Information Eclipses Necessary Information


Unnecessary Information: Over-Retained, Irrelevant, Duplicated

Necessary Information: Valued, High Risk, Compliant

How much of your information is unnecessary? 70%? 80%? 90%?

Content Assessment Enables Content Decommissioning

Content In The Wild
Bloated Production Systems with Inefficient Storage
Content Based Systems Needing Retirement

Keep: Trusted Content
Decommission: Unnecessary Information

Semi-automated process separates trusted from suspected

Efficiently addresses large-scale problems, while incorporating the human element

One customer found 1200 copies of the same policy document across multiple enterprise file servers
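
Finding many copies of the same document, as in the customer example above, is commonly done by hashing file contents and grouping identical digests. The sketch below shows that technique on in-memory sample data. The paths and contents are invented, and this is only one way to detect exact duplicates; the deck does not say how Content Assessment identifies them.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group paths whose contents are byte-for-byte identical.

    `files` maps a path to its content as bytes. Returns a list of groups,
    each a sorted list of paths sharing the same SHA-256 digest.
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Invented sample standing in for files crawled from several file servers.
files = {
    "hr/policy.doc": b"Travel policy v3",
    "legal/policy_copy.doc": b"Travel policy v3",
    "sales/old/policy.doc": b"Travel policy v3",
    "hr/handbook.doc": b"Employee handbook",
}
duplicate_groups = find_duplicates(files)
```

Exact hashing will not catch near-duplicates such as the same policy saved with a changed date, which is one reason the slide describes the process as semi-automated with a human element.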


IBM InfoSphere Content Assessment

Housekeeping doesn’t have to be a chore.

1. Dynamically Analyze what you have

Aggregate, Correlate, Visualize and Explore your enterprise information in new ways to understand virtually all content types from multiple sources. Make rapid decisions about business value, relevance and disposition.

2. Decommission what’s unnecessary

Save cost and reduce risk by eliminating obsolete, over-retained, duplicate, and irrelevant content and the infrastructure that supports it.

3. Preserve and Exploit the content that matters

Collect valued content to manage, trust and govern throughout its lifespan in an enterprise-grade ECM platform. Uncover new business value and insight by integrating with solutions for eDiscovery, case management, master data management, business intelligence, predictive analytics and more.


Selling Content Assessment via BVA

Content decommissioning, dynamic collection for eDiscovery lead to measurable ROI

Cost Drivers                                      Savings After Deployment

Production System Tangible Costs                  Storage Management Tangible Savings
  Email / File / SharePoint Storage               50%-80%
  Production System Servers                       40%-60%
  Backup                                          Cost of backup media and storage

Production System Productivity Costs              Storage Management Productivity Savings
  Production System Administration                20% to 80%
  End-User Administration / Classification        70% to 90%

eDiscovery Costs                                  eDiscovery Cost Avoidance
  Data Spoliation (fines, lost or settled cases)  Up to 100%
  Labor costs of providing the information        Hours vs. Days


Trusted Content Analytics Summary

Know: InfoSphere Content Assessment
Empower organizations to identify necessary information and decommission the unnecessary

Trust: InfoSphere Master Content
Deliver trusted content to empower better decision making about individual customers

Leverage & Exploit: Cognos Content Analytics
Deliver insight by visualizing trends, correlations and anomalies about your overall business from your content



Email me at craigrhinehart@us.ibm.com

My blog can be found at http://craigrhinehart.wordpress.com/

Follow me on Twitter at http://twitter.com/craigrhinehart

Craig Rhinehart

Director of ECM Product Strategy