IBM Content Discovery solutions


Aug 7, 2012 (5 years and 12 days ago)


IBM Software Group


Nigel Freeman

Content Discovery specialist, IBM Software Group


Nigel_Freeman@uk.ibm.com



May 2006

Information is Everywhere

Managing Information for Discovery and Search


Agenda


Too much information: drowning or swimming?

IBM is going beyond mere ‘search’… IBM Content Discovery architecture

Content Integration services: making connections between existing systems
Information Integration Content Edition overview

Enterprise Search: not the same as Internet search
What do you need from Enterprise Search and text analytics middleware?
OmniFind overview

Text Analysis: Unstructured Information Management Architecture (UIMA)

Contextual Delivery, Information Accelerators to generate customer solutions
WebSphere Content Discovery Server overview

IBM Content Discovery products, summary

Customer Examples


Drowning in information, or swimming?


Organisations today are faced with an ever-growing abundance of information. The lack of proper systems to access and manage their collective wisdom can cripple an organisation: not being able to find the relevant information when it is needed, or finding it too late, translates into bad decisions, missed opportunities, and time and money wasted reinventing information that already exists.


“It is clear that we are all drowning in a sea of information. The challenge is to learn to swim in that sea, rather than drown in it.”
(from a study by the University of California, Berkeley School of Information Management and Systems)



By implementing cutting-edge systems for organizing and accessing information, organisations can promote growth at significantly reduced cost.


“An enterprise with 1,000 knowledge workers wastes $48,000 per week ($2.5 million per year) due to an inability to locate and retrieve information.”
(The High Cost of Not Finding Information, IDC)





IBM w3 advertisement

“w3 personalisation…”


Information is isolated in multiple silos …

Independent Systems: Customer Service, Council Tax, Social Services, Education, Leisure Services, Planning, Housing

The problem…


… and the vast majority is unstructured


Examples
Office documents
Images
Web pages
E-mail
Audio & Video
Free-form text fields (comments/notes)

Where It Exists
File servers
Websites
Portals
ECM systems
Collaborative systems
Databases (BLOBs and free-form text fields)


Typical search experience is not good enough

“Loan”

I need help finding a loan for college

Typical Online Experience

Burden of discovery is on the end user!


There is inherent tension between business and IT


Line-of-Business Owners and Project Leads
Must deliver information to their specific customers, partners and employees to facilitate business process
Care most about best-of-breed functionality and direct control over the end user experience

IT Architects and CIOs
Must make information available from across the enterprise in a secure and standard format
Care most about achieving leverage and reuse, with a low total cost of ownership

Diagram labels: Search App 1, Search App 2, Search App 3; Enterprise Search Infrastructure


The IBM Approach: Content Discovery

Information is isolated in multiple silos
Native, bi-directional access ensures all assets are available and content can be continually improved

Much of it is unstructured, limiting its use
Uncovering the inherent meaning of unstructured content can enhance search relevance, giving new levels of business insight

Traditional search is a bottleneck to facilitating action
Understanding user intent and application context allows organizations to get the right information to the right people at the right time

IT wants standards but business wants control
Complete solutions built on a Service Oriented Architecture allow organisations to balance the needs of business and IT

Going Beyond “Search” to “Find”


Content Discovery

IBM Content Discovery Architecture

Analysis & Discovery Services
Search & Indexing: Scalable search capability with sophisticated indexing and retrieval
Text Analysis (UIMA): Extract knowledge and meaning, for greater relevance and insight
Contextual Delivery: Understand user intent and context, to guide action and navigate large result sets

Content Integration Services: Broad content access and native integration for secure read and write access

Information Accelerators: Industry vocabularies and solution templates shorten deployment time


The Problem: Multiple Silos of Content

Chart: number of content repositories in use. Categories: 1 repository; 2-5 repositories; 6-10 repositories; 10-15 repositories; more than 15 repositories; don't know. Reported shares (not matched to categories): 36%, 14%, 25%, 17%, 5%, 4%.

Survey base: 81 North American decision-makers (multiple responses accepted)

Source: “The Future of Content in the Enterprise,” Connie Moore and Robert Markham


WebSphere II Content Edition


SOA, enterprise-class integration architecture for “content”
Single interface to multiple content sources and workflow systems
Many “out of the box” connectors and a toolkit for custom connectors
Two-way access to expose underlying functionality
Adds cross-repository services such as federated search, event services, single sign-on, etc.
“Out of the box” client, development components and APIs for building custom applications

Application areas shown: call center, compliance, self-service, CRM, websites

Lets you work with content from multiple disparate content sources as if it were stored in one unified system


Content Integration Services
Seamless Access to Distributed Content from Business Applications

Provide a single point of access to all documents associated with the customer, regardless of where they are stored
Display associated metadata with the ability to preview a document and update content or properties


WebSphere II Content Edition Integration Services

Many Out-of-the-Box Connectors
Pre-built and fully supported real-time, bi-directional connectors
Expose content, workflow and functionality of underlying systems
Available for most major commercial systems, including: Documentum Content Server, FileNet Content Services, FileNet Image Services, FileNet P8 Content Manager, FileNet P8 Business Process Manager, Hummingbird DM, IBM Content Manager, IBM Content Manager OnDemand, IBM Portal Document Manager, Lotus Domino Document Manager, IBM Lotus Notes, IBM WebSphere MQ Workflow, Interwoven TeamSite Content Server, Microsoft Index Server, OpenText Livelink Enterprise Server, Stellent Content Server, File Systems, Lab Services, Partner Connectors

Connector SDK for custom systems



WebSphere II Content Edition Federation Services

Metadata Mapping: Common schema across different systems
Federated Search: Single search interface across multiple disparate systems
Virtual Repository: Single, unified view of distributed content; consolidated view of work tasks from multiple workflow systems
Subscription Event Services: Subscription-based notification of changes to content, across multiple repositories
View Services: Convert content on-the-fly to browser-readable formats (e.g. PDF, HTML)
Single Sign-On (SSO) authentication: Native integration with LDAP and Active Directory

Layers shown: Integration Services, Federation Services
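
The metadata mapping and federated search ideas on this slide can be illustrated with a minimal Python sketch. It is not the WebSphere II Content Edition API: the `Repository` class, the `federated_search` function, the repository names and every field name are hypothetical, chosen only to show how a common schema lets one query fan out to systems that use different native field names.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Hypothetical stand-in for one connected content system."""
    name: str
    field_map: dict   # common field name -> this system's native field name
    documents: list   # records keyed by native field names

    def search(self, **criteria):
        """Match on common field names and return hits in the common schema."""
        hits = []
        for doc in self.documents:
            if all(doc.get(self.field_map.get(f)) == v for f, v in criteria.items()):
                hit = {common: doc.get(native) for common, native in self.field_map.items()}
                hit["repository"] = self.name
                hits.append(hit)
        return hits


def federated_search(repositories, **criteria):
    """Send the same common-schema query to every repository and merge the hits."""
    results = []
    for repo in repositories:
        results.extend(repo.search(**criteria))
    return results


# Two made-up repositories that name the same concepts differently.
archive = Repository(
    "archive",
    {"customer_id": "CUST_NO", "title": "DOC_NAME"},
    [{"CUST_NO": "42", "DOC_NAME": "Loan agreement"},
     {"CUST_NO": "7", "DOC_NAME": "Complaint"}],
)
mail_store = Repository(
    "mail_store",
    {"customer_id": "customerRef", "title": "subject"},
    [{"customerRef": "42", "subject": "Re: loan query"}],
)

hits = federated_search([archive, mail_store], customer_id="42")
for hit in hits:
    print(hit["repository"], hit["title"])
```

The caller only ever sees the common field names (`customer_id`, `title`); each repository translates them through its own mapping.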


WebSphere II Content Edition Developer Services

Federated Client
Complete out-of-the-box UI for working with distributed content
Includes key functionality and a highly usable interface

Web Components
Accelerate time to market for custom applications
Development components plug into web applications
Completely customizable look and feel
Includes JSR 168 compliant portlets

WebSphere II Content Edition API
Complete access to content and workflow functionality
Easy to use Java API and SOAP-based Web Services API

Layers shown: Integration Services, Federation Services, Developer Services


IBM Federated Records Management
…the application of records management to distributed content

Consists of
IBM DB2 Records Manager, WebSphere II Content Edition, FRM Solution Components*

Key Features
Central policy management on distributed content
“Touchless” records declaration
Federated search for discovery operations
Two-way, consistent UI to content systems

Business Value
Reduce risk with centralized RM policies
Accelerate time to compliance
Reduce discovery costs
Consolidate over a phased timeframe
Provide a “future proof” infrastructure

Diagram: DB2 Records Manager with two options for record storage
1. Leave records in native repository (DCTM, FILE, OTEX, HUMC, other content repositories)
2. Move records to strategic repository (DB2 Content Manager) at declaration

*Services Offering


OmniFind: it’s not Google… because Intranet Search is different from Internet Search

Corporate intranets are smaller… but it’s more difficult to return highly relevant results
Less content in a corporate intranet… lower chance of a perfectly matching document
Less well linked: fewer links and anchor text cues, so page ranking isn’t the answer
The heterogeneous nature of the content (both in form and size) makes search precision difficult


Q26: For which solutions do you plan to keep your existing tool, and for which would you like the portal to provide?

Keep existing tool   Portal to provide   Solution
32%                  68%                 Search
39%                  61%                 Content management
40%                  60%                 Reporting
41%                  59%                 Authentication/single sign on
42%                  59%                 Process automation/workflow
43%                  57%                 Collaboration
43%                  57%                 Directory
46%                  54%                 Enterprise application integration (EAI)
52%                  48%                 Taxonomy
60%                  41%                 Activity Tracking
63%                  37%                 Application server
68%                  32%                 Desktop productivity (spreadsheet, word processing, etc.)
79%                  21%                 Windows desktop

* Base = Those with portal solutions implemented, planned or under evaluation.

Search and content management are the top two capabilities expected by 289 Portal customers

Reference: Enterprise Portal Purchase and Usage Characteristics, Final Report, META Group Multi-Client Study, November 2003



WebSphere II OmniFind Edition

Crawl, Index, Search

Excellent search quality
Complements and uses IBM’s offerings in portal, content management, and Information Integration
Crawls a broad range of enterprise data sources
Leverages systems’ own security mechanisms
Open architecture (UIMA) for text analytics and semantic queries
Rich multilingual capabilities

Capabilities shown: Keyword search, Semantic search, Text analysis


Key Technologies

Sources of Enterprise Content

Crawling: Scalable Web crawler; Data Source crawlers; Custom Crawlers
Parsing/Tokenizing: HTML / XML; 200+ Doc Filters; Advanced Linguistics
Text Analytics: Partner Apps; UIMA
Indexing: Global Analysis; Static Ranking; Store
Searching: Categorization (optional); Dynamic & Admin-influenced ranking; Fielded Search; Parametric Search; Semantic search

Search Applications

Security


OmniFind Crawlers

Web content
  HTTP / HTTPS
  News groups (NNTP)
  WebSphere Portal portlets and Portal Document Manager

Collaboration
  Lotus Notes / Domino databases, Domino.Doc, QuickPlace
  MS Exchange public folders

Windows and Unix file systems
  Over 250 file formats: PDF, MS Word / Excel / PowerPoint, Lotus SmartSuite, etc.

Enterprise Content Management systems
  DB2 Content Manager
  Via WebSphere Information Integrator Content Edition: FileNet Content Services, FileNet P8, Documentum, Hummingbird DM, OpenText LiveLink and more in future

Relational data sources
  DB2 family (DB2, Informix, DB2 for z/OS)
  WS Information Integrator relational data sources (Oracle, Informix, MS SQL Server, Sybase)
  Federated access to LDAP and JDBC

Data Listener API for custom crawlers

Diagram labels: II Standard Edition, Content Manager, QuickPlace, Domino, Domino.doc, MS Exchange, Windows File System, Unix File System, Websites, Newsgroups, Data Listener, II Content Edition, SQL Server


OmniFind Security

Security can be set at Collection level or Document level

OmniFind uses the application’s own security for Access-Control Lists for the following data sources:
Lotus Notes / Domino
Domino Document Manager
QuickPlace
WebSphere Portal Document Manager
Portal pages
FileNet CS
Windows File System
Documentum
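
As a rough illustration of collection-level and document-level security, the following Python sketch filters search hits against a user's identity. It does not describe how OmniFind implements this: the `Hit` class, the `filter_hits` function and the idea that each indexed document carries a copy of its source system's access-control list are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical search hit carrying security information."""
    doc_id: str
    collection: str
    allowed_principals: frozenset  # assumed to mirror the source system's ACL


def filter_hits(hits, user_principals, allowed_collections):
    """Keep only hits the user may see.

    Collection-level check: the user must have access to the hit's collection.
    Document-level check: the user (or one of the user's groups) must appear
    in the document's access-control list.
    """
    visible = []
    for hit in hits:
        if hit.collection not in allowed_collections:
            continue
        if hit.allowed_principals & user_principals:
            visible.append(hit)
    return visible


hits = [
    Hit("notes-1", "engineering", frozenset({"engineers"})),
    Hit("notes-2", "engineering", frozenset({"managers"})),
    Hit("hr-1", "hr", frozenset({"engineers"})),
]
visible = filter_hits(hits, frozenset({"alice", "engineers"}), {"engineering"})
print([hit.doc_id for hit in visible])
```

Only `notes-1` passes both checks: `notes-2` fails the document-level check and `hr-1` fails the collection-level check.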


Language Support in OmniFind

Linguistic Support

OmniFind has linguistic support for: Chinese (Simplified & Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Japanese, Korean, Norwegian (Bokmål & Nynorsk), Polish, Portuguese, Russian, Spanish, Swedish

The document language is detected automatically and used for language-specific result filtering at search time. Language-specific base form computation (e.g. “mouse” for “mice”) is provided.

Automatic language detection also works for Arabic, Hebrew, Hungarian and Turkish (but no base form support yet).

Basic Support

Text is segmented using either white space information (for simple text languages) or n-grams (for complex text languages).

If simple and complex script languages are mixed in one document, the best segmentation strategy (either white space or n-gram) is selected for each individual script range within the document.

Basic support processing should work for all languages. No language limitation is built into OmniFind.

IBM tests basic support for the following list of languages:

Simple Text Languages (STL): Albanian, Bulgarian, Belarusian, Catalan, Croatian, Estonian, Hungarian, Icelandic, Indonesian, Kazakh, Latvian, Lithuanian, Macedonian, Malay, Romanian, Serbian (Cyrillic & Latin), Slovak, Slovenian, Turkish, Ukrainian

Complex Text Languages (CTL): Arabic, Bengali, Gujarati, Hebrew, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu, Thai, Vietnamese
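
The basic-support segmentation strategy (white space for simple text, n-grams for complex text, chosen per script range) can be sketched in a few lines of Python. This is an illustration only, not OmniFind's algorithm: the function names, the bigram size and the way complex scripts are recognised (by Unicode character-name prefix) are all assumptions.

```python
import unicodedata

# Assumption for this sketch: a character belongs to a "complex" script
# if its Unicode name starts with one of these prefixes.
COMPLEX_SCRIPT_PREFIXES = ("THAI", "DEVANAGARI", "ARABIC", "HEBREW")


def is_complex_script(ch):
    return unicodedata.name(ch, "").startswith(COMPLEX_SCRIPT_PREFIXES)


def char_ngrams(run, n=2):
    """Overlapping character n-grams; a run shorter than n is kept whole."""
    if len(run) <= n:
        return [run]
    return [run[i:i + n] for i in range(len(run) - n + 1)]


def segment(text, n=2):
    """Split on white space, then n-gram any complex-script range inside a token."""
    tokens = []
    for word in text.split():
        run, run_is_complex = "", False
        for ch in word:
            ch_is_complex = is_complex_script(ch)
            if run and ch_is_complex != run_is_complex:
                # Script changed: emit the finished range with its own strategy.
                tokens.extend(char_ngrams(run, n) if run_is_complex else [run])
                run = ""
            run += ch
            run_is_complex = ch_is_complex
        if run:
            tokens.extend(char_ngrams(run, n) if run_is_complex else [run])
    return tokens


print(segment("search ไทย"))
```

In the example, the Latin word is kept as a single white-space token, while the three-character Thai run is indexed as two overlapping bigrams.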


Search & Indexing Services

Simple “Google” Style Search for Enterprise Content

Out-of-the-box search application provides “Google”-style results list with paging
  relevancy ranking, date, field values
  site collapse
  customizable look and feel

Configurable ‘Quick links’ provide immediate access to predetermined relevant sites, documents or applications

Broad support for searching across enterprise content sources

“Did you mean?” synonym expansion provides one click access to other potentially relevant queries or can be used for spelling correction
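
A “Did you mean?” feature of this kind can be approximated with Python's standard library. The sketch below is not OmniFind's implementation: the synonym table, the vocabulary and the similarity cutoff are invented for illustration, and the spelling suggestions come from `difflib.get_close_matches`.

```python
import difflib

# Hypothetical administrator-defined synonyms and index vocabulary.
SYNONYMS = {"loan": ["mortgage", "credit"]}
VOCABULARY = ["omnifind", "domino", "websphere", "loan"]


def did_you_mean(query):
    """Suggest alternative queries: synonyms first, else close spellings."""
    term = query.lower()
    if term in SYNONYMS:
        return list(SYNONYMS[term])
    if term in VOCABULARY:
        return []  # known term with no synonyms: nothing to suggest
    return difflib.get_close_matches(term, VOCABULARY, n=3, cutoff=0.8)


print(did_you_mean("Loan"))
print(did_you_mean("omnifnd"))
```

A known term with configured synonyms returns those synonyms, while a misspelt term falls back to the closest vocabulary entries.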

Unstructured Information Management Architecture (UIMA)


Text Analysis Services
Leveraging Knowledge Buried in Unstructured Information

Most BI implementations ignore knowledge buried within free-form text
They can only report on predefined structured data, such as problem codes…
Problem descriptions, technician comments, call center notes and customer correspondence can contain a lot of the supporting details required for true insights


Text Analysis Services
Extract Knowledge From Unstructured Information

Identify concepts, entities and facts buried in unstructured content
Determine underlying issues or problems, parts referenced and actions from technician or customer service notes, customer surveys, consumer review sites and other sources

PART 1: Fuel Pump
PART 2: Fuel Filter
PART 3: Wiring Harness
PART 4: Wiring Harness Cover
PROBLEM 1: Corrosion (PART 3: Wiring Harness)
ACTION 1: Replace (PART 1: Fuel Pump; PART 2: Fuel Filter)
ACTION 2: Remove (PART 4: Wiring Harness Cover)

Extracted knowledge can now be sent to a search engine or database, or delivered as a service to rules processing engines and other business applications
Provide broader access through simpler search and browse interfaces



Text Analysis Services
Leveraging Knowledge Buried in Unstructured Information

Report on facts extracted from unstructured information
Show other parts referenced, underlying root problems or issues, and actions taken…
Create alerts to be notified of specified findings or thresholds

Provide simplified search interface extending access to broader set of users
Easily find information about claims involving a fuel pump…
See all of the other parts, problems and actions referenced in the warranty claim
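
Once facts have been extracted, reporting and alerting become ordinary data processing. The Python sketch below shows the idea with made-up warranty-claim records; the record layout, the function names and the threshold are assumptions for illustration, not part of any IBM product.

```python
from collections import Counter

# Made-up facts in the shape a text analysis step might emit per warranty claim.
claims = [
    {"claim": "W-1", "parts": ["Fuel Pump", "Fuel Filter"], "problems": ["Corrosion"]},
    {"claim": "W-2", "parts": ["Fuel Pump", "Wiring Harness"], "problems": ["Corrosion"]},
    {"claim": "W-3", "parts": ["Wiring Harness Cover"], "problems": ["Crack"]},
]


def related_parts(claims, part):
    """Count the other parts referenced in claims that mention `part`."""
    counts = Counter()
    for claim in claims:
        if part in claim["parts"]:
            counts.update(p for p in claim["parts"] if p != part)
    return counts


def problem_alerts(claims, threshold):
    """Return the problems reported in at least `threshold` claims."""
    counts = Counter(p for claim in claims for p in claim["problems"])
    return sorted(p for p, n in counts.items() if n >= threshold)


print(dict(related_parts(claims, "Fuel Pump")))
print(problem_alerts(claims, threshold=2))
```

The first call answers "which other parts appear in fuel pump claims?"; the second flags any problem that crosses a chosen threshold.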






UIMA

Diagram: within WebSphere II OmniFind Edition, Text passes through annotators (Identify Language, Find Words & Roots, Categorization, Plug In Annotators). The Extracted Metadata and Facts go to the Search Index (Search Application), a Data Warehouse (Reports), a Rules Engine, or any Application.

UIMA: Unstructured Information Management Architecture, a “plug and play” framework for advanced text analysis components

UIMA framework allows “Annotators” to add value to text
  find words specific to an industry, from dictionary or by rules
  add further information around these terms, like latitude/longitude for places
  allow indexed and annotated results to go to other processes / systems as well as to a Search Engine, for further analysis or semantic search
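
The annotator idea can be sketched as a small pipeline in Python. This is deliberately not the UIMA API: the `Annotation`, `Document` and `DictionaryAnnotator` classes and the `run_pipeline` function are hypothetical, and the dictionaries (including the approximate coordinates) are illustrative data only.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str            # e.g. "PART" or "PLACE"
    begin: int           # character offsets into the document text
    end: int
    covered_text: str
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


class DictionaryAnnotator:
    """Marks every occurrence of a dictionary term and attaches its features."""

    def __init__(self, kind, terms):
        self.kind = kind
        self.terms = terms  # term -> extra features to attach

    def process(self, doc):
        for term, features in self.terms.items():
            for match in re.finditer(re.escape(term), doc.text, re.IGNORECASE):
                doc.annotations.append(Annotation(
                    self.kind, match.start(), match.end(),
                    match.group(0), dict(features)))


def run_pipeline(doc, annotators):
    """Run each plug-in annotator in turn over the same document."""
    for annotator in annotators:
        annotator.process(doc)
    return doc


parts = DictionaryAnnotator("PART", {"fuel pump": {}, "wiring harness": {}})
places = DictionaryAnnotator("PLACE", {"London": {"lat": 51.5074, "lon": -0.1278}})

doc = run_pipeline(
    Document("Replaced the Fuel Pump and wiring harness at the London depot."),
    [parts, places],
)
for ann in doc.annotations:
    print(ann.kind, ann.covered_text, ann.features)
```

Each annotator only adds annotations, so new ones can be plugged in without changing the others, and the resulting annotations can be handed to an index, a database or any other consumer.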

WebSphere Content Discovery Server (iPhrase)

WCDS demo on-screen: “WCDS Self Service demo.exe”

WebSphere Content Discovery for Self Service

Understands user intent and provides actionable response
Embed rich HTML responses within result
Interactive promotion guides action


Contextual Delivery Services

Integration into Contact Centres facilitates faster Problem Resolution

Launch query for possible resolutions directly from Siebel Call Center…
…leverage context and customer info to automatically find most relevant content
Return integration enables creation of new solutions based on findings
Enable agents to easily filter content by source, product and other attributes


Contextual Delivery Services

Business User Control

Empower business managers to easily refine the end-user experience
Monitor end-user behavior and effectiveness of business rules














IBM Product Offerings

WebSphere Content Edition: Integrating Content from Multiple Sources into Business Applications (Content Integration)
WebSphere OmniFind Edition: Infrastructure for Enterprise Search and Text Analytics (Search & Indexing, Text Analytics)
WebSphere Content Discovery Server: Business Driven Search Applications (Contextual Delivery)


Customer Examples

Growth through Acquisition

Wachovia improved business effectiveness and addressed compliance issues by providing an integrated view of all content

Challenge
Access and work with content from multiple repositories following mergers
Deliver repository-independent customer service, brokerage and workflow applications

Benefits
Greater accessibility resulted in a 50-fold increase in the number of content retrievals
$2.3 million savings within 2 years for a 64% return on initial investment
$1 million savings for each additional business unit implementing content integration services
Business executives have immediate access to newly acquired systems

Capability: Content Integration


IFPMA makes it easier for doctors and patients to research clinical trial information worldwide

Challenge
Doctors and patients need to find info about all clinical trials sponsored by the pharmaceutical industry
Unstructured information from multiple companies and clinical trials registries

Benefits
Enables searching by disease area, medicine name or trial location
Recognizes medical and geographical synonyms across multiple languages, without manual indexing
Allows doctors and patients to find trials they can join and review summarized results

Capabilities: Search & Indexing, Text Analytics


CBI Engineering increased productivity by allowing employees to access Lotus Notes from their intranet search solution

Challenge
Need for improved search relevancy across file system and Lotus Notes to make engineers more productive
Must respect security already defined within Lotus Notes

Benefits
Common search framework for intranet, file system and Lotus Notes content
Engineers able to seamlessly access native Notes documents from intranet search results
Allowed CBI to provide broad content access while honoring stringent native repository security

Capability: Search & Indexing


IBM Workplace for Customer Support (Lotus Premium Support) increased customer satisfaction and productivity with Content Discovery

Challenge
Revitalize customer interest in using lower cost online support channel
Streamline customer self-sufficiency while continuing to deliver personalized service from IBM support staff

Benefits
Increased customer satisfaction through the delivery of relevant information in 3 clicks or less
Unified content from disparate repositories to simplify problem resolution
Enabled resolution of repetitive product problems in less than five minutes
Decreased number of problem management reports submitted

Screenshot callouts
Personalization enables results to be automatically limited to customer owned products
Customers can escalate and preserve context
Enables searching across multiple content stores and easy user navigation

Capability: Contextual Delivery


Summary

Getting the right information to the right people at the right time is a key element of achieving Information On Demand

IBM is building this capability around a portfolio of
Content Integration
Text Analytics
Search & Indexing
Contextual Delivery
Information Accelerators

IBM Content Discovery brings these capabilities together to help organizations drive measurable results for their business


Thank You





Any questions?


The IBM Content Discovery software portfolio

WebSphere Content Discovery Server
Allows organizations to: Quickly deploy business driven solutions that increase revenue and reduce support costs
By providing: A rich understanding of user intent and application context to help people quickly find the information they need to make purchases, answer questions, and solve problems
Example initiatives: eCommerce, Self-Service websites

WebSphere II OmniFind Edition
Allows organizations to: Implement a single search architecture to underpin enterprise portal and BI initiatives
By providing: Robust enterprise search capabilities and a text analytics foundation able to uncover the inherent meaning of large volumes of content from around the globe
Example initiatives: Issues Analytics, Intranet Search

WebSphere II Content Edition
Allows organizations to: Manage, leverage and extend their enterprise content without painful ripping and replacing
By providing: Virtual access to dozens of content silos via a single interface to increase productivity, manage risk, and lower development costs
Example initiatives: Records Management, M&A Content Migration


OmniFind - Linguistic Analysis

Linguistic processing when adding document to index
  Determines language of document
  Tokenizes text
  Creates index using tokens

Linguistic processing performed during search
  Query string segmented, analyzed, searched in index

Stop word removal
  removing “a”, “the”, etc.

Character normalization
  Normalization performed in Unicode
  Case normalization: finding documents with “USA” when searching with “usa”
  Umlaut normalization: finding documents with “schoen” when searching with “schön”
  Accent removal: finding documents with “é” when searching for “e”
  Other diacritics removal: finding documents with “ç” when searching for “c”
  Ligature expansion: finding documents with “Æ” when searching for “ae”
  Normalization works in both directions
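
The normalization steps listed above can be sketched with Python's `unicodedata` module. This is an illustration rather than OmniFind's implementation: the stop-word list and the expansion table are small assumed samples. Applying the same function to both indexed text and query terms is what makes the matching work in both directions.

```python
import unicodedata

# Assumed samples for illustration; a real system would use fuller lists.
STOP_WORDS = {"a", "an", "the"}
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "æ": "ae", "œ": "oe"}


def normalize_token(token):
    """Case-fold, expand umlauts and ligatures, then strip remaining diacritics."""
    text = unicodedata.normalize("NFC", token).casefold()      # case normalization
    text = "".join(EXPANSIONS.get(ch, ch) for ch in text)      # umlauts, ligatures
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(text):
    """Normalize white-space tokens and drop stop words."""
    terms = (normalize_token(word) for word in text.split())
    return [term for term in terms if term not in STOP_WORDS]


print(normalize_token("USA"), normalize_token("schön"), normalize_token("Æ"))
print(index_terms("The café"))
```

Because `schön` and `schoen` normalize to the same string, a query using either spelling finds documents containing the other.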



OmniFind - Linguistic Analysis

Recognize documents in a wide range of languages:
  Arabic, Chinese (traditional and simplified), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Polish, Portuguese (Brazilian), Russian, Spanish, Swedish, Turkish

Dictionary-based linguistic support for documents in recognized languages
  Word segmentation
  Stemming: find “mice” when searching for “mouse”
  Break contractions into parts: make “wouldn’t” into “would” and “not”
  Clitics, a form of contraction: make “l’avenue” into “le” and “avenue”
  Recognize non-alphabetic characters as part of or separate from a lexical unit, e.g. URLs, dates
  Recognize abbreviations
  Recognize end of sentence for sentence segmentation

Basic support for documents not in a recognized language
  Word segmentation via white space or blanks, and n-gram segmentation
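
Dictionary-based analysis of this kind can be pictured with a toy Python example. The three lookup tables below are tiny invented samples and the `analyze` function is hypothetical; a real linguistic engine would rely on full per-language dictionaries. The clitic mapping follows the slide's own example.

```python
# Tiny invented dictionaries for illustration only.
BASE_FORMS = {"mice": "mouse", "geese": "goose"}
CONTRACTIONS = {"wouldn't": ["would", "not"], "can't": ["can", "not"]}
CLITICS = {"l'": "le", "d'": "de"}  # mapped as in the slide's example


def analyze(token):
    """Return the base-form lexical units for one token."""
    word = token.lower().replace("’", "'")  # treat curly and straight apostrophes alike
    if word in CONTRACTIONS:
        return list(CONTRACTIONS[word])
    for prefix, full_form in CLITICS.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            rest = word[len(prefix):]
            return [full_form, BASE_FORMS.get(rest, rest)]
    return [BASE_FORMS.get(word, word)]


print(analyze("mice"))
print(analyze("wouldn’t"))
print(analyze("l’avenue"))
```

Indexing the base forms rather than the surface tokens is what lets a search for “mouse” find a document that only contains “mice”.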