Combining the power of IBM Watson and OSGi


© 2013 International Business Machines Corporation

EclipseCon Boston 2013

Follow us @IBMWatson


David Taieb

IBM Watson Core Technologies


Agenda

What is IBM Watson and why is it important?

IBM Watson deep dive and how is IBM putting it to work?

Overview of the IBM Watson Tooling Platform


Businesses are dying of thirst in an ocean of data

1 in 2 business leaders don’t have access to data they need
83% of CIOs cited BI and analytics as part of their visionary plan
2.2X more likely that top performers use business analytics
80% of the world’s data today is unstructured
90% of the world’s data was created in the last two years
1 trillion connected devices generate 2.5 quintillion bytes of data per day


Watson is ushering in a new era of computing . . .
. . . enabling new opportunities and outcomes

[Figure: system intelligence over time, with markers at 1900, 1950 and 2011, across three eras of computing]

Tabulation: punch cards, time card readers
Programmatic: search, deterministic, enterprise data, machine language, simple outputs
Cognitive: discovery, probabilistic, Big Data, natural language, intelligent options


Why is it so hard for computers to understand us?

Person         Organization
L. Gerstner    IBM
J. Welch       GE
W. Gates       Microsoft

“If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.”

Welch ran this?

What Watson isn't

Search engine
New-fangled database system
Skynet or HAL 9000

What Watson is

Question answering (QA) system
Combines information retrieval and natural language processing (NLP)
Builds its domain knowledge from sources comprising structured and unstructured data
A core set of technologies that can be customized and targeted to specific industries
Runs on Apache UIMA (Unstructured Information Management Architecture) technology based on the OASIS standard




IBM Watson combines transformational technologies

Understands natural language and human communication
Adapts and learns from user selections and responses
Generates and evaluates evidence-based hypotheses

…built on a massively parallel architecture optimized for IBM POWER7


Brief History of IBM Watson

Phases: R&D, Demonstration, Commercialization, Cross-industry Applications, Expansion

IBM Research Project (from 2006)
Jeopardy! Grand Challenge (Feb 2011)
Watson for Healthcare (from Aug 2011)
Watson for Financial Services (from Mar 2012)
Watson Industry Solutions (from 2012)


Result of IBM Research “Grand Challenge”

On February 14, 2011, IBM Watson made history


Watson enables three classes of cognitive services

Decide
  Ingest and analyze domain sources, info models
  Generate evidence-based decisions with confidence
  Learn with new outcomes and actions
  e.g. next generation apps, probabilistic apps

Ask
  Leverage vast amounts of data
  Ask questions for greater insights
  Natural language inquiries
  e.g. next generation chat

Discover
  Find the rationale for given answers
  Prompt for inputs to yield improved responses
  Inspire considerations of new ideas
  e.g. next generation search, discovery


How does IBM Watson work: architecture overview

[Figure: Watson states (simplified). Labels shown: Question/Topic Analysis, Primary Search, Candidate Answer Generation, A. Sources, Context Independent Scoring, Filter, Evidence Retrieval, Deep Evidence Scoring, Context Dependent Scoring, Answer Scoring, Synthesis, Final Merging & Ranking, Trained Models, Answer and Confidence. Modes shown: Teach, Train, Q&A]


Putting IBM Watson to work: Watson Solutions

Watson for Healthcare (sample advisor solutions): Utilization, Oncology, Research, Care Mgt.
Watson for Financial Services (sample advisor solutions): Banking, Financial Markets, Insurance
Watson for Client Engagement (sample advisor solutions): Call Center, Knowledge, Help Desk, Technical
Watson for Industry Solutions

NLP & Machine Learning, Data Analytics, Cloud, Mobile, Workload Optimized Systems

Capabilities: ASK Services, DISCOVER Services, DECISION Services
Platform: Content, Tooling, Methods, Algorithms, APIs
Full Lifecycle: Ready, Build, Teach, Run


Why is tooling important for a successful Watson implementation?

Adapting Watson to a new domain cannot be reduced to a simple product install but must follow a rigorous methodology:
  Readiness preparation
  Building the solution itself
  Teaching Watson about the industry, use case and data involved
  Running in production

New Watson solutions can leverage a set of core capabilities provided by the Watson platform as a starting point:
  Ingestible content
  Algorithms for natural language processing, candidate generation, machine learning, etc.
  Set of customizable tools for teaching, training and accuracy analysis

For a successful domain adaptation, a team of people with different backgrounds and expertise has to work together using a set of tools that are easy to use and foster collaboration.


Tooling Collaboration

Watson Developer: “I do the development to get Watson implemented and deployed”
End UI Developer: “I create the public face of Watson”
System Administrator: “I never met an install that could defeat me, though many have tried”
Testing and Accuracy Analyst: “I can get to the bottom of any Watson accuracy problem”
End User: “Watson answers all my questions”
Industry Domain Expert: “I know what Watson needs to know”


Where is tooling needed

[Figure: the Watson states (simplified) pipeline diagram, annotated with the areas that need tooling]

Ingestion
Accuracy Analysis
Training Data Generation
Pipeline monitoring
Algorithm Dev Tools
Ground Truth
Version Control
Pipeline Configuration


Question Analysis and Query building

Who is the first person to walk on the moon?

ESG Parse Tree


Search and Candidate Generation

Primary search constructs queries and searches among many available sources:
  PRISMATIC (relationship search)
  Lucene
  Indri (multiple index types)
  Semantic relations (DBpedia)

Candidate answers are generated based on:
  Titles
  Anchor text
  Passages and their parts: headwords, numbers, dates
  Checking candidates against constraints
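
The candidate generation step above can be sketched roughly as follows. This is a minimal illustration, not Watson code: the `SearchHit` structure, the regular expressions used to pull names and numbers out of a passage, and the `constraint` callback are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SearchHit:
    """One primary-search result (hypothetical shape, not a Watson type)."""
    title: str
    anchor_texts: list = field(default_factory=list)
    passage: str = ""


def passage_parts(passage):
    """Extract multi-word capitalized names and numbers from a passage."""
    names = re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)+", passage)
    numbers = re.findall(r"\b\d+(?:\.\d+)?\b", passage)
    return names + numbers


def generate_candidates(hits, constraint=lambda text: True):
    """Collect de-duplicated candidates from titles, anchor text and passage parts."""
    seen, candidates = set(), []
    for hit in hits:
        for text in [hit.title, *hit.anchor_texts, *passage_parts(hit.passage)]:
            key = text.strip().lower()
            if key and key not in seen and constraint(text):
                seen.add(key)
                candidates.append(text.strip())
    return candidates


hits = [SearchHit(
    title="Apollo 11",
    anchor_texts=["Neil Armstrong"],
    passage="Neil Armstrong became the first person to walk on the Moon on July 20, 1969.",
)]
all_candidates = generate_candidates(hits)
# A "who" question cannot be answered by a bare number, so filter those out.
named_only = generate_candidates(hits, constraint=lambda text: not text[0].isdigit())
```

Generation is deliberately generous here: weak candidates such as "Apollo 11" survive this stage and are left for the scoring stages to push down.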


Scoring

Evidence feature scores per candidate answer:

Candidate answer   Doc Rank   Pass Rank   Ty Cor   Geo   LFACS
Neil Armstrong     0          1           0.8      0     0.7
Eugene Cernan      1          1           0.3      0     0.4
Astronaut          2          2           0.1      0     0.3
John Young         3*                     0.3      0     0
Buzz Aldrin        3*                     0.5      0     0
Renegate kids      4          0           0.0      0     0

* Only one rank value is present for this row; which rank column it belongs to is not recoverable.

isA(Neil Armstrong, person) = 0.8
isA(Eugene Cernan, person) = 0.3
isA(Astronaut, person) = 0.1

More than 50 scoring components:
  Taxonomic
  Geospatial (location)
  Temporal
  Source reliability
  Name consistency
  Relational
  Passage support
  Theory consistency
  Context dependent (deep evidence)
  Context independent
  Features for machine learning
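
One way to picture how independent scoring components turn into per-candidate feature scores is sketched below. It is a toy under stated assumptions: the `TYPE_CONFIDENCE` table simply hard-codes the three isA values from the slide in place of a real taxonomy lookup, `geo_score` is a placeholder that always returns 0, and the `SCORERS` registry is a hypothetical name. "TyCor" is read here as type coercion, following the DeepQA literature.

```python
# Hypothetical lookup standing in for a real taxonomy; the three values are
# the isA(...) scores shown on the slide.
TYPE_CONFIDENCE = {
    ("Neil Armstrong", "person"): 0.8,
    ("Eugene Cernan", "person"): 0.3,
    ("Astronaut", "person"): 0.1,
}


def type_coercion(candidate, answer_type):
    """Confidence that the candidate is an instance of the expected answer type."""
    return TYPE_CONFIDENCE.get((candidate, answer_type), 0.0)


def geo_score(candidate, answer_type):
    """Placeholder geospatial scorer: the example question has no location clue."""
    return 0.0


# Each scorer is independent, so new components can be added without touching the rest.
SCORERS = {"TyCor": type_coercion, "Geo": geo_score}


def score_candidates(candidates, answer_type, scorers=SCORERS):
    """Build one feature dictionary per candidate by running every scorer."""
    return {
        candidate: {name: scorer(candidate, answer_type) for name, scorer in scorers.items()}
        for candidate in candidates
    }


features = score_candidates(["Neil Armstrong", "Astronaut", "Renegate kids"], "person")
```

Keeping each scorer behind the same two-argument signature is a design choice for the sketch only; it makes the "more than 50 components" idea easy to see, since each one contributes exactly one column to the feature table.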


Supporting Evidence

Passage search
  Much like a primary search, but requires candidate answer as a term
  Further scored to ensure candidate answer context

Shared scoring solutions:
  Passage term match
  Skip-bigram
  Text alignment
  Logical form answer candidate scoring (LFACS)

LFACS Alignment
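
Two of the passage scorers named above, passage term match and skip-bigram, can be approximated on plain tokens as shown below. This is a simplified sketch: the tokenizer, the `max_skip` window and the overlap ratios are assumptions made for illustration, and production scorers may work over parse structures and weighted terms rather than raw words.

```python
import re


def tokens(text):
    """Lowercase word tokens; a deliberately naive tokenizer for the sketch."""
    return re.findall(r"[a-z0-9]+", text.lower())


def passage_term_match(question, passage):
    """Fraction of distinct question terms that also occur in the passage."""
    q, p = set(tokens(question)), set(tokens(passage))
    return len(q & p) / len(q) if q else 0.0


def skip_bigrams(words, max_skip=2):
    """Ordered word pairs with at most max_skip words between them."""
    pairs = set()
    for i, left in enumerate(words):
        for right in words[i + 1 : i + 2 + max_skip]:
            pairs.add((left, right))
    return pairs


def skip_bigram_score(question, passage, max_skip=2):
    """Fraction of the question's skip-bigrams that also appear in the passage."""
    q = skip_bigrams(tokens(question), max_skip)
    p = skip_bigrams(tokens(passage), max_skip)
    return len(q & p) / len(q) if q else 0.0
```

Term match ignores word order, while skip-bigram rewards passages that keep question terms in the same relative order, which is why the two are useful as separate features.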


Final Merger

Merging
  Remove duplicate answers
  Requires normalizing scores per feature to make merger

Ranking
  Use of ML and IBM® SPSS® over training data to create the model to rank future results
  Linear and logistic regression algorithms

Teach-train-execute cycle
  10,000 training questions and 2000 test questions
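
The merge, normalize and rank steps above might look like the following sketch. Several things are assumed for illustration: duplicates are merged by taking the maximum of each feature, normalization is min-max scaling per feature, and the logistic model uses hand-set `weights` instead of weights learned from training questions.

```python
import math
from collections import defaultdict


def merge_duplicates(scored, canonical):
    """Merge candidates sharing a canonical form, keeping the max of each feature."""
    merged = defaultdict(dict)
    for name, feats in scored.items():
        target = merged[canonical(name)]
        for feat, value in feats.items():
            target[feat] = max(value, target.get(feat, value))
    return dict(merged)


def normalize(merged):
    """Min-max scale every feature to [0, 1] across candidates."""
    feature_names = {f for feats in merged.values() for f in feats}
    out = {name: {} for name in merged}
    for f in feature_names:
        values = [feats.get(f, 0.0) for feats in merged.values()]
        lo, hi = min(values), max(values)
        for name, feats in merged.items():
            out[name][f] = (feats.get(f, 0.0) - lo) / (hi - lo) if hi > lo else 0.0
    return out


def rank(normalized, weights, bias=0.0):
    """Score candidates with a logistic model and sort by descending confidence."""
    def confidence(feats):
        z = bias + sum(weights.get(f, 0.0) * x for f, x in feats.items())
        return 1.0 / (1.0 + math.exp(-z))
    return sorted(((n, confidence(f)) for n, f in normalized.items()),
                  key=lambda pair: pair[1], reverse=True)


scored = {
    "Neil Armstrong": {"TyCor": 0.8, "LFACS": 0.7},
    "neil armstrong": {"TyCor": 0.6, "LFACS": 0.9},  # duplicate spelling
    "Eugene Cernan": {"TyCor": 0.3, "LFACS": 0.4},
}
merged = merge_duplicates(scored, canonical=str.lower)
# Hypothetical weights; a trained model would learn these from labeled questions.
ranking = rank(normalize(merged), weights={"TyCor": 1.0, "LFACS": 1.0})
```

Normalizing before ranking matters because raw features live on different scales (ranks versus probabilities), so a single set of weights is only meaningful once every feature is comparable.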


Watson Tooling Platform Requirements

Development
  Enable multiple teams to work on a common integration platform and modular programming model
  Ensure high degree of interoperability between tools
  Easy to use high level APIs
    Persistence
    REST services
    Logging

End User
  Seamless access to all tools from the same unified Web platform
  Tools should be easy to assemble, configure and deploy (zero install)
  Unified login page across all tools
  Consistent UI
  Easy access to lifecycle tools
    Task management
    Source control
    Defect tracking

Solution adopted: OSGi-based Web container


Watson Tooling Platform

[Figure: three-tier architecture (Client Tier, Web Tier, Data Tier), running on a single app server: WAS, Tomcat, Jetty, ...]

Watson Tooling UI (integration into a unified UI shell): Dojo, IBM IDX, Watson JS APIs, JSP Tags, DWR JS
Tools: Accuracy Analysis, GroundTruth, Data Ingestion, Domain Specific Tools, Domain Specific Plugin Tools
Security (Authentication/Authorization/Single Sign-on)
OSGi Web Platform: Equinox Servlet Bridge, Equinox Web Container, Http Service, Jetty
POJO APIs, OSGi Services, JAX-RS Wink, OSLC APIs
Common APIs: Experiments, Answer Key, Administration, Configuration
Common utilities: CAS Deserialization, Watson Runtime Access, Logging, Scheduling
Common UIMA Type System, Annotator Registry, Instrumentation, Logging/Serviceability
Common Extension Points: Shell Action, Welcome Page, Question Action, Experiment Action
Canonical Data Model / Data Access API Layer: Data Models, File Formats, OpenJPA, Experiment Output API, Unified Database Schema, Answer Key XML Schema
Derby/Db2: Training Data; Pipeline configuration; Experiment Store, Answer keys
Experiment Directory Output: Models/ARFFs, Intermediary and Training CASes, ScoreEval, PRA (Annotator tool), Domain-specific assets
Watson Runtime
OSLC Integration of Lifecycle Tools (Defect tracking, Task, Source Control)
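
The "Common Extension Points" idea, where independent tools contribute actions to one shared shell, can be illustrated with a small registry. This is a language-neutral toy and explicitly not the OSGi or Equinox API: the `ExtensionRegistry` class, the `"question-action"` point name and the handler signatures are all hypothetical.

```python
from collections import defaultdict


class ExtensionRegistry:
    """Toy stand-in for a plug-in registry; NOT the OSGi or Equinox API."""

    def __init__(self):
        self._contributions = defaultdict(list)

    def contribute(self, point, tool, label, handler):
        """Register a handler that a tool contributes to an extension point."""
        self._contributions[point].append((tool, label, handler))

    def labels(self, point):
        """Labels the shell would render for one extension point."""
        return [f"{tool}: {label}" for tool, label, _ in self._contributions[point]]

    def run(self, point, payload):
        """Invoke every handler contributed to the point, in registration order."""
        return [handler(payload) for _, _, handler in self._contributions[point]]


registry = ExtensionRegistry()
# Two tools contribute to the same point without knowing about each other.
registry.contribute("question-action", "Accuracy Analysis", "Analyze question",
                    lambda q: f"analyzing {q!r}")
registry.contribute("question-action", "GroundTruth", "Edit answer key",
                    lambda q: f"editing answer key for {q!r}")
```

The shell only depends on the registry, so a domain-specific tool can be added or removed without changing the shell or the other tools, which is the modularity property the platform requirements ask for.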


OSLC: Lifecycle Tools Integration Improved

OSLC linked data models and RESTful services interfaces form a great extensible platform upon which to build the tools needed by the Watson personas to access lifecycle data.

Watson Developer: produces new driver
Accuracy Analyst: accuracy analysis
  1. Retrieve analysis script
  2. Store experiment results
  3. Create new tasks
  4. Enter defect
Industry Domain Expert: training data
  1. Mark up annotation
  2. Validate ground truth

Jazz Lifecycle Integration Platform (JLIP)
OSLC Based Enterprise Tools


Leveraging the Jazz Lifecycle Integration Platform (JLIP)

[Figure: Jazz Lifecycle Integration Platform built on linked lifecycle data (OSLC). Participants shown: OSLC Based Tools & Solutions (inc. Rational Lifecycle Tools); Open Source, 3rd Party and IBM non-OSLC Lifecycle Tools (with Adapters); Homegrown Lifecycle Tools & Solutions (with Adapters)]

Lifecycle Management Capabilities: Versioning and Baselining, Query & Reporting, Product/Variant Planning, Traceability & Impact Analysis, Reviews and Approvals, Notifications and Alerts
Lifecycle Query
Shared Artifacts (user, project)
Admin Concepts (System Registry and System Health)
Single Sign-on

Diverse platform participants:
  Ecosystem of tools and adapters
  IBM tools, including Rational practitioner tools built to the platform
  Adapters for 3rd party tools, including Lifecycle Integration Adapters (LIA)
  Homegrown and custom/services-built tools

Increasing value:
  Harness the power of Big Data
  Connect data and people from disparate, heterogeneous tools
  Identify a single system across complex tools deployments to improve centralized administration
  Enable extensibility and a broad ecosystem with a vibrant community of participants
  Accommodate flexible delivery models and channels such as Cloud and Mobile

The value of JLIP is that it provides Watson users the relevant information needed from the tools, data and people.


We have only just begun to build a new era of computing powered by cognitive systems

Transforming how organizations think, act, and operate
Learning through interactions
Delivering evidence-based responses driving better outcomes


Resources

IBM Watson: http://www-03.ibm.com/innovation/us/watson/
IEEE Collection: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717&punumber=5288520
Eclipse Equinox: http://www.eclipse.org/equinox/
JAZZ Platform: http://jazz.net
OSLC: http://open-services.net/ and http://www.eclipse.org/lyo/

