5 Nov 2013 (3 years and 7 months ago)

1

Towards Automating Complex
Associative Access to Multiple
Bioinformatics Data Sources


Ling Liu, Calton Pu

David Buttler, Wei Han

Henrique Paques, Dan Rocco

Georgia Tech

2

Outline


State of the Art


Users’ Perspective


Technology Perspective


Why SDM Technology


XWRAP Composer



Users’ Perspective



Technology Perspective


Progress Report and Near Term Deliverables


Related Long Term Research


3

Why Automating Complex Associative Access

Today: Simple Query-Based Searching

Web

Large & Unorganized Document Collections

Complex Associative Access requires experts

Tomorrow with SDM Technology

Semantic Web

Query 2

Complex Associative Access is automated (one stop shopping)

4

Why Automating Complex Associative Access

Today: Simple Query-Based Searching

Web

Large & Unorganized Document Collections

Characterize

Sort

Partition

Filter

Summarize

Tomorrow with SDM Technology

Semantic Web

Query 2

5

Automating Complex Associative Access


Wrapper Technology


Workflow Technology


Semantic Web Technology



Service Discovery



Service Selection



Service Composition


Research Issues



Semantic Data Integration, Interoperability



Scalability, High Performance



Trusted Computing, Dependable, Survivable


6

XWRAPComposer


What is it?


A wrapper generation system that can semi-automatically
generate wrappers (information extraction programs)
capable of accessing multiple scientific Web pages in one shot.


What makes it different from other existing XWRAP
tools?



Capable of generating wrappers that extract information
from multiple Web pages connected by URLs (page links)
and compose them into an integrated XML document



Extremely useful for Automating Complex Associative
Access to multiple scientific data sources
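
As a rough illustration of the idea on this slide (not the XWRAPComposer implementation), the Python sketch below pulls fields from several linked pages and composes them into a single XML document. Everything here is an assumption made for the example: the `PAGES` table, the `extract_page` and `compose` names, the field names, and the `hit_count` value are hypothetical, and the pages live in memory rather than being fetched from the Web.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "Web": each page carries already-extracted fields
# plus links to further pages. Field names and hit_count are made up.
PAGES = {
    "summary": {"fields": {"query_id": "AA045112", "database": "htgs"},
                "links": ["detail"]},
    "detail": {"fields": {"hit_count": "2"}, "links": []},
}

def extract_page(url):
    """Stand-in for a single-page wrapper: return (fields, outgoing links)."""
    page = PAGES[url]
    return page["fields"], page["links"]

def compose(start_url):
    """Follow page links from start_url and merge all fields into one XML tree."""
    root = ET.Element("result")
    seen, queue = set(), [start_url]
    while queue:
        url = queue.pop(0)
        if url in seen:          # guard against link cycles
            continue
        seen.add(url)
        fields, links = extract_page(url)
        page_el = ET.SubElement(root, "page", name=url)
        for key, value in fields.items():
            ET.SubElement(page_el, key).text = value
        queue.extend(links)
    return root

doc = compose("summary")
xml_text = ET.tostring(doc, encoding="unicode")
```

The breadth-first traversal is only one possible design; a real composer would presumably let a script decide which links to follow and in what order.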

7

SDM Enabling Technology: XWRAPComposer

Existing Wrapper Technology

Query 2

Seq. Link Wrapper

Sequence Wrapper

Blast Sum Wrapper

Blast Detail Wrapper

Extracting Data from a single Web Document

AA045112

CACCTGGAGAAACTTCTGCACTGGCACTGTGTTCCNAGAGCTCCTTCTATGCGTCCCTCC

CAAGTGATTTAATTTCAGCTGATTGGACTACGAATTCACAAGGCAGAAAAGTCAAGGTCA

TTTGGNATCTGGAGACAGGAGAACTCAAGGAACCNAAAGGACT

htgs

8

SDM Enabling Technology: XWRAPComposer

WrapperComposer Technology

AA045112

Full Seq Wrapper

CACCTGGAGAAACTTCTGCACTGGCACTGTGTTCCNAGAGCTCCTTCTATGCGTCCCTCC

CAAGTGATTTAATTTCAGCTGATTGGACTACGAATTCACAAGGCAGAAAAGTCAAGGTCA

TTTGGNATCTGGAGACAGGAGAACTCAAGGAACCNAAAGGACT

htgs

Blast Wrapper

Extracting Data from Multiple Web Documents

9


XWRAPComposer: Technical Perspective

Given a sequence, list all matching DNAs.

NCBI Blast Site (Web)

Blast Wrapper

Blast Query Page

Blast Format Page

Blast Delay Page

Blast Summary Page

Blast Detail Page

Interface/Outerface Specification

Composer Script

Multi-page Control Flow Modeling

Data Extraction Workflow
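
The page sequence named on this slide (Query, Format, Delay, Summary, Detail) can be read as a small control flow. The Python sketch below models it as a state machine, as a minimal illustration only. The transition order, the assumption that the Delay page is revisited until results are ready, and the names `FIXED_NEXT`, `run_flow`, and `polls_until_ready` are all hypothetical and do not come from the XWRAPComposer scripting language.

```python
# Hypothetical fixed transitions between page types (assumed order).
FIXED_NEXT = {
    "query": "format",
    "format": "delay",
    "summary": "detail",
    "detail": None,   # end of the flow
}

def run_flow(polls_until_ready):
    """Return the page types visited, in order.

    The Delay page is assumed to be re-requested until the (simulated)
    search has been polled polls_until_ready times.
    """
    visited = []
    page = "query"
    polls = 0
    while page is not None:
        visited.append(page)
        if page == "delay":
            polls += 1
            # Stay on the delay page until the simulated results are ready.
            page = "summary" if polls >= polls_until_ready else "delay"
        else:
            page = FIXED_NEXT[page]
    return visited

trace = run_flow(2)
```

Separating the fixed transitions from the polling step keeps the one data-dependent branch in the flow easy to see.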

10

SDM Center Data Integration Infrastructure

[Architecture diagram. Labels recovered from the figure:]

User (Matt); User Agent; User constraints & parameters

Workflow Agent; Executable Workflow Plan: “Matt’s WF”

Parameterized Workflow Specification (PWS); Workflow Resolution Service (WRS); Workflow Instantiation Service (WIS)

WF feasible; WF infeasible: report reason

Service registry and brokering; Data Registration; Services Registration; DB

Source Capabilities (SC); Binding Patterns; Domain Map/Ontology

Data Integration Agent(s); Data Mediation

Wrapper based Agent (3 instances); Other Agents (e.g., VIPAR); Other I/O Agents

Database Access; Communication Protocol Gateway; External Program; External Interface; Program Interfacing

XML Wrapper (7 instances); Data Source (7 instances); DB; Data Sources

Extraction Rules; Human Knowledge; GUI; Code Generator
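
The diagram mentions source capabilities, binding patterns, and a workflow resolution step that reports either "WF feasible" or "WF infeasible: report reason". One way such a check could work is sketched below in Python. This is an assumption-laden illustration, not the WRS design: the `SOURCES` table, its `needs`/`gives` fields, the source and parameter names, and the `resolve` function are all hypothetical.

```python
# Hypothetical source capabilities: inputs that must be bound before a
# source can be called ("needs") and the outputs it then provides ("gives").
SOURCES = {
    "seq_link": {"needs": {"accession"}, "gives": {"sequence_url"}},
    "sequence": {"needs": {"sequence_url"}, "gives": {"sequence"}},
    "blast": {"needs": {"sequence"}, "gives": {"matches"}},
}

def resolve(workflow, user_params):
    """Check an ordered list of sources against the bound parameters.

    Returns (True, "feasible") or (False, reason).
    """
    known = set(user_params)
    for step in workflow:
        missing = SOURCES[step]["needs"] - known
        if missing:
            return False, f"step '{step}' is missing inputs: {sorted(missing)}"
        known |= SOURCES[step]["gives"]   # outputs become available downstream
    return True, "feasible"

ok, reason = resolve(["seq_link", "sequence", "blast"], {"accession"})
bad, why = resolve(["blast"], {"accession"})
```

Here the first workflow is feasible because each step's required inputs are produced by an earlier step or supplied by the user, while the second fails with a reason that names the unbound input.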

11

Progress Report


Status



Produced Three Deliverables


Composer Interface/Outerface Specification


Five Java Wrappers for Pilot Scenario


Composer Script Examples for Pilot Scenario



XWRAPComposer design and development


Near Term Plan


Finish the design of XWRAP Composer scripting
language (Nov. 2002)


Develop the first prototype of XWRAP Composer
system (Jan. 2003)


Performance Evaluation (March 2003)


12

Related Long Term Research


Semantic Web and Semantic Data
Integration



Service Discovery



dynamic content crawler



Service Selection


Adaptive query routing


Service Composition


Infopipe Technology