IBM Watson Research

© 2004 IBM Corporation

BioHaystack: Gateway to the Biological Semantic Web

Dennis Quan

dennisq@us.ibm.com


Problems in bioinformatics


A myriad of public databases each hold specific facets of
information about biological objects of interest (e.g.,
proteins, genes, etc.)


Databases have their own access protocols, data formats,
naming conventions, and means of describing relationships
between objects in different databases


Different software required to view information from different
databases


User must be keenly aware of which tool or site to use


Relevant information comes in fragments


Exploration process is discontinuous


A common naming convention: LSID URNs


Life Sciences Identifiers (LSIDs) are URNs for biological
objects that are backed by RDF metadata:


E.g., urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240


LSID and LSID protocol (SOAP-based) specification
sponsored by I3C and undergoing standardization by OMG


Most of the publicly available bioinformatics databases
are available via LSID today


PDB LSID authority is online; “proxy” LSID authorities for
other databases (e.g., NIH databases, SwissProt) are hosted by I3C


Really easy to set up LSID clients and servers


IBM Internet Technology group provides Open Source LSID
client and server software for a variety of languages and
platforms
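
To make the naming convention concrete, here is a minimal sketch of splitting an LSID into its parts. It assumes the usual colon-separated layout (urn:lsid:authority:namespace:object, with an optional trailing revision); the `LSID` tuple and `parse_lsid` function are hypothetical names for illustration and are not part of the IBM LSID toolkit.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of an LSID URN (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> LSID:
    """Split an LSID URN into authority, namespace, object and revision."""
    parts = urn.split(":")
    # Expected: "urn", "lsid", authority, namespace, object[, revision]
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {urn!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {urn!r}")
    if any(not part for part in parts[2:5]):
        raise ValueError(f"empty LSID component in {urn!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240")
print(lsid.authority)  # ncbi.nlm.nih.gov.lsid.i3c.org
print(lsid.namespace)  # genbank
print(lsid.object_id)  # nm_001240
```

The authority part is what a client would use to locate the server that can resolve the identifier; the actual resolution step goes through the LSID protocol and is not shown here.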


RDF/XML: on demand data integration

[Figure: three RDF graph fragments, from GenBank, Gene Ontology,
and PDB, each describe the same human hemoglobin LSID: it derives
from a nucleotide sequence (atagccgta cctgcgagt ctagaagct), it is
an oxygen transport protein, and it has a 3D structure. Because all
three use the same LSID, the fragments add up (+) to a unified view
in which a single node carries all three statements.]
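
The merge shown in the figure can be sketched with plain sets of (subject, predicate, object) triples. This is only an illustration under stated assumptions: the LSID, the predicate names, the placeholder for the structure record, and the pairing of each statement with a source database are hypothetical, and a real client would parse the RDF/XML returned by each LSID authority rather than hard-code triples.

```python
# Hypothetical LSID standing in for "human hemoglobin LSID" in the figure.
HEMOGLOBIN = "urn:lsid:example.org:protein:human_hemoglobin"

# One small graph per source; the source-to-statement pairing is assumed.
genbank = {(HEMOGLOBIN, "derivesFrom", "atagccgta cctgcgagt ctagaagct")}
gene_ontology = {(HEMOGLOBIN, "isA", "oxygen transport protein")}
pdb = {(HEMOGLOBIN, "has3DStructure", "<3D structure record>")}


def merge(*graphs):
    """Union several triple sets; a shared subject name links them up."""
    unified = set()
    for graph in graphs:
        unified |= graph
    return unified


def describe(graph, subject):
    """Return the sorted (predicate, object) pairs known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


unified = merge(genbank, gene_ontology, pdb)
for predicate, obj in describe(unified, HEMOGLOBIN):
    print(predicate, "->", obj)
```

No schema mapping is needed for the merge itself: since every source names the protein with the same identifier, a set union is enough to produce the unified view.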


Haystack: letting users interact with their data


Haystack is a tool for creating, exploring, and organizing
information:


Personal information: e-mails, contacts, documents, etc.


Bioinformatics: proteins, publications, genes, etc.


Research project originating from MIT CSAIL


Uses RDF as an underlying data model


Built on Java and Eclipse, IBM’s Open Source rich client
platform


http://haystack.lcs.mit.edu/


Browsing highly interconnected information


Single screen presents multiple facets of a single object
originating from separate databases


Users navigate the information space as in a Web browser:
hyperlinking, drag and drop, etc.


Personalization


People keep track of their information by
personalizing their workspaces:


Grouping paperwork into folders


Highlighting important text in documents


Attaching sticky notes as reminders


Jotting down lists of related items


Haystack has pervasive support for
annotation and allows users to group
related objects together arbitrarily for their
own purposes
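
Since Haystack uses RDF as its data model, one way to picture annotation and grouping is as extra triples layered on top of the source data. The sketch below is an assumption for illustration only: the `Workspace` class and the `hasNote` and `hasMember` predicates are hypothetical and do not reflect Haystack's actual API or vocabulary.

```python
class Workspace:
    """Toy personal workspace that stores user additions as triples."""

    def __init__(self):
        self.triples = set()

    def annotate(self, resource, note):
        # A sticky note is one more statement about the resource.
        self.triples.add((resource, "hasNote", note))

    def add_to_collection(self, collection, resource):
        # A collection is an arbitrary user-defined grouping.
        self.triples.add((collection, "hasMember", resource))

    def notes(self, resource):
        return sorted(o for s, p, o in self.triples
                      if s == resource and p == "hasNote")

    def members(self, collection):
        return sorted(o for s, p, o in self.triples
                      if s == collection and p == "hasMember")


ws = Workspace()
gene = "urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240"
ws.annotate(gene, "follow up on this sequence")
ws.add_to_collection("my-reading-list", gene)
print(ws.notes(gene))
print(ws.members("my-reading-list"))
```

Keeping the user's notes and groupings in the same triple form as the underlying data means they can be browsed alongside it without a separate storage format.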


BioHaystack


BioHaystack: application of Haystack technologies to
bioinformatics problems


Integrated environment for working with biological data


Intended for end users, i.e., non-programmers


Builds on LSID, RDF, and Haystack


Integration offers the promise of lowering barriers to access
to different backend systems (e.g., LSID servers, Grids, Web
Services, relational databases, annotation servers)


Just as the Web browser acts as a client for Web content,
BioHaystack can act as a client for biological Semantic Web
content and services


Real world collaboration: myGrid


UK-funded joint project with the University of
Manchester and other UK research institutions


RDF-based platform for supporting e-Science
experiments


Real use cases; developed in collaboration with
bioinformaticians


myGrid creates LSIDs and RDF metadata in the
process of enacting experiments for scientists


BioHaystack serves as a browser for this metadata


myGrid Architecture

[Figure: architecture diagram. Components: Registry, mIR,
Discovery View, Haystack Provenance Browser, FreeFluo Enactor,
Taverna WF Builder, Pedro Annotation tool, Ontology Store, WSDL,
Soaplab, Others. Roles: Scientists, Bioinformaticians, Service
Providers, Annotation providers. Labelled flows: Interface
Description, Annotation/description, Query & Retrieve, Workflow
Execution, Store data/knowledge, invoking, Query & register,
Data descriptions, Vocabulary.]

Courtesy of Professor Carole Goble, University of Manchester


BioHaystack + myGrid

Courtesy of Professor Carole Goble, University of Manchester


Thank you for your attention


Dennis Quan, dennisq@us.ibm.com (IBM Watson Research)



Haystack project home page (download available May 24)


http://haystack.lcs.mit.edu/


IBM LSID home page


http://www.ibm.com/developerworks/oss/lsid/


myGrid home page


http://www.mygrid.org.uk/



See also our session on constructing Haystack applications:


Developer’s Day, Saturday, 4:30pm