From biology to bioinformatics to e-Science and back again

Your Brains in My e-Laboratory

Feasting on brains with Taverna and Semantic Web tools

Marco Roos

acknowledging the AID team (Scott Marshall, Sophia Katrenko, Willem van Hage, Edgar Meij, Konstantinos Krommydas, Pieter Adriaans), Andrew Gibson, Martijn Schuemie, Piter de Boer, the myGrid team (in particular Katy Wolstencroft, Carole Goble, and Dave de Roure), OMII-UK and NBIC

Amsterdam, May 28, 2009


2

Marco Roos

Biologist and bioinformatician

Post-doc e-(bio)science, University of Amsterdam (BioRange/VL-e)

Project or Area Liaison (PAL) OMII-UK

Member UK e-Science All Hands Foundation

Member BioAssist programme committee NBIC

A biologist in e-Science

3

Mouse fibroblast (skin) cells

My primary motivation

Structure and function of DNA in the nucleus

Escherichia coli

5

/*
 * determines ridges in htm expression table
 */

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h" /* assumed to define TRUE/FALSE and declare the functions below */

int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
{
    char querystring[256];

    /* chromname is a string value, so it is quoted; inputs are assumed trusted */
    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = '%s' ORDER BY genstart",
             htmtablename, chromname);
    /* result is handed back through the pointer so the caller can use and clear it */
    *htmtable = PQexec(conn, querystring);

    return validquery(*htmtable, querystring);
}

int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
/* determines if mincount genes in a row are (part of) a ridge */
/* pre: htmtable is valid and sorted on genStart (ascending) */
/* post: TRUE if the mincount rows starting at row all have movmed39expr >= exprthreshold */
{
    if (mincount <= 0) return TRUE;
    if (row >= PQntuples(htmtable)) return FALSE;

    /* PQgetvalue returns text, so convert it before comparing with the threshold */
    if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold)
    {
        return FALSE;
    }
    return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
}

int main()
{
    PGconn *conn;          /* holds database connection */
    char querystring[256]; /* query string */
    PGresult *result;

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
    if (PQstatus(conn) == CONNECTION_BAD)
    {
        fprintf(stderr, "connection to database failed.\n");
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }
    else printf("Connection ok\n");

    sprintf(querystring, "SELECT * FROM chromosomes");
    printf("%s\n", querystring);
    result = PQexec(conn, querystring);
    if (validquery(result, querystring))
    {
        printresults(result);
    }
    else
    {
        PQclear(result);
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    PQclear(result);
    PQfinish(conn);
    return EXIT_SUCCESS;
}

int printresults(PGresult *tuples)
{
    int i;

    /* print the first column of at most ten rows */
    for (i = 0; i < PQntuples(tuples) && i < 10; i++)
    {
        printf("%d, ", i);
        printf("%s\n", PQgetvalue(tuples, i, 0));
    }
    return TRUE;
}

int validquery(PGresult *result, char *querystring)
{
    printf(" in validquery\n");
    if (PQresultStatus(result) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "Query %s failed.\n", querystring);
        return FALSE;
    }
    return TRUE;
}
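The ridge test in the listing above can also be tried without a database. The following is a minimal sketch, assuming the expression values (the movmed39expr column) have already been read into a plain array in chromosome order; the function name is_ridge_window and its parameters are hypothetical and not part of the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, database-free variant of is_ridge: returns 1 if the
 * mincount consecutive values starting at index row are all at or above
 * threshold, and 0 otherwise (also when the window runs past the end). */
int is_ridge_window(const double *expr, size_t n, size_t row,
                    double threshold, int mincount)
{
    assert(expr != NULL || n == 0);
    for (; mincount > 0; --mincount, ++row) {
        if (row >= n) return 0;              /* ran out of genes */
        if (expr[row] < threshold) return 0; /* gene below threshold */
    }
    return 1;
}
```

A loop replaces the recursion here, so the depth of the check does not depend on mincount; the logic is otherwise meant to mirror the recursive version.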

6

‘Old school’ bioinformatics approach

Local

Database

Local

Database

My tiny brain

8

9

Virtual professor

My ws

Your ws

My ws

Your ws

My ws

* From P.J. Verschure, Journal of Cellular Biochemistry 2006, vol. 99(1), pg 23-34

10

Combining expertise

Edgar Meij

Information retrieval expert

11

Combining expertise

Sophia Katrenko

Machine learning expert

12

Combining expertise

Willem van Hage

Semantic web expert

(and bass guitar player)

13

Combining expertise

Towards a knowledge framework

Computer scientist and bioinformatician

Scott Marshall

14

The AIDA toolbox, Web Services for knowledge extraction and knowledge management

15

AIDA toolbox

e-Science collaboration

16

“Collaboration through Web Services”

Bio-text mining expert

BioSemantics group,

Erasmus University Rotterdam

Martijn Schuemie

17

“Collaboration through Web Services”

Biological Database expert

Hideaki Sugawara

18

“Collaboration through Web Services”

e-bioscientist

19

An insightful computational experiment

Workflow paradigm for biologists

21

e-Science leveraging the use of more brains

Want this…

22

e-Science leveraging the use of more brains

…need this

23

Workflow and Semantic Web

Alpha version of Concept Web

24

Separation of models and instances in OWL

Biological model

(representing cartoon elements)

<myModel:HDAC1> <rdf:type> <myModel:Protein>

<myModel:Protein> <rdf:type> <owl:Class>
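The two statements above can be read as a two-step type chain: an instance is typed by a model class, and that class is itself typed as owl:Class. A minimal sketch of checking that chain over an in-memory list of triples follows; the struct and function names are hypothetical, and a real setup would query a triple store instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory triple: subject, predicate, object. */
struct triple { const char *s, *p, *o; };

/* Returns the object of the first (subject, rdf:type, ?) triple, or NULL. */
static const char *type_of(const struct triple *t, size_t n, const char *subject)
{
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].s, subject) == 0 && strcmp(t[i].p, "rdf:type") == 0)
            return t[i].o;
    return NULL;
}

/* Returns 1 if subject is typed by something that is itself an owl:Class,
 * i.e. subject is an instance rather than a model class. */
int is_model_instance(const struct triple *t, size_t n, const char *subject)
{
    const char *cls = type_of(t, n, subject);
    if (cls == NULL) return 0;
    const char *meta = type_of(t, n, cls);
    return meta != NULL && strcmp(meta, "owl:Class") == 0;
}
```

With the two triples from the slide, myModel:HDAC1 counts as an instance, while myModel:Protein does not, because owl:Class itself has no type in this small list.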

26

Model for text mining observations

27

Experiment log model

28

Mappings between Biological and other models

PRELIMINARY RESULTS

29

SELECT label(comment), label(query1), label(query2)
FROM {protein_instance} rdf:type {bio:Protein} rdf:type {owl:Class},
     {protein_instance} rdfs:comment {comment};
                        bioModel:isModelComponentOf {model1};
                        bioModel:isModelComponentOf {model2},
     {representation1} mappingModel:partially_represents {model1};
                       methodModel:has_query {query1},
     {representation2} mappingModel:partially_represents {model2};
                       methodModel:has_query {query2}
WHERE model1 != model2

Pseudo RDF query and results

Result 1
Protein: "protein referred to by as NF-kappaB and UniProt ID: P19838"
Query for model 1: "HDAC1 chromatin"
Query for model 2: "(Nutrician OR food) AND (chromatin OR epigenetics) AND (protein OR proteins)"

Result 2
Protein: "protein referred to by as p21 and UniProt ID: P38936"
Query for model 1: "HDAC1 chromatin"
Query for model 2: "(Nutrician OR food) AND (chromatin OR epigenetics) AND (protein OR proteins)"

Result 3
Protein: "protein referred to by as Bax and UniProt ID: P97436"
Query for model 1: "HDAC1 chromatin"
Query for model 2: "(Nutrician OR food) AND (chromatin OR epigenetics) AND (protein OR proteins)"

Protein

Protein name

Discovery process run

Service run

Creator

Run date & time

Document

references

discovered by

implemented by

run at

creator

has input

component of

UniProt:P19838

NF-KappaB

Conditional Random Fields

Protein Name Recognition

AIDA:applyCRF

Sophia Katrenko (UvA)

2008-11-18 03:29:30

PMID:17540846

references

discovered by

implemented by

run at

creator

has input

component of

Access to triples in Taverna via AIDA plugin

33

Knowledge mining

34

Knowledge mining:

my knowledge is mine, your knowledge is mine

35

Demonstrate Exploiting Brains (2x)

My ws

Your ws

My ws

Your ws

My ws

* From P.J. Verschure, Journal of Cellular Biochemistry 2006, vol. 99(1), pg 23-34

Computational brains

Biological brains

36

A typical biologist…

A needy biologist

Tiny brain

Lots of data to deal with

Lots of methods and algorithms to try and combine

No computational superpowers

Lots of knowledge to deal with

37

An enhanced biologist


An enhanced biologist

Many brains

Lots of data to support me

Web Services, Workflows, and their creators available

Other people’s computational superpowers

Knowledge bases to query

38

Publish and share on myExperiment.org

Publish & share research objects

39

e-Laboratory factories

40

http://www.epigenius.org/ (mock-up)

41

End of presentation...

Thank you

http://adaptivedisclosure.org


Are you willing to share your brain?