Bioinformatics - Reuna

30.04.2003

Bioinformatics

1

Bioinformatics on the way into the post-genomic era

AG Bioinformatik

GBF, Braunschweig

BIOBASE GmbH

Wolfenbüttel

Ingmar Reuter


Biobase GmbH, Wolfenbuettel

UKG

University of Göttingen


The ultimate goal of bioinformatics is to uncover the
wealth of biological information hidden in the mass of
data and to obtain a clearer insight into the fundamental
biology of organisms.

What is Bioinformatics

Bioinformatics is the application of computer technology to the
management and analysis of biological data.


What is Bioinformatics

Informatic approaches applied to biological objects:

Biological objects   Descriptive (databases)   Analytic            Synthetic
Sequences            EMBL, GenBank, DDBJ       Gene assembly       Probe and primer design
Molecules            PDB                       3D structures       Protein design
Mechanisms           TRANSFAC, TRANSPATH       Pathway modeling    Metabolic engineering, network simulation


Tasks of Bioinformatics

Bioinformatics as an interdisciplinary science has to:

- pick up, provide and apply the appropriate mathematical tools needed for tackling problems of systematic biology;
- develop appropriate algorithms and implement them as effective computer programs;
- provide a suitable knowledge base to specify the application of the developed tools;
- apply the developed tools to biological problems from basic and applied research;
- provide the required technical solutions for handling large amounts of biological data.


Levels of biological organization

Sequence level    Genomics
Molecular level   Proteomics
Cellular level    Cytomics
Organism level    Biodiversity
Species level     Ecology


Knowledge bases

Knowledge bases are required because:

- biological systems are extremely complex
- the overall space of biology is extremely large, much larger than biological reality
- there is no "world formula" to explain today's biological systems
- "naive" approaches fail


Biological systems are extremely complex, e.g. the human organism:

- 3 billion nucleotides harboring
- ~30,000 genes coding for
- 100,000-300,000 transcripts and
- 1-2 million proteins making up
- ~60 trillion cells of
- ~300 cell types in
- ~14,000 distinguishable morphological structures


The overall space of biology is extremely large, much larger than biological reality: the 3.6 to >100 million living species may correspond to as few as 0.1% of all species that ever existed.


Knowledge bases

Successful bioinformatics has to be knowledge-based


Knowledge bases = Biological databases

Bibliographic Databases

Taxonomic Databases

Nucleotide Databases

Genomic Databases

Protein Databases

Microarray Databases



Biological databases


Nucleotide databases

The International Nucleotide Sequence Database Collaboration:

- EMBL (Europe)
- DDBJ (Japan)
- GenBank (USA)

Public submissions from the scientific community.

The databases are distributed free of charge over the internet.

DDBJ, GenBank and EMBL-Bank exchange new and updated data on a daily basis.


Biological databases

Other examples:

- Metabolic pathways: KEGG
- Regulatory pathways: STKE, CSNDB, TRANSPATH
- Transcriptional control: TRRD, TRANSFAC, TRANSCompel, RegulonDB
- Pathological mutations: OMIM, PathoDB
- Regulatory domains: S/MARt DB
- Expression profiling: ArrayExpress, ExProfile


Post-genomic era

What did we do in the genomic era?

Wet lab: sequencing

Bioinformatics: sequence data storage, fragment assembly, first-pass annotation


Post-genomic era

First-pass annotation: more or less "obvious" sequence features, such as

- tRNA genes
- repeats
- ORFs (microbial genomes only)
- mapping of cDNAs / ESTs / earlier genomic fragments

Advanced sequence interpretation:

- ORFs in higher eukaryotes / splice site prediction
- regulated alternative splice signals
- promoter prediction
- enhancers, LCRs
- S/MARs
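The ORF search listed under first-pass annotation for microbial genomes can be illustrated with a minimal sketch. It assumes the standard start codon ATG and the stop codons TAA, TAG and TGA, scans all six reading frames, and ignores everything a real gene finder would add (alternative start codons, codon usage, overlapping genes). The function names and the `min_codons` parameter are illustrative choices, not part of any tool named on these slides.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Naive six-frame ORF scan.

    Returns (strand, start, end, orf) tuples. Coordinates are 0-based,
    end-exclusive, and refer to the strand that was scanned. The codon
    count used for the length filter includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i  # first ATG opens an ORF in this frame
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGCC"):
        print(orf)
```

The restriction to microbial genomes on the slide matters here: a scan like this only makes sense where genes are not interrupted by introns, which is why ORFs in higher eukaryotes appear under advanced sequence interpretation.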


Functional genomics

Exploring the biological function of genomic data

Wet lab:

- selective study of individual genes (conventional analysis, but the selection is now made from the whole genome of an organism)
- mass data on gene expression (e.g. microarray approaches)
- proteomics

Bioinformatics: new challenges


Bioinformatics for Functional Genomics

- to assign function to genomic data
- to connect conventional biochemistry and genetics with genome information
- to incorporate and interpret mass data generated with new methodologies
- to model complex networks
- to generate hypotheses and new knowledge out of the available data
- to bridge the genotype-phenotype gap



Gene regulation

[Figure: levels of gene regulation. DNA (amplification, methylation, chromatin structure) -> transcription -> RNA (splicing, degradation) -> translation -> protein (modification, degradation). Legend labels: information carrier 1, transformation, carrier organization, information carrier 2.]


Gene regulation

[Figure: General schema of the modular hierarchical structure of transcription regulatory regions of eukaryotic genes. Labels: promoter with TSS, TATA box and initiator (Inr); enhancer 1 and enhancer 2; boxes A, A', A'', B, C, D, D', E, F, G; composite element.]


Gene regulation

What is a transcription factor?

A transcription factor is a protein that regulates transcription after nuclear translocation by specific interaction with DNA or by interaction with a protein that can be assembled into a sequence-specific DNA-protein complex.


TRANSFAC (http://www.gene-regulation.com)

[Figure: TRANSFAC data model. Diagram labels: interacting factor, coding region, regulatory region, gene, expression. Tables: SITE, FACTOR, GENE, SYNONYMS, FEATURES, CLASS, SPECIES, MATRIX, SEQUENCE, METHOD, CELL.]


TRANSFAC

[Figure: links between the TRANSFAC tables (SITE, FACTOR, GENE, CLASS, MATRIX, SEQUENCE, REFERENCE) and other databases: FLYBASE, TRRD, EMBL, GENECARDS, EPD, PDB, PIR, SWISSPROT, BRENDA, CYTOMER, PROSITE, TRANSCOMPEL, PathoDB, S/MARtDB, TRANSPATH.]


BIOBASE databases

[Figure: BIOBASE databases arranged by scope (nucleus, cell, organism, biosphere): TRANSFAC, TRANSCompel, TRANSPATH, Cytomer, PathoDB, TRANSTax, S/MARt DB, TRANSGENOME. Further labels: gene product, transcription factors, inducers, components of signal transduction pathways.]


Array Analysis

The starting point: a set of induced genes from microarray experiments.

The conventional analysis: deduce the gene products and map them to the network of metabolic pathways (KEGG); this shows biochemical effects.

Extension of conventional analysis: map the induced gene products to the network of regulatory pathways (TRANSPATH); this shows biological effects.

Reasoning about the experimental findings: promoter analysis of induced genes connected to network mapping (KEGG, TRANSPATH); this leads to the identification of new targets.


Array Analysis

Promoter analysis identifies additional target genes and extends the affected network.

[Figure: promoter model; TRANSGENOME database; additional predicted genes; extended predicted network.]


Array Analysis

[Figure: overview of the array analysis workflow, starting from the microarray set of induced genes.
Causes branch (indirect hints on causes): retrieval of upstream sequences, promoter analysis, network analysis, new target; databases: TRANSPATH, TRANSFAC, TRANSGENOME.
Effects branch (modeling of effects): assignment of gene products, metabolic network mapping (KEGG), regulatory network mapping (TRANSPATH).]
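The "effects" side of the workflow, mapping a set of induced genes onto pathways, can be sketched in a few lines. The gene names, pathway names and the annotation table below are hypothetical placeholders; a real analysis would take the gene-to-pathway assignments from a resource such as KEGG or TRANSPATH, whose actual data formats are not reproduced here.

```python
from collections import defaultdict

# Hypothetical gene-to-pathway annotation, standing in for a pathway database.
GENE_TO_PATHWAYS = {
    "geneA": ["glycolysis"],
    "geneB": ["glycolysis", "TCA cycle"],
    "geneC": ["MAPK signaling"],
    "geneD": ["MAPK signaling"],
}


def map_genes_to_pathways(induced_genes, annotation):
    """Group induced genes by the pathways they are annotated to.

    Genes without any annotation are silently skipped.
    """
    hits = defaultdict(list)
    for gene in induced_genes:
        for pathway in annotation.get(gene, []):
            hits[pathway].append(gene)
    return dict(hits)


def rank_pathways(hits):
    """Order pathways by the number of induced genes mapped to them."""
    return sorted(hits.items(), key=lambda item: len(item[1]), reverse=True)


if __name__ == "__main__":
    induced = ["geneA", "geneB", "geneX"]  # geneX has no annotation
    for pathway, genes in rank_pathways(map_genes_to_pathways(induced, GENE_TO_PATHWAYS)):
        print(pathway, genes)
```

Counting hits per pathway is only the simplest possible read-out; it says nothing about statistical significance, which would need the background set of all genes on the array.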


Bioinformatics: new integrative approaches

The great number of tools and databases calls for sophisticated integrative approaches to make the data usable.

Two projects with different strategies:

- The HNB (Helmholtz Network for Bioinformatics)
- TEMBLOR (The European Molecular Biology Linked Original Resources)


About the HNB

HNB = Helmholtz Network for Bioinformatics

User-friendly web interface for integrating complex bioinformatics tasks

Available at http://www.hnbioinfo.de

Funded by the Federal Ministry of Education and Research (bmb+f)


HNB - Partners

Joint venture of

- 5 institutes of the Helmholtz Society
- 2 Max Planck institutes
- 1 university institute
- 2 other institutes (RZPD, EMBL)


HNB - Description

The HNB portal simplifies access to and handling of various bioinformatics resources by providing a problem- and task-oriented web interface that integrates many nucleic acid and protein analysis tools.

Problem-oriented approach: the user is guided by a detailed, tree-like questionnaire ('Guided Tool Finder') towards tools suitable for solving specific problems.

Task-oriented approach: solving more advanced biological problems that require the usage of more than one tool or database.


HNB - Resources

A broad variety of programs and databases for nucleic acid and protein analysis are offered within the HNB framework. They focus on

- Genome Analysis
- Protein Analysis
- Enzymes and Metabolism

For example: BLAST, FASTA, GENSCAN, PEDANT, BRENDA, TRANSFAC


HNB - Technical implementation

- Heterogeneous network of WWW servers (some protected by firewalls)
- Inter-server communication is done with HTTP/HTTPS as the transport layer for tunnelling other, XML-based communication protocols (including SOAP), which are used for the actual data exchange.
- User certificate-based authentication mechanism
- Anonymous access is also possible for many HNB resources, but with time-limited access to the user space.
- The user's input and output data are registered in a central 'virtual user space' and stored on one of the HNB servers for a defined period of time, allowing users to easily access their own data again for re-evaluation and re-use.
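To make the idea of tunnelling XML-based messages over HTTP/HTTPS concrete, here is a minimal sketch that builds a SOAP 1.1 request envelope with the Python standard library. Only the SOAP envelope namespace is standard; the task namespace `urn:example:tasks`, the operation name and the parameter names are hypothetical and do not describe the actual HNB message format, which these slides do not specify.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
TASK_NS = "urn:example:tasks"  # hypothetical namespace for the remote task


def build_soap_request(operation, params):
    """Serialize a SOAP 1.1 envelope with one operation element in the body.

    Each key/value pair in params becomes a child element of the operation.
    """
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{TASK_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{TASK_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # The resulting string would be sent as the body of an HTTP(S) POST.
    print(build_soap_request("runPromoterScan", {"sequence": "ACGT"}))
```

Because the envelope is plain text carried in an ordinary HTTP(S) request, it passes through firewalls that already allow web traffic, which fits the heterogeneous, partly firewalled server network described above.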



Gene Regulation Tasks I

Input: nucleic acid sequence

- Promoter Scan (PromoterInspector)
- TF IUPAC Scan (PatSearch)
- TF Matrix Scan (MatInspector)
- TF Scan
- RegRegion Analysis

Output: annotated nucleic acid sequence
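The two kinds of transcription factor scan named above, IUPAC consensus matching and matrix matching, can be sketched as follows. This is a simplified illustration and not the algorithm of PatSearch or MatInspector: the IUPAC scan turns the consensus into a regular expression, and the matrix scan scores each window as the sum of per-position weights divided by the best achievable sum. The example matrix is made up for the demonstration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_scan(sequence, consensus):
    """Return (position, match) for every occurrence of an IUPAC consensus.

    A lookahead is used so that overlapping matches are reported too.
    """
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


def matrix_scan(sequence, matrix, threshold):
    """Slide a weight matrix along the sequence.

    matrix is a list of dicts (one per position) mapping base -> weight.
    The score of a window is its weight sum divided by the maximal sum,
    so a perfect match scores 1.0. Windows at or above threshold are
    returned as (position, window, score).
    """
    seq = sequence.upper()
    width = len(matrix)
    best = sum(max(column.values()) for column in matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(col.get(base, 0) for col, base in zip(matrix, window)) / best
        if score >= threshold:
            hits.append((i, window, round(score, 3)))
    return hits


if __name__ == "__main__":
    seq = "GGTATAAAGG"
    print(iupac_scan(seq, "TATAWA"))
    toy_matrix = [{"T": 10}, {"A": 10}, {"T": 8, "A": 2}, {"A": 10}]  # made-up weights
    print(matrix_scan(seq, toy_matrix, threshold=0.9))
```

The matrix scan tolerates imperfect sites by lowering the threshold, whereas the IUPAC scan is all-or-nothing per position; that difference is the usual reason for offering both.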


Gene Regulation Tasks II

- Task launch
- Input form: highly configurable by metadata
- Help system


Gene Regulation Tasks III

Output: annotated nucleic acid sequence

- Linked to TRANSFAC
- Linked to EMBL



Gene Regulation Tasks IV

New: follow-up task selector

Seamless integration of

- data input
- data transformation
- data visualisation

Allows for the interactive exploration of the data's information content!


The HNB Consortium

Peer Bork, Christian Buning, Maik Christensen, Holger Claußen, Torsten Crass, Christian Ebeling, Peter Ernst, Valerie Gailus-Durner, Karl-Heinz Glatting, Rolf Gohla, Frank Gößling, Korbinian Grote, Alexander Herrmann, Sean O'Keeffe, Olaf Kiesslich, Jan O. Korbel, Thomas Lengauer, Ines Liebich, Mark van der Linden, Hannes Luz, Kathrin Meissner, Christian von Mering, Hans-Werner Mewes, Heinz-Theodor Mevissen, Martin Mokrejs, Tobias Müller, Heike Pospisil, Matthias Rarey, Jens G. Reich, Ralf Schneider, Dietmar Schomburg, Steffen Schulze-Kremer, Ingolf Sommer, Sandor Suhai, Gnanasekaran Thoppae, Martin Vingron, Jens Warfsmann, Thomas Werner, Daniel Wetzler, Edgar Wingender, Ralf Zimmer


About TEMBLOR

TEMBLOR = The European Molecular Biology Linked Original Resources

A new-generation bioinformatics project, centered on an integrated layer for the exploitation of genomic and proteomic data (Integr8)

The project is funded by the European Community under contract no. QLRI-CT-2001-00015

Start: 01.01.2002

Duration: 3 years


TEMBLOR - Partners

25 collaborators in 11 countries:

EUROPEAN MOLECULAR BIOLOGY LABORATORY, SAFFRON WALDEN, UNITED KINGDOM

CONSEJO SUPERIOR DE INVESTIGACIONES CIENTIFICAS, MADRID, SPAIN

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE, MARSEILLE, FRANCE

UNIVERSITE DE BORDEAUX I, TALENCE, FRANCE

SWISS INSTITUTE OF BIOINFORMATICS, GENEVE, SWITZERLAND

UNIVERSITY OF BERGEN, BERGEN, NORWAY

GBF - NATIONAL CENTRE FOR BIOTECHNOLOGY, BRAUNSCHWEIG, GERMANY

UNIVERSITY COLLEGE LONDON, LONDON, UNITED KINGDOM

UNIVERSITE CLAUDE BERNARD LYON 1, LYON, FRANCE

RZPD DEUTSCHES RESSOURCENZENTRUM FUER GENOMFORSCHUNG GMBH, BERLIN, GERMANY

UNIVERSITY OF ERLANGEN-NUREMBERG, ERLANGEN, GERMANY

UPPSALA UNIVERSITY, UPPSALA, SWEDEN

UNIVERSITY HOSPITAL UTRECHT, UTRECHT, NETHERLANDS

INSTITUT NATIONAL DE LA SANTE ET DE LA RECHERCHE MEDICALE, MARSEILLE, FRANCE

GLAXOSMITHKLINE RESEARCH AND DEVELOPMENT LIMITED, STEVENAGE, UNITED KINGDOM

THE HEBREW UNIVERSITY OF JERUSALEM - THE AUTHORITY FOR RESEARCH AND DEVELOPMENT, JERUSALEM, ISRAEL

UNIVERSITY OF SOUTHERN DENMARK - UNIVERSITY OF ODENSE, ODENSE, DENMARK

TECHNICAL UNIVERSITY OF DENMARK, LYNGBY, DENMARK

UNIVERSITY OF COPENHAGEN, COPENHAGEN, DENMARK

VRIJE UNIVERSITEIT BRUSSEL, RHODE-ST-GENESE, BELGIUM

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE, VILLEURBANNE, FRANCE

BIOBASE GMBH, WOLFENBUETTEL, GERMANY

UNIVERSITY OF DUNDEE, DUNDEE, UNITED KINGDOM

GENE-IT S.A., MONTESSON, FRANCE

COUNCIL FOR THE CENTRAL LABORATORY OF THE RESEARCH COUNCILS, WARRINGTON, UNITED KINGDOM

THE CHANCELLOR, MASTERS AND SCHOLARS OF THE UNIVERSITY OF CAMBRIDGE, CAMBRIDGE, UNITED KINGDOM

MAX-PLANCK-GESELLSCHAFT ZUR FOERDERUNG DER WISSENSCHAFTEN E.V., BERLIN, GERMANY


TEMBLOR - Description

The projects within TEMBLOR include:

- DESPRAD: standards and repositories for gene expression experiments
- EMSD: storing and analysing the structures of large molecules
- IntAct: standards and resources for protein-protein interaction data
- Integr8: an integrated layer for the exploitation of genomic and proteomic data



TEMBLOR - Resources

The resources that will be built by the DESPRAD, MSD and IntAct projects will be combined with other sources of genomic and proteomic information, including:

- EMBL: DNA sequences
- Swiss-Prot and TrEMBL: protein sequences
- InterPro: protein motifs
- Ensembl: genome annotation
- EMSD: European Macromolecular Structure Database
- ArrayExpress
- SWISS-2DPAGE
- EPD: Eukaryotic Promoter Database
- EPDEX, trEST and trGEN, HOBACGEN, HOVERGEN
- RZPD: Resource Centre/Primary Database
- TRANSFAC: Transcription Factor Database and its satellite databases


TEMBLOR - Integr8

Integr8 is the glue that holds all the TEMBLOR projects together.

Integr8 will connect all the information to allow easy access for researchers, making the value of the composite much greater than the sum of its parts.

Researchers will be able to perform text-, structure- and sequence-based searches to access gene sequences, their genomic context, transcripts, protein sequences and more.



TEMBLOR - Technical implementation

- One central relational database to accommodate the core data of the integr8ed databases
- Partner databases provide data necessary to map to the integrative layer
- Coarse-grain integration of all databases of the Integr8 partners by simple cross-referencing on database entry level
- Fine-grain integration of all databases of the Integr8 partners by cross-referencing on database entry and feature level
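The difference between entry-level and feature-level cross-referencing can be illustrated with a small relational sketch using an in-memory SQLite database. The table layout, column names and accession numbers below are assumptions made for the illustration; the actual Integr8 schema is not described on these slides.

```python
import sqlite3

# Hypothetical schema: one table per integration granularity.
SCHEMA = """
CREATE TABLE entry_xref (
    src_db TEXT, src_acc TEXT,
    dst_db TEXT, dst_acc TEXT
);
CREATE TABLE feature_xref (
    src_db TEXT, src_acc TEXT, src_feature TEXT,
    dst_db TEXT, dst_acc TEXT, dst_feature TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up link of each kind."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Entry-level link: whole record to whole record.
    con.execute("INSERT INTO entry_xref VALUES (?, ?, ?, ?)",
                ("EMBL", "X00001", "TRANSFAC", "R00001"))
    # Feature-level link: a named feature of one record to a feature of another.
    con.execute("INSERT INTO feature_xref VALUES (?, ?, ?, ?, ?, ?)",
                ("EMBL", "X00001", "promoter", "TRANSFAC", "R00001", "site"))
    return con


def coarse_links(con, db, acc):
    """Entry-level cross-references of one database entry."""
    rows = con.execute(
        "SELECT dst_db, dst_acc FROM entry_xref WHERE src_db = ? AND src_acc = ?",
        (db, acc))
    return rows.fetchall()


def fine_links(con, db, acc, feature):
    """Feature-level cross-references of one feature within an entry."""
    rows = con.execute(
        "SELECT dst_db, dst_acc, dst_feature FROM feature_xref "
        "WHERE src_db = ? AND src_acc = ? AND src_feature = ?",
        (db, acc, feature))
    return rows.fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print(coarse_links(con, "EMBL", "X00001"))
    print(fine_links(con, "EMBL", "X00001", "promoter"))
```

Entry-level links are cheap to maintain because each partner only has to export accession pairs, while feature-level links require agreement on how features inside a record are identified.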


TEMBLOR - Final statement

We have just finished year one and have already achieved excellent progress in the different projects.

In year two, the main focus of the project will be on integrating the various resources into the integrative layer.

In year three, our focus will be tool integration and service development.



Acknowledgements

UKG: T. Crass, M. Haubrock, I. Liebich, H. Michael, A. Potapov, T. Sauer, K. Seidl, E. Shelest

BIOBASE:

- TRANSFAC: E. Fricke, S. Land, V. Matys, R. Münch, M. Scheer, S. Thiele
- TRANSPATH: C. Choi, M. Krull, S. Pistor
- Bioinformatics: E. Gößling, K. Hornischer, A. Kel, B. Lewicki-Potapov, D. Tchekmenev, N. Voss
- TRANSCompel: O. Kel-Margoulis

Prof. Dr. Edgar Wingender
Department of Bioinformatics, UKG, University of Goettingen
BIOBASE GmbH, Wolfenbuettel


Thank you!