Why Bioinformatics?

underlingbuddhaBiotechnology

Oct 2, 2013


http://creativecommons.org/licenses/by-sa/2.0/

Bioinformatics

Prof: Rui Alves

ralves@cmb.udl.es

973702406

Dept. Ciències Mèdiques Bàsiques (Basic Medical Sciences),

1st Floor, Room 1.08

Website:
http://web.udl.es/usuaris/pg193845/testsite/


Course Website:
http://web.udl.es/usuaris/pg193845/Bioinformatics_2009/


Language of the course


Mine: English



Slides: English



Webpage: English



Yours: Whichever you choose as long as I
understand it. ALWAYS ASK WHEN YOU
DON’T UNDERSTAND SOMETHING!!

Web Page of the course

http://web.udl.es/usuaris/pg193845/Bioinformatics_2009/




There, you will find all the information about
your tasks, links to bioinformatics resources,
and the lectures.

Goals of this course


Give you an integrated view of how to use
computers and informatics to gain a
systemic understanding of biological
systems at the molecular level.


Integrate bioinformatics, mathematical
modelling and other areas of
computational biology to save lab work
and address problems that cannot yet be
solved in the lab.

Course Plan


First part of the course (2 weeks): Broad
introduction to bioinformatics and
computational biology in molecular biology.



Second part of the course: Problems for you to
solve in groups at home, plus in-depth lectures
about the different subjects you need to solve
the problems.

Evaluation Plan


5 tasks in groups of four. At the end of each
task you deliver a paper as a group (overall, all
tasks account for 50% of the final grade).


Final paper presenting the whole story
together (20%).


Individual discussion of the final paper with me
(20%).


Class participation (10%).


CAUTION: YOU NEED TO HAVE AT LEAST 5/10
IN EACH TASK, IN THE FINAL PAPER AND IN
THE DISCUSSION.

Index


Why bioinformatics?


Ontologies & Classification
schemes


Databases and servers

Why Bioinformatics?




What obvious problems do large-scale
sets create?




Imagine the 6 500 000 000 human beings born
within the last 130 years and still alive.


By and large, a majority of them have had an
education.


What problems need solving to provide that
education?


Knowledge

1. Organize knowledge

2. Organize its transmission

First problem: organizing
knowledge




We do not need to know all there is to know in
order to be productive in society.


Furthermore, we cannot learn everything at the
same time.


Problem: How to organize knowledge into bite-sized
packages that can be parceled out one after
another, and upon which one can build?

Organizing knowledge



Communication (read, write, count)

Humanities

Sciences



Second problem: organizing the
transmission of knowledge




The school system is a way in which the most
people can be trained with the least societal effort.


Not effective

Schools and books are the servers and
databases for educating people.



Users

Database

Server

New server: You

Hey, it’s raining!!! Why don’t we try and figure out how all the little molecular pieces in a cell work together?!?!?!

Understanding biological systems



We were WRONG!!!!!

I need more data!!! How do I plan what to do now?

The “omics” revolution in molecular
biology




Over many decades, a huge amount of biological data has
accumulated.



Unlike the “KNOWLEDGE” we discussed before, this data is
not well organized and the connections between the different
parcels of data are obscure.



The omics revolution has compounded this problem a
thousandfold, because data now accumulates faster than ever.



What is the “omics” revolution in
molecular biology?




The omics revolution is a period of about ten years in which
several different technologies were developed that can be
applied to study the complete molecular landscape of cells!!!


Genomics


Proteomics


Metabolomics


Et caeteromics


The “omics” revolution in molecular
biology





(We!!) Biologists want the data to
make sense and they want it now!!!


Understanding biological systems



I need more data!!! Why don’t they give it to me?

Comparison between the two problems

People organized the knowledge transmission system and its connections over millennia of trial and error.

It is impossible for people to organize the biological knowledge brought about by omics in the 10 years that have passed since the beginning of the omics era.

Why?




Data is not well classified.



Data is not well connected.



Data is not well understood.



Not enough people to do it in a short amount of
time.

New types of servers and databases are
required for very fast organization and
data mining



Users

Database

Server

BIOINFORMATICS!!


Development and application of
computational/informatic tools to the
solution of biological problems


The standard of internet bioinformatics:


What is Bioinformatics?

LAMP
L: Linux (operating system)
A: Apache (internet server)
M: MySQL (database server)
P: Perl, PHP, Python (programming language(s))


Java lets servers launch fewer processes by using the
clients’ machines for computation, which allows a larger
number of simultaneous connections.


Tomcat “talks” very well with Java.


The standards are changing

LTMJ
L: Linux (operating system)
T: Tomcat (internet server)
M: MySQL (database server)
J: Java (programming language(s))

What does a computer need
to be effective?




Well-classified data


Ontologies, Classification schemes



Well-organized data


Databases, servers



Good users

Index


Why bioinformatics?


Ontologies & Classification
schemes


Databases and servers

Ontologies and classification
schemes for data


Biological Classification Schemes


What is an ontology (in the biological sense)?


A set of definitions forming a controlled
vocabulary, with hierarchical relationships
among its terms, that computers can easily
process.

What are bio-ontologies?

Biological ontologies (bio-ontologies) can be defined as complex
hierarchical structures in which biological concepts are
described by their meanings (definitions) and their relationships
to each other.


There are many bio-ontologies available and in use by databases.
The Plant Ontology, along with other ontologies such as the
Gene Ontology, is included in the open-source Open Biological
Ontologies project at SourceForge.


http://obofoundry.org/


The Gene Ontology

The best-known example of a bio-ontology is the Gene Ontology
(GO; http://www.geneontology.org), which describes three
biological domains: cellular component (where the gene product
is located), molecular function (what the gene product does) and
biological process (the cellular, developmental or physiological
events the gene product is involved in).

GO terms are used to describe gene products. Because these
descriptions are independent of species-specific nomenclature and
are applied uniformly, it is possible to make meaningful and
efficient comparisons of genes across diverse taxa.
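Because GO terms are shared across species, one simple way to compare two gene products is to compare their sets of GO terms. The sketch below uses the Jaccard index (shared terms divided by all terms) as the similarity measure; that choice is an assumption for illustration, not something the slides prescribe, and the two gene annotations are hypothetical.

```python
def jaccard(terms_a: set[str], terms_b: set[str]) -> float:
    """Fraction of GO terms shared by two gene products (0.0 to 1.0)."""
    if not terms_a and not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


# Hypothetical annotations for a human and a yeast gene product.
# The term names are illustrative placeholders, not curated GO annotations.
human_gene = {"protein binding", "DNA repair", "nucleus"}
yeast_gene = {"protein binding", "DNA repair", "cytoplasm"}

print(round(jaccard(human_gene, yeast_gene), 2))  # 2 shared / 4 total -> 0.5
```

Real GO-based similarity measures usually also exploit the hierarchy (a shared parent term counts for something), which plain set overlap ignores.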

Three “Super Categories” of GO


Molecular Function (what)


Tasks performed at the molecular level


Biological Process (why)



How it pertains to the organism


Cellular Component (where)


Its location

Example


Gene Name: BRCA1


Molecular Function: protein binding


Biological Process: DNA Replication and
Chromosome Cycle


Cellular Component: nucleus

Structure of GO


How to define the relationship between concepts?


Example: How to relate the terms “cell”, “nucleus”
and “membrane”?
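GO relates terms through a directed acyclic graph with typed relations such as is_a and part_of, and an annotation to a term is also taken to hold for that term's ancestors (the "true path rule"). The sketch below encodes a toy fragment relating "cell", "nucleus" and "membrane" and walks it upward; the edges are simplified for illustration and do not reproduce the actual GO graph.

```python
from collections import deque

# Toy ontology fragment: child term -> list of (relation, parent term).
# Illustrative only; the real GO graph is far larger and more detailed.
EDGES = {
    "nucleus": [("part_of", "cell")],
    "plasma membrane": [("is_a", "membrane"), ("part_of", "cell")],
    "nuclear membrane": [("is_a", "membrane"), ("part_of", "nucleus")],
}


def ancestors(term: str) -> set[str]:
    """Return every term reachable by following is_a/part_of edges upward."""
    seen: set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in EDGES.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("nuclear membrane")))
# ['cell', 'membrane', 'nucleus']
```

A term may have several parents ("nuclear membrane" is both a kind of membrane and part of the nucleus), which is why the structure is a graph rather than a simple tree.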








How is GO Annotated?


Manual


Humans sifting through primary literature


Electronic


Assign GO Terms using already existing
information in databases.

Evidence Code for GO
Annotation

IEA: Inferred from Electronic Annotation
ISS: Inferred from Sequence Similarity
IEP: Inferred from Expression Pattern
IMP: Inferred from Mutant Phenotype
IGI: Inferred from Genetic Interaction
IPI: Inferred from Physical Interaction
IDA: Inferred from Direct Assay
RCA: Inferred from Reviewed Computational Analysis
TAS: Traceable Author Statement
NAS: Non-traceable Author Statement
IC: Inferred by Curator
ND: No biological data available

Detailed info available from:
http://www.geneontology.org/doc/GO.evidence.html
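Evidence codes let an analysis choose which annotations to trust. A frequent choice, assumed here for illustration, is to drop IEA annotations, since they are assigned electronically without review by a curator. The `Annotation` type, the `filter_by_evidence` helper and the annotation records below are hypothetical.

```python
from typing import NamedTuple

# Evidence codes as listed above.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological data available",
}


class Annotation(NamedTuple):
    gene: str
    go_term: str
    evidence: str


def filter_by_evidence(annotations, excluded=frozenset({"IEA"})):
    """Keep annotations whose evidence code is known and not in `excluded`."""
    kept = []
    for ann in annotations:
        if ann.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {ann.evidence}")
        if ann.evidence not in excluded:
            kept.append(ann)
    return kept


# Hypothetical annotations, for illustration only.
annotations = [
    Annotation("geneA", "protein binding", "IPI"),
    Annotation("geneA", "nucleus", "IEA"),
    Annotation("geneB", "DNA repair", "IMP"),
]
for ann in filter_by_evidence(annotations):
    print(ann.gene, ann.go_term, ann.evidence)
# geneA protein binding IPI
# geneB DNA repair IMP
```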

How to use GO in data analysis


Simple Queries


Find over-represented GO categories in a list of
genes


Search Biological “Themes”


Binning


Obtain a broad view of the distribution of major GO
terms in a list of genes.


Clustering Genes on GO terms


Group together functionally related genes based on
GO terms.
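A common way to find over-represented GO categories (assumed here; the slides do not name a test) is the one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test: if K of the N background genes carry a term, how likely is it to see at least k of them in a list of n genes by chance? The sketch below computes that tail probability in plain Python; the numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) where X is hypergeometric.

    N: genes in the background (e.g. the genome or the array)
    K: background genes annotated with the GO term
    n: genes in the list of interest
    k: genes in the list annotated with the GO term
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20 background genes, 5 carry the term,
# and 3 of the 4 genes in our list carry it.
p = enrichment_p_value(N=20, K=5, n=4, k=3)
print(round(p, 4))  # 0.032
```

When many GO terms are tested at once, the p-values should also be corrected for multiple testing before calling a category over-represented.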


GO Tools


NetAffx


Get GO Annotation


AmiGO


Browser and Simple Queries


GoTermMapper


Binning (GO Slim)


GeneToolBox




Finding over-represented GO categories


Clustering based on similar GO terms


Querying for genes with similar function
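Binning with a GO Slim (a reduced set of broad GO terms) maps each detailed annotation up to the broad terms it descends from, then counts genes per broad term. The sketch below shows that idea only; it is not GoTermMapper's actual algorithm, and the parent links, slim terms and gene annotations are hypothetical simplifications.

```python
from collections import Counter

# Hypothetical, simplified parent links (child -> parents).
PARENTS = {
    "DNA repair": ["DNA metabolic process"],
    "DNA replication": ["DNA metabolic process"],
    "DNA metabolic process": ["metabolic process"],
    "glycolytic process": ["metabolic process"],
    "mitotic cell cycle": ["cell cycle"],
}

# Hypothetical slim: the broad terms we want to bin into.
SLIM = {"metabolic process", "cell cycle"}


def slim_terms(term: str) -> set[str]:
    """Return the slim terms that `term` equals or descends from."""
    found, stack, seen = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in SLIM:
            found.add(current)
        stack.extend(PARENTS.get(current, []))
    return found


def bin_genes(gene_terms: dict[str, set[str]]) -> Counter:
    """Count genes per slim term; a gene counts at most once per bin."""
    counts = Counter()
    for terms in gene_terms.values():
        bins = set()
        for term in terms:
            bins |= slim_terms(term)
        counts.update(bins)
    return counts


genes = {
    "geneA": {"DNA repair", "DNA replication"},
    "geneB": {"glycolytic process"},
    "geneC": {"mitotic cell cycle", "DNA repair"},
}
print(sorted(bin_genes(genes).items()))
# [('cell cycle', 1), ('metabolic process', 3)]
```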

GO is not very good


EC numbers


Protein classification schemes


TF classification schemes


Transport proteins classification schemes


Etc.

The EC number database


The BRENDA database


The TF classification database


The signal transduction
classification database


The transport proteins
classification database


All these classifications are reminiscent
of the Dewey classification system for
books!!!! (Remember public libraries?)
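EC numbers show the Dewey-like idea well: each of the four dot-separated fields narrows the class, so enzymes in the same subclass share a prefix. The sketch below splits an EC number into its nested levels; hexokinase (EC 2.7.1.1) is used as the example. The helper names are hypothetical, and the class list reflects the current EC scheme, in which class 7 (translocases) was added in 2018.

```python
# Top-level EC classes (class 7, translocases, was added in 2018).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def ec_levels(ec: str) -> list[str]:
    """Split an EC number like '2.7.1.1' into nested prefixes, broadest first."""
    parts = ec.removeprefix("EC ").split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four fields in an EC number, got {ec!r}")
    return [".".join(parts[: i + 1]) for i in range(4)]


def ec_class_name(ec: str) -> str:
    """Name of the top-level class an EC number belongs to."""
    return EC_CLASSES[int(ec_levels(ec)[0])]


print(ec_levels("EC 2.7.1.1"))   # ['2', '2.7', '2.7.1', '2.7.1.1']
print(ec_class_name("2.7.1.1"))  # Transferases
```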

A general protein classification
database


What does a computer need
to be effective?




Well-classified data


Ontologies, Classification schemes



Well-organized data


Databases, servers


Index


Why bioinformatics?


Ontologies & Classification
schemes


Databases and servers

Databases & Servers


What is a Database?


A database is a collection of data organized
in such a way that it is easy to store in a
computer and to mine with appropriate
software.


A database is usually organized as a set of
tables, each storing information about one
type of object.


The tables are related to each other in
different ways.
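Tables are usually related through shared key columns. The sketch below uses Python's built-in `sqlite3` module (standing in for a server database such as MySQL) with a hypothetical two-table schema: one table for genes and one for their GO annotations, linked by `gene_id`. The BRCA1 terms come from the GO example earlier in these slides.

```python
import sqlite3

# In-memory database with two related tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY,
        symbol  TEXT NOT NULL,
        species TEXT NOT NULL
    );
    CREATE TABLE annotation (
        gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
        go_term TEXT NOT NULL
    );
    INSERT INTO gene VALUES (1, 'BRCA1', 'Homo sapiens');
    INSERT INTO annotation VALUES (1, 'protein binding'), (1, 'nucleus');
    """
)

# The shared gene_id column relates the two tables; a JOIN follows that link.
rows = conn.execute(
    """
    SELECT g.symbol, a.go_term
    FROM gene AS g
    JOIN annotation AS a ON a.gene_id = g.gene_id
    ORDER BY a.go_term
    """
).fetchall()
print(rows)  # [('BRCA1', 'nucleus'), ('BRCA1', 'protein binding')]
```

Keeping annotations in their own table means a gene with many GO terms needs many annotation rows but only one gene row.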

What does database technology
allow?




Making information useful



Avoiding “accidental disorganisation”



Making information easily accessible
and integrated with the rest of our work

SQL (Structured Query Language)


ANSI (American National Standards
Institute) standard computer language for
accessing and manipulating database
systems.


SQL statements are used to retrieve and
update data in a database.


Includes:


Data Manipulation Language (DML)


Data Definition Language (DDL)
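The split between the two parts of SQL can be seen in a few statements. The sketch below runs them through Python's `sqlite3` module; SQLite's dialect differs from ANSI SQL in places, but these statements are standard. The table and the accessions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL (Data Definition Language): defines the structure of the database.
conn.execute("CREATE TABLE protein (accession TEXT PRIMARY KEY, length INTEGER)")

# DML (Data Manipulation Language): inserts, updates and retrieves the data.
# The ? placeholders let sqlite3 pass values safely instead of pasting strings.
conn.executemany(
    "INSERT INTO protein (accession, length) VALUES (?, ?)",
    [("protA", 120), ("protB", 450)],  # hypothetical accessions
)
conn.execute("UPDATE protein SET length = ? WHERE accession = ?", (455, "protB"))
long_proteins = conn.execute(
    "SELECT accession FROM protein WHERE length > ? ORDER BY accession", (200,)
).fetchall()
print(long_proteins)  # [('protB',)]
```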

Web Databases


Data is accessible through the Internet


They have different underlying database
models


Example: biological databases


Molecular data: NCBI, Swiss-Prot, PDB, KEGG, GO


Protein interaction: DIP, BIND


Organism-specific: Mouse, Worm, Yeast


Literature: PubMed


Disease: OMIM
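Web databases are typically queried by sending a URL to the server. As one example, NCBI's E-utilities expose an `esearch` service that searches a database such as PubMed; the sketch below only builds the query URL (it sends nothing), using the `db`, `term`, `retmax` and `retmode` parameters. The helper name and the search term are illustrative.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities esearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch query URL that returns JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


url = esearch_url("pubmed", "BRCA1 AND gene ontology", retmax=5)
print(url)
# To actually run the query: urllib.request.urlopen(url).read()
```

Before sending real queries, check the E-utilities documentation: NCBI asks users to respect request-rate limits and to identify their tool.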

How to make databases useful


Attach them to a server


Let people use them to mine for knowledge
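The pattern of attaching a database to a server can be sketched without Apache, PHP or MySQL. The example below swaps in Python's standard-library WSGI interface and `sqlite3` to show the same idea: a request arrives, the server queries the database, and the answer goes back to the user. The URL parameter `gene`, the table and its contents are hypothetical.

```python
import sqlite3
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory database standing in for the MySQL back end.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE annotation (symbol TEXT, go_term TEXT)")
DB.executemany(
    "INSERT INTO annotation VALUES (?, ?)",
    [("BRCA1", "protein binding"), ("BRCA1", "nucleus")],
)


def app(environ, start_response):
    """WSGI app: ?gene=SYMBOL returns that gene's GO terms as plain text."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    symbol = query.get("gene", [""])[0]
    rows = DB.execute(
        "SELECT go_term FROM annotation WHERE symbol = ? ORDER BY go_term", (symbol,)
    ).fetchall()
    body = "\n".join(term for (term,) in rows) or "no annotations found"
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]


def call(query_string: str) -> str:
    """Call the app in-process (no network) with a test WSGI environment."""
    environ = {"QUERY_STRING": query_string}
    setup_testing_defaults(environ)
    return b"".join(app(environ, lambda status, headers: None)).decode("utf-8")


print(call("gene=BRCA1"))
# nucleus
# protein binding

# To serve it for real:
# wsgiref.simple_server.make_server("", 8000, app).serve_forever()
```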

An example of WAMP




The bioinformatics class server


Wireless

Apache

MySQL

PHP

Summary


Why bioinformatics:


Because there is simply too much data out
there for human beings to deal with without
computer assistance.


Because many of the calculations to extract
knowledge from the data would take too long
without computers.


How to do bioinformatics:


Organize data well using appropriate
classification systems.


Use databases and server technology.