WHAT IS BIOINFORMATICS?

Oct 23, 2013


Daniel Svozil

Definition

NCBI
Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

Wikipedia.org
- The application of information technology and statistics to the field of molecular biology.
- The creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management, analysis and interpretation of biological data.

http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html

Extraction of biological knowledge from data (figure): data, either experimental or from public databases, are converted to knowledge, which is used to generate new hypotheses and to design new experiments.

Omes (figure): genome, transcriptome, proteome, reactome, tissue architectures, cell interactions, signaling, ……, metabolome; levels: cell, organism

genome: the entire DNA sequence of an organism

transcriptome: all mRNA of an organism

proteome: all proteins in an organism

metabolome: all metabolites in an organism

interactome: all molecular interactions in an organism

Omes and Omics

Genomics
- Primarily sequences (DNA and RNA)
- Databanks and search algorithms
- Supports studies of molecular evolution

Proteomics
- Sequences (protein) and structures
- Mass spectrometry, X-ray crystallography
- Databanks, knowledge bases, visualization

Functional genomics (transcriptomics)
- Microarray data
- Databanks, analysis tools, controlled terminologies

Systems biology (metabolomics)
- Metabolites and interacting systems (interactomics)
- Graphs, visualization, modeling, networks of entities


Omics (figure)
Omics (genomics, transcriptomics, proteomics, metabolomics, interactomics, ……) is measured by sequencing, microarrays, LC/MS, NMR, two-hybrid, ……. Their data are high-throughput and high-noise. To reduce noise, advanced pre-processing techniques are applied, giving reliable high-throughput information; techniques to analyze high-dimensional data and knowledgebases then lead to biological knowledge, medical knowledge, and improved health.

source: Bios 560R Introduction to Bioinformatics, userwww.service.emory.edu/~tyu8/560R/560R_1.pptx

Key research areas in bioinformatics

- sequence bioinformatics
- structural bioinformatics
- systems biology: analysis of biological pathways to gain, e.g., an understanding of disease processes

21st century: complex systems

- Designing (forward engineering)
- Understanding (reverse engineering)
- Fixing

Why is it so complex?
Can we make sense of this complexity?
How is it robust?

http://yilab.bio.uci.edu/ICSB2007_Tutorial_AM1.htm

STUDYING GENOMES

Studying DNA

Enzymes for DNA manipulation

- Before the 1970s, the only way in which individual genes could be studied was by classical genetics.
- In the early 1970s, biochemical research provided molecular biologists with enzymes that could be used to manipulate DNA molecules in the test tube.
- Molecular biologists adopted these enzymes as tools for manipulating DNA molecules in predetermined ways, using them to make copies of DNA molecules, to cut DNA molecules into shorter fragments, and to join them together again in combinations that do not exist in nature.
- These manipulations form the basis of recombinant DNA technology.

Recombinant DNA technology

The enzymes available to the molecular biologist fall into four broad categories:

1. DNA polymerases: synthesize new polynucleotides complementary to an existing DNA or RNA template
2. Nucleases: degrade DNA molecules by breaking the phosphodiester bonds
   - restriction endonucleases (restriction enzymes): cleave DNA molecules only when a specific DNA sequence is encountered
3. Ligases: join DNA molecules together
4. End modification enzymes: make changes to the ends of DNA molecules

source: Brown T. A., Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/
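Because a DNA polymerase pairs each incoming nucleotide with the template (A with T, G with C) and builds the new strand antiparallel to it, the sequence it produces is fully determined by the template. A minimal Python sketch of that pairing rule, assuming a DNA template written 5'->3'; the function name `complementary_strand` is hypothetical, and the code models only base pairing, not the enzyme itself.

```python
# Watson-Crick base pairing for DNA
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Return the strand a DNA polymerase would build on `template` (hypothetical helper).

    `template` is read 5'->3'. The new strand is antiparallel, so reading it
    5'->3' gives the reverse complement of the template.
    """
    template = template.upper()
    unknown = set(template) - set(PAIRS)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    return "".join(PAIRS[base] for base in reversed(template))


if __name__ == "__main__":
    print(complementary_strand("ATGGCC"))  # GGCCAT
```

An RNA template (containing U, which pairs with A) is deliberately left out of this sketch.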

DNA cloning

- DNA cloning (i.e. copying): a logical extension of the ability to manipulate DNA molecules with restriction endonucleases and ligases
- vector
  - A DNA molecule that can replicate inside a host cell, usually a bacterium.
  - It consists of an insert (transgene) and a larger sequence serving as the backbone of the vector.
  - Used to introduce a specific gene into a target cell. Once an expression vector is inside the cell, the protein encoded by the gene is produced by the cellular transcription and translation machinery (ribosomal complexes).
  - plasmid (insert length: 1-10 kbp), cosmid (40-45 kbp), BAC (100-350 kbp), YAC (1.5-3.0 Mbp)
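The insert sizes listed above decide which vector can carry a given fragment. A small Python lookup, assuming the ranges exactly as listed (converted to kbp, so the YAC range becomes 1500-3000); the table and function names are hypothetical, and a size falling between the ranges simply matches none of these four vectors.

```python
from typing import List

# Insert-size ranges in kbp, copied from the list above.
VECTOR_INSERT_RANGES_KBP = {
    "plasmid": (1, 10),
    "cosmid": (40, 45),
    "BAC": (100, 350),
    "YAC": (1500, 3000),
}


def suitable_vectors(insert_kbp: float) -> List[str]:
    """Vector types whose listed insert-size range covers `insert_kbp`."""
    if insert_kbp <= 0:
        raise ValueError("insert size must be positive")
    return [
        name
        for name, (low, high) in VECTOR_INSERT_RANGES_KBP.items()
        if low <= insert_kbp <= high
    ]


if __name__ == "__main__":
    for size in (5, 42, 250, 2000, 20):
        print(size, "kbp ->", suitable_vectors(size) or "none of the listed vectors")
```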


Vectors

- plasmid
  - A DNA molecule that is separate from, and can replicate independently of, the chromosomal DNA.
  - Double-stranded, usually circular, occurs naturally in bacteria.
  - Serves as an important tool in genetics and biotechnology labs, where it is commonly used to multiply (clone) or express particular genes.
- BAC (bacterial artificial chromosome)
  - A vector based on a particular plasmid (the F plasmid) found in E. coli.
  - A typical BAC can carry about 250 kbp.

source: Wikipedia

source: Brown T. A., Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/

Figure: DNA cloning (cutting with a restriction endonuclease, joining with a ligase)

PCR (polymerase chain reaction)

- DNA cloning results in the purification of a single fragment of DNA from a complex mixture of DNA molecules.
- Major disadvantage: it is a time-consuming (several days to produce recombinants) and, in parts, difficult procedure.
- The next major technical breakthrough (1983) after gene cloning was PCR.
- It achieves the amplification of a short fragment of a DNA molecule in a much shorter time, just a few hours.
- PCR is complementary to, not a replacement for, cloning because it has its own limitations: the sequence of at least part of the fragment must be known.
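Two points above can be made concrete: PCR amplifies only the stretch lying between two primers, which is why part of the sequence must be known, and each cycle ideally doubles the number of copies, which is why a few hours suffice. A Python sketch, assuming exact primer matches, a single binding site per primer, and perfect doubling (real reactions are less efficient); `predicted_amplicon`, `ideal_copies`, and the toy sequence are hypothetical.

```python
from typing import Optional

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Fragment amplified from `template` by two primers, or None if either fails to bind.

    Both primers are written 5'->3'. The forward primer matches the top strand;
    the reverse primer anneals to the bottom strand, so its reverse complement
    must occur in `template` downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def ideal_copies(initial_copies: int, cycles: int) -> int:
    """Copies after `cycles` rounds if every cycle doubles every molecule."""
    return initial_copies * 2 ** cycles


if __name__ == "__main__":
    template = "TTTATCCTTAGCAAAGGCCCTAAA"
    print(predicted_amplicon(template, "ATCCTT", "AGGGCC"))  # ATCCTTAGCAAAGGCCCT
    print(ideal_copies(1, 30))  # 1073741824
```

Thirty cycles is only an illustrative choice; the point is that the copy number grows as 2 raised to the number of cycles.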

Mapping genomes

What is it about?

- Assigning a specific gene to a particular region of a chromosome, and determining the locations of and relative distances between genes on the chromosome.
- There are two types of maps:
  - genetic linkage map: shows the arrangement of genes (or other markers) along the chromosomes, as calculated from the frequency with which they are inherited together
  - physical map: a representation of the chromosomes giving the physical distance between landmarks on the chromosome, ideally measured in nucleotide bases
- The ultimate physical map is the complete sequence itself.

Genetic linkage map

- Constructed by observing how frequently two markers (e.g. genes; other kinds of markers follow on the next slides) are inherited together.
- Two markers located on the same chromosome can be separated only through recombination.
- If they are separated, the chromosome a child inherits carries only one marker of the original pair.
- However, the closer the markers are to each other, the more tightly linked they are and the less likely recombination is to separate them; they tend to be passed together from parent to child.
- Recombination frequency therefore provides an estimate of the distance between two markers.
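Recombination frequency is the share of offspring that received a recombinant combination of the two markers. A Python sketch, assuming the parent is heterozygous at both markers, its phase (which alleles sit together on one chromosome) is known, and the haplotype each child received from that parent can be read off; the allele labels A/a and B/b and the counts are invented for illustration, and the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

# Alleles a child received from one parent at marker 1 and marker 2
Haplotype = Tuple[str, str]


def recombination_frequency(observed: Iterable[Haplotype], parental: Set[Haplotype]) -> float:
    """Fraction of transmitted haplotypes that are not one of the parental combinations."""
    observed = list(observed)
    if not observed:
        raise ValueError("no offspring scored")
    recombinants = sum(1 for haplotype in observed if haplotype not in parental)
    return recombinants / len(observed)


if __name__ == "__main__":
    # Parent carries A with B on one chromosome and a with b on the other.
    parental = {("A", "B"), ("a", "b")}
    offspring = (
        [("A", "B")] * 45 + [("a", "b")] * 45   # parental (non-recombinant)
        + [("A", "b")] * 5 + [("a", "B")] * 5   # recombinant
    )
    print(recombination_frequency(offspring, parental))  # 0.1
```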


Genetic linkage map

- On genetic maps, distances between markers are measured in centimorgans (cM).
- Two markers 1 cM apart are separated by recombination 1% of the time.
- 1 cM is ROUGHLY equal to a physical distance of 1 Mbp in humans.

Value of a genetic map

- marker analysis
  - An inherited disease can be located on the map by following the inheritance of a DNA marker present in affected individuals (but absent in unaffected individuals), even though the molecular basis of the disease may not yet be understood nor the responsible gene identified.
  - This represents a cornerstone of testing for genetic diseases.
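Following a marker through a family amounts to checking whether marker status and disease status travel together. A minimal Python tally, assuming each family member has been typed for presence or absence of the marker; the `FamilyMember` record, the `cosegregation` function, and the family are hypothetical, and a real study would apply a formal statistical linkage test across many families rather than a raw count.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FamilyMember:
    name: str
    affected: bool    # shows the inherited disease
    has_marker: bool  # carries the DNA marker allele being followed


def cosegregation(family: List[FamilyMember]) -> Tuple[int, int]:
    """Count members whose marker status matches their disease status."""
    concordant = sum(1 for m in family if m.has_marker == m.affected)
    return concordant, len(family)


if __name__ == "__main__":
    family = [
        FamilyMember("father", affected=True, has_marker=True),
        FamilyMember("mother", affected=False, has_marker=False),
        FamilyMember("child 1", affected=True, has_marker=True),
        FamilyMember("child 2", affected=False, has_marker=False),
        FamilyMember("child 3", affected=True, has_marker=True),
        FamilyMember("child 4", affected=False, has_marker=True),  # discordant, e.g. a recombinant
    ]
    concordant, total = cosegregation(family)
    print(f"{concordant}/{total} members concordant")  # 5/6 members concordant
```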

Genetic markers

- A genetic map must show the positions of distinctive features: markers.
- Any inherited physical or molecular characteristic that differs among individuals and is easily detectable in the laboratory is a potential genetic marker.
- Markers can be
  - expressed DNA regions (genes), or
  - DNA segments that have no known coding function but whose inheritance pattern can be followed.
- genes: not ideal as markers; in larger genomes (e.g. vertebrates), gene maps are not very detailed because of the low gene density

Genetic markers

- Must be polymorphic, i.e. alternative forms (alleles) must exist among individuals so that the marker can be distinguished among different members in family studies.
- Variations within exons (the coding parts of genes) can lead to observable changes (e.g. eye color).
- Most variations occur in non-coding DNA such as introns; they have little or no effect on an organism, yet they are detectable at the DNA level and can be used as markers.

1. restriction fragment length polymorphisms (RFLPs)
2. simple sequence length polymorphisms (SSLPs)
3. single nucleotide polymorphisms (SNPs, pronounced “snips”)

RFLPs

- Recall that restriction enzymes cut DNA molecules at specific recognition sequences.
- This sequence specificity means that treatment of a DNA molecule with a restriction enzyme should always produce the same set of fragments.
- This is not always the case with genomic DNA molecules, because some restriction sites exist as two alleles: one allele displays the correct sequence for the restriction site and is therefore cut, while the second allele carries a sequence alteration so that the restriction site is no longer recognized.

source: Brown T. A., Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/
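The RFLP idea can be reproduced on toy sequences: cut two alleles at every copy of a recognition site and compare the fragment lengths. A Python sketch using the EcoRI site GAATTC, which EcoRI cuts between G and A; the sequences are invented, only the top strand of a linear molecule is considered, and `digest` is a hypothetical helper.

```python
from typing import List


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Fragment lengths after cutting linear `seq` at every occurrence of `site`.

    `cut_offset` is where the top strand is cut inside the site
    (EcoRI cuts G^AATTC, so the default offset is 1).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two alleles of the same region; in allele 2 a single base change
# (GAATTC -> GAGTTC) destroys the second restriction site.
allele_1 = "A" * 10 + "GAATTC" + "A" * 20 + "GAATTC" + "A" * 10
allele_2 = "A" * 10 + "GAATTC" + "A" * 20 + "GAGTTC" + "A" * 10

if __name__ == "__main__":
    print(digest(allele_1))  # [11, 26, 15]
    print(digest(allele_2))  # [11, 41]
```

On a gel, the 26 bp and 15 bp bands of allele 1 versus the single 41 bp band of allele 2 are what distinguish the two alleles.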

SSLPs

- Repeat sequences that display length variations: different alleles contain different numbers of repeat units (i.e. SSLPs are multi-allelic).
  - variable number of tandem repeat sequences (VNTRs, minisatellites): repeat unit up to 25 bp in length
  - simple tandem repeats (STRs, microsatellites): repeats are shorter, usually di- or tetranucleotide

source: Brown T. A., Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/
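Because SSLP alleles differ only in the number of repeat units, an allele can be described by its repeat count. A Python sketch that finds the longest uninterrupted run of a motif, assuming a CA dinucleotide microsatellite and two invented alleles; `longest_tandem_repeat` is hypothetical, and in the laboratory the repeat number is usually inferred from the length of a PCR product rather than by reading the sequence.

```python
def longest_tandem_repeat(seq: str, motif: str) -> int:
    """Largest number of consecutive, back-to-back copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must not be empty")
    seq, motif = seq.upper(), motif.upper()
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        while seq.startswith(motif, pos):  # extend the run one repeat unit at a time
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


# Two hypothetical alleles of one microsatellite locus: (CA)12 and (CA)15
allele_a = "GGT" + "CA" * 12 + "TTG"
allele_b = "GGT" + "CA" * 15 + "TTG"

if __name__ == "__main__":
    print(longest_tandem_repeat(allele_a, "CA"))  # 12
    print(longest_tandem_repeat(allele_b, "CA"))  # 15
```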

SNPs

- Positions in a genome where some individuals have one nucleotide and others have a different nucleotide.
- There is a vast number of SNPs in every genome.
- Each SNP could potentially have four alleles, but most exist in just two forms.
- The value of a two-allelic marker (SNP, RFLP) is limited by the high probability that the marker shows no variability among the members of the family being studied.
- The advantages of SNPs over RFLPs:
  - they are abundant (human genome: 1.5 million SNPs vs. 100,000 RFLPs)
  - they are easier to type (i.e. easier to detect)
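Finding SNPs comes down to comparing the same positions across individuals and keeping those where more than one nucleotide is seen. A Python sketch, assuming the sequences are already aligned to equal length without gaps; `find_snps` and the four toy sequences are hypothetical. Both variable positions in the example are biallelic, in line with the observation that most SNPs exist in just two forms.

```python
from collections import Counter
from typing import Dict, List


def find_snps(aligned: List[str]) -> Dict[int, Counter]:
    """Map each variable position (0-based) to the nucleotide counts seen there."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for i in range(length):
        counts = Counter(s[i].upper() for s in aligned)
        if len(counts) > 1:  # more than one nucleotide observed: a SNP
            snps[i] = counts
    return snps


individuals = ["ACGTACGT", "ACGTACGT", "ACCTACGA", "ACGTACGA"]

if __name__ == "__main__":
    for position, counts in find_snps(individuals).items():
        print(position, dict(counts))
```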

Genome maps (figure)

- genetic (linkage) map: relative locations of genes are established by following inheritance patterns
- cytogenetic map: visual appearance of a chromosome when stained and examined under a microscope
- physical map: the order and spacing of the genes, measured in base pairs
- sequence map

source: Talking Glossary of Genetic Terms, http://www.genome.gov/glossary/

more at http://www.informatics.jax.org/silver/chapters/7-1.shtml