Researchers in Bioinformatics develop algorithms and software for simulations of biochemical
processes and the analysis of molecular biology data.
The publication of the human genome sequence in February 2001 is considered a milestone of
scientific research, as reflected in the reaction of the media and the public. Bioinformatics tools
were essential for this achievement.
Further challenges lie ahead of us: genes must be identified and their function determined. An
understanding of the interplay of the gene products will be the basis for the development of future
pharmaceutical treatments. These tasks go far beyond the mere assembly of the sequence.
Bioinformatics provides decisive contributions to tackle these challenges.
From genome to drug
The following pages highlight some of the most important topics within bioinformatics research.
Bioinformatics played a decisive role in one of the most prominent scientific achievements of recent
years, the sequencing of the human genome.
With the sequence being known, the annotation of the genome can begin. This means searching for
genes in DNA, identifying the corresponding gene products (proteins, RNA), as well as explaining
their structure and function.
In order to fully understand protein function, one has to consider how proteins interact. These
interactions are represented by metabolic and regulatory networks which, among other things, can be
simulated using computational models.
In general, drugs act by influencing the proteins that are involved in metabolism. Based on the
assembly of the human genome, bioinformatics methods allow researchers to find proteins (targets)
that are better suited for treating certain diseases.
Bioinformatics delivers important contributions to the development of new drugs. Databases enable the
search through large amounts of data in order to find new candidate drugs that are effective, have
fewer side effects and are capable of reaching the right destination in the body.
Bioinformatics supports the optimization of known therapies. The comparison of complete genomes of
different individuals makes it possible to trace differences (SNPs), which may play a role when
deciding on the individual therapy for a patient.
Viral infections present a great challenge for drug development and therapy. Viruses like HIV show
high genomic variability, which can result in viral mutations that confer resistance to the
prescribed drugs. A physician is therefore frequently faced with the problem of finding a new therapy
for each patient infected with a particular strain. Bioinformatics methods have been developed to
understand the relationship between viral mutations and drug resistance, leading to better
therapeutic strategies.
The image sketches the sequencing process for a DNA molecule (chromosome). At the beginning the
sequences of the segments are unknown. Green lines represent pieces that have been read.
Steps: (1) cloning, (2) fragmenting, (3) sequencing, (4) comparison, (5) assembly
The human genome consists of 46 long DNA molecules (chromosomes) contained within the nucleus
of the cell. The chromosomes carry genetic information. Each DNA molecule consists of two strands in
the form of a double helix. Each DNA strand is a linear polymer that consists of units (monomers)
connected end to end. Each monomer contains a sugar, a phosphate and a base component. The sequence
of bases represents a form of linear information. There are four bases, denoted by the letters
A, C, G and T. The bases A and T, as well as G and C, are complementary, i.e. they bind to each other.
Because of this base pair complementarity, a single strand contains the full genetic information.
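The complementarity rule can be illustrated with a small Python sketch. The function name
complementary_strand is an assumption made for this example. The code also relies on a fact not
stated above: the two strands of the double helix run in opposite directions, so the partner strand
is conventionally written in reverse order.

```python
# Base pairing rule from the text: A binds to T, and G binds to C.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand implied by base pair complementarity.

    Assumption for this sketch: the partner is reported reversed, because
    the two strands of a double helix run antiparallel.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    print(complementary_strand("ATGCCG"))
```

Applying the function twice returns the original strand, which is one way to see that a single
strand already carries the full information.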
The goal of sequencing is to determine the order of the bases contained in the DNA.
The sequencing machines cannot read the whole genome in one step. Therefore, the genome has to be
cut into smaller pieces. In order to be able to reassemble the pieces, they have to overlap. This
can be achieved by generating many copies of a DNA strand and cutting them into pieces randomly
(e.g. with high pressure or ultrasound).
In the process of sequence assembly the full sequence of nucleotides is reconstructed from overlap
information by performing a stepwise search for pieces with overlapping ends. Overlapping pieces
are then put together.
Bioinformatics provides suitable algorithms for the assembly step. These algorithms have to be very
efficient, as the number of pieces and hence the number of pairwise comparisons for overlaps is large.
In addition, the algorithms have to deal with problems such as repetitive sequences or reading
errors in the genome pieces.
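The stepwise search for overlapping ends can be sketched as a simple greedy procedure in Python. This
is only a toy illustration under strong assumptions: the fragments contain no reading errors, all come
from the same strand, and the sequence has no repeats. The function names overlap and greedy_assemble
and the min_len parameter are invented for this example and do not refer to any particular assembly
program.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two pieces with the largest overlap."""
    # Drop duplicates and fragments that are fully contained in another one.
    unique = list(dict.fromkeys(fragments))
    pieces = [f for f in unique if not any(f != g and f in g for g in unique)]
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, -1, -1
        # Pairwise comparison of all pieces: this is the expensive step.
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlaps left; several separate pieces remain
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

The nested loop makes the cost of the pairwise comparisons visible: every round compares every piece
with every other piece, which is why efficiency matters for large numbers of fragments.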
From Chromosome to Gene
The chromosomes contain the genome of every organism. The genome contains sequence regions
that code for proteins and other molecular constituents. The proportion of coding sequence in
relation to the total genome is rather small. After sequencing a genome the task is to localize the genes.
This step, part of the annotation process, requires bioinformatics methods such as pattern
recognition and sequence alignment.
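As a very simplified illustration of pattern recognition for gene localization, the following Python
sketch scans one DNA strand for open reading frames: a start codon (ATG) followed in the same reading
frame by a stop codon (TAA, TAG or TGA). The function name find_orfs and the min_codons threshold are
assumptions for this example. Real gene finders are considerably more elaborate than this.

```python
import re

# ATG, then any number of codons that are not stop codons, then a stop codon.
ORF_PATTERN = re.compile(r"ATG(?:(?!TAA|TAG|TGA)[ACGT]{3})*(?:TAA|TAG|TGA)")


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, str]]:
    """Return (start position, sequence) for open reading frames on one strand."""
    dna = dna.upper()
    orfs = []
    pos = 0
    while True:
        match = ORF_PATTERN.search(dna, pos)
        if match is None:
            break
        # Count codons including the start and stop codon.
        if len(match.group()) // 3 >= min_codons:
            orfs.append((match.start(), match.group()))
        # Resume one base later so that start codons in other frames are found too.
        pos = match.start() + 1
    return orfs


if __name__ == "__main__":
    for start, sequence in find_orfs("CCATGAAATTTTAGGG"):
        print(start, sequence)
```

The sketch only inspects the given strand. A fuller search would also scan the complementary strand.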
From Sequence to Structure
After a gene is found, the interest shifts to determining the structure and function of the protein it
codes for. The gene sequence determines the spatial structure of the molecule. The three-dimensional
structure influences the tasks performed by the molecule in the body. Structures are usually
determined experimentally by X-ray crystallography or NMR spectroscopy. Bioinformaticians develop
algorithms to predict the shape of the molecules as well as tools for the analysis of their function.
Similarity Search and Determination of Function
Database sequence searches are a useful method for transferring established knowledge of the
function of known proteins to newly sequenced genes. Comparison of structures can also lead to the
identification of molecular function.
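The idea of transferring knowledge by similarity can be sketched in Python with a basic local alignment
score (the Smith-Waterman recurrence with a simple match/mismatch scheme). Everything concrete here is
an assumption for illustration: the scoring values, the function names, and the toy database, whose
sequences and annotations are invented and do not describe real proteins.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gaps)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


# Invented toy entries: name -> (sequence, annotated function).
TOY_DATABASE = {
    "protein_A": ("MKTAYIAKQR", "hypothetical kinase"),
    "protein_B": ("GSHMLEDPVA", "hypothetical transporter"),
}


def best_hit(query: str, database: dict) -> tuple[str, str, int]:
    """Return (name, annotation, score) of the most similar database entry."""
    scored = [
        (name, annotation, local_alignment_score(query, sequence))
        for name, (sequence, annotation) in database.items()
    ]
    return max(scored, key=lambda entry: entry[2])


if __name__ == "__main__":
    print(best_hit("KTAYIAKQ", TOY_DATABASE))
```

The annotation of the best hit is only a hypothesis about the function of the query. How high a score
must be before the transfer is trustworthy is a separate question that this sketch does not address.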
Metabolic and Regulatory Pathways
Part of a pathway; here enzymes are described by their classification numbers (184.108.40.206 and 220.127.116.11).
Overview of a metabolic pathway (Source: KEGG: Kyoto Encyclopedia of Genes and Genomes)
After the function of a protein has been understood, it is of special interest to identify the metabolic
pathways of an organism. A metabolic pathway is a sequence of reactions that, taken together, lead
from one substance to another. The reactions themselves are generally catalysed by an enzyme.
Important examples of metabolic pathways are the citrate cycle and glycolysis. The sum of all
pathways forms the metabolic network.
In addition to the metabolic pathways there are the regulatory pathways, where biological processes are
controlled by different signals. The sum of all regulatory pathways determines the regulatory network
of an organism.
The networks can be represented as computational models. The computational pathway representation
can be used in the process of target identification, drug design and in the search for causes of
genetic diseases.
In basic research these networks can be used to compare the metabolic processes of different
organisms. For example, information on the metabolism of one organism can be used to understand the
newly sequenced genome (and, correspondingly, the metabolic pathways) of another organism.
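A computational pathway representation can be as simple as a list of reactions that is searched like a
graph. The following Python sketch uses the first three steps of glycolysis as a toy network and a
breadth-first search to find a chain of reactions between two substances. The function name
find_pathway and the inhibited parameter are assumptions for this example. Inhibiting an enzyme is
modeled simply by ignoring its reaction.

```python
from collections import deque

# Toy network: each reaction is (substrate, enzyme, product).
REACTIONS = [
    ("glucose", "hexokinase", "glucose-6-phosphate"),
    ("glucose-6-phosphate", "glucose-6-phosphate isomerase", "fructose-6-phosphate"),
    ("fructose-6-phosphate", "phosphofructokinase", "fructose-1,6-bisphosphate"),
]


def find_pathway(reactions, start, goal, inhibited=frozenset()):
    """Breadth-first search for a chain of reactions from start to goal.

    Returns the list of enzymes along the route, or None if no route exists.
    Reactions whose enzyme is in `inhibited` are skipped.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        metabolite, enzymes = queue.popleft()
        if metabolite == goal:
            return enzymes
        for substrate, enzyme, product in reactions:
            if substrate == metabolite and enzyme not in inhibited and product not in seen:
                seen.add(product)
                queue.append((product, enzymes + [enzyme]))
    return None


if __name__ == "__main__":
    print(find_pathway(REACTIONS, "glucose", "fructose-1,6-bisphosphate"))
    print(find_pathway(REACTIONS, "glucose", "fructose-1,6-bisphosphate",
                       inhibited={"phosphofructokinase"}))
```

In this toy network, blocking a single enzyme cuts the only route to the end product. In a realistic
network with alternative routes, the same kind of search can show whether an inhibition is bypassed.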
The analysis of the metabolic network yields a target (red X) that is to be inhibited.
Scientists can unravel the metabolism of an organism by studying its metabolic and regulatory
networks. In the human body, deviations from normal function are of particular interest, since
these are frequently causes of disease.
If the origin of a disease is understood, the metabolic networks can help to perform a more precise
analysis and to find targets for a possible treatment. Targets can be proteins catalysing metabolic
reactions. If a malfunctioning protein has been found, drugs can be developed to influence its
activity and to remove the source of the disease. The network models also make it possible to analyze
possible side effects in the body that a treatment must avoid.
Target identification alone is not sufficient to achieve a successful treatment of a disease. A
real drug needs to be developed.
This drug must influence the target protein in such a way that it does not interfere with normal
metabolism. One way to achieve this is to block the activity of the protein with a small molecule.
Bioinformatics methods have been developed to virtually screen the target for compounds that bind and
inhibit the protein. Another possibility is to find other proteins that regulate the activity of the
target by binding and forming a complex.