Oct 1, 2013



Tutorial Notes for Pacific Symposium on Biocomputing 2009


Department of Medicine (Medical Informatics) and Pediatrics, Stanford University, U.S.A.


University of Maryland, Baltimore County, U.S.A.


Center for Biomedical Informatics and Section of Genetic Medicine, Dept. of Medicine and
UC Cancer Research Center; The University of Chicago, U.S.A.


Bar-Ilan University, Israel


School of Informatics, Indiana University, U.S.A.

Prose summary of the topic

Over the past 10 years, high-dimensional investigations related to human disease have expanded considerably in
breadth and depth. The breadth of such investigations spans at least 30 types of high-dimensional measurement
and experimental modalities, including RNA expression microarrays, DNA sequencing, protein identification,
mutagenesis, RNA interference, and many others. The depth of such investigations has grown to include
measurements of entire sets of transcripts, proteins, and genomes. Most recently, these technologies have
started to be applied to the study of many diseases. In the US, the NIH Roadmap for Medical Research
(Zerhouni, 2003) has led to multiple funding opportunities for bioinformaticians to collaborate with
clinical researchers to promote and facilitate translational research. For example, the Clinical and Translational
Science Awards, which replaced the General Clinical Research Centers, require a strong biomedical
informatics collaborative component.

This tutorial focuses on the emerging fields of protein interactions in diseases and phenomics: from
structures to protein interactions to supracellular phenotypes. Experimental studies indicate that protein
interactions play a key role in many diseases, even in some that are considered complex or multifactorial.
While altered phenotypes are among the most reliable manifestations of altered gene functions, research
focused on the systematic analysis of phenotype relationships to study human biology is still in its infancy. We
use the words phenome and phenomics to describe the physical totality of all traits of an organism
(Mahner and Kary 1997).

From Mendelian to multifactorial diseases

One of the ultimate goals of the biological sciences, and certainly one with a high impact on society, is to improve
our understanding of the processes and events that lead to disease in organisms. Molecular biologists, who
traditionally study the structure and function of individual proteins and genes, have gained insights and
made several discoveries that have ultimately reached the bedside. The deluge of newly sequenced
proteins offers tremendous amounts of data regarding the molecular basis of disease. The OMIM database
(Hamosh, Scott et al. 2005) makes use of this opportunity: its curators derive evidence from the
literature for the relationship between a clinical phenotype and its associated sequence or mutation. OMIM is
primarily focused on Mendelian diseases, namely, diseases that are caused by a mutation in a single gene and
are inherited in Mendelian patterns. Simple genomic events that mutate or eliminate specific genes (e.g., frameshift
mutations or insertion of viral genes) can account for the genotypes underlying these diseases. An
important question in this respect is exactly how mutations in a gene lead to the observed pathology (Cargill,
Altshuler et al. 1999; Wang and Moult 2001; Wang and Moult 2003).

While OMIM is focused on monogenic disorders, it also contains information about select “complex diseases”.
In fact, many, if not most, human diseases are considered to be “complex” or “multifactorial”, and cannot be
fully accounted for by a single molecular event. Genomic and proteomic data enhance the study of these
diseases as well, by helping to unravel the intricate interaction networks that underlie them (Rual,
Venkatesan et al. 2005; Stelzl, Worm et al. 2005). Both the study of Mendelian diseases and that of complex
diseases increasingly rely on computational tools and findings. In the Mendelian case, researchers often use
computational tools to understand the biophysical effects of mutations and how they lead to disease. In the case
of multifactorial diseases, computational tools are even more pertinent. Biological processes are not realized by
a single molecule, but rather by the complex interaction of proteins with their environment, including nucleic
acids, ions, lipids, membranes and, of course, other proteins. Hence, to fully understand such processes, one
needs to explore complex pathways, networks, expression patterns, and control mechanisms.

Numerous molecular databases include in their annotation the involvement of proteins in diseases. For
example, the SWISS-PROT database (Bairoch, et al., 2005) attempts to include in its annotation of proteins
“Disease(s) associated with any number of deficiencies in the protein”, though the annotations are in free text.
Similar annotations are becoming popular in many other databases. The GeneCards database includes a separate
section that describes the implication of a gene in various diseases. The KEGG database has recently begun to
curate disease-related pathways. The Genetic Association Database (Becker, et al., 2004) similarly serves as an
archive of genetic association studies, while HGMD provides associations between molecular events, such as
mutations, insertions/deletions, and splicing anomalies, and disease (Stenson, et al., 2003). Finally, PharmGKB
(Klein, et al., 2001) provides not only gene–disease relationships, but also information about genotype–drug
interactions. There are also many domain-specific databases that curate proteins involved in specific
diseases. Most of these efforts are based on manual curation of the literature.

One aspect of this problem is that of protein interaction and disease. The study of protein interaction has
received a great deal of attention in recent years, including the PSB sessions of 2006–2007 named Protein
Interactions and Disease. In particular, numerous computational tools have been designed to enhance our
understanding of protein interaction. On the other hand, in the study of the molecular basis of disease, protein
interaction is increasingly acknowledged as a valuable prism through which disease might be analyzed,
understood and possibly even treated. We will start by examining how these two developments meet, namely,
how computational analysis can be used to study the role of interaction in disease.

Protein interactions and their computational analysis

Every protein has a biological function, yet most biological functions are carried out by groups of
proteins interacting with each other and with other molecules in their environment in complex networks. The
main type of interaction of interest in this context is the interaction between proteins and other proteins.
This includes, for example, the study of interactions between proteins and antibodies and the study of signal
transduction. Both of these processes are critical to the understanding of many complex diseases and are central
to pharmacology and drug discovery. Many computational studies focus on protein–protein interactions
(Salwinski and Eisenberg 2003) and their importance for predicting protein function (Rost, Liu et al. 2003).

Other types of interactions that are being studied extensively are:

protein–DNA/RNA interaction, critical for understanding gene expression and expression control;

protein–small molecule/metal ion interaction, critical for understanding protein function and for
drug discovery and design;

protein–membrane interaction, critical for understanding a myriad of biological and pathological processes,
including viral/microbial infections.

Interactions between proteins and other molecules can be physical, i.e., by binding each other chemically or by
binding together to a third substrate, or they can be functional, e.g., by controlling each other’s expression or by
participating in the same biochemical pathway. A complete picture of all the proteins that are involved in a
certain biological process would not only enhance our understanding of diseases but would also break new ground
in drug development by identifying new drug targets (Ofran, Punta et al. 2005).

Computational studies of protein interaction usually attempt to do one of the following:

identify proteins that bind to a certain molecule;

analyze or predict the interface/binding site;

predict the process in which a certain interaction participates;

analyze or predict the effect of the loss/gain of a certain interaction.

Protein interaction and disease

Protein interaction could be implicated in pathological processes in one of two ways: the elimination of an
essential interaction or the gain of a deleterious one (Ryan and Matthews 2005). Mendelian diseases are often
caused by a single mutation. Such a mutation could lead to the complete elimination of the damaged gene
product; these cases are not of interest in the context of protein interaction. In many other
Mendelian cases, where the point mutation does not eliminate the gene/protein altogether, the pathological
effect could often be traced to an effect on protein interaction (Wang and Moult 2001; Wang and Moult 2003).
Intrinsically unstructured proteins, participating in transient, possibly multiple, protein interactions,
have also been related to disease (Uversky et al. 2006; Feng et al. 2006). Undesired protein aggregation is the
key factor in amyloidoses, a class of diseases including Alzheimer's disease (Dobson 2001; Fernandez et al. 2003).

In cancer, a quintessential complex disease, the main focus of basic and pharmacological research is on
protein interaction. One of the most prominent anticancer drugs today, Gleevec, is a tyrosine kinase inhibitor
that is very effective in treating chronic myeloid leukemia. In almost all types of cancer there are interactions
that have gone awry. Constitutive signal transduction, a result of an aberrant interaction, is implicated in many
tumors and is one of the main targets for therapy development (Levitzki and Gazit 1995). Control of the
expression of different proteins, which is mediated by protein–DNA interaction, was shown to differ between
normal and cancerous cells, and between different types of malignancies (Golub, Slonim et al. 1999). The
mechanisms of action of oncogenes and tumor suppressors are based on protein interactions (Kamb, Gruis et al.
1994). Additionally, non-genetic diseases that are caused by infective agents (bacteria, viruses) depend, almost
by definition, on the interaction of proteins and other molecules.

Analyzing the interaction networks in terms of specific interactions for each single disease has proven
successful in understanding their molecular basis. In contrast, looking for general properties of the whole
network of human disease interactions might prove useful in describing some general principles of
diseases (Jonsson et al. 2006; Barabasi, personal communication).

Recently, it has been found that proteins related to a disease are more likely to interact with proteins already
known to cause similar diseases (Gandhi et al. 2006).
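The finding attributed to Gandhi et al. (2006) suggests a simple guilt-by-association heuristic: rank candidate genes by how many of their interaction partners are already linked to a disease. The sketch below illustrates only that idea, under the assumption that interactions are available as an unweighted list of protein pairs; it is not the analysis performed in the cited study, and the gene names and the network are hypothetical.

```python
from collections import defaultdict


def neighbor_scores(interactions, known_disease_genes, candidates):
    """Rank candidate genes by how many of their direct interaction partners
    are already known disease genes (a simple guilt-by-association score)."""
    # Build an undirected adjacency map from (protein, protein) pairs.
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)
    known = set(known_disease_genes)
    scores = {gene: len(partners[gene] & known) for gene in candidates}
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: D1 and D2 are already linked to the disease.
ppi = [("A", "D1"), ("A", "D2"), ("B", "D1"), ("C", "X")]
print(neighbor_scores(ppi, {"D1", "D2"}, ["A", "B", "C"]))
# prints [('A', 2), ('B', 1), ('C', 0)]
```

A raw neighbor count ignores indirect connections and favors highly connected proteins, so in practice such scores may need normalization or comparison against randomized networks.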

Computational analysis of protein interactions in disease

Preliminary computational studies have attempted to characterize the relationship between protein interaction
and disease. Some of these studies tried to explain the deleterious effect of some mutations. For
example, Mirkovic and coworkers attempted to rationalize the effect of the mutations in the BRCA gene that
lead to breast cancer. They found that many of these mutations could be traced to protein interaction sites
(Mirkovic, Marti-Renom et al.). Other studies have attempted to comprehensively characterize the effect
of all known deleterious SNPs (Wang and Moult 2001) and even devise methods to predict the effect of
uncharacterized ones (Saunders and Baker 2002). Methods for assessing the effects of mutations were reviewed
by Mooney (2005).
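A question underlying studies such as that of Mirkovic and coworkers is whether disease mutations fall on interaction sites more often than chance would predict. One generic way to frame it, offered here as a sketch and not as the method used in the cited work, is a one-sided hypergeometric test over residue positions, assuming every residue is equally likely to be mutated. The protein length, interface residues and mutated positions below are hypothetical inputs.

```python
from math import comb


def interface_enrichment_p(length, interface, mutated):
    """One-sided hypergeometric P-value that at least the observed number of
    distinct mutated positions fall on interface residues, assuming mutated
    positions are a uniformly random subset of the protein's residues."""
    interface = set(interface)
    mutated = set(mutated)
    n_iface = len(interface)                 # K: interface residues
    n_mut = len(mutated)                     # n: distinct mutated positions
    observed = len(mutated & interface)      # k: mutations on the interface
    total = comb(length, n_mut)
    # Exact tail: sum of hypergeometric counts for k, k+1, ..., min(K, n).
    tail = sum(
        comb(n_iface, i) * comb(length - n_iface, n_mut - i)
        for i in range(observed, min(n_iface, n_mut) + 1)
    )
    return tail / total


# Hypothetical 100-residue protein with 10 interface residues (1..10) and
# 5 mutated positions, 4 of which hit the interface.
p = interface_enrichment_p(100, range(1, 11), [2, 3, 5, 7, 60])
print(f"P = {p:.2e}")
```

Real residues are not equally mutable and interface definitions depend on the structure or model used, so this test is only a first approximation.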

At the single-protein molecular level, computational techniques such as homology modeling, molecular
dynamics and protein–protein docking have been used to predict protein–protein interactions and to
study their determinants. Since the solution of the first high-resolution ion channel structure (Doyle et al. 1998),
for example, these techniques have been widely applied to the analysis of toxin–channel interactions (Wang et al.
2006; M'Barek et al.; Wu et al. 2004).

Genome-wide computational analysis of bacterial and viral genomes can contribute to the understanding of
infectious diseases in humans. New computational techniques can be applied to gene expression data to
monitor the host response to infective agents or to drugs against them (Bandyopadhyay et al.; … et al. 2005;
Musser et al. 2005; Rachman et al. 2006). These systems biology approaches will be key to uncovering
complex interactions between host and pathogen, and new mechanisms of pathogen resistance.

Computational techniques for studying general network properties can be applied to understanding the
properties of disease-related interactions. For instance, based on the differences between genes related to
hereditary diseases and unrelated ones, a support vector machine (SVM)-based classifier was recently proposed
and applied to the systematic classification of all genes in the human genome (Xu et al. 2006).

A number of other studies were reviewed by Kann (2007), Oti and Brunner (2007), and Dalkilic et al. (2008).

The great challenges of the field remain to assess, analyze and predict the importance of protein interaction in
different diseases. Success in these tasks can be readily translated into progress in drug design and discovery and
ultimately lead to better treatment.

In recent years, the emergence of new experimental protocols and techniques such as RNA, DNA and protein
microarrays, two-hybrid systems, and mass spectrometry, as well as the explosion in the number and size of
sequence and structure databases, have changed biomedical science. By taking advantage of the enormous
amount of data generated by all these techniques, computational biology can now attempt to capture more of the
complexity of a biological process. The increasing number of computational studies of protein networks,
pathways, and protein–protein, protein–metabolite and protein–DNA/RNA interactions indicates that it is now
possible to address the connections between protein interactions and diseases. We are confident that the papers
presented in this session will contribute to further advances in this important and rapidly growing
area of biomedical research.

Diseases and phenotypes

While the genotype represents an organism’s exact genetic makeup, the phenotype of an individual is
the complete physical manifestation of that organism, usually considered as the sum of multiple
individual traits, such as internal and external appearance, ability, and behavior, which are known to differ
between organisms. A few examples will illustrate this difference. The DNA sequence of an organism is part of
its genotype. RNA measurements from an organism's cells are part of its phenotype. Single nucleotide
polymorphism measurements are part of its genotype, as are assignments to common haplotypes. It is also
worthwhile to consider what a disease is. A disease is an alteration of the mind or body of an organism that
causes uneasiness, dysfunction, suffering, or death to the organism, or to those in contact with the organism. In
addition, disease inevitably has to be considered in a social context as well. While a disease can clearly be a
phenotype, a propensity toward, or heightened risk of, a disease can also be a phenotype.

Phenotypes can be represented arbitrarily, but the power to compare phenotypes within and across species of
organisms only comes when a useful representation is chosen. For example, many mouse models have been
created to simulate human diseases and phenotypes. Knockout mice, where a particular gene has been
eliminated throughout the mouse or within specific tissues of the mouse, and transgenic mice, where a particular
gene has been “turned on” throughout the mouse or within specific tissues of the mouse, have traditionally been
stored and distributed through a variety of institutions, such as the Jackson Laboratories. Detailed phenotypes
on over 133 strains of mice, as well as scattered descriptions of thousands of additional strains, are available
through their Mouse Phenome Database (Bogue, Grubb et al. 2007). Users searching for a gene that might
explain a human phenotype can search the Jackson Laboratory website for their phenotype of interest, using
a structured ontology, to yield a list of genes and mouse models that have been shown to have the phenotype in
mice.

Human phenotypes are harder to characterize. While Freimer and Sabatti called for a Human Phenome Project
in 2003 (Freimer and Sabatti 2003), in most cases the richest source of phenotypes for humans comes from
knowledge of human pathological conditions. The Online Mendelian Inheritance in Man database and website
is a large set of genetic loci and genes with mutations or other variants linked to monogenic inherited
disorders. These disorders are described in a free-text historical narrative, relating when each disease was first
described and linked to genetics. For many diseases, an additional clinical synopsis is provided, which is a list
of uncontrolled terms describing traits seen in the disorder, roughly arranged by organ system.

There are other ways to represent human phenotypes and diseases. Papers published on diseases are indexed by
the National Library of Medicine with Medical Subject Headings (MeSH), so MeSH can be used to
represent diseases. The International Classification of Diseases (ICD) is a system that started by representing
causes of death and disease and dates back to the late 1800s (World Health Organization 2005); the ninth and
tenth editions of the ICD are used worldwide to communicate information about diseases between and among
physicians, hospitals, payors, and public health officials. The Systematized Nomenclature of Medicine
(SNOMED) has been developed by the College of American Pathologists since 1965, and is a much more
detailed representation of disease, enabled by molecular classification (Chute 2000). The Unified Medical
Language System (UMLS) is an overarching standardized nomenclature maintained by the National Library of
Medicine that can be used to relate ICD, MeSH, SNOMED, and over a hundred other vocabularies
(Bodenreider 2004). Finally, efforts such as the Disease Ontology are also being developed
in order to assign hierarchical relationships among disease terms. The Disease Ontology provides
mapping to SNOMED and is based on ICD terms.
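Hierarchical vocabularies of this kind are commonly handled as a directed acyclic graph of is-a links, so that an annotation made to a specific disease term can be propagated to its more general ancestors. A minimal sketch of that propagation follows, assuming the hierarchy is available as a mapping from each term to its direct parents; the terms and links are illustrative placeholders, not actual Disease Ontology content or identifiers.

```python
def ancestors(term, is_a):
    """Return all ancestors of `term` by following is-a links transitively.
    `is_a` maps each term to a list of its direct parents; a term may have
    several parents, so the hierarchy is a DAG rather than a tree."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


# Illustrative hierarchy (hypothetical, not taken from the Disease Ontology).
is_a = {
    "cardiomyopathy": ["heart disease"],
    "heart disease": ["cardiovascular disease"],
    "cardiovascular disease": ["disease"],
    "diabetic cardiomyopathy": ["cardiomyopathy", "diabetes complication"],
    "diabetes complication": ["disease"],
}
print(sorted(ancestors("diabetic cardiomyopathy", is_a)))
# prints ['cardiomyopathy', 'cardiovascular disease', 'diabetes complication', 'disease', 'heart disease']
```

Because a term can have several parents, the traversal keeps a `seen` set so that shared ancestors (here, "disease") are visited only once.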

The advantages and disadvantages of representing diseases in each of these ways depend on the data one wishes
to relate. For example, by considering diseases by MeSH, one can use other MeSH annotations across the
literature to relate drugs and symptoms to disease. By considering diseases by ICD, one can tie in data from public
health sources, such as epidemiological data sets. By representing diseases by SNOMED, one can relate
diseases to known pathophysiological mechanisms causing those diseases.

Use of UMLS might reduce the risk and arbitrariness of choosing one disease representation over another.
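The UMLS relates vocabularies by attaching terms from each source vocabulary to shared concepts, so a code in one vocabulary can be translated into codes from another that map to the same concept. The sketch below mimics that lookup with an in-memory table; the concept identifiers and source codes are made-up placeholders, and it assumes, as a simplification, that each source code maps to a single concept.

```python
from collections import defaultdict


def build_crosswalk(mappings):
    """Index (vocabulary, code) pairs by the concept they map to.
    `mappings` is an iterable of (concept_id, vocabulary, code) triples.
    Simplification: each (vocabulary, code) is assumed to map to one concept."""
    by_concept = defaultdict(set)
    concept_of = {}
    for concept_id, vocab, code in mappings:
        by_concept[concept_id].add((vocab, code))
        concept_of[(vocab, code)] = concept_id
    return by_concept, concept_of


def translate(vocab, code, target_vocab, by_concept, concept_of):
    """Codes in `target_vocab` that share a concept with (vocab, code)."""
    concept_id = concept_of.get((vocab, code))
    if concept_id is None:
        return set()
    return {c for v, c in by_concept[concept_id] if v == target_vocab}


# Placeholder identifiers for illustration only (not real UMLS, ICD or MeSH codes).
rows = [
    ("C_0001", "ICD", "ICD_A"),
    ("C_0001", "MeSH", "MESH_A"),
    ("C_0001", "SNOMED", "SNOMED_A"),
    ("C_0002", "ICD", "ICD_B"),
]
by_concept, concept_of = build_crosswalk(rows)
print(translate("ICD", "ICD_A", "MeSH", by_concept, concept_of))
# prints {'MESH_A'}
```

Routing every translation through a shared concept, rather than maintaining pairwise tables between vocabularies, is what lets one resource relate many vocabularies at once.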

Perez-Iratxeta was one of the first to link knowledge of diseases in MEDLINE to known alterations in
biochemistry, through MeSH annotations across the literature. She was then able to link these changes in
biochemistry to specific genes to predict genes with mutations associated with those diseases
(Perez-Iratxeta, Bork et al. 2002). Using a data-driven approach, Butte and Kohane linked publicly available
microarray data with the phenotypic descriptors extracted from those experimental annotations, to broadly
associate environmental factors and studied phenotypes with genes showing differential expression (Butte and
Kohane 2006). A number of other approaches have also been proposed, including those based on a single type of
data (e.g., protein–protein interaction networks) (Chen, et al., 2006; Gonzalez, et al., 2007; Oti, et al., 2006) or
those that integrate a number of data types (Adie, et al., 2005; Adie, et al., 2006; Aerts, et al., 2006;
Freudenberg and Propping, 2002; George, et al., 2006; Lage, et al., 2007; Radivojac, et al., 2008; Rossi, et al.,
2006; Turner, et al., 2003).

Linking phenotypes to protein–protein interactions

While genotypes and phenotypes are certainly studied individually, analyses that bring them together often
have the highest impact for medicine. The most common method of associating genotypes with phenotypes is
the calculation of quantitative trait loci, where consecutive regions of chromosomes are statistically associated
with quantifiable traits, such as blood pressure, height, or cardiovascular parameters (Nadeau, Burrage et al.
2003). However, phenotypes can be considered broadly, and can even include RNA measurements or metabolic
measurements. For instance, expression quantitative trait loci (eQTLs) and metabolic quantitative trait loci
(mQTLs) may be used to find genetic loci associated with differences in gene expression levels or in metabolite
abundance (Jansen and Nap 2001; Schadt, Monks et al. 2003; Fu, Swertz et al. 2007).
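The simplest form of the genotype–phenotype association described above is single-marker analysis: regress a quantitative trait (or, for an eQTL, a transcript level) on the allele dosage at one marker and summarize the evidence as a LOD score, LOD = (n/2) log10(RSS0/RSS1), where RSS0 and RSS1 are the residual sums of squares without and with the marker. This is a standard textbook formulation offered as a sketch, not the specific procedure of any study cited here; the 0/1/2 dosage coding and the data are hypothetical.

```python
import math


def single_marker_lod(dosage, trait):
    """Regress a quantitative trait on allele dosage (0, 1, 2) at one marker.

    Returns (slope, lod), where lod = n/2 * log10(RSS_null / RSS_marker)
    compares an intercept-only model with a model that includes the marker.
    """
    n = len(trait)
    if n != len(dosage) or n < 3:
        raise ValueError("need matched genotype and trait vectors with n >= 3")
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    rss_null = sum((y - mean_y) ** 2 for y in trait)
    if rss_null == 0:
        return 0.0, 0.0  # constant trait: nothing to explain
    slope = sxy / sxx
    rss_marker = rss_null - slope * sxy  # residual sum of squares after fitting the marker
    if rss_marker <= 0:
        return slope, math.inf  # the marker explains the trait perfectly
    return slope, (n / 2) * math.log10(rss_null / rss_marker)


# Hypothetical data: allele dosage at one marker and a trait that rises with it.
genotypes = [0, 0, 1, 1, 2, 2]
values = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, lod = single_marker_lod(genotypes, values)
print(f"slope={slope:.2f}, LOD={lod:.2f}")
```

In a genome scan this test is repeated at every marker, so significance thresholds are commonly set by permutation rather than read off a single LOD value.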

A significant methodological improvement was made by Lage, et al. (2007). Building on the assumption that
genetic syndromes with shared phenotypes often involve proteins in the same pathway, Lage and his team
formally calculate metrics to measure the similarity and differences between syndromes given the clinical terms
that describe them. Then, given an input query region of a chromosome suspected of being involved in
a genetic disease, their method returns a targeted prioritization of the genes within the region, reflecting the
likelihood of each gene being involved in that disease. These scores are calculated based on the
protein–protein interaction distance between each gene and other genes known to be associated with diseases
with similar phenotypes.
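The approach attributed above to Lage et al. rests on two ingredients, a phenotype similarity between syndromes and a network distance between gene products, which can be combined in a toy scoring function. The sketch below is a deliberate simplification and not the published Bayesian predictor: it assumes cosine similarity between sets of clinical terms and down-weights each known disease gene by 2^-d, where d is its shortest-path distance in an unweighted protein–protein interaction graph. Gene names, terms and the weighting scheme are hypothetical.

```python
import math
from collections import deque


def phenotype_similarity(terms_a, terms_b):
    """Cosine similarity between two sets of clinical terms (binary vectors)."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / math.sqrt(len(terms_a) * len(terms_b))


def bfs_distances(graph, source):
    """Shortest-path lengths (in interactions) from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def prioritize(region_genes, graph, query_terms, known_genes):
    """Score genes in a linkage region by network proximity to genes of
    phenotypically similar syndromes; `known_genes` maps a disease gene
    to the clinical terms describing its syndrome."""
    scores = {}
    for gene in region_genes:
        dist = bfs_distances(graph, gene)
        scores[gene] = sum(
            phenotype_similarity(query_terms, terms) * 2.0 ** -dist[known]
            for known, terms in known_genes.items()
            if known in dist  # unreachable disease genes contribute nothing
        )
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical undirected interaction graph stored as adjacency lists.
graph = {
    "C1": ["K1"], "K1": ["C1", "X"],
    "X": ["K1", "C2"], "C2": ["X"],
    "C3": [],
}
known = {"K1": {"retinal degeneration", "night blindness"}}
query = {"retinal degeneration", "night blindness", "hearing loss"}
for gene, score in prioritize(["C1", "C2", "C3"], graph, query, known):
    print(gene, round(score, 3))
```

The halving per interaction step is an arbitrary choice made for illustration; any decreasing function of distance would express the same intuition.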

Finally, we mention that research has also addressed the identification of drug targets, effectively connecting
proteins, drugs and diseases. A notable example is work by the Bork group (Campillos et al. 2008), which
uses side-effect similarity extracted from drug labels to improve drug target identification.
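The intuition behind side-effect-based target identification is that drugs with similar side-effect profiles are more likely to share protein targets. A minimal sketch of that comparison follows, using plain Jaccard similarity over sets of side-effect terms; this is a simplification and not necessarily the scoring used by Campillos et al., and the drug names, side effects, targets and similarity cutoff are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_side_effects(query, side_effects):
    """Rank the other drugs by side-effect similarity to `query`."""
    ref = side_effects[query]
    others = ((drug, jaccard(ref, effects))
              for drug, effects in side_effects.items() if drug != query)
    return sorted(others, key=lambda item: (-item[1], item[0]))


def suggest_targets(query, side_effects, targets, min_similarity=0.3):
    """Targets of sufficiently similar drugs that `query` is not yet known to hit.
    The 0.3 cutoff is an arbitrary illustrative threshold."""
    known = targets.get(query, set())
    suggested = set()
    for drug, sim in rank_by_side_effects(query, side_effects):
        if sim >= min_similarity:
            suggested |= targets.get(drug, set()) - known
    return suggested


# Hypothetical drugs, side-effect profiles and known targets.
side_effects = {
    "drug_a": {"nausea", "drowsiness", "dry mouth"},
    "drug_b": {"nausea", "drowsiness", "dry mouth", "weight gain"},
    "drug_c": {"rash", "headache"},
}
targets = {"drug_a": {"T1"}, "drug_b": {"T1", "T2"}, "drug_c": {"T3"}}
print(rank_by_side_effects("drug_a", side_effects))
# prints [('drug_b', 0.75), ('drug_c', 0.0)]
print(suggest_targets("drug_a", side_effects, targets))
# prints {'T2'}
```

Common side effects such as nausea carry little information, which is one reason a practical score would likely weight rare side effects more heavily than this unweighted Jaccard index does.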


SWISS-PROT database

Jackson Laboratories

Mouse Phenome Database


Online Mendelian Inheritance in Man (OMIM)

Gentrepid server

Endeavour server

TOM server

Prospector server



PhenoPred server



Ryan DP & Matthews JM. Protein–protein interactions in human disease. Curr Opin Struct Biol, 446 (2005)


Oti M & Brunner HG. The modular nature of genetic diseases. Clin Genet, 1–11 (2007)


Kann MG. Protein interactions and disease: computational approaches to uncover the etiology of diseases.
Brief Bioinform, 333–346 (2007)


Lussier YA & Liu Y. Computational approaches to phenotyping: high-throughput phenomics. Proc Am Thorac Soc,
18–25 (2007)


Loscalzo J, Kohane I, & Barabasi A. Human disease classification in the postgenomics era: a complex
systems approach to human pathobiology. Mol Syst Biol: 124 (2007)


Dalkilic MM, Costello JC, Clark WT, & Radivojac P. From protein–disease associations to disease informatics.
Front Biosci, 3391–3407 (2008)


Ideker T & Sharan R. Protein networks in disease. Genome Res, 644–652 (2008)


Adie, E.A., Adams, R.R., Evans, K.L., Porteous, D.J. and Pickard, B.S. (2005) Speeding disease gene discovery
by sequence based candidate prioritization, BMC Bioinformatics, 55.

Adie, E.A., Adams, R.R., Evans, K.L., Porteous, D.J. and Pickard, B.S. (2006) SUSPECTS: enabling fast and
effective prioritization of positional candidates, 773.

Aerts, S., Lambrechts, D., Maity, S., Van Loo, P., Coessens, B., De Smet, F., Tranchevent, L.C., De Moor, B.,
Marynen, P., Hassan, B., Carmeliet, P. and Moreau, Y. (2006) Gene prioritization through genomic data fusion,
Nat Biotechnol, 537.

Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez,
R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N. and Yeh, L.S. (2005) The Universal
Protein Resource (UniProt), Nucleic Acids Res, 33 Database Issue, D154.

Becker, K.G., Barnes, K.C., Bright, T.J. and Wang, S.A. (2004) The genetic association database, Nat Genet, 431.

Bodenreider, O. (2004) The Unified Medical Language System (UMLS): integrating biomedical terminology,
Nucleic Acids Res, 32 Database issue, D267.

The Unified Medical Language System is a repository of biomedical vocabularies developed by the US National
Library of Medicine. The UMLS integrates over 2 million names for some 900,000 concepts from more than 60
families of biomedical vocabularies, as well as 12 million relations among these concepts. Vocabularies
integrated in the UMLS Metathesaurus include the NCBI taxonomy, Gene Ontology, the Medical Subject Headings
(MeSH), OMIM and the Digital Anatomist Symbolic Knowledge Base. UMLS concepts are not only inter-related,
but may also be linked to external resources such as GenBank. In addition to data, the UMLS includes tools for
customizing the Metathesaurus (MetamorphoSys), for generating lexical variants of concept names (lvg)
and for extracting UMLS concepts from text (MetaMap). The UMLS knowledge sources are updated
quarterly. All vocabularies are available at no fee for research purposes within an institution, but UMLS
users are required to sign a license agreement. The UMLS knowledge sources are distributed on CD-ROM
and by FTP.

Bogue, M.A., Grubb, S.C., et al. (2007) Mouse Phenome Database (MPD), Nucleic Acids Res, (Database issue): D643.

The Mouse Phenome Database (MPD) is a repository of phenotypic and genotypic data on commonly used and
genetically diverse inbred strains of mice. Strain characteristics data are contributed by members of the
scientific community. Electronic access to centralized strain data enables biomedical researchers to choose
appropriate strains for many systems-based research applications, including physiological studies, drug and
toxicology testing and modeling disease processes. MPD provides a community data repository and a platform
for data analysis and in silico hypothesis testing. The laboratory mouse is a premier genetic model for
understanding human biology and pathology; MPD facilitates research that uses the mouse to identify and
determine the function of genes participating in normal and disease pathways.

Butte, A.J. and Kohane, I.S. (2006) Creation and implications of a phenome–genome network, (1): 55.

Although gene and protein measurements are increasing in quantity and comprehensiveness, they do not
characterize a sample's entire phenotype in an environmental or experimental context. Here we
comprehensively consider associations between components of phenotype, genotype and environment to
identify genes that may govern phenotype and responses to the environment. Context from the
annotations of gene expression data sets in the Gene Expression Omnibus is represented using the
Unified Medical Language System, a compendium of biomedical vocabularies with nearly 1 million
concepts. After showing how data sets can be clustered by annotative concepts, we find a network of
relations between phenotypic, disease, environmental and experimental contexts as well as genes with
differential expression associated with these concepts. We identify novel genes related to concepts such
as aging. Comprehensively identifying genes related to phenotype and environment is a step toward the
Human Phenome Project.

Cargill, M., Altshuler, D., et al. (1999) Characterization of single-nucleotide polymorphisms in coding regions
of human genes, Nat Genet, (3): 231.

Chen, J.Y., Shen, C. and Sivachenko, A.Y. (2006) Mining Alzheimer disease relevant proteins from integrated
protein interactome data, Pac Symp Biocomput, 367.

Chute, C.G. (2000) Clinical classification and terminology: some history and current observations,
J Am Med Inform Assoc, (3): 298.

Campillos, M., Kuhn, M., Gavin, A.C., Jensen, L.J. and Bork, P. (2008) Drug target identification using
side-effect similarity, 263.

Dalkilic, M.M., Costello, J.C., Clark, W.T. and Radivojac, P. (2008) From protein–disease associations to
disease informatics, Front Biosci, 3391.

Dobson, C.M. (2001) The structural basis of protein folding and its links with human disease,
Philos Trans R Soc Lond B Biol Sci, 133.

Fernandez, A., Kardos, J., Scott, L.R., Goto, Y. and Berry, R.S. (2003) Structural defects and the diagnosis of
amyloidogenic propensity, Proc Natl Acad Sci U S A, 6446.

Freimer, N. and Sabatti, C. (2003) The human phenome project, Nat Genet, (1): 15.

Freudenberg, J. and Propping, P. (2002) A similarity-based method for genome-wide prediction of
disease-relevant human genes, 18 Suppl 2, S110.

Fu, J., Swertz, M.A., et al. (2007) MetaNetwork: a computational protocol for the genetic study of metabolic
networks, Nat Protoc, (3): 685.

We here describe the MetaNetwork protocol to reconstruct metabolic networks using metabolite
abundance data from segregating populations. MetaNetwork maps metabolite quantitative trait loci
(mQTLs) underlying variation in metabolite abundance in individuals of a segregating population using
a two-part model to account for the often observed spike in the distribution of metabolite abundance
data. MetaNetwork predicts and visualizes potential associations between metabolites using correlations
of mQTL profiles, rather than of abundance profiles. Simulation and permutation procedures are used to
assess statistical significance. Analysis of about 20 metabolite mass peaks from a mass spectrometer
takes a few minutes on a desktop computer. Analysis of 2,000 mass peaks will take up to 4 days. In
addition, MetaNetwork is able to integrate high-throughput data from subsequent metabolomics,
transcriptomics and proteomics experiments in conjunction with traditional phenotypic data. This way
MetaNetwork will contribute to a better integration of such data into systems biology.

George, R.A., Liu, J.Y., Feng, L.L., Bryson-Richardson, R.J., Fatkin, D. and Wouters, M.A. (2006) Analysis of
protein sequence and interaction data for candidate disease gene prediction, Nucleic Acids Res, e130.

Golub, T.R., Slonim, D.K., et al. (1999) Molecular classification of cancer: class discovery and class
prediction by gene expression monitoring, (5439): 531.

Gonzalez, G., Uribe, J.C., Tari, L., Brophy, C. and Baral, C. (2007) Mining gene–disease relationships from
biomedical literature: weighting protein–protein interactions and connectivity, Pac Symp Biocomput, 28.

Hamosh, A., Scott, A.F., et al. (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of
human genes and genetic disorders, Nucleic Acids Res, (Database issue): D514.

Jansen, R.C. and Nap, J.P. (2001) Genetical genomics: the added value from segregation, Trends Genet, (7): 388.

The recent successes of genome-wide expression profiling in biology tend to overlook the power of
genetics. We here propose a merger of genomics and genetics into 'genetical genomics'. This involves
expression profiling and marker-based fingerprinting of each individual of a segregating population, and
exploits all the statistical tools used in the analysis of quantitative trait loci. Genetical genomics will
combine the power of two different worlds in a way that is likely to become instrumental in the further
unravelling of metabolic, regulatory and developmental pathways.

Kamb, A., Gruis, N.A., et al. (1994) A cell cycle regulator potentially involved in genesis of many tumor types,
(5157): 436.

Kann, M.G. (2007) Protein interactions and disease: computational approaches to uncover the etiology of
diseases, Brief Bioinform.

Klein, T.E., Chang, J.T., Cho, M.K., Easton, K.L., Fergerson, R., Hewett, M., Lin, Z., Liu, Y., Liu, S., Oliver,
D.E., Rubin, D.L., Shafa, F., Stuart, J.M. and Altman, R.B. (2001) Integrating genotype and phenotype
information: an overview of the PharmGKB project. Pharmacogenetics Research Network and Knowledge Base,
Pharmacogenomics J, 167.

Lage, K., E. O. Karlberg, et al. (2007). "A human phenome-interactome network of protein complexes implicated in genetic disorders." Nat Biotechnol (3): 309

We performed a systematic, large-scale analysis of human protein complexes comprising gene products implicated in many different categories of human disease to create a phenome-interactome network. This was done by integrating quality-controlled interactions of human proteins with a validated, computationally derived phenotype similarity score, permitting identification of previously unknown complexes likely to be associated with disease. Using a phenomic ranking of protein complexes linked to human disease, we developed a Bayesian predictor that in 298 of 669 linkage intervals correctly ranks the known disease-causing protein as the top candidate, and in 870 intervals with no identified disease-causing gene, provides novel candidates implicated in disorders such as retinitis pigmentosa, epithelial ovarian cancer, inflammatory bowel disease, amyotrophic lateral sclerosis, Alzheimer disease, type 2 diabetes and coronary heart disease.

Our publicly available draft of protein complexes associated with pathology comprises 506 complexes, which reveal functional relationships between disease genes that will inform future experimentation.

Levitzki, A. and A. Gazit (1995). "Tyrosine kinase inhibition: an approach to drug development." (5205): 1782

Mahner, M. and M. Kary (1997). "What exactly are genomes, genotypes and phenotypes? And what about phenomes?" J Theor Biol (1): 55

The fundamental concepts of genome, genotype and phenotype are not defined in a satisfactory manner within the biological literature. Not only are there inconsistencies in usage between various authors, but even individual authors do not use these concepts in a consistent manner within their own writings. We have found at least five different notions of genome, seven of genotype, and five of phenotype current in the literature. Our goal is to clarify this situation by (a) defining clearly and precisely the notions of genetic complement, genome, genotype, phenetic complement, and phenotype; (b) examining that of phenome; and (c) analysing the logical structure of this family of concepts.

Mirkovic, N., M. A. Marti-Renom, et al. (2004). "Structure-based assessment of missense mutations in human BRCA1: implications for breast and ovarian cancer predisposition." Cancer Res (11): 3790

Mooney, S.D. (2005). "Bioinformatics approaches and resources for single nucleotide polymorphism functional analysis." Brief Bioinform: 44

Nadeau, J. H., L. C. Burrage, et al. (2003). "Pleiotropy, homeostasis, and functional networks based on assays of cardiovascular traits in genetically randomized populations." Genome Res (9): 2082

A major problem in studying biological traits is understanding how genes work together to provide organismal structures and functions. Conventional reductionist paradigms attribute functions to particular proteins, motifs, and amino acids. An equally important but harder problem involves the synthesis of data at fundamental levels of biological systems to understand functionality at higher levels. We used subtle, naturally occurring, multigenic variation of cardiovascular (CV) properties in a panel of genetically randomized strains that are derived from the A/J and C57BL/6J strains of mice to perturb CV functions in nonpathologic ways. In this proof-of-concept study, computational analysis correctly identified the known relations among CV properties and revealed functionality at higher levels of the CV system. The network was then used to account for pleiotropies and homeostatic responses in single-gene mutant mice and in mice treated with a pharmacologic agent (anesthesia). The CV network accounted for functional dependencies in complementary ways to the insights obtained from genetic networks and biochemical pathways. These networks are therefore an important approach for defining and characterizing functional relations in complex biological systems in health and disease.

Ofran, Y., M. Punta, et al. (2005). "Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery." Drug Discov Today (21): 1475

Oti, M. and Brunner, H.G. (2007). "The modular nature of genetic diseases." Clin Genet: 1

Oti, M., Snel, B., Huynen, M.A. and Brunner, H.G. (2006). "Predicting disease genes using protein-protein interactions." J Med Genet: 691

Perez-Iratxeta, C., P. Bork, et al. (2002). "Association of genes to genetically inherited diseases using data mining." Nat Genet


Although approximately one-quarter of the roughly 4,000 genetically inherited diseases currently recorded in respective databases (LocusLink, OMIM) are already linked to a region of the human genome, about 450 have no known associated gene. Finding disease-related genes requires laborious examination of hundreds of possible candidate genes (sometimes, these are not even annotated; see, for example, refs 3,4). The public availability of the human genome draft sequence has fostered new strategies to map molecular functional features of gene products to complex phenotypic descriptions, such as those of genetically inherited diseases. Owing to recent progress in the systematic annotation of genes using controlled vocabularies, we have developed a scoring system for the possible functional relationships of human genes to 455 genetically inherited diseases that have been mapped to chromosomal regions without assignment of a particular gene. In a benchmark of the system with 100 known disease-associated genes, the disease-associated gene was among the 8 best-scoring genes with a 25% chance, and among the best 30 genes with a 50% chance, showing that there is a relationship between the score of a gene and its likelihood of being associated with a particular disease. The scoring also indicates that for some diseases, the chance of identifying the underlying gene is higher.

Radivojac, P., Peng, K., Clark, W.T., Peters, B.J., Mohan, A., Boyle, S.M. and Mooney, S.D. (2008). "An integrated approach to inferring gene-disease associations in humans."


Rossi, S., Masotti, D., Nardini, C., Bonora, E., Romeo, G., Macii, E., Benini, L. and Volinia, S. (2006). "TOM: a web-based integrated approach for identification of candidate disease genes." Nucleic Acids Res: W285

Rost, B., J. Liu, et al. (2003). "Automatic prediction of protein function." Cell Mol Life Sci (12): 2637

Rual, J. F., K. Venkatesan, et al. (2005). "Towards a proteome-scale map of the human protein-protein interaction network." (7062): 1173

Ryan, D. P. and J. M. Matthews (2005). "Protein-protein interactions in human disease." Curr Opin Struct Biol (4): 441

Salwinski, L. and D. Eisenberg (2003). "Computational methods of analysis of protein-protein interactions." Curr Opin Struct Biol (3): 377

Saunders, C. T. and D. Baker (2002). "Evaluation of structural and evolutionary contributions to deleterious mutation prediction." J Mol Biol (4): 891

Schadt, E. E., S. A. Monks, et al. (2003). "Genetics of gene expression surveyed in maize, mouse and man." (6929): 297

Treating messenger RNA transcript abundances as quantitative traits and mapping gene expression quantitative trait loci for these traits has been pursued in gene-specific ways. Transcript abundances often serve as a surrogate for classical quantitative traits in that the levels of expression are significantly correlated with the classical traits across members of a segregating population. The correlation structure between transcript abundances and classical traits has been used to identify susceptibility loci for complex diseases such as diabetes and allergic asthma. One study recently completed the first comprehensive dissection of transcriptional regulation in budding yeast, giving a detailed glimpse of a genome-wide survey of the genetics of gene expression. Unlike classical quantitative traits, which often represent gross clinical measurements that may be far removed from the biological processes giving rise to them, the genetic linkages associated with transcript abundance affords a closer look at cellular biochemical processes. Here we describe comprehensive genetic screens of mouse, plant and human transcriptomes by considering gene expression values as quantitative traits. We identify a gene-expression pattern strongly associated with obesity in a murine cross, and observe two distinct obesity subtypes. Furthermore, we find that these obesity subtypes are under the control of different loci.

Stelzl, U., U. Worm, et al. (2005). "A human protein-protein interaction network: a resource for annotating the proteome." (6): 957

Stenson, P.D., Ball, E.V., Mort, M., Phillips, A.D., Shiel, J.A., Thomas, N.S., Abeysinghe, S., Krawczak, M. and Cooper, D.N. (2003). "Human Gene Mutation Database (HGMD): 2003 update." Hum Mutat: 577

Turner, F.S., Clutterbuck, D.R. and Semple, C.A. (2003). "POCUS: mining genomic sequence annotation to predict disease genes." Genome Biol: R75

Wang, Z. and J. Moult (2001). "SNPs, protein structure, and disease." Hum Mutat (4): 263

Wang, Z. and J. Moult (2003). "Three-dimensional structural location and molecular functional effects of missense SNPs in the T cell receptor Vbeta domain." (3): 748

World Health Organization (2005). International Statistical Classification of Diseases and Related Health Problems. Geneva.

Zerhouni, E. (2003). "The NIH roadmap." : 63