Background

Many fundamental cellular processes involve protein networks, and comprehensively identifying them is
important to systematically defining protein function (Eisenberg et al., 2000; Lan et al., 2002, 2003).
Complex networks are also used to describe the structure of a number of wide-ranging systems including
the internet, power grids, the ecological food web and scientific collaborations. Despite the seemingly huge
differences among these systems, it has been shown that they all share common features in terms of
network topology (Albert and Barabasi, 2001; Albert et al., 1999, 2000; Amaral et al., 2000; Barabasi and
Albert, 1999; Huberman and Adamic, 1999; Jeong et al., 2001; Watts and Strogatz, 1998; Gavin et al., 2006;
Krogan et al., 2006). Thus, networks provide a framework for describing biology in a universal language
understandable to a broad audience (Girvan and Newman, 2002).


Large-scale experiments have now created a great variety of genome-wide information related
to protein networks, especially in the yeast Saccharomyces cerevisiae. There are datasets of explicit
protein-protein interactions (Gavin et al., 2002; Ho et al., 2002; Ito et al., 2000; Uetz et al., 2000),
experimentally derived regulatory relationships (Lee et al., 2002), manually curated interaction
databases such as MIPS, BIND, and DIP (Bader et al., 2003; Mewes et al., 2002), and systems for
automatically finding interactions in the literature (Friedman et al., 2001). In addition to the
experimentally derived interaction networks, there are also predicted interactions (Valencia and Pazos, 2002).

The most common methods used in predicting protein-protein interactions are based on
“guilt-by-association”: two proteins are more likely to interact if they share several correlated
genomic features. Examples of these genomic features are gene expression profiles (DeRisi et al., 1997),
phylogenetic profiles (Pellegrini et al., 1999), essentiality (Winzeler et al., 1999), localization
(Kumar et al., 2002), and gene neighborhood (Tamames et al., 1997). Comparative genomics also provides
an efficient way of mapping genome-wide interactions between different organisms (Walhout et al., 2000;
Yu et al., 2004).
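As a rough illustration of guilt-by-association with a single genomic feature, the sketch below scores protein pairs by the Pearson correlation of their expression profiles and keeps the pairs above a cutoff. The protein names, the profile values and the 0.8 threshold are all made up for the example; this is a minimal sketch of the general idea, not the procedure used in any of the cited studies.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_candidate_pairs(profiles, threshold=0.8):
    """Return (protein_a, protein_b, r) for pairs with r >= threshold, best first."""
    scored = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        r = pearson(prof_a, prof_b)
        if r >= threshold:
            scored.append((name_a, name_b, r))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical expression profiles (invented numbers, four conditions each).
toy_profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.1, 3.9, 6.2, 8.0],
    "P3": [4.0, 1.0, 3.5, 0.5],
}
candidates = rank_candidate_pairs(toy_profiles)
```

In this toy input only P1 and P2 rise together, so they form the single candidate pair. In practice one such feature is a weak predictor on its own.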


Summary of Some Past Results



Predicting Protein Networks

Correlating Interactions with Complexes and Genomic Features

We have developed methods to assess protein-protein interactions and regulatory relationships,
correlating them with structures of known complexes and with function. In particular, Jansen et al. (2002a)
developed a method for looking at the correlation between expression levels, their fluctuations, and
known interactions. This allowed us to find significant differences in expression correlations between
transient and permanent complexes. Edwards et al. (2002) compared known complexes to the interactions
from the database. Finally, Lan et al. (2002, 2003) looked at the relationship between functional categories
and interactions, showing that interactions could be used to systematically circumscribe and define function.


Predicting Protein Networks from Individual Genomic Features

We have developed methods for predicting regulatory relationships and protein-protein interactions from
individual types of genomic data. Jansen et al. (2002) and Qian et al. (2001) looked at the degree to which
expression correlations could predict interactions and found that a subset of known interactions could be
predicted with high confidence. In addition, Qian et al. (2001) looked at new types of expression
correlations, those that had a specifically time-shifted or inverted relationship. Finally, we developed an
approach based on support vector machines to predict the targets of a transcription factor by finding
relatively subtle relationships between their expression profiles (Qian et al., 2003).
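To make the idea of time-shifted and inverted expression relationships concrete, the sketch below correlates two time courses at several lags and reports the lag with the strongest absolute correlation, labelling a negative correlation as inverted. This is a deliberately simplified stand-in (plain Pearson correlation on shifted windows), not the local clustering method of the cited work; the function names, the `max_shift` parameter and the profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def classify_relationship(profile_a, profile_b, max_shift=2):
    """Find the lag of profile_b behind profile_a with the strongest |r|.

    Returns (label, shift, r); label is "simultaneous", "time-shifted",
    "inverted" or "inverted time-shifted".
    """
    best_shift, best_r = 0, pearson(profile_a, profile_b)
    for shift in range(1, max_shift + 1):
        if len(profile_a) - shift < 3:
            break  # too few overlapping time points to correlate
        r = pearson(profile_a[:-shift], profile_b[shift:])
        if abs(r) > abs(best_r):
            best_shift, best_r = shift, r
    label = "time-shifted" if best_shift else "simultaneous"
    if best_r < 0:
        label = "inverted" if best_shift == 0 else "inverted " + label
    return label, best_shift, best_r


# Invented time courses: `delayed` repeats `leader` one time point later.
leader = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0]
delayed = [0.0, 0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
mirrored = [-v for v in leader]

shifted_result = classify_relationship(leader, delayed)
inverted_result = classify_relationship(leader, mirrored)
```

Here `shifted_result` reports a lag of one time point and `inverted_result` reports an inverted relationship at lag zero. Neither pair would be picked up well by a plain correlation threshold on the unshifted, unsigned profiles.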


Data Integration of Multiple Features to Improve Prediction

We have developed methods of combining various genomic features that produce an interaction prediction
that is stronger than that from each of the individual features. This is important both for known
protein-protein interaction data sets, which suffer from a great degree of noise, and also for genomic
features such as expression correlation, which are only weakly predictive of interactions. Our first
analyses used simple combinations of features (Edwards et al., 2002; Jansen et al., 2002b; Gerstein et al.,
2002). We then moved on to developing more sophisticated Bayesian-network approaches that combine
features in a way that optimizes their predictive value (Jansen et al., 2003b). In Lu et al. (2005), we saw
how this result scaled with the number of features. In Xia et al. (2006), we extended it to membrane proteins.
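One simple way to see why combining weak features helps is a naive Bayes calculation, in which each feature contributes a likelihood ratio and the ratios multiply under an independence assumption. The sketch below is only an illustration of that principle; the feature names, the training examples and the pseudocount smoothing are assumptions made for the example, and it is not a reproduction of the Bayesian-network models in the cited papers.

```python
def train_likelihood_ratios(examples, pseudocount=1.0):
    """Estimate per-feature likelihood ratios from labelled protein pairs.

    examples: list of (features, interacts) with features a dict of
    feature name -> discrete value. Returns {feature: {value: LR}} where
    LR = P(value | interacting) / P(value | non-interacting).
    """
    pos_total = sum(1 for _, interacts in examples if interacts)
    neg_total = len(examples) - pos_total
    counts = {}  # feature -> value -> [positive count, negative count]
    for features, interacts in examples:
        for name, value in features.items():
            cell = counts.setdefault(name, {}).setdefault(value, [0, 0])
            cell[0 if interacts else 1] += 1
    table = {}
    for name, values in counts.items():
        k = len(values)  # number of distinct values, used for smoothing
        table[name] = {
            value: ((pos + pseudocount) / (pos_total + k * pseudocount))
            / ((neg + pseudocount) / (neg_total + k * pseudocount))
            for value, (pos, neg) in values.items()
        }
    return table


def combined_likelihood_ratio(table, features):
    """Multiply per-feature ratios (naive independence assumption)."""
    ratio = 1.0
    for name, value in features.items():
        ratio *= table.get(name, {}).get(value, 1.0)  # unseen evidence is neutral
    return ratio


# Invented training pairs with two hypothetical binned features.
training = [
    ({"coexpr": "high", "essential": "same"}, True),
    ({"coexpr": "high", "essential": "same"}, True),
    ({"coexpr": "high", "essential": "diff"}, True),
    ({"coexpr": "low", "essential": "same"}, False),
    ({"coexpr": "low", "essential": "diff"}, False),
    ({"coexpr": "high", "essential": "diff"}, False),
    ({"coexpr": "low", "essential": "diff"}, False),
]
lr_table = train_likelihood_ratios(training)
combined = combined_likelihood_ratio(lr_table, {"coexpr": "high", "essential": "same"})
```

With these toy counts the individual ratios are 2.4 and 1.8, while the combined ratio is 4.32, so the joint evidence is stronger than either feature alone. Multiplying the combined ratio by the prior odds of interaction would give posterior odds.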


Analysis of the Network Structure


Analysis of the Global Structure of Networks and Comparison of Networks

We have carried out a number of studies looking at the overall statistics of gene networks, finding that a
number of them have a power-law type of distribution very similar to those found for the occurrence
of gene families (Luscombe et al., 2002; Xia et al., 2004; Yu et al., 2007). In terms of smaller-scale
structures, Yu et al. (2006a) analyzed regulatory networks in yeast and showed that they have a
pyramid-shaped hierarchical structure, similar in some sense to governmental "org-charts", with a small
number of master transcription factors on top. Finally, Yu et al. (2006b) showed how defective cliques
within networks could be completed, defining large complexes and potentially predicting more interactions.
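The power-law statistics mentioned above can be checked, at least roughly, by counting how many nodes have each degree and fitting a straight line to the histogram on log-log axes. The sketch below does this with the standard library only; the edge list and the idealized histogram are invented, and a least-squares fit on log-log data is a crude estimator of the exponent, used here only to show the shape of the calculation.

```python
from collections import Counter
from math import log


def degree_histogram(edges):
    """Map each degree k to the number of nodes having that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(degree)."""
    points = [(log(k), log(n)) for k, n in histogram.items() if k > 0 and n > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# A tiny star network: one hub connected to three leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
star_histogram = degree_histogram(star)

# Invented histogram that follows count = 16 * k**-2 exactly.
ideal_histogram = {1: 16, 2: 4, 4: 1}
slope = loglog_slope(ideal_histogram)
```

For the idealized histogram the fitted slope is -2, the exponent built into the data. Real interaction data would give a noisier fit and usually calls for more careful estimation.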


We interrelated regulatory and expression networks and found that genes targeted by the same
transcription factor tend to have correlated expression (Yu et al., 2003). In Yu et al. (2006c), we compared
many different types of networks, defining composite hubs and motifs.


Mapping Networks between Organisms

In Yu et al. (2004), we showed how one could compare networks between organisms and use this
comparison to help in network prediction. In particular, we showed how interologs could be transferred
between organisms as a function of sequence similarity. We also defined a related concept for the
transfer of regulatory relationships ("regulog"), and we established a database of mapped relationships
(interolog.gersteinlab.org).
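The interolog idea can be sketched as a lookup: an interaction between two proteins in a source organism is transferred to their best matches in a target organism when both matches are similar enough. In the sketch below the joint similarity is taken as the geometric mean of the two sequence identities, and the 0.8 cutoff, the protein names and the identity values are all assumptions made for illustration rather than values taken from the cited study.

```python
from math import sqrt


def transfer_interologs(interactions, best_match, min_joint_identity=0.8):
    """Map source-organism interactions onto a target organism.

    interactions: iterable of (protein_a, protein_b) in the source organism.
    best_match: dict source protein -> (target protein, fractional identity).
    Returns {(target_a, target_b): joint identity} for transferred pairs.
    """
    predicted = {}
    for a, b in interactions:
        if a not in best_match or b not in best_match:
            continue  # no counterpart in the target organism
        (target_a, identity_a), (target_b, identity_b) = best_match[a], best_match[b]
        joint = sqrt(identity_a * identity_b)  # assumed joint-similarity measure
        if joint >= min_joint_identity:
            pair = tuple(sorted((target_a, target_b)))
            predicted[pair] = max(joint, predicted.get(pair, 0.0))
    return predicted


# Hypothetical source interactions and best matches.
source_interactions = [("yA", "yB"), ("yB", "yC"), ("yC", "yD")]
matches = {"yA": ("wA", 0.9), "yB": ("wB", 0.9), "yC": ("wC", 0.5)}
interologs = transfer_interologs(source_interactions, matches)
```

Only the first interaction is transferred: the second falls below the similarity cutoff and the third involves a protein with no counterpart. Raising or lowering the cutoff trades coverage against reliability.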


Analysis of the Dynamics of Networks

We examined the dynamics of the regulatory system in yeast on a genomic scale by integrating gene
expression data for five cellular conditions with known transcriptional regulatory relationships (Luscombe et
al., 2004). To rigorously compare these condition-specific subnetworks, we developed SANDY (Statistical
Analysis of Network Dynamics). We found that these subnetworks exhibit vastly different topologies on both
a local and a global level, and we uncovered two separate groups of cellular states. Moreover, we showed that
different sets of transcription factors become key regulatory hubs at different times, portraying a network
that shifts its weight between different foci to bring about distinct cellular states. In a subsequent
analysis (Yu et al., 2006c), we analyzed the expression relationships in small network motifs,
showing that many of them in metabolic pathways have a time-shifted quality.
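The notion of condition-specific subnetworks with shifting hubs can be illustrated with a small sketch: keep the regulatory edges whose target gene is active in a condition, then rank transcription factors by how many active targets they regulate. This is a simplified stand-in for the general idea and not the SANDY procedure itself; the rule for calling an edge active, the gene and factor names, and the condition labels are all hypothetical.

```python
from collections import Counter


def active_subnetwork(edges, active_genes):
    """Keep regulatory edges (tf, target) whose target is active in the condition."""
    return [(tf, target) for tf, target in edges if target in active_genes]


def top_hubs(edges, top_n=1):
    """Rank transcription factors by out-degree; ties are broken alphabetically."""
    out_degree = Counter(tf for tf, _ in edges)
    ranked = sorted(out_degree.items(), key=lambda item: (-item[1], item[0]))
    return [tf for tf, _ in ranked[:top_n]]


# Invented static regulatory network and per-condition active gene sets.
regulatory_edges = [
    ("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"),
    ("TF2", "g3"), ("TF2", "g4"), ("TF2", "g5"),
]
active_by_condition = {
    "condition_A": {"g1", "g2", "g3"},
    "condition_B": {"g3", "g4", "g5"},
}
hubs_by_condition = {
    condition: top_hubs(active_subnetwork(regulatory_edges, genes))
    for condition, genes in active_by_condition.items()
}
```

In this toy network both factors have the same number of targets overall, yet TF1 is the hub under the first condition and TF2 under the second, which is the kind of shift in weight described above.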


3D Structural Analysis of Protein Interaction Networks

While there has been considerable interest in protein interaction networks and their role in cell function,
most studies thus far have neglected the biophysical properties of the proteins involved. We have
pioneered the use of 3D protein structures for analysis of protein networks (Kim et al., 2006). This approach
gave us a unique perspective on protein networks and showed that many network properties previously
thought to relate to biological features were actually more reflective of biophysical quantities.









References

Akerley, B. J., E. J. Rubin, A. Camilli, D. J. Lampe, H. M. Robertson, and J. J. Mekalanos. (1998).
Systematic identification of essential genes by in vitro mariner mutagenesis. Proc Natl Acad Sci USA,
95:8927-32.

Albert, R., H. Jeong and A. L. Barabasi (1999). Diameter of the World-Wide Web. Nature 401: 130-131.

Albert, R., H. Jeong and A. L. Barabasi (2000). Error and attack tolerance of complex networks.
Nature 406: 378-382.

Albert, R. and A. L. Barabasi (2001). Statistical Mechanics of Complex Networks.
arXiv:cond-mat/0106096: 1-53.

Alexandrov, V., Gerstein, M (2001). Calculating populations of subcellular compartments using density
matrix formalism. International Journal of Quantum Chemistry 85: 693-696.

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. (1997).
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids
Res, 25:3389-402.

Amaral, L. A., A. Scala, M. Barthelemy and H. E. Stanley (2000). Classes of small-world networks.
Proc Natl Acad Sci USA 97: 11149-52.

Arigoni, F., F. Talabot, M. Peitsch, M. D. Edgerton, E. Meldrum, E. Allet, R. Fish, T. Jamotte, M. L.
Curchod, and H. Loferer. (1998). A genome-based approach for the identification of essential bacterial
genes. Nature Biotechnology, 16:851-6.

Arkin, I, Brunger, A & Engelman, D. (1997) Are there dominant membrane protein families with a given
number of helices? Proteins 28: 465-466.

Bader, G. D., D. Betel and C. W. Hogue (2003). BIND: the Biomolecular Interaction Network Database.
Nucleic Acids Res 31: 248-50.

Bailey, T. L., and W. S. Noble. (2003). Searching for statistically significant regulatory modules.
Bioinformatics 19 Suppl 2:II16-II25.

Barabasi, A. L. and R. Albert (1999). Emergence of Scaling in Random Networks. Science 286: 509-512.

Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S,
Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR. (2004) Nucleic Acids Res 32: D138-D141.

Berger A, Della Pietra S, Della Pietra V. A Maximum Entropy Approach to Natural Language Processing.
Computational Linguistics (22-1) March 1996.

Bertone, P., Gerstein, M (2001). Integrative data mining: the new direction in bioinformatics.
IEEE Eng Med Biol Mag 20: 33-40.

Bertone, P., Y Kluger, N Lan, D Zheng, D Christendat, A Yee, A M Edwards, C H Arrowsmith, G T
Montelione, Gerstein, M (2001). SPINE: an integrated tracking database and data mining approach for
identifying feasible targets in high-throughput structural proteomics. Nucleic Acids Res 29: 2884-98.

Bochner, B. R. (2003). New technologies to assess genotype-phenotype relationships. Nature Reviews
Genetics, 4:309-14.

Cheung KH, Liu Y, Kumar K, Snyder M, Gerstein, M, Miller P. (2001) An XML Application for Genomic
Data Interoperation. IEEE International Symposium on Bio-Informatics and Biomedical Engineering
(BIBE), pp. 97-103.

Christendat, D., A. Yee, A. Dharamsi, Y. Kluger, M. Gerstein, C. H. Arrowsmith, and A. M. Edwards.
(2000). Structural proteomics: prospects for high throughput sample preparation. Prog Biophys Mol Biol,
73:339-45.

Cowart, L. A., Y. Okamoto, F. R. Pinto, J. L. Gandy, J. S. Almeida, and Y. A. Hannun. (2003). Roles for
sphingolipid biosynthesis in mediation of specific programs of the heat stress response determined
through gene expression profiling. J Biol Chem, 278:30328-38.






Csank, C., M. C. Costanzo, J. Hirschman, P. Hodges, J. E. Kranz, M. Mangan, K. O'Neill, L. S. Robertson,
M. S. Skrzypek, J. Brooks, and J. I. Garrels. (2002). Three yeast proteome databases: YPD, PombePD,
and CalPD (MycoPathPD). Methods in Enzymology, 350:347-73.

DeRisi, J. L., V. R. Iyer and P. O. Brown (1997). Exploring the metabolic and genetic control of gene
expression on a genomic scale. Science 278: 680-6.

Deutschbauer, A. M., R. M. Williams, A. M. Chu, and R. W. Davis. (2002). Parallel phenotypic analysis of
sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci USA,
99:15530-5.

Drawid, A. and Gerstein, M (2000). A Bayesian system integrating expression data with sequence patterns
for localizing proteins: comprehensive application to the yeast genome. J Mol Biol 301: 1059-75.

Drawid, A., R Jansen, Gerstein, M (2000). Genome-wide analysis relating expression level with protein
subcellular localization. Trends Genet 16: 426-30.

Edwards, A.M., B. Kus, R. Jansen, D. Greenbaum, J. Greenblatt, and Gerstein, M. (2002). Bridging
structural biology and genomics: assessing protein interaction data with known complexes. Trends in
Genetics 18: 529-536.

Eisenberg, D., E. M. Marcotte, I. Xenarios and T. O. Yeates (2000). Protein function in the post-genomic
era. Nature 405: 823-6.

Entian, K. D., T. Schuster, J. H. Hegemann, D. Becher, H. Feldmann, U. Guldener, R. Gotz, M. Hansen, C.
P. Hollenberg, G. Jansen, W. Kramer, S. Klein, P. Kotter, J. Kricke, H. Launhardt, G. Mannhaupt, A.
Maierl, P. Meyer, W. Mewes, T. Munder, R. K. Niedenthal, M. Ramezani Rad, A. Rohmer, A. Romer,
and A. Hinnen. (1999). Functional analysis of 150 deletion mutants in Saccharomyces cerevisiae by a
systematic approach. Molecular & General Genetics 262:683-702.

Erdos, P. and A. Renyi (1959). On random graphs I. Publ. Math. (Debrecen) 6: 290-297.

Fraser, H. B., A. E. Hirsh, L. M. Steinmetz, C. Scharfe, and M. W. Feldman. (2002). Evolutionary rate in the
protein interaction network. Science 296:750-2.

Friedman, C., P. Kra, H. Yu, M. Krauthammer and A. Rzhetsky (2001). GENIES: a natural-language
processing system for the extraction of molecular pathways from journal articles. Bioinformatics 17
Suppl 1: S74-82.

Gasch, A. P., P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen, G. Storz, D. Botstein, and P. O.
Brown. (2000). Genomic expression programs in the response of yeast cells to environmental changes.
Mol Biol Cell, 11:4241-57.

Gasch, A. P., and M. Werner-Washburne. (2002). The genomics of yeast responses to environmental stress
and starvation. Funct Integr Genomics, 2:181-92.

Gavin, A.C., M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon,
C.M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M.
Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M.A. Heurtier,
R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B.
Seraphin, B. Kuster, G. Neubauer, and G. Superti-Furga. (2002). Functional organization of the yeast
proteome by systematic analysis of protein complexes. Nature 415: 141-147.

Gerstein, M. (1998). Patterns of Protein-Fold Usage in Eight Microbial Genomes: A Comprehensive
Structural Census. Proteins 33: 518-534.

Gerstein, M., and R. Jansen. (2000). The current excitement in bioinformatics-analysis of whole-genome
expression data: how does it relate to protein structure and function? Curr Opin Struct Biol 10:574-84.

Gerstein, M., N. Lan, R. Jansen (2002). Proteomics. Integrating interactomes. Science 29: 284-7.

Girvan, M., and M. E. Newman. (2002). Community structure in social and biological networks.
Proc Natl Acad Sci USA 99:7821-6.

Goh, C.S., N. Lan, N. Echols, S.M. Douglas, D. Milburn, P. Bertone, R. Xiao, L.C. Ma, D. Zheng, Z.
Wunderlich, T. Acton, G.T. Montelione, Gerstein, M (2003). SPINE 2: a system for collaborative
structural proteomics within a federated database framework. Nucleic Acids Res 31: 2833-8.

Goh, C. S., N. Lan, S. M. Douglas, B. Wu, N. Echols, A. Smith, D. Milburn, G. T. Montelione, H. Zhao, and
M. Gerstein. (2004). Mining the Structural Genomics Pipeline: Identification of Protein Properties that
Affect High-throughput Experimental Analysis. J Mol Biol 336:115-30.

Greenbaum, D., C. Colangelo, K. Williams, Gerstein, M (2003). Comparing protein abundance and mRNA
expression levels on a genomic scale. Genome Biol 4: 117.

Greenbaum, D., N. M. Luscombe, R. Jansen, J. Qian and Gerstein, M (2001). Interrelating different types of
genomic data, from proteome to secretome: 'oming in on function. Gen Res 11: 1463-8.

Greenbaum, D., R. Jansen and Gerstein, M (2002). Analysis of mRNA expression and protein abundance
data: an approach for the comparison of the enrichment of features in the cellular population of proteins
and transcripts. Bioinformatics 18: 585-96.

Guelzim, N., S. Bottani, P. Bourgine, and F. Kepes. (2002). Topological and causal structure of the yeast
transcriptional regulatory network. Nature Genetics, 31:60-3.

Hampsey, M. (1997). A review of phenotypes in Saccharomyces cerevisiae. Yeast 13:1099-133.

Harrison, P. M., and M. Gerstein. (2003). A method to assess compositional bias in biological sequences and
its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes. Genome Biol
4:R40.

Harrison, PM, Gerstein, M (2002). Studying genomes through the aeons: protein families, pseudogenes and
proteome evolution. J Mol Biol 318: 1155-74.

Hartwell, L. H., J. J. Hopfield, S. Leibler and A. W. Murray (1999). From molecular to modular cell biology.
Nature 402: C47-52.

Hegyi, H., and M. Gerstein. (1999). The relationship between protein structure and function: a
comprehensive survey with application to the yeast genome. J Mol Biol 288:147-64.

Hegyi, H. and Gerstein, M. (2001). Annotation transfer for genomics: measuring functional divergence in
multi-domain proteins. Genome Research 11: 1632-1640.

Hirsh, A. E., and H. B. Fraser. (2001). Protein dispensability and rate of evolution. Nature 411:1046-9.

Ho, Y., A. Gruhler, A. Heilbut, G.D. Bader, L. Moore, S.L. Adams, A. Millar, P. Taylor, K. Bennett, K.
Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M.
Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A.R. Willems, H. Sassi, P.A.
Nielsen, K.J. Rasmussen, J.R. Andersen, L.E. Johansen, L.H. Hansen, H. Jespersen, A. Podtelejnikov, E.
Nielsen, J. Crawford, V. Poulsen, B.D. Sorensen, J. Matthiesen, R.C. Hendrickson, F. Gleeson, T.
Pawson, M.F. Moran, D. Durocher, M. Mann, C.W. Hogue, D. Figeys, and M. Tyers. (2002). Systematic
identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415:
180-183.

Holstege, F. C. P., Jennings, E. G., Wyrick, J. J., Lee, T. I., Hengartner, C. J., Green, M. R., Golub, T. R.,
Lander, E. S. & Young, R. A. (1998). Dissecting the regulatory circuitry of a eukaryotic genome.
Cell, 95: 717-728.

Horak, C.E., N.M. Luscombe, J. Qian, P. Bertone, S. Piccirillo, Gerstein, M, and M. Snyder. (2002).
Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae. Genes Dev. 16:
3017-3033.

Huberman, B. A. and L. A. Adamic (1999). Growth dynamics of the World-Wide Web. Nature 401: 131.

Hughes, J. D., P. W. Estep, S. Tavazoie, and G. M. Church. (2000). Computational identification of
cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae.
J Mol Biol, 296:1205-14.

Ito, T., K. Tashiro, S. Muta, R. Ozawa, T. Chiba, M. Nishizawa, K. Yamamoto, S. Kuhara, and Y. Sakaki.
(2000). Toward a protein-protein interaction map of the budding yeast: A comprehensive system to
examine two-hybrid interactions in all possible combinations between the yeast proteins.
Proc Natl Acad Sci USA 97: 1143-1147.






Jansen, R., D. Greenbaum, Gerstein, M (2002). Relating whole-genome expression data with
protein-protein interactions. Genome Res 12: 37-46.

Jansen, R, Gerstein, M (2000). Analysis of the yeast transcriptome with structural and functional categories:
characterizing highly expressed proteins. Nucleic Acids Res 28: 1481-8.

Jansen, R, Yu, H, Greenbaum, D, Kluger, Y, Krogan, N, Chung, S, Snyder, M, Greenblatt, J, Gerstein, M
(2003). A Bayesian networks approach to predict protein complexes from genomic data. Science 302:
449-453.

Jansen, R., N. Lan, J. Qian, and Gerstein, M. (2002a). Integration of genomic datasets to predict protein
complexes in yeast. Journal of Structural and Functional Genomics 2: 71-81.

Jensen, F V, Bayesian Networks and Decision Graphs (Springer, New York, 2001).

Jeong, H., S. P. Mason, A. L. Barabasi and Z. N. Oltvai (2001). Lethality and centrality in protein networks.
Nature 411: 41-2.

Kanehisa, M. (2002). The KEGG database. Novartis Found Symp, 247:91-101; discussion 101-3, 119-28,
244-52.

Kanehisa, M., S. Goto, S. Kawashima, Y. Okuno, and M. Hattori. (2004). The KEGG resource for
deciphering the genome. Nucleic Acids Res 32:D277-80.

Kluger, Y., R. Basri, J. T. Chang, and M. Gerstein. (2003). Spectral biclustering of microarray data:
coclustering genes and conditions. Genome Res 13:703-16.

Koonin EV, Wolf YI, Karev GP. (2002). The structure of the protein universe and genome evolution.
Nature 420:218-23.

Kumar, A. and M. Snyder. (2002). Protein complexes take the bait. Nature 415: 123-124.

Kumar, A., S. Agarwal, J.A. Heyman, S. Matson, M. Heidtman, S. Piccirillo, L. Umansky, A. Drawid, R.
Jansen, Y. Liu, K.H. Cheung, P. Miller, Gerstein, M, G.S. Roeder, and M. Snyder. (2002). Subcellular
localization of the yeast proteome. Genes & Development 16: 707-719.

Lan, N, GT Montelione, Gerstein, M (2003). Ontologies for proteomics: towards a systematic definition of
structure and function that scales to the genome level. Curr Opin Chem Biol 7: 44-54.

Lan, N., R. Jansen, and Gerstein, M. (2002). Toward a Systematic Definition of Protein Function That
Scales to the Genome Level: Defining Function in Terms of Interactions. Proceedings of the IEEE 90:
1848-1858.

Lee, T.I., N.J. Rinaldi, F. Robert, D.T. Odom, Z. Bar-Joseph, G.K. Gerber, N.M. Hannett, C.T. Harbison,
C.M. Thompson, I. Simon, J. Zeitlinger, E.G. Jennings, H.L. Murray, D.B. Gordon, B. Ren, J.J. Wyrick,
J.B. Tagne, T.L. Volkert, E. Fraenkel, D.K. Gifford, and R.A. Young. (2002). Transcriptional regulatory
networks in Saccharomyces cerevisiae. Science 298: 799-804.

Lin, J, J Qian, D Greenbaum, P Bertone, R Das, N Echols, A Senes, B Stenger, Gerstein, M (2002).
GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing.
Nucleic Acids Res 30: 4574-82.

Lin, J., and M. Gerstein. (2000). Whole-genome trees based on the occurrence of folds and orthologs:
implications for comparing genomes on different levels. Genome Res 10:808-18.

Liu, Y., D. M. Engelman, and M. Gerstein. (2002). Genomic analysis of membrane protein families:
abundance and conserved motifs. Genome Biol 3: research0054.

Lu H, L. Lu, J. Skolnick (2003a). Development of unified statistical potentials describing protein-protein
interactions. Biophys J Mar; 84: 1895-901.

Luscombe, NM, J Qian, Z Zhang, T Johnson, Gerstein, M (2002). The dominance of the population by a
selected few: power-law behaviour applies to a wide variety of genomic properties. Genome Biol 3:
RESEARCH0040.

Luscombe NM, Royce TE, Bertone P, Echols N, Horak CE, Chang JT, Snyder M, Gerstein M (2003)
ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res
31: 3477-3482.

Marcotte, E.M., M. Pellegrini, M.J. Thompson, T.O. Yeates, and D. Eisenberg. (1999). A combined
algorithm for genome-wide prediction of protein function. Nature 402: 83-86.

Martone, R., G. Euskirchen, P. Bertone, S. Hartman, T.E. Royce, N. M. Luscombe, J. L. Rinn, F. K. Nelson,
P. Miller, Gerstein, M, S. Weissman, and M. Snyder. (2003) Distribution of NF-kappaB-binding sites
across human chromosome 22. Proc Natl Acad Sci USA (in press).

Mewes, H. W., D. Frishman, U. Guldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M.
Munsterkotter, S. Rudd and B. Weil (2002). MIPS: a database for genomes and protein sequences.
Nucleic Acids Res 30: 31-4.

Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P,
Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones
S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D,
Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ,
Vaughan R, Zdobnov EM (2004) The InterPro Database, 2003 brings increased coverage and new
features. Nucleic Acids Res 31: 315-318.

Molina L, Belanche L, Nebot A. Feature Selection Algorithms: A Survey and Experimental Evaluation.
(2002) IEEE International Conference on Data Mining (ICDM'02)

Naylor, G. J., and Gerstein M. (2000). Measuring shifts in function and evolutionary opportunity using
variability profiles: a case study of the globins. J Mol Evol, 51:223-33.

Overbeek, R., N. Larsen, G. D. Pusch, M. D'Souza, E. Selkov, Jr., N. Kyrpides, M. Fonstein, N. Maltsev, and
E. Selkov. (2000). WIT: integrated system for high-throughput genome sequence analysis and metabolic
reconstruction. Nucleic Acids Res, 28:123-5.

Pal, C., B. Papp, and L. D. Hurst. (2003). Genomic function: Rate of evolution and gene dispensability.
Nature 421:496-7; discussion 497-8.

Pearl, J, Probabilistic reasoning in intelligent systems (1988) (Morgan Kaufmann, San Mateo).

Pearson, W.R. and D.J. Lipman. (1988). Improved tools for biological sequence comparison.
Proc Natl Acad Sci USA 85: 2444-2448.

Pellegrini, M., E.M. Marcotte, M.J. Thompson, D. Eisenberg, and T.O. Yeates. (1999). Assigning protein
functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 96:
4285-4288.

Qian J, Stenger B, Wilson CA, Lin J, Jansen R, Teichmann SA, Park J, Krebs WG, Yu H, Alexandrov V,
Echols N, Gerstein, M (2001a). PartsList: a web-based system for dynamically ranking protein folds
based on disparate attributes, including whole-genome expression and interaction information.
Nucleic Acids Res 29: 1750-64.

Qian, J., M. Dolled-Filhart, J. Lin, H. Yu, Gerstein, M (2001b). Beyond synexpression relationships: local
clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant
interactions. J Mol Biol 314: 1053-66.

Qian, J., J. Lin, N.M. Luscombe, H. Yu, Gerstein, M. (2003). Predictions of regulatory networks:
genome-wide identification of transcription factor targets from gene expression data. Bioinformatics 19:
1917-1926.

Qian, J., Luscombe NM, and Gerstein M. (2001). Protein family and fold occurrence in genomes:
power-law behaviour and evolutionary model. J Mol Biol 313:673-81.

Rieger, K. J., M. El-Alama, G. Stein, C. Bradshaw, P. P. Slonimski, and K. Maundrell. (1999). Chemotyping
of yeast mutants using robotics. Yeast 15:973-86.

Ross-Macdonald, P., P. S. Coelho, T. Roemer, S. Agarwal, A. Kumar, R. Jansen, K. H. Cheung, A. Sheehan,
D. Symoniatis, L. Umansky, M. Heidtman, F. K. Nelson, H. Iwasaki, K. Hager, M. Gerstein, P. Miller,
G. S. Roeder, and M. Snyder. (1999). Large-scale analysis of the yeast genome by transposon tagging
and gene disruption. Nature 402:413-8.






Rzhetsky A, Gomez SM (2002). Birth of scale-free molecular networks and the number of distinct DNA and
protein domains per genome. Bioinformatics 17:988-996.

Sakumoto, N., I. Matsuoka, Y. Mukai, N. Ogawa, Y. Kaneko, and S. Harashima. (2002). A series of double
disruptants for protein phosphatase genes in Saccharomyces cerevisiae and their phenotypic analysis.
Yeast 19:587-99.

Smith, V., K. N. Chou, D. Lashkari, D. Botstein, and P. O. Brown. (1996). Functional analysis of the genes
of yeast chromosome V by genetic footprinting. Science 274:2069-74.

Steinmetz, L. M., C. Scharfe, A. M. Deutschbauer, D. Mokranjac, Z. S. Herman, T. Jones, A. M. Chu, G.
Giaever, H. Prokisch, P. J. Oefner, and R. W. Davis. (2002). Systematic screen for human disease genes
in yeast. Nature Genetics 31:400-4.


Roven, C., and H. J. Bussemaker. (2003). REDUCE: An online tool for inferring cis-regulatory elements
and transcriptional module activities from microarray data. Nucleic Acids Res, 31:3487-90.

Tamames, J., G. Casari, C. Ouzounis, and A. Valencia. (1997). Conserved clusters of functionally related
genes in two bacterial genomes. J Mol Evolution 44: 66-73.

Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY,
Fedorova ND, Koonin EV. (2001) The COG database: new developments in phylogenetic classification
of proteins from complete genomes. Nucleic Acids Res. Jan 1; 29: 22-8.

Tatusov, R.L., E.V. Koonin, and D.J. Lipman. (1997). A genomic perspective on protein families.
Science 278: 631-637.

Thanassi, J. A., S. L. Hartman-Neumann, T. J. Dougherty, B. A. Dougherty, and M. J. Pucci. (2002).
Identification of 113 conserved essential genes using a high-throughput gene disruption system in
Streptococcus pneumoniae. Nucleic Acids Res 30:3152-62.

Thatcher, J. W., J. M. Shaw, and W. J. Dickinson. (1998) Marginal fitness contributions of nonessential
genes in yeast. Proc Natl Acad Sci USA 95:253-7.

Tong, A. H., M. Evangelista, A. B. Parsons, H. Xu, G. D. Bader, N. Page, M. Robinson, S. Raghibizadeh, C.
W. Hogue, H. Bussey, B. Andrews, M. Tyers, and C. Boone. (2001). Systematic genetic analysis with
ordered arrays of yeast deletion mutants. Science 294:2364-8.

True, H. L., and S. L. Lindquist. (2000). A yeast prion provides a mechanism for genetic variation and
phenotypic diversity. Nature 407:477-83.

Uetz, P., L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight, D. Lockshon, V. Narayan, M.
Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G.
Vijayadamodar, M. Yang, M. Johnston, S. Fields, and J.M. Rothberg. (2000). A comprehensive analysis
of protein-protein interactions in Saccharomyces cerevisiae. Nature 403: 623-627.

Valencia, A. and F. Pazos. (2002). Computational methods for the prediction of protein interactions.
Curr Opin Struct Biol 12: 368-373.

von Mering, C., R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, and P. Bork. (2002). Comparative
assessment of large-scale data sets of protein-protein interactions. Nature 417: 399-403.

Walhout, A.J., R. Sordella, X. Lu, J.L. Hartley, G.F. Temple, M.A. Brasch, N. Thierry-Mieg, and M. Vidal.
(2000). Protein interaction mapping in C. elegans using proteins involved in vulval development.
Science 287: 116-122.

Warringer, J., and A. Blomberg. (2003). Automated screening in environmental arrays allows analysis of
quantitative phenotypic profiles in Saccharomyces cerevisiae. Yeast 20:53-67.

Watts, D. J. and S. H. Strogatz (1998). Collective dynamics of 'small-world' networks. Nature 393(6684):
440-2.

Wilson, C.A., J. Kreychman, and Gerstein, M. (2000). Assessing annotation transfer for genomics:
quantifying the relations between protein sequence, structure and function through traditional and
probabilistic scores. J Mol Biol 297: 233-249.






Wingender, E., X. Chen, E. Fricke, R. Geffers, R. Hehl, I. Liebich, M. Krull, V. Matys, H. Michael, R.
Ohnhauser, M. Pruss, F. Schacherer, S. Thiele, and S. Urbach. (2001). The TRANSFAC system on gene
expression regulation. Nucleic Acids Res 29:281-3.

Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, Bangham R, Benito R, Boeke
JD, Bussey H, Chu AM, Connelly C, Davis K, Dietrich F, Dow SW, El Bakkoury M, Foury F, Friend
SH, Gentalen E, Giaever G, Hegemann JH, Jones T, Laub M, Liao H, Davis RW, et al. (1999a).
Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis.
Science 285(5429): 901-6.

Winzeler, E. A., D. D. Shoemaker, A. Astromoff, H. Liang, K. Anderson, B. Andre, R. Bangham, R. Benito,
J. D. Boeke, H. Bussey, A. M. Chu, C. Connelly, K. Davis, F. Dietrich, S. W. Dow, M. El Bakkoury, F.
Foury, S. H. Friend, E. Gentalen, G. Giaever, J. H. Hegemann, T. Jones, M. Laub, H. Liao, and R. W.
Davis (1999b). Functional characterization of the S. cerevisiae genome by gene deletion and parallel
analysis. Science 285:901-6.

Xenarios, I., L. Salwinski, X. J. Duan, P. Higney, S. M. Kim and D. Eisenberg (2002). DIP, the Database of
Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids
Res 30: 303-5.

Xia, Y. and M. Levitt (2000). Extracting knowledge-based energy functions from protein structures by error
rate minimization: Comparison of methods using lattice model. J Chem Phys 113: 9318-9330.

Xia Y, H Yu, R Jansen, M Seringhaus, S Baxter, D Greenbaum, H Zhao, Gerstein M. (in press) Analyzing
Cellular Biochemistry in Terms of Molecular Networks. Annual Review of Biochemistry.

Yang, L., Z. Gu, and W.-H. Li. (2003) Rate of Protein Evolution Versus Fitness Effect of Gene Deletion.
Mol. Biol. Evol. 20:772-774.

Yu, H., N. M. Luscombe, J. Qian and Gerstein, M. (2003). Genomic analysis of gene expression
relationships in transcriptional regulatory networks. Trends Genet 19: 422-7.

Zewail, A., M. W. Xie, Y. Xing, L. Lin, P. F. Zhang, W. Zou, J. P. Saxe, and J. Huang. (2003). Novel
functions of the phosphatidylinositol metabolic pathway discovered by a chemical genomics screen with
wortmannin. Proc Natl Acad Sci USA, 100:3345-50.