Promoter Analysis of Co-regulated Genes in the Yeast Genome

Nov 8, 2013




Michael Q. Zhang

Cold Spring Harbor Laboratory

P.O. Box 100

1 Bungtown Road

Cold Spring Harbor, NY 11724

Tel: (516) 367-8393

Fax: (516) 367-8461

Email: mzhang@cshl.org




Abstract

The use of high-density DNA arrays to monitor gene expression on a genome-wide scale constitutes a fundamental advance in biology. In particular, the expression pattern of all genes in Saccharomyces cerevisiae can be interrogated by microarray analysis, in which cDNAs are hybridized to an array of more than 6,000 genes in the yeast genome. In an effort to build a comprehensive Yeast Promoter Database and to develop new computational methods for mapping upstream regulatory elements, we recently began an ongoing collaboration with experimental biologists on the analysis of large-scale expression data. It is well known that complex gene expression patterns result from dynamic, interacting networks of genes in the genetic regulatory circuitry. The hierarchical and modular organization of regulatory DNA sequence elements is important for our understanding of the combinatorial control of gene expression. As a bioinformatics attempt in this new direction, we have carried out a computational exploration of various initial experimental data. We use cell-cycle regulated gene expression as a specific example to demonstrate how one may extract promoter information computationally from such genome-wide screening. A full report of the experiments and of the complete analysis will be published elsewhere once all the experiments are finished later this year (Spellman et al.).



INTRODUCTION

Scientific advances have always been driven by new experimental technologies, and molecular genetics is no exception. Cloning, automated DNA sequencing and PCR, to mention just a few, have revolutionized molecular biology and have also had a great impact on the whole of life science and medicine. cDNA microarrays and oligonucleotide chips are the new technologies for monitoring complex gene expression (reviewed, for example, by Stein, 1998). Together with genome sequencing, they have opened a new era of functional genomics. As computational biologists working in genome bioinformatics, we face a new challenge: how can we develop computational tools that allow us, or bench scientists, to make efficient use of the new information and turn it into new knowledge? In an effort to build a comprehensive Yeast Promoter Database (SCPD; Zhu and Zhang, 1998) and to develop new computational methods for mapping upstream regulatory elements, we recently began an ongoing collaboration with experimental biologists on the analysis of various large-scale expression data. For this special genome bioinformatics issue, we summarize our initial exploration of such genome expression data (reported in April 1998 at a Kyoto theoretical biology conference) and illustrate what information may be readily extracted from such experiments. We use promoter analysis of yeast cell-cycle regulated gene expression as an example. Because this is collaborative work with many bench scientists and more experiments are still running, a full account of the experimental work and the complete final data analysis will be published elsewhere later this year (see ACKNOWLEDGEMENT and Spellman et al.).

Transcriptional controls play a key part in the determination of cell fate during development. It has been estimated that up to 250 transcripts may be regulated by the cell cycle in the budding yeast Saccharomyces cerevisiae (Price et al. 1991). It has long been known that protein synthesis in late G1 is needed for S phase entry. There are nine known cyclins that associate with the Cdc28 protein kinase (the master regulator of the cell cycle; see Figure 1) and regulate its functions during different phases of the cell cycle (Nasmyth, 1993). The G1 cyclins encoded by CLN1-3 are needed for START of the cell cycle, the B-type cyclins encoded by CLB5-6 are important for entry into S phase itself, and the G2 cyclins encoded by CLB1-4 regulate entry into mitosis. At least four different classes of cell-cycle regulated genes exist in yeast (Nasmyth, 1994): G1 cyclins and DNA synthesis genes are expressed in late G1; histone genes in S phase; genes for transcription factors, cell cycle regulators and replication initiation proteins in G2; and genes needed for cell separation as cells enter G1. Early and late G1-specific transcription is mediated by the Swi5/Ace2 and Swi4/Swi6 classes of factors, respectively. Changes in cyclin/Cdc28 kinase activity are thought to be involved in all classes (Figure 1).

















[Figure 1 diagram: Cln3/Cdc28, Cln/Cdc28 and Clb1-4/Cdc28 kinases; transcription factors SBF, MBF, Mcm1/SFF and Swi5/Ace2; target genes across Early G1, Late G1, S phase, G2, Mitosis and M/G1]

Figure 1. A model illustrating regulatory interactions determining cell-cycle regulated transcription in yeast (Koch & Nasmyth, 1994; McInerny et al., 1997). Cln3-associated kinase activates the late G1-specific transcription factors [SBF (SCB binding factor) and MBF (MCB binding factor)] in a cell-size-dependent fashion. SBF and MBF mediate the expression of CLN1,2 and CLB5,6 as well as S phase proteins, leading to budding and S phase entry. By an unknown mechanism, CLN1,2 activity allows accumulation of Clbs. Clb1 and Clb2 activate transcription of G2-specific genes and thereby autoactivate their own synthesis, possibly via the transcription factors Mcm1 and SFF. At the same time, Clb1,2/Cdc28 represses SBF-mediated transcription. While Clb1,2/Cdc28 activates expression of SWI5 and possibly ACE2 RNAs via Mcm1/SFF, it keeps the gene products in an inactive state by phosphorylating their nuclear localization signals. Clb proteolysis at the end of mitosis dramatically changes the situation: Clb-mediated activation of G2-specific genes stops, and Swi5 loses its inhibitory phosphorylations, leading to its uptake into the nucleus, where it can activate the early G1-specific transcripts. In late M phase, an Mcm1-related factor binds to the ECB (early cell cycle box) and activates M/G1-specific transcription of CLN3, SWI4 and some DNA replication genes; these gene products play critical roles in promoting the initiation of S phase.

Using cDNA microarray technology (the method is described, for example, in DeRisi et al., 1997), mRNA levels were measured for 95% of all yeast genes during time courses following synchronization by both α-factor arrest and centrifugal elutriation (see METHODS). Figure 2 shows a photo of a cDNA arrayer, and Figure 3 shows a typical fluorescent image of the entire yeast genome expression microarray. In the cell cycle experiments, transcripts from synchronized cells (tagged red) and from asynchronous cells (tagged green) were mixed in equal amounts and hybridized to the cDNA array. The ratio of red to green intensity (or, more precisely, its logarithm) was taken as the relative level of mRNA transcripts after calibration with various controls. The time-course images of the spots for each gene (ORF) were digitized as a row of red-green expression patterns (see examples in Figure 4). The whole set of gene expression patterns, treated as time series, may then be clustered according to their similarities. Figure 4 contains two small portions of the cell-cycle expression clustering images. The starting point of the bioinformatic investigation is to collect the genes from each such cluster.
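The measurement and clustering steps described above can be sketched as follows. This is a minimal illustration, not the collaborators' pipeline: it assumes the red and green intensities are already background-corrected and calibrated (so both are positive), uses a base-2 logarithm, and scores the similarity of two time profiles with the Pearson correlation, which is one common choice; the text does not specify which similarity measure was used. All intensities are hypothetical.

```python
import math
from typing import Sequence


def log_ratios(red: Sequence[float], green: Sequence[float]) -> list[float]:
    """Per-time-point log2(red/green) for one gene (ORF).

    Assumes calibrated, strictly positive intensities in both channels.
    """
    if len(red) != len(green):
        raise ValueError("red and green series must have the same length")
    return [math.log2(r / g) for r, g in zip(red, green)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles (time series).

    Undefined (division by zero) for a perfectly flat profile.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical intensities for three genes over four time points.
gene_a = log_ratios([200, 400, 800, 400], [200, 200, 200, 200])  # [0, 1, 2, 1]
gene_b = log_ratios([100, 200, 400, 200], [100, 100, 100, 100])  # same shape as gene_a
gene_c = log_ratios([800, 400, 200, 400], [200, 200, 200, 200])  # mirror image of gene_a

print(round(pearson(gene_a, gene_b), 3))  # 1.0
print(round(pearson(gene_a, gene_c), 3))  # -1.0
```

Genes whose profiles correlate strongly would fall into the same cluster; any clustering algorithm built on such a similarity score is a separate design choice.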

DATA AND METHODS

Experiments


The full details of the experimental protocols will be published in the complete analysis (Spellman et al.). For this bioinformatic exploration, it suffices to mention the two synchronization methods: (1) α-factor release: MATa cells arrested in G1 just before START by α-factor resume cycling once the α-factor is removed. Transcript samples were taken at the successive time points indicated in Figure 4. (2) Elutriation: small, unbudded G1 cells were selected by differential centrifugation.


DNA sequence data sets


M/G1 (24): ASH1, BUD9, CTS1, EGT2, FAA3, PCL9, PIR1, PIR3, RME1, SIC1, SUN4, YBR158W, YDR055W, YER124C, YGR086C, YHR143W, YIL104C, YKL182W, YNL046W, YNL078W, YNR067C, YOR263C, YOR264W, YPL158C

G1/S (48): AXL2, CDC45, CDC9, CLB5, CLB6, CLN1, CLN2, CSI2, CTF18, DPB2, GIN4, HCM1, MNN1, MSH2, MSH6, POL12, POL30, PRI2, RAD27, RAD51, RAD53, RFA1, RFA2, RHC21, RNR1, RNR3, RSR1, SMC3, SPT21, SVS1, SWI4, TMP1, YCL060C, YCL061C, YDL163W, YDR545W, YGR151C, YHR149C, YLR183C, YLR463C, YLR465C, YLR467W, YNL300W, YNL339C, YOX1, YPL267W, YPR202W, YPR203W

Histones (8): HHF1, HHF2, HHT1, HHT2, HTA1, HTA2, HTB1, HTB2

G2/M (25): ACE2, ALK1, BUD3, BUD4, CDC20, CDC47, CDC5, CLB1, CLB2, CYK2, DBF2, HST3, KIN3, MYO1, PHO3, SWI5, YGR138C, YIL158W, YLR190W, YML034W, YML119W, YNL057W, YNL058C, YPL141C, YPR156C

The Control set is made up of 275 non-cell-cycle-regulated gene promoters.

The 500 bp region upstream of the ATG in each sequence was used for the initial sequence alignments, and the corresponding 700 bp upstream region was used for later motif searches (see below).
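Extracting such fixed-length upstream windows is straightforward once the ATG position and strand of each ORF are known. The sketch below assumes a hypothetical 0-based coordinate convention (stated in the docstring) and truncates at chromosome ends; the source does not describe how its sequences were retrieved, so treat `upstream_region` as an illustration only.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom: str, atg: int, strand: str, length: int) -> str:
    """Up to `length` bp immediately 5' of an ATG, read on the gene's own strand.

    Hypothetical convention (0-based, on the '+' strand of `chrom`):
      strand '+': `atg` is the index of the A in ATG.
      strand '-': `atg` is the index of the T in CAT, i.e. the base paired
                  with the A of the gene's ATG.
    The window is truncated at the chromosome ends.
    """
    if strand == "+":
        return chrom[max(0, atg - length):atg]
    if strand == "-":
        return reverse_complement(chrom[atg + 1:atg + 1 + length])
    raise ValueError("strand must be '+' or '-'")


chrom = "AAACCCATGTTT"
print(upstream_region(chrom, 6, "+", 3))  # CCC
print(upstream_region(chrom, 7, "-", 3))  # AAC
```

With real data one would call this with `length=500` for the alignments and `length=700` for the motif searches.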


k-tuple relative information

Let $P_i(w)$ be the frequency of a $k$-tuple $w$ (i.e., a $k$-mer, or word of length $k$) in data set $i$; then the $k$-tuple information (or entropy) of data set $i$ relative to data set $j$ is defined by

$$P_{ij}(w) = \log\frac{P_i(w)}{P_j(w)}.$$

In the current work we use 5-tuple (pentamer) relative information, with $i = 0, 1, 2, 3$ corresponding to the Control (non-cell-cycle regulated), M/G1, G1/S and G2/M clusters, respectively. $P_0$ has been symmetrized such that $P_0(w) = P_0(w')$, where $w'$ is the reverse complement of $w$. The advantage of using a symmetrized control is that one can see whether $P_{i0}$ has reverse-complement symmetry.
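The definition above can be sketched directly. In this minimal version the logarithm is natural, a pseudocount of 1 per word keeps $P_{ij}(w)$ finite for unseen words, and the Control is symmetrized by averaging each word with its reverse complement; all three are assumptions, since the text specifies neither the log base, the handling of zero counts, nor how the symmetrization was done. Only forward strands are counted, and the sequences are toy examples.

```python
import math
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(word: str) -> str:
    return word.translate(COMPLEMENT)[::-1]


def kmer_freqs(seqs, k=5, pseudocount=1.0):
    """Relative frequency P_i(w) of every k-mer over the forward strands of `seqs`."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if set(w) <= set("ACGT"):  # skip windows containing N etc.
                counts[w] += 1
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] + pseudocount for w in words)
    return {w: (counts[w] + pseudocount) / total for w in words}


def symmetrize(freqs):
    """Average each word with its reverse complement so that P0(w) == P0(w')."""
    return {w: 0.5 * (f + freqs[revcomp(w)]) for w, f in freqs.items()}


def relative_information(p_i, p_j):
    """P_ij(w) = log(P_i(w) / P_j(w)) for every word w."""
    return {w: math.log(p_i[w] / p_j[w]) for w in p_i}


# Toy data: a "cluster" rich in MCB-like ACGCGT repeats versus a bland control.
control = symmetrize(kmer_freqs(["AAAAATTTTTGGGGGCCCCC"]))
cluster = kmer_freqs(["ACGCGTACGCGTACGCGT"])
pri = relative_information(cluster, control)
top = sorted(pri, key=pri.get, reverse=True)[:3]
print([(w, round(pri[w], 2)) for w in top])
```

Because averaging with the reverse complement is a bijection over all words, the symmetrized control still sums to one.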

Motif extraction


Two major motif sequence alignment programs, CONSENSUS (Hertz et al. 1990) and GIBBS SAMPLER (Lawrence et al. 1993; Neuwald et al. 1995), were used originally. Because CONSENSUS produced only very similar results, GIBBS was not specifically designed for DNA, and, more importantly, the results from both programs were often overwhelmed by poly(A/T) stretches, which are known to be ubiquitous promoter elements (see later in the text), the final analysis was done exclusively with GibbsDNA (a modified version of the GIBBS Motif SAMPLER). GibbsDNA can take into account DNA structure (such as double strands and palindromes) and constraints (such as including/excluding subsequences, distances and discrimination between different classes). GibbsDNA is still under development and will be published with full testing statistics in the future. Different alignment results were manually combined in order to maximize relative information (see the aforementioned references for details). Once an alignment is obtained (either from the literature or by software), a standard consensus or weight matrix may be built and used to search for additional, potentially similar motifs (Stormo, 1990). When using a weight matrix search, we set the cutoff at the highest level at which all published motifs are still retained.
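The weight-matrix step and its cutoff rule can be sketched as follows: build a log-odds matrix from aligned sites, then set the cutoff to the lowest score among the known sites, which is the highest cutoff that still retains all of them. The uniform background, the pseudocount of 0.5 and the forward-strand-only scan are assumptions for illustration, and the "known" sites below are hypothetical MCB-like words, not published elements.

```python
import math
from typing import Sequence

BASES = "ACGT"


def build_pwm(sites: Sequence[str], background=None, pseudocount=0.5):
    """Log-odds weight matrix (one dict per column) from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("aligned sites must all have the same length")
    denom = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [s[pos].upper() for s in sites]
        pwm.append({
            b: math.log((column.count(b) + pseudocount) / denom / background[b])
            for b in BASES
        })
    return pwm


def score(pwm, word: str) -> float:
    return sum(col[base] for col, base in zip(pwm, word.upper()))


def cutoff_keeping_all(pwm, known_sites: Sequence[str]) -> float:
    """Highest cutoff that still retains every known site."""
    return min(score(pwm, s) for s in known_sites)


def scan(pwm, seq: str, cutoff: float):
    """Yield (position, word, score) for forward-strand windows scoring >= cutoff."""
    w = len(pwm)
    seq = seq.upper()
    for i in range(len(seq) - w + 1):
        word = seq[i:i + w]
        if set(word) <= set(BASES):
            s = score(pwm, word)
            if s >= cutoff:
                yield i, word, s


sites = ["ACGCGT", "ACGCGA", "TCGCGT", "ACGCGT"]  # hypothetical aligned sites
pwm = build_pwm(sites)
cutoff = cutoff_keeping_all(pwm, sites)
hits = list(scan(pwm, "GGGACGCGTTTT", cutoff))
print(hits)
```

To search the opposite strand, the same scan can be run on the reverse complement of each promoter.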


RESULTS

A global survey of upstream sequences by pentamer relative information

Given clusters of a large number of upstream sequences, a quick and effective global comparison using the $k$-tuple frequency method is often very informative. One may refer to this kind of method as in silico STS fingerprinting. Because 5 bp is a half-turn of the DNA helix and is often comparable to the core size of many promoter elements, we have compiled two types of pentamer relative information (PRI, $P_{ij}$; see METHODS) in Table 1.

Table 1a shows, for each phase cluster, the pentamers with information $P_{i0} > 0.5$ relative to the Control. The pentamers are color-coded by the cluster with the largest information value, and boldface indicates a pentamer that belongs to only one cluster under the current cutoff (0.5). This shows that: (1) G1/S promoters contain the most biased pentamers relative to the Control; they have 2 elements with PRI > 2.0, which are most likely related to the classical MCB (MluI cell cycle box) motif ACGCGT. Other high-scoring pentamers (such as CGAAA:0.733, CACGA:0.636 and TTTCG:0.515), which may be related to the SCB (Swi4-Swi6 cell cycle box) motif CACGAAA (MCB and SCB are reviewed, for example, in Andrews & Mason, 1993), are also clearly visible. (2) M/G1 promoters contain the second most biased pentamers, among which the SWI5 motif GCTGG/CCAGC (1.262/1.071) seems to play a predominant role (Kovacech et al., 1996; Dohrmann et al., 1996; McBride et al., 1997). GGCCG may be related to HAP1/CYP1 (Nait-Kaoudjt et al., 1997), and some C-strings may be related to CG-box binding zinc-finger transcription factors (such as MIG1; see Bohm et al., 1997, for example). (3) G2/M promoters have the least biased pentamers (none has PRI > 0.5). Database searches of TRANSFAC (Heinemeyer et al., 1998) and SCPD (Zhu & Zhang, 1998) indicated that many of these pentamers may be related to ubiquitous transcription factors: ABF1, REB1, RAP1 and MCM1. In vitro DNA binding studies with both cell extracts and recombinant MCM1 proteins suggested that the primary sequence recognition determinant for MCM1 is the half-site sequence TCCTAAT (see below, and Bender & Sprague 1987; Passmore et al. 1989), which is related to TAGGT:0.716, TTAGG:0.562, TCCTA:0.55 and CTAAT:0.532. On the other hand, the SFF (see Figure 1) motif GT(C/A)AACAA (Althoefer et al. 1995) is also related to GTAAA:0.593 and TAAAC:0.571.
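The color and boldface conventions of Table 1a amount to a simple rule that can be sketched as code. This is an illustration with made-up PRI values (not the values in Table 1); `summarize_table_1a` is a hypothetical helper name. A pentamer enters the table if it exceeds the cutoff in any cluster, it is colored by the cluster with its largest value, and it is bold if it passes the cutoff in exactly one cluster.

```python
def summarize_table_1a(pri_by_cluster, cutoff=0.5):
    """Map each listed pentamer to (color cluster, boldface flag).

    If a pentamer passes the cutoff anywhere, its overall maximum also
    passes, so taking the maximum over passing clusters gives the same
    color as taking it over all clusters.
    """
    rows = {}
    words = set().union(*(p.keys() for p in pri_by_cluster.values()))
    for w in sorted(words):
        passing = [c for c, p in pri_by_cluster.items()
                   if p.get(w, float("-inf")) > cutoff]
        if not passing:
            continue
        best = max(passing, key=lambda c: pri_by_cluster[c][w])
        rows[w] = (best, len(passing) == 1)
    return rows


# Hypothetical PRI values relative to the Control.
pri = {
    "M/G1": {"GCTGG": 1.2, "CGAAA": 0.1},
    "G1/S": {"CGAAA": 0.7, "ACGCG": 2.1, "GCTGG": 0.6},
    "G2/M": {"ACGCG": 0.2},
}
rows = summarize_table_1a(pri)
print(rows)
```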

It is also very interesting to see the relative information changes between every pair of consecutive phase clusters. This is shown in Table 1b. It becomes obvious that: (1) the G1/S-specific transcription burst must be very strong and the transition very sharp, because the PRI of the MCB- and SCB-like signals is high not only relative to the Control but also relative to the earlier or later phases. (2) Many pentamers with high PRI relative to the earlier or later phases do not score high in PRI relative to the Control. Most remarkably, none of the high-scoring (> 1.0) G2/M pentamers relative to G1/S has a PRI > 0.5 relative to the Control. The fact that CCGGG is at the top of both the $P_{12}$ and $P_{32}$ lists implies that it is very rare in G1/S promoters. (3) The strong Swi5 effect can also be readily seen from $P_{12}$ (i.e., M/G1 vs. G1/S), where most of the yellow pentamers are related to the Swi5/Ace2 consensus (A/G)CCAGC (see below), indicating a potentially sharp drop in Swi5/Ace2-activated genes in G1. It is known that most Swi5 protein is rapidly degraded upon entry into the nucleus at M/G1 (Tebb et al. 1991), and the stability of Swi5 in transcription complexes at different M/G1 promoters might determine the duration of gene expression. EGT2 expression, for example, drops soon after cells enter G1.

Swi5/Ace2 motif is abundant in M/G1 promoters


The M/G1 transition is one of the major switches in the yeast cell cycle; it is linked to the destruction of Clbs as cells exit from mitosis. A number of genes involved in cytokinesis and cell separation are expressed during this period. Among the 24 genes in the M/G1 cluster, CTS1 encodes chitinase, and the known "early G1-specific" EGT2 may also have a role in cell separation. The RNA levels of SIC1, which encodes an inhibitor of the Cdc28 kinase, are also known to be maximal in early G1. High levels of Sic1 may be important for preventing premature entry into S phase.



It is known that early G1-specific transcription of a number of genes is mediated by a pair of related transcription factors, Swi5 (Nasmyth et al. 1987) and Ace2 (Dohrmann et al. 1992). Some genes, like HO, depend on Swi5, while others (including CTS1) require Ace2. Still others can be activated by either. EGT2 expression, for example, is mostly due to Swi5 but can also be mediated by Ace2 (Kovacech et al. 1996). Consistent with such overlapping functions, Swi5 and Ace2 are 83% identical in their zinc-finger DNA-binding domains (Dohrmann et al. 1992). Differences in the target specificity of Swi5 and Ace2 may be due partly to combinatorial interactions with other factors, such as NCE3 in CTS1 (Dohrmann et al. 1996) or PHO2 in HO (McBride et al. 1997).

In fact, Swi5-dependent transcription is the only case in which we know how the Cdc28 kinase determines cell-cycle regulated gene expression, but the Swi5 binding site (known in only one or two genes) has not been characterized experimentally. Phosphorylation of Swi5's nuclear localization signal by the Cdc28 kinase during G2/M (when Swi5 is synthesized) prevents its entry into the nucleus (Figure 1 and Moll et al. 1991). Ace2 is also synthesized only during G2/M and is transported into nuclei as cells enter G1, suggesting that the mechanisms governing Swi5- and Ace2-dependent transcription may be similar. In the M/G1 promoter analysis, we do not distinguish between their binding sites and simply call the consensus (A/G)CCAGC the Swi5 motif, which really stands for the Swi5/Ace2 motif. The Swi5 motif can easily be found by multiple sequence alignment of the 500 bp upstream (of the ATG) M/G1 promoter DNA sequences. As shown in Table 2, 18 out of 24 sequences have this element, and many have multiple copies. In Table 2, genes in the M/G1 cluster (see METHODS) are shaded in yellow and genes with published elements are in bold. The CTS1 (underlined) elements were shown to be ACE2 binding sites. Elements found by GibbsDNA are indicated by "*"; the rest were found either by consensus or matrix search or from publications. "+/-" refers to the forward/backward strand, and the coordinates are relative to the ATG start site. The conserved core is shaded red and the less conserved region gray. The result is also consistent with the in silico pentamer STS fingerprint analysis mentioned above.
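To make the alignment step concrete, here is a minimal one-site-per-sequence Gibbs site sampler in the spirit of Lawrence et al. (1993). It is not GibbsDNA, which is unpublished and additionally handles both strands, palindromes and constraints; this sketch scans the forward strand only, assumes sequences contain only A/C/G/T, and uses assumed pseudocounts. Each iteration holds one sequence out, builds a profile from the others, and resamples that sequence's site in proportion to the profile-to-background likelihood ratio. Running from several random seeds and keeping the best-scoring alignment mirrors the practice of repeating runs to gauge stochastic fluctuation.

```python
import math
import random

BASES = "ACGT"


def profile(sites, width, pseudocount=0.5):
    """Per-column base probabilities from aligned sites (with pseudocounts)."""
    counts = [{b: pseudocount for b in BASES} for _ in range(width)]
    for site in sites:
        for j, base in enumerate(site):
            counts[j][base] += 1
    total = len(sites) + 4 * pseudocount
    return [{b: c[b] / total for b in BASES} for c in counts]


def background(seqs):
    """Base composition of the input, with a pseudocount of 1 per base."""
    joined = "".join(seqs)
    return {b: (joined.count(b) + 1) / (len(joined) + 4) for b in BASES}


def alignment_score(seqs, starts, width):
    """Sum of log-odds scores of the current sites against the background."""
    sites = [s[st:st + width] for s, st in zip(seqs, starts)]
    prof, bg = profile(sites, width), background(seqs)
    return sum(math.log(prof[j][b] / bg[b]) for site in sites for j, b in enumerate(site))


def gibbs_sites(seqs, width, iterations=2000, seed=0):
    """Return one sampled site start per sequence (forward strand only)."""
    rng = random.Random(seed)
    bg = background(seqs)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        others = [s[st:st + width] for k, (s, st) in enumerate(zip(seqs, starts)) if k != i]
        prof = profile(others, width)
        s = seqs[i]
        weights = [
            math.prod(prof[j][b] / bg[b] for j, b in enumerate(s[st:st + width]))
            for st in range(len(s) - width + 1)
        ]
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def best_of_restarts(seqs, width, restarts=10, iterations=2000):
    """Run the sampler from several seeds and keep the highest-scoring alignment."""
    runs = [gibbs_sites(seqs, width, iterations, seed) for seed in range(restarts)]
    return max(runs, key=lambda st: alignment_score(seqs, st, width))


# Toy input: a CGCGCG site planted at different offsets in poly(A) backgrounds.
seqs = ["A" * k + "CGCGCG" + "A" * (12 - k) for k in (6, 7, 8, 9, 10)]
print(best_of_restarts(seqs, 6, restarts=5, iterations=500))
```

Note that on real promoters the poly(dA:dT) problem described in the text would show up here too: A-rich windows can dominate the sampled sites unless they are masked or down-weighted.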

It is more instructive to compare the Swi5 motif distribution in different clusters (Figure 5a). Here the consensus (RCCAGC and its reverse complement) was used for the motif search. The upstream sequence region is divided into bins of 50 bp ("-450" means from "-500" to "-449", etc.), and the motif count per sequence in each bin is shown as a bar plot. It can be clearly seen that the Swi5 motif is highly enriched in M/G1 and is centered on the -300 to -250 region. In the other cell-cycle clusters it is strongly suppressed, even compared with the Control of non-cell-cycle regulated promoters. As another comparison, the distribution of A and T homohexamers is also plotted (open bars). It is well known that homopolymeric dA:dT sequences are extremely abundant in most yeast promoters. They affect nucleosome formation in vitro and are required for wild-type levels of transcription in vivo. This ubiquitous promoter element stimulates transcription via its intrinsic DNA structure (Iyer and Struhl, 1995). However, such stretches can create many problems in silico during an alignment. In Figure 5a, in addition to the normal homohexamer peak around -150 to -100 (the "TATA-box" related region), there also seems to be a second peak that correlates with the Swi5 peak in the M/G1 cluster. Close examination confirmed (data not shown) that some of the Swi5 sites were associated with an upstream A-string within one helical turn of DNA (or, equivalently, a downstream T-string of the reverse-complement core motif GCTGG).

MCB is the most abundant motif in G1/S promoters but overlaps substantially with SCB


The G1/S transition is particularly important in budding yeast for coordinating cellular growth with cell division. When cells reach a critical size, they enter S phase, duplicate their spindle pole bodies, form buds and, if haploid, become refractory to pheromone-induced cell-cycle arrest. All these events, which are initiated simultaneously at a point in late G1 called START, require activation of the Cdc28 protein kinase by one of the G1 cyclins encoded by CLN1,2,3 (Reed 1992; Nasmyth, 1993). The transcripts for the G1 cyclins CLN1,2,3 and the B-type cyclins CLB5,6 are absent in small, early G1 cells but appear abruptly around START. In fact, CLN1,2 and CLB5,6 belong to a large family of genes that are transcribed exclusively in G1/S phase (see Figure 1). Yeast biologists have subdivided them into two groups according to the cis-acting sequences found within their promoters. The first group has a sequence motif called the SCB element (Swi4/6 cell cycle box, CACGAAA), which acts as a late G1-specific UAS element (Nasmyth 1985; Breeden & Nasmyth 1987; Andrews & Herskowitz 1989; Andrews & Moore 1992) and is found in the promoters of CLN1,2 (Nasmyth & Dirick, 1991), the HO endonuclease gene (Nasmyth, 1985), and HCS26 (which encodes a cyclin-like protein; Ogas et al. 1991). The second group has many more members, including many genes involved in DNA synthesis and the B-type cyclin-encoding genes CLB5,6 (Schwob & Nasmyth 1993; Epstein & Cross 1992). Their promoters contain sequences similar to the MluI cell cycle box (MCB element, ACGCGT; McIntosh 1993). MCB elements, like SCB elements, can confer late G1-specific gene expression on otherwise inactive promoters (McIntosh 1993; Lowndes et al. 1991; McIntosh et al. 1991).


Indeed, 34 out of 48 G1/S promoters have putative MCB elements, which can also easily be found by GibbsDNA (Table 4). Again, many promoters contain multiple repeats of this element, which is why it was originally identified. Because of the palindromic symmetry of the core motif, all the elements are listed in the same orientation as the downstream genes. The consensus also confirms the PRI (pentamer relative information) analysis mentioned above. In addition to the known elements (indicated by bold letters), many could be novel and may be responsible for the activated G1/S transcription of the downstream genes. Compared with MCB elements, SCB elements are more difficult to identify, because they are far fewer in number and because they are highly related to MCB. We were unable to identify the SCB alignment with ordinary alignment programs (such as CONSENSUS and GIBBS SAMPLER), even when multiple motifs were requested. It was detected by GibbsDNA after MCB and poly(dA:dT) of length 4 were masked. Apart from some elements that are ambiguous between MCB and SCB, the alignment results are equivalent to simple consensus string searches, which are much more efficient. The MCB and SCB distributions in the different promoter clusters further confirm their role in G1/S (Figure 5b). More importantly, excess repeats of MCBs are localized just upstream of the "TATA-box" region (-200, -100).
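The masking step that let the SCB emerge can be illustrated with a plain consensus search, which the text notes is essentially equivalent to the alignment result. This sketch is not the GibbsDNA masking code: it replaces MCB cores (ACGCGT) and runs of four or more A or T with N, then looks for CACGAAA on both strands (its reverse complement is TTTCGTG). One side effect worth knowing: masking A runs of length 4 can also clip an SCB that happens to be followed by an extra A; the sketch does not special-case that.

```python
import re


def mask(seq: str, mask_char: str = "N") -> str:
    """Replace MCB cores (ACGCGT) and A/T runs of length >= 4 with `mask_char`."""
    seq = seq.upper()
    masked = list(seq)
    for pattern in ("ACGCGT", "A{4,}", "T{4,}"):
        for m in re.finditer(pattern, seq):
            masked[m.start():m.end()] = list(mask_char * (m.end() - m.start()))
    return "".join(masked)


def find_scb(seq: str) -> list[tuple[int, str]]:
    """(0-based start, strand) of SCB consensus matches after masking."""
    masked = mask(seq)
    hits = []
    for strand, pat in (("+", "CACGAAA"), ("-", "TTTCGTG")):
        for m in re.finditer(f"(?={pat})", masked):
            hits.append((m.start(), strand))
    return sorted(hits)


seq = "GGACGCGTGGCACGAAAGGTTTTTG"  # toy promoter: MCB, SCB and a T5 run
print(mask(seq))      # GGNNNNNNGGCACGAAAGGNNNNNG
print(find_scb(seq))  # [(10, '+')]
```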


Like Swi5 and Ace2, the subunits Mbp1 in MBF and Swi4 in SBF also share similarities, both at their amino termini (Koch et al. 1993), which form the DNA-binding domain related to HNF3/fork head (a member of the family of "winged" helix-turn-helix proteins; Taylor et al., 1997; Xu et al., 1997), and at their carboxy-terminal regions, which are necessary for binding to the common factor Swi6 (Sidorova and Breeden, 1993). In contrast to DSC1 in S. pombe, the only homologue, which is structurally similar to SBF but binds to MCB-like elements, neither SBF nor MBF is essential in budding yeast. However, swi4 mbp1 double mutants arrest in G1 and fail to express CLN1 and CLN2 (Koch et al. 1993). MCB and SCB could be the same genetic element bound by many related transcription factors, as it has been reported that yet other factors can activate MCB-mediated gene expression in the absence of MBF or SBF (Koch et al. 1993). It is conceivable that further subclassification of such motifs will be possible with finer subclustering of gene expression.

More histone UASs are possible


The eight histone genes seem to form yet another late G1 cluster, which has a distinct expression pattern (see the lower panel in Figure 4 and Table 3). Genetically, it is known that a functional CDC4 gene product is required to turn on histone transcription (White et al. 1987) and that the CDC7 gene product is required to turn it off (Hereford et al. 1982). Their unique cell-cycle regulation indicates that their promoter structure may differ from that of other G1-specific genes. Detailed genetic analysis has revealed that the histone genes contain consensus TATA-box motifs and that the distal promoter sequences may contain both positive (UAS) and negative (NEG) elements that selectively regulate transcription (Osley 1991). Two or three copies of a conserved 16 bp sequence (consensus GCGAAAAANTNNGAAC) are found within four histone loci. Deletion and promoter substitution analyses performed in vivo with histone-lacZ reporter genes derived from either the HTA1-HTB1 (encoding H2A-1 and H2B-1) or the HHT2-HHF2 (encoding H3-2 and H4-2) locus have identified this sequence as an upstream activation sequence (UAS) element (Osley et al. 1986). This element also has an S phase-specific function, because three copies of the repeat can activate transcription of the normally constitutive CYC1 gene at the G1/S phase boundary (Osley et al., 1986). The negative site (NEG) has been localized to a 67 bp region in the HTA1-HTB1 promoter that is characterized by several sequence motifs, including direct and inverted repeats, and that contains a 15 bp sequence (consensus TNNACGCTNAANGNC) also found in the HHT1-HHF1 and HHT2-HHF2 promoters, but not in the HTA2-HTB2 promoter (Breeden 1988).


Because each pair of divergently transcribed histone genes shares a common promoter, Table 3 shows the intergenic region between the two ATG start sites of each pair (this region was also used for alignment). Mapped TATA-boxes are shown in red. Mapped negative (NEG) elements and UAS1/UAS2 elements are shown in blue and dark green, respectively (Osley, 1991). Additional putative UAS elements were found by GibbsDNA. Potential SCBs are underlined. A novel repeat element, AACAA(not T)A, is indicated by a box. Although the histone UAS is clearly different from the SCB, the two are still somewhat related through the CGAAA sequence. It would be interesting to find out whether their binding factors are also related and whether the additional UASs found by the computational method are real.

Mcm1 motif becomes G2/M-specific only when associated with SFF

Finally, G2/M is another important transition during the cell cycle. Several genes are known to be transcribed when cells enter G2. These include the mitotic cyclins CLB1,2 and the transcription factors SWI5/ACE2 mentioned above (Figure 1). Are G2-specific genes also regulated by a common set of transcription factors? CLB1,2 and SWI5 have been compared with regard to their dependence on different cell cycle events. These three genes have identical expression patterns, do not accumulate in cdc34 mutants, and require CLBs 1-4 and CDC28 (Amon et al. 1993) for their expression, suggesting that they are similarly regulated and activated by Clbs 1-4. SWI5 transcription is known to be regulated by a UAS sequence that forms a ternary complex with the transcription factor SFF (Lydall et al. 1991) and Mcm1 (Figure 1 and Treisman and Ammerer 1992). Several potential Mcm1-binding sites are also present in the 5' flanking regions of CLB2 (Kuo and Grayhack, 1994). MCM1 is required not only for SWI5 transcription but also for expression of CLB1,2. It is therefore possible that SWI5, CLB1,2 and many other G2-specific transcripts are coordinately regulated by SFF and Mcm1.


Table 5 shows the alignment result for the Mcm1 motif in the G2/M promoters. A larger flanking region is retained so that other potential factor-binding sites can be seen. The in vitro selected MCM1 binding site is characterized by the consensus DCCYWWNNRG (Wynne and Treisman 1992). Because Mcm1 sites are also found in other promoters, we also performed an SFF motif alignment; the potential SFF sites, which have the consensus GTMAACAW, are indicated in green. After examining the distributions of the two motifs in the different cluster promoters (Figure 5c), it becomes clear that the Mcm1 sites localized in (-250, -100) are more G2/M-specific and that most are also correlated with the peak of the SFF site distribution in this phase. We also found that if one searches all clusters with the regular expression "CC.{6}GG.{5,10}GTMAACAW", hits are found only in the G2/M cluster (data not shown).
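The composite Mcm1-SFF expression quoted above uses IUPAC codes (M = A/C, W = A/T), which must be expanded before a regex engine can use it. The sketch below does that in Python's `re`. The text does not say whether both strands were searched, so the reverse-complement pattern is an assumption that can be switched off; `has_mcm1_sff` and `clusters_with_hits` are hypothetical helper names and the sequences are toy examples.

```python
import re

# The composite pattern from the text, with IUPAC codes expanded: M = [AC], W = [AT].
MCM1_SFF = re.compile(r"CC.{6}GG.{5,10}GT[AC]AACA[AT]")
# Its reverse complement (WTGTTKAC ... CC.{6}GG), for hits on the other strand.
MCM1_SFF_RC = re.compile(r"[AT]TGTT[GT]AC.{5,10}CC.{6}GG")


def has_mcm1_sff(seq: str, both_strands: bool = True) -> bool:
    """True if the promoter contains the composite Mcm1-SFF element."""
    seq = seq.upper()
    if MCM1_SFF.search(seq):
        return True
    return both_strands and MCM1_SFF_RC.search(seq) is not None


def clusters_with_hits(clusters: dict[str, list[str]]) -> dict[str, int]:
    """Number of promoters per cluster containing the composite element."""
    return {name: sum(has_mcm1_sff(s) for s in seqs) for name, seqs in clusters.items()}


fwd = "AAA" + "CCTTTAATGG" + "AAAAA" + "GTAAACAA" + "TTT"  # Mcm1-like site, 5 bp spacer, SFF-like site
print(clusters_with_hits({"G2/M": [fwd], "G1/S": ["ACGT" * 10]}))  # {'G2/M': 1, 'G1/S': 0}
```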


While Mcm1 also interacts with cell-type-specific regulators (indicated by purple, pink and dark green in Table 5; see also Treisman & Ammerer 1992), SFF may be specifically involved in G2-dependent expression. The gene encoding SFF has not been identified. SFF binding activity is present throughout the cell cycle, and therefore G2-specific gene expression may be regulated by post-translational changes in Mcm1/SFF activity. Transcription could be activated by phosphorylation of SFF by Clbs and repressed by their destruction upon exit from mitosis (Koch and Nasmyth, 1994).


Recently, another Mcm1-related motif, called the ECB (early cell cycle box, consensus TTWCCCNNNNAGGAAA), was reported (McInerny et al., 1997) to be important for M/G1-specific transcription of SWI4, CLN3, CDC6, CDC46 and CDC47. However, SWI4 was in our G1/S cluster, CDC47 was in our G2/M cluster, and CLN3, CDC6 and CDC46 were not in any of our conservatively picked clusters. Either these genes have more complex expression patterns, or our initial crude clustering method was not sensitive enough. In Table 5 we did indicate some of the potential ECBs by requiring a more stringent flanking palindrome (TTTCCNNNNNNGGAAA, in red). In our limited set of M/G1 genes (indicated in yellow), we did not find any ECBs. With better clustering (which we are currently working on; Spellman et al.), we should be able to address this question better.¹
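The point of the "stringent flanking palindrome" can be checked mechanically: a consensus that equals its own reverse complement (reading IUPAC codes, with N complementing to N) matches the same sites on both strands, so a single-strand search suffices. The sketch below, with a hypothetical helper `is_palindromic`, shows that the MCB core and the stringent ECB pattern are palindromic in this sense, while the published ECB consensus is not exactly so.

```python
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(consensus: str) -> str:
    """Reverse complement of a consensus written in IUPAC codes."""
    return consensus.upper().translate(IUPAC_COMPLEMENT)[::-1]


def is_palindromic(consensus: str) -> bool:
    """True if the consensus reads the same on both strands."""
    return reverse_complement(consensus) == consensus.upper()


for motif in ("ACGCGT", "TTTCCNNNNNNGGAAA", "TTWCCCNNNNAGGAAA"):
    print(motif, reverse_complement(motif), is_palindromic(motif))
# ACGCGT ACGCGT True
# TTTCCNNNNNNGGAAA TTTCCNNNNNNGGAAA True
# TTWCCCNNNNAGGAAA TTTCCTNNNNGGGWAA False
```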

¹ Indeed, under a rigorous clustering scheme, we were able to identify the ECB (consensus TTTCCcaATngGGAAA) in one of three M/G1 subclusters (Spellman et al.).

COMMENTS

Since this is our first exploratory analysis of genome-scale gene expression data, we did not seek an automatic UAS motif-finding algorithm; all the results were obtained by combining information from various sources. Although we started out by using GibbsDNA extensively (sometimes constrained by the k-tuple information, and normally more than 100 times for each cluster in order to assess the stochastic fluctuation and to try out different parameters), we then always compared the potential motifs with known experimental results and summarized each alignment by a simple consensus. We do not believe that, at this early stage, one should emphasize automation. Indeed, one of the important lessons we learned from this initial study is that motif extraction is often sensitive to clustering: one needs to improve clustering in order to obtain more sensible motifs, and vice versa (Spellman et al.). The real challenge is how to integrate the two processes. Although it is possible to use k-tuple-based methods for automatic motif extraction, this would be practical only for short and strong motifs, and combining top-ranking tuples is still problematic (van Helden et al., 1998). Even if one automates Gibbs sampling, one may still find many false positives (F. Roth, private communication). In addition to the Swi5 site, we also found potential novel motifs that have not been characterized experimentally (such as AGCSGCT in G1/S and GCSCRGC in M/G1; data not shown); we should be cautious with these, as they may also turn out to be false positives once more experience or information is gained. Very recently, two other similar experimental analyses were also reported: (1) a cell cycle study using oligonucleotide chips (Cho et al., 1998), in which about 400 cell-cycle-regulated genes were identified but the Swi5 site was missed by the promoter analysis; and (2) an application of an iterative Gibbs sampling algorithm, called AlignACE, to find putative motifs in galactose-response, heat-shock and mating-switch genome expression data (Roth et al.), in which only one time point was measured for each experiment instead of a time profile, which would limit the clustering accuracy. We were able to identify 800 cell-cycle-regulated genes and more than 20 motifs from 9 subclusters (Spellman et al.); a comparison of the promoter analyses of these three experiments will be presented elsewhere (Zhang, submitted).
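The Gibbs site-sampling idea behind GibbsDNA (Lawrence et al., 1993) can be illustrated with a minimal sketch. This is not GibbsDNA itself: it assumes exactly one ungapped site of fixed width per sequence, forward strand only, a uniform background, and sequences written only in A/C/G/T; all function names are hypothetical. Running it with many different seeds mimics the repeated runs used above to gauge stochastic fluctuation.

```python
import random

ALPHABET = "ACGT"


def site_profile(seqs, starts, width, skip, pseudo=0.5):
    """Position frequency matrix from the current sites of all sequences except `skip`."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for k, (seq, start) in enumerate(zip(seqs, starts)):
        if k == skip:
            continue
        for j, base in enumerate(seq[start:start + width]):
            counts[j][base] += 1
    total = (len(seqs) - 1) + pseudo * len(ALPHABET)
    return [{b: col[b] / total for b in ALPHABET} for col in counts]


def gibbs_site_sampler(seqs, width, iters=500, seed=0):
    """Return one sampled site start per sequence (one-occurrence-per-sequence model)."""
    rng = random.Random(seed)
    seqs = [s.upper() for s in seqs]
    background = {b: 0.25 for b in ALPHABET}  # assumption: uniform background
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        k = rng.randrange(len(seqs))  # pick one sequence to re-sample
        pwm = site_profile(seqs, starts, width, skip=k)
        seq = seqs[k]
        weights = []
        for a in range(len(seq) - width + 1):  # likelihood ratio of every candidate site
            ratio = 1.0
            for j, base in enumerate(seq[a:a + width]):
                ratio *= pwm[j][base] / background[base]
            weights.append(ratio)
        starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def consensus(seqs, starts, width):
    """Majority base in each column of the aligned sites."""
    sites = [s.upper()[a:a + width] for s, a in zip(seqs, starts)]
    return "".join(max(ALPHABET, key=lambda b: sum(site[j] == b for site in sites))
                   for j in range(width))
```

For example, collecting `consensus(seqs, gibbs_site_sampler(seqs, 6, seed=s), 6)` for `s` in `range(100)` gives 100 consensus strings whose agreement indicates how stable a candidate motif is; comparing them with known sites remains a manual step, as described above.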

Traditionally, computational analysis of promoters has been limited by the scarcity of available experimental data and by the tedious manual procedure of extracting such data from the literature (Fondrat and Kalogeropoulos, 1996; Zhang, 1998). Large-scale genome expression data have opened up a completely new avenue to almost unlimited possibilities, yet bioinformatics for the analysis of such data is still in its infancy. A vast amount of expression data is already available, or soon to become available, on the public internet (see, for example, http://cmgm.stanford.edu/pbrown/), and all bioinformatics specialists are welcome to mine these data. Be aware, however, that the results may depend critically on the clustering quality, and that any result is likely to be statistical in nature and cannot substitute for conventional single-gene dissections or follow-up experiments. Many fruitful experiments can and should be designed on the basis of the putative predictions made after a genome-wide screen. It is very encouraging that a novel wave of cyclin synthesis in late mitosis was recently identified after a putative Swi5 site was matched in a promoter region (Aerne et al., 1998). This gene, PCL9 (a homologue of PCL2, and a member of our M/G1 cluster), is associated with Pho85, is indeed regulated by Swi5 at the predicted sites, and is the only cyclin known to be expressed at M/G1.

We hope that, by interacting more closely with our experimental colleagues, we shall be able to develop better and more efficient computational tools. Together, we can advance our knowledge of gene expression and regulation at unprecedented speed and to unprecedented levels.

ACKNOWLEDGEMENT

This report is a summary of an invited talk given at a Kyoto conference on “Holistic Views of Biology” (sponsored by Otsuka America Inc.) in April 1998. It is a bioinformatic illustration of what type of information may be obtained from massive genome expression data. The detailed analysis of cell-cycle co-regulated genes, with additional experiments (including Cln3/Clb2 inductions), is still in progress, and the results will be published elsewhere (Spellman et al.). This initial bioinformatic assessment would not have been possible without the help of our colleagues: G. Sherlock and B. Futcher at CSHL did most of the yeast biology experiments; P. Spellman, V. Iyer, K. Anders and M. Eisen in the D. Botstein and P. Brown labs at Stanford did the arraying, imaging and data clustering. J. Zhu provided support for our yeast promoter database (SCPD), and Z. Ioschikhes helped with the GibbsDNA modification. The author would also like to thank Dr. F. Roth for providing the preprint before publication. The author’s lab is supported by grants from NIH/NHGRI, the Merck Genome Research Institute and the Cold Spring Harbor Association.

REFERENCES

Aerne, B. L., Johnson, A. J., Toyn, J. H. and Johnston, L. H. (1998) “Swi5 controls a novel wave of cyclin synthesis in late mitosis.” Mol. Biol. Cell 9: 945-956.

Althoefer, H., Schleiffer, A., Wassmann, K., Nordheim, A. and Ammerer, G. (1995) “Mcm1 is required to coordinate G2-specific transcription in Saccharomyces cerevisiae.” Mol. Cell. Biol. 15: 5917-5928.

Amon, A., Tyers, M., Futcher, B. and Nasmyth, K. (1993) “Mechanisms that help the yeast cell cycle clock tick: G2 cyclins transcriptionally activate G2 cyclins and repress G1 cyclins.” Cell 74: 993-1007.

Andrews, B. J. and Mason, S. W. (1993) “Gene expression and the cell cycle: a family affair.” Science 261: 1543-1544.

Andrews, B. J. and Herskowitz, I. (1989) “The yeast Swi4 protein contains a motif present in developmental regulators and is part of a complex involved in cell-cycle-dependent transcription.” Nature 342: 830-833.

Andrews, B. J. and Moore, L. (1992) “Mutational analysis of a DNA sequence involved in linking gene expression to the cell cycle.” Biochem. Cell Biol. 70: 1073-1080.

Bender, A. and Sprague, G. J. (1987) Cell 50: 681-691.

DeRisi, J. L., Iyer, V. R. and Brown, P. O. (1997) “Exploring the metabolic and genetic control of gene expression on a genomic scale.” Science 278: 680-686.

Breeden, L. (1988) “Cell cycle-regulated promoters in budding yeast.” Trends Genet. 4: 249-253.

Breeden, L. and Nasmyth, K. (1987) “Cell cycle control of the yeast HO gene: cis- and trans-acting regulators.” Cell 48: 389-397.

Bohm, S., Frishman, D. and Mewes, H. W. (1997) “Variations of the C2H2 zinc finger motif in the yeast genome and classification of yeast zinc finger proteins.” Nucl. Acid. Res. 25: 2464-2469.


Dohrmann, P., Voth, W. P. and Stillman, D. J. (1996) “Role of negative regulation in promoter specificity of the homologous transcriptional activators Ace2p and Swi5p.” Mol. Cell. Biol. 16: 1746-1758.

Dohrmann, P. R., Butler, G., Tamai, K., Dorland, S., Greene, J. R., Thiele, D. J. and Stillman, D. J. (1992) “Parallel pathways of gene regulation: homologous regulators SWI5 and ACE2 differentially control transcription of HO and chitinase.” Genes & Dev. 6: 93-104.

Epstein, C. B. and Cross, F. R. (1992) “CLB5: a novel B cyclin from budding yeast with a role in S phase.” Genes & Dev. 6: 1695-1706.

Fondrat, C. and Kalogeropoulos, A. (1996) “Approaching the function of new genes by detection of their potential upstream activation sequences in Saccharomyces cerevisiae: application to chromosome III.” CABIOS 12: 363-374.

Heinemeyer, T. et al. (1998) “Databases on transcriptional regulation: TRANSFAC, TRRD, and COMPEL.” Nucl. Acid. Res. 26: 364-370.

Hereford, L., Bromley, S. and Osley, M. A. (1982) “Periodic transcription of yeast histone genes.” Cell 30: 305-310.

Hertz, G. Z., Hartzell, G. W. 3rd and Stormo, G. D. (1990) “Identification of consensus patterns in unaligned DNA sequences known to be functionally related.” CABIOS 6: 81-92.

Iyer, V. and Struhl, K. (1995) “Poly(dA:dT), a ubiquitous promoter element that stimulates transcription via intrinsic DNA structure.” EMBO J. 14: 2570-2579.

Koch, C., Moll, T., Neuberg, M., Ahorn, H. and Nasmyth, K. (1993) “A role for the transcription factors Mbp1 and Swi4 in progression from G1 to S phase.” Science 261: 1551-1557.

Koch, C. and Nasmyth, K. (1994) “Cell cycle regulated transcription in yeast.” Curr. Opin. Cell Biol. 6: 451-459.

Kovacech, B., Nasmyth, K. and Schuster, T. (1996) “EGT2 gene transcription is induced predominantly by Swi5 in early G1.” Mol. Cell. Biol. 16: 3264-3274.

Kuo, M.-H. and Grayhack, E. (1994) “A library of yeast genomic MCM1 binding sites contains genes involved in cell cycle control, cell wall and membrane structure, and metabolism.” Mol. Cell. Biol. 14: 348-359.

Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F. and Wootton, J. C. (1993) “Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment.” Science 262: 208-214.

Lydall, D., Ammerer, G. and Nasmyth, K. (1991) “A new role for MCM1 in yeast: cell cycle regulation of SWI5 transcription.” Genes & Dev. 5: 2405-2419.

Lowndes, N. F., Johnson, A. L. and Johnston, L. H. (1991) “Coordination of expression of DNA synthesis genes in budding yeast by a cell-cycle regulated trans factor.” Nature 350: 247-250.


McBride, H. J., Brazas, R. M., Yu, Y., Nasmyth, K. and Stillman, D. J. (1997) “Long-range interactions at the HO promoter.” Mol. Cell. Biol. 17: 2669-2678.

McInerny, C. J., Partridge, J. F., Mikesell, G. E., Creemer, D. P. and Breeden, L. (1997) “A novel Mcm1-dependent element in the SWI4, CLN3, CDC6, and CDC47 promoters activates M/G1-specific transcription.” Genes & Dev. 11: 1277-1288.

McIntosh, E. M. (1993) “MCB elements and the regulation of DNA replication in yeast.” Curr. Genet. 24: 185-192.

McIntosh, E. M., Atkinson, T., Storms, R. K. and Smith, M. (1991) “Characterization of a short, cis-acting DNA sequence which conveys cell cycle stage-dependent transcription in Saccharomyces cerevisiae.” Mol. Cell. Biol. 11: 329-337.

Moll, T., Tebb, G., Surana, U., Robitsch, H. and Nasmyth, K. (1991) “The role of phosphorylation and the CDC28 protein kinase in cell cycle-regulated nuclear import of the S. cerevisiae transcription factor SWI5.” Cell 66: 743-758.

Nait-Kaoudjt, R., Williams, R., Guiard, B. and Gervais, M. (1997) “Some DNA targets of the yeast CYP1 transcriptional activator are functionally asymmetric: evidence of two half-sites with different affinities.” Eur. J. Biochem. 244: 301-309.

Nasmyth, K. (1993) “Control of the yeast cell cycle by the Cdc28 protein kinase.” Curr. Opin. Cell Biol. 5: 166-179.

Nasmyth, K. (1985) “A repetitive DNA sequence that confers cell-cycle START (CDC28)-dependent transcription of the HO gene in yeast.” Cell 42: 225-235.

Nasmyth, K. and Dirick, L. (1991) “The role of SWI4 and SWI6 in the activity of G1 cyclins in yeast.” Cell 66: 995-1013.

Nasmyth, K., Seddon, A. and Ammerer, G. (1987) “Cell cycle regulation of SWI5 is required for mother-cell-specific HO transcription in yeast.” Cell 49: 549-558.

Neuwald, A. F., Liu, J. S. and Lawrence, C. E. (1995) “Gibbs motif sampling: detection of bacterial outer membrane protein repeats.” Protein Sci. 4: 1618-1632.

Ogas, J., Andrews, B. J. and Herskowitz, I. (1991) “Transcriptional activation of CLN1, CLN2, and a putative new G1 cyclin (HCS26) by SWI4, a positive regulator of G1-specific transcription.” Cell 66: 1015-1026.

Osley, M. A. (1991) “The regulation of histone synthesis in the cell cycle.” Annu. Rev. Biochem. 60: 827-861.

Osley, M. A., Gould, J., Kim, S. Y., Kane, M. and Hereford, L. (1986) Cell 45: 537-544.

Passmore, S., Elble, R. and Tye, B. K. (1989) Genes & Dev. 3: 921-935.


Price, C., Nasmyth, K. and Schuster, T. (1991) “A general approach to the isolation of cell cycle-regulated genes in the budding yeast, Saccharomyces cerevisiae.” J. Mol. Biol. 218: 543-556.

Reed, S. I. (1992) “The role of p34 kinases in the G1 to S-phase transition.” Annu. Rev. Cell Biol. 8: 529-561.

Roth, F. P., Hughes, J. D., Estep, P. W. and Church, G. M. “Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation”, preprint.

Schwob, E. and Nasmyth, K. (1993) “CLB5 and CLB6, a new pair of B cyclins involved in DNA replication in Saccharomyces cerevisiae.” Genes & Dev. 7: 1160-1175.

Sidorova, J. and Breeden, L. (1993) “Analysis of the SWI4/SWI6 protein complex, which directs G1/S-specific transcription in Saccharomyces cerevisiae.” Mol. Cell. Biol. 13: 1069-1077.

Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. and Futcher, B. “Comprehensive identification of cell cycle regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization”, in progress.

Stein, L. (1998) “Genetic analysis on DNA microarrays.” Curr. Protocols in Hum. Genet., in press.

Stormo, G. D. (1990) “Consensus patterns in DNA.” Methods Enzymol. 183: 211-221.

Taylor, I. A., Treiber, M. K., Olivi, L. and Smerdon, S. J. (1997) “The X-ray structure of the DNA-binding domain from the Saccharomyces cerevisiae cell-cycle transcription factor Mbp1 at 2.1 Å resolution.” J. Mol. Biol. 272: 1-8.

Tebb, G., Moll, T., Dowzer, C. and Nasmyth, K. (1993) “SWI5 instability may be necessary but is not sufficient for asymmetric HO expression in yeast.” Genes & Dev. 7: 517-528.

Treisman, R. and Ammerer, G. (1992) “The SRF and MCM1 transcription factors.” Curr. Opin. Genet. Dev. 2: 221-226.

van Helden, J., Andre, B. and Collado-Vides, J. (1998) “Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies.” J. Mol. Biol. 281: 827-842.

White, J. H. M., Green, S. R., Baker, D. G., Dumas, L. B. and Johnston, L. H. (1987) Exp. Cell Res. 171: 223-231.

Wynne, J. and Treisman, R. (1992) “SRF and MCM1 have related but distinct DNA binding specificities.” Nucl. Acid. Res. 20: 3297-3303.

Xu, R.-M., Koch, C., Liu, Y., Horton, J. R., Knapp, D., Nasmyth, K. and Cheng, X. (1997) “Crystal structure of the DNA-binding domain of Mbp1, a transcription factor important in cell-cycle control of DNA synthesis.” Protein Structure 5: 349-358.

Zhang, M. Q. (1998) “Identification of human gene core promoters in silico.” Genome Research 8: 319-326.


Zhang, M. Q. “Large scale gene expression data analysis: a new challenge to computational biologists”, submitted.

Zhu, J. and Zhang, M. Q. (1998) “A promoter database of yeast Saccharomyces cerevisiae (SCPD)”, presented at The First International Conference on Bioinformatics of Genome Regulation and Structure (BGRS'98), Novosibirsk-Altai mountains, Russia, August 24-31. To appear in Bioinformatics (accepted).



TABLE CAPTIONS


Table 1a. Pentamer relative information $P_{i0} = \log(P_i / P_0) > 0.5$, where $P_i$ is the pentamer frequency in the $i$th cluster, with $i = 1, 2, 3$ corresponding to clusters M/G1 (yellow), G1/S (red) and G2/M (blue), respectively, and with $i = 0$ corresponding to the control of non-cell-cycle-regulated genes. The color is determined by the cluster to which the largest value belongs, and boldface indicates a pentamer belonging to only one cluster under the current cutoff (0.5).


Table 1b. Pentamer relative information $P_{ij} = \log(P_i / P_j)$ between consecutive clusters along the cell cycle (cutoff 1.0). Color and boldface have the same meaning as in Table 1a.
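The pentamer statistics of Tables 1a and 1b can be sketched as follows. The captions do not state the logarithm base, whether both strands were counted, or how pentamers absent from the reference set were handled, so the natural logarithm, forward-strand overlapping counts and a small pseudocount are assumptions here, and the function names are hypothetical. The same function gives $P_{i0}$ when the reference set is the control genes and $P_{ij}$ when it is another cluster.

```python
import math
from collections import Counter


def kmer_frequencies(seqs, k=5):
    """Relative frequency of each overlapping k-mer (forward strand only) over a set of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.lower()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("acgt"):  # skip windows containing N or other symbols
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def relative_information(cluster_seqs, reference_seqs, k=5, cutoff=0.5, pseudocount=1e-6):
    """k-mers with log(P_i / P_ref) > cutoff, sorted from most to least enriched."""
    p_i = kmer_frequencies(cluster_seqs, k)
    p_ref = kmer_frequencies(reference_seqs, k)
    scores = {}
    for kmer, f in p_i.items():
        # assumption: pseudocount keeps k-mers absent from the reference finite
        score = math.log((f + pseudocount) / (p_ref.get(kmer, 0.0) + pseudocount))
        if score > cutoff:
            scores[kmer] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With hypothetical inputs `m_g1_upstream` and `control_upstream` (lists of promoter sequences), `relative_information(m_g1_upstream, control_upstream)` would produce a ranked list analogous to the $P_{10}$ block, and passing another cluster as the reference with `cutoff=1.0` mirrors Table 1b.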


Table 2. Swi5/Ace2 motif. Genes in the M/G1 cluster (see METHODS) are shaded in yellow. Genes with published elements are in bold. CTS1 elements (underlined) were shown to be ACE2 binding sites. Elements found by multiple-sequence alignment programs are indicated by “*”; the rest were found either by consensus or matrix search or from publications. “+/-” refers to the forward/reverse strand, and coordinates are relative to the ATG start site. The conserved core is shaded in red and the less conserved region in gray. Reference codes in the last column: MBC = Mol. Biol. Cell and MCB = Mol. Cell. Biol.


Table 3. Histone motifs. As each pair of divergently transcribed histone genes shares a common promoter, the intergenic region between each pair of ATG start sites is shown. Mapped TATA boxes are shown in red. Mapped negative (NEG) elements and UAS1/UAS2 elements are shown in blue and dark green, respectively (Osley, 1991). Additional putative UAS1/UAS2 elements were found by alignment programs. Potential SCBs are underlined. A novel repeat element, AACAA(not T)A, is indicated by a box.


Table 4. MCB and SCB motifs. Notation is similar to that of Table 2. The red genes are from the G1/S cluster (see METHODS). Additional reference codes: C = Cell, CG = Curr. Genet., G&D = Genes & Dev., JBC = J. Biol. Chem., N = Nature and PNAS = Proc. Natl. Acad. Sci. USA.


Table 5. Mcm1 and SFF motifs. Notation is similar to that of Tables 2 and 4, and the blue genes are from the G2/M cluster (see METHODS). The Mcm1 motif is characterized by CCNNWWNNRG. The light-green elements are potential SFF binding sites. The newly proposed ECB (early cell cycle box; McInerny et al., 1997), TTTCCNNWWNNGGAAA (an extended Mcm1 box), is also indicated in red. Dark green marks the Ste12 binding site, purple the MATα2 binding site and pink the MATα1 binding site.




FIGURE CAPTIONS


Figure 1. A model illustrating the regulatory interactions that determine cell-cycle-regulated transcription in yeast (Koch & Nasmyth, 1994; McInerny et al., 1997). Cln3-associated kinase activates the late-G1-specific transcription factors SBF (SCB binding factor) and MBF (MCB binding factor) in a cell-size-dependent fashion. SBF and MBF mediate the expression of CLN1,2 and CLB5,6 as well as of S-phase proteins, leading to budding and S-phase entry. By an unknown mechanism, CLN1,2 activity allows Clbs to accumulate. Clb1 and Clb2 activate transcription of G2-specific genes and thereby autoactivate their own synthesis, possibly via the transcription factors Mcm1 and SFF. At the same time, Clb1,2/Cdc28 represses SBF-mediated transcription. While Clb1,2/Cdc28 activates expression of SWI5 and possibly of ACE2 RNAs via Mcm1/SFF, it keeps the gene products in an inactive state by phosphorylating their nuclear localization signals. Clb proteolysis at the end of mitosis dramatically changes the situation: Clb-mediated activation of G2-specific genes stops, and Swi5 loses its inhibitory phosphorylations, leading to its uptake into the nucleus, where it can activate the early-G1-specific transcripts. In late M phase, an Mcm1-related factor binds to the ECB (early cell cycle control box) and activates M/G1-specific transcription of CLN3, SWI4 and some DNA replication genes; these gene products play critical roles in promoting the initiation of S phase.


Figure 2. Arrayer. The basic system consists of three servo-motor-powered linear rail tables (Daedal Series 500000) mounted on an anti-vibration table (Newport). The system is controlled by a Galil DMC-1730 controller card (Galil Motion Control, Sunnyvale CA), which communicates with the Compumotor amplifiers that drive the motors (from Brown’s lab).


Figure 3. Yeast genome microarray containing 6,116 yeast genes, 96 intergenic regions and many control samples (from Brown’s lab). Hybridization of the Cy3-dUTP-labeled cDNA (that is, mRNA expression in the control sample: the initial time-point sample for an induction experiment, or asynchronous cells for a cell cycle experiment) is represented as a green signal in the fluorescent image, and hybridization of the Cy5-dUTP-labeled cDNA (that is, mRNA expression in the target samples: the final time-point sample for an induction experiment, or consecutive time-point samples of synchronized cells for a cell cycle experiment) is represented as a red signal. Thus, genes activated or repressed after cell cycle release appear as red and green spots, respectively, and genes expressed at roughly equal levels in this comparison appear as yellow spots.
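The red/green/yellow coding can be expressed as a log ratio of the two channels. This is a minimal sketch: the two-fold threshold, and the use of raw intensities without background subtraction or normalization, are assumptions for illustration, not the procedure used to produce the figure; `spot_color` is a hypothetical name.

```python
import math


def spot_color(cy5, cy3, fold=2.0):
    """Classify a spot from its red (Cy5, target) and green (Cy3, control) intensities."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(cy5 / cy3)  # > 0: higher in target sample, < 0: higher in control
    threshold = math.log2(fold)       # assumed cutoff: two-fold change
    if log_ratio >= threshold:
        return "red"     # activated after release
    if log_ratio <= -threshold:
        return "green"   # repressed after release
    return "yellow"      # roughly equal expression
```

Working on the log scale makes activation and repression symmetric: a four-fold increase and a four-fold decrease give log ratios of +2 and -2.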


Figure 4. M/G1 and histone clusters (from Brown’s lab). Relative gene expression variation may be monitored by taking mRNA samples at consecutive time points of the cell cycle after G1 release by either α-factor blocking or elutriation experiments. Whole-genome expression patterns, as represented by the digitized spot image variations, are clustered according to their degree of similarity (using, for instance, a peak correlation distance measure). In these two examples, an M/G1 cluster (24 genes) and an S-phase cluster (8 histone genes) can be clearly identified.
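Grouping expression time courses by similarity can be sketched with average-linkage agglomerative clustering on the distance d = 1 - r, where r is the Pearson correlation between two profiles. The caption's “peak correlation distance” is not defined here, so Pearson correlation, average linkage and the stopping distance are assumptions, not the exact method behind the figure; the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def cluster_profiles(profiles, max_distance=0.3):
    """Average-linkage clustering of {gene: profile} with d = 1 - r.

    Merging stops once the closest pair of clusters is farther apart than max_distance.
    """
    clusters = [[gene] for gene in profiles]

    def distance(a, b):
        pairs = [(p, q) for p in a for q in b]
        return sum(1 - pearson(profiles[p], profiles[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j leaves i valid
        del clusters[j]
    return clusters
```

Because the distance depends only on the shape of each time course, genes peaking at the same phase (for example, the M/G1 or histone genes) group together regardless of their absolute expression levels.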


Figure 5. Comparison of motif distributions in different clusters. The consensus for each indicated motif is used. The position relative to the ATG start site is plotted along the x-axis, and the motif count per sequence in each cluster along the y-axis. (a) Swi5: (A/G)CCAGC and its reverse complement; A6/T6: AAAAAA and TTTTTT. (b) SCB: C(A/G)CGAAA; MCB: ACGCGT. (c) Mcm1: CC(C/T)(A/T)(A/T)(A/T)NN(G/A)G; Sff: GT(C/A)AACA(A/T).
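Motif counts like those in Figure 5 can be obtained by scanning each upstream region for an IUPAC consensus on both strands. In this sketch each upstream sequence is assumed to end at the base immediately 5' of the ATG, so its last base is position -1; reverse-strand hits are reported by their leftmost forward-strand coordinate; palindromic consensuses such as ACGCGT (or the ECB, TTTCCNNNNNNGGAAA) are counted once; the bin size and function names are hypothetical.

```python
import re

# IUPAC nucleotide codes -> regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    return motif.upper().translate(COMPLEMENT)[::-1]


def find_sites(upstream, motif):
    """(position, strand) of every match; the last base of `upstream` is position -1."""
    seq = upstream.upper()
    motif = motif.upper()
    patterns = {"+": motif}
    if reverse_complement(motif) != motif:  # palindromes are counted once
        patterns["-"] = reverse_complement(motif)
    hits = []
    for strand, pattern in patterns.items():
        regex = "".join(IUPAC[c] for c in pattern)
        for m in re.finditer(f"(?=({regex}))", seq):  # lookahead keeps overlapping matches
            hits.append((m.start() - len(seq), strand))
    return sorted(hits)


def motif_density(upstreams, motif, bin_size=50):
    """Motif count per sequence in position bins (key = left edge of bin, relative to ATG)."""
    counts = {}
    for seq in upstreams:
        for pos, _strand in find_sites(seq, motif):
            left = (pos // bin_size) * bin_size
            counts[left] = counts.get(left, 0) + 1
    return {left: n / len(upstreams) for left, n in sorted(counts.items())}
```

For instance, the Swi5 consensus of panel (a) is `"RCCAGC"` and the Mcm1 consensus of panel (c) is `"CCYWWWNNRG"`; calling `motif_density` once per cluster with the same motif gives the per-cluster curves to be compared.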











$P_{10} > 0.5$ (M/G1 vs. control)

gctgg  ggccg  ccagc  cactc  cgcgc  catac  tggct  ccccc  cattc  ctggc  agcgt  gtgtc  cgtcg
1.262  1.097  1.071  0.880  0.862  0.838  0.830  0.799  0.797  0.771  0.766  0.760  0.754

ggctt  tggcc  ttcgg  tcccc  ccgac  acact  actcc  gcgcc  cggcg  cccct  gtgct  cagca  ggacg
0.750  0.743  0.729  0.720  0.711  0.711  0.710  0.689  0.670  0.654  0.629  0.625  0.619

gcata  ttggc  ggctg  ctccc  agggc  actac  accat  ttttt  cttgc  cgccg  ccgcg  ggctc  ccatt
0.611  0.593  0.573  0.546  0.546  0.546  0.520  0.516  0.516  0.516  0.510  0.506  0.505

tagca
0.501

$P_{20} > 0.5$ (G1/S vs. control)

acgcg  cgcgt  gacgc  gcgtc  cgcga  tcgcg  aacgc  ctcga  gcgta  gtagg  gcgtt  gcgtg  agacg
2.355  2.306  1.389  1.230  1.224  1.185  1.155  0.956  0.930  0.868  0.831  0.775  0.738

cgtgg  cgaaa  cgacg  caacg  agcgt  tagac  acgac  cgtag  cgtca  cctcg  taggc  aagcg  cacga
0.736  0.733  0.730  0.729  0.726  0.714  0.714  0.706  0.701  0.695  0.655  0.647  0.636

gacac  ccaac  gcgaa  cgcgc  aaacg  gccct  gcgac  cgaca  cagac  gtgac  acgaa  aggtc  atcgc
0.633  0.632  0.619  0.615  0.608  0.587  0.579  0.576  0.562  0.537  0.537  0.534  0.521

aggcg  tttcg  gtcgc
0.520  0.515  0.510

$P_{30} > 0.5$ (G2/M vs. control)

ctagg  aggtc  cgtgt  taggt  cctac  tgcta  gaacg  gctac  cagag  gtggc  gtaaa  ctact  cctag
0.9    0.861  0.796  0.716  0.703  0.655  0.642  0.612  0.603  0.596  0.593  0.587  0.582

tcgtc  ggtct  taaac  ttagg  tccta  ctaga  tagac  agtgc  catac  gctag  ctaat  ccctc  acggg
0.573  0.573  0.571  0.562  0.55   0.55   0.549  0.545  0.542  0.541  0.532  0.515  0.509

gcgag  ccaga
0.509  0.501





Table 1a. Pentamer relative information $P_{i0} = \log(P_i / P_0) > 0.5$, where $P_i$ is the pentamer frequency in the $i$th cluster, with $i = 1, 2, 3$ corresponding to clusters M/G1 (yellow), G1/S (red) and G2/M (blue), respectively, and with $i = 0$ corresponding to the control of non-cell-cycle-regulated genes. The color is determined by the cluster to which the largest value belongs, and boldface indicates a pentamer belonging to only one cluster under the current cutoff (0.5).





$P_{13} > 1.0$ (M/G1 vs. G2/M)

gctgg  ggccg  gctcc  cgtcg  cgcgc  acccg  ggctt  cccgc  atggc  gtggg  ggacg  gcgcg  cgcgg
2.48   2.235  2.117  2.117  1.829  1.829  1.696  1.647  1.542  1.424  1.424  1.424  1.424

cgacc  ccccg  ggctg  ccagc  aacca  gcgtc  ctggc  ccgac  tggct  tgggc  tgcgc  ggccc  gcgct
1.424  1.424  1.424  1.373  1.311  1.29   1.29   1.29   1.183  1.136  1.136  1.136  1.136

gacgg  gaccc  ctatc  ccgtg  cccgg  cagca  agctc  actgt  tggcc  ttggc  tcccc  ccaac  gtcac
1.136  1.136  1.136  1.136  1.136  1.136  1.136  1.136  1.067  1.049  1.049  1.049  1.018

gcgcc  cagcg  gtcgg  gggcg  gcctg  gccgg  ctcca  ccgtc  ccgga  ccgcg  ccgag  cccgt  accaa
1.018  1.018

$P_{12} > 1.0$ (M/G1 vs. G1/S)

ccggg  ccagc  ccccg  ctggg  gccag  ccgag  tggct  gctgg  tcggt  gtgct  gggat  ggctt  gcccc
2.039  1.4    1.346  1.241  1.164  1.164  1.105  1.08   1.058  1.058  1.058  1.058  1.058

cactc  gcttg
1.058  1.027

$P_{21} > 1.0$ (G1/S vs. M/G1)

cgcgt  acgcg  aggcg  gacgc  gtgta  aacgc  caccc  acgcc  cgcga  gaacg  tcgcg  gtggt  gaggt
2.748  2.392  2.238  1.775  1.599  1.588  1.544  1.544  1.507  1.487  1.468  1.468  1.427

gcgac  gcacc  ctgag  cctgg  aggtg  gcgaa  gcgtt  ggaag  gtgcc  gatga  gatac  cgtct  agacg
1.362  1.362  1.362  1.362  1.362  1.293  1.264  1.234  1.219  1.219  1.219  1.193  1.18

acaca  tcctc  ggtga  ctgga  accac  tggaa  cctcg  agaca  aacag  ccacc  gtgaa  ggtct  gcgtg
1.18   1.172  1.139  1.139  1.139  1.096  1.082  1.082  1.082  1.052  1.034  1.021  1.021

gatcc  ctcag  atggg
1.021  1.021  1.021

$P_{23} > 1.0$ (G1/S vs. G2/M)

acgcg  cgcgt  gcgtc  cgcga  tcgcg  agacg  gtggg  gacgc  gctcc  ggccg  cgcgc  ccatg  ccaac
2.835  2.786  2.059  1.95   1.911  1.911  1.87   1.813  1.687  1.582  1.582  1.582  1.525

gtatc  gtagg  gcgcg  cgtcg  cccgg  aacgc  gctgg  gcgct  atggc  caacg  cagac  gacgg  accgc
1.464  1.464  1.464  1.464  1.464  1.443  1.4    1.331  1.331  1.302  1.282  1.257  1.257

cgcag  gcgta  cgacc  acccg  gtcac  gcgaa  tatgg  ggacg  gacac  accaa  cacga
1.231  1.209  1.177  1.177  1.12   1.108  1.09   1.09   1.09   1.059  1.043

$P_{32} > 1.0$ (G2/M vs. G1/S)

ccggg  cgggt  gccag  accct  cgggc  gcccc  ctcgg
2.225  1.462  1.221  1.154  1.126  1.021  1.021

$P_{31} > 1.0$ (G2/M vs. M/G1)

gaacg  aggcg  gtgta  gtggt  caccc  ctgag  taggt  tgcga  gtgcc  cctag  aggtg  agccg  acccc
1.977  1.754  1.667  1.572  1.572  1.466  1.409  1.349  1.349  1.349  1.349  1.349  1.349

gtgaa  acaca  gtagc  cgtgt  ctaga  ggtga  ggtct  agggt  gatcc  cgtac  aacag  ggatg  gcgac
1.297  1.284  1.262  1.262  1.186  1.166  1.166  1.141  1.061  1.061  1.061  1.061  1.061

gatga  gaggt  ccacc  acgcc
1.061  1.061  1.061  1.061



Table 1b. Pentamer relative information $P_{ij}$ between consecutive clusters along the cell cycle. Color and boldface have the same meaning as in Table 1a.









Site        *   Strand  Position  Sequence   Reference
ASH1-1      *   +       -466      GAGCCAGCA
BUD9-1      *   +       -496      TACCCAGCC
BUD9-2          +       -171      TCTCCAGCT
CTS1-1          -       -567      TCACCAGCG  MCB17:2669(97)
CTS1-2          -       -547      GGACCAGCA  MCB17:2669(97)
CTS1-3          +       -528      TAACCAGCC  MCB17:2669(97)
EGT2-1      *   -       -386      GAACCAGCA  MCB16:3264(96)
EGT2-2      *   -       -335      GAGCCAGCA  MCB16:3264(96)
EGT2-3      *   -       -304      GAGCCAGCG  MCB16:3264(96)
EGT2-4          -       -273      TTGTCAGCC  MCB16:3264(96)
EGT2-5          -       -241      GTGTCAGCC  MCB16:3264(96)
EGT2-6      *   -       -198      AAACCAGCA  MCB16:3264(96)
FAA3-1      *   +       -478      ATACCAGCA
PCL9-1      *   +       -327      AAACCAGCG  MBC9:945(98)
PCL9-2          +       -283      AAACCAGCT  MBC9:945(98)
PIR1-1      *   +       -432      CGGCCAGCT
PIR1-2      *   -       -210      ATACCAGCG
RME1-1      *   +       -337      TTACCAGCA
RME1-2      *   -       -286      AAGCCAGCA
SIC1-1      *   +       -169      TAGCCAGCA
SIC1-2      *   -       -144      AAGCCAGCC
YBR158W-1   *   -       -468      ACACCAGCA
YDR055W-1       -       -362      ATCCCAGCT
YDR055W-2   *   -       -225      AACCCAGCC
YER124C-1   *   -       -447      AGACCAGCC
YER124C-2   *   +       -438      GCGCCAGCA
YER124C-3   *   +       -400      AAACCAGCA
YER124C-4   *   -       -254      AAACCAGCA
YER124C-5       -       -199      CACCCAGCT
YHR143W-1   *   +       -423      GAACCAGCA
YHR143W-2   *   +       -264      AAACCAGCA
YNL046W-1   *   -       -410      TACCCAGCC
YNL046W-2   *   -       -151      ACTCCAGCA
YNL046W-3   *   -       -100      GAGCCAGCA
YNL078W-1   *   -       -355      AAACCAGCC
YNR067C-1   *   -       -400      ATGCCAGCA
YNR067C-2   *   +       -285      AAGCCAGCA
YNR067C-3       -       -194      TACCCAGCT
YOR264W-1   *   -       -415      CGGCCAGCA
YOR264W-2   *   +       -343      AAGCCAGCG
YOR264W-3   *   +       -317      TGACCAGCC
YOR264W-4   *   +       -265      TAACCAGCA
YPL158C-1   *   -       -300      TCGCCAGCC
YPL158C-2   *   +       -275      GCTCCAGCC
YPL158C-3   *   -       -175      AGCCCAGCA
------------------------------------------
HO-1            -       -1818     CTGCCACGC  MCB16:3264(96)
HO-2            +       -1310     AAACCAGCA  MCB16:3264(96)
SIC1-1          +       -169      AAGCCAGCA  MBC9:945(98)
SIC1-2          -       -144      AAGCCAGCC  MBC9:945(98)
PCL2-1          +       -489      ATTCCAGCT  MBC9:945(98)
PCL2-2          +       -449      TGGCCAGCT  MBC9:945(98)



Table 2. Swi5/Ace2 motif.



>Z48612 between HTB1|spt12/YDR224C c(16802..17197) and HTA1|spt11/YDR225W (18015..18413) (reversed)

tttatattttatatgtatgaaatttgtttgttttgaagttgtttattca
ctgagaaataaccaaatccgtatgatgatgtagtatcaagaagaga
agtacagattggaagtaaatagatgatggttc
aacaaga
ccagaaaatctacaagctgattaggagtctta
tttatata
ttttttaggtcaagac
ttattgctagtatttacgatccactggctggcttcgtgaacggggaagggggtgagaaaagattttgaaatcaacaaagtgggcaataacaaata
acagcatgagaaaccacata
tctctacgggcgtttcttcaacaacgacgagttaactattgtgctctttttttgagccaccaaatacactcc
att
ccaatagcttcgc
acagtgag
gcgaaaattttggaac
agcgctaatgaattatttgtgagctcggcgagttcaaatttgaagaaaacgcggttgg
gtcgttaactatggt
tagacgctcaatgtc
gcccgaaagggaaggct
gttctcactt
tttcgc
g
cgttgcaccctttcttc
c
gcga
aa
aaatgag
aac
gatggatttaaaatcaagagaattggccttagtagtggcaaatactaccttggttggttatcttgtaacgattggtaagaaaggggcatctc
tgttttcttgatg
tatata
aa
caaca
tgatttgatcatctcagatggtcagatttattaaagacgtttctctttccgcattttcgat
tattgtt
atattaaatttatcctatatagacaagtcaaaccacaaataaaccatacacacataca

>Z26494 between HTA2/YBL003C c(4120..4518) and HTB2/YBL002W (5218..5613)

tatatattaaatttgctcttgttctgtactttcctaattcttatgtaaaaagacaagaatttatgatactatttaat
aacaaaa
aactacctaag
aaaagcatcatgcagtcgaaattgaaatcgaaaagtaaaactttaacggaacatgtttgaaattctaagaaagcatacatcttcat
cc
cttatat
a
tagagttatgtttgatattagtagtcatgttgtaatctctggcctaagtatacgta
acgaaaatggtagcac
gtcgcgtttatggcccccaggt
taatgt
gttctctgaaattcgc
atcacttt
gagaaataatgggaac
accttacgcgtgagct
gtgcccaccgcttcgc
ctaataaagcggt
gttc
tcaaaatttctc
cccgttttcaggatcacgagcgccatcta
gttctggtaaaatcgc
gcttacaagaacaaagaaaagaaacatcgcgtaatgca
acagtgagacacttgccgtca
tatataag
gttttggatcagtaaccgttatttgagcataacacaggtttttaaatatattattatatatcatgg
tatatgtgtaaaatttttttgctgactgg
ttttgtt
tatttatttagctttttaaaaattttactttct
tcttgtt
aattttttctgattgctc
tatactcaaacc
aacaaca
acttactcta
caacta

>chromosomeII between HHF1/YBR009C c(255638..255327) and HHT1|BUR5|SIN2/YBR010W (256285..256695) (reversed)

tgtttgcgtttatatatttatgttagatgtttttcttattaactagaaagaaagaatataaaaggttgaggaaagagatgtatcccgaagaatac
acagtct
tttatata
tgtatttcaacaaggagccgtggagggta
ccaaaaagaaaaatcgcccgggcatttcgttatct
tccacgctaaaagtc
a
aggagagatattacggccaggatcgcaaaggtgcagagcaaggaaatgt
gagaaattgtgagaac
gataatgtatgggacaat
gcgaaaatgtga
gaac
gagagcaaaaatcttttttgtatctccccgccgaatttggaaaccgc
gttctgaaaacttcgc
atcttcacatagtaaaact
gttccgagc
gcttctc
cccataat
ggttagtggtaaaaaccgaagttgtttactttagcaaatgccc
gcgaatacggtggtaa
attgccacccccccttcccca
ttcattgggtaaagaccaatttgatggataaattggttgtggaaaaggtctaattctttttcc
tataaata
ccgagatattttttctatatgatg
gtttccgtcgcattattgtactctatagtactaaagcaaca
aacaaaa
acaagcaacaaatataatatagtaaaat

>chromosomeXIV between HHT2/YNL031C c(576048..575638) and HHF2/YNL030W (576725..577036)

tgtggagtgtttgcttggatcctttagtaaaaggggaagaacagttggaagggccaaagtggaagtcacaaaacagtggtcctatataaaag
aac
aaga
aaaagatta
tttatata
caactgcggtcacaagaagcaacgcgagagagcacaacacgctgttatcacgcaa
actatgttttgacaccgag
ccatagccgtgattgtgcgtcacattgggcgataa
tgaacgctaaatgac
caactcccatccgtaggagccccttagggc
gtgccaatagtttca
c
gcgcttaat
gcgaagtgctcggaac
ggacaactgtggtcgtttggcaccgggaaagtggtactagaccgagagtttcgcatttgtatggcagga
c
gttctgggagcttcgc
gtctcaagctttttcggg
cgcgaaa
tgcag
accagaccag
aacaaaa
caactgacaagaaggcgtttaatttaata
t
gttgtt
cactcgcgcctgggc
tgttgtt
attcggctagatacatacgtgtttgtgcgtatgtagttatatca
tatataag
tatattaggatgag
gcggtgaaagagattttttttttttcgcttaatttattcttttctctatcttttttcctaca
tcttgtt
caaaagagtagcaaa
aacaaca
atc
aatacaataaaata


Table 3. Histone motifs.






MCB (Mbp1-Swi6)

Site        *   Position  Sequence      Reference
AXL2-1      *   -384      aagACGCGAaaa
AXL2-2      *   -365      acaACGCGTcat
AXL2-3      *   -341      aatTCGCGTcac
CDC45-1     *   -180      acgACGCGTatt
CDC45-2     *   -150      ctaACGCGTttt
CDC9-1      *   -133      ttaACGCGAaaa  N350:247(91)
CDC9-2      *   -125      aaaACGCGTgaa  N350:247(91)
CDC9-3          -92       gccATGCGTttg  CG21:183(92)
CLB5-1      *   -389      aagACGCGCcct  G&D7:1160(93)
CLB6-1      *   -403      tttACGCGTacc  G&D7:1160(93)
CLB6-2      *   -377      ccaACGCGTatt  G&D7:1160(93)
CLN1-1          -606      gagACGCGTtca  JBC272:9071
CLN1-2          -588      aatTCGCGAttt  JBC272:9071
CLN1-3          -556      cgaCCGCGTtag  JBC272:9071
CSI2-1      *   -393      tttTCGCGTttt
CTF18-1     *   -105      cagTCGCGTtgt
DPB2-1      *   -407      caaACGCGTgtt
DPB2-2      *   -125      gtgACGCGTtat
GIN4-1      *   -310      gaaACGCGTcaa
HCM1-1      *   -399      ggaACGCGAaaa
HCM1-2      *   -380      cagACGCGAgaa
HCM1-3      *   -367      gcgACGCGAaaa
HCM1-4      *   -317      ataACGCGTtaa
HCM1-5      *   -269      aaaACGCGTcct
MSH6-1      *   -196      taaACGCGTgag
MSH6-2      *   -176      gatACGCGTctc
POL12-1     *   -223      tagACGCGTaat
POL12-2     *   -199      gtgACGCGTctc
POL30-1     *   -195      gaaACGCGTaac  CG21:183(92)
PRI2-1      *   -165      attACGCGTcgc  CG21:183(92)
PRI2-2      *   -150      gaaTCGCGTaaa  CG21:183(92)
RAD27-1     *   -180      ctaACGCGTtta
RAD27-2     *   -128      gcgACGCGTaac
RAD51-1     *   -201      gctACGCGTcat
RAD51-2     *   -160      agtACGCGTggt
RAD53-1     *   -285      tggACGCGTtga  MCB13:5829(93)
RAD53-2     *   -260      gtgACGCGTaaa
RFA1-1      *   -165      gtcACGCGTaaa
RFA1-2      *   -135      aagACGCGTgaa
RFA2-1      *   -120      gaaACGCGTtag
RFA2-2      *   -108      gaaACGCGTtct
RHC21-1     *   -372      caaACGCGTtta
RHC21-2     *   -334      tttTCGCGTttg
RHC21-3     *   -292      gggACGCGTcga
RHC21-4     *   -278      aaaTCGCGTctt
RNR1-1      *   -492      tttACGCGTttt
RNR1-2      *   -442      aaaACGCGTaaa
RNR1-3      *   -371      taaACGCGTcat
RNR1-4      *   -306      aggACGCGTaaa
RNR3-1          -547      cgcACGCGTaaa
RNR3-2      *   -190      ctgACGCGTttc
RSR1-1      *   -293      aatTCGCGTcaa
RSR1-2      *   -258      caaACGCGAaat
SMC3-1      *   -117      gcgACGCGTtag
SPT21-1     *   -271      cggTCGCGTttt
SPT21-2     *   -229      gcgTCGCGTtag
SPT21-3     *   -234      aaaACGCGTcgc
SWI4-1          -508      gtgACGCGTcac  MCB13:3792(93)
SWI4-2      *   -491      atgACGCGAaag  MCB13:3792(93)
TMP1-1      *   -158      gtgACGCGTtaa  PNAS88:7155(91)
TMP1-2      *   -121      ttgACGCGTttc  PNAS88:7155(91)
YGR151C-1   *   -182      cgaTCGCGTtcc
YLR183C-1   *   -228      aaaACGCGAaaa
YNL300W-1   *   -401      gtgACGCGAaaa
YNL339C-1   *   -252      ctgACGCGCcat
YOX1-1      *   -497      aaaACGCGTaaa
YOX1-2      *   -436      gagACGCGAcgc
YOX1-3      *   -231      caaACGCGAaca
YPL267W-1   *   -126      ttgACGCGTctt
----------------------------------------------
CDC6-1          -216      ACGCGAggc     CG21:183(92)
CDC6-2          -204      ACGCGTcgg     CG21:183(92)
CDC8-1          -109      ACGCGTtag     CG21:183(92)
CDC8-2          -53       ACGCTTcta     CG21:183(92)
POL1-1          -208      ACGCGTtaa     CG21:183(92)
POL2-1          -115      ACGCGTaag     CG21:183(92)
POL3-1          -165      ACGCGTaac     CG21:183(92)
PRI1-1          -207      ACGTGTgaa     CG21:183(92)
PRI1-2          -196      ATGCGTgag     CG21:183(92)
PRI1-3          -63       AAGCGTgcc     CG21:183(92)




SCB (Swi4-Swi6)

CLN1-1        +  -452  aaCTCGAAA  JBC272:9071
CLN1-2        +  -435  gaCTCGAAA  JBC272:9071
CLN2-1        +  -609  atCGCGAAA  C66:995(91)
CLN2-2        +  -584  taCACGAAA  C66:995(91)
CLN2-3        +  -541  gtCACGAAA  C66:995(91)
DPB2-1     *  -  -323  ctCGCGAAA
DPB2-2     *  -  -115  caCGCGAAA
GIN4-1     *  +  -251  atCGCGAAA
HCM1-1     *  +  -400  aaCGCGAAA
HCM1-2     *  +  -368  gaCGCGAAA
MNN1-1     *  +  -499  atCGCGAAA
MSH2-1     *  +  -151  aaCGCGAAA
POL30-1    *  +  -203  aaCGCGAAA
PRI2-2     *  +  -364  ttCGCGAAA
PRI2-2     *  -  -377  gcCGCGAAA
RAD27-1    *  +  -439  ttCACGAAA
RHC21-1    *  -  -336  aaCGCGAAA
RNR1-1     *  -  -242  aaCACGAAA
RNR3-1     *  -  -327  acCACGAAA
RNR3-2     *  -  -175  gaCACGAAA
RNR3-3     *  -  -150  aaCACGAAA
RSR1-1     *  +  -259  aaCGCGAAA
SPT21-1    *  -  -261  atCACGAAA
SVS1-1     *  -  -229  aaCACGAAA
SVS1-2     *  -  -211  aaCACGAAA
SWI4-1     *  +  -492  gaCGCGAAA
YHR149C-1  *  -  -289  ttCGCGAAA
YLR183C-1  *  +  -229  aaCGCGAAA
YLR183C-2  *  -  -173  aaCACGAAA
YNL300W-1  *  +  -432  gtCACGAAA
YNL300W-2  *  +  -346  gaCACGAAA
YNL300W-3  *  +  -346  gaCACGAAA
YPL267W-1  *  -  -143  taCGCGAAA
YPR203W-1  *  +  -41   gtCACGAAA

----------------------------------------------

HO-1          +  -600  gtCACGAAA  MCB13:1069(93)
HO-2          +  -466  ttCACGAAA  N342:830(89)
HO-3          +  -439  tcCACGAAA  N342:830(89)
HCS26-1       -  -327  aaCACGAAA  C66:1015(91)





Table 4. MCB and SCB motifs.
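The MCB and SCB entries in Table 4 are consensus matches found in upstream sequences. A minimal Python sketch of such a scan follows; it is an illustration, not the procedure used to build the table. It assumes each promoter string ends immediately before the ATG, so its last base is position -1, and it reports the top-strand position of each site's 5'-most base. The function names `revcomp` and `scan` are hypothetical. Because the MCB core ACGCGT is its own reverse complement, reverse-strand copies of palindromic sites are skipped so that each site is counted once.

```python
import re

# Consensus patterns from the Figure 5 legend; (A/G) is written as [AG].
MCB = "ACGCGT"
SCB = "C[AG]CGAAA"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan(promoter: str, pattern: str) -> list[tuple[int, str, str]]:
    """Find consensus matches on both strands of an upstream sequence.

    Assumes `promoter` ends immediately before the ATG, so its last base is
    position -1. Each hit is (top-strand position of the site's 5'-most base,
    strand, matched site read 5'->3' on that strand).
    """
    seq = promoter.upper()
    n = len(seq)
    # The lookahead lets overlapping matches be reported.
    regex = re.compile(f"(?=({pattern}))")

    plus = {m.start(): m.group(1) for m in regex.finditer(seq)}
    hits = [(start - n, "+", site) for start, site in plus.items()]

    for m in regex.finditer(revcomp(seq)):
        site = m.group(1)
        # Convert the reverse-strand match to top-strand coordinates.
        top_start = n - (m.start() + len(site))
        if plus.get(top_start) == revcomp(site):
            continue  # palindromic site (e.g. ACGCGT) already reported on "+"
        hits.append((top_start - n, "-", site))
    return sorted(hits)
```

For example, `scan("ttACGCGTaa", MCB)` returns `[(-8, '+', 'ACGCGT')]`, and an SCB written on the bottom strand (`TTTCGTG`) is reported with strand `"-"`.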



ACE2-1     *  -347  aatgtaaaca TTGGCACTTTGGGAAA atttcaggac
ALK1-1        -448  aatggtggcc AAGCCACTGACAGAGTGCGTCAACAAa
ALK1-2     *  -106  cggatcgtct TTGCCCTTTTTGGTAA aacGTAAACAA
BUD3-1        -634  ccaatgactt AAACCTTAACTGGTGA ttttgaaccg
BUD4-1        -584  aacgaataga TGACCCGATTTGGAAA aagGTAAACAA
CDC20-1    *  -314  agcaatttga TTGCCGAAAGAGGCAA aacGTAAATAG
CDC47-1    *  -229  cttaactaat TTACCCAGAAAGGAAA tttccttata  G&D11:1277(97)
CDC47-2    *  -213  agaaaggaaa TTTCCTTATAAGGAAA ataaatgcaa  G&D11:1277(97)
CLB1-1        -619  TTGTTTACaa CCGCCCAAAGAGGAAA aacATCAACAA
CLB2-1        -690  gtaaatatag CGACCGAATCAGGAAA agGTCAACAA  MCB14:348(94)
CLB2-2        -572  ttcagaaatt TTGCTCTTAATGGAAA atataacctc  MCB14:348(94)
CLB2-3        -543  atggaaaata TAACCTCTTTGGGGAA aagagaaata  MCB14:348(94)
CYK2-1     *  -300  agagcaccga TTGCCCCATCCGGAAA gtactatttc
CYK2-2     *  -278  gaaagtacta TTTCCCTTTTGGGTAA cagcggaccg
CYK2-3     *  -219  aggtatatga TTTCCTCTTTGGGCAA gttGTAAACAA
DBF2-1     *  -370  accaattggt TTTCCGGTCATGGTTA gggctcttct
DBF2-2     *  -249  gcaacccaga TGCCCTTTTTAGGAAA tgtaattatt
HST3-1     *  -167  atgtttgctg TTACCACAAAGGGTAA aacGTCAATAA
KIN3-1     *  -138  tttcattacg TTTCCTAATTAGGTTA aacGTAAATAA
MYO1-1     *  -191  tttcatcatt TAGCCCAAAAGGGTAA ttgcGTAAACAT
PHO3-1     *  -358  tctgcagaga TATCCGAAACAGGTAA atggatgttt
PHO3-2     *  -237  ttaagtgcat ATGCCGTATAAGGGAA actcaaagaa
PHO3-3     *  -144  ttactaaata ATACCAGTTTGGGAAA taGTAAACAG
SWI5-1     *  -320  tttcgtactt TAACCTGTTTAGGAAA aagGTAAACAA  MCB14:348(94)
YGR138C-1     -348  cattgcgcat ACACCCTTTTGAGGTT tcgtactagg
YIL158W-1     -184  ttcgcaatcg CTTCTCAAAAGGGAAA tattttccct
YIL158W-2     -165  agggaaatat TTTCCCTTTTCGGGCG gtggtcgtgt
YLR190W-1     -570  gggttaattt TGTCCCAAACGGGCAA aatataaata
YLR190W-2     -546  aaaatataaa TACCCCTTTCGGGAAA taaactaaaa
YML034W-1     -221  gccctcaaaa TTACTGTTTTAGGAAG ccccctttgt
YML119W-1  *  -195  tttaatatgc TTTCCAGATTAGGAAA gaacataaat
YNL057W-1     -579  tgtcttttat GAGCCTTTTTAGGAGA gctagtattt
YNL057W-2     -251  ctgtgaacgg ATCCTGAATTGGGTTG aatggtgagg
YNL058C-1  *  -207  cttaatatga TTTCCTAAAGCGGGAA ataGTAAACAT
YPL141C-1     -427  caaagccgca CTTCCTAAAAAAGCAA ttGAAAACAA
YPR156C-1     -585  ctcactgatt TCACCCAAACGGGAAA aaggAAAAACAA

SWI4          -453  TTTCCCGTTTAGGAAA  G&D11:1277(97)
CDC46         -154  TTTCCCTTTTAGGAAA  G&D11:1277(97)
GFA1          -221  TTTCCCAAAGAGGAAG  MCB24:348(94)
STE2          -221  TTTCCTAATTGGGTAA gtacatgaTGAAACa  G&D11:1277(97)
FAR1-5        +39   TTGCCTCTTTTGGACA  MCB14:348(94)
FAR1-6        +60   AGGCCAAGATTTGGAG  MCB14:348(94)
CLN3-1        -971  TTTCCCAAATTGGAAA  G&D11:1277(97)
CLN3-2        -680  TTACCCGTTTAGGAAA  G&D11:1277(97)
CDC6-1              TTTCCAGATCAGGAAA  G&D11:1277(97)
CDC6-2              TTACCCACTTAGGAAA  G&D11:1277(97)
STE6-1        -305  tgccATGTAA TTACCTAATAGGGAAA TTACACgctg  G&D11:1277(97)
STE3-1              TTTCCTAATTAGTGTCAATGACA  G&D11:1277(97)
DIT1-1        -37   TATCCTAATTCGGTAA  MCB14:348(94)
PIS1-1        -181  TTTCCCTATTGAGAAA  MCB14:348(94)
PIS1-2        -162  TTTCCGTAATAGGGAT  MCB14:348(94)
PMA1-1        -709  TTTCCTAATGCGGCAC  MCB14:348(94)
MF ㅁ1              CTTCCTAATTAGGCCA  MCB14:348(94)
MF ㅂ1              TTTCCTAATTAGTCCT  MCB14:348(94)
MF ㅃ1              ATTCCTAATTCGGAAA  MCB14:348(94)
MF ㈠2              TCTACCAATGAAGAAA  MCB14:348(94)
BAR1                TTTCCTTTTACGGTAA  MCB14:348(94)
MFa1                TTACCCAAAAAGGAAA  MCB14:348(94)
MFa2                TTACCTATTCGGGAAA  MCB14:348(94)
ASH1-1        -614  ttgccttttt TTACCTAAAAAAGACA catctaactg
ASH1-2        -582  actgattagt TTTCCGTTTTAGGATA ttgacgccaa
EGT2-1        -320  gctctattat TTTCCTAATTCGGACG cgctggctcc
PIR1-1        -616  atattctgcc TTTCCTATTTAGGTAA taattcctcg
PIR1-2        -583  tcgaagccag ACGCCTTTTTCGGCTA cttttttgac
PIR3-1        -468  ctagcgtaag AGACCTTATTCGGAAC cgagcaacca
PIR3-2        -334  agctgtattt TTACCTCATCGGGAAA agttattgca
YBR158W-1     -117  tttggtttaa TATCCCTTTTTGGTTT aatatccatc
YDR055W-1     -382  aaaaccaaag AAACCCAAAAAAGACC acaaagctgg
YNR067C-1     -524  gcgcatatgt TTCCTACTTAAGGTTA taagcataga
YNR067C-2     -262  aggcacgaaa TCTCCCAATTTGGTTA ccaaggaaaa


Table 5. Mcm1 and SFF motifs.
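Many entries in Table 5 place an SFF-like element (for example GTAAACAA) a few bases downstream of the Mcm1 site. The sketch below flags such Mcm1-SFF pairs using the consensus patterns given in the Figure 5 legend. It is a minimal illustration under stated assumptions: only the top strand is scanned, and `max_gap`, the largest allowed spacing between the end of the Mcm1 match and the start of the SFF match, is a hypothetical parameter rather than a value from this study. The function name `mcm1_sff_pairs` is likewise hypothetical.

```python
import re

# Consensus patterns from the Figure 5 legend, as regular expressions:
# Mcm1 = CC, (C/T), three (A/T), two N, (G/A), G;  SFF = GT(C/A)AACA(A/T).
MCM1 = re.compile(r"(?=(CC[CT][AT]{3}[ACGT]{2}[AG]G))")
SFF = re.compile(r"(?=(GT[CA]AACA[AT]))")


def mcm1_sff_pairs(seq: str, max_gap: int = 10) -> list[tuple[int, int]]:
    """Return 0-based (mcm1_start, sff_start) pairs on the top strand.

    An SFF match is paired with an Mcm1 match when it starts between 0 and
    `max_gap` bases after the Mcm1 match ends. `max_gap` is a hypothetical
    spacing limit used for illustration only.
    """
    s = seq.upper()
    mcm1_sites = [(m.start(), m.start() + len(m.group(1))) for m in MCM1.finditer(s)]
    sff_starts = [m.start() for m in SFF.finditer(s)]
    return [
        (start, sff)
        for start, end in mcm1_sites
        for sff in sff_starts
        if 0 <= sff - end <= max_gap
    ]
```

Applied to the ALK1-2 entry written as one string (`cggatcgtctTTGCCCTTTTTGGTAAaacGTAAACAA`), it returns `[(13, 29)]`: the 10-bp consensus matches CCCTTTTTGG inside the listed site, and GTAAACAA begins six bases after that match ends.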





Figure 2. Arrayer: The basic system consists of three servo-motor-powered linear rail tables (Daedal Series 500000) mounted on an anti-vibration table (Newport). The system is controlled by a Galil DMC-1730 controller card (Galil Motion Control, Sunnyvale, CA), which communicates with the Compumotor amplifiers that drive the motors (from Brown's lab).


Figure 3. Yeast genome chip containing 6116 yeast genes, 96 intergenic regions, and many control samples (from Brown's lab).











[Figure 4 image: expression panels labeled Histones and M/G1; x-axis: cell-cycle phases G1, S, G2, M over successive cycles.]

Figure 4. M/G1 and histone clusters (from Brown's lab). Relative gene expression variation may be monitored by taking various mRNA samples at consecutive time points of the cell cycle after G1 release, using either α-factor blocking or elutriation experiments. Whole-genome expression patterns, as represented by the digitized spot image variations, are clustered according to their degree of similarity (using, for instance, a Euclidean distance measure). In these two examples, an M/G1 cluster (24 genes) and an S-phase cluster (8 histone genes) can be clearly identified.
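The clustering step in the Figure 4 legend can be illustrated with a naive agglomerative procedure built on the Euclidean distance the legend mentions. In the sketch below, the average-linkage rule and the stopping criterion (merge until `k` clusters remain) are assumptions chosen for illustration, the function name `average_linkage` is hypothetical, and the toy profiles in the usage note are invented values, not measurements from this study.

```python
import math
from itertools import combinations


def average_linkage(profiles: dict[str, list[float]], k: int) -> list[set[str]]:
    """Merge expression profiles bottom-up until k clusters remain.

    `profiles` maps a gene name to its expression values over the time course.
    The distance between two clusters is the mean Euclidean distance over all
    cross-cluster gene pairs (average linkage). Naive; fine for small inputs.
    """
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    clusters = [{name} for name in profiles]

    def linkage(a: set[str], b: set[str]) -> float:
        total = sum(math.dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # combinations() yields i < j, so deleting index j after the merge is safe.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

With four invented profiles, two peaking at the second and fourth time points (`H1`, `H2`) and two peaking at the first and third (`M1`, `M2`), `average_linkage(profiles, k=2)` groups H1 with H2 and M1 with M2. A correlation-based distance could be substituted for `math.dist` without changing the merge loop.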










Figure 5. Comparison of motif distributions in different clusters. The consensus for each indicated motif is used. The position relative to the ATG start site is marked along the x-axis, and the motif count per sequence in each cluster is shown along the y-axis. (a) Swi5: (A/G)CCAGC and its reverse complement; A6/T6: (A)₆ and (T)₆. (b) SCB: C(A/G)CGAAA; MCB: ACGCGT. (c) Mcm1: CC(C/T)(A/T)₃N₂(G/A)G; Sff: GT(C/A)AACA(A/T).
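The curves in Figure 5 amount to counting consensus matches by position upstream of the ATG and dividing by the number of sequences in a cluster. A minimal sketch of that calculation follows; the 100-bp bin width, the function name `motif_profile`, and the assumption that every promoter string ends immediately before the ATG are illustrative choices, not details from this study. Only the top strand is scanned, so for a motif counted together with its reverse complement (Swi5 in panel a) the reverse-complement pattern would need to be scanned as well.

```python
import re
from collections import Counter


def motif_profile(promoters: list[str], pattern: str, bin_size: int = 100) -> dict[int, float]:
    """Mean number of top-strand motif matches per sequence, by position bin.

    Assumes every promoter string ends immediately before the ATG, so its last
    base is position -1. Each bin is keyed by its 5' boundary: with
    bin_size=100, key -100 covers positions -100..-1 and key -200 covers
    -200..-101.
    """
    if not promoters:
        raise ValueError("need at least one promoter sequence")
    regex = re.compile(f"(?=({pattern}))")  # lookahead: count overlapping matches
    counts = Counter()
    for promoter in promoters:
        n = len(promoter)
        for m in regex.finditer(promoter.upper()):
            pos = m.start() - n  # negative offset of the match's first base
            counts[(pos // bin_size) * bin_size] += 1
    return {b: c / len(promoters) for b, c in sorted(counts.items())}
```

For a two-sequence cluster of 200-bp promoters with one ACGCGT starting at -50 in the first and at -200 in the second, `motif_profile(cluster, "ACGCGT")` gives `{-200: 0.5, -100: 0.5}`; computing this per cluster and plotting the values against bin position reproduces the kind of comparison shown in the figure.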