Investigation of Transcription Factors in the Coprinopsis cinerea Genome




Samuel Chapman

Thesis submitted, August 2010, to the Faculty of the School of Informatics and Computing, Indiana University, Bloomington, in partial fulfillment of the requirements for the degree

Master of Science in Bioinformatics

Primary Advisor: Sun Kim










Accepted by the faculty of the School of Informatics and Computing, Indiana University,
in partial fulfillment of the requirements for the degree Master of Science.


Master of Science Candidate: __________________________________________

Capstone Instructor: _________________________________________________

Advisor (s): ________________________________________________________







Final Project Approval

Person                              Signature                              Date

Primary Advisor __________________________________________

Secondary Advisor ________________________________________

Capstone Instructor ________________________________________

Program Director __________________________________________

Graduate Records Administrator ______________________________

Associate Dean ____________________________________________

Month and Year ___________________












Investigation of Transcription Factors in the Coprinopsis cinerea Genome



Samuel Chapman

Advisor: Sun Kim, Indiana University

2010



Abstract



The genome of the mushroom Coprinopsis cinerea has recently been fully sequenced, but many of its genes are not yet annotated. Recent work has produced a time-series microarray dataset that spans the process of meiosis in the mushroom, with the expression levels of all genes determined at several points in this process. The current project used this data to identify the transcription factors involved in meiosis and to more fully understand the transcriptional network of C. cinerea. Genomic searches were also used to confirm transcription factor candidates, and position-specific scoring matrices outlined possible transcription factor binding sites (targets) for the candidate transcription factors. The project did not find many candidates, and another procedure, such as one that uses dynamic Bayesian networks, may be needed.


Introduction



The homobasidiomycete Coprinopsis cinerea (Coprinus cinereus) is an inky-cap mushroom that is well-suited as a model organism. Its life cycle lasts around two weeks (2), which makes studies involving reproduction feasible. In addition, the mushroom readily responds to light/dark cues for its reproductive timing (1), which makes control and study of reproductive processes such as meiosis even easier. Perhaps most significantly, at least in terms of meiotic studies, the fruiting bodies that contain meiotic tissue engage in synchronous meiosis (3); that is, the separate cells undergoing meiosis tend to all be at the same stage of meiosis at the same time.


The reproductive characteristics of C. cinerea allow researchers to easily perform a variety of studies. For example, the effect of mutants and gene knockouts on meiosis can be examined. Furthermore, microarray technology can be used to measure the level of expression of the various genes during meiosis, because enough cells undergo meiosis at a particular stage to perform this analysis (1).



One method of analyzing gene expression during a particular time period such as meiosis is to measure the amount of RNA in the cells at different time points. This is referred to as a time-course analysis. A well-known example of this process involved the analysis of microarray expression data for most of the genes of Saccharomyces cerevisiae during meiosis (4). In this study, researchers measured the RNA expression of the genes at several time points, instead of just one, and found that genes were differentially expressed at different times during meiosis.


This project builds on the results of one such microarray time-course study performed on meiotic tissue of C. cinerea (5). Even though the genome of C. cinerea has been sequenced, few of its genes have been characterized so far (6). The study’s results were used as a backbone for the broader goal of predicting transcription factors and their targets in C. cinerea, especially those involved in meiosis.



Procedure/Results



Background Study



The project in (5) involved a microarray survey of the genes of C. cinerea during meiosis. The genome of C. cinerea was recently sequenced (6), and the genomic data was gathered from that project. Using the ArrayOligoSelector method from (7), 70-bp (base pair) probes were designed for each of the approximately 12,500 genes of the mushroom, along with several hundred ESTs and repeated sequences, for a total of 13,230 probes (some genes had more than one probe).


The microarray probes were used to collect RNA (gene) expression data at six time points over 15 hours during meiosis. The time points were at three-hour intervals, starting three hours before karyogamy, or meiotic DNA replication. Each time-point sample was compared to a reference sample that consisted of a mix of the last four time points; furthermore, each sample had four replicates. The unit used for each replicate/reference expression value was the normalized log2 M value.


Significance Analysis of Microarrays software (8), using a false-discovery rate (FDR) of 10%, was used to determine which genes had differential expression. With this FDR cutoff, 2851, or about 20%, of the probes exhibited differential expression. The genes (probes) that had differential expression were grouped into nine clusters by a K-means clustering algorithm with Pearson correlation as the similarity measure. This produced clusters of genes whose expression varied together. For each time point for each gene, the four replicate values were condensed to a single value, using a type of average.
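
The two operations described above, condensing replicates and comparing expression profiles by Pearson correlation, can be sketched in Python as follows. This is only an illustration: the arithmetic mean is an assumption (the study specifies only "a type of average"), the function names and values are hypothetical, and it is not the code used in (5).

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def condense_replicates(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Average the replicate values at each time point (assumed: arithmetic mean)."""
    return [mean(values) for values in replicates]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equally long expression profiles."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


# Hypothetical log2 M values: two time points, four replicates each.
profile = condense_replicates([[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]])
```

A K-means procedure built on this similarity measure would group profiles whose correlation with a cluster centroid is high; the clustering step itself is not shown here.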



Identification of transcription factor binding sites



In order to identify transcription factors present in C. cinerea, it was necessary to identify target sites on the genome where those putative transcription factors might bind. Using the resources at the Broad Institute website (6), the 1000-bp upstream regions of all the genes were downloaded. The number of genes present in this file was 13,192. While transcription factors can bind to sites farther upstream from a gene, or even in intergenic regions, the 1000-bp upstream regions were judged a good place to start.


To test these upstream regions for possible transcription factor binding sites (TFBSs), a program called Patser was used (9). Patser receives as input a position-specific scoring matrix (PSSM) and a genetic sequence. The PSSM in this case is a matrix of a certain length that gives weights for all nucleotides at every position along that length. The output is a score for each position of the sequence; each position actually represents a window, as long as the PSSM, around that position. The higher the score at a position, the greater the similarity between the PSSM and the window around that position.
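
To make the idea of window scoring concrete, here is a minimal Python sketch of sliding a PSSM along a sequence. It is not Patser's implementation: the three-position matrix, its weights, and the function name are hypothetical, and details such as how Patser derives its weights or handles the reverse strand are left out.

```python
from typing import Dict, List

# Hypothetical 3-position PSSM: one dict of nucleotide weights per position.
EXAMPLE_PSSM: List[Dict[str, float]] = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]


def score_windows(sequence: str, pssm: List[Dict[str, float]]) -> List[float]:
    """Return one score per window of len(pssm) bases, scanning left to right."""
    width = len(pssm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the weight of the observed base at each position of the window.
        scores.append(sum(pssm[i][base] for i, base in enumerate(window)))
    return scores


# The window "ACG" matches the matrix exactly and gets the highest score.
scores = score_windows("TACGT", EXAMPLE_PSSM)
```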


The TRANSFAC database (10) provided the PSSMs that were compared to the genes (in the context of running Patser, “gene” refers to the 1000-bp upstream sequences). There were 815 PSSMs used. A PSSM did not necessarily represent the TFBS for one single transcription factor (TF); it could represent contributions from multiple different TFs across more than one organism. For instance, in the description of TFBS naming, the database gives this example: “V$CREB_Q2 is a matrix constructed of CREB binding sites of quality 2 or better.” (11) Also, even though the organism being studied was a fungus, the TFBSs taken from TRANSFAC covered organisms across very diverse lineages, including bacteria, fungi, vertebrates, and others.


Patser recognizes only a very specific file format, and can analyze only one gene/TF combination at a time. Therefore, Matlab and Perl scripting were used to put the input into the right format, and to automate the process so that many gene/TF combinations could be scored in one run. Also, because the total number of scores was so high (13,192 genes * 1000 bp per gene * 815 TFs = 10,751,480,000 scores), each TF/gene combination was reduced to a single, new score. This new score, also called the number of “hits”, represented the number of positions on the sequence that had a score above a predefined cutoff. Unless otherwise noted, any future references to the score of a gene/TF combination refer to the number of hits. After some experimentation, a cutoff score of six was chosen, as a compromise between losing TFBS information and letting the data become noisy. By using a single score for each combination, the number of pieces of information was reduced by a factor of 1,000, which dramatically cut down on memory requirements. Perl scripting was used to summarize the number of hits and to delete the original score files.
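
The reduction from per-position scores to a single hit count, and the arithmetic behind the factor of 1,000, can be sketched as follows. The project used Perl for this step; the Python below is only an illustration, and the function name is hypothetical.

```python
from typing import Iterable


def count_hits(scores: Iterable[float], cutoff: float = 6.0) -> int:
    """Number of positions whose score is strictly above the cutoff."""
    return sum(1 for score in scores if score > cutoff)


# Counts taken from the text above.
n_genes, upstream_bp, n_pssms = 13_192, 1_000, 815

# One score per position, per gene, per PSSM.
total_position_scores = n_genes * upstream_bp * n_pssms

# One hit count per gene/PSSM combination.
total_hit_counts = n_genes * n_pssms
```

Dividing `total_position_scores` by `total_hit_counts` gives 1,000, the length of each upstream region.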


Identification of putative transcription factors


After the TFBS scores for the upstream region of each gene were obtained, the next step was to identify possible TFs in C. cinerea. The TRANSFAC database was also used for this part. To identify potential TFs in the mushroom, representative TRANSFAC TFs were BLASTed against the C. cinerea genome. As mentioned earlier, each of the 815 TRANSFAC PSSMs (i.e. transcription factors) used could actually represent one or more transcription factors over different species. Therefore, usually only one or two of the representative TFs were BLASTed against the mushroom genome. Often, but not always, the representative TFs chosen were those that were closest in evolutionary distance to C. cinerea. For instance, when dealing with vertebrate TFs, the representative TF was often taken from Danio rerio, the zebrafish, since it was considered closer in evolutionary distance to the mushroom than, say, a mammal might be.

The steps of the BLAST search were as follows. First, a representative TF was chosen for each of the TRANSFAC TFs. Then, the TF sequence was obtained from the NCBI website (12). Usually the amino acid sequence was chosen, but sometimes only the nucleotide sequence was chosen, or both. When the amino acid sequence had more than one variant, one or more were chosen. Because the process was done “by hand” and because there was no way of telling in advance which sequence would have a BLAST match with the C. cinerea genome, there were no hard and fast rules as to whether the nucleotide or protein sequence, or which protein sequence variant, was chosen.

Next, each of these representative sequences was BLASTed against the C. cinerea genome, using the specific NCBI link outlined in (13). This was done so that the entire database consisting of all sequenced genomes would not have to be searched. Usually, only blastx and blastp were used, because often only the amino acid sequences were used as input. If it was felt that more searching was needed, an additional BLAST search, such as tblastn, was used.




For each of the 815 TFs, all C. cinerea genes that had BLAST matches with a score better than e-25 were saved (it was possible for one of the TRANSFAC TFs to correspond to more than one C. cinerea gene, and vice versa). These were considered the possible TF genes in C. cinerea. This produced a total of 174 candidate TF genes in the mushroom. However, to make the BLAST search more accurate, each candidate TF from C. cinerea was in turn BLASTed against the nonredundant database available on NCBI. As before, the BLAST searches were mostly blastp and blastx. The matches for each candidate C. cinerea gene were examined to see if any of them were annotated transcription factors. If there was at least one such match with a score of at least 100 bits (about e-80), then that C. cinerea candidate could move on to further scrutiny. Otherwise, the candidate was discarded. A match that was not annotated as a transcription factor, even if it had a score much higher than 100 bits, was not considered. Occasionally, exceptions were made for matches below this cutoff when it was felt this was warranted, so as not to lose any important information.

At the end of this process, 47 candidate TFs were discovered in C. cinerea.
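
The two filtering rules above were applied by hand in this project. As a rough sketch of the same logic, the Python below assumes BLAST results have already been parsed into simple records. The `BlastHit` class, the gene and sequence names, and the keyword test for "annotated as a transcription factor" are all hypothetical, and "better than e-25" is read here as an E-value below 1e-25.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BlastHit:
    query: str          # sequence used as the BLAST query
    subject: str        # matched sequence
    evalue: float       # BLAST E-value (smaller is better)
    bit_score: float    # BLAST bit score (larger is better)
    description: str = ""


def forward_candidates(hits: Iterable[BlastHit], max_evalue: float = 1e-25) -> List[str]:
    """C. cinerea genes matched by any representative TF below the E-value cutoff."""
    return sorted({hit.subject for hit in hits if hit.evalue < max_evalue})


def passes_reciprocal(hits: Iterable[BlastHit], min_bits: float = 100.0,
                      keyword: str = "transcription factor") -> bool:
    """True if at least one strong match is annotated as a transcription factor."""
    return any(hit.bit_score >= min_bits and keyword in hit.description.lower()
               for hit in hits)


# Hypothetical forward search: two representative TFs against the genome.
forward = [
    BlastHit("TF_A", "gene_1", 1e-40, 160.0),
    BlastHit("TF_A", "gene_2", 1e-10, 60.0),
    BlastHit("TF_B", "gene_1", 1e-30, 120.0),
]
# Hypothetical reciprocal search of gene_1 against the nonredundant database.
reciprocal = [
    BlastHit("gene_1", "nr_1", 1e-90, 300.0, "hypothetical protein"),
    BlastHit("gene_1", "nr_2", 1e-35, 130.0, "MADS-box transcription factor"),
]
```

Note that `nr_1` scores higher than `nr_2` but does not count, because it carries no transcription factor annotation; this mirrors the rule described above.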


Connection of candidate TFs to their potential TFBSs


Once the 47 candidate TFs were obtained, the next step was to connect these C. cinerea genes to their potential TFBSs from the Patser scoring. Recall that the Patser scoring was done on the PSSMs for the 815 TRANSFAC TFs, not the C. cinerea genes. However, each of the 47 candidate TFs found in the mushroom was connected to one or more of the 815 TFs. Therefore, it was possible to connect each candidate TF to the genes for which it might have a TFBS by substituting the 815 TRANSFAC TFs with their corresponding candidate C. cinerea TFs. This was done by first putting the information gathered so far into Microsoft Excel, and then using Microsoft Access and query statements that transformed the information into the desired format (MS Access loaded the information from Excel). TRANSFAC TFs that had no C. cinerea matches were not considered.
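
The substitution described above is essentially a join between two tables. The project did this with Excel and Access queries; the Python below is only a sketch of the same idea. The pairings of genes 311 and 8510 with their TRANSFAC matrices come from Figure 3, while the target genes, hit counts, and function name are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def link_candidates_to_targets(
    matrix_to_candidates: Dict[str, Set[str]],
    hit_counts: Dict[Tuple[str, str], int],
) -> List[Tuple[str, str, str, int]]:
    """Join TRANSFAC matrices to candidate TFs and keep targets with at least one hit.

    Returns sorted rows of (candidate TF, TRANSFAC matrix, target gene, hits).
    """
    rows = []
    for (matrix_id, target_gene), n_hits in hit_counts.items():
        if n_hits == 0:
            continue  # no predicted binding site on this target
        # Matrices without a C. cinerea match contribute no rows.
        for candidate in matrix_to_candidates.get(matrix_id, set()):
            rows.append((candidate, matrix_id, target_gene, n_hits))
    return sorted(rows)


matrix_to_candidates = {
    "V~NKX22_01": {"311"},
    "P~EMBP1_Q2": {"311"},
    "V~MEF2_01": {"8510"},
}
# Hypothetical hit counts keyed by (TRANSFAC matrix, target gene).
hit_counts = {
    ("V~NKX22_01", "12931"): 3,
    ("V~MEF2_01", "723"): 0,
    ("V~MEF2_01", "4585"): 2,
    ("V$CREB_Q2", "4585"): 5,
}
links = link_candidates_to_targets(matrix_to_candidates, hit_counts)
```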


Finally, the candidate TFs and their TFBSs were compared to the microarray data from (5). Since only about 20% of the genes in the mushroom were differentially expressed, it was natural to guess that not all of the 47 candidate TFs would be differentially expressed. Indeed, out of the 47, only 11 were found to belong to the nine clusters of differentially expressed genes. Therefore, the final genes that were studied were the 11 candidate TFs and their associated candidate TFBS genes.


It is important to emphasize that there was not a one-to-one correspondence between putative TFs and TRANSFAC TFs, or between TRANSFAC TFs and the cluster genes (i.e. genes that contain the TFBSs). For example, a TRANSFAC TF can correspond to more than one putative TF, or vice versa. Also, there is overlap between the three kinds of genes. Figure 1 illustrates this situation.







Figure 1. There was overlap between TRANSFAC TFs, the putative TFs, and the cluster genes representing possible TFBSs for TFs. Also, there was not a one-to-one correspondence between any two of the classes.



Using microarray data to confirm possible relationships between TFs and TFBSs



Now that the possible relationships between the candidate TF genes in C. cinerea and their candidate target genes had been established, the next step was to use the microarray data to confirm them. Linear regression analysis was used to see if there was a relationship between the changes in TF expression and the expression of their targets. For each cluster, the regression analysis was performed on each valid combination of time points. For example, a TF at time point 1 could affect the expression of cluster genes at time points 1, 2, 3, 4, 5, and 6; a TF at time point 4 could affect cluster genes at only time points 4, 5, and 6. For each cluster, there were 21 such combinations that were tested.
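
The count of 21 follows from pairing each TF time point with every cluster time point at or after it (6 + 5 + 4 + 3 + 2 + 1). A short Python sketch of the enumeration, with hypothetical variable names:

```python
N_TIME_POINTS = 6

# (TF time point, cluster time point) pairs where the cluster time point
# is the same as or later than the TF time point.
combinations = [
    (tf_time, cluster_time)
    for tf_time in range(1, N_TIME_POINTS + 1)
    for cluster_time in range(tf_time, N_TIME_POINTS + 1)
]
```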


The regression analysis was performed in the following manner. Each cluster had a certain number of putative TFs that were included in the cluster. These TFs were each associated with a TRANSFAC TF that had Patser TFBS scores for each cluster gene. To set up the regression for a particular TF/cluster time point combination, a matrix was created to conform to the linear regression form $y_i = x_{1i} + x_{2i} + \dots + x_{ji}$, where the y values were the expression levels of each gene in the cluster, and the x values were the expression levels of the putative TFs (which represented their corresponding TRANSFAC TFs that were scored against the cluster genes). Figure 2 shows an example of the setup for a particular time point combination, and explains how the matrix worked.



Cluster gene  Cluster gene expression (y_i)  TRANSFAC match x_1i  TRANSFAC match x_2i  TRANSFAC match x_3i
12931         .45                            0                    -1.2                 .07
723           .111                           0                    -1.2                 0
4585          1.5                            2.5                  -1.2                 .07

Figure 2. Linear regression matrix for a particular TF/cluster time point combination. The cluster gene expression value, $y_i$, is simply the expression level of the cluster gene in question. The TRANSFAC match values, $x_{1i}$, …, represent any of the 815 TRANSFAC TFs that were in the cluster (as determined by looking at the putative TFs). If the Patser score was zero for a particular TRANSFAC/cluster gene combination, the value in the cell was zero. Otherwise, a particular x column would have the same expression number, as seen in the matrix above.
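
The rule in the caption of Figure 2 can be sketched in Python as below. The expression values reproduce Figure 2; the hit counts are hypothetical (only whether each one is zero matters), and the function name is an assumption. The project itself built and fitted this matrix in Matlab.

```python
from typing import List, Sequence


def build_design_matrix(
    hit_counts: Sequence[Sequence[int]],
    tf_expression: Sequence[float],
) -> List[List[float]]:
    """One row per cluster gene, one column per TRANSFAC match.

    A cell holds the TF's expression level if the gene has at least one
    TFBS hit for that TF, and zero otherwise.
    """
    return [
        [expr if hits > 0 else 0.0 for hits, expr in zip(row, tf_expression)]
        for row in hit_counts
    ]


# Expression of the three TFs at the chosen TF time point (values from Figure 2).
tf_expression = [2.5, -1.2, 0.07]
# Hypothetical hit counts for cluster genes 12931, 723 and 4585.
hit_counts = [[0, 3, 1], [0, 2, 0], [4, 1, 2]]
# Cluster gene expression at the chosen cluster time point (values from Figure 2).
y = [0.45, 0.111, 1.5]

X = build_design_matrix(hit_counts, tf_expression)
```

The resulting `X` and `y` could then be passed to any least-squares routine that reports coefficient p values; that fitting step is not shown here.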






After running the linear regression in Matlab for each TF/cluster time point combination, a p value was obtained for the coefficient of each TRANSFAC TF match; a low p value was taken to indicate that the TF was involved in the expression change for that TF/cluster time point combination. Any combination that had a low p value was examined to see whether the change in TF expression at that time effected a noticeable change in the cluster genes on average (i.e. a ratio of the expression level changes was calculated). Figure 3 shows the results of this work.


Cluster  TF    TRANSFAC match  TF/Cluster time point  P value  Cluster change/TF change
3        311   P~EMBP1_Q2      1/1                    .0714    -.0814
3        311   V~NKX22_01      1/4                    .0264    .0546
3        311   V~NKX22_01      2/4                    .0264    .0566
3        311   V~NKX22_01      3/4                    .0264    .118
3        311   V~NKX22_01      4/4                    .0264    8.9476
5        8510  V~MEF2_01       1/3                    .0266    .1809
5        8510  V~MEF2_01       1/4                    .0166    -.1256
5        8510  V~MEF2_01       2/3                    .0266    .2009
5        8510  V~MEF2_01       2/4                    .0166    -.1395
5        8510  V~MEF2_01       3/3                    .0266    .1205
5        8510  V~MEF2_01       3/4                    .0166    -.0837
5        8510  V~MEF2_01       4/4                    .0166    -.1777

Figure 3. Clusters 3 and 5 had low p values.


In addition, there was the question of whether more TFBS hits for a TF on a cluster gene translated to a greater expression level change. To see if this was true, cluster time points associated with low p values from the previous regression were displayed on a graph, with the x-axis representing the number of TFBS hits for the displayed TF and the y-axis representing the expression change. A linear trendline along with an R² value was added to get an idea of the effect. Figures 4-7 show the results of this work. Note that only cluster gene expression levels are represented; the expression levels of the TFs at the time points are not.
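
For reference, a linear trendline and its R² value can be computed as in the Python sketch below, which uses ordinary least squares on (hits, expression change) pairs. The function name and the example data are hypothetical; how the graphs in this project were actually produced is not stated.

```python
from statistics import mean
from typing import Sequence, Tuple


def trendline_with_r2(hits: Sequence[float],
                      change: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line through (hits, change) and its R² value.

    Assumes both inputs vary, so neither sum of squares is zero.
    """
    mean_x, mean_y = mean(hits), mean(change)
    sxx = sum((x - mean_x) ** 2 for x in hits)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hits, change))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(hits, change))
    ss_tot = sum((y - mean_y) ** 2 for y in change)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical, perfectly linear data: expression change rises by 2 per extra hit.
slope, intercept, r2 = trendline_with_r2([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
```

An R² near zero means the number of hits explains little of the variation in expression change.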







Figure 4.




Figure 5.







Figure 6.




Figure 7.



Comparison of C. cinerea TFs to those of S. cerevisiae and S. pombe






Finally, the 47 candidate TFs of C. cinerea were compared using blastp to the genomes of Saccharomyces cerevisiae and Schizosaccharomyces pombe, to see if any similarity in the TFs existed between the three organisms. Since these were candidate TFs only, and because there was no real metric to test whether the similarity was actually significant, this particular comparison was not rigorous. However, it did serve to give a general idea of whether or not the candidate TFs found in C. cinerea might also be found in the two other organisms. The cutoff score used for the BLAST search was around e-30. Figure 8 shows the results of this comparison.



            Only in C. cinerea  In C. cinerea and S. pombe  In C. cinerea and S. cerevisiae  In all three
Number      12                  10                          5                                20
% of total  26%                 21%                         11%                              43%


Figure 8. Matches of the 47 C. cinerea candidate TFs found among S. cerevisiae and S. pombe.
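
The percentages in Figure 8 follow directly from the counts. A small Python check, with hypothetical variable names:

```python
# Counts from Figure 8.
counts = {
    "Only in C. cinerea": 12,
    "In C. cinerea and S. pombe": 10,
    "In C. cinerea and S. cerevisiae": 5,
    "In all three": 20,
}

total = sum(counts.values())

# Percentage of the candidate TFs in each category, rounded to whole numbers.
percentages = {name: round(100 * n / total) for name, n in counts.items()}
```

Because each value is rounded separately, the rounded percentages sum to 101 rather than 100.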


Discussion



From the data gathered, gene 311 seems to be involved in increasing expression: at the fourth time point, there was a dramatic average change in cluster gene expression per TF change of 8.9476. However, as one can see from Figure 5, it appears that the number of TFBS hits on a cluster gene does not affect the average expression. As one can see from Figure 3, gene 8510 may be involved in both increasing and decreasing expression. This is bolstered by the fact that the trendline in Figure 6 shows a tendency for expression to increase the more TFBS hits a cluster gene has, while the trendline in Figure 7 shows a tendency for expression to decrease. Both of these genes probably bear looking at in the laboratory as possible transcription factors involved in meiosis.


The graphs that looked at the effect of multiple TFBS hits on cluster gene expression were fairly simple, in the sense that they did not account for the actual expression of the candidate TFs, only the cluster gene expression and the number of hits. A more sophisticated statistical analysis may have helped, but in the end, there were so few candidates that the data was very sparse.


The comparison of C. cinerea, S. cerevisiae, and S. pombe showed a reasonably even distribution of the TFs among the three organisms. Some were only present in C. cinerea, others were present in C. cinerea and another of the organisms, and others were present in all three. While this survey was not rigorous, it at least demonstrates that C. cinerea does share both overlapping and non-overlapping TFs with the other two model organisms.


This project did not yield many actionable results. Out of more than 13,000 genes, only 47 TF candidates were found, and out of those 47, only two were shown, based on the microarray data and regression analysis, to have a clear correlation with expression levels. Part of the reason for this is that the microarray data covered only meiosis, so any TFs in C. cinerea that work in other situations would not be noticed. However, the biggest reason is probably that the procedure itself was flawed.


What was wrong with the procedure, and how could it have been changed to reveal more of the transcription network in the mushroom? First of all, the threshold used in Patser to determine the number of hits for a TFBS may have been too low. While a threshold of six ensured that little information was lost, many of the genes of the mushroom had TFBS matches to many of the TRANSFAC PSSMs. The “noisiness” of the data may have contributed to the small number of candidate TFs with low p values (because the regression depended on whether or not a cluster gene had a TFBS hit for a TF). Also, only the 1000 bp upstream of each gene was tested; perhaps a greater distance upstream, such as 5000 bp, or even intergenic regions, needed to be tested. However, the TFBS scoring problem only affected the project at the later stages, when there were already only 11 candidate TFs, which is very low. The number of candidates itself needed to be increased.


The original number of candidates, 47 (11 of which were in the gene clusters), was arrived at by first comparing the C. cinerea genome to the representative TRANSFAC TF sequences. This was sometimes difficult, because only the public TRANSFAC database was available at the time of this project. The public database does not have as much information as the paid database, so some of the TF information from TRANSFAC involved guesswork, and may have been incomplete. The number of candidate TFs may have been much higher than 174 if the full database had been available.


The reciprocal BLAST of the 174 candidates against the nonredundant database also presented some problems. The criterion for testing whether a gene warranted further consideration as a candidate was that its matches in the nonredundant database had to be annotated as TFs. This meant that a match that was perfect, but not annotated as a TF, was not considered. It was often the case that genes in other organisms had near-perfect matches far better than the cutoff, but were not yet characterized. Therefore, it is likely that many TFs were discarded for this reason, and remain to be found.


One method that may have produced better results in both BLAST searches involves using only specific domains of the genes. Often, it was found that only a specific portion of a sequence produced much of the BLAST score. If only those matching domains were compared, the number of candidates might improve.


Because BLASTing candidates against the nonredundant database to find annotated matches is so difficult, it might be useful to simply use a protein function prediction tool on the candidates instead; if the tool indicates that the candidate is a transcription factor, then it can be tested against the microarray data. There are a number of programs that do this, such as the program PFP (14).


It is the author’s opinion that a more radical change in the procedure is needed. A method that looks very promising involves the use of dynamic Bayesian networks. A Bayesian network is a network of variables connected by probabilities in a Bayesian fashion (15). A dynamic Bayesian network, or DBN, connects these variables in a dynamic fashion. For example, in microarray time series data, gene x at time point i might affect gene y at time point j, with probability p. More generally, all the genes (variables) can be seen as interacting with each other across time points with a particular probability. Therefore, using DBNs, one can elucidate the transcription factor/target network in microarray time series data. Some methods, such as (16) and (17), have been developed that build on the framework of (18). These methods involve Matlab and have been tested on real time series microarray data, with promising results.
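
As a rough illustration of what "across time points with a particular probability" means, a common first-order formulation of a DBN is written out below. This is a generic textbook form under stated assumptions, not necessarily the exact model used in (16), (17), or (18): $X_t$ denotes the expression state of all $G$ genes at time point $t$, $X_t^{(g)}$ is the state of gene $g$, and $\mathrm{Pa}(X_t^{(g)})$ is the assumed set of regulators of gene $g$, drawn from the previous time point.

```latex
P(X_1, \dots, X_T) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1}),
\qquad
P(X_t \mid X_{t-1}) = \prod_{g=1}^{G} P\bigl(X_t^{(g)} \mid \mathrm{Pa}(X_t^{(g)})\bigr),
\quad \mathrm{Pa}(X_t^{(g)}) \subseteq X_{t-1}
```

Learning the network then amounts to choosing, for each gene, the parent set that best explains its expression over the time series; a candidate TF would appear as a parent of its targets.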



Whatever the case, the transcriptional network of C. cinerea still needs to be mapped out, and its transcription factors annotated. At the very least, this project has helped to show that certain methods are better than others in this particular task, which should help others who choose to build on this work or do something similar.





References



1. Kues, U. Life history and developmental processes in the basidiomycete Coprinus cinereus. Microbiology and Molecular Biology Reviews. 2000, 64:316-353.

2. Moore, D., and P. J. Pukkila. Coprinus cinereus: an ideal organism for studies of genetics and developmental biology. J. Biol. Educ. 1985, 19:31-40.

3. Lu, B.C. Meiosis in Coprinus lagopus: a comparative study with light and electron microscopy. Journal of Cell Sci. 1967, 2:529-536.

4. Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I. The transcriptional program of sporulation in budding yeast. Science. 1998, 282:699-705.

5. Burns, C., Stajich, J.E. et al. The meiotic transcriptome of the basidiomycete Coprinopsis cinerea (Coprinus cinereus). 2009. Manuscript.

6. Coprinus cinereus Sequencing Project. Broad Institute of MIT and Harvard (www.broad.mit.edu).

7. Bozdech Z, Zhu JC, Joachimiak MP, Cohen FE, Pulliam B, DeRisi JL. Expression profiling of the schizont and trophozoite stages of Plasmodium falciparum with a long-oligonucleotide microarray. Genome Biology. 2003, 4.

8. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America. 2001, 98:5116-5121.

9. Patser program by Gerald Hertz, hertz@colorado.edu.

10. TRANSFAC database available at www.biobase-international.com.

11. Ibid. http://www.gene-regulation.com/pub/databases/transfac/doc/matrix1SM.html

12. NCBI website available at http://www.ncbi.nlm.nih.gov/

13. Ibid. http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi

14. PFP program available at http://kiharalab.org/web/pfp.php

15. Heckerman, D. A tutorial on learning with Bayesian networks. 1995.

16. Zou, M. and Suzanne D. Conzen. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics. 2005, 21:1, 71-79.

17. Shermin, A. and Mehmet A. Orgun. Analysis of microarray data to infer transcription regulation in the yeast cell cycle. International Journal of Functional Informatics and Personalised Medicine. 2010, 3:1, 73-88.

18. Murphy, K and S. Mian. Modeling gene expression data using dynamic Bayesian networks. 1999, Technical Report, Computer Science Division, University of California, Berkeley, CA.













Acknowledgements


Claire Burns: Microarray data, introduction to the project, a great deal of help throughout

Jim Costello: Introduction to the project

Mehmet Dalkilic: Help with general capstone work

Linda Hostetter: Help with general capstone work

Sun Kim: A great deal of help throughout the project as my advisor

Bob Konicek: Computer system questions

Meng Li: Program questions

Seungyoon Nam: Program questions

Bernard Shen: A lot of help with Perl scripting in dealing with the data

Miriam Zolan: Introduction to the project, help with beginning stages of project

Sun Kim’s lab group

Pedja Radivojac’s lab group


Thanks, everyone. It means a lot to me!