Abstract

Machine Learning Techniques to Identify Putative
Genes Involved in Nitrogen Catabolite Repression
in the Yeast Saccharomyces cerevisiae*

Kevin Kontos(1,**), Patrice Godard(2), Bruno André(2), Jacques van Helden(3), and Gianluca Bontempi(1)

(1) Machine Learning Group, Université Libre de Bruxelles (ULB), CP 212, Boulevard du Triomphe, 1050 Bruxelles, Belgium
(2) Physiologie Moléculaire de la Cellule, IBMM, ULB, Rue des Pr. Jeener et Brachet 12, 6041 Gosselies, Belgium
(3) Conformation des Macromolécules Biologiques, ULB, CP 263, Boulevard du Triomphe, 1050 Bruxelles, Belgium

We present a machine learning approach where the identification of putative genes involved in nitrogen catabolite repression (NCR) in the yeast Saccharomyces cerevisiae is formulated as a supervised classification problem.
Classifiers are built to discriminate NCR from non-NCR genes on the basis of various properties related to the GATA motif in their upstream non-coding sequences. The training sets are composed of annotated NCR and non-NCR genes. Different classifiers are compared, including naïve Bayes (NB), k-nearest neighbors (KNN), and support vector machines (SVMs). Given the high dimensionality of the data, we use variable selection techniques (both filter and wrapper approaches) to improve the performance of the classifiers.
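
As an illustration only, the kind of motif-based feature and classifier described above can be sketched in a few lines of Python. Everything specific here is an assumption and not taken from the study: the motif is matched as the literal string "GATA", the feature vector is just the motif count on each strand, the classifier is a bare k-nearest-neighbors vote, and the training points are made up.

```python
from collections import Counter

# Assumption: the motif is matched as the literal string "GATA"; the
# abstract does not give the exact pattern or the feature definitions.
MOTIF = "GATA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq, pattern):
    """Count occurrences of pattern in seq, allowing overlaps."""
    last_start = len(seq) - len(pattern)
    return sum(seq.startswith(pattern, i) for i in range(last_start + 1))


def motif_features(upstream):
    """Hypothetical feature vector: (forward-strand count, reverse-strand count)."""
    upstream = upstream.upper()
    forward = count_overlapping(upstream, MOTIF)
    reverse = count_overlapping(upstream, reverse_complement(MOTIF))
    return (forward, reverse)


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to query."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training points, for illustration only.
TOY_TRAIN = [
    ((3, 2), "NCR"), ((4, 1), "NCR"), ((2, 3), "NCR"),
    ((0, 0), "non-NCR"), ((1, 0), "non-NCR"), ((0, 1), "non-NCR"),
]

if __name__ == "__main__":
    features = motif_features("ttGATAaaTATCggGATA")
    print(features, knn_predict(TOY_TRAIN, features))
```

A realistic feature set would be much larger, which is where the filter and wrapper variable selection mentioned above would come in; this sketch does not attempt that step.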
The proposed approach is inspired by the one presented in [1]. It goes a step further in three respects: it is not restricted to counts of pattern occurrences in the upstream non-coding sequences; it uses a negative training set, which avoids random gene selections and thereby makes the approach less computationally expensive; and it compares different classifiers.
The approach is evaluated by comparing the inferred NCR genes with sets of genes identified as NCR-responding in three genome-wide experimental and bioinformatics studies. SVMs seem to perform best, independently of the feature selection method. However, all classifiers were able to detect a significant number of the genes identified in these three studies.
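
One common way to judge whether the overlap between a predicted gene set and a reference set is larger than expected by chance is a hypergeometric tail probability. The abstract does not say which test was used, so the sketch below is an assumption about how such a comparison could be carried out; the function name and all numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric variable X.

    population: total number of genes considered
    successes:  genes in the reference (e.g. NCR-responding) set
    draws:      genes predicted by the classifier
    observed:   predicted genes that are also in the reference set
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(observed, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the call.
    print(hypergeom_tail(population=6000, successes=100, draws=50, observed=10))
```

A small tail probability would indicate that the predicted set shares more genes with the reference set than random selection would be expected to produce.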

References

1. Simonis, N., Wodak, S.J., Cohen, G.N., van Helden, J.: Combining pattern discovery and discriminant analysis to predict gene co-regulation. Bioinformatics 20 (2004) 2370–2379

* This work was partially supported by the Communauté Française de Belgique under ARC grant no. 04/09–307.
** Presenter; poster presentation preferred.