4 Oct 2013

Machine Learning Techniques to Identify Putative
Genes Involved in Nitrogen Catabolite Repression
in the Yeast Saccharomyces cerevisiae
Kevin Kontos, Patrice Godard, Bruno André, Jacques van Helden, Gianluca Bontempi
Machine Learning Group, Université Libre de Bruxelles (ULB), CP 212, Boulevard du Triomphe, 1050 Bruxelles, Belgium
Physiologie Moléculaire de la Cellule, IBMM, ULB, Rue des Pr. Jeener et Brachet 12, 6041 Gosselies, Belgium
Conformation des Macromolécules Biologiques, ULB, CP 263, Boulevard du Triomphe, 1050 Bruxelles, Belgium
We present a machine learning approach in which the identification of putative
genes involved in nitrogen catabolite repression (NCR) in the yeast Saccharomyces
cerevisiae is formulated as a supervised classification problem.
Classifiers are built to discriminate NCR from non-NCR genes on the basis
of various properties related to the GATA motif in their upstream non-coding
sequences. The training sets are composed of annotated NCR and non-NCR
genes. Different classifiers are compared, including naïve Bayes (NB), k-nearest
neighbors (KNN), and support vector machines (SVMs). Given the high
dimensionality of the data, we use variable selection techniques (both filter
and wrapper approaches) to improve the performance of the classifiers.
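The filter and wrapper strategies mentioned above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation and rests on stated assumptions: the filter score is a signal-to-noise ratio chosen for simplicity, the wrapper is a greedy forward search scored by leave-one-out accuracy, and k-NN stands in for all three classifiers only because it needs no external library. The function names and the toy feature matrix are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: rows are genes, columns are GATA-related features.
# Label 1 = annotated NCR gene, label 0 = non-NCR gene (negative set).
X = [[3, 0.1, 5], [4, 0.9, 2], [5, 0.4, 7], [4, 0.2, 1],
     [0, 0.8, 6], [1, 0.3, 2], [0, 0.5, 7], [1, 0.7, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

def snr_scores(X, y):
    """Filter step (assumed score): |mean_pos - mean_neg| / (sd_pos + sd_neg) per feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
        sp = math.sqrt(sum((v - mp) ** 2 for v in pos) / len(pos))
        sn = math.sqrt(sum((v - mn) ** 2 for v in neg) / len(neg))
        scores.append(abs(mp - mn) / (sp + sn + 1e-9))
    return scores

def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training genes (Euclidean, selected features only)."""
    dists = sorted(
        (math.dist([r[j] for j in features], [x[j] for j in features]), label)
        for r, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of k-NN restricted to the given feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], features, k) == y[i]
    return hits / len(X)

def forward_wrapper(X, y, max_features, k=3):
    """Wrapper step: greedily add the feature that most improves LOO accuracy."""
    selected, remaining, best_acc = [], list(range(len(X[0]))), 0.0
    while remaining and len(selected) < max_features:
        acc, j = max((loo_accuracy(X, y, selected + [j], k), j) for j in remaining)
        if acc <= best_acc:  # stop when no candidate improves accuracy
            break
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc

print(snr_scores(X, y))
print(forward_wrapper(X, y, max_features=2))
```

The filter ranks features once, independently of any classifier, whereas the wrapper re-trains the classifier for every candidate subset; this is why wrappers are usually more expensive but tailored to the chosen classifier.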
The proposed approach is inspired by the one presented in [1]. It goes a
step further in three ways: it is not restricted to counts of pattern occurrences in the
upstream non-coding sequences; it uses a negative training set (avoiding random
gene selections and thereby making the approach less computationally
expensive); and it compares different classifiers.
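Features that go beyond plain motif counts could be extracted along the following lines. The abstract does not list the exact features, so everything here is an assumption: the core motif GATAA (searched on both strands), the 50 bp window used to count closely spaced pairs, the distance to the start codon, and all function names are hypothetical choices for illustration. The upstream sequence is assumed to be given 5' to 3' and to end immediately before the ATG.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def motif_positions(seq, motif="GATAA"):
    """Start positions where the motif or its reverse complement occurs."""
    seq = seq.upper()
    rc = reverse_complement(motif)  # "TTATC" for the assumed GATAA core
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif or window == rc:
            hits.append(i)
    return hits

def gata_features(upstream, motif="GATAA", pair_window=50):
    """Hypothetical feature vector: [count, close consecutive pairs, distance to ATG]."""
    pos = motif_positions(upstream, motif)
    count = len(pos)
    # Consecutive occurrences whose start positions lie within pair_window bp.
    close_pairs = sum(1 for a, b in zip(pos, pos[1:]) if b - a <= pair_window)
    # Distance from the 3'-most occurrence to the end of the upstream region (-1 if none).
    dist_to_atg = len(upstream) - (max(pos) + len(motif)) if pos else -1
    return [count, close_pairs, dist_to_atg]

print(gata_features("CCGATAAGGTTATCAAAA"))
```

Each gene then becomes one row of the feature matrix used for training, with annotated NCR genes labeled positive and genes from the negative training set labeled negative.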
The approach is evaluated by comparing the inferred NCR genes with sets
of genes identified as NCR-responsive in three genome-wide experimental and
bioinformatics studies. SVMs seem to perform best, independently of the feature
selection method. However, all classifiers were able to detect a significant number
of the genes identified in these three studies.
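The abstract does not state how significance of the overlap was assessed. One common choice, assumed here rather than taken from the study, is a one-sided hypergeometric test: given N genes in total, K of them in an experimental NCR set, and n predicted NCR genes, it gives the probability of observing at least k shared genes by chance. The function name is hypothetical.

```python
from math import comb

def overlap_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N total, K marked, n drawn)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K, n <= N")
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Illustrative numbers only, not results from the study.
print(overlap_pvalue(N=100, K=20, n=10, k=6))
```

Summing exact integer counts before dividing avoids accumulating rounding error in the tail.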
[1] Simonis, N., Wodak, S.J., Cohen, G.N., van Helden, J.: Combining pattern discovery
and discriminant analysis to predict gene co-regulation. Bioinformatics 20 (2004)
This work was partially supported by the Communauté Française de Belgique under
ARC grant no. 04/09–307.
Presenter. Poster presentation preferred.