Abstract




Field:
Mathematics/Applied Mathematics/Computer Science


Session Topic:

Machine Learning and Prediction


Speaker:

Hiroshi Mamitsuka, Kyoto University



Title:
Learning Probabilistic Models for Mining Labeled Ordered Trees


Learning a probabilistic model is a long-standing, highly regarded approach in machine learning. In this approach, a probabilistic model is first designed based on background knowledge of the target application, and then the probabilistic parameters of the model are estimated from given (training) examples. Using the model with its estimated parameters, we can find patterns hidden in the training data and assign a score (or likelihood) to a newly given example. We note that finding frequent patterns is a general and important issue in data mining, and that prediction is a major and useful goal in machine learning.
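
The design-estimate-score cycle described above can be illustrated with a deliberately simple stand-in model. The sketch below is not the model presented in the talk: it assumes a first-order Markov chain over strings, estimates its parameters by smoothed counting, and scores new examples by log-likelihood. The function names, the toy alphabet, the training strings and the pseudocount are all hypothetical choices made for illustration.

```python
import math

def fit_markov_chain(sequences, alphabet, pseudocount=1.0):
    """Estimate start and transition probabilities by smoothed counting.

    Assumes every training sequence is non-empty and uses only
    symbols from `alphabet`.
    """
    start = {a: pseudocount for a in alphabet}
    trans = {a: {b: pseudocount for b in alphabet} for a in alphabet}
    for seq in sequences:
        start[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += 1
    # Normalise the counts into probability distributions.
    start_total = sum(start.values())
    start = {a: c / start_total for a, c in start.items()}
    for a in alphabet:
        row_total = sum(trans[a].values())
        trans[a] = {b: c / row_total for b, c in trans[a].items()}
    return start, trans

def log_likelihood(seq, start, trans):
    """Score a (non-empty) sequence under the fitted chain."""
    ll = math.log(start[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(trans[prev][cur])
    return ll

# Toy training data (hypothetical): strings that mostly alternate a and b.
training = ["abab", "abba", "abab"]
start, trans = fit_markov_chain(training, alphabet="ab")

typical = log_likelihood("abab", start, trans)   # resembles the training data
atypical = log_likelihood("bbbb", start, trans)  # unlike the training data
print(typical > atypical)
```

A sequence that follows the patterns in the training data receives a higher likelihood than one that does not, which is the basis for using the score in prediction.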


In this talk, I'll focus on labeled ordered trees, a typical example of semi-structured data that appear in many applications, including text, the web and molecular biology. For labeled ordered trees, I'll show an example of probabilistic model learning. That is, I'll present an original probabilistic model and its efficient learning scheme for labeled ordered trees.
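
To make the data type concrete, here is a minimal sketch of a labeled ordered tree. It assumes a plain recursive representation in which each node carries a label and a list of children whose order is significant. The class name `Node`, the helper functions and the single-letter labels are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    # Children are kept in a list because their left-to-right order matters.
    children: List["Node"] = field(default_factory=list)

def preorder(node):
    """Yield labels parent-first, visiting children left to right."""
    yield node.label
    for child in node.children:
        yield from preorder(child)

def to_bracket(node):
    """Render the tree in a compact bracket notation."""
    if not node.children:
        return node.label
    inner = " ".join(to_bracket(c) for c in node.children)
    return node.label + "(" + inner + ")"

# Two trees with the same labels and parent-child links,
# differing only in the order of the root's children.
t1 = Node("A", [Node("B", [Node("D")]), Node("C")])
t2 = Node("A", [Node("C"), Node("B", [Node("D")])])

print(to_bracket(t1))
print(to_bracket(t2))
print(t1 == t2)
```

Because sibling order is part of the structure, `t1` and `t2` are different labeled ordered trees even though they would be identical as unordered trees.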


For strings, a hidden Markov model is a general and widely used probabilistic model in many applications such as speech recognition, natural language processing and bioinformatics. In our approach, we extend the hidden Markov model and its standard learning scheme to the mining of labeled ordered trees. In this talk, I will describe the structure of our model for labeled ordered trees and how the model parameters are estimated from given training examples, in comparison with those of a hidden Markov model.
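
For reference, the string baseline mentioned above can be sketched as follows. This is the standard forward algorithm for a hidden Markov model over strings, not the tree model of the talk. The two hidden states, the symbols and all probabilities are hypothetical values chosen only to make the example runnable.

```python
def forward_likelihood(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) under an HMM by summing over all hidden state paths.

    Assumes `obs` is non-empty. No scaling is applied, so this is only
    suitable for short sequences.
    """
    # alpha[s] = P(symbols seen so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical two-state model over the symbols "a" and "b".
states = ("H", "L")
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit_p = {"H": {"a": 0.8, "b": 0.2}, "L": {"a": 0.2, "b": 0.8}}

p_aa = forward_likelihood("aa", states, start_p, trans_p, emit_p)
p_ab = forward_likelihood("ab", states, start_p, trans_p, emit_p)
print(p_aa, p_ab)
```

In a string, each hidden state depends only on the state at the previous position. A model for labeled ordered trees has to replace this single chain of dependencies with dependencies that follow the tree structure. The standard learning scheme for hidden Markov models (Baum-Welch) is built on forward quantities like `alpha` together with their backward counterparts.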


Finally, I will demonstrate the predictive performance of our approach using synthetic datasets as well as real datasets derived from glycobiology. I will also assess the results obtained on the real data from several biological viewpoints, verifying known facts in glycobiology.



References


1. Managing and Analyzing Carbohydrate Data. Aoki, K. F., Ueda, N., Yamaguchi, A., Akutsu, T., Kanehisa, M. and Mamitsuka, H. ACM SIGMOD Record, 33(2), 33-38, 2004.

2. Application of a New Probabilistic Model for Recognizing Complex Patterns in Glycans. Aoki, K. F., Ueda, N., Yamaguchi, A., Kanehisa, M., Akutsu, T. and Mamitsuka, H. Proceedings of the Twelfth International Conference on Intelligent Systems for Molecular Biology (ISMB/ECCB 2004), (Bioinformatics, 20, Supplement 1, i6-i14), Glasgow, UK, August, 2004, Oxford University Press.


3. Probabilistic Model for Mining Labeled Ordered Trees: Capturing Patterns in Carbohydrate Sugar Chains. Ueda, N., Aoki-Kinoshita, K. F., Yamaguchi, A., Akutsu, T. and Mamitsuka, H. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1051-1064, 2005.

4. ProfilePSTMM: Capturing Tree-structure Motifs in Carbohydrate Sugar Chains. Aoki, K. F., Ueda, N., Mamitsuka, H. and Kanehisa, M. Proceedings of the Fourteenth International Conference on Intelligent Systems for Molecular Biology (ISMB 2006), (Bioinformatics, 22(14), e25-e34), Fortaleza, Brazil, August, 2006, Oxford University Press.

5. A New Efficient Probabilistic Model for Mining Labeled Ordered Trees. Hashimoto, K., Aoki-Kinoshita, K. F., Ueda, N., Kanehisa, M. and Mamitsuka, H. Proceedings of the Twelfth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (KDD 2006), Philadelphia, PA, USA, August 2006, ACM Press.