Research seminar
week 8
Tamás Biró
Humanities Computing
University of Groningen
t.s.biro@rug.nl
Tamás Bíró, RUG, Groningen, NL
This week:
• Background in learning theory
• Niyogi, chapter 2 (3 and 4).
Framework for Learning
• Finite alphabet $\Sigma$.
• Language $L \subseteq \Sigma^*$.
• Grammar $g$, generating language $L_g$.
• Family of grammars $\mathcal{G}$, family of languages $\mathcal{L} = \{L_g \mid g \in \mathcal{G}\}$.
NB: A language is not an input-output mapping here.
Framework for Learning
• Example sentences $s_i$: drawn from the target language $L_t$.
• Set of possible example sentence sequences: $D = (\Sigma^*)^*$.
• Hypothesis languages $h \in H$.
• Learning algorithm $\mathcal{A}$: an effective procedure $\mathcal{A}: D \to H$.
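
As a minimal sketch of the signature $\mathcal{A}: D \to H$, the Python below represents a data sequence as a tuple of strings and a hypothesis as a finite set of strings. The names `Data`, `Hypothesis`, `Learner` and `memorizing_learner` are illustrative assumptions, not part of the slides, and restricting hypotheses to finite languages is a simplification made only to keep the example runnable.

```python
from typing import Callable, FrozenSet, Tuple

Data = Tuple[str, ...]        # one element of D = (Sigma*)*: a finite sequence of sentences
Hypothesis = FrozenSet[str]   # assumption: hypotheses are finite languages
Learner = Callable[[Data], Hypothesis]


def memorizing_learner(data: Data) -> Hypothesis:
    """Conjecture exactly the set of sentences observed so far."""
    return frozenset(data)


sample: Data = ("a", "ab", "a")
print(sorted(memorizing_learner(sample)))  # ['a', 'ab']
```

Any other function with the same signature would also count as a learner in this framework; this one is chosen only because it is the simplest.
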
Framework for Learning
• $d(\cdot,\cdot)$: distance between grammars/languages.
• Criterion of success:
$$\lim_{n \to \infty} d(g_t, h_n) = 0, \quad \text{where } h_n = \mathcal{A}(s_1, \ldots, s_n).$$
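
The criterion can be traced numerically on a toy case. The sketch below assumes that grammars are identified with the finite languages they generate, that the distance is the size of the symmetric difference, and that the learner conjectures the set of sentences seen so far. All three choices are assumptions made for illustration; the slides leave $d$ and $\mathcal{A}$ abstract.

```python
from typing import FrozenSet, List, Sequence


def distance(lang_a: FrozenSet[str], lang_b: FrozenSet[str]) -> int:
    """Assumed distance: number of sentences on which two finite languages differ."""
    return len(lang_a ^ lang_b)


def learner(sentences: Sequence[str]) -> FrozenSet[str]:
    """Hypothetical learner: conjecture the set of sentences seen so far."""
    return frozenset(sentences)


def distance_trace(target: FrozenSet[str], sentences: Sequence[str]) -> List[int]:
    """Return d(g_t, h_n) for n = 1..len(sentences), with h_n = learner(s_1, ..., s_n)."""
    return [distance(target, learner(sentences[:n])) for n in range(1, len(sentences) + 1)]


target = frozenset({"a", "ab", "abb"})
examples = ["a", "ab", "a", "abb", "ab"]
print(distance_trace(target, examples))  # [2, 1, 1, 0, 0]
```

The distance reaches 0 once every sentence of the target has been presented and stays there, which is what the limit condition asks for on this particular data sequence.
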
Approaches to Learning
• Inductive inference / identification in the limit / Gold learning.
• Probably Approximately Correct (PAC) learning.
Identification in the limit
• Text $t$ for language $L$: an infinite sequence $s_1, \ldots, s_n, \ldots$ such that each $s_i \in L$ and every element of $L$ appears at least once in $t$.
• $t_k$: the first $k$ elements of text $t$.
• $t(k) = s_k$.
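
The three notions above can be sketched in Python for a finite, non-empty language. The function names `text_for`, `prefix` and `element` are hypothetical, and cycling through the sorted language is just one of the many possible texts for $L$.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator, List


def text_for(language: Iterable[str]) -> Iterator[str]:
    """Yield an infinite text for a finite, non-empty language by cycling through it."""
    return cycle(sorted(language))


def prefix(language: Iterable[str], k: int) -> List[str]:
    """t_k: the first k elements of the text."""
    return list(islice(text_for(language), k))


def element(language: Iterable[str], k: int) -> str:
    """t(k) = s_k, using the slides' 1-based indexing (k >= 1)."""
    return prefix(language, k)[-1]


L = {"b", "a", "ab"}
print(prefix(L, 5))   # ['a', 'ab', 'b', 'a', 'ab']
print(element(L, 3))  # b
```

Every sentence of the language recurs in this text, so each one appears at least once, as the definition requires.
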
Identification in the limit
• Learning algorithm $\mathcal{A}$ identifies (learns) target $g_t$ on text $t$ in the limit if $\lim_{k \to \infty} d(\mathcal{A}(t_k), g_t) = 0$.
• $\mathcal{A}$ identifies $g_t$ in the limit if it identifies $g_t$ in the limit on all texts of $L_{g_t}$.
• Family $\mathcal{G}$ is identifiable in the limit if there is an algorithm $\mathcal{A}$ that identifies every $g \in \mathcal{G}$ in the limit.
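
To make identifiability of a family concrete, here is a sketch of learning by enumeration on a hypothetical family that is not taken from the slides: grammar $n$ generates $L_n = \{a^k : 1 \le k \le n\}$. The learner returns the first grammar in the enumeration $n = 1, 2, \ldots$ that is consistent with the data seen so far.

```python
from itertools import count
from typing import Sequence


def in_language(n: int, sentence: str) -> bool:
    """Membership in L_n = { 'a' * k : 1 <= k <= n } (hypothetical family)."""
    return 1 <= len(sentence) <= n and set(sentence) == {"a"}


def enumeration_learner(sentences: Sequence[str]) -> int:
    """Return the first grammar index n whose language contains all data seen so far."""
    if any(not s or set(s) != {"a"} for s in sentences):
        raise ValueError("sentence outside every language in the family")
    for n in count(1):
        if all(in_language(n, s) for s in sentences):
            return n


text_prefix = ["a", "aaa", "aa", "a", "aaa", "aa"]  # start of a text for L_3
guesses = [enumeration_learner(text_prefix[:k]) for k in range(1, len(text_prefix) + 1)]
print(guesses)  # [1, 3, 3, 3, 3, 3]
```

On any text for $L_n$ the sentence $a^n$ eventually appears, after which the guess is $n$ and never changes. This family contains only finite languages and not all of them, so it does not meet the conditions of Gold's theorem.
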
Identification in the limit
• Gold's theorem (1967): a family consisting of all finite languages and at least one infinite language is not learnable in the limit.
• Not learnable: regular languages; context-free languages; infinite regular languages.
• Poverty of the Stimulus; nativist arguments.
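
The usual proof idea behind Gold's theorem can be illustrated with a small adversary. This is a sketch under stated assumptions, not a proof: the infinite language is taken to be $\{a^k : k \ge 1\}$, the finite languages are $L_i = \{a, \ldots, a^i\}$, and the names `adversarial_prefix`, `memorizing_learner` and `patience` are hypothetical. Hypotheses are finite sets here, so the code only shows how the text is constructed.

```python
from typing import Callable, FrozenSet, List, Sequence

Learner = Callable[[Sequence[str]], FrozenSet[str]]


def memorizing_learner(sentences: Sequence[str]) -> FrozenSet[str]:
    """Hypothetical learner that identifies every finite language: guess what was seen."""
    return frozenset(sentences)


def adversarial_prefix(learner: Learner, stages: int, patience: int = 1000) -> List[str]:
    """Build the start of a text for { 'a' * k : k >= 1 } that forces `stages` conjectures.

    Stage i keeps presenting sentences of L_i = { 'a', ..., 'a' * i } until the
    learner conjectures exactly L_i, then moves on to the larger language L_{i+1}.
    """
    text: List[str] = []
    for i in range(1, stages + 1):
        finite_lang = ["a" * k for k in range(1, i + 1)]
        target = frozenset(finite_lang)
        steps = 0
        while learner(text) != target:
            if steps >= patience:
                raise RuntimeError(f"learner did not converge on L_{i}")
            text.append(finite_lang[steps % i])
            steps += 1
    return text


print(adversarial_prefix(memorizing_learner, 3))  # ['a', 'a', 'aa', 'a', 'aa', 'aaa']
```

Continued without bound, the construction yields a text for the infinite language on which the learner changes its conjecture at every stage. A learner that succeeds on all finite languages must finish every stage, so it cannot also converge on the infinite one.
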
PAC Learning
Probably Approximately Correct Learning
(Vapnik and Chervonenkis 1971)
• “Probably”: on “almost every” sequence of data.
• “Approximately correct”: the algorithm gets close enough to the target.
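
One common way to make these two glosses precise is sketched below in the notation of the earlier slides. It is an assumption about the intended formulation, not a quotation from the seminar: sentences are taken to be drawn independently from a distribution $P$, and $\epsilon$ (accuracy) and $\delta$ (confidence) are parameters introduced here for illustration.

```latex
\[
\forall \epsilon > 0,\ \forall \delta > 0,\ \exists\, m(\epsilon,\delta)\ \text{such that}\ \forall n \ge m(\epsilon,\delta):\quad
\Pr_{s_1,\ldots,s_n \sim P}\bigl[\, d\bigl(g_t, \mathcal{A}(s_1,\ldots,s_n)\bigr) > \epsilon \,\bigr] < \delta .
\]
```

Here "approximately correct" corresponds to $d \le \epsilon$ and "probably" to the event having probability at least $1 - \delta$. Fuller PAC definitions additionally require the bound $m(\epsilon,\delta)$ to hold uniformly over targets and distributions.
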
PAC Learning
Results:
• PAC unlearnable: all finite languages; regular languages; context-free languages.
Complexity of learning
• Speed of convergence.