Overview of Machine Learning for NLP Tasks: Part I
(based partly on slides by Kevin Small and Scott Yih)

Goals of Introduction
- Frame specific natural language processing (NLP) tasks as machine learning problems
- Provide an overview of a general machine learning system architecture
- Introduce a common terminology
- Identify typical needs of an ML system
- Describe some specific aspects of our tool suite with respect to the general architecture
- Build some intuition for using the tools
The focus here is on supervised learning.

Overview
1. Some Sample NLP Problems
2. Solving Problems with Supervised Learning
3. Framing NLP Problems as Supervised Learning Tasks
4. Preprocessing: Cleaning Up and Enriching Text
5. Machine Learning System Architecture
6. Feature Extraction Using FEX

Context Sensitive Spelling [2]
A word-level tagging task:

  I would like a peace of cake for desert.
  I would like a piece of cake for dessert.

  In principal, we can use the solution to the duel problem.
  In principle, we can use the solution to the dual problem.

Part of Speech (POS) Tagging
Another word-level task:

  Allen Iverson is an inconsistent player. While he can shoot very well, some nights he will score only a few points.

  (NNP Allen) (NNP Iverson) (VBZ is) (DT an) (JJ inconsistent) (NN player) (. .)
  (IN While) (PRP he) (MD can) (VB shoot) (RB very) (RB well) (, ,) (DT some)
  (NNS nights) (PRP he) (MD will) (VB score) (RB only) (DT a) (JJ few) (NNS points) (. .)

Phrase Tagging
Named Entity Recognition, a phrase-level task:

  After receiving his M.B.A. from Harvard Business School, Richard F. America accepted a faculty position at the McDonough School of Business (Georgetown University) in Washington.

  After receiving his [MISC M.B.A.] from [ORG Harvard Business School], [PER Richard F. America] accepted a faculty position at the [ORG McDonough School of Business] ([ORG Georgetown University]) in [LOC Washington].

Some Other Tasks
- Text Categorization
- Word Sense Disambiguation
- Shallow Parsing
- Semantic Role Labeling
- Preposition Identification
- Question Classification
- Spam Filtering
- ...

Supervised Learning / SNoW

Learning Mapping Functions
- Binary Classification
- Multi-class Classification
- Ranking
- Regression

- {Feature, Instance, Input} Space: the space used to describe each instance; often R^d, {0,1}^d, or N^d
- Output Space: the space of possible output labels; very dependent on the problem
- Hypothesis Space: the space of functions that can be selected by the machine learning algorithm; algorithm dependent (obviously)

Multi-class Classification [3,4]
- One Versus All (OvA)
- Constraint Classification
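
To make the OvA reduction concrete, here is a minimal Python sketch (an illustration, not SNoW's implementation): one sparse weight vector per class, winner-take-all prediction, and a mistake-driven update. All names and parameter values are assumptions made for this example.

    # Minimal One-Versus-All sketch over sparse binary features (illustrative only).
    from collections import defaultdict

    class OvAClassifier:
        def __init__(self, labels):
            # one sparse weight vector per class
            self.weights = {y: defaultdict(float) for y in labels}

        def score(self, y, features):
            return sum(self.weights[y][f] for f in features)

        def predict(self, features):
            # winner-take-all over the per-class scores
            return max(self.weights, key=lambda y: self.score(y, features))

        def update(self, features, gold, rate=1.0):
            pred = self.predict(features)
            if pred != gold:                       # mistake-driven update
                for f in features:
                    self.weights[gold][f] += rate
                    self.weights[pred][f] -= rate

    clf = OvAClassifier(["to", "too", "two"])
    x = {"w[for,-1]", "w[more,+1]"}               # context of "for __ more hours"
    for _ in range(3):
        clf.update(x, gold="two")
    print(clf.predict(x))                         # -> 'two'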

Online Learning [5]
- SNoW algorithms include Winnow and Perceptron
- Learning algorithms are mistake-driven
- Search for a linear discriminant along the function gradient (unconstrained optimization)
- Provides the best hypothesis using the data presented up to the present example
- Learning rate determines convergence:
  - Too small and it will take forever
  - Too large and it will not converge
A sketch of the two update rules follows.
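
The bullets above can be made concrete with a small sketch of the two mistake-driven update rules, assuming sparse binary features and a +1/-1 label: the additive rule corresponds to Perceptron, the multiplicative promotion/demotion rule to Winnow. Parameter names and default values here are illustrative, not SNoW's actual options.

    # Mistake-driven online updates on sparse binary features (illustrative sketch).
    from collections import defaultdict

    def perceptron_update(w, features, y, rate=0.1):
        """Additive rule: y is +1/-1; update only when the hypothesis errs."""
        score = sum(w[f] for f in features)
        if y * score <= 0:                        # mistake (or zero margin)
            for f in features:
                w[f] += rate * y

    def winnow_update(w, features, y, alpha=1.5, theta=1.0):
        """Multiplicative promotion/demotion against threshold theta."""
        score = sum(w[f] for f in features)
        pred = 1 if score >= theta else -1
        if pred != y:                             # mistake-driven
            factor = alpha if y == 1 else 1.0 / alpha
            for f in features:
                w[f] *= factor

    w = defaultdict(lambda: 1.0)                  # Winnow starts from positive weights
    winnow_update(w, {"w[too,-1]", "t[VB,+1]"}, y=-1)   # demotes the two active weights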

Framing NLP Problems as Supervised Learning Tasks

Defining Learning Problems [6]
ML algorithms are mathematical formalisms, and problems must be modeled accordingly.
- Feature Space: the space used to describe each instance; often R^d, {0,1}^d, or N^d
- Output Space: the space of possible output labels, e.g.
  - the set of Part-of-Speech tags
  - the correctly spelled word (possibly from a confusion set)
- Hypothesis Space: the space of functions that can be selected by the machine learning algorithm, e.g.
  - Boolean functions (e.g. decision trees)
  - linear separators in R^d

Context Sensitive Spelling

  Did anybody (else) want too sleep for to more hours this morning?

- Output Space
  - Could use the entire vocabulary: Y = {a, aback, ..., zucchini}
  - Could also use a confusion set: Y = {to, too, two}
- Model as (single-label) multi-class classification
  - The hypothesis space is provided by SNoW
  - We need to define the feature space (a sketch of this framing follows)
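
As an illustration of this framing (not FEX output), the sketch below turns every occurrence of a confusion-set member into a (label, features) example, using the surrounding words as features; the feature-name format only imitates the style used later in the tutorial.

    # Turn each occurrence of a confusion-set word into a (label, features) example.
    CONFUSION_SET = {"to", "too", "two"}

    def spelling_examples(tokens, window=2):
        for i, tok in enumerate(tokens):
            if tok.lower() not in CONFUSION_SET:
                continue
            feats = set()
            for off in range(-window, window + 1):
                j = i + off
                if off != 0 and 0 <= j < len(tokens):
                    feats.add("w[%s,%+d]" % (tokens[j].lower(), off))
            yield tok.lower(), feats           # label = the member actually written

    sentence = "Did anybody else want too sleep for to more hours this morning ?".split()
    for label, feats in spelling_examples(sentence):
        print(label, sorted(feats))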

What are 'feature' and 'feature type', anyway?
- A feature type is any characteristic (relation) you can define over the input representation.
- Example: feature TYPE = word bigrams
    Sentence: The man in the moon eats green cheese.
    Features: [The_man], [man_in], [in_the], [the_moon], ...
- In natural language text, sparseness is often a problem:
  - How many times are we likely to see "the_moon"?
  - How often will it provide useful information?
  - How can we avoid this problem?
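
A short sketch of the bigram feature type above, plus a corpus count that makes the sparseness problem visible; the two-sentence "corpus" is invented purely for illustration.

    # Word-bigram feature TYPE, as in the example above.
    from collections import Counter

    def bigram_features(sentence):
        words = sentence.split()
        return ["[%s_%s]" % (a, b) for a, b in zip(words, words[1:])]

    print(bigram_features("The man in the moon eats green cheese ."))

    # Counting bigrams over even a tiny corpus shows how rare most of them are.
    corpus = ["The man in the moon eats green cheese .",
              "The moon is full tonight ."]
    counts = Counter(f for s in corpus for f in bigram_features(s))
    print(counts.most_common(3))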

Preprocessing: Cleaning Up and Enriching Text
Assuming we start with plain text:

  The quick brown fox jumped over the lazy dog. It landed on Mr. Tibbles, the slow blue cat.

Problems:
- Often, we want to work at the level of sentences and words
  - Where are the sentence boundaries: 'Mr.' vs. 'cat.'?
  - Where are the word boundaries: 'dog.' vs. 'dog'?
- Enriching the text, e.g. POS tagging:
    (DT The) (JJ quick) (NN brown) (NN fox) (VBD jumped) (IN over) (DT the) (JJ lazy) (NN dog) (. .)
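
The boundary problems above are what the downloadable tools on the next slides solve; purely to illustrate why an honorifics list matters, here is a deliberately naive Python splitter (this is not the sentence-boundary.pl / word-splitter.pl code).

    # Naive sentence and word splitting, only to illustrate the boundary problem.
    import re

    HONORIFICS = {"Mr.", "Mrs.", "Dr.", "Prof."}     # illustrative list

    def split_sentences(text):
        sentences, current = [], []
        for tok in text.split():
            current.append(tok)
            if tok.endswith(".") and tok not in HONORIFICS:
                sentences.append(" ".join(current))
                current = []
        if current:
            sentences.append(" ".join(current))
        return sentences

    def split_words(sentence):
        # separate trailing punctuation: 'dog.' -> ['dog', '.']
        return re.findall(r"\w+|[^\w\s]", sentence)

    text = ("The quick brown fox jumped over the lazy dog. "
            "It landed on Mr. Tibbles, the slow blue cat.")
    for s in split_sentences(text):
        print(split_words(s))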

Download Some Tools
http://l2r.cs.uiuc.edu/~cogcomp/  (Software::tools, Software::packages)
- Sentence segmenter
- Word segmenter
- POS tagger
- FEX
NB: RIGHT-CLICK on the "download" link and select "save link as...".

Preprocessing Scripts
http://l2r.cs.uiuc.edu/~cogcomp/
- sentence-boundary.pl:
    ./sentence-boundary.pl -d HONORIFICS -i nyttext.txt -o nytsentence.txt
- word-splitter.pl:
    ./word-splitter.pl nytsentence.txt > nytword.txt
- Invoking the tagger:
    ./tagger -i nytword.txt -o nytpos.txt
- Check the output

Problems Running .pl Scripts?
- Check the first line:
    #!/usr/bin/perl
- Find the perl interpreter on your own machine; e.g. you might need:
    #!/local/bin/perl
- Check file permissions:
    > ls -l sentence-boundary.pl
    > chmod 744 sentence-boundary.pl

Minor Problems with the Install
Possible (system-dependent) compilation errors:
- doesn't recognize 'optarg'
  - POS tagger: change the Makefile in the snow/ subdirectory where indicated
  - sentence-boundary.pl: try 'perl sentence-boundary.pl'
- Link error (POS tagger): the linker can't find -lxnet
  - remove the '-lxnet' entry from the Makefile
- Generally, check the README and Makefile for hints

The System View

A Machine Learning System
(Block diagram.) Raw Text -> Preprocessing -> Formatted Text -> Feature Extraction -> Feature Vectors. Training examples and their labels feed the Machine Learner, which outputs function parameters for the Classifier(s); testing examples are then scored by the Classifier(s), whose labels feed Inference.

Preprocessing Text
- Sentence splitting, word splitting, etc.
- Put data in a form usable for feature extraction

  They recently recovered a small piece of a live Elvis concert recording. He was singing gospel songs, including "Peace in the Valley."

  0 0 0 They
  0 0 1 recently
  0 0 2 recovered
  0 0 3 a
  0 0 4 small
  piece 0 5 piece
  0 0 6 of
  :
  0 1 6 including
  0 1 7 QUOTE
  peace 1 8 Peace
  0 1 9 in
  0 1 10 the
  0 1 11 Valley
  0 1 12 .
  0 1 13 QUOTE
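
A sketch of writing the column layout shown above (label, sentence index, word index, word, with QUOTE standing in for quotation marks). The column meanings are inferred from the slide; the exact conventions expected by FEX's different modes may differ.

    # Write the 4-column preprocessed format: label, sentence index, word index, word.
    import sys

    def write_columns(sentences, target_labels, out):
        for s_id, words in enumerate(sentences):
            for w_id, word in enumerate(words):
                token = "QUOTE" if word in ('"', "``", "''") else word
                label = target_labels.get(word.lower(), "0")
                out.write("%s %d %d %s\n" % (label, s_id, w_id, token))

    sentences = [["They", "recently", "recovered", "a", "small", "piece", "of"],
                 ["including", '"', "Peace", "in", "the", "Valley", ".", '"']]
    target_labels = {"piece": "piece", "peace": "peace"}   # label targets with themselves
    write_columns(sentences, target_labels, sys.stdout)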

A Machine Learning System
(Block diagram, front end only.) Raw Text -> Preprocessing -> Formatted Text -> Feature Extraction -> Feature Vectors.

Feature Extraction with FEX

Feature Extraction with FEX
- FEX (Feature Extraction tool) generates abstract representations of text input
  - Has a number of specialized modes suited to different types of problem
  - Can generate very expressive features
  - Works best when the text is enriched with other knowledge sources, i.e. we need to preprocess the text

  S = I would like a piece of cake too!

- FEX converts the input text into a list of active features:
    1: 1003, 1005, 1101, 1330, ...
  where each numerical feature corresponds to a specific textual feature:
    1: label[piece]
    1003: word[like] BEFORE word[a]
27
Feature Extraction
Converts formatted text
into
feature vectors
„
Lexicon file contains
feature descriptions
0 0 0 They
0 0 1 recently
0 0 2 recovered
0 0 3 a
0 0 4 small
piece 0 5 piece
0 0 6 of
:
0 1 6 including
0 1 7 QUOTE
peace 1 8 Peace
0 1 9 in
0 1 10 the
0 1 11 Valley
0 1 12 .
0 1 13 QUOTE
0, 1001, 1013, 1134, 1175, 1206
1, 1021, 1055, 1085, 1182, 1252
Lexicon
File

Role of FEX

  Why won't you accept the facts?
  No one saw her except the postman.

      | Feature Extraction (FEX)
      v

  1, 1001, 1003, 1004, 1006:   lab[accept], w[you], w[the], w[you*], w[*the]
  2, 1002, 1003, 1005, 1006:   lab[except], w[her], w[the], w[her*], w[*the]

Four Important Files
FEX works with four files:
- Script: 1. controls FEX's behavior; 2. defines the "types" of features
- Corpus: a new representation of the raw text data
- Example: feature vectors for SNoW
- Lexicon: mapping of features to feature ids

Corpus - General Linear Format
- The corpus file contains the preprocessed input, with a single sentence per line.
- When generating examples, FEX never crosses line boundaries.
- The input can be any combination of:
  - 1st form: words separated by white space
  - 2nd form: tag/word pairs in parentheses
- There is a more complicated 3rd form, but it is deprecated in view of an alternative, more general format (covered later). A small parsing sketch for the first two forms follows.
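
A small parsing sketch for the first two forms, assuming one sentence per line and that tag/word pairs look like "(TAG word)"; this is an illustration, not FEX's own reader.

    # Read one corpus line: bare words and/or "(TAG word)" pairs, in any mix.
    import re

    TOKEN = re.compile(r"\(([^\s()]+)\s+([^()]+)\)|(\S+)")

    def parse_line(line):
        tokens = []
        for m in TOKEN.finditer(line):
            tag, word, bare = m.groups()
            tokens.append((bare, None) if bare is not None else (word, tag))
        return tokens

    print(parse_line("(WRB Why) (VBD wo) (NN n't) (PRP you) (VBP accept) (DT the) (NNS facts) (. ?)"))
    print(parse_line("No one saw her except the postman ."))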

Corpus - Context Sensitive Spelling

  Why won't you accept the facts?
  (WRB Why) (VBD wo) (NN n't) (PRP you) (VBP accept) (DT the) (NNS facts) (. ?)

  No one saw her except the postman.
  (DT No) (CD one) (VBD saw) (PRP her) (IN except) (DT the) (NN postman) (. .)

Script - Means of Feature Engineering
- FEX does not decide or find good features.
- Instead, FEX provides you with an easy method to define the feature types, and it extracts the corresponding features from the data.
- Feature engineering is in fact very important in practical learning tasks.

Script - Description of Feature Types
What can be good features? Let's try some combinations of words and tags.
Feature types we have in mind:
- Words around the target word (accept, except)
- POS tags around the target
- Conjunctions of words and POS tags?
- Bigrams or trigrams?
- Include relative locations?

Graphical Representation

  Position:  0    1    2    3    4       5    6      7
  Tag:       WRB  VBD  NN   PRP  VBP     DT   NNS    .
  Word:      Why  won  't   you  accept  the  facts  ?
  Offset:    -4   -3   -2   -1   0       1    2      3
                                 Target

  Window [-2,2]: offsets -2 to +2 around the target (positions 2 through 6).
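
A sketch of what the window picture corresponds to in code: word features in a [-2,2] window around the target, where 'inc' keeps the target's actual spelling instead of the generic '*' place-holder and 'loc' records the relative offset. The output format is illustrative, not FEX's exact notation.

    # Window-based word features around a target position (illustrative sketch).
    def window_features(words, target, left=-2, right=2, inc=False, loc=False):
        feats = []
        for off in range(left, right + 1):
            i = target + off
            if not 0 <= i < len(words):
                continue
            w = words[i] if (off != 0 or inc) else "*"   # '*' = generic place-holder
            feat = "w[%s]" % w
            if loc:
                feat += "@%+d" % off                     # relative location
            feats.append(feat)
        return feats

    words = ["Why", "won", "'t", "you", "accept", "the", "facts", "?"]
    print(window_features(words, target=4, inc=True, loc=True))
    # -> ["w['t]@-2", 'w[you]@-1', 'w[accept]@+0', 'w[the]@+1', 'w[facts]@+2']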

Script - Syntax

  targ [inc] [loc]: RGF [[left-offset, right-offset]]

- targ: the target index
  - If targ is '-1', target file entries are used to identify the targets
  - If no target file is specified, then EVERY word is treated as a target
- inc: use the actual target instead of the generic place-holder ('*')
- loc: include the location of the feature relative to the target
- RGF: defines "types" of features like words, tags, conjunctions, bigrams, trigrams, ..., etc.
- left-offset and right-offset: specify the window range

Basic RGFs - Sensors (1/2)

  Type    Mnemonic  Interpretation                           Example
  Word    w         the word (spelling)                      w[you]
  Tag     t         part-of-speech tag                       t[NNP]
  Vowel   v         active if the word starts with a vowel   v[eager]
  Length  len       length of the word                       len[5]

A sensor is the fundamental method of defining "feature types." It is applied to an element and generates active features.

Basic RGFs - Sensors (2/2)

  Type        Mnemonic  Interpretation                               Example
  City List   isCity    active if the phrase is the name of a city   isCity[Chicago]
  Verb Class  vCls      returns Levin's verb class                   vCls[51.2]

- More sensors can be found by looking at the FEX source (Sensors.h)
- lab: a special RGF that generates labels: lab(w), lab(t), ...
- Sensors are also an elegant way to incorporate our background knowledge. (A small sensor sketch follows.)
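
To show what "applying a sensor to an element" means, here is a small sketch with the word, tag, vowel and length sensors written as Python functions over a (word, tag) element; the real implementations live in Sensors.h, and these are only illustrative.

    # Toy versions of the sensors in the tables above, applied to a (word, tag) element.
    def w(word, tag):
        return "w[%s]" % word

    def t(word, tag):
        return "t[%s]" % tag

    def v(word, tag):
        return "v[%s]" % word if word[:1].lower() in "aeiou" else None

    def length(word, tag):                      # 'len' would shadow the builtin
        return "len[%d]" % len(word)

    SENSORS = [w, t, v, length]

    def active_features(element):
        feats = [sensor(*element) for sensor in SENSORS]
        return [f for f in feats if f is not None]   # keep only active features

    print(active_features(("eager", "JJ")))     # w[eager], t[JJ], v[eager], len[5]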

Complex RGFs
- Existential usage: len(x=3), v(X)
- Conjunction and disjunction: w&t; w|t
- Collocation and sparse collocation:
    coloc(w,w); coloc(w,t,w); coloc(w|t,w|t)
    scoloc(t,t); scoloc(t,w,t); scoloc(w|t,w|t)

(Sparse) Collocation

  Position:  0    1    2    3    4       5    6      7
  Tag:       WRB  VBD  NN   PRP  VBP     DT   NNS    .
  Word:      Why  won  't   you  accept  the  facts  ?
  Offset:    -4   -3   -2   -1   0       1    2      3
                                 Target

  -1 inc: coloc(w,t)[-2,2]
    w['t]-t[PRP], w[you]-t[VBP], w[accept]-t[DT], w[the]-t[NNS]

  -1 inc: scoloc(w,t)[-2,2]
    w['t]-t[PRP], w['t]-t[VBP], w['t]-t[DT], w['t]-t[NNS],
    w[you]-t[VBP], w[you]-t[DT], w[you]-t[NNS],
    w[accept]-t[DT], w[accept]-t[NNS],
    w[the]-t[NNS]
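
The two feature lists above can be reproduced with a short sketch: coloc(w,t) pairs each window position with the next one, while scoloc(w,t) pairs every window position with every later one. This is a reading of the example output, not FEX's code.

    # Collocation vs. sparse collocation over a [-2,2] window (illustrative sketch).
    from itertools import combinations

    def window(target, left, right, n):
        return [i for i in range(target + left, target + right + 1) if 0 <= i < n]

    def coloc_w_t(words, tags, target, left=-2, right=2):
        idx = window(target, left, right, len(words))
        return ["w[%s]-t[%s]" % (words[i], tags[j]) for i, j in zip(idx, idx[1:])]

    def scoloc_w_t(words, tags, target, left=-2, right=2):
        idx = window(target, left, right, len(words))
        return ["w[%s]-t[%s]" % (words[i], tags[j]) for i, j in combinations(idx, 2)]

    words = ["Why", "won", "'t", "you", "accept", "the", "facts", "?"]
    tags  = ["WRB", "VBD", "NN", "PRP", "VBP", "DT", "NNS", "."]
    print(coloc_w_t(words, tags, target=4))    # the 4 adjacent pairs shown above
    print(scoloc_w_t(words, tags, target=4))   # all 10 pairs shown above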

Examples - 2 Scripts
Download the examples from the tutorial page ('context sensitive spelling materials' link):

  accept-except-simple.scr
    -1: lab(w)
    -1: w[-1,1]

  accept-except.scr
    -1: lab(w)
    -1: w|t [-2,2]
    -1 loc: coloc(w|t,w|t) [-3,-3]

Lexicon & Example (1/3)
- Corpus:
    ... (NNS prices) (CC or) (VB accept) (JJR slimmer) (NNS profits) ...
- Script (ae-simple.scr):
    -1 lab(w);
    -1: w[-1,1]
- Lexicon:
    1 label[w[except]]
    2 label[w[accept]]
    1001 w[or]
    1002 w[slimmer]
- Example:
    2, 1001, 1002;
  The '2' is generated by lab(w); '1001, 1002' are generated by w[-1,1].
- Feature indices of lab start from 1; feature indices of regular features start from 1001.
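
A sketch of the numbering scheme described above: labels are assigned ids starting from 1 and regular features from 1001, and an already-seen feature keeps its id (which is exactly why FEX re-reads an existing lexicon file before appending to it, as the next slide explains). The class below is an illustration, not FEX's lexicon code.

    # Toy lexicon: labels numbered from 1, regular features from 1001; ids are reused.
    class Lexicon:
        def __init__(self):
            self.labels, self.features = {}, {}

        def label_id(self, name):
            return self.labels.setdefault(name, len(self.labels) + 1)

        def feature_id(self, name):
            return self.features.setdefault(name, len(self.features) + 1001)

    lex = Lexicon()
    example = [lex.label_id("label[w[accept]]"),
               lex.feature_id("w[or]"),
               lex.feature_id("w[slimmer]")]
    print("%d, %s;" % (example[0], ", ".join(str(i) for i in example[1:])))
    # prints "1, 1001, 1002;" -- in the slide, 'except' was entered first, so
    # label[w[accept]] got id 2 there.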

Lexicon & Example (2/3)
- Target file (fex -t ae.targ ...):
    accept
    except
  We treat only these two words as targets.
- Lexicon file:
  - If the file does not exist, fex will create it.
  - If the file already exists, fex will first read it, and then append the new entries to this file.
  - This is important because we don't want two different feature indices representing the same feature.

Lexicon & Example (3/3)
- Example file:
  - If the file does not exist, fex will create it.
  - If the file already exists, fex will append new examples to it.
  - Only active features and their corresponding lexicon items are generated.
  - If the read-only lexicon option is set, only those features from the lexicon that are present (active) in the current instance are listed.

Now Practice
Change the script, run FEX, and look at the resulting lexicon/examples:

  > ./fex -t ae.targ ae-simple.scr ae-simple.lex short-ae.pos short-ae.ex

Citations
1) F. Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34(1):1-47, 2002.
2) A. R. Golding and D. Roth. A Winnow-Based Approach to Spelling Correction. Machine Learning, 34:107-130, 1999.
3) E. Allwein, R. Schapire, and Y. Singer. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers. Journal of Machine Learning Research, 1:113-141, 2000.
4) S. Har-Peled, D. Roth, and D. Zimak. Constraint Classification: A New Approach to Multiclass Classification. In Proc. 13th Annual Intl. Conf. on Algorithmic Learning Theory, pp. 365-379, 2002.
5) A. Blum. On-Line Algorithms in Machine Learning. 1996.
6) T. Mitchell. Machine Learning. McGraw Hill, 1997.
7) A. Blum. Learning Boolean Functions in an Infinite Attribute Space. Machine Learning, 9(4):373-386, 1992.
8) J. Kivinen and M. Warmuth. The Perceptron Algorithm vs. Winnow: Linear vs. Logarithmic Mistake Bounds when Few Input Variables are Relevant. UCSC-CRL-95-44, 1995.
9) T. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7):1895-1923, 1998.