Fast and Accurate Part-of-Speech Tagging: The SVM Approach Revisited


Jesús Giménez and Lluís Màrquez
TALP Research Center, LSI Department
Universitat Politècnica de Catalunya
Jordi Girona Salgado, Barcelona
{jgimenez,lluism}@lsi.upc.es
Abstract

In this paper we present a very simple and effective part-of-speech tagger based on Support Vector Machines (SVM). Simplicity and efficiency are achieved by working with linear separators in the primal formulation of SVM and by using a greedy left-to-right tagging scheme. By means of a rigorous experimental evaluation we conclude that the proposed SVM-based tagger is robust and flexible for feature modelling (including lexicalization), trains efficiently with almost no parameters to tune, and is able to tag thousands of words per second, which makes it really practical for real NLP applications. Regarding accuracy, the SVM-based tagger significantly outperforms the TnT tagger under exactly the same conditions, and achieves a very competitive accuracy on the WSJ corpus, comparable to that of the best taggers reported up to date.

1 Introduction

Automatic part-of-speech (POS) tagging is the task of determining the morphosyntactic category of each word in a given sentence. It is a very well-known problem that has been addressed by many researchers for at least the last two decades. It is a fundamental problem in the sense that almost all NLP applications need some kind of POS tagging previous to the construction of more complex analyses, and it is permanently in fashion since current applications demand an efficient treatment of more and more quantities of possibly multilingual text.

In the recent literature we can find several approaches to POS tagging based on statistical and machine learning techniques, including, among many others: Hidden Markov Models (Weischedel et al., 1993; Brants, 2000), Maximum Entropy taggers (Ratnaparkhi, 1996), Transformation-based learning (Brill, 1995), Memory-based learning (Daelemans et al., 1996), Decision Trees (Màrquez and Rodríguez, 1997), AdaBoost (Abney et al., 1999), and Support Vector Machines (Nakagawa et al., 2001). Most of the previous taggers have been evaluated on the English WSJ corpus, using the Penn Treebank set of POS categories and a lexicon constructed directly from the annotated corpus. Although the evaluations were performed with slight variations, there was a consensus in the late nineties about the state-of-the-art accuracy attainable for English POS tagging in this setting.

In recent years, the most successful and popular taggers in the NLP community have been the HMM-based TnT tagger (Brants, 2000), the Transformation-based learning (TBL) tagger (Brill, 1995), and several variants of the Maximum Entropy (ME) approach (Ratnaparkhi, 1996). In our opinion, TnT is an example of a really practical tagger for NLP applications: it is available to anybody, simple and easy to use, considerably accurate, and extremely efficient, allowing training from million-word corpora in just a few seconds and tagging thousands of words per second. In the case of the TBL and ME approaches, their success has been due to the flexibility they offer in modelling contextual information, ME being slightly more accurate than TBL.

Far from being considered a closed problem, several researchers have tried to improve results on the POS tagging task during the last years: some of them by allowing richer and more complex HMM models (Thede and Harper, 1999; Lee et al., 2000), others by enriching the feature set of a ME tagger (Toutanova and Manning, 2000), and others by using more effective learning techniques, such as SVMs (Nakagawa et al., 2001) and a Voted Perceptron-based training of a ME model (Collins, 2002). These more complex taggers raised the state-of-the-art accuracy on the same WSJ corpus. In a complementary direction, other researchers suggested the combination of several pre-existing taggers under several alternative voting schemes (Brill and Wu, 1998; van Halteren et al., 1998; Màrquez et al., 1999). Although the accuracy of these ensembles of taggers is even better, they are undeniably more complex and less efficient.

In this paper we suggest going back to the TnT philosophy of simplicity and efficiency, while achieving state-of-the-art accuracy within the SVM learning framework.
We claim that the SVM-based tagger introduced in this work fulfils the requirements for being a practical tagger and achieves a very good balance of the following properties. Simplicity: the tagger is easy to use and has few parameters to tune. Flexibility and robustness: rich context features can be efficiently handled without overfitting problems, allowing lexicalization. High accuracy: the SVM-based tagger performs significantly better than TnT and achieves an accuracy competitive with the best current taggers. Efficiency: training on the WSJ corpus is performed in around one CPU hour, and the tagging speed allows a massive processing of texts.

It is worth noting that the Support Vector Machines (SVM) paradigm has already been applied to tagging in a previous work (Nakagawa et al., 2001), with a focus on the guessing of unknown-word categories. The final tagger constructed in that paper gave clear evidence that the SVM approach is specially appropriate for the second and third previous points, the main drawback being a low efficiency: the reported tagging speed is of only a few words per second. In the present paper we overcome this limitation by working with linear kernels in the primal setting of the SVM framework, taking advantage of the extreme sparsity of the example vectors. The resulting tagger is almost as accurate as that of (Nakagawa et al., 2001) but many times faster, in a preliminary prototype implemented in Perl.

The rest of the paper is organized as follows. In Section 2 the formal SVM learning setting is presented. Section 3 is devoted to explaining the details of our approach to tagging. Section 4 describes the experimental work carried out in order to validate the presented SVM tagger. Section 5 includes some discussion of the presented approach in comparison to other related work and, finally, Section 6 concludes and outlines the directions of future research.

2 Support Vector Machines

SVM is a machine learning algorithm for binary classification which has been successfully applied to a number of practical problems, including NLP (Cristianini and Shawe-Taylor, 2000).

Let $\{(x_1, y_1), \ldots, (x_N, y_N)\}$ be the set of $N$ training examples, where each instance $x_i$ is a vector in $\mathbb{R}^m$ and $y_i \in \{-1, +1\}$ is the class label. In their basic form, SVMs learn a linear hyperplane that separates the set of positive examples from the set of negative examples with maximal margin (the margin is defined as the distance of the hyperplane to the nearest of the positive and negative examples). This learning bias has proved to have good properties in terms of generalization bounds for the induced classifiers.

The linear separator is defined by two elements: a weight vector $w$, with one component for each feature, and a bias $b$, which stands for the distance of the hyperplane to the origin. The classification rule of a SVM is $\mathrm{sgn}(f(x; w, b))$, with $f(x; w, b) = \langle w \cdot x \rangle + b$, where $x$ is the example to be classified. In the linearly separable case, learning the maximal margin hyperplane $(w, b)$ can be stated as a convex quadratic optimization problem with a unique solution: minimize $\|w\|$, subject to the constraints (one for each training example) $y_i (\langle w \cdot x_i \rangle + b) \geq 1$.

The SVM model has an equivalent dual formulation, characterized by a weight vector $\alpha$ and a bias $b$. In this case, $\alpha$ contains one weight for each training vector, indicating the importance of this vector in the solution. Vectors with non-null weights are called support vectors. The dual classification rule is $f(x; \alpha, b) = \sum_{i=1}^{N} y_i \alpha_i \langle x_i \cdot x \rangle + b$, and the $\alpha$ vector can also be calculated as a quadratic optimization problem. Given the optimal $\alpha^*$ vector of the dual quadratic optimization problem, the weight vector $w^*$ that realizes the maximal margin hyperplane is calculated as

$$w^* = \sum_{i=1}^{N} y_i \alpha_i^* x_i \qquad (1)$$

The $b^*$ has also a simple expression in terms of $w^*$ and the training examples $\{(x_i, y_i)\}_{i=1}^{N}$. See (Cristianini and Shawe-Taylor, 2000) for details.

The advantage of the dual formulation is that it permits an efficient learning of non-linear SVM separators, by introducing kernel functions. Technically, a kernel function calculates a dot product between two vectors that have been (non-linearly) mapped into a high-dimensional feature space. Since there is no need to perform this mapping explicitly, the training is still feasible, although the dimension of the real feature space can be very high or even infinite.

In the presence of outliers or wrongly classified training examples, it may be useful to allow some training errors in order to avoid overfitting. This is achieved by a variant of the optimization problem, referred to as soft margin, in which the contribution to the objective function of margin maximization and training errors can be balanced through the use of a parameter called $C$.

Table 1: Feature patterns used to codify examples. The patterns include word features, POS features, ambiguity classes, and "may be" features over the single positions of the context window, together with word bigrams and trigrams and POS bigrams and trigrams.
3 Problem Setting

In this section the details about our approach to POS tagging, regarding the collection and feature codification of training examples, are presented.

3.1 Binarizing the Classification Problem

Tagging a word in context is a multi-class classification problem. Since SVMs are binary classifiers, a binarization of the problem must be performed before applying them. We have applied a simple one-per-class binarization, i.e., a SVM is trained for every part-of-speech in order to distinguish between examples of this class and all the rest. When tagging a word, the most confident tag according to the predictions of all the binary SVMs is selected.

However, not all training examples have been considered for all classes. Instead, a dictionary is extracted from the training corpus with all the possible tags for each word and, when considering the occurrence of a training word w tagged as t_i, this example is used as a positive example for class t_i and as a negative example only for the other classes t_j appearing as possible tags for w in the dictionary. In this way we avoid the generation of excessive and irrelevant negative examples and we make the training step faster. (See (Abney et al., 1999) for a discussion of the efficiency problems that arise when learning from large POS training sets.) In the following sections we will see how, in this way, a million-word corpus generates per-class training sets that are much smaller, on average, than the full set of word occurrences.

3.2 Feature Codification

Each example has been codified on the basis of the local context of the word to be disambiguated. We have considered a centered window of tokens, in which some basic and n-gram patterns are evaluated to form binary features such as "previous word is the", "two preceding tags are DT NN", etc. Table 1 contains the list of all the patterns considered.

As can be seen, the tagger is lexicalized and all the word forms appearing in the window are taken into account. Since a very simple left-to-right tagging scheme will be used, only the tags of the following words are not known at running time. Following the approach of (Daelemans et al., 1996), we use the more general ambiguity-class tag for the right-context words, which is a label composed by the concatenation of all the possible tags for the word (e.g., the concatenation JJ-NN for a word that can be either an adjective or a noun). Each of the individual tags of an ambiguity class is also taken as a binary feature of the form "following word may be a VBZ". Therefore, with the ambiguity-class and may-be features we avoid the two-passes solution proposed in (Nakagawa et al., 2001), in which a first tagging is performed in order to have the right contexts disambiguated for the second pass. Also in (Nakagawa et al., 2001) it is suggested that explicit n-gram features are not necessary in the SVM approach, because polynomial kernels account for the combination of features. However, since we are interested in working with a linear kernel, we have included them in the feature set. In Section 4 we will evaluate the importance of this kind of features.

4 Experiments

This section presents the experiments carried out in order to evaluate the SVM approach to POS tagging. As in many other works, the Wall Street Journal data from the Penn Treebank III have been used as the benchmark corpus. We have randomly divided, at the sentence level, this million-word corpus into three subsets: training, validation, and test. All the tagging experiments reported are evaluated on the complete test set. The validation set has been used to optimize parameters.

The Penn Treebank tagset contains 45 tags. However, after compiling the training examples in the way explained in Section 3, only the ambiguous tags receive both positive and negative examples. Thus, only a subset of the SVM classifiers has to be trained in the binarized setting. The unambiguous tags correspond to punctuation marks, symbols, and the categories TO and WP.
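The codification and binarization steps of Section 3 can be sketched as follows. This is an illustrative subset of the Table 1 patterns with a hypothetical window and feature naming, not the paper's exact configuration: left-context tags are already assigned (greedy left-to-right), while right-context words contribute their ambiguity class and "may be" tags.

```python
def codify(words, tags, dictionary, i):
    """Sparse binary features for position i (illustrative subset of Table 1)."""
    feats = set()
    n = len(words)
    for off in (-2, -1, 0, 1, 2):                       # single-word features
        if 0 <= i + off < n:
            feats.add("w%+d=%s" % (off, words[i + off]))
    for off in (-2, -1):                                # already assigned POS tags
        if i + off >= 0:
            feats.add("p%+d=%s" % (off, tags[i + off]))
    for off in (1, 2):                                  # right context: ambiguity class
        if i + off < n:
            poss = sorted(dictionary.get(words[i + off], ["UNK"]))
            feats.add("a%+d=%s" % (off, "-".join(poss)))
            for t in poss:                              # "may be" features
                feats.add("m%+d=%s" % (off, t))
    if i >= 1:                                          # one n-gram (bigram) pattern
        feats.add("w-1,w0=%s~%s" % (words[i - 1], words[i]))
    return feats

def add_binarized_examples(word, gold_tag, feats, dictionary, datasets):
    """One-per-class binarization restricted to the dictionary: the occurrence
    is positive for its gold tag and negative only for the other tags
    listed for this word, avoiding irrelevant negative examples."""
    for t in dictionary[word]:
        datasets.setdefault(t, []).append((feats, 1 if t == gold_tag else -1))

# Toy usage with a hypothetical three-word dictionary.
dictionary = {"the": ["DT"], "can": ["MD", "NN", "VB"], "run": ["NN", "VB"]}
feats = codify(["the", "can", "run"], ["DT"], dictionary, 1)
assert "w+0=can" in feats and "p-1=DT" in feats and "a+1=NN-VB" in feats
data = {}
add_binarized_examples("can", "NN", feats, dictionary, data)
assert data["NN"][0][1] == 1 and data["MD"][0][1] == -1
```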
Table 2: Accuracy results on ambiguous words of the alternative models, varying the kernel degree (d) and the feature set; "sv" stands for the average number of support vectors per tag, and "l-time" is the CPU time needed for the whole training.

4.1 Linear vs. Polynomial Kernels

The first experiment explores the effect of the kernel on the training process and on the generalization accuracy. In order to do so, we have trained several SVM classification models with the whole training set, varying the degree d of the polynomial kernel. The software package used in all the experiments reported was SVMlight (freely available at http://svmlight.joachims.org). A very simple frequency threshold, common to all the experiments, was used to filter out unfrequent features: in particular, we have discarded the features that occur less than n times, where n is the minimum number such that the total amount of features does not exceed a fixed limit. (More complex methods can be used to filter out irrelevant features; however, the feature selection problem is beyond the scope of this work, in which the simplest alternatives are preferred.) When training, the setting of the C parameter has been left at its default value. In the following subsections we will see how the optimization of this parameter leads to a small improvement in the final tagger, the SVM algorithm being quite robust with respect to parameterization.

Results obtained when classifying the ambiguous words in the test set are presented in Table 2. This test has been performed in batch mode, that is, the examples of ambiguous words are taken separately and the features involving the tags of left contexts are calculated using the correct POS tag assignment of the corpus.

Regarding accuracy, conclusions similar to those of (Nakagawa et al., 2001) can be drawn. When using the set of atomic features, the best results are obtained with a degree-2 polynomial kernel. Greater degrees overfit the training data, since the number of support vectors highly increases and the accuracy decreases. When using the n-gram extended set of features, the linear kernel becomes very competitive compared to the degree-2 polynomial kernel, being clearly preferable regarding the sparsity of the solution and the learning time (several times faster). Interestingly, the advantage of the extended feature set is only noticeable in the case of the linear kernel, since accuracy decreases when it is used with the polynomial kernels.

As a conclusion, we can state that a linear kernel with an n-gram-based extension of the basic features suffices to obtain highly accurate SVM models, while being relatively fast to train. In the next section we will see how the linear solution has the additional advantage of allowing a very sparse vector of weights in the primal setting. This fact is crucial to obtain a fast POS tagger.

4.2 Evaluating the SVM Tagger

Hereafter we will only focus on the linear model. In this experiment the tagger is tested in a more realistic situation, that is, performing a left-to-right tagging of the sequence of words, with on-line calculation of the features, making use of the already assigned left-context POS tags. Following the simplicity and efficiency principles, a greedy left-to-right tagging scheme is applied, in which no optimization of the tag sequence is performed at the sentence level. We have implemented this first tagger prototype in Perl; it will be referred to as SVMtagger. The SVM dual representation, based on the set of support vectors output by SVMlight, is converted into the primal form (w, b) using equation (1) explained in Section 2.

The tagger has been tested under the closed vocabulary assumption, in which no unknown words are allowed. This is simulated by directly including in the dictionary the words of the test set that do not occur in the training set. The results obtained with increasing sizes of the training set are presented in Table 3. Learning and tagging times are also graphically presented in Figure 1. All the experiments were performed under Linux on a Pentium-IV processor; the time figures have been calculated using the Benchmark package of Perl and reflect CPU time.

As could be expected, the accuracy of the tagger grows with the size of the training set, presenting a logarithmic behaviour. Regarding efficiency, it can be observed that the training time is almost linear with respect to the number of examples in the training set. Besides, the compression of the SVM model with respect to the training set increases with the training set size. However, this compression level would not permit an efficient tagger in the dual form, since thousands of dot products would be needed to classify each word. More interestingly, the model in primal form is quite compact, since the weight vectors resulting from compacting all the support vectors contain only a small fraction of the possible features. Provided that test examples are very sparse, the classification rule is very efficient: a single dot product with a sparse vector is needed to classify each word. Basing this dot product on the non-null dimensions of the example to classify, the tagging time can be maintained almost invariant with respect to the training set size.

By optimizing the C parameter of the SVM algorithm, i.e., the trade-off between training error and margin maximization, slightly better results can be obtained. We have optimized the C parameter on the validation set by maximizing accuracy. The tuning is done automatically, by iteratively exploring shorter and shorter intervals of values in which the best accuracies are observed on the validation set; the setting of our algorithm involves the parameters minC, maxC, log, n_iters, and n_segments. The first two, minC and maxC, determine the overall interval to examine; if log is true, the first iteration is approached logarithmically; the last two stand for the total number of iterations and the number of intervals that must be explored at each iteration, respectively. With this value properly set, the accuracy results obtained by SVMtagger using the whole training set improved both on ambiguous words and overall. To have an idea of the quality of these values, we also ran TnT under exactly the same conditions (including the unknown words in the backup lexicon), and the results obtained were significantly lower, both for ambiguous words and overall.

Figure 1: Learning and tagging time plots of the SVM tagger with increasing sizes of the training set.
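The greedy left-to-right scheme described above can be sketched as follows (the interfaces are hypothetical): `models` maps each tag to its primal-form linear separator (w, b), `dictionary` lists the candidate tags of each word, and `codify` builds the sparse feature dict of a position. Each word is decided exactly once, using the already assigned left-context tags, and the most confident binary SVM wins.

```python
def tag_sentence(words, models, dictionary, codify):
    """Greedy left-to-right tagging with one-per-class linear SVMs (sketch)."""
    tags = []
    for i in range(len(words)):
        x = codify(words, tags, i)                 # features of position i
        best_tag, best_score = None, float("-inf")
        for t in dictionary.get(words[i], list(models)):
            w, b = models[t]
            # single sparse dot product over the non-null dimensions of x
            score = sum(w.get(f, 0.0) * v for f, v in x.items()) + b
            if score > best_score:
                best_tag, best_score = t, score
        tags.append(best_tag)
    return tags

# Toy usage with hypothetical models and features.
models = {"DT": ({"w0=the": 1.0}, 0.0),
          "NN": ({"w0=can": 0.5}, 0.0),
          "MD": ({"p-1=DT": -1.0}, 0.0)}
dictionary = {"the": ["DT"], "can": ["MD", "NN"]}
simple_codify = lambda ws, ts, i: {"w0=" + ws[i]: 1.0,
                                   **({"p-1=" + ts[i - 1]: 1.0} if i else {})}
assert tag_sentence(["the", "can"], models, dictionary, simple_codify) == ["DT", "NN"]
```

Because the score is computed only over the non-null dimensions of the example, the per-word cost does not grow with the training set size, which is the key efficiency property discussed above.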
4.3 Including Unknown Words

The tagger results presented in the previous section are still not realistic, since we cannot assume a closed vocabulary. In order to deal with this problem we have developed an SVM-based model to recognize unknown words. Unknown words are treated as ambiguous words with all the possible POS tags corresponding to open classes in the Penn Treebank tagset, but specialized SVMs are learned, with particular features, to disambiguate among the POS tags of unknown words. The approach is similar to that of (Nakagawa et al., 2001), and the features used, which are presented in Table 4, are taken from the following works: (Brill, 1995; Màrquez et al., 1999; Nakagawa et al., 2001).

Training examples for unknown words have been collected from the training set in the following way. First, the training corpus is randomly divided into twenty parts of equal size. Then, the first part is used to extract the examples which do not occur in the remaining nineteen parts, that is, taking the vocabulary of those nineteen parts as known and the rest as unknown. This procedure is repeated with each of the twenty parts. The choice of dividing by twenty is not arbitrary: it is the proportion that results in a percentage of unknown words very similar to that observed in the test set.

Results obtained are presented in Table 5, including the case in which the C parameter has been optimized (the optimal C values found differ for known and for unknown words). As in the previous section, the results obtained by the SVM tagger clearly outperform the results of the TnT tagger, and they are comparable to the accuracy of the best current taggers. Again, the tuning of the C parameter provides a small increment of performance, but at the cost of increasing the training time to several CPU hours.

Following a suggestion by one of the referees, some experiments were replicated on a different partition of the Wall Street Journal corpus, with contiguous sections devoted to training, validation, and test, respectively, in order to compare our work to other related previous ones. With the tuning of the C parameter, the system achieved a token accuracy significantly outperforming TnT. The result is competitive with the one reported in (Collins, 2002), although still a little lower than the one found in (Toutanova et al., 2003); see further details in Section 5.

The Perl implementation of the SVM model achieves a tagging speed of thousands of words per second. Given the type of operations computed by the tagging algorithm, we fairly believe that a reimplementation in C could speed up the tagger, making its efficiency valid for massive text processing. Of course, the TnT tagger is still much more efficient, achieving a higher tagging speed under the same conditions.

Table 3: Accuracy results of the SVM tagger under the closed vocabulary assumption, with increasing sizes of the training corpus. The "amb" and "all" columns contain the accuracy achieved on ambiguous words and overall, respectively; "ex" and "sv" stand for the average number of examples and support vectors per POS tag; "feat" for the total number of binary features after filtering; "x" for the average number of active features in the training examples; and "|w|" for the average number of dimensions of the weight vectors; "l-time" and "t-time" refer to learning and tagging time.

Table 4: Feature templates for unknown words: all the features used for known words (see Table 1); prefixes and suffixes of the word; whether the word begins with an upper-case letter (yes/no); is all upper case (yes/no); is all lower case (yes/no); contains a capital letter not at the beginning (yes/no); contains more than one capital letter not at the beginning (yes/no); contains a period (yes/no); contains a number (yes/no); contains a hyphen (yes/no); and the word length (integer).

Table 5: Accuracy results of the SVM tagger compared to TnT under the open vocabulary assumption. "known" and "unk" refer to the subsets of known and unknown words, respectively; "amb" to the subset of ambiguous known words; and "all" to the overall accuracy.
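The orthographic templates of Table 4 translate naturally into code. The sketch below is illustrative; in particular, the maximum affix length is an assumption of ours, not a value taken from the paper:

```python
def unknown_word_features(word, max_affix=4):
    """Binary features for unknown words, in the spirit of Table 4."""
    f = set()
    if word[:1].isupper():
        f.add("begins_upper")
    if word.isupper():
        f.add("all_upper")
    if word.islower():
        f.add("all_lower")
    if any(c.isupper() for c in word[1:]):
        f.add("inner_capital")            # capital letter not at the beginning
    if "." in word:
        f.add("has_period")
    if any(c.isdigit() for c in word):
        f.add("has_number")
    if "-" in word:
        f.add("has_hyphen")
    f.add("length=%d" % len(word))
    for k in range(1, min(max_affix, len(word)) + 1):
        f.add("prefix=" + word[:k])
        f.add("suffix=" + word[-k:])
    return f

feats = unknown_word_features("Anti-viral2")
assert "begins_upper" in feats and "has_hyphen" in feats and "has_number" in feats
```

These features feed the specialized SVMs that disambiguate among the open-class POS tags.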

5 Discussion

The work presented in this paper is closely related to (Nakagawa et al., 2001). In that paper an SVM-based tagger is presented and compared to TnT, training from a one-million-word corpus. Apart from the training set size, the main differences between both approaches are explained below.

Nakagawa's work is focused on tagging unknown words. A certainly ad-hoc procedure is performed to tag the word sequence in two passes: in the first pass the tagger disambiguates the whole sentence, and in the second pass the previously assigned tags are assumed correct in order to extract right-context tag features. This tagging overhead is also projected onto training, since two versions, with and without right-context features, must be trained. Additionally, it is argued that one advantage of SVM is that it is not necessary to codify complex n-gram features, since polynomial kernels themselves succeed at doing this task. Thus, they base their best tagger on a dual solution with polynomial kernels, instead of a linear separator in the primal setting.

Since they are using kernels and SVM classification in the dual setting, their tagger simply cannot be fast at running time: a tagging speed of only a few words per second is reported. Training is also much slower than ours, probably due to the way in which they select the training examples for each POS.

At the time we were preparing this document, another paper was brought to our attention which reports the best results to date on the WSJ corpus with a single tagger. We refer to (Toutanova et al., 2003), in which a tagger based on a cyclic dependency network is presented, achieving a very competitive overall accuracy. It allows to explicitly model left and right context features in the sequence tagging scheme, and it is lexicalized; overfitting in the extremely large induced feature spaces is avoided by model regularization.

The good results of that tagger are partly due to its outstanding recognition of unknown words. However, the treatment of unknown words is a bit tricky, including the use of a company-name entity detector and several ad-hoc features based on the observation of the errors committed by the tagger. This fact may cause the POS tagger to be highly WSJ-dependent. Regarding efficiency, few comments are included in the paper: the training time for the best model is said to be of several hours (a number of iterations at some minutes per iteration), while tagging times are not reported.

6 Conclusions and Future Work

In this work we have presented an SVM-based POS tagger suitable for real applications, since it provides a very good balance of several good properties for NLP tools: simplicity, flexibility, high performance, and efficiency. The next step we plan to take is to reimplement the tagger in C to significantly increase efficiency, and to provide a software package for public use. (By now, a prototype version of the tagger is public for demonstration at the following Web address: http://www.lsi.upc.es/~nlp/SVMtagger.html)

Regarding the study of the SVM approach to POS tagging, some issues deserve further investigation. First, the learning model for unknown words experimented with here is preliminary, and we think that it can be clearly improved. Second, we have applied only the simplest greedy left-to-right tagging scheme. Since SVM predictions can be converted into probabilities, a natural extension would be to consider a sentence-level tagging model in which the probability of the whole sentence assignment is maximized.

Finally, we are exploring the possibility of simplifying the models by an a posteriori feature filtering of the weight vector w, a simplification that incides collaterally on the tagging speed and accuracy. First experiments indicate that eliminating the features with the lowest weights does not hurt the performance very much (see Figure 2). Indeed, a very small improvement is obtained when discarding a moderate proportion of the w dimensions, a competitive overall accuracy can still be obtained after discarding a large part of them, and it is not until we discard the vast majority of the w dimensions that accuracy falls down. These observations hold for both the validation and test sets, and surely open the avenue for a further increase of the tagging speed.
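The a posteriori feature filtering just described can be sketched as keeping only the fraction of w dimensions with largest magnitude (a minimal sketch with dict-based sparse vectors; the interface is ours, not the paper's):

```python
def prune_weight_vector(w, keep_fraction):
    """Discard the dimensions of w whose weights are closest to zero,
    keeping only the given fraction with largest magnitude."""
    ranked = sorted(w.items(), key=lambda kv: abs(kv[1]), reverse=True)
    kept = ranked[:max(1, int(len(ranked) * keep_fraction))]
    return dict(kept)

w = {"a": 0.9, "b": -0.5, "c": 0.01, "d": 0.0}
assert prune_weight_vector(w, 0.5) == {"a": 0.9, "b": -0.5}
```

A smaller w directly shortens the sparse dot product performed per word, which is where the tagging-speed gain comes from.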
Figure 2: Token accuracy behaviour on the test set when reducing the size of the weight vectors by discarding those w dimensions whose weights are closest to zero.

Acknowledgements

The authors want to thank the anonymous reviewers for their valuable comments and suggestions for preparing the final version of the paper. This research has been partially funded by the Spanish Ministry of Science and Technology (MCyT, projects HERMES and ALIADO), by the European Commission (IST), and by the Catalan Research Department (CIRIT consolidated research group).

References

S. Abney, R. E. Schapire and Y. Singer. Boosting Applied to Tagging and PP-attachment. In Proceedings of EMNLP/VLC, 1999.

T. Brants. TnT: A Statistical Part-of-Speech Tagger. In Proceedings of the Sixth ANLP Conference, 2000.

E. Brill and J. Wu. Classifier Combination for Improved Lexical Disambiguation. In Proceedings of COLING-ACL, 1998.

E. Brill. Transformation-based Error-driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics, 1995.

M. Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. In Proceedings of the EMNLP Conference, 2002.

N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.

W. Daelemans, J. Zavrel, P. Berck and S. Gillis. MBT: A Memory-Based Part-of-Speech Tagger Generator. In Proceedings of the Workshop on Very Large Corpora, 1996.

H. van Halteren, J. Zavrel and W. Daelemans. Improving Data Driven Wordclass Tagging by System Combination. In Proceedings of COLING-ACL, 1998.

S. Lee, J. Tsujii and H. Rim. Part-of-Speech Tagging Based on Hidden Markov Model Assuming Joint Independence. In Proceedings of the Annual Meeting of the ACL, 2000.

L. Màrquez and H. Rodríguez. Automatically Acquiring a Language Model for POS Tagging Using Decision Trees. In Proceedings of the Second RANLP Conference, 1997.

L. Màrquez, H. Rodríguez, J. Carmona and J. Montolio. Improving POS Tagging Using Machine-Learning Techniques. In Proceedings of EMNLP/VLC, 1999.

T. Nakagawa, T. Kudoh and Y. Matsumoto. Unknown Word Guessing and Part-of-Speech Tagging Using Support Vector Machines. In Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, 2001.

A. Ratnaparkhi. A Maximum Entropy Part-of-Speech Tagger. In Proceedings of the EMNLP Conference, 1996.

S. M. Thede and M. P. Harper. A Second-Order Hidden Markov Model for Part-of-Speech Tagging. In Proceedings of the Annual Meeting of the ACL, 1999.

K. Toutanova and C. D. Manning. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of EMNLP/VLC, 2000.

K. Toutanova, D. Klein and C. D. Manning. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL, 2003.

R. Weischedel, R. Schwartz, J. Palmucci, M. Meteer and L. Ramshaw. Coping with Ambiguity and Unknown Words through Probabilistic Models. Computational Linguistics, 1993.