JMLR: Workshop and Conference Proceedings 3:53-64    WCCI 2008 workshop on causality

Feature Ranking Using Linear SVM

Yin-Wen Chang    b92059@csie.ntu.edu.tw
Chih-Jen Lin    cjlin@csie.ntu.edu.tw
Department of Computer Science, National Taiwan University
Taipei 106, Taiwan

Editors: I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, and A. Statnikov
Abstract

Feature ranking is useful to gain knowledge of data and identify relevant features. This article explores the performance of combining linear support vector machines with various feature ranking methods, and reports the experiments conducted when participating in the Causality Challenge. Experiments show that feature ranking using weights from linear SVM models yields good performance, even when the training and testing data are not identically distributed. Checking the difference of Area Under Curve (AUC) with and without removing each feature also gives similar rankings. Our study indicates that linear SVMs with simple feature rankings are effective on datasets in the Causality Challenge.

Keywords: SVM, feature ranking.
1. Introduction

The Causality Challenge (Guyon et al., 2008) aims at investigating situations where the training and testing sets might have different distributions. The goal is to make predictions on manipulated testing sets, where some features are disconnected from their natural causes. Applications of the problem include predicting the effect of a new policy or predicting the effect of a new drug. In both examples, the experimental environment and the real environment differ.

In order to make good predictions on manipulated testing sets, we use several feature ranking methods to gain knowledge of the data. Among existing approaches to evaluate the relevance of each feature, some are related to certain classification methods, but some are more general. Those independent of classification methods are often based on statistical characteristics. For example, we experimented with the Fisher score, which is the correlation coefficient between one of the features and the label. In this work, we select Support Vector Machines (SVMs) (Boser et al., 1992) as the classifier, and consider one feature ranking method specific to SVM (Guyon et al., 2002).

This article is organized as follows. In Section 2 we introduce support vector classification. Section 3 describes several feature ranking strategies. Section 4 presents experiments conducted during the development period of the competition, our competition results, and some post-challenge analysis. Closing discussions are in Section 5.
© 2008 Yin-Wen Chang and Chih-Jen Lin.
2. Support Vector Classification

Support vector machines (SVMs) are useful for data classification. An SVM finds a separating hyperplane with the maximal margin between two classes of data. Given a set of instance-label pairs (x_i, y_i), x_i \in R^n, y_i \in \{1, -1\}, i = 1, \ldots, l, SVM solves the following unconstrained optimization problem:
\min_{w,b} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi(w, b; x_i, y_i),    (1)
where \xi(w, b; x_i, y_i) is a loss function, and C \geq 0 is a penalty parameter on the training error. Two common loss functions are:

\max(1 - y_i(w^T \phi(x_i) + b), \, 0) \quad \text{and} \quad \max(1 - y_i(w^T \phi(x_i) + b), \, 0)^2,    (2)
where \phi is a function that maps training data into a higher dimensional space. The former is called L1-loss SVM, and the latter is L2-loss SVM. When participating in the challenge, we chose the L2-loss function. Post-challenge experiments show that the two loss functions result in similar performances. We give detailed results of using both loss functions in Section 4.3.
For any testing instance x, the decision function (predictor) is

f(x) = \mathrm{sgn}\left(w^T \phi(x) + b\right).    (3)
Practically, a kernel function K(x_i, x_j) = \phi(x_i)^T \phi(x_j) may be used to train the SVM. A linear SVM has \phi(x) = x, so the kernel function is K(x_i, x_j) = x_i^T x_j. Another popular kernel is the radial basis function (RBF):

K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \quad \text{where } \gamma > 0.    (4)
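Both kernels can be evaluated directly on data matrices. The following NumPy sketch is ours, for illustration only (the function names are not from the paper's code):

```python
import numpy as np

def linear_kernel(X1, X2):
    """Linear kernel: phi(x) = x, so K(x_i, x_j) = x_i^T x_j."""
    return X1 @ X2.T

def rbf_kernel(X1, X2, gamma=0.5):
    """RBF kernel of Equation (4): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once.
    sq = (X1 ** 2).sum(axis=1)[:, None] + (X2 ** 2).sum(axis=1)[None, :] \
        - 2.0 * X1 @ X2.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negative round-off
```

For the linear kernel the Gram matrix is simply X X^T, which is what makes specialized linear solvers such as LIBLINEAR efficient.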
We use linear SVM for both feature ranking and classification in the challenge. We also conduct some post-challenge experiments using SVM with the RBF kernel as the classifier. The results are discussed in Section 4.3.

We use grid search to determine the penalty parameter C for linear SVM, and both C and \gamma for SVM with the RBF kernel. For each value of C or (C, \gamma), we conduct five-fold cross validation on the training set, and choose the parameters leading to the highest accuracy.

3. Feature Ranking Strategies

In this section, we describe several feature ranking strategies that we experiment with in the challenge. All methods assign a weight to each feature and rank the features accordingly.
3.1 F-score for Feature Ranking

F-score (Fisher score) is a simple and effective criterion to measure the discrimination between a feature and the label. Based on statistical characteristics, it is independent of the classifiers. Following Chen and Lin (2006), a variant of F-score is used.

Algorithm 1 Feature Ranking Based on Linear SVM Weights
Input: Training sets, (x_i, y_i), i = 1, \ldots, l.
Output: Sorted feature ranking list.
  1. Use grid search to find the best parameter C.
  2. Train an L2-loss linear SVM model using the best C.
  3. Sort the features according to the absolute values of the weights in the model.

Given training instances x_i, i = 1, \ldots, l, the F-score of the jth feature is defined as:

F(j) \equiv \frac{\left(\bar{x}_j^{(+)} - \bar{x}_j\right)^2 + \left(\bar{x}_j^{(-)} - \bar{x}_j\right)^2}{\frac{1}{n_+ - 1} \sum_{i=1}^{n_+} \left(x_{i,j}^{(+)} - \bar{x}_j^{(+)}\right)^2 + \frac{1}{n_- - 1} \sum_{i=1}^{n_-} \left(x_{i,j}^{(-)} - \bar{x}_j^{(-)}\right)^2},    (5)
where n_+ and n_- are the numbers of positive and negative instances, respectively; \bar{x}_j, \bar{x}_j^{(+)}, \bar{x}_j^{(-)} are the averages of the jth feature over the whole, positive-labeled, and negative-labeled data sets; x_{i,j}^{(+)} / x_{i,j}^{(-)} is the jth feature of the ith positive/negative instance. The numerator denotes the inter-class variance, while the denominator is the sum of the variances within each class. A larger F-score indicates that the feature is more discriminative.

A known deficiency of F-score is that it considers each feature separately and therefore cannot reveal mutual information between features. However, F-score is simple and generally quite effective.
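Equation (5) translates directly into a few lines of NumPy. The sketch below is ours, not the authors' released code; it assumes labels in {+1, -1}:

```python
import numpy as np

def f_score(X, y):
    """F-score of every feature, Equation (5).

    X: (l, n) array of training instances; y: labels in {+1, -1}.
    Returns an array of n scores; larger means more discriminative.
    """
    pos, neg = X[y == 1], X[y == -1]
    mean_all = X.mean(axis=0)
    mean_pos, mean_neg = pos.mean(axis=0), neg.mean(axis=0)
    # Numerator: inter-class variance of the jth feature.
    numer = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sum of the (ddof=1) variances within each class.
    denom = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return numer / denom

# Toy data: feature 0 separates the classes, feature 1 does not.
X = np.array([[4.0, 0.1], [5.0, 0.2], [0.0, 0.15], [1.0, 0.25]])
y = np.array([1, 1, -1, -1])
scores = f_score(X, y)
ranking = np.argsort(-scores)  # descending F-score; feature 0 ranks first
```

`var(..., ddof=1)` is exactly the 1/(n-1)-normalized sum of squares in the denominator of (5).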
3.2 Linear SVM Weight for Feature Ranking

After obtaining a linear SVM model, w \in R^n in (1) can be used to decide the relevance of each feature (Guyon et al., 2002). The larger |w_j| is, the more important a role the jth feature plays in the decision function (3). Only the w of a linear SVM model carries this interpretation, so this approach is restricted to linear SVMs. We thus rank features according to |w_j|. The procedure is in Algorithm 1.
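Algorithm 1 can be sketched with scikit-learn's LinearSVC, which builds on LIBLINEAR and uses the squared hinge (L2) loss by default. The C grid and the function name below are our illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

def rank_by_svm_weight(X, y, C_grid=(2.0 ** -5, 2.0 ** 0, 2.0 ** 5)):
    """Algorithm 1 sketch: grid-search C by five-fold cross validation,
    train an L2-loss linear SVM, and sort features by |w_j| (descending)."""
    # Step 1: pick the best C via cross validation on the training set.
    search = GridSearchCV(LinearSVC(loss="squared_hinge"),
                          {"C": list(C_grid)}, cv=5)
    search.fit(X, y)
    # Steps 2-3: the absolute weights of the best model give the ranking.
    w = search.best_estimator_.coef_.ravel()
    return np.argsort(-np.abs(w))
```

On data where only one feature is informative, that feature receives the dominant weight and is ranked first.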
3.3 Change of AUC with/without Removing Each Feature

We determine the importance of each feature by considering how the performance is influenced without that feature. If removing a feature deteriorates the classification performance, the feature is considered important. We select the cross validation AUC as the performance measure. Features are ranked according to the AUC difference.

This performance-based method has the advantage of being applicable to all classifiers. The disadvantage is that it takes a huge amount of time to train and predict when the number of features is large. Besides, by removing only one feature at a time, the method does not take into account how features affect each other.
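A minimal sketch of this leave-one-feature-out ranking, assuming scikit-learn is available; the fixed C and the function name are our illustrative choices (the paper's experiments tune C separately):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def rank_by_auc_drop(X, y, C=1.0):
    """Rank features by how much five-fold cross-validation AUC drops
    when each feature is removed; a larger drop means a more important
    feature (Section 3.3)."""
    clf = LinearSVC(C=C)
    base = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    drops = []
    for j in range(X.shape[1]):
        X_minus_j = np.delete(X, j, axis=1)  # remove one feature at a time
        auc = cross_val_score(clf, X_minus_j, y, cv=5,
                              scoring="roc_auc").mean()
        drops.append(base - auc)
    return np.argsort(-np.array(drops))
```

The n + 1 cross-validation runs make the cost per ranking linear in the number of features, which is why this method was infeasible for SIDO.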
Table 1: Challenge datasets. All of them have two classes.

Dataset   Feature type   #Features   #Training   #Testing
REGED     numerical            999         500     20,000
SIDO      binary             4,932      12,678     10,000
CINA      mixed                132      16,033     10,000
MARTI     numerical          1,024         500     20,000
LUCAS     binary                11       2,000     10,000
LUCAP     binary               143       2,000     10,000
3.4 Change of Accuracy with/without Removing Each Feature

This method is the same as the one described in Section 3.3, except that the performance measure is the accuracy rate.

4. Experimental Results

In the Causality Challenge, there are four competition tasks (REGED, CINA, SIDO and MARTI) and two small toy examples (LUCAS and LUCAP). All tasks have three versions of datasets, each with the same training set and different testing sets. Testing sets with digit zero indicate unmanipulated testing sets, while digits one and two denote manipulated testing sets. Table 1 shows the dataset descriptions. Details can be found at http://www.causality.inf.ethz.ch/challenge.php.
We preprocess data via scaling, instance-wise normalization, and Gaussian filtering. We scale each feature of REGED and CINA to [0, 1], and apply the same scaling parameters to their testing sets. In contrast, training and testing sets in MARTI are separately scaled to [-1, 1] for each feature, since this results in better performance. Another reason is that the training data in MARTI are perturbed by noise, while the testing data are noise-free. After applying a Gaussian filter on the training set to filter out the noise, there is an unknown bias value that we would like to subtract or add. We might use information from the distribution of the testing data to gain knowledge of the unknown bias value, and then scale the training and testing data using the same scaling parameters. Alternatively, we can ignore the bias value and scale the training and testing data separately. For SIDO, LUCAS, and LUCAP, the ranges of their features are already in [0, 1]. We normalize each instance of these three problems to unit length.
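The scaling and instance-wise normalization steps might look as follows in NumPy. This is a sketch under our assumptions; the authors' exact preprocessing scripts may differ:

```python
import numpy as np

def scale_01(train, test):
    """Scale each training feature to [0, 1] and apply the same
    parameters to the test set (as done for REGED and CINA)."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (train - lo) / span, (test - lo) / span

def normalize_instances(X):
    """Instance-wise normalization to unit length (SIDO, LUCAS, LUCAP)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms > 0, norms, 1.0)
```

Reusing the training-set scaling parameters on the test set keeps the two sets in the same coordinate system, which matters when training and testing distributions already differ.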
According to the dataset description, two kinds of noise are added to MARTI. First, to obtain 1,024 features, the 999 features of REGED are complemented by 25 calibrant features, each of which has value zero plus a small Gaussian noise. Second, the training set is perturbed by a zero-mean correlated noise. Since we could not get into the first quartile of the competition results without handling the noise, we use a Gaussian filter to eliminate the low-frequency noise in the training set before scaling. For each instance, we rearrange the 1,024 features into a 32×32 array and apply the Gaussian filter, exploiting the fact that neighboring positions are similarly affected. The low pass spatial Gaussian filter is defined
Algorithm 2 Training and Prediction
Input: Training sets, testing sets.
Output: Predictions on nested subsets.
  1. Use a feature ranking algorithm to compute the sorted feature list f_j, j = 1, \ldots, n.
  2. For each feature size m \in \{1, 2, 4, \ldots, 2^i, \ldots, n\}:
     (a) Generate the new training set that has only the first m features in the sorted feature list, f_j, j = 1, \ldots, m.
     (b) Use grid search to find the best parameter C.
     (c) Train the L2-loss linear SVM model on the new training set.
     (d) Predict the testing set using the model.
as:

g(x_0) = \frac{1}{G(x_0)} \sum_{x} e^{-\frac{1}{2}\left(\frac{\|x - x_0\|}{\sigma}\right)^2} f(x), \quad \text{where } G(x_0) = \sum_{x} e^{-\frac{1}{2}\left(\frac{\|x - x_0\|}{\sigma}\right)^2},    (6)

where f(x) is the value at position x in the 32×32 array. For each position x_0, we take the Gaussian-weighted average of all values in the array. The resulting g(x) is the approximated low-frequency noise, and f'(x) = f(x) - g(x) is the feature value that we would like to use. The \sigma is set to 3.2 after experimenting with several values.
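Equation (6) can be prototyped directly on the 32×32 array. This is our sketch of the idea, not the authors' code; the σ default follows the paper's choice of 3.2:

```python
import numpy as np

def gaussian_lowpass(f, sigma=3.2):
    """Estimate the low-frequency noise g of Equation (6) for a 2-D array f
    by a Gaussian-weighted average over all grid positions, and return the
    denoised values f'(x) = f(x) - g(x)."""
    n1, n2 = f.shape
    ii, jj = np.meshgrid(np.arange(n1), np.arange(n2), indexing="ij")
    pos = np.stack([ii.ravel(), jj.ravel()], axis=1).astype(float)
    # Pairwise squared distances ||x - x0||^2 between all grid positions.
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    w = np.exp(-0.5 * d2 / sigma ** 2)
    g = (w @ f.ravel()) / w.sum(axis=1)  # division by G(x0) normalizes
    return f - g.reshape(n1, n2)
```

For a constant array the weighted average reproduces the constant exactly, so the filtered output is zero everywhere, as expected of a high-pass residual.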
Since testing sets may not follow the same distribution as training sets, and the organizers intended to hide their distributions, no validation sets are provided during the development period, which is the time between the start and the termination of the challenge. Instead, an on-line submission page shows which quartile a submission belongs to among all submissions. Besides, the testing AUC of the toy examples is available.
The linear SVM classifier that we use is LIBLINEAR^1 (Fan et al., 2008), and we use LIBSVM^2 (Chang and Lin, 2001) for SVM with the RBF kernel. While LIBSVM can handle the linear kernel as well, we use LIBLINEAR due to its special design for linear SVM. Our implementation extends the framework by Chen and Lin (2006)^3. All sources for our experiments are available at http://www.csie.ntu.edu.tw/~cjlin/papers/causality.
We experiment with the feature ranking methods described in Section 3. We use F-score, W, D-AUC, and D-ACC to denote the methods in Sections 3.1-3.4, respectively. The linear SVM weights are derived from LIBLINEAR model files. The procedure is described in Algorithm 2.
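The loop of Algorithm 2 might be sketched as follows, again with scikit-learn's LinearSVC standing in for LIBLINEAR; for brevity this sketch fixes C instead of re-running the grid search of step (b):

```python
import numpy as np
from sklearn.svm import LinearSVC

def nested_subset_predictions(X_train, y_train, X_test, ranking, C=1.0):
    """For feature sizes 1, 2, 4, ..., n, retrain a linear SVM on the
    top-m ranked features and collect decision values for the test set."""
    n = X_train.shape[1]
    sizes = sorted({2 ** i for i in range(int(np.log2(n)) + 1)} | {n})
    preds = {}
    for m in sizes:
        top = ranking[:m]  # first m features of the sorted list
        clf = LinearSVC(C=C).fit(X_train[:, top], y_train)
        # Decision values, not labels, since AUC is the evaluation criterion.
        preds[m] = clf.decision_function(X_test[:, top])
    return preds
```

Submitting decision values for every nested subset lets the organizers pick the best-performing feature size, as described in Section 4.2.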
We summarize the methods that we experiment with:

• F-score: feature ranking using F-score, described in Section 3.1.
• W: feature ranking using linear SVM weights, described in Section 3.2.
1. http://www.csie.ntu.edu.tw/~cjlin/liblinear
2. http://www.csie.ntu.edu.tw/~cjlin/libsvm
3. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools
Table 2: Best five-fold cross validation AUC and the corresponding feature size. The best feature ranking approach is bold-faced.

Dataset   REGED         SIDO^4           CINA          MARTI
F-score   0.9998 (16)   0.9461 (2,048)   0.9694 (132)  0.9210 (512)
W         1.0000 (32)   0.9552 (512)     0.9710 (64)   0.9632 (128)
D-AUC     1.0000 (16)   –                0.9699 (128)  0.9640 (256)
D-ACC     0.9998 (64)   –                0.9694 (132)  0.8993 (32)
• D-AUC: feature ranking by checking the change of AUC with/without removing each feature. Details are in Section 3.3.
• D-ACC: feature ranking by checking the change of accuracy with/without removing each feature. Details are in Section 3.4.
4.1 Development Period

During the development period, we took into account the cross validation AUC on training sets, the testing AUC of the toy examples, and the quartile information to decide the method for the final submission.

Since we did not develop a strategy to deal with different training/testing distributions, we use the same model to predict each task's three testing sets. We did not use the provided information on the manipulated features in REGED and MARTI, or the 25 calibrant features in MARTI.

With AUC being the evaluation criterion, we submitted the decision values of linear SVM predictions. Nested subsets according to the sorted feature lists are used, since their performances are better. That is, one of the predictions based on a subset outperforms the one based on the whole feature set.
For F-score and W, it takes less than one minute to train all the models for nested-subset submissions for REGED, CINA, and MARTI, while it takes about 13 minutes for SIDO. Excluding preprocessing, the time required to predict one testing set is around five minutes for REGED and MARTI, 16 seconds for CINA, and three minutes for SIDO. SIDO is more computationally costly to train and predict due to its larger numbers of features and training instances. For D-AUC and D-ACC, it takes a few hours to obtain the feature rankings.

We submitted a total of 60 entries before the challenge ended. Among the methods we tried, W has testing AUC in the first quartile for all datasets. This result seems to indicate that it is better than the others. We used cross validation with AUC in order to get more definitive conclusions.
Table 2 shows the five-fold cross validation AUC using the best feature size. We conduct cross validation on all feature sizes \in \{1, 2, 4, \ldots, 2^i, \ldots, n\}, where n is the total number of features. W and D-AUC seem to perform better than the other methods, while D-ACC is the worst.

4. D-AUC and D-ACC are infeasible for SIDO due to its large number of features.
Table 3: Comparison of the performance on toy examples. The testing AUC is shown. Sorted feature lists and nested subsets on them are used.

Dataset   LUCAS0   LUCAS1   LUCAS2   LUCAP0   LUCAP1   LUCAP2
F-score   0.9208   0.8989   0.7446   0.9702   0.8327   0.7453
W         0.9208   0.8989   0.7654   0.9702   0.9130   0.9159
D-AUC     0.9208   0.8989   0.7654   0.9696   0.8648   0.8655
D-ACC     0.9208   0.8989   0.7446   0.9696   0.7755   0.6011
We find that D-ACC differs most from the others, while the other three methods are more similar. In particular, the top ranked features chosen by W and D-AUC are alike. For example, W and D-AUC have exactly the same top four features for CINA, and the same set of top eight features, with slightly different rankings, for REGED.

In Table 3, we compare the different feature ranking methods according to the testing AUC on the toy examples, LUCAS and LUCAP. We can see that W still outperforms the others. It is much better than the other methods especially on manipulated testing sets (see LUCAP1 and LUCAP2). Similar to the cross validation results, D-ACC is the worst.
4.2 Competition Results

Table 4 shows the results of our final submission. Fnum is the best number of features used to make predictions. It is determined by the organizers according to the nested-subset submissions. Fscore indicates how good the ranking is according to the causal relationships known only to the organizers. Tscore is the testing AUC. Top Ts is the maximal score of the last entry made by all participants, and Max Ts is the best score reachable, estimated using causal relationship knowledge not available to participants.

On CINA2, our method might benefit from good feature ranking. Our result is the best among all submissions. The four features used might be the direct causes of the label. As mentioned earlier, W and D-AUC identify exactly the same top four features. Similarly for MARTI2 and REGED2, Fnum is small and W and D-AUC select the same set of features, although the rankings are slightly different.

We also observe that the Fnums of the final submission are similar to the best feature sizes given by the cross validation results on the training data. However, we benefit from the nested-subset submission, since we do not select the best feature size. According to the rules, the best feature size is selected according to the testing AUC, so the testing set information is used indirectly.

Although the challenge is designed in a way that causal discovery is required to make good predictions, our simple feature ranking methods perform rather well. It is interesting that our simple method outperforms some more complicated causal discovery methods.

However, the good performances do not indicate that the highly ranked features are important causes. Our methods rank the features according to their relevance, not their causal importance, and, thus, they do not enhance our knowledge of the underlying causal relationships between features.
Table 4: The results of our final submission in the Causality Challenge. We obtain feature rankings using linear SVM weights. The column "Fnum" shows the best feature size used to make predictions and the total number of features.

Dataset   Fnum          Fscore   Tscore   Top Ts   Max Ts   Rank
REGED0    16/999        0.8526   0.9998   1.0000   1.0000
REGED1    16/999        0.8566   0.9556   0.9980   0.9980
REGED2    8/999         0.9970   0.8392   0.8600   0.9543
mean                             0.9316                     1
SIDO0     1,024/4,932   0.6516   0.9432   0.9443   0.9467
SIDO1     4,096/4,932   0.5685   0.7523   0.7532   0.7893
SIDO2     2,048/4,932   0.5685   0.6235   0.6684   0.7674
mean                             0.7730                     2
CINA0     64/132        0.6000   0.9715   0.9788   0.9788
CINA1     64/132        0.7053   0.8446   0.8977   0.8977
CINA2     4/132         0.7053   0.8157   0.8157   0.8910
mean                             0.8773                     1
MARTI0    256/1,024     0.8073   0.9914   0.9996   0.9996
MARTI1    256/1,024     0.7279   0.9209   0.9470   0.9542
MARTI2    2/1,024       0.9897   0.7606   0.7975   0.8273
mean                             0.8910                     3
Our linear SVM classifier has excellent performance on version 0 of all tasks. Our Tscore is close to Top Ts. However, compared with the best performance by other participants, the performance on version 1 is slightly worse, and the performance on version 2 is still worse than that on version 1. As the ranking for each task is determined according to the average of the performances on the three testing sets, we might take advantage of good performances on version 0, where the testing and training distributions are the same.

Figure 1 shows the profile of the selected features (i.e., the top Fnum features). This figure is provided by the organizers. The noise filtering method we used might not be good enough, since for MARTI0 and MARTI1, the ratios of "direct cause" features are low compared with other methods. Besides, our feature ranking method ranks both direct causes and direct effects at the front of the list. Together they make up most of the features on version 0. This result is reasonable since our methods do not consider causal relationships and therefore do not necessarily rank true causes on top. In Table 4, we have excellent performance on version 0 of all tasks. On manipulated testing sets, the ratio of unrelated features becomes higher, and our performance on these two versions is not as good as on version 0. The only exception is CINA2, where we did not obtain any unrelated features.
Figure 1: Profile of features selected (provided by the competition organizers). dcause: direct causes, deffect: direct effects, ocauses: other causes, oeffects: other effects, spouses: parents of a direct effect, orelatives: other relatives, unrelated: completely irrelevant.
4.3 Post-Challenge Experiments

After the challenge, the testing AUC values of our past submissions became available. We are thus able to compare the results of all methods, including L2-loss linear SVM with different feature ranking methods and a direct use of SVM without feature ranking. We also conduct post-challenge experiments to compare the feature ranking methods using L1-loss SVM. Besides, in order to see if nonlinear SVMs help to improve the performance, we apply the feature rankings obtained from L2-loss linear SVM to nonlinear SVM with the RBF kernel.

Table 5 shows the testing AUC revealed after the challenge ended. LINEAR stands for a direct use of L2-loss linear SVM. It is worth noticing that, similar to Tables 2 and 3, W is generally the best. This result is interesting because for the testing AUC in Table 5, training and testing sets are from different distributions. An exception where F-score has better testing AUC than W is REGED. D-ACC is still the worst, though the difference to the other methods becomes much smaller.

In order to understand the difference between using the L1-loss and L2-loss functions, we experiment with L1-loss linear SVM to rank features and classify data instances. The results are in Table 6. In general, the testing AUC values do not differ much from those of L2-loss SVM in Table 5. However, here we do not have a solid conclusion that W outperforms other methods. Instead, most methods win on some datasets.
Table 5: Comparison of different feature ranking methods using L2-loss linear SVM. It shows the testing AUC and the corresponding Fnum, revealed after the challenge ended. We did not run D-AUC and D-ACC on SIDO, so some slots in this table are blank.

                    Feature ranking methods                                SVM
Dataset   F-score         W               D-AUC           D-ACC           LINEAR
REGED0    0.9998 (64)     0.9998 (16)     0.9997 (16)     0.9987 (128)    0.9970
REGED1    0.9555 (32)     0.9556 (16)     0.9528 (16)     0.9438 (999)    0.9438
REGED2    0.8510 (8)      0.8392 (8)      0.8392 (8)      0.8113 (32)     0.7442
mean      0.9354          0.9316          0.9306          0.9179          0.8950
SIDO0     0.9430 (4096)   0.9432 (1024)   –               –               0.9426
SIDO1     0.7515 (4932)   0.7523 (4096)   –               –               0.7515
SIDO2     0.6184 (4096)   0.6235 (2048)   –               –               0.6143
mean      0.7710          0.7730          –               –               0.7695
CINA0     0.9706 (132)    0.9715 (64)     0.9712 (128)    0.9706 (132)    0.9706
CINA1     0.8355 (128)    0.8446 (64)     0.8416 (128)    0.8348 (132)    0.8348
CINA2     0.6108 (64)     0.8157 (4)      0.8157 (4)      0.8140 (8)      0.6095
mean      0.8057          0.8773          0.8761          0.8732          0.8050
MARTI0    0.9899 (512)    0.9914 (256)    0.9860 (1024)   0.9903 (512)    0.9860
MARTI1    0.8960 (1024)   0.9209 (256)    0.9134 (32)     0.8960 (1024)   0.8960
MARTI2    0.7571 (4)      0.7606 (2)      0.7606 (2)      0.7282 (1024)   0.7282
mean      0.8810          0.8910          0.8867          0.8715          0.8701
We applied the L2-loss SVM with the RBF kernel to the list of features given by L2-loss linear SVM in order to clarify the performance when feature rankings are combined with a nonlinear kernel. The results are shown in Table 7. Approach W still outperforms the other methods when using a nonlinear SVM classifier. For these challenge datasets, W seems to be a good method regardless of the classifier used. Note that the Fnum values are not always the same in Tables 5 and 7, even though the same feature rankings are applied.

We also tried to incorporate Recursive Feature Elimination (RFE) (Guyon et al., 2002). For a given set of features, we use linear SVM weights to obtain the rankings, output the ranks of those in the second half, and continue the same procedure on the first half of the features. To be more precise, subsets S_j of size |S_j| \in \{n, 2^{\lfloor \log_2 n \rfloor}, \ldots, 2^i, \ldots, 2, 1\} are generated, where n is the total number of features and j = 0, \ldots, \lfloor \log_2 n \rfloor. After we train on subset S_j, we use the linear SVM weights to rank the features in S_j and let S_{j+1} include the first half of the features. The results are not very different from those without RFE.

5. Discussion and Conclusions

In this challenge, we have experimented with several feature ranking methods. Among them, feature ranking based on F-score is independent of classifiers, feature ranking based on
Table 6: Comparison of different feature ranking methods using L1-loss linear SVM. It shows the testing AUC and the corresponding Fnum. We did not run D-AUC and D-ACC on SIDO, so some slots in this table are blank.

                    Feature ranking methods                                SVM
Dataset   F-score         W               D-AUC           D-ACC           LINEAR
REGED0    0.9996 (32)     0.9997 (16)     0.9991 (16)     0.9981 (256)    0.9964
REGED1    0.9528 (32)     0.9558 (64)     0.9392 (256)    0.9551 (64)     0.9348
REGED2    0.8562 (8)      0.8419 (8)      0.8504 (8)      0.8777 (16)     0.7396
mean      0.9362          0.9325          0.9296          0.9436          0.8903
SIDO0     0.9407 (4096)   0.9419 (512)    –               –               0.9397
SIDO1     0.7588 (4932)   0.7590 (4096)   –               –               0.7588
SIDO2     0.6687 (4932)   0.6701 (2048)   –               –               0.6687
mean      0.7894          0.7903          –               –               0.7891
CINA0     0.9713 (132)    0.9713 (132)    0.9716 (128)    0.9713 (132)    0.9713
CINA1     0.8373 (128)    0.8369 (132)    0.8425 (128)    0.8369 (132)    0.8369
CINA2     0.6377 (128)    0.6377 (128)    0.8094 (4)      0.6347 (132)    0.6347
mean      0.8154          0.8153          0.8745          0.8143          0.8143
MARTI0    0.9872 (512)    0.9896 (256)    0.9933 (512)    0.9916 (512)    0.9858
MARTI1    0.8950 (1024)   0.9046 (512)    0.9168 (512)    0.9078 (512)    0.8950
MARTI2    0.7694 (8)      0.7790 (4)      0.7710 (2)      0.7369 (8)      0.7299
mean      0.8839          0.8911          0.8937          0.8787          0.8703
linear SVM weights requires a linear SVM classifier, and the other two performance-based methods can use any classifier.

We focus on simple methods, so in this competition we could conduct quite complete validation procedures to select good models. However, although we have excellent performance on predictions, our methods do not provide information on the underlying causal relationships between features. Without causal discovery, the performance of our methods on manipulated datasets is not as good as that on unmanipulated datasets. Our methods might be improved by using causality, and how this can be done needs more investigation.

Acknowledgments

This work was supported in part by grants from the National Science Council of Taiwan.

References
Bernhard E. Boser, Isabelle Guyon, and Vladimir Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144-152. ACM Press, 1992.
Table 7: Comparison of different feature ranking methods using L1-loss SVM with the RBF kernel as classifier. It shows the testing AUC and the corresponding Fnum. We did not run D-AUC and D-ACC on SIDO, so some slots in this table are blank.

                    Feature ranking methods                                SVM
Dataset   F-score         W               D-AUC           D-ACC           RBF
REGED0    0.9997 (64)     0.9995 (16)     0.9997 (16)     0.9989 (64)     0.9968
REGED1    0.9709 (32)     0.9753 (32)     0.9748 (16)     0.9531 (128)    0.9419
REGED2    0.8881 (8)      0.8676 (8)      0.8676 (8)      0.8189 (32)     0.7459
mean      0.9529          0.9475          0.9474          0.9236          0.8949
SIDO0     0.9339 (4096)   0.9444 (4096)   –               –               0.9259
SIDO1     0.7339 (4096)   0.7634 (4096)   –               –               0.7124
SIDO2     0.5862 (4096)   0.6255 (4096)   –               –               0.5686
mean      0.7513          0.7778          –               –               0.7357
CINA0     0.9732 (64)     0.9754 (32)     0.9716 (32)     0.9718 (128)    0.9683
CINA1     0.8387 (64)     0.8646 (32)     0.8306 (4)      0.8383 (128)    0.8249
CINA2     0.6855 (64)     0.8358 (4)      0.8358 (4)      0.8164 (8)      0.6739
mean      0.8325          0.8919          0.8793          0.8755          0.8224
MARTI0    0.9883 (512)    0.9916 (256)    0.9848 (1024)   0.9896 (512)    0.9848
MARTI1    0.8877 (1024)   0.9181 (256)    0.9057 (32)     0.8877 (1024)   0.8877
MARTI2    0.7659 (8)      0.7616 (16)     0.7609 (2)      0.7308 (1024)   0.7308
mean      0.8806          0.8904          0.8838          0.8694          0.8678
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Yi-Wei Chen and Chih-Jen Lin. Combining SVMs with various feature selection strategies. In Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and Lofti Zadeh, editors, Feature Extraction, Foundations and Applications. Springer, 2006.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf.

Isabelle Guyon, Constantin Aliferis, Greg Cooper, André Elisseeff, Jean-Philippe Pellet, Peter Spirtes, and Alexander Statnikov. Design and analysis of the causation and prediction challenge. JMLR: Workshop and Conference Proceedings, 2008.

Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46:389-422, 2002.