In Proceedings of the Tenth National Conference on Artificial Intelligence. San Jose: AAAI Press.

An Analysis of Bayesian Classifiers

Pat Langley, Wayne Iba*, Kevin Thompson†
{Langley, Iba, KThompso}@ptolemy.arc.nasa.gov
AI Research Branch
NASA Ames Research Center
Moffett Field, CA USA

* Also affiliated with RECOM Technologies.
† Also affiliated with Sterling Software.

Abstract

In this paper we present an average-case analysis of the Bayesian classifier, a simple induction algorithm that fares remarkably well on many learning tasks. Our analysis assumes a monotone conjunctive target concept, and independent, noise-free Boolean attributes. We calculate the probability that the algorithm will induce an arbitrary pair of concept descriptions and then use this to compute the probability of correct classification over the instance space. The analysis takes into account the number of training instances, the number of attributes, the distribution of these attributes, and the level of class noise. We also explore the behavioral implications of the analysis by presenting predicted learning curves for artificial domains, and give experimental results on these domains as a check on our reasoning.

Probabilistic Approaches to Induction

One goal of research in machine learning is to discover principles that relate algorithms and domain characteristics to behavior. To this end, many researchers have carried out systematic experimentation with natural and artificial domains in search of empirical regularities (e.g., Kibler & Langley, 1988). Others have focused on theoretical analyses, often within the paradigm of probably approximately correct learning (e.g., Haussler, 1990). However, most experimental studies are based only on informal analyses of the learning task, whereas most formal analyses address the worst case, and thus bear little relation to empirical results.

A third approach, proposed by Cohen and Howe (1988), involves the formulation of average-case models for specific algorithms and testing them through experimentation. Pazzani and Sarrett's (1990) study of conjunctive learning provides an excellent example of this technique, as does Hirschberg and Pazzani's (1991) work on inducing k-CNF concepts. By assuming information about the target concept, the number of attributes, and the class and attribute frequencies, they obtain predictions about the behavior of induction algorithms and use experiments to check their analyses.¹ However, their research does not focus on algorithms typically used by the experimental and practical sides of machine learning, and it is important that average-case analyses be extended to such methods.

¹ A related approach involves deriving the optimal learning algorithm under certain assumptions, and then implementing an approximation of that algorithm (e.g., Opper & Haussler, 1991).

Recently there has been growing interest in probabilistic approaches to inductive learning. For example, Fisher (1987) has described Cobweb, an incremental algorithm for conceptual clustering that draws heavily on Bayesian ideas, and the literature reports a number of systems that build on this work (e.g., Allen & Langley, 1990; Iba & Gennari, 1991; Thompson & Langley, 1991). Cheeseman et al. (1988) have outlined AutoClass, a nonincremental system that uses Bayesian methods to cluster instances into groups, and other researchers have focused on the induction of Bayesian inference networks (e.g., Cooper & Herskovits, 1991).

These recent Bayesian learning algorithms are complex and not easily amenable to analysis, but they share a common ancestor that is simpler and more tractable. This supervised algorithm, which we refer to simply as a Bayesian classifier, comes originally from work in pattern recognition (Duda & Hart, 1973). The method stores a probabilistic summary for each class; this summary contains the conditional probability of each attribute value given the class, as well as the probability (or base rate) of the class. This data structure approximates the representational power of a perceptron; it describes a single decision boundary through the instance space. When the algorithm encounters a new instance, it updates the probabilities stored with the specified class. Neither the order of training instances nor the occurrence of classification errors has any effect on this process. When given a test instance, the classifier uses an evaluation function (which we describe in detail later) to rank the alternative classes.
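
The probabilistic summary described in this section can be sketched in a few lines of Python. This is a minimal illustration, assuming Boolean attributes stored as tuples; the class name `ClassSummaries` and its methods are hypothetical and do not come from the paper. It stores only counts, from which the base rate and conditional probabilities follow, so the order of training instances cannot matter.

```python
from collections import defaultdict


class ClassSummaries:
    """Per-class counts from which base rates and conditional probabilities follow."""

    def __init__(self):
        self.n = 0                           # total training instances seen
        self.class_count = defaultdict(int)  # instances seen per class
        # attr_count[c][j]: instances of class c in which attribute j was present
        self.attr_count = defaultdict(lambda: defaultdict(int))

    def update(self, instance, label):
        """Fold one training instance (a tuple of booleans) into its class summary."""
        self.n += 1
        self.class_count[label] += 1
        for j, present in enumerate(instance):
            if present:
                self.attr_count[label][j] += 1

    def base_rate(self, label):
        """Observed probability of the class."""
        return self.class_count[label] / self.n

    def conditional(self, j, label):
        """Observed P(A_j present | class); only defined once the class has been seen."""
        return self.attr_count[label][j] / self.class_count[label]


# Illustrative data: two positive instances and one negative instance.
data = [((True, True, False), "pos"),
        ((True, False, True), "neg"),
        ((True, True, True), "pos")]

forward, backward = ClassSummaries(), ClassSummaries()
for instance, label in data:
    forward.update(instance, label)
for instance, label in reversed(data):
    backward.update(instance, label)
```

Feeding the same instances in reverse order yields identical summaries, which reflects the order independence noted above.
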
This ranking is based on the classes' probabilistic summaries, and the classifier assigns the instance to the highest scoring class. Both the evaluation function and the summary descriptions used in Bayesian classifiers assume that attributes are statistically independent. Since this seems unrealistic for many natural domains, researchers have often concluded that the algorithm will behave poorly in comparison to other induction methods. However, no studies have examined the extent to which violation of this assumption leads to performance degradation, and the probabilistic approach should be quite robust with respect to both noise and irrelevant attributes. Moreover, earlier studies (e.g., Clark & Niblett, 1989) present evidence of the practicality of the algorithm.

Table 1. Percentage accuracies for two induction algorithms on five classification domains, along with the accuracy of predicting the most frequent class.

Domain       Bayes    IND    Freq.
Soybean
Chess
Lympho
Splice
Promoters

Table 1 presents additional experimental evidence for the utility of Bayesian classifiers. In this study, we compare the method to IND's emulation of the C4 algorithm (Buntine & Caruana, 1991) and an algorithm that simply predicts the modal class. The five domains, from the UCI database collection (Murphy & Aha, 1992), include the "small" soybean dataset, chess end games involving a king-rook-king-pawn confrontation, cases of lymphography diseases, and two biological datasets. For each domain, we randomly split the data set into training instances and test instances, repeating this process to obtain separate pairs of training and test sets. The table shows the mean accuracy and confidence intervals on the test sets for each domain.

In four of the domains, the Bayesian classifier is at least as accurate as the C4 reimplementation. We will not argue that the Bayesian classifier is superior to this more sophisticated method, but the results do show that it behaves well across a variety of domains. Thus, the Bayesian classifier is a promising induction algorithm that deserves closer inspection, and a careful analysis should give us insights into its behavior.

We simplify matters by limiting our analysis to the induction of conjunctive concepts. Furthermore, we assume that there are only two classes, that each attribute is Boolean, and that attributes are independent of each other. We divide our study into three parts. We first determine the probability that the algorithm will learn a particular pair of concept descriptions. After this, we derive the accuracy of an arbitrary pair of descriptions over all instances. Taken together, these expressions give us the overall accuracy of the learned concepts. We find that a number of factors influence behavior of the algorithm, including the number of training instances, the number of relevant and irrelevant attributes, the amount of class and attribute noise, and the class and attribute frequencies. Finally, we examine the implications of the analysis by predicting behavior in specific domains, and check our reasoning with experiments in these domains.

Probability of Induced Concepts

Consider a concept $C$ defined as the monotone conjunction of $r$ relevant features $A_1, \ldots, A_r$ (in which none of the features are negated). Also assume there are $i$ irrelevant features $A_{r+1}, \ldots, A_{r+i}$. Let $P(A_j)$ be the probability of feature $A_j$ occurring in an instance. The concept descriptions learned by a Bayesian classifier are fully determined by the $n$ training instances it has observed. Thus, to compute the probability of each such concept description, we must consider different possible combinations of $n$ training instances.

First let us consider the probability that the algorithm has observed exactly $k$ out of $n$ positive instances. If we let $P(C)$ be the probability of observing a positive instance and we let $x$ be the observed fraction of positive instances, then we have

$$P\left(x = \frac{k}{n}\right) = \binom{n}{k} P(C)^{k} \, [1 - P(C)]^{n-k} .$$

This expression also represents the probability that one has observed exactly $n - k$ negative instances. Since we assume that the concept is monotone conjunctive and that the attributes are independent, we have $P(C) = \prod_{j=1}^{r} P(A_j)$, which is simply the product of the probabilities for all relevant attributes.

A given number of positive instances $k$ can produce many alternative descriptions of the positive class, depending on the instances that are observed. One can envision each such concept description as a cell in an $(r + i)$-dimensional matrix, with each dimension ranging from $0$ to $k$, and with the count on dimension $j$ representing the number of positive instances in which attribute $A_j$ was present. One can envision a similar matrix for the negative instances, again having dimensionality $r + i$, but with each dimension ranging from $0$ to $n - k$, and with the count on each dimension $j$ representing the number of negative instances in which $A_j$ occurred. Figure 1 shows a positive cell matrix with $r + i = 3$ and $k = 2$. The designated cell holds the probability that the algorithm has seen two instances with $A_1$ present, one instance with $A_2$ present, and zero instances with $A_3$ present.

In both matrices, one can index each cell, or concept description, by a vector of length $r + i$. Let $P(\mathit{cell}_u)_k$ be the probability that the algorithm has produced a particular cell in the positive matrix.
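
The binomial expression for $P(x = k/n)$ and the product form of $P(C)$ from this section can be checked numerically. The Python sketch below is only an illustration: the function names and the example attribute probabilities (two relevant attributes with $P(A_j) = 0.5$, and $n = 4$) are assumptions, not values from the paper.

```python
from math import comb, prod


def prob_concept(relevant_probs):
    """P(C) for a monotone conjunction of independent relevant attributes."""
    return prod(relevant_probs)


def prob_k_positives(k, n, p_c):
    """Binomial probability of observing exactly k positive instances among n."""
    return comb(n, k) * p_c ** k * (1.0 - p_c) ** (n - k)


# Assumed example: two relevant attributes, each present with probability 0.5.
p_c = prob_concept([0.5, 0.5])
distribution = [prob_k_positives(k, 4, p_c) for k in range(5)]
```

Because the same expression gives the probability of exactly $n - k$ negative instances, `prob_k_positives(k, n, p_c)` agrees with `prob_k_positives(n - k, n, 1 - p_c)`.
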

In $P(\mathit{cell}_u)_k$, the cell is the one indexed by vector $u$ in the positive matrix, given $k$ positive instances; let $P(\mathit{cell}_v)_{n-k}$ be the analogous probability for a cell in the negative matrix. Then a weighted product of these terms gives the probability that the learning algorithm will generate any particular pair of concept descriptions, which is

$$P(k, u, v)_n = P\left(x = \frac{k}{n}\right) P(\mathit{cell}_u)_k \, P(\mathit{cell}_v)_{n-k} .$$

In other words, one multiplies the probability of seeing $k$ out of $n$ positive instances and the probabilities of encountering cell $u$ in the positive matrix and cell $v$ in the negative matrix.

[Figure 1: a three-dimensional cell matrix with axes $A_1$, $A_2$, and $A_3$, each ranging from 0 to 2.]
Figure 1. A positive cell matrix for three attributes and $k = 2$. Values along axes represent numbers of positive instances for which $A_j$ was present.

However, we must still determine the probability of a given cell from the matrix. For those in the positive matrix, this is straightforward, since the attributes remain independent when the instance is a member of a conjunctive concept. Thus, we have

$$P(\mathit{cell}_u)_k = \prod_{j=1}^{r+i} P\left(y_j = \frac{u_j}{k}\right)$$

as the probability for $\mathit{cell}_u$ in the positive matrix, where $y_j$ represents the observed fraction of the $k$ instances in which attribute $A_j$ was present. Furthermore, the probability that one will observe $A_j$ in exactly $u_j$ out of $k$ such instances is

$$P\left(y_j = \frac{u_j}{k}\right) = \binom{k}{u_j} P(A_j \mid C)^{u_j} \, [1 - P(A_j \mid C)]^{k - u_j} .$$

In the absence of noise, we have $P(A_j \mid C) = 1$ for all relevant attributes and $P(A_j \mid C) = P(A_j)$ for all irrelevant attributes.

The calculation is more difficult for cells in the negative matrix. One cannot simply take the product of the probabilities for each index of the cell, since for a conjunctive concept, the attributes are not statistically independent. However, one can compute the probability that the $n - k$ observed negative instances will be composed of a particular combination of instances. If we let $P(I_j \mid \bar{C})$ be the probability of $I_j$ given a negative instance, we can use the multinomial distribution to compute the probability that exactly $d_1$ of the $n - k$ instances will be instance $I_1$, $d_2$ will be instance $I_2$, ..., and $d_w$ will be instance $I_w$. Thus, the expression

$$\frac{(n-k)!}{d_1! \, d_2! \cdots d_w!} \; P(I_1 \mid \bar{C})^{d_1} \, P(I_2 \mid \bar{C})^{d_2} \cdots P(I_w \mid \bar{C})^{d_w}$$

gives us the probability of a particular combination of negative instances, and from that combination we can compute the concept description (cell indices) that results. Of course, two or more combinations of instances may produce the same concept description, but one simply sums the probabilities for all such combinations to get the total probability for the cell. All that we need to make this operational is $P(I_j \mid \bar{C})$, the probability of $I_j$ given a negative instance. In the absence of noise, this is simply $P(I_j) / P(\bar{C})$, since $P(\bar{C} \mid I_j) = 1$.

We can extend the framework to handle class noise by modifying the definitions of three basic terms: $P(C)$, $P(A_j \mid C)$, and $P(I_j \mid \bar{C})$. One common definition of class noise involves the corruption of class names (i.e., replacing the actual class with its opposite) with a certain probability $z$ between 0 and 1. The probability of the class after one has corrupted values is

$$P(C') = (1 - z) P(C) + z \, (1 - P(C)) = P(C) [1 - 2z] + z ,$$

as we have noted elsewhere (Iba & Langley, 1992).

For an irrelevant attribute $A_j$, the probability $P(A_j \mid C)$ is unaffected by class noise and remains equal to $P(A_j)$, since the attribute is still independent of the class. However, the situation for relevant attributes is more complicated. By definition, we can reexpress the corrupted conditional probability of a relevant attribute $A_j$ given the (possibly corrupted) class $C'$ as

$$P(A_j \mid C') = \frac{P(A_j \wedge C')}{P(C')} ,$$

where $P(C')$ is the noisy class probability given above. Also, we can rewrite the numerator to specify the situations in which corruption of the class name does and does not occur, giving

$$P(A_j \mid C') = \frac{(1 - z) P(C) P(A_j \mid C) + z P(\bar{C}) P(A_j \mid \bar{C})}{P(C')} .$$

Since we know that $P(A_j \mid C) = 1$ for a relevant attribute, and since $P(A_j \mid \bar{C}) = [P(A_j) - P(C)] / P(\bar{C})$ for conjunctive concepts, we have

$$P(A_j \mid C') = \frac{(1 - z) P(C) + z \, [P(A_j) - P(C)]}{P(C) [1 - 2z] + z} ,$$

which involves only terms that existed before corruption of the class name.

We can use similar reasoning to compute the post-noise probability of any particular instance given that it is negative. As before, we can rewrite $P(I_j \mid \bar{C}')$ as

$$P(I_j \mid \bar{C}') = \frac{P(I_j \wedge \bar{C}')}{P(\bar{C}')} = \frac{(1 - z) P(\bar{C}) P(I_j \mid \bar{C}) + z P(C) P(I_j \mid C)}{P(\bar{C}) [1 - 2z] + z} .$$
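
The noise-adjusted class probability $P(C')$ and the conditional probability $P(A_j \mid C')$ for a relevant attribute, as derived above, are simple enough to evaluate directly. The following Python sketch is an illustration only; the function names and the example values are assumptions introduced here.

```python
def noisy_class_prob(p_c, z):
    """P(C') = P(C)[1 - 2z] + z: class probability when labels flip with probability z."""
    return p_c * (1.0 - 2.0 * z) + z


def noisy_relevant_attr_prob(p_a, p_c, z):
    """P(A_j | C') for a relevant attribute of a monotone conjunctive concept."""
    numerator = (1.0 - z) * p_c + z * (p_a - p_c)
    return numerator / noisy_class_prob(p_c, z)


def noisy_irrelevant_attr_prob(p_a, z):
    """P(A_j | C') for an irrelevant attribute, which class noise leaves unchanged."""
    return p_a


# Assumed example: P(A_j) = 0.5 for a relevant attribute and P(C) = 0.25.
for z in (0.0, 0.1, 0.2, 0.5):
    print(z, noisy_class_prob(0.25, z), noisy_relevant_attr_prob(0.5, 0.25, z))
```

Two limiting cases give a quick sanity check: with $z = 0$ the functions return $P(C)$ and 1, and with $z = 0.5$ the class label carries no information, so $P(A_j \mid C')$ falls back to $P(A_j)$.
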
For $P(I_j \mid \bar{C}')$, however, the special conditions are somewhat different. For a negative instance, we have $P(I_j \mid C) = 0$, so that the second term in the numerator becomes zero. In contrast, for a positive instance, we have $P(I_j \mid \bar{C}) = 0$, so that the first term disappears. Taken together, these conditions let us generate probabilities for cells in the negative matrix after one has added noise to the class name.

After replacing $P(C)$ with $P(C')$, $P(A_j \mid C)$ with $P(A_j \mid C')$, and $P(I_j \mid \bar{C})$ with $P(I_j \mid \bar{C}')$, the expressions earlier in this section let us compute the probability that a Bayesian classifier will induce any particular pair of concept descriptions (cells in the two matrices). The information necessary for this calculation is the number of training instances, the number of relevant and irrelevant attributes, their distributions, and the level of class noise. This analysis holds only for monotone conjunctive concepts and in domains with independent attributes, but many of the ideas should carry over to less restricted classes of domains.

Accuracy of Induced Concepts

To calculate overall accuracy after $n$ training instances, we must sum the expected accuracy for each possible instance weighted by that instance's probability of occurrence. More formally, the expected accuracy is

$$K_n = \sum_{j} P(I_j) \, K(I_j)_n .$$

To compute the expected accuracy $K(I_j)_n$ for instance $I_j$, we must determine, for each pair of cells in the positive and negative matrices, the instance's classification. A test instance $I_j$ is classified by computing its score for each class description and selecting the class with the highest score (choosing randomly in case of ties). We will define $\mathit{accuracy}(I_j)_{n,u,v}$ for the pair of concept descriptions $u$ and $v$ to be $1$ if this scheme correctly predicts $I_j$'s class, $0$ if it incorrectly predicts the class, and $1/2$ if a tie occurs.

Following our previous notation, let $n$ be the number of observed instances, $k$ be the number of observed positive instances, $u_j$ be the number of positive instances in which attribute $A_j$ occurs, and $v_j$ be the number of negative instances in which $A_j$ occurs. For a given instance $I_j$, one can compute the score for the positive class description as

$$\mathit{score}(C)_j = \frac{k}{n} \prod_{j=1}^{r+i} \begin{cases} \dfrac{u_j}{k} & \text{if } A_j \text{ is present in } I_j \\[1ex] \dfrac{k - u_j}{k} & \text{otherwise} \end{cases}$$

and an analogous equation for the negative class, substituting $n - k$ for $k$ and $v$ for $u$. To avoid multiplying by $0$ when an attribute has never (always) been observed in the training instances but is (is not) present in the test instance, we follow Clark and Niblett's (1989) suggestion of replacing $0$ with a small value, such as one proportional to $1/n$.

To compute the expected accuracy for instance $I_j$, we sum, over all possible values of $k$ and pairs of concept descriptions, the product of the probability of selecting the particular pair of concept descriptions after $k$ positive instances and the pair's accuracy on $I_j$. Thus, we have

$$K(I_j)_n = \sum_{k=0}^{n} \sum_{u} \sum_{v} P(k, u, v)_n \; \mathit{accuracy}(I_j)_{n,u,v} ,$$

where the second and third summations occur over the possible vectors that index into the positive matrix and the negative matrix. To complete our calculations, we need an expression for $P(I_j)$, which is the product of the probabilities of features present in $I_j$.

Implications for Learning Behavior

Although the equations in the previous sections give a formal description of the Bayesian classifier's behavior, their implications are not obvious. In this section, we examine the effects of various domain characteristics on the algorithm's classification accuracy. However, because the number of possible concept descriptions grows exponentially with the number of training instances and the number of attributes, our predictions have been limited to a small number of each.

In addition to theoretical predictions, we report learning curves that summarize runs on randomly generated training sets. Each curve reports the average classification accuracy over these runs on a single test set of randomly generated instances containing no noise. In each case, we bound the mean accuracy with confidence intervals to show the degree to which our predicted learning curves fit the observed ones. These experimental results provide an important check on our reasoning, and they revealed a number of problems during development of the analysis.

Figure 2(a) shows the effects of concept complexity on the rate of learning in the Bayesian classifier when no noise is present. In this case, we hold the number of irrelevant attributes $i$ constant at one, and we hold their probability of occurrence $P(A_j)$ constant. We vary both the number of training instances and the number of relevant attributes $r$, which determine the complexity of the target concept. To normalize for effects of the base rate, we also hold $P(C)$, the probability of the concept, constant;² this means that, for each of the $r$ relevant attributes, $P(A_j)$ is $P(C)^{1/r}$, and thus is varied for the different conditions.

² An alternative approach would hold $P(A_j)$ constant for relevant attributes, causing $P(C)$ to become $P(A_j)^r$. This nudges the initial accuracies upward, but otherwise has little effect on the learning curves.

As is typical with learning curves, the initial accuracies begin low and gradually improve with increasing numbers of training instances. The effect of concept complexity also agrees with our intuitions.
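
The experimental side of such learning curves can be approximated with a short simulation. The Python sketch below trains a Bayesian classifier on randomly generated instances of a monotone conjunctive concept and estimates its accuracy on noise-free test instances, counting ties as one half. Every setting in it is an assumption chosen for illustration rather than a value from the paper: $P(C) = 0.5$, a probability of 0.5 for irrelevant attributes, a zero-replacement value of $1/(2n)$, a score of zero for a class that was never observed, and the run and test-set sizes.

```python
import random


def draw_instance(rng, probs):
    """Sample independent Boolean attributes with the given presence probabilities."""
    return tuple(rng.random() < p for p in probs)


def in_concept(instance, r):
    """Monotone conjunction: positive iff the first r (relevant) attributes are present."""
    return all(instance[:r])


def train(instances, labels, num_attrs):
    """Per class, count instances and how often each attribute was present."""
    counts = {True: 0, False: 0}
    present = {True: [0] * num_attrs, False: [0] * num_attrs}
    for instance, label in zip(instances, labels):
        counts[label] += 1
        for j, value in enumerate(instance):
            present[label][j] += int(value)
    return counts, present


def score(instance, label, counts, present, n):
    """Base rate times attribute frequencies for one class, replacing zero factors."""
    k = counts[label]
    if k == 0:
        return 0.0                      # assumed: an unseen class has base rate zero
    small = 1.0 / (2 * n)               # assumed replacement for zero frequencies
    result = k / n
    for j, value in enumerate(instance):
        u = present[label][j]
        factor = u / k if value else (k - u) / k
        result *= factor if factor > 0 else small
    return result


def run_accuracy(rng, n, r, i, z, test_size):
    """Train on n (possibly noisy) instances; return accuracy on noise-free tests."""
    p_relevant = 0.5 ** (1.0 / r)       # chosen so that P(C) = 0.5
    probs = [p_relevant] * r + [0.5] * i
    train_x = [draw_instance(rng, probs) for _ in range(n)]
    # Flip each class label independently with probability z (class noise).
    train_y = [in_concept(x, r) != (rng.random() < z) for x in train_x]
    counts, present = train(train_x, train_y, r + i)
    total = 0.0
    for _ in range(test_size):
        x = draw_instance(rng, probs)
        pos = score(x, True, counts, present, n)
        neg = score(x, False, counts, present, n)
        if pos == neg:
            total += 0.5                # a tie is resolved at random
        elif (pos > neg) == in_concept(x, r):
            total += 1.0
    return total / test_size


def learning_curve(sizes, r=2, i=1, z=0.0, runs=30, test_size=200, seed=0):
    """Mean accuracy over several runs for each training-set size."""
    rng = random.Random(seed)
    return [
        sum(run_accuracy(rng, n, r, i, z, test_size) for _ in range(runs)) / runs
        for n in sizes
    ]


if __name__ == "__main__":
    sizes = [1, 5, 10, 20, 40]
    for n, acc in zip(sizes, learning_curve(sizes)):
        print(n, round(acc, 3))
```

Varying `r` or `z` in `learning_curve` gives a rough empirical counterpart to the two conditions examined in this section, although it does not replace the analytic predictions.
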
Introducing additional features into the target concept slows the learning rate, but does not affect asymptotic accuracy, which is always 1 for conjunctive concepts on noise-free test cases. The rate of learning appears to degrade gracefully with increasing complexity. The predicted and observed learning curves are in close agreement, which lends confidence to our average-case analysis. Theory and experiment show similar effects when we vary the number of irrelevant attributes: learning rate slows as we introduce misleading features, but the algorithm gradually converges on perfect accuracy.

[Figure 2: two panels plotting probability of correct classification (0.5 to 1) against number of training instances (0 to 40). Panel (a): 1, 2, and 3 relevants. Panel (b): class noise = 0.0, 0.1, and 0.2.]
Figure 2. Predictive accuracy of a Bayesian classifier in a conjunctive concept, assuming the presence of one irrelevant attribute, as a function of training instances and (a) number of relevant attributes and (b) amount of class noise. The lines represent theoretical learning curves, whereas the error bars indicate experimental results.

Figure 2(b) presents similar results on the interaction between class noise and the number of training instances. Here we hold the number of relevant attributes constant at two and the number of irrelevants constant at one, and we examine three separate levels of class noise. Following the analysis, we assume the test instances are free of noise, which normalizes accuracies and eases comparison. As one might expect, increasing the noise level $z$ decreases the rate of learning. However, the probabilistic nature of the Bayesian classifier leads to graceful degradation, and asymptotic accuracy should be unaffected. We find a close fit between the theoretical behavior and the experimental learning curves. Although our analysis does not incorporate attribute noise, experiments with this factor produce similar results. In this case, equivalent levels lead to somewhat slower learning rates, as one would expect given that attribute noise can corrupt multiple values, whereas class noise affects only one.

Finally, we can compare the behavior of the Bayesian classifier to that of Wholist (Pazzani & Sarrett, 1990). One issue of interest is the number of training instances required to achieve some criterion level of accuracy. A quantitative comparison of this nature is beyond the scope of this paper, but the respective analyses and experiments show that the Wholist algorithm is only affected by the number of irrelevant attributes, whereas the Bayesian classifier is sensitive to both the number of relevant and irrelevant attributes. However, the Bayesian classifier is robust with respect to noise, whereas the Wholist algorithm is not.

Discussion

In this paper we have presented an analysis of a Bayesian classifier. Our treatment requires that the concept be monotone conjunctive, that instances be free of attribute noise, and that attributes be Boolean and independent. Given information about the number of relevant and irrelevant attributes, their frequencies, and the level of class noise, our equations compute the expected classification accuracy after a given number of training instances.

To explore the implications of the analysis, we have plotted the predicted behavior of the algorithm as a function of the number of training instances, the number of relevant attributes, and the amount of noise, finding graceful degradation as the latter two increase. As a check on our analysis, we run the algorithm on artificial domains with the same characteristics. We obtain close fits to the predicted behavior, but only after correcting several errors in our reasoning that the empirical studies revealed.

In additional experiments, we compare the behavior of the Bayesian classifier to that of a reimplementation of C4, a more widely used algorithm that induces decision trees. In general, the probabilistic method performs comparably to C4, despite the latter's greater sophistication. These results suggest that such simple methods deserve increased attention in future studies, whether theoretical or experimental.

In future work, we plan to extend this analysis in several ways. In particular, our current equations handle only class noise, but as Angluin and Laird (1988) have shown, attribute noise can be even more problematic for learning algorithms. We have developed
tentative equations for the case of attribute noise, but the expressions are more complex than for class noise, in that the possible corruption of any combination of attributes can make any instance appear like another. We also need to relax the constraint that target concepts must be monotone conjunctive.

Another direction in which we can extend the present work involves running additional experiments. Even within the assumptions of the current analysis, we could empirically study the extent to which violated assumptions alter the observed behavior of the algorithm. In addition, we could analyze the attribute frequencies in several of the domains commonly used in experiments to determine the analytic model's ability to predict behavior on these domains given their frequencies as input. This approach would extend the usefulness of our average-case model beyond the artificial domains on which we have tested it to date.

Overall, we are encouraged by the results that we have obtained. We have demonstrated that a simple Bayesian classifier compares favorably with a more sophisticated induction algorithm and, more important, we have characterized its average-case behavior for a restricted class of domains. Our analysis confirms intuitions about the robustness of the Bayesian algorithm in the face of noise and concept complexity, and it provides fertile ground for further research on this understudied approach to induction.

Acknowledgements

Thanks to Stephanie Sage, Kimball Collins, and Andy Philips for discussions that helped clarify our ideas.

References

Allen, J., & Langley, P. (1990). Integrating memory and search in planning. Proceedings of the Workshop on Innovative Approaches to Planning, Scheduling, and Control. San Diego: Morgan Kaufmann.

Angluin, D., & Laird, P. (1988). Learning from noisy examples. Machine Learning.

Buntine, W., & Caruana, R. (1991). Introduction to IND and recursive partitioning (Technical Report FIA). Moffett Field, CA: NASA Ames Research Center, Artificial Intelligence Research Branch.

Cheeseman, P., Kelly, J., Self, M., Stutz, J., Taylor, W., & Freeman, D. (1988). AutoClass: A Bayesian classification system. Proceedings of the Fifth International Conference on Machine Learning. Ann Arbor, MI: Morgan Kaufmann.

Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning.

Cohen, P. R., & Howe, A. E. (1988). How evaluation guides AI research. AI Magazine.

Cooper, G. F., & Herskovits, E. (1991). A Bayesian method for constructing Bayesian belief networks from databases. Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence. Los Angeles: Morgan Kaufmann.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: John Wiley & Sons.

Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning.

Haussler, D. (1990). Probably approximately correct learning. Proceedings of the Eighth National Conference on Artificial Intelligence. Boston: AAAI Press.

Hirschberg, D. S., & Pazzani, M. J. (1991). Average-case analysis of a k-CNF learning algorithm (Technical Report). Irvine: University of California, Department of Information & Computer Science.

Iba, W., & Gennari, J. H. (1991). Learning to recognize movements. In D. H. Fisher, M. J. Pazzani, & P. Langley (Eds.), Concept formation: Knowledge and experience in unsupervised learning. San Mateo: Morgan Kaufmann.

Iba, W., & Langley, P. (1992). Induction of one-level decision trees. Proceedings of the Ninth International Conference on Machine Learning. Aberdeen: Morgan Kaufmann.

Kibler, D., & Langley, P. (1988). Machine learning as an experimental science. Proceedings of the Third European Working Session on Learning. Glasgow: Pittman.

Murphy, P. M., & Aha, D. W. (1992). UCI Repository of machine learning databases (Machine-readable data repository). Irvine: University of California, Department of Information & Computer Science.

Opper, M., & Haussler, D. (1991). Calculation of the learning curve of Bayes optimal classification algorithm for learning a perceptron with noise. Proceedings of the Fourth Annual Workshop on Computational Learning Theory. Santa Cruz: Morgan Kaufmann.

Pazzani, M. J., & Sarrett, W. (1990). Average-case analysis of conjunctive learning algorithms. Proceedings of the Seventh International Conference on Machine Learning. Austin, TX: Morgan Kaufmann.

Thompson, K., & Langley, P. (1991). Concept formation in structured domains. In D. H. Fisher, M. J. Pazzani, & P. Langley (Eds.), Concept formation: Knowledge and experience in unsupervised learning. San Mateo: Morgan Kaufmann.