To appear in Proc. ICML

Machine Learning by Function Decomposition

Blaž Zupan, Jožef Stefan Institute, Ljubljana, Slovenia (blaz.zupan@ijs.si)
Marko Bohanec, Jožef Stefan Institute, Ljubljana, Slovenia (marko.bohanec@ijs.si)
Ivan Bratko, Faculty of Computer and Information Sciences and Jožef Stefan Institute, Ljubljana, Slovenia (ivan.bratko@fri.uni-lj.si)
Janez Demšar, Faculty of Computer and Information Sciences, University of Ljubljana, Ljubljana, Slovenia (janez.demsar@fri.uni-lj.si)
Abstract

We present a new machine learning method that, given a set of training examples, induces a definition of the target concept in terms of a hierarchy of intermediate concepts and their definitions. This effectively decomposes the problem into smaller, less complex problems. The method is inspired by the Boolean function decomposition approach to the design of digital circuits. To cope with the high time complexity of finding an optimal decomposition, we propose a suboptimal heuristic algorithm. The method, implemented in the program HINT (HIerarchy INduction Tool), is experimentally evaluated using a set of artificial and real-world learning problems. It is shown that the method performs well both in terms of classification accuracy and discovery of meaningful concept hierarchies.

1 INTRODUCTION

To solve a complex problem, one of the most general approaches is to decompose it into smaller, less complex and more manageable subproblems. In machine learning, this principle is a foundation for structured induction (Shapiro): instead of learning a single complex classification rule from examples, define a goal-subgoal hierarchy and learn the rules for each of the subgoals. Originally, Shapiro used structured induction for the classification of a fairly complex chess endgame and demonstrated that the complexity and comprehensibility ("brain-compatibility") of the obtained solution was superior to the unstructured one. Typically, applications of structured induction involve a manual development of the hierarchy and a manual selection of examples to induce the classification rules; usually this is a tiresome process that requires active availability of a domain expert over long periods of time. Considerable improvements in this respect may be expected from methods that automate or at least actively support the user in the problem decomposition task.

In this paper we present a method for developing a problem decomposition hierarchy from examples and investigate its applicability in machine learning. The method is based on function decomposition, an approach originally developed for the design of digital circuits (Ashenhurst; Curtis). The goal is to decompose a function $y = F(X)$ into $y = G(A, H(B))$, where $X$ is a set of input attributes $x_1, \ldots, x_n$, and $y$ is the class variable. $F$, $G$ and $H$ are functions partially specified by examples, i.e., by sets of attribute-value vectors with assigned classes. $A$ and $B$ are subsets of input attributes such that $A \cup B = X$. The functions $G$ and $H$ are determined in the decomposition process and are not predefined in any way. Their joint complexity (determined by some complexity measure) should be lower than the complexity of $F$. Such a decomposition also discovers a new intermediate concept $c = H(B)$. Since the decomposition can be applied recursively on $H$ and $G$, the result in general is a hierarchy of concepts. For each concept in the hierarchy, there is a corresponding function (such as $H(B)$) that determines the dependency of that concept on its immediate descendants in the hierarchy.

The proposed decomposition method is limited to nominal-valued attributes and classes. It was implemented in the program HINT (HIerarchy INduction Tool). In this paper we do not describe the specific noise handling mechanism in HINT.

The remainder of the paper is organized as follows. Section 2 overviews the related work. The learning method is described in detail in section 3 and experimentally evaluated in section 4 on several domains of different complexity. The paper is concluded by a summary and possible directions of further work.

2 RELATED WORK

The decomposition approach to machine learning was used by a pioneer of artificial intelligence, A. Samuel. He proposed a method based on a signature table system (Samuel) and successfully used it as an evaluation mechanism for his checkers playing programs. This approach was later improved by Biermann et al. Their method, however, did not address the problem of deriving the structure of concepts.

A similar approach had been defined even earlier within the area of switching circuit design. Ashenhurst reported on a unified theory of decomposition of switching functions. His decomposition method was essentially the same as that of Biermann et al., except that it was used to decompose a truth table of a specific Boolean function to be then realized with standard binary gates. Most of the other related work of those times is reported and reprinted by Curtis.

Recently, the Ashenhurst-Curtis approach was substantially improved by the research groups of M. A. Perkowski, T. Luba, and T. D. Ross. Perkowski et al. report on the decomposition approach for incompletely specified switching functions. Luba proposes a method for the decomposition of multi-valued switching functions in which each multi-valued variable is encoded by a set of Boolean variables. The authors identify the potential usefulness of function decomposition for machine learning. Goldman et al. evaluate FLASH, a Boolean function decomposer, on a set of eight-attribute binary functions and show its robustness in comparison with the C4.5 decision tree inducer.

Feature discovery has been widely investigated by constructive induction (Michalski). Perhaps closest to the function decomposition method are the constructive induction systems that use a set of existing attributes and a set of predefined constructive operators to derive new attributes (Pfahringer; Ragavan and Rendell).

Within machine learning, there are other approaches that are based on problem decomposition, but where the problem is decomposed by the expert and not discovered by a machine. A well-known example is structured induction (a term introduced by Donald Michie), applied by Shapiro. Their approach is based on a manual decomposition of the problem and an expert-assisted selection of examples to construct rules for the concepts in the hierarchy. In comparison with standard decision tree induction techniques, structured induction exhibits about the same classification accuracy with increased transparency and lower complexity of the developed models. Michie emphasized the important role of structured induction and listed several real problems that were solved in this way.

The concept hierarchy has also been used by a multi-attribute decision support expert system shell DEX (Bohanec and Rajkovič). There, a tree-like structure of variables is defined by a domain expert. DEX has been successfully applied to a number of realistic decision-making problems.

The method presented in this paper therefore borrows from three different research areas: it shares the motivation with structured induction and the structured approach to decision support, while the core of the method is based on Ashenhurst-Curtis function decomposition. In comparison with related work, the present paper is original in the following aspects: a new method for handling multi-valued attributes and classes, improved decomposition heuristics, emphasis on generalization effects of decomposition, strong attention to the discovery of meaningful concept hierarchies, and experimental evaluation on machine learning problems.

3 DECOMPOSITION METHOD

This section presents the decomposition method. First we introduce the method by an example. Next, we formally present the decomposition algorithm and conclude with a note on the implementation.

Suppose a function $y = F(x_1, x_2, x_3)$ is given, where $x_1$, $x_2$ and $x_3$ are attributes and $y$ is the target concept. $y$, $x_1$ and $x_2$ can take the values lo, med, hi; $x_3$ can take the values lo, hi. The function $F$ is partially specified with a set of examples in Table 1.

There are three non-trivial partitions of the attributes: $\langle x_1 \rangle | \langle x_2, x_3 \rangle$, $\langle x_2 \rangle | \langle x_1, x_3 \rangle$, and $\langle x_3 \rangle | \langle x_1, x_2 \rangle$, and three corresponding decompositions $y = G_1(x_1, H_1(x_2, x_3))$, $y = G_2(x_2, H_2(x_1, x_3))$, and $y = G_3(x_3, H_3(x_1, x_2))$. These decompositions are given in Figure 1. The comparison shows that:

Figure 1: Three different decompositions of the example set from Table 1.
• Example sets in the decomposition $y = G_1(x_1, H_1(x_2, x_3))$ are overall smaller than those for the other two decompositions.

• The new concept $c_1 = H_1(x_2, x_3)$ uses only three values, whereas that for $H_2(x_1, x_3)$ uses four and that for $H_3(x_1, x_2)$ uses five.

• By inspecting the example sets for $H_1$ and $G_1$, it is easy to see that $c_1$ corresponds to $\mathrm{MIN}(x_2, x_3)$ and $y$ to $\mathrm{MAX}(x_1, c_1)$. It is harder to interpret the sets of examples for $G_2$, $H_2$, $G_3$ and $H_3$.

Among the three attribute partitions it is therefore beneficial to decide for $\langle x_1 \rangle | \langle x_2, x_3 \rangle$ and decompose $y = F(x_1, x_2, x_3)$ to $y = G_1(x_1, c_1)$ and $c_1 = H_1(x_2, x_3)$.

x1    x2    x3    y
lo    lo    lo    lo
lo    lo    hi    lo
lo    med   lo    lo
lo    med   hi    med
lo    hi    lo    lo
lo    hi    hi    hi
med   med   lo    med
med   hi    lo    med
med   hi    hi    hi
hi    lo    lo    hi
hi    hi    lo    hi

Table 1: Set of examples that partially describe the function $y = F(x_1, x_2, x_3)$.

The core of the decomposition algorithm is a single-step decomposition which, given a set of examples $E_F$ that partially specify the function $c = F(X)$ and a partition of attributes $X$ to sets $A$ and $B$, decomposes $F$ into $c = G(A, c_1)$ and $c_1 = H(B)$. This is done by constructing the example sets $E_G$ and $E_H$ that partially specify $G$ and $H$, respectively. $X$ is a set of attributes $x_1, \ldots, x_m$, and $c_1$ is a new intermediate concept. $A$ is called a free set and $B$ a bound set, such that $A \cup B = X$ and $A \cap B = \emptyset$. $E_G$ and $E_H$ are discovered in the decomposition process and are not predefined in any way.

The single-step decomposition starts with the derivation of a partition matrix.

Definition 1: Given a disjoint partition of $X$ to $A|B$, a partition matrix $P_{A|B}$ is a tabular representation of the example set $E_F$ with all combinations of values of attributes in $A$ as row labels and of $B$ as column labels. Each example $e \in E_F$ has its corresponding entry in $P_{A|B}$ with a row index $A(e)$ and a column index $B(e)$. Entries of $P_{A|B}$ with no corresponding examples in $E_F$ are denoted with "-". A column $a$ of $P_{A|B}$ is called non-empty if there exists $e \in E_F$ such that $B(e) = a$.

Each column in the partition matrix denotes the behavior of $F$ when the attributes in the bound set are constant. Columns that exhibit the same behavior are called compatible and can be represented with the same value of $c_1$. An example partition matrix is given in Figure 2a:

x2      lo    lo    med   med   hi    hi
x3      lo    hi    lo    hi    lo    hi
x1
lo      lo    lo    lo    med   lo    hi
med     -     -     med   -     med   hi
hi      hi    -     -     -     hi    -
c       1     1     1     2     1     3

Definition 2: Columns $a$ and $b$ of partition matrix $P_{A|B}$ are compatible if $F(e_1) = F(e_2)$ for every pair of examples $e_1, e_2 \in E_F$ with $A(e_1) = A(e_2)$ and $B(e_1) = a$, $B(e_2) = b$. The number of such pairs is denoted $d(a, b)$.
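Definitions 1 and 2 can be illustrated with a minimal Python sketch on the examples of Table 1. The names `partition_matrix`, `compatible` and `d` are chosen here for illustration and are not taken from HINT; the matrix is stored sparsely as a dictionary of columns, so unspecified entries are simply absent. As an assumption of this sketch, `d` counts the rows in which both columns are specified.

```python
# Examples from Table 1: (x1, x2, x3) -> y
ATTRS = ("x1", "x2", "x3")
EXAMPLES = {
    ("lo", "lo", "lo"): "lo",
    ("lo", "lo", "hi"): "lo",
    ("lo", "med", "lo"): "lo",
    ("lo", "med", "hi"): "med",
    ("lo", "hi", "lo"): "lo",
    ("lo", "hi", "hi"): "hi",
    ("med", "med", "lo"): "med",
    ("med", "hi", "lo"): "med",
    ("med", "hi", "hi"): "hi",
    ("hi", "lo", "lo"): "hi",
    ("hi", "hi", "lo"): "hi",
}


def project(example, names):
    """Values of the named attributes in an example, i.e. A(e) or B(e)."""
    return tuple(example[ATTRS.index(n)] for n in names)


def partition_matrix(examples, free, bound):
    """Sparse partition matrix: column B(e) -> {row A(e): class}."""
    matrix = {}
    for e, y in examples.items():
        matrix.setdefault(project(e, bound), {})[project(e, free)] = y
    return matrix


def shared_rows(col_a, col_b):
    """Rows in which both columns have a specified entry."""
    return col_a.keys() & col_b.keys()


def compatible(col_a, col_b):
    """Definition 2: the columns agree wherever both are specified."""
    return all(col_a[r] == col_b[r] for r in shared_rows(col_a, col_b))


def d(col_a, col_b):
    """Number of example pairs compared for the two columns."""
    return len(shared_rows(col_a, col_b))


# Partition <x1> | <x2, x3> from the introductory example
P = partition_matrix(EXAMPLES, free=("x1",), bound=("x2", "x3"))
print(compatible(P[("lo", "lo")], P[("hi", "lo")]))   # agree in rows lo and hi
print(compatible(P[("lo", "lo")], P[("med", "hi")]))  # differ in row lo
```

Because unspecified entries never appear in `shared_rows`, they are treated as compatible with any value.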

Note that, according to this definition, the unspecified entries of $P_{A|B}$ are compatible with any value. The number of values for $c_1$ corresponds to the number of groups of mutually compatible columns. The lowest number of such groups is called column multiplicity and is denoted by $\nu(A|B)$. It is derived by coloring the column incompatibility graph.

Definition 3: The column incompatibility graph $I_{A|B}$ is a pair $(V, E)$, where each non-empty column $i$ of $P_{A|B}$ is represented by a vertex $v_i \in V$ and an edge $(v_i, v_j) \in E$ connects two vertices if the corresponding columns of $v_i$ and $v_j$ are incompatible.

Figure 2: Partition matrix with column labels ($c$) for the attribute partition $\langle x_1 \rangle | \langle x_2, x_3 \rangle$ and the set of examples from Table 1 (a), and the corresponding column incompatibility graph (b). Colors (labels) of the vertices are circled.

Then $\nu(A|B)$ is the number of colors needed to color $I_{A|B}$. Namely, a proper coloring guarantees that two vertices representing incompatible columns are not assigned the same color. The same color is only assigned to columns that are compatible. Therefore, the optimal coloring discovers the lowest number of groups of compatible columns of $P_{A|B}$. An example of a colored incompatibility graph is given in Figure 2b.

Graph coloring is an NP-hard problem, and the computation time of an exhaustive search algorithm is prohibitive even for fairly small graphs. Instead, Perkowski et al. suggested a Color Influence Method of polynomial complexity and showed that the method performed well compared to the optimal algorithm. The Color Influence Method sorts the vertices to color by their decreasing connectivity and then assigns to each vertex a color that is different from the colors of its neighbors, so that a minimal number of colors is used. We use the same coloring method with the following improvement: when a color is to be assigned to a vertex $v$ and several compatible vertices have already been colored with different colors, the color is chosen that is used for the group of colored vertices $v_1, \ldots, v_k$ that are most compatible to $v$. The degree of compatibility is estimated from $d(v, v_i)$ (see Definition 2 for $d$).

Each vertex in $I_{A|B}$ denotes a distinct combination of values of attributes in $B$, and its label (color) denotes the value of $c_1$. It is therefore straightforward to derive an example set $E_H$ from the colored $I_{A|B}$. The attribute set for these examples is $B$. Each vertex in $I_{A|B}$ is an example in set $E_H$. The color $c_1$ of the vertex is the class of the example.

$E_G$ is derived as follows. For any value of $c_1$ and combination of values of attributes in $A$, $c = G(A, c_1)$ is determined by looking for an example $e$ in row $A(e)$ and in any column labeled with the value of $c_1$. If such an example exists, an example with attribute set $A \cup \{c_1\}$ and class $c = F(e)$ is added to $E_G$.

Decomposition generalizes every undefined entry of $P_{A|B}$ in row $a$ and column $b$ if a corresponding example $e$ with $a = A(e)$ and column $B(e)$ with the same label as $b$ is found. For example, the entry $P[\langle \mathrm{hi} \rangle, \langle \mathrm{lo}, \mathrm{hi} \rangle]$ of the partition matrix in Figure 2a was generalized to hi because the column $\langle \mathrm{lo}, \mathrm{hi} \rangle$ has the same label as columns $\langle \mathrm{lo}, \mathrm{lo} \rangle$ and $\langle \mathrm{hi}, \mathrm{lo} \rangle$.

In our implementation, the incompatibility graph is constructed directly from the set of examples, avoiding the construction of the partition matrix for efficiency reasons. The algorithm first sorts the examples $E_F$ based on the values of attributes in $A$ and the values of $c$.
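The single-step decomposition described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not HINT's implementation: it builds the columns through a sparse partition matrix rather than directly from sorted examples, and it colors the incompatibility graph with a plain greedy heuristic in order of decreasing connectivity, without the compatibility-based choice among candidate colors. All function names are hypothetical, and color numbers are arbitrary labels.

```python
# Examples from Table 1: (x1, x2, x3) -> y
ATTRS = ("x1", "x2", "x3")
EXAMPLES = {
    ("lo", "lo", "lo"): "lo", ("lo", "lo", "hi"): "lo",
    ("lo", "med", "lo"): "lo", ("lo", "med", "hi"): "med",
    ("lo", "hi", "lo"): "lo", ("lo", "hi", "hi"): "hi",
    ("med", "med", "lo"): "med", ("med", "hi", "lo"): "med",
    ("med", "hi", "hi"): "hi", ("hi", "lo", "lo"): "hi",
    ("hi", "hi", "lo"): "hi",
}


def project(example, names):
    return tuple(example[ATTRS.index(n)] for n in names)


def columns(examples, free, bound):
    """Non-empty columns of the partition matrix: B(e) -> {A(e): class}."""
    cols = {}
    for e, y in examples.items():
        cols.setdefault(project(e, bound), {})[project(e, free)] = y
    return cols


def incompatible(col_a, col_b):
    """True if the columns disagree in some row where both are specified."""
    return any(col_a[r] != col_b[r] for r in col_a.keys() & col_b.keys())


def color_columns(cols):
    """Greedy coloring of the incompatibility graph (simplified heuristic)."""
    names = sorted(cols)
    nbrs = {v: {u for u in names if u != v and incompatible(cols[u], cols[v])}
            for v in names}
    color = {}
    for v in sorted(names, key=lambda v: -len(nbrs[v])):
        used = {color[u] for u in nbrs[v] if u in color}
        color[v] = next(c for c in range(1, len(names) + 1) if c not in used)
    return color


def decompose(examples, free, bound):
    """Return (E_H, E_G) for the partition free | bound."""
    cols = columns(examples, free, bound)
    color = color_columns(cols)
    e_h = dict(color)                      # bound values -> value of c1
    e_g = {}
    for col, rows in cols.items():
        for row, y in rows.items():
            e_g[row + (color[col],)] = y   # (free values, c1) -> class
    return e_h, e_g


def column_multiplicity(examples, free, bound):
    """Number of colors used; an upper bound on the optimal multiplicity."""
    return len(set(color_columns(columns(examples, free, bound)).values()))


for free, bound in [(("x1",), ("x2", "x3")),
                    (("x2",), ("x1", "x3")),
                    (("x3",), ("x1", "x2"))]:
    print(free, bound, column_multiplicity(EXAMPLES, free, bound))

e_h, e_g = decompose(EXAMPLES, ("x1",), ("x2", "x3"))
```

On the examples of Table 1 the greedy coloring uses three, four and five colors for the three partitions, matching the number of values of the new concept reported in the introductory example. The entry for row hi and column (lo, hi), which is unspecified in the partition matrix, is covered in `e_g` by the example with `x1 = hi` and the label shared by that column.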
Examples with the same $A(e)$ constitute groups that correspond to rows in the partition matrix $P_{A|B}$. Within each group, examples with the same value of $c$ constitute subgroups. Two examples that are in the same group but in different subgroups have a corresponding edge in $I_{A|B}$.

The decomposition aims to discover a hierarchy of concepts described with example sets that are overall less complex than the initial one. Since an exhaustive search is prohibitively complex, the decomposition uses a suboptimal iterative algorithm (Algorithm 1). In each step the algorithm tries to decompose a single example set of the evolving structure. It evaluates all possible disjoint partitions of the attributes and selects the best one. This step requires a so-called partition selection measure. A possible measure is the number of values of the new concept, $\nu(A|B)$. The best partition $A|B$ is the one with the lowest $\nu(A|B)$.

An alternative measure for the selection of partitions is based on the complexity of function $F$. Let $F$ be defined on attributes $x_i \in X$ with class variable $y$. In this attribute-class space there are a total of $N(X, y) = \|y\|^{\prod_i \|x_i\|}$ possible functions, where $\|y\|$ and $\|x_i\|$ represent the cardinalities of the value sets of $y$ and $x_i$, respectively. The number of bits to encode $F$ is therefore $\theta(F) = \log_2 N(X, y) = \log_2 \|y\| \prod_i \|x_i\|$. Decomposition prefers to discover functions of low complexity, so the measure is defined as $\psi(A|B) = \theta(G) + \theta(H)$.

The decomposition algorithm will decompose $E_F$, and the function $F$ it partially represents, only if its decomposed functions $G$ and $H$ are overall less complex than $F$. Therefore, the partition $A|B$ can be used to decompose $E_F$ to $E_G$ and $E_H$ if and only if $\psi(A|B) < \theta(F)$. We say that an example set $E_F$ is decomposable if there exists a partition $A|B$ with this property.

Input: Initial set of examples describing a single output concept
Output: Its hierarchical decomposition

get an initial example set $E_F$ and mark it decomposable
while there exists a decomposable example set $E_F$ that partially specifies $c = F(x_1, \ldots, x_m)$ with $m > 2$ do
    evaluate all possible partitions $A|B$ of $X = \langle x_1, \ldots, x_m \rangle$ such that $A \cup B = X$, $A \cap B = \emptyset$, and $\|B\| \le b$
    select the best partition $A|B$
    if $E_F$ is decomposable using $A|B$ then
        decompose $E_F$ to $E_G$ and $E_H$, such that $c = G(A, c_1)$ and $c_1 = H(B)$, where $G$ and $H$ are partially specified by $E_G$ and $E_H$
        mark $E_G$ and $E_H$ decomposable
    else mark $E_F$ non-decomposable

Algorithm 1: The decomposition algorithm.

The time complexity of a single-step decomposition of $E_F$ to $E_G$ and $E_H$, which consists of sorting $E_F$, deriving the incompatibility graph and coloring it, is $O(N \log N) + O(N k) + O(k^2)$, where $N$ is the number of examples in $E_F$ and $k$ is the number of vertices in $I_{A|B}$. For any bound set $B$, the upper bound of $k$ is $(\max_j \|x_j\|)^b$, where $b = \|B\|$. The number of disjoint partitions considered by decomposition when decomposing $E_F$ with $m$ attributes is

$$\sum_{j \le b} \binom{m}{j} \le \sum_{j \le b} m^j = O(m^b).$$

The highest number of $n - 2$ decompositions is required when the hierarchy is a binary tree, where $n$ is the number of attributes in the initial example set. The running time of the decomposition algorithm is thus

$$O\Big( \sum_{m=3}^{n} (N \log N + N k + k^2)\, m^b \Big) = O\big( n^{b+1} (N \log N + N k + k^2) \big).$$

Therefore, the algorithm complexity is polynomial in $N$, $n$, and $k$. Note that the bound $b$ is a user-defined constant. This analysis clearly illustrates the benefits of setting $b$ to a sufficiently low value. In our experiments, $b$ was set to 3.

The machine learning method based on function decomposition was implemented in the C language as a system called HINT (HIerarchy INduction Tool). The system runs on several UNIX platforms, including HP-UX, SGI Iris and SunOS. The definition of domain names and examples and the guidance of the decomposition are managed through a script language.

4 EXPERIMENTAL EVALUATION

We experimentally evaluated the decomposition method using the following datasets:
MM: A function $y = \mathrm{MIN}(x_1, \mathrm{AVG}(x_2, \mathrm{MAX}(x_3, x_4), x_5))$ with 4-valued attributes and class. While the definition of MIN and MAX is standard, the function AVG computes the average of its arguments and rounds it to the closest integer.

LENSES: A small domain taken from the UCI machine learning repository (Murphy and Aha). Using patient age, spectacle prescription, astigmatism, and tear production rate, each example describes whether the patient should wear soft or hard contact lenses or no lenses at all.

MONK1 and MONK2: Well-known six-attribute binary classification problems taken from the same repository (Murphy and Aha; Thrun et al.). Attributes are 2- to 4-valued. MONK1 has an underlying concept $(x_1 = x_2)$ OR $x_5 = 1$, and MONK2 the concept $x_i = 1$ for exactly two choices of $i \in \{1, \ldots, 6\}$.

CAR and NURSERY: For these two domains, hierarchical classifiers in the DEX (Bohanec and Rajkovič) formalism already existed. These were used to obtain a set of examples from which decomposition tried to reconstruct the original hierarchies. CAR evaluates cars based on their price and technical characteristics. This simple model was developed for educational purposes and is described in (Bohanec and Rajkovič). NURSERY is a real-world model developed to rank applications for nursery schools (Olave et al.).

The original datasets are noiseless. They completely cover the attribute space for all domains other than MONK1 and MONK2, where the coverage is only partial. Some other domain characteristics are given in Table 2.

Domain     n    N    Class names and their relative frequencies
MM         5
LENSES     4         hard, soft, no
MONK1      6
MONK2      6
CAR        6         unacc, acc, good, v-good
NURSERY    8         unacc, acc, v-acc, prior

Table 2: Some characteristics of domains used in the experiments. n is the number of attributes and N the dataset size.

The decomposition used column multiplicity as the partition selection measure. When the complexity measure was used instead, the results were similar and are not shown here.

The bound set size $b$ was limited to a maximum of three elements. Decomposition on an HP workstation took only seconds for every domain; NURSERY, with the largest training sets, took the longest.

The experimental evaluation addressed the classification accuracy of HINT and its ability to derive a comprehensible and meaningful structure, possibly similar to the anticipated one. To assess classification accuracy, learning curves were computed: the datasets were split into training and test sets of sizes $pN$ and $(1 - p)N$, respectively, for a range of values of $p$. HINT derived a concept hierarchy and the corresponding classifier using the examples in the training set and was tested for classification accuracy on the test set. For each $p$, the results are the average over randomly chosen splits. The learning curve is compared to the one obtained by the C4.5 inductive decision tree learner (Quinlan) run on the same data. C4.5 used the default options except for one, whose changed setting was observed to give better classification accuracy than the default. Accuracy is measured on unpruned decision trees for the same reason. For each $p$, the significance of the difference between C4.5 and HINT is determined using a paired t-test.

Figure 3: Learning curves for HINT (solid line) and for C4.5 (dashed line). When, for a specific relative training set size $p$, the classification accuracy of HINT is significantly better than that of C4.5, the corresponding data points are marked.

The learning curves are given in Figure 3. For all the domains other than LENSES, HINT outperforms C4.5. Once a sufficiently large share of the examples is in the training set, this difference is always significant. Moreover, HINT learning curves converge faster to the desired accuracy, which is in turn never reached by C4.5. For LENSES, there are no significant differences in the classification accuracy of the two learners. It is also interesting to note that in MM the accuracy of C4.5 decreases with higher coverage of the example space, which may be explained by decreased generalization.

HINT was further tested on the data sets for MONK1 and MONK2 used in the detailed study of machine learning algorithms (Thrun et al.). For both MONK1 and MONK2, the training set was the same as our original data set described above. The two test sets used in the study consisted of examples that completely covered the attribute space. For MONK1, the accuracy of HINT equals the score achieved in the study (Thrun et al.) by the following learners: three variants of AQ, Assistant Professional, mFOIL, CN2, two variants of Backpropagation, and Cascade Correlation. For MONK2, four learners in the same study performed better than HINT: AQ17-DCI, two variants of Backpropagation, and Cascade Correlation. It should be noted that these results were obtained by HINT without tuning and within seconds of CPU time on the HP workstation.

For each of the domains, and with increasing $p$, HINT converged to a single concept structure. These are shown in Figures 4 to 6, with the names of attributes and concepts and the cardinality of their value sets. For MM this is the anticipated structure, except for the concept $\mathrm{AVG}(x_2, \mathrm{MAX}(x_3, x_4), x_5)$, which was additionally decomposed by introducing an intermediate concept $c_3$. For MONK1, HINT discovered the anticipated hierarchy $\mathrm{MONK1} = F_1(c, x_5)$, $c = F_2(x_1, x_2)$, with $F_1$ and $F_2$ matching the expected disjunctive and equality functions. For MONK2, because of the disjunctive condition on a bound and free set, it was impossible to derive concepts comparable to the original concept definition. However, the discovered concept hierarchy is a reformulation of the target concept using functions that count 1s. For LENSES, HINT discovered the structure in Figure 4, which we did not try to interpret without the domain expert. For CAR and NURSERY (Figures 5 and 6), the structures discovered were very similar to the original DEX models. In fact, they were the same except that some original DEX intermediate concepts were further decomposed. It should be emphasized that we consider this similarity of concept structures a most significant indicator of success of our decomposition-based learning method.
Figure 4: Structures discovered for the MM, LENSES and MONK2 domains. Nodes are labeled with the name of the attribute or concept and the cardinality of its value set (for example, MM/4).

Figure 5: Original (left) and discovered (right) structure for CAR.
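The artificial target concepts used in the evaluation (MM, MONK1 and MONK2) are fully defined by the formulas given in the dataset descriptions, so they can be written down as a short Python sketch. The encoding of attribute values as integers starting at 1 and the function names are assumptions made here for illustration; the MONK value-set sizes follow the cardinalities shown in Figure 4. Since AVG in MM always has three integer arguments, its average is never exactly halfway between two integers, so no tie-breaking rule is needed.

```python
from itertools import product


def avg_rounded(*args):
    """Average of integer arguments rounded to the closest integer."""
    total, n = sum(args), len(args)
    return (2 * total + n) // (2 * n)   # floor(total / n + 1/2)


def mm(x1, x2, x3, x4, x5):
    """MM: y = MIN(x1, AVG(x2, MAX(x3, x4), x5))."""
    return min(x1, avg_rounded(x2, max(x3, x4), x5))


def monk1(x):
    """MONK1: (x1 = x2) OR x5 = 1, for a 6-tuple x of attribute values."""
    return int(x[0] == x[1] or x[4] == 1)


def monk2(x):
    """MONK2: x_i = 1 for exactly two choices of i."""
    return int(sum(v == 1 for v in x) == 2)


# Complete attribute spaces (values encoded as 1..k, an assumption)
MM_SPACE = list(product(range(1, 5), repeat=5))
MONK_SPACE = list(product(range(1, 4), range(1, 4), range(1, 3),
                          range(1, 4), range(1, 5), range(1, 3)))

mm_examples = {x: mm(*x) for x in MM_SPACE}
monk1_examples = {x: monk1(x) for x in MONK_SPACE}
monk2_examples = {x: monk2(x) for x in MONK_SPACE}
print(len(mm_examples), len(monk1_examples))
```

Sampling a fraction of these example sets gives training data of the kind a decomposition-based learner would receive; the remaining examples can serve as a test set.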
5 CONCLUSION

We introduced a new machine learning approach based on function decomposition. A distinguishing feature of this approach is its capability to discover new intermediate concepts, organize them into a hierarchical structure, and define the relationships between the attributes, newly discovered concepts, and the target concept. In their basic form, these relationships are specified by newly constructed example sets. In a way, the learning process can thus be viewed as a process of generating new, equivalent example sets, which are consistent with the original example set. The new sets are smaller, have a smaller number of attributes, and introduce intermediate concepts. Generalization also occurs in this process.

We have evaluated the decomposition-based learning method on six datasets. In terms of classification accuracy, the decomposition significantly outperformed C4.5 in all but one domain. The examples also show that the decomposition is useful for the discovery of new intermediate concepts. For example, the decomposition was able to discover an appropriate concept hierarchy (approved by domain experts) for the rather complex real-world NURSERY domain.

The classification accuracy results may be biased because we have mostly used the domains where we anticipated the hierarchies discoverable by decomposition. However, MONK2 is a counterexample, where decomposition was not able to discover the original definition of the target concept but, rather unexpectedly, its reformulation.

The decomposition approach as presented in this paper is limited in that there is no special mechanism for handling noise and continuous attributes. However, preliminary results on using an extended version of decomposition for continuously-valued data sets in (Demšar et al.) and preliminary results on a noise handling extension strongly encourage further developments in this direction.

References

Ashenhurst, R. L. The decomposition of switching functions. Technical report, Bell Laboratories.

Biermann, A. W., Fairfield, J. and Beres, T. Signature table systems and learning. IEEE Trans. Syst. Man Cybern.
Figure 6: Original (left) and discovered (right) structure for NURSERY.
Bohanec, M. and Rajkovič, V. Knowledge acquisition and explanation for multi-attribute decision making. Intl. Workshop on Expert Systems and their Applications, Avignon, France.

Bohanec, M. and Rajkovič, V. DEX: An expert system shell for decision support. Sistemica.

Curtis, H. A. A New Approach to the Design of Switching Functions. Van Nostrand, Princeton.

Demšar, J., Zupan, B., Bohanec, M. and Bratko, I. Constructing intermediate concepts by decomposition of real functions. Proc. European Conference on Machine Learning, ECML, Prague.

Goldman, J. A., Ross, T. D. and Gadd, D. A. Pattern theoretic learning. AAAI Spring Symposium Series: Systematic Methods of Scientific Discovery.

Luba, T. Decomposition of multiple-valued functions. Intl. Symposium on Multiple-Valued Logic, Bloomington, Indiana.

Michalski, R. S. Understanding the nature of learning: issues and research directions. In R. Michalski, J. Carbonell and T. Mitchell (eds), Machine Learning: An Artificial Intelligence Approach, Kaufmann, Los Altos, CA.

Michie, D. Problem decomposition and the learning of skills. In N. Lavrač and S. Wrobel (eds), Machine Learning: ECML, Notes in Artificial Intelligence, Springer-Verlag.

Murphy, P. M. and Aha, D. W. UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Olave, M., Rajkovič, V. and Bohanec, M. An application for admission in public school systems. In I. T. M. Snellen, W. B. H. J. van de Donk and J.-P. Baquiast (eds), Expert Systems in Public Administration, Elsevier Science Publishers (North Holland).

Perkowski, M. A. et al. Unified approach to functional decompositions of switching functions. Technical report, Warsaw University of Technology and Eindhoven University of Technology.

Pfahringer, B. Controlling constructive induction in CiPF. In F. Bergadano and L. De Raedt (eds), Machine Learning: ECML, Springer-Verlag.

Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.

Ragavan, H. and Rendell, L. Lookahead feature construction for learning hard concepts. Proc. Tenth International Machine Learning Conference, Morgan Kaufmann.

Samuel, A. Some studies in machine learning using the game of checkers II: Recent progress. IBM J. Res. Develop.

Shapiro, A. D. Structured induction in expert systems. Turing Institute Press in association with Addison-Wesley Publishing Company.

Thrun, S. B. et al. A performance comparison of different learning algorithms. Technical report CMU-CS, Carnegie Mellon University.