Engineering Applications of Artificial Intelligence

Using Bayesian networks with rule extraction to infer the risk of weed
infestation in a corn-crop
Gláucia M. Bressan (a), Vilma A. Oliveira (a,*), Estevam R. Hruschka Jr. (b), Maria C. Nicoletti (b)
(a) Universidade de São Paulo, Departamento de Engenharia Elétrica, 13566-590 São Carlos, SP, Brazil
(b) Universidade Federal de São Carlos, Departamento de Computação, 13565-905 São Carlos, SP, Brazil
Article info
Article history:
Received 30 November 2007
Received in revised form 20 January 2009
Accepted 24 March 2009
Available online 14 May 2009
Keywords:
Bayesian network
Naïve Bayes
Rule extraction
Weed infestation
Kriging
Abstract
This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical variable value for the weed–crop competitiveness, using as input categorical variables for the total density of weeds and the corresponding proportions of narrow- and broad-leaved weeds. The inferred values for the weed–crop competitiveness, along with three other categorical variables extracted from estimated maps for the weed seed production and weed coverage, are then used as input for a second Bayesian network classifier that infers categorical variable values for the risk of infestation. Weed biomass and yield loss data samples are used to learn, in a supervised fashion, the probability relationships among the nodes of the first and second Bayesian classifiers, respectively. For comparison purposes, two types of Bayesian network structures are considered, namely an expert-based Bayesian classifier and a naïve Bayes classifier. The inference system focuses on knowledge interpretation by translating a Bayesian classifier into a set of classification rules. The results obtained for the risk inference in a corn-crop field are presented and discussed.
© 2009 Elsevier Ltd. All rights reserved.
1. Introduction
Agricultural procedures may modify the ecological balance of a field due to the tilling procedures growers use to prepare the land, quite often leading to a population explosion or infestation of inconvenient plants commonly known as weeds. Weed control is a fundamental part of all crop production systems. Weeds reduce yield and are a well-known obstacle in harvest operations, and they lower crop quality by competing with the crop for limited resources such as water, nutrients and light. Oerke et al. (1994) estimated that a 10% loss of worldwide agricultural production might be a consequence of weed activity.
In general, herbicides are the main component of weed management systems. Usually, herbicides are spread uniformly over the entire field. A uniform application rate is often based on a visual evaluation of the weed density, with no procedure used to evaluate the risks associated with under- and over-spraying (Faechner et al., 2002). However, weed infestation does not occur over the entire field, and the amount of herbicide could be reduced by spraying only over the weed patches (Wallinga et al., 1998; Jurado-Expósito et al., 2004). The prediction of weed dispersion can be used to prevent infestations by applying herbicides only in specific regions (Jurado-Expósito et al., 2003; Faechner et al., 2002). Reducing the quantity of herbicides potentially reduces herbicide residues in water, food crops and the environment, and it may prevent the development of weed resistance (Aitkenhead et al., 2003).
In the literature, a considerable diversity of weed management decision models can be found. There are many different approaches, ranging from empirical functions to mechanistic simulation models. As surveyed by Wilkerson et al. (2002), some of the models are too simple, as they do not include all the factors that can influence weed competition or other issues farmers consider when deciding how to manage weeds. Other models can be excessively complex, given that many users might find it difficult to obtain the needed information or might not have the equipment required for acquiring the data. According to Wilkerson et al. (2002), weed management decision models must be built and evaluated from three perspectives: biological accuracy, quality of recommendations and ease of use. Another important issue to be taken into account when building weed management systems is the interpretability of the model. The latter is of particular interest in the experiments conducted in this paper.
There are a few formalisms that can be used to model weed infestation in a crop field. Primot et al. (2006) developed 20 simple models (five are linear regression models and the other 15 are logistic regression models). The models were evaluated for their ability to discriminate the fields with a high level of weed infestation from the fields with a low level of infestation; the parameters of the 20 models were estimated using 3 years of experimental data. The models can be used to help farmers decide what type of weed control (chemical, mechanical or biological) to use.
* Corresponding author. Tel.: +55 16 3373 9336; fax: +55 16 3373 9372. E-mail address: vilmao@sel.eesc.usp.br (V.A. Oliveira).
Engineering Applications of Artificial Intelligence 22 (2009) 579–592. doi:10.1016/j.engappai.2009.03.006. 0952-1976/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. Journal homepage: www.elsevier.com/locate/engappai
The risk of a weed-infested crop can be inferred from the mathematical modeling of the weed behavior, based on experimental data. Dynamic models for weed seed populations describe the population size at life-cycle $t$ as a function of the population size at life-cycle $t - 1$ using difference equations (Sakai, 2001; Cousens and Mortimer, 1995). The dynamic models indicate that infestation depends not only on the weed density but also on the competitiveness of the weed species (Park et al., 2003; Firbank and Watkinson, 1985; Kropff and Spitters, 1991). More recently, competitive indexes and weed ranking were used to quantify the weed competitiveness in a soybean field (Hock et al., 2006). Although purely mathematical models can model the risk of weed infestation with good performance, as described in several of the previous references, most of them lack flexibility and, more importantly, lack interpretability: they work as 'black boxes' where the user feeds in a few values and the system outputs a diagnosis.
A particular class of models is based on probability. Of special interest in this paper is the class of Bayesian network (BN) models, which are based on the probability that a given set of measurements defines objects as belonging to a certain class. In the literature, Bayesian-based methods have already been used for modeling similar problems (Hughes and Madden, 2003; Smith and Blackshaw, 2002; Banerjee et al., 2005). In particular, Hughes and Madden (2003) proposed a risk assessment methodology to identify which exotic plant species, among those presented for import, are a threat (to agricultural and ecological systems) and which are not. Bayesian theory has also been employed in the agriculture domain as the basis for developing classification systems, as described in Granitto et al. (2002). In their work, the performance of a naïve Bayes classifier (BC) is used as the selection criterion for identifying a nearly optimal set of 12 seed characteristics, such as coloration, morphological and textural features, that are then used as classification parameters. Considering the seed identification problem, the work described in Granitto et al. (2005) compared the performance of a naïve Bayes classifier with that of an artificial neural network (NN) based classifier. In this particular experiment the naïve Bayes classifier with an adequately selected set of classification features outperformed the NN-based classifier. A similar result was also obtained in Marchant and Onyango (2003), but with a Bayesian classifier and a multilayer feed-forward neural network in a task for discriminating plants, weeds and soil in color images.
The main goal of this paper is to propose and describe the use of Bayesian network methods to infer the risk of weed infestation in a corn-crop, as well as to present and discuss the results obtained in a real application domain based on empirical data. The procedure is implemented as a collaborative system that integrates two classification tasks. The first uses a Bayesian network to infer the competitiveness of weeds, expressed by their biomass, using as input the total density of weeds and the corresponding narrow- and broad-leaved proportions. The second task assesses the risk of infestation, expressed by the yield loss, using as input the previously inferred competitiveness as well as features extracted from the weed seed density, weed coverage and weed seed patches. The last three variables are estimated with a geostatistics method called kriging (Brooker, 1979; Isaaks and Srivastana, 1989) and image objects (Gonzalez and Woods, 2002) from weed seed density and weed coverage data samples.
In addition, the paper presents the translation of the induced Bayesian networks into a set of classification rules, aiming at a more comprehensible knowledge representation. As mentioned before, this is an important aspect of the construction of a knowledge-based system, since it gives the system credibility, a quality that other types of representation lack. The main idea of the conducted experiments is therefore not to show that the translation method is better than traditional classifiers (such as C4.5) or rule extraction methods. The claim is that it is possible to take advantage of both the causal knowledge representation (which can be adequately represented in a BN or BC) and the high accuracy of a Bayesian classifier to obtain a set of classification rules (extracted from the BC) as a knowledge base.
For both classification tasks implemented by the collaborative system, two different Bayesian network structures are used for comparison purposes. One is induced by the naïve Bayes algorithm (Duda and Hart, 1973) using empirical data, and the other, an unrestricted Bayesian network, is designed and refined by an expert using the same empirical data. In this paper the networks are referred to as the naïve Bayes and expert-based networks, respectively. Due to their different architectures, the two Bayesian networks perform differently, depending on the available information. A set of probabilistic classification rules is then extracted from each of the Bayesian networks using a Markov-based strategy proposed in Hruschka et al. (2008). To reduce the number of rules in cases where the Markov-based strategy does not remove categorical variables, a pruning strategy is proposed. The pruning strategy is mainly motivated by the fact that it needs no extra computational effort: it keeps only the rules whose estimated probability is higher than a predefined threshold. This paper is an extended and revised version of two earlier conference papers, namely Bressan et al. (2007a,b).
The remainder of this paper is organized as follows. Section 2 describes the basics of Bayesian networks and naïve Bayes classifiers and discusses the importance of improving their understandability. Section 3 focuses on two important issues: the approach used to collect and interpolate empirical data, and the construction of the collaborative system that integrates two Bayesian classifiers. Section 4 presents the results of the collaborative system, focusing on the results of the individual classifiers, that is, the Bayesian network and the naïve Bayes classifiers. Finally, Section 5 presents some concluding remarks and highlights the next steps for this research work.
2. Basics of Bayesian networks, Markov blanket and classification rules
As pointed out in Heckerman et al. (2000), Bayesian networks and Bayesian classifiers are usually employed in data mining tasks mainly because they (i) may deal with incomplete data sets straightforwardly; (ii) can learn causal relationships; (iii) may combine prior knowledge with patterns learnt from data and (iv) can help to avoid overfitting.
A Bayesian network can be viewed as a form of probabilistic graphical model used for knowledge representation and reasoning about data domains. Instead of encoding a joint probability distribution over a set of random variables, as usually done by a Bayesian network, a Bayesian classifier usually aims to correctly predict the value of a discrete class variable given the value of a vector of features (predictors). Since Bayesian classifiers are a particular type of Bayesian network, the concepts and results described in this section are valid for both.
A Bayesian network consists of two components: a network structure, which is a directed acyclic graph, and a set of probability tables. The nodes of the Bayesian network represent variables and the arcs between nodes represent dependence relations between the corresponding variables. An arc starting at a node X (representing variable X) and ending at a node Y (representing variable Y) establishes X as a parent of Y and Y as a child of X. A Bayesian network can be used to compute the conditional probability of one node, given values assigned to the other nodes. Hence, a Bayesian network can be used as a classifier that gives the posterior probability distribution of the class node given the values of the other attributes. When learning Bayesian networks from datasets, nodes are used to represent dataset features.
Consider a finite set $\{X_i;\ i = 1, \ldots, n\}$ of discrete random variables where each variable may take on values (represented by lowercase letters) from a finite set. As formally stated in Cheng et al. (2002), a Bayesian network is represented by $BN = \langle N, A, \Theta \rangle$, where the component $\langle N, A \rangle$ is a directed acyclic graph with nodes $X_i \in N$, $i = 1, \ldots, n$, representing domain variables and arcs $a \in A$ between nodes representing a probabilistic dependency between the associated nodes. Finally, denoting $\pi_{X_i}$ as the set of parents of $X_i$ in $\langle N, A \rangle$, the last component of BN is given by $\Theta = \{\theta_{X_i \mid \pi_{X_i}} = P(X_i \mid \pi_{X_i})\}$ for each possible value $x_i$ of $X_i$ and $\pi_{x_i}$ of $\pi_{X_i}$, which collectively represents a conditional probability distribution (CPtable) that quantifies how a node $X_i \in N$ depends on its parents. The conditional independence assumption (Markov condition) allows the calculation of the joint probability distribution function over the variables $\{X_i;\ i = 1, \ldots, n\}$ based on the background knowledge as
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi_{X_i}) = \prod_{i=1}^{n} \theta_{X_i \mid \pi_{X_i}}, \qquad (1)$$
where $n = |N|$. Therefore, a Bayesian network can be used as a knowledge representation that allows inferences.
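The factorization in Eq. (1) can be illustrated with a minimal Python sketch. The three-node network, its variable names and all probability values below are hypothetical and are not taken from the paper; the only point is that the joint probability of a full assignment is the product of one CPtable entry per node.

```python
from itertools import product

# Hypothetical network A -> C <- B with binary variables (illustrative only).
parents = {"A": (), "B": (), "C": ("A", "B")}

# cpt[node][parent_values][value] = P(node = value | parents = parent_values)
cpt = {
    "A": {(): {0: 0.7, 1: 0.3}},
    "B": {(): {0: 0.6, 1: 0.4}},
    "C": {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.4, 1: 0.6},
        (1, 1): {0: 0.2, 1: 0.8},
    },
}


def joint_probability(assignment):
    """Eq. (1): product over all nodes of P(X_i | parents of X_i)."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][parent_values][assignment[node]]
    return prob


# Summing the joint over every full assignment should give one.
total = sum(
    joint_probability(dict(zip("ABC", values)))
    for values in product((0, 1), repeat=3)
)
```

Because every CPtable row sums to one, the joint probabilities of all eight assignments also sum to one, which is a quick sanity check on hand-built tables.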
Bayesian networks can be built by an expert or learnt from data. The learning of a Bayesian network can be divided into two procedures: one responsible for learning the network structure and the other responsible for learning the conditional probability tables for that structure. These tables can be learnt using empirical conditional frequencies from data (Cheng et al., 2002). When building a Bayesian network based on subject specialist knowledge, the major problem is defining the conditional probability distributions, because human beings tend to miscalculate probabilities (Tversky and Kahneman, 1974). To avoid this difficulty, it is possible to use expert knowledge to build only the Bayesian network structure and then use learning algorithms to induce $\Theta$ from data.
2.1. Markov blanket
In a Bayesian network structure, with $\lambda_{X_i}$ as the set of children of node $X_i$ and $\pi_{X_i}$ as the set of parents of node $X_i$, the subset of nodes containing $\pi_{X_i}$, $\lambda_{X_i}$ and the parents of $\lambda_{X_i}$ is called the Markov blanket of $X_i$, as shown in Fig. 1. As stated in Pearl (1988), in a Bayesian network the only nodes that influence the conditional probability distribution of a given node $X_i$, given the values of all the other nodes, are the nodes that belong to the Markov blanket of $X_i$. Thus, after learning a Bayesian network classifier from data, the Markov blanket of the node that represents the class can be used as a feature subset selection method, in order to identify, from all the nodes that define the network, those that influence the class node.
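The definition of the Markov blanket translates directly into a few set operations. The sketch below is only an illustration: the representation of the graph as a dictionary from each node to its parents, and the small example graph, are assumptions made here and do not come from the paper.

```python
def markov_blanket(parents, node):
    """Return the parents, the children and the children's other parents of `node`.

    `parents` maps every node of the directed acyclic graph to a tuple of its parents.
    """
    children = {child for child, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)  # the node itself is not part of its own blanket
    return blanket


# Hypothetical graph: A -> X -> Y -> Z and B -> Y.
dag = {
    "A": (),
    "B": (),
    "X": ("A",),
    "Y": ("X", "B"),
    "Z": ("Y",),
}

blanket_of_x = markov_blanket(dag, "X")
```

For node X the blanket contains its parent A, its child Y and Y's other parent B; Z is excluded because it is only a grandchild. When the chosen node is the class node, the returned set is the feature subset that the text above describes.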
As previously mentioned, Bayesian networks can also be used as classifiers. A Bayesian network, however, is not designed to optimize the conditional likelihood of the class given the other features (Domingos and Pazzani, 1997). Consequently, Bayesian networks may not produce good classification results. Actually, even the naïve Bayes classifier can outperform more complex Bayesian network classifiers in some domains (Friedman et al., 1997).
A naïve Bayes classifier is a Bayesian network with a fixed structure, in which the class node has no parents and each feature has the class node as its unique parent. Since naïve Bayes classifiers have their structure predefined, only the numerical parameters need to be learnt; thus only information about the features and their corresponding values is needed to estimate probabilities. The computational time complexity of learning a naïve Bayes classifier is linear in the number of training instances. The construction is also space efficient, requiring only the information provided by two-dimensional tables (CPtables), in which each entry corresponds to a probability estimated for a given value of a particular feature. However, the naïve Bayes classifier makes a strong and unrealistic assumption: all the features are conditionally independent given the value of the class.
2.2. Classification rules
The knowledge represented by a Bayesian classifier is not as comprehensible as some other forms of knowledge representation, such as classification rules. In the literature there are a few works that aim at improving the readability and understandability of Bayesian classifiers; for instance, Možina et al. (2004) implement a visualization of a naïve Bayes model in the form of a nomogram. In Hruschka et al. (2008), after inducing the Bayesian classifier, the BayesRule method improves the understandability by translating it into a set of probabilistically qualified if–then rules of the form
If condition then class with certainty F, (2)
where the condition is called the antecedent and F is a percentage value.
In the BayesRule method, the a posteriori probability for the rules is evaluated as follows. Let $v_1, v_2, \ldots, v_n, c$ be the sets of categorical variable values for $X_1, X_2, \ldots, X_n$ and $C$, respectively. Also, let $v_i = \{v_{i1}, \ldots, v_{i j_i}\}$, that is, $|v_i| = j_i$, $i = 1, \ldots, n$, and $c = \{c_1, \ldots, c_j\}$, that is, $|c| = j$.
By using the BayesRule method, the number of variables involved in the condition part of a rule is reduced, since the method only considers the Markov blanket of the class variable $C$. Considering a particular situation where the Markov blanket of the class variable $C$ is the set $\{X_1, \ldots, X_k\}$, the a posteriori probability of class $C = c^* \in \{c_1, \ldots, c_j\}$ given the values of the variables in the Markov blanket of class $C$ for a particular instantiation of indexes $J_i$, $i = 1, \ldots, k$, is
$$P(C = c^* \mid v_{1,J_1}, \ldots, v_{k,J_k}) = \arg\max_{J \in \{1, \ldots, j\}} \{P(C = c_J \mid v_{1,J_1}, \ldots, v_{k,J_k})\}, \qquad (3)$$
with
$$P(C = c_J \mid v_{1,J_1}, \ldots, v_{k,J_k}) = \frac{P(C = c_J)\, P(v_{1,J_1}, \ldots, v_{k,J_k} \mid C = c_J)}{P(v_{1,J_1}, \ldots, v_{k,J_k})}. \qquad (4)$$
Fig. 1. A network structure and the Markov blanket of node $X_i$, represented by shadowed nodes.
For the naïve Bayes network model the features are assumed to be independent given the class $C = c_J$ and (4) becomes
$$P(C = c_J \mid v_{1,J_1}, \ldots, v_{k,J_k}) \propto P(C = c_J) \prod_{i=1}^{k} P(v_{i,J_i} \mid C = c_J). \qquad (5)$$
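As an illustration of Eq. (5), the sketch below estimates the naïve Bayes parameters as empirical frequencies and then normalizes the class scores so that they sum to one. The feature names, category labels and the six training rows are invented for this example, and no smoothing is applied, which is an assumption of the sketch rather than a statement about the authors' implementation.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Estimate P(C) and P(v_i | C) as empirical frequencies (no smoothing)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    prior = {c: n / len(labels) for c, n in class_counts.items()}

    def likelihood(i, value, c):
        return value_counts[(i, c)][value] / class_counts[c]

    return prior, likelihood


def posterior(prior, likelihood, row):
    """Eq. (5) followed by normalization over the classes."""
    scores = {}
    for c, p in prior.items():
        for i, value in enumerate(row):
            p *= likelihood(i, value, c)
        scores[c] = p
    total = sum(scores.values())
    if total == 0:
        return scores  # the evidence was never observed with any class
    return {c: s / total for c, s in scores.items()}


# Hypothetical training data: (total density, broad-leaved proportion) -> competitiveness.
rows = [("high", "low"), ("high", "high"), ("low", "low"),
        ("low", "high"), ("high", "high"), ("high", "low")]
labels = ["high", "high", "low", "low", "high", "low"]

prior, likelihood = fit_naive_bayes(rows, labels)
post = posterior(prior, likelihood, ("high", "low"))
```

For the query ("high", "low") the unnormalized scores are 1/6 for class "high" and 1/9 for class "low", so the normalized posterior is 0.6 against 0.4.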
A categorical probabilistic if–then rule has the form
$$R_r:\ \text{If } X_1 \text{ is } v_{1,J_1} \text{ and } \cdots \text{ and } X_k \text{ is } v_{k,J_k} \text{ then } C \text{ is } c_J \text{ with certainty } F \text{ given by (3)}, \qquad (6)$$
where index $r$ is used for referencing the rules given by the BayesRule method.
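A rule set of the form (6) can be sketched by enumerating every combination of values of the variables in the class's Markov blanket, taking the most probable class for each combination, and keeping only the rules whose certainty exceeds a threshold, as in the pruning strategy mentioned in the Introduction. The code below is a simplified illustration under stated assumptions and is not the BayesRule implementation of Hruschka et al. (2008): the two features, their categories and all probabilities are hypothetical, and a naïve Bayes model stands in for the classifier.

```python
from itertools import product

# Hypothetical naive Bayes parameters for two features and a binary class.
prior = {"low": 0.5, "high": 0.5}
likelihood = {  # likelihood[feature][class][value] = P(value | class)
    "density": {"low": {"low": 0.8, "high": 0.2}, "high": {"low": 0.3, "high": 0.7}},
    "broad": {"low": {"low": 0.6, "high": 0.4}, "high": {"low": 0.1, "high": 0.9}},
}


def class_posterior(assignment):
    """Normalized posterior of the class given one value per feature."""
    scores = {}
    for c, p in prior.items():
        for feature, value in assignment.items():
            p *= likelihood[feature][c][value]
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def extract_rules(threshold=0.0):
    """One rule per antecedent; keep it only if its certainty exceeds `threshold`."""
    features = sorted(likelihood)
    any_class = next(iter(prior))
    domains = [sorted(likelihood[f][any_class]) for f in features]
    rules = []
    for values in product(*domains):
        assignment = dict(zip(features, values))
        post = class_posterior(assignment)
        best = max(post, key=post.get)
        if post[best] > threshold:
            rules.append((assignment, best, post[best]))
    return rules


def format_rule(rule):
    assignment, label, certainty = rule
    condition = " and ".join(f"{f} is {v}" for f, v in assignment.items())
    return f"If {condition} then class is {label} with certainty {certainty:.0%}"


all_rules = extract_rules()
pruned_rules = extract_rules(threshold=0.6)
```

With these illustrative numbers the four antecedents yield four rules, and a threshold of 0.6 removes the one rule whose certainty is about 54%. Pruning of this kind costs nothing extra because the certainty is already computed when the rule is formed.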
The confidence of a rule can be defined using inferential results. In doing so, the probability given to the inferred class may be used as a confidence value, and it is embedded in the inference algorithm. Among the many methods for data understanding, the BayesRule method focuses on translating a Bayes classifier into a set of classification rules in their simplest, propositional form, as a way of promoting the understandability of the corresponding Bayesian network classifier. Reasoning with logical rules is more acceptable to users than the recommendations given by black-box systems. Moreover, reasoning with rules is comprehensible, provides explanations, and can be validated by human inspection.
3. Bayesian network inference modeling
To present the collaborative inference system for the risk of weed infestation in a corn-crop, this section is organized into two parts. The first describes the procedure for collecting and preparing the data, and the second describes how the data were used to model two Bayesian network classifiers (the naïve Bayes and the expert-based) for inferring the risk of a weed infestation in a corn-crop. Fig. 2 presents a schematic diagram of the proposed collaborative inference system.
3.1. Collecting and preparing the data
In the experiments described in this paper, data from a corn-crop field located in an experimental farm of the Empresa Brasileira de Pesquisa Agropecuária (Embrapa), in Sete Lagoas, Minas Gerais, Brazil, were used (see footnote 1). A field of 49 ha was tilled on 16–20 November 2004 and again on 15–19 May 2006. The area contains 41 experimental field parcels 100 m distant from each other. The parcels are rectangular, measuring 4 m (east–west direction) by 3 m (north–south direction), with five corn rows separated from each other by 0.7 m, starting at 0.1 m from the bottom edge. Before crop development, the herbicide glyphosate was applied outside the parcels at 2.4 kg active ingredient (a.i.) ha$^{-1}$. After crop development, the herbicides nicosulfuron, at 0.04 kg (a.i.) ha$^{-1}$, and atrazine, at 1 kg (a.i.) ha$^{-1}$, were applied all over the field except on the parcels. The samples per parcel were obtained in April 2005 and October 2006 for two different corn-crops, except for the yield loss, which was evaluated in June 2005 and November 2006.
To obtain the weed density data, that is, the number of weeds per m$^2$ in each parcel, and the biomass of the species, four squares measuring 0.5 m × 0.5 m were randomly placed within each parcel and the narrow-leaved and broad-leaved weed species were collected and counted. Then, the weed species were separated into bags and kept in a greenhouse at a temperature of 105 °C until their mass became constant. At this point, the biomass of the species, defined as the amount of dry material per m$^2$ of the aerial part of the weeds, was measured. The weed density and the biomass samples were collected in each experimental parcel. Therefore, 82 data instances were obtained, that is, two data instances for each of the 41 parcels. On analyzing the collected data, 11 data instances were identified as outliers and removed.
To obtain the weed seed production per m$^2$, the weed seeds of one weed from each species were counted and multiplied by the number of weeds found in the squares. The weed coverage data were estimated by visual observation of the percentage of surface infested by weeds. This coverage is mainly due to the germination of weed seeds from the previous weed population. The weed seed density, associated with the seed production, and the weed coverage samples were collected, as described above, from each experimental field parcel, also in two different corn-crops, resulting in 82 data instances.
Fig. 2. Input–output of the proposed collaborative classification system. Inferring the weed–crop competitiveness: the total weed density, the narrow-leaved weed density and the broad-leaved weed density feed a Bayesian network classifier. Inferring the risk of infestation: the weed seed density and the weed coverage pass through geostatistics and image analysis, and the resulting features, together with the inferred competitiveness (inputs $x_1$, $x_2$, $x_3$, $x_4$), feed a second Bayesian network classifier whose output is the risk.
Footnote 1: Embrapa, Project 55.2004.509.00: Rede de Conhecimento em Agricultura de Precisão para Condições do Cerrado e dos Campos Gerais.
To evaluate the yield loss per experimental parcel, first the yield was measured by the mass of the corn grains. The mass of the corn grains was adjusted to a humidity of 13% and converted from g m$^{-2}$ to kg ha$^{-1}$. Then, the yield loss was evaluated as
$$P_{RR_i} = \frac{Y_0 - Y_i}{Y_0}, \quad i = 1, \ldots, 41, \qquad (7)$$
where $Y_0$ denotes the maximum yield found in the witness (control) parcels, which received herbicide application, and $Y_i$ the yield of each experimental parcel, which received no herbicide application.
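Eq. (7) and the unit conversion that precedes it amount to two one-line computations, sketched below. The function names and the yield figures are hypothetical; the conversion factor follows from 1 ha = 10,000 m$^2$ and 1 kg = 1000 g.

```python
def g_per_m2_to_kg_per_ha(value):
    """Convert g/m^2 to kg/ha: multiply by 10,000 m^2/ha and divide by 1000 g/kg."""
    return value * 10.0


def relative_yield_loss(yields, reference_yield):
    """Eq. (7): (Y_0 - Y_i) / Y_0 for each experimental parcel."""
    if reference_yield <= 0:
        raise ValueError("reference yield must be positive")
    return [(reference_yield - y) / reference_yield for y in yields]


# Hypothetical yields in g/m^2 for a reference parcel and two experimental parcels.
reference = g_per_m2_to_kg_per_ha(600.0)
parcels = [g_per_m2_to_kg_per_ha(v) for v in (600.0, 450.0)]
losses = relative_yield_loss(parcels, reference)
```

A parcel that matches the reference yield has a loss of zero, and a parcel yielding three quarters of the reference has a loss of 0.25.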
3.2. Inferring the risk of infestation
The proposed collaborative inference system is based on inputs given by the discretized values of infestation features for the weed coverage, weed seed production density, weed seed patches and weed–crop competitiveness, denoted as $x_1, x_2, x_3, x_4$, respectively. As already mentioned, the feature values for the weed–crop competitiveness are inferred by the first network classifier of the collaborative system. The following subsections describe how the weed seed production and weed coverage maps were estimated with kriging and subsequently treated as images in order to obtain the other features.
3.2.1. Kriging and maps
Interpolation methods have been used in precision farming to infer values at non-sampled locations. As already mentioned, the estimation method used was the geostatistics method called kriging, an interpolation approach that provides optimal estimates of regionalized variables, with minimum variance and without bias, using a theoretical variogram (Isaaks and Srivastana, 1989; Shiratsuchi, 2001). A variogram, also referred to as a semivariogram, shows the degree of spatial dependence among the samples and is generally a monotonically increasing function that reaches a plateau. The distance at which the variogram reaches the plateau is called the range. The frequently used models for theoretical variograms are described in detail in Isaaks and Srivastana (1989).
The parameters of the theoretical variogram used in an interpolation problem are selected from an experimental variogram. The experimental variogram $\gamma^*(h)$ is given by the following equation:
$$\gamma^*(h) = \frac{1}{2N_h} \sum_{(i,j) \,:\, |h_{ij}| = h} [Z(j) - Z(i)]^2, \qquad (8)$$
where $N_h$ is the number of pairs of data whose locations are separated by $h$, $i$ and $j$ represent the locations $i$ and $j$, respectively, $Z(i)$ is the value of the variable $Z$ at location $i$, $|h_{ij}|$ is the Euclidean norm of the vector $h_{ij}$ and $h_{ij}$ is the vector from location $i$ to location $j$.
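A minimal sketch of Eq. (8) for a single lag is given below. Two choices are assumptions of the sketch rather than details stated in the paper: pairs are treated as unordered, and a pair is assigned to the lag $h$ when its separation distance lies within a tolerance of $h$, since irregularly spaced samples rarely sit at exactly the same distance. The sample coordinates and values are invented.

```python
import math
from itertools import combinations


def experimental_variogram(points, values, lag, tolerance):
    """Eq. (8) for one lag: half the mean squared difference over all pairs
    whose separation distance lies within `tolerance` of `lag`."""
    squared_diffs = [
        (values[j] - values[i]) ** 2
        for i, j in combinations(range(len(points)), 2)
        if abs(math.dist(points[i], points[j]) - lag) <= tolerance
    ]
    if not squared_diffs:
        return None  # no pairs of samples at this lag
    return sum(squared_diffs) / (2 * len(squared_diffs))


# Hypothetical samples on a line, 100 m apart.
points = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (300.0, 0.0)]
values = [1.0, 3.0, 2.0, 6.0]

gamma_100 = experimental_variogram(points, values, lag=100.0, tolerance=10.0)
gamma_200 = experimental_variogram(points, values, lag=200.0, tolerance=10.0)
```

At a lag of 100 m there are three pairs with squared differences 4, 1 and 16, giving 21 / 6 = 3.5; at 200 m there are two pairs with squared differences 1 and 9, giving 10 / 4 = 2.5.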
To find the most suitable model, the collected samples were used with the exponential, Gaussian and spherical variogram models. The exponential model was chosen based on the criteria suggested in Iwashita and Landim (2003), since it provided the smallest fit index (FI) for the sample set, as shown in Table 1 for data collected in 2005 and 2006. The exponential variogram model is given by
$$\gamma(h) = \begin{cases} C_0 + C_1 \left[ 1 - e^{-h/a} \right], & 0 < h \le a, \\ C_0 + C_1, & h > a, \end{cases} \qquad (9)$$
with $C_0$ the nugget effect, $C_1$ the variance of variable $Z$, $C_0 + C_1$ the sill, and $a$ the range. The fit index over the pairs of data whose locations are separated by all $N$ vectors $h$, named $h_k$, $k = 1, \ldots, N$, is defined as
$$FI = \frac{1}{N} \sum_{k=1}^{N} \frac{\sqrt{(\gamma^*(h_k) - \gamma(h_k))^2}}{\gamma(h_k)}. \qquad (10)$$
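Eqs. (9) and (10) can be sketched as two short functions. The nugget and sill used in the example are the values reported in the caption of Fig. 3(a) for the 2005 weed coverage; the range of 200 m is an assumption made here from the statement that spatial dependence exists up to 200 m, and the two experimental points are invented, so the resulting fit index is not a value from Table 1.

```python
import math


def exponential_variogram(h, nugget, partial_sill, a):
    """Eq. (9) as written: C0 + C1 * (1 - exp(-h / a)) up to the range a, the sill beyond."""
    if h <= 0:
        raise ValueError("h must be positive")
    if h <= a:
        return nugget + partial_sill * (1.0 - math.exp(-h / a))
    return nugget + partial_sill


def fit_index(lags, experimental, model):
    """Eq. (10): mean of |gamma*(h_k) - gamma(h_k)| / gamma(h_k)."""
    terms = [abs(g_star - model(h)) / model(h) for h, g_star in zip(lags, experimental)]
    return sum(terms) / len(terms)


def coverage_model(h):
    # C0 = 0.038 and C0 + C1 = 0.05 from the Fig. 3(a) caption; a = 200 m is assumed.
    return exponential_variogram(h, nugget=0.038, partial_sill=0.012, a=200.0)


# Hypothetical experimental points beyond the range, each 10% away from the sill.
fi = fit_index([300.0, 400.0], [0.055, 0.045], coverage_model)
```

Since the square root of a squared difference is its absolute value, the fit index is the mean relative deviation between the experimental points and the model, and a smaller value indicates a closer fit.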
The fitted variograms for the weed coverage and weed seed production for the 2005 and 2006 data are shown in Figs. 3 and 4, respectively. Each point of the variogram represents the pairs of data separated by the same distance and is described in Table 2 for the 2005 and 2006 data. As the spatial dependence defined by the variograms exists up to 200 m, the weed data for non-sampled locations were estimated with kriging, based on the collected data, to obtain the spatial representation of the weeds.
The interpolation grid was selected as 20 m × 20 m, which is within the variogram range. The corn-crop of 49 ha (700 m × 700 m) was then divided into 35 cells per axis, each sized 20 m × 20 m. Estimated maps for the weed coverage at the current life-cycle and the weed seed production at the subsequent life-cycle were thus generated. The maps are shown in Figs. 5 and 6 for the 2005 and 2006 data, respectively.
The quality of the kriging estimation was evaluated by cross validation (Isaaks and Srivastana, 1989). Three characteristics of the residuals (mean close to zero, constant variance and normality) were analyzed, and they indicate a good estimate. Table 3 shows the results of the cross validation for the 2005 and 2006 data. As the intervals for the residual means contain zero, the null hypothesis of the mean being close to zero is not rejected. The variances are considered constant, with $\bar{R}$ the residual size, and the Anderson–Darling test is used to check the normality of the residual distribution at the 95% confidence level. As the p-values for the residuals are larger than 0.05, the hypothesis that the residuals are normally distributed is not rejected.
3.2.2. Map objects and features
Weeds have a tendency to aggregate in clusters. This tendency explains why certain regions of a field are free of weeds. Due to the spatial variability of weeds in agricultural fields, it is possible to detect clusters from maps. Let $\mathcal{R}(u,v)$ represent the entire map region, with $(u,v)$ the spatial coordinate of the intensities in the map. The clusters detected in $\mathcal{R}(u,v)$ associated with the weed maps provide three features to infer the weed infestation risk. Assuming the features have three categorical conditions, the clusters in $\mathcal{R}(u,v)$ are described by connected objects obtained as follows.
First, to form a map $I(u,v)$ with coded intensities, the intensities $f(u,v)$ of $\mathcal{R}(u,v)$ are quantized into three levels $L_1, L_2, L_3$, associated with equally spaced ranges of $f(u,v)$, by an encoder $Q$ as follows:
$$I(u,v) = Q(f(u,v)) = t, \qquad (11)$$
where
$$t = \begin{cases} 1 & \text{if } f(u,v) \le L_1, \\ 2 & \text{if } L_1 < f(u,v) \le L_2, \\ 3 & \text{if } L_2 < f(u,v) \le L_3. \end{cases}$$
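The encoder $Q$ of Eq. (11) can be sketched as follows. The sketch assumes that the three ranges have equal width between the minimum and maximum intensity of the map, so that $L_3$ is the maximum; the paper only states that the ranges are equally spaced, so this choice of thresholds, like the small example grid, is an assumption.

```python
def quantize(value, minimum, maximum):
    """Eq. (11) with three equal-width ranges between `minimum` and `maximum`."""
    width = (maximum - minimum) / 3.0
    level_1 = minimum + width      # L1
    level_2 = minimum + 2 * width  # L2 (L3 is taken to be `maximum`)
    if value <= level_1:
        return 1
    if value <= level_2:
        return 2
    return 3


def encode_map(grid):
    """Apply the encoder to every cell of a map given as a list of rows."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    return [[quantize(v, lo, hi) for v in row] for row in grid]


# Hypothetical 2 x 3 map of weed coverage percentages.
coded = encode_map([[0, 10, 20], [50, 80, 90]])
```

With intensities between 0 and 90 the thresholds are 30 and 60, so the first row is coded as level 1 and the second row as levels 2, 3 and 3. The coded map is the input to the connected-object analysis described in the text.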
Table 1
Fit index of theoretical variogram models.

| Model | Weed seed, 2005 | Weed seed, 2006 | Weed coverage, 2005 | Weed coverage, 2006 |
|---|---|---|---|---|
| Exponential | 0.12 | 0.05 | 0.11 | 0.06 |
| Gaussian | 0.25 | 0.16 | 0.22 | 0.16 |
| Spherical | 0.16 | 0.08 | 0.15 | 0.08 |
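Table 2
Results for the variograms.

| Variable | Number of pairs separated by $h_k$, 2005 | Number of pairs separated by $h_k$, 2006 | Distance $h_k$ in m, 2005 | Distance $h_k$ in m, 2006 | $\gamma(h_k)$, 2005 | $\gamma(h_k)$, 2006 |
|---|---|---|---|---|---|---|
| Weed seed production per m$^2$ | 14.50 | 14.50 | 101.57 | 101.52 | $6.19 \times 10^6$ | $2.06 \times 10^6$ |
| | 80.50 | 98 | 141.64 | 139.22 | $6.79 \times 10^6$ | $2.77 \times 10^6$ |
| | 129 | 157 | 245.59 | 245.06 | $5.78 \times 10^6$ | $2.34 \times 10^6$ |
| | 116.50 | 146.50 | 344.63 | 344.73 | $6.47 \times 10^6$ | $2.58 \times 10^6$ |
| | 90 | 16 | 445.61 | 439.74 | $7.74 \times 10^6$ | $2.56 \times 10^6$ |
| | 54 | 98.50 | 544.41 | 543.19 | $6.08 \times 10^6$ | $2.57 \times 10^6$ |
| Weed coverage in % | 1 | 16.50 | 95 | 101.57 | 0.045 | 0.048 |
| | 113 | 106 | 126.63 | 139.47 | 0.042 | 0.063 |
| | 142.50 | 172.50 | 223.81 | 245.13 | 0.047 | 0.057 |
| | 210 | 161 | 330.39 | 344.85 | 0.044 | 0.063 |
| | 142.50 | 130.50 | 435.92 | 440.20 | 0.054 | 0.071 |
| | 98 | 112.50 | 528.15 | 543.11 | 0.061 | 0.063 |
| | 59.50 | – | 619.44 | – | 0.062 | – |
| | 4 | – | 690.96 | – | 0.048 | – |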
Fig. 3. Theoretical variograms for the 2005 data obtained with an exponential model (solid line) and the corresponding experimental variogram (points) for (a) the weed coverage with $C_0 = 0.038$, $C_0 + C_1 = 0.05$ and for (b) the weed seed production with $C_0 = 5.09 \times 10^6$, $C_0 + C_1 = 6.50 \times 10^6$. Axes in both panels: distance (m) against $\gamma^*(h)$ and $\gamma(h)$.
Fig. 4. Theoretical variograms for the 2006 data obtained with an exponential model (solid line) and the corresponding experimental variogram (points) for (a) the weed coverage with $C_0 = 0.048$, $C_0 + C_1 = 0.063$ and for (b) the weed seed production with $C_0 = 2.0 \times 10^6$, $C_0 + C_1 = 2.60 \times 10^6$. Axes: distance (m) versus $\gamma^*(h)$, $\gamma(h)$.
Fig. 5. Maps estimated with kriging for data collected in 2005 associated with (a) the weed coverage map at the current life-cycle and (b) the weed seed production map at the subsequent life-cycle. The upper right corner of both maps represents the irregular contour of the corn-crop field. The gray scale in (a) represents percentage and in (b) represents the number of seeds per m².
Fig. 6. Maps estimated with kriging for data collected in 2006 associated with (a) the weed coverage map at the current life-cycle and (b) the weed seed production map at the subsequent life-cycle. The upper right corner of both maps represents the irregular contour of the corn-crop field. The gray scale in (a) represents percentage and in (b) represents the number of seeds per m².
Table 3
Cross validation for kriging estimation for 2005 and 2006 data.

                          Residual mean      Mean interval      Constant variance     Anderson–Darling test
Weed coverage
  2005                    0.50 × 10^-2       [-0.07, 0.06]      $\bar{R}$ = 0.25      p-value = 0.61
  2006                    0.11 × 10^-1       [-0.10, 0.07]      $\bar{R}$ = 0.36      p-value = 0.19
Weed seed production
  2005                    0.40 × 10^-2       [-0.17, 0.18]      $\bar{R}$ = 0.26      p-value = 0.45
  2006                    0.82 × 10^-1       [-0.68, 0.51]      $\bar{R}$ = 0.21      p-value = 0.16
The pixels in $I(u,v)$ may represent the same intensity range but may belong to different clusters within the image. Connected objects are thus obtained by image analysis using a 4-connected model (Gonzalez and Woods, 2002). In this model, two pixels in the four-neighbors are connected if they have the same value. The four-neighbors of pixel $p$ at coordinates $(u,v)$ are given by

$$(u+1,v),\ (u-1,v),\ (u,v+1),\ (u,v-1). \qquad (12)$$

The four-connected model is implemented by generating a binary matrix called $I_b(u,v)$ as

$$\text{If } I(u,v) \neq 0 \text{ then } I_b(u,v) = 1. \qquad (13)$$

Finally, the connected objects are organized in a matrix called $T(u,v)$. In a gray scale, Fig. 7 shows the image of the labels given to the nine connected objects identified in both the maps of the weed coverage and weed seed production for the 2005 data and Fig. 8 shows the same images for the 2006 data.
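As an illustration of the 4-connected model, the sketch below labels connected objects with a breadth-first search over the four-neighbors of Eq. (12). It is not the authors' implementation: the function name `label_connected` is hypothetical, and treating zero-valued pixels as background follows the reading of Eq. (13) adopted here.

```python
from collections import deque


def label_connected(I):
    """Label 4-connected objects of equal non-zero value in the 2-D list I.

    Returns (labels, count), where labels[u][v] is 0 for background pixels
    and 1..count for the object the pixel belongs to.
    """
    rows, cols = len(I), len(I[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for u in range(rows):
        for v in range(cols):
            if I[u][v] == 0 or labels[u][v]:
                continue  # background, or already assigned to an object
            count += 1
            labels[u][v] = count
            queue = deque([(u, v)])
            while queue:
                a, b = queue.popleft()
                # Four-neighbors of (a, b), Eq. (12).
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    n, m = a + da, b + db
                    inside = 0 <= n < rows and 0 <= m < cols
                    if inside and not labels[n][m] and I[n][m] == I[u][v]:
                        labels[n][m] = count
                        queue.append((n, m))
    return labels, count


# Quantized values of region R_25 as listed in Table 4.
region = [
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [2, 2, 2, 2, 2],
    [2, 2, 2, 2, 1],
    [2, 2, 2, 2, 1],
]
labels, n_objects = label_connected(region)
```

For this region the search finds three objects: two with value 1 and one with value 2, which agrees with the object counts reported in Table 4.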
Using the connected objects defined above, features for the infestation were selected (Bressan et al., 2008). The features were evaluated per regions of size not exceeding the spatial dependence of the data sets. Let $R_i$, $i = 1, \ldots, N_R$, denote subregion $R_i$ of $R$, $p_i^t$ the number of connected objects in $R_i$ such that $T(u,v) = t$ and $k_i^t$ the number of pixels with intensities equal to $t$ in $R_i$. The features were established as follows:

$x_1$: Feature for the weed coverage per region. Indicates the percentage of surface infested by emergent weeds in each region. In each region $R_i$ it is obtained as the weighted intensities $T(u,v)$, as follows:

$$x_1(i) = \frac{1}{\text{number of elements of } R_i} \, \frac{\sum_{(u,v) \in R_i} T(u,v)}{\sum_{t=1}^{3} t}. \qquad (14)$$

$x_2$: Feature for the weed seed production per region. Characterizes the locations of seeds which can germinate in each region and is associated with the weed seed production. It is obtained in the same way as feature $x_1$.

$x_3$: Feature for the weed seed patches per region. Represents how the seeds contribute to weed proliferation in the surroundings
Fig. 7. Maps of connected objects representing (a) the matrix $T(u,v)$ for the weed coverage and (b) the weed seed production (2005 data).
Fig. 8. Maps of connected objects representing (a) the matrix $T(u,v)$ for the weed coverage and (b) the weed seed production (2006 data).
of each region. The worst case of a patch distribution is one patch covering all the cells of the image, representing 100% occupation. If a weed seed value occupies a large part of a region, that is, several pixels contain this value, then the object formed by this value has a high influence on the weed seed patch calculation. In each region $R_i$, it is obtained as the average of the weighted intensities $T(u,v)$ of the connected objects as follows:

$$x_3(i) = \frac{1}{\text{number of elements of } R_i} \, \frac{\sum_{t \mid p_i^t \neq 0} t \, k_i^t / p_i^t}{\sum_{t=1}^{3} t}. \qquad (15)$$

$x_4$: Feature for the competitiveness. Reflects the high level of competitiveness of certain species of weeds and their proliferation and is based on the weed biomass. The higher the weed biomass the higher the competitiveness. It is obtained as the output of the first Bayesian network as a categorical variable for each region using the resulting rule set.
An example of the evaluation of the features $x_2$ and $x_3$ in region $R_{25}$ for the 2006 data is presented in Table 4.
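The values reported in Table 4 can be reproduced from Eqs. (14) and (15) with the short sketch below. The function names are hypothetical, and the pixel and object counts are passed in directly from Table 4 rather than recomputed from the image.

```python
def weighted_intensity_feature(T, levels=(1, 2, 3)):
    """Eq. (14): sum of T(u,v) over the region, divided by the number of
    elements of the region and by the sum of the levels."""
    n_elements = sum(len(row) for row in T)
    return sum(sum(row) for row in T) / (n_elements * sum(levels))


def patch_feature(n_elements, pixels, objects, levels=(1, 2, 3)):
    """Eq. (15): sum of t * k_t / p_t over the levels t that have at least
    one connected object, with the same normalization as Eq. (14)."""
    total = sum(
        t * pixels[t] / objects[t] for t in levels if objects.get(t, 0) != 0
    )
    return total / (n_elements * sum(levels))


# Matrix T and counts for region R_25 (2006 data), taken from Table 4.
T25 = [
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [2, 2, 2, 2, 2],
    [2, 2, 2, 2, 1],
    [2, 2, 2, 2, 1],
]
k25 = {1: 8, 2: 17}  # pixels per level
p25 = {1: 2, 2: 1}   # connected objects per level

x2 = weighted_intensity_feature(T25)  # 42 / (25 * 6)
x3 = patch_feature(25, k25, p25)      # (1*8/2 + 2*17/1) / (25 * 6)
print(round(x2, 4), round(x3, 4))
```

The intermediate sums are 42 for Eq. (14) and 38 for Eq. (15), both divided by 25 × 6 = 150, which gives the 0.2800 and 0.2533 listed in Table 4.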
Along with features $x_i$, $i = 1, 2, 3$, which had to be discretized in order to build matrices of categorical variables, the matrix of categorical variables representing the feature for the competitiveness was an input to the network that classifies the risk of infestation for each region $R_i$. The features $x_1$ and $x_4$ were calculated at the current life-cycle and the features $x_2$ and $x_3$ at the subsequent life-cycle of the weed population. The features as well as the yield loss were obtained for both the 2005 and 2006 life-cycle weed–crop data sets. The infestation features were evaluated per region. Thus, the crop was divided into 49 regions of 5 × 5 cells, each one having 100 m × 100 m, not exceeding the data set spatial dependence of 200 m. The variables used to train the networks are the categorical variables obtained by region normalized in [0,1]. Therefore, two instances for each of the 49 regions were considered, resulting in 98 instances.
In order to extract probabilistic rules using the BayesRule method, the values of all the variables had to be discretized. The discretizing was conducted by an expert who proposed the three intervals described in Table 5 represented by categorical variables.
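A minimal sketch of this discretization step is given below, using the interval bounds of Table 5. The dictionary layout and the function name `discretize` are illustrative choices, and full category names are used instead of the table's abbreviations.

```python
# Upper bound of the first interval, lower bound of the third interval and
# category names, following Table 5: [0, a], ]a, b[, [b, 1].
INTERVALS = {
    "WCoverage": (0.35, 0.70, ("Thin", "Average", "Thick")),
    "WSeed": (0.35, 0.70, ("Low", "Medium", "High")),
    "WSPatch": (0.40, 0.80, ("Small", "Regular", "Large")),
    "TWeed": (0.20, 0.60, ("Low", "Medium", "High")),
    "NLWeed": (0.20, 0.60, ("Low", "Medium", "High")),
    "BLWeed": (0.25, 0.75, ("Low", "Medium", "High")),
    "WBiomass": (0.20, 0.60, ("Low", "Medium", "High")),
    "YLoss": (0.15, 0.45, ("Low", "Medium", "High")),
}


def discretize(variable, value):
    """Map a value normalized in [0, 1] to its categorical interval."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be normalized in [0, 1]")
    first_upper, third_lower, names = INTERVALS[variable]
    if value <= first_upper:
        return names[0]
    if value < third_lower:
        return names[1]
    return names[2]


# Features of region R_25 from Table 4.
print(discretize("WSeed", 0.2800), discretize("WSPatch", 0.2533))
```

The middle interval is open at both ends, so a value equal to either bound falls into the neighboring closed interval.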
3.2.3. Bayesian network structures
Since the risk can be explained by the yield loss, this was defined as the class variable. Fig. 9 shows the structure of the Bayesian network classifier represented by the parent–children relationships, defined by a subject specialist. The node identified as weed biomass is the class node from which the competitiveness is inferred.
Fig. 10 shows the naïve Bayes classifier structure that represents the same problem, in which the class variable has no parents and all the features are conditionally independent given the class variable. For the purpose of rule comparison, only the node competitiveness from the first collaborative network is included in the learning of the naïve Bayes classifier.
It is evident by inspecting the expert-based Bayesian network classifier depicted in Fig. 9 that the weed coverage, total weed, broad-leaved weed and narrow-leaved weed nodes do not belong to the Markov blanket of the class node defined as the yield loss. Therefore, these nodes will not be taken into account by the BayesRule method (Hruschka et al., 2008).
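The Markov blanket of a node consists of its parents, its children and the other parents of its children. The sketch below computes it for a directed edge list. The arcs of Fig. 9 cannot be recovered from the text, so the edge list used here is a hypothetical one, chosen only so that it agrees with the statement above and with the antecedents of the rules in Table 13; the function name is also illustrative.

```python
def markov_blanket(edges, target):
    """Return parents, children and co-parents of target in a DAG given as
    a list of (parent, child) pairs."""
    parents = {p for p, c in edges if c == target}
    children = {c for p, c in edges if p == target}
    co_parents = {p for p, c in edges if c in children and p != target}
    return parents | children | co_parents


# HYPOTHETICAL arcs (an assumption, not read from Fig. 9).
EDGES = [
    ("BLWeed", "WBiomass"),
    ("NLWeed", "WBiomass"),
    ("TWeed", "WBiomass"),
    ("WCoverage", "WSeed"),
    ("WSeed", "YLoss"),
    ("WSPatch", "YLoss"),
    ("WBiomass", "YLoss"),
]

blanket = markov_blanket(EDGES, "YLoss")
print(sorted(blanket))
```

Under these assumed arcs only the weed seed, weed seed patches and weed biomass (competitiveness) nodes remain in the blanket of the yield loss, so the extracted rules would have three antecedents.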
Once both Bayesian networks had their structure defined, the next step was to learn the conditional probability distribution associated with their nodes. This was accomplished as part of the BayesRule method, using free software called Genie.² As mentioned in Section 1, the knowledge represented by a Bayesian classifier is not easily understood by human beings. A way of promoting its understandability is by translating it into a more
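Table 4
Objects and features for $R_{25}$ with $R_i \in \mathbb{R}^{5 \times 5}$.

$$T = \begin{bmatrix} 1 & 1 & 1 & 2 & 2 \\ 1 & 1 & 1 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 1 \\ 2 & 2 & 2 & 2 & 1 \end{bmatrix}$$

$k_{25}^{1} = 8$, $p_{25}^{1} = 2$
$k_{25}^{2} = 17$, $p_{25}^{2} = 1$
$x_2 = 0.2800$, $x_3 = 0.2533$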
Table 5
Discrete intervals for the risk of infestation categorical variables.

Node variables                              Intervals
Weed coverage (WCoverage) (%)               Thin (Th)      Average (A)       Thick (k)
                                            [0, 0.35]      ]0.35, 0.70[      [0.70, 1]
Weed seed (WSeed) (per m²)                  Low (L)        Medium (M)        High (H)
                                            [0, 0.35]      ]0.35, 0.70[      [0.70, 1]
Weed seed patches (WSPatch) (per m²)        Small (S)      Regular (R)       Large (G)
                                            [0, 0.40]      ]0.40, 0.80[      [0.80, 1]
Total weed (TWeed) (per m²)                 Low (L)        Medium (M)        High (H)
                                            [0, 0.20]      ]0.20, 0.60[      [0.60, 1]
Narrow-leaved weed (NLWeed) (per m²)        Low (L)        Medium (M)        High (H)
                                            [0, 0.20]      ]0.20, 0.60[      [0.60, 1]
Broad-leaved weed (BLWeed) (per m²)         Low (L)        Medium (M)        High (H)
                                            [0, 0.25]      ]0.25, 0.75[      [0.75, 1]
Weed biomass (WBiomass) (per m²)            Low (L)        Medium (M)        High (H)
                                            [0, 0.20]      ]0.20, 0.60[      [0.60, 1]
Yield loss (YLoss) (output) (per m²)        Low (L)        Medium (M)        High (H)
                                            [0, 0.15]      ]0.15, 0.45[      [0.45, 1]
Fig. 9. The expert-based Bayesian network classifier to infer the risk of infestation. Nodes: weed seed, weed coverage, narrow-leaved weed, broad-leaved weed, total weed, weed biomass, weed seed patches and yield loss.
² http://genie.sis.pitt.edu
suitable representation, such as classification rules. As the standard propositional if–then classification rule is the simplest and most comprehensible way to represent a classification procedure, it has been adopted by the BayesRule method (Hruschka et al., 2008), which implements the translation process.
3.2.4. Pruning strategy
As stated in the literature, there are many rule interestingness metrics such as support, confidence, lift, correlation, collective strength, etc. Such metrics are often used to determine the more relevant rules from a rule set (as those implemented by pruning strategies, for instance). Many of these measures, however, provide conflicting information about the interestingness of a pattern. Therefore, the best metric to use for a given application domain is hard to define.
It is not claimed that the rule estimated probability given by the BC is the best measure to be used in a rule set pruning task, since different measures have different intrinsic properties. However, it is a measure that is easy to implement and to understand and, as the experiments showed, it helped pruning. See Tan et al. (2002) for a more detailed description of properties of some of the most commonly used rule interestingness measures.
In the experiments described in this paper, focusing on the weed infestation domain, a pruning strategy based on the rule estimated probability given by the BC is proposed to reduce the number of rules in each rule set when the Markov blanket strategy was unable to reduce the number of rules. The pruning strategy is based on a very simple idea and is mainly motivated by the fact that it can be applied without any extra computation effort. Considering the rule set as an ordered (based on the estimated probability) list, the pruning can be done by taking into account only the rules having estimated probability higher than a predefined threshold. When pruning is applied, the number of rules tends to be smaller and the comprehensibility tends to be higher. On the other hand, having fewer rules may imply having a less detailed overview of the problem (with fewer rules and fewer antecedents). Thus, the tradeoff between accuracy and complexity is a very important issue to be analyzed in each specific application domain.
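The pruning idea described above can be sketched as follows. The tuple layout of a rule and the function name `prune_rules` are illustrative assumptions; the five example rules and their probabilities are copied from Table 13, and the default class M follows the choice reported in Section 4.

```python
def prune_rules(rules, threshold, default_class):
    """Order rules by estimated probability, keep those strictly above the
    threshold and append a default rule D for every other antecedent.

    Each rule is (rule_id, antecedent, consequent, probability).
    """
    ordered = sorted(rules, key=lambda rule: rule[3], reverse=True)
    kept = [rule for rule in ordered if rule[3] > threshold]
    kept.append(("D", {}, default_class, 1.00))
    return kept


# Rules 11, 14, 17, 21 and 27 of Table 13.
RULES = [
    (11, {"WSeed": "L", "WCompetitiveness": "H", "WSPatch": "S"}, "M", 0.61),
    (14, {"WSeed": "L", "WCompetitiveness": "L", "WSPatch": "S"}, "M", 0.84),
    (17, {"WSeed": "L", "WCompetitiveness": "M", "WSPatch": "S"}, "M", 0.55),
    (21, {"WSeed": "M", "WCompetitiveness": "H", "WSPatch": "R"}, "H", 1.00),
    (27, {"WSeed": "M", "WCompetitiveness": "M", "WSPatch": "R"}, "H", 1.00),
]

pruned = prune_rules(RULES, threshold=0.70, default_class="M")
print([rule[0] for rule in pruned])
```

No probability is recomputed here; the function only filters the values the classifier has already produced, which is the sense in which the strategy needs no extra computation.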
4. Collaborative classifiers results
Using an expert-based Bayesian classifier and a naïve Bayes classifier, the numerical parameters of the classifiers were obtained using the Genie software. The BayesRule method was then used in conjunction with each classifier in order to extract the corresponding classification rules to infer the infestation risk. In order to do that, the values of all features were discretized in the conditions of each rule as in Table 5, except $x_4$, which was inferred from the weed biomass as a categorical variable, thus having the same intervals as the weed biomass.
The number of rules represents all the variable combinations and their categorical variables. Each rule has an associated value that represents the probability of its class value, given the values of its antecedent variables. Using a 10-fold cross validation procedure, 10 Bayesian networks were trained using 10 different training sets and the extracted rules were evaluated using each of the 10 corresponding testing sets. The same testing sets were used to evaluate the extracted rules with and without pruning from both the expert-based and naïve networks of the collaborative system. In the pruning for each one of the 10 cross validation sets, the rules with probability below a certain threshold, which were generated from an a priori probability of the class variable obtained from the numerical parameters of the network, were removed and a default class was introduced. The most probable value for the class variable was taken as the categorical value medium (M). The default class named D is then defined as the most frequent class.
Considering that a 10-fold cross validation strategy was used in the experiments, only one of the 10 testing sets was chosen to be presented in the paper for each classifier. The remaining fold results are obtained in the same way. In what follows, the results for both networks of the collaborative system used to infer the risk of infestation are presented.
4.1. Competitiveness weed–crop classification results
The inference for the competitiveness of weed–crop is performed by the first classification task in the collaborative system. The BayesRule method extracted a set of rules $\{R_1, \ldots, R_r\}$ with $r = 27$ probabilistic rules from each Bayes classifier (three variables, each having three possible values).
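The count of 27 is simply $3^3$, one rule per combination of antecedent values. A short sketch of the enumeration is given below; the function name is hypothetical, and the value order (H, L, M) is an assumption chosen because it happens to reproduce the rule numbers listed in Table 7.

```python
from itertools import product

VALUES = ("H", "L", "M")  # assumed ordering of the categorical values


def enumerate_antecedents(variables, values=VALUES):
    """Return one antecedent (variable -> value) per combination of values."""
    return [
        dict(zip(variables, combination))
        for combination in product(values, repeat=len(variables))
    ]


antecedents = enumerate_antecedents(("BLWeed", "NLWeed", "TWeed"))
print(len(antecedents))  # 3 ** 3 combinations
print(antecedents[5])    # sixth combination, compare with rule 6 of Table 7
```

With four antecedent variables, as in the naïve Bayes classifier for the risk of infestation, the same enumeration gives $3^4 = 81$ rules, the number reported in Table 19.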
The evaluation results obtained for one of the 10 testing sets, including the accuracy and the corresponding class probability, are shown in Table 6. For this case, the rules are 50% in agreement with the testing set, since 3 out of 6 data instances were correctly classified. Table 7 shows the pruned Bayesian rule set, which presents rules with probability above a threshold of 70% as well as the default rule D, and Table 8 shows the results of the testing set using the pruned rule set of Table 7. For this testing set, rules 9 and 21 were replaced by the default rule and the pruned rule set was 83.33% in agreement with the testing set, since 5 out of 6 instances were correctly classified. In this particular case, the classification rate improved. Over all the 10 testing sets, 71 data instances were tested. The results indicate 63.39% of agreement, since 45 of the 71 testing instances were correctly classified. By replacing the rules with probability less than 70% by the default rule D, this percentage became 64.79%, since 46 out of 71 testing instances were correctly classified. These results are shown in Table 9. Table 10 shows the results for one testing set when considering the Bayesian rule set extracted from the naïve Bayes classifier, which reveal that the rules are 50% in agreement
Fig. 10. The naïve Bayes classifier to infer the risk of infestation. Nodes: yield loss (class), weed seed, weed coverage, weed seed patches and competitiveness.
Table 6
Competitiveness expert-based Bayesian network testing data set results for the rules.

BLWeed   NLWeed   TWeed   WBiomass   $R_r$      Test        $P(\text{Rule } R_r \mid X_{1,J_1}, X_{2,J_2}, X_{3,J_3})$ (%)
H        M        M       H          $R_9$      Incorrect   52
M        H        M       M          $R_{21}$   Incorrect   55
H        L        M       M          $R_6$      Correct     71
H        L        M       M          $R_6$      Correct     71
H        M        M       M          $R_9$      Incorrect   52
M        M        L       L          $R_{26}$   Correct     80
with the testing set, since 3 out of 6 instances were correctly classified. Table 11 shows the pruned Bayesian rule set and Table 12 shows the results of the testing set using the pruned rule set. For this testing set, rules 3, 13, 25 and 27 were replaced by the default rule and the pruned rule set was 66.66% in agreement with the testing set, since 4 out of 6 data instances were correctly classified. In this particular case, the classification also improved.
The results of the classification for all testing sets using other thresholds for pruning the rule sets of the expert-based and naïve Bayesian classifiers are also presented in Table 9.
4.2. Risk of infestation classification results
The second classification task was performed in order to generate a set of classification rules to infer the risk of infestation. As in the case of the competitiveness, an expert-based Bayesian network classifier and a naïve Bayes classifier were considered. As before, the numerical parameters were defined using the Genie software. The BayesRule method was then used in conjunction with each classifier in order to extract the corresponding classification rules to infer the infestation risk. The BayesRule method extracted a set of 27 probabilistic rules from the expert-based Bayesian network classifier, as shown in Table 13.
By considering the Bayesian rule set extracted from the expert-based network, Table 14 displays the results obtained for the test set illustrated here. The rules presented 89% of agreement, given that 8 of 9 tested instances were correctly classified. By considering the Bayesian rule set extracted from the naïve Bayes classifier, the results are shown in Table 15.
As before, the rules with probability below 70% were removed and a default class D, also taken as the categorical value M, was used. The pruning strategy was applied after the Markov blanket had reduced the number and the complexity (regarding conditions in their antecedent part) of the classification rules. Thus, the pruning strategy was applied to the rules shown in Table 13 and the reduced set of rules is shown in Table 16.
By considering the pruned Bayesian rule set extracted from the expert-based network, Table 17 displays the results obtained again for the test set illustrated here. The rules presented, as before, 89% of agreement. By considering the pruned Bayesian rule set extracted from the naïve Bayes classifier, the results are shown in Table 18. The rules presented 88.9% of agreement. For all the 10 testing set cases, the results are shown in Table 19. Also, to verify whether the rule sets have a positive impact on the results, the results obtained using the default class D in all the 10 testing set cases
Table 7
Pruned expert-based Bayesian rule set using the default rule D with a threshold of probability 0.7.

1   If (BLWeed is H) and (NLWeed is H) and (TWeed is H) then WBiomass is H (0.72)
4   If (BLWeed is H) and (NLWeed is L) and (TWeed is H) then WBiomass is M (1.00)
6   If (BLWeed is H) and (NLWeed is L) and (TWeed is M) then WBiomass is M (0.72)
7   If (BLWeed is H) and (NLWeed is M) and (TWeed is H) then WBiomass is H (0.83)
11  If (BLWeed is L) and (NLWeed is H) and (TWeed is L) then WBiomass is M (1.00)
19  If (BLWeed is M) and (NLWeed is H) and (TWeed is H) then WBiomass is L (1.00)
23  If (BLWeed is M) and (NLWeed is L) and (TWeed is L) then WBiomass is L (0.79)
24  If (BLWeed is M) and (NLWeed is L) and (TWeed is M) then WBiomass is L (0.72)
26  If (BLWeed is M) and (NLWeed is M) and (TWeed is L) then WBiomass is L (0.80)
27  If (BLWeed is M) and (NLWeed is M) and (TWeed is M) then WBiomass is M (0.80)
D   Otherwise WBiomass is M (1.00)
Table 8
Pruned expert-based Bayesian network testing data set results using the default rule D with a threshold of probability 0.7.

BLWeed   NLWeed   TWeed   WBiomass   $R_r$      Test        $P(\text{Rule } R_r \mid X_{1,J_1}, X_{2,J_2}, X_{3,J_3})$ (%)
H        M        M       H          $R_D$      Incorrect   100
M        H        M       M          $R_D$      Correct     100
H        L        M       M          $R_6$      Correct     72
H        L        M       M          $R_6$      Correct     72
H        M        M       M          $R_D$      Correct     100
M        M        L       L          $R_{26}$   Correct     80
Table 9
Competitiveness classification results with expert-based and naïve Bayesian networks for all 10 folds.

                             Expert-based                       Naïve
                             Accuracy (%)   Number of rules     Accuracy (%)   Number of rules
Markov blanket rules set     63.39          27                  57.75          27
Pruned rules set
  Threshold = 60%            64.79          12                  60.56          15
  Threshold = 70%            64.79          11                  61.97          9
  Threshold = 80%            60.65          7                   60.56          3
  Threshold = none           60.00          1                   60.50          1
Table 10
Naïve Bayes testing data set results for the rules.

BLWeed   NLWeed   TWeed   WBiomass   $R_r$      Test        $P(\text{Rule } R_r \mid X_{1,J_1}, X_{2,J_2}, X_{3,J_3})$ (%)
H        M        H       H          $R_3$      Incorrect   62
L        H        L       M          $R_{13}$   Correct     52
M        M        M       M          $R_{27}$   Correct     48
M        H        M       M          $R_{25}$   Incorrect   50
M        M        M       M          $R_{27}$   Correct     48
H        L        M       L          $R_{20}$   Incorrect   91
Table 11
Pruned naïve Bayes rule set with a threshold of probability 0.7.

2   If (BLWeed is H) and (NLWeed is L) and (TWeed is H) then WBiomass is M (0.80)
5   If (BLWeed is L) and (NLWeed is L) and (TWeed is H) then WBiomass is M (0.73)
18  If (BLWeed is M) and (NLWeed is M) and (TWeed is L) then WBiomass is L (0.71)
19  If (BLWeed is H) and (NLWeed is H) and (TWeed is M) then WBiomass is M (0.76)
20  If (BLWeed is H) and (NLWeed is L) and (TWeed is M) then WBiomass is M (0.91)
21  If (BLWeed is H) and (NLWeed is M) and (TWeed is M) then WBiomass is M (0.78)
23  If (BLWeed is L) and (NLWeed is L) and (TWeed is M) then WBiomass is M (0.85)
D   Otherwise WBiomass is M (1.00)
whatever the rule antecedent is, and also the results for thresholds of 60%, 70% and 80%, are shown in Table 19.
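The agreement figures quoted for the illustrated test set can be checked with the sketch below, which applies five rules of Table 13 to the nine instances of Table 14, first without pruning and then with a 0.7 threshold and default class M. The data structures and function names are illustrative, and the competitiveness value printed as B in Table 14 is read as L, as in Table 17.

```python
# (WSeed, WCompetitiveness, WSPatch) -> (rule number, YLoss, probability),
# copied from Table 13.
RULES = {
    ("L", "H", "S"): (11, "M", 0.61),
    ("L", "L", "S"): (14, "M", 0.84),
    ("L", "M", "S"): (17, "M", 0.55),
    ("M", "H", "R"): (21, "H", 1.00),
    ("M", "M", "R"): (27, "H", 1.00),
}

# Test instances of Table 14: (WSeed, WSPatch, WCompetitiveness, YLoss).
TEST_SET = [
    ("L", "S", "L", "M"), ("L", "S", "M", "M"), ("L", "S", "L", "M"),
    ("L", "S", "M", "M"), ("M", "R", "M", "M"), ("L", "S", "L", "M"),
    ("L", "S", "H", "M"), ("L", "S", "H", "M"), ("M", "R", "H", "H"),
]


def classify(wseed, wspatch, wcomp, threshold=0.0, default="M"):
    """Return the rule consequent, or the default class when no rule with
    probability above the threshold matches the antecedent."""
    rule = RULES.get((wseed, wcomp, wspatch))
    if rule is None or rule[2] <= threshold:
        return default
    return rule[1]


def agreement(threshold):
    """Count correctly classified test instances for a pruning threshold."""
    hits = sum(
        classify(wseed, wspatch, wcomp, threshold) == yloss
        for wseed, wspatch, wcomp, yloss in TEST_SET
    )
    return hits, len(TEST_SET)


print(agreement(0.0), agreement(0.7))
```

In both cases the only misclassified instance is the one covered by rule 27, whose probability of 1.00 keeps it in the pruned set, so pruning leaves the agreement on this test set unchanged.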
5. Conclusions
This work explores Bayesian network based methods to infer the risk of weed infestation in a corn-crop. The proposed inference system is implemented as a collaboration between two classification tasks. The first one infers the competitiveness (expressed by the biomass) of weeds and the second infers the risk of infestation (expressed by the yield loss), using as input the inferred competitiveness, the weed seed density, weed coverage and weed seed patches. The last three features are inferred from kriging and image objects. For both classification tasks, two different Bayesian network structures, a naïve Bayes and an expert-based network structure, were used for comparison purposes. The numeric parameters of both Bayesian models were learned from the empirical data collected from a corn-crop field.
A hybrid approach, implemented by the BayesRule method, which articulates Bayes and categorical rules, was used to improve the model's understandability by extracting classification rules from each model. The Markov blanket concept was used in the BayesRule method to reduce the number and the complexity of classification rules. When pruning is applied, the number of rules tends to be smaller and the comprehensibility tends to be higher. On the other hand, having fewer rules may imply having a less detailed overview of the problem (with fewer rules and fewer antecedents). Thus, the tradeoff between accuracy and complexity is a very important issue to be analyzed in each specific application domain.
In this work, for the expert-based network, the Markov blanket concept was sufficient to prune the rule set efficiently, since the results indicate 72.5% and 66.3% of agreement without and with the pruning strategy, respectively. In addition, the results reveal that the expert-based Bayesian network classifier yields a higher accuracy than the naïve Bayes classifier. In the latter, the application of the pruning strategy made no difference in the results. The strong and unrealistic assumption (that all the features are independent given the class), which is an intrinsic aspect of any naïve Bayes classifier, may have contributed to this behavior. It is worthwhile mentioning that the results presented are specific to a particular crop field, subject to the conditions described in Section 3.1. Further work includes the use of extensive simulations and experiments to generalize the obtained results. It is also worth looking into the use of the proposed pruning strategy in other domains in order to confirm its relevance.
Table 13
Expert-based Bayesian rules set for the risk of infestation.
1 If (WSeed is H) and (WCompetitiveness is H) and (WSPatch is G) then YLoss is H (0.66)
2 If (WSeed is H) and (WCompetitiveness is H) and (WSPatch is S) then YLoss is H (0.50)
3 If (WSeed is H) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (0.47)
4 If (WSeed is H) and (WCompetitiveness is L) and (WSPatch is G) then YLoss is H (0.41)
5 If (WSeed is H) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (0.72)
6 If (WSeed is H) and (WCompetitiveness is L) and (WSPatch is R) then YLoss is M (0.48)
7 If (WSeed is H) and (WCompetitiveness is M) and (WSPatch is G) then YLoss is M (0.36)
8 If (WSeed is H) and (WCompetitiveness is M) and (WSPatch is S) then YLoss is M (0.51)
9 If (WSeed is H) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is M (0.77)
10 If (WSeed is L) and (WCompetitiveness is H) and (WSPatch is G) then YLoss is H (0.66)
11 If (WSeed is L) and (WCompetitiveness is H) and (WSPatch is S) then YLoss is M (0.61)
12 If (WSeed is L) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (0.93)
13 If (WSeed is L) and (WCompetitiveness is L) and (WSPatch is G) then YLoss is L (0.41)
14 If (WSeed is L) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (0.84)
15 If (WSeed is L) and (WCompetitiveness is L) and (WSPatch is R) then YLoss is L (0.54)
16 If (WSeed is L) and (WCompetitiveness is M) and (WSPatch is G) then YLoss is M (0.36)
17 If (WSeed is L) and (WCompetitiveness is M) and (WSPatch is S) then YLoss is M (0.55)
18 If (WSeed is L) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is M (0.92)
19 If (WSeed is M) and (WCompetitiveness is H) and (WSPatch is G) then YLoss is H (0.50)
20 If (WSeed is M) and (WCompetitiveness is H) and (WSPatch is S) then YLoss is M (0.67)
21 If (WSeed is M) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (1.00)
22 If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is G) then YLoss is M (0.84)
23 If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (1.00)
24 If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is R) then YLoss is M (0.58)
25 If (WSeed is M) and (WCompetitiveness is M) and (WSPatch is G) then YLoss is M (0.62)
26 If (WSeed is M) and (WCompetitiveness is M) and (WSPatch is S) then YLoss is M (0.50)
27 If (WSeed is M) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is H (1.00)
Table 12
Pruned naïve Bayes testing data set results for the rules with a threshold of probability 0.7.

BLWeed   NLWeed   TWeed   WBiomass   $R_r$      Test        $P(\text{Rule } R_r \mid X_{1,J_1}, X_{2,J_2}, X_{3,J_3})$ (%)
H        M        H       H          $R_D$      Incorrect   100
L        H        L       M          $R_D$      Correct     100
M        M        M       M          $R_D$      Correct     100
M        H        M       M          $R_D$      Correct     100
M        M        M       M          $R_D$      Correct     100
H        L        M       L          $R_{20}$   Incorrect   91
Table 14
Expert-based Bayesian network testing data set results for the infestation risk.

WSeed   WSPatch   WCompetitiveness   YLoss   $R_r$      Test        $P(\text{Rule } R_r \mid V_{1,J_1}, V_{2,J_2}, V_{3,J_3}, V_{4,J_4})$ (%)
L       S         L                  M       $R_{14}$   Correct     84
L       S         M                  M       $R_{17}$   Correct     55
L       S         L                  M       $R_{14}$   Correct     84
L       S         M                  M       $R_{17}$   Correct     55
M       R         M                  M       $R_{27}$   Incorrect   100
L       S         L                  M       $R_{14}$   Correct     84
L       S         H                  M       $R_{11}$   Correct     61
L       S         H                  M       $R_{11}$   Correct     61
M       R         H                  H       $R_{21}$   Correct     100
Table 15
Naïve Bayes classifier testing data set results for the infestation risk.

WCoverage   WSeed   WSPatch   WCompetitiveness   YLoss   $R_r$      Test        $P(\text{Rule } R_r \mid V_{1,J_1}, V_{2,J_2}, V_{3,J_3}, V_{4,J_4})$ (%)
Th          L       L         S                  M       $R_{41}$   Correct     82
Th          L       L         S                  M       $R_{41}$   Correct     82
Th          L       L         S                  M       $R_{40}$   Correct     61
Th          L       L         R                  M       $R_{42}$   Correct     70
Th          L       L         R                  M       $R_{42}$   Correct     70
Th          L       L         S                  M       $R_{41}$   Correct     82
Th          L       L         R                  H       $R_{42}$   Incorrect   70
A           L       L         R                  H       $R_{51}$   Correct     75
A           L       L         G                  H       $R_{49}$   Correct     82
Table 16
Pruned expert-based Bayesian rules set for the risk of infestation.

5   If (WSeed is H) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (0.72)
9   If (WSeed is H) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is M (0.77)
12  If (WSeed is L) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (0.93)
14  If (WSeed is L) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (0.84)
18  If (WSeed is L) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is M (0.92)
21  If (WSeed is M) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (1.00)
22  If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is G) then YLoss is M (0.84)
23  If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (1.00)
27  If (WSeed is M) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is H (1.00)
D   Otherwise YLoss is M (1.00)
Table 17
Pruned expert-based Bayesian network testing data set results for the infestation risk with a threshold of probability 0.7.

WSeed   WSPatch   WCompetitiveness   YLoss   $R_r$      Test        $P(\text{Rule } R_r \mid X_{1,J_1}, X_{2,J_2}, X_{3,J_3}, X_{4,J_4})$ (%)
L       S         L                  M       $R_{14}$   Correct     84
L       S         M                  M       $R_D$      Correct     100
L       S         L                  M       $R_{14}$   Correct     84
L       S         M                  M       $R_D$      Correct     100
M       R         M                  M       $R_{27}$   Incorrect   100
L       S         L                  M       $R_{14}$   Correct     84
L       S         H                  M       $R_D$      Correct     100
L       S         H                  M       $R_D$      Correct     100
M       R         H                  H       $R_{21}$   Correct     100
Table 18
Pruned naïve Bayes classifier testing data set results for the infestation risk with a threshold of probability 0.7.

WCoverage   WSeed   WSPatch   WCompetitiveness   YLoss   $R_r$      Test        $P(\text{Rule } R_r \mid X_{1,J_1}, X_{2,J_2}, X_{3,J_3}, X_{4,J_4})$ (%)
Th          L       L         S                  M       $R_{41}$   Correct     82
Th          L       L         S                  M       $R_{41}$   Correct     82
Th          L       L         G                  M       $R_D$      Correct     100
Th          L       L         R                  M       $R_{42}$   Correct     70
Th          L       L         R                  M       $R_{42}$   Correct     70
Th          L       L         S                  M       $R_{41}$   Correct     82
Th          L       L         R                  H       $R_{42}$   Incorrect   70
A           L       L         R                  H       $R_{51}$   Correct     75
A           L       L         G                  H       $R_{49}$   Correct     82
Table 19
Risk classification results with expert-based and naïve Bayesian networks for all 10 folds.

                             Expert-based                       Naïve
                             Accuracy (%)   Number of rules     Accuracy (%)   Number of rules
Markov blanket rules set     72.5           27                  71.4           81
Pruned rules set
  Threshold = 60%            63             15                  69.3           68
  Threshold = 70%            66.3           10                  71.4           52
  Threshold = 80%            66.3           8                   66.3           44
  Threshold = none           65.3           1                   66.32          1
Acknowledgments
This work was partially supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) under the Programa Nacional de Cooperação Acadêmica (PROCAD), the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP). We thank Dr. Décio Karam from Embrapa Milho e Sorgo, Sete Lagoas, MG, for helping to define the Bayesian network structures and for providing the data used in the experiments described in this paper.
References
Aitkenhead,M.J.,Dalgetty,I.A.,Mullins,C.E.,McDonald,A.J.S.,Strachan,N.J.C.,
2003.Weed and crop discrimination using image analysis and artificial
intelligence methods.Computers and Electronics in Agriculture 39 (3),
157–171.
Banerjee,S.,Johnson,G.A.,Schneider,N.,Durgan,B.R.,2005.Modelling replicated
weed growth data using spatially-varying growth curves.Environmental and
Ecological Statistics 12 (4),357–377.
Bressan,G.M.,Koenigkan,L.V.,Oliveira,V.A.,Cruvinel,P.E.,Karam,D.,2008.A
classification methodology for the risk of weed infestation using fuzzy logic.
Weed Research 48 (5),470–479.
Bressan,G.M.,Oliveira,V.A.,Hruschka,E.R.J.,Nicoletti,M.C.,2007a.Biomass based
weed–crop competitiveness classification using Bayesian networks.In:
Seventh International Conference on Intelligent Systems Design and Applica-
tions,IEEE Press,Rio de Janeiro,pp.121–126.
Bressan,G.M.,Oliveira,V.A.,Hruschka,E.R.J.,Nicoletti,M.C.,2007b.A probability
estimation based strategy to optimize the classification rule set extracted from
Bayesian network classifiers.In:VIII Simpo´ sio Brasileiro de Automac- ao
Inteligente,Floriano´ polis,paper ID 30651-1.
Brooker,P.I.,1979.Kriging.Engineering and Mining Journal 180 (9),148–153.
Cheng,J.,Greiner,R.,Kelly,J.,Bell,D.,Liu,W.,2002.Learning Bayesian networks
from data:an information-theory based approach.Artificial Intelligence 137
(1),43–90.
Cousens,R.,Mortimer,M.,1995.Dynamics of Weed Populations.Cambridge
University Press,Cambridge,UK.
Domingos, P., Pazzani, M., 1997. On the optimality of the simple Bayesian classifier
under zero–one loss. Machine Learning 29 (2–3), 103–130.
Duda, R.O., Hart, P.E., 1973. Pattern Classification and Scene Analysis. Wiley,
New York.
Faechner, T., Norrena, K., Thomas, A.G., Deutsch, C.V., 2002. A risk-qualified
approach to calculate locally varying herbicide application rates. Weed
Research 42 (6), 476–485.
Firbank, L.G., Watkinson, A.R., 1985. A model of interference within plant
monocultures. Journal of Theoretical Biology 116 (2), 291–311.
Friedman, N., Geiger, D., Goldszmidt, M., 1997. Bayesian network classifiers.
Machine Learning 29 (2–3), 131–163.
Gonzalez, R.C., Woods, R.E., 2002. Digital Image Processing, second ed.
Prentice-Hall, Upper Saddle River, NJ.
Granitto, P.M., Navone, H.D., Verdes, P.F., Ceccatto, H.A., 2002. Weed seeds
identification by machine vision. Computers and Electronics in Agriculture
33 (2), 91–103.
Granitto, P.M., Verdes, P.F., Ceccatto, H.A., 2005. Large-scale investigation of weed
seed identification by machine vision. Computers and Electronics in
Agriculture 47 (1), 15–24.
Heckerman, D., Chickering, D.M., Meek, C., Rounthwaite, R., Kadie, C., 2000.
Dependency networks for inference, collaborative filtering, and data
visualization. Journal of Machine Learning Research 1 (1), 49–75.
Hock, S.M., Knezevic, S.Z., Martin, A., Lindquist, J.L., 2006. Soybean row spacing and
weed emergence time influence weed competitiveness and competitive
indices. Weed Science 54 (1), 38–46.
Hruschka, E., Nicoletti, M., Oliveira, V., Bressan, G.M., 2008. BayesRule: a
Markov-blanket based procedure for extracting a set of probabilistic rules from Bayesian
classifiers. International Journal of Hybrid Intelligent Systems 5 (2), 83–96.
Hughes, G., Madden, L.V., 2003. Evaluating predictive models with application in
regulatory policy for invasive weeds. Agricultural Systems 76 (2), 755–774.
Isaaks, E.H., Srivastava, R.M., 1989. An Introduction to Applied Geostatistics. Oxford
University Press, New York.
Iwashita, F., Landim, P.B., 2003. GEOMATLAB: geostatistics using MATLAB
(in Portuguese). Instituto de Geologia e Ciências Exatas, Universidade Estadual
Paulista (UNESP), Rio Claro, SP, pp. 1–17, Texto Didático 12.
Jurado-Expósito, M., López-Granados, F., García-Torres, L., García-Ferrer, A.,
Sánchez de la Orden, M., Atenciano, S., 2003. Multi-species weed spatial
variability and site-specific management maps in cultivated sunflower. Weed
Science 51 (3), 319–328.
Jurado-Expósito, M., López-Granados, F., González-Andujar, J.L., García-Torres, L.,
2004. Spatial and temporal analysis of Convolvulus arvensis L. populations over
four growing seasons. European Journal of Agronomy 21 (3), 287–296.
Kropff, M.J., Spitters, C.J.T., 1991. A simple model of crop loss by weed competition
from early observations on relative leaf area of the weeds. Weed Research 31
(2), 97–107.
Marchant, J.A., Onyango, C.M., 2003. Comparison of a Bayesian classifier with a
multilayer feed-forward neural network using the example of plant/weed/soil
discrimination. Computers and Electronics in Agriculture 39 (1), 3–22.
Možina, M., Demšar, J., Kattan, M., Zupan, B., 2004. Nomograms for visualization of
naïve Bayesian classifier. In: Proceedings of the Eighth European Conference on
Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, pp.
337–348.
Oerke, E.C., Dehne, H.W., Schonbeck, F., Weber, A., 1994. Crop Production and
Crop Protection. Estimated Losses in Major Food and Cash Crops. Elsevier,
Amsterdam.
Park, S.E., Benjamin, L.R., Watkinson, A.R., 2003. The theory and application of
plant competition models: an agronomic perspective. Annals of Botany 92 (6),
741–748.
Pearl, J., 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann, San Mateo, CA.
Primot, S., Valantin-Morison, M., Makowski, D., 2006. Predicting the risk of weed
infestation in winter oilseed rape crops. Weed Research 46 (1), 22–33.
Sakai, K., 2001. Nonlinear Dynamics and Chaos in Agricultural Systems.
Developments in Agricultural Systems, Elsevier, Amsterdam, Netherlands.
Shiratsuchi, L.S., 2001. Mapping weed spatial variability using precision farming
tools (in Portuguese). Master’s Thesis, Escola Superior de Agricultura Luiz de
Queiroz, Universidade de São Paulo, Piracicaba, SP.
Smith, A.M., Blackshaw, R.E., 2002. Crop/weed discrimination using remote
sensing. Geoscience and Remote Sensing Symposium 4, 1962–1964.
Tan, P., Kumar, V., Srivastava, J., 2002. Selecting the right interestingness measure
for association patterns. In: Eighth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 32–41,
DOI: http://doi.acm.org/10.1145/775047.775053.
Tversky, A., Kahneman, D., 1974. Judgment under uncertainty: heuristics and
biases. Science 185 (4157), 1124–1131.
Wallinga, J., Groeneveld, R.M.W., Lotz, L.A.P., 1998. Measures that describe weed
spatial patterns at different levels of resolution and their applications for patch
spraying of weeds. Weed Research 38 (5), 351–359.
Wilkerson, G.G., Wiles, L.J., Bennett, A.C., 2002. Weed management decision
models: pitfalls, perceptions, and possibilities of the economic threshold
approach. Weed Science 50 (4), 411–422.
G.M. Bressan et al. / Engineering Applications of Artificial Intelligence 22 (2009) 579–592