Using Bayesian networks with rule extraction to infer the risk of weed infestation in a corn-crop

Gláucia M. Bressan (a), Vilma A. Oliveira (a,*), Estevam R. Hruschka Jr. (b), Maria C. Nicoletti (b)

(a) Universidade de São Paulo, Departamento de Engenharia Elétrica, 13566-590 São Carlos, SP, Brazil

(b) Universidade Federal de São Carlos, Departamento de Computação, 13565-905 São Carlos, SP, Brazil

Article info

Article history:

Received 30 November 2007

Received in revised form 20 January 2009

Accepted 24 March 2009

Available online 14 May 2009

Keywords:

Bayesian network

Naïve Bayes

Rule extraction

Weed infestation

Kriging

Abstract

This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical variable value for the weed–crop competitiveness, using as input categorical variables for the total density of weeds and the corresponding proportions of narrow- and broad-leaved weeds. The inferred categorical variable values for the weed–crop competitiveness, along with three other categorical variables extracted from estimated maps for the weed seed production and weed coverage, are then used as input for a second Bayesian network classifier to infer categorical variable values for the risk of infestation. Weed biomass and yield loss data samples are used to learn, in a supervised fashion, the probability relationships among the nodes of the first and second Bayesian classifiers, respectively. For comparison purposes, two types of Bayesian network structures are considered, namely an expert-based Bayesian classifier and a naïve Bayes classifier. The inference system focuses on knowledge interpretation by translating a Bayesian classifier into a set of classification rules. The results obtained for the risk inference in a corn-crop field are presented and discussed.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Agricultural procedures may modify the ecological balance of a field due to the tilling procedures growers use to prepare the land, quite often leading to a population explosion or infestation of some inconvenient plants commonly known as weeds. Weed control is a fundamental part of all crop production systems. Yield reductions due to weeds are a commonly known obstacle in harvest operations, as weeds lower crop quality by competing with the crop for limited resources such as water, nutrients and light. Oerke et al. (1994) estimated that a 10% loss of worldwide agricultural production might be a consequence of weed activity.

In general, the main components of weed management systems are herbicides. Usually, herbicides are spread uniformly over the entire field for weed control. A uniform application rate is often based on a visual evaluation of the weed density, with no procedure used to evaluate the risks associated with under- and over-spraying (Faechner et al., 2002). However, weed infestation does not occur over the entire field, and the amount of herbicide could be reduced by spraying only over the weed patches (Wallinga et al., 1998; Jurado-Expósito et al., 2004). The prediction of weed dispersion can be used to prevent infestations by applying herbicides only in specific regions (Jurado-Expósito et al., 2003; Faechner et al., 2002). Reducing the quantity of herbicides potentially reduces herbicide residues in water, food crops and the environment, and it may prevent the development of weed resistance (Aitkenhead et al., 2003).

In the literature, a considerable diversity of weed management decision models can be found. There are many different approaches, ranging from empirical functions to mechanistic simulation models. As surveyed by Wilkerson et al. (2002), some of the models are too simple, as they do not include all factors that can influence weed competition or other issues farmers consider when deciding how to manage weeds. Other models can be excessively complex, given that many users might find it difficult to obtain the needed information or do not have the required equipment for acquiring the data. According to Wilkerson et al. (2002), weed management decision models must be built and evaluated from three perspectives: biological accuracy, quality of recommendations and ease of use. In addition, another important issue to be taken into account when building weed management systems is the interpretation of the model. The latter is of particular interest in the experiments conducted in this paper.

There are few formalisms that can be used to model weed infestation in a crop field. Primot et al. (2006) developed 20 simple models (five are linear regression models and the other 15 are logistic regression models). The models were evaluated for their ability to discriminate the fields with a high level of weed infestation from the fields with a low level of infestation; the parameters of the 20 models were estimated using 3 years of experimental data. The models can be used to help farmers decide what type of weed control (chemical, mechanical or biological) to use.

Engineering Applications of Artificial Intelligence 22 (2009) 579–592

journal homepage: www.elsevier.com/locate/engappai

0952-1976/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.

doi:10.1016/j.engappai.2009.03.006

* Corresponding author. Tel.: +55 16 3373 9336; fax: +55 16 3373 9372.

E-mail address: vilmao@sel.eesc.usp.br (V.A. Oliveira).

The risk of a weed-infested crop can be inferred from the mathematical modeling of the weed behavior, based on experimental data. Dynamic models for weed seed populations describe the population size at life-cycle t as a function of the population size at life-cycle t − 1 using difference equations (Sakai, 2001; Cousens and Mortimer, 1995). The dynamic models indicate that infestation depends not only on the weed density but also on the competitiveness of the weed species (Park et al., 2003; Firbank and Watkinson, 1985; Kropff and Spitters, 1991). More recently, competitive indexes and weed ranking were used to quantify the weed competitiveness in a soybean field (Hock et al., 2006). Although purely mathematical models can be used for modeling the risk of weed infestation with good performance, as described in several of the previous references, most of them lack flexibility and, more importantly, interpretability: they work as 'black boxes' where the user feeds in a few values and the system outputs a diagnosis.

A particular class of models is based on probability. Of special interest in this paper is the class of Bayesian network (BN) models, which are based on the probability that a given set of measurements defines objects as belonging to a certain class. In the literature, Bayesian-based methods have already been used for modeling similar problems (Hughes and Madden, 2003; Smith and Blackshaw, 2002; Banerjee et al., 2005). In particular, Hughes and Madden (2003) proposed a risk assessment methodology to identify which exotic plant species, among those presented for import, are a threat (to agricultural and ecological systems) and which are not. Bayesian theory has also been employed in the agriculture domain as the basis for developing classification systems, as described in Granitto et al. (2002). In their work, the performance of a naïve Bayes classifier (BC) is used as the selection criterion for identifying a nearly optimal set of 12 seed characteristics further used as classification parameters, such as coloration, morphological and textural features. Considering the seed identification problem, the work described in Granitto et al. (2005) compared the performance of a naïve Bayes classifier to that of an artificial neural network (NN) based classifier. In this particular experiment, the naïve Bayes classifier with an adequately selected set of classification features outperformed the NN-based classifier. A similar result was also obtained in Marchant and Onyango (2003), but with a Bayesian classifier and a multilayer feed-forward neural network in a task for discriminating plants, weeds and soil in color images.

The main goal of this paper is to propose and describe the use of Bayesian network methods to infer the risk of weed infestation in a corn-crop, as well as to present and discuss the results obtained in a real application domain based on empirical data. The procedure is implemented as a collaborative system that integrates two classification tasks. The first uses a Bayesian network to infer the competitiveness of weeds, expressed by their biomass, using as input the total density of weeds and the corresponding narrow- and broad-leaved proportions. The second task assesses the risk of infestation, expressed by the yield loss, using as input the previously inferred competitiveness, as well as features extracted from the weed seed density, weed coverage and weed seed patches. The last three variables are estimated with a geostatistics method called kriging (Brooker, 1979; Isaaks and Srivastana, 1989) and image objects (Gonzalez and Woods, 2002) from weed seed density and weed coverage data samples.

In addition, the paper also presents the translation of the induced Bayesian networks into a set of classification rules, aiming at a more comprehensible knowledge representation. As mentioned before, this is an important aspect of knowledge-based system construction, since it gives the system credibility, a quality that other types of representation lack. The main idea of the conducted experiments is therefore not to show that the translation method is better than traditional classifiers (such as C4.5) or rule extraction methods. The claim is that it is possible to take advantage of both the causal knowledge representation (which can be adequately represented in a BN or BC) and the high accuracy of a Bayesian classifier to obtain a set of classification rules (extracted from the BC) as a knowledge base.

For both classification tasks implemented by the collaborative system, two different Bayesian network structures are used for comparison purposes. One is induced by the naïve Bayes algorithm (Duda and Hart, 1973) using empirical data, and the other, an unrestricted Bayesian network, is designed and refined by an expert using the same empirical data. In this paper, the networks are referred to as the naïve Bayes and expert-based networks, respectively. Due to their different architectures, the two Bayesian networks have different performances, depending on the available information. A set of probabilistic classification rules is then extracted from each of the Bayesian networks using a Markov-based strategy proposed in Hruschka et al. (2008). To reduce the number of rules where the Markov-based strategy does not remove categorical variables, a pruning strategy is proposed. The pruning strategy is mainly motivated by the fact that no extra computational effort is needed: it keeps only the rules whose estimated probability is higher than a predefined threshold. This paper is an extended and revised version of two earlier conference papers, namely Bressan et al. (2007a,b).

The remainder of this paper is organized as follows. Section 2 describes the basics of Bayesian networks and naïve Bayes classifiers and discusses the importance of improving their understandability. Section 3 focuses on two important issues: the approach used to collect and interpolate empirical data, and the construction of the collaborative system that integrates two Bayesian classifiers. Section 4 presents the results of the collaborative system, focusing on the results of the individual classifiers, that is, the Bayesian network and the naïve Bayes classifiers. Finally, Section 5 presents some concluding remarks and highlights the next steps for this research work.

2. Basics of Bayesian networks, Markov blanket and classification rules

As pointed out in Heckerman et al. (2000), Bayesian networks and Bayesian classifiers are usually employed in data mining tasks mainly because they (i) may deal with incomplete data sets straightforwardly; (ii) can learn causal relationships; (iii) may combine prior knowledge with patterns learnt from data; and (iv) can help to avoid overfitting.

A Bayesian network can be viewed as a form of probabilistic graphical model used for knowledge representation and reasoning about data domains. Whereas a Bayesian network usually encodes a joint probability distribution over a set of random variables, a Bayesian classifier usually aims to correctly predict the value of a discrete class variable given the value of a vector of features (predictors). Since Bayesian classifiers are a particular type of Bayesian network, the concepts and results described in this section are valid for both.

A Bayesian network consists of two components: a network structure, which is a directed acyclic graph, and a set of probability tables. The nodes of the Bayesian network represent variables, and the arcs between nodes represent dependence relations between the corresponding variables. An arc starting at a node X (representing variable X) and ending at a node Y (representing variable Y) establishes X as a parent of Y and Y as a child of X. A Bayesian network can be used to compute the conditional probability of one node, given values assigned to the other nodes. Hence, a Bayesian network can be used as a classifier that gives the posterior probability distribution of the class node given the values of the other attributes. When learning Bayesian networks from datasets, nodes are used to represent dataset features.

Consider a finite set $\{X_i,\ i = 1, \ldots, n\}$ of discrete random variables where each variable may take on values (represented by lower-case letters) from a finite set. As formally stated in Cheng et al. (2002), a Bayesian network is represented by $BN = \langle N, A, \Theta \rangle$, where the component $\langle N, A \rangle$ is a directed acyclic graph with nodes $X_i \in N$, $i = 1, \ldots, n$, representing domain variables and arcs $a \in A$ between nodes representing a probabilistic dependency between the associated nodes. Finally, denoting $\pi_{X_i}$ as the set of parents of $X_i$ in $\langle N, A \rangle$, the last component of BN is given by $\Theta = \{\theta_{X_i \mid \pi_{X_i}} = P(X_i \mid \pi_{X_i})\}$ for each possible value $x_i$ of $X_i$ and $\pi_{x_i}$ of $\pi_{X_i}$, which collectively represents a conditional probability distribution (CPtable) that quantifies how a node $X_i \in N$ depends on its parents. The conditional independence assumption (Markov condition) allows the calculation of the joint probability distribution function over the variables $\{X_i,\ i = 1, \ldots, n\}$ based on the background knowledge as

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi_{X_i}) = \prod_{i=1}^{n} \theta_{X_i \mid \pi_{X_i}}, \qquad (1)$$

where $n = |N|$. Therefore, a Bayesian network can be used as a knowledge representation that allows inferences.
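As an illustration of the factorization in (1), the following minimal Python sketch multiplies CPtable entries to obtain the joint probability of a complete assignment. The three-node network, the variable names and all probability values are hypothetical and are not taken from the data used in this paper.

```python
from itertools import product

# Hypothetical network: density -> competitiveness <- broad.
# parents[node] lists the parents of each node (the structure <N, A>).
parents = {
    "density": (),
    "broad": (),
    "competitiveness": ("density", "broad"),
}

# Hypothetical CPtables: cpt[node][parent values][node value] = P(node | parents).
cpt = {
    "density": {(): {"low": 0.6, "high": 0.4}},
    "broad": {(): {"low": 0.7, "high": 0.3}},
    "competitiveness": {
        ("low", "low"): {"low": 0.9, "high": 0.1},
        ("low", "high"): {"low": 0.6, "high": 0.4},
        ("high", "low"): {"low": 0.5, "high": 0.5},
        ("high", "high"): {"low": 0.2, "high": 0.8},
    },
}


def joint_probability(assignment):
    """Return P(X_1, ..., X_n) as the product of P(X_i | parents of X_i)."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][parent_values][assignment[node]]
    return prob


def total_probability():
    """Sum the joint probability over every complete assignment."""
    names = list(parents)
    return sum(
        joint_probability(dict(zip(names, combo)))
        for combo in product(("low", "high"), repeat=len(names))
    )
```

Summing `joint_probability` over all assignments, as `total_probability` does, is a quick sanity check that the CPtables define a proper distribution.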

Bayesian networks can be built by an expert or can be learnt from data. The learning of a Bayesian network can be divided into two procedures: one responsible for learning the network structure and the other responsible for learning the conditional probability tables for that structure. The learning of these tables can be carried out using empirical conditional frequencies from data (Cheng et al., 2002). When building a Bayesian network based on subject specialist knowledge, the major problem is the definition of the conditional probability distributions. This is due to the human tendency to miscalculate probabilities (Tversky and Kahneman, 1974). To avoid this difficulty, it is possible to use expert knowledge to build only the Bayesian network structure and then use learning algorithms to induce $\Theta$ from data.

2.1. Markov blanket

In a Bayesian network structure, with $\lambda_{X_i}$ as the set of children of node $X_i$ and $\pi_{X_i}$ as the set of parents of node $X_i$, the subset of nodes containing $\pi_{X_i}$, $\lambda_{X_i}$ and the parents of $\lambda_{X_i}$ is called the Markov blanket of $X_i$, as shown in Fig. 1. As stated in Pearl (1988), in a Bayesian network the only nodes that have influence on the conditional probability distribution of a given node $X_i$ are the nodes that belong to the Markov blanket of $X_i$. Thus, after learning a Bayesian network classifier from data, the Markov blanket of the node that represents the class can be used as a feature subset selection method, in order to identify, from all the nodes that define the network, those that influence the class node.
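The definition above translates directly into a few set operations. The sketch below is only illustrative: the graph encoding (a dictionary mapping each node to the set of its parents) and the node names are assumptions made for the example.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node`.

    `parents` maps every node of a directed acyclic graph to the set of its
    parents. The blanket is the union of the node's parents, its children and
    the other parents of those children.
    """
    children = {child for child, ps in parents.items() if node in ps}
    co_parents = {p for child in children for p in parents[child]}
    blanket = set(parents.get(node, set())) | children | co_parents
    blanket.discard(node)  # the node itself is not part of its blanket
    return blanket


# Hypothetical structure: A -> X -> C <- B, C -> D, A -> E.
dag = {
    "A": set(),
    "B": set(),
    "X": {"A"},
    "C": {"X", "B"},
    "D": {"C"},
    "E": {"A"},
}
```

For this hypothetical graph, `markov_blanket(dag, "X")` keeps the parent `A`, the child `C` and the co-parent `B`, and leaves out `D` and `E`, which is how the blanket of a class node acts as a feature subset selector.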

As previously mentioned, Bayesian networks can also be used as classifiers. A Bayesian network, however, is not designed to optimize the conditional likelihood of the class given the other features (Domingos and Pazzani, 1997). Consequently, Bayesian networks may not produce good classification results. In fact, even the naïve Bayes classifier can outperform more complex Bayesian network classifiers in some domains (Friedman et al., 1997).

A naïve Bayes classifier is a Bayesian network with a fixed structure, in which the class node has no parents and each feature has the class node as its unique parent. Since naïve Bayes classifiers have their structure predefined, only the numerical parameters need to be learnt; thus only information about the features and their corresponding values is needed to estimate probabilities. The computational time complexity of learning a naïve Bayes classifier is linear with respect to the number of training instances. The construction is also space efficient, requiring only the information provided by two-dimensional tables (CPtables), in which each entry corresponds to a probability estimated for a given value of a particular feature. However, the naïve Bayes classifier makes a strong and unrealistic assumption: all the features are conditionally independent given the value of the class.

2.2. Classification rules

The knowledge represented by a Bayesian classifier is not as comprehensible as some other forms of knowledge representation, for instance classification rules. In the literature there are a few works that aim at improving the readability and understandability of Bayesian classifiers; for instance, Možina et al. (2004) implement a visualization of a naïve Bayes model in the form of a nomogram. In Hruschka et al. (2008), after the Bayesian classifier is induced, the BayesRule method improves understandability by translating the classifier into a set of probabilistically qualified if–then rules of the form

If condition then class with certainty F, (2)

where the condition is called the antecedent and F is a percentage value.

In the BayesRule method, the a posteriori probability for the rules is evaluated as follows. Let $v_1, v_2, \ldots, v_n, c$ be the sets of categorical variable values for $X_1, X_2, \ldots, X_n$ and $C$, respectively. Also, let $v_i = \{v_{i1}, \ldots, v_{ij_i}\}$, that is, $|v_i| = j_i$, $i = 1, \ldots, n$, and $c = \{c_1, \ldots, c_j\}$, that is, $|c| = j$.

By using the BayesRule method, the number of variables involved in the condition part of a rule is reduced, since the method only considers the Markov blanket of the class variable $C$. Considering a particular situation where the Markov blanket of the class variable $C$ is the set $\{X_1, \ldots, X_k\}$, the a posteriori probability of class $C = c_\ell \in \{c_1, \ldots, c_j\}$ given the values of the variables in the Markov blanket of class $C$ for a particular instantiation of indexes $J_i$, $i = 1, \ldots, k$, is

$$P(C = c_\ell \mid v_{1,J_1}, \ldots, v_{k,J_k}) = \arg\max_{J \in \{1, \ldots, j\}} \{P(C = c_J \mid v_{1,J_1}, \ldots, v_{k,J_k})\}, \qquad (3)$$

with

$$P(C = c_J \mid v_{1,J_1}, \ldots, v_{k,J_k}) = P(C = c_J)\, \frac{P(v_{1,J_1}, \ldots, v_{k,J_k} \mid C = c_J)}{P(v_{1,J_1}, \ldots, v_{k,J_k})}. \qquad (4)$$

Fig. 1. A network structure and the Markov blanket of node $X_i$ represented by shadowed nodes.

For the naïve Bayes network model, the features are assumed to be independent given the class $C = c_J$, and (4) becomes

$$P(C = c_J \mid v_{1,J_1}, \ldots, v_{k,J_k}) \propto P(C = c_J) \prod_{i=1}^{k} P(v_{i,J_i} \mid C = c_J). \qquad (5)$$
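Expression (5) gives the posterior only up to a normalizing constant. The sketch below shows one way the constant could be recovered by normalizing over the classes, assuming categorical features and plain frequency estimates without any smoothing (so unseen values receive probability zero). The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Estimate P(C) and P(v_i | C) by frequency counting (one pass over the data)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihoods = {
        key: {v: n / class_counts[key[1]] for v, n in counts.items()}
        for key, counts in value_counts.items()
    }
    return priors, likelihoods


def posterior(priors, likelihoods, row):
    """Normalized form of (5): P(C = c | v_1, ..., v_k) for every class c."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(row):
            score *= likelihoods.get((i, c), {}).get(value, 0.0)
        scores[c] = score
    total = sum(scores.values())
    return {c: (s / total if total > 0 else 0.0) for c, s in scores.items()}


# Toy data: (density, coverage) -> competitiveness class, values made up.
rows = [("high", "high"), ("high", "low"), ("low", "high"),
        ("low", "low"), ("low", "low"), ("high", "low")]
labels = ["high", "high", "high", "low", "low", "low"]
priors, likelihoods = fit_naive_bayes(rows, labels)
```

The single pass in `fit_naive_bayes` reflects the linear learning cost mentioned earlier; a practical implementation would typically add smoothing for value and class combinations that never occur in the training data.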

A categorical probabilistic if–then rule has the form

$$R_r:\ \text{If } X_1 \text{ is } v_{1,J_1} \text{ and } \ldots \text{ and } X_k \text{ is } v_{k,J_k} \text{ then } C \text{ is } c_J \text{ with certainty } F \text{ given by (3)}, \qquad (6)$$

where index $r$ is used for referencing the rules given by the BayesRule method.
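Under the naïve Bayes assumption, rules of the form (6) could be enumerated as sketched below: one rule per combination of feature values, with the most probable class as consequent and its normalized posterior as the certainty F. This is an illustrative sketch, not the BayesRule implementation of Hruschka et al. (2008); the parameter values, the feature names and the `threshold` argument (which mimics the probability-based pruning mentioned in the Introduction) are assumptions.

```python
from itertools import product

# Hypothetical naive Bayes parameters for two features and a two-valued class.
priors = {"low": 0.5, "high": 0.5}
likelihoods = {  # likelihoods[feature][class][value] = P(value | class)
    "density": {"low": {"low": 0.8, "high": 0.2},
                "high": {"low": 0.3, "high": 0.7}},
    "coverage": {"low": {"low": 0.6, "high": 0.4},
                 "high": {"low": 0.1, "high": 0.9}},
}


def extract_rules(priors, likelihoods, threshold=0.0):
    """Build one if-then rule per value combination; keep rules with certainty > threshold."""
    features = list(likelihoods)
    # Value domain of each feature, read from the table of an arbitrary class.
    domains = [sorted(next(iter(likelihoods[f].values()))) for f in features]
    rules = []
    for values in product(*domains):
        scores = dict(priors)
        for f, v in zip(features, values):
            for c in scores:
                scores[c] *= likelihoods[f][c][v]
        total = sum(scores.values())
        if total == 0:
            continue  # combination impossible under every class
        best = max(scores, key=scores.get)
        certainty = scores[best] / total
        if certainty > threshold:
            antecedent = " and ".join(f"{f} is {v}" for f, v in zip(features, values))
            rules.append(f"If {antecedent} then class is {best} with certainty {certainty:.0%}")
    return rules
```

Because the certainty is already computed while the rule is built, pruning by `threshold` adds no further inference cost, which is the motivation given for the pruning strategy.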

The confidence of a rule can be defined using inferential results. In doing so, the probability given to the inferred class may be used as a confidence value, which is embedded in the inference algorithm. Among the many methods for data understanding, the BayesRule method focuses on translating a Bayesian classifier into a set of classification rules in their simplest, propositional form, as a way of promoting the understandability of the corresponding Bayesian network classifier. Reasoning with logical rules is more acceptable to users than the recommendations given by black-box systems. Moreover, reasoning with rules is comprehensible, provides explanations and can be validated by human inspection.

3. Bayesian network inference modeling

To present the collaborative inference system for the risk of weed infestation in a corn-crop, this section is organized into two parts. The first describes the procedure for collecting and preparing the data, and the second describes how the data were used to model two Bayesian network classifiers (the naïve Bayes and the expert-based) for inferring the risk of a weed infestation in a corn-crop. Fig. 2 presents a schematic diagram of the proposed collaborative inference system.

3.1. Collecting and preparing the data

In the experiments described in this paper, data from a corn-crop field located in an experimental farm of the Empresa Brasileira de Pesquisa Agropecuária (Embrapa), in Sete Lagoas, Minas Gerais, Brazil, were used.¹ A field of 49 ha was tilled on 16–20 November 2004 and again on 15–19 May 2006. The area contains 41 experimental field parcels 100 m distant from each other. The parcels are rectangular, measuring 4 m (east–west direction) by 3 m (north–south direction), with five corn rows separated from each other by 0.7 m, starting at 0.1 m from the bottom edge. Before the crop development, the herbicide glyphosate at 2.4 kg active ingredient (a.i.) ha⁻¹ was applied outside the parcels. Also, after the crop development, the herbicides nicosulfuron at 0.04 kg (a.i.) ha⁻¹ and atrazine at 1 kg (a.i.) ha⁻¹ were applied all over the field, except on the parcels. The samples per parcel were obtained in April 2005 and October 2006 for two different corn-crops, except for the yield loss, which was evaluated in June 2005 and November 2006.

To obtain the weed density data, that is, the number of weeds per m² in each parcel, and the biomass of the species, four squares measuring 0.5 m × 0.5 m were randomly placed within each parcel, and the narrow-leaved and broad-leaved weed species were collected and counted. Then, the weed species were separated into bags and kept in a greenhouse at a temperature of 105 °C until their mass became constant. At this point, the biomass of the species, defined as the amount of dry material per m² of the aerial part of the weeds, was measured. The weed density and the biomass samples were collected in each experimental parcel. Therefore, 82 data instances were obtained, that is, two data instances for each of the 41 parcels. On analysis of the collected data, 11 data instances were identified as outliers and removed.

To obtain the weed seed production per m², the weed seeds of one weed from each species were counted and multiplied by the number of weeds found in the squares. The weed coverage data were estimated by visual observation of the percentage of surface infested by weeds. This coverage is mainly due to the germination of weed seeds from the previous weed population. The weed seed density, associated with the seed production, and the weed coverage samples were collected, as described above, from each experimental field parcel, also in two different corn-crops, resulting in 82 data instances.

[Figure 2: block diagram with two stages. Stage "Inferring the competitiveness of weed-crop": inputs broad-leaved weed density, narrow-leaved weed density and total weed density; block "Bayesian network classifier". Stage "Inferring the risk of infestation": inputs weed seed density and weed coverage; blocks "Geostatistics, image analysis" and "Bayesian network classifier"; signals x1, x2, x3, x4; output "Risk".]

Fig. 2. Input–output of the proposed collaborative classification system.

¹ Embrapa, Project 55.2004.509.00: Rede de Conhecimento em Agricultura de Precisão para Condições do Cerrado e dos Campos Gerais.

To evaluate the yield loss per experimental parcel, the yield was first measured as the mass of the corn grains. The mass of the corn grains was adjusted to a humidity of 13% and converted from g m⁻² to kg ha⁻¹. Then, the yield loss was evaluated as

$$P_{RR_i} = \frac{Y_0 - Y_i}{Y_0}, \quad i = 1, \ldots, 41, \qquad (7)$$

where $Y_0$ denotes the maximum yield found in the witness parcels (with herbicide application) and $Y_i$ the yield of each experimental parcel (without herbicide application).
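A minimal sketch of the two computations just described, assuming that (7) is the relative loss $(Y_0 - Y_i)/Y_0$ with respect to the best witness parcel. The function names and the example yields are hypothetical.

```python
def g_per_m2_to_kg_per_ha(value):
    """Convert a mass per area from g m^-2 to kg ha^-1.

    1 ha = 10,000 m^2 and 1 kg = 1,000 g, so the factor is 10.
    """
    return value * 10_000 / 1_000


def yield_loss(y0, yields):
    """Relative yield loss (Y0 - Yi) / Y0 for every experimental parcel."""
    if y0 <= 0:
        raise ValueError("the reference yield Y0 must be positive")
    return [(y0 - y) / y0 for y in yields]
```

For example, a parcel yielding 6000 kg ha⁻¹ against a reference of 8000 kg ha⁻¹ has a relative loss of 0.25.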

3.2. Inferring the risk of infestation

The proposed collaborative inference system is based on inputs given by the discretized values of infestation features for the weed coverage, weed seed production density, weed seed patches and the weed–crop competitiveness, denoted as $x_1, x_2, x_3, x_4$, respectively. As already mentioned, the feature values for the weed–crop competitiveness are inferred by the first network classifier of the collaborative system. The following subsections describe how the weed seed production and weed coverage maps were estimated with kriging and subsequently treated as images so as to obtain the other features.

3.2.1. Kriging and maps

Interpolation methods have been used in precision farming to infer values at non-sampled locations. As already mentioned, the estimation method used was the geostatistics method called kriging, an interpolation approach that provides optimal estimates of regionalized variables with minimum variance and without bias, using a theoretical variogram (Isaaks and Srivastana, 1989; Shiratsuchi, 2001). A variogram, also referred to as a semivariogram, shows the degree of spatial dependence among the samples and is generally a monotonically increasing function that reaches a plateau. The distance at which the variogram reaches the plateau is called the range. The frequently used models for theoretical variograms are described in detail in Isaaks and Srivastana (1989).

The parameters of the theoretical variogram used in an interpolation problem are selected from an experimental variogram. The experimental variogram $\gamma^*(h)$ is given by the following equation:

$$\gamma^*(h) = \frac{1}{2N_h} \sum_{(i,j)\,:\,|h_{ij}| = h} [Z(j) - Z(i)]^2, \qquad (8)$$

where $N_h$ is the number of pairs of data whose locations are separated by $h$, $i$ and $j$ represent locations $i$ and $j$, respectively, $Z(i)$ is the value of the variable $Z$ at location $i$, $|h_{ij}|$ is the Euclidean norm of the vector $h_{ij}$ and $h_{ij}$ is the vector from location $i$ to location $j$.
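The experimental variogram in (8) can be sketched as follows. Since sampled pairs rarely share exactly the same separation, the sketch groups pairs into distance classes of width `lag`; this binning, the function name and the example coordinates are assumptions and do not describe the software used for the paper.

```python
import math
from collections import defaultdict


def experimental_variogram(points, values, lag, n_lags):
    """Return {class index: (mean distance, semivariance, number of pairs)}.

    `points` is a list of (x, y) coordinates and `values` the measurements Z
    at those locations. A pair separated by distance h falls into class
    int(h // lag); pairs beyond `n_lags` classes are ignored.
    """
    squared_diffs = defaultdict(float)
    distances = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            k = int(h // lag)
            if k >= n_lags:
                continue
            squared_diffs[k] += (values[j] - values[i]) ** 2
            distances[k] += h
            counts[k] += 1
    return {
        k: (distances[k] / counts[k], squared_diffs[k] / (2 * counts[k]), counts[k])
        for k in sorted(counts)
    }


# Three hypothetical sampling locations on a line, 100 m apart.
example = experimental_variogram([(0, 0), (100, 0), (200, 0)], [1.0, 3.0, 2.0],
                                 lag=100, n_lags=3)
```

Each entry of the result corresponds to one point of an experimental variogram plot: a representative distance, the semivariance and the number of pairs behind it.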

Aiming at finding the most suitable model, the collected samples were used with the exponential, Gaussian and spherical variogram models. The exponential model was chosen based on the criteria suggested in Iwashita and Landim (2003), since it provided the smallest fit index (FI) for the sample set, as shown in Table 1 for data collected in 2005 and 2006. The exponential variogram model is given by

$$\gamma(h) = \begin{cases} C_0 + C_1\left(1 - e^{-h/a}\right), & 0 < h \le a, \\ C_0 + C_1, & h > a, \end{cases} \qquad (9)$$

with $C_0$ the nugget effect, $C_1$ the variance of variable $Z$, $C_0 + C_1$ the sill, and $a$ the range. The fit index over the pairs of data whose locations are separated by all $N$ vectors $h$, named $h_k$, $k = 1, \ldots, N$, is defined as

$$FI = \frac{1}{N} \sum_{k=1}^{N} \frac{\sqrt{\left(\gamma^*(h_k) - \gamma(h_k)\right)^2}}{\gamma^*(h_k)}, \qquad (10)$$

where $\gamma^*(h_k)$ and $\gamma(h_k)$ are the experimental and theoretical variogram values at $h_k$.
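The model selection step can be illustrated with the sketch below, which evaluates the exponential model (9) and the fit index (10). It assumes that each deviation is normalized by the experimental value; the function names and argument order are hypothetical.

```python
import math


def exponential_variogram(h, nugget, sill, a):
    """Model (9): C0 + C1 * (1 - exp(-h / a)) for 0 < h <= a, and the sill for h > a."""
    if h <= 0:
        raise ValueError("the model is defined for h > 0")
    if h > a:
        return sill
    c1 = sill - nugget  # C1, since the sill is C0 + C1
    return nugget + c1 * (1.0 - math.exp(-h / a))


def fit_index(lags, experimental, model):
    """Index (10): mean absolute deviation relative to the experimental value.

    `lags` holds the distances h_k, `experimental` the experimental variogram
    values at those distances and `model` is a function of h.
    """
    terms = [
        math.sqrt((g_exp - model(h)) ** 2) / g_exp
        for h, g_exp in zip(lags, experimental)
    ]
    return sum(terms) / len(terms)
```

Computing `fit_index` for each candidate model on the same experimental points and keeping the smallest value mirrors the selection criterion stated above.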

The fitted variograms for the weed coverage and weed seed production for the 2005 and 2006 data are shown in Figs. 3 and 4, respectively. Each point of the variogram represents pairs of data equally far apart and is described in Table 2 for the 2005 and 2006 data. As the spatial dependence defined by the variograms exists up to 200 m, the weed data for non-sampled locations were estimated with kriging, based on the collected data, to obtain the spatial representation of the weeds.

The interpolation grid was selected as 20 m × 20 m, which is within the variogram range. The corn-crop field of 49 ha (700 m × 700 m) was then divided into 35 cells per axis, each sized 20 m × 20 m. Estimated maps for the weed coverage at the current life-cycle and the weed seed production at the subsequent life-cycle were thus generated. The maps are shown in Figs. 5 and 6 for the 2005 and 2006 data, respectively.

The estimation quality with kriging was evaluated by cross validation (Isaaks and Srivastana, 1989). Three characteristics of the residuals were analyzed, namely a mean close to zero, constant variance and a normal probability distribution, and they indicated a good estimate. Table 3 shows the results of the cross validation for the 2005 and 2006 data. As the interval estimates of the residual means contain zero, the null hypothesis of the mean being close to zero is not rejected. The variances are considered constant, with $\bar{R}$ the residual size, and the Anderson–Darling test is used to check the normality of the residual distribution at the 95% confidence level. As the p-values for the residuals are larger than 0.05, the hypothesis that the residuals have a normal distribution is not rejected.

3.2.2. Map objects and features

Weeds have a tendency to aggregate in clusters. This tendency explains why certain regions of a field are free of weeds. Due to the spatial variability of weeds in agricultural fields, it is possible to detect clusters from maps. Let $R(u, v)$ represent the entire map region, with $(u, v)$ the spatial coordinates of the intensities in the map. The clusters detected in $R(u, v)$ associated with the weed maps provide three features to infer the weed infestation risk. Assuming the features have three categorical conditions, the clusters in $R(u, v)$ are described by connected objects obtained as follows.

First, to form a map $I(u, v)$ with coded intensities, the intensities $f(u, v)$ of $R(u, v)$ are quantized into three levels $L_1, L_2, L_3$, associated with equally spaced ranges of $f(u, v)$, by an encoder $Q$ as follows:

$$I(u, v) = Q(f(u, v)) = t, \qquad (11)$$

where

$$t = \begin{cases} 1 & \text{if } f(u, v) \le L_1, \\ 2 & \text{if } L_1 < f(u, v) \le L_2, \\ 3 & \text{if } L_2 < f(u, v) \le L_3. \end{cases}$$
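The encoder in (11) and the subsequent search for connected objects can be sketched as below. Two assumptions are made for illustration only: the levels $L_1$, $L_2$, $L_3$ are taken as the upper bounds of three equal-width ranges between the minimum and maximum map intensity, and cells are considered connected when they share an edge (4-connectivity). The function names and the example grids are hypothetical.

```python
from collections import deque


def quantize(grid):
    """Encode map intensities into codes 1, 2, 3 using three equal-width ranges."""
    flat = [value for row in grid for value in row]
    low, high = min(flat), max(flat)
    step = (high - low) / 3
    l1, l2 = low + step, low + 2 * step  # L3 coincides with the maximum

    def encode(value):
        if value <= l1:
            return 1
        if value <= l2:
            return 2
        return 3

    return [[encode(value) for value in row] for row in grid]


def count_objects(coded, level):
    """Count groups of edge-connected cells whose code equals `level`."""
    n_rows, n_cols = len(coded), len(coded[0])
    seen = set()
    objects = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if coded[r][c] != level or (r, c) in seen:
                continue
            objects += 1  # a new, not yet visited object starts here
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                    if inside and coded[ny][nx] == level and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return objects
```

Applied to a kriged map, `quantize` would produce the coded map $I(u, v)$, and `count_objects` gives one simple descriptor of how many separate patches reach a given level.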

Table 1
Fit index of theoretical variogram models.

Model          Weed seed          Weed coverage
               2005     2006      2005     2006
Exponential    0.12     0.05      0.11     0.06
Gaussian       0.25     0.16      0.22     0.16
Spherical      0.16     0.08      0.15     0.08

Table 2
Results for the variograms.

Variable                       Pairs separated by h_k    Distance h_k (m)      γ*(h_k)
                               2005       2006           2005      2006        2005           2006
Weed seed production per m²    14.50      14.50          101.57    101.52      6.19 × 10^6    2.06 × 10^6
                               80.50      98             141.64    139.22      6.79 × 10^6    2.77 × 10^6
                               129        157            245.59    245.06      5.78 × 10^6    2.34 × 10^6
                               116.50     146.50         344.63    344.73      6.47 × 10^6    2.58 × 10^6
                               90         16             445.61    439.74      7.74 × 10^6    2.56 × 10^6
                               54         98.50          544.41    543.19      6.08 × 10^6    2.57 × 10^6
Weed coverage in %             1          16.50          95        101.57      0.045          0.048
                               113        106            126.63    139.47      0.042          0.063
                               142.50     172.50         223.81    245.13      0.047          0.057
                               210        161            330.39    344.85      0.044          0.063
                               142.50     130.50         435.92    440.20      0.054          0.071
                               98         112.50         528.15    543.11      0.061          0.063
                               59.50      –              619.44    –           0.062          –
                               4          –              690.96    –           0.048          –

[Figure: two panels plotting γ*(h) and γ(h) against distance (m).]

Fig. 3. Theoretical variograms for the 2005 data obtained with an exponential model (solid line) and the corresponding experimental variogram (points) for (a) the weed coverage with $C_0 = 0.038$, $C_0 + C_1 = 0.05$ and for (b) the weed seed production with $C_0 = 5.09 \times 10^6$, $C_0 + C_1 = 6.50 \times 10^6$.

[Figure: two panels plotting γ*(h) and γ(h) against distance (m).]

Fig. 4. Theoretical variograms for the 2006 data obtained with an exponential model (solid line) and the corresponding experimental variogram (points) for (a) the weed coverage with $C_0 = 0.048$, $C_0 + C_1 = 0.063$ and for (b) the weed seed production with $C_0 = 2.0 \times 10^6$, $C_0 + C_1 = 2.60 \times 10^6$.


[Figure: two gray-scale maps, panels (a) and (b).]

Fig. 5. Maps estimated with kriging for data collected in 2005 associated with (a) the weed coverage map at the current life-cycle and (b) the weed seed production map at the subsequent life-cycle. The upper right corner of both maps represents the irregular contour of the corn-crop field. The gray scale in (a) represents percentage and in (b) represents the number of seeds per m^2.

[Figure: two gray-scale maps, panels (a) and (b).]

Fig. 6. Maps estimated with kriging for data collected in 2006 associated with (a) the weed coverage map at the current life-cycle and (b) the weed seed production map at the subsequent life-cycle. The upper right corner of both maps represents the irregular contour of the corn-crop field. The gray scale in (a) represents percentage and in (b) represents the number of seeds per m^2.

Table 3
Cross validation for kriging estimation for 2005 and 2006 data.

                        Residual mean     Mean interval     Constant variance     Anderson–Darling test
Weed coverage
  2005                  0.50 × 10^-2      [-0.07, 0.06]     $\bar{R} = 0.25$      p-value = 0.61
  2006                  0.11 × 10^-1      [-0.10, 0.07]     $\bar{R} = 0.36$      p-value = 0.19
Weed seed production
  2005                  0.40 × 10^-2      [-0.17, 0.18]     $\bar{R} = 0.26$      p-value = 0.45
  2006                  0.82 × 10^-1      [-0.68, 0.51]     $\bar{R} = 0.21$      p-value = 0.16


The pixels in $I(u,v)$ may represent the same intensity range but may belong to different clusters within the image. Connected objects are thus obtained by image analysis using a 4-connected model (Gonzalez and Woods, 2002). In this model, two pixels that are four-neighbors of each other are connected if they have the same value. The four-neighbors of pixel $p$ at coordinates $(u,v)$ are given by

$$(u+1,v),\ (u-1,v),\ (u,v+1),\ (u,v-1). \tag{12}$$

The four-connected model is implemented by generating a binary matrix called $I_b(u,v)$ as

$$\text{If } I(u,v) \neq 0 \text{ then } I_b(u,v) = 1. \tag{13}$$

Finally, the connected objects are organized in a matrix called $T(u,v)$. In a gray scale, Fig. 7 shows the image of the labels given to the nine connected objects identified in both the maps of the weed coverage and weed seed production for the 2005 data and Fig. 8 shows the same images for the 2006 data.
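A minimal sketch of 4-connected labelling is given below for illustration. It uses a breadth-first flood fill over the four-neighbors of Eq. (12); this is one common way to implement the model and is not claimed to be the procedure used by the authors. The function name `label_objects` and the small coded map are hypothetical.

```python
from collections import deque


def label_objects(coded):
    """Label 4-connected objects of equal code; returns (labels, object_codes)."""
    rows, cols = len(coded), len(coded[0])
    labels = [[0] * cols for _ in range(rows)]
    object_codes = []  # object_codes[k - 1] is the code t of object k
    for u in range(rows):
        for v in range(cols):
            if coded[u][v] == 0 or labels[u][v]:
                continue  # background (I_b = 0) or pixel already labelled
            object_codes.append(coded[u][v])
            current = len(object_codes)
            labels[u][v] = current
            queue = deque([(u, v)])
            while queue:
                a, b = queue.popleft()
                # Visit the four-neighbors listed in Eq. (12).
                for c, d in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= c < rows and 0 <= d < cols
                            and not labels[c][d]
                            and coded[c][d] == coded[u][v]):
                        labels[c][d] = current
                        queue.append((c, d))
    return labels, object_codes


# Hypothetical coded map with two intensity codes.
coded = [[1, 1, 2],
         [2, 1, 2],
         [2, 2, 1]]
labels, object_codes = label_objects(coded)
print(labels)
print(object_codes)
```

In this toy map the two pixels with code 1 in the top rows and the isolated code-1 pixel in the bottom right corner share an intensity range but end up in different objects, which is exactly the distinction the connected-object step is meant to capture.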

Using the connected objects defined above, features for the infestation were selected (Bressan et al., 2008). The features were evaluated per region, with regions of size not exceeding the spatial dependence of the data sets. Let $\mathcal{R}_i$, $i = 1, \ldots, N_{\mathcal{R}}$, denote subregion $i$ of $\mathcal{R}$, $p_i^t$ the number of connected objects in $\mathcal{R}_i$ such that $T(u,v) = t$, and $k_i^t$ the number of pixels with intensities equal to $t$ in $\mathcal{R}_i$. The features were established as follows:

$x_1$: Feature for the weed coverage per region. Indicates the percentage of surface infested by emergent weeds in each region. In each region $\mathcal{R}_i$ it is obtained from the weighted intensities $T(u,v)$, as follows:

$$x_1(i) = \frac{1}{\text{number of elements of } \mathcal{R}_i}\,\frac{\sum_{(u,v)\in\mathcal{R}_i} T(u,v)}{\sum_{t=1}^{3} t}. \tag{14}$$

$x_2$: Feature for the weed seed production per region. Characterizes the locations of seeds which can germinate in each region and is associated with the weed seed production. It is obtained in the same way as feature $x_1$.

$x_3$: Feature for the weed seed patches per region. Represents how the seeds contribute to weed proliferation in the surroundings of each region. The worst case of a patch distribution is one patch covering all the cells of the image, representing 100% of occupation. If a value of weed seed occupies a large part of a region, that is, several pixels contain this value, then the object formed by this value has a high influence on the weed seed patch calculation. In each region $\mathcal{R}_i$, it is obtained as the average of the weighted intensities $T(u,v)$ of the connected objects as follows:

$$x_3(i) = \frac{1}{\text{number of elements of } \mathcal{R}_i}\,\frac{\sum_{t \mid p_i^t \neq 0} t\,k_i^t / p_i^t}{\sum_{t=1}^{3} t}. \tag{15}$$

$x_4$: Feature for the competitiveness. Reflects the high level of competitiveness of certain species of weeds and their proliferation and is based on the weed biomass. The higher the weed biomass, the higher the competitiveness. It is obtained as the output of the first Bayesian network as a categorical variable for each region using the resulting rule set.

Fig. 7. Maps of connected objects representing (a) the matrix $T(u,v)$ for the weed coverage and (b) the weed seed production (2005 data).

Fig. 8. Maps of connected objects representing (a) the matrix $T(u,v)$ for the weed coverage and (b) the weed seed production (2006 data).

An example of the evaluation of the features $x_2$ and $x_3$ in region $\mathcal{R}_{25}$ for 2006 data is presented in Table 4.
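The Table 4 example can be reproduced with a short sketch. It applies Eq. (14) (the weighted mean intensity, also used for $x_2$) and Eq. (15) (the patch feature $x_3$) to the 5 × 5 matrix $T$ reported in Table 4. The function name `region_features` is hypothetical, and the flood fill used to count connected objects is an illustrative choice rather than the authors' implementation.

```python
def region_features(T, levels=3):
    """Return (weighted intensity, patch feature, k, p) for one region.

    k[t] is the number of pixels with intensity t and p[t] the number of
    4-connected objects with intensity t, as defined for Eqs. (14) and (15).
    """
    rows, cols = len(T), len(T[0])
    n = rows * cols
    weight_sum = sum(range(1, levels + 1))  # sum_{t=1}^{3} t = 6

    k, p = {}, {}
    seen = [[False] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            t = T[u][v]
            k[t] = k.get(t, 0) + 1
            if seen[u][v]:
                continue
            p[t] = p.get(t, 0) + 1  # a new connected object starts here
            seen[u][v] = True
            stack = [(u, v)]
            while stack:  # flood fill over the four-neighbors
                a, b = stack.pop()
                for c, d in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= c < rows and 0 <= d < cols
                            and not seen[c][d] and T[c][d] == t):
                        seen[c][d] = True
                        stack.append((c, d))

    x_weighted = sum(t * count for t, count in k.items()) / (n * weight_sum)
    x_patch = sum(t * k[t] / p[t] for t in p) / (n * weight_sum)
    return x_weighted, x_patch, k, p


# Matrix T for region R_25 (2006 data), copied from Table 4.
T_25 = [[1, 1, 1, 2, 2],
        [1, 1, 1, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 1],
        [2, 2, 2, 2, 1]]
x2, x3, k, p = region_features(T_25)
print(round(x2, 4), round(x3, 4))  # 0.28 0.2533, the values listed in Table 4
```

The counts agree with Table 4 ($k^1 = 8$, $p^1 = 2$, $k^2 = 17$, $p^2 = 1$), giving $x_2 = 42/150$ and $x_3 = (8/2 + 2 \cdot 17)/150$.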

Along with features $x_i$, $i = 1, 2, 3$, which had to be discretized in order to build matrices of categorical variables, the matrix of categorical variables representing the feature for the competitiveness was an input to the network that classifies the risk of infestation for each region $\mathcal{R}_i$. The features $x_1$ and $x_4$ were calculated at the current life-cycle and the features $x_2$ and $x_3$ at the subsequent life-cycle of the weed population. The features as well as the yield loss were obtained for both the 2005 and 2006 life-cycle weed–crop data sets. The infestation features were evaluated per region. To that end, the crop was divided into 49 regions of 5 × 5 cells, each one having 100 m × 100 m, not exceeding the data set spatial dependence of 200 m. The variables used to train the networks are the categorical variables obtained by region, normalized in [0, 1]. Therefore, two instances for each of the 49 regions (one per data set year) were considered, resulting in 98 instances.

In order to extract probabilistic rules using the BayesRule method, the values of all the variables had to be discretized. The discretization was conducted by an expert, who proposed the three intervals described in Table 5, represented by categorical variables.
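The discretization step can be sketched as a lookup of the two cut points that Table 5 lists for each node. The cut values below are copied from Table 5; the function name `discretize` and the generic labels "L", "M" and "H" are simplifying assumptions, since Table 5 uses node-specific labels for some variables (for example S, R and G for the weed seed patches).

```python
# (lower cut, upper cut) per node, copied from Table 5.
CUTS = {
    "WCoverage": (0.35, 0.70),
    "WSeed": (0.35, 0.70),
    "WSPatch": (0.40, 0.80),
    "TWeed": (0.20, 0.60),
    "NLWeed": (0.20, 0.60),
    "BLWeed": (0.25, 0.75),
    "WBiomass": (0.20, 0.60),
    "YLoss": (0.15, 0.45),
}


def discretize(node, value):
    """Map a value normalized to [0, 1] onto one of three ordered categories."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("expected a value normalized to [0, 1]")
    low_cut, high_cut = CUTS[node]
    if value <= low_cut:   # closed lower interval [0, low_cut]
        return "L"
    if value < high_cut:   # open middle interval ]low_cut, high_cut[
        return "M"
    return "H"             # closed upper interval [high_cut, 1]


print(discretize("WSeed", 0.28), discretize("WSPatch", 0.2533))
```

Applied to the Table 4 values, both the weed seed feature (0.28) and the weed seed patch feature (0.2533) fall in the lowest interval of their respective nodes.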

3.2.3. Bayesian networks structures

Since the risk can be explained by the yield loss, this was defined as the class variable. Fig. 9 shows the structure of the Bayesian network classifier represented by the parent–children relationships, defined by a subject specialist. The node identified as weed biomass is the class node from which the competitiveness is inferred.

Fig. 10 shows the naïve Bayes classifier structure that represents the same problem, in which the class variable has no parents and all the features are conditionally independent given the class variable. For the purpose of rule comparison, only the node competitiveness from the first collaborative network is included in the learning of the naïve Bayes classifier.

It is evident by inspecting the expert-based Bayesian network classifier depicted in Fig. 9 that the weed coverage, total weed, broad-leaved weed and narrow-leaved weed nodes do not belong to the Markov blanket of the class node defined as the yield loss, that is, the set formed by its parents, its children and the other parents of its children. Therefore, these nodes will not be taken into account by the BayesRule method (Hruschka et al., 2008).
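The Markov blanket computation can be illustrated with a small sketch. The arcs of Fig. 9 are not recoverable from the text, so the edge list below is purely hypothetical: it was chosen only so that the resulting blanket agrees with the statement above and with the antecedents used in the extracted rules (weed seed, competitiveness inferred from the weed biomass, and weed seed patches). The function name `markov_blanket` is likewise an assumption.

```python
def markov_blanket(edges, target):
    """Parents, children and the children's other parents of `target`."""
    parents = {a for a, b in edges if b == target}
    children = {b for a, b in edges if a == target}
    spouses = {a for a, b in edges if b in children and a != target}
    return parents | children | spouses


# Hypothetical (parent, child) arcs; NOT the actual structure of Fig. 9.
EDGES = [
    ("BLWeed", "WBiomass"), ("NLWeed", "WBiomass"), ("TWeed", "WBiomass"),
    ("WCoverage", "WSeed"),
    ("WBiomass", "YLoss"), ("WSeed", "YLoss"), ("WSPatch", "YLoss"),
]
blanket = markov_blanket(EDGES, "YLoss")
print(sorted(blanket))
```

Under these assumed arcs, the weed coverage and the three weed density nodes influence the yield loss only through nodes that are already in the blanket, which is why they can be left out of the rule antecedents.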

Once both Bayesian networks had their structure defined, the next step was to learn the conditional probability distribution associated with their nodes. This was accomplished as part of the BayesRule method, using a free software package called Genie (http://genie.sis.pitt.edu). As mentioned in Section 1, the knowledge represented by a Bayesian classifier is not easily understood by human beings. A way of promoting its understandability is by translating it into a more suitable representation, such as classification rules. As the standard propositional if–then classification rule is the simplest and most comprehensible way to represent a classification procedure, it has been adopted by the BayesRule method (Hruschka et al., 2008), which implements the translation process.

Table 4
Objects and features for $\mathcal{R}_{25}$ with $\mathcal{R}_i \in \mathbb{R}^{5 \times 5}$.

$$T = \begin{bmatrix} 1 & 1 & 1 & 2 & 2 \\ 1 & 1 & 1 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 1 \\ 2 & 2 & 2 & 2 & 1 \end{bmatrix}$$

$k_{25}^1 = 8$, $p_{25}^1 = 2$
$k_{25}^2 = 17$, $p_{25}^2 = 1$
$x_2 = 0.2800$, $x_3 = 0.2533$

Table 5
Discrete intervals for the risk of infestation categorical variables.

Node variables                         Intervals
Weed coverage (WCoverage) (%)          Thin (Th)   [0, 0.35]    Average (A)   ]0.35, 0.70[    Thick (k)   [0.70, 1]
Weed seed (WSeed) (m^2)                Low (L)     [0, 0.35]    Medium (M)    ]0.35, 0.70[    High (H)    [0.70, 1]
Weed seed patches (WSPatch) (m^2)      Small (S)   [0, 0.40]    Regular (R)   ]0.40, 0.80[    Large (G)   [0.80, 1]
Total weed (TWeed) (m^2)               Low (L)     [0, 0.20]    Medium (M)    ]0.20, 0.60[    High (H)    [0.60, 1]
Narrow-leaved weed (NLWeed) (m^2)      Low (L)     [0, 0.20]    Medium (M)    ]0.20, 0.60[    High (H)    [0.60, 1]
Broad-leaved weed (BLWeed) (m^2)       Low (L)     [0, 0.25]    Medium (M)    ]0.25, 0.75[    High (H)    [0.75, 1]
Weed biomass (WBiomass) (m^2)          Low (L)     [0, 0.20]    Medium (M)    ]0.20, 0.60[    High (H)    [0.60, 1]
Yield loss (YLoss) (output) (m^2)      Low (L)     [0, 0.15]    Medium (M)    ]0.15, 0.45[    High (H)    [0.45, 1]

[Figure: network with nodes Weed seed, Weed coverage, Narrow-leaved weed, Broad-leaved weed, Total weed, Weed biomass, Weed seed patches and Yield loss.]

Fig. 9. The expert-based Bayesian network classifier to infer the risk of infestation.

3.2.4. Pruning strategy

As stated in the literature, there are many rule interestingness metrics, such as support, confidence, lift, correlation and collective strength. Such metrics are often used to determine the more relevant rules from a rule set (as in those implemented by pruning strategies, for instance). Many of these measures, however, provide conflicting information about the interestingness of a pattern. Therefore, the best metric to use for a given application domain is hard to define.

It is not claimed that the rule estimated probability given by the Bayesian classifier (BC) is the best measure to be used in a rule set pruning task, since different measures have different intrinsic properties. However, it is a measure that is easy to implement and to understand, and it contributed (as the experiments showed) to the pruning. See Tan et al. (2002) for a more detailed description of the properties of some of the most commonly used rule interestingness measures.

In the experiments described in this paper, focusing on the weed infestation domain, a pruning strategy based on the rule estimated probability given by the BC is proposed to reduce the number of rules in each rule set when the Markov blanket strategy was unable to reduce the number of rules. The pruning strategy is based on a very simple idea and is mainly motivated by the fact that it can be applied without any extra computational effort. Considering the rule set as a list ordered by estimated probability, the pruning can be done by taking into account only the rules having an estimated probability higher than a predefined threshold. When pruning is applied, the number of rules tends to be smaller and the comprehensibility tends to be higher. On the other hand, having fewer rules may imply having a less detailed overview of the problem (with fewer rules and fewer antecedents). Thus, the tradeoff between accuracy and complexity is a very important issue to be analyzed in each specific application domain.
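The thresholding idea can be sketched with the estimated probabilities of the 27 rules listed in Table 13. The probabilities below are copied from that table; the function name `prune` is hypothetical, and keeping rules whose probability is equal to the threshold is an assumption (the text only states that rules below the threshold are removed).

```python
# Estimated probability of each rule, copied from Table 13 (rule number: probability).
TABLE_13_PROB = {
    1: 0.66, 2: 0.50, 3: 0.47, 4: 0.41, 5: 0.72, 6: 0.48, 7: 0.36,
    8: 0.51, 9: 0.77, 10: 0.66, 11: 0.61, 12: 0.93, 13: 0.41, 14: 0.84,
    15: 0.54, 16: 0.36, 17: 0.55, 18: 0.92, 19: 0.50, 20: 0.67, 21: 1.00,
    22: 0.84, 23: 1.00, 24: 0.58, 25: 0.62, 26: 0.50, 27: 1.00,
}


def prune(probabilities, threshold):
    """Return the rule numbers kept after pruning, ordered by probability."""
    kept = [rule for rule, prob in probabilities.items() if prob >= threshold]
    return sorted(kept, key=lambda rule: probabilities[rule], reverse=True)


for threshold in (0.6, 0.7, 0.8):
    kept = prune(TABLE_13_PROB, threshold)
    # One extra rule accounts for the default rule D added after pruning.
    print(threshold, sorted(kept), len(kept) + 1)
```

At a threshold of 0.7 the kept rules are 5, 9, 12, 14, 18, 21, 22, 23 and 27, which are the rules shown in Table 16, and the rule counts including the default rule (15, 10 and 8) agree with the expert-based column of Table 19.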

4. Collaborative classifiers results

Using an expert-based Bayesian classifier and a naïve Bayes classifier, the numerical parameters of the classifiers were obtained using the Genie software. The BayesRule method was then used in conjunction with each classifier in order to extract the corresponding classification rules to infer the infestation risk. In order to do that, the values of all features were discretized in the conditions of each rule as in Table 5, except $x_4$, which was inferred from the weed biomass as a categorical variable, thus having the same intervals as the weed biomass.

The number of rules represents all the combinations of the variables and their categorical values. Each rule has an associated value that represents the probability of its class value, given the values of its antecedent variables. Using a 10-fold cross validation procedure, 10 Bayesian networks were trained using 10 different training sets and the extracted rules were evaluated using each of the 10 corresponding testing sets. The same testing sets were used to evaluate the extracted rules with and without pruning from both the expert-based and naïve networks of the collaborative system. In the pruning for each one of the 10 cross validation sets, the rules with probability below a certain threshold, which were generated from an a priori probability of the class variable obtained from the numerical parameters of the network, were removed and a default class was introduced. The most probable value for the class variable was the categorical value medium (M). The default class, named D, is then defined as this most frequent class.

Considering that a 10-fold cross validation strategy was used in the experiments, only one of the 10 testing sets was chosen to be presented in the paper for each classifier. The remaining fold results are obtained in the same way. In what follows, the results for both networks of the collaborative system used to infer the risk of infestation are presented.
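A generic 10-fold partition of the 98 risk instances can be sketched as follows. The shuffling seed, the near-equal fold sizes and the function name `k_fold_indices` are assumptions made for illustration; the paper does not state how its folds were drawn, and its fold sizes may differ.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, test) index lists for a k-fold cross validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Every fold except the i-th one goes into the training set.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(98))
for train, test in splits:
    # A network would be trained on `train` and its rules evaluated on `test`.
    print(len(train), len(test))
```

Each instance appears in exactly one testing set, so the per-fold results can be pooled into a single agreement figure over all instances.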

4.1. Competitiveness weed–crop classification results

The inference for the competitiveness of weed–crop is performed by the first classification task in the collaborative system. The BayesRule method extracted a set of rules $\{R_1, \ldots, R_r\}$ with $r = 27$ probabilistic rules from each Bayes classifier (three variables, each having three possible values).
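The count of $3^3 = 27$ rules can be checked by enumerating every combination of antecedent values. The ordering of the values (H, L, M) is an assumption, chosen because it reproduces the rule numbers that appear in Table 7 for the expert-based classifier; the names `VARIABLES`, `VALUES` and `antecedents` are hypothetical.

```python
from itertools import product

VARIABLES = ("BLWeed", "NLWeed", "TWeed")
VALUES = ("H", "L", "M")  # assumed ordering, consistent with Table 7

# One antecedent per combination of values; rule r is antecedents[r - 1].
antecedents = [dict(zip(VARIABLES, combo))
               for combo in product(VALUES, repeat=len(VARIABLES))]

print(len(antecedents))
print(antecedents[5])   # rule 6
print(antecedents[25])  # rule 26
```

With this ordering, rule 6 has the antecedent (BLWeed is H, NLWeed is L, TWeed is M) and rule 26 has (BLWeed is M, NLWeed is M, TWeed is L), as in Table 7.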

The evaluation results obtained for one of the 10 testing sets, including the accuracy and the corresponding class probability, are shown in Table 6. For this case, the rules are 50% in agreement with the testing set, since 3 out of 6 data instances were correctly classified. Table 7 shows the pruned Bayesian rule set, which presents rules with probability above a threshold of 70% as well as the default rule D, and Table 8 shows the results of the testing set using the pruned rule set of Table 7. For this testing set, rules 9 and 21 were replaced by the default rule and the pruned rule set was 83.33% in agreement with the testing set, since 5 out of 6 instances were correctly classified. In this particular case, the classification rate has improved. For all the 10 testing set cases, the 71 data instances were tested. The results indicate 63.39% of agreement, since 45 of 71 testing data were correctly classified. By replacing the rules with probability less than 70% by the default rule D, this percentage became 64.79%, since 46 out of 71 testing instances were correctly classified. These results are shown in Table 9. Table 10 shows the results for one testing set when considering the Bayesian rule set extracted from the naïve Bayes classifier, which reveal that the rules are 50% in agreement with the testing set, since 3 out of 6 instances were correctly classified. Table 11 shows the pruned Bayesian rule set and Table 12 shows the results of the testing set using the pruned rule set. For this testing set, rules 3, 13, 25 and 27 were replaced by the default rule and the pruned rule set was 66.66% in agreement with the testing set, since 4 out of 6 data instances were correctly classified. In this particular case, the classification has also improved.

The results of the classification for all testing sets using other thresholds for pruning the rule sets of the expert-based and naïve Bayesian classifiers are also presented in Table 9.

[Figure: network with nodes Yield loss, Weed seed, Weed coverage, Weed seed patches and Competitiveness.]

Fig. 10. The naïve Bayes classifier to infer the risk of infestation.

Table 6
Competitiveness expert-based Bayesian network testing data set results for the rules.

BLWeed   NLWeed   TWeed   WBiomass   R_r     Test        P(Rule R_r | X_{1,J_1}, X_{2,J_2}, X_{3,J_3}) (%)
H        M        M       H          R_9     Incorrect   52
M        H        M       M          R_21    Incorrect   55
H        L        M       M          R_6     Correct     71
H        L        M       M          R_6     Correct     71
H        M        M       M          R_9     Incorrect   52
M        M        L       L          R_26    Correct     80
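The evaluation of a pruned rule set on a testing set can be sketched with the data of Tables 7 and 8. The rule consequents and the six test instances below are copied from those tables; the dictionary layout and the function name `classify` are hypothetical.

```python
# Pruned expert-based rules of Table 7: (BLWeed, NLWeed, TWeed) -> WBiomass.
PRUNED_RULES = {
    ("H", "H", "H"): "H",  # rule 1
    ("H", "L", "H"): "M",  # rule 4
    ("H", "L", "M"): "M",  # rule 6
    ("H", "M", "H"): "H",  # rule 7
    ("L", "H", "L"): "M",  # rule 11
    ("M", "H", "H"): "L",  # rule 19
    ("M", "L", "L"): "L",  # rule 23
    ("M", "L", "M"): "L",  # rule 24
    ("M", "M", "L"): "L",  # rule 26
    ("M", "M", "M"): "M",  # rule 27
}
DEFAULT_CLASS = "M"  # default rule D

# Test instances of Table 8: antecedent values and observed WBiomass.
TEST_SET = [
    (("H", "M", "M"), "H"),
    (("M", "H", "M"), "M"),
    (("H", "L", "M"), "M"),
    (("H", "L", "M"), "M"),
    (("H", "M", "M"), "M"),
    (("M", "M", "L"), "L"),
]


def classify(antecedent):
    """Apply the matching rule, or the default rule D when none matches."""
    return PRUNED_RULES.get(antecedent, DEFAULT_CLASS)


hits = sum(classify(antecedent) == observed for antecedent, observed in TEST_SET)
agreement = 100 * hits / len(TEST_SET)
print(hits, round(agreement, 2))  # 5 83.33
```

Only the first instance is misclassified, because the default rule assigns M where H was observed, which gives the 5 out of 6 agreement reported for this testing set.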

4.2. Risk of infestation classification results

The second classification task was performed in order to generate a set of classification rules to infer the risk of infestation. As in the case of the competitiveness, an expert-based Bayesian network classifier and a naïve Bayes classifier were considered. As before, the numerical parameters were defined using the Genie software. The BayesRule method was then used in conjunction with each classifier in order to extract the corresponding classification rules to infer the infestation risk. The BayesRule method extracted a set of 27 probabilistic rules from the expert-based Bayesian network classifier, as shown in Table 13.

By considering the Bayesian rule set extracted from the expert-based network, Table 14 displays the results obtained for the test set illustrated here. The rules presented 89% of agreement, given that 8 of 9 tested instances were correctly classified. By considering the Bayesian rule set extracted from the naïve Bayes classifier, the results are shown in Table 15.

As before, the rules with probability below 70% were removed and a default class D, also taken as the categorical value M, was used. The pruning strategy was applied after the Markov blanket had reduced the number and the complexity (regarding conditions in their antecedent part) of the classification rules. Thus, the pruning strategy was applied to the rules shown in Table 13 and the reduced set of rules is shown in Table 16.

By considering the pruned Bayesian rule set extracted from the expert-based network, Table 17 displays the results obtained again for the test set illustrated here. The rules presented, as before, 89% of agreement. By considering the pruned Bayesian rule set extracted from the naïve Bayes classifier, the results are shown in Table 18. The rules presented 88.9% of agreement. For all the 10 testing set cases, the results are shown in Table 19. Also, to verify whether the rule sets have a positive impact on the results, the results obtained using the default class D in all 10 testing set cases, whatever the rule antecedent is, and also the results for thresholds of 60%, 70% and 80%, are shown in Table 19.

Table 7
Pruned expert-based Bayesian rule set using the default rule D with a threshold of probability 0.7.

1   If (BLWeed is H) and (NLWeed is H) and (TWeed is H) then WBiomass is H (0.72)
4   If (BLWeed is H) and (NLWeed is L) and (TWeed is H) then WBiomass is M (1.00)
6   If (BLWeed is H) and (NLWeed is L) and (TWeed is M) then WBiomass is M (0.72)
7   If (BLWeed is H) and (NLWeed is M) and (TWeed is H) then WBiomass is H (0.83)
11  If (BLWeed is L) and (NLWeed is H) and (TWeed is L) then WBiomass is M (1.00)
19  If (BLWeed is M) and (NLWeed is H) and (TWeed is H) then WBiomass is L (1.00)
23  If (BLWeed is M) and (NLWeed is L) and (TWeed is L) then WBiomass is L (0.79)
24  If (BLWeed is M) and (NLWeed is L) and (TWeed is M) then WBiomass is L (0.72)
26  If (BLWeed is M) and (NLWeed is M) and (TWeed is L) then WBiomass is L (0.80)
27  If (BLWeed is M) and (NLWeed is M) and (TWeed is M) then WBiomass is M (0.80)
D   Otherwise WBiomass is M (1.00)

Table 8
Pruned expert-based Bayesian network testing data set results using the default rule D with a threshold of probability 0.7.

BLWeed   NLWeed   TWeed   WBiomass   R_r     Test        P(Rule R_r | X_{1,J_1}, X_{2,J_2}, X_{3,J_3}) (%)
H        M        M       H          R_D     Incorrect   100
M        H        M       M          R_D     Correct     100
H        L        M       M          R_6     Correct     72
H        L        M       M          R_6     Correct     72
H        M        M       M          R_D     Correct     100
M        M        L       L          R_26    Correct     80

Table 9
Competitiveness classification results with expert-based and naïve Bayesian networks for all 10 folds.

                             Expert-based                       Naïve
                             Accuracy (%)   Number of rules     Accuracy (%)   Number of rules
Markov blanket rules set     63.39          27                  57.75          27
Pruned rules set
  Threshold = 60%            64.79          12                  60.56          15
  Threshold = 70%            64.79          11                  61.97          9
  Threshold = 80%            60.65          7                   60.56          3
  Threshold = none           60.00          1                   60.50          1

Table 10
Naïve Bayes testing data set results for the rules.

BLWeed   NLWeed   TWeed   WBiomass   R_r     Test        P(Rule R_r | X_{1,J_1}, X_{2,J_2}, X_{3,J_3}) (%)
H        M        H       H          R_3     Incorrect   62
L        H        L       M          R_13    Correct     52
M        M        M       M          R_27    Correct     48
M        H        M       M          R_25    Incorrect   50
M        M        M       M          R_27    Correct     48
H        L        M       L          R_20    Incorrect   91

Table 11
Pruned naïve Bayes rule set with a threshold of probability 0.7.

2   If (BLWeed is H) and (NLWeed is L) and (TWeed is H) then WBiomass is M (0.80)
5   If (BLWeed is L) and (NLWeed is L) and (TWeed is H) then WBiomass is M (0.73)
18  If (BLWeed is M) and (NLWeed is M) and (TWeed is L) then WBiomass is L (0.71)
19  If (BLWeed is H) and (NLWeed is H) and (TWeed is M) then WBiomass is M (0.76)
20  If (BLWeed is H) and (NLWeed is L) and (TWeed is M) then WBiomass is M (0.91)
21  If (BLWeed is H) and (NLWeed is M) and (TWeed is M) then WBiomass is M (0.78)
23  If (BLWeed is L) and (NLWeed is L) and (TWeed is M) then WBiomass is M (0.85)
D   Otherwise WBiomass is M (1.00)

5. Conclusions

This work explores Bayesian network based methods to infer the risk of weed infestation in a corn-crop. The proposed inference system is implemented as a collaboration between two classification tasks. The first one infers the competitiveness (expressed by the biomass) of weeds and the second infers the risk of infestation (expressed by the yield loss), using as input the inferred competitiveness, the weed seed density, weed coverage and weed seed patches. The last three features are inferred from kriging and image objects. For both classification tasks, two different Bayesian network structures, a naïve Bayes and an expert-based network structure, were used for comparison purposes. The numeric parameters of both Bayesian models were learned from the empirical data collected from a corn-crop field.

A hybrid approach, implemented by the BayesRule method, which articulates Bayes and categorical rules, was used to improve the model's understandability, by extracting classification rules from each model. The Markov blanket concept was used in the BayesRule method to reduce the number and the complexity of classification rules. When pruning is applied, the number of rules tends to be smaller and the comprehensibility tends to be higher. On the other hand, having fewer rules may imply having a less detailed overview of the problem (with fewer rules and fewer antecedents). Thus, the trade-off between accuracy and complexity is a very important issue to be analyzed in each specific application domain.

In this work, for the expert-based network, the Markov blanket concept was sufficient to prune the rule set efficiently, since the results indicate 72.5% and 66.3% of agreement without and with the pruning strategy, respectively. In addition, the results reveal that the expert-based Bayesian network classifier yields a higher accuracy than the naïve Bayes classifier. In the latter, the application of the pruning strategy made no difference in the results. The strong and unrealistic assumption (that all the features are independent given the class), which is an intrinsic aspect of any naïve Bayes classifier, may have contributed to this behavior. It is worthwhile mentioning that the results presented are specific to a particular crop field, subject to the conditions described in Section 3.1. Further work includes the use of extensive simulations and experiments to generalize the obtained results. It is also worth looking into the use of the proposed pruning strategy in other domains in order to confirm its relevance.


Table 13
Expert-based Bayesian rules set for the risk of infestation.

1   If (WSeed is H) and (WCompetitiveness is H) and (WSPatch is G) then YLoss is H (0.66)
2   If (WSeed is H) and (WCompetitiveness is H) and (WSPatch is S) then YLoss is H (0.50)
3   If (WSeed is H) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (0.47)
4   If (WSeed is H) and (WCompetitiveness is L) and (WSPatch is G) then YLoss is H (0.41)
5   If (WSeed is H) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (0.72)
6   If (WSeed is H) and (WCompetitiveness is L) and (WSPatch is R) then YLoss is M (0.48)
7   If (WSeed is H) and (WCompetitiveness is M) and (WSPatch is G) then YLoss is M (0.36)
8   If (WSeed is H) and (WCompetitiveness is M) and (WSPatch is S) then YLoss is M (0.51)
9   If (WSeed is H) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is M (0.77)
10  If (WSeed is L) and (WCompetitiveness is H) and (WSPatch is G) then YLoss is H (0.66)
11  If (WSeed is L) and (WCompetitiveness is H) and (WSPatch is S) then YLoss is M (0.61)
12  If (WSeed is L) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (0.93)
13  If (WSeed is L) and (WCompetitiveness is L) and (WSPatch is G) then YLoss is L (0.41)
14  If (WSeed is L) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (0.84)
15  If (WSeed is L) and (WCompetitiveness is L) and (WSPatch is R) then YLoss is L (0.54)
16  If (WSeed is L) and (WCompetitiveness is M) and (WSPatch is G) then YLoss is M (0.36)
17  If (WSeed is L) and (WCompetitiveness is M) and (WSPatch is S) then YLoss is M (0.55)
18  If (WSeed is L) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is M (0.92)
19  If (WSeed is M) and (WCompetitiveness is H) and (WSPatch is G) then YLoss is H (0.50)
20  If (WSeed is M) and (WCompetitiveness is H) and (WSPatch is S) then YLoss is M (0.67)
21  If (WSeed is M) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (1.00)
22  If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is G) then YLoss is M (0.84)
23  If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (1.00)
24  If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is R) then YLoss is M (0.58)
25  If (WSeed is M) and (WCompetitiveness is M) and (WSPatch is G) then YLoss is M (0.62)
26  If (WSeed is M) and (WCompetitiveness is M) and (WSPatch is S) then YLoss is M (0.50)
27  If (WSeed is M) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is H (1.00)

Table 12
Pruned naïve Bayes testing data set results for the rules with a threshold of probability 0.7.

BLWeed   NLWeed   TWeed   WBiomass   R_r     Test        P(Rule R_r | X_{1,J_1}, X_{2,J_2}, X_{3,J_3}) (%)
H        M        H       H          R_D     Incorrect   100
L        H        L       M          R_D     Correct     100
M        M        M       M          R_D     Correct     100
M        H        M       M          R_D     Correct     100
M        M        M       M          R_D     Correct     100
H        L        M       L          R_20    Incorrect   91

Table 14
Expert-based Bayesian network testing data set results for the infestation risk.

WSeed   WSPatch   WCompetitiveness   YLoss   R_r     Test        P(Rule R_r | V_{1,J_1}, V_{2,J_2}, V_{3,J_3}, V_{4,J_4}) (%)
L       S         L                  M       R_14    Correct     84
L       S         M                  M       R_17    Correct     55
L       S         L                  M       R_14    Correct     84
L       S         M                  M       R_17    Correct     55
M       R         M                  M       R_27    Incorrect   100
L       S         L                  M       R_14    Correct     84
L       S         H                  M       R_11    Correct     61
L       S         H                  M       R_11    Correct     61
M       R         H                  H       R_21    Correct     100


Table 15
Naïve Bayes classifier testing data set results for the infestation risk.

WCoverage   WSeed   WCompetitiveness   WSPatch   YLoss   R_r     Test        P(Rule R_r | V_{1,J_1}, V_{2,J_2}, V_{3,J_3}, V_{4,J_4}) (%)
Th          L       L                  S         M       R_41    Correct     82
Th          L       L                  S         M       R_41    Correct     82
Th          L       L                  G         M       R_40    Correct     61
Th          L       L                  R         M       R_42    Correct     70
Th          L       L                  R         M       R_42    Correct     70
Th          L       L                  S         M       R_41    Correct     82
Th          L       L                  R         H       R_42    Incorrect   70
A           L       L                  R         H       R_51    Correct     75
A           L       L                  G         H       R_49    Correct     82

Table 16
Pruned expert-based Bayesian rules set for the risk of infestation.

5   If (WSeed is H) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (0.72)
9   If (WSeed is H) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is M (0.77)
12  If (WSeed is L) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (0.93)
14  If (WSeed is L) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (0.84)
18  If (WSeed is L) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is M (0.92)
21  If (WSeed is M) and (WCompetitiveness is H) and (WSPatch is R) then YLoss is H (1.00)
22  If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is G) then YLoss is M (0.84)
23  If (WSeed is M) and (WCompetitiveness is L) and (WSPatch is S) then YLoss is M (1.00)
27  If (WSeed is M) and (WCompetitiveness is M) and (WSPatch is R) then YLoss is H (1.00)
D   Otherwise YLoss is M (1.00)

Table 17
Pruned expert-based Bayesian network testing data set results for the infestation risk with a threshold of probability 0.7.

WSeed   WSPatch   WCompetitiveness   YLoss   R_r     Test        P(Rule R_r | X_{1,J_1}, X_{2,J_2}, X_{3,J_3}, X_{4,J_4}) (%)
L       S         L                  M       R_14    Correct     84
L       S         M                  M       R_D     Correct     100
L       S         L                  M       R_14    Correct     84
L       S         M                  M       R_D     Correct     100
M       R         M                  M       R_27    Incorrect   100
L       S         L                  M       R_14    Correct     84
L       S         H                  M       R_D     Correct     100
L       S         H                  M       R_D     Correct     100
M       R         H                  H       R_21    Correct     100

Table 18
Pruned naïve Bayes classifier testing data set results for the infestation risk with a threshold of probability 0.7.

WCoverage   WSeed   WCompetitiveness   WSPatch   YLoss   R_r     Test        P(Rule R_r | X_{1,J_1}, X_{2,J_2}, X_{3,J_3}, X_{4,J_4}) (%)
Th          L       L                  S         M       R_41    Correct     82
Th          L       L                  S         M       R_41    Correct     82
Th          L       L                  G         M       R_D     Correct     100
Th          L       L                  R         M       R_42    Correct     70
Th          L       L                  R         M       R_42    Correct     70
Th          L       L                  S         M       R_41    Correct     82
Th          L       L                  R         H       R_42    Incorrect   70
A           L       L                  R         H       R_51    Correct     75
A           L       L                  G         H       R_49    Correct     82

Table 19
Risk classification results with expert-based and naïve Bayesian networks for all 10 folds.

                             Expert-based                       Naïve
                             Accuracy (%)   Number of rules     Accuracy (%)   Number of rules
Markov blanket rules set     72.5           27                  71.4           81
Pruned rules set
  Threshold = 60%            63             15                  69.3           68
  Threshold = 70%            66.3           10                  71.4           52
  Threshold = 80%            66.3           8                   66.3           44
  Threshold = none           65.3           1                   66.32          1


Acknowledgments

This work was partially supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) under the Programa Nacional de Cooperação Acadêmica (PROCAD), the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP). We thank Dr. Décio Karam from Embrapa Milho e Sorgo, Sete Lagoas, MG, for helping to define the Bayesian network structures and for providing the data used in the experiments described in this paper.

