Effective of GA-NBC Using FAT




Mathematics Department

Faculty of Science, Helwan University,

Ain Helwan, Cairo,




This paper introduces a novel hybrid algorithm that combines Genetic Algorithms
(GA) with the Naïve Bayesian Classifier (NBC) for use in a Web filtering agent. The
NBC searches for a near-optimal Bayesian profile using the power of GA. A filtering
agent tool (FAT) has been developed to test the effect of learning on the performance of a
filtering agent. The algorithm was tested using search engine results obtained from
AltaVista; the tests showed that GA significantly enhances both the precision and the recall of the NBC.

Keywords: Information filtering, NBC, Genetic algorithms, Baldwin effect, Intelligent agents.

1 Introduction

As the information available on the World
Wide Web expands and the number
of untrained users increases daily, tools
that help users quickly select the
information they are interested in have
become a must. These tools should be built
in such a way that they relieve the user of
specifying his interests or articulating his
needs. Instead, they should be able
to deduce user needs automatically
through their interaction with him.

Research in autonomous agents offers a
candidate solution for this problem where
the agent acts as a personal assistant that
cooperates with the user to help him
achieve some task [7, 8].

A number of software agents have already
been developed to help users in activities
associated with the Internet, such as e-mail
manipulation [9, 10], browsing [1, 6, 7],
and retrieval and filtering [5, 11, 12, 13, 14].

The success of a personalized filtering
agent depends on how well it understands
user needs. Therefore, the selection of a
suitable learning algorithm has a great
impact on the performance of a filtering
agent.

In this work, a filtering agent tool (FAT)
has been developed to test the effect of
learning on the performance of a filtering
agent. The proposed algorithm (GA-NBC)
was tested and compared with other
learning algorithms. It has proved to be
effective in filtering web documents.

First, an overview of FAT is presented.
Second, the combination of evolution with
learning is justified. Third, the proposed
algorithm is explained. Fourth, testing
results are depicted. Finally, concluding
remarks are outlined.

2 FAT: A Filtering Agent Tool

FAT is a testbed for experimenting with
the effect of the learning component of a
filtering agent on its performance. It works
in two stages: a training (profile
generation) stage and a classification
(testing) stage. It is intended for use by an
expert who aims to identify the best
algorithm and parameters for his/her

Fig. 1 Specifying Training & Test Sets

FAT goes through two basic phases to
generate a user profile from the given
dataset. First, it represents documents in
the form of feature vectors. Second, it
learns the user profile from the given
vectors by applying one of the machine
learning algorithms. In the classification
stage, the learned profile is used to classify
an incoming document as either interesting
or not to the user.
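
The first phase can be illustrated with a minimal sketch, assuming a binary bag-of-words representation over a fixed vocabulary. The tokenizer, the function names `tokenize` and `to_feature_vector`, and the sample vocabulary are hypothetical and serve only as an illustration; the paper does not specify how FAT builds its vectors internally.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def to_feature_vector(text, vocabulary):
    """Return a binary vector: 1 if the vocabulary term occurs in the text, else 0."""
    tokens = set(tokenize(text))
    return [1 if term in tokens else 0 for term in vocabulary]

# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["agent", "filtering", "astronomy", "news"]
doc = "A filtering agent learns which news items the user likes."
vector = to_feature_vector(doc, vocabulary)
print(vector)  # [1, 1, 0, 1]
```

A learning algorithm then operates on such vectors rather than on the raw documents.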

A user of FAT has to determine the data
set file and specify the number of training
and test instances. The tool selects both
the training and test documents at random
out of the supplied data set documents (see
Figure 1).
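
The random selection described above might look like the following sketch. It assumes that the training and test sets are disjoint, which the paper does not state explicitly; the function name `split_dataset` and the `seed` parameter are hypothetical.

```python
import random

def split_dataset(documents, n_train, n_test, seed=None):
    """Randomly draw disjoint training and test sets from the data set."""
    if n_train + n_test > len(documents):
        raise ValueError("not enough documents for the requested split")
    rng = random.Random(seed)
    shuffled = list(documents)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]

# Placeholder document identifiers, for illustration only.
documents = [f"doc{i}" for i in range(10)]
train, test = split_dataset(documents, n_train=6, n_test=3, seed=0)
print(len(train), len(test))  # 6 3
```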

Fig. 2 Controlling training set examples

Alternatively, a separate file containing
links for classified documents may be
supplied to FAT for each of the training
and the test set. The user has full control
over both the test and the training sets. He
can display document files in a certain set,
add a new instance or remove an existing
one (see Figure 2).

Fig. 3 Feature Extraction Stage

As Figure 3 illustrates, the user has to
specify parameters related to the feature
extraction stage, such as the weighting metric
and the number of features. Users of FAT
have full control over the choice of the
learning algorithm and its associated
parameters. As Figure 4 illustrates,
parameters associated with GA, such as
the mutation and crossover rates, may be
tuned.
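
As an illustration of how a weighting metric and a feature count interact, the sketch below ranks terms by their summed TF-IDF weight and keeps the top-ranked ones. TF-IDF is used here only as an assumed example of a weighting metric; the paper does not say which metrics FAT offers, and the name `select_features` is hypothetical.

```python
import math
from collections import Counter

def select_features(tokenized_docs, num_features):
    """Rank terms by summed TF-IDF weight and keep the top `num_features`."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = Counter()
    for tokens in tokenized_docs:
        for term, tf in Counter(tokens).items():
            idf = math.log(n_docs / doc_freq[term])
            weights[term] += tf * idf
    # Sort by descending weight; ties are broken alphabetically.
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:num_features]

# Toy tokenized documents, for illustration only.
docs = [
    ["web", "agent", "filtering", "agent"],
    ["web", "news", "filtering"],
    ["web", "astronomy", "telescope", "astronomy"],
]
features = select_features(docs, num_features=2)
print(features)  # ['agent', 'astronomy']
```

A term such as "web" that occurs in every document receives zero weight and is ranked last, which is the behaviour one would want from a discriminative weighting metric.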

Fig. 4 Profile Generation Stage

For the user to capture an overall view
of the comparative performance of the
different algorithms, charts are supplied
with the proposed tool. For example,
Figure 5 plots the precision of both NBC
and GA-NBC.

3 Motivation for Genetic

The problem of building a user profile
may be viewed as a search process. It
involves searching over the large and
complex space of possible user profiles for
an “optimal” user profile (or a set of
profiles) that matches the user’s different
interests [13].

Fig. 5 Chart Win

Searching a large and dynamic space
usually involves a trade-off between
exploitation and exploration
[4]. For example, a personalized filtering
agent has to exploit the currently known
user interests. In addition, it has to
explore further documents that could be of
potential interest to the user.

The idea of combining learning with
evolution has been tested before in
information filtering and has been shown
to give promising results. NewT (News
Tailor) [13, 14] is a news filtering agent
that combines genetic algorithms with
learning from feedback to generate an
optimal user profile. In [2, 3], a personal
filtering agent was proposed. It employs a
hybrid learning algorithm that draws upon
genetic algorithms, neural networks, and
ric systems.

4 The Algorithm

This algorithm represents a novel
combination of GAs with the Naïve Bayesian
Classifier. Bayesian learners are probabilistic
inference algorithms that calculate
probabilities based on the statistical
information gathered from the scanning
phase. The idea of this algorithm is to use
GAs to evolve a near-optimal set of
Bayesian profiles. Using GAs, the
overhead involved in the calculation of the
statistical information in the scanning
phase is eliminated and all that is needed
is a list of the scanned vocabulary. In
other words, the learning stage is based on
evolution rather than on Bayesian learning,
while the classification stage is done using
the traditional Naïve Bayesian classifier.
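
The classification stage can be sketched as follows, assuming a Bernoulli (presence/absence) model over binary feature vectors and equal class priors. Both are assumptions made for illustration, as are the names `log_likelihood`, `classify` and `prior_hot` and the sample profile values; the paper does not give the exact form of its classifier.

```python
import math

EPS = 1e-9  # keeps probabilities away from 0 and 1 so the logarithms stay finite

def log_likelihood(vector, probs):
    """Bernoulli log-likelihood of a binary feature vector under one class."""
    total = 0.0
    for present, p in zip(vector, probs):
        p = min(max(p, EPS), 1.0 - EPS)
        total += math.log(p if present else 1.0 - p)
    return total

def classify(vector, profile, prior_hot=0.5):
    """Return True if the document is judged interesting, else False.

    `profile` is a list of (P(feature | interesting), P(feature | not interesting))
    pairs, one pair per feature.
    """
    hot = math.log(prior_hot) + log_likelihood(vector, [h for h, _ in profile])
    cold = math.log(1.0 - prior_hot) + log_likelihood(vector, [c for _, c in profile])
    return hot > cold

# Hypothetical three-feature profile, for illustration only.
profile = [(0.9, 0.2), (0.8, 0.3), (0.1, 0.7)]
print(classify([1, 1, 0], profile))  # True
print(classify([0, 0, 1], profile))  # False
```

Because the classifier only reads the probability pairs, it works unchanged whether the profile was estimated from document statistics or evolved by a GA.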

The chromosome represents a Naïve
Bayesian profile. Each gene represents one
probability and forms with the adjacent
gene the probability pair of one feature.
The probabilities are initially set at
random and must lie between zero and
one. This representation implies no change
to the genetic operators: the crossover and
mutation operators are applied in the
traditional manner. Moreover, there is no
need to check for redundancy, because
probabilities may have similar values.
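
A minimal sketch of this representation, combined with the basic steps listed in Table 1, is given below. Several details are assumptions that the paper does not specify: fitness is taken as the fraction of correctly classified examples, selection is by binary tournament with elitism, crossover is single-point, and mutation replaces a gene with a fresh random probability. All function names and default parameter values are hypothetical.

```python
import math
import random

def classify(vector, chromosome):
    """Naive Bayes decision from a flat chromosome of (hot, cold) gene pairs."""
    hot = cold = 0.0
    for i, present in enumerate(vector):
        p_hot = min(max(chromosome[2 * i], 1e-9), 1 - 1e-9)
        p_cold = min(max(chromosome[2 * i + 1], 1e-9), 1 - 1e-9)
        hot += math.log(p_hot if present else 1 - p_hot)
        cold += math.log(p_cold if present else 1 - p_cold)
    return hot > cold

def fitness(chromosome, examples):
    """Fraction of labelled examples the chromosome classifies correctly."""
    correct = sum(classify(vec, chromosome) == label for vec, label in examples)
    return correct / len(examples)

def crossover(a, b, rng):
    """Traditional single-point crossover."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(chromosome, rate, rng):
    """Replace each gene with a fresh random probability with chance `rate`."""
    return [rng.random() if rng.random() < rate else g for g in chromosome]

def evolve(examples, n_features, pop_size=20, generations=25,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve a chromosome of 2 * n_features probabilities (assumed scheme)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(2 * n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=lambda c: fitness(c, examples))
    for _ in range(generations):
        scored = [(fitness(c, examples), c) for c in population]

        def pick():
            # Binary tournament selection (an assumed selection scheme).
            x, y = rng.sample(scored, 2)
            return x[1] if x[0] >= y[0] else y[1]

        new_population = [best]  # elitism: carry the best chromosome over
        while len(new_population) < pop_size:
            parent_a, parent_b = pick(), pick()
            child = (crossover(parent_a, parent_b, rng)
                     if rng.random() < crossover_rate else list(parent_a))
            new_population.append(mutate(child, mutation_rate, rng))
        population = new_population  # old population <- new population
        best = max(population, key=lambda c: fitness(c, examples))
    return best

# Toy labelled vectors (True = interesting), for illustration only.
examples = [([1, 0], True), ([1, 1], True), ([0, 1], False), ([0, 0], False)]
best = evolve(examples, n_features=2, seed=1)
print(fitness(best, examples))
```

With elitism, the best fitness in the population never decreases from one generation to the next, which makes the sketch easy to check even though the search itself is stochastic.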

There are two attributes attached to each
chromosome: age and score. Age refers to
the total number of documents classified
by this chromosome throughout its
lifetime, while score refers to just the
number of correctly classified documents.
Fitness is assigned according to the
results of classification (see Table 1).

Table 1 Basic Steps of GA-NBC

Generate n chromosomes, each one having
k random probabilities representing the
Bayesian profile.
For I = 1 to number of generations
    For each chromosome in the population
        Classify test documents using the
        generated profile.
        Assign fitness according to the
        results of classification.
    Apply genetic operators to
    generate a new population.
    Old population <- new population

5 Testing

The proposed algorithm has been
implemented and tested using different
data sets collected from the Web and
classified by the user into two sets: the hot
list and the cold list.

5.1 Data Set I

Contains documents representing daily
political news about the Middle East that
were gathered from CNN. A user was
asked to classify the news documents,
which resulted in 72% of the documents in
the hot list and the rest in the cold list.

5.2 Data Set II

Contains documents in the field of
astronomy. It was collected from Yahoo
catalogs. The hot list consists of 70% of
the documents. The two lists were observed
to overlap in keywords as well as in main
topics.

5.3 Data Set III

Contains documents from search results
obtained from AltaVista, a web search
engine from Digital Equipment
Corporation. The user was searching for
documents in the field of information
agents. She tried several queries like
retrieval agents and just in time
information retrieval (JITIR). For each
set of query results, the user read and
classified only those documents whose
titles and summaries seemed to match her
interests.

6 Results

The model was trained and tested for each
of the three data sets mentioned before.
The examples used for both training and
testing the agent were selected randomly
out of the specified data set to ensure that
the results are not biased. In addition, each
test was conducted two or three times and
the average of the resulting points was
taken to ensure accuracy of the test results.
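
For reference, the two measures and the averaging over repeated runs can be sketched as below. The standard definitions of precision and recall for the "interesting" class are assumed, and the function names and toy predictions are hypothetical; they are not results from the paper.

```python
def precision_recall(predicted, actual):
    """Precision and recall for the 'interesting' class from boolean lists."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_runs(runs):
    """Average (precision, recall) pairs over repeated test runs."""
    n = len(runs)
    return (sum(p for p, _ in runs) / n, sum(r for _, r in runs) / n)

# Two toy runs, for illustration only.
run1 = precision_recall([True, True, False, True], [True, False, False, True])
run2 = precision_recall([True, False, False, True], [True, True, False, True])
average = average_runs([run1, run2])
print(average)
```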

Fig. 6 Precision of both NBC and GA-NBC

Fig. 7 Recall of both NBC and GA-NBC

Figures 6 and 7 show the effect of genetic
algorithms on the performance of NBC. It
is clear from the figures that the proposed
algorithm outperforms the traditional NBC
in both precision and recall. It provides
an increase of about 30% after no more than 25

7 Conclusion

This paper describes FAT, a filtering agent
tool that has been developed to act as a
testbed for the effect of learning and
evolution on the performance of a filtering
agent. A novel combination of genetic
algorithms with NBC has been proposed
and tested using FAT. Data sets of
different types were collected from the
Web and classified. Test results demonstrate
the effectiveness of GA-NBC in text filtering.
They also show that GA-NBC gives
precision and recall rates that exceed
those of the traditional NBC by about 30%.


References

[1] Balabanovic, M. and Shoham, Y.,
Learning Information Retrieval
Agents: Experiments with
Automated Web Browsing.
Proceedings of the AAAI Spring
Symposium on Information
Gathering, Stanford, CA, 1995.

[2] Baclace, P., Personal Information
Intake Filtering. Bellcore
Workshop on High-Performance
Information Filtering, 1991.

[3] Baclace, P., Competitive Agents
for Information Filtering.
Communications of the ACM
35(12), 1992, pp. 50

[4] Cheng, G., Genetic Algorithms &
Engineering Design. John Wiley
& Sons, Inc., 1997.

[5] Khalifa, I.H., et al., An Intelligent
Agent for Personalized
Information Filtering.
Proceedings of the 7th
International Conference on
Artificial Intelligence, 1999.

[6] Lieberman, H., Letizia: An Agent
that Assists Web Browsing.
Proceedings of the International
Joint Conference on Artificial
Intelligence, Montreal, August 1995.

[7] Lieberman, H., Autonomous
Interface Agents. In Proceedings
of the ACM Conference on
Computers and Human Interface.
Atlanta, Georgia, 1997.

[8] Maes, P., Agents that Reduce
Work and Information Overload.
Communications of the ACM
37(7), 1994, pp. 30

[9] Payne, T.R., et al., Experience
with Rule Induction and K-Nearest
Neighbor Methods for
Interface Agents that Learn.
IEEE Transactions on
Knowledge and Data
Engineering 9(2), March 1997.

[10] Payne, T. and Edwards, P.,
Interface Agents that Learn: An
Investigation of Learning Issues
in a Mail Agent Interface.
Applied Artificial Intelligence
11(1), 1997, pp. 1

[11] Pazzani, M., et al., Learning
from Hotlists and Coldlists:
Towards a WWW Information
Filtering and Seeking Agent.
In Proceedings of the 7th
International Conference on Tools
with Artificial Intelligence


[12] Rhodes, B. J. and Maes, P.,
Just-in-time information
retrieval agents. IBM Systems
Journal 39(3&4), 2000.


[13] Sheth, B. and Maes, P., Evolving
Agents for Personalized
Information Filtering.
Proceedings of the 9th
Conference on Artificial
Intelligence for Applications,
Orlando, Florida: IEEE
Computer Society Press, 1993.

[14] Sheth, B.D., A Learning
Approach to Personalized
Information Filtering. Master's
thesis, Department of Electrical
Engineering and Computer
Science, MIT, 1994.