A LVQ-based neural network anti-spam email approach

Oct 19, 2013

1

A LVQ-based neural network anti-spam email approach


Professor 楊婉秀

詹元順, first-year master's student, Information Management

94722001

2005/12/07

2

Outline


1. Introduction

2. Email sample and data preprocessing

2.1 Email representation

2.2 Feature extraction

3. Anti-spam email LVQ model

3.1 Spam email category.

3.2 Learning vector quantization neural network model

3.3 Anti-spam email LVQ algorithm

3.4 Parameter setting

4. Experiments and result

5. Conclusion

3

1. Introduction (1/2)


Spam e-mail wastes users' time, money, and network
bandwidth. It also clutters users' mailboxes and can
even be harmful, e.g. pornographic content.


In America, spam email costs enterprises up to
9 billion per year.


Without appropriate countermeasures, the
situation will keep worsening and spam email
will eventually undermine the usability of email.

4

1. Introduction (2/2)


Duhong Chen et al. compared four algorithms
(Bayes, decision tree, neural networks, and Boosting)
and concluded that the neural network
algorithm achieves higher performance.


Experiments show that the LVQ-based
anti-spam email filter performs better
than the Bayes-based and BP neural network-based
approaches.

5

2. Email sample and data preprocessing (1/2)

2.1 Email representation


$\mathrm{TFIDF}_i = \mathrm{TF}_i \times \log(N / \mathrm{DF}_i)$ (1)


$\mathrm{TF}_i$: the frequency with which word $t_i$ appears in
document $d$


$N$: the total number of training documents


$\mathrm{DF}_i$: the number of documents
that contain word $t_i$
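
As a minimal sketch of equation (1), the function below turns tokenized emails into TF-IDF weight vectors. The toy corpus and the function name `tfidf_vectors` are hypothetical, and the natural logarithm is an assumption, since the slide does not state the base.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(docs)  # N: total number of training documents
    # DF_i: number of documents that contain word t_i (each document counted once)
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # TF_i: raw count of word t_i in this document
        # Natural log is an assumption; the slide does not give the base
        vectors.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return vectors

# Hypothetical toy corpus, not taken from the experiments
docs = [
    ["free", "money", "free"],
    ["meeting", "money"],
    ["meeting", "agenda"],
]
vectors = tfidf_vectors(docs)
for vec in vectors:
    print(vec)
```

A word that occurs in every document gets weight 0, because log(N / DF_i) = log(1) = 0, so only words that discriminate between emails contribute to the vector.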


6

2. Email sample and data preprocessing (2/2)

2.2 Feature extraction


A: the number of emails that contain word t
and belong to class s


B: the number of emails that contain word t but do not
belong to class s


C: the number of emails that belong to class s but
do not contain word t


N: the total number of emails in the training corpus
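
The scoring formula on this slide did not survive extraction. The counts A, B, C and N match the usual estimate of mutual information between a word and a class, MI(t, s) ≈ log(A·N / ((A + C)(A + B))), so the sketch below assumes that score. The slide may have used a different one, and the function name and counts are hypothetical.

```python
import math

def mutual_information(a, b, c, n):
    """Assumed feature score: log(A*N / ((A + C) * (A + B))).

    a: emails that contain word t and belong to class s
    b: emails that contain word t but do not belong to class s
    c: emails that belong to class s but do not contain word t
    n: total number of emails in the training corpus
    """
    if a == 0:
        # The word never occurs in the class; the log is undefined
        return float("-inf")
    return math.log((a * n) / ((a + c) * (a + b)))

# Hypothetical counts: the word appears in 40 of 100 class-s emails
# and in 10 of the 100 remaining emails
score = mutual_information(a=40, b=10, c=60, n=200)
print(score)
```

Under this assumption, words would be ranked by score and the top-scoring ones kept as features. A word that is statistically independent of the class scores 0.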

7

3. Anti-spam email LVQ model (1/5)

3.1 Spam email category.

8

3. Anti-spam email LVQ model (2/5)

3.2 Learning vector quantization neural network
model


The model is divided into two layers. The first
layer is the competitive layer, in which each neuron
represents a subclass.


The second is the output layer, in which each
neuron represents a class.
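
The two-layer structure can be sketched as follows. This is an illustration under assumptions, not the algorithm from the slides: it uses the textbook LVQ1 update rule with Euclidean distance, and the class name `LVQ`, the prototype values and the learning rate are all hypothetical.

```python
import math

class LVQ:
    def __init__(self, prototypes, subclass_to_class):
        # Competitive layer: one weight vector (prototype) per subclass
        self.prototypes = [list(p) for p in prototypes]
        # Output layer: the class that each subclass neuron feeds into
        self.subclass_to_class = list(subclass_to_class)

    def winner(self, x):
        """Index of the competitive neuron closest to x (Euclidean distance)."""
        dists = [math.dist(p, x) for p in self.prototypes]
        return dists.index(min(dists))

    def predict(self, x):
        """Class of the winning subclass neuron."""
        return self.subclass_to_class[self.winner(x)]

    def train_step(self, x, label, lr=0.1):
        """Textbook LVQ1 step (assumed): pull the winner toward x if its
        class matches the label, push it away otherwise."""
        k = self.winner(x)
        sign = 1.0 if self.subclass_to_class[k] == label else -1.0
        self.prototypes[k] = [
            w + sign * lr * (xi - w) for w, xi in zip(self.prototypes[k], x)
        ]
        return k

# Hypothetical setup: two spam subclasses and one legitimate class
model = LVQ(
    prototypes=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    subclass_to_class=["spam", "spam", "legitimate"],
)
print(model.predict([0.9, 0.1]))
```

Several competitive neurons can map to the same output class, which is how one "spam" class is represented by multiple content subclasses.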

9

3. Anti-spam email LVQ model (3/5)

3.3 Anti-spam email LVQ algorithm (1/2)

10

3. Anti-spam email LVQ model (4/5)

3.3 Anti-spam email LVQ algorithm (2/2)

11

3. Anti-spam email LVQ model (5/5)

3.4 Parameter setting

12

4. Experiments and result (1/4)


This project uses the email corpus from
http://www.spamassassin.org/publiccorpus, which
is publicly available.


1000 e-mails are selected at random from the
corpus: 580 spam e-mails and 420 legitimate
e-mails.

13

4. Experiments and result (2/4)


Anti-spam email filter performance is often
measured in terms of spam precision (SP) and
spam recall (SR).
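
The formulas on this slide were lost in extraction. The sketch below assumes the standard definitions: SP is the fraction of emails flagged as spam that really are spam, and SR is the fraction of actual spam that gets flagged. The function name and the sample labels are hypothetical.

```python
def spam_precision_recall(true_labels, predicted_labels, spam="spam"):
    """Return (SP, SR) under the standard definitions (assumed here)."""
    pairs = list(zip(true_labels, predicted_labels))
    n_ss = sum(1 for t, p in pairs if t == spam and p == spam)  # spam flagged as spam
    n_ls = sum(1 for t, p in pairs if t != spam and p == spam)  # legitimate flagged as spam
    n_sl = sum(1 for t, p in pairs if t == spam and p != spam)  # spam let through
    sp = n_ss / (n_ss + n_ls) if (n_ss + n_ls) else 0.0
    sr = n_ss / (n_ss + n_sl) if (n_ss + n_sl) else 0.0
    return sp, sr

# Hypothetical labels, not results from the experiments
true_labels = ["spam", "spam", "spam", "legitimate", "legitimate"]
predicted   = ["spam", "spam", "legitimate", "spam", "legitimate"]
sp, sr = spam_precision_recall(true_labels, predicted)
print(sp, sr)
```

Low SP means legitimate mail is being blocked. Low SR means spam is reaching the inbox.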





14

4. Experiments and result (3/4)


A single criterion, F1, combines spam precision
and spam recall.
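
The F1 formula itself is missing from the extracted slide. Assuming the standard definition, F1 is the harmonic mean of spam precision (SP) and spam recall (SR):

```latex
F_1 = \frac{2 \cdot SP \cdot SR}{SP + SR}
```

Because it is a harmonic mean, F1 is high only when both SP and SR are high. For example, SP = 0.9 and SR = 0.6 give F1 = 1.08 / 1.5 = 0.72.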

15

4. Experiments and result (4/4)

16

5. Conclusion


Both neural network-based algorithms usually
perform better than the Bayes-based one.


The LVQ-based method divides spam emails into
several subclasses by content, so that the feature
words within each subclass are more closely
related and the characteristics of each subclass
are easier to identify.