Spam Image Identification Using an Artificial Neural Network

Jason R. Bowling, Priscilla Hope and Kathy J. Liszka

The University of Akron

2008 MIT Spam Conference

We know it’s bad…

2005: roughly 1% of all emails
mid 2006: rose to 21%

J. Swartz, “Picture this: A sneakier kind of spam,” USA Today, Jul. 23, 2006.

The University of Akron

December 2007
28,000,000 messages
24,000,000 identified as spam and dropped

Inspiration

FANN

Fast Artificial Neural Network Library
open source
adaptive, learn by example (given good input)

[Figure: network diagram with input, hidden and output layers]

Image Preparation

open source
converts from virtually any format to another
tradeoffs

[Figure: input images, image2fann.cpp, training data]

150 × 150 pixel, 8-bit grayscale jpg images

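500 22500 1
.128 .123 .156 .128 .156 .254 …
1
.156 .128 .128 .123 .156 .254 …
-1

First line: number of images (input sets), number of input nodes, number of output nodes.
Each image contributes a line of inputs followed by a line with its output: 1 for spam, -1 for ham.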
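The layout above can be reproduced with a few lines of code. This is a minimal sketch, not the authors' image2fann.cpp: the function name `format_training_data` and the toy four-pixel samples are hypothetical, and the three-decimal formatting is an assumption based on the values shown on the slide.

```python
def format_training_data(samples):
    """Render (inputs, outputs) pairs in the header-plus-alternating-lines layout.

    samples: list of (inputs, outputs) tuples, each a list of numbers.
    Hypothetical helper; the real converter in the talk is image2fann.cpp.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    n_in = len(samples[0][0])
    n_out = len(samples[0][1])
    # Header: number of input sets, number of input nodes, number of output nodes.
    lines = [f"{len(samples)} {n_in} {n_out}"]
    for inputs, outputs in samples:
        if len(inputs) != n_in or len(outputs) != n_out:
            raise ValueError("all samples must have the same shape")
        lines.append(" ".join(f"{v:.3f}" for v in inputs))   # one line of inputs
        lines.append(" ".join(f"{v:g}" for v in outputs))    # one line of outputs
    return "\n".join(lines) + "\n"


# Toy data: two 4-pixel "images", the first labeled spam (1), the second ham (-1).
toy = [([0.128, 0.123, 0.156, 0.254], [1]),
       ([0.156, 0.128, 0.128, 0.123], [-1])]
text = format_training_data(toy)
```

For the real corpus the header would read 500 22500 1, with each input line holding 22,500 values.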

22,500 input nodes
two layers of hidden nodes
1 output node
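The size of such a network can be worked out by hand. The sketch below assumes the counting convention described in the FANN documentation, where the total includes one bias neuron for every layer except the output layer. The hidden layer sizes (50 and 50) are purely hypothetical, since the slide does not give the split.

```python
def total_neurons(layer_sizes):
    """Neurons plus one bias neuron per non-output layer (assumed convention)."""
    return sum(layer_sizes) + (len(layer_sizes) - 1)


def total_connections(layer_sizes):
    """Weights in a fully connected network; each layer's bias feeds the next layer."""
    return sum((a + 1) * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# 22,500 inputs (150 x 150 pixels), two hidden layers, one output.
HIDDEN_1, HIDDEN_2 = 50, 50          # hypothetical sizes, not stated in the talk
layers = [150 * 150, HIDDEN_1, HIDDEN_2, 1]

neurons = total_neurons(layers)      # 22500 + 50 + 50 + 1 + 3 bias
weights = total_connections(layers)  # 22501*50 + 51*50 + 51*1
```

Under these assumptions any pair of hidden layers totalling 100 nodes gives 22604, the node count printed in the training log. That match is suggestive rather than confirmed. Almost all of the weights sit between the input layer and the first hidden layer.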

Training the Network


A fully connected back propagation neural network.
Supervised learning paradigm.



Activation Function

Takes the weighted sum of the inputs to a node (one weight per input) and determines the output of the node.

Steepness

[Figure: activation function plot with labels 0.0, 0.5, 1.0]
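A single node's computation can be sketched as follows. The two formulas are the usual logistic and tanh forms with a steepness factor. Which activation the authors selected is not stated; the symmetric form is used as the default here only because the training targets are 1 and -1, which is an assumption. The function names are illustrative.

```python
import math


def sigmoid(x, steepness=0.5):
    """Logistic activation, output in (0, 1); larger steepness gives a sharper step."""
    return 1.0 / (1.0 + math.exp(-2.0 * steepness * x))


def sigmoid_symmetric(x, steepness=0.5):
    """Symmetric (tanh) activation, output in (-1, 1)."""
    return math.tanh(steepness * x)


def neuron_output(inputs, weights, bias_weight,
                  steepness=0.5, activation=sigmoid_symmetric):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    weighted_sum = bias_weight + sum(i * w for i, w in zip(inputs, weights))
    return activation(weighted_sum, steepness)


# Two scaled pixel values with hypothetical weights.
example = neuron_output([0.128, 0.254], [1.0, -1.0], bias_weight=0.0)
```

A steepness near 0 makes the curve almost flat, so the node barely reacts to its inputs; a high steepness approaches a hard threshold.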

Widrow and Nguyen’s algorithm

An even distribution of weights across each input node’s active region.
Used at initialization.
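The idea can be illustrated with the textbook form of the Nguyen-Widrow scheme. This is a sketch of the general technique, not necessarily what FANN's own initialization routine does internally. The function name and the small layer sizes are hypothetical.

```python
import math
import random


def nguyen_widrow_init(n_inputs, n_hidden, rng):
    """Textbook Nguyen-Widrow initialization for one layer.

    Each hidden node gets a random weight vector rescaled to length beta,
    and a bias drawn uniformly from [-beta, beta].
    """
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    weights, biases = [], []
    for _ in range(n_hidden):
        w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        norm = math.sqrt(sum(v * v for v in w))
        weights.append([beta * v / norm for v in w])   # rescale to length beta
        biases.append(rng.uniform(-beta, beta))
    return weights, biases, beta


# Small hypothetical layer: 4 inputs feeding 3 hidden nodes, seeded for repeatability.
weights, biases, beta = nguyen_widrow_init(4, 3, random.Random(0))
```

Compared with purely random starting weights, this spreads the nodes' responsive ranges over the input space, which tends to shorten training.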

Epoch

One cycle where the weights are adjusted to match the output in the training file.

[Figure: images labeled “I’m spam!” and “I’m ham!”]
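One epoch can be sketched for the simplest possible case, a single tanh node trained with per-sample gradient updates. This is an illustration of the concept only. The real network has hidden layers and is trained by FANN, and the toy samples, the function name and the learning rate of 0.1 are assumptions.

```python
import math


def train_one_epoch(weights, samples, learning_rate):
    """One pass over the training data; weights[0] is the bias weight.

    Returns the mean squared error measured during the pass.
    """
    squared_error = 0.0
    for inputs, target in samples:
        x = [1.0] + list(inputs)                       # constant 1.0 feeds the bias
        output = math.tanh(sum(w * v for w, v in zip(weights, x)))
        error = target - output
        squared_error += error * error
        delta = error * (1.0 - output * output)        # derivative of tanh
        for i, v in enumerate(x):
            weights[i] += learning_rate * delta * v    # nudge toward the target
    return squared_error / len(samples)


# Two toy "images" of two pixels each: the first is spam (1), the second ham (-1).
toy_samples = [([0.2, 0.1], 1.0), ([0.05, 0.25], -1.0)]
weights = [0.0, 0.0, 0.0]
errors = [train_one_epoch(weights, toy_samples, 0.1) for _ in range(500)]
```

Each call to `train_one_epoch` is one epoch. Repeating it drives the error down as the weights settle.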

Learning Rate

Train to a desired error.
Step down the training rate at preset intervals to avoid oscillation.
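The two points above can be sketched as a small control loop. The talk does not give the step interval or reduction factor, so `step_every` and `factor` are assumed parameters. `train_epoch` stands in for whatever routine runs one epoch and returns the current error. The toy problem is chosen so that a rate of 1.0 oscillates forever while 0.5 converges.

```python
def train_with_step_down(train_epoch, initial_rate, desired_error,
                         max_epochs, step_every, factor=0.5):
    """Train until the error target is met, reducing the rate at fixed intervals.

    train_epoch: callable taking the learning rate, returning the epoch's error.
    Returns (epochs_run, final_rate, final_error).
    """
    rate = initial_rate
    error = float("inf")
    for epoch in range(1, max_epochs + 1):
        error = train_epoch(rate)
        if error <= desired_error:
            return epoch, rate, error
        if epoch % step_every == 0:
            rate *= factor                 # step the rate down
    return max_epochs, rate, error


# Toy problem: minimise w**2 by gradient descent, starting at w = 1.
state = {"w": 1.0}


def toy_epoch(rate):
    state["w"] -= rate * 2.0 * state["w"]  # gradient of w**2 is 2w
    return state["w"] ** 2


epochs, rate, err = train_with_step_down(toy_epoch, initial_rate=1.0,
                                         desired_error=0.01, max_epochs=100,
                                         step_every=5)
```

With the rate at 1.0 the weight flips between 1 and -1 and the error never improves. After the first step down to 0.5 the next update lands on the minimum.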

Training

22604 nodes in network

Max epochs 200. Desired error: 0.4
Epochs 1. Current error: 0.2800000012. Bit fail 56.
Learning rate is: 0.500000

Max epochs 5000. Desired error: 0.2000000030.
Epochs 1. Current error: 0.2800000012. Bit fail 56.
Epochs 20. Current error: 0.2800000012. Bit fail 56.
Epochs 40. Current error: 0.2251190692. Bit fail 56.
Epochs 60. Current error: 0.2074941099. Bit fail 65.
Epochs 71. Current error: 0.1479636133. Bit fail 48.
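The two figures reported on each log line can be illustrated with a simplified sketch. It assumes "current error" is a mean squared error over the outputs and that "bit fail" counts outputs whose absolute difference from the target exceeds a limit. The limit of 0.35 is taken here as FANN's documented default and should be treated as an assumption, as should the toy outputs.

```python
def mse_and_bit_fail(outputs, targets, bit_fail_limit=0.35):
    """Return (mean squared error, number of outputs off by more than the limit)."""
    squared_error = 0.0
    fails = 0
    for output, target in zip(outputs, targets):
        diff = target - output
        squared_error += diff * diff
        if abs(diff) > bit_fail_limit:
            fails += 1
    return squared_error / len(outputs), fails


# Hypothetical network outputs for four images and their labels (1 spam, -1 ham).
outputs = [0.9, -0.2, -0.95, 0.5]
targets = [1.0, -1.0, -1.0, 1.0]
mse, bit_fail = mse_and_bit_fail(outputs, targets)
```

The two measures need not move together. The error can fall while more outputs drift just past the limit, as between epochs 40 and 60 in the log.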

[Figure: processing pipeline with input images (ham, spam), image2fann.cpp, training data, train.c, test.c and FANN]

572 Trained Images, 75 hidden nodes

572 Trained Images, 50 hidden nodes

Corpus

grayscale intensity 0 - 255
Scaling to number < 1 (divide by 1000)
training data limited to 0 - 0.25
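The scaling step is simple enough to show directly. This is a minimal sketch. The function name is hypothetical and the image is represented as a plain list of rows rather than a decoded jpg.

```python
def image_to_inputs(image, divisor=1000.0):
    """Flatten a grayscale image (list of rows of 0..255 ints) and scale it.

    Dividing by 1000 maps the 8-bit range onto 0 to 0.255, so every
    input is below 1.
    """
    inputs = []
    for row in image:
        for pixel in row:
            if not 0 <= pixel <= 255:
                raise ValueError("expected 8-bit grayscale values")
            inputs.append(pixel / divisor)
    return inputs


# A tiny 2 x 3 example and a blank 150 x 150 image.
small = image_to_inputs([[0, 128, 255], [123, 156, 254]])
full = image_to_inputs([[0] * 150 for _ in range(150)])
```

A 150 × 150 image yields 22,500 inputs, one per input node. The brightest pixel maps to 0.255, which is why the training values stay in roughly the 0 to 0.25 range.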

Current Work

complete corpus
multipart images
separate ANNs
hidden nodes
color
image size


Priscilla Hope

Thank you!