Chapter 7 Neural Networks in Data Mining

19 Oct 2013

Automatic Model Building (Machine Learning)

Artificial Intelligence

Contents

- Describes neural networks as used in data mining
- Reviews real applications of each model
- Shows the application of models to larger data sets

High-Growth Product

Neural network models usually outperform other methods on data that contain complicated (nonlinear) relationships.

Used for classifying data:
- target customers
- bank loan approval
- hiring
- stock purchase


DATA MINING

Used for prediction

Neural Network

Neural networks are the most widely used method in
data mining.

The idea of neural networks was derived from how
neurons operate in the brain.

Real neurons are connected to each other. They accept electrical charges across synapses and pass the charge on to neighboring neurons.

An ANN is usually arranged in at least three layers (that is, with at least one hidden layer) and has a defined, constant structure that reflects complex nonlinear relationships.

Network

[Figure: network diagram with an Input Layer, Hidden Layers, and an Output Layer; the output nodes are labeled Good and Bad]

Neural Network

For classification neural network models, the output layer has one node for each classification category (for example, true or false).

Each node is connected by an arc to nodes in the next layer.
These arcs have weights, which are multiplied by the value of
incoming nodes and summed.

Middle layer node values are the sum of incoming node values
multiplied by the arc weights.
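
The weighted-sum rule described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular package: the function names `node_value` and `layer_values` and the example weights are hypothetical, and the sketch leaves out the activation (transfer) function that is normally applied to each sum.

```python
def node_value(inputs, weights):
    """Sum of incoming node values multiplied by their arc weights."""
    return sum(x * w for x, w in zip(inputs, weights))


def layer_values(inputs, weight_rows):
    """Compute every node in the next layer; one row of arc weights per node."""
    return [node_value(inputs, row) for row in weight_rows]


# Hypothetical example: 3 input nodes feeding 2 middle-layer nodes.
inputs = [1.0, 0.5, 0.0]
hidden_weights = [
    [0.2, 0.4, 0.6],   # arcs into middle node 1
    [1.0, -1.0, 0.5],  # arcs into middle node 2
]
hidden = layer_values(inputs, hidden_weights)
```

The same `layer_values` call could be repeated with a second weight table to carry the middle-layer values on to the output layer.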

ANNs learn through feedback loops. Output is compared to target values, and the difference between attained and target output is fed back to the system to adjust the weights on the arcs.
- Measure fit
- Fine-tune around the best fit
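
The feedback idea can be illustrated with a single linear node trained by the delta rule. This is a deliberate simplification and an assumption on our part: the slides do not say which update rule a given package uses, and full backpropagation also passes the error back through hidden layers. The name `train_step` and the sample values are hypothetical.

```python
def train_step(weights, inputs, target, learning_rate=0.1):
    """One feedback pass: compare output to target and adjust the arc weights."""
    output = sum(x * w for x, w in zip(inputs, weights))
    error = target - output  # difference between target and attained output
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error


# Hypothetical single training case, repeated to show the error shrinking.
weights = [0.0, 0.0]
inputs, target = [1.0, 2.0], 1.0
errors = []
for _ in range(20):
    weights, error = train_step(weights, inputs, target)
    errors.append(abs(error))
```

Tracking `errors` over the passes is one simple way to "measure fit": the values fall toward zero as the weights settle.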

Neural Network

ANNs can apply learned experience to new cases for decisions, classifications, and forecasts.

ANN modeling should consider:
- Input variable selection and manipulation
- Selection of learning parameters, such as the number of hidden layers, learning rate, momentum, activation function, …
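
To show what two of these learning parameters do, here is a small sketch of a weight update with a learning rate and momentum (the classical momentum form). It is an assumption that a given tool uses exactly this form; `momentum_update` and the toy objective are hypothetical and chosen only so the behavior is easy to follow.

```python
def momentum_update(weight, gradient, velocity, learning_rate=0.1, momentum=0.9):
    """Return the updated weight and velocity for one arc."""
    # Momentum keeps part of the previous step; the learning rate scales the new one.
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


# Toy objective (hypothetical): f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(200):
    w, v = momentum_update(w, 2 * (w - 3), v)
```

A larger learning rate takes bigger steps, while momentum smooths successive steps; both usually need tuning for the data at hand.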

About 95% of business applications were reported to use multilayered feedforward neural networks with the backpropagation learning rule.
- Supervised learning
- Each element in each layer is connected to all elements of the next layer.

Neural Network

Multilayered feedforward neural networks are
analogous to regression and discriminant analysis in
dealing with cases where training data is available.


A self-organizing map (SOM) is analogous to clustering techniques, which are used when no training data with known outcomes is available.
- It classifies data so as to maximize the similarity of patterns within clusters while minimizing the similarity to patterns in other clusters.
- Kohonen SOMs were developed to detect strong features of large data sets.
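
The clustering behavior can be sketched with winner-take-all competitive learning, which is the core step of a SOM. Note the simplification: a full Kohonen map also updates the neighbors of the winning unit, and this sketch omits that neighborhood function. The names `best_matching_unit` and `train_units`, the starting units, and the four sample points are all hypothetical.

```python
def best_matching_unit(units, sample):
    """Index of the unit whose weight vector is closest to the sample."""
    def sq_dist(unit):
        return sum((u - s) ** 2 for u, s in zip(unit, sample))
    return min(range(len(units)), key=lambda i: sq_dist(units[i]))


def train_units(data, n_units=2, epochs=50, learning_rate=0.5):
    """Move only the winning unit toward each sample (no neighborhood update)."""
    units = [list(row) for row in data[:n_units]]  # start from the first samples
    for epoch in range(epochs):
        rate = learning_rate * (1 - epoch / epochs)  # shrink the step over time
        for sample in data:
            win = best_matching_unit(units, sample)
            units[win] = [u + rate * (s - u) for u, s in zip(units[win], sample)]
    return units


# Hypothetical data: two well-separated groups of points, no target labels.
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
units = train_units(data)
clusters = [best_matching_unit(units, row) for row in data]
```

No target values are supplied anywhere; the grouping in `clusters` comes only from the similarity of the input patterns.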

Neural Network Testing

Usually train on part of the available data
- The package tries weights until it successfully categorizes a selected proportion of the training data

When trained, test the model on another part of the data
- If the given proportion is successfully categorized, it quits
- If not, it works some more to get a better fit

The model is internal to the package

The model can be applied to new data
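
The train-then-test loop can be sketched as follows. Since the real model is internal to the package, this example substitutes a single-node perceptron as a stand-in; that choice, the 95% target, and the tiny data set are all assumptions for illustration.

```python
def predict(weights, bias, x):
    """Classify as 1 if the weighted sum exceeds zero, else 0."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


def accuracy(weights, bias, rows):
    """Proportion of (inputs, label) rows categorized correctly."""
    return sum(predict(weights, bias, x) == y for x, y in rows) / len(rows)


def train_until(rows, target=0.95, max_epochs=100, rate=0.1):
    """Adjust weights until the target proportion of training rows is correct."""
    weights, bias = [0.0] * len(rows[0][0]), 0.0
    for _ in range(max_epochs):
        if accuracy(weights, bias, rows) >= target:
            break  # selected proportion reached: stop training
        for x, y in rows:
            error = y - predict(weights, bias, x)
            weights = [w + rate * error * v for w, v in zip(weights, x)]
            bias += rate * error
    return weights, bias


# Hypothetical data: the label is 1 when the two inputs sum to more than 1.
train = [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.3, 0.3], 0), ([0.7, 0.9], 1)]
test = [([0.2, 0.1], 0), ([0.8, 0.8], 1)]
weights, bias = train_until(train)
test_accuracy = accuracy(weights, bias, test)
```

If `test_accuracy` fell short of the required proportion, the same loop could be rerun with more epochs or different settings, which mirrors the "works some more" step.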

Neural Network Process

1. Collect data
2. Separate into training and test sets
3. Transform data to appropriate units
   - Categorical works better, but is not necessary
4. Select, train, and test the network
   - Can set number of hidden layers
   - Can set number of nodes per layer
   - A number of algorithmic options
5. Apply (need to use the system on which the model was built)
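
Steps 2 and 3 of the process above can be sketched with the Python standard library. The helper names `split_rows` and `min_max_scale`, the 30% test fraction, and the income figures are hypothetical; min-max scaling is one common way to transform numeric inputs, not necessarily the one a given package applies.

```python
import random


def split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and separate them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def min_max_scale(values):
    """Rescale numeric values to the range 0 to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # all values equal: nothing to spread out
    return [(v - low) / (high - low) for v in values]


rows = list(range(10))  # stand-in for 10 collected records
train_rows, test_rows = split_rows(rows)
scaled_incomes = min_max_scale([20000, 35000, 50000])  # hypothetical incomes
```

In practice the minimum and maximum would usually be taken from the training set and reused for the test set and for new data, so that all cases are transformed in the same way.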

Loan Applications

Loan decisions are repetitive and time-consuming, and every attempt should be made to reach a decision that is fair to the applicant while reducing the risk of default to the lender.

1. Data collection: sex, marital status, number of dependent children, occupation, etc.
2. Separating data: learning data (at least 100 sets) and testing data (100 sets)
3. Transforming the inputs: an ANN requires numeric data. See page 125.

4. Select, train, and test the network:
   1. Choose the number of middle-layer nodes, the transfer function, and the learning algorithm.
   2. Too many hidden-layer nodes result in the ANN memorizing the input data without learning a generalizable pattern for the accurate analysis of new data. Too few nodes require more training time and result in less accurate models.
5. Repeat steps 1 through 4 until the prescribed tolerance is reached.

Neural Nets to Predict Bankruptcy

Wilson & Sharda (1994)

Monitor firm financial performance
- Useful for identifying internal problems, investment evaluation, and auditing

Predict bankruptcy: multivariate discriminant analysis of financial ratios (develops a formula of weights over independent variables)

Neural network: inputs were 5 financial ratios; data from Moody’s Industrial Manuals (129 firms, 1975-1982; 65 went bankrupt)

Tested against discriminant analysis

Neural network performed significantly better

Ranking Neural Network

Wilson (1994)

Decision problem: ranking
- Candidates for a position, computer systems, etc.

INPUT: manager's ranking of alternatives

Real decision: hire 2 salespeople from 15 applicants

Each applicant was scored by the manager

The neural network took the scores and rank-ordered the applicants
- Best fit to the manager among the alternatives compared (AHP)

Application results

Exercise

For data coding, refer to page 117.


Age          < 20              0
             20~50             (age - 20)/30
             > 50              1.0
State        CA                1.0
             Rest              0
Degree       Cert              0
             UG                0.5
             Rest              1.0
Major        IS                1.0
             Csci, Engr Sci    0.9
             BusAd             0.7
             Other             0.5
             None              0
Experience   Max               Years/5
Minimal      2
Adequate     3
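
The coding scheme above can be expressed as a short Python sketch. Several points are assumptions rather than facts from the slide: the function names are hypothetical, the "Max" entry for Experience is read as a cap of 1.0 on Years/5, and the labels "Csci" and "Engr Sci" are taken literally as two separate majors. The Minimal and Adequate rows are left out because their role is not clear from the slide.

```python
def encode_age(age):
    """Age: below 20 -> 0, 20 to 50 -> (age - 20)/30, above 50 -> 1.0."""
    if age < 20:
        return 0.0
    if age > 50:
        return 1.0
    return (age - 20) / 30


DEGREE_CODES = {"Cert": 0.0, "UG": 0.5}  # any other degree ("Rest") -> 1.0
MAJOR_CODES = {"IS": 1.0, "Csci": 0.9, "Engr Sci": 0.9,
               "BusAd": 0.7, "None": 0.0}  # any other major ("Other") -> 0.5


def encode_applicant(age, state, degree, major, years):
    """Return the five coded inputs in the order used in the table."""
    return [
        encode_age(age),
        1.0 if state == "CA" else 0.0,
        DEGREE_CODES.get(degree, 1.0),
        MAJOR_CODES.get(major, 0.5),
        min(years / 5, 1.0),  # assumption: "Max" read as a cap of 1.0
    ]


# Hypothetical applicant used only to show the coding.
example = encode_applicant(age=35, state="CA", degree="UG", major="IS", years=2)
```

Each applicant becomes a list of numbers between 0 and 1, which is the numeric form an ANN needs as input.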