Enterprise Miner for Coders - SAS



Copyright © 2011, SAS Institute Inc. All rights reserved.

“I Don’t Need Enterprise Miner”


David Yeo, Ph.D.

SAS Institute (Canada) Inc.


Overview


The Case Against Using Enterprise Miner.


The Case For Using Enterprise Miner.


Questions.


The Case Against Using Enterprise Miner


The arguments for coding over using Enterprise Miner are typified by the following statements:

“I like to code.”

“I don’t want to lose the time invested developing my code.”

“My code has proven reliable in the past.”

“I understand what is going on in my code; I don’t fully understand what is going on in Enterprise Miner.”


The Case For Using Enterprise Miner

Intuitive “drag-and-drop” interface.

Simplify tedious data preparation tasks.

Implement powerful advanced modeling techniques.

Integrate decision theory into your decisions.

Incorporate your favorite SAS programs and procedures.

Use Enterprise Miner as a code generator.


Intuitive “Drag-and-Drop” Interface

Sensible defaults facilitate rapid model construction.

Extensive documentation and context-sensitive help.


Simple Statistical Graphics


Offers an extensive range of plots including: histograms, scatterplots, contour plots, and even 3-D rotating plots.


Often the graphs are fully interconnected.


Automatic Design (Dummy) Coding

Level  DA  DB  DC  DD  DE  DF  DG  DH  DI
A       1   0   0   0   0   0   0   0   0
B       0   1   0   0   0   0   0   0   0
C       0   0   1   0   0   0   0   0   0
D       0   0   0   1   0   0   0   0   0
E       0   0   0   0   1   0   0   0   0
F       0   0   0   0   0   1   0   0   0
G       0   0   0   0   0   0   1   0   0
H       0   0   0   0   0   0   0   1   0
I       0   0   0   0   0   0   0   0   1


Nominal and ordinal variables are automatically design (a.k.a. dummy) coded for use in subsequent models.

Either ‘effect’ or ‘reference cell’ coding can be specified.
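
The two coding schemes can be sketched in a few lines of Python. This is an illustration only, not Enterprise Miner's own implementation. The function name `design_code` and the choice of the last level as the reference level are assumptions made for the example.

```python
def design_code(levels, scheme="reference"):
    """Return {level: codes}, treating the LAST level as the reference (assumption)."""
    *coded, ref = levels
    table = {}
    for level in levels:
        if level == ref:
            # Reference-cell coding: all zeros. Effect coding: all -1.
            fill = 0 if scheme == "reference" else -1
            table[level] = [fill] * len(coded)
        else:
            # One indicator column per non-reference level.
            table[level] = [1 if level == c else 0 for c in coded]
    return table


levels = list("ABCDEFGHI")
reference = design_code(levels, "reference")
effect = design_code(levels, "effect")

print("D (reference):", reference["D"])
print("I (reference):", reference["I"])
print("I (effect):   ", effect["I"])
```

With nine levels, both schemes produce eight columns; they differ only in how the reference level is coded.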


Variable Selection


SAS Enterprise Miner offers an extensive set of variable selection methods:

Sequential (stepwise) selection

Split search selection

Variable clustering

R-square or chi-square based selection

Variable importance in the projection


Missing Value Imputation


Synthetic (e.g. mean, mode).

Estimation (e.g. distribution, decision tree): $x_i = f(x_1, \ldots, x_p)$

[Figure: panels labelled “Synthetic distribution” and “Estimation”.]
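
A minimal sketch of the synthetic approach, using only the Python standard library. The data values and the helper name `impute` are hypothetical; the point is that an interval input is filled with its mean and a class input with its mode.

```python
from statistics import mean, mode


def impute(values, fill):
    """Replace missing entries (None) with a single fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical inputs with missing values.
income = [42.0, None, 58.0, 50.0, None]   # interval input
region = ["east", "west", None, "east"]   # class input

income_filled = impute(income, mean(v for v in income if v is not None))
region_filled = impute(region, mode(v for v in region if v is not None))

print(income_filled)
print(region_filled)
```

Estimation-based imputation would replace the fixed fill value with a prediction from a model fitted on the other inputs.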



Variable Transformation


Simple (e.g. log) and advanced (e.g. optimal binning).

[Figure: two panels. Original Scale: skewed distribution, standard regression versus true association. Transformed Scale: more symmetric distribution, standard regression.]
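
As a rough illustration of why a log transformation helps, the sketch below compares the skewness of a made-up right-skewed input before and after taking logs. The data and the `skewness` helper (a plain moment-based estimate) are assumptions for the example.

```python
import math


def skewness(xs):
    """Moment-based sample skewness: m3 / m2^1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed input (all values positive, so log is defined).
raw = [1, 2, 3, 4, 5, 10, 20, 50, 100, 500]
logged = [math.log(x) for x in raw]

print("skewness, original scale:   ", round(skewness(raw), 2))
print("skewness, transformed scale:", round(skewness(logged), 2))
```

The transformed values are markedly less skewed, which is the situation the slide's second panel depicts.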


Association Analysis


Forms simultaneous or sequential associations.

Transactions:

A B C
A C D
B C D
A D E
B C E

Rule               Support   Confidence
A implies D        2/5       2/3
C implies A        2/5       2/4
A implies C        2/5       2/3
B and C implies D  1/5       1/3
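
The support and confidence figures for the four rules can be reproduced with a short Python sketch. It assumes the letters on the slide form five three-item transactions (ABC, ACD, BCD, ADE, BCE); the function names are illustrative.

```python
from fractions import Fraction

# Five transactions, read from the slide as three items each (assumption).
transactions = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs):
    """Support of (lhs and rhs) divided by support of lhs."""
    return support(lhs | rhs) / support(lhs)


for lhs, rhs in [("A", "D"), ("C", "A"), ("A", "C"), ("BC", "D")]:
    left, right = set(lhs), set(rhs)
    print(lhs, "implies", rhs, support(left | right), confidence(left, right))
```

`Fraction` reduces to lowest terms, so the confidence the slide shows as 2/4 prints as 1/2.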


Decision Trees


Enterprise Miner implements all of the major decision tree variants, i.e. CART, CHAID, and entropy-based.


Consolidation Trees

[Figure: consolidation tree on input x1, with node labels ABCDJ, EFGHI, ABCD, J, EFG, and HI.]

Level  DA  DB  DC  DD  DE  DF  DG  DH
A       1   0   0   0   0   0   0   0
B       0   1   0   0   0   0   0   0
C       0   0   1   0   0   0   0   0
D       0   0   0   1   0   0   0   0
E       0   0   0   0   1   0   0   0
F       0   0   0   0   0   1   0   0
G       0   0   0   0   0   0   1   0
H       0   0   0   0   0   0   0   1
I       0   0   0   0   0   0   0   0

Combines categorical levels that have a similar outcome.


Neural Networks

PROC NEURAL is one of SAS’ most powerful statistical procedures (it’s a universal approximator)!

Available neural network architectures include: MLP, RBF, VQ, SOM, and functional-link networks.

[Figure: network diagram with an input layer (x1, x2), a hidden layer (H1, H2, H3), and an output layer (Y).]
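
The diagram's two-input, three-hidden-unit, one-output layout corresponds to a forward pass like the one sketched below. This is not PROC NEURAL; the tanh hidden activation, the logistic output, and every weight value are assumptions chosen for illustration.

```python
import math


def mlp_forward(x, hidden_weights, hidden_bias, out_weights, out_bias):
    """One forward pass through a single-hidden-layer MLP."""
    # Hidden units H1..H3: tanh of a weighted sum of the inputs (assumed activation).
    hidden = [
        math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_weights, hidden_bias)
    ]
    # Output Y: logistic function of a weighted sum of the hidden units.
    z = out_bias + sum(w * h for w, h in zip(out_weights, hidden))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: 3 hidden units, 2 inputs each.
W = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2]]
b = [0.0, 0.1, -0.1]
v = [1.2, -0.4, 0.6]
c = 0.05

y_hat = mlp_forward([1.0, 2.0], W, b, v, c)
print(y_hat)
```

Training would adjust `W`, `b`, `v`, and `c` to minimize an error function; only the prediction step is shown here.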


Combined Models


Perturb and combine methodology (ensemble model). Combines predictions from multiple models to create a single consensus prediction.

Combine class probability model and continuous-valued prediction model (two-stage model).
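
Both ideas can be sketched briefly in Python. Averaging posterior probabilities is one common way to form a consensus, and multiplying the event probability by the predicted amount is one common way to combine the two stages. Neither is claimed to be Enterprise Miner's exact method, and all numbers are made up.

```python
def ensemble_average(posteriors):
    """Average the predicted probabilities of several models, case by case."""
    return [sum(case) / len(case) for case in zip(*posteriors)]


def two_stage(p_event, amount_if_event):
    """Expected value per case: P(event) times the predicted amount."""
    return [p * a for p, a in zip(p_event, amount_if_event)]


# Hypothetical posterior probabilities from three models for three cases.
tree = [0.20, 0.70, 0.90]
regression = [0.30, 0.50, 0.80]
neural = [0.10, 0.60, 1.00]

consensus = ensemble_average([tree, regression, neural])
expected = two_stage(consensus, [100.0, 40.0, 10.0])

print(consensus)
print(expected)
```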


Prior Probability


Enterprise Miner applies prior probability information to correct probability estimates for oversampling.

[Figure: Actual Class (0, 1) by Decision/Action (0, 1) matrices, with one labelled “Adjusted for Priors”.]
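
The usual textbook correction for oversampling rescales each class's posterior by the ratio of its population prior to its sample proportion and then renormalizes. The sketch below assumes that formula for a binary target; the function and parameter names are illustrative and not taken from Enterprise Miner.

```python
def adjust_for_priors(p, sample_rate, population_rate):
    """Map a posterior from an oversampled training sample back to the population.

    p               -- model's estimated event probability on the sample scale
    sample_rate     -- event proportion in the (oversampled) training data
    population_rate -- event proportion in the population (the prior)
    """
    event = p * population_rate / sample_rate
    non_event = (1 - p) * (1 - population_rate) / (1 - sample_rate)
    return event / (event + non_event)


# Events oversampled to 50% of the training data, but only 5% of the population.
adjusted = adjust_for_priors(0.5, sample_rate=0.5, population_rate=0.05)
print(adjusted)
```

When the sample and population rates are equal, the function returns the original estimate unchanged.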


Profit Matrix


The profit matrix sets the optimal decision cutoff value.

                   solicit    ignore
primary event        15.14         0
secondary event      -0.68         0

$\hat{p} \ge 0.68/15.82 \Rightarrow$ solicit

$\hat{p} < 0.68/15.82 \Rightarrow$ ignore

Bayesian optimal decision threshold:

$$\hat{p} \ge \frac{1}{1 + \dfrac{\delta_{TP} - \delta_{FN}}{\delta_{TN} - \delta_{FP}}} \Rightarrow \text{solicit}$$
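
The cutoff follows from comparing the expected profit of the two decisions. A minimal Python sketch follows. The argument names `tp`, `fn`, `fp`, and `tn` are labels chosen here for the four cells of the profit matrix: solicit or ignore a primary event, and solicit or ignore a secondary event.

```python
def optimal_cutoff(tp, fn, fp, tn):
    """Event probability at which soliciting and ignoring have equal expected profit.

    Solicit when p*tp + (1-p)*fp >= p*fn + (1-p)*tn, i.e. when
    p >= (tn - fp) / ((tp - fn) + (tn - fp)).
    """
    return (tn - fp) / ((tp - fn) + (tn - fp))


def decide(p_hat, cutoff):
    return "solicit" if p_hat >= cutoff else "ignore"


# Profit matrix from the slide: 15.14 and -0.68 for solicit, 0 for ignore.
cutoff = optimal_cutoff(tp=15.14, fn=0.0, fp=-0.68, tn=0.0)

print(cutoff)
print(decide(0.05, cutoff), decide(0.03, cutoff))
```

The result equals 0.68/15.82, so even a modest estimated probability justifies soliciting.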


Conforming Profit


                   solicit      ignore
primary event      $1/\pi_1$         0
secondary event        0      $1/\pi_0$

If no profit matrix is available, use “conforming profit” to properly set the Bayesian optimal cutoff value.

$$\hat{p} \ge \frac{1}{1 + \pi_0/\pi_1} \Rightarrow \text{solicit}$$

where $\pi_1$ is the population proportion of the primary event, and $\pi_0$ is the proportion of the secondary event.
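
A short derivation shows where this cutoff comes from. It assumes a binary target, so the two proportions sum to one, and a rule of soliciting whenever the expected profit of soliciting is at least that of ignoring. Here $\pi_1$ and $\pi_0$ denote the population proportions of the primary and secondary events.

```latex
\begin{aligned}
\hat{p}\cdot\frac{1}{\pi_1} + (1-\hat{p})\cdot 0
  &\ge \hat{p}\cdot 0 + (1-\hat{p})\cdot\frac{1}{\pi_0} \\
\hat{p}\,\pi_0 &\ge (1-\hat{p})\,\pi_1 \\
\hat{p} &\ge \frac{\pi_1}{\pi_0+\pi_1}
  = \frac{1}{1+\pi_0/\pi_1}
  = \pi_1 \qquad (\text{since } \pi_0+\pi_1 = 1)
\end{aligned}
```

In other words, under conforming profit a case is solicited when its estimated probability is at least the population proportion of the primary event.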


Adding SAS Programs


A SAS Code node can run any DATA step or licensed SAS procedure right within the data flow diagram.

This allows you to add SAS procedures and custom code not currently available as nodes in Enterprise Miner.

It also means you do not have to give up your favorite and familiar SAS programs and procedures!

[Figure: SAS Code node editor, with the callout “Your SAS code goes here.”]


Automated Model Assessment


Simultaneous assessment of multiple models using both statistical and graphical information.

Can assess models either on training or holdout data.

Offers a wide array of model selection options including: ASE, c-statistic (ROC index), and misclassification rate.
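
For readers who want to see what the three statistics measure, here is a minimal Python sketch for a binary target. These are the plain textbook definitions, not Enterprise Miner's code, and the four holdout cases are made up.

```python
def ase(y, p):
    """Average squared error between the 0/1 target and the predicted probability."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def misclassification_rate(y, p, cutoff=0.5):
    """Share of cases whose predicted class (p >= cutoff) differs from the target."""
    return sum((pi >= cutoff) != bool(yi) for yi, pi in zip(y, p)) / len(y)


def c_statistic(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical holdout cases: actual target and predicted event probability.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.7]

print(ase(y, p), misclassification_rate(y, p), c_statistic(y, p))
```

Lower is better for ASE and misclassification rate, while a higher c-statistic indicates better ranking of events above non-events.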


Enterprise Miner as a Code Generator


The entire data flow diagram can be output as:

Base SAS code (SAS/STAT is not required)

HTML code

C code


Questions


Contact Information:

David Yeo, Ph.D.

SAS Institute (Canada) Inc.

416-307-4607

david.yeo@sas.com

