# One-Way ANOVA

25 Nov 2013

Cluster Analysis

Grouping Cases or Variables

Clustering Cases

Goal is to cluster cases into groups based
on shared characteristics.

Start out with each case being a one-case cluster.

The clusters are located in k-dimensional space, where k is the number of variables.

Compute the squared Euclidean distance
between each case and every other case.

Squared Euclidean Distance

The sum across variables (from $i = 1$ to $v$) of the squared difference between the score on variable $i$ for the one case ($X_i$) and the score on variable $i$ for the other case ($Y_i$):

$$\sum_{i=1}^{v} (X_i - Y_i)^2$$

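The definition above can be sketched in a few lines of Python. The function name `squared_euclidean` and the two example cases are illustrative assumptions, not part of the original slides.

```python
def squared_euclidean(x, y):
    """Sum over variables of the squared difference between two cases."""
    if len(x) != len(y):
        raise ValueError("both cases must have the same number of variables")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Two hypothetical cases measured on v = 3 variables.
case_x = [1, 2, 3]
case_y = [2, 4, 6]

# (1-2)^2 + (2-4)^2 + (3-6)^2 = 1 + 4 + 9
distance = squared_euclidean(case_x, case_y)
print(distance)  # 14
```
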
Agglomerate

The two cases closest to each other are
agglomerated into a cluster.

The distances between entities (clusters
and cases) are recomputed.

The two entities closest to each other are
agglomerated.

This continues until all cases end up in
one cluster.
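The agglomeration loop described above might look like the following minimal sketch. The slides do not say how the distance between two multi-case clusters is recomputed, so this sketch assumes average (between-groups) linkage on squared Euclidean distances. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def squared_euclidean(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def average_linkage(cases, members_a, members_b):
    """Mean squared distance over all pairs of cases, one from each cluster.

    Assumption: average linkage; other linkage rules are equally possible.
    """
    total = sum(
        squared_euclidean(cases[p], cases[q])
        for p in members_a
        for q in members_b
    )
    return total / (len(members_a) * len(members_b))


def agglomerate(cases):
    """Return the merge schedule as (stage, members_a, members_b, distance)."""
    # Start with every case as its own one-case cluster.
    clusters = {i: [i] for i in range(len(cases))}
    schedule = []
    stage = 0
    while len(clusters) > 1:
        # Find the two entities (cases or clusters) closest to each other.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda ab: average_linkage(cases, clusters[ab[0]], clusters[ab[1]]),
        )
        dist = average_linkage(cases, clusters[a], clusters[b])
        stage += 1
        schedule.append((stage, list(clusters[a]), list(clusters[b]), dist))
        # Merge b into a; the merged cluster keeps the lower label.
        clusters[a] = clusters[a] + clusters.pop(b)
    return schedule


# Hypothetical data: five cases on two variables.
toy_cases = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
toy_schedule = agglomerate(toy_cases)
for row in toy_schedule:
    print(row)
```

With n cases the loop always runs n - 1 stages, because each stage reduces the number of clusters by one.
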

What is the Correct Solution?

You may have theoretical reasons to
expect a certain k-cluster solution.

Look at that solution and see if it matches

Alternatively, you may try to make sense
out of solutions at two or more levels of
the analysis.

Faculty Salaries

Subjects were faculty in Psychology at
ECU.

Variables were rank, experience, number
of publications, course load, and salary.

Data are at ClusterAnonFaculty.sav

Also see the statistical output

Analyze, Classify, Hierarchical Cluster

Statistics

Plots

Method

Save

Proximity Matrix

We did not request this, but if we had it
would display a measure of dissimilarity
for each pair of entities.

The pair of cases with the smallest
squared Euclidean distance is clustered first.

| Stage | Cluster Combined: Cluster 1 | Cluster Combined: Cluster 2 | Coefficients |
|---|---|---|---|
| 1 | 32 | 33 | .000 |

Look at the Agglomeration Schedule.

Cases 32 and 33 are clustered. They
are very similar (distance = 0.000).

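Agglomeration Schedule

| Stage | Cluster Combined: Cluster 1 | Cluster Combined: Cluster 2 | Coefficients | Stage Cluster First Appears: Cluster 1 | Stage Cluster First Appears: Cluster 2 | Next Stage |
|---|---|---|---|---|---|---|
| 1 | 32 | 33 | .000 | 0 | 0 | 9 |
| 2 | 41 | 42 | .000 | 0 | 0 | 6 |
| 3 | 43 | 44 | .000 | 0 | 0 | 6 |
| 4 | 37 | 38 | .000 | 0 | 0 | 5 |
| 5 | 37 | 39 | .001 | 4 | 0 | 7 |
| 6 | 41 | 43 | .002 | 2 | 3 | 27 |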

Stages 2 Through 5

The agglomeration schedule shows that in
Stage 2 cases 41 and 42 are clustered.

In Stage 3 cases 43 and 44 are clustered.

In Stage 4 cases 37 and 38 are clustered.

In Stage 5 case 39 is added to the cluster
that contains cases 37 and 38.

And so on.
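One way to check a reading of the schedule is to replay its rows. The sketch below uses the first six stages listed in the Agglomeration Schedule and assumes that a merged cluster keeps the label in the "Cluster 1" column, which matches how stages 5 and 6 refer back to earlier clusters. The names `first_six_stages` and `replay_schedule` are hypothetical.

```python
# (stage, cluster 1, cluster 2) for the first six stages of the schedule.
first_six_stages = [
    (1, 32, 33),
    (2, 41, 42),
    (3, 43, 44),
    (4, 37, 38),
    (5, 37, 39),
    (6, 41, 43),
]


def replay_schedule(stages):
    """Return {cluster label: set of case numbers} after the given stages.

    Only cases that have been merged at least once appear in the result;
    every other case is still a one-case cluster.
    """
    clusters = {}
    for _stage, first, second in stages:
        # A label not seen before is a single case joining for the first time.
        merged = clusters.pop(first, {first}) | clusters.pop(second, {second})
        clusters[first] = merged
    return clusters


clusters_after_six = replay_schedule(first_six_stages)
for label, members in sorted(clusters_after_six.items()):
    print(label, sorted(members))
# 32 [32, 33]
# 37 [37, 38, 39]
# 41 [41, 42, 43, 44]
```
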

Vertical Icicle, Two Clusters

Look at the top of the display (next slide).

You can see two clusters:

On the left, Boris through Willy

On the right, Deanna through Sunila

The two-cluster solution was adjuncts versus
full-time faculty.

Vertical Icicle, Three Clusters

Look at the second-highest white bar
in the icicle plot.

Now there are three clusters

Junior faculty (Deanna through Mickey)

Senior faculty (Lawrence through Roslyn)

Vertical Icicle, Four Clusters

Look at the white bar furthest to the right.

Now there are four clusters

Junior faculty

The acting chair (Lawrence)

The rest of the senior faculty (Catalina through Roslyn)

The Dendrogram

At the far right you can see the two cluster
solution.

The next step to the left shows the three
cluster solution.

The next step to the left shows the four
cluster solution.

And so on.
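Reading the dendrogram from right to left is equivalent to stopping the agglomeration earlier and earlier. As a rough sketch: with n cases, the k-cluster solution is what remains after the first n - k merges. The merge list below is a hypothetical five-case example, not the faculty data, and `cluster_solution` is an illustrative name.

```python
def cluster_solution(n_cases, merges, k):
    """Return the k-cluster solution as a sorted list of sorted case lists.

    merges is the full schedule as (cluster 1, cluster 2) pairs, where the
    merged cluster keeps the label of cluster 1 (an assumed convention).
    """
    if not 1 <= k <= n_cases:
        raise ValueError("k must be between 1 and the number of cases")
    clusters = {i: {i} for i in range(1, n_cases + 1)}
    # Each merge removes one cluster, so n - k merges leave k clusters.
    for first, second in merges[: n_cases - k]:
        clusters[first] |= clusters.pop(second)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical full schedule for five cases (four merges).
toy_merges = [(1, 2), (3, 4), (1, 3), (1, 5)]

print(cluster_solution(5, toy_merges, 2))  # [[1, 2, 3, 4], [5]]
print(cluster_solution(5, toy_merges, 3))  # [[1, 2], [3, 4], [5]]
```
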

Truncated and rotated dendrogram on next slide.

Compare Two Clusters

The two-cluster solution was adjuncts versus
everybody else.

Look at the t tests in the output

number of publications, course load, and
salary.
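For readers who want to see what such a comparison computes, here is a minimal sketch of an independent-samples t statistic for two clusters on one variable. It assumes the pooled-variance (Student) form; the slides do not say which form was used. The function name and the two small groups of numbers are hypothetical, not values from the faculty data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent-samples t test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical scores on one variable for two clusters.
cluster_one = [2, 4, 6]
cluster_two = [8, 10, 12, 14]

t_value, degrees_of_freedom = pooled_t(cluster_one, cluster_two)
print(round(t_value, 3), degrees_of_freedom)
```

A negative t here simply means the first cluster has the lower mean on that variable.
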

Compare Three Clusters

Look at the ANOVAs and plots.

The senior faculty had higher salary,
experience, rank, and number of pubs.

Compare Four Clusters

The acting chair had a higher salary and
number of publications.

I Could Not Help Myself

With these data on hand, I could not resist
predicting salary from the other variables.

Salary was well correlated with Rank,
FTEs, Publications, and Experience.

In the multiple regression, only Rank and

The residuals suggest who was being
overpaid and who underpaid.

Split by Sex

For men, the unique effect of number of
publications was positive: more
publications, higher salary.

For women it was negative: more
publications, lower salary.

Curious.

Workaholism

Aziz & Zickar (2005)

Workaholics may be defined as those

High in work involvement,

High in drive to work, and

Low in work enjoyment.

For each case, a score was obtained for
each of these three dimensions.

The Three Cluster Solution

Workaholics

High work involvement

High drive to work

Low work enjoyment

Positively engaged workers

High work involvement

Medium drive to work

High work enjoyment

Unengaged workers

Low work involvement

Low drive to work

Low work enjoyment

Past research/theory indicated there
should be six clusters, but the theorized
six clusters were not obtained.

Clustering Variables

FactBeer.sav

The statistical output.

Analyze, Classify, Hierarchical Cluster

Statistics

Plots

Method

Proximity Matrix

Is simply the intercorrelation matrix.

The two most correlated variables are Color and Aroma (r = .909); they are
clustered on the first step.

Stage 2: Size and Alcohol (r = .904) are clustered.
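When variables rather than cases are clustered, the first step is to find the most strongly correlated pair. The sketch below shows the idea with a hand-written Pearson correlation. The ratings are hypothetical stand-ins, not the FactBeer.sav data, and the sketch assumes the signed correlation is used as the similarity measure (not its absolute value).

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def most_correlated(variables):
    """Return the pair of variable names with the highest correlation."""
    return max(
        combinations(variables, 2),
        key=lambda pair: pearson_r(variables[pair[0]], variables[pair[1]]),
    )


# Hypothetical ratings from five respondents on three variables.
toy_ratings = {
    "Color": [1, 2, 3, 4, 5],
    "Aroma": [1, 2, 3, 5, 4],
    "Size": [5, 3, 4, 1, 2],
}

first_pair = most_correlated(toy_ratings)
print(first_pair)  # ('Color', 'Aroma')
```
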

Stage 3: Taste added to the cluster that

Also See Other Tables & Plots

Stage 4: Cost added to the cluster that