More on Choosing #Clusters in General


Nov 25, 2013




References


Breckenridge, James N. (2000), “Validating Cluster Analysis: Consistent Replication and Symmetry,” Multivariate Behavioral Research, 35 (2), 261-285.


Calinski, T. and J. Harabasz (1974), “A Dendrite Method for Cluster Analysis,” Communications in Statistics, 3, 1-27.


Krolak-Schwerdt, Sabine and Thomas Eckes (1992), “A Graph Theoretic Criterion for Determining the Number of Clusters in a Data Set,” Multivariate Behavioral Research, 27 (4), 541-565.


Milligan, Glenn W. and Martha C. Cooper (1985), “An Examination of Procedures for Determining the Number of Clusters in a Data Set,” Psychometrika, 50, 159-179.


Steinley, Douglas and Michael J. Brusco (2011), “Choosing the Number of Clusters in K-Means Clustering,” Psychological Methods, 16 (3), 285-297.

References: Articles


Goodman, Leo A. and William H. Kruskal (1954), “Measures of Association for Cross Classifications,” Journal of the American Statistical Association, 49, 732-764.


Measures like correlations (r’s), but for categorical data


Hartigan, John A. and M. A. Wong (1979), “A K-Means Clustering Algorithm,” Applied Statistics, 28, 100-108.


K-means and the Fortran code (hehehe, how cool & nerdy is that?!)


Johnson, Stephen C. (1967), “Hierarchical Clustering Schemes,” Psychometrika, 32 (3), 241-254.


“Hierarchy” is defined; single-link & complete-link are introduced


Lance, G. N. and W. T. Williams (1967), “A General Theory of Classificatory Sorting Strategies, I. Hierarchical Systems,” Computer Journal, 9, 373-380.


The equation that subsumes single-link, complete-link, average-link, Ward’s method, etc.


Milligan, Glenn W. (1979), “Ultrametric Hierarchical Clustering Algorithms,” Psychometrika, 44 (3), 343-346.


Extends ultrametric distances


Ward, Joe H., Jr. (1963), “Hierarchical Grouping to Optimize an Objective Function,” Journal of the American Statistical Association, 58 (301, March), 236-244.


The Ward of Ward’s method

References: Books


Aldenderfer, Mark S. and Roger K. Blashfield (1984), Cluster Analysis, Newbury Park, CA: Sage.


Great succinct intro


Hartigan, John (1975), Clustering Algorithms, NY: Wiley.


Has the Fortran code for a bunch of algorithms


Sneath, Peter H. A. and Robert R. Sokal (1973), Principles of Numerical Taxonomy, San Francisco: Freeman.


Solid; the examples are from a different field (biology), which is refreshing at the same time


Cluster analysis also appears as a chapter in most multivariate stats books, such as:


Seber, G. A. F. (1984), Multivariate Observations, NY: Wiley, Ch. 7, pp. 347-394.

References: Articles


Arabie, Phipps, J. Douglas Carroll, Wayne DeSarbo, and Jerry Wind (1981), “Overlapping Clustering: A New Method for Product Positioning,” Journal of Marketing Research, 18 (Aug.), 310-317.


Cool model for non-hierarchical clustering


Punj, Girish and David W. Stewart (1983), “Cluster Analysis in Marketing Research: Review and Suggestions for Application,” Journal of Marketing Research, 20 (May), 134-148.


Illustrates a wide variety of applications of clustering



Recommendation Engines & Clustering


Iacobucci, Dawn, Phipps Arabie, and Anand Bodapati (2000), “Recommendation Agents on the Internet,” Journal of Interactive Marketing, 14 (3), 2-11.


Bodapati, Anand V. (2008), “Recommendation Systems with Purchase Data,” Journal of Marketing Research, 45 (Feb.), 77-93.



Other Clustering Applications


Parkman, Margaret A. and Jack Sawyer (1967), “Dimensions of Ethnic Intermarriage in Hawaii,” American Sociological Review, 32 (4), 593-607.





Clustering Related


McCutcheon, Allan L. (1987), Latent Class Analysis, Newbury Park, CA: Sage.


Smithson, Michael and Jay Verkuilen (2006), Fuzzy Set Theory: Applications in the Social Sciences, Thousand Oaks, CA: Sage.