Mining Complex High-Order Datasets
Michael Barnathan
Temple University
Department of Computer and Information Sciences
April 23, 2010
Structure of Presentation
• Introduction
▫ What constitutes “complex” data?
▫ Why functional medical imaging?
▫ Tensor definitions and nomenclature.
▫ Motivation and contributions.
• Background
▫ Literature review.
▫ Matrix decompositions.
  Why SVD, LSA, and their higher-order analogs are still state-of-the-art methods.
  Minimization of Frobenius norm: theoretical justification.
  “Halfway to the Netflix Prize”: empirical justification.
▫ Graph-theoretic interpretation.
▫ Tensor operations and decompositions.
▫ WaveCluster.
• Methods
▫ Tensor-theoretic multidimensional wavelet transform.
▫ Classification.
▫ Clustering.
  WaveCluster (+ several improvements)
  TWaveCluster
  Lloyd + WaveCluster.
▫ Latent Concept Discovery (High-Order LSA).
• Results
▫ Datasets.
▫ Low- and High-Order Classification.
▫ Clustering Results.
▫ Handedness Detection and Discovered Concepts.
▫ Behavior vs. Sparseness.
▫ Approximation Accuracy.
• Conclusion and Future Work
• References
What makes datasets complex?
• Interactions between many features.
▫ Analyzing simultaneous effects of many features.
▫ Nonscalar features.
• Large scale.
▫ Size expands exponentially with order.
▫ Scalability is a major issue.
▫ We should use whatever tricks we can!
  Sparseness.
  Sensitivity to locality.
  Approximation. (Nice if it has some optimality properties too.)
  Compression using a fast method.
• Features themselves can be high-order.
▫ For example, the diffusion tensors used in Diffusion Tensor Imaging encapsulate direction.
▫ More commonly encountered: (1x3) or (1x4) pixels in an RGB image.
▫ Analyzing each component of each feature in isolation is sometimes impractical.
  For example, looking for specific RGB colors entails examining all three channels at a time.
  Most variations of R, G, and B are not independent. Why treat them that way?
▫ If we want to use the full dataset, a matrix is not the right model.
• And combinations of these! Tradeoffs:
▫ Methods for untangling interactive features can be too slow on large-scale datasets.
▫ Locality in space and time can be lost, or false neighborhoods can be created.
  RGB images are high-order when analyzed across all channels.
Medical Imaging
• Primary application of our work: Medical Image Analysis and Computer-Aided Diagnosis.
• A particularly complex imaging domain.
• Spatiotemporal
▫ Spatial locality usually very important.
  Spatial activation patterns correlate to anatomical site, function.
  Normal anatomy produces patterns that would be abnormal elsewhere.
  Determination of border regions particularly important in some applications; e.g. lesion segmentation.
▫ Structural medical data evolves slowly over time.
  e.g. Follow-up images of the same patient.
  Even occurs in normal patients; effects of growth/aging.
  But this could take months or years!
▫ Functional data evolves much more rapidly!
  Changes occur during imaging! Temporal resolutions of seconds or less.
  Space and time (among many factors) need to be evaluated together.
  Patterns may move to different parts of an organ over time.
  e.g., frontal cortex (planning) -> motor cortex (execution) in motor task data.
• Images vary with additional parameters:
▫ Acquisition parameters.
▫ Subject.
▫ Functional task.
• Locality in space and time must be captured.
• Analysis should use all information possible, including subject and experimental design parameters.
“Modes” of Functional Data
3 Spatial Modes
1 Temporal Mode
Trends in both are important.
Subject + task also modes.
The big picture: how do we combine this information?
Tensors
• Matrices can’t model these datasets.
▫ Certainly not all of these modes in tandem.
• Solution: add additional indices.
▫ Result: multidimensional array or “data cube”.
▫ We call this a tensor.
  (The data mining definition does not imply a constraint on coordinate transform behavior. If you’ve never heard of a tensor before, don’t worry about it.)
• The number of indices is the order of the tensor.
▫ “High-Order”: order > 2.
• Each index is a “mode”:
▫ E.g. “Rows” -> Mode 1, “Columns” -> Mode 2.
• Orders: Scalar = order 0, Vector = order 1, Matrix = order 2, Tensor = order r.
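In array terms, a data mining tensor of this kind is simply a multidimensional array: the order is the number of axes and each axis is a mode. A minimal NumPy sketch (the dimensions are illustrative):

```python
import numpy as np

# An order-3 tensor: mode 1 = "rows", mode 2 = "columns", mode 3 = "depth".
X = np.arange(24).reshape(2, 3, 4)

print(X.ndim)   # -> 3: the order of the tensor
print(X.shape)  # -> (2, 3, 4): the size of each mode
```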
Tensors: Advantages and Disadvantages
• Advantages:
▫ Full exploitation of the high-order structure of a dataset:
  Can represent high-order data without losing information or creating false neighborhoods.
▫ Tensor techniques can make inferences across modes.
  E.g. a co-clustering derived from, and reported with similarities to, all modes of the tensor at once.
  Very powerful. Relates modes and describes their underlying meanings.
▫ Excellent representation for spatiotemporal data:
  For example, functional experiments: 3 orders of volumetric data, 1 order for time, 1 order for subject, 1 order for experimental task…
• Disadvantages:
▫ Poor efficiency:
  Storage requirements scale exponentially with order.
  Tensor factorizations can take many iterations to converge.
▫ Methods are global: they do not take spatiotemporal locality into account.
▫ Low-order techniques are much more established in the literature, particularly in the biomedical domain.
  No comprehensive high-order data mining framework currently exists.
• We address all of these.
Contributions
• Although built on established theoretical foundations, tensors are not well studied in biomedical data mining.
▫ We apply our techniques in the domain of computer-aided diagnosis and discover new, potentially clinically relevant patterns derived from simultaneous analysis across all modes in motor task fMRI data.
▫ We present one of the first comparative analyses of tensor and matrix methods within this domain, including performance on a synthetic dataset of varying sparseness.
▫ Our primary dataset is large-scale (9.3 GB, order 6, 2,734,221,600 voxels), thus presenting an additional data mining challenge.
• No comprehensive framework exists for performing common data mining tasks using tensors.
▫ We develop TWave, a higher-order framework for classification, clustering, compression, feature extraction, summarization, and latent concept discovery of tensor data.
▫ In the process of extending our framework, we make additional improvements to the WaveCluster algorithm and derive novel high- and low-order algorithms suited to data mining and biomedical analysis.
▫ We compare the performance of our framework to traditional low-order models.
• Tensor methods can be slow or even intractable on large datasets.
▫ We utilize optimized tensor analysis techniques, such as the Memory-Efficient Tucker decomposition (Kolda and Sun, 2008), and build our methods on more efficient analysis techniques, such as PARAFAC (Harshman, 1970; Carroll and Chang, 1970).
▫ We develop a hybrid approach using wavelets and based on the WaveCluster algorithm (Sheikholeslami, Chatterjee, & Zhang, 2000), compressing our dataset by 98% (9.3 GB to 181 MB) and speeding analysis by two orders of magnitude (8 days to 2 hours) without a loss of subsequent classification accuracy or significant deformation of the discovered concept space.
▫ We assess the performance characteristics of several tensor and matrix approaches as dataset size, sparseness, and approximation accuracy change.
• Most tensor methods ignore spatiotemporal locality.
▫ We utilize multilevel wavelet decompositions and WaveCluster, both of which naturally capture dataset locality, prior to analysis with tensor techniques.
▫ We present an improvement to WaveCluster which deforms grid cells with a prior clustering step.
  This both makes the algorithm context-aware and reduces a potentially massive partial volume effect.
Literature Review:
• Matrix Methods and Singular Value Decomposition:
▫ SVD used for low-rank matrix approximation, PCA (Eckart and Young, 1936).
▫ Latent Semantic Analysis (Deerwester, Dumais, Furnas, Landauer, and Harshman, 1990).
  Used in clustering and co-clustering (Pan, Zhang, Wang, 2008).
  “Folding in”: a powerful recommendation engine used with great success in pursuit of the Netflix Prize (Koren, 2009).
▫ PCA used to remove correlations, ICA developed (Comon, 1994).
▫ Principal Direction Divisive Partitioning – hierarchical clustering using SVD (Boley, 1998).
▫ Incremental linear-time SVD developed (Brand, 2003; Brand, 2006).
▫ Optimal k-means clustering related to PCA subspace (Ding and He, 2004).
• Tensor Decompositions – Milestones:
▫ Introduced by Hitchcock (Hitchcock, 1927).
▫ Tucker decomposition (Tucker, 1963).
▫ PARAFAC / CANDECOMP (Harshman, 1970; Carroll and Chang, 1970).
▫ High-order SVD (special case of the Tucker decomposition) (De Lathauwer, De Moor, Vandewalle, 2000).
▫ Tensor CUR (Mahoney, Maggioni, and Drineas, 2006).
▫ Incremental tensor analysis (Sun, Tao, and Faloutsos, 2006).
▫ Multilinear PCA (and MPCA+LDA for supervised analysis) (Lu, Plataniotis, and Venetsanopoulos, 2008).
• Applications
▫ High-order techniques used primarily in chemometrics: (Bro, 1996; Andersson and Bro, 2000; Niazi and Mohammad, 2006; Niazi and Yazdanipour, 2007).
▫ Facial recognition (“TensorFaces”) and computer vision: (Vasilescu & Terzopoulos, 2002), (Kim, Wong, and Cipolla, 2007).
▫ Handwritten digit recognition: (Savas, 2003).
▫ Network traffic analysis: (Sun, Tao, and Faloutsos, 2006).
▫ Web link analysis: (Kolda and Bader, 2006; Kolda and Bader, 2007).
▫ Spatiotemporal tensor mining: (Sun, Tsourakakis, Hoke, Faloutsos, and Eliassi-Rad, 2008).
• Surveys: (Comon, 2002), (Martin, 2004), (Kolda and Bader, 2007), (Skillicorn, 2007), (Acar and Yener, 2009).
• Gaping hole: four decompositions (SVD, Tucker, PARAFAC, NNMF) are used to do just about everything in this area.
▫ Improvements on the computations, but no real replacements or optimizations.
▫ There’s a reason for this.
Biomedical Applications
• Outside of chemometrics, less utilized than in other domains.
• PARAFAC has been the technique of choice.
▫ (Requires significantly less memory than Tucker, and even Memory-Efficient Tucker, in our experiments.)
• EEG analysis of epilepsy data: (Acar, Aykut-Bingol, Bingol, Bro, & Yener, 2007).
• EEG classification of visual potentials evoked by 3 geometric shapes: (Li, Zhang, and Zhao, 2007).
• MRI studies of lingual morphology: (Zheng, Hasegawa-Johnson, and Pizza, 2003).
• Fusion of EEG and fMRI modalities: (Martínez-Montes, Valdés-Sosa, Miwakeichi, Goldman, and Cohen, 2004).
• The Tensor CUR paper was applied to multispectral histology classification: (Mahoney, Maggioni, & Drineas, 2006).
• There is no comparative analysis of tensor analysis techniques (except CUR vs. Tucker), or of tensor techniques against matrices, in this domain.
▫ We will present such an analysis with our framework.
Why SVD?
• Why hasn’t something better come along?
▫ SVD is ancient.
  Proof of existence and uniqueness: 1936 at the latest.
  Used even earlier.
▫ Yet it’s still used everywhere for co-clustering and recommendation (see prev. slide).
▫ High-order techniques are also based on SVD and used in the same way.
• Theoretical justifications:
▫ SVD is a linearly optimal approximation (min. Frobenius norm): the rank-k truncation A_k minimizes ||A − B||_F over all matrices B of rank at most k (Eckart and Young, 1936).
▫ The concepts are ranked by contribution to dataset variance. If SNR is high, noise tends to go first.
• Empirical justifications:
▫ It works very well. A simple SVD-based recommendation engine came halfway to winning the Netflix Prize early and was a component of the winning solution (Koren, 2009).
▫ Simple to compute and can greatly compress the dataset for subsequent analysis.
Singular Value Decomposition
• Factors an m x n rank-r matrix A into two orthonormal projection matrices U and V, containing left and right singular vectors, and a diagonal core matrix S containing singular values: A = USV^T
• Full SVD: U is m x m, V is n x n, S is m x n.
• Compact SVD: U is m x r, V is n x r, S is r x r.
• Unique decomposition; there is only one SVD of matrix A satisfying the constraints.
• Uses:
▫ Optimal low-rank approximation (minimizes the Frobenius norm of the approximation error).
  Made by arranging singular values (i.e. variance captured) in descending order, zeroing all but the desired number of values, and reconstituting the matrix.
▫ Computation of the pseudoinverse: A* = VS^-1 U^T.
▫ Basis of range and null space:
  Left singular vectors (columns of U) with nonzero singular values form a basis of A’s range.
  Right singular vectors with zero singular values form a basis of A’s null space.
▫ Determination of rank (number of nonzero singular values in S).
▫ Least squares minimization (b = VS^-1 U^T y – probably not the most efficient way to do this).
▫ Co-clustering (LSA) and hierarchical clustering (PDDP).
• Basis for PCA, LSA.
• Straightforward computation based on the eigendecomposition of (A^T A)^(1/2) (in thesis).
[Diagram: A = U S V^T]
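The truncated-SVD approximation described above can be sketched in a few lines of NumPy (the matrix here is illustrative; `low_rank` is a hypothetical helper name):

```python
import numpy as np

def low_rank(A, k):
    """Rank-k approximation of A, optimal in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # compact SVD: A = U @ diag(s) @ Vt
    # Singular values come back in descending order; zero all but the first k.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.default_rng(0).normal(size=(6, 4))
A2 = low_rank(A, 2)
print(np.linalg.matrix_rank(A2))  # -> 2

# The approximation error is exactly the energy in the discarded singular values:
s = np.linalg.svd(A, compute_uv=False)
print(np.allclose(np.linalg.norm(A - A2), np.sqrt((s[2:] ** 2).sum())))  # -> True
```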
Latent Semantic Analysis: Basics
• Use of SVD to extract latent concepts from a term-document matrix (Deerwester, Dumais, Furnas, Landauer, and Harshman, 1990).
▫ Construct a “term-document” matrix of term occurrence frequencies or weights. (Works well with tf-idf weighting.)
▫ Normalize columns.
▫ Perform SVD on the term-document matrix.
▫ The resulting U contains document-concept similarities; V contains term-concept similarities.
▫ S contains the “strength” of concepts: contribution to overall dataset variance.
• Foundation of our approach; works with higher-order decompositions and non-textual datasets as well.
▫ Groups of voxels, groups of subjects, groups of motor tasks…
• “Synonym” detection: terms that share concepts with high similarity are likely synonyms or otherwise related.
• Useful for noise reduction.
▫ If SNR is sufficiently high, noise is naturally represented by “weak” concepts.
▫ Truncation of the SVD may be performed to eliminate concepts: noise tends to go first.
• Unsupervised; the optimal number of concepts to keep may be determined with the aid of the elbow criterion or measures such as the Akaike or Bayesian Information Criteria.
LSA example.
Example term-document matrix for MRI reports and network RFCs:

Document   Ventricles  Hippocampus  Flow  Traffic
MRI-1      24          10           1     0
MRI-2      16          8            0     0
MRI-3      8           6            0     0
NET-1      0           0            6     17
NET-2      0           0            4     13

SVD (94.8%) yields an “MRI” concept and a “NET” concept.
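Running SVD on this example matrix (documents in rows, terms in columns) separates the two document groups; a minimal sketch, omitting the column-normalization step for brevity:

```python
import numpy as np

# Term-document matrix from the slide: documents (MRI-1..3, NET-1..2) in rows,
# terms (Ventricles, Hippocampus, Flow, Traffic) in columns.
A = np.array([[24, 10, 1,  0],
              [16,  8, 0,  0],
              [ 8,  6, 0,  0],
              [ 0,  0, 6, 17],
              [ 0,  0, 4, 13]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Two concepts capture nearly all of the variance (squared singular values):
print((s[:2] ** 2).sum() / (s ** 2).sum())  # close to 1

# MRI documents load on the first concept, NET documents on the second:
print(np.all(np.abs(U[:3, 0]) > np.abs(U[:3, 1])))  # -> True
print(np.all(np.abs(U[3:, 1]) > np.abs(U[3:, 0])))  # -> True
```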
LSA Graphical Example
Rows and columns plotted in the same space!
Left shift is an effect of that instance of “flow” in the MRI-1 document.
Graph Theoretic Interpretation
• LSA represents a co-clustering of rows and columns into the same space.
▫ # clusters = # concepts held.
▫ Cluster memberships = similarities in U, V.
▫ Cluster strengths = singular values in S.
• Very powerful high-order extension: every mode of a tensor projects to a common space.
• Can be done on either the adjacency matrix or the Laplacian of a graph (L = Deg. Mat − Adj. Mat).
• Translates into matching vertices through a common set of conceptual “waypoints” rather than directly from one to another:
Recommendation With “Folding In”
• A means of projecting a previously unseen query into the existing concept space: for a new document row q, q' = q V S^-1.
• q' can now be compared to any row of U (or V).
▫ i.e. cosine similarity: sim(a, b) = (a · b) / (||a|| ||b||).
• The query can be appended to U to seed new queries, but be careful.
▫ The model will lose its optimality as a projection.
▫ Accuracy gradually degrades (Sarwar et al. 2002).
▫ Solution: periodically rebuild the SVD or use an incremental formulation (Brand 2003).
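A sketch of folding in with NumPy, assuming the documents-in-rows orientation used earlier (`fold_in` and `cosine` are hypothetical helper names). Folding in a copy of an existing document recovers exactly that document's row in the truncated U:

```python
import numpy as np

def fold_in(q, V, s):
    """Project a new document (term-weight row vector q) into concept space: q' = q V S^-1."""
    return q @ V @ np.diag(1.0 / s)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(1)
A = rng.random((8, 5))                      # documents x terms
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T   # truncated k-concept space

q = A[3]                                    # a "new" query identical to document 3
q_proj = fold_in(q, Vk, sk)
print(np.allclose(q_proj, Uk[3]))           # -> True: folding in recovers its projection
print(cosine(q_proj, Uk[3]) > 0.99)         # -> True
```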
Folding In Example
Tensor Operations
• Tensor (outer) product.
▫ Generalizes the Kronecker product.
• Extracting slices and fibers.
▫ “Horizontal”, “lateral”, and “frontal” slices; “column”, “row”, and “tube” fibers.
• Unfolding (matricization) and folding.
▫ Unfolding on mode 1 takes the mode-1 fibers and linearizes them into the columns of a matrix.
• Matrix mode product.
▫ Generalizes matrix multiplication.
[Diagram: a small tensor with entries 1–12, unfolded on mode 1.]
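Unfolding and folding can be sketched with NumPy. Note that NumPy's row-major `reshape` orders the fibers differently from the column-major convention common in the tensor literature, but the idea is the same:

```python
import numpy as np

def unfold(X, mode):
    """Matricize: the given mode's fibers become the columns of a matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold, for the same mode and original shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

X = np.arange(24).reshape(2, 3, 4)
X1 = unfold(X, 0)
print(X1.shape)                                 # -> (2, 12)
print(np.array_equal(fold(X1, 0, X.shape), X))  # -> True: folding inverts unfolding
print(np.array_equal(X1[:, 0], X[:, 0, 0]))     # -> True: each column is a mode-1 fiber
```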
LSA Interpretation
• LSA concepts are still usable with higher-order decompositions.
• Projection matrices contain mode-to-concept similarities.
• The core tensor / scaling vector contains concept strengths (contributions to dataset variance).
• Now possible to assess the similarity to concepts of each mode relative to the entire dataset! Not just rows vs. columns.
▫ i.e. discrimination of handedness by subject, taking spatiotemporal patterns and motor task into account.
▫ This is done in parallel on all modes of the dataset.
• Modes can also be ranked and compared directly.
Tucker Decomposition
• High-order analogue of SVD. Let 𝒜 be an order-r tensor. There exists a decomposition of the form: 𝒜 = 𝒮 ×₁ U₁ ×₂ U₂ ⋯ ×ᵣ Uᵣ
▫ Where U₁, U₂, …, Uᵣ are unitary projection matrices on each mode and 𝒮 is a core tensor containing concept strengths.
• Not a unique decomposition.
• No longer possible to ensure both orthonormal projection matrices and a diagonal core tensor.
▫ Orthonormal matrices are generally more desirable.
▫ With orthonormal matrices, a similar optimality guarantee to SVD.
• Computation by alternating least squares in thesis.
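The HOSVD special case mentioned in the literature review gives a simple way to sketch a Tucker decomposition: take orthonormal factors from the SVD of each unfolding, then form the core by mode products. This untruncated version reconstructs exactly; truncating columns of each factor gives the compressed form. `mode_product` and `hosvd` are hypothetical helper names:

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, U, mode):
    """Matrix mode product: multiply tensor X by matrix U along the given mode."""
    Xm = np.moveaxis(X, mode, 0)
    return np.moveaxis(np.tensordot(U, Xm, axes=(1, 0)), 0, mode)

def hosvd(X):
    """Tucker decomposition via HOSVD: orthonormal factors from each unfolding's SVD."""
    Us = [np.linalg.svd(unfold(X, m), full_matrices=False)[0] for m in range(X.ndim)]
    S = X
    for m, U in enumerate(Us):
        S = mode_product(S, U.T, m)  # core: S = X x1 U1^T x2 U2^T ...
    return S, Us

X = np.random.default_rng(2).normal(size=(3, 4, 5))
S, Us = hosvd(X)

# Reconstruct: X = S x1 U1 x2 U2 x3 U3 (exact here, since nothing was truncated).
R = S
for m, U in enumerate(Us):
    R = mode_product(R, U, m)
print(np.allclose(R, X))  # -> True
```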
PARAFAC/CANDECOMP
• High-order analogue to factor analysis / PCA.
▫ Independently discovered by Harshman and by Carroll and Chang in 1970. Same technique, different names.
▫ Demonstrated as a generalization of PCA by (De Lathauwer, De Moor, Vandewalle 2000).
  As with factor analysis, PCA is similar to PARAFAC when order <= 2 and the residuals are homoskedastic.
  Unlike PCA, no approximation optimality guarantee in the high-order case.
▫ Unique, rotationally-sensitive solution.
• Let 𝒜 be an order-r tensor. For a specified number of factors f, there exists a unique decomposition of the form: 𝒜 = Σᵢ₌₁..f λᵢ uᵢ⁽¹⁾ ∘ uᵢ⁽²⁾ ∘ ⋯ ∘ uᵢ⁽ʳ⁾
▫ U⁽¹⁾ … U⁽ʳ⁾ contain factor loadings on their respective modes.
▫ The lambda vector contains the variance captured within factors.
• Computation by alternating least squares in thesis.
[Diagram: the tensor expressed as a sum of rank-1 outer products.]
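The sum-of-rank-1-terms form above leads directly to the alternating least squares computation: fix all factor matrices but one, solve for that one, and rotate. A minimal order-3 sketch (unnormalized, so the λ scaling is absorbed into the factors; `khatri_rao` and `parafac` are hypothetical helper names):

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product, matching the row-major unfolding above."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def parafac(X, f, iters=500, seed=0):
    """Rank-f CANDECOMP/PARAFAC of an order-3 tensor via alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.normal(size=(d, f)) for d in X.shape)
    for _ in range(iters):
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(X, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

def recompose(A, B, C):
    return np.einsum('ir,jr,kr->ijk', A, B, C)

# Build an exact rank-2 tensor, then recover it:
rng = np.random.default_rng(3)
X = recompose(rng.normal(size=(4, 2)), rng.normal(size=(5, 2)), rng.normal(size=(6, 2)))
A, B, C = parafac(X, f=2)
err = np.linalg.norm(X - recompose(A, B, C)) / np.linalg.norm(X)
print(err)  # relative reconstruction error (should be ~0)
```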
Wavelets
• Multiresolution spectral analysis tools.
• The continuous wavelet transform convolves scaled and translated wavelet functions with the signal: W(s, t) = (1/√|s|) ∫ f(x) ψ*((x − t)/s) dx
• Where * denotes the complex conjugate operation, s denotes the scaling parameter, t denotes the translation parameter, ψ denotes the mother wavelet function, and f(x) denotes the original signal.
• The discrete analogue/filter bank is more often used:
• Convolves x with lowpass and highpass filters, downsampling (“decimating”) the signal in half.
▫ Perfect reconstruction still possible for orthogonal wavelets.
▫ (“Quadrature mirror filter”)
• Stationary wavelet transform: upsamples the filters instead.
▫ Since we can reconstruct with only half the signal, this is redundant.
▫ Nevertheless, it is useful for grid-based algorithms (e.g. WaveCluster).
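One level of the filter bank can be sketched with the simplest orthogonal wavelet, Haar (a stand-in for the wavelets used later; `dwt_haar`/`idwt_haar` are hypothetical helper names). Perfect reconstruction holds despite the decimation:

```python
import numpy as np

h = np.array([1, 1]) / np.sqrt(2)   # lowpass (averaging) filter
g = np.array([1, -1]) / np.sqrt(2)  # highpass (differencing) filter

def dwt_haar(x):
    """One filter-bank level: convolve with h and g, decimating by 2."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    return pairs @ h, pairs @ g      # approximation, detail coefficients

def idwt_haar(a, d):
    """Perfect reconstruction from approximation + detail coefficients."""
    return (np.outer(a, h) + np.outer(d, g)).reshape(-1)

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 0.0, 2.0])
a, d = dwt_haar(x)
print(a)                                # pairwise sums scaled by 1/sqrt(2)
print(np.allclose(idwt_haar(a, d), x))  # -> True
```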
Multilevel Wavelet Decomposition
• Multiresolution analysis is possible by cascading a Mallat tree of filters:
• Also possible to create a multidimensional wavelet decomposition by rotating the tensor between levels.
▫ Novel idea: let’s use tensor operations to extend this to structures of arbitrary order.
▫ We’ll come back to this momentarily.
WaveCluster
• A grid- and density-based clustering algorithm using wavelets, by (Sheikholeslami, Chatterjee, and Zhang, 2000).
• A perfect complement to our hybrid wavelet + tensor approach.
• Primary advantages:
▫ No k parameter, as in k-means.
▫ Linear-time.
▫ Multiresolution (uses wavelet decompositions at different scales).
▫ Can identify non-spherical clusters (connectivity rather than distance defines clusters).
▫ Identifies outliers – used in FindOut, an outlier detection method (Yu, Sheikholeslami, and Zhang, 2002).
• Primary disadvantages:
▫ The grid introduces a quantization error.
  A huge partial volume effect is possible if grid cells are too large.
▫ The density threshold and grid size are additional parameters.
▫ The naïve algorithm can’t cluster image intensity data; voxels must be binary.
▫ Cannot identify spatially disjoint clusters.
• Algorithm overview:
▫ Quantizes data to a grid, using the count of each grid cell in place of the original data.
▫ Applies a wavelet transformation using a hat-shaped wavelet (such as the (2,2) or (4,2) biorthogonal wavelets), retaining the approximation coefficients and emphasizing regions in which points cluster.
▫ Thresholds cells in the transformed space. Cells with values above a user-specified density threshold are considered “significant”.
▫ Applies a connected component algorithm to the significant cells to discover and label clusters.
▫ Maps the cells back to the original data using a lookup table built during quantization.
• In-house implementation.
▫ Impossible to find source code, even by asking the authors.
▫ So others can avoid the same trouble, I am open-sourcing it.
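The steps above can be sketched end to end in 2D. This is a simplified illustration, not the released implementation: a small averaging kernel stands in for the hat-shaped wavelet's approximation coefficients, and connected components use a plain BFS with 4-connectivity:

```python
import numpy as np
from collections import deque

def wavecluster_2d(points, grid=16, threshold=1.0):
    """Minimal WaveCluster sketch: quantize -> smooth -> threshold -> label -> unmap."""
    # 1. Quantize points to a grid of counts.
    mins, maxs = points.min(0), points.max(0)
    cells = np.minimum(((points - mins) / (maxs - mins + 1e-9) * grid).astype(int), grid - 1)
    counts = np.zeros((grid, grid))
    for cx, cy in cells:
        counts[cx, cy] += 1
    # 2. Smooth with a 3x3 averaging kernel (stand-in for the wavelet approximation).
    padded = np.pad(counts, 1)
    smooth = sum(padded[i:i + grid, j:j + grid] for i in range(3) for j in range(3)) / 9.0
    # 3. Keep "significant" cells above the density threshold.
    sig = smooth > threshold
    # 4. Label connected components of significant cells (BFS, 4-connectivity).
    labels = np.zeros((grid, grid), dtype=int)
    n = 0
    for sx in range(grid):
        for sy in range(grid):
            if sig[sx, sy] and labels[sx, sy] == 0:
                n += 1
                labels[sx, sy] = n
                queue = deque([(sx, sy)])
                while queue:
                    x, y = queue.popleft()
                    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                        if 0 <= nx < grid and 0 <= ny < grid and sig[nx, ny] and labels[nx, ny] == 0:
                            labels[nx, ny] = n
                            queue.append((nx, ny))
    # 5. Map cluster labels back to the original points (label 0 = outlier).
    return np.array([labels[cx, cy] for cx, cy in cells]), n

rng = np.random.default_rng(4)
blob1 = rng.normal([0, 0], 0.3, size=(200, 2))
blob2 = rng.normal([5, 5], 0.3, size=(200, 2))
point_labels, n_clusters = wavecluster_2d(np.vstack([blob1, blob2]))
print(n_clusters)  # two well-separated blobs should yield 2 clusters
```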
WaveCluster Illustrated
[Diagram: quantization -> DWT -> thresholding -> connected components -> unmapping, with an outlier cell discarded along the way.]
First public implementation. Code available and licensed under the GPL v3.
Tensor-Theoretic Multidimensional Wavelet Transform
• Motivation: necessary for our clustering work.
• Insight: a 2D wavelet transform can be constructed by transforming the rows, then transforming the (row-transformed) columns.
• The generalization is not as straightforward: each mode must be transformed against all other modes.
• Intuition: unfolding guarantees that one mode of the tensor will be made into the rows and all others into the columns of a matrix.
• The DWT can be taken of this representation, then the tensor re-folded and rotated.
• Method: given a tensor X of order r,
▫ Unfold tensor X on mode 2 and transpose (all other modes concatenated in rows, mode 2 becomes columns).
▫ Perform a 1D DWT on each row.
▫ Transpose and fold the tensor. Mode 2 is now transformed across all other modes.
▫ Circularly shift every mode of the tensor by 1.
▫ Repeat r times (all modes will end in their starting positions).
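The method above can be sketched with a single-level Haar DWT standing in for the orthogonal wavelet (mode sizes are assumed even; `tensor_dwt` is a hypothetical helper name). Each pass transforms the current mode 2 across all other modes, then circularly shifts the modes, so after r passes every mode has been transformed once and is back in place:

```python
import numpy as np

h = np.array([1, 1]) / np.sqrt(2)   # Haar lowpass
g = np.array([1, -1]) / np.sqrt(2)  # Haar highpass

def dwt_rows(M):
    """1D Haar DWT of each row: approximation then detail coefficients, concatenated."""
    pairs = M.reshape(M.shape[0], -1, 2)
    return np.concatenate([pairs @ h, pairs @ g], axis=1)

def tensor_dwt(X):
    """Transform every mode: unfold on mode 2 (transposed), DWT rows, fold, rotate, repeat."""
    for _ in range(X.ndim):
        shape = X.shape
        # All other modes concatenated into rows; mode 2 becomes the columns.
        M = np.moveaxis(X, 1, -1).reshape(-1, shape[1])
        M = dwt_rows(M)
        # Fold back, putting the transformed mode 2 in its original position.
        X = np.moveaxis(M.reshape([s for i, s in enumerate(shape) if i != 1] + [shape[1]]), -1, 1)
        X = np.moveaxis(X, 0, -1)  # circularly shift every mode by 1
    return X

X = np.ones((4, 4, 4))
W = tensor_dwt(X)
print(W.shape)     # -> (4, 4, 4): approximation + detail coefficients along each mode
print(W[0, 0, 0])  # all-approximation corner of a constant tensor: 2*sqrt(2)
```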
Illustration
Rotate 90º clockwise. Rotate and repeat.
This code is also available under the GPL v3.
TWave Classification
• High-order locality-preserving classification.
• Apply a multidimensional multilevel DWT on the spatiotemporal modes of the tensor using an orthogonal wavelet (we used Daubechies-4).
• Concatenate each level of approximation and detail coefficients and linearize the wavelets.
▫ Multiresolution analysis: each level corresponds to one resolution.
▫ Purpose: encapsulate neighborhood information in the analysis.
• Retain the structure of the categorical modes of the tensor. (Neighborhoods among these modes do not exist.)
• Optionally threshold wavelet coefficients.
• Normalize each column to have a mean of zero.
• Apply Tucker, PARAFAC, or Multilinear PCA.
▫ Any number of concepts can be retained. The number of distinct classes usually works well.
▫ Fewer concepts => less fidelity but a greater degree of abstraction.
▫ Purpose: inter-modal reasoning and conceptual abstraction.
• If analyzing more than one mode at once (e.g. subject and task combinations), multiply the target modes back together by plugging them into the PARAFAC/Tucker equation (without the λ scaling factor).
• Unfold on mode 2 and transpose, concatenating the rows of the tensor to become the rows of the feature matrix.
• Pass the unfolded and preprocessed matrix to a classifier.
▫ Despite operating on a matrix, the intuitive classification procedure on a tensor remains “features in the columns, observations on all other modes”.
▫ This is exactly what transposing the mode-2 unfolding gives us.
• Transpose and fold the resulting vector of class labels on mode 2 to yield a d1 x 1 x d3 x … tensor containing class labels for each observation on each mode.
▫ The columns (features) are gone, replaced by class labels. This is not a representation of the original tensor, but a convenience for examining results.
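The final unfold-and-classify step can be sketched on a toy tensor, with a 1-NN classifier standing in for the leave-one-out kNN used in the thesis (`feature_matrix` and `nearest_neighbor` are hypothetical helper names):

```python
import numpy as np

def feature_matrix(X):
    """Mode-2 unfolding, transposed: features in the columns, observations on all
    other modes in the rows."""
    return np.moveaxis(X, 1, 0).reshape(X.shape[1], -1).T

def nearest_neighbor(train, train_labels, test):
    """1-NN: assign each test row the label of its closest training row."""
    d = np.linalg.norm(train[None, :, :] - test[:, None, :], axis=2)
    return train_labels[d.argmin(axis=1)]

# Toy tensor: 6 (e.g. subjects) x 4 features x 2 (e.g. tasks), features in mode 2.
rng = np.random.default_rng(5)
X = rng.normal(size=(6, 4, 2))
F = feature_matrix(X)
print(F.shape)  # -> (12, 4): 6 x 2 observations, 4 features each

# Sanity check: each observation is its own nearest neighbor.
labels = nearest_neighbor(F, np.arange(12), F)
print(np.array_equal(labels, np.arange(12)))  # -> True

# Fold the label vector back on mode 2: a d1 x 1 x d3 tensor of class labels.
label_tensor = labels.reshape(6, 2)[:, None, :]
print(label_tensor.shape)  # -> (6, 1, 2)
```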
TWave Concept Discovery
Similar to classification using TWave.
1. Perform the same preprocessing steps as classification, up to centering.
2. Decompose using the suspected number of concepts present in the dataset (e.g. “left-handed” and “right-handed” suggests 2 concepts).
  1. AIC, BIC, the elbow criterion, etc. may help guide, but these are still imprecise.
  2. As with any unsupervised learning, the ideal parameters are guided by domain knowledge.
3. The resulting tensor is a projection into concept space containing mode-to-concept similarities.
4. Discovered concepts may be analyzed and displayed directly or used as input to further analysis.
  1. If plotted, all modes can be shown in the same plot.
  2. If analyzed, (cosine) similarity between projected modes can be used to build a recommendation engine.
  (This is something I’ve done with great success at work using Tanimoto similarity and a sparse tf-idf-weighted graph adjacency matrix, but I unfortunately can’t share those results.)
TWaveCluster
• Based on the WaveCluster algorithm.
▫ Modified to work on real (rather than binary) data.
▫ WaveCluster is a density-based clustering algorithm.
  That means it counts frequencies of voxel appearance under certain constraints (in this case, their grid cells).
  This is precisely what we do to construct the LSA term-document matrix!
  So what we’re really doing is a more sophisticated form of LSA co-clustering.
▫ Idea: replace the connected component algorithm with a decomposition.
▫ Use each voxel’s similarity to each concept as a measure of its cluster membership. Voxels with similar concept memberships will cluster together.
• Advantages:
▫ High-order (the primary advantage of the other techniques as well).
▫ Naturally fuzzy (concept similarities are real-valued).
▫ Can discover spatially disjoint yet similar clusters (Tucker/PARAFAC are not locality-sensitive).
▫ Still integrates preferences for dense neighborhoods (due to the wavelets).
▫ Can cluster across modes of the tensor in parallel.
▫ Built-in cluster validity measure: the strength of concepts!
▫ Comparable efficiency to WaveCluster; the data size is greatly reduced prior to invocation of the tensor decomposition due to the binning and wavelet transformation.
• Disadvantages:
▫ Requires the number of concepts as a parameter.
TWaveCluster Algorithm
• Begin as in WaveCluster:
1. Quantize data to a grid, using the count of each grid cell in place of the original data.
2. Apply a wavelet transformation using a hat-shaped wavelet (such as the (2,2) or (4,2) biorthogonal wavelet), retaining the approximation coefficients.
3. Threshold cells in the transformed space. Cells with values above a user-specified density threshold are considered “significant”.
• Then:
1. Model the significant cells as a tensor X.
2. For a parameter k, run a k-concept PARAFAC analysis on X.
3. For each c from 1 to k, recompose a tensor using only column c of each projection matrix (omit the scaling factor). The resulting tensor contains voxel similarities to concept c.
4. Assign every voxel the cluster label of the concept to which it exhibits the greatest similarity.
5. Threshold: keep the top s% most similar voxels within each cluster concept; discard the others.
Why can’t WaveCluster segment?
• Clustering algorithms are naturally used for image segmentation.
• WaveCluster is generally a poor algorithm to use for this purpose, particularly on real-valued images.
• This applies to TWaveCluster as well.
• Why?
▫ Clean and concise margins are sometimes very important.
▫ Meaning in diagnostic radiology:
  Too wide -> tissue loss and surgical complications.
  Too small -> treatment failure.
▫ WaveCluster is a grid-based algorithm.
  One point in a cell cannot cluster alone. The whole cell is labeled, all or nothing.
  The wavelet transform blurs cells with their neighbors.
  Desirable for capturing locality.
  Not desirable when a rectangular cell crosses a margin! Huge partial volume effect!
  The clustering ends up being “blocky”; the boundaries of sparse areas are eroded, the boundaries of dense areas dilated.
  Good for outlier detection (e.g. the FindOut method, Yu et al.).
  Bad for precise segmentation.
• Potential solution: use (more) smaller grid cells.
▫ Problem: causes the clustering to lose generality and homogeneity, and alters the scale of the analysis.
▫ Clusters are still fundamentally hyperrectangles.
▫ Potentially more false splits.
• Better solution: mold cells to the underlying image margins.
Lloyd + WaveCluster
• Idea: avoid the quantization error associated with WaveCluster by using the k-means centroids discovered by Lloyd iteration as cells.
▫ Seed the algorithm with the WaveCluster grid cells.
▫ Attain new cells from the algorithm.
▫ Map the new grid to these cells.
▫ Run WaveCluster.
▫ Instead of assigning voxels a label, assign grid cells.
▫ Starting state: WaveCluster grid cell boundaries.
• Can achieve better boundary resolution with far fewer cells than WaveCluster.
• Greatly reduces the partial volume effect of the grid.
• Benefits are particularly pronounced when few grid cells are present.
• Tradeoff: efficiency is reduced to that of k-means.
▫ Which in a very small number of cases is super-polynomial.
  If used in a public application (e.g. CBIR), an adversary could exploit this.
  Unlikely, but possible to arise naturally.
▫ This may be an acceptable tradeoff in applications such as determination of margins in medical images.
• Other segmentation approaches are valid here as well.
▫ Watershed, edge detection, level sets, fuzzy connectedness…
▫ Same basic principle.
Experimental Datasets: High-Order
• fMRI digital opposition motor task dataset.
• Four tasks: left finger-to-thumb, left hand squeeze, right finger-to-thumb, right hand squeeze.
• Tasks were strongly periodic.
• 11 subjects.
• 120 time points per subject (resolution = 3 s).
• 79x95x69 voxels per time point.
• Thus a tensor of order 6 and dimensionality 79x95x69x120x4x11: 9.3 GB in size.
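As a quick sanity check on scale, the order and dense element count of this tensor follow directly from its shape (a sketch; the 32-bit dtype is an assumption, and the resulting dense size is the same order of magnitude as the 9.3 GB quoted, which depends on the actual storage format):

```python
from math import prod

# One mode per factor: x, y, z voxel axes, time, task, and subject.
shape = (79, 95, 69, 120, 4, 11)
order = len(shape)               # 6 modes -> an order-6 tensor
elements = prod(shape)           # scalar entries in the dense tensor
gib_f32 = elements * 4 / 2**30   # dense size at 32-bit floats, in GiB
```

Note the exponential blowup: adding a mode multiplies the element count by that mode's dimensionality, which is why sparseness and compression matter so much here.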
Classification Results
• Leave-one-out kNN experiments were performed on voxelwise, wavelet-only, SVD, PARAFAC, TWave, and TWave with MPCA+LDA features:
• SVD and PARAFAC alone were run on a supercomputer (Euler):
▫ 16 2.6 GHz dual-core CPUs.
▫ 128 GB of memory.
• All other methods were run on a dual-processor 2.2 GHz Opteron system with 4 GB of memory.
• The Tucker decomposition initially did not complete due to memory usage; however, we were able to use Memory-Efficient Tucker (Kolda and Sun, 2008).
▫ Ω(n²) memory usage aside, Tucker results were nearly identical to PARAFAC in time, accuracy, and handedness detection, and are not shown.
• The CUR decomposition could not complete on either system and is thus not included.
           Voxels   Wavelets   SVD      PARAFAC   TWave     TWave + MPCA/LDA
Runtime    95 min   112 min    3 days   8 days    133 min   130 min
Subjects   52%      98%        80%      88%       96%       100%
Tasks      34%      68%        56%      52%       74%       93%
Size       9.3 GB   181 MB     9.3 GB   9.3 GB    181 MB    181 MB
Lefties?   No       No         No       Yes       Yes       N/A
(k=5; k=4)
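The evaluation protocol behind these numbers is simple to state; here is a sketch of leave-one-out kNN on toy two-class feature vectors (the features, class structure, and k are illustrative, not the actual wavelet or tensor features):

```python
import numpy as np

def loo_knn_accuracy(X, y, k):
    """Leave-one-out kNN accuracy: each sample is classified by
    majority vote of its k nearest *other* samples."""
    n = len(X)
    correct = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # hold out the query sample
        neighbors = np.argsort(d)[:k]
        vals, counts = np.unique(y[neighbors], return_counts=True)
        if vals[counts.argmax()] == y[i]:      # majority vote
            correct += 1
    return correct / n

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 5)),
               rng.normal(4.0, 1.0, (30, 5))])
y = np.array([0] * 30 + [1] * 30)
acc = loo_knn_accuracy(X, y, k=5)
```

In the experiments above, `X` would hold one compressed feature vector per subject (or per task block), produced by the wavelet or decomposition stage.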
Concept Discovery Results
• TWave was capable of separating right- and left-handed subjects in a 2-concept analysis.
• So was pure PARAFAC, but it took nearly 100x longer to do so, with similar results.
• SVD could not: the unfolding distorted the discovered concept space.
• Although only subjects are shown in the plot, the concepts have linkages to every mode in the tensor.
▫ Every voxel is represented by a point in this space as well, as is every point in time.
▫ And every task, but these were far to the right of all subjects and tightly clustered with each other relative to subjects, indicating they were likely orthogonal to the handedness concept.
This makes sense: variance between subjects = 9066.85; within-subject variance between tasks = 179.29.
Tasks thus project better onto lower concepts and begin to become more discriminative past the 9th concept.
This is positive: it suggests that the pattern may be learned independently of the motor task being performed.
It also means we could probably leave the task mode out and get similar results on subjects.
(Plots: 2-concept subject spaces for TWave, PARAFAC, and SVD.)
Concept Discovery: Tasks
Tasks start to break up on concepts 9-10. (Smaller variance -> higher-numbered concepts.)
2-concept PARAFAC space.
Concept Discovery: Control
• Sex of the subject serves as a task-unrelated control for TWave:
We've never really been sure of him…
TWaveCluster Results:
Original dataset (subject 6, RH).
k-means results (k=4).
Raw TWaveCluster results (k=4). Temporal mean shown.
Precentral gyrus / "motor strip"?
Watch what happens when we threshold…
TWaveCluster Final Results
The most salient cluster is frontal (λ1 = 28,812).
The 2nd most salient cluster is motor (λ2 = 18,083).
The 3rd most salient cluster is caudal (λ3 = 15,958).
The least salient cluster is noisy (λ4 = 11,366).
Lloyd + WaveCluster Results
Large cell sizes (few cells) lead to quantization errors in naïve WaveCluster.
Lloyd + WaveCluster lacks this quantization error, even with few cells.
Many more cells and a lower density threshold are required to improve the resolution of WaveCluster, but now two adjacent cells are merged.
K-means mean cluster time series: subject 2 (LH), k=3.
Order MNIST Results
•
Classification also performed on the low

order MNIST digit recognition dataset.
•
There are many optimizations in the literature, but we were primarily interested in the
comparative behavior of the techniques when the dataset was dense and low

order.
•
Hypothesis: tensor decompositions would degrade to slower versions of their low

order
counterparts on a low

order dataset.
•
And that is essentially what happened: SVD and Tucker were
exactly the same
except for
trivial differences in sign.
▫
PARAFAC was not, but attained similar results.
•
Interesting to note is the scalability: performance on a small dataset is dominated by the
classification itself.
▫
This is not the case on the larger dataset.
▫
Tensor and wavelet decompositions compress the dataset and speed up the subsequent
classification step, thus the faster time.
             Voxels    Wavelets   SVD      Tucker    PARAFAC   TWave
Total Time   213 sec   122 sec    17 sec   135 sec   55 sec    36 sec
Prep. Time   N/A       6 sec      2 sec    96 sec    39 sec    18 sec
Accuracy     95.4%     95.6%      88%      88%       89%       90%
(k = 2)
Approximation Accuracy
• Tucker, PARAFAC, and SVD are all capable of low-rank approximation.
▫ SVD has a theoretical approximation optimality guarantee, which a globally optimal Tucker decomposition should also obey.
• SVD can't operate on tensors natively, but can work on the unfolded representation.
▫ This is part of the point of using Tucker/PARAFAC: SVD can't capture high-order structure.
• Measured on a single (LH) subject's volume represented as a 4th-order tensor.
• Averaged over 5 runs: PARAFAC-ALS and Tucker-ALS are only locally optimal.
• Concepts held ranged from 1 to 11.
• Approximation error was measured using the MSE between the reconstructed tensor and the original.
• Tucker outperforms SVD on a high-order dataset.
▫ On low-order datasets the lines completely overlap.
• Approximation accuracy isn't everything.
▫ (Or we'd use voxelwise methods everywhere.)
• More important is the meaning of the summarized concepts.
▫ e.g., LH vs. RH.
• As previously shown, the PARAFAC results have greater discriminative power by subject.
▫ SVD tends to group all subjects closely together.
• This suggests that the error is due to abstraction of useful information.
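In the matrix case, this experiment reduces to the Eckart-Young construction: truncate the SVD at rank r and measure the MSE of the reconstruction. A sketch on a synthetic low-rank matrix (shapes and ranks illustrative):

```python
import numpy as np

def truncated_svd_mse(A, r):
    """MSE between A and its best rank-r approximation (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_r = (U[:, :r] * s[:r]) @ Vt[:r]      # keep the top r singular triples
    return np.mean((A - A_r) ** 2)

rng = np.random.default_rng(2)
A = rng.normal(size=(40, 8)) @ rng.normal(size=(8, 30))   # rank <= 8
errors = [truncated_svd_mse(A, r) for r in range(1, 10)]  # nonincreasing
```

For the order-4 tensor, Tucker/PARAFAC replace the truncated SVD step (applied to the tensor itself rather than an unfolding); the MSE measurement is unchanged.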
Sparseness
• Our experimental data is rather sparse due to CSF masking and background removal.
▫ We're interested in seeing how that affects performance.
• Since some algorithms can take over a week per run on a large dataset, we assessed performance vs. sparseness on a small dataset:
▫ 1000 x 100 sequential integers from 1 to 10000. (Order 2.)
▫ The bottom x% of entries were changed to zeros per run.
▫ x varied in increments of 10% from 0% to 100%.*
*One element was always kept nonzero; ALS could not run on a dataset of all zeros.
▫ The actual data is irrelevant; all that matters is that all models use the same dataset and that the sparseness is controllable.
• On the small dataset, classification time tended to dominate.
• This was not the case on larger datasets, where decomposition time outstripped it.
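This setup can be sketched as below. Since a 1000 x 100 matrix holds 100,000 entries, the sketch fills it with 1..100000; as noted above, the actual values are irrelevant, only the controllable sparseness matters:

```python
import numpy as np

def sparsify(x_percent):
    """1000 x 100 matrix of sequential integers with the bottom x% of
    entries zeroed; one element is always kept nonzero (for ALS)."""
    data = np.arange(1, 100001, dtype=float).reshape(1000, 100)
    cutoff = int(data.size * x_percent / 100)
    flat = data.ravel()            # view: writes go through to `data`
    flat[:cutoff] = 0.0            # zero out the smallest x% of values
    if not flat.any():             # all-zero input breaks ALS
        flat[-1] = 1.0
    return data

# One dataset per sparseness level, 0% through 100% in steps of 10%.
grids = {x: sparsify(x) for x in range(0, 101, 10)}
```

Each model is then run on the same `grids[x]` so that sparseness is the only variable.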
Future Work
• Incremental learning and better computation.
▫ The actual computation is something we haven't explored much yet.
▫ Is there a way to extend Brand's result (Brand, 2003) to a high-order space?
▫ PARAFAC-ALS converges well but has poor efficiency (Tomasi and Bro, 2006).
• Optimizing for memory usage.
▫ Tucker ran out of memory; Memory-Efficient Tucker barely ran.
▫ We strongly suspect the O(mn²) memory usage is unnecessary.
Frobenius norm: sqrt(Tr(A^T A)).
We just need the diagonal of the covariance matrix, not the whole thing.
▫ An incremental model would solve this problem as well.
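The observation above can be checked numerically: Tr(AᵀA) only needs the diagonal of AᵀA, which is just the column-wise sums of squares of A. A sketch comparing the O(mn²) route through the full matrix with the O(mn) diagonal-only route (matrix sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(500, 200))

# O(mn^2) route: materialize the full 200 x 200 matrix A^T A.
frob_full = np.sqrt(np.trace(A.T @ A))

# O(mn) route: Tr(A^T A) is the sum of squared entries of A,
# i.e., only the diagonal is ever needed, never the whole matrix.
frob_diag = np.sqrt((A * A).sum())
```

Both routes give the same Frobenius norm; only the memory footprint differs.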
• Developing higher-order manifold learning / nonlinear dimensionality reduction techniques.
▫ "Kernel PARAFAC" / "high-order kernel PCA".
▫ Higher-order LLE/Isomap/MVU.
▫ Mathematically modeling abstraction in general is a fascinating area to me.
• DTI dataset and structural registration.
▫ DTI: innately high-order features.
▫ Once spatially registered, DTI can be integrated into an fMRI tensor model as additional modes.
It would be a jackpot if voxels, time, and modes representing the direction of the diffusion gradient clustered to a common set of concepts: this would allow us to infer functional patterns from structural data.
(From a neurological standpoint this ideal condition is unlikely, because the variance of functional patterns is likely much greater than that of structural patterns, but it can certainly be used to pinpoint the association between structural lesions and functional deficits, if nothing else.)
The natural noise-removal properties of the decompositions will also be of great use.
Conclusion
• Certain datasets naturally take the form of higher-order structures.
• Tensors can accurately model this structure. Tensor decompositions allow us to reason across modes of the tensor.
▫ "Conceptual co-clustering": all modes are directly comparable through similarities to a shared set of latent concepts.
▫ "Folding in": performing similarity analysis across modes in this clustered space.
• Tensor methods have been underutilized in the traditional data mining literature. Comprehensive mining frameworks are required.
• Tensor decompositions provide powerful analysis tools that have not yet been put to their full use.
• Naïve methods do not scale and do not utilize spatiotemporal locality information.
▫ Hybrid analysis tools may be required to leverage the power of tensor decompositions while avoiding their generally poor efficiency.
▫ Wavelets capture neighborhoods, speed analysis, compress well, and represent an effective preprocessing tool.
• Tensors have the ability to discover new patterns in high-order biomedical datasets.
Relevant Publications:
• Published:
▫ TWave: High-Order Analysis of Spatiotemporal Data. Michael Barnathan, et al. Accepted for publication in Proceedings of PAKDD 2010, Hyderabad, India.
▫ High-Order Concept Discovery in Functional Brain Imaging. Michael Barnathan, et al. Published in Proceedings of the International Symposium on Biomedical Imaging 2010, Rotterdam, The Netherlands.
▫ Analyzing Tree-Like Structures in Biomedical Images Based on Texture and Branching: An Application to Breast Imaging. Michael Barnathan, Jingjing Zhang, Despina Kontos, Predrag Bakic, Andrew Maidment, and Vasileios Megalooikonomou. Published in Proceedings of the International Workshop on Digital Mammography (IWDM) 2008, Tucson, Arizona, July 20–23, 2008.
▫ Wavelet Analysis of 4D Motor Task fMRI Data. Michael Barnathan, Rui Li, Vasileios Megalooikonomou, Feroze Mohamed, and Scott Faro. Published in Proceedings of Computer Assisted Radiology and Surgery (CARS) 2008, Barcelona, Spain, June 25–28, 2008.
▫ A Texture-Based Methodology for Identifying Tissue Type in Magnetic Resonance Images. Michael Barnathan, et al. Published in Proceedings of the International Symposium on Biomedical Imaging 2008, Paris, France, May 14–17, 2008.
▫ A Web-Accessible Framework for Automated Storage and Analysis of Biomedical Images. Michael Barnathan, Jingjing Zhang, and Vasileios Megalooikonomou. Published in Proceedings of the International Symposium on Biomedical Imaging 2008, Paris, France, May 14–17, 2008.
• In Review:
▫ A Spatiotemporal Clustering Framework for fMRI Time Series Analysis. Rui Li, Michael Barnathan, Vasileios Megalooikonomou, Scott Faro, and Feroze Mohamed. Submitted to Human Brain Mapping.
▫ Efficient Techniques for Database Similarity Searches of Brain Images. Troy Schrader, et al. Submitted to AI in Medicine.
Other Publications:
• A Representation and Classification Scheme for Tree-like Structures in Medical Images: Analyzing the Branching Pattern of Ductal Trees in X-ray Galactograms. Vasileios Megalooikonomou, Michael Barnathan, Despina Kontos, Predrag Bakic, and Andrew D.A. Maidment. Published in IEEE Transactions on Medical Imaging, 28(4), 487–493.
• Probabilistic Branching Node Detection Using AdaBoost and Hybrid Local Features. Tatyana Nuzhnaya, et al. (2nd author). Published in Proceedings of ISBI 2010, Rotterdam, The Netherlands, April 14–17, 2010.
• Spatial Feature Extraction Techniques for the Analysis of Ductal Tree Structures. Aggeliki Skoura, Michael Barnathan, and Vasileios Megalooikonomou. Published in Proceedings of EMBC 2009, Minneapolis, Minnesota, September 2–6, 2009.
• Probabilistic Branching Node Detection Using Hybrid Local Features. Haibin Ling, Michael Barnathan, Vasileios Megalooikonomou, Predrag Bakic, and Andrew D.A. Maidment. Published in Proceedings of ISBI 2009, Boston, Massachusetts, June 28–July 1, 2009.
• Classification of Ductal Tree Structures in Galactograms. Aggeliki Skoura, Michael Barnathan, and Vasileios Megalooikonomou. Published in Proceedings of ISBI 2009, Boston, Massachusetts, June 28–July 1, 2009.
• A High-Level Language for Homeland Security Response Plans. Richard Scherl and Michael Barnathan. Published in Proceedings of the 2005 AAAI Spring Symposium, March 22, 2005.
References
• Kolda, T. G., & Sun, J. (2008). Scalable Tensor Decompositions for Multi-aspect Data Mining. Proceedings of the 8th IEEE International Conference on Data Mining, pp. 363–372.
• Harshman, R. A. (1970). Foundations of the PARAFAC Procedure: Models and Conditions for an "Explanatory" Multimodal Factor Analysis. UCLA Working Papers in Phonetics, 16, 1–84.
• Carroll, J. D., & Chang, J.-J. (1970). Analysis of Individual Differences in Multidimensional Scaling via an n-way Generalization of "Eckart-Young" Decomposition. Psychometrika, 35(3), 283–319.
• Mahoney, M. W., Maggioni, M., & Drineas, P. (2006). Tensor-CUR Decompositions for Tensor-Based Data. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 327–336. Philadelphia: ACM.
• Sheikholeslami, G., Chatterjee, S., & Zhang, A. (2000). WaveCluster: A Wavelet-Based Clustering Approach for Spatial Data. The VLDB Journal, 8, 289–304.
• Eckart, C., & Young, G. (1936). The Approximation of One Matrix by Another of Lower Rank. Psychometrika, 1(3), 183–187.
• Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41, 391–407.
• Comon, P. (1994). Independent Component Analysis: A New Concept? Signal Processing, 36(3), 287–314.
• Boley, D. (1998). Principal Direction Divisive Partitioning. Data Mining and Knowledge Discovery, 2(4), 325–344.
• Brand, M. (2006). Fast Low-rank Modifications of the Thin Singular Value Decomposition. Linear Algebra and its Applications, 415(1), 20–30.
• Brand, M. (2003). Fast Online SVD Revisions for Lightweight Recommender Systems. SIAM International Conference on Data Mining, pp. 37–46.
• Ding, C., & He, X. (2004). K-means Clustering via Principal Component Analysis. Proceedings of the 21st International Conference on Machine Learning, pp. 225–232. Banff, Canada.
• Pan, F., Zhang, X., & Wang, W. (2008). CRD: Fast Co-clustering on Large Datasets Utilizing Sampling-Based Matrix Decomposition. SIGMOD 2008, pp. 173–184. Vancouver, Canada: ACM.
• Hitchcock, F. L. (1927). The Expression of a Tensor or a Polyadic as a Sum of Products. Journal of Mathematical Physics, 6, 164–189.
• Tucker, L. R. (1963). Implications of Factor Analysis of Three-way Matrices for Measurement of Change. Problems in Measuring Change (University of Wisconsin Press).
• De Lathauwer, L., De Moor, B., & Vandewalle, J. (2000). A Multilinear Singular Value Decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4), 1253–1278.
• Sun, J., Tao, D., & Faloutsos, C. (2006). Beyond Streams and Graphs: Dynamic Tensor Analysis. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 374–383.
• Lu, H., Plataniotis, K. N., & Venetsanopoulos, A. N. (2008). MPCA: Multilinear Principal Component Analysis of Tensor Objects. IEEE Transactions on Neural Networks, 19(1), 18–39.
References
• Bro, R. (1996). Multi-Way Calibration. Multi-Linear PLS. Journal of Chemometrics, 10(1), 47–62.
• Andersson, C. A., & Bro, R. (2000). The N-way Toolbox for MATLAB. Chemometrics & Intelligent Laboratory Systems, 52(1), 1–4.
• Niazi, A., & Mohammad, S. (2006). PARAFAC and PLS Applied to Spectrophotometric Determination of Tetracycline in Pharmaceutical Formulation and Biological Fluids. Chemical and Pharmaceutical Bulletin, 54(5), 711–713.
• Niazi, A., & Yazdanipour, A. (2007). PLS and PARAFAC Applied to Determination of Noscapine in Biological Fluids by Excitation-Emission Matrix Fluorescence. Pharmaceutical Chemistry Journal, 41(3), 170–175.
• Vasilescu, M. A., & Terzopoulos, D. (2002). Multilinear Analysis of Image Ensembles: TensorFaces. Proceedings of the 7th European Conference on Computer Vision, pp. 447–460. Springer.
• Kim, T.-K., Wong, S.-F., & Cipolla, R. (2007). Tensor Canonical Correlation Analysis for Action Classification. Proc. CVPR 2007. Minneapolis, Minnesota, USA.
• Savas, B. (2003). Analyses and Tests of Handwritten Digit Recognition Algorithms. Master's Thesis, Linköping University, Sweden.
• Kolda, T. G., & Bader, B. W. (2007). Tensor Decompositions and Applications. Technical Report, Sandia National Laboratories.
• Kolda, T. G., & Bader, B. W. (2006). The TOPHITS Model for Higher-order Web Link Analysis. Workshop on Link Analysis, Counterterrorism and Security.
• Sun, J., Tsourakakis, C. E., Hoke, E., Faloutsos, C., & Eliassi-Rad, T. (2008). Two Heads Better than One: Pattern Discovery in Time-Evolving Multi-Aspect Data. Data Mining and Knowledge Discovery, 17(1), 111–128.
• Comon, P. (2002). Tensor Decompositions, State of the Art and Applications. Mathematics in Signal Processing, 1–24.
• Martin, C. D. (2004). Tensor Decompositions Workshop Discussion Notes. Palo Alto, CA: American Institute of Mathematics (AIM).
• Skillicorn, D. (2007). Understanding Complex Datasets: Data Mining with Matrix Decompositions. CRC Press.
• Acar, E., & Yener, B. (2009). Unsupervised Multiway Data Analysis: A Literature Survey. IEEE Transactions on Knowledge and Data Engineering, 21(1), 6–20.
• Acar, E., Aykut-Bingol, C., Bingol, H., Bro, R., & Yener, B. (2007). Multiway Analysis of Epilepsy Tensors. Proc. ISMB/ECCB 2007, pp. i10–i18. Vienna, Austria.
• Li, J., Zhang, L., & Zhao, Q. (2007). Pattern Classification of Visual Evoked Potentials Based on Parallel Factor Analysis. Proceedings of the International Conference on Cognitive Neurodynamics, pp. 571–575. Springer Netherlands.
• Zheng, Y., Hasegawa-Johnson, M., & Pizza, S. (2003). PARAFAC Analysis of the Three Dimensional Tongue Shape. Journal of the Acoustical Society of America, 113(1), 478–486.
• Martínez-Montes, E., Valdés-Sosa, P. A., Miwakeichi, F., Goldman, R. I., & Cohen, M. S. (2004). Concurrent EEG/fMRI Analysis by Multiway Partial Least Squares. NeuroImage, 22(3), 1023–1034.
• Yu, D., Sheikholeslami, G., & Zhang, A. (2002). FindOut: Finding Outliers in Very Large Datasets. Knowledge and Information Systems, 4(4), 387–412.
• Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7, 179–188.
• Koren, Y. (2009). The BellKor Solution to the Netflix Grand Prize. Retrieved from Netflix Prize: http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf.
• Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2002). Incremental SVD-Based Algorithms for Highly Scalable Recommender Systems. Proc. 5th International Conference on Computer and Information Technology (ICCIT), pp. 27–28.
• Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Thanks.
Questions?