Introduction to
Machine Learning
Multivariate Methods
Name: 李政軒
Multivariate Data
Multiple measurements: d inputs/features/attributes, i.e. a d-variate observation
N instances/observations/examples
Multivariate Parameters
Parameter Estimation
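The slide's estimators can be sketched in numpy (the data below is synthetic; dividing by N gives the maximum-likelihood, i.e. biased, sample covariance):

```python
import numpy as np

# Synthetic data: N = 1000 instances, d = 3 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))

# Sample mean vector: m = (1/N) * sum_t x^t
m = X.mean(axis=0)

# Sample covariance matrix: S = (1/N) * sum_t (x^t - m)(x^t - m)^T
S = (X - m).T @ (X - m) / len(X)
```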
Estimation of Missing Values
What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small
Use 'missing' as an attribute: may give information
Imputation: fill in the missing value
◦ Mean imputation: substitute the mean of that attribute (its most likely value)
◦ Imputation by regression: predict the missing value from the other attributes
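Mean imputation can be sketched as follows (toy data matrix; np.nan marks missing entries):

```python
import numpy as np

# Toy data: N = 5 instances, d = 3 attributes; np.nan marks missing values.
X = np.array([[1.0, 2.0, np.nan],
              [2.0, np.nan, 3.0],
              [3.0, 4.0, 5.0],
              [np.nan, 5.0, 7.0],
              [5.0, 6.0, 9.0]])

# Mean imputation: replace each missing entry with its attribute's mean,
# computed over the observed values only.
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)
```

Imputation by regression would instead fit a model predicting the missing attribute from the others and fill in its prediction.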
Multivariate Normal Distribution
Mahalanobis distance: (x − μ)ᵀ Σ⁻¹ (x − μ)
measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and correlations).
Bivariate: d = 2
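As a sketch (the mean and covariance below are illustrative, not from the slides), the Mahalanobis distance can be computed with numpy; note how the same Euclidean offset gives different distances along axes with different variances:

```python
import numpy as np

# Illustrative bivariate (d = 2) parameters: variance 4 on axis 1, variance 1 on axis 2.
mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])

def mahalanobis(x, mu, Sigma):
    """sqrt((x - mu)^T Sigma^{-1} (x - mu))"""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.solve(Sigma, diff)))

# Both points are Euclidean distance 2 from mu, but the one along the
# high-variance axis is "closer" in Mahalanobis terms.
d1 = mahalanobis(np.array([2.0, 0.0]), mu, Sigma)  # 2 / sigma_1 = 1.0
d2 = mahalanobis(np.array([0.0, 2.0]), mu, Sigma)  # 2 / sigma_2 = 2.0
```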
Multivariate Normal Distribution
Bivariate Normal
Figure: probability contour plot of the bivariate normal distribution. Its center is given by the mean, and its shape and orientation depend on the covariance matrix.
If the x_i are independent, the off-diagonals of Σ are 0, and the Mahalanobis distance reduces to a Euclidean distance weighted by 1/σ_i.
If the variances are also equal, it reduces further to the plain Euclidean distance.
Independent Inputs: Naive Bayes
Parametric Classification
If p(x | C_i) ~ N(μ_i, Σ_i), i.e.
p(x | C_i) = (2π)^(−d/2) |Σ_i|^(−1/2) exp[−(1/2)(x − μ_i)ᵀ Σ_i⁻¹ (x − μ_i)]
Discriminant functions
Estimation of Parameters
Different S_i: quadratic discriminant
Figure: likelihoods, posterior for C_1, and the discriminant P(C_1 | x) = 0.5
Shared common sample covariance S
The discriminant reduces to g_i(x) = w_iᵀ x + w_i0, which is a linear discriminant.
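A sketch of the shared-covariance case, with S pooled over classes weighted by class size and the standard coefficients w_i = S⁻¹ m_i and w_i0 = −½ m_iᵀ S⁻¹ m_i + log P(C_i):

```python
import numpy as np

def pooled_covariance(class_samples):
    """Pool the within-class scatter over all classes, dividing by total N."""
    d = class_samples[0].shape[1]
    S = np.zeros((d, d))
    N = 0
    for X in class_samples:
        m = X.mean(axis=0)
        S += (X - m).T @ (X - m)
        N += len(X)
    return S / N

def linear_discriminant(m, S, prior):
    """Coefficients of g_i(x) = w_i @ x + w_i0:
    w_i = S^{-1} m_i,  w_i0 = -1/2 m_i^T S^{-1} m_i + log P(C_i)."""
    w = np.linalg.solve(S, m)
    w0 = -0.5 * m @ w + np.log(prior)
    return w, w0

# With S = I and equal priors, this reduces to the nearest-mean rule.
w, w0 = linear_discriminant(np.array([1.0, 0.0]), np.eye(2), 0.5)
```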
Common Covariance Matrix
S
Covariances may be arbitrary but shared by both classes.
When the x_j, j = 1, …, d, are independent, Σ is diagonal and
p(x | C_i) = ∏_j p(x_j | C_i)
Classify based on the weighted Euclidean distance (in s_j units) to the nearest mean.
Diagonal S
Variances may be different.
Diagonal S, equal variances
Nearest mean classifier: classify based on Euclidean distance to the nearest mean.
Each mean can be considered a prototype or template, and this is template matching.
All classes have equal, diagonal covariance matrices with equal variances on both dimensions.
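The nearest mean classifier can be sketched directly (the two class means below are illustrative):

```python
import numpy as np

def nearest_mean_predict(x, means):
    """Assign x to the class whose mean (prototype/template) is nearest
    in Euclidean distance: template matching."""
    dists = [np.linalg.norm(x - m) for m in means]
    return int(np.argmin(dists))

# Two illustrative class means.
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
label = nearest_mean_predict(np.array([1.0, 1.0]), means)  # -> 0 (closer to the first mean)
```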
As we increase complexity (a less restricted S), bias decreases and variance increases.
Assume simple models (allow some bias) to control variance (regularization).
Model Selection
Different cases of the covariance matrices fitted to the same data lead to different boundaries.
Binary features: if the x_j are independent (naive Bayes'), the discriminant is linear.
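With p_ij = p̂(x_j = 1 | C_i) estimated per class, summing the log-likelihoods of independent binary features gives a discriminant that is linear in the x_j. A sketch with hypothetical estimates:

```python
import numpy as np

def bernoulli_discriminant(x, p, prior):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i),
    which is linear in the binary inputs x_j."""
    return float(np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) + np.log(prior))

# Hypothetical estimates p_ij for two classes over d = 3 binary attributes.
p1 = np.array([0.9, 0.8, 0.1])
p2 = np.array([0.2, 0.3, 0.7])

x = np.array([1, 1, 0])
pred = 1 if bernoulli_discriminant(x, p1, 0.5) > bernoulli_discriminant(x, p2, 0.5) else 2
```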
Discrete Features
Estimated parameters
Multinomial (1-of-n_j) features: x_j ∈ {v_1, v_2, …, v_{n_j}}
if the x_j are independent
Multivariate linear model: g(x | w_0, w_1, …, w_d) = w_0 + w_1 x_1 + ⋯ + w_d x_d
Multivariate polynomial model: define new higher-order variables
z_1 = x_1, z_2 = x_2, z_3 = x_1², z_4 = x_2², z_5 = x_1·x_2
and use the linear model in this new z space.
Multivariate Regression
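The z-space trick can be sketched end to end: expand the inputs, then fit ordinary least squares in z (the target below is a synthetic, noise-free quadratic, so the weights are recovered up to numerical precision):

```python
import numpy as np

def to_z_space(X):
    """Map (x1, x2) -> (x1, x2, x1^2, x2^2, x1*x2), so a linear model in z
    is a quadratic polynomial model in x."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# Synthetic quadratic target: y = 1 + 2*x1 - x2^2 (no noise).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2

Z = to_z_space(X)
Z1 = np.column_stack([np.ones(len(Z)), Z])   # prepend an intercept column
w, *_ = np.linalg.lstsq(Z1, y, rcond=None)   # ordinary least squares in z space
```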