# Introduction to Machine Learning

Artificial Intelligence and Robotics

16 Oct 2013

## Multivariate Methods

## Multivariate Data

- Multiple measurements: $d$ inputs/features/attributes ($d$-variate)
- $N$ instances/observations/examples

## Multivariate Parameters

## Parameter Estimation
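The estimator formulas on these slides did not survive extraction; the standard sample estimators for the multivariate mean and covariance are:

$$\mathbf{m} = \frac{1}{N}\sum_{t=1}^{N}\mathbf{x}^t, \qquad \mathbf{S} = \frac{1}{N}\sum_{t=1}^{N}(\mathbf{x}^t - \mathbf{m})(\mathbf{x}^t - \mathbf{m})^T$$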

## Estimation of Missing Values

What to do if certain instances have missing attributes?

- Ignore those instances: not a good idea if the sample is small
- Use 'missing' as an attribute: may give information
- Imputation: fill in the missing value
  - Mean imputation: use the most likely value
  - Imputation by regression: predict based on other attributes
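The two imputation strategies above can be sketched with NumPy; the toy data matrix and the choice of which column to regress on are illustrative assumptions:

```python
import numpy as np

# Toy data matrix: N=5 instances, d=3 attributes; np.nan marks missing values.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 4.0],
    [3.0, 6.0, np.nan],
    [4.0, 8.0, 6.0],
    [5.0, 10.0, 7.0],
])

# Mean imputation: replace each missing entry with its column mean.
col_means = np.nanmean(X, axis=0)
X_mean = np.where(np.isnan(X), col_means, X)

# Imputation by regression: predict the missing attribute from another one,
# fitting a line by least squares on the complete rows
# (here: predict column 1 from column 0).
complete = ~np.isnan(X[:, 1])
a, b = np.polyfit(X[complete, 0], X[complete, 1], deg=1)
X_reg = X.copy()
X_reg[~complete, 1] = a * X[~complete, 0] + b
```

Mean imputation ignores correlations between attributes; regression imputation exploits them, which is why it is listed as the stronger option.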

## Multivariate Normal Distribution

Mahalanobis distance:

$$(x - \mu)^T \Sigma^{-1} (x - \mu)$$

measures the distance from $x$ to $\mu$ in terms of $\Sigma$ (it normalizes for differences in variances and correlations).

Bivariate: $d = 2$
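A minimal sketch of computing the squared Mahalanobis distance with NumPy (the mean and covariance values are illustrative):

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    # Solve Sigma * y = d instead of forming the explicit inverse.
    return d @ np.linalg.solve(Sigma, d)

x = np.array([1.0, 1.0])
print(mahalanobis_sq(x, mu, Sigma))
```

With `Sigma = np.eye(2)` the same function returns the squared Euclidean distance, matching the normalization interpretation above.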

## Bivariate Normal

Figure: probability contour plot of the bivariate normal distribution. Its center is given by the mean, and its shape and orientation depend on the covariance matrix.

If the $x_i$ are independent, the off-diagonals of $\Sigma$ are 0 and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:

$$\sum_{i=1}^{d}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2$$

If the variances are also equal, it reduces to the Euclidean distance.
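A small numeric check of the reduction (the means and standard deviations are illustrative values):

```python
import numpy as np

mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 2.0, 1.5])        # per-feature standard deviations
Sigma = np.diag(sigma ** 2)              # diagonal covariance matrix
x = np.array([2.0, 1.0, -1.0])

d = x - mu
maha_sq = d @ np.linalg.solve(Sigma, d)              # (x-mu)^T Sigma^{-1} (x-mu)
weighted_sq = np.sum(((x - mu) / sigma) ** 2)        # 1/sigma_i-weighted Euclidean

print(np.isclose(maha_sq, weighted_sq))  # prints True
```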

## Independent Inputs: Naive Bayes

## Parametric Classification

If $p(x \mid C_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$, the discriminant functions follow from the log class-conditional density and the prior.
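The discriminant expressions on this slide were lost in extraction; with Gaussian class-conditionals they take the standard form:

$$g_i(x) = \log p(x \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log |\Sigma_i| - \frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) + \log P(C_i)$$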

## Estimation of Parameters

With a different $S_i$ estimated per class: the likelihoods, the posterior for $C_1$, and the discriminant boundary where $P(C_1 \mid x) = 0.5$.
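A minimal sketch of this per-class estimation and classification (the synthetic data and function names are my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic 2-D classes with different covariances.
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X1 = rng.multivariate_normal([3, 3], [[0.5, 0.0], [0.0, 2.0]], size=200)

def fit_class(X):
    # Sample mean m_i and sample covariance S_i for one class.
    m = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    return m, S

params = [fit_class(X0), fit_class(X1)]
priors = [0.5, 0.5]

def discriminant(x, m, S, prior):
    # g_i(x) = -0.5 log|S_i| - 0.5 (x-m)^T S_i^{-1} (x-m) + log P(C_i)
    d = x - m
    return (-0.5 * np.log(np.linalg.det(S))
            - 0.5 * d @ np.linalg.solve(S, d)
            + np.log(prior))

def classify(x):
    scores = [discriminant(x, m, S, p) for (m, S), p in zip(params, priors)]
    return int(np.argmax(scores))

print(classify(np.array([0.0, 0.0])), classify(np.array([3.0, 3.0])))
```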

With a shared common sample covariance $S$, the discriminant reduces to a linear discriminant.
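The reduced form was lost in extraction; with the shared covariance the quadratic term cancels, giving the standard linear discriminant (written with the sample estimates $\mathbf{m}_i$, $\mathbf{S}$):

$$g_i(x) = \mathbf{w}_i^T x + w_{i0}, \qquad \mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}^{-1}\mathbf{m}_i + \log \hat{P}(C_i)$$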

## Common Covariance Matrix $S$

Covariances may be arbitrary but shared by both classes.

When the $x_j$, $j = 1, \ldots, d$, are independent, $\Sigma$ is diagonal and

$$p(x \mid C_i) = \prod_{j=1}^{d} p(x_j \mid C_i)$$

Classify based on the weighted Euclidean distance (in $s_j$ units) to the nearest mean.

## Diagonal $S$

Variances may be different.

## Diagonal $S$, equal variances

Nearest mean classifier: classify based on the Euclidean distance to the nearest mean. Each mean can be considered a prototype or template, so this is template matching.
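The nearest-mean (template-matching) rule can be sketched in a few lines; the class means are illustrative and would normally be estimated from training data:

```python
import numpy as np

# One prototype (mean) per class; illustrative 2-D values.
means = np.array([
    [0.0, 0.0],   # class 0
    [4.0, 0.0],   # class 1
    [0.0, 4.0],   # class 2
])

def nearest_mean(x, means):
    # Choose the class whose mean is closest in Euclidean distance.
    dists = np.linalg.norm(means - x, axis=1)
    return int(np.argmin(dists))

print(nearest_mean(np.array([3.0, 1.0]), means))   # closest to [4, 0]
```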


All classes have equal, diagonal covariance matrices of
equal variances on both dimensions.

As we increase complexity (a less restricted $S$), bias decreases and variance increases. Assume simple models (allow some bias) to control variance (regularization).

## Model Selection

Different cases of the covariance matrices fitted to
the same data lead to different boundaries.

Binary features: if the $x_j$ are independent (naive Bayes'), the discriminant is linear.
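The discriminant itself was lost in extraction; in the standard form, with $p_{ij} \equiv p(x_j = 1 \mid C_i)$, it is

$$g_i(x) = \sum_{j=1}^{d}\left[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\right] + \log P(C_i)$$

which is linear in the $x_j$.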

## Discrete Features

Estimated parameters

Multinomial ($1$-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$

if the $x_j$ are independent
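A sketch of naive Bayes over such categorical features, estimating per-class value probabilities from frequencies; the toy data, integer coding, and Laplace smoothing constant are my own assumptions:

```python
import numpy as np

# Toy data: 2 categorical features, integer-coded; n_vals[j] = n_j.
X = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 1, 1, 1])
n_vals = [2, 2]

def fit(X, y, K=2, alpha=1.0):
    # Class priors and smoothed per-class value probabilities.
    priors = np.array([(y == c).mean() for c in range(K)])
    # p[c][j][v] = estimate of p(x_j = v | C_c), Laplace-smoothed by alpha.
    p = [[np.array([((X[y == c, j] == v).sum() + alpha) /
                    ((y == c).sum() + alpha * n_vals[j])
                    for v in range(n_vals[j])])
          for j in range(X.shape[1])] for c in range(K)]
    return priors, p

def predict(x, priors, p):
    # Sum of log prior and per-feature log likelihoods (independence).
    scores = [np.log(priors[c]) + sum(np.log(p[c][j][x[j]])
                                      for j in range(len(x)))
              for c in range(len(priors))]
    return int(np.argmax(scores))

priors, p = fit(X, y)
print(predict([0, 0], priors, p))
```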

## Multivariate Regression

Multivariate linear model

Multivariate polynomial model: define new higher-order variables

$$z_1 = x_1, \quad z_2 = x_2, \quad z_3 = x_1^2, \quad z_4 = x_2^2, \quad z_5 = x_1 x_2$$

and use the linear model in this new $z$ space.
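The $z$-space construction for $d = 2$ can be sketched with NumPy least squares standing in for the linear model; the synthetic target and coefficient values are illustrative:

```python
import numpy as np

def to_z_space(X):
    """Map (x1, x2) to (z1..z5) = (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))
# Synthetic quadratic target; a linear model in z space can fit it exactly.
r = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0]**2 + 3.0 * X[:, 0] * X[:, 1]

Z = to_z_space(X)
A = np.column_stack([np.ones(len(Z)), Z])   # prepend intercept term w0
w, *_ = np.linalg.lstsq(A, r, rcond=None)
print(np.round(w, 3))
```

The fit recovers the generating coefficients because the model that is nonlinear in $x$ is exactly linear in $z$.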
