# Multilevel Simultaneous Component Analysis - Biosystems Data ...


16 Nov. 2013


Multilevel Simultaneous Component Analysis

Bioinformatics 2012

Content

Why MSCA?

How does it work?

Demonstrated with some simple examples.

- Several sources of variation
- Analogy with ANOVA
- In the simplest case, it is a combination of two PCAs.
- In a more general case, it requires special software.

Thought experiment

Glucose levels in blood measured at several points in time

Three individuals

Question: are the individuals similar?

[Figure: Glucose levels. Glucose concentration versus time for the green, blue and red individuals.]

Similarity of the individuals

Red is similar to blue if the mean level is considered.

The dynamic behavior of blue and green is similar.

Red and green are dissimilar at both the mean level and the dynamic level.

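Two-level model for glucose concentration

Data = means + dynamics

Data = mean level + dynamic level

Blue = 2 + sin(t)

Red = 2 + sin(2t)

Green = 3 + sin(t)

$$
\text{DATA} =
\begin{pmatrix} 2 \\ 2 \\ 3 \end{pmatrix}
+
\begin{pmatrix} \sin(t) \\ \sin(2t) \\ \sin(t) \end{pmatrix}
\qquad \text{(rows: blue, red, green)}
$$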
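The two-level model can be checked numerically. The sketch below assumes NumPy and a time grid of two full periods, chosen so that the sinusoids average to zero (the plotted curves span roughly t = 0 to 16 instead). It builds the three glucose curves and splits each one into its mean level and its dynamic level.

```python
import numpy as np

# Time grid over two full periods of sin(t); an assumption chosen so that
# sin(t) and sin(2t) average to (numerically) zero over the grid.
t = np.linspace(0.0, 4.0 * np.pi, 400, endpoint=False)

# Rows: blue, red, green, following the two-level model
data = np.vstack([
    2.0 + np.sin(t),        # blue
    2.0 + np.sin(2.0 * t),  # red
    3.0 + np.sin(t),        # green
])

# Mean level: one value per individual (average over time)
mean_level = data.mean(axis=1, keepdims=True)

# Dynamic level: what remains after removing each individual's mean
dynamic_level = data - mean_level

print(np.round(mean_level.ravel(), 6))
print(np.allclose(dynamic_level[0], dynamic_level[2]))  # blue vs green dynamics
```

Blue and red share a mean level, blue and green share a dynamic level, and red and green share neither, which is the similarity pattern described for the three individuals.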
Multivariate example

Three individuals

Five time points

Measured on two variables

[Figure: Synthetic Dataset: PCA. Scatter plot of x1 versus x2 with the directions PC 1 and PC 2.]

PCA on multilevel data

The first PC points in the direction of maximum variation.

In most cases this is neither the direction of the “between” variation nor that of the “within” variation.

The PCs calculated with PCA therefore confound the different types of variation.
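A small numerical illustration of this point, assuming NumPy and a hypothetical synthetic data set in the spirit of the multivariate example (three individuals, five time points, two variables). The between-individual variation is placed along the x1 axis and the within-individual variation along the diagonal; both directions are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical synthetic data: 3 individuals x 5 time points x 2 variables
d_between = np.array([1.0, 0.0])               # assumed direction of between variation
d_within = np.array([1.0, 1.0]) / np.sqrt(2)   # assumed direction of within variation
levels = np.array([-4.0, 0.0, 4.0])            # one offset per individual
steps = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])  # same time course for every individual
X = levels[:, None, None] * d_between + steps[None, :, None] * d_within
X = X.reshape(-1, 2)                           # rows ordered by individual, then time

def angle_deg(a, b):
    """Angle between two directions in degrees, ignoring the sign of the vectors."""
    c = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

# Ordinary PCA: column-center, then take the first right singular vector
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = vt[0]

print(angle_deg(pc1, d_between), angle_deg(pc1, d_within))
```

With these assumed settings the first PC makes an angle of roughly 18° with the between direction and roughly 27° with the within direction, so it describes neither type of variation on its own.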

[Figure: Synthetic Dataset: within part. Scatter plot of x1 versus x2.]

[Figure: PCA on the within part, with the direction PC 1.]
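The within part is obtained by subtracting each individual's own mean over time. A minimal sketch, assuming NumPy and the same kind of hypothetical synthetic data (the directions are again illustrative assumptions), shows that PCA on this part recovers the within direction.

```python
import numpy as np

# Hypothetical synthetic data: 3 individuals x 5 time points x 2 variables
d_between = np.array([1.0, 0.0])
d_within = np.array([1.0, 1.0]) / np.sqrt(2)
levels = np.array([-4.0, 0.0, 4.0])
steps = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
X = levels[:, None, None] * d_between + steps[None, :, None] * d_within  # shape (3, 5, 2)

# Within part: deviations of each individual from its own mean over time.
# These deviations already have column means of zero, so no extra centering is needed.
X_within = (X - X.mean(axis=1, keepdims=True)).reshape(-1, 2)

# PCA on the within part, using one set of loadings for all individuals
_, s, vt = np.linalg.svd(X_within, full_matrices=False)
p_within = vt[0]

print(abs(p_within @ d_within))  # cosine between PC 1 and the assumed within direction
```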

[Figure: Synthetic Dataset: between part. Scatter plot of x1 versus x2.]

[Figure: PCA on the between part, with the direction PC 1.]
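The between part is each individual's mean minus the grand mean (the offset). A minimal sketch, assuming NumPy and hypothetical synthetic data with illustrative directions, shows that PCA on this part recovers the between direction.

```python
import numpy as np

# Hypothetical synthetic data: 3 individuals x 5 time points x 2 variables
d_between = np.array([1.0, 0.0])
d_within = np.array([1.0, 1.0]) / np.sqrt(2)
levels = np.array([-4.0, 0.0, 4.0])
steps = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
X = levels[:, None, None] * d_between + steps[None, :, None] * d_within  # shape (3, 5, 2)

# Offset: grand mean over all individuals and time points
offset = X.reshape(-1, 2).mean(axis=0)

# Between part: each individual's mean minus the grand mean (one row per individual)
X_between = X.mean(axis=1) - offset

# PCA on the between part
_, s, vt = np.linalg.svd(X_between, full_matrices=False)
p_between = vt[0]

print(abs(p_between @ d_between))  # cosine between PC 1 and the assumed between direction
```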

[Figure: Synthetic Dataset: MSCA. Scatter plot of x1 versus x2 with the directions p_b (between) and p_w (within).]

MSCA

The within and between concepts are analogous to the within and between concepts in ANOVA.

MSCA is an appropriate technique when variation in the data occurs on multiple levels.

In the simplest case, MSCA amounts to performing PCA twice: once on the between part and once on the within part.
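The simplest case can be written as a short function. This is a minimal sketch, assuming NumPy; the names `pca` and `msca_simple` are hypothetical. The between model is fitted to the between part repeated for every row (so individuals with more time points weigh more), and the within model uses one loading matrix shared by all individuals. More constrained MSCA variants are not covered here.

```python
import numpy as np

def pca(Z, n_comp):
    """PCA of an already-centered matrix via SVD; returns scores T and loadings P."""
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    P = vt[:n_comp].T          # loadings: variables x components
    T = Z @ P                  # scores: rows x components
    return T, P

def msca_simple(X, groups, n_between=1, n_within=1):
    """Hypothetical helper: simple MSCA as two separate PCAs.

    X      : (rows x variables) data with all individuals stacked
    groups : (rows,) label of the individual each row belongs to
    """
    X = np.asarray(X, dtype=float)
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    offset = X.mean(axis=0)                                   # grand mean
    means = np.vstack([X[idx == k].mean(axis=0) for k in range(len(labels))])
    X_between_rows = means[idx] - offset                      # between part, one copy per row
    _, P_b = pca(X_between_rows, n_between)                   # between-individual loadings
    T_b = (means - offset) @ P_b                              # one between score row per individual
    X_within = X - means[idx]                                 # deviations from own mean
    T_w, P_w = pca(X_within, n_within)                        # within-individual model
    return {"offset": offset, "T_b": T_b, "P_b": P_b, "T_w": T_w, "P_w": P_w}

# Usage on hypothetical synthetic data (3 individuals x 5 time points x 2 variables)
d_between = np.array([1.0, 0.0])
d_within = np.array([1.0, 1.0]) / np.sqrt(2)
levels = np.array([-4.0, 0.0, 4.0])
steps = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
X = (levels[:, None, None] * d_between + steps[None, :, None] * d_within).reshape(-1, 2)
groups = np.repeat([0, 1, 2], 5)

model = msca_simple(X, groups)
print(model["P_b"].ravel(), model["P_w"].ravel())
```

Unlike the single PCA on the raw data, the two loading vectors here line up with the assumed between and within directions separately.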

MSCA on Metabolomics data

Jeroen J. Jansen¹, Huub C.J. Hoefsloot¹, Marieke Timmerman, Jan van der Greef²,³, Age K. Smilde¹,²

1) Process Analysis and Chemometrics, Department of Chemical Engineering, Faculty of Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam

2) TNO Nutrition and Food Research, PO Box 360, 3700 AJ Zeist

3) Beyond Genomics, 40 Bear Hill Road, Waltham, MA 02451

Contents

Introduction: what is metabolomics?

Metabolomics data

Data analysis of metabolomics data

Data analysis of multilevel data

Results

Monitoring Metabolism

[Diagram: External influences (toxic stress, nutrition, environment, ..., time) + internal influences (genotype, health, age, ...) = variation in urine; panels show the metabolic activity and urine spectra of Monkeys 1, 2 and 3.]

Areas of interest in Metabolomics

A. Monitoring the effect of toxic compounds or nutrition

B. Finding biomarkers that indicate events

Introduction

Variation in metabolomics data

Variation BETWEEN individuals

Variation WITHIN each individual

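ANOVA

$$
\|\mathbf{X}_{\text{total}}\|^2 = \|\mathbf{X}_{\text{offset}}\|^2 + \|\mathbf{X}_{\text{between}}\|^2 + \|\mathbf{X}_{\text{within}}\|^2
$$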
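This sum-of-squares split can be verified numerically. In the sketch below, which assumes NumPy and hypothetical synthetic data with a nonzero offset, the offset and between parts are repeated for every row so that all four matrices have the same size. The cross-products vanish because the within deviations sum to zero for each individual and the between part sums to zero over all rows. The helper name `multilevel_parts` is hypothetical.

```python
import numpy as np

def ss(A):
    """Sum of squares: the squared Frobenius norm of A."""
    return float(np.sum(np.asarray(A) ** 2))

def multilevel_parts(X, groups):
    """Hypothetical helper: split X into offset, between and within parts (same shape as X)."""
    X = np.asarray(X, dtype=float)
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    means = np.vstack([X[idx == k].mean(axis=0) for k in range(len(labels))])
    offset = np.tile(X.mean(axis=0), (X.shape[0], 1))   # grand mean, repeated per row
    between = means[idx] - offset                        # individual mean minus grand mean
    within = X - means[idx]                              # deviation from own mean
    return offset, between, within

# Hypothetical synthetic data with a nonzero offset
d_between = np.array([1.0, 0.0])
d_within = np.array([1.0, 1.0]) / np.sqrt(2)
levels = np.array([-4.0, 0.0, 4.0])
steps = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
X = (levels[:, None, None] * d_between + steps[None, :, None] * d_within).reshape(-1, 2)
X = X + np.array([10.0, 5.0])
groups = np.repeat([0, 1, 2], 5)

offset, between, within = multilevel_parts(X, groups)
print(np.isclose(ss(X), ss(offset) + ss(between) + ss(within)))

# Share of the variation around the offset at each level, in percent
centered = ss(between) + ss(within)
print(100 * ss(between) / centered, 100 * ss(within) / centered)
```

On real data, the same two percentages summarize how the variation around the offset divides between individuals and within them.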

[Figure: Monkeys 1 to 5 in the plane of x1 and x2, showing variation between and within individuals.]

[Figure: MSCA. Between-individual model and within-individual model, each with PC 1 and PC 2, for Monkeys 1 to 5 in the plane of x1 and x2.]

[Diagram: Scores for Monkey 1, Monkey 2, ..., Monkey 10, and for the means.]

Real Data: the monkey urine

“ANOVA”

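$$
\|\mathbf{X}_{\text{total}}\|^2 = \|\mathbf{X}_{\text{offset}}\|^2 + \|\mathbf{X}_{\text{between}}\|^2 + \|\mathbf{X}_{\text{within}}\|^2
$$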

76 %

24 %

MSCA will give a different view on the data

MSCA will give a better model of the longitudinal variation

Difference between scores obtained by PCA and MSCA

[Figure: Scores of monkey 6 on PC 1 versus time (days 0 to 70): MSCA scores and PCA scores.]

[Figure: Scores of monkey 6 on PC 3 versus time (days 0 to 70): MSCA scores and PCA scores.]

PCA and MSCA scores differ more for higher PCs

Between-individual level

[Figure: Scores of Monkeys 1 to 10 on PC1 (33.26 % of variance explained) versus PC2 (29.08 % of variance explained), marked female and male.]

At the between-individual level, a separation between male and female monkeys can be observed.

[Figure: Within-individual level, plotted against chemical shift (0 to 10 ppm); labeled peaks at 2.35, 7.82, 1.32, 1.34, 8.46, 2.93, 3.97, 3.28, 3.05 and 1.93 ppm.]

[Figure: Between-individual level, plotted against chemical shift (0 to 10 ppm); labeled peaks at 3.29, 7.48, 5.07, 2.93, 3.97, 3.28, 3.27, 3.56, 1.93 and 3.05 ppm.]

[Figure: Between-individual level, plotted against chemical shift (0 to 10 ppm); labeled peaks at 7.36, 3.7, 3.93, 3.05, 1.94, 3.69, 3.28, 3.03, 5.07 and 3.27 ppm.]

[Figure: Within-individual level, plotted against chemical shift (0 to 10 ppm); labeled peaks at 7.88, 7.86, 7.48, 1.93, 3.97, 3.03, 3.28, 3.05, 3.56 and 3.27 ppm.]

Conclusions: MSCA

MSCA is a member of the component analysis family that uses ideas from ANOVA.

MSCA is the “correct” method for the longitudinal analysis of multilevel data.