Multilevel Simultaneous Component Analysis - Biosystems Data ...

16 Nov 2013

Multilevel Simultaneous
Component Analysis

Bioinformatics 2012

Content

Why MSCA?
How does it work?
Demonstrated with some simple examples.

- Several sources of variation
- Analogy with ANOVA
- In the simplest case it is a combination of two PCAs.
- In a more general case it requires special software.

Thought experiment


Glucose levels in blood measured at
several points in time



Three individuals



Question: are the individuals similar?

[Figure: Glucose levels. Glucose concentration versus time for the three individuals (green, blue, red).]

Similarity of the individuals

Red is similar to blue if the mean level is considered.
The dynamic behavior of blue and green is similar.
Red and green are dissimilar at both the mean level and the dynamic level.



Two-level model for glucose concentration

Data = means + dynamics

Data = mean level + dynamic level

Blue = 2 + sin(t)
Red = 2 + sin(2t)
Green = 3 + sin(t)

\[
\mathrm{DATA} =
\begin{pmatrix} 2 \\ 2 \\ 3 \end{pmatrix}
+
\begin{pmatrix} \sin(t) \\ \sin(2t) \\ \sin(t) \end{pmatrix}
\]

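
The two-level model for the three glucose curves can be reproduced numerically. The sketch below is only an illustration: the time grid (two full periods, 200 points) and the helper name `correlation` are assumptions that do not come from the slides. It splits each curve into a mean level and a dynamic level and compares the individuals on each level separately.

```python
import numpy as np

# Assumed time grid: two full periods of sin(t), so that the sample mean
# of the sine terms is zero. The slides do not specify the sampling.
t = np.linspace(0.0, 4.0 * np.pi, 200, endpoint=False)

# The three individuals from the two-level model.
data = {
    "blue": 2 + np.sin(t),
    "red": 2 + np.sin(2 * t),
    "green": 3 + np.sin(t),
}

# Mean level: one number per individual.
means = {name: float(y.mean()) for name, y in data.items()}
# Dynamic level: what remains after removing each individual's own mean.
dynamics = {name: y - means[name] for name, y in data.items()}

def correlation(a, b):
    """Pearson correlation between two dynamic profiles (hypothetical helper)."""
    return float(np.corrcoef(a, b)[0, 1])

for name in data:
    print(f"{name}: mean level = {means[name]:.2f}")
print("blue vs green dynamics:", round(correlation(dynamics["blue"], dynamics["green"]), 3))
print("blue vs red dynamics:  ", round(correlation(dynamics["blue"], dynamics["red"]), 3))
```

On the mean level red and blue coincide, while on the dynamic level blue and green coincide, which is the same reading as in the thought experiment.
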
Multivariate example


Three individuals



Five time points



Measured on two variables



[Figure: Synthetic dataset, PCA. Scatter plot of x1 versus x2 (both axes from -15 to 15) with the directions PC 1 and PC 2.]

PCA on multilevel data

The first PC is in the direction of the maximum variation.
In most cases this is neither the direction of the “between” variation nor that of the “within” variation.
The PCs calculated with PCA confound the different types of variation.
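
The confounding can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the two directions `d_between` and `d_within`, the offsets and the time course are hypothetical values, not the synthetic dataset shown in the slides. Only the sizes (three individuals, five time points, two variables) follow the multivariate example.

```python
import numpy as np

# Hypothetical synthetic data: 3 individuals x 5 time points x 2 variables.
d_between = np.array([1.0, 0.0])                     # direction in which individuals differ
d_within = np.array([0.5, np.sqrt(3.0) / 2.0])       # direction of the time course (60 degrees away)
offsets = np.array([-8.0, 0.0, 8.0])                 # one position per individual
time_scores = np.array([-8.0, -4.0, 0.0, 4.0, 8.0])  # same time course for every individual

# Stack all 15 observations into one matrix, ignoring who they belong to.
X = np.array([b * d_between + w * d_within
              for b in offsets for w in time_scores])

# Ordinary PCA: centre the columns and take the first right singular vector.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

def angle_deg(u, v):
    """Angle in degrees between two directions, ignoring sign."""
    c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

print("angle PC1 vs between direction:", round(angle_deg(pc1, d_between), 1))
print("angle PC1 vs within direction: ", round(angle_deg(pc1, d_within), 1))
```

In this construction PC 1 falls between the two generating directions and matches neither of them.
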

[Figure: Synthetic dataset, within part. Scatter plot of x1 versus x2 (both axes from -15 to 15).]

[Figure: PCA on the within part. Scatter plot of x1 versus x2 (both axes from -15 to 15) with the direction PC 1.]

[Figure: Synthetic dataset, between part. Scatter plot of x1 versus x2 (both axes from -15 to 15).]

[Figure: PCA on the between part. Scatter plot of x1 versus x2 (both axes from -15 to 15) with the direction PC 1.]

[Figure: Synthetic dataset, MSCA. Scatter plot of x1 versus x2 (both axes from -15 to 15) with the between direction p_b and the within direction p_w.]

MSCA

The within and between concepts are analogous to the within and between concepts in ANOVA.
MSCA is an appropriate technique if variation in the data occurs on multiple levels.
In the simplest case it amounts to performing PCA twice.
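
The "two PCAs" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: the data are hypothetical, every individual has the same number of time points (so the individual means need no weighting), and `first_pc` is a helper name introduced here.

```python
import numpy as np

def first_pc(M):
    """First principal direction of an already centred matrix M (hypothetical helper)."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[0]

# Hypothetical data with shape (individuals, time points, variables) = (3, 5, 2).
d_between = np.array([1.0, 0.0])
d_within = np.array([0.5, np.sqrt(3.0) / 2.0])
offsets = np.array([-8.0, 0.0, 8.0])
time_scores = np.array([-8.0, -4.0, 0.0, 4.0, 8.0])
X = offsets[:, None, None] * d_between + time_scores[None, :, None] * d_within

# Step 1: remove the overall offset (grand mean of each variable).
grand_mean = X.mean(axis=(0, 1))
Xc = X - grand_mean

# Step 2: between part = mean of each individual, within part = the rest.
individual_means = Xc.mean(axis=1)               # shape (3, 2)
within = Xc - individual_means[:, None, :]       # shape (3, 5, 2)

# Step 3: one PCA per level.
p_b = first_pc(individual_means)                 # between-individual loading
p_w = first_pc(within.reshape(-1, X.shape[2]))   # within-individual loading

print("p_b:", np.round(p_b, 3))
print("p_w:", np.round(p_w, 3))
```

Up to sign, `p_b` and `p_w` recover the two directions used to generate the data. With unequal numbers of observations per individual, or with additional constraints on the within model, this shortcut is no longer sufficient.
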

MSCA on Metabolomics data

Jeroen J. Jansen (1), Huub C.J. Hoefsloot (1), Marieke Timmerman, Jan van der Greef (2,3), Age K. Smilde (1,2)

1) Process Analysis and Chemometrics, Department of Chemical Engineering, Faculty of Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam
2) TNO Nutrition and Food Research, PO Box 360, 3700 AJ Zeist
3) Beyond Genomics, 40 Bear Hill Road, Waltham, MA 02451

Contents


Introduction: what is metabolomics?


Metabolomics data


Data analysis of metabolomics data


Data analysis of multilevel data


Results



Monitoring Metabolism

External influences (toxic stress, nutrition, environment, ...) + internal influences (genotype, health, age, ...) = variation in urine

[Figure: metabolic activity over time for Monkey 1, Monkey 2 and Monkey 3; urine spectra.]

Introduction

Areas of interest in Metabolomics

A. Monitor effect of toxic compounds or nutrition
B. Finding biomarkers that indicate events

Variation in metabolomics data

Variation BETWEEN individuals
Variation WITHIN each individual

ANOVA

\[
\|X_{\mathrm{total}}\|^2 = \|X_{\mathrm{offset}}\|^2 + \|X_{\mathrm{between}}\|^2 + \|X_{\mathrm{within}}\|^2
\]
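
The ANOVA-style split of the total sum of squares into offset, between and within parts can be checked numerically. The sketch below uses randomly generated, hypothetical data (five individuals, eight time points, two variables, fixed seed) and assumes the same number of time points for every individual; none of these values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data with shape (individuals, time points, variables).
X = (rng.normal(size=(5, 8, 2))                  # variation within each individual
     + rng.normal(scale=3.0, size=(5, 1, 2))     # variation between individuals
     + 10.0)                                     # overall offset

offset = X.mean(axis=(0, 1))                     # overall mean per variable
between = X.mean(axis=1) - offset                # individual means relative to the offset
within = X - offset - between[:, None, :]        # deviations from each individual's own mean

n_time = X.shape[1]
n_obs = X.shape[0] * X.shape[1]
ss_total = float(np.sum(X ** 2))
ss_offset = float(n_obs * np.sum(offset ** 2))
ss_between = float(n_time * np.sum(between ** 2))
ss_within = float(np.sum(within ** 2))

print("total  :", round(ss_total, 3))
print("sum    :", round(ss_offset + ss_between + ss_within, 3))
share = ss_between / (ss_between + ss_within)
print("between share of the non-offset variation:", round(100 * share, 1), "%")
```

The three parts add up to the total because the between part sums to zero over individuals and the within part sums to zero over time for each individual.
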



[Figure: scatter plots of x1 versus x2 for Monkey 1 to Monkey 5.]
MSCA

[Figure: Between-individual model. Scatter plot of x1 versus x2 for Monkey 1 to Monkey 5 with PC 1 and PC 2.]

[Figure: Within-individual model. Scatter plot of x1 versus x2 for Monkey 1 to Monkey 5 with PC 1 and PC 2.]


[Figure: diagram showing Monkey 1, Monkey 2, ..., Monkey 10 with means, scores and loadings.]

Real Data: the monkey urine

“ANOVA”

\[
\|X_{\mathrm{total}}\|^2 = \|X_{\mathrm{offset}}\|^2 + \|X_{\mathrm{between}}\|^2 + \|X_{\mathrm{within}}\|^2
\]



76 %

24 %

MSCA will give a different view on the data

MSCA will give a better model of the longitudinal variation

Difference between scores
obtained by PCA and MSCA

[Figure: scores of monkey 6 on PC 1 versus days (0 to 70), comparing MSCA scores and PCA scores.]

[Figure: scores of monkey 6 on PC 3 versus days (0 to 70), comparing MSCA scores and PCA scores.]

PCA and MSCA scores differ more for higher PCs

Between-individual level

[Figure: between-individual scores of monkeys 1 to 10 on PC1 (33.26 % of variance explained) and PC2 (29.08 % of variance explained); legend: female, male.]

At the between-individual level, a separation between male and female monkeys can be observed.

Loadings (rotated)

[Figure: Within-individual loadings for PC 1 versus chemical shift (ppm, 0 to 10). Labeled peaks: 2.35, 7.82, 1.32, 1.34, 8.46, 2.93, 3.97, 3.28, 3.05, 1.93.]

[Figure: Between-individual loadings for PC 1 versus chemical shift (ppm, 0 to 10). Labeled peaks: 3.29, 7.48, 5.07, 2.93, 3.97, 3.28, 3.27, 3.56, 1.93, 3.05.]

[Figure: Between-individual loadings for PC 3 versus chemical shift (ppm, 0 to 10). Labeled peaks: 7.36, 3.7, 3.93, 3.05, 1.94, 3.69, 3.28, 3.03, 5.07, 3.27.]

[Figure: Within-individual loadings for PC 3 versus chemical shift (ppm, 0 to 10). Labeled peaks: 7.88, 7.86, 7.48, 1.93, 3.97, 3.03, 3.28, 3.05, 3.56, 3.27.]

Conclusions: MSCA

MSCA is a member of the component analysis family that uses ideas from ANOVA.
MSCA is the “correct” method for the longitudinal analysis of multilevel data.
More information can be obtained from the monkey urine data using MSCA.