“Cogito ergo sum” …or do I?: Causality vs Statistical ... - inaoe


“Cogito ergo sum” … or do I?: When can Causality be inferred from DPGM

Felipe Orihuela-Espina
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE)

DyNaMo Research Meeting, 3-4th June 2011

Cogito ergo sum

                    Cogito: Present    Cogito: Not present
Sum: Present        1.0                0.5
Sum: Not present    0                  0.5

A familiar (for the audience) graphical representation:
Cogito (Cause) → Sum (Effect)

Why is causality so problematic?

- Cannot be computed from the data alone
- Systematic temporal precedence is not sufficient
- Co-occurrence is not sufficient
- It is not always a direct relation (indirect relations, transitivity/mediation, etc. may be present), let alone linear…
- It may occur across frequency bands
- YOU NAME IT HERE…

Which process causes which?
A very silly example

Causality is so difficult that “it would be very healthy if more researchers abandoned thinking of and using terms such as cause and effect” [Muthen1987 in PearlJ2011]

A real example: An ECG [OrihuelaEspinaF2010]

[KaturaT2006] only claim that there are interrelations (quantified using MI)

THE CONTRIBUTION OF PHILOSOPHY

Causality in Philosophy

- Aristotle’s four “causes” of a thing
  - The material cause (that out of which the thing is made),
  - the formal cause (that into which the thing is made),
  - the efficient cause (that which makes the thing), and
  - the final cause (that for which the thing is made).

Aristotle (384BC-322BC)

In [HollandPW1986]

Causality in Philosophy

- Hume’s legacy
  - Sharp distinction between analytical (thoughts) and empirical (facts) claims
  - Causal claims are empirical
  - All empirical claims originate from experience (sensory input)
- Hume’s three basic criteria for causation
  - (a) spatial/temporal contiguity,
  - (b) temporal succession, and
  - (c) constant conjunction
- It is not empirically verifiable that the cause produces the effect, but only that the cause is invariably followed by the effect.

David Hume (1711-1776)

[HollandPW1986, PearlJ1999_IJCAITalk]

Causality in Philosophy

- Mill’s general methods of experimental enquiry
  - Method of concomitant variation (i.e. correlation…)
  - Method of difference (i.e. causation)
  - Method of residues (i.e. induction)
  - Method of agreement (i.e. null effect: can only rule out possible causes)
- Mill “only” codified these methods; they had been put forth by Sir Francis Bacon 250 years earlier (The Advancement of Learning and Novum Organum Scientiarum)

John Stuart Mill (1806-1873)
Sir Francis Bacon (1561-1626)

In [HollandPW1986]

Causality in Philosophy

- Suppes’s probabilistic theory of causality
  - “…one event is the cause of another if the appearance of the first is followed with a high probability by the appearance of the second, and there is no third event that we can use to factor out the probability relationship between the first and second events”
  - C is a genuine cause of E if:
    - P(E|C)>P(E) (prima facie) and
    - not (P(E|C,D)=P(E|D) and P(E|C,D)>=P(E|C)) (spurious cause)

Patrick Colonel Suppes (1922-), Lucie Stern Professor Emeritus of Philosophy at Stanford

[SuppeP1970, HollandPW1986]
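The two conditions for a genuine cause can be checked mechanically on a small joint distribution. The following is a minimal sketch, assuming three binary events C, E and D and two invented toy distributions (a common cause D, and a direct cause C); the function names and all probabilities are illustrative assumptions, not taken from the slides. Only the event D=1 is tested as the candidate "third event".

```python
import math
from itertools import product

NAMES = ("C", "E", "D")  # order of the values in each joint-table key

def prob(joint, **fixed):
    """Probability that the named events take the given values."""
    return sum(p for values, p in joint.items()
               if all(dict(zip(NAMES, values))[k] == v for k, v in fixed.items()))

def cond(joint, target, given):
    """Conditional probability P(target | given)."""
    return prob(joint, **target, **given) / prob(joint, **given)

def is_prima_facie(joint):
    # P(E|C) > P(E)
    return cond(joint, {"E": 1}, {"C": 1}) > prob(joint, E=1)

def is_spurious(joint):
    # P(E|C,D) = P(E|D) and P(E|C,D) >= P(E|C), tested for D=1 only
    p_e_cd = cond(joint, {"E": 1}, {"C": 1, "D": 1})
    p_e_d = cond(joint, {"E": 1}, {"D": 1})
    p_e_c = cond(joint, {"E": 1}, {"C": 1})
    return math.isclose(p_e_cd, p_e_d) and (p_e_cd >= p_e_c or math.isclose(p_e_cd, p_e_c))

def is_genuine(joint):
    return is_prima_facie(joint) and not is_spurious(joint)

def common_cause_joint():
    """Hypothetical model: D drives both C and E; C and E are independent given D."""
    joint = {}
    for c, e, d in product((0, 1), repeat=3):
        p_c = 0.9 if d else 0.1
        p_e = 0.8 if d else 0.2
        joint[(c, e, d)] = 0.5 * (p_c if c else 1 - p_c) * (p_e if e else 1 - p_e)
    return joint

def direct_cause_joint():
    """Hypothetical model: C drives E; D is an unrelated coin flip."""
    joint = {}
    for c, e, d in product((0, 1), repeat=3):
        p_e = 0.9 if c else 0.2
        joint[(c, e, d)] = 0.5 * 0.5 * (p_e if e else 1 - p_e)
    return joint
```

In the common-cause model C is a prima facie cause of E but is flagged as spurious, because D screens C off from E; in the direct-cause model C passes both tests.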

CAUSALITY: DIFFERENT VIEWS, SAME CONCEPT

Causality requires time!

- “…there is little use in the practice of attempting to discuss causality without introducing time” [Granger, 1969]
- …whether philosophical, statistical, econometrical, topological, etc.

Causality requires directionality!

- Algebraic equations, e.g. regression, “do not properly express causal relationships […] because algebraic equations are symmetrical objects […] To express the directionality of the underlying process, Wright augmented the equation with a diagram, later called path diagram, in which arrows are drawn from causes to effects” [PearlJ2009]
- Feedback and instantaneous causality in any case are a double causation.

From association to causation

- Barriers between classical statistics and causal analysis [PearlJ2009]
  1. Coping with untested assumptions and changing conditions
  2. Inappropriate mathematical notation

Causality

- Zero-level causality: a statistical association, i.e. non-independence, which cannot be removed by conditioning on allowable alternative features.
  - i.e. Granger’s, Topological
- First-level causality: Use of a treatment over another causes a change in outcome
  - i.e. Rubin’s, Pearl’s
- Second-level causality: Explanation via a generating process, provisional and hardly lending itself to formal characterization, either merely hypothesized or solidly based on evidence
  - i.e. Suppes’s, Wright’s path analysis
  - e.g. Smoking causes lung cancer

Scale shown on the slide: Stronger / Weaker

It is debatable whether second-level causality is indeed causality

Inspired from [CoxDR2004]

Variable types and their joint probability distribution

- Variable types:
  - Background variables (B): specify what is fixed
  - Potential causal variables (C)
  - Intermediate variables (I): surrogates, monitoring, pathways, etc.
  - Response variables (R): observed effects
- Joint probability distribution of the variables:
  P(RICB) = P(R|ICB) · P(I|CB) · P(C|B) · P(B)
- …but it is possible to integrate over I (marginalize):
  P(RCB) = P(R|CB) · P(C|B) · P(B)

In [CoxDR2004]
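The factorization and the marginalization over I can be verified numerically. The sketch below assumes all four variables are binary and uses invented conditional probabilities; the function names and numbers are hypothetical and only serve to show that summing the full joint over I gives back P(R|CB) · P(C|B) · P(B).

```python
from itertools import product

# Hypothetical conditional probability tables for binary B, C, I, R.
P_B = {0: 0.7, 1: 0.3}

def bernoulli(p_one, value):
    """Probability of `value` (0 or 1) when P(value = 1) = p_one."""
    return p_one if value else 1.0 - p_one

def p_c_given_b(c, b):
    return bernoulli(0.2 + 0.5 * b, c)

def p_i_given_cb(i, c, b):
    return bernoulli(0.1 + 0.6 * c + 0.2 * b, i)

def p_r_given_icb(r, i, c, b):
    return bernoulli(0.05 + 0.7 * i + 0.1 * c + 0.1 * b, r)

# Chain rule: P(RICB) = P(R|ICB) P(I|CB) P(C|B) P(B)
joint_ricb = {
    (r, i, c, b): p_r_given_icb(r, i, c, b) * p_i_given_cb(i, c, b)
                  * p_c_given_b(c, b) * P_B[b]
    for r, i, c, b in product((0, 1), repeat=4)
}

# Marginalize the intermediate variable I out of the joint.
joint_rcb = {
    (r, c, b): sum(joint_ricb[(r, i, c, b)] for i in (0, 1))
    for r, c, b in product((0, 1), repeat=3)
}

def p_r_given_cb(r, c, b):
    """P(R|CB) obtained by averaging P(R|ICB) over P(I|CB)."""
    return sum(p_r_given_icb(r, i, c, b) * p_i_given_cb(i, c, b) for i in (0, 1))
```

Every entry of `joint_rcb` then equals `p_r_given_cb(r, c, b) * p_c_given_b(c, b) * P_B[b]` up to rounding, which is the reduced factorization on the slide.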

Granger’s Causality

- Granger’s causality:
  - Y is causing X (Y → X) if we are better able to predict X using all available information (Z) than if the information apart from Y had been used.
- The groundbreaking paper:
  - Granger “Investigating causal relations by econometric models and cross-spectral methods” Econometrica 37(3): 424-438
- Granger’s causality is only a statement about one thing happening before another!
  - Rejects instantaneous causality: considered as slowness in recording of information

Sir Clive William John Granger (1934-2009), University of Nottingham, Nobel Prize Winner
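The predictability idea behind Granger's definition can be illustrated with two simulated series. This is a minimal sketch under strong simplifying assumptions that are not in the slides: a single lag, no intercept, ordinary least squares, and "all available information" reduced to the past of the two series. The simulated coefficients and the names `residual_variance`, `granger_gain` and `simulate` are hypothetical.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def residual_variance(target, predictors):
    """Least-squares fit (no intercept) with one or two predictors; mean squared residual."""
    if len(predictors) == 1:
        (p,) = predictors
        coeffs = [dot(p, target) / dot(p, p)]
    else:
        p, q = predictors
        det = dot(p, p) * dot(q, q) - dot(p, q) ** 2
        coeffs = [
            (dot(p, target) * dot(q, q) - dot(p, q) * dot(q, target)) / det,
            (dot(p, p) * dot(q, target) - dot(p, q) * dot(p, target)) / det,
        ]
    fitted = [sum(c * pred[k] for c, pred in zip(coeffs, predictors))
              for k in range(len(target))]
    return sum((t - f) ** 2 for t, f in zip(target, fitted)) / len(target)

def granger_gain(cause, effect):
    """Relative drop in prediction error of `effect` when the past of `cause` is added."""
    now, past_effect, past_cause = effect[1:], effect[:-1], cause[:-1]
    restricted = residual_variance(now, [past_effect])
    full = residual_variance(now, [past_effect, past_cause])
    return 1.0 - full / restricted

def simulate(n=2000, seed=0):
    """Toy system in which the past of y drives x, but not the other way round."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        x_prev, y_prev = x[-1], y[-1]
        y.append(0.5 * y_prev + rng.gauss(0, 1))
        x.append(0.3 * x_prev + 0.8 * y_prev + rng.gauss(0, 1))
    return x, y

x, y = simulate()
gain_y_to_x = granger_gain(cause=y, effect=x)  # clearly positive
gain_x_to_y = granger_gain(cause=x, effect=y)  # close to zero
```

A practical analysis would choose the lag order, test the gain for significance and include further conditioning series in Z; none of that is attempted here.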

Granger’s Causality

- “The future cannot cause the past” [Granger 1969]
  - “the direction of the flow of time [is] a central feature”
- Feedback is a double causation; X → Y and Y → X, denoted X ↔ Y
- “…causality is based entirely on the predictability of some series…” [Granger 1969]
- Causal relationships may be investigated in terms of coherence and phase diagrams

Topological causality

- “A causal manifold is one with an assignment to each of its points of a convex cone in the tangent space, representing physically the future directions at the point. The usual causality in M_O extends to a causal structure in M’.” [SegalIE1981]
  - Causality is seen as embedded in the geometry/topology of manifolds
  - Causality is a curve function defined over the manifold
- The groundbreaking book:
  - Segal IE “Mathematical Cosmology and Extragalactic Astronomy” (1976)
- I am not sure whether Segal is the father of causal manifolds, but his contribution to the field is simply overwhelming

Irving Ezra Segal (1918-1998), Professor of Mathematics at MIT

Causal (homogeneous Lorentzian) Manifolds: The topological view of causality

The cone of causality [SegalIE1981, RainerM1999, MosleySN1990, KrymVR2002]

[Figure: cone of causality, with labels Past, Instant present and Future]

Rubin Causal Model

- Rubin Causal Model:
  - “Intuitively, the causal effect of one treatment relative to another for a particular experimental unit is the difference between the result if the unit had been exposed to the first treatment and the result if, instead, the unit had been exposed to the second treatment”
- The groundbreaking paper:
  - Rubin “Bayesian inference for causal effects: The role of randomization” The Annals of Statistics 6(1): 34-58
- The term Rubin causal model was coined by Paul Holland

Donald B Rubin (1943-), John L. Loeb Professor of Statistics at Harvard

Rubin Causal Model

- Causality is an algebraic difference:
  treatment causes the effect Y_treatment(u) - Y_control(u)
- …or in other words; the effect of a cause is always relative to another cause [HollandPW1986]
- Rubin causal model establishes the conditions under which associational (e.g. Bayesian) inference may infer causality (makes assumptions for causality explicit).

Fundamental Problem of Causal Inference

- Only Y_treatment(u) or Y_control(u) can be observed on a phenomenon, but not both.
- Causal inference is impossible without making untested assumptions
  - …yet causal inference is still possible under uncertainty [HollandPW1986] (two otherwise identical populations u must be prepared and all appropriate background variables must be considered in B).
- Again! (see slide “Statistical dependence vs Causality”); Causal questions cannot be computed from the data alone, nor from the distributions that govern the data [PearlJ2009]
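The algebraic difference Y_treatment(u) - Y_control(u) and the fact that only one of the two terms is ever observed can be made concrete with a simulation. The sketch below is an illustration under invented assumptions: every unit is given both potential outcomes (something only a simulation can do), assignment is a fair coin flip, and all numbers and function names are hypothetical.

```python
import random

def simulate_units(n=10000, seed=1):
    """Each unit u carries both potential outcomes (Y_treatment(u), Y_control(u))."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y_control = rng.gauss(10.0, 2.0)
        y_treatment = y_control + rng.gauss(2.0, 0.5)  # unit-level effect around 2
        units.append((y_treatment, y_control))
    return units

def randomised_experiment(units, seed=2):
    """Reveal exactly one potential outcome per unit, chosen by a fair coin."""
    rng = random.Random(seed)
    treated, control = [], []
    for y_treatment, y_control in units:
        if rng.random() < 0.5:
            treated.append(y_treatment)   # Y_control(u) stays unobserved
        else:
            control.append(y_control)     # Y_treatment(u) stays unobserved
    return treated, control

def mean(values):
    return sum(values) / len(values)

units = simulate_units()
# Known only because this is a simulation:
true_average_effect = mean([yt - yc for yt, yc in units])
treated, control = randomised_experiment(units)
# What an experimenter can actually compute:
estimated_average_effect = mean(treated) - mean(control)
```

No unit-level effect is recoverable from `treated` and `control`, yet under the (here true by construction) randomization assumption the difference of group means approximates the average effect.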

Relation between Granger, Rubin and Suppes causalities

                                   Granger    Rubin’s model
Cause (Treatment)                  Y          t
Effect                             X          Y_treatment(u)
All other available information    Z          Z (pre-exposure variables)

- Granger’s noncausality:
  X is not a Granger cause of Y (relative to the information in Z) if X and Y are conditionally independent given Z (i.e. P(Y|X,Z)=P(Y|Z))
- Granger’s noncausality is equal to Suppes’s spurious case

Modified from [HollandPW1986]

Pearl’s statistical causality (a.k.a. structural theory)

- “Causation is encoding behaviour under intervention […] Causality tells us which mechanisms [stable functional relationships] is to be modified [i.e. broken] by a given action” [PearlJ1999_IJCAI]
- Causality, intervention and mechanisms can be encapsulated in a causal model
- The groundbreaking book:
  - Pearl J “Causality: Models, Reasoning and Inference” (2000)*
- Pearl’s results do establish conditions under which first-level causal conclusions are possible [CoxDR2004]

Judea Pearl (1936-), Professor of computer science and statistics at UCLA
Sewall Green Wright (1889-1988), Father of path analysis (graphical rules)

[PearlJ2000, Lauritzen2000, DawidAP2002]

* With permission of his 1995 Biometrika paper, a masterpiece

Statistical causality

- Conditioning vs Intervening [PearlJ2000]
  - Conditioning: P(R|C) = Σ_B P(R|CB) P(B|C)
    - useful but inappropriate for causality, as changes in the past (B) occur before intervention (C)
  - Intervention: P(R|do(C)) = Σ_B P(R|CB) P(B)
    - Pearl’s definition of causality
- Underlying assumption: The distribution of R (and I) remains unaffected by the intervention.
  - Watch out! This is not trivial: serious interventions may distort all relations [CoxDR2004]
- When β_CB = 0 (structural coefficient), i.e. C ⊥ B (conditional independence), P(R|C) = P(R|do(C))
  - i.e. there is no difference between conditioning and intervention
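The difference between the two formulas is only the weight given to B: P(B|C) when conditioning, P(B) when intervening. The sketch below evaluates both on a hypothetical three-variable model (B influences C and R, C influences R) with invented probabilities, and then again with C made independent of B; the names and numbers are assumptions for illustration only.

```python
P_B = {0: 0.5, 1: 0.5}  # hypothetical background distribution

def p_r1_given_cb(c, b):
    """Hypothetical response model P(R=1 | C=c, B=b)."""
    return 0.1 + 0.3 * c + 0.5 * b

def conditioning(c, p_c1_given_b):
    """P(R=1 | C=c) = sum_B P(R=1|CB) P(B|C), with P(B|C) from Bayes' rule."""
    def p_c_given_b(b):
        return p_c1_given_b[b] if c else 1.0 - p_c1_given_b[b]
    p_c = sum(p_c_given_b(b) * P_B[b] for b in (0, 1))
    return sum(p_r1_given_cb(c, b) * p_c_given_b(b) * P_B[b] / p_c for b in (0, 1))

def intervention(c):
    """P(R=1 | do(C=c)) = sum_B P(R=1|CB) P(B)."""
    return sum(p_r1_given_cb(c, b) * P_B[b] for b in (0, 1))

confounded = {0: 0.2, 1: 0.8}   # P(C=1|B=b): C depends on B
independent = {0: 0.5, 1: 0.5}  # P(C=1|B=b): C independent of B

p_cond_confounded = conditioning(1, confounded)    # about 0.80
p_do = intervention(1)                             # about 0.65
p_cond_independent = conditioning(1, independent)  # about 0.65
```

With the confounded assignment the two quantities differ; once C is independent of B they coincide, matching the last bullet of the slide. The sketch takes the slide's underlying assumption for granted, namely that P(R|CB) is the same before and after the intervention.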

LOOKING FOR CAUSALITY: DYNAMIC PROBABILISTIC CAUSAL MODELS AND SOME OTHER ANALYTICAL TOOLS

Some tools for looking at causality …beyond the interest of this research meeting

- Structural Causal Models* and Path Analysis [WrightS1921, 1932, PearlJ2009]
- Structural Equation Modelling [WrightS1921, PearlJ2011]
- Directed Transfer Function [Kaminski 1991, 2001 and 2005]
- Dynamic Causal Modelling [FristonKJ2003]
- Partial Directed Coherence [BaccaláLA2001]


Bayesian Networks

Well, this one is of interest… as it is the father of probabilistic dynamic models

- Bayesian networks are structures (often in the form of a graph) describing probabilistic relationships between variables [PearlJ2000, KaminskiM2005]
- Conditional independencies are represented by missing edges
- Arrows convey causal directionality but merely indicate the possibility of a causal relation (i.e. they are only a notational clue); implication of causality must be discarded as inadequate [PearlJ2009]
- Conditional distributions, e.g. P(X|Y), determine associational distributions [HollandPW1986]

Causal Bayesian Networks

- The problem of Identification:
  - Can the controlled (post-intervention) distribution P(R|do(C)) be estimated from data governed by the pre-intervention distribution P(RCB)?
- The answer is a “yes, but…”
  - …i.e. as long as we account for general control of confounding and counterfactuals, admissibility, Markovian graphs (i.e. acyclic graph), ignorability, and a few other criteria beyond my humble human limitation, seasoned with a good dose of inscrutable maths.
- Some recommended reading if you are up to the challenge: [PearlJ2000, 2009, Lauritzen2000, DawidAP2002]

Dynamic Graphical Models

- Tian’s theorem:
  - “A sufficient condition for identifying a causal effect P(R|do(C)) is that every path between C and any of its children traces at least one arrow emanating from a measured variable I”
  - Translation to plain English: You ought to account for confounders (which are also part of your graph), and causal relations must cross through those confounders (i.e. they have been taken into account)
  - Note that Tian’s theorem is sufficient but not necessary, i.e. direct links C → R may still encode direct causality
  - More translation to plain English: P(R|do(C)) cannot encode questions of attribution (e.g. how many deaths are due to specific exposure?) or of susceptibility (e.g. how many would have got diseased if exposed?)
  - Note the important implication that a thoroughly/carefully designed randomized control trial may not suffice!

Dynamic Graphical Models: A common error when using them

Correct methodology of structural approach to causation [PearlJ2009]:
1. Define the target quantity
2. Assume: Formulate causal assumptions
3. Identify: Determine if the target is identifiable
4. Estimate: i.e. approximate

Common application of the methodology of structural approach to causation:
1. Estimate: i.e. approximate
2. Assume: Formulate causal assumptions
3. Sometimes, Define the target quantity

Conclusions

Cogito → Sum?

Well, only if you can prove that no other factor intervenes

THANKS!

Questions?

BACK UP SLIDES

Structural Causal Models and Path Analysis

[WrightS1921, 1932, GoldbergerA1972, 1973, DuncanO1975, PearlJ2009]

Structural Equation Modelling

- “a huge logical gap exists between ‘establishing causation,’ which requires careful manipulative experiments, and ‘interpreting parameters as causal effects’” [PearlJ2011]

Directed Transfer Function

- Uses coherence and phase
- Can be interpreted in terms of Granger’s causality [KaminskiM2001]

[KaminskiM 1991, 2001 and 2005]

[Figure from [KaminskiM2001], with panels Coherence and Phase]

Dynamic Causal Modelling

- A bilinear model by which the neural model (not observed) is inferred from the haemodynamic model (observed) [FristonKJ2003]
- Embodies requisite constraints using a Bayesian framework

Fig. 1. This is a schematic illustrating the concepts underlying dynamic causal modelling. In particular it highlights the two distinct ways in which inputs or perturbations can elicit responses in the regions or nodes that compose the model. In this example there are five nodes, including visual areas V1 and V4 in the fusiform gyrus, areas 39 and 37, and the superior temporal gyrus STG. Stimulus-bound perturbations designated u1 act as extrinsic inputs to the primary visual area V1. Stimulus-free or contextual inputs u2 mediate their effects by modulating the coupling between V4 and BA39 and between BA37 and V4. For example, the responses in the angular gyrus (BA39) are caused by inputs to V1 that are transformed by V4, where the influences exerted by V4 are sensitive to the second input. The dark square boxes represent the components of the DCM that transform the state variables zi in each region (neuronal activity) into a measured (hemodynamic) response yi [FristonKJ2003]

Partial Directed Coherence

- Based on Granger’s causality [BaccaláLA2001]

WHAT IS NOT CAUSALITY … AND OTHER COMMON MISCONCEPTIONS

Statistical dependence

- Statistical dependence is a type of relation between any two variables [WermuthN1998]: if we find one, we can expect to find the other
- The limits of statistical dependence
  - Statistical independence: The distribution of one variable is the same no matter which level the other variable takes
    X and Y are independent ⇔ P(X∩Y)=P(X)·P(Y)
  - Deterministic dependence: Levels of one variable occur in an exactly determined way with changing levels of the other.
- Association: Intermediate forms of statistical dependency
  - Symmetric
  - Asymmetric (a.k.a. response) or directed association

[Figure: scale running from Statistical independence, through Association (symmetric or asymmetric), to Deterministic dependence]

Associational Inference ≡ Descriptive Statistics!!!


- The most detailed information linking two variables is given by the joint distribution:
  P(X=x,Y=y)
- The conditional distribution describes how the values of X change as Y varies:
  P(X=x|Y=y)=P(X=x,Y=y)/P(Y=y)
- Associational statistics is simply descriptive (estimates, regressions, posterior distributions, etc…) [HollandPW1986]
  - Example: Regression of X on Y is the conditional expectation E(X|Y=y)
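The three descriptive quantities on this slide follow from one another by simple arithmetic. Here is a minimal sketch on an invented joint table for a three-level X and a binary Y; the table values and function names are hypothetical.

```python
# Hypothetical joint distribution P(X=x, Y=y), keyed by (x, y).
JOINT = {
    (0, 0): 0.20, (1, 0): 0.20, (2, 0): 0.10,
    (0, 1): 0.05, (1, 1): 0.15, (2, 1): 0.30,
}
X_LEVELS = (0, 1, 2)

def p_y(y):
    """Marginal P(Y=y), obtained by summing the joint over x."""
    return sum(JOINT[(x, y)] for x in X_LEVELS)

def p_x_given_y(x, y):
    """Conditional P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y)."""
    return JOINT[(x, y)] / p_y(y)

def regression_x_on_y(y):
    """Regression of X on Y: the conditional expectation E(X | Y=y)."""
    return sum(x * p_x_given_y(x, y) for x in X_LEVELS)

e_x_given_y0 = regression_x_on_y(0)  # about 0.8
e_x_given_y1 = regression_x_on_y(1)  # about 1.5
```

Everything here is computed from the joint table alone, which is the sense in which associational inference is purely descriptive.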

Regression and Correlation; two common forms of associational inference

- Regression Analysis: “the study of the dependence of one or more response variables on explanatory variables” [CoxDR2004]
  - Strong regression does not imply causality [Box1966]
  - Prediction systems ≠ Causal systems [CoxDR2004]
- Correlation is a relation over mean values; two variables correlate as they move over/under their mean together (correlation is a “normalization” of the covariance)
  - Correlation is not the same as statistical dependence
    - If X and Y are statistically independent, then r=0 (i.e. absence of correlation), but the opposite is not true [MarrelecG2005].
  - Correlation does not imply causation [YuleU1900 in CoxDR2004, WrightS1921]
- Yet, causal conclusions from a carefully designed (often synonym of randomized) experiment are often (not always) valid [HollandPW1986, FisherRA1926 in CoxDR2004]
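That zero correlation does not guarantee independence can be shown with a three-point example. The sketch below assumes X takes the values -1, 0 and 1 with equal probability and Y = X²; the example is invented for illustration and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Joint distribution of (X, Y) with Y = X**2, keyed by (x, y).
JOINT = {(x, x * x): third for x in (-1, 0, 1)}

def expectation(f):
    """E[f(X, Y)] under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in JOINT.items())

def marginal(axis, value):
    """P(X=value) for axis 0, P(Y=value) for axis 1."""
    return sum(p for key, p in JOINT.items() if key[axis] == value)

# Covariance is exactly zero, hence so is the correlation coefficient r.
covariance = expectation(lambda x, y: x * y) - expectation(lambda x, y: x) * expectation(lambda x, y: y)

# Yet the joint probability does not factorize: Y is a function of X.
p_joint = JOINT[(0, 0)]                    # P(X=0, Y=0)
p_product = marginal(0, 0) * marginal(1, 0)  # P(X=0) * P(Y=0)
```

Here `covariance` is zero while `p_joint` (1/3) differs from `p_product` (1/9), so X and Y are uncorrelated but deterministically dependent.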

Coherence: yet another common form of associational inference

- Often understood as “correlation in the frequency domain”
  C_xy = |G_xy|^2 / (G_xx G_yy)
  - where G_xy is the cross-spectral density, and G_xx and G_yy are the auto-spectral densities of the two signals,
  - i.e. coherence plays the role of a squared correlation coefficient computed for each frequency component.
- Coherence measures the degree to which two series are related
- Coherence alone does not imply causality! The temporal lag of the phase difference between the signals must also be considered.
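The coherence formula can be estimated by averaging spectra over short segments. The following is a rough sketch with a naive discrete Fourier transform, non-overlapping rectangular segments and simulated white noise; the segment length, the signals and the function names are assumptions made for the illustration, and a real analysis would rely on a tested spectral-estimation library.

```python
import cmath
import math
import random

def dft(segment):
    """Naive DFT, returning the non-negative frequency bins only."""
    n = len(segment)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(segment))
            for k in range(n // 2 + 1)]

def coherence(x, y, segment_length=16):
    """C_xy = |G_xy|^2 / (G_xx G_yy), with spectra averaged over segments."""
    n_bins = segment_length // 2 + 1
    gxx, gyy, gxy = [0.0] * n_bins, [0.0] * n_bins, [0j] * n_bins
    for start in range(0, len(x) - segment_length + 1, segment_length):
        fx = dft(x[start:start + segment_length])
        fy = dft(y[start:start + segment_length])
        for k in range(n_bins):
            gxx[k] += abs(fx[k]) ** 2
            gyy[k] += abs(fy[k]) ** 2
            gxy[k] += fx[k] * fy[k].conjugate()
    return [abs(gxy[k]) ** 2 / (gxx[k] * gyy[k]) for k in range(n_bins)]

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(1024)]
unrelated = [rng.gauss(0, 1) for _ in range(1024)]
lagged = [0.0, 0.0] + [0.9 * v for v in x[:-2]]  # x delayed by two samples

coh_lagged = coherence(x, lagged)        # high: the series are related
coh_unrelated = coherence(x, unrelated)  # low: the series are unrelated
coh_reversed = coherence(lagged, x)      # identical to coh_lagged
```

Because swapping the two signals leaves the value unchanged, coherence by itself says nothing about which series leads; that information sits in the phase of G_xy, which this sketch discards.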

Statistical dependence vs Causality

- Statistical dependence provides associational relations and can be expressed in terms of a joint distribution alone
  - Causal relations CANNOT be expressed in terms of statistical association alone [PearlJ2009]
- Associational inference ≠ Causal Inference [HollandPW1986, PearlJ2009]
  - …ergo, Statistical dependence ≠ Causal Inference
  - In associational inference, time is merely operational

Causation defies (1st level) logic

- Input:
  - “If the floor is wet, then it rained”
  - “If we break this bottle, the floor will get wet”
- Logic output:
  - “If we break this bottle, then it rained”

Example taken from [PearlJ1999]