“Cogito ergo sum” … or do I?: When can Causality be inferred from DPGM
Felipe Orihuela-Espina
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE)
DyNaMo Research Meeting, 3-4th June 2011
Cogito ergo sum
Table (columns: Cogito; rows: Sum):
                    Cogito: Present   Cogito: Not present
Sum: Present        1.0               0.5
Sum: Not present    0                 0.5
• A familiar (for the audience) graphical representation: Cogito (Cause) → Sum (Effect)
Why is causality so problematic?
• Cannot be computed from the data alone
• Systematic temporal precedence is not sufficient
• Co-occurrence is not sufficient
• It is not always a direct relation (indirect relations, transitivity/mediation, etc. may be present), let alone linear…
• It may occur across frequency bands
• YOU NAME IT HERE…
A very silly example: Which process causes which?
Causality is so difficult that “it would be very healthy if more researchers abandoned thinking of and using terms such as cause and effect” [Muthen1987 in PearlJ2011]
A real example
An ECG [OrihuelaEspinaF2010]
[KaturaT2006] only claim that there are interrelations (quantified using MI)
THE CONTRIBUTION OF PHILOSOPHY
Causality in Philosophy
• Aristotle’s four "causes" of a thing
– The material cause (that out of which the thing is made),
– the formal cause (that into which the thing is made),
– the efficient cause (that which makes the thing), and
– the final cause (that for which the thing is made).
Aristotle (384BC-322BC)
In [HollandPW1986]
Causality in Philosophy
• Hume’s legacy
– Sharp distinction between analytical (thoughts) and empirical (facts) claims
– Causal claims are empirical
– All empirical claims originate from experience (sensory input)
• Hume’s three basic criteria for causation
– (a) spatial/temporal contiguity,
– (b) temporal succession, and
– (c) constant conjunction
• It is not empirically verifiable that the cause produces the effect, but only that the cause is invariably followed by the effect.
David Hume (1711-1776)
[HollandPW1986, PearlJ1999_IJCAITalk]
Causality in Philosophy
• Mill’s general methods of experimental enquiry
– Method of concomitant variation (i.e. correlation…)
– Method of difference (i.e. causation)
– Method of residues (i.e. induction)
– Method of agreement (i.e. null effect: can only rule out possible causes)
• Mill “only” codified these methods; they had already been put forth by Sir Francis Bacon 250 years earlier (The Advancement of Learning and Novum Organum Scientiarum)
John Stuart Mill (1806-1873)
Sir Francis Bacon (1561-1626)
In [HollandPW1986]
Causality in Philosophy
• Suppes' probabilistic theory of causality
– “… one event is the cause of another if the appearance of the first is followed with a high probability by the appearance of the second, and there is no third event that we can use to factor out the probability relationship between the first and second events”
– C is a genuine cause of E if:
• P(E|C) > P(E) (prima facie) and
• not (P(E|C,D) = P(E|D) and P(E|C,D) >= P(E|C)) (spurious cause)
Patrick Colonel Suppes (1922- ), Lucie Stern Emeritus Professor of Philosophy at Stanford
[SuppeP1970, HollandPW1986]
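The two conditions above can be checked mechanically on any discrete joint distribution. The following is a minimal sketch, assuming a made-up model in which a third event D is a common cause of C and E and C has no influence of its own; all numbers and helper names (`prob`, `cond`) are illustrative assumptions, not taken from the talk.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model: D is a common cause of C and E; C has no effect on E.
p_d = {1: F(1, 2), 0: F(1, 2)}            # P(D=d)
p_c_given_d = {1: F(4, 5), 0: F(1, 5)}    # P(C=1 | D=d)
p_e_given_d = {1: F(9, 10), 0: F(1, 10)}  # P(E=1 | D=d)

def bern(p, value):
    """Probability of `value` (0 or 1) for a Bernoulli variable with P(1) = p."""
    return p if value == 1 else 1 - p

# Joint distribution P(C=c, E=e, D=d), with C and E independent given D.
joint = {
    (c, e, d): p_d[d] * bern(p_c_given_d[d], c) * bern(p_e_given_d[d], e)
    for c, e, d in product((0, 1), repeat=3)
}

def prob(**fixed):
    """Probability that the named variables (c, e, d) take the given values."""
    names = ("c", "e", "d")
    return sum(p for key, p in joint.items()
               if all(key[names.index(n)] == v for n, v in fixed.items()))

def cond(event, given):
    """P(event | given), both passed as dicts of variable values."""
    return prob(**event, **given) / prob(**given)

p_e = prob(e=1)
p_e_c = cond({"e": 1}, {"c": 1})
p_e_cd = cond({"e": 1}, {"c": 1, "d": 1})
p_e_d = cond({"e": 1}, {"d": 1})

prima_facie = p_e_c > p_e                               # P(E|C) > P(E)
spurious = (p_e_cd == p_e_d) and (p_e_cd >= p_e_c)      # D factors C out
genuine = prima_facie and not spurious
```

In this toy model C raises the probability of E, so it is a prima facie cause, yet conditioning on D removes the relationship, so `genuine` comes out false. The check only looks at the event D=1; a fuller treatment would have to consider every candidate third event.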
CAUSALITY: DIFFERENT VIEWS, SAME CONCEPT
Causality requires time!
• “… there is little use in the practice of attempting to discuss causality without introducing time” [Granger,1969]
– … whether philosophical, statistical, econometrical, topological, etc…
Causality requires directionality!
• Algebraic equations, e.g. regression, “do not properly express causal relationships […] because algebraic equations are symmetrical objects […] To express the directionality of the underlying process, Wright augmented the equation with a diagram, later called path diagram in which arrows are drawn from causes to effects” [PearlJ2009]
– Feedback and instantaneous causality in any case are a double causation.
From association to causation
• Barriers between classical statistics and causal analysis [PearlJ2009]
1. Coping with untested assumptions and changing conditions
2. Inappropriate mathematical notation
Causality
• Zero-level causality: a statistical association, i.e. non-independence which cannot be removed by conditioning on allowable alternative features.
– i.e. Granger’s, Topological
• First-level causality: Use of a treatment over another causes a change in outcome
– i.e. Rubin's, Pearl’s
• Second-level causality: Explanation via a generating process, provisional and hardly lending itself to formal characterization, either merely hypothesized or solidly based on evidence
– i.e. Suppes', Wright’s path analysis
– e.g. Smoking causes lung cancer
Inspired by [CoxDR2004]
[Figure: scale alongside the levels, labelled Stronger and Weaker]
It is debatable whether second level causality is indeed causality
Variable types and their joint probability distribution
• Variable types:
– Background variables (B): specify what is fixed
– Potential causal variables (C)
– Intermediate variables (I): surrogates, monitoring, pathways, etc.
– Response variables (R): observed effects
• Joint probability distribution of the variables:
P(R,I,C,B) = P(R|I,C,B) P(I|C,B) P(C|B) P(B)
… but it is possible to integrate over I (marginalize):
P(R,C,B) = P(R|C,B) P(C|B) P(B)
In [CoxDR2004]
Granger’s Causality
• Granger's causality:
– Y is causing X (Y → X) if we are better able to predict X using all available information (Z) than if the information apart from Y had been used.
• The groundbreaking paper:
– Granger “Investigating causal relations by econometric models and cross-spectral methods” Econometrica 37(3): 424-438
• Granger’s causality is only a statement about one thing happening before another!
– Rejects instantaneous causality: it is considered as slowness in the recording of information
Sir Clive William John Granger (1934-2009), University of Nottingham, Nobel Prize Winner
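The predictive definition above can be illustrated by comparing two least-squares predictors of X: one using only the past of X, and one that also uses the past of Y. What follows is a minimal sketch under strong assumptions: simulated zero-mean series, a single lag, no intercept and no formal significance test. The helper names (`solve`, `residual_variance`, `granger_ratio`) and the simulated coefficients are illustrative, not part of Granger's paper.

```python
import random

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def residual_variance(target, regressors):
    """Least-squares fit of target on the regressors (no intercept);
    returns the mean squared residual."""
    k = len(regressors)
    gram = [[sum(p * q for p, q in zip(regressors[i], regressors[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(p * t for p, t in zip(regressors[i], target)) for i in range(k)]
    coef = solve(gram, rhs)
    res = [t - sum(c * reg[i] for c, reg in zip(coef, regressors))
           for i, t in enumerate(target)]
    return sum(r * r for r in res) / len(res)

def granger_ratio(effect, cause):
    """Prediction error of `effect` with the past of `cause` included,
    divided by the error without it (lag 1 only)."""
    now, own_past, other_past = effect[1:], effect[:-1], cause[:-1]
    restricted = residual_variance(now, [own_past])
    full = residual_variance(now, [own_past, other_past])
    return full / restricted

# Simulated data (assumption): Y drives X with a one-sample delay.
rng = random.Random(0)
n = 2000
y = [rng.gauss(0, 1) for _ in range(n)]
x = [0.0]
for t in range(1, n):
    x.append(0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.gauss(0, 0.5))

ratio_y_to_x = granger_ratio(x, y)  # well below 1: the past of Y helps predict X
ratio_x_to_y = granger_ratio(y, x)  # close to 1: the past of X does not help predict Y
```

A ratio well below 1 says only that the past of one series improves the prediction of the other, which is exactly the temporal-precedence reading stated on the slide.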
Granger’s Causality
• “The future cannot cause the past” [Granger 1969]
– “the direction of the flow of time [is] a central feature”
– Feedback is a double causation; X → Y and Y → X, denoted X ↔ Y
• “causality … is based entirely on the predictability of some series…” [Granger 1969]
– Causal relationships may be investigated in terms of coherence and phase diagrams
Topological causality
• “A causal manifold is one with an assignment to each of its points of a convex cone in the tangent space, representing physically the future directions at the point. The usual causality in M_O extends to a causal structure in M’.” [SegalIE1981]
• Causality is seen as embedded in the geometry/topology of manifolds
– Causality is a curve function defined over the manifold
• The groundbreaking book:
– Segal IE “Mathematical Cosmology and Extragalactic Astronomy” (1976)
• I am not sure whether Segal is the father of causal manifolds, but his contribution to the field is simply overwhelming…
Irving Ezra Segal (1918-1998), Professor of Mathematics at MIT
Causal (homogeneous Lorentzian) Manifolds: The topological view of causality
• The cone of causality [SegalIE1981, RainerM1999, MosleySN1990, KrymVR2002]
[Figure: cone of causality with regions labelled Past, Instant present and Future]
Rubin Causal Model
• Rubin Causal Model:
– “Intuitively, the causal effect of one treatment relative to another for a particular experimental unit is the difference between the result if the unit had been exposed to the first treatment and the result if, instead, the unit had been exposed to the second treatment”
• The groundbreaking paper:
– Rubin “Bayesian inference for causal effects: The role of randomization” The Annals of Statistics 6(1): 34-58
• The term Rubin causal model was coined by Paul Holland
Donald B Rubin (1943- ), John L. Loeb Professor of Stats at Harvard
Rubin Causal Model
• Causality is an algebraic difference: treatment causes the effect
Y_treatment(u) - Y_control(u)
… or in other words; the effect of a cause is always relative to another cause [HollandPW1986]
• Rubin causal model establishes the conditions under which associational (e.g. Bayesian) inference may infer causality (makes assumptions for causality explicit).
Fundamental Problem of Causal Inference
• Only Y_treatment(u) or Y_control(u) can be observed for a given phenomenon (unit u), but not both.
– Causal inference is impossible without making untested assumptions
– … yet causal inference is still possible under uncertainty [HollandPW1986] (two otherwise identical populations u must be prepared and all appropriate background variables must be considered in B).
• Again! (see slide “Statistical dependence vs Causality”); Causal questions cannot be computed from the data alone, nor from the distributions that govern the data [PearlJ2009]
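The fundamental problem can be made concrete with a small simulation in which both potential outcomes are known to the simulator but only one per unit is ever "observed". This is a rough sketch under invented assumptions (a constant unit-level effect of 2.0, Gaussian baselines, 10000 units); it is meant to show why the assignment mechanism matters, not to reproduce any analysis from [HollandPW1986].

```python
import random

rng = random.Random(42)
TRUE_EFFECT = 2.0   # assumed constant unit-level effect, for illustration only
n = 10000

# Both potential outcomes exist for every unit u, but only one is ever observed.
y_control = [rng.gauss(0, 1) for _ in range(n)]
y_treatment = [yc + TRUE_EFFECT for yc in y_control]

def difference_in_means(assigned):
    """Observed mean outcome of treated units minus that of control units."""
    treated = [y_treatment[u] for u in range(n) if assigned[u]]
    control = [y_control[u] for u in range(n) if not assigned[u]]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Randomized assignment: treatment is independent of the potential outcomes.
randomized = [rng.random() < 0.5 for _ in range(n)]
# Self-selected assignment: units with a high baseline end up treated.
self_selected = [yc > 0 for yc in y_control]

estimate_randomized = difference_in_means(randomized)        # close to TRUE_EFFECT
estimate_self_selected = difference_in_means(self_selected)  # biased upwards
```

Under randomization the observed difference in means approximates Y_treatment(u) - Y_control(u) averaged over units; under self-selection the same computation is badly biased, even though the observed data look equally well behaved.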
Relation between Granger, Rubin and Suppes causalities
                                  Granger   Rubin’s model
Cause (Treatment)                 Y         t
Effect                            X         Y_treatment(u)
All other available information   Z         Z (pre-exposure variables)
• Granger’s noncausality: X is not a Granger cause of Y (relative to information in Z) ⇔ X and Y are conditionally independent (i.e. P(Y|X,Z) = P(Y|Z))
• Granger’s noncausality is equal to Suppes' spurious case
Modified from [HollandPW1986]
Pearl’s statistical causality (a.k.a. structural theory)
• “Causation is encoding behaviour under intervention […] Causality tells us which mechanisms [stable functional relationships] is to be modified [i.e. broken] by a given action” [PearlJ1999_IJCAI]
• Causality, intervention and mechanisms can be encapsulated in a causal model
• The groundbreaking book:
– Pearl J “Causality: Models, Reasoning and Inference” (2000)*
• Pearl’s results do establish conditions under which first level causal conclusions are possible [CoxDR2004]
Judea Pearl (1936- ), Professor of computer science and statistics at UCLA
[PearlJ2000, Lauritzen2000, DawidAP2002]
* With permission of his 1995 Biometrika paper masterpiece
Sewall Green Wright (1889-1988), Father of path analysis (graphical rules)
Statistical causality
• Conditioning vs Intervening [PearlJ2000]
– Conditioning: P(R|C) = Σ_B P(R|C,B) P(B|C); useful but inappropriate for causality, as changes in the past (B) occur before intervention (C)
– Intervention: P(R║C) = Σ_B P(R|C,B) P(B); Pearl's definition of causality
• Underlying assumption: The distribution of R (and I) remains unaffected by the intervention.
– Watch out! This is not trivial: serious interventions may distort all relations [CoxDR2004]
• β_CB = 0 (structural coefficient) ⇒ C ╨ B (conditional independence) ⇒ P(R|C) = P(R║C), i.e. there is no difference between conditioning and intervention
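The difference between conditioning and intervening is easy to see numerically. Below is a minimal sketch for binary B, C and R; every probability in it is an invented illustration, and the function names `conditioning` and `intervention` are just labels for the two sums over B, not an established API.

```python
# Hypothetical binary model: background B influences both cause C and response R.
p_b = {1: 0.5, 0: 0.5}                      # P(B=b)
p_r_given_cb = {(1, 1): 0.9, (1, 0): 0.5,   # P(R=1 | C=c, B=b), keyed by (c, b)
                (0, 1): 0.7, (0, 0): 0.3}

def conditioning(p_c_given_b, c=1):
    """P(R=1 | C=c) = sum over b of P(R=1 | c, b) P(b | c)."""
    def p_c(b):
        return p_c_given_b[b] if c == 1 else 1 - p_c_given_b[b]
    norm = sum(p_c(b) * p_b[b] for b in (0, 1))   # P(C=c)
    return sum(p_r_given_cb[(c, b)] * p_c(b) * p_b[b] / norm for b in (0, 1))

def intervention(c=1):
    """P(R=1 || C=c) = sum over b of P(R=1 | c, b) P(b)."""
    return sum(p_r_given_cb[(c, b)] * p_b[b] for b in (0, 1))

confounded = {1: 0.8, 0: 0.2}     # P(C=1 | B=b): C depends on B
unconfounded = {1: 0.5, 0: 0.5}   # P(C=1 | B=b): C independent of B

seen = conditioning(confounded)            # approx. 0.82
done = intervention()                      # approx. 0.70
seen_indep = conditioning(unconfounded)    # approx. 0.70, same as intervening
```

When C depends on B the two quantities differ (0.82 against 0.70 here); once C is independent of B they coincide, which is the special case stated in the last bullet.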
LOOKING FOR CAUSALITY: DYNAMIC PROBABILISTIC CAUSAL MODELS AND SOME OTHER ANALYTICAL TOOLS
Some tools for looking at causality… beyond the interest of this research meeting
• Structural Causal Models* and Path Analysis [WrightS1921,1932, PearlJ2009]
– Structural Equation Modelling [WrightS1921, PearlJ2011]
• Directed Transfer Function [Kaminski 1991, 2001 and 2005]
• Dynamic Causal Modelling [FristonKJ2003]
• Partial Directed Coherence [BaccaláLA2001]
Bayesian Networks
• Well… this one is of interest… as it is the father of probabilistic dynamic models
• Bayesian networks are structures (often in the form of a graph) describing probabilistic relationships between variables [PearlJ2000, KaminskiM2005]
– Conditional independencies are represented by missing edges
– Arrows convey causal directionality but merely indicate the possibility of a causal relation (i.e. they are only a notational clue); implication of causality must be discarded as inadequate [PearlJ2009]
• Conditional distributions, e.g. P(X|Y), determine associational distributions [HollandPW1986]
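That arrows are only a notational clue can be checked on the opening Cogito/Sum example: the same joint distribution factorises equally well along either arrow direction. This is a small sketch, assuming a prior P(Cogito = present) = 0.5 that the talk does not give, and writing "absent" for "Not present".

```python
# Assumed prior P(Cogito); the opening slide only tabulates Sum against Cogito.
p_cogito = {"present": 0.5, "absent": 0.5}
p_sum_given_cogito = {                      # keyed by (sum, cogito)
    ("present", "present"): 1.0, ("absent", "present"): 0.0,
    ("present", "absent"): 0.5, ("absent", "absent"): 0.5,
}
states = ("present", "absent")

# Factorisation along Cogito -> Sum; joint is keyed by (cogito, sum).
joint = {(c, s): p_cogito[c] * p_sum_given_cogito[(s, c)]
         for c in states for s in states}

# Same joint, refactorised along Sum -> Cogito using Bayes' rule.
p_sum = {s: sum(joint[(c, s)] for c in states) for s in states}
p_cogito_given_sum = {(c, s): joint[(c, s)] / p_sum[s]
                      for c in states for s in states}
joint_reversed = {(c, s): p_sum[s] * p_cogito_given_sum[(c, s)]
                  for c in states for s in states}
```

Both factorisations reproduce the same numbers, so the data alone cannot say whether the arrow should run from Cogito to Sum or the other way round; that choice has to come from causal assumptions.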
Causal Bayesian Networks
• The problem of Identification:
– Can the controlled (post-intervention) distribution P(R║C) be estimated from data governed by the pre-intervention distribution P(R,C,B)?
– The answer is a “yes, but…”
• i.e. as long as we account for general control of confounding and counterfactuals, admissibility, Markovian graphs (i.e. acyclic graphs), ignorability, and a few other criteria beyond my humble human limitation… seasoned with a good dose of inscrutable maths.
• Some “recommended” reading if you are up to the challenge: [PearlJ2000, 2009, Lauritzen2000, DawidAP2002]
Dynamic Graphical Models
• Tian’s theorem:
– “A sufficient condition for identifying a causal effect P(R║C) is that every path between C and any of its children traces at least one arrow emanating from a measured variable I”
– Translation to plain English: You ought to account for confounders (which are also part of your graph) and causal relations must cross through those confounders (i.e. they have been taken into account)
• Note that Tian’s theorem is sufficient but not necessary, i.e. direct links C → R may still encode direct causality
– More translation to plain English: P(R║C) cannot encode questions of attribution (e.g. how many deaths are due to specific exposure?) or of susceptibility (e.g. how many would have got diseased if exposed?)
• Note the important implication that a thoroughly/carefully designed randomized control trial may not suffice!
Dynamic Graphical Models: A common error when using them…
• Correct methodology of structural approach to causation [PearlJ2009]:
1. Define the target quantity
2. Assume: Formulate causal assumptions
3. Identify: Determine if the target is identifiable
4. Estimate: i.e. approximate
• Common application of the methodology of structural approach to causation:
1. Estimate: i.e. approximate
2. Assume: Formulate causal assumptions
3. Sometimes Define the target quantity
Conclusions
Cogito → Sum?
• Well… only if you can prove that no other factor intervenes…
THANKS!
Questions?
BACK UP SLIDES
Structural Causal Models and Path Analysis
• [WrightS1921, 1932, GoldbergerA1972, 1973, DuncanO1975, PearlJ2009]
Structural Equation Modelling
• “a huge logical gap exists between “establishing causation,” which requires careful manipulative experiments, and “interpreting parameters as causal effects”” [PearlJ2011]
Directed Transfer Function
• Uses coherence and phase
• Can be interpreted in terms of Granger’s causality [KaminskiM2001]
[KaminskiM 1991, 2001 and 2005]
[Figure from [KaminskiM2001]: panels labelled Coherence and Phase]
Dynamic Causal Modelling
• A bilinear model by which the neural model (not observed) is inferred from the haemodynamic model (observed) [FristonKJ2003]
• Embodies requisite constraints using a Bayesian framework
Fig. 1. This is a schematic illustrating the concepts underlying dynamic causal modelling. In particular it highlights the two distinct ways in which inputs or perturbations can elicit responses in the regions or nodes that compose the model. In this example there are five nodes, including visual areas V1 and V4 in the fusiform gyrus, areas 39 and 37, and the superior temporal gyrus STG. Stimulus-bound perturbations designated u1 act as extrinsic inputs to the primary visual area V1. Stimulus-free or contextual inputs u2 mediate their effects by modulating the coupling between V4 and BA39 and between BA37 and V4. For example, the responses in the angular gyrus (BA39) are caused by inputs to V1 that are transformed by V4, where the influences exerted by V4 are sensitive to the second input. The dark square boxes represent the components of the DCM that transform the state variables zi in each region (neuronal activity) into a measured (hemodynamic) response yi [FristonKJ2003]
Partial Directed Coherence
• Based on Granger’s causality [BaccaláLA2001]
WHAT CAUSALITY IS NOT, AND OTHER COMMON MISCONCEPTIONS
Statistical dependence
• Statistical dependence is a type of relation between any two variables [WermuthN1998]: if we find one, we can expect to find the other
• The limits of statistical dependence
– Statistical independence: The distribution of one variable is the same no matter which level the other variable takes; X and Y are independent ⇔ P(X ∩ Y) = P(X) P(Y)
– Deterministic dependence: Levels of one variable occur in an exactly determined way with changing levels of the other.
– Association: Intermediate forms of statistical dependency
• Symmetric
• Asymmetric (a.k.a. response) or directed association
[Figure: scale running from Statistical independence, through Association (symmetric or asymmetric), to Deterministic dependence]
Associational Inference ≡ Descriptive Statistics!!!
• The most detailed information linking two variables is given by the joint distribution: P(X=x, Y=y)
• The conditional distribution describes how the values of X change as Y varies: P(X=x|Y=y) = P(X=x, Y=y)/P(Y=y)
• Associational statistics is simply descriptive (estimates, regressions, posterior distributions, etc…) [HollandPW1986]
– Example: Regression of X on Y is the conditional expectation E(X|Y=y)
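These descriptive quantities follow directly from a joint table. Here is a minimal sketch with an invented joint distribution P(X=x, Y=y); the numbers and function names are assumptions chosen only to make the conditional distribution and the regression E(X|Y=y) concrete.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X=x, Y=y), keyed by (x, y).
joint = {(0, 0): F(3, 10), (1, 0): F(1, 10),
         (0, 1): F(1, 10), (1, 1): F(2, 10),
         (2, 1): F(3, 10)}

def marginal_y(y):
    """P(Y=y), obtained by summing the joint over x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def conditional_x_given_y(y):
    """P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y) for every x with support."""
    py = marginal_y(y)
    return {x: p / py for (x, yy), p in joint.items() if yy == y}

def regression_x_on_y(y):
    """Conditional expectation E(X | Y=y)."""
    return sum(x * p for x, p in conditional_x_given_y(y).items())

e_x_given_y0 = regression_x_on_y(0)   # 1/4
e_x_given_y1 = regression_x_on_y(1)   # 4/3
```

Nothing in these computations refers to which variable, if any, acts on the other: they summarise the joint distribution and nothing more.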
Regression and Correlation; two common forms of associational inference
• Regression Analysis: “the study of the dependence of one or more response variables on explanatory variables” [CoxDR2004]
– Strong regression ≠ causality [Box1966]
– Prediction systems ≠ Causal systems [CoxDR2004]
• Correlation is a relation over mean values; two variables correlate as they move over/under their mean together (correlation is a “normalization” of the covariance)
• Correlation ≠ Statistical dependence
– If X and Y are statistically independent then r=0 (i.e. absence of correlation), but the converse is not true [MarrelecG2005].
• Correlation ≠ Causation [YuleU1900 in CoxDR2004, WrightS1921]
– Yet, causal conclusions from a carefully designed (often a synonym of randomized) experiment are often (not always) valid [HollandPW1986, FisherRA1926 in CoxDR2004]
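A standard textbook counterexample shows that zero correlation does not give independence: take X uniform on {-1, 0, 1} and Y = X². The sketch below uses exact fractions; the example is a common illustration and is not taken from [MarrelecG2005].

```python
from fractions import Fraction as F

# X is uniform on {-1, 0, 1} and Y = X**2, so Y is fully determined by X.
joint = {(x, x * x): F(1, 3) for x in (-1, 0, 1)}   # P(X=x, Y=y)

def expect(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

covariance = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

p_x0 = sum(p for (x, _), p in joint.items() if x == 0)   # P(X=0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)   # P(Y=0)
p_x0_y0 = joint.get((0, 0), F(0))                        # P(X=0, Y=0)

uncorrelated = covariance == 0             # so r = 0
independent = p_x0_y0 == p_x0 * p_y0       # fails: 1/3 versus 1/9
```

Here the covariance is exactly zero while the dependence is deterministic, so r = 0 on its own says nothing about independence.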
Coherence: yet another common form of associational inference
• Often understood as “correlation in the frequency domain”
C_xy = |G_xy|^2 / (G_xx G_yy)
– where G_xy is the cross-spectral density, and G_xx and G_yy are the auto-spectral densities of each signal,
– i.e. coherence is the ratio between the squared magnitude of the cross-spectrum and the power of the two signals at each frequency, analogous to a (squared) correlation coefficient per frequency component.
• Coherence measures the degree to which two series are related
– Coherence alone does not imply causality! The temporal lag of the phase difference between the signals must also be considered.
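Coherence and the phase lag can both be estimated by averaging spectra over segments of the two signals. The following is a rough sketch using only a direct DFT at a single frequency bin; the simulated signals (a shared rhythm, with y lagging x by two samples, plus independent noise), the segment length and the helper names are all illustrative assumptions.

```python
import cmath
import math
import random

def dft_bin(segment, k):
    """Single DFT coefficient of `segment` at frequency bin k."""
    n = len(segment)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(segment))

def coherence_and_phase(x, y, seg_len, k):
    """Segment-averaged coherence C_xy = |G_xy|^2 / (G_xx G_yy) and the
    phase of the cross-spectrum G_xy at bin k."""
    gxx = gyy = 0.0
    gxy = 0j
    for start in range(0, len(x) - seg_len + 1, seg_len):
        fx = dft_bin(x[start:start + seg_len], k)
        fy = dft_bin(y[start:start + seg_len], k)
        gxx += abs(fx) ** 2
        gyy += abs(fy) ** 2
        gxy += fx.conjugate() * fy
    return abs(gxy) ** 2 / (gxx * gyy), cmath.phase(gxy)

# Simulated signals (assumption): shared rhythm at bin 4, y lags x by 2 samples.
rng = random.Random(1)
n, seg_len, k0, delay = 2048, 32, 4, 2
f0 = k0 / seg_len   # frequency in cycles per sample
x = [math.sin(2 * math.pi * f0 * t) + rng.gauss(0, 0.5) for t in range(n)]
y = [math.sin(2 * math.pi * f0 * (t - delay)) + rng.gauss(0, 0.5) for t in range(n)]

coh_shared, phase_shared = coherence_and_phase(x, y, seg_len, k0)
coh_other, _ = coherence_and_phase(x, y, seg_len, 10)
lag_samples = -phase_shared / (2 * math.pi * f0)   # phase converted to a time lag
```

Coherence is high only at the shared frequency, and the phase converts to a lag close to the two-sample delay built into the simulation. Even so, a consistent lag is temporal precedence, which the talk has already flagged as insufficient on its own to establish causality.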
Statistical dependence vs Causality
• Statistical dependence provides associational relations and can be expressed in terms of a joint distribution alone
– Causal relations CANNOT be expressed in terms of statistical association alone [PearlJ2009]
• Associational inference ≠ Causal Inference [HollandPW1986, PearlJ2009]
– …ergo, Statistical dependence ≠ Causal Inference
– In associational inference, time is merely operational
Causation defies (1st level) logic…
• Input:
– “If the floor is wet, then it rained”
– “If we break this bottle, the floor will get wet”
• Logic output:
– “If we break this bottle, then it rained”
Example taken from [PearlJ1999]