
7 Nov 2013


DNA Matching as an Example of Bayesian Inference

Bert Kappen, Willem Burgers, Wim Wiegerinck

Bayesian methods for BD

Combine:
- Data
- Domain knowledge




The Identification Problem

Given (e.g.) a disaster with:
- Many DNA profiles of victims; these are the unidentified individuals (UI)
- Many pedigrees (= family trees), each with one or more missing persons (MP) and reference DNA profiles of family members

Question: Which UI matches which MP?

Bonaparte calculates the likelihood ratio as a measure of ‘fit’ in a pedigree:

$$LR = \frac{P(E \mid H_p)}{P(E \mid H_d)}$$

- $H_p$: UI is MP
- $H_d$: UI and MP are unrelated
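As a minimal illustration of the ratio above, the sketch below divides two likelihood values and reports the result on a log10 scale. The function name `likelihood_ratio` and the two input numbers are made up for illustration; they are not taken from Bonaparte.

```python
import math


def likelihood_ratio(p_e_given_hp: float, p_e_given_hd: float) -> float:
    """Return LR = P(E|Hp) / P(E|Hd)."""
    if p_e_given_hd <= 0.0:
        raise ValueError("P(E|Hd) must be positive")
    return p_e_given_hp / p_e_given_hd


# Hypothetical likelihoods of the same evidence E under the two hypotheses.
lr = likelihood_ratio(2.0e-9, 4.0e-13)
print(f"LR = {lr:.1f}, log10(LR) = {math.log10(lr):.2f}")
```

An LR above 1 means the evidence is more probable if the UI is the MP than if the two are unrelated; an LR below 1 means the opposite.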

STR

Human genome
- 99.7% of DNA is the same, 0.3% differs
- Hypervariable regions, STR
- Non-coding
- Measurable by PCR (Polymerase Chain Reaction)

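STR (Short Tandem Repeat)

In each pair, the paternal allele is on the left and the maternal allele is on the right.

| Locus | Father | Mother | Daughter |
|---|---|---|---|
| 1 | 14, 17 | 16, 16 | 14, 16 |
| 2 | 18, 16 | 17, 18 | 18, 18 |
| 3 | 12, 13 | 12, 13 | 13, 12 |
| 4 | 17, 17 | 23, 15 | 17, 23 |
| 5 | 30, 32 | 29, 31 | 32, 31 |
| 6 | 14, 14 | 14, 15 | 14, 15 |
| Sex chromosomes | Y, X | X, X | X, X |

- Independent inheritance of chromosomes
- Independent inheritance at each locus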



W.Wiegerinck, W. Burgers


Gregor Johann Mendel (1822-1884)

Inheritance

Mutation

How to model the probabilities?

$$P(G_v, G_m, G_{UI} \mid UI = \text{child of } v \text{ and } m)$$

$$P(G_v, G_m, G_{UI} \mid UI = \text{unrelated person})$$

….



- Depends on the pedigree
- Needs a model that can be generated automatically
  - given pedigree and data input
  - for any possible pedigree

Bayesian networks


Bonaparte Bayesian network

Modeling:
- Which are the variables? Observables, hidden variables?
- What are the relations?

Bonaparte:
- DNA transmission model
- DNA priors
- Genotype observation model


DNA transmission model

Per locus, the father transmits either his paternal allele or his maternal allele (likewise for the mother). Example: a father with alleles 14 and 17 transmits 14 (or 17). This is for a single locus; inheritance is independent at each locus.

Notation, with f(i) the father and m(i) the mother of individual i:
- $X^f_{f(i)}$ = paternal allele of the father of i
- $X^m_{f(i)}$ = maternal allele of the father of i
- $X^f_i$ = paternal allele of i

Network fragment: Paternal Allele Father and Maternal Allele Father are the parents of Paternal Allele Child.

$$P(X^f_i \mid X^f_{f(i)}, X^m_{f(i)}) = \tfrac{1}{2} \left( P(X^f_i \mid X^f_{f(i)}) + P(X^f_i \mid X^m_{f(i)}) \right)$$

$P(X^f_i = a \mid X^f_{f(i)} = b)$ etc.: the probability that the paternal allele is a when the transmitted allele is b (mutation model; more on this later).
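The transmission rule can be sketched in a few lines of Python. The mutation model used here is an assumption made only so the example runs: the transmitted allele is copied with probability 1 - mu and otherwise changes uniformly to one of the other alleles. The slides do not specify Bonaparte's mutation model at this point, and the names `mutation_prob`, `transmission_prob` and the value of `mu` are hypothetical.

```python
from typing import Sequence


def mutation_prob(a: int, b: int, alleles: Sequence[int], mu: float) -> float:
    """P(child allele = a | transmitted allele = b) under an ASSUMED toy model:
    no change with probability 1 - mu, else uniform over the other alleles."""
    if a == b:
        return 1.0 - mu
    return mu / (len(alleles) - 1)


def transmission_prob(child: int, father_pat: int, father_mat: int,
                      alleles: Sequence[int], mu: float = 0.001) -> float:
    """P(X^f_i | X^f_f(i), X^m_f(i)): each parental allele is passed on
    with probability 1/2."""
    return 0.5 * (mutation_prob(child, father_pat, alleles, mu)
                  + mutation_prob(child, father_mat, alleles, mu))


# Hypothetical allele ladder; the father carries 14 (paternal) and 17 (maternal).
ALLELES = [14, 15, 16, 17]
dist = {a: transmission_prob(a, 14, 17, ALLELES) for a in ALLELES}
print(dist)
```

With `mu=0.0` the sketch reduces to pure Mendelian inheritance: the child's paternal allele is 14 or 17, each with probability 1/2.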

Population priors

No parents in the pedigree:

$$P(X^f_i, X^m_i) = P(X^f_i)\, P(X^m_i)$$

(assumes sufficient mixing, etc.; other models are possible)

- $P(X^f_i)$, $P(X^m_i)$ based on population statistics
- Other possibilities: subpopulations, etc.
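A minimal sketch of the independence prior for an individual without parents in the pedigree. The allele frequencies below are invented for illustration and are not real population statistics; `founder_prior` is a hypothetical helper name.

```python
from itertools import product

# Hypothetical allele frequencies for a single locus (not real data).
allele_freq = {13: 0.25, 14: 0.35, 15: 0.40}


def founder_prior(x_f: int, x_m: int, freq: dict) -> float:
    """P(X^f, X^m) = P(X^f) * P(X^m) for an individual without parents
    in the pedigree."""
    return freq.get(x_f, 0.0) * freq.get(x_m, 0.0)


# Joint prior over all ordered (paternal, maternal) allele pairs.
joint = {(a, b): founder_prior(a, b, allele_freq)
         for a, b in product(allele_freq, repeat=2)}
print(joint[(13, 14)])
```

Because the two alleles are treated as independent draws, the joint table sums to 1 whenever the frequencies do.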


Population statistics

The same measured profile is consistent with different phasings. In each pair, the paternal allele is on the left and the maternal allele is on the right.

| One phasing | or |
|---|---|
| 13, 14 | 14, 13 |
| 17, 14 | 17, 14 |
| 11, 11 | 11, 11 |
| 13, 16 | 16, 13 |
| 11, 10 | 10, 11 |
| 23.2, 25 | 23.2, 25 |
| Y, X | Y, X |

or …

- D19S433 = {13,14}, …
- Only unordered pairs are observed (phase is lost: {13,14} = {14,13})
- Important to keep track of family relations
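To make the loss of phase concrete, the sketch below enumerates the ordered (paternal, maternal) pairs that are consistent with each unordered genotype, using the six autosomal pairs from the example profile. The helper name `phasings` is hypothetical.

```python
from itertools import permutations
from math import prod


def phasings(genotype):
    """All ordered (paternal, maternal) pairs consistent with an
    unordered genotype; a homozygous genotype has only one."""
    return sorted(set(permutations(genotype)))


# Unordered genotypes read off the example profile (one pair per locus).
profile = [(13, 14), (14, 17), (11, 11), (13, 16), (10, 11), (23.2, 25)]

per_locus = [phasings(g) for g in profile]
n_phasings = prod(len(p) for p in per_locus)
print(per_locus[0], n_phasings)
```

Each heterozygous locus doubles the number of phasings, so the hidden allele variables have to be summed over rather than read off the data.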

Genotype observation model

Network fragment: Paternal Allele i and Maternal Allele i are the parents of Genotype i.

$$P(G_i = \{A,B\} \mid X^f_i = A, X^m_i = B) = 1$$

$$P(G_i = \{A,B\} \mid X^f_i = B, X^m_i = A) = 1$$

and 0 otherwise.

More generally, with measurement errors:

$$P(G_i = \{A,B\} \mid X^f_i = C, X^m_i = D) = P(G_i = \{A,B\} \mid X^f_i = D, X^m_i = C) = \ldots$$





Alleles (A,B) → genotype {A,B}.

Alleles (B,A) → genotype {A,B}.
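The error-free observation model is a deterministic check on unordered pairs, which can be sketched as follows. The function name `genotype_likelihood` is hypothetical, and the measurement-error variant mentioned in the slides is not modelled here.

```python
def genotype_likelihood(genotype, x_f, x_m) -> float:
    """P(G = {A,B} | X^f, X^m) without measurement error:
    1 if the unordered allele pair equals the genotype, else 0."""
    return 1.0 if sorted(genotype) == sorted((x_f, x_m)) else 0.0


print(genotype_likelihood((13, 14), 14, 13))  # both orderings give the same genotype
print(genotype_likelihood((13, 14), 13, 13))
```

Sorted tuples are compared instead of sets so that a homozygous genotype such as {14,14} is kept distinct from a single allele.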

Bayesian network of a pedigree

[Figure: Bayesian network for a father, mother and child. Nodes: Paternal Allele, Maternal Allele and Genotype for each of Father, Mother and Child.]

Bayesian network of a pedigree

Sum rule:

$$P(E \mid H = \text{pedigree}) = \sum_{\text{others}} P(E, \text{others} \mid H = \text{pedigree})$$
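The sum rule can be illustrated by brute-force enumeration for the smallest interesting pedigree: a father, a mother and a UI who is either their child or unrelated, at a single locus. This is only a sketch under stated assumptions: the allele frequencies are invented, mutation and measurement error are ignored, and the plain enumeration is not how an efficient Bayesian network engine would carry out the sum.

```python
from itertools import product

FREQ = {12: 0.2, 13: 0.3, 14: 0.5}  # hypothetical allele frequencies
ALLELES = list(FREQ)


def obs(genotype, x_f, x_m) -> float:
    """Error-free observation model on unordered pairs."""
    return 1.0 if sorted(genotype) == sorted((x_f, x_m)) else 0.0


def transmit(child, pat, mat) -> float:
    """Mendelian transmission without mutation (an assumption)."""
    return 0.5 * (float(child == pat) + float(child == mat))


def likelihood(g_father, g_mother, g_ui, related: bool) -> float:
    """P(E | H): sum the joint over all hidden allele assignments."""
    total = 0.0
    for ff, fm, mf, mm, cf, cm in product(ALLELES, repeat=6):
        p = FREQ[ff] * FREQ[fm] * FREQ[mf] * FREQ[mm]  # founder priors
        if related:   # H_p: UI is the child of this father and mother
            p *= transmit(cf, ff, fm) * transmit(cm, mf, mm)
        else:         # H_d: UI is an unrelated person
            p *= FREQ[cf] * FREQ[cm]
        p *= obs(g_father, ff, fm) * obs(g_mother, mf, mm) * obs(g_ui, cf, cm)
        total += p
    return total


evidence = ((12, 13), (14, 14), (13, 14))  # father, mother, UI genotypes
lr = likelihood(*evidence, related=True) / likelihood(*evidence, related=False)
print(lr)
```

Here the hidden variables ("others") are the six phased alleles; the evidence is the three observed genotypes.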

Inference intractable

Inference scales as

$$\#\text{loci} \times (\#\text{states per node})^{\text{clique size}}$$

- 20 loci
- Largest ladders ~ 50 states
- Clique size = O(5)
- $50^5 = 312{,}500{,}000$ → intractable
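The arithmetic behind this estimate can be checked directly. The variable names are illustrative; the three input numbers are the ones quoted on the slide.

```python
n_loci = 20            # number of loci
states_per_node = 50   # size of the largest allele ladders
clique_size = 5        # order of magnitude of the largest clique

# Number of entries in one clique table at a single locus.
clique_table = states_per_node ** clique_size
# Scaling estimate over all loci.
total = n_loci * clique_table

print(clique_table)  # 312500000
print(total)         # 6250000000
```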


Value abstraction

[Figure: allele states A, B, C, D, E reduced to A, B, Z]

#states = #observed states + 1

TRACTABLE

Data dependent networks
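One way to read the value abstraction is that alleles observed in the data keep their own state and all remaining alleles are lumped into a single state Z. The sketch below implements that reading; it is an interpretation of the slide, not Bonaparte's actual code. The frequencies are invented, the names `abstract_alleles` and `abstract_freq` are hypothetical, and summing the frequencies of the lumped alleles is an assumption.

```python
def abstract_alleles(observed, ladder, other="Z"):
    """Keep observed alleles as separate states and map every other
    allele of the ladder to one catch-all state."""
    kept = sorted(set(observed))
    mapping = {a: (a if a in kept else other) for a in ladder}
    return mapping, kept + [other]


def abstract_freq(freq, mapping):
    """Add up the frequencies of alleles that share an abstract state."""
    out = {}
    for allele, p in freq.items():
        state = mapping[allele]
        out[state] = out.get(state, 0.0) + p
    return out


ladder = ["A", "B", "C", "D", "E"]
observed = ["A", "B", "A", "B"]  # alleles seen in this pedigree's data
freq = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.25, "E": 0.15}  # hypothetical

mapping, states = abstract_alleles(observed, ladder)
reduced = abstract_freq(freq, mapping)
print(states, reduced)
```

Because the state space depends on which alleles were observed, the resulting network is built per data set, which fits the "data dependent networks" remark.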

REAL WORLD APPLICATION

Tripoli plane crash, May 12 2010

Source: http://www.bonaparte-dvi.com/en/documents/NFI-Tripoli-LR_52632_posterAO.pdf

Business Case

Infeasible ‘by hand’

Familial Search - Dragnet (Vaatstra)

Use of Bonaparte

- Victim identification using DNA
- System operational for daily use at NFI
- Familial Search (criminal database, +/- 200,000 profiles)
- Missing persons programs
- Unknown corpses (400 profiles)
- WW II victims
- Kinship Analysis (immigration cases)
- Wildlife (poaching)
- Nonhuman DNA (plant traces)
- Paternity testing




Bayesian networks as a transparent and flexible modeling approach

- Integrate data and domain knowledge
- Real world applications
- Potential for other domains: finance, life science, medical, …