DNA Matching as an Example of Bayesian Inference
Bert Kappen, Willem Burgers, Wim Wiegerinck
Bayesian methods for BD
• Combine
  – Data
  – Domain knowledge
The Identification Problem
Given: (e.g.) a disaster with
• A lot of DNA profiles of victims; these are the unidentified individuals (UI)
• A lot of pedigrees (= family trees), each with one or more missing persons (MP) and reference DNA profiles of family members
Question: Which UI matches with which MP?
Bonaparte calculates the likelihood ratio as a measure of ‘fit’ in a pedigree:
$LR = P(E \mid H_p) / P(E \mid H_d)$
$H_p$: UI is MP
$H_d$: UI and MP are unrelated
STR
Human genome
• 99.7% of DNA is the same, 0.3% differs
• Hypervariable regions, STR
  – Non-coding
• Measurable by PCR (Polymerase Chain Reaction)
STR (Short Tandem Repeat)
Example profiles, one row per locus (left column: paternal alleles; right column: maternal alleles):

father    mother    daughter
14 17     16 16     14 16
18 16     17 18     18 18
12 13     12 13     13 12
17 17     23 15     17 23
30 32     29 31     32 31
14 14     14 15     14 15
Y  X      X  X      X  X

• Independent inheritance of chromosomes
• Independent inheritance at each locus
W.Wiegerinck, W. Burgers
Gregor Johann Mendel (1822-1884)
Inheritance
Mutation
How to model the probabilities?
$P(G_v, G_m, G_{UI} \mid \text{UI = child of } v \text{ and } m)$
$P(G_v, G_m, G_{UI} \mid \text{UI = unrelated person})$
…
• Depends on the pedigree
• Needs a model that can be generated automatically
  – given pedigree and data input
  – for any possible pedigree
Bayesian networks
Bonaparte Bayesian network
• Modeling:
  – Which are the variables?
    • Observables, hiddens?
  – What are the relations?
• Bonaparte
  – DNA transmission model
  – DNA priors
  – Genotype observation model
DNA transmission model
This is for a single locus; inheritance is independent at each locus.
[Figure: nodes "Paternal Allele Father" and "Maternal Allele Father" point to "Paternal Allele Child"; pedigree of individual i with father f(i) and mother m(i). Example: the father has alleles 14 and 17; the child's paternal allele is 14 (or 17).]
$X^f_{f(i)}$ = paternal allele of the father of $i$
$X^m_{f(i)}$ = maternal allele of the father of $i$
$X^f_i$ = paternal allele of $i$
$P(X^f_i \mid X^f_{f(i)}, X^m_{f(i)}) = \tfrac{1}{2} \left( P(X^f_i \mid X^f_{f(i)}) + P(X^f_i \mid X^m_{f(i)}) \right)$
$P(X^f_i = a \mid X^f_{f(i)} = b)$ etc.: probability that the paternal allele is $a$ when the transmitted allele is $b$. (Mutation model: more later.)
Per locus: the father transmits either his paternal allele or his maternal allele (idem for mother).
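The transmission rule can be sketched in a few lines of Python. This is only an illustration: the slides defer the mutation model, so the uniform mutation model below, its rate `mu`, the allele list `ALLELES` and the function names are all assumptions made for this sketch, not Bonaparte's actual model.

```python
# Toy allele ladder for one locus (hypothetical values).
ALLELES = [14, 15, 16, 17]


def mutation_prob(child_allele, transmitted_allele, alleles, mu):
    """P(child allele = a | transmitted allele = b).

    Assumed toy model: the allele is copied with probability 1 - mu,
    otherwise it mutates uniformly to one of the other alleles.
    Requires at least two alleles in the ladder.
    """
    if child_allele == transmitted_allele:
        return 1.0 - mu
    return mu / (len(alleles) - 1)


def transmission_prob(child_allele, parent_pat, parent_mat, alleles, mu=0.001):
    """P(X_i = a | parent's paternal allele, parent's maternal allele).

    The parent transmits either allele with probability 1/2, so the result
    is the average of the two single-allele mutation probabilities.
    """
    return 0.5 * (
        mutation_prob(child_allele, parent_pat, alleles, mu)
        + mutation_prob(child_allele, parent_mat, alleles, mu)
    )


# Distribution of the child's paternal allele for a father with alleles (14, 17).
dist = {a: transmission_prob(a, 14, 17, ALLELES) for a in ALLELES}
```

With `mu=0.0` the sketch reduces to pure Mendelian inheritance: each of the father's two alleles is passed on with probability 1/2.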
Population priors
No parents in the pedigree:
$P(X^f_i, X^m_i) = P(X^f_i) \, P(X^m_i)$
(assumes sufficient mixing, etc.; other models are possible)
$P(X^f_i)$, $P(X^m_i)$ based on population statistics
Other possibilities: subpopulations, etc.
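As a small illustration of the founder prior, the sketch below multiplies two allele frequencies. The frequency table `ALLELE_FREQ` is made up for the example (real values would come from population statistics), and the function name is hypothetical.

```python
# Made-up allele frequencies for one locus; they sum to 1.
ALLELE_FREQ = {13: 0.25, 14: 0.35, 15: 0.40}


def founder_prior(pat, mat, freq):
    """P(X^f = pat, X^m = mat) for an individual without parents in the pedigree.

    Assumes the two alleles are independent draws from the population
    (the 'sufficient mixing' assumption).
    """
    return freq.get(pat, 0.0) * freq.get(mat, 0.0)


# The prior over all ordered allele pairs is a proper distribution.
total = sum(
    founder_prior(a, b, ALLELE_FREQ) for a in ALLELE_FREQ for b in ALLELE_FREQ
)
```

A subpopulation model would replace the plain product with a correlated prior; that variant is not shown here.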
Population statistics
The same observed profile is compatible with different phase assignments (left column: paternal alleles; right column: maternal alleles):

13   14          14   13
17   14          17   14
11   11          11   11
13   16    or    16   13    or …
11   10          10   11
23.2 25          23.2 25
Y    X           Y    X

• D19S433 = {13,14}, …
• Only unordered pairs are observed (phase is lost: {13,14} = {14,13})
• Important to keep track of family relations
Genotype observation model
[Figure: nodes "Paternal Allele i" and "Maternal Allele i" point to "Genotype i"]
• $P(G_i = \{A,B\} \mid X^f_i = A, X^m_i = B) = 1$
• $P(G_i = \{A,B\} \mid X^f_i = B, X^m_i = A) = 1$
• 0 otherwise
• More general, with measurement errors:
  $P(G_i = \{A,B\} \mid X^f_i = C, X^m_i = D) = P(G_i = \{A,B\} \mid X^f_i = D, X^m_i = C) = \ldots$
Alleles (A,B) → genotype {A,B}.
Alleles (B,A) → genotype {A,B}.
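The error-free observation model is a deterministic check that ignores phase. A minimal sketch, with a hypothetical function name and without the measurement-error extension:

```python
def genotype_prob(genotype, pat, mat):
    """P(G = genotype | X^f = pat, X^m = mat) without measurement errors.

    The genotype is an unordered pair, so both are sorted before comparing:
    alleles (A, B) and (B, A) give the same genotype {A, B}.
    """
    return 1.0 if tuple(sorted(genotype)) == tuple(sorted((pat, mat))) else 0.0


# Phase is lost: both orderings explain the observed genotype {13, 14}.
same_phase = genotype_prob((13, 14), 13, 14)
swapped_phase = genotype_prob((13, 14), 14, 13)
mismatch = genotype_prob((13, 14), 13, 13)
```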
Bayesian network of a pedigree
[Figure: network for a father, mother and child. Nodes: Paternal Allele Father, Maternal Allele Father, Genotype Father; Paternal Allele Mother, Maternal Allele Mother, Genotype Mother; Paternal Allele Child, Maternal Allele Child, Genotype Child.]
Bayesian network of a pedigree
SUM RULE: $P(E \mid H = \text{pedigree}) = \sum_{\text{others}} P(E, \text{others} \mid H = \text{pedigree})$
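For a single locus and a father-mother-child pedigree, the sum rule can be carried out by brute-force enumeration of the hidden alleles. The sketch below does this and forms the likelihood ratio for "UI is the child" against "UI is unrelated". It assumes made-up allele frequencies (`FREQ`), no mutation and no measurement error; all names are hypothetical, and a real system would use proper Bayesian network inference rather than full enumeration.

```python
from itertools import product

FREQ = {13: 0.2, 14: 0.3, 15: 0.5}  # made-up allele frequencies, one locus
ALLELES = list(FREQ)


def prior(pat, mat):
    """Founder prior: independent draws from the population."""
    return FREQ[pat] * FREQ[mat]


def transmit(child, parent_pat, parent_mat):
    """P(child allele | parent's two alleles), assuming no mutation."""
    return 0.5 * ((child == parent_pat) + (child == parent_mat))


def observe(genotype, pat, mat):
    """P(genotype | ordered alleles), assuming no measurement error."""
    return 1.0 if tuple(sorted(genotype)) == tuple(sorted((pat, mat))) else 0.0


def founder_likelihood(genotype):
    """P(genotype) of an unrelated person: sum over the hidden ordered alleles."""
    return sum(
        prior(p, m) * observe(genotype, p, m) for p, m in product(ALLELES, repeat=2)
    )


def likelihood_child(g_father, g_mother, g_ui):
    """P(E | H_p): sum over all hidden alleles of father, mother and child."""
    total = 0.0
    for ff, fm, mf, mm, cf, cm in product(ALLELES, repeat=6):
        total += (
            prior(ff, fm) * observe(g_father, ff, fm)
            * prior(mf, mm) * observe(g_mother, mf, mm)
            * transmit(cf, ff, fm) * transmit(cm, mf, mm)
            * observe(g_ui, cf, cm)
        )
    return total


def likelihood_unrelated(g_father, g_mother, g_ui):
    """P(E | H_d): the three genotypes are independent founders."""
    return (
        founder_likelihood(g_father)
        * founder_likelihood(g_mother)
        * founder_likelihood(g_ui)
    )


def likelihood_ratio(g_father, g_mother, g_ui):
    return likelihood_child(g_father, g_mother, g_ui) / likelihood_unrelated(
        g_father, g_mother, g_ui
    )


lr_match = likelihood_ratio((13, 14), (15, 15), (14, 15))
lr_excluded = likelihood_ratio((13, 14), (15, 15), (13, 13))
```

In this toy setting the matching candidate gets a ratio above 1 and the candidate with genotype {13,13} gets 0, because without mutation the mother cannot have passed on allele 13. Under the independent-loci assumption stated earlier, per-locus ratios would be multiplied to get the overall ratio.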
Inference intractable
• Inference scales as
  – # loci × (# states per node)^(clique size)
  – 20 loci
  – Largest ladders ~ 50 states
  – Clique size = O(5)
$50^5 = 312\,500\,000$: intractable
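The arithmetic behind this estimate is easy to reproduce. The snippet below uses only the numbers quoted on the slide; the variable names are chosen for illustration.

```python
n_loci = 20            # number of loci
states_per_node = 50   # size of the largest allele ladder
clique_size = 5        # order of magnitude of the largest clique

# Entries in one clique table for one locus.
table_entries = states_per_node ** clique_size

# Scaling from the slide: # loci x (# states per node)^(clique size).
total_entries = n_loci * table_entries

print(f"{table_entries:,} entries per clique, {total_entries:,} over all loci")
```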
Value abstraction
A B A E B D C
A B A B Z
#states = #observed states + 1
TRACTABLE
Data dependent networks
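One way to read value abstraction: alleles that occur in the observed profiles keep their own state, and every other allele of the ladder is merged into a single extra state Z, so the state space depends on the data. The sketch below is an interpretation of the slide, not Bonaparte's implementation; the function names and the uniform example frequencies are assumptions, and it only merges the priors (a mutation model would need matching treatment).

```python
def abstract_alleles(observed_genotypes, ladder, other="Z"):
    """Map each ladder allele to itself if observed, else to the merged state."""
    observed = sorted({a for g in observed_genotypes for a in g})
    mapping = {a: (a if a in observed else other) for a in ladder}
    states = observed + [other]  # #states = #observed states + 1
    return mapping, states


def abstract_frequencies(freq, mapping):
    """Sum the population frequencies of all alleles merged into one state."""
    merged = {}
    for allele, p in freq.items():
        key = mapping[allele]
        merged[key] = merged.get(key, 0.0) + p
    return merged


ladder = ["A", "B", "C", "D", "E"]
freq = {a: 0.2 for a in ladder}            # made-up uniform frequencies
profiles = [("A", "B"), ("A", "B")]        # observed genotypes at this locus

mapping, states = abstract_alleles(profiles, ladder)
merged_freq = abstract_frequencies(freq, mapping)
```

With a 50-allele ladder and only a handful of observed alleles per locus, the clique tables shrink from 50^5 entries to a few thousand, which is what makes exact inference feasible.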
REAL WORLD APPLICATION
Tripoli plane crash, May 12 2010
Source: http://www.bonaparte-dvi.com/en/documents/NFI-Tripoli-LR_52632_posterAO.pdf
Business Case
Infeasible ‘by hand’
Familial Search - Dragnet (Vaatstra)
Use of Bonaparte
• Victim identification using DNA
  – System operational for daily use at NFI
• Familial Search (criminal database, +/- 200,000 profiles)
• Missing persons programs
  – Unknown corpses (400 profiles)
  – WW II victims
• Kinship Analysis (immigration cases)
• Wildlife (poaching)
• Non-human DNA (plant traces)
• Paternity testing
• Bayesian networks as a
  – Transparent and flexible modeling approach
  – Way to integrate data and domain knowledge
• Real world applications
• Potential for other domains:
  – Finance, life science, medical, …