Introduction to Conditional Random Fields
John Osborne
Sept 4, 2009
Overview
• Useful Definitions
• Background
  – HMM
  – MEMM
• Conditional Random Fields
  – Statistical and Graph Definitions
• Computation (Training and Inference)
• Extensions
  – Bayesian Conditional Random Fields
  – Hierarchical Conditional Random Fields
  – Semi-CRFs
• Future Directions
Useful Definitions
• Random Field (Wikipedia)
  – In probability theory, let $S = \{X_1, \ldots, X_n\}$, with the $X_i$ in $\{0, 1, \ldots, G-1\}$ being a set of random variables on the sample space $\Omega = \{0, 1, \ldots, G-1\}^n$. A probability measure $\pi$ is a random field if, for all $\omega$ in $\Omega$, $\pi(\omega) > 0$.
• Markov Process (called a Markov chain when states or time steps are discrete)
  – Stochastic process with the Markov property
• Markov Property
  – The probability that a random variable assumes a value depends on the other random variables only through the ones that are its immediate neighbors
  – “Memoryless”
• Hidden Markov Model (HMM)
  – Markov model in which the current state is unobserved
• Viterbi Algorithm
  – Dynamic programming technique that finds the most likely sequence of hidden states to explain the observations in an HMM
  – Used to determine labels
• Potential Function == Feature Function
  – In a CRF the potential function scores the compatibility of $y_t$, $y_{t-1}$ and $w_t(X)$
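The Viterbi definition above can be made concrete with a minimal sketch. The function name `viterbi`, the dictionary layout, and the toy Healthy/Fever model (a widely used textbook example) are assumptions for illustration, not part of the slides:

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, best_log_prob) for a discrete HMM."""
    # best[t][s]: log-probability of the best state path ending in s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model (illustrative numbers only).
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, log_prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path)  # ['Healthy', 'Healthy', 'Fever']
```

Working in log space avoids underflow on long sequences; the sketch assumes all probabilities it touches are non-zero.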
Background
• Interest in CRFs arose from Richa’s work with gene expression
• Current literature shows them performing better on NLP tasks than other commonly used NLP approaches like Support Vector Machines (SVM), neural networks, HMMs and others
  – Term coined by Lafferty et al. in 2001
• Predecessors were HMMs and maximum entropy Markov models (MEMMs)
HMM
– Definition
  • Markov model in which the current state is unobserved
– Generative model
– Examining all of the input X would be prohibitive, hence the Markov property of looking at only the current element in the sequence
– Cannot represent multiple interacting features or long-range dependencies
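As a sketch of what "generative" means here, an HMM is usually written as a joint distribution over labels and observations. The symbols $x_t$ (the observation at position t), $T$ (the sequence length) and $y_0$ (a fixed start state) are notation assumed for illustration:

```latex
P(X, Y) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}) \, P(x_t \mid y_t)
```

Each observation $x_t$ depends only on the current state $y_t$, which is why features that span several positions of the input cannot be expressed directly.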
MEMMs
– McCallum et al., 2000
– Non-generative finite-state model based on a next-state classifier
– Directed graph
– $P(Y \mid X) = \prod_t P(y_t \mid y_{t-1}, w_t(X))$, where $w_t(X)$ is a sliding window over the X sequence
Label Bias Problem
• Transitions leaving a given state compete only against each other, rather than against all other transitions in the model
• Implies “conservation of score mass” (Bottou, 1991)
• Observations can be ignored; Viterbi decoding can’t downgrade a branch
• CRFs solve this problem by having a single exponential model for the joint probability of the ENTIRE SEQUENCE OF LABELS given the observation sequence
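A small numeric sketch may make the label bias problem easier to see. It follows the spirit of the "rib"/"rob" example in Lafferty et al. (2001); the state names, the `score` function and its values are hypothetical, chosen only for illustration:

```python
import math

# Two paths through a tiny finite-state model: one spells "rib", one "rob".
SUCCESSORS = {"start": ["A1", "B1"], "A1": ["A2"], "A2": ["end"],
              "B1": ["B2"], "B2": ["end"]}
EXPECTED_CHAR = {"A1": "r", "A2": "i", "B1": "r", "B2": "o", "end": "b"}
PATHS = {"rib": ["A1", "A2", "end"], "rob": ["B1", "B2", "end"]}

def score(next_state, char):
    """Hypothetical raw score: reward a state whose expected character matches."""
    return 2.0 if EXPECTED_CHAR[next_state] == char else -2.0

def memm_path_prob(path, word):
    """Per-state normalization: scores compete only among a state's successors."""
    prob, prev = 1.0, "start"
    for state, char in zip(path, word):
        z = sum(math.exp(score(s, char)) for s in SUCCESSORS[prev])
        prob *= math.exp(score(state, char)) / z
        prev = state
    return prob

def crf_path_prob(path, word):
    """Global normalization: whole paths compete against each other."""
    def total(p):
        return sum(score(s, c) for s, c in zip(p, word))
    z = sum(math.exp(total(p)) for p in PATHS.values())
    return math.exp(total(path)) / z

word = "rob"
print("MEMM P(rib path | rob):", memm_path_prob(PATHS["rib"], word))
print("CRF  P(rib path | rob):", crf_path_prob(PATHS["rib"], word))
```

With local normalization, state A1 has a single successor, so the mismatching "o" costs the "rib" path nothing and both paths end up equally likely. With global normalization the mismatch lowers the whole path's score, so the "rob" path dominates.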
Big Picture Definition
• Wikipedia Definition (Aug 2009)
  – A conditional random field (CRF) is a type of discriminative probabilistic model most often used for the labeling or parsing of sequential data, such as natural language text or biological sequences.
• A probabilistic model is a statistical model, in math terms “a pair (Y, P) where Y is the set of possible observations and P the set of possible probability distributions on Y”
  – In statistics terms this means the objective is to infer (or pick) the distinct element (probability distribution) in the set P given an observation from Y
• A discriminative model models the conditional probability distribution $P(y \mid x)$, which can predict y given x.
  – It cannot go the other way around (produce x from y) since it is not a generative model (capable of generating sample data given a model): it does not model a joint probability distribution
  – Similar to other discriminative models like support vector machines and neural networks
• When analyzing sequential data, a conditional model specifies the probabilities of possible label sequences given an observation sequence
CRF Graphical Definition
Definition from Lafferty
• Undirected graphical model
• Let $G = (V, E)$ be a graph such that $Y = (Y_v)_{v \in V}$, so that Y is indexed by the vertices of G. Then (X, Y) is a conditional random field in case, when conditioned on X, the random variables $Y_v$ obey the Markov property with respect to the graph: $p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v)$, where $w \sim v$ means that w and v are neighbors in G
CRF Undirected Graph
Computation of CRF
• Training
  – Conditioning
  – Calculation of Feature Function
  – $P(Y \mid X) = \frac{1}{Z(X)} \exp \sum_t \Psi(y_t, y_{t-1}, w_t(X))$
    • $Z(X)$ is the normalizing factor
    • $\Psi(\cdot)$ is the potential function
• Inference
  – Viterbi Decoding
  – Approximate Model Averaging
  – Others?
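The conditional probability above can be evaluated without enumerating every label sequence, because the normalizing factor Z(X) factorizes along the chain. The following is a minimal sketch that assumes a two-label tag set, a window of width one (just the current token), and made-up potential weights; `psi`, `log_partition` and the other names are hypothetical:

```python
import itertools
import math

LABELS = ["N", "V"]

def psi(y_t, y_prev, window):
    """Hypothetical potential; the weights are invented for illustration."""
    score = 0.0
    if window.endswith("s") and y_t == "V":
        score += 1.0
    if window[0].isupper() and y_t == "N":
        score += 1.5
    if y_prev == "N" and y_t == "V":
        score += 0.5
    return score

def sequence_score(labels, xs):
    """Sum of potentials along one label sequence (y_prev is None at t = 0)."""
    prev, total = None, 0.0
    for y, x in zip(labels, xs):
        total += psi(y, prev, x)
        prev = y
    return total

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(xs):
    """log Z(X) via the forward recursion, O(T * |LABELS|^2)."""
    alpha = {y: psi(y, None, xs[0]) for y in LABELS}
    for x in xs[1:]:
        alpha = {y: log_sum_exp([alpha[p] + psi(y, p, x) for p in LABELS])
                 for y in LABELS}
    return log_sum_exp(list(alpha.values()))

def conditional_prob(labels, xs):
    return math.exp(sequence_score(labels, xs) - log_partition(xs))

xs = ["Dogs", "chase", "cats"]
# Brute-force check: enumerate all label sequences (feasible only for toy sizes).
brute = math.log(sum(math.exp(sequence_score(ys, xs))
                     for ys in itertools.product(LABELS, repeat=len(xs))))
print(abs(brute - log_partition(xs)) < 1e-9)
```

Replacing the log-sum-exp in the recursion with a max (plus back-pointers) turns the same loop into Viterbi decoding.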
Training Approaches
• CRF training is supervised learning, so one can train using
  – Maximum Likelihood (original paper)
    • Used iterative scaling method, was very slow
  – Gradient Ascent
    • Also slow when naïve
  – Mallet implementation used the BFGS algorithm
    • http://en.wikipedia.org/wiki/BFGS
    • Broyden-Fletcher-Goldfarb-Shanno
    • Approximate 2nd-order algorithm
  – Stochastic Gradient Method (2006) accelerated via Stochastic Meta Descent
  – Gradient Tree Boosting (variant of a 2001
    • http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf
    • Potential functions are sums of regression trees
      – Decision trees using real values
    • Published 2008
    • Competitive with Mallet
  – Bayesian (estimate posterior probability)
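The maximum likelihood and gradient-based approaches above climb the same conditional log-likelihood. As a sketch, assume the potential function is a weighted sum of feature functions, $\Psi = \sum_k \lambda_k f_k$; the weights $\lambda_k$, the features $f_k$ and the training pairs $(X^{(i)}, Y^{(i)})$ are notation introduced here for illustration. The objective and its gradient are then:

```latex
\begin{aligned}
\mathcal{L}(\lambda) &= \sum_i \Big[ \sum_t \sum_k \lambda_k f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, w_t(X^{(i)})\big) - \log Z(X^{(i)}) \Big] \\
\frac{\partial \mathcal{L}}{\partial \lambda_k} &= \sum_i \sum_t f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, w_t(X^{(i)})\big) - \sum_i \sum_t \mathbb{E}_{P(Y \mid X^{(i)})}\Big[ f_k\big(y_t, y_{t-1}, w_t(X^{(i)})\big) \Big]
\end{aligned}
```

The gradient is the observed feature count minus the feature count expected under the current model, so it vanishes when the two match. The optimizers listed above differ mainly in how they use this gradient.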
Conditional Random Field Extensions
Semi-CRF
• Semi-CRF
  – Instead of assigning labels to each member of the sequence, labels are assigned to sub-sequences
  – Advantage
  – “features for semi-CRF can measure properties of segments, and transition within a segment can be non-Markovian”
  – http://www.cs.cmu.edu/~wcohen/postscript/semiCRF.pdf
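To illustrate labeling sub-sequences rather than single elements, here is a minimal sketch of segment-level (semi-Markov) Viterbi decoding. The `segment_score` function, its weights, the label set and the `max_len` cap on segment length are all hypothetical; a trained semi-CRF would learn its segment features from data:

```python
def segment_score(tokens, start, end, label, prev_label):
    """Hypothetical potential for labeling tokens[start:end] as one segment."""
    segment = tokens[start:end]
    score = 0.0
    if label == "NAME":
        # Segment-level feature: every token in the segment is capitalized.
        score += 2.0 * len(segment) if all(t[0].isupper() for t in segment) else -5.0
    else:
        score += sum(1.0 if t[0].islower() else -1.0 for t in segment)
    if label == prev_label:
        score -= 0.5  # discourage splitting one segment into two with the same label
    return score

def semi_markov_viterbi(tokens, labels, max_len):
    n = len(tokens)
    # best[j][y]: best score of a segmentation of tokens[:j] whose last segment has label y
    best = [{} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    best[0][None] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            for prev_label, prev_score in best[start].items():
                for label in labels:
                    cand = prev_score + segment_score(tokens, start, end, label, prev_label)
                    if cand > best[end].get(label, float("-inf")):
                        best[end][label] = cand
                        back[end][label] = (start, prev_label)
    # Recover the segments by following back-pointers from the end.
    label = max(best[n], key=best[n].get)
    total = best[n][label]
    segments, end = [], n
    while end > 0:
        start, prev_label = back[end][label]
        segments.append((start, end, label))
        end, label = start, prev_label
    segments.reverse()
    return segments, total

tokens = ["met", "John", "Osborne", "today"]
segments, total = semi_markov_viterbi(tokens, ["NAME", "O"], max_len=3)
print(segments)  # [(0, 1, 'O'), (1, 3, 'NAME'), (3, 4, 'O')]
```

Because the score sees a whole segment at once, a feature such as "all tokens are capitalized" can be expressed directly, which a per-token model can only approximate.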
Bayesian CRF
• Qi et al. (2005)
• http://www.cs.purdue.edu/homes/alanqi/papers/Qi-Bayesian-CRF-AIstat05.pdf
• Replacement for the ML method of Lafferty
• Reduces over-fitting
• “Power EP Method”
Hierarchical CRF (HCRF)
• http://www.springerlink.com/content/r84055k2754464v5/
• http://www.cs.washington.edu/homes/fox/postscripts/places-isrr-05.pdf
• GPS motion, for surveillance, tracking, dividing people’s workday into labels of work, travel, sleep, etc.
• Less work
Future Directions
• Less work on conditional random fields in biology
  – PubMed hits
    • Conditional Random Field - 21
    • Conditional Random Fields - 43
  – CRF variants & promoter/regulatory element show no hits
• CRF and ontology show no hits
• Plan
  – Implement CRF in Java, apply to biology problems, try to find ways to extend?
Useful Papers
• Link to original paper and review paper
  – http://www.inference.phy.cam.ac.uk/hmw26/crf/
  – Review paper:
    • http://www.inference.phy.cam.ac.uk/hmw26/papers/crf_intro.pdf
• Another review
  – http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
• Review slides
  – http://www.cs.pitt.edu/~mrotaru/comp/nlp/Random%20Fields/Tutorial%20CRF%20Lafferty.pdf
• The boosting paper has a nice review
  – http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf