How to compute a conditional random field

Oct 24, 2013 (3 years and 11 months ago)

Introduction to Conditional Random Fields

John Osborne

Sept 4, 2009

Overview


Useful Definitions


Background


HMM


MEMM


Conditional Random Fields


Statistical and Graph Definitions


Computation (Training and Inference)


Extensions


Bayesian Conditional Random Fields


Hierarchical Conditional Random Fields


Semi-CRFs



Future Directions

Useful Definitions


Random Field (wikipedia)


In probability theory, let $S = \{X_1, \ldots, X_n\}$, with the $X_i$ in $\{0, 1, \ldots, G - 1\}$, being a set of random variables on the sample space $\Omega = \{0, 1, \ldots, G - 1\}^n$. A probability measure $\pi$ is a random field if, for all $\omega$ in $\Omega$, $\pi(\omega) > 0$.


Markov Process (chain if finite sequence)


Stochastic process with Markov property


Markov Property


The probability that a random variable assumes a value depends on the other
random variables only through the ones that are its immediate neighbors



memoryless



Hidden Markov Model (HMM)


Markov Model where the current state is unobserved


Viterbi Algorithm


Dynamic programming technique to discover the most likely sequence of hidden
states that explains the observed sequence in an HMM


Determine labels


Potential Function == Feature Function


In a CRF the potential function scores the compatibility of $y_t$, $y_{t-1}$ and $w_t(X)$
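
The Viterbi algorithm defined above can be sketched in a few lines of Python. This is only a minimal illustration, not an implementation taken from the talk: the two-state model, its state names and all of its probabilities are made-up toy values, and plain probabilities are used for readability where a practical version would work in log space.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[t][s] = probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers to recover the path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy HMM (all numbers invented for illustration).
STATES = ["Healthy", "Fever"]
START_P = {"Healthy": 0.6, "Fever": 0.4}
TRANS_P = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
EMIT_P = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

prob, path = viterbi(["normal", "cold", "dizzy"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

The returned path is the label sequence ("Determine labels" above); the same recursion applies to a linear-chain CRF once the products of probabilities are replaced by sums of potential-function scores.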

Background


Interest in CRFs arose from Richa’s work with gene expression

Current literature shows them performing better
on NLP tasks than other commonly used NLP
approaches like Support Vector Machines (SVM),
neural networks, HMMs and others


Term coined by Lafferty et al. in 2001


Predecessors were HMMs and maximum entropy
Markov models (MEMMs)

HMM


Definition


Markov Model where the current state is
unobserved


Generative Model


Examining all of the input X would be
prohibitive, hence the Markov property:
only the current element in the
sequence is examined


Cannot represent multiple interacting features
or long-range dependencies
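
What "generative" means here can be made concrete with the standard first-order HMM factorization of the joint distribution. The notation is an assumption chosen to match the later slides: $y_t$ is the hidden state and $x_t$ the observation at position $t$, with $P(y_1 \mid y_0)$ read as the initial state distribution.

```latex
P(X, Y) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}) \, P(x_t \mid y_t)
```

Each observation $x_t$ depends only on the current state $y_t$, which is why features spanning several observations, or dependencies over long ranges, do not fit this model directly.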




MEMMs


McCallum et al, 2000


Non-generative finite-state model based on
next-state classifier


Directed graph


$P(Y \mid X) = \prod_t P(y_t \mid y_{t-1}, w_t(X))$, where $w_t(X)$ is a
sliding window over the X sequence


Label Bias Problem


Transitions leaving a given state compete only
against each other, rather than against all other
transitions in the model


Implies “Conservation of score mass” (Bottou, 1991)


Observations can be ignored, Viterbi decoding
can’t downgrade a branch


CRF will solve this problem by having a single exponential
model for the joint probability of the ENTIRE SEQUENCE OF
LABELS given the observation sequence
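
The label bias problem can be seen in a small numerical sketch. Everything in it is hypothetical: a made-up two-branch automaton (one branch spelling "rib", the other "rob"), an invented scoring function, and an arbitrary 0.5 bonus that favours the "rib" branch at the first step. The first function normalizes per state, as an MEMM does; the second normalizes once over whole paths, as a CRF does.

```python
import math

# Hypothetical automaton: state 0 is the start; each other state "expects" one letter.
SUCCESSORS = {0: [1, 4], 1: [2], 2: [3], 4: [5], 5: [3]}
EMITS = {1: "r", 2: "i", 3: "b", 4: "r", 5: "o"}
PATHS = {"rib": (1, 2, 3), "rob": (4, 5, 3)}


def score(prev, nxt, obs):
    """Invented score: reward a matching observation, slightly favour 0 -> 1."""
    s = 2.0 if EMITS[nxt] == obs else 0.0
    if (prev, nxt) == (0, 1):
        s += 0.5
    return s


def memm_path_prob(path, word):
    """Per-state (local) normalization, as in an MEMM."""
    prob, prev = 1.0, 0
    for nxt, obs in zip(path, word):
        z = sum(math.exp(score(prev, s, obs)) for s in SUCCESSORS[prev])
        prob *= math.exp(score(prev, nxt, obs)) / z
        prev = nxt
    return prob


def crf_path_prob(path, word):
    """One global normalization over all complete paths, as in a CRF."""
    def total(p):
        states = (0,) + p
        return sum(score(a, b, o) for a, b, o in zip(states, states[1:], word))
    z = sum(math.exp(total(p)) for p in PATHS.values())
    return math.exp(total(path)) / z


for word in ("rib", "rob"):
    print(word,
          "MEMM P(rib path) =", round(memm_path_prob(PATHS["rib"], word), 3),
          "CRF P(rib path) =", round(crf_path_prob(PATHS["rib"], word), 3))
```

In this toy setup states 1 and 4 each have a single outgoing transition, so local normalization gives that transition probability 1 whatever the second letter is: the MEMM assigns the "rib" path the same probability for both words. The globally normalized score lets the mismatch on the second letter lower the "rib" path when the word is "rob".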

Big Picture Definition


Wikipedia Definition (Aug 2009)


A conditional random field (CRF) is a type of discriminative probabilistic
model most often used for the labeling or parsing of sequential data, such as
natural language text or biological sequences.


Probabilistic model is a statistical model, in math terms “a pair (Y, P) where
Y is the set of possible observations and P the set of possible probability
distributions on Y”



In statistics terms this means the objective is to infer (or pick) the distinct
element (probability distribution) in the set “P” given your observation Y


Discriminative model meaning it models the conditional probability
distribution P(y|x) which can predict y given x.


It cannot go the other way around (produce x from y), since it is not a
generative model (one capable of generating sample data): it does
not model a joint probability distribution


Similar to other discriminative models like support vector machines and neural
networks


When analyzing sequential data a conditional model specifies the
probabilities of possible label sequences given an observation sequence

CRF Graphical Definition

Definition from Lafferty


Undirected graphical model


Let $G = (V, E)$ be a graph such that $Y = (Y_v)_{v \in V}$, so that Y is
indexed by the vertices of G. Then (X, Y) is a conditional random field in
case, when conditioned on X, the random variables $Y_v$ obey the Markov
property with respect to the graph:
$p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v)$,
where $w \sim v$ means that w and v are neighbors in G

CRF Undirected Graph

Computation of CRF


Training


Conditioning


Calculation of Feature Function


$P(Y \mid X) = \frac{1}{Z(X)} \exp \sum_t \Psi(y_t, y_{t-1}, w_t(X))$


Z is normalizing factor


Potential function in parentheses


Inference


Viterbi Decoding


Approximate Model Averaging


Others?
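
The conditional probability from the Training bullets above can be computed directly for a tiny linear-chain CRF. The sketch below is illustrative only: the label set, the hand-written potential function `psi` and its weights are all hypothetical, and the window $w_t(X)$ is reduced to the single word at position t. The normalizing factor Z(X) is computed twice, by brute-force enumeration of every label sequence and by a forward recursion, to show that the two agree.

```python
import itertools
import math

LABELS = ["N", "V"]  # hypothetical label set


def psi(y_t, y_prev, x, t):
    """Hypothetical potential: scores y_t, y_{t-1} and the word at position t."""
    word = x[t]
    score = 0.0
    if y_t == "N" and word[0].isupper():
        score += 1.5
    if y_t == "V" and word.endswith("e"):
        score += 1.0
    if y_prev == "N" and y_t == "V":
        score += 0.5
    return score


def sequence_score(y, x):
    """Sum of potentials over all positions (the exponent in P(Y|X))."""
    return sum(psi(y[t], y[t - 1] if t > 0 else None, x, t) for t in range(len(x)))


def partition_brute_force(x):
    """Z(X) by enumerating every label sequence: exponential in len(x)."""
    return sum(math.exp(sequence_score(y, x))
               for y in itertools.product(LABELS, repeat=len(x)))


def partition_forward(x):
    """Z(X) by the forward recursion: linear in len(x)."""
    alpha = {y: math.exp(psi(y, None, x, 0)) for y in LABELS}
    for t in range(1, len(x)):
        alpha = {y: sum(alpha[yp] * math.exp(psi(y, yp, x, t)) for yp in LABELS)
                 for y in LABELS}
    return sum(alpha.values())


def conditional_prob(y, x):
    """P(Y|X) = exp(sum_t psi) / Z(X)."""
    return math.exp(sequence_score(y, x)) / partition_forward(x)


x = ["Dogs", "chase", "cats"]
print(conditional_prob(("N", "V", "N"), x))
```

Enumeration is only feasible for very short sequences; the forward recursion is what makes training and inference tractable, and replacing its sum with a max gives Viterbi decoding.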

Training Approaches


CRF is supervised learning so can train using


Maximum Likelihood (original paper)


Used iterative scaling method, was very slow


Gradient Ascent


Also slow when naïve


Mallet Implementation used BFGS algorithm


http://en.wikipedia.org/wiki/BFGS


Broyden-Fletcher-Goldfarb-Shanno


Approximate 2nd order algorithm


Stochastic Gradient Method (2006) accelerated via Stochastic Meta Descent


Gradient Tree Boosting (variant of a 2001


http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf


Potential functions are sums of regression trees


Decision trees using real values


Published 2008


Competitive with Mallet


Bayesian (estimate posterior probability)
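
The maximum likelihood and gradient-based approaches listed above share one objective, which can be written out under an assumption the slides do not state explicitly: that the potential function is log-linear, $\Psi(y_t, y_{t-1}, w_t(X)) = \sum_k \lambda_k f_k(y_t, y_{t-1}, w_t(X))$, with hypothetical feature functions $f_k$ and weights $\lambda_k$. For training pairs $(X^{(i)}, Y^{(i)})$, the conditional log-likelihood and its gradient are then:

```latex
\mathcal{L}(\lambda)
  = \sum_i \Big[ \sum_t \Psi\big(y^{(i)}_t, y^{(i)}_{t-1}, w_t(X^{(i)})\big) - \log Z(X^{(i)}) \Big]

\frac{\partial \mathcal{L}}{\partial \lambda_k}
  = \sum_i \Big[ \sum_t f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, w_t(X^{(i)})\big)
  - \mathbb{E}_{Y' \sim P(\cdot \mid X^{(i)})} \sum_t f_k\big(y'_t, y'_{t-1}, w_t(X^{(i)})\big) \Big]
```

The gradient is the observed feature count minus the feature count expected under the current model. Gradient ascent repeats the update $\lambda \leftarrow \lambda + \eta \, \nabla \mathcal{L}$ for some step size $\eta$; BFGS uses the same gradient but also builds an approximation to the second-order information.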

Conditional Random Field Extensions

Semi-CRF


Semi-CRF


Instead of assigning labels to each member of the
sequence, labels are assigned to sub-sequences


Advantage


“features for semi-CRF can measure
properties of segments, and transition within a
segment can be non-Markovian”


http://www.cs.cmu.edu/~wcohen/postscript/semiCRF.pdf
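
Labeling sub-sequences rather than single elements changes the decoding step: the dynamic program has to consider every possible segment ending at each position. The sketch below is a hypothetical illustration, not the method from the linked paper: the labels, the tokens, the scoring weights and the `max_len` bound are all invented. It shows a segment-level feature (a bonus for NAME segments of exactly two tokens) that a per-token model could not express.

```python
def segment_score(tokens, start, end, label, prev_label):
    """Invented score for labeling tokens[start:end] as one segment."""
    seg = tokens[start:end]
    caps = sum(tok[0].isupper() for tok in seg)
    if label == "NAME":
        score = 1.0 * caps - 2.0 * (len(seg) - caps)
        if len(seg) == 2:
            score += 1.0  # segment-level feature: two-token names are favoured
        if prev_label == "NAME":
            score -= 0.5  # transition feature between adjacent segments
    else:  # label "O"
        score = 1.0 * (len(seg) - caps) - 1.0 * caps
    return score


def best_segmentation(tokens, labels, max_len):
    """Semi-Markov Viterbi: best labeled segmentation with segments <= max_len."""
    n = len(tokens)
    # best[j][y] = (score, (i, y_prev)) for the best segmentation of tokens[:j]
    # whose last segment is tokens[i:j] with label y
    best = [dict() for _ in range(n + 1)]
    best[0] = {None: (0.0, None)}
    for j in range(1, n + 1):
        for y in labels:
            candidates = []
            for d in range(1, min(max_len, j) + 1):
                i = j - d
                for y_prev, (prev_score, _) in best[i].items():
                    s = prev_score + segment_score(tokens, i, j, y, y_prev)
                    candidates.append((s, (i, y_prev)))
            best[j][y] = max(candidates, key=lambda c: c[0])
    # Trace the back-pointers from the best final label.
    y = max(best[n], key=lambda lab: best[n][lab][0])
    total, j, segments = best[n][y][0], n, []
    while j > 0:
        _, (i, y_prev) = best[j][y]
        segments.append((i, j, y))
        j, y = i, y_prev
    segments.reverse()
    return total, segments


TOKENS = ["met", "John", "Smith", "in", "New", "York"]
LABELS = ["O", "NAME"]
print(best_segmentation(TOKENS, LABELS, max_len=3))
```

Each returned triple is (start, end, label) with an exclusive end index, so the two multi-token names come back as single NAME segments rather than as runs of per-token labels.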


Bayesian CRF


Qi et al. (2005)


http://www.cs.purdue.edu/homes/alanqi/papers/Qi-Bayesian-CRF-AIstat05.pdf


Replacement for ML method of Lafferty


Reducing over-fitting


“Power EP Method”


Hierarchical CRF (HCRF)



http://www.springerlink.com/content/r84055k2754464v5/


http://www.cs.washington.edu/homes/fox/postscripts/places-isrr-05.pdf


GPS motion data, for surveillance, tracking, dividing
people’s workday into labels of work, travel,
sleep, etc.


Less work

Future Directions


Less work on conditional random fields in biology


PubMed hits


Conditional Random Field - 21


Conditional Random Fields - 43


CRF variants & promoter/regulatory element shows
no hits


CRF and ontology show no hits


Plan


Implement CRF in Java, apply to biology problems, try
to find ways to extend?


Useful Papers


Link to original paper and review paper


http://www.inference.phy.cam.ac.uk/hmw26/crf/


Review paper:


http://www.inference.phy.cam.ac.uk/hmw26/papers/crf_intro.pdf


Another review


http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf


Review slides


http://www.cs.pitt.edu/~mrotaru/comp/nlp/Random%20Fields/Tutorial%20CRF%20Lafferty.pdf


The boosting paper has a nice review


http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf