How to compute a conditional random field

Oct 24, 2013

Introduction to Conditional Random Fields

John Osborne

Sept 4, 2009

Overview

Useful Definitions

Background

HMM

MEMM

Conditional Random Fields

Statistical and Graph Definitions

Computation (Training and Inference)

Extensions

Bayesian Conditional Random Fields

Hierarchical Conditional Random Fields

Semi-CRFs

Future Directions

Useful Definitions

Random Field (wikipedia)

In probability theory, let $S = \{X_1, \ldots, X_n\}$, with the $X_i$ in $\{0, 1, \ldots, G-1\}$, be a set of random variables on the sample space $\Omega = \{0, 1, \ldots, G-1\}^n$. A probability measure $\pi$ is a random field if, for all $\omega$ in $\Omega$, $\pi(\omega) > 0$.

Markov Process (chain if finite sequence)

Stochastic process with Markov property

Markov Property

The probability that a random variable assumes a value depends on the other
random variables only through the ones that are its immediate neighbors

memoryless

Hidden Markov Model (HMM)

Markov Model where the current state is unobserved

Viterbi Algorithm

Dynamic programming technique to discover the most likely sequence of hidden states
that explains the observations in an HMM

Determine labels
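
As a rough illustration of the Viterbi definition above, here is a minimal sketch for a toy HMM. The state names, observation names and all probabilities are hypothetical values chosen for the example, not taken from the slides.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[t][s] = (probability of the best path ending in s at time t, predecessor)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # choose the predecessor q that maximizes the path probability into s
            p, q = max((prev[q][0] * trans_p[q][s], q) for q in states)
            column[s] = (p * emit_p[s][obs], q)
        best.append(column)
    # backtrack from the best final state
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path


# Hypothetical toy model: two hidden states, three possible observations.
STATES = ("Healthy", "Fever")
START_P = {"Healthy": 0.6, "Fever": 0.4}
TRANS_P = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
EMIT_P = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

prob, path = viterbi(("normal", "cold", "dizzy"), STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

The table `best` holds one column per time step, so the cost is proportional to the sequence length times the square of the number of states, rather than exponential in the sequence length.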

Potential Function == Feature Function

In a CRF the potential function scores the compatibility of $y_t$, $y_{t-1}$ and $w_t(X)$

Background

Interest in CRFs arose from Richa’s work with gene expression

Current literature shows them performing better
on NLP tasks than other commonly used NLP
approaches like Support Vector Machines (SVM),
neural networks, HMMs and others

Term coined by Lafferty et al. in 2001

Predecessors were HMMs and maximum entropy
Markov models (MEMMs)

HMM

Definition

Markov Model where the current state is
unobserved

Generative Model

Examining all of the input X would be
prohibitive, hence the Markov property:
only the current element in the
sequence is considered

Cannot handle multiple interacting features
or long-range dependencies

MEMMs

McCallum et al., 2000

Non-generative finite-state model based on a next-state classifier

Directed graph

$P(Y \mid X) = \prod_t P(y_t \mid y_{t-1}, w_t(X))$, where $w_t(X)$ is a
sliding window over the X sequence

Label Bias Problem

Transitions leaving a given state compete only
against each other, rather than against all other
transitions in the model

Implies “conservation of score mass” (Bottou, 1991)

Observations can be effectively ignored during Viterbi decoding

CRF will solve this problem by having a single exponential
model for the joint probability of the ENTIRE SEQUENCE OF
LABELS given the observation sequence
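
The label bias effect above can be sketched in a few lines. The function below locally normalizes the scores of the transitions leaving one state, as a next-state classifier would. The state names and raw scores are hypothetical. When a state has a single outgoing transition, that transition receives probability 1 whatever the observation says.

```python
import math


def next_state_probs(scores):
    """Per-state (local) normalization of raw transition scores, softmax style."""
    z = sum(math.exp(v) for v in scores.values())
    return {nxt: math.exp(v) / z for nxt, v in scores.items()}


# Hypothetical state "A" has only one outgoing transition, to "B".
# The raw score stands in for how well the observation supports the transition.
supported = next_state_probs({"B": 5.0})      # observation strongly supports A -> B
contradicted = next_state_probs({"B": -5.0})  # observation strongly contradicts A -> B

print(supported, contradicted)  # the observation makes no difference
```

A CRF avoids this by normalizing once over whole label sequences, so a poorly supported transition lowers the score of every sequence that uses it.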

Big Picture Definition

Wikipedia Definition (Aug 2009)

A conditional random field (CRF) is a type of discriminative probabilistic
model most often used for the labeling or parsing of sequential data, such as
natural language text or biological sequences.

Probabilistic model is a statistical model, in math terms “a pair (Y, P) where
Y is the set of possible observations and P the set of possible probability
distributions on Y”

In statistics terms this means the objective is to infer (or pick) the distinct
element (probability distribution) in the set “P” given your observation Y

Discriminative model meaning it models the conditional probability
distribution P(y|x), which can predict y given x.

It cannot do it the other way around (produce x from y) since it is not a
generative model (capable of generating sample data given a model), as it does
not model a joint probability distribution

Similar to other discriminative models like support vector machines and neural
networks

When analyzing sequential data a conditional model specifies the
probabilities of possible label sequences given an observation sequence

CRF Graphical Definition

Definition from Lafferty

Undirected graphical model

Let $G = (V, E)$ be a graph such that $Y = (Y_v)_{v \in V}$, so that Y is
indexed by the vertices of G. Then (X, Y) is a conditional random field in
case, when conditioned on X, the random variables $Y_v$ obey the Markov
property with respect to the graph:
$p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v)$,
where $w \sim v$ means that w and v are neighbors in G

CRF Undirected Graph

Computation of CRF

Training

Conditioning

Calculation of Feature Function

$P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_t \Psi(y_t, y_{t-1}, w_t(X)) \right)$

$Z(X)$ is the normalizing factor

$\Psi$ is the potential function
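
To make the formula concrete, the sketch below evaluates P(Y|X) by brute force on a two-word input: it scores every possible label sequence and divides by the sum. The label set, the potential `psi` and its hand-set weights are hypothetical, and the window w_t(X) is reduced to the current word for brevity.

```python
import itertools
import math

LABELS = ("N", "V")  # hypothetical label set


def psi(y_t, y_prev, x_t):
    """Hypothetical potential: two hand-weighted indicator features."""
    score = 0.0
    if x_t.endswith("s") and y_t == "V":
        score += 1.0  # word ending in "s" labeled as a verb
    if y_prev == "N" and y_t == "V":
        score += 0.5  # noun followed by verb
    return score


def sequence_score(labels, words):
    """Sum over t of psi(y_t, y_{t-1}, w_t(X)); y_{-1} is None at the start."""
    return sum(
        psi(y, labels[t - 1] if t > 0 else None, words[t])
        for t, y in enumerate(labels)
    )


def crf_probability(labels, words):
    """P(Y|X) = exp(score(Y, X)) / Z(X), with Z(X) summed over all labelings."""
    z = sum(
        math.exp(sequence_score(candidate, words))
        for candidate in itertools.product(LABELS, repeat=len(words))
    )
    return math.exp(sequence_score(labels, words)) / z


WORDS = ("dog", "runs")
probs = {
    candidate: crf_probability(candidate, WORDS)
    for candidate in itertools.product(LABELS, repeat=len(WORDS))
}
print(max(probs, key=probs.get))
```

Enumeration is only feasible for very short sequences, since the number of labelings grows exponentially with length.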

Inference

Viterbi Decoding

Approximate Model Averaging

Others?
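
One further inference routine, not listed on the slide, is the forward recursion. For a linear chain it replaces the max of Viterbi with a sum and yields the normalizer Z(X) without enumerating label sequences. The sketch below checks it against brute-force enumeration. The labels and the potential `psi` are hypothetical, and the window is again just the current word.

```python
import itertools
import math

LABELS = ("N", "V")  # hypothetical label set


def psi(y_t, y_prev, x_t):
    """Hypothetical potential: two hand-weighted indicator features."""
    score = 0.0
    if x_t.endswith("s") and y_t == "V":
        score += 1.0
    if y_prev == "N" and y_t == "V":
        score += 0.5
    return score


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition_forward(words):
    """log Z(X) by dynamic programming over positions."""
    # alpha[y] = log of the summed exp-scores of all label prefixes ending in y
    alpha = {y: psi(y, None, words[0]) for y in LABELS}
    for x_t in words[1:]:
        alpha = {
            y: log_sum_exp([alpha[p] + psi(y, p, x_t) for p in LABELS])
            for y in LABELS
        }
    return log_sum_exp(list(alpha.values()))


def log_partition_brute_force(words):
    """log Z(X) by scoring every label sequence; exponential in len(words)."""
    scores = [
        sum(psi(y, labels[t - 1] if t > 0 else None, words[t])
            for t, y in enumerate(labels))
        for labels in itertools.product(LABELS, repeat=len(words))
    ]
    return log_sum_exp(scores)


WORDS = ("the", "dog", "runs", "fast")
print(log_partition_forward(WORDS), log_partition_brute_force(WORDS))
```

Working in log space avoids overflow when the summed scores are large.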

Training Approaches

CRF is supervised learning, so it can be trained using

Maximum Likelihood (original paper)

Used iterative scaling method, was very slow

Also slow when naïve

Mallet implementation used the BFGS algorithm

http://en.wikipedia.org/wiki/BFGS

Broyden-Fletcher-Goldfarb-Shanno

Approximate 2nd-order algorithm

Stochastic Gradient Method (2006) accelerated via Stochastic Meta Descent

Gradient Tree Boosting (variant of a 2001

http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf

Potential functions are sums of regression trees

Decision trees using real values

Published 2008

Competitive with Mallet

Bayesian (estimate posterior probability)
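
To illustrate the maximum likelihood approach, here is a small sketch of the quantity every gradient-based trainer needs: the gradient of the conditional log-likelihood, which for each feature is the observed count minus the count expected under the model. Plain gradient ascent is swapped in here for iterative scaling or BFGS to keep the example dependency-free. The features, the two training sentences, the step size and the number of steps are all hypothetical, and expectations are computed by enumeration, which only works for tiny inputs.

```python
import itertools
import math

LABELS = ("N", "V")  # hypothetical label set
NUM_FEATURES = 2


def features(y_t, y_prev, x_t):
    """Hypothetical indicator features for one position."""
    return [
        1.0 if x_t.endswith("s") and y_t == "V" else 0.0,
        1.0 if y_prev == "N" and y_t == "V" else 0.0,
    ]


def feature_counts(labels, words):
    """Total count of each feature over a labeled sequence."""
    total = [0.0] * NUM_FEATURES
    for t, y in enumerate(labels):
        prev = labels[t - 1] if t > 0 else None
        for k, value in enumerate(features(y, prev, words[t])):
            total[k] += value
    return total


def log_likelihood_and_gradient(weights, data):
    """Conditional log-likelihood and its gradient, by enumeration."""
    ll = 0.0
    grad = [0.0] * NUM_FEATURES
    for words, gold in data:
        candidates = list(itertools.product(LABELS, repeat=len(words)))
        counts = [feature_counts(c, words) for c in candidates]
        scores = [sum(w * f for w, f in zip(weights, fc)) for fc in counts]
        log_z = math.log(sum(math.exp(s) for s in scores))
        gold_counts = feature_counts(gold, words)
        ll += sum(w * f for w, f in zip(weights, gold_counts)) - log_z
        for k in range(NUM_FEATURES):
            expected = sum(
                math.exp(s - log_z) * fc[k] for s, fc in zip(scores, counts)
            )
            grad[k] += gold_counts[k] - expected  # observed minus expected
    return ll, grad


def train(data, steps=50, rate=0.5):
    """Plain gradient ascent on the conditional log-likelihood."""
    weights = [0.0] * NUM_FEATURES
    for _ in range(steps):
        _, grad = log_likelihood_and_gradient(weights, data)
        weights = [w + rate * g for w, g in zip(weights, grad)]
    return weights


DATA = [
    (("dog", "runs"), ("N", "V")),
    (("she", "eats", "fish"), ("N", "V", "N")),
]
initial_ll, _ = log_likelihood_and_gradient([0.0] * NUM_FEATURES, DATA)
trained = train(DATA)
trained_ll, _ = log_likelihood_and_gradient(trained, DATA)
print(initial_ll, trained_ll, trained)
```

A practical trainer would compute the expected counts with forward-backward recursions and add a regularizer, since without one the weights can grow without bound on data like this.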

Conditional Random Field Extensions

Semi-CRF

Instead of assigning labels to each member of the
sequence, labels are assigned to sub-sequences

“features for semi-CRF can measure
properties of segments, and transition within a
segment can be non-Markovian”

http://www.cs.cmu.edu/~wcohen/postscript/semiCRF.pdf
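
A minimal sketch of segment-level decoding, assuming a hand-written segment potential: the dynamic program chooses both where segments end and which label each segment carries. The labels, the scoring rules in `segment_score`, the maximum segment length and the example sentence are all hypothetical.

```python
def segment_score(label, prev_label, words, start, end):
    """Hypothetical potential over the whole segment words[start:end]."""
    segment = words[start:end]
    score = 0.0
    if label == "NAME" and all(w[0].isupper() for w in segment):
        score += 1.0 * len(segment)  # reward fully capitalized segments
    if label == "NAME" and prev_label == "NAME":
        score -= 2.0  # prefer one long NAME over two adjacent ones
    if label == "O" and len(segment) == 1 and not segment[0][0].isupper():
        score += 0.5  # single lowercase word outside any name
    return score


def semi_crf_decode(words, labels, max_len):
    """Best-scoring segmentation as a list of (start, end, label)."""
    n = len(words)
    # best[i][y] = (score, (start, prev_label)) for the best segmentation of
    # words[:i] whose last segment is words[start:i] with label y
    best = [{} for _ in range(n + 1)]
    best[0] = {None: (0.0, None)}
    for end in range(1, n + 1):
        for label in labels:
            options = []
            for start in range(max(0, end - max_len), end):
                for prev_label, (prev_score, _) in best[start].items():
                    total = prev_score + segment_score(
                        label, prev_label, words, start, end
                    )
                    options.append((total, (start, prev_label)))
            best[end][label] = max(options, key=lambda option: option[0])
    # backtrack from the best label of the final segment
    label = max(best[n], key=lambda y: best[n][y][0])
    end = n
    segments = []
    while end > 0:
        _, (start, prev_label) = best[end][label]
        segments.append((start, end, label))
        end, label = start, prev_label
    segments.reverse()
    return segments


WORDS = ["met", "Ada", "Lovelace", "today"]
segments = semi_crf_decode(WORDS, ("NAME", "O"), max_len=2)
print(segments)
```

Because the potential sees the whole segment at once, it can use properties such as segment length that a per-token potential cannot express.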

Bayesian CRF

Qi et al. (2005)

http://www.cs.purdue.edu/homes/alanqi/papers/Qi-Bayesian-CRF-AIstat05.pdf

Replacement for ML method of Lafferty

Reducing over-fitting

“Power EP Method”

Hierarchical CRF (HCRF)

http://www.cs.washington.edu/homes/fox/postscripts/places-isrr-05.pdf

GPS motion, for surveillance, tracking, dividing
people’s workday into labels of work, travel,
sleep, etc.

Future Directions

Less work on conditional random fields in biology

PubMed hits

Conditional Random Field - 21

Conditional Random Fields - 43

CRF variants & promoter/regulatory element show
no hits

CRF and ontology show no hits

Plan

Implement CRF in Java, apply to biology problems, try
to find ways to extend?

Useful Papers

Link to original paper and review paper

http://www.inference.phy.cam.ac.uk/hmw26/crf/

Review paper:

http://www.inference.phy.cam.ac.uk/hmw26/papers/crf_intro.pdf

Another review

http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf

Review slides

http://www.cs.pitt.edu/~mrotaru/comp/nlp/Random%20Fields/Tutorial%20CRF%20Lafferty.pdf

The boosting paper has a nice review

http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf