Context-based adaptive binary arithmetic coding in the H.264/AVC video compression

IEEE CSVT July 2003

Detlev Marpe, Heiko Schwarz, and Thomas Wiegand

2003/11/04

Presented by Chen-hsiu Huang

CABAC

Outline


Introduction


The CABAC framework


Detailed description of CABAC


Experimental result


Conclusion

Past deficiencies


Entropy coding in standards such as MPEG-2, H.263, and MPEG-4 (SP) is based on fixed tables of VLCs.

Due to VLCs, coding events with probability > 0.5 cannot be efficiently represented.

The usage of fixed VLC tables does not allow an adaptation to the actual symbol statistics.

Since there is a fixed assignment of VLC tables and syntax elements, existing inter-symbol redundancies cannot be exploited.

Why?

Solutions


The first hybrid block-based video coding scheme that incorporates an adaptive binary arithmetic coder was presented in [6].

The first standard that uses an arithmetic entropy coder is given by Annex E of H.263 [4].

However, its major drawbacks are:

Annex E is applied to the same syntax elements as the VLC elements of H.263.

All the probability models are non-adaptive, in that their underlying probabilities are assumed to be static.

The generic m-ary arithmetic coder used involves a considerable amount of computational complexity.


The CABAC Framework


binarization


context modeling


binary arithmetic coding

In the regular coding mode, each bin enters the context modeling stage, where a probability model is selected such that the corresponding choice may depend on previously encoded syntax elements or bins.

The bypass coding mode is chosen for selected bins in order to allow a speedup of the whole encoding/decoding process by means of a simplified coding engine without the usage of an explicitly assigned model.

Figure 1.

Binarization


Consider the value "3" of mb_type, which signals the macroblock type "P_8x8" and is given by the bin string "001".

The symbol probability $p(\text{"3"})$ is equal to the product $p^{(C0)}(\text{"0"}) \cdot p^{(C1)}(\text{"0"}) \cdot p^{(C2)}(\text{"1"})$, where C0, C1, and C2 denote the binary probability models of the internal nodes.

Figure 2.
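The product rule above can be checked with a small sketch. The three bin probabilities below are hypothetical values chosen only for illustration; in CABAC the models C0, C1, and C2 adapt during coding.

```python
from math import prod

# Hypothetical probabilities of each bin value at the internal nodes C0, C1, C2
# (illustrative numbers only, not taken from the paper or the standard).
p_c0 = {"0": 0.6, "1": 0.4}
p_c1 = {"0": 0.5, "1": 0.5}
p_c2 = {"0": 0.75, "1": 0.25}

def symbol_probability(bin_string, models):
    """Multiply the probability of each bin under the model of its tree node."""
    return prod(model[b] for b, model in zip(bin_string, models))

# mb_type value "3" is binarized as "001".
p3 = symbol_probability("001", [p_c0, p_c1, p_c2])
```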



Adaptive m-ary arithmetic coding (m > 2) in general requires at least two multiplications for each symbol to encode, as well as a number of fairly complex operations to perform the probability update [36].

In contrast, there are fast, multiplication-free variants of binary arithmetic coding, one of which was specifically developed for the CABAC framework, as described below.

Since the probability of symbols with larger bin strings is typically very low, the computational overhead is fairly small and can be easily compensated by using a fast binary coding engine.

Finally, binarization enables context modeling on a sub-symbol level. For the most frequently observed bins, conditional probabilities can be used, while less frequently observed bins can be treated using a joint, typically zero-order probability model.

Why?

Binarization Schemes


A binary representation for a given non-binary valued syntax element should be close to a minimum-redundancy code.

Instead of Huffman trees, the design of CABAC (mostly) relies on a few basic code trees, whose structure enables a simple on-line computation of all code words without the need for storing any tables:

Unary code (U) and truncated unary code (TU)

The kth order Exp-Golomb code (EGk)

The fixed-length code (FL)

In all these binarization schemes, a longer codeword corresponds to a less probable symbol.

In addition, there are binarization schemes based on a concatenation of these elementary types.

As an exception, there are five specific binary trees selected manually for the coding of macroblock and sub-macroblock types. Two of them are shown in Figure 2.

Unary and Truncated Unary Binarization

For each unsigned integer valued symbol x >= 0, the unary code word in CABAC consists of x "1" bits plus a terminating "0" bit.

The truncated unary (TU) code is only defined for x with 0 <= x <= S, where for x < S the code is given by the unary code, whereas for x = S the terminating "0" bit is neglected.

For example:

U: 5 => 111110

TU with S=9:

6 => 1111110

9 => 111111111
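The two definitions above translate almost directly into code. The following is a minimal sketch; the function names are chosen here for illustration and bin strings are represented as Python strings.

```python
def unary(x: int) -> str:
    """Unary code: x '1' bits followed by a terminating '0' bit."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return "1" * x + "0"

def truncated_unary(x: int, s: int) -> str:
    """Truncated unary code with cutoff s: the terminating '0' is dropped when x == s."""
    if not 0 <= x <= s:
        raise ValueError("x must satisfy 0 <= x <= s")
    return "1" * x if x == s else unary(x)
```

With these definitions, `unary(5)` and `truncated_unary(6, 9)` reproduce the slide's examples, and `truncated_unary(9, 9)` yields nine "1" bits with no terminator.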

kth order Exp-Golomb Binarization

The prefix part of the EGk codeword consists of a unary code corresponding to the value $l(x) = \lfloor \log_2(x/2^k + 1) \rfloor$.

The EGk suffix part is computed as the binary representation of $x + 2^k (1 - 2^{l(x)})$ using $k + l(x)$ significant bits.
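As a sketch of the prefix and suffix rules above, the function below builds an EGk bin string with integer arithmetic only. It assumes the unary prefix is written as l(x) "1" bits followed by a "0", as in the unary code defined earlier; the function name is illustrative.

```python
def exp_golomb_k(x: int, k: int) -> str:
    """kth order Exp-Golomb bin string: unary prefix for l(x), then a (k + l(x))-bit suffix."""
    if x < 0 or k < 0:
        raise ValueError("x and k must be non-negative")
    # l(x) = floor(log2(x / 2^k + 1)) = floor(log2(x + 2^k)) - k
    l = (x + (1 << k)).bit_length() - 1 - k
    prefix = "1" * l + "0"
    suffix_value = x + (1 << k) * (1 - (1 << l))
    width = k + l
    # With width == 0 (only for k == 0 and x == 0) there is no suffix at all.
    suffix = format(suffix_value, "b").zfill(width) if width else ""
    return prefix + suffix
```

For k = 0 the first few codewords are "0", "100", "101", "11000"; each additional prefix bit doubles the number of values covered.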

Fixed-Length Binarization

Let x denote a given value of such a syntax element, where 0 <= x < S. Then, the FL codeword of x is simply given by the binary representation of x with a fixed (minimum) number $l_{FL} = \lceil \log_2 S \rceil$ of bits.

Typically, FL binarization is applied to syntax elements with a nearly uniform distribution or to syntax elements where each bit in the FL binary representation represents a specific coding decision.

An example is the part of the coded block pattern symbol related to the luminance residual data.

Concatenation schemes


Three concatenated binarization schemes are derived.

The first is a concatenation of a 4-bit FL prefix, as a representation of the luminance-related part of the coded block pattern, and a TU suffix with S=2, representing the chrominance-related part of coded_block_pattern.

Both the second and third schemes are derived from the TU and the EGk binarizations. These schemes, referred to as Unary/kth order Exp-Golomb (UEGk) binarizations, are applied to motion vector differences and absolute values of transform coefficient levels.

The design of these concatenated binarization schemes is motivated by the following observations:

First, the unary code is the simplest prefix-free code in terms of implementation cost.

Second, it permits a fast adaptation of the individual symbol probabilities in the subsequent context modeling stage, since the arrangement of the nodes in the corresponding tree is typically such that with increasing distance of the internal nodes from the root node the corresponding binary probabilities are less skewed.

These observations are only accurate for small values of the absolute motion vector differences and transform coefficient levels. For larger values, there is not much use for adaptive modeling, which leads to the idea of concatenating an adaptively coded TU prefix with a static Exp-Golomb suffix.

E.g. mvd, motion vector difference

For the prefix part of the UEGk bin string, a TU binarization with a cutoff S=9 is invoked for min(|mvd|, 9).

If mvd is equal to zero, the bin string consists only of the prefix codeword "0".

If the condition |mvd| >= 9 holds, the suffix is constructed as an EG3 codeword for the value of |mvd| - 9, to which the sign of mvd is appended using the sign bit "1" for a negative mvd and "0" otherwise. For mvd values with 0 < |mvd| < 9, the suffix consists only of the sign bit.

With the choice of the Exp-Golomb parameter k=3, the suffix code words are given such that a geometrical increase of the prediction error in units of two samples is captured by a linear increase in the corresponding suffix code word length.
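The mvd rules above can be put together in one short sketch. It is self-contained, so the TU and EGk helpers are repeated here; all function names are illustrative and the bin string is modeled as a Python string.

```python
def truncated_unary(x: int, s: int) -> str:
    """x '1' bits, plus a terminating '0' unless x equals the cutoff s."""
    return "1" * x if x == s else "1" * x + "0"

def exp_golomb_k(x: int, k: int) -> str:
    """kth order Exp-Golomb bin string (unary prefix for l(x), then k + l(x) suffix bits)."""
    l = (x + (1 << k)).bit_length() - 1 - k
    width = k + l
    suffix_value = x + (1 << k) * (1 - (1 << l))
    suffix = format(suffix_value, "b").zfill(width) if width else ""
    return "1" * l + "0" + suffix

def binarize_mvd(mvd: int, cutoff: int = 9, k: int = 3) -> str:
    """UEG3-style binarization of one motion vector difference component."""
    magnitude = abs(mvd)
    bins = truncated_unary(min(magnitude, cutoff), cutoff)   # TU prefix
    if magnitude >= cutoff:
        bins += exp_golomb_k(magnitude - cutoff, k)          # EG3 suffix
    if magnitude != 0:
        bins += "1" if mvd < 0 else "0"                      # sign bit
    return bins
```

For example, `binarize_mvd(0)` gives the single bin "0", while `binarize_mvd(-1)` gives the prefix "10" followed by the sign bit "1".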

Figure 3. UEG0 binarization for encoding of absolute values of transform coefficient levels.

Context modeling


Suppose a pre-defined set $T$ of past symbols, a so-called context template, and a related set $C = \{0, \ldots, C-1\}$ of contexts are given, where the contexts are specified by a modeling function $F: T \to C$ operating on the template $T$.

For each symbol $x$ to be coded, a conditional probability $p(x|F(z))$ is estimated by switching between different probability models according to the already coded neighboring symbols $z \in T$. Thus, $p(x|F(z))$ is estimated on the fly by tracking the actual source statistics.

The number $\tau$ of different conditional probabilities to be estimated for an alphabet size of $m$ is equal to $\tau = C \cdot (m-1)$.

This implies that as the number of contexts $C$ increases, there is a point where overfitting of the model may occur.

First, in CABAC only very limited context templates $T$ consisting of a few neighbors of the current symbol to encode are employed, such that only a small number of different context models $C$ is used.

Second, context modeling is restricted to selected bins of the binarized symbols. As a result, the model cost is drastically reduced.

Four basic design types of context models can be distinguished in CABAC. The first type involves a context template with up to two neighboring syntax elements in the past of the current syntax element to encode.

Figure 4. Illustration of a context template consisting of two neighboring syntax elements A and B to the left and on top of the current syntax element C.

Types of context modeling


The second type of context models is only defined for the syntax elements mb_type and sub_mb_type.

For this kind of context model, the values of prior coded bins $(b_0, b_1, \ldots, b_{i-1})$ are used for the choice of a model for a given bin with index $i$.

Note that in CABAC these context models are only used to select different models for different internal nodes of the corresponding binary trees.

Figure 2.

Both the third and fourth types of context models are applied to residual data only. In contrast to all other types of context models, both types depend on the context categories of different block types.

The third type does not rely on past coded data, but on the position in the scanning path. It is used for the significance map.

For the fourth type, modeling functions are specified that involve the evaluation of the accumulated number of encoded/decoded levels with a specific value prior to the current level bin to encode/decode. It is used for the level information.

Context index γ

The entirety of probability models used in CABAC can be arranged in a linear fashion, such that each model is identified by a context index γ.

Each probability model related to a given context index γ is determined by two values: a 6-bit probability state index $\alpha_\gamma$ and the (binary) value $\beta_\gamma$ of the most probable symbol (MPS).

The pairs $(\alpha_\gamma, \beta_\gamma)$ for $0 \le \gamma \le 398$ are each represented as a 7-bit unsigned integer.
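To make the 7-bit representation concrete, here is a minimal sketch of packing a 6-bit state index and a 1-bit MPS value into one integer. The bit layout (state index in the upper six bits, MPS in the lowest bit) is an assumption made for this example; the slides do not specify it.

```python
def pack_model(state_index: int, mps: int) -> int:
    """Pack a 6-bit state index and a 1-bit MPS value into 7 bits (assumed layout)."""
    if not 0 <= state_index < 64 or mps not in (0, 1):
        raise ValueError("state_index must be in [0, 63] and mps must be 0 or 1")
    return (state_index << 1) | mps

def unpack_model(packed: int) -> tuple:
    """Recover (state_index, mps) from the packed 7-bit value."""
    return packed >> 1, packed & 1

# One packed entry per context index 0..398, all starting in a placeholder state.
models = [pack_model(0, 0) for _ in range(399)]
```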

Figure 5. Syntax elements and associated range of context indices.


The context indices in the range from 0 to 72 are related to syntax elements of macroblock type, sub-macroblock type, and prediction modes of spatial and temporal type, as well as slice-based and macroblock-based control information.

For this type, a corresponding context index γ can be calculated as $\gamma = \Gamma_S + \chi_S$. Here $\Gamma_S$ denotes the context index offset, which is defined as the lower value of the range given in Figure 5, and $\chi_S$ denotes the context index increment of a given syntax element S.

Context indices from 73 to 398 are related to the coding of residual data.

The range values in the lower row of the corresponding syntax elements in Figure 5 specify the context indices for field-based coding mode. In pure frame coding, only 277 out of the total of 399 probability models are actually used.

For other syntax elements of residual data, a context index γ is given by $\gamma = \Gamma_S + \Delta_S(\text{ctx\_cat}) + \chi_S$. Here the context category (ctx_cat) dependent offset $\Delta_S$ is employed (Figure 6).

Note that only past coded values of syntax elements are evaluated that belong to the same slice in which the current coding process takes place.
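The two index formulas above amount to simple additions. In the sketch below, the offsets and increments are hypothetical placeholder numbers; the real values come from Figure 5 and Figure 6, which are not reproduced in this transcript.

```python
def context_index(offset: int, increment: int, category_offset: int = 0) -> int:
    """gamma = Gamma_S + Delta_S(ctx_cat) + chi_S; Delta_S is 0 for non-residual elements."""
    return offset + category_offset + increment

# Hypothetical numbers, for illustration only.
gamma_plain = context_index(offset=11, increment=2)
gamma_residual = context_index(offset=85, increment=1, category_offset=4)
```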

Figure 6. Basic block types with number of coefficients and associated context categories.


Binary arithmetic coding


Binary arithmetic coding is based on the principle of recursive interval subdivision.

Suppose that an estimate of the probability $p_{LPS} \in (0, 0.5]$ of the least probable symbol (LPS) is given and that the current interval is represented by its lower bound $L$ and its width $R$. Based on this, the given interval is subdivided into two sub-intervals: one of width $R_{LPS} = R \cdot p_{LPS}$ (3), and the dual interval of width $R_{MPS} = R - R_{LPS}$.
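A minimal sketch of one subdivision step, using exact fractions so that the multiplication in (3) is visible. It assumes the MPS is assigned the lower sub-interval and the LPS the upper one; that assignment is an assumption of this example and is not stated on the slide.

```python
from fractions import Fraction

def subdivide(low, width, p_lps, is_lps):
    """Return the (low, width) of the sub-interval chosen by the coded bin."""
    r_lps = width * p_lps        # equation (3): the costly multiplication
    r_mps = width - r_lps
    if is_lps:
        return low + r_mps, r_lps   # assumed: LPS takes the upper sub-interval
    return low, r_mps               # assumed: MPS takes the lower sub-interval

# Code the sequence MPS, MPS, LPS with a fixed p_LPS of 1/4.
low, width = Fraction(0), Fraction(1)
for is_lps in (False, False, True):
    low, width = subdivide(low, width, Fraction(1, 4), is_lps)
```

After the three steps the interval width equals the product of the three symbol probabilities, 3/4 · 3/4 · 1/4.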


In a practical implementation, the main bottleneck in terms of throughput is the multiplication operation required.

A significant amount of work has been published aimed at speeding up the required calculation by introducing approximations of either the range $R$ or the probability $p_{LPS}$ such that the multiplication can be avoided [32-34].

The Q coder [32] and the QM and MQ coders [35] each have their inefficiencies. The authors designed an alternative multiplication-free coder, called the modulo coder (M coder), which has been shown to provide a higher throughput than the MQ coder [36].

The basic idea of the M coder is to project both the legal range $[R_{min}, R_{max})$ of the interval width $R$ and the probability range associated with the LPS onto a small set of representative values $Q = \{Q_0, \ldots, Q_{K-1}\}$ and $P = \{p_0, \ldots, p_{N-1}\}$, respectively. Thus the multiplication on the right-hand side of (3) can be approximated by using a table of $K \cdot N$ pre-computed values.

A reasonable table size and a sufficiently good approximation were found by using a set $Q$ of $K=4$ quantized range values together with a set $P$ of $N=64$ LPS-related probability values.

Another distinct feature in H.264/AVC, as already mentioned above, is its simplified bypass coding mode for bins that are assumed to be uniformly distributed.

Details of CABAC


The syntax elements are divided into two categories.

The first contains elements related to macroblock type, sub-macroblock type, and information on prediction modes of both spatial and temporal type, as well as slice-based and macroblock-based control information.

In the second, all residual data elements, i.e., all syntax elements related to the coding of transform coefficients, are combined.

In addition, a more detailed explanation of the probability estimation process and the table-based binary arithmetic coding engine of CABAC is given.

Coding of macroblock type, prediction mode, and control information

At the top level of the macroblock layer syntax, the signaling of mb_skip_flag and mb_type is performed. The binary-valued mb_skip_flag indicates whether the current macroblock in a P/SP or B slice is skipped.

For a given macroblock C, the related context models involve the mb_skip_flag values of the neighboring macroblocks A to the left and B on top. The context index increment is given by:

χ_MbSkip(C) = ((mb_skip_flag(A) != 0) ? 0 : 1) + ((mb_skip_flag(B) != 0) ? 0 : 1)

If one or both of the neighboring macroblocks A or B are not available, the corresponding mb_skip_flag value is set to 0.
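The increment formula above can be sketched as follows. The function follows the slide's formula literally and models an unavailable neighbor as `None`, whose flag is then taken as 0; the function name and the use of `None` are choices made for this example.

```python
from typing import Optional

def ctx_inc_mb_skip(skip_a: Optional[int], skip_b: Optional[int]) -> int:
    """Context index increment for mb_skip_flag from neighbors A (left) and B (top)."""
    a = 0 if skip_a is None else skip_a   # unavailable neighbor: flag treated as 0
    b = 0 if skip_b is None else skip_b
    return (0 if a != 0 else 1) + (0 if b != 0 else 1)
```

The result is one of 0, 1, or 2, so three probability models are enough for this flag.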

Macroblock type


As already stated above, Figure 2 shows the binarization trees for mb_type and sub_mb_type that are used in P/SP slices.

Note that the mb_type value of "4" for P slices is not used in CABAC entropy coding mode. For the values "5"-"30" of mb_type, the binarization is further specified in [1].

For coding a bin value corresponding to the binary decision at an internal node shown in Figure 2, separate context models denoted by C0,...,C3 for mb_type and C'0,...,C'3 for sub_mb_type are employed.


Figure 2

Coding of prediction modes


Intra prediction modes for luma 4x4: the luminance intra prediction modes for 4x4 blocks are themselves predicted, resulting in the syntax elements of the binary-valued prev_intra4x4_pred_mode_flag and rem_intra4x4_pred_mode, where the latter is only present if the former takes a value of 0.

For coding these syntax elements, two separate probability models are utilized: one for coding of the flag and another for coding each bin value of the 3-bit FL binarized value of rem_intra4x4_pred_mode.

Intra prediction modes for chroma:

χ_ChPred(C) = ((ChPredInDcMode(A) != 0) ? 0 : 1) + ((ChPredInDcMode(B) != 0) ? 0 : 1)

Reference picture index:

χ_RefIdx(C) = ((RefIdxZeroFlag(A) != 0) ? 0 : 1) + 2 × ((RefIdxZeroFlag(B) != 0) ? 0 : 1)

Components of motion vector differences: mvd(X, cmp) denotes the value of a motion vector difference component of direction cmp in {hori, vert} related to a macroblock or sub-macroblock partition X.

Macroblock-based quantization parameter change: For updating the quantization parameter on a macroblock level, mb_qp_delta is present for each non-skipped macroblock. For coding the signed value δ(C) of this syntax element, δ(C) is first mapped onto a non-negative value by

δ+(C) = 2|δ(C)| - ((δ(C) > 0) ? 1 : 0)

Then δ+(C) is binarized using the unary binarization scheme.
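A short sketch of the mb_qp_delta mapping followed by unary binarization. The function names are illustrative; the mapping itself is the formula given above.

```python
def map_qp_delta(delta: int) -> int:
    """Map a signed delta onto a non-negative value: 0, 1, -1, 2, -2 -> 0, 1, 2, 3, 4."""
    return 2 * abs(delta) - (1 if delta > 0 else 0)

def unary(x: int) -> str:
    """Unary code: x '1' bits followed by a terminating '0' bit."""
    return "1" * x + "0"

def binarize_qp_delta(delta: int) -> str:
    """Bin string for mb_qp_delta: unary code of the mapped value."""
    return unary(map_qp_delta(delta))
```

Positive and negative values interleave, so small changes in either direction get short bin strings.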



End of slice flag: For signaling the last macroblock (macroblock pair) in a slice, the end_of_slice_flag is present for each macroblock (pair).

The event of a non-terminating macroblock is related to the highest possible MPS probability.

Macroblock pair field flag:

χ_MbField(C) = mb_field_decoding_flag(A) + mb_field_decoding_flag(B)

Coding of residual data


A one-bit symbol coded_block_flag and a binary-valued significance map are used to indicate the occurrence and the location of nonzero transform coefficients in a given block.

Nonzero levels are encoded in reverse scanning order.

Context models for coding of nonzero transform coefficients are chosen based on the number of previously transmitted nonzero levels within the reverse scanning path.

First, the coded block flag is transmitted for the given block of transform coefficients unless the coded block pattern or the macroblock mode indicates that the regarded block has no nonzero coefficients.

If the coded block flag is nonzero, a significance map specifying the positions of significant coefficients is encoded.

Finally, the absolute value of the level as well as the sign is encoded for each significant transform coefficient.

Figure 7.


Coded block pattern: For each non-skipped macroblock with prediction mode not equal to intra_16x16, the coded_block_pattern symbol indicates which of the six 8x8 blocks (four luminance and two chrominance) contain nonzero transform coefficients.

A given value of the syntax element coded_block_pattern is binarized using the concatenation of a 4-bit FL and a TU binarization with cutoff value S=2.

Coded block flag: a one-bit symbol that indicates whether there are significant, i.e. nonzero, coefficients inside a single block of transform coefficients.

Scanning of transform coefficients: the 2-D arrays of transform coefficient levels of those sub-blocks for which the coded_block_flag indicates nonzero entries are first mapped onto a 1-D list using a given scanning pattern.

Encoding process of residual data


Significance map: If the significant_coeff_flag symbol is one, a further one-bit symbol last_significant_coeff_flag is sent. This symbol indicates whether the current significant coefficient is the last one inside the block or whether further significant coefficients follow.

Level information: The values of significant coefficients (levels) are encoded by using two coding symbols: coeff_abs_level_minus1 and coeff_sign_flag. The UEG0 binarization scheme is used for encoding of coeff_abs_level_minus1.

The levels are transmitted in reverse scanning order, allowing the usage of reasonably adjusted context models.

Context models for residual data: In H.264/AVC, there are 12 types of transform coefficient blocks, which typically have different kinds of statistics. To keep the number of different context models small, they are classified into five categories as in Figure 6.

For each of these categories, a special set of context models is used for all syntax elements related to residual data.

coded_block_pattern: For bin indices from 0 to 3, corresponding to the four 8x8 luminance blocks,

χ_CBP(C, bin_idx) = ((CBP_Bit(A) != 0) ? 0 : 1) + 2 × ((CBP_Bit(B) != 0) ? 0 : 1)

For bin indices 4 and 5, the context index increments are specified in [1].


Figure 6

Context models for residual data


Coded block flag: Coding of the coded_block_flag utilizes four different probability models for each of the five categories as specified in Figure 6.

χ_CBFlag(C) = coded_block_flag(A) + 2 × coded_block_flag(B)

Significance map: For encoding the significance map, up to 15 different probability models are used for both significant_coeff_flag and last_significant_coeff_flag.

The choice of the models and the context index increments depend on the scanning position:

χ_SIG(coeff[i]) = χ_LAST(coeff[i]) = i

Level information: Reverse scanning of the level information allows a more reliable estimation of the statistics, because at the end of the scanning path it is very likely to observe the occurrence of successive so-called trailing 1's.

Probability estimation


For CABAC, 64 representative probability values $p_\sigma \in [0.01875, 0.5]$ were derived for the LPS by:

$p_\sigma = \alpha \cdot p_{\sigma-1}$ for all $\sigma = 1, \ldots, 63$, with $\alpha = (0.01875 / 0.5)^{1/63}$ and $p_0 = 0.5$.
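The recursion above fully determines the 64 representative values, so they can be generated in a few lines. This is a sketch of the formula only; an actual codec uses precomputed integer tables rather than floating-point values.

```python
# Scaling factor and the 64 representative LPS probabilities p_0 .. p_63.
ALPHA = (0.01875 / 0.5) ** (1 / 63)
P_LPS = [0.5 * ALPHA ** sigma for sigma in range(64)]

# p_0 is the largest LPS probability (0.5) and p_63 the smallest (0.01875).
p_first, p_last = P_LPS[0], P_LPS[63]
```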

Figure 8. LPS probability values and transition rules for updating the probability estimation of each state after observing a LPS (dashed lines in left direction) and a MPS (solid lines in right direction).


Both the chosen scaling factor α ≈ 0.95 and the cardinality N=64 of the set of probabilities represent a good compromise between the desire for fast adaptation (α → 0, small N) and a sufficiently stable and accurate estimate (α → 1, large N).

As a result of this design, each context model in CABAC can be completely determined by two parameters: its current estimate of the LPS probability and its value of the MPS β, being either 0 or 1.

Actually, for a given probability state, the update depends on the state index and the value of the encoded symbol, identified either as a LPS or a MPS.


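The derivation of the transition rules for the LPS probability is based on the following relation between a given LPS probability $p_{old}$ and its updated counterpart $p_{new}$:

$p_{new} = \max(\alpha \cdot p_{old},\, p_{62})$ if a MPS occurs, and $p_{new} = (1 - \alpha) + \alpha \cdot p_{old}$ if a LPS occurs.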

Table-based binary arithmetic coding

Actually, the CABAC coding engine consists of two sub-engines, one for the regular coding mode and the other for the bypass coding mode.

Interval subdivision in regular coding mode: The internal state of the arithmetic encoding engine is, as usual, characterized by two quantities: the current interval range R and the base (lower endpoint) L of the current code interval.

Figure 9.


First, the current interval range R is approximated by a quantized value Q(R), using an equi-partition of the whole range $2^8 \le R < 2^9$ into four cells. But instead of using the corresponding representative quantized values $Q_0$, $Q_1$, $Q_2$, and $Q_3$ explicitly, Q(R) is only addressed by its quantizer index ρ, e.g. ρ = (R >> 6) & 3.

Thus, this index and the probability state index are used as entries in a 2-D table TabRangeLPS to determine the (approximate) LPS-related sub-interval range $R_{LPS}$. Here the table TabRangeLPS contains all 64x4 pre-computed product values $p_\sigma \cdot Q_\rho$ for $0 \le \sigma \le 63$ and $0 \le \rho \le 3$ in 8-bit precision.
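The lookup described above can be sketched as follows. The table built here is an illustration only: it assumes the representative value of each range cell is the cell midpoint and rounds the products, so its entries are not the normative TabRangeLPS values of the standard.

```python
ALPHA = (0.01875 / 0.5) ** (1 / 63)
P_LPS = [0.5 * ALPHA ** sigma for sigma in range(64)]

# Assumed representatives: midpoints of the four cells of [256, 512).
Q = [256 + 64 * rho + 32 for rho in range(4)]          # 288, 352, 416, 480

# Illustrative 64x4 table of pre-computed products p_sigma * Q_rho.
TAB_RANGE_LPS = [[round(p * q) for q in Q] for p in P_LPS]

def quantizer_index(r: int) -> int:
    """Index of the cell containing R, for 2^8 <= R < 2^9."""
    return (r >> 6) & 3

def range_lps(r: int, sigma: int) -> int:
    """Multiplication-free approximation of R * p_sigma by table lookup."""
    return TAB_RANGE_LPS[sigma][quantizer_index(r)]
```

At coding time only a shift, a mask, and one table access are needed; the multiplications happen once, when the table is built.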


Bypass coding mode: To speed up the encoding/decoding of symbols for which $R - R_{LPS} \approx R_{LPS} \approx R/2$ is assumed to hold, the bypass mode uses a simplified coding process.

The variable L is doubled before choosing the lower or upper sub-interval depending on the value of the symbol to encode (0 or 1).

In this way, the doubling of L and R in the subsequent renormalization is avoided, and the renormalization in the bypass mode is instead operated with a doubled decision threshold.
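The first part of the bypass step, the update of L, can be sketched as below. The sketch leaves R unchanged and omits the renormalization and bit output entirely; both simplifications are assumptions of this example and not a full description of the engine.

```python
def bypass_update_low(low: int, rng: int, bin_value: int) -> int:
    """Double L, then move to the upper sub-interval if the bin is 1 (R is left unchanged)."""
    low <<= 1
    if bin_value:
        low += rng
    return low
```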

Figure 10.


Renormalization and carry-over control: A renormalization operation after interval subdivision is required whenever the new interval range R no longer stays within its legal range of $[2^8, 2^9)$.

For the CABAC engine, the renormalization process and carry-over control of [37] was adopted.

This implies, in particular, that the encoder has to resolve any carry propagation by monitoring the bits that are outstanding for being emitted.


More details can be found in [1].

Experimental result


In our experiments, we compare the coding efficiency of CABAC to the coding efficiency of the baseline entropy coding method of H.264/AVC. The baseline entropy coding method uses the zero-order Exp-Golomb code for all syntax elements with the exception of the residual data, which are coded using the coding method of CAVLC [1], [2].

For the range of acceptable video quality for broadcast applications of about 30 to 38 dB, and averaged over all tested sequences, bit-rate savings of 9% to 14% are achieved, where higher gains are obtained at lower rates.


References


[1] "Draft ITU-T Recommendation H.264 and Draft ISO/IEC 14 496-10 AVC," in Joint Video Team of ISO/IEC JTC1/SC29/WG11 & ITU-T SG16/Q.6 Doc. JVT-G050, T. Wiegand, Ed., Pattaya, Thailand, Mar. 2003.

[2] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 560-576, July 2003.

[4] "Video Coding for Low Bitrate Communications, Version 1," ITU-T, ITU-T Recommendation H.263, 1995.

[6] C. A. Gonzales, "DCT Coding of Motion Sequences Including Arithmetic Coder," ISO/IEC JTC1/SC2/WP8, MPEG 89/187, 1989.

[32] W. B. Pennebaker, J. L. Mitchell, G. G. Langdon, and R. B. Arps, "An overview of the basic principles of the Q-coder adaptive binary arithmetic coder," IBM J. Res. Dev., vol. 32, pp. 717-726, 1988.

[33] J. Rissanen and K. M. Mohiuddin, "A multiplication-free multialphabet arithmetic code," IEEE Trans. Commun., vol. 37, pp. 93-98, Feb. 1989.

[34] P. G. Howard and J. S. Vitter, "Practical implementations of arithmetic coding," in Image and Text Compression, J. A. Storer, Ed. Boston, MA: Kluwer, 1992, pp. 85-112.

[36] D. Marpe and T. Wiegand, "A highly efficient multiplication-free binary arithmetic coder and its application in video coding," presented at the IEEE Int. Conf. Image Proc. (ICIP), Barcelona, Spain, Sept. 2003.


Q1.


The problem with this scheme lies in the fact that Huffman codes have to be an integral number of bits long.

The optimal number of bits to be used for each symbol is $\log_2(1/p)$, where p is the probability of a given character.

Thus, if the probability of a character is 1/256, such as would be found in a random byte stream, the optimal number of bits per character is log base 2 of 256, or 8.

If the probability goes up to 1/2, the optimum number of bits needed to code the character would go down to 1.

If a statistical method can be developed that can assign a 90% (> 0.5) probability to a given character, the optimal code size would be 0.15 bits. The Huffman coding system would probably assign a 1-bit code to the symbol, which is more than 6 times longer than is necessary.
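The numbers quoted above follow directly from the code-length formula, as this small sketch shows. The function name is illustrative.

```python
from math import log2

def optimal_bits(p: float) -> float:
    """Ideal code length in bits for a symbol of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return log2(1 / p)

bits_random_byte = optimal_bits(1 / 256)   # uniformly random byte
bits_half = optimal_bits(0.5)              # fair binary decision
bits_likely = optimal_bits(0.9)            # highly probable symbol
overhead = 1 / bits_likely                 # how much longer a 1-bit Huffman code is
```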

Q2.

For each symbol to encode, the upper bound $u^{(n)}$ and lower bound $l^{(n)}$ of the interval containing the tag for the sequence must be computed:

$$l^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n - 1)$$

$$u^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n)$$

H.264 / MPEG-4 Part 10: Introduction to CABAC

When entropy_coding_mode is set to 1, an arithmetic coding system is used to encode and decode H.264 syntax elements.

The arithmetic coding scheme selected for H.264, Context-based Adaptive Binary Arithmetic Coding or CABAC, achieves good compression performance through:

Selecting probability models for each syntax element according to the element's context,

Adapting probability estimates based on local statistics, and

Using arithmetic coding.

Coding stages


Binarization

CABAC uses Binary Arithmetic Coding, which means that only binary decisions (1 or 0) are encoded.

A non-binary-valued symbol (e.g. a transform coefficient or motion vector) is "binarized" or converted into a binary code prior to arithmetic coding.

This process is similar to the process of converting a data symbol into a variable length code, but the binary code is further encoded (by the arithmetic coder) prior to transmission.

Context model selection

A "context model" is a probability model for one or more bins of the binarized symbol.

This model may be chosen from a selection of available models depending on the statistics of recently coded data symbols.

The context model stores the probability of each bin being "1" or "0".

Arithmetic encoding

An arithmetic coder encodes each bin according to the selected probability model.

Note that there are just two sub-ranges for each bin (corresponding to "0" and "1").

Probability update

The selected context model is updated based on the actual coded value (e.g. if the bin value was "1", the frequency count of "1"s is increased).

The above stages are repeated for each bit (or "bin") of the binarized symbol.

The coding process


Binarization. We will illustrate the coding process for one example, MVDx (motion vector difference in the x-direction).

Binarization is carried out according to the following table for |MVDx| < 9 (larger values of MVDx are binarized using an Exp-Golomb codeword).

(Note that each of these binarized codewords is uniquely decodable.)

|MVDx|    Binarization
0         0
1         10
2         110
3         1110
4         11110
5         111110
6         1111110
7         11111110
8         111111110


Choose a context model for each bin. One of 3 models is selected for bin 1, based on previously coded MVD values. The L1 norm of two previously coded values, e_k, is calculated:

e_k = |MVD_A| + |MVD_B|

where A is the left block and B is the above block.

If e_k is small, then there is a high probability that the current MVD will have a small magnitude; if e_k is large, then it is more likely that the current MVD will have a large magnitude. We select a probability table (context model) accordingly.

e_k              Context model for bin 1
0 <= e_k < 3     Model 0
3 <= e_k < 32    Model 1
32 <= e_k        Model 2
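The selection rule in the table above can be written as a small function. This is a sketch of the table only; the function and parameter names are illustrative.

```python
def select_bin1_model(mvd_a: int, mvd_b: int) -> int:
    """Pick the context model for bin 1 from the left (A) and above (B) MVD values."""
    e_k = abs(mvd_a) + abs(mvd_b)   # L1 norm of the two neighboring values
    if e_k < 3:
        return 0
    if e_k < 32:
        return 1
    return 2
```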


The remaining bins are coded using one of 4 further context models:

Bin             Context model
1               0, 1, or 2 (depending on e_k)
2               3
3               4
4               5
5               6
6 and higher    7

Encode each bin. The selected context model supplies two probability estimates: the probability that the bin contains "1" and the probability that the bin contains "0".

These estimates determine the two sub-ranges that the arithmetic coder uses to encode the bin.


Update the context models. For example, if context model 2 was selected for bin 1 and the value of bin 1 was "0", the frequency count of "0"s is incremented. This means that the next time this model is selected, the probability of a "0" will be slightly higher.

When the total number of occurrences of a model exceeds a threshold value, the frequency counts for "0" and "1" will be scaled down, which in effect gives higher priority to recent observations.
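The frequency-count picture described above can be sketched as a small class. This illustrates the slide's simplified description only: the threshold value, the initial counts, and the halving rule are assumptions of this example, and the normative CABAC engine uses the 64-state transition table instead of explicit counts.

```python
class CountingModel:
    """Adaptive binary model based on frequency counts (simplified illustration)."""

    def __init__(self, threshold: int = 64):
        self.counts = [1, 1]          # assumed initial counts for "0" and "1"
        self.threshold = threshold    # assumed rescaling threshold

    def probability(self, bin_value: int) -> float:
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value: int) -> None:
        self.counts[bin_value] += 1
        if sum(self.counts) > self.threshold:
            # Scale down so that recent observations carry more weight.
            self.counts = [max(1, c // 2) for c in self.counts]
```

After a run of identical bins the estimate for that value rises, and the periodic halving keeps the total count bounded.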


At the beginning of each coded slice, the context models are initialized depending on the initial value of the Quantization Parameter QP (since this has a significant effect on the probability of occurrence of the various data symbols).

The arithmetic coding engine


The arithmetic decoder has three distinct properties:

Probability estimation is performed by a transition process between 64 separate probability states for the "Least Probable Symbol" (LPS, the least probable of the two binary decisions "0" or "1").

The range R representing the current state of the arithmetic coder is quantized to a small range of pre-set values before calculating the new range at each step, making it possible to calculate the new range using a look-up table (i.e. multiplication-free).

A simplified encoding and decoding process (bypass) is defined for data symbols with a near-uniform probability distribution.

The definition of the decoding process is designed to facilitate low-complexity implementations of arithmetic encoding and decoding.
